WO2023021262A1 - Methods of determining animal phenotypes

Methods of determining animal phenotypes

Info

Publication number
WO2023021262A1
Authority
WO
WIPO (PCT)
Prior art keywords
animal
phenotype
infrared spectra
data
milk
Application number
PCT/GB2021/052135
Other languages
French (fr)
Inventor
Mike Coffey
Scott John DENHOLM
Original Assignee
Scotland's Rural College
Application filed by Scotland's Rural College
Priority to PCT/GB2021/052135
Publication of WO2023021262A1


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/3577 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light for analysing liquids, e.g. polluted water
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/02 Food
    • G01N33/04 Dairy products
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/483 Physical analysis of biological material
    • G01N33/487 Physical analysis of biological material of liquid biological material
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2201/00 Features of devices classified in G01N21/00
    • G01N2201/12 Circuits of general importance; Signal processing
    • G01N2201/129 Using chemometrical methods
    • G01N2201/1296 Using chemometrical methods using neural networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients

Definitions

  • the present invention relates to methods of determining an animal's phenotype, such as disease state or pregnancy state, by analysis of infra-red spectra obtained from samples of the animal's milk, in particular the use of deep learning techniques to determine an animal's phenotype.
  • the phenotype is a pregnancy status
  • the phenotype is change in pregnancy status.
  • the phenotype data comprises parturition data; and/or insemination data.
  • the phenotype data comprises parturition data.
  • the phenotype data comprises insemination data.
  • the phenotype data comprises parturition data and insemination data.
  • the infrared spectra labelled as negative comprises animals between parturition and first insemination.
  • one or more reference spectra may be labelled as negative or positive based on disease data, such as detection of the disease (or lack thereof) using the standard testing procedure described herein for the specific disease.
  • one or more reference spectra may be labelled as negative (i.e. the animal does not have TB) wherein the animal has a negative skin-test result, a negative observation of lesions (no lesions observed in the tissue of the animal after slaughter), a negative antibody test, and/or a negative culture status (no mycobacterium detected upon culturing from lesions taken from the tissue of the animal).
  • a reference milk MIR spectrum obtained from an animal after parturition and before first insemination may be labelled as negative (i.e. not pregnant).
  • a reference MIR spectrum obtained from an animal between the last insemination and the subsequent calving may be labelled as positive (i.e. is pregnant).
  • a reference MIR spectrum obtained from an animal between the last insemination and the subsequent calving with a gestation length between 240 and 284 days may be labelled as positive.
  • labelling of the one or more reference spectra may also be based on data relating to the recording date of the spectra.
  • the infrared spectrum of milk is recorded by passing a beam of infrared light through the milk.
  • when the frequency of the IR light is the same as the vibrational frequency of a bond or collection of bonds, absorption occurs.
  • Examination of the transmitted light reveals how much energy was absorbed at each frequency (or wavelength), which can be used to quantify the abundance of molecules present in the milk.
  • This measurement can be achieved by scanning the relevant wavelength range using a monochromator. Alternatively, the entire wavelength range is measured using a Fourier transform instrument and then a transmittance or absorbance spectrum is generated using a dedicated procedure.
  • Raw spectra of milk obtained over the 400 to 5,000 cm-1 region may be subject to a pre-treatment before chemometric analysis.
  • Partial least squares (PLS) analysis (Geladi and Kowalski, 1986, Analytica Chimica Acta, 185: 1-17) can be performed as a preprocessing step before training a machine learning algorithm; it works like principal component analysis (PCA) in that it transforms the data set into a new projection that represents the entire data set, and then chooses the C most informative axes (or "components") in the new projection as features in the transformed data set (see the PLS sketch after this list).
  • PCA principal component analysis
  • PLS takes into consideration the dependent variable when constructing its projection, but PCA does not.
  • One advantage of using the dependent variable during learning is that the algorithm is able to perform regression using the projections it has calculated.
  • the C4.5 decision tree (Quinlan R, 1993, C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA, USA) builds a tree by evaluating the information gain of each feature (i.e., independent variable) and then creates a split (or decision) by choosing the most informative feature and dividing the records into left and right nodes of the tree. This process repeats until all of the records at a node belong to a single class (e.g., pregnant/not pregnant or diseased/not diseased) or the number of records reaches the threshold defined in the algorithm (i.e., a minimum of 2 instances per leaf). A prediction is made by traversing the tree using the values from the current instance and returning the majority class at the leaf node reached by the traversal (see the decision-tree sketch after this list).
  • Deep learning is a class of machine learning techniques employing representation learning methods that allows a machine to be given raw data and determine the representations needed for data classification. Deep learning ascertains structure in data sets using backpropagation algorithms which are used to alter internal parameters (e.g., node weights) of the deep learning machine. Deep learning machines can utilize a variety of multilayer architectures and algorithms. While machine learning, for example, involves an identification of features to be used in training the network, deep learning processes raw data to identify features of interest without the external identification.
  • Deep learning in a neural network environment includes numerous interconnected nodes referred to as neurons. Input neurons, activated from an outside source, activate other neurons based on connections to those other neurons which are governed by the machine parameters. A neural network behaves in a certain manner based on its own parameters. Learning refines the machine parameters, and, by extension, the connections between neurons in the network, such that the neural network behaves in a desired manner.
  • Deep learning that utilizes a convolutional neural network (CNN) segments data using convolutional filters to locate and identify learned, observable features in the data. A CNN assigns importance to these features in the form of learnable weights and biases.
  • CNN convolutional neural network
  • Each filter or layer of the CNN architecture transforms the input data to increase the selectivity and invariance of the data. This abstraction of the data allows the machine to focus on the features in the data it is attempting to classify and ignore irrelevant background information.
  • Deep learning operates on the understanding that many datasets include high level features which include low level features. While examining an image, for example, rather than looking for an object, it is more efficient to look for edges which form motifs which form parts, which form the object being sought. These hierarchies of features can be found in many different forms of data such as speech and text, etc.
  • Learned observable features include objects and quantifiable regularities learned by the machine during supervised learning.
  • DenseNet advantageously alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters required to produce accurate results. DenseNets provide significant improvements over the state-of-the-art, requiring less memory and computation to achieve high performance.
  • Convolutional neural networks including DenseNets, can be pretrained using other data before being trained for MIR spectral data. This reduces the time taken to train the neural networks on reference MIR spectral data and reduces the computation required. For example, the neural networks may be pretrained using images in order to identify particular features of the images such as edges.
  • When using DenseNet, MIR spectral records may be converted to individual 53 x 20 px greyscale (PNG) images, which reduces the time and computation required for training when the DenseNet is already pretrained on images (see the image-conversion sketch after this list).
  • Before being input into a machine learning model, the MIR spectral data may be reduced in size. For example, each MIR spectral record may have a large number of wave points. In order to reduce the computation and size of the machine learning model, each spectral record may be reduced in size to provide only those wave points that are significant in predicting whether the phenotype is positive or negative.
  • Labelled data may be synthesised using methods such as the Synthetic Minority Over-sampling Technique (SMOTE), the Adaptive Synthetic sampling approach and generative adversarial networks (GANs).
  • Synthesis of data may include the use of a k-nearest-neighbours approach to synthesise new data within the body of available data by randomly selecting a minority instance, A; finding its k nearest neighbours; and then drawing a line segment in the feature space between A and a random neighbour (see the synthesis sketch after this list).
  • Machine learning models may be evaluated based on loss and/or accuracy.
  • the methods described herein may have an accuracy of at least 65%.
  • the methods described herein may have an accuracy of at least 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%.
  • Pregnancy diagnosis can also be established from a milk sample by measuring the concentration of progesterone at 24 days, with accuracies of 83.3% and 85%, respectively (Muhammad et al., 2000; Sheldon and Noakes, 2002).
  • Deep neural networks are essentially feed-forward systems where information is passed in a single direction.
  • Convolutional neural networks mimic the mammalian brain even further by using supervised back-propagation to update older assumptions with newly acquired knowledge during training, by means of sampling and sub-sampling maps (Ciresan et al., 2011).
  • These CNN are essential to the extraction of high-level features from abstract data to improve the predictability of deep classifier layers.
  • Transfer learning utilizes all the same design requirements, but exploits the fact that data from one feature space and distribution can be used to classify data in another feature space and distribution (Pan and Yang, 2010).
  • milk MIR spectral records from animals after parturition and before their first insemination were labelled as non-pregnant for the training dataset. Records between the last insemination and the subsequent calving with a gestation length between 240 and 284 days were labelled as pregnant records for the dataset. [00248] The number of records for confirmed non-pregnant animals was the limiting factor, as the distribution of animals in both categories in the training set should be close to equal (Lecun et al., 2015). After labelling the data, a total of 3 million spectral records from 697,671 animals, born between 1999 and 2016, were available for further analysis.
  • Model 2 was relatively easy to train with transfer learning as no prior configuration or investigation on network design was required. Training on spectral images was efficient and faster than parsing text files and converting data types as with Model 1. The results showed the capability of the DenseNet model to extract and engineer high level features from the MIR images. Figure 3 showed no indication of over-fitting (where the model is optimised to predict the validation dataset only), which is common in datasets with high complexity (Ghojogh and Crowley, 2019). On the deterioration of accuracy and loss immediately after 100 epochs in Model 2: the training of the deep convolutional layers started from epoch 101 and showed that the assigned learning rate was not optimal.
  • Pregnancy status was defined as a binary trait (i.e., pregnant, not-pregnant) and CNNs were found to significantly improve prediction accuracy, with trained models able to detect 83% and 73% of onsets and losses of pregnancy, respectively (Brand et al., 2018). More recently the inventors have improved prediction accuracy such that models predict pregnancy status with an accuracy of 97% (with a corresponding validation loss of 0.08) after training for 200 epochs (unpublished, submitted).
  • EXAMPLE 3 – MilkFlow v1 (PyTorch)
  • the first version of the tool described in Examples 1 and 2 above used convolutional neural networks (CNN) and a transfer learning approach to train a deep learning model capable of predicting phenotypes such as bovine tuberculosis and pregnancy status of individual dairy cows.
  • the model used individual standardised MIR spectral records as input (i.e., features) and a corresponding (economically important) phenotype as output (i.e., labels).
  • Pregnant cows have milk collected on a regular basis from day 21 after successful insemination. Mid-infrared spectra of the milk are obtained and the spectra are analysed to determine the pregnancy state of the cow. By analysing the milk at regular intervals up to calving, the loss of a pregnancy can be confirmed. [00443] If a loss of pregnancy is detected and confirmed, the cow is subjected to another round of insemination and the steps above are repeated.
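As referenced in the PLS item above, the following is a minimal sketch of using partial least squares as a supervised dimensionality-reduction and regression step. The component count, array shapes and random stand-in data are assumptions for illustration only, not values from the patent.

```python
# Minimal sketch only: partial least squares (PLS) used as a supervised
# dimensionality-reduction / regression step. Shapes, the number of components
# and the random data are illustrative assumptions.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1060))            # 500 spectra, 1060 wave points each
y = rng.integers(0, 2, size=500)            # binary phenotype labels (0/1)

pls = PLSRegression(n_components=10).fit(X, y)
scores = pls.transform(X)                   # the 10 retained components, shape (500, 10)
print(scores.shape)

# Because PLS uses the dependent variable when building the projection, the same
# model can also regress the label directly; thresholding gives a class prediction.
pred = (pls.predict(X[:5]).ravel() > 0.5).astype(int)
print(pred)
```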
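As referenced in the C4.5 item above, a minimal decision-tree sketch follows. Note that scikit-learn's DecisionTreeClassifier is a CART-style tree used here only as a readily available stand-in for C4.5; the entropy criterion mirrors the information-gain splitting and the min_samples_leaf value mirrors the 2-instances-per-leaf threshold mentioned above. Data shapes and labels are assumed for illustration.

```python
# Minimal sketch only: a decision tree trained on MIR-derived features to predict a
# binary phenotype. CART-style stand-in for C4.5; random illustrative data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1060))            # 500 spectral records, 1060 wave points
y = rng.integers(0, 2, size=500)            # 0 = negative, 1 = positive phenotype

tree = DecisionTreeClassifier(
    criterion="entropy",                    # split on information gain, as in C4.5
    min_samples_leaf=2,                     # stop splitting below 2 instances per leaf
).fit(X, y)

# Prediction traverses the tree and returns the majority class at the reached leaf.
print(tree.predict(X[:5]))
```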
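As referenced in the DenseNet image item above, a minimal sketch of converting a 1060-point MIR spectral record into a 53 x 20 pixel greyscale PNG (53 x 20 = 1060). The normalisation, dtype and filename convention are assumptions, not the exact procedure used by the inventors.

```python
# Minimal sketch only: reshape a 1060-element MIR spectral record to 53 x 20 and
# save it as a greyscale PNG, in the spirit of the conversion described above.
import numpy as np
from PIL import Image

def spectrum_to_image(spectrum: np.ndarray, filename: str) -> None:
    """Reshape a 1060-point spectral record to 53 x 20 and write a greyscale PNG."""
    assert spectrum.size == 1060, "expected 1060 wave points (53 x 20)"
    grid = spectrum.reshape(53, 20).astype(np.float64)
    # Normalise to the range 0..1, then scale to 0..255 grey levels.
    grid = (grid - grid.min()) / (grid.max() - grid.min() + 1e-12)
    pixels = (grid * 255).astype(np.uint8)
    Image.fromarray(pixels, mode="L").save(filename)

# Hypothetical usage with a random record standing in for a real spectrum;
# the filename pattern is an assumption.
spectrum_to_image(np.random.default_rng(1).normal(size=1060), "cow123_sample001.png")
```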
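As referenced in the data-synthesis item above, a minimal sketch of the k-nearest-neighbour interpolation (SMOTE-style oversampling) described there: pick a minority instance, find its k nearest neighbours, and generate a new point on the line segment to a random neighbour. Parameter values are assumptions; in practice a library implementation such as imbalanced-learn could be used instead.

```python
# Minimal sketch only: SMOTE-style synthesis of minority-class spectra by
# interpolating between a minority instance and one of its k nearest neighbours.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def synthesise_minority(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Create n_new synthetic minority-class samples by linear interpolation."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)   # +1: a point is its own neighbour
    synthetic = []
    for _ in range(n_new):
        a = X_min[rng.integers(len(X_min))]               # randomly chosen minority instance A
        _, idx = nn.kneighbors(a.reshape(1, -1))
        neighbour = X_min[rng.choice(idx[0][1:])]          # skip the point itself
        gap = rng.random()                                 # position on the line segment
        synthetic.append(a + gap * (neighbour - a))
    return np.asarray(synthetic)

# Hypothetical usage: oversample 100 minority-class spectra of 1060 wave points each.
X_minority = np.random.default_rng(2).normal(size=(40, 1060))
print(synthesise_minority(X_minority, n_new=100).shape)    # (100, 1060)
```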

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Chemical & Material Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Immunology (AREA)
  • Analytical Chemistry (AREA)
  • General Physics & Mathematics (AREA)
  • Food Science & Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Biochemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Urology & Nephrology (AREA)
  • Molecular Biology (AREA)
  • Hematology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The present invention relates to the analysis of mid-infrared spectra obtained from an animal's milk to determine an animal phenotype. The invention uses statistically based methods to determine features of phenotypes such as disease state, pregnancy state, methane production and feed intake of animals. The methods involve the use of machine learning models such as neural networks and decision trees in order to predict or determine an animal phenotype, allowing an animal owner to make informed decisions based on the animal's phenotype.

Description

Methods Of Determining Animal Phenotypes

Field of Invention

[0001] The present invention relates to methods of determining an animal's phenotype, such as disease state or pregnancy state, by analysis of infra-red spectra obtained from samples of the animal's milk, in particular the use of deep learning techniques to determine an animal's phenotype.

Background

[0002] Techniques based on infrared spectroscopy in the mid-infrared range (frequencies from 900 cm-1 to 5000 cm-1) are already used in the agri-food industry and particularly in the dairy industry to determine the concentration of the major constituents in biological products such as milk, which are, for example, fats, proteins and lactose, but also those of minor components including urea, organic acids, free fatty acids, etc.

[0003] To attain optimal herd efficiency, farmers aim for a 365-day calving interval, meaning the cow must be inseminated 80 days postpartum and maintain this pregnancy throughout lactation. The longer it takes to determine that the cow has not maintained the pregnancy, the greater the financial implications. Cows thought to be pregnant and identified late in lactation as being empty are often culled because a subsequent potential pregnancy does not fit the farm's calving pattern or justify the prolonged dry period of the cow. Pregnancy diagnosis is routinely carried out by a veterinarian, usually using rectal palpation, approximately 3 weeks after insemination.

[0004] Other methods of detecting whether a cow has maintained a pregnancy involve the use of ultrasound and blood tests to detect pregnancy-specific protein B (PSPB), carried out around 30 days after the breeding period. Both these methods require a relatively long period of time to have elapsed from insemination or conception to detect the pregnancy state. In addition, for blood tests for PSPB, cows must be more than 90 days post-calving when tested in order not to produce false positives (i.e. being designated as pregnant) due to residual PSPB remaining in the cow's blood stream from a previous pregnancy.

[0005] On establishing pregnancy, the cow is assumed to be in calf unless she begins displaying signs of oestrus. Pregnancy diagnosis can also be established from a milk sample by measuring the concentration of progesterone at 24 days, with accuracies of 83.3 and 85%, respectively (Muhammad et al., 2000; Sheldon and Noakes, 2002). Pregnancy has also been shown to affect milk composition. Previous studies looking at phenotype prediction, such as pregnancy status, from milk MIR spectra have mostly focused on using partial least squares (PLS) analysis to develop prediction equations. Deep convolutional neural networks (CNN) have been applied to MIR-matched pregnancy data to predict the pregnancy status of dairy cows (Brand et al., 2018). It was observed that milk MIR spectra contained features relating to pregnancy status and underlying metabolic changes in dairy cows, and that such features can be identified using a deep-learning approach. Pregnancy status was defined as a binary trait (i.e., pregnant, not-pregnant) and it was found that CNN significantly improved prediction accuracy, with trained models able to detect 83 and 73% of onsets and losses of pregnancy, respectively.

[0006] WO2021012011 describes methods of determining the likelihood of cattle being inseminated using statistical analysis of milk infra-red spectra.
However, the methods described do not provide information as to the pregnancy status of animal, such as pregnant or not pregnant or whether an animal is still pregnant after insemination. [0007] Tuberculosis (TB), particularly bovine TB (bTB) is a zoonotic disease endemic in the UK and Ireland, and is distributed worldwide. The disease affects animal health and welfare, causing substantial financial strain on the dairy cattle sector worldwide through involuntary culling, animal movement restrictions, and the cost of control and eradication programs (Allen et al., 2010). [0008] The current single intradermal comparative cervical tuberculin (SICCT) skin test has a high specificity (99.98%), indicating a high confidence in results where cows fail the test. Conversely, the sensitivity is not as high (ranges between 52–100%; average of 80%) indicating that not all cows that pass the test are truly bTB-free (i.e., some bTB-infected individuals are missed; de la Rua-Domenech et al., 2006). The current gamma interferon test, a more expensive test used alongside the SICCT test, is known to have a higher sensitivity than the SICCT test (~85–90%), but a lower specificity of 96.6% (Ryan et al., 2000; de la Rua-Domenech et al., 2006). [0009] Moreover, for farmers already involved in routine milk recording, obtaining additional MIR spectra-based herd information requires no extra labour costs or changes in herd management. For milk-recording agencies, this data can be offered as an additional service to dairy farmers for only incremental data-handling costs. [0010] As such it is an aim of the invention to provide a cost effective method of predicting or diagnosing diseases such as tuberculosis. [0011] In the case of phenotypes represented by discrete data (e.g., categorical and binary) the usual methods for developing prediction equations have proved less efficient and resulted in lower accuracy predictions (Toledo-Alvarado et al., 2018; Delhez et al., 2020). Hence, there is a requirement for alternative and novel mathematical and statistical techniques to better use milk MIR spectra. [0012] Since proving the concept of training MIR spectra to predict a categorical (binary) trait using a deep-learning approach (i.e., pregnancy status in dairy cows), the current invention extends the technique to predict other hard-to record phenotypes from MIR spectral data, specifically disease traits such as bTB. [0013] There is a need for improved methods of phenotyping animals based on milk spectral data. [0014] There is a need for improved methods of determining the pregnancy state of animals. [0015] There is a need for improved methods of determining the disease state of animals. [0016] There is a need for improved methods of detecting tuberculosis in cattle. Summary of Invention [0017] An aim of the present invention is to use phenotypic reference data combined with concurrent milk MIR spectral data from routine milk recording, to train deep artificial neural networks to develop a prediction pipeline for disease status. Such a tool would enable prediction of disease status, such as TB status from milk MIR spectral data alone and could be used as an early alert system as part of routine milk recording. 
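The paragraph above outlines the intended prediction pipeline at a high level. As a minimal, non-authoritative sketch of that idea (not the configuration disclosed in the Examples), the following Python/PyTorch snippet trains a small convolutional network on labelled MIR spectral records represented as 53 x 20 greyscale images, as described later in this specification. All layer sizes, hyperparameters and the random stand-in data are assumptions made purely for illustration.

```python
# Illustrative sketch only: a small CNN classifying 53 x 20 greyscale MIR spectral
# images into a binary phenotype (negative / positive). Architecture, hyperparameters
# and the random data are assumptions, not the patented implementation.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class SpectralCNN(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 53x20 -> 26x10
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 26x10 -> 13x5
        )
        self.classifier = nn.Linear(32 * 13 * 5, 2)       # two classes: negative / positive

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

# Random stand-ins for standardised spectral images and phenotype labels.
images = torch.randn(256, 1, 53, 20)
labels = torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

model = SpectralCNN()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):                                    # a handful of epochs for illustration
    for x, y in loader:
        optimiser.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimiser.step()

with torch.no_grad():                                     # predicted phenotype for new spectra
    print(model(images[:4]).argmax(dim=1))
```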
[0018] Provided in a first aspect of the invention is a method of predicting or detecting a phenotype in a test animal comprising: detecting one or more features in at least one infrared spectrum obtained from the animal’s milk; wherein the presence or absence of the one or more features in the infrared spectra are indicative of a positive or negative phenotype; determining whether the animal is positive or negative for the phenotype based on the presence or absence of the one or more features; and wherein the phenotype is a disease state. [0019] In some embodiments, the disease state is positive or negative for tuberculosis and/or Paratuberculosis (Johne’s disease). In some embodiments, the disease state is positive or negative for tuberculosis. In some embodiments, the disease state is positive or negative for Paratuberculosis (Johne’s disease). [0020] In some embodiments, the at least one infrared spectrum is a mid-infrared spectra. [0021] In some embodiments, detecting comprises comparing the at least one infrared spectrum to one or more reference infrared spectra. [0022] In some embodiments, the one or more reference infrared spectra comprises: at least one first reference infrared spectra obtained from a reference animal’s milk, wherein the reference animal is the same species as the test animal and the reference animal is labelled as positive for the phenotype based on phenotype data; and/or at least one second reference infrared spectra obtained from a reference animal’s milk, wherein the reference animal is the same species as the test animal and the reference animal is labelled as negative for the phenotype based on phenotype data. [0023] In some embodiments, the one or more reference infrared spectra comprises: at least one first reference infrared spectra obtained from a reference animal’s milk, wherein the reference animal is the same species as the test animal and the reference animal is labelled as positive for the phenotype based on phenotype data. [0024] In some embodiments, the one or more reference infrared spectra comprises: at least one second reference infrared spectra obtained from a reference animal’s milk, wherein the reference animal is the same species as the test animal and the reference animal is labelled as negative for the phenotype based on phenotype data. [0025] In some embodiments, the phenotype data comprises at least disease data. [0026] In some embodiments, comparing comprises statistical comparison of the at least one infrared spectrum to the one or more reference infrared spectra. [0027] In some embodiments, the one or more features are detected and the animal being positive or negative for the phenotype is determined by a trained machine learning model. [0028] In some embodiments, the one or more features are determined by partial least squares regression, including partial least squares discriminant analysis (PLS-DA), C4.5 decision trees, naive Bayes, Bayesian network, logistic regression, support vector machine, random forest, rotation forest, a decision tree and/or a learned convolutional neural network. [0029] In some embodiments, the method further comprises modifying the at least one infrared spectrum to create a modified infrared spectra prior to determining. [0030] In some embodiments, modifying comprises transforming, standardising, granulating and/or converting. [0031] In some embodiments, the at least one first reference infrared spectra and/or the at least one second reference infrared spectra are modified. 
For example by transforming, standardising, granulating and/or converting the at least one first reference infrared spectra and/or the at least one second reference infrared spectra [0032] In some embodiments, the at least one infrared spectra comprises two combined infrared spectra. Optionally each infrared spectra are obtained at different time points. [0033] In some embodiments, the at least one first reference infrared spectra comprises two combined infrared spectra. Optionally each infrared spectra are obtained at different time points. [0034] In some embodiments the at least one second reference infrared spectra comprises two combined infrared spectra. Optionally each infrared spectra are obtained at different time points. [0035] In a second aspect of the invention there is provided a computer-implemented machine learning method for prediction or detection of an animal phenotype comprising: receiving a first training set comprising labelled infrared spectra, the labelled infrared spectra comprising a plurality of infrared spectra obtained from milk of a plurality of animals and corresponding phenotype data for each infrared spectra, wherein each infrared spectra is labelled as negative or positive for the phenotype based on the phenotype data; and training a machine learning model using the labelled infrared spectra in order to detect whether the phenotype of a test infrared spectra is positive or negative based on one or more features of the test infrared spectra. [0036] In some embodiments, the machine learning model is a neural network or a decision tree. In some embodiments, the machine learning model is a neural network. In some embodiments, the machine learning model is a decision tree. [0037] In some embodiments, the phenotype is a disease state, [0038] In some embodiments, the disease state is positive or negative for tuberculosis and/or Paratuberculosis (Johne’s disease). [0039] In some embodiments, the disease state is positive or negative for tuberculosis. [0040] In some embodiments, the disease state is positive or negative for Paratuberculosis (Johne’s disease). [0041] In some embodiments, wherein the phenotype data comprises at least disease data. [0042] In some embodiments, the infrared spectra labelled as positive comprises animals having at least one of a positive skin-test result, a positive observation of lesions and/or a positive culture status. In some embodiments, the infrared spectra labelled as positive comprises animals having a positive skin-test result. In some embodiments, the infrared spectra labelled as positive comprises animals having a positive observation of lesions. In some embodiments, the infrared spectra labelled as positive comprises animals having a positive culture status. [0043] In some embodiments, the infrared spectra labelled as negative comprises animals having a negative skin-test result, a negative observation of lesions and a negative culture status. [0044] In some embodiments, the phenotype is a pregnancy status, [0045] In some embodiments, the phenotype is change in pregnancy status. [0046] In some embodiments, the phenotype data comprises parturition data; and/or insemination data. In some embodiments, the phenotype data comprises parturition data. In some embodiments, the phenotype data comprises insemination data. In some embodiments, the phenotype data comprises parturition data and insemination data. 
[0047] In some embodiments, wherein the phenotype is pregnancy status, the infrared spectra labelled as negative comprises animals between parturition and first insemination. [0048] In some embodiments, wherein the phenotype is pregnancy status, the infrared spectra labelled as positive comprises animals between the last insemination and subsequent calving with a gestation length between about 228 and about 296 days. [0049] In some embodiments, the phenotype is methane production [0050] In some embodiments, wherein the phenotype is methane production, the phenotype data comprises methane emissions data, feed composition, and/or feed intake for each animal. [0051] In some embodiments, the phenotype is feed intake. [0052] In some embodiments, wherein the phenotype is feed intake, the phenotype data comprises net energy intake, dry matter intake, concentration of milk components (such as fat, protein and/or lactose), milk yield and/or body weight [0053] In some embodiments, the method further comprises modifying each infrared spectra to create modified infrared spectra prior to creating the first training set. [0054] In some embodiments, modifying comprises transforming, standardising, granulating and/or converting. [0055] In some embodiments, the method further comprises synthesising labelled artificial milk spectra data. [0056] In some embodiments, the labelled artificial milk spectra data is modified and included in the first training set. In some embodiments, modifying comprises transforming, standardising, granulating and/or converting. [0057] In some embodiments, synthesising comprises randomly selecting a minority instance, A, finding its k-nearest neighbours, and drawing a line segment in the feature space between A and a random neighbour and synthetically generating instances on the line. [0058] In some embodiments, wherein machine learning model is a neural network, the neural network is trained for a number of epochs determined by an early stopper. [0059] In some embodiments, wherein machine learning model is a neural network, the neural network is a convolutional neural network. [0060] In a third aspect of the invention there is provided, a computer-implemented machine learning method for prediction or detection of pregnancy status of an animal comprising: receiving a first training set comprising labelled infrared spectra, the labelled infrared spectra comprising a plurality of infrared spectra obtained from milk of a plurality of animals and corresponding pregnancy data for each infrared spectra, wherein each infrared spectra is labelled as negative or positive for pregnancy based on the pregnancy data; training a machine learning model using the set of labelled infrared spectra in order to detect whether the phenotype of a test infrared spectra is positive or negative based on one or more features of the test infrared spectra; wherein the infrared spectra labelled as negative (not pregnant) comprises animals between parturition and first insemination; and the infrared spectra labelled as positive comprises animals between the last insemination and subsequent calving with a gestation length between about 240 and about 284 days. [0061] In some embodiments, the machine learning model is a neural network or a decision tree. In some embodiments, the machine learning model is a neural network. In some embodiments, the machine learning model is a decision tree. 
[0062] In some embodiments, wherein the machine learning model is a neural network, the machine learning model is a convolutional neural network. [0063] In some embodiments, the method further comprises modifying each infrared spectra to create modified infrared spectra prior to creating the first training set. [0064] In some embodiments, modifying comprises transforming, standardising, granulating and/or converting. [0065] In some embodiments, each infrared spectra comprises two combined infrared spectra obtained at different time points. Optionally pregnancy status comprises change in pregnancy status between the two time points. [0066] In a fourth aspect of the invention there is provided, a method of predicting or determining a phenotype of an animal using a trained machine learning model, the method comprising; receiving an infrared spectra obtained from the animal’s milk; mapping the infrared spectra to a positive or negative phenotype; and providing an output comprising the animal’s phenotype. [0067] In some embodiments, the trained machine learning model is a trained neural network or a trained decision tree. [0068] In some embodiments, the trained machine learning model is trained according to the second or third aspects of the invention described herein. [0069] In some embodiments of the first aspect, the at least one infrared spectra according are mid-infrared spectra. [0070] In some embodiments of the first aspect, the one or more reference infrared spectra are mid-infrared spectra. [0071] In some embodiments of the second and third aspects, the infrared spectra are mid- infrared spectra. [0072] In some embodiments of any of the aspects described above, the animal is a milk producing mammal, optionally wherein the animal is a bovine. [0073] In some embodiments of any of the aspects described above, the one or more features comprise waveforms and/or wavelength values. [0074] In a fifth aspect of the invention there is provided, computer-readable storage medium comprising instructions which, when executed by a computer, cause the computer to carry out the steps of any of the methods described herein. [0075] In another aspect of the invention there is provided, a method of predicting or detecting a phenotype in a test animal comprising: detecting one or more features in at least one infrared spectrum obtained from the animal’s milk; wherein the presence or absence of the one or more features in the infrared spectra are indicative of a positive or negative phenotype, wherein the phenotype is a disease state; determining whether the animal is positive or negative for the phenotype based on the presence or absence of the one or more features; and responsive to determining whether the animal is positive or negative for the phenotype, providing a treatment to the animal. [0076] In another aspect of the invention there is provided, method of predicting or determining a phenotype of an animal using a trained machine learning model, the method comprising; receiving an infrared spectra obtained from the animal’s milk; mapping the infrared spectra to a positive or negative phenotype; providing an output comprising the animal’s phenotype; and responsive to output comprising the animal’s phenotype, providing a treatment to the animal. [0077] In some embodiments, wherein the phenotype is a disease state the treatment comprises at least one of administering a drug to the animal, euthanizing the animal, and/or isolating the animal. 
[0078] In some embodiments, wherein the phenotype is a disease state, the method may further comprise isolating the animal and identifying and isolating all other animals that have been in contact with the animal.

[0079] In some embodiments, wherein the phenotype is pregnancy status, the treatment may be selected from at least one of administering a drug to the animal, inseminating the animal, euthanizing the animal, separating the animal from a population of pregnant animals to a population of non-pregnant animals, and/or isolating the animal.

[0080] In some embodiments, wherein the phenotype is methane production, the treatment may be selected from at least one of administering a drug to the animal, altering the animal's feed composition, altering the animal's feed amount, euthanizing the animal and/or isolating the animal.

[0081] In some embodiments, wherein the phenotype is feed intake, the treatment may be selected from at least one of administering a drug to the animal, altering the animal's feed composition, altering the animal's feed amount, euthanizing the animal and/or isolating the animal.

[0082] Throughout the description and claims of this specification, the words "comprise" and "contain" and variations of them mean "including but not limited to", and they are not intended to (and do not) exclude other moieties, additives, components, integers or steps.

[0083] Throughout the description and claims of this specification, the singular encompasses the plural unless the context otherwise requires. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.

[0084] Features, integers, characteristics, compounds, chemical moieties or groups described in conjunction with a particular aspect, embodiment or example of the invention are to be understood to be applicable to any other aspect, embodiment or example described herein unless incompatible therewith.

[0085] Various aspects of the invention are described in further detail below.

Brief Description of Figures

[0086] Figure 1 shows a visual interpretation of how data are transformed through different components in a genetic algorithm when applied for feature selection (GA = genetic algorithm; Ind = individual; Wave point = location within the MIR spectrum of 1060 wave points);

[0087] Figure 2 shows a plot of validation accuracy and loss during training of Model 1 (Accuracy = (TP + TN) / (TP + TN + FP + FN); Loss = −Σ_i^C t_i log(f(s)_i), where TP, TN, FP, and FN represent total numbers of true positive, true negative, false positive, and false negative predictions, respectively, and f(s)_i = e^(s_i) / Σ_j e^(s_j) is the softmax of the network outputs over the C classes);

[0088] Figure 3 shows a plot of training metrics for Model 2 for accuracy, training loss and validation loss. The dashed vertical line distinguishes the second phase of training from the first (Accuracy = (TP + TN) / (TP + TN + FP + FN); Loss = −Σ_i^C t_i log(f(s)_i), where TP, TN, FP, and FN represent total numbers of true positive, true negative, false positive, and false negative predictions, respectively, and f(s)_i = e^(s_i) / Σ_j e^(s_j));

[0089] Figure 4 shows an example of a spectral record represented as a greyscale image. Mid infrared spectral images were created by reshaping spectral records from an array of size 1060×1 to an array of size 53×20.
Each wavelength value in the reshaped array was normalized (in the range 0 to 1), multiplied by 255 to represent the wavelength values as greyscale pixels, and saved as a PNG image. Image filenames were generated using label, animal, and sample information. These spectral images were then used as features in training the deep neural networks; and

[0090] Figure 5 shows a schema for MIR alerts for bTB: a flow diagram showing current bTB cattle control measures (A) and hypothesised control measures resulting from the MIR-based tool (B). At present, once bTB is disclosed the herd is put under restriction and subjected to skin tests every 60 days until two sequential test periods result in no reactors. The total length of a breakdown is therefore 60×(n-1) days and, due to the nature of bTB (infectious, chronic and slowly progressive), one breakdown has the potential to last for months or even years. Panel (B) shows the potential of the MIR prediction pipeline to alert the farmer to cows that will fail the skin test, allowing these "alerted" cows to be removed from the herd earlier and thus reducing the spread of bTB in the herd. This offers the potential to significantly reduce the number of days for a restricted herd to regain OTF status, e.g., from 60×(n-1) days to 60×(m-1) days, where m<n.

Detailed Description

[0091] The methods described herein involve the analysis of infra-red spectral data of milk obtained from an animal. The animal may be any animal which produces milk, and as such milk can be collected for analysis.

[0092] In particular, the animal is a milk-producing mammal. For example, the animal may be a bovine, a porcine, a caprine, a camelid, an ovine, a donkey or a horse. In certain examples the animal is a cow. In other examples, the animal is a pig. In other examples, the animal is a goat.

[0093] Methods of obtaining milk (or milking) are well known. Such methods include hand milking methods such as stripping, knuckling, fisting and/or full hand milking. Milking methods also include machine-based milking methods.

[0094] The methods described herein relate to detecting phenotypes in animals. Determination of phenotypes may also be referred to as "phenotyping". The term "phenotype" refers to one or a number of total biological characteristics that define an animal under a particular set of environmental conditions and factors, regardless of the actual genotype.

[0095] Phenotypes that may be determined or predicted include pregnancy status, disease state, methane production, feed intake and/or body energy state.

[0096] Pregnancy status refers to detection of whether an animal is pregnant or not. As such, the methods described herein may allow for detection of whether an animal has been pregnant, is pregnant and/or was pregnant but subsequently is no longer pregnant. As such, the methods described herein may be used to detect a change in pregnancy status. For example, a change from not pregnant to pregnant may be detected (i.e. whether successful insemination has occurred). A pregnant animal may be referred to as a "closed" animal. For example, a change from pregnant to not pregnant may be detected (i.e. detection of loss of pregnancy). An animal which is not pregnant may be referred to as an "open" animal.

[0097] In certain examples, the phenotype may be pregnancy loss. For example, predicting or determining whether an animal, such as a cow, has been pregnant and is no longer pregnant (i.e. has lost a pregnancy).
[0098] As an example, the methods described herein may be used to detect whether an animal, for example a cow, that has been inseminated is pregnant or whether the cow has lost the pregnancy, that is to say the animal has not maintained its pregnancy. [0099] Cows have a gestation period of around 283 days. Pregnancy loss may be defined by the stage at which the pregnancy loss occurs such as early embryonic death, abortion or stillbirth. The conceptus is an embryo during the first 42 days; after that, all major organs and body systems are formed and the embryo becomes a foetus. If pregnancy loss occurs before 42 days, it’s termed early embryonic death. If loss occurs after 42 days, it is considered an abortion. Abortion involves expulsion of a dead foetus, or a living one incapable of maintaining life outside the uterus. Stillbirth is a full-term calf that is dead at birth. The methods described herein may be used to detect early embryonic death and/or abortion. The methods described herein may be used to detect early embryonic death. The methods described herein may be used to detect abortion. [00100] The term conceptus refers to an embryo and its adnexa or associated membranes. The conceptus includes all structures that develop from the zygote, both embryonic and extraembryonic. It includes the embryo as well as the embryonic part of the placenta and its associated membranes: amnion, chorion, and yolk sac. [00101] Without being bound by theory, the methods described herein may be capable of detecting changes in an animal’s milk when pregnant or not pregnant. [00102] The methods described herein may be able to detect or predict pregnancy status from at least 16 days after insemination. The methods described herein may be able to detect or predict loss of pregnancy from 21 days after successful insemination to subsequent calving. [00103] Another phenotype that may be detected by the methods described herein is disease state. Disease state refers to determination or prediction as to whether an animal has a certain disease or not. As such, the methods described herein may be used for detecting or predicting if an animal has disease or detecting or predicting if an animal does not have a disease. [00104] Diseases that may be detected by the methods described herein include tuberculosis; paratuberculosis (also known as Johne's disease); infectious pustular vulvovaginitis; viral diarrhoea; bluetongue; brucellosis; foot and mouth disease; schmallenberg virus; vaginitis; vulvitis; granular vulvovaginitis; necrotic vaginitis; infectious rhinotracheitis; and/or transmissible spongiform encephalopathies (TSE). [00105] Tuberculosis, for example bovine tuberculosis (bTB) is an infectious disease of cattle caused by the bacterium Mycobacterium bovis (M. bovis) which can also infect and cause disease in other mammals including humans, deer, goats, pigs, cats, dogs and badgers. In cattle, it is mainly a respiratory disease but clinical signs are rare. TB in humans can be caused by both Mycobacterium bovis and the human form, Mycobacterium tuberculosis. Evidence of bovine TB is most commonly found in the lymph glands of the throat and lungs of affected animals. This means that the bacteria, which cause the disease, are mainly passed out of the infected animal’s body in its breath or in discharges from the nose or mouth. Infection is mainly through inhalation or ingestion of the bacteria. Contaminated food and water can also be a source of infection. 
Cattle can spread this disease to other cattle: directly via respiratory route; directly via infected milk; directly before birth through the placenta; and/or indirectly via environmental contamination. The method of diagnosis in cattle is commonly the Single Intradermal Comparative Cervical Tuberculin (SICCT) test also referred to as the “TB skin test”. On Day 1 of the test, two sites are clipped on the neck of the animal. The skin thicknesses at both sites are measured and recorded. Two types of tuberculin, one made from killed M. bovis and the other from killed Mycobacterium avium, are injected under the outer layer of the skin of the neck (i.e. into the dermis) at the ‘bovine site’ and the ‘avian site’ respectively. On Day 4 of the test, the skin reactions to the two types of tuberculin are measured and compared. When the bovine site reaction exceeds the avian site reaction by more than 4 mm, the animal is declared a reactor under standard interpretation. When the bovine site reaction measures 1-4 mm more than the avian site reaction, the animal is declared an inconclusive reactor under standard interpretation. Interferon Gamma (IFNG) testing may also be used. IFNG testing involves a blood test and may be used in conjunction with the TB Skin Test. Animals which have been exposed to M. bovis can respond to IFNG before they will respond to the TB skin test. The IFNG will also sometimes identify TB-infected animals which do not respond to the skin test. The IFNG has a higher sensitivity (it will miss less TB-infected animals) but lower specificity (it may wrongly classify more non-infected animals as diseased) compared to the skin test. bTB may also be detected by examination of tissue from animals sent to slaughter. The tissue is examined for lesions. When lesions are seen, the animal is said to have had a “Lesion at Routine Slaughter” or LRS. Detection of bTB may also be done or confirmed by histological analysis of animal tissue and/or by bacteriological methods such as culturing. Culturing involves the growth of any bacteria from lesions detected in the tissue and identification of the bacteria grown. If mycobacterium bovis is detected in the cultures grown the animal is considered positive for bTB. Diagnosis may also be done by the use of antibody testing. Antibody testing involves the use of antibodies against Mycobacterium Bovis to detect Mycobacterium Bovis in the blood of an animal. [00106] Johne‘s disease is an infectious wasting condition of cattle and other ruminants caused by Mycobacterium avium subspecies paratuberculosis (commonly known as Map). Johne‘s disease progressively damages the intestines of affected animals, and in cattle this results in profuse and persistent diarrhoea, severe weight loss, loss of condition and infertility. Affected animals eventually and inevitably die. In dairy herds, the presence of Johne’s disease can significantly reduce milk yields. Diseased animals in general pass large numbers of Map in their faeces (dung). Calves may be infected in the womb, through drinking contaminated colostrum, through ingesting dung that may be present on unclean teats, through contaminated feed, and through contaminated environment or water supplies. Other animals, particularly deer, sheep, goats and South American camelids such as llamas and Alpaca, can carry Map and pass it in their dung. To confirm whether an animal has Johne’s disease blood tests may be carried out. 
Blood tests may not detect all infected animals, but at this stage are more likely to identify infection than tests for the organism itself. Signs of the disease are rarely seen before two to three years of age. Generally, there is a period of reduced milk output or fertility before the animals begin to show signs of advanced disease. Signs of advanced disease include persistent and profuse diarrhoea and significant weight loss and are seen most commonly in animals at three to five years of age. After the disease has developed the diagnosis can usually be confirmed microscopically from a dung sample. [00107] Tuberculosis may be treated by the administration of drugs to the animal, such as antibiotics. In some examples, animals with tuberculosis may be euthanized. In some examples, animals with tuberculosis may be isolated. Isolation may help prevent spread of tuberculosis to other animals that may be in close proximity to an animal with tuberculosis. In some cases, all animals that have come into contact with an animal having tuberculosis may be isolated and restricted from being moved. Such a process may be referred to as a herd breakdown. By preventing movement of animals that have come into contact with an animal with tuberculosis spread of tuberculosis to other animals may be prevented. [00108] Infectious pustular vulvovaginitis of cows is caused by bovine herpesvirus 1 and is transmitted via natural breeding, nasogenital contact, or by insect vectors such as flies. Affected cows show signs of vaginal discomfort (e.g., raised tail, frequent urination) and have numerous, round, white, raised lesions of the vestibular mucosa. Within a short time, these lesions progress to pustules and erosions or ulcers. Mucopurulent vaginal discharge may be prominent, even in pregnant animals in which pregnancy is uninterrupted. Histologic lesions consist of necrosis of vestibular and vaginal epithelium, with intranuclear inclusion bodies typical of herpesvirus infection. When the virus is transmitted in the semen, infected bulls may have similar lesions of the penis and prepuce. Intrauterine inoculation of the virus produces necrotizing endometritis and cervicitis. [00109] Animals with infectious pustular vulvovaginitis may be euthanized. In some cases, animals may be administered drugs, such as a vaccine after diagnosis that may reduce severity of the symptoms. [00110] Foot-and-mouth disease is a viral disease of cloven hoofed animals including pigs, cattle, water buffalo, sheep, goats and deer. It is endemic throughout many parts of the world but much of Europe, North America, Australia and New Zealand are free of disease. There are seven major virus types (serotypes) with a large number of subtypes (usually named with letters). Cattle typically show more severe disease than other animals. The incubation period is 2-10 days. Initially, one or two cattle present as diseased with fever (>40.0°C), depression, loss of appetite, marked drop in milk yield and salivation. When housed or closely confined, other cattle in the group will show clinical signs over the next 24- 48 hours. The commonly used detection method is a serological test, ELISA is a diagnosis method which is commonly used for detecting FMDV infection at present, and compared with a knot supplementing test, a neutralization test, an indirect hemagglutination inhibition test and an immunodiffusion precipitation test. [00111] There is no specific treatment for FMD. 
Administration of antibiotics may be used to control secondary bacterial infection of ulcers. In most cases animals are euthanized. Animals with FMD may also be isolated and/or prevented from being moved from their location in order to prevent further spread of the disease. [00112] Viral diarrhoea, such as bovine viral diarrhoea (BVD) or mucosal disease, is an infectious disease caused by viruses. Cattle of various ages are susceptible to infection, with susceptibility highest among young cattle. Exposure to the virus can be diagnosed by serology. Active infection can be diagnosed by PCR-based tests. Many different types of samples can be used for BVDV testing including blood, hair plucks and skin biopsies. [00113] Animals with BVD are normally treated by providing supportive therapies such as increased food and water. In some cases, such as when the animal is an infant, the animal may be euthanized. [00114] Brucellosis is a highly contagious disease of cattle caused by the bacterium Brucella, which spreads as a result of animals coming into contact with infected female cattle, aborted foetuses or discharged placental tissues and fluids. The disease can lead to an abortion storm in infected females and, if it becomes established, can lead to decreased milk yields, infertility, weak calves and serious financial loss. The disease is particularly dangerous to humans who come into contact with infected animals or material. Many serological techniques are available for the diagnosis of brucellosis, and among them, the most widely used are the serum agglutination test and the complement fixation test. Other serum methods include, for example, the Rose Bengal plate agglutination test and the antiglobulin test. Generally, biological samples are taken from the uterus, vaginal secretions, blood and visceral organs of aborted dams, and from aborted foetuses' stomach contents, liver, spleen, lymph nodes and blood, for microbiological examination. [00115] Brucellosis may be treated by drug therapies such as antibiotics. In most cases animals with brucellosis are euthanized. Animals with brucellosis may also be isolated and/or prevented from being moved from their location in order to prevent further spread of the disease. [00116] Bluetongue is an insect-borne viral disease to which all species of ruminants are susceptible, in particular sheep. There are at least 24 different varieties (serotypes) of the bluetongue virus (BTV). The disease is caused by a virus which is transmitted by certain species of biting midges, such as the Culicoides species. Although sheep are most severely affected, cattle and goats which appear healthy can carry high levels of the virus and provide a source of further infection. In sheep the clinical signs include: fever; swelling of the head and neck; lameness; inflammation and ulceration of the mucous membrane of the mouth, nose and eyes; drooling; haemorrhages in the skin and other tissues; respiratory problems, such as froth in the lungs and an inability to swallow; high mortality rate; discoloration and swelling of the tongue. Bluetongue usually causes no apparent illness in cattle or goats (cattle may show no signs of illness at all); however, clinical signs have included: nasal discharge; swelling and ulceration of the mouth; swollen teats. In all cases, animals can be infected with bluetongue before birth if the mother is infected while pregnant.
Signs of infection include: newborn animals born small, weak, deformed or blind; death of newborns within a few days of birth; abortions/stillbirths. [00117] Bluetongue may be treated by the use of drug therapies such as antibiotics in order to reduce the severity of secondary infections. Animals may also be treated with vaccines after diagnosis in order to reduce symptoms. Animals with Bluetongue may be euthanized. [00118] Schmallenberg virus can affect all ruminant species and has been particularly evident in cattle and sheep populations. The virus itself gives rise to only mild, transient symptoms in cattle, including fever, a drop in milk yield and sometimes diarrhoea. In sheep, few if any signs are exhibited. If ruminant animals should become infected when pregnant, it can lead to abortion or malformations in the foetus. [00119] There is no treatment available for Schmallenberg virus. As such, animals with Schmallenberg virus are euthanized. [00120] Infectious rhinotracheitis, such as infectious bovine rhinotracheitis (IBR), is a highly contagious and infectious viral disease that affects cattle. Infection occurs by inhalation and requires contact between animals. The disease is characterised by inflammation of the upper respiratory tract. The virus that causes IBR, bovine herpesvirus 1 (BHV-1), also causes infectious pustular vulvovaginitis in the female and infectious balanoposthitis in the male, and can cause abortions and foetal deformities. Infected cattle develop a latent infection once recovered from the initial infection and, despite appearing clinically normal, may suffer recrudescence of disease when under stress. Diagnosis of IBR infection is via serology (blood samples) for latent infections or direct detection of the virus (PCR or fluorescent antibody tests on ocular or nasal secretions) for active infections. [00121] There is no specific treatment for IBR; secondary bacterial infections can be managed with antibiotics and animals with a high fever treated with non-steroidal anti-inflammatories. Preventative vaccination of the remaining herd members may aid in minimising disease spread. Animals with IBR may be euthanized. [00122] Transmissible spongiform encephalopathies (TSEs) or prion diseases are a family of rare progressive neurodegenerative brain disorders that affect both humans and animals. They have long incubation periods, progress rapidly once symptoms develop and are always fatal. Initial clinical signs are subtle and behavioral in nature. The spectrum of signs broadens and progresses over weeks to months, with most animals reaching a terminal state by 3 months after clinical onset. Commonly observed clinical signs include hyperesthesia, nervousness, difficulty negotiating obstacles, reluctance to be milked, aggression toward either farm personnel or other animals, low head carriage, hypermetria, ataxia, and tremors. Weight loss and decreased milk production are common. However, in a large proportion of affected animals, clinical signs may be nonspecific, and involvement of the nervous system is not obvious in every case. Confirmatory diagnostic methods include PrPd immunohistochemistry and Western immunoblot in brain tissue. [00123] There is no treatment available for TSEs. As such, animals with TSEs are euthanized. [00124] In particular, the disease may be tuberculosis and/or paratuberculosis. In certain examples the disease is tuberculosis. In certain examples the disease is paratuberculosis. In certain examples the disease is bovine viral diarrhoea (BVD).
In certain examples the disease is vaginitis. In certain examples the disease is vulvitis. In certain examples the disease is infectious pustular vulvovaginitis. [00125] Another phenotype that may be determined is methane production. Methane (CH4) production refers to the methane emissions of individual animals. The disadvantages of existing methods for CH4 measurement are the financial and human costs and/or the practical difficulties of these methods when they are used directly in the field. [00126] Methane emissions may be determined using methods such as respiration chamber methods, wherein the concentration of methane is measured at the air inlet and outlet vents of the chamber. The difference between outlet and inlet concentrations is multiplied by airflow to give the methane emission rate. In most installations, a single gas analyser is used to measure both inlet and outlet concentrations, often for two or more chambers. This involves switching the analyser between sampling points at set intervals, so concentrations are actually measured for only a fraction of the day. [00127] Another method that may be used to measure methane emissions is the SF6 tracer technique. The SF6 tracer gas technique was developed in an attempt to measure methane emissions by animals without confinement in respiration chambers. Air is sampled near the animal’s nostrils through a tube attached to a halter and connected to an evacuated canister worn around the animal’s neck or on its back. A capillary tube or orifice plate is used to restrict airflow through the tube so that the canister is between 50 and 70% full after approximately 24 hours. A permeation tube containing SF6 is placed into the rumen of each animal. The pre-determined release rate of SF6 is multiplied by the ratio of methane to SF6 concentrations in the canister to calculate methane emission rate. [00128] Another method that may be used is breath sampling during milking and feeding. In such methods, air is sampled near the animal’s nostrils through a tube fixed in a feed bin and connected directly to a gas analyser. The feed bin might be in an automatic milking station or in a concentrate feeding station. Different gas analysers may be used, for example nondispersive infrared (NDIR), Fourier-transform infrared (FTIR) or photoacoustic infrared (PAIR), and different sampling intervals may be used, for example 1, 5, 20 or 90–120 seconds. Methane concentration during a sampling visit of typically between 3 and 10 minutes may be specified as the overall mean, or the mean of eructation peaks. CO2 may be used as a tracer gas to calculate daily methane output according to the ratio of methane to CO2. [00129] Understanding the level of methane production of an animal may help determine, and therefore reduce, the environmental impact of that animal. Methane production levels of an animal may be altered, for example lowered, by altering factors that affect methane production, such as the level of feed intake, the type of feed and/or the quality of feed. [00130] Another phenotype that may be detected is feed intake. Predictions of feed intake may be a cost-effective strategy for generating data for management purposes as well as for inclusion in a breeding program. [00131] Feed intake may include prediction or determination of dry matter intake by an animal over a time period and/or net energy intake over a time period. [00132] Dry matter intake (DMI) refers to the amount of feed an animal consumes per day on a moisture-free basis.
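By way of non-limiting illustration only, the SF6 tracer calculation described in paragraph [00127] above may be expressed as a short Python sketch. The function name, the unit choices, the molecular-weight correction (applied so that a mass-based release rate yields a mass-based emission rate) and the numeric values are illustrative assumptions and not measured data.

def methane_emission_rate(sf6_release_rate, ch4_ppm, sf6_ppt):
    # Convert the canister concentrations to the same dimensionless volumetric scale.
    ch4_fraction = ch4_ppm * 1e-6
    sf6_fraction = sf6_ppt * 1e-12
    # Release rate of SF6 multiplied by the CH4:SF6 concentration ratio, with a
    # molecular-weight correction (CH4 = 16.04, SF6 = 146.06 g/mol) so that the
    # result is a mass flow in the same units per day as the release rate.
    return sf6_release_rate * (ch4_fraction / sf6_fraction) * (16.04 / 146.06)

# Illustrative values only: a 5 mg/day permeation tube, 250 ppm CH4 and 400 ppt SF6.
print(methane_emission_rate(5.0, 250.0, 400.0))  # emission rate in mg CH4 per day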
Dry matter intake may be determined using any suitable method, for example by predicting dry matter demand. Dry matter demand is the expected dry matter intake for a class of animal. Dry matter demand is generally based on class of animal, stage of life and production (for example, lactating, reproductive status, or growth stage) and body weight. Dry matter demand may be determined by using predicted dry matter intake (DMI) values from reference sources such as nutrient requirement tables and other previously published data relating to the animal being studied. Dry matter demand may be determined from a percentage body weight value. For example, depending on the quality of diet, breed and size of the animal, and energy expenditure of the animal (amount of milk produced and distance walked), a mature beef cow may consume 1-3% of her body weight, while a mature dairy cow may consume 2.5-4.5% of her body weight. As such, dry matter demand may be calculated using the following formula: Dry Matter Demand = Body Weight x (DMI % Body Weight Value / 100) [00133] The DMI from feed sources other than forage grazed from pasture (for example, hay and grain) may be determined by using the percent dry matter of the feed source to convert the mass of feed consumed on an as-fed basis to a dry matter basis. [00134] Feed composition information, including dry matter content, is available from the following example references/resources: Composition of Feeds, Nutrient Requirements of Domestic Animals Series, NRC (links above); United States-Canadian Tables of Feed Composition: Nutritional Data for the United States and Canadian Feeds, Third Revision (http://www.nap.edu/catalog.php?record_id=1713); Beef Magazine’s 2009 Feed Composition Tables – http://beefmagazine.com/nutrition/feedcomposition-tables/0301-feed-comp; Feed Composition Library, Dairy One (http://www.dairyone.com/Forage/FeedComp/); Feed Library, The Samuel Roberts Noble Foundation (http://www.noble.org/Ag/FeedLib/Index.aspx). [00135] General assumptions for the percent dry matter may also be used: grain = 89% dry matter; dry hay = 90% dry matter; grain silage = 25-35% dry matter; haylage/baleage = 35-60% dry matter. [00136] The mass of feed fed to an animal is then multiplied by the percentage of dry matter in the feed. [00137] Dry matter intake from pasture may be calculated by subtracting dry matter intake from other feed sources from the dry matter demand, or may be estimated directly from pasture by field measurements. [00138] Daily energy intake, or net energy intake, may be calculated based on the NorFor evaluation system as described in “Wallén, S.E., E. Prestløkken, T.H.E. Meuwissen, S. McParland, and D.P. Berry. 2018. Milk mid-infrared spectral data as a tool to predict feed intake in lactating Norwegian Red dairy cows. J. Dairy Sci. 1–12.”. Net energy intake may be calculated from the DMI of silage and concentrate separately. Net energy intake for silage (MJ/kg of DM) may be calculated based on the chemical composition of the feed using standard feed values in NorFor. [00139] Understanding the feed intake of an animal may help reduce the costs associated with rearing the animal. For example, an animal with a higher feed intake than has been determined or predicted to be necessary may incur excessive feed costs. As such, the amount of feed given to an animal may be reduced or the type of feed altered.
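By way of non-limiting illustration only, the dry matter demand formula and the as-fed to dry matter conversion described in paragraphs [00132] to [00137] above may be sketched in Python as follows. The function names and the example body weight, intake percentage and feed masses are illustrative assumptions.

def dry_matter_demand_kg(body_weight_kg, dmi_percent_body_weight):
    # Dry Matter Demand = Body Weight x (DMI % Body Weight Value / 100)
    return body_weight_kg * (dmi_percent_body_weight / 100.0)

def dry_matter_from_feed_kg(as_fed_mass_kg, percent_dry_matter):
    # Convert an as-fed mass of a feed source to a dry matter basis.
    return as_fed_mass_kg * (percent_dry_matter / 100.0)

# Illustrative example: a 650 kg dairy cow assumed to consume 3.5% of her body
# weight as dry matter, fed 8 kg of hay (90% DM) and 6 kg of grain (89% DM).
demand = dry_matter_demand_kg(650, 3.5)            # about 22.75 kg DM per day
from_hay = dry_matter_from_feed_kg(8, 90)          # 7.2 kg DM
from_grain = dry_matter_from_feed_kg(6, 89)        # 5.34 kg DM
pasture_dmi = demand - (from_hay + from_grain)     # remainder attributed to pasture
print(demand, from_hay, from_grain, pasture_dmi)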
[00140] The methods described herein include detecting a phenotype based on the presence or absence of features in infrared spectra obtained from an animal’s milk. The features that may be indicative of a positive or negative (better or worse) phenotype may be determined by any statistical, mathematical or machine learning-based methods described herein. [00141] It should be understood that the definition of phenotype states as positive or negative (i.e. diseased or not diseased respectively) as used herein is provided by way of example only. It will be understood that a phenotypic state, such as diseased or pregnant, may be labelled as negative or positive as long as the other phenotypic state is defined using the opposing definition. The definitions of phenotypic states may be applied vice versa, i.e. the opposite of the definitions provided herein, as the choice of definition does not affect the final outcome of the methods described herein. As such, for pregnancy state, pregnant may be labelled as a negative phenotype and not pregnant may be labelled as a positive phenotype, or pregnant may be labelled as a positive phenotype and not pregnant may be labelled as a negative phenotype. For disease state, diseased may be labelled as a negative phenotype and not diseased may be labelled as a positive phenotype, or diseased may be labelled as a positive phenotype and not diseased may be labelled as a negative phenotype. [00142] Certain methods of detecting a phenotype may include comparison of infrared spectra obtained from the milk of a test animal to the infrared spectra of milk obtained from one or more animals that have been determined to be negative or positive for the phenotype (i.e. reference animals or spectra). [00143] Milk spectra from one or more reference animals (one or more reference spectra) may be labelled, designated or assigned to be negative or positive for the phenotype being predicted or determined. In order to allow direct comparison to a spectrum being tested (i.e. undergoing prediction or determination of the phenotype) the reference spectra should be obtained from one or more reference animals of the same species as the test animal. [00144] The reference spectra may include at least one spectrum that has been labelled as negative for the phenotype being predicted or determined. The reference spectra may include at least one spectrum that has been labelled as positive for the phenotype being predicted or determined. [00145] The labelling of the one or more reference spectra may be based on data relating to the phenotype being predicted or determined. The data may be referred to herein as phenotype data. [00146] By labelling each of the reference spectra as negative or positive for the phenotype being predicted or determined, the features of the one or more reference spectra can be used to identify test spectra that contain the same or substantially similar features. For example, the features of one or more reference spectra labelled as positive for the phenotype may be compared to a test spectrum. Upon comparison, if the test spectrum includes the same features as those detected in the reference spectra, the test spectrum can be predicted or determined also to be positive. In order to determine features that are indicative of a positive or negative phenotype, a plurality of labelled reference spectra may be analysed in order to detect common features across the spectra that are indicative of a positive or negative phenotype.
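By way of non-limiting illustration only, one very simple form that the comparison of a test spectrum against labelled reference spectra described in paragraphs [00142] to [00146] could take is sketched below in Python, using the distance to each class centroid. The centroid approach, the random stand-in data and the 1,060 wave points (53 x 20) are illustrative assumptions; the statistical and machine learning methods described below may be used instead.

import numpy as np

def label_by_reference_comparison(test_spectrum, positive_refs, negative_refs):
    # positive_refs / negative_refs: 2-D arrays (n_spectra x n_wave_points) of
    # reference spectra labelled positive / negative for the phenotype.
    pos_centroid = np.mean(positive_refs, axis=0)
    neg_centroid = np.mean(negative_refs, axis=0)
    # Assign the label of whichever class centroid the test spectrum is closest to.
    d_pos = np.linalg.norm(test_spectrum - pos_centroid)
    d_neg = np.linalg.norm(test_spectrum - neg_centroid)
    return "positive" if d_pos < d_neg else "negative"

# Illustrative random data standing in for MIR spectra of 1,060 wave points.
rng = np.random.default_rng(0)
pos = rng.normal(1.0, 0.1, size=(20, 1060))
neg = rng.normal(0.9, 0.1, size=(20, 1060))
print(label_by_reference_comparison(rng.normal(1.0, 0.1, size=1060), pos, neg))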
[00147] The features that may be indicative of a positive or negative phenotype may be determined by any of the statistical or machine learning methods described herein. [00148] In some examples, the reference spectra and the features determined therefrom to be indicative of a negative or positive phenotype may not be used for a direct comparison to a test spectrum. The reference spectra may be used to teach a machine learning model, such as a neural network or decision tree, what features are indicative of a positive or negative phenotype. Once the model has analysed a plurality of labelled reference spectra it can recognise features that are indicative of a positive or negative phenotype and analyse a test spectrum in order to determine if the features are present. [00149] The data used to label reference spectra is dependent on the phenotype being predicted or determined. [00150] For example, when the phenotype is a disease state one or more reference spectra may be labelled as negative or positive based on disease data. Such as detection of the disease (or lack thereof) using the standard testing procedure as described herein for the specific disease. [00151] In the case of predicting or detecting a TB phenotype one or more reference spectra may be labelled as negative (i.e. the animal does not have TB) wherein the animal has a negative skin-test result, a negative observation of lesions (no lesions observed in the tissue of the animal after slaughter), a negative antibody test, and/or a negative culture status (no mycobacterium detected upon culturing from lesions taken from the tissue of the animal). [00152] In the case of predicting or detecting a TB phenotype one or more reference spectra may be labelled as negative wherein the animal has a negative skin-test result, a negative observation of lesions (no lesions observed in the tissue of the animal after slaughter), a negative antibody test, and a negative culture status (no mycobacterium detected upon culturing from lesions taken from the tissue of the animal). [00153] In the case of predicting or detecting a TB phenotype one or more reference spectra may be labelled as positive (i.e. the animal has TB) wherein the animal has at least one of: a positive skin-test result; a positive observation of lesions (lesions observed in the tissue of the animal after slaughter); a positive antibody test; and/or a positive culture status (mycobacterium detected upon culturing from lesions taken from the tissue of the animal). [00154] In the case of predicting or detecting a TB phenotype one or more reference spectra may be labelled as positive wherein the animal has a positive skin-test result. In the case of predicting or detecting a TB phenotype one or more reference spectra may be labelled as positive wherein the animal has an inconclusive skin-test result. In the case of predicting or detecting a TB phenotype one or more reference spectra may be labelled as positive wherein the animal has a positive observation of lesions. In the case of predicting or detecting a TB phenotype one or more reference spectra may be labelled as positive wherein the animal has a positive antibody test. In the case of predicting or detecting a TB phenotype one or more reference spectra may be labelled as positive wherein the animal has a positive culture status. [00155] Additional data may be used in labelling reference spectra. Such as animal movement data. 
Animal movement data may include date, time and location of all births and deaths as well as age at death of animals, locations (movement to and from), length of stays, distances travelled, location types (e.g., agricultural holding, slaughterhouse, etc.). These data may be matched to concurrent bTB profiles of each animal being studied. In addition, corresponding dates for each of the types of movement data may also be included. [00156] In the case of detecting or predicting pregnancy status or change thereof, one or more reference spectra may be labelled based on parturition data. In the case of detecting pregnancy, one or more reference spectra may be labelled based on insemination data. [00157] For example, a reference milk MIR spectrum obtained from an animal after parturition and before first insemination may be labelled as negative (i.e. not pregnant). A reference MIR spectrum obtained from an animal between the last insemination and the subsequent calving may be labelled as positive (i.e. is pregnant). A reference MIR spectrum obtained from an animal between the last insemination and the subsequent calving with a gestation length between 240 and 284 days may be labelled as positive. [00158] In certain examples, labelling of the one or more reference spectra may also be based on data relating to the recording date of the spectra. For example, in addition to the labelling of the spectra as negative or positive for the phenotype, the spectra may be labelled as current or previous, the spectra labelled as previous being recorded before the spectra labelled as current. For example, two or more reference spectra may be taken from the same animal at different time points. For example, a first spectrum obtained at a first timepoint that is labelled as positive (e.g. is pregnant or has TB), may also be labelled as previous. This spectrum may be input in addition to a second spectrum obtained from the same animal at a second timepoint that is labelled as negative (e.g. not pregnant or does not have TB) and this second spectrum may also be labelled as current. Using spectra from the same animal obtained at different time points may add granularity to the one or more reference spectra which may in turn improve the accuracy and/or precision of the methods described herein. For example, using previous and current spectra from an animal may provide not just a positive or negative phenotype but a transition between a positive and a negative phenotype. [00159] When the phenotype is methane production, the reference infrared spectra may be labelled based on phenotype data that may include feed intake, feed composition and/or methane emissions as measured by the standard methods described herein. [00160] Reference spectra may be labelled as negative or positive in respect of a predetermined level of methane concentration in a given volume of breath. For example, an expected or predetermined level of methane concentration in a given volume of breath for the animal being studied. A reference spectrum may be labelled as positive when the methane concentration in a given volume of breath is greater than an expected methane concentration in a given volume of breath for an animal of the same species and/or breed. A reference spectrum may be labelled as negative when the methane concentration in a given volume of breath is less than an expected methane concentration in a given volume of breath for an animal of the same species and/or breed.
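By way of non-limiting illustration only, the pregnancy labelling rules of paragraph [00157] above may be sketched in Python as follows. The function name, the decision to leave records unlabelled when neither rule applies, and the example dates are illustrative assumptions.

from datetime import date
from typing import Optional

def label_pregnancy_reference(record_date, parturition_date, first_insemination_date,
                              last_insemination_date, next_calving_date):
    # Negative: spectrum recorded after parturition and before the first insemination.
    if parturition_date <= record_date < first_insemination_date:
        return "negative"
    # Positive: spectrum recorded between the last insemination and the subsequent
    # calving, with an implied gestation length of 240 to 284 days.
    if next_calving_date is not None and last_insemination_date <= record_date < next_calving_date:
        gestation_days = (next_calving_date - last_insemination_date).days
        if 240 <= gestation_days <= 284:
            return "positive"
    # Otherwise the record is left unlabelled.
    return None

# Illustrative dates only.
print(label_pregnancy_reference(date(2020, 4, 1), date(2020, 1, 10),
                                date(2020, 3, 5), date(2020, 3, 5), date(2020, 12, 10)))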
[00161] For example, a milk spectrum obtained from an animal that produces a higher level of methane concentration per given volume of breath than the predetermined level may be determined or predicted to have a positive phenotype. For example, a milk spectrum obtained from an animal that produces a lower level of methane concentration per given volume of breath than the predetermined level may be determined or predicted to have a negative phenotype. [00162] Predetermined levels of methane production, for example predetermined levels of methane concentration in a given breath, may be defined based on industry standards for a given animal. For example, acceptable levels as defined by agriculture and/or government agencies. [00163] When the phenotype is feed intake, the reference spectra may be labelled based on phenotype data that may include one or more of net energy intake, dry matter intake, concentration of milk components (such as fat, protein and/or lactose), milk yield and/or body weight. For example, reference spectra may be labelled based on the mass of dry matter intake per day. In other examples, reference spectra may be labelled based on the average dry matter intake per week. [00164] A reference spectrum may be labelled as positive or negative based on a predetermined level of feed intake. For example, an expected or a predicted level of feed intake for the animal being studied. A reference spectrum may be labelled as positive when the feed intake is greater than an expected or predicted feed intake for an animal of the same species and/or breed. Expected, predicted and/or predetermined levels of feed intake may be determined using the standard methods described above or any other methods known in the art. [00165] A reference spectrum may be labelled as negative when the feed intake is less than an expected, predicted and/or predetermined feed intake for an animal of the same species and/or breed. [00166] For example, a milk spectrum obtained from an animal that has a higher than expected, predicted and/or predetermined feed intake may be determined or predicted to be a positive phenotype. A milk spectrum obtained from an animal that has a lower than expected feed intake may be determined or predicted to be a negative phenotype. [00167] Predetermined levels of feed intake may be determined by using predicted feed intake values from reference sources such as nutrient requirement tables and other previously published data relating to the animal being studied. For example, the predetermined level of feed intake for a cow may be from about 5 to 30 kg of dry matter per day. [00168] The methods described herein utilise infra-red spectral data obtained from an animal’s milk. In particular, the spectral data may be mid-infrared spectral data. [00169] A mid-infrared (MIR) spectrum of milk is obtained from infrared spectroscopy of the milk at defined wavelengths. For example, a recorded MIR spectrum will include numerous data points, with each point representing the absorption of infrared light through the milk at particular wavenumbers in the 400 to 5,000 cm-1 region. The complete infrared spectrum of the milk may first be obtained with only data from the mid-infrared range subsequently used for the analysis, or the MIR spectrum in the 400 to 5,000 cm-1 region only of the milk may be obtained. In some examples, the MIR spectra may be in the 900 to 5000 cm-1 region.
[00170] Infrared spectroscopy involves the interaction of infrared radiation with matter in the milk, and therefore exploits the differences in milk constitution that exist between different milk samples. Infrared spectroscopy of the milk may be performed using a standard benchtop infrared spectrophotometer available from commercial suppliers such as Bentley Instruments (Chaska, Minnesota, USA), Delta Instruments (Drachten, The Netherlands), Bruker Optics (Billerica, Massachusetts, USA), JASCO (Easton, Maryland, USA), Foss Analytics (Hillerød, Denmark), Agilent Technologies (Santa Clara, California, USA), and ABB Analytical (Zurich, Switzerland). The infrared spectrophotometer may also be a portable or handheld device such as those also available from the above suppliers. Such portable devices are useful for on-farm analysis of milk samples. Other sources of spectroscopy apparatus are known. [00171] The infrared spectrum of milk is recorded by passing a beam of infrared light through the milk. When the frequency of the IR radiation is the same as the vibrational frequency of a bond or collection of bonds, absorption occurs. Examination of the transmitted light reveals how much energy was absorbed at each frequency (or wavelength), which can be used to quantify the abundance of molecules present in the milk. This measurement can be achieved by scanning the relevant wavelength range using a monochromator. Alternatively, the entire wavelength range is measured using a Fourier transform instrument and then a transmittance or absorbance spectrum is generated using a dedicated procedure. [00172] Raw spectra of milk obtained over the 400 to 5,000 cm-1 region may be subject to a pre-treatment before chemometric analysis. A pre-treatment is performed to eliminate regions of the spectra characterized by a low signal-to-noise ratio resulting from high water absorption. In some embodiments, such spectral regions include 2998 to 3998 cm-1, 1615 to 1652 cm-1, and 649 to 925 cm-1. [00173] The MIR spectral data collected may be modified, for example subjected to pre-treatment and/or standardisation. [00174] Pre-treatments include the transformation of transmittance data of the MIR spectra to a linear absorbance scale, for example by applying a log10-0.5 transformation to the reciprocal of the transmittance data. [00175] In some examples, the MIR spectra may be reshaped, for example to an array of size 53×20 pixels. The MIR spectra may also have the wavelength values normalized, for example to a value between 0 and 1. Spectra may also be converted to greyscale images, such that each wavelength value is represented by a pixel. For example, normalized spectra may be converted to greyscale images. [00176] Standardization may help to account for drift incurred by collection of spectral data from different MIR instruments and across time. Standardization of the spectra may help ensure that the methods described herein can be applied to data streams from other machines throughout the world that have adopted the same standardization, and that predictions and determinations by the methods described herein can be compared across different timepoints, as standardization may account for drift in different machines. [00177] MIR spectra may be granulated, for example by combining two spectra obtained for an animal at different timepoints.
For example, two spectra from the same animal may be merged, wherein a first spectrum is obtained at a first timepoint and a second spectrum is obtained at a second timepoint after the first timepoint. Granulating spectra in this way may provide information on the transitional state of a phenotype. For example, in the case of pregnancy, granulation of data may provide information about the change of pregnancy status, such as a prediction or determination of a change in pregnancy status or a loss of a pregnancy (i.e. an animal is pregnant at the first timepoint but is not pregnant at the second timepoint). [00178] The phenotype of an animal may be obtained through a statistical comparison of a test infrared spectrum to one or more reference infrared spectra. Such a statistical comparison can be implemented through the use of any one of a number of algorithms which have, for example, the ability to compare infrared spectral features of each infrared spectrum being compared. For example, the infrared spectral features may be individual waveforms of each IR spectrum. [00179] The algorithms receive the infrared spectra with labels indicating if they are associated with a positive or a negative (better or worse) phenotype and automatically determine which features (or waveforms) of the infrared spectra best describe a negative or positive phenotype. Non-limiting examples of algorithms include partial least squares regression (including partial least squares discriminant analysis (PLS-DA)), C4.5 decision trees, naive Bayes, Bayesian network, logistic regression, support vector machine, random forest, and rotation forest. These have been described in Hempstalk K et al., 2015, J Dairy Sci., 98: 5262-5273. Once these features have been defined for one or more reference infrared spectra, a test spectrum can be analysed to determine the presence of the same features and thus predict or determine whether the test spectrum is positive or negative for the phenotype. [00180] Partial least squares regression (PLS; Geladi P and Kowalski BR, 1986, Anal. Chim. Acta, 185: 1-17) can be performed as a preprocessing step before training a machine learning algorithm; it works like principal component analysis (PCA) in that it transforms the data set into a new projection that represents the entire data set, and then chooses the C most informative axes (or "components") in the new projection as features in the transformed data set. Where the PCA and PLS algorithms differ is that PLS takes into consideration the dependent variable when constructing its projection, but PCA does not. One advantage of using the dependent variable during learning is that the algorithm is able to perform regression using the projections it has calculated. A binary prediction (for example pregnant or not; or diseased or not) can be made by creating a regression model that predicts the probability (of pregnant or diseased) and returning true if the probability reaches a set threshold, or false otherwise. PLS-DA is a variant of partial least squares regression used when the response variable is categorical, and is used to find the relationship between two matrices. It is one of the most well-known classification methods in chemometrics, metabolomics, and proteomics, with an ability to analyse highly collinear data, which is often a problem with conventional regression methods, for example, logistic regression (Gromski PS et al., 2015, Analytica Chimica Acta., 879: 10-23).
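By way of non-limiting illustration only, a PLS-DA classification of the kind described in paragraph [00180], preceded by the absorbance pre-treatment referred to in paragraph [00174], is sketched below in Python. The use of scikit-learn, the number of latent components, the 0.5 decision threshold and the random stand-in data are illustrative assumptions.

import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Illustrative stand-in data: rows are labelled reference milk MIR records of
# 1,060 transmittance wave points (53 x 20); y is 1 = positive, 0 = negative.
rng = np.random.default_rng(0)
transmittance = rng.uniform(0.05, 0.95, size=(200, 1060))
y = rng.integers(0, 2, size=200)

# Pre-treatment: transform transmittance to a linear absorbance-type scale by
# taking log10 of its reciprocal.
X = np.log10(1.0 / transmittance)

# PLS-DA: partial least squares regression against the binary label, with the
# continuous prediction thresholded to give a class. Ten latent components is an
# arbitrary illustrative choice and would normally be set by cross-validation.
pls = PLSRegression(n_components=10).fit(X, y)

test_record = np.log10(1.0 / rng.uniform(0.05, 0.95, size=(1, 1060)))
score = float(pls.predict(test_record)[0, 0])
print("positive" if score >= 0.5 else "negative", score)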
[00181] The C4.5 decision tree (Quinlan R, 1993, Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA, USA) builds a tree by evaluating the information gain of each feature (i.e., independent variable) and then creates a split (or decision) by choosing the most informative feature and dividing the records into left and right nodes of the tree. This process repeats until all of the records at a node belong to a single class (e.g. pregnant or diseased or not) or the number of records reaches the threshold defined in the algorithm (i.e., a minimum of 2 instances per leaf). A prediction is made by traversing the tree using the values from the current instance and returning the majority class at the leaf node reached by the traversal. The tree prevents over-fitting by performing pruning to remove nodes that may cause error in the final model. [00182] The naive Bayes algorithm "naively" assumes each feature is independent and builds a model based on Bayes' rule. It multiplies the probabilities of each feature belonging to each class (e.g. pregnant or diseased or not) to generate a prediction. The probability for each feature is calculated by supplying the mean and standard deviation to a Gaussian probability density function, which are then multiplied together using Bayes' rule. [00183] A Bayesian network classifier represents each feature as a node on a directed acyclic graph, each node containing the conditional probability distribution that can be used for class prediction. A Bayesian network assumes that each node is conditionally independent of its nondescendants, given its immediate parents. During calibration, the network structure is built by searching through the space of all possible edges and computing the log-likelihood of each resulting network as a measure of quality. [00184] Linear regression is a common statistical technique used to express a class variable as a linear combination of the features. However, it is designed to predict a real numeric value and cannot handle a categorical or binary class (i.e., conceived or not). To overcome this, a model can be built for each class value that ideally predicts 1 for that class value, and 0 otherwise, and at prediction time assigns the class value whose model predicts the greatest probability. Unfortunately, regression functions are not guaranteed to produce a probability between 0 and 1, and so the target class must first be transformed into a new space before it is learned. This is achieved using a log-transform, and this regression method is known as logistic regression (Witten IH et al., 2011, Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, USA). In logistic regression, the weights are chosen to maximize the loglikelihood (instead of reducing the squared error), by iteratively solving a sequence of weighted least-squares regression problems until the log-likelihood converges on the maximum. One algorithm in WEKA Machine Learning Workbench that performs this type of logistic regression is SimpleLogisticRegression, which by default uses boosting (M = 500) to find the maximum log-likelihood, and cross-validation with greedy stopping (H = 50) to ensure the algorithm stops boosting if no gains have been made in the last H iterations. 
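By way of non-limiting illustration only, a decision tree grown on information gain together with a Gaussian naive Bayes classifier, corresponding to the descriptions in paragraphs [00181] and [00182], is sketched below in Python. The use of scikit-learn (whose decision tree is CART-based rather than C4.5), the parameter values and the random stand-in data are illustrative assumptions.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# Illustrative stand-in spectra (1,060 wave points) with binary phenotype labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1060))
y = rng.integers(0, 2, size=200)

# A decision tree split on information gain (entropy) with a minimum of two
# instances per leaf, in the spirit of the C4.5-style tree described above.
tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=2).fit(X, y)

# Gaussian naive Bayes: per-feature means and standard deviations feed a Gaussian
# density, and the per-class probabilities are combined via Bayes' rule.
bayes = GaussianNB().fit(X, y)

test_record = rng.normal(size=(1, 1060))
print(tree.predict(test_record), bayes.predict_proba(test_record))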
[00185] Support vector machines (SVM) can produce nonlinear boundaries (between classes) by constructing a linear boundary in a large, transformed version of the feature space (Hastie T et al., 2009, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, NY.). In practice, a soft margin boundary (Cortes C and Vapnik V, 1995, Mach. Learn., 20: 273-297) is used to prevent over-fitting; however, a hard margin is easier to visualize when describing SVM. In the hard margin case, the algorithm assumes that classes in the transformed space are linearly separable, and it is possible to generate a hyperplane that completely separates them. By employing a technique known as the kernel trick (Aizerman MA et al., 1964, Autom. Remote Control, 25: 821-837), SVMs are able to generate nonlinear decision boundaries. This is possible because the kernel trick reduces the computational effort by estimating similarities of the transformed instances as a function of their similarities in the original space. One example of an SVM is SMO, sequential minimal optimization (Platt J, 1998, Pages 185-208 in Advances in Kernel Methods: Support Vector Learning. B. Scholkopf, C. J. Burges, and A. J. Smola, ed. MIT Press, Cambridge, MA), from WEKA (Witten IH et al., 2011, supra), which uses the sequential minimal optimization algorithm to increase the speed of finding the maximum-margin hyperplane. [00186] Random forest (Breiman L, 2001, Mach. Learn., 45: 5-32) is an ensemble learner that creates a "forest" of decision trees, and predicts the most popular class estimated by the set of trees. Each tree is provided with a random set of training instances sampled with replacement from the entire training set. The intention of this step is to create a diverse set of trees. The algorithm differs from bagged decision trees (which also provide randomly selected subsets to each tree) because during training the algorithm randomly selects a subset of features available for selection at each split in the tree. One implementation of this algorithm is RandomForest in WEKA, which by default has an unlimited tree depth (maxDepth = 0) and the number of features randomly selected into each subset = log2(total number of features) + 1. By default, this algorithm creates a forest of 10 trees (numTrees = 10); however, this can be increased to 1,000 (numTrees = 1000) to cater for poor accuracy when considering only 10 trees. The effect of increasing this parameter is that accuracy is improved, but also that the algorithm takes much longer to run. [00187] Rotation forest (Rodriguez JJ et al., 2006, IEEE Trans. Pattern Anal. Mach. Intell., 28: 1619-1630) is an ensemble learner similar to random forest except that PCA is applied to select the features for each tree (instead of random selection), and the components are all kept when the base classifier is trained. The classifier sees a "rotated" set of features in each tree in its forest. The intention is to create individual accuracy in the tree and diversity in the ensemble, compared with random forest, which aims only to create diversity in the ensemble. Results for a rotation forest learner have been shown to be as good as those of other ensemble learning schemes such as bagging, boosting, and random forests (Rodriguez JJ et al., 2006, supra).
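By way of non-limiting illustration only, a random forest of the kind described in paragraph [00186] is sketched below in Python. The use of scikit-learn rather than WEKA, the parameter values and the random stand-in data are illustrative assumptions; max_features="log2" approximates the log2(total number of features) + 1 rule mentioned above.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative stand-in spectra with binary phenotype labels.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 1060))
y = rng.integers(0, 2, size=300)

# A forest of 1,000 trees of unlimited depth, mirroring the parameter choices
# discussed above; each split considers a random log2-sized subset of features.
forest = RandomForestClassifier(n_estimators=1000, max_depth=None,
                                max_features="log2", n_jobs=-1,
                                random_state=0).fit(X, y)

test_record = rng.normal(size=(1, 1060))
print(forest.predict(test_record), forest.predict_proba(test_record))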
[00188] Deep learning is a class of machine learning techniques employing representation learning methods that allow a machine to be given raw data and determine the representations needed for data classification. Deep learning ascertains structure in data sets using backpropagation algorithms which are used to alter internal parameters (e.g., node weights) of the deep learning machine. Deep learning machines can utilize a variety of multilayer architectures and algorithms. While machine learning, for example, involves an identification of features to be used in training the network, deep learning processes raw data to identify features of interest without the external identification. [00189] Deep learning in a neural network environment includes numerous interconnected nodes referred to as neurons. Input neurons, activated from an outside source, activate other neurons based on connections to those other neurons which are governed by the machine parameters. A neural network behaves in a certain manner based on its own parameters. Learning refines the machine parameters, and, by extension, the connections between neurons in the network, such that the neural network behaves in a desired manner. [00190] Deep learning that utilizes a convolutional neural network (CNN) segments data using convolutional filters to locate and identify learned, observable features in the data. A CNN assigns importance to these features in the form of learnable weights and biases. Each filter or layer of the CNN architecture transforms the input data to increase the selectivity and invariance of the data. This abstraction of the data allows the machine to focus on the features in the data it is attempting to classify and ignore irrelevant background information. [00191] Deep learning operates on the understanding that many datasets include high level features which include low level features. While examining an image, for example, rather than looking for an object, it is more efficient to look for edges which form motifs which form parts, which form the object being sought. These hierarchies of features can be found in many different forms of data such as speech and text, etc. [00192] Learned observable features include objects and quantifiable regularities learned by the machine during supervised learning. A machine provided with a large set of well classified data is better equipped to distinguish and extract the features pertinent to successful classification of new data. [00193] A deep learning machine that utilizes transfer learning may properly connect data features to certain classifications affirmed by a human expert. Conversely, the same machine can, when informed of an incorrect classification by a human expert, update the parameters for classification. Settings and/or other configuration information, for example, can be guided by learned use of settings and/or other configuration information, and, as a system is used more (e.g., repeatedly and/or by multiple users), a number of variations and/or other possibilities for settings and/or other configuration information can be reduced for a given situation. [00194] An example deep learning neural network can be trained on a set of expert classified data, for example. This set of data builds the first parameters for the neural network, and this would be the stage of supervised learning. During the stage of supervised learning, the neural network can be tested to determine whether the desired behaviour has been achieved.
[00195] Once a desired neural network behaviour has been achieved (e.g., a machine has been trained to operate according to a specified threshold, etc.), the machine can be deployed for use (e.g., testing the machine with “real” data, etc.). During operation, neural network classifications can be confirmed or denied (e.g., by an expert user, expert system, reference database, etc.) to continue to improve neural network behaviour. The example neural network is then in a state of transfer learning, as parameters for classification that determine neural network behaviour are updated based on ongoing interactions. In certain examples, the neural network can provide direct feedback to another process. In certain examples, the neural network outputs data that is buffered (e.g., via the cloud, etc.) and validated before it is provided to another process. [00196] Deep learning machines using convolutional neural networks (CNNs) can be used for image analysis. Deep learning machines can provide computer aided detection support to improve image analysis with respect to image quality and classification. [00197] Semi-supervised and unsupervised deep learning machines can be used to quantitatively measure qualitative aspects of images. For example, deep learning machines can be utilized after an image has been acquired to determine if the quality of the image is sufficient for phenotyping. Supervised deep learning machines can also be used for computer aided phenotyping. Supervised learning can help reduce susceptibility to false phenotyping of animals. [00198] Deep learning machines can utilize transfer learning when interacting with users to counteract a small dataset available in the supervised training. These deep learning machines can improve their computer aided phenotyping over time through training and transfer learning. [00199] A trained machine learning model may be used to predict a positive or negative phenotype from milk MIR spectral data. For example, the trained machine learning model may receive a milk MIR spectrum and predict, from the spectrum, a positive or negative phenotype. A desired phenotype may be input into the model, for example, pregnancy or TB or the model may learn the phenotype from the labels associated with the spectral data during training. For example, the trained machine learning model may predict, from the received spectrum of an animal, that the animal producing the milk associated with the spectrum is pregnant or has TB (positive phenotype) or is not pregnant or does not have TB (negative phenotype). [00200] The trained machine learning model is trained to learn how to predict the positive or negative phenotype from the milk MIR spectral data. The machine learning model is trained using reference MIR spectral records from animals that are known to have a positive or negative phenotype, each reference MIR spectral record labelled as providing a positive or negative phenotype. One way of training the machine learning model is to input the reference MIR spectral records from animals that are known to have a positive or negative phenotype into the machine learning model. The machine learning model then predicts a positive or negative phenotype from each record. The predicted phenotype is then compared to the known phenotype of the reference MIR spectral record and, based on the comparison, the machine learning model may be adapted. 
For example, if the predicted phenotype is not the same as the known phenotype, connections or weights within the machine learning model may be changed. Once the machine learning model predicts the correct phenotype with high accuracy, for example, once the machine learning model predicts the correct phenotype for the majority of records, it is trained and can be used to accurately predict a phenotype from MIR spectral data. [00201] Where the machine learning model is trained specifically for a particular phenotype, the label may be based on that phenotype. For example, for a machine learning model trained to predict pregnancy, each reference MIR spectral record may be labelled as either pregnant or not pregnant and, for a machine learning model trained to predict bTB, each reference MIR spectral record may be labelled as either being bTB infected or not being bTB infected. [00202] Training may be performed until there is no improvement in the accuracy of the predictions. For example, the number of epochs required for training may be determined by the inclusion of an “early stopper” in the code. Early stopping is a machine learning method used to stop training when there is no improvement in model performance, thus minimizing over- and under-fitting. [00203] The machine learning model may be a decision tree or a convolutional neural network. [00204] When the machine learning model is a convolutional neural network, the machine learning model may comprise layers of nodes. A first layer of nodes may be an input layer to receive the milk MIR spectral data. For example, the first layer may be configured to receive a milk MIR spectral record as input. A last layer of nodes may be an output layer to output the positive or negative phenotype. The first and last layers of nodes may be connected through one or more neural network layers of nodes. The nodes may each be weighted. The machine learning model is configured through training to map the milk MIR spectral data directly to a positive or negative phenotype. To train the machine learning model, the connections between the layers and the weighting of the nodes may be changed until the machine learning model predicts the phenotypes from the spectral data with a high accuracy. [00205] The convolutional neural network may be a dense convolutional neural network (also known as a DenseNet), meaning each layer of nodes is connected to every other layer of nodes in a feed-forward fashion. For each layer, the outputs of all preceding layers are used as inputs, and its own outputs are used as inputs into all subsequent layers. Use of a DenseNet advantageously alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters required to produce accurate results. DenseNets provide significant improvements over the state-of-the-art, requiring less memory and computation to achieve high performance. [00206] Convolutional neural networks, including DenseNets, can be pretrained using other data before being trained for MIR spectral data. This reduces the time taken to train the neural networks on reference MIR spectral data and reduces the computation required. For example, the neural networks may be pretrained using images in order to identify particular features of the images such as edges.
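By way of non-limiting illustration only, a small convolutional neural network trained with an early stopper, in the manner described in paragraphs [00202] to [00204] and operating on spectra reshaped to 53 x 20 greyscale arrays as in paragraph [00175], is sketched below in Python. The use of TensorFlow/Keras, the layer sizes, the patience value and the random stand-in data are illustrative assumptions; a DenseNet or other architecture may be used instead.

import numpy as np
import tensorflow as tf

# Illustrative stand-in data: normalised spectra reshaped to 53 x 20 x 1
# greyscale "images" with binary phenotype labels (1 = positive, 0 = negative).
rng = np.random.default_rng(3)
x_train = rng.uniform(0.0, 1.0, size=(500, 53, 20, 1)).astype("float32")
y_train = rng.integers(0, 2, size=500)

# Input layer receiving the spectral record, convolutional layers extracting
# features, and a single sigmoid output node giving the probability of the
# positive phenotype.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(53, 20, 1)),
    tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The "early stopper": training halts once validation loss stops improving,
# limiting over- and under-fitting.
early_stopper = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                                 restore_best_weights=True)
model.fit(x_train, y_train, validation_split=0.2, epochs=100,
          callbacks=[early_stopper], verbose=0)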
Once the neural network is pretrained, it can be adapted for MIR spectra classification through transfer learning, a process by which a previously fully trained model, trained for a specific task, is repurposed for a new, different task. The number of epochs required in pretraining may also be determined by the inclusion of an early stopper in the code, as explained above. [00207] Any form of the MIR spectral data may be input into the machine learning model. The MIR spectral data may be input into the machine learning model without any pre-treatment. Alternatively, the MIR spectral data may be pre-treated, for example standardised, before being input into the machine learning model, as described above. In another example, the spectral data may be transformed into an image before being input into the machine learning model. In one example, when using DenseNet, MIR spectral records may be converted to individual 53 x 20 px greyscale (PNG) images, which reduces the time and computation required for training when the DenseNet is already pretrained on images. In another example, before being input into a machine learning model, the MIR spectral data may be reduced in size. For example, each MIR spectral record may have a large number of wave points. In order to reduce the computation and size of the machine learning model, each spectral record may be reduced in size to only provide those wave points that are significant in predicting whether the phenotype is positive or negative. This may be done using a genetic algorithm, which takes different subsets of wave points of records and determines which wave points are the most significant for predicting the phenotype. The most significant wave points of the MIR spectral records may then be input into the machine learning model. Alternatively, the genetic algorithm may form part of the machine learning model. [00208] When the machine learning model is a decision tree, the machine learning model comprises leaves connected by branches and statistics are used to predict the phenotype. The decision tree is used as a predictive model to go from observations about an item (represented in the branches), which in this case is the spectral data, to conclusions about the item's target value (represented in the leaves), which in this case is the phenotype. [00209] When the machine learning model is a decision tree, the machine learning model may use gradient boosting. Gradient boosting produces a strong prediction model in the form of an ensemble of weak decision trees. A weak decision tree is a classifier that is only slightly correlated with the true classification (it can label examples better than random guessing). In contrast, a strong decision tree or prediction model is a classifier that is arbitrarily well-correlated with the true classification. In an example, the machine learning model may perform gradient boosting using XGBoost, an optimized distributed gradient boosting library that implements machine learning algorithms under the Gradient Boosting framework.
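By way of non-limiting illustration only, gradient boosting with XGBoost, as referred to in paragraph [00209], is sketched below in Python. The parameter values and the random stand-in data are illustrative assumptions.

import numpy as np
from xgboost import XGBClassifier

# Illustrative stand-in MIR records (wave points) with binary phenotype labels.
rng = np.random.default_rng(4)
X = rng.normal(size=(400, 1060))
y = rng.integers(0, 2, size=400)

# Gradient boosting builds an ensemble of shallow ("weak") trees, each one fitted
# to the errors of the ensemble so far, producing a strong classifier.
booster = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1,
                        eval_metric="logloss").fit(X, y)

test_record = rng.normal(size=(1, 1060))
print(booster.predict(test_record), booster.predict_proba(test_record))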
In order to train the machine learning model effectively, it is advantageous to provide substantially equal amounts of reference spectra for each label. [00211] Thus, in order to increase the amount of data available, methods described herein may include synthesising labelled data, such as labelled reference spectra. All data may be synthesized, or data for one of the labels (i.e. one of a negative or positive phenotype) may be synthesized. Where all data is synthesized, this may occur by producing new data based on the current data, for example by changing small elements of the current data, in order to provide a larger dataset to train the machine learning model. For example, for the reference spectral records, new spectral records may be produced that differ in a few wave points from one or more current spectral records. [00212] Where only data having a particular label is synthesized, the other data may not be synthesized. For example, for reference MIR spectral data, there may be less data having the label of TB positive or TB infected compared to data having the label of not TB positive or not TB infected. Therefore only data having the label of TB positive or TB infected may be synthesized in order to increase the amount of data having this label to be of a substantially equal amount to the data having the label of not TB positive or not TB infected. Synthesizing the MIR spectral data may be performed by producing new MIR spectral records that are similar to spectral records already obtained. Therefore MIR spectral data having the label of TB positive or TB infected may be used to produce new records having the label of TB positive or TB infected in order to increase the amount of data having this label. [00213] Labelled data may be synthesised using methods such as the Synthetic Minority Over-sampling Technique (SMOTE), the Adaptive Synthetic (ADASYN) sampling approach and generative adversarial networks (GAN). [00214] Synthesis of data may include the use of a k nearest neighbors approach to synthesize new data within the body of available data by randomly selecting a minority instance, A; finding its k nearest neighbors; and then drawing a line segment in the feature space between A and a random neighbor. [00215] Machine learning models may be evaluated based on loss and/or accuracy. The methods described herein may have an accuracy of at least 65%. For example, the methods described herein may have an accuracy of at least 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%. For example, the methods described herein may have an accuracy of at least 71%. The methods described herein may have an accuracy of at least 82%. The methods described herein may have an accuracy of at least 85%. The methods described herein may have an accuracy of at least 88%. The methods described herein may have an accuracy of at least 90%. The methods described herein may have an accuracy of at least 95%. The methods described herein may have an accuracy of at least 97%. [00216] Accuracy is defined as the number of correct predictions or determinations divided by the total number of predictions or determinations.
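By way of non-limiting illustration only, synthesis of minority-class records by SMOTE, as described in paragraphs [00213] and [00214], is sketched below in Python using the imbalanced-learn library. The library choice, the class counts and the random stand-in data are illustrative assumptions.

import numpy as np
from imblearn.over_sampling import SMOTE

# Illustrative imbalanced stand-in data: far fewer records labelled positive
# (e.g. bTB infected) than negative.
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.9, 0.1, size=(950, 1060)),   # negative records
               rng.normal(1.0, 0.1, size=(50, 1060))])   # positive records
y = np.array([0] * 950 + [1] * 50)

# SMOTE synthesises new minority-class records by choosing a minority instance,
# finding its k nearest neighbours and interpolating along the line segment
# between the instance and a randomly selected neighbour.
X_balanced, y_balanced = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print(np.bincount(y), np.bincount(y_balanced))   # class counts before and after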
[00215] Machine learning models may be evaluated based on loss and/or accuracy. The methods described herein may have an accuracy of at least 65%. For example, the methods described herein may have an accuracy of at least 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%. For example, the methods described herein may have an accuracy of at least 71%. The methods described herein may have an accuracy of at least 82%. The methods described herein may have an accuracy of at least 85%. The methods described herein may have an accuracy of at least 88%. The methods described herein may have an accuracy of at least 90%. The methods described herein may have an accuracy of at least 95%. The methods described herein may have an accuracy of at least 97%. [00216] Accuracy is defined as the number of correct predictions or determinations divided by the total number of predictions or determinations. [00217] Accuracy may be calculated by the following equation: ACC = (TP + TN) / (TP + FP + FN + TN) [00218] ACC stands for accuracy; TP is a true positive recorded when a method correctly predicts a positive phenotype; TN is a true negative recorded when a method correctly predicts a negative phenotype; FP is a false positive recorded when a method incorrectly predicts a negative phenotype as a positive phenotype; and FN is a false negative recorded when a method incorrectly predicts a positive phenotype as a negative phenotype. [00219] The methods described herein may have a positive predictive value (PPV), also known as precision, of at least 0.65. For example, a PPV of at least 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95. The methods described herein may have a PPV of at least 0.86. The methods described herein may have a PPV of at least 0.78. The methods described herein may have a PPV of 0.86. The methods described herein may have a PPV of 0.92. [00220] PPV is the probability that an animal predicted or determined to have a positive phenotype truly has that phenotype and may be defined as the proportion of positive predictions or determinations that are verified as correct. It may be calculated via: PPV = TP / (TP + FP) [00221] Thus, if a method produces no false positives it would have a PPV of 1. [00222] The methods described herein may have a negative predictive value (NPV) of at least 0.65. For example, an NPV of at least 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95. The methods described herein may have an NPV of at least 0.86. The methods described herein may have an NPV of at least 0.78. [00223] NPV is the probability that an animal with a negative phenotype prediction or determination is truly negative and may be defined as the proportion of negative predictions or determinations that were verified as correct. It may be calculated via: NPV = TN / (TN + FN) [00224] Thus, if a method produces no false negatives it would have an NPV of 1. [00225] The methods described herein may have a sensitivity of at least 0.45. For example, a sensitivity of at least 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95. The methods described herein may have a sensitivity of at least 0.48. The methods described herein may have a sensitivity of at least 0.78. The methods described herein may have a sensitivity of at least 0.89. The methods described herein may have a sensitivity of at least 0.91. The methods described herein may have a sensitivity of at least 0.96. [00226] Sensitivity (TPR, i.e., recall, or true positive rate) may be defined as the proportion of true positives a method identifies correctly and may be calculated via: TPR = TP / (TP + FN) [00227] Thus, if a method produces no false negatives it would have a TPR of 1. [00228] The methods described herein may have a specificity of at least 0.65. For example, a specificity of at least 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95. The methods described herein may have a specificity of at least 0.68. The methods described herein may have a specificity of at least 0.78. The methods described herein may have a specificity of at least 0.86. The methods described herein may have a specificity of at least 0.92. The methods described herein may have a specificity of at least 0.94. [00229] Specificity (TNR, i.e., true negative rate) may be defined as the proportion of true negatives a method identifies correctly and may be calculated via: TNR = TN / (TN + FP) [00230] Thus, if a method produces no false positives it would have a TNR of 1.
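For illustration, the metrics defined above can be computed directly from the four counts, as in the short sketch below (the counts shown are placeholder values only).

# Minimal sketch of the evaluation metrics defined above, computed from counts of
# true/false positives and negatives of a binary phenotype prediction.
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "ppv":         tp / (tp + fp),   # positive predictive value (precision)
        "npv":         tn / (tn + fn),   # negative predictive value
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Illustrative counts only:
print(binary_metrics(tp=430, tn=420, fp=70, fn=80))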
[00231] The machine learning model may be evaluated based on its loss. A small loss indicates that a machine learning model performs well. This is because the loss is based on the number of incorrect outputs, or predictions, of the machine learning model. Thus, for a machine learning model according to the present invention, the loss is based on how many times the output phenotype is wrong. This may be detected using the reference MIR spectral data, as the output phenotype can be compared to the labelled phenotype of the MIR spectral record to determine if the phenotype is incorrect and consequently determine the loss. Loss is used to interpret the confidence of the model’s predictions. A loss metric close to zero indicates that the model is robust in its predictions. The methods described herein may have a loss of less than 0.1. The methods described herein may have a loss of less than 0.06. For example, the methods described herein may have a loss of between 0.05 and 0.06. Therefore the machine learning model described herein may be robust in its predictions. Loss may be measured by a loss function. An example loss function may be the fraction of wrong outputs in comparison to the total number of outputs. [00232] The methods described herein may be used to inform or determine a response to a phenotype. For example, in response to a phenotype an animal may be treated. As used herein the terms “treated” and “treatment” are used broadly to refer to any action that may be taken in response to a positive or negative phenotype determined by the methods described herein. [00233] In the case that an animal is determined as having a disease (i.e. a positive disease state), treatment may include the use of drug therapies such as administration of antibiotics or vaccines to the animal. In addition, therapies may be administered to animals that are at risk of contracting the disease from an animal determined to have the disease, for example, other animals that have been or are in close proximity to a diseased animal, such as other animals in the same herd or group of animals. [00234] Animals may be isolated based on the phenotype determined. For example, for a disease state the animal may be isolated to prevent spread of the disease. Animals that have been in close proximity to a diseased animal may also be isolated to help prevent spread of the disease to other animals. [00235] For example, animals with a disease such as TB may undergo a herd breakdown as shown in Figure 5. Animals that are predicted or determined to be diseased by the methods described herein are removed from the group (herd) of animals and the remaining animals are isolated. The animals remaining in the group are then subjected to further analysis by the methods described herein and/or by standard testing methods for the disease. Once no animals remain that have a diseased phenotype, the group of animals is no longer considered broken down. [00236] In the case of pregnancy state, animals that are pregnant may be separated or segregated from those that are not pregnant. [00237] In the case of pregnancy state, the animal may be inseminated. For example, if the animal is determined as not pregnant or, in the case of pregnancy state change, has lost a pregnancy, the animal may be inseminated. [00238] In some cases animals may be euthanized or culled. For some diseases, drug therapies may not be available or commercially viable and as such the animals are euthanized.
In the case of pregnancy state, an animal that does not maintain a pregnancy or is unable to fall pregnant may not be considered commercially viable and as such may be euthanized. [00239] In some examples, treatment may depend on the disease being detected. As such, the treatment may include any of the known or standard treatments described herein. EXAMPLES Example 1 - Predicting Pregnancy Status From Milk Spectral Data [00240] Accurately identifying pregnancy status is important for a profitable dairy enterprise. Mid-infrared (MIR) spectroscopy is routinely used to determine fat and protein concentrations in milk samples. Mid-infrared spectra have successfully been used to predict other economically important traits including fatty acid content, mineral content, body energy status, lactoferrin, feed intake, and methane emissions. Machine learning has been used in a variety of fields to find patterns in vast quantities of data. This study aimed to use deep learning, a sub-branch of machine learning, to establish pregnancy status from routinely collected milk MIR spectral data. Milk spectral data was obtained from National Milk Records, who collect large volumes of data continuously on a monthly basis. Two approaches were followed: using genetic algorithms for feature selection and network design (Model 1), and transfer learning with a pre-trained DenseNet model (Model 2). Feature selection in Model 1 showed that the number of wave points in MIR data could be reduced from 1060 to 196 wave points. The trained model converged after 162 epochs with validation accuracy and loss of 0.89 and 0.18, respectively. Although the accuracy was sufficiently high, the loss (in terms of predicting only two labels) was considered too high and suggested that the model would not be robust enough to apply to industry. Model 2 was trained in two stages of 100 epochs each with spectral data converted to grey-scale images and resulted in accuracy and loss of 0.97 and 0.08, respectively. Inspection of inference data showed prediction sensitivity of 0.89, specificity of 0.86 and prediction accuracy of 0.88. Results indicate that milk MIR data contains features relating to pregnancy status and the underlying metabolic changes in dairy cows and such features can be identified by means of deep learning. Prediction equations from trained models can be used to alert farmers of non-viable pregnancies as well as to verify conception dates. [00241] Pregnancy status is an essential phenotype in dairy cattle and important in managing the reproductive – and subsequent production – performance of the herd. Over the course of lactation the milk yield peaks then declines; however, poor reproductive performance allows more cows to lactate far beyond their peak, thus reducing profitability. To attain optimal herd efficiency, farmers aim for a 365-d calving interval, meaning the cow must be inseminated 80 days post-partum and maintain this pregnancy throughout her lactation. The longer it takes to determine if the cow has not maintained the pregnancy, the greater the financial implications. Cows thought to be pregnant and identified late in lactation as being empty are often culled because a subsequent potential pregnancy does not fit the farm’s calving pattern or justify the prolonged dry period of the cow. Pregnancy diagnosis is routinely carried out by a veterinarian usually using rectal palpation approximately 3 weeks post-insemination (Sheldon and Noakes, 2002).
On establishing pregnancy the cow is assumed to be in calf unless she begins displaying signs of oestrus. The ability and speed with which oestrus is detected is dependent on the quality of management and detection aids on farm (Roelofs et al., 2010). Pregnancy diagnosis can also be established from a milk sample by measuring the concentration of progesterone at 24 days, with accuracies of 83.3% and 85%, respectively (Muhammd et al., 2000; Sheldon and Noakes, 2002). Pregnancy has also been shown to affect milk composition (Olori et al., 1997; Penasa et al., 2016; Lainé et al., 2017), and was highlighted via the calibration of spectral data from mid-infrared (MIR) spectroscopy of milk samples collected as part of routine milk recording (Lainé et al., 2017). This is of particular interest as prediction of pregnancy status from samples collected as part of routine milk recording could provide a faster detection method that is non-invasive, cost-effective and able to be applied on a regular basis. [00242] Infrared radiation is the section of the electromagnetic radiation spectrum with wavelengths longer than those of visible light (780 nm to 1 mm), making it invisible to the human eye. The mid-infrared region of the infrared spectrum is between 3–50 μm. When MIR radiation hits an object the molecules from which it is composed absorb the energy and begin to rotate and vibrate. Like a fingerprint, the rotational and vibrational patterns are characteristic of different molecules, allowing identification of molecules by their pattern of absorbance. MIR spectroscopy is routinely used for the quantification of fat and protein content of milk samples; however, there are a number of other compounds expressed in milk samples which could be identified through MIR spectra and used to monitor the health status of the lactating cow. Already MIR spectra have been calibrated to develop prediction equations for, amongst others, fatty acid content (Soyeurt et al., 2006; Wojciechowski and Barbano, 2016), mineral content (Toffanin et al., 2015), body energy status (McParland et al., 2011; Smith et al., 2019), lactoferrin (Soyeurt et al., 2012), and methane emissions (Dehareng et al., 2012). Additional studies have focused on pregnancy diagnosis and have shown that signals in the milk MIR can provide an indication of a change in the pregnancy status of cows; however, mixed success in calibrating milk MIR spectra to predict pregnancy status has been reported (Lainé et al., 2014; Toledo-Alvarado et al., 2018; Delhez et al., 2020). [00243] Previous studies looking at phenotype prediction from milk MIR spectra have mostly focused on using partial least squares (PLS) analysis to develop prediction equations (e.g., see studies mentioned above and review by De Marchi et al., 2014). The volume of data, combined with the computing power, available to scientists today presents new techniques, such as machine learning and artificial neural networks (ANN), and opportunities to delve deeper into the relationships between MIR spectra and economically important phenotypes. [00244] Artificial neural networks are computer systems inspired by the biological neural networks found in mammalian brains (Ciresan et al., 2011) with extensive networks of interconnected neurons. Deep neural networks (DNN) are similar to ANN, except that they include 2 or more hidden layers that enable them to discover features in complex, high-dimensional data for classification or detection by means of representation-learning methods (Lecun et al., 2015).
Advances in DNN have demonstrated the ability to accurately classify complex data from several disciplines, especially for computer vision (Jacobsen et al., 2017). Deep neural networks are essentially feed-forward systems where information is passed in a single direction. Convolutional neural networks (CNN) mimic the mammalian brain even further by using supervised back-propagation to update older assumptions with newly acquired knowledge during training, by means of sampling and sub-sampling maps (Ciresan et al., 2011). These CNN are essential to the extraction of high-level features from abstract data to improve the predictability of deep classifier layers. Transfer learning utilizes all the same design requirements, but exploits the fact that data from one feature space and distribution can be used to classify data in another feature space and distribution (Pan and Yang, 2010). This means that models can be trained on datasets where training data is excessive and subsequently used on sparse data for further training. Transfer learning models are mostly available for computer vision tasks such as classifying images into discrete categories. Following a machine learning approach, a pilot study by our group confirmed milk MIR spectra contained features relating to pregnancy status and underlying metabolic changes in dairy cows and those features could be identified using ANN (Brand et al., 2018) – this work was further extended and applied to milk MIR spectral data to successfully predict bovine tuberculosis status of individual cows (Denholm et al., 2020). [00245] The objective of this study was to use deep learning to model the relationship between milk MIR spectral data and pregnancy status in dairy cows. The ability to determine whether or not a cow is pregnant from her spectral profile alone would not only provide a non-invasive and low-cost method to diagnose pregnancy but also the ability to monitor the pregnancy status of the entire herd throughout lactation. More importantly it would enable the farmer to be alerted to any changes in status between recordings, such as confirmation of pregnancy post insemination (i.e., moving from a not pregnant state to a pregnant state) as well as loss of pregnancy (i.e., moving from a pregnant state to a not pregnant state). MATERIALS AND METHODS Acquisition and scope of data [00246] MIR analysis of milk samples was carried out by National Milk Records (NMR) using FOSS spectrometers (FOSS Electric A/S, Hillerød, Denmark), based at the National Milk Laboratories (Glasgow, UK). Data were collected as part of routine milk recording services in the United Kingdom and electronically transferred to Scotland’s Rural College (Edinburgh, UK) nightly on a continuous basis. Sampling intervals were 30 days on average. [00247] The process of selecting records for analysis was based on the perceived ability to classify cows as pregnant or non-pregnant. The only certain way is to use records from cows that have calved again and assume that prior to the calculated or recorded insemination the cow was not pregnant (and after it she was pregnant). Insemination records are not always recorded; thus, all data after the “last” recorded insemination could not be assumed pregnant – the farmer could (and as is often the case) stop recording inseminations when the cow is not seen bulling and subsequently start recording again sometime later. In between recording periods (even with a confirmed pregnancy), the point at which a cow was pregnant and then was not is too imprecise to determine.
To avoid introducing such uncertain and imprecise records into the training set, they were excluded. Thus, milk MIR spectral records from animals after parturition and before their first insemination were labelled as non-pregnant for the training dataset. Records between the last insemination and the subsequent calving with a gestation length between 240 and 284 days were labelled as pregnant records for the dataset. [00248] The number of records for confirmed non-pregnant animals was the limiting factor, as the distribution of animals in both categories in the training set should be close to equal (Lecun et al., 2015). After labelling the data, a total of 3 million spectral records from 697,671 animals, born between 1999 and 2016, were available for further analysis. Pre-treatment and standardization of mid-infrared data [00249] The MIR spectrum was stored as 1060 data points spanning 900 to 5000 cm-1, each point representing the pattern of absorption of infrared light at a given wavelength (Grelet et al., 2015). Spectral data were converted from transmittance to absorbance using a log10 transformation. Additionally, in order to account for differences between MIR instruments, the data were standardized in accordance with the protocol set out by the EU-funded OptiMIR project (Friedrichs et al., 2015). Standardization files are received routinely from the Centre Walloon de Recherches Agronomiques (CRA-W). This ensures that comparisons can be made across any tools developed within the same dairy network or results collected where this standardization has been applied. Model development [00250] Two models were developed (labelled Model 1 and Model 2) and investigated by applying different deep learning techniques. The development of Model 1 involved a multi-step approach and used genetic algorithms (GA) to reduce the dimensionality of the MIR spectra by eliminating wave points that were not significant to predicting pregnancy status (feature selection). Genetic algorithms are computer programs that evolve in ways that resemble natural selection to solve complex problems. All GA were implemented on a representative subset of 100,000 records from the MIR data. The purpose of the GA was not to predict pregnancy, but rather to investigate the possibility of using a smaller subset of MIR wave points when predicting pregnancy, as well as to define an appropriate deep neural network that can predict pregnancy status from a subset of MIR wave points. Each wave point was randomly assigned a discrete weighting of 0 or 1 that determined whether a wave point would be selected into the feature space (GA1) for feature selection. A visual description of how this was implemented can be seen in Figure 1. The first generation consisted of 50 individuals, each holding a random set of wave points for selection. A control test was done on all 1060 wave points to benchmark the predictive difference between using all wave points and using a subset. Each individual was evaluated on accuracy using a k-nearest neighbours approach. Individuals with the highest accuracy were subsequently selected to generate new individuals for future generations/iterations. Accuracy was defined by Equation (1) as follows: Accuracy = (TP + TN) / (TP + TN + FP + FN) (1) [00251] Where TP, TN, FP, and FN represent total numbers of true positives, true negatives, false positives, and false negatives as predicted by the model, respectively.
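For illustration, the sketch below shows, under assumed array names and placeholder data, (i) a transmittance-to-absorbance conversion and (ii) the scoring of one candidate wave-point subset (a binary mask, as used by GA1) with a k-nearest-neighbours classifier; the exact transformations and settings used in practice may differ.

# Minimal sketch: pre-treatment and evaluation of one GA1-style wave-point subset.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
transmittance = rng.uniform(0.05, 1.0, size=(5000, 1060))   # placeholder spectra
labels = rng.integers(0, 2, size=5000)                      # 0 = not pregnant, 1 = pregnant

# (i) absorbance as the log10 of the reciprocal of transmittance (a common convention).
absorbance = np.log10(1.0 / transmittance)

# (ii) a random binary mask over the 1,060 wave points stands in for one GA individual.
mask = rng.integers(0, 2, size=1060).astype(bool)
X = absorbance[:, mask]
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("accuracy for this wave-point subset:", knn.score(X_te, y_te))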
[00252] The “fittest” individual after 250 iterations was selected and its selected wave points were used for the next GA. A second set of GA were implemented on the selected wave points from GA1 by assigning continuous weighting factors between 0 and 1 to ensure that the subset of wave points was indeed still trainable (GA2) for feature extraction. The benchmark in GA2 was the selected individual from GA1. This step was done to ensure that the reduced feature set could still be subjected to training and that too many features were not eliminated. In both GA1 and GA2, the prediction accuracies for the “fittest individual” and the population average (average accuracy of all individuals in a generation) were logged at the end of each iteration. A third GA was trained with the reduced feature set to design an optimum deep neural network by setting each individual in the base population as a random network configuration and evolving it for several generations (GA3). The resulting neural network architecture was subsequently applied to the spectral data for further training and optimization on a larger dataset of 3,000,000 spectral records, evenly distributed between pregnant and non-pregnant. [00253] The development of Model 2 involved obtaining a pre-trained model called DenseNet (Huang et al., 2017) and adapting it to MIR spectra classification through transfer learning. Mid-infrared spectra records were individually converted into grey-scale images with dimensions of 53x20 pixels (from the original 1,060 wave points). Pre-trained models like DenseNet are trained on millions of images and are well adapted to extracting high level features from abstract data like images from their deeper convolutional layers. This allows for more robust models in subsequent training and means that a smaller dataset can be used. Model 2 was trained on only 10,000 spectral images, equally distributed between both labels and spanning different stages of lactation. The model was trained for 100 epochs with the convolutional layers set to non-trainable. This allowed the model to understand the MIR images first by training the dense, classifier layers only. Subsequently the model was trained for another 100 epochs with the convolutional layers available for training with a small learning rate. An inference dataset of 1,000 spectral images was used to test the quality of predictions from Model 2. [00254] Deep learning models are typically trained on a dataset split into two subsets of data, one for training and learning (the training set) and a second for validating during training (the validation set). Both of these datasets are passed to the model during training with the features (MIR spectral wavelengths) as well as the labels (binary pregnancy status). The ratio used to split a dataset into training and validation sets is usually 4:1 for training and validation, respectively; this ratio was maintained when creating training and validation sets in the present study. Models were evaluated on two metrics: accuracy, as defined in (1), and loss (obtained via a loss function). In the case of categorical labels, such as in the present study, a Softmax activation function (2) is applied to the final output layer of the network before applying a suitable loss function; here categorical cross-entropy (3). Note that the Softmax activation function normalizes the output of the network in the range (0, 1), i.e., providing a discrete probability distribution, such that the components of the resulting output vector sum to 1. (2)
f(s)_j = e^(x_j) / Σ_{c=1}^{C} e^(x_c), for j = 1, …, C
(3)
CE = − Σ_{j=1}^{C} t_j log(f(s)_j)
[00255] Here x is the observation from j = 1 to C; C is the number of possible class labels (in this case C is 2, representing each pregnancy status); e is the standard exponential function; t is the target vector; and f(s) is the Softmax probability obtained by applying (2). [00256] Loss (3) helps to interpret the confidence of the model’s predictions and can range from zero to infinity, the former being the ideal goal. While the accuracy of prediction for binary labels can be high so too can the loss; thus, optimizing (i.e., reducing) the loss metric close to zero ensures that the model is robust in its predictions. [00257] Three further metrics, commonly used in machine and deep learning, were also calculated for resultant models in order to determine performance. These included precision, recall and F1-score. [00258] Precision (i.e., the positive predictive value) was calculated via (4) and represents the proportion of positive predictions that were verified as correct. Recall (i.e., sensitivity, or true positive rate) was calculated via (5) and represents the proportion of true positives the model identified correctly. Finally, the F1-score (used in the analysis of binary labels) was calculated via (6) and represents the harmonic mean of precision and recall. Precision = TP / (TP + FP) (4) Recall = TP / (TP + FN) (5) F1-score = 2 x (Precision x Recall) / (Precision + Recall) (6) [00259] Where TP, FP, and FN are numbers of true positives, false positives and false negatives as predicted by the model, respectively. [00260] Both models were developed and trained on a NVIDIA DGX Station with four NVIDIA Tesla V100 GPU cards (NVIDIA Corporation, Santa Clara, CA, USA). This improved training time significantly, especially as the second training of Model 2 had 28,744,386 trainable parameters that required updating at each epoch. The open source TensorFlow API from Google Inc. (Abadi et al., 2015) was used to develop Model 1 and the Fast.ai API (Howard and Gugger, 2020) was used to develop Model 2. [00261] Partial least squares (PLS) analysis has been the technique generally used to date when predicting phenotypes from milk MIR spectra (see review by De Marchi et al., 2014). When the response variable is categorical a PLS variant known as partial least squares discriminant analysis (PLS-DA) can be applied, as has been shown previously to develop prediction models of pregnancy status from milk MIR spectra (e.g., Delhez et al., 2020). Therefore, in addition to the two deep learning models described above, a PLS-DA was also applied to a subset of the data (balanced for label). Before applying the PLS-DA, the data were smoothed to remove baseline variation by calculating the first derivative of the raw spectra, i.e., subtracting from each wavelength value the immediately preceding wavelength value (e.g., McParland et al., 2011; Soyeurt et al., 2011; Smith et al., 2019). Partial least squares discriminant analysis was then carried out using python 3.5 (van Rossum, 1995) and the Scikit-learn machine learning package (Pedregosa et al., 2011). Cross validation (random, 10-fold CV) was used to evaluate PLS-DA model performance and enable comparison between the different models.
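A hedged sketch of this comparison is given below: first-derivative smoothing of the raw spectra followed by a PLS model on the binary label, thresholded at 0.5 and assessed with 10-fold cross-validation. Scikit-learn has no dedicated PLS-DA class, so PLSRegression on a 0/1 response is used here as a common stand-in; the array names, placeholder data and number of components are illustrative assumptions.

# Minimal sketch: PLS-DA-style prediction of a binary label with 10-fold CV.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
spectra = rng.random((2000, 1060))              # placeholder raw spectra
y = rng.integers(0, 2, size=2000)               # placeholder binary pregnancy label

X = np.diff(spectra, axis=1)                    # first derivative across wave points

accuracies = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    pls = PLSRegression(n_components=10)
    pls.fit(X[train_idx], y[train_idx])
    y_hat = (pls.predict(X[test_idx]).ravel() > 0.5).astype(int)
    accuracies.append(np.mean(y_hat == y[test_idx]))
print("mean 10-fold CV accuracy:", np.mean(accuracies))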
RESULTS Model 1 [00262] The configuration of all GA is summarised in Table 1. After each generation, all individuals in the population were evaluated for fitness based on their ability to accurately predict pregnancy status from their features and subsequently ranked by accuracy in descending order. The first 40% were selected as parents for the next generation. The rest of the population were each given a 10% chance of being randomly selected as parents to maintain variation. These individuals would then be randomly paired to create new individuals until the population capacity was reached. The process would then repeat itself until 250 iterations were completed, which was sufficient to allow favourable random mutations to arise.
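Purely for illustration, the selection scheme just described could be sketched as follows, assuming individuals are represented as binary wave-point masks and that a fitness (accuracy) score is already available for each; the crossover and mutation details shown are assumptions, not the exact implementation used.

# Minimal sketch: one generation of the GA selection/breeding scheme described above.
import numpy as np

rng = np.random.default_rng(0)
POP_SIZE, N_POINTS = 50, 1060

def next_generation(population: np.ndarray, fitness: np.ndarray) -> np.ndarray:
    order = np.argsort(fitness)[::-1]                          # rank by accuracy, descending
    elite = [population[i] for i in order[: int(0.4 * POP_SIZE)]]
    lucky = [population[i] for i in order[int(0.4 * POP_SIZE):] if rng.random() < 0.10]
    parents = elite + lucky
    children = []
    while len(children) < POP_SIZE:
        a, b = rng.choice(len(parents), size=2, replace=False)  # random pairing
        cut = rng.integers(1, N_POINTS)                         # single-point crossover
        child = np.concatenate([parents[a][:cut], parents[b][cut:]])
        flip = rng.random(N_POINTS) < 0.01                      # occasional random mutation
        child[flip] = 1 - child[flip]
        children.append(child)
    return np.array(children)

population = rng.integers(0, 2, size=(POP_SIZE, N_POINTS))      # initial random masks
fitness = rng.random(POP_SIZE)                                  # stand-in for k-NN accuracy
population = next_generation(population, fitness)
print(population.shape)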
Table 1. Standard configuration of genetic algorithms used for feature selection and neural network architecture
[00263] The first genetic algorithm (GA1) selected 196 features after 157 iterations. Using all 1,060 MIR wave points, pregnancy status could be predicted with an accuracy of 0.8225. The reduced feature set achieved an accuracy of 0.8501 and the average of the population was 0.8477. An increase of more than 2% in accuracy while using 18.49% of the original feature set was favourable and was less computationally demanding. Concerns that too much information had been removed were addressed by GA2, which showed that the fittest individual could be further trained to an accuracy of 0.8731, and the 196 wave points were used in GA3 to obtain an optimum neural network architecture. GA3 suggested a convolutional deep neural network with a Softmax activation function (Equation 2). The Softmax activation function is a normalised exponential function for multiclass classification and is applied to the output layer of the classifier. [00264] Subsequent training of the neural network on the full dataset of 3 million records and 196 features converged after 162 epochs. The validation accuracy and loss are summarised in Figure 2. The training accuracy reached its peak at step 227,413 with a value of 0.90. Despite this, the model reached its lowest loss at step 729,142 with a value of 0.18 and an accuracy of 0.89. Model 1 was not considered for further evaluation and inference due to the relatively high training loss, although it was noted that the accuracy achieved was higher than with the k-nearest neighbours algorithm. Model 2 [00265] The training accuracy and losses of Model 2 for each epoch are summarized in Figure 3. Accuracy improved rapidly from the start of training until epoch 33 to 0.925 and thereafter increased at a lower rate to epoch 100 (0.955). The second phase of training showed an initial deterioration of accuracy, but this improved by epoch 157 and subsequently the accuracy converged to 0.9725. Similarly, the losses showed rapid improvement from the start of training followed by a gradual improvement for the first phase of training. Training and validation loss converged at 0.057909 and 0.080359, respectively. [00266] A confusion matrix of the inference dataset of Model 2 is shown in Table 2.
Table 2. Model 2 performance. Precision, recall and F1-scores from inference using Model 2
1 Precision (i.e., positive predictive value) = TP / (TP + FP)
2 Recall (i.e., sensitivity) = TP / (TP + FN)
3 F1-score = 2 x (precision x recall) / (precision + recall)
4 Accuracy = (TP + TN) / (TP + TN + FP + FN)
Where TP, TN, FP, and FN represent total numbers of true positive, true negative, false positive, and false negative predictions, respectively
[00267] Overall accuracy of prediction was 0.877 with a recall (sensitivity) of 0.894 and precision (positive predictive value) of 0.8646. Recall is inversely related to the false negative rate (FNR) and showed the model had a low incidence of falsely predicting pregnant animals as non-pregnant. The F1-score (i.e., the harmonic mean of precision and recall) was 0.8791 and corresponded well with the overall accuracy of the test. PLS-DA Model [00268] Results from the PLS-DA are summarized in Table 3.
Table 3. PLS-DAa model performance. Precision, recall and F1-scores from 10-fold cross validation of the PLS-DA model
a Partial Least Squares Discriminant Analysis
1 Precision (i.e., positive predictive value) = TP / (TP + FP)
2 Recall (i.e., sensitivity) = TP / (TP + FN)
3 F1-score = 2 x (precision x recall) / (precision + recall)
4 Accuracy = (TP + TN) / (TP + TN + FP + FN)
Where TP, TN, FP, and FN represent total numbers of true positive, true negative, false positive, and false negative predictions, respectively
[00269] Overall accuracy of the cross validation was 0.77 with a recall, precision and F1 score of 0.73, 0.80 and 0.76, respectively. Specificity was relatively high (0.82) and again overall accuracy and F1 score corresponded well. DISCUSSION [00270] The genetic algorithms proved to be an efficient technique in identifying features in MIR spectral data. The 196 wave points selected by the GA aligned with the wave points selected from the OptiMIR project (Friedrichs et al., 2015). The genetic algorithms proved to be versatile in their applications and were easily interpreted. Ultimately, Model 1 was not considered appropriate for further interrogation due to its higher loss metric. Convolutional neural networks are widely used for classifying images (Yim et al., 2015) and use padding as a sub-sampling tool (Srivastava et al., 2015) to remove background noise from the edges of images. Zero-padding was specifically not used in the architecture of the CNN since the 196 features were already sub-sampled and it was imperative that feature detection occurred on the edges of the convolutional layers. Influence of Stage of Lactation [00271] Training records classified as not-pregnant were records obtained prior to first insemination and, therefore, early in lactation, as opposed to pregnant records that were generally later in lactation. Initial concerns that stage of lactation was being predicted instead of pregnancy status were not substantiated when examining the predictions, as predicted onset of pregnancy varied substantially in the results and no linear trend could be found. In a previous trial, days in milk (DIM) was fitted as an additional feature and training accuracies were above 0.97. The model was able to predict pregnancy status with high accuracy based solely on stage of lactation and could not identify a single record where a pregnancy was terminated during the lactation. An almost linear increase in the probability of pregnancy was observed as days in milk increased. It was concluded that stage of lactation could instead be used to adjust the labels rather than being used as a feature, for example by labelling the data as non-pregnant, early-pregnant and late-pregnant; as such, DIM was not fitted or made available to the models developed in the present study. Advantage of Transfer Learning [00272] Transfer learning has the advantage that a robust model for a specific target domain can be obtained by transferring knowledge contained in a different, but related, source domain (Zhuang et al., 2019). By default, this implies that less training data is required to achieve the target model. Model 2 was relatively easy to train with transfer learning as no prior configuration or investigation on network design was required. Training on spectral images was efficient and faster than parsing text files and converting data types as with Model 1. The results showed the capability of the DenseNet model to extract and engineer high level features from the MIR images.
Figure 3 showed no indication of over-fitting (where the model is optimised to predict the training dataset only and generalises poorly), which is common in datasets with high complexity (Ghojogh and Crowley, 2019). Regarding the deterioration of accuracy and loss immediately after 100 epochs in Model 2: training of the deep convolutional layers started from epoch 101, and the deterioration showed that the assigned learning rate was not optimal. Several learning rates were trialled, but all showed a sudden deterioration in accuracy and loss. A smoother transition may have resulted in an improved model, but these “golden” learning rates could not be obtained and the best learning rate was found between 1e-4 and 1e-6 for phase 2 of training. The learning rate is one of the most important hyperparameters for a neural network, but is network and data specific (Howard and Gugger, 2020). [00273] Table 2 shows that 12.3% of the predictions in the inference dataset were incorrect (false positives and false negatives). Accuracy of predictions can be misleading because it is discontinuous, especially in the case of binary classification. For example, consider a binary prediction with Softmax activation (Equation 2) of 0.49 and 0.51 for labels 0 and 1, respectively. If the actual record has a label of 1, the prediction would be 100% correct and if the label was 0, the prediction would be 100% incorrect. It is, however, clear from the Softmax prediction that the probabilities of both labels are almost equal. From Table 2, the average probabilities of true positive and true negative predictions were 0.971 and 0.968, respectively. In contrast, the average probabilities of false positive and false negative predictions were 0.898 and 0.892, respectively. This suggests that a further distinction can be made in practice by considering predictions with probabilities lower than 0.95 as inconclusive. Table 4 is the confusion matrix of only “conclusive” predictions. The accuracy of predictions improves from 0.877 to 0.9125 and the F1 score changes accordingly to 0.9142. A sensitivity and specificity of 0.91 and 0.92 are obtained from these results. Results found in literature from pregnancy-associated glycoprotein in dairy cows ranged from 0.96 to 0.99 for sensitivity and 0.87 to 0.95 for specificity (Commun et al., 2016; Dufour et al., 2017; Shephard and Morton, 2018). A point of concern is that 166 predictions were considered inconclusive when applying a minimum threshold for probability.
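As a minimal illustration of this "inconclusive" filtering, the sketch below sets aside predictions whose maximum Softmax probability falls below 0.95 before computing accuracy; the probability and label arrays are illustrative placeholders for a trained model's output, not actual results.

# Minimal sketch: filtering out low-confidence (inconclusive) predictions.
import numpy as np

rng = np.random.default_rng(0)
probs = rng.dirichlet(alpha=[1.0, 1.0], size=1000)   # placeholder Softmax outputs (2 classes)
truth = rng.integers(0, 2, size=1000)                # placeholder true labels

confidence = probs.max(axis=1)
conclusive = confidence >= 0.95
preds = probs.argmax(axis=1)

print("inconclusive predictions:", int((~conclusive).sum()))
print("accuracy on conclusive predictions:",
      float(np.mean(preds[conclusive] == truth[conclusive])))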
Table 4. Model 2 performance. Precision, recall and F1-scores from inference using Model 2 when considering predictions with probabilities over 0.95
1 Precision (i.e., positive predictive value) = TP / (TP + FP)
2 Recall (i.e., sensitivity) = TP / (TP + FN)
3 F1-score = 2 x (precision x recall) / (precision + recall)
4 Accuracy = (TP + TN) / (TP + TN + FP + FN)
Where TP, TN, FP, and FN represent total numbers of true positive, true negative, false positive, and false negative predictions, respectively
Comparison with Previous Studies [00274] Our study is not the first to investigate the utility of using milk MIR spectra in attempting to diagnose pregnancy in dairy cows, but we believe it is the first to attempt to do so using deep learning. As highlighted in our introduction, previous studies have attempted to calibrate milk MIR spectra to predict pregnancy status in dairy cows, reporting accuracies of 0.90 (Lainé et al., 2014; based on sensitivity and specificity); 0.60 (Toledo-Alvarado et al., 2018; based on area under the receiver operator curve); and more recently, 0.65 to 0.76 (Delhez et al., 2020; based on area under the receiver operator curve). Prediction equations from these studies were developed using both residual- (Lainé et al., 2014; Delhez et al., 2020) and whole-spectrum MIR profiles (Toledo-Alvarado et al., 2018). Each of these studies highlighted the potential of milk MIR spectra as a predictor of pregnancy status. [00275] Lainé et al. (2014), using a discriminant analysis (DA) approach, were able to successfully discriminate between residual spectra from pregnant and non-pregnant cows with a sensitivity of 99.7% and specificity of 86.2% during cross-validation. Residual spectra were generated by subtracting expected open spectra (obtained via a mixed model) from observed spectra. Accuracy was reported to drop significantly (up to 50%) during external validation (Delhez et al., 2020), and an error rate of 55.5% was observed when applied to raw spectra (Lainé et al., 2014). [00276] Toledo-Alvarado et al. (2018), using whole-spectrum MIR from multiple breeds, predicted pregnancy status via generalized linear models fitting a combination of effects (DIM, parity, herd year) in addition to spectra, as well as from milk components. The best accuracies were obtained (area under curve) when herd and year were included with the spectra; lowest prediction accuracy was observed in the Holsteins (0.61). [00277] Delhez et al. (2020) adopted a PLS-DA approach and investigated 3 different strategies to discriminate between pregnant and non-pregnant cows based on: 1) a single spectrum post insemination, similar to Toledo-Alvarado et al. (2018), but with the addition of including cows with no calving records; 2) residual spectra similar to Lainé et al. (2014), but using only observed spectra (not modelled); and, 3) grouping records by period after insemination. Delhez et al. (2020) reported accuracies (area under curve) of 0.63 and 0.65 for training and testing, respectively, for strategy 1 (with corresponding sensitivity and specificity during testing of 0.65 and 0.56, respectively). For strategy 2, results were similar during testing with an accuracy, sensitivity, and specificity of 0.58, 0.59, and 0.52, respectively. The third strategy observed promising results for records 151d+ post insemination, reporting an average accuracy, sensitivity, and specificity of 0.76, 0.73, and 0.64, respectively.
[00278] We observed significantly higher prediction accuracies than the studies highlighted above, 88% increasing to 91% when only considering predictions with a confidence over 0.95; this is especially the case when not considering previous results from residual spectra (only observed spectra were used in the development of our models). These higher accuracies may be attributed to a combination of factors, including our use of a deep learning approach, phenotype definition, and volume of available data. Moreover, the results obtained by applying a PLS-DA to our data achieved similar accuracies to those obtained by the previous studies discussed above; we observed an accuracy, sensitivity, and specificity of 0.77, 0.73, and 0.82, respectively, compared to the accuracy, sensitivity, and specificity of 0.76, 0.73, and 0.64, respectively, obtained by Delhez et al. (2020). Additionally, when comparing the PLS-DA method with the DL method used in the development of model 2, not only did we achieve higher accuracies across all metrics calculated using DL (0.91 compared to 0.77) but the development time was also vastly reduced – especially when considering the data used in the PLS-DA was a (random, balanced) subset of that used to train models 1 and 2. [00279] Use of deep learning in the agricultural space has been limited to date (Howard, 2018), and as such has been met with reservation and suspicion – rightly so without solid proof of validation and evidence of application. Our results highlight a high accuracy, sensitivity and specificity during training, validation and testing. The training of DL networks involves a methodology that is similar to a combination of k-fold cross-validation and external validation. After each iteration of the training data (i.e., calibration) the resulting model is then applied to a set of validation data with results used to update the weights and biases at each node in the network, optimizing the model. The final optimized model is then further applied to an external test dataset – the test set is independent of the training and validation sets and simulates a live prediction scenario. Thus we believe this method of train-validate-test provides a robust indication of model performance. [00280] Definition of the pregnancy status phenotype is an important aspect of MIR-based prediction. Good quality and clean phenotypes are not only a crucial requirement of deep learning models (i.e., the labels) but are also an important requirement of any predictive modelling. In each of the three previous studies, and in our own study, the way in which pregnant and non-pregnant (or open) cows are defined differs. It is our belief that by defining non-pregnant records as those between parturition and first insemination we can say with 100% certainty that such records are representative of the non-pregnant class; similarly, for pregnant records (as those between the last insemination and the subsequent calving with a gestation length between 240 and 284 d). This gives us a robust phenotype to pass to the deep learning network. [00281] Finally, it is worth noting the differences in data volume available to each of the previous studies in comparison to our own. Previously developed models by Lainé et al. (2014), Toledo-Alvarado et al. (2018), and Delhez et al. (2020) used spectra from 68,998, 69,821, and 8,064 cows, respectively; the present study had access to UK national data from 697,671 cows obtained via monthly milk recording over an 8-year period.
Moreover, the application of transfer learning greatly reduced the amount of data required to train models, enabling us to create a training dataset containing equal numbers of the most accurate phenotypes possible. This, combined with testing on (random) unseen data from throughout lactation (results in Tables 2 and 3), appears to give a good indicator of pregnancy status; a final test of the model’s ability to discriminate pregnant from non-pregnant cows will be obtained through live field testing. CONCLUSIONS [00282] Deep learning has been shown to be a viable tool in understanding complex data and generation of predictions in new datasets. We believe the present study to be the first to successfully predict pregnancy status (with high accuracy) of dairy cows from observed milk MIR spectral data using a deep learning approach. Convolutional neural networks were found to be an appropriate network architecture to predict pregnancy status from MIR spectra and allowed greater sub-sampling of features (Model 1). Transfer learning proved a viable option for creating high quality models ready for industry application (91% accuracy during testing). Prediction equations from Model 2 can be applied by industry as part of routine milk recording as a cost-effective monitoring tool to identify possible errors in data recording practices; to verify conception dates; and to alert farmers of nonviable or lost pregnancies as early as possible. Such a tool would also provide an effective enabling service allowing the farmer to take ownership of the health and fertility of their herd. Finally, such extra information can be generated requiring no additional input or labour on behalf of the farmer or any changes in herd management, and importantly, is non-invasive to the cow. REFERENCES
[00283] Abadi, M., A. Agarwal, P. Barham, E. Brevdo, Z. Chen, et al. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems.
[00284] Brand, W., A.T. Wells, and M.P. Coffey. 2018. Predicting pregnancy status from mid-infrared spectroscopy in dairy cow milk using deep learning. Page 347 in Abstracts of the 2018 Annual Meeting of the American Dairy Science Association. Journal of Dairy Science (vol 101, suppl 2), Knoxville, Tennessee, USA.
[00285] Ciresan, D., U. Meier, J. Masci, L.M. Gambardella, and J. Schmidhuber. 2011. Flexible, High Performance Convolutional Neural Networks for Image Classification. Pages 1237–1242 in International Joint Conference on Artificial Intelligence IJCAI, Barcelona, Catalonia, Spain.
[00286] Commun, L., K. Velek, J.B. Barbry, S. Pun, A. Rice, A. Mestek, C. Egli, and S. Leterme. 2016. Detection of pregnancy-associated glycoproteins in milk and blood as a test for early pregnancy in dairy cows. J. Vet. Diagnostic Investig. 28:207–213.
[00287] Dehareng, F., C. Delfosse, E. Froidmont, H. Soyeurt, C. Martin, N. Gengler, A. Vanlierde, and P. Dardenne. 2012. Potential use of milk mid-infrared spectra to predict individual methane emission of dairy cows. Animal 6:1694–1701.
[00288] Delhez, P., P.N. Ho, N. Gengler, H. Soyeurt, and J.E. Pryce. 2020. Diagnosing the pregnancy status of dairy cows: How useful is milk mid-infrared spectroscopy? J. Dairy Sci. 103:3264–3274.
[00289] Denholm, S.J., W. Brand, A.P. Mitchell, A.T. Wells, T. Krzyzelewski, S.L. Smith, E. Wall, and M.P. Coffey. 2020. Predicting bovine tuberculosis status of dairy cows from mid-infrared spectral data of milk using deep learning. J. Dairy Sci.
[00290] Dufour, S., J. Durocher, J. Dubuc, N. Dendukuri, S. Hassan, and S.
Buczinski.2017. Bayesian estimation of sensitivity and specificity of a milk pregnancy-associated glycoprotein-based ELISA and of transrectal ultrasonographic exam for diagnosis of pregnancy at 28–45 days following breeding in dairy cows. Prev. Vet. Med.140:122–133. [00291] Friedrichs, P., C. Bastin, F.Dehareng, B. Wickham, and X. Massart.2015. Final OptiMIR Scientific and Expert Meeting : From milk analysis to advisory tools ( Palais des Congrès , Namur, Belgium). Pages 97–124 in Biotechnology, Agronomy, Society and Environment, Namur, Belgium. [00292] Ghojogh, B., and M. Crowley.2019. The Theory Behind Overfitting, Cross Validation, Regularization, Bagging, and Boosting: Tutorial. arXiv e-prints. [00293] Grelet, C., J.A. Fernández Pierna, P. Dardenne, V. Baeten, and F. Dehareng.2015. Standardization of milk mid-infrared spectra from a European dairy network. J. Dairy Sci. 98:2150–2160. [00294] Howard, J.2018. Deep Learning: The tech that’s changing everything, except animal breeding and genetics. Page in Proceedings of the World Congress on Genetics Applied to Livestock Production, Auckland. [00295] Howard, J., and S. Gugger.2020. Fastai: A Layered API for Deep Learning. Information 11:108. [00296] Huang, G., Z. Liu, L. van der Maaten, and K.Q. Weinberger.2017. Densely Connected Convolutional Networks. Pages 2261–2269 in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. [00297] Jacobsen, J.-H., E. Oyallon, S. Mallat, and A.W.M. Smeulders.2017. Multiscale Hierarchical Convolutional Networks. arXiv e-prints. [00298] Lainé, A., C. Bastin, C. Grelet, H. Hammami, F.G. Colinet, et al.2017. Assessing the effect of pregnancy stage on milk composition of dairy cows using mid-infrared spectra. J. Dairy Sci.100:2863–2876. [00299] Lainé, A., H. Bel Mabrouk, L. Dale, C. Bastin, and N. Gengler.2014. How to use mid-infrared spectral information from milk recording system to detect the pregnancy status of dairy cows. Comm. Appl. Biol. Sci 79:33–38. [00300] Lecun, Y., Y. Bengio, and G. Hinton.2015. Deep learning. Nature 521:436–444. [00301] De Marchi, M., V. Toffanin, M. Cassandro, and M. Penasa.2014. Invited review: Mid-infrared spectroscopy as phenotyping tool for milk traits. J. Dairy Sci.97:1171–1186. [00302] McParland, S., G. Banos, E. Wall, M.P. Coffey, H. Soyeurt, R.F. Veerkamp, and D.P. Berry.2011. The use of mid-infrared spectrometry to predict body energy status of Holstein cows. J. Dairy Sci.94:3651–3661. [00303] Muhammd, F., A. Sarwar, and C.S. Hayat.2000. Peripheral plasma progesterone concentration during early pregnancy in Holstein Friesian Cows. Pak. Vet. J.20:166–168. [00304] Olori, V.E., S. Brotherstone, W.G. Hill, and B.J. McGuirk.1997. Effect of gestation stage on milk yield and composition in Holstein Friesian dairy cattle. Livest. Prod. Sci. 52:167–176. [00305] Pan, S.J., and Q. Yang.2010. A survey on transfer learning. IEEE Trans. Knowl. Data Eng.22:1345–1359. [00306] Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, et al.2011. Scikit- learn: Machine Learning in Python. J. Mach. Learn. Res.12:2825–2830. [00307] Penasa, M., M. De Marchi, and M. Cassandro.2016. Short communication: Effects of pregnancy on milk yield, composition traits, and coagulation properties of Holstein cows. J. Dairy Sci.99:4864–4869. [00308] Roelofs, J., F. López-Gatius, R.H.F. Hunter, F.J.C.M. van Eerdenburg, and C. Hanzen.2010. When is a cow in estrus? Clinical and practical aspects. Theriogenology 74:327–344. [00309] van Rossum, G.1995. 
Python tutorial, Technical Report CS-R9526. Available at www.python.org. Centrum voor Wiskunde en Informatica (CWI), Amsterdam. [00310] Sheldon, M., and D. Noakes.2002. Pregnancy diagnosis in cattle. In Pract.24:310– 317. [00311] Shephard, R.W., and J.M. Morton.2018. Estimation of sensitivity and specificity of pregnancy diagnosis using transrectal ultrasonography and ELISA for pregnancy-associated glycoprotein in dairy cows using a Bayesian latent class model. N. Z. Vet. J.66:30–36. [00312] Smith, S., S.J. Denholm, M.P. Coffey, and E. Wall.2019. Energy profiling of dairy cows from routine milk mid-infrared analysis. J. Dairy Sci.102:11169–11179. [00313] Soyeurt, H., C. Bastin, F.G. Colinet, V.M.R. Arnould, D.P. Berry, et al.2012. Mid- infrared prediction of lactoferrin content in bovine milk: Potential indicator of mastitis. Animal 6:1830–1838. [00314] Soyeurt, H., P. Dardenne, F. Dehareng, G. Lognay, D. Veselko, M. Marlier, C. Bertozzi, P. Mayeres, and N. Gengler.2006. Estimating fatty acid content in cow milk using mid-infrared spectrometry. J. Dairy Sci.89:3690–5. [00315] Soyeurt, H., F. Dehareng, N. Gengler, S. McParland, E. Wall, D.P. Berry, M.P. Coffey, and P. Dardenne.2011. Mid-infrared prediction of bovine milk fatty acids across multiple breeds, production systems, and countries. J. Dairy Sci.94:1657–1667. [00316] Srivastava, R.K., K. Greff, and J. Schmidhuber.2015. Highway Networks. arXiv e- prints. [00317] Toffanin, V., M. De Marchi, N. Lopez-Villalobos, and M. Cassandro.2015. Effectiveness of mid-infrared spectroscopy for prediction of the contents of calcium and phosphorus, and titratable acidity of milk and their relationship with milk quality and coagulation properties. Int. Dairy J.41:68–73. [00318] Toledo-Alvarado, H., A.I. Vazquez, G. de los Campos, R.J. Tempelman, G. Bittante, and A. Cecchinato.2018. Diagnosing pregnancy status using infrared spectra and milk composition in dairy cows. J. Dairy Sci.101:2496–2505. [00319] Wojciechowski, K.L., and D.M. Barbano.2016. Prediction of fatty acid chain length and unsaturation of milk fat by mid-infrared milk analysis1. J. Dairy Sci.99:8561–8570. [00320] Yim, J., J. Ju, H. Jung, and J. Kim.2015. Image classification using convolutional neural networks with multi-stage feature. Pages 587–594 in Advances in Intelligent Systems and Computing. Springer Verlag. Zhuang, F., Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, and Q. He.2019. A Comprehensive Survey on Transfer Learning. arXiv e-prints. EXAMPLE 2 - PREDICTING bTB STATUS FROM MILK SPECTRAL DATA [00321] Milk mid-infrared spectral data collected as part of routine monthly milk recording were used to train deep artificial neural networks in order to predict bovine tuberculosis (bTB) status in dairy cows. Prediction accuracy of the network was 95% (with a sensitivity and specificity of 0.96 and 0.94, respectively) demonstrating the benefit of utilizing routine milk recording as a non-invasive method to alert farmers of cows potentially affected by bTB. This non-invasive method of rapid, routine prediction would enable alerted cows to be isolated or removed earlier and restrict the spread of bTB within the herd. [00322] Bovine tuberculosis (bTB) is a zoonotic disease of cattle that is transmissible to humans, is distributed worldwide, and considered endemic throughout much of England and Wales. 
Mid-infrared (MIR) analysis of milk is used routinely to predict fat and protein concentration but is also a robust predictor of several other economically important traits from individual fatty acids to body energy. This study predicted bTB status of UK dairy cows using their MIR spectral profiles collected as part of routine milk recording. Bovine TB data were collected as part of the national bTB testing program for Scotland, England, and Wales (GB); these data provided information from over 40,500 bTB herd breakdowns. Corresponding individual cow life history data were also available and provided information on births, movements, and deaths of all cows in the study. Data relating to single intradermal comparative cervical tuberculin (SICCT) skin test results, culture, slaughter status, and presence of lesions were combined to create a binary bTB phenotype; labelled 0 to represent non-responders (i.e., healthy cows) and 1 to represent responders (i.e., bTB- affected cows). Contemporaneous individual milk MIR spectral data were collected as part of monthly routine milk recording and matched to bTB status of individual animals on SICCT test date (±15d). Deep learning, a sub-branch of machine learning, was used to train artificial neural networks and develop a prediction pipeline for subsequent use in national herds as part of routine milk recording. Spectra were first converted to 53×20-pixel PNG images then used to train a deep convolutional neural network (CNN). Deep CNNs resulted in a bTB prediction accuracy (i.e., the number of correct predictions divided by the total number of predictions) of 71% after training for 278 epochs. This was accompanied by both a low validation loss (0.71) and moderate sensitivity and specificity (0.79 and 0.65, respectively). In order to balance data in each class, additional training data were synthesized using the synthetic minority over sampling technique (SMOTE). Accuracy was further increased to 95% (after 295 epochs) with corresponding validation loss minimized (0.26) when synthesized data were included during training of the network. Sensitivity and specificity also saw a 1.22- and 1.45-fold increase to 0.96 and 0.94, respectively, when synthesized data were included during training. We believe this study to be the first of its kind to predict bTB status from milk MIR spectral data. We also believe it to be the first study to use milk MIR spectral data to predict a disease phenotype and posit that the automated prediction of bTB status at routine milk recording could provide farmers with a robust tool enabling them to make early management decisions on potential reactor cows, and thus help slow the spread of bTB. [00323] Different physiological processes can leave molecular signatures in the milk of dairy cows (Soyeurt et al., 2006). Such signatures can potentially be detected by analyzing mid- infrared (MIR) spectral data, a by-product resulting from routine milk recording, and used as biomarkers for economically important traits (Soyeurt et al., 2006, 2011). Mid-infrared spectroscopy of milk samples is an internationally used non-invasive method for the prediction of milk fat and protein content during routine milk recording. This method of prediction is increasingly being used as an efficient and effective low-cost tool for rapid prediction of expensive and, more often than not, difficult-to-record phenotypes. 
The utility of using milk MIR spectra as a phenotyping tool has become an increasingly popular area of research over the last 15+ years (Berry et al., 2013; De Marchi et al., 2014) with success demonstrated in the prediction of milk fatty acids (Soyeurt et al., 2011); body energy (McParland et al., 2011; Smith et al., 2019); methane emissions (Dehareng et al., 2012); ketone bodies (Grelet et al., 2016); lactoferrin (Soyeurt et al., 2012); feed intake (Wallén et al., 2018); and, pregnancy status (Lainé et al., 2014; Toledo-Alvarado et al., 2018; Delhez et al., 2020). Further, such research has resulted in successful international and multidisciplinary collaborative projects such as RobustMilk (Veerkamp et al., 2013) and OptiMIR (Friedrichs et al., 2015). Moreover, for farmers already involved in routine milk recording, obtaining additional MIR spectra-based herd information requires no extra staff/labor costs or any changes in herd management. Regarding milk recording agencies, it offers additional services that can be offered to dairy farmers for only incremental data handling costs. [00324] Large datasets such as those containing MIR spectral records offer an exceptional opportunity to exploit the power of machine learning algorithms to investigate and better understand relationships between milk spectra and traits of importance that may go otherwise unnoticed using other, or unsuitable, statistical techniques. Deep learning, a sub- branch of machine learning, employs algorithms and techniques that are better able to make use of the increasingly huge datasets and advances in computer technology of the present day (Bengio, 2009; Jia Deng et al., 2009; Krizhevsky et al., 2012; Lecun et al., 2015). [00325] Recently our group applied a deep convolutional neural network (CNN) to MIR- matched pregnancy data in order to predict the pregnancy status of dairy cows (Brand et al., 2018). We observed that milk MIR spectra contained features relating to pregnancy status and underlying metabolic changes in dairy cows, and that such features can be identified using a deep learning approach. In our study we defined pregnancy status as a binary trait (i.e., pregnant, not-pregnant) and found CNNs significantly improved prediction accuracy with trained models able to detect 83% and 73% of onsets and losses of pregnancy, respectively (Brand et al., 2018). More recently the inventors have improved prediction accuracy such that models predict pregnancy status with an accuracy of 97% (with a corresponding validation loss of 0.08) after training for 200 epochs (unpublished, submitted). [00326] Having proven the concept of training MIR spectra to predict a categorical (binary) trait using a deep learning approach, i.e., pregnancy status in dairy cows, the technique has been extended to predicting other hard to record phenotypes from MIR spectral data; specifically disease traits such as bovine tuberculosis (bTB). [00327] Bovine tuberculosis is a zoonotic disease endemic in the UK and Ireland and is distributed worldwide in parts of Africa, Asia, Europe, the Middle East, the Americas and New Zealand (Humblet et al., 2009). This chronic, slowly progressive and debilitating disease presents a significant challenge to the UK cattle sector, and additionally, has considerable public health implications in countries where it is not subject to mandatory eradication programs (Olea-Popelka et al., 2017). The disease is caused by Mycobacterium bovis (M. 
bovis) infection primarily involving the upper and lower respiratory tract and associated lymph nodes (Pollock and Neill, 2002). The Department for Environment, Food and Rural Affairs (Defra) lists bTB as one of the four most important livestock diseases globally, incurring annual costs of about £175 million in the UK. In 2017 the total numbers of cows slaughtered due to bTB (i.e., all cows defined as reactors and inconclusive reactors) in England, Wales, and Scotland were 33,238, 10,053, and 273 cows, respectively, equating to a 14%, 1%, and 46% increase in the number of cows slaughtered compared to 2016 (Department for Environment, Food and Rural Affairs, 2018). The disease affects animal health and welfare, causing substantial financial strain to the dairy cattle sector worldwide through involuntary culling, animal movement restrictions and the cost of control and eradication programs (Allen et al., 2010). Moreover, the disease also has significant, and often unseen, social and psychological impact with farmers mental health being particularly affected (Parry et al., 2005; FarmingUK, 2018; Crimes and Enticott, 2019). [00328] Recent research has led to the development of the world’s first national genetic and genomic evaluation for bTB resistance of the Holstein dairy breed in the UK and the launch of the index TB Advantage (AHDB Dairy, 2016; Banos et al., 2017). Research confirmed the existence of significant genetic variation among individual animals for resistance to bTB infection, mainly inferred from the single intradermal comparative cervical tuberculin (SICCT) skin test and the presence of lesions and bacteriological tests following slaughter (Pollock and Neill, 2002; Bermingham et al., 2009; Brotherstone et al., 2010; Tsairidou et al., 2014). Initial research on dairy genetic evaluations for bTB has now been extended to all dairy breeds. [00329] The objective of the present study was to use phenotypic reference data obtained from the GB national bTB testing program, combined with concurrent milk MIR spectral data from routine milk recording, to train deep artificial neural networks in order to develop a prediction pipeline for bTB status. Such a tool would enable prediction of bTB status from milk MIR spectral data alone and could be used as an early alert system, for example, as part of routine milk recording. MATERIALS AND METHODS Animals [00330] Cow (n= 1,678,165) data were from national herds involved in routine milk recording with National Milk Records (NMR) and were distributed across Great Britain. National Milk Records are the leading supplier of milk recording services in the UK, processing a daily herd-level bulk milk sample from 97% of UK farms as well as a monthly individual milk sample from 60% of the individual cows in the UK (National Milk Records, 2019). Since 2013 SRUC have received spectral data daily, in addition to milk composition and pedigree information for cows from over 4,900 commercial farms across the UK 3 times per year. The majority of cows in this study were Holstein-Friesians (81%), followed by Belted Galloway (9%), Jersey (3%), Ayrshire (1%), Brown Swiss (0.8%), Swedish Red & White (0.8%), and Guernsey (0.7%). The data also included small numbers of other dairy breed and crosses (< 3.7%). Bovine Tuberculosis data [00331] Bovine TB data were made available by the Animal and Plant Health Agency (APHA) and were collected via the Great Britain national bTB testing program. 
These data provided information from over 40,500 confirmed and unconfirmed bTB herd breakdowns between October 2001 and January 2018, including breakdown start and end dates, breakdown duration, animal age at breakdown, SICCT skin test date, lesion status, SICCT skin test result, culture result, and slaughter status. Only data relating to dairy cows were considered in our study. Cattle Movements data [00332] Data relating to cattle births, movements, and deaths were supplied by the British Cattle Movements Service (BCMS). These data contained individual information relating to date, time and location of all births and deaths as well as age at death. Additionally, processed data (i.e., calculated from the raw data) relating to any individual cattle movements were available with corresponding dates, locations (to and from), length of stays, distances travelled, location types (e.g., agricultural holding, slaughterhouse, etc.). These data were matched to concurrent bTB profiles of each cow in the study. Mid-infrared Spectral Data [00333] Milk Sampling and Mid-infrared Spectral Analysis. Milk sampling of individual cows occurred at 30-day intervals between January 2012 and August 2019 as part of a routine milk recording service provided to farmers on a subscription basis. In addition to daily bulk milk testing NMR carry out MIR analysis of individual cow milk samples as part of their routine milk recording services. For the present study we focused on these routinely collected individual samples. Mid-infrared spectrometry of milk samples was carried out by National Milk Laboratories (NML, Wolverhampton, UK), part of the NMR group, using FOSS FTIR spectrometers (FOSS Electric A/S, Hillerød, Denmark). The FOSS machines use an interferometer and the Fourier Transform Infrared (FT-IR) technique within the MIR region of wavelengths from 900 to 5000 cm-1 to generate spectra (FOSS, 2016). [00334] Pre-treatment and Standardization of Mid-Infrared Spectral Data. Following MIR analysis, a spectrum of 1,060 ‘transmittance’ data points are generated; these data represent the absorption of infrared light through the milk sample. Prior to use in any analyses the spectra were subject to several pre-treatments. Firstly, the transmittance data obtained from the spectrometer were converted to a linear absorbance scale by applying a log10 -0.5 transformation to the reciprocal of the transmittance (Soyeurt et al., 2011). Secondly, spectral data were ‘standardized’ in order to account for drift incurred by collection of spectral data from different MIR instruments and across time (Grelet et al., 2015). Standardization was carried out using files supplied by the Walloon Agricultural Research Centre (CRA-W) and following protocols developed within the InterReg/EU-funded project OptiMIR (Friedrichs et al., 2015). Standardization of the spectra as above has the added value of ensuring resultant prediction tools can be applied to data streams from other machines throughout Europe that have adopted the same standardization procedure (Grelet et al., 2015) and that predictions can be compared across time because drift in the machines is accounted for. Creation of Training and Testing Datasets for Deep Learning [00335] Definition of bTB Phenotype. The bTB phenotype was created for each cow using data relating to SICCT skin test results, culture status, whether a cow was slaughtered, and whether any lesions were observed, all at the individual level. 
Information from each of these categories (where available) was combined to create a binary phenotype; labeled 0 to represent non-responders (i.e., healthy cows) and 1 to represent responders (i.e., bTB-affected cows). For example, if a skin test was inconclusive but data indicated the cow was slaughtered and there was a positive observation of lesions then this record was labeled as 1. Similarly, if a skin test suggested a non-responder but lesions were observed then this record was also labeled as 1, etc. Records were only ever labeled 0 when the skin test result combined with information relating to slaughter, culture, and lesions did not indicate the presence of bTB. [00336] Alignment of Spectral Data to bTB profile. For each cow in the dataset, bTB phenotype data (as described above) were matched to their concurrent milk MIR spectral data on sample date, i.e., the date of individual SICCT skin testing and individual milk sampling for bTB and spectral data, respectively. If no milk spectral data were collected on the same day as a SICCT skin test, the milk spectral sample closest to the skin test date was used, with a maximum tolerance of ±15 days. [00337] Data Preparation. In order to investigate the degree of accuracy of the bTB phenotype, as well as the impact of herd location, three distinct datasets were created. In all 3 datasets, responders were selected from confirmed bTB breakdown herds with non-responders selected as follows: 1) non-responders selected from herds with no confirmed responders; 2) non-responders selected from the same herd breakdown as responders; and 3) non-responders that eventually test positive for bTB but where the time between a negative (non-responder) and positive (responder) result was greater than 183 days (i.e., a period of time sufficiently long to have observed multiple tests). Finally, datasets were randomly partitioned into training and validation sets for use in model development via deep learning. Datasets were partitioned such that approximately 80% of the data appeared in the training set with the remaining 20% in the validation set. Both training and validation data were balanced such that each set contained approximately equal numbers of reactors and non-reactors. Deep Learning [00338] Hardware and Software Requirements. To successfully utilize the power of deep learning in a timely manner, certain hardware and software requirements needed to be met. The full system specifications used in the present study are presented in Table 5 and summarised as follows: NVIDIA DGX Station personal AI supercomputer (NVIDIA Ltd., 2019) fitted with 4 NVIDIA Tesla V100 graphics processing units (GPU); Linux (Ubuntu) operating system; Python 3.5 Virtual Environment running within a Docker container; and PyTorch-GPU. PyTorch is an open source machine learning library (released under the Modified BSD license) developed by Facebook's artificial intelligence research group for use in research & development as well as production systems (Paszke et al., 2017). The GPU-enabled version of PyTorch offers enhanced processing speeds compared to the central processing unit (CPU) version.
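By way of illustration only, the labelling and alignment logic of paragraphs [00335] to [00337] could be sketched in Python along the following lines. The column names (animal_id, test_date, sample_date) and the boolean evidence fields are assumptions introduced for this sketch and are not taken from the study's actual data layout; in practice the evidence fields would be derived from the SICCT, culture, slaughter, and lesion records described above.

```python
# Minimal sketch (assumed column names) of the bTB labelling, spectra
# alignment (+/- 15 days), and 80/20 partitioning steps described above.
import pandas as pd

def btb_label(row):
    """Return 1 (responder) if any available evidence indicates bTB, else 0."""
    evidence = (
        row.get("skin_test_positive", False)
        or row.get("lesions_observed", False)
        or row.get("culture_positive", False)
        or row.get("slaughtered_for_btb", False)
    )
    return 1 if evidence else 0

def align_spectra_to_tests(tests: pd.DataFrame, spectra: pd.DataFrame) -> pd.DataFrame:
    """Match each SICCT test date to the nearest milk MIR record within 15 days."""
    tests = tests.sort_values("test_date")
    spectra = spectra.sort_values("sample_date")
    merged = pd.merge_asof(
        tests,
        spectra,
        left_on="test_date",
        right_on="sample_date",
        by="animal_id",
        direction="nearest",
        tolerance=pd.Timedelta(days=15),
    )
    return merged.dropna(subset=["sample_date"])  # keep only matched records

def train_validation_split(df: pd.DataFrame, frac_train: float = 0.8, seed: int = 42):
    """Approximately class-balanced 80/20 split (illustrative only)."""
    train = df.groupby("btb_label", group_keys=False).sample(frac=frac_train, random_state=seed)
    valid = df.drop(train.index)
    return train, valid
```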
1 (NVIDIA Ltd., 2019) 2 (Brand et al., 2018) Table 5. System specifications of the deep learning rig (building on a NVIDIA DGX Station1) Development of Prediction Tool [00339] Repeated Observations. Briefly concerning repeated observations on cows (i.e., only in the case of non-responders) the only data used to train models were the 1060 MIR wavelength values (i.e., features) with corresponding bTB status (i.e., labels). Deep learning algorithms did not have access to any animal information, thus were unable to differentiate between multiple and single observations. Moreover, the majority of data (89%) were from single observations. [00340] Data Synthesis. For supervised deep learning tasks an important requirement is a large quantity of balanced, labeled data (Lecun et al., 2015). In the case of bTB the literature reports herd incidence of bTB of around 0.3% to 7.5% for low and high risk areas, respectively (Brotherstone et al., 2010). Furthermore, an incidence of approximately 4% was observed in the data available to the present study. The requirement for balanced labels (i.e., bTB infected cows and healthy cows) meant that of the 250,000+ animal test-dates available to us, we could only train on around 20,000 due to the low number of TB positive records. To overcome this, we synthesized additional bTB positive MIR spectra and investigated the impact of including these data during training. For the purposes of the present study new data were synthesized using Synthetic Minority Over Sampling (SMOTE, Chawla et al., 2002) as well as the Adaptive Synthetic (ADASYN, Haibo He et al., 2008) sampling approach. Synthesized MIR data were only added to training sets and never to validation sets. Moreover, only bTB positive MIR spectra were synthesized with labels balanced using real MIR spectral data from healthy cows. [00341] Transfer Learning. Transfer learning is a machine learning technique in which a pre-trained (or learned) model, trained for a specific task, is re-purposed for a new, different task (Goodfellow et al., 2016). This method enabled harnessing of the knowledge and power of the vast amount of published research and development already available in the field of computer vision - without doubt the field with largest and most widely adopted use of deep learning. For the pre-trained model DenseNet-161 was opted to be used, a Dense Convolutional Network where each layer in the network is connected to every other layer in a feed-forward fashion (Huang et al., 2017). This was made possible by converting individual spectral records into 53×20pixel greyscale images, as described below. [00342] Creation of Images from MIR Spectral Wavelength Values. Mid infrared spectral images were created by iterating through the dataset, selecting an individual spectral record, and reshaping it from an array of size 1060×1 to an array of size 53×20. Each of the reshaped arrays then had their wavelength values normalized to a value between 0 and 1 before finally multiplying each normalized wavelength by 255 to represent the wavelength values as grey scale pixels. Resulting arrays were then saved as individual PNG images (Figure 4). [00343] Measures of Accuracy. To determine how well models performed a number of metrics, commonly used in machine and deep learning, were calculated for resultant models. 
One of the most important of these metrics is loss, a value ranging between 0 and +∞ that is calculated by a specific loss function after each epoch during both training and validation (denoted Lt and Lv, respectively, with 0 ≤ Lt and 0 ≤ Lv). Loss functions are used to measure how wrong a model is (error) by comparing the predicted value, ŷ, with the actual value, y (Lecun et al., 2015). If the distance between ŷ and y is large, the loss will be high. Conversely, if the distance is small then the loss will be low, thus providing an indication of model performance during training as well as any over- or under-fitting. Loss for models developed in the present study was calculated by pushing the final (output) layer through a softmax activation function (7); this ensured the output of each node was a probability between 0 and 1, before applying a log-loss function known as categorical cross entropy (8). softmax(yi) = exp(yi) / Σj exp(yj) (7) Loss = –y log(ŷ) – (1 – y) log(1 – ŷ) (8) [00344] Confusion matrices were created with a true positive (TP) recorded when the model correctly predicted the positive class (responders) and a true negative (TN) when the model correctly predicted the negative class (non-responders). Similarly, a false positive (FP) was recorded when the model incorrectly predicted a non-responder as a responder, and likewise, a false negative (FN) was recorded when the model incorrectly predicted a responder as a non-responder. Ideally, one would want to minimize the number of FPs and FNs. False negatives were considered extremely important since they would have serious ramifications in a live setting, resulting in potentially infected animals remaining in the herd. Total numbers of TP, TN, FP, and FN were then used to calculate additional metrics of model performance, including accuracy, precision, sensitivity, specificity, and the Matthews correlation coefficient. [00345] Accuracy (ACC) was defined as the fraction of total predictions where the model was correct and was calculated via: ACC = (TP + TN) / (TP + FP + FN + TN) [00346] Where 0 ≤ ACC ≤ 1. [00347] Positive predictive value (PPV) is the probability that an individual with a positive test result is indeed infected and was defined as the proportion of positive predictions that were verified as correct and was calculated via: PPV = TP / (TP + FP) [00348] Thus, if a model produces no false positives it would have a PPV of 1. [00349] Negative predictive value (NPV) is the probability that an individual with a negative test result is truly free from infection and was defined as the proportion of negative predictions that were verified as correct and was calculated via: NPV = TN / (TN + FN) [00350] Thus, if a model produces no false negatives it would have a NPV of 1. [00351] Sensitivity (TPR, i.e., recall, or true positive rate) was defined as the proportion of true positives the model identified correctly and was calculated via: TPR = TP / (TP + FN) [00352] Thus, if a model produces no false negatives it would have a TPR of 1. [00353] Specificity (TNR, i.e., true negative rate) was defined as the proportion of true negatives the model identified correctly and was calculated via: TNR = TN / (TN + FP) [00354] Thus, if a model produces no false positives it would have a TNR of 1.
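To make the above definitions concrete, the following Python sketch computes equations (7) and (8) and the confusion-matrix metrics (together with the Matthews correlation coefficient defined in the following paragraph) from raw TP/TN/FP/FN counts; the function names and the example counts are illustrative only and are not taken from the study's tables.

```python
import math

def softmax(scores):
    """Softmax over a list of raw output scores (equation 7)."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Log-loss for a single prediction (equation 8)."""
    y_hat = min(max(y_hat, eps), 1 - eps)  # clamp to avoid log(0)
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)

def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, PPV, NPV, sensitivity, specificity, and MCC from counts."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    tpr = tp / (tp + fn)   # sensitivity (recall)
    tnr = tn / (tn + fp)   # specificity
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"ACC": acc, "PPV": ppv, "NPV": npv, "TPR": tpr, "TNR": tnr, "MCC": mcc}

# Illustrative counts only:
print(confusion_metrics(tp=96, tn=94, fp=6, fn=4))
```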
[00355] Finally, the Matthews correlation coefficient (MCC, Matthews, 1975), a balanced measure of binary classifications used in machine learning and not dependent on which class is the positive class, was calculated via: MCC = (TP ∙ TN – FP ∙ FN) / √[(TP + FP)(TP + FN)(TN + FP)(TN + FN)] [00356] Where -1 ≤ MCC ≤ 1. It has been suggested that MCC is the most informative single-value measure in evaluating binary classification problems (Powers, 2007) due to it taking into account the balance ratios of the confusion matrix categories (Chicco, 2017). RESULTS Alignment of Spectral Data to bTB profile [00357] Alignment of bTB phenotypes with concurrent milk MIR spectral records produced a dataset containing 259,957 animal test-dates relating to 234,073 cows from 1,959 herds. There were 1,899 instances where the bTB phenotype could not be defined using the available data; these data were subsequently removed from the analysis but retained for future use. Thus, the final dataset for use in training models contained 258,058 animal test-dates relating to 231,893 cows from 1,946 herds and concerned 2,936 distinct herd breakdowns. Regarding herd breakdowns, the majority (2,105) were confirmed breakdowns (i.e., having Officially Tuberculosis Free – Withdrawn, OTFW, status), 809 were unconfirmed (i.e., having Officially Tuberculosis Free – Suspended, OTFS, status), and 22 were of unknown status. Descriptions of the datasets generated from these available data are summarized in Table 6.
1 Non-responders randomly selected from herds with no confirmed responders 2 Non-responders from same HYS as responders but never contracted bTB 3 Non-responders eventually contract bTB (≥183d between a positive and negative test) Table 6. Summary of the MIR spectra-aligned bTB baseline and training datasets. Total numbers per dataset of aligned records (animal test-dates) are presented, broken down into total numbers of cows, herds, herd breakdowns, as well as the number of cows labeled as responders and non-responders Development of the Prediction Tool [00358] Results from training and validation are presented in Table 7. All models were trained in two stages: initially for 250 epochs for feature selection using the DenseNet161 pre-trained model. The initial features passed to the DenseNet161 pre-trained model were our grayscale MIR PNG images (described earlier); as such, the features selected by the model were not in the form of spectral wavelengths but were in the form of higher-level features created as a result of passing the images through the convolutional neural network (Liu et al., 2016; Huang et al., 2017). Models were then trained for a further 500, 500, and 28 epochs for dataset 1, 2, and 3, respectively. The number of epochs required in both stages of training was determined by the inclusion of an “early stopper” in the code. Early stopping is a machine learning method used to stop training when there is no improvement in model performance, thus minimizing over- and under-fitting. In the case of our networks, validation loss was the metric monitored, with early stopping taking place when no improvement (i.e., no further reduction in loss) was obtained over 25 iterations.
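The transfer-learning and early-stopping procedure described above might be sketched as follows using PyTorch and torchvision. This is a minimal illustration under stated assumptions (ImageNet-pretrained DenseNet-161 with its classifier replaced by a two-class head, spectral images resized to 224×224 and replicated to three channels, an Adam optimiser, and a patience of 25 epochs on validation loss); it is not the study's actual training code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Dummy stand-ins for the spectral-image datasets (illustrative only).
# MIR PNG images are assumed here to be resized to 224x224 and replicated
# to 3 channels so they match DenseNet's expected input.
x_dummy = torch.rand(16, 3, 224, 224)
y_dummy = torch.randint(0, 2, (16,))
train_loader = DataLoader(TensorDataset(x_dummy, y_dummy), batch_size=8)
valid_loader = DataLoader(TensorDataset(x_dummy, y_dummy), batch_size=8)

# Transfer learning: load DenseNet-161 pre-trained on ImageNet and replace
# its classifier with a two-class head (responder vs. non-responder).
model = models.densenet161(pretrained=True)
model.classifier = nn.Linear(model.classifier.in_features, 2)
model = model.to(device)
# Optionally freeze the convolutional feature extractor:
# for p in model.features.parameters(): p.requires_grad = False

criterion = nn.CrossEntropyLoss()      # softmax + log-loss, cf. equations (7)-(8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

best_loss, stale_epochs, patience = float("inf"), 0, 25

for epoch in range(1000):              # upper bound; early stopping ends training
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Validation loss (Lv) is the monitored metric.
    model.eval()
    val_loss, n_batches = 0.0, 0
    with torch.no_grad():
        for images, labels in valid_loader:
            images, labels = images.to(device), labels.to(device)
            val_loss += criterion(model(images), labels).item()
            n_batches += 1
    val_loss /= n_batches

    if val_loss < best_loss:
        best_loss, stale_epochs = val_loss, 0
        torch.save(model.state_dict(), "best_model.pt")
    else:
        stale_epochs += 1
        if stale_epochs >= patience:   # no improvement over 25 epochs: stop
            break
```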
1 Non-responders randomly selected from herds with no confirmed responders 2 Non-responders from same HYS as responders but never contracted bTB 3 Non-responders eventually contract bTB (≥183d between a positive and negative test) Table 7. Summary of the MIR spectra-aligned bTB baseline and training datasets. Total numbers per dataset of aligned records (animal test-dates) are presented, broken down into total numbers of cows, herds, herd breakdowns, as well as the number of cows labeled as responders and non-responders [00359] In general, model performance was greatest when developed using training dataset 3 (0.71 ACC; 0.79 TPR; 0.65 TNR). Dataset 1 showed the highest specificity (0.80) but also had a lower sensitivity (0.51) than the model developed using dataset 3. Training using dataset 2 resulted in the poorest performance (0.59 ACC; 0.48 TPR; 0.68 TNR). Dataset 3 also required the fewest epochs to train, converging approximately 2.7 times faster. Comparing the MCC of the models developed using the three datasets (0.32, 0.16, and 0.44, for dataset 1, 2, and 3, respectively), dataset 3 was once again observed to yield the best model. With all 3 MCC values less than 0.5, however, the MCC suggested that the predicted and true labels were only weakly to moderately correlated. This was further evidenced by the moderate PPV (0.63, 0.53, and 0.66, for dataset 1, 2, and 3, respectively) and NPV (0.71, 0.64, and 0.78, for dataset 1, 2, and 3, respectively) obtained. [00360] Data Synthesis. Investigations found that synthesizing data by applying SMOTE to real data returned improved results (higher ACC, lower Lv) in comparison to when ADASYN was applied; thus, SMOTE was chosen to synthesize additional data for training the CNNs. In all instances, the addition of synthesized data in training datasets (only real data were used for validation) resulted in increased model performance (Table 8) with observations of lower validation loss (0.46, 0.60, and 0.26 for dataset 1, 2, and 3, respectively) and a 1.32-, 1.32-, and 1.34-fold increase in accuracy for dataset 1, 2, and 3, respectively (0.90, 0.78, and 0.95 for dataset 1, 2, and 3, respectively). Improved sensitivity (0.85, 0.78, and 0.96 for dataset 1, 2, and 3, respectively) and specificity (0.93, 0.78, and 0.94 for dataset 1, 2, and 3, respectively) were also obtained when synthesized data were included in the training set.
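The class-balancing step described in the Data Synthesis paragraph above could be implemented along the following lines. The use of the imbalanced-learn library is an assumption made for illustration (the study cites the SMOTE and ADASYN algorithms, not a specific implementation), the stand-in data are random, and only the training split is resampled, with validation kept to real data.

```python
import numpy as np
from imblearn.over_sampling import SMOTE, ADASYN

# X_train: array of shape (n_samples, 1060) holding standardised MIR
# wavelength values; y_train: binary bTB labels (0 = non-responder, 1 = responder).
rng = np.random.default_rng(0)
X_train = rng.random((1000, 1060))                   # illustrative stand-in data
y_train = (rng.random(1000) < 0.04).astype(int)      # roughly 4% positives, as in the study

# Oversample only the minority (bTB-positive) class in the training set.
X_smote, y_smote = SMOTE(random_state=0).fit_resample(X_train, y_train)

# ADASYN was also evaluated; it concentrates synthesis where minority
# examples are sparse in the feature space.
X_ada, y_ada = ADASYN(random_state=0).fit_resample(X_train, y_train)

print(np.bincount(y_train), np.bincount(y_smote))    # class counts before vs. after balancing
```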
1 Non-responders randomly selected from herds with no confirmed responders 2 Non-responders from same HYS as responders but never contracted bTB 3 Non-responders eventually contract bTB (≥183d between a positive and negative test) Table 8. Measures of model performance resulting from training and validation where training datasets contained both real and synthesized MIR spectral data [00361] The MCC values obtained were far more encouraging than those obtained previously (without synthesized data; Table 9), suggesting moderate (0.55 for dataset 2) to strong (0.78 and 0.90 for dataset 1 and 3, respectively) correlations between predicted and true labels. Again, this was further evidenced by the strong PPV (0.89, 0.72, and 0.95, for dataset 1, 2, and 3, respectively) and NPV (0.90, 0.82, and 0.96, for dataset 1, 2, and 3, respectively) obtained. The results from dataset 3 signified that the model was able to successfully distinguish between spectra from bTB-positive and bTB-negative cows, with a high probability that those flagged as bTB infected and non-infected were indeed infected and free from infection, respectively.
1 Non-responders randomly selected from herds with no confirmed responders 2 Non-responders from same HYS as responders but never contracted bTB 3 Non-responders eventually contract bTB (≥183d between a positive and negative test) Table 9. Measures of model performance resulting from training and validation DISCUSSION [00362] The present study developed a pipeline for the prediction of bTB status in dairy cows by applying state-of-the-art deep learning techniques to their milk MIR spectral profiles. The prospect of using routinely collected milk samples for the early identification of bTB-infected cows represents an innovative, low-cost and, importantly, non-invasive tool that has the potential to contribute substantially to the continuing push to eradicate bTB in England, Wales, and the wider UK. Such a tool would not only complement the current control measures (e.g., intradermal skin test, interferon-gamma assay, etc.) but also facilitate the rapid and seamless delivery of vital information to farmers, allowing them to make fast and informed management decisions that would significantly increase the health and welfare of their animals in addition to reducing costs to the farm, government and taxpayer alike. If such a form of surveillance were to become approved, certain contingencies would have to be put in place; for example, Defra would need to be informed in the first instance to stop the illegal movement of alerted animals. Harnessing the Power of Big Data and Artificial Intelligence [00363] The standard method of calibrating milk MIR spectral data using matched phenotypes by partial least squares (PLS) regression has delivered several successful quantitative analysis tools, as highlighted by De Marchi et al. (2014). In the case of phenotypes represented by discrete data (e.g., categorical, binary, etc.), the usual methods for developing prediction equations have proved less efficient and resulted in lower accuracy predictions (e.g., Toledo-Alvarado et al., 2018; Delhez et al., 2020). Hence, there is a requirement for alternative and novel mathematical/statistical techniques to better utilize milk MIR spectra; a requirement we believe we have shown can be met using machine learning. [00364] As previously mentioned, deep learning is a branch of the larger field of machine learning that employs algorithms that are better able to make use of today's ever-growing repositories of data and advances in computer technology (Bengio, 2009; Jia Deng et al., 2009; Krizhevsky et al., 2012; Lecun et al., 2015). Deep learning is now being used to develop solutions to problems in a variety of research fields from medicine (e.g., diagnosing unknown skin lesions, Kawahara et al., 2016) to transportation (e.g., self-driving vehicles, Martinez et al., 2017). Further examples of deep learning can be found powering the mobile phone in your pocket and the smart technologies in your home. In the agricultural and livestock/animal sciences, uptake of deep learning techniques has been slow (Howard, 2018). Recently, however, a deep convolutional neural network (CNN) was applied to MIR spectra-matched pregnancy data and it was discovered that such algorithms significantly improved the prediction accuracy for pregnancy status in dairy cows (Brand et al., 2018).
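For comparison with the deep learning approach, the conventional PLS calibration mentioned above can be sketched with scikit-learn; the component count, the continuous target, and the random stand-in data below are illustrative assumptions only, not the calibrations referenced in the cited literature.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

# Stand-in data: rows are standardised MIR spectra (1060 wave points),
# y is a continuous reference trait (e.g., a milk component concentration).
rng = np.random.default_rng(1)
X = rng.random((500, 1060))
y = X[:, :10].sum(axis=1) + rng.normal(scale=0.1, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

pls = PLSRegression(n_components=10)   # number of latent components is an assumption
pls.fit(X_tr, y_tr)
print("R^2 on held-out spectra:", pls.score(X_te, y_te))
```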
[00365] Deep learning tasks are known to require large volumes of data to successfully train a network, moreover, for supervised learning problems, such as in the present study, there is an additional requirement that data labels must be more or less equally distributed (Lecun et al., 2015; Goodfellow et al., 2016). In the case of bTB where the incidence of the disease is low (approx.4% in our data) one label dominates the data. Training on such a dataset would result in an immensely inaccurate model, and the simple approach of under-sampling would greatly reduce the amount of data available for training. To overcome these challenges, we adopted two separate approaches, one to increase the size of our training dataset (data synthesis), and another to lessen the impact of data size (transfer learning). [00366] Data synthesis is a technique commonly applied in machine learning for many different purposes, from creating naïve, clean data for training models (e.g., Mikołajczyk and Grochowski, 2018) to overcoming privacy or legal issues when working with financial or medical data (e.g., Choi et al., 2017). In order to synthesize data for our purpose we investigated two popular and widely used techniques, SMOTE and ADASYN. Both of these techniques use a k nearest neighbors approach to synthesize new data within the body of available data by randomly selecting a minority instance, A; finding its k nearest neighbors; and then, drawing a line segment in the feature space between A and a random neighbor. Synthetic instances are then generated on the line (Chawla et al., 2002; He, 2011). The ADASYN technique modifies SMOTE slightly in order to synthesize more instances in regions of the feature space where minority instances are sparse, and fewer (or none) where minority instances are dense (Haibo He et al., 2008). There are many other approaches available that can be taken to synthesize data, some of which are more advanced, themselves underpinned by deep learning, such as generative adversarial networks (GAN). The GAN uses two neural networks that are pitted against one another, a generative network which generates synthetic examples, and a discriminative network which evaluates them to determine if they are real or synthetic. The aim of the generative network is to trick the discriminative network into labeling a synthetic instance as real (Goodfellow et al., 2011). [00367] Another approach to enable the training of networks with less data available is that of transfer learning where a model developed for one task is repurposed as a starting point and fine-tuned to develop a model for another, different, task. Developing neural network models using deep learning requires high levels of resource in the form of both compute and time, as such utilizing a pre-trained model as a starting point, and subsequently fine-tuning it for a specific problem or task, can provide massive gains (Pan and Yang, 2010; Shin et al., 2016; Yang et al., 2020). [00368] Transfer learning combined with data synthesis can provide an effective enabling method for carrying out deep learning tasks when the underlying dataset size is on the smaller side, as evidenced by the present study. It is shown that it is possible to train a model to predict the bTB phenotype (as defined above) with 95% accuracy with a strong correlation between predicted and true labels (MCC = 0.90). [00369] The current SICCT skin test has a high specificity (99.98%) indicating a high confidence in results where cows “fail” the test. 
Conversely, the sensitivity is not as high (ranging between 52 and 100%; average of 80%), indicating that not all cows that “pass” the test are truly bTB-free, i.e., some TB-infected individuals are missed (de la Rua-Domenech et al., 2006). The current gamma interferon (IFN-γ) test, a more expensive test used alongside the SICCT test, is known to have a higher sensitivity than the SICCT test (~85 to 90%) but a lower specificity of 96.6% (Ryan et al., 2000; de la Rua-Domenech et al., 2006). Whilst the proposed tool has a slightly lower specificity than the SICCT test (94%), it is approximately equal to that of the IFN-γ test. Furthermore, a higher sensitivity than both of the current testing methods (96%) has been obtained, implying fewer false negatives will find themselves returning to the herd to infect other susceptible individuals. [00370] The present study reinforces the utility of a deep learning approach to calibrate MIR spectra to predict economically important and hard-to-record phenotypes. It is believed that this study is the first of its kind to use deep learning to calibrate MIR spectra for phenotype prediction. Furthermore, it is believed that this is the first study to use MIR spectral data to predict bTB, as well as the first to predict a contagious disease phenotype in general. The success of the prediction opens up the possibility to calibrate MIR spectra for other economically important diseases such as Paratuberculosis (Johne’s disease), a chronic and contagious enteritis of ruminants caused by the bacterium Mycobacterium avium subsp. paratuberculosis (M. paratuberculosis). Existing bTB Control Measures and Possible Applications of the MIR-based Tool [00371] The current bTB control strategy applied throughout GB is a combination of statutory and voluntary measures that are dependent on the perceived level of bTB risk in the area. The control measures applied to all areas regardless of risk can be split into 4 categories: surveillance, breakdown management, risk from badgers, and other disease prevention (DEFRA, 2014). Our proposed MIR-based tool would complement both the “surveillance” and “breakdown management” areas of the current control strategy as follows: [00372] Surveillance. At present, key measures include on-farm statutory testing as well as carcass testing at the abattoir. Results from the present study highlight the value of a MIR spectra-based “alert” of potential bTB infection within a herd, specifically, enabling the farmer (or a veterinarian) to identify and isolate (or cull) animals ahead of routine testing both on farm and at the abattoir. This would be especially beneficial in the case of OTF herds with no history of bTB outbreaks, allowing farmers to monitor their herd through routine milk recording and minimize the length of a breakdown if bTB is subsequently discovered. Additionally, when alerts arise from milk MIR (animals likely to be exposed above a minimum threshold of accuracy), a herd test may be triggered, allowing the farm to officially identify and isolate/quarantine potential reactors. [00373] Once removed at an earlier stage, infected animals will have a reduced opportunity to infect other animals (or other wildlife reservoirs), thus leading to a reduction in the overall level of herd infectivity. This may eventually reduce the basic reproductive number (R0) to a level such that other interventions have a greater effect.
The R0 of an infection is defined as the average number of secondary infections produced by an infected individual in a completely susceptible host population and determines whether or not the infection can persist (Anderson and May, 1991). [00374] Breakdown Management. For herds already under, or on the onset of, restriction the proposed tool has the potential to significantly reduce the length of the breakdown (Figure 5A). At present once bTB is disclosed, the herd is put under restriction and subjected to skin tests every 60 days until two successive test periods result in zero reactors. The total length of a breakdown can therefore be calculated as 60×(n-1) days (where n = number of skin tests) and due to the infectious, chronic and slowly progressive nature of bTB one breakdown has the potential to last for months, years, or even decades. [00375] This is where early identification of infected animals would be advantageous. Alerting the farmer to cows that will fail the next skin test allows them to be removed from the herd, reducing the spread of bTB. This offers the potential to significantly reduce the length of restriction, e.g., from 60×(n-1) days to 60×(m-1) days, where m<n (Figure 5B). Moreover, for farms already involved in routine milk recording such a system would require no additional labor or changes in management. CONCLUSIONS [00376] Deep learning, underpinned by convolutional neural networks, has provided a promising method to calibrate milk MIR spectral data to predict bTB status of individual dairy cows. The models developed were able to successfully alert cows that would be expected to fail the SICCT skin test with an accuracy of 95% and a corresponding sensitivity and specificity of 0.96 and 0.94, respectively. Moreover, predictions were strongly correlated with true values (MCC = 0.90). The automated prediction of bTB status at routine milk recording could provide farmers with a robust tool enabling them to make early management decisions on potential reactor cows. The tool would have the added benefit of providing an effective enabling service giving farmers the opportunity to be more engaged with bTB testing, as well as the ability to take ownership of the health of their herd. Such a tool would also provide the government with an additional mechanism to have an immediate and enduring impact on the prevalence of bTB in UK dairy herds. REFERENCES [00377] AHDB Dairy.2016. TB Advantage - The Genetics of BTB. Accessed April 25, 2018. https://dairy.ahdb.org.uk/technical-information/breeding-genetics/tb-advantage. [00378] Allen, A.R., G. Minozzi, E.J. Glass, R.A. Skuce, S.W.J. McDowell, J.A. Woolliams, and S.C. Bishop.2010. Bovine tuberculosis: the genetic basis of host susceptibility. Proc. R. Soc. B Biol. Sci.277:2737–2745. [00379] Anderson, R.M., and R.M. May.1991. Infectious Diseases of Humans: Dynamics and Control. Oxford University Press, Oxford. [00380] Banos, G., M. Winters, R. Mrode, A.P. Mitchell, S.C. Bishop, J.A. Woolliams, and M.P. Coffey.2017. Genetic evaluation for bovine tuberculosis resistance in dairy cattle. J. Dairy Sci.100:1272–1281. [00381] Bengio, Y.2009. Learning Deep Architectures for AI. Found. Trends® Mach. Learn. 2:1–127. [00382] Bermingham, M.L., S.J. More, M. Good, A.R. Cromie, I.M. Higgins, S. Brotherstone, and D.P. Berry.2009. Genetics of tuberculosis in Irish Holstein-Friesian dairy herds. J. Dairy Sci.92:3447–3456. [00383] Berry, D.P., S. McParland, C. Bastin, E. Wall, N. Gengler, and H. Soyeurt.2013. 
Phenotyping of robustness and milk quality. Adv. Anim. Biosci.4:600–605. [00384] Brand, W., A.T. Wells, and M.P. Coffey.2018. Predicting pregnancy status from mid-infrared spectroscopy in dairy cow milk using deep learning. Page 347 in Abstracts of the 2018 Annual Meeting of the American Dairy Science Association. Journal of Dairy Science (vol 101, suppl 2), Knoxville, Tennessee, USA. [00385] Brotherstone, S., I.M.S. White, M.P. Coffey, S.H. Downs, a P. Mitchell, R.S. Clifton- Hadley, S.J. More, M. Good, and J. a Woolliams.2010. Evidence of genetic resistance of cattle to infection with Mycobacterium bovis. J. Dairy Sci.93:1234–1242. [00386] Chawla, N. V., K.W. Bowyer, L.O. Hall, and W.P. Kegelmeyer.2002. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res.16:321–357. [00387] Chicco, D.2017. Ten quick tips for machine learning in computational biology. BioData Min.10:1–17. [00388] Choi, E., S. Biswal, B. Malin, J. Duke, W.F. Stewart, and J. Sun.2017. Generating Multi-label Discrete Patient Records using Generative Adversarial Networks 68:1–20. [00389] Crimes, D., and G. Enticott.2019. Assessing the Social and Psychological Impacts of Endemic Animal Disease Amongst Farmers. Front. Vet. Sci.6:1–13. [00390] DEFRA.2014. The Strategy for achieving Officially Bovine Tuberculosis Free status for England. Dep. Environ. Food Rural Aff.80–84. [00391] Dehareng, F., C. Delfosse, E. Froidmont, H. Soyeurt, C. Martin, N. Gengler, a. Vanlierde, and P. Dardenne.2012. Potential use of milk mid-infrared spectra to predict individual methane emission of dairy cows. Animal 6:1694–1701. [00392] Delhez, P., P.N. Ho, N. Gengler, H. Soyeurt, and J.E. Pryce.2020. Diagnosing the pregnancy status of dairy cows: How useful is milk mid-infrared spectroscopy?. J. Dairy Sci. 103:3264–3274. [00393] Department for Environment, Food and Rural Affairs.2018. Quarterly Publication of National Statistics on the Incidence and Prevalence of Tuberculosis (TB) in Cattle in Great Britain – to End December 2017. Accessed April 25, 2018. https://www.gov.uk/government/statistics/incidence-of-tuberculosis-tb-in-cattle-in-great- britain. [00394] FarmingUK.2018. Stress and Depression Common Causes of Ill Health in Farming. Accessed November 26, 2019. https://www.farminguk.com/news/stress-and-depression- common-causes-of-ill-health-in-farming_50623.html. [00395] FOSS.2016. FTIR Analysis of Food and Agri Products. Accessed June 7, 2016. http://www.foss.dk/. [00396] Friedrichs, P., C. Bastin, F. Dehareng, B. Wickham, and X. Massart.2015. Final OptiMIR Scientific and Expert Meeting: From milk analysis to advisory tools (Palais des Congrès, Namur, Belgium, 16-17 April 2015). Biotechnol. Agron. Société Environ.19:97– 124. [00397] Goodfellow, I., Y. Bengio, and A. Courville.2016. Deep Learning. MIT Press, Cambridge, Massachusetts. [00398] Goodfellow, I.J., J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio.2011. Generative Adversarial Nets. Page 085201 in Advances in Neural Information Processing Systems. Elsevier. [00399] Grelet, C., C. Bastin, M. Gelé, J.-B. Davière, M. Johan, et al.2016. Development of Fourier transform mid-infrared calibrations to predict acetone, β-hydroxybutyrate, and citrate contents in bovine milk through a European dairy network. J. Dairy Sci.99:4816–4825. [00400] Grelet, C., J.A. Fernández Pierna, P. Dardenne, V. Baeten, and F. Dehareng.2015. Standardization of milk mid-infrared spectra from a European dairy network. J. Dairy Sci. 98:2150–2160. 
[00401] Haibo He, Yang Bai, E.A. Garcia, and Shutao Li.2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. Pages 1322–1328 in 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). IEEE. [00402] He, H.2011. Imbalanced Learning. H. He and Y. Ma, ed. John Wiley & Sons, Inc., Hoboken, New Jersey. [00403] Howard, J.2018. Deep Learning: The tech that’s changing everything, except animal breeding and genetics. Page in Proceedings of the World Congress on Genetics Applied to Livestock Production, Auckland. [00404] Huang, G., Z. Liu, L. van der Maaten, and K.Q. Weinberger.2017. Densely Connected Convolutional Networks. Pages 2261–2269 in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. [00405] Humblet, M.F., M.L. Boschiroli, and C. Saegerman.2009. Classification of worldwide bovine tuberculosis risk factors in cattle: A stratified approach. Vet. Res.40. [00406] Jia Deng, Wei Dong, R. Socher, Li-Jia Li, Kai Li, and Li Fei-Fei.2009. ImageNet: A large-scale hierarchical image database.2009 IEEE Conf. Comput. Vis. Pattern Recognit. 248–255. [00407] Kawahara, J., A. Bentaieb, and G. Hamarneh.2016. Deep features to classify skin lesions. Proc. - Int. Symp. Biomed. Imaging 2016-June:1397–1400. [00408] Krizhevsky, A., I. Sutskever, and G.E. Hinton.2012. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst.1–9. [00409] de la Rua-Domenech, R., A.T. Goodchild, H.M. Vordermeier, R.G. Hewinson, K.H. Christiansen, and R.S. Clifton-Hadley.2006. Ante mortem diagnosis of tuberculosis in cattle: A review of the tuberculin tests, γ-interferon assay and other ancillary diagnostic techniques. Res. Vet. Sci.81:190–210. [00410] Lainé, A., H. Bel Mabrouk, L. Dale, C. Bastin, and N. Gengler.2014. How to use mid-infrared spectral information from milk recording system to detect the pregnancy status of dairy cows. Comm. Appl. Biol. Sci 79:33–38. [00411] Lecun, Y., Y. Bengio, and G. Hinton.2015. Deep learning. Nature 521:436–444. [00412] Liu, Z., J. Gao, G. Yang, H. Zhang, and Y. He.2016. Localization and Classification of Paddy Field Pests using a Saliency Map and Deep Convolutional Neural Network. Sci. Rep.6. [00413] De Marchi, M., V. Toffanin, M. Cassandro, and M. Penasa.2014. Invited review: Mid-infrared spectroscopy as phenotyping tool for milk traits. J. Dairy Sci.97:1171–1186. [00414] Martinez, M., C. Sitawarin, K. Finch, L. Meincke, A. Yablonski, and A. Kornhauser. 2017. Beyond Grand Theft Auto V for Training, Testing and Enhancing Deep Learning in Self Driving Cars. arXiv Prepr. arXiv1712.01397. [00415] Matthews, B.W.1975. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. BBA - Protein Struct.405:442–451. [00416] McParland, S., G. Banos, E. Wall, M.P. Coffey, H. Soyeurt, R.F. Veerkamp, and D.P. Berry.2011. The use of mid-infrared spectrometry to predict body energy status of Holstein cows. J. Dairy Sci.94:3651–3661. [00417] Mikołajczyk, A., and M. Grochowski.2018. Data augmentation for improving deep learning in image classification problem.2018 Int. Interdiscip. PhD Work. IIPhDW 2018117– 122. [00418] National Milk Records.2019. History of NMR - National Milk Records. Accessed October 18, 2019. https://www.nmr.co.uk/company/history. [00419] NVIDIA Ltd.2019. NVIDIA DGX Station: AI Workstation for Data Science Teams. Accessed November 27, 2019. https://www.nvidia.com/en-gb/data-center/dgx-station. [00420] Olea-Popelka, F., A. 
Muwonge, A. Perera, A.S. Dean, E. Mumford, et al.2017. Zoonotic tuberculosis in human beings caused by Mycobacterium bovis—a call for action. Lancet Infect. Dis.17:e21–e25. [00421] Pan, S.J., and Q. Yang.2010. A survey on transfer learning. IEEE Trans. Knowl. Data Eng.22:1345–1359. [00422] Parry, J., R. Lindsey, and R. Taylor.2005. Farmers, farm workers and work-related stress. [00423] Paszke, A., S. Gross, S. Chintala, G. Chanan, E. Yang, et al.2017. Automatic Differentiation in PyTorch. Accessed August 9, 2019. https://openreview.net/pdf?id=BJJsrmfCZ. [00424] Pollock, J.M., and S.D. Neill.2002. Mycobacterium bovis infection and tuberculosis in cattle. Vet. J.163:115–127. [00425] Powers, D.M.W.2007. Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness & Correlation (spie-07-001). Tech. Rep. Adelaide, Australia. [00426] Ryan, T.J., B.M. Buddle, and G.W. De Lisle.2000. An evaluation of the gamma interferon test for detecting bovine tuberculosis in cattle 8 to 28 days after tuberculin skin testing. Res. Vet. Sci.69:57–61. [00427] Shin, H.C., H.R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and R.M. Summers.2016. Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning. IEEE Trans. Med. Imaging 35:1285–1298. [00428] Smith, S., S.J. Denholm, M.P. Coffey, and E. Wall.2019. Energy profiling of dairy cows from routine milk mid-infrared analysis. J. Dairy Sci.102:11169–11179. [00429] Soyeurt, H., C. Bastin, F.G. Colinet, V.M.R. Arnould, D.P. Berry, et al.2012. Mid- infrared prediction of lactoferrin content in bovine milk: Potential indicator of mastitis. Animal 6:1830–1838. [00430] Soyeurt, H., P. Dardenne, F. Dehareng, G. Lognay, D. Veselko, M. Marlier, C. Bertozzi, P. Mayeres, and N. Gengler.2006. Estimating fatty acid content in cow milk using mid-infrared spectrometry. J. Dairy Sci.89:3690–5. [00431] Soyeurt, H., F. Dehareng, N. Gengler, S. McParland, E. Wall, D.P. Berry, M.P. Coffey, and P. Dardenne.2011. Mid-infrared prediction of bovine milk fatty acids across multiple breeds, production systems, and countries. J. Dairy Sci.94:1657–1667. [00432] Toledo-Alvarado, H., A.I. Vazquez, G. de los Campos, R.J. Tempelman, G. Bittante, and A. Cecchinato.2018. Diagnosing pregnancy status using infrared spectra and milk composition in dairy cows. J. Dairy Sci.101:2496–2505. [00433] Tsairidou, S., J.A. Woolliams, A.R. Allen, R.A. Skuce, S.H. McBride, et al.2014. Genomic prediction for tuberculosis resistance in dairy cattle. PLoS One 9. [00434] Veerkamp, R.F., L. Kaal, Y. de Haas, and J.D. Oldham.2013. Breeding for robust cows that produce healthier milk: RobustMilk. Adv. Anim. Biosci.4:594–599. [00435] Wallén, S.E., E. Prestløkken, T.H.E. Meuwissen, S. McParland, and D.P. Berry. 2018. Milk mid-infrared spectral data as a tool to predict feed intake in lactating Norwegian Red dairy cows. J. Dairy Sci.1–12. [00436] Yang, Q., Y. Zhang, W. Dai, and S.J. Pan.2020. Transfer Learning. Cambridge University Press, Cambridge, UK. EXAMPLE 3 – MilkFlow v1 (PyTorch) [00437] The first version of the tool described in Examples 1 and 2 above used convolutional neural networks (CNN) and a transfer learning approach to train a deep learning model capable of predicting phenotypes such as bovine tuberculosis and pregnancy status of individual dairy cows. 
The model used individual standardised MIR spectral records as input (i.e., features) and a corresponding (economically important) phenotype as output (i.e., labels). The development of the tool used the DenseNet pretrained model (DenseNet-161) as a basis, adapted for MIR spectra classification through transfer learning; a process by which a previously fully trained model, trained for a specific task, is repurposed for a new, different task. The version of the tool described in Examples 1 and 2 utilised MIR spectral records converted to individual 53 x 20 px greyscale (PNG) images. The tool was written in Python and is underpinned by PyTorch, an open source machine learning framework that accelerates the path from research prototyping to production deployment (available at https://pytorch.org/). MilkFlow v2 (XGBoost) [00438] In an attempt to maximise accuracy, sensitivity, and specificity of MIR-based predictions, as well as minimise computation time in both training and predicting, a further version (V2) of the tool is described having the following changes:
1. V2 uses machine learning algorithms under the Gradient Boosting framework. Specifically, V2 uses XGBoost, an optimized distributed gradient boosting library (available at https://xgboost.readthedocs.io/). A GPU-accelerated version of XGBoost was implemented using NVIDIA RAPIDS (https://developer.nvidia.com/rapids).
2. Raw (standardised) data (all 1060 wave point values) were used as reference or training data.
3. Granularity was added to predictions by using 2 spectral records per animal recording-date (previous record combined with current record) – this allowed for prediction not only of pregnancy status, but also of the transitional state (e.g., change in pregnancy status).
a. Records used had 2120 features – the first 1060 from the previous record and the second 1060 from the current record.
b. Labels were two parts (merged; labelled from 0 to 3) – the first part was the old label (state) from the previous record and the second part was the current label (state) from the current record.
EXAMPLE 4 – ALERT AS TO PREGNANCY STATUS [00439] Sixteen days after insemination, milk is collected from the cow. Mid-infrared spectra are obtained from the milk and the spectra are analysed to determine the pregnancy state of the cow. [00440] If the cow is pregnant, then the cow may be placed in a herd with other pregnant cows. The pregnancy is then confirmed by a veterinarian. [00441] If the cow is found not to be pregnant, then the cow may be placed in a herd with other non-pregnant cows. The lack of pregnancy is then confirmed by a veterinarian. The cow is then subjected to another round of insemination and the steps above are repeated. [00442] Pregnant cows have milk collected on a regular basis from day 21 after successful insemination. Mid-infrared spectra are obtained from the milk and the spectra are analysed to determine the pregnancy state of the cow. By analysing the milk at regular intervals up to calving, the loss of a pregnancy can be confirmed. [00443] If a loss of pregnancy is detected and confirmed, the cow is subjected to another round of insemination and the steps above are repeated. [00444] If a cow fails to be successfully inseminated after multiple attempts or loses its pregnancy after a number of successful inseminations, the cow may be euthanized (culled), slaughtered for meat, or examined by a veterinarian to detect any medical reasons for the unsuccessful insemination or loss of pregnancy.
If the veterinarian finds a medical cause for the unsuccessful insemination or loss of pregnancy that can be medically treated, the cow is treated and the steps above are repeated. If the cause cannot be treated, the cow may be euthanized. EXAMPLE 5 – DISEASE MANAGEMENT Surveillance. [00445] At present, key measures include on-farm statutory testing as well as carcass testing at the abattoir. [00446] Milk is collected from cows as part of routine milking and the mid-infrared spectra are obtained. The spectra are then analyzed to detect the phenotype. If TB is detected, then the farmer (or a veterinarian) identifies and isolates (or culls) animals ahead of routine testing both on farm and at the abattoir. This is especially beneficial in the case of “officially TB free” (OTF) herds with no history of bTB outbreaks, allowing farmers to monitor their herd through routine milk recording. Additionally, if an alert (diseased animal detected) arises from milk MIR (animals likely to be exposed above a minimum threshold of accuracy), a herd test is carried out using standard and known TB tests, allowing the farm to officially identify and isolate/quarantine potentially diseased cows. [00447] Once removed at an earlier stage, infected animals will have a reduced opportunity to infect other animals (or other wildlife reservoirs), thus leading to a reduction in the overall level of herd infectivity. Breakdown Management. [00448] For herds already under, or at the onset of, restriction due to TB having been previously detected, milk MIR analysis is used to detect diseased animals and non-diseased animals. [00449] At present, once bTB is disclosed, the herd is put under restriction (i.e., no movement or contact with other animals) and subjected to skin tests every 60 days until two successive test periods result in zero reactors. The total length of a breakdown can therefore be calculated as 60×(n-1) days (where n = number of skin tests) and, due to the infectious, chronic and slowly progressive nature of bTB, one breakdown has the potential to last for months, years, or even decades. [00450] Thus, early identification of diseased animals is advantageous. Milk MIR analysis alerts the farmer to cows that will fail the next skin test. These cows are then removed from the herd, reducing the spread of bTB. This offers the potential to significantly reduce the length of restriction, e.g., from 60×(n-1) days to 60×(m-1) days, where m<n (Figure 5B).

Claims

Claims 1. A method of predicting or detecting a phenotype in a test animal comprising: detecting one or more features in at least one infrared spectrum obtained from the animal’s milk; wherein the presence or absence of the one or more features in the infrared spectra is indicative of a positive or negative phenotype; determining whether the animal is positive or negative for the phenotype based on the presence or absence of the one or more features; and wherein the phenotype is a disease state.
2. The method according to claim 1, wherein the disease state is positive or negative for tuberculosis and/or Paratuberculosis (Johne’s disease).
3. The method according to any of claims 1 or 2, wherein the at least one infrared spectrum is a mid-infrared spectrum.
4. The method according to any of claims 1 to 3, wherein detecting comprises comparing the at least one infrared spectrum to one or more reference infrared spectra.
5. The method according to claim 4, wherein the one or more reference infrared spectra comprises: at least one first reference infrared spectra obtained from a reference animal’s milk, wherein the reference animal is the same species as the test animal and the reference animal is labelled as positive for the phenotype based on phenotype data; and/or at least one second reference infrared spectra obtained from a reference animal’s milk, wherein the reference animal is the same species as the test animal and the reference animal is labelled as negative for the phenotype based on phenotype data.
6. The method according to any of claims 4 to 5, wherein the phenotype data comprises at least disease data.
7. The method according to any of claims 4 to 6, wherein comparing comprises statistical comparison of the at least one infrared spectrum to the one or more reference infrared spectra.
8. The method according to any of claims 1 to 7, wherein a trained machine learning model detects the one or more features and determines whether the animal is positive or negative for the phenotype.
9. The method according to any of claims 1 to 7, wherein the one or more features are determined by partial least squares regression, including partial least squares discriminant analysis (PLS-DA), C4.5 decision trees, naive Bayes, Bayesian network, logistic regression, support vector machine, random forest, rotation forest, a decision tree and/or a learned convolutional neural network.
10. The method according to any of claims 1 to 9, further comprising modifying the at least one infrared spectrum to create a modified infrared spectrum prior to determining; optionally wherein modifying comprises transforming, standardising, granulating and/or converting.
11. The method according to any of claims 4 to 10, wherein the at least one first reference infrared spectra and/or the at least one second reference infrared spectra are modified according to claim 10 prior to comparing.
12. The method according to any of claims 1 to 11, wherein at least one of the at least one infrared spectra, the at least one first reference infrared spectra and/or the at least one second reference infrared spectra comprises two combined infrared spectra, optionally wherein each infrared spectrum is obtained at a different time point.
13. A computer-implemented machine learning method for prediction or detection of an animal phenotype comprising: receiving a first training set comprising labelled infrared spectra, the labelled infrared spectra comprising a plurality of infrared spectra obtained from milk of a plurality of animals and corresponding phenotype data for each infrared spectra, wherein each infrared spectra is labelled as negative or positive for the phenotype based on the phenotype data; and training a machine learning model using the labelled infrared spectra in order to detect whether the phenotype of a test infrared spectra is positive or negative based on one or more features of the test infrared spectra.
14. The method according to claim 13, wherein the machine learning model is a neural network or a decision tree.
15. The method according to claim 13 or claim 14, wherein the phenotype is a disease state, optionally the disease state is positive or negative for tuberculosis and/or Paratuberculosis (Johne’s disease).
16. The method according to any of claims 13 to 15, wherein the phenotype data comprises at least disease data.
17. The method according to any of claims 13 to 16: wherein the infrared spectra labelled as positive comprises animals having at least one of a positive skin-test result, a positive observation of lesions and/or a positive culture status; and wherein the infrared spectra labelled as negative comprises animals having a negative skin-test result, a negative observation of lesions and a negative culture status.
18. The method according to claim 13 or claim 14, wherein the phenotype is a pregnancy status, optionally wherein the phenotype is a change in pregnancy status.
19. The method according to any of claims 13, 14 or 18, wherein the phenotype data comprises: parturition data; and/or insemination data.
20. The method according to any of claims 13, 14, 18 or 19: wherein the infrared spectra labelled as negative (not pregnant) comprises animals between parturition and first insemination; and wherein the infrared spectra labelled as positive (pregnant) comprises animals between the last insemination and subsequent calving with a gestation length between about 228 and about 296 days.
21. The method according to claim 13 or claim 14, wherein the phenotype is: a. Methane production, wherein the phenotype data comprises methane emissions data, feed composition, and/or feed intake for each animal; or b. Feed intake, wherein the phenotype data comprises net energy intake, dry matter intake, concentration of milk components (such as fat, protein and/or lactose), milk yield and/or body weight.
22. The method of any of claims 13 to 21, further comprising modifying each infrared spectra to create modified infrared spectra prior to creating the first training set; optionally wherein modifying comprises transforming, standardising, granulating and/or converting.
23. The method according to any of claims 13 to 22, further comprising synthesising labelled artificial milk spectra data, optionally wherein the labelled artificial milk spectra data is modified according to claim 22 and included in the first training set.
24. The method according to claim 23, wherein synthesising comprises randomly selecting a minority instance, A, finding its k-nearest neighbours, and drawing a line segment in the feature space between A and a random neighbour and synthetically generating instances on the line.
25. The method according to any of claims 14 to 24, wherein the neural network is trained for a number of epochs determined by an early stopper.
26. The method according to any of claims 14 to 25, wherein the neural network is a convolutional neural network.
27. A computer-implemented machine learning method for prediction or detection of pregnancy status of an animal comprising: receiving a first training set comprising labelled infrared spectra, the labelled infrared spectra comprising a plurality of infrared spectra obtained from milk of a plurality of animals and corresponding pregnancy data for each infrared spectra, wherein each infrared spectra is labelled as negative or positive for pregnancy based on the pregnancy data; training a machine learning model using the set of labelled infrared spectra in order to detect whether the phenotype of a test infrared spectra is positive or negative based on one or more features of the test infrared spectra; wherein the infrared spectra labelled as negative (not pregnant) comprises animals between parturition and first insemination; and the infrared spectra labelled as positive (pregnant) comprises animals between the last insemination and subsequent calving with a gestation length between about 240 and about 284 days.
28. The method of claim 27 wherein the machine learning model is a neural network or a decision tree.
29. The method of claim 28 wherein the machine learning model is a convolutional neural network.
30. The method of any of claims 27 to 29, further comprising modifying each infrared spectra to create modified infrared spectra prior to creating the first training set; optionally wherein modifying comprises transforming, standardising, granulating and/or converting.
31. The method of any of claims 27 to 30 wherein each infrared spectra comprises two combined infrared spectra obtained at different time points and optionally wherein pregnancy status comprises change in pregnancy status.
32. The method of any of claims 1 to 31, wherein the animal is a milk producing mammal, optionally wherein the animal is a bovine.
33. The method according to any of claims 1 to 32, wherein the one or more features comprise waveforms and/or wavelength values of the at least one infrared spectra according to any of claims 1 to 12, the one or more reference infrared spectra according to any of claims 4 to 12, or the infrared spectra or test infrared spectra according to any of claims 13 to 32.
34. A method of predicting or determining a phenotype of an animal using a trained machine learning model, the method comprising: receiving an infrared spectra obtained from the animal’s milk; mapping the infrared spectra to a positive or negative phenotype; and providing an output comprising the animal’s phenotype.
35. The method of claim 34, wherein the trained machine learning model is a trained neural network or a trained decision tree.
36. The method according to claim 34 or claim 35, wherein the trained machine learning model is trained according to any of claims 13 to 33, optionally wherein the trained machine learning model is trained according to any of claims 13 or 18 to 25 and the phenotype is pregnancy status; or according to any of claims 13 to 17 and the phenotype is a disease state.
37. The method of any of claims 1 to 36, wherein the at least one infrared spectra according to any of claims 1 to 12, the one or more reference infrared spectra according to any of claims 4 to 12, or the infrared spectra according to any of claims 13 to 33 are mid-infrared spectra.
38. A computer-readable storage medium comprising instructions which, when executed by a computer, cause the computer to carry out the steps of the method of any one of claims 1 to 37.
39. A method of predicting or detecting a phenotype in a test animal comprising: detecting one or more features in at least one infrared spectrum obtained from the animal’s milk; wherein the presence or absence of the one or more features in the infrared spectra is indicative of a positive or negative phenotype, wherein the phenotype is a disease state; determining whether the animal is positive or negative for the phenotype based on the presence or absence of the one or more features; and responsive to determining whether the animal is positive or negative for the phenotype, providing a treatment to the animal.
40. The method according to claim 39, wherein the disease state is positive or negative for tuberculosis and/or Paratuberculosis (Johne’s disease).
41. The method according to any of claims 39 or 40, wherein the at least one infrared spectrum is a mid-infrared spectrum.
42. The method according to any of claims 39 to 41, wherein detecting comprises comparing the at least one infrared spectrum to one or more reference infrared spectra.
43. The method according to claim 42, wherein the one or more reference infrared spectra comprises: at least one first reference infrared spectra obtained from a reference animal’s milk, wherein the reference animal is the same species as the test animal and the reference animal is labelled as positive for the phenotype based on phenotype data; and/or at least one second reference infrared spectra obtained from a reference animal’s milk, wherein the reference animal is the same species as the test animal and the reference animal is labelled as negative for the phenotype based on phenotype data.
44. The method according to any of claims 42 to 43, wherein the phenotype data comprises at least disease data.
45. The method according to any of claims 42 to 44, wherein comparing comprises statistical comparison of the at least one infrared spectrum to the one or more reference infrared spectra.
46. The method according to any of claims 39 to 45, wherein a trained machine learning model detects the one or more features and determines whether the animal is positive or negative for the phenotype.
47. The method according to any of claims 39 to 45, wherein the one or more features are determined by partial least squares regression, including partial least squares discriminant analysis (PLS-DA), C4.5 decision trees, naive Bayes, Bayesian network, logistic regression, support vector machine, random forest, rotation forest, a decision tree and/or a learned convolutional neural network.
48. The method according to any of claims 39 to 47, further comprising modifying the at least one infrared spectrum to create a modified infrared spectrum prior to determining; optionally wherein modifying comprises transforming, standardising, granulating and/or converting.
49. The method according to any of claims 42 to 48, wherein the at least one first reference infrared spectra and/or the at least one second reference infrared spectra are modified according to claim 48 prior to comparing.
50. The method according to any of claims 39 to 49, wherein at least one of the at least one infrared spectra, the at least one first reference infrared spectra and/or the at least one second reference infrared spectra comprises two combined infrared spectra, optionally wherein each infrared spectrum is obtained at a different time point.
51. The method according to any of claims 39 to 50, wherein providing the treatment to the animal comprises at least one of administering a drug therapy to the animal, euthanizing the animal, and/or isolating the animal.
52. The method according to any of claims 39 to 51, further comprising, responsive to determining whether the animal is positive or negative for the phenotype, isolating the animal and identifying and isolating all other animals that have been in contact with the animal.
53. The method according to any of claims 39 to 52, wherein the one or more features comprise waveforms and/or wavelength values of the at least one infrared spectra and/or the one or more reference infrared spectra.
54. The method according to any of claims 39 to 52, wherein the animal is a milk producing mammal, optionally wherein the animal is a bovine.
55. A method of predicting or determining a phenotype of an animal using a trained machine learning model, the method comprising: receiving an infrared spectra obtained from the animal’s milk; mapping the infrared spectra to a positive or negative phenotype; providing an output comprising the animal’s phenotype; and responsive to the output comprising the animal’s phenotype, providing a treatment to the animal.
56. The method of claim 55, wherein the trained machine learning model is a trained neural network or a trained decision tree.
57. The method according to claim 55 or 56, wherein the trained machine learning model is trained according to any of claims 13 to 33, optionally wherein the trained machine learning model is trained according to any of claims 13 or 18 to 25 and the phenotype is pregnancy status; or according to any of claims 13 to 17 and the phenotype is a disease state.
58. The method according to any of claims 55 to 57, further comprising, responsive to determining whether the animal is positive or negative for the phenotype, isolating the animal and identifying and isolating all other animals that have been in contact with the animal.
59. The method according to any of claims 57 or 58, wherein the one or more features comprise waveforms and/or wavelength values of the test infrared spectra and/or the infrared spectra.
60. The method according to any of claims 55 to 59, wherein providing the treatment to the animal comprises at least one of administering a drug therapy to the animal, euthanizing the animal, and/or isolating the animal.
61. The method according to any of claims 55 to 60, further comprising, responsive to determining whether the animal is positive or negative for the phenotype, isolating the animal and identifying and isolating all other animals that have been in contact with the animal.
62. The method according to any of claims 55 to 61, wherein the phenotype comprises pregnancy status and providing the treatment to the animal comprises at least one of inseminating the animal, euthanizing the animal, or separating the animal from a population of pregnant animals into a population of non-pregnant animals.
63. The method according to any of claims 55 to 59, wherein the animal is a milk producing mammal, optionally wherein the animal is a bovine.
PCT/GB2021/052135 (filed 2021-08-17; priority date 2021-08-17) - Methods of determining animal phenotypes - published as WO2023021262A1 (en)

Priority Applications (1)

Application Number: PCT/GB2021/052135 (WO2023021262A1, en)
Priority Date: 2021-08-17
Filing Date: 2021-08-17
Title: Methods of determining animal phenotypes


Publications (1)

Publication Number: WO2023021262A1

Family

ID=77543537


Country Status (1)

WO: WO2023021262A1 (en)




Legal Events

Code 121: EP - The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21762766; Country of ref document: EP; Kind code of ref document: A1)

Code NENP: Non-entry into the national phase (Ref country code: DE)