WO2018060967A1

WO2018060967A1 - Big data self-learning methodology for the accurate quantification and classification of spectral information under complex varlability and multi-scale interference

Info

Publication number: WO2018060967A1
Application number: PCT/IB2017/056039
Authority: WO
Inventors: Rui Miguel DA COSTA MARTINS
Original assignee: Inesc Tec - Instituto De Engenharia De Sistemas E Computadores, Tecnologia E Ciência
Priority date: 2016-09-29
Filing date: 2017-09-29
Publication date: 2018-04-05

Abstract

The present disclosure relates to a big data self-learning artificial intelligence methodology for the accurate quantification of metabolites classification of health conditions from spectral information, where complex biological variability and multi-scale spectral interference is present. In particular, this invention allows the breakdown of highly complex biological spectral signals into high dimensional feature space where local features of each sub-space are accurately correlated with both a specific metabolite concentration or a categorical condition. Such is achieved by a new self-learning method, that requires no human intervention. The developed artificial intelligence is able to establish its own knowledgebase when new data is fed by performing feature space transformations, searching directions of co-variance and optimizing local composition-spectral correlations. These methods allow the artificial intelligence to establish knowledge maps of both quantifications and classifications, that can be cashed for higher computational performance. In particular, direct search comprises of finding across the feature space data and dimensions that allow a direct linear correspondence between metabolic composition and spectral bands variance. Moreover, a similar approach is derived for defining the convex hull regions of different class of health conditions from body fluid spectra. Such results in the creation of knowledge maps for both quantification and classification. The present invention also allows to evaluate 'a priori' the predictability, accuracy and precision of new estimates. Furthermore, this invention provides a self-learning approach to de definition of the global feature space using big data, for its correct characterization under high variability, accurate detection of local anomalies, as well as, outliers that can contaminate the knowledge base. This invention is applicable to all regions of the electro-magnetic spectra used in spectroscopy analysis (x-ray, uv, vis, nir, ir, far-ir and microwaves), or with any other type of spectroscopy (absorvance, reflectance, fluorescence, phosphorescence, Raman scattering) where complex multi-scale interference and biological variability is present. It further extends to fields of non-destructive, non-invasive spectroscopy applications in fields such as healthcare, veterinary, biotechnology, pharmaceutical, food and agriculture.

Description

BIG DATA SELF-LEARNING METHODOLOGY FOR THE ACCURATE QUANTIFICATION AND CLASSIFICATION OF SPECTRAL INFORMATION UN DER COMPLEX VARIABILITY AND

MULTI-SCALE INTERFERENCE

FIELD OF THE INVENTION

[0001] The present disclosure relates to (lie field of big data machine learning (ML) and pattern recognition applied to the field of spectroscopy, for obtaining analytical grade quantification in biophotonic devices. In particular, this invention is applied to non-invasive and nnmmally invasive quantification and classification of clinical analysis parameters in body tissues and fluids, or^''In-νίνο', -with relevance for human and animal healthcare where significant biological variation and interference exists.

BACKGROUND ART

[0002] Current machine learning systems used in biophotonic systems present a series of technical limitations that do not allow them to produce analytical grade quantification in samples where highly complex variability and multi-scale interference exists (e.g. human blood, human tissues) .

[0003] The goal of this inventio is to disclose a big data ML methodology that -solves tins prior art technical limitation, and allows spectroscopy to be used as clinical analysis technology at the point-of- care or for self-testing.

SUMMARY

[0004] Herein is disclosed a big data self-learning methodology for the accurat quantification and classificatio of spec tral Information under complex variability and .multi-scale interference, comprising:

The correct sub-space identification of spectral features for the extraction of the local interference pattern, from which is able to sustain a statistically relevant co-variatio with sample composition, accurately relating spectral features to compound quantification

The same method -works with devices that use either reflectance, transmittance, Raman scattering or fluorescence spectroscopy.

On-demand sub-space identification and optimal local scale and feature extractio being able to produce for unknown samples (samples that do not belong to the knowledgebase), statistical metrics of predictability, accuracy and precision of quantification.

On-demand detection of outliers that can be predicted by the exiting knowledgebase. However if systematic variations are detected in unknown samples, the system uses this new- data for performing self-Je;vrni:tig and re-update the knowledgebase.

[0005 ] The self-learning methodology also includes: The accurate definition^' of the feature space for the correct statistical representation of both biological variability and multi-scale nature of the spectral signal;

Detection of uncovered knowledgebase space and incorporation of new representative data forperforni- ing accurate quantification in this sub- space,

Self -learning of new data to provide local metrics, such as, new sub-space predictive capacity;

Efficient definition of the local number of neighbors: with the correct co-variance eigenvector that is able to produce the prediction of an unknown sample.

10006] The self-learning methodology also includes:

The accurat definition of the feature space for the Correct statistical representation of both biological variab ility and multi-scale nature of the spectral signal for classification of sample and probabilistic diagnosis.

Detection of uncovered knowledgebase space and incorporation of new representative data for the 'classification of samples and probabilistic diagnosis.

Self- learning of new data to provide ne local metrics, such as, new sub-space predictive capacity for sample classification.

Efficient d efinition of the' local nuni ber of neighbors with the correct co- ariance eigenvector that is able to classify an unknown sample and provide a probabilistic diagnosis

[0007] The self-learnin methodology also includes:

Defining the local geometry and local sub-space identification by performing direct finding of coherent local co- ariance; and local optimization of the convex hull for optimal local model prediction;

Mapping of tire feature space for quantification, where the self -learning me thod maps the geometry path'Of co-variance between metabolic composition and spectral features;

Mapping the feature space for classification of diffent health conditions by defining the. convex hull of coherent directions with clas probabilit given by the logistic function;

Refinement of the local geome try of variation by minimizing the num ber of eigenvectors of co-variance and convex hull volume, orthogonal filtering and variables /samples optimization;

Providing metrics for sub-space characterization and management, so that the self-learning process can decide about learning or predicting of any new unknown recorded spectrum

BfflEF DESailPriON 01* THE DRAWINGS

[0008J For a more complete understanding of the invention, reference is made to the following description and accompanying drawings, in which:

[00091 PIG_* 1 Different Spectrum modes of variation quantifying the same molecule: {la) four node of variation mask the direction that quantifies the molecule; (lb) only one mode of variation is linear in this fea ture space transformation , represented by triangles and all other modes are non -linear; (1 c) Self - learning search space spawns in all directions of feature space when anew spectrum composition is to be predicted, but only one direction can hold quantifiation eigenvectors ie.g. grey samples): (Id) these samples correspond to the linear relationshi of samples in lire feature space given by triangles;

[0010] FIG, 2 Effects Of local clustering and prediction; (2a) models built using standard K-nearest neighbours lead to sub-optimal co-variance eigenvectors; (2b) optimal co-variance eigenvectors for unknown new spectra (samples 1 and 3); where as, sample 2 belong to an unknown group and cannot be predicted by the knowledgebase; (2c) optimal prediction using self-learning artificial intelligence where the co-variance eigenvectors are optimized; sample 2 can never be predicted using the existing knowledgebase;

[0011] FIG. 3 Local feature space search according to Algorithm 1: (3a) feature space with different classes represented by different symbols - it is quiet difficult to distinguish separate clusters of data, when big data is characterized a continuous distribution of classes is obtained; (3b) directional search - assigning the spectral projection into the feature- space, search directions, computing eigenvectors of eadi direction, determining new iteration search direction by computi g the optimal search vectors; and performing this cycle until directions are optimal, (3c) Optimization of the convex hull fay minimizing the volume search to obtain minimum bias-variance, that is, selecting the coherent samples that provide the best linear model.;

[0012] FIG. 4 Self-learning for quantification mapping according to Algorithm 2: (4a) feature space with different classes represented by different symbols; (4b) sequencial application of Algorithm 1 in order to obtain a continous quantification feature line; and (4c) resulting quantification map.;

[0013] FIG. 5 Self-learning classification map of spectral feature space: (5a) directional search of univariate class; (5b) directional search of exclusive class; (5c) directional search of multivariate/ complex class; and (5d) resulting classification map.;

[0014] FIG. 6 Illustration of the main steps fo finding the accurate dataset inside a searc direction to provide coherent quantificatio of classification models: 6a initial datasets ¾ to 5 ; 6b find data and variables that optimized the correlation; whereby a final optimal dataset X is related to Y6e die end result of the optimization, a linear model in the feature space,;

[0015] FIG. 7 illustration of how information complexity is locally reduced to allow coherent linear eigenvector extration; 7a to 7c optimization of data and variables are able to decrease the complexity of local geometry to tree relevant eigenvectors; step 7d is able to reduce to two eigenvectors by orthogonal filtering. Such allows to. develo models with solely die complementary information information between X and Y, minimizing the existence of systematic interference in the extracted models.;

[0016] FIG. 8 Self-learning cycle - how the system makes use of the learned feature space: new- data that does not exists in the knowledgebase enters .the self-learning cycle, where after being added to the quarantine database, can be used to learn uncharacterized -regions of the feature space. After model certification, quarantine data is added to the knowledgebase,;

[0017] FIG, 9 Self-learning the feature space transformation using multi-scale basis coinb iatorial reconstruction;

[0018] FIG. 10 Multi-scale self-learning algorithm detail: super-resolution and multiscale decomposition of feature space for performing self-learning;

[0019] PIG. 11 Self-learning cache: (11 a) procedure for using cached models for predicting new data and achieve optimal models that are cached in model knowledbase; (lib) procedure for non-existing cached model.;

[0020] F G. 12 Self-learning cached models management;

[0021] FIG. 13 Blood blind-test quantification predictions: (13a) Erytrpcytes■ large-scale PLS model; (13b) Erytrocytes big data self-learning method; (13c) Leucocytes large-scale PLS model; (13d) Erytrocytes big data self-learning method.:

[0022] FIG.1 Blood serum blind-test quantification predictions; (14a) Biiiirubin large-scale PLS model; (14b) Biiiirubin big data self-learning method; (I4c) Myoglobin large-scale PLS model; (14(1) Myoglobin big data self-learning method.;

(00231 FIG- 15 Self-learning parameter benchmarks: (15a) average error (%) of pis vs self-learnin method; and (15b) corresponding 'error reduction when using the self-learning method;; and

[0024] FIG. 16 Blood semm blind-test quantificationpredictions: (16a) Biiiirubin large-scale PLS model; (16b) Biiiirubin big data self-learning method; (16c) ^' Myoglobin large-scale PLS model; (16d) Myoglobin big data self-learning method..

BRIEF DESCRIPTION OF THE ALGORITHMS

[0025] Algorithm 1. Local geometry and sub-space identification.

[0026] Algorithm 2. Feature space mapping: building the quantification knowiedbase.

[0027] Algorithm 3. Feature s ace mapping: building the classification knowiedbase.

[0028] Algorithm 4, Characterization of the local geometry:

[0029] Algorithm 5, Sub-space optimization using orthogonal projection filtering.

BRIEF DESCRIPTION' OF THE TABLES [0030] Table 1. Metrics for the characterization of the feature space.

[0031 ] Table 2. Global PLS benchmark \¾ self-learning methodology in blood and blood serum samples.

[0032] Table 3. Classification benchmarks in blood and blood serum samples: true vs false -classifications.

GENERAL DESCRIPTION

[0033] Spectroscopy is an indirect measurement of metabolites, either for their ide tification or quantification. Each molecule or atom has a characteristic spectra! fingerprint obtained by absorvance or emission, reflectance, fluorescence, phosphorescence and Raman scattering; and band intensities are directly proportional to the specimens concentration [1, 2, 3, ]. In pure substances or in simple mixtures, the spectrum carries little interference. In these cases, specimens identification is directly applicable by band matching and intensity is proportions! to concentration [33.

[0034] In more complex mixtures, such as, -chemical or pharmaceutical products, the spectrum signal is _.the result of band interference of primary absorvance bands and overtones: resulting into a continuous spectra of overlapping bands. The increased interference between constituents, turns the possibilit of quantification by peak intensities difficult [i]. In this circumstances, metabolites must be quantified by their interference pattern [5, 6]. Moreover, dependind on the spectroscopy technology, band resolution and spectrum convolution is significantly different, and therefore, spectral information is locally distributed by the recurrent convolution of optical parts, leading in extreme cases of low quality, to a a hlglhy auto-correlated signal.

[0035] General linear models (GLM) are used to extract linear transformations between the interference pattern and compound quantification, hi particular, in chemometrics, the following GLM based on latent variables are models used; i) multivariate linear regression (MLR) [7] ; ii) principal component regression (PGR) [8]; in) partial least squares (PLS) [5, 6, 9j: iv) multivariate -curve resolution (MCR) [15]; v) independent component regression (ICR) [11]; vi) logistic regression [12, 13]; and vfi) support vector machines regression (SVMR) [14, 15, 16],

[0036] in particular, PLS has been extensively used in chemometrics. The success of the technique is due to finding the eigenvectors of the correlation matrix [¥¾, between the spectra fX) and composition 00. where the optimization procedures ...follows orthogonal deflation of these two matrices, and coefficients (β are determined by the external relationship given by the linear relationship Y - j3X [5 , 6] . The optimal β vector is obtained balancing the prediction error to a minimal with the number of used variables {bias vs variance) as in any other GLM approach [17, 18, 19], As the linear relationship is extracted using eigenvectors of recursive deflations, the method can use much less samples than variables, opposite to, least squares methods [5, 6|. However, extracting eigenvectors from small datasets results in high bias when hew; data is. predicted. In this reasoning, when big data is Modelled using PLS models, high- variance is expected. Therefore, traditional chemometics methodologies are only effective for datasets whith low spectral, variation, so that, statistically stable linear models are sustainable.

[0037] To try to overcome the bias -variance difficulty, PLS approache obtained significant improvements to deal ith spectral complexity and interference. The orthogonal PLS {OPLS} method [20] was introduced to remove from die optiiiiization systematic orthogonal effects to the quantification present in the spectra, such as in FTIR the effects of temperature and humidity, aiding further interpretation of signal parts [21]. A great effort was put into this technique to understand the nature of interferences and correctly select the number of latent variables used in the PLS model so that models hold acceptable low bia -variance in moderate complex samples and interpretation capacity [22, 23]. Despite these limitations, PLS is still one Of the most widely used methods, because the optimization procedure for extracting co-variance eigenvectors is extreme!ly efficient.

[0038] GLM fail accurate prediction when samples present comple composition. Such is the case of bodyfluids, such as blood, serum or urine. The composition of human and animal bodyfluids carries the complex metabolic information about the human body, such' as metabolites, proteins, genetic material and cells, but also the ones provoked by ^'infection^'s, allergies, poisoning or even medication; which all at ^'the same time contribute for the spectral fingerprint.

[0039] The spectrum carries both physical and chemical information, as well as, the complex interference pattern between their constituents. The quantum nature of spectroscopy means the information about any pure compound is widespread along different wavelengths at several scales of intensity. Due to the wave nature of light, superposition of. information -results -into constructive or destructive inter- ference between the sample constituents bands. Therefore, the observed variations bodyfluids is highly non-linear with local chaotic variations, that cannot simply be modeled by GLM

[0040] Machine learning algorithms such as artificial neural ne tworks (ANN), support vect or mach ines (SVMJ or more recently deep learning, which present complex model structures, have been applied to spectroscopy to model the non-linearities [24], In order to avoid developing a theoretical support for extracting spectral information from non-linear effects, many chemometritians tried to appl machine learning algorithms such as artificial neural networks, kernel methods and support vector machines. It was expected that more complex model structures could capture all the non-linearit and provide better predictions.

[0041 ] Artificial neural networks (ANN) capture non-linearity by constructing complex networks architecture models based on the logistic function for each neuron unit. As models have a significant number of layers, a ver significant number of parameters must be regressed by back-propagation [25]. A ANN architecture is data dependent, as new data is evaluated, the ANN architecture must be revised, and therefore, when applied to big data, it becomes computational resourceful and time consuming for spectroscopy applications.

[0042] Kernel methods arid support vector machines were developed to provide an efficient way of dealing with non-linearity by increasing the dimensions of the original data and forcing through transformation to be linearly separable or quantifiable [26] . Kernels are computationally efficient ways of mapping signal features into a linear feature space, as these compute the inner product of the transformati on, being it self a similarity measure in the feature space [27, 22]·. Afterwards, linear classifiers such as PLS [22] or support vectors [14, 28] can be used to either classify or quantify data. Least-squares support vector machines [15 , 28, 16] uses kernels and support vectors to fist classify the data in the feature space, and thereafter, perform linear regression of each class against the predictor of interest [24, 29] .

[0043] Kernel methods efficiency is limited to the linearity capacity of the feature mapping function. When taking into consideration big data and the complexity of spectroscopy information, finding the correct kernel is not easy, nor if exists any or many necessary transformations. In chemometrics, radial functions have been mostly used. Alth ugh .authors of benchmarks against previous "methods argue that least: squares support vector machines fLS-SVM) provide better results, the improvement is not significant for medical purposes [30, 31, 26]. There is no bibliographic evidence that this method can hold predictability once big data is tested as most of published benchmarks are tested with low number of amples (e.g. 200 to 500}.

[0044] State of the art machine learning is moving towards extreme leaning- machines -and deep ANN architectures [32] , This philosophy tries to mimic the brain function in terms of hierarchical representation of patterns, from a very fragmented lower level representing features of signals, to a more higher level representation, of both classification and quantification [33] . Deep ANN architectures provide state of the art classification of images, sounds and text interpretation [34, 35], In order to achieve this level of accuracy, big data is necessary to optimize both parameters- and architecture [36]; where a very significant computational cost is required. As. the architecture is monolithic, when a new or unknown data to the characteristic architecture is provided, predictions may suffer significant bias.

Previous methods present (lie following difficulties in spectroscopy modeling;

Big data spectral variability Deep ANN and non-linear SVM are complex function models that fit to all data, That is, a structured monolithic model is produced for a populatio of data. Such leads to highly complex architectures, which is not the best for big data in spectroscopy, due to the local chaotic nature of the signal Once a new sample with new local variations is .introduced, the model is unable to find the correct co-variance between spectral bands and composition. Furthermore, if not fed with a large amount of data, AN and SVN. are extremely exposed to significant bias in their predictions. These methods became only interesting' when the feature space is almost totally represented; which is hard, as biological variability is extreinelly vast.

Re-training computational cost As. ANN and SVM are global methods, once a new set of outliers are identified, the complex model structure must b e re-optimized. Once this is made with large database! significant computing resources must he used to re-compute model structure [35],

Outliers detection The complex structure of ANN and SVN- makes difficult to determine 'a priori' if a new spectra is an outlier. As there is no apparent law from ^'which to draw conclusions about, (lie predictability of any new result, knowing if a spectra measurement is an outlier is difficult. This is specially critical in medical, veterinary or even hazardous industrial processes where substances accurate quantification is paramount and prediction failure has disastrous consequences.

[0045] Therefore, information processing technologies that lake into account the systematic variation of optics and spectroscopy are more !ikely to solve the problem without. the need of high computational cost. For instance, local calibration approaches were developed to breakdown the global spectral variance into characteristic groups where variations are systematic [37] . in many cases, local approaches outperform AN and SVM [38], Techniques such as locally weighted partial least squares (LW-PLS) [39, 40], LOCAL [41], locally biased regression [42] and CARNAC [43] and local PLS modeling approaches [44] provided complexity reduction and stable calibrations.

[0046] One of the latest developments is the method 'spectral based learner' (SBL) used to model big data of NIK soil composition [45]. SBL is based on a knowledgebase constructed using optimized principal components (oPC) [46] , where the local calibration is obtained by using the oPC's dimensional distance neighbors (e.g. determined for instance by k-nearest neighbors or other distance metrics), determined by similarity in chemical com osition, using the root square mean difference (RSMD) of composition [45]. Local-sample selection is solely based on the effects of major components, that is, substances that influence, the spectral 'fingerprint with lower frequenc or signal baseline. SBL will always struggle to quantify lower concentrations, where the information is present in the high frequency of the spectrum fingerprint.

[0047] The present state-of-the-art approaches are unable to technically solve the complexity of spectrum quantification and provide the necessary accuracy and precision (bias and variance) to be used in. critical applications, such as, medicine. Correct medical decisions can only be supported by analytical grade data. The present disclosure presents a new methodology intended to overcome the. current technical difficulties of artificial intelligence and pattern recognition in spectroscopy, to provide accurate quantification -and classification of spectral samples under complex variability and multi-scale interference.

DETAILED DESCRIPTION

[0048] Obtaining accurate quantifications (y) from spectral knowledgebase (X) using a projection model /fX) becomes feasible if: i) the co-variance is stable (X^fY); ii) the variance of the spectral feature space is stable {X¾¾ iii) the bias-variance of the predicted f is low; ivj extracted eigenvectors, projections and coefficients are statistically coherent andinterpretable,

[0049] Globally stable X V and X'X across the feature space of big data biological spectra does not exists. Co-variance direction is non-linear and eigenvectors rotate across the feature space, depending on local characteristics. Given the unlimited number c>f possible observations of X, it becomes unfeasible to quantify a .new unknown spectrum X,_WJi? if the variance of the feature space is non-linear across the feature space,

[0050] Given such physical constrains, it's disclosed in this invention a totally new method regarding spectral prediction, based on the fact that any unknown spectra X,_!Bit,. should be consistent with the feature space at a local sub-space, so that, it can also hold consistent information In terms Of co-variance with compositional data {X^fY). The prediction of y is now a problem of finding a consistent sub-space of ¾ that holds correspondent information about Ύ, so'thatX'Y is consistent with X_new variation, producing a stable and reliable prediction. Moreover, only by ensuring that X'X.and Tf'X are locally coherent, allows us to know 'a priori' ^'if any unknow speetrax>_J(?!(, can be predicted based on previous knowledge.

[0051] One can consider that , there is no. 'it .priori' model to quantify substances for a given unknown Xne- ¾ ¾^' happens with previous methodologies (PGR, PLS, LS-SVM, ANN and Deep Learning}. It is possible t postulate (bat for any given X_{H W}, there will be subset of (lie knowledgebase feature space that will be able to sustain con istency to predict y. Therefore, once a new spectra is recorded, the AI must learn if any subspace in the kiiowledbase exists that allows to perform a correct prediction according to improve standards.

[0052] There are significant advantages of using subspace identification in spectroscopy; i) in er reter tion of the sub -space becomes feasible due to complexity reduction; fi) local independence of data rep- resentativity (nu mber of data), that is, predictions became not affected by more databeingin the kiiowledbase; iii) local multi-scale consistency ; iv) interpretation of bands used to perform the quantification; iv) better control of what bands are used in q uantification; v) spectral corrections are more accurate, as baseline, mie and Rayleigh scattering corrections enhance spectral bands variation if spectral variance is consistent; vi) feature, space, transformation ie.g, kernel, derivates, wavelets] between locally consistent; and vii) adaptability; as quantification is self-learned, local adaptation will always find the optimal set of spectra (X) in the knowledbase that provides the best prediction of y. Sub-sp ace identification allows the Al to self-learn and became independent from human supervision during model building.

Theore.ctical principle

[0053] The problem of quantification based on complex spectral information is empirically explained in Figure 1. Figure la shows a collection of spectra, where the composition is a mixture of the same components in different quantities. The substances present in the mixture are highly interferent between each other, making irnpossible to derive directly a peak correlation with concentration to provide a simple method of quantification. Nevertheless, there are four classes of spectral non-linear variation, that is, four modes that may provide direct correlation to the composition of a particular substance. Even under such a simple example, the use of a GLM (e.g. PLS) would provide the prediction of j> ith high variance. Moreover, if any mode of variation has lack of representation, providing a prediction for a ne spectra within this class, will inevitabl provide a high biased prediction. Figure l b shows why euclidean distances are not a good measure of spectral features nor able to correlate to composition. All four groups of variation of Figure la exhibit completely different non-!inesi projections that allow quantification that may not be linearly correlated to concentration. Spectroscopy quantification in .complex mixtures is non-linear search for co-variation projection of eigenvectors that locally produce minimal bias -variance of predictions.

[0054] Therefore, in order to provide accurate quantifications for a given X_ni>_H,, one must search across the feature space the set of neighbors that allow optimal projections, as presented in Figure Id,

[0055] Figure 2a shows that when using clustering techniques, such as, hierarchical clustering, k- nearest neighbors (KNN) used in the previous art. {spectral based learner), it will always result in sub- optimal projections with bias and possible outliers. Figure 2b shows the optimal projections for unknown spectra #1 and #3, which are under different: local co-variations, and. #2 is an outlier that cannot be pre- dieted by the knowledgebase.

[0050] Here in, it's disclosed a self-learning methodology for spectroscopy big data. The new method is able to find the coherent eigenvectors of quantification that sustai a consistent X' X and X'Y. The proposed self-learning does not produces a monolithic model. For each new data, the system .has to. leain the coherence of 'X and X^rY, to project X„_ew and estimate y, Moreover, if both X¾ and X'Yare locally coherent, the prediction problem of any X-n w can be estimated. ¾ priori^*. Metrics about the variance- covariance consistency allows to infere the local confidence in predictability.

[0057] The methodology comprises the following three major steps: i) local geometry and sub-space identification - where the local geometry of spe ctral information is extracted as a characteristic sub-space with characteristic eigenvectors that support local quantiflcation/classification;. ii] building the knowledgebase of non linear feature mapping . process by which applying recursive local geometry arid sub- space identification allow to build the artificial intelligence knowledgebase on non-linear mapping of spectral information; iii) local optimization of spectral information - process of local refining die quantification or classification by ininirmzing the local convex hull volume and prediction error by filtering out non-related information in both Y and X, or their correspondent feature space transformation K and F.

I. l ical Geometry and Sub-Space Identification

[0058] As presented previously, it's postulated that there is always a local direction clustering of data that Ls abl to sustain^' a coherent eigenvector's} of qualification. This local cluster represents a local mode of variation within the vast iion-liaear feature space. Therefore, let one consider then-dimensional feature space F, where the coordinates of the feature space are proportional to linear combinations of Spectral features, that are implicitly, correlated to the sample composition. Let 's also assume that discrete finite directions at a local point of the feature space, represent a mode of variation in the spectra, coherent with a local level of speciemeiis concentration. Such enables the possiblity of extracting a coherent eigenvector from local co-variance (X'Yj, between spectral data X and composition Y. Moreover, highly non-linear feature space can be locally characterized by a hyperdimensional polytope that has consistent directions of quantification, that is, all the data contained by its correspondent convex hull presents a mode where all spectra inside it follow a mode of variation that quantifies different: parameters with low bias-variance.

[0059] Therefore, the local geometry is directly related to composition at this local subspace, one can find an optimal eigenvector of quantification. The problem of big dat spectral quantification and classification is reduced the search of local geometry that ; i) minimizes the number of directions /dimensions of the polytope; ii) obtains a principal direction that minimizes bias- variance; and iii) minimizes the convex hull volume of the selected optimal directional pohtope; so that a direct linear model is applicable to thi finite space approximation.

[0060] Fi ures illustrates the problem in the feature space with a feature space map.. n this example, 2 -dimensional feature space is presented with highly no« - linear variation of different classes that occupy different regions of space. The (xmtinuous line represents a coherent co-variance feature with a specimen concentration, that is, along this line, is possible to find coherent eigenvectors of local X'Y in orde to produce low error estimates of y. Furthermore, ^'the line represents a self-learned characteristic of th feature space. For instance, every new spectra ¾_eii, that i projected into the vicinity of this line can be directly predicted by the self-learned subspace model.

[0061] The self-learning process focuses in searching the coherent polytopes subs paces that allow specimens quantification with low bias and variance, being illustrated by Figure 3, and with pseudocode of the process present in Algorithm I. Let One assume thai a new spectra x« is projected into the feature space. Once projected, the self-learning process has to find nearest neighbors that are within the

Step A. Direction finding

Objective: Find the 'minimum directions and local sub-space geometry ofx;.

1. Initialization: i) define a circle areaaraund xi projection with a radius of search; ii) define the number of directions; in) define the dimensions of each direction search.

2. Initial search: 1} determine the number of optimal eigenvectors and predict the errors of ideal models; ii) remove directions that are statistically inconsistent;

3« Prepare new iteration; i) inside the consistent directions, remove the worst contributions, removing search length or increasing in a articular way; ii) taking into consideration die extreme vertices's of each convex hull and x, , compute the new directions of search;

4. The search loop; i) determine the number of eigenvectors and predic tion error of the^' new directions; ii) eliminate the worst directions; iii) re -dimension the search by eliniinating the worst (smaller or larger) length, and increase or decrease the search length accordingly; iii) loop the previous operations until no statistically significant direction or dimension change occurs.

5, Output: Minimum number of feasible directions and convex hull volume of each direction. Step B. Optimization of the convex, hull

Objective: minimize the convex hull volume and prediction error

1. Initialization; Merge the previous output data into an initial cluster that defines the initial convex hull.

2. Define: i) the outer vertices's of me convex hull; ii) minimum and maximum moving boundary of the convex hull adaptive geometry.

3. Main loop; i) de termine model errors; ii) remove outliers; iii) define the new borders of the con vex hu ll by using simplex geometric optimization - for each outlier removed, move the boundary inwards; iy) compute the new convex hull. Do this cycle until no more outliers are found and mode] error is stable.

4. Output: optimal convex hull and local model prediction

[0062] At the end of this procedure, one expects to obtain the optimal geometry of data (hat is able do predict any new spectra X, . Mathematical and algorithmic details are presented in Algorithm 1.

IL Mapping the feature space - building the knowledgebase of quantification

[0063] Following a similar phyiosophy, one can recursivel map the self-learning process across all feature space. This mapping constitutes the global knowledbase of the big data spectral data feature space. Let's take into consideration all the steps in Algoritlun I and appiy it recursively across the feature space, following a stepwise sequential protocol as illustrated in Figure 4. The procedure Is as follows;

Objective: Sequentially mapping the geometry of co-variance in the feature space 1. Initialization: i) start at any given point of the feature sp ace; ii) define: search circle diameter, number of search directions and dimensions of search area:

2» Perform Algorithm 1: define the locariinear geometry of the convex hull for the spectra x_i:

3. Recursive mapping: select inside If re optimized convex hull of x₍;, a new data point ₍₊i and perform recursively Algorithm 1 until no more directions are feasible to be extracted to expand (lie convex hull,

4. Re-sample; Proceed to another ^'.uncovered location in the feature space

■5. Main loop: repeat operations 3 and 4 until a given ratio of coverage of the feature space volume is assured

6. Compilation: Compiie knowledgebase q antification paths in the feature space map by registe ing all model paths to be used as cached models

7. Output; Compiled mapping of cached models

Mathematical and algorithmic details of this procedure are presented in Algorithm 2.

[0064] The result of this process is the construction of .the .feature space quantification map in Figure 4d. The map constitutes the self ^learning process of die artificial intelligence methodology. The lines in the map represent coherent path of model local prediction, that ^'is, when a new spectra x^w i projected nearby the line convex hull, will likely follow a similar mode of variation of its neighbors inside the convex hull and can be predicted based on the data of the local model. The characterization of the feature space allows to: i. use cashed models to speed-up computing efficiency - if a new spectra is projected into the convex hull of a previous prediction line, the calculation can be performed directly with a cashed model, where calculations are direct;

ii. characterization of typical conditions that lead to different modes of quantiiication - many predictio lines provide metabolic information for different types of health conditions and their evolution;

iii. determine how well the information is represented across the feature space - only regions with sufficient data allow to produce correct quantifications and effective searches;

iv. provide a map for understanding spectral patterns along time, that is, an interpretation of spectral pattern recognition for the implementation of precision medicine;

v. provide the basis of a higher level artificial intelligence for condition diagnosis using non-supervised spectra) information, allowing the construction of a non-linear classification ma of complex and mufti factor health conditions.

111. Classification mapping

10065] The construction of a knowledgebase map of the classification of health conditions is an important tool for precision medicine. The capacity to diagnose conditions, such as, anemia, infections, throm- bocythemia, thrombotopenia, hepatic insuffici ncy, diabetes melfitus, acute myocardial infarction, renal dysfunction and inflammation, among o thers, as well as, their degree of severity is of exterme relevance. [0066] The presented self-learning artificial intelligence method was devoted to optimize directional search that allows the quantification of metabolites. Health conditions, however, are classified within characteristic intervals of metabolic composition,, that provide intervals of spectral variation in the spectral feature space, and not, solely directions of variation. Moreover, health conditions exhibit characteristic intervals of metabolic variation, as well as, under treatment, characteristic medication^' or other medical interventions (e,g, surgery) that significantly affects the spectral fingerprint, so that it corresponds to a particular group of health condition.

[0067] Healt conditions cluster as nonlinear subspaces with a characteristic convex hull geometry; where inside this multidimensional geometry, a health condition has different degrees of severity or characteristics: or under complex conditions where multiple organs are affected. The degree of severity, is observed as continuous non-linear■transitions in the feature space, not as isolated clusters.

[0068] The self -le rning artificial intelligence method for classification has as major objective of finding the class geometry in the feature space by: i) maximizing, the local volume of the class; ii) minimizing .the total volume across the feature space in the case of non-linear classes; and Hi) minimizing the error of class prediction; by delimiting the class boundary with relevant eigenvector variation. Furthermore, one can expect that many classes can be highly non-linear and extremely segmented throughout the feature space. Many classes can also have scattered clusters across the feature space, because other conditions are dominant of feature space variation..

[0069] Due t the complex classification of health conditions, and as many conditions are multivariate, the supervised clustering is devised into the following classes: 1} single univariate diagnosis - where the discrimination function is a single .parameter interval or a threshold: ii) exclusive univariate or multivariate diagnosis - where only isolated cases of each class in the feature space are identified without any overlapping with other classes; and iii) multivariate /complex diagnosis - where only overlapping of data from multiple conditions in the feature space are taken into consideration (see Figure 5),

[0070] The clustering criteria allows to characterize complex health, conditions and map them into the spectral feature space, constituting a classification knowledgebase. The following procedure was developed to build Use classification knowledge map:

Objective: Sequentially mapping the geometry of classes logis tic probability co-variance in the feature space.

1, Initialization; a) Define the clustering criteria: a) univariate classes; b} exclusive classes or c) multivariate classes; and d} each class threshold; b) Provide: supervision vectors or matrix S

2. Convex Hull determination: i) select the supervised data in the feature space; ii) find the max and min coordinates of the supervised data in the feature space; iii) select one of the vertices*; iv) define the size of the directional search box; iv} define- the volume increment criteria of the convex hull (<5 v) ,

4. Initial search; i). determine the number of eigenvectors and predict the errors of classification; ii) remove statistically inconsistent data; iii) if no relevant direction is found, re-shape the convex hull geometry by moving inwards to adf and perform step 4; repeat steps 1 and 2 until it stabilizes. 5. Determine clusters boundary: For each clyster, perform Algorithm 2, where the supervision vector s or matrix S is the logistic function probability of the cluster class associated to the corresponding spectra. See Figure 5 how Algorithm 2 is: used to find the cluster predictive convex hulls.

6. Compilation: Compile knowledgebase classification clusters in the feature space map by registering their convex hulls

7. Output: Compiled mapping of cached ciusters

Mathematical and algorithmic details of this rocedure are presented in Algorithm 3.

10071] At the end of this procedure, the complete cluster map of the feature space is recorded as a classification knowledgebase. The full composite of all types of classifications for different conditions represent the classification complexity of the knowledgebase, where interactions between conditions, as- well as, their metabolic causes cart be studied. By projecting a new spectra into this classification map, one can predict the expected probability of a correspondent condition based on die coordinates of the knowledgebase map.

IV, The geometry of local Variation

[0072] Previous sections explained how the Ai of this invention is able to provide quantification and classification by recurring to search algorithms across the feature space using co-variance eigenvector extraction, providing maps of quantification and classification. The study of local geometr of variation lies at tire heart of_. the AI invention.

[0073] Let's consider that any collection of spectra X and compositional data ¥ can be transformed (e.g. kernel, derivative, Fourier, wavelets, curvelets) into the feature space P and , respectively. We must find a basis W and C, so (hat, the covariance between local latent variance of F and Κ,Τ atid lJ are maximized.

[0074] The problem is reduced to the local optimization in the feature space of: ( , cj = argmax(t'u) (I) where: f ^' = tw^f; and k = uc^f and subjected to: ' = I and c*c = I. By applying the Lagrangian multipliers method to solve the^' optimization problem, one resumes it to: F = ∑c^f m which is the singular value decomposition ofK^rF, where w = W[l, ] c = C[l,], with associated variance ∑[!, ], One can further conclude that F^KKF'w = Aw arid K^PF^Kc = Ae, Therefore, w and c are characteristic eigenvectors of^'Co«'(F, )² = CoffK, F) , expressed in die laten space ί'Ίι, where wand c spawn a characteristic dimension of the co-variance geometry.

[0075] The same kind of derivation is feasible assuming; /{w,.c) ^::: argmaxtt i) when t = u, being particularly useful, because, after deflation t becomes a orthogonal. This assumption, is also the basis of other eigenvector extraction algorithms [47].

[0076] In order to study the geometry of t'u, an ortho-normal basis of eigenvectors w and c is necessary, so that, for each local F one can derive It's local cliaraeteristic dimensions and geometry. Such i achieved by deflation of F and K: (3)

(4) where: tt ~ F^Wf .-Uj = j C,-, and Wf = wj/|jw,-ij, C; = Cj /i jCj l i.

Recurrent deflations until the maximum rank of F allow to determine the geometry of co -variance and its complexity,, by interpreting t* , , and their corresponding importance in relation to the captured co- variance∑ for each eigenvector [48, 4.9],

[00771 When using the approach where t is orthogonal, deflation is performed as follows:

F_/÷i = F_i - ti _i (5)

(6) where p and q are determined by:

[0078] From the relation of p and Q, one can derive a direct linear model, sucha as R - F 5_P/_S + e, where:

where S_{p S} is the pis regression coefficients,

[007&] The fact that a complex geometry of T is condensed into a oblique projection _pi_s [6] , and a GLM is produced, is the cause of PLS inefficiency in big data, especially if a relative high number of dimensions or components are used due to a non-linear feature space, 'Therefore, this strategy iniplies that the local structure of K'Fhas almost only systematic information abotit Y contained In , with almost no random effects at different scales of the spectroscopy signal,

[0080] Moreover, (lie correct feature space transformation is the -one^' that makes possible to obtain similar information structures of F' F and K'F, so that ideally:

(F'F - Afivf = Q (11) (Κ^ι¥~ λ^}ν^ = {1 {12) the local optimization problem remains posed as J(w, c) = argmax u) but with the ideal restriction of ^'information structure being similar {v_¾- ~ v* ~ v^f). In perfect conditions, spectral information shares coilinear eigenvector structure with ^'the composition, such as it happens with pure compounds or substances with negligible interference. Thus, providing co-variation maximization in the first component is paramount.

[0081] A way to provide the AI the interpretation to oth taiidw, is to make both paiiwise orthogonal. We can orthogonalized t using the pis definition^' that produces orthogonal w;

(13)

where Ύ₀ - US and W* ~ (WV)* [50, 51]. There is a direct correspondence to the orthogonal scores T_B, by

T„,, ^ T{P^!W)^{_ i} {16)

And therefore; tiie AI has a way to perform pattern analysis in orthogonal T_w, or T₍, with corresponding orthogonal Win order to derive the coherence of ^'the local subspace and models.

Respecting the following feature space internal relationship: t-i - T_w/3r (17) given by least- squares estimates;

So that any feature space sample projection into T_w follows a coherent linear interference pattern for the local relevant eigenstructure. Therefore, any given new data projected into T_w subspace must be contained in the^' confidence intervals of ts = T_wj8y.

[0082] The complexity of a local dataset can be reduced by refining both samples and variables, as shown in Figure 6. Within the local direction selection of Algorithm 1 one has to find groups of samples ami variables that provide a consistent eigenstructure. Therefore, a fitness function that translates local stability of co-variance is also proposed as the. optimization procedure, opposed to the simple use of cross-validation, of residuals. The local optimization problem has the following properties: i) blindly select the number of .starling, datasets; if) perform relevant PLS regression for each dataset; iii) project into die scores space { = FWi; iv) use robust linear regression to identify eigeratrutitur.es inside T (e.g. RANSAQ; v) re-do the procedure until coherence is achieved. However, the existence of a linear model in the T scores space means that deflation is modelling systematic variation within all the data used to build, the local linear PLS model,

[0083] -The consistenc of each covariance direction is determined by cross-validation (e.g. leave-n- ouf), where ail data points must sustain die local eigenstructure. For any new unknow data, the prediction is performed as follows: i) determine the consistency of predictions in selected subspace; ii) low bias- ariance^' of all training set; iii) low bias- variance of prediction for the unknown data ^'using the different data in the selected datasei; iv) presence in the linearity of the τ eigenstrucliire. Moreover, die predictability of any new unknown data can be obtained by deriving the confidence intervals of the extracted linear eigenstructure in T, so that a p-vaiue of tire prediction is also forecasted. p-Values above pre-defined, thresholds can be considered categorical or unpredictable. The AI has the possibilit 'a priori' of knowing if a prediction with the necessary accuracy, because it only uses well^' known coherent eigeiiStructured data to perform predictions.

[0084] To provide the correct local optimization model oiie must: i) minimize the residuals and their deflation structure; ii) maximize the number of sampies and minimizing the number of variables within the same co-variance; iii) ensure coherence inside (lie T space; and iv) eigenstructure similarity between F and K. Taking all theses objectives into consideration, we can express them .with, .the following optimization function as:

/ - argmin PRESS x —■+ ^{rhe ,<irs} ₊ l^ _{+ p} ^„ _J/a;_u J | (_1¾ n Uvars C.01.'(V,W) J J

which can b e regarded as a 'corrected' PRESS (Predicted Error Sum of Squares), where: npc is the number of components, n is the number of data, n_{fe vars}-is the number of selected variables of F from a total of variables: c ii-^f{V, ) is the covariance of F*F and K*F eigenvectors, arid p-value the probability value of the leas t squares model in the T space.

[0085] At the end of this optimization, optimal local models are obtained. Further model refinement is performed by orthogonal information filtering.

V. Coherence of the local sub-space

[0086J The coherence of the local feature space F is ensured by i) eigenstructure similarity between F and K; ii) lo ^■■complexity; and iii) information deterntiiiism. F and K have similar eigenstructure when: j ^ argm-axiV Nf} {20) where: F - IfpSpV^ = T_FV _;; nd K= UA¾V^ = ¾V .

There is an ^'efficient transformation of X and Y, where F = f ) K ~ ^"(¥)» s that ideally Τ¾· = T . As spectral information is multi-scale, one can propose the following multi-scale optimization of signal basis (e.g. Fourier, wavelets, eurvelets):

T= I<? i21) i-l

where i ¾ are the selected individual signal scales so that V Vp is maximized.

[0087] Eigenstructure of K' F is of extreme importance. Complexity of F ca be estimated by the distribution of its eigenvalues∑, which define die characteristic dimensions of the feature space_* in spectroscopy signals, one expects∑ to decrease exponentially to a limit value:

∑^■:=∑_f + (∑i -∑,·) x e~^u {22) where∑< is the expected <th eigenvalue,∑,· the residual eigenvalue,∑i the largest eigenvalue, .a¾d A ^'.the d ecay factor. Local complexity of the feature space can be measured by (lie following metrics: I n_pc

C (23) k n

When k,^■→· +oo implies that C— 0. Such limit is asymptotically approximated when ii_pe -> 1 /\ « » n_fic. When∑,■—♦· 0, K^fF is. rank deficient.

Randomness of eigenstructure is obtained by randomization of F, K and K'F [52], Row randomization allows to determine the limit of sample spectra that determine the numbe of eigenvectors that spawn the row vectors; where as, column randomization determines the limit of eigenvectors that allow variables spawn of column vectors. Statistical stability of the number of eigenvectors is provided by cross- validation.

VI. Sub- space Information Optimization

[0088] Given the previous procedures, the selected direction of the feature ^"space already provides a sta- b!elinear model, it is further expected, that the minimal number of eigenvectors are necessary to predict Y. Despite the signal being pre-processed {e.g. baseline, scattering effects, stray light) and transformed to a better feature space basis, there will be always systematic interference in the data. Such interferences affect the scores -loading (t-p) relationship beyond the first component. Therefore, ideally, ail GLM should be obtained with only one eigenvector. Such is not always feasible, but, one can greatly simplify model relationships by orthogonal filtering.

[0089J F and may have systematic information tha t is not related to each other. Therefore, one must know how the local information is structured, that it how much information F and K hold in common, and how much is independent by performing orthogonal filtering (20, 21, 22]

[0090] Figure 7 shows how orthogonal filtering results into lower complex eigenstructure. Figure 7a to 7c shows how the previous steps optimized the local calibration dataset using XI, X2 and X3 subsets, greatly reducing the complexity of the linear model to only three predicting latent variables. Such result still shows that XI, X2 and X3 subsets have systematic interference and can be subjected to orthogonal filtering, so that;

P ^ TP* - T_;,P' (24)

K ~ TQ' +V,_lP^t _tl

Where T are scores with common information between F and that maximize co-variance, and P, Q the correspondiiig loadings. % and U₀ the orthogonal scores to the covariance and P₀, Q₀ the corresponding orthogonal loadings; that is, T₀± and U₀_LF [21].

[0091] By recursive selection of samples and variables, one can maximize TP* {Figure 7c to 7d), where the number of latent variables is minimized to the optimal lower level near one, that is, no deflation- is necessary, and exists a direct correspondence between F and K. its expected that die correct feature space transformation leads Τ_ήΡ£ -» 0 and that F = TP^f as obtained by regular PLS algoritiiins.

[0092] Similarly, 0«P would be zero. Any quantification with analytical grade quality should not have any systematic variation, orthogonal to its quantification. When υ_βΡή is significant, it means that tire AI cannot be properly trained to provide an accurate prediction, as the original training information suffers of systematic errors. Under the correct conditions, U₀P₀ ^! -» 0 and TP' 3 ^'T₀P£, so that: K = P/J^.

[0093] IfT₀P is sigmficant, the feature space transformation w s inefficient, in these cases, prediction is performed by applying the orthogonal filter first, t₀ = ¾>ο ρ£ρ«_> and ί™,_τ = f- t_epj, and k = ΐ ,η-βρί,·:- For completeness, the method is described in Algorithm 5,

VII, Metrics for sub-space characterization

[0094] One of the main advantage of the proposed approach is the possibility of characterization of the self-learned knowledgebase by incorporation oimaps of local learning metrics, such as: i) number of data representation; ii) eigenstructure complexity; iii) coUinearity between F K; iv) predicted sum of squares (PRESS); vj variance of ΚΊ-'; and vi) model information structure. Detailed metrics of the knowledgebase are presented in Table 1,

[009$] By characterizing the feature space, the AI system manages both seif-leaning and prediction by knowing how the different regions of die feature space cover the quantification and qualification accurately

VIII. Self- learning mechanics

[0096] The previous sections demonstrate tire algorithms and algebraic procedures f the self-learning methodology. Herein, its provided how the procedures are put together into a system that auto -implements its self- learning from feed data without human intervention, so drat it can: i) learn autonomously by data feed from zero data to vast quantities of data in big data spectroscopy data; ii) determine the best multi- scale feature space that best captures co-variance; iii) predict new unknown data based on the knowledgebase and how it handles unpredictable data; iv) self-learn building the quantification and classification maps, and uses them to perform computationally efficient predictions and learning f om new data.

[0097] Biological variability in body-fluids and body tissues is extremely vast, In big data, One may never determine the meaning of a representative sample to build a robust knowledgebase to be able to build a monolithic model strategy that copes with all the possible spectral combinations. Moreover, biological systems evolve, and their biochemistry is always changing, new cells, new proteins, new metabolites. Therefore, spectroscopy Ai applied to biological systems must always self-learn. The developed system is able to self-learn from an initial, very limited, knowledgebase, b constantly adding new data that the sy tem cannot predict. The system begins by computing me feature space and the initial knowledgebase by using the inetrics and methods of the previous sections. By managing die predictability of each sub -space of the feature space, die system can sustain if die new aquired data is eidier predictable or should enter a learning cycle. If cannot be predicted, data is added to a quarantine database, that acts as a vault repository of data that either has no neighonrs (e.g. in the begining, any system will never cover all the feature space) or consistent modelling. The gathered data in the quaiai.it.ine database passes only to the knowledgebase once new gathered data completes the corresponding area of the feature space allowing the development of a coherent sub -space knowledgebase.

[0090] Figure 8 shows the main mechanics of the self-iearning process. Let's consider that the system is inicially fed with a limited number of pair .of spectra and corresponding composition {X and Y). A any newX is recorded, it's projected into the initial feature space map and tested if it belongs to existing kiiowledbase. IF the projection -is within the vicinity of an existing model path, and a direct prediction using existing cached model, a prediction is formulated. If the prediction is not within die tire expected quality, and if there exists neigbours that make it possible, a new model is built by the Algorith 1, a prediction is performed and the corresponfing model and path obtained by Algorith 2 are cashed.

[0099 J When any new spectra is projected into the feature space and has no neigbours, its immediatly quarantined. The system enters into the learning cycle and asks the user or system to provide the composition of the sample to be quarantined. Once it has the pair X and ¥, it searches for quarantine neighbours., f f it has no neigbours, die data just stays quarentiried , If it has neigbours, the learning process begins using Algorithms 1 and 2 fo searching both local models and build the local co -variance map. Only vvhen^' a new data in conjunction with the quarantined data are able to produce a consistent local model and model path, the data is certified to pass into the knowledgebase. The knowledgebase receives constant updates as new data is added, and predictions are extended to new regions of the feature space.

[0100 ] In this sense, the s :ys tern: i) never produces redictions that are iiotwithin the knowledgebase ; ii) maintains and studies the quarantine database; iii) validates quarantine data to pass into certified knovvi- edbase; iv). only uses certified data to build the knowledgebase and predictions; vj self-learns without human intervention; vi) independent of^' the data size, growing the knowledgebase with fed data. Moreover, this approach does not need large scale databases for starting building the knowledgebase and performing predictions, such as deep learning neural etworks. The system only uses the certified knowledgebase, and therefore, predictions do not suffer from bias, as other modelling approaches would, because they need significant amount of data t o produce a globally stable model architecture. Co-variance, classification maps and cached models make the system very computationally efficient. I^'he system asms any spectrometer into an operating independent machine that does not need human intervention to build mathematical models as today's previous-art systems.

(0101] Finding the correct basis of transformation of X into F and Y into Klies at the heart of building a comprehensive feature space where local linear models are extrac ted as presented in section V, The basic principal for feature space transformation is the maximization of the eigenstructure similarity between F and as presented in equations 20 and 21. If a base transformation is able to filter systematic variance unrelated between X and Y and noise, the eigenstructure of P and became equal,

[0102] Figure 9 Shows how the feature space transformation i performed. Any spectroscopy signal is decomposed into a ormonorrnal basis (e.g. Fourier, wavelets, curvelets). These basis provide an independent basis to re-construct scale of the signal based on the basis properties. If present, the information about any metabolite is scattered across different scales of the spectrum, and therefore, the optimal spectral variation for a particular molecule has to be extracted from the original signal using scale reconstruction.

10103] After- full spectra decomposition, one must find the .optimal basis that provides tire combinations that maximize the eigenstructure similarity between F and K, where, P = TfV^ and K = T&V ; and V V is maximal. Under perfect match of information, T/ - ¾ = T, so that both F and K have the same eigenstructure. Note that, under N1PALS PLS, the assumption forehand that maximizes correlation the non-transformed X and Y it that part of the information structure has the same eigenstructure, tha t is the same scores T. Here in, we first build a feature space that will allow scores similarity assump tion, greatly contributing for the sucess of the presented invention. Its expected that once the feature space transformatio is acheived that a direct linear relationship between K and F exists: K = ¥β. Therefore* what we prove is that PLS or SVM type of assumptions are only possible if the eigenstructure of K and E are similar. Otherwise, systematic information will contaminate die scores inner relationship assumption and support vector coherence.

[0104] Therefore, let's consider that one decomposes into an orthonormal basis μ both K and F:

F = O_f (26) κ == υ,.» {27} where there is a combination of U/ and ¾ that minimizes fee error { } of tfj- - ΙΙ / β + e, where β - POf&f) ^{~ l}Vtf Ut.- The problem of finding maximal similarity of eigenstructure is an optimization problem Of findin the best linear combinations of Vtj and t¾ that maximize^' the coin n information between F and K, defining autonouiy the feature! space transformation. [01051 B performing this transformation, most of the unrelated systematic and random components of the spectra and composition are eliminated. The system self -learns how to extract the best combinations of μ that quantify a particular metabolite by evolutionary algorithms, such as, simplex, particle swarn optimization and genetic algOritms. Once a feature space transformation is learned for a particular sub-space, the system does not need to re-caleulate, but uses the transformation directly to produce a predicti n.

[0106] Figure 9 shows the iluxogram of feature space transformation where; i) the original signals are decomposed; ii) me iniciai estimate of best basis are estimated by linear regression; and iii) the basis combination is optimized by evolutionary methodologies. If a combination of basis is found so that the eigenstrueiure criteria is met, the information about the transformatino is cached and used in future preditions as the feature space transformation for building tire feature space.

IX, Cashed models, Co-variance and clijssific tiori maps

[0107] Using cached models, co -variance and classification maps are paramount for computationally efficient self -learning artificial inteliigeiicej leading to significant savings iri computational resources. Figure 13 shows how cached models are used to speed up predictions. Once a new spectra is recorded, it's projected into the feature space and checked for a model path nearby. If so, the prediction is performed by using methods in section IV; and once any new spect ais recorded the following actions are performed: i) is a cached model is able to perform the predictions accurately, the result is presented to the end user; ii) i f neighbor models are able to present boundary threshold quality predictions, the system can provide a consensus prediction before computing a new model and updating the knowledgebase; and iii.) if neighbor models do not provide sufficient quality predictions, a new search for a local model is performed, deploying a new model path in the knowledbase.

RESULTS AND DISCUSSION

Q lific tion

[0108] Herein, a demonstration the. effectiveness of the self-learning artificial intelligence method is perfomed, b benchmarking the prediction of unknown blood and blood serum samples. Results are compared to the state-of-the-art of chernometrics partial least squares (PLS) global model to provide a simple base of comparison to the previous art. The global PLS was obtained balancing between bias- variance by cross-validation to derive the minimum number of eigenvectors, or latent variables (LV's). Whole- blood, arid serum unknown sample predictions were analyzed intertns of; i) model complexity; ii) average error of prediction (%); and iii) co-linearity - Pearson correlation {if},

[0109] Figure 13 exemplifies why PLS cannot cope with the complexity of a biological fluid, such as blood. Despite erythrocytes are the major cellular component of blood, and directly related to hemoglobin content, it could be expected that a Unear model, would be sufiieht to predict accurately the amount of erythrocytes cells. Figure 13a shows exactly the opposite, that erythrocytes spectral quantification is highly non-linear affected by significant interferences, so that, a PLS model shows very high variance and significant bias at high erythrocytes count {e.g. > 5 - 10¹² cells/ L), Interferences are expressed in the? LV's of the PLS model. Such means mat non-linear iterative least squares had to deflate 7 eigenvectors to find a common direction in data that quantifies erythrocytes count. This large variance means that even major components exhibit complex spectral patterns, that once reduced to linear quantification, significant prediction bias is obtained {11,50%, Table 1). General linear models struggle to acheive analytical grade prediction in healthcare. [01501 Figure 13c shows the PLS prediction for leukocytes. Leukocytes are present in biood in lower concentrations than erythrocytes, but are still, a significant proportion of the cellular component. The difference in magnitude is .enough to show that it is not possible to predict leukocytes with PLS. Results of Figure 1 lc ^'show dial predictions have a very significant variance and large bias. PLS. could Only provide a model with 27% error (ff² = 0.45}, with a large number of LV's (10), showing the significant amount of spectral interference affects the leucocytes quantification in hole blood.

[0111] Erythrocytes and leukocytes are a good example on how the self -learning method handles the complexit of spectral information to provide an accurate prediction based on local multi-scale modeling. Figures lib and lid present the self-learning artificial intelligence results for erythrocytes and leukocytes, respectively. Both parameters exhibit very low variance and bias, -allowing medical grade quantification for diagnosis, with only 2.4% and 5.15% error and very significant correlations (See Table 1).

[0112] Most importantly is the complexi ty reduction of both models to only one IV. The self-learning artificial intelligence was able to find local multi-scale linear relationships, filter variables and samples, so that, a direct correspondence between spectral information and quantification was found, filtering-out the complex interference effects in biological samples.

[0113 ] Table 1 resumes the quantification results for whole blood and blood serum parameters. Hemogra parameters such - s- erythrocytes, hemoglobin, hematocrit, MCV, MCHC, leukocytes and platelets. Results how a very significant improvement by the self-learning methodology where all parameter estimates exhibit errors below 6% within tire studied range.

[0114] Figure 14 shows the results for bilirubin and myoglobin quantification in blood serum. Bilirubin is a significant constituent of blood serum, with yellow-brown coloration. Myoglobin is present in- lower quantities, but when present, its spectral fingerprint is very significant in blood serum in the vis-nir region. Therefore, it would also be expected that both, molecules could be be linearly quantified by a PLS model. Results in Figure 12 a and 12c show that bilirubin and myoglobin PLS prediction exhibits very significant variance, -with .errors of 12,5% and 31.0%, respectively. Despite thes molecules provide a very strong fingerprint in the spectral signal, they still suffer significant interference.

[0115] The most relevant result is the fact that bias-variance is significantly reduced when using the self-learning- rtificial intelligence method. Most models decrease in complexity, and all parameters that are presented in higher concentrations only use 1 eigenvector projection {one LV), The proposed method was able to find a local mufti- scale spectral information linearly correlated with molecular quantification, in tiiis sense, all the studied hemogram parameters (erythrocytes, hemoglobin, hematocrit, MCV, MCHC, leukocytes and platelets} were able to attai analytical grade quality with. bias below 6%.

[0116] Similar conclusions were obtained for blood serum, where high concentration parameter such as bilirubin or high absorvance such as myoglobin are directly quantifiable usin only 1 IX Other lower concentration parameters, such as, glucose, creatinine, GR¾ triglycerides, urea and uric acid, greatly reduced their model complexity to 2 to 3 LV¾. Such is an indication that lower concentration parameters suffer more interferences and local variations, as well as, their accuracy starts to be affected by die detector b ckground noise.

[0117] Figure 15 presents the benchmark of PLS v Self -learning artificial intelligence. PLS modeling could onl sustain POC qualitative quantification lor: erythrocytes, hemoglobin, MCV MCHC, platelets, bilirubin and CRE The error of these parameters are .around 7% to 12%. All other parameters estimated using PLS modeling did not met the 15% error criteria for POC (see Figure 15).

[0118] Se!f -leartiing AI wa able to attain medical analytical grade quality in the following parameters; erythrocytes, hemoglobin, hematocrit, MCV, MCHC, leukocytes, platelets, bilirubin, glucose, myoglobin, CRR triglycerides and uric acid. Only creatinine and urea quantification were above the 5% limit, but di d qualified for POC qualitative analysis.

[0119] The proposed self-learning artificial intelligence method greatly solves the previous technical barriers presented in .the background art, allowing spectroscopy to attain analytical grade errors.

Classification

[0120] Herein its also demonstrated the effectiveness of the proposed self-learning method for classification of known health conditions, such as: anemia, leukocytosis, thrOmbotopenia, thrOmbocyfliemia, hepatic insufficiency, diabetes mellitus, acute myocardial infarction, renal dysfunction and inflammation,

[0121] The classification of these conditions was perfpnned according to the diagnosis cut-off values: 0 anemia - erythrocytes count levels below 4 · I Q^U I L and hemoglobin levels below 13 g/dL: ii) leukocytosi - leukocytes levels above i0^{i u}/.L; iii) throrabotopeiiia - platelets levels belo 100 · 10^s/£; iv) thrombo- cythemia - platelets levels above 400 · 10⁹ I; vj hepatic insufficiency - bilirubin levels above 1,2 rng/dl; vi) diabetes mellitus · glucose levels above 100 mg/dl; vii) acute myocardial infarction - myoglobin levels above 147 ng/mi; viii) renal dysfunction - creatinine levels above 1.3 mg/ml; fx) infiamatioii - C-reactive protein levels above 2.0 mg/dl.

[0122] Table 2 presents the classification results for the presented conditions, in term of true and false, positive and negative combinations, respectively. Results show that self-learning classification is superior to a linear classifier, logistic PLS. This is especially signi ficant for conditions where tire cut-off value for diagnosis is at low concentrations, such as for thrombotopenia, or for conditions that suffer complex interferences, such as infections with high levels of leukocytes {leukocytosis}. The global PLS model is only able to sustain point- of -care {15% error of classification) for anemia, thrombocythemia and acute myocardial infarction. Most parameters exhibit levels of 50% to 80% chance of correct diagnosis, and therefore .using linear classifiers proves to be very limited for classification of health conditions,

[0123] Self-learning^' method was able always perform above 85% chances of correct diagnosis. The Self-learning method was able to correctly diagnose 100% of the cases of anemia, thrombocythemia and acute myocardial infarction. Conditions, such as, leukocytosis, diabetes mellitus and hepatic function also attain near complete correct classification (97% chance of being correct), Such, is because, value that are miss classified are near the cut-off, and the laboratory emir was not taken into the account in die Classification methodology. If one takes it into consideration, with an error margin of S%, tiiese conditions are also 100% classified, thrombotopenia and renal dysfunction have classification rate of 87% and 89%, respectively (see Table 2). Such result was expected, as platelets and creatinine values are signMcaritly low for their signal information in .the spectra (e.g. creatinine has 14% of prediction using self-learning, see Table 1}. Nevertheless, tire two conditions are below the 15 classification error, i t is to be appreciated that certain embodiments of the disclosure as described herein may ^'be incorporated as ?ode {e.g., a software algorithm or program) residing in firmware^' aiid/or on computer useable medium havin control logic for enabling execution on a computer system having a computer processor, such as any of the servers -described herein . Such a computer system typically includes memory storage configured to provide output from execution of the code which configures a processor in accordance with the execution. Cited documents:

HI DA Burns and EM. Ciurczak. Handbook of near infrared analysis. Marcel Dekker, liie, New^"Y¾rk, 2nd .edition,

2001.

2] Howell Edwards Ian R. Lewis. Handbook of Raman Spectroscopy: From the Research laboratory to the Process Line. Marcell Dekker, eiv York, 2001.

IS) P. Misra and MA DubinskiL Ultraviolet Spectroscopy And Uv Lasers. Marcell Dekker, ew¾rk, 2002.

[4] L. Marcu, P.M.W French, and D.S, Elson. Fluorescence Lifetime Spectroscopy and Imaging: Principles and Applications in Biomedical Diagnostics, CRC Press, Boca Raton, Florida, 2014.

[5] E Geladi andB. Kowalsky. Partial least squares regression: a tutorial Analytical Chemical Acta, 185:1-17, 1986;

[6] A. Phatak and S. Jong. The geometry of {partial leas t squares. Journal of Chemometrics, 11:311-338, 1997.

[7] J, Neter, M.H, umer, CJ, Naehtsrieine, and W. Wasserman, Applied Linear Statistical Models. IRWIN, Chicago, 4 edition, 19S6.

[8] M. Sutter, J,H. Kalivas, and Patrick ML. Which principal components^' to utilize for principal component regression. Journal of Che mo metrics, 6(43:217-225, 1992,

[9] E Bouveresse arid D,L, Massart. Improvement of the piece ise direct standardisation procedure for the transfer of iiir spectra for. multivariate calibration. Chemometrics and LtelUgent Laboratory Systems, 32(2):201-213, 1996,

[10] M. Sutter, J.H. Kalivas, and EM. Lang. Multivariate curve resolution: A review of advanced and tailored applications and challenges. Anefytka C imica Acta, 765:28-36, 2013.

[11] X. Shao, W. Wanga, Z. Hou, and W Cai. A new regression method based on independent component analysis.

Takmta, 69(33:67f-^'680, 2006.

[12] S,K. Shevade and S.S. Keerth. A simple and efficient algorithm for gene selection using sparseiogisticregressiori.

Biolnformatics, 19(Π}:2246-2253, 2003.

113] K, Kim and S, Lee. Logistic regression classification by principal component, selection. Communications for Statistical Applications and Methods, 21^'(l):61-68, 014,

[14] J..{Biawne-Taylor and . Gristianini. An introduction- to support vector machines. Cambridge University Press, Cambridge, Ufc, 2000.

[15] W. Wang and Z. Xu, A heuristic, trainning fo -support vector regression. Neurocomputittg, 61:259-275, 2004.

[16] D. Basak, S. Pal, and D.C. Patranabis. Support vector regression, ^'f ural. Information .Proc kt& .11 {10):203-224·,

2007.

[17] W, Kennard nd L, Stone. Computer aided design of experiments. Techturm ttics, 11:137-148, 1969.

[18] B. K _.). Manly, Randomization,- Bootstrap and Monte Carlo Methods in Biology.- Chapman & Hall, London, England, 2nd edition, 1998.

[19] H. ^'Trevor, J. Robert, and F Jerome. The Elements of Statisti l Learning - Date Mining, Inference, ami Prediction.

Springer- Ve lag, New York, USA, 2009.

[20] J. Trygg nd S. Wold. Orthogonal projections to latent structures {o-pls}. Journal of Chemometrics, 16:1 13-128, 2002. 121] J. liygg and S, Wold. 02-pls, a two-block (xaASy) latent variable regression (Ivr) method with an integral Osc filter. Journal ofOtemometrks, l?{l):53-64, 2003.

[22] Ά Bylesjo, M. Saiitalainen, J.K. Nicholson, Έ. Holmes, and J. Trygg. K-opI$ package: kernel-based orthogonal projections to latent structures for prediction and interpretation in feature space. BMC Bioinform tics, 9:106, 2008.

[23] I-I. Stenlund, E Johansson, J. Gottfries, and J. Trygg. Unlocking interpretation in near infrared multivariate calibrations by orthogonal partial least squares. Analytical Chemistry, 81 (!)^•.203-209, 2009.

[24] H, Lt, X Liang, and Q. Xu. Support vecto machines and its applications in chemistry, Chemornetrics and Intelligent Mbomt ry Systems, 95:188-198, 2009.

[251 CQ. Loo.ney. Pattern Recognition Using Neural Networks: Theory and Algorithms for ^'Engineers and Scientists, Oxford University Press, Oxford, UK, 1997.

[26] O. Devos, C, Ruekebuscha, A, Duranda, I.. Dupoachel, and I.P. Huvennea. Support vector machines (svtn) in near infrared fnir) spectroscopy: Focus ^'on parameters optimization and model interpretation. Chemornetric mid Intelligent Laboratory Systems, 96:27aA$33, 2009.

[27] ; Cristia^'nini and J. Shavwie-Tayior, Kernel methods -for pattern analysis. Cambridge University Press, Cambridge, Uk 2004.

[28] M-W. Chang arid CI. Lin, I.eave one out bounds for snppor vector regression se!eeion. Neural Computation, 17:1118-1222, 2005.

[29] J. Luts, F. OJeda, R, Van_^de Pias, B. De-Moor, S. Van-Huflei, and J,A, Stiykens, A tutorial on support vector machine-based methods for .'.classification, problems in chemornetrics. Analytical Chimk Acta, 665:129-145, 2010.

[30] U. Thisseri, Mill. Pepers, B. Ustun, WJ. Meissen, and Buy dens L.M.C. Comparing suppo t vector machines- o pis for spectral regression applications, Chemornetrics and Intelligent Laboratory Systems, ?3:169-i i), 2010.

[31] F. Chauchard, .fi. Cogdili, S. Roussel, J. . Soger, and V. Bellon-Maurel. Application of ls-svm to non-linear phenomena in {MIR} s ectroscopy: development of a robust and portable sensor for acidity prediction in grapes. Chemornetrics and Intelligent laboratory Systems, 7.1 {2)1141-150, 2004.

|32] Yi Bengio,. Learning deep architectures fbrai. Foundations and TmndsAS in ^'Machine Learning, 2(1):1-127, 2009.

[33] G. I-Hnton, 1. Deng, D a, A.R. Mohamed, M. Fairly, A. Senior, V. Vanhoucke, P Nguyen, T.S.G Dabl, and B. Kingsbury. Deep neural networks for acoustic modeling in speech, recognition. IEEE Signal Processing Magazine, 29(6}:82-97, 2009.

[34] J. Schmldhuber, Deep learning in neural networks: An. overview. Neural Networks, 61:87 -117, 2013.

[35] G, Huang, G..B. Huaii, S. Sting, and.K. You. Trends in extfeitie leartriTig inaciiineS;: A review. Neural ^'- Networks, 61 :32a AS48, 2013.

[36] W. Yu,^'E Zhuang, Q. He, and Z. Shi. Learning deep representations via extreme learning machines. Ne rocom- puting, i49:308aAS315, 2013.

[37] L, Ramirez-Lopez, T. Behrensa, K. Schmidt, A. Stevens, JA-M.. Dematte_., and X ScboUen. The_.speetrurn-based learner. A new local approach for modeling soil vis-hir spectra of Complex datasets. Geoderma, .195 - 196:268 - 279, 2013.

[38] D. Sotomatine, Maskey, M., and Shrestha. D.L. Instance-based learning compared to other data-driven methods in hydrological forecasting* I-tydml. Process, 22:275 - 287, 2008.

[39] T. Naes, T. Isaksson, and B. Kowaiski, Locally Weighted regression and scatter correction for near-infrared reflectance data..Anal Chem., 62f7}:fJ64- 673, 1990.

[40] CD. Christ and S.A. Dyer. Estimation of soii properties rising a combination o f spectral and scalar sensor data.

In instrumentation -and-Mea remmt Technology Conference, 2006, !MTC 2006, Proceedings of the IEEE, 24-27 April 2006, Sorrento, Italy, 2006. 141] J.S. Shenk, M.Q. Westerhaus, and 9. Beraaghi. local prediction with near infrared multi-product databases. Journal of Near Infrared Spectroscopy, 5:223-232, 1997.

[42] Feam and'A.M.C. Davies. Locaiiy-biased ^'regression. Journal of Near Infrared, ll{6):467- 78, 2003.

[43] A.M.C D^'ayies and T. Fearn, Quantitative analysis via near infrared databases: comparison- analysis using re.- structui¾d nearest infrared and constituent daia-deitx ^'{carnac-dt}. Journal of Near Infrared, 14{6):403—41 i, 2003.

{44] E Goge^", R. Joffre, C, Joiivet, I. Ross, and L, Banjard. Optimization criteria in sample selection step of local regression for quantitative analysis of large soil nirs database. Chetn&tmii s mid Intelligent Laboratory Systems, ΠΟ{Ι):168-176, 2012.

[45] L Ramirez-Lopez, T. Behrerts, K, Schmidt, A. Stevens, J.A.M, Demattfi, and T. Schoiten, The spectrum-based learner: a new local approach for ^'modelling soil vis-hir spectra of complex datasets. Geoderma, 195-196:268- 279, 2013.

[46] L. Ramirez- Lopez, T. Behrerts, K, Schmidt. SA Viscarra Rossel, .(AM, Demattg, arid X Schoiien, Distance and similarity-search metrics for use with soil vis nir spectra. Geodmna, 199:43-53, 2013.

[47] U.G. indahi. The geometry of plsl explained properly: 10 key notes on mathematical properties of and some alternative algorithmic approaches to plsl modelling. Journal ofChernormtrics, 24:188-180, 2014,

[48] R.J, Peii, L.S. Ramos, and R, Marine. The model space in partial least squares regression. Journal ofCherrtamet- rics, 21:165-172, 2007,

[49] S. Wold, M. Hoyc, H. Martens, J. Trygg, E Westade, J. MacGregor, and B.M. Wise. The pis mode! space revisited.

journal of Chemomelrics, 23:67-68, 2009.

[50] . Ergon. Finding y- relevant: part of x by use of per and plsr mode I reduction me thods. Journal afChemometrics,, 21:537-346, 200?.

[51] R. Ergon. Re- interpretation of nipais results solves plsr inconsistency problem. Journal of Osmometries, 23:7.2- 75, 2009.

[52] It C. Martins, V. V. Lopes, K Valentao, J,C. ,E Carvaiho, P. Isabel, M. T. Amaral, , T. Batista, R B. Andrade, and B.M. Silva. Iteievant rindp lCampoaent anal st applied.to^' tlie liafac erisaticm ofporttiguese lieatlier honey. Natural Product ' Research, 22:1560-1582, 2007.

Claims

C L A I M S

1. Method for artificial intelligence self-learning for quantification of metabolites from spectral information across a big data feature space, comprising:

defining a robust feature space transformation at a local scale and its corresponding local sub-space;

finding a direction of co-variance at the local sub-space and defining sub-space geometry;

optimizing of the convex hull geometry of the local feature space to produce local models.

2. Method according to the previous claim, for mapping co-variance in the feature space, comprising:

search directional finding and local co-variance optimization for quantification from spectra of said spectral information;

determining recursively paths of optimal covariance along the feature space;

mapping optimal co-variance paths across the feature space, with recursive path determination;

using path models as a knowledge base, and corresponding cached models to perform direct metabolite quantification.

3. Method according to any previous claim, for defining the geometry of the local co- variance sub-space of the feature space, comprising:

maximizing eigen-structure similarity and corresponding latent projections of spectra and metabolic quantification;

extracting robust minimal eigenvectors combinations of local models to obtain near direct correlation, such that model complexity and consequently bias- variance is lowered;

determining optimal samples, scales and variable defining the optimal local sub- space using multi-factor optimization of the prediction error, latent variable ratio, eigenvectors of local feature space, and their corresponding co-variance, and probability of model fit;

determining consistency of systematic variation of a local subspace present in its eigenvectors by a score relationship linear model.

4. Method according to the previous claim, for determining the predictability of a new unknown spectrum, comprises analyzing the projection of said new unknown spectrum into the local sub-space score relationship linear model, where a p-value of the prediction corresponds to a distance position to an expected linear trend and corresponding confidence interval.

5. Method according to claim 3 or 4, comprises determining the coherence of the local sub-space from the structure of extracted co-variance eigenvectors, systematic non-related information and error filtering, minimizing the number of latent variables and local model complexity.

6. Method according to any previous claim, comprises sub-space optimization and complexity reduction by local geometry orthogonal filtering of both compositional and spectral data, such that a more direct correlation is obtained between both compositional and spectral data.

7. Method according to any previous claim, for determining optimal multi-scale transformation to both composition and spectral data such that the direct relationship between both is maximized, comprises filtering out scale and frequency combinations that have non-related systematic variation.

8. Method according to any previous claim, for performing classification of health conditions by searching and delimiting non-linear groups in the spectral feature space, comprising:

defining a cluster criterium of univariate, multivariate or exclusive class;

determining convex hulls across the feature space and their individuality; determining iteratively a class boundary using iterative logistic partial least squares path modelling;

compiling the mapped clusters from said class boundaries.

9. Method according to any previous claim, comprising recursive cluster classification for delineating the classification knowledge base map, comprising:

observing and determining the self-learned knowledge about the composite of all trained types of classes;

determining the complexity of the classification knowledge base;

observing interactions between conditions and evolution of complex conditions; predicting the health status of a complex condition by projection into the knowledge map.

10. Method according to any previous claim, comprising calculating the predictability of the classification based on local scores projection into the linear model of scores, where:

inside the boundary of the cluster all data is linearly classified;

all the data inside a convex hull follows the scores linear classifier;

determination of the probability of near class border is classified by the linear model of scores method.

11. Method according to any previous claim, for updating knowledge base from a data feed, comprises:

new data, from the data feed, not having neighbours is placed into a quarantine database;

when stored new data completes a quarantine local dataset, it is flagged for local model building;

if a local quarantine model exists, the knowledge base is updated with the local model and corresponding data;

a new path is added to the co-variance map based on the updated knowledge base; a new cluster or clusters are added to the category map based on the updated knowledge base.

12. Method according to any previous claim, for self-analysing a new unknown spectra projected into the knowledge base for its corresponding predictability, comprising: determine the predictability of quantification or classification;

if the probability of quantification or classification is lower than a predetermined threshold, the new unknown spectra data is sent to a quarantine database and lab analysis are flagged to be ordered for training the unknown sub-space.

13. Method according to any previous claim, comprising storing uncovered regions of the feature space in a quarantine database while the uncovered regions of the feature space do not have enough characterization to establish predictable local models and mapping of co-variance and classification.

14. Method according to any previous claim, wherein input spectra are transformed into super resolution using deconvolution prior to basis transformation.

15. Method according to any previous claim, for analysis of nearby path model projection in co-variance maps:

if a new spectra is within the geometry of local model, a quantification prediction is directly performed;

if a new spectra is nearby a path model and cached models are near the limit of predictability, performing a consensus prediction and afterwards updating the knowledge base with a new model;

if no model in the vicinity is able to produce an accurate quantification prediction, a new model is computed, added to the knowledgebase and a quantification prediction is output.

16. Method according to any previous claim, for self-learning outlier data, comprising: placing said data into quarantine database; every new spectrum that cannot be predicted by the knowledge base, is added to the quarantine database and the subspaces of the quarantine are analyzed;

if a quantifiable mode is found, the data and self-learned information is passed into the knowledge base to provide new accurate predictions in this identified sub-space.

17. Method according to any previous claim, for providing both analytical and point- of-care grade quantification of clinical analysis parameters, applicable to both healthcare or veterinary practice, in particular to one or more of the following in blood or blood serum: erythrocytes, hemoglobin, hematocrit, MCV, MCHC, leukocytes, platelets, billirubin, glucose, myoglobin, creatinine, C-reactive protein, triglycerides, urea and uric acid.

18. Method according to any previous claim, for providing point-of-care probabilistic health condition diagnosis to one or more of the following conditions: anemia, leukocytosis, thrombotopenia, thrombocythemia, hepatic insufficiency, diabetes mellitus, acute myocardial infarction, renal dysfunction and inflammation.

19. System for artificial intelligence self-learning for quantification and classification of spectral information configured to carry out the method of any of the previous claims and comprising a spectroscopy device data feed.

20. System according to the previous claim for quantification and classification of metabolites from biological samples, such as blood or blood serum, comprising: a spectrometer and spectrometer operating system for acquisition of spectra, sending and receiving results;

a cloud database system for storing raw spectral data, processed spectral data and results;

a cloud based self-learning artificial intelligence engine, to process data accordingly; a cloud server based interface for communicating data and results with devices and client applications.

21. System according to claim 19 or 20, configured such data is recorded by an independent spectrometer and recorded data is sent to a cloud server system where storage and calculations are able to performed.

22. System according to any of the claims 19-21, comprising a cloud server system comprising:

raw spectra database;

feature space, co-variance map and classification map;

pre-processing and feature space transformations;

processed spectra, intermediate results and metrics;

cached quantification and classification models.

23. Non-transitory storage media including program instructions for implementing a method for artificial intelligence self-learning for quantification of metabolites from spectral information across a big data feature space, the program instructions including instructions executable to carry out the method of any of the claims 1-18.