WO2019194693A1 - Spectrophotometry method and device for predicting a quantification of a constituent from a sample - Google Patents

Spectrophotometry method and device for predicting a quantification of a constituent from a sample

Info

Publication number
WO2019194693A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
constituent
quantification
quantified
sample points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/PT2018/050012
Other languages
English (en)
French (fr)
Inventor
Rui Miguel DA COSTA MARTINS
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
INESC TEC Instituto de Engenharia de Sistemas e Computadores Tecnologia e Ciencia
Original Assignee
INESC TEC Instituto de Engenharia de Sistemas e Computadores Tecnologia e Ciencia
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by INESC TEC Instituto de Engenharia de Sistemas e Computadores Tecnologia e Ciencia
Priority to PCT/PT2018/050012 (WO2019194693A1)
Priority to CN201880092068.3A (CN111989747A)
Priority to US17/043,481 (US20210020276A1)
Priority to JP2020554164A (JP7273844B2)
Priority to EP18724345.6A (EP3776561B1)
Publication of WO2019194693A1
Anticipated expiration
Legal status: Ceased

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/20 Identification of molecular entities, parts thereof or of chemical compositions
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/27 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands using photo-electric detection; circuits for computing concentration
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/483 Physical analysis of biological material
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/40 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01J MEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
    • G01J3/00 Spectrometry; Spectrophotometry; Monochromators; Measuring colours
    • G01J3/28 Investigating the spectrum
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/328 Management therefor
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics

Definitions

  • The present disclosure relates to a spectrophotometry method and device for predicting a quantification of a constituent from a sample to be quantified.
  • Spectroscopy is an indirect measurement of metabolites, either for their identification or quantification.
  • Each molecule or atom has a characteristic spectral fingerprint obtained by absorbance or emission, reflectance, fluorescence, phosphorescence and Raman scattering; and band intensities are directly proportional to the specimen's concentration.
  • the spectrum signal is the result of band interference of primary absorbance bands and overtones, resulting in a continuous spectrum of overlapping bands.
  • the increased interference between constituents makes quantification by peak intensities difficult.
  • metabolites should preferably be quantified by their interference pattern (Geladi and Kowalski:1986, Phatak and Jong:1997).
  • Big data spectral variability - Deep ANN and non-linear SVM are complex function models that fit to all data; that is, a structured monolithic model is produced for a population of data.
  • the model is unable to find the correct co-variance between spectral bands and composition.
  • ANN and SVM are extremely exposed to significant bias in their predictions.
  • LW-PLS locally weighted partial least squares
  • SBL spectral based learner
  • oPC optimized principal components
  • RSMD root square mean difference
  • the present state-of-the-art approaches are unable to technically solve the complexity of spectrum quantification and provide the accuracy and precision (bias and variance) needed in critical applications such as medicine. Correct medical decisions can only be supported by analytical grade data.
  • the present disclosure presents a method and device intended to overcome the mentioned technical problems and the current technical difficulties of artificial intelligence and pattern recognition in spectroscopy, to provide accurate quantification and
  • This disclosure relates to a big data self-learning artificial intelligence method and device for the accurate quantification of metabolites and classification of health conditions from spectral information, where complex biological variability and multi-scale spectral interference are present.
  • this disclosure allows the breakdown of highly complex biological spectral signals into a high dimensional feature space where local features of each sub-space are accurately correlated with either a specific metabolite concentration or a categorical condition. This is achieved by a self-learning method that requires no human intervention.
  • the developed artificial intelligence is able to establish its own knowledgebase when new data is fed by performing feature space transformations, searching directions of co-variance and optimizing local composition-spectral correlations.
  • the present disclosure also allows evaluating a priori the predictability, accuracy and precision of new estimates. Furthermore, this disclosure provides a self-learning approach to the definition of the global feature space using big data, for its correct characterization under high variability and for accurate detection of local anomalies as well as outliers that can contaminate the knowledgebase.
  • This disclosure is applicable to all regions of the electromagnetic spectrum used in spectroscopy analysis (x-ray, uv, vis, nir, ir, far-ir and microwaves), and to any other type of spectroscopy (absorbance, reflectance, fluorescence, phosphorescence, Raman scattering) where complex multi-scale interference and biological variability are present. It further extends to non-destructive, non-invasive spectroscopy applications in fields such as healthcare, veterinary, biotechnology, pharmaceutical, food and agriculture.
  • An embodiment comprises, for determining the predictability of quantification of the constituent of the sample to be quantified, by:
  • An embodiment comprises, if the minimum of neighbouring sample points does not exist, the steps of:
  • An embodiment comprises:
  • the subset allows predictability, releasing the subset from the quarantine database for use in the present spectrophotometry method for predicting a quantification of a constituent.
  • An embodiment comprises, for determining if a subset of the accumulated spectra and measured quantifications allows predictability of quantification of the constituent of the sample to be quantified:
  • said released selected neighbouring sample points constitute a local model.
  • selecting the minimum of neighbouring sample points from sample points within said feature space comprises the steps of:
  • each directional search volume being defined as a region of the feature space that includes said projected spectrum sample point, that extends along a search direction by a predetermined search radius distance from said projected sample point, and that extends from said search direction by a predetermined search width distance;
  • each said model is calculated by selecting a dimension subset from the dimensions of the feature space, said model being calculated using the projected sample points within the directional search volume corresponding to the search direction such that covariance is maximised;
  • selecting the search direction that has the corresponding prediction model that has a maximum predictability of quantification of the constituent to be quantified using the projected sample points within a selected directional search volume corresponding to the selected search direction as the selected minimum of neighbouring sample points.
  • each directional search volume is defined as a region of the feature space that originates from said projected spectrum sample point.
  • An embodiment comprises:
  • minimizing the selected directional search volume by reducing the predetermined search width distance such that the predictability of quantification of the constituent to be quantified is maximized by the model calculated by the selected dimension subset and calculated using the projected sample points within the directional search volume being minimized.
  • the prediction model of each search direction is calculated by:
  • the calculated multivariate linear prediction model joined with the projection defined by the predetermined vector basis provides an interpretive correlation between an input spectrum and constituent quantification.
  • each said directional search volume is a multidimensional box or cylinder defined by a predetermined distance from a line defined by the respective search direction and the projected spectrum sample point.
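The cylinder variant of a directional search volume can be illustrated with a short membership test. This is a minimal sketch, not the patented implementation: it assumes the volume starts at the projected sample point, and the names `in_directional_volume`, `select_neighbours`, `radius` (search radius distance) and `width` (search width distance) are hypothetical.

```python
import math

def in_directional_volume(point, origin, direction, radius, width):
    """True if `point` lies inside a cylinder that starts at `origin`,
    runs `radius` units along `direction`, and extends `width` from that axis."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    rel = [p - o for p, o in zip(point, origin)]
    # Signed length of the projection onto the search direction.
    along = sum(r * u for r, u in zip(rel, unit))
    if along < 0 or along > radius:
        return False
    # Squared perpendicular distance from the search axis.
    perp_sq = sum(r * r for r in rel) - along * along
    return math.sqrt(max(perp_sq, 0.0)) <= width

def select_neighbours(points, origin, direction, radius, width):
    """Keep only the projected sample points that fall inside the volume."""
    return [p for p in points
            if in_directional_volume(p, origin, direction, radius, width)]

# Hypothetical 2-D feature space: only the first point lies in the volume.
points = [(1.0, 0.1), (3.0, 0.0), (0.5, 1.0), (-0.5, 0.0)]
selected = select_neighbours(points, origin=(0.0, 0.0),
                             direction=(1.0, 0.0), radius=2.0, width=0.5)
```

A box-shaped volume would replace the perpendicular-distance check with a per-axis bound; the choice between the two is left open by the text.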
  • An embodiment comprises:
  • An embodiment comprises repeating a selection of the minimum of neighbouring sample points from sample points within said feature space, by the steps of:
  • each directional search volume being defined as a region of the feature space that originates from the end of the predetermined search radius distance along the selected search direction of the previously selected directional search volume;
  • each said model is calculated by selecting a dimension subset from the dimensions of the feature space, said model being calculated using the projected sample points within said directional search volume corresponding to the search direction;
  • An embodiment comprises repeating the steps above until a predetermined criterion is reached in respect of covariance of the projected spectrum of the sample to be quantified together with the projected spectra of the selected neighbouring sample points.
  • An embodiment comprises:
  • said path model is cached for subsequent predictions of constituent quantification without recalculating prediction models.
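Caching a path model so that later predictions skip model fitting might look like the sketch below. It is illustrative only: a path is simplified to a set of anchor points in the feature space, each carrying a linear model, and `PathModelCache`, `max_distance` and the linear-model form are assumptions rather than details from the disclosure.

```python
import math

class PathModelCache:
    """Stores linear models keyed by an anchor point in the feature space."""

    def __init__(self, max_distance):
        # Largest distance at which a cached model is still considered usable.
        self.max_distance = max_distance
        self._entries = []  # (anchor, coefficients, intercept)

    def store(self, anchor, coefficients, intercept):
        self._entries.append((tuple(anchor), tuple(coefficients), intercept))

    def lookup(self, point):
        """Return the nearest cached model within max_distance, or None."""
        best, best_dist = None, self.max_distance
        for anchor, coefficients, intercept in self._entries:
            dist = math.dist(anchor, point)
            if dist <= best_dist:
                best, best_dist = (coefficients, intercept), dist
        return best

    def predict(self, point):
        """Predict directly from a cached model; None means no cache hit."""
        model = self.lookup(point)
        if model is None:
            return None
        coefficients, intercept = model
        return intercept + sum(c * x for c, x in zip(coefficients, point))

cache = PathModelCache(max_distance=1.0)
cache.store(anchor=(0.0, 0.0), coefficients=(2.0, 0.0), intercept=1.0)
hit = cache.predict((0.5, 0.0))   # served from the cached model
miss = cache.predict((5.0, 5.0))  # too far from any cached path
```

A cache miss would hand the sample back to the local model search rather than produce a prediction.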
  • the quantification to be predicted is a logistic function of a class to be determined from a constituent or constituents from the sample.
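For the classification case, the logistic function maps a linear score onto a class probability. The following is a minimal sketch assuming a linear combination of feature-space scores; `class_probability`, `coefficients` and `intercept` are hypothetical names.

```python
import math

def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def class_probability(scores, coefficients, intercept):
    """Probability of class membership from projected scores (assumed linear model)."""
    z = intercept + sum(c * s for c, s in zip(coefficients, scores))
    return logistic(z)

p = class_probability(scores=[1.0, 2.0], coefficients=[0.5, -0.25], intercept=0.0)
```

The two-branch form avoids overflow in `math.exp` for large negative scores.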
  • the predetermined search radius distance, the predetermined search width distance, and the number of the plurality of search directions in the feature space are determined using an iterative optimization method, in particular a simplex algorithm.
  • selecting a minimum of neighbouring sample points from sample points within said feature space comprises selecting a minimum number of sample points above a predetermined threshold of covariance.
  • the predetermined vector basis is an orthogonal information-preserving decomposition into constituent functions or into a matricial factor decomposition, in particular singular-value decomposition (SVD), Fourier transform, wavelets, or curvelets.
  • the pre-processing of the obtained spectra comprises deconvolution and/or resolution enhancing of said spectra.
  • the sample is biological and the constituent or constituents are biological metabolites, in particular blood metabolites.
  • Non-transitory storage media including program instructions for implementing a spectrophotometry method for predicting a quantification of a constituent from a sample to be quantified, the program instructions including instructions executable to carry out the method of any of the disclosed embodiments.
  • a spectrophotometry device for predicting a quantification of a constituent from a sample to be quantified, the device comprising an electronic data processor configured for carrying out the spectrophotometry method of any of the disclosed embodiments.
  • An embodiment comprises a spectrophotometer and non-transitory storage media including program instructions for implementing said spectrophotometry method.
  • the prediction of Y is now a problem of finding a consistent sub-space of $X^{T}X$ that holds corresponding information about Y, so that $X^{T}Y$ is consistent with the variation of $X_{new}$, producing a stable and reliable prediction. Moreover, only by ensuring that $X^{T}X$ and $X^{T}Y$ are locally coherent can one know a priori whether any unknown spectrum $X_{new}$ can be predicted based on previous knowledge.
  • There are four classes of spectral non-linear variation, that is, four modes that may provide direct correlation to the composition of a particular substance. Even in such a simple example, the use of a GLM (e.g. PLS) would provide the prediction of Y with high variance. Moreover, if any mode of variation lacks representation, a prediction for a new spectrum within this class will inevitably be highly biased.
  • Figure 1b shows why Euclidean distances are neither a good measure of spectral features nor able to correlate to composition. All four groups of variation of Figure 1a exhibit completely different non-linear projections that allow quantification that may not be linearly correlated to concentration. Spectroscopy quantification in complex mixtures is a non-linear search for the co-variation projection of eigenvectors that locally produces minimal bias-variance of predictions.
  • Figure 2a shows that clustering techniques, such as hierarchical clustering or k-nearest neighbors (KNN) as used in the prior art (spectral based learner), always result in sub-optimal projections with bias and possible outliers.
  • Figure 2b shows the optimal projections for unknown spectra #1 and #3, which are under different local co-variations, and #2 is an outlier that cannot be predicted by the knowledge base.
  • KNN k-nearest neighbors
  • the new method is able to find the coherent eigenvectors of quantification that sustain a consistent $X^{T}X$ and $X^{T}Y$.
  • the proposed self-learning does not produce a monolithic model. For each new data, the system has to learn the coherence of $X^{T}X$
  • Embodiments of the disclosed method comprise the following three major steps: i) local geometry and sub-space identification - where the local geometry of spectral information is extracted as a characteristic sub-space with characteristic eigenvectors that support local quantification/classification; ii) building the knowledgebase of non-linear feature mapping - the process by which recursive local geometry and subspace identification builds the artificial intelligence knowledgebase on non-linear mapping of spectral information; iii) local optimization of spectral information - the process of locally refining the quantification or classification by minimizing the local convex hull volume and prediction error, filtering out non-related information in both Y and X, or their corresponding feature space transformations K and F.
  • the problem of big data spectral quantification and classification is reduced to the search for a local geometry that: i) minimizes the number of directions/dimensions of the polytope; ii) obtains a principal direction that minimizes bias-variance; and iii) minimizes the convex hull volume of the selected optimal directional polytope; so that a direct linear model is applicable to this finite space approximation.
  • Figure 3 illustrates the problem in the feature space with a feature space map.
  • a 2-dimensional feature space is presented with highly non-linear variation of different classes that occupy different regions of space.
  • the continuous line represents a coherent co-variance feature with a specimen concentration; that is, along this line, it is possible to find coherent eigenvectors of the local $X^{T}Y$ in order to produce low error estimates of Y.
  • the line represents a self-learned characteristic of the feature space. For instance, every new spectrum $X_{new}$ that is projected into the vicinity of this line can be directly predicted by the self-learned subspace model.
  • the self-learning process focuses on searching for the coherent polytope subspaces that allow specimen quantification with low bias and variance, as illustrated by Figure 3, with pseudo-code of the process presented in Algorithm 1 (see also Figure 17a). Assume that a new $x_i$ is projected into the feature space. Once projected, the self-learning process has to find the nearest neighbors that lie within the direction of variation with $x_i$ in relation to Y. Therefore, it is necessary to search for the convex hull that provides the correct direction in the feature space for quantification. Such a convex hull must also present a minimal volume and a minimal number of eigenvectors of the local $X^{T}Y$ that predict Y. The following sequence of procedures describes an embodiment of the self-learning process:
  • Initialization: i) define a circle area around the $x_i$ projection with a radius of search; ii) define the number of directions; iii) define the dimensions of each direction search.
  • Initial search: i) determine the number of optimal eigenvectors and predict the errors of local models; ii) remove directions that are statistically inconsistent;
  • Search loop: i) determine the number of eigenvectors and prediction error of the new directions; ii) eliminate the worst directions; iii) re-dimension the search by eliminating the worst (smaller or larger) length, and increase or decrease the search length accordingly; iv) loop the previous operations until no statistically significant direction or dimension change occurs.
  • Step B Optimization of the convex hull
  • Initialization: merge the previous output data into an initial cluster that defines the initial convex hull.
  • Main loop: i) determine model errors; ii) remove outliers; iii) define the new borders of the convex hull by using simplex geometric optimization - for each outlier removed, move the boundary inwards; iv) compute the new convex hull. Repeat this cycle until no more outliers are found and the model error is stable.
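The error-then-trim cycle of the main loop can be sketched in a reduced form. The example below is an assumption-laden illustration: it uses a univariate least-squares line in place of the local multivariate model, flags outliers by a residual threshold `k` times the RMSE, and leaves out the convex hull and simplex boundary update entirely. The names `trim_outliers`, `k`, `min_points` and `tol` are hypothetical.

```python
import math

def least_squares_line(points):
    """Ordinary least-squares fit y = slope * x + intercept (needs distinct x values)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

def trim_outliers(points, k=2.5, min_points=3, tol=1e-9):
    """Refit and drop large-residual points until no more outliers are found."""
    kept = list(points)
    while len(kept) > min_points:
        slope, intercept = least_squares_line(kept)
        residuals = [y - (slope * x + intercept) for x, y in kept]
        rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
        if rmse <= tol:
            break  # model error is already negligible
        survivors = [p for p, r in zip(kept, residuals) if abs(r) <= k * rmse]
        if len(survivors) == len(kept) or len(survivors) < min_points:
            break  # nothing to remove, or too few points would remain
        kept = survivors
    return kept

# Hypothetical data: nine points on y = 2x + 1 and one gross outlier.
data = [(float(i), 2.0 * i + 1.0) for i in range(10)]
data[5] = (5.0, 50.0)
cleaned = trim_outliers(data)
```

In the full procedure each removal would also move the hull boundary inwards before the hull is recomputed.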
  • Initialization: i) start at any given point of the feature space; ii) define the search circle diameter, the number of search directions and the dimensions of the search area;
  • Compilation: compile the knowledgebase quantification paths in the feature space map by registering all model paths to be used as cached models (Figure 4c)
  • the result of this process is the construction of the feature space quantification map in Figure 4c.
  • the map constitutes the self-learning process of the artificial intelligence method and device.
  • the lines in the map represent coherent paths of local model prediction; that is, when a new spectrum $X_{new}$ is projected near the convex hull of a line, it will likely follow a mode of variation similar to that of its neighbors inside the convex hull and can be predicted based on the data of the local model.
  • the characterization of the feature space allows one to: i. Use cached models to speed up computation - if a new spectrum is projected into the convex hull of a previous prediction line, the prediction can be computed directly with a cached model;
  • v. Provide the basis of a higher level artificial intelligence for condition diagnosis using non-supervised spectral information, allowing the construction of a non-linear classification map of complex and multi factor health conditions.
  • the self-learning artificial intelligence method for classification has the major objective of finding the class geometry in the feature space by: i) maximizing the local volume of the class; ii) minimizing the total volume across the feature space in the case of non-linear classes; and iii) minimizing the error of class prediction; by delimiting the class boundary with relevant eigenvector variation. Furthermore, one can expect that many classes can be highly non-linear and extremely segmented throughout the feature space. Many classes can also have scattered clusters across the feature space, because other conditions dominate the feature space variation.
  • the supervised clustering is divided into the following classes: i) single univariate diagnosis - where the discrimination function is a single parameter interval or a threshold; ii) exclusive univariate or multivariate diagnosis - where only isolated cases of each class in the feature space are identified without any overlapping with other classes; and iii) multivariate/complex diagnosis - where only the overlap of data from multiple conditions in the feature space is taken into consideration (see Figure 5).
  • the clustering criteria allow complex health conditions to be characterized and mapped into the spectral feature space, constituting a classification knowledgebase.
  • Initialization: a) define the clustering criteria: i) univariate classes, ii) exclusive classes or iii) multivariate classes, and iv) each class threshold; b) provide the supervision vector s or matrix S
  • Convex hull determination: i) select the supervised data in the feature space; ii) find the max and min coordinates of the supervised data in the feature space; iii) select one of the vertices; iv) define the size of the directional search box; v) define the volume increment criteria of the convex hull (dn).
  • Initial search: i) determine the number of eigenvectors and predict the errors of classification; ii) remove statistically inconsistent data; iii) if no relevant direction is found, re-shape the convex hull geometry by moving inwards by dn and perform step 4; repeat steps 1 and 2 until it stabilizes.
  • the complete cluster map of the feature space is recorded as a classification knowledgebase.
  • the full composite of all types of classifications for different conditions represents the classification complexity of the knowledgebase, where interactions between conditions, as well as their metabolic causes, can be studied.
  • By projecting a new spectrum into this classification map one can predict the expected probability of a corresponding condition based on the coordinates of the knowledgebase map.
  • any collection of spectra X and compositional data Y can be transformed (e.g. kernel, derivative, Fourier, wavelets, curvelets) into the feature space F and K, respectively.
  • the problem is reduced to the local optimization in the feature space of:
  • $K_{i+1} = K_i - U_i C$
  • spectral information shares collinear eigenvector structure with the composition, as happens with pure compounds or substances with negligible interference.
  • providing co-variation maximization in the first component is paramount.
  • the AI has a way to perform pattern analysis in the orthogonal $T_w$ or $T_o$ with the corresponding orthogonal W in order to derive the coherence of the local subspace and models.
  • T scores space
  • RANSAC robust linear regression to identify eigenstructures inside T
  • v) repeat the procedure until a threshold is achieved.
  • the existence of a linear model in the T scores space means that deflation is modeling systematic variation within all the data used to build the local linear PLS model.
  • the predictability of any new unknown data can be obtained by deriving the confidence intervals of the extracted linear eigenstructure in T, so that a p-value of the prediction is also forecasted.
  • p-Values above pre-defined thresholds can be considered categorical or unpredictable.
  • the AI can know a priori whether a prediction has the necessary accuracy, because it only uses data with a well-known coherent eigenstructure to perform predictions.
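The robust identification of a linear structure in the T scores space can be illustrated with a small RANSAC line fit. This is a sketch under stated assumptions: it works on one score dimension against the quantity to predict, the inlier tolerance `tol`, iteration count `n_iter` and `seed` are hypothetical parameters, and the confidence intervals and p-values mentioned in the text are not computed here.

```python
import random

def least_squares_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

def ransac_line(points, n_iter=200, tol=0.5, seed=0):
    """RANSAC: fit lines to random pairs, keep the largest consensus set, refit."""
    rng = random.Random(seed)
    best = []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical pair cannot define y = f(x)
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tol]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        return None  # no usable linear structure found
    slope, intercept = least_squares_line(best)
    return slope, intercept, best

# Hypothetical scores: ten points on y = 3t - 1 plus two incoherent samples.
scores = [(float(t), 3.0 * t - 1.0) for t in range(10)]
scores += [(2.0, 40.0), (7.0, -30.0)]
result = ransac_line(scores)
```

Samples left outside the consensus set correspond to data that the linear eigenstructure does not explain.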
  • npc is the number of components
  • $n_{sel.vars}$ is the number of selected variables of F from a total of $n_{vars}$ variables, and cov(V,W) is the covariance of the $F^{T}F$ and $K^{T}F$ eigenvectors
  • p-value is the probability value of the least squares model in the T space.
  • the following pertains to coherence of the local sub-space.
  • the coherence of the local feature space F is ensured by: i) eigenstructure similarity between F and K; ii) low complexity; and iii) information determinism.
  • F and K have similar eigenstructure when:
  • Eigenstructure of $K^{T}F$ is preferably of extreme importance. Complexity of F can be estimated by the distribution of its eigenvalues $\lambda$, which define the characteristic dimensions of the feature space. In spectroscopy signals, one expects $\lambda$ to decrease exponentially to a limit value:
  • $\lambda_i$ is the expected i-th eigenvalue
  • $\lambda_r$ the residual eigenvalue
  • $\lambda_1$ the largest eigenvalue
  • k the decay factor
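The decay equation itself did not survive in the text above. One form consistent with the listed symbols, offered as an assumption rather than as the source's exact expression, is an exponential decay from the largest eigenvalue towards the residual eigenvalue:

```latex
\lambda_i = \lambda_r + (\lambda_1 - \lambda_r)\, e^{-k\,(i-1)}, \qquad i = 1, 2, \dots
```

Under this form the first eigenvalue equals the largest eigenvalue, and the sequence approaches the residual eigenvalue as i grows, at a rate set by the decay factor k.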
  • F and K may have systematic information that is not related to each other. Therefore, one must know how the local information is structured, that is, how much information F and K hold in common and how much is independent, by performing orthogonal filtering (Trygg and Wold:2002, Trygg and Wold:2003, Bylesjo et al:2008).
  • Figure 7 shows how orthogonal filtering results in a less complex eigenstructure.
  • Figures 7a to 7c show how the previous steps optimized the local calibration dataset using the X1, X2 and X3 subsets, greatly reducing the complexity of the linear model to only three predicting latent variables. This result still shows that the X1, X2 and X3 subsets have systematic interference and can be subjected to orthogonal filtering, so that:
  • T are scores with common information between F and K that maximize co-variance
  • P, Q the corresponding loadings.
  • $T_o$ and $U_o$ the orthogonal scores to the covariance and $P_o$, $Q_o$ the corresponding orthogonal loadings; that is, $T_o$ is orthogonal to K and $U_o$ is orthogonal to F (Trygg and Wold:2003).
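The decomposition that these definitions refer to is not reproduced in the text. A plausible reconstruction, written in the general shape of the O2PLS model of Trygg and Wold and stated here as an assumption, separates a joint part from an orthogonal part in each block. The score matrix U of K and the residual matrices $E_F$ and $E_K$ are names introduced for this sketch.

```latex
F = T P^{T} + T_o P_o^{T} + E_F, \qquad K = U Q^{T} + U_o Q_o^{T} + E_K
```

The first term of each expression carries the information that F and K share, and the second term carries the systematic variation that orthogonal filtering removes before the local linear model is built.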
  • the following pertains to metrics for sub-space characterization.
  • One of the main advantages of the proposed approach is the possibility of characterizing the self-learned knowledgebase by incorporating maps of local learning metrics, such as: i) number of data representation; ii) eigenstructure complexity; iii) collinearity between F and K; iv) predicted sum of squares (PRESS); v) variance of $K^{T}F$; and vi) model information structure.
  • Detailed metrics of the knowledgebase are presented in Table 1.
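  • Of the metrics listed above, the predicted sum of squares (PRESS) is the most direct to illustrate. The sketch below computes a leave-one-out PRESS for a univariate least-squares line. The local models of the disclosure are multivariate, so the straight-line model here is only a stand-in assumption, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def press(xs, ys):
    """Leave-one-out PRESS: each sample is predicted by a model fitted without it."""
    total = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        residual = ys[i] - (slope * xs[i] + intercept)
        total += residual ** 2
    return total

signal = [1.0, 2.0, 3.0, 4.0, 5.0]
exact = [2.0 * x + 1.0 for x in signal]   # perfectly linear response
noisy = [3.1, 4.9, 7.2, 8.8, 11.0]        # same trend with small deviations
press_exact = press(signal, exact)
press_noisy = press(signal, noisy)
```

  • A map of such values over the feature space would show where local models generalize well (low PRESS) and where more data is needed.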
  • The AI system manages both self-learning and prediction by knowing how accurately the different regions of the feature space cover the quantification and qualification.
  • The system can determine whether newly acquired data is predictable or should enter a learning cycle. If it cannot be predicted, the data is added to a quarantine database, which acts as a vault repository for data that has either no neighbors (e.g. in the beginning, no system will cover all the feature space) or no consistent modeling.
  • Data gathered in the quarantine database passes to the knowledgebase only once newly gathered data completes the corresponding area of the feature space, allowing the development of a coherent sub-space knowledgebase.
  • Figure 8 shows the main mechanics of the self-learning process. Consider that the system is initially fed with a limited number of pairs of spectra and corresponding compositions, X and Y. As any new X is recorded, it is projected into the initial feature space map and tested for whether it belongs to the existing knowledgebase. If the projection is within the vicinity of an existing model path and a direct prediction using an existing cached model is possible, a prediction is formulated. If the prediction is not within the expected quality, and if there exist neighbors that make it possible, a new model is built by Algorithm 1, a prediction is performed, and the corresponding model and the path obtained by Algorithm 2 are cached.
  • If a new spectrum is projected into the feature space and has no neighbors, it is immediately quarantined.
  • The system enters the learning cycle and asks the user or system to provide the composition of the sample to be quarantined. Once it has the pair X and Y, it searches for quarantine neighbors. If it has no neighbors, the data simply stays quarantined. If it has neighbors, the learning process begins, using Algorithms 1 and 2 to search for local models and build the local covariance map. Only when new data, in conjunction with the quarantined data, is able to produce a consistent local model and model path is the data certified to pass into the knowledgebase. The knowledgebase receives constant updates as new data is added, and predictions are extended to new regions of the feature space.
  • The system: i) never produces predictions that are not within the knowledgebase; ii) maintains and studies the quarantine database; iii) validates quarantine data to pass into the certified knowledgebase; iv) only uses certified data to build the knowledgebase and predictions; v) self-learns without human intervention; and vi) is independent of the data size, growing the knowledgebase with the data it is fed.
  • Unlike deep learning neural networks, this approach does not need large-scale databases to start building the knowledgebase and performing predictions.
  • The system only uses the certified knowledgebase, and therefore predictions do not suffer from bias as they would with other modeling approaches, which need a significant amount of data to produce a globally stable model architecture.
  • Co-variance, classification maps and cached models make the system very computationally efficient.
  • The system turns any spectrometer into an independently operating machine that does not need human intervention to build mathematical models, as today's prior-art systems do.
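  • The predict, quarantine and certify cycle described above can be sketched as a small state machine. This is a simplified illustration under stated assumptions: the "consistent local model" test is replaced by a minimum neighbor count, the local model by a neighborhood mean, and the class name, `radius` and `min_neighbors` are hypothetical. Algorithms 1 and 2 of the disclosure are not reproduced here.

```python
import math

class SelfLearner:
    """Toy knowledgebase with a quarantine vault (illustrative assumptions only)."""

    def __init__(self, radius, min_neighbors):
        self.radius = radius
        self.min_neighbors = min_neighbors
        self.knowledgebase = []  # certified (features, composition) pairs
        self.quarantine = []     # pairs waiting for enough neighbors

    def _neighbors(self, features, pool):
        return [item for item in pool
                if math.dist(features, item[0]) <= self.radius]

    def predict(self, features):
        """Return a prediction only inside the certified knowledgebase, else None."""
        near = self._neighbors(features, self.knowledgebase)
        if len(near) < self.min_neighbors:
            return None
        return sum(composition for _, composition in near) / len(near)

    def learn(self, features, composition):
        """Quarantine the new pair; certify its neighborhood once it is dense enough."""
        self.quarantine.append((features, composition))
        cluster = self._neighbors(features, self.quarantine)
        if len(cluster) < self.min_neighbors:
            return "quarantined"
        for item in cluster:
            self.quarantine.remove(item)
        self.knowledgebase.extend(cluster)
        return "certified"

learner = SelfLearner(radius=1.0, min_neighbors=3)
first = learner.learn((0.0, 0.0), 10.0)
second = learner.learn((0.5, 0.0), 12.0)
third = learner.learn((0.0, 0.5), 14.0)
inside = learner.predict((0.2, 0.2))
outside = learner.predict((5.0, 5.0))
```

  • The design choice mirrored here is that `predict` refuses to answer outside the certified region instead of extrapolating, and that data only leaves quarantine as a group once its neighborhood is populated.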
  • Figure 9 shows how the feature space transformation is performed.
  • Any spectroscopy signal is decomposed into an orthonormal basis (e.g. Fourier, wavelets, curvelets). These bases provide an independent means of reconstructing scales of the signal based on the basis properties. If present, the information about any metabolite is scattered across different scales of the spectrum, and therefore the optimal spectral variation for a particular molecule has to be extracted from the original signal using scale reconstruction.
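  • Scale reconstruction on an orthonormal basis can be illustrated with the Haar wavelet, chosen here only because it is the simplest orthonormal wavelet; the disclosure does not state which basis is used. The function names are hypothetical and the signal length is assumed to be a power of two.

```python
import math

_S = 1.0 / math.sqrt(2.0)

def haar_decompose(signal):
    """Return [approximation, coarsest detail, ..., finest detail]."""
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.insert(0, [(a - b) * _S for a, b in pairs])
        approx = [(a + b) * _S for a, b in pairs]
    return [approx] + details

def haar_reconstruct(coeffs):
    """Invert haar_decompose."""
    approx = list(coeffs[0])
    for detail in coeffs[1:]:
        finer = []
        for a, d in zip(approx, detail):
            finer.extend([(a + d) * _S, (a - d) * _S])
        approx = finer
    return approx

def reconstruct_scale(coeffs, keep):
    """Rebuild the signal from one coefficient band only, zeroing all others."""
    masked = [band if i == keep else [0.0] * len(band)
              for i, band in enumerate(coeffs)]
    return haar_reconstruct(masked)

spectrum = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = haar_decompose(spectrum)
scales = [reconstruct_scale(coeffs, i) for i in range(len(coeffs))]
rebuilt = [sum(values) for values in zip(*scales)]
```

  • Because the basis is orthonormal, the per-scale reconstructions add back up to the original signal and the total energy is preserved, so individual scales can be selected or discarded without distorting the others.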
  • Therefore, PLS- or SVM-type assumptions are only possible if the eigenstructures of K and F are similar. Otherwise, systematic information will contaminate the assumption and coherence of the scores' inner relationship.
  • The same principles can be applied to artificial neural networks or 'deep learning'.
  • Figure 10 shows the flowchart of the feature space transformation, where: i) the original signals are decomposed; ii) the initial estimate of the best basis is obtained by linear regression; and iii) the basis combination is optimized by evolutionary methods. If a combination of bases is found such that the eigenstructure criterion is met, the information about the transformation is cached and used in future predictions as the feature space transformation for building the feature space.
  • Figure 11a shows how cached models are used to speed up predictions. Once a new spectrum is recorded, it is projected into the feature space and checked for a nearby model path. If one exists, the prediction is performed using the methods in section IV. Once any new spectrum is recorded, the following actions are performed:
  • i) if a cached model is able to perform the prediction accurately, the result is presented to the end user;
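  • The evolutionary optimization of a basis combination can be sketched with a (1+1) evolutionary search over a binary mask of signal components. Several parts are assumptions made for illustration: the fitness is a squared correlation with the target rather than the eigenstructure criterion of the disclosure, the starting mask is supplied by the caller rather than estimated by linear regression, and all names are hypothetical.

```python
import random

def squared_correlation(u, v):
    """Squared Pearson correlation; 0.0 if either vector is constant."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    if suu == 0 or svv == 0:
        return 0.0
    return suv * suv / (suu * svv)

def combine(components, mask):
    """Sum the components selected by the mask, element by element."""
    length = len(components[0])
    return [sum(c[j] for c, keep in zip(components, mask) if keep)
            for j in range(length)]

def evolve_mask(components, target, start_mask, generations=200, seed=0):
    """(1+1) evolutionary search: mutate bits, keep the child if it is not worse."""
    rng = random.Random(seed)
    best = list(start_mask)
    best_fit = squared_correlation(combine(components, best), target)
    rate = 1.0 / len(best)
    for _ in range(generations):
        child = [bit ^ (rng.random() < rate) for bit in best]
        fit = squared_correlation(combine(components, child), target)
        if fit >= best_fit:
            best, best_fit = child, fit
    return best, best_fit

components = [[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0], [0.0, 1.0, 0.0, -1.0]]
target = [2.0, 4.0, 6.0, 8.0]
start = [1, 1, 1]
start_fit = squared_correlation(combine(components, start), target)
mask, fit = evolve_mask(components, target, start)
```

  • Because a child replaces the parent only when its fitness is not lower, the returned fitness can never fall below that of the starting mask; a mask that meets the chosen criterion would then be cached as the transformation.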
  • the system can provide a consensus prediction before computing a new model and updating the knowledgebase;
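  • The use of cached models and of a consensus prediction can be sketched as follows. The cache layout (a list of model-path points paired with prediction functions), the distance threshold and the plain averaging rule are all assumptions made for this illustration; the disclosure does not specify them in this passage.

```python
import math

def consensus_prediction(features, cache, radius):
    """Average the predictions of all cached models whose path point is nearby.

    Returns None when no cached model is close enough, which signals that a
    new local model would have to be computed.
    """
    votes = [model(features) for centre, model in cache
             if math.dist(features, centre) <= radius]
    if not votes:
        return None
    return sum(votes) / len(votes)

# Hypothetical cache: each entry is (point on a model path, local linear model).
cache = [
    ((0.0, 0.0), lambda f: 2.0 * f[0] + 1.0),
    ((1.0, 0.0), lambda f: 2.0 * f[0] + 3.0),
    ((10.0, 10.0), lambda f: 100.0),
]
near_result = consensus_prediction((0.5, 0.0), cache, radius=1.0)
far_result = consensus_prediction((50.0, 50.0), cache, radius=1.0)
```

  • Looking up cached models avoids refitting a local model for every new spectrum, which is where the speed-up described above would come from.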
  • Figure 13 exemplifies why PLS cannot cope with the complexity of a biological fluid, such as blood.
  • As erythrocytes are the major cellular component of blood and are directly related to hemoglobin content, it could be expected that a linear model would be sufficient to predict the erythrocyte count accurately.
  • Figure 13a shows exactly the opposite: spectral quantification of erythrocytes is highly non-linear and affected by significant interferences, so that a PLS model shows very high variance and significant bias at high erythrocyte counts (e.g. > 5×10^12 cells/L). The interferences are expressed in the 7 latent variables (LVs) of the PLS model.
  • Figure 13c shows the PLS prediction for leukocytes.
  • Leukocytes are present in blood in lower concentrations than erythrocytes, but are still a significant proportion of the cellular component. The difference in magnitude is enough to show that it is not possible to predict leukocytes with PLS.
  • Results of Figure 13c show that predictions have a very significant variance and large bias.
  • Erythrocytes and leukocytes are a good example of how the self-learning method handles the complexity of spectral information to provide an accurate prediction based on local multi-scale modeling.
  • Figures 13b and 13d present the self-learning artificial intelligence results for erythrocytes and leukocytes, respectively.
  • Table 1 summarizes the quantification results for whole blood and blood serum parameters.
  • The hemogram parameters include erythrocytes, hemoglobin, hematocrit, MCV, MCHC, leukocytes and platelets. Results show a very significant improvement with the self-learning method and device, where all parameter estimates exhibit errors below 6% within the studied range.
  • Figure 14 shows the results for bilirubin and myoglobin quantification in blood serum.
  • Bilirubin is a significant constituent of blood serum, with yellow-brown coloration.
  • Myoglobin is present in lower quantities, but when present, its spectral fingerprint is very significant in blood serum in the vis-nir region. Therefore, it would also be expected that both molecules could be linearly quantified by a PLS model.
  • Results in Figures 14a and 14c show that the bilirubin and myoglobin PLS predictions exhibit very significant variance, with errors of 12.5% and 31.0%, respectively. Although these molecules provide a very strong fingerprint in the spectral signal, they still suffer significant interference.
  • The most relevant result is that bias and variance are significantly reduced when using the self-learning artificial intelligence method of the disclosure (Figures 14b and 14d).
  • Most models decrease in complexity, and all parameters that are present at higher concentrations use only one eigenvector projection (one LV).
  • The proposed method was able to find local multi-scale spectral information linearly correlated with molecular quantification. In this sense, all the studied hemogram parameters (erythrocytes, hemoglobin, hematocrit, MCV, MCHC, leukocytes and platelets) attained analytical-grade quality, with bias below 6%.
  • Figure 15 presents the benchmark of PLS vs Self-learning artificial intelligence of the present disclosure.
  • PLS modeling could only sustain POC qualitative quantification for erythrocytes, hemoglobin, MCV, MCHC, platelets, bilirubin and CRP. The errors of these parameters are around 7% to 12%. All other parameters estimated using PLS modeling did not meet the 15% error criterion for POC (see Figure 15).
  • Self-learning AI was able to attain medical analytical-grade quality for the following parameters: erythrocytes, hemoglobin, hematocrit, MCV, MCHC, leukocytes, platelets, bilirubin, glucose, myoglobin, CRP, triglycerides and uric acid. Only creatinine and urea quantification were above the 5% limit, but they did qualify for POC qualitative analysis.
  • The proposed self-learning artificial intelligence method largely overcomes the technical barriers presented in the background art, allowing spectroscopy to attain analytical-grade errors.
  • i) anemia - erythrocyte count levels below 4×10^12/L and hemoglobin levels below 13 g/dL; ii) leukocytosis - leukocyte levels above 10^10/L; iii) thrombocytopenia - platelet levels below 100×10^9/L; iv) thrombocythemia - platelet levels above 400×10^9/L; v) hepatic insufficiency - bilirubin levels above 1.2 mg/dl; vi) diabetes mellitus - glucose levels above 100 mg/dl; vii) acute myocardial infarction - myoglobin levels above 147 ng/ml; viii) renal dysfunction - creatinine levels above 1.3 mg/ml; ix) inflammation - C-reactive protein levels above 2.0 mg/dl.
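  • The cut-off rules listed above can be written as a small rule table. The sketch below is illustrative only and is not a diagnostic tool: the parameter keys are hypothetical, the units are those stated in the text, and the label "anemia" for the first rule is inferred from the conditions named elsewhere in the description.

```python
import operator

# (parameter, comparison, cut-off); units follow the text of the description.
CUTOFFS = {
    "anemia": [("erythrocytes", operator.lt, 4e12),      # cells/L
               ("hemoglobin", operator.lt, 13.0)],        # g/dL
    "leukocytosis": [("leukocytes", operator.gt, 1e10)],  # cells/L
    "thrombocytopenia": [("platelets", operator.lt, 100e9)],
    "thrombocythemia": [("platelets", operator.gt, 400e9)],
    "hepatic insufficiency": [("bilirubin", operator.gt, 1.2)],        # mg/dl
    "diabetes mellitus": [("glucose", operator.gt, 100.0)],            # mg/dl
    "acute myocardial infarction": [("myoglobin", operator.gt, 147.0)],  # ng/ml
    "renal dysfunction": [("creatinine", operator.gt, 1.3)],  # unit as given in the text
    "inflammation": [("crp", operator.gt, 2.0)],              # mg/dl
}

def screen(sample):
    """Return the conditions whose rules are all satisfied by the sample.

    Parameters missing from the sample never trigger a condition.
    """
    flagged = []
    for condition, rules in CUTOFFS.items():
        if all(name in sample and compare(sample[name], cutoff)
               for name, compare, cutoff in rules):
            flagged.append(condition)
    return flagged

sample = {"erythrocytes": 3.5e12, "hemoglobin": 11.0,
          "platelets": 450e9, "glucose": 90.0}
flags = screen(sample)
```

  • Keeping the cut-offs in a table separate from the comparison logic makes it straightforward to adjust a threshold without touching the screening function.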
  • Table 2 presents the classification results for the presented conditions, in terms of true and false, positive and negative combinations, respectively.
  • Results show that self-learning classification is superior to a linear classifier, logistic PLS. This is especially significant for conditions where the cut-off value for diagnosis is at low concentrations, such as thrombocytopenia, or for conditions that suffer complex interferences, such as infections with high levels of leukocytes (leukocytosis).
  • The global PLS model is only able to sustain point-of-care quality (15% error of classification) for anemia, thrombocythemia and acute myocardial infarction. Most parameters exhibit a 50% to 80% chance of correct diagnosis, and therefore linear classifiers prove to be very limited for the classification of health conditions.
  • The self-learning method always achieved above an 85% chance of correct diagnosis.
  • The self-learning method was able to correctly diagnose 100% of the cases of anemia, thrombocythemia and acute myocardial infarction.
  • Conditions such as leukocytosis, diabetes mellitus and hepatic function also attain near-complete correct classification (97% chance of being correct). This is because the misclassified values are near the cut-off, and the laboratory error was not taken into account in the classification method. If it is taken into consideration, with an error margin of 5%, these conditions are also 100% correctly classified. Thrombocytopenia and renal dysfunction have classification rates of 87% and 89%, respectively (see Table 2).
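  • The effect of a laboratory error margin on the classification counts can be sketched as below. How the disclosure applies the 5% margin is not detailed in this passage, so the rule used here is an assumption: a disagreement is not counted as an error when the reference value lies within the margin of the cut-off. Only conditions defined as "above the cut-off" are handled, and the function names are hypothetical.

```python
def confusion_counts(true_values, predicted_values, cutoff, margin=0.0):
    """Count TP/FP/TN/FN for a condition defined as 'value above cutoff'."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, predicted in zip(true_values, predicted_values):
        actual = truth > cutoff
        called = predicted > cutoff
        # Assumed tolerance rule: ignore disagreements inside the lab error band.
        if actual != called and abs(truth - cutoff) <= margin * cutoff:
            called = actual
        key = ("T" if called == actual else "F") + ("P" if called else "N")
        counts[key] += 1
    return counts

def accuracy(counts):
    """Fraction of samples classified correctly."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())

# Hypothetical glucose values (mg/dl) against the 100 mg/dl cut-off.
reference = [80.0, 99.0, 101.0, 130.0]
predicted = [85.0, 102.0, 98.0, 125.0]
strict = confusion_counts(reference, predicted, cutoff=100.0)
tolerant = confusion_counts(reference, predicted, cutoff=100.0, margin=0.05)
```

  • In this toy example both errors occur within 1 mg/dl of the cut-off, so the strict accuracy of 0.5 rises to 1.0 once the margin is applied, which mirrors the argument made above for values near the cut-off.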
  • Embodiments described herein may be incorporated as code (e.g., a software algorithm or program) residing in firmware and/or on a computer useable medium having control logic for enabling execution on a computer system having a computer processor, such as any of the servers described herein.
  • Such a computer system typically includes memory storage configured to provide output from execution of the code which configures a processor in accordance with the execution.
  • the code can be arranged as firmware or software, and can be organized as a set of modules, including the various modules and algorithms described herein, such as discrete code modules, function calls, procedure calls or objects in an object-oriented programming environment. If implemented using modules, the code can comprise a single module or a plurality of modules that operate in cooperation with one another to configure the machine in which it is executed to perform the associated functions, as described herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Pathology (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Immunology (AREA)
  • Biochemistry (AREA)
  • Analytical Chemistry (AREA)
  • Computing Systems (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Hematology (AREA)
  • Molecular Biology (AREA)
  • Urology & Nephrology (AREA)
  • Food Science & Technology (AREA)
  • Medicinal Chemistry (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
PCT/PT2018/050012 2018-04-05 2018-04-05 Spectrophotometry method and device for predicting a quantification of a constituent from a sample Ceased WO2019194693A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
PCT/PT2018/050012 WO2019194693A1 (en) 2018-04-05 2018-04-05 Spectrophotometry method and device for predicting a quantification of a constituent from a sample
CN201880092068.3A CN111989747A (zh) 2018-04-05 2018-04-05 用于预测样品中的成分的定量的分光光度法和装置
US17/043,481 US20210020276A1 (en) 2018-04-05 2018-04-05 Spectrophotometry method and device for predicting a quantification of a constituent from a sample
JP2020554164A JP7273844B2 (ja) 2018-04-05 2018-04-05 試料からの成分の定量化値を予測する分光測光方法及び装置
EP18724345.6A EP3776561B1 (en) 2018-04-05 2018-04-05 Spectrophotometry method and device for predicting a quantification of a constituent from a sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/PT2018/050012 WO2019194693A1 (en) 2018-04-05 2018-04-05 Spectrophotometry method and device for predicting a quantification of a constituent from a sample

Publications (1)

Publication Number Publication Date
WO2019194693A1 (en) 2019-10-10

Family

ID=62152608

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/PT2018/050012 Ceased WO2019194693A1 (en) 2018-04-05 2018-04-05 Spectrophotometry method and device for predicting a quantification of a constituent from a sample

Country Status (5)

Country Link
US (1) US20210020276A1
EP (1) EP3776561B1
JP (1) JP7273844B2
CN (1) CN111989747A
WO (1) WO2019194693A1

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11353394B2 (en) * 2020-09-30 2022-06-07 X Development Llc Deformulation techniques for deducing the composition of a material from a spectrogram
CN113020629A (zh) * 2021-03-30 2021-06-25 东南大学 一种基于特征光谱的针对金属粉末氧含量检测的3d打印设备及其检测方法
CN113094892A (zh) * 2021-04-02 2021-07-09 辽宁石油化工大学 一种基于数据剔除与局部偏最小二乘的石油浓度预测方法
KR102572517B1 (ko) * 2021-08-18 2023-08-31 주식회사 아폴론 인공지능 기반 호산구의 라만 데이터 처리 방법 및 장치
US20230267369A1 (en) * 2022-01-31 2023-08-24 Idaho State University Generalized local adaptive fusion regression process based on physicochemical and physiochemical underlying hidden properties for quantitative analysis of molecular based spectroscopic data
KR102767158B1 (ko) * 2022-02-08 2025-02-13 재단법인 아산사회복지재단 머신 러닝 기반 라만 분광 분석을 이용한 염증 질환 분류 방법 및 장치
CN115015136B (zh) * 2022-04-13 2023-05-12 中煤科工集团重庆研究院有限公司 一种基于主成分优化的气体浓度检测方法
KR102439163B1 (ko) * 2022-06-24 2022-09-01 주식회사 아이브 인공지능 기반의 비지도 학습 모델을 이용한 불량 제품 검출 장치 및 그 제어방법
WO2024123681A2 (en) * 2022-12-05 2024-06-13 Washington University Improved method for scalable untargeted metabolomic workflow
CN116543848B (zh) * 2023-07-05 2023-09-29 潍坊学院 基于平行因子和粒子群优化算法的混合物组分定量方法
US12437842B2 (en) * 2023-09-28 2025-10-07 Vionix Biosciences Inc. System and method for analyzing spectral data using artificial intelligence
US20250321973A1 (en) * 2024-04-12 2025-10-16 Thermo Scientific Portable Analytical Instruments Inc. Time series matching of raw spectral vector data
CN118464858A (zh) * 2024-05-13 2024-08-09 大连海事大学 一种基于子空间分析的微塑料荧光光谱识别系统及方法
CN119007874B (zh) * 2024-10-22 2025-02-25 宜兴市星光宝亿化工有限公司 无磷灰水分散剂制备用参数分析方法
CN119959163B (zh) * 2025-04-09 2025-07-08 深圳微子医疗有限公司 一种离子浓度分析方法及系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0807809A2 (en) * 1996-05-13 1997-11-19 Perstorp Analytical, Inc. System for indentifying materials by NIR spectrometry
EP1967846A1 (en) * 2007-03-05 2008-09-10 National University of Ireland Galway En ensemble method and apparatus for classifying materials and quantifying the composition of mixtures
EP1992939A1 (en) * 2007-05-16 2008-11-19 National University of Ireland, Galway A kernel-based method and apparatus for classifying materials or chemicals and for quantifying the properties of materials or chemicals in mixtures using spectroscopic data.
EP3136270A1 (en) * 2015-08-26 2017-03-01 Viavi Solutions Inc. Raw material identification using spectroscopy

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6512937B2 (en) * 1999-07-22 2003-01-28 Sensys Medical, Inc. Multi-tier method of developing localized calibration models for non-invasive blood analyte prediction
JP2003035663A (ja) * 2001-07-19 2003-02-07 Ishikawajima Harima Heavy Ind Co Ltd 吸収スペクトルの検量線作成方法
DE10250100A1 (de) * 2002-10-28 2004-05-13 Leica Microsystems Heidelberg Gmbh Mikroskopsystem und Verfahren zur Analyse und Auswertung von Mehrfachfärbungen eines mikroskopischen Objekts
GB2406377A (en) * 2003-09-25 2005-03-30 Qinetiq Ltd Laser spectroscopic identification of asbestos
CN101561325B (zh) * 2009-04-23 2010-10-06 浙江大学 聚合物本体温度的检测方法
US8429153B2 (en) * 2010-06-25 2013-04-23 The United States Of America As Represented By The Secretary Of The Army Method and apparatus for classifying known specimens and media using spectral properties and identifying unknown specimens and media
JP6235886B2 (ja) * 2013-01-08 2017-11-22 キヤノン株式会社 生体組織画像の再構成方法及び装置並びに該生体組織画像を用いた画像表示装置
US9347314B2 (en) * 2013-06-07 2016-05-24 Schlumberger Technology Corporation System and method for quantifying uncertainty of predicted petroleum fluid properties
CN104949936B (zh) * 2015-07-13 2017-10-24 东北大学 基于优化偏最小二乘回归模型的样品成份测定方法
DE102015220322B4 (de) * 2015-10-19 2020-02-13 Bruker Biospin Gmbh Verfahren und Vorrichtung zur automatisierbaren Ermittlung der Bestimmungsgrenze und des relativen Fehlers bei der Quantifizierung der Konzentration einer zu untersuchenden Substanz in einer Messprobe
CN105787518B (zh) * 2016-03-17 2019-04-23 浙江中烟工业有限责任公司 一种基于零空间投影的近红外光谱预处理方法
CN105928901B (zh) * 2016-07-11 2019-06-07 上海创和亿电子科技发展有限公司 一种定性定量相结合的近红外定量模型构建方法
CN106951720B (zh) * 2017-04-12 2019-05-31 山东省科学院海洋仪器仪表研究所 基于典型相关性分析及线性插值的土壤养分模型转移方法

Non-Patent Citations (24)

* Cited by examiner, † Cited by third party
Title
A. PHATAK; S. JONG: "The geometry of partial least squares", JOURNAL OF CHEMOMETRICS, vol. 11, 1997, pages 311 - 338
A.M.C DAVIES; T.EARN: "Quantitative analysis via near infrared databases: comparison analysis using restructured nearest infrared and constituent data-deux (carnac-d", JOURNAL OF NEAR INFRARED, vol. 14, no. 6, 2003, pages 403 - 411
BATHEN T. F. ET AL: "Quantification of plasma lipids and apolipoproteins by use of proton NMR spectroscopy, multivariate and neural network analysis", NMR IN BIOMEDIC, WILEY, LONDON, GB, vol. 13, no. 5, 1 August 2000 (2000-08-01), pages 271 - 288, XP008105223, ISSN: 0952-3480, [retrieved on 20000621], DOI: 10.1002/1099-1492(200008)13:5<271::AID-NBM646>3.0.CO;2-7 *
BÖHM G. ET AL: "Quantitative analysis of protein far UV circular dichroism spectra by neural networks", PROTEIN ENGINEERING, DESIGN AND SELECTION, vol. 5, no. 3, 1 April 1992 (1992-04-01), GB, pages 191 - 195, XP055549832, ISSN: 1741-0126, DOI: 10.1093/protein/5.3.191 *
C.D. CHRISTY; S.A. DYER, ESTIMATION OF SOIL PROPERTIES USING A COMBINATION OF SPECTRAL AND SCALAR SENSOR DATA
D.P. SOLOMATINE; MASKEY. M.; SHRESTHA. D.L.: "Instance-based learning compared to other data-driven methods in hydrological forecasting", HYDROL. PROCESS, vol. 22, 2008, pages 275 - 287
F.GOGE; R. JOFFRE; C. JOLIVET; I. ROSS; L. RANJARD: "Optimization criteria in sample selection step of local regression for quantitative analysis of large soil nirs database", CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, vol. 110, no. 1, 2012, pages 168 - 176, XP028341734, DOI: doi:10.1016/j.chemolab.2011.11.003
GEODERMA, vol. 195-196, 2013, pages 268 - 279
HUANG, G.B; HUAN, S. SONG; K. YOU: "Trends in extreme learning machines: A review", NEURAL NETWORKS, vol. 61, 2013, pages 32 - 48
J.S. SHENK; M.O. WESTERHAUS; P.ERZAGHI: "Local prediction with near infrared multi-product databases", JOURNAL OF NEAR INFRARED SPECTROSCOPY, vol. 5, 1997, pages 223 - 232
L. RAMIREZ-LOPEZ; T. BEHRENS; K. SCHMIDT; A. STEVENS; J.A.M. DEMATTE; T. SCHOLTEN, THE SPECTRUM-BASED LEARNER: A NEW LOCAL APPROACH FOR MODELLING SOIL VIS-NIR SPECTRA OF COMPLEX DATASETS
L. RAMIREZ-LOPEZ; T. BEHRENSA; K. SCHMIDT; A. STEVENS; J.A.M. DEMATTE; T. SCHOLTEN: "The spectrum-based learner: A new local approach for modeling soil vis-nir spectra of complex datasets", GEODERMA, vol. 195 - 19, 2013, pages 268 - 279, XP028972744, DOI: doi:10.1016/j.geoderma.2012.12.014
L. RAMIREZ-LOPEZ; T.BEHRENS; K.SCHMIDT; R.A. VISCARRAROSSEL; J.A.M. DEMATTE; T. SCHOLTEN: "Distance and similarity-search metrics for use with soil vis nir spectra", GEODERMA, vol. 199, 2013, pages 43 - 53
O'CONNELL M-L. ET AL: "Classification of a target analyte in solid mixtures using principal component analysis, support vector machines, and Raman spectroscopy", VISUAL COMMUNICATIONS AND IMAGE PROCESSING; 20-1-2004 - 20-1-2004; SAN JOSE,, vol. 5826, no. 1, 1 January 2005 (2005-01-01), pages 340 - 350, XP002444469, ISBN: 978-1-62841-730-2, DOI: 10.1117/12.605156 *
P. GELADI; B. KOWALSKY: "Partial least squares regression: a tutorial", ANALYTICAL CHEMICAL ACTA, vol. 185, 1986, pages 1 - 17, XP000578888, DOI: doi:10.1016/0003-2670(86)80028-9
R. ERGON: "Finding y-relevant part of x by use of per and plsr model reduction methods", JOURNAL OF CHEMOMETRICS, vol. 21, 2007, pages 537 - 546
R. ERGON: "Re-interpretation of nipals results solves plsr inconsistency problem", JOURNAL OF CHEMOMETRICS, vol. 23, 2009, pages 72 - 75
R.C. MARTINS; V.V. LOPES; P. VALENTAO; J.C.M.F. CARVALHO; P. ISABEL; M.T. AMARAL; M.T. BATISTA; P. B. ANDRADE; B.M. SILVA: "Relevant principal component analysis applied to the characterisation of portuguese heather honey", NATURAL PRODUCT RESEARCH, vol. 22, 2007, pages 1560 - 1582
R.J. PELL; L.S. RAMOS; R. MANNE: "The model space in partial least squares regression", JOURNAL OF CHEMOMETRICS, vol. 21, 2007, pages 165 - 172, XP055439764, DOI: doi:10.1002/cem.1067
RAMIREZ-LOPEZ L. ET AL: "The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex data", GEODERMA, ELSEVIER, AMSTERDAM, NL, vol. 195, 22 January 2013 (2013-01-22), pages 268 - 279, XP028972744, ISSN: 0016-7061, DOI: 10.1016/J.GEODERMA.2012.12.014 *
S. WOLD; M. HOYC; H. MARTENS; J. TRYGG; F. WESTADE; J. MACGREGOR; B.M. WISE: "The pis model space revisited", JOURNAL OF CHEMOMETRICS, vol. 23, 2009, pages 67 - 68
T. FEARN; A.M.C. DAVIES: "Locally-biased regression", JOURNAL OF NEAR INFRARED, vol. 11, no. 6, 2003, pages 467 - 478
T. NAES; T. ISAKSSON; B. KOWALSKI: "Locally weighted regression and scatter correction for near-infrared reflectance data", ANAL. CHEM., vol. 62, no. 7, 1990, pages 664 - 673
U.G. INDAHL: "The geometry of plsl explained properly: 10 key notes on mathematical properties of and some alternative algorithmic approaches to plsl modelling", JOURNAL OF CHEMOMETRICS, vol. 24, 2014, pages 168 - 180

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539576A (zh) * 2020-04-29 2020-08-14 支付宝(杭州)信息技术有限公司 一种风险识别模型的优化方法及装置
CN111539576B (zh) * 2020-04-29 2022-04-22 支付宝(杭州)信息技术有限公司 一种风险识别模型的优化方法及装置
CN112329792A (zh) * 2020-10-30 2021-02-05 中国电子科技集团公司第五十四研究所 一种基于光谱角度的高光谱图像目标特征提取方法
CN112329792B (zh) * 2020-10-30 2022-12-09 中国电子科技集团公司第五十四研究所 一种基于光谱角度的高光谱图像目标特征提取方法
CN113269203A (zh) * 2021-05-17 2021-08-17 电子科技大学 一种用于多旋翼无人机识别的子空间特征提取方法
CN113269203B (zh) * 2021-05-17 2022-03-25 电子科技大学 一种用于多旋翼无人机识别的子空间特征提取方法
JP2023043378A (ja) * 2021-09-16 2023-03-29 株式会社Ihi 調味料の評価装置、評価方法、及び、評価プログラム
JP7678428B2 (ja) 2021-09-16 2025-05-16 株式会社Ihi 調味料の評価装置、評価方法、及び、評価プログラム
CN116307079A (zh) * 2023-02-03 2023-06-23 华东理工大学 一种污水处理过程的关键指标预测方法及装置
CN119337160A (zh) * 2024-12-19 2025-01-21 浙江万胜智能科技股份有限公司 一种基于智能电表通信模块的用电负荷预测方法

Also Published As

Publication number Publication date
JP7273844B2 (ja) 2023-05-15
EP3776561C0 (en) 2025-11-19
EP3776561B1 (en) 2025-11-19
EP3776561A1 (en) 2021-02-17
JP2021526628A (ja) 2021-10-07
US20210020276A1 (en) 2021-01-21
CN111989747A (zh) 2020-11-24

Similar Documents

Publication Publication Date Title
EP3776561B1 (en) Spectrophotometry method and device for predicting a quantification of a constituent from a sample
WO2018060967A1 (en) Big data self-learning methodology for the accurate quantification and classification of spectral information under complex varlability and multi-scale interference
Trevisan et al. Extracting biological information with computational analysis of Fourier-transform infrared (FTIR) biospectroscopy datasets: current practices to future perspectives
Paul et al. Chemometric applications in metabolomic studies using chromatography-mass spectrometry
Mutihac et al. Mining in chemometrics
Köse et al. Effect of missing data imputation on deep learning prediction performance for vesicoureteral reflux and recurrent urinary tract infection clinical study
Marvuglia et al. Machine learning for toxicity characterization of organic chemical emissions using USEtox database: Learning the structure of the input space
US10448898B2 (en) Methods and systems for predicting a health condition of a human subject
Jinadasa et al. Deep learning approach for Raman spectroscopy
Wang et al. SVM classification method of waxy corn seeds with different vitality levels based on hyperspectral imaging
Zhao et al. Reducing moisture effects on soil organic carbon content estimation in vis-NIR spectra with a deep learning algorithm
Wang et al. Feature selection for chemical sensor arrays using mutual information
CN120652073A (zh) 一种全自动水质cod分析方法及装置
CN110163247A (zh) 基于深度最近类均值的模型设计方法及增量气味分类方法
Zoubir et al. Geoscatt-gnn: A geometric scattering transform-based graph neural network model for ames mutagenicity prediction
del Real Mata et al. Evaluation of machine learning and deep learning models for the classification of a single extracellular vesicles spectral library
US20250226058A1 (en) Method and system for predicting gene expression perturbations
Mikalsen et al. An unsupervised multivariate time series kernel approach for identifying patients with surgical site infection from blood samples
Chapman et al. Artificial intelligence applied to healthcare and biotechnology
Song et al. Mung bean seed classification based on multimodal features and Kepler-optimized stacking ensemble learning model
Wang et al. Novel developments in fuzzy clustering for the classification of cancerous cells using FTIR spectroscopy
Campanella et al. Biomedical and clinical applications of Raman spectroscopy and multivariate chemometric methods
Nishanth et al. Machine learning in preclinical research: Prediction of blood brain barrier permeability
Paleczek Recent achievements of exhaled breath analysis at the research stage—Artificial intelligence and machine learning algorithms
Junior et al. CNN-BiLSTM for sustainable and non-invasive COVID-19 detection via salivary ATR-FTIR spectroscopy

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18724345; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2020554164; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
WWE Wipo information: entry into national phase (Ref document number: 2018724345; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2018724345; Country of ref document: EP; Effective date: 20201105)
WWG Wipo information: grant in national office (Ref document number: 2018724345; Country of ref document: EP)