WO2021053409A1 - Encoding Raman spectral data in optical identification tags for analyte identification

Info

Publication number
WO2021053409A1
Authority
WO
WIPO (PCT)
Prior art keywords
spectrum
oit
raman
oits
composition
Prior art date
Application number
PCT/IB2020/055808
Other languages
French (fr)
Inventor
Niveen M. KHASHAB
Phuong Mai HOANG
Original Assignee
King Abdullah University Of Science And Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by King Abdullah University Of Science And Technology
Publication of WO2021053409A1

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N21/65Raman scattering
    • G01N21/658Raman scattering enhancement Raman, e.g. surface plasmons

Definitions

  • AA amino acid
  • FTIR Fourier transform infrared
  • Previous work identified AA components through indirect detection by chemical or physical conjugation of known reporters such as dyes or strong Raman scatterers. While appearing promising, the labeling step requires preparation of compatible tags and is prone to artifacts during the multi-step synthesis procedure. There is a need for more cost-effective methods providing more straightforward analysis.
  • Raman spectroscopy is a fast, powerful, and easy-to-use analytical technique. Detailed fingerprints, in combination with non-destructivity and minimal sample preparation, have allowed the construction of reference libraries of Raman spectra in a variety of research fields. Since libraries often contain many different and/or highly similar spectra, it is important that each data point in all the spectra corresponds to the exact Raman wavenumber. Spectral resolution and relative intensities (due to resonance effects) are known to affect compatibility between spectral libraries. Automated data analysis requires a large library of spectra. Calibration transfer schemes for distributing reference data or methods between different instruments are needed to merge spectral data. Such schemes would permit spectra collected from different instruments to be merged in a large reference library.
  • Analysis of Raman imaging data subsequent to its collection is essential for the ability of this technique to provide spatially resolved chemical and molecular information.
  • A variety of multivariate methods have been used to analyze Raman imaging data, including multivariate curve resolution-alternating least squares (MCR-ALS), principal component analysis (PCA), cluster analysis, neural networks, partial least squares (PLS), and direct classical least squares (DCLS).
  • MCR-ALS multivariate curve resolution-alternating least squares
  • PCA principal component analysis
  • PLS partial least squares
  • DCLS direct classical least squares
  • embodiments of the present disclosure feature methods, materials, and systems for chemical analysis of compositions based on the spectral fingerprint of Raman spectral data, which has been encoded in an optical identification tag (OIT).
  • OIT optical identification tag
  • Embodiments described in the present disclosure include methods and systems for encoding a Raman spectrum in an OIT for library construction and sample analysis and methods and systems for analyte identification based on the similarity of a sample OIT to one or more reference OITs of a screening library.
  • the methods and systems of the present disclosure permit automated OIT library construction and search to encode Raman spectra from any sample, including samples at very low concentration (e.g., picomolar) with greater efficiency and reliability than methods requiring complex algorithms or machine learning.
  • the binary format of an OIT enables efficient, specific, and selective search of an OIT library using a binary matching process with a screening library of candidate OITs, whereby the goodness of a match between two OITs is efficiently evaluated using a bit based similarity or distance measure, such as the binary distance metric, Hamming distance.
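As a minimal illustration of the bit-based comparison described above, the Python sketch below computes the Hamming distance between two binary OITs by packing each one into an integer and counting the bits that differ. The list-of-bits representation, the function names, and the example bit patterns are assumptions made for illustration and are not taken from the disclosure.

```python
def oit_to_int(bits):
    """Pack a sequence of 0/1 values into one integer (first bit is most significant)."""
    value = 0
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("an OIT may only contain 0 and 1")
        value = (value << 1) | bit
    return value


def hamming_distance(oit_a, oit_b):
    """Number of positions at which two equal-length binary OITs differ."""
    if len(oit_a) != len(oit_b):
        raise ValueError("OITs must have the same length")
    # XOR leaves a 1 wherever the two tags disagree; count those bits.
    return bin(oit_to_int(oit_a) ^ oit_to_int(oit_b)).count("1")


# Invented example tags (not derived from real spectra).
query = [0, 1, 1, 0, 0, 1, 0, 0]
reference = [0, 1, 0, 0, 0, 1, 1, 0]
print(hamming_distance(query, reference))  # the tags differ at indices 2 and 6 -> 2
```

A distance of 0 means the two tags are identical; larger values mean more bars are present in one tag but not the other.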
  • embodiments of the present disclosure feature a method for identifying one or more components of a composition, the method including: (a) receiving a Raman spectrum of a composition having one or more analytes of unknown chemical identity; (b) converting the Raman spectrum of the composition to an optical identification tag (OIT) which includes or is a barcode, wherein converting includes correcting the spectrum and processing the corrected spectrum into the barcode by performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks and the position and width of each bar aligns to a characteristic spectral peak; (c) comparing the OIT of the converted spectrum to a reference OIT of a screening library and calculating a binary similarity measure or a binary distance measure between the compared OITs, wherein the screening library comprises a plurality of reference OITs of components of known chemical identity; and (d) matching the OIT of the converted spectrum and one or more reference OITs based on the calculated distance measure being less than a threshold
  • the received Raman spectrum can be collected using an excitation wavelength selected from the group consisting of 473-, 532-, 633-, and 785-nm.
  • the range of wavenumbers of the Raman spectrum used in the converting step can be selected from the group consisting of 200-4000 cm⁻¹, 200-1800 cm⁻¹, and 2500-4000 cm⁻¹, or a combination thereof.
  • Correcting the spectrum can include: (a) calibrating the x-axis of the spectrum; (b) transforming the spectral resolution of the calibrated spectrum to the spectral resolution of the OITs of the screening library; (c) smoothing the transformed spectrum; (d) subtracting the background of the smoothed spectrum; and (e) correcting the relative intensity of the background corrected spectrum.
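Steps (a) and (b) above might be sketched as follows. This is a simplified illustration under stated assumptions: the x-axis calibration is modeled as a constant offset derived from one reference peak, and the resolution transform is modeled as linear interpolation onto an evenly spaced grid. The disclosure does not specify either model, and the function names and numbers below are hypothetical.

```python
def calibrate_x_axis(wavenumbers, measured_peak, reference_peak):
    """Shift the x-axis by a constant offset (an assumed calibration model)."""
    offset = reference_peak - measured_peak
    return [w + offset for w in wavenumbers]


def resample_even(wavenumbers, intensities, start, stop, step):
    """Linearly interpolate a spectrum onto an evenly spaced wavenumber grid.

    Assumes wavenumbers are strictly increasing and that [start, stop]
    lies inside the measured range.
    """
    if len(wavenumbers) < 2 or len(wavenumbers) != len(intensities):
        raise ValueError("need at least two points and matching lengths")
    n_points = int(round((stop - start) / step)) + 1
    grid, values = [], []
    j = 0
    for i in range(n_points):
        x = start + i * step
        # Advance to the measured segment that contains x.
        while j < len(wavenumbers) - 2 and wavenumbers[j + 1] < x:
            j += 1
        x0, x1 = wavenumbers[j], wavenumbers[j + 1]
        y0, y1 = intensities[j], intensities[j + 1]
        t = (x - x0) / (x1 - x0)
        grid.append(x)
        values.append(y0 + t * (y1 - y0))
    return grid, values


# Made-up data: the reference peak was measured 1 cm^-1 too low.
raw_x = [199.0, 200.5, 202.0, 205.0]
raw_y = [0.0, 3.0, 6.0, 12.0]
calibrated_x = calibrate_x_axis(raw_x, measured_peak=519.0, reference_peak=520.0)
grid, values = resample_even(calibrated_x, raw_y, start=200.0, stop=206.0, step=1.0)
```

Putting every spectrum on the same grid is what allows two OITs to be compared bit by bit later on.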
  • Processing the corrected spectrum can include: (a) normalizing the corrected spectrum to a maximum intensity of 1; (b) performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks by analyzing the vibrational strength of peaks of the normalized spectrum to eliminate weak vibrational peaks; (c) refining the remaining peaks using second order derivatives; and (d) converting the positive values to the bars of the barcode.
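The processing steps above can be approximated in a few lines of Python. This sketch makes two loud simplifications that are not from the disclosure: the curve-fitting step is replaced by a plain intensity threshold (`min_intensity`, a hypothetical parameter), and "positive values" is interpreted as positive values of the negated second derivative, which mark the concave region around a peak maximum.

```python
def normalize(values):
    """Scale a spectrum so that its maximum intensity is 1."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("spectrum must contain a positive intensity")
    return [v / peak for v in values]


def second_derivative(values):
    """Central second difference; the two endpoints are set to 0."""
    d2 = [0.0] * len(values)
    for i in range(1, len(values) - 1):
        d2[i] = values[i - 1] - 2 * values[i] + values[i + 1]
    return d2


def to_barcode(values, min_intensity=0.1):
    """Return one bit per data point: 1 inside a sufficiently strong peak, else 0."""
    norm = normalize(values)
    d2 = second_derivative(norm)
    return [
        1 if (-curvature > 0 and y >= min_intensity) else 0
        for y, curvature in zip(norm, d2)
    ]


# Invented spectrum: one strong peak at index 3 and one weak peak at index 7.
spectrum = [0.0, 0.0, 2.0, 8.0, 2.0, 0.0, 0.0, 0.4, 0.0]
print(to_barcode(spectrum))  # [0, 0, 0, 1, 0, 0, 0, 0, 0]
```

The weak peak is dropped because its normalized intensity (0.05) is below the threshold, mirroring the elimination of weak vibrational peaks. Runs of consecutive 1s give each bar a width that follows the width of the underlying peak.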
  • the method can further include creating the screening barcode library of reference OITs by: (a) receiving Raman spectrum of one or more reference molecules; and (b) converting the Raman spectrum of each reference molecule to a reference OIT, wherein the reference OIT comprises a barcode, wherein the position and width of each bar aligns to a peak of the Raman spectrum of the reference molecule.
  • the Raman spectrum can be a Surface-Enhanced Raman Scattering (SERS) spectrum obtained from a sample immobilized on a SERS-active substrate.
  • SERS Surface-Enhanced Raman Scattering
  • the SERS-active substrate can be solvophobic or omniphobic.
  • the converting step can include encoding the Raman spectrum by: (a) calibrating the x-axis of the spectrum; (b) transforming the spectral resolution of the calibrated spectrum to the spectral resolution of the OITs of the screening library; (c) smoothing the transformed spectrum; (d) subtracting the background of the smoothed spectrum; (e) correcting the relative intensity of the background corrected spectrum; (f) normalizing the corrected spectrum to a maximum intensity of 1; (g) performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks by analyzing the vibrational strength of peaks of the normalized spectrum to eliminate weak vibrational peaks; (h) refining the remaining peaks using second order derivatives; and (i) converting the positive values to the bars of the barcode.
  • the one or more analytes can be selected from the group consisting of amino acids of a polypeptide, an active pharmaceutical ingredient or excipient of
  • embodiments of the present disclosure feature a system for identifying one or more components of a composition, the system including: (a) a Raman spectrometer for collecting a Raman spectrum of a composition comprising one or more components, wherein the one or more components are unlabeled analytes; (b) an optical identification tag (OIT) converter configured to receive and convert the Raman spectrum to an OIT comprising a barcode, wherein the configuration includes instructions for correcting the spectrum and processing the corrected spectrum into the barcode by performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks, and wherein the position and width of each bar aligns to a characteristic spectral peak, and a processor to execute the instructions; and (c) an OIT analyzer configured to: (i) compare the OIT of the converted spectrum to a reference OIT of a screening library wherein the screening library comprises a plurality of reference OITs of components of known chemical identity; (ii) calculate a binary similarity measure or binary distance
  • Correcting the spectrum can include: (a) calibrating the x-axis of the spectrum; (b) transforming the spectral resolution of the calibrated spectrum to the spectral resolution of the OITs of the screening library; (c) smoothing the transformed spectrum; and (d) subtracting the background of the smoothed spectrum; and wherein processing the corrected spectrum includes: (e) correcting the relative intensity of the background corrected spectrum; (f) normalizing the corrected spectrum to a maximum intensity of 1; (g) performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks by analyzing the vibrational strength of peaks of the normalized spectrum to eliminate weak vibrational peaks; (h) refining the remaining peaks using second order derivatives; and (i) displaying the positive values as bars of the barcode.
  • the Raman spectrometer can be configured for Surface-Enhanced Raman Scattering (SERS).
  • the system can further include a SERS-active substrate comprising SERS-active particles immobilized on a solvophobic or omniphobic solid support.
  • the SERS-active particles can include silver, gold, copper, aluminum, platinum, palladium, or a combination thereof.
  • the solvophobic or omniphobic solid support can include a perfluoropolyether coating.
  • the solvophobic or omniphobic solid support can include a material selected from the group consisting of silicon, polymers, glass, silicon nitride, quartz, ceramics, sapphire, and metals, or a combination thereof.
  • the Raman spectrometer can include a laser source having an excitation wavelength in the range of about 400 nm to about 1000 nm.
  • the range of wavenumbers of the Raman spectrum receivable by the OIT converter can be selected from the group consisting of 200-4000 cm⁻¹, 200-1800 cm⁻¹, and 2500-4000 cm⁻¹.
  • embodiments of the present disclosure feature a method for identifying one or more components of a composition, wherein at least one of the components is present at a submicromolar concentration, the method comprising: (a) receiving a Raman spectrum of a composition comprising one or more components of unknown chemical identity, wherein the Raman spectrum is a SERS-active Raman spectrum obtained by depositing a solution of the composition on a SERS-active substrate; (b) converting the Raman spectrum of the composition to an optical identification tag (OIT) comprising a barcode, wherein converting includes correcting the spectrum and processing the corrected spectrum into the barcode by performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks, wherein the position and width of each bar aligns to a characteristic spectral peak; (c) comparing the OIT of the converted spectrum to a reference OIT of a screening library and calculating a binary similarity measure or a binary distance measure between the compared OITs, wherein the screening library comprises a plurality of
  • SERS-active substrate for collecting Raman spectrum of an analyte present in a submicromolar concentration.
  • the SERS-active substrate comprises particles of a SERS-active (i.e., plasmonic) material immobilized on a solid support to form a composite.
  • the particles can be immobilized in an array of discrete spots.
  • the discrete spots can have a diameter of about 150 to 250 µm.
  • the immobilized particles can include nanoparticles having an average diameter in the range of about 10-100 nm.
  • the immobilized nanoparticles can be nanocubes or nanostars.
  • the particles can include silver, gold, copper, aluminum, platinum, palladium, or a combination thereof.
  • the solid support can have a solvophobic or omniphobic surface.
  • the solid support can comprise silicon, polymers, glass, silicon nitride, quartz, ceramics, sapphire, metals, or a combination thereof.
  • the SERS-active substrate can be coated with a fluorinated or perfluorinated material.
  • the particles can be coated with an omniphobic material.
  • the SERS-active substrate can include a solid support comprising a perfluoropolyether (PFPE) fluid (a proprietary composition sold under the trade name GPL-100) membrane (i.e., a PFPE-coated polymer membrane) having 1H,1H,2H,2H-perfluorodecanethiol-coated nanostars immobilized thereon.
  • PFPE perfluoropolyether
  • FIGS. 1a-c describe methods, according to one or more embodiments of the present disclosure: (a) describes a method of converting a Raman spectrum (i.e., spectral data) to an OIT, according to one or more embodiments of method 100; (b) shows an embodiment of a Raman (or SERS) spectra processing scheme; and (c) describes method 200 for identifying one or more components of a composition.
  • FIGS. 2a-i describe a detailed spectrum processing procedure for cysteine (Cys), according to one or more embodiments of the present disclosure: (a) shows a raw Raman spectrum and the x-waveshift spectrum after calibration with a standard reference; (b) shows a compatible spectrum with even spacing; (c) shows a compatible spectrum smoothed using a Savitzky-Golay filter with a first-order polynomial and a 5-point window; (d) shows background subtraction using asymmetric least squares smoothing with a smoothing factor of 6 and an asymmetric factor of 0.001; (e) shows a relative intensity correction (ISC) based on laser source and grating with a reference halogen-tungsten source; (f) shows a corrected spectrum normalized between 0 and a maximum intensity of 1; (g) shows peak fitting (analysis) using a Gaussian-Lorentzian formula; (h) shows feature refinement using second-order derivatives; and (i) shows conversion of final peaks to a barcode Optical Identification Tag (OIT).
  • ISC relative intensity correction
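The smoothing step of FIG. 2c can be illustrated without any numerical library, because a Savitzky-Golay filter with a first-order polynomial and a symmetric 5-point window reduces, at interior points, to a 5-point moving average. The sketch below relies on that equivalence. The handling of the first and last two points (shrinking the window) is an assumption made for illustration, and the function name is hypothetical.

```python
def smooth_first_order(values, window=5):
    """Moving-average smoothing, equal to first-order Savitzky-Golay at interior points."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        # Near the edges the window is truncated (an assumed edge policy).
        low = max(0, i - half)
        high = min(len(values), i + half + 1)
        smoothed.append(sum(values[low:high]) / (high - low))
    return smoothed


# Invented data: a single-point spike is spread over its neighbours.
noisy = [0.0, 0.0, 5.0, 0.0, 0.0]
print(smooth_first_order(noisy)[2])  # 5.0 / 5 -> 1.0
```

A linear ramp passes through the interior of this filter unchanged, which is the defining property of a first-order fit.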
  • FIGS. 3a-d show an AA component analysis scheme for peptides, according to one or more embodiments of the present disclosure: (a) shows conversion from the Raman spectrum collected at 473 nm laser wavelength to the OIT screening library of 20 AAs (see also FIG. 15); (b) shows conversion from the collected Raman spectrum to the OIT of peptides; (c) demonstrates the AA screening process using a distance metric for peptide Aβ(15-20) with a sequence of QKLVFF (SEQ ID NO: 1) and the aromatic component phenylalanine (F); (d) shows AA composition results for peptides with chain lengths of 6 AAs, Aβ(15-20), and 11 AAs, Aβ(25-35), and for the full sequence Aβ(1-42).
  • FIGS. 4a-b show that OITs according to one or more embodiments of the present disclosure accurately identified all non-aromatic AAs in Aβ(15-20): (a) shows the identification of the AA composition of the peptide Aβ(15-20) at 473 nm using the 200-1700 cm⁻¹ low wavenumber (LW) region and the 2500-4000 cm⁻¹ high wavenumber (HW) region; and (b) shows the normalized intensity of Aβ(15-20) and the AA components at 473 nm, 633 nm and 785 nm visualized in an intensity map from 0 to 1 corresponding to the peak intensity, according to one or more embodiments of the present disclosure.
  • LW low wavenumber
  • HW high wavenumber
  • FIG. 5 is a flowchart of a Surface-enhanced Raman Scattering (SERS) spectra acquisition scheme, according to one or more embodiments of the present disclosure.
  • SERS Surface-enhanced Raman Scattering
  • FIG. 6 is a fabrication scheme for a SERS-active substrate, according to one or more embodiments of the present disclosure.
  • Step a) includes the plasmonic nanostructure synthesis procedure: icosahedral seed is grown to a nanostar, which is treated with a perfluorinated compound to form nanostar@PF;
  • Step b) includes spin coating a Teflon membrane with lubricating liquid (SLIP), and then depositing nanostars@PF on SLIP-coated surface; c) analyte is deposited on a SERS-active spot prior to analysis.
  • SLIP Teflon membrane with lubricating liquid
  • FIGS. 7a-d provide a characterization of nanoparticles synthesized according to one or more embodiments of the present disclosure: (a) shows a Transmission Electron Microscopy (TEM) image of Au seeds; (b) shows a TEM image of Au nanostars; (c) shows a Scanning Electron Microscopy (SEM) image of the nanostars; and (d) shows the UV-visible extinction spectra of the Au seed and star nanoparticles.
  • TEM Transmission Electron Microscopy
  • SEM Scanning Electron Microscopy
  • FIGS. 8a-c provide a characterization of a SERS-active substrate according to one or more embodiments of the present disclosure: (a) shows an optical image of an as-deposited nanostar solution on a substrate; (b) shows an optical image of a single SERS-spot formed after drying of the nanostar solution; and (c) shows an SEM image of the SERS-spot.
  • FIGS. 9a-b describe the reproducibility of the Raman signal on different SERS-spots on a SERS-active substrate according to one or more embodiments of the present disclosure: (a) shows the Raman spectra of Aβ(15-20) at a concentration of 10⁻⁵ M deposited on 8 different SERS-spots; and (b) shows the intensity of the aromatic vibration peak (1003 cm⁻¹) in spectra of Aβ(15-20) at a concentration of 10⁻⁵ M deposited on 8 different SERS-spots.
  • FIG. 10 shows a comparison between non-enhanced Raman spectra collected on CaF2 and Raman spectra collected on a SERS-spot for mutated peptides: K28A-Aβ(25-35) (2 × 10⁻⁷ mole on SERS-spot and 1.8 × 10⁻¹² mole on CaF2), M35G-Aβ(25-35) (2.09 × 10⁻⁷ mole on SERS-spot and 2.3 × 10⁻¹² mole on CaF2) and Aβ(1-43) (1.08 × 10⁻⁸ mole on SERS-spot and 1 × 10⁻⁹ mole on CaF2) according to one or more embodiments of the present disclosure (SERS spectra shown in red and non-enhanced Raman spectra shown in black).
  • FIGS. 11a-b show OITs of twenty AAs and five peptide samples according to one or more embodiments of the present disclosure, using (a) 633 nm; and (b) 785 nm excitation.
  • FIGS. 12a-b show AA composition results using OITs with the full Raman spectral range (200-4000 cm⁻¹) at different laser wavelengths for peptides with chain lengths of 6 AAs, Aβ(15-20), and 11 AAs, Aβ(25-35), and for the full sequence Aβ(1-42), according to one or more embodiments of the present disclosure, using (a) 633 nm; and (b) 785 nm excitation.
  • FIGS. 13a-b show AA composition results of Aβ(15-20) using OITs at different laser wavelengths in the LW range (200-1700 cm⁻¹) and the HW range (2500-4000 cm⁻¹), according to one or more embodiments of the present disclosure, using (a) 633 nm; and (b) 785 nm excitation.
  • FIGS. 14a-c show AA composition results using the Multivariate Curve Resolution Alternating Least Squares (MCR-ALS) method using Raman spectra obtained at different laser wavelengths for peptide chains Aβ(15-20), Aβ(25-35), and the full sequence Aβ(1-42), for comparison with one or more embodiments of the present disclosure, using (a) 473 nm; (b) 633 nm; and (c) 785 nm excitation.
  • MCR-ALS Multivariate Curve Resolution Alternating Least Squares
  • FIG. 15 shows the Raman spectra (left) and OITs (right) of 20 AAs and peptide samples at 473 nm laser wavelength, according to one or more embodiments of the present disclosure.
  • FIG. 16 shows the Raman spectra (left) and OITs (right) of 20 AAs and peptide samples at 633 nm laser wavelength, according to one or more embodiments of the present disclosure.
  • FIG. 17 shows the Raman spectra (left) and OITs (right) of 20 AAs and peptide samples at 785 nm laser wavelength, according to one or more embodiments of the present disclosure.
  • Embodiments of the present disclosure feature methods and systems for chemical composition analysis using Raman spectroscopy, including methods for analyte identification using a screening library of binary optical identification tags (OITs) containing encoded Raman spectra, and methods and systems for encoding a Raman spectrum in an OIT for library construction and sample analysis.
  • OITs binary optical identification tags
  • the binary format of the OIT enables efficient, specific, and selective search of an OIT library that can be used to identify an unidentified chemical compound or unidentified components of a composition.
  • the methods and systems described in the present disclosure permit automated OIT library construction and search with greater efficiency and reliability than methods requiring complex algorithms or machine learning.
  • analyte or “target” (or “targeted molecule”) refers to one or more molecules in the sample to be analyzed.
  • the term “molecule” generally refers to a chemical composed of two or more atoms and includes macromolecules, molecules or polymers described herein.
  • a molecule of interest to be analyzed can be any organic or inorganic molecule, or a combination thereof.
  • Some non-limiting examples of analytes include small molecules such as active pharmaceutical ingredients, pharmaceutical excipients, nutraceutical agents, agricultural agents, formulation aids, hydrocarbons, biomolecules such as biologically active small molecules, nucleic acids and their sequences, peptides, and polypeptides.
  • the analyte can be a molecule found directly in a sample such as a chemical composition, a manufactured food or agricultural product, a pharmaceutical composition, an environmental sample (e.g., a wastewater sample), a biological sample (e.g., vegetative material) or a body fluid.
  • the analyte can be present in the sample to be measured in the solid, liquid, gas, or vapor phase.
  • By “gas or vapor phase analyte” is meant a molecule or compound that is present, for example, in the top of the liquid, in the surrounding air, in the breath sample, in the gas, or a combination of any of the foregoing.
  • the physical state of the gas or vapor phase can be varied by pressure, temperature and by the presence or addition of a salt which affects the surface tension of the liquid.
  • the sample can be pre-treated to make the analyte easier to detect, e.g., by depositing on a Raman-active surface.
  • an “array” is a group of intentionally positioned particles, which particles can be prepared synthetically and subsequently systematically arranged.
  • plasmonically-active nanoparticles can be grown in a solution and deposited as concentrated spots in an array of rows and columns.
  • solid support refers interchangeably to a material or group of materials having a rigid or semi-rigid surface.
  • at least one surface of the solid support is substantially flat.
  • a solid support of the present disclosure can have the form of beads.
  • omniphobic refers to a surface characterized by display of contact angles greater than 150° and low contact angle hysteresis with both polar and nonpolar liquids possessing a wide range of surface tensions.
  • An omniphobic surface of the present disclosure can support a composite interface with substantially all known liquids (i.e., essentially all known liquids).
  • optical identification tag refers to a binary format, analogous to a barcode, encoding Raman spectral data collected from a substance, composition, or material.
  • the terms “optical identification tag”, “OIT”, and “barcode” are used interchangeably in the present disclosure.
  • the OIT is distinct from a “probe” or “probe molecule” that binds to a target molecule for target analysis, such as a nanoparticle probe labeled with an oligonucleotide and/or Raman-active dye, which are sometimes referred to as “Raman tags” or “labels” in other publications.
  • the OITs of the present disclosure can be converted from Raman spectra without the need for modifying the analyte by chemical tagging or labeling or by using probes.
  • OIT converter refers to a system or computer program product configured to convert Raman spectral data, received or collected, to an OIT.
  • the OIT converter can include hardware, software (including firmware, resident software, micro-code, etc.) or a combination of software and hardware that may be referred to herein as a “processor,” “device,” or “system.”
  • the OIT converter can be in the form of a computer program product embodied in one or more computer readable mediums having computer readable program code embodied thereon. Any combination of one or more computer readable mediums may be utilized.
  • the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
  • a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer readable signal medium can include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
  • a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to CDs, DVDs, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for aspects of the OIT converter may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
  • the program code may execute entirely on the user’s computer, partly on the user’s computer, as a stand-alone software package, partly on the user’s computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user’s computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • LAN local area network
  • WAN wide area network
  • nanostar refers to a type of nanoparticle characterized by a spherical core with multiple protruding sharp tips, resembling a star.
  • nanoparticle refers to a particle that exhibits one or more properties not normally associated with a corresponding bulk material (e.g., quantum optical effects, etc.) and having at least two dimensions that do not exceed 1000 nm.
  • a nanoparticle can be a solid particle ranging from less than 1 nm to 1,000 nm in length, width, or diameter.
  • General embodiments of the present disclosure feature a method of generating an OIT for a sample chemical substance, compound, or composition comprising receiving a Raman spectrum of the sample; correcting the Raman spectrum; and processing the corrected spectrum to generate an OIT.
  • the method can be computer implemented for automatic OIT generation.
  • the method is optionally repeated for a plurality of samples of chemical substances, compounds, or compositions to build a library containing a sufficient number of reference OIT to perform a chemical composition analysis for the identification of an unidentified analyte based on the similarity of the analyte’s OIT to one or more reference OITs.
  • the Raman spectrum is a surface-enhanced Raman spectroscopy (SERS) spectrum.
  • the sample can be an inorganic or organic molecule, including organic molecules selected from the group consisting of active pharmaceutical compounds (API), excipients, hydrocarbons, polymers, and biological material.
  • the sample is a heterogeneous mixture of chemical compounds.
  • the identification can be performed without multivariate analysis.
  • the received Raman spectrum can be the full Raman spectrum or a portion of the Raman spectrum.
  • the spectral range may be about 800 to about 1800 cm⁻¹, for example, which includes heavy atom stretching as well as X-H (X: heavy atom with atomic number > 12) deformation modes.
  • multiple OITs can be generated from a single spectrum, using different spectral ranges.
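Generating several OITs from one spectrum can be as simple as slicing one full-range tag by wavenumber window. The sketch below assumes an OIT is stored as one bit per point of a shared wavenumber grid, which is an assumption rather than a format stated in the disclosure. The 200-1800 cm⁻¹ and 2500-4000 cm⁻¹ windows come from the ranges listed earlier; the function name, labels, and data are hypothetical.

```python
def split_oit(wavenumbers, bits, ranges):
    """Return one sub-OIT per named (low, high) wavenumber window, bounds inclusive."""
    if len(wavenumbers) != len(bits):
        raise ValueError("wavenumbers and bits must have the same length")
    return {
        label: [b for w, b in zip(wavenumbers, bits) if low <= w <= high]
        for label, (low, high) in ranges.items()
    }


# Invented coarse grid and tag, for illustration only.
grid = [200, 1000, 1800, 2000, 2500, 4000]
full_oit = [1, 0, 1, 1, 0, 1]
windows = {"low": (200, 1800), "high": (2500, 4000)}
print(split_oit(grid, full_oit, windows))
# {'low': [1, 0, 1], 'high': [0, 1]}
```

Each sub-OIT can then be screened against reference tags that were cut to the same window.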
  • General embodiments of the present disclosure include a method of identifying one or more components in a composition, comprising: receiving a Raman spectrum of a sample containing one or more unidentified substances; generating a query OIT from the Raman spectrum of the sample; comparing the query OIT to at least one of a plurality of reference OITs using a binary similarity measure or binary distance measure; and identifying the substance based on the closest matches.
  • the method of identifying can be computer-implemented.
  • the received Raman spectrum can be a surface-enhanced Raman spectroscopy (SERS) spectrum.
  • Although embodiments of the present disclosure are described as methods, the skilled artisan would recognize that these methods can be embodied as a system.
  • the system can include computer-implemented embodiments including one or more of hardware and software (including firmware, resident software, micro-code, etc.) referred to herein as a “processor,” “device,” or “system.”
  • general embodiments of the present disclosure also include a system for identifying a material, comprising a plurality of stored OITs, wherein each stored OIT is based on a Raman spectrum of an identified material; a barcode converter that receives a Raman spectrum of an unidentified material and generates a query OIT; and a query OIT analyzer that compares the query OIT and at least one of the stored OITs using a distance metric.
  • Each block can be implemented by computer program instructions, which can be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the steps specified in the blocks.
  • the computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the steps specified in the blocks.
  • FIG. 1a describes method 100 for converting a Raman spectrum to an OIT.
  • Embodiments of method 100 can be used to build, construct, or assemble an OIT library.
  • the OIT library can be used for quality assurance or control.
  • the library can include a plurality of known chemical substances, compounds, and compositions relevant for the intended application.
  • a library constructed using method 100 can be used to determine, confirm, or verify the identity of a composition or component of a composition, or to screen a composition for the presence of a specific component.
  • the library can include OITs of the active pharmaceutical ingredient (API), including various polymorphs, one or more of the excipients to be used in the formulation, and common contaminants.
  • the library can include OITs of one or more contraband chemicals or compositions, in addition to the chemicals and compositions identified on a cargo manifest or bill of lading.
  • the library can include a plurality of amino acid (AA) OITs as shown in FIGs. 15-17.
  • Method 100 receives a Raman spectrum that has been collected for a known substance, compound or composition and converts the spectral data to an OIT.
  • the Raman spectrum is a plot of the intensity of Raman scattered radiation as a function of its frequency difference from the incident radiation (usually in units of wavenumbers, cm⁻¹).
  • the method permits extraction of characteristic peaks by encoding the distribution of strong or moderately strong vibrational peaks into the OIT.
  • method 100 includes collecting the Raman spectrum of the known substance, compound, or composition.
  • method 100 can include collecting Raman spectra of precious compounds (e.g., limited-quantity or low concentration samples) using Surface-Enhanced Raman Spectroscopy (SERS).
  • Generating an OIT in method 100 includes a series of correcting steps 101a and a series of OIT processing steps 101b.
  • the correcting steps 101a correct for instrumentation and environmental factors allowing for raw spectra to be collected from different Raman spectrometers and standardized.
  • processing steps in 101b convert the corrected spectra into a binary signal for referencing.
  • the correcting steps include processing with mathematical algorithms to standardize the received spectrum for further processing into an OIT that is suitable for inclusion in an OIT screening library or for querying an OIT screening library.
  • Correcting steps 101a can include processing a collected spectrum according to one or more calibrating steps, transformation steps, smoothing steps and background subtraction steps which reduce or eliminate irrelevant, random, and systematic variations in the spectral data, such as signal intensity variations linked to laser intensity, low signal-to-noise ratio, high background (fluorescence), and spikes.
  • correcting steps can include applying a missing point polynomial filter, a robust smoothing filter, a moving window filter, or a wavelet transform method to remove spikes.
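As an illustration of spike removal with a moving window filter, the sketch below applies a moving median to an intensity series. The function name `moving_median`, the window length of 3, and the sample values are assumptions for illustration; the source does not specify which moving window filter or window length is used.

```python
from statistics import median

def moving_median(intensities, window=3):
    """Replace each point with the median of its neighborhood (hypothetical helper).

    The window is truncated at the ends of the spectrum.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [
        median(intensities[max(0, i - half): i + half + 1])
        for i in range(len(intensities))
    ]

# A single-point spike (50) on an otherwise flat signal is suppressed.
despiked = moving_median([1, 1, 50, 1, 1])
```

A median is used here because, unlike a mean, it is not pulled toward a single extreme value inside the window.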
  • OIT processing steps 101b can include processing with mathematical algorithms to normalize the corrected spectrum, to perform curve fitting to extract characteristic peaks of the corrected spectrum, and to calculate a second derivative of the normalized and/or curve-fitted spectral data.
  • the results of processing can be displayed as a barcode, with the normalized positive and negative values as color-coded bars positioned at the corresponding wavenumbers of the characteristic peaks of the Raman spectrum.
  • the bar position and width can be substantially aligned with the peaks in the original Raman spectra.
  • Method 100 includes calibration step 102, to calibrate the accuracy of the wavelength axis (x-axis) of the Raman spectrum.
  • calibrating can include processing the x-axis of the collected spectrum based upon the spectrum of a standard reference material.
  • step 102 can include collecting the spectrum of the standard and then using the collected spectrum for calibration.
  • the standard can be any Raman shift frequency standard (e.g., naphthalene, 1,4-bis(2-methylstyryl)benzene (BMB), 50/50 (v/v) toluene/acetonitrile, 4-acetamidophenol, benzonitrile, cyclohexane, or polystyrene).
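One possible way to calibrate the x-axis against a standard is to fit a linear correction that maps the measured peak positions of the standard onto their accepted positions, then apply that correction to the sample spectrum. The sketch below assumes a linear correction is adequate; the function names and all peak values are hypothetical and are not taken from the source.

```python
def fit_linear_correction(measured_peaks, reference_peaks):
    """Least-squares line mapping measured peak positions to reference positions."""
    n = len(measured_peaks)
    if n < 2 or n != len(reference_peaks):
        raise ValueError("need at least two matched peak pairs")
    mean_x = sum(measured_peaks) / n
    mean_y = sum(reference_peaks) / n
    sxx = sum((x - mean_x) ** 2 for x in measured_peaks)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(measured_peaks, reference_peaks))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def calibrate_axis(wavenumbers, slope, intercept):
    """Apply the fitted correction to every point of the wavenumber axis."""
    return [slope * w + intercept for w in wavenumbers]

# Hypothetical standard peaks measured 2 cm^-1 too low.
slope, intercept = fit_linear_correction([998.0, 1598.0], [1000.0, 1600.0])
corrected = calibrate_axis([998.0, 1200.0], slope, intercept)
```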
  • Method 100 includes transformation step 104, comprising transforming the spectral resolution of the calibrated spectrum to match a common spectral resolution.
  • the optimal spectral resolution for an OIT library will depend on the intended use of the library and the types of compositions or compounds being analyzed (e.g., whether the sample is amorphous or crystalline, whether there is hydrogen bonding in the sample, and whether analytes are present in primary, secondary, and/or tertiary structures in the composition).
  • a common spectral resolution permits comparison of collected spectra and merging of data from different instruments or different time points of collection.
  • Spectral resolution depends on the configuration of the spectrometer illumination, the grating dispersion, the spectrometer length (from grating to detector), and either exit slit or, in the case of CCD, the pixel size, etc.
  • Transforming the spectral resolution can include processing the spectrum for compatibility with spectra collected or to be collected using different instruments.
  • a common spectral resolution can be low, medium, or high. For example, low to medium resolution can be useful to identify a chemical component, while higher resolution may be required to reveal small changes in the shape or position of a spectral peak. In some cases, the common spectral resolution is 8, 4, 2, 1, or 0.5 cm⁻¹.
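To illustrate transformation step 104, the sketch below resamples a calibrated spectrum onto a uniform wavenumber grid so that spectra from different instruments share the same sampling. Linear interpolation, the function name `resample_to_grid`, and the sample values are assumptions for illustration; the source does not name an interpolation method.

```python
def resample_to_grid(wavenumbers, intensities, start, stop, step):
    """Linearly interpolate a spectrum onto a uniform grid (hypothetical helper).

    wavenumbers must be strictly increasing and must cover [start, stop].
    """
    if start < wavenumbers[0] or stop > wavenumbers[-1]:
        raise ValueError("grid must lie inside the measured range")
    n_points = int(round((stop - start) / step)) + 1
    grid = [start + i * step for i in range(n_points)]
    resampled = []
    j = 0
    for x in grid:
        # Advance to the measured segment that contains x.
        while j < len(wavenumbers) - 2 and wavenumbers[j + 1] < x:
            j += 1
        x0, x1 = wavenumbers[j], wavenumbers[j + 1]
        y0, y1 = intensities[j], intensities[j + 1]
        resampled.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return grid, resampled

# Irregularly sampled points resampled to a 4 cm^-1 grid.
grid, resampled = resample_to_grid(
    [800.0, 803.0, 807.0, 812.0], [1.0, 4.0, 8.0, 3.0], 800.0, 812.0, 4.0
)
```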
  • Method 100 includes smoothing step 106, comprising reducing the noise in the Raman signal of the calibrated spectrum.
  • Electronic noise such as flicker noise, shot noise and thermal noise is an unpredictable and constant occurrence which appears primarily in the spectral intensity.
  • smoothing includes calculating the average, the moving average, or the moving median of the spectrum.
  • Smoothing can include applying a digital filter to the spectrum to increase the precision of the data. The digital filter is selected based on its ability to increase precision without distorting the signal tendency.
  • smoothing includes applying a smoothing algorithm using a window of 5-25 points, such as 5-13 points, 13-25 points, or 8-15 points. The window size can be selected for a specific composition based on the characteristic peaks.
  • smoothing step 106 can include applying a moving polynomial filter, such as a Savitzky-Golay filter or a Fourier filter, to the spectrum.
  • method 100 can include smoothing using a Savitzky-Golay filter with a 5-point window.
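A 5-point Savitzky-Golay smoother can be sketched with the standard quadratic/cubic convolution coefficients (-3, 12, 17, 12, -3)/35. The polynomial order and the handling of the end points (left unsmoothed here) are assumptions; the source states only the window size.

```python
# Standard 5-point quadratic/cubic Savitzky-Golay smoothing coefficients.
SG5_COEFFS = (-3, 12, 17, 12, -3)
SG5_NORM = 35

def savgol5(intensities):
    """Smooth with a 5-point Savitzky-Golay filter.

    The two points at each end are copied unchanged (a simplification).
    """
    smoothed = list(intensities)
    for i in range(2, len(intensities) - 2):
        acc = sum(c * intensities[i + k - 2] for k, c in enumerate(SG5_COEFFS))
        smoothed[i] = acc / SG5_NORM
    return smoothed

# A quadratic signal passes through unchanged, which is the defining
# property of a polynomial smoothing filter of this order.
unchanged = savgol5([x * x for x in range(7)])
```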
  • Method 100 includes background subtraction step 108 using one or more methods of background correction.
  • Exemplary methods include polynomial methods of background correction such as offset correction (e.g., subtracting a constant), subtracting a line, or approximating a polynomial through basis points; derivative methods of background correction; Fast Fourier Transform background correction with a high pass filter; and wavelet transformation techniques.
  • Derivative background correction methods include calculating a moving simple difference, calculating a moving average difference, and applying a Savitzky-Golay filter.
  • step 108 includes creating a baseline function to be subtracted from the smoothed data resulting from step 106.
  • a baseline can be created using a first or second derivative method to define anchor points, or an asymmetric least squares (ALS) method.
  • the baseline can be optimized using the ALS method by adjusting parameters such as the asymmetric factor (0-1) (i.e., the weight of points above baseline), the threshold, the smoothing factor (i.e., window size factor), and the number of iterations.
  • the smoothing factor can be 6.
  • the asymmetric factor can be close to zero, such as 0.001.
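For orientation, one widely used penalized least-squares formulation of an asymmetric least squares baseline is written out below. It is offered as an assumption about the general technique, not as the specific implementation used in the source; the symbols y (smoothed spectrum), z (baseline), p (asymmetric factor), and λ (smoothness penalty) are introduced here for illustration, and the mapping of λ onto the source's "smoothing factor" is not confirmed.

```latex
\hat{z} \;=\; \arg\min_{z}\;
  \sum_{i} w_i \,(y_i - z_i)^2
  \;+\; \lambda \sum_{i} \left(z_{i-1} - 2 z_i + z_{i+1}\right)^2,
\qquad
w_i =
\begin{cases}
  p,     & y_i > z_i \\
  1 - p, & y_i \le z_i
\end{cases}
```

The weights and the baseline are updated alternately for a fixed number of iterations. With p close to zero (e.g., 0.001), points above the current baseline, which are mostly Raman peaks, contribute almost nothing to the fit, so the baseline follows the lower envelope of the spectrum.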
  • Method 100 includes relative intensity correction step 110.
  • Intensity correction compensates for instrumental response.
  • Raman spectra obtained with different instruments may show significant variations in the measured relative peak intensities as a result of differences in wavelength-dependent optical transmission and detector quantum efficiency. The variations can be large when different laser excitations are used but can also occur over time when the same laser excitation is used.
  • Relative intensity can be calibrated based on white-light calibration, tungsten or tungsten-halogen lamps, or luminescent glass standards designed for relative intensity calibration of Raman spectrometers operating with 1064 nm, 785 nm, 532 nm, or 488 nm/514.5 nm laser excitation. In some cases, the relative intensity correction is calibrated with a reference tungsten-halogen source for the chosen laser excitation wavelength and the grating used for acquisition.
  • Method 100 includes normalization step 112 to eliminate (compensate for) systematic differences among measurements resulting from changes that may have occurred during data acquisition (e.g., changes in laser power, differences in focusing depth, and sample volume differences). Normalizing is performed after baseline correction. Normalizing can include vector normalization, standard normal variate normalization, or Min-/Max-Normalization. For example, Min-/Max-Normalization can be used to scale the spectra from 0 to 1 (e.g., compare FIG. 2e and 2f).
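Min-/Max-Normalization as described for step 112 can be sketched as follows. The function name `min_max_normalize` and the sample values are hypothetical; the input is assumed to be a baseline-corrected intensity series.

```python
def min_max_normalize(intensities):
    """Scale intensities linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(intensities), max(intensities)
    if highest == lowest:
        raise ValueError("a flat spectrum cannot be min-max normalized")
    span = highest - lowest
    return [(value - lowest) / span for value in intensities]

normalized = min_max_normalize([2.0, 4.0, 6.0])
```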
  • Method 100 includes analysis step 114 for numerical determination of Raman peak parameters using curve fitting algorithms.
  • the curve fitting can be based on vibrational strength to improve the precision in peak position and intensity. Low strength bands can be eliminated (e.g., compare FIG. 2f and 2g).
  • the normalized spectral data can be fitted using Gaussian, Lorentzian, Gaussian-Lorentzian, Voigtian, Pearson type IV, and beta profiles.
  • the fitting results provide the peak position, intensity, area, and full width at half-maximum values of the measured Raman bands.
  • the most appropriate fitting profile should be selected depending on the nature of the Raman bands, which are related to the sample properties. Multiple peaks that are not clearly separated can be fit simultaneously provided the residuals in the fitting of one peak will not affect the fitting of the remaining peaks to a significant degree.
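For reference, two of the profiles named above can be written so that the fitted parameters correspond directly to the reported peak position, intensity, and full width at half-maximum. The parameterization below is one common convention and is an assumption, not the source's notation: A is the peak height, ν̃₀ the peak position, and w the full width at half-maximum.

```latex
G(\tilde{\nu}) = A \exp\!\left(-4 \ln 2 \,\frac{(\tilde{\nu} - \tilde{\nu}_0)^2}{w^2}\right),
\qquad
L(\tilde{\nu}) = A \,\frac{(w/2)^2}{(\tilde{\nu} - \tilde{\nu}_0)^2 + (w/2)^2}
```

Both expressions equal A/2 at ν̃ = ν̃₀ ± w/2, which confirms that w is the full width at half-maximum in each case.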
  • Method 100 includes refining step 116 for highlighting the important features of the spectra signature and minimizing the effect of variations due to acquisition artifacts and sample properties. Refining includes calculating the second derivative of the analyzed spectra.
  • Conversion of the refined (final) peaks to the OIT includes assigning a binary value (0 or 1) to each second derivative spectral data point primarily based on the sign or the value of the second derivative, i.e., 0 for upward curvature (positive second derivatives), and 1 for downward curvature (negative second derivatives).
  • a threshold for zeros can be selected as a percentage of the maximum absolute value of the second derivative for negative second derivative readings (for all absolute values larger than the threshold, 1 is retained; otherwise the value is changed to 0).
  • the OIT captures the characteristic peaks of the spectrum as positive values. Barcodes are assigned on the basis of the second derivative sign, with the position and thickness of the positive values (wavenumber, cm⁻¹) displayed as black bars (FIG. 2i).
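The binarization described above can be sketched as follows: compute a second derivative, assign 1 where the curvature is downward (negative second derivative) and strong enough, and 0 elsewhere. The central finite-difference derivative, the function names, and the 5% default threshold are assumptions for illustration; the source does not state how the derivative is computed or which percentage is used.

```python
def second_derivative(values):
    """Central finite-difference second derivative on a uniform grid.

    The two end points are set to 0 (a simplification).
    """
    d2 = [0.0] * len(values)
    for i in range(1, len(values) - 1):
        d2[i] = values[i - 1] - 2 * values[i] + values[i + 1]
    return d2

def to_oit(values, threshold_fraction=0.05):
    """Convert a processed spectrum into a list of 0/1 values (hypothetical helper)."""
    d2 = second_derivative(values)
    # Largest magnitude among the negative second-derivative readings.
    max_negative = max((-v for v in d2 if v < 0), default=0.0)
    cutoff = threshold_fraction * max_negative
    return [1 if (v < 0 and -v > cutoff) else 0 for v in d2]

# One strong peak at index 2 and one very weak peak at index 5.
spectrum = [0.0, 0.0, 1.0, 0.0, 0.0, 0.01, 0.0, 0.0]
oit = to_oit(spectrum)                # weak peak falls below the threshold
oit_all = to_oit(spectrum, 0.0)       # threshold disabled, both peaks kept
```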
  • Method 100 can be fully automated for high-throughput spectral processing.
  • Current Raman-based characterization of compositions is complex and time-consuming, resulting in low throughput and limiting the statistical significance of the acquired data.
  • High-throughput screening Raman spectroscopy would reduce or eliminate the complexity and reduce the extent of human involvement.
  • the need to provide initial estimates of the number of peaks, their band shapes, and the initial parameters of these bands has presented an obstacle to the full automation of peak fitting and its incorporation into fully automated spectral-preprocessing workflows.
  • Spectral conversion to OITs increases the quality of information obtained from Raman spectra and improves data visualization, thereby facilitating automation.
  • Method 100 can permit the determination of spectral classification within a priori library reference databases (i.e., libraries containing OITs of molecules of interest); and can be used to create an OIT library reference database.
  • the OIT database can be stored and organized like a barcode database, with the OIT providing a reference which a computer uses to look up associated computer disk record(s) containing descriptive data and other pertinent information (e.g., chemical or composition name).
  • the OITs of the database can be organized and stored in different file formats (e.g., industry standard formats) with metadata describing the instrument configuration parameters of the acquired spectrum, user comments, and/or user defined parameters.
  • the data files can be organized in a file system.
  • the OIT database can be organized in a sample-centric manner, or a sample/OIT-centric manner.
  • the OIT database can be static or can permit periodic or continuous expansion to enhance identification.
  • Method 100 can be used to generate an OIT for identification of an unidentified chemical compound.
  • the identification can be in real-time.
  • the binary format of the OIT enables efficient, specific, and selective search that can be used to identify an unidentified chemical compound or unidentified components of a composition by comparing a query OIT with a stored OIT (e.g., an OIT of a screening library) using a binary similarity or binary distance measure, with greater accuracy than more complex algorithms which are regularly used for qualitative Raman spectral analysis.
  • the binary distance measure determines the number of points at which the binary codes of the query and reference OITs are different (i.e., the distance).
  • the comparison can utilize one or more binary distance measures, such as the Hamming distance.
  • the identity of the query OIT can be based upon the closest distance match.
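A closest-distance lookup against a small reference library can be sketched as below. The library contents, the entry names `compound_A` and `compound_B`, and the function names are hypothetical; OITs are represented as equal-length lists of 0/1 values.

```python
def hamming_distance(oit_a, oit_b):
    """Count the positions at which two equal-length OITs differ."""
    if len(oit_a) != len(oit_b):
        raise ValueError("OITs must have the same length")
    return sum(a != b for a, b in zip(oit_a, oit_b))

def closest_match(query, library):
    """Return (name, distance) for the reference OIT nearest to the query."""
    scored = ((name, hamming_distance(query, ref)) for name, ref in library.items())
    return min(scored, key=lambda item: item[1])

# Hypothetical two-entry reference library.
library = {
    "compound_A": [1, 0, 0, 1, 0, 1, 0, 0],
    "compound_B": [0, 1, 1, 0, 0, 0, 1, 0],
}
name, distance = closest_match([1, 0, 0, 1, 0, 0, 0, 0], library)
```

In this toy case the query differs from `compound_A` at one position and from `compound_B` at five, so `compound_A` is returned.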
  • a Raman spectrum from a sample will contain Raman information about the molecules present within the analysis volume of the system.
  • the Raman spectrum of a sample comprising a composition including a mixture of molecules will contain peaks representing all of the different molecules. If the OITs of the molecules are known (e.g., stored in an OIT database), the composition OIT can be matched with known OITs to provide quantitative information about the molecules in the mixture.
  • method 100 can include matching the query OIT (i.e., the OIT of the converted spectrum) and one or more stored OITs (i.e., reference OITs) based on the calculated distance measure being less than a threshold value or the calculated similarity measure being greater than a threshold value; and thereby identify one or more components present in the composition.
  • the OIT of a composition comprising several analytes can be matched with one or more OITs corresponding to one or more chemical compounds within the composition.
  • a peptide OIT can be matched to several or all of the AA OITs of the primary sequence and permit identification of a point mutation (as demonstrated in the Examples below).
  • method 100 includes collecting spectral data of relevant reference samples to build the screening OIT library.
  • a Raman spectroscopy system generally includes four major components to collect a Raman spectrum: (1) excitation source (e.g., laser or high brightness laser); (2) sample illumination system and light collection optics; (3) wavelength selector (e.g., filter or spectrophotometer); and (4) detector (e.g., photodiode array, CCD or PMT).
  • the Raman spectroscopy system can be a bench-top or portable (e.g., handheld) system.
  • the system can include software for selecting the optimum exposure time and number of exposures for a sample, a signal-to-noise ratio, maximum collection time, and other data collection parameters (e.g., autofocus), to give the best spectrum without saturating the detector.
  • the system is a High Throughput Screening (HTS) Raman system.
  • An HTS Raman spectroscopy system can use a combination of automated sample movement, autofocus devices, and automated data acquisition and analysis procedures to acquire spectra from hundreds of samples sequentially. Automated measurements can be integrated with robot handling to reduce or eliminate the need for expertise and operator intervention.
  • Method 100 can include receiving the entire collected spectrum or portions thereof.
  • the OIT can be converted from the spectral data within a range of wavenumbers.
  • a range of wavenumbers can be selected from wavenumbers within the range of 200-4000 cm⁻¹, such as 200-2000 cm⁻¹, 200-1800 cm⁻¹, 400-1800 cm⁻¹, 500-2000 cm⁻¹, 600-4000 cm⁻¹, 2400-3800 cm⁻¹ or 2500-4000 cm⁻¹.
  • multiple OITs can be converted from spectral data for a single compound or composition using spectral data from more than one range of wavenumbers (e.g., low wavenumbers, high wavenumbers, and the full range of wavenumbers).
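Restricting the spectral data to a wavenumber range before conversion can be sketched as a simple filter; running it with several ranges yields several inputs, each of which could be converted to its own OIT. The function name `select_range` and the sample values are hypothetical.

```python
def select_range(wavenumbers, values, low, high):
    """Keep only the points whose wavenumber lies in [low, high]."""
    kept = [(w, v) for w, v in zip(wavenumbers, values) if low <= w <= high]
    return [w for w, _ in kept], [v for _, v in kept]

wavenumbers = [300.0, 900.0, 1500.0, 2900.0, 3500.0]
values = [0.1, 0.8, 0.4, 0.9, 0.2]

# One low-wavenumber window and one high-wavenumber window from the same spectrum.
low_w, low_v = select_range(wavenumbers, values, 400.0, 1800.0)
high_w, high_v = select_range(wavenumbers, values, 2400.0, 3800.0)
```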
  • method 200 includes the use of one or more of the steps set forth in method 100 to identify one or more unidentified analytes, such as the molecular components of a sample composition.
  • the method can use an OIT encoding a Raman spectrum to automatically resolve the individual, pure chemical species within a complex, heterogeneous sample to identify the components and/or verify the composition quality.
  • the identification performed in method 200 can be used to verify the composition of a sample for quality control.
  • Method 200 includes step 202 directed to receiving and storing an OIT database (e.g., a screening or reference OIT library) comprising a plurality of OITs, wherein each OIT corresponds to a molecule relevant to the sample composition.
  • Data received and/or stored with the OIT can include information about the corresponding molecule, e.g., chemical identity (name).
  • Receiving can include building an OIT library (as discussed above) or loading a pre-built screening library into the system for implementing method 200.
  • Step 202 can include collecting the Raman spectrum using a system comprising a Raman spectrometer, such as a Raman spectroscopy system as described above for method 100.
  • Method 200 includes step 204 directed to obtaining a sample composition containing at least one component of unknown or unverified chemical identity.
  • the form of the sample composition can be a solid, powder, liquid, gel, emulsion, slurry, suspension, or gas.
  • Method 200 does not require labeling any of the components of the sample for detection and identification. In some cases, none of the components of the sample are labeled or tagged.
  • Method 200 includes collecting step 206 to acquire the Raman spectrum of the sample composition using a Raman spectroscopy system, such as a system described above.
  • Collecting a Raman spectrum can include exposing the sample composition to an excitation wavelength and detecting the Raman signal.
  • the excitation wavelength can be in the range of about 400 nm to about 1000 nm, such as 473 nm, 532 nm, 633 nm, and 785 nm.
  • multiple spectra are collected using different excitation wavelengths for each spectrum.
  • the sample composition is exposed to the laser source for a sufficient period of time to generate a Raman signal. In some exemplary embodiments, the period of time ranges from less than about 1 second to about 3 hours. Two or more collected spectra can be averaged to provide the Raman spectrum.
  • the Raman spectrum is a Surface-Enhanced Raman Spectroscopy (SERS) spectrum.
  • step 206 includes preparing the sample for SERS by bringing an analyte in contact with, or adjacent to, a Raman-active metal surface or structure (a “SERS-active structure”). The interactions between the analyte and the SERS-active structure cause an increase in the strength of the Raman signal. Therefore, SERS can be used to collect Raman spectra from samples containing unidentified compounds in submicromolar concentrations.
  • the concentration of the one or more analytes can range from picomolar to 1000 nM, such as equal to or less than about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 250, 300, 400, 500, 600, 700, 800, 900, and 1000 nM.
  • Step 206 can include collecting the Raman spectrum using a system configured for SERS.
  • the Raman spectrum is collected using a SERS-active structure such as a SERS-active substrate.
  • Step 206 can include contacting the sample composition with a SERS-active structure by depositing the sample on a prepared SERS-active substrate.
  • the SERS-active substrate can be a composite material.
  • method 200 includes one or more steps for preparing the SERS-active substrate.
  • preparing the SERS-active substrate can include immobilizing particles of a SERS-active material to a solid support to form a composite.
  • the particles can be immobilized randomly or uniformly (e.g., in an array).
  • the particles can be immobilized in an array of discrete spots formed by depositing a drop of suspended Raman-active material on the solid support.
  • the discrete spots can have a diameter suitable for collection of the Raman spectrum.
  • the SERS spots have a diameter greater than the laser spot diameter, such as a diameter of greater than 10, 50, 100, 150, or 200 µm, and up to 250 µm.
  • the uniformity of a substrate with an array of SERS-sensing spots can be confirmed by measuring variation in the intensity of a peak from a Raman standard at one or more of the spots.
  • the variation in the signal intensity collected from a plurality of spots of a SERS-active substrate as described above is less than 20%, such as 18%, 17% or 16% or less.
  • the immobilized particles can include nanoparticles, such as nanospheres, having an average diameter in the range of about 10-100 nm. Particles of the composite can be prepared in various sizes and shapes to fine tune the surface polarization and plasmon resonance in SERS. Nanoparticles are a preferred size as they provide more plasmonically active surface area.
  • the nanoparticle will have an effective average diameter (i.e., the smallest cross-section of the nanoparticle or plasmon-resonating layer) of about 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000 nm.
  • the average diameter is in the range of about 10-20, 10-30, 10-40, 10-50, 10-60, 10-70, 10-80, 10-90, or 10-100 nm.
  • Nanoparticle shapes include spherical (nanospheres), cube shape (nanocubes), rod shape (nanorod) or wire shape (nanowires).
  • the nanoparticles have a spherical shape (nanospheres) with protrusions on the surface thereof.
  • gold nanostars with different shapes (number of protrusions) and plasmonic properties can be synthesized by changing the seed volume and concentration of HAuCl₄.
  • the density of the nanoparticles on the substrate may vary depending on factors such as the substances to be detected and the production process for the nanoparticles.
  • Non-limiting examples of nanoparticle density include about 10-800 particles/µm², such as about 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 250, 300, 350, 400, 450 and 500 particles/µm².
  • the solid support can be chemically resistant, and preferably repels one or more solvents.
  • the solid support can be inherently solvophobic, or a solid support functionalized to be solvophobic by coating, grafting, or the like, with one or more solvophobic moieties.
  • the solid support is omniphobic.
  • Suitable materials for the solid support include silicon, polymers, glass, silicon nitride, quartz, ceramics, sapphire, and metals and combinations thereof.
  • the solid support is a silicon wafer or a polytetrafluoroethylene membrane (e.g., as sold under the trade name TEFLON).
  • the surface of the solid support can be rendered super-resistant by grafting a sufficient number of low surface energy functional groups (e.g., CF2, CF2H, and/or CF3 groups), or by coating the surface with a lubricating fluid, such as a fluorinated or perfluorinated material selected from perfluorinated phosphates, perfluorinated silanes, fluorinated monomers, polymers and copolymers, and other fluorinated precursors.
  • the thickness of the coating may range, for example, from about 1 nm to about 300 nm. Exemplary embodiments of the thickness of the coating include about 1, 5, 10, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 250 and 300 nm.
  • the particles of the composite may be composed of various Raman active materials.
  • the immobilized particles include at least a noble metal such as aluminum, copper, silver, platinum, palladium, or gold.
  • the particles include alloys such as copper/silver/gold alloy (e.g., copper-silver alloy, copper-gold alloy, silver-gold alloy, copper-silver-gold alloy).
  • the particles are core-shell nanoparticles, which include a core of, for example silica, platinum, or other metal particles, onto which a layer is deposited, e.g., layers of Cu, Ag, or Au.
  • the particles are coated with a solvophobic or omniphobic material, as discussed above with regards to the solid support.
  • the particles can be coated before or after immobilization on the solid support.
  • the composite can include a surface-functionalized solid support in combination with surface-functionalized nanoparticles.
  • the same omniphobic coating e.g., functional groups
  • a SERS-active substrate can include a perfluoropolyether (PFPE)-coated TEFLON membrane solid support having 1H,1H,2H,2H-perfluorodecanethiol-coated nanostars immobilized thereon.
  • the PFPE fluid used to coat the solid support can be the proprietary fluid composition sold commercially under the tradename GPL-100.
  • Step 206 can include contacting the above-described composite with the sample composition to be analyzed, wherein the one or more components are chemisorbed or physisorbed to the particles of the composite.
  • the analyte, substance, or composition to be tested may be disposed on the nanoparticles in various forms.
  • the sample composition is deposited on or near the nanoparticle surface as a powder, a vapor, or a solution or suspension.
  • a SERS-active substrate comprising nanostars immobilized in an array of concentrated spots (“SERS-sensing spots” or “SERS spots”) (as shown in FIG. 6) can be contacted with the sample by applying microdrops of sample solution onto the spots.
  • step 206 can include collecting a SERS-spectrum by exposing the one or more substrate-bound components of the sample composition to an excitation wavelength and detecting the Raman signal.
  • the excitation wavelength can be in the range of about 400 nm to about 1000 nm, such as 473 nm, 532 nm, 633 nm, and 785 nm. In some cases, multiple spectra are collected using different excitation wavelengths for each spectrum.
  • the sample composition is exposed to the laser source for a sufficient period of time to generate a Raman signal. In some exemplary embodiments, the period of time ranges from less than about 1 second to about 3 hours.
  • Method 200 includes spectral data converting step 208 whereby the collected spectrum is corrected and processed into a query OIT.
  • Method 200 can include steps 101a and 101b as described above for method 100.
  • Method 200 can include steps 102-110 and steps 112-116.
  • method 200 includes performing the same steps in the same order as the method used to obtain the stored OITs to generate the query OIT.
  • Step 208 can be performed using a system for identifying one or more components of a composition, which system can include an OIT converter configured to receive and convert the Raman spectrum to an OIT.
  • Method 200 includes matching step 210 whereby the query OIT is systematically compared with at least one of the plurality of stored OITs.
  • Method 200 does not rely upon principal component analysis (PCA) or discriminant function analysis (DFA) for classification or identification of the query OIT.
  • Method 200 includes comparing OITs using a bit-based measure, such as binary similarity or dissimilarity measures.
  • Step 210 can include computing the similarity or dissimilarity (distance) between the query OIT and one or more of the stored OITs.
  • the similarity metric can be a correlation- or noncorrelation-based metric.
  • Dissimilarity can be computed using a binary distance metric, such as the binary Euclidean distance, the Bray and Curtis distance, or the Hamming distance.
  • matching step 210 includes accepting or rejecting the computed dissimilarity of the query OIT to a stored OIT based on the Hamming distance.
  • the computation includes adding the total number of times that two corresponding values in the two OITs disagree. Expressed as a fraction between 0 and 1, the Hamming distance between two random and independent OITs would be expected to be 0.5, since any pair of corresponding values has a 50% likelihood of agreeing and a 50% likelihood of disagreeing. Thus, if two OITs from different molecules are compared, their Hamming distance would be expected to be 0.5. If the stored OIT and the query OIT are from the same molecule, the Hamming distance would be expected to be considerably lower.
  • the Hamming distance can be computed using the elementary logical operator XOR (Exclusive-OR) and thus can be performed more efficiently than complex algorithms.
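As a minimal sketch of the XOR-based computation described above, assuming each OIT is available as an equal-length sequence of 0/1 values (the helper names `pack_bits` and `hamming_fraction` and the example tags are hypothetical and not taken from the disclosure):

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into one Python integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | (1 if bit else 0)
    return value


def hamming_fraction(bits_a, bits_b):
    """Fraction (0 to 1) of positions at which two binary OITs disagree."""
    if len(bits_a) != len(bits_b):
        raise ValueError("OITs must have the same length")
    # XOR sets a bit exactly where the two tags disagree; count those bits.
    differing = bin(pack_bits(bits_a) ^ pack_bits(bits_b)).count("1")
    return differing / len(bits_a)


# Hypothetical 8-bit tags, for illustration only.
query = [1, 0, 1, 1, 0, 0, 1, 0]
stored = [1, 0, 0, 1, 0, 1, 1, 0]
distance = hamming_fraction(query, stored)  # 2 of 8 positions disagree
```

A value near 0 suggests the two tags come from the same molecule, while a value near 0.5 is what independent random tags would be expected to give.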
  • Matching can include a reasonable match as determined by one or more predetermined thresholds for rejection. For example, a match can be a “similar” confidence match based on a first threshold and a “high” confidence match based on a second threshold.
  • step 210 can include calculating the decision confidence level. Yes/No decisions in OIT matching have four possible outcomes: either a given OIT is or is not a match, and for either of these two cases, the decision made can be correct or incorrect. The four outcomes are usually termed a true positive, a false negative, a false positive, and a true negative.
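The four outcomes can be tallied as in the following sketch, which assumes the decisions are summarized as sets of library members reported as matches and members actually present (the function name `tally_outcomes` and the example sets are hypothetical):

```python
def tally_outcomes(detected, actual, library):
    """Count the four decision outcomes over every member of a screening library."""
    detected, actual, library = set(detected), set(actual), set(library)
    return {
        "TP": len(detected & actual),            # reported and truly present
        "FN": len(actual - detected),            # present but missed
        "FP": len(detected - actual),            # reported but absent
        "TN": len(library - detected - actual),  # correctly not reported
    }


# Hypothetical example using one-letter amino acid codes.
library = set("ACDEFGHIKLMNPQRSTVWY")
actual = set("AGKL")
detected = set("AGKM")
counts = tally_outcomes(detected, actual, library)
```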
  • Step 210 can be performed using a system for identifying one or more components of a composition, which system can include an OIT analyzer configured to compare the query OIT to a reference OIT of the screening library and calculate a binary similarity measure or binary distance measure between the compared OITs.
  • the OIT analyzer can be configured to match the query OIT to one or more reference OITs based on the calculated measure.
  • Method 200 includes identifying step 212.
  • Step 212 includes retrieving the chemical identity (e.g., chemical name, chemical structure, or index number (e.g., CAS No.)) of the molecule that corresponds to the stored OITs that matched with the query OIT, or a portion of the query OIT.
  • Step 212 can be performed using a local processor configured to retrieve the chemical identity of the stored OITs from a remote processor.
  • the identifying step includes generating a list providing the chemical identities of any stored OITs that are identical to one or more regions of the query OIT, or of any stored OITs that are similar to one or more regions of the query OIT.
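A sketch of how matching and identification could fit together, assuming the screening library is a mapping from chemical identity to a stored OIT and that two rejection thresholds define the “similar” and “high” confidence tiers. The threshold values, the function names, and the example library are hypothetical, chosen only for illustration:

```python
def hamming_fraction(bits_a, bits_b):
    """Fraction of positions at which two equal-length binary OITs disagree."""
    if len(bits_a) != len(bits_b):
        raise ValueError("OITs must have the same length")
    # For 0/1 values, x != y is the same test as x XOR y.
    return sum(x != y for x, y in zip(bits_a, bits_b)) / len(bits_a)


def screen_library(query, library, similar_max=0.35, high_max=0.15):
    """Return (identity, distance, confidence) for every accepted match, closest first."""
    hits = []
    for identity, stored in library.items():
        distance = hamming_fraction(query, stored)
        if distance <= high_max:
            hits.append((identity, distance, "high"))
        elif distance <= similar_max:
            hits.append((identity, distance, "similar"))
        # Anything above similar_max is rejected.
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical 8-bit library, for illustration only.
library = {
    "compound A": [1, 0, 1, 1, 0, 0, 1, 0],
    "compound B": [1, 0, 0, 1, 0, 1, 1, 0],
    "compound C": [0, 1, 0, 0, 1, 1, 0, 1],
}
query = [1, 0, 1, 1, 0, 0, 1, 0]
hits = screen_library(query, library)
```

Here the identities are plain strings; in practice they could be chemical names, structures, or index numbers such as CAS numbers.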
  • Method 200 can be more accurate and efficient than existing spectral matching methods such as MCR-ALS, principal component analysis, and machine learning, as demonstrated in the Examples below. Method 200 is more efficient than machine learning methods requiring large and comprehensive training data. Moreover, method 200 can be automated for high-throughput analysis and does not require direct matching of spectral data by point-by-point comparison, or matching of identified peaks derived from the spectrum. The need to provide initial estimates of the number of peaks, their band shapes, and the initial parameters of these bands has presented an obstacle to the full automation of peak fitting and its incorporation into fully automated spectral-preprocessing workflows. When used for the identification of incoming materials before entry into a manufacturing process, for example, to detect incorrect formulation, contamination, mislabeled containers, or counterfeit materials, method 200 can improve the efficiency of process workflow compared with other methods of material identification.
  • Method 200 can be advantageous in a variety of fields.
  • Method 200 can be used in a manufacturing, distribution, repackaging, or a healthcare environment (e.g., hospital formulary, gas monitoring during anesthesia, exhaled-gas monitoring, etc.).
  • method 200 can be used for process quality control in polymer, chemical, pharmaceutical, or food manufacturing by facilitating raw material identification (e.g., active pharmaceutical ingredients, excipients, pharmaceutical raw materials, pharmaceutical packaging materials) or testing for product composition and uniformity.
  • Products to be analyzed can include pharmaceutical dosage forms, such as oral dosage forms, injectables, inhalants, intravenous solutions, transdermals, suppositories, ophthalmic preparations.
  • Method 200 can be used for chemical detection at ports of entry, customs facilities, import facilities, mail facilities, and regulatory centers, and for forensic analysis, toxicological evaluation, and detection of environmental pollutants.
  • method 200 can be used to identify a source of a composition or to confirm the labeling of a composition.
  • method 200 can be used to detect and identify adulterants in pharmaceutical or food/beverage samples.
  • Method 200 can be used in diagnostic applications.
  • method 200 can be used to detect and identify a metabolite or mutation (e.g., a point mutation in a peptide) that is associated with a pathological condition.
  • Amino acid (AA) substitutions are directly correlated with specific pathologies such as Alzheimer’s disease, making their rapid screening and detection critical to treatment and scientific study.
  • This example demonstrates a proof-of-concept implementation of the label-free and non-invasive Raman spectroscopy technique for the detection of AA substitutions in primary peptide fragments.
  • OITs optical identification tags
  • the mutation screening strategy can detect a single point missense mutation in an 11-AA peptide fragment of amyloid beta Aβ(25-35) and a frameshift mutation in a 42-AA fragment Aβ(1-42) down to picomolar concentrations.
  • SERS surface-enhanced Raman scattering
  • the present work describes a non-invasive and label-free detection method for analytes based on encoded Raman fingerprint tags or optical identification tags (OIT) and a method of creating a library of OITs for high-throughput mass screening of analyte.
  • Raman spectroscopy is a vibrational technique that has been used to directly measure chemical-specific information, which makes non-destructive and rapid detection possible.
  • the method is demonstrated using peptides.
  • OITs containing Raman fingerprints of the individual amino acids (AAs)
  • these OITs can be amplified using a plasmonic substrate, referred to herein as a SERS-active substrate, based on surface-enhanced Raman scattering effect.
  • a custom SERS sensor was created by depositing SERS-active symmetric gold nanostars on an omniphobic membrane.
  • the OIT screening strategy can detect a single frameshift and missense mutation of a peptide sample at sub-nanomolar concentration.
  • the proposed detection strategy includes two main parts: encoding the SERS-amplified Raman spectra of the screening AAs into an OIT library and decoding the AA composition of the peptide.
  • SERS-active substrate improves the sensitivity of the detection in the case of limited sample availability.
  • the SERS-active substrate was fabricated by depositing highly symmetrical gold nanostars on an omniphobic membrane.
  • the gold nanostars were synthesized from icosahedral symmetrical gold nanoseeds. Subsequently, these stars were coated with a perfluorinated ligand (PF) to prevent the direct conjugation of the gold surface to high-affinity thiol and amine groups in biomolecules.
  • PF perfluorinated ligand
  • SERS-sensing spots were formed by depositing the SERS-active nanostructures onto a liquid-repellent lotus-leaf-inspired membrane fabricated using a slippery liquid-infused porous surface (SLIPS).
  • SLIPS slippery liquid-infused porous surface
  • the SERS-active substrate prevents the “coffee ring” defect during drying and improves the sensitivity of OIT analysis when the available concentrations of library members and targeted peptides are low.
  • the general procedure of substrate fabrication and SERS signal collection is shown in FIG. 2.
  • the proposed OIT screening system was based on the direct conversion of the Raman spectra of each of the twenty AAs into a binary barcode tag (the OIT). Afterward, the OIT can be matched with the barcode equivalent of the Raman fingerprint of a given peptide to determine its composition. It is crucial to note here that this is not a sequencing method; instead, the mutation point was identified by comparing the composition analysis of wild-type and mutant.
  • the procedure for encoding the OIT library of screening AAs and decoding the AA composition of the peptide is provided in FIG. 5.
  • First, Raman spectra of the twenty AAs were collected using a 473 nm laser.
  • the OITs were subsequently generated by subjecting the spectra to autonomous processing steps as demonstrated in FIGS. 2a-i. After each conversion process, the spectra were reduced to a series of binary digits by labeling wavenumbers with positive intensities as 1 (black) and other wavenumbers as 0 (white). The final binary pattern was referred to as the OIT, in which each bar position and width aligned well with the encoded peaks in the original Raman spectra. Using this process, a unique OIT for each AA was collected to build the screening library, and the binary barcodes of six Alzheimer Aβ peptides were also collected for comparison.
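The binarization step can be illustrated with the sketch below, which assumes the spectrum has already been processed so that only retained peaks carry positive values. The function names `to_oit` and `bar_ranges_of`, and the example wavenumbers and intensities, are hypothetical:

```python
def to_oit(processed_intensities):
    """Label positions with positive intensity as 1 (black) and all others as 0 (white)."""
    return [1 if value > 0 else 0 for value in processed_intensities]


def bar_ranges_of(oit, wavenumbers):
    """Return (first, last) wavenumber of each run of consecutive 1s, i.e. each bar."""
    ranges, start = [], None
    for index, bit in enumerate(oit):
        if bit and start is None:
            start = index                        # a new bar begins
        elif not bit and start is not None:
            ranges.append((wavenumbers[start], wavenumbers[index - 1]))
            start = None                         # the bar has ended
    if start is not None:                        # bar running to the last point
        ranges.append((wavenumbers[start], wavenumbers[-1]))
    return ranges


# Hypothetical processed spectrum on a 1 cm^-1 grid, for illustration only.
wavenumbers = [1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007]
processed = [0.0, 0.2, 0.9, 0.0, -0.1, 0.0, 0.4, 0.3]
oit = to_oit(processed)
bar_ranges = bar_ranges_of(oit, wavenumbers)
```

Each tuple in `bar_ranges` gives the position and extent of one bar, which is how the bar width tracks the width of the encoded peak.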
  • the correlation between the molecular vibrational spectrum and its structure allows the structural components to be derived from its spectrum or its OIT equivalent.
  • a matching function (“Hamming distance”)
  • the OIT of the peptide can be screened through the constructed library containing the OIT-encoded structural information of all the AAs to determine its composition.
  • the AA composition detection process was also optimized using different wavenumber ranges of the Raman spectrum and different excitation laser sources at 473 nm, 633 nm, and 785 nm.
  • the optimization process is summarized in Table 1, which shows that the highest performance of the encoding/decoding process was obtained when OITs were converted from the extended range (200-4000 cm⁻¹) of the Raman spectra collected at the 473 nm excitation laser wavelength.
  • the optical identification tag (OIT) technique can be used to convert a Raman fingerprint of an analyte into its own identification label, which enables tagging and tracking of the analyte using its unique OIT.
  • this approach facilitates encoding a screening library of OITs, in which each tag represents a corresponding library member.
  • the correlation between the vibrational Raman spectrum (barcode tag) and the molecular structure, together with the multiplex capabilities, permits composition analysis based on shared features of the codes.
  • this strategy demonstrates detection of a mutation point in a peptide fragment by directly generating an OIT screening library from the Raman fingerprints of the amino acids and deriving the amino acid composition of peptide chains. A mutation point can then be identified by comparing the composition analysis results of wild-type and mutant peptides.
  • the example provides a robust encoding method that generates an OIT screening library directly from the Raman fingerprints of twenty proteinogenic AAs and derives the AA composition of Aβ peptide fragments, specifically the Aβ(15-20), Aβ(25-35), and Aβ(1-42) peptides that are heavily studied in Alzheimer’s research. A mutation point can then be detected by comparing the AA composition of the mutant and wild-type peptides.
  • a sensitive SERS-active substrate composed of gold nanostars can be provided (FIG. 5) to amplify detection of mutant protein at picomolar concentrations based on the plasmonic-field Raman enhancement effect.
  • An additional advantage of this substrate design is achieved by incorporating an omniphobic platform that prevents non-uniform enhancement of Raman spectra due to the “coffee ring” drying defect and allows a variety of carrier solvents to be used.
  • the SERS-active substrate was fabricated by depositing gold nanostars on an omniphobic membrane. Nanostars were selected because of the higher magnitude of Raman enhancement due to their anisotropic geometry compared to smooth, spherical nanostructures. After a seed-mediated growth step from the icosahedral gold seeds according to routine methods, the stars were coated with perfluorinated ligand (PF) to prevent the direct conjugation of the gold surface to high-affinity thiol and amine groups in biomolecules (FIG. 5a). Characterization data of the seed and star nanoparticles including UV-visible extinction spectra, transmission electron microscopy (TEM) image and scanning electron microscopy (SEM) image were provided in FIGS. 6a-d.
  • TEM transmission electron microscopy
  • SEM scanning electron microscopy
  • SERS-sensing spots were formed by depositing the SERS-active nanostructures onto a liquid-repellent lotus-leaf-inspired membrane fabricated using a slippery liquid-infused porous surface (SLIPS) (FIG. 5b). Before each Raman experiment, the analyte solution was deposited on each spot (FIG. 5c). An optical image of the substrate and an SEM image of a single SERS-spot (about 150 µm in diameter) are shown in FIGS. 7a-c. To evaluate the substrate uniformity, the Raman spectrum of Aβ(15-20) at a concentration of 10⁻⁵ M was taken from eight SERS-spots (FIG. 9a).
  • the intensity of the aromatic vibrational peak at 1003 cm⁻¹ was used to confirm the reproducibility of the spectrum, which showed an average value of 122 counts per second (cps) with a standard deviation of 16% (FIG. 9b).
  • the SERS-active substrate improves the sensitivity of the OIT analysis when the available concentrations of the library members and targeted peptides are limited.
  • the OIT screening system was based on the direct conversion of the Raman spectra of each of twenty AAs and peptides into binary barcode tags (i.e., the OITs) (FIGS. 3a-b).
  • Raman spectra of the samples were collected at 473 nm laser wavelength.
  • the OITs were subsequently generated by subjecting the spectra to autonomous processing steps as fully detailed in the data analysis in the supporting information and the supplementary file. A demonstration of the encoding process is shown in FIGS. 2a-i for one library member, cysteine (C), as an example. Analysis results for all samples in this example are also provided in FIGS. 15-17.
  • the OIT library collected at 473 nm was used to analyze the composition of three Aβ peptides including Aβ(15-20), Aβ(25-35), and the entire sequence Aβ(1-42) with 6, 11, and 42 AA residues, respectively.
  • the correlation between the molecular vibrational spectrum and its structure allows the structural components to be derived from its spectrum or its OIT equivalent.
  • the OIT of the peptide can be screened through the constructed library containing the OIT-encoded structural information of all the AAs to determine its composition (FIG. 3c). Details of the screening procedure are provided in the data analysis in the supporting information and the supplementary file.
  • the screening process resulted in true positives/negatives when AAs were correctly detected as present/not-present and false positives/negatives when AAs were mis-assigned (FIG. 3d). All AA components of Aβ(15-20) were detected with one false negative value for tyrosine (Y). For Aβ(25-35), one false negative proline (P) and one false positive glycine (G) were misidentified. For the long peptide Aβ(1-42), only glycine (G) and methionine (M) were incorrectly identified as being present. Since glycine (G) is the smallest among all AAs and has no side chain, its signature could be overwhelmed by other components. Nonetheless, the OIT approach can detect the vast majority of AAs present in each peptide chain, confirming the baseline sensitivity of the screening system.
  • the procedure correctly recognized the C-terminal insertion of threonine (T) [Aβ(1-43)] at a concentration of 1 nmol using the SERS-active substrate, but there was a negligible improvement in detection compared to non-enhanced Raman analysis due to the increase in the sequence length and the complexity of the structure. Still, the developed OIT screening strategy was capable of extracting AA compositions and individual mutations at the sub-µmol level.
  • the LW region of the OIT could not be used alone for composition analysis of peptides containing both aromatic and non-aromatic AAs.
  • the ability of aromatic signals to overwhelm other signals in a spectrum has been reported in previous Raman study of peptides, indicating the necessity of including additional spectral indicators.
  • the AA composition results of the mutant and wild-type peptides were compared to detect the mutation point.
  • the sequence information of the wild type and mutant peptide is provided in Table 3 and the result of the mutation point detection is provided in Table 4.
  • Table 4 demonstrates that the OIT screening strategy can be implemented under normal Raman conditions without the SERS-active substrate and delivers a sensitivity at submicromolar concentrations.
  • MCR-ALS multiple component regression with alternating least square constraints
  • composition analysis using OITs has several advantages over other methods, including ease of library preparation and straightforward screening procedure. These advantages are particularly critical when compared with other analysis methods in terms of the cost of acquiring training data and complicated result interpretation.
  • Triethylene glycol (TEG), gold(III) chloride trihydrate (HAuCl4·3H2O), polyvinylpyrrolidone (PVP, average MW 55,000), distilled ultrapure water (DI), octylamine, hydrochloric acid (HCl) 37%, 1H,1H,2H,2H-Perfluorodecanethiol (PF), and N,N-dimethylformamide (DMF) were acquired from Sigma-Aldrich and used without further purification.
  • Peptides were purchased from Sigma-Aldrich and Abcam.
  • CaF2 Raman-grade slides were purchased from Crystran.
  • GPL-100 solution perfluoropolyether (PFPE)
  • TEFLON polytetrafluoroethylene (PTFE) membranes were obtained from Thermo Fisher Scientific.
  • The Teflon membrane was spin-coated with GPL-100 solution at 500 rpm for 1 minute, after which concentrated gold nanostars (2 µL) were deposited for each SERS-spot.
  • Gold nanostars and seeds were imaged with transmission electron microscopy (Tecnai G2 20 TWIN, FEI) after casting on a copper grid. Scanning electron microscopy (Nova 600 NanoLab, FEI) was used to image the nanostars deposited on ITO glass as well as the final SERS-active substrate. The UV-visible absorption spectra of the seed and star nanoparticles were measured using a spectrophotometer (Cary 6000, Agilent).
  • the lower limit of the similarity score used to determine the AA composition of the reference peptide was set at the value giving the highest MCC. The AA composition of the mutated peptide was derived using the limit value calculated for the reference peptide.
  • The Matthews correlation coefficient (MCC) was used to calculate the correlation between the true positive/negative (TP/TN) and false positive/negative (FP/FN) values and the actual composition.
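The threshold selection described above can be sketched as follows. The MCC expression is the standard definition; the search over candidate limits, the function names, and the example scores are assumptions made for illustration and are not taken from the disclosure:

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator vanishes."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


def best_lower_limit(scores, actual):
    """Pick the similarity-score lower limit that maximizes MCC for a reference peptide.

    scores: mapping of library member -> similarity score against the peptide OIT.
    actual: set of library members truly present in the reference peptide.
    """
    best = None
    for limit in sorted(set(scores.values())):
        detected = {member for member, score in scores.items() if score >= limit}
        tp = len(detected & actual)
        fp = len(detected - actual)
        fn = len(actual - detected)
        tn = len(scores) - tp - fp - fn
        value = mcc(tp, tn, fp, fn)
        if best is None or value > best[1]:
            best = (limit, value)
    return best


# Hypothetical similarity scores for five library members, for illustration only.
scores = {"A": 0.9, "G": 0.8, "K": 0.4, "L": 0.7, "M": 0.3}
actual = {"A", "G", "L"}
best = best_lower_limit(scores, actual)
```

The limit found for the reference (wild-type) peptide would then be reused when deriving the composition of the mutated peptide.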
  • MCR-ALS analysis
  • Normalized Raman spectra of AAs and peptides were used as input for Multivariate Curve Resolution-Alternating Least Squares analysis using the MCR-ALS 2.0 toolbox. All spectra were x-waveshift calibrated and relative-intensity corrected.

Landscapes

  • Health & Medical Sciences (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating, Analyzing Materials By Fluorescence Or Luminescence (AREA)

Abstract

Embodiments of the present disclosure feature methods and systems for chemical analysis using Raman spectroscopy, including methods for analyte identification using a screening library of binary optical identification tags (OITs) containing encoded Raman spectra, and methods and systems for encoding a Raman spectrum of a molecule or composition in an OIT for screening library construction and sample analysis.

Description

ENCODING RAMAN SPECTRAL DATA IN OPTICAL IDENTIFICATION TAGS FOR ANALYTE IDENTIFICATION
SEQUENCE LISTING
[0001] The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on May 29, 2020, is named 4053_236USP2_SL.txt and is 2,305 bytes in size.
BACKGROUND
[0002] Amino acid (AA) substitutions are directly correlated with specific pathologies such as Alzheimer’s disease, making their rapid screening and detection critical to treatment and scientific study. Established peptide composition analysis techniques, i.e., Edman degradation and mass spectrometry, are time-consuming, low-throughput, and require complete sample destruction. Other vibrational techniques, such as Fourier transform infrared (FTIR) spectroscopy, are not suitable for bio-applications due to the intense absorption of water in the infrared region. Previous work identified AA components through indirect detection by chemical or physical conjugation of known reporters such as dyes or strong Raman scatterers. While appearing promising, the labeling step requires preparation of compatible tags and is prone to artifacts during the multi-step synthesis procedure. There is a need for more cost-effective methods providing more straightforward analysis.
[0003] Raman spectroscopy is a fast, powerful, and easy-to-use analytical technique. Detailed fingerprints in combination with non-destructivity and minimal sample preparation have allowed the construction of reference libraries of Raman spectra in a variety of research fields. Since libraries often contain many different and/or highly similar spectra, it is important that each data point in all the spectra corresponds to the exact Raman wavenumber. Spectral resolution and relative intensities (due to resonance effects) are known to affect compatibility between spectral libraries. Automated data analysis requires a large library of spectra. Calibration transfer schemes for distributing reference data or methods between different instruments are needed to merge spectral data. Such schemes would permit spectra collected from different instruments to be merged in a large reference library.
[0004] The analysis of Raman imaging data subsequent to its collection is essential for the ability of this technique to provide spatially resolved chemical and molecular information. A variety of multivariate methods have been used to analyze Raman imaging data, including multivariate curve resolution — alternating least squares (MCR-ALS), principal component analysis (PCA), cluster analysis, neural networks, partial least squares (PLS), and direct classical least squares (DCLS). The accuracy of these models is substantially dependent on large volumes of training data, which can be challenging for some applications. Improved methods of automated data analysis are needed to enhance the accuracy and efficiency of Raman-based screening technologies.
SUMMARY
[0005] In general, embodiments of the present disclosure feature methods, materials, and systems for chemical analysis of compositions based on the spectral fingerprint of Raman spectral data, which has been encoded in an optical identification tag (OIT). Embodiments described in the present disclosure include methods and systems for encoding a Raman spectrum in an OIT for library construction and sample analysis and methods and systems for analyte identification based on the similarity of a sample OIT to one or more reference OITs of a screening library. The methods and systems of the present disclosure permit automated OIT library construction and search to encode Raman spectra from any sample, including samples at very low concentration (e.g., picomolar) with greater efficiency and reliability than methods requiring complex algorithms or machine learning. The binary format of an OIT enables efficient, specific, and selective search of an OIT library using a binary matching process with a screening library of candidate OITs, whereby the goodness of a match between two OITs is efficiently evaluated using a bit-based similarity or distance measure, such as the binary distance metric known as the Hamming distance.
[0006] Accordingly, in a first aspect, embodiments of the present disclosure feature a method for identifying one or more components of a composition, the method including: (a) receiving a Raman spectrum of a composition having one or more analytes of unknown chemical identity; (b) converting the Raman spectrum of the composition to an optical identification tag (OIT) which includes or is a barcode, wherein converting includes correcting the spectrum and processing the corrected spectrum into the barcode by performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks and the position and width of each bar aligns to a characteristic spectral peak; (c) comparing the OIT of the converted spectrum to a reference OIT of a screening library and calculating a binary similarity measure or a binary distance measure between the compared OITs, wherein the screening library comprises a plurality of reference OITs of components of known chemical identity; and (d) matching the OIT of the converted spectrum and one or more reference OITs based on the calculated distance measure being less than a threshold value or the calculated similarity measure being greater than a threshold value; and thereby identifying one or more components present in the composition. The received Raman spectrum can be collected using an excitation wavelength selected from the group consisting of 473-, 532-, 633-, and 785-nm. The range of wavenumbers of the Raman spectrum used in the converting step can be selected from the group consisting of 200-4000 cm⁻¹, 200-1800 cm⁻¹, and 2500-4000 cm⁻¹, or a combination thereof.
Correcting the spectrum can include: (a) calibrating the x-axis of the spectrum; (b) transforming the spectral resolution of the calibrated spectrum to the spectral resolution of the OITs of the screening library; (c) smoothing the transformed spectrum; (d) subtracting the background of the smoothed spectrum; and (e) correcting the relative intensity of the background corrected spectrum. Processing the corrected spectrum can include: (a) normalizing the corrected spectrum to a maximum intensity of 1; (b) performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks by analyzing the vibrational strength of peaks of the normalized spectrum to eliminate weak vibrational peaks; (c) refining the remaining peaks using second order derivatives; and (d) converting the positive values to the bars of the barcode. The method can further include creating the screening barcode library of reference OITs by: (a) receiving a Raman spectrum of one or more reference molecules; and (b) converting the Raman spectrum of each reference molecule to a reference OIT, wherein the reference OIT comprises a barcode, wherein the position and width of each bar aligns to a peak of the Raman spectrum of the reference molecule. The Raman spectrum can be a Surface-Enhanced Raman Scattering (SERS) spectrum obtained from a sample immobilized on a SERS-active substrate.
The SERS-active substrate can be solvophobic or omniphobic. The converting step can include encoding the Raman spectrum by: (a) calibrating the x-axis of the spectrum; (b) transforming the spectral resolution of the calibrated spectrum to the spectral resolution of the OITs of the screening library; (c) smoothing the transformed spectrum; (d) subtracting the background of the smoothed spectrum; (e) correcting the relative intensity of the background corrected spectrum; (f) normalizing the corrected spectrum to a maximum intensity of 1; (g) performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks by analyzing the vibrational strength of peaks of the normalized spectrum to eliminate weak vibrational peaks; (h) refining the remaining peaks using second order derivatives; and (i) converting the positive values to the bars of the barcode. The one or more analytes can be selected from the group consisting of amino acids of a polypeptide, an active pharmaceutical ingredient or excipient of a pharmaceutical dosage form, and an adulterant in a pharmaceutical composition, food, or beverage.
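The following is a simplified sketch of part of the encoding sequence, covering only steps (c), (d), (f), (g), (h), and (i). It is not the disclosed implementation: a moving average stands in for smoothing, a constant offset stands in for background subtraction, a fixed intensity cutoff stands in for the curve-fitting step that eliminates weak peaks, and the sign of a finite-difference second derivative stands in for the refinement step. Calibration, resolution transformation, and relative-intensity correction are omitted. All names, parameter values, and data are hypothetical.

```python
def moving_average(values, window=3):
    """Smooth with a centered moving average, shrinking the window at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def encode_spectrum(intensities, weak_cutoff=0.2, window=3):
    """Return a binary tag (list of 0/1) for an evenly sampled spectrum."""
    smoothed = moving_average(intensities, window)       # (c) smoothing
    floor = min(smoothed)
    corrected = [v - floor for v in smoothed]            # (d) crude constant background
    peak = max(corrected)
    if peak == 0:
        return [0] * len(intensities)                    # flat spectrum, no bars
    normalized = [v / peak for v in corrected]           # (f) maximum intensity of 1
    bits = [0] * len(normalized)
    for i in range(1, len(normalized) - 1):
        # Finite-difference second derivative; it is negative across a peak.
        curvature = normalized[i - 1] - 2 * normalized[i] + normalized[i + 1]
        strong = normalized[i] >= weak_cutoff            # (g) drop weak peaks
        if strong and -curvature > 0:                    # (h), (i) keep positive values
            bits[i] = 1
    return bits


# Hypothetical raw counts: one strong peak near index 4 and one weak bump near index 9.
raw = [10, 10, 10, 12, 20, 12, 10, 10, 10, 11, 10, 10]
oit = encode_spectrum(raw)
```

In this example the strong peak yields a single three-point bar and the weak bump is discarded. The same function would need to be applied, with identical settings, to both the library spectra and the query spectrum so that the resulting tags are comparable.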
[0007] In another aspect, embodiments of the present disclosure feature a system for identifying one or more components of a composition, the system including: (a) a Raman spectrometer for collecting a Raman spectrum of a composition comprising one or more components, wherein the one or more components are unlabeled analytes; (b) an optical identification tag (OIT) converter configured to receive and convert the Raman spectrum to an OIT comprising a barcode, wherein the configuration includes instructions for correcting the spectrum and processing the corrected spectrum into the barcode by performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks, and wherein the position and width of each bar aligns to a characteristic spectral peak, and a processor to execute the instructions; and (c) an OIT analyzer configured to: (i) compare the OIT of the converted spectrum to a reference OIT of a screening library wherein the screening library comprises a plurality of reference OITs of components of known chemical identity; (ii) calculate a binary similarity measure or binary distance measure for the comparison between the compared OITs; and (iii) match the OIT of the converted spectrum and one or more reference OITs based on the calculated distance measure being less than a threshold value or the calculated similarity measure being greater than a threshold value; and thereby identify one or more components present in the composition. 
Correcting the spectrum can include (a) calibrating the x-axis of the spectrum; (b) transforming the spectral resolution of the calibrated spectrum to the spectral resolution of the OITs of the screening library; (c) smoothing the transformed spectrum; and (d) subtracting the background of the smoothed spectrum; and wherein processing the corrected spectrum includes: (e) correcting the relative intensity of the background corrected spectrum; (f) normalizing the corrected spectrum to a maximum intensity of 1; (g) performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks by analyzing the vibrational strength of peaks of the normalized spectrum to eliminate weak vibrational peaks; (h) refining the remaining peaks using second order derivatives; and (i) displaying the positive values as bars of the barcode. The Raman spectrometer can be configured for Surface-Enhanced Raman Scattering (SERS). The system can further include a SERS-active substrate comprising SERS-active particles immobilized on a solvophobic or omniphobic solid support. The SERS-active particles can include silver, gold, copper, aluminum, platinum, palladium, or a combination thereof. The solvophobic or omniphobic solid support can include a perfluoropolyether coating. The solvophobic or omniphobic solid support can include a material selected from the group consisting of silicon, polymers, glass, silicon nitride, quartz, ceramics, sapphire, and metals, or a combination thereof. The Raman spectrometer can include a laser source having an excitation wavelength in the range of about 400 nm to about 1000 nm. The range of wavenumbers of the Raman spectrum receivable by the OIT converter can be selected from the group consisting of 200-4000 cm⁻¹, 200-1800 cm⁻¹, and 2500-4000 cm⁻¹.
[0008] In another aspect, embodiments of the present disclosure feature a method for identifying one or more components of a composition, wherein at least one of the components is present at a submicromolar concentration, the method comprising: (a) receiving a Raman spectrum of a composition comprising one or more components of unknown chemical identity, wherein the Raman spectrum is a SERS-active Raman spectrum obtained by depositing a solution of the composition on a SERS-active substrate; (b) converting the Raman spectrum of the composition to an optical identification tag (OIT) comprising a barcode, wherein converting includes correcting the spectrum and processing the corrected spectrum into the barcode by performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks, wherein the position and width of each bar aligns to a characteristic spectral peak; (c) comparing the OIT of the converted spectrum to a reference OIT of a screening library and calculating a binary similarity measure or a binary distance measure between the compared OITs, wherein the screening library comprises a plurality of reference OITs of components of known chemical identity; and (d) matching the OIT of the converted spectrum and one or more reference OITs based on the calculated distance measure being less than a threshold value or the calculated similarity measure being greater than a threshold value; and thereby identifying one or more components present in the composition.
[0009] Another embodiment of the present disclosure is a Surface-Enhanced Raman Scattering (SERS)-active substrate for collecting a Raman spectrum of an analyte present in a submicromolar concentration. The SERS-active substrate comprises particles of a SERS-active (i.e., plasmonic) material immobilized on a solid support to form a composite. The particles can be immobilized in an array of discrete spots. The discrete spots can have a diameter of about 150 to 250 µm. The immobilized particles can include nanoparticles having an average diameter in the range of about 10-100 nm. The immobilized nanoparticles can be nanocubes or nanostars. The particles can include silver, gold, copper, aluminum, platinum, palladium, or a combination thereof. The solid support can have a solvophobic or omniphobic surface. The solid support can comprise silicon, polymers, glass, silicon nitride, quartz, ceramics, sapphire, metals, or a combination thereof. The SERS-active substrate can be coated with a fluorinated or perfluorinated material. The particles can be coated with an omniphobic material. The SERS-active substrate can include a solid support comprising a perfluoropolyether (PFPE) fluid (a proprietary composition sold under the trade name GPL-100) membrane (i.e., a PFPE-coated polymer membrane) having 1H,1H,2H,2H-perfluorodecanethiol-coated nanostars immobilized thereon. 1H,1H,2H,2H-Perfluorodecanethiol-coated nanostars are referred to as nanostar@PF in the Examples below.
[0010] The details of one or more examples are set forth in the description below. Other features, objects, and advantages will be apparent from the description and from the claims.
BRIEF DESCRIPTION OF DRAWINGS
[0011] This written disclosure describes illustrative embodiments that are non-limiting and non-exhaustive. In the drawings, which are not necessarily drawn to scale, like numerals describe substantially similar components throughout the several views. Like numerals having different letter suffixes represent different instances of substantially similar components. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed in the present document. The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
[0012] Reference is made to illustrative embodiments that are depicted in the figures, in which:
[0013] FIGS. 1a-c describe methods, according to one or more embodiments of the present disclosure: (a) describes a method of converting a Raman spectrum (i.e., spectral data) to an OIT, according to one or more embodiments of method 100; (b) shows an embodiment of a Raman spectrum (or SERS) processing scheme; and (c) describes method 200 for identifying one or more components of a composition.
[0014] FIGS. 2a-i describe a detailed spectrum processing procedure for cysteine (Cys), according to one or more embodiments of the present disclosure: (a) shows a raw Raman spectrum and the x-waveshift spectrum after calibration with a standard reference; (b) shows a compatible spectrum with even spacing; (c) shows a compatible spectrum smoothed using a Savitzky-Golay filter of first order with a 5-point window; (d) shows background subtraction using asymmetric least squares smoothing with a smoothing factor of 6 and an asymmetric factor of 0.001; (e) shows a relative intensity correction (ISC) based on laser source and grating with a reference halogen-tungsten source; (f) shows a corrected spectrum normalized between 0 and a maximum intensity of 1; (g) shows peak fitting (analysis) using a Gaussian-Lorentzian formula; (h) shows feature refinement using second-order derivatives; and (i) shows conversion of final peaks to a barcode Optical Identification Tag (OIT).
[0015] FIGS. 3a-d show an AA component analysis scheme for peptides, according to one or more embodiments of the present disclosure: (a) shows conversion from the Raman spectrum collected at 473 nm laser wavelength to the OIT screening library of 20 AAs (See also FIG. 15); (b) shows conversion from the collected Raman spectrum to the OIT of peptides; (c) demonstrates the AA screening process using a distance metric for peptide Aβ(15-20) with a sequence of QKLVFF (SEQ ID NO: 1) and the aromatic component phenylalanine (F); (d) shows AA composition results for peptides with chain length of 6 AAs Aβ(15-20), 11 AAs Aβ(25-35), and the full sequence Aβ(1-42).
[0016] FIGS. 4a-b show that the OIT according to one or more embodiments of the present disclosure accurately identified all non-aromatic AAs in Aβ(15-20): (a) shows the identification of AA composition of the peptide Aβ(15-20) at 473 nm using the 200-1700 cm-1 low wavenumber (LW) region and the 2500-4000 cm-1 high wavenumber (HW) region; and (b) shows the normalized intensity of Aβ(15-20) and the AA components at 473 nm, 633 nm and 785 nm visualized in an intensity map from 0 to 1 corresponding to the peak intensity, according to one or more embodiments of the present disclosure.
[0017] FIG. 5 is a flowchart of a Surface-enhanced Raman Scattering (SERS) spectra acquisition scheme, according to one or more embodiments of the present disclosure.
[0018] FIG. 6 is a fabrication scheme for a SERS-active substrate, according to one or more embodiments of the present disclosure. Step a) includes the plasmonic nanostructure synthesis procedure: icosahedral seed is grown to a nanostar, which is treated with a perfluorinated compound to form nanostar@PF; Step b) includes spin coating a Teflon membrane with lubricating liquid (SLIP), and then depositing nanostars@PF on SLIP-coated surface; c) analyte is deposited on a SERS-active spot prior to analysis.
[0019] FIGS. 7a-d provide a characterization of nanoparticles synthesized according to one or more embodiments of the present disclosure: (a) shows a Transmission Electron Microscopy (TEM) image of Au seeds; (b) shows a TEM image of Au nanostars; (c) shows a Scanning Electron Microscopy (SEM) image of the nanostars; (d) shows the UV-visible extinction spectra of the Au seed and star nanoparticles.
[0020] FIGS. 8a-c provide a characterization of a SERS-active substrate according to one or more embodiments of the present disclosure: (a) shows an optical image of an as-deposited nanostar solution on a substrate; (b) shows an optical image of a single SERS-spot formed after drying of the nanostar solution; and (c) shows a SEM image of the SERS-spot.
[0021] FIGS. 9a-b describe the reproducibility of the Raman signal on different SERS-spots on a SERS-active substrate according to one or more embodiments of the present disclosure: (a) shows the Raman spectra of Aβ(15-20) at a concentration of 10-5 M deposited on 8 different SERS-spots; and (b) shows the intensity of the aromatic vibration peak (1003 cm-1) in spectra of Aβ(15-20) at a concentration of 10-5 M deposited on 8 different SERS-spots.
[0022] FIG. 10 shows a comparison between non-enhanced Raman spectra collected on CaF2 and Raman spectra collected on a SERS-spot for mutated peptides: K28A-Aβ(25-35) (2 x 10-7 mole on SERS-spot and 1.8 x 10-12 mole on CaF2), M35G-Aβ(25-35) (2.09 x 10-7 mole on SERS-spot and 2.3 x 10-12 mole on CaF2) and Aβ(1-43) (1.08 x 10-8 mole on SERS-spot and 1 x 10-9 mole on CaF2), according to one or more embodiments of the present disclosure (SERS spectra shown in red and non-enhanced Raman spectra shown in black).
[0023] FIGS. 11a-b show OITs of twenty AAs and five peptide samples according to one or more embodiments of the present disclosure, using (a) 633 nm; and (b) 785 nm excitation.
[0024] FIGS. 12a-b show AA composition results using OITs with the full Raman spectral range (200-4000 cm-1) at different laser wavelengths for the peptides with chain length of 6 AAs Aβ(15-20), 11 AAs Aβ(25-35), and the full sequence Aβ(1-42), according to one or more embodiments of the present disclosure, using (a) 633 nm; and (b) 785 nm excitation.
[0025] FIGS. 13a-b show AA composition results of Aβ(15-20) using OITs at different laser wavelengths in the LW range (200-1700 cm-1) and the HW range (2500-4000 cm-1), according to one or more embodiments of the present disclosure, using (a) 633 nm; and (b) 785 nm excitation.
[0026] FIGS. 14a-c show AA composition results using the Multivariate Curve Resolution Alternating Least Squares (MCR-ALS) method with Raman spectra obtained at different laser wavelengths for peptide chains Aβ(15-20), Aβ(25-35), and the full sequence Aβ(1-42), for comparison with one or more embodiments of the present disclosure, using (a) 473 nm; (b) 633 nm; and (c) 785 nm excitation.
[0027] FIG. 15 shows the Raman spectra (left) and OITs (right) of 20 AAs and peptide samples at 473 nm laser wavelength, according to one or more embodiments of the present disclosure.
[0028] FIG. 16 shows the Raman spectra (left) and OITs (right) of 20 AAs and peptide samples at 633 nm laser wavelength, according to one or more embodiments of the present disclosure.
[0029] FIG. 17 shows the Raman spectra (left) and OITs (right) of 20 AAs and peptide samples at 785 nm laser wavelength, according to one or more embodiments of the present disclosure.
DETAILED DESCRIPTION
[0030] Embodiments of the present disclosure feature methods and systems for chemical composition analysis using Raman spectroscopy, including methods for analyte identification using a screening library of binary optical identification tags (OITs) containing encoded Raman spectra, and methods and systems for encoding a Raman spectrum in an OIT for library construction and sample analysis. The binary format of the OIT enables efficient, specific, and selective search of an OIT library that can be used to identify an unidentified chemical compound or unidentified components of a composition. The methods and systems described in the present disclosure permit automated OIT library construction and search with greater efficiency and reliability than methods requiring complex algorithms or machine learning.
Definitions
[0031] The terms recited below have been defined as described below. All other terms and phrases in this disclosure shall be construed according to their ordinary meaning as understood by one of skill in the art.
[0032] As used herein, “analyte” or “target” (or “targeted molecule”) refers to one or more molecules in the sample to be analyzed. The term “molecule” generally refers to a chemical composed of two or more atoms and includes macromolecules, molecules or polymers described herein. A molecule of interest to be analyzed can be any organic or inorganic molecule, or a combination thereof. Some non-limiting examples of analytes include small molecules such as active pharmaceutical ingredients, pharmaceutical excipients, nutraceutical agents, agricultural agents, formulation aids, hydrocarbons, biomolecules such as biologically active small molecules, nucleic acids and their sequences, peptides, and polypeptides. The analyte can be a molecule found directly in a sample such as a chemical composition, a manufactured food or agricultural product, a pharmaceutical composition, an environmental sample (e.g., a wastewater sample), a biological sample (e.g., vegetative material) or a body fluid. The analyte can be present in the sample to be measured in the solid, liquid, gas, or vapor phase. By “gas or vapor phase analyte” is meant a molecule or compound that is present, for example, in the top of the liquid, in the surrounding air, in the breath sample, in the gas, or a combination of any of the foregoing. The physical state of the gas or vapor phase can be varied by pressure, temperature and by the presence or addition of a salt which affects the surface tension of the liquid. The sample can be pre-treated to make the analyte easier to detect, e.g., by depositing on a Raman-active surface.
[0033] As used herein with respect to a Surface-Enhanced Raman Scattering (SERS)-substrate, an “array” is a group of intentionally positioned particles, which particles can be prepared synthetically and subsequently systematically arranged. For example, plasmonically-active nanoparticles can be grown in a solution and deposited as concentrated spots in an array of rows and columns.
[0034] The terms “solid support”, “support” and “substrate” refer interchangeably to a material or group of materials having a rigid or semi-rigid surface. In some embodiments of the present disclosure, at least one surface of the solid support is substantially flat. In some cases, a solid support of the present disclosure can have the form of beads.
[0035] As used herein, “omniphobic” refers to a surface characterized by display of contact angles greater than 150° and low contact angle hysteresis with both polar and nonpolar liquids possessing a wide range of surface tensions. An omniphobic surface of the present disclosure can support a composite interface with substantially all known liquids (i.e., essentially all known liquids).
[0036] As used herein, “optical identification tag” or “OIT” refers to a binary format, analogous to a barcode, encoding Raman spectral data collected from a substance, composition, or material. The terms “optical identification tag”, “OIT”, and “barcode” are used interchangeably in the present disclosure. The OIT is distinct from a “probe” or “probe molecule” that binds to a target molecule for target analysis, such as a nanoparticle probe labeled with an oligonucleotide and/or Raman-active dye, which are sometimes referred to as “Raman tags” or “labels” in other publications. The OITs of the present disclosure can be converted from Raman spectra without the need for modifying the analyte by chemical tagging or labeling or by using probes.
[0030] As used herein, “OIT converter” refers to a system or computer program product configured to convert Raman spectral data, received or collected, to an OIT. The OIT converter can include hardware, software (including firmware, resident software, micro-code, etc.) or a combination of software and hardware that may be referred to herein as a “processor,” “device,” or “system.” The OIT converter can be in the form of a computer program product embodied in one or more computer readable mediums having computer readable program code embodied thereon. Any combination of one or more computer readable mediums may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium can include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to CDs, DVDs, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Computer program code for carrying out operations for aspects of the OIT converter may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user’s computer, partly on the user’s computer, as a stand-alone software package, partly on the user’s computer and partly on a remote computer or entirely on the remote computer or server. The remote computer may be connected to the user’s computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
[0037] As used herein, “nanostar” refers to a type of nanoparticle characterized by a spherical core with multiple protruding sharp tips, resembling a star.
[0038] The term “nanoparticle” as used herein refers to a particle that exhibits one or more properties not normally associated with a corresponding bulk material (e.g., quantum optical effects, etc.) and having at least two dimensions that do not exceed 1000 nm. For example, a nanoparticle can be a solid particle ranging from less than 1 nm to 1,000 nm in length, width, or diameter.
[0039] General embodiments of the present disclosure feature a method of generating an OIT for a sample chemical substance, compound, or composition comprising receiving a Raman spectrum of the sample; correcting the Raman spectrum; and processing the corrected spectrum to generate an OIT. The method can be computer implemented for automatic OIT generation. The method is optionally repeated for a plurality of samples of chemical substances, compounds, or compositions to build a library containing a sufficient number of reference OITs to perform a chemical composition analysis for the identification of an unidentified analyte based on the similarity of the analyte’s OIT to one or more reference OITs. In one or more embodiments, the Raman spectrum is a surface-enhanced Raman spectroscopy (SERS) spectrum. The sample can be an inorganic or organic molecule, including organic molecules selected from the group consisting of active pharmaceutical compounds (API), excipients, hydrocarbons, polymers, and biological material. In some cases, the sample is a heterogeneous mixture of chemical compounds. The identification can be performed without multivariate analysis. The received Raman spectrum can be the full Raman spectrum or a portion of the Raman spectrum. The spectral range may be about 800 to about 1800 cm-1, for example, which includes heavy atom stretching as well as X-H (X: heavy atom with atomic number > 12) deformation modes. In some cases, multiple OITs can be generated from a single spectrum, using different spectral ranges.
[0040] General embodiments of the present disclosure include a method of identifying one or more components in a composition comprising receiving a Raman spectrum of a sample containing one or more unidentified substances; generating a query OIT from the Raman spectrum of the sample; comparing the query OIT to at least one of a plurality of reference OITs using a binary similarity measure or binary distance measure; and identifying the substance based on the closest matches. The method of identifying can be computer-implemented. The received Raman spectrum can be a surface-enhanced Raman spectroscopy (SERS) spectrum.
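By way of a non-limiting illustration, the comparison and matching described above can be sketched as follows. The sketch assumes that each OIT is stored as a list of 0/1 values on a shared wavenumber grid. The Jaccard index and the Hamming distance are used here only as common examples of a binary similarity measure and a binary distance measure; the function names, the toy library, and the threshold of 0.5 are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, List, Sequence, Tuple


def jaccard_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Binary similarity: bars shared by both OITs / bars present in either."""
    if len(a) != len(b):
        raise ValueError("OITs must be sampled on the same wavenumber grid")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Binary distance: number of grid positions where the two OITs differ."""
    if len(a) != len(b):
        raise ValueError("OITs must be sampled on the same wavenumber grid")
    return sum(1 for x, y in zip(a, b) if bool(x) != bool(y))


def match_oit(query: Sequence[int],
              library: Dict[str, Sequence[int]],
              threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return reference names whose similarity exceeds the (assumed) threshold,
    best match first."""
    scores = [(name, jaccard_similarity(query, ref))
              for name, ref in library.items()]
    hits = [(name, s) for name, s in scores if s > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy 8-bin OITs; real OITs would span the full spectral range.
toy_library = {"Phe": [0, 1, 1, 0, 0, 1, 0, 0],
               "Gly": [1, 0, 0, 0, 1, 0, 0, 1]}
toy_query = [0, 1, 1, 0, 0, 1, 0, 1]
print(match_oit(toy_query, toy_library))
```

A distance-based variant would keep the references whose Hamming distance to the query falls below a threshold instead.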
[0041] While one or more of the embodiments of the present disclosure are described as methods, the skilled artisan would recognize that these methods can be embodied as a system. The system can include computer-implemented embodiments including one or more of hardware and software (including firmware, resident software, micro-code, etc.) referred to herein as a “processor,” “device,” or “system.” Thus, general embodiments of the present disclosure also include a system for identifying a material, comprising a plurality of stored OITs, wherein each stored OIT is based on a Raman spectrum of an identified material; a barcode converter that receives a Raman spectrum of an unidentified material and generates a query OIT; and a query OIT analyzer that compares the query OIT and at least one of the stored OITs using a distance metric.
[0042] Embodiments of the present disclosure are described below with reference to Figures with flowcharts. Each block can be implemented by computer program instructions, which can be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the steps specified in the blocks. The computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the steps specified in the blocks.
[0043] The following Examples are intended to illustrate the above invention and should not be construed as to narrow its scope. One skilled in the art will readily recognize many other ways in which the invention could be practiced. It should be understood that numerous variations and modifications may be made while remaining within the scope of the invention.
[0044] FIG. 1a describes method 100 for converting a Raman spectrum to an OIT. Embodiments of method 100 can be used to build, construct, or assemble an OIT library. The OIT library can be used for quality assurance or control. The library can include a plurality of known chemical substances, compounds, and compositions relevant for the intended application. A library constructed using method 100 can be used to determine, confirm, or verify the identity of a composition or component of a composition, or to screen a composition for the presence of a specific component. For example, for pharmaceutical quality control, such as verification of raw materials, the library can include OITs of the active pharmaceutical ingredient (API), including various polymorphs, one or more of the excipients to be used in the formulation, and common contaminants. For forensic analysis at a port of entry, for example, the library can include OITs of one or more contraband chemicals or compositions, in addition to the chemicals and compositions identified on a cargo manifest or bill of lading. For detection of a point mutation in a peptide, for example, the library can include a plurality of AA OITs as shown in FIGS. 15-17.
[0045] Method 100 receives a Raman spectrum that has been collected for a known substance, compound or composition and converts the spectral data to an OIT. The Raman spectrum is a plot of the intensity of Raman scattered radiation as a function of its frequency difference from the incident radiation (usually in units of wavenumbers, cm-1). The method permits extraction of characteristic peaks by encoding the distribution of strong or moderately strong vibrational peaks into the OIT. In one or more embodiments of the present disclosure, method 100 includes collecting the Raman spectrum of the known substance, compound, or composition. For example, method 100 can include collecting Raman spectra of precious compounds (e.g., limited-quantity or low concentration samples) using Surface-Enhanced Raman Spectroscopy (SERS).
[0046] Generating an OIT in method 100 includes a series of correcting steps 101a and a series of OIT processing steps 101b. The correcting steps 101a correct for instrumentation and environmental factors, allowing raw spectra to be collected from different Raman spectrometers and standardized. Then, processing steps 101b convert the corrected spectra into a binary signal for referencing. The correcting steps include processing with mathematical algorithms to standardize the received spectrum for further processing into an OIT that is suitable for inclusion in an OIT screening library or for querying an OIT screening library. Correcting steps 101a can include processing a collected spectrum according to one or more calibrating steps, transformation steps, smoothing steps and background subtraction steps which reduce or eliminate irrelevant, random, and systematic variations in the spectral data, such as signal intensity variations linked to laser intensity, low signal-to-noise ratio, high background (fluorescence), and spikes. For example, correcting steps can include applying a missing point polynomial filter, a robust smoothing filter, a moving window filter, or a wavelet transform method to remove spikes. OIT processing steps 101b can include processing with mathematical algorithms to normalize the corrected spectrum, to perform curve fitting to extract characteristic peaks of the corrected spectrum, and to calculate a second derivative of the normalized and/or curve-fitted spectral data. The results of processing can be displayed as a barcode, with the normalized positive and negative values as color-coded bars positioned at the corresponding wavenumbers of the characteristic peaks of the Raman spectrum. The bar position and width can be substantially aligned with the peaks in the original Raman spectra.
[0047] Method 100 includes calibration step 102, to calibrate the accuracy of the wavelength axis (x-axis) of the Raman spectrum. In some cases, calibrating can include processing the x-axis of the collected spectrum based upon the spectrum of a standard reference material. In one or more embodiments, step 102 can include collecting the spectrum of the standard and using the collected spectrum for calibration. The standard can be any Raman shift frequency standard (e.g., naphthalene, 1,4-bis(2-methylstyryl)benzene (BMB), 50/50 (v/v) toluene/acetonitrile, 4-acetamidophenol, benzonitrile, cyclohexane, or polystyrene).
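As a hedged illustration of the last stage of processing steps 101b, the following sketch encodes a list of fitted peaks as a binary barcode in which each bar covers the wavenumber bins spanned by a peak's position and width. The representation of a peak as a (center, width) pair, the function name, and the default grid of 200-4000 cm-1 in 4 cm-1 steps are assumptions chosen for the example; color coding of positive and negative values is omitted.

```python
from typing import List, Sequence, Tuple


def peaks_to_barcode(peaks: Sequence[Tuple[float, float]],
                     start: float = 200.0,
                     stop: float = 4000.0,
                     step: float = 4.0) -> List[int]:
    """Encode (center, width) peaks, in cm-1, as a 0/1 vector on an even grid.

    A bin is set to 1 when its wavenumber lies within center +/- width/2,
    so bar position and bar width follow the fitted peak.
    """
    n_bins = int(round((stop - start) / step)) + 1
    bits = [0] * n_bins
    for center, width in peaks:
        low = center - width / 2.0
        high = center + width / 2.0
        for i in range(n_bins):
            wavenumber = start + i * step
            if low <= wavenumber <= high:
                bits[i] = 1
    return bits


# Hypothetical peak list: one band near 1003 cm-1 and one near 2930 cm-1.
barcode = peaks_to_barcode([(1003.0, 12.0), (2930.0, 20.0)])
print(sum(barcode), "of", len(barcode), "bins are set")
```

Storing the tag as a fixed-length 0/1 vector keeps every OIT in a library directly comparable, bin by bin, with a query OIT generated on the same grid.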
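One possible way to carry out calibration step 102 is sketched below, under the assumption that the peak positions of the standard have already been located in the measured spectrum. A low-order polynomial is fitted from the measured positions to the reference positions and then applied to the whole x-axis. The function name and the numerical peak values are placeholders for illustration; actual reference positions should be taken from the documentation of the chosen standard.

```python
import numpy as np


def calibrate_axis(raw_axis, measured_ref_peaks, reference_peaks, degree=1):
    """Map a raw wavenumber axis onto the reference scale.

    measured_ref_peaks: where the standard's bands appear on this instrument.
    reference_peaks:    where the same bands are expected (from the standard).
    degree:             order of the correction polynomial (assumed, default 1).
    """
    coeffs = np.polyfit(measured_ref_peaks, reference_peaks, degree)
    return np.polyval(coeffs, np.asarray(raw_axis, dtype=float))


# Placeholder values: the instrument reads 2 cm-1 low across the range.
measured = [618.9, 999.4, 1600.3]
reference = [620.9, 1001.4, 1602.3]
raw_axis = np.linspace(200.0, 1800.0, 5)
print(calibrate_axis(raw_axis, measured, reference))
```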
[0048] Method 100 includes transformation step 104, comprising transforming the spectral resolution of the calibrated spectrum to match a common spectral resolution. The optimal spectral resolution for an OIT library will depend on the intended use of the library and the types of compositions or compounds being analyzed (e.g., whether the sample is amorphous or crystalline, whether there is hydrogen bonding in the sample, and whether analytes are present in primary, secondary, and/or tertiary structures in the composition). A common spectral resolution permits comparison of collected spectra and merging of data from different instruments or different time points of collection. Spectral resolution depends on the configuration of the spectrometer illumination, the grating dispersion, the spectrometer length (from grating to detector), and either the exit slit or, in the case of a CCD, the pixel size. Transforming the spectral resolution can include processing the spectrum for compatibility with spectra collected or to be collected using different instruments. A common spectral resolution can be low, medium, or high. For example, low to medium resolution can be useful to identify a chemical component, while higher resolution may be required to reveal small changes in the shape or position of a spectral peak. In some cases, the common spectral resolution is 8, 4, 2, 1, or 0.5 cm-1.
[0049] Method 100 includes smoothing step 106, comprising reducing the noise in the Raman signal of the calibrated spectrum. Electronic noise, such as flicker noise, shot noise, and thermal noise, is an unpredictable and constant occurrence that appears primarily in the spectral intensity. In some cases, smoothing includes calculating the average, the moving average, or the moving median of the spectrum. Smoothing can include applying a digital filter to the spectrum to increase the precision of the data.
The digital filter is selected based on its ability to increase precision without distorting the signal tendency. In one or more embodiments, smoothing includes applying a smoothing algorithm using a window of 5-25 points, such as 5-13 points, 13-25 points, or 8-15 points. The window size can be selected for a specific composition based on the characteristic peaks. In some cases, smoothing step 106 can include applying a moving polynomial filter, such as a Savitzky-Golay filter or a Fourier filter, to the spectrum. For example, method 100 can include smoothing using a Savitzky-Golay filter with a 5-point window.
[0050] Method 100 includes background subtraction step 108 using one or more methods of background correction. Exemplary methods include polynomial methods of background correction such as offset correction (e.g., subtracting a constant), subtracting a line, or approximating a polynomial through basis points; derivative methods of background correction; Fast Fourier Transform background correction with a high pass filter; and wavelet transformation techniques. Derivative background correction methods include calculating a moving simple difference, calculating a moving average difference, and applying a Savitzky-Golay filter. In some cases, step 108 includes creating a baseline function to be subtracted from the smoothed data resulting from step 106. For example, a baseline can be created using a first or second derivative method to define anchor points, or an asymmetric least squares (ALS) method. The baseline can be optimized using the ALS method by adjusting parameters such as the asymmetric factor (0-1) (i.e., the weight of points above baseline), the threshold, the smoothing factor (i.e., window size factor), and the number of iterations. In one or more embodiments, the smoothing factor can be 6. For positive peaks, the asymmetric factor can be close to zero, such as 0.001.
Once the baseline is created, the baseline is subtracted from the spectral data to produce the background subtracted spectrum.
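The smoothing and background subtraction steps can be illustrated with a minimal sketch. It assumes a spectrum stored as a plain list of intensities on a uniform wavenumber grid, and it uses two of the simpler options named above (a moving average and subtraction of a line) rather than a Savitzky-Golay filter or an asymmetric least squares baseline. The function names and the toy data are hypothetical and are not taken from the disclosure.

```python
def moving_average(values, window=5):
    """Smooth with a centered moving average; the window shrinks at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def subtract_line(values):
    """Subtract the straight line joining the first and last points."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    slope = (values[-1] - values[0]) / (n - 1)
    return [v - (values[0] + slope * i) for i, v in enumerate(values)]


# Toy data (assumed): a sloping background with one sharp peak at index 5.
raw = [10 + i for i in range(11)]
raw[5] += 20

# Smooth first (step 106), then remove the background (step 108).
corrected = subtract_line(moving_average(raw, window=5))
```

A 5-point window matches the smallest window size mentioned above. A wider window suppresses more noise but also broadens narrow peaks, which is why the window is chosen per composition.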
[0051] Method 100 includes relative intensity correction step 110. Intensity correction compensates for instrumental response. Raman spectra obtained with different instruments may show significant variations in the measured relative peak intensities as a result of differences in wavelength-dependent optical transmission and detector quantum efficiency. The variations can be large when different laser excitations are used but can also occur over time when the same laser excitation is used. Relative intensity can be calibrated based on white-light calibration, tungsten or tungsten-halogen lamps, or luminescent glass standards designed for relative intensity calibration of Raman spectrometers operating with 1064 nm, 785 nm, 532 nm, or 488 nm/514.5 nm laser excitation. In some cases, the relative intensity correction is calibrated with a reference tungsten-halogen source for the chosen laser excitation wavelength and the grating used for acquisition.
[0052] Method 100 includes normalization step 112 to eliminate (compensate for) systematic differences among measurements resulting from changes that may have occurred during data acquisition (e.g., changes in laser power, differences in focusing depth, and sample volume differences). Normalizing is performed after baseline correction. Normalizing can include vector normalization, standard normal variate normalization, or Min-/Max-Normalization. For example, Min-/Max-Normalization can be used to scale the spectra from 0 to 1 (e.g., compare FIG. 2e and 2f).
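As an illustration of normalization step 112, the following sketch shows Min-/Max-Normalization and vector normalization on a list of baseline-corrected intensities. The function names are hypothetical, and the handling of a flat or all-zero spectrum (returning zeros) is an assumption made only to keep the example well defined.

```python
import math


def min_max_normalize(values):
    """Scale intensities linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: a flat spectrum carries no peaks, so return zeros.
        return [0.0 for _ in values]
    span = hi - lo
    return [(v - lo) / span for v in values]


def vector_normalize(values):
    """Divide by the Euclidean norm so the spectrum has unit length."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0:
        return [0.0 for _ in values]
    return [v / norm for v in values]


scaled = min_max_normalize([2, 4, 6])
unit = vector_normalize([3, 4])
```

Min-/Max-Normalization fixes the intensity range, which makes spectra acquired at different laser powers directly comparable before peak fitting.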
[0053] Method 100 includes analysis step 114 for numerical determination of Raman peak parameters using curve fitting algorithms. The curve fitting can be based on vibrational strength to improve the precision in peak position and intensity. Low strength bands can be eliminated (e.g., compare FIG. 2f and 2g). The normalized spectral data can be fitted using Gaussian, Lorentzian, Gaussian-Lorentzian, Voigtian, Pearson type IV, and beta profiles. The fitting results provide the peak position, intensity, area, and full width at half-maximum values of the measured Raman bands. The most appropriate fitting profile should be selected depending on the nature of the Raman bands, which are related to the sample properties. Multiple peaks that are not clearly separated can be fit simultaneously, provided the residuals in the fitting of one peak will not affect the fitting of the remaining peaks to a significant degree.
[0054] Method 100 includes refining step 116 for highlighting the important features of the spectral signature and minimizing the effect of variations due to acquisition artifacts and sample properties. Refining includes calculating the second derivative of the analyzed spectra.
[0055] Conversion of the refined (final) peaks to the OIT includes assigning a binary value (0 or 1) to each second derivative spectral data point primarily based on the sign or the value of the second derivative, i.e., 0 for upward curvature (positive second derivatives), and 1 for downward curvature (negative second derivatives). A threshold for zeros can be selected as a percentage of the maximum absolute value of the second derivative for negative second derivative readings (for all absolute values larger than the threshold, 1 is retained; otherwise the value is changed to 0). The OIT captures the characteristic peaks of the spectrum as positive values. Barcodes are assigned on the basis of the second derivative sign, with the position and thickness of the positive values (wavenumber, cm⁻¹) displayed as black bars (FIG. 2i).
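The conversion rule can be sketched as follows, under stated assumptions: the spectrum is a list of intensities on a uniform grid, the second derivative is approximated by a central finite difference with the end points set to zero, and the threshold is expressed as a fraction of the largest-magnitude negative second derivative. The function names and the default `threshold_fraction` of 0.1 are hypothetical choices for illustration.

```python
def second_derivative(values):
    """Central finite-difference second derivative (unit spacing, end points 0)."""
    d2 = [0.0] * len(values)
    for i in range(1, len(values) - 1):
        d2[i] = values[i - 1] - 2 * values[i] + values[i + 1]
    return d2


def encode_oit(values, threshold_fraction=0.1):
    """Return a list of bits: 1 where the curvature is strongly downward, else 0."""
    d2 = second_derivative(values)
    negative_magnitudes = [-d for d in d2 if d < 0]
    if not negative_magnitudes:
        return [0] * len(values)
    cutoff = threshold_fraction * max(negative_magnitudes)
    # Keep 1 only for negative second derivatives whose magnitude exceeds the cutoff.
    return [1 if d < 0 and -d > cutoff else 0 for d in d2]


# Toy spectrum (assumed): a single peak centered at index 3.
bits = encode_oit([0, 0, 1, 4, 1, 0, 0])
```

Runs of consecutive 1 values correspond to the black bars of the barcode: their position marks the peak location and their length reflects the peak width.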
[0056] Method 100 can be fully automated for high-throughput spectral processing. Current Raman-based characterization of compositions is complex and time-consuming, resulting in low throughput and limiting the statistical significance of the acquired data. High-throughput screening Raman spectroscopy would reduce or eliminate the complexity and reduce the extent of human involvement. The need to provide initial estimates of the number of peaks, their band shapes, and the initial parameters of these bands has presented an obstacle to the full automation of peak fitting and its incorporation into fully automated spectral-preprocessing workflows. Spectral conversion to OITs increases the quality of information obtained from Raman spectra and improves data visualization, thereby facilitating automation.
[0057] Method 100 can permit the determination of spectral classification within a priori library reference databases (i.e., libraries containing OITs of molecules of interest), and can be used to create an OIT library reference database. The OIT database can be stored and organized like a barcode database, with the OIT providing a reference which a computer uses to look up associated computer disk record(s) containing descriptive data and other pertinent information (e.g., chemical or composition name). The OITs of the database can be organized and stored in different file formats (e.g., industry standard formats) with metadata describing the instrument configuration parameters of the acquired spectrum, user comments, and/or user defined parameters. The data files can be organized in a file system. The OIT database can be organized in a sample-centric manner, or a sample/OIT-centric manner. The OIT database can be static or can permit periodic or continuous expansion to enhance identification.
[0058] Method 100 can be used to generate an OIT for identification of an unidentified chemical compound. The identification can be in real-time. The binary format of the OIT enables efficient, specific, and selective search that can be used to identify an unidentified chemical compound or unidentified components of a composition by comparing a query OIT with a stored OIT (e.g., an OIT of a screening library) using a binary similarity or binary distance measure, with greater accuracy than more complex algorithms which are regularly used for qualitative Raman spectral analysis. The binary distance measure determines the number of points at which the binary codes of the query and reference OITs are different (i.e., the distance). The comparison can utilize one or more binary distance measures, such as the Hamming distance. The identity of the query OIT can be based upon the closest distance match.
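A closest-distance lookup of this kind can be sketched in a few lines. The example assumes each OIT is a list of 0/1 values of equal length and that the library is a dictionary keyed by compound name. The library entries (`compound_A`, `compound_B`) and the bit patterns are invented for illustration only.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("OITs must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_match(query, library):
    """Return the library key whose OIT has the smallest Hamming distance to the query."""
    return min(library, key=lambda name: hamming_distance(query, library[name]))


# Hypothetical reference library and query OIT.
library = {
    "compound_A": [1, 0, 1, 1, 0, 0, 1, 0],
    "compound_B": [0, 1, 0, 0, 1, 1, 0, 1],
}
query = [1, 0, 1, 0, 0, 0, 1, 0]

best = closest_match(query, library)
```

Because the comparison only counts mismatched bits, the cost of a search grows linearly with the OIT length and the number of library entries.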
[0059] A Raman spectrum from a sample will contain Raman information about the molecules present within the analysis volume of the system. Thus, the Raman spectrum of a sample comprising a composition including a mixture of molecules will contain peaks representing all of the different molecules. If the OITs of the molecules are known (e.g., stored in an OIT database), the composition OIT can be matched with known OITs to provide quantitative information about the molecules in the mixture. Thus, method 100 can include matching the query OIT (i.e., the OIT of the converted spectrum) and one or more stored OITs (i.e., reference OITs) based on the calculated distance measure being less than a threshold value or the calculated similarity measure being greater than a threshold value, and thereby identifying one or more components present in the composition. In some cases, the OIT of a composition comprising several analytes can be matched with one or more OITs corresponding to one or more chemical compounds within the composition. For example, a peptide OIT can be matched with several or all of the AA OITs of the primary sequence and permit identification of a point mutation (as demonstrated in the Examples below).
[0060] In one or more embodiments of the present disclosure, method 100 includes collecting spectral data of relevant reference samples to build the screening OIT library. A Raman spectroscopy system generally includes four major components to collect a Raman spectrum: (1) an excitation source (e.g., a laser or high-brightness laser); (2) a sample illumination system and light collection optics; (3) a wavelength selector (e.g., a filter or spectrophotometer); and (4) a detector (e.g., a photodiode array, CCD, or PMT). The Raman spectroscopy system can be a bench-top or portable (e.g., handheld) system. The system can include software for selecting the optimum exposure time and number of exposures for a sample, a signal-to-noise ratio, maximum collection time, and other data collection parameters (e.g., autofocus), to give the best spectrum without saturating the detector. In some cases, the system is a High Throughput Screening (HTS) Raman system. An HTS Raman spectroscopy system can use a combination of automated sample movement, autofocus devices, and automated data acquisition and analysis procedures to acquire spectra from hundreds of samples sequentially. Automated measurements can be integrated with robot handling to reduce or eliminate the need for expertise and operator intervention.
[0061] Method 100 can include receiving the entire collected spectrum or portions thereof. The OIT can be converted from the spectral data within a range of wavenumbers. For example, a range of wavenumbers can be selected from wavenumbers within the range of 200-4000 cm⁻¹, such as 200-2000 cm⁻¹, 200-1800 cm⁻¹, 400-1800 cm⁻¹, 500-2000 cm⁻¹, 600-4000 cm⁻¹, 2400-3800 cm⁻¹, or 2500-4000 cm⁻¹. In some cases, multiple OITs can be converted from spectral data for a single compound or composition using spectral data from more than one range of wavenumbers (e.g., low wavenumbers, high wavenumbers, and the full range of wavenumbers).
[0062] Turning now to FIG. 1c, method 200 includes the use of one or more of the steps set forth in method 100 to identify one or more unidentified analytes, such as the molecular components of a sample composition. The method can use an OIT encoding a Raman spectrum to automatically resolve the individual, pure chemical species within a complex, heterogeneous sample to identify the components and/or verify the composition quality. In some cases, the identification performed in method 200 can be used to verify the composition of a sample for quality control.
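Restricting a spectrum to a wavenumber range before conversion can be sketched as below. The example assumes the spectrum is held as two parallel lists (wavenumbers in cm⁻¹ and intensities); the function name and the toy values are hypothetical.

```python
def select_range(wavenumbers, intensities, low, high):
    """Keep only the points whose wavenumber lies within [low, high] (inclusive)."""
    if len(wavenumbers) != len(intensities):
        raise ValueError("wavenumbers and intensities must have the same length")
    kept = [(w, y) for w, y in zip(wavenumbers, intensities) if low <= w <= high]
    return [w for w, _ in kept], [y for _, y in kept]


# Toy spectrum (assumed values) sampled every 400 cm^-1.
wavenumbers = [200, 600, 1000, 1400, 1800, 2200]
intensities = [0.1, 0.4, 0.9, 0.3, 0.2, 0.05]

# Extract the 400-1800 cm^-1 window, one of the ranges listed above.
sub_w, sub_y = select_range(wavenumbers, intensities, 400, 1800)
```

Running the same conversion on several windows yields the multiple OITs per compound mentioned above, one for each selected range.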
[0063] Method 200 includes step 202 directed to receiving and storing an OIT database (e.g., a screening or reference OIT library) comprising a plurality of OITs, wherein each OIT corresponds to a molecule relevant to the sample composition. Data received and/or stored with the OIT can include information about the corresponding molecule, e.g., chemical identity (name). Receiving can include building an OIT library (as discussed above) or loading a pre-built screening library into the system for implementing method 200. As the method can be used for the identification or verification of a multi-component product, including formulated products such as pharmaceutical products, food, baby formula, agricultural compositions (e.g., fertilizers, pesticides, herbicides, animal feeds, etc.), cosmetics, paint (or other coating material), nutraceuticals, perfumes, plastics, biological samples, and petroleum compositions, relevant chemical compounds of the stored OIT can include expected chemical constituents of the type of formulated multi-component product being analyzed (e.g., candidates for determining a match). In some cases, the OIT library can include possible chemical contaminants associated with the product, the method of manufacture, the supply chain, and/or counterfeit versions of the formulated multi-component product. Step 202 can include collecting the Raman spectrum using a system comprising a Raman spectrometer, such as a Raman spectroscopy system as described above for method 100.
[0064] Method 200 includes step 204 directed to obtaining a sample composition containing at least one component of unknown or unverified chemical identity. The form of the sample composition can be a solid, powder, liquid, gel, emulsion, slurry, suspension, or gas. Method 200 does not require labeling any of the components of the sample for detection and identification. In some cases, none of the components of the sample are labeled or tagged.
[0065] Method 200 includes collecting step 206 to acquire the Raman spectrum of the sample composition using a Raman spectroscopy system, such as a system described above. Collecting a Raman spectrum can include exposing the sample composition to an excitation wavelength and detecting the Raman signal. The excitation wavelength can be in the range of about 400 nm to about 1000 nm, such as 473 nm, 532 nm, 633 nm, and 785 nm. In some cases, multiple spectra are collected using different excitation wavelengths for each spectrum. The sample composition is exposed to the laser source for a sufficient period of time to generate a Raman signal. In some exemplary embodiments, the period of time ranges from less than about 1 second to about 3 hours. Two or more collected spectra can be averaged to provide the Raman spectrum.
[0066] In some cases, the Raman spectrum is a Surface-Enhanced Raman Spectroscopy (SERS) spectrum collected using SERS. Accordingly, in one or more embodiments of the present disclosure, step 206 includes preparing the sample for SERS by bringing an analyte in contact with, or adjacent to, a Raman-active metal surface or structure (a “SERS-active structure”). The interactions between the analyte and the SERS-active structure cause an increase in the strength of the Raman signal. Therefore, SERS can be used to collect Raman spectra from samples containing unidentified compounds in submicromolar concentrations. For example, the concentration of the one or more analytes can range from picomolar to 1000 nM, such as equal to or less than about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 250, 300, 400, 500, 600, 700, 800, 900, and 1000 nM.
[0067] Step 206 can include collecting the Raman spectrum using a system configured for SERS. In some cases, the Raman spectrum is collected using a SERS-active structure such as a SERS-active substrate. Step 206 can include contacting the sample composition with a SERS-active structure by depositing the sample on a prepared SERS-active substrate.
[0068] The SERS-active substrate can be a composite material. In some embodiments, method 200 includes one or more steps for preparing the SERS-active substrate. For example, preparing the SERS-active substrate can include immobilizing particles of a SERS-active material to a solid support to form a composite. The particles can be immobilized randomly or uniformly (e.g., in an array). The particles can be immobilized in an array of discrete spots formed by depositing a drop of suspended Raman-active material on the solid support. The discrete spots can have a diameter suitable for collection of the Raman spectrum. For example, the SERS spots have a diameter greater than the laser spot diameter, such as a diameter of greater than 10, 50, 100, 150, or 200 µm, and up to 250 µm. The uniformity of a substrate with an array of SERS-sensing spots can be confirmed by measuring variation in the intensity of a peak from a Raman standard at one or more of the spots. In one or more embodiments of the present disclosure, the variation in the signal intensity collected from a plurality of spots of a SERS-active substrate as described above is less than 20%, such as 18%, 17%, or 16% or less.
[0069] The immobilized particles can include nanoparticles, such as nanospheres, having an average diameter in the range of about 10-100 nm. Particles of the composite can be prepared in various sizes and shapes to fine tune the surface polarization and plasmon resonance in SERS. Nanoparticles are a preferred size as they provide more plasmonically active surface area. In one or more embodiments of the present disclosure, the nanoparticle will have an effective average diameter (i.e., the smallest cross-section of the nanoparticle or plasmon-resonating layer) of about 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000 nm.
In some exemplary embodiments, the average diameter is in the range of about 10-20, 10-30, 10-40, 10-50, 10-60, 10-70, 10-80, 10-90, or 10-100 nm. Non-limiting examples of nanoparticle shapes include spherical (nanospheres), cube shape (nanocubes), rod shape (nanorods), or wire shape (nanowires). In some cases, the nanoparticles have a spherical shape (nanospheres) with protrusions on the surface thereof. For example, gold nanostars with different shapes (number of protrusions) and plasmonic properties can be synthesized by changing the seed volume and concentration of HAuCl₄.
[0070] The density of the nanoparticles on the substrate may vary depending on factors such as the substances to be detected and the production process for the nanoparticles. Non-limiting examples of nanoparticle density include about 10-800 particles/µm², such as about 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 250, 300, 350, 400, 450, and 500 particles/µm².
[0071] The solid support can be chemically resistant, and preferably repels one or more solvents. For example, the solid support can be inherently solvophobic, or a solid support functionalized to be solvophobic by coating, grafting, or the like, with one or more solvophobic moieties. Preferably, the solid support is omniphobic. Suitable materials for the solid support include silicon, polymers, glass, silicon nitride, quartz, ceramics, sapphire, and metals, and combinations thereof. In some embodiments, the solid support is a silicon wafer or a polytetrafluoroethylene membrane (e.g., as sold under the trade name TEFLON). The surface of the solid support can be rendered super-resistant by grafting a sufficient number of low surface energy functional groups (e.g., CF2, CF2H, and/or CF3 groups), or by coating the surface with a lubricating fluid, such as a fluorinated or perfluorinated material selected from perfluorinated phosphates, perfluorinated silanes, fluorinated monomers, polymers and copolymers, and other fluorinated precursors. The thickness of the coating may range, for example, from about 1 nm to about 300 nm. Exemplary embodiments of the thickness of the coating include about 1, 5, 10, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 250, and 300 nm. The solid support can be coated to achieve a smooth surface.
[0072] The particles of the composite may be composed of various Raman active materials. In some embodiments, the immobilized particles (e.g., nanoparticles) include at least a noble metal such as aluminum, copper, silver, platinum, palladium, or gold. In some embodiments, the particles include alloys such as copper/silver/gold alloy (e.g., copper-silver alloy, copper-gold alloy, silver-gold alloy, copper-silver-gold alloy). In some embodiments, the particles are core-shell nanoparticles, which include a core of, for example silica, platinum, or other metal particles, onto which a layer is deposited, e.g., layers of Cu, Ag, or Au.
[0073] In some cases, the particles are coated with a solvophobic or omniphobic material, as discussed above with regard to the solid support. The particles can be coated before or after immobilization on the solid support. The composite can include a surface-functionalized solid support in combination with surface-functionalized nanoparticles. The same omniphobic coating (e.g., functional groups) may be used to functionalize the solid support and the particles or nanoparticles immobilized thereon, to provide an omniphobic SERS-active substrate. For example, a SERS-active substrate can include a perfluoropolyether (PFPE)-coated TEFLON membrane solid support having 1H,1H,2H,2H-Perfluorodecanethiol-coated nanostars immobilized thereon. The PFPE fluid used to coat the solid support can be the proprietary fluid composition sold commercially under the tradename GPL-100.
[0074] Step 206 can include contacting the above-described composite with the sample composition to be analyzed, wherein the one or more components are chemisorbed or physisorbed to the particles of the composite. The analyte, substance, or composition to be tested may be disposed on the nanoparticles in various forms. In some exemplary embodiments, the sample composition is deposited on or near the nanoparticle surface in a powder, a vapor, or a solution or suspension. For example, a SERS-active substrate comprising nanostars immobilized in an array of concentrated spots (“SERS-sensing spots” or “SERS spots”) (as shown in FIG. 6) can be contacted with the sample by applying microdrops of sample solution onto the spots.
[0075] In one or more embodiments of the present disclosure, step 206 can include collecting a SERS spectrum by exposing the one or more substrate-bound components of the sample composition to an excitation wavelength and detecting the Raman signal. The excitation wavelength can be in the range of about 400 nm to about 1000 nm, such as 473 nm, 532 nm, 633 nm, and 785 nm. In some cases, multiple spectra are collected using different excitation wavelengths for each spectrum. The sample composition is exposed to the laser source for a sufficient period of time to generate a Raman signal. In some exemplary embodiments, the period of time ranges from less than about 1 second to about 3 hours.
[0076] Method 200 includes spectral data converting step 208 whereby the collected spectrum is corrected and processed into a query OIT. Method 200 can include steps 101a and 101b as described above for method 100. Method 200 can include steps 102-110 and steps 112-116. Preferably, method 200 includes performing the same steps in the same order as the method used to obtain the stored OITs to generate the query OIT. Step 208 can be performed using a system for identifying one or more components of a composition, which system can include an OIT converter configured to receive and convert the Raman spectrum to an OIT.
[0077] Method 200 includes matching step 210 whereby the query OIT is systematically compared with at least one of the plurality of stored OITs. Method 200 does not rely upon principal component analysis (PCA) or discriminant function analysis (DFA) for classification or identification of the query OIT. Thus, the method does not include cluster analysis techniques. Method 200 includes comparing OITs using a bit-based measure, such as binary similarity or dissimilarity measures. Step 210 can include computing the similarity or dissimilarity (distance) between the query OIT and one or more of the stored OITs. The similarity metric can be a correlation- or noncorrelation-based metric. Dissimilarity can be computed using a binary distance metric, such as the binary Euclidean distance, the Bray and Curtis distance, or the Hamming distance.
[0078] In one or more embodiments of method 200, matching step 210 includes accepting or rejecting the computed dissimilarity of the query OIT to a stored OIT based on the Hamming distance. The computation includes counting the total number of times that two corresponding values in the two OITs disagree. Expressed as a fraction between 0 and 1, the Hamming distance between two random and independent OITs would be expected to be 0.5, since any pair of corresponding values has a 50% likelihood of agreeing and a 50% likelihood of disagreeing. Thus, if two OITs from different molecules are compared, their Hamming distance would be expected to be 0.5. If the stored OIT and the query OIT are from the same molecule, the Hamming distance would be expected to be considerably lower. The Hamming distance can be computed using the elementary logical operator XOR (Exclusive-OR) and thus can be performed more efficiently than complex algorithms. Matching can include a reasonable match as determined by one or more predetermined thresholds for rejection. For example, a match can be a “similar” confidence match based on a first threshold and a “high” confidence match based on a second threshold. In some cases, step 210 can include calculating the decision confidence level. Yes/No decisions in OIT matching have four possible outcomes: either a given OIT is or is not a match, and for either of these two cases, the decision made can be correct or incorrect. The four outcomes are usually termed a true positive, a false negative, a false positive, and a true negative. The goal of method 200 is to maximize the likelihoods of true positives and true negatives, while minimizing the likelihoods of false positives and false negatives. Confidence algorithms ranking confidence in decreasing order and/or algorithms of degrees of disambiguation incorporating distance functions may also be applied to evaluate proximity of the query OIT to one or more of the stored OITs.
Step 210 can be performed using a system for identifying one or more components of a composition, which system can include an OIT analyzer configured to compare the query OIT to a reference OIT of the screening library and calculate a binary similarity measure or binary distance measure between the compared OITs. The OIT analyzer can be configured to match the query OIT to one or more reference OITs based on the calculated measure. For example, where a similarity measure is used to identify a matching OIT, the match can be based on the calculated measure being greater than a threshold value. Where a distance measure, such as a Hamming metric, is used to identify a matching OIT, the match can be based on the calculated measure being less than a threshold value. In some cases, the OIT analyzer can include a local processor configured to compare the query OIT with a reference OIT that is stored in a remote processor.
[0079] Method 200 includes identifying step 212. Step 212 includes retrieving the chemical identity (e.g., chemical name, chemical structure, or index number (e.g., CAS No.)) of the molecule that corresponds to the stored OITs that matched with the query OIT, or a portion of the query OIT. Step 212 can be performed using a local processor configured to retrieve the chemical identity of the stored OITs from a remote processor. In some cases, the identifying step includes generating a list providing the chemical identities of any stored OITs that are identical to one or more regions of the query OIT, or of any stored OITs that are similar to one or more regions of the query OIT.
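The XOR-based Hamming comparison of step 210 and the list generation of step 212 can be sketched together. The example assumes each OIT is packed into a Python integer, expresses the distance as a fraction between 0 and 1, and applies two thresholds for “high” and “similar” confidence. The threshold values (0.1 and 0.3), the function names, and the library entries are hypothetical and are not specified in the disclosure.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into one Python integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def fractional_hamming(a, b, n_bits):
    """Fraction of differing positions: XOR the packed OITs, then count set bits."""
    return bin(a ^ b).count("1") / n_bits


def match_confidence(distance, high=0.1, similar=0.3):
    """Map a fractional distance to a confidence label, or None if rejected."""
    if distance <= high:
        return "high"
    if distance <= similar:
        return "similar"
    return None


def identify(query_bits, library):
    """List (name, distance, confidence) for every accepted match, closest first."""
    n_bits = len(query_bits)
    query = pack_bits(query_bits)
    hits = []
    for name, ref_bits in library.items():
        if len(ref_bits) != n_bits:
            raise ValueError("query and reference OITs must have the same length")
        distance = fractional_hamming(query, pack_bits(ref_bits), n_bits)
        label = match_confidence(distance)
        if label is not None:
            hits.append((name, distance, label))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical 10-bit OITs.
query_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
library = {
    "ref_same": [1, 0, 1, 1, 0, 0, 1, 0, 1, 0],
    "ref_close": [1, 0, 1, 1, 0, 0, 1, 0, 0, 1],
    "ref_far": [0, 1, 0, 0, 1, 1, 0, 1, 0, 1],
}
matches = identify(query_bits, library)
```

In practice the thresholds would be tuned against known matches and non-matches to balance false positives against false negatives, with unrelated OITs expected to sit near a fractional distance of 0.5.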
[0080] Method 200 can be more accurate and efficient than existing spectral matching methods such as MCR-ALS, principal component analysis, and machine learning, as demonstrated in the Examples below. Method 200 is more efficient than machine learning methods requiring large and comprehensive training data. Moreover, method 200 can be automated for high-throughput analysis and does not require direct matching of spectral data with point by point comparison, or of identified peaks derived from the spectrum. The need to provide initial estimates of the number of peaks, their band shapes, and the initial parameters of these bands has presented an obstacle to the full automation of peak fitting and its incorporation into fully automated spectral-preprocessing workflows. When used for the identification of incoming materials before entry into a manufacturing process, for example, to detect incorrect formulation, contamination, mislabeled containers, or counterfeit materials, method 200 can improve the efficiency of process workflow compared with other methods of material identification.
[0081] Method 200 can be advantageous in a variety of fields. Method 200 can be used in a manufacturing, distribution, repackaging, or healthcare environment (e.g., hospital formulary, gas monitoring during anesthesia, exhaled-gas monitoring, etc.). For example, method 200 can be used for process quality control in polymer, chemical, pharmaceutical, or food manufacturing by facilitating raw material identification (e.g., active pharmaceutical ingredients, excipients, pharmaceutical raw materials, pharmaceutical packaging materials) or testing for product composition and uniformity. Products to be analyzed can include pharmaceutical dosage forms, such as oral dosage forms, injectables, inhalants, intravenous solutions, transdermals, suppositories, and ophthalmic preparations. Method 200 can be used for chemical detection at ports-of-entry, customs facilities, import facilities, mail facilities, and regulatory centers, and for forensic analysis, toxicological evaluation, and detection of environmental pollutants. In some cases, method 200 can be used to identify a source of a composition or to confirm the labeling of a composition. For example, method 200 can be used to detect and identify adulterants in pharmaceutical or food/beverage samples. Method 200 can be used in diagnostic applications. For example, method 200 can be used to detect and identify a metabolite or mutation (e.g., a point mutation in a peptide) that is associated with a pathological condition.
[0082] Other embodiments of the present disclosure are possible. Although the description above contains much specificity, these should not be construed as limiting the scope of the disclosure, but as merely providing illustrations of some of the presently preferred embodiments of this disclosure. It is also contemplated that various combinations or sub-combinations of the specific features and aspects of the embodiments may be made and still fall within the scope of this disclosure.
It should be understood that various features and aspects of the disclosed embodiments can be combined with or substituted for one another in order to form various embodiments. Thus, it is intended that the scope of at least some of the present disclosure should not be limited by the particular disclosed embodiments described above.
[0083] Thus the scope of this disclosure should be determined by the appended claims and their legal equivalents. Therefore, it will be appreciated that the scope of the present disclosure fully encompasses other embodiments which may become obvious to those skilled in the art, and that the scope of the present disclosure is accordingly to be limited by nothing other than the appended claims, in which reference to an element in the singular is not intended to mean “one and only one” unless explicitly so stated, but rather “one or more.” All structural, chemical, and functional equivalents to the elements of the above-described preferred embodiment that are known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the present claims. Moreover, it is not necessary for a device or method to address every problem sought to be solved by the present disclosure for it to be encompassed by the present claims. Furthermore, no element, component, or method step in the present disclosure is intended to be dedicated to the public regardless of whether the element, component, or method step is explicitly recited in the claims.
[0084] The foregoing description of various preferred embodiments of the disclosure have been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise embodiments, and obviously many modifications and variations are possible in light of the above teaching. The example embodiments, as described above, were chosen and described in order to best explain the principles of the disclosure and its practical application to thereby enable others skilled in the art to best utilize the disclosure in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the disclosure be defined by the claims appended hereto. EXAMPLE
[0085] Amino acid (AA) substitutions are directly correlated with specific pathologies such as Alzheimer’s disease, making their rapid screening and detection critical to treatment and scientific study. This example demonstrates a proof-of-concept implementation of the label-free and non-invasive Raman spectroscopy technique for the detection of AA substitutions in primary peptide fragments. By encoding the Raman “fingerprint” of individual AAs into binary formats called optical identification tags (OITs), a library of identifiers can be created, which can then be used for detecting mutations. When the recorded Raman signal is enhanced by using a surface-enhanced Raman scattering (SERS) substrate, the mutation screening strategy can detect a single point missense mutation in an 11-AA peptide fragment of amyloid beta Ab(25-35) and a frameshift mutation in a 42-AA fragment Ab(1-42) down to picomolar concentrations. The combination of high sensitivity and simple operation makes the use of OITs a promising approach for high-throughput automated screening.
[0086] In general, the present work describes a non-invasive and label-free detection method for analytes based on encoded Raman fingerprint tags, or optical identification tags (OITs), and a method of creating a library of OITs for high-throughput mass screening of analytes. Raman spectroscopy is a vibrational technique that directly measures chemical-specific information, which makes non-destructive and rapid detection possible.
[0087] The method is demonstrated using peptides. By creating a library of OITs containing the Raman fingerprints of the individual amino acids (AAs), we can screen the AA composition of the peptide and detect mutations. Moreover, these OITs can be amplified using a plasmonic substrate, referred to herein as a SERS-active substrate, based on the surface-enhanced Raman scattering effect. To achieve this, a custom SERS sensor was created by depositing SERS-active symmetric gold nanostars on an omniphobic membrane. When the OITs are used together with the SERS-active substrate, the OIT screening strategy can detect a single frameshift or missense mutation of a peptide sample at sub-nanomolar concentration. The combination of high sensitivity and an automated screening strategy allows faster and more sensitive composition screening of samples available only at low concentration compared to other reported methods. The application shown in this work is demonstrated for single point mutation detection in the amyloid-beta peptides; however, this approach can also provide a general avenue for designing corresponding libraries used for selection and compositional screening.
[0088] Established peptide composition analysis techniques, i.e., Edman degradation and mass spectrometry, are time-consuming, low-throughput, and require complete sample destruction. Other vibrational techniques such as Fourier transform infrared (FTIR) spectroscopy are not suitable for bio-applications due to the intense absorption of water in the infrared region. Previous works identified AA components through indirect detection by chemical or physical conjugation of known reporters such as dyes or strong Raman scatterers. While promising, the labeling step requires preparation of compatible tags and is prone to artifacts during the multi-step synthesis procedure. In contrast, a method as described in the present disclosure directly utilizes the intrinsic Raman vibrational fingerprint of the analyte for identification. This facilitates the creation of a screening library of OITs, in which each tag represents a corresponding library member. The proposed detection strategy includes two main parts: encoding the SERS-amplified Raman spectra of the screening AAs into an OIT library, and decoding the AA composition of the peptide.
[0089] The SERS-active substrate improves the sensitivity of the detection in the case of limited sample availability. The SERS-active substrate was fabricated by depositing highly symmetrical gold nanostars on an omniphobic membrane. The gold nanostars were synthesized from icosahedral symmetrical gold nanoseeds. Subsequently, these stars were coated with perfluorinated ligand (PF) to prevent the direct conjugation of the gold surface to high-affinity thiol and amine groups in biomolecules. SERS-sensing spots were formed by depositing the SERS-active nanostructures onto a liquid-repellent lotus-leaf-inspired membrane fabricated using a slippery liquid-infused porous surface (SLIPS). Before analysis, the analyte solution was deposited on each SERS spot to collect the SERS spectrum of the analyte. The SERS-active substrate prevents the “coffee ring” defect during drying and improves the sensitivity of OIT analysis when the available concentrations of library members and targeted peptides are low. The general procedure of substrate fabrication and SERS signal collection is shown in FIG. 2.
[0090] The proposed OIT screening system was based on the direct conversion of the Raman spectra of each of the twenty AAs into a binary barcode tag (the OIT). Afterward, the OIT can be matched with the barcode equivalent of the Raman fingerprint of a given peptide to determine its composition. It is crucial to note here that this is not a sequencing method; instead, the mutation point was identified by comparing the composition analysis of the wild-type and mutant peptides. The procedure for encoding the OIT library of screening AAs and decoding the AA composition of the peptide is provided in FIG. 5.
[0091] First, Raman spectra of the twenty AAs were collected using a 473 nm laser. The OITs were subsequently generated by subjecting the spectra to autonomous processing steps as demonstrated in FIGs. 2a-i. After each conversion process, the spectra were reduced to a series of binary digits by labeling wavenumbers with positive intensities as 1 (black) and other wavenumbers as 0 (white). The final binary pattern was referred to as the OIT, in which each bar position and width aligned well with the encoded peaks in the original Raman spectra. Using this process, a unique OIT for each AA was collected to build the screening library, and the binary barcodes of six Alzheimer Ab peptides were also collected for comparison.
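By way of illustration only, the binarization step described above may be sketched as follows. The function names `encode_oit` and `bars`, the zero threshold, and the sample intensities are hypothetical and are not taken from the disclosed procedure; the sketch assumes the spectrum has already been processed so that intensities are positive only where a peak was retained.

```python
from typing import List, Sequence, Tuple


def encode_oit(intensities: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Label each wavenumber bin 1 if its processed intensity is positive, else 0."""
    return [1 if value > threshold else 0 for value in intensities]


def bars(oit: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (start_index, width) for every run of consecutive 1s (one barcode bar each)."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, bit in enumerate(oit):
        if bit and start is None:
            start = i                       # a new bar begins here
        elif not bit and start is not None:
            runs.append((start, i - start))  # the bar ended on the previous bin
            start = None
    if start is not None:                   # bar running to the end of the range
        runs.append((start, len(oit) - start))
    return runs


# Hypothetical processed spectrum: zero outside retained peaks, positive inside.
spectrum = [0.0, 0.0, 0.4, 1.0, 0.3, 0.0, 0.0, 0.2, 0.0]
oit = encode_oit(spectrum)
print(oit)        # [0, 0, 1, 1, 1, 0, 0, 1, 0]
print(bars(oit))  # [(2, 3), (7, 1)]
```

Each bar in the output corresponds to one retained peak, with the bar position and width following the peak position and width on the wavenumber axis.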
[0092] The correlation between the vibrational spectrum of a molecule and its structure allows the structural components to be derived from its spectra or its OIT equivalent. Using a matching function (Hamming distance), the OIT of the peptide can be screened against the constructed library containing the OIT-encoded structural information of all the AAs to determine its composition. The AA composition detection process was also optimized using different wavenumber ranges of the Raman spectrum and different excitation laser sources at 473 nm, 633 nm, and 785 nm. The optimization process is summarized in Table 1, which shows that the highest performance of the encoding/decoding process was obtained when OITs were converted from the extended range (200-4000 cm-1) of the Raman spectra collected at the 473 nm excitation laser wavelength.
[0093] The optical identification tag (OIT) technique can be used to convert a Raman fingerprint of an analyte into its own identification label, which enables tagging and tracking of the analyte using its unique OIT. In addition, this approach facilitates the encoding of a screening library of OITs, in which each tag represents a corresponding library member. The correlation between the vibrational Raman spectrum (barcode tag) and the molecular structure, together with the multiplex capabilities, permits composition analysis based on shared features of the codes. Here, this strategy demonstrates detection of a mutation point in a peptide fragment by directly generating an OIT screening library from the Raman fingerprints of the amino acids and deriving the amino acid composition of peptide chains. Then, a mutation point can be identified by comparing the composition analysis results of wild-type and mutant peptides.
1. Introduction
[0094] Changes in the structure of expressed proteins are an important hallmark of pathophysiology. Mutating one amino acid in the sequence can affect protein stability and lead to the formation of misfolded proteins. In Alzheimer’s disease, single-point mutations in the amyloid beta (Ab) protein result in aggregation into amyloid fibrils, which is recognized to be a key part of disease progression. Therefore, rapid and accurate identification of such structural changes at the primary protein structure level is essential for early diagnosis. Chemically assisted selective cleavage of the N-terminal AA, i.e., Edman degradation, has been used extensively to analyze the sequence of Ab peptide. Although complete sequence information can be obtained using this method, the analysis is costly and cannot be applied to sequences that are N-terminally blocked by N-acetylation or by cyclization of glutamic acid and glutamine. Several mass spectrometry methods, including matrix-assisted laser desorption/ionization (MALDI) and plasmonic-enabled LDI mass spectrometry, were developed to improve the scanning time and sensitivity of peptide analysis but were unable to discriminate between the structural isomers leucine and isoleucine. Though both Edman degradation and mass spectrometry have been developed into effective detection methods for biomolecules, a common issue for these traditional peptide sequencing systems is that they require complete sample destruction. Hence, development of non-invasive methods is essential for better characterization and understanding of biological systems. Raman spectroscopy is one such promising technique that has been noted for its specific, label-free, and sensitive nature.
[0095] However, while Raman spectroscopy has been effectively used to study the conformation of Ab, there have been no reports exploiting the high sensitivity of Raman for peptide composition analysis.
[0096] In previous works, conjugated tags with unique Raman spectra have been utilized for peptide primary structure analysis. However, indirect methods require the chemical conjugation or physical binding of the molecule with a known tag (i.e., label). The tagging process itself is complicated and laborious as it requires the ad hoc preparation of compatible reactive tags and is prone to artifacts during coupling and purification. In contrast, label-free detection directly utilizes the spectroscopic fingerprint of the molecule as its identification tag. Consequently, it is possible to build a library of unique optical identification tags (OITs) - analogous to barcodes - that is amenable to mass screening. In addition, the strong interconnection between the vibrational Raman spectrum (barcode tag) and molecular structure, together with the multiplex capabilities, permits structural composition to be deduced from the shared features of the encoded spectra. The current use of Raman spectra as OITs for screening purposes has two major limitations. First, there is no general rule for the coding representation of such barcode tags. Second, direct composition analysis is challenging when the peptide sample is present at low concentration due to the inherently weak nature of Raman scattering.
[0097] The example provides a robust encoding method that generates an OIT screening library directly from the Raman fingerprints of the twenty proteinogenic AAs and derives the AA composition of Ab peptide fragments, specifically the Ab(15-20), Ab(25-35) and Ab(1-42) peptides that are heavily studied in Alzheimer’s research. A mutation point can then be detected by comparing the AA composition of the mutant and wild-type peptides. To address the problem of weak signal, a sensitive SERS-active substrate composed of gold nanostars can be provided (FIG. 5) to amplify detection of mutant protein at picomolar concentrations based on the plasmonic field Raman enhancement effect. An additional advantage of this substrate design is achieved by incorporating an omniphobic platform that prevents non-uniform enhancement of Raman spectra due to the “coffee ring” drying defect and allows a variety of carrier solvents to be used.
2. Results and Discussion
2.1. Design of a Raman Amplification Unit (SERS-active substrate)
[0098] The SERS-active substrate was fabricated by depositing gold nanostars on an omniphobic membrane. Nanostars were selected because of the higher magnitude of Raman enhancement due to their anisotropic geometry compared to smooth, spherical nanostructures. After a seed-mediated growth step from the icosahedral gold seeds according to routine methods, the stars were coated with perfluorinated ligand (PF) to prevent the direct conjugation of the gold surface to high-affinity thiol and amine groups in biomolecules (FIG. 5a). Characterization data of the seed and star nanoparticles, including UV-visible extinction spectra, a transmission electron microscopy (TEM) image and a scanning electron microscopy (SEM) image, are provided in FIGS. 6a-d. SERS-sensing spots were formed by depositing the SERS-active nanostructures onto a liquid-repellent lotus-leaf-inspired membrane fabricated using a slippery liquid-infused porous surface (SLIPS) (FIG. 5b). Before each Raman experiment, the analyte solution was deposited on each spot (FIG. 5c). An optical image of the substrate and an SEM image of a single SERS-spot (about 150 µm in diameter) are shown in FIGS. 7a-c. To evaluate the substrate uniformity, the Raman spectrum of Ab(15-20) at a concentration of 10⁻⁵ M was taken from eight SERS-spots (FIG. 9a). The intensity of the aromatic vibrational peak at 1003 cm-1 was used to confirm the reproducibility of the spectrum, which showed an average value of 122 counts per second (cps) with a standard deviation of 16% (FIG. 9b). The SERS-active substrate improves the sensitivity of the OIT analysis when the available concentrations of the library members and targeted peptides are limited.
2.2. Development of the OIT Library
[0099] The OIT screening system was based on the direct conversion of the Raman spectra of each of the twenty AAs and peptides into binary barcode tags (i.e., the OITs) (FIGS. 3a-b). First, Raman spectra of the samples were collected at 473 nm laser wavelength. The OITs were subsequently generated by subjecting the spectra to autonomous processing steps as fully detailed in the data analysis in the supporting information and the supplementary file. A demonstration of the encoding process is shown in FIGS. 2a-i for one library member, cysteine (C), as an example. The analysis results for all samples in this example are also provided in FIGS. 15-17. Briefly, the procedure involved spectral calibration, followed by background subtraction and peak extraction. Afterward, the spectra were reduced to a series of binary digits by labeling wavenumbers with positive intensities as 1 (black) and other wavenumbers as 0 (white) (FIG. 2i). The final binary pattern was referred to as the OIT, in which the position and width of each bar aligned well with the encoded peak in the original Raman spectra. Using this process, a unique OIT for each AA was created to build the screening library (FIG. 3a). The single-letter codes and vibrational peak assignments for all AAs are shown in Table 1. The binary barcodes of six Alzheimer Ab peptides used in this study were also collected for AA component analysis (FIG. 3b). The AA sequences of these peptides are given in Table 2. The tagging system described in this example is distinctly different from previously reported methods, as this procedure provides autonomous tag generation and does not require post-synthetic modification of the compound with tags such as strong Raman scatterers or fluorescent probes.
Table 1. Single-letter code and peaks assignment of 20 AAs.
Figure imgf000035_0001
Figure imgf000036_0001
Table 2. The amino acid sequence of the Ab fragments used in this study. The mutation point in the mutant type appears in bold text.
Figure imgf000036_0002
Figure imgf000037_0001
[0100] As a preliminary demonstration of the OIT screening strategy, the OIT library collected at 473 nm was used to analyze the composition of three Ab peptides including Ab(15-20), Ab(25-35) and the entire sequence Ab(1-42) with 6, 11, and 42 AA residues, respectively. The correlation between the vibrational spectrum of a molecule and its structure allows the structural components to be derived from its spectra or its OIT equivalent. Using a matching function (Hamming distance), the OIT of the peptide can be screened against the constructed library containing the OIT-encoded structural information of all the AAs to determine its composition (FIG. 3c). Details of the screening procedure are provided in the data analysis in the supporting information and the supplementary file. The screening process resulted in true positives/negatives when AAs were correctly detected as present/not-present and false positives/negatives when AAs were mis-assigned (FIG. 3d). All AA components of Ab(15-20) were detected with one false negative value for tyrosine (Y). For Ab(25-35), one false negative proline (P) and one false positive glycine (G) were misidentified. For the long peptide Ab(1-42), only glycine (G) and methionine (M) were incorrectly identified as being present. Since glycine (G) is the smallest among all AAs and has no side chain, its signature could be overwhelmed by other components. Nonetheless, the OIT approach can detect the vast majority of AAs present in each peptide chain, confirming the baseline sensitivity of the screening system.
[0101] Having demonstrated the potential of OIT screening, we measured the limit of detection (LOD) of the SERS-active substrate by examining its sensitivity to two single point missense mutations in Ab(25-35) and a frameshift mutation in Ab(1-42). Correct identification of the AA composition at the LOD level required the conservation of all spectral features of the peptide in its tag. Therefore, the LOD for the mutation point screening was determined at the highest Matthews correlation coefficient (MCC) between the decoded and the actual composition of the sequence. Table 3 compares the LOD results of the mutation detection for SERS and non-enhanced Raman analysis. While SERS spectra of the peptides contained similar chemical fingerprint information to conventional Raman (FIG. 10), measurements under SERS conditions required significantly less sample, which improved the LOD of Raman analysis and can be beneficial for clinical studies with limited sample. For the 11-AA peptide Ab(25-35), a replacement of lysine (K) by alanine (A) [K28A-Ab(25-35)] and of methionine (M) by glycine (G) [M35G-Ab(25-35)] were both detected at concentrations close to 2 pmol using SERS-active substrates, which was five orders of magnitude lower than with non-enhanced Raman analysis. For the full-length sequence Ab(1-42), the procedure correctly recognized the C-terminal insertion of threonine (T) [Ab(1-43)] at a concentration of 1 nmol using the SERS-active substrate, but there was a negligible improvement in detection compared to non-enhanced Raman analysis due to the increase in the sequence length and the complexity of the structure. Still, the developed OIT screening strategy was capable of extracting AA compositions and individual mutations at the sub-µmol level.
Table 3. Comparison of Raman and SERS detection limits for two single point substitution mutations of Ab(25-35) and a C-insertion mutation of the full sequence Ab(1-42) at 473 nm.
Figure imgf000038_0001
2.3. Optimized Screening Range and Sources of Excitation
[0102] The correct determination of AA composition relied on the spectral features encoded in the OITs of both individual AAs and the target peptide, which contained information from two spectral regions - the low wavenumber (LW) (200-1700 cm-1) and high wavenumber (HW) (2500-4000 cm-1) regions. To examine the relative importance of both regions in encoding the OITs, we compared the following performance metrics (sensitivity, specificity and MCC) of the AA composition screening for Ab(15-20) with the sequence of glutamine-lysine-leucine-valine-phenylalanine-phenylalanine (QKLVFF) (SEQ ID NO: 1) using only one of the two regions obtained at 473 nm excitation wavelength (Table 3). Despite the significant spectral information contained in the LW region, the overwhelming dominance of the signal at 1006 cm-1 from the aromatic ring in the side chain of phenylalanine (F) resulted in a low MCC value of 0.25 when using only the LW region for screening, and 22% of the residues detected as negative were false negatives. Therefore, the LW region of the OIT could not be used alone for composition analysis of peptides containing both aromatic and non-aromatic AAs. The ability of aromatic signals to overwhelm other signals in a spectrum has been reported in previous Raman studies of peptides, indicating the necessity of including additional spectral indicators. In contrast, this effect was not observed in the HW region of the Raman spectrum, which mainly contained the stretching vibrations of the methyl and methylene (CH3 and CH2) groups in the side chains of non-aromatic AAs. The OIT in this range accurately identified all non-aromatic AAs, but gave false identification results for ring-containing structures including tyrosine (Y) and phenylalanine (F) in Ab(15-20) (FIG. 4a). When using only the HW region, 90% (18 out of 20) of AAs were correctly identified in Ab(15-20) with two false detection values, which resulted in an MCC value of 0.69. When both ranges were accounted for in the sequence OIT, the success rate rose to 95% with one false positive and an MCC of 0.77 (Table 4). Thus, the highest level of accuracy was achieved when both regions were utilized.
Table 4. Performance comparison of composition analysis for fragment Ab(15-20) using the LW (200-1800 cm-1), HW (2500-4000 cm-1), and extended ranges at 473 nm, 633 nm, and 785 nm
Figure imgf000039_0001
[0103] AA composition result of the mutant and wild type peptides was compared to detect the mutation point. The sequence information of the wild type and mutant peptide is provided in Table 3 and the result of the mutation point detection is provided in Table 4. Table 4 demonstrates that the OIT screening strategy can be implemented under normal Raman conditions without the SERS-active substrate and delivers a sensitivity at submicromolar concentrations.
[0104] Since the Raman spectra under SERS conditions are highly dependent on the excitation condition, it is important to determine whether the use of other laser excitation wavelengths would affect the accuracy of OIT screening. Therefore, OIT libraries of AAs and Ab peptides were constructed by acquiring the complete Raman spectral range with both LW and HW regions at laser wavelengths of 633 nm and 785 nm (FIG. 11a-b). Overall, identification results using OITs obtained at 633 nm and 785 nm were less accurate than those obtained at 473 nm (FIG. 12a-b), which can be attributed to the decreased number of spectral features collected at longer excitation wavelengths (FIG. 4b). Based on the generated OITs, it was observed that the spectral patterns in the LW range remained similar for all laser wavelengths, but a lower proportion of the HW region was converted into the OITs at 633 nm and 785 nm compared to 473 nm (FIG. 13a-b). One possible explanation for the decreasing signal acquisition is the inversely proportional relationship between Raman scattering efficiency and the fourth power of the excitation laser wavelength. Another contributing factor to the less satisfactory detection result at longer laser wavelengths is the inherently lower sensitivity of the charge-coupled device (CCD) detector in the near-infrared range. Despite the added relative intensity correction in the calibration procedure to account for this issue, not all characteristic features of AAs in the HW range were recorded, and the OITs derived from longer wavelength sources had lower sensitivity and specificity. For the Ab(15-20) fragment, 7% and 20% of negatives were misidentified using 633 nm- and 785 nm-based OITs, while no false negatives were found at 473 nm (Table 4). Similarly, the rate of false positives increased from 17% at 473 nm to 20% at 633 nm and 50% at 785 nm. Therefore, careful selection of the excitation laser wavelength and the spectral region in the OITs is necessary for successful AA identification.
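As a rough numerical illustration of the wavelength dependence noted above, the following sketch estimates the scattering efficiency at each excitation line relative to 473 nm. It assumes a pure inverse-fourth-power dependence on the excitation wavelength and ignores detector response and resonance effects; the function name `relative_scattering` is hypothetical.

```python
def relative_scattering(wavelength_nm: float, reference_nm: float = 473.0) -> float:
    """Scattering efficiency relative to the reference line, assuming a 1/lambda^4 law."""
    return (reference_nm / wavelength_nm) ** 4


for wavelength in (473.0, 633.0, 785.0):
    # Prints 1.00, 0.31 and 0.13 for 473, 633 and 785 nm respectively.
    print(f"{wavelength:.0f} nm: {relative_scattering(wavelength):.2f}")
```

Under this assumption alone, the signal at 633 nm is roughly one third, and at 785 nm roughly one eighth, of the signal at 473 nm for the same laser power.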
2.4. OIT Compared with MCR-ALS
[0105] To benchmark this system against other screening systems, the AA composition derived from one of the established component analysis models, multivariate curve resolution with alternating least squares (MCR-ALS), was compared with the OIT approach (FIG. 14a-c). A more detailed comparison of these multivariate statistical models is available in the literature. Overall, MCR-ALS analysis was less accurate than the OIT system, as it failed to identify two AAs in Ab(15-20) (versus one for OIT), eight AAs in Ab(25-35) (versus two for OIT), and eleven AAs in the whole peptide Ab(1-42) (versus two for OIT). Complex algorithms such as MCR-ALS, principal component analysis, and machine learning have been tested to predict the aggregation propensity of proteins upon point mutation. However, the accuracy of these models depends substantially on large volumes of training data, which can be challenging to obtain for biological applications, whereas the SERS-enabled OIT strategy described here required only a minute quantity of sample. Furthermore, composition analysis using OITs has several advantages over other methods, including ease of library preparation and a straightforward screening procedure. These advantages are particularly important given the cost of acquiring training data and the complicated result interpretation associated with other analysis methods.
3. Conclusion
[0106] The examples above detail the establishment of a reliable method for AA identification in peptides using a screening library of binary optical tags (OITs) containing encoded Raman spectra of AAs and peptides. The specificity of the encoding system stemmed from the extended screening range, which combined the LW (200-1700 cm-1) and HW (2500-4000 cm-1) Raman spectral regions. Simultaneously, high sensitivity was achieved through the integration of a SERS-active substrate, resulting in the successful detection of point mutations of Ab peptides at picomolar concentrations. Furthermore, the OIT strategy did not require labels or destructive preparation. The direct conversion of chemical information into an electronic tag without the need for complicated intermediate processes was shown to be useful in the mutation detection of Ab peptides. Although OIT screening for AAs was used here as a proof-of-concept to identify mutation points, the system is not limited to the analysis of peptide sequences. Rather, this study provides a general avenue for designing corresponding libraries used for selection and compositional screening. Finally, the fast and straightforward strategy by which vibrational fingerprints are encoded raises the possibility that OIT libraries can be used as a new method by which automation and machine learning can be used to rapidly collect, identify, and store proteomic information.
4. Experimental Section
4.1. Chemicals
[0107] Triethylene glycol (TEG), gold(III) chloride trihydrate (HAuCl4·3H2O), polyvinylpyrrolidone (PVP, average MW 55,000), distilled ultrapure water (DI), octylamine, hydrochloric acid (HCl) 37%, 1H,1H,2H,2H-perfluorodecanethiol (PF), and N,N-dimethylformamide (DMF) were acquired from Sigma-Aldrich and used without further purification. Peptides were purchased from Sigma-Aldrich and Abcam. CaF2 Raman-grade slides were purchased from Crystran. GPL-100 solution (perfluoropolyether (PFPE)) was purchased from Chemours. TEFLON (polytetrafluoroethylene (PTFE)) membranes were obtained from Thermo Fisher Scientific.
4.2. SERS-active substrate fabrication
(A) Preparation of gold nanostars:
[0108] Icosahedral gold seeds were synthesized as previously reported with some modification. PVP (0.1 g) in TEG (25 ml) was heated to reflux in a sand bath (330 °C), after which HAuCl4·3H2O (20 mg) in TEG (2 ml) was rapidly transferred. After 10 minutes, the reaction flask was cooled to room temperature. Ethanol was then added to the flask, and the entire solution was centrifuged at 600 g twice for 30 minutes, each time discarding the precipitate and keeping the supernatant. Afterward, the supernatant was centrifuged at 1125 g for 30 minutes. The pellet was resuspended in DMF and re-centrifuged twice before the final pellet was dispersed in DMF (27 ml).
[0109] The growth solution to form stars was prepared as follows: DMF (15 ml) containing PVP (1.2 g), octylamine (40 µl), and 2.5 M HCl (60 µl) was stirred for 15 minutes. Next, the gold seed solution in DMF (1 ml, O.D. = 2.17) was pipetted into the reaction flask and stirred for 2 hours at room temperature before immersion in an 80 °C oil bath for 4 hours. The nanostars were obtained by centrifugation at 400 g for 30 minutes and washed with ethanol twice before dispersion in 2 ml ethanol solution. Finally, the nanostars were coated with PF, excess reagent was removed by centrifugation, and the nanostars were washed with ethanol five times.
(B) Final substrate fabrication:
[0110] The Teflon membrane was spin-coated with GPL-100 solution at 500 rpm for 1 minute, after which concentrated gold nanostars (2 µl) were deposited for each SERS-spot.
4.3. SERS measurements
[0111] To obtain SERS spectra of samples for binary tag conversion, a 0.01 M solution of each AA in 18 MΩ water (2 µl) was drop-coated onto a SERS-spot for the screening library. The spectra of the samples were taken after drying at room temperature. Spectra were acquired using a Horiba Aramis dispersive micro-Raman spectrometer with a motorized XYZ stage equipped with a 50x objective lens (N.A. = 0.75) in backscattering configuration mode. Three wavelengths of laser excitation (473, 633, and 785 nm) were used for the evaluation. The laser power was kept constant at around 2 mW. Before each experiment session, the system was calibrated using naphthalene as a standard reference. All spectra represented averages of 16 measurements collected across a 4x4 point grid on the substrate with a point spacing of 1 µm. A spectral range of 200-4000 cm-1, an acquisition time of 10 seconds, and 3 accumulations were used for each measurement point.
4.4. Raman spectroscopy measurements
[0112] To obtain Raman spectra of samples for comparison, the sample solid was crushed and milled to eliminate the effect of crystal orientation. CaF2 was used as the substrate for Raman measurement. Spectral data were collected in the same manner as with the SERS experiment.
4.5. Physical characterization
[0113] Gold nanostars and seeds were imaged with transmission electron microscopy (Tecnai G2 20 TWIN, FEI) after casting on a copper grid. Scanning electron microscopy (Nova 600 NanoLab, FEI) was used to image the nanostars deposited on ITO glass as well as the final SERS-active substrate. The UV-visible absorption spectra of the seed and star nanoparticles were measured using a spectrophotometer (Cary 6000, Agilent).
5. Data analysis
5.1. OIT conversion
[0114] An averaged spectrum for each sample was x-wavenumber calibrated with the reported Raman spectrum of the standard reference (FIG. 2a). The Raman shift calibrated spectrum was made compatible with a common spectral resolution of 1 cm-1 (FIG. 2b). The compatible spectrum was smoothed using a first-order Savitzky-Golay filter over a 5-point window (FIG. 2c). Background subtraction was done using asymmetric least squares with a smoothing factor of 6 and an asymmetric factor of 0.001 (FIG. 2d). Relative intensity correction was calibrated with a reference halogen-tungsten source for the chosen laser excitation wavelength and the grating (FIG. 2e). The corrected spectrum was normalized to a maximum intensity of 1 (FIG. 2f). Peak fitting was done using a Gaussian-Lorentzian formula (FIG. 2g). Feature refinement was done using second-order derivatives (FIG. 2h). The final peaks were converted to a barcode format for use as the OIT (FIG. 2i). A detailed procedure, including calibration steps and conversion steps, is also shown in FIG. 1b.
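Two of the steps listed above, bringing the spectrum onto a common 1 cm-1 grid and normalizing to a maximum intensity of 1, may be sketched as follows. This is a minimal illustration only: linear interpolation is an assumption (the interpolation scheme is not stated here), the function names `resample_1cm` and `normalize_max` and the sample values are hypothetical, and the smoothing, background subtraction, intensity correction and peak fitting steps are omitted.

```python
import math
from typing import List, Sequence, Tuple


def resample_1cm(shifts: Sequence[float], counts: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Linearly interpolate a spectrum onto an integer Raman-shift grid (1 cm-1 spacing).

    `shifts` must be strictly increasing and contain at least two points.
    """
    grid = list(range(math.ceil(shifts[0]), math.floor(shifts[-1]) + 1))
    resampled: List[float] = []
    j = 0  # index of the left end of the current interpolation interval
    for x in grid:
        while j < len(shifts) - 2 and shifts[j + 1] < x:
            j += 1
        x0, x1 = shifts[j], shifts[j + 1]
        y0, y1 = counts[j], counts[j + 1]
        resampled.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return grid, resampled


def normalize_max(values: Sequence[float]) -> List[float]:
    """Scale a spectrum so that its maximum intensity equals 1."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("spectrum has no positive intensity to normalize against")
    return [value / peak for value in values]


# Hypothetical spectrum sampled every 2 cm-1.
grid, resampled = resample_1cm([200.0, 202.0, 204.0], [10.0, 30.0, 20.0])
print(grid)                      # [200, 201, 202, 203, 204]
print(resampled)                 # [10.0, 20.0, 30.0, 25.0, 20.0]
print(normalize_max(resampled))  # maximum value is now 1.0
```

Placing every library member and every peptide on the same integer grid is what allows their binary tags to be compared position by position later on.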
5.2. Structural AA components detection
[0115] First, the Hamming distance was used to count the total number of features (Raman wavenumbers) in the binary sequence of the OIT representing the AA component that would need to change for it to be identical to the binary sequence of the peptide.
$$d_{HAD}(A, B) = \sum_{k=0}^{N-1} \left[ y_{A,k} \neq y_{B,k} \right] \quad (1)$$
where k is the index of the wavenumber compared between spectrum A (individual AA) and spectrum B (peptide), and y is the binary value at index k out of the total N wavenumbers (the size of the sequence) recorded in the barcode.
A demonstration of the process for screening Aβ(15-20) and the aromatic component is shown in FIG. 3c.
[0116] The similarity score was calculated based on the structural similarity features between the scanned OIT sequence and the peptide and was normalized with respect to the sequence size (N).
$$\text{similarity score} = \frac{N - d_{HAD}}{N} \times 100 \quad (2)$$
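A short Python sketch of the two quantities described above, assuming the similarity score is the fraction of matching barcode positions expressed as a percentage. The function names and the toy sequences are hypothetical and are not taken from the measured data.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length binary sequences differ."""
    if len(a) != len(b):
        raise ValueError("OIT sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def similarity_score(a, b):
    """Percentage of positions that agree, normalized by the sequence size N."""
    n = len(a)
    return (n - hamming_distance(a, b)) / n * 100


# Toy 8-position barcodes (illustrative values only).
aa_oit = [1, 0, 1, 1, 0, 0, 1, 0]
peptide_oit = [1, 0, 0, 1, 0, 1, 1, 0]

distance = hamming_distance(aa_oit, peptide_oit)   # 2 positions differ
score = similarity_score(aa_oit, peptide_oit)      # 75.0
```

Because both sequences must share the same wavenumber grid, the length check guards against comparing OITs that were built at different spectral resolutions or ranges.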
[0117] A similar process was repeated for all the AAs in the library. The Raman spectra of all AAs and the peptide in the screening process were collected using the same experimental conditions.
The lower limit of the similarity score used to determine the AA composition of the reference peptide was set at the value giving the highest MCC. The AA composition of the mutated peptide was derived from the AAs scoring above the limit calculated for the reference peptide.
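The library screening described in [0117] can be sketched as a loop over reference OITs. This is an illustrative sketch only: the library entries, their barcodes, the limit value, and the choice to treat the limit as inclusive are all assumptions.

```python
def similarity_score(a, b):
    """Percentage of positions at which two equal-length barcodes agree."""
    if len(a) != len(b):
        raise ValueError("OIT sequences must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a) * 100


def screen_library(peptide_oit, library, limit):
    """Return (name, score) pairs at or above the limit, highest score first.

    library maps a component name to its reference OIT (list of 0/1 values).
    """
    hits = []
    for name, reference_oit in library.items():
        score = similarity_score(reference_oit, peptide_oit)
        if score >= limit:
            hits.append((name, score))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical 4-position barcodes; real OITs span the full wavenumber grid.
library = {
    "Phe": [1, 0, 1, 1],
    "Gly": [0, 1, 0, 0],
    "Val": [1, 0, 1, 0],
}
peptide_oit = [1, 0, 1, 1]

hits = screen_library(peptide_oit, library, limit=70)
```

With these toy values the screen retains "Phe" (score 100.0) and "Val" (score 75.0) and rejects "Gly" (score 0.0).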
5.3. The performance metrics
[0118] The Matthews correlation coefficient (MCC) was used to measure the correlation between the predicted composition and the actual composition, based on the true positive/negative (TP/TN) and false positive/negative (FP/FN) counts.
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
Sensitivity and specificity were used to evaluate the efficiency of the screening method for optimization.
$$\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{TN + FP}$$
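The metrics in this section, together with the selection of the similarity-score limit at the highest MCC described in section 5.2, can be sketched in Python using the standard definitions of MCC, sensitivity, and specificity. The function names, the candidate limits, and the toy scores are hypothetical; returning 0.0 when the MCC denominator is zero is a common convention that is assumed here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: an undefined MCC (zero denominator) is reported as 0.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def sensitivity(tp, fn):
    """Fraction of components actually present that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of absent components that were correctly rejected."""
    return tn / (tn + fp)


def best_limit(scores, actual, candidates):
    """Pick the candidate limit with the highest MCC (first one wins on ties).

    scores maps each library component to its similarity score;
    actual is the set of components truly present (a subset of scores).
    """
    best = None
    for limit in candidates:
        predicted = {name for name, s in scores.items() if s >= limit}
        tp = len(predicted & actual)
        fp = len(predicted - actual)
        fn = len(actual - predicted)
        tn = len(scores) - tp - fp - fn
        value = mcc(tp, tn, fp, fn)
        if best is None or value > best[1]:
            best = (limit, value)
    return best


# Toy reference peptide: components A and B are present, C and D are not.
scores = {"A": 90, "B": 80, "C": 60, "D": 40}
limit, value = best_limit(scores, actual={"A", "B"}, candidates=[50, 70, 85])
```

In this toy case a limit of 70 separates the present and absent components perfectly, so it is selected with an MCC of 1.0; the limits 50 and 85 each misclassify one component and score lower.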
5.4. MCR-ALS analysis
[0119] Normalized Raman spectra of AAs and peptides were used as input for Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) analysis using the MCR-ALS 2.0 toolbox. All spectra were x-waveshift calibrated and relative intensity corrected.
[0120] Various examples have been described. These and other examples are within the scope of the following claims.

Claims

WHAT IS CLAIMED IS:
1. A method for identifying one or more components of a composition, the method comprising:
(a) receiving a Raman spectrum of a composition comprising one or more analytes of unknown chemical identity;
(b) converting the Raman spectrum of the composition to an optical identification tag (OIT) comprising a barcode, wherein converting includes correcting the spectrum and processing the corrected spectrum into the barcode by performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks and the position and width of each bar aligns to a characteristic spectral peak;
(c) comparing the OIT of the converted spectrum to a reference OIT of a screening library and calculating a binary similarity measure or a binary distance measure between the compared OITs, wherein the screening library comprises a plurality of reference OITs of components of known chemical identity; and
(d) matching the OIT of the converted spectrum and one or more reference OITs based on the calculated distance measure being less than a threshold value or the calculated similarity measure being greater than a threshold value; and thereby identifying one or more components present in the composition.
2. The method of claim 1, wherein the received Raman spectrum is collected using an excitation wavelength selected from the group consisting of 473-, 532-, 633-, and 785- nm.
3. The method of claim 1, where the range of wavenumbers of the Raman spectrum used in the converting step is selected from the group consisting of 200-4000 cm⁻¹, 200-1800 cm⁻¹, and 2500-4000 cm⁻¹, or a combination thereof.
4. The method of claim 3, wherein correcting the spectrum comprises:
(a) calibrating the x-axis of the spectrum;
(b) transforming the spectral resolution of the calibrated spectrum to the spectral resolution of the OITs of the screening library;
(c) smoothing the transformed spectrum;
(d) subtracting the background of the smoothed spectrum; and
(e) correcting the relative intensity of the background corrected spectrum.
5. The method of claim 3, wherein processing the corrected spectrum comprises:
(a) normalizing the corrected spectrum to a maximum intensity of 1;
(b) performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks by analyzing the vibrational strength of peaks of the normalized spectrum to eliminate weak vibrational peaks;
(c) refining the remaining peaks using second order derivatives; and
(d) converting the positive values to the bars of the barcode.
6. The method of claim 1, further comprising creating the screening barcode library of reference OITs by:
(a) receiving Raman spectrum of one or more reference molecules; and
(b) converting the Raman spectrum of each reference molecule to a reference OIT, wherein the reference OIT comprises a barcode, wherein the position and width of each bar aligns to a peak of the Raman spectrum of the reference molecule.
7. The method of claim 1, wherein the Raman spectrum is a Surface-Enhanced Raman Scattering (SERS) spectrum obtained from a sample immobilized on a SERS-active substrate.
8. The method of claim 7, wherein the SERS-active substrate is solvophobic or omniphobic.
9. The method of claim 1, wherein the converting step includes encoding the Raman spectrum by:
(a) calibrating the x-axis of the spectrum;
(b) transforming the spectral resolution of the calibrated spectrum to the spectral resolution of the OITs of the screening library;
(c) smoothing the transformed spectrum;
(d) subtracting the background of the smoothed spectrum;
(e) correcting the relative intensity of the background corrected spectrum;
(f) normalizing the corrected spectrum to a maximum intensity of 1;
(g) performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks by analyzing the vibrational strength of peaks of the normalized spectrum to eliminate weak vibrational peaks;
(h) refining the remaining peaks using second order derivatives; and
(i) converting the positive values to the bars of the barcode.
10. The method of claim 1, wherein the one or more analytes are selected from the group consisting of amino acids of a polypeptide, an active pharmaceutical ingredient or excipient of a pharmaceutical dosage form and an adulterant in a pharmaceutical composition, food, or beverage.
11. A system for identifying one or more components of a composition, the system comprising:
(a) a Raman spectrometer for collecting a Raman spectrum of a composition comprising one or more components, wherein the one or more components are unlabeled analytes;
(b) an optical identification tag (OIT) converter configured to receive and convert the Raman spectrum to an OIT comprising a barcode, wherein the configuration includes instructions for correcting the spectrum and processing the corrected spectrum into the barcode by performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks, and wherein the position and width of each bar aligns to a characteristic spectral peak; and
(c) an OIT analyzer configured to:
(i) compare the OIT of the converted spectrum to a reference OIT of a screening library wherein the screening library comprises a plurality of reference OITs of components of known chemical identity;
(ii) calculate a binary similarity measure or binary distance measure for the comparison between the compared OITs; and
(iii) match the OIT of the converted spectrum and one or more reference OITs based on the calculated distance measure being less than a threshold value or the calculated similarity measure being greater than a threshold value; and thereby identify one or more components present in the composition.
12. The system of claim 11, wherein correcting the spectrum includes
(a) calibrating the x-axis of the spectrum;
(b) transforming the spectral resolution of the calibrated spectrum to the spectral resolution of the OITs of the screening library;
(c) smoothing the transformed spectrum; and
(d) subtracting the background of the smoothed spectrum;
and wherein processing the corrected spectrum includes:
(e) correcting the relative intensity of the background corrected spectrum;
(f) normalizing the corrected spectrum to a maximum intensity of 1;
(g) performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks by analyzing the vibrational strength of peaks of the normalized spectrum to eliminate weak vibrational peaks;
(h) refining the remaining peaks using second order derivatives; and
(i) displaying the positive values as bars of the barcode.
13. The system of claim 10, wherein the Raman spectrometer is configured for Surface-Enhanced Raman Scattering (SERS).
14. The system of claim 13, further comprising a SERS-active substrate comprising SERS-active particles immobilized on a solvophobic or omniphobic solid support.
15. The system of claim 14, wherein the SERS-active particles include silver, gold, copper, aluminum, platinum, palladium, or a combination thereof.
16. The system of claim 14, wherein the solvophobic or omniphobic solid support comprises a perfluoropolyether coating.
17. The system of claim 16, wherein the solvophobic or omniphobic solid support comprises a material selected from the group consisting of silicon, polymers, glass, silicon nitride, quartz, ceramics, sapphire, and metals, or a combination thereof.
18. The system of claim 10, wherein the Raman spectrometer includes a laser source having an excitation wavelength in the range of about 400 nm to about 1000 nm.
19. The system of claim 10, wherein the range of wavenumbers of the Raman spectrum received by the OIT converter is selected from the group consisting of 200-4000 cm⁻¹, 200-1800 cm⁻¹, and 2500-4000 cm⁻¹.
20. A method for identifying one or more components of a composition, wherein at least one of the components is present at a submicromolar concentration, the method comprising:
(a) receiving Raman spectrum of a composition comprising one or more components of unknown chemical identity; wherein the Raman spectrum is a SERS-active Raman spectrum obtained by depositing a solution of the composition on a SERS-active substrate;
(b) converting the Raman spectrum of the composition to an optical identification tag (OIT) comprising a barcode, wherein converting includes correcting the spectrum and processing the corrected spectrum into the barcode by performing a curve-fitting calculation to obtain a plurality of characteristic spectral peaks, wherein the position and width of each bar aligns to a characteristic spectral peak;
(c) comparing the OIT of the converted spectrum to a reference OIT of a screening library and calculating a binary similarity measure or a binary distance measure between the compared OITs, wherein the screening library comprises a plurality of reference OITs of components of known chemical identity; and
(d) matching the OIT of the converted spectrum and one or more reference OITs based on the calculated distance measure being less than a threshold value or the calculated similarity measure being greater than a threshold value; and thereby identifying one or more components present in the composition.
PCT/IB2020/055808 2019-09-20 2020-06-19 Encoding raman spectral data in optical identification tags for analyte identification WO2021053409A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962903314P 2019-09-20 2019-09-20
US62/903,314 2019-09-20

Publications (1)

Publication Number Publication Date
WO2021053409A1 (en) 2021-03-25

Family

ID=71738204

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2020/055808 WO2021053409A1 (en) 2019-09-20 2020-06-19 Encoding raman spectral data in optical identification tags for analyte identification

Country Status (1)

Country Link
WO (1) WO2021053409A1 (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1687963A (en) * 2005-03-24 2005-10-26 华中科技大学 Method for coding and identifying resin microsphere by infrared spectrum
US20110087439A1 (en) * 2009-10-09 2011-04-14 Ziegler Lawrence D Systems and methods for identifying materials utilizing multivariate analysis techniques
US20110212512A1 (en) * 2005-12-19 2011-09-01 Hong Wang Monitoring network based on nano-structured sensing devices
CN103186803A (en) * 2013-03-19 2013-07-03 南京大学 Raman-spectrum-based nanometer bar code smart label and identification method thereof
US20150185156A1 (en) * 2012-07-31 2015-07-02 Northwestern University Dispersible Surface-Enhanced Raman Scattering Nanosheets
US20160139052A1 (en) * 2014-11-19 2016-05-19 National Chung Cheng University Microfluidic biosensing system
CN106706594A (en) * 2016-12-23 2017-05-24 中国农业科学院农产品加工研究所 Method for achieving simultaneous detection of chemical structure and biological activity of functional component of food
AU2016201510A1 (en) * 2016-03-09 2017-09-28 Monash University Plasmene Nanosheets & Methods of Synthesis Thereof
US20180180482A1 (en) * 2016-12-26 2018-06-28 Nuctech Company Limited Raman spectrum-based object inspection apparatus and method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115060631A (en) * 2022-07-14 2022-09-16 长光辰英(杭州)科学仪器有限公司 Self-adaptive particle Raman similarity discrimination method
CN117877007A (en) * 2024-03-12 2024-04-12 生态环境部华南环境科学研究所(生态环境部生态环境应急研究所) Quick detection method for perfluorooctane sulfonate
CN117877007B (en) * 2024-03-12 2024-05-10 生态环境部华南环境科学研究所(生态环境部生态环境应急研究所) Quick detection method for perfluorooctane sulfonate

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20743793; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20743793; Country of ref document: EP; Kind code of ref document: A1)