WO2014094039A1 - A background correction method for a spectrum of a target sample - Google Patents
A background correction method for a spectrum of a target sample
- Publication number
- WO2014094039A1 (PCT/AU2013/001472)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- spectrum
- background
- signal
- points
- correction method
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/27—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands using photo-electric detection ; circuits for computing concentration
- G01N21/274—Calibration, base line adjustment, drift correction
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
- G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
- G01N2021/3595—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using FTIR
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/62—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
- G01N21/63—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
- G01N21/65—Raman scattering
Definitions
- the present invention relates to methods for background correction of the spectrum of a target sample.
- the invention more particularly relates to methods for background correction of the spectrum of a target sample which do not require a de-noising or smoothing step before background removal to produce the background corrected spectrum.
- vibrational spectroscopy including for example, Fourier Transform Infrared and Raman
- Vibrational spectroscopy is a nondestructive technique, routinely used to qualitatively and quantitatively analyse materials by identifying their native structures and structural impurities. Vibrational spectroscopy can also be used to investigate the thermodynamics and phase equilibrium of a variety of materials.
- vibrational spectroscopy is a strong candidate for mapping biological samples because it can differentiate biological components by the differences in their resonance nature, especially the position of signal peaks in vibrational spectra and their relative intensities, which are related to the quantity of each molecular structure.
- Vibrational signals from biological samples typically have lower signal-to-noise ratios and a higher magnitude of background arising mainly from autofluorescence. This arises particularly in relation to biological samples, due to the sensitivity of the biological samples to the incident wavelength of the laser. Since the existence of the background suppresses the main spectrum, interpretation becomes very difficult. Accordingly, a background correction method must be applied to the spectrum before performing any detailed analysis of the spectra obtained from vibrational spectroscopy.
- a sample signal can be considered as an array (S) that can be given as: $S = \mathit{PS} + B + N$
- PS, B and N are related to the noiseless signal without background (pure spectrum), the background and the noise, respectively.
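- The decomposition of a sample signal into pure spectrum, background and noise can be illustrated with a short synthetic example. This is only a sketch, assuming an additive model S = PS + B + N; the function names, the peak parameters, the linear background and the SNR convention (RMS of the pure spectrum divided by RMS of the noise) are all hypothetical choices made for illustration and are not taken from the source.

```python
import numpy as np

def gaussian(x, centre, width, height):
    """Single Gaussian band; width is the standard deviation."""
    return height * np.exp(-0.5 * ((x - centre) / width) ** 2)

def make_spectrum(x, peaks, background, snr, rng):
    """Return (S, PS, B, N) for an assumed additive model S = PS + B + N."""
    ps = sum(gaussian(x, c, w, h) for c, w, h in peaks)
    b = background(x)
    # Assumed SNR convention: RMS of the pure spectrum over RMS of the noise.
    noise_rms = np.sqrt(np.mean(ps ** 2)) / snr
    n = rng.normal(0.0, noise_rms, size=x.shape)
    return ps + b + n, ps, b, n

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1000.0, 2000)
# Hypothetical peaks given as (centre, width, height).
peaks = [(200.0, 8.0, 1.0), (450.0, 12.0, 0.6), (700.0, 6.0, 0.8)]
# Hypothetical linear background.
s, ps, b, n = make_spectrum(x, peaks, lambda v: 0.002 * v + 0.5, 25.0, rng)
```

- Background correction then amounts to recovering an estimate of `ps` from `s` alone, without access to `b` or `n`.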
- noise and background signals must be removed from the experimental spectra.
- the reasons for requiring removal of noise and background from an experimentally obtained spectrum are diverse and application-dependent; however, in most cases it is necessary to apply a background correction algorithm to increase the effective resolution for quantitative analyses. Accordingly, significant efforts have been directed towards determining approaches for the removal of background from signals with higher accuracy, independent of signal nature and human error.
- BCMs background correction methods
- SNR signal-to-noise ratio
- NMM noise median method
- SRM signal removal method
- TBC threshold based classification
- each of the aforementioned groups of BCMs has associated disadvantages.
- a limitation of the first group of BCMs is that the existing noise in the signal requires de-noising and smoothing as a precursor to engaging in any background removal. This is because most BCMs in this group (e.g. SRM or TBC) essentially employ the derivative of a signal that is estimated numerically, which is impractical to calculate on noisy data without first employing a smoothing process.
- applying a smoothing procedure as a precursor to background removal can introduce unnecessary errors into the signals depending on the de-noising methodology employed. An erroneous de-noising process can further result in peak shifts or even peak suppression in the case of low SNRs.
- the frequency-based methods divide the signal into components based on frequency. This is difficult because components in real signals do not have constant frequencies, i.e. each frequency component is not a single value but spans a range of frequencies, making selection of thresholds a challenge. Therefore, deconstructing the spectrum into different frequencies followed by rebuilding the final spectrum may leave traces of noise or background.
- a background correction method for a spectrum of a target sample including the following steps: (a) inputting the spectrum including a plurality of signal peaks attributable to spectral data, background and noise data; (b) estimating a signal-to-noise ratio (SNR) for the spectrum; (c) determining a position of each of the plurality of signal peaks by approximating a derivative of the spectrum using a wavelet transform (WT); (d) removing the plurality of signal peaks to identify the background; (e) subtracting the background from the spectrum to obtain a background corrected spectrum; and (f) outputting the background corrected spectrum as a target sample signal.
- the position of at least most of the plurality of signal peaks may be determined by applying a second order derivative.
- the wavelet transform (WT) used to approximate the second order derivative of the spectrum is a "Mexican Hat" mother wavelet.
- the wavelet transform (WT) may be a continuous wavelet transform (CWT) or a discrete wavelet transform (DWT).
- the step of estimating a signal-to-noise ratio for the spectrum includes the following steps: (a) dividing the spectrum into a plurality of segments; (b) estimating a standard deviation for at least most of the plurality of segments; (c) estimating the background of the segments using a minimum estimated standard deviation; (d) calculating the root mean square (RMS) of a total background signal for a totality of segments of the spectrum; (e) calculating the root mean square (RMS) of the spectrum; and (f) calculating the signal-to-noise ratio.
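- The segment-based SNR estimation can be sketched as follows. This is one plausible reading of the steps above, not a definitive implementation: the segment size, the linear fit to the quietest segment (suggested by the description of Figures 17c and 17d) and the final ratio of RMS values are assumptions, and the name `estimate_snr` is hypothetical.

```python
import numpy as np

def estimate_snr(y, segment_size=50):
    """Estimate SNR from the segment with the smallest standard deviation."""
    y = np.asarray(y, dtype=float)
    n_seg = len(y) // segment_size
    # Divide the spectrum into equal segments (any remainder is dropped).
    segments = y[: n_seg * segment_size].reshape(n_seg, segment_size)
    # Standard deviation of every segment; the minimum marks a peak-free region.
    stds = segments.std(axis=1)
    quiet = segments[np.argmin(stds)]
    # Fit a straight line to the quiet segment as its local background.
    idx = np.arange(segment_size)
    slope, intercept = np.polyfit(idx, quiet, 1)
    noise = quiet - (slope * idx + intercept)
    # Assumed definition: RMS of the whole spectrum over RMS of the noise.
    noise_rms = np.sqrt(np.mean(noise ** 2))
    signal_rms = np.sqrt(np.mean(y ** 2))
    return signal_rms / noise_rms

rng = np.random.default_rng(1)
x = np.arange(1000)
peak = np.exp(-0.5 * ((x - 500) / 20.0) ** 2)
snr_quiet = estimate_snr(peak + rng.normal(0.0, 0.01, x.size))
snr_noisy = estimate_snr(peak + rng.normal(0.0, 0.1, x.size))
```

- In this toy case the estimate falls as the added noise grows, which is the behaviour needed for the estimate to guide later steps.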
- removing the plurality of signal peaks to identify the background includes the following steps: (a) determining a start point and a finish point corresponding to each signal peak by calculating zero crossing points corresponding to the start points and finish points by applying a second order derivative; (b) dividing the spectrum into sections each section corresponding to an individual signal peak or an individual feature comprising merged multiple signal peaks based on the start and finish points; (c) calculating an area for each section; and (d) selecting a minimum area corresponding to the section having a largest signal peak in the spectrum and using the minimum area to define a minimum threshold; wherein any signal peak having an area less than the minimum threshold constitutes background.
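- Steps (a) to (c) above can be sketched as follows, assuming a second-derivative estimate `d2` is already available. The helper names are hypothetical, the area uses a plain trapezoidal rule, and the threshold rule of step (d) is deliberately left out because its exact form is not spelled out here.

```python
import numpy as np

def zero_crossings(d2):
    """Indices i where d2 changes sign between samples i and i + 1.

    Samples that are exactly zero are not treated as crossings in this sketch.
    """
    sign = np.sign(d2)
    return np.where(sign[:-1] * sign[1:] < 0)[0]

def section_areas(x, y, d2):
    """Split the spectrum at zero crossings of d2 and integrate each section."""
    cuts = np.concatenate(([0], zero_crossings(d2) + 1, [len(y)]))
    sections = []
    for start, stop in zip(cuts[:-1], cuts[1:]):
        if stop - start < 2:
            continue  # too short to integrate
        xs, ys = x[start:stop], y[start:stop]
        area = float(np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)))
        sections.append((int(start), int(stop), area))
    return sections

# Noise-free Gaussian with its analytic second derivative (zeros at x = -1, +1).
x = np.linspace(-10.0, 10.0, 2000)
y = np.exp(-0.5 * x ** 2)
d2 = (x ** 2 - 1.0) * y
sections = section_areas(x, y, d2)
```

- For this single peak the zero crossings give three sections, and the middle one carries the core of the peak; sections with small area would be the candidates for background.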
- Subtracting the background from the spectrum to obtain a background corrected spectrum may be preceded by the step of minimising the effect of a first and second endpoint of the spectrum.
- the step of minimising the effect of a first and second endpoint of the spectrum includes the following steps: (a) extending the spectrum from the first endpoint corresponding to the start of the signal by adding signal points based on the slope of the signal adjacent to the first endpoint; and (b) extending the spectrum from the second endpoint corresponding to the end of the signal by adding signal points based on the slope of the signal adjacent to the second endpoint.
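- A minimal sketch of extending a spectrum beyond both endpoints using the local slope is given below. The number of added points, the number of points used to estimate the slope and the use of a straight-line fit are assumptions; the function name `extend_ends` is hypothetical.

```python
import numpy as np

def extend_ends(y, n_extra, n_fit=10):
    """Pad a spectrum on both sides by linear extrapolation of its end slopes."""
    y = np.asarray(y, dtype=float)
    idx = np.arange(n_fit)
    # Straight-line fits to the first and the last n_fit points.
    start_slope, start_icpt = np.polyfit(idx, y[:n_fit], 1)
    end_slope, end_icpt = np.polyfit(idx, y[-n_fit:], 1)
    # Continue each line outwards from its endpoint.
    left = start_icpt + start_slope * np.arange(-n_extra, 0)
    right = end_icpt + end_slope * np.arange(n_fit, n_fit + n_extra)
    return np.concatenate((left, y, right))

ramp = 2.0 * np.arange(50) + 1.0
extended = extend_ends(ramp, n_extra=5)
```

- After a transform has been applied to the extended array, the added points would be discarded again so that the result has the original length.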
- the spectrum is a vibrational spectrum.
- the vibrational spectrum may be a Raman spectrum or an Infrared spectrum.
- the vibrational spectrum is a Fourier Transform Infrared (FTIR) spectrum.
- the background correction method of the present invention may be applied to a spectrum collected by vibrational spectroscopy or microscopy.
- an apparatus for producing a background corrected spectrum of a sample the sample being obtained by a spectroscopic device including a light source and a detector assembly for detecting photons scattered by the sample when illuminated by the light source, the apparatus including: a processor configured to execute a machine readable code to perform the following steps: (i) inputting a spectrum including a plurality of signal peaks attributable to spectral data, background and noise data; (ii) estimating a signal-to-noise ratio (SNR) for the spectrum; (iii) determining a position of each of the plurality of signal peaks by approximating a derivative of the spectrum using a wavelet transform (WT); (iv) removing the plurality of signal peaks to identify the background; (v) subtracting the background from the spectrum to obtain a background corrected
- the spectrum is a vibrational spectrum.
- the vibrational spectrum may be a Raman spectrum or an Infrared spectrum.
- the vibrational spectrum is a Fourier Transform Infrared (FTIR) spectrum.
- Figure 1 exemplifies the spectra generated artificially for the purpose of modelling the background correction method of the present invention.
- Figure 1a) shows the simulated spectrum having ten peaks with no background and no noise.
- Figure 1b) shows the same simulated spectrum as Figure 1a) with noise added at an SNR of 25.
- Figure 1c) shows a linear simulated background.
- Figure 1d) shows the simulated spectrum with noise of Figure 1b) and the linear background of Figure 1c) added.
- Figure 1e) shows a sigmoidal simulated background.
- Figure 1f) shows the simulated spectrum with noise of Figure 1b) and the sigmoidal background of Figure 1e) added.
- Figure 1g) shows a sinusoidal simulated background.
- Figure 1h) shows the simulated spectrum with noise of Figure 1b) and the sinusoidal background of Figure 1g) added.
- Figure 2 shows a general flow chart for the background correction method of the present invention.
- Figure 3 shows a more detailed version of the background correction method of Figure 2.
- Figure 4 shows a detailed flowchart corresponding to step 310 of the flowchart shown in Figure 3 relating to preliminary data processing.
- Figure 5 shows a detailed flowchart corresponding to step 320 of the flowchart shown in Figure 3 relating to estimating SNR.
- Figure 6 shows a detailed flowchart corresponding to step 330 of the flowchart shown in Figure 3 relating to calculating the 2nd derivative of the spectrum and correcting its end effects and noise.
- Figure 7 shows a detailed flowchart corresponding to step 340 of the flowchart shown in Figure 3 relating to peak removal and finding background points.
- Figure 8 shows a detailed flowchart corresponding to step 350 of the flowchart shown in Figure 3 relating to adjusting end points effects.
- Figure 9 shows end point adjustment in accordance with Condition 2.
- Figure 9a shows the full original spectrum before adjusting the end points.
- Figure 9b shows the magnified regions of the start points after adjusting the end points.
- Figure 9c shows the magnified regions of the finish points after adjusting the end points.
- Figure 10 shows end point adjustment in accordance with Condition 3.
- Figure 10a shows the full original spectrum before adjusting the end points.
- Figure 10b shows the magnified regions of the start points after adjusting the end points.
- Figure 10c shows the magnified regions of the finish points after adjusting the end points.
- Figure 11 shows end point adjustment in accordance with Condition 4.
- Figure 11a shows the full original spectrum before adjusting the end points.
- Figure 11b shows the magnified regions of the start points after adjusting the end points.
- Figure 11c shows the magnified regions of the finish points after adjusting the end points.
- Figure 12 shows a detailed flowchart corresponding to step 360 of the flowchart shown in Figure 3 relating to fitting and adjustments.
- Figures 13a) to 13f) show a single artificially synthesized Gaussian peak without noise and with SNR values of 10, 20, 30, 40 and 50 respectively.
- Figure 14 shows the variation of correlation coefficient (r) with SNR.
- Figure 15 shows a calibration curve for the signals shown in Figure 14.
- Figure 16 shows the effect of peak width on Best-Scale in different SNR values.
- Figure 17 shows the signal-to-noise (SNR) estimation in accordance with an embodiment of the present invention.
- Figure 17a shows the simulated Raman spectrum with ten peaks randomly distributed on a signal with sigmoidal background, wherein the shaded region represents the segment size for calculating the standard deviation (STD).
- Figure 17b shows the STD for different segments of the spectrum.
- Figure 17c shows the spectrum in the segment having a minimum STD, wherein the line shows a linear fitting of the spectrum in the segment to identify the background.
- Figure 17d shows the estimated noise profile obtained by subtracting the linear background and the spectrum.
- Figure 17e shows the different smoothing levels of the spectrum.
- Figure 18 shows the signal-to-noise (SNR) estimation in accordance with another embodiment of the present invention.
- Figure 18a shows the simulated Raman spectrum with ten peaks randomly distributed on a signal with linear background, wherein the shaded region represents the segment size for calculating the STD.
- Figure 18b shows the STD for different segments of the spectrum.
- Figure 18c shows the spectrum in the segment having a minimum STD, wherein the line shows a linear fitting of the spectrum in the segment to identify the background.
- Figure 18d shows the estimated noise profile obtained by subtracting the linear background and the spectrum.
- Figure 18e shows the different smoothing levels of the spectrum.
- Figure 19a shows the synthetic linear spectrum with start and finish points determined.
- Figure 19e shows the original spectrum of synthetic linear data together with the background corrected spectrum of synthetic linear data.
- Figure 20a shows the simulated Raman spectrum with ten peaks randomly distributed on a signal with sinusoidal background, wherein the shaded region represents the segment size for calculating the STD.
- Figure 20b shows the STD for different segments of the spectrum.
- Figure 20c shows the spectrum in the segment having a minimum STD, wherein the line shows a linear fitting of the spectrum in the segment to identify the background.
- Figure 20d shows the estimated noise profile obtained by subtracting the linear background and the spectrum.
- Figure 20e shows the different smoothing levels of the spectrum.
- Figure 21a shows the synthetic sinusoidal spectrum with start and finish points determined.
- Figure 21e shows the original spectrum of synthetic sinusoidal data together with the background corrected spectrum of synthetic sinusoidal data.
- Figure 22 shows the accuracy tests for the SNR estimation algorithm.
- Figure 22a shows the variation of SNR with smoothing.
- Figure 22b shows the effect of background intensity, where the background intensity ratio is calculated by dividing the intensity of the highest peak in the spectrum with the background by the intensity of the spectrum without the background.
- Figure 23a shows the synthetic spectrum with ten peaks randomly distributed on a signal with an SNR of 20 with no end effect correction.
- Figure 23b shows the second derivative of the spectrum without end effect correction through wavelet transform at a Best Scale of 17.
- Figure 23c shows extension of the spectrum from the first and second end points comprising end effect correction of the spectrum.
- Figure 23d shows the second derivative of the spectrum with end effect correction through wavelet transform at a Best Scale of 17.
- Figure 24b shows the numerical second derivative of the spectrum of Figure 24a).
- Figure 24c shows the second derivative of the spectrum through wavelet transform at a Best Scale of 17.
- Figure 24d shows the squared second derivative to suppress noise.
- Figure 25 shows the variation of the degree of separation (R) with position in two similar Gaussian peaks.
- Figure 25a shows two Gaussian peaks where the value of R is 0.37 together with their second and third derivatives.
- Figure 25b shows two Gaussian peaks where the value of R is 1.11 together with their second and third derivatives.
- Figure 25c shows two Gaussian peaks where the value of R is 2.60 together with their second and third derivatives.
- Figure 26a shows the start and finish points for each peak signal in a spectrum with background.
- Figure 26b shows the start point condition of the spectrum.
- Figure 26c shows the finish point condition of the spectrum.
- Figure 26d shows the background points and their fittings to determine the background.
- Figure 26e shows the original spectrum together with the background corrected spectrum obtained by subtracting the background from the unprocessed spectrum.
- Figure 27 shows the root mean squared error (RMSE) obtained over 900 iterations of the background correction method of the present invention and the distribution of the RMSE with the number of iterations.
- RMSE root mean squared error
- Figure 28 shows the variation of RMSE with the number of signal peaks in the spectrum.
- Figure 29 shows the variation of RMSE with the SNR in the spectrum.
- Figure 30 shows examples of the application of the background correction method of the present invention for experimentally obtained real Raman spectra.
- Figure 30a shows application of the background correction method for experimentally obtained Raman spectra for serine amino acid.
- Figure 30b shows application of the background correction method for experimentally obtained Raman spectra for rhodamine.
- Figure 30c shows application of the background correction method for experimentally obtained Raman spectra for methyl red.
- Figure 30d shows application of the background correction method for experimentally obtained Raman spectra for crystal violet.
- Figure 31 is a schematic diagram showing various functional elements of a computer-enabled system for performing the background correction method of the present invention in block form.
- Wavelet transforms, like Fourier Transforms (FT), are a convolution between a wavelet function (ψ) and a signal (x(t)).
- FT Fourier Transforms
- ψ wavelet function
- x(t) signal
- b is the translation parameter and a represents dilation (which is always a positive integer). If a is taken to represent an average frequency, b indicates the position of the wavelet window.
- WT can be divided into two main categories, Continuous Wavelet Transform (CWT) and Discrete Wavelet Transform (DWT) and can be defined as: where the asterisk ( * ) represents complex conjugation. This equation can also be given as:
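- For orientation only, the continuous wavelet transform of a signal x(t) is conventionally written in the textbook form below, using the symbols already introduced (ψ, a, b and the asterisk for complex conjugation). The symbol W(a, b), the helper ψ_{a,b} and the 1/√a normalisation are assumptions of this illustration and may differ from the notation of the equations referred to above.

$$
W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt
        = \int_{-\infty}^{\infty} x(t)\, \psi_{a,b}^{*}(t)\, dt,
\qquad
\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t - b}{a}\right)
$$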
- the nth derivative of a signal can be estimated using a Gaussian wavelet applied n times to the spectrum, or a proper nth derivative of the Gaussian function.
- noise and spectrum have different frequencies where lower frequency components are related to the higher dilation (scale) numbers. Accordingly, the side effects of noise of the transformed signal can be suppressed by increasing the dilation. This allows the derivative of a noisy spectrum to be approximated by reducing the influence of noise.
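- The use of a wavelet at a chosen dilation as a noise-tolerant derivative estimate can be sketched with plain NumPy. This is an illustration under stated assumptions, not the method's actual code: the Mexican Hat is sampled directly, truncated at five scale units on each side, and applied by discrete convolution, and the function names are hypothetical. Because the Mexican Hat is proportional to the negative second derivative of a Gaussian, the response is proportional to the negative of a smoothed second derivative.

```python
import numpy as np

def mexican_hat(scale, half_width=5):
    """Sampled Mexican Hat wavelet at an integer scale (dilation)."""
    t = np.arange(-half_width * scale, half_width * scale + 1) / scale
    return (1.0 - t ** 2) * np.exp(-0.5 * t ** 2) / np.sqrt(scale)

def wavelet_second_derivative(y, scale):
    """Smoothed, sign-flipped second-derivative estimate (up to a factor)."""
    return np.convolve(y, mexican_hat(scale), mode="same")

x = np.arange(1001)
y = np.exp(-0.5 * ((x - 500) / 15.0) ** 2)
response = wavelet_second_derivative(y, scale=10)
peak_index = int(np.argmax(response))
```

- The response is largest at the peak centre, where the true second derivative is most negative. Larger scales average over more points, which is what suppresses high-frequency noise, at the cost of broadening narrow features; values near the array ends are affected by the finite kernel.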
- Spectroscopic techniques employ derivative calculations as a resolution enhancement technique, especially 2nd-order derivatives for extracting peak characteristics such as position and start and finish points of a signal.
- higher order derivatives can be used for locating and deconvolving overlapping peaks.
- signal peaks can be mined from a spectrum where the remaining points of the spectra would be the representative segments of background that can be further used for background estimations.
- the presence of noise in these signals can be a serious drawback in finding signal peaks and calculating derivatives of experimental spectra.
- One technique used to calculate the derivative is "Numerical Calculation". Due to the random nature of noise in experimental spectra, a numerical calculation results in noisy signals especially where the spectrum has a low SNR, which makes spectral smoothing an essential process.
- Figure 1a) shows the simulated spectrum having ten Gaussian peaks with no background and no noise.
- Figure 1c) shows a linear background simulated in accordance with the following formula:
- Figure 1d) shows the simulated spectrum with noise of Figure 1b) with the linear background of Figure 1c) added.
- Figure 1e) shows a sigmoidal background simulated in accordance with the following formula:
- Figure 1f) shows the simulated spectrum with noise of Figure 1b) with the sigmoidal background of Figure 1e) added.
- Figure 1g) shows a sinusoidal background simulated in accordance with the following formula:
- where the spectrum is a signal without noise, its 2nd derivative is readily calculated numerically. If noise with a known SNR is added to this signal, the resultant transformed spectrum can be calculated at different scales using WT. Thereafter, a comparison of the resultant spectrum at each scale with the noiseless 2nd derivative of the signal, through the respective correlation coefficient values, provides the variation of correlation coefficient with increase in SNR.
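- The comparison described above can be sketched as a search over scales for the one whose wavelet response correlates best with the noiseless numerical 2nd derivative. The peak shape, noise level, scale range and names such as `best_scale` are hypothetical, and the hand-sampled Mexican Hat is an assumption of this sketch.

```python
import numpy as np

def mexican_hat(scale, half_width=5):
    """Sampled Mexican Hat wavelet at an integer scale (dilation)."""
    t = np.arange(-half_width * scale, half_width * scale + 1) / scale
    return (1.0 - t ** 2) * np.exp(-0.5 * t ** 2) / np.sqrt(scale)

def best_scale(clean, noisy, scales):
    """Return the scale with the highest correlation coefficient, plus all scores."""
    # Reference: numerical 2nd derivative of the noiseless signal, sign-flipped
    # to match the sign convention of the Mexican Hat response.
    reference = -np.gradient(np.gradient(clean))
    scores = {}
    for s in scales:
        response = np.convolve(noisy, mexican_hat(s), mode="same")
        scores[s] = float(np.corrcoef(response, reference)[0, 1])
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(2)
x = np.arange(2000)
clean = np.exp(-0.5 * ((x - 1000) / 20.0) ** 2)
noisy = clean + rng.normal(0.0, 0.05, x.size)
scale, scores = best_scale(clean, noisy, range(1, 41))
```

- Repeating such a search for several known SNR values would give pairs of SNR and best-performing scale, which is the kind of data a calibration curve could be built from.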
- the method includes inputting the spectrum including a plurality of signal peaks attributable to spectral data, background and noise data.
- the method includes estimating a SNR for the spectrum.
- the method includes determining a position of each of the plurality of signal peaks by approximating a derivative of the spectrum using a wavelet transform (WT).
- WT wavelet transform
- the method includes removing the plurality of signal peaks to identify the background.
- the method includes subtracting the background from the spectrum to obtain a background corrected spectrum.
- the method includes outputting the background corrected spectrum as a target sample signal.
- Step 310 involves inputting the spectrum to be background corrected.
- Step 320 involves estimating SNR.
- Step 330 involves calculating a 2nd derivative of the spectrum using WT in the Best-Scale using the SNR estimated at step 320.
- Step 340 involves finding the start points and finish points of signal peaks in the spectrum through the estimated 2nd derivative and identifying the points related to background by removing the signal peaks from the spectrum.
- an nth order polynomial function is fitted to the background points to adjust the end point effects.
- the polynomial function fitting is adjusted.
- the background correction is then applied by subtracting the fitted background points from the spectrum. More detailed flowcharts for each of the steps 320 to 360 are provided as Figures 4 to 8 and Figure 12.
- Data processing 400 involves reading the spectrum at step 410 and isolating or cutting the regions of interest in a particular spectrum 420 to (i) increase the visual resolution, (ii) increase the accuracy due to a decrease in the magnitude of the calculation, and (iii) minimize unexpected errors due to abrupt variations in background or impurities added by other erroneous peaks.
- This mirrors processing of experimental spectroscopic data where often only a section of the spectrum is required for spectral analysis.
- the data points are adjusted to a maximum of 5000 points at step 430 and the data processing step 310 is then complete to allow the SNR to be estimated at step 320.
- RMS represents the root mean square.
- This process 500 is expanded in the flowchart shown in Figure 5.
- the input data is processed in accordance with the process described in reference to Figure 4.
- the spectrum is divided into a plurality (30 in this example) of segments or scanning windows of equal length (X-axis) for the purpose of estimating the noise profile.
- a standard deviation (STD) is estimated for each or at least most of the segments.
- STD standard deviation
- the minimum local standard deviation is used and assumed as the temporary background to identify a noise profile.
- the root mean squared (RMS) of the noise profile of the whole spectrum is calculated.
- the spectrum is smoothed using a Savitzky-Golay filter at different levels from 0.1 to 0.9, followed by subtracting each of them from the spectrum to provide a temporary background correction.
- the SNRs of the temporarily background corrected signals are calculated using equation (10) above.
- the average of the SNRs of the temporarily background corrected signals is used to choose the Best-Scale depending on the signal peak width.
- the 2nd derivative of the spectrum is calculated and its endpoint effects and residual noise are corrected.
- step 610 the SNR is estimated in accordance with the method described in reference to Figure 5. Due to the discrete nature of a spectrum, artificial peaks are typically generated at both ends of the transformed signals during transformation. To address this issue, at step 620, points are added to the start and the end of the original spectrum to shift and restrict the influence of this erroneous endpoint effect. After transformation at step 630, the erroneous areas are readily removed from the signal and the 2nd derivative. The points are added such that there is minimal discontinuity or change in the slope of the spectrum, since this would generate considerable artificial peaks in the derivative spectrum.
- Step 650 signal peak removal and identification of the background points as in step 340 of Figure 3.
- the background correction method employs a signal removal method (SRM) 700.
- SRM signal removal method
- the 2nd derivative of the spectrum is calculated and its endpoint effects and residual noise are corrected.
- the first step 720 of the signal removal method involves the isolation of peaks from the signal (i.e. the residual corresponds to the background).
- the signal peak start points and finish points are identified using the 2nd derivative obtained at step 710 (also see Figure 6 for detail regarding calculation of the 2nd derivative of the spectrum).
- the start points and finish points for each signal peak correspond to the zero crossing points. Based on the zero crossing points calculated at step 720, the spectrum can be divided into sections, each comprising a discrete start and finish point pair.
- step 730 the areas of each section within the 2nd derivative spectrum of a particular signal defined by a zero crossing pair are calculated, followed by the selection of the minimum (i.e. the largest negative) local area which corresponds to the largest and sharpest peak in the signal.
- step 740 any local areas smaller than the threshold calculated in accordance with the following formula are considered as background:
- the background points are saved into fitting arrays of FIT_X (wavenumber) and FIT_Y (intensity).
- the next derivative of SSDS i.e. the endpoint corrected 2nd derivative of the spectrum
- CWT the next derivative of SSDS
- the positive area is scanned for the minimum extreme points, which correspond to zero crossing points of the first derivative of SSDS with negative slopes.
- estimation of the background using signal-deprived spectrum is based on fitting of residual points with an nth order polynomial function.
- the signal peaks are removed and background points identified as previously described.
- the fitting may select any arbitrary condition, likely to result in failure to provide correct background correction towards the signal endpoints.
- One approach employed to address this problem is to continue the minimum of the nearest background point as a horizontal line. However, this approach produces an artificial offset at the ends of the spectrum. In order to better address the issue, 100 points are fitted with a cubic polynomial to each of the start and finish points of the spectrum at step 810.
- This step decreases the effect of noise in the selected sections of the spectrum.
- the conditions relate to the endpoints of the spectrum and can be divided into four main categories incorporating subclasses 0 - 6.
- the subclasses are determined based on FIT_X(1), X(1), FIT_Y(1), Y(1) and SlopeS, which relate to the first point of fitting arrays (wavenumber), first point of spectrum (wavenumber), first point of fitting arrays (intensity), first point of spectrum (intensity) and the slope of the fitted cubic polynomial for the initial 20 points, respectively. Only the start points relating to the various subclasses are explained here, but the same can be extrapolated to the finish points.
- Figure 9a shows the full original spectrum before adjusting the endpoints
- Figure 9b) shows the magnified regions of the start points, after adjusting the end points
- Figure 9c) shows the magnified regions of the finish points, after adjusting the end points.
- Figure 10 for end point adjustment in accordance with Condition 3.
- Figure 10a shows the full original spectrum before adjusting the endpoints
- Figure 10b shows the magnified regions of the start points after adjusting the end points
- Figure 11 a) shows the full original spectrum before adjusting the end points
- Figure 11 b) shows the magnified regions of the start points after adjusting the end points
- Rhodamine B, crystal violet and methyl red were purchased from Merck Chemicals and L-serine amino acid was purchased from Sigma-Aldrich. All chemicals were used without further modifications.
- the metal layers were deposited by a Balzers™ electron beam evaporator.
- the layer is composed of 1000 Å Au with an underlying 100 Å Ti layer.
- the films were deposited sequentially by electron evaporation process onto the bare AT-cut quartz substrates.
- the purpose of the Ti layer is to assist with the adhesion of the Au layer to the substrate surface.
- FIG. 13 there is shown a single artificially synthesized Gaussian peak with different SNR values.
- the width of the Gaussian peak equals 40 units in this analysis.
- SNR is an important factor to determine the Best-Scale values for estimating the 2nd derivative of a spectrum. Due to the dependency of Best-Scale on SNR, it is important to estimate the SNR of a spectrum before estimating the 2nd derivative.
- the first step for this calculation is de-convoluting or estimating the noise profile from the signal. This issue may be addressed by smoothing a noisy signal and subtracting the de-noised signal from the spectrum that results in the noise profile. While this approach is used extensively, there are a number of issues associated with this approach. Primarily, in the case where there is a high level of de-noising, if the signal has sharp peaks, the de-noised spectrum could reduce the intensity of these peaks.
- the noise profile derived from simple subtraction of the de-noised spectrum from the noisy spectrum would result in artificial peaks where sharp peaks occur in the spectrum.
- This error induces higher intensities in the noise profile within the ranges where sharp peaks are smoothed in the spectrum.
- the peaks with lower intensities are suppressed during the de-noising step, which introduces errors in estimating the noise profile.
- noise is considered as a high frequency signal distributed evenly over the whole spectrum
- a section of its profile can be used to represent the noise profile where the range is comparably larger than the average noise wavelength.
- two different sections of a noise profile should have similar RMS values with negligible variance if they are distributed evenly and have the same intensity in the overall range.
- One aspect that needs to be addressed is selecting the threshold for dividing the spectrum into measurable sections.
- the division window should be large enough to provide a significant sample of the noise profile for calculations and also small enough to make it possible to select a region that does not include peaks.
- the standard deviation (STD) for each window is calculated and the lowest value should correspond to a part of the signal which consists of noise and background without peaks.
- the background can be estimated using a simple linear fit.
- The results of the noise profile selection are shown in Figure 17 for a spectrum with sigmoidal background with 10 peaks and initial SNR equal to 20.
- Figure 17a shows the simulated Raman spectrum with ten peaks randomly distributed on a signal with sigmoidal background, wherein the shaded region represents the segment size for calculating the STD.
- Figure 17b) shows the STD for different segments of the spectrum.
- Figure 17c) shows the spectrum in the segment having a minimum STD, wherein the line shows a linear fitting of the spectrum in the segment to identify the background.
- Figure 17d) shows the estimated noise profile obtained by subtracting the linear background from the spectrum.
- Figure 17e shows the different smoothing levels of the spectrum.
- Figure 18a shows the simulated Raman spectrum with ten peaks randomly distributed on a signal with linear background, wherein the shaded region represents the segment size for calculating the STD.
- Figure 18b) shows the STD for different segments of the spectrum.
- Figure 18c) shows the spectrum in the segment having a minimum standard deviation, wherein the line shows a linear fitting of the spectrum in the segment to identify the background.
- Figure 18d) shows the estimated noise profile obtained by subtracting the linear background from the spectrum.
- Figure 18e) shows the different smoothing levels of the spectrum.
- Figure 19a shows the synthetic linear spectrum with start and finish points determined.
- Figure 19d) shows background estimation points fitted.
- Figure 19e) shows the original spectrum of synthetic linear data together with the background corrected spectrum of synthetic linear data.
- Figure 20a shows the simulated Raman spectrum with ten peaks randomly distributed on a signal with sinusoidal background, wherein the shaded region represents the segment size for calculating the STD.
- Figure 20b shows the STD for different segments of the spectrum.
- Figure 20c) shows the spectrum in the segment having a minimum STD, wherein the line shows a linear fitting of the spectrum in the segment to identify the background.
- Figure 20d) shows the estimated noise profile obtained by subtracting the linear background from the spectrum.
- Figure 20e) shows the different smoothing levels of the spectrum.
- Figure 21 a) shows the synthetic sinusoidal spectrum with start and finish points determined.
- Figure 21 d) shows background estimation points fitted.
- Figure 21 e) shows the original spectrum of synthetic sinusoidal data together with the background corrected spectrum of synthetic sinusoidal data.
- Figure 22a shows the variation of SNR with smoothing.
- Figure 22b) shows the effect of background intensity, where the background intensity ratio is calculated by dividing the intensity of the highest peak in the spectrum with the background by the corresponding intensity of the spectrum without the background.
- Figure 22c) shows the effect of change in the real SNR on the estimated SNR values by comparing the estimated SNR and the initial SNR.
- the active regions of the "Mexican Hat" wavelet are equal to [-5·a, 5·a] where a represents the scale of transform.
- the degree of separation is defined as:
- the location of the minimum point that lies in a positive area sandwiched between the two negative areas can be observed.
- the location of this minimum point can be established by considering the zero crossing of the 3rd derivative of the spectrum. If the intensity of this point exceeds half of the intensity of a maximum adjacent point, it could be considered roughly as a part of the background.
- the signal peaks are removed from the spectrum by applying the algorithm previously described.
- the areas between start and end points are related to the background. These areas are selected for fitting and estimating the background of the signal.
- the algorithm finds the subclass of the start and end points. In the example illustrated in Figure 26, a subclass value of 0 for start points and a subclass value of 4 for finish points are detected (see Figures 26b) and 26c) respectively).
- the dashed line in Figure 26d) is the first fitting estimation for background. This background estimated curve crosses the spectrum at the end. In order to correct this issue, the fitting and adjustment algorithm is applied.
- Figure 30 there is shown application of the proposed algorithm for background correction of four different noisy experimental systems (L-serine, rhodamine, methyl red and crystal violet).
- Figure 30a) shows application of the background correction method for experimentally obtained real Raman spectra for serine amino acid.
- Figure 30b) shows application of the background correction method for experimentally obtained real Raman spectra for rhodamine.
- Figure 30c) shows application of the background correction method for experimentally obtained real Raman spectra for methyl red.
- Figure 30d) shows application of the background correction method for experimentally obtained real Raman spectra for crystal violet.
- the background correction method of the present invention may be implemented using hardware, software or a combination thereof and may be implemented in one or more computer systems or processing systems capable of carrying out the above described functionality.
- an exemplary computer system 3100 includes one or more processors, such as processor 3105.
- the processor 3105 is connected to a communication infrastructure 3110.
- the computer system 3100 may include a display interface 3115 that forwards graphics, texts and other data from the communication infrastructure 3110 for supply to the display unit 3120.
- the computer system 3100 may also include a main memory 3125, preferably random access memory, and may also include a secondary memory 3130.
- the secondary memory 3130 may include, for example, a hard disk drive 3135, magnetic tape drive, optical disk drive, etc.
- the removable storage drive 3140 reads from and/or writes to a removable storage unit 3145 in a well-known manner.
- the removable storage unit 3145 represents a floppy disk, magnetic tape, optical disk, USB etc.
- the removable storage unit 3145 includes a computer usable storage medium having stored therein computer software in a form of a series of instructions to cause the processor 3105 to carry out desired functionality.
- the secondary memory 3130 may include other similar means for allowing computer programs or instructions to be loaded into the computer system 3100. Such means may include, for example, a removable storage unit 3140 and interface 3150.
- the computer system 3100 may also include a communications interface 3160.
- Communications interface 3160 allows software and data to be transferred between the computer system 3100 and external devices. Examples of communication interface 3160 may include a modem, a network interface, a communications port, a PCMCIA slot and card etc.
- Software and data transferred via a communications interface 3160 are in the form of signals 3165 which may be electromagnetic, electronic, optical or other signals capable of being received by the communications interface 3160.
- the signals are provided to communications interface 3160 via a communications path 3170 such as a wire or cable, fibre optics, phone line, cellular phone link, radio frequency or other communications channels.
- the invention is implemented primarily using computer software, in other embodiments the invention may be implemented primarily in hardware using, for example, hardware components such as application specific integrated circuits (ASICs).
- ASICs application specific integrated circuit
- Implementation of a hardware state machine so as to perform the functions described herein will be apparent to persons skilled in the relevant art.
- the invention may be implemented using a combination of both hardware and software.
- the present invention provides an improved background correction method or algorithm based on wavelet transformation for baseline correction of vibrational spectra with the ability to work with noisy signals without de-noising.
- the background correction method is equally applicable to other types of vibrational spectra including at least Raman and Infrared such as Fourier Transform Infrared.
- the background correction algorithm benefits from WT to enable it to work directly with noisy signal, and SRM for enabling peak removal from the signal and finding the background shape.
- WT eliminates the requirement for prior smoothing of the signal and also gives a good approximation to estimate the start and finish points of signal due to its ability to calculate the 2nd derivative of the noisy spectrum.
- the background correction method of the present invention is adapted for integration with commercially-available large vibrational spectrophotometers (including infrared and Raman spectrophotometers) as well as more recently-commercialised hand-held Raman spectrophotometers.
- the instrumentation market is highly competitive and the end users of such equipment demand high quality background corrected data to be output directly from the equipment without the need for further data processing. Therefore, the adoption of the background correction method by instrumentation manufacturers should secure a significant competitive advantage in marketing and increasing the user base of their products.
- the proposed algorithm has been tested for accuracy and has achieved an acceptable level of error that makes the background correction method useful for most of the data analysis essential for vibrational spectroscopy.
- the tests for accuracy as well as experimental results demonstrate that the background correction method of the present invention would be useful in instances where automatic baseline detection is required.
- This approach could address the problems of background corrections on real data where the quality of spectra is low (e.g. biological and/or chemical samples with low SNR and/or high fluorescence). Also, based on accuracy tests, this approach has a minimal variance in the relative peak intensities during analyses.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Immunology (AREA)
- Pathology (AREA)
- Spectrometry And Color Measurement (AREA)
- Investigating, Analyzing Materials By Fluorescence Or Luminescence (AREA)
Abstract
A background correction method for a spectrum of a target sample, the method including the steps of: (a) inputting the spectrum including a plurality of signal peaks attributable to spectral data, background and noise data; (b) estimating a signal-to-noise ratio (SNR) for the spectrum; (c) determining a position of each of the plurality of signal peaks by approximating a derivative of the spectrum using a wavelet transform (WT); (d) removing the plurality of signal peaks to identify the background; (e) subtracting the background from the spectrum to obtain a background corrected spectrum; and (f) outputting the background corrected spectrum as a target sample signal.
Description
A BACKGROUND CORRECTION METHOD FOR A SPECTRUM OF A TARGET SAMPLE
Field of the Invention
[0001] The present invention relates to methods for background correction of the spectrum of a target sample. The invention more particularly relates to methods for background correction of the spectrum of a target sample which do not require a de-noising or smoothing step before background removal to produce the background corrected spectrum.
Background to the Invention
[0002] Due to its ability to provide information about the physical and chemical characteristics of materials, vibrational spectroscopy, including, for example, Fourier Transform Infrared and Raman, is applied in various branches of science including biology, chemistry and materials science. Vibrational spectroscopy is a nondestructive technique, routinely used to qualitatively and quantitatively analyse materials by identifying their native structures and structural impurities. Vibrational spectroscopy can also be used to investigate the thermodynamics and phase equilibrium of a variety of materials. Moreover, vibrational spectroscopy is a strong candidate for mapping biological samples because it can differentiate biological components by the differences in their resonance nature, especially by the position of signal peaks in vibrational spectra as well as their relative intensities, which are related to the quantity of each molecular structure.
[0003] Vibrational signals from biological samples typically have lower signal-to-noise ratios and a higher magnitude of background arising mainly from auto-fluorescence. This arises particularly in relation to biological samples, due to the sensitivity of the biological samples to the incident wavelength of the laser. Since the existence of the background suppresses the main spectrum, interpretation becomes very difficult. Accordingly, a background correction method must be applied to the spectrum before performing any detailed analysis of the spectra obtained from vibrational spectroscopy.
[0004] Given the significant scope of different spectroscopic techniques in analysing biological, chemical and materials samples, extracting meaningful information from spurious spectra is essential. Hence signal processing software must be able to distinguish noise and background from the target sample signal.
Mathematically, a sample signal can be considered as an array (S) that can be given as:
$$ S = \mathrm{PS} + B + N $$
where PS, B and N denote the noiseless signal without background (pure spectrum), the background and the noise, respectively. In order to separate spurious features from spectroscopic data, noise and background signals must be removed from the experimental spectra. The reasons for requiring removal of noise and background from an experimentally obtained spectrum are diverse and application-dependent; however, in most cases it is necessary to apply a background correction algorithm to increase the effective resolution for quantitative analyses. Accordingly, significant efforts have been directed towards determining approaches for the removal of background from signals with higher accuracy, independent of signal nature and human error.
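As an illustration of this additive model, the following minimal sketch builds a synthetic spectrum from its three components. The Gaussian peak shapes, the linear background, the white Gaussian noise and all numerical values and names (`gaussian_peaks`, `PS`, `B`, `N`) are assumptions chosen for illustration and are not taken from the specification.

```python
import numpy as np

def gaussian_peaks(x, centers, widths, heights):
    """Sum of Gaussian peaks, standing in for the pure spectrum PS."""
    ps = np.zeros_like(x, dtype=float)
    for c, w, h in zip(centers, widths, heights):
        ps += h * np.exp(-0.5 * ((x - c) / w) ** 2)
    return ps

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1000.0, 2001)

# Hypothetical components of the model S = PS + B + N
PS = gaussian_peaks(x, centers=[200, 500, 800], widths=[10, 15, 8],
                    heights=[1.0, 0.6, 0.8])
B = 0.5 + 0.001 * x                   # assumed linear background
N = rng.normal(0.0, 0.02, x.size)     # assumed white Gaussian noise

S = PS + B + N                        # measured spectrum under the model
```

Background correction then amounts to estimating `B` from `S` alone, without access to `PS` or `N`.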
[0005] Most known background correction methods (BCMs) require some information related to the signal or background prior to application of the background correction method. Based on the type of information that needs to be extracted from signals, BCMs can be categorized into two major groups. The first group of BCMs include methods requiring knowledge about background, blurring effect and noise that predominantly deal with signals by utilising knowledge about the signal components such as background shape, position and signal-to-noise ratio (SNR). Some examples in this category include the noise median method (NMM), signal removal method (SRM) and threshold based classification (TBC). The second group of BCMs include those requiring knowledge about frequency of signal components, i.e. if a signal is deconstructed based on frequency, the noise and background would have very different characteristics since noise is generally a high frequency phenomenon, while background is a low frequency component of a signal. This suggests that deconstruction of signals based on frequency and filtering the noise and background components, can produce a pure noiseless and background corrected signal. This type of signal processing forms the base for the more commonly employed Fourier transform (FT) and wavelet transform (WT) methods.
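The frequency-based idea can be sketched with a simple Fourier band-pass filter, which discards the lowest bins (treated as background) and the highest bins (treated as noise). This is only an illustrative sketch of the general approach: the function name `fft_bandpass`, the bin cut-offs and the synthetic data are assumptions, and choosing the cut-offs is itself a judgment call.

```python
import numpy as np

def fft_bandpass(y, low_bin, high_bin):
    """Keep only Fourier bins in [low_bin, high_bin) and zero the rest."""
    spectrum = np.fft.rfft(y)
    keep = np.zeros(spectrum.size, dtype=bool)
    keep[low_bin:high_bin] = True
    spectrum[~keep] = 0.0
    return np.fft.irfft(spectrum, n=y.size)

n = 1024
x = np.arange(n)
peak = np.exp(-0.5 * ((x - 512) / 5.0) ** 2)
background = 0.5 + 0.3 * np.sin(2.0 * np.pi * x / n)   # offset plus one slow cycle
rng = np.random.default_rng(2)
noisy = peak + background + rng.normal(0.0, 0.02, n)

# Hypothetical cut-offs: bins 0-3 treated as background, bins >= 100 as noise
filtered = fft_bandpass(noisy, low_bin=4, high_bin=100)
residual_background = fft_bandpass(background, low_bin=4, high_bin=100)
```

In this idealised case the background occupies only the two lowest bins and is removed completely, but the peak also has energy in those bins, so the filtered peak sits on a slightly distorted baseline.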
[0006] However, each of the aforementioned groups of BCMs has associated disadvantages. A limitation of the first group of BCMs is that the existing noise in the signal requires de-noising and smoothing as a precursor to engaging in any background removal. This is because most BCMs in this group (e.g. SRM or TBC) essentially employ the derivative of a signal that is estimated numerically, which is difficult to calculate reliably without first employing a smoothing process. However, applying a smoothing procedure as a precursor to background removal can introduce unnecessary errors into the signals depending on the de-noising methodology employed. An erroneous de-noising process can further result in peak shifts or even peak suppression in the case of low SNRs. In contrast, for the second group, the frequency based methods divide the signal into components based on frequency, which can be rather daunting due to the fact that components in real signals do not have constant frequencies, i.e. the frequency component is not a constant value but consists of a range of frequencies, making selection of thresholds a challenge. Therefore, deconstructing the spectrum into different frequencies followed by rebuilding the final spectrum may leave traces of noise or background.
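The difficulty with numerically estimated derivatives can be seen in a small experiment. The sketch below takes a second finite difference of a single Gaussian peak with and without added noise; the peak width, noise level and sampling are assumed values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(0.0, 1000.0, 1.0)
clean = np.exp(-0.5 * ((x - 500.0) / 10.0) ** 2)   # assumed peak: height 1, width 10
noise = rng.normal(0.0, 0.02, x.size)              # assumed noise: 2 % of peak height

# Second finite difference (unit spacing) as a numerical 2nd derivative
d2_clean = np.diff(clean, n=2)
d2_noisy = np.diff(clean + noise, n=2)
d2_noise_only = d2_noisy - d2_clean

print("largest |2nd derivative| of the clean peak:", np.abs(d2_clean).max())
print("standard deviation of the noise contribution:", d2_noise_only.std())
```

Even though the noise is small relative to the peak height, its contribution to the second difference exceeds the largest second-derivative value of the clean peak, which is why methods in the first group rely on prior smoothing.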
[0007] The discussion of the background to the invention hereinabove is included to explain the context of the invention. This is not to be taken as an admission that any of the material referred to was published, known or part of the common general knowledge in Australia as at the priority date of the present application.
Summary of the Invention
[0008] According to an aspect of the present invention, there is provided a background correction method for a spectrum of a target sample, the method including the following steps: (a) inputting the spectrum including a plurality of signal peaks attributable to spectral data, background and noise data; (b) estimating a signal-to-noise ratio (SNR) for the spectrum; (c) determining a position of each of the plurality of signal peaks by approximating a derivative of the spectrum using a wavelet transform (WT); (d) removing the plurality of signal peaks to identify the background; (e) subtracting the background from the spectrum to obtain a background corrected spectrum; and (f) outputting the background corrected spectrum as a target sample signal.
[0009] The position of at least most of the plurality of signal peaks may be determined by applying a second order derivative.
[0010] In one embodiment, the wavelet transform (WT) used to approximate the second order derivative of the spectrum is a "Mexican Hat" mother wavelet. The wavelet transform (WT) may be a continuous wavelet transform (CWT) or a discrete wavelet transform (DWT).
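The link between the "Mexican Hat" wavelet and the second order derivative can be sketched as follows. The Mexican Hat is proportional to the negative second derivative of a Gaussian, so convolving a spectrum with it at a single scale and negating the result behaves like a smoothed second derivative. This is a minimal single-scale sketch, not the Best-Scale procedure of the specification; the function names, the unnormalised kernel and the truncation at five scale units on either side are assumptions.

```python
import numpy as np

def mexican_hat(scale, dx=1.0):
    """Unnormalised Mexican Hat kernel sampled on [-5*scale, 5*scale]."""
    half = int(np.ceil(5.0 * scale / dx))
    t = np.arange(-half, half + 1) * dx / scale
    return (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)

def wt_second_derivative(y, scale, dx=1.0):
    """Negated Mexican Hat transform at one scale, same length as y.

    Up to a positive factor this approximates the 2nd derivative of a
    Gaussian-smoothed copy of y (assumed sign convention).
    """
    return -np.convolve(y, mexican_hat(scale, dx), mode="same")

x = np.arange(0.0, 400.0, 1.0)
y = np.exp(-0.5 * ((x - 200.0) / 10.0) ** 2)   # hypothetical single peak
d2 = wt_second_derivative(y, scale=5.0)
```

With this sign convention the transform is negative across the core of a peak and positive on its flanks, so the sign changes mark candidate start and finish points.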
[0011] In one form of the invention, the step of estimating a signal-to-noise ratio for the spectrum includes the following steps: (a) dividing the spectrum into a plurality of segments; (b) estimating a standard deviation for at least most of the plurality of segments; (c) estimating the background of the segments using a minimum estimated standard deviation; (d) calculating the root mean square (RMS) of a total background signal for a totality of segments of the spectrum; (e) calculating the root mean square (RMS) of the spectrum; and (f) calculating the signal-to-noise ratio.
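The steps above can be sketched as follows. This is a simplified reading rather than the specification's exact procedure: the noise in the quietest window stands in for the noise of the whole spectrum (assuming evenly distributed noise), a straight line is used as the local background, and the final ratio of RMS values is an assumed form of the SNR. The function name `estimate_snr` and the test data are hypothetical.

```python
import numpy as np

def estimate_snr(x, y, n_segments=30):
    """Return (snr, rms_noise) estimated from the quietest equal-length window."""
    x_parts = np.array_split(x, n_segments)               # (a) divide into segments
    y_parts = np.array_split(y, n_segments)
    stds = [part.std() for part in y_parts]               # (b) local standard deviations
    k = int(np.argmin(stds))                              # (c) window assumed peak-free
    slope, intercept = np.polyfit(x_parts[k], y_parts[k], 1)   # local linear background
    noise = y_parts[k] - (slope * x_parts[k] + intercept)
    rms_noise = np.sqrt(np.mean(noise ** 2))              # (d) RMS of the noise profile
    rms_spectrum = np.sqrt(np.mean(y ** 2))               # (e) RMS of the spectrum
    return rms_spectrum / rms_noise, rms_noise            # (f) assumed ratio form

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1000.0, 3000)
y = (np.exp(-0.5 * ((x - 300.0) / 8.0) ** 2)
     + 0.7 * np.exp(-0.5 * ((x - 650.0) / 12.0) ** 2)
     + 0.2 + 0.0005 * x                                   # assumed linear background
     + rng.normal(0.0, 0.02, x.size))                     # assumed noise level
snr, rms_noise = estimate_snr(x, y)
```

Because the window is selected for having the lowest standard deviation, the recovered noise RMS tends to sit slightly below the true noise level.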
[0012] According to another form of the invention, removing the plurality of signal peaks to identify the background includes the following steps: (a) determining a start point and a finish point corresponding to each signal peak by calculating zero crossing points corresponding to the start points and finish points by applying a second order derivative; (b) dividing the spectrum into sections, each section corresponding to an individual signal peak or an individual feature comprising merged multiple signal peaks based on the start and finish points; (c) calculating an area for each section; and (d) selecting a minimum area corresponding to the section having a largest signal peak in the spectrum and using the minimum area to define a minimum threshold; wherein any signal peak having an area less than the minimum threshold constitutes background.
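A minimal sketch of steps (a) to (d) is given below. It splits a second-derivative trace at its zero crossings, sums each section to obtain an area, and treats as peaks only those sections whose negative area is large compared with the most negative one. The `fraction` parameter is a hypothetical stand-in for the threshold, whose actual definition is not reproduced here, and the analytic test derivative is likewise an assumption.

```python
import numpy as np

def background_mask(d2, fraction=0.05):
    """True where a point is treated as background, False inside detected peaks."""
    sign = np.where(d2 < 0.0, -1, 1)
    crossings = np.flatnonzero(np.diff(sign)) + 1     # first index after each sign change
    bounds = np.concatenate(([0], crossings, [d2.size]))
    sections = list(zip(bounds[:-1], bounds[1:]))
    areas = np.array([d2[a:b].sum() for a, b in sections])
    threshold = fraction * abs(areas.min())           # hypothetical threshold rule
    mask = np.ones(d2.size, dtype=bool)
    for (a, b), area in zip(sections, areas):
        if area < -threshold:                         # strongly negative lobe: a peak
            mask[a:b] = False
    return mask

def gaussian_d2(x, center, width, height):
    """Analytic 2nd derivative of a Gaussian peak (test data only)."""
    g = height * np.exp(-0.5 * ((x - center) / width) ** 2)
    return ((x - center) ** 2 / width ** 4 - 1.0 / width ** 2) * g

x = np.arange(0.0, 400.0, 1.0)
d2 = gaussian_d2(x, 150.0, 10.0, 1.0) + gaussian_d2(x, 300.0, 10.0, 0.01)
mask = background_mask(d2)
```

In this example the large peak at 150 is removed, while the feature at 300, whose area is about 1 % of the largest, falls below the threshold and is kept as background. Only the lobe between the two zero crossings is excluded, so the tails of a peak remain in the background set in this simplified version.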
[0013] Subtracting the background from the spectrum to obtain a background corrected spectrum may be preceded by the step of minimising the effect of a first and second endpoint of the spectrum. According to this embodiment, the step of minimising the effect of a first and second endpoint of the spectrum includes the following steps: (a) extending the spectrum from the first endpoint corresponding to the start of the signal by adding signal points based on the slope of the signal adjacent to the first endpoint; and (b) extending the spectrum from the second endpoint corresponding to the end of the signal by adding signal points based on the slope of the signal adjacent to the second endpoint.
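The extension described in steps (a) and (b) can be sketched as below. The slope adjacent to each endpoint is estimated here with a straight-line fit over a few points, and the added points continue that line outwards. The function name `extend_ends`, the number of padded points `n_pad`, the number of fitted points `n_fit` and the use of a linear fit are assumptions for illustration.

```python
import numpy as np

def extend_ends(x, y, n_pad=50, n_fit=20):
    """Pad a uniformly sampled spectrum at both ends along the local slope."""
    dx = x[1] - x[0]
    slope0, icpt0 = np.polyfit(x[:n_fit], y[:n_fit], 1)     # slope near the first endpoint
    slope1, icpt1 = np.polyfit(x[-n_fit:], y[-n_fit:], 1)   # slope near the second endpoint
    x_left = x[0] - dx * np.arange(n_pad, 0, -1)
    x_right = x[-1] + dx * np.arange(1, n_pad + 1)
    x_ext = np.concatenate((x_left, x, x_right))
    y_ext = np.concatenate((slope0 * x_left + icpt0, y, slope1 * x_right + icpt1))
    return x_ext, y_ext

# Usage on a hypothetical straight-line spectrum
x = np.linspace(0.0, 99.0, 100)
y = 2.0 * x + 1.0
x_ext, y_ext = extend_ends(x, y, n_pad=10, n_fit=20)
```

Continuing the local slope avoids introducing a step or kink at the original endpoints, which would otherwise appear as artificial peaks in a derivative of the spectrum. After any transform, the padded points can be discarded again.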
[0014] According to a preferred form of the invention, the spectrum is a vibrational spectrum. The vibrational spectrum may be a Raman spectrum or an Infrared spectrum. In another form, the vibrational spectrum is a Fourier Transform Infrared (FTIR) spectrum.
[0015] The background correction method of the present invention may be applied to a spectrum collected by vibrational spectroscopy or microscopy.
[0016] According to another aspect of the present invention, there is provided an apparatus for producing a background corrected spectrum of a sample, the sample being obtained by a spectroscopic device including a light source and a detector assembly for detecting photons scattered by the sample when illuminated by the light source, the apparatus including: a processor configured to execute a machine readable code to perform the following steps: (i) inputting a spectrum including a plurality of signal peaks attributable to spectral data, background and noise data; (ii) estimating a signal-to-noise ratio (SNR) for the spectrum; (iii) determining a position of each of the plurality of signal peaks by approximating a derivative of the spectrum using a wavelet transform (WT); (iv) removing the plurality of signal peaks to identify the background; (v) subtracting the background from the spectrum to obtain a background corrected spectrum; and (vi) outputting the background corrected spectrum as a target sample signal.
[0017] According to a preferred form of the invention, the spectrum is a vibrational spectrum. The vibrational spectrum may be a Raman spectrum or an Infrared spectrum. In another form, the vibrational spectrum is a Fourier Transform Infrared (FTIR) spectrum.
Brief Description of the Drawings
[0018] It will be convenient to hereinafter describe the invention in greater detail by reference to the accompanying figures which facilitate understanding of the method according to this invention. The particularity of the figures and the related description is not to be understood as superseding the generality of the broad identification of the invention as given in the attached claims.
[0019] Figure 1 exemplifies the spectra generated artificially for the purpose of modelling the background correction method of the present invention.
[0020] Figure 1a) shows the simulated spectrum having ten peaks with no background and no noise.
[0021] Figure 1b) shows the same simulated spectrum as Figure 1a) with noise added at an SNR of 25.
[0022] Figure 1c) shows a linear simulated background.
[0023] Figure 1d) shows the simulated spectrum with noise of Figure 1b) and the linear background of Figure 1c) added.
[0024] Figure 1e) shows a sigmoidal simulated background.
[0025] Figure 1f) shows the simulated spectrum with noise of Figure 1b) and the sigmoidal background of Figure 1e) added.
[0026] Figure 1g) shows a sinusoidal simulated background.
[0027] Figure 1h) shows the simulated spectrum with noise of Figure 1b) and the sinusoidal background of Figure 1g) added.
[0028] Figure 2 shows a general flow chart for the background correction method of the present invention.
[0029] Figure 3 shows a more detailed version of the background correction method of Figure 2.
[0030] Figure 4 shows a detailed flowchart corresponding to step 310 of the flowchart shown in Figure 3 relating to preliminary data processing.
[0031] Figure 5 shows a detailed flowchart corresponding to step 320 of the flowchart shown in Figure 3 relating to estimating SNR.
[0032] Figure 6 shows a detailed flowchart corresponding to step 330 of the flowchart shown in Figure 3 relating to calculating the 2nd derivative of the spectrum and correcting its end effects and noise.
[0033] Figure 7 shows a detailed flowchart corresponding to step 340 of the flowchart shown in Figure 3 relating to peak removal and finding background points.
[0034] Figure 8 shows a detailed flowchart corresponding to step 350 of the flowchart shown in Figure 3 relating to adjusting end points effects.
[0035] Figure 9 shows end point adjustment in accordance with Condition 2.
[0036] Figure 9a) shows the full original spectrum before adjusting the end points.
[0037] Figure 9b) shows the magnified regions of the start points after adjusting the end points.
[0038] Figure 9c) shows the magnified regions of the finish points after adjusting the end points.
[0039] Figure 10 shows end point adjustment in accordance with Condition 3.
[0040] Figure 10a) shows the full original spectrum before adjusting the end points.
[0041] Figure 10b) shows the magnified regions of the start points after adjusting the end points.
[0042] Figure 10c) shows the magnified regions of the finish points after adjusting the end points.
[0043] Figure 11 shows end point adjustment in accordance with Condition 4.
[0044] Figure 11a) shows the full original spectrum before adjusting the end points.
[0045] Figure 11b) shows the magnified regions of the start points after adjusting the end points.
[0046] Figure 11c) shows the magnified regions of the finish points after adjusting the end points.
[0047] Figure 12 shows a detailed flowchart corresponding to step 360 of the flowchart shown in Figure 3 relating to fitting and adjustments.
[0048] Figures 13a) to 13f) show a single artificially synthesized Gaussian peak without noise and with SNR values of 10, 20, 30, 40 and 50 respectively.
[0049] Figure 14 shows the variation of correlation coefficient (r) with SNR and CWT scales for the artificially synthesized spectra shown in Figure 13.
[0050] Figure 15 shows a calibration curve for the signals shown in Figure 14.
[0051] Figure 16 shows the effect of peak width on Best-Scale in different SNR values.
[0052] Figure 17 shows the signal-to-noise (SNR) estimation in accordance with an embodiment of the present invention.
[0053] Figure 17a) shows the simulated Raman spectrum with ten peaks randomly distributed on a signal with sigmoidal background, wherein the shaded region represents the segment size for calculating the standard deviation (STD).
[0054] Figure 17b) shows the STD for different segments of the spectrum.
[0055] Figure 17c) shows the spectrum in the segment having a minimum STD, wherein the line shows a linear fitting of the spectrum in the segment to identify the background.
[0056] Figure 17d) shows the estimated noise profile obtained by subtracting the linear background and the spectrum.
[0057] Figure 17e) shows the different smoothing levels of the spectrum.
[0058] Figure 18 shows the signal-to-noise (SNR) estimation in accordance with another embodiment of the present invention.
[0059] Figure 18a) shows the simulated Raman spectrum with ten peaks randomly distributed on a signal with linear background, wherein the shaded region represents the segment size for calculating the STD.
[0060] Figure 18b) shows the STD for different segments of the spectrum.
[0061] Figure 18c) shows the spectrum in the segment having a minimum STD, wherein the line shows a linear fitting of the spectrum in the segment to identify the background.
[0062] Figure 18d) shows the estimated noise profile obtained by subtracting the linear background and the spectrum.
[0063] Figure 18e) shows the different smoothing levels of the spectrum.
[0064] Figure 19a) shows the synthetic linear spectrum with start and finish points determined.
[0065] Figure 19b) shows a start point condition of the spectrum (Subclass=0).
[0066] Figure 19c) shows a finish point condition of the spectrum (Subclass=6).
[0067] Figure 19d) shows background estimation points fitted.
[0068] Figure 19e) shows the original spectrum of synthetic linear data together with the background corrected spectrum of synthetic linear data.
[0069] Figure 20a) shows the simulated Raman spectrum with ten peaks randomly distributed on a signal with sinusoidal background, wherein the shaded region represents the segment size for calculating the STD.
[0070] Figure 20b) shows the STD for different segments of the spectrum.
[0071] Figure 20c) shows the spectrum in the segment having a minimum STD, wherein the line shows a linear fitting of the spectrum in the segment to identify the background.
[0072] Figure 20d) shows the estimated noise profile obtained by subtracting the linear background and the spectrum.
[0073] Figure 20e) shows the different smoothing levels of the spectrum.
[0074] Figure 21a) shows the synthetic sinusoidal spectrum with start and finish points determined.
[0075] Figure 21b) shows a start point condition of the spectrum (Subclass=3).
[0076] Figure 21c) shows a finish point condition of the spectrum (Subclass=4).
[0077] Figure 21d) shows background estimation points fitted.
[0078] Figure 21e) shows the original spectrum of synthetic sinusoidal data together with the background corrected spectrum of synthetic sinusoidal data.
[0079] Figure 22 shows the accuracy tests for the SNR estimation algorithm.
[0080] Figure 22a) shows the variation of SNR with smoothing.
[0081] Figure 22b) shows the effect of background intensity, where the background intensity ratio is calculated by dividing the intensity of the highest peak in the spectrum with the background by the intensity of the spectrum without the background.
[0082] Figure 22c) shows the effect of change in the real SNR on the estimated SNR values by comparing the estimated SNR and the initial SNR.
[0083] Figure 23a) shows the synthetic spectrum with ten peaks randomly distributed on a signal with an SNR of 20 with no end effect correction.
[0084] Figure 23b) shows the second derivative of the spectrum without end effect correction through wavelet transform at a Best Scale of 17.
[0085] Figure 23c) shows extension of the spectrum from the first and second end points, comprising end effect correction of the spectrum.
[0086] Figure 23d) shows the second derivative of the spectrum with end effect correction through wavelet transform at a Best Scale of 17.
[0087] Figure 24a) shows the synthetic spectrum with ten peaks randomly distributed on a signal with SNR=20 with no end effect correction.
[0088] Figure 24b) shows the numerical second derivative of the spectrum of Figure 24a).
[0089] Figure 24c) shows the second derivative of the spectrum through wavelet transform at a Best Scale of 17.
[0090] Figure 24d) shows the squared second derivative to suppress noise.
[0091] Figure 25 shows the variation of the degree of separation (R) with position in two similar Gaussian peaks.
[0092] Figure 25a) shows two Gaussian peaks where the value of R is 0.37 together with their second and third derivatives.
[0093] Figure 25b) shows two Gaussian peaks where the value of R is 1.11 together with their second and third derivatives.
[0094] Figure 25c) shows two Gaussian peaks where the value of R is 2.60 together with their second and third derivatives.
[0095] Figure 25d) shows two Gaussian peaks where the value of R is 3.34 together with their second and third derivatives.
[0096] Figure 26a) shows the start and finish points for each peak signal in a spectrum with background.
[0097] Figure 26b) shows the start point condition of the spectrum.
[0098] Figure 26c) shows the finish point condition of the spectrum.
[0099] Figure 26d) shows the background points and their fittings to determine the background.
[0100] Figure 26e) shows the original spectrum together with the background corrected spectrum obtained by subtracting the background from the unprocessed spectrum.
[0101] Figure 27 shows the root mean squared error (RMSE) obtained over 900 iterations of the background correction method of the present invention and the distribution of the RMSE with the number of iterations.
[0102] Figure 28 shows the variation of RMSE with the number of signal peaks in the spectrum.
[0103] Figure 29 shows the variation of RMSE with the SNR in the spectrum.
[0104] Figure 30 shows examples of the application of the background correction method of the present invention for experimentally obtained real Raman spectra.
[0105] Figure 30a) shows application of the background correction method for experimentally obtained Raman spectra for serine amino acid.
[0106] Figure 30b) shows application of the background correction method for experimentally obtained Raman spectra for rhodamine.
[0107] Figure 30c) shows application of the background correction method for experimentally obtained Raman spectra for methyl red.
[0108] Figure 30d) shows application of the background correction method for experimentally obtained Raman spectra for crystal violet.
[0109] Figure 31 is a schematic diagram showing various functional elements of a computer-enabled system for performing the background correction method of the present invention in block form.
Detailed Description
[0110] Wavelet transforms (WT), like Fourier transforms (FT), are a convolution between a wavelet function (ψ) and a signal (x(t)). The major difference between FT and WT is that in FT the wavelet has a sine or cosine form that specifically provides information in the frequency domain, while in WT the mother wavelet can be any function that has zero-mean oscillatory behaviour. A mother wavelet can produce families of waves through:
where b is the translation parameter and a represents dilation (which is always a positive integer). With a representing an average frequency, b indicates the position of a wavelet window. Hence, in employing WT, information on both time and frequency can be extracted from a spectrum. It is essential to note that although both WT and FT provide information on frequency, they are not interchangeable.
[0111] WT can be divided into two main categories, Continuous Wavelet Transform (CWT) and Discrete Wavelet Transform (DWT) and can be defined as:
where the asterisk (*) represents complex conjugation. This equation can also be given as:
$$W(a,b) = f(b) \otimes \psi_a^{*}(b) \tag{4}$$
where ⊗ represents convolution.
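In conventional notation (an assumption here, since normalisation conventions vary between authors), the wavelet family generated from a mother wavelet and the continuous wavelet transform of a signal x(t) are commonly written as follows, with a the dilation, b the translation and the asterisk denoting complex conjugation:

```latex
\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right)

W(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right)\,dt
```

For a fixed dilation a, the integral is a convolution of the signal with a scaled, reflected and conjugated copy of the mother wavelet, which is what the convolution form expresses.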
[0112] Although wavelet transformation has been studied for processing spectroscopic data, only recently has this technique been applied to calculate the approximate derivative of a signal. This was substantiated by showing that the nth order derivative of a signal could be achieved in a dilation (scale) of a by applying an appropriate mother wavelet. Furthermore, the mother wavelet was chosen such that its derivative still had a wavelet nature. For example, if the Gaussian function is considered as a mother wavelet, its 2nd derivative, commonly referred to as the "Mexican Hat" or "Marr" wavelet when taken with a minus sign, can also be used as a wavelet. In the present case, the nth derivative of a signal can be estimated using a Gaussian wavelet applied n times to the spectrum or an appropriate nth derivative of the Gaussian function. As previously outlined, noise and spectrum have different frequencies, where lower frequency components are related to the higher dilation (scale) numbers. Accordingly, the effects of noise on the transformed signal can be suppressed by increasing the dilation. This allows the derivative of a noisy spectrum to be approximated while reducing the influence of noise.
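The idea of approximating a derivative by convolution with a derivative-of-Gaussian wavelet can be illustrated with a minimal sketch. It assumes a uniformly sampled spectrum, measures the scale in samples, and uses hypothetical function names; the kernel is the 2nd derivative of a unit-area Gaussian (the Mexican hat with its sign reversed), so the result approximates the 2nd derivative of a Gaussian-smoothed copy of the input.

```python
import numpy as np

def gaussian_second_derivative_kernel(scale, half_width_factor=5):
    """2nd derivative of a unit-area Gaussian of width `scale` (in samples)."""
    half = int(half_width_factor * scale)
    t = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-t ** 2 / (2.0 * scale ** 2))
    g /= g.sum()                                  # unit-area smoothing Gaussian
    return g * (t ** 2 - scale ** 2) / scale ** 4  # analytic g''(t)

def wavelet_second_derivative(y, scale):
    """Approximate the 2nd derivative of y (per sample squared) at a given scale."""
    kernel = gaussian_second_derivative_kernel(scale)
    # Convolving with g'' equals differentiating the Gaussian-smoothed signal twice.
    return np.convolve(np.asarray(y, dtype=float), kernel, mode="same")
```

A larger `scale` suppresses more noise but also broadens the transformed peaks, which is the trade-off described above.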
[0113] Spectroscopic techniques employ derivative calculations as a resolution enhancement technique, especially the 2nd order derivatives for extracting peak characteristics such as position and start and finish points of a signal. For high resolution enhancement, higher order derivatives can be used for locating and deconvolving overlapping peaks. With the 2nd order derivative, signal peaks can be mined from a spectrum where the remaining points of the spectra would be the representative segments of background that can be further used for background estimations. The presence of noise in these signals can be a serious drawback in finding signal peaks and calculating derivatives of experimental spectra. One technique used to calculate the derivative is "Numerical Calculation". Due to the random nature of noise in experimental spectra, a numerical calculation results in noisy signals, especially where the spectrum has a low SNR, which makes spectral smoothing an essential process.
where a is the intensity controller, c and σ are median and variance of the Gaussian peak, respectively. Referring now to Figures 1a) to 1h), the program creates simulated Gaussian peaks of variable quantity with random positions, intensity and width distributed in the spectrum, with three alternative background forms (linear, sigmoidal and sinusoidal) and variable background constants. Figure 1a) shows the simulated spectrum having ten Gaussian peaks with no background and no noise. Figure 1b) shows the same simulated spectrum as Figure 1a) with noise added at SNR=25. Figure 1c) shows a linear background simulated in accordance with the following formula:
Background = a · x + b (6)
where a and b are the slope and intercept of the line, respectively. Figure 1d) shows the simulated spectrum with noise of Figure 1b) with the linear background of Figure 1c) added. Figure 1e) shows a sigmoidal background simulated in accordance with the following formula:
$$\text{Background} = \frac{1}{1 + \exp(-a(x - c))} \cdot I + O \tag{7}$$
where a is the gradient at the inflection point, c is the location of the inflection point, I is the intensity controller (since the sigmoid function results in numbers between 0 and 1) and O is an offset. Figure 1f) shows the simulated spectrum with noise of Figure 1b) with the sigmoidal background of Figure 1e) added. Figure 1g) shows a sinusoidal background simulated in accordance with the following formula:
$$\text{Background} = x^{1.5} \sin\!\left(\frac{x}{a}\right) \cdot I + O \tag{8}$$
where a is the frequency controller, I is the intensity and O is the offset. Noise is considered as white Gaussian noise and added based on the calculated SNR in decibels (SNRdB).
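The construction of such test spectra can be sketched as follows. This is an illustrative sketch only: the Gaussian peak form, the parameter ranges, the function names and the power-based decibel definition of SNR are all assumptions, and only the linear and sigmoidal backgrounds of equations (6) and (7) are shown.

```python
import numpy as np

def gaussian_peak(x, a, c, sigma):
    # Assumed peak form: a * exp(-(x - c)^2 / (2 * sigma^2))
    return a * np.exp(-(x - c) ** 2 / (2.0 * sigma ** 2))

def linear_background(x, a, b):
    return a * x + b                                           # equation (6)

def sigmoidal_background(x, a, c, intensity, offset):
    return intensity / (1.0 + np.exp(-a * (x - c))) + offset   # equation (7)

def add_white_noise(y, snr_db, rng):
    # Assumption: SNR_dB = 10 * log10(signal power / noise power)
    noise_power = np.mean(y ** 2) / (10.0 ** (snr_db / 10.0))
    return y + rng.normal(0.0, np.sqrt(noise_power), size=y.shape)

def simulate_spectrum(n_points=1000, n_peaks=10, snr_db=25.0, seed=0):
    rng = np.random.default_rng(seed)
    x = np.arange(n_points, dtype=float)
    peaks = np.zeros(n_points)
    for _ in range(n_peaks):                                   # random position, intensity, width
        peaks += gaussian_peak(x, rng.uniform(0.2, 1.0),
                               rng.uniform(0.1, 0.9) * n_points,
                               rng.uniform(3.0, 15.0))
    noisy = add_white_noise(peaks, snr_db, rng)
    background = sigmoidal_background(x, 0.01, n_points / 2.0, 2.0, 0.5)
    return x, peaks, noisy + background, background
```

Returning the noiseless peaks and the background separately makes it possible to compare a background corrected result against the known truth.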
[0115] The 2nd derivative of a spectrum can be estimated with WT using a "Mexican hat" mother wavelet. An appropriate scale is then selected to reduce the effects of noise in the 2nd derivative. A higher scale (i.e. towards lower frequencies) results in a corresponding decrease in the influence of noise. The higher scale also produces a broadening of the wavelet (i.e. widening of the transformed peaks), which in turn reduces the resolution of the derivative spectrum due to the merging of signal peaks at higher dilation numbers. To address this issue and to select the Best-Scale, being representative of the derivative of the noiseless spectrum, the SNR of the spectrum must be considered during calculations. The correlation coefficient (r) was used as a factor to select the Best-Scale:
where x_i and y_i represent the elements of vectors X and Y, respectively, where (r) could have values between 0 and 1, and if r = 1, vectors X and Y are similar to each other.
[0116] If the spectrum has a signal without noise, its 2nd derivative is readily calculated numerically. If noise with a known SNR is added to this signal, the resultant transformed spectrum can be calculated using WT at different scales. Thereafter, a comparison of the resultant spectrum at each scale with a noiseless 2nd derivative of the signal, and the respective correlation coefficient values, provides the variation of correlation coefficient with increase in SNR.
[0117] In order to test the background correction method, several spectra containing a single Gaussian peak of similar intensities and positions, but with varying widths and SNRs, were synthesized. The numerical 2nd derivatives of these noiseless spectra were determined before adding white Gaussian noise to the spectra. Thereafter, the correlation coefficients between the numerically derived 2nd derivative and wavelet transformed spectra with different SNRs at different scales were calculated. The scales of the transformed spectra with the highest correlation coefficients were chosen for each SNR and considered to represent the Best-Scales. Following determination of a Best-Scale for signals with different SNRs, these parameters were plotted for each of the signal widths and the respective calibration curves were estimated by fitting a function to these points. This allowed Best-Scale values to be estimated based on SNR and signal width.
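The scale selection described above can be sketched as follows, assuming the correlation coefficient is the usual Pearson coefficient and that the transform is a convolution with the 2nd derivative of a Gaussian at each candidate scale. The function names, the kernel truncation and the list of candidate scales are hypothetical.

```python
import numpy as np

def second_derivative_kernel(scale):
    # 2nd derivative of a unit-area Gaussian; `scale` is measured in samples.
    half = int(5 * scale)
    t = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-t ** 2 / (2.0 * scale ** 2))
    g /= g.sum()
    return g * (t ** 2 - scale ** 2) / scale ** 4

def correlation(x, y):
    # Pearson correlation coefficient between two equal-length vectors.
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

def best_scale(noiseless, noisy, scales):
    """Return the scale whose transform of `noisy` best matches the
    numerical 2nd derivative of `noiseless`, plus all scores."""
    reference = np.gradient(np.gradient(noiseless))
    scores = [correlation(reference,
                          np.convolve(noisy, second_derivative_kernel(s), mode="same"))
              for s in scales]
    return scales[int(np.argmax(scores))], scores
```

Repeating this for several SNR values and peak widths yields the points to which a calibration curve can then be fitted.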
[0118] Referring now to Figure 2, there is shown a flowchart outlining the background correction method 200 of the present invention. At step 210, the method includes inputting the spectrum including a plurality of signal peaks attributable to spectral data, background and noise data. At step 220 the method includes estimating a SNR for the spectrum. At step 230 the method includes determining a position of each of the plurality of signal peaks by approximating a derivative of the spectrum using a wavelet transform (WT). At step 240 the method includes removing the plurality of signal peaks to identify the background. At step 250 the method includes subtracting the background from the spectrum to obtain a background corrected spectrum. Finally, at step 260 the method includes outputting the background corrected spectrum as a target sample signal.
[0119] More particularly, the underlying principles for background correction employed in the background correction method according to an embodiment are shown in Figure 3 as follows. Step 310 involves inputting the spectrum to be background corrected. Step 320 involves estimating SNR. Step 330 involves calculating a 2nd derivative of the spectrum using WT in the Best-Scale using the SNR estimated at step 320. Step 340 involves finding the start points and finish points of signal peaks in the spectrum through the estimated 2nd derivative and identifying the points related to background by removing the signal peaks from the spectrum. At step 350, an nth order polynomial function is fitted to the background points to adjust the end point effects. At step 360 the polynomial function fitting is adjusted. The background correction is then applied by subtracting the fitted background points from the spectrum. More detailed flowcharts for each of the steps 310 to 360 are provided as Figures 4 to 8 and Figure 12.
[0120] Figure 4 expands step 310 of the flowchart shown in Figure 3, relating to preliminary data processing. Data processing 400 involves reading the spectrum at step 410 and isolating or cutting the regions of interest in a particular spectrum at step 420 to (i) increase the visual resolution, (ii) increase the accuracy due to a decrease in the magnitude of the calculation, and (iii) minimize unexpected errors due to abrupt variations in background or impurities added by other erroneous peaks. This mirrors processing of experimental spectroscopic data, where often only a section of the spectrum is required for spectral analysis. The data points are adjusted to a maximum of 5000 points at step 430, and the data processing step 310 is then complete to allow the SNR to be estimated at step 320.
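The preliminary processing can be sketched as below. How the number of points is adjusted is not stated, so linear interpolation onto an evenly spaced axis is an assumption, as are the function names; the x values are assumed to be increasing.

```python
import numpy as np

def crop_region(x, y, x_min, x_max):
    """Keep only the region of interest x_min <= x <= x_max."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    keep = (x >= x_min) & (x <= x_max)
    return x[keep], y[keep]

def limit_points(x, y, max_points=5000):
    """Resample to at most `max_points` points (assumed: linear interpolation)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if len(x) <= max_points:
        return x, y
    new_x = np.linspace(x[0], x[-1], max_points)
    return new_x, np.interp(new_x, x, y)      # np.interp requires increasing x
```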
where RMS represents the root mean square. This process 500 is expanded in the flowchart shown in Figure 5. At step 510 the input data is processed in accordance with the process described in reference to Figure 4. At step 520 the spectrum is divided into a plurality of segments or scanning windows (30 in this example) of equal length (X-axis) for the purpose of estimating the noise profile. At step 530, a standard deviation (STD) is estimated for each or at least most of the segments. At step 540, the minimum local standard deviation is used and assumed as the temporary background to identify a noise profile. At step 550, the root mean square (RMS) of the noise profile of the whole spectrum is calculated.
[0122] At step 560, to calculate the RMS of the signal, the spectrum is smoothed using a Savitzky-Golay filter at different levels from 0.1 to 0.9, followed by subtracting each of them from the spectrum to provide a temporary background correction. At step 570, the SNRs of the temporarily background corrected signals are calculated using equation (10) above. At step 580, the average of the SNRs of the temporarily background corrected signals is used to choose the Best-Scale depending on the signal peak width. At step 590, the 2nd derivative of the spectrum is calculated and its endpoint effects and residual noise are corrected.
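The noise side of this estimate can be sketched as follows. The sketch assumes that the SNR is the ratio of the RMS of the (temporarily background corrected) signal to the RMS of the noise profile, and that the noise profile is the residual of a straight-line fit within the segment of minimum standard deviation; the Savitzky-Golay step is left to the caller, and all function names are hypothetical.

```python
import numpy as np

def rms(values):
    return float(np.sqrt(np.mean(np.square(values))))

def estimate_noise_rms(y, n_segments=30):
    """RMS of the noise profile taken from the quietest segment."""
    segments = np.array_split(np.asarray(y, dtype=float), n_segments)
    quietest = min(segments, key=np.std)               # minimum local STD
    idx = np.arange(len(quietest), dtype=float)
    slope, intercept = np.polyfit(idx, quietest, 1)    # local linear background
    noise_profile = quietest - (slope * idx + intercept)
    return rms(noise_profile)

def estimate_snr(y, background_corrected, n_segments=30):
    # Assumed definition: RMS of the corrected signal over RMS of the noise.
    return rms(background_corrected) / estimate_noise_rms(y, n_segments)
```

Because the quietest segment is selected, the noise estimate tends to sit slightly below the true noise level, so the resulting SNR is mildly optimistic.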
[0123] Referring now to Figure 6, there is shown in expanded form step 590 of the flowchart shown in Figure 5. At step 610, the SNR is estimated in accordance with the method described in reference to Figure 5. Due to the discrete nature of a spectrum, artificial peaks are typically generated at both ends of the transformed signals during transformation. To address this issue, at step 620, points are added to the start and the end of the original spectrum to shift and restrict the influence of this erroneous endpoint effect. After transformation at step 630, the erroneous areas are readily removed from the signal and the 2nd derivative. The points are added such that there is minimal discontinuity or change in the slope of the spectrum, since this would generate considerable artificial peaks in the derivative spectrum. Since noise does not follow the same frequency as the experimental spectral data, some traces of noise will remain in the Best-Scale. These remnant traces of noise will have a small intensity when compared with the peaks related to the signal. To completely eliminate the effect of residual noise in the 2nd derivative, the 2nd derivative spectrum is squared to enhance the signals (SSDS) at step 640. Step 650 involves signal peak removal and identification of the background points, as in step 340 of Figure 3.
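Extending the spectrum before transformation and trimming afterwards can be sketched as below. The number of added points, the number of points used to estimate each end slope and the function names are assumptions made for illustration.

```python
import numpy as np

def extend_ends(y, n_extra=50, n_fit=10):
    """Pad both ends with points that continue the local slope."""
    y = np.asarray(y, dtype=float)
    idx = np.arange(n_fit, dtype=float)
    start_slope = np.polyfit(idx, y[:n_fit], 1)[0]   # slope near the first endpoint
    end_slope = np.polyfit(idx, y[-n_fit:], 1)[0]    # slope near the second endpoint
    steps = np.arange(1, n_extra + 1, dtype=float)
    head = y[0] - start_slope * steps[::-1]          # points placed before y[0]
    tail = y[-1] + end_slope * steps                 # points placed after y[-1]
    return np.concatenate([head, y, tail])

def trim_ends(values, n_extra=50):
    """Remove the padded regions again after the transform (n_extra > 0)."""
    return values[n_extra:-n_extra]
```

Continuing the slope avoids a step or kink at the join, so any artificial peaks produced by the transform fall inside the padded regions that are trimmed away.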
[0124] Referring now to Figure 7, the background correction method employs a signal removal method (SRM) 700. At step 710, the 2nd derivative of the spectrum is calculated and its endpoint effects and residual noise are corrected. The first step 720 of the signal removal method involves the isolation of peaks from the signal (i.e. the residual corresponds to the background). During isolation, the signal peak start points and finish points are identified using the 2nd derivative obtained at step 710 (see also Figure 6 for detail regarding calculation of the 2nd derivative of the spectrum). At step 720, the start points and finish points for each signal peak correspond to the zero crossing points. Based on the zero crossing points calculated at step 720, the spectrum can be divided into sections, each comprising a discrete start and finish point pair. At step 730, the area of each section within the 2nd derivative spectrum of a particular signal defined by a zero crossing pair is calculated, followed by the selection of the minimum (i.e. the largest negative) local area, which corresponds to the largest and sharpest peak in the signal. At step 740, any local areas smaller than the threshold calculated in accordance with the following formula are considered as background:
[0125] At step 750, the background points are saved into fitting arrays of FIT_X (wavenumber) and FIT_Y (intensity). At step 760, the next derivative of SSDS (i.e. the endpoint corrected 2nd derivative of the spectrum) is calculated through CWT using "Gauss1" as the mother wavelet, which produces a 3rd derivative of the spectrum. In the case of a positive area surrounded by two negative areas in SSDS, the positive area is scanned for the minimum extreme points, which correspond to zero crossing points of the first derivative of SSDS with negative slopes. These points are then added to the fitting arrays FIT_X (wavenumber) and FIT_Y (intensity) to bring the fitted background closer to the real peak minima. At step 770, the endpoint effects are adjusted.
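The sectioning by zero crossings and the area test of steps 720 to 740 can be sketched as follows. The threshold rule used here, a fixed `fraction` of the magnitude of the minimum area, is purely a hypothetical stand-in for the threshold formula, and the area is approximated by a plain sum, which assumes uniform spacing.

```python
import numpy as np

def zero_crossings(d2):
    """Indices of the first sample after each sign change."""
    signs = np.sign(d2)
    return np.where(signs[:-1] * signs[1:] < 0)[0] + 1

def section_areas(d2):
    """(start, stop, area) for each section between zero crossings."""
    bounds = np.concatenate([[0], zero_crossings(d2), [len(d2)]])
    return [(int(s), int(e), float(np.sum(d2[s:e])))
            for s, e in zip(bounds[:-1], bounds[1:])]

def background_mask(d2, fraction=0.05):
    """True where a point is treated as background (hypothetical rule)."""
    sections = section_areas(d2)
    min_area = min(area for _, _, area in sections)   # largest, sharpest peak
    threshold = fraction * abs(min_area)              # assumed threshold rule
    mask = np.ones(len(d2), dtype=bool)
    for start, stop, area in sections:
        if abs(area) >= threshold:                    # too large to be background
            mask[start:stop] = False
    return mask
```

In a noisy 2nd derivative, flat regions break up into many small sections that fall below the threshold, while the lobes belonging to a real peak do not.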
[0126] Referring now to Figure 8, estimation of the background using the signal-deprived spectrum is based on fitting of residual points with an nth order polynomial function. At step 805, the signal peaks are removed and background points identified as previously described. Where there are no background points at the start and finish of the spectrum (e.g. a simulated spectrum), the fitting may select any arbitrary condition, likely resulting in failure to provide correct background correction towards the signal endpoints. One approach employed to address this problem is to continue the minimum of the nearest background point as a horizontal line. However, this approach produces an artificial offset at the ends of the spectrum. In order to better address the issue, 100 points are fitted with a cubic polynomial to each of the start and finish points of the spectrum at step 810. This step decreases the effect of noise in the selected sections of the spectrum. Based on the location of the background start and finish points and the slopes of the spectrum at its respective ends, seven possible conditions may occur. The conditions relate to the endpoints of the spectrum and can be divided into four main categories incorporating subclasses 0 - 6. The subclasses are determined based on FIT_X(1), X(1), FIT_Y(1), Y(1) and SlopeS, which relate to the first point of the fitting arrays (wavenumber), the first point of the spectrum (wavenumber), the first point of the fitting arrays (intensity), the first point of the spectrum (intensity) and the slope of the fitted cubic polynomial for the initial 20 points, respectively. Only the start point conditions for the various subclasses are explained here, but the same can be extrapolated to the finish points.
[0127] Condition 1. If FIT_X(1)=X(1); (Subclass=0): In Condition 1, the start or finish points of the spectrum exist in a fitting array. Hence no modification is required as they are already included in the background.
[0128] Condition 2. If FIT_X(1)≠X(1) and FIT_Y(1)>Y(1) and SlopeS>0; (Subclass=1 for start and 4 for finish points): In Condition 2, the end points of the fitting array have intensities higher than the end points of signal. See Figure 9 where the unfinished Gaussian peaks at the start and the end of the spectrum are cut from the last 25 points and based on SlopeS, a 2nd order polynomial is fitted to these points, following which the minimum point of this polynomial is determined. Thereafter, 10 additional points are added as background from the location of minima to outside of the spectrum. Figure 9a) shows the full original spectrum before adjusting the endpoints, 9b) shows the magnified regions of the start points, after adjusting the end points and 9c) shows the magnified regions of the finish points, after adjusting the end points. See also steps 815 and 820 of Figure 8 for Subclass = 1 and steps 845 and 850 of Figure 8 for Subclass = 4.
[0129] Condition 3. If FIT_X(1)≠X(1) and FIT_Y(1)<Y(1) and SlopeS>0; (Subclass=2 for start and 5 for finish points): This condition is quite similar to Subclass=1 for start and 4 for finish points (i.e. Condition 2). The only difference is that FIT_Y(1)<Y(1) and therefore, there is no need to cut the spectrum at end points. See Figure 10 for end point adjustment in accordance with Condition 3. Figure 10a) shows the full original spectrum before adjusting the endpoints, Figure 10b) shows the magnified regions of the start points after adjusting the end points, and Figure 10c) shows the magnified regions of the finish points after adjusting the end points. See also steps 825 and 830 of Figure 8 for Subclass = 2 and steps 855 and 860 of Figure 8 for Subclass = 5.
[0130] Condition 4. If FIT_X(1)=X(1) and FIT_Y(1)>Y(1) and SlopeS<0; (Subclass=3 for start and 6 for finish points): In Condition 4 it is extremely difficult to determine the finish point of the signal as shown in Figure 11. Accordingly, 10 additional points are added to the ends of fitting arrays with intensities equal to FIT_Y(1) at the start and FIT_Y(end) at the finish point of the spectrum. Figure 11a) shows the full original spectrum before adjusting the end points, Figure 11b) shows the magnified regions of the start points after adjusting the end points, and Figure 11c) shows the magnified regions of the finish points after adjusting the end points. See also steps 835 and 840 of Figure 8 for Subclass = 3 and steps 865 and 870 of Figure 8 for Subclass = 6.
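The start point classification of Conditions 1 to 4 can be sketched as a small decision function. The function name is hypothetical, and Condition 4 is read here as requiring FIT_X(1)≠X(1), an assumption made because equality is already covered by Condition 1; combinations not listed in the conditions return None.

```python
def start_subclass(fit_x1, x1, fit_y1, y1, slope_s):
    """Return the start point subclass (0 to 3), or None if no condition matches."""
    if fit_x1 == x1:
        return 0                      # Condition 1: endpoint already a background point
    if slope_s > 0:
        if fit_y1 > y1:
            return 1                  # Condition 2
        if fit_y1 < y1:
            return 2                  # Condition 3
    elif slope_s < 0 and fit_y1 > y1:
        return 3                      # Condition 4 (assumed FIT_X(1) != X(1))
    return None                       # not covered by the listed conditions
```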
[0131] At step 875 fitting and adjustments are made which will be described in more detail with reference to Figure 12.
[0132] Referring now to Figure 12, following the end point correction at step 1210, points relating to the background are fitted with an nth order polynomial (in this example a 9th order polynomial) at step 1240. To control the fitting behaviour, a correction condition is required. After estimating the SNR it becomes relatively simple to estimate the Peak-to-Peak (PTP) value of the noise. An important aspect that needs to be considered to obtain a precise background correction is that following this process, the resultant background corrected spectrum should not have any data lower than $-\left(\frac{PTP}{2} + \varepsilon\right)$, where ε is related to the background correction error, which should not be more than the value of PTP itself. In the background correction method, a simple loop validates this threshold for all the points of the spectrum. If this condition fails, the coordinates of the minimum of the spectrum at the failed ranges are added to the fitting arrays and the fitting process reiterates until the $-\left(\frac{PTP}{2} + \varepsilon\right)$ condition is valid at all the points of the spectrum. This process eliminates the possible fluctuation in the background estimation, which may otherwise arise due to the lack of background points in some sections of the spectrum.
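The fitting and adjustment loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the method: it assumes the threshold reads -(PTP/2 + ε), it adds only the single lowest failing point per pass rather than one minimum per failed range, and it defaults to a low polynomial order because the plain normal-equation solver shown here is poorly conditioned for a 9th order fit. The names polyfit, polyval and fit_and_adjust are hypothetical.

```python
def polyfit(xs, ys, order):
    """Least-squares polynomial coefficients, lowest power first."""
    n = order + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - s) / m[i][i]
    return coef


def polyval(coef, x):
    """Evaluate a polynomial given coefficients, lowest power first."""
    return sum(c * x ** k for k, c in enumerate(coef))


def fit_and_adjust(x, y, bg_idx, ptp, eps, order=3, max_iter=500):
    """Refit the background until no non-fitted point lies below -(ptp/2 + eps)."""
    members = set(bg_idx)
    lo, hi = min(x), max(x)
    t = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in x]  # scale x to [-1, 1]
    limit = -(ptp / 2.0 + eps)  # assumed form of the threshold
    corrected = list(y)
    for _ in range(max_iter):
        idx = sorted(members)
        coef = polyfit([t[i] for i in idx], [y[i] for i in idx], order)
        corrected = [yi - polyval(coef, ti) for ti, yi in zip(t, y)]
        failing = [i for i in range(len(y))
                   if corrected[i] < limit and i not in members]
        if not failing:
            break
        # Simplification: add only the lowest failing point, then refit.
        members.add(min(failing, key=lambda i: corrected[i]))
    return corrected, sorted(members)


# Toy example: flat background of 10 with one peak wrongly used as a fit point.
x = list(range(20))
y = [10.0] * 20
y[10] = 15.0
corrected, idx = fit_and_adjust(x, y, [0, 10], ptp=1.0, eps=0.1, order=0)
print(len(idx), min(corrected))
```

In the toy example the wrongly selected peak point pulls the constant fit upwards, and the loop keeps adding background points until the corrected spectrum no longer dips below the threshold.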
[0133] Testing the Accuracy of the Algorithm
In order to assess the accuracy of the background correction method, a number of statistical analyses have been performed:
Producing 900 spectra, each containing 10 Gaussian peaks of random characteristics (i.e. μ, σ, and intensity) with constant SNR=60 dB
Changing the number of Gaussian peaks from 2 to 30 with random characteristics (i.e. μ, σ, and intensity) and constant SNR=60 dB, with each condition applied 200 times.
Changing SNR of the spectrum from 10 to 130 dB with 10 Gaussian peaks of random characteristics (μ, σ, and Intensity), with each condition applied 200 times.
[0134] The background corrected spectrum was compared to the original spectrum and Root Mean Squared Error (RMSE) values were calculated.
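A test of this kind can be sketched as below. The sketch only generates a synthetic spectrum of random Gaussian peaks, adds noise at a chosen SNR, and computes an RMSE; it assumes that SNR in dB means 20·log10(RMS signal / RMS noise), and the width and intensity ranges and all function names are hypothetical choices made for illustration.

```python
import math
import random


def gaussian_spectrum(n_points, n_peaks, rng):
    """Sum of Gaussian peaks with random centre, width and intensity."""
    y = [0.0] * n_points
    for _ in range(n_peaks):
        mu = rng.uniform(0, n_points)
        sigma = rng.uniform(5.0, 40.0)   # hypothetical width range
        amp = rng.uniform(0.2, 1.0)      # hypothetical intensity range
        for i in range(n_points):
            y[i] += amp * math.exp(-0.5 * ((i - mu) / sigma) ** 2)
    return y


def rms(values):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def add_noise(y, snr_db, rng):
    """Add white Gaussian noise; SNR assumed to be 20*log10(RMS signal / RMS noise)."""
    sigma = rms(y) / (10.0 ** (snr_db / 20.0))
    return [v + rng.gauss(0.0, sigma) for v in y]


def rmse(a, b):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


rng = random.Random(0)
clean = gaussian_spectrum(2000, 10, rng)
noisy = add_noise(clean, 60.0, rng)
print(rmse(noisy, clean))
```

In a full test a synthetic background would be added to `noisy`, the correction applied, and the RMSE taken between the corrected spectrum and `clean`.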
[0135] Experimental data
Chemicals
Rhodamine B, crystal violet and methyl red were purchased from Merck Chemicals and L-serine amino acid was purchased from Sigma-Aldrich. All chemicals were used without further modifications.
Preparation of e-beam evaporated substrates
The metal layers were deposited by a Balzers™ electron beam evaporator. The layer was composed of 1000 Å Au with an underlying 100 Å Ti layer. The films were deposited sequentially by an electron beam evaporation process onto the bare AT-cut quartz substrates. The purpose of the Ti layer is to assist with the adhesion of the Au layer to the substrate surface.
Raman scattering measurements
To obtain good Raman signals, gold substrates were immersed in 1 mM solutions of rhodamine B, crystal violet or methyl red for 1 hour, followed by washing with deionized water (MilliQ) and air drying. In case of L-serine amino acid, the powder was directly placed on a flat gold substrate before Raman measurements. It is known that Au and Ag thin films and nanostructured substrates assist in increasing the
Raman scattering cross-section of molecules by a surface enhanced Raman scattering (SERS) process. The above samples containing different Raman active molecules were analysed using a Perkin Elmer Raman Station 200F (785 nm laser, spot size of 100 µm) with an exposure time of 1 sec and 20 acquisitions, with the background correction feature disabled.
[0136] Results and Discussion
The performance of the current algorithm is summarized by stating the results of the calibration curve calculations and further explaining the results of each section outlined in the methods.
[0137] Referring now to Figure 13, there is shown a single artificially synthesized Gaussian peak with different SNR values. The width of the Gaussian peak equals 40 units in this analysis.
[0138] Referring now to Figure 14, there is shown the variation of correlation coefficient (r) with SNR and CWT scales for the artificially synthesized spectra outlined in Figure 13.
[0139] To generate the calibration curve, the scale related to the highest correlation coefficient (r) obtained for Gaussian peaks exhibiting different SNR is plotted against the SNR values. A function can then be fitted through these points, from which the Best-Scale for obtaining the 2nd derivative of any spectrum can be estimated once its SNR has been found. The function which best fits this calibration curve is exponential in nature. The best fit function generates real numbers, while scales for CWT should contain only integer numbers. Accordingly, the fitted function rounded towards positive infinity is taken as the calibration curve. The calibration curve for the current example is shown in Figure 15.
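The construction of such a calibration curve can be sketched as follows. The sketch assumes a pure exponential of the form scale ≈ A·exp(B·SNR), fitted by linear least squares on the logarithm of the scale; the exact functional form and fitting procedure are not given in the text, and the data points, the lower bound of 1 on the scale, and the function names are hypothetical.

```python
import math


def fit_exponential(snr, scale):
    """Fit scale ~ A*exp(B*snr) by linear least squares on log(scale)."""
    n = len(snr)
    logs = [math.log(s) for s in scale]
    mx = sum(snr) / n
    my = sum(logs) / n
    b = (sum((x - mx) * (v - my) for x, v in zip(snr, logs))
         / sum((x - mx) ** 2 for x in snr))
    a = math.exp(my - b * mx)
    return a, b


def best_scale(snr_db, a, b):
    """Round the fitted curve towards positive infinity to get an integer scale."""
    return max(1, math.ceil(a * math.exp(b * snr_db)))


# Hypothetical (SNR, best scale) pairs generated from a known exponential.
snr_points = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
scale_points = [120.0 * math.exp(-0.04 * s) for s in snr_points]
a, b = fit_exponential(snr_points, scale_points)
print(best_scale(50, a, b))
```

Rounding up rather than to the nearest integer keeps the scale at or above the fitted value, at the cost of the curve becoming flat at high SNR.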
[0140] There are three main issues when working with calibration curves:
Changes in the number of signal data points can change the calibration curve;
The number of scales in the WT should be the same for all investigated spectra; and
The calibration curve is dependent on the width of the peaks (in Gaussian peaks this is known as variance).
[0141] To address the first condition, the number of data points for each spectrum was fixed at 5000 points. This adjustment was carried out using cubic spline data interpolation. The scales for all transforms have been chosen in a constant array from 1 to 200. The variation of calibration curve for different variance of simulated Gaussian peak is shown in Figure 16. In the case of a constant SNR, an increase in
peak width results in an increase in the Best-Scale number. By fixing the data point and scale array lengths, the only user input will be the width estimation. In the case where a lower variance is selected, part of the peak base is sometimes selected as background as well. In this case, the background estimated by fitting the nth order polynomial can be corrected with the fitting and adjustment algorithm explained with reference to Figure 12. In the case where the variance estimations are high, the areas that relate to the background would be more confined due to the increase in Best-Scale values. This may result in false background estimation producing artificial humps in the corrected spectrum. This can be addressed by decreasing the variance estimation number through user input. An important aspect of this is that if the average peak width in the spectrum lies within the same range as the estimated variance number, then this fixed number would address all similar situations.
[0142] In accordance with the investigated experimental results, keeping the variance number at a constant value of 20 resulted in appropriate background estimations in most cases, as is explained in the experimental results section.
[0143] Estimating Signal-to-Noise Ratio
SNR is an important factor in determining the Best-Scale values for estimating the 2nd derivative of a spectrum. Due to the dependency of Best-Scale on SNR, it is important to estimate the SNR of a spectrum before estimating the 2nd derivative. As previously described, the first step for this calculation is de-convoluting or estimating the noise profile from the signal. This issue may be addressed by smoothing a noisy signal and subtracting the de-noised signal from the spectrum, which results in the noise profile. While this approach is used extensively, there are a number of issues associated with it. Primarily, in the case where there is a high level of de-noising, if the signal has sharp peaks, the de-noised spectrum could reduce the intensity of these peaks. Subsequently, the noise profile derived from simple subtraction of the de-noised spectrum from the noisy spectrum would result in artificial peaks where sharp peaks occur in the spectrum. This error induces higher intensities in the noise profile within the ranges where sharp peaks are smoothed in the spectrum. On the other hand, in the case of low SNR values, the peaks with lower intensities are suppressed during the de-noising step, which introduces errors in estimating the noise profile.
[0144] If noise is considered as a high frequency signal distributed evenly over the whole spectrum, a section of its profile can be used to represent the noise profile where the range is comparably larger than the average noise wavelength. In other
words, two different sections of a noise profile should have similar RMS values with negligible variance if they are distributed evenly and have the same intensity in the overall range. One aspect that needs to be addressed is selecting the threshold for dividing the spectrum into measurable sections. The division window should be large enough to provide a significant sample of the noise profile for calculations and also small enough to make it possible to select a region that does not include peaks. After selecting an appropriate window size, the standard deviation (STD) for each window is calculated and the lowest value should correspond to a part of the signal which consists of noise and background without peaks. In the case where the selected window size is small enough, the background can be estimated using a simple linear fit.
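The window-based noise selection described above can be sketched as follows. This is an illustration only: it uses non-overlapping windows of a fixed size, takes the window with the lowest standard deviation, removes a least-squares line from it, and treats the residual as the noise profile. The window layout and the function names are assumptions, and the final SNR formula is deliberately left out because it is not stated in this passage.

```python
import math
import statistics


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def quietest_window_noise(y, window):
    """Return (start index, noise profile) of the window with the lowest STD."""
    starts = range(0, len(y) - window + 1, window)
    best = min(starts, key=lambda s: statistics.pstdev(y[s:s + window]))
    segment = y[best:best + window]
    xs = list(range(window))
    slope, intercept = linear_fit(xs, segment)  # local linear background
    noise = [v - (slope * x + intercept) for x, v in zip(xs, segment)]
    return best, noise


# Toy spectrum: sloping background, alternating +/-0.01 "noise", two peaks.
y = [0.001 * i + 0.01 * (-1) ** i
     + 5.0 * math.exp(-0.5 * ((i - 80) / 5.0) ** 2)
     + 5.0 * math.exp(-0.5 * ((i - 125) / 5.0) ** 2)
     for i in range(150)]
start, noise = quietest_window_noise(y, 50)
ptp = max(noise) - min(noise)  # peak-to-peak value of the noise estimate
print(start, statistics.pstdev(noise), ptp)
```

The peak-to-peak value obtained this way is the quantity that the fitting and adjustment step uses as its threshold.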
[0145] The results of the noise profile selection are shown in Figure 17 for a spectrum with sigmoidal background with 10 peaks and initial SNR equal to 20. Figure 17a) shows the simulated Raman spectrum with ten peaks randomly distributed on a signal with sigmoidal background, wherein the shaded region represents the segment size for calculating the STD. Figure 17b) shows the STD for different segments of the spectrum. Figure 17c) shows the spectrum in the segment having a minimum STD, wherein the line shows a linear fitting of the spectrum in the segment to identify the background. Figure 17d) shows the estimated noise profile obtained by subtracting the linear background and the spectrum. Finally Figure 17e) shows the different smoothing levels of the spectrum.
[0146] SNR estimation results for a spectrum with other types of background are shown in Figures 18 and 19 for a linear background, and Figures 20 and 21 for a sinusoidal background.
[0147] Figure 18a) shows the simulated Raman spectrum with ten peaks randomly distributed on a signal with linear background, wherein the shaded region represents the segment size for calculating the STD. Figure 18b) shows the STD for different segments of the spectrum. Figure 18c) shows the spectrum in the segment having a minimum standard deviation, wherein the line shows a linear fitting of the spectrum in the segment to identify the background. Figure 18d) shows the estimated noise profile obtained by subtracting the linear background and the spectrum. Figure 18e) shows the different smoothing levels of the spectrum.
[0148] Figure 19a) shows the synthetic linear spectrum with start and finish points determined. Figure 19b) shows a start point condition of the spectrum (Subclass=0).
Figure 19c) shows a finish point condition of the spectrum (Subclass=6). Figure 19d) shows background estimation points fitted. Figure 19e) shows the original spectrum of synthetic linear data together with the background corrected spectrum of synthetic linear data.
[0149] Figure 20a) shows the simulated Raman spectrum with ten peaks randomly distributed on a signal with sinusoidal background, wherein the shaded region represents the segment size for calculating the STD. Figure 20b) shows the STD for different segments of the spectrum. Figure 20c) shows the spectrum in the segment having a minimum STD, wherein the line shows a linear fitting of the spectrum in the segment to identify the background. Figure 20d) shows the estimated noise profile obtained by subtracting the linear background and the spectrum. Figure 20e) shows the different smoothing levels of the spectrum.
[0150] Figure 21a) shows the synthetic sinusoidal spectrum with start and finish points determined. Figure 21b) shows a start point condition of the spectrum (Subclass=3). Figure 21c) shows a finish point condition of the spectrum (Subclass=4). Figure 21d) shows background estimation points fitted. Figure 21e) shows the original spectrum of synthetic sinusoidal data together with the background corrected spectrum of synthetic sinusoidal data.
[0151] In order to estimate the accuracy of the background correction algorithm, testing has been carried out using similar signal features (10 peaks randomly distributed on the signal with sigmoidal background) while changing two parameters: the SNR of the signal and the background intensity.
[0152] Referring now to Figure 22, there are shown the accuracy tests for the SNR estimation algorithm. Figure 22a) shows the variation of SNR with smoothing. Figure 22b) shows the effect of background intensity where the background intensity ratio is calculated by dividing values of the intensity of highest peak in the spectrum with the background to intensity of the spectrum without the background. Figure 22c) shows the effect of change in the real SNR on the estimated SNR values by comparing the estimated SNR and the initial SNR.
[0153] In the case where the level of smoothing is low, certain wider peaks are considered as background, while, in the case where the level of smoothing is higher, it tends towards a linear profile where most sections of the smoothed curve are located under the real background resulting in higher noise PTP values. Therefore, increasing the smoothing levels shows an increase in SNR values. Due to these
changes, the average values of the SNRs derived in different smoothing levels could be a good estimation for the real SNR of the signal. In the tested data (see Figure 22a)), the initial SNR of the synthesized spectrum was 30 while the estimated SNR value was 29.3, which when rounded towards positive infinity correlates well with the initial SNR value. The intensity of background could also influence the estimated SNR. If the intensity of the background increases, peaks would be suppressed in the signal. Accordingly, smoothing curves tend to follow peak shapes in lower intensities which results in an increase in estimated SNR. If initial background correction is ignored in the algorithm, then it results in higher values as shown in Figure 22b). But in the current method for estimating SNR, with increasing intensity of the background (see Figure 22b)), estimated SNR is in general agreement with the initial SNR values. On comparing the initial and estimated SNR values, where these values were changed with a constant intensity ratio of 6, it resulted in the estimated SNR values to be closer to the real SNR values. Similarly to the earlier discussed scenarios, if the initial background correction is ignored then the estimated SNR would have larger values than real SNR (see Figure 22c)).
[0154] 2nd Derivative and End Effect
Most spectra are discrete in nature, i.e. they do not always tend to be of zero intensity at the start and finish points (i.e. see the discussion on end effects). These end points are considered as break points in the spectrum, and during wavelet transformation an artificial peak will appear in these areas. Approximating the 2nd derivative of the synthetic spectrum without applying end effect correction results in artificial peaks at either end of the spectrum (see for example Figure 23b)). As previously described, negative peaks in the 2nd derivative correspond to the position of the signal peaks in the spectrum. Due to the end effect, two artificial peaks are added resulting in an error during the peak removal process. The 2nd derivative of any spectrum is calculated using wavelet transform with "Mexican Hat" as the mother wavelet, as previously described. The active regions of the "Mexican Hat" wavelet are equal to [-5·a, 5·a], where a represents the scale of the transform. Thus, if the spectrum is extended from both sides such that the added points have a length wider than 5·a, the end effects would be confined to these regions. As the 2nd derivative of a signal is highly sensitive to any breaking points and sharp changes in slopes in the signal, the extending points should be added such that they follow an
adjacent slope of the signal.
[0155] Referring now to Figure 23c), following estimation of Best-Scale (where the value for a can be determined), 10*a points are added to the start and end of the signal based on the signal's local slope at these junctions. The active regions were doubled in this case to ensure that no trace of end effects remains in the 2nd derivative. The corrected 2nd derivative of the spectrum is shown in Figure 23d).
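The end extension can be sketched as follows. The sketch adds 10·a points on each side by linear extrapolation; how the local slope is measured is not specified in the text, so estimating it from the first and last few samples (the hypothetical slope_pts parameter) is an assumption, as is the function name.

```python
def extend_ends(y, scale, slope_pts=5):
    """Pad a spectrum with 10*scale points per side along the local end slopes."""
    n_add = 10 * scale
    # Local slopes from the first and last slope_pts samples (assumed estimator).
    s0 = (y[slope_pts - 1] - y[0]) / (slope_pts - 1)
    s1 = (y[-1] - y[-slope_pts]) / (slope_pts - 1)
    head = [y[0] - s0 * k for k in range(n_add, 0, -1)]
    tail = [y[-1] + s1 * k for k in range(1, n_add + 1)]
    return head + list(y) + tail, n_add


# A straight line stays a straight line after extension.
y = [2 * i + 1 for i in range(10)]
extended, n_add = extend_ends(y, scale=2)
print(len(extended), extended[0], extended[-1])
```

After the transform, the first and last `n_add` samples are discarded so that the result has the length of the original spectrum.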
[0156] Referring now to Figure 24b), due to the existence of noise in the spectrum, the 2nd derivative of a spectrum obtained through numerical calculation would be noisy. Such a profile does not provide the information required to determine the peak positions in the 2nd derivative of the spectrum. Estimating the 2nd derivative of the spectrum after end point correction using WT at the Best-Scale would still exhibit traces of noise, which can be suppressed by simply squaring the estimated 2nd derivative, as the intensity of the residual noise is less than 1 (see Figure 24d)).
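A wavelet-based 2nd derivative of this kind can be sketched as below. The sketch convolves the spectrum with a sampled "Mexican Hat" wavelet truncated to [-5·a, 5·a] and flips the sign so that signal peaks appear as negative peaks, as the text describes. The unnormalised wavelet (1 - t²)·exp(-t²/2), the 1/√a scaling, the clamping of indices at the edges and the function names are assumptions made for illustration.

```python
import math


def mexican_hat(t):
    """Unnormalised Mexican Hat wavelet (negative 2nd derivative of a Gaussian)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def second_derivative_cwt(y, scale):
    """Approximate the 2nd derivative of y with a Mexican Hat transform at one scale."""
    half = 5 * scale  # active region of the wavelet is [-5*scale, 5*scale]
    kernel = [mexican_hat(k / scale) for k in range(-half, half + 1)]
    n = len(y)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(-half, half + 1):
            j = min(max(i + k, 0), n - 1)  # clamp indices at the edges
            acc += y[j] * kernel[k + half]
        # Sign flipped so that signal peaks give negative values.
        out.append(-acc / math.sqrt(scale))
    return out


# Single Gaussian peak centred at index 100.
y = [math.exp(-0.5 * ((i - 100) / 10.0) ** 2) for i in range(201)]
d2 = second_derivative_cwt(y, scale=5)
squared = [v * v for v in d2]  # squaring suppresses residual noise below 1
print(min(range(len(d2)), key=d2.__getitem__))
```

The most negative value of `d2` falls at the peak centre, and the positive lobes on either side mark where the peak flanks flatten out.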
[0157] When the distance between peaks is less than the width of the peaks, the signal peaks tend to overlap each other. When this phenomenon occurs, merging of peaks can follow. Under such conditions, estimating background points using 2nd derivative is a challenging process. If two peaks exist in a signal, based on their position and their Full Width at Half Maximum (FWHM), the degree of separation (R) could be defined as a variable to show the overlap and peak conditions with respect to each other. If these peaks follow a Gaussian function, FWHM of each peak can be calculated as:
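$$\mathrm{FWHM} = 2\sqrt{2\ln 2}\,\sigma$$
The degree of separation is defined as:
$$R = \frac{x_2 - x_1}{\frac{1}{2}\left(\mathrm{FWHM}_1 + \mathrm{FWHM}_2\right)} \qquad (13)$$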
[0158] Referring now to Figure 25, the smaller the value of R, the more likely the overlap of signal peaks. Figure 25 illustrates the variation of R with position for two similar Gaussian peaks. By increasing the distance between the peaks, R is increased. Referring now to Figure 25b), if the spectrum is considered as non-overlapping parts of Gaussian peaks, the 2nd derivative of the signal would show a positive peak as the distance between the peaks increases. When two peaks are located within their FWHM range, this positive peak reaches its maximum value. Referring now to Figure 25c), by moving from these points, a minimum extremum point is generated in the 3rd derivative. If the areas of the 2nd derivative of the spectrum between zero crossing points are considered, the location of the minimum point that lies in a positive area sandwiched between the two negative areas can be observed. The location of this minimum point can be established by considering the zero crossing of the 3rd derivative of the spectrum. If the intensity of this point exceeds half of the intensity of a maximum adjacent point, it could be considered roughly as a part of the background.
[0159] This approximation results in adding points to the background arrays where the 2nd derivative fails to estimate the background in highly peak-populated areas. Accordingly, instead of following an arbitrary shape, the fitting process progresses through these points. The location of these points could be a little higher than the base of the peaks, but the fitting and adjustment algorithm should correct any overlapping due to the points added to the background arrays.
[0160] Background Correction
Referring now to Figure 26, after finding squared 2nd derivative of the synthesized spectrum, the signal peaks are removed from the spectrum by applying the algorithm previously described. The areas between start and end points (represented as arrows in Figure 26a)) are related to the background. These areas are selected for fitting and estimating the background of the signal. Following this process, the algorithm then finds the subclass of the start and end points. In the example illustrated in Figure 26, a subclass value of 0 for start points and a subclass value of 4 for finish points are detected (see Figures 26b) and 26c) respectively). The dashed line in Figure 26d) is the first fitting estimation for background. This background estimated curve crosses the spectrum at the end. In order to correct this issue, the fitting and adjustment algorithm is applied. By adding points in this region after 314 loops, the final background is determined. This process eliminates the creation of artificial signal peaks due to fluctuation of estimated background in an area where there are not enough points to fit a nth order polynomial. Referring now to Figure 26e), a simple subtraction of this curve from the original spectrum results in the background corrected spectrum.
[0161] Testing the Accuracy of Proposed Algorithm
Following 900 test iterations of the algorithm outlined previously, a comparison between the background corrected spectrum and the original signal before adding background was carried out. Referring now to Figure 27a), there is shown the root mean squared error (RMSE) obtained over 900 iterations of the background correction method of the present invention and the distribution of the RMSE with the number of iterations. Based on previous studies, the best known background correction methods report an RMSE value of more than 0.1. The median RMSE calculated using the proposed algorithm is about 0.075, which is less than 0.1, indicating that the proposed algorithm provides an improved approximation for background correction. Referring now to Figure 27b), there are shown the frequency changes of RMSE, which show that typically more than 94% of the points have an RMSE lower than 0.2, out of which more than 77% lie below 0.1 RMSE, suggesting that the proposed algorithm would have less than 6% error in all conditions of signal features. Hence, the background correction method is assumed to be an excellent candidate for automation. Referring now to Figure 28, testing with varying peak numbers shows that by increasing the number of peaks in a spectrum, the median RMSE initially changes slowly, but changes rapidly after 20 peaks. This behaviour is a direct consequence of the decreasing number of background points in the spectrum.
[0162] Referring now to Figure 29, the values of Best-Scale increase exponentially with a decrease in SNR, while an increase in the scale results in widening of the 2nd derivative peaks. Wider peaks confine the background points that can be selected between peaks. Thus, as shown in Figure 29, a lower SNR produces a higher RMSE. By increasing SNR, values of RMSE initially show a drastic decrease, however after a while a slight increase is observed that becomes constant at higher values. The explanation for this behaviour is related to the exponential nature of the calibration curve and inaccuracies introduced by performing wavelet transform on an essentially noiseless signal. Rounding the value towards positive infinity in calculating Best-Scale values is inevitable due to the integer nature of the WT scales. However, it makes the Best-Scale constant at higher levels of SNR. The slight increase after the initial decrease in RMSE could be related to this feature where, for all SNR values larger than 80, the Best-Scale varies from 6 to 1.
[0163] Experimental Results
Referring now to Figure 30 there is shown application of the proposed algorithm for background correction of four different noisy experimental systems (L-serine, rhodamine, methyl red and crystal violet). Figure 30a) shows application of the
background correction method for experimentally obtained real Raman spectra for serine amino acid. Figure 30b) shows application of the background correction method for experimentally obtained real Raman spectra for rhodamine. Figure 30c) shows application of the background correction method for experimentally obtained real Raman spectra for methyl red. Figure 30d) shows application of the background correction method for experimentally obtained real Raman spectra for crystal violet.
[0164] Analysis of the performance of the algorithm on real data is not available due to the inherent inability to obtain experimental data without a background against which to compare results. However, the proposed algorithm appears to demonstrate good performance in most cases, with the exception of a few minor errors resulting from the condensation of peaks (see for example Figure 30b)). Interestingly, the end effect errors are considerably less than those in commonly reported studies due to the ability of the algorithm reported in this study to follow the background feature at the end points.
[0165] The background correction method of the present invention may be implemented using hardware, software or a combination thereof and may be implemented in one or more computer systems or processing systems capable of carrying out the above described functionality.
[0166] Such a computer system is illustrated in Figure 31. In this Figure, an exemplary computer system 3100 includes one or more processors, such as processor 3105. The processor 3105 is connected to a communication infrastructure 3110. The computer system 3100 may include a display interface 3115 that forwards graphics, texts and other data from the communication infrastructure 3110 for supply to the display unit 3120. The computer system 3100 may also include a main memory 3125, preferably random access memory, and may also include a secondary memory 3130.
[0167] The secondary memory 3130 may include, for example, a hard disk drive 3135, magnetic tape drive, optical disk drive, etc. The removable storage drive 3140 reads from and/or writes to a removable storage unit 3145 in a well-known manner. The removable storage unit 3145 represents a floppy disk, magnetic tape, optical disk, USB etc.
[0168] As will be appreciated, the removable storage unit 3145 includes a computer usable storage medium having stored therein computer software in a form of a series of instructions to cause the processor 3105 to carry out desired functionality. In alternative embodiments, the secondary memory 3130 may include
other similar means for allowing computer programs or instructions to be loaded into the computer system 3100. Such means may include, for example, a removable storage unit 3140 and interface 3150.
[0169] The computer system 3100 may also include a communications interface 3160. Communications interface 3160 allows software and data to be transferred between the computer system 3100 and external devices. Examples of communication interface 3160 may include a modem, a network interface, a communications port, a PCMCIA slot and card etc. Software and data transferred via a communications interface 3160 are in the form of signals 3165 which may be electromagnetic, electronic, optical or other signals capable of being received by the communications interface 3160. The signals are provided to communications interface 3160 via a communications path 3170 such as a wire or cable, fibre optics, phone line, cellular phone link, radio frequency or other communications channels.
[0170] Although in the above described embodiments the invention is implemented primarily using computer software, in other embodiments the invention may be implemented primarily in hardware using, for example, hardware components such as application specific integrated circuits (ASICs). Implementation of a hardware state machine so as to perform the functions described herein will be apparent to persons skilled in the relevant art. In other embodiments, the invention may be implemented using a combination of both hardware and software.
[0171] The present invention provides an improved background correction method or algorithm based on wavelet transformation for baseline correction of vibrational spectra with the ability to work with noisy signals without de-noising. Whilst the invention has been described in the context of Raman spectra, it is to be understood that the background correction method is equally applicable to other types of vibrational spectra, including at least Raman and Infrared such as Fourier Transform Infrared. The background correction algorithm benefits from WT to enable it to work directly with a noisy signal, and SRM for enabling peak removal from the signal and finding the background shape. Use of WT eliminates the requirement for prior smoothing of the signal and also gives a good approximation to estimate the start and finish points of the signal due to its ability to calculate the 2nd derivative of the noisy spectrum. On the other hand, using SRM, the peaks remain untouched and background estimation can be achieved using fitting of the remaining data points in the spectrum.
[0172] The background correction method of the present invention is adapted for integration with commercially-available large vibrational spectrophotometers (including infrared and Raman spectrophotometers) as well as more recently-commercialised hand-held Raman spectrophotometers. Notably, the instrumentation market is highly competitive and the end users of such equipment demand high quality background corrected data to be output directly from the equipment without the need for further data processing. Therefore, the adoption of the background correction method by instrumentation manufacturers should secure a significant competitive advantage in marketing and increasing the user base of their products.
[0173] In regard to hand-held Raman spectrophotometers, it is effectively critical for the instrumentation manufacturers and/or users to employ this type of background correction method. This is because the portability of these hand-held systems will be compromised in the absence of an appropriate data processing and data interpretation algorithm. Surface enhanced Raman spectroscopy (SERS) is currently considered to have the potential to be a critical technology that will transform our societies by enabling highly sensitive, selective, ultra-fast and low-cost environmental monitoring, and efficient diagnosis of infections, diseases, foods, forensic applications, etc. Portable Raman spectrophotometers and their variants employing SERS technology will rely on provision of efficient background correction techniques as proposed by the present invention for rapid interpretation of data at the point of analysis. The background correction method of the present invention will find application in both high-throughput and real-time vibrational analysis of different sample types, including solids, liquids and gases.
[0174] The proposed algorithm has been tested for accuracy and has achieved an acceptable level of error that makes the background correction method useful for most of the data analysis essential for vibrational spectroscopy. The tests for accuracy as well as experimental results demonstrate that the background correction method of the present invention would be useful in instances where automatic baseline detection is required. This approach could address the problems of background corrections on real data where the quality of spectra is low (e.g. biological and/or chemical samples with low SNR and/or high fluorescence). Also, based on accuracy tests, this approach has a minimal variance in the relative peak intensities during analyses.
[0175] Application of the proposed algorithm for background correction of four different noisy experimental systems (L-serine, rhodamine, methyl red and crystal violet) showed good performance of the algorithm, wherein the end effect errors were found to be considerably less than those of known methods. A significant strength of the proposed algorithm is that it does not involve any smoothing step, the avoidance of which is a major challenge in obtaining background-corrected spectra.
[0176] Where the terms "comprise", "comprises", "comprised" or "comprising" are used in this specification (including the claims), they are to be interpreted as specifying the presence of stated features, integers, steps or components referred to, but do not preclude the presence of one or more other feature, integer, step, component or group thereof.
[0177] While the invention has been described in conjunction with a limited number of embodiments, it will be appreciated by those skilled in the art that many alternatives, modifications and variations in light of the foregoing description are possible. Accordingly, the present invention is intended to embrace all such alternatives, modifications and variations as may fall within the spirit and scope of the invention as disclosed.
Claims
1. A background correction method for a spectrum of a target sample, the method including the following steps:
(a) inputting the spectrum including a plurality of signal peaks attributable to spectral data, background and noise data;
(b) estimating a signal-to-noise ratio (SNR) for the spectrum;
(c) determining a position of each of the plurality of signal peaks by approximating a derivative of the spectrum using a wavelet transform (WT);
(d) removing the plurality of signal peaks to identify the background;
(e) subtracting the background from the spectrum to obtain a background corrected spectrum; and
(f) outputting the background corrected spectrum as a target sample signal.
2. A background correction method according to claim 1, wherein the position of at least most of the plurality of signal peaks is determined by applying a second order derivative.
3. A background correction method according to claim 2, wherein the wavelet transform (WT) used to approximate the second order derivative of the spectrum is a "Mexican Hat" mother wavelet.
4. A background correction method according to claim 2 or 3, wherein the wavelet transform (WT) is a continuous wavelet transform (CWT) or a discrete wavelet transform (DWT).
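By way of illustration only, and not as part of the claims, the peak-location step of claims 1 to 4 can be sketched as follows. The sketch correlates the spectrum with a "Mexican Hat" wavelet at a single scale, which gives a smoothed approximation of the negated second order derivative, and reports local maxima of the response as peak positions. The function names, the single fixed scale, the kernel truncation at five scale widths, the edge clamping and the threshold are all assumptions made for this example; the specification does not prescribe them.

```python
import math


def mexican_hat(t):
    """Unnormalised Mexican Hat (Ricker) wavelet: (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def cwt_single_scale(signal, scale):
    """Correlate the signal with a Mexican Hat wavelet at one scale.

    The response is proportional to the negated, smoothed second derivative,
    so a peak in the signal appears as a positive maximum in the response.
    """
    half = int(5 * scale)  # assumed truncation of the kernel support
    kernel = [mexican_hat(k / scale) / math.sqrt(scale)
              for k in range(-half, half + 1)]
    n = len(signal)
    response = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - half, 0), n - 1)  # clamp indices at the edges
            acc += w * signal[idx]
        response.append(acc)
    return response


def find_peaks(response, threshold):
    """Indices of local maxima of the response that exceed the threshold."""
    return [i for i in range(1, len(response) - 1)
            if response[i] > threshold
            and response[i] >= response[i - 1]
            and response[i] > response[i + 1]]
```

Because the wavelet has zero mean and is symmetric, a constant or linear background contributes almost nothing to the response, which is why the derivative-based approach can locate peaks sitting on a sloping baseline.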
5. A background correction method according to any one of claims 1 to 4, wherein the step of estimating a signal-to-noise ratio for the spectrum includes the following steps:
(a) dividing the spectrum into a plurality of segments;
(b) estimating a standard deviation for at least most of the plurality of segments;
(c) estimating the background of the segments using a minimum estimated standard deviation;
(d) calculating the root mean square (RMS) of a total background signal for a totality of segments of the spectrum;
(e) calculating the root mean square (RMS) of the spectrum; and
(f) calculating the signal-to-noise ratio.
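By way of illustration only, and not as part of the claims, the segment-based estimate of claim 5 might be sketched as below. This is one possible reading: the spectrum is cut into equal segments, the smallest segment standard deviation is taken as the noise level, and the SNR is the ratio of the RMS of the whole spectrum to that noise level. The function names, the default of ten segments, and the use of the minimum standard deviation directly as the noise RMS are assumptions of this sketch rather than details given in the claim.

```python
import math
import statistics


def rms(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def estimate_snr(spectrum, n_segments=10):
    """Estimate SNR as RMS(spectrum) / noise level.

    The noise level is assumed to be the smallest population standard
    deviation found among equal-length segments of the spectrum, i.e. the
    segment least affected by signal peaks.
    """
    seg_len = len(spectrum) // n_segments
    if seg_len < 2:
        raise ValueError("spectrum too short for the requested segment count")
    segments = [spectrum[i * seg_len:(i + 1) * seg_len]
                for i in range(n_segments)]
    noise_level = min(statistics.pstdev(seg) for seg in segments)
    if noise_level == 0.0:
        raise ValueError("noise level is zero; SNR is undefined")
    return rms(spectrum) / noise_level
```

Taking the minimum over segments is a simple way to avoid segments that contain peaks, whose standard deviation is dominated by signal rather than noise.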
6. A background correction method according to any one of claims 1 to 5, wherein removing the plurality of signal peaks to identify the background includes the following steps:
(a) determining a start point and a finish point corresponding to each signal peak by calculating zero crossing points corresponding to the start points and finish points by applying a second order derivative;
(b) dividing the spectrum into sections, each section corresponding to an individual signal peak or an individual feature comprising merged multiple signal peaks, based on the start and finish points;
(c) calculating an area for each section; and
(d) selecting a minimum area corresponding to the section having a largest signal peak in the spectrum and using the minimum area to define a minimum threshold; wherein any signal peak having an area less than the minimum threshold constitutes background.
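By way of illustration only, and not as part of the claims, the sectioning and area test of claim 6 could be sketched as follows. The sketch uses a plain central second difference in place of the wavelet-based derivative, treats each run of negative second difference (bounded by zero crossings) as one section, measures the area of each section above the chord joining its start and finish points, and labels sections whose area falls below a threshold as background. The threshold rule used here, a fixed fraction of the largest section area, is a hypothetical stand-in for the minimum-area rule of step (d); all function names and the `fraction` parameter are likewise assumptions.

```python
def second_difference(y):
    """Central second difference; the two endpoints are set to zero."""
    d2 = [0.0] * len(y)
    for i in range(1, len(y) - 1):
        d2[i] = y[i - 1] - 2.0 * y[i] + y[i + 1]
    return d2


def peak_sections(d2):
    """Return (start, finish) index pairs of runs where d2 is negative."""
    sections, start = [], None
    for i, v in enumerate(d2):
        if v < 0 and start is None:
            start = i
        elif v >= 0 and start is not None:
            sections.append((start, i - 1))
            start = None
    if start is not None:
        sections.append((start, len(d2) - 1))
    return sections


def section_area(y, start, finish):
    """Sum of the signal above the straight chord from start to finish."""
    width = finish - start
    area = 0.0
    for i in range(start, finish + 1):
        frac = (i - start) / width if width else 0.0
        chord = y[start] + frac * (y[finish] - y[start])
        area += y[i] - chord
    return area


def classify_sections(y, fraction=0.1):
    """Split sections into (peaks, background) by an assumed area threshold."""
    sections = peak_sections(second_difference(y))
    areas = [section_area(y, s, f) for s, f in sections]
    threshold = fraction * max(areas) if areas else 0.0
    peaks = [sec for sec, a in zip(sections, areas) if a >= threshold]
    background = [sec for sec, a in zip(sections, areas) if a < threshold]
    return peaks, background
```

On noisy data a raw second difference would fragment into many tiny sections, which is one reason a wavelet-based derivative estimate is the more natural input here.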
7. A background correction method according to any one of claims 1 to 6, wherein subtracting the background from the spectrum to obtain a background corrected spectrum is preceded by the step of minimising the effect of a first and second endpoint of the spectrum.
8. A background correction method according to claim 7, wherein the step of minimising the effect of a first and second endpoint of the spectrum includes the following steps:
(a) extending the spectrum from the first endpoint corresponding to the start of the signal by adding signal points based on the slope of the signal adjacent to the first endpoint; and
(b) extending the spectrum from the second endpoint corresponding to the end of the signal by adding signal points based on the slope of the signal adjacent to the second endpoint.
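By way of illustration only, and not as part of the claims, the endpoint extension of claim 8 can be sketched as a linear extrapolation at each end of the spectrum. The slope at each endpoint is estimated here from the first and last few points; the function name, the number of added points `n_extra` and the number of points `n_fit` used for the slope are assumptions of this sketch.

```python
def extend_endpoints(y, n_extra, n_fit=5):
    """Pad both ends of y by extrapolating the slope next to each endpoint.

    n_extra points are added on each side; the slope is the average step
    over the n_fit points adjacent to the corresponding endpoint.
    """
    if n_fit < 2 or len(y) < n_fit:
        raise ValueError("need at least n_fit >= 2 points to estimate a slope")
    left_slope = (y[n_fit - 1] - y[0]) / (n_fit - 1)
    right_slope = (y[-1] - y[-n_fit]) / (n_fit - 1)
    left = [y[0] - left_slope * k for k in range(n_extra, 0, -1)]
    right = [y[-1] + right_slope * k for k in range(1, n_extra + 1)]
    return left + list(y) + right
```

After the background has been estimated on the extended spectrum, the added points would be discarded again so that the output has the same length as the input.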
9. A background correction method according to any one of claims 1 to 8, wherein the spectrum is a vibrational spectrum.
10. A background correction method according to claim 9, wherein the vibrational spectrum is a Raman spectrum or an Infrared spectrum.
11. A background correction method according to claim 9, wherein the vibrational spectrum is a Fourier Transform Infrared (FTIR) spectrum.
12. A background correction method according to any one of claims 1 to 11, wherein the background correction method is applied to a spectrum collected by vibrational spectroscopy or microscopy.
13. An apparatus for producing a background corrected spectrum of a sample, the sample being obtained by a spectroscopic device including a light source and a detector assembly for detecting photons scattered by the sample when illuminated by the light source, the apparatus including:
a processor configured to execute a machine readable code to perform the following steps:
(i) inputting a spectrum including a plurality of signal peaks attributable to spectral data, background and noise data;
(ii) estimating a signal-to-noise ratio (SNR) for the spectrum;
(iii) determining a position of each of the plurality of signal peaks by approximating a derivative of the spectrum using a wavelet transform (WT);
(iv) removing the plurality of signal peaks to identify the background;
(v) subtracting the background from the spectrum to obtain a background corrected spectrum; and
(vi) outputting the background corrected spectrum as a target sample signal.
14. An apparatus for producing a background corrected spectrum of a sample according to claim 13, wherein the spectrum is a vibrational spectrum.
15. An apparatus for producing a background corrected spectrum of a sample according to claim 14, wherein the vibrational spectrum is a Raman spectrum or an Infrared spectrum.
16. An apparatus for producing a background corrected spectrum of a sample according to claim 14, wherein the vibrational spectrum is a Fourier Transform Infrared (FTIR) spectrum.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2012905622A AU2012905622A0 (en) | 2012-12-19 | A background correction method for a spectrum of a target sample | |
AU2012905622 | 2012-12-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014094039A1 (en) | 2014-06-26 |
Family
ID=50977391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/AU2013/001472 WO2014094039A1 (en) | 2012-12-19 | 2013-12-17 | A background correction method for a spectrum of a target sample |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2014094039A1 (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5885841A (en) * | 1996-09-11 | 1999-03-23 | Eli Lilly And Company | System and methods for qualitatively and quantitatively comparing complex admixtures using single ion chromatograms derived from spectroscopic analysis of such admixtures |
- 2013-12-17 WO PCT/AU2013/001472 patent/WO2014094039A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
ZHANG, Z.-M. ET AL.: "An Intelligent Background-Correction Algorithm for Highly Fluorescent Samples in Raman Spectroscopy", JOURNAL OF RAMAN SPECTROSCOPY, vol. 41, no. 6, 2009, pages 659 - 669 * |
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10760966B2 (en) * | 2015-04-07 | 2020-09-01 | Analytik Jena Ag | Method for the correction of background signals in a spectrum |
CN104931518A (en) * | 2015-06-09 | 2015-09-23 | 东南大学 | Method of X-ray fluorescence spectrum background rejection |
CN105466908A (en) * | 2015-12-31 | 2016-04-06 | 安徽芯核防务装备技术股份有限公司 | Raman spectrum method for removing interference noise produced during sample bottle fixing |
CN105466908B (en) * | 2015-12-31 | 2018-04-20 | 安徽芯核防务装备技术股份有限公司 | A kind of sample bottle fixes the Raman spectrum minimizing technology of interference noise |
CN106404178B (en) * | 2016-08-30 | 2018-12-04 | 中国科学院地理科学与资源研究 | EO-1 hyperion thermal infrared surface temperature and emissivity separation method based on wavelet transformation |
CN106404178A (en) * | 2016-08-30 | 2017-02-15 | 中国科学院地理科学与资源研究所 | Hyperspectral thermal infrared surface temperature and emissivity separation method based on wavelet transformation |
US11493447B2 (en) | 2016-12-26 | 2022-11-08 | Nuctech Company Limited | Method for removing background from spectrogram, method of identifying substances through Raman spectrogram, and electronic apparatus |
CN108241845A (en) * | 2016-12-26 | 2018-07-03 | 同方威视技术股份有限公司 | Method for deducting spectrogram background and the method by Raman mass spectrum database substance |
WO2018121121A1 (en) * | 2016-12-26 | 2018-07-05 | 同方威视技术股份有限公司 | Method for use in subtracting spectrogram background, method for identifying substance via raman spectrum, and electronic device |
CN106908655A (en) * | 2017-03-06 | 2017-06-30 | 广东顺德工业设计研究院(广东顺德创新设计研究院) | Photosignal peak-value detection method and system |
CN107014785A (en) * | 2017-05-15 | 2017-08-04 | 浙江全世科技有限公司 | A kind of improved method of emission spectrum background correction |
EP3486641A1 (en) * | 2017-11-09 | 2019-05-22 | Jeol Ltd. | Data processing apparatus and data processing method |
CN108152262B (en) * | 2018-01-11 | 2024-06-11 | 南京溯远基因科技有限公司 | Capillary electrophoresis nucleic acid analysis method and system |
CN108152262A (en) * | 2018-01-11 | 2018-06-12 | 南京溯远基因科技有限公司 | A kind of Capillary Electrophoresis method for nucleic acid analysis and system |
EP3575775A1 (en) * | 2018-05-29 | 2019-12-04 | Horiba, Ltd. | Calibration curve setting method used for drug analysis |
US11719627B2 (en) | 2018-05-29 | 2023-08-08 | Horiba, Ltd. | Calibration curve setting method used for drug analysis |
CN109342336B (en) * | 2018-12-10 | 2021-07-06 | 合肥泰禾智能科技集团股份有限公司 | Spectrometer system and device for deducting dark background in real time |
CN109342336A (en) * | 2018-12-10 | 2019-02-15 | 合肥泰禾光电科技股份有限公司 | A kind of real-time spectrometer system and device for deducting dark background |
CN109669205A (en) * | 2019-01-08 | 2019-04-23 | 山东省科学院海洋仪器仪表研究所 | A kind of Peak Search Method of seawater radionuclide K40 element |
CN110162740A (en) * | 2019-05-14 | 2019-08-23 | 广西科技大学 | A kind of inverse matrix iteration Deconvolution Method for spectrally resolved enhancing |
CN110162740B (en) * | 2019-05-14 | 2023-03-31 | 广西科技大学 | Inverse matrix iteration deconvolution method for spectral resolution enhancement |
CN111289489B (en) * | 2020-03-05 | 2023-06-02 | 长春长光辰英生物科学仪器有限公司 | Raman spectrum-based microorganism single cell growth detection method |
CN111289489A (en) * | 2020-03-05 | 2020-06-16 | 长春长光辰英生物科学仪器有限公司 | Raman spectrum-based microbial unicell growth detection method |
CN111982949B (en) * | 2020-08-19 | 2022-06-07 | 东华理工大学 | Method for separating EDXRF spectrum overlapping peak by combining fourth derivative with three-spline wavelet transform |
CN111982949A (en) * | 2020-08-19 | 2020-11-24 | 东华理工大学 | Method for separating EDXRF spectrum overlapping peak by combining fourth derivative with three-spline wavelet transform |
CN113008874A (en) * | 2021-03-11 | 2021-06-22 | 合肥工业大学 | Method for improving qualitative detection capability of laser-induced breakdown spectroscopy technology based on baseline correction and spectral peak recognition |
CN114354566A (en) * | 2021-11-30 | 2022-04-15 | 安徽中科赛飞尔科技有限公司 | Method for improving effective information rate of SERS signal based on stray peak deduction |
CN115078616A (en) * | 2022-05-07 | 2022-09-20 | 天津国科医工科技发展有限公司 | Multi-window spectral peak identification method, device, medium and product based on signal-to-noise ratio |
CN115078616B (en) * | 2022-05-07 | 2024-06-07 | 天津国科医疗科技发展有限公司 | Multi-window spectrum peak identification method, equipment, medium and product based on signal to noise ratio |
CN115308130A (en) * | 2022-08-08 | 2022-11-08 | 厦门大学 | Method for analyzing and processing noise of micro-spectrum instrument and noise reduction model |
CN117593205A (en) * | 2023-10-31 | 2024-02-23 | 北京霍里思特科技有限公司 | Method for distinguishing background and characteristic peaks of spectrum and related products thereof |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2014094039A1 (en) | A background correction method for a spectrum of a target sample | |
Kandjani et al. | A new paradigm for signal processing of Raman spectra using a smoothing free algorithm: Coupling continuous wavelet transform with signal removal method | |
TWI468666B (en) | Monitoring, detecting and quantifying chemical compounds in a sample | |
US11493447B2 (en) | Method for removing background from spectrogram, method of identifying substances through Raman spectrogram, and electronic apparatus | |
EP0535700B1 (en) | Method and apparatus for comparing spectra | |
JP6357661B2 (en) | Terahertz spectroscopy system | |
Kumar et al. | Analysis of dilute aqueous multifluorophoric mixtures using excitation–emission matrix fluorescence (EEMF) and total synchronous fluorescence (TSF) spectroscopy: a comparative evaluation | |
US20040080761A1 (en) | Method and apparatus for thickness decomposition of complicated layer structures | |
CN109883547A (en) | A kind of wide-band spectrum signal antinoise method based on wavelet threshold difference | |
Wentzell et al. | Characterization of heteroscedastic measurement noise in the absence of replicates | |
CN105628675B (en) | A kind of removing method of the Raman fluorescence interference of power sensitive substance | |
Tseitlin et al. | Uncertainty analysis for absorption and first‐derivative electron paramagnetic resonance spectra | |
CN108195817B (en) | Raman spectrum detection method for removing solvent interference | |
Liu et al. | Fast extraction of resonant vibrational response from CARS spectra with arbitrary nonresonant background | |
CN106404743A (en) | Raman spectrum and near infrared spectrum combined detection method and detection device | |
US20140253928A1 (en) | Thickness change monitor wafer for in situ film thickness monitoring | |
CN111157115B (en) | Underwater Brillouin scattering spectrum acquisition method and device | |
CN111077128B (en) | Raman signal position correction using relative integration parameters | |
Liu et al. | Simultaneous quantitative analysis of three components in mixture samples based on NIR spectra with temperature effect | |
JP6572169B2 (en) | Component concentration measuring apparatus and component concentration measuring method | |
US10488329B2 (en) | Calibration apparatus, calibration curve creation method, and independent component analysis method | |
Kandjani | A novel approach towards a better background correction of Raman signals | |
Rutledge et al. | PoLiSh—smoothed partial least-squares regression | |
CN111157116B (en) | Underwater Brillouin scattering spectrum test system | |
Woehl et al. | User-independent nonlinear modeling using adjusted spline-interpolated knots (UNMASK) and indirect hard modeling for deriving compositions from spectra with background signals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13864256; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 13864256; Country of ref document: EP; Kind code of ref document: A1 |