WO2022146103A1

WO2022146103A1 - Construction of and searching method for raman scattering spectrum database through machine learning

Info

Publication number: WO2022146103A1
Application number: PCT/KR2021/020362
Authority: WO
Inventors: 이동우
Original assignee: 모던밸류 주식회사
Priority date: 2020-12-31
Filing date: 2021-12-31
Publication date: 2022-07-07
Also published as: KR20220097351A

Abstract

In order to accurately predict bio-information such as the presence or absence and/or concentration of specific biological materials (e.g., proteins, amino acids, lipids, and nucleic acids), chemical bonds and constituents derived from cells (e.g., bacteria, cancer cells, and normal cells), and/or the identification and/or concentration of cells, a bio-information predictive calculation device according to the present invention uses, as input information for a machine learning algorithm, (a) one or more Raman shift values included in a Raman spectrum list for biological entity-derived samples, (b) a Raman maximum intensity, which is the highest value among the Raman intensities on the vertical axis at each shift value, and (c) a Raman minimum intensity, which is the lowest value among the Raman intensities on the vertical axis at each shift value. The device uses the machine learning algorithm to learn feature sets of the biological entity-derived samples, and thereby calculates or processes characteristic information for predicting information about the states of the animals or cells from which the samples are extracted, disease diagnosis and/or an assay of the efficacy of therapeutic agents, and/or information about bacterial infection of the samples.

Description

Raman scattering spectrum database construction and search method through machine learning

The present invention provides a method for building and searching a Raman scattering spectrum database through machine learning; and an apparatus for calculating desired biometric prediction information from a Raman spectrum database by performing the method.

The Raman effect is a phenomenon in which, when a sample is exposed to short-wavelength incident light such as laser light, part of the incident light changes the polarizability of a molecule, and resonance occurs between the frequency of the changed polarization and a frequency within the molecule. Raman spectroscopy is a spectroscopic method that measures the intrinsic scattering frequencies of molecules arising from the Raman effect. In Raman spectroscopy, incident light is irradiated directly onto the sample to be measured. The measurement is easy, even a very small amount of sample can be measured, there is no interference from moisture or carbon dioxide, and the visible region can be used. Therefore, Raman spectroscopy mainly uses a visible laser and detects the light that is Raman-scattered by molecules.

As shown in FIG. 2, the types of scattering include Rayleigh scattering, in which the frequencies of the incident light and the scattered light do not change; Stokes scattering, in which the incident light loses energy through collision with atoms and decreases in frequency; and anti-Stokes scattering, in which the incident light gains energy through collision with atoms and increases in frequency. Among the scattered light, the scattering of light having less or more energy than the original incident light is called Raman scattering. Although the vibrational energy cannot be measured directly, it can be observed as the energy lost or gained relative to Rayleigh scattering.

On the other hand, molecules cannot be excited to all energy states, but can be excited only to the level allowed by the Selection Rule.

In addition, Raman scattering occurs only in those vibrational modes of the molecule in which the polarizability changes. Symmetric vibrations produce strong Raman spectra.

Laser light is light in phase with a single wavelength. In general, the laser beam is thin and does not spread. Lasers are mainly used in spectroscopy because of their precisely defined monochromatic wavelengths. In the case of a pulsed laser, it is used to observe a phenomenon occurring in a short time by using a short pulse width.

Raman spectroscopy is known as a technique suitable for single-cell level bacterial detection because it can quickly measure intracellular lipids, nucleic acids, and proteins by using the property of laser scattering by molecular resonance. Because of the high specificity and sensitivity to cellular components, it is possible to analyze bacterial phylogeny at the level of some species using only Raman spectra. In addition, when isotopes such as carbon-13 and hydrogen-2 are used at the same time, it can be used for quantitative evaluation of changes in the physiological activity of single cells.

However, since the signal strength of Raman scattering is weak, its use in the field of biology for measuring trace amounts of substances or biological tissues has been limited. Various methods have therefore been developed to amplify the weak Raman scattering signal. A typical example is the SERS (surface-enhanced Raman spectroscopy) technique, which uses the principle that light irradiated onto the surface of metal nanoparticles such as silver or gold enhances the plasmon resonance of the material surface, which amplifies the Raman signal. In addition, the UV Resonance Raman (UVRR) method, which measures the Raman signal with a UV laser (180 to 260 nm), is also used, and this method can amplify the Raman signal by 10³ to 10⁵ times.

As we enter an aging society, interest in disease diagnosis, treatment, and prevention is rapidly increasing along with the realization of a healthy future and a healthy society. In order to realize such a healthy future society, efforts to protect oneself from diseases by maintaining biological functions and, above all, early diagnosis or prediction of diseases have increased. According to these social demands, more advanced medical technologies such as regular health monitoring for disease prevention, early diagnosis of diseases, and personalized diagnosis and treatment are required.

Accordingly, the demand for biosensors for medical examinations for timely treatment and prevention of diseases has increased, and accordingly, the biosensor market is rapidly expanding worldwide. Although a blood sample-based screening method is common, recently, with the development of a high-sensitivity sensor, a patient-friendly non-invasive sensing method capable of screening with body fluids such as urine, saliva, and tears is also being actively studied. This change in the sensor market paradigm and the rapid development of micro- and nano-materials and the development of analysis technology enable the development of miniaturized, high-sensitivity sensors that detect nano-sized biomarkers (proteins, genes, peptides, and cytokines) present in body fluids. As a result, various types of sensors have been actively reported in various academic fields such as chemistry, physics, materials, and medicine.

This movement has accelerated as we enter the digital health care era with the 4th industrial revolution, and with the start of the development of advanced biosensors, various capabilities such as signal transmission using communication networks and real-time health checkups are required.

As illustrated in FIG. 7, various functions are required for biosensor development. For example, research on signal transmission, communication, and signal conversion is conducted mainly in electrical engineering, computer engineering, and mechanical engineering, while research on the operating part that drives the sensor is in progress mainly in chemistry, biotechnology, materials engineering, and chemical and biological engineering.

It can be said that the sensitivity and function of the sensor are very important for precise examination for early diagnosis. Among the characteristics of a sensor, the ability to recognize an analyte is the most important key factor in determining the sensitivity of a sensor.

The factors that determine the sensitivity of the sensor can be divided into two main categories. The first is a receptor site that recognizes the target material, and the second is a transducer that generates a signal after recognition and converts it into a desired signal form. Antibodies, aptamers, peptides, nucleic acid sequences, and the like that are capable of recognizing a target are attached to the receptor site, so the recognition ability is determined by their affinity for the target. Because the recognition ability also depends on the method used to introduce the receptor onto the sensor surface (chemical, physical, or biological) and on the structural stability of the receptor, research has been conducted to optimize these factors. However, the structures that can interact with the target material are limited, and there is a limit to improving sensor sensitivity by this approach alone, so the need for a new approach has begun to be emphasized. Accordingly, while stabilizing the sensor recognition unit, research has also focused on improving the system that generates a signal after a target material is recognized, that is, on a signal transduction site that can recognize even a trace amount of material and receive it as a sensor signal.

The development of new materials and of measuring devices has also contributed to improved sensor sensitivity. Above all, because micro- and nano-sized materials can now be manufactured and patterned, it has become possible to miniaturize the sensor while also improving its sensitivity, measuring signals, and shortening the assay time. In particular, the development of nanomaterials not only improves the signal amplification and sensitivity of the biosensor but also exploits the optical, physical, and chemical properties of the material itself to secure new types of sensor signal generation systems, leading to innovative progress in diagnostic technology. For example, one core element of high-tech biosensors is the in vitro diagnostic sensor using nanoparticles. In this case, the sensor signal detection method differs according to the characteristics of the nanoparticles.

In particular, the measurement sensitivity and specificity of Raman spectroscopy have greatly improved through recent technological developments, so it has high utility in the field of microbial ecology research. It can therefore supplement studies on bacterial identification, functional analysis, and biogeography in order to analyze the functional role of microorganisms in the environment. The use of Raman spectroscopy enables real-time analysis of Raman signatures unique to bacteria (detection) and analysis of substrate specificity in combination with isotopes (functional analysis), thus compensating for the limitations of existing analysis methods.

An object of the present invention is to provide a method for constructing and searching a Raman scattering spectrum database through machine learning in order to derive desired information from data produced in a biological system, and to provide an apparatus for collecting, indexing, and storing desired biometric information from a Raman spectrum database by performing the above method.

Another object of the present invention is to provide computer software for deriving desired information from data produced in a biological system using a Raman spectrum database constructed through machine learning.

A first aspect of the present invention is a Raman scattering spectrum database construction and search method through machine learning,

a first step of generating a Raman spectrum of each sample by deriving, from each sample, (a) one or more Raman shift values and, at each shift value, (b) the Raman peak, which is the relatively highest value among the Raman intensities on the vertical axis, and (c) the Raman lowest point, which is the relatively lowest value among the Raman intensities on the vertical axis;

a second step of generating (a') Raman shift values and (b') Raman peaks and (c') Raman troughs at each shift value according to a machine learning algorithm that performs

Step 2-1 of section learning the Raman shift value (a) of the first step;

Step 2-2 of cluster learning the Raman peak (b) of the first step; and

Step 2-3 of cluster learning the Raman lowest point (c) of the first step;

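a third step of inferring by machine learning, based on the (a') Raman shift values and the (b') Raman peaks and (c') Raman troughs at each shift value generated in the second step, (d) sensitivity, defined as the ratio of spectra in which the signal-to-noise ratio within a given repeatability is 50% or more; (e) stability, defined as the composition ratio of spectra in which the distribution range of given spectrum values is above the standard deviation (δ) range (μ-δ, μ+δ) of the normal distribution mean (μ); and (f) repeatability, defined as a spectrum with a stability of 50% or more and a sensitivity of 50% or more, as defined above, having repeatability based on the number of measurements within a given inspection time;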

a fourth step of calculating fractional bandwidth;

a fifth step of calculating spectral selectivity by selecting a spectrum having repeatability and having a stability of 80% or more and a sensitivity of 90% or more as a selection value within the shift as defined in the third step;

(a) Raman shift values of the first step, (b) Raman peaks and (c) Raman troughs at each shift value; (a') Raman shift values generated by machine learning in the second step, (b') Raman highest points and (c') Raman lowest points at each shift value; (d) sensitivity, (e) stability and (f) repeatability, inferred by machine learning in the third step; and a sixth step of constructing a Raman spectrum database by inputting the spectral selectivity calculated in the fifth step; and

optionally, a seventh step of calculating desired prediction information from the Raman spectrum database constructed in the sixth step, by inputting (a) the Raman shift values and (b) the Raman peaks and (c) the Raman troughs at each shift value of the sample in the first step;

The first aspect thus provides a method for building and searching a Raman scattering spectrum database through machine learning, characterized by including the above steps.

A second aspect of the present invention provides a medium or computer-readable recording medium that transmits to a computer a program for executing at least one of the first to ninth steps, so that the Raman scattering spectrum database construction and search method through machine learning according to the first aspect is performed on the computer.

A third aspect of the present invention provides an apparatus for calculating desired bio-prediction information from a Raman spectrum database,

an information receiving unit (A) that collects a Raman spectrum list of a sample by deriving, from a biologically derived liquid sample, (a) one or more Raman shift values and, at each shift value, (b) a Raman peak, which is the relatively highest value among the Raman intensities on the vertical axis, and (c) a Raman minimum, which is the relatively lowest value among the Raman intensities on the vertical axis;

a Raman spectrum database (B) constructed by

using the information included in the Raman spectrum list of the sample as inputs to (a-1) an algorithm for section learning the Raman shift values, (b-1) an algorithm for cluster learning the Raman peaks, and (c-1) an algorithm for cluster learning the Raman lowest points,

inferring by machine learning, based on the (a') Raman shift values and the (b') Raman peaks and (c') Raman troughs at each shift value generated by the machine learning algorithm, (d) sensitivity, defined as the ratio of spectra in which the signal-to-noise ratio within a given repeatability is 50% or more; (e) stability, defined as the composition ratio of spectra in which the distribution range of given spectrum values is above the standard deviation (δ) range (μ-δ, μ+δ) of the normal distribution mean (μ); and (f) repeatability, defined as a spectrum with a stability of 50% or more and a sensitivity of 50% or more, as defined above, having repeatability based on the number of measurements within a given inspection time,

calculating spectral selectivity by selecting, as the selection value within the shift, a spectrum having repeatability, a stability of 80% or more, and a sensitivity of 90% or more, and

inputting the information included in the Raman spectrum list of the sample, namely (a) the Raman shift values and (b) the Raman peaks and (c) the Raman troughs at each shift value; the (a') Raman shift values and the (b') Raman peaks and (c') Raman troughs at each shift value generated by the machine learning algorithm; the (d) sensitivity, (e) stability, and (f) repeatability inferred from these by machine learning; and the calculated spectral selectivity; and

optionally, a biometric information prediction unit (C) that calculates desired biometric prediction information from the constructed Raman spectrum database (B) when (a) the Raman shift values and (b) the Raman peaks and (c) the Raman troughs at each shift value of the sample are input.

The apparatus for calculating biometric prediction information is characterized in that it comprises the above units.

The apparatus for calculating desired biometric prediction information from the Raman spectrum database according to the third aspect may perform the Raman scattering spectrum database construction and search method through machine learning according to the first aspect.

Therefore, the apparatus for calculating desired biometric prediction information from the Raman spectrum database according to the present invention corresponds to an apparatus for constructing and searching a Raman scattering spectrum database through a kind of machine learning.

Hereinafter, the present invention will be described.

The present invention is characterized by building a Raman scattering spectrum database and a pattern matching algorithm based on unsupervised machine learning techniques in order to accurately predict biometric information such as the presence and/or concentration of a specific biological material (e.g., protein, amino acid, lipid, nucleic acid), and the identification and/or concentration of chemical bonds and constituents derived from cells (e.g., bacteria, cancer cells, normal cells) and/or of cell types.

Specifically, in order to accurately predict and calculate biometric information, the biometric prediction information calculating device according to the present invention uses, as input information to the machine learning algorithm, (a) one or more Raman shift values included in the Raman spectrum list of a biological sample and (b) the Raman highest point and (c) the Raman lowest point at each shift value. By learning the feature set of the biological sample with the machine learning algorithm, the device calculates or processes specific information for predicting the state of the animal or cell from which the biological sample is extracted, disease diagnosis and/or evaluation of the effectiveness of a therapeutic agent, and/or bacterial infection information of the sample.

When the detection indicator from which the Raman shift value (a) is to be derived is connected to nanoparticles dispersed in a liquid sample, surface analysis Raman spectroscopy using the localized surface plasmon resonance (LSPR) of the nanoparticles may be performed to measure the Raman shift value (a). In this case, the surface analysis Raman spectroscopy may be performed after concentrating or filtering the nanoparticles to which the detection indicator is connected.

Therefore, the Raman scattering spectrum database construction and search method through machine learning according to the present invention includes:

Step 2-1 of section learning the Raman shift value (a) of the first step;

Step 2-2 of cluster learning the Raman peak (b) of the first step;

Step 2-3 of cluster learning the Raman lowest point (c) of the first step;

a fourth step of calculating fractional bandwidth; and

optionally, a seventh step of calculating desired prediction information from the Raman spectrum database constructed in the sixth step, by inputting (a) the Raman shift values and (b) the Raman peaks and (c) the Raman troughs at each shift value of the sample in the first step.

The sample to be constructed of the Raman spectrum database may be a liquid sample.

Unlike measurements in a stationary phase, (a) the one or more Raman shift values and (b) the Raman peaks and (c) the Raman troughs at each shift value included in the Raman spectrum list of a biological sample do not appear as fixed values that can be intuitively confirmed each time they are measured in a liquid sample. The present invention is therefore more useful when constructing and searching a Raman spectrum database from liquid samples in order to predict the above-described biometric information.

Therefore, in the method of constructing and searching a Raman scattering spectrum database through machine learning according to the present invention, when a Raman spectrum database dedicated to liquid samples is separately constructed in the sixth step of constructing the Raman spectrum database, the method may include:

optionally, in the case of a liquid sample, an eighth step of separating and storing spectra through spectral intensity indexing; and

optionally, in the case of a liquid sample, a ninth step of filtering noise spectrum intensity using spectral pattern matching.

In the present invention, it is preferable to set a baseline as the average value of the standard deviation for each Raman shift of a signal experimentally output through a negative control, and to derive all Raman spectra after performing baseline correction.
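One possible reading of this baseline rule can be sketched as follows. The sketch assumes that the negative control is measured several times, that a standard deviation is taken at each Raman shift, that these values are averaged into a single baseline, and that correction means subtracting that baseline from a sample spectrum. The function names and the dictionary layout are hypothetical, and the subtraction step is an assumption rather than something the description specifies.

```python
from statistics import mean, pstdev


def baseline_from_negative_control(control_runs):
    """Average the per-shift standard deviations of the negative control.

    control_runs maps each Raman shift (cm^-1) to the list of intensities
    measured for the negative control at that shift (hypothetical layout).
    """
    per_shift_std = [pstdev(values) for values in control_runs.values()]
    return mean(per_shift_std)


def baseline_correct(spectrum, baseline):
    """Subtract the scalar baseline from every intensity (assumed correction)."""
    return {shift: intensity - baseline for shift, intensity in spectrum.items()}


# Illustrative numbers only, not measured data.
control = {500: [1.0, 3.0], 600: [2.0, 6.0]}
baseline = baseline_from_negative_control(control)
corrected = baseline_correct({500: 10.0, 600: 4.0}, baseline)
```

Using a single scalar baseline keeps the sketch close to the wording "average value of the standard deviation"; a per-shift baseline would be an equally plausible variant.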

[Step 1]

The first step is a step of generating a Raman spectrum of each sample by deriving, from each sample, (a) one or more Raman shift values and, at each shift value, (b) the Raman peak, which is the highest value among the Raman intensities on the vertical axis, and (c) the Raman lowest point, which is the lowest value among the Raman intensities on the vertical axis.

(a) Raman shift

As shown in FIG. 2 , when light (laser) is incident on a target molecular material and is scattered, scattering in which the amount of energy is not the same is called Raman scattering, and a change in the energy level is called Raman shift.

In general, the Raman value is expressed not as the wavelength of the Raman-scattered light itself but as the Raman shift in wavenumbers (cm⁻¹).

The Raman shift value may be derived by Equation 1 or Equation 2 below.

[Equation 1]

[Equation 2]
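As a point of reference, the Raman shift is conventionally expressed in wavenumbers as the difference between the reciprocal of the excitation wavelength and the reciprocal of the scattered wavelength, with a factor of 10⁷ when the wavelengths are given in nm and the shift in cm⁻¹. The sketch below uses that textbook relation. Treating it as the content of Equations 1 and 2 is an assumption, and the function name is hypothetical.

```python
def raman_shift_cm1(excitation_nm, scattered_nm):
    """Raman shift in cm^-1 from excitation and scattered wavelengths in nm.

    Textbook relation: shift = (1 / lambda_0 - 1 / lambda_1) * 1e7,
    where the factor 1e7 converts nm^-1 to cm^-1.
    """
    return (1.0 / excitation_nm - 1.0 / scattered_nm) * 1e7


# A scattered wavelength longer than the excitation wavelength (Stokes side)
# gives a positive shift; the wavelengths here are illustrative.
stokes_shift = raman_shift_cm1(532.0, 563.5)
```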

As illustrated in FIG. 3 , one or more Raman shift values may be derived from one sample.

As illustrated in FIG. 3, (b) the Raman highest point and (c) the Raman lowest point are obtained at each Raman shift value. Within the Raman shift range of 400 cm⁻¹ to 3200 cm⁻¹, for example within the Raman shift range of the inspection equipment (e.g., 500 cm⁻¹ to 2,000 cm⁻¹), the lowest and highest points can be obtained at each shift value while moving in increments of 1 to 5 shifts, preferably 1 shift.

(b) Raman peak

Since the Raman intensity is generally expressed in arbitrary units (a.u.), it is not an absolute number, and the Raman peak can be defined as the relatively highest intensity within the Raman shift range.

As illustrated in FIG. 3 , the Raman peak may be derived from the relatively highest value among the Raman intensities of the vertical axis in each Raman shift value.

(c) Raman trough

Since the Raman intensity is generally expressed in arbitrary units (a.u.), it is not an absolute number, and the Raman lowest point can be defined as the relatively lowest intensity within the Raman shift range.

As illustrated in FIG. 3 , the Raman lowest point may be derived from the relatively lowest value among the Raman intensities of the vertical axis in each Raman shift value.
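The derivation of (b) and (c) at each shift value can be sketched as below. The sketch assumes that a sample is measured several times and that each measurement is held as a mapping from integer Raman shift to intensity; the function name, the data layout, and the default range are hypothetical choices for illustration.

```python
def peaks_and_troughs(measurements, start=500, stop=2000, step=1):
    """Return {shift: (highest, lowest)} over repeated measurements.

    measurements is a list of spectra, each a dict {shift_cm1: intensity}.
    A shift is reported if at least one spectrum contains it.
    """
    result = {}
    for shift in range(start, stop + 1, step):
        values = [m[shift] for m in measurements if shift in m]
        if values:
            result[shift] = (max(values), min(values))
    return result


# Two illustrative measurements covering two shift values.
runs = [{500: 1.0, 501: 4.0}, {500: 3.0, 501: 2.0}]
table = peaks_and_troughs(runs, start=500, stop=501)
```

The `step` argument corresponds to moving through the range by 1 to 5 shifts at a time.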

[Step 2]

The second step is a step of generating learned (a') Raman shift values and (b') Raman peaks and (c') Raman troughs at each shift value, according to a machine learning algorithm that performs Step 2-1 of section learning the Raman shift value (a) of the first step, Step 2-2 of cluster learning the Raman peak (b) of the first step, and Step 2-3 of cluster learning the Raman lowest point (c) of the first step.

The second step is performed to select a clustering candidate group through unsupervised machine learning, and is a preprocessing process to reduce over-fitting as much as possible.

Machine learning collects training data through an algorithm and then builds a more accurate model based on that data. The (a') Raman shift values and the (b') Raman peaks and (c') Raman troughs at each shift value are the outputs generated when the machine learning algorithm is trained on the data, and together they provide a machine learning model. After training, input provided to the machine learning model yields the corresponding output.
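The description does not name specific algorithms for section learning or cluster learning. As a minimal sketch, section learning is taken here to mean grouping shift values into fixed-width sections, and cluster learning is represented by a plain one-dimensional k-means. Both choices, along with the function names and parameters, are assumptions used only to make the data flow concrete.

```python
def section_of(shift, width=5):
    """Map a Raman shift to the lower edge of its fixed-width section."""
    return (shift // width) * width


def kmeans_1d(values, k=2, iterations=20):
    """Plain 1-D k-means (an assumed stand-in); returns sorted centroids."""
    ordered = sorted(values)
    # Spread the initial centroids evenly over the sorted values.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)]
                 for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Keep the previous centroid when a group ends up empty.
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return sorted(centroids)


# Illustrative inputs: shift values (a) and peak intensities (b).
sections = sorted({section_of(s) for s in [1001, 1003, 1007, 1012]})
peak_centroids = kmeans_1d([1.0, 1.2, 0.9, 5.0, 5.1, 4.9], k=2)
```

The same clustering routine could be applied to the troughs (c) to obtain (c').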

[Step 3]

The third step is a step of inferring (d) sensitivity, (e) stability, and (f) repeatability by machine learning, based on the (a') Raman shift values and the (b') Raman peaks and (c') Raman troughs at each shift value generated in the second step.

In the third step, main peaks capable of substantially identifying a molecule are selected from among the Raman peak clusters formed in the previous step, and peaks that satisfy the following conditions are selected as final candidates for the main peak.

(d) sensitivity

In the present invention, sensitivity may be defined as a ratio of a spectrum in which a signal to noise ratio within a given repeatability is 50% or more.

(e) stability

In the present invention, stability may be defined as a spectrum composition ratio in which the distribution range of a given spectrum value is above the standard deviation (δ) range of the normal distribution mean (μ) (μ-δ, μ+δ).

(f) repeatability

In the present invention, repeatability may be defined as a spectrum with a stability of 50% or more and a sensitivity of 50% or more, as defined above, being repeatable based on the number of measurements within a given inspection time.
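The three quantities can be illustrated with the sketch below. It rests on several assumptions: sensitivity is computed as the fraction of repeated spectra whose signal-to-noise ratio is at least 0.5, stability as the fraction of intensity values lying inside [μ-δ, μ+δ] (one reading of the definition above), and repeatability as a yes/no check that both fractions are at least 0.5. The function names and the way the signal-to-noise ratio is scaled are hypothetical.

```python
from statistics import mean, pstdev


def sensitivity(snr_values, threshold=0.5):
    """Fraction of repeated spectra whose SNR is at or above the threshold."""
    return sum(1 for s in snr_values if s >= threshold) / len(snr_values)


def stability(intensities):
    """Fraction of intensities inside [mu - delta, mu + delta] (assumed reading)."""
    mu = mean(intensities)
    delta = pstdev(intensities)
    inside = [x for x in intensities if mu - delta <= x <= mu + delta]
    return len(inside) / len(intensities)


def is_repeatable(sens, stab, minimum=0.5):
    """Repeatable when both stability and sensitivity reach the 50% level."""
    return sens >= minimum and stab >= minimum


# Illustrative values for one shift measured repeatedly.
sens = sensitivity([0.9, 0.6, 0.4, 0.7])
stab = stability([10.0, 10.0, 10.0, 10.0, 20.0])
repeatable = is_repeatable(sens, stab)
```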

[Step 4]

The fourth step is to calculate the fractional bandwidth.

Fractional Bandwidth can be said to be a relative bandwidth, which means the bandwidth with respect to the center frequency.

Fractional Bandwidth = bandwidth / center frequency

The reason why such a concept is necessary is that bandwidth is not simply a matter to be considered absolutely, but rather a concept to be considered relative to the center frequency.

The fourth step is to select only a single peak within the same fractional bandwidth as a representative among the final selected main peak candidates. If there are several main peak candidates within the same fractional bandwidth, the peak that best meets the condition is selected among them, and if the same score is obtained, the left candidate peak closer to the center frequency is selected.
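The ratio and the one-peak-per-band rule can be sketched as follows. The description does not say how "best meets the condition" is scored, so each candidate carries a hypothetical numeric score. The tie-break is implemented as one reading of the text: the candidate nearest the center wins, and if that is also tied, the lower (left) shift wins.

```python
def fractional_bandwidth(bandwidth, center_frequency):
    """Relative bandwidth: bandwidth divided by the center frequency."""
    return bandwidth / center_frequency


def pick_representative(candidates, center):
    """Choose one peak from the candidates that share a fractional bandwidth.

    candidates is a list of (shift, score) pairs; score is a hypothetical
    measure of how well the peak meets the Step 3 conditions.
    """
    return min(candidates,
               key=lambda c: (-c[1], abs(c[0] - center), c[0]))


# Illustrative band centered at 1000 cm^-1 with a 50 cm^-1 bandwidth.
fbw = fractional_bandwidth(50.0, 1000.0)
chosen = pick_representative([(990, 0.9), (1004, 0.9), (1020, 0.7)], 1000)
```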

[Step 5]

The fifth step is a step of calculating spectral selectivity by selecting, as the selection value within the shift, a spectrum having repeatability as defined in the third step, a stability of 80% or more, and a sensitivity of 90% or more.

The fifth step is to select one representative main peak that best fits the condition among the main peak candidates within each fractional bandwidth. Through this, representative peaks to be used for identification of the final molecule are selected.

selectivity

In the present specification, selectivity means that a spectrum having repeatability as defined above, a stability of 80% or more, and a sensitivity of 90% or more is selected as the selection value within the shift.
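The selection rule reduces to a threshold check, sketched below. The 80% and 90% thresholds come from the text; the dictionary keys and function names are hypothetical, and sensitivity and stability are assumed to be fractions between 0 and 1.

```python
def is_selected(sens, stab, repeatable, min_stab=0.8, min_sens=0.9):
    """True when the spectrum is repeatable and meets both thresholds."""
    return repeatable and stab >= min_stab and sens >= min_sens


def select_shifts(candidates):
    """Return the shifts of the candidates that pass the selection rule.

    Each candidate is a dict with the keys shift, sensitivity, stability,
    and repeatable (a hypothetical layout).
    """
    return [c["shift"] for c in candidates
            if is_selected(c["sensitivity"], c["stability"], c["repeatable"])]


# Illustrative candidates; only the first one meets every condition.
selected = select_shifts([
    {"shift": 1003, "sensitivity": 0.95, "stability": 0.85, "repeatable": True},
    {"shift": 1450, "sensitivity": 0.95, "stability": 0.70, "repeatable": True},
    {"shift": 1600, "sensitivity": 0.92, "stability": 0.90, "repeatable": False},
])
```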

[Step 6]

The sixth step is a step of building a Raman spectrum database by inputting (a) the Raman shift values and (b) the Raman peaks and (c) the Raman minimums at each shift value of the first step; the (a') Raman shift values and the (b') Raman peaks and (c') Raman troughs at each shift value generated by machine learning in the second step; the (d) sensitivity, (e) stability, and (f) repeatability inferred by machine learning in the third step; and the spectral selectivity calculated in the fifth step.

The (a) one or more Raman shift values derived from each sample and the (b) Raman peaks and (c) Raman lowest points at each shift value form the first data set of the sample, which is to be trained on in the second step, and can be expressed as Raman spectra.

In addition, the second data set constructed through machine learning may be the (a') Raman shift values and the (b') Raman peaks and (c') Raman lowest points at each shift value generated by learning in the second step. This may also be expressed as Raman spectra derived through machine learning for the corresponding sample.

Further, the third data set inferred by machine learning includes: (d) sensitivity, (e) stability and (f) repeatability, inferred by machine learning in the third step; and the spectral selectivity calculated in the fifth step.

The first data set, the second data set, and the third data set may be constructed as the Raman scattering spectrum database of the present invention.

The sixth step of constructing the Raman spectrum database may store the following items (i) to (iv):

(i) the code assigned to the substance;

(ii) any selected Raman shift values corresponding to the material;

(iii) the relative intensity values for all selected shift values; and

(iv) the baseline difference value of the negative control used for the shift.
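Items (i) to (iv) suggest a simple per-substance record, sketched below. The class name, field names, in-memory dictionary, and the code "S-001" are all hypothetical; the description does not specify a storage format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubstanceRecord:
    code: str                                # (i) code assigned to the substance
    selected_shifts: List[float]             # (ii) selected Raman shifts, cm^-1
    relative_intensities: Dict[float, float]  # (iii) relative intensity per shift
    baseline_differences: Dict[float, float]  # (iv) negative-control baseline diff


database: Dict[str, SubstanceRecord] = {}


def store(record: SubstanceRecord) -> None:
    """Index the record by its substance code."""
    database[record.code] = record


# Illustrative entry with made-up values.
store(SubstanceRecord(
    code="S-001",
    selected_shifts=[1003.0, 1450.0],
    relative_intensities={1003.0: 1.0, 1450.0: 0.4},
    baseline_differences={1003.0: 0.02, 1450.0: 0.03},
))
```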

[Step 7]

The seventh step is a step of calculating desired prediction information from the Raman spectrum database constructed in the sixth step when (a) the Raman shift values and (b) the Raman highest point and (c) the Raman lowest point at each shift value of the sample in the first step are input.

The prediction information calculated in the seventh step includes (a') Raman shift values generated by learning in the second step, (b') Raman highest points and (c') Raman lowest points at each shift value; (d) sensitivity, (e) stability and (f) repeatability, inferred by machine learning in the third step; and one or more output values obtained through a specific function by inputting one or more values selected from the group consisting of the spectral selectivity calculated in the fifth step.

In this case, the function may have a relationship as shown in FIG. 5 .

The prediction information (output value) output through the function (hidden layer) may be state information of the animal or cell from which the sample, which is the target for deriving the Raman shift value (a) in the first step, is extracted; disease diagnosis and/or evaluation of the effect of a therapeutic agent; and/or bacterial infection information of the sample.

In addition, the prediction information (output value) output through the function (hidden layer) may be the presence and/or concentration of specific biomaterials (e.g., proteins, amino acids, lipids, nucleic acids), or the identification and/or concentration of chemical bonds and constituents derived from cells (e.g., bacteria, cancer cells, normal cells) and/or of cell types.
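The idea of a function with a hidden layer that maps input values to output values can be illustrated with a minimal forward pass. The layer sizes, the ReLU activation, and the weights below are placeholders chosen for illustration; they are not taken from FIG. 5 and do not represent a trained model.

```python
def relu(x):
    """Rectified linear activation (an assumed choice)."""
    return x if x > 0.0 else 0.0


def forward(inputs, hidden_weights, hidden_bias, output_weights, output_bias):
    """One hidden layer followed by a linear output layer."""
    hidden = [relu(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(hidden_weights, hidden_bias)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(output_weights, output_bias)]


# Two inputs (for example a sensitivity and a stability), two hidden
# units, one output; all numbers are placeholders.
prediction = forward(
    inputs=[1.0, 2.0],
    hidden_weights=[[1.0, 0.0], [0.0, -1.0]],
    hidden_bias=[0.0, 0.0],
    output_weights=[[2.0, 3.0]],
    output_bias=[0.5],
)
```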

[Step 8]

The eighth step is a step of separately storing spectra through spectral intensity indexing in the case of a liquid sample, in order to construct Raman spectra dedicated to liquid samples when a Raman spectrum database dedicated to liquid samples is separately constructed in the sixth step of constructing the Raman spectrum database.

[Step 9]

In the ninth step, in the case of a liquid sample, noise spectrum intensity reduction filtering using spectral pattern matching is performed in order to separately construct a Raman spectrum database dedicated to liquid samples in the sixth step of constructing the Raman spectrum database.

The ninth step may be to determine the spectral pattern matching based on the coincidence rate with the Raman shift values selected based on the material.

Determining the spectral pattern matching by the coincidence rate with the Raman shift values selected based on the material may be by using at least one of the following methods (i) to (iv):

(i) noise is defined as 50% or less of the signal-to-noise ratio at each shift;

(ii) the minimum and maximum values of the obtained Raman shift values are matched to the previously obtained reference minimum and maximum values for the material, and the ratio is adjusted;

(iii) values within 1% on either side of each selected Raman shift value are judged to be identical;

(iv) if the matching rate over all selected spectra is 95% or more, the spectra are judged to be consistent.
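Rules (iii) and (iv) above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the interpretation of the matching rate as the fraction of reference shifts that find a measured shift within tolerance, and the example shift values are all hypothetical.

```python
def shift_matches(measured, reference, tolerance=0.01):
    """Rule (iii): a measured shift within +/-1% of the reference counts as identical."""
    return abs(measured - reference) <= tolerance * reference

def matching_rate(measured_shifts, reference_shifts, tolerance=0.01):
    """Fraction of reference shifts matched by at least one measured shift."""
    if not reference_shifts:
        raise ValueError("reference_shifts must not be empty")
    hits = sum(
        any(shift_matches(m, ref, tolerance) for m in measured_shifts)
        for ref in reference_shifts
    )
    return hits / len(reference_shifts)

def is_consistent(measured_shifts, reference_shifts, threshold=0.95):
    """Rule (iv): judged consistent when the matching rate reaches the threshold."""
    return matching_rate(measured_shifts, reference_shifts) >= threshold

# Hypothetical reference shifts (cm^-1) selected for a material, and a measurement.
reference = [740.0, 1004.0, 1654.0]
measured = [742.0, 1003.0, 1660.0, 1200.0]

print(matching_rate(measured, reference))
print(is_consistent(measured, reference))
```

With only a few reference shifts, a single miss pushes the rate below 95%, so the threshold is only meaningful once enough shift values have been selected for the material.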

Because biosensors bear directly on human health, quality of life, and life itself, efforts to improve accuracy must accompany efforts to increase sensor sensitivity. As the sensitivity of a sensor increases, so does its susceptibility to non-specific signals, so false-positive results may be reported. Because false positives can lead to adverse consequences such as drug abuse and unnecessary treatment, this issue should be considered first when developing a sensor.

The Raman scattering spectrum database construction and search method through machine learning according to the present invention can solve this problem.

FIG. 1 is a schematic diagram of the algorithm that drives the Raman scattering spectrum database construction and search method through machine learning according to the present invention.

FIG. 2 is a schematic diagram for explaining the principle of Raman scattering.

FIG. 3 is an exemplary diagram for explaining a method of obtaining (a) one or more Raman shift values, (b) the Raman highest point, and (c) the Raman lowest point at each shift value, which constitute the Raman spectrum information of a sample.

FIG. 4 is a schematic diagram explaining artificial intelligence (AI), the overall category that includes machine learning and natural language processing.

FIG. 5 is a schematic diagram illustrating the architecture of a neural network as a kind of function.

FIG. 6 is a schematic diagram of propagating plasmon and localized surface plasmon resonance (LSPR).

FIG. 7 is a schematic diagram illustrating the components of a biosensor and their relationship.

FIG. 8 is a schematic diagram illustrating the operating principle of a biosensor using metal nanoparticles exhibiting localized surface plasmon resonance (LSPR).

FIG. 9 is a diagram showing the operating principle of a nucleic acid-based self-assembly complex (NEW structure) for Raman measurement, which acts as a turn-off signal sensor in the presence of a target nucleic acid.

FIG. 10 is a graph showing the results of 100 repeated measurements of the Raman signal obtained using the NEW structure prepared in Example 1.

FIG. 11 shows the Raman signal obtained using the NEW structure prepared in Example 1; the left graph is the negative control (NC) state, in which the NEW structure remains fully intact without dissociation in the absence of the target nucleic acid, and the right graph is the positive control (PC) state, in which the structure is completely dissociated in the presence of the target nucleic acid.

FIG. 12 shows the Raman signal obtained using the NEW structure prepared in Example 1, and shows that, in a turn-off manner, the signal gradually decreases as the amount of target nucleic acid present increases.

FIG. 13 is a conceptual schematic diagram of an exemplary computer server used to process the systems and methods described herein.

The terms used in this specification have been selected, as far as possible, from general terms that are currently in wide use, but in certain cases there are also terms arbitrarily selected by the applicant. Such terms should be understood according to their meaning rather than their name alone.

Hereinafter, the technical configuration of the present invention will be described in detail with reference to the accompanying drawings and preferred embodiments.

1. Machine Learning

Machine learning is a form of AI that enables a system to learn from data rather than through explicit programming (FIG. 4). However, machine learning is not a simple process. As the algorithm ingests training data, a more accurate model can be created based on that data. A machine learning model is the output generated when a machine learning algorithm is trained with data. After training, an input provided to the model yields an output. For example, a predictive algorithm produces a predictive model; when data is provided to the predictive model, it returns a prediction based on the data with which the model was trained.
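The train-then-predict cycle described above can be illustrated with a deliberately simple model. The sketch below uses a nearest-centroid classifier, which is a stand-in chosen for brevity and not the algorithm of the present invention; the feature vectors (a highest and a lowest Raman intensity at one shift, pre-scaled) and the labels are hypothetical.

```python
import math

def train(samples):
    """samples: list of (feature_vector, label). Returns one mean vector per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(model, features):
    """Return the label whose mean vector is closest to the input features."""
    return min(model, key=lambda label: math.dist(model[label], features))

# Hypothetical training data: [highest intensity, lowest intensity] at one shift.
training = [
    ([0.95, 0.80], "negative"),
    ([0.90, 0.78], "negative"),
    ([0.30, 0.15], "positive"),
    ([0.35, 0.20], "positive"),
]

model = train(training)              # training produces the model
print(predict(model, [0.33, 0.18]))  # new input -> prediction
```

The model here is nothing more than the per-label averages, which makes the point that a "model" is simply what training leaves behind for later predictions.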

2. Iterative Learning in Machine Learning

Machine learning allows a model to be trained on a data set before it is deployed. Some machine learning models are online and continuously updated. The iterative process of such online models can improve the types of associations made between data elements. Because of their complexity and size, these patterns and associations are easily overlooked by human observers. After a model has been trained, it can be used in real time to learn from data. The learning process and the automation involved in machine learning can improve accuracy.

3. Approaches to Machine Learning

Machine learning techniques are needed to improve the accuracy of predictive models. Depending on the type and volume of data, various approaches are available, as follows:

3-1. Supervised learning

Supervised learning usually begins with an established data set and a solid understanding of how that data is classified. Supervised learning is intended to find patterns in data that can be applied to an analytic process. The data have labeled features that define the meaning of the data.

3-2. Unsupervised learning

Unsupervised learning is used when a problem involves a huge amount of unlabeled data. Understanding the meaning behind such data requires algorithms that classify the data based on discovered patterns or clusters. Unsupervised learning performs an iterative process and analyzes data without user intervention.

3-3. Reinforcement learning

Reinforcement learning is a behavioral learning model. Feedback from data analysis is applied to the algorithm to guide the user to an optimal result. Reinforcement learning differs from supervised learning in that the system is not trained with a sample data set; instead, the system learns through trial and error. A series of successful decisions therefore reinforces the process, because it best solves the problem at hand.

3-4. Deep learning

Deep learning is a specific machine learning methodology that integrates neural networks into successive layers so that they can iteratively learn from data. Deep learning is especially useful when learning patterns from unstructured data. Thus, computers can be trained to deal with poorly defined abstractions and problems.

4. Big Data in Machine Learning Environments

Machine learning requires applying an appropriate data set to the learning process. Pre-processing big data to suit the intended purpose can increase the reliability of the raw data and of the learning results, and the availability of big data can help increase the accuracy of machine learning models. Big data can be virtualized so that it is stored in the most efficient and cost-effective way, whether on premises or in the cloud. Additionally, improvements in network speed and reliability may remove other physical limitations associated with managing large amounts of data at acceptable rates.

5. Comprehensive Operational Methods of Machine Learning

The advantage of machine learning is that it can use algorithms and models to predict outcomes. The key is to use the right algorithms, collect the most appropriate data (that is, accurate and clean data), and ensure that the best-performing models are consistently available. When all these elements are combined, the model can be trained continuously by learning from the data and re-learning from the results. By automating this process of modeling, model training, and testing, accurate predictions can be derived, and various kinds of bio-prediction information can be provided from the Raman spectrum database.

6. Raman Shift

In Raman spectroscopy, when strong light of a single wavelength is irradiated onto a material, most of the light undergoes elastic scattering, but part of the light exchanges energy with molecular vibrations and is scattered at a different frequency (inelastic scattering). Raman spectroscopy is a method of analyzing the chemical composition and structure of molecules using this Raman scattering phenomenon, known as the Raman effect. The degree to which Raman-scattered light is shifted relative to elastically scattered light is called the Raman shift, and the characteristics of a medium can be expressed by plotting it as a spectrum.

Therefore, Raman spectroscopy can, for example, examine cellular components of bacteria such as proteins, lipids, and nucleic acids, and the signal intensities, which differ according to the characteristics of the components across the Raman shift range of 400-3200 cm⁻¹, can be expressed as a Raman spectrum. Theoretically, each bacterium has its own Raman spectrum, and some bacteria can be distinguished at the species level. In addition, the expression of genetic traits under various environmental conditions affects the cell composition, which appears as a change in the Raman spectrum, allowing information on cell status within the same species to be confirmed. In general, when measuring single cells, a 532 nm laser, which has the least background effect from fluorescence, is used. Table 1 shows the Raman shift information of chemical bonds and cellular components exhibiting resonance with a 532 nm laser.

In addition, chemical bonds such as C-C and C-N and the characteristics of various constituents such as DNA and amino acids can be confirmed. In particular, the amino acid phenylalanine, found in typical Raman spectra of bacteria, can be measured at a Raman shift of 1004 cm⁻¹ and is used as a major factor in determining the accuracy of Raman spectra of bacteria.
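For reference, a Raman shift in cm⁻¹ is the difference between the wavenumbers of the excitation light and the scattered light. The sketch below applies this standard conversion to the 532 nm laser and the 1004 cm⁻¹ phenylalanine band mentioned above; the function names are hypothetical, and Stokes scattering (scattered light at a longer wavelength) is assumed.

```python
def raman_shift_cm1(excitation_nm, scattered_nm):
    """Raman shift (cm^-1) from excitation and scattered wavelengths in nm."""
    # 1 / (wavelength in nm) * 1e7 gives the wavenumber in cm^-1.
    return 1e7 / excitation_nm - 1e7 / scattered_nm

def scattered_wavelength_nm(excitation_nm, shift_cm1):
    """Wavelength (nm) at which a band with the given Stokes shift is observed."""
    return 1e7 / (1e7 / excitation_nm - shift_cm1)

# Phenylalanine band at 1004 cm^-1 under 532 nm excitation.
wavelength = scattered_wavelength_nm(532.0, 1004.0)
print(f"{wavelength:.2f} nm")  # about 562.02 nm

# Upper end of the 400-3200 cm^-1 range under the same excitation.
print(f"{scattered_wavelength_nm(532.0, 3200.0):.1f} nm")
```

Because the shift is a difference in wavenumber, the same band appears at a different absolute wavelength when a different excitation laser is used, while its Raman shift value stays the same.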

In the medical field, Raman spectroscopy is used to detect infectious bacteria and to diagnose diseases caused by bacterial infection. For example, urinary tract infection can be diagnosed by detecting Escherichia coli and Enterococcus faecalis, the main causes of the inflammation, in clinical samples, and the method can also be applied to evaluating the effect of prescribed antibiotics. In particular, in the case of E. coli, the treatment effect of the four antibiotics ampicillin, ciprofloxacin, gentamicin, and sulfamethoxazole can be confirmed directly. The method can also be used for tuberculosis diagnosis by constructing a database for Mycobacterium tuberculosis, the major pathogen of tuberculosis. As such, applying Raman spectroscopy to real-time bacterial detection technology allows it to be used for disease diagnosis.

In addition to the medical field, Raman spectroscopy is being actively used in the food field. Salmonella spp., Escherichia coli, Pseudomonas aeruginosa, Listeria monocytogenes, Legionella spp., and Staphylococcus aureus, which cause foodborne illness, can be detected quickly. Currently, it is possible to detect infectious bacteria contained in milk and meat (chicken, minced meat, etc.) at the single-cell level. In the case of Salmonella spp. detection technology, the use of Raman spectroscopy is common in the food field to the extent that ISO international standards have been established for various foods. In addition, Pseudomonas aeruginosa and Legionella spp. can be detected in tap water and commercially available drinking samples, and this can be used for water quality and health management.

Furthermore, by utilizing amplification technologies such as SERS or TERS, directly irradiating cells for Raman spectroscopy is included in the disease diagnosis process in the medical field and is utilized to prevent diseases such as cancer. Breast cancer cells are known to have a lower signal intensity than normal cells at a Raman shift of 1003 cm⁻¹, and platelets extracted from mice transplanted with an Alzheimer's gene are known to have a higher signal intensity at 740 cm⁻¹ and 1654 cm⁻¹ than the normal control group. With such high measurement sensitivity and specificity, the technique can be applied to disease diagnosis by discriminating minute differences in Raman scattering signals.

Even without a separate amplification technology, the sensitivity of biomolecule measurement is greatly improved, and the range of applications is expanding to analysis at the single-cell level. In some cases, a separate sample pretreatment process such as bacterial fixation, culture, and hybridization can be omitted, and the measurement time (10 to 60 seconds) is short, enabling real-time analysis.

In addition, since there is little interference from water during measurement, liquid samples such as cultures or environmental sample extracts can be measured as they are, and since there is little risk of cell destruction even after measurement, the detected single cells can be separated and used for culture and single-cell genome analysis. These unique advantages of Raman spectroscopy make it possible to analyze the characteristics of bacteria present in environmental samples at the single-cell level.

Furthermore, physiological activity can be analyzed using SIP-Raman technology. Stable isotope probing (SIP) is a way of studying specificity for a particular substrate by analyzing bacteria grown on substrates containing stable isotopes such as carbon-13 and nitrogen-15. In this case, cell components such as nucleic acids and amino acids are labeled with the stable isotopes, and the labeled components show Raman shift values different from the usual measured values, so they can be distinguished from bacteria cultured on a general substrate. For example, the Raman shift of carbon-12 phenylalanine is observed at 1004 cm⁻¹, whereas phenylalanine labeled with carbon-13 shows a Raman signal at 967 cm⁻¹. In addition to phenylalanine, the physiological activity of bacteria can be measured directly by comparing the Raman shift differences that arise when cell components are substituted with the isotopes carbon-13, nitrogen-15, and hydrogen-2.

To identify bacteria that decompose naphthalene in groundwater, single bacterial cells can be measured after culturing with carbon-13-labeled naphthalene as the carbon source. In such a measurement, phenylalanine of Acidovorax, a bacterium that decomposes naphthalene, showed a Raman signal at a Raman shift of 967 cm⁻¹, unlike other single bacterial cells, which verifies that Acidovorax uses naphthalene as a carbon source. As another example, when carbon-13 sodium bicarbonate is injected to detect carotenoid-containing cyanobacteria in an environmental sample, the Raman signal of phenylalanine of the labeled cyanobacteria appears at 991 cm⁻¹. The intensity at the Raman shift changes in proportion to the ratio of cell constituents labeled with carbon-13. This means that bacterial activity on a substrate can be analyzed quantitatively using the intensity information at the Raman shift. However, if the labeling degree of the carbon-13 substrate is less than 10%, it cannot be distinguished from carbon-12 because of the detection limit of Raman spectroscopy. When a nitrogen isotope is used, the change in Raman shift is smaller than with a carbon isotope, and the change occurs mainly in the Raman shift values of nucleic acids, whereas carbon showed a difference in amino acids. Even these changes are buried among other Raman shifts, making them very difficult to distinguish in complex samples. When E. coli cultured in media with different ratios of nitrogen-14 ammonium chloride and nitrogen-15 ammonium chloride is measured by Raman spectroscopy, no change in the position of the Raman shift can be confirmed, but the signal intensity increases in proportion to the concentration of the isotope supplied.
Because a change in intensity that is not accompanied by a change in the position of the Raman shift can also arise from differences in experimental method (focus control, measurement position on the bacterium, etc.), there is a limit to distinguishing the physiological activity of bacteria in this way. Therefore, to observe the changes due to nitrogen isotopes more clearly, amplification techniques such as SERS can be combined. Hydrogen-2 (D; deuterium), a hydrogen isotope, is mainly used to study lipid metabolism, and cell components labeled with hydrogen-2 (C-D bonds) can be measured at a Raman shift of 2000-2300 cm⁻¹. When no isotope is used, no distinct signal exists in the 2000-2300 cm⁻¹ Raman shift region, but a new signal is measured when C-D bonds are present. In addition, when cell components other than lipids are sufficiently labeled with hydrogen-2, a shift of the phenylalanine Raman band to 959 cm⁻¹ has also been reported, similar to the carbon isotope result.
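The proportionality between band intensity and the carbon-13 labeled fraction described above suggests a simple ratio estimate. The sketch below is one possible way to express it, assuming baseline-corrected intensities and equal Raman response for the labeled (967 cm⁻¹) and unlabeled (1004 cm⁻¹) phenylalanine bands; the ratio formula, function names, and example intensities are assumptions, while the 10% limit comes from the text.

```python
def labeled_fraction(intensity_labeled, intensity_unlabeled):
    """Estimate the labeled fraction from two baseline-corrected band intensities."""
    total = intensity_labeled + intensity_unlabeled
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return intensity_labeled / total

def is_distinguishable(fraction, detection_limit=0.10):
    """Labeling below about 10% cannot be told apart from carbon-12."""
    return fraction >= detection_limit

# Hypothetical intensities at 967 cm^-1 (carbon-13) and 1004 cm^-1 (carbon-12).
fraction = labeled_fraction(300.0, 700.0)
print(fraction)                      # 0.3
print(is_distinguishable(fraction))  # True
print(is_distinguishable(labeled_fraction(50.0, 950.0)))  # False
```

This kind of ratio only makes sense where the isotope moves the band position, as with carbon-13; for nitrogen-15, where only the intensity changes, the text notes that experimental variation limits such comparisons.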

Since lipid-containing portions of bacteria are extensive, the technique can be applied, beyond simple detection, to monitoring lipid metabolism using imaging techniques. However, even if an experiment is performed with a general substrate in which hydrogen is replaced by the hydrogen-2 isotope, the amount of hydrogen incorporated into cell constituents is relatively small, so it is difficult to confirm the change in the Raman shift value. For example, in Geobacter metallireducens cultured with hydrogen-2-substituted acetic acid as a substrate, it is difficult to observe a change in the Raman signal due to isotope use. Physiological activity can instead be analyzed by replacing the culture medium with heavy water (D₂O). Because hydrogen ions from the culture medium are incorporated during lipid biosynthesis, bacteria that are active on a substrate can be analyzed through the Raman shift without using isotope-substituted substrates.

Experiments using heavy water may be difficult because environmental samples contain various substrates (including organic matter generated from cadavers).

7. Analyte to Be Detected by a Raman Signal at a Known Raman Shift Value

For example, the analyte to be detected may include amino acids, peptides, polypeptides, proteins, glycoproteins, lipoproteins, nucleosides, nucleotides, oligonucleotides, nucleic acids, sugars, carbohydrates, oligosaccharides, polysaccharides, fatty acids, lipids, hormones, metabolites, cytokines, chemokines, receptors, neurotransmitters, antigens, allergens, antibodies, substrates, metabolites, cofactors, inhibitors, drugs, pharmaceuticals, nutrients, prions, toxins, poisons, explosives, pesticides, chemicals inorganic agents, biohazardous agents, radioisotopes, vitamins, heterocyclic aromatic compounds, carcinogens, mutagens, anesthetics, amphetamines, barbiturates, hallucinogens, wastes or contaminants, and the like.

In addition, when the analyte is a nucleic acid, the nucleic acid may be a gene, viral RNA or DNA, bacterial DNA, fungal DNA, mammalian DNA, cDNA, mRNA, an RNA or DNA fragment, an oligonucleotide, a synthetic oligonucleotide, a modified oligonucleotide, a single-stranded or double-stranded nucleic acid, or a natural or synthetic nucleic acid.

In addition, non-limiting examples of biomolecules capable of recognizing the analyte include antibodies, antibody fragments, genetically engineered antibodies, single chain antibodies, receptor proteins, binding proteins, enzymes, inhibitor proteins, lectins, cell adhesion proteins, oligonucleotides, polynucleotides, nucleic acids, and aptamers.

When an analyte is detected by a Raman signal at a known Raman shift value, if (a) the analyte itself or (b) a biomolecule capable of recognizing the analyte is a molecule or compound in which polarization occurs, the Raman signal at the Raman shift value of (a) the analyte itself or (b) the biomolecule capable of recognizing the analyte may be measured. Alternatively, a Raman indicator (marker), described later, may be linked to (a) the analyte itself or (b) the biomolecule capable of recognizing the analyte, and the Raman signal at the Raman shift value of the Raman indicator may be measured.

8. Raman Spectroscopy for Surface Analysis Using Localized Surface Plasmon Resonance (LSPR)

Au and Ag have a high free electron density compared to other metals and are very stable because of their relatively low ionization tendency. The high free electron density makes the real part of the dielectric constant of the metal negative and gives the metal a large polarization, causing strong electric field enhancement. The imaginary part indicates the degree of light absorption, which is an energy loss, so its value must be small for effective enhancement.

Accordingly, in the case of Au, the real part of the dielectric constant is relatively low at about 630 nm in the visible region, where the imaginary part is also at its lowest. In the case of Ag, when both the real part and the imaginary part of the permittivity are considered, efficient enhancement is possible at about 530 nm.
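The roles of the real and imaginary parts can be illustrated with the standard quasi-static expression for a small metal sphere, whose polarizability is proportional to (ε − ε_m)/(ε + 2ε_m). This small-sphere approximation is not stated in the present description and is used here only as an illustrative assumption; the permittivity values below are made-up numbers, not measured data for Au or Ag.

```python
def polarizability_factor(eps_metal, eps_medium=1.0):
    """Quasi-static factor (eps - eps_m) / (eps + 2 * eps_m) for a small sphere."""
    return (eps_metal - eps_medium) / (eps_metal + 2 * eps_medium)

# Both hypothetical metals have a negative real part that nearly cancels the
# denominator (Re eps = -2 * eps_medium); only the imaginary (loss) part differs.
low_loss = abs(polarizability_factor(complex(-2.0, 0.1)))
high_loss = abs(polarizability_factor(complex(-2.0, 1.0)))

print(f"low loss:  {low_loss:.2f}")   # about 30.02
print(f"high loss: {high_loss:.2f}")  # about 3.16
```

With the same negative real part, the smaller imaginary part gives roughly a tenfold larger factor, which is consistent with the statement that absorption loss must be small for effective enhancement.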

Surface plasmon (FIG. 6) refers to a collective oscillation of free electrons propagating along the interface between a metal having a dielectric constant less than zero, such as Ag or Au, and the adjoining dielectric having a dielectric constant greater than zero. When the frequency of the electromagnetic field (generally visible light) incident on the metal coincides with the frequency of the surface plasmon, resonance occurs and the field is enhanced beyond that of the incident wave; this phenomenon is called surface plasmon resonance (SPR). The SPR takes the form of an evanescent wave that decreases exponentially with distance from the interface. SPR includes propagating plasmon, which occurs on a thin metal plane, and localized surface plasmon resonance (LSPR), which occurs on metal nanoparticles (FIG. 6).

Surface-enhanced Raman spectroscopy (SERS) uses the principle that light irradiated on the surface of metal nanoparticles such as silver or gold causes plasmon resonance on the material surface, which amplifies the Raman scattering signal. That is, it uses the phenomenon in which the Raman scattering signal of a target molecule is greatly increased when the molecule is present in the vicinity of a metal nanostructure. One of the advantages of surface-enhanced Raman scattering analysis is that it can provide information that is difficult to obtain with general Raman analysis.

Surface-enhanced Raman spectroscopy (SERS) may be introduced to amplify a Raman scattering signal that is relatively difficult to detect due to a small Raman scattering cross-section. By using metal nanoparticles such as silver (Ag) or gold (Au), the Raman signal of the sample adsorbed on the surface of the nanoparticles can be amplified and detected by the interaction between the metal nanoparticles and incident light.

At this time, the amplification degree of the signal varies depending on the shape and size of the metal nanoparticles and the type of metal, and also depends on the angle, wavelength, and polarization of the incident light. By controlling these various factors, even the Raman scattering signal of a single molecule can be confirmed with SERS. However, although SERS overcomes the shortcomings of the small scattering cross-sectional area of Raman spectroscopy, high-resolution Raman images cannot be obtained due to the diffraction limit of the optical system.

As a technique capable of overcoming these limitations in the chemical analysis of samples, there is tip-enhanced Raman spectroscopy (TERS), which combines scanning probe microscopy (SPM) and Raman spectroscopy. TERS is a Raman spectroscopy technique developed using the principle of surface-enhanced Raman spectroscopy (SERS).

9. Characteristics of Metal Nanoparticles Exhibiting Localized Surface Plasmon Resonance (LSPR)

Metal nanoparticles are actively used in in vivo and in vitro diagnostic fields because of their excellent durability and the unique physical, chemical, and electrochemical properties that depend on their size. Because the signal is determined by the material, shape, and size of the metal nanoparticles, a unique signal can be produced without an additional labeling material, which has the advantage of generating a stable signal for a long time. Another advantage of metal nanoparticles is that they can amplify the signals of fluorescent substances and small-molecule labeling substances based on the material properties of the metal. For example, the plasmon resonance of metal nanoparticles can amplify optical signals such as Raman signals and fluorescent molecular signals.

In addition, surface-modified metal nanoparticles can improve sensor function, for example by amplifying an electrochemical signal and improving sensitivity and selectivity. Besides their use as sensors, metal nanoparticles are used in a variety of clinical, pharmaceutical, and cancer treatment delivery applications.

Metal nanoparticles exhibiting localized surface plasmon resonance (LSPR) can be synthesized as metal nanoparticles themselves, biofunctionalized metal nanoparticles, metal nanocomposites, or nanohybrids ( FIG. 8 ).

On the other hand, it is important to synthesize high-purity nanomaterials with certain physicochemical characteristics, surface charge, and shape. In the case of nanoparticles, it is easy to control the size, and various materials can be used depending on the purpose, and since they are often synthesized in an aqueous solution, a large amount of material can be synthesized in a relatively easy way.

The high surface-area-to-volume ratio of metal nanoparticles increases the efficiency of catalysts and improves the sensitivity of sensors, which can offer many advantages for clinical applications. As a result, very small amounts of cells and biomarkers in the human body can be detected with high sensitivity in medical diagnosis and clinical analysis, and local tissue sites can be examined in detail. Metal nanomaterial-based electrochemical sensor and biosensor platforms can be developed and used to detect very small amounts of samples, mainly of clinical and biological origin, and to find biomedically important analytes.

10. Raman Indicator Linked to Nanoparticles to Derive the Raman Shift Value (a) in Surface Analysis Raman Spectroscopy Using Localized Surface Plasmon Resonance (LSPR) of the Nanoparticles

In the present invention, a Raman scattering signal of a Raman indicator may be obtained through Raman spectroscopy.

Raman indicators may be organic or inorganic molecules, atoms, complexes or synthetic molecules, dyes, naturally occurring dyes (phycoerythrin, etc.), organic nanostructures such as C₆₀, buckyballs, carbon nanotubes, quantum dots, organic fluorescent molecules, and the like. Specifically, non-limiting examples of a Raman indicator include FAM, Dabcyl, TRITC (tetramethyl rhodamine-5-isothiocyanate), MGITC (malachite green isothiocyanate), XRITC (X-rhodamine-5-isothiocyanate), DTDC (3,3-diethylthiadicarbocyanine iodide), TRIT (tetramethyl rhodamine isothiol), NBD (7-nitrobenz-2-1,3-diazole), phthalic acid, terephthalic acid, isophthalic acid, para-aminobenzoic acid, erythrosine, biotin, digoxigenin, 5-carboxy-4',5'-dichloro-2',7'-dimethoxyfluorescein, 5-carboxy-2',4',5',7'-tetrachlorofluorescein, 5-carboxyfluorescein, 5-carboxyrhodamine, 6-carboxyrhodamine, 6-carboxytetramethyl amino phthalocyanine, azomethine, cyanine (Cy3, Cy3.5, Cy5), xanthine, succinylfluorescein, aminoacridine, quantum dots, carboisotropes, cyanide, thiol, chlorine, bromine, methyl, phosphorus or sulfur. The Raman indicator should show a clear Raman spectrum and is preferably an organic fluorescent molecule, such as a cyanine-based fluorescent molecule (Cy3, Cy3.5, Cy5) or a FAM-, Dabcyl-, or rhodamine-based fluorescent molecule. Organic fluorescent molecules have the advantage that a higher Raman scattering signal can be detected because they resonate with the excitation laser wavelength used for Raman analysis.

As an example of the Raman indicator, a fluorescent substance is a substance that absorbs light according to its unique structure and, when the excited molecule loses energy and returns to the stable ground state, undergoes a radiative process in which the energy is emitted again as light.

Such a fluorescent material can be used in liquid-based assays or in imaging in a living environment. Since the signal (or color) of such a fluorescent material can be controlled through its molecular structure, multiplexed detection is possible, and since some fluorescent molecules have affinity only for a specific material, a selective reaction is possible. However, when these fluorescent materials are exposed to light for a long time, the signal intensity decreases, making long-term monitoring difficult, and the detectable fluorescence intensity is weak. To overcome these disadvantages, a large amount of fluorescent material can be integrated, or the fluorescence intensity can be amplified by using metal nanoparticles.

By introducing a fluorescent material onto a metal nanoparticle that also carries a receptor capable of recognizing a target material, the signal of the fluorescent material connected to the nanoparticle exhibiting localized surface plasmon resonance (LSPR) can be amplified even when only a single target is recognized, and this may increase the sensitivity.

In the case of gold nanoparticles, it is possible to detect biochemicals using the Au-thiol reaction to easily modify the surface and attach a large amount of material to the surface of the nanoparticles. Gold nanoparticles not only have a very high extinction coefficient, but also can act as a quencher because they have a broad absorption wavelength that overlaps most of the emission wavelengths of commonly used energy donors. By utilizing these characteristics, it is possible to develop a sensor with an on/off signal system.

As an example, DNA with a hairpin structure is designed with a fluorescent material attached to its end and is bound to gold nanoparticles that exhibit localized surface plasmon resonance (LSPR). As the fluorescent material moves away from the surface of the gold nanoparticles exhibiting LSPR, the Raman intensity of the fluorescent material turns off, and biochemicals can be detected by measuring this change.

11. Target Nucleic Acid Detection Method Using a Raman Signal and Nucleic Acid-Based Self-Assembly Complex for Target Nucleic Acid Detection

According to an embodiment of the present invention, the Raman scattering spectrum database construction and search method through machine learning can be used in a method of detecting a target nucleic acid using a Raman signal derived from a nucleic acid-based self-assembly complex in a liquid, the method comprising the following steps:

Step I of preparing a target nucleic acid detection reagent designed to measure the change in the Raman signal when a nucleic acid-based self-assembly complex is not formed, or is disassembled, owing to hybridization between the first nucleotide and the target nucleic acid in the presence of the target nucleic acid, the complex being formed by self-assembly between (a) a first nanoparticle-based structure in which at least one first nucleotide that hybridizes with the target nucleic acid depending on conditions is linked to a first metal nanoparticle, and (b) a second nanoparticle-based structure in which a second nucleotide complementary to the first nucleotide by at least 10 base pairs (bp) is linked to a second metal nanoparticle;

Step II of carrying out a hybridization reaction in a nucleic acid-containing liquid sample with the target nucleic acid detection reagent of step I, which contains (a) the first nanoparticle-based structure to which the first nucleotide is linked and (b) the second nanoparticle-based structure to which the second nucleotide is linked;

Step III of measuring the Raman signal derived from the nucleic acid-based self-assembly complex in the liquid sample before, after, and/or simultaneously with the hybridization reaction of step II; and

Step IV providing detection and/or quantitative data of the target nucleic acid in the sample through an algorithm that analyzes the Raman signal measured in step III or its change value

includes

In this case, the target nucleic acid detection reagent of step I forms a nucleic acid-based self-assembly complex from (a) a first nanoparticle-based structure in which a first nucleotide that hybridizes with a target nucleic acid is linked to a first metal nanoparticle, and (b) a second nanoparticle-based structure in which a second nucleotide complementary to the first nucleotide is linked to a second metal nanoparticle, by spontaneous bonding at the molecular level between the first nucleotide and the second nucleotide, that is, complementary hydrogen bonding of at least 10 base pairs.

When a nucleic acid-based self-assembly complex is formed, (i) a nanogap is formed by two adjacent metal nanoparticles, (ii) the nanogap is a space that generates and further strengthens a surface plasmon resonance phenomenon (electromagnetic effect) upon irradiation with light, and (iii) a Raman indicator linked to the second oligonucleotide may be positioned in the nanogap to enhance the Raman scattering signal detected during light irradiation (FIG. 9).

In addition, the first nucleotide that hybridizes with the target nucleic acid may include an oligonucleotide probe having a nucleic acid sequence that hybridizes with a partial sequence of the target nucleic acid depending on conditions, a spacer, and an oligonucleotide adhesive that attaches to the nanoparticle. The second nucleotide, complementary to the first nucleotide by 10 bp or more, may include a 2-1 oligonucleotide adhesive having a nucleic acid sequence complementary to the oligonucleotide probe of the first nucleotide, a spacer that increases the freedom of movement to assist complementary hydrogen bonding of 10 bp or more, and a 2-2 oligonucleotide adhesive that attaches to the nanoparticle.

The target nucleic acid detection reagent of the present invention is characterized in that the change value of the Raman signal can be measured when the nucleic acid-based self-assembly complex is not formed, or is disassembled, by hybridization between the first nucleotide and the target nucleic acid in the presence of the target nucleic acid (FIGS. 9 to 12).

Steps III and IV may be performed using the Raman scattering spectrum database construction and search method through machine learning of the present invention (FIGS. 10 to 12).

For example, if hybridization with the target nucleic acid detection reagent is performed in a nucleic acid-containing liquid sample, and the Raman signal derived from the nucleic acid-based self-assembly complex in the liquid sample is measured before, after and/or simultaneously with the hybridization reaction, that is, (a) the Raman shift values and, at each shift value, (b) the Raman highest point and (c) the Raman lowest point are measured and input, desired prediction information can be calculated from the constructed Raman spectrum database.
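The three inputs named above can be illustrated with a minimal Python sketch. It assumes that the "highest point" and "lowest point" at a shift value are the largest and smallest intensities observed at that shift across repeated measurements; this reading, the function name `extract_features`, and the sample numbers are illustrative assumptions rather than anything specified in this document.

```python
from collections import defaultdict

def extract_features(spectra):
    """Return (shift, highest, lowest) for every Raman shift.

    spectra: list of repeated measurements, each a list of
    (shift_cm1, intensity) pairs. Assumption: the highest and lowest
    points are taken across the repeated measurements at each shift.
    """
    by_shift = defaultdict(list)
    for spectrum in spectra:
        for shift, intensity in spectrum:
            by_shift[shift].append(intensity)
    # Sort by shift so the output order is deterministic.
    return [(s, max(v), min(v)) for s, v in sorted(by_shift.items())]

# Hypothetical repeated measurements at two shift values (arbitrary units).
repeats = [
    [(1470, 120.0), (1580, 95.0)],
    [(1470, 131.0), (1580, 90.0)],
    [(1470, 118.0), (1580, 99.0)],
]
features = extract_features(repeats)
```

Each resulting tuple could then serve as one input row (a), (b), (c) for a database query, under the stated assumption.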

For example, the target nucleic acid may be a genome or a fragment thereof, and through detection and/or quantification of the target nucleic acid, various predictive information such as identification of viruses and/or microorganisms or diagnosis of diseases and/or evaluation of the effectiveness of therapeutic agents can be calculated.

The nucleic acid-based self-assembly complex functions as a turn-off signal sensor in the presence of a target nucleic acid. For example, the nucleic acid-based self-assembly complex forms a precisely and structurally defined nanogap between two metal nanoparticles and can exert localized surface plasmon resonance (LSPR) that amplifies the Raman scattering signal. To secure the enhanced Raman scattering signal reproducibly, a Raman indicator is located in the nanogap, and the formation of the nanogap is inversely linked to the presence or absence of the target nucleic acid to be measured. The reagent thus forms or contains a nucleic acid-based self-assembly complex for Raman that can serve as a sensor with an on/off signal system and thereby permit quantitative analysis of the target nucleic acid (FIG. 9).

Therefore, the target nucleic acid detection reagent according to one embodiment of the present invention can confirm whether or not the nucleic acid-based self-assembly complex is formed and/or the degree of formation (quantitation) with the Raman signal captured from the nucleic acid-based self-assembly complex in the liquid, and from this it is possible to detect or quantify a target nucleic acid that hybridizes with the first nucleotide so that the nucleic acid-based self-assembly complex is not formed or is disassembled (FIG. 12).

For example, a target nucleic acid detection reagent that forms or contains a nucleic acid-based self-assembly complex at a known concentration has a Raman marker signal at its maximum in the absence of the target nucleic acid, and the Raman marker signal decreases as the target nucleic acid increases. In the presence of an excess of the target nucleic acid relative to the known concentration of the nucleic acid-based self-assembly complex, the Raman signal reaches its minimum (FIG. 11). Therefore, it is possible to secure or predict the minimum and maximum reference points of the Raman signal for each concentration of a nucleic acid-based self-assembly complex that can be formed in the target nucleic acid detection reagent (FIG. 12).
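As a hedged illustration of how the minimum and maximum reference points could be used, the sketch below rescales a measured Raman marker signal to a fraction between 0 (target-free maximum) and 1 (excess-target minimum). The linear rescaling and the name `normalized_turn_off` are assumptions made for illustration; the document does not specify this formula.

```python
def normalized_turn_off(signal, sig_min, sig_max):
    """Fraction of the signal range lost relative to the target-free maximum.

    sig_max: reference signal with no target nucleic acid present.
    sig_min: reference signal with excess target nucleic acid.
    The result is clamped to [0, 1] so noise outside the reference
    points does not produce out-of-range values.
    """
    if sig_max <= sig_min:
        raise ValueError("sig_max must be greater than sig_min")
    fraction = (sig_max - signal) / (sig_max - sig_min)
    return min(1.0, max(0.0, fraction))

# Hypothetical reference points and reading (arbitrary intensity units).
fraction_off = normalized_turn_off(600.0, sig_min=200.0, sig_max=1000.0)
```

A value near 0 would indicate that most complexes remain assembled, and a value near 1 that most have been prevented from forming or disassembled.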

In addition, the minimum and maximum reference points of the Raman signal for each concentration of the nucleic acid-based self-assembly complex that can be formed in the target nucleic acid detection reagent can be machine-learned through the Raman scattering spectrum database construction and search method through the machine learning of the present invention.

12. Computer software related inventions accompanying biotechnology inventions

The present invention provides, as computer software for deriving target information from data produced in a biological system, a medium for transmitting to a computer a program for executing at least one of the first to ninth steps, so that the Raman scattering spectrum database construction and search method through machine learning of the present invention is performed on the computer; or a computer-readable recording medium storing the program.

In the present specification, a computer is a device having information processing capability. Information processing is the operation or processing of information according to the purpose of use.

In this specification, software is a set of instructions and commands (including audio or image information) that enable commands, input, processing, storage, output, and interaction with equipment such as a computer and its peripheral devices.

In the present specification, a computer program is a program installed in a computer to perform a specific function, and is a set of instructions suitable for executing the first to ninth steps with a computer.

In the present specification, a data recording medium is a computer-readable medium on which data is recorded with a structure such that the processing performed by a computer is specified by the recorded data structure.

In some embodiments, the method of calculating desired prediction information from the constructed Raman spectrum database by inputting (a) Raman shift value(s) and, at each shift value of a sample, (b) Raman peaks and (c) Raman troughs is processed on a server or computer server (FIG. 13). In some implementations, server 401 includes a central processing unit (CPU, also “processor”) 405, which is a single-core processor, a multi-core processor, or multiple processors for parallel processing. In some implementations, the processor used as part of the control assembly is a microprocessor. In some implementations, server 401 may also include memory 410 (e.g., random access memory, read-only memory, flash memory); an electronic storage unit 415 (e.g., a hard disk); a communication interface 420 (e.g., a network adapter) for communicating with one or more other systems; and peripheral devices 425 including cache, other memory, data storage, and/or electronic display adapters. The memory 410, the storage unit 415, the interface 420, and the peripheral devices 425 communicate with the processor 405 via a communication bus (solid line), such as a motherboard. In some implementations, the storage unit 415 is a data storage unit for storing data. The server 401 is operatively coupled to a computer network (“network”) 430 with the aid of the communication interface 420. In some implementations, a processor assisted by additional hardware is also operatively coupled to the network. In some implementations, network 430 is the Internet, an intranet and/or extranet, an intranet and/or extranet that communicates with the Internet, or a telecommunications or data network.

In some implementations, network 430, assisted by server 401, implements a peer-to-peer network, which enables a device coupled to server 401 to act as a client or server. In some embodiments, the server can transmit and receive computer-readable instructions (e.g., device/system operating protocols or parameters) or data (e.g., sensor measurements, raw data obtained from detection of metabolites, analysis of raw data obtained from detection of metabolites, interpretation of raw data obtained from detection of metabolites, etc.) via electronic signals transmitted over the network 430. Moreover, in some implementations, a network is used, for example, to transmit or receive data across international boundaries.

In some implementations, the server 401 communicates with one or more output devices 435 such as a display or printer, and/or one or more input devices 440 such as, for example, a keyboard, mouse, or joystick. In some implementations, the display is a touch screen display, in which case it functions as both a display device and an input device. In some implementations, different and/or additional input devices are present, such as enunciators, speakers, or microphones. In some implementations, the server uses any one of a variety of operating systems, such as, for example, Windows®, or MacOS®, or any one of several versions of Unix®, or Linux®.

In some implementations, the storage unit 415 stores files or data related to the operation of an apparatus, system, or method described herein.

In some implementations, the server communicates with one or more remote computer systems via a network 430 . In some implementations, the one or more remote computer systems include, for example, personal computers, laptops, tablets, telephones, smartphones, or personal digital terminals.

In some implementations, the control assembly includes a single server 401 . In other contexts, a system includes multiple servers that communicate with each other via intranets, extranets, and/or the Internet.

In some implementations, server 401 is adapted to store device operating parameters, protocols, methods described herein, and other potentially relevant information. In some implementations, such information is stored on storage unit 415 or server 401 and such data is transmitted over a network.

A general communication network, communication line, etc. may transmit predetermined information such as a program or data.

Non-limiting examples of computer-readable recording media include hard disks, floppy disks, magnetic recording media, and optical recording media, and non-limiting examples of transmission media include a transmission (communication) medium, a carrier wave, and a transmission (communication) mechanism.

13. Light source and Raman detection device

Laser light is coherent (in-phase) light of a single wavelength. In general, a laser beam is narrow and does not spread. Lasers are mainly used in spectroscopy because of their precisely defined monochromatic wavelengths.

Fundamentally, the disadvantage of Raman spectroscopy is that the signal strength is weak, so it is preferable to use a laser capable of providing high-power incident light, that is, high-density photons, as a light source. Accordingly, it is preferable to include a photomultiplier tube (PMT), an avalanche photodiode (APD), a charge coupled device (CCD), or the like, which can effectively amplify the detection signal as the detector.

In the present invention, (i) the surface-enhanced Raman effect of the metal nanoparticles using localized surface plasmon resonance (LSPR), (ii) the amplification level of the Raman scattering signal intensity of the Raman indicator, further amplified by the nanogap, and/or (iii) the Raman shift value of the Raman indicator may vary depending on the wavelength of the laser incident light used in the Raman analysis.

The method of acquiring a Raman scattering signal through Raman spectroscopy may be performed by any known Raman spectroscopy; preferably, surface enhanced Raman scattering (SERS), surface enhanced resonance Raman spectroscopy (SERRS), hyper-Raman and/or coherent anti-Stokes Raman spectroscopy (CARS) may be used.

Any suitable form or configuration of Raman spectroscopy or related techniques known in the art may be used for analyte detection, including normal Raman scattering, resonance Raman scattering, surface enhanced Raman scattering, surface enhanced resonance Raman scattering, coherent anti-Stokes Raman spectroscopy (CARS), stimulated Raman scattering, inverse Raman spectroscopy, stimulated gain Raman spectroscopy, hyper-Raman scattering, molecular optical laser examiner (MOLE) or Raman microprobe or Raman microscopy or confocal Raman microspectroscopy, three-dimensional or scanning Raman, Raman saturation spectroscopy, time-resolved resonance Raman, Raman dissociation spectroscopy or UV-Raman microscopy.

In the present invention, the Raman detection apparatus may include a computer. The above embodiment places no restrictions on the type of computer used. An example computer may include a bus for exchanging information and a processor for processing information. A computer may further include random access memory (RAM) or other dynamic storage devices, read-only memory (ROM) or other static storage devices, and data storage devices such as magnetic or optical disks and corresponding drives. Computers also include peripheral devices known in the art, such as display devices (e.g., cathode ray tubes or liquid crystal displays), alphanumeric input devices (e.g., keyboards), cursor control devices (e.g., mouse, trackball, or cursor arrow keys), and communication devices (e.g., a modem, network interface card or interface device used to couple with an Ethernet, token ring, or other type of network).

In the present invention, the Raman detection apparatus may be operatively coupled with a computer. Data from the detection device may be processed by a processor and the data stored in main memory. Data on emission profiles for standard analytes may also be stored in main memory or ROM. The processor may compare the emission spectra from the analyte on the Raman active substrate with these stored profiles to determine the analyte type of the sample. The processor may analyze data from the detection device to determine the identity and/or concentration of various analytes. Differently equipped computers may be used for specific implementations. Accordingly, the structure of the system may differ in different embodiments of the present invention. After a data collection operation, the data will typically be passed to a data analysis operation. To facilitate the analysis, the data obtained by the detection device will typically be analyzed using a digital computer as described above. Typically, the computer will be suitably programmed for receiving and storing data from the detection device, as well as for analysis and reporting of the collected data.
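The comparison of a measured spectrum against stored profiles of standard analytes can be sketched as follows. Cosine similarity is used here only as one common, simple similarity measure; the document does not name a comparison metric, and the library entries, names, and intensities below are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equally sampled intensity vectors."""
    if len(a) != len(b):
        raise ValueError("spectra must be sampled at the same shift values")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify(measured, library):
    """Return the name of the stored profile most similar to the measurement."""
    return max(library, key=lambda name: cosine_similarity(measured, library[name]))

# Hypothetical stored profiles, sampled at the same four shift values.
library = {
    "analyte_A": [0.1, 0.9, 0.2, 0.1],
    "analyte_B": [0.8, 0.1, 0.1, 0.7],
}
best_match = identify([0.2, 1.7, 0.5, 0.2], library)
```

Because cosine similarity ignores overall scale, a measurement that is simply brighter or dimmer than the stored profile still matches it, which is convenient when absolute intensity varies between runs.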

A non-limiting example of a Raman detection device is disclosed in US Pat. No. 6,002,471. The excitation beam is generated by a frequency-doubled Nd:YAG laser at a wavelength of 532 nm or a frequency-doubled Ti:sapphire laser at a wavelength of 365 nm. A pulsed laser beam or a continuous laser beam may be used.

Another example of a detection device is disclosed in US Pat. No. 5,306,403, which is a Spex Model 1403 double grating spectrometer equipped with a gallium-arsenide photomultiplier tube (RCA Model C31034 or Burle Industries Model C3103402) operating in a single photon counting mode. Excitation sources include a 514.5 nm line argon-ion laser from SpectraPhysics, Model 166, and a 647.1 nm line of a krypton-ion laser (Innova 70, Coherent).

Other sources of excitation include nitrogen lasers at 337 nm (Laser Science Inc.) and helium-cadmium lasers at 325 nm (Liconox) (US Pat. No. 6,174,677), light emitting diodes, Nd:YLF lasers, and/or various ion lasers and/or dye lasers. The excitation beam may be spectrally purified by a bandpass filter (Corion) and focused on the Raman active substrate using a 6X objective lens (Newport, Model L6X).

Hereinafter, the present invention will be described in more detail through examples. However, the following examples are only for clearly illustrating the technical features of the present invention, and do not limit the protection scope of the present invention.

Example 1: Preparation of a target nucleic acid detection reagent containing a nucleic acid-based self-assembly complex

As illustrated in FIG. 9, the target nucleic acid detection reagent prepared in Example 1 contains a nucleic acid-based self-assembly complex (NEW construct). The NEW construct is self-assembled in a water-based solvent, through complementary hydrogen bonding of the first nucleotide and the second nucleotide, from (a) a first nanoparticle-based structure in which a first nucleotide that hybridizes with the target nucleic acid is linked to spherical gold nanoparticles with a diameter of 20-30 nm and (b) a second nanoparticle-based structure in which a second nucleotide complementary to the first nucleotide is linked to spherical gold nanoparticles with a diameter of 20-30 nm.

In this case, the target nucleic acid is synthesized with a nucleotide sequence (12 mer to 30 mer) that can be identified as the genome of the currently prevalent coronavirus.

An oligonucleotide probe having a nucleotide sequence that hybridizes with a target nucleic acid, a C3 spacer of the following Formula 1 that increases the freedom of movement to facilitate complementary hydrogen bonding of 10 bp or more, and an oligonucleotide adhesive (poly-adenine 10mer) that attaches to the nanoparticles were sequentially ligated to prepare a first nucleotide that hybridizes with the target nucleic acid.

In addition, a second nucleotide complementary to the first nucleotide was prepared by sequentially linking a 2-1 oligonucleotide adhesive having a nucleotide sequence complementary to the oligonucleotide probe of the first nucleotide over 20 to 50 bp, the C3 spacer of Formula 1 that increases the freedom of movement to facilitate complementary hydrogen bonding of 10 bp or more, and a 2-2 oligonucleotide adhesive (poly-adenine 10mer) that attaches to the nanoparticles. In this case, Cy3 as a Raman marker is located between the C3 spacer in the second nucleotide and the oligonucleotide adhesive (poly-adenine 10mer) that attaches to the nanoparticles.

However, the sequence length of the 2-1 oligonucleotide adhesive is shorter than that of the synthesized target nucleic acid, so that the target nucleic acid has the upper hand when competing to hybridize with the first nucleotide (FIG. 9).

[Formula 1]

100 mM phosphate buffer and 2M NaCl were sequentially added to a mixed solution of the first nucleotide modified with a -SH group at one end and gold nanoparticles, and reacted at room temperature to synthesize a first nanoparticle-based structure. Similarly, 100 mM phosphate buffer and 2M NaCl were sequentially added to a mixed solution of the second nucleotide modified with a -SH group at one end and gold nanoparticles, and reacted at room temperature to synthesize a second nanoparticle-based structure.

Subsequently, the aqueous solution containing the first nanoparticle-based structure and the aqueous solution containing the second nanoparticle-based structure were mixed, and a target nucleic acid detection reagent containing the nucleic acid-based self-assembly complex (NEW construct) formed through complementary hydrogen bonding between the first nucleotide and the second nucleotide was prepared.

Using an inverted Raman detection device built in-house, the Raman signal of the NEW construct, that is, the Raman signal of the Raman marker Cy3 linked to the second nucleotide, was measured in the target nucleic acid detection reagent containing the NEW construct in the phosphate buffer prepared in Example 1. The results are shown in FIG. 10.

Scattered Raman spectra were recorded with one acquisition, 1 second accumulation, at 400 μW, in the range of 500-2000 cm^-1. Despite the low intensity, the characteristic peaks of Cy3, which are fingerprint spectra under 532 nm laser incident light, appeared at 1470 and 1580 cm^-1.
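To relate these shift values to the 532 nm excitation, the following sketch applies the standard textbook relation between excitation wavelength, scattered (Stokes) wavelength, and Raman shift in wavenumbers. It is offered as general background and is not claimed to be the Equation 1 or Equation 2 referred to in the claims; the function names are illustrative.

```python
def raman_shift_cm1(excitation_nm, scattered_nm):
    """Raman shift in cm^-1 from excitation and scattered wavelengths in nm."""
    # 1e7 converts a wavelength in nm to a wavenumber in cm^-1.
    return 1e7 / excitation_nm - 1e7 / scattered_nm

def scattered_wavelength_nm(excitation_nm, shift_cm1):
    """Stokes-scattered wavelength in nm for a given Raman shift in cm^-1."""
    return 1e7 / (1e7 / excitation_nm - shift_cm1)

# Wavelengths at which the two Cy3 peaks would appear under 532 nm excitation.
cy3_peak_wavelengths = [scattered_wavelength_nm(532.0, s) for s in (1470.0, 1580.0)]
```

Under this relation the two peaks correspond to scattered light at roughly 577 nm and 581 nm.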

Surprisingly, it was found that the above-described NEW construct in the target nucleic acid detection reagent provides a stably enhanced Raman scattering signal, reproducibly and within a certain range, even when continuously measured 100 times per second. However, although a stable signal could be obtained, exactly the same signal could not be obtained from one measurement to the next.

Example 2: Measurement of Raman signal after hybridization reaction with target nucleic acid
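One simple way to express "stable within a certain range, but not identical" numerically is sketched below, using the coefficient of variation and a tolerance band around the mean. These two measures, the function names, and the sample readings are assumptions chosen for illustration; they are not the stability or repeatability definitions of the claims.

```python
import statistics

def coefficient_of_variation(intensities):
    """Sample standard deviation divided by the mean (needs >= 2 values)."""
    mean = statistics.fmean(intensities)
    if mean == 0:
        raise ValueError("mean intensity is zero")
    return statistics.stdev(intensities) / mean

def within_band(intensities, tolerance):
    """True if every reading lies within +/- tolerance (as a fraction) of the mean."""
    mean = statistics.fmean(intensities)
    return all(abs(x - mean) <= tolerance * abs(mean) for x in intensities)

# Hypothetical repeated peak intensities (arbitrary units).
readings = [100.0, 102.0, 98.0, 101.0, 99.0]
cv = coefficient_of_variation(readings)
stable_5pct = within_band(readings, 0.05)
stable_1pct = within_band(readings, 0.01)
```

With these hypothetical readings the signal stays inside a 5% band but not a 1% band, which mirrors the qualitative observation in the text.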

In the nucleic acid-based self-assembly complex (NEW construct) prepared in Example 1, a Raman indicator that exhibits a known Raman shift value is connected to the second nucleotide, which competes with the target nucleic acid in the hybridization reaction with the first nucleotide. Because the nucleic acid-based self-assembly complex (NEW construct) acts as a turn-off signal sensor in the presence of a target nucleic acid, whether the complex is formed, and its number or concentration, can be confirmed (quantified) by the signal of the Raman indicator (FIGS. 9 and 12).

The synthesized target nucleic acid (12 mer to 30 mer) was added to the target nucleic acid detection reagent containing the nucleic acid-based self-assembly complex (NEW construct) prepared in Example 1 at a known concentration (FIG. 12).

The temperature was raised to 72.0 °C, the temperature (Tm) at which the nucleic acid-based self-assembly complex (NEW construct), formed through complementary hydrogen bonding of the first nucleotide and the second nucleotide from (a) a first nanoparticle-based construct in which a first nucleotide that hybridizes with a target nucleic acid is linked to a spherical gold nanoparticle and (b) a second nanoparticle-based construct in which a second nucleotide complementary to the first nucleotide is linked to a spherical gold nanoparticle, dissociates, that is, the temperature at which the complementary hydrogen bonds between the first nucleotide and the second nucleotide are broken. The temperature was then lowered to about 5 °C to the temperature (Ta) at which the synthesized target nucleic acid and the first nucleotide hybridize.

The inverted Raman detection device built in-house, as in Example 1, was used to measure the Raman signal of the Raman marker linked to the second nucleotide.

The graph on the left in FIG. 11 is a Raman spectrum measured after raising the target nucleic acid detection reagent without a target nucleic acid to 72.0 °C and lowering the temperature to about 5 °C, and the graph on the right in FIG. 11 is a Raman spectrum measured when the synthesized target nucleic acid was added in excess relative to the number of first nucleotides in the target nucleic acid detection reagent.

According to Example 1, the number of nucleic acid-based self-assembly complexes (NEW constructs) formed by spontaneous hydrogen bonding at the molecular level between the first nucleotide, which is a probe that hybridizes with the target nucleic acid, and the second nucleotide complementary thereto is in a functional relationship that is inversely linked to the number of target nucleic acids (FIG. 9). That is, (i) when the target nucleic acid in the sample is absent or below the minimum of the detection sensitivity of the target nucleic acid detection reagent for the target nucleic acid (the minimum of the detection range of the target nucleic acid detection reagent containing or forming a nucleic acid-based self-assembly complex), the number of nucleic acid-based self-assembly complexes (NEW constructs) formed by self-assembly of the complementary first and second nucleotides is at its maximum, and (ii) when the target nucleic acid in the sample is present above the maximum of the detection sensitivity of the target nucleic acid detection reagent for the target nucleic acid, the number of nucleic acid-based self-assembly complexes (NEW constructs) formed by self-assembly of the complementary first and second nucleotides is at its minimum (FIG. 11).

Therefore, because the nucleic acid-based self-assembly complex (NEW construct) in the target nucleic acid detection reagent prepared in Example 1 serves as a turn-off signal sensor in the presence of the target nucleic acid, a target nucleic acid detection reagent containing the nucleic acid-based self-assembly complex at a known concentration has a maximum Raman signal in the absence of the target nucleic acid, and the Raman signal decreases as the amount of target nucleic acid increases. It is therefore possible to secure or predict the reference points (Min, Max) of the optical signal for each concentration of the nucleic acid-based self-assembly complex in the target nucleic acid detection reagent (FIG. 11).

Meanwhile, FIG. 12 shows Raman spectra measured after the synthesized target nucleic acid (12 mer to 30 mer) was added at known concentrations (0 M, 10^-16 M, 10^-12 M) to the target nucleic acid detection reagent containing the nucleic acid-based self-assembly complex (NEW construct) prepared in Example 1.

Surprisingly, it was found that, in the absence of a target nucleic acid or in the presence of a known concentration of target nucleic acid in the target nucleic acid detection reagent that forms the above-described nucleic acid-based self-assembly complex (NEW construct), the Raman scattering signal upon light irradiation decreases in a constant pattern that is inversely linked to the target nucleic acid concentration (FIG. 12). This is because the number of nucleic acid-based self-assembly complexes (NEW constructs) formed, that is, the number of nanogaps formed and the intensity of the enhanced Raman scattering signal produced by them, is in a functional relationship with the number of target nucleic acids. It can therefore be inferred that the target nucleic acid detection reagent that forms the above-described nucleic acid-based self-assembly complex (NEW construct) can act as a sensor with an on/off signal system, and that the target nucleic acid can also be quantitatively analyzed through a computer algorithm from the intensity of the Raman scattering signal measured during light irradiation.

In short, the target nucleic acid detection reagent of Example 1 makes it possible to measure the change value (intensity reduction) of the Raman signal of the Raman marker when the nucleic acid-based self-assembly complex is not formed, or is disassembled, by hybridization between the first nucleotide and the target nucleic acid in the presence of the target nucleic acid, and thereby to quantitatively analyze the target nucleic acid (FIGS. 9 to 12).
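As a rough illustration of quantifying a target from a decreasing signal, the sketch below interpolates between calibration points on a logarithmic concentration axis. The log-linear form, the function name `estimate_concentration`, and every intensity value are hypothetical assumptions; the document reports only that the signal decreases in a constant pattern and does not give a calibration function or numeric intensities.

```python
import math

def estimate_concentration(intensity, calibration):
    """Estimate target concentration (M) by log-linear interpolation.

    calibration: list of (concentration_M, intensity) pairs with
    concentration > 0 and intensity falling as concentration rises.
    Returns None when the intensity lies outside the calibrated range.
    """
    points = sorted(calibration)  # ascending concentration
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo == i_hi:
            continue  # flat segment carries no concentration information
        if i_hi <= intensity <= i_lo:
            t = (i_lo - intensity) / (i_lo - i_hi)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None

# Hypothetical calibration intensities (arbitrary units) at known concentrations.
calibration = [(1e-16, 900.0), (1e-14, 600.0), (1e-12, 300.0)]
estimate = estimate_concentration(750.0, calibration)
```

The zero-concentration reference is left out of the table because it has no logarithm; in such a scheme it would act only as the upper signal limit.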

Claims

In the Raman scattering spectrum database construction and search method through machine learning,

a first step of generating a Raman spectrum of each sample by deriving, from each sample, (a) one or more Raman shift values and, at each shift value, (b) the Raman peak, which is the relatively highest value among the Raman intensities on the vertical axis, and (c) the Raman lowest point, which is the relatively lowest value among the Raman intensities on the vertical axis;

a second step of generating (a') Raman shift values, (b') Raman peaks, and (c') Raman troughs at each shift value according to a machine learning algorithm that performs

Step 2-1 of section learning the Raman shift value (a) of the first step;

Step 2-2 of cluster learning the Raman peak (b) of the first step; and

Step 2-3 of cluster learning the Raman lowest point (c) of the first step;

a third step of inferring, based on (a') Raman shift values generated in step 2, (b') Raman peaks and (c') Raman troughs at each shift value, (d) sensitivity, defined as the ratio of spectra whose signal to noise ratio is 50% or more within a given repeatability, (e) stability as defined, and (f) repeatability, under which, based on the number of measurements within a given inspection time, a spectrum having the above-defined stability of 50% or more and sensitivity of 50% or more is regarded as having repeatability;

a fourth step of calculating fractional bandwidth;

a fifth step of calculating spectral selectivity by selecting a spectrum having repeatability as defined in the third step and having a stability of 80% or more and a sensitivity of 90% or more as a selection value within the shift;

(a) Raman shift values of the first step, (b) Raman peaks and (c) Raman troughs at each shift value; (a') Raman shift values generated by machine learning in the second step, (b') Raman peaks and (c') Raman troughs at each shift value; (d) sensitivity, (e) stability and (f) repeatability, inferred by machine learning in the third step; and a sixth step of constructing a Raman spectrum database by inputting the spectral selectivity calculated in the fifth step; and

optionally, a seventh step of calculating desired prediction information from the Raman spectrum database constructed in the sixth step by inputting (a) Raman shift values, (b) Raman peaks and (c) Raman troughs at each shift value of the sample in the first step;

the Raman scattering spectrum database construction and search method through machine learning being characterized in that it comprises the above steps.
The method of claim 1, wherein the sample from which the Raman spectrum database is constructed is a liquid sample.
The method of claim 2, wherein the detection indicator from which the Raman shift value (a) is derived is connected to nanoparticles dispersed in the liquid sample, and

wherein, to measure the Raman shift value (a), surface analysis Raman spectroscopy using localized surface plasmon resonance by the nanoparticles is performed.
[Claim 3] The method of claim 2, wherein the target for constructing the Raman spectrum database is a Raman scattering signal derived from a nucleic acid-based self-assembled complex undergoing Brownian motion in a liquid.
[Claim 5] The Raman scattering spectrum database construction and How to search.
The method of claim 1, wherein the prediction information calculated in the seventh step is one or more output values generated through a specific function (hidden layer) by inputting one or more values selected from the group consisting of:

(a') the Raman shift values generated by machine learning in the second step and (b') the Raman peaks and (c') Raman troughs at each shift value; (d) the sensitivity, (e) stability and (f) repeatability inferred by machine learning in the third step; and the spectral selectivity calculated in the fifth step.
The method of claim 6, wherein the prediction information output through the function is state information of the animal or cell from which the sample, which is the target for deriving the (a) Raman shift values in the first step, was extracted; disease diagnosis and/or evaluation of the effect of a therapeutic agent; and/or bacterial infection information of the sample.
The method of claim 6, wherein the prediction information output through the function is the presence and/or concentration of a specific biomaterial, or of a chemical bond or constituent material derived from a cell, and/or the identification and/or concentration of a cell type.
The method of claim 1, wherein a baseline is set as the average of the per-shift standard deviations of the signal experimentally output from a reference material (negative control), and all Raman spectra are derived after baseline correction.
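One possible reading of this baseline rule is sketched below. The function names are hypothetical, the negative control is assumed to be measured several times on the same shift grid, and the correction is assumed to be a simple subtraction of the resulting scalar baseline; the source does not specify either point.

```python
from statistics import mean, pstdev


def baseline_from_negative_control(control_runs):
    """Average of the per-shift standard deviations of the negative control.

    control_runs: list of repeated spectra, each a list of intensities
    sampled at the same Raman shifts (an assumed data layout).
    """
    per_shift_std = [pstdev(column) for column in zip(*control_runs)]
    return mean(per_shift_std)


def baseline_correct(spectrum, baseline):
    """Subtract the scalar baseline from every intensity (assumed form)."""
    return [intensity - baseline for intensity in spectrum]


# Two repeated negative-control runs over two shifts.
control = [[1.0, 2.0], [3.0, 6.0]]
baseline = baseline_from_negative_control(control)
corrected = baseline_correct([10.0, 5.5], baseline)
```

The population standard deviation is used here for simplicity; a sample standard deviation would be an equally plausible choice.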
The method of claim 1, wherein the Raman shift value (Δν) is derived through Equation 1 or Equation 2 below.

[Equation 1]

[Equation 2]
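Equations 1 and 2 are not reproduced in this text. For orientation only, the conventional definition of the Raman shift is the difference between the wavenumbers of the excitation light and the scattered light; whether the claimed equations take exactly this form is an assumption. The function name and the nanometre inputs below are hypothetical.

```python
def raman_shift_cm1(excitation_nm, scattered_nm):
    """Conventional Raman shift in cm^-1 from two wavelengths in nm.

    1 / wavelength(nm) * 1e7 converts a wavelength to a wavenumber in cm^-1.
    This is the textbook definition, not necessarily the claimed equation.
    """
    return 1e7 / excitation_nm - 1e7 / scattered_nm


# Example: 500 nm excitation, scattered light observed at 550 nm.
shift = raman_shift_cm1(500.0, 550.0)
```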
The method of claim 1, wherein, in the first step, one or more Raman shift values are derived in the range of 400 cm⁻¹ to 3200 cm⁻¹.
The method of claim 1, wherein, in the first step, the trough and the peak are obtained at each Raman shift value while moving in increments of 1 to 5 shifts across the Raman shift range of the inspection equipment.
The method of claim 1, wherein the Raman peak is the relatively highest intensity within the Raman shift range, and the Raman trough is the relatively lowest intensity within the Raman shift range.
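The peak and trough extraction over the 400 cm⁻¹ to 3200 cm⁻¹ range can be sketched as a windowed scan. This is an illustration under stated assumptions: "moving 1 to 5 shifts" is read as stepping through the measured shift grid in windows of 1 to 5 samples, and the function name and tuple layout are hypothetical.

```python
def peaks_and_troughs(shifts, intensities, step=5, lo=400.0, hi=3200.0):
    """Return (window_start_shift, peak, trough) for each window.

    step: number of consecutive shift samples per window (1 to 5, assumed
    reading of the claim). Samples outside [lo, hi] cm^-1 are ignored.
    """
    if not 1 <= step <= 5:
        raise ValueError("step must be between 1 and 5")
    points = [(s, i) for s, i in zip(shifts, intensities) if lo <= s <= hi]
    result = []
    for k in range(0, len(points), step):
        window = points[k:k + step]
        values = [i for _, i in window]
        # Peak is the highest and trough the lowest intensity in the window.
        result.append((window[0][0], max(values), min(values)))
    return result


rows = peaks_and_troughs(
    [398, 400, 401, 402, 403, 404, 405],
    [9, 1, 5, 3, 7, 2, 8],
    step=3,
)
```

With a step of 1, each window holds a single sample, so the peak and trough coincide; larger steps trade resolution for a more compact spectrum list.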
The method of claim 1, wherein the sixth step of constructing the Raman spectrum database comprises storing the items of (i) to (iv) below:

(i) the code assigned to the material;

(ii) the selected Raman shift values corresponding to the material;

(iii) the relative intensity values for all selected shift values; and

(iv) the baseline difference value of the negative control used for the corresponding shift.
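Items (i) to (iv) suggest a record layout along the following lines. The class name, field names and the example code "S-001" are hypothetical, and keying the intensity and baseline values by shift is an assumed design choice, not something the source specifies.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class RamanRecord:
    """One database entry holding items (i) to (iv) for a material."""
    code: str                               # (i) code assigned to the material
    selected_shifts: Tuple[float, ...]      # (ii) selected Raman shift values
    relative_intensity: Dict[float, float]  # (iii) shift -> relative intensity
    baseline_difference: Dict[float, float] # (iv) shift -> negative-control baseline difference

    def __post_init__(self):
        # Every selected shift needs both an intensity and a baseline entry.
        for shift in self.selected_shifts:
            if shift not in self.relative_intensity:
                raise ValueError(f"missing relative intensity for {shift}")
            if shift not in self.baseline_difference:
                raise ValueError(f"missing baseline difference for {shift}")


record = RamanRecord(
    code="S-001",
    selected_shifts=(1001.0, 1450.0),
    relative_intensity={1001.0: 1.0, 1450.0: 0.4},
    baseline_difference={1001.0: 0.02, 1450.0: 0.03},
)
```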
The method of claim 1, wherein, in the sixth step, the Raman spectrum database is constructed so that a Raman spectrum database dedicated to liquid samples is distinguished, the method further comprising:

optionally, in the case of a liquid sample, an eighth step of separating and storing spectra through spectral intensity indexing; and

optionally, in the case of a liquid sample, a ninth step of filtering to reduce noise spectrum intensity using spectral pattern matching.
The method of claim 15, wherein, in the ninth step of filtering the noise spectrum intensity using spectral pattern matching in the case of a liquid sample, the spectral pattern matching is determined by a coincidence rate with the Raman shift values selected based on the material.
The method of claim 15, wherein, to determine the spectral pattern matching by the coincidence rate with the Raman shift values selected based on the material, at least one of the following methods (i) to (iv) is used:

(i) noise is defined as a signal-to-noise ratio of 50% or less at each shift;

(ii) the minimum and maximum values of the obtained Raman shift values are matched to the previously obtained reference minimum and maximum values for the material, and the ratio is adjusted;

(iii) values within 1% on either side of each selected Raman shift value are judged to be identical; and

(iv) if the matching rate of all selected spectra is more than 95%, the spectra are judged to be consistent.
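Methods (ii) to (iv) can be sketched as follows. The function names are hypothetical, the 1% tolerance is assumed to be relative to each reference shift, and the matching rate is assumed to be the fraction of reference shifts that have at least one measured shift within tolerance.

```python
def shift_matches(measured, reference, tol=0.01):
    """Method (iii): within 1% on either side of the reference shift."""
    return abs(measured - reference) <= tol * reference


def match_rate(measured_shifts, reference_shifts, tol=0.01):
    """Fraction of reference shifts matched by some measured shift."""
    hits = sum(
        any(shift_matches(m, r, tol) for m in measured_shifts)
        for r in reference_shifts
    )
    return hits / len(reference_shifts)


def is_consistent(measured_shifts, reference_shifts, tol=0.01, threshold=0.95):
    """Method (iv): consistent when the matching rate exceeds 95%."""
    return match_rate(measured_shifts, reference_shifts, tol) > threshold


def rescale(values, ref_min, ref_max):
    """Method (ii): map the observed min/max onto the reference min/max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [ref_min for _ in values]
    scale = (ref_max - ref_min) / (hi - lo)
    return [ref_min + (v - lo) * scale for v in values]


reference = [1000.0, 1500.0, 2000.0]
good = is_consistent([1005.0, 1490.0, 2015.0], reference)
bad = is_consistent([1005.0, 1490.0, 2100.0], reference)
```

With only three reference shifts, a single miss drops the rate to about 67%, so the 95% rule effectively requires every selected shift to match unless the material has twenty or more selected shifts.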
The method of claim 1, wherein the Raman scattering spectrum database construction and search method through machine learning is used in a method for detecting a target nucleic acid using a Raman signal derived from a nucleic acid-based self-assembly complex in a liquid, the detection method comprising the following steps:

a Step I of preparing a target nucleic acid detection reagent designed to measure the change in the Raman signal when, in the presence of the target nucleic acid, hybridization between the first nucleotide and the target nucleic acid causes a nucleic acid-based self-assembly complex not to be formed or to be disassembled, the complex being formed by self-assembly between (a) a first nanoparticle-based structure in which at least one first nucleotide that pairs with the target nucleic acid, depending on conditions, is linked to a first metal nanoparticle, and (b) a second nanoparticle-based structure in which a second nucleotide complementary to the first nucleotide over at least 10 base pairs (bp) is linked to a second metal nanoparticle;

a Step II of performing, in a nucleic acid-containing liquid sample, a hybridization reaction with the target nucleic acid detection reagent of Step I containing (a) the first nanoparticle-based structure linked with the first nucleotide and (b) the second nanoparticle-based structure linked with the second nucleotide;

a Step III of measuring a Raman signal derived from the nucleic acid-based self-assembly complex in the liquid sample before, after and/or simultaneously with the hybridization reaction of Step II; and

a Step IV of providing detection and/or quantitative data for the target nucleic acid in the sample through an algorithm that analyzes the Raman signal measured in Step III or a change value thereof.
The method of claim 18, wherein the target nucleic acid detection reagent of Step I forms a nucleic acid-based self-assembly complex, through complementary hydrogen bonding of the first nucleotide and the second nucleotide, from (a) a first nanoparticle-based structure in which a first nucleotide that pairs with the target nucleic acid is linked to a first metal nanoparticle and (b) a second nanoparticle-based structure in which a second nucleotide complementary to the first nucleotide is linked to a second metal nanoparticle,

and wherein the reagent is designed so that, when the nucleic acid-based self-assembly complex is formed, (i) a nanogap is formed by the two adjacent metal nanoparticles, (ii) the nanogap is a space that generates and further enhances surface plasmon resonance when irradiated with light, and (iii) a Raman indicator linked to the second nucleotide is placed in the nanogap, thereby enhancing the Raman scattering signal detected during light irradiation.
The method of claim 18, wherein the hybridization reaction with the target nucleic acid detection reagent is performed in the nucleic acid-containing liquid sample, and the desired prediction information is calculated from the constructed Raman spectrum database by measuring and inputting (a) the Raman shift values derived from the nucleic acid-based self-assembly complex in the liquid sample before, after and/or simultaneously with the hybridization reaction, and (b) the Raman peak and (c) the Raman trough at each shift value.
A transmission medium or computer-readable recording medium carrying a program that causes a computer to execute at least one of the first to ninth steps, so that the Raman scattering spectrum database construction and search method through machine learning according to any one of claims 1 to 20 is performed on the computer.
In an apparatus for calculating desired biometric prediction information from a Raman spectrum database,

an information receiving unit (A) that collects, from a biologically derived liquid sample, a Raman spectrum list of the sample including (a) one or more Raman shift values and, at each shift value, (b) a Raman peak, which is the relatively highest value among the Raman intensities on the vertical axis, and (c) a Raman trough, which is the relatively lowest value among the Raman intensities on the vertical axis;

the information included in the Raman spectrum list of the sample is used as input to (a-1) an algorithm for section learning of the Raman shift values; (b-1) an algorithm for cluster learning of the Raman peaks; and (c-1) an algorithm for cluster learning of the Raman troughs,

based on (a') the Raman shift values generated by the machine learning algorithm and (b') the Raman peaks and (c') Raman troughs at each shift value, the following are inferred by machine learning: (d) sensitivity, defined as the proportion of spectra whose signal-to-noise ratio is 50% or more within a given number of repetitions; (e) stability, defined as the proportion of spectra whose values fall within one standard deviation (δ) of the normal distribution mean (μ), i.e. within the range (μ-δ, μ+δ), of the distribution of a given spectrum value; and (f) repeatability, which a spectrum has when, based on the number of measurements within a given inspection time, its stability as defined above is 50% or more and its sensitivity is 50% or more,

a spectrum having repeatability, a stability of 80% or more and a sensitivity of 90% or more is selected as the selection value within the corresponding shift, and spectral selectivity is calculated,

a Raman spectrum database (B) constructed by inputting: the information included in the Raman spectrum list of the sample, namely (a) the Raman shift values and, at each shift value, (b) the Raman peaks and (c) the Raman troughs; (a') the Raman shift values generated by the machine learning algorithm and (b') the Raman peaks and (c') Raman troughs at each shift value; (d) the sensitivity, (e) stability and (f) repeatability inferred therefrom by machine learning; and the spectral selectivity calculated by selecting the selection value within the corresponding shift; and

optionally, a biometric information prediction unit (C) that calculates desired biometric prediction information from the constructed Raman spectrum database (B) by inputting (a) the Raman shift values and (b) the Raman peak and (c) the Raman trough at each shift value of a sample;

A bio-prediction information calculating device, characterized in that it comprises the above units.
The apparatus of claim 22, wherein the apparatus performs the Raman scattering spectrum database construction and search method through machine learning according to any one of claims 1 to 20.