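WO2023035970A1 - Serum analysis method based on the interaction of spectroscopy and artificial intelligence, and application thereof - Google Patents

Serum analysis method based on the interaction of spectroscopy and artificial intelligence, and application thereof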

Info

Publication number
WO2023035970A1
Authority
WO
WIPO (PCT)
Prior art keywords
serum
samples
sers
data
analysis method
Prior art date
Application number
PCT/CN2022/114961
Other languages
English (en)
French (fr)
Inventor
肖湘衡
董仕练
汪付兵
蒋昌忠
Original Assignee
武汉大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 武汉大学
Publication of WO2023035970A1

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01N - INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 - Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62 - Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63 - Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light, optically excited
    • G01N21/65 - Raman scattering
    • G01N21/658 - Raman scattering enhancement, e.g. surface plasmons

Definitions

  • The invention belongs to the field of nanomaterials and artificial intelligence. It relates to a serum analysis method in which spectroscopy and artificial intelligence interact, and to its application in the high-precision identification of patients with various cancers and healthy people and in the analysis of differential SERS peak positions.
  • Liquid biopsy is also known as non-invasive tumor diagnostic technology.
  • By detecting circulating tumor cells, circulating tumor DNA, proteins and other cancer-related biomolecules in urine, sweat, blood and other body fluids, liquid biopsy supports clinical applications such as early cancer screening, molecular typing, medication guidance and recurrence monitoring. It can screen for and diagnose cancer efficiently without harming the patient, which gives it great clinical significance and application prospects.
  • Serum is currently the most widely used biological sample for cancer liquid biopsy. It is the pale yellow transparent liquid obtained after fibrinogen and certain coagulation factors are removed from plasma. Its main functions are to supply basic nutrients, hormones and various growth factors, to supply binding proteins, to supply attachment-promoting and growth factors that protect cells from mechanical damage, and to protect cells in culture.
  • The biomolecules in serum are closely tied to the growth and inhibition of human cells, so developing and extending serum analysis is of great significance for cancer liquid biopsy.
  • SERS: Surface Enhanced Raman Scattering
  • The mainstream way of applying SERS to pathological diagnosis is to first bind biomolecules modified with Raman probes to a SERS substrate, then use biospecific interactions to anchor free biomarkers from body fluids on the substrate, and finally analyze the biomarkers of interest (such as RNA, DNA, proteins and polypeptides) indirectly through changes in the Raman probe signal.
  • This approach rarely yields the intrinsic information of the biomarkers, and the use of biospecific antibodies and antigens makes such cancer tests costly. Finding a SERS method that detects the intrinsic information of biomarkers directly, efficiently and cheaply is therefore a pressing problem.
  • The purpose of the invention is to make spectroscopy and artificial intelligence algorithms interact and to analyze the differences between the serum samples of large numbers of cancer patients and healthy people, thereby achieving fast, low-cost and accurate high-throughput cancer detection.
  • The serum analysis method provided by the invention can simultaneously identify the serum of patients with various cancers and of healthy people with high accuracy and precisely locate the differential SERS peak positions. The method is expected to play an important role in actual clinical cancer-related serum testing.
  • A serum analysis method in which spectroscopy and artificial intelligence interact uses silver nanowires, which have no intrinsic Raman signal, as SERS probes. The silver nanowire solution is mixed and co-incubated in the liquid phase directly with serum samples from patients and from healthy people, without any specific labeling. After incubation, serum SERS spectra are collected on a Raman spectrometer to obtain the raw spectral data points. The raw data points are then reduced in dimension by a covariance calculation (a matrix algorithm); the spectral data points retained are the differential peak positions of the patient samples relative to the normal samples. Finally, a support vector machine (svm) model is trained on the reduced spectral data and used for classification, and the recognition accuracy of each kind of patient sample relative to normal samples is output.
  • Covariance calculation method (matrix algorithm)
  • The serum analysis method comprises the following steps:
  • The serum SERS spectral data from the different sources are first reduced in dimension to remove irrelevant items from the sample data points and screen out the effective dimensions that reflect the differences in the data. Specifically: use the covariance (covariance matrix) to compute the correlation between the original data dimensions across samples, then take the least correlated data points as the effective dimensions after reduction (preferably, sort the original 1456 dimensions by frequency from low to high and select the 2 least correlated dimensions in every 60 consecutive dimensions). The final run of fewer than 60 consecutive dimensions also contributes 2 effective dimensions. These dimensions correspond to the differential peak positions between the different cases;
  • y is the data before scaling
  • y' is the data after scaling
  • lower and upper are the minimum and maximum values of the scaled data
  • min and max are the minimum and maximum values of the data before scaling
  • the kernel function used in the algorithm processing is radial basis kernel function (i.e. RBF kernel function), namely:
  • γ is the hyperparameter of the Gaussian kernel function
  • α is the Lagrange multiplier
  • w is the normal vector of the plane, which determines the orientation of the hyperplane
  • b is the displacement term, representing the distance from the hyperplane to the origin
  • ξ is the slack variable
  • μ is the dual variable
  • The parameters C and g are the best values found by grid search with grid.py, the parameter optimization tool in libsvm. C is the penalty coefficient, i.e. the tolerance for errors. A higher C means errors are tolerated less and the model overfits more easily; a smaller C makes underfitting more likely; a C that is too large or too small degrades generalization.
  • g is a parameter of the RBF function once it is chosen as the kernel. It implicitly determines the distribution of the data after mapping to the new feature space: the larger g is, the fewer the support vectors, and the smaller g is, the more support vectors there are. The number of support vectors affects the speed of training and prediction;
  • d(x, z) is the distance
  • σ is the width parameter of the function
  • the training set is used to train to obtain the svm model for serum SERS spectral data.
  • the classification decision function used in this process is:
  • α* is obtained by the smo algorithm
  • K(x_i, x) is the Gaussian kernel function
  • b* is the threshold, which has been obtained in the previous step.
  • λ‖w‖² is the regularization term, namely:
  • In step (1), the original silver nanowires are centrifuged at 6000 r/min.
  • The original silver nanowire solution is prepared as follows: add 1.665 g of polyvinylpyrrolidone (molecular weight 360000) and 0.0019 g of CuCl₂ to 100 ml of ethylene glycol and stir in an ultrasonic bath until evenly dispersed to obtain solution A; dissolve 1.7 g of AgNO₃ in 100 ml of ethylene glycol to obtain solution B; add solution A dropwise at a constant rate to solution B and stir until uniform; transfer the mixture to a 250 ml autoclave, seal it, place it in an oven and heat at 160 °C for 3 h; after cooling to room temperature, the original silver nanowire solution is obtained.
  • In step (3), before the correlation between the original data dimensions across samples is computed from the covariance, the format of all spectral data is converted to libsvm format with weka, and the data are then divided at a fixed interval into several effective frequency segments.
  • The invention also provides the application of the above serum analysis method in the high-precision identification of various patients and healthy people and in differential SERS peak position analysis.
  • The patients are lung cancer patients and colorectal cancer patients.
  • When the serum analysis method is used for the high-precision identification of, and differential SERS peak analysis for, lung cancer patients, colorectal cancer patients and healthy people, in step (3) each raw spectrum has about 1456 dimensions before dimension reduction and is reduced to 50 dimensions, corresponding to the 50 SERS characteristic peak positions with the most obvious differences.
  • For binary classification, normal serum forms one class and cancer patient serum the other. Part of the cancer patient and normal samples is used to train the algorithm and the remaining samples are used for cancer recognition. During training and recognition, the serum spectral data of cancer patients form the cancer class and those of healthy people form the normal class on their own. After the two classes are fed into the svm model for training and recognition, the recognition accuracy of cancer patients relative to healthy people is obtained.
  • The method achieves lung cancer recognition with an accuracy of 94.1% and a sensitivity of 91.84%, and colorectal cancer recognition with an accuracy of 98.25% and a sensitivity of 97.73%. It also yields 50 differential SERS peak positions between lung cancer patients and healthy people and 50 between colorectal cancer patients and healthy people, which are expected to be useful for clinical cancer diagnosis and for tracing the underlying pathology.
  • The analysis method is applied to the testing of serum samples to make a preliminary judgment as to whether the subject has at least one of the diseases or none of them.
  • Sample pretreatment requires no biospecific modification, and the intrinsic spectral signal of the serum sample is obtained, so consumables are cheap: about 1 yuan per serum sample;
  • The dimension reduction performed by the artificial intelligence algorithm locates the differential SERS peak positions between cancer patients and healthy people, which is expected to provide guidance for clinical cancer diagnosis and treatment;
  • The invention makes SERS spectroscopy and artificial intelligence interact to achieve high-precision cancer identification and to locate the peak differences between cancer patients and healthy people.
  • The invention requires no biospecific modification such as antibody-antigen labeling and obtains the intrinsic spectral signal of serum samples. It distinguishes the serum signals of cancer patients and healthy people more cheaply, quickly and accurately, which gives the field of clinical liquid biopsy a new approach to detection and to the acquisition of pathological information.
  • Fig. 1 is a flowchart of the serum analysis method of the invention, in which spectroscopy and artificial intelligence interact.
  • Fig. 2 shows typical and pooled SERS spectra of serum from several healthy people, colorectal cancer patients and lung cancer patients in Example 2.
  • Fig. 3 is a heat map of some of the dimensions from the dimension reduction analysis of the SERS spectral data of 244 lung cancer serum samples and 350 normal samples in Example 2.
  • Fig. 4 is a screenshot of the table of the 50 characteristic Raman peak positions obtained after dimension reduction for the lung cancer and normal samples in Example 2.
  • Fig. 5 is a heat map of some of the dimensions from the dimension reduction analysis of the SERS spectral data of 216 colorectal cancer serum samples and 350 normal samples in Example 2.
  • Fig. 6 is a screenshot of the table of the 50 characteristic Raman peak positions obtained after dimension reduction for the colorectal cancer and normal samples in Example 2.
  • Fig. 7 is a flowchart of the recognition accuracy output for colorectal cancer patients, lung cancer patients and healthy people in Example 3.
  • Fig. 8 is a logic diagram of the algorithm for colorectal cancer patients, lung cancer patients and healthy people in Example 3.
  • Fig. 9 shows the scatter plots and the accuracy and sensitivity statistics for the recognition of the three sample types (colorectal cancer, lung cancer and normal) in Example 3.
  • The invention combines SERS spectroscopy from physics with artificial intelligence from computer science.
  • The method follows the pipeline "clinical sample collection - sample preparation - spectrum acquisition - data training and recognition - result output - expected guidance of clinical treatment", which couples SERS with the algorithm and completes the exchange of information between them, achieving fast, high-accuracy cancer recognition and informative localization of differential peak positions. The steps are as follows:
  • Silver nanowires are used as SERS probes, and the original silver nanowire solution is prepared as follows:
  • The silver nanowires must be centrifuged to remove impurities; the resulting nanowires have a diameter of about 100 nm and a length of 10-20 μm.
  • The centrifugation is carried out as follows: take 4.5 ml of the original silver nanowire solution, centrifuge at 6000 r/min for 10 min, remove all the supernatant with a pipette, resuspend the silver nanowire precipitate in 1 mL of deionized water, and finally disperse it evenly with an ultrasonic cleaner to obtain the concentrated silver nanowire solution;
  • Spectra are collected with a 50x confocal objective, a laser wavelength of 532 nm and a range of 600 cm⁻¹ to 1800 cm⁻¹. After identical treatment, each serum sample is measured 5 times under the same conditions, which takes about 15 minutes in total per sample.
  • The SERS spectra of all serum samples from the different sources are thus obtained.
  • The pooled serum SERS spectra of several healthy people show that every normal serum curve has distinct characteristic peak positions and that all curves share some of them;
  • Figure 2(c) and Figure 2(d) show the pooled serum SERS spectra of several lung cancer patients and several colorectal cancer patients. Every curve from a lung cancer or colorectal cancer patient also has distinct characteristic peaks. Although all curves in Figure 2 share some characteristic peaks, certain characteristic peaks of the cancer patients differ subtly, to varying degrees, from those of healthy people.
  • The invention proposes using artificial intelligence algorithms to process, analyze and recognize large amounts of serum SERS spectral data statistically.
  • The algorithm tool used is libsvm. Before the svm model is trained and tested on the serum spectral data, all spectral data are converted with weka to the format libsvm requires. The data of each sample are data points between 600 cm⁻¹ and 1800 cm⁻¹, and this range contains 1456 data points.
  • The SERS spectra of all samples share the same frequencies on the horizontal axis; only the peak intensity at each frequency differs from sample to sample. Each frequency is therefore treated as an index and the corresponding peak intensity as one dimension, so that each sample becomes a 1456-dimensional data point, with the dimensions sorted by frequency from low to high. Not every dimension is useful, however, and some carry no distinguishing features, so the data must next be cleaned and the features reduced in dimension.
  • During dimension reduction, healthy people form one class and the two kinds of cancer patients form the other class.
  • The original spectral data from 600 cm⁻¹ to 1800 cm⁻¹ are first divided at intervals of 60 cm⁻¹ into several effective frequency segments, and the covariance is then used to compute the correlation between the features within each segment. The correlation lies between -1 and 1: the closer to -1 or 1, the stronger the correlation, and the closer to 0, the weaker. The correlations of the frequency features in the different ranges are finally presented as heat maps.
  • Figure 3 shows the correlation heat map of the 244 lung cancer samples relative to the 350 normal control samples for the segment from 600 cm⁻¹ to 623.7705 cm⁻¹, in which the correlations between different dimensions are clearly distributed unevenly.
  • The 50 Raman peak positions represent 50 differences between the serum SERS spectra of lung cancer patients and those of healthy people.
  • Figure 5 shows the correlation heat map of the 216 colorectal cancer samples relative to the 350 normal control samples for the segment from 600 cm⁻¹ to 623.7705 cm⁻¹, which also clearly shows an uneven distribution of correlations between dimensions; Figure 6 lists the specific dimensions that differ between the SERS spectra of colorectal cancer patients and healthy people.
  • The invention simplifies the tedious process of SERS peak analysis and locates differential SERS peak positions more precisely, which is expected to provide guidance for future clinical cancer diagnosis and treatment.
  • The SERS spectrum of each serum sample is reduced to 50 dimensions, all data are then processed according to the flowchart in Figure 7, and the following two kinds of training and recognition are performed:
  • Serum spectral data of colorectal cancer patients are labeled 1 and those of healthy people 0, to judge whether a subject is a colorectal cancer patient;
  • Serum spectral data of lung cancer patients are labeled 1 and those of healthy people 0, to judge whether a subject is a lung cancer patient.
  • The logic diagram of the algorithm used in this example is shown in Figure 8.
  • the normalization formula used in the scaling process is:
  • y is the data before scaling
  • y' is the data after scaling
  • lower and upper are the minimum and maximum values of the data after scaling
  • min and max are the minimum and maximum values of the data before scaling.
  • The kernel function used in the algorithm is the radial basis function kernel (RBF kernel).
  • The RBF kernel maps samples nonlinearly into a higher-dimensional space. Unlike a linear kernel, it can handle a nonlinear relationship between class labels and attributes, and it has shown good performance in practical problems.
  • The specific expression is:
  • γ is the hyperparameter of the Gaussian kernel function. Specifically:
  • α is the Lagrange multiplier
  • w is the normal vector of the plane, which determines the orientation of the hyperplane
  • b is the displacement term, representing the distance from the hyperplane to the origin
  • ξ is the slack variable
  • μ is the dual variable.
  • The parameters C and g in the invention are the best values found by grid search with grid.py in libsvm. C is the penalty coefficient, i.e. the tolerance for errors.
  • g is a parameter of the RBF function once it is chosen as the kernel. It implicitly determines the distribution of the data after mapping to the new feature space: the larger g is, the fewer the support vectors, and the smaller g is, the more support vectors there are. The number of support vectors affects the speed of training and prediction.
  • d(x, z) is the distance
  • σ is the width parameter of the function.
  • For the task in which serum spectral data of colorectal cancer patients are labeled 1 and those of healthy people 0, C is 8 and g is 0.0488; for the task in which serum spectral data of lung cancer patients are labeled 1 and those of healthy people 0, C is 8 and g is 0.25.
  • the training set is used to train to obtain the svm model for serum SERS spectral data.
  • the classification decision function used in this process is:
  • α* is the optimal solution for the set of α_i satisfying the conditions and is obtained by the smo algorithm, K(x_i, x) is the Gaussian kernel function, and b* is the threshold, which has been obtained in the previous step.
  • λ‖w‖² is the regularization term, namely:
  • the absolute value of y(wx+b) represents how far the sample is from the decision boundary. The larger the absolute value, the farther the sample is from the decision boundary.
  • the hinge loss is only 0 when the sample is correctly classified and the function margin is greater than 1, otherwise the loss is 1-y(wx+b).
  • Figure 9a-b shows the scatter plots for the three data sets. The algorithm model established by the invention classifies serum Raman data from the different sources very well, and the classification of colorectal cancer is slightly better than that of lung cancer.
  • Figure 9c shows that, relative to healthy people, both lung cancer and colorectal cancer are recognized with high sensitivity, with an accuracy of at least 94.1% and a sensitivity of at least 91.84%. Specifically, the achievable accuracy is 94.1%.
  • The serum analysis method proposed by the invention achieves high-accuracy cancer detection, which is of great significance for fast, high-precision and non-invasive clinical cancer testing.
  • The method is fast: the whole process from sample collection through sample preparation, spectrum acquisition and algorithm training to the output of the recognition accuracy takes about 1 hour, and apart from the detection instrument itself the consumables (silver nanowire solution) cost less than 1 yuan.
  • This is of great significance to cancer liquid biopsy, because it addresses long-standing problems of traditional cancer detection: highly invasive methods, long detection periods and high cost.

Landscapes

  • Health & Medical Sciences (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

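A serum analysis method in which spectroscopy and artificial intelligence interact, and its application in the effective identification of various patients and healthy people and in the analysis of differential SERS peak positions. The method comprises collecting bulk-phase SERS spectral data from clinical serum samples, reducing the dimension of the spectral data with a covariance algorithm to obtain the differential spectral peak positions between cancer patients and healthy people, and processing and classifying the spectral data with an svm model to obtain the cancer recognition rate. Because no biospecific modification such as antibody-antigen labeling is needed, the method identifies the serum of patients and healthy people more cheaply, quickly and accurately. It also locates the differential peak positions between the SERS spectra of large numbers of serum samples, providing a new detection and analysis method for clinical liquid biopsy of cancer.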

Description

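Serum analysis method based on the interaction of spectroscopy and artificial intelligence, and application thereof
Technical Field
The invention belongs to the field of nanomaterials and artificial intelligence. It relates to a serum analysis method in which spectroscopy and artificial intelligence interact, and to its application in the high-precision identification of patients with various cancers and healthy people and in the analysis of differential SERS peak positions.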
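Background
Cancer is a worldwide disease that seriously threatens human life and claims a very large number of lives every year. Although new cancer therapies have come into use, the complexity and heterogeneity of tumors mean that existing clinical treatments have had little effect. In recent years, great attention has been paid to detecting and classifying cancer by liquid biopsy, also known as non-invasive tumor diagnosis. As a branch of in vitro diagnostics, liquid biopsy detects circulating tumor cells, circulating tumor DNA, proteins and other cancer-related biomolecules in urine, sweat, blood and other body fluids, and thereby supports clinical applications such as early screening, molecular typing, medication guidance and recurrence monitoring. Because it allows efficient cancer screening and diagnosis without harming the patient, liquid biopsy has great clinical significance and application prospects, and MIT Technology Review named it one of its ten breakthrough technologies of 2015. Serum is currently the most widely used biological sample for cancer liquid biopsy. It is the pale yellow transparent liquid obtained after fibrinogen and certain coagulation factors are removed from plasma. Its main functions are to supply basic nutrients, hormones and various growth factors, to supply binding proteins, to supply attachment-promoting and growth factors that let cells adhere and protect them from mechanical damage, and to give cells in culture a degree of protection. The biomolecules in serum are closely tied to the growth and inhibition of human cells, so developing and extending serum analysis is of great significance for cancer liquid biopsy.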
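Most current serum analysis methods target and detect specific, known small biomolecules in serum through specific biological interactions such as antibody-antigen binding and base pairing. Take protein detection in serum as an example: the methods commonly used in medicine are the enzyme-linked immunosorbent assay and Western blot analysis. To measure the content of a given protein in serum, both methods require labeling with a specific antibody matched to that protein, and this labeling is tedious and relatively expensive.
Surface Enhanced Raman Scattering (SERS) is a physical spectroscopy technique. It uses the plasmon resonance between a noble metal substrate (Au, Ag, Cu, etc.) and the excitation laser to amplify greatly the Raman scattering spectrum of molecules near the substrate surface, yielding fingerprint-like spectral information on the internal structure of the molecules with very high sensitivity. Using SERS to solve biological problems is currently a hot topic, mainly because traditional medical tests are time-consuming and inefficient, whereas SERS spectrum acquisition usually takes very little time (within 10 minutes) and is extremely sensitive (the enhancement factor can reach 10^13). Despite these advantages, several problems remain to be solved before SERS can serve clinical serum detection and analysis, for example: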
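(1) The mainstream way of applying SERS to pathological diagnosis is to first bind biomolecules modified with Raman probes to a SERS substrate, then use biospecific interactions to anchor free biomarkers from body fluids on the substrate, and finally analyze the biomarkers of interest (such as RNA, DNA, proteins and polypeptides) indirectly through changes in the Raman probe signal. This approach rarely yields the intrinsic information of the biomarkers, and the use of biospecific antibodies and antigens makes such cancer tests costly. Finding a SERS method that detects the intrinsic information of biomarkers directly, efficiently and cheaply is therefore a pressing problem.
(2) As the number of biological samples grows, so does the volume of spectral data collected by SERS, which makes it hard to tell the data apart manually. For the serum SERS data of hundreds or even thousands of cancer patients and healthy people, for example, the human eye cannot perform a systematic statistical discrimination of all the spectra collected. Finding a way to recognize large batches of spectral data is therefore a necessary step towards real clinical use of SERS.
(3) Even when current SERS techniques can obtain the spectra of some cancer markers directly or indirectly, it is hard to identify by eye the differential peak positions between the SERS spectra of different samples once the volume of sample data grows. Finding a reliable way to locate differential SERS peak positions across large batches of samples is therefore also an important route to the effective adoption of SERS in actual cancer diagnosis and treatment.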
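Summary of the Invention
In view of the shortcomings of the prior art, the purpose of the invention is to make spectroscopy and artificial intelligence algorithms interact and to analyze the differences between the serum samples of large numbers of cancer patients and healthy people, thereby achieving fast, low-cost and accurate high-throughput cancer detection. The serum analysis method provided by the invention can simultaneously identify the serum of patients with various cancers and of healthy people with high accuracy and precisely locate the differential SERS peak positions. The method is expected to play an important role in actual clinical cancer-related serum testing.
The purpose of the invention is achieved by the following technical solution:
A serum analysis method in which spectroscopy and artificial intelligence interact uses silver nanowires, which have no intrinsic Raman signal, as SERS probes. The silver nanowire solution is mixed and co-incubated in the liquid phase directly with serum samples from patients and from healthy people, without any specific labeling. After incubation, serum SERS spectra are collected on a Raman spectrometer to obtain the raw spectral data points. The raw data points are then reduced in dimension by a covariance calculation (a matrix algorithm); the spectral data points retained are the differential peak positions of the patient samples relative to the normal samples. Finally, a support vector machine (svm) model is trained on the reduced spectral data and used for classification, and the recognition accuracy of each kind of patient sample relative to normal samples is output.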
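Preferably, the serum analysis method comprises the following steps:
(1) Remove the silver nanoparticle impurities from the original silver nanowire solution (preferably by centrifuging the original solution; the silver nanoparticle impurities remain in the supernatant), then resuspend the silver nanowire precipitate in deionized water and disperse it to obtain a silver nanowire solution for later use. Separately, centrifuge peripheral blood plasma samples from the different types of patients and from healthy people to obtain the corresponding serum samples for later use;
(2) Mix the silver nanowire solution with each of the serum samples in the liquid phase at a fixed volume ratio and incubate, ensuring full contact between the silver nanowires and the serum. After incubation, collect bulk-phase SERS spectra of all samples on a Raman spectrometer with a laser wavelength of 532 nm and a collection range of 600 cm⁻¹ to 1800 cm⁻¹, acquiring 5 spectra per sample;
(3) Once the spectral data of all serum samples have been collected, reduce the dimension of the serum SERS spectral data from the different sources to remove irrelevant items from the sample data points and screen out the effective dimensions that reflect the differences in the data. Specifically: use the covariance (covariance matrix) to compute the correlation between the original data dimensions across samples, then take the least correlated data points as the effective dimensions after reduction (preferably, sort the original 1456 dimensions by frequency from low to high and select the 2 least correlated dimensions in every 60 consecutive dimensions). The final run of fewer than 60 consecutive dimensions also contributes 2 effective dimensions. These dimensions correspond to the differential peak positions between the different cases;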
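(4) Then train the algorithm. For training and recognition, the data points retained after dimension reduction are used as feature values for binary classification. All samples are divided into a training set and a test set, and the data of each sample are scaled to the range [0, 1] with the normalization formula:
$$y' = \mathrm{lower} + (\mathrm{upper} - \mathrm{lower})\,\frac{y - \min}{\max - \min}$$
Here y is the data before scaling, y' is the data after scaling, lower and upper are the minimum and maximum of the scaled data, and min and max are the minimum and maximum of the data before scaling;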
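The corresponding support vector expansion is:
$$f(x) = \sum_{i=1}^{N} \alpha_i y_i\, k(x, x_i) + b$$
where k(x, x_i) is the kernel function. The expression shows that the optimal solution of the model can be expanded in terms of the kernel function of the training samples;
The kernel function used in the algorithm is the radial basis function kernel (RBF kernel):
$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right), \quad \gamma > 0$$
γ is the hyperparameter of the Gaussian kernel function;
Specifically: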
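First, the original problem is converted into a convex optimization problem:
Original problem:
$$\min_{w,b,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i$$
$$\text{s.t.}\quad y_i (w \cdot x_i + b) \ge 1 - \xi_i,\quad i = 1, 2, \dots, N$$
$$\xi_i \ge 0,\quad i = 1, 2, \dots, N$$
The convex optimization problem is then solved;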
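① For the dual of the original problem, construct the Lagrangian:
$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i - \sum_{i=1}^{N} \alpha_i \left( y_i (w \cdot x_i + b) - 1 + \xi_i \right) - \sum_{i=1}^{N} \mu_i \xi_i$$
where α is the Lagrange multiplier; w is the normal vector of the plane, which determines the orientation of the hyperplane; b is the displacement term, representing the distance from the hyperplane to the origin; ξ is the slack variable; and μ is the dual variable. First minimize over w, b and ξ by setting the partial derivatives to zero and substituting back into the original function, then maximize the result over α, and finally convert the maximization into a minimization to obtain the dual problem:
$$\min_{\alpha}\ \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{N} \alpha_i$$
$$\text{s.t.}\quad \sum_{i=1}^{N} \alpha_i y_i = 0$$
$$0 \le \alpha_i \le C,\quad i = 1, 2, \dots, N$$
with $K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$, $\gamma > 0$, chosen as the kernel function;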
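② From the KKT conditions:
$$w^* = \sum_{i=1}^{N} \alpha_i^* y_i x_i$$
$$b^* = y_j - \sum_{i=1}^{N} \alpha_i^* y_i K(x_i, x_j),\quad 0 < \alpha_j^* < C$$
The parameters C and g are the best values found by grid search with grid.py, the parameter optimization tool in libsvm. C is the penalty coefficient, i.e. the tolerance for errors. A higher C means errors are tolerated less and the model overfits more easily; a smaller C makes underfitting more likely; a C that is too large or too small degrades generalization. g is a parameter of the RBF function once it is chosen as the kernel. It implicitly determines the distribution of the data after mapping to the new feature space: the larger g is, the fewer the support vectors, and the smaller g is, the more support vectors there are. The number of support vectors affects the speed of training and prediction;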
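The relation between γ and g follows from:
$$K(x, z) = \exp\left(-\frac{d(x, z)^2}{2\sigma^2}\right) = \exp\left(-\mathrm{gamma} \cdot d(x, z)^2\right) \;\Rightarrow\; \mathrm{gamma} = \frac{1}{2\sigma^2}$$
where d(x, z) is the distance, gamma = γ is the g value and equals the hyperparameter of the Gaussian kernel function, and σ is the width parameter of the function;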
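Once the kernel function and the parameters C and g are chosen, the training set is used to train the svm model for serum SERS spectral data. The classification decision function used is:
$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i^* y_i K(x_i, x) + b^* \right)$$
where α* is obtained by the smo algorithm, K(x_i, x) is the Gaussian kernel function, and b* is the threshold obtained in the previous step.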
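The hinge loss is chosen as the loss function and λ‖w‖² is the regularization term:
$$\min_{w,b}\ \sum_{i=1}^{N} \left[ 1 - y_i (w \cdot x_i + b) \right]_+ + \lambda \lVert w \rVert^2$$
When a sample is classified correctly, y(wx+b) > 0; when it is misclassified, y(wx+b) < 0. The absolute value of y(wx+b) indicates how far the sample is from the decision boundary: the larger it is, the farther away the sample. The hinge loss is 0 only when the sample is classified correctly and the functional margin is at least 1; otherwise the loss is 1 - y(wx+b);
(5) Test the trained model on the test set, compare the model predictions with the actual status, and output the resulting recognition accuracy.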
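The building blocks named in step (4) can be sketched in a few lines of plain Python. This is an illustrative sketch, not the libsvm implementation: the function names are hypothetical, the decision function takes already-trained coefficients as input, and labels are assumed to be -1/+1 for the hinge loss (the examples in this document label the classes 1 and 0).

```python
import math

def scale(values, lower=0.0, upper=1.0):
    """Min-max scale a list of values to [lower, upper]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant input: map everything to lower
        return [lower for _ in values]
    return [lower + (upper - lower) * (v - lo) / (hi - lo) for v in values]

def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2), with gamma > 0."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def decision(x, support_vectors, coeffs, b, gamma):
    """sign(sum_i alpha_i* y_i K(x_i, x) + b*), where coeffs[i] = alpha_i* * y_i."""
    s = sum(c * rbf_kernel(sv, x, gamma)
            for sv, c in zip(support_vectors, coeffs)) + b
    return 1 if s >= 0 else -1

def hinge_loss(y, score):
    """max(0, 1 - y * score) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)
```

With gamma = 0.25 (the g value reported for the lung cancer task), two points at unit distance have a kernel value of exp(-0.25), and a correctly classified sample with a margin of at least 1 contributes zero hinge loss.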
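Preferably, in step (1) the original silver nanowires are centrifuged at 6000 r/min.
Preferably, the original silver nanowire solution in step (1) is prepared as follows: add 1.665 g of polyvinylpyrrolidone (molecular weight 360000) and 0.0019 g of CuCl₂ to 100 ml of ethylene glycol and stir in an ultrasonic bath until evenly dispersed to obtain solution A; dissolve 1.7 g of AgNO₃ in 100 ml of ethylene glycol to obtain solution B; add solution A dropwise at a constant rate to solution B and stir until uniform; transfer the mixture to a 250 ml autoclave, seal it, place it in an oven and heat at 160 °C for 3 h; after cooling to room temperature, the original silver nanowire solution is obtained.
Preferably, in step (3), before the correlation between the original data dimensions across samples is computed from the covariance, the format of all spectral data is converted to libsvm format with weka, and the data are then divided at a fixed interval into several effective frequency segments.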
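The invention also provides the application of the above serum analysis method in the high-precision identification of various patients and healthy people and in differential SERS peak position analysis.
Preferably, the patients are lung cancer patients and colorectal cancer patients. When the method is used for the high-precision identification of, and differential SERS peak analysis for, lung cancer patients, colorectal cancer patients and healthy people, in step (3) each raw spectrum has about 1456 dimensions before dimension reduction and is reduced to 50 dimensions, corresponding to the 50 SERS characteristic peak positions with the most obvious differences. For binary classification, normal serum forms one class and cancer patient serum the other. Part of the cancer patient and normal samples is used to train the algorithm and the remaining samples are used for cancer recognition. During training and recognition, the serum spectral data of cancer patients form the cancer class and those of healthy people form the normal class on their own. After the two classes are fed into the svm model for training and recognition, the recognition accuracy of cancer patients relative to healthy people is obtained.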
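The figure of 50 retained dimensions follows directly from the segmentation rule: 1456 dimensions split into runs of 60 give 24 full segments plus one shorter run, and each contributes 2 dimensions. The short check below uses hypothetical constant and function names; only the three numbers come from the text.

```python
N_DIMS = 1456        # raw dimensions per spectrum (from the text)
SEGMENT = 60         # consecutive dimensions per segment (from the text)
PER_SEGMENT = 2      # dimensions kept per segment (from the text)

def segment_bounds(n_dims, segment):
    """Return (start, stop) index pairs that cover range(n_dims) in order."""
    return [(s, min(s + segment, n_dims)) for s in range(0, n_dims, segment)]

bounds = segment_bounds(N_DIMS, SEGMENT)
full = sum(1 for s, e in bounds if e - s == SEGMENT)   # full 60-wide segments
remainder = N_DIMS - full * SEGMENT                    # width of the last run
kept = len(bounds) * PER_SEGMENT                       # dimensions after reduction
print(len(bounds), full, remainder, kept)              # 25 24 16 50
```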
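The analysis method achieves lung cancer recognition with an accuracy of 94.1% and a sensitivity of 91.84%, and colorectal cancer recognition with an accuracy of 98.25% and a sensitivity of 97.73%. It also yields 50 differential SERS peak positions between lung cancer patients and healthy people and 50 between colorectal cancer patients and healthy people, which are expected to be useful for practical clinical cancer diagnosis and for tracing the underlying pathology.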
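The text does not spell out how accuracy and sensitivity are computed. Assuming the standard definitions from a binary confusion matrix (cancer as the positive class), they can be sketched as follows. The counts in the usage lines are made-up illustration values, not data from this document.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all test samples that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """Fraction of true cancer samples that are recognized as cancer."""
    return tp / (tp + fn)

# Hypothetical counts for illustration only (not taken from the patent):
tp, fn, tn, fp = 45, 5, 48, 2
print(accuracy(tp, tn, fp, fn))   # 0.93
print(sensitivity(tp, fn))        # 0.9
```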
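When the recognition accuracy finally output by the analysis method exceeds 90%, the method is applied to the testing of serum samples to make a preliminary judgment as to whether the subject has at least one of the diseases or none of them.
Compared with the prior art, the technical solution of the invention has the following advantages and beneficial effects:
(1) Sample pretreatment requires no biospecific modification, and the intrinsic spectral signal of the serum sample is obtained, so consumables are cheap: about 1 yuan per serum sample tested;
(2) There is no restriction on the type of cancer serum sample to be tested: both lung cancer and colorectal cancer patients can be distinguished effectively from healthy people, with recognition accuracy above 94% for both cancers and even approaching 100%;
(3) The dimension reduction performed by the artificial intelligence algorithm locates the differential SERS peak positions of cancer patients relative to healthy people, which is expected to provide guidance for clinical cancer diagnosis and treatment;
(4) The high sensitivity of SERS and the high recognition accuracy of artificial intelligence together give accurate cancer diagnosis results and differential peak localization. The whole "sample preparation - detection - analysis - result output" workflow is short, taking only 1 hour, which offers new ideas for the next generation of cancer diagnosis and treatment strategies.
The invention makes SERS spectroscopy and artificial intelligence interact to achieve high-precision cancer identification and to locate the peak differences between cancer patients and healthy people. Compared with conventional medical means of analyzing serum, it requires no biospecific modification such as antibody-antigen labeling and obtains the intrinsic spectral signal of serum samples. It distinguishes the serum signals of cancer patients and healthy people more cheaply, quickly and accurately, which gives the field of clinical liquid biopsy a new approach to detection and to the acquisition of pathological information.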
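Brief Description of the Drawings
Fig. 1 is a flowchart of the serum analysis method of the invention, in which spectroscopy and artificial intelligence interact.
Fig. 2 shows typical and pooled SERS spectra of serum from several healthy people, colorectal cancer patients and lung cancer patients in Example 2.
Fig. 3 is a heat map of some of the dimensions from the dimension reduction analysis of the SERS spectral data of 244 lung cancer serum samples and 350 normal samples in Example 2.
Fig. 4 is a screenshot of the table of the 50 characteristic Raman peak positions obtained after dimension reduction for the lung cancer and normal samples in Example 2.
Fig. 5 is a heat map of some of the dimensions from the dimension reduction analysis of the SERS spectral data of 216 colorectal cancer serum samples and 350 normal samples in Example 2.
Fig. 6 is a screenshot of the table of the 50 characteristic Raman peak positions obtained after dimension reduction for the colorectal cancer and normal samples in Example 2.
Fig. 7 is a flowchart of the recognition accuracy output for colorectal cancer patients, lung cancer patients and healthy people in Example 3.
Fig. 8 is a logic diagram of the algorithm for colorectal cancer patients, lung cancer patients and healthy people in Example 3.
Fig. 9 shows the scatter plots and the accuracy and sensitivity statistics for the recognition of the three sample types (colorectal cancer, lung cancer and normal) in Example 3.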
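Detailed Description
Examples 1, 2 and 3 below further illustrate the invention and should not be construed as limiting it. Unless otherwise specified, the technical means used in the examples are conventional means well known to those skilled in the art.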
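Example 1
The invention combines SERS spectroscopy from physics with artificial intelligence from computer science. As shown in Fig. 1, the proposed serum analysis method follows the pipeline "clinical sample collection - sample preparation - spectrum acquisition - data training and recognition - result output - expected guidance of clinical treatment", which couples SERS with the algorithm and completes the exchange of information between them, achieving fast, high-accuracy cancer recognition and informative localization of differential peak positions. The steps are as follows:
(a) For clinical sample collection, peripheral blood was drawn in this example from 244 lung cancer patients, 216 colorectal cancer patients and 350 healthy people. Each 1.5 ml sample of peripheral blood was centrifuged for 10 min, after which the pale yellow serum in the upper layer of the resulting liquid was carefully collected, giving serum samples of lung cancer patients, colorectal cancer patients and healthy people for later use;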
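(b) This example uses silver nanowires as SERS probes. The original silver nanowire solution is prepared as follows:
Add 1.665 g of polyvinylpyrrolidone (molecular weight 360000) and 0.0019 g of CuCl₂ to 100 ml of ethylene glycol and stir in an ultrasonic bath until evenly dispersed to obtain solution A; dissolve 1.7 g of AgNO₃ in 100 ml of ethylene glycol to obtain solution B; then add solution A dropwise at a constant rate to solution B and stir until uniform. Transfer the mixture to a 250 ml autoclave, seal it, place it in an oven and heat at 160 °C for 3 h. After the reaction, cool to room temperature to obtain the original silver nanowire solution for later use.
Before Raman testing, the silver nanowires must be centrifuged to remove impurities; the resulting nanowires have a diameter of about 100 nm and a length of 10-20 μm. The centrifugation is carried out as follows: take 4.5 ml of the original silver nanowire solution, centrifuge at 6000 r/min for 10 min, remove all the supernatant with a pipette, resuspend the silver nanowire precipitate in 1 mL of deionized water, and finally disperse it evenly with an ultrasonic cleaner to obtain the concentrated silver nanowire solution;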
(c)接着进行SERS测试制样,先用移液枪取30μl的血清样本于100μl的锥形管中,再取15μl浓缩后的银纳米线溶液与血清样本充分混合,此时银纳米线溶液与血清样本的体积比固定为1:2(确保各同样体积的血清样本中加入的SERS探针量相等),银纳米线探针与血清充分接触,待室温混合孵化10min后,取30μl孵化后的混合液移至倒置的锥形管管帽内进行拉曼光谱测试,样品定位时先借助共聚焦显微镜聚焦到液面以下,采谱时使用的镜头为50倍共聚焦镜头,激光波长为532nm,采谱范围为600cm -1~1800cm -1,每个血清样本经同样的处理后在相同条件下采谱5次,每个样本采谱5次总耗时约为15min。
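Example 2
After the "clinical sample collection - sample preparation - spectrum acquisition" steps of Example 1 are completed, all Raman spectra collected from the 350 normal individuals, 244 lung cancer patients and 216 colorectal cancer patients are screened: for each sample, the most reproducible of the 5 spectra is chosen as the final acquisition result. Figure 2(a) shows a typical SERS spectrum of one normal serum sample after screening. Distinct characteristic peaks are visible, which confirms the very high detection sensitivity of SERS.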
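The source does not state how "most reproducible" is measured. As a minimal sketch under one possible assumption, the replicate with the highest mean Pearson correlation to the other replicates could be kept. The function names and the toy intensities below are hypothetical and are not taken from the patent.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def most_reproducible(replicates):
    """Index of the replicate that agrees best with the others.

    Assumed criterion (not from the source): highest mean Pearson
    correlation with the remaining replicates.
    """
    def mean_corr(i):
        others = [r for j, r in enumerate(replicates) if j != i]
        return sum(pearson(replicates[i], r) for r in others) / len(others)
    return max(range(len(replicates)), key=mean_corr)

# Toy intensities for 5 acquisitions of one sample; index 3 is an outlier.
replicates = [
    [1.0, 2.0, 3.0, 2.0, 1.0],
    [1.1, 2.1, 2.9, 2.0, 1.0],
    [1.0, 1.9, 3.1, 2.1, 0.9],
    [3.0, 1.0, 1.0, 1.0, 3.0],
    [0.9, 2.0, 3.0, 1.9, 1.1],
]
best = most_reproducible(replicates)
```

Any other reproducibility measure, such as distance to the replicate mean, would fit the same interface.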
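Once all Raman spectra have been screened, SERS spectra are available for all serum samples from the different sources. Figure 2(b) shows the pooled serum SERS spectra of several normal individuals: every curve has distinct characteristic peaks, and all curves share some common peak positions. Figures 2(c) and 2(d) show the pooled serum SERS spectra of several lung cancer patients and several colorectal cancer patients, respectively; each of these curves also has distinct characteristic peaks. Although all spectra in Figure 2 share some common characteristic peaks, certain peaks of the cancer patients differ subtly, to varying degrees, from those of normal individuals. Comparing by eye the pooled spectra of normal individuals in Figure 2(b) with those of lung and colorectal cancer patients in Figures 2(c) and 2(d) reveals subtle differences, but visual inspection cannot support a systematic statistical analysis of spectroscopic data from different sources.
To address this bottleneck in the batch analysis of spectral data, the present invention uses artificial intelligence algorithms to statistically process, analyze and classify large numbers of serum SERS spectra. The algorithmic tool used is libsvm. Before the serum spectral data are used to train and test the svm model, all spectra are converted with the weka software into the format required by libsvm. Each sample's data cover 600 cm⁻¹ to 1800 cm⁻¹, a range containing 1456 data points. All samples share the same frequency axis and differ only in the peak intensity at each frequency, so each frequency is treated as an index and its peak intensity as one dimension. Each sample thus becomes a 1456-dimensional vector, with the dimensions ordered from low to high frequency. Not every dimension is useful, however, since some carry no distinguishing features, so the data must next be cleaned and reduced in dimension.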
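As an illustration of the target format, the sketch below writes one spectrum as a libsvm text line, `<label> <index>:<value> ...`, with 1-based feature indices. The source performs this conversion with weka; the helper name `to_libsvm_line` and the three-point toy spectrum are hypothetical.

```python
def to_libsvm_line(label, intensities):
    """Format one spectrum as '<label> 1:<v1> 2:<v2> ...' (1-based indices)."""
    features = " ".join(f"{i}:{v:g}" for i, v in enumerate(intensities, start=1))
    return f"{label} {features}"

# Toy spectrum with 3 points instead of 1456; label 1 stands for a cancer sample.
line = to_libsvm_line(1, [0.5, 12.0, 3.25])
print(line)  # 1 1:0.5 2:12 3:3.25
```

With real data each line would carry 1456 index:value pairs before dimensionality reduction and 50 afterwards.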
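In this example, normal individuals form one class and the two types of cancer patients form the other during dimensionality reduction. Specifically, the original 600 cm⁻¹ to 1800 cm⁻¹ spectral data are first divided at intervals of 60 cm⁻¹ into several effective frequency segments, and the covariance is then used to compute the correlation between the features within each segment. The correlation lies between -1 and 1: values closer to -1 or 1 indicate stronger correlation, and values closer to 0 indicate weaker correlation. The feature correlations for the different frequency ranges are presented as heat maps. Figure 3 shows the correlation heat map for the 244 lung cancer samples versus the 350 normal control samples over the 600 cm⁻¹ to 623.7705 cm⁻¹ portion of the range; the correlations between dimensions are clearly unevenly distributed. During dimensionality reduction, the 2 dimensions with the lowest correlation are selected as effective feature points from every 60 consecutive dimensions, and 2 effective dimensions are likewise selected from the final group of fewer than 60 dimensions. The original 1456 dimensions are thereby reduced to 50 (24 full groups of 60 plus one remaining group of 16 give 25 groups, each contributing 2 dimensions), corresponding to 50 characteristic Raman frequencies. The detailed dimension differences between the SERS spectra of all lung cancer patients and normal individuals are listed in Figure 4. These 50 Raman peak positions represent the 50 points at which the serum SERS spectra of lung cancer patients differ from those of normal individuals. Likewise, Figure 5 shows the correlation heat map for the 216 colorectal cancer samples versus the 350 normal control samples over the 600 cm⁻¹ to 623.7705 cm⁻¹ portion of the range, where the correlations between dimensions are again clearly unevenly distributed; the corresponding dimension differences between the SERS spectra of colorectal cancer patients and normal individuals are listed in Figure 6. In summary, the present invention simplifies the tedious process of SERS peak position analysis and locates differential SERS peak positions more precisely, which is expected to provide guidance for future clinical cancer diagnosis and treatment.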
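The block-wise selection can be sketched as follows. The source says only that the 2 dimensions with the "lowest correlation" are kept from every 60 consecutive dimensions; the scoring rule used here (lowest mean absolute Pearson correlation with the other dimensions of the same block) is an assumption, as are the function names and the synthetic random data.

```python
import random
from math import sqrt

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def select_dimensions(spectra, block=60, keep=2):
    """Pick `keep` column indices from each run of `block` consecutive columns."""
    n_dims = len(spectra[0])
    columns = [[row[j] for row in spectra] for j in range(n_dims)]
    selected = []
    for start in range(0, n_dims, block):
        idx = range(start, min(start + block, n_dims))

        def mean_abs_corr(j):
            # Assumed score: average |correlation| with the rest of the block.
            others = [k for k in idx if k != j]
            if not others:
                return 0.0
            return sum(abs(pearson(columns[j], columns[k])) for k in others) / len(others)

        selected.extend(sorted(sorted(idx, key=mean_abs_corr)[:keep]))
    return selected

rng = random.Random(0)
# Synthetic stand-in for real spectra: 8 samples x 1456 dimensions.
spectra = [[rng.random() for _ in range(1456)] for _ in range(8)]
selected = select_dimensions(spectra)
print(len(selected))  # 50, since 25 blocks contribute 2 dimensions each
```

Whatever scoring rule is used, the count is fixed by the block layout: 1456 = 24 × 60 + 16, so there are 25 blocks and 50 retained dimensions.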
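Example 3
After the dimensionality reduction of Example 2, the SERS spectrum of each serum sample is reduced to 50 dimensions. All data are then processed following the flowchart in Figure 7, and the following two training and identification tasks are carried out: colorectal cancer serum spectra labelled 1 and normal serum spectra labelled 0, to determine whether a subject is a colorectal cancer patient; and lung cancer serum spectra labelled 1 and normal serum spectra labelled 0, to determine whether a subject is a lung cancer patient.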
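The logic of the algorithm used in this example is shown in Figure 8. For each of the above cases the data are first split 8:2 into a training set and a test set and then scaled to the range [0,1]. The raw data are widely spread; scaling makes them more concentrated and reduces the influence of outlying values. The normalization formula used for scaling is:
$$y' = \mathrm{lower} + (\mathrm{upper} - \mathrm{lower}) \cdot \frac{y - \min}{\max - \min}$$ (Figure PCTCN2022114961-appb-000012)
where y is the data before scaling, y' is the data after scaling, lower and upper are the minimum and maximum of the scaled data, and min and max are the minimum and maximum of the data before scaling.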
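A minimal sketch of the 8:2 split followed by linear scaling to [lower, upper], using the quantities defined above (y, y', lower, upper, min, max). The per-feature treatment, the handling of constant features and all function names are assumptions for illustration. In practice the ranges are commonly computed on the training set and reused for the test set, which the source does not specify.

```python
import random

def split_8_2(items, seed=0):
    """Shuffle and split into 80% training / 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def fit_ranges(rows):
    """Per-feature (min, max), computed column-wise."""
    return [(min(col), max(col)) for col in zip(*rows)]

def scale(rows, ranges, lower=0.0, upper=1.0):
    """Apply y' = lower + (upper - lower) * (y - min) / (max - min) per feature."""
    out = []
    for row in rows:
        scaled = []
        for y, (lo, hi) in zip(row, ranges):
            if hi == lo:
                scaled.append(lower)  # constant feature: map to the lower bound
            else:
                scaled.append(lower + (upper - lower) * (y - lo) / (hi - lo))
        out.append(scaled)
    return out

rows = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]  # toy data, 3 samples x 2 features
ranges = fit_ranges(rows)
scaled = scale(rows, ranges)
train, test = split_8_2(list(range(10)))
```

For the toy data the ranges are (2, 6) and (10, 30), so the middle sample maps to [0.5, 1.0].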
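The corresponding support vector expansion is:
$$f(x) = \sum_{i=1}^{N} \alpha_i y_i\, k(x, x_i) + b$$ (Figure PCTCN2022114961-appb-000013)
where k(x, x_i) is the kernel function. The expression shows that the optimal solution of the model can be expanded in terms of the kernel function evaluated on the training samples.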
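The kernel function used in the algorithm is the radial basis function (RBF) kernel. It maps the samples nonlinearly into a higher-dimensional space and, unlike a linear kernel, can handle nonlinear relationships between class labels and attributes; it performs well on practical problems. Its expression is:
$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right), \quad \gamma > 0$$
where γ is the hyperparameter of the Gaussian kernel.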
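The kernel expression K(x_i, x_j) = exp(-γ||x_i - x_j||²) can be evaluated directly. The sketch below is a plain-Python illustration with a hypothetical function name; libsvm computes the kernel internally.

```python
from math import exp

def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2), with gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return exp(-gamma * sq_dist)

same = rbf_kernel([0.2, 0.7], [0.2, 0.7], gamma=0.25)  # identical points give 1.0
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.25)  # exp(-0.25 * 2) = exp(-0.5)
```

The value decays from 1 towards 0 as two spectra move apart, and a larger γ makes that decay faster.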
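Specifically, the original problem is first converted into a convex optimization problem. Primal problem:
$$\min_{w,b,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{N}\xi_i$$ (Figure PCTCN2022114961-appb-000014)
$$\text{s.t.}\quad y_i(w \cdot x_i + b) \ge 1 - \xi_i, \quad i = 1, 2, \dots, N$$
$$\xi_i \ge 0, \quad i = 1, 2, \dots, N$$
The convex optimization problem is then solved;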
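① For the dual of the primal problem, construct the Lagrangian:
$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{N}\xi_i - \sum_{i=1}^{N}\alpha_i\left(y_i(w \cdot x_i + b) - 1 + \xi_i\right) - \sum_{i=1}^{N}\mu_i\xi_i$$ (Figure PCTCN2022114961-appb-000015)
where α are the Lagrange multipliers; w is the normal vector of the hyperplane and determines its orientation; b is the offset term and determines the distance between the hyperplane and the origin; ξ are the slack variables; and μ are dual variables. The Lagrangian is first minimized with respect to w, b and ξ by setting the partial derivatives to zero and substituting back into the original function; the resulting minimum is then maximized with respect to α; finally, the maximization is converted into a minimization, giving the dual problem:
$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{N}\alpha_i$$ (Figure PCTCN2022114961-appb-000016)
$$\text{s.t.}\quad \sum_{i=1}^{N}\alpha_i y_i = 0$$ (Figure PCTCN2022114961-appb-000017)
$$0 \le \alpha_i \le C, \quad i = 1, 2, \dots, N$$
with $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$, $\gamma > 0$, chosen as the kernel function;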
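The step from the Lagrangian to the dual can be written out as follows. This is a sketch that assumes the standard soft-margin Lagrangian with multipliers α_i ≥ 0 and μ_i ≥ 0; it is the textbook derivation rather than a transcription of the patent's formula images.

```latex
\begin{aligned}
L(w,b,\xi,\alpha,\mu) &= \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{N}\xi_i
  - \sum_{i=1}^{N}\alpha_i\bigl(y_i(w\cdot x_i+b)-1+\xi_i\bigr) - \sum_{i=1}^{N}\mu_i\xi_i \\
\frac{\partial L}{\partial w} = 0 \;&\Rightarrow\; w = \sum_{i=1}^{N}\alpha_i y_i x_i \\
\frac{\partial L}{\partial b} = 0 \;&\Rightarrow\; \sum_{i=1}^{N}\alpha_i y_i = 0 \\
\frac{\partial L}{\partial \xi_i} = 0 \;&\Rightarrow\; C - \alpha_i - \mu_i = 0 \\
\min_{w,b,\xi} L &= -\tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j\,(x_i\cdot x_j) + \sum_{i=1}^{N}\alpha_i
\end{aligned}
```

Since μ_i ≥ 0 and α_i ≥ 0, the third condition gives 0 ≤ α_i ≤ C. Maximizing the last line over α is the same as minimizing its negative, and replacing the inner product x_i · x_j by K(x_i, x_j) yields the kernelized dual problem.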
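② From the KKT conditions it follows that:
$$w^* = \sum_{i=1}^{N}\alpha_i^* y_i x_i$$ (Figure PCTCN2022114961-appb-000018)
$$b^* = y_j - \sum_{i=1}^{N}\alpha_i^* y_i K(x_i, x_j), \quad \text{for a } j \text{ with } 0 < \alpha_j^* < C$$ (Figure PCTCN2022114961-appb-000019)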
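Note that:
The parameters C and g in the present invention are the optimal values found by grid search with grid.py from libsvm. C is the penalty coefficient, i.e. the tolerance for errors: the larger C is, the less error is tolerated and the more readily the model overfits; the smaller C is, the more readily it underfits; if C is too large or too small, generalization suffers. g is a parameter that comes with the RBF function once it is chosen as the kernel; it implicitly determines the distribution of the data after mapping into the new feature space. The larger g is, the fewer the support vectors, and the smaller g is, the more the support vectors; the number of support vectors affects the speed of training and prediction.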
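The idea of a grid search over (C, g) can be sketched as below. This is not the grid.py implementation: the scoring function is a toy stand-in for cross-validation accuracy, and the function names and exponent ranges are hypothetical choices for illustration.

```python
from math import log2

def grid_search(score, log2c_values, log2g_values):
    """Return (best_score, C, g) over a grid of powers of two."""
    best = None
    for log2c in log2c_values:
        for log2g in log2g_values:
            c, g = 2.0 ** log2c, 2.0 ** log2g
            s = score(c, g)
            if best is None or s > best[0]:
                best = (s, c, g)
    return best

def toy_score(c, g):
    """Hypothetical stand-in for cross-validation accuracy, peaking at C=8, g=0.25."""
    return 1.0 - 0.01 * (abs(log2(c) - 3) + abs(log2(g) + 2))

best_score, best_c, best_g = grid_search(toy_score, range(-5, 16), range(-15, 4))
```

In a real run `score` would train an svm on part of the training set for each (C, g) pair and return the accuracy on the held-out part.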
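The relationship between γ and g follows from:
$$K(x, z) = \exp\left(-\frac{d(x, z)^2}{2\sigma^2}\right) = \exp\left(-\mathrm{gamma} \cdot d(x, z)^2\right) \;\Rightarrow\; \mathrm{gamma} = \frac{1}{2\sigma^2}$$ (Figure PCTCN2022114961-appb-000020)
where d(x, z) is the distance, gamma = γ is the g value, equal to the hyperparameter of the Gaussian kernel, and σ is the width parameter of the function.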
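In this example, for the task in which colorectal cancer serum spectra are labelled 1 and normal serum spectra are labelled 0, C is set to 8 and g to 0.0488; for the task in which lung cancer serum spectra are labelled 1 and normal serum spectra are labelled 0, C is set to 8 and g to 0.25.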
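As a worked example, and assuming the Gaussian kernel is written exp(-d²/(2σ²)) so that g = 1/(2σ²), the two g values quoted above correspond to the kernel widths computed below. The function names are hypothetical.

```python
from math import sqrt

def sigma_from_gamma(g):
    """Width sigma of exp(-d^2 / (2 sigma^2)) for a given g = 1 / (2 sigma^2)."""
    return sqrt(1.0 / (2.0 * g))

def gamma_from_sigma(sigma):
    """Inverse relation: g = 1 / (2 sigma^2)."""
    return 1.0 / (2.0 * sigma * sigma)

sigma_lung = sigma_from_gamma(0.25)          # about 1.414
sigma_colorectal = sigma_from_gamma(0.0488)  # about 3.201
```

The smaller g used for the colorectal task therefore corresponds to a wider kernel on the scaled 50-dimensional data.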
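Once the kernel function and the parameters C and g have been chosen, the training set is used to train an svm model for serum SERS spectral data. The classification decision function used is:
$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{N}\alpha_i^* y_i K(x_i, x) + b^*\right)$$ (Figure PCTCN2022114961-appb-000021)
where α* is the optimal set of α_i satisfying the constraints, obtained with the SMO algorithm; K(x_i, x) is the Gaussian kernel; and b* is the threshold, already obtained in the previous step.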
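The decision rule, the sign of the kernel expansion over the support vectors plus the threshold b*, can be illustrated with the sketch below. The support vectors, coefficients and threshold are toy values, not a trained model, and the mapping of +1 to label 1 (cancer) and -1 to label 0 (normal) is an assumption about how the 0/1 labels relate to the ±1 classes of the svm formulation.

```python
from math import exp

def rbf(x, z, gamma):
    """Gaussian kernel exp(-gamma * ||x - z||^2)."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def decide(x, support_vectors, coeffs, b, gamma):
    """Return +1 or -1 from sign(sum_i coeffs[i] * K(sv_i, x) + b).

    coeffs[i] stands for alpha_i* times y_i, with y_i in {-1, +1}.
    """
    s = sum(c * rbf(sv, x, gamma) for sv, c in zip(support_vectors, coeffs)) + b
    return 1 if s >= 0 else -1

# Toy model: one support vector per class.
svs = [[0.0, 0.0], [1.0, 1.0]]
coeffs = [-1.0, 1.0]  # first vector belongs to class -1, second to class +1
near_cancer = decide([0.9, 1.0], svs, coeffs, b=0.0, gamma=0.25)
near_normal = decide([0.1, 0.0], svs, coeffs, b=0.0, gamma=0.25)
```

A point is assigned to whichever class has the larger weighted kernel similarity, shifted by the threshold.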
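The hinge loss is chosen as the loss function, with λ||w||² as the regularization term, i.e.:
$$\min_{w,b}\ \sum_{i=1}^{N}\left[1 - y_i(w \cdot x_i + b)\right]_+ + \lambda\lVert w \rVert^2$$ (Figure PCTCN2022114961-appb-000022)
When a sample is classified correctly, y(w·x+b) > 0; when it is misclassified, y(w·x+b) < 0. The absolute value of y(w·x+b) indicates how far the sample lies from the decision boundary: the larger the absolute value, the farther the sample is from the boundary. The hinge loss is 0 only when the sample is classified correctly and its functional margin is at least 1; otherwise the loss is 1 - y(w·x+b).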
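The hinge loss and the regularized objective described above can be checked numerically with a small sketch. The function names and the one-dimensional toy data are hypothetical.

```python
def hinge(y, score):
    """Hinge loss max(0, 1 - y * score) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)

def objective(w, b, xs, ys, lam):
    """Sum of hinge losses plus the regularization term lam * ||w||^2."""
    total = 0.0
    for x, y in zip(xs, ys):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += hinge(y, score)
    return total + lam * sum(wi * wi for wi in w)

loss_far = hinge(1, 2.0)      # correct, margin above 1: loss 0.0
loss_inside = hinge(1, 0.5)   # correct but inside the margin: loss 0.5
loss_wrong = hinge(-1, 0.5)   # misclassified: loss 1.5
total = objective([1.0], 0.0, [[2.0], [0.5], [0.5]], [1, 1, -1], lam=0.1)
```

The total is 0.0 + 0.5 + 1.5 from the three samples plus 0.1 from the regularizer, i.e. about 2.1.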
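The resulting model is then tested on the test set, and the model's predictions are compared with the actual status of each sample to give the identification accuracy as output.
Figures 9a-b show the scatter plots for the three types of data. The algorithm model established by the present invention classifies serum Raman data from different sources very well, with slightly better results for colorectal cancer than for lung cancer. Figure 9c shows that both lung cancer and colorectal cancer are identified against normal individuals with an accuracy of at least 94.1% and a sensitivity of at least 91.84%. Specifically, lung cancer is identified with an accuracy of 94.1% and a sensitivity of 91.84%, and colorectal cancer with an accuracy of 98.25% and a sensitivity of 97.73%, so that the identification performance approaches 100%. The serum analysis approach proposed by the present invention, based on the interaction of spectroscopy and artificial intelligence, therefore achieves highly accurate cancer detection, which is of great significance for fast, high-precision, non-invasive clinical cancer testing.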
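For reference, accuracy and sensitivity follow from the confusion matrix of the test set. The source reports only the percentages; the counts below are hypothetical, chosen so that they are consistent with those percentages and with a test set of roughly 20% of the samples. They are not data from the patent.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all test samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """Fraction of cancer samples that were identified as cancer."""
    return tp / (tp + fn)

# Hypothetical counts (assumption): 49 lung cancer + 70 normal test samples.
lung = dict(tp=45, fn=4, tn=67, fp=3)
# Hypothetical counts (assumption): 44 colorectal cancer + 70 normal test samples.
colorectal = dict(tp=43, fn=1, tn=69, fp=1)

lung_acc = round(100 * accuracy(**lung), 1)                                    # 94.1
lung_sens = round(100 * sensitivity(lung["tp"], lung["fn"]), 2)                # 91.84
crc_acc = round(100 * accuracy(**colorectal), 2)                               # 98.25
crc_sens = round(100 * sensitivity(colorectal["tp"], colorectal["fn"]), 2)     # 97.73
```

Other count combinations could give the same rounded percentages, so these numbers only illustrate how the two metrics are computed.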
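It should also be emphasized that, for high-precision cancer analysis of a single serum sample, the method of the present invention is fast: the whole process from sample collection through sample preparation, spectrum acquisition and algorithm training to output of the identification accuracy takes about 1 hour. Excluding the cost of the instrument itself, the consumables (the silver nanowire solution) cost less than 1 yuan. This is highly significant for the current field of cancer liquid biopsy and addresses the long-standing problems of conventional medical cancer tests: invasiveness, long turnaround and high cost.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited by them. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and falls within the scope of protection of the present invention.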

Claims (7)

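  1. A serum analysis method based on the interaction of spectroscopy and artificial intelligence, characterized in that the method uses silver nanowires having no intrinsic Raman signal as SERS probes; a silver nanowire solution is mixed directly in the liquid phase and co-incubated with serum samples from diseased patients and from normal individuals, respectively; after incubation, serum SERS spectral data are collected on a Raman spectrometer to obtain raw spectral data points; the raw spectral data points are then reduced in dimension using a covariance matrix, the spectral data points obtained after dimensionality reduction being the differential peak positions of the diseased samples relative to the normal samples; and a support vector machine model is then used for classification training and identification on the dimension-reduced spectral data, finally giving the identification accuracy of the different diseased samples relative to the normal samples.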
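  2. The serum analysis method according to claim 1, characterized in that the method comprises the following steps:
    (1) preparing a purified silver nanowire solution for later use; and centrifuging peripheral blood plasma samples from different types of diseased patients and from normal individuals to obtain the corresponding serum samples for later use;
    (2) mixing and incubating the silver nanowire solution in the liquid phase with each of the above serum samples in the same ratio, ensuring that the silver nanowires are in full contact with the serum; after incubation, collecting bulk-phase SERS spectroscopic data from all samples with a Raman spectrometer at a laser wavelength of 532 nm over the range 600 cm⁻¹ to 1800 cm⁻¹, each sample being measured 5 times;
    (3) after the spectroscopic data of all serum samples have been collected, first reducing the dimension of the serum SERS spectral data from the different sources, removing irrelevant items from the sample data points and finally selecting the effective dimensions that reflect the differences in the data, specifically: computing the correlation between the original data dimensions of different samples from the covariance matrix, and taking the data points with the lowest correlation as the effective dimensions after dimensionality reduction, these dimensions corresponding to the differential peak positions between different cases;
    (4) then performing algorithm training: during algorithm training and identification, the dimension-reduced data points are used as feature values for binary classification; all samples are divided into a training set and a test set, and the data of each sample are then scaled to the range [0,1], the normalization formula used for scaling being: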
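    $$y' = \mathrm{lower} + (\mathrm{upper} - \mathrm{lower}) \cdot \frac{y - \min}{\max - \min}$$ (Figure PCTCN2022114961-appb-100001)
    where y is the data before scaling, y' is the data after scaling, lower and upper are the minimum and maximum of the scaled data, and min and max are the minimum and maximum of the data before scaling;
    the corresponding support vector expansion being:
    $$f(x) = \sum_{i=1}^{N} \alpha_i y_i\, k(x, x_i) + b$$ (Figure PCTCN2022114961-appb-100002)
    where k(x, x_i) is the kernel function, the expression showing that the optimal solution of the model can be expanded in terms of the kernel function evaluated on the training samples;
    the kernel function used in the algorithm being the radial basis function (RBF) kernel, i.e.:
    $$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right), \quad \gamma > 0$$
    where γ is the hyperparameter of the Gaussian kernel;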
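    specifically:
    first, the original problem is converted into a convex optimization problem:
    primal problem:
    $$\min_{w,b,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.}\quad y_i(w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, 2, \dots, N$$ (Figure PCTCN2022114961-appb-100003)
    the convex optimization problem is then solved;
    ① for the dual of the primal problem, the Lagrangian is constructed:
    $$L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{N}\xi_i - \sum_{i=1}^{N}\alpha_i\left(y_i(w \cdot x_i + b) - 1 + \xi_i\right) - \sum_{i=1}^{N}\mu_i\xi_i$$ (Figure PCTCN2022114961-appb-100004)
    where α are the Lagrange multipliers; w is the normal vector of the hyperplane and determines its orientation; b is the offset term and determines the distance between the hyperplane and the origin; ξ are the slack variables; and μ are dual variables; the Lagrangian is first minimized with respect to w, b and ξ by setting the partial derivatives to zero and substituting back into the original function, the resulting minimum is then maximized with respect to α, and the maximization is finally converted into a minimization, giving the dual problem:
    $$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{N}\alpha_i \quad \text{s.t.}\quad \sum_{i=1}^{N}\alpha_i y_i = 0,\ \ 0 \le \alpha_i \le C,\ \ i = 1, 2, \dots, N$$ (Figure PCTCN2022114961-appb-100005)
    with $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$, $\gamma > 0$, chosen as the kernel function;
    ② from the KKT conditions it follows that:
    $$w^* = \sum_{i=1}^{N}\alpha_i^* y_i x_i$$ (Figure PCTCN2022114961-appb-100006)
    $$b^* = y_j - \sum_{i=1}^{N}\alpha_i^* y_i K(x_i, x_j), \quad \text{for a } j \text{ with } 0 < \alpha_j^* < C$$ (Figure PCTCN2022114961-appb-100007)
    where C is the penalty coefficient, i.e. the tolerance for errors, and g is a parameter that comes with the RBF function once it is chosen as the kernel; the optimal values of the parameters C and g are selected by grid search with grid.py, the parameter optimization tool in libsvm;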
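    the relationship between γ and g follows from:
    $$K(x, z) = \exp\left(-\frac{d(x, z)^2}{2\sigma^2}\right) = \exp\left(-\mathrm{gamma} \cdot d(x, z)^2\right) \;\Rightarrow\; \mathrm{gamma} = \frac{1}{2\sigma^2}$$ (Figure PCTCN2022114961-appb-100008)
    where d(x, z) is the distance, gamma = γ is the g value, equal to the hyperparameter of the Gaussian kernel, and σ is the width parameter of the function;
    once the kernel function and the parameters C and g have been chosen, the training set is used to train an svm model for serum SERS spectral data, the classification decision function used being:
    $$f(x) = \operatorname{sign}\left(\sum_{i=1}^{N}\alpha_i^* y_i K(x_i, x) + b^*\right)$$ (Figure PCTCN2022114961-appb-100009)
    where α* is obtained with the SMO algorithm, K(x_i, x) is the Gaussian kernel, and b* is the threshold;
    the hinge loss being chosen as the loss function, with λ||w||² as the regularization term, i.e.:
    $$\min_{w,b}\ \sum_{i=1}^{N}\left[1 - y_i(w \cdot x_i + b)\right]_+ + \lambda\lVert w \rVert^2$$ (Figure PCTCN2022114961-appb-100010)
    (5) then testing the resulting model on the test set and comparing the model's predictions with the actual status of each sample, finally giving the identification accuracy as output.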
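  3. The serum analysis method according to claim 2, characterized in that in step (1) the original silver nanowire solution is centrifuged at 6000 r/min.
  4. Use of the serum analysis method according to any one of claims 1-3 in the high-precision identification of multiple types of cancer patients and normal individuals and in differential SERS peak position analysis.
  5. The use according to claim 4, characterized in that: in the binary classification of step (4) of the serum analysis method, serum from normal individuals forms one class and serum from patients with a given cancer forms the other class; a portion of the cancer patient and normal samples is used for algorithm training and the remaining samples are used for cancer identification; during training and identification, the serum spectral data of patients with the given cancer are treated as the cancer class and the serum spectral data of normal individuals as the normal class, finally giving the accuracy of cancer identification.
  6. The use according to claim 5, characterized in that the patients are lung cancer patients and colorectal cancer patients.
  7. The use according to claim 6, characterized in that, when the serum analysis method is used for high-precision identification and differential SERS peak position analysis of lung cancer patients, colorectal cancer patients and normal individuals, in step (3) each raw data record has about 1456 dimensions before dimensionality reduction and is reduced to 50 dimensions afterwards, corresponding to the 50 SERS characteristic peak positions with the most pronounced differences.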
PCT/CN2022/114961 2021-09-07 2022-08-25 Serum analysis method based on the interaction of spectroscopy and artificial intelligence, and use thereof WO2023035970A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111044298 2021-09-07
CN202111044298.2 2021-09-07

Publications (1)

Publication Number Publication Date
WO2023035970A1 true WO2023035970A1 (zh) 2023-03-16

Family

ID=83257993

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/114961 WO2023035970A1 (zh) 2021-09-07 2022-08-25 Serum analysis method based on the interaction of spectroscopy and artificial intelligence, and use thereof

Country Status (2)

Country Link
CN (1) CN115078331B (zh)
WO (1) WO2023035970A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117147273A (zh) * 2023-10-31 2023-12-01 成都博瑞科传科技有限公司 Background sample concentration device, concentration method thereof, and detection equipment calibration method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115078331B (zh) * 2021-09-07 2024-03-29 武汉大学 Serum analysis method based on the interaction of spectroscopy and artificial intelligence, and use thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
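CN101799421A (zh) * 2010-04-19 2010-08-11 福建师范大学 Method for detecting surface-enhanced Raman spectra of body fluids
CN108872182A (zh) * 2018-03-16 2018-11-23 广东医科大学 SERS-based method for detecting circulating tumor cells
CN109765214A (zh) * 2019-03-29 2019-05-17 北京中科遗传与生殖医学研究院有限责任公司 Method for testing serum of infertility patients based on surface-enhanced Raman spectroscopy
WO2021172944A1 (ko) * 2020-02-26 2021-09-02 사회복지법인 삼성생명공익재단 Cancer diagnosis method using Raman signals of urine
CN115078331A (zh) * 2021-09-07 2022-09-20 武汉大学 Serum analysis method based on the interaction of spectroscopy and artificial intelligence, and use thereof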

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102175664A (zh) * 2011-02-17 2011-09-07 福建师范大学 Method for detecting blood RNA by surface-enhanced Raman spectroscopy
JP6294614B2 (ja) * 2013-05-08 2018-03-14 有限会社マイテック Method for quantifying cancer-related substances

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Master's Thesis", 15 February 2021, CHINA JILIANG UNIVERSITY, CNKI, article LIU, KAIYUAN: "Application of Surface Enhanced Raman Spectroscopy in Rapid Screening of Lung Adenocarcinoma", pages: 1 - 84, XP009544414, DOI: 10.27819/d.cnki.gzgjl.2019.000377 *
"Master's Thesis", 15 January 2021, HARBIN INSTITUTE OF TECHNOLOGY, CN, article XIE, JINMEI: "Study on the Differential Diagnosis of Non-Hodgkin's Lymphoma Using SERS Technique Based on Serum Samples", pages: 1 - 70, XP009544415, DOI: 10.27061/d.cnki.ghgdu.2020.002586 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
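CN117147273A (zh) * 2023-10-31 2023-12-01 成都博瑞科传科技有限公司 Background sample concentration device, concentration method thereof, and detection equipment calibration method
CN117147273B (zh) * 2023-10-31 2024-02-02 成都博瑞科传科技有限公司 Background sample concentration device, concentration method thereof, and detection equipment calibration method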

Also Published As

Publication number Publication date
CN115078331B (zh) 2024-03-29
CN115078331A (zh) 2022-09-20


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22866438

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE