WO2022266928A1 - Metabolic characteristic spectrum inference method and system, and computer device and storage medium - Google Patents

Metabolic characteristic spectrum inference method and system, and computer device and storage medium

Info

Publication number
WO2022266928A1
WO2022266928A1 PCT/CN2021/102060 CN2021102060W WO2022266928A1 WO 2022266928 A1 WO2022266928 A1 WO 2022266928A1 CN 2021102060 W CN2021102060 W CN 2021102060W WO 2022266928 A1 WO2022266928 A1 WO 2022266928A1
Authority
WO
WIPO (PCT)
Prior art keywords
mass
retention time
charge ratio
interval
metabolic
Prior art date
Application number
PCT/CN2021/102060
Other languages
French (fr)
Chinese (zh)
Inventor
李伟忠
邓永洁
胡寓旻
黄蓬
Original Assignee
中山大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中山大学 filed Critical 中山大学
Priority to PCT/CN2021/102060 priority Critical patent/WO2022266928A1/en
Publication of WO2022266928A1 publication Critical patent/WO2022266928A1/en

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02Column chromatography
    • G01N30/62Detectors specially adapted therefor
    • G01N30/72Mass spectrometers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Definitions

  • the invention relates to the field of metabolomics data analysis, in particular to a method, system, computer equipment and storage medium for inferring metabolic profile.
  • Metabolites in human serum include host metabolites, microbial-derived metabolites, and exogenous substances such as diet, which are closely related to the occurrence and development of various diseases.
  • Current metabolomics methods are capable of quantitative determination, identification and analysis of metabolites in serum.
  • Liquid chromatography-mass spectrometry (LC-MS) is a commonly used technique for detecting metabolites: high-performance liquid chromatography separates the different substances, and mass spectrometry performs mass analysis on the substances eluted in different time phases.
  • At present, substances in non-targeted LC-MS raw data are identified mainly by database comparison: mass spectrum peaks are first extracted from the raw data, and attributes of each peak such as retention time and mass-to-charge ratio are then compared with known substances in the database.
  • The Human Metabolome Database (HMDB) contains 114,305 metabolite entries, but this is still small compared with the actual chemical space: the chemical universe database GDB-17 lists more than 166 billion small organic molecules. Furthermore, processing metabolomics data poses several challenges (the data are sparse, noisy, heterogeneous, time-dependent, etc.). At this stage, deep learning techniques are rarely applied to metabolomics data. The SteroidXtract tool applies deep learning and can classify steroid and non-steroid substances directly from the original mass spectra. However, LC-MS data is a complex kind of three-dimensional data.
  • The same sample contains data for multiple time phases (i.e., different retention times), and each time phase has its own mass spectrum.
  • The SteroidXtract method and other metabolomics analysis methods require manual de-redundancy processing of this large number of mass spectra.
  • The biological processes in which serum metabolites take part are often associated with more than one class of substance or one single substance, and these different substances are often distributed across different time phases.
  • The technical problem to be solved by the present invention is to provide a method, system, computer device, and storage medium for inferring metabolic profiles that address the problems of existing metabolomics methods: errors that are difficult to handle, loss of a large amount of the original signal, and being limited to broad-class discrimination.
  • The present invention provides a method for inferring metabolic profiles, comprising: subjecting target sample data to LC-MS processing to obtain LC-MS raw data; performing dimensionality reduction conversion processing on the LC-MS raw data to obtain a two-dimensional matrix that retains the retention time, mass-to-charge ratio, and ion intensity of the LC-MS raw data; and inputting the two-dimensional matrix into a convolutional neural network model to infer the metabolite profile of the target sample data.
  • The step of performing dimensionality reduction conversion processing on the LC-MS raw data to obtain a two-dimensional matrix includes: performing format conversion on the LC-MS raw data; setting the initial retention time, termination retention time, retention time interval, retention time sampling interval, initial mass-to-charge ratio, termination mass-to-charge ratio, mass-to-charge ratio interval, and mass-to-charge ratio sampling interval, wherein the retention time interval is the range from the initial retention time to the termination retention time and the mass-to-charge ratio interval is the range from the initial mass-to-charge ratio to the termination mass-to-charge ratio; and, within the retention time interval and the mass-to-charge ratio interval, using the retention time sampling interval and the mass-to-charge ratio sampling interval as a sliding window and sampling the maximum ion intensity within each window to obtain a two-dimensional matrix of ion intensity.
  • The step of filtering the class activation scores to obtain retained features comprises: filtering out molecular features whose class activation scores are less than a first preset threshold and whose ion intensity is less than a second preset threshold, to obtain the retained features.
  • The present invention also provides a metabolic profile inference system, including: an LC-MS processing module, used to process target sample data with LC-MS to obtain LC-MS raw data; a dimensionality reduction conversion processing module, used to perform dimensionality reduction conversion processing on the LC-MS raw data to obtain a two-dimensional matrix that retains the retention time, mass-to-charge ratio, and ion intensity of the LC-MS raw data; and a metabolic profile inference module, used to input the two-dimensional matrix into the convolutional neural network model to infer the metabolite profile of the target sample data.
  • The dimensionality reduction conversion processing module includes: a format conversion unit for performing format conversion on the LC-MS raw data; a parameter setting unit for setting the initial retention time, termination retention time, retention time interval, retention time sampling interval, initial mass-to-charge ratio, termination mass-to-charge ratio, mass-to-charge ratio interval, and mass-to-charge ratio sampling interval, wherein the retention time interval is the range from the initial retention time to the termination retention time and the mass-to-charge ratio interval is the range from the initial mass-to-charge ratio to the termination mass-to-charge ratio; and a dimensionality reduction sampling unit that, within the retention time interval and the mass-to-charge ratio interval, uses the retention time sampling interval and the mass-to-charge ratio sampling interval as a sliding window and samples the maximum ion intensity within each window to obtain a two-dimensional matrix of ion intensity.
  • The filtering unit is configured to filter out molecular features whose class activation scores are smaller than a first preset threshold and whose ion intensity is smaller than a second preset threshold, so as to obtain the retained features.
  • the present invention also provides a computer device, which includes a memory, a processor, and computer instructions stored in the memory and operable on the processor. The steps of the above method are realized when the processor executes the instructions.
  • the present invention also provides a storage medium, which stores computer instructions, and when the computer instructions are executed by a processor, the steps of the above method are realized.
  • The sample whose metabolic profile is to be inferred is first processed with LC-MS (liquid chromatography-mass spectrometry) to obtain LC-MS raw data; the LC-MS raw data is then subjected to dimensionality reduction conversion processing to obtain a two-dimensional matrix; finally, the two-dimensional matrix is input into the convolutional neural network model to infer the metabolite profile of the sample.
  • The two-dimensional conversion of the LC-MS raw data effectively reduces the data size, which helps subsequent calculations. Whereas existing methods lose a large amount of signal when removing redundancy, the two-dimensional conversion of the LC-MS raw data preserves the substance signal to the greatest extent. The present invention extracts sample-attribute-related features from the final convolutional neural network model, which evaluates the joint correlation between multiple substances and the sample classification more effectively than comparing each substance one by one in isolation, so the metabolic profile of the sample can be inferred more accurately.
  • Fig. 1 is the flowchart of the metabolic profile deduction method provided by the present invention
  • Fig. 2 is the flow chart of the method for dimension reduction conversion processing provided by the present invention
  • Fig. 3 is the flow chart of the method for inferring the profile of metabolites provided by the present invention.
  • Fig. 4 is a schematic diagram of the method for inferring metabolic profiles provided by the present invention.
  • Fig. 5 is a functional block diagram of the metabolic profile inference system provided by the present invention.
  • Fig. 6 is a functional block diagram of the dimension reduction conversion processing module provided by the present invention.
  • Fig. 7 is a functional block diagram of the metabolic profile inference module provided by the present invention.
  • the present invention provides a method for inferring metabolic profiles, including:
  • The sample whose metabolic profile is to be inferred is first processed with LC-MS (liquid chromatography-mass spectrometry) to obtain LC-MS raw data; the LC-MS raw data is then subjected to dimensionality reduction conversion processing to obtain a two-dimensional matrix; finally, the two-dimensional matrix is input into the convolutional neural network model to infer the metabolite profile of the sample.
  • The two-dimensional conversion of the LC-MS raw data effectively reduces the data size, which helps subsequent calculations. Whereas existing methods lose a large amount of signal when removing redundancy, the two-dimensional conversion of the LC-MS raw data preserves the substance signal to the greatest extent. The present invention extracts sample-attribute-related features from the final convolutional neural network model, which evaluates the joint correlation between multiple substances and the sample classification more effectively than comparing each substance one by one in isolation, so the metabolic profile of the sample can be inferred more accurately.
  • the step of performing dimensionality reduction conversion processing on the LC-MS raw data to obtain a two-dimensional matrix includes;
  • S202 setting a starting retention time, an ending retention time, a retention time interval, a retention time sampling interval, a starting mass-to-charge ratio, an ending mass-to-charge ratio, a mass-to-charge ratio interval, and a mass-to-charge ratio sampling interval.
  • the range of the retention time interval is the range from the initial retention time to the end retention time
  • the mass-to-charge ratio interval is the range from the initial mass-to-charge ratio to the end mass-to-charge ratio
  • The two-dimensional matrix of ion intensity is:
  • $$i(t,r) = \max\{\,\mathrm{intensity}(t,r),\ \ldots,\ \mathrm{intensity}(t,\,r+R_{gap}),\ \ldots,\ \mathrm{intensity}(t+T_{gap},\,r+R_{gap})\,\},\quad t \in (T_0, T_e),\ r \in (R_0, R_e),$$ where $t$ is the retention time, $r$ is the mass-to-charge ratio, $\mathrm{intensity}$ is the ion intensity, $T_0$ is the initial retention time, $T_e$ is the termination retention time, $T_{gap}$ is the retention time sampling interval, $R_0$ is the initial mass-to-charge ratio, $R_e$ is the termination mass-to-charge ratio, and $R_{gap}$ is the mass-to-charge ratio sampling interval.
  • the existing deep learning technology obtains two-dimensional mass spectra of each time phase, and carries out substance identification and subsequent analysis based on the mass spectra.
  • the mass spectrum only contains the mass-to-charge ratio and ion intensity information of the substance, plus the phase label of each mass spectrum, which is still a huge three-dimensional data. Therefore, it is necessary to perform a de-redundancy operation to remove a large amount of phase information, so that only a single class of steroids can be processed.
  • the present invention innovatively converts the original data in three-dimensional space into a two-dimensional matrix, which can simultaneously retain information such as retention time, mass-to-charge ratio, and ion intensity of the original data.
  • the original data of the serum sample after LC-MS detection is a kind of three-dimensional point cloud data, including three dimensions of retention time, mass-to-charge ratio, and ion intensity.
  • After the dimensionality reduction conversion is carried out by the method of the present invention, two-dimensional matrix data is obtained with retention time and mass-to-charge ratio as axes and ion intensity as values. This effectively reduces the dimensionality of the original data while retaining the metabolite signal to the greatest extent.
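The sliding-window maximum sampling described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `to_intensity_matrix`, the point format `(retention_time, mz, intensity)`, and the choice of non-overlapping windows (stride equal to the sampling interval) are all assumptions, since the text does not state how far the window advances at each step.

```python
import math


def to_intensity_matrix(points, t0, te, t_gap, r0, re, r_gap):
    """Bin (retention_time, mz, intensity) points into a 2D max-intensity matrix.

    Rows are retention-time windows of width t_gap starting at t0;
    columns are m/z windows of width r_gap starting at r0.
    Assumption: windows do not overlap (stride == window size).
    """
    # Small epsilon guards against float noise when the range divides evenly.
    n_rows = math.ceil((te - t0) / t_gap - 1e-9)
    n_cols = math.ceil((re - r0) / r_gap - 1e-9)
    matrix = [[0.0] * n_cols for _ in range(n_rows)]

    for t, r, intensity in points:
        # Skip points outside the configured retention-time / m/z intervals.
        if not (t0 <= t < te and r0 <= r < re):
            continue
        row = min(int((t - t0) / t_gap), n_rows - 1)
        col = min(int((r - r0) / r_gap), n_cols - 1)
        # Keep only the maximum ion intensity seen in each window.
        if intensity > matrix[row][col]:
            matrix[row][col] = intensity
    return matrix


points = [
    (0.5, 101.0, 10.0),
    (1.5, 103.0, 30.0),  # same window as the first point, higher intensity
    (2.5, 107.0, 5.0),
    (9.0, 101.0, 99.0),  # outside the retention-time interval, ignored
]
matrix = to_intensity_matrix(points, 0.0, 4.0, 2.0, 100.0, 110.0, 5.0)
```

Empty windows are filled with 0.0 here; whether the original method uses zero or some other fill value is not stated in the text.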
  • the step of inputting the two-dimensional matrix into the convolutional neural network model to infer the metabolic substance profile of the sample includes:
  • class activation score can be expressed as s(t,r), where t is the retention time and r is the mass-to-charge ratio;
  • the two-dimensional coordinates (x, y) of the class activation heat map are mapped to retention time (t) and mass-to-charge ratio (r).
  • S305 Screen key metabolites according to the retention characteristics, and perform correlation calculations to infer metabolic markers and metabolic network patterns of the target sample data, and then generate a metabolic profile of the target sample data.
  • The step of filtering the class activation scores to obtain retained features comprises: filtering out molecular features whose class activation scores are less than a first preset threshold and whose ion intensity is less than a second preset threshold, to obtain the retained features.
  • molecular features (t, r) whose class activation scores s(t, r) are smaller than the first threshold or ionic intensity (t, r) is smaller than the second threshold are filtered out to obtain the reserved features [(t1, r1) ,(t2,r2),(t3,r3),...,(tn,rn)].
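The filtering step can be illustrated with a short sketch. The function name `retain_features` and the dictionary-based inputs are hypothetical. The sketch follows the reading in which a feature is dropped when either value falls below its threshold; the text also words the condition with "and", so treat this choice as an assumption.

```python
def retain_features(scores, intensities, score_min, intensity_min):
    """Return the (t, r) features that pass both thresholds.

    scores:      dict mapping (t, r) -> class activation score s(t, r)
    intensities: dict mapping (t, r) -> ion intensity at (t, r)
    A feature is kept only if its score >= score_min AND its
    intensity >= intensity_min (i.e. dropped if either is too small).
    """
    return [
        key
        for key in sorted(scores)
        if scores[key] >= score_min
        and intensities.get(key, 0.0) >= intensity_min
    ]


scores = {(1.0, 100.0): 0.9, (2.0, 200.0): 0.2, (3.0, 300.0): 0.8}
intensities = {(1.0, 100.0): 5000.0, (2.0, 200.0): 9000.0, (3.0, 300.0): 10.0}
retained = retain_features(scores, intensities, 0.5, 1000.0)
```

In this example only the first feature survives: the second has a low class activation score and the third has a low ion intensity.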
  • the existing deep learning technology can only classify and extract steroid substances due to the limited types of substances involved.
  • the traditional metabolomics technology needs complex data preprocessing to extract the mass spectrum peaks to obtain the metabolite matrix of the sample, and obtain the metabolite characteristic spectrum in a statistically driven manner.
  • the present invention innovatively proposes a mapping function to map the sample features learned by deep learning technology supervision to the original data attributes (retention time, mass-to-charge ratio).
  • retention time and mass-to-charge ratio are labels for identifying specific substances.
  • The present invention uses deep learning to obtain sample features by calculating the class activation heat map, and can use the mapping function to infer the specific substances that make up those features, thereby further mining the marker metabolites and metabolic network patterns of the sample and inferring its metabolic profile.
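The text does not give the form of the mapping function. One plausible form, assuming the heat map axes line up with the matrix axes and each heat map cell covers a whole number of matrix cells, is a linear mapping back to the window origins. The name `heatmap_to_feature` and the `scale_t` / `scale_r` parameters are hypothetical.

```python
def heatmap_to_feature(x, y, t0, t_gap, r0, r_gap, scale_t=1, scale_r=1):
    """Map heat map coordinates (x, y) to (retention time, m/z).

    Assumptions: x indexes the retention-time axis, y indexes the m/z axis,
    and one heat map cell spans scale_t x scale_r cells of the input matrix
    (scale 1 means the heat map has the same resolution as the matrix).
    The returned values are the lower edges of the corresponding window.
    """
    t = t0 + x * scale_t * t_gap
    r = r0 + y * scale_r * r_gap
    return (t, r)


feature = heatmap_to_feature(3, 4, 0.0, 2.0, 100.0, 5.0)
```

If the heat map is produced at a coarser resolution than the input matrix, which is common for convolutional networks with pooling, the scale factors would carry that ratio.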
  • the steps to build and train a convolutional neural network model in advance include:
  • the data sets are two-dimensional matrix data obtained through two-dimensional matrix transformation.
  • The method for inferring metabolic profiles provided by the present invention takes LC-MS raw data directly as input, converts it while retaining the original signal to the greatest extent, classifies it using a convolutional neural network model, and extracts features from the classification model to obtain the different metabolite patterns of the different classes. The two-dimensional conversion of the LC-MS raw data effectively reduces the data size, which helps subsequent calculations. Whereas existing methods lose a large amount of signal when removing redundancy, the two-dimensional conversion of the LC-MS raw data preserves the substance signal to the greatest extent. The present invention extracts sample-attribute-related features from the final convolutional neural network model, which evaluates the joint correlation between multiple substances and the sample classification more effectively than comparing each substance one by one in isolation, so the metabolic profile of the sample can be inferred more accurately.
  • the present invention also provides a metabolic profile inference system 100, including:
  • LC-MS processing module 1, used for subjecting target sample data to LC-MS processing to obtain LC-MS raw data;
  • Dimensionality reduction conversion processing module 2, for performing dimensionality reduction conversion processing on the LC-MS raw data to obtain a two-dimensional matrix that retains the retention time, mass-to-charge ratio, and ion intensity of the LC-MS raw data;
  • the metabolic profile inference module 3 is used to input the two-dimensional matrix into the convolutional neural network model to infer the metabolic profile of the target sample data.
  • The LC-MS processing module 1 first performs LC-MS processing on the sample whose metabolic profile is to be inferred to obtain LC-MS raw data; the dimensionality reduction conversion processing module 2 then performs dimensionality reduction conversion processing on the LC-MS raw data to obtain a two-dimensional matrix; finally, the metabolic profile inference module 3 inputs the two-dimensional matrix into the convolutional neural network model to infer the metabolite profile of the sample.
  • The two-dimensional conversion of the LC-MS raw data effectively reduces the data size, which helps subsequent calculations. Whereas existing methods lose a large amount of signal when removing redundancy, the two-dimensional conversion of the LC-MS raw data preserves the substance signal to the greatest extent. The present invention extracts sample-attribute-related features from the final convolutional neural network model, which evaluates the joint correlation between multiple substances and the sample classification more effectively than comparing each substance one by one in isolation, so the metabolic profile of the sample can be inferred more accurately.
  • the dimensionality reduction conversion processing module 2 includes:
  • a format conversion unit 21 configured to convert the format of the LC-MS raw data
  • the parameter setting unit 22 is used to set the initial retention time, termination retention time, retention time interval, retention time sampling interval, initial mass-to-charge ratio, termination mass-to-charge ratio, mass-to-charge ratio interval, and mass-to-charge ratio sampling interval, wherein,
  • the range of the retention time interval is the range from the initial retention time to the end retention time
  • the mass-to-charge ratio interval is the range from the initial mass-to-charge ratio to the end mass-to-charge ratio
  • the dimensionality reduction sampling unit 23 is used to sample, within the retention time interval and the mass-to-charge ratio interval, the maximum ion intensity using the retention time sampling interval and the mass-to-charge ratio sampling interval as a sliding window, to obtain a two-dimensional matrix of ion intensity.
  • the existing deep learning technology obtains two-dimensional mass spectra of each time phase, and carries out substance identification and subsequent analysis based on the mass spectra.
  • the mass spectrum only contains the mass-to-charge ratio and ion intensity information of the substance, plus the phase label of each mass spectrum, which is still a huge three-dimensional data. Therefore, it is necessary to perform a de-redundancy operation to remove a large amount of phase information, so that only a single class of steroids can be processed.
  • the present invention innovatively converts the original data in three-dimensional space into a two-dimensional matrix, which can simultaneously retain information such as retention time, mass-to-charge ratio, and ion intensity of the original data.
  • the original data of the serum sample after LC-MS detection is a kind of three-dimensional point cloud data, including three dimensions of retention time, mass-to-charge ratio, and ion intensity.
  • After the dimensionality reduction conversion is carried out by the method of the present invention, two-dimensional matrix data is obtained with retention time and mass-to-charge ratio as axes and ion intensity as values. This effectively reduces the dimensionality of the original data while retaining the metabolite signal to the greatest extent.
  • the metabolic profile inference module 3 includes:
  • the class activation score acquisition unit 31 is used to calculate the class activation heat map from the convolutional neural network model and to generate the class activation score s(t, r) of each sample, where t is the retention time and r is the mass-to-charge ratio;
  • a mapping unit 33 configured to map the two-dimensional coordinates of the class activation heat map to retention time and mass-to-charge ratio according to the mapping function
  • a filtering unit 34 configured to filter the class activation scores to obtain reserved features
  • Calculation and inference unit 35 configured to screen key metabolites according to the retention characteristics, and perform correlation calculations to deduce the metabolic markers and metabolic network patterns of the target sample data, and then generate the metabolic profile of the target sample data .
  • the filtering unit is configured to filter out molecular features whose class activation scores are smaller than a first preset threshold and whose ion intensity is smaller than a second preset threshold, so as to obtain the retained features.
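The text does not say which technique produces the class activation heat map. As one possibility, the classic class activation mapping (CAM) formulation computes the heat map as a weighted sum of the last convolutional layer's feature maps, using the target class's output weights. The sketch below assumes that formulation; the name `class_activation_map` and the plain nested-list inputs are illustrative only.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps for one class (classic CAM, assumed here).

    feature_maps:  list of K feature maps, each an H x W nested list
    class_weights: list of K weights linking each feature map to the class
    Returns an H x W nested list of class activation scores.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam


feature_maps = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
cam = class_activation_map(feature_maps, [0.5, 0.25])
```

Gradient-based variants such as Grad-CAM replace the fixed class weights with averaged gradients; either would yield a two-dimensional score map that can be mapped back to retention time and mass-to-charge ratio.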
  • the existing deep learning technology can only classify and extract steroid substances due to the limited types of substances involved.
  • the traditional metabolomics technology needs complex data preprocessing to extract the mass spectrum peaks to obtain the metabolite matrix of the sample, and obtain the metabolite characteristic spectrum in a statistically driven manner.
  • the present invention innovatively proposes a mapping function to map the sample features learned by deep learning technology supervision to the original data attributes (retention time, mass-to-charge ratio).
  • retention time and mass-to-charge ratio are labels for identifying specific substances.
  • The present invention uses deep learning to obtain sample features by calculating the class activation heat map, and can use the mapping function to infer the specific substances that make up those features, thereby further mining the marker metabolites and metabolic network patterns of the sample and inferring its metabolic profile.
  • the metabolic profile inference system 100 also includes a model building block, and the model building block includes:
  • the data set division unit is used to obtain the data set and divide the data set into a training set, a verification set and a test set, and incorporate data from different sources as an external test set, and use sample attributes as classification labels;
  • the data sets are two-dimensional matrix data obtained by two-dimensional matrix transformation
  • the evaluation unit is used to evaluate the performance of the initial convolutional neural network model after training in the verification set and the test set, and retrain after adjusting the model structure and hyperparameters if the performance is not good;
  • the screening unit is configured to use the initial convolutional neural network model with the highest accuracy and robustness after training as the final convolutional neural network model.
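The data set division performed by the model building block can be sketched as below. The split fractions (70/15/15), the fixed seed, and the function name `split_dataset` are assumptions; the text only states that the data is divided into training, verification, and test sets, with data from other sources held out as an external test set.

```python
import random


def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle samples and split them into train / validation / test lists.

    samples: iterable of (matrix, label) pairs, where label is the
             sample attribute used as the classification label.
    The fractions are illustrative defaults, not values from the text.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Placeholder samples: an integer stands in for each 2D matrix.
samples = [(index, index % 2) for index in range(20)]
train, val, test = split_dataset(samples)
```

An external test set from a different source would be kept separate and never passed through this function, so that it stays independent of the training distribution.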
  • the present invention also provides a computer device, including a memory, a processor, and computer instructions stored in the memory and operable on the processor. The steps of the above method are implemented when the processor executes the instructions.
  • the present invention also provides a storage medium, which stores computer instructions, and when the computer instructions are executed by a processor, the steps of the above method are realized.
  • The present invention takes LC-MS raw data directly as input, converts it while retaining the original signal to the greatest extent, classifies it using the convolutional neural network model, and extracts features from the classification model to obtain the different metabolite patterns of the different classes;
  • the two-dimensional conversion of the LC-MS raw data effectively reduces the data size, which helps subsequent calculations;
  • the two-dimensional conversion of the LC-MS raw data preserves the substance signal to the greatest extent; the present invention extracts sample-attribute-related features from the final convolutional neural network model, which evaluates the joint correlation between multiple substances and the sample classification more effectively than comparing each substance one by one in isolation, so that the metabolic profile related to the sample can be inferred more accurately.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

Disclosed in the present invention is a metabolic characteristic spectrum inference method. The method comprises: performing LC-MS processing on target sample data to obtain LC-MS raw data; performing dimensionality reduction conversion processing on the LC-MS raw data to obtain a two-dimensional matrix, wherein the two-dimensional matrix retains the retention time, mass-to-charge ratio, and ion intensity of the LC-MS raw data; and inputting the two-dimensional matrix into a convolutional neural network model to infer a metabolite characteristic spectrum of the target sample data. Further disclosed in the present invention are a metabolic characteristic spectrum inference system, a computer device, and a storage medium. The present invention addresses the problems of existing metabolomics methods: errors that are difficult to handle, loss of a large amount of the original signal, and being limited to broad-class discrimination.

Description

Metabolic profile inference method, system, computer device, and storage medium
Technical field
The invention relates to the field of metabolomics data analysis, and in particular to a method, system, computer device, and storage medium for inferring metabolic profiles.
Background
Metabolites in human serum include host metabolites, microbe-derived metabolites, and exogenous substances such as diet, and are closely related to the occurrence and development of various diseases. Current metabolomics methods can quantify, identify, and analyze the metabolites in serum. Liquid chromatography-mass spectrometry (LC-MS) is a commonly used technique for detecting metabolites: high-performance liquid chromatography separates the different substances, and mass spectrometry performs mass analysis on the substances eluted in different time phases. At present, substances in non-targeted LC-MS raw data are identified mainly by database comparison: mass spectrum peaks are first extracted from the raw data, and attributes of each peak such as retention time and mass-to-charge ratio are then compared with known substances in the database. Among these databases, the Human Metabolome Database (HMDB) contains 114,305 metabolite entries, but this is still small compared with the actual chemical space: the chemical universe database GDB-17 lists more than 166 billion small organic molecules. In addition, processing metabolomics data poses several challenges (the data are sparse, noisy, heterogeneous, time-dependent, etc.). At this stage, deep learning techniques are rarely applied to metabolomics data. The SteroidXtract tool applies deep learning and can classify steroid and non-steroid substances directly from the original mass spectra. However, LC-MS data is a complex kind of three-dimensional data: the same sample contains data for multiple time phases (i.e., different retention times), and each time phase has its own mass spectrum. The SteroidXtract method and other metabolomics analysis methods require manual de-redundancy processing of this large number of mass spectra. In addition, the biological processes in which serum metabolites take part are often associated with more than one class of substance or one single substance, and these different substances are often distributed across different time phases.
Traditional metabolomics methods first remove noise and extract signal peaks through a complex procedure, and then rely on statistical methods and existing databases for correlation analysis and substance identification. First, during data processing, problems such as data sparsity, noise, and batch effects introduce substantial errors into peak alignment, peak extraction, and subsequent statistical analysis. Second, existing databases cannot cover the large number of metabolites in the real chemical world, and some unknown metabolites may also play an important role in the onset and progression of disease. Existing deep learning techniques take mass spectra as input, which not only requires laborious redundancy removal but also supports only a coarse distinction between steroids and non-steroids. Isomers with different functions may produce similar mass spectra yet be separated into different time phases by liquid chromatography.
Summary of the Invention
The technical problem to be solved by the present invention is to provide a metabolic profile inference method and system, a computer device, and a storage medium that address the problems of existing metabolomics methods: errors that are difficult to handle, heavy loss of the original signal, and the limitation to coarse class distinctions.
To solve the above technical problem, the present invention provides a metabolic profile inference method, comprising: subjecting target sample data to LC-MS processing to obtain LC-MS raw data; performing dimensionality reduction conversion on the LC-MS raw data to obtain a two-dimensional matrix, the two-dimensional matrix retaining the retention time, mass-to-charge ratio, and ion intensity of the LC-MS raw data; and inputting the two-dimensional matrix into a convolutional neural network model to infer the metabolite profile of the target sample data.
Preferably, the step of performing dimensionality reduction conversion on the LC-MS raw data to obtain a two-dimensional matrix comprises: converting the format of the LC-MS raw data; setting a starting retention time, an ending retention time, a retention time interval, a retention time sampling interval, a starting mass-to-charge ratio, an ending mass-to-charge ratio, a mass-to-charge ratio interval, and a mass-to-charge ratio sampling interval, wherein the retention time interval is the range from the starting retention time to the ending retention time, and the mass-to-charge ratio interval is the range from the starting mass-to-charge ratio to the ending mass-to-charge ratio; and, within the retention time interval and the mass-to-charge ratio interval, using the retention time sampling interval and the mass-to-charge ratio sampling interval as a sliding window and sampling the maximum ion intensity within the retention time interval and the mass-to-charge ratio interval, to obtain a two-dimensional ion intensity matrix.
Preferably, the step of inputting the two-dimensional matrix into the convolutional neural network model to infer the metabolite profile of the sample comprises: computing a class activation heat map from the convolutional neural network model to generate a class activation score s(t, r) for each sample, where t is the retention time and r is the mass-to-charge ratio; extracting mapping functions t = map1(x) and r = map2(y) according to the network structure of the convolutional neural network model, where t is the retention time and r is the mass-to-charge ratio; mapping the two-dimensional coordinates of the class activation heat map to retention time and mass-to-charge ratio according to the mapping functions; filtering the class activation scores to obtain retained features; and screening key metabolites according to the retained features and performing correlation calculations to infer the metabolic markers and metabolic network pattern of the target sample data, thereby generating the metabolic profile of the target sample data.
Preferably, the step of filtering the class activation scores to obtain retained features comprises: filtering out molecular features whose class activation score is below a first preset threshold and whose ion intensity is below a second preset threshold, to obtain the retained features.
The present invention also provides a metabolic profile inference system, comprising: an LC-MS processing module, configured to subject target sample data to LC-MS processing to obtain LC-MS raw data; a dimensionality reduction conversion module, configured to perform dimensionality reduction conversion on the LC-MS raw data to obtain a two-dimensional matrix, the two-dimensional matrix retaining the retention time, mass-to-charge ratio, and ion intensity of the LC-MS raw data; and a metabolic profile inference module, configured to input the two-dimensional matrix into a convolutional neural network model to infer the metabolite profile of the target sample data.
Preferably, the dimensionality reduction conversion module comprises: a format conversion unit, configured to convert the format of the LC-MS raw data; a parameter setting unit, configured to set a starting retention time, an ending retention time, a retention time interval, a retention time sampling interval, a starting mass-to-charge ratio, an ending mass-to-charge ratio, a mass-to-charge ratio interval, and a mass-to-charge ratio sampling interval, wherein the retention time interval is the range from the starting retention time to the ending retention time, and the mass-to-charge ratio interval is the range from the starting mass-to-charge ratio to the ending mass-to-charge ratio; and a dimensionality reduction sampling unit, configured to, within the retention time interval and the mass-to-charge ratio interval, use the retention time sampling interval and the mass-to-charge ratio sampling interval as a sliding window and sample the maximum ion intensity within the retention time interval and the mass-to-charge ratio interval, to obtain a two-dimensional ion intensity matrix.
Preferably, the metabolic profile inference module comprises: a class activation score acquisition unit, configured to compute a class activation heat map from the convolutional neural network model and generate a class activation score s(t, r) for each sample, where t is the retention time and r is the mass-to-charge ratio; an extraction unit, configured to extract mapping functions t = map1(x) and r = map2(y) according to the network structure of the convolutional neural network model, where t is the retention time and r is the mass-to-charge ratio; a mapping unit, configured to map the two-dimensional coordinates of the class activation heat map to retention time and mass-to-charge ratio according to the mapping functions; a filtering unit, configured to filter the class activation scores to obtain retained features; and a calculation and inference unit, configured to screen key metabolites according to the retained features and perform correlation calculations to infer the metabolic markers and metabolic network pattern of the target sample data, thereby generating the metabolic profile of the target sample data.
Preferably, the filtering unit is configured to filter out molecular features whose class activation score is below a first preset threshold and whose ion intensity is below a second preset threshold, to obtain the retained features.
The present invention also provides a computer device, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the instructions.
The present invention also provides a storage medium storing computer instructions which, when executed by a processor, implement the steps of the above method.
The beneficial effects of implementing the present invention are as follows:
In the present invention, a sample whose metabolic profile is to be inferred is first processed by LC-MS, that is, liquid chromatography-mass spectrometry, to obtain LC-MS raw data; the LC-MS raw data are then subjected to dimensionality reduction conversion to obtain a two-dimensional matrix; and finally the two-dimensional matrix is input into the convolutional neural network model to infer the metabolite profile of the sample.
With the present invention, the two-dimensional conversion of the LC-MS raw data effectively reduces the data size, which facilitates subsequent computation. Compared with the heavy signal loss caused by redundancy removal in existing methods, the two-dimensional conversion retains substance signals to the greatest extent. The present invention extracts features related to sample attributes from the final convolutional neural network model, so it can evaluate the joint correlation between multiple substances and the sample classification more effectively than comparing substances one by one in isolation, and can therefore infer the sample-related metabolic profile more accurately.
Brief Description of the Drawings
Fig. 1 is a flowchart of the metabolic profile inference method provided by the present invention;
Fig. 2 is a flowchart of the dimensionality reduction conversion provided by the present invention;
Fig. 3 is a flowchart of the method for inferring the metabolite profile provided by the present invention;
Fig. 4 is a schematic diagram of the principle of the metabolic profile inference method provided by the present invention;
Fig. 5 is a block diagram of the metabolic profile inference system provided by the present invention;
Fig. 6 is a block diagram of the dimensionality reduction conversion module provided by the present invention;
Fig. 7 is a block diagram of the metabolic profile inference module provided by the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. It is stated here only that directional terms such as up, down, left, right, front, back, inside, and outside that appear or will appear in this text refer only to the accompanying drawings of the present invention and do not specifically limit the present invention.
As shown in Fig. 1, the present invention provides a metabolic profile inference method, comprising:
S101, subjecting target sample data to LC-MS processing to obtain LC-MS raw data;
S102, performing dimensionality reduction conversion on the LC-MS raw data to obtain a two-dimensional matrix, the two-dimensional matrix retaining the retention time, mass-to-charge ratio, and ion intensity of the LC-MS raw data;
S103, inputting the two-dimensional matrix into a convolutional neural network model to infer the metabolite profile of the target sample data.
In the present invention, a sample whose metabolic profile is to be inferred is first processed by LC-MS, that is, liquid chromatography-mass spectrometry, to obtain LC-MS raw data; the LC-MS raw data are then subjected to dimensionality reduction conversion to obtain a two-dimensional matrix; and finally the two-dimensional matrix is input into the convolutional neural network model to infer the metabolite profile of the sample.
With the present invention, the two-dimensional conversion of the LC-MS raw data effectively reduces the data size, which facilitates subsequent computation. Compared with the heavy signal loss caused by redundancy removal in existing methods, the two-dimensional conversion retains substance signals to the greatest extent. The present invention extracts features related to sample attributes from the final convolutional neural network model, so it can evaluate the joint correlation between multiple substances and the sample classification more effectively than comparing substances one by one in isolation, and can therefore infer the sample-related metabolic profile more accurately.
As shown in Fig. 2, preferably, the step of performing dimensionality reduction conversion on the LC-MS raw data to obtain a two-dimensional matrix comprises:
S201, converting the format of the LC-MS raw data;
The LC-MS raw data are converted into .mzML format data, but the format is not limited to this;
S202, setting a starting retention time, an ending retention time, a retention time interval, a retention time sampling interval, a starting mass-to-charge ratio, an ending mass-to-charge ratio, a mass-to-charge ratio interval, and a mass-to-charge ratio sampling interval.
The retention time interval is the range from the starting retention time to the ending retention time, and the mass-to-charge ratio interval is the range from the starting mass-to-charge ratio to the ending mass-to-charge ratio;
S203, within the retention time interval and the mass-to-charge ratio interval, using the retention time sampling interval and the mass-to-charge ratio sampling interval as a sliding window and sampling the maximum ion intensity within the retention time interval and the mass-to-charge ratio interval, to obtain a two-dimensional ion intensity matrix.
The two-dimensional ion intensity matrix is:
i(t, r) = max{intensity(t, r), ..., intensity(t, r + Rgap), ..., intensity(t + Tgap, r + Rgap)}, t ∈ (T0, Te), r ∈ (R0, Re),
where t is the retention time, r is the mass-to-charge ratio, intensity is the ion intensity, T0 is the starting retention time, Te is the ending retention time, Tgap is the retention time sampling interval, R0 is the starting mass-to-charge ratio, Re is the ending mass-to-charge ratio, and Rgap is the mass-to-charge ratio sampling interval.
It should be noted that, for data preprocessing, existing deep learning techniques obtain a two-dimensional mass spectrum for each time phase and carry out substance identification and subsequent analysis on the basis of these spectra. A mass spectrum contains only the mass-to-charge ratio and ion intensity of the substances; together with the time-phase label of each spectrum, the result is still a very large three-dimensional data set. Redundancy removal is therefore required, which discards a large amount of time-phase information, so that only a single class of substance, steroids, can be handled. The present invention instead reduces the raw three-dimensional data to a two-dimensional matrix that simultaneously retains the retention time, mass-to-charge ratio, and ion intensity of the raw data. The raw data obtained from LC-MS detection of a serum sample are a three-dimensional point cloud whose three dimensions are retention time, mass-to-charge ratio, and ion intensity. After dimensionality reduction conversion by the method of the present invention, two-dimensional matrix data are obtained with retention time and mass-to-charge ratio as the axes and ion intensity as the value. This effectively reduces the dimensionality of the raw data while retaining metabolite signals to the greatest extent.
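The conversion of steps S201 to S203 can be sketched as a max-pooling pass over the raw points. The following Python is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `to_intensity_matrix`, the representation of raw data as `(retention time, m/z, intensity)` tuples, the half-open treatment of the intervals, and the use of non-overlapping windows are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]  # (retention time, m/z, ion intensity)


def to_intensity_matrix(
    points: Iterable[Point],
    t_start: float, t_end: float, t_gap: float,   # T0, Te, Tgap
    r_start: float, r_end: float, r_gap: float,   # R0, Re, Rgap
) -> List[List[float]]:
    """Reduce (t, r, intensity) points to a 2D matrix of per-window maxima.

    Rows are retention-time windows of width t_gap starting at t_start;
    columns are m/z windows of width r_gap starting at r_start.
    Assumptions: intervals are half-open, [t_start, t_end) and
    [r_start, r_end), and their lengths are close to whole multiples
    of the sampling intervals.
    """
    n_rows = max(1, round((t_end - t_start) / t_gap))
    n_cols = max(1, round((r_end - r_start) / r_gap))
    matrix = [[0.0] * n_cols for _ in range(n_rows)]
    for t, r, intensity in points:
        if not (t_start <= t < t_end and r_start <= r < r_end):
            continue  # point lies outside the configured intervals
        row = min(int((t - t_start) // t_gap), n_rows - 1)
        col = min(int((r - r_start) // r_gap), n_cols - 1)
        # Keep only the strongest signal observed in this window.
        if intensity > matrix[row][col]:
            matrix[row][col] = intensity
    return matrix
```

Windows that contain no points stay at 0.0 in this sketch; whether empty windows should be zero or some other fill value is not stated in the text.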
As shown in Fig. 3, preferably, the step of inputting the two-dimensional matrix into the convolutional neural network model to infer the metabolite profile of the sample comprises:
S301, computing a class activation heat map from the convolutional neural network model to generate a class activation score for each sample.
It should be noted that the class activation score can be expressed as s(t, r), where t is the retention time and r is the mass-to-charge ratio;
S302, extracting mapping functions according to the network structure of the convolutional neural network model.
The mapping functions are t = map1(x) and r = map2(y), where t is the retention time and r is the mass-to-charge ratio;
S303, mapping the two-dimensional coordinates of the class activation heat map to retention time and mass-to-charge ratio according to the mapping functions.
Here, the two-dimensional coordinates (x, y) of the class activation heat map are mapped to the retention time (t) and the mass-to-charge ratio (r).
S304, filtering the class activation scores to obtain retained features;
S305, screening key metabolites according to the retained features, and performing correlation calculations to infer the metabolic markers and metabolic network pattern of the target sample data, thereby generating the metabolic profile of the target sample data.
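Step S301 does not say which class activation mapping variant is used. As one possible reading, the sketch below follows the classic CAM formulation, in which the heat map is the sum of the last convolutional layer's feature maps weighted by the classifier weights of the class of interest. The function name and the nested-list inputs are assumptions for illustration; in practice these values would come from the trained model.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def class_activation_map(
    feature_maps: Sequence[Matrix],
    class_weights: Sequence[float],
) -> Matrix:
    """Classic CAM: heatmap[x][y] = sum_k class_weights[k] * feature_maps[k][x][y].

    feature_maps holds one 2D map per channel of the last convolutional
    layer; class_weights holds the classifier weight of each channel for
    the class being explained (both are assumed inputs).
    """
    if len(feature_maps) != len(class_weights):
        raise ValueError("need exactly one weight per feature map")
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for x in range(height):
            for y in range(width):
                heatmap[x][y] += weight * fmap[x][y]
    return heatmap
```

Each heat map entry plays the role of a class activation score at coordinates (x, y), which steps S302 and S303 then translate into retention time and mass-to-charge ratio.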
Preferably, the step of filtering the class activation scores to obtain retained features comprises: filtering out molecular features whose class activation score is below a first preset threshold and whose ion intensity is below a second preset threshold, to obtain the retained features. Specifically, molecular features (t, r) whose class activation score s(t, r) is below the first threshold or whose ion intensity intensity(t, r) is below the second threshold are filtered out, to obtain the retained features [(t1, r1), (t2, r2), (t3, r3), ..., (tn, rn)].
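The filtering of step S304 can be sketched as follows. The sketch follows the more specific statement above, discarding a feature when either value falls below its threshold. The `Feature` layout, the function name, and the threshold values used in the usage lines are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Feature(NamedTuple):
    t: float          # retention time
    r: float          # mass-to-charge ratio
    score: float      # class activation score s(t, r)
    intensity: float  # ion intensity intensity(t, r)


def retain_features(
    features: Iterable[Feature],
    score_threshold: float,      # first preset threshold
    intensity_threshold: float,  # second preset threshold
) -> List[Tuple[float, float]]:
    """Return the (t, r) pairs that pass both thresholds."""
    return [
        (f.t, f.r)
        for f in features
        if f.score >= score_threshold and f.intensity >= intensity_threshold
    ]


# Usage with made-up values: only the first feature passes both thresholds.
candidates = [
    Feature(1.0, 150.0, 0.9, 5000.0),
    Feature(2.0, 200.0, 0.2, 8000.0),  # score too low
    Feature(3.0, 250.0, 0.8, 10.0),    # intensity too low
]
retained = retain_features(candidates, score_threshold=0.5, intensity_threshold=100.0)
```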
Specifically, existing deep learning techniques cover only a limited range of substances and can classify and extract only steroids. Traditional metabolomics techniques, in turn, require complex data preprocessing: mass spectral peaks are extracted to obtain the metabolite matrix of the sample, and the metabolite profile is then obtained in a statistics-driven manner. Based on the characteristics of LC-MS data, the present invention proposes mapping functions that map the sample features learned by supervised deep learning back to the attributes of the raw data (retention time and mass-to-charge ratio). For LC-MS data, retention time and mass-to-charge ratio are the labels by which a specific substance is identified. After sample features have been obtained with deep learning by computing the class activation heat map, the mapping functions can be used to infer the specific substances that make up those features, so that the marker metabolites and metabolic network pattern characteristic of the sample can be mined further and the metabolic profile of the sample inferred.
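The text does not give a closed form for map1 and map2. One simple possibility, assuming the heat map is a uniformly downsampled view of the input matrix, is a linear map from a heat map index to the centre of the input window it covers. `make_axis_map` and its parameters are illustrative names and are not taken from the source.

```python
from typing import Callable


def make_axis_map(
    origin: float,      # T0 or R0: value at the start of the axis
    gap: float,         # Tgap or Rgap: width of one input matrix cell
    input_size: int,    # number of input matrix cells along this axis
    heatmap_size: int,  # number of heat map cells along this axis
) -> Callable[[int], float]:
    """Build a function mapping a heat map index to an axis value.

    Assumption: each heat map cell covers input_size / heatmap_size
    consecutive input cells, and the returned value is the centre of
    that span.
    """
    scale = input_size / heatmap_size  # input cells per heat map cell

    def axis_map(index: int) -> float:
        if not 0 <= index < heatmap_size:
            raise IndexError("heat map index out of range")
        return origin + (index + 0.5) * scale * gap

    return axis_map


# Usage with made-up sizes: an 8 x 8 input matrix and a 4 x 2 heat map.
map1 = make_axis_map(origin=0.0, gap=2.0, input_size=8, heatmap_size=4)    # x -> t
map2 = make_axis_map(origin=100.0, gap=0.5, input_size=8, heatmap_size=2)  # y -> r
```

A network with padding or non-uniform strides would need a map derived from its actual layer geometry, which is presumably what extracting the functions from the network structure refers to.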
In addition, the steps of constructing and training the convolutional neural network model in advance include:
(1) obtaining a data set and dividing it into a training set, a validation set, and a test set, including data from different sources as an external test set, and using sample attributes as classification labels;
Here, all of the data sets are two-dimensional matrix data obtained by the two-dimensional matrix conversion.
(2) constructing an initial convolutional neural network model, and training it on the training set;
(3) evaluating the performance of the trained initial convolutional neural network model on the validation set and the test set, and, if the performance is poor, adjusting the model structure and hyperparameters and retraining;
(4) taking the trained model with the highest accuracy and robustness as the final convolutional neural network model.
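Step (1) above can be sketched as a seeded shuffle followed by a three-way split. The percentages, the function name, and the seed are assumptions; the text does not state the proportions used. The external test set is not produced by this split, since it comes from a different data source.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    samples: Sequence[T],
    train_pct: int = 70,  # assumed proportion, not from the source
    val_pct: int = 15,    # assumed proportion, not from the source
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle samples reproducibly and split into train, validation, test."""
    if train_pct < 0 or val_pct < 0 or train_pct + val_pct >= 100:
        raise ValueError("percentages must leave room for a test set")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test
```

Integer percentages are used so that the set sizes do not depend on floating-point rounding.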
In summary, as shown in Fig. 4, the metabolic profile inference method provided by the present invention takes LC-MS raw data directly as input, converts them by a method that retains the original signal to the greatest extent, classifies them with a convolutional neural network model, and extracts features from the classification model to obtain the distinct metabolite patterns of the different classes. With the present invention, the two-dimensional conversion of the LC-MS raw data effectively reduces the data size, which facilitates subsequent computation. Compared with the heavy signal loss caused by redundancy removal in existing methods, the two-dimensional conversion retains substance signals to the greatest extent. The present invention extracts features related to sample attributes from the final convolutional neural network model, so it can evaluate the joint correlation between multiple substances and the sample classification more effectively than comparing substances one by one in isolation, and can therefore infer the sample-related metabolic profile more accurately.
As shown in Fig. 5, the present invention also provides a metabolic profile inference system 100, comprising:
an LC-MS processing module 1, configured to subject target sample data to LC-MS processing to obtain LC-MS raw data;
a dimensionality reduction conversion module 2, configured to perform dimensionality reduction conversion on the LC-MS raw data to obtain a two-dimensional matrix, the two-dimensional matrix retaining the retention time, mass-to-charge ratio, and ion intensity of the LC-MS raw data;
a metabolic profile inference module 3, configured to input the two-dimensional matrix into a convolutional neural network model to infer the metabolite profile of the target sample data.
In the present invention, the LC-MS processing module 1 first subjects the sample whose metabolic profile is to be inferred to LC-MS processing to obtain LC-MS raw data; the dimensionality reduction conversion module 2 then performs dimensionality reduction conversion on the LC-MS raw data to obtain a two-dimensional matrix; and finally the metabolic profile inference module 3 inputs the two-dimensional matrix into the convolutional neural network model to infer the metabolite profile of the sample. With the present invention, the two-dimensional conversion of the LC-MS raw data effectively reduces the data size, which facilitates subsequent computation. Compared with the heavy signal loss caused by redundancy removal in existing methods, the two-dimensional conversion retains substance signals to the greatest extent. The present invention extracts features related to sample attributes from the final convolutional neural network model, so it can evaluate the joint correlation between multiple substances and the sample classification more effectively than comparing substances one by one in isolation, and can therefore infer the sample-related metabolic profile more accurately.
As shown in Fig. 6, the dimensionality reduction conversion module 2 comprises:
a format conversion unit 21, configured to convert the format of the LC-MS raw data;
a parameter setting unit 22, configured to set a starting retention time, an ending retention time, a retention time interval, a retention time sampling interval, a starting mass-to-charge ratio, an ending mass-to-charge ratio, a mass-to-charge ratio interval, and a mass-to-charge ratio sampling interval, wherein the retention time interval is the range from the starting retention time to the ending retention time, and the mass-to-charge ratio interval is the range from the starting mass-to-charge ratio to the ending mass-to-charge ratio;
a dimensionality reduction sampling unit 23, configured to, within the retention time interval and the mass-to-charge ratio interval, use the retention time sampling interval and the mass-to-charge ratio sampling interval as a sliding window and sample the maximum ion intensity within the retention time interval and the mass-to-charge ratio interval, to obtain a two-dimensional ion intensity matrix.
It should be noted that, for data preprocessing, existing deep learning techniques obtain a two-dimensional mass spectrum for each time phase and carry out substance identification and subsequent analysis on the basis of these spectra. A mass spectrum contains only the mass-to-charge ratio and ion intensity of the substances; together with the time-phase label of each spectrum, the result is still a very large three-dimensional data set. Redundancy removal is therefore required, which discards a large amount of time-phase information, so that only a single class of substance, steroids, can be handled. The present invention instead reduces the raw three-dimensional data to a two-dimensional matrix that simultaneously retains the retention time, mass-to-charge ratio, and ion intensity of the raw data. The raw data obtained from LC-MS detection of a serum sample are a three-dimensional point cloud whose three dimensions are retention time, mass-to-charge ratio, and ion intensity. After dimensionality reduction conversion by the method of the present invention, two-dimensional matrix data are obtained with retention time and mass-to-charge ratio as the axes and ion intensity as the value. This effectively reduces the dimensionality of the raw data while retaining metabolite signals to the greatest extent.
As shown in Figure 7, the metabolic profile inference module 3 includes:
a class activation score acquisition unit 31, configured to compute a class activation heat map from the convolutional neural network model and to generate a class activation score s(t, r) for each sample, where t is the retention time and r is the mass-to-charge ratio;
an extraction unit 32, configured to extract, according to the network structure of the convolutional neural network model, the mapping functions t = map1(x) and r = map2(y), where t is the retention time and r is the mass-to-charge ratio;
a mapping unit 33, configured to map the two-dimensional coordinates of the class activation heat map to retention time and mass-to-charge ratio according to the mapping functions;
a filtering unit 34, configured to filter the class activation scores to obtain retained features;
a calculation and inference unit 35, configured to screen key metabolites according to the retained features and to perform correlation calculations to infer the metabolic markers and metabolic network patterns of the target sample data, thereby generating the metabolic profile of the target sample data.
Further, the filtering unit is configured to filter out molecular features whose class activation score is less than a first preset threshold and whose ion intensity is less than a second preset threshold, so as to obtain the retained features.
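The filtering rule can be sketched as follows. The sketch reads the rule literally: a feature is discarded only when both its class activation score and its ion intensity fall below their thresholds. The `Feature` record and the threshold values are hypothetical; the application does not specify concrete thresholds.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """A candidate molecular feature (hypothetical record layout)."""
    rt: float         # retention time
    mz: float         # mass-to-charge ratio
    score: float      # class activation score s(t, r)
    intensity: float  # ion intensity


def filter_features(
    features: List[Feature],
    score_threshold: float,
    intensity_threshold: float,
) -> List[Feature]:
    """Drop features that are below BOTH thresholds; keep all others."""
    return [
        f for f in features
        if not (f.score < score_threshold and f.intensity < intensity_threshold)
    ]
```

Under this literal reading, a feature with a low score but a high intensity is still retained; a stricter variant would require both values to reach their thresholds.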
It should be noted that existing deep learning techniques cover only a limited range of substances and can only classify and extract steroids. Traditional metabolomics techniques, for their part, require a complex data preprocessing procedure: mass spectrum peaks are extracted to obtain the metabolite matrix of the sample, and the metabolite profile is then obtained in a statistics-driven manner. Based on the characteristics of LC-MS data, the present invention proposes mapping functions that map the sample features learned under supervision by the deep learning model back to the attributes of the raw data (retention time and mass-to-charge ratio). For LC-MS data, retention time and mass-to-charge ratio are the labels by which a specific substance is identified. After the present invention obtains sample features by computing the class activation heat map, the mapping functions can be used to infer the specific substances that make up those features, so that the metabolic markers and metabolic network patterns characteristic of the sample can be further mined and the metabolic profile of the sample inferred.
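As an illustration of the mapping functions t = map1(x) and r = map2(y), the following sketch assumes that the class activation heat map is a uniform downsampling of the input matrix along each axis, so that each heat map cell covers a fixed number of input bins, and that a cell is represented by the centre of the range it covers. The application states only that the mapping is extracted from the network structure; the linear form and all names below are assumptions.

```python
from typing import Callable


def make_axis_map(
    start: float, step: float, n_bins: int, n_cells: int
) -> Callable[[int], float]:
    """Build a map from a heat map index to a physical axis value.

    start, step: origin and sampling interval of the input matrix axis.
    n_bins:      number of bins of the input matrix along this axis.
    n_cells:     number of heat map cells along the same axis.
    """
    bins_per_cell = n_bins / n_cells

    def axis_map(index: int) -> float:
        # Centre of the input range covered by this heat map cell.
        return start + (index + 0.5) * bins_per_cell * step

    return axis_map


# Example: 8 retention time bins of 0.5 each, reduced to 4 heat map rows.
map1 = make_axis_map(start=0.0, step=0.5, n_bins=8, n_cells=4)
# Example: 4 m/z bins of 1.0 each starting at 100, reduced to 2 columns.
map2 = make_axis_map(start=100.0, step=1.0, n_bins=4, n_cells=2)
```

With such maps, each high-scoring heat map coordinate (x, y) yields a (t, r) pair that can be matched against retention time and mass-to-charge ratio values to suggest which substances contribute to the sample feature.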
In addition, the metabolic profile inference system 100 further includes a model building module, which includes:
a data set division unit, configured to acquire a data set and divide it into a training set, a validation set and a test set, to include data from different sources as an external test set, and to use sample attributes as classification labels;
wherein all of the data sets are two-dimensional matrix data obtained by the two-dimensional matrix conversion;
a construction and training unit, configured to construct an initial convolutional neural network model and to train the initial convolutional neural network model using the training set;
an evaluation unit, configured to evaluate the performance of the trained initial convolutional neural network model on the validation set and the test set and, if the performance is unsatisfactory, to retrain after adjusting the model structure and hyperparameters;
a screening unit, configured to take the trained initial convolutional neural network model with the highest accuracy and robustness as the final convolutional neural network model.
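The data set division step can be sketched as below. The split fractions, the fixed seed and the function name are hypothetical, since the application does not specify them, and the external test set from a different source is assumed to be kept entirely separate from this split.

```python
import random
from typing import List, Tuple


def split_indices(
    n_samples: int,
    train_frac: float = 0.6,
    val_frac: float = 0.2,
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle sample indices and split them into train/validation/test.

    The test set receives whatever remains after the training and
    validation portions have been taken.
    """
    indices = list(range(n_samples))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test
```

A purely random split is the simplest choice; when the sample attributes used as labels are imbalanced, a split stratified by label may be preferable.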
Correspondingly, the present invention also provides a computer device, including a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the instructions. The present invention also provides a storage medium storing computer instructions which, when executed by a processor, implement the steps of the above method.
In summary, the present invention takes LC-MS raw data directly as input, converts them by a method that retains the original signal to the greatest extent, classifies the converted data with a convolutional neural network model, and extracts features from the classification model to obtain the metabolite patterns that differ between classes. With the present invention, the two-dimensional conversion of the LC-MS raw data effectively reduces the data size, which facilitates subsequent computation. Compared with the large signal loss caused by de-redundancy in existing methods, the two-dimensional conversion of the LC-MS raw data retains the substance signals to the greatest extent. The present invention extracts features related to the sample attributes from the final convolutional neural network model and can therefore evaluate the joint correlation between multiple substances and the sample classification more effectively than comparing individual substances one by one in isolation, so that the sample-related metabolic profile can be inferred more accurately.
The above are preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present invention, and such improvements and refinements are also regarded as falling within the protection scope of the present invention.

Claims (10)

  1. A metabolic profile inference method, characterized by comprising:
    processing target sample data by LC-MS to obtain LC-MS raw data;
    performing dimensionality-reduction conversion on the LC-MS raw data to obtain a two-dimensional matrix, the two-dimensional matrix retaining the retention time, mass-to-charge ratio and ion intensity of the LC-MS raw data;
    inputting the two-dimensional matrix into a convolutional neural network model to infer the metabolite profile of the target sample data.
  2. The metabolic profile inference method according to claim 1, characterized in that the step of performing dimensionality-reduction conversion on the LC-MS raw data to obtain a two-dimensional matrix comprises:
    converting the format of the LC-MS raw data;
    setting a start retention time, an end retention time, a retention time interval, a retention time sampling interval, a start mass-to-charge ratio, an end mass-to-charge ratio, a mass-to-charge ratio interval and a mass-to-charge ratio sampling interval, wherein the retention time interval is the range from the start retention time to the end retention time, and the mass-to-charge ratio interval is the range from the start mass-to-charge ratio to the end mass-to-charge ratio;
    sampling, within the retention time interval and the mass-to-charge ratio interval, the maximum ion intensity in each window, using the retention time sampling interval and the mass-to-charge ratio sampling interval as the sliding window, so as to obtain a two-dimensional ion intensity matrix.
  3. The metabolic profile inference method according to claim 2, characterized in that the step of inputting the two-dimensional matrix into a convolutional neural network model to infer the metabolite profile of the sample comprises:
    computing a class activation heat map from the convolutional neural network model and generating a class activation score s(t, r) for each sample, where t is the retention time and r is the mass-to-charge ratio;
    extracting, according to the network structure of the convolutional neural network model, the mapping functions t = map1(x) and r = map2(y), where t is the retention time and r is the mass-to-charge ratio;
    mapping the two-dimensional coordinates of the class activation heat map to retention time and mass-to-charge ratio according to the mapping functions;
    filtering the class activation scores to obtain retained features;
    screening key metabolites according to the retained features, and performing correlation calculations to infer the metabolic markers and metabolic network patterns of the target sample data, thereby generating the metabolic profile of the target sample data.
  4. The metabolic profile inference method according to claim 3, characterized in that the step of filtering the class activation scores to obtain retained features comprises:
    filtering out molecular features whose class activation score is less than a first preset threshold and whose ion intensity is less than a second preset threshold, so as to obtain the retained features.
  5. A metabolic profile inference system, characterized by comprising:
    an LC-MS processing module, configured to process target sample data by LC-MS to obtain LC-MS raw data;
    a dimensionality-reduction conversion module, configured to perform dimensionality-reduction conversion on the LC-MS raw data to obtain a two-dimensional matrix, the two-dimensional matrix retaining the retention time, mass-to-charge ratio and ion intensity of the LC-MS raw data;
    a metabolic profile inference module, configured to input the two-dimensional matrix into a convolutional neural network model to infer the metabolite profile of the target sample data.
  6. The metabolic profile inference system according to claim 5, characterized in that the dimensionality-reduction conversion module comprises:
    a format conversion unit, configured to convert the format of the LC-MS raw data;
    a parameter setting unit, configured to set a start retention time, an end retention time, a retention time interval, a retention time sampling interval, a start mass-to-charge ratio, an end mass-to-charge ratio, a mass-to-charge ratio interval and a mass-to-charge ratio sampling interval, wherein the retention time interval is the range from the start retention time to the end retention time, and the mass-to-charge ratio interval is the range from the start mass-to-charge ratio to the end mass-to-charge ratio;
    a dimensionality-reduction sampling unit, configured to sample, within the retention time interval and the mass-to-charge ratio interval, the maximum ion intensity in each window, using the retention time sampling interval and the mass-to-charge ratio sampling interval as the sliding window, so as to obtain a two-dimensional ion intensity matrix.
  7. The metabolic profile inference system according to claim 6, characterized in that the metabolic profile inference module comprises:
    a class activation score acquisition unit, configured to compute a class activation heat map from the convolutional neural network model and to generate a class activation score s(t, r) for each sample, where t is the retention time and r is the mass-to-charge ratio;
    an extraction unit, configured to extract, according to the network structure of the convolutional neural network model, the mapping functions t = map1(x) and r = map2(y), where t is the retention time and r is the mass-to-charge ratio;
    a mapping unit, configured to map the two-dimensional coordinates of the class activation heat map to retention time and mass-to-charge ratio according to the mapping functions;
    a filtering unit, configured to filter the class activation scores to obtain retained features;
    a calculation and inference unit, configured to screen key metabolites according to the retained features and to perform correlation calculations to infer the metabolic markers and metabolic network patterns of the target sample data, thereby generating the metabolic profile of the target sample data.
  8. The metabolic profile inference system according to claim 7, characterized in that the filtering unit is configured to filter out molecular features whose class activation score is less than a first preset threshold and whose ion intensity is less than a second preset threshold, so as to obtain the retained features.
  9. A computer device, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1-4 when executing the instructions.
  10. A storage medium storing computer instructions, characterized in that the computer instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-4.
PCT/CN2021/102060 2021-06-24 2021-06-24 Metabolic characteristic spectrum inference method and system, and computer device and storage medium WO2022266928A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/102060 WO2022266928A1 (en) 2021-06-24 2021-06-24 Metabolic characteristic spectrum inference method and system, and computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/102060 WO2022266928A1 (en) 2021-06-24 2021-06-24 Metabolic characteristic spectrum inference method and system, and computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2022266928A1 true WO2022266928A1 (en) 2022-12-29

Family

ID=84545075

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102060 WO2022266928A1 (en) 2021-06-24 2021-06-24 Metabolic characteristic spectrum inference method and system, and computer device and storage medium

Country Status (1)

Country Link
WO (1) WO2022266928A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116519830A (en) * 2023-04-11 2023-08-01 深圳爱湾智造科技有限公司 Genetic metabolic disease screening method, system and device based on gas chromatograph-mass spectrometer

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080234948A1 (en) * 2005-07-08 2008-09-25 Metanomics Gmbh System and Method for Characterizing a Chemical Sample
CN103616450A (en) * 2013-11-29 2014-03-05 湖州市中心医院 Serum specificity metabolite spectrum for patient with lung cancer, and building method thereof
US20170206464A1 (en) * 2016-01-14 2017-07-20 Preferred Networks, Inc. Time series data adaptation and sensor fusion systems, methods, and apparatus
CN110579554A (en) * 2018-06-08 2019-12-17 萨默费尼根有限公司 3D mass spectrometric predictive classification
CN111896609A (en) * 2020-07-21 2020-11-06 上海交通大学 Method for analyzing mass spectrum data based on artificial intelligence

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21946424

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21946424

Country of ref document: EP

Kind code of ref document: A1