EP2541585A1 - Computer-assisted structure identification - Google Patents

Info

Publication number
EP2541585A1
EP2541585A1 EP11005180A EP11005180A EP2541585A1 EP 2541585 A1 EP2541585 A1 EP 2541585A1 EP 11005180 A EP11005180 A EP 11005180A EP 11005180 A EP11005180 A EP 11005180A EP 2541585 A1 EP2541585 A1 EP 2541585A1
Authority
EP
European Patent Office
Prior art keywords
compounds
candidate
compound
tof
molecular
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP11005180A
Other languages
English (en)
French (fr)
Inventor
The designation of the inventor has not yet been filed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Philip Morris Products SA
Original Assignee
Philip Morris Products SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Philip Morris Products SA
Priority to EP11005180A (EP2541585A1)
Priority to US14/114,240 (US20140297201A1)
Priority to CN201280032300.7A (CN103650100A)
Priority to PCT/EP2012/057942 (WO2012146787A1)
Priority to EP12717751.7A (EP2710621A1)
Publication of EP2541585A1
Legal status: Ceased

Classifications

    • H ELECTRICITY
    • H01 ELECTRIC ELEMENTS
    • H01J ELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
    • H01J49/00 Particle spectrometers or separator tubes
    • H01J49/0027 Methods for using particle spectrometers
    • H01J49/0036 Step by step routines describing the handling of the data generated during a measurement

Definitions

  • the present invention relates to an automated, computer-assisted method for identifying compounds according to mass spectral and chromatographic data obtained from a sample.
  • the invention relates to methods for identifying compounds using two dimensional gas chromatography-mass spectrometry (GCxGC-MS), and processes for automating the interpretation of the mass spectral and chromatographic data obtained from such a method.
  • GCxGC-MS two dimensional gas chromatography-mass spectrometry
  • Mass spectrometry is an analytical tool that can be used to determine the molecular weights of chemical compounds and of their fragments by detecting the ionized compounds and fragments according to their mass-to-charge ratio (m/z).
  • the molecular ions are generated by inducing either a loss or a gain of a charge by the chemical compounds, such as via electron ejection, protonation, or deprotonation.
  • the fragment ions are generated by collision-induced or energy-induced dissociation.
  • the resulting data are usually presented as a spectrum, a plot with m/z ratio on the x-axis and abundance of ions on the y-axis. Thus, this spectrum shows the distribution of m/z values in the population of ions being analyzed. This distribution is characteristic for a given compound. Therefore, if the sample is a pure compound or contains only a few compounds, mass spectrometry can reveal the identity of the compound(s) in the sample.
  • a complex sample usually contains too many chemical compounds to be analyzed meaningfully by mass spectrometry alone, because ionization of different chemical compounds may result in ions with the same m/z value.
  • LC liquid chromatography
  • GC gas chromatography
  • capillary electrophoresis
  • gas chromatography is advantageously coupled with mass spectrometry (GC-MS).
  • GC-MS gas chromatography-mass spectrometry
  • the chemical compounds in the sample are separated based on how long they stay in the sample separation system (column).
  • when a chemical compound exits the sample separation system, it enters a mass spectrometer system, and the ionization/ion separation/detection process begins as described above.
  • the time it remains in the sample separation system before it produces signal(s) in the mass spectrum is a function of its structure and is referred to as the retention time (RT).
  • retention time is also specific to the instrument being used, and especially the column specifications in a gas chromatograph.
  • RTs of the same sample measured later may not match the RTs specified in the original chromatographic method or the computerized method files (including calibration and event tables) and can lead to misidentified peaks.
  • One solution is the "relative retention” approach which utilizes retention indices (RI) or Kovats indices (KI) that circumvent problems associated with discrepancies in RT due to instrument-to-instrument or column-to-column variation.
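  • As an illustration of the relative retention idea, a retention index in temperature-programmed GC is commonly obtained by linear interpolation between the two n-alkanes that bracket the analyte (the van den Dool and Kratz form). The text does not state which formulation is used, so the expression below is offered only as the commonly used definition, not as the method of this document:

```latex
I_x = 100\left[\, n + \frac{t_x - t_n}{t_{n+1} - t_n} \,\right]
```

  • Here t_x is the retention time of the analyte, and t_n and t_{n+1} are the retention times of the n-alkanes with n and n+1 carbon atoms that elute immediately before and after it. Because the index depends only on the position of the analyte between two references measured on the same system, it is less sensitive to instrument-to-instrument or column-to-column variation than the absolute RT.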
  • Methods to predict Kovats indices (KI) based on molecular structure and associated features are known in the art. Models which predict KI based on such factors are known as Quantitative Structure-Property Relationship (QSPR) models.
  • QSPR Quantitative Structure-Property Relationship
  • a "second dimension" of GC can be added, for instance by coupling the GC column to a second GC column (often referred to as 2DGC-MS or GCxGC-MS, and used interchangeably here with the terms GCxGC-TOF or GCxGC-TOF-MS).
  • Peaks of interest are diverted from the first column into the second column for further separation, which then feeds into the mass spectrometry system.
  • GCxGC-MS relies on structural correlation with compound libraries to make identifications of unknown compounds.
  • the libraries of compounds most widely used for structural identification, such as the NIST library, contain retention index information for only 9% of the compounds having mass spectral data.
  • RI or KI data allows structural assignments derived from comparison with library data to be refined.
  • in order to achieve an acceptable level of confidence in the identification of an unknown compound, the assignment must be interpreted by the user and compared to a reference standard by mass spectrometry to confirm the proposed structure.
  • This approach has a number of disadvantages, including the need to repeat the process manually, which is inefficient; the limited size of Kovats Indices libraries; the lack of standardization, due to the need for manual intervention; all of which leads to reduced levels of confidence in the identification process.
  • mass spectral data generated by gas chromatography-electron impact ionization-mass spectrometry are compared with commercially available mass spectral data libraries ( Figure 1 ).
  • the identification has only a low confidence level.
  • a manual verification and interpretation of the mass spectral library search is carried out and the experimental retention time, or the Kovats index, is compared to database entries (e.g., NIST Retention Index library).
  • a method for analysing mass spectral data obtained from a sample in two dimensional gas chromatography-mass spectrometry comprising:
  • an analytical property score is derived from the predicted value of the analytical property of a candidate compound and a measured value of the analyte.
  • the measured value of the analytical property for the analyte can be the spectral similarity value as determined by algorithms in the software provided by NIST.
  • the predicted value of an analytical property of a candidate compound is calculated according to a quantitative model based on a plurality of molecular descriptors. Accordingly, in one embodiment, the quantitative model of step (c) can be established by:
  • the genetic algorithm used in step (iv) preferably comprises
  • Candidate solutions generated by different machine learning algorithms can be compared to identify the best performing solutions.
  • the establishment of a quantitative model for one or more analytical properties is performed at least once for a particular setup of a GCxGC-MS separation system (e.g., after a change of column specification, temperature profile, or mobile phase) or mass spectrometry system. After the quantitative models have been established for an experimental setup, it is not necessary to re-establish them each time data of an analyte generated by this particular setup are analyzed.
  • Exp_p measured value of the property obtained by experiments
  • pre_p predicted value of the property
  • the SEP is calculated according to the following formula, using the STEYX function of Microsoft Excel 2003: $$\mathrm{SEP} = \sqrt{\frac{1}{n-2}\left[\sum (y-\bar{y})^{2} - \frac{\left[\sum (x-\bar{x})(y-\bar{y})\right]^{2}}{\sum (x-\bar{x})^{2}}\right]}$$ where x is a value of a sample, y is the predicted value of x for the sample and n is the number of samples.
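  • A minimal sketch of the SEP calculation in Python, assuming the quantity meant is the standard error of the regression as returned by Excel's STEYX. The function name `standard_error_of_prediction` is illustrative and does not come from the source.

```python
import math


def standard_error_of_prediction(x, y):
    """Standard error of the regression of y on x (the quantity Excel's STEYX returns).

    x: measured values, y: predicted values, one pair per sample.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Residual sum of squares around the fitted line; clamp tiny negative
    # values that can arise from floating-point rounding.
    residual = max(0.0, syy - sxy ** 2 / sxx)
    return math.sqrt(residual / (n - 2))
```

  • A perfectly linear relationship between measured and predicted values gives an SEP of zero; scatter around the regression line increases it.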
  • a spectral similarity value obtained from mass spectral database comparison can be used to generate a numerical value, wherein the spectral similarity value and the analytical property score(s) are combined.
  • This numerical value is referred to herein as a match score, also referred to as the computer-assisted structure identification (CASI) score in the figures.
  • the match score is calculated using a hyperbolic equation.
  • the concept of the present invention differs from those used in currently available methods, in which analytical property values are used as a filter to select or deselect candidate compounds.
  • the highest and second-highest match scores can be compared by dividing the highest score by the second-highest to generate a discrimination function, where a greater difference between the two scores generates a higher discrimination function.
  • the higher the discrimination function the higher the confidence score that can be assigned to each query.
  • a confidence score can be calculated by multiplying the highest match score by the discrimination function value.
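  • The discrimination function and confidence score described above can be sketched as follows. This is a minimal illustration; the function name and the handling of edge cases (fewer than two candidates, a non-positive second score) are assumptions, not part of the source.

```python
def discrimination_and_confidence(match_scores):
    """Return (discrimination, confidence) for a list of candidate match scores.

    discrimination = highest score / second-highest score
    confidence     = highest score * discrimination
    """
    ranked = sorted(match_scores, reverse=True)
    if len(ranked) < 2:
        raise ValueError("at least two candidates are needed")
    best, second = ranked[0], ranked[1]
    if second <= 0:
        # Assumption: the ratio is undefined when the runner-up scores zero.
        raise ValueError("second-highest score must be positive")
    discrimination = best / second
    return discrimination, best * discrimination
```

  • For example, scores of 0.9, 0.45 and 0.3 give a discrimination of 2 and a confidence of 1.8, whereas scores of 0.9 and 0.85 give a discrimination close to 1, reflecting that the top two candidates are hard to tell apart.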
  • step (c) comprises predicting values of multiple analytical properties for each candidate compound.
  • a match score is derived from the spectral similarity obtained from the mass spectral database comparison, and a function of at least two analytical properties derived using a plurality of molecular descriptors.
  • a match score is derived from the spectral similarity value obtained from the mass spectral database comparison, and an analytical property score wherein the analytical property is the relative second dimension retention time derived by using a plurality of molecular descriptors.
  • Preferred analytical properties useful in the present invention include a Kovats index, a boiling point and a relative second dimension retention time (2D rel RT) index. If the predicted analytical properties used in the method of the invention comprise a Kovats index and a rel 2D RT, the Kovats Index and relative 2D retention times are preferably calculated using different molecular descriptors. Preferably, all three preferred analytical properties are used.
  • the Kovats indices of compounds are predicted using a linear equation comprising a plurality of coefficients, each multiplied by the value of a molecular descriptor.
  • the equation is preferably obtained by using a test data set and a genetic algorithm to select the molecular descriptors from a plurality of possible molecular descriptors, and a linear regression or k nearest neighbors learning algorithm to correlate the selected molecular descriptors with the value to predict.
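  • A sketch of how such a linear equation is evaluated once the descriptors and coefficients have been selected. The descriptor names, the coefficient values in the usage note and the optional intercept term are hypothetical; the source states only that each coefficient is multiplied by a descriptor value.

```python
def predict_kovats_index(descriptors, coefficients, intercept=0.0):
    """Evaluate a linear model: intercept + sum(coefficient * descriptor value).

    descriptors:  dict mapping descriptor name to its value for one compound
    coefficients: dict mapping the selected descriptor names to fitted coefficients
    intercept:    optional constant term (an assumption, not stated in the text)
    """
    missing = [name for name in coefficients if name not in descriptors]
    if missing:
        raise KeyError(f"descriptor values missing for: {missing}")
    return intercept + sum(
        coefficient * descriptors[name] for name, coefficient in coefficients.items()
    )
```

  • With hypothetical coefficients {"d1": 2.0, "d2": -1.0}, an intercept of 10 and descriptor values d1 = 3 and d2 = 4, the predicted value is 10 + 6 - 4 = 12. Descriptors that were computed but not selected are simply ignored.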
  • the boiling points of compounds can be predicted based on experimentally determined Kovats Indices.
  • the boiling points of candidate compounds are calculated on the basis of their individual chemical structures using software packages known in the art, such as but not limited to ACD/PhysChem from ACD/Labs (Toronto, Canada).
  • the second dimension retention times are absolute second dimension retention times and there is no known available method for calculating relative 2D retention times.
  • the challenge for developing a relative model is to define a reference system that is accessible for all second dimension peaks.
  • This problem is solved by referring to a reference system based on a function of hypothetical deuterated n-alkanes.
  • Deuterated or isotopically labelled compounds are used in a reference system for controlling retention times or internal standard-based quantification.
  • the n-alkanes are preferably used as a class of substances for generating a hypothetic 2D-RT reference system because this class of compounds does not have any known complex interaction with the stationary phase in the column of the second dimension separation system.
  • the first dimension of the GCxGC-MS is separated in a non-polar environment and the second dimension is separated in a polar environment.
  • a relative second dimension retention time of a compound is advantageously calculated as a retention time relative to a hypothetical n-alkane, whose first dimension retention time is derived from the regression function based on a series of deuterated n-alkane reference standards.
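  • One possible reading of this calculation is sketched below. It assumes (i) a straight-line regression of second dimension RT against first dimension RT for the deuterated n-alkane standards, (ii) a hypothetical n-alkane eluting at the same first dimension RT as the analyte, and (iii) a relative value formed as a simple ratio. None of these three choices is confirmed by the text, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def relative_2d_rt(analyte_rt1, analyte_rt2, alkane_standards):
    """Relative second dimension RT of an analyte.

    alkane_standards: list of (first_dimension_rt, second_dimension_rt) pairs
    measured for the deuterated n-alkane reference standards.
    """
    rt1_values = [rt1 for rt1, _ in alkane_standards]
    rt2_values = [rt2 for _, rt2 in alkane_standards]
    slope, intercept = fit_line(rt1_values, rt2_values)
    # Second dimension RT of a hypothetical n-alkane at the analyte's 1D RT.
    hypothetical_rt2 = slope * analyte_rt1 + intercept
    # Assumption: "relative" means the ratio of the two second dimension RTs.
    return analyte_rt2 / hypothetical_rt2
```

  • Under these assumptions a value above 1 indicates that the analyte is retained longer on the second column than an n-alkane with the same first dimension RT would be.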
  • a method for calculating a relative second dimension retention time in GCxGC-MS (2-dimensional gas chromatography coupled to mass spectrometry) for a compound comprising the steps of:
  • the quantitative model of relative second dimension retention time is established by:
  • the genetic algorithm used in this aspect of the invention comprises:
  • the relative second dimension retention times used in the first aspect of the invention are predicted by the method of the second aspect of the invention.
  • the results obtained from the computer-assisted methods of the invention based on chromatographic and mass spectral data generated by GCxGC-MS can be further enhanced by using the accurate mass data obtained from gas chromatograph-atmospheric pressure chemical ionization-mass spectrometry (GC-APCI-MS).
  • GC-APCI-MS gas chromatograph-atmospheric pressure chemical ionization-mass spectrometry
  • Data generated by the two techniques can be matched by using a duplicate retention index system based on an additional reference system of deuterated fatty acid methyl esters.
  • the invention provides methods for confirming the match of a test compound to a candidate compound identified in a database of two-dimension gas chromatography mass spectrometry.
  • the methods comprise analysis of the same sample by gas chromatography coupled with atmospheric pressure chemical ionization and time-of-flight mass spectrometry (GC-APCI-TOF-MS, GC-APCI-TOF, or GC-APCI-MS) and comparing the theoretical monoisotopic mass with the accurate mass measured by GC-APCI-TOF-MS.
  • the prerequisite for the confirmatory method is to match the retention indices of the two different chromatographic systems.
  • FAMEs deuterated fatty acid methyl esters
  • the Kovats index systems are established by generation of a Kovats index system for the GCxGC-TOF-MS system based on deuterated n-alkanes; analysis of deuterated FAMEs using the GCxGC-TOF-MS system and determination of the Kovats indices of the FAMEs; analysis of deuterated FAMEs using the GC-APCI-TOF-MS system and generation of a retention index system for the GC-APCI-TOF-MS system based on deuterated FAMEs; and bridging of the retention index system for the GC-APCI-TOF-MS system based on deuterated FAMEs with the Kovats index system based on n-alkanes by using the Kovats indices of the deuterated FAMEs for the GCxGC-TOF-MS system.
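  • The bridging step can be pictured as a piecewise linear map from a GC-APCI-TOF-MS retention time to the n-alkane based Kovats scale, anchored at the deuterated FAMEs. The sketch below assumes each FAME contributes one (retention time on the GC-APCI system, Kovats index determined on the GCxGC system) pair and interpolates linearly between adjacent anchors; the function name and the refusal to extrapolate are assumptions.

```python
from bisect import bisect_right


def bridge_to_kovats(apci_rt, fame_anchors):
    """Map a GC-APCI retention time onto the Kovats index scale.

    fame_anchors: list of (apci_retention_time, kovats_index) pairs, one per
    deuterated FAME, where the Kovats index was determined on the GCxGC system.
    """
    anchors = sorted(fame_anchors)
    if len(anchors) < 2:
        raise ValueError("at least two FAME anchors are needed")
    times = [t for t, _ in anchors]
    if not times[0] <= apci_rt <= times[-1]:
        # Assumption: no extrapolation outside the range covered by the FAMEs.
        raise ValueError("retention time outside the range of the reference FAMEs")
    # Index of the anchor at or just before apci_rt (last interval for the end point).
    i = min(bisect_right(times, apci_rt) - 1, len(anchors) - 2)
    (t0, k0), (t1, k1) = anchors[i], anchors[i + 1]
    return k0 + (k1 - k0) * (apci_rt - t0) / (t1 - t0)
```

  • An analyte detected on the GC-APCI system can then be compared with GCxGC candidates whose Kovats indices fall within a chosen window around the bridged value.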
  • the invention provides methods comprising the steps of:
  • step (d) is derived by linear regression for each retention time range where an analyte is detected between two adjacent reference compounds of the second set of reference compounds.
  • the method further comprises comparing the molecular masses of the analytes with the molecular masses of the respective candidate compounds for each of the analytes.
  • the method further comprises:
  • the first set of reference compounds comprises deuterated n-alkanes.
  • the second set of reference compounds comprises deuterated fatty acid methyl esters.
  • a high-throughput computer-assisted system for analyzing GCxGC-MS data referred to as Computer-Assisted Structure Identification (CASI) is provided in this invention.
  • the CASI system accelerates and standardizes the identification of compound structures, whilst assuring the reproducibility, and enables higher confidence for correct assignment of mass spectra to the right compounds.
  • the concept of CASI is based on several steps of spectral searches and their matches to the parameters that are predicted on-the-fly.
  • mass spectra are searched for candidate compounds and their associated match factors using the National Institute of Standards and Technology (NIST, Gaithersburg, MD, USA) MS Search algorithm in the NIST 08 and WILEY 9th ed. Mass Spectra databases.
  • Two analytical properties, Kovats indices for first dimension (1D) separation and relative retention times for second dimension (2D) separation, are predicted by using these models.
  • the Kovats indices and relative 2D RT are calculated using different molecular descriptors.
  • the boiling points of compounds are derived from the measured 1D RT of an analyte and are matched to computationally predicted boiling points of the candidate compounds.
  • the boiling points are calculated by software known in the art, such as ACD/PhysChem software.
  • the CASI system combines the matching results of NIST MS search and all parameters predicted in QSPR models to produce a match score, also referred to as a CASI score ( Figure 2 ).
  • the discriminatory power is calculated for each identified compound to measure confidence of the assignment.
  • the proposed chemical structure is confirmed by GC-APCI-TOF.
  • salts are stripped from the compounds' structures using a predefined list, largest fragments are kept, bases are deprotonated and acids are protonated, charges of functional groups are standardized, hydrogens are added, canonical tautomers are generated, and 2D coordinates are generated. Then the duplicate structures are removed.
  • RapidMiner 5 (Rapid-I GmbH, Dortmund, Germany).
  • Other similar data mining software platforms known in the art can also be used.
  • Several molecular descriptor selection experiments using forward selection and a genetic algorithm were tried. The performance of forward selection is acceptable, but this method has the drawback of becoming trapped in local minima. Stochastic methods like genetic algorithms generally perform better. For this reason, genetic algorithms are used to select molecular descriptors.
  • each chromosome contains a predefined number of "genes", and each gene codes for a descriptor. Generally, between 2 and 15 descriptors are selected. The genes are not binary, but contain the position of the corresponding descriptor in a list. This allows a minimum number of descriptors to be used.
  • the fitness function sets the subset of descriptors in the "Select Attributes" nodes of the RapidMiner process, executes it, and takes the root mean squared error of the training set as the fitness score. The mutation rate was set to 0.1, the number of chromosomes per generation was set to 20 to 40, preferably 30, and the number of generations was set to 100 to 300, preferably 200. The two best chromosomes survive at each generation.
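  • A compact sketch of a genetic algorithm with the stated settings (integer genes holding descriptor positions, mutation rate 0.1, 30 chromosomes, 200 generations, two survivors per generation). The parent selection scheme and the one-point crossover with duplicate repair are assumptions, since the text does not specify them, and `fitness` stands in for the RapidMiner process that returns the training-set RMSE.

```python
import random


def select_descriptors(n_descriptors, n_genes, fitness, population_size=30,
                       generations=200, mutation_rate=0.1, elite=2, seed=0):
    """Return the best chromosome found: a list of n_genes distinct descriptor positions.

    fitness: callable taking a chromosome and returning an error (lower is better).
    """
    rng = random.Random(seed)

    def crossover(a, b):
        # One-point crossover; genes already taken from `a` are skipped in `b`
        # so that a chromosome never lists the same descriptor twice.
        cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
        head = a[:cut]
        return (head + [g for g in b if g not in head])[:n_genes]

    def mutate(chromosome):
        mutated = list(chromosome)
        for i in range(len(mutated)):
            if rng.random() < mutation_rate:
                unused = [d for d in range(n_descriptors) if d not in mutated]
                if unused:
                    mutated[i] = rng.choice(unused)
        return mutated

    population = [rng.sample(range(n_descriptors), n_genes)
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[:elite]            # the two best chromosomes survive
        parents = population[:max(2, population_size // 2)]
        children = []
        while len(survivors) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = survivors + children
    return min(population, key=fitness)
```

  • In practice the fitness evaluation is the expensive part, since each call trains a model, so caching fitness values per chromosome would be a natural refinement of this sketch.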
  • data preparation consists of a node which selects a subset of attributes, normalization with Z-transformation, and separation of the data set into a training set (75%) and a test set (25%). Then a linear regression is applied on the training set, and the learned model is applied on both the training set and the test set. In addition, leave-one-out cross validation on the training set was carried out.
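  • The preparation and validation steps can be sketched with the standard library alone. For brevity the sketch uses a single descriptor and a straight-line fit in place of the multi-descriptor regression run in RapidMiner, and it uses the sample standard deviation for the Z-transformation; both are assumptions, as are the function names.

```python
import math
import random


def z_transform(column):
    """Scale a list of values to zero mean and unit (sample) standard deviation."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (len(column) - 1))
    return [(v - mean) / sd for v in column]


def split_train_test(rows, train_fraction=0.75, seed=0):
    """Shuffle rows and split them into a training set and a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(xs, ys):
    """Least-squares line; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_rmse(xs, ys):
    """Refit the line with each sample held out once and return the RMSE of those predictions."""
    errors = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(slope * xs[i] + intercept - ys[i])
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

  • Comparing the leave-one-out RMSE with the RMSE on the held-out 25% gives two independent indications of whether a model selected on the training set generalizes.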
  • Various different learning algorithms are used to build the models for prediction of KI and relative second dimension retention time.
  • Various learning algorithms were used, such as but not limited to k-Nearest Neighbors (k-NN), Multi Linear Regression (MLR) and Support Vector Regression (SVR). For each learning algorithm, from 2 to 15 descriptors were used to generate the models. At the end of the modeling run, the best model is kept for each value to predict. This process is described in Figure 3 .
  • Table 6. Descriptors used in the GCxGC-TOF second column retention time model (descriptor: description)
    • AMW: Average molecular weight.
    • MSD: Mean square distance index (Balaban).
    • BLI: Kier benzene-likeness index.
    • PW5: Path/walk 5 - Randic shape index.
    • ICR: Radial centric information index.
    • piPC04: Molecular multiple path count of order 4.
    • X0Av: Average valence connectivity index chi-0.
    • AAC: Mean information index on atomic composition.
    • ATS5m: Broto-Moreau autocorrelation of a topological structure - lag 5 / weighted by atomic masses.
    • GATS2v: Geary autocorrelation - lag 2 / weighted by atomic van der Waals volumes.
    • BEHe1: Highest eigenvalue n. 1 of Burden matrix / weighted by atomic Sanderson electronegativities.
    • F06[Si-Si]: Frequency of Si-Si at topological distance 6.
    • F09[C-O]: Frequency of C-O at topological distance 9.
    • F10[C-Si]: Frequency of C-Si at topological distance 10.
  • Scores are calculated from the spectral similarity value (in this example, the NIST MS Search match factor), the predicted KI, the predicted relative second dimension retention time of the GCxGC-TOF and the predicted boiling point, using a hyperbolic equation.
  • the general principle is based on the similarity of the experimental MS to the library MS, multiplied by the analytical property scores derived from each analytical property (KI, BP).
  • the analytical property scores (KIFIT, BPFIT%) are normalized from 0 (no similarity) to 1 (perfect match).
  • the candidate compounds are ranked according to decreasing CASI scores.
  • CASI score is calculated according to the above-described equation. The hit with the highest value is selected by default.
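  • The multiplicative principle and the ranking step can be sketched as follows. The hyperbolic equation itself is not reproduced in this text, so the sketch takes the already normalized analytical property scores (0 to 1) as inputs; the function names and the candidate tuple layout are illustrative assumptions.

```python
def casi_score(match_factor, property_scores):
    """Spectral similarity multiplied by each normalized analytical property score."""
    score = float(match_factor)
    for fit in property_scores:
        if not 0.0 <= fit <= 1.0:
            raise ValueError("analytical property scores must lie between 0 and 1")
        score *= fit
    return score


def rank_candidates(candidates):
    """Rank candidates by decreasing score.

    candidates: list of (name, match_factor, [property scores]) tuples.
    Returns a list of (name, score), best first; the first entry is the default hit.
    """
    scored = [(name, casi_score(match_factor, fits))
              for name, match_factor, fits in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

  • Because the property scores multiply the match factor rather than filter on it, a candidate with a slightly lower spectral similarity but well-matched predicted properties can overtake a candidate with the best spectral match and poorly matched properties.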
  • each of the three analytical property scores has four parameters. However, only n_x, which defines the value at which the hyperbolic curve crosses the x-axis, has to be established. n_x contributes to the shape of the hyperbolic curve, and thus to the weight of each analytical property score in the final CASI score.
  • a grid search procedure is provided to establish optimal values for n_KI, n_2DrelRT and n_BP.
  • a solution's score is generated by using every possible combination of integer values between 1 and 50 for each of n_KI, n_2DrelRT and n_BP.
  • the solution's score is the number of correct hits sorted first for the training set and test set. The solution with the highest number of correct hits is selected. The algorithm can be described as follows:
  • the n_KI, n_2DrelRT and n_BP parameters will be used in the final validation step of the configuration in CASI.
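  • A sketch of such a grid search. The exact hyperbolic equation is not given in this text, so `hyperbolic_fit` below is a hypothetical stand-in chosen only because it equals 1 for a perfect match and reaches 0 where the deviation equals n_x; the data layout (`queries`, `candidates`, `deviations`, `truth`) is likewise assumed for illustration.

```python
from itertools import product


def hyperbolic_fit(deviation, n_x):
    """Hypothetical property score: 1 at zero deviation, 0 at or beyond n_x."""
    d = abs(deviation)
    return max(0.0, (n_x - d) / (n_x + d))


def count_correct_first(queries, n_values):
    """Number of queries whose known compound receives the highest score."""
    correct = 0
    for query in queries:
        best_name, best_score = None, -1.0
        for candidate in query["candidates"]:
            score = candidate["match_factor"]
            for deviation, n_x in zip(candidate["deviations"], n_values):
                score *= hyperbolic_fit(deviation, n_x)
            if score > best_score:
                best_name, best_score = candidate["name"], score
        if best_name == query["truth"]:
            correct += 1
    return correct


def grid_search(queries, n_range=range(1, 51), n_properties=3):
    """Try every integer combination and keep the first one with the most correct hits."""
    best_n, best_hits = None, -1
    for n_values in product(n_range, repeat=n_properties):
        hits = count_correct_first(queries, n_values)
        if hits > best_hits:
            best_n, best_hits = n_values, hits
    return best_n, best_hits
```

  • With three parameters and integer values from 1 to 50 the search evaluates 125,000 combinations. Ties are resolved here in favor of the first combination encountered, which is a design choice of the sketch rather than something stated in the text.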
  • An illustrative example of the advantage of the CASI score is hentriacontane, which is sorted in 20th position with the NIST MF but in 2nd position with the CASI score, because of the accurate prediction of the KI.
  • Another example presented in Figure 8 is Geranylgeraniol which shows clearly that CASI score gives a better discriminatory power than NIST Match Factor.
  • CASI score as well as NIST Match Factor rank the correct hit in first position, but CASI Score gives a much higher discriminatory power.
  • the results obtained from the CASI system can be confirmed by the use of GC-APCI-TOF-MS.
  • a sample comprising analytes is combined with deuterated n-alkanes and deuterated fatty acid methyl esters, and divided into two aliquots.
  • the other aliquot is analyzed in a GC-APCI-MS, wherein the absolute retention times of the FAMEs are determined.
  • the deviation of Kovats Index was found to be less than 1% between both systems and the mass deviation was found to be less than 1 mDa for the GC-APCI-TOF-MS.
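  • The accurate mass comparison can be sketched as below. The monoisotopic masses are standard values for the most abundant isotopes of C, H, N and O; the 1 mDa default tolerance is chosen to mirror the deviation reported above and is an assumption, as are the function names. The sketch compares masses of the same species and does not model the ion actually formed in the APCI source.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def monoisotopic_mass(composition):
    """Theoretical monoisotopic mass for an elemental composition, e.g. {"C": 10, "H": 14, "N": 2}."""
    return sum(MONOISOTOPIC_MASS[element] * count
               for element, count in composition.items())


def mass_deviation_mda(theoretical, measured):
    """Signed deviation of the measured mass from the theoretical mass, in mDa."""
    return (measured - theoretical) * 1000.0


def is_confirmed(theoretical, measured, tolerance_mda=1.0):
    """True if the measured accurate mass lies within the tolerance of the theoretical mass."""
    return abs(mass_deviation_mda(theoretical, measured)) <= tolerance_mda
```

  • For nicotine (C10H14N2) the theoretical monoisotopic mass evaluates to about 162.1157 u; a measured value of 162.1162 u would be accepted at a 1 mDa tolerance, while 162.1180 u would not.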
  • the ability to confirm proposed structures using accurate masses measured by GC-APCI-TOF-MS was tested.
  • the method is used to confirm the proposed structures of 155 compounds present in cigarette smoke. 120 of the 155 compounds are ionizable in the GC-APCI-TOF-MS. 106 compounds are detected within the retention time index window and 85 compounds are confirmed automatically.
  • Figure 10 is a block diagram of a computer system for analysing mass spectral data in GCxGC mass spectrometry.
  • the system includes a web interface 1000, a match score generator engine 2100, a structural candidate search engine 2200 which accesses a structural candidate database 2210, a descriptor selection and model generation engine 2300 and a descriptor computation engine 2400.
  • the system further includes a chemical structure generator 3100 which accesses a name-to-structure database 3200.
  • the components of the system may be software applications operating on a single server or may be distributed over multiple computing systems communicating via network interfaces including wireless communication systems.
  • the match score generator engine 2100, structural candidate search engine 2200, descriptor selection and model generation engine 2300 and descriptor computation engine 2400 are interconnected software applications operating on a match score server 2000, on which structural candidate database 2210 is also stored.
  • the chemical structure generator 3100 and name-to-structure database 3200 operate on a second server 3000, although they may also operate on match score server 2000.
  • Input data 100 is input via web interface 1000.
  • Input data may be in the form of a JDX file, and comprise mass spectra data from a sample, and may further include experimental values for analytical properties such as Kovats index data, boiling point data and 2D retention time data.
  • the web interface 1000 may communicate with the match score generator engine 2100 via SOAP (Simple Object Access Protocol).
  • the computer system operates in two modes, a training mode and an analysis mode.
  • the training mode may be run at any time, but it is necessary to run the computer system in training mode every time the mass spectrometer experimental set up is changed.
  • the input data are mass spectrometer data and measured values of an analytical property such as Kovats index, for a set of known compounds.
  • the chemical structure in computer readable form is generated by the chemical structure generator 3100 which accesses the name-to-structure database 3200.
  • the chemical structure generator 3100 may be Pipeline Pilot 7.5.1 software, and the database 3200 may be an ACD database.
  • molecular descriptors are calculated by descriptor computation engine 2400, which may be the Dragon software package.
  • the known compounds are divided into a training set and a test set.
  • descriptor selection and model generation engine 2300 which may be RapidMiner software, selects a set of predictive descriptors using forward selection and a genetic algorithm as described in detail above to construct a predictive model for predicting values of an analytical property, such as Kovats indices or 2D retention time, for the training compound structures.
  • the predictive model is verified using the test set, as described in more detail above, and a model is selected.
  • the input data 100 is mass spectrometry data from a sample.
  • the structural candidate search engine 2200 carries out a search in structural candidate database 2210 by comparing the mass spectra data from the sample with mass spectra data in the database 2210, to generate a number of structural candidate compounds based on similarity of the mass spectra data with the data in the database 2210.
  • the selected candidate compounds may be, for example, the top 100 matches.
  • the search engine may be an NIST MS search algorithm, and the database 2210 may be the NIST 08 and WILEY 9th ed Mass Spectra databases.
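  • The candidate search step can be illustrated with a plain cosine similarity between spectra, keeping the top N library entries. This is a generic stand-in and is not the NIST MS Search algorithm; the spectrum representation (a mapping from m/z to intensity) and the function names are assumptions.

```python
import heapq
import math


def cosine_similarity(spectrum_a, spectrum_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    dot = sum(intensity * spectrum_b[mz]
              for mz, intensity in spectrum_a.items() if mz in spectrum_b)
    norm_a = math.sqrt(sum(i * i for i in spectrum_a.values()))
    norm_b = math.sqrt(sum(i * i for i in spectrum_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_candidates(query_spectrum, library, n=100):
    """Return up to n (similarity, name) pairs, most similar first.

    library: dict mapping compound name to its reference spectrum.
    """
    scored = ((cosine_similarity(query_spectrum, spectrum), name)
              for name, spectrum in library.items())
    return heapq.nlargest(n, scored)
```

  • The resulting similarity values play the role of the match factors that are passed on, together with the candidate names, to the structure generation and scoring steps.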
  • the list of structural candidates is made available for the user to view via web interface 1000.
  • Each candidate has a match factor indicative of the similarity of the mass spectra data for the sample with the data in the database 2210 for the candidate.
  • the match factor is generated by the structural candidate search engine 2200, and may also be displayed to the user via the web interface 1000 for each structural candidate.
  • the chemical structure in computer readable form is generated by the chemical structure generator 3100 which accesses the name-to-structure database 3200.
  • the chemical structure generator 3100 may be Pipeline Pilot 7.5.1 software, and the database 3200 may be an ACD database.
  • molecular descriptors are calculated by descriptor computation engine 2400, which may be the Dragon software package.
  • the model generated by the descriptor selection and model generation engine 2300 in the training mode is then used to predict the analytical property, such as Kovats index or 2D retention time, for the candidate structures.
  • the descriptor selection and model generation engine 2300 supplies the model to the match score generator engine 2100 which calculates predicted values of one or more analytical properties based on the model.
  • the predicted values may be communicated to the user via web interface 1000.
  • the match score generator engine 2100 calculates a match score for each candidate compound based on the match factors generated by the structural candidate search engine 2200, the predicted values of the analytical properties predicted by the model provided by the descriptor selection and model generation engine 2300, and measured values of the analytical properties of the sample which were included in input data 100.
  • the match score generator engine 2100 may calculate a CASI score in accordance with the method described above.
  • the match scores may also be communicated to a user via web interface 1000.
  • the web interface 1000 may display the results to the user in the form of a table, listing the structural candidates, the match factors generated by the structural candidate search engine 2200, the predicted values of the analytical properties generated by the model generation engine 2300, and the match score.
  • the table may be sorted to rank the structural candidates by their match scores.
  • the descriptor selection and model generation engine 2300 supplies the selected model to the match score generator 2100, which, in the analysis mode, applies the model to the structural candidates to generate predicted values for the analytical property. In this way, in the analysis mode, access to the descriptor selection and model generation engine 2300 is not required. Access to the descriptor selection and model generation engine 2300 is only required in the training mode for generation of a new model.
  • the descriptor selection and model generation engine 2300 may thus be provided on a separate computing device (e.g., a server) which is only accessed in the training mode.
  • A preferred embodiment of the software architecture is illustrated in Figure 12.
  • Oracle Application Express is used for the development of the web interface 1000.
  • A SOAP interface allows Oracle Application Express to communicate with the match score generator engine 2100, which is developed in Java and runs in Tomcat. RapidMiner is used as the descriptor selection and model generation engine 2300 and is integrated via its Java API. Java is used to implement the match score generator engine 2100 mainly because RapidMiner integrates easily with Java.
  • The structural candidate search engine 2200 comprises NIST MS Search and is integrated via the command line.
  • The chemical structure generator 3100 is Pipeline Pilot and is integrated via its Java API. It is used to convert the names of the hits to structures (using ACD/Labs name-to-structure and an internet connection to ChEMBL), to standardize the structures, to compute boiling points (ACD/Labs PhysChem Batch) and to move data from CASI to a chemical registry database.
  • The descriptor computation engine 2400 comprises Dragon and is integrated via the command line.
  • Log4J, a standard Java logging library, is used for logging error messages.
  • Hibernate is used for mapping the objects to the Oracle database.
  • JUnit is used for the unit tests.
  • Figures 13 and 14 illustrate outputs of the web interface 1000.
  • All compounds to be identified are presented with the structure candidate having the best score (Figure 13).
  • Structure candidates can be browsed and the selection can be changed (Figure 14).
  • All structure candidates (Hits) for the compound to be identified (Query, in this case 1-Pentene, 2,3-dimethyl) are listed with their predicted properties. The one with the best score is selected by default. The user can change the selection and add comments, which will be inserted with the selected structure into a chemical registration system.
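
The ranking and default selection described above can be illustrated with a minimal Java sketch. It is not the patented implementation: the class `CandidateRanking`, the `Hit` fields and the sample values are hypothetical, the match score is taken as an already computed input (the CASI score calculation is not reproduced here), and it is assumed that a higher match score means a better candidate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: rank the structure candidates (hits) for one query compound. */
final class CandidateRanking {

    /** One row of the results table; all field names are illustrative assumptions. */
    static final class Hit {
        final String name;           // candidate label
        final double matchFactor;    // from the structural candidate search
        final double predictedValue; // model-predicted analytical property
        final double matchScore;     // precomputed score; assumed higher = better

        Hit(String name, double matchFactor, double predictedValue, double matchScore) {
            this.name = name;
            this.matchFactor = matchFactor;
            this.predictedValue = predictedValue;
            this.matchScore = matchScore;
        }
    }

    /** Returns a new list sorted by descending match score; the input is not modified. */
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> ranked = new ArrayList<>(hits);
        ranked.sort(Comparator.comparingDouble((Hit h) -> h.matchScore).reversed());
        return ranked;
    }

    /** Default selection: the best-scoring hit, or null when there are no hits. */
    static Hit defaultSelection(List<Hit> hits) {
        List<Hit> ranked = rank(hits);
        return ranked.isEmpty() ? null : ranked.get(0);
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        List<Hit> hits = Arrays.asList(
                new Hit("candidate A", 850.0, 65.0, 0.72),
                new Hit("candidate B", 910.0, 58.0, 0.91),
                new Hit("candidate C", 790.0, 70.0, 0.40));

        for (Hit h : rank(hits)) {
            System.out.println(h.name + "\t" + h.matchFactor + "\t"
                    + h.predictedValue + "\t" + h.matchScore);
        }
        System.out.println("Selected by default: " + defaultSelection(hits).name);
    }
}
```

In such a design a user override would simply replace the default selection with another entry of the ranked list before the chosen structure and any comments are passed on to the registration system.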

Landscapes

  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
EP11005180A 2011-04-28 2011-06-27 Computer-assisted structure identification Ceased EP2541585A1 (de)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP11005180A EP2541585A1 (de) 2011-06-27 2011-06-27 Computer-assisted structure identification
US14/114,240 US20140297201A1 (en) 2011-04-28 2012-04-30 Computer-assisted structure identification
CN201280032300.7A CN103650100A (zh) 2011-04-28 2012-04-30 Computer-assisted structure identification
PCT/EP2012/057942 WO2012146787A1 (en) 2011-04-28 2012-04-30 Computer-assisted structure identification
EP12717751.7A EP2710621A1 (de) 2011-04-28 2012-04-30 Computer-assisted structure identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP11005180A EP2541585A1 (de) 2011-06-27 2011-06-27 Computer-assisted structure identification

Publications (1)

Publication Number Publication Date
EP2541585A1 true EP2541585A1 (de) 2013-01-02

Family

ID=44720467

Family Applications (1)

Application Number Title Priority Date Filing Date
EP11005180A Ceased EP2541585A1 (de) 2011-04-28 2011-06-27 Computer-assisted structure identification

Country Status (1)

Country Link
EP (1) EP2541585A1 (de)

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
GARJANI-NEJAD, JOURNAL OF CHROMATOGRAPHY A, vol. 1028, 2004, pages 287 - 296
MIHALEVA ET AL., BIOINFORMATICS, vol. 6, 2009, pages 787 - 794
ROBERTO TODESCHINI, VIVIANA CONSONNI: "Series of Methods and Principles in Medicinal Chemistry", vol. 41, 2009, WILEY - VCH, article "Molecular Descriptors for Chemoinformatics"
SEELEY, SEELEY, JOURNAL OF CHROMATOGRAPHY A, vol. 1172, 2007, pages 72 - 83
V. V. MIHALEVA ET AL: "Automated procedure for candidate compound selection in GC-MS metabolomics based on prediction of Kovats retention index", BIOINFORMATICS, vol. 25, no. 6, 28 January 2009 (2009-01-28), pages 787 - 794, XP055020002, ISSN: 1367-4803, DOI: 10.1093/bioinformatics/btp056 *
VENKATRAMANI, PHILLIPS, J. MICROCOLUMN SEP., vol. 5, 1993, pages 511 - 516
W ECKEL: "Use of boiling point-Lee retention index correlation for rapid review of gas chromatography-mass spectrometry data", ANALYTICA CHIMICA ACTA, vol. 494, no. 1-2, 8 October 2003 (2003-10-08), pages 235 - 243, XP055019998, ISSN: 0003-2670, DOI: 10.1016/j.aca.2003.08.003 *
YAPING ZHAO ET AL: "A method of calculating the second dimension retention index in comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry", JOURNAL OF CHROMATOGRAPHY A, vol. 1218, no. 18, 1 May 2011 (2011-05-01), pages 2577 - 2583, XP055020277, ISSN: 0021-9673, DOI: 10.1016/j.chroma.2011.02.072 *
Z BAYAT ET AL: "QUANTITATIVE STRUCTURE-PROPERTY RELATIONSHIP (QSPR) STUDY OF KOVATS RETENTION INDICES OF SOME OF ADAMANTANE DERIVATIVES BY THE GENETIC ALGORITHM AND MULTIPLE LINEAR REGRESSION (GA-MLR) METHOD", PETROLEUM & COAL, vol. 53, no. 2, 12 July 2011 (2011-07-12), pages 132 - 140, XP055020330, ISSN: 1337-7027 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11237101B2 (en) 2017-08-30 2022-02-01 Mls Acq, Inc. Local and global peak matching
US11300503B2 (en) * 2017-08-30 2022-04-12 Mls Acq, Inc. Carbon ladder calibration
US11680894B2 (en) 2017-08-30 2023-06-20 Mls Acq, Inc. Local and global peak matching

Similar Documents

Publication Publication Date Title
US20140297201A1 (en) Computer-assisted structure identification
Scheubert et al. Computational mass spectrometry for small molecules
US9312110B2 (en) System and method for grouping precursor and fragment ions using selected ion chromatograms
US20070095757A1 (en) Methods and systems for the annotation of biomolecule patterns in chromatography/mass-spectrometry analysis
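WO2019240289A1 (ja) Method and system for identifying structure of compound
JP2009500617A (ja) System and method for characterizing a chemical sample
JP6110380B2 (ja) Chemical identification using chromatographic retention indices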
Godzien et al. Metabolite annotation and identification
EP3803936B1 (de) Identification of chemical structures
Barnes Overview of experimental methods and study design in metabolomics, and statistical and pathway considerations
CN114283877A (zh) Method for establishing a metabolite model and its metabolomics database
Wu et al. A new estimation of protein-level false discovery rate
US20230251224A1 (en) Method and system for identifying structure of compound
EP2541585A1 (de) Computer-assisted structure identification
Hogan et al. Charge state estimation for tandem mass spectrometry proteomics
Zhou Computational analysis of LC-MS/MS data for metabolite identification
Moritz et al. The Potential of Ultrahigh Resolution MS (FTICR‐MS) in Metabolomics
Goodenowe Metabolomic analysis with Fourier transform ion cyclotron resonance mass spectrometry
Cooper et al. An assessment of AcquireX and Compound Discoverer software 3.3 for non-targeted metabolomics
Neumann et al. Mass Spectrometry Data Processing
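JP7327431B2 (ja) Mass spectrometry data analysis method, program, and mass spectrometry data analysis device
Price Optimising the statistical pipeline for quantitative proteomics
JP2023539812A (ja) Method for determining small-molecule components of a complex mixture, and associated apparatus and computer program product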
LaMarche Methods for comparing metaproteomic data in the absence of metagenomic information
Li et al. Mono-isotope prediction for mass spectra using Bayes network

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20121030