CN111610281A - Cloud platform framework based on gas chromatography-mass spectrometry library identification and operation method thereof - Google Patents

Info

Publication number
CN111610281A
Authority
CN
China
Prior art keywords
data
cloud platform
identification
gcms
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010677060.2A
Other languages
Chinese (zh)
Other versions
CN111610281B (en)
Inventor
熊行创
刘震
何文魁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xingjian Proshi Technology Co ltd
Original Assignee
Beijing Xingjian Proshi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xingjian Proshi Technology Co ltd
Priority to CN202010677060.2A
Publication of CN111610281A
Application granted
Publication of CN111610281B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/62 Detectors specially adapted therefor
    • G01N30/72 Mass spectrometers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Landscapes

  • General Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention provides a cloud platform system for gas chromatography-mass spectrometry (GC-MS) spectral library identification and an operation method thereof. The system consists of a client and a cloud platform. The cloud platform comprises a network egress gateway, a Web application service interface, a Web background service, GCMS data identification, reference data/method updating, reference data/method downloading, and databases; the client comprises a GCMS combined instrument and an application program/Web application service. The user supplies a mass spectrum data file that requires spectral library identification from a mass spectrum data source; after the analysis method is configured, the data is stored in a test database, GCMS data identification is carried out on the cloud platform, and the result is returned quickly. The platform upgrades verified self-built methods and data into reliable reference methods and data, so that the GCMS identification spectral library is expanded with a continuously growing set of reference test methods and standard reference data and is offered to users on the cloud platform.

Description

Cloud platform framework based on gas chromatography-mass spectrometry library identification and operation method thereof
Technical Field
The invention relates to a cloud platform framework for gas chromatography-mass spectrometry spectral library identification and an operation method thereof, in particular to a system framework design scheme and a system operation method for cloud spectral library identification according to test data of a gas chromatography-mass spectrometry standard test method.
Background
Mass spectrometry is an analysis method in which particles of a substance (atoms, molecules) are ionized, the resulting ions are separated by mass-to-charge ratio in space or in time by a suitable static or varying electric or magnetic field, and their intensities are detected for qualitative and quantitative analysis. Because mass spectrometry measures the particles of matter directly and offers high sensitivity, high resolution, high throughput and wide applicability, it plays a significant role in modern science and technology.
The abscissa of mass spectrometry data is the mass-to-charge ratio, and the ordinate is the absolute or relative intensity; such data is information-rich and high-resolution.
The combination of gas chromatography and mass spectrometry is called gas chromatography-mass spectrometry (GC-MS) for short. Gas chromatography separates and resolves organic compounds effectively, and mass spectrometry is an effective means of identifying compounds accurately. Under computer control, the combined technique separates a complex mixture sample (such as crude oil or a rock extract) by gas chromatography, so that the compounds in the sample enter the ion source of the mass spectrometer one by one, all compounds in the sample are ionized, and their mass spectra are obtained. Combining the high separation efficiency of chromatography with the high sensitivity and qualitative power of mass spectrometry yields a better analytical result. The technique is widely used for samples with complex composition, such as environmental samples, for example to determine pesticide residues and automobile exhaust.
Gas chromatography-mass spectrometry is an authoritative standard method in many fields, and national standard methods based on it have been established in fields such as environmental monitoring and food safety. A GC-MS standard method specifies the sample introduction, the type and specification of the GC column, the operating parameters of the GC and MS, the chromatographic peak time of each compound, the characteristic ions used for mass spectral identification, and other information. Storing the chromatograms and mass spectra of reference standards in a database yields a reference standard spectral library. An unknown organic substance can then be analyzed and tested according to the standard method, and the retention time in the chromatogram and the characteristic peaks of the mass spectrum of the resulting GC-MS data are each compared with the data in the reference standard spectral library to reach an accurate qualitative determination. This is the identification method based on a chromatography-mass spectrometry library.
Identification methods based on chromatography-mass spectrometry libraries have numerous advantages, but they face several problems:
First, standard chromatography-mass spectrometry library data is authoritative. However, a method and its spectra are recognized as a standard reference method and standard reference spectra only after more than two laboratories, using instruments from more than two different manufacturers, have obtained repeatable and reliable results, which is time-consuming and laborious. Developing a standard reference method in a field therefore requires a large investment and a long period. Because of these stringent requirements and high costs, few standard reference spectral libraries exist. Meanwhile, the unit price of a standard spectral library is high; for users who search infrequently, the average cost per library search is hard to bear, which limits both the audience and the revenue growth of standard spectral library data.
Second, in addition to published standard reference spectral libraries, many laboratories establish their own reference methods and reference data, forming self-built spectral libraries. For technical reasons, these self-built libraries cannot be shared among different laboratories in the same field, which leads to duplicated construction; moreover, errors in some self-built libraries are hard to detect and may cause further losses.
Third, GCMS instrument suppliers provide spectral library search software that supports building a self-built spectral library. According to investigation, however, no cloud platform has been found that lets users conveniently share self-built spectra, and no function exists for conveniently checking the quality of self-built spectra. It is therefore difficult to upgrade the self-built methods, data and spectra scattered across laboratories into reliable reference method spectra through efficient verification.
How can mass spectrometry testers use standard reference data based on chromatography-mass spectrometry libraries efficiently and at low cost? How can capable laboratories be encouraged to contribute self-built methods and data and to verify the self-built methods and data of other laboratories, so that high-quality standard reference data entries are expanded efficiently, duplicated construction is avoided, and the economic and social benefits of standard reference data grow as more people use it? These are difficult problems that mass spectrometry testers in the industry hope to see solved.
Disclosure of Invention
To solve the above problems, the present invention provides a cloud platform system and an operation method for gas chromatography-mass spectrometry spectral library identification. Compared with the current stand-alone versions of GC-MS library identification software, building a cloud platform solves the difficult problem of expanding a standard reference database at low cost and with high efficiency, thereby significantly increasing the quantity and quality of data in the standard reference database as well as the number of users.
A cloud platform system based on gas chromatography-mass spectrometry spectral library identification comprises a client and a cloud platform.
The client comprises:
GCMS combined instrument: collects data from a sample;
application/Web application service: reads the original data file produced by the GCMS instrument and uploads it to the cloud platform.
The cloud platform comprises:
Web application service interface: provides the Web front-end program service, including data/operation interaction, user management, user service data management, analysis reports and result display;
Web background: processes front-end requests and implements the services of the Web application service;
reference data/method quality verification program module: performs validity checking, parameter merging and attribute binding on the original data file;
GCMS data identification: receives the GCMS original data sent by the Web background and calls the standard library data of the reference database to identify it; calls the test database to store, manage and retrieve GCMS data; returns the identification result to the Web background;
reference method library: stores and manages GCMS reference methods;
reference database: stores and manages GCMS reference data;
test database: stores and manages GCMS test data.
An operation method of the cloud platform system based on gas chromatography-mass spectrometry spectral library identification is characterized in that a reference method library and a reference database are established on the cloud platform, and data identification is then performed on a sample to be tested by calling the method files and data files in the reference method library and the reference database.
Preferably, establishing the reference method library and the reference database on the cloud platform specifically comprises the following steps:
S201: a user acquires data of a standard sample with a GCMS instrument and uploads the acquired data file of the standard sample and the corresponding instrument method file to the Web application service interface through the application program/Web application service;
S202: the Web application service interface passes the data file and the instrument method file through the Web background to the reference data/method quality verification program module, which performs validity checking, parameter merging and attribute binding on the uploaded data file and method file;
S203: the uploaded data files and method files are classified and managed; the classified data files are stored in the reference database and the method files in the reference method library.
Preferably, performing data identification on the sample to be tested by calling the method files and data files in the reference method library and the reference database specifically comprises the following steps:
S204: a user requests the cloud platform to download the instrument method file corresponding to the sample to be tested;
S205: the cloud platform returns the corresponding instrument method file to the user according to the request;
S206: the user applies the downloaded instrument method file on a GCMS instrument to acquire original data of the sample to be tested and generate a mass spectrum data file; the mass spectrum data file is uploaded to the Web application service interface through the application program/Web application service, and the service is carried out by the Web background;
S207: data preprocessing, including filtering, bad-value elimination and peak detection, is performed on the mass spectrum data file;
S208: GCMS data identification stores, manages and identifies the preprocessed mass spectrum data file;
S209: an identification result data set is output through the Web background;
S210: the Web background returns the identification result through the Web application service interface to the application program/Web application service for the user to review.
Preferably, in step S208, reference data matching the mass spectrum data file is retrieved from the reference database according to the mass spectrum data and the additional attribute parameters of the mass spectrum data file, and an identification result is obtained according to the degree of matching with the reference data.
Preferably, the identification result data set in step S209 includes the forward similarity, the reverse similarity and the likelihood value between the mass spectrum data file and the reference data file, and the likelihood value is not greater than 100%.
Preferably, the identification result obtained by the user through the application/Web application service is displayed as a legend, a report, and an identification report.
Preferably, the identification result obtained in step S208 is uploaded to the test database for storage.
Preferably, the method further comprises updating the reference data and the reference methods, including the following steps:
S301: a user publishes self-built GCMS reference data and a reference method as to-be-verified reference data and uploads it to the cloud platform through the application program/Web application service;
S302: the Web application service interface stores the received to-be-verified reference data, through the Web background, in the reference database and the reference method library for other users to download and verify;
S303: other users download the corresponding to-be-verified reference data from the reference database and the reference method library of the cloud platform and carry out data verification;
S304: after more than two users have verified and passed a given piece of reference data and its reference method among the to-be-verified reference data, the cloud platform randomly extracts test cases from the corresponding reference database and reference method library to perform a verification test on the to-be-verified reference data;
S305: the cloud platform marks the to-be-verified reference data that passes the verification test as reliable reference data, completing the update of the reference data and the reference method.
Preferably, the following step is further included after step S305:
S306: an identification result is returned to the user who published the to-be-verified reference data.
Compared with the prior art, the invention has the following beneficial effects:
By exploiting the network advantages and the operation mode of the cloud platform, self-built methods and data accumulate continuously, methods and data from different laboratories are merged and efficiently promoted to reference methods and data, and the quality of the reference methods and data is continuously optimized. This expands the audience and revenue and meets the demand of mass spectrometry testers in the industry for efficient, low-cost identification based on chromatography-mass spectrometry libraries.
Drawings
FIG. 1 is a schematic diagram of the architecture of a cloud platform system based on chromatography-mass spectrometry library identification according to the present invention;
FIG. 2 is a schematic flow chart of a cloud platform operation method based on chromatography-mass spectrometry library identification according to the present invention;
FIG. 3 is a schematic diagram of a reference data service business process of a cloud platform operation method based on chromatography-mass spectrometry library identification according to the present invention;
FIG. 4 is a schematic diagram of the service functions of the cloud platform based on chromatography-mass spectrometry library identification.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings 1 to 4 and the following detailed description.
As shown in fig. 1, the cloud platform system based on chromatography-mass spectrometry library identification is divided into two major parts, a client 11 and a cloud platform 10. The cloud platform part includes: a network egress gateway 101, a Web application service interface 102, a Web background service 103, a reference data/method quality verification program module, GCMS data identification 1041, reference data/method updating 1042, reference method downloading 1043, a test database 1053, a reference database 1052 and a reference method library 1051. The client part contains the GCMS combined instrument 111 and the application/Web application service 112.
The functions of the structures are explained below:
Client:
GCMS combined instrument 111: the GCMS gas chromatograph-mass spectrometer acquires data from a sample according to an instrument method. The system can support the native or integrated combined systems of the various gas chromatography and mass spectrometry manufacturers currently on the market at home and abroad (such as Agilent, Thermo, Panro and the like).
Application/Web application service 112: provides the user with a browser-based or client-based application, depending on the application scenario or user preference. It implements uploading of the data collected by, and the instrument methods of, the GCMS combined instrument 111; downloading of instrument methods for the GCMS combined instrument 111; the user interaction interface; and service functions such as displaying the identification result.
Cloud platform:
Egress gateway 101: includes a firewall (access/service port control, port mapping), a reverse proxy (controlling requests from the browser/client programs (1121 and 1122) and data forwarding by the Web application service interface 102) and load balancing. These functions and mechanisms ensure the security and stability of the cloud platform's Internet access.
Web application service interface 102: provides the Web front-end program service, whose main responsibilities include data/operation interaction, user management (registration/login authentication), user service data management (GCMS identification data, reference method data and the like), identification results and the like.
Web application background 103: implements Web application service processing (processing front-end requests and implementing the services of the Web application service interface 102); routing of service flows; service process control; and interaction control among the background service functions (data identification, reference data/method updating and downloading).
Reference data/method quality verification program module (not shown in the figure): performs validity checking, parameter merging and attribute binding on the original data file.
GCMS data identification 1041: identifies GCMS data (receives the GCMS original data sent by the Web background 103 and calls the standard library data of the reference database 1052 to identify it); stores and manages GCMS data (calls the test database 1053 to store, manage and retrieve data); and returns the identification result (to the Web background 103).
Reference data/method update 1042: updates GCMS reference data (GCMS standard library data files) and reference methods (the instrument method files with which the GCMS standard library data files were collected) by calling the reference method library 1051 and the reference database 1052 to add and update data.
Reference method download 1043: downloads GCMS reference methods (instrument method files) by calling the reference method library 1051 to query, read and download data.
Reference method library 1051: stores and manages GCMS reference methods (instrument methods).
Reference database 1052: stores and manages GCMS reference data (standard library).
Test database 1053: stores and manages GCMS test data.
The reference method library 1051, the reference database 1052 and the test database 1053 use a cloud storage system built on Hadoop HDFS, Hive and a MySQL database. HDFS: the Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It is highly fault-tolerant, provides high-throughput data access, and is well suited to applications with large data sets. Hive: a Hadoop-based data warehouse tool that can map structured data files onto database tables and provides an SQL-like query capability. It has three characteristics: (i) scalability: a Hive cluster can be scaled freely, generally without restarting the service; (ii) extensibility: Hive supports user-defined functions, so users can implement functions according to their needs; (iii) fault tolerance: SQL can still be executed when a node has a problem.
Test data (GCMS identification data files), reference data (GCMS standard library data files) and reference methods (GCMS instrument method files) are stored in HDFS; MySQL stores platform system/configuration data; and Hive integrates the metadata (of HDFS and MySQL) to provide data warehouse support.
Fig. 2 is a schematic diagram of an operation data flow of the cloud platform identified based on the chromatography-mass spectrometry library.
The cloud platform operation flow can be described in general as follows. A user first acquires data of a standard sample with a GCMS instrument and uploads the acquired data and the corresponding instrument method file to the reference database (standard library) and the reference method library (instrument method library) of the cloud platform. When a user needs to detect and identify an object to be tested (an unknown sample), the corresponding reference method (instrument method) can be downloaded from the cloud platform according to the target analyte range or standard. The GCMS instrument uses the downloaded instrument method to acquire data of the object to be tested, forming the original test data. The original test data (files) are uploaded to the cloud platform; after data preprocessing (filtering, bad-value elimination, peak detection and the like), the GCMS data identification service stores, manages and identifies the data, performing standard library retrieval and matching according to the retention time and the mass spectrum of the data. The cloud platform outputs an identification result, such as the forward similarity, the reverse similarity and the likelihood between the object to be tested and a standard substance (known sample) in the reference database (standard library), and returns it to the user in visual form. The likelihood value (maximum 100%) is one of the indicator parameters for deciding that the object is a given substance: it is calculated from the forward similarity and the reverse similarity, and when it exceeds a certain threshold (for example 90%) the identification is judged to be true.
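The text states that the likelihood is calculated from the forward and reverse similarity but does not give the formula. The following is a minimal sketch under stated assumptions: spectra are dictionaries mapping m/z to intensity, both similarities are cosine similarities (forward over every m/z in either spectrum, reverse over the m/z values of the reference only), and the likelihood is their mean expressed as a percentage. The function names and the averaging rule are hypothetical; only the 100% cap and the 90% example threshold come from the description.

```python
import math


def _cosine(a, b, mz_values):
    """Cosine similarity of two spectra restricted to the given m/z values."""
    dot = sum(a.get(mz, 0.0) * b.get(mz, 0.0) for mz in mz_values)
    norm_a = math.sqrt(sum(a.get(mz, 0.0) ** 2 for mz in mz_values))
    norm_b = math.sqrt(sum(b.get(mz, 0.0) ** 2 for mz in mz_values))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def forward_similarity(unknown, reference):
    """Compare over every m/z present in either spectrum."""
    return _cosine(unknown, reference, set(unknown) | set(reference))


def reverse_similarity(unknown, reference):
    """Compare only over m/z present in the reference, ignoring extra unknown peaks."""
    return _cosine(unknown, reference, set(reference))


def likelihood(unknown, reference):
    """Assumed combination: mean of both similarities as a percentage, capped at 100."""
    fwd = forward_similarity(unknown, reference)
    rev = reverse_similarity(unknown, reference)
    return min(100.0, 100.0 * (fwd + rev) / 2.0)


def is_match(unknown, reference, threshold=90.0):
    """Judge the identification true when the likelihood exceeds the threshold."""
    return likelihood(unknown, reference) > threshold


reference = {43: 100.0, 58: 40.0, 71: 20.0}
unknown = {43: 100.0, 58: 40.0, 71: 20.0, 207: 30.0}  # one extra peak in the unknown
print(forward_similarity(unknown, reference) < reverse_similarity(unknown, reference))
print(is_match(unknown, reference))
```

In this sketch an extra peak in the unknown lowers the forward similarity but leaves the reverse similarity untouched, which is why reporting both values can be useful for samples with co-eluting or background ions.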
The process steps are described in detail below:
Step S201: a user acquires data of a standard sample with the GCMS instrument and uploads the acquired data file of the standard sample (reference data) and the corresponding instrument method file (reference method).
Inter-step interaction or data flow:
S201 to S202: mass spectrum data files, instrument method files/method parameters.
Step S202: the reference data/method quality verification program module performs validity checking, parameter merging, attribute binding and the like on the uploaded reference data file and reference method file.
Inter-step interaction or data flow:
S202 to S203: verified mass spectrum data files and instrument method files/method parameters.
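The description names the three operations of step S202 (validity checking, parameter merging, attribute binding) without detailing them. As an illustration only, they could be sketched as below. Every field name (`scans`, `acquisition_params`, `column_type`, `oven_program`, `ion_source`), the rule that parameters recorded in the data file override those in the method file, and the bound attributes are assumptions made for this example and do not come from the source.

```python
# Hypothetical set of parameters a method file must define.
REQUIRED_METHOD_KEYS = {"column_type", "oven_program", "ion_source"}


def check_validity(data_file, method_file):
    """Return a list of problems; an empty list means the upload looks valid."""
    errors = []
    if not data_file.get("scans"):
        errors.append("data file contains no scans")
    missing = REQUIRED_METHOD_KEYS - set(method_file)
    if missing:
        errors.append("method file missing: " + ", ".join(sorted(missing)))
    return errors


def merge_parameters(data_file, method_file):
    """Start from the method parameters and let values recorded at acquisition win."""
    merged = dict(method_file)
    merged.update(data_file.get("acquisition_params", {}))
    return merged


def bind_attributes(data_file, merged_params, uploader, category):
    """Attach the merged parameters and descriptive attributes to the data."""
    return {
        "scans": data_file["scans"],
        "params": merged_params,
        "attributes": {"uploader": uploader, "category": category},
    }


def verify_upload(data_file, method_file, uploader, category):
    """Run the three operations in order and return the record to be stored."""
    errors = check_validity(data_file, method_file)
    if errors:
        raise ValueError("; ".join(errors))
    merged = merge_parameters(data_file, method_file)
    return bind_attributes(data_file, merged, uploader, category)


record = verify_upload(
    {"scans": [[43.0, 100.0]], "acquisition_params": {"ion_source": "EI"}},
    {"column_type": "capillary", "oven_program": "ramp", "ion_source": "CI"},
    uploader="lab_a",
    category="water quality",
)
print(record["params"]["ion_source"])
```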
Step S203: the uploaded reference data files and reference method files are classified, stored and managed.
Inter-step interaction or data flow:
S203 to reference method library 1051: instrument method files/method parameters.
S203 to reference database 1052: mass spectrum data files.
Step S204: when a user needs to detect and identify an object to be tested (an unknown sample), the corresponding reference method (instrument method) is downloaded from the platform according to the target analyte range or standard. Here "corresponding" means selecting the instrument method of the standard that matches the type of sample to be detected and identified, such as drugs, water quality or food safety.
Inter-step interaction or data flow: S204 to reference method library 1051: reference method (instrument method) download request.
Step S205: the platform returns the reference method corresponding to the download request to the user.
Inter-step interaction or data flow: reference method library 1051 to S205: reference methods (instrument method files).
Step S206: the user uploads the original test data (the mass spectrum data file generated by acquiring data of the object to be tested with the downloaded reference method).
Inter-step interaction or data flow: S206 to S207: mass spectrum data files.
Step S207: data preprocessing (filtering, bad-value elimination, peak detection and the like) is performed on the original test data.
Inter-step interaction or data flow: S207 to S208: the preprocessed mass spectrum data file.
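The preprocessing of step S207 is described only by its three operations. A minimal sketch of such a pipeline on a one-dimensional intensity trace is given below. The concrete choices are assumptions for illustration: bad values (negative or non-finite readings) are replaced with zero, filtering is a centered moving average, and a peak is a local maximum above a minimum height. Bad values are removed before filtering here so that a single invalid reading cannot contaminate its neighbours.

```python
import math


def eliminate_bad_values(signal):
    """Replace negative or non-finite readings with 0.0."""
    return [x if (math.isfinite(x) and x >= 0.0) else 0.0 for x in signal]


def moving_average(signal, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def detect_peaks(signal, min_height):
    """Indices of local maxima whose value is at least min_height."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i] >= min_height and signal[i - 1] < signal[i] >= signal[i + 1]:
            peaks.append(i)
    return peaks


def preprocess(signal, window=3, min_height=10.0):
    """Clean, smooth and peak-pick one intensity trace."""
    cleaned = eliminate_bad_values(signal)
    smoothed = moving_average(cleaned, window)
    return smoothed, detect_peaks(smoothed, min_height)


trace = [0, 1, 2, 30, 60, 30, 2, 1, float("nan"), 0, -5, 0]
smoothed, peaks = preprocess(trace)
print(peaks)
```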
Step S208: the test data is stored, managed and identified, for example by standard library retrieval and matching according to the retention time and the mass spectrum of the data.
Inter-step interaction or data flow:
Reference database 1052 to S208: standard library data (mass spectrum data; additional attribute parameters such as retention time).
S208 to test database 1053: test data (mass spectrum data files).
S208 to S209: identification result data set (forward similarity, reverse similarity, likelihood).
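Retrieval by retention time and mass spectrum, as named in step S208, can be sketched as a two-stage search: first keep only library entries whose retention time lies within a tolerance of the detected peak, then rank the survivors by a spectral score. The record layout, the tolerance of 0.1 minutes, and the placeholder scorer `shared_peak_fraction` are assumptions for illustration; any spectral similarity function with the same signature could be passed in instead.

```python
def shared_peak_fraction(unknown, reference):
    """Placeholder scorer: fraction of reference m/z values also seen in the unknown."""
    if not reference:
        return 0.0
    return len(set(unknown) & set(reference)) / len(reference)


def search_library(peak, library, score=shared_peak_fraction,
                   rt_tolerance=0.1, top_n=3):
    """Return up to top_n library hits for one detected peak, best score first.

    peak and library entries are dicts with "rt" (retention time in minutes)
    and "spectrum" (m/z -> intensity); library entries also carry "name".
    """
    hits = []
    for entry in library:
        if abs(entry["rt"] - peak["rt"]) > rt_tolerance:
            continue  # retention-time gate: skip entries eluting elsewhere
        hits.append({
            "name": entry["name"],
            "score": score(peak["spectrum"], entry["spectrum"]),
        })
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return hits[:top_n]


library = [
    {"name": "A", "rt": 5.00, "spectrum": {43: 100.0, 58: 40.0, 71: 20.0}},
    {"name": "B", "rt": 5.05, "spectrum": {43: 80.0, 91: 100.0}},
    {"name": "C", "rt": 9.00, "spectrum": {43: 100.0, 58: 40.0, 71: 20.0}},
]
peak = {"rt": 5.02, "spectrum": {43: 95.0, 58: 42.0, 71: 18.0}}
print([hit["name"] for hit in search_library(peak, library)])
```

Entry C has the same spectrum as A but elutes almost four minutes later, so the retention-time gate removes it before any spectral comparison takes place.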
Step S209: the identification result is organized and output.
Inter-step interaction or data flow:
S209 to S210: identification result data set and style data.
Step S210: the identification result is returned to the user in visual form (legend, report and identification report).
Fig. 3 is a schematic diagram of the reference data service business flow of the cloud platform based on chromatography-mass spectrometry library identification.
The reference data service business of the cloud platform can be described as follows. A user can publish self-built GCMS reference data (standard library) and a reference method as to-be-verified reference data. Other users can download the corresponding reference method, carry out data verification, and provide verification results for the to-be-verified data. The user who published the to-be-verified data can complete and correct the reference data and the reference method using the verification feedback from other users. When a given piece of reference data and its reference method have been verified and passed by a certain number of users (more than two), the platform randomly extracts test cases from the corresponding spectral library data to perform a verification test. The platform upgrades self-built methods and data that pass the verification test into reliable reference methods and data. After the data has passed verification, the reference data is released as a platform identification service, and all users who took part in the data verification can share the proceeds in proportion to the number of valid verification data each contributed. The process steps are described in detail below:
Step S301: the user publishes self-built GCMS reference data (standard library) and the reference method as reference data to be verified.
Inter-step interaction or data flow:
S301 to S302: reference data/method parameters, or reference data/method parameters corrected after feedback.
Step S302: the reference data to be verified is published for other users to download and verify.
Inter-step interaction or data flow:
S302 to S301: feedback data from other users' verification.
S302 to S303: reference data/method parameters.
S302 to S304: reference data that other users (more than two) have verified as passing.
Step S303: a user downloads the corresponding reference data/method and carries out data (experimental) verification.
Inter-step interaction or data flow:
S303 to S302: feedback data after verification.
Step S304: once a given set of reference data and its reference method has been verified as passing by a certain number of users (more than two), the platform randomly draws test cases from the corresponding spectral library data and runs a verification test on them.
Inter-step interaction or data flow:
S304 to S305: after passing the platform verification test, the data is marked as reliable reference data.
Step S305: reliable reference data (associated with the user who provided the reference data and with the users who contributed valid verifications of it).
Inter-step interaction or data flow:
S305 to S306: reliable reference data (standard library).
Step S306: the platform data identification service provides a data identification service (interface or application) to users who need it, based on the reliable reference data (standard library), and returns the identification result/identification report for the data to be identified that the user submitted.
Inter-step interaction or data flow:
S306 to S301: call records are returned to the provider of the reliable reference data (standard library); the identification service fee is distributed in proportion to these records.
S306 to S303: call records are returned to the users who took part in verifying the reliable reference data (standard library); the identification service fee is distributed in proportion to these records.
S306 to S307: identification result/identification report.
Step S307: the service-demanding user calls a platform data identification service that meets their requirements, submits the data to be identified (paying the identification service fee per identification), and receives the identification result/identification report.
Inter-step interaction or data flow:
S307 to S306: the data to be identified is submitted and the identification service fee is paid per identification.
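The promotion path in steps S301 to S305 can be read as a small state machine. The sketch below is one possible reading, not the platform's implementation: the class name, field names, and status strings are hypothetical, "more than two" is taken to mean at least three distinct verifying users, and the platform's verification test is represented by a caller-supplied `check` function because the source does not describe it.

```python
import random
from dataclasses import dataclass, field

MIN_VERIFIERS = 3  # assumption: "more than two" users means at least three

@dataclass
class ReferenceEntry:
    provider: str
    spectra: list                       # records in the self-built standard library
    status: str = "to_be_verified"      # state after S301/S302
    passed_by: set = field(default_factory=set)
    feedback: list = field(default_factory=list)

    def record_verification(self, user, passed, note=""):
        """S303 -> S302: another user reports the outcome of a verification."""
        if user == self.provider:
            # Assumption: "other users" excludes the provider.
            raise ValueError("provider cannot verify their own data")
        if passed:
            self.passed_by.add(user)
        else:
            self.feedback.append((user, note))   # S302 -> S301 feedback

    def platform_test(self, check, sample_size=2, rng=random):
        """S304/S305: test randomly drawn records and promote the entry on success."""
        if len(self.passed_by) < MIN_VERIFIERS:
            return False
        cases = rng.sample(self.spectra, min(sample_size, len(self.spectra)))
        if all(check(case) for case in cases):
            self.status = "reliable"
        return self.status == "reliable"
```

Storing `passed_by` as a set means a user who reports the same pass twice is still counted once, which matches counting users rather than reports.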
On a cloud platform built in this way, any capable laboratory or individual can provide self-built methods and data and can also verify the self-built methods and data provided by other laboratories. First, users are charged per identification, so the fee is low and the cost of a single identification is easily affordable. Second, the laboratories that provide and verify methods earn income from user payments: the more a method is used, the more they earn. Third, laboratories are not charged for providing methods, which encourages them to take an active part in providing and verifying self-built methods. Where users need reference spectral libraries that have not yet been built, several laboratories can supply customized self-built methods and data on demand.
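The proportional distribution of identification fees can be illustrated with a short calculation. This is a sketch only: the source says that fees are distributed in proportion to records but does not give the formula or say how the provider's share relates to the verifiers' shares, so the function below simply splits a total by each participant's count. The function name and the figures are hypothetical.

```python
from fractions import Fraction

def split_fee(total_fee, valid_counts):
    """Split total_fee in proportion to each participant's valid record count."""
    total = sum(valid_counts.values())
    if total == 0:
        return {user: Fraction(0) for user in valid_counts}
    return {user: Fraction(total_fee) * Fraction(n, total)
            for user, n in valid_counts.items()}

# Hypothetical example: 100 fee units, three laboratories with 5, 3, and 2 valid records.
shares = split_fee(100, {"lab_a": 5, "lab_b": 3, "lab_c": 2})
for lab, share in shares.items():
    print(lab, share)
```

Exact fractions are used so that the shares always add up to the total; a real billing system would also need a rounding rule, which the source does not specify.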
Fig. 4 is a schematic diagram of the main service functions of the cloud platform system based on chromatography-mass spectrometry library identification; each service function is described in detail below:
User account control F001:
User account control manages user registration, user authentication, user role/permission control, and user information. On registering, the user obtains a user name (user visible), a password (user visible), a role (user visible), the permissions attached to that role (system bound), and a system ID (system bound).
The user must log in when using the browser/client program, and the system enables the service functions that match the permissions of the user's role. All subsequent operations and user data (GCMS data upload and reference method synchronization, GCMS (test library) data management, reference data management, data identification, and visualization of identification results) are bound to the user's system ID.
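The relationship between role, permissions, and system ID described for F001 can be sketched as follows. The role names, permission names, and helper functions are hypothetical, and password handling is omitted; the sketch only illustrates that permissions follow from the role and that user data carries the owner's system ID.

```python
from dataclasses import dataclass
from uuid import uuid4

# Hypothetical role -> permission table (the "system bound" part).
ROLE_PERMISSIONS = {
    "analyst": {"upload_data", "run_identification", "view_results"},
    "provider": {"upload_data", "run_identification", "view_results",
                 "publish_reference_data"},
}

@dataclass(frozen=True)
class User:
    username: str
    role: str
    system_id: str

def register(username, role):
    """Create a user with a generated system ID; reject unknown roles."""
    if role not in ROLE_PERMISSIONS:
        raise ValueError(f"unknown role: {role}")
    return User(username, role, system_id=uuid4().hex)

def is_allowed(user, action):
    """A service function is available only if the user's role grants it."""
    return action in ROLE_PERMISSIONS[user.role]

def bind(user, record):
    """Attach the owner's system ID to a piece of user data."""
    return {**record, "owner_id": user.system_id}
```

Keeping the permission table on the system side means a role change takes effect without touching any stored user data, since records reference only the system ID.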
GCMS data upload and reference method synchronization F002:
The user can collect and upload GCMS data files/method files through a browser (suitable for offline GCMS data) or a client program (suitable for online/offline GCMS data), and at the same time set the attribute items and additional attribute items of the data files/method files in the program's configuration interface. Examples include: whether a data file corresponds to a standard sample or an unknown sample; whether it is used for real-time identification; the instrument type to which a method file corresponds; the storage configuration (test database or reference database); the description of the reference (standard) method (applicable categories such as drugs or water quality, and the corresponding national standard); and the data preprocessing method.
Before identifying an unknown sample, the user can query, through the browser/client, the list of reference methods visible to their role/permissions, and select and download an applicable reference method (for example, according to the detection type). The GCMS instrument then acquires data from the unknown sample using that reference method.
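The upload attributes listed for F002 could be modeled as a small configuration record. The sketch below is illustrative: all field names are hypothetical, and the rule that only standard samples may be stored in the reference database is an assumption drawn from the way standard samples are used to build the reference database elsewhere in the description.

```python
from dataclasses import dataclass
from enum import Enum

class SampleType(Enum):
    STANDARD = "standard"
    UNKNOWN = "unknown"

class TargetDatabase(Enum):
    TEST = "test"
    REFERENCE = "reference"

@dataclass
class UploadAttributes:
    sample_type: SampleType
    target_database: TargetDatabase
    instrument_type: str
    realtime_identification: bool = False
    method_description: str = ""      # e.g. applicable category and standard
    preprocessing: tuple = ()         # e.g. ("filtering", "bad_value_elimination")

    def problems(self):
        """Return a list of human-readable validation problems (empty if none)."""
        issues = []
        if not self.instrument_type.strip():
            issues.append("instrument type is required")
        # Assumption: the reference database holds standard samples only.
        if (self.target_database is TargetDatabase.REFERENCE
                and self.sample_type is not SampleType.STANDARD):
            issues.append("only standard samples go into the reference database")
        return issues
```

Returning a list of problems instead of raising on the first one lets a configuration interface show every issue to the user at once.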
GCMS (test library) data management F003:
For the user's own data in the GCMS database (bound to the user's system ID) and for shared open GCMS data (visible to other users), the system provides multi-mode browsing (list, single spectrogram, spectrogram comparison), query, deletion, and attribute management (private data, shared open data).
Reference data management F004:
For the user's own data in the reference database (standard library) (bound to the user's system ID) and for shared open data (visible to other users), the system provides multi-mode browsing (spectrogram, spectrogram comparison, substance attributes (CAS number, molecular formula, molecular weight, etc.), corresponding reference method ID), query, deletion, and attribute management (private data, shared open data).
For the user's own data in the reference method database (bound to the user's system ID) and for shared open data (visible to other users), the system provides multi-mode browsing (method list, detailed list of method attributes (reference method ID, corresponding reference data (standard library) ID, etc.), method comparison), query, deletion, and attribute management (private data, shared open data).
Data service F005:
Data verification: a user can publish self-built GCMS reference data (a standard library) and the corresponding reference method as reference data to be verified. Other users can download the corresponding reference method, carry out data verification, and provide verification results for the data to be verified. The user who published the data can then complete and correct the reference data and the reference method using the verification feedback from other users. Once a given set of reference data and its reference method has been verified as passing by a certain number of users (more than two), the platform randomly draws test cases from the corresponding spectral library data and runs a verification test on them. The platform upgrades self-built methods and data that pass the verification test to reliable reference methods and data. After the data verification has passed, all users who took part in it share the proceeds in proportion to the number of valid verification data they contributed.
Data service release: after verification, the user can publish the reliable reference method and data as a data identification service for other users on the cloud platform. Users pay the data service provider according to the number of identifications performed.
Data service commissioning/acceptance: a user can post a service commission (such as method development or batch data identification) on the platform according to actual needs. Users who hold the corresponding reliable reference data accept the commission for a fee. A commission can be initiated or joined by several users with the same requirement, and can also be accepted jointly (divided by workload or in another way) by qualified users (those holding reliable reference data). This allows users to share the proceeds and reduces identification costs.
Data identification and visualization of identification results F006:
GCMS data identification: analysis and identification of test data is created as a task. When creating a data analysis task, the user interface provides: selection of the reference data (standard library) ID (in online identification mode, the system can select the reference data (standard library) ID automatically according to the reference method the user downloaded); designation of the GCMS data file (or group of files) to be identified in the test database; and algorithm selection and algorithm parameter settings, such as the choice of data preprocessing (filtering, bad value elimination, etc.), the retention time offset threshold, the spectrogram matching confidence threshold, the spectrogram matching algorithm, the integration algorithm (for unknown sample data), whether spectrograms are averaged, and the number of spectrograms averaged (around the spectrogram at the center of the integrated peak in the unknown sample data). When the GCMS data identification task runs, the system calculates, from the configured data source and parameters, the degree of match between each substance in the test data (unknown sample) and the reference data (standard library), and returns an identification result data set.
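Two of the task parameters named above, the retention time offset threshold and the spectrogram matching confidence threshold, can be read as filters on candidate library hits. The sketch below shows that reading under stated assumptions: similarity scores are taken as already computed by whichever matching algorithm was selected, retention times are in minutes, and all names and values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TaskParams:
    rt_offset_threshold: float   # maximum |retention time difference|, in minutes
    match_threshold: float       # minimum similarity, on a 0..1 scale

def identify(peak_rt, scores, library_rt, params):
    """Return (substance, similarity, rt_offset) hits, best similarity first.

    scores:     {substance: similarity from the selected matching algorithm}
    library_rt: {substance: retention time stored in the standard library}
    """
    hits = []
    for name, similarity in scores.items():
        offset = peak_rt - library_rt[name]
        if (abs(offset) <= params.rt_offset_threshold
                and similarity >= params.match_threshold):
            hits.append((name, similarity, offset))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical values for one integrated peak at 5.00 min.
params = TaskParams(rt_offset_threshold=0.2, match_threshold=0.8)
scores = {"toluene": 0.95, "ethylbenzene": 0.90, "styrene": 0.60}
library_rt = {"toluene": 5.10, "ethylbenzene": 7.40, "styrene": 5.05}

for name, similarity, offset in identify(5.0, scores, library_rt, params):
    print(name, similarity, round(offset, 2))
```

In this example only one candidate survives both filters: one is excluded by its retention time offset and another by its low similarity, which is the kind of per-substance outcome the identification result data set reports.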
The user can browse/query the table of data identification tasks under their account, edit and submit the identification tasks in the list, check their status (analyzing, abnormal/error, completed), and browse the details of the identification results.
Online identification results are displayed in the client program in real time as charts (forward similarity, reverse similarity, and retention time offset between the analyte (unknown sample) and the standard substance (known sample) in the reference database (standard library), the identification result (whether the substance is rejected), and so on), and an analysis report is available for download. Offline identification results can be browsed/queried in the browser/client program, and a detailed report (customizable PDF) can be downloaded/printed.
The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it is therefore intended that all such changes and modifications as fall within the true spirit and scope of the invention be considered as within the following claims.

Claims (10)

1. A cloud platform system based on gas chromatography-mass spectrometry library identification, characterized by comprising a client and a cloud platform,
the client comprises:
a GCMS instrument: collects data from a sample;
an application program/Web application service: calls the raw data file provided by the GCMS instrument and uploads it to the cloud platform;
the cloud platform comprises:
a Web application service interface: hosts the Web front-end program service, including data/operation interaction, user management, user service data management, analysis reports, and result display;
a Web background: handles front-end requests and implements the services of the Web application service;
a reference data/method quality verification program module: performs validity checking, parameter merging, and attribute binding on the raw data file;
GCMS data identification: receives GCMS raw data sent by the Web background and calls the standard library data of the reference database to identify the data; calls the test database to store and manage the data and to retrieve GCMS data; and returns the identification result to the Web background;
a reference method library: used for storing and managing GCMS reference methods;
a reference database: used for storing and managing GCMS reference data;
a test database: used for storing and managing GCMS test data.
2. An operation method of a cloud platform system based on gas chromatography-mass spectrometry library identification, characterized in that a reference method library and a reference database are established on a cloud platform, and data identification is then carried out on a sample to be tested by calling the method files and data files in the reference method library and the reference database.
3. The method of claim 2, wherein establishing the reference method library and the reference database on the cloud platform comprises the following steps:
S201: a user acquires data from a standard sample with a GCMS instrument and uploads the acquired standard sample data file and the corresponding instrument method file to the Web application service interface through an application program/Web application service;
S202: the Web application service interface passes the data files and instrument method files, through the Web background, to the reference data/method quality verification program module, which performs validity checking, parameter merging, and attribute binding on the uploaded data files and method files;
S203: the uploaded data files and method files are classified and managed; the classified data files are stored in the reference database, and the method files are stored in the reference method library.
4. The operation method of the cloud platform system based on gas chromatography-mass spectrometry library identification as claimed in claim 2 or 3, wherein carrying out data identification on the sample to be tested by calling the method files and data files in the reference method library and the reference database specifically comprises the following steps:
S204: a user requests the cloud platform to download the instrument method file corresponding to the sample to be tested;
S205: the cloud platform returns the corresponding instrument method file to the user according to the request;
S206: the user acquires raw data from the sample to be tested with a GCMS (gas chromatography-mass spectrometry) instrument, applying the downloaded instrument method file, and generates a mass spectrum data file; the mass spectrum data file is uploaded to the Web application service interface through an application program/Web application service, and the service is implemented through the Web background;
S207: filtering, bad value elimination, and peak detection are carried out on the mass spectrum data file by data preprocessing;
S208: the preprocessed mass spectrum data file is passed to GCMS data identification for storage management and identification;
S209: an identification result data set is output through the Web background;
S210: the Web background returns the identification result to the application program/Web application service through the Web application service interface for the user to review.
5. The method of claim 4, wherein in step S208, reference data matching the mass spectrum data file is retrieved from the reference database according to the mass spectrum data of the mass spectrum data file and the additional attribute parameters, and an identification result is obtained according to the matching degree of the reference data.
6. The method of claim 4, wherein the identification result data set in step S209 comprises the forward similarity, reverse similarity, and likelihood value between the mass spectrum data file and the reference data file, and the likelihood value is not greater than 100%.
7. The method of claim 4, wherein the identification results obtained by the user through the application program/Web application service are displayed as a legend, a report, and an identification report.
8. The method of claim 5, wherein the identification result obtained in step S208 is uploaded to a test database for storage.
9. The method of operation of a cloud platform system based on gas chromatography-mass spectrometry library identification as claimed in claim 3, further comprising updating the reference data and reference methods, comprising the following steps:
S301: a user publishes self-built GCMS reference data and a reference method as reference data to be verified and uploads the reference data to be verified to the cloud platform through an application program/Web application service;
S302: the Web application service interface stores the received reference data to be verified, through the Web background, in the reference database and the reference method library for other users to download and verify;
S303: other users download the corresponding reference data to be verified from the reference database and reference method library of the cloud platform and carry out data verification;
S304: after more than two users have verified as passing a given set of reference data and its reference method in the reference data to be verified, the cloud platform randomly draws test cases from the corresponding reference database and reference method library and runs a verification test on the reference data to be verified;
S305: the cloud platform marks the reference data to be verified that passes the verification test as reliable reference data, completing the update of the reference data and the reference method.
10. The method of operating a cloud platform system based on gas chromatography-mass spectrometry library identification according to claim 9, further comprising the following step after step S305:
S306: an identification result is returned to the user who issued the reference data to be authenticated.
CN202010677060.2A 2020-07-14 2020-07-14 Operation method of cloud platform framework based on gas chromatography-mass spectrometry library identification Active CN111610281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010677060.2A CN111610281B (en) 2020-07-14 2020-07-14 Operation method of cloud platform framework based on gas chromatography-mass spectrometry library identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010677060.2A CN111610281B (en) 2020-07-14 2020-07-14 Operation method of cloud platform framework based on gas chromatography-mass spectrometry library identification

Publications (2)

Publication Number Publication Date
CN111610281A true CN111610281A (en) 2020-09-01
CN111610281B CN111610281B (en) 2022-06-10

Family

ID=72200610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010677060.2A Active CN111610281B (en) 2020-07-14 2020-07-14 Operation method of cloud platform framework based on gas chromatography-mass spectrometry library identification

Country Status (1)

Country Link
CN (1) CN111610281B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130325796A1 (en) * 2012-06-05 2013-12-05 Michael Basnight System and Method for Integrating Databases in a Cloud Environment
CN106092959A (en) * 2016-06-30 2016-11-09 上海仪器仪表研究所 A kind of near-infrared food quality based on cloud platform monitoring system
US20180052893A1 (en) * 2016-08-22 2018-02-22 Eung Joon JO Database management using a matrix-assisted laser desorption/ionization time-of-flight mass spectrometer
US20200042540A1 (en) * 2017-04-17 2020-02-06 Chinese Academy Of Inspection And Quarantine Pesticide residue detection data platform based on high resolution mass spectrum, internet and data science, and method for automatically generating detection report
US20200104464A1 (en) * 2018-09-30 2020-04-02 International Business Machines Corporation A k-mer database for organism identification
CN110020665A (en) * 2019-02-12 2019-07-16 北京鑫汇普瑞科技发展有限公司 A kind of microbial biomass modal data analysis method being compatible with different flight mass spectrometers
CN110110743A (en) * 2019-03-26 2019-08-09 中国检验检疫科学研究院 A kind of seven class mass spectrogram automatic recognition system of world's common pesticides and chemical pollutant and method based on cloud platform

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BARBARA CALABRESE: "Cloud-Based Bioinformatics Tools", 《ENCYCLOPEDIA OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY》 *
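冯晋文 (Feng Jinwen): "Construction of a Cloud-Platform-Based Proteomics Data Analysis System", China Doctoral Dissertations Full-text Database, Information Science and Technology *
孙磊 (Sun Lei) et al.: "Cloud Computing Solutions for Biomedical Big Data Processing", Journal of Electronic Measurement and Instrumentation *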

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113219042A (en) * 2020-12-03 2021-08-06 深圳市步锐生物科技有限公司 Device and method for analyzing and detecting components in human body exhaled air
WO2022179444A1 (en) * 2021-02-25 2022-09-01 华谱科仪(大连)科技有限公司 Chromatographic analysis system, method for detecting and analyzing chromatogram, and electronic device
CN116561384A (en) * 2023-05-16 2023-08-08 南京中医药大学 Method for constructing molecular network and consensus spectrogram interface frame and establishing mass spectrum library
CN116561384B (en) * 2023-05-16 2023-11-03 南京中医药大学 Method for constructing molecular network and consensus spectrogram interface frame and establishing mass spectrum library

Also Published As

Publication number Publication date
CN111610281B (en) 2022-06-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant