CN113223621A - Full-chain data analysis system for biomedicine - Google Patents

Publication number
CN113223621A
Authority
CN
China
Prior art keywords
data
warehousing
analysis
module
online
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110532117.4A
Other languages
Chinese (zh)
Other versions
CN113223621B (en)
Inventor
吕晖
张悦宁
任永永
程志伟
李磊杰
顾坚磊
王晓雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-05-17
Filing date
2021-05-17
Publication date
2021-08-06
Application filed by Shanghai Jiaotong University
Priority to CN202110532117.4A
Publication of CN113223621A
Application granted
Publication of CN113223621B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00: ICT programming tools or database systems specially adapted for bioinformatics
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Epidemiology (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to a full-chain data analysis system for biomedicine, comprising a user management system, a warehousing management system and an online analysis system, each connected to a personal center module. The warehousing management system comprises a code warehousing module, a software warehousing module and a data warehousing module. The online analysis system is provided with a workflow application release subsystem; it calls the code and software held in the code warehousing module and the software warehousing module, analyzes data stored in the data warehousing module or data uploaded online, and outputs the analysis result. Compared with the prior art, the system offers wide longitudinal coverage of analysis links, wide transverse coverage of analysis objects, low development complexity, good expandability and online operation.

Description

Full-chain data analysis system for biomedicine
Technical Field
The invention relates to the technical field of data analysis, in particular to a full-chain data analysis system for biomedicine.
Background
With the rise of high-throughput omics technology in the life sciences, the understanding of disease at the molecular level has advanced at an unprecedented pace, and the concept of "precision medicine" has accelerated the combination of omics analysis with clinical research. How to use patients' molecular omics and clinical information data efficiently, so as to classify diseases finely and intervene precisely on an individual basis, has become a research hotspot in bioinformatics. Open-source biomedical data analysis systems have been developed to store, manage, share and apply massive genetic resources.
At present, biomedical analysis systems fall into three categories according to their use: data storage platforms, bioinformatics tool platforms and online analysis platforms. A data storage platform stores massive, multi-source, heterogeneous biomedical big data. Typical examples are the Sequence Read Archive (SRA) and the Gene Expression Omnibus (GEO) of the National Center for Biotechnology Information (NCBI), the ENA database of the European Bioinformatics Institute (EBI), and the GSA database of the Chinese life and health big data center (BIGD); such biological information resource storage platforms are the basis for data mining and integrative analysis. A tool platform is a collection of software or code for biomedical data analysis; it generally provides functional classification, usage instructions, download links and the like, and ranks its entries by number of users and influence. Examples are the bioinformatics software collection website OMICtools and the general-purpose software project hosting platform GitHub. An online analysis platform is a data analysis system that implements a single function or an integrated set of specific functions. Such platforms range from one-click analysis platforms for ordinary users to workflow-building systems for developers, and they differ in operating difficulty. They lower the entry threshold for analysts to some extent, so that researchers or physicians without bioinformatics training can extract valuable information from data. Examples are the Galaxy bioinformatics analysis platform from the United States, the BGI online platform of BGI in China, and hundreds or thousands of lightweight web-based bioinformatics analysis tools.
Although the prior art offers the wide range of biomedical data systems described above, it lacks a well-developed technical system design for medical information and molecular omics data. The above platforms therefore often cannot combine efficient transmission, storage, workflow development and online computational analysis at the same time, so platforms and research projects remain disconnected and isolated from one another, and their engineering applicability in biomedical projects is currently low. Innovation is therefore still needed in the key architecture, with breakthroughs in the openness, standardization and information-sharing mechanisms of the system.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art by providing a full-chain data analysis system for biomedicine that achieves cross-platform interconnection by calling the corresponding IDs (identification numbers), lets users freely develop, combine and build workflows, and covers every link of biomedical data analysis.
The purpose of the invention can be realized by the following technical scheme:
A full-chain data analysis system for biomedicine comprises a user management system, a warehousing management system and an online analysis system, each connected to a personal center module. The warehousing management system comprises a code warehousing module, a software warehousing module and a data warehousing module. The online analysis system is provided with a workflow application release subsystem; it calls the code and software in the code warehousing module and the software warehousing module, analyzes data stored in the data warehousing module or data uploaded online, and outputs an analysis result.
The user management system audits registration information. After the audit is passed, it adds the user ID corresponding to the registration information to the personal center module and opens all systems and modules of the data analysis system to the audited user ID.
Further, the registration information comprises basic information such as email address, name and affiliation.
The warehousing management system packages and classifies the codes and the software through the code warehousing module and the software warehousing module.
The code warehousing module holds the code uploaded by users together with its basic information; once released by the workflow application release subsystem, the code is run online by the online analysis system.
The software warehousing module holds the software images uploaded by users together with the corresponding image information; the software image and image information provide the running environment for the code in the code warehousing module.
The data stored in the data warehousing module specifically comprises original data to be analyzed and corresponding data description information uploaded by a user.
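The three warehousing modules and the ID-based calling they enable can be pictured with a small in-memory sketch. Everything here is illustrative: the class names, the ID prefixes (`C`, `S`, `D`) and the field names are assumptions made for the example, since the patent only states that uploaded code, software images and data carry descriptive information and are referenced by ID.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict

# Hypothetical ID prefixes; the patent only says entries are referenced by ID.
PREFIXES = {"code": "C", "software": "S", "data": "D"}


@dataclass(frozen=True)
class WarehouseEntry:
    entry_id: str
    kind: str          # "code", "software" or "data"
    owner_id: str      # user ID issued after registration is audited
    filename: str
    description: str   # basic information, image information or data description


class Warehouse:
    """In-memory stand-in for the code, software and data warehousing modules."""

    def __init__(self) -> None:
        self._entries: Dict[str, WarehouseEntry] = {}
        self._serial = count(1)

    def deposit(self, kind: str, owner_id: str, filename: str, description: str) -> WarehouseEntry:
        """Store an upload and return the entry with its newly issued ID."""
        if kind not in PREFIXES:
            raise ValueError(f"unknown warehousing kind: {kind!r}")
        entry_id = f"{PREFIXES[kind]}{next(self._serial):06d}"
        entry = WarehouseEntry(entry_id, kind, owner_id, filename, description)
        self._entries[entry_id] = entry
        return entry

    def fetch(self, entry_id: str, kind: str) -> WarehouseEntry:
        """Look an entry up by ID and check that it is of the expected kind."""
        entry = self._entries[entry_id]  # raises KeyError for an unknown ID
        if entry.kind != kind:
            raise ValueError(f"{entry_id} is a {entry.kind} entry, not {kind}")
        return entry


warehouse = Warehouse()
code = warehouse.deposit("code", "U0001", "align.py", "read alignment script")
image = warehouse.deposit("software", "U0001", "aligner.sif", "aligner runtime image")
data = warehouse.deposit("data", "U0002", "sample01.fastq.gz", "raw reads, sample 01")
print(code.entry_id, image.entry_id, data.entry_id)
```

A workflow in this sketch would then store only the three IDs and resolve them with `fetch` at run time, which is one way to read the "calling corresponding IDs" idea.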
The functions of the personal center module include management operations on the contents of the warehousing management system and on the analysis results of the online analysis system.
Further, the management operations in the personal center module specifically comprise add, delete, query and modify operations.
The online analysis system supports efficient transmission and privacy protection.
The analysis mode of the online analysis system comprises omics data analysis, medical information analysis, multi-level data integration analysis and visual analysis.
The functions of the online analysis system also include adjusting the parameters of the workflow application release subsystem.
The online analysis system is connected with a high-performance computing platform, and online data analysis is carried out through a high-performance computing cluster of the high-performance computing platform.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention has wide longitudinal coverage of analysis links. It covers every link of biomedical data analysis and integrates the data, code and software required for analysis into one system, so users do not need to look for a separate platform for each link. Compared with existing biomedical data analysis systems, its coverage is more comprehensive and its user audience wider.
2. The invention has wide transverse coverage of analysis objects. Its top-level architecture is designed for the whole field: besides the bioinformatics resources included in existing platforms, it also meets the multi-level data analysis needs of medical informatics and medical image analysis, which widens the field boundary of the analysis objects and accommodates the growing number of medical data analysis modules.
3. The invention is developer-friendly. It integrates data, code and software from different layers of biomedical data analysis into one system, classifies and manages each of them independently, connects them in series within workflow modules, and provides an integrated development environment, which reduces development complexity for users and avoids redundant development.
4. The invention adopts an open, user-driven development architecture that gives users a very high degree of freedom in building workflows, so the system can respond flexibly and quickly to the diversifying analysis purposes and resources that come with technical development in biomedical data analysis. On the basis of its existing classification architecture, the system therefore has good expandability and strong potential.
5. All included workflows support online operation. Because each workflow is packaged with fixed versions of its code and software running environment, its repeatability is guaranteed; and because the resource scheduling scheme of a high-performance computing platform is used, properties such as the speed and usability of a workflow can be evaluated more directly and objectively.
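The repeatability argument in item 5 rests on each workflow being tied to fixed versions of its code and running environment. One way to make that checkable, shown here only as a sketch, is to derive a fingerprint from the code content, an image digest and the parameters. The function name `workflow_fingerprint` and the choice of SHA-256 over canonical JSON are assumptions for illustration; the patent does not describe such a mechanism.

```python
import hashlib
import json


def workflow_fingerprint(code_bytes: bytes, image_digest: str, parameters: dict) -> str:
    """Return a hex digest that changes whenever code, image or parameters change."""
    payload = {
        "code_sha256": hashlib.sha256(code_bytes).hexdigest(),
        "image_digest": image_digest,   # e.g. a digest recorded when the image was uploaded
        "parameters": parameters,
    }
    # sort_keys makes the serialization independent of dict insertion order
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = workflow_fingerprint(b"print('align')\n", "sha256:abc123", {"threads": 4, "min_quality": 20})
same = workflow_fingerprint(b"print('align')\n", "sha256:abc123", {"min_quality": 20, "threads": 4})
changed = workflow_fingerprint(b"print('align v2')\n", "sha256:abc123", {"threads": 4, "min_quality": 20})
print(first == same, first == changed)
```

Two runs with the same fingerprint used the same code, image and parameters, so differences in their results would point to the input data or the platform rather than to the workflow itself.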
Drawings
Fig. 1 is a schematic structural diagram of the invention;
Fig. 2 is a schematic operation flow chart of an embodiment of the invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
Examples
As shown in Fig. 1, a full-chain data analysis system for biomedicine comprises a user management system, a warehousing management system and an online analysis system, each connected to a personal center module. The warehousing management system comprises a code warehousing module, a software warehousing module and a data warehousing module. The online analysis system is provided with a workflow application release subsystem; it calls the code and software in the code warehousing module and the software warehousing module, analyzes data stored in the data warehousing module or data uploaded online, and outputs an analysis result.
In this embodiment, the data analysis system is supported by a high-performance computing platform, and the underlying resource support includes a cloud storage server, a Web server, and a computing resource scheduling system.
The user management system audits registration information. After the audit is passed, it adds the user ID corresponding to the registration information to the personal center module and opens all systems and modules of the data analysis system to the audited user ID.
The registration information comprises basic information such as email address, name and affiliation.
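The audit-then-issue-ID behaviour described above can be sketched as follows. The names `Registration`, `PersonalCenter` and `audit`, the `U0001` ID format and the completeness check are all assumptions made for the example; the patent only states that registration information is audited and that a user ID is added to the personal center module once the audit passes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Registration:
    email: str
    name: str
    affiliation: str


class PersonalCenter:
    """Holds the user IDs that have passed the audit (in memory, for illustration)."""

    def __init__(self) -> None:
        self.users: Dict[str, Registration] = {}

    def add_user(self, registration: Registration) -> str:
        user_id = f"U{len(self.users) + 1:04d}"   # hypothetical ID format
        self.users[user_id] = registration
        return user_id


def audit(registration: Registration, approved_by_reviewer: bool,
          center: PersonalCenter) -> Optional[str]:
    """Return a new user ID if the registration is complete and approved, else None."""
    fields = (registration.email, registration.name, registration.affiliation)
    complete = all(value.strip() for value in fields) and "@" in registration.email
    if not (complete and approved_by_reviewer):
        return None
    return center.add_user(registration)


center = PersonalCenter()
accepted = audit(Registration("li@example.edu", "Li", "Example Lab"), True, center)
rejected = audit(Registration("", "Wang", "Example Lab"), True, center)
print(accepted, rejected)
```

The `approved_by_reviewer` flag stands in for the manual verification mentioned in step 1 of the embodiment.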
In this embodiment, the user management system is built on a Web server, and the Web client serves as the front-most presentation layer that interacts with the user; its functions include HTTP requests, data acquisition, view resolution and page return.
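The four functions named above (HTTP request, data acquisition, view resolution, page return) can be traced through a single request in the sketch below. It is a pure function rather than a running server, and the `/results` path, the `id` query parameter and the `RESULTS` table are hypothetical; the patent does not specify URLs or a web framework.

```python
from html import escape
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlparse

# Hypothetical result table standing in for the back-end data source.
RESULTS: Dict[str, str] = {"R0001": "finished"}


def handle(url: str) -> Tuple[int, str]:
    """Process one GET request URL and return (status code, HTML page)."""
    # 1. HTTP request: split the URL into path and query parameters
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    if parsed.path != "/results":
        return 404, "<h1>Not found</h1>"

    # 2. Data acquisition: look up the requested record
    result_id = query.get("id", [""])[0]
    status = RESULTS.get(result_id)
    if status is None:
        return 404, "<h1>Unknown result</h1>"

    # 3. View resolution: fill the page template, escaping user-supplied text
    page = f"<h1>{escape(result_id)}</h1><p>{escape(status)}</p>"

    # 4. Page return
    return 200, page


print(handle("/results?id=R0001"))
```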
The warehousing management system packages and classifies the codes and the software through the code warehousing module and the software warehousing module.
The code warehousing module holds the code uploaded by users together with its basic information; once released by the workflow application release subsystem, the code is run online by the online analysis system.
The software warehousing module holds the software images uploaded by users together with the corresponding image information; the software image and image information provide the running environment for the code in the code warehousing module.
The data stored in the data warehousing module specifically comprises original data to be analyzed and corresponding data description information uploaded by a user.
The personal center module is responsible for maintaining user information and for managing uploaded warehousing entries, workflows and analysis results; the specific management operations comprise add, delete, query and modify operations.
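The four management operations can be illustrated with a minimal record store. The class `RecordStore` and its method and field names are assumptions for the example; a real deployment would sit on a database, which the patent does not name.

```python
from typing import Any, Dict, List


class RecordStore:
    """Add, delete, query and modify records kept per user (in memory only)."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}

    def add(self, record_id: str, **fields: Any) -> None:
        if record_id in self._records:
            raise KeyError(f"{record_id} already exists")
        self._records[record_id] = dict(fields)

    def delete(self, record_id: str) -> None:
        del self._records[record_id]          # raises KeyError if absent

    def query(self, **criteria: Any) -> List[str]:
        """Return the sorted IDs of records whose fields match every criterion."""
        return sorted(
            record_id
            for record_id, fields in self._records.items()
            if all(fields.get(key) == value for key, value in criteria.items())
        )

    def modify(self, record_id: str, **changes: Any) -> None:
        self._records[record_id].update(changes)   # raises KeyError if absent


store = RecordStore()
store.add("R0001", owner="U0001", kind="result", status="running")
store.add("W0001", owner="U0001", kind="workflow", status="released")
store.modify("R0001", status="finished")
print(store.query(owner="U0001", status="finished"))
store.delete("W0001")
print(store.query(owner="U0001"))
```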
The online analysis system supports efficient transmission and privacy protection.
The analysis mode of the online analysis system comprises omics data analysis, medical information analysis, multi-level data integration analysis and visual analysis.
The functions of the online analysis system also include adjusting the parameters of the workflow application release subsystem.
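Parameter adjustment can be pictured as merging user overrides into the defaults that a released workflow declares. The sketch below assumes a flat dictionary of parameters and rejects unknown names or mismatched types; the function name `resolve_parameters` and the parameter names are hypothetical.

```python
from typing import Any, Dict


def resolve_parameters(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return the defaults with the user's overrides applied, after validation."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    merged = dict(defaults)                      # leave the released defaults untouched
    for name, value in overrides.items():
        expected_type = type(defaults[name])
        if not isinstance(value, expected_type):
            raise TypeError(f"{name} must be {expected_type.__name__}")
        merged[name] = value
    return merged


released_defaults = {"threads": 4, "min_quality": 20, "reference": "hg38"}
adjusted = resolve_parameters(released_defaults, {"threads": 8})
print(adjusted)
```

Keeping the released defaults immutable and returning a merged copy means each analysis run records exactly which values it used.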
The online analysis system is connected with a high-performance computing platform, and online data analysis is carried out through a high-performance computing cluster of the high-performance computing platform.
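How an analysis request might reach the computing cluster can be sketched as the generation of a batch script. The patent names neither a scheduler nor a container runtime, so the use of Slurm `#SBATCH` directives and `singularity exec` below is purely an assumption for illustration, as are the function name and the file paths. The function only builds the script text; it does not submit anything.

```python
import shlex
from typing import Sequence


def build_job_script(job_name: str, image_path: str, command: Sequence[str],
                     cpus: int = 4, memory_gb: int = 8) -> str:
    """Build a batch script that runs `command` inside the given software image.

    Assumes a Slurm-style scheduler and a Singularity-style runtime; job_name
    is expected to contain no whitespace.
    """
    container_command = ["singularity", "exec", image_path, *command]
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={memory_gb}G",
        f"#SBATCH --output={job_name}.%j.log",   # %j expands to the job ID
        "",
        shlex.join(container_command),           # quote arguments safely for the shell
    ]
    return "\n".join(lines) + "\n"


script = build_job_script(
    "wf42",
    "/images/S000002.sif",
    ["python", "align.py", "--threads", "4"],
)
print(script)
```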
As shown in fig. 2, the implementation steps of the present invention are specifically as follows:
in step 1, a new user registers with the system on a Web page, obtains login permission after manual verification, and then proceeds to step 2;
in step 2, the purpose of use entered by the user is obtained and judged: if the purpose is to develop an analysis workflow, step 31 is executed; if the purpose is only data analysis, step 32 is executed;
in step 31, the user enters the warehousing management system and executes steps 411 and 412, uploading the code and the software image to the code warehousing module and the software warehousing module respectively and filling in the information for each file on the page; step 51 is then executed to edit and release a workflow;
in step 32, the user enters the online analysis system and executes step 421, selecting the analysis type (omics data analysis, medical information analysis, multi-level data integration analysis or visual analysis); step 52 is then executed to select a workflow already stored in the system and provided by other users, followed by step 6;
in step 6, it is judged whether the data to be analyzed is already stored in the data warehousing module of the system: if so (that is, step 413 has been executed), the corresponding warehousing ID is filled in and the flow goes directly to step 8; if not, the flow goes to step 7;
in step 7, the data to be analyzed is uploaded online in a compressed format, and the flow goes to step 8;
in step 8, the user chooses to execute the analysis; the command is transmitted to the back end, the workflow code is run through the job scheduling system of the high-performance computer, and the flow goes to step 9;
in step 9, the analysis result is presented on the web page, and the user can download or delete it.
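The branching in the steps above can be summarized as a small routing function that returns the sequence of step numbers a given user would pass through. The function name `plan_steps` and the purpose labels `"develop"` and `"analyze"` are assumptions for the example. The developer branch stops at step 51 because the description does not say which step follows it.

```python
from typing import List, Optional


def plan_steps(purpose: str, warehousing_id: Optional[str] = None) -> List[str]:
    """Return the ordered step numbers for a user with the given purpose."""
    steps = ["1", "2"]                      # register and verify, then state the purpose
    if purpose == "develop":
        # enter warehousing, upload code (411) and image (412), edit and release (51)
        steps += ["31", "411", "412", "51"]
        return steps                        # the source does not state what follows step 51
    if purpose == "analyze":
        # enter online analysis, pick analysis type (421), pick a workflow (52)
        steps += ["32", "421", "52", "6"]
        if warehousing_id is not None:
            steps.append("413")             # data already warehoused: supply its ID
        else:
            steps.append("7")               # otherwise upload compressed data online
        steps += ["8", "9"]                 # run on the cluster, then present the result
        return steps
    raise ValueError(f"unknown purpose: {purpose!r}")


print(plan_steps("analyze", warehousing_id="D000003"))
print(plan_steps("analyze"))
print(plan_steps("develop"))
```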
In addition, it should be noted that the names used for the specific embodiments described in this specification may differ, and the above description only illustrates the structure of the invention. All equivalent or simple changes made according to the structure, characteristics and principles of the invention fall within its scope of protection. Those skilled in the art may make various modifications or additions to the described embodiments, or adopt similar methods, without departing from the scope of the invention as defined in the appended claims.

Claims (10)

1. A full-chain data analysis system for biomedicine, characterized by comprising a user management system, a warehousing management system and an online analysis system which are respectively connected with a personal center module, wherein the warehousing management system comprises a code warehousing module, a software warehousing module and a data warehousing module, and the online analysis system is provided with a workflow application release subsystem, calls the code and software in the code warehousing module and the software warehousing module, analyzes data stored in the data warehousing module or data uploaded online, and outputs an analysis result.
2. The full-chain data analysis system for biomedicine according to claim 1, characterized in that the functions of the user management system comprise auditing registration information and, after the audit is passed, adding the user ID corresponding to the registration information to the personal center module.
3. The full-chain data analysis system for biomedicine according to claim 2, characterized in that the registration information comprises basic information such as email address, name and affiliation.
4. The full-chain data analysis system for biomedicine according to claim 1, characterized in that the code warehousing module comprises code and code basic information uploaded by a user, and the code is released by the workflow application release subsystem and then run online by the online analysis system.
5. The full-chain data analysis system for biomedicine according to claim 1, characterized in that the software warehousing module comprises a software image uploaded by a user and corresponding image information, and the running environment of the code warehousing module is provided through the software image and the image information.
6. The full-chain data analysis system for biomedicine according to claim 1, characterized in that the data stored in the data warehousing module specifically comprises raw data to be analyzed and corresponding data description information uploaded by a user.
7. The full-chain data analysis system for biomedicine according to claim 1, characterized in that the functions of the personal center module comprise management operations on the contents of the warehousing management system and on the analysis results of the online analysis system.
8. The full-chain data analysis system for biomedicine according to claim 7, characterized in that the management operations in the personal center module specifically comprise add, delete, query and modify operations.
9. The full-chain data analysis system for biomedicine according to claim 1, characterized in that the functions of the online analysis system further comprise adjusting parameters of the workflow application release subsystem.
10. The full-chain data analysis system for biomedicine according to claim 1, characterized in that a high-performance computing platform is connected to the online analysis system, and online data analysis is performed through a high-performance computing cluster of the high-performance computing platform.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110532117.4A CN113223621B (en) 2021-05-17 2021-05-17 Full-chain data analysis system for biomedicine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110532117.4A CN113223621B (en) 2021-05-17 2021-05-17 Full-chain data analysis system for biomedicine

Publications (2)

Publication Number Publication Date
CN113223621A (en) 2021-08-06
CN113223621B (en) 2023-10-31

Family

ID=77092153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110532117.4A Active CN113223621B (en) 2021-05-17 2021-05-17 Full-chain data analysis system for biomedicine

Country Status (1)

Country Link
CN (1) CN113223621B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022007A (en) * 2016-06-14 2016-10-12 中国科学院北京基因组研究所 Cloud platform system and method oriented to biological omics big data calculation
CN108694305A (en) * 2018-03-30 2018-10-23 武汉光谷创赢生物技术开发有限公司 Analysis of biological information platform based on cloud computing
CN109192248A (en) * 2017-07-21 2019-01-11 上海桑格信息技术有限公司 Biological information analysis system, method and cloud computing platform system based on cloud platform
CN109324804A (en) * 2018-01-09 2019-02-12 上海交通大学医学院附属瑞金医院 For downloading the method and system of installation and management bioinformatics software and database
CN109448788A (en) * 2018-10-24 2019-03-08 广州基迪奥生物科技有限公司 On-line analysis platform architecture of microbiology of genomics and bioinformatics
CN111324671A (en) * 2020-03-02 2020-06-23 苏州工业园区洛加大先进技术研究院 Biomedical high-speed information processing and analyzing system based on big data technology
CN111367978A (en) * 2020-03-02 2020-07-03 苏州工业园区洛加大先进技术研究院 Biological medical information processing and analyzing system combining omics data and clinical data
CN111666356A (en) * 2020-08-10 2020-09-15 南京江北新区生物医药公共服务平台有限公司 Bioinformatics analysis PaaS cloud platform system based on Galaxy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
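Guo Huazhang et al.: "Establishing a campus bioinformatics platform based on public software", Journal of the Fourth Military Medical University *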

Also Published As

Publication number Publication date
CN113223621B (en) 2023-10-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant