CN109656910B - Extensible large-scale biomedical sample management and visualization platform - Google Patents

Publication number
CN109656910B
Authority
CN
China
Prior art keywords
sample
platform
user
search
data information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811487666.9A
Other languages
Chinese (zh)
Other versions
CN109656910A (en)
Inventor
臧天仪
刘春圃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN201811487666.9A
Publication of CN109656910A
Application granted
Publication of CN109656910B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention is an extensible large-scale biomedical sample visualization platform. The platform comprises a sample statistics module, a visualization module, a retrieval module, and a MongoDB database system. A user logs in to the platform for identity and authority verification; once verified, the user adds data information to the platform and uploads it. The added data information undergoes quality inspection, and data that passes is stored in the MongoDB database system. The user searches the sample information in the MongoDB database by combining multiple conditions and views statistics of the search results. After retrieval, the platform compiles statistics on the sample search results and presents them to the user as visual charts and similar forms. For massive biomedical sample collections, the platform supports more convenient distributed deployment and expandable storage, offers more convenient operation, and greatly improves the efficiency of biomedical sample management and use.

Description

Extensible large-scale biomedical sample management and visualization platform
Technical Field
The invention relates to the field of biology or medicine, in particular to an extensible large-scale biomedical sample management and visualization platform.
Background
With the rapid development of modern biomedicine, the volume of biomedical data keeps growing, and data on biomedical samples is a critical part of it. These data are derived from real organisms and are of great significance for research in medicine and biology. The management of biomedical samples therefore deserves close attention.
At present, most domestic platforms for managing or retrieving biological and medical information are built around literature management, and a management platform dedicated to biomedical samples is lacking. The biomedical sample management platform established in this patent can manage sample information better and help researchers make better use of it.
Most existing sample library systems use relational databases such as MySQL. The relational database is a classical design, but it has problems with storage scale and extensibility. When a large amount of data must be stored, a conventional relational database is difficult to deploy and store in a distributed fashion, which limits the amount of data it can hold. When the data volume in a relational database grows large, the speed and efficiency of database operations drop considerably, which degrades the experience of using the sample library system. Improving the platform's operating efficiency with newer storage management techniques is therefore a problem worth addressing. This patent adopts a non-relational database to solve the above problems: data storage can be expanded conveniently, database operations become more efficient, and the user experience improves.
In addition, sample quality in most sample library systems is uneven, and sample information entering the library lacks corresponding checks and controls, so users of the system can hardly obtain truly valuable data. To address this problem, a sample quality check is added to the process of entering samples into storage, and checking the sample data improves the quality of the sample information that enters the library.
In most sample libraries the retrieval form is limited: retrieval is possible only by certain items or under certain conditions, the retrieval items and retrieval form cannot be chosen freely, and users' varied requirements are sometimes hard to meet. The search results also lack corresponding statistics and visualization, so system users cannot get an intuitive, overall grasp of the samples a search returns.
Disclosure of Invention
To make the management of massive biomedical samples more convenient and efficient, the invention provides an extensible large-scale biomedical sample management and visualization platform with the following technical scheme:
An extensible large-scale biomedical sample management and visualization platform comprises a sample statistics module, a visualization module, a retrieval module, and a MongoDB database. The platform integrates the functions of the MongoDB database, which is responsible for data storage, the addition of sample data, and similar operations.
Preferably, the platform uses XML Schema technology to control the data quality of different data items.
Preferably, the platform defines sample metadata using XML technology and uses Excel files as the carrier of sample data information.
Preferably, adding Key-Value pairs to the MongoDB database increases the amount of data information held and improves operation efficiency.
A method of operating the extensible large-scale biomedical sample management and visualization platform comprises the following steps:
Step one: a user logs in to the extensible large-scale biomedical sample management and visualization platform for identity and authority verification;
Step two: after completing authority verification, the user adds data information to the platform and uploads it;
Step three: the added data information undergoes quality inspection; data information that fails the inspection is not stored in the MongoDB database and is returned to the client for error correction, and data information that passes the inspection is stored in the MongoDB database;
Step four: the user searches the sample information in the MongoDB database by combining multiple conditions and views statistics of the search results;
Step five: after retrieval, the platform compiles statistics on the sample search results and presents them to the user as visual charts.
Preferably, the multi-condition comprehensive search performs logical retrieval by combining arbitrary search items with AND, OR, and NOT.
The invention has the following beneficial effects:
1. The platform adopts MongoDB, a non-relational database oriented to big data. Compared with a traditional relational database it can hold a larger volume of sample data, supports more convenient distributed deployment and storage expansion, and operates faster, which greatly improves the platform's service efficiency.
2. The platform provides quality control of samples. Sample data in an uploaded sample file is checked during entry into storage, which prevents samples with excessive missing data or incorrectly formatted values from being stored, guarantees the quality of the samples in the database to a certain extent, and lets platform users make better use of the sample data.
3. The platform provides multi-condition comprehensive retrieval of samples. Compared with other management platforms, its retrieval module lets users add retrieval items freely and supports fuzzy search and AND/OR/NOT logical search, so the platform better meets users' differing search requirements.
4. The platform better supports the statistics and visualization of sample data. Its sample statistics and visualization modules provide various types of sample statistics, so users can see the distribution of samples in the platform more intuitively.
Drawings
FIG. 1 is a flow chart of the method of operating the extensible large-scale biomedical sample management and visualization platform.
Detailed Description
The present invention will be described in detail with reference to specific examples.
The first embodiment is as follows:
An extensible large-scale biomedical sample management and visualization platform comprises a sample statistics module, a visualization module, a retrieval module, and a MongoDB database, and the platform is connected to the MongoDB database.
The platform uses the non-relational database MongoDB for data storage management; adding sample data and other such operations are handled by MongoDB. Under current requirements the cluster contains only one database server. As the data volume grows and the demand for operation efficiency rises, more database servers can be added to the cluster to expand data storage, and data backup and load balancing can be introduced at the same time to further improve the reliability and availability of the system.
The platform provides quality control of samples. Sample data in an uploaded sample file is checked during entry into storage, which prevents samples with excessive missing data or incorrectly formatted values from being stored, guarantees the quality of the samples in the database to a certain extent, and lets platform users make better use of the sample data.
The platform provides multi-condition comprehensive retrieval of samples. Compared with other management platforms, its retrieval module lets users add retrieval items freely and supports fuzzy search and AND/OR/NOT logical search, so the platform better meets users' search requirements.
The platform better supports the statistics and visualization of sample data. Its sample statistics and visualization modules provide various types of sample statistics, so users can see the distribution of samples in the platform more intuitively.
The second embodiment is as follows:
A method of operating the extensible large-scale biomedical sample management and visualization platform comprises the following steps:
Step one: a user logs in to the extensible large-scale biomedical sample management and visualization platform for identity and authority verification;
Step two: after completing authority verification, the user adds data information to the platform and uploads it;
Step three: the added data information undergoes quality inspection; data information that fails the inspection is not stored in the MongoDB database and is returned to the client for error correction, and data information that passes the inspection is stored in the MongoDB database;
Step four: the user searches the sample information in the MongoDB database by combining multiple conditions and views statistics of the search results;
Step five: after retrieval, the platform compiles statistics on the sample search results and presents them to the user as visual charts.
A user of the platform must log in for identity and authority verification. Different accounts can be granted different authorities, and each authority permits different operations, which protects the security of the sample data.
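As a minimal sketch of the authority check described above: the patent does not name the roles or operations, so the role names (`admin`, `uploader`, `viewer`) and operation names below are hypothetical, and a real deployment would also need the login step itself.

```python
# Hypothetical roles and operations; the patent does not name them.
ROLE_PERMISSIONS = {
    "admin": {"upload", "search", "view_statistics", "manage_users"},
    "uploader": {"upload", "search", "view_statistics"},
    "viewer": {"search", "view_statistics"},
}


def is_allowed(role, operation):
    """Return True if the given role may perform the operation."""
    return operation in ROLE_PERMISSIONS.get(role, set())


def require(role, operation):
    """Raise PermissionError when the role lacks the permission."""
    if not is_allowed(role, operation):
        raise PermissionError(f"role {role!r} may not perform {operation!r}")
```

Calling `require(role, "upload")` before accepting a sample file would keep accounts without upload authority from adding data, while unknown roles are denied by default.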
After identity and authority verification, a user with the corresponding authority can add sample data to the platform. For ease of use, the platform adopts Excel files as the carrier of sample information: the user only needs to fill in the sample information in an Excel sheet according to the specification and upload the sample file through the interface. To standardize the quality of sample information, the platform provides a corresponding Excel template for uploaders, which can be downloaded from the corresponding interface. Uploading sample information with the platform's template gives the uploaded samples better quality and a higher inspection pass rate.
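One way to picture the step from a filled-in template to storable records is the sketch below. It assumes the sheet has already been read into a header row and data rows by a spreadsheet library (not shown), and the column names are hypothetical; the patent does not list the template's columns.

```python
def rows_to_documents(header, rows):
    """Pair each cell with its column name, skipping empty cells."""
    documents = []
    for row in rows:
        doc = {}
        for name, cell in zip(header, row):
            if cell is None or (isinstance(cell, str) and cell.strip() == ""):
                continue  # leave the key out so a later quality check can report it
            doc[name] = cell.strip() if isinstance(cell, str) else cell
        documents.append(doc)
    return documents


# Hypothetical template columns and two example rows.
header = ["sample_id", "gender", "institution", "volume_ml"]
rows = [
    ["S-0001", "female", "Institution A", 2.5],
    ["S-0002", "", "Institution B", None],
]
docs = rows_to_documents(header, rows)
```

Each resulting dictionary is a set of key-value pairs, which is the shape a document database stores directly; rows with blank cells simply carry fewer keys.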
During upload, the sample information in the uploaded file undergoes a quality check. The platform uses XML Schema technology to control the data quality of different data items, which detects missing data items and data items that do not conform to the data format specification. If the sample information in an uploaded file fails the quality check, it is not entered into the database; the platform returns the non-conforming information to the user so that the errors can be corrected. If the sample information passes the quality check, the sample is added to the database. In this method, adding Key-Value pairs to the MongoDB database increases the amount of data information held and improves operation efficiency.
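The quality check can be illustrated with a simplified stand-in for XML Schema validation. The sketch reads field rules from a small XML definition using Python's standard library instead of validating against a real XSD, and the element names, attribute names, and fields are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical metadata definition; element and attribute names are assumptions.
METADATA_XML = """
<metadata>
  <field name="sample_id" required="true" pattern="S-[0-9]{4}"/>
  <field name="gender" required="true" pattern="male|female|unknown"/>
  <field name="institution" required="true"/>
  <field name="collected_on" required="false" pattern="[0-9]{4}-[0-9]{2}-[0-9]{2}"/>
</metadata>
"""


def load_rules(xml_text):
    """Parse the field rules out of the XML metadata definition."""
    rules = []
    for field in ET.fromstring(xml_text).findall("field"):
        rules.append({
            "name": field.get("name"),
            "required": field.get("required") == "true",
            "pattern": field.get("pattern"),
        })
    return rules


def check_sample(sample, rules):
    """Return a list of problems; an empty list means the sample passes."""
    problems = []
    for rule in rules:
        value = sample.get(rule["name"])
        if value is None or value == "":
            if rule["required"]:
                problems.append(f"{rule['name']}: missing required item")
            continue
        if rule["pattern"] and not re.fullmatch(rule["pattern"], str(value)):
            problems.append(f"{rule['name']}: value {value!r} has the wrong format")
    return problems


def split_by_quality(samples, rules):
    """Separate samples to store from samples to return to the uploader."""
    accepted, rejected = [], []
    for sample in samples:
        problems = check_sample(sample, rules)
        if problems:
            rejected.append((sample, problems))
        else:
            accepted.append(sample)
    return accepted, rejected
```

Only the `accepted` list would be written to the database; each entry in `rejected` carries the messages that would be sent back to the uploader for correction.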
The user can retrieve the sample information stored in the platform database. The platform provides multiple retrieval modes, supporting any combination of search items and logical retrieval with AND, OR, and NOT; the user selects the search items and search form to use. After the search, the platform compiles simple statistics on the sample search results, such as statistics by gender and by sample storage institution, and presents them to the user as visual charts.
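A combined AND/OR/NOT search of this kind can be expressed as a MongoDB filter document. The sketch below only builds the filter as a Python dictionary using the standard `$and`, `$or`, `$nor`, and `$regex` operators; the helper names and the field names are hypothetical, and the patent does not say how the platform constructs its queries.

```python
import re


def term(field, value, fuzzy=False):
    """One search item: exact match, or case-insensitive substring match."""
    if fuzzy:
        # Escape the user's text so it is matched literally inside the regex.
        return {field: {"$regex": re.escape(value), "$options": "i"}}
    return {field: value}


def all_of(*conditions):
    """AND: every condition must hold."""
    return {"$and": list(conditions)}


def any_of(*conditions):
    """OR: at least one condition must hold."""
    return {"$or": list(conditions)}


def none_of(*conditions):
    """NOT: none of the conditions may hold."""
    return {"$nor": list(conditions)}


# Female samples stored at Institution A or B whose diagnosis does not contain "diabet".
query = all_of(
    term("gender", "female"),
    any_of(term("institution", "Institution A"), term("institution", "Institution B")),
    none_of(term("diagnosis", "diabet", fuzzy=True)),
)
```

With a driver such as pymongo, a filter like `query` could be passed to a collection's `find` method; because the helpers nest freely, any combination of search items the user selects maps onto one filter.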
Besides searching, the user can view statistics and visualizations of the data across the whole platform in various forms, for example: the distribution of samples among different sample storage institutions, the distribution of samples within each institution, and the distribution of samples across the country.
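The counts behind such charts can be sketched as follows. The field names are hypothetical, and the patent does not state whether the platform counts in application code or inside the database, so both variants are shown: a `Counter` over records already fetched, and the equivalent MongoDB aggregation stages as a plain list.

```python
from collections import Counter


def count_by(samples, field, missing="unknown"):
    """Count samples per value of one field, ready to feed a chart."""
    return Counter(sample.get(field, missing) for sample in samples)


def group_pipeline(field):
    """Equivalent MongoDB aggregation stages for counting on the server."""
    return [
        {"$group": {"_id": f"${field}", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]


# Hypothetical search results.
results = [
    {"sample_id": "S-0001", "gender": "female", "institution": "Institution A"},
    {"sample_id": "S-0002", "gender": "male", "institution": "Institution A"},
    {"sample_id": "S-0003", "gender": "female", "institution": "Institution B"},
]
by_gender = count_by(results, "gender")
by_institution = count_by(results, "institution")
```

The label-to-count mappings in `by_gender` and `by_institution` are the form a bar or pie chart needs; for platform-wide statistics, running the grouping in the database avoids transferring every record.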
The above is only a preferred embodiment of the extensible large-scale biomedical sample management and visualization platform. The scope of protection is not limited to the above embodiments; all technical solutions that follow the same idea fall within the scope of protection of the invention. It should be noted that those skilled in the art can make modifications and variations without departing from the principles of the invention, and such modifications and variations should also be considered within the scope of the invention.

Claims (3)

1. An extensible large-scale biomedical sample management and visualization platform, characterized in that: the platform comprises a sample statistics module, a visualization module, a retrieval module, and a MongoDB database; the platform is connected to the MongoDB database, and the MongoDB database is responsible for data storage and the addition of sample data; the platform uses XML Schema technology to control the data quality of different data items, uses XML technology to define metadata, and uses Excel files as the carrier of sample data information; adding Key-Value pairs to the MongoDB database increases the amount of data information held and improves operation efficiency;
the user retrieves the sample information in the platform database, with any combination of search items and logical retrieval using AND, OR, and NOT; the user selects the search items and search form to use for retrieving samples; after the search, the platform compiles statistics on the sample search results by gender and by sample storage institution, and presents the statistics to the user as visual charts.
2. A method of operating the extensible large-scale biomedical sample management and visualization platform of claim 1, characterized in that the method comprises the following steps:
step one: a user logs in to the extensible large-scale biomedical sample management and visualization platform for identity and authority verification;
step two: after completing authority verification, the user adds data information to the platform and uploads it;
step three: the added data information undergoes quality inspection; data information that fails the inspection is not stored in the MongoDB database and is returned to the client for error correction, and data information that passes the inspection is stored in the MongoDB database;
step four: the user searches the sample information in the MongoDB database by combining multiple conditions and views statistics of the search results;
step five: after retrieval, the extensible large-scale biomedical sample management and visualization platform compiles statistics on the sample search results and presents them to the user as visual charts.
3. The method of operation of claim 2, characterized in that: the multi-condition comprehensive search performs logical retrieval by combining arbitrary search items using AND, OR, and NOT.
CN201811487666.9A 2018-12-06 2018-12-06 Extensible large-scale biomedical sample management and visualization platform Active CN109656910B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811487666.9A CN109656910B (en) 2018-12-06 2018-12-06 Extensible large-scale biomedical sample management and visualization platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811487666.9A CN109656910B (en) 2018-12-06 2018-12-06 Extensible large-scale biomedical sample management and visualization platform

Publications (2)

Publication Number Publication Date
CN109656910A (en) 2019-04-19
CN109656910B (en) 2021-04-13

Family

ID=66112703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811487666.9A Active CN109656910B (en) 2018-12-06 2018-12-06 Extensible large-scale biomedical sample management and visualization platform

Country Status (1)

Country Link
CN (1) CN109656910B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699161A (en) * 2019-10-23 2021-04-23 上海磐门信息科技有限公司 Medical statistical system and method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933112A (en) * 2015-06-04 2015-09-23 浙江力石科技股份有限公司 Distributed Internet transaction information storage and processing method
CN107066531A (en) * 2017-03-01 2017-08-18 苏州朗动网络科技有限公司 A kind of business data radar monitoring method and system based on enterprise's big data platform
CN107066532A (en) * 2017-03-01 2017-08-18 苏州朗动网络科技有限公司 A kind of method and system for generating enterprise's transverse and longitudinal graph of a relation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10331848B2 (en) * 2017-01-31 2019-06-25 Onramp Bioinformatics, Inc. Method for managing complex genomic data workflows

Also Published As

Publication number Publication date
CN109656910A (en) 2019-04-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant