CN117854628A - Configuration method and system of drug development database - Google Patents
Configuration method and system of drug development database
- Publication number
- CN117854628A
- Authority
- CN
- China
- Prior art keywords
- data
- protein
- drug
- database
- processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/50—Molecular design, e.g. of drugs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/90—Programming languages; Computing architectures; Database systems; Data warehousing
Abstract
The invention provides a configuration method for a drug research and development database, comprising: acquiring relevant data from a public database; processing, associating, and matching the relevant data; retrieving and displaying the data that the user queries; reprocessing the protein crystal structure in which a ligand is located, so that the binding mode between the ligand and the protein is easier to understand and display; aligning the amino acid sequences of a plurality of targets to intuitively display the differences among the sequences; and processing a plurality of protein crystal structures so that the structural relationships among the proteins can be visually displayed. The method can effectively improve the efficiency of drug research and development personnel and provide them with more research and development ideas. It also improves the degree of data integration across a large number of databases and lowers the threshold for research and development.
Description
Technical Field
The invention relates to the field of data supervision, and in particular to a method and a system for configuring a drug research and development database.
Background
Drug development requires a great deal of manpower and material resources. In the initial drug design stage in particular, a large amount of data is needed to support the developers' design work. Owing to the specialized nature of medicinal chemistry, the data relevant to drug design are scattered across many public databases, which makes them hard for researchers to search and use. In addition, the tools that developers use for drug design usually take the form of separate client applications that are difficult to interoperate and impose a technical threshold on their users.
How to improve the degree of data integration across a large number of databases, lower the research and development threshold, and improve research and development efficiency has long been a research focus and a technical problem in the field.
Disclosure of Invention
In view of the foregoing, the present invention has been made to provide a method and system for configuring a drug development database that overcomes or at least partially solves the foregoing problems.
According to an aspect of the present invention, there is provided a method of configuring a drug development database comprising:
acquiring relevant data from a public database;
processing, associating, and matching the relevant data;
retrieving and displaying the data that the user queries;
reprocessing the protein crystal structure in which a ligand is located, so that the binding mode between the ligand and the protein is easier to understand and display;
aligning the amino acid sequences of a plurality of targets to intuitively display the differences among the sequences;
and processing a plurality of protein crystal structures so that the structural relationships among the proteins can be visually displayed.
Optionally, the relevant data specifically comprises:
drug data, target data, protein crystal structure data, indication data, bioactivity data, lead compound data, and mutation data.
Optionally, the method for processing the drug data comprises:
acquiring the data defining the drug type from each drug data table, selecting the drugs labeled as small molecules, and storing other types of drugs separately;
for the small-molecule drugs, locating the data defining the drug structure and using SMILES directly as the identifier;
importing the SMILES into the open-source module RDKit and using RDKit to convert the SMILES into a unified RDkit_SMILES;
comparing all RDkit_SMILES directly, identifying drugs from different data sources that share the same RDkit_SMILES as the same drug, and merging their data;
and acquiring the DRUGBANK ID by matching against the DrugBank database, and using the DRUGBANK ID as the primary key of the data table for association with the other data tables.
Optionally, the method for processing the target data comprises:
obtaining the classification data of the targets from the target data tables;
classifying the targets according to the classification scheme;
for targets lacking classification information, labeling the classification as TBD until it is confirmed by other means;
and merging the target data by UniProt ID, which serves as the unique primary key for association with the other data tables.
Optionally, the method for processing the protein crystal structure data comprises:
extracting data from each protein three-dimensional data file to obtain the basic information of that file;
ignoring PDB entries whose HEADER indicates that they are not proteins, and retaining only the protein three-dimensional data files that are proteins;
acquiring the UniProt ID from the detailed information of each protein three-dimensional data file for association with the target data;
and using the PDB ID of each protein three-dimensional data file as the primary key for association with the other data tables.
Optionally, the method for processing the indication data comprises:
matching the indication data obtained from the databases by synonym, and merging indications with the same name;
and associating the indications with the drug data by DRUGBANK ID and with the clinical trial information in the ClinicalTrials database by NCT NUMBER.
Optionally, the method for processing the bioactivity data comprises:
acquiring bioactivity data from a database, the bioactivity data comprising compound data, target data, and the experimental results between each compound and target;
and associating the compound data and the target data with the drug data and the target data through SMILES and UniProt ID, respectively, to facilitate subsequent retrieval.
Optionally, the method for processing the lead compound data specifically comprises:
acquiring all compound data from the bioactivity test data, screening the compound data, and selecting the records whose data type and data value meet the requirements;
identifying the SMILES of the selected records and merging the data for the same molecule;
and, after matching the molecules against the ChEMBL database, using the CHEMBL ID as the primary key for association with other data.
Optionally, the method for processing the mutation data specifically comprises:
obtaining mutation data from a database and classifying it into disease-associated mutations and ligand-associated mutations;
for disease-associated mutations, associating the UniProt ID with the disease name in addition to the mutation site information;
for ligand-associated mutations, associating the UniProt ID with the ligand information;
and, after organizing the data by UniProt ID, associating it with the targets through the UniProt ID.
The invention also provides a configuration system for a drug development database, comprising:
a data acquisition module for acquiring relevant data from a public database;
a data processing module for processing, associating, and matching the relevant data;
a retrieval matching module for retrieving and displaying the data that the user queries;
a ligand display module for reprocessing the protein crystal structure in which a ligand is located, so that the binding mode between the ligand and the protein is easier to understand and display;
a sequence alignment module for aligning the amino acid sequences of a plurality of targets and intuitively displaying the differences among the sequences;
and a structure alignment module for processing a plurality of protein crystal structures so that the structural relationships among the proteins can be visually displayed.
The configuration method for a drug research and development database provided by the invention comprises: acquiring relevant data from a public database; processing, associating, and matching the relevant data; retrieving and displaying the data that the user queries; reprocessing the protein crystal structure in which a ligand is located, so that the binding mode between the ligand and the protein is easier to understand and display; aligning the amino acid sequences of a plurality of targets to intuitively display the differences among the sequences; and processing a plurality of protein crystal structures so that the structural relationships among the proteins can be visually displayed. The method can effectively improve the efficiency of drug research and development personnel and provide them with more research and development ideas. It also improves the degree of data integration across a large number of databases and lowers the threshold for research and development.
The foregoing is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with this description, and so that the above and other objects, features, and advantages of the invention are more readily apparent, specific embodiments are set out below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for configuring a drug development database according to an embodiment of the present invention;
FIG. 2 is a block diagram of a configuration system of a drug development database according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for processing drug data according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for processing target data according to an embodiment of the present invention;
FIG. 5 is a flowchart of a method for processing protein crystal structure data according to an embodiment of the present invention;
FIG. 6 is a flowchart of a method for processing indication data according to an embodiment of the present invention;
FIG. 7 is a flowchart of a method for processing bioactive data according to an embodiment of the present invention;
FIG. 8 is a flow chart of a method for processing lead compound data according to an embodiment of the present invention;
FIG. 9 is a flowchart of a method for processing mutation data according to an embodiment of the present invention;
FIG. 10 is a flow chart of a drug search provided by an embodiment of the present invention;
FIG. 11 is a flowchart of target searching provided by an embodiment of the present invention;
FIG. 12 is a flow chart of an indication search provided by an embodiment of the present invention;
FIG. 13 is a flow chart of a lead compound search provided in an embodiment of the present invention;
FIG. 14 is a flow chart of a ligand display provided by an embodiment of the present invention;
FIG. 15 is a flow chart of sequence alignment provided by an embodiment of the present invention;
FIG. 16 is a flow chart of structure alignment provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terms "comprising" and "having", and any variations thereof, in the description, claims, and drawings of the invention are intended to cover a non-exclusive inclusion, for example of a series of steps or elements.
The technical scheme of the invention is further described in detail below with reference to the accompanying drawings and the examples.
As shown in FIG. 1, a method for configuring a drug development database comprises:
acquiring relevant data from a public database;
processing, associating, and matching the relevant data;
retrieving and displaying the data that the user queries;
reprocessing the protein crystal structure in which a ligand is located, so that the binding mode between the ligand and the protein is easier to understand and display;
aligning the amino acid sequences of a plurality of targets to intuitively display the differences among the sequences;
and processing a plurality of protein crystal structures so that the structural relationships among the proteins can be visually displayed.
As shown in FIG. 2, a configuration system for a drug development database comprises:
a data acquisition module for acquiring relevant data from a public database;
a data processing module for processing, associating, and matching the relevant data;
a retrieval matching module for retrieving and displaying the data that the user queries;
a ligand display module for reprocessing the protein crystal structure in which a ligand is located, so that the binding mode between the ligand and the protein is easier to understand and display;
a sequence alignment module for aligning the amino acid sequences of a plurality of targets and intuitively displaying the differences among the sequences;
and a structure alignment module for processing a plurality of protein crystal structures so that the structural relationships among the proteins can be visually displayed.
The invention covers the following types of data: drug data, target data, protein crystal structure data, indication data, bioactivity data, lead compound data, and mutation data.
The data integration method of the invention is as follows:
As shown in FIG. 3, drug data: the data defining the type of drug (the column name is usually Drug Type) is found in each drug data table; drugs labeled as small molecules are selected, and other types of drugs are stored separately. For the small-molecule drugs, the data defining the drug structure is located, and SMILES is used directly as the identifier. The SMILES strings are imported into the open-source module RDKit, which converts them into a unified RDkit_SMILES. All RDkit_SMILES are compared directly; drugs from different data sources that share the same RDkit_SMILES are identified as the same drug, and their data are merged. The DRUGBANK ID is obtained by matching against the DrugBank database and is used as the primary key of the data table for association with the other data tables.
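The drug-merging step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record fields (`name`, `smiles`, `drug_type`, `source`) and the function names are assumptions. `rdkit_canonical` relies on RDKit's `Chem.MolFromSmiles` and `Chem.MolToSmiles`, and the canonicalizer is passed in as a parameter so the grouping logic can be exercised without RDKit installed.

```python
from typing import Callable, Dict, Iterable, Optional


def rdkit_canonical(smiles: str) -> Optional[str]:
    """Return RDKit's canonical SMILES, or None if the input cannot be parsed."""
    from rdkit import Chem  # imported lazily; only needed when this helper is used
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


def merge_drugs(records: Iterable[dict],
                canonicalize: Callable[[str], Optional[str]]) -> Dict[str, dict]:
    """Group small-molecule records by canonical SMILES and merge names and sources."""
    merged: Dict[str, dict] = {}
    for rec in records:
        # Assumed label text; other drug types would be stored separately.
        if rec["drug_type"].strip().lower() != "small molecule":
            continue
        key = canonicalize(rec["smiles"])
        if key is None:
            continue  # unparseable structure, skipped in this sketch
        entry = merged.setdefault(
            key, {"rdkit_smiles": key, "names": set(), "sources": set()}
        )
        entry["names"].add(rec["name"])
        entry["sources"].add(rec["source"])
    return merged
```

With RDKit available, `merge_drugs(records, rdkit_canonical)` would treat two differently written SMILES strings for the same molecule as one drug. The DRUGBANK ID lookup is left out because the source does not describe how the match is performed.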
As shown in FIG. 4, target data: the classification data of the targets is obtained from each target data table, and the targets are classified according to the classification scheme (Class A, Class B, Class C, Class D, and Class F). For targets lacking classification information, the classification is labeled "TBD" until it can be confirmed by other means. All target data are merged by UniProt ID, which serves as the unique primary key for association with the other data tables.
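A small sketch of the target merge, assuming each input row is a dictionary with hypothetical keys `uniprot_id`, `name`, and an optional `target_class`. The class labels come from the text above; everything else is illustrative.

```python
from typing import Dict, Iterable

KNOWN_CLASSES = {"Class A", "Class B", "Class C", "Class D", "Class F"}


def merge_targets(rows: Iterable[dict]) -> Dict[str, dict]:
    """Merge rows from several target tables into one record per UniProt ID."""
    targets: Dict[str, dict] = {}
    for row in rows:
        uid = row["uniprot_id"]
        entry = targets.setdefault(
            uid, {"uniprot_id": uid, "names": set(), "target_class": "TBD"}
        )
        entry["names"].add(row["name"])
        cls = row.get("target_class")
        # Keep "TBD" until some source supplies a recognized class.
        if cls in KNOWN_CLASSES and entry["target_class"] == "TBD":
            entry["target_class"] = cls
    return targets
```

Keying the dictionary on the UniProt ID mirrors its role as the unique primary key: a second row for the same ID can only enrich the existing record, never create a duplicate.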
As shown in FIG. 5, protein crystal structure data: protein crystal structure data is extracted from each PDB file to obtain the basic PDB information; PDB entries whose HEADER indicates that they are not proteins are ignored, and only the PDB entries that are proteins are retained. The UniProt ID is obtained from the detailed information of each PDB file for association with the target data. The PDB ID of each PDB entry is used as the primary key for association with the other data tables.
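The HEADER filter and UniProt extraction might look like the sketch below, which reads the fixed-column HEADER and DBREF records of the legacy PDB text format. Two points are assumptions rather than statements from the source: the set of HEADER classifications treated as "not a protein", and the use of DBREF records (database code `UNP`) as the place where the UniProt ID is found.

```python
from typing import List, Optional, Tuple

# Assumption: classifications treated as non-protein; the source does not list them.
NON_PROTEIN_CLASSES = {"DNA", "RNA", "DNA-RNA HYBRID"}


def parse_pdb_entry(text: str) -> Optional[Tuple[str, str, List[str]]]:
    """Return (pdb_id, classification, uniprot_ids), or None for non-protein entries."""
    pdb_id, classification = "", ""
    uniprot_ids: List[str] = []
    for line in text.splitlines():
        if line.startswith("HEADER"):
            classification = line[10:50].strip()  # columns 11-50
            pdb_id = line[62:66].strip()          # columns 63-66
        elif line.startswith("DBREF ") and line[26:32].strip() == "UNP":
            accession = line[33:41].strip()       # columns 34-41
            if accession and accession not in uniprot_ids:
                uniprot_ids.append(accession)
    if not pdb_id or classification.upper() in NON_PROTEIN_CLASSES:
        return None
    return pdb_id, classification, uniprot_ids
```

The returned `pdb_id` would serve as the primary key of the structure table, and each entry in `uniprot_ids` as a link to the target table.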
As shown in FIG. 6, indication data: the indication data obtained from the databases are matched by synonym, and indications with the same name are merged. The indications are associated with the drug data by DRUGBANK ID and with the clinical trial information in the ClinicalTrials database by NCT NUMBER.
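Synonym-based merging can be sketched with a lookup table that maps each alias to a preferred name. The source does not say where the synonym table comes from, so `synonyms` here is a hypothetical, hand-supplied mapping with lowercase keys, and the drug identifiers in any example data are placeholders.

```python
from typing import Dict, Iterable, Mapping, Set, Tuple


def merge_indications(rows: Iterable[Tuple[str, str]],
                      synonyms: Mapping[str, str]) -> Dict[str, Set[str]]:
    """Map each preferred indication name to the set of DRUGBANK IDs linked to it.

    rows: (indication_name, drugbank_id) pairs from any source table.
    synonyms: lowercase alias -> preferred lowercase name (assumed input).
    """
    merged: Dict[str, Set[str]] = {}
    for name, drugbank_id in rows:
        key = name.strip().lower()
        preferred = synonyms.get(key, key)
        merged.setdefault(preferred, set()).add(drugbank_id)
    return merged
```

Normalizing case and whitespace before the synonym lookup keeps trivially different spellings from producing separate indication records.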
As shown in FIG. 7, bioactivity data: bioactivity data is obtained from the database and comprises compound data, target data, and the experimental results between each compound and target. The compound data and the target data are associated with the drug data and the target data through SMILES and UniProt ID, respectively, to facilitate subsequent retrieval.
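One way to picture these associations is a small relational schema, sketched here with Python's built-in `sqlite3`. The table and column names are assumptions chosen for illustration; the source specifies only which identifiers link the tables (SMILES on the compound side, UniProt ID on the target side).

```python
import sqlite3

SCHEMA = """
CREATE TABLE drug (drugbank_id TEXT PRIMARY KEY, rdkit_smiles TEXT NOT NULL);
CREATE TABLE target (uniprot_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE bioactivity (
    rdkit_smiles  TEXT NOT NULL,                                -- links to drug
    uniprot_id    TEXT NOT NULL REFERENCES target(uniprot_id),  -- links to target
    activity_type TEXT NOT NULL,
    value_nm      REAL NOT NULL
);
"""


def activities_for_drug(conn: sqlite3.Connection, drugbank_id: str) -> list:
    """Return (target name, activity type, value in nM) rows for one drug."""
    cursor = conn.execute(
        """
        SELECT t.name, b.activity_type, b.value_nm
        FROM drug AS d
        JOIN bioactivity AS b ON b.rdkit_smiles = d.rdkit_smiles
        JOIN target AS t ON t.uniprot_id = b.uniprot_id
        WHERE d.drugbank_id = ?
        ORDER BY b.value_nm
        """,
        (drugbank_id,),
    )
    return cursor.fetchall()
```

After `conn.executescript(SCHEMA)` on a fresh connection, the two joins reproduce the associations described above: compound to drug through the canonical SMILES, and measurement to target through the UniProt ID.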
As shown in FIG. 8, lead compound data: all compound data are obtained from the bioactivity test data and screened; only records whose data type is Ki, Kd, IC50, or EC50 and whose value does not exceed 1000 nM are selected. The SMILES of these records are identified, and the data for the same molecule are merged. After the molecules are matched against the ChEMBL database, the CHEMBL ID is used as the primary key for association with other data.
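The screening rule translates almost directly into code. In this sketch the record keys (`smiles`, `type`, `value_nm`) are hypothetical, and the SMILES strings are assumed to have been canonicalized already so that equal strings mean the same molecule.

```python
from typing import Dict, Iterable, List, Tuple

ALLOWED_TYPES = {"Ki", "Kd", "IC50", "EC50"}
MAX_VALUE_NM = 1000.0  # threshold stated in the text: values must not exceed 1000 nM


def select_leads(activities: Iterable[dict]) -> Dict[str, List[Tuple[str, float]]]:
    """Keep qualifying measurements and group them by molecule (SMILES)."""
    leads: Dict[str, List[Tuple[str, float]]] = {}
    for act in activities:
        if act["type"] not in ALLOWED_TYPES:
            continue
        if act["value_nm"] > MAX_VALUE_NM:
            continue
        leads.setdefault(act["smiles"], []).append((act["type"], act["value_nm"]))
    return leads
```

A value of exactly 1000 nM passes the filter, because the text says "not exceeding" rather than "below".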
As shown in FIG. 9, mutation data: mutation data is obtained from the database and classified into disease-associated mutations and ligand-associated mutations. For disease-associated mutations, the UniProt ID must be associated with the disease name in addition to the mutation site information. For ligand-associated mutations, the UniProt ID must be associated with the ligand information. After being organized by UniProt ID, the mutation data is associated with the targets through the UniProt ID.
The functional modules of the invention comprise a retrieval matching module, a ligand display module, a sequence alignment module, and a structure alignment module.
Retrieval matching module:
As shown in FIG. 10, drug search: the user enters a SMILES string or a drug name to search the drug data; the back end uses the DRUGBANK ID in the drug data to match the related target data (UniProt ID), protein crystal structure data (PDB ID), and indication data, and all of the data are combined and displayed.
As shown in FIG. 11, target search: the user enters a UniProt ID or a target name to search the target data; the back end uses the UniProt ID in the target data to match the related drug data (DRUGBANK ID), protein crystal structure data (PDB ID), and mutation data (UniProt ID), and finally displays them together.
As shown in FIG. 12, indication search: the user enters an indication name, which is matched against the indication names in the database, and the drug data associated with the indication is then retrieved.
As shown in FIG. 13, lead compound search: the user enters a SMILES string or a CHEMBL ID to search the lead compound data; the CHEMBL ID in the lead compound data is matched to the corresponding target data (UniProt ID), the other data of the corresponding target is read, and the result is finally displayed.
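The drug search of FIG. 10 can be sketched with in-memory dictionaries standing in for the database tables. All names below are illustrative assumptions; the matching rule (exact name, case-insensitive, or exact SMILES) is a simplification, since the source does not say how the query is matched.

```python
from typing import Dict, List, Mapping, Set


def search_drug(query: str,
                drugs: Mapping[str, dict],
                targets_by_drug: Mapping[str, Set[str]],
                pdb_by_target: Mapping[str, Set[str]],
                indications_by_drug: Mapping[str, Set[str]]) -> List[Dict]:
    """Find drugs by name or SMILES, then follow the ID links to related data."""
    text = query.strip()
    hits: List[Dict] = []
    for drugbank_id, drug in drugs.items():
        if text.lower() != drug["name"].lower() and text != drug["rdkit_smiles"]:
            continue
        # DRUGBANK ID -> UniProt IDs -> PDB IDs, plus indications for the drug.
        uniprot_ids = sorted(targets_by_drug.get(drugbank_id, ()))
        pdb_ids = sorted({p for u in uniprot_ids for p in pdb_by_target.get(u, ())})
        hits.append({
            "drugbank_id": drugbank_id,
            "uniprot_ids": uniprot_ids,
            "pdb_ids": pdb_ids,
            "indications": sorted(indications_by_drug.get(drugbank_id, ())),
        })
    return hits
```

The target, indication, and lead compound searches would follow the same pattern, starting from a different primary key and walking the links in the other direction.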
As shown in FIG. 14, ligand display module: the user selects a designated protein crystal structure in the PDB display plug-in, and the system displays a list of the ligands present in that crystal structure; the user then selects a designated ligand and enters a designated radius. On receiving these three pieces of information, the system reads the protein crystal structure file from the database, performs the calculation with the three parameters, and loads the resulting amino acid residues into the PDB display plug-in for highlighting.
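The source does not spell out the calculation. A plausible reading, assumed here, is that it selects every protein residue with at least one atom within the given radius of any ligand atom. The atom record layout (`res_name`, `chain`, `res_seq`, `is_hetero`, `xyz`) is hypothetical.

```python
import math
from typing import Iterable, List, Tuple


def residues_near_ligand(atoms: Iterable[dict],
                         ligand_name: str,
                         radius: float) -> List[Tuple[str, int, str]]:
    """Return sorted (chain, residue number, residue name) within radius of the ligand."""
    atoms = list(atoms)
    ligand_xyz = [
        a["xyz"] for a in atoms if a["is_hetero"] and a["res_name"] == ligand_name
    ]
    near = set()
    for a in atoms:
        if a["is_hetero"]:
            continue  # skip ligands, waters, and other hetero atoms
        if any(math.dist(a["xyz"], p) <= radius for p in ligand_xyz):
            near.add((a["chain"], a["res_seq"], a["res_name"]))
    return sorted(near)
```

The resulting residue list is what a structure viewer would be asked to highlight. For large structures a spatial index would replace the all-pairs distance check used in this sketch.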
As shown in FIG. 15, sequence alignment module: the user enters the names of a plurality of targets; the system reads the sequence information of the corresponding targets from the database, performs the calculation, and returns a similarity result and the alignment.
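The source does not name an alignment algorithm. As a stand-in, the sketch below uses Needleman-Wunsch global alignment for a pair of sequences with a simple match/mismatch/gap scoring scheme (the scores are arbitrary assumptions), and reports percent identity as the similarity result. Aligning more than two targets would need a multiple sequence alignment method, which is not shown.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment; returns the two gapped sequences."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of the alignment length."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)
```

The gapped strings give the alignment to display, and columns where the two characters differ mark the sequence differences the module is meant to show.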
As shown in FIG. 16, structure alignment module: the user enters the IDs of a plurality of protein crystal structures and selects a designated Chain ID, a designated cut-off value, and a designated number of cycles; the system reads the designated protein crystal structures from the database, performs the calculation with these parameters, returns an offset value, and loads the aligned protein crystal structures into the PDB display plug-in in file form.
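The source names the inputs (cut-off value, cycle number) but not the algorithm. One common scheme that uses such parameters, assumed here, is Kabsch superposition with iterative outlier rejection: fit, drop atom pairs farther apart than the cut-off, and refit for up to the given number of cycles. The sketch uses NumPy, takes two equal-length arrays of already paired coordinates, treats the cut-off as an absolute distance, and interprets the "offset value" as the RMSD over the retained pairs; all of these are assumptions.

```python
import numpy as np


def kabsch(P: np.ndarray, Q: np.ndarray):
    """Rotation R and translation t minimizing the RMSD of (P @ R.T + t) to Q."""
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - Pc).T @ (Q - Qc)           # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = Qc - R @ Pc
    return R, t


def iterative_superpose(mobile: np.ndarray, target: np.ndarray,
                        cutoff: float = 2.0, cycles: int = 5):
    """Fit mobile onto target, rejecting pairs farther apart than cutoff each cycle."""
    keep = np.ones(len(mobile), dtype=bool)
    R, t = kabsch(mobile, target)
    for _ in range(cycles):
        dist = np.linalg.norm(mobile @ R.T + t - target, axis=1)
        new_keep = keep & (dist <= cutoff)
        # Stop when nothing more is rejected or too few pairs would remain.
        if new_keep.sum() < 3 or new_keep.sum() == keep.sum():
            break
        keep = new_keep
        R, t = kabsch(mobile[keep], target[keep])
    dist = np.linalg.norm(mobile @ R.T + t - target, axis=1)
    rmsd = float(np.sqrt(np.mean(dist[keep] ** 2)))
    return R, t, rmsd, int(keep.sum())
```

Applying `mobile @ R.T + t` to every atom of the mobile structure gives the aligned coordinates that would be written to a file and loaded into the viewer. Pairing the atoms of two different chains (for example through a sequence alignment) is a separate step not covered here.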
Beneficial effects: used together, these modules can effectively improve the efficiency of drug research and development personnel and provide them with more research and development ideas.
The foregoing detailed description of the invention has been presented for purposes of illustration and description, and it should be understood that the invention is not limited to the particular embodiments disclosed, but is intended to cover all modifications, equivalents, alternatives, and improvements within the spirit and principles of the invention.
Claims (10)
1. A method of configuring a drug development database, the method comprising:
acquiring relevant data from a public database;
processing, associating, and matching the relevant data;
retrieving and displaying the data that the user queries;
reprocessing the protein crystal structure in which a ligand is located, so that the binding mode between the ligand and the protein is easier to understand and display;
aligning the amino acid sequences of a plurality of targets to intuitively display the differences among the sequences;
and processing a plurality of protein crystal structures so that the structural relationships among the proteins can be visually displayed.
2. The method for configuring a drug development database according to claim 1, wherein the relevant data specifically comprises:
drug data, target data, protein crystal structure data, indication data, bioactivity data, lead compound data, and mutation data.
3. The method for configuring a drug development database according to claim 2, wherein the method for processing the drug data comprises:
acquiring the data defining the drug type from each drug data table, selecting the drugs labeled as small molecules, and storing other types of drugs separately;
for the small-molecule drugs, locating the data defining the drug structure and using SMILES directly as the identifier;
importing the SMILES into the open-source module RDKit and using RDKit to convert the SMILES into a unified RDkit_SMILES;
comparing all RDkit_SMILES directly, identifying drugs from different data sources that share the same RDkit_SMILES as the same drug, and merging their data;
and acquiring the DRUGBANK ID by matching against the DrugBank database, and using the DRUGBANK ID as the primary key of the data table for association with the other data tables.
4. The method for configuring a drug development database according to claim 2, wherein the method for processing the target data comprises:
obtaining the classification data of the targets from the target data tables;
classifying the targets according to the classification scheme;
for targets lacking classification information, labeling the classification as TBD until it is confirmed by other means;
and merging the target data by UniProt ID, which serves as the unique primary key for association with the other data tables.
5. The method for configuring a drug development database according to claim 2, wherein the method for processing the protein crystal structure data comprises:
extracting data from each protein three-dimensional data file to obtain the basic information of that file;
ignoring PDB entries whose HEADER indicates that they are not proteins, and retaining only the protein three-dimensional data files that are proteins;
acquiring the UniProt ID from the detailed information of each protein three-dimensional data file for association with the target data;
and using the PDB ID of each protein three-dimensional data file as the primary key for association with the other data tables.
6. The method for configuring a drug development database according to claim 2, wherein the method for processing the indication data comprises:
matching the indication data obtained from the databases by synonym, and merging indications with the same name;
and associating the indications with the drug data by DRUGBANK ID and with the clinical trial information in the ClinicalTrials database by NCT NUMBER.
7. The method for configuring a drug development database according to claim 2, wherein the method for processing the bioactivity data comprises:
acquiring bioactivity data from a database, the bioactivity data comprising compound data, target data, and the experimental results between each compound and target;
and associating the compound data and the target data with the drug data and the target data through SMILES and UniProt ID, respectively, to facilitate subsequent retrieval.
8. The method for configuring a drug development database according to claim 2, wherein the method for processing the lead compound data specifically comprises:
acquiring all compound data from the bioactivity test data, screening the compound data, and selecting the records whose data type and data value meet the requirements;
identifying the SMILES of the selected records and merging the data for the same molecule;
and, after matching the molecules against the ChEMBL database, using the CHEMBL ID as the primary key for association with other data.
9. The method for configuring a drug development database according to claim 2, wherein the method for processing the mutation data specifically comprises:
obtaining mutation data from a database, and classifying it into disease-related mutations and ligand-related mutations;
for disease-related mutations, associating the UniProt ID with the disease name in addition to the mutation site information;
for ligand-related mutations, associating the UniProt ID with the ligand information;
and after organizing the data by UniProt ID, associating it with the targets through the UniProt ID.
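A minimal sketch of the classification and grouping in this claim is shown below. It assumes each raw mutation record carries a `uniprot_id`, a `site`, and either a `disease` or a `ligand` field; these field names are hypothetical, and a record carrying both fields is filed under both categories.

```python
from collections import defaultdict
from typing import Dict, Iterable


def organize_mutations(records: Iterable[dict]) -> Dict[str, dict]:
    """Group mutations by UniProt ID, split into disease- and ligand-related."""
    by_target = defaultdict(lambda: {"disease": [], "ligand": []})
    for rec in records:
        uid = rec["uniprot_id"]
        if rec.get("disease"):
            # Disease-related: keep the site together with the disease name.
            by_target[uid]["disease"].append(
                {"site": rec["site"], "disease": rec["disease"]}
            )
        if rec.get("ligand"):
            # Ligand-related: keep the site together with the ligand information.
            by_target[uid]["ligand"].append(
                {"site": rec["site"], "ligand": rec["ligand"]}
            )
    return dict(by_target)
```

Because the result is keyed by UniProt ID, it can be joined directly to the target table on that same key.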
10. A system for configuring a drug development database, the system comprising:
a data acquisition module for acquiring related data from public databases;
a data processing module for processing, associating, and matching the related data;
a retrieval matching module for retrieving and displaying the data that the user queries;
a ligand display module for reprocessing the protein crystal structure containing the ligand, so that the binding mode of the ligand and the protein is easier to understand and display;
a sequence alignment module for aligning the amino acid sequences of a plurality of targets and visually displaying the differences between the sequences;
and a structure alignment module for processing a plurality of protein crystal structures so that the structural relationships among the proteins are displayed visually.
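To illustrate the kind of output the sequence alignment module might feed to a display layer, the sketch below lists the alignment columns at which the targets differ. It does not perform the alignment itself: the input sequences are assumed to be already aligned to equal length, with `-` marking gaps, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def alignment_differences(aligned: Dict[str, str]) -> List[Tuple[int, Dict[str, str]]]:
    """Return (1-based column, {name: residue}) for every column that differs.

    aligned: target name -> pre-aligned sequence; all sequences must have
    the same length.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    width = lengths.pop()
    diffs = []
    for i in range(width):
        column = {name: seq[i] for name, seq in aligned.items()}
        if len(set(column.values())) > 1:
            diffs.append((i + 1, column))
    return diffs
```

A front end could highlight exactly these columns when rendering the aligned sequences side by side.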
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211205105.1A CN117854628A (en) | 2022-09-30 | 2022-09-30 | Configuration method and system of drug development database |
PCT/CN2023/100464 WO2024066489A1 (en) | 2022-09-30 | 2023-06-15 | Configuration method for drug research and development database, and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211205105.1A CN117854628A (en) | 2022-09-30 | 2022-09-30 | Configuration method and system of drug development database |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117854628A (en) | 2024-04-09 |
Family
Family ID: 90475901
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211205105.1A Pending CN117854628A (en) | 2022-09-30 | 2022-09-30 | Configuration method and system of drug development database |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117854628A (en) |
WO (1) | WO2024066489A1 (en) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11515105A (en) * | 1995-10-13 | 1999-12-21 | マウント・サイナイ・ホスピタル・コーポレイション | Activating a novel ligand regulatory pathway |
CN109545284A (en) * | 2018-10-16 | 2019-03-29 | 中国人民解放军军事科学院军事医学研究院 | Drug integrated information database building method and system based on drug and target information |
CN110618987A (en) * | 2019-09-18 | 2019-12-27 | 宁夏大学 | Treatment pathway key node information processing method based on lung cancer medical big data |
CN111415702B (en) * | 2020-03-03 | 2023-05-05 | 深圳晶泰科技有限公司 | Method for establishing molecular structure and activity database |
CN114203269B (en) * | 2022-02-17 | 2022-05-10 | 北京泽桥医疗科技股份有限公司 | Anticancer traditional Chinese medicine screening method based on machine learning and molecular docking technology |
- 2022-09-30 CN CN202211205105.1A patent/CN117854628A/en active Pending
- 2023-06-15 WO PCT/CN2023/100464 patent/WO2024066489A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
WO2024066489A1 (en) | 2024-04-04 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Dror et al. | MASS: multiple structural alignment by secondary structures | |
US6389428B1 (en) | System and method for a precompiled database for biomolecular sequence information | |
US6023659A (en) | Database system employing protein function hierarchies for viewing biomolecular sequence data | |
US6553317B1 (en) | Relational database and system for storing information relating to biomolecular sequences and reagents | |
Simon et al. | Drug effect prediction by polypharmacology-based interaction profiling | |
Ji et al. | Identifying time-lagged gene clusters using gene expression data | |
Terwilliger et al. | Ligand identification using electron-density map correlations | |
JP2009520278A (en) | Systems and methods for scientific information knowledge management | |
KR101117603B1 (en) | System and method for providing functional correlation information of biomedical data by generating inter-linkable maps | |
Cleal et al. | Dysgu: efficient structural variant calling using short or long reads | |
Curran et al. | Computer aided manual validation of mass spectrometry-based proteomic data | |
CN113742443A (en) | Multi-medicine sharing query method, mobile terminal and storage medium | |
CN117854628A (en) | Configuration method and system of drug development database | |
EP4033492A1 (en) | Database construction method and apparatus, and file retrieval method and apparatus | |
Kifer et al. | GOSSIP: a method for fast and accurate global alignment of protein structures | |
EP1419383A2 (en) | System and method for storing mass spectrometry data | |
Gabetta et al. | A Unified Medical Language System (UMLS) based system for literature-based discovery in medicine | |
Berman et al. | The Nucleic Acid Database: a resource for nucleic acid science | |
CN113470776B (en) | Genetic diagnosis system integrating data acquisition, analysis and report generation | |
Mottaz et al. | Designing an optimal expansion method to improve the recall of a genomic variant curation-support service | |
Kynast et al. | ATLIGATOR: editing protein interactions with an atlas-based approach | |
Bastas et al. | Bioinformatic requirements for protein database searching using predicted epitopes from disease-associated antibodies | |
KR20200104672A (en) | Method and apparatus of the Classification of Species using Sequencing Clustering | |
CN117275656B (en) | Method and system for automatically generating standardized report of clinical test record | |
CN115050421A (en) | Method for storing tumor neogenesis antigen and targeted drug information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||