CN116246696A - Virtual screening method for ligand docking poses based on fast retrieval - Google Patents


Info

Publication number
CN116246696A
Authority
CN
China
Prior art keywords
conformation
ligand
conformations
screening
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310319885.0A
Other languages
Chinese (zh)
Inventor
陈晓健
顾彦慧
刘畅
张先锋
夏浩辉
李杨
李亚飞
王金兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Normal University
Original Assignee
Nanjing Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Normal University
Priority to CN202310319885.0A
Publication of CN116246696A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G16B35/20 Screening of libraries
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Biochemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a virtual screening method for ligand docking poses based on fast retrieval, comprising the following steps: preprocessing ligand conformation data and building an index table to obtain a search-tree spatial index structure; inputting a known active ligand conformation as the query, using the search-tree spatial index structure to quickly retrieve and screen candidate conformations similar to the query conformation, and taking the k most similar query results as the top-k conformation result; and evaluating the retrieved top-k conformation result by outputting and comparing the actual RMSD values between the top-k candidate conformations and the native conformation, verifying the accuracy of the screening result, and further optimizing the screening strategy. Based on the spatial data of the ligand molecules, the invention organizes the data and builds the index with a three-dimensional spatial search tree so as to narrow the search range, so that the optimal docking pose can be retrieved quickly from massive ligand structure data and prediction performance is effectively improved.

Description

Virtual screening method for ligand docking poses based on fast retrieval
Technical Field
The invention belongs to the field of computer-aided drug design and content-based retrieval, and particularly relates to a virtual screening method for ligand docking poses based on fast retrieval.
Background
Predicting the docking pose between a protein and a ligand plays an important role in computer-aided biopharmaceutical research, and improving the efficiency of prediction and screening has become one of its key problems. With advances in protein design technology, more candidate proteins are continually being explored and their properties and functions are increasingly diverse, so the demand for quickly screening out the optimal ligand docking pose keeps rising. New proteins keep emerging, while the related protein property data and ligand docking pose data are slow to catch up, which is a major difficulty in computer-aided drug prediction.
Traditional methods generally generate a set of candidate docking poses and then screen out the poses that best satisfy the conditions. Pose prediction and screening of this kind relies on existing knowledge to jointly consider intramolecular forces, intermolecular forces, and the local and global information of the protein, and the search proceeds in a manner similar to "blind search", so it suffers from long running time, high cost and insufficient accuracy. In addition, docking pose prediction based on neural networks generally requires massive, well-labeled pose docking data; existing biomedical information is still limited and data labels are incomplete, so existing methods inevitably lose or ignore some important intermolecular information.
Drug discovery methods also include high-throughput screening (HTS), structure-based drug discovery (SBDD), and the like. Drug discovery based on high-throughput screening uses phenotypic screening, but it relies on biochemical experiments and is less efficient; in structure-based drug discovery, molecular docking computation demands high computing performance, while accuracy remains a bottleneck.
Content-based fast retrieval can use an existing information base to screen quickly through a feature-value index, which improves retrieval efficiency while preserving the validity of the results. Current methods for retrieving spatial data are mainly spatial index methods and dimension reduction methods. A spatial index is a data structure for organizing and managing spatial data; it effectively narrows the search range and improves query efficiency. Spatial index methods have been applied successfully in geographic information systems, computer graphics, robot navigation and other fields; in geographic information systems (GIS) in particular, they play an important role in traffic flow analysis and land use planning. The field of drug discovery also has an index-based approach, fragment-based drug discovery (FBDD), which takes small-molecule fragments as starting points and searches for lead compounds that bind to the target by indexing or other means. However, it does not exploit the spatial data carried by ligands during pose docking, nor does it use a corresponding index to optimize the search strategy.
Disclosure of Invention
Purpose of the invention: to overcome the defects of the prior art, a virtual screening method for ligand docking poses based on fast retrieval is provided. The method adopts a spatial search tree, indexes ligand conformations by their spatial position relationships, and optimizes the splitting and screening strategies. It avoids the complexity of step-by-step molecular trial and error in traditional prediction methods and the loss of important intermolecular information in machine-learning prediction methods, makes full use of existing ligand conformation information, and greatly improves the accuracy and efficiency of ligand docking pose prediction.
Technical solution: to achieve the above purpose, the invention provides a virtual screening method for ligand docking poses based on fast retrieval, comprising the following steps:
S1: preprocessing the ligand conformation data and building an index table to obtain a search-tree spatial index structure;
S2: inputting a known active ligand conformation as the query, using the search-tree spatial index structure to quickly retrieve and screen candidate conformations similar to the query conformation, and taking the k most similar query results as the top-k conformation result;
S3: evaluating the retrieved top-k conformation result by outputting and comparing the actual RMSD values between the top-k candidate conformations and the native conformation, verifying the accuracy of the screening result, and further optimizing the screening strategy.
Further, in step S1 the biological information data of the ligand conformations are preprocessed based on the spatial position relationships of the conformations, with the following specific steps:
A1: processing the CASF-2016 ligand docking candidate data set in PDBbind to obtain the biological information of the different conformations of a single ligand, and at the same time obtaining the RMSD values between the different candidate conformations and the native ligand conformation;
A2: extracting the atomic spatial position information of each conformation and converting the corresponding three-dimensional structure into a set of feature points, each of which carries coordinate and type information; then covering all feature points of the conformation with a minimum bounding box in space;
A3: building a search tree from the feature point set according to a hierarchical structure, in which each leaf node stores one feature point and each non-leaf node stores the minimum bounding box of its child nodes.
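Step A2 can be illustrated with a minimal Python sketch. It assumes that the "minimum bounding box" is axis-aligned and that a feature point is simply a coordinate triple plus an atom type; the names `FeaturePoint` and `minimum_bounding_box` and the sample coordinates are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FeaturePoint:
    """One atom of a conformation: coordinates plus a type label (assumed layout)."""
    x: float
    y: float
    z: float
    atom_type: str


Box = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def minimum_bounding_box(points: List[FeaturePoint]) -> Box:
    """Return (min corner, max corner) of the axis-aligned box covering all points."""
    if not points:
        raise ValueError("a conformation needs at least one feature point")
    xs = [p.x for p in points]
    ys = [p.y for p in points]
    zs = [p.z for p in points]
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Illustrative three-atom conformation (made-up coordinates).
conformation = [
    FeaturePoint(0.0, 0.0, 0.0, "C"),
    FeaturePoint(1.5, 0.2, -0.3, "N"),
    FeaturePoint(0.7, 2.1, 0.4, "O"),
]
box = minimum_bounding_box(conformation)
```

An axis-aligned box is chosen here only because it is the cheapest box to compute and to test for overlap; the patent text does not say whether oriented boxes are used.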
Further, the biological information of the different conformations of a single ligand in step A1 includes molecular system information, constituent atom information, bond value information and substructure information.
Further, step S2 specifically includes the following steps:
B1: extracting the atomic spatial position information of the known active ligand conformation and converting its three-dimensional structure into a set of feature points, each of which carries coordinate and type information;
B2: searching the established search tree with the feature point set of the known active ligand conformation, matching it hierarchically against the candidate conformations to be screened, and calculating the RMSD value, which serves as the similarity score, from the number and distances of the matched feature points;
B3: sorting the candidate conformations to be screened by similarity score and selecting the best-scoring candidates (those with the smallest RMSD) as candidate ligand conformations, to obtain the top-k docking pose result.
Further, in step B2 the search of the established search tree with the feature point set of the known active ligand conformation includes a top-down search and a bottom-up search;
the top-down search is specifically as follows: starting from the root node, the region containing the instance is first located by the retrieval method; the region is then further divided according to the attributes of the target to find the region at the next level; this is iterated level by level until the conformation output result is obtained;
the bottom-up search is specifically as follows: the atomic position relationships are first determined from the preprocessed data, and different instances are then distinguished by means of clustering and metric learning.
Further, step B2 adopts a similarity score calculation method based on spatial position relationships, specifically as follows:
the similarity score of a candidate conformation based on spatial position relationships is calculated by formula (1):
[Formula (1): shown as image BDA0004151310110000031 in the original publication]
where x_ij denotes all points in the neighborhood of x_j, the j-th conformation in the top-k results; y_i denotes all points in the neighborhood of y, the native conformation; and n is the total number of atoms in a single ligand. The distance between points in formula (1) is defined as the deviation, or deviation error, of the candidate conformation; the smaller the RMSD value, the closer the candidate conformation is to the known active conformation;
to minimize the value of formula (1), i.e. to obtain the docking pose closest to the native conformation, the minimum is taken over the computed set, and the resulting conformation is the one with the smallest deviation in the candidate conformation library; after the conformation with the smallest deviation has been found in the search tree, the search backtracks upward to find the conformation with the next-smallest deviation error, and iterates until the top-k conformation query results are output.
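Formula (1) survives in this text only as an image reference. Assuming it is the standard root-mean-square deviation over matched atoms, which is what the symbol definitions for x_ij, y_i and n suggest, it would take the following form (a reconstruction, not a transcription of the original image):

```latex
\mathrm{RMSD}_j \;=\; \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left\lVert x_{ij} - y_i \right\rVert^{2}} \tag{1}
```

Under this reading, the top-k result consists of the k candidate conformations j with the smallest values of RMSD_j.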
Further, step S3 specifically includes:
C1: evaluating the screening result: calculating the RMSD values of all candidate conformations, sorting them to obtain the k conformations with the smallest RMSD, comparing these with the top-k results obtained by retrieval, calculating the accuracy, and comparing the time consumed;
C2: optimizing the screening strategy of the search tree: optimizing the splitting strategy, rebuilding the index entries of the search tree, and re-evaluating, so that the strategy with the highest accuracy is obtained through repeated experiments.
Further, the splitting strategies optimized in step C2 include linear splitting, binary splitting, quadtree splitting, and the like.
The screening in step C2 of the invention is mainly based on the idea of the Threshold Algorithm (TA): when regions are merged, the two most closely related regions are merged together, so the weights of the different attribute parameters affect the ranking and thereby the merging and splitting strategy.
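The patent does not spell out how TA is adapted to region merging. For orientation only, the following Python sketch shows the classic Threshold Algorithm for top-k selection over several sorted attribute lists with a weighted-sum ranking, which is where attribute weights influence the order. The attribute names, scores and weights are made-up illustration values; the sketch assumes every object has a score for every attribute and that all weights are non-negative.

```python
import heapq


def threshold_algorithm(scores, weights, k):
    """Classic TA. scores[attr][obj] is a per-attribute score, higher is better."""
    attrs = list(scores)
    # Sorted access: one list per attribute, ordered from best to worst.
    ranked = {
        a: sorted(scores[a].items(), key=lambda kv: kv[1], reverse=True)
        for a in attrs
    }

    def aggregate(obj):
        # Random access: fetch every attribute of obj and combine with the weights.
        return sum(weights[a] * scores[a][obj] for a in attrs)

    seen = {}
    depth = min(len(r) for r in ranked.values())
    for row in range(depth):
        for a in attrs:
            obj = ranked[a][row][0]
            if obj not in seen:
                seen[obj] = aggregate(obj)
        # No object that is still unseen can score above this threshold.
        threshold = sum(weights[a] * ranked[a][row][1] for a in attrs)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top  # early stop: the top-k is already final
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Hypothetical per-attribute scores for three candidate regions.
scores = {
    "shape": {"a": 0.9, "b": 0.8, "c": 0.1},
    "contact": {"a": 0.7, "b": 0.9, "c": 0.2},
}
weights = {"shape": 0.5, "contact": 0.5}
best = threshold_algorithm(scores, weights, k=1)
```

With these values the scan stops after the second row of the sorted lists, without ever computing the aggregate score of candidate "c"; changing the weights changes both the ranking and the stopping point.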
Based on the spatial data of the ligand molecules, the invention organizes the data and builds the index with a three-dimensional spatial search tree so as to narrow the search range, so that the optimal docking pose can be retrieved quickly from massive ligand structure data and prediction performance is effectively improved.
The invention uses the spatial position relationships of the different conformations of a ligand to build an index over the ligand database, so that the top-k docking poses most similar to the known conformation can be screened quickly from a large-scale ligand candidate library while the accuracy of the screened conformations is maintained.
Beneficial effects: compared with the prior art, the invention adopts a spatial search tree, indexes ligand conformations by their spatial position relationships, and optimizes the splitting and screening strategies. It avoids the complexity of step-by-step molecular trial and error in traditional prediction methods and the loss of important intermolecular information in machine-learning prediction methods, makes full use of existing ligand conformation information, and greatly improves the accuracy and efficiency of ligand docking pose prediction. It can quickly retrieve the optimal docking pose from massive ligand structure data and plays an important role in computer-aided drug discovery and design.
Drawings
FIG. 1 is a schematic diagram of the overall flow of the method of the invention;
FIG. 2 is a schematic diagram of the conformation preprocessing flow for ligand docking poses according to the invention;
FIG. 3 is a schematic diagram of the retrieval query flow for ligand docking poses according to the invention.
Detailed Description
The invention is further illustrated below with reference to the drawings and specific embodiments. It should be understood that these embodiments merely illustrate the invention and do not limit its scope; after reading the invention, modifications of equivalent form made by those skilled in the art fall within the scope defined by the appended claims.
The invention provides a virtual screening method for ligand docking poses based on fast retrieval which, as shown in FIG. 1, comprises the following steps:
S1: preprocessing the ligand conformation data and building an index table to obtain a search-tree spatial index structure;
S2: inputting a known active ligand conformation as the query, using the search-tree spatial index structure to quickly retrieve and screen candidate conformations similar to the query conformation, and taking the k most similar query results as the top-k conformation result;
S3: evaluating the retrieved top-k conformation result by outputting and comparing the actual RMSD values between the top-k candidate conformations and the native conformation, verifying the accuracy of the screening result, and further optimizing the screening strategy.
Referring to FIG. 2, in step S1 of this embodiment the biological information data of the ligand conformations are preprocessed based on the spatial position relationships of the conformations, with the following specific steps:
A1: processing the CASF-2016 ligand docking candidate data set in PDBbind to obtain the biological information of the different conformations of a single ligand, including molecular system information, constituent atom information, bond value information, substructure information and the like, and at the same time obtaining the RMSD values between the different candidate conformations and the native ligand conformation;
A2: extracting the atomic spatial position information of each conformation and converting the corresponding three-dimensional structure into a set of feature points, each of which carries coordinate and type information; then covering all feature points of the conformation with a minimum bounding box in space;
A3: building a search tree from the feature point set according to a hierarchical structure, in which each leaf node stores one feature point and each non-leaf node stores the minimum bounding box of its child nodes.
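The tree described in step A3 can be sketched as follows. This is a minimal Python illustration under stated assumptions: boxes are axis-aligned, each node has at most two children, and the set is split at the median of its widest axis. That split rule is only one possible strategy, chosen here for brevity; the names `Node`, `bounding_box` and `build_tree` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]


@dataclass
class Node:
    box: Box                               # minimum bounding box of the subtree
    point: Optional[Point] = None          # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)


def bounding_box(points: List[Point]) -> Box:
    lo = tuple(min(p[a] for p in points) for a in range(3))
    hi = tuple(max(p[a] for p in points) for a in range(3))
    return lo, hi


def build_tree(points: List[Point]) -> Node:
    """Leaves hold one feature point; inner nodes hold the box of their children."""
    box = bounding_box(points)
    if len(points) == 1:
        return Node(box=box, point=points[0])
    lo, hi = box
    axis = max(range(3), key=lambda a: hi[a] - lo[a])   # widest axis
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                             # median split (assumed rule)
    return Node(box=box, children=[build_tree(ordered[:mid]),
                                   build_tree(ordered[mid:])])


def count_leaves(node: Node) -> int:
    if node.point is not None:
        return 1
    return sum(count_leaves(child) for child in node.children)


# Illustrative feature points (made-up coordinates, type labels omitted).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 1.0, 0.0), (5.0, 1.0, 2.0)]
root = build_tree(points)
```

Swapping the body of `build_tree` for a different partition rule is the place where the splitting strategies discussed in step C2 would plug in.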
Referring to FIG. 3, step S2 of this embodiment specifically includes the following steps:
B1: extracting the atomic spatial position information of the known active ligand conformation and converting its three-dimensional structure into a set of feature points, each of which carries coordinate and type information;
B2: searching the established search tree with the feature point set of the known active ligand conformation, matching it hierarchically against the candidate conformations to be screened, and calculating the RMSD value, which serves as the similarity score, from the number and distances of the matched feature points;
in this step, the search of the established search tree with the feature point set of the known active ligand conformation includes a top-down search and a bottom-up search;
the top-down search is specifically as follows: starting from the root node, the region containing the instance is first located by the retrieval method; the region is then further divided according to the attributes of the target to find the region at the next level; this is iterated level by level until the conformation output result is obtained;
the bottom-up search is specifically as follows: the atomic position relationships are first determined from the preprocessed data, and different instances are then distinguished by means of clustering and metric learning.
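The top-down search can be illustrated with a small branch-and-bound descent. The Python sketch below assumes axis-aligned bounding boxes and a tree written out by hand as nested dictionaries (keys `box`, `children`, `point`, all hypothetical); it finds the stored feature point closest to a query point and skips any region whose box cannot contain a closer point. It does not cover the bottom-up search, whose clustering and metric learning steps the text does not detail.

```python
import math


def box_distance(box, q):
    """Smallest Euclidean distance from point q to an axis-aligned box."""
    lo, hi = box
    return math.sqrt(sum(max(lo[a] - q[a], 0.0, q[a] - hi[a]) ** 2
                         for a in range(3)))


def nearest_point(node, q, best=(math.inf, None)):
    """Descend from the root, nearest region first, pruning hopeless regions."""
    if box_distance(node["box"], q) >= best[0]:
        return best                                  # region cannot improve the result
    if "point" in node:                              # leaf: compare the stored point
        return min(best, (math.dist(node["point"], q), node["point"]))
    for child in sorted(node["children"],
                        key=lambda c: box_distance(c["box"], q)):
        best = nearest_point(child, q, best)
    return best


def leaf(p):
    return {"box": (p, p), "point": p}


# Hand-built two-level tree over four illustrative feature points.
tree = {
    "box": ((0.0, 0.0, 0.0), (5.0, 1.0, 2.0)),
    "children": [
        {"box": ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
         "children": [leaf((0.0, 0.0, 0.0)), leaf((1.0, 0.0, 0.0))]},
        {"box": ((4.0, 1.0, 0.0), (5.0, 1.0, 2.0)),
         "children": [leaf((4.0, 1.0, 0.0)), leaf((5.0, 1.0, 2.0))]},
    ],
}
distance, point = nearest_point(tree, (0.9, 0.1, 0.0))
```

For this query the second subtree is never opened, because the distance from the query to its bounding box already exceeds the best distance found in the first subtree; this pruning is what narrows the search range.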
This embodiment adopts a similarity score calculation method based on spatial position relationships, specifically as follows:
the similarity score of a candidate conformation based on spatial position relationships is calculated by formula (1):
[Formula (1): shown as image BDA0004151310110000051 in the original publication]
where x_ij denotes all points in the neighborhood of x_j, the j-th conformation in the top-k results; y_i denotes all points in the neighborhood of y, the native conformation; and n is the total number of atoms in a single ligand. The distance between points in formula (1) is defined as the deviation, or deviation error, of the candidate conformation; the smaller the RMSD value, the closer the candidate conformation is to the known active conformation;
to minimize the value of formula (1), i.e. to obtain the docking pose closest to the native conformation, the minimum is taken over the computed set, and the resulting conformation is the one with the smallest deviation in the candidate conformation library; after the conformation with the smallest deviation has been found in the search tree, the search backtracks upward to find the conformation with the next-smallest deviation error, and iterates until the top-k conformation query results are output.
B3: sorting the candidate conformations to be screened by similarity score and selecting the best-scoring candidates (those with the smallest RMSD) as candidate ligand conformations, to obtain the top-k docking pose result.
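The scoring and ranking of steps B2 and B3 can be sketched in Python as follows. The sketch assumes that every candidate lists its atoms in the same order as the reference conformation and that no superposition is applied before the distances are taken; the pose names and coordinates are made up, and the index-based search is replaced by a plain scan to keep the example short.

```python
import heapq
import math


def rmsd(candidate, reference):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(candidate) != len(reference):
        raise ValueError("conformations must contain the same number of atoms")
    n = len(reference)
    return math.sqrt(sum(math.dist(a, b) ** 2
                         for a, b in zip(candidate, reference)) / n)


def top_k_poses(candidates, reference, k):
    """Return the k (rmsd, pose_id) pairs with the smallest RMSD, best first."""
    scored = ((rmsd(coords, reference), pose_id)
              for pose_id, coords in candidates.items())
    return heapq.nsmallest(k, scored)


# Illustrative two-atom ligand and three candidate poses.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
candidates = {
    "pose_a": [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],   # shifted 0.1 along x
    "pose_b": [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],   # shifted 1.0 along y
    "pose_c": [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)],   # shifted 0.5 along z
}
top2 = top_k_poses(candidates, reference, k=2)
```

Because a smaller RMSD means a closer pose, `heapq.nsmallest` is used rather than a descending sort; the result lists `pose_a` first and `pose_c` second.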
The specific process of step S3 in this embodiment is as follows:
C1: evaluating the screening result: calculating the RMSD values of all candidate conformations, sorting them to obtain the k conformations with the smallest RMSD, comparing these with the top-k results obtained by retrieval, calculating the accuracy, and comparing the time consumed;
C2: optimizing the screening strategy of the search tree: optimizing the splitting strategy, including linear splitting, binary splitting, quadtree splitting and the like, rebuilding the index entries of the search tree, and re-evaluating, so that the strategy with the highest accuracy is obtained through repeated experiments.
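The evaluation in step C1 can be sketched as below. The patent does not define "accuracy" precisely; this Python sketch assumes it is the fraction of the retrieved top-k poses that also belong to the exact k poses with the smallest RMSD. The helper names, pose identifiers and RMSD values are hypothetical.

```python
import time


def top_k_accuracy(retrieved_ids, rmsd_by_id, k):
    """Share of the retrieved top-k that is in the exact k smallest-RMSD set."""
    exact = sorted(rmsd_by_id, key=rmsd_by_id.get)[:k]
    return len(set(retrieved_ids[:k]) & set(exact)) / k


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds) for time comparisons."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Made-up ground-truth RMSD values and a made-up retrieval result.
rmsd_by_id = {"p1": 0.4, "p2": 1.9, "p3": 0.8, "p4": 3.2, "p5": 1.1}
retrieved = ["p1", "p3", "p2"]
accuracy, elapsed = timed(top_k_accuracy, retrieved, rmsd_by_id, k=3)
```

Here the exact top-3 is p1, p3, p5, so two of the three retrieved poses are correct. Repeating this measurement after rebuilding the tree with each splitting strategy gives the comparison that step C2 calls for.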
This embodiment also provides a virtual screening system for ligand docking poses based on fast retrieval, comprising a network interface, a memory and a processor. The network interface is used for receiving and sending signals while exchanging information with other external network elements; the memory stores computer program instructions executable on the processor; and the processor, when executing the computer program instructions, performs the steps of the method described above.
The present embodiment also provides a computer storage medium storing a computer program which, when executed by a processor, implements the method described above. The computer-readable medium may be considered tangible and non-transitory. Non-limiting examples of non-transitory tangible computer readable media include non-volatile memory circuits (e.g., flash memory circuits, erasable programmable read-only memory circuits, or masked read-only memory circuits), volatile memory circuits (e.g., static random access memory circuits or dynamic random access memory circuits), magnetic storage media (e.g., analog or digital magnetic tape or hard disk drives), and optical storage media (e.g., CDs, DVDs, or blu-ray discs), among others. The computer program includes processor-executable instructions stored on at least one non-transitory tangible computer-readable medium. The computer program may also include or be dependent on stored data. The computer programs may include a basic input/output system (BIOS) that interacts with the hardware of the special purpose computer, device drivers that interact with particular devices of the special purpose computer, one or more operating systems, user applications, background services, background applications, and so forth.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (8)

1. A virtual screening method for ligand docking poses based on fast retrieval, characterized by comprising the following steps:
S1: preprocessing the ligand conformation data and building an index table to obtain a search-tree spatial index structure;
S2: inputting a known active ligand conformation as the query, using the search-tree spatial index structure to quickly retrieve and screen candidate conformations similar to the query conformation, and taking the k most similar query results as the top-k conformation result;
S3: evaluating the retrieved top-k conformation result by outputting and comparing the actual RMSD values between the top-k candidate conformations and the native conformation, verifying the accuracy of the screening result, and further optimizing the screening strategy.
2. The virtual screening method for ligand docking poses based on fast retrieval according to claim 1, characterized in that in step S1 the biological information data of the ligand conformations are preprocessed based on the spatial position relationships of the conformations, with the following specific steps:
A1: processing the ligand docking candidate data set to obtain the biological information of the different conformations of a single ligand, and at the same time obtaining the RMSD values between the different candidate conformations and the native ligand conformation;
A2: extracting the atomic spatial position information of each conformation and converting the corresponding three-dimensional structure into a set of feature points, each of which carries coordinate and type information; then covering all feature points of the conformation with a minimum bounding box in space;
A3: building a search tree from the feature point set according to a hierarchical structure, in which each leaf node stores one feature point and each non-leaf node stores the minimum bounding box of its child nodes.
3. The virtual screening method for ligand docking poses based on fast retrieval according to claim 2, characterized in that the biological information of the different conformations of a single ligand in step A1 includes molecular system information, constituent atom information, bond value information and substructure information.
4. The ligand docking pose virtual screening method based on rapid retrieval according to claim 1, wherein step S2 specifically comprises the following steps:
B1: extracting the atomic spatial position information in the known active ligand conformation and converting its three-dimensional structure into a set of feature points, wherein each feature point comprises coordinate and type information;
B2: searching for the feature point set of the known active ligand conformation in the established search tree, performing hierarchical similarity matching against the candidate conformations to be screened, and calculating an RMSD value, which serves as the similarity score, from the number of matched feature points and the distances between them;
B3: ranking the candidate conformations to be screened by similarity score and selecting the best-scoring candidates, that is, those most similar to the query, as the candidate ligand conformations, thereby obtaining the top-k docking pose result.
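The scoring and ranking in steps B2 and B3 can be sketched as follows. This is a brute-force scan that stands in for the tree traversal, and it assumes that every candidate lists its atoms in the same order as the query; the function names `rmsd` and `top_k` and the dictionary layout are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(a) != len(b) or not a:
        raise ValueError("conformations must be non-empty and of equal size")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def top_k(query: Sequence[Coord],
          candidates: Dict[str, Sequence[Coord]],
          k: int) -> List[Tuple[str, float]]:
    """Return the k (id, RMSD) pairs with the smallest RMSD to the query."""
    scored = ((cid, rmsd(query, coords)) for cid, coords in candidates.items())
    return heapq.nsmallest(k, scored, key=lambda item: item[1])
```

A smaller RMSD means a more similar conformation, so the "best" scores in step B3 are the lowest values returned here.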
5. The ligand docking pose virtual screening method based on rapid retrieval according to claim 4, wherein in step B2 the feature points of the known active ligand conformation are searched for in the established search tree, and the search comprises a top-down search and a bottom-up search;
the top-down search is specifically as follows: starting from the root node, the region containing the instance is first located by the retrieval method; that region is then further divided according to the attributes of the target to locate the region at the next level; this is iterated level by level until the conformation output result is finally obtained;
the bottom-up search is specifically as follows: the atomic position relationships are first determined from the preprocessed data, and different instances are then distinguished by means of clustering and metric learning.
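Only the top-down search is sketched here, and only under assumptions: the claim does not name a traversal order, so the best-first descent below is one possible choice, and the dictionary-based node layout (`"box"`, `"children"`, `"point"`) and the function names are hypothetical. The bottom-up variant, which relies on clustering and metric learning, is not illustrated.

```python
import heapq
import itertools
import math


def box_distance(q, box):
    """Distance from point q to an axis-aligned box (0 if q lies inside)."""
    lo, hi = box
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, lo, hi)))


def nearest_points(root, q, k):
    """Descend from the root, always expanding the node whose bounding box
    is closest to q, until k leaf points have been reported."""
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(box_distance(q, root["box"]), next(tie), root)]
    found = []
    while heap and len(found) < k:
        dist, _, node = heapq.heappop(heap)
        if "point" in node:
            # A leaf box is degenerate, so dist is the exact point distance.
            found.append((node["point"], dist))
        else:
            for child in node["children"]:
                heapq.heappush(
                    heap, (box_distance(q, child["box"]), next(tie), child))
    return found
```

Because the distance to a bounding box never exceeds the distance to any point inside it, leaves are reported in order of increasing distance and subtrees that are far from the query are never opened.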
6. The ligand docking pose virtual screening method based on rapid retrieval according to claim 4, wherein step B2 uses a similarity score calculation method based on spatial position relationships, specifically:
the similarity score of a candidate conformation based on spatial position relationships is calculated by the following formula:
[Formula (1): image FDA0004151310100000021, not reproduced in this text]
wherein x_ij denotes the points in the neighborhood of x_j, the j-th conformation in the top-k results; y_i denotes the points in the neighborhood of the native conformation y; n is the total number of atoms contained in a single ligand; and the distance between points in formula (1) is defined as the deviation, or deviation error, of the candidate conformation;
to minimize the value of formula (1), that is, to obtain the docking pose closest to the native conformation, the minimum is taken over the computed set, and the conformation so obtained is the one with the smallest deviation in the candidate conformation library; after the conformation with the smallest deviation has been found in the search tree, the search backtracks upward to find the conformation with the smallest deviation among the remaining conformations, and iterates until the top-k conformation query results are output.
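Formula (1) survives in this text only as an image reference. Assuming it is the standard root-mean-square deviation over the n atoms of the ligand, which is what the symbol definitions in this claim suggest but do not confirm, it would take the following form, with the first retrieved conformation being the minimizer over the candidate library:

```latex
\mathrm{RMSD}_j \;=\; \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left\lVert x_{ij} - y_i \right\rVert^{2}},
\qquad
j^{*} \;=\; \arg\min_{j}\,\mathrm{RMSD}_j
```

Under this reading, repeating the minimization over the conformations not yet selected yields the second, third, and subsequent results until k conformations have been output.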
7. The ligand docking pose virtual screening method based on rapid retrieval according to claim 1, wherein step S3 specifically comprises:
C1: evaluating the screening result: calculating the RMSD values of all candidate conformations, sorting them to obtain the k conformations with the smallest RMSD, comparing these with the top-k result obtained by retrieval, calculating the accuracy, and comparing the time consumed;
C2: optimizing the screening strategy of the search tree: optimizing the splitting strategy, rebuilding the index entries of the search tree, re-evaluating, and determining through repeated experiments the strategy with the highest accuracy.
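The accuracy comparison in step C1 can be sketched as the overlap between the retrieved top-k and the exact top-k obtained by sorting all RMSD values. The claim does not define the accuracy metric, so the overlap fraction used here is an assumption, as are the function names and the identifier-to-RMSD dictionary.

```python
from typing import Dict, List


def exact_top_k(rmsd_by_id: Dict[str, float], k: int) -> List[str]:
    """Ids of the k conformations with the smallest RMSD (full sort)."""
    return sorted(rmsd_by_id, key=rmsd_by_id.get)[:k]


def top_k_accuracy(retrieved_ids: List[str],
                   rmsd_by_id: Dict[str, float],
                   k: int) -> float:
    """Fraction of the exact top-k that also appears in the retrieved top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_top_k(rmsd_by_id, k))
    return len(truth & set(retrieved_ids[:k])) / k
```

The same score can be recomputed after each change of splitting strategy in step C2 to compare the strategies on equal terms.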
8. The ligand docking pose virtual screening method based on rapid retrieval according to claim 7, wherein the optimized splitting strategy in step C2 comprises linear splitting, binary splitting and quadtree splitting.
CN202310319885.0A 2023-03-29 2023-03-29 Ligand docking gesture virtual screening method based on quick retrieval Pending CN116246696A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310319885.0A CN116246696A (en) 2023-03-29 2023-03-29 Ligand docking gesture virtual screening method based on quick retrieval

Publications (1)

Publication Number Publication Date
CN116246696A true CN116246696A (en) 2023-06-09

Family

ID=86624314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310319885.0A Pending CN116246696A (en) 2023-03-29 2023-03-29 Ligand docking gesture virtual screening method based on quick retrieval

Country Status (1)

Country Link
CN (1) CN116246696A (en)

Similar Documents

Publication Publication Date Title
Hochheiser et al. Dynamic query tools for time series data sets: timebox widgets for interactive exploration
dos Santos et al. Hierarchical density-based clustering using MapReduce
Zhang et al. Protein complex prediction in large ontology attributed protein-protein interaction networks
Dehdouh Building OLAP cubes from columnar NoSQL data warehouses
Xu et al. An effective approach to detecting both small and large complexes from protein-protein interaction networks
Khan et al. Predictive performance comparison analysis of relational & NoSQL graph databases
Battle et al. A structured review of data management technology for interactive visualization and analysis
Bian et al. MCANet: shared-weight-based MultiheadCrossAttention network for drug–target interaction prediction
Jalili et al. Next generation indexing for genomic intervals
KR20090069874A (en) Method of selecting keyword and similarity coefficient for knowledge map analysis, and system thereof and media that can record computer program sources for method therof
Sun et al. A scalable and flexible basket analysis system for big transaction data in Spark
ur Rehman et al. Multi-dimensional scaling based grouping of known complexes and intelligent protein complex detection
Faridoon et al. Big Data Storage Tools Using NoSQL Databases and Their Applications in Various Domains: A Systematic Review.
Shumaila A comparison of k-means and mean shift algorithms
CN116246696A (en) Ligand docking gesture virtual screening method based on quick retrieval
Samaddar et al. A model for distributed processing and analyses of NGS data under map-reduce paradigm
Vespa et al. Efficient bulk-loading on dynamic metric access methods
Nanni et al. Exploring genomic datasets: From batch to interactive and back
Zhang et al. A Multi-perspective Model for Protein–Ligand-Binding Affinity Prediction
CN102411572B (en) Efficient sharing method for biomolecular data
Srivastava et al. Multi Minimum Product Spanning Tree Based Indexing Approach for Content Based Retrieval of Bio Images
CN105930463A (en) Cloud computing platform based big data processing method
CN117373564B (en) Method and device for generating binding ligand of protein target and electronic equipment
Bijral et al. Hierarchical clustering based characterization of protein database using molecular dynamic simulation
Riba et al. Error-tolerant coarse-to-fine matching model for hierarchical graphs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination