CN112185463A - Protein feature extraction method and device and computer equipment - Google Patents

Protein feature extraction method and device and computer equipment

Info

Publication number
CN112185463A
Authority
CN
China
Prior art keywords
protein
ligand
drug
ligand molecule
default
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010998666.6A
Other languages
Chinese (zh)
Inventor
丁康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wangshi Intelligent Technology Co ltd
Original Assignee
Beijing Wangshi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wangshi Intelligent Technology Co ltd filed Critical Beijing Wangshi Intelligent Technology Co ltd
Priority to CN202010998666.6A priority Critical patent/CN112185463A/en
Publication of CN112185463A publication Critical patent/CN112185463A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Biotechnology (AREA)
  • Genetics & Genomics (AREA)
  • Biophysics (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Computing Systems (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a protein feature extraction method, a device and computer equipment. The protein feature extraction method comprises the following steps: obtaining the chemical structure of the protein ligand molecule; determining the drug-like properties of the protein ligand molecules according to the chemical structures, and selecting a default ligand according to the drug-like properties of the protein ligand molecules, wherein the default ligand represents the ligand molecule with the best drug-like property; and calculating characteristic information of the drug binding pocket of the protein ligand molecule from the default ligand. By processing the chemical structures of protein ligand molecules obtained from a protein database, the invention can rapidly obtain protein features, which facilitates subsequent virtual screening in computer-aided drug design. The method is simple to operate, does not depend on manual work by experienced operators, and reduces the manpower and time consumed.

Description

Protein feature extraction method and device and computer equipment
Technical Field
The invention relates to the technical field of biology, in particular to a protein feature extraction method, a protein feature extraction device and computer equipment.
Background
In computer-aided drug design, structure-based virtual screening relies mainly on the accuracy of the structures of biological macromolecules such as proteins and nucleic acids. Biomacromolecule structures are currently obtained mainly by three experimental methods. All three are highly reliable, but structure determination requires extremely expensive equipment and months of work, and the resolved structures are generally deposited in the Worldwide Protein Data Bank (wwPDB). Because the accuracy of early experimental data is limited, or because a resolved structure may not meet certain drug research and development requirements, every biomacromolecule structure needs to be reasonably preprocessed before subsequent research work.
In the related art, traditional software such as the Schrödinger molecular simulation suite, the Discovery Studio drug discovery and biomacromolecule simulation platform, Molecular Operating Environment (MOE), or the molecular modeling tool UCSF Chimera requires a user to manually extract features from ligand-containing proteins, and the processing must be done by an operator with some experience, which wastes human resources. A protein feature extraction method is therefore urgently needed that acquires protein features automatically and reduces the manpower and time consumed.
Disclosure of Invention
Therefore, the technical problem to be solved by the present invention is to overcome the defects that the prior art needs a user to manually process the protein, and the manpower and time are wasted, thereby providing a protein feature extraction method, device and computer equipment.
According to a first aspect, the embodiment of the invention discloses a protein feature extraction method, which comprises the following steps: obtaining the chemical structure of the protein ligand molecule; determining the drug-like properties of the protein ligand molecules according to the chemical structures, and selecting a default ligand according to the drug-like properties of the protein ligand molecules, wherein the default ligand represents the ligand molecule with the best drug-like property; calculating information characteristic of the drug binding pocket of the protein ligand molecule from the default ligand.
Optionally, the obtaining a chemical structure of a protein ligand molecule comprises: acquiring index information of the protein corresponding to the protein ligand molecule in a protein database; acquiring a structure file of the protein in the protein database according to the index information; in the structure file, the chemical structure of the protein ligand molecule is obtained.
Optionally, the determining the drug-like nature of the protein ligand molecule from the chemical structure comprises: obtaining a target property feature of the chemical structure; and determining the drug-like property of the protein ligand molecule according to the target attribute feature and a preset attribute feature evaluation value.
Optionally, said calculating information characteristic of the drug binding pocket of said protein ligand molecule from said default ligand comprises: acquiring the maximum value and the minimum value of the default ligand in each direction coordinate in a preset coordinate system to obtain an initial drug binding pocket; constructing a corresponding space range of the initial drug binding pocket according to the maximum value and the minimum value of the coordinates in each direction, and expanding the space range to obtain the drug binding pocket; and calculating characteristic information of the drug binding pocket of the protein ligand molecule according to the maximum value and the minimum value of the coordinates of each direction of the drug binding pocket.
Optionally, the method further comprises: obtaining a structure file of a protein corresponding to the protein ligand molecule; completing amino acids in the structural file of the protein; mutating the supplemented amino acids to obtain a protein structure containing standard amino acids; removing non-drug small molecules from the protein structure comprising standard amino acids; carrying out hydrogenation operation on the protein structure without the non-drug small molecules based on a preset pH value; and (3) carrying out energy minimization treatment on the protein structure after the hydrogenation operation to obtain a protein structure file.
Optionally, after the energy minimization is performed on the protein structure after the hydrogenation operation to obtain a structure file of the target protein, the method further comprises: converting the structure file of the protein into a preset format according to the data storage condition and storing it in a corresponding database.
According to a second aspect, an embodiment of the present invention further discloses a protein feature extraction device, including: a first obtaining module for obtaining a chemical structure of a protein ligand molecule; a selection module for determining the drug-like properties of the protein ligand molecules according to the chemical structure and selecting a default ligand according to the drug-like properties of the protein ligand molecules, the default ligand representing the ligand molecule with the best drug-like properties; a first calculation module for calculating characteristic information of the drug binding pocket of the protein ligand molecule from the default ligand.
Optionally, the first obtaining module includes: the first acquisition submodule is used for acquiring index information of the protein corresponding to the protein ligand molecule in a protein database; the second acquisition module is used for acquiring the structure file of the protein in the protein database according to the index information; and the third acquisition module is used for acquiring the chemical structure of the protein ligand molecule in the structure file.
According to a third aspect, an embodiment of the present invention further discloses a computer device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the protein feature extraction method of the first aspect or any one of the optional embodiments of the first aspect.
According to a fourth aspect, the embodiments of the present invention further disclose a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the protein feature extraction method according to the first aspect or any one of the alternative embodiments of the first aspect.
The technical scheme of the invention has the following advantages:
according to the protein feature extraction method and device provided by the invention, the chemical structure of the protein ligand molecule is obtained, the drug-like property of the protein ligand molecule is determined according to the chemical structure, the default ligand is selected according to the drug-like property of the protein ligand molecule, the default ligand represents the ligand molecule with the best drug-like property, and the feature information of the drug binding pocket of the protein ligand molecule is calculated according to the default ligand. The invention can rapidly obtain protein characteristics by ligand processing the chemical structure of the protein ligand molecules obtained from the protein database, is convenient for subsequent virtual screening in computer-aided drug design, has simple operation, does not need to depend on the manual operation of experienced operators, and reduces the manpower and time consumption.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flowchart illustrating a specific example of a protein feature extraction method according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a specific example of a protein feature extraction apparatus according to an embodiment of the present invention;
FIG. 3 is a diagram of an embodiment of a computer device.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly. For example, a connection may be a fixed connection, a removable connection, or an integral connection; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; it may be internal communication between two elements; and it may be wireless or wired. Those skilled in the art can understand the specific meanings of the above terms in the present invention according to the specific case.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In computer-aided drug design, structure-based virtual screening relies mainly on the accuracy of the structures of biological macromolecules such as proteins and nucleic acids, and screens mainly on the basis of the drug binding pocket information of protein ligand molecules and the structure of the protein. However, because the accuracy of early experimental data is limited, protein structures extracted from and stored in a protein database may not meet drug research and development requirements. An embodiment of the invention therefore discloses a protein feature extraction method that processes the protein structure to facilitate subsequent research and development work. As shown in FIG. 1, the method comprises the following steps:
S11: obtaining the chemical structure of the protein ligand molecule.
Illustratively, the protein ligand molecule is any molecule that can bind to a protein, the main binding modes being loop-domain binding and hinge-type binding. The chemical structure of the protein ligand molecule may be obtained directly from a protein database or extracted from a protein structure file. The embodiment of the invention does not limit how the chemical structure of the protein ligand molecule is obtained, and a person skilled in the art can choose according to the actual situation.
S12: determining the drug-like property of the protein ligand molecule according to the chemical structure, and selecting a default ligand according to the drug-like property of the protein ligand molecule, wherein the default ligand represents the ligand molecule with the best drug-like property.
Illustratively, a drug-like property refers to the similarity of a protein ligand molecule to known drugs. Compounds with drug-like properties are not drugs, but have the potential to become drugs. The drug-like property of the protein ligand molecule may be determined from the chemical structure by an exclusion method, for example excluding molecules whose molecular characteristics cause toxicity or reduced bioavailability; by a method that imitates known drugs, that is, evaluating the protein ligand molecule against favorable features found in the chemical structures of known drugs; or by a quantitative drug-likeness evaluation method.
The drug-like properties of the protein ligand molecules are compared, the protein ligand molecule with the best drug-like property is selected as the default ligand, and the default ligand is then processed.
S13: information characterizing the drug binding pocket of the protein ligand molecule is calculated from the default ligand.
Illustratively, the drug binding pocket is a region to which a ligand may bind, and may be generally pocket-like in shape, but may also be other shapes, such as a channel, a groove, and the like. The characteristic information of the drug binding pocket includes the center coordinates of the drug binding pocket and the length, width and height of the drug binding pocket. These several data may represent the drug binding pocket. Calculating the characteristic information of the drug binding pocket of the protein ligand molecule according to the default ligand may be calculated according to the maximum value and the minimum value of coordinates of the default ligand in each direction in a preset coordinate system.
According to the protein feature extraction method provided by the invention, the chemical structure of the protein ligand molecule is obtained, the drug-like property of the protein ligand molecule is determined according to the chemical structure, the default ligand is selected according to the drug-like property of the protein ligand molecule, the default ligand represents the ligand molecule with the best drug-like property, and the feature information of the drug binding pocket of the protein ligand molecule is calculated according to the default ligand. The invention can rapidly obtain protein characteristics by ligand processing the chemical structure of the protein ligand molecules obtained from the protein database, is convenient for subsequent virtual screening in computer-aided drug design, has simple operation, does not need to depend on the manual operation of experienced operators, and reduces the manpower and time consumption.
As an optional implementation manner of the embodiment of the present invention, the step S11 includes:
first, index information of a protein corresponding to a protein ligand molecule in a protein database is obtained.
Illustratively, the protein database may be the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), the Protein Data Bank in Europe (PDBe), or the Protein Data Bank Japan (PDBj). All of these store structure files of proteins, and their servers provide different interfaces. The index information may be the PDB number of the protein, which can be obtained by entering the name of the protein directly into a relevant search engine.
Next, a structure file of the protein is obtained in the protein database based on the index information.
Illustratively, the index information is input into the processor, and the processor calls the structure file of the protein, which is a file in PDB format, directly from the corresponding protein database according to the index information.
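As a minimal sketch of this lookup step, the following Python fragment builds a download URL from a PDB number and fetches the PDB-format text. The function names and the choice of the RCSB file server URL pattern are assumptions made for illustration; the description does not specify which interface is called.

```python
from urllib.request import urlopen

# Assumption: PDB-format files are fetched from the RCSB file server.
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def structure_url(pdb_id: str) -> str:
    """Build the download URL for a four-character PDB number."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a valid PDB number: {pdb_id!r}")
    return RCSB_URL.format(pdb_id=pdb_id)


def fetch_structure(pdb_id: str, timeout: float = 30.0) -> str:
    """Download the structure file and return its text (needs network access)."""
    with urlopen(structure_url(pdb_id), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Separating the URL construction from the download keeps the validation testable without network access.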
Again, in the structure file, the chemical structure of the protein ligand molecule is obtained.
Illustratively, each protein has zero or more protein ligand molecules, and this embodiment only considers the case in which protein ligand molecules are present. When the protein contains a protein ligand molecule, its chemical structure may be obtained from the structure file as follows: index information of the protein ligand molecule, which may be its Chemical Component Dictionary (CCD) number, is obtained from the protein database, and the chemical structure of the protein ligand molecule is extracted from the structure file of the protein according to the CCD number.
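The extraction step can be sketched as a scan of the HETATM records in a PDB-format file, grouping atom coordinates by the three-letter residue name in columns 18 to 20. The function name and the exclusion list below are hypothetical, and a real pipeline would recover bond information as well as coordinates.

```python
from collections import defaultdict

# Hypothetical exclusion list: waters and a few common ions and solvents.
NON_LIGAND = {"HOH", "WAT", "NA", "CL", "SO4", "GOL"}


def extract_ligands(pdb_text: str) -> dict:
    """Group HETATM coordinates by (residue name, chain, residue number)."""
    ligands = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()          # PDB columns 18-20
        if res_name in NON_LIGAND:
            continue
        key = (res_name, line[21], int(line[22:26]))
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        ligands[key].append(xyz)
    return dict(ligands)
```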
By implementing the method, a user can automatically obtain the structure of the protein ligand molecule only by providing the PDB number of the protein to be processed, other additional operations are not needed, and human resources are saved. Meanwhile, the embodiment of the invention links a plurality of PDB databases with different emphasis, integrates multi-party data and can obtain the latest protein information.
As an optional implementation manner of this embodiment, the step S12 includes:
first, a target property feature of the chemical structure is obtained.
Illustratively, the target attribute features may include molecular weight, lipid-water partition coefficient (logP), topological surface area, number of hydrogen bond acceptors and donors, number of aromatic rings, number of rotatable bonds, number of toxic chemical fragments, and so on. These target attribute features can be obtained directly from the chemical structure of the protein ligand molecule.
And secondly, determining the drug-like property of the protein ligand molecule according to the target attribute characteristics and the preset attribute characteristic evaluation value.
Illustratively, the preset attribute feature evaluation values may be set in advance according to expert experience and stored in the processor. For example, the preset evaluation value of the molecular weight is 300 to 500; the preset evaluation value of the lipid-water partition coefficient is 5; the preset evaluation value of hydrogen bond acceptors is 5; and the preset evaluation value of hydrogen bond donors is 5 to 10. Corresponding preset evaluation values can likewise be set for the other attribute features. The drug-like property of a protein ligand molecule is determined according to whether its target attribute features fall within the corresponding preset evaluation value ranges. Compared with other methods of calculating drug-like properties, this directly yields a result indicating whether the molecule is drug-like.
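A range check of this kind, followed by selection of the default ligand, might look like the sketch below. The ranges are placeholder values loosely modelled on Lipinski's rule of five rather than copied from the text, and scoring a ligand by the number of attributes within range is an assumption; the description only states that the ligand with the best drug-like property is chosen.

```python
# Placeholder ranges (low, high); None means unbounded on that side.
PRESET_RANGES = {
    "molecular_weight": (300.0, 500.0),
    "logp": (None, 5.0),
    "h_bond_acceptors": (None, 10),
    "h_bond_donors": (None, 5),
}


def drug_likeness_score(features: dict) -> int:
    """Count how many attribute features fall inside their preset range."""
    score = 0
    for name, (low, high) in PRESET_RANGES.items():
        value = features[name]
        if (low is None or value >= low) and (high is None or value <= high):
            score += 1
    return score


def select_default_ligand(ligands: dict) -> str:
    """Return the identifier of the ligand with the highest score."""
    return max(ligands, key=lambda lig_id: drug_likeness_score(ligands[lig_id]))
```

With ties, `max` returns the first ligand encountered, so a real implementation would need an explicit tie-breaking rule.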
As an optional implementation manner of this embodiment, the step S13 includes:
firstly, acquiring the maximum value and the minimum value of each direction coordinate of a default ligand in a preset coordinate system to obtain an initial drug binding pocket.
For example, the preset coordinate system may be a stereo coordinate system set in advance in a unified standard. The maximum value and the minimum value of each direction coordinate x, y and z can be obtained from a structure file of the protein or a protein database. And taking the area enclosed by the maximum value and the minimum value of each direction coordinate as the space range of the initial drug binding pocket.
And secondly, constructing a corresponding space range of the initial drug binding pocket according to the maximum value and the minimum value of the coordinate in each direction, and expanding the space range to obtain the drug binding pocket.
For example, the region enclosed by the maximum and minimum values of the coordinates in each direction is taken as the spatial range corresponding to the initial drug binding pocket, and this spatial range is expanded by a preset distance to obtain the complete drug binding pocket. In an embodiment of the present invention, the preset distance may be 5 angstroms.
Thirdly, characteristic information of the drug binding pocket of the protein ligand molecule is calculated according to the maximum value and the minimum value of the coordinates of each direction of the drug binding pocket.
Illustratively, the center coordinates c_x, c_y, c_z may be calculated from the maximum and minimum values of the drug binding pocket in the x, y and z directions by averaging the maximum and minimum in each direction. The length, width and height of the drug binding pocket, dim_x, dim_y, dim_z, may be calculated as the difference between the maximum and minimum in each direction. These six numbers are recorded in a database and represent the drug binding pocket.
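The three steps above amount to a padded axis-aligned bounding box around the default ligand. The sketch below assumes the expansion is applied on both sides of each axis, which the text does not state explicitly; the function name and dictionary keys are illustrative.

```python
def pocket_features(coords, padding=5.0):
    """Compute box center and dimensions from ligand atom coordinates.

    coords  -- iterable of (x, y, z) tuples in angstroms
    padding -- distance added on each side of every axis (assumption)
    """
    xs, ys, zs = zip(*coords)
    features = {}
    for axis, values in (("x", xs), ("y", ys), ("z", zs)):
        low = min(values) - padding
        high = max(values) + padding
        features[f"c_{axis}"] = (low + high) / 2.0   # mean of max and min
        features[f"dim_{axis}"] = high - low         # max minus min
    return features
```

Because the padding is symmetric, the center of the expanded box equals the center of the initial box and each dimension grows by twice the padding.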
As an optional implementation manner of this embodiment, the protein feature extraction method further includes:
first, a structural file of a protein corresponding to a protein ligand molecule is obtained. The structural files of the protein are obtained in the above-mentioned related descriptions, and are not described herein again.
Then, the amino acids in the protein structure file are subjected to a completion process.
Illustratively, the missing residues of the amino acid structure are supplemented according to a comparison of the sequence information of the residue sequence (SEQRES) in the structure file of the protein with standard amino acid sequence information in a pre-set database.
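One simple way to detect that residues are missing is to compare the per-chain residue count declared in the SEQRES records with the residues that actually have coordinates. The sketch below counts residues carrying a CA atom and is only a rough check under that assumption; it reports how many residues are missing, not where, and the function names are hypothetical.

```python
def seqres_sequences(pdb_text):
    """Return {chain_id: [three-letter residue names]} from SEQRES records."""
    chains = {}
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            chains.setdefault(line[11], []).extend(line[19:70].split())
    return chains


def observed_residue_counts(pdb_text):
    """Count residues per chain that have a CA atom among the ATOM records."""
    seen = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue = (int(line[22:26]), line[26:27])  # number + insertion code
            seen.setdefault(line[21], set()).add(residue)
    return {chain: len(residues) for chain, residues in seen.items()}


def missing_residue_counts(pdb_text):
    """Residues declared in SEQRES minus residues observed, per chain."""
    expected = seqres_sequences(pdb_text)
    observed = observed_residue_counts(pdb_text)
    return {c: len(seq) - observed.get(c, 0) for c, seq in expected.items()}
```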
Next, the supplemented amino acids are mutated to obtain a protein structure containing standard amino acids.
Illustratively, the supplemented amino acids may be non-standard amino acids, and they are mutated using PDBFixer from the OpenMM project. Specifically, the amino acids are compared with standard amino acid sequences in a preset database, the side-chain atomic coordinates of the supplemented residues are deleted, and the amino acid numbering is changed to the numbering of the mutated amino acids.
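A rough sketch of the completion and mutation steps with PDBFixer is shown below. It follows the usual PDBFixer call sequence, but the wrapper function, its name and its file-path parameters are assumptions, and it is not presented as the exact procedure of the embodiment. It requires the `pdbfixer` and `openmm` packages to be installed.

```python
def complete_and_standardize(in_path: str, out_path: str) -> None:
    """Fill in missing residues/atoms and replace non-standard residues."""
    # Imported lazily so this module still loads where PDBFixer is absent.
    from pdbfixer import PDBFixer
    from openmm.app import PDBFile

    fixer = PDBFixer(filename=in_path)
    fixer.findMissingResidues()          # gaps relative to SEQRES
    fixer.findNonstandardResidues()      # residues with a standard equivalent
    fixer.replaceNonstandardResidues()   # mutate them to standard residues
    fixer.findMissingAtoms()
    fixer.addMissingAtoms()              # builds missing residues and atoms
    with open(out_path, "w") as handle:
        PDBFile.writeFile(fixer.topology, fixer.positions, handle)
```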
Then, non-drug small molecules in the protein structure containing the standard amino acids are removed. The non-drug small molecules may include water molecules, other solvents and some heteroatoms. To ensure the accuracy of the protein structure, non-drug small molecules in the protein structure containing standard amino acids are deleted directly.
Next, a hydrogenation operation is performed, based on a preset pH value, on the protein structure from which the non-drug small molecules have been removed.
Illustratively, the preset pH may be 7.4, and the hydrogenation operation may use OpenMM to add chemically reasonable hydrogen atoms according to the preset pH. For example, the protonation states of the basic amino acids (arginine, lysine, histidine) and the acidic amino acids (glutamic acid, aspartic acid) can be determined from their standard acid dissociation constants (pKa) and the preset pH, and hydrogen atoms can then be added to each amino acid according to its protonation state and the amino acid topology stored in OpenMM.
Finally, energy minimization is performed on the protein structure after the hydrogenation operation to obtain a protein structure file.
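How pKa and pH together determine a protonation state can be illustrated with the Henderson-Hasselbalch relation. The pKa values below are approximate textbook side-chain values supplied as assumptions, and the 0.5 cutoff is an illustrative decision rule, not necessarily the one OpenMM applies.

```python
# Approximate textbook side-chain pKa values (assumed; actual values shift
# with the local environment of each residue).
SIDE_CHAIN_PKA = {"ASP": 3.9, "GLU": 4.2, "HIS": 6.0, "LYS": 10.5, "ARG": 12.5}


def protonated_fraction(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of the group in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def is_protonated(residue: str, ph: float = 7.4) -> bool:
    """Treat the side chain as protonated if more than half the population is."""
    return protonated_fraction(SIDE_CHAIN_PKA[residue], ph) > 0.5
```

Under these assumed values, lysine and arginine are protonated at pH 7.4, while aspartic acid, glutamic acid and histidine are not.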
Illustratively, the above processing may introduce some unreasonable strain, so energy minimization is performed using the OpenMM LocalEnergyMinimizer.minimize function. Energy minimization is typically carried out by the steepest descent method or the conjugate gradient method, where the gradient is the derivative of the potential energy with respect to the coordinates under a given molecular force field (the CHARMM36 force field).
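To make the role of the gradient concrete, the toy example below applies steepest descent to two atoms joined by a single harmonic bond. It is a self-contained illustration of the technique only: the force constant, equilibrium length and step size are arbitrary assumptions, and it uses neither OpenMM nor the CHARMM36 force field.

```python
import math


def bond_energy(p, q, k=100.0, r0=1.5):
    """Harmonic bond energy E = 0.5 * k * (r - r0)^2."""
    r = math.dist(p, q)
    return 0.5 * k * (r - r0) ** 2


def bond_gradient(p, q, k=100.0, r0=1.5):
    """Gradient dE/dp; by symmetry dE/dq is its negative."""
    r = math.dist(p, q)
    scale = k * (r - r0) / r
    return [scale * (a - b) for a, b in zip(p, q)]


def minimize(p, q, step=0.001, tol=1e-6, max_iter=10000):
    """Steepest descent: move each atom against its energy gradient."""
    p, q = list(p), list(q)
    for _ in range(max_iter):
        g = bond_gradient(p, q)
        if math.sqrt(sum(c * c for c in g)) < tol:
            break
        p = [a - step * c for a, c in zip(p, g)]
        q = [b + step * c for b, c in zip(q, g)]
    return p, q
```

Starting from a stretched bond, each iteration shortens the distance toward the equilibrium length until the gradient norm drops below the tolerance.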
As an optional implementation manner of this embodiment, after the energy minimization is performed on the protein structure after the hydrogenation operation to obtain the structure file of the target protein, the protein feature extraction method further includes:
converting the structure file of the protein into a preset format according to the data storage condition and storing it in a corresponding database.
Illustratively, the data storage condition may be a chemical structure format that can be stored in the corresponding database. The preset format may include the SDF format, the MOL2 format, the PDB format, and so on. The corresponding database may be a relational database management system (MySQL). The database is an offline database, so that other servers can call it at any time. A structure may be retrieved by the PDB number of the protein or by its UniProt accession number.
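The storage step can be sketched with Python's built-in sqlite3 module standing in for MySQL, so that the example runs without a database server. The table layout, column names and function names are hypothetical, and the second lookup key is assumed to be an accession-style identifier of the protein.

```python
import sqlite3

ALLOWED_FORMATS = {"SDF", "MOL2", "PDB"}


def open_store(path=":memory:"):
    """Open (or create) the structure table; sqlite3 stands in for MySQL."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS structure ("
        "pdb_id TEXT PRIMARY KEY, accession TEXT, "
        "fmt TEXT NOT NULL, content TEXT NOT NULL)"
    )
    return conn


def save_structure(conn, pdb_id, accession, fmt, content):
    """Store one processed structure file in one of the preset formats."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    conn.execute(
        "INSERT OR REPLACE INTO structure VALUES (?, ?, ?, ?)",
        (pdb_id, accession, fmt, content),
    )
    conn.commit()


def load_structure(conn, key):
    """Look a structure up by PDB number or by accession number."""
    return conn.execute(
        "SELECT fmt, content FROM structure WHERE pdb_id = ? OR accession = ?",
        (key, key),
    ).fetchone()
```

For example, after `save_structure(conn, "1ABC", "P00000", "PDB", text)` with placeholder identifiers, both `load_structure(conn, "1ABC")` and `load_structure(conn, "P00000")` return the same row.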
The embodiment of the invention also discloses a protein feature extraction device, as shown in fig. 2, comprising:
a first obtaining module 21 for obtaining the chemical structure of the protein ligand molecule; the specific implementation manner is described in step S11 in the embodiment, and is not described herein again.
The selection module 22 is used for determining the drug-like property of the protein ligand molecule according to the chemical structure, and selecting a default ligand according to the drug-like property of the protein ligand molecule, wherein the default ligand represents the ligand molecule with the best drug-like property; the specific implementation manner is described in step S12 in the embodiment, and is not described herein again.
A first calculation module 23 for calculating information on the characteristics of the drug binding pocket of the protein ligand molecule based on the default ligand. The specific implementation manner is described in step S13 in the embodiment, and is not described herein again.
According to the protein feature extraction device provided by the invention, the chemical structure of the protein ligand molecule is obtained, the drug-like property of the protein ligand molecule is determined according to the chemical structure, the default ligand is selected according to the drug-like property of the protein ligand molecule, the default ligand represents the ligand molecule with the best drug-like property, and the feature information of the drug binding pocket of the protein ligand molecule is calculated according to the default ligand. The invention can rapidly obtain protein characteristics by ligand processing the chemical structure of the protein ligand molecules obtained from the protein database, is convenient for subsequent virtual screening in computer-aided drug design, has simple operation, does not need to depend on the manual operation of experienced operators, and reduces the manpower and time consumption.
As an optional implementation of the embodiment of the present invention, the first obtaining module 21 includes:
a first acquisition submodule, configured to acquire, in the protein database, index information of the protein corresponding to the protein ligand molecule; for the specific implementation, see the corresponding step in the method embodiment, which is not repeated here;
a second acquisition module, configured to acquire the structure file of the protein from the protein database according to the index information; for the specific implementation, see the corresponding step in the method embodiment, which is not repeated here;
a third acquisition module, configured to acquire the chemical structure of the protein ligand molecule from the structure file; for the specific implementation, see the corresponding step in the method embodiment, which is not repeated here.
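The third acquisition step can be illustrated with a minimal sketch, assuming the structure file is in the fixed-column PDB text format and that ligand atoms appear as HETATM records. The function name `extract_ligands` and the choice to skip water (`HOH`) are assumptions made for illustration; the source does not specify the file format or the parsing rules, and downloading the file by its index information is left out.

```python
from collections import defaultdict


def extract_ligands(pdb_text, skip_resnames=("HOH",)):
    """Group HETATM atoms by (residue name, chain ID, residue number).

    Assumes the fixed-column PDB format: residue name in columns 18-20,
    chain ID in column 22, residue number in columns 23-26, and x/y/z
    coordinates in columns 31-54.
    """
    ligands = defaultdict(list)
    for line in pdb_text.splitlines():
        # Only hetero-atom records of full length are considered.
        if not line.startswith("HETATM") or len(line) < 54:
            continue
        resname = line[17:20].strip()
        if resname in skip_resnames:
            continue
        chain = line[21]
        resseq = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        ligands[(resname, chain, resseq)].append(xyz)
    return dict(ligands)
```

Each key of the returned dictionary identifies one candidate ligand molecule, and its value is the list of atom coordinates that later steps can use.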
As an optional implementation of the embodiment of the present invention, the selection module 22 includes:
a fourth acquisition module, configured to acquire target attribute features of the chemical structure; for the specific implementation, see the corresponding step in the method embodiment, which is not repeated here;
a second calculation module, configured to determine the drug-like property of the protein ligand molecule according to the target attribute features and preset attribute feature evaluation values; for the specific implementation, see the corresponding step in the method embodiment, which is not repeated here.
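One way to read the comparison of target attribute features against preset evaluation values is sketched below. The feature names, the Lipinski-style upper bounds, and the "count the satisfied limits" scoring are all assumptions chosen for illustration; this section does not state which features or thresholds the invention uses.

```python
# Hypothetical preset evaluation values (Lipinski-style upper bounds).
# The source does not list the actual features or thresholds.
PRESET_LIMITS = {
    "mol_weight": 500.0,
    "logp": 5.0,
    "h_bond_donors": 5,
    "h_bond_acceptors": 10,
}


def drug_likeness(features, limits=PRESET_LIMITS):
    """Count how many attribute features stay within their preset limit."""
    return sum(1 for name, limit in limits.items() if features[name] <= limit)


def select_default_ligand(candidates, limits=PRESET_LIMITS):
    """Return the ID of the candidate with the highest score.

    On a tie, the candidate that appears first in the mapping wins.
    """
    return max(candidates, key=lambda lid: drug_likeness(candidates[lid], limits))


# Illustrative, made-up feature values for two hypothetical ligands.
candidates = {
    "LIG_A": {"mol_weight": 420.0, "logp": 3.2,
              "h_bond_donors": 2, "h_bond_acceptors": 6},
    "LIG_B": {"mol_weight": 710.0, "logp": 6.1,
              "h_bond_donors": 3, "h_bond_acceptors": 12},
}
default_ligand = select_default_ligand(candidates)
```

Here `LIG_A` satisfies all four limits while `LIG_B` satisfies only one, so `LIG_A` would be selected as the default ligand under these assumed rules.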
As an optional implementation of the embodiment of the present invention, the first calculation module 23 includes:
a fifth obtaining module, configured to obtain the maximum and minimum coordinate values of the default ligand along each direction of a preset coordinate system, so as to obtain an initial drug binding pocket; for the specific implementation, see the corresponding step in the method embodiment, which is not repeated here;
a first processing module, configured to construct the spatial range of the initial drug binding pocket from the maximum and minimum coordinate values in each direction, and to expand that spatial range to obtain the drug binding pocket; for the specific implementation, see the corresponding step in the method embodiment, which is not repeated here;
a first calculation submodule, configured to calculate the feature information of the drug binding pocket of the protein ligand molecule from the maximum and minimum coordinate values of the drug binding pocket in each direction; for the specific implementation, see the corresponding step in the method embodiment, which is not repeated here.
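The three submodules above amount to an axis-aligned bounding-box computation, which can be sketched as follows. The function names, the uniform `padding` value, and the choice of box center and edge lengths as the "feature information" are assumptions for illustration; this section only states that the features are derived from the per-axis maxima and minima.

```python
def ligand_bounding_box(coords):
    """Return (lower corner, upper corner) of the box enclosing all atoms."""
    xs, ys, zs = zip(*coords)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def expand_box(lo, hi, padding):
    """Grow the box by `padding` on every side along every axis."""
    return tuple(v - padding for v in lo), tuple(v + padding for v in hi)


def pocket_features(lo, hi):
    """Summarize the pocket by its center and its edge length per axis."""
    center = tuple((a + b) / 2 for a, b in zip(lo, hi))
    size = tuple(b - a for a, b in zip(lo, hi))
    return {"center": center, "size": size}


# Made-up ligand coordinates; the padding of 4.0 is a hypothetical value.
atoms = [(1.0, 2.0, 3.0), (4.0, 0.0, 5.0), (2.0, 6.0, 1.0)]
initial_lo, initial_hi = ligand_bounding_box(atoms)
pocket_lo, pocket_hi = expand_box(initial_lo, initial_hi, padding=4.0)
features = pocket_features(pocket_lo, pocket_hi)
```

A center-plus-size description is the form many docking tools expect for a search box, which is why it is used here as a plausible stand-in for the feature information.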
As an optional implementation of the embodiment of the present invention, the protein feature extraction device further includes:
a second acquisition submodule, configured to acquire the structure file of the protein corresponding to the protein ligand molecule; for the specific implementation, see the corresponding step in the method embodiment, which is not repeated here;
a second processing module, configured to complete the missing amino acids in the structure file of the protein; for the specific implementation, see the corresponding step in the method embodiment, which is not repeated here;
a mutation module, configured to mutate the completed amino acids to obtain a protein structure containing only standard amino acids; for the specific implementation, see the corresponding step in the method embodiment, which is not repeated here;
a removal module, configured to remove non-drug small molecules from the protein structure containing standard amino acids; for the specific implementation, see the corresponding step in the method embodiment, which is not repeated here;
a hydrogen-addition module, configured to add hydrogen atoms, based on a preset pH value, to the protein structure from which the non-drug small molecules have been removed; for the specific implementation, see the corresponding step in the method embodiment, which is not repeated here;
a third processing module, configured to perform energy minimization on the protein structure after hydrogen addition to obtain a protein structure file; for the specific implementation, see the corresponding step in the method embodiment, which is not repeated here.
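Of the preparation steps above, the removal of non-drug small molecules is the simplest to show in isolation. The sketch below assumes a PDB-format structure file and a hypothetical exclusion list of water, common ions, and crystallization additives; the source does not say how non-drug molecules are identified. Residue completion, hydrogen addition at a preset pH, and energy minimization normally rely on dedicated modelling software and are not shown.

```python
# Hypothetical exclusion list: water, common ions, and buffer/cryo additives.
NON_DRUG_RESNAMES = frozenset({"HOH", "NA", "CL", "SO4", "PO4", "GOL", "EDO"})


def remove_non_drug_molecules(pdb_text, excluded=NON_DRUG_RESNAMES):
    """Drop HETATM records whose residue name is in the exclusion list.

    All other lines (protein ATOM records, remaining ligands, and
    bookkeeping records) are kept in their original order.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith("HETATM") and line[17:20].strip() in excluded:
            continue
        kept.append(line)
    return "\n".join(kept)
```

A name-based filter like this is only a first pass: a molecule that is an additive in one structure can be the ligand of interest in another, so the exclusion list would need to be reviewed for each use case.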
As an optional implementation of the embodiment of the present invention, the protein feature extraction device further includes:
a storage module, configured to convert the structure file of the protein into a preset format according to the data storage conditions and to store the converted file in a corresponding database; for the specific implementation, see the corresponding step in the method embodiment, which is not repeated here.
An embodiment of the present invention further provides a computer device. As shown in fig. 3, the computer device may include a processor 31 and a memory 32, which may be connected by a bus or in another manner; fig. 3 takes a bus connection as an example.
The processor 31 may be a Central Processing Unit (CPU). The processor 31 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The memory 32, which is a non-transitory computer-readable storage medium, may be used for storing non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules (e.g., the first obtaining module 21, the selecting module 22, and the first calculating module 23 shown in fig. 2) corresponding to the protein feature extraction method in the embodiment of the present invention. The processor 31 executes various functional applications and data processing of the processor by executing the non-transitory software programs, instructions and modules stored in the memory 32, that is, implements the protein feature extraction method in the above method embodiment.
The memory 32 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 31, and the like. Further, the memory 32 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 32 may optionally include memory located remotely from the processor 31, and these remote memories may be connected to the processor 31 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 32 and, when executed by the processor 31, perform a protein feature extraction method as in the embodiment shown in fig. 1.
The details of the computer device can be understood with reference to the corresponding related descriptions and effects in the embodiment shown in fig. 1, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the method embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory, a Hard Disk Drive (HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kinds described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A protein feature extraction method is characterized by comprising the following steps:
obtaining the chemical structure of the protein ligand molecule;
determining the drug-like property of the protein ligand molecule according to the chemical structure, and selecting a default ligand according to the drug-like property of the protein ligand molecule, wherein the default ligand represents the ligand molecule with the best drug-like property;
calculating feature information of the drug binding pocket of the protein ligand molecule according to the default ligand.
2. The method of claim 1, wherein said obtaining the chemical structure of a protein ligand molecule comprises:
acquiring index information of the protein corresponding to the protein ligand molecule in a protein database;
acquiring a structure file of the protein in the protein database according to the index information;
acquiring the chemical structure of the protein ligand molecule from the structure file.
3. The method of claim 1, wherein said determining the drug-like nature of said protein ligand molecule based on said chemical structure comprises:
obtaining a target attribute feature of the chemical structure;
determining the drug-like property of the protein ligand molecule according to the target attribute feature and a preset attribute feature evaluation value.
4. The method of claim 1, wherein said calculating information about characteristics of a drug binding pocket of said protein ligand molecule based on said default ligand comprises:
acquiring the maximum value and the minimum value of the coordinates of the default ligand in each direction of a preset coordinate system to obtain an initial drug binding pocket;
constructing a corresponding space range of the initial drug binding pocket according to the maximum value and the minimum value of the coordinates in each direction, and expanding the space range to obtain the drug binding pocket;
and calculating characteristic information of the drug binding pocket of the protein ligand molecule according to the maximum value and the minimum value of the coordinates of each direction of the drug binding pocket.
5. The method of claim 1, further comprising:
obtaining a structure file of a protein corresponding to the protein ligand molecule;
completing the missing amino acids in the structure file of the protein;
mutating the completed amino acids to obtain a protein structure containing standard amino acids;
removing non-drug small molecules from the protein structure containing standard amino acids;
adding hydrogen atoms, based on a preset pH value, to the protein structure from which the non-drug small molecules have been removed;
performing energy minimization on the protein structure after hydrogen addition to obtain a protein structure file.
6. The method of claim 5, wherein after performing energy minimization on the protein structure after hydrogen addition to obtain the protein structure file, the method further comprises:
and converting the structure file of the protein into a preset format according to the data storage condition and storing the preset format in a corresponding database.
7. A protein feature extraction device, characterized by comprising:
a first obtaining module for obtaining a chemical structure of a protein ligand molecule;
a selection module for determining the drug-like properties of the protein ligand molecules according to the chemical structure and selecting a default ligand according to the drug-like properties of the protein ligand molecules, the default ligand representing the ligand molecule with the best drug-like properties;
a first calculation module for calculating characteristic information of the drug binding pocket of the protein ligand molecule from the default ligand.
8. The apparatus of claim 7, wherein the first obtaining module comprises:
a first acquisition submodule for acquiring, in a protein database, index information of the protein corresponding to the protein ligand molecule;
a second acquisition module for acquiring the structure file of the protein from the protein database according to the index information;
a third acquisition module for acquiring the chemical structure of the protein ligand molecule from the structure file.
9. A computer device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the protein feature extraction method of any one of claims 1-6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the protein feature extraction method according to any one of claims 1 to 6.
CN202010998666.6A 2020-09-21 2020-09-21 Protein feature extraction method and device and computer equipment Pending CN112185463A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010998666.6A CN112185463A (en) 2020-09-21 2020-09-21 Protein feature extraction method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN112185463A true CN112185463A (en) 2021-01-05

Family

ID=73957033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010998666.6A Pending CN112185463A (en) 2020-09-21 2020-09-21 Protein feature extraction method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN112185463A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102511045A (en) * 2009-09-25 2012-06-20 上海抗体药物国家工程研究中心有限公司 Method of acquiring proteins with high affinity by computer aided design
CN109215737A (en) * 2018-09-30 2019-01-15 东软集团股份有限公司 Protein characteristic extracts, functional mode generates, the method and device of function prediction
CN110428864A (en) * 2019-07-17 2019-11-08 大连大学 Method for constructing the affinity prediction model of protein and small molecule

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
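毛一博 (Mao Yibo); 顾月清 (Gu Yueqing): "Research progress on protein-ligand interaction information databases", Progress in Pharmaceutical Sciences (药学进展), no. 09, pages 1 - 5 *
邱智军 (Qiu Zhijun): "Research and application of protein binding site prediction methods", doctoral dissertation electronic journal (博士电子期刊), no. 9, pages 1 - 2 *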

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination