CN113223607B - Method for randomly generating heparin analogue structural coordinates in batch by adopting smiles algorithm - Google Patents

Info

Publication number
CN113223607B
Authority
CN
China
Prior art keywords
smiles
heparin
algorithm
character string
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110590305.2A
Other languages
Chinese (zh)
Other versions
CN113223607A (en
Inventor
于明加
郭霄亮
林煌
李晋萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Chemical Technology
Original Assignee
Beijing University of Chemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Chemical Technology filed Critical Beijing University of Chemical Technology
Priority to CN202110590305.2A priority Critical patent/CN113223607B/en
Publication of CN113223607A publication Critical patent/CN113223607A/en
Application granted granted Critical
Publication of CN113223607B publication Critical patent/CN113223607B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Polysaccharides And Polysaccharide Derivatives (AREA)
  • Peptides Or Proteins (AREA)

Abstract

The invention provides a method for randomly generating heparin analogue structure coordinates in batches by adopting a smiles algorithm, wherein atom numbers on a compound structure can be redefined through a smiles character string, and a redundancy elimination algorithm is combined to eliminate repeated structures on the randomly generated heparin analogue structure. The invention constructs a five sugar unit structure library of heparin or heparan sulfate analogues, which is used for screening heparan sulfate analogues and pharmacophores which are tightly combined with related receptor proteins in the follow-up process, is used for rationally designing oligosaccharide lead drugs for resisting coronaviruses, and synthesizes the designed heparan sulfate lead compounds through an in-vitro biosynthesis way. The method can simply and rapidly generate the structure coordinates of all the possibility of specific substitution of a certain site on the heparin analogue oligosaccharide unit structure, can be widely applied to small molecules or macromolecular compounds, realizes specific modification of the specific site, and then randomly generates a large number of characteristic compound coordinates in batches.

Description

Method for randomly generating heparin analogue structural coordinates in batch by adopting smiles algorithm
Technical Field
The invention relates to the technical field of biological information, in particular to a method for randomly generating heparin analogue structural coordinates in batches by adopting a smiles algorithm.
Background
Heparin and heparan sulfate (HS) are linear glycosaminoglycans. Heparin is sulfonated to a higher degree than HS and is mainly present in the mast cells of animal tissues. HS is structurally diverse and widely expressed in animals: it is present on the cell surface and in the extracellular matrix of all mammals, binds a variety of protein molecules, including cytokines and chemokines, enzymes and enzyme inhibitors, extracellular matrix proteins and membrane receptors, and is involved in a variety of complex physiological and pathological responses such as cell attachment, migration, differentiation, embryonic development, organogenesis, coagulation, lipid metabolism, inflammation and injury response. In particular, HS is involved in regulating the activity of a number of tumor growth factors and immune factors, thereby affecting tumorigenesis, metastasis and inflammatory responses. In addition, HS is a co-receptor on the cell surface for a variety of viruses, including herpes simplex virus, influenza virus, SARS-CoV-1 and SARS-CoV-2, and is involved in regulating the invasion of viruses into host cells. HS is also involved in regulating the release and activity of many inflammatory mediators, playing an important role in the immune response after viruses and bacteria invade host cells.
In 2020, two lines of research became hot spots for the development of novel antiviral drugs: using heparin or HS analogues to interfere with the binding of the receptor-binding domain of the S1 subunit of the viral spike protein to HS on the cell surface in vivo, and using them to interfere with the binding of inflammatory mediators to HS on the surface of endothelial cells, thereby inhibiting both the invasion of viruses into host cells and the inflammatory storm that follows invasion. However, the number of commercial heparin structures commonly available in the clinic is limited, and they are insufficient for screening anti-coronavirus, anti-inflammatory oligosaccharide lead drugs that bind to the related receptor proteins.
Therefore, the problem to be solved is to construct oligosaccharide structure library of heparin or HS analogues for subsequent screening of HS analogues and pharmacophores tightly combined with related receptor proteins, for rational design of oligosaccharide lead drugs against coronaviruses, and for synthesis of designed HS lead compounds by in vitro biosynthesis.
Disclosure of Invention
In order to solve the limitations and defects in the prior art, the invention provides a method for randomly generating heparin analog structure coordinates in batches by adopting a smiles algorithm, which comprises the following steps:
generating a Label character string;
generating a digital template;
performing redundancy elimination processing on the digital template by using a redundancy elimination algorithm;
performing expansion processing on the digital template;
generating a smiles character string by using the digital template and the Label character string, wherein the atomic number on the heparin analogue structure is arranged corresponding to the smiles character string;
storing the smiles character string as a csv file;
importing a csv file storing smiles character strings by using a pandas tool;
converting the csv file into a mol structure by using an RDkit tool;
carrying out three-dimensional treatment on the mol structure, and simultaneously carrying out force field optimization on the mol structure;
and converting the mol structure into a coordinate file.
Optionally, the step of performing redundancy elimination processing on the digital template by using a redundancy elimination algorithm includes:
converting the digital template array into an English character string array;
traversing to remove redundant character strings in English character strings;
and converting the English character string into a digital template.
Optionally, the step of performing expansion processing on the digital template includes:
a sulfonic acid group, amino group or acetyl group substitution treatment is performed on N of the heparin analog structure, a sulfonic acid group substitution treatment is performed on O of the heparin analog structure, and a carboxyl chiral conversion treatment is performed on C of the heparin analog structure.
Optionally, the method further comprises:
an example of a library of coordinates for the heparin analog structure is randomly generated, comprising 14112 coordinate files.
Optionally, the redundancy elimination algorithm is used to eliminate duplicate structures on randomly generated heparin analog oligosaccharide units.
Optionally, the method further comprises:
the resulting pentasaccharide structure coordinates of heparin analog are made into a structure database for screening lead compounds against SARS-CoV-2 virus.
Optionally, the method further comprises:
the obtained heparin analogue pentasaccharide structure lead compound is synthesized in vitro, and is used for carrying out SPR molecular interaction affinity analysis experiments with SARS-CoV-2 virus replication and invasion related proteins and inflammatory factor proteins.
The invention has the following beneficial effects:
the invention provides a method for randomly generating heparin analogue structure coordinates in batches by adopting a smiles algorithm, wherein atom numbers on a compound structure can be redefined through a smiles character string, and a redundancy elimination algorithm is combined to eliminate repeated structures on the randomly generated heparin analogue structure. According to the invention, a smiles character string generation algorithm is adopted to randomly generate heparin analogue structure coordinates in batches, and a five sugar unit structure library of heparin or HS analogues is constructed for subsequent screening of HS analogues and pharmacophores which are tightly combined with related receptor proteins, for rational design of oligosaccharide lead drugs for resisting coronaviruses, and for synthesis of designed HS lead compounds through an in vitro biosynthesis way. The method can simply and rapidly generate the structure coordinates of all the possibility of specific substitution of a certain site on the heparin analogue oligosaccharide unit structure, can be widely applied to small molecules or macromolecular compounds, realizes specific modification of the specific site, and then randomly generates a large number of characteristic compound coordinates in batches.
Drawings
FIG. 1 is a diagram of the pentasaccharide structure of heparin analog converted to smiles character string.
FIG. 2 is a flow chart of a traversal of the redundancy elimination algorithm to determine whether the pentasaccharide structure of a heparin analog is repeated.
FIG. 3 is an overall flowchart of randomly generating heparin analog structure coordinates in batches by the smiles string generation algorithm.
FIG. 4 is a flow chart for converting smiles files to coordinate files using RDkit.
FIG. 5 is a flow chart of the program that randomly generates heparin analog structure coordinates in batches by the smiles string generation algorithm.
FIG. 6 is a diagram of the heparin analog pentasaccharide structures ranked first for each of the proteins involved in mediating viral replication and invasion, obtained by screening with computational biology software.
FIG. 7 is a graph showing the affinities between heparin and SARS-CoV-2 virus and inflammatory factor-related proteins, obtained by SPR molecular interaction affinity analysis.
Detailed Description
In order to enable those skilled in the art to better understand the technical scheme of the invention, the method for randomly generating heparin analog structure coordinates in batches by adopting a smiles algorithm is described in detail below with reference to the accompanying drawings.
Example 1
The technical problem to be solved in the embodiment is to provide a smiles character string and redundancy elimination algorithm capable of randomly generating heparin analog structure coordinates in batches aiming at the defect that the number of the existing commercialized heparin analog structures is limited. In this example, the method can simply and rapidly generate the structure coordinates of all the possibilities of specific substitution of a certain site on the structure of the heparin analogue oligosaccharide unit.
To this end, the present embodiment provides a smiles string command that can redefine any atomic number on the structure of a compound. According to the implementation manner of the embodiment, the heparin analogue structure smiles character string can be quickly generated in batches. The heparin analogue structure smiles character string can be defined through a template, so that the atomic number of a site needing structural modification is unchanged.
In this embodiment, a redundancy elimination algorithm is used to eliminate duplicate structure coordinates on the pentasaccharide unit of the heparin analog. According to the implementation of the embodiment, the smiles character string definition of the heparin analog structure can be expanded through a template: N in the smiles character string can be substituted with a sulfonic acid group, an amino group or an acetyl group, O can be substituted with a sulfonic acid group, and C can undergo chiral conversion.
The embodiment provides a method in which Python calls RDkit to import smiles character string files and generate heparin analogue structure coordinates in batches. In some further specific embodiments, the whole process of randomly generating heparin analog structure coordinates in batches by the smiles character string generation algorithm is implemented in code. In this example, the resulting heparin analog pentasaccharide structure coordinates were made into a structure database by computational biology software for screening lead compounds against SARS-CoV-2 virus.
In this example, heparin analog pentasaccharide structure lead compounds screened by biological calculation software were synthesized in vitro and subjected to SPR molecular interaction affinity analysis experiments with SARS-CoV-2 virus replication and invasion related proteins.
In the embodiment, a smiles character string generation algorithm is adopted to randomly generate heparin analogue structure coordinates in batches, a five sugar unit structure library of heparin or HS analogues is constructed and used for subsequently screening HS analogues and pharmacophores which are tightly combined with related receptor proteins, the HS analogues and the pharmacophores are used for rationally designing oligosaccharide lead drugs for resisting coronaviruses, and the designed HS lead compounds are synthesized through an in vitro biosynthesis way. The method is proposed for the first time, can be widely applied to small molecules or macromolecular compounds, realizes specific modification of specific sites, and then randomly generates a large number of characteristic compound coordinates in batches.
The term "smiles string" in this embodiment refers to the Simplified Molecular Input Line Entry System (SMILES), a specification for describing molecular structures unambiguously with ASCII strings.
The terms "heparin and heparan sulfate" as used in this example refer to linear glycosaminoglycans (GAGs) whose basic structure is a repeating disaccharide unit formed by D-glucuronic acid (GlcA) or L-iduronic acid (IdoA) and D-glucosamine (GlcN) joined by 1,4 glycosidic linkages, and which can be sulfonated at the N-position or the O-position. Each sugar unit is sulfonated to a different degree, which gives HS its diverse molecular structures. In the current biosynthetic route, heparin precursors are synthesized by microorganisms and then treated with acetyl transferase, sulfotransferase and epimerase. This converts the N-acetylglucosamine (GlcNAc) sugar units on the heparin precursors to N-sulfated glucosamine (GlcNS) sugar units and converts GlcA sugar units to IdoA sugar units. With 3'-phosphoadenosine-5'-phosphosulfate (PAPS) as the donor of the sulfonic acid group, the sulfonic acid group is transferred to the C2 position of the IdoA saccharide units or of a minority of the GlcA saccharide units to form IdoA2S and a minority of GlcA2S saccharide units; the sulfonic acid group may also be transferred to the C6 position of the GlcNS saccharide unit to form a GlcNS6S saccharide unit, or to the C3 position of the GlcNS saccharide unit to form a GlcNS3S saccharide unit.
The term "pentasaccharide unit structure of heparin analog" as used in this example means five saccharide units composed of D-glucosamine (GlcN) and D-glucuronic acid (GlcA) or L-iduronic acid (IdoA) linked by 1,4 glycosidic bonds, wherein the basic structure can be sulfonated at the N-position or the O-position.
The terms "tumor growth factor and immune factor" as used herein refer to fibroblast growth factor (fibroblast growth factor, FGF), vascular endothelial derived growth factor (vascular endothelial-derived growth factor, VEGF), platelet-derived growth factor (platelet-derived growth factor, PDGF), hepatocyte growth factor (hepatocyte growth factor, HGF), transforming growth factor-beta (transforming growth factor-beta, TGF-beta), interleukin, interferon, and the like.
The term "inflammatory regulator" as used in this example refers to, for example, interleukin 1-10 (interleukin-1 to 10, IL-1-10), monocyte chemotactic protein 1 (monocyte chemoattractant protein 1, MCP-1), CC chemokine family ligand 8 (CC chemokine family ligands, CCL 8) and the like.
The term "RDkit" in this embodiment refers to an open-source cheminformatics toolkit that supports 2D and 3D molecular operations on compounds, generation of compound descriptors and fingerprints for machine learning, calculation of structural similarity between compounds, 2D and 3D molecular display, and the like.
The term "SPR" in this embodiment is an abbreviation of surface plasmon resonance, an optical technique that characterizes changes in the refractive index at a surface. It can observe surface phenomena such as molecular binding and film formation in real time, and can detect signals of nonspecific binding with high sensitivity and high selectivity.
FIG. 1 is a diagram of the pentasaccharide structure of heparin analog converted to smiles character string. This example converts heparin analog structural formulas into smiles character strings. The formula in fig. 1 can be converted into a smiles string (which can be built by itself by software such as KingDraw or ChemDraw) as follows.
O=C([O-])[C@@H]1O[C@H](O[C@H]2[C@H](COS(=O)(=O)[O-])O[C@H](O)[C@@H](NS(=O)(=O)[O-])[C@@H]2O)[C@@H](O)[C@H](O)[C@H]1O[C@H]1O[C@@H](COS(=O)(=O)[O-])[C@H](O[C@H]2O[C@@H](C(=O)[O-])[C@H](O[C@H]3O[C@@H](CO)[C@H](O)[C@@H](O)[C@@H]3NS(=O)(=O)[O-])[C@@H](O)[C@H]2OS(=O)(=O)[O-])[C@@H](OS(=O)(=O)[O-])[C@@H]1NS(=O)(=O)[O-]
This example generates heparin analogue structure smiles strings in batches. The dots marked in FIG. 1 indicate the 13 positions at which substitution occurs; in the smiles string, these positions are underlined and the replaced groups are shown in bold italics.
The molecule consists of five parts, labeled U, W, Z, X and Y, where U, Z and Y share one structure and W and X share another. Taking X and Y as examples, Y has 3 substitution points, labeled Y0, Y1 and Y2; X has 1 substitution point, labeled X1, and one chiral switch position, labeled X0. Y0 may be substituted with "hydroxy", "sulfonate" or "acetyl"; Y1, Y2 and X1 may be substituted with "hydroxy" or "sulfonate"; X0 has chiral isomerism, expressed in the smiles rules as "[C@@H]" and "[C@H]". The run time for generating the heparin analogue structure smiles strings in batches was 0.05471524899999736 seconds.
This example defines the "template" of heparin analog structure smiles strings. Firstly, defining a structural molecular formula of the heparin analogue by using a label, replacing each substitution point by using the label, and defining a skeleton character string as follows, wherein only specific functional groups are needed to replace the label in the follow-up process.
The present example then defines the structural formula of the heparin analog as a "template". Because the formula is an "axisymmetric" structure centered on the third saccharide unit, directly traversing the labels with functional groups produces repeated formulas: smiles strings that differ as text but actually represent the same structure (i.e., one structural formula has multiple smiles strings). To solve the duplication problem, this embodiment introduces a "template" approach.
The template provided in this embodiment is defined as follows. In FIG. 1, the U, Z and Y structures each have 12 (3×2×2) types, and W and X each have 4 (2×2) types. The 12 types of U, Z and Y are denoted by the numbers 0-11, and the 4 types of W and X by the numbers 0-3. A template can be expressed as [U, W, Z, X, Y]. Traversal generates 27648 templates in total (12×4×12×4×12), represented as "[0,0,0,0,0], ……, [11,3,11,3,11]". This representation contains repeats, such as "2,1,3,3,5" and "5,3,3,1,2": the two templates are "axisymmetric", that is, they represent the same compound. The repeated "templates" can then be removed with a redundancy elimination algorithm.
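The enumeration described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's own code: the names `all_templates` and `mirror` are hypothetical, and the mirror operation assumes (as the text states) that U, Z, Y share one coding and W, X share another, so that reversing the five codes gives the axisymmetric partner.

```python
from itertools import product

# Option counts taken from the text: U, Z, Y have 12 variants each (3 x 2 x 2)
# and W, X have 4 variants each (2 x 2).
N_UZY = 12
N_WX = 4

def all_templates():
    """Return every [U, W, Z, X, Y] template as a tuple of integer codes."""
    return list(product(range(N_UZY), range(N_WX), range(N_UZY),
                        range(N_WX), range(N_UZY)))

def mirror(template):
    """Axisymmetric partner of a template about the central Z unit
    (U swaps with Y, W swaps with X)."""
    return tuple(reversed(template))

templates = all_templates()
print(len(templates))                 # 27648
print(templates[0], templates[-1])    # (0, 0, 0, 0, 0) (11, 3, 11, 3, 11)
print(mirror((2, 1, 3, 3, 5)))        # (5, 3, 3, 1, 2)
```

Both members of a mirror pair appear in the full enumeration, which is why a separate redundancy elimination step is needed.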
FIG. 2 is a flow chart of the traversal by which the redundancy elimination algorithm determines whether a heparin analog pentasaccharide structure is repeated. The present embodiment provides a redundancy elimination algorithm for heparin analog structure smiles character strings; the process of eliminating repeated structure coordinates on the pentasaccharide unit of the heparin analog is shown in FIG. 2. The template is first converted into English letters (correspondence 0-A, 1-B, ……, 11-L), so that "[0,0,0,0,0], ……, [11,3,11,3,11]" is converted to "[A,A,A,A,A], ……, [L,D,L,D,L]". In this embodiment the template is represented by an array in Python, which can be converted into a character string, giving "AAAAA, ……, LDLDL". The strings are then traversed to determine whether the pentasaccharide unit structure is duplicated, and duplicates are deleted. Traversal generates 27648 template categories in total; after the duplicates are culled, 14112 template categories remain.
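A minimal sketch of this redundancy elimination follows, assuming that "duplicate" means a string whose reverse has already been kept; the function names are hypothetical and the patent's own script is not reproduced here. The count can be checked by hand: of the 27648 strings, 12×4×12 = 576 read the same in both directions, so (27648 + 576) / 2 = 14112 distinct structures remain.

```python
from itertools import product
import string

LETTERS = string.ascii_uppercase[:12]   # 0 -> 'A', 1 -> 'B', ..., 11 -> 'L'

def to_letters(template):
    """Convert a numeric template such as (11, 3, 11, 3, 11) to 'LDLDL'."""
    return "".join(LETTERS[i] for i in template)

def to_numbers(code):
    """Convert a letter string back to a numeric template."""
    return [LETTERS.index(ch) for ch in code]

def remove_mirror_duplicates(codes):
    """Keep one representative of each string / reversed-string pair."""
    seen = set()
    kept = []
    for code in codes:
        if code in seen or code[::-1] in seen:
            continue                     # this structure is already represented
        seen.add(code)
        kept.append(code)
    return kept

all_codes = [to_letters(t) for t in
             product(range(12), range(4), range(12), range(4), range(12))]
unique_codes = remove_mirror_duplicates(all_codes)
unique_templates = [to_numbers(c) for c in unique_codes]

print(len(all_codes), len(unique_codes))   # 27648 14112
```

Using a set for the membership check keeps the traversal linear in the number of templates; a nested-loop comparison would give the same result but scale quadratically.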
This embodiment provides a "template" extension of heparin analog structure smiles strings. The present embodiment is based on the structural formula of heparin analog in the first step, and by substitution of sulfonic acid group, amino group or acetyl group on N, substitution of sulfonic acid group on O and chiral transformation of carboxyl group on C, the structure of heparin analog is randomly defined by smiles character string, and the structure is defined as follows:
contain_N = ["", "S(=O)(=O)[O-]", "C(=O)C"]  # N point substitution
contain_O = ["", "S(=O)(=O)[O-]"]  # O point substitution
contain_C = ["[C@@H]", "[C@H]"]  # C chiral transformation
In this embodiment, the specific point position substitution is defined as follows in the smiles string:
U0 = Z0 = Y0 = contain_N (indicating that points U0, Z0 and Y0 are all substitutions on N)
X0 = W0 = contain_C (indicating that points X0 and W0 are both chiral transformations on C)
U1 = U2 = W1 = Z1 = Z2 = X1 = Y1 = Y2 = contain_O
Thus, the template [U, W, Z, X, Y] of the second step can be expanded to [[U0,U1,U2],[W0,W1],[Z0,Z1,Z2],[X0,X1],[Y0,Y1,Y2]], where U has 12 representations, "[0,0,0] …… [2,1,1]", and W has 4 representations, "[0,0] …… [1,1]". The expanded templates therefore run from "[[0,0,0],[0,0],[0,0,0],[0,0],[0,0,0]]" to "[[2,1,1],[1,1],[2,1,1],[1,1],[2,1,1]]". Replacing the labels according to the redundancy-removed templates yields all the smiles character string files. FIG. 3 is an overall flowchart of randomly generating heparin analog structure coordinates in batches by the smiles string generation algorithm; the run time of the batch "template" expansion of the heparin analog structure smiles strings was 0.49323898200000005 seconds.
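The expansion step can be illustrated as below. Several things here are assumptions rather than the patent's code: the mapping from a code 0-11 (or 0-3) to its index list is taken to be lexicographic, since the text gives only the first and last entries; the function names are hypothetical; and `TOY_SKELETON` is a made-up three-label fragment used only to show label replacement, not the real pentasaccharide backbone string.

```python
from itertools import product

# Substituent lists as defined in the text.
CONTAIN_N = ["", "S(=O)(=O)[O-]", "C(=O)C"]   # options at an N point
CONTAIN_O = ["", "S(=O)(=O)[O-]"]             # options at an O point
CONTAIN_C = ["[C@@H]", "[C@H]"]               # chirality options at a C point

# Assumed ordering: code k is the k-th index list in lexicographic order.
UZY_CODES = [list(t) for t in product(range(3), range(2), range(2))]  # 12 entries
WX_CODES = [list(t) for t in product(range(2), range(2))]             # 4 entries

def expand(template):
    """[U, W, Z, X, Y] -> [[U0,U1,U2],[W0,W1],[Z0,Z1,Z2],[X0,X1],[Y0,Y1,Y2]]."""
    u, w, z, x, y = template
    return [UZY_CODES[u], WX_CODES[w], UZY_CODES[z], WX_CODES[x], UZY_CODES[y]]

def label_values(expanded):
    """Map each label name (e.g. 'Y0') to the SMILES fragment it selects."""
    values = {}
    for unit, idx in zip("UWZXY", expanded):
        if unit in "UZY":
            options = (CONTAIN_N, CONTAIN_O, CONTAIN_O)
        else:
            options = (CONTAIN_C, CONTAIN_O)
        for pos, (opts, i) in enumerate(zip(options, idx)):
            values[f"{unit}{pos}"] = opts[i]
    return values

def fill_labels(skeleton, values):
    """Replace {label} placeholders in a skeleton string with fragments."""
    return skeleton.format(**values)

# Hypothetical toy fragment with three labels, NOT the real backbone.
TOY_SKELETON = "{X0}(O{X1})CN{Y0}"

expanded = expand((0, 0, 0, 3, 8))
print(expanded)     # [[0, 0, 0], [0, 0], [0, 0, 0], [1, 1], [2, 0, 0]]
print(fill_labels(TOY_SKELETON, label_values(expanded)))
# [C@H](OS(=O)(=O)[O-])CNC(=O)C
```

Curly-brace placeholders are convenient here because braces do not occur in SMILES syntax, so `str.format` cannot collide with the chemistry.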
This embodiment uses RDkit to convert the smiles string file into heparin analog structure coordinate files. FIG. 4 is a flow chart for converting smiles files to coordinate files using RDkit. RDkit is called from Python, and the smiles character string file is imported to generate heparin analog structure coordinates in batches; the specific operation flow is shown in FIG. 4. Generating 100 randomly selected structure coordinate files takes 76.820890146 seconds, so generating all 14112 structure coordinate files is expected to take about 10840.96 seconds.
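The conversion flow (csv, then pandas, then RDKit molecule, then 3D embedding and force field optimization, then coordinate output) might look roughly like the following sketch. The text does not state the csv column name, the force field or the coordinate file format, so the column name `smiles`, the MMFF force field and the PDB output used here are assumptions, and `smiles_to_pdb_block` is a hypothetical helper. Two small molecules stand in for the pentasaccharide strings to keep the example fast.

```python
import io

import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem

def smiles_to_pdb_block(smiles, seed=42):
    """Build a 3D structure from a SMILES string and return it as PDB text."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    mol = Chem.AddHs(mol)                        # explicit hydrogens for 3D
    if AllChem.EmbedMolecule(mol, randomSeed=seed) != 0:
        raise RuntimeError(f"3D embedding failed for: {smiles}")
    AllChem.MMFFOptimizeMolecule(mol)            # assumed force field (MMFF94)
    return Chem.MolToPDBBlock(mol)               # assumed output format (PDB)

# Stand-in for the csv file of generated strings; 'smiles' is an assumed header.
csv_text = "smiles\nCCO\nCC(=O)O\n"
frame = pd.read_csv(io.StringIO(csv_text))

pdb_blocks = [smiles_to_pdb_block(s) for s in frame["smiles"]]
print(len(pdb_blocks))
```

In a batch run each block would be written to its own file. The timing in the text follows from simple scaling: 76.820890146 s per 100 files × 141.12 ≈ 10840.96 s for 14112 files.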
FIG. 5 is a flow chart of the program that randomly generates heparin analog structure coordinates in batches by the smiles string generation algorithm. In this embodiment, the algorithm is implemented in code; the program flow is shown in FIG. 5, and the code is written in three stages: generating the templates (string_remove_symmetry), generating the smiles strings (create_smiles.py), and creating the coordinate files (create_coordinate).
FIG. 6 is a diagram of the heparin analog pentasaccharide structures ranked first for each of the proteins involved in mediating viral replication and invasion, obtained by screening with computational biology software. In this example, heparin-based lead compounds against SARS-CoV-2 virus were virtually screened: the heparin analog pentasaccharide structure coordinates obtained above were made into a structure database by computational biology software for screening lead compounds against SARS-CoV-2 virus. The chymotrypsin-like protease (3CLpro) and the spike protein (S), which act in the replication, transcription and host-cell invasion of SARS-CoV-2, were taken as target proteins, and heparin pentasaccharide lead compounds that bind tightly to the target proteins were virtually screened from the pentasaccharide library of heparin analogues. First, following a literature survey, the crystal structures of the SARS-CoV-2 chymotrypsin-like protease (PDB ID 6LU7) and spike protein (PDB ID 6VSB) were taken as references, and virtual screening was carried out with the open-source molecular simulation software AutoDock Vina and a self-written shell script. The screening took half a month and identified, among the 14112 structures, the two heparin pentasaccharide compounds with the highest affinity for the two target proteins respectively; their structures are shown in FIG. 6.
FIG. 7 is a graph showing the affinities between heparin and SARS-CoV-2 virus and inflammatory factor-related proteins, obtained by SPR molecular interaction affinity analysis. In this example, the affinity of heparin for SARS-CoV-2 virus and inflammatory factor-related proteins was verified by SPR molecular interaction affinity analysis, with biotin-labeled heparin immobilized on a chip. Biotin was attached to the reducing end of heparin: 10 mg of heparin with a molecular weight of about 15 kDa and 11 μL of aniline in NaAc buffer (100 mM, pH 6.0, 1.08 mL) were reacted with 120 μL of EZ-Link Alkoxyamine-PEG4-biotin in DMSO (50 mM) at 37℃ for 48 hours, and the product was purified using a 2 mL DEAE Sephacel (Cytiva) column.
Before heparin immobilization, the streptavidin (SA) gold sensor chip was plasma cleaned. Biotin-labeled heparin (5 μL, 2 mg/mL) was dissolved in 200 μL of HBS-P buffer (10 mM HEPES, 150 mM NaCl, 0.005% (v/v) surfactant P20), and HBS-P buffer was flowed continuously over the chip surface for 2 to 4 hours until the response signal reached 800 units. Experimental data were collected at 25℃ with a buffer flow rate of 30 μL/min and analyzed using a multichannel SPR analyzer (Biacore S200, GE Healthcare). This example provides SPR molecular interaction affinity analysis experiments for the SARS-CoV-2 virus spike protein, several inflammatory factor proteins and heparin. The SARS-CoV-2 spike protein S1 subunit and several inflammatory factor proteins (IL-1β, IL-2, IL-6 and CCL8, with FGF2 used as a control) were each dissolved in 10 mM HBS-P buffer and serially diluted to concentrations of 10.000, 3.333, 1.111, 0.370, 0.123, 0.041, 0.014 and 0.005 mM. Each diluted sample was injected into the detection channel for 60 seconds, after which the channel was washed with PBS buffer (2 mM KH2PO4, 10 mM Na2HPO4, 137 mM NaCl, 2.7 mM KCl, pH 7.4) for more than 90 seconds. The binding of each sample was analyzed by recording the response signals, and the equilibrium dissociation constants (KD), which measure the affinity of the SARS-CoV-2 spike protein S1 subunit and the inflammatory factor proteins (IL-1β, IL-2, IL-6 and CCL8, with FGF2 as a control) for heparin, were obtained by software fitting. The KD values were, respectively, (IL-1β) / 2997 (IL-2) / 681.5 (IL-6) / 11.3 (CCL8) / 7326 (Spike S1) / 5.79 (FGF2) nM, as shown in FIG. 7, indicating that heparin also has high affinity for the SARS-CoV-2 virus spike protein and for several inflammatory factor proteins.
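The eight sample concentrations listed above are consistent with a three-fold serial dilution starting from 10. The short check below reproduces them; the dilution factor of 3 is inferred from the listed values rather than stated in the text, and `dilution_series` is a hypothetical helper name.

```python
def dilution_series(start, factor, steps):
    """Concentrations of a serial dilution, rounded to three decimals."""
    return [round(start / factor ** k, 3) for k in range(steps)]

series = dilution_series(10.0, 3, 8)
print(series)
# [10.0, 3.333, 1.111, 0.37, 0.123, 0.041, 0.014, 0.005]
```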
This embodiment provides a method for randomly generating heparin analogue structure coordinates in batches by adopting a SMILES algorithm, in which the atom numbering of a compound structure can be redefined through the SMILES character string, and duplicate structures among the randomly generated heparin analogue structures can be removed with a redundancy removal algorithm. In this embodiment, a SMILES character string generation algorithm is used to randomly generate heparin analogue structure coordinates in batches and to construct a pentasaccharide unit structure library of heparin or HS analogues. The library is subsequently used to screen for HS analogues that bind tightly to the relevant receptor proteins, so that oligosaccharide lead drugs against coronaviruses can be rationally designed and the designed HS lead compounds synthesized by an in vitro biosynthetic route. The method simply and rapidly generates structure coordinates for every possible specific substitution at a given site on the heparin analogue oligosaccharide unit structure. It is broadly applicable to small-molecule or macromolecular compounds: specific sites are modified, and coordinates for a large number of characteristic compounds are then randomly generated in batches.
It is to be understood that the above embodiments merely illustrate the principles of the present invention and do not limit it. Various modifications and improvements may be made by those skilled in the art without departing from the spirit and substance of the invention, and these are also considered to be within the scope of the invention.

Claims (7)

1. A method for randomly generating heparin analogue structure coordinates in batches by adopting a SMILES algorithm, characterized by comprising the following steps:
generating a Label character string;
generating a digital template;
performing redundancy elimination processing on the digital template by using a redundancy elimination algorithm;
performing expansion processing on the digital template;
generating a SMILES character string by using the digital template and the Label character string, wherein the atom numbering of the heparin analogue structure is arranged in correspondence with the SMILES character string;
storing the SMILES character string as a csv file;
importing the csv file storing the SMILES character string by using the pandas tool;
converting the SMILES character string in the csv file into a mol structure by using the RDKit tool;
performing three-dimensional processing on the mol structure while performing force field optimization on the mol structure;
and converting the mol structure into a coordinate file.
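The last four steps of claim 1 (csv import with pandas, conversion to a mol structure with RDKit, three-dimensional processing with force field optimization, and export to a coordinate file) can be sketched as follows. This is only an illustration under stated assumptions: the column name `smiles`, the MMFF94 force field, the PDB output format and the function name `smiles_csv_to_pdb_blocks` are not specified in the text and are chosen here for the example.

```python
import io

import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_csv_to_pdb_blocks(csv_source, smiles_column="smiles", seed=42):
    """Read SMILES strings from a csv source and return one PDB block per molecule."""
    table = pd.read_csv(csv_source)
    blocks = []
    for smiles in table[smiles_column]:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            continue  # skip strings RDKit cannot parse
        mol = Chem.AddHs(mol)  # explicit hydrogens are needed for a sensible 3D geometry
        # EmbedMolecule returns the conformer id (0) on success and -1 on failure.
        if AllChem.EmbedMolecule(mol, randomSeed=seed) != 0:
            continue
        # Assumption: MMFF94 is used here; the text does not name the force field.
        AllChem.MMFFOptimizeMolecule(mol)
        # Assumption: PDB is used as the coordinate file format.
        blocks.append(Chem.MolToPDBBlock(mol))
    return blocks


# Usage with an in-memory csv holding two small placeholder molecules.
csv_text = "smiles\nCCO\nCC(=O)N\n"
pdb_blocks = smiles_csv_to_pdb_blocks(io.StringIO(csv_text))
print(len(pdb_blocks))
```

In practice each returned block would be written to its own file, one per generated structure.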
2. The method for randomly generating heparin analogue structure coordinates in batches by adopting a SMILES algorithm according to claim 1, wherein the step of performing redundancy elimination processing on the digital template by using a redundancy elimination algorithm comprises the following steps:
converting the digital template array into an English character string array;
traversing the English character string array to remove redundant character strings;
and converting the English character string array back into a digital template array.
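The three redundancy elimination steps of claim 2 can be sketched as follows. The sketch assumes that a digital template is a tuple of small integers (0 to 25) and that each digit maps to one lowercase letter. Neither the template representation nor the digit-to-letter mapping is given in the text, and all function names are hypothetical.

```python
import string

LETTERS = string.ascii_lowercase  # assumed mapping: 0 -> 'a', 1 -> 'b', ...


def digits_to_letters(template):
    """Convert one digital template, e.g. (0, 1, 2), into a letter string, e.g. 'abc'."""
    return "".join(LETTERS[digit] for digit in template)


def letters_to_digits(text):
    """Convert a letter string back into a digital template."""
    return tuple(LETTERS.index(char) for char in text)


def remove_redundant_templates(templates):
    """Drop repeated templates while keeping the first occurrence of each."""
    seen = set()
    unique_strings = []
    for template in templates:
        key = digits_to_letters(template)
        if key not in seen:  # traverse and skip strings already encountered
            seen.add(key)
            unique_strings.append(key)
    return [letters_to_digits(key) for key in unique_strings]


deduplicated = remove_redundant_templates([(0, 1, 2), (1, 0, 2), (0, 1, 2)])
print(deduplicated)
```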
3. The method for randomly generating heparin analogue structure coordinates in batches by adopting a SMILES algorithm according to claim 1, wherein the step of performing expansion processing on the digital template comprises the following steps:
performing sulfonic acid group, amino group or acetyl group substitution on N of the heparin analogue structure, performing sulfonic acid group substitution on O of the heparin analogue structure, and performing carboxyl chirality conversion on C of the heparin analogue structure.
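The expansion described in claim 3 can be illustrated by enumerating every combination of options at the substitutable sites and filling them into a SMILES scaffold. The sketch below rests on several assumptions: the scaffold is a small toy molecule with one N site, one O site and one stereocentre, not a heparin unit; a digital template is read as a tuple of option indices; and the label strings are read as the SMILES fragments selected by those indices. None of these details is given in the text, and all names are hypothetical.

```python
from itertools import product

# Hypothetical toy scaffold (NOT a heparin unit) with three variable sites.
SCAFFOLD = "OC(=O)[C{chirality}H](O{o_sub})CN{n_sub}"

# Assumed label strings per site, in the order (N site, O site, C site).
SITE_LABELS = [
    ["", "S(=O)(=O)O", "C(C)=O"],  # N: free amino, sulfonic acid group, acetyl
    ["", "S(=O)(=O)O"],            # O: free hydroxyl, sulfonic acid group
    ["@", "@@"],                   # C: the two SMILES chirality tags
]


def all_templates():
    """Enumerate every digital template, i.e. every combination of option indices."""
    return list(product(*(range(len(labels)) for labels in SITE_LABELS)))


def template_to_smiles(template):
    """Fill the scaffold with the labels selected by one digital template."""
    n_sub, o_sub, chirality = (
        labels[index] for labels, index in zip(SITE_LABELS, template)
    )
    return SCAFFOLD.format(n_sub=n_sub, o_sub=o_sub, chirality=chirality)


templates = all_templates()
smiles_list = [template_to_smiles(template) for template in templates]
print(len(smiles_list))
```

With three options at N and two each at O and C, the toy scaffold yields 3 x 2 x 2 = 12 distinct strings. The same product rule explains how the number of structures grows as more sites are added.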
4. The method for randomly generating heparin analogue structure coordinates in batches by adopting a SMILES algorithm according to claim 1, further comprising:
randomly generating an example coordinate library of heparin analogue structures, the library comprising 14112 coordinate files.
5. The method for randomly generating heparin analogue structure coordinates in batches by adopting a SMILES algorithm according to claim 1, wherein the redundancy elimination algorithm is used to eliminate duplicate structures among the randomly generated heparin analogue oligosaccharide units.
6. The method for randomly generating heparin analogue structure coordinates in batches by adopting a SMILES algorithm according to claim 1, further comprising:
making the resulting heparin analogue pentasaccharide structure coordinates into a structure database for screening lead compounds against the SARS-CoV-2 virus.
7. The method for randomly generating heparin analogue structure coordinates in batches by adopting a SMILES algorithm according to claim 1, further comprising:
synthesizing the obtained heparin analogue pentasaccharide lead compound in vitro, and using it in SPR molecular interaction affinity analysis experiments with proteins related to SARS-CoV-2 virus replication and invasion and with inflammatory factor proteins.
CN202110590305.2A 2021-05-28 2021-05-28 Method for randomly generating heparin analogue structural coordinates in batch by adopting smiles algorithm Active CN113223607B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110590305.2A CN113223607B (en) 2021-05-28 2021-05-28 Method for randomly generating heparin analogue structural coordinates in batch by adopting smiles algorithm

Publications (2)

Publication Number Publication Date
CN113223607A CN113223607A (en) 2021-08-06
CN113223607B true CN113223607B (en) 2023-10-20

Family

ID=77099080

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant