CN113223632B - Determination method of molecular fragment library, molecular segmentation method and device - Google Patents

Publication number
CN113223632B
Authority
CN
China
Prior art keywords
molecular
segmentation
probability
determining
fragment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110519595.1A
Other languages
Chinese (zh)
Other versions
CN113223632A (en)
Inventor
李相彬
邹德超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wangshi Intelligent Technology Co ltd
Original Assignee
Beijing Wangshi Intelligent Technology Co ltd

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/20 Identification of molecular entities, parts thereof or of chemical compositions
    • G16C20/50 Molecular design, e.g. of drugs
    • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for determining a molecular fragment library, a method for dividing molecules and a device, wherein the method for determining the molecular fragment library comprises the following steps: acquiring a molecular data set; cutting molecules in the molecular data set to obtain a plurality of molecular fragments and initial segmentation probability of each molecular fragment; updating the initial segmentation probability of each molecular fragment according to a preset algorithm to obtain the target segmentation probability of the molecular fragment; and determining the set of the molecular fragments, the target segmentation probability of which meets the preset condition, as a molecular fragment library. According to the method, the molecules in the molecular data set are segmented to obtain various molecular fragments and corresponding initial segmentation probabilities, and the initial segmentation probabilities are updated by using a preset algorithm, so that the probability of the obtained molecular fragments is more accurate, and the molecules can be segmented by using the molecular fragment library conveniently.

Description

Determination method of molecular fragment library, molecular segmentation method and device
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method for determining a molecular fragment library, a method for dividing molecules and a device for dividing molecules.
Background
The discovery of drug molecules for treating diseases is a major goal of drug research and development in the pharmaceutical industry. Many drug discovery methods are currently known, such as structure-based and mechanism-based drug discovery, and drug discovery based on combinatorial chemistry and high-throughput screening. Drug discovery based on molecular fragments is now emerging as well. As a result, molecular fragments are attracting great interest, and molecular cleavage, the means of obtaining those fragments, is particularly important.
Molecular cleavage methods in the related art generally cut molecules from the standpoint of molecular synthesis. A typical example is molecular segmentation based on the BRICS algorithm (Breaking of Retrosynthetically Interesting Chemical Substructures), which decides whether a chemical bond should be cleaved according to whether that bond can be synthesized, so that valuable, functional structures are retained. Specifically, it scans sequentially from atom 0 and cleaves the current molecule into two fragments as soon as a cleavable bond is found. However, the cleavage rules are set by technicians according to human experience, the resulting molecular fragments lack novelty, and some characteristics of the molecules may be ignored, so the molecular segmentation is inaccurate.
Disclosure of Invention
Therefore, the invention aims to overcome the defect that in the prior art, the molecular segmentation is inaccurate due to the segmentation according to human experience, thereby providing a method for determining a molecular fragment library, a method for molecular segmentation and a device thereof.
According to a first aspect, the invention discloses a method for determining a molecular fragment library, comprising the following steps: acquiring a molecular data set; cutting molecules in the molecular data set to obtain a plurality of molecular fragments and initial segmentation probability of each molecular fragment; updating the initial segmentation probability of each molecular fragment according to a preset algorithm to obtain the target segmentation probability of the molecular fragment; and determining the set of the molecular fragments, the target segmentation probability of which meets the preset condition, as a molecular fragment library.
Optionally, before cutting the molecules in the molecular data set to obtain a plurality of molecular fragments and an initial segmentation probability of each molecular fragment, the method further comprises: removing molecules in the molecular data set that do not meet preset segmentation conditions.
Optionally, cutting the molecules in the molecular data set to obtain a plurality of molecular fragments and an initial segmentation probability of each molecular fragment includes: cutting the molecules in the molecular data set to obtain the plurality of molecular fragments, wherein the length of each molecular fragment is less than or equal to a preset length; and determining the initial segmentation probability of each molecular fragment from the count of that molecular fragment and the total count of all molecular fragments.
Optionally, updating the initial segmentation probability of each molecular fragment according to a preset algorithm to obtain the target segmentation probability of the molecular fragment includes: determining the segmentation modes of each molecule in the molecular data set; determining the segmentation probability of each segmentation mode according to the initial segmentation probability; normalizing the segmentation probability of each segmentation mode to obtain a normalization result; updating the initial segmentation probability of each molecular fragment according to the normalization result; and repeating the steps from determining the segmentation probability of each segmentation mode through updating the initial segmentation probability of each molecular fragment until a first preset iteration condition is met, thereby obtaining the target segmentation probability of the molecular fragment.
Optionally, updating the initial segmentation probability of each molecular fragment according to a preset algorithm to obtain the target segmentation probability of the molecular fragment includes: obtaining a molecular structure diagram of each molecule; determining a first probability set and a second probability set of each molecule according to the molecular structure diagram and the initial segmentation probability of each molecular fragment, wherein the first probability set represents the segmentation probability of each node obtained by traversing the nodes of the molecular structure diagram from left to right, and the second probability set represents the segmentation probability of each node obtained by traversing the nodes of the molecular structure diagram from right to left; updating the initial segmentation probability of each molecular fragment according to the first probability set and the second probability set; and repeating the steps from determining the first probability set and the second probability set of each molecule through updating the initial segmentation probability of each molecular fragment until a second preset iteration condition is met, thereby obtaining the target segmentation probability of the molecular fragment.
According to a second aspect, the invention also discloses a molecular segmentation method, comprising the following steps: obtaining molecules to be segmented; determining a segmentation mode of the molecules to be segmented; determining the segmentation probability of each segmentation mode according to the target segmentation probability of the molecular fragments in the molecular fragment library, wherein the molecular fragment library is obtained according to the method for determining the molecular fragment library according to the first aspect or any optional implementation manner of the first aspect; and dividing the molecules to be divided according to the dividing mode corresponding to the maximum value of the dividing probability.
According to a third aspect, the invention also discloses a device for determining a molecular fragment library, which comprises: the molecular data set acquisition module is used for acquiring a molecular data set; the cutting module is used for cutting the molecules in the molecular data set to obtain a plurality of molecular fragments and the initial segmentation probability of each molecular fragment; the updating module is used for updating the initial segmentation probability of each molecular fragment according to a preset algorithm to obtain the target segmentation probability of the molecular fragment; and the molecular fragment library determining module is used for determining the set of the molecular fragments, the target segmentation probability of which meets the preset condition, as a molecular fragment library.
According to a fourth aspect, the present invention also discloses a molecular segmentation apparatus, comprising: the molecule to be segmented obtaining module is used for obtaining the molecule to be segmented; the segmentation mode determining module is used for determining the segmentation mode of the molecules to be segmented; the segmentation probability determining module is configured to determine the segmentation probability of each segmentation mode according to a target segmentation probability of a molecular fragment in a molecular fragment library, where the molecular fragment library is obtained according to the first aspect or the method for determining a molecular fragment library according to any optional implementation manner of the first aspect; the segmentation module is used for segmenting the molecules to be segmented according to the segmentation mode corresponding to the maximum segmentation probability.
According to a fifth aspect, the invention also discloses a computer device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the method of determining a library of molecular fragments according to the first aspect or any alternative embodiment of the first aspect or the steps of the method of molecular segmentation according to the second aspect.
According to a sixth aspect, the present invention also discloses a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of the method of determining a library of molecular fragments according to the first aspect or any of the alternative embodiments of the first aspect or the steps of the method of molecular segmentation according to the second aspect.
The technical scheme of the invention has the following advantages:
1. according to the method and the device for determining the molecular fragment library, provided by the invention, the molecular data set is obtained, the molecules in the molecular data set are cut, so that a plurality of molecular fragments and the initial segmentation probability of each molecular fragment are obtained, the initial segmentation probability of each molecular fragment is updated according to a preset algorithm, the target segmentation probability of the molecular fragments is obtained, and the set of the molecular fragments, of which the target segmentation probability meets the preset condition, is determined as the molecular fragment library. According to the method, the molecules in the molecular data set are segmented to obtain various molecular fragments and corresponding initial segmentation probabilities, and the initial segmentation probabilities are updated by using a preset algorithm, so that the probability of the obtained molecular fragments is more accurate, and the molecules can be segmented by using the molecular fragment library conveniently.
2. According to the molecular segmentation method and device, the segmentation mode of the molecules to be segmented is determined by acquiring the molecules to be segmented, the segmentation probability of each segmentation mode is determined according to the target segmentation probability of the molecule fragments in the molecule fragment library, and the molecules to be segmented are segmented according to the segmentation mode corresponding to the maximum value of the segmentation probabilities. According to the method, the segmentation probability of each segmentation mode of the molecules to be segmented is determined through the target segmentation probability of the molecular fragments in the molecular fragment library, the molecules to be segmented are segmented according to the segmentation mode with the largest segmentation probability, more attention is paid to the occurrence probability of each fragment, certain characteristic of the molecular fragments is not favored, various characteristics are integrated, various properties of the fragments are excellent, the segmentation of the molecules is accurate, and the possibility of drug preparation of the molecular fragments can be increased.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart showing a specific example of a method for determining a molecular fragment library according to an embodiment of the present invention;
FIG. 2 is a diagram showing a molecular structure according to an embodiment of the present invention;
FIG. 3 is a diagram showing a specific example of determining the increase amount of a molecular fragment on the main chain in the embodiment of the present invention;
FIG. 4 (a) is a diagram illustrating one embodiment of traversing a molecular structure diagram from left to right to determine backbone molecular fragments linked to non-backbone molecular fragments in an embodiment of the present invention;
FIG. 4 (b) is a diagram showing one embodiment of the present invention for traversing a molecular structure diagram from right to left to determine backbone molecular fragments linked to non-backbone molecular fragments;
FIG. 4 (c) is a specific example of determining the increase of molecular fragments not in the main chain in the embodiment of the present invention;
FIG. 5 is a flowchart showing a specific example of a molecular segmentation method according to an embodiment of the present invention;
FIG. 6 is a diagram showing a specific example of a molecule to be split according to an embodiment of the present invention;
FIG. 7 is a diagram showing a specific example of partitioning a molecule to be partitioned using the BRICS algorithm according to an embodiment of the present invention;
FIG. 8 is a diagram showing a specific example of dividing a molecule to be divided by using a molecular fragment library according to an embodiment of the present invention;
FIG. 9 is a schematic block diagram of a specific example of a molecular fragment library determining apparatus in an embodiment of the present invention;
FIG. 10 is a schematic block diagram of a specific example of a molecular segmentation apparatus in accordance with an embodiment of the present invention;
FIG. 11 is a diagram showing a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "coupled," and "connected" are to be construed broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, indirect through an intermediate medium, or an internal communication between two components; and it may be wireless or wired. Those of ordinary skill in the art will understand the specific meaning of these terms in the present invention according to the specific case.
In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The embodiment of the invention discloses a method for determining a molecular fragment library which, as shown in FIG. 1, comprises the following steps:
S11: Acquire a molecular data set.
Illustratively, the molecular data set is a collection of a plurality of different drug molecules, which may be obtained from a chemical database or through a search engine. The embodiment of the invention does not specifically limit how the molecular data set is acquired; a person skilled in the art can choose according to the actual situation. The present embodiment uses drug molecules from the CHEMBL-25 database, which contains 1879206 molecules, so the molecular data set contains 1879206 molecules.
S12: Cut the molecules in the molecular data set to obtain a plurality of molecular fragments and an initial segmentation probability of each molecular fragment.
Illustratively, the cleavage of a molecule in the molecular data set may be a division of the molecule by a predetermined length (e.g., 4), resulting in molecular fragments of all predetermined lengths. In order to obtain more various molecular fragments, in the embodiment of the present invention, when the molecules are cut, all the molecules may be cut into molecular fragments with a length of 1, and then the molecular fragments with a length of 1 may be combined according to the molecular structure, where the length of the combined molecular fragments does not exceed the preset length. The method for dividing the molecule and the preset length are not particularly limited, and can be determined by a person skilled in the art according to practical situations.
In the embodiment of the invention, the molecular fragment may be composed of only nodes, or may be composed of nodes and edges, wherein the nodes may include a single atom, a ring structure, a parallel ring structure, a bridge ring structure, or a spiro ring structure, and the edges may include single bonds, double bonds, triple bonds, and the like. The length of a molecular fragment refers to the number of all nodes in the molecular fragment.
The initial segmentation probability for each molecular fragment may be determined based on the ratio of the number of occurrences of that molecular fragment to the sum of the number of occurrences of all molecular fragments.
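As a rough illustration of step S12, the following sketch assumes that each molecule has already been reduced to a linear sequence of node labels (a simplification, since real molecules are graphs with branches) and that a fragment is a contiguous run of at most `max_len` nodes. The function names `enumerate_fragments` and `initial_probabilities` are hypothetical and do not come from the patent.

```python
from collections import Counter

def enumerate_fragments(nodes, max_len):
    """Yield every contiguous run of 1..max_len nodes as a tuple."""
    n = len(nodes)
    for start in range(n):
        for length in range(1, max_len + 1):
            if start + length > n:
                break
            yield tuple(nodes[start:start + length])

def initial_probabilities(molecules, max_len=4):
    """Occurrences of a fragment divided by occurrences of all fragments."""
    counts = Counter()
    for nodes in molecules:
        counts.update(enumerate_fragments(nodes, max_len))
    total = sum(counts.values())
    return {frag: c / total for frag, c in counts.items()}

# Toy data set: two "molecules" given as node sequences (assumed representation).
probs = initial_probabilities([["A", "B", "C"], ["A", "B"]], max_len=2)
```

With this toy input there are 8 fragment occurrences in total, so fragments that occur twice (such as `("A",)`) start at 0.25 and fragments that occur once (such as `("B", "C")`) start at 0.125.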
S13: Update the initial segmentation probability of each molecular fragment according to a preset algorithm to obtain the target segmentation probability of the molecular fragment.
Illustratively, the initial segmentation probabilities obtained by cutting the molecules in the molecular data set are only rough estimates, so each of them needs to be updated with a preset algorithm. The preset algorithm may be the expectation-maximization (EM) algorithm. The embodiment of the invention does not specifically limit the preset algorithm; a person skilled in the art can choose according to the actual situation. Updating the initial segmentation probability of the molecular fragments makes the resulting segmentation probability (that is, the target segmentation probability) more accurate.
S14: Determine the set of molecular fragments whose target segmentation probability meets a preset condition as the molecular fragment library.
The preset condition may be, for example, that the target segmentation probability is greater than a preset threshold (e.g., $10^{-8}$), or that the molecular fragment falls within a target ranking (e.g., the top 800,000) when all molecular fragments are ranked by target segmentation probability. The embodiment of the invention does not specifically limit the preset condition or the preset threshold; a person skilled in the art can choose according to the actual situation. Because only the molecular fragments with larger target segmentation probabilities are placed in the molecular fragment library, fragments with smaller target segmentation probabilities are eliminated, which reduces the amount of calculation during molecular segmentation.
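The two example conditions in step S14 (a probability threshold or a rank cutoff) can be sketched as follows. The function `select_library` and the probability values are illustrative assumptions, not part of the patent.

```python
def select_library(target_probs, threshold=None, top_k=None):
    """Keep fragments above `threshold`, or the `top_k` most probable ones."""
    if threshold is not None:
        return {f: p for f, p in target_probs.items() if p > threshold}
    if top_k is not None:
        ranked = sorted(target_probs.items(), key=lambda kv: kv[1], reverse=True)
        return dict(ranked[:top_k])
    raise ValueError("give either threshold or top_k")

# Hypothetical target segmentation probabilities, for illustration only.
target_probs = {"A": 0.6, "AB": 0.3, "C": 0.1, "BC": 5e-9}

by_threshold = select_library(target_probs, threshold=1e-8)  # drops "BC"
by_rank = select_library(target_probs, top_k=2)              # keeps "A" and "AB"
```

Either rule shrinks the library, so fewer candidate fragments have to be looked up when a molecule is segmented later.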
According to the method for determining the molecular fragment library, provided by the invention, the molecular data set is obtained, the molecules in the molecular data set are cut, so that a plurality of molecular fragments and the initial segmentation probability of each molecular fragment are obtained, the initial segmentation probability of each molecular fragment is updated according to a preset algorithm, the target segmentation probability of the molecular fragment is obtained, and the set of the molecular fragments, of which the target segmentation probability meets the preset condition, is determined as the molecular fragment library. According to the method, the molecules in the molecular data set are segmented to obtain various molecular fragments and corresponding initial segmentation probabilities, and the initial segmentation probabilities are updated by using a preset algorithm, so that the probability of the obtained molecular fragments is more accurate, and the molecules can be segmented by using the molecular fragment library conveniently.
As an optional implementation of the embodiment of the present invention, before step S12, the method for determining a molecular fragment library further includes:
Removing molecules in the molecular data set that do not meet the preset segmentation conditions.
Illustratively, a molecule meets the preset segmentation conditions when its number of heavy atoms is greater than or equal to 10 and less than or equal to 50. A molecule is removed when its number of intramolecular bonds is greater than 65, when it contains atoms that are rare in drugs, when it contains bonds that are rare in drugs, or when it cannot be recognized by preset software (e.g., RDKit). Atoms that are rare in drugs may include boron, phosphorus, and the like. Bonds that are rare in drugs may include chelate bonds and the like. Molecules that cannot be recognized by the preset software are molecules that violate chemical rules; such molecules may have been saved to the molecular database because of recording errors.
After the molecules that fail any of the above preset segmentation conditions are removed from the molecular data set, 1607036 molecules remain.
It should be noted that 10, 50, and 65 are only one specific implementation of the embodiment of the present invention, and are not meant to limit the present invention, and those skilled in the art may also adopt other implementation modes based on the above modes.
According to the embodiment of the invention, the molecules which do not meet the preset segmentation conditions are removed, so that the subsequent data processing amount is reduced, and the data processing efficiency is improved.
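The filtering step can be sketched as a simple predicate. In this sketch the molecule descriptors are entered by hand so that the example has no dependencies; in practice they would come from a cheminformatics toolkit such as RDKit. The record type `MolRecord`, the helper `meets_conditions`, and the reading of 65 as an upper limit on the bond count are assumptions made for illustration.

```python
from dataclasses import dataclass

RARE_ELEMENTS = {"B", "P"}  # boron and phosphorus, the examples named in the text

@dataclass
class MolRecord:
    name: str
    heavy_atoms: int
    bonds: int
    elements: frozenset
    has_rare_bond: bool = False   # e.g. a chelate bond
    parseable: bool = True        # False if the toolkit rejects the molecule

def meets_conditions(m, min_heavy=10, max_heavy=50, max_bonds=65):
    """True if the molecule should be kept for fragment extraction."""
    return (
        m.parseable
        and min_heavy <= m.heavy_atoms <= max_heavy
        and m.bonds <= max_bonds          # assumed to be an upper bound
        and not (m.elements & RARE_ELEMENTS)
        and not m.has_rare_bond
    )

# Hypothetical records, for illustration only.
dataset = [
    MolRecord("ok", 24, 26, frozenset({"C", "N", "O"})),
    MolRecord("too_small", 6, 6, frozenset({"C", "O"})),
    MolRecord("has_boron", 20, 21, frozenset({"C", "B", "O"})),
    MolRecord("unparseable", 30, 32, frozenset({"C"}), parseable=False),
]
kept = [m.name for m in dataset if meets_conditions(m)]
```

Only the first record survives; the others fail the size, rare-atom, and parseability checks respectively.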
As an alternative implementation manner of the embodiment of the present invention, the step S13 includes:
First, the segmentation modes of each molecule in the molecular data set are determined.
For example, to ensure the accuracy of molecular segmentation, the embodiment of the present invention first determines all possible segmentation modes of each molecule, where each mode cuts the molecule into molecular fragments for which an initial segmentation probability has been obtained. For instance, if a molecule is composed of 3 atoms or rings A, B, and C in sequence, the molecule has 4 segmentation modes: (1) A/B/C, (2) AB/C, (3) A/BC, and (4) ABC, where "/" indicates a segmentation position.
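The enumeration of segmentation modes can be sketched as follows, assuming the molecule is a non-empty linear chain of nodes so that each of the n-1 positions between neighbouring nodes is either cut or not cut. The generator name `segmentations` is hypothetical.

```python
from itertools import product

def segmentations(nodes):
    """Yield every way to cut a non-empty node sequence into contiguous fragments."""
    n = len(nodes)
    # One boolean per gap between neighbouring nodes: True means "cut here".
    for cuts in product([False, True], repeat=n - 1):
        mode, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:
                mode.append("".join(nodes[start:i]))
                start = i
        mode.append("".join(nodes[start:]))
        yield mode

modes = ["/".join(m) for m in segmentations(["A", "B", "C"])]
```

For the three-node example this produces the four modes ABC, AB/C, A/BC, and A/B/C; a chain of five nodes produces 16.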
Second, the segmentation probability of each segmentation mode is determined according to the initial segmentation probability.
For example, the segmentation probability of a segmentation mode may be obtained by adding the initial segmentation probabilities of the molecular fragments in that mode, or by multiplying them. The embodiment of the invention does not specifically limit how the segmentation probability of each segmentation mode is determined; a person skilled in the art can choose according to the actual situation. The embodiment of the invention is described taking multiplication of the initial segmentation probabilities of the molecular fragments in each segmentation mode as an example.
Taking the molecule composed of A, B, and C as an example, the segmentation probability of the first segmentation mode "A/B/C" may be:
P(seg1)=p(A)·p(B)·p(C)
where P(seg1) represents the segmentation probability of the first segmentation mode, and p(A), p(B), and p(C) represent the initial segmentation probabilities of the molecular fragments A, B, and C, respectively.
P(seg2)=p(AB)·p(C)
where P(seg2) represents the segmentation probability of the second segmentation mode "AB/C", and p(AB) represents the initial segmentation probability of the molecular fragment AB.
The segmentation probabilities of the other segmentation modes can be calculated in the same way.
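The product rule can be checked numerically with a few lines of code. The probability values below are made up for illustration, and `mode_probability` is a hypothetical helper name.

```python
import math

# Hypothetical initial segmentation probabilities, for illustration only.
p = {"A": 0.3, "B": 0.2, "C": 0.2, "AB": 0.1, "BC": 0.1, "ABC": 0.1}

def mode_probability(mode, probs):
    """Multiply the probabilities of the fragments that make up one mode."""
    return math.prod(probs[f] for f in mode)

modes = [["A", "B", "C"], ["AB", "C"], ["A", "BC"], ["ABC"]]
mode_probs = {"/".join(m): mode_probability(m, p) for m in modes}
```

With these numbers the four modes score approximately 0.012, 0.02, 0.03, and 0.1, so the unsegmented mode ABC is the most probable before any update.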
And thirdly, normalizing the segmentation probability of each segmentation mode to obtain a normalization processing result.
Illustratively, normalizing the segmentation probability for each segmentation means dividing the probability according to each segmentation means by the sum of the probabilities for all segmentation means. The sum of probabilities of the normalization processing results of all the division modes is 1.
And updating the initial segmentation probability of each molecular fragment according to the normalization processing result.
Illustratively, updating the initial segmentation probability for each molecular fragment according to the normalization process result may be: and summing the normalization processing result of each segmentation mode with the initial segmentation probabilities of all the molecular fragments corresponding to each segmentation mode.
For example, when the normalization result of the first segmentation method is 0.0005, the update result is: p (A) +0.0005, p (B) +0.0005, p (C) +0.0005. Similarly, an updated value of the initial segmentation probability of the molecular fragments for each segmentation mode of each molecule can be obtained.
The steps from determining the segmentation probability of each segmentation mode according to the initial segmentation probability through updating the initial segmentation probability of each molecular fragment according to the normalization processing result are repeated until a first preset iteration condition is met, so as to obtain the target segmentation probability of the molecular fragments. The first preset iteration condition may be that the number of iterations reaches a preset number (e.g., 5). The embodiment of the present invention does not specifically limit the preset number, which may be determined by a person skilled in the art according to the actual situation.
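The iteration as a whole might look like the sketch below. It assumes linear molecules, the multiplication variant, and a fixed iteration count as the stopping condition. It follows the text literally in that updated values are not renormalized between iterations; whether the original method renormalizes is not stated, so this is an assumption. All names and values are hypothetical.

```python
from itertools import product
from math import prod


def segmentations(nodes):
    """Yield every contiguous segmentation of a linear node sequence."""
    n = len(nodes)
    for cuts in product([False, True], repeat=n - 1):
        frags, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:
                frags.append("".join(nodes[start:i]))
                start = i
        frags.append("".join(nodes[start:]))
        yield tuple(frags)


def iterate(molecules, p, n_iter=5):
    """Repeat: score modes, normalize per molecule, add results to fragments."""
    p = dict(p)
    for _ in range(n_iter):
        updated = dict(p)
        for nodes in molecules:
            segs = list(segmentations(nodes))
            weights = [prod(p[f] for f in seg) for seg in segs]
            alpha = sum(weights)  # sum over all segmentation modes
            for seg, weight in zip(segs, weights):
                for frag in seg:
                    updated[frag] += weight / alpha
        p = updated
    return p


# Hypothetical initial probabilities and a single three-node molecule.
initial = {"A": 0.2, "B": 0.2, "C": 0.2, "AB": 0.1, "BC": 0.1, "ABC": 0.05}
print(iterate([["A", "B", "C"]], initial, n_iter=5))
```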
According to the embodiment of the invention, the initial segmentation probability of the molecular fragments is iterated for a plurality of times, so that the obtained target segmentation probability is more accurate.
In the above method of updating the initial segmentation probability of the molecular fragments, a molecule with n molecular fragments requires 2^(n-1) segmentation modes to be calculated, so the larger n is, the larger the calculation amount of the update process becomes. From the above example, the increment of fragment C in one iteration can be obtained as:
where α is the sum of the segmentation probabilities of all segmentation modes. Therefore, in the embodiment of the present invention, when calculating the increment of the molecular fragment C in one iteration process, in order to reduce the calculation amount, only the sum of the segmentation probabilities of all the segmentation modes and the sum of the probabilities of all the segmentation modes after the fragment C is removed need to be calculated.
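For the three-node example, a form consistent with the definitions above can be written out as follows. This is a reconstruction from the surrounding text rather than a formula quoted from the original; it assumes the molecule is the linear chain A-B-C with segmentation modes A/B/C, AB/C, A/BC, and ABC, and that the multiplication variant is used.

```latex
\alpha = p(A)\,p(B)\,p(C) + p(AB)\,p(C) + p(A)\,p(BC) + p(ABC)

\mathrm{add}(C) = \frac{p(A)\,p(B)\,p(C) + p(AB)\,p(C)}{\alpha}
               = \frac{p(C)\,\bigl[\,p(A)\,p(B) + p(AB)\,\bigr]}{\alpha}
```

The bracketed factor is the sum of the segmentation probabilities of what remains after fragment C is removed (here the sub-chain A-B), which is why only that sum and α are needed.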
Therefore, the embodiment of the invention provides a dynamic programming algorithm based on a molecular structure diagram to calculate the increment of each molecular fragment, which comprises the following steps:
first, a molecular structure diagram of each molecule is obtained.
Illustratively, the molecular structure diagram includes nodes and edges, wherein a node is a single atom or a ring (a monocyclic structure, a parallel ring structure, a bridged ring structure, or a spiro ring structure, etc.), and an edge is a bond (e.g., single bond, double bond, triple bond, etc.) connecting the nodes. The molecular structure diagram is stored in the molecular database in advance, and the acquisition of the molecular structure diagram can be directly acquired from the molecular database. FIG. 2 is a diagram showing a molecular structure according to an embodiment of the present invention, wherein A, B, C, D, E, F, G, H is a node of a molecule.
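One way to hold such a diagram in memory is sketched below. The class and its methods are hypothetical, and the connectivity used in the example (A-B, B-C, B-D, D-E, D-F, D-G, G-H) is inferred from the traversal orders and backbone given later in the text, since FIG. 2 is not reproduced here. The node kinds and bond types are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Molecular structure diagram: nodes are atoms or rings, edges are bonds."""
    node_kind: dict = field(default_factory=dict)  # node name -> "atom" or "ring"
    bonds: dict = field(default_factory=dict)      # frozenset({u, v}) -> bond type

    def add_node(self, name, kind="atom"):
        self.node_kind[name] = kind

    def add_bond(self, u, v, bond="single"):
        self.bonds[frozenset((u, v))] = bond

    def neighbors(self, name):
        """Return the nodes bonded to `name`, in sorted order."""
        return sorted(n for pair in self.bonds if name in pair
                      for n in pair if n != name)


# Assumed topology for the eight nodes of FIG. 2 (placeholder kinds and bonds).
g = MolGraph()
for name in "ABCDEFGH":
    g.add_node(name, kind="ring" if name in "BD" else "atom")
for u, v in [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E"),
             ("D", "F"), ("D", "G"), ("G", "H")]:
    g.add_bond(u, v)

print(g.neighbors("D"))
```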
And secondly, determining a first probability set and a second probability set of each molecule according to the molecular structure diagram and the initial segmentation probability of each molecular fragment, wherein the first probability set represents the segmentation probability of each node obtained by traversing the nodes of the molecular structure diagram from left to right, and the second probability set represents the segmentation probability of each node obtained by traversing the nodes of the molecular structure diagram from right to left.
Illustratively, as shown in fig. 2, the determination method for determining the first probability set is as follows:
First, a hierarchical (level-order) traversal is performed from the first node A to obtain the sequence A B C D E F G H. Then, starting from the last node H of the sequence, the sum of the probabilities of all possible segmentations at the current node is calculated for each node j, where j ∈ [0, n] and n is the number of nodes in {A, B, C, D, E, F, G, H}. The formula is as follows:
where i is the number of nodes contained in the segmented molecular fragment that includes node j, with i ∈ [1, 4], and p(C_j) is the initial segmentation probability of each node. j = 0 represents the start state of the molecule, whose value is set to 1. j = 1 indicates that there is only one node at that point, namely H, so that p(C_j) = p(C_H). When j is greater than or equal to 2, the value for node j is calculated from the previously recorded value at index j − i, that is, the index of the current node j minus the number of nodes in the current molecular fragment, and the result is recorded. When node A is calculated, its value is recorded for later use, and the backbone sequence of the molecule is obtained at the same time: A B D G H.
Similarly, the second probability set is determined as follows: a hierarchical traversal is first performed from the last node H to obtain the sequence H G D F E B C A; then, starting from the last node A of the sequence, the sum of the probabilities of all possible segmentations at the current node is calculated in the same way.
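The recurrence described above has the shape of a standard prefix dynamic program. The sketch below shows that shape for the simplified case of a linear chain only; it does not handle branches, and it is not the original formula. The function name, the treatment of unknown fragments as probability 0, and the values are assumptions.

```python
def prefix_sums(nodes, p, max_len=4):
    """F[j] = total probability of all segmentations of the first j nodes."""
    F = [1.0] + [0.0] * len(nodes)  # F[0] = 1 is the start state
    for j in range(1, len(nodes) + 1):
        # The fragment ending at node j may contain i = 1..max_len nodes.
        for i in range(1, min(max_len, j) + 1):
            frag = "".join(nodes[j - i:j])
            # Fragments missing from p contribute 0 (an assumption).
            F[j] += p.get(frag, 0.0) * F[j - i]
    return F


# Hypothetical initial segmentation probabilities.
p = {"A": 0.2, "B": 0.2, "C": 0.2, "AB": 0.1, "BC": 0.1, "ABC": 0.05}
F = prefix_sums(["A", "B", "C"], p)
print(F)
```

The last entry equals the sum of the segmentation probabilities of all segmentation modes of the chain, obtained in a number of steps proportional to n rather than 2^(n-1).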
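The two traversal orders and the backbone quoted above can be reproduced with a breadth-first traversal, as sketched below. The edge list is an assumption inferred from those sequences (FIG. 2 is not reproduced here), and treating the backbone as the path from the first node to the last node is also an assumption.

```python
from collections import deque

# Assumed connectivity, inferred from the sequences given in the text.
EDGES = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("G", "H")]


def build_graph(edges):
    """Build an adjacency list that preserves insertion order."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, []).append(u)
    return graph


def level_order(graph, start, reverse=False):
    """Breadth-first (hierarchical) traversal from `start`."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        nbrs = reversed(graph[node]) if reverse else graph[node]
        for nxt in nbrs:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def backbone(graph, start, end):
    """Path from `start` to `end`, found by following BFS parents."""
    parent, queue = {start: None}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


graph = build_graph(EDGES)
print(" ".join(level_order(graph, "A")))
print(" ".join(level_order(graph, "H", reverse=True)))
print(" ".join(backbone(graph, "A", "H")))
```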
Again, the initial segmentation probability for each molecular fragment is updated according to the first probability set and the second probability set.
Illustratively, when updating the initial segmentation probability of each molecular fragment according to the first probability set and the second probability set, the increment of each molecular fragment needs to be calculated, and in the embodiment of the present invention, there are 2 cases when the increment of the molecular fragment is calculated, that is, when the molecular fragment is on the main chain and when the molecular fragment is not on the main chain. The following describes how these 2 cases are calculated, respectively.
When the molecular fragment is on the main chain, as shown in fig. 3, for example, the increment of the molecular fragment BD is:
where add(BD) represents the increment of molecular fragment BD; the term for node A is the segmentation probability obtained by traversing the molecular structure diagram from left to right to node A; the terms for nodes C, G, F, and E are the segmentation probabilities obtained by traversing the molecular structure diagram from right to left to nodes C, G, F, and E, respectively; p(BD) represents the segmentation probability of the molecular fragment BD; and α represents the sum of the segmentation probabilities of all segmentation modes.
When the molecular fragment is not on the main chain, it is necessary to search from that fragment toward the main chain until a node of the main chain is found. For example, when the increment of the molecular fragment E is calculated, the main-chain node found is node D. FIG. 4(a) illustrates the traversal of the molecular structure diagram from right to left, FIG. 4(b) illustrates the associated intermediate probability, and FIG. 4(c) illustrates the determination of the increment of the molecular fragment E:
where add(E) represents the increment of molecular fragment E; p(E) represents the initial segmentation probability of molecular fragment E; F_D represents the segmentation probability of all segmentation modes of node D; and α represents the sum of the segmentation probabilities of all segmentation modes.
The steps from determining the first probability set and the second probability set of each molecule according to the molecular structure diagram and the initial segmentation probability of each molecular fragment through updating the initial segmentation probability of each molecular fragment according to the first probability set and the second probability set are repeated until a second preset iteration condition is met, so as to obtain the target segmentation probability of the molecular fragments. The second preset iteration condition may be the same as the first preset iteration condition, for example, that the number of iterations reaches a preset number (e.g., 5). The second preset iteration condition may also differ from the first preset iteration condition; for example, the first preset iteration condition may be that the number of iterations reaches 5, while the second preset iteration condition may be that the number of iterations reaches 3. The embodiment of the present invention does not specifically limit the second preset iteration condition, and those skilled in the art can determine it according to the actual situation.
According to the invention, the molecular structure diagram is abstracted into a chain structure, the probability of all nodes on the main chain can be calculated by performing forward and reverse traversal calculation twice according to the hierarchical traversal sequence, and when the probability of the nodes on the branched chain is calculated, only a small amount of extra calculation is needed from the calculated main chain nodes, so that the calculation complexity is greatly reduced.
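The saving from the two traversals can be illustrated on a linear chain, where a forward table and a backward table give the increment of every fragment without enumerating segmentation modes. This is a sketch of the general forward-backward idea under that simplifying assumption; it does not reproduce the original formulas for branched structures, and all names and values are hypothetical.

```python
def forward(nodes, p, max_len=4):
    """F[j]: total probability of all segmentations of nodes[:j]."""
    F = [1.0] + [0.0] * len(nodes)
    for j in range(1, len(nodes) + 1):
        for i in range(1, min(max_len, j) + 1):
            F[j] += p.get("".join(nodes[j - i:j]), 0.0) * F[j - i]
    return F


def backward(nodes, p, max_len=4):
    """B[j]: total probability of all segmentations of nodes[j:]."""
    n = len(nodes)
    B = [0.0] * n + [1.0]
    for j in range(n - 1, -1, -1):
        for i in range(1, min(max_len, n - j) + 1):
            B[j] += p.get("".join(nodes[j:j + i]), 0.0) * B[j + i]
    return B


def increments(nodes, p, max_len=4):
    """Increment of each fragment: F[start] * p(fragment) * B[end] / alpha."""
    n = len(nodes)
    F, B = forward(nodes, p, max_len), backward(nodes, p, max_len)
    alpha = F[n]  # sum over all segmentation modes
    add = {}
    for s in range(n):
        for e in range(s + 1, min(s + max_len, n) + 1):
            frag = "".join(nodes[s:e])
            add[frag] = add.get(frag, 0.0) + F[s] * p.get(frag, 0.0) * B[e] / alpha
    return add


# Hypothetical initial segmentation probabilities.
p = {"A": 0.2, "B": 0.2, "C": 0.2, "AB": 0.1, "BC": 0.1, "ABC": 0.05}
print(increments(["A", "B", "C"], p))
```

For fragment C this gives p(C)·[p(A)p(B) + p(AB)]/α, the same value as summing the normalized probabilities of the modes A/B/C and AB/C directly.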
The embodiment of the invention also discloses a molecular segmentation method, as shown in fig. 5, comprising the following steps:
S21: obtaining the molecules to be segmented.
The molecules to be segmented may or may not be molecules in the molecular database, for example. The molecules to be segmented can be sent to the processor by a staff directly through a wired or wireless network, or can be obtained from a memory for storing the molecules in advance. The source of the molecule to be segmented and the obtaining method of the molecule to be segmented are not particularly limited in the embodiment of the invention, and can be determined by a person skilled in the art according to actual situations.
S22: determining the segmentation modes of the molecules to be segmented. For details, refer to the description of the corresponding steps in the foregoing embodiments, which is not repeated here.
S23: determining the segmentation probability of each segmentation mode according to the target segmentation probabilities of the molecular fragments in the molecular fragment library, wherein the molecular fragment library is obtained by the method for determining a molecular fragment library described in the foregoing embodiments. For details, refer to the description of the corresponding steps in the foregoing embodiments, which is not repeated here.
It should be noted that, when the molecule to be segmented is not a molecule in the molecular database, it is necessary to ensure that all the molecular fragments that can be segmented by the molecule to be segmented can be obtained from the target segmentation probabilities of the molecular fragments in the molecular fragment library.
S24: and dividing the molecules to be divided according to the dividing mode corresponding to the maximum value of the dividing probability.
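Steps S21 to S24 can be sketched as follows for a linear molecule. The fragment library and its target probabilities are hypothetical, and skipping segmentation modes that contain a fragment absent from the library is an assumption made for the example.

```python
from itertools import product
from math import prod


def best_segmentation(nodes, library):
    """Return the segmentation mode with the largest segmentation probability."""
    best, best_prob = None, 0.0
    n = len(nodes)
    for cuts in product([False, True], repeat=n - 1):
        frags, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:
                frags.append("".join(nodes[start:i]))
                start = i
        frags.append("".join(nodes[start:]))
        # Assumption: a mode is skipped if any fragment is not in the library.
        if not all(f in library for f in frags):
            continue
        prob = prod(library[f] for f in frags)
        if prob > best_prob:
            best, best_prob = tuple(frags), prob
    return best, best_prob


# Hypothetical target segmentation probabilities from a fragment library.
library = {"A": 0.3, "B": 0.1, "C": 0.2, "AB": 0.05, "BC": 0.4}
mode, prob = best_segmentation(["A", "B", "C"], library)
print("/".join(mode), prob)
```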
According to the molecular segmentation method provided by the invention, the molecule to be segmented is acquired, its segmentation modes are determined, the segmentation probability of each segmentation mode is determined according to the target segmentation probabilities of the molecular fragments in the molecular fragment library, and the molecule to be segmented is segmented according to the segmentation mode corresponding to the maximum segmentation probability. Because the method relies on the occurrence probability of each fragment rather than favoring any single characteristic of the molecular fragments, it integrates multiple characteristics, so that the resulting fragments perform well across various properties, the molecule is segmented accurately, and the possibility of developing the molecular fragments into drugs can be increased.
Fig. 6 shows a molecule to be segmented, fig. 7 shows the molecular fragments obtained by segmenting the molecule to be segmented using the BRICS algorithm, and fig. 8 shows the molecular fragments obtained by segmenting the molecule to be segmented using the molecular fragment library. As shown in Table 1 below, according to expert evaluation, the accuracy of segmenting the molecules to be segmented using the molecular fragment library is 93%, a substantial improvement over the 72% accuracy of the BRICS algorithm and the 86% accuracy of the RECAP algorithm.
Table 1. Accuracy of the molecular segmentation methods
The embodiment of the invention also discloses a device for determining the molecular fragment library, as shown in fig. 9, comprising:
a molecular data set acquisition module 31 for acquiring a molecular data set; the specific implementation manner is described in the above embodiment in the related description of step S11, which is not repeated here.
A cutting module 32, configured to cut the molecules in the molecular data set to obtain a plurality of molecular fragments and an initial segmentation probability of each molecular fragment; the specific implementation manner is described in the above embodiment in the related description of step S12, which is not repeated here.
An updating module 33, configured to update an initial segmentation probability of each molecular segment according to a preset algorithm, so as to obtain a target segmentation probability of the molecular segment; the specific implementation manner is described in the above embodiment in the related description of step S13, which is not repeated here.
The molecular fragment library determining module 34 is configured to determine, as a molecular fragment library, a set of molecular fragments whose target segmentation probability satisfies a preset condition. The specific implementation manner is described in the above embodiment in the related description of step S14, which is not repeated here.
According to the determination device of the molecular fragment library, a molecular data set is obtained, molecules in the molecular data set are cut, so that various molecular fragments and initial segmentation probability of each molecular fragment are obtained, the initial segmentation probability of each molecular fragment is updated according to a preset algorithm, target segmentation probability of the molecular fragments is obtained, and a set of the molecular fragments, of which the target segmentation probability meets preset conditions, is determined as the molecular fragment library. According to the method, the molecules in the molecular data set are segmented to obtain various molecular fragments and corresponding initial segmentation probabilities, and the initial segmentation probabilities are updated by using a preset algorithm, so that the probability of the obtained molecular fragments is more accurate, and the molecules can be segmented by using the molecular fragment library conveniently.
As an optional implementation manner of the embodiment of the present invention, the device for determining a molecular fragment library further includes:
and the removing module is used for removing molecules which do not meet the preset segmentation conditions in the molecular data set. The specific implementation manner is described in the related description of the corresponding steps in the foregoing embodiments, which is not repeated herein.
As an alternative implementation of the embodiment of the present invention, the cutting module 32 includes:
the cutting sub-module is used for cutting molecules in the molecular data set to obtain a plurality of molecular fragments, and the length of the molecular fragments is smaller than or equal to the preset length; the specific implementation manner is described in the related description of the corresponding steps in the foregoing embodiments, which is not repeated herein.
And the initial segmentation probability determining module is used for determining the initial segmentation probability of each molecular fragment according to the number of each molecular fragment and the number of all the molecular fragments. The specific implementation manner is described in the related description of the corresponding steps in the foregoing embodiments, which is not repeated herein.
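As an illustration of what such a module might compute, the sketch below takes the initial segmentation probability of a fragment to be its count divided by the count of all fragments. The text only says the probability is determined from these two numbers, so the ratio is an assumption, as are the function name and the input data.

```python
from collections import Counter


def initial_probabilities(fragment_lists):
    """Initial probability of a fragment = its count / count of all fragments."""
    counts = Counter(f for frags in fragment_lists for f in frags)
    total = sum(counts.values())
    return {frag: count / total for frag, count in counts.items()}


# Hypothetical fragments obtained by cutting two molecules.
cut_results = [["A", "B", "AB"], ["A", "C"]]
probs = initial_probabilities(cut_results)
print(probs)
```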
As an alternative implementation manner of the embodiment of the present invention, the update module 33 includes:
the first determining module is used for determining the segmentation mode of each molecule in the molecule data set; the specific implementation manner is described in the related description of the corresponding steps in the foregoing embodiments, which is not repeated herein.
The second determining module is used for determining the segmentation probability of each segmentation mode according to the initial segmentation probability; the specific implementation manner is described in the related description of the corresponding steps in the foregoing embodiments, which is not repeated herein.
The normalization processing module is used for normalizing the segmentation probability of each segmentation mode to obtain a normalization processing result; the specific implementation manner is described in the related description of the corresponding steps in the foregoing embodiments, which is not repeated herein.
The first updating sub-module is used for updating the initial segmentation probability of each molecular fragment according to the normalization processing result; the specific implementation manner is described in the related description of the corresponding steps in the foregoing embodiments, which is not repeated herein.
The first repeated execution module is used for repeatedly executing the steps of determining the segmentation probability of each segmentation mode according to the initial segmentation probability and updating the initial segmentation probability of each molecular segment according to the normalization processing result until a first preset iteration condition is met, and obtaining the target segmentation probability of the molecular segment. The specific implementation manner is described in the related description of the corresponding steps in the foregoing embodiments, which is not repeated herein.
As an alternative implementation manner of the embodiment of the present invention, the update module 33 includes:
the molecular structure diagram acquisition module is used for acquiring a molecular structure diagram of each molecule; the specific implementation manner is described in the related description of the corresponding steps in the foregoing embodiments, which is not repeated herein.
The probability set determining module is used for determining a first probability set and a second probability set of each molecule according to the molecular structure diagram and the initial segmentation probability of each molecular fragment, wherein the first probability set represents the segmentation probability of each node obtained by traversing the nodes of the molecular structure diagram from left to right, and the second probability set represents the segmentation probability of each node obtained by traversing the nodes of the molecular structure diagram from right to left; the specific implementation manner is described in the related description of the corresponding steps in the foregoing embodiments, which is not repeated herein.
The second updating sub-module is used for updating the initial segmentation probability of each molecular fragment according to the first probability set and the second probability set; the specific implementation manner is described in the related description of the corresponding steps in the foregoing embodiments, which is not repeated herein.
The second repeated execution module is used for repeatedly executing the steps of determining a first probability set and a second probability set of each molecule according to the molecular structure diagram and the initial segmentation probability of each molecule fragment, and updating the initial segmentation probability of each molecule fragment according to the first probability set and the second probability set until a second preset iteration condition is met, so as to obtain the target segmentation probability of the molecule fragment. The specific implementation manner is described in the related description of the corresponding steps in the foregoing embodiments, which is not repeated herein.
The embodiment of the invention also discloses a molecular dividing device, as shown in fig. 10, comprising:
a molecule to be segmented acquisition module 41 for acquiring a molecule to be segmented; the specific implementation manner is described in the above embodiment in the related description of step S21, which is not repeated here.
A segmentation method determination module 42, configured to determine a segmentation method of a molecule to be segmented; the specific implementation manner is described in the above embodiment in the related description of step S22, which is not repeated here.
A segmentation probability determining module 43, configured to determine a segmentation probability of each segmentation mode according to a target segmentation probability of a molecular fragment in a molecular fragment library, where the molecular fragment library is obtained according to the determination method of the molecular fragment library described in the above embodiment; the specific implementation manner is described in the above embodiment in the related description of step S23, which is not repeated here.
The segmentation module 44 is configured to segment the molecule to be segmented according to a segmentation method corresponding to the maximum value of the segmentation probability. The specific implementation manner is described in the above embodiment in the related description of step S24, which is not repeated here.
According to the molecular segmentation device provided by the invention, the molecule to be segmented is acquired, its segmentation modes are determined, the segmentation probability of each segmentation mode is determined according to the target segmentation probabilities of the molecular fragments in the molecular fragment library, and the molecule to be segmented is segmented according to the segmentation mode corresponding to the maximum segmentation probability. Because the device relies on the occurrence probability of each fragment rather than favoring any single characteristic of the molecular fragments, it integrates multiple characteristics, so that the resulting fragments perform well across various properties, the molecule is segmented accurately, and the possibility of developing the molecular fragments into drugs can be increased.
The embodiment of the present invention further provides a computer device, as shown in fig. 11, which may include a processor 51 and a memory 52, where the processor 51 and the memory 52 may be connected by a bus or other means, and in fig. 11, the connection is exemplified by a bus.
The processor 51 may be a central processing unit (Central Processing Unit, CPU). The processor 51 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations thereof.
The memory 52 is used as a non-transitory computer readable storage medium, and may be used to store a non-transitory software program, a non-transitory computer executable program, and modules, such as program instructions/modules corresponding to a method for determining a molecular fragment library or a method for dividing a molecule (e.g., the molecular data set obtaining module 31, the cutting module 32, the updating module 33, and the molecular fragment library determining module 34 shown in fig. 9, or the molecule to be divided obtaining module 41, the division manner determining module 42, the division probability determining module 43, and the division module 44 shown in fig. 10) in the embodiment of the present invention. The processor 51 executes various functional applications of the processor and data processing, i.e., implements the method of determining a molecular fragment library or the method of molecular segmentation in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 52.
Memory 52 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created by the processor 51, etc. In addition, memory 52 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 52 may optionally include memory located remotely from processor 51, which may be connected to processor 51 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 52 and when executed by the processor 51 perform the method of determining a library of molecular fragments or the method of molecular segmentation in the embodiments shown in fig. 1-8.
The details of the above-mentioned computer device may be understood correspondingly with respect to the corresponding relevant descriptions and effects in the embodiments shown in fig. 1 to 8, and will not be repeated here.
It will be appreciated by those skilled in the art that all or part of the processes of the above method embodiments may be implemented by a computer program instructing related hardware, where the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory, a Hard Disk Drive (HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kinds described above.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.

Claims (8)

1. A method for determining a library of molecular fragments, comprising the steps of:
acquiring a molecular data set;
cutting molecules in the molecular data set to obtain a plurality of molecular fragments and initial segmentation probability of each molecular fragment;
updating the initial segmentation probability of each molecular fragment according to a preset algorithm to obtain the target segmentation probability of the molecular fragment;
determining a set of molecular fragments, wherein the target segmentation probability of the set of molecular fragments meets a preset condition, as a molecular fragment library;
the method for cutting the molecules in the molecular data set to obtain a plurality of molecular fragments and the initial segmentation probability of each molecular fragment comprises the following steps:
cutting molecules in the molecular data set to obtain the plurality of molecular fragments, wherein the length of the molecular fragments is smaller than or equal to a preset length;
determining the initial segmentation probability of each molecular fragment according to the number of each molecular fragment and the number of all molecular fragments;
The updating the initial segmentation probability of each molecular fragment according to a preset algorithm to obtain a target segmentation probability of the molecular fragment comprises the following steps:
determining a segmentation mode of each molecule in the molecule data set;
determining the segmentation probability of each segmentation mode according to the initial segmentation probability;
normalizing the segmentation probability of each segmentation mode to obtain a normalization processing result;
updating the initial segmentation probability of each molecular fragment according to the normalization processing result;
repeatedly executing the steps of determining the segmentation probability of each segmentation mode according to the initial segmentation probability, and updating the initial segmentation probability of each molecular segment according to the normalization processing result until a first preset iteration condition is met, so as to obtain the target segmentation probability of the molecular segment.
2. The method of claim 1, wherein prior to said cleaving the molecules in the molecular dataset to obtain a plurality of molecular fragments and an initial segmentation probability for each molecular fragment, the method further comprises:
and removing molecules which do not meet preset segmentation conditions in the molecular data set.
3. The method according to claim 1 or 2, wherein updating the initial segmentation probability of each molecular segment according to a preset algorithm to obtain a target segmentation probability of the molecular segment comprises:
obtaining a molecular structure diagram of each molecule;
determining a first probability set and a second probability set of each molecule according to the molecular structure diagram and the initial segmentation probability of each molecular fragment, wherein the first probability set represents the segmentation probability of each node obtained by traversing the nodes of the molecular structure diagram from left to right, and the second probability set represents the segmentation probability of each node obtained by traversing the nodes of the molecular structure diagram from right to left;
updating the initial segmentation probability of each molecular fragment according to the first probability set and the second probability set;
and repeatedly executing the steps of determining a first probability set and a second probability set of each molecule according to the molecular structure diagram and the initial segmentation probability of each molecule fragment, and updating the initial segmentation probability of each molecule fragment according to the first probability set and the second probability set until a second preset iteration condition is met, so as to obtain the target segmentation probability of the molecule fragment.
4. A method of molecular segmentation comprising the steps of:
obtaining molecules to be segmented;
determining a segmentation mode of the molecules to be segmented;
determining the segmentation probability of each segmentation mode according to the target segmentation probability of the molecular fragments in a molecular fragment library, wherein the molecular fragment library is obtained according to the determination method of the molecular fragment library according to any one of claims 1-3;
and segmenting the molecules to be segmented according to the segmentation mode corresponding to the maximum segmentation probability.
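The selection step of claim 4 can be illustrated as follows. This is a hedged sketch rather than the patented implementation: it assumes the molecule to be segmented is a linear string, that the molecular fragment library is a dictionary mapping fragment strings to target segmentation probabilities, and that the probability of a segmentation mode is the product of the probabilities of its fragments. The names `enumerate_segmentations` and `best_segmentation` and the example library values are hypothetical.

```python
import math


def enumerate_segmentations(s, library, max_len):
    """Yield every way of splitting s into fragments found in the library."""
    if not s:
        yield []
        return
    for k in range(1, min(len(s), max_len) + 1):
        head = s[:k]
        if head in library:
            for rest in enumerate_segmentations(s[k:], library, max_len):
                yield [head] + rest


def best_segmentation(s, library, max_len):
    """Return the segmentation mode with the highest probability, where a
    mode's probability is the product of its fragment probabilities."""
    best, best_p = None, 0.0
    for seg in enumerate_segmentations(s, library, max_len):
        p = math.prod(library[frag] for frag in seg)
        if p > best_p:
            best, best_p = seg, p
    return best, best_p


# Hypothetical library values, for illustration only.
library = {"C": 0.4, "O": 0.2, "CC": 0.2, "CO": 0.2}
segmentation, probability = best_segmentation("CCO", library, max_len=2)
# The candidate modes are C|C|O, C|CO and CC|O; C|CO has the largest product.
```

Enumerating every mode grows exponentially with molecule length, so this form only suits short strings; a dynamic-programming search over the same lattice would return the same maximum.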
5. A device for determining a library of molecular fragments, comprising:
the molecular data set acquisition module is used for acquiring a molecular data set;
the cutting module is used for cutting the molecules in the molecular data set to obtain a plurality of molecular fragments and the initial segmentation probability of each molecular fragment;
the updating module is used for updating the initial segmentation probability of each molecular fragment according to a preset algorithm to obtain the target segmentation probability of the molecular fragment;
the molecular fragment library determining module is used for determining a set of molecular fragments, the target segmentation probability of which meets a preset condition, as a molecular fragment library;
the cutting module includes:
the cutting sub-module is used for cutting molecules in the molecular data set to obtain a plurality of molecular fragments, and the length of the molecular fragments is smaller than or equal to the preset length;
the initial segmentation probability determining module is used for determining the initial segmentation probability of each molecular fragment according to the number of each molecular fragment and the number of all the molecular fragments;
the updating module comprises:
the first determining module is used for determining the segmentation mode of each molecule in the molecule data set;
the second determining module is used for determining the segmentation probability of each segmentation mode according to the initial segmentation probability;
the normalization processing module is used for normalizing the segmentation probability of each segmentation mode to obtain a normalization processing result;
the first updating sub-module is used for updating the initial segmentation probability of each molecular fragment according to the normalization processing result;
the first repeated execution module is used for repeatedly executing the steps of determining the segmentation probability of each segmentation mode according to the initial segmentation probability and updating the initial segmentation probability of each molecular fragment according to the normalization processing result until a first preset iteration condition is met, so as to obtain the target segmentation probability of the molecular fragment.
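The cutting and updating modules of claim 5 can be sketched end to end as below. This is an illustration under assumptions, not the disclosed implementation: molecules are assumed to be linear strings, cutting is assumed to mean taking every substring of at most `max_len` characters, the initial segmentation probability is the fragment count divided by the count of all fragments, and the first preset iteration condition is assumed to be a fixed number of iterations. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def initial_probabilities(molecules, max_len):
    """Cut each molecule into all substrings of length <= max_len and set
    each fragment's initial probability to count / total count."""
    counts = Counter(
        s[i:i + k]
        for s in molecules
        for k in range(1, max_len + 1)
        for i in range(len(s) - k + 1)
    )
    total = sum(counts.values())
    return {frag: c / total for frag, c in counts.items()}


def segmentations(s, probs, max_len):
    """Yield every segmentation mode of s using fragments present in probs."""
    if not s:
        yield []
        return
    for k in range(1, min(len(s), max_len) + 1):
        if s[:k] in probs:
            for rest in segmentations(s[k:], probs, max_len):
                yield [s[:k]] + rest


def update(molecules, probs, max_len):
    """One pass: score each mode, normalize per molecule, re-estimate."""
    expected = defaultdict(float)
    for s in molecules:
        modes = list(segmentations(s, probs, max_len))
        weights = [math.prod(probs[f] for f in seg) for seg in modes]
        z = sum(weights)
        if z == 0.0:
            continue
        for seg, w in zip(modes, weights):
            for frag in seg:
                # Normalized mode probability acts as a fractional count.
                expected[frag] += w / z
    norm = sum(expected.values())
    return {f: c / norm for f, c in expected.items()} if norm else probs


def fit(molecules, max_len, n_iter=5):
    """Assumed stopping rule: a fixed number of iterations."""
    probs = initial_probabilities(molecules, max_len)
    for _ in range(n_iter):
        probs = update(molecules, probs, max_len)
    return probs
```

A fragment library in the sense of the molecular fragment library determining module could then be taken as the fragments whose fitted probability exceeds a chosen threshold; the threshold itself is not specified here.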
6. A molecular segmentation device, comprising:
the molecule to be segmented obtaining module is used for obtaining the molecule to be segmented;
the segmentation mode determining module is used for determining the segmentation mode of the molecules to be segmented;
the segmentation probability determining module is used for determining the segmentation probability of each segmentation mode according to the target segmentation probability of the molecular fragments in the molecular fragment library, wherein the molecular fragment library is obtained according to the determination method of the molecular fragment library according to any one of claims 1-3;
the segmentation module is used for segmenting the molecules to be segmented according to the segmentation mode corresponding to the maximum segmentation probability.
7. A computer device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the method of determining a library of molecular fragments according to any one of claims 1-3 or the steps of the method of molecular segmentation according to claim 4.
8. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method for determining a library of molecular fragments according to any one of claims 1-3 or the steps of the method for molecular segmentation according to claim 4.
CN202110519595.1A 2021-05-12 2021-05-12 Determination method of molecular fragment library, molecular segmentation method and device Active CN113223632B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110519595.1A CN113223632B (en) 2021-05-12 2021-05-12 Determination method of molecular fragment library, molecular segmentation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110519595.1A CN113223632B (en) 2021-05-12 2021-05-12 Determination method of molecular fragment library, molecular segmentation method and device

Publications (2)

Publication Number Publication Date
CN113223632A CN113223632A (en) 2021-08-06
CN113223632B CN113223632B (en) 2024-02-13

Family

ID=77095450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110519595.1A Active CN113223632B (en) 2021-05-12 2021-05-12 Determination method of molecular fragment library, molecular segmentation method and device

Country Status (1)

Country Link
CN (1) CN113223632B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0028157D0 (en) * 2000-11-17 2001-01-03 Amedis Pharm Ltd Method for predicting a biological target characteristic of a molecule
WO2008116495A1 (en) * 2007-03-26 2008-10-02 Molcode Ltd Method and apparatus for the design of chemical compounds with predetermined properties
WO2015154137A1 (en) * 2014-04-11 2015-10-15 Newsouth Innovations Pty Limited A method and a system for identifying molecules
CN110706756A (en) * 2019-09-03 2020-01-17 兰州大学 3D drug design method for targeting receptor based on artificial intelligence
CN111816265A (en) * 2020-06-30 2020-10-23 北京晶派科技有限公司 Molecule generation method and computing device
CN112185477A (en) * 2020-09-25 2021-01-05 北京望石智慧科技有限公司 Method and device for extracting molecular characteristics and calculating three-dimensional quantitative structure-activity relationship
CN112201314A (en) * 2020-09-18 2021-01-08 北京望石智慧科技有限公司 Method and device for extracting molecular fingerprints and calculating correlation degree based on molecular fingerprints
CN112329443A (en) * 2020-11-03 2021-02-05 中国平安人寿保险股份有限公司 Method, device, computer equipment and medium for determining new words

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9396304B2 (en) * 2003-07-10 2016-07-19 Wisconsin Alumni Research Foundation Computer systems for annotation of single molecule fragments
US8108153B2 (en) * 2005-12-13 2012-01-31 Palo Alto Research Center Incorporated Method, apparatus, and program product for creating an index into a database of complex molecules

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Development of molecular fragment interaction method for designing organic ferromagnets; Xun Zhu et al.; Journal of Mathematical Chemistry; Vol. 54; 1585-1595 *
Fragment-based drug discovery; Wang Xiaojian et al.; Journal of China Pharmaceutical University; Vol. 40 (No. 4); 289-296 *

Also Published As

Publication number Publication date
CN113223632A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
Modolo et al. UrQt: an efficient software for the Unsupervised Quality trimming of NGS data
CN109542901B (en) Data processing method and device, computer readable storage medium and electronic equipment
US20160140289A1 (en) Variant caller
US20170206458A1 (en) Computer-readable recording medium, detection method, and detection apparatus
US20220172800A1 (en) Computer Method and System of Identifying Genomic Mutations Using Graph-Based Local Assembly
US20190362048A1 (en) Method of rip-up and re-routing a global routing solution
CN113223632B (en) Determination method of molecular fragment library, molecular segmentation method and device
CN114610825A (en) Method and device for confirming associated grid set, electronic equipment and storage medium
Pagh et al. I/O-efficient similarity join
CN110807061A (en) Method for searching frequent subgraphs of uncertain graphs based on layering
US20220067224A1 (en) Parallel processing designing device and parallel processing designing method
CN107844526B (en) Knowledge base-based vocabulary relation chain analysis method, system and device
CN116258114A (en) Circuit simulation method and device based on hypergraph division, server and storage medium
Chehreghani et al. DyBED: An efficient algorithm for updating betweenness centrality in directed dynamic graphs
US20200381082A1 (en) Alignment methods, devices and systems
US11875880B2 (en) Systems and methods for calculating protein confidence values
Chen et al. A survey on de novo assembly methods for single-molecular sequencing
CN115602244B (en) Genome variation detection method based on sequence alignment skeleton
CN112836827A (en) Model training method and device and computer equipment
WO2024029022A1 (en) Device for accelerating branch-and-bound method, method, and program
EP4354444A1 (en) Method and system for identifying candidate genome sequecnces by estimating coverage
CN113609125B (en) IP address matching method, device, equipment and computer readable storage medium
CN118171052A (en) Method, device, equipment and storage medium for supplementing universal characteristic data of engine
CN118274851A (en) Map matching method and device, electronic equipment and storage medium
US8135720B2 (en) Homology searching method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant