WO2021021120A1 - Method of synthesizing chemical compounds - Google Patents
- Publication number
- WO2021021120A1 (PCT/US2019/044068)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- database
- hashed
- entries
- synthons
- customer
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C60/00—Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6209—Protecting access to data via a platform, e.g. using keys or access control rules to a single file or object, e.g. in a secure envelope, encrypted and accessed using a key, or with access control rules appended to the object itself
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/10—Analysis or design of chemical reactions, syntheses or processes
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/90—Programming languages; Computing architectures; Database systems; Data warehousing
Definitions
- This disclosure describes systems and methods for synthesizing pathways to create chemical compounds, also referred to as retrosynthetic analysis.
- Synthia™ software application
- a system, software application and method that allows a customer to protect their proprietary database of compounds and substances while utilizing a retrosynthesis software application is disclosed.
- the customer's proprietary database is encrypted prior to being provided to the retrosynthesis system. This encryption is performed using a hash and optionally a salt.
- the retrosynthesis algorithm then creates synthons as is traditionally done. However, after their creation, the synthons are hashed so that they may be compared to the entries in the customer's proprietary database. In this way, the actual contents of the customer's database are never made available in a molecular format to the retrosynthesis system or software application.
- FIG. 1 shows a representative system for performing the retrosynthesis
- FIG. 2 shows a representative system for a user of the software application described herein
- FIG. 3 shows a sequence to create the hashed proprietary database
- FIG. 4 shows the comparison of entries in the original database to the salted and hashed entries
- FIG. 5 shows a sequence to perform retrosynthesis using the hashed proprietary database
- FIG. 6 shows an enhancement to the sequence of FIG. 5 to utilize a second data store
- FIG. 7 shows a variation of the process shown in FIG. 6.
- the present disclosure represents an advancement in the retrosynthesis of chemical compounds.
- the present disclosure describes a system, method and software application that allow for retrosynthesis analysis that protects the confidentiality of a customer's library.
- the software application may be written in any suitable language and may be executed on any system.
- the software application comprises one or more processing blocks. Each of these processing blocks may be a software module or application that is executed on a computer or other processing unit.
- a representative retrosynthesis system 10 that executes the software application is shown in FIG. 1.
- the processing unit 20 can be implemented in numerous ways, such as with dedicated hardware, or with general purpose hardware, such as personal computers, that is programmed using microcode or software to perform the functions recited herein.
- a local memory device 25 may contain the software application and instructions, which, when executed by the processing unit 20, enable the retrosynthesis system 10 to perform the functions described herein.
- This local memory device 25 may be a non-volatile memory, such as a FLASH ROM, an electrically erasable ROM or other suitable devices. In other embodiments, the local memory device 25 may be a volatile memory, such as a RAM or DRAM.
- the retrosynthesis system 10 also comprises a data store 50.
- the data store 50 may be used to store large amounts of data, such as lists of reaction rules, lists of commercial compounds and their prices per gram. Additionally, the retrosynthesis system 10 may include a user input device 30, such as a keyboard, mouse, touch screen or another suitable device.
- the retrosynthesis system 10 may also include a display device 40, such as a computer screen, LED display, touch screen or the like.
- the data store 50, the user input device 30 and the display device 40 are all in communication with the processing unit 20.
- the retrosynthesis system 10 may also have a network interface 60, in communication with an external network, such as the internet, which allows the processing unit 20 to access information that is stored remotely from the retrosynthesis system 10.
- the data store 50 may store a vast knowledge base of methodologies that describe known reactions. In one embodiment, the data store 50 may include in excess of 70,000 reaction rules. In addition, the retrosynthesis system 10 may have access to diverse collections of starting materials. This information may be stored in the data store 50 or another storage element. Alternatively, this information may be accessible to the processing unit 20 via the network interface 60. In one embodiment, information regarding more than 7 million literature-known substances is available to the processing unit 20. This information may also include pricing per gram for at least some of these substances. Each of these substances may be stored in a text format, as opposed to a graphical format. For example, the substances may be depicted using Simplified Molecular Input Line Entry System (SMILES) strings.
- SMILES Simplified Molecular Input Line Entry System
- SMILES is a notation that describes the structure of chemical species using ASCII strings.
- Other notations include the IUPAC International Chemical Identifier (InChI) and InChIKeys. Commonly, regardless of which notation is used, the same compound may be expressed using different strings. Therefore, in certain embodiments, all strings are rewritten using a canonical representation.
- the processing unit 20 has access to a proprietary database 80, which is encrypted.
- This proprietary database 80 contains the library of compounds and substances that are available to a particular customer of the software application.
- This proprietary database 80 contains a number of canonical representations, each representation corresponding to a specific substance that is available to that particular user.
- Each canonical representation is then encrypted using a hash.
- the hash may be SHA-1, SHA-2, SHA-3, MD5 or another algorithm.
- the particular hash algorithm is not limited by this disclosure.
- each entry in the proprietary database 80 is salted prior to being hashed.
- a "salt" is an arbitrary string that is prepended or appended to each canonical representation. This added input further protects the confidentiality of the proprietary database 80.
- By salting and hashing the proprietary database 80, a third party would be unable to determine the contents of the proprietary database 80. In this way, a customer may utilize the disclosed retrosynthesis system and software application without providing access to their confidential library of compounds and substances.
- a representative customer's system 100 is shown in FIG. 2.
- the processing unit 120 can be implemented in numerous ways, such as with dedicated hardware, or with general purpose hardware, such as personal computers, that is programmed using microcode or software to perform the functions recited herein.
- a local memory device 125 may contain instructions, which, when executed by the processing unit 120, enable the customer's system 100 to perform the functions described herein.
- This local memory device 125 may be a non-volatile memory, such as a FLASH ROM, an electrically erasable ROM or other suitable devices. In other embodiments, the local memory device 125 may be a volatile memory, such as a RAM or DRAM.
- the customer's system 100 may include a user input device 130, such as a keyboard, mouse, touch screen or another suitable device.
- the customer's system 100 may also include a display device 140, such as a computer screen, LED display, touch screen or the like.
- the user input device 130 and the display device 140 are all in communication with the processing unit 120.
- the customer's system 100 may also have a network interface 160, in communication with an external network, such as the internet, which allows the processing unit 120 to provide its proprietary database to the retrosynthesis system 10.
- the customer's system 100 may also have a data store that contains the customer's database 180 of compounds and substances.
- FIG. 3 shows a method that may be used to create the hashed proprietary database 80 that is provided to the retrosynthesis system 10. This method may be executed by providing an executable file to the customer's system 100.
- the executable file contains instructions, which when executed by the processing unit 120 on the customer's system 100, performs the functions described herein. In other words, the processes shown in FIG. 3 may be performed on the customer's system 100. In this way, the unencrypted customer's database 180 is never made available to the retrosynthesis system 10 or the software application.
- This executable file may be created by the retrosynthesis system 10 and transmitted to the customer's system 100, such as via download across the internet.
- In Process 300, each entry in the customer's database 180 is reviewed to ensure that it is in canonical form. This is necessary, as the comparison of hashed synthons can only be successful if each compound is denoted using only a single representation.
- a salt may be added to each entry, as shown in Process 310.
- This salt may be unique to the particular customer and may be kept confidential.
- the executable file contains the unique salt, which is not made visible to the customer.
- the executable file and the software application use the same salt for a particular customer. In other embodiments, a salt may not be used. In these embodiments, Process 310 may be omitted.
- each entry, which is a canonical representation with a salt, is hashed.
- the particular hashing algorithm is not limited by this disclosure.
- Each of these salted and hashed entries is then compiled into the proprietary database 80. Once this is complete, the proprietary database 80 may be made available to the software application, as the unencrypted contents are no longer accessible to the retrosynthesis system 10 or the software application.
- FIG. 4 shows several representative canonical SMILES strings and the resultant hashed values. Note that there is no way to recreate the canonical SMILES strings from the hashed values. Further, note that the hashed values do not provide any insight as to the original SMILES strings.
- the salted and hashed canonical representations may be sorted, such as in alphabetical order, as shown in Process 330. This enhancement may reduce the time required for the software application to search the proprietary database 80 for a match.
- the proprietary database 80 is made available to the processing unit 20 of the retrosynthesis system 10. This may be achieved by uploading the proprietary database 80 to the retrosynthesis system 10, or by allowing the retrosynthesis system 10 to access the proprietary database 80 remotely.
- a retron is a minimal molecular substructure that enables certain transformations.
- a synthon is a fragment of a compound that assists in the formation of a synthesis, derived from that target molecule.
- each synthon must be in canonical form.
- Each synthon is then salted (if this was performed to the proprietary database 80) using the same salt that was used in Process 310 in FIG. 3.
- the salted canonical representation of each synthon is hashed using the same hash algorithm that was used in Process 320 of FIG. 3.
- the hashed synthons will only match to the exact same compound in the customer's database.
- the hashed synthons are compared to the entries in the proprietary database 80. If a match is found, the synthon is recorded and/or displayed, as shown in Process 450.
- This path of the retrosynthesis is now complete, and this synthon does not need to undergo further analysis.
- the sequence continues by checking to see if there are other synthons that have not been matched to the proprietary database 80, as shown in Process 460. If there are other synthons that have not been identified yet, the retrosynthesis process continues. For example, the remaining synthons are now treated as the target molecules, as shown in Process 480. The retrosynthesis process then continues using these remaining synthons as the targets.
- the resulting synthesis paths are then provided to the customer, such as by displaying a result on the display device 140 associated with the customer's system 100. Alternatively, the resulting synthesis paths may be provided to the customer via a text file, email, or other means.
- FIG. 6 shows an enhancement to the sequence of FIG. 5.
- the retrosynthesis system 10 may include both the proprietary database 80 and a data store 50 of commercially available compounds and substances. While the proprietary database 80 is salted and hashed, the data store 50 may not be.
- the sequence includes Processes 421 and 422, which follow Process 420. As shown in Process 421, the synthons may be compared to the entries in the data store 50. These comparisons are made before the synthons are hashed. If any of the synthons match an entry in the data store 50, that synthon is displayed and/or recorded, as shown in Process 422. The sequence then proceeds to Process 460.
- the sequence continues with Process 430.
- the data store 50 is also salted and hashed using the same parameters as the proprietary database 80.
- the synthons are then compared to the entries in the hashed public database, as shown in Process 431. If a match is found, the synthon is displayed and/or recorded, as shown in Process 432. The sequence would then continue with Process 460. If a match is not found, the sequence would then proceed to Process 440, where the hashed synthon is compared to entries in the proprietary database 80.
- the executable file provides the unique salt to the customer
- the executable file may allow the customer to enter a password which will serve as the salt. This password is then used by the executable file to salt the customer's database 180 to create the proprietary database 80, as shown in Process 310. Further, this password is transmitted to the retrosynthesis system 10 so that the same password is used to perform the salt process shown in Process 430.
- the present disclosure describes a system, method and software application that allows the user to utilize a proprietary database without allowing the software application to access the contents of that proprietary database. This may reduce a customer's uneasiness about providing their confidential information to another party, while still allowing them to make use of this software application.
- the use of a unique salt for each customer also increases the security of the customer's database. Specifically, if a salt is not used, it may be possible to compare the proprietary databases of multiple customers to determine commonality. However, the use of a salt means that the same compound, in two different customers' proprietary databases, will not have the same final hash, thus making it impossible to make comparisons between databases.
- the present disclosure is not to be limited in scope by the specific embodiments described herein.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioethics (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Chemical & Material Sciences (AREA)
- Crystallography & Structural Chemistry (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Stored Programmes (AREA)
- Storage Device Security (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A system, software application and method that allows a customer to protect their proprietary database of compounds and substances while utilizing a retrosynthesis software application is disclosed. The customer's proprietary database is encrypted prior to being provided to the retrosynthesis system. This encryption is performed using a hash and optionally a salt. The retrosynthesis algorithm then creates synthons as is traditionally done. However, after their creation, the synthons are hashed so that they may be compared to the entries in the customer's proprietary database. In this way, the actual contents of the customer's database are never made available to the retrosynthesis system or software application.
Description
Method of synthesizing chemical compounds
This disclosure describes systems and methods for synthesizing pathways to create chemical compounds, also referred to as retrosynthetic analysis.
Background
Programming a computer to plan multistep chemical syntheses leading to nontrivial targets has been an elusive goal for over five decades. Specifically, one software application, referred to as Synthia™, designed, with minimal human supervision, complete pathways leading to structurally diverse and medicinally relevant targets. These theoretical pathways were subsequently executed in the laboratory, offering substantial improvements over previous approaches or providing the first documented routes to a given target.
Knowing that retrosynthesis is achievable, one can consider expanding the scope of automated retrosynthetic design modalities. One of the interesting possibilities is to allow customers to supply their proprietary database of compounds and to terminate the retrosynthesis when commonly available compounds or compounds from that proprietary database are reached.
However, customers may be reluctant to share their proprietary database with another entity, such as the owner of this software application. Therefore, it would be beneficial if there were a system and method for the customer to utilize their proprietary database without allowing other entities to access that database.
Further, it would be advantageous if the software application could operate using databases from multiple customers without having access to the unencrypted data in any of those databases and without being able to identify overlap among them.
Summary
A system, software application and method that allows a customer to protect their proprietary database of compounds and substances while utilizing a retrosynthesis software application is disclosed. The customer's proprietary database is encrypted prior to being provided to the retrosynthesis system. This encryption is performed using a hash and optionally a salt. The retrosynthesis algorithm then creates synthons as is traditionally done. However, after their creation, the synthons are hashed so that they may be compared to the entries in the customer's proprietary database. In this way, the actual contents of the customer's database are never made available in a molecular format to the retrosynthesis system or software application.
Brief Description of the Drawings
For a better understanding of the present disclosure, reference is made to the accompanying drawings, in which like elements are referenced with like numerals, and in which:
FIG. 1 shows a representative system for performing the retrosynthesis;
FIG. 2 shows a representative system for a user of the software application described herein;
FIG. 3 shows a sequence to create the hashed proprietary database;
FIG. 4 shows the comparison of entries in the original database to the salted and hashed entries;
FIG. 5 shows a sequence to perform retrosynthesis using the hashed proprietary database;
FIG. 6 shows an enhancement to the sequence of FIG. 5 to utilize a second data store; and
FIG. 7 shows a variation of the process shown in FIG. 6.
Detailed Description
The present disclosure represents an advancement in the retrosynthesis of chemical compounds. The present disclosure describes a system, method and software application that allow for retrosynthesis analysis that protects the confidentiality of a customer's library. The software application may be written in any suitable language and may be executed on any system. The software application comprises one or more processing blocks. Each of these processing blocks may be a software module or application that is executed on a computer or other processing unit.
A representative retrosynthesis system 10 that executes the software application is shown in FIG. 1. The processing unit 20 can be implemented in numerous ways, such as with dedicated hardware, or with general-purpose hardware, such as personal computers, that is programmed using microcode or software to perform the functions recited herein. A local memory device 25 may contain the software application and instructions, which, when executed by the processing unit 20, enable the retrosynthesis system 10 to perform the functions described herein. This local memory device 25 may be a non-volatile memory, such as a FLASH ROM, an electrically erasable ROM or other suitable devices. In other embodiments, the local memory device 25 may be a volatile memory, such as a RAM or DRAM.
The retrosynthesis system 10 also comprises a data store 50. The data store 50 may be used to store large amounts of data, such as lists of reaction rules, lists of commercial compounds and their prices per gram. Additionally, the retrosynthesis system 10 may include a user input device 30, such as a keyboard, mouse, touch screen or another suitable device. The retrosynthesis system 10 may also include a display device 40, such as a computer screen, LED display, touch screen or the like. The data store 50, the user input device 30 and the display device 40 are all in communication with the processing unit 20. In some embodiments, the retrosynthesis system 10 may also have a network interface 60, in communication with an external network, such as the internet, which allows the processing unit 20 to access information that is stored remotely from the retrosynthesis system 10.
The data store 50 may store a vast knowledge base of methodologies that describe known reactions. In one embodiment, the data store 50 may include in excess of 70,000 reaction rules. In addition, the retrosynthesis system 10 may have access to diverse collections of starting materials. This information may be stored in the data store 50 or another storage element. Alternatively, this information may be accessible to the processing unit 20 via the network interface 60. In one embodiment, information regarding more than 7 million literature-known substances is available to the processing unit 20. This information may also include pricing per gram for at least some of these substances. Each of these substances may be stored in a text format, as opposed to a graphical format. For example, the substances may be depicted using Simplified Molecular Input Line Entry System (SMILES) strings. SMILES is a notation that describes the structure of chemical species using ASCII strings. Other notations include the IUPAC International Chemical Identifier (InChI) and InChIKeys. Commonly, regardless of which notation is used, the same compound may be expressed using different strings. Therefore, in certain embodiments, all strings are rewritten using a canonical representation.
Additionally, the processing unit 20 has access to a proprietary database 80, which is encrypted. This proprietary database 80 contains the library of compounds and substances that are available to a particular customer of the software application. This proprietary database 80 contains a number of canonical representations, each representation corresponding to a specific substance that is available to that particular user. Each canonical representation is then encrypted using a hash. The hash may be SHA-1, SHA-2, SHA-3, MD5 or another algorithm. The particular hash algorithm is not limited by this disclosure. In certain embodiments, each entry in the proprietary database 80 is salted prior to being hashed. A "salt" is an arbitrary string that is prepended or appended to each canonical representation. This added input further protects the confidentiality of the proprietary database 80.
By salting and hashing the proprietary database 80, a third party would be unable to determine the contents of the proprietary database 80. In this way, a customer may utilize the disclosed retrosynthesis system and software application without providing access to their confidential library of compounds and substances.
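The salting and hashing step described above can be illustrated with a minimal sketch. It assumes SHA-256 (one member of the SHA-2 family named in the text), a salt that is prepended rather than appended, and UTF-8 encoding; the function name `salt_and_hash` and the salt value are hypothetical and do not come from the disclosure.

```python
import hashlib


def salt_and_hash(canonical_smiles: str, salt: str = "") -> str:
    """Return the hex SHA-256 digest of salt + canonical SMILES.

    Assumption: the salt is prepended; the text allows prepending or appending.
    An empty salt corresponds to the embodiment in which no salt is used.
    """
    salted = salt + canonical_smiles
    return hashlib.sha256(salted.encode("utf-8")).hexdigest()


ethanol = "CCO"  # SMILES string for ethanol
unsalted = salt_and_hash(ethanol)
salted = salt_and_hash(ethanol, salt="customer-A-secret")  # hypothetical salt

# The same compound yields different digests once a salt is involved.
print(unsalted != salted)
```

Only the digests would be shared with the retrosynthesis system in this sketch; the SMILES string itself stays on the customer's side.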
A representative customer's system 100 is shown in FIG. 2. The processing unit 120 can be implemented in numerous ways, such as with dedicated hardware, or with general purpose hardware, such as personal computers, that is programmed using microcode or software to perform the functions recited herein. A local memory device 125 may contain instructions, which, when executed by the processing unit 120, enable the customer's system 100 to perform the functions described herein. This local memory device 125 may be a non-volatile memory, such as a FLASH ROM, an electrically erasable ROM or other suitable devices. In other embodiments, the local memory device 125 may be a volatile memory, such as a RAM or DRAM. The customer's system 100 may include a user input device 130, such as a keyboard, mouse, touch screen or another suitable device. The customer's system 100 may also include a display device 140, such as a computer screen, LED display, touch screen or the like. The user input device 130 and the display device 140 are all in communication with the processing unit 120. In some embodiments, the customer's system 100 may also have a network interface 160, in communication with an external network, such as the internet, which allows the processing unit 120 to provide its proprietary database to the retrosynthesis system 10.
The customer's system 100 may also have a data store that contains the customer's database 180 of compounds and substances.
FIG. 3 shows a method that may be used to create the hashed proprietary database 80 that is provided to the retrosynthesis system 10. This method may be executed by providing an executable file to the customer's system 100. The executable file contains instructions, which when executed by the processing unit 120 on the customer's system 100, performs the functions described herein. In other words, the processes shown in FIG. 3 may be performed on the customer's system 100. In this way, the unencrypted customer's database 180 is never made available to the retrosynthesis system 10 or the software application. This executable file may be created by the retrosynthesis system 10 and transmitted to the customer's system 100, such as via download across the internet.
Each of the following processes is performed by the executable file. First, as shown in Process 300, each entry in the customer's database 180 is reviewed to ensure that it is in canonical form. This is necessary, as the comparison of hashed synthons can only be successful if each compound is denoted using only a single representation. After each entry has been reviewed and converted to canonical form, a salt may be added to each entry, as shown in Process 310. This salt may be unique to the particular customer and may be kept confidential. For example, in one embodiment, the executable file contains the unique salt, which is not made visible to the customer. The executable file and the software application use the same salt for a particular customer. In other embodiments, a salt may not be used. In these embodiments, Process 310 may be omitted. Next, as shown in Process 320, each entry, which is a canonical representation with a salt, is hashed. As noted above, the particular hashing algorithm is not limited by this disclosure. Each of these salted and hashed entries is then compiled into the proprietary database 80. Once this is complete, the proprietary database 80 may be made available to the software application, as the unencrypted contents are no longer accessible to the retrosynthesis system 10 or the software application.
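As an illustration only, Processes 300 through 320 might be sketched as below. The disclosure does not specify how canonicalization is done, so `toy_canonicalize` is a hypothetical stand-in backed by a tiny lookup table; a real deployment would presumably call a cheminformatics toolkit instead. SHA-256 and a prepended salt are likewise assumptions.

```python
import hashlib
from typing import Callable, Iterable, List

# Hypothetical stand-in for a real canonicalizer: maps a few alternative
# spellings of ethanol to one chosen form and leaves other strings unchanged.
_TOY_CANONICAL = {"OCC": "CCO", "C(O)C": "CCO"}


def toy_canonicalize(smiles: str) -> str:
    return _TOY_CANONICAL.get(smiles, smiles)


def build_hashed_database(
    entries: Iterable[str],
    salt: str = "",
    canonicalize: Callable[[str], str] = toy_canonicalize,
) -> List[str]:
    """Canonicalize, salt and hash each entry, returning only the digests."""
    hashed = []
    for entry in entries:
        canonical = canonicalize(entry)       # Process 300: canonical form
        salted = salt + canonical             # Process 310: empty salt = omitted
        digest = hashlib.sha256(salted.encode("utf-8")).hexdigest()  # Process 320
        hashed.append(digest)
    return hashed


customer_db = ["OCC", "CCO", "c1ccccc1"]  # two spellings of ethanol, plus benzene
proprietary_db = build_hashed_database(customer_db, salt="customer-A-secret")
```

In this sketch the two spellings of ethanol collapse to the same digest, which is the reason the text requires canonical form before hashing.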
FIG. 4 shows several representative canonical SMILES strings and the resultant hashed values. Note that there is no way to recreate the canonical SMILES strings from the hashed values. Further, note that the hashed values do not provide any insight as to the original SMILES strings.
As an operational enhancement to FIG. 3, in certain embodiments, the salted and hashed canonical representations may be sorted, such as in alphabetical order, as shown in Process 330. This enhancement may reduce the time required for the software application to search the proprietary database 80 for a match.
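One way the sorting enhancement could pay off is through binary search, sketched below with Python's standard `bisect` module. The helper name `contains` and the use of unsalted SHA-256 hex digests are assumptions made for brevity, not details from the disclosure.

```python
import bisect
import hashlib
from typing import List


def digest(smiles: str) -> str:
    # Unsalted SHA-256 hex digest, used here only to produce sample entries.
    return hashlib.sha256(smiles.encode("utf-8")).hexdigest()


def contains(sorted_hashes: List[str], candidate: str) -> bool:
    """Binary search for candidate in a lexicographically sorted list of digests."""
    i = bisect.bisect_left(sorted_hashes, candidate)
    return i < len(sorted_hashes) and sorted_hashes[i] == candidate


# Process 330: sort the hashed entries once, when the database is created.
sorted_db = sorted(digest(s) for s in ["CCO", "c1ccccc1", "CC(=O)O"])

found = contains(sorted_db, digest("CCO"))    # compound is in the database
missing = contains(sorted_db, digest("CCN"))  # compound is not in the database
```

Each lookup then takes a number of comparisons that grows logarithmically with the database size rather than linearly. A hash set would be an alternative design, though the text itself describes sorting.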
Having described how the proprietary database 80 is created, the sequence used to perform the retrosynthesis for the customer will now be described. A representative flowchart of this sequence is shown in FIG. 5. First, as shown in Process 400, the proprietary database 80 is made available to the processing unit 20 of the retrosynthesis system 10. This may be achieved by uploading the proprietary database 80 to the retrosynthesis system 10, or by allowing the retrosynthesis system 10 to access the proprietary database 80 remotely.
Next, as shown in Process 410, the retrosynthetic search commences. Specifically, the matching reaction templates are applied, and the first generation of synthon sets is created. For the initial search, the retron is set to the target compound. For each candidate retron-to-synthon(s) transformation, r → s1, s2, ..., sN (where r = t in the first generation), the synthons are identified, as shown in Process 420. As is well known to those skilled in the art, a retron is a minimal molecular substructure that enables certain transformations. Also, as is well known, a synthon is a fragment of a compound that assists in the formation of a synthesis, derived from that target molecule.
Since the proprietary database 80 is hashed, in order to determine whether any of the synthons are in the proprietary database 80, it is necessary to perform the same operations on the synthons that were previously performed on the customer's database 180, as shown in Process 430. In other words, each synthon must be in canonical form. Each synthon is then salted (if this was performed on the proprietary database 80) using the same salt that was used in Process 310 in FIG. 3. Finally, the salted canonical representation of each synthon is hashed using the same hash algorithm that was used in Process 320 of FIG. 3.
Since exactly the same transformation is performed on the synthons as was performed on the customer's database 180, a hashed synthon will match only the identical compound in the customer's database. Thus, as shown in Process 440, the hashed synthons are compared to the entries in the proprietary database 80. If a match is found, the synthon is recorded and/or displayed, as shown in Process 450. This path of the retrosynthesis is now complete, and this synthon does not need to undergo further analysis. The sequence continues by checking to see if there are other synthons that have not been matched to the proprietary database 80, as shown in Process 460. If there are other synthons that have not been identified yet, the retrosynthesis process continues. For example, the remaining synthons now are treated as the target molecule, as shown in Process 480. The retrosynthesis process then continues using these remaining synthons as the targets.
This process continues until all of the synthons have been found in the proprietary database 80, as shown in Process 470. The resulting synthesis paths are then provided to the customer, such as by displaying a result on the display device 140 associated with the customer's system 100. Alternatively, the resulting synthesis paths may be provided to the customer via a text file, email, or other means.
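The sequence of FIG. 5 can be sketched as a loop over generations of synthons. The following Python is a toy illustration under several assumptions: `expand` is a hypothetical callable standing in for the application of reaction templates, it returns a single candidate synthon set per retron (a real search would explore many alternatives), the synthons are assumed to already be in canonical form, SHA-256 with a prepended salt is assumed, and `max_generations` is an added safeguard that does not appear in the disclosure.

```python
import hashlib
from typing import Callable, Dict, List, Set


def salted_hash(canonical: str, salt: str) -> str:
    """Same assumed transformation as used to build the proprietary database."""
    return hashlib.sha256((salt + canonical).encode("utf-8")).hexdigest()


def retrosynthesis(
    target: str,
    expand: Callable[[str], List[str]],
    proprietary_hashes: Set[str],
    salt: str,
    max_generations: int = 10,
) -> Dict[str, List[str]]:
    """Return the matched synthons and any compounds left unresolved."""
    matched: List[str] = []
    dead_ends: List[str] = []
    retrons = [target]  # r = t in the first generation
    for _ in range(max_generations):
        if not retrons:
            break
        unmatched: List[str] = []
        for retron in retrons:
            synthons = expand(retron)  # Processes 410-420
            if not synthons:
                dead_ends.append(retron)  # no transformation applies
                continue
            for synthon in synthons:
                digest = salted_hash(synthon, salt)  # Process 430
                if digest in proprietary_hashes:  # Process 440
                    matched.append(synthon)  # Process 450
                else:
                    unmatched.append(synthon)  # Process 460
        retrons = unmatched  # Process 480: remaining synthons become targets
    return {"matched": matched, "unresolved": dead_ends + retrons}


# Toy data: single letters stand in for canonical compound strings.
TOY_TEMPLATES = {"T": ["A", "B"], "B": ["C", "D"]}
salt = "customer-salt"
db = {salted_hash(s, salt) for s in ("A", "C", "D")}
result = retrosynthesis("T", lambda r: TOY_TEMPLATES.get(r, []), db, salt)
```

In this toy run, A is matched in the first generation, B is expanded in the second, and C and D are then matched, so nothing remains unresolved. The search never sees the plaintext contents of the proprietary database, only digests.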
FIG. 6 shows an enhancement to the sequence of FIG. 5. Specifically, in certain embodiments, the retrosynthesis system 10 may include both the proprietary database 80 and a data store 50 of commercially available compounds and substances. While the proprietary database 80 is salted and hashed, the data store 50 may not be. Thus, in certain embodiments, the sequence includes Processes 421 and 422, which follow Process 420. As shown in Process 421, the synthons may be compared to the entries in the data store 50. These comparisons are made before the synthons are hashed. If any of the synthons match an entry in the data store 50, that synthon is displayed and/or recorded, as shown in Process 422. The sequence then proceeds to Process 460. If the synthon does not match any of the entries in the data store 50, the sequence continues with Process 430. In this way, both commercially available substances and proprietary substances may be included in the search algorithm.
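The ordering of checks in FIG. 6 can be sketched as follows. This is a hedged illustration: `locate_synthon` and its return labels are hypothetical, the data store 50 is modelled as a plain set of canonical strings, and SHA-256 with a prepended salt is an assumption.

```python
import hashlib
from typing import Optional, Set


def salted_hash(canonical: str, salt: str) -> str:
    """Assumed transformation: SHA-256 over the salt followed by the entry."""
    return hashlib.sha256((salt + canonical).encode("utf-8")).hexdigest()


def locate_synthon(
    synthon: str,
    data_store: Set[str],
    proprietary_hashes: Set[str],
    salt: str,
) -> Optional[str]:
    """Check the unhashed data store first, then the hashed proprietary database."""
    if synthon in data_store:  # Process 421: plaintext comparison
        return "data_store"  # Process 422
    if salted_hash(synthon, salt) in proprietary_hashes:  # Processes 430 and 440
        return "proprietary"  # Process 450
    return None  # not found; the synthon would be expanded further


salt = "customer-salt"
data_store = {"CCO"}
proprietary_hashes = {salted_hash("c1ccccc1", salt)}
```

Checking the data store before hashing avoids the hashing step for any synthon that is commercially available.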
In a variation of FIG. 6, shown in FIG. 7, the data store 50 is also salted and hashed using the same parameters as the proprietary database 80. Thus, after the synthons have been salted and hashed, as shown in Process 430, the salted hashed synthons are then compared to the entries in the hashed public database, as shown in Process 431. If a match is found, the synthon is displayed and/or recorded, as shown in Process 432. The sequence would then continue with Process 460. If a match is not found, the sequence would then proceed to Process 440, where the hashed synthon is compared to entries in the proprietary database 80.
While the above description discloses that the executable file contains the unique salt for the customer, other embodiments are also possible. For example, in another embodiment, the executable file may allow the customer to enter a password which will serve as the salt. This password is then used by the executable file to salt the customer's database 180 to create the proprietary database 80, as shown in Process 310. Further, this password is transmitted to the retrosynthesis system 10 so that the same password is used to perform the salting shown in Process 430.
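The password-as-salt embodiment can be illustrated with a short sketch. The passwords and the compound string below are made-up examples, and SHA-256 with a prepended salt remains an assumption. The point shown is only that both sides must use the same password for digests to agree.

```python
import hashlib


def salted_hash(canonical: str, password: str) -> str:
    """Use the customer's password directly as the salt (prepended by assumption)."""
    return hashlib.sha256((password + canonical).encode("utf-8")).hexdigest()


# Customer side (Processes 310 and 320) and retrosynthesis side (Process 430)
# produce the same digest only when they share the same password.
customer_digest = salted_hash("CCO", "example-password")
system_digest = salted_hash("CCO", "example-password")
other_digest = salted_hash("CCO", "different-password")
```

Here `customer_digest` equals `system_digest`, while `other_digest` differs even though the compound is the same.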
Thus, the present disclosure describes a system, method and software application that allows the user to utilize a proprietary database without allowing the software application to access the contents of that proprietary database. This may reduce a customer's unease about providing confidential information to another party, while still allowing the customer to make use of this software application.
Further, the use of a unique salt for each customer also increases the security of the customer's database. Specifically, if a salt is not used, it may be possible to compare the proprietary databases of multiple customers to determine commonality. However, the use of a salt implies that the same compound, in two different customers' proprietary databases, will not have the same final hash, thus making it impossible to make comparisons between databases.
The present disclosure is not to be limited in scope by the specific embodiments described herein. Indeed, other various embodiments of and modifications to the present disclosure, in addition to those described herein, will be apparent to those of ordinary skill in the art from the foregoing description and accompanying drawings. Thus, such other embodiments and modifications are intended to fall within the scope of the present disclosure. Further, although the present disclosure has been described herein in the context of a particular implementation in a particular environment for a particular purpose, those of ordinary skill in the art will recognize that its usefulness is not limited thereto and that the present disclosure may be beneficially implemented in any number of environments for any number of purposes. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the present disclosure as described herein.
Claims
1. A method of performing retrosynthesis on a target compound, comprising:
setting the target compound to a retron;
performing a first retrosynthesis search on the retron to find a set of synthons;
hashing each synthon of the set of synthons;
comparing each hashed synthon to entries in a proprietary database, where the entries in the proprietary database are hashed using same hash algorithm;
if the comparison succeeds, recording and/or displaying the synthon; and
if the comparison fails, setting the set of synthons to the retron and repeating the performing, hashing, and comparing steps.
2. The method of claim 1, wherein the set of synthons are salted prior to being hashed and the entries in the proprietary database are salted using the same salt.
3. The method of claim 2, wherein the salt is unique for each customer.
4. The method of claim 1, wherein the set of synthons are also compared to entries in a public database.
5. The method of claim 4, wherein the entries in the public database are not hashed.
6. The method of claim 4, wherein the entries in the public database are hashed using the same hash algorithm as the proprietary database.
7. The method of claim 1, wherein an executable file is used to create the proprietary database.
8. The method of claim 7, wherein the executable file comprises instructions, which when executed by a processing unit, allow the processing unit to:
convert all entries in a customer's database to canonical form;
hash each entry after conversion to canonical form; and
store the hashed entries in the proprietary database.
9. The method of claim 8, wherein the executable file further includes a unique salt and comprises instructions, which when executed by the processing unit, allow the processing unit to:
salt each entry in canonical format prior to the hash.
10. The method of claim 8, wherein the executable file further comprises instructions, which when executed by the processing unit, allow the processing unit to:
request a password from a customer;
utilize the password as a salt; and
salt each entry in canonical format prior to the hash.
11. The method of claim 8, wherein each entry in the customer's database is in SMILES notation.
12. The method of claim 8, wherein each entry in the customer's database is in InChI notation.
13. The method of claim 8, wherein each entry in the customer's database is in InChIKey notation.
14. A software program, disposed on a non-transitory storage media, the software program comprising instructions, which when executed by a processing unit perform retrosynthesis on a target compound, by:
setting the target compound to a retron;
performing a first retrosynthesis search on the retron to find a set of synthons;
hashing each synthon of the set of synthons;
comparing each hashed synthon to entries in a proprietary database, where the entries in the proprietary database are hashed using same hash algorithm;
if the comparison succeeds, recording and/or displaying the synthon; and
if the comparison fails, setting the set of synthons to the retron and repeating the performing, hashing, and comparing steps.
15. The software program of claim 14, wherein the software program further comprises instructions to:
salt the set of synthons prior to being hashed;
wherein the entries in the proprietary database are salted using the same salt.
16. The software program of claim 14, wherein the software program further comprises instructions to:
compare the set of synthons to entries in a public database.
17. The software program of claim 16, wherein the public database is not hashed.
18. The software program of claim 16, wherein the entries in the public database are hashed using the same hash algorithm as the proprietary database.
Priority Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2022506295A JP7231786B2 (en) | 2019-07-30 | 2019-07-30 | Methods of synthesizing chemical compounds |
US16/772,293 US11410752B2 (en) | 2019-07-30 | 2019-07-30 | Method of synthesizing chemical compounds |
ES19939568T ES2973350T3 (en) | 2019-07-30 | 2019-07-30 | Synthesis procedure of chemical compounds |
PL19939568.2T PL4003165T3 (en) | 2019-07-30 | 2019-07-30 | Method of synthesizing chemical compounds |
EP19939568.2A EP4003165B1 (en) | 2019-07-30 | 2019-07-30 | Method of synthesizing chemical compounds |
PCT/US2019/044068 WO2021021120A1 (en) | 2019-07-30 | 2019-07-30 | Method of synthesizing chemical compounds |
SG11202113371VA SG11202113371VA (en) | 2019-07-30 | 2019-07-30 | Method of synthesizing chemical compounds |
CN201980098923.6A CN114144110B (en) | 2019-07-30 | 2019-07-30 | Method for synthesizing compound |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2019/044068 WO2021021120A1 (en) | 2019-07-30 | 2019-07-30 | Method of synthesizing chemical compounds |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021021120A1 (en) | 2021-02-04 |
Family
ID=74229770
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2019/044068 WO2021021120A1 (en) | 2019-07-30 | 2019-07-30 | Method of synthesizing chemical compounds |
Country Status (8)
Country | Link |
---|---|
US (1) | US11410752B2 (en) |
EP (1) | EP4003165B1 (en) |
JP (1) | JP7231786B2 (en) |
CN (1) | CN114144110B (en) |
ES (1) | ES2973350T3 (en) |
PL (1) | PL4003165T3 (en) |
SG (1) | SG11202113371VA (en) |
WO (1) | WO2021021120A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023153148A1 (en) * | 2022-02-09 | 2023-08-17 | イーセップ株式会社 | Membrane reactor development assistance system and development assistance device |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024044168A2 (en) * | 2022-08-22 | 2024-02-29 | Luka Shai Therapeutics, Llc | Compositions and methods for treating a v-atpase malfunction |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050033524A1 (en) * | 1998-04-13 | 2005-02-10 | Cowsert Lex M. | Identification of genetic targets for modulation by oligonucleotides and generation of oligonucleotides for gene modulation |
US20150147275A1 (en) * | 2012-04-27 | 2015-05-28 | University Of Bristol | Anthracenyl-tetralactam macrocycles and their use in detecting a target saccharide |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU751956B2 (en) | 1997-03-20 | 2002-09-05 | University Of Washington | Solvent for biopolymer synthesis, solvent microdroplets and methods of use |
JP5032120B2 (en) * | 2003-10-14 | 2012-09-26 | バーセオン | Method and apparatus for classifying molecules |
CN101789047B (en) * | 2010-02-05 | 2011-10-26 | 四川大学 | Method for evaluating synthesization of organic small-molecule compounds based on reverse synthesis |
JP5975490B2 (en) * | 2011-09-14 | 2016-08-23 | 国立研究開発法人産業技術総合研究所 | Search system, search method, and program |
US20130226549A1 (en) * | 2012-02-27 | 2013-08-29 | Yufeng J. Tseng | Structure-based fragment hopping for lead optimization and improvement in synthetic accessibility |
US8965066B1 (en) * | 2013-09-16 | 2015-02-24 | Eye Verify LLC | Biometric template security and key generation |
US10607726B2 (en) * | 2013-11-27 | 2020-03-31 | Accenture Global Services Limited | System for anonymizing and aggregating protected health information |
US9824236B2 (en) * | 2015-05-19 | 2017-11-21 | Accenture Global Services Limited | System for anonymizing and aggregating protected information |
EP3264314B1 (en) * | 2016-06-30 | 2021-02-17 | Huawei Technologies Co., Ltd. | System and method for searching over encrypted data |
US10679733B2 (en) * | 2016-10-06 | 2020-06-09 | International Business Machines Corporation | Efficient retrosynthesis analysis |
WO2019010101A1 (en) * | 2017-07-01 | 2019-01-10 | Shape Security, Inc. | Secure detection and management of compromised credentials |
CN107592298B (en) * | 2017-08-11 | 2020-07-14 | 中国科学院大学 | Sequence comparison algorithm secure outsourcing method based on single server model, user terminal and server |
US10622098B2 (en) * | 2017-09-12 | 2020-04-14 | Massachusetts Institute Of Technology | Systems and methods for predicting chemical reactions |
US11750390B2 (en) * | 2019-01-31 | 2023-09-05 | Global Bionic Optics Limited | System and method for producing a unique stable biometric code for a biometric hash |
US11604767B2 (en) * | 2019-04-05 | 2023-03-14 | Comcast Cable Communications, Llc | Systems and methods for data distillation |
Also Published As
Publication number | Publication date |
---|---|
US11410752B2 (en) | 2022-08-09 |
JP7231786B2 (en) | 2023-03-01 |
CN114144110A (en) | 2022-03-04 |
JP2022537076A (en) | 2022-08-23 |
EP4003165A1 (en) | 2022-06-01 |
ES2973350T3 (en) | 2024-06-19 |
US20220148686A1 (en) | 2022-05-12 |
PL4003165T3 (en) | 2024-06-10 |
SG11202113371VA (en) | 2021-12-30 |
EP4003165B1 (en) | 2024-02-14 |
CN114144110B (en) | 2023-02-03 |
EP4003165A4 (en) | 2023-04-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19939568; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2022506295; Country of ref document: JP; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| ENP | Entry into the national phase | Ref document number: 2019939568; Country of ref document: EP; Effective date: 20220228 |