WO2001039041A1 - Id symbol unique to structural formula of compound - Google Patents
- Publication number
- WO2001039041A1 (PCT/JP2000/008078)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- structural formula
- chemical structural
- symbol
- compound
- character string
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/80—Data visualisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/40—Searching chemical structures or physicochemical data
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/90—Programming languages; Computing architectures; Database systems; Data warehousing
Definitions
- the present invention relates to a method of generating a fixed-length or variable-length character string that is substantially unique to a chemical structural formula of a compound as an ID symbol attached to the compound or information related to the compound.
- a compound is uniquely identified by a chemical structural formula indicating the types of atoms constituting the compound and the bonding state between the atoms.
- compound nomenclature has been studied for a long time for that purpose.
- the IUPAC method and the chemical abstract method are well known.
- however, neither of these nomenclatures is always used in practice.
- common names arbitrarily named by the discoverer of a new compound are often used for natural compounds and the like. Strict application of naming conventions requires a high degree of skill, but ordinary organic chemists using nomenclature are not familiar with naming conventions.
- Chemical Abstracts, a database of compounds published in academic papers and patent applications, is well known as a database based on nomenclature.
- ACD, a database of commercially available compounds, is well known as a compound database based on chemical structural formulas.
- each compound record of the compound has an identifier (ID symbol) consisting of 6 to 10 alphanumeric characters.
- ID symbol identifier
- Searching for a compound in an existing database, or finding out whether the same compound is contained in databases from different sources, requires extensive trial and error. For such tasks it is convenient to have an ID symbol that can be compared instead of the compound structural formula. To do this, it is necessary to develop a method for assigning a fixed ID symbol that is unique to the structural formula of every compound.
- Disclosure of the invention
- An object of the present invention is to provide a method for attaching an ID symbol consisting of a substantially unique fixed-length or variable-length character string to a chemical structural formula of a compound so that the same chemical structural formula can be used anytime and anywhere.
- Another object is to provide an index search method that can directly use a chemical structural formula as a query.
- the inventors of the present invention have made intensive efforts to solve the above-mentioned problems and, as a result, have found that a substantially unique ID symbol can be generated by a process that converts the structural formula of a compound into a unique character string or group of character strings.
- the present invention provides a method of generating a fixed-length or variable-length character string that is substantially unique to a chemical structural formula, based on the types of atoms constituting the chemical structural formula of the compound and the bonding relationships between the atoms, and of using this character string as an ID symbol of the compound.
- each atom is assigned a numerical value depending on the atomic number of the atom and/or the type of the atom, the type of isotope, or the type of isomer generated by the atom.
- the present invention also provides a method including a step of converting the character string obtained by the above step into a shorter fixed-length or variable-length character string using a conversion function, following the step of the above method.
- a collision-resistant hash function and / or a general-purpose one-way hash function can be used as the conversion function.
- the conversion function is selected from message digest functions such as SHA, SHA-1, and MD4.
- At least one such function can be used to generate a fixed-length string, preferably a fixed-length string consisting of alphabetic characters and/or Arabic numerals.
- The character string obtained by the above method may include one or more additional character strings carrying information that is not used directly in generating it (for example, information on the type of the ID symbol generation method and/or the category of the object that the ID symbol identifies).
- the method of the present invention preferably comprises the following elements:
- (b) means n for storing vectors whose values are elements
- (c) means for inputting a covalent bond relationship between the atoms, and / or storage means c for storing the relationship as a matrix element;
- (d) means for storing a sequence generated by an arithmetic expression using n and c, a generation device thereof, and / or a medium for storing an arithmetic procedure for the generation;
- each atom is assigned a numerical value according to the element number of each atom in the chemical structural formula, the type of isotope, and the type of isomer generated by the atom, and these numerical values are used as elements.
- the above method can be performed using a medium that stores a sequence of elements in which the elements are rearranged in units of elements or atoms, and a device that outputs the series as a character string unique to the structural formula of the compound.
- an ID symbol unique to the chemical structural formula of the compound obtained by the above method and a storage medium storing the ID symbol.
- This ID symbol can be used to determine the identity or similarity of the chemical structures of compounds. For example, it can be used to extract information on the same or similar chemical structural formulas within a single compound database or between two or more compound databases. For example, the above-mentioned ID symbol can be added to the compound database or to each file in the database containing the compound information, so that the compound information can be searched or collated only by comparing the ID symbols, without using the chemical structural formula information.
- the present invention also provides: a method for maintaining the confidentiality of chemical structural formulas by using the above-mentioned ID symbol, since the chemical structural formulas of the compounds need not be compared directly; a method for searching two or more databases with the same query by using the ID symbol; and a method for collating data by matching the ID symbols assigned to the same compound.
- Both the file and the record are of the same nature in the essence of the present invention, and are one mode of information recording format in a computer.
- the present invention also provides an apparatus for executing the above method, a program for operating the apparatus, and a storage medium storing a computer program that implements the above method.
- as the storage medium, the storage device, the recording medium, and the recording device, any medium or device that can be read by a computer may be used; preferably, a memory, a flash memory, a floppy disk, a hard disk, a CD-ROM, a DVD, an MO, etc. can be used.
- FIG. 1 is a diagram showing an example of a database system capable of searching for a record managed by a local ID by using an ID unique to a compound structural formula in a query.
- “Characters” are codes that encode all or some of the characters and symbols used around the world, such as alphabets, Arabic numerals, hiragana, katakana, kanji, and hangul.
- a “character string” is a data sequence in which a finite number (one or more) of characters are arranged in order. Usually, the data sequence is stored and used in a computer-readable storage device. “Character strings” include those that consist of alphanumeric characters and data converted into bit strings using the ASCII code.
- a “sequence” is a data sequence in which one or more finite numbers of rational numbers are arranged in order. Usually, the data sequence is stored in a storage device that can be read by a computer and used. Since data that can be represented by a binary bit string can be represented by 0 and 1, it can be interpreted as either a character string or a sequence.
- it is preferable to convert each character into a binary number 1 to 4 bytes long according to a character-code table and store it.
- ASCII code or UNICODE is preferred as the character-code table, but any table may be used as long as it establishes a one-to-one correspondence between characters and codes.
- when a group of characters in a character string collectively represents a numerical value such as a decimal or hexadecimal number, the value may be converted to a binary number and stored, for example as a binary number 1 to 16 bytes long.
- each rational number is converted into a 1- to 8-byte binary number and stored.
- the value of each rational number may be converted to a binary number and stored, or the value may be written as a decimal or hexadecimal number using a group of characters and stored in the same format as a character string.
- Data expressing a character string and a sequence in a binary number format may be referred to as a “bit sequence” or “binary data” in this specification.
- “Chemical structural formula of a compound” is generally used by chemists to uniquely express a compound, and refers to a figure that describes the types of atoms, bonding relationships, types of bonds, and types of isomers. In the specification, a broader concept is used to mean data that uniquely identifies the structure of a compound.
- a “unique ID symbol” means an ID symbol with the property that the ID symbol of the same compound is always the same, and the ID symbols of compounds with different structural formulas substantially never match.
- “Substantially never match” does not mean that a match has been proven to be logically impossible; it means that the likelihood of a match is so small that it causes practically no problem in use.
- a “compound” is an atomic group bound to each other by a covalent bond, and includes inorganic compounds in addition to organic compounds.
- a conversion process for expressing a chemical structural formula of a compound by a unique character string is performed.
- the type of process for converting a chemical structure into a unique character string is not particularly limited, as long as the same chemical structure always generates the same character string and different chemical structures generate substantially different character strings.
- methanol (CH3OH) is described below as a specific example, but the conversion process that can be used in the method of the present invention is not limited to the following.
- This storage means may be a register, a memory, a magnetic storage medium, a punch tape, or the like, but a memory is most preferable as a storage device usable by a computer. Allocate to these atoms the values determined according to their "type of atom".
- the numerical value to be assigned may be the atomic number of each atom or a value arbitrarily defined for each type of atom, and different numerical values can be assigned to different isotopes.
- data representing the chemical structural formula, or three-dimensional data of the chemical structure (information equivalent to the chemical structural formula), is input from an input means such as a file system, and numerical values are preferably assigned automatically. (In the following explanation, atomic numbers are assigned for simplicity, but the values assigned in the conversion process need not be atomic numbers. The following procedure may also be performed several times with different assigned values. In the figure below, the assigned values are also shown, and the numbers in front of the element symbols are an arbitrary numbering used to distinguish the atoms.)
- Step 1 Chemical formula (1)
- the numerical value assigned to each atom in step 1 is stored in the storage means 1.
- the data stored in the storage means 1 consists of a plurality of numerical values, each being the value assigned to one atom. If the number of atoms is M, these values (numerical value 1, numerical value 2, ..., numerical value M) can be treated collectively as one M-dimensional vector.
- the data is called a “vector”, and the numerical value assigned to each atom in the vector may be called an “element”.
- the vector stored in the storage means 1 in step 1 is called “first term”.
- the first term obtained by arranging numerical values in the order of 1H, 2H, 3H, 4C, 5O, and 6H is (1, 1, 1, 6, 8, 1), and it is stored in the storage means 1.
- information representing a covalent bond relationship between atoms is stored in the storage means c from the data representing the chemical structural formula input from the input means.
- the data structure of the storage means c is not particularly limited.
- a matrix or two-dimensional array (c[1, 2, ..., M][1, 2, ..., M]) containing 0 is stored in electronic memory and used.
- Step 2 a storage means 2 equivalent to the storage means 1 is prepared, and a value newly allocated to each atom is stored in the storage means 2 as a result of performing the following arithmetic processing based on the value of each atom in step 1.
- the value of each atom in step 2 is calculated as follows.
- the value of each atom in the storage means 1 is multiplied by a constant (preferably 1) and stored in the storage means 2.
- the number of partner atoms to which each atom is covalently bonded, as found from the storage means c (1 partner for H, 4 for C, 2 for O), is multiplied by a constant (preferably 0) and added to the value of the atom in the storage means 2.
- the step 1 value of each partner atom is multiplied by a constant (1 in this example) and added to the value of the atom in the storage means 2.
- for example, for 3H: 1 + 6 = 7.
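The step 2 calculation for methanol can be sketched in Python as follows. This is a minimal illustration, assuming atomic numbers as the step 1 values, the constants 1 (own value), 0 (partner count), and 1 (partner values), and an adjacency list standing in for the storage means c. The names `values_1`, `bonds`, and `next_step` are hypothetical and do not come from the specification.

```python
# Methanol CH3OH; indices 0..5 correspond to atoms 1H, 2H, 3H, 4C, 5O, 6H.
values_1 = [1, 1, 1, 6, 8, 1]  # step 1: atomic numbers used as assigned values

# Covalent bond relationships (an assumed stand-in for storage means c).
bonds = {0: [3], 1: [3], 2: [3], 3: [0, 1, 2, 4], 4: [3, 5], 5: [4]}


def next_step(values, bonds, own=1, count=0, partner=1):
    """Return own*value + count*(number of partners) + partner*(sum of partner values)."""
    return [
        own * values[i]
        + count * len(bonds[i])
        + partner * sum(values[j] for j in bonds[i])
        for i in range(len(values))
    ]


values_2 = next_step(values_1, bonds)
print(values_2)  # [7, 7, 7, 17, 15, 9]
```

With these constants, each hydrogen on carbon receives 1 + 6 = 7, while the carbon receives 6 + (1 + 1 + 1 + 8) = 17.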
- calculate the value of each atom in step n as follows.
- the multiplied (preferably 1) value is stored in the storage means n + 1.
- k is an integer group of n-1 or less selected from integers satisfying 1 ≤ k ≤ n.
- the value multiplied by the value of each atom is added to the value of the atom in the storage means n + 1.
- the storage means k of the partner atom group to which each atom can be examined by the storage means c (where k is an integer group of n or less arbitrarily selected from integers satisfying 1 ≤ k ≤ n).
- the steps may be repeated any finite number of times (preferably about 10).
- a vector sequence corresponding to each step is generated.
- the recurrence formula is defined based on covalent bond relationship information between atoms stored in the storage means c.
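One simple instance of such a recurrence can be sketched as follows. It assumes that each new vector depends only on the immediately preceding vector, with constants 1 for the atom's own value and 1 for its partners' values. The general scheme described above allows other choices of k and other constants, so this is an illustration rather than the method itself. `vector_sequence` and `methanol_bonds` are hypothetical names.

```python
def vector_sequence(initial, bonds, steps):
    """Generate `steps` vectors; each new vector is built from the previous one only."""
    vectors = [list(initial)]
    while len(vectors) < steps:
        prev = vectors[-1]
        # New value = own previous value + sum of the bonded partners' previous values.
        vectors.append(
            [prev[i] + sum(prev[j] for j in bonds[i]) for i in range(len(prev))]
        )
    return vectors


# Methanol: atoms 1H, 2H, 3H, 4C, 5O, 6H as indices 0..5.
methanol_bonds = {0: [3], 1: [3], 2: [3], 3: [0, 1, 2, 4], 4: [3, 5], 5: [4]}
sequence = vector_sequence([1, 1, 1, 6, 8, 1], methanol_bonds, steps=3)
for vector in sequence:
    print(vector)
```

Each additional step mixes in information from atoms one bond further away, which is why more steps tend to separate structures that look alike locally.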
- the execution result up to step 2 will be described for simplicity, but the number of steps is not particularly limited in the practice of the present invention.
- the vectors in steps 1 and 2 are as shown in the above chemical formula: the vector of storage means 1 is (1, 1, 1, 6, 8, 1), and the vector of storage means 2 is (7, 7, 7, 17, 15, 9).
- the vector elements for each atom are as follows: Chemical formula (3)
- these elements are rearranged according to the magnitude comparison rule to generate a sequence.
- the sequence "1, 1, 1, 1, 6, 7, 7, 7, 8, 9, 15, 17" is generated as a sequence that is substantially unique to the chemical structural formula.
- As another magnitude comparison rule, the elements can be kept together for each atom and the atoms arranged by comparing these element strings. For example, the element strings of the atoms can first be compared by the value in the storage means 1 and sorted in ascending order; if the values in the storage means 1 are equal, the values in the storage means 2 are compared to obtain the following order.
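The two magnitude comparison rules can be sketched as follows, assuming the methanol vectors (1, 1, 1, 6, 8, 1) for storage means 1 and (7, 7, 7, 17, 15, 9) for storage means 2. The variable names are illustrative only.

```python
v1 = [1, 1, 1, 6, 8, 1]     # storage means 1 (atoms 1H, 2H, 3H, 4C, 5O, 6H)
v2 = [7, 7, 7, 17, 15, 9]   # storage means 2

# Rule 1: pool all elements of all vectors and sort them by magnitude.
pooled = sorted(v1 + v2)

# Rule 2: keep each atom's elements together; compare atoms by their
# storage means 1 value first, then by their storage means 2 value.
per_atom = sorted(zip(v1, v2))
grouped = [value for pair in per_atom for value in pair]

print(pooled)    # [1, 1, 1, 1, 6, 7, 7, 7, 8, 9, 15, 17]
print(per_atom)  # [(1, 7), (1, 7), (1, 7), (1, 9), (6, 17), (8, 15)]
```

Both rules remove any dependence on the order in which the atoms happened to be numbered, which is what makes the resulting sequence a property of the structure rather than of the drawing.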
- the number of steps needed for the purpose of the present invention can be estimated as follows. For tens of thousands of commercially available compounds, character strings are generated while changing the number of steps, and the frequency with which the character strings collide (the same character string is generated from different structural formulas) is compared; this gives an estimate of the minimum number of steps required. When actual data were processed by this method and string collisions were examined, it was confirmed that the more steps are calculated and the longer the character string becomes, the better collisions between compounds with different structures are prevented.
- ACD a database of about 250,000 commercially available compounds
- the string thus generated is a variable-length string that is substantially unique to the chemical structural formula. This is referred to below as a “structural character string”.
- Structural character strings are generated from sequences like the one above (such sequences are sometimes referred to as “structural sequences”) and have a one-to-one correspondence with the chemical structure, so they can be used to determine whether chemical structures match or are similar, and can also be used as ID symbols.
- each value in the sequence may be written as a character string using Arabic numerals or the like, and those character strings may be concatenated, with an arbitrary delimiter or with an empty string between them, into a single character string.
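As a small illustration of this concatenation, the sketch below joins a structural sequence into one string. The function name `to_structural_string` and the default comma delimiter are assumptions made for the example.

```python
def to_structural_string(sequence, delimiter=","):
    """Write each value in Arabic numerals and join the pieces with a delimiter."""
    return delimiter.join(str(value) for value in sequence)


structural_string = to_structural_string([1, 1, 1, 1, 6, 7, 7, 7, 8, 9, 15, 17])
print(structural_string)  # 1,1,1,1,6,7,7,7,8,9,15,17
```

A visible delimiter keeps multi-digit values unambiguous: without one, the values 1 and 5 would be written the same way as the single value 15.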
- element numbers are assigned to each atom as initial values, but any number may be given instead of element numbers.
- the algorithm may be executed up to the final step, and when arranging the numbers, the numbers obtained with the respective initial values may be arranged together.
- atoms that have local features in the structure can be dealt with by changing the initial value of the atom. For example, by changing, for each isomer, the initial values of the atoms involved in differences of geometric isomerism, stereoisomerism, etc., the differences can be reflected in the structural character strings.
- Structural character strings derived directly from the structural formula of a compound have various lengths, but are unique to the structural formula of the compound and are generated from information only on the structural formula. If the character string is within the appropriate length range, the structural character string itself may be used as an ID symbol to determine the identity or similarity of chemical structural formulas. When a shorter character string is used as the ID symbol, it is desirable to perform processing using a conversion function. By using a conversion function, a character string that is a fixed-length ID symbol can be derived from the structural character strings having different lengths obtained as described above. Therefore, a method including this step is a preferred embodiment of the present invention.
- the structural character string is converted into a bit string and stored in the storage means b, and an algorithm that converts it into a short fixed-length bit string of about 20 bytes is applied to the contents of the storage means b.
- the converted bit string can be stored in the storage means d.
- This can be converted to a character string and output from the output means as an ID symbol that is a character string.
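The conversion just described can be sketched with Python's standard `hashlib` module. This is an assumption-laden illustration: it uses SHA-1 (one of the functions named in this specification) and writes the 20-byte result as hexadecimal text. The function name `id_symbol` is hypothetical.

```python
import hashlib


def id_symbol(structural_string: str) -> str:
    """Hash a structural character string down to a fixed-length ID symbol."""
    bit_string = structural_string.encode("ascii")  # contents of storage means b
    digest = hashlib.sha1(bit_string).digest()      # 20 bytes, as in storage means d
    return digest.hex()                             # output as a character string


symbol = id_symbol("1,1,1,1,6,7,7,7,8,9,15,17")
print(len(symbol), symbol)
```

However long the structural character string is, the output always has the same length, which is what makes it convenient as a database key.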
- as the storage means b and d, any device capable of storing a binary number can be used, but preferably a computer register or a memory is used.
- the case where the conversion is applied to a structural character string is described below.
- the present invention can be similarly applied to a structural number sequence.
- the character string as the ID symbol generated by the conversion function processing is unique to the structural character string, and must substantially satisfy the following conditions as a one-to-one mapping function.
- the same ID string is generated from the same structural character string.
- Different ID strings are generated from different structure strings.
- the ID symbol must be a fixed-length or variable-length (preferably fixed-length) short character string.
- the generation method is easy.
- a hash function is preferred, more preferably a collision-resistant hash function or a general-purpose one-way hash function.
- the conversion function used in the method of the present invention is preferably collision-resistant and one-way, but this need not be proven with mathematical rigor; any function may be used as long as it in fact gives conversion results that satisfy the above characteristics.
- a universal (general-purpose) one-way hash function is a function introduced by Naor and Yung: a function h for which, given h and a certain value x in its domain, it is difficult to find a y (y ≠ x) such that h(x) = h(y).
- the collision-resistant hash function is stronger than the general-purpose one-way function.
- a hash function particularly a hard-to-collision hash function or a general-purpose one-way hash function, must be interpreted in the broadest sense, and should not be interpreted restrictively in any sense.
- any function classified as a collision-resistant hash function or a general-purpose one-way hash function can be used.
- for example, SHA or SHA-1 can be used; the functions to be used and their combination can be selected appropriately by those skilled in the art so as to sufficiently reduce the possibility of collision of the generated ID symbols. In this specification, these functions are sometimes referred to as message digest functions.
- an algorithm of SHA will be introduced as a conversion function that can be particularly preferably used in the method of the present invention, but the conversion function that can be used in the method of the present invention is not limited to SHA.
- the character string that is the hash value generated by the hash function processing is represented by a combination of lowercase letters and numbers, but the characters are not limited to lowercase letters.
- the characters used in the method of the present invention may be either uppercase or lowercase, and may be used without distinguishing between uppercase and lowercase, or may be used with distinguishing between them.
- the present method may be implemented by using a high-speed hashing method, which has higher collision resistance, instead of SHA.
- SHA1 which is an improved version of SHA may be used.
- a hash value of 160 bits is generated for “m”.
- Padding is performed in the following procedure so that the input bit string “m” is a multiple of 512 bits (16 x 32 bits).
- Step 1) Add a bit array 100 ... 0 to the end of "m" so that the bit array length of m is '512N-64'.
- Step 2) Express the bit array length of the input array in 64 bits and add it after the bit array.
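The two padding steps can be sketched as follows. The sketch assumes a message made of whole bytes and a big-endian 64-bit length field, as in the published SHA-1 standard; the text above does not state the byte order. `sha_pad` is a hypothetical name.

```python
def sha_pad(message: bytes) -> bytes:
    """Pad a byte-aligned message to a multiple of 512 bits (64 bytes)."""
    bit_length = len(message) * 8
    # Step 1: append a 1 bit followed by 0 bits until the length is 512N - 64 bits.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Step 2: append the original length in bits as a 64-bit field.
    return padded + bit_length.to_bytes(8, "big")


block = sha_pad(b"abc")
print(len(block) * 8)  # 512
```

A 56-byte message no longer fits in one block once the mandatory 1 bit and the 64-bit length are added, so it is padded to two blocks (1024 bits).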
- the obtained bit array is divided into N pieces of 512 bits each, denoted M1, M2, ..., MN.
- using the following constants and functions on each Mi of the above bit array, calculate the hash value by the procedure described below.
- the following constant values are expressed in hexadecimal.
- X <<< n means that X is cyclically shifted left by n bits.
- a circular shift moves the bits of the array in one direction, and the bits pushed out at one end re-enter at the other end.
- Reference 5) '+' denotes the remainder of the sum of the left and right operands modulo 2^32.
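The two operations defined here, the cyclic left shift and addition modulo 2^32, can be sketched on 32-bit words as follows. The names `rotl32` and `add32` are illustrative, and the shift amount is assumed to lie between 0 and 31.

```python
MASK32 = 0xFFFFFFFF  # keeps results within 32 bits


def rotl32(x: int, n: int) -> int:
    """Cyclically shift the 32-bit word x left by n bits (0 <= n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def add32(a: int, b: int) -> int:
    """Return the remainder of a + b modulo 2**32."""
    return (a + b) & MASK32


print(hex(rotl32(0x12345678, 8)))  # 0x34567812
print(add32(0xFFFFFFFF, 1))        # 0
```

Python integers are unbounded, so the explicit mask is what confines each result to a 32-bit word.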
- a method for converting a structure character string into a bit string will be described below.
- Each character is converted to 8 bits in the order of the character string by ASCII code to create a bit string.
- ASCII code may be used when converting a character code into a bit string.
- the bit string is a sequence of 1-bit information. One bit corresponds to one digit of a binary number and is represented by 0 or 1.
- when ASCII code is used, the SHA condition of less than 2^64 bits means that the number of characters is less than about 2 × 10^18, so a structural character string with a considerably large number of steps can be expressed.
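The character-to-bit conversion and the resulting size limit can be checked with a short sketch. `to_bit_string` is a hypothetical helper, and the limit is computed from the 2^64-bit condition at 8 bits per character.

```python
def to_bit_string(text: str) -> str:
    """Convert each character to its 8-bit ASCII code, in string order."""
    return "".join(f"{byte:08b}" for byte in text.encode("ascii"))


bits = to_bit_string("1,")
print(bits)  # 0011000100101100

# Largest character count whose bit length stays below 2**64.
max_characters = (2**64 - 1) // 8
print(max_characters)  # 2305843009213693951, about 2.3e18
```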
- the 160 bits are divided into groups of 5 bits, and each 5-bit group is represented as a base-32 digit using the 32 characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, and v.
- next, the 160-bit string is divided into 5-bit groups and each group is converted to the corresponding base-32 character.
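This 5-bit encoding can be sketched as follows, assuming the 160-bit hash is held as 20 bytes and read from the most significant bit. `ALPHABET` and `encode_base32` are hypothetical names, and the alphabet is the 32-character set 0 to 9 followed by a to v.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuv"  # one symbol per 5-bit value


def encode_base32(digest: bytes) -> str:
    """Encode a 160-bit (20-byte) digest as 32 characters, 5 bits per character."""
    if len(digest) != 20:
        raise ValueError("expected a 160-bit digest")
    value = int.from_bytes(digest, "big")
    # Take 5-bit groups from the most significant end down to the least significant.
    return "".join(ALPHABET[(value >> shift) & 0b11111] for shift in range(155, -1, -5))


print(encode_base32(bytes(19) + b"\x1f"))  # 0000000000000000000000000000000v
```

Since 160 / 5 = 32, every ID symbol produced this way has exactly 32 characters.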
- to the ID symbol generated by the method of the present invention, one or more fixed-length character strings, preferably composed of alphanumeric characters, may be added to form a new ID symbol; these carry information such as the type of the corresponding data (for example, information indicating that the ID symbol identifies a compound) and the type of the ID symbol generation method (for example, information indicating the type of hash function used).
- the character string to be added may be placed at any part, such as the beginning or end of the character string obtained by applying the hash function.
- a character string of 1 is added to the beginning of a character string obtained as a hash value.
- the ID symbol of the present invention can be used for management and collation of compound data (including chemical structural formula data). Since the ID symbol is unique to each compound and the possibility of collision is extremely low, ID symbols can be generated for multiple compounds by the method of the present invention, and whether the compounds are identical can be determined easily and at high speed simply by comparing the ID symbols. For example, the same chemical structural formula as a specific compound can be searched at high speed from a compound database using the ID symbol. Also, compound databases can be managed using the ID symbols described above. For example, the above-mentioned ID symbol can be generated for each compound in the database, so that compounds duplicated in the compound database can be detected at high speed. Also, when registering new compound information in the database, it is possible to easily check whether the compound is already registered. Furthermore, it is possible to protect the confidentiality of compound data by disclosing only the ID symbol for compound comparison and not disclosing the compound data itself.
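Duplicate detection by ID symbol alone can be sketched as follows. The record names and ID symbols are placeholders invented for the example, and `find_duplicates` is a hypothetical helper; no structural formula is consulted at any point.

```python
from collections import defaultdict


def find_duplicates(records):
    """Group record names by ID symbol and keep the symbols seen more than once."""
    by_symbol = defaultdict(list)
    for name, symbol in records:
        by_symbol[symbol].append(name)
    return {symbol: names for symbol, names in by_symbol.items() if len(names) > 1}


# Placeholder data: (record name, ID symbol) pairs from one or more databases.
records = [("rec-001", "id_x"), ("rec-002", "id_y"), ("rec-003", "id_x")]
duplicates = find_duplicates(records)
print(duplicates)  # {'id_x': ['rec-001', 'rec-003']}
```

The same dictionary lookup answers the registration question: a new compound is already registered exactly when its ID symbol is already a key.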
- the method of the present invention is convenient for the purpose of searching and collating basically the same chemical structural formula, but can also be used for classification and the like by detecting similar chemical structural formulas such as derivatives. Further, the following method can be used for the purpose of detecting a compound having a similar structural formula.
- to detect compounds having similar structural formulas, it is advisable, for a given chemical structural formula, to create an ID symbol for the structure with a substituent (not limited to one) removed, in addition to the ID symbol of the structural formula itself, and to store them together. For example, if the ID symbol generated by replacing Cl with H in a chloride compound matches the ID symbol generated by replacing Br with H in a bromide compound, it can be determined mechanically that the compounds are related. The same operation can be performed for a group of compounds having more complicated structural formulas.
- there is no limit on the number of ID symbols stored for one compound; it is only necessary to store them in order, starting from that of the original structure. If multiple ID symbols are generated and stored in this way for all the compounds in a database, it is possible to search at high speed for whether compounds of a certain derivative series exist across databases from different sources, and whether derivatives of a specific compound exist in a compound database. It should be understood that all such embodiments are also within the scope of the present invention.
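The idea of storing several ID symbols per compound can be sketched as follows. The ID strings are placeholders rather than real hash values, and the compounds (chlorobenzene and bromobenzene, both giving benzene when the halogen is replaced by H) are chosen here only for illustration. `are_related` is a hypothetical helper.

```python
# Each compound stores its own ID symbol first, then the ID symbols of the
# structures obtained by replacing a substituent with H (placeholder values).
stored_ids = {
    "chlorobenzene": ["id_chlorobenzene", "id_benzene"],
    "bromobenzene": ["id_bromobenzene", "id_benzene"],
    "methanol": ["id_methanol"],
}


def are_related(compound_a: str, compound_b: str) -> bool:
    """Two compounds are treated as related if any stored ID symbol matches."""
    return bool(set(stored_ids[compound_a]) & set(stored_ids[compound_b]))


print(are_related("chlorobenzene", "bromobenzene"))  # True
print(are_related("chlorobenzene", "methanol"))      # False
```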
- step 1
- Figure 1 shows an example of building a database system that can search for records by using a unique ID for a compound structural formula as a query.
- records are assigned IDs and managed internally. The record IDs (RecordID1, RecordID2, etc. in Fig. 1) are used locally in this database system, so they are called local IDs here.
- IDs unique to the structural formulas of compounds.
- the record search device, the correspondence table between IDs, and the local database may be physically separated from each other, and communication among them may be performed via the Internet or an intranet.
- the administrator of the correspondence table between IDs and the administrator of the local database may be different.
- the correspondence table between IDs may be implemented by any method as long as the local ID associated with a unique ID can be looked up from that unique ID.
- the correspondence may be many-to-many.
- the processing procedure at the time of retrieval is as follows.
- a searcher outside the database system sends a search query that contains one or more IDs unique to the compound's structural formula to the record search device of the database system (Fig. 1 (2)).
- the record retrieval device retrieves the local ID associated with the unique ID from the ID correspondence table (Fig. 1 (1)).
- the record retrieval device retrieves a record with the local ID from the local database (Fig. 1 (3)).
- the record search device sends the record back to the searcher.
- the searches in 2 and 3 can be performed collectively.
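The retrieval procedure can be sketched with two in-memory tables. Everything named here (`id_table`, `local_db`, `search`, and the sample IDs and records) is hypothetical; a real system would place the correspondence table and the local database behind a network service.

```python
# Hypothetical correspondence table: unique ID -> local IDs (may be many-to-many).
id_table = {
    "uid_methanol": ["RecordID1", "RecordID2"],
    "uid_ethanol": ["RecordID2"],
}

# Hypothetical local database keyed by local ID.
local_db = {
    "RecordID1": {"name": "methanol, supplier A"},
    "RecordID2": {"name": "alcohol mixture, supplier B"},
}


def search(unique_ids):
    """Resolve unique IDs to local IDs, then fetch each matching record once."""
    local_ids = []
    for unique_id in unique_ids:
        for local_id in id_table.get(unique_id, []):
            if local_id not in local_ids:
                local_ids.append(local_id)
    return [local_db[local_id] for local_id in local_ids]


print(search(["uid_methanol", "uid_ethanol"]))
```

The searcher only ever handles unique IDs; local IDs stay internal to the database system.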
- the searcher can search the database using only the IDs unique to the structural formula of the compound; at search time, the record whose local ID is associated with the unique ID in the correspondence table between IDs is retrieved.
- when the system administrator changes, adds, or deletes records in the local database, the correspondence between the unique ID and the local ID is updated accordingly, so that the searcher continues to reach the record for the compound structure being sought.
- it can also be set up so that correction information about the record is sent back to the searcher instead of the record.
- searchers can search multiple databases simultaneously by sending the same unique ID as a query, via the Internet or an intranet, to multiple database systems of the kind shown in Fig. 1.
- the index search program automatically recognizes the unique ID in a file as a key and automatically creates a correspondence between the ID and the path of the file (corresponding to a correspondence table between IDs). Therefore, by sending the unique ID as a query to the index search program, a file containing the unique ID can be searched.
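An index of this kind can be sketched as follows. The sketch assumes that unique IDs appear in files as standalone 32-character tokens over the characters 0 to 9 and a to v, and it indexes in-memory text instead of reading real files. `ID_PATTERN` and `build_index` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed ID format: a standalone token of 32 characters from 0-9 and a-v.
ID_PATTERN = re.compile(r"\b[0-9a-v]{32}\b")


def build_index(files):
    """Map each unique ID found in the file texts to the paths that contain it."""
    index = defaultdict(list)
    for path, text in files.items():
        for unique_id in sorted(set(ID_PATTERN.findall(text))):
            index[unique_id].append(path)
    return dict(index)


sample_id = "0" * 31 + "v"
files = {"notes/a.txt": f"compound {sample_id} assay data", "notes/b.txt": "no id here"}
index = build_index(files)
print(index.get(sample_id))  # ['notes/a.txt']
```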
- a unique ID symbol is generated in a chemical structural formula of a compound having a fixed length or a variable length and having a very low probability of collision for a compound having any structure. be able to.
- This ID symbol can be generated very quickly and easily from the chemical formula of the compound.
- because the ID symbol is unique to the chemical formula of each organic compound and there is virtually no possibility of collision, the identity or similarity of chemical structures can be easily determined by comparing the ID symbols alone. The ID symbol can be used to manage a database so that entries are not duplicated, to use compound databases made at different sites in a unified manner, and to check at high speed whether a compound or its derivative exists in a compound database.
- because the ID symbol of the present invention is generated by software from the chemical structure itself, if the software is distributed, the same ID symbol will be given to the same structure anywhere in the world, and the ID symbol can be used in place of the chemical structural formula for data search and collation. This eliminates the need to search the database using the chemical structural formula itself as a query, thereby preventing confidential information from leaking outside during communication or search.
- if the database administrator assigns the ID symbol to every compound in the database, it can be used to avoid duplication and to link databases created from different sources.
- the same software can easily assign ID symbols to the chemical structural formulas of compounds that researchers have synthesized or plan to synthesize, making it possible to search databases and check structures.
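The retrieval procedure described in the bullets above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of the invention: `unique_id` is a hypothetical stand-in that truncates a SHA-256 digest of a structure string assumed to be already canonical (the invention's own ID generation procedure is not reproduced here), and the names `DatabaseSystem`, `add_record`, `delete_record`, `search`, and `search_all` are invented for the example.

```python
import hashlib
from collections import defaultdict


def unique_id(canonical_structure: str, length: int = 16) -> str:
    """Stand-in ID (assumption): truncated SHA-256 hex digest of a
    structure string that is assumed to be canonical already."""
    digest = hashlib.sha256(canonical_structure.encode("utf-8")).hexdigest()
    return digest[:length]


class DatabaseSystem:
    """One database system: local database, ID correspondence table,
    and a record search step, all held in memory for illustration."""

    def __init__(self):
        self.records = {}                 # local ID -> record
        self.id_table = defaultdict(set)  # unique ID -> set of local IDs

    def add_record(self, local_id, canonical_structure, record):
        # The administrator stores the record and updates the correspondence.
        self.records[local_id] = record
        self.id_table[unique_id(canonical_structure)].add(local_id)

    def delete_record(self, local_id):
        # Deleting a record also removes it from the correspondence table.
        self.records.pop(local_id, None)
        for local_ids in self.id_table.values():
            local_ids.discard(local_id)

    def search(self, unique_ids):
        # Look up local IDs for each unique ID, then fetch the records.
        hits = []
        for uid in unique_ids:
            for local_id in sorted(self.id_table.get(uid, ())):
                hits.append(self.records[local_id])
        return hits


def search_all(systems, unique_ids):
    """Send the same unique IDs as a query to several database systems."""
    return {name: system.search(unique_ids) for name, system in systems.items()}


# Usage: two sites hold the same structure under different local IDs.
site_a = DatabaseSystem()
site_b = DatabaseSystem()
site_a.add_record("A-001", "c1ccccc1", {"name": "benzene", "source": "site A"})
site_b.add_record("B-42", "c1ccccc1", {"name": "benzol", "source": "site B"})

query = [unique_id("c1ccccc1")]  # the structure itself is never transmitted
results = search_all({"site_a": site_a, "site_b": site_b}, query)
```

In this sketch the searcher only ever handles the ID string, so the structure is not sent in the query. The many-to-many correspondence is modelled with a set of local IDs per unique ID; a real system would more likely keep the table in a database index.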
Landscapes
- Crystallography & Structural Chemistry (AREA)
- Engineering & Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Document Processing Apparatus (AREA)
- Pharmaceuticals Containing Other Organic And Inorganic Compounds (AREA)
- Dental Preparations (AREA)
- Organic Low-Molecular-Weight Compounds And Preparation Thereof (AREA)
- Adhesives Or Adhesive Processes (AREA)
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002393321A CA2393321A1 (en) | 1999-11-19 | 2000-11-16 | Id symbol unique to structural formula of compound |
EP00976284A EP1235159B1 (en) | 1999-11-19 | 2000-11-16 | Id symbol unique to structural formula of compound |
AU14139/01A AU1413901A (en) | 1999-11-19 | 2000-11-16 | Id symbol unique to structural formula of compound |
DE60033422T DE60033422T2 (de) | 1999-11-19 | 2000-11-16 | Identifikationssymbol das einmalig für die struktur der formel einer mischung ist |
US11/381,497 US20070027900A1 (en) | 1999-11-19 | 2006-05-03 | Id symbol unique to structural formula of compound |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP11-330432 | 1999-11-19 | ||
JP33043299 | 1999-11-19 | ||
JP2000-149641 | 2000-05-22 | ||
JP2000149641 | 2000-05-22 |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/381,497 Continuation US20070027900A1 (en) | 1999-11-19 | 2006-05-03 | Id symbol unique to structural formula of compound |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2001039041A1 (en) | 2001-05-31 |
Family
ID=26573527
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2000/008078 WO2001039041A1 (en) | 1999-11-19 | 2000-11-16 | Id symbol unique to structural formula of compound |
Country Status (8)
Country | Link |
---|---|
US (1) | US20070027900A1 (ja) |
EP (1) | EP1235159B1 (ja) |
CN (1) | CN1425159A (ja) |
AT (1) | ATE354133T1 (ja) |
AU (1) | AU1413901A (ja) |
CA (1) | CA2393321A1 (ja) |
DE (1) | DE60033422T2 (ja) |
WO (1) | WO2001039041A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007323182A (ja) * | 2006-05-30 | 2007-12-13 | Riron Soyaku Kenkyusho:Kk | System and method for high-speed retrieval of chemical structures from a large-scale chemical structure database |
JP2009116592A (ja) * | 2007-11-06 | 2009-05-28 | Nippon Telegr & Teleph Corp <Ntt> | Vector search device, vector search method, program, and recording medium storing the program |
JP2009543489A (ja) * | 2006-07-10 | 2009-12-03 | ジェムアルト エスアー | Server for managing anonymous confidential data |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20010085075A (ko) * | 2001-08-01 | 2001-09-07 | 조현정 | Network-based three-dimensional chemical information providing system and chemical formula editor therefor |
US7809843B1 (en) * | 2003-09-18 | 2010-10-05 | Intel Corporation | Globally unique identification in communications protocols and databases |
US9143357B2 (en) * | 2004-03-31 | 2015-09-22 | Nec Infrontia Corporation | Chat apparatus transmitting/receiving information indicating switching of chat |
US20070016612A1 (en) * | 2005-07-11 | 2007-01-18 | Emolecules, Inc. | Molecular keyword indexing for chemical structure database storage, searching, and retrieval |
US7676484B2 (en) * | 2006-07-30 | 2010-03-09 | International Business Machines Corporation | System and method of performing an inverse schema mapping |
US7996576B2 (en) * | 2008-05-08 | 2011-08-09 | Lsi Corporation | Generating an identifier for a SATA disk |
US9600808B1 (en) * | 2011-06-24 | 2017-03-21 | Epic One Texas, Llc | Secure payment card, method and system |
US20160021543A1 (en) * | 2012-01-05 | 2016-01-21 | Andrew Jay Diamond | Method and system for ad hoc cellular pbx |
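CN113919290A (zh) * | 2020-07-09 | 2022-01-11 | 中国科学院上海药物研究所 | Processing method and device for bidirectional automatic conversion between chemical structures and names of organic compounds |
CN112988358A (zh) * | 2021-04-18 | 2021-06-18 | 上海丽人丽妆网络科技有限公司 | Data middleware for an e-commerce platform |
CN113903410B (zh) * | 2021-12-08 | 2022-03-11 | 成都健数科技有限公司 | Compound retrieval method and system |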
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1996006391A2 (en) * | 1994-08-10 | 1996-02-29 | Oxford Molecular Limited | Relational database management system for chemical structure storage, searching and retrieval |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1996029659A1 (fr) * | 1995-03-17 | 1996-09-26 | Kureha Kagaku Kogyo Kabushiki Kaisha | Processor, processing method, and recording medium for biochemical information |
JP3462024B2 (ja) * | 1996-12-04 | 2003-11-05 | 株式会社東芝 | Transmission control method for network system |
US6640278B1 (en) * | 1999-03-25 | 2003-10-28 | Dell Products L.P. | Method for configuration and management of storage resources in a storage network |
WO2003023656A1 (en) * | 2001-09-13 | 2003-03-20 | Jda Software Group, Inc | Database interface architecture with time-based load balancing in a real-time environment |
US8108249B2 (en) * | 2001-12-04 | 2012-01-31 | Kimberly-Clark Worldwide, Inc. | Business planner |
US7379890B2 (en) * | 2003-10-17 | 2008-05-27 | Makor Issues And Rights Ltd. | System and method for profit maximization in retail industry |
2000
- 2000-11-16 AU AU14139/01A patent/AU1413901A/en not_active Abandoned
- 2000-11-16 DE DE60033422T patent/DE60033422T2/de not_active Expired - Lifetime
- 2000-11-16 CA CA002393321A patent/CA2393321A1/en not_active Abandoned
- 2000-11-16 EP EP00976284A patent/EP1235159B1/en not_active Expired - Lifetime
- 2000-11-16 WO PCT/JP2000/008078 patent/WO2001039041A1/ja active IP Right Grant
- 2000-11-16 AT AT00976284T patent/ATE354133T1/de not_active IP Right Cessation
- 2000-11-16 CN CN00818519A patent/CN1425159A/zh active Pending
2006
- 2006-05-03 US US11/381,497 patent/US20070027900A1/en not_active Abandoned
Non-Patent Citations (3)
Title |
---|
FUKUDA ET AL.: "Seibutsu johou tougou database system no kouchiku; Atarashii hassou ni motozuita seibutsu johou kanrihou", DAI 22KAI JOUHOU KAGAKU TOURONKAI, DAI 27 KAI KOUZOU KASSEI SOUKAN SYMPOSIUM KOUEN YOUSHISHU, 31 October 1999 (1999-10-31), pages 84 - 85, XP002935900 * |
IHLENFELDT W.D. & GASTEIGER J.: "Hash codes for the identification and classification of molecular structure elements", JOURNAL OF COMPUTATIONAL CHEMISTRY, vol. 15, no. 8, August 1994 (1994-08-01), pages 793 - 813, XP002935899 * |
WIPKE W T ET AL: "Stereochemically unique naming algorithm", JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, vol. 96, no. 15, 24 July 1974 (1974-07-24), pages 4834 - 4872, XP002935898 * |
Also Published As
Publication number | Publication date |
---|---|
AU1413901A (en) | 2001-06-04 |
EP1235159B1 (en) | 2007-02-14 |
EP1235159A1 (en) | 2002-08-28 |
EP1235159A4 (en) | 2003-04-02 |
CA2393321A1 (en) | 2001-05-31 |
DE60033422D1 (de) | 2007-03-29 |
DE60033422T2 (de) | 2007-11-29 |
CN1425159A (zh) | 2003-06-18 |
ATE354133T1 (de) | 2007-03-15 |
US20070027900A1 (en) | 2007-02-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20070027900A1 (en) | Id symbol unique to structural formula of compound | |
US11899641B2 (en) | Trie-based indices for databases | |
JP5373846B2 (ja) | Hierarchical indexing for accessing hierarchically organized information in a relational system | |
US8060521B2 (en) | Systems and methods of directory entry encodings | |
Broder | Some applications of Rabin’s fingerprinting method | |
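JP4722620B2 (ja) | Encrypted document search method and encrypted document search system | |
JP2638307B2 (ja) | Method for searching a database directory | |
JP2022531790A (ja) | Data structures and operations for searching, computing, and indexing in DNA-based data storage | |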
US7574457B2 (en) | Non-mutating tree-structured file identifiers | |
JP2009003541A (ja) | Database index creation system, method, and program | |
WO2006094365A1 (en) | Method for storing data with reduced redundancy using data clusters | |
CN109492410B (zh) | Data searchable encryption and keyword search method, system, terminal, and device | |
JP2011215835A (ja) | Storage device with full-text search function | |
Lippert et al. | A space-efficient construction of the Burrows–Wheeler transform for genomic data | |
Moataz et al. | Oblivious substring search with updates | |
JP4768009B2 (ja) | Method for storing data with reduced redundancy using data clusters | |
Kanda et al. | Dynamic path-decomposed tries | |
Ghaleb et al. | Novel scheme for labeling XML trees based on bits-masking and logical matching | |
JP2001022766A (ja) | High-speed processing method and apparatus for multidimensional database | |
Zhuang et al. | Full tree-based encoding technique for dynamic XML labeling schemes | |
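CN113836018B (zh) | Backup method for test environment configuration parameters and related apparatus | |
JP2988304B2 (ja) | Character string management device | |
JP2990312B2 (ja) | Data access method and apparatus | |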
Peters et al. | FLOUDS: A Succinct File System Structure. | |
Dromey | A Compact Free-Keyword File Structure for Author-Title-Keyword Searching. An Application to an NMR Bibliographic Database |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
ENP | Entry into the national phase | Ref country code: JP Ref document number: 2001 540635 Kind code of ref document: A Format of ref document f/p: F |
WWE | Wipo information: entry into national phase | Ref document number: 2393321 Country of ref document: CA |
WWE | Wipo information: entry into national phase | Ref document number: 2000976284 Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 008185190 Country of ref document: CN |
WWP | Wipo information: published in national office | Ref document number: 2000976284 Country of ref document: EP |
REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
WWG | Wipo information: grant in national office | Ref document number: 2000976284 Country of ref document: EP |