EP4220644A1 - Feature amount calculation program, feature amount calculation method, and feature amount calculation device
- Publication number
- EP4220644A1 (application EP22198688.8A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- feature amount
- type
- amount calculation
- atomic
- amino acid
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/20—Protein or domain folding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
Definitions
- the embodiments relate to a feature amount calculation program, a feature amount calculation method, and a feature amount calculation device.
- a method for using a fingerprint as a feature amount, a method for representing a feature amount by considering a molecular sequence as a structure having a beginning and an end, and the like are known.
- the conventional technology described above is insufficient to reflect a molecular structure in a feature amount with high accuracy in a case where a partial structure of the molecule has a specific sequence, in a case where the molecule includes a cyclic structure, and the like.
- an object of the present embodiments is to reflect the molecular structure in the feature amount with high accuracy.
- a feature amount calculation program comprising instructions which, when executed by a computer, cause the computer to execute processing including: receiving structure specifying information that specifies a type of each of a plurality of atomic groups and a sequence of the plurality of atomic groups regarding a cyclic molecule in which the plurality of atomic groups classified into a plurality of types is cyclically sequenced; specifying an optional first type and an optional second type in the plurality of types; specifying, on the basis of the structure specifying information, one or a plurality of first atomic groups classified into the first type and one or a plurality of second atomic groups classified into the second type out of the plurality of atomic groups; and calculating, on the basis of the structure specifying information, a number of pairs of the first atomic group and the second atomic group in which a mutual distance in the sequence between the first atomic group and the second atomic group is a predetermined distance.
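The pair-counting step in the paragraph above can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `count_pairs`, the list representation of the ring, and the example sequence are hypothetical, and the distance is assumed to be counted in one direction along the listed order, wrapping around at the end of the list.

```python
def count_pairs(sequence, first_type, second_type, distance):
    """Count pairs (first atomic group, second atomic group) in a cyclic
    sequence where the second group lies `distance` positions after the first.

    Assumption: `sequence` lists the atomic groups in ring order, and the
    distance is measured in the listing direction only.
    """
    size = len(sequence)
    count = 0
    for index, group in enumerate(sequence):
        if group != first_type:
            continue
        # Modulo arithmetic closes the ring: the last group is followed by the first.
        if sequence[(index + distance) % size] == second_type:
            count += 1
    return count


# Hypothetical six-residue cyclic peptide (not taken from the figures).
ring = ["leu", "asp", "lys", "leu", "lys", "asp"]
leu_asp_at_1 = count_pairs(ring, "leu", "asp", 1)
leu_leu_at_3 = count_pairs(ring, "leu", "leu", 3)
```

Because the index wraps around, no residue is treated as a beginning or an end, which is the property the text attributes to cyclic molecules.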
- a feature of a molecular structure is reflected in a feature amount with high accuracy.
- FIG. 1 is a diagram illustrating an example of a feature amount calculation device.
- a feature amount calculation program is installed in a feature amount calculation device 100 of this embodiment, and a function of a feature amount calculation unit 110 is implemented by executing the feature amount calculation program.
- the feature amount calculation unit 110 is described later in detail.
- the feature amount calculation device 100 of this embodiment is connected to, for example, an information processing device 200 and the like via a network and the like.
- the feature amount calculation device 100 of this embodiment calculates a feature amount indicating the molecular structure using the structure specifying information by the feature amount calculation unit 110 and outputs the same to the information processing device 200.
- Structure specifying information 10 is information specifying a structure of a molecule in which a plurality of atomic groups is cyclically sequenced. The structure specifying information 10 is described later in detail.
- the feature amount calculation device 100 acquires information indicating the number of atomic groups of a specific type arranged n atomic groups away for each of a plurality of atomic groups included in the structure specifying information 10 and outputs the same to the information processing device 200.
- the information processing device 200 may include a learning unit and may perform machine learning using the feature amount output from the feature amount calculation device 100.
- a feature amount 30 of this embodiment may be used to predict a substance amount required for drug discovery and the like.
- by expressing the feature amount in this manner, it is possible to reflect, in the feature amount of the molecule, the fact that the specific atomic group is included in the molecule and the fact that the plurality of atomic groups is cyclically sequenced. Therefore, according to this embodiment, the molecular structure may be reflected in the feature amount with high accuracy.
- the atomic group indicates a partial structure in the molecule.
- the partial structure (atomic group) of this embodiment is an amino acid.
- the molecule in this embodiment indicates a cyclic peptide.
- the cyclic peptide is a molecule in which a plurality of amino acids is cyclically sequenced.
- Types of amino acids include, for example, aspartic acid, leucine, lysine and the like.
- aspartic acid might be expressed as "asp", leucine as "leu", and lysine as "lys".
- the structure specifying information 10 is assumed to be input from the information processing device 200, but this is not limited thereto.
- the structure specifying information 10 may be directly input to the feature amount calculation device 100.
- the feature amount 30 is assumed to be output to the information processing device 200, but this is not limited thereto.
- the feature amount 30 may be output to a device other than the information processing device 200.
- an output destination of the feature amount 30 may be, for example, a learning device that performs machine learning using the feature amount 30.
- FIG. 2 is a diagram for illustrating the feature amount of a first embodiment.
- FIG. 2A illustrates an example of the cyclic peptide
- FIG. 2B illustrates an example of the feature amount 30.
- the number of pairs of a certain amino acid and an amino acid located in an n-th position from the amino acid in a sequence of the cyclic peptide 20 is counted. Then, in this embodiment, the feature amount is a matrix in which each row corresponds to a value of n, each column corresponds to the types of amino acids included in the pair, and each component is the number of pairs.
- the value of n is referred to as a "distance" between the amino acids in the cyclic peptide 20.
- the feature amount 30 of this embodiment may be said to be information including, for each amino acid in the cyclic peptide 20, the value of n indicating a distance between a certain amino acid and another amino acid, and the number of other amino acids arranged at a distance n from a certain amino acid.
- this information may be said to be information indicating a positional relationship between each amino acid and another amino acid in a sequence of amino acids included in the cyclic peptide 20.
- the feature amount 30 includes information indicating a type of a certain amino acid and a type of another amino acid located in an n-th position from the certain amino acid.
- the feature amount 30 of this embodiment may be said to be the information indicating a positional relationship between each amino acid and another amino acid in the sequence of amino acids included in the cyclic peptide 20 and the information indicating a type of each amino acid and a type of another amino acid.
- the other amino acid in this embodiment may be the same type of amino acid as the certain amino acid, or may be a different type of amino acid.
- the cyclic peptide 20 illustrated in FIG. 2A has a structure in which leucine (leu), aspartic acid (asp), and lysine (lys) are cyclically sequenced.
- each of the number of pairs (leu-leu) of leucine and leucine located in an n-th position from leucine, the number of pairs (leu-asp) of leucine and aspartic acid located in an n-th position from leucine, and the number of pairs (leu-lys) of leucine and lysine located in an n-th position from leucine is counted.
- the number of other amino acids arranged in the position at the distance n from leucine is counted.
- the other amino acids include leucine, aspartic acid, and lysine.
- the number of pairs (asp-asp) of aspartic acid and aspartic acid located in an n-th position from aspartic acid, the number of pairs (lys-lys) of lysine and lysine located in an n-th position from lysine, and the number of pairs (lys-asp) of lysine and aspartic acid located in an n-th position from lysine are counted.
- the number of other amino acids located in the position at the distance n from aspartic acid and the number of other amino acids located in the position at the distance n from lysine are counted.
- the other amino acids include leucine, aspartic acid, and lysine.
- in a case where n = 1, the total number of leucines arranged in a first position from reference leucine is one.
- in a case where n = 1, the total number of lysines arranged in a first position from reference leucine is three.
- in a case where n = 2, the total number of leucines arranged in a second position from reference leucine is one.
- in a case where n = 3, the total number of leucines arranged in a third position from reference leucine is one.
- possible combinations (pairs) of types of amino acids are specified in a plurality of amino acids included in the cyclic peptide 20. Then, in this embodiment, the matrix in which the types of the amino acids in the specified pair, the distance between the amino acids included in the pair, and the number of pairs for each distance are associated with one another is made the feature amount 30.
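The matrix described above, with one row per distance n and one column per pair of amino acid types, can be sketched as follows. This is an illustrative sketch only: the name `feature_matrix`, the column labels, the alphabetical column order, and the default maximum distance (ring size minus one) are assumptions, as is counting the distance in one direction along the listed order.

```python
from itertools import product


def feature_matrix(sequence, max_distance=None):
    """Return (columns, rows) for a cyclic sequence.

    rows[n - 1][k] is the number of pairs of the type named by columns[k]
    whose members are n positions apart in the listing direction.
    """
    size = len(sequence)
    if max_distance is None:
        max_distance = size - 1  # assumption: stop before a residue pairs with itself
    types = sorted(set(sequence))
    columns = [f"{a}-{b}" for a, b in product(types, repeat=2)]
    rows = []
    for n in range(1, max_distance + 1):
        counts = dict.fromkeys(columns, 0)
        for index, first in enumerate(sequence):
            second = sequence[(index + n) % size]  # wrap around the ring
            counts[f"{first}-{second}"] += 1
        rows.append([counts[column] for column in columns])
    return columns, rows


# Hypothetical three-residue ring with two type-A residues and one type-B residue.
columns, rows = feature_matrix(["A", "A", "B"])
```

Every residue contributes exactly one pair per distance, so each row sums to the number of residues in the ring, which gives a quick sanity check on the output.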
- FIG. 3 is a diagram illustrating an example of the hardware configuration of the feature amount calculation device.
- the feature amount calculation device 100 of this embodiment is a computer including an input device 11, an output device 12, a drive device 13, an auxiliary storage device 14, a memory device 15, an arithmetic processing device 16, and an interface device 17 connected to one another via a bus B1.
- the input device 11 is a device for inputting various types of information, and is implemented by, for example, a keyboard, a pointing device and the like.
- the output device 12 is for outputting various types of information, and is implemented by, for example, a display and the like.
- the interface device 17 includes a local area network (LAN) card and the like, and is used for connecting to a network.
- the feature amount calculation program that implements the feature amount calculation unit 110 included in the feature amount calculation device 100 is at least part of various programs that control the feature amount calculation device 100.
- the feature amount calculation program is provided by, for example, distribution of a recording medium 18, download from the network and the like.
- as the recording medium 18 recording the feature amount calculation program, it is possible to use various types of recording media, such as a recording medium that optically, electrically, or magnetically records information, for example a compact disk read only memory (CD-ROM), a flexible disk, or a magneto-optical disc, and a semiconductor memory that electrically records information, for example a ROM or a flash memory.
- when the recording medium 18 that records the feature amount calculation program is set in the drive device 13, the feature amount calculation program recorded in the recording medium 18 is installed in the auxiliary storage device 14 from the recording medium 18 via the drive device 13.
- the feature amount calculation program downloaded from the network is installed in the auxiliary storage device 14 via the interface device 17.
- the auxiliary storage device 14 stores the feature amount calculation program installed in the feature amount calculation device 100, and also stores various files, data and the like required by the feature amount calculation device 100.
- the memory device 15 reads the feature amount calculation program from the auxiliary storage device 14 at startup of the feature amount calculation device 100, and stores the same. Then, the arithmetic processing device 16 implements various types of processing to be described later in accordance with the feature amount calculation program stored in the memory device 15.
- FIG. 4 is a diagram for illustrating the function of the feature amount calculation unit of the first embodiment.
- the feature amount calculation unit 110 of this embodiment includes an input reception unit 111, a pair specification unit 112, a pair number count unit 113, a feature amount acquisition unit 114, and an output unit 115.
- the input reception unit 111 receives various inputs to the feature amount calculation device 100.
- the input reception unit 111 receives the structure specifying information 10 input to the feature amount calculation device 100.
- the pair specification unit 112 specifies a pair in which amino acids are at a specified distance with reference to the structure specifying information 10.
- the pair number count unit 113 counts the number of specified pairs included in the cyclic peptide.
- the feature amount acquisition unit 114 acquires the feature amount in which the specified pair, the distance between the amino acids included in the pair, and the number counted by the pair number count unit 113 are represented as the matrix.
- the output unit 115 outputs the feature amount acquired by the feature amount acquisition unit 114 to an external device such as the information processing device 200.
- FIG. 5 is a diagram for illustrating the structure specifying information.
- FIG. 5A illustrates an example of the cyclic peptide
- FIG. 5B illustrates an example of the structure specifying information specifying the structure of the cyclic peptide.
- the structure specifying information 10 of this embodiment is information including a type of the amino acid included in the cyclic peptide 20 and a type of an amino acid next to a certain amino acid.
- the cyclic peptide 20 includes three types of amino acids, which are aspartic acid, leucine, and lysine, as illustrated in FIG. 5A . Furthermore, the cyclic peptide 20 includes six amino acids.
- the structure specifying information 10 is a matrix indicating the sequence of the amino acids included in the cyclic peptide 20, and components in each column and each row indicate whether the amino acids indicated by each column and row are next to each other.
- a component in first column and second row and a component in first column and sixth row are "1", and it is understood that aspartic acid is next to leucine and lysine in the cyclic peptide 20.
- a component in second column and second row and a component in second column and third row are "1", and it is understood that leucine arranged next to aspartic acid is also next to lysine in the cyclic peptide 20.
- the structure specifying information 10 of this embodiment may be created in advance by, for example, a user of the information processing device 200 and the like and input to the feature amount calculation device 100.
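As an illustration of the adjacency-style matrix described above, the following sketch builds such a matrix from a ring given in order and then recovers the ring order from it, which corresponds loosely to reading the order of the sequence from the structure specifying information. The function names, the list-of-lists representation, and the example ring are hypothetical, and the sketch assumes a ring of at least three amino acids in which each amino acid has exactly two neighbours.

```python
def adjacency_matrix(sequence):
    """Return a square 0/1 matrix in which component [i][j] is 1 when the
    amino acids at positions i and j are next to each other in the ring."""
    size = len(sequence)
    matrix = [[0] * size for _ in range(size)]
    for i in range(size):
        j = (i + 1) % size  # the last position is adjacent to the first
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix


def ring_order(matrix, start=0):
    """Walk the ring from `start`, always moving to the neighbour that was
    not visited in the previous step, and return the visited positions."""
    order = [start]
    previous, current = None, start
    while len(order) < len(matrix):
        neighbours = [j for j, flag in enumerate(matrix[current])
                      if flag and j != previous]
        previous, current = current, neighbours[0]
        order.append(current)
    return order


# Hypothetical six-residue ring; the types are kept in a separate list.
residues = ["asp", "leu", "lys", "leu", "lys", "leu"]
matrix = adjacency_matrix(residues)
order = ring_order(matrix)
```

Keeping the residue types in a list and the neighbour relation in the matrix mirrors the two pieces of information the text says the structure specifying information carries.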
- FIG. 6 is a flowchart for illustrating the processing of the feature amount calculation device of the first embodiment.
- the feature amount calculation unit 110 of the feature amount calculation device 100 of this embodiment receives an input of the structure specifying information 10 by the input reception unit 111 (step S601). Subsequently, the feature amount calculation device 100 reads the order of the sequence of amino acids from the structure specifying information 10 by the pair specification unit 112 (step S602).
- the pair specification unit 112 specifies a certain type of amino acid (first atomic group), which is one of the amino acids included in the pair, from the sequence of amino acids indicated by the structure specifying information 10 (step S603).
- a type of the amino acid specified at step S603 is sometimes represented by "A", and the amino acid of the type specified at step S603 is sometimes represented by an amino acid A.
- the pair specification unit 112 sets a value of n indicating the distance between the amino acid A and an amino acid paired with the amino acid A to "1" (step S604).
- the pair specification unit 112 specifies an amino acid (second atomic group) arranged n amino acids away from the amino acid A, from the sequence of amino acids indicated by the structure specifying information 10 (step S605).
- a type of the amino acid specified at step S605 is sometimes represented by "B", and the amino acid of the type specified at step S605 is sometimes represented by an amino acid B.
- the pair specification unit 112 specifies the amino acid of the type "B" arranged at a distance n from the amino acid of the type "A" with reference to the amino acid of the type "A".
- the feature amount calculation unit 110 counts the number of amino acids B n amino acids away from the amino acid A by the pair number count unit 113 (step S606).
- the pair number count unit 113 counts the number of pairs including the amino acid A and the amino acid B located in an n-th position from the amino acid A.
- the feature amount calculation unit 110 determines whether the processing from step S603 to step S606 is performed until the value of n reaches a maximum value in the sequence of amino acids indicated by the structure specifying information 10 (step S607).
- the maximum value of n may be the number of amino acids included in the cyclic peptide indicated by the structure specifying information 10.
- the feature amount calculation unit 110 determines whether the processing from step S604 to step S608 is performed for all the types of amino acids included in the structure specifying information 10 (step S609).
- at step S609, in a case where the processing is not performed for all the types of amino acids, the feature amount calculation unit 110 sets a type different from the type specified at step S603 to type "A" (step S610), and returns to step S604.
- the feature amount calculation unit 110 acquires the feature amount 30 in which the number acquired by the pair number count unit 113 is represented by a matrix by the feature amount acquisition unit 114 (step S611).
- the feature amount calculation unit 110 outputs the acquired feature amount 30 to an external device such as the information processing device 200 by the output unit 115 (step S612), and finishes the processing.
- the feature amount calculation device 100 of this embodiment executes processing of receiving the structure specifying information that specifies the type of each of a plurality of atomic groups (amino acids) and the sequence of the plurality of atomic groups regarding the cyclic peptide, which is a cyclic molecule in which the plurality of atomic groups classified into a plurality of types is cyclically sequenced.
- when the feature amount calculation device 100 receives the structure specifying information, it executes processing of specifying an optional first type (amino acid A) and an optional second type (amino acid B) out of the plurality of types, and processing of specifying one or a plurality of first atomic groups classified into the first type and one or a plurality of second atomic groups classified into the second type out of the plurality of atomic groups, on the basis of the structure specifying information.
- the feature amount calculation device 100 executes processing of calculating the number of pairs of the first atomic group and the second atomic group in which a distance n therebetween in the sequence of the first atomic group and the second atomic group is a predetermined distance, on the basis of the structure specifying information.
- FIG. 7A is a first diagram for illustrating processing using the feature amount.
- FIG. 7B is a second diagram for illustrating processing using the feature amount.
- FIG. 7C is a third diagram for illustrating processing using the feature amount.
- FIG. 7A , FIG. 7B , and FIG. 7C illustrate a case where the feature amount is acquired by applying this embodiment regarding a cyclic peptide 71, a cyclic peptide 72, and a cyclic peptide 73, respectively.
- a feature amount 31 illustrated in FIG. 7A is a feature amount acquired by applying this embodiment regarding the cyclic peptide 71 including two amino acids A and one amino acid B.
- a feature amount 32 illustrated in FIG. 7B is a feature amount acquired by applying this embodiment regarding the cyclic peptide 72 including three amino acids A and one amino acid B.
- a feature amount 33 illustrated in FIG. 7C is a feature amount acquired by applying this embodiment to the cyclic peptide 73 including two amino acids A and three amino acids B.
- the information processing device 200 calculated the similarity of the cyclic peptides 71, 72, and 73 on the basis of the feature amounts 31, 32, and 33 calculated by the feature amount calculation device 100.
- the similarity of the cyclic peptides 71, 72, and 73 was calculated using a cosine similarity formula.
- the cosine similarity formula is a method of regarding a matrix as a vector in one row and calculating the similarity from an angle formed between the vectors.
- the similarity between the cyclic peptide 71 and the cyclic peptide 72 was 0.77
- the similarity between the cyclic peptide 71 and the cyclic peptide 73 was 0.51
- the similarity between the cyclic peptide 72 and the cyclic peptide 73 was 0.50.
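The cosine similarity calculation described above, in which each matrix is regarded as a single-row vector, can be sketched as follows. The function name and the example matrices are hypothetical and do not reproduce the feature amounts 31, 32, and 33; the sketch also assumes that both matrices have already been brought to the same shape (for example, the same set of rows and columns), which the text does not spell out.

```python
import math


def cosine_similarity(matrix_a, matrix_b):
    """Flatten two equally shaped matrices row by row and return the cosine
    of the angle between the resulting vectors."""
    vector_a = [value for row in matrix_a for value in row]
    vector_b = [value for row in matrix_b for value in row]
    if len(vector_a) != len(vector_b):
        raise ValueError("feature matrices must have the same shape")
    dot = sum(x * y for x, y in zip(vector_a, vector_b))
    norm_a = math.sqrt(sum(x * x for x in vector_a))
    norm_b = math.sqrt(sum(y * y for y in vector_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an all-zero matrix
    return dot / (norm_a * norm_b)


# Hypothetical feature matrices (rows: distance n, columns: pair types).
same = cosine_similarity([[1, 1], [1, 0]], [[1, 1], [1, 0]])
disjoint = cosine_similarity([[1, 0], [0, 0]], [[0, 1], [0, 0]])
```

Since pair counts are never negative, the similarity of two such feature matrices lies between 0 and 1, with 1 indicating identical direction of the flattened vectors.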
- the information processing device 200 may perform machine learning on the basis of teacher data including attribute values of known cyclic peptides, and estimate the attribute values of the cyclic peptides 71, 72, and 73 on the basis of the feature amounts 31, 32, and 33 calculated by the feature amount calculation device 100. Furthermore, the information processing device 200 may perform machine learning on the basis of information regarding the feature amounts 31, 32, and 33 and the attribute values of the cyclic peptides 71, 72, and 73.
- the information processing device 200 is a computer including an input device, an output device, a drive device, an auxiliary storage device, a memory device, an arithmetic processing device, and an interface device mutually connected to one another via a bus.
- the second embodiment is different from the first embodiment in specifying whether a distance n between amino acids is made a distance in a first direction of a cycle in a cyclic molecule or a distance in a second direction opposite to the first direction.
- a component having a functional configuration similar to that in the first embodiment is denoted by a reference sign similar to the reference sign used in the description of the first embodiment, and the description thereof is omitted.
- FIGs. 8A to 8C are diagrams for illustrating a feature amount of the second embodiment.
- FIGs. 8A and 8B illustrate a state in which, in a sequence of amino acids, an amino acid of a type "A", an amino acid of a type "B", and an amino acid of a type "C" are bonded by amide bonds (-NHCO-).
- FIG. 8A illustrates an example of a case where the amino acid A, the amino acid B, and the amino acid C are sequenced in this order in the clockwise direction (direction of arrow Y1).
- an N-terminus of the amino acid A is bonded to a C-terminus of the amino acid B
- an N-terminus of the amino acid B is bonded to a C-terminus of the amino acid C.
- FIG. 8B illustrates a case where the amino acid A, the amino acid B, and the amino acid C are sequenced in this order in the counterclockwise direction (direction of arrow Y2).
- a C-terminus of the amino acid A is bonded to an N-terminus of the amino acid B
- a C-terminus of the amino acid B is bonded to an N-terminus of the amino acid C.
- a feature amount of a cyclic peptide is calculated on the basis of the structure specifying information 10 and the direction specifying information.
- a cyclic peptide 80 illustrated in FIG. 8C includes the amino acid A, the amino acid B, the amino acid C, and two other amino acids.
- in the cyclic peptide 80, the pairs of the amino acid A and the amino acid C are the pair of the amino acid A and the amino acid C located two amino acids away from the amino acid A in the clockwise direction, and the pair of the amino acid C and the amino acid A located three amino acids away from the amino acid C in the clockwise direction.
- the sequence of the amino acids may be expressed more accurately.
- FIG. 9 is a diagram for illustrating a function of the feature amount calculation unit of the second embodiment.
- the feature amount calculation unit 110A of this embodiment includes an input reception unit 111, a pair specification unit 112A, a pair number count unit 113, a feature amount acquisition unit 114, an output unit 115, and a direction specification unit 116.
- the pair specification unit 112A specifies another amino acid located in a position at a distance n from a certain amino acid in a direction specified by the direction specification unit 116 as an amino acid paired with the certain amino acid.
- the direction specification unit 116 specifies the direction when counting the distance between the amino acids in the cyclic peptide on the basis of the direction specifying information input from the information processing device 200 and the like.
- FIG. 10 is a flowchart for illustrating processing of the feature amount calculation device of the second embodiment.
- the feature amount calculation unit 110A of this embodiment receives an input of the structure specifying information 10 by the input reception unit 111 (step S1001). Subsequently, the feature amount calculation unit 110A receives an input of the direction specifying information by the input reception unit 111 (step S1002).
- since processing from step S1003 to step S1005 in FIG. 10 is similar to the processing from step S602 to step S604 in FIG. 6, the description thereof is omitted.
- the feature amount calculation unit 110A refers to the direction specifying information input at step S1002 by the pair specification unit 112A, specifies the amino acid arranged in a position at the distance n in the specified direction from the amino acid of the type "A" (step S1006), and shifts to step S1007.
- since processing from step S1007 to step S1013 in FIG. 10 is similar to the processing from step S606 to step S612 in FIG. 6, the description thereof is omitted.
- the structure of the cyclic peptide formed by the sequence of the amino acids may be reflected in the feature amount with high accuracy.
Abstract
Description
- The embodiments relate to a feature amount calculation program, a feature amount calculation method, and a feature amount calculation device.
- Recently, in a field of drug discovery, machine learning has been attracting attention as a method for searching for candidate molecules, and a technology of specifying a feature amount that may be handled by machine learning on the basis of a molecular structure is being considered.
- For example, a method that uses a fingerprint as a feature amount, and a method that represents a feature amount by treating a molecular sequence as a structure having a beginning and an end, are known.
- Japanese National Publication of International Patent Application No. 2012-509848 and Japanese National Publication of International Patent Application No. 2020-517290 are disclosed as related art.
- Tajimi et al., BMC Bioinformatics 2018, 19 (Suppl 19): 527; X. Yang et al., Computational and Structural Biotechnology Journal 18 (2020) 153-161; and Carhart et al., J. Chem. Inf., 1985 are also disclosed as related art.
- The conventional technology described above is insufficient to reflect a molecular structure in a feature amount with high accuracy in cases where a partial structure of the molecule has a specific sequence, where the molecule includes a cyclic structure, and the like.
- In one aspect, an object of the present embodiments is to reflect the molecular structure in the feature amount with high accuracy.
- According to an aspect of the embodiments, there is provided a feature amount calculation program comprising instructions which, when executed by a computer, cause the computer to execute processing including: receiving structure specifying information that specifies a type of each of a plurality of atomic groups and a sequence of the plurality of atomic groups regarding a cyclic molecule in which the plurality of atomic groups classified into a plurality of types is cyclically sequenced; specifying an optional first type and an optional second type in the plurality of types; specifying, on the basis of the structure specifying information, one or a plurality of first atomic groups classified into the first type and one or a plurality of second atomic groups classified into the second type out of the plurality of atomic groups; and calculating, on the basis of the structure specifying information, a number of pairs of the first atomic group and the second atomic group in which a mutual distance in the sequence between the first atomic group and the second atomic group is a predetermined distance.
- A feature of a molecular structure is reflected in a feature amount with high accuracy.
- FIG. 1 is a diagram illustrating an example of a feature amount calculation device;
- FIG. 2 (i.e., FIGs. 2A and 2B) is a diagram for illustrating a feature amount of a first embodiment;
- FIG. 3 is a diagram illustrating an example of a hardware configuration of the feature amount calculation device;
- FIG. 4 is a diagram for illustrating a function of a feature amount calculation unit of the first embodiment;
- FIG. 5 (i.e., FIGs. 5A and 5B) is a diagram for illustrating structure specifying information;
- FIG. 6 is a flowchart for illustrating processing of the feature amount calculation device of the first embodiment;
- FIG. 7A is a first diagram for illustrating processing using the feature amount;
- FIG. 7B is a second diagram for illustrating processing using the feature amount;
- FIG. 7C is a third diagram for illustrating processing using the feature amount;
- FIGs. 8A to 8C are diagrams for illustrating a feature amount of a second embodiment;
- FIG. 9 is a diagram for illustrating a function of a feature amount calculation unit of the second embodiment; and
- FIG. 10 is a flowchart for illustrating processing of a feature amount calculation device of the second embodiment.
- Hereinafter, embodiments are described with reference to the drawings.
- FIG. 1 is a diagram illustrating an example of a feature amount calculation device.
- A feature amount calculation program is installed in a feature amount calculation device 100 of this embodiment, and a function of a feature amount calculation unit 110 is implemented by executing the feature amount calculation program. The feature amount calculation unit 110 is described later in detail.
- The feature amount calculation device 100 of this embodiment is connected to, for example, an information processing device 200 and the like via a network and the like.
- When structure specifying information specifying a molecular structure is input from the information processing device 200, the feature amount calculation device 100 of this embodiment uses the feature amount calculation unit 110 to calculate a feature amount indicating the molecular structure from the structure specifying information, and outputs the feature amount to the information processing device 200.
- Structure specifying information 10 is information specifying a structure of a molecule in which a plurality of atomic groups is cyclically sequenced. The structure specifying information 10 is described later in detail.
- When the structure specifying information 10 is input, the feature amount calculation device 100 acquires, for each of the plurality of atomic groups included in the structure specifying information 10, information indicating the number of atomic groups of a specific type arranged n atomic groups away, and outputs the information to the information processing device 200.
- The information processing device 200 may include a learning unit and may perform machine learning using the feature amount output from the feature amount calculation device 100. For example, a feature amount 30 of this embodiment may be used to predict a substance amount required for drug discovery and the like.
- In this embodiment, by expressing the feature amount in this manner, both the fact that the specific atomic group is included in the molecule and the fact that the plurality of atomic groups is cyclically sequenced can be reflected in the feature amount of the molecule. Therefore, according to this embodiment, the molecular structure may be reflected in the feature amount with high accuracy.
- Note that, in this embodiment, the atomic group indicates a partial structure in the molecule. For example, the partial structure (atomic group) of this embodiment is an amino acid. Furthermore, the molecule in this embodiment indicates a cyclic peptide. For example, the cyclic peptide is a molecule in which a plurality of amino acids is cyclically sequenced.
- Types of amino acids include, for example, aspartic acid, leucine, lysine and the like. In the following description, aspartic acid might be expressed as "asp", leucine as "leu", and lysine as "lys".
- Note that, in the example in FIG. 1, the structure specifying information 10 is assumed to be input from the information processing device 200, but this is not limited thereto. The structure specifying information 10 may be directly input to the feature amount calculation device 100.
- Furthermore, in the example in FIG. 1, the feature amount 30 is assumed to be output to the information processing device 200, but this is not limited thereto. The feature amount 30 may be output to a device other than the information processing device 200. Furthermore, an output destination of the feature amount 30 may be, for example, a learning device that performs machine learning using the feature amount 30.
- Hereinafter, the feature amount of this embodiment is described with reference to FIG. 2. FIG. 2 (i.e., FIGs. 2A and 2B) is a diagram for illustrating the feature amount of a first embodiment. FIG. 2A illustrates an example of the cyclic peptide, and FIG. 2B illustrates an example of the feature amount 30.
- In this embodiment, for each amino acid, that is, each atomic group included in a cyclic peptide 20, the number of pairs of a certain amino acid and an amino acid located in an n-th position from that amino acid in the sequence of the cyclic peptide 20 is counted. Then, in this embodiment, a matrix whose rows correspond to values of n, whose columns correspond to information indicating the types of the amino acids included in the pair, and whose components are the numbers of pairs is used as the feature amount.
- Here, in this specification, the value of n is referred to as a "distance" between the amino acids in the cyclic peptide 20. In this case, the feature amount 30 of this embodiment may be said to be information including, for each amino acid in the cyclic peptide 20, the value of n indicating a distance between a certain amino acid and another amino acid, and the number of other amino acids arranged at a distance n from the certain amino acid. For example, this information may be said to be information indicating a positional relationship between each amino acid and another amino acid in the sequence of amino acids included in the cyclic peptide 20.
- Furthermore, the feature amount 30 includes information indicating a type of a certain amino acid and a type of another amino acid located in an n-th position from the certain amino acid.
- For example, the feature amount 30 of this embodiment may be said to be the information indicating a positional relationship between each amino acid and another amino acid in the sequence of amino acids included in the cyclic peptide 20, together with the information indicating a type of each amino acid and a type of the other amino acid.
- Note that the other amino acid in this embodiment may be the same type of amino acid as the certain amino acid, or may be a different type of amino acid.
- The cyclic peptide 20 illustrated in FIG. 2A has a structure in which leucine (leu), aspartic acid (asp), and lysine (lys) are cyclically sequenced.
- Therefore, in this embodiment, the number of pairs (leu-leu) of leucine and leucine located in an n-th position from leucine, the number of pairs (leu-asp) of leucine and aspartic acid located in an n-th position from leucine, and the number of pairs (leu-lys) of leucine and lysine located in an n-th position from leucine are each counted.
- For example, in this embodiment, the number of other amino acids arranged in the position at the distance n from leucine is counted. Here, the other amino acids include leucine, aspartic acid, and lysine.
- Moreover, in this embodiment, the number of pairs (asp-asp) of aspartic acid and aspartic acid located in an n-th position from aspartic acid, the number of pairs (lys-lys) of lysine and lysine located in an n-th position from lysine, and the number of pairs (lys-asp) of lysine and aspartic acid located in an n-th position from lysine are counted.
- For example, in this embodiment, the number of other amino acids located in the position at the distance n from aspartic acid and the number of other amino acids located in the position at the distance n from lysine are counted. Here, the other amino acids include leucine, aspartic acid, and lysine.
- For example, in the cyclic peptide 20 illustrated in FIG. 2, there is only one pair 21 as the "leu-leu" pair in which n = 1. For example, in the cyclic peptide 20, taking each leucine as a reference, the total number of leucines arranged in a first position from the reference leucine is one.
- Furthermore, in the cyclic peptide 20, there are three "leu-lys" pairs in which n = 1. For example, in the cyclic peptide 20, taking each leucine as a reference, the total number of lysines arranged in a first position from the reference leucine is three.
- Similarly, in the cyclic peptide 20, there is one "leu-leu" pair in which n = 2. For example, in the cyclic peptide 20, taking each leucine as a reference, the total number of leucines arranged in a second position from the reference leucine is one.
- Furthermore, in the cyclic peptide 20, there is one "leu-leu" pair in which n = 3. For example, in the cyclic peptide 20, taking each leucine as a reference, the total number of leucines arranged in a third position from the reference leucine is one.
- In this manner, in this embodiment, the possible combinations (pairs) of types of amino acids are specified among the plurality of amino acids included in the cyclic peptide 20. Then, in this embodiment, the matrix in which the types of the amino acids in the specified pair, the distance between the amino acids included in the pair, and the number of pairs for each distance are associated with one another is used as the feature amount 30.
- Therefore, according to this embodiment, it is possible to create a feature amount specialized for the cyclic peptide, and to reflect the feature of the structure of the cyclic peptide in the feature amount with high accuracy. Accordingly, this embodiment may contribute to acceleration of drug discovery by machine learning using this feature amount.
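The counting described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the embodiment: the function name `pair_count_matrix`, the use of the shorter way round the ring as the distance n, and the treatment of a pair as unordered are all assumptions. The six-residue sequence is likewise an assumption, chosen only because it is consistent with the counts quoted above; the actual sequence of FIG. 2A is not reproduced in the text.

```python
from collections import Counter
from itertools import combinations


def pair_count_matrix(sequence):
    """Count unordered residue pairs in a ring, keyed by (n, type pair).

    Assumption: the distance n between two residues is the number of
    steps along the shorter way round the ring.
    """
    size = len(sequence)
    counts = Counter()
    for i, j in combinations(range(size), 2):
        gap = j - i
        n = min(gap, size - gap)  # shorter way round the ring
        pair = tuple(sorted((sequence[i], sequence[j])))  # unordered type pair
        counts[(n, pair)] += 1
    return counts


# Hypothetical ring, consistent with the counts quoted in the text.
ring = ["asp", "leu", "lys", "leu", "leu", "lys"]
counts = pair_count_matrix(ring)

# Lay the counts out as a matrix: one row per n, one column per type pair.
columns = sorted({pair for _, pair in counts})
rows = sorted({n for n, _ in counts})
matrix = [[counts[(n, pair)] for pair in columns] for n in rows]

for n, row in zip(rows, matrix):
    print(n, dict(zip(["-".join(p) for p in columns], row)))
```

With this assumed sequence the sketch gives one leu-leu pair at each of n = 1, 2, and 3 and three leu-lys pairs at n = 1, in line with the counts described above.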
- Hereinafter, a hardware configuration of the feature amount calculation device 100 of this embodiment is described with reference to FIG. 3. FIG. 3 is a diagram illustrating an example of the hardware configuration of the feature amount calculation device.
- The feature amount calculation device 100 of this embodiment is a computer including an input device 11, an output device 12, a drive device 13, an auxiliary storage device 14, a memory device 15, an arithmetic processing device 16, and an interface device 17, which are connected to one another via a bus B1.
- The input device 11 is a device for inputting various types of information, and is implemented by, for example, a keyboard, a pointing device, and the like. The output device 12 is a device for outputting various types of information, and is implemented by, for example, a display and the like. The interface device 17 includes a local area network (LAN) card and the like, and is used for connecting to a network.
- The feature amount calculation program that implements the feature amount calculation unit 110 included in the feature amount calculation device 100 is at least part of various programs that control the feature amount calculation device 100. The feature amount calculation program is provided by, for example, distribution of a recording medium 18, download from the network, and the like. As the recording medium 18 recording the feature amount calculation program, various types of recording media may be used, such as a recording medium that optically, electrically, or magnetically records information (for example, a compact disk read only memory (CD-ROM), a flexible disk, or a magneto-optical disc) and a semiconductor memory that electrically records information (for example, a ROM or a flash memory).
- When the recording medium 18 that records the feature amount calculation program is set in the drive device 13, the feature amount calculation program recorded in the recording medium 18 is installed in the auxiliary storage device 14 from the recording medium 18 via the drive device 13. The feature amount calculation program downloaded from the network is installed in the auxiliary storage device 14 via the interface device 17.
- The auxiliary storage device 14 stores the feature amount calculation program installed in the feature amount calculation device 100, and also stores various files, data, and the like required by the feature amount calculation device 100. The memory device 15 reads the feature amount calculation program from the auxiliary storage device 14 at startup of the feature amount calculation device 100, and stores the same. Then, the arithmetic processing device 16 implements various types of processing to be described later in accordance with the feature amount calculation program stored in the memory device 15.
- Next, a function of the feature
amount calculation unit 110 of this embodiment is described with reference to FIG. 4. FIG. 4 is a diagram for illustrating the function of the feature amount calculation unit of the first embodiment.
- The feature amount calculation unit 110 of this embodiment includes an input reception unit 111, a pair specification unit 112, a pair number count unit 113, a feature amount acquisition unit 114, and an output unit 115.
- The input reception unit 111 receives various inputs to the feature amount calculation device 100. For example, the input reception unit 111 receives the structure specifying information 10 input to the feature amount calculation device 100.
- The pair specification unit 112 refers to the structure specifying information 10 and specifies a pair of amino acids that are at a specified distance from each other.
- The pair number count unit 113 counts the number of specified pairs included in the cyclic peptide.
- The feature amount acquisition unit 114 acquires the feature amount in which the specified pair, the distance between the amino acids included in the pair, and the number counted by the pair number count unit 113 are represented as the matrix.
- The output unit 115 outputs the feature amount acquired by the feature amount acquisition unit 114 to an external device such as the information processing device 200.
- Next, the
structure specifying information 10 of this embodiment is described with reference to FIG. 5. FIG. 5 (i.e., FIGs. 5A and 5B) is a diagram for illustrating the structure specifying information. FIG. 5A illustrates an example of the cyclic peptide, and FIG. 5B illustrates an example of the structure specifying information specifying the structure of the cyclic peptide.
- The structure specifying information 10 of this embodiment is information including a type of the amino acid included in the cyclic peptide 20 and a type of an amino acid next to a certain amino acid.
- For example, the cyclic peptide 20 includes three types of amino acids, which are aspartic acid, leucine, and lysine, as illustrated in FIG. 5A. Furthermore, the cyclic peptide 20 includes six amino acids.
- The structure specifying information 10 is a matrix indicating the sequence of the amino acids included in the cyclic peptide 20, and the component in each column and each row indicates whether the amino acids indicated by that column and row are next to each other.
- In this embodiment, in the structure specifying information 10, a component of "0" indicates that the amino acids indicated by its column and row are not next to each other (distance n = 2 or longer), and a component of "1" indicates that the amino acids indicated by its column and row are next to each other (distance n = 1).
- For example, in the structure specifying information 10 in FIG. 5B, a component in first column and second row and a component in first column and sixth row are "1", and it is understood that aspartic acid is next to leucine and lysine in the cyclic peptide 20. Furthermore, in the structure specifying information 10, a component in second column and second row and a component in second column and third row are "1", and it is understood that leucine arranged next to aspartic acid is also next to lysine in the cyclic peptide 20.
- The structure specifying information 10 of this embodiment may be created in advance by, for example, a user of the information processing device 200 and the like and input to the feature amount calculation device 100.
- Next, processing of the feature
amount calculation device 100 of this embodiment is described with reference to FIG. 6. FIG. 6 is a flowchart for illustrating the processing of the feature amount calculation device of the first embodiment.
- The feature amount calculation unit 110 of the feature amount calculation device 100 of this embodiment receives an input of the structure specifying information 10 by the input reception unit 111 (step S601). Subsequently, the feature amount calculation device 100 reads the order of the sequence of amino acids from the structure specifying information 10 by the pair specification unit 112 (step S602).
- Subsequently, the pair specification unit 112 specifies a certain type of amino acid (first atomic group), which is one of the amino acids included in the pair, from the sequence of amino acids indicated by the structure specifying information 10 (step S603).
- In the following description, the type of the amino acid specified at step S603 is sometimes represented by "A", and an amino acid of the type specified at step S603 is sometimes referred to as an amino acid A.
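Step S602 above reads the order of the sequence from the structure specifying information, which the text describes as a 0/1 matrix marking amino acids that are next to each other. One way such a matrix could be walked to recover the ring order is sketched below. The function names `ring_adjacency` and `ring_order`, the choice of starting residue, and the direction of travel are assumptions made for illustration; the embodiment does not state how the order is read.

```python
def ring_adjacency(size):
    """Build a 0/1 matrix for a ring: entry [i][j] is 1 if i and j are neighbours."""
    matrix = [[0] * size for _ in range(size)]
    for i in range(size):
        j = (i + 1) % size
        matrix[i][j] = matrix[j][i] = 1
    return matrix


def ring_order(adjacency):
    """Walk the ring once and return residue indices in sequence order.

    Assumes a single ring of at least three residues, so every residue has
    exactly two neighbours. The walk starts at index 0 and heads towards
    its lowest-numbered neighbour (an arbitrary choice).
    """
    size = len(adjacency)
    order = [0]
    previous = None
    current = 0
    while len(order) < size:
        neighbours = [j for j in range(size) if adjacency[current][j] == 1]
        # Step to the neighbour we did not just come from.
        following = next(j for j in neighbours if j != previous)
        order.append(following)
        previous, current = current, following
    return order


# Hypothetical labels for a six-residue ring; the matrix rows follow this order.
labels = ["asp", "leu", "lys", "leu", "leu", "lys"]
order = ring_order(ring_adjacency(len(labels)))
sequence = [labels[i] for i in order]
print(sequence)
```

The recovered list of types is what the later steps iterate over when they look for the amino acid located n positions away.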
- Subsequently, the pair specification unit 112 sets the value of n, which indicates the distance between the amino acid A and an amino acid paired with the amino acid A, to "1" (step S604).
- Subsequently, the pair specification unit 112 specifies an amino acid (second atomic group) arranged n amino acids away from the amino acid A, from the sequence of amino acids indicated by the structure specifying information 10 (step S605).
- In the following description, the type of the amino acid specified at step S605 is sometimes represented by "B", and an amino acid of the type specified at step S605 is sometimes referred to as an amino acid B.
- For example, in the sequence of amino acids indicated by the structure specifying information 10, the pair specification unit 112 specifies the amino acid of the type "B" arranged at a distance n from the amino acid of the type "A" with reference to the amino acid of the type "A".
- Subsequently, the feature amount calculation unit 110 counts the number of amino acids B located n amino acids away from the amino acid A by the pair number count unit 113 (step S606).
- For example, the pair number count unit 113 counts the number of pairs including the amino acid A and the amino acid B located in an n-th position from the amino acid A.
- Subsequently, the feature amount calculation unit 110 determines whether the processing from step S603 to step S606 has been performed until the value of n reaches a maximum value in the sequence of amino acids indicated by the structure specifying information 10 (step S607). The maximum value of n may be the number of amino acids included in the cyclic peptide indicated by the structure specifying information 10.
- At step S607, in a case where the value of n has not reached the maximum value, the feature amount calculation unit 110 sets n = n + 1 (step S608) and returns to step S605.
- At step S607, in a case where the value of n has reached the maximum value, the feature amount calculation unit 110 determines whether the processing from step S604 to step S608 has been performed for all the types of amino acids included in the structure specifying information 10 (step S609).
- At step S609, in a case where the processing has not been performed for all the types of amino acids, the feature amount calculation unit 110 sets a type different from the type specified at step S603 as the type "A" (step S610), and returns to step S604.
- At step S609, in a case where the processing has been performed for all the types of amino acids, the feature amount calculation unit 110 acquires, by the feature amount acquisition unit 114, the feature amount 30 in which the numbers acquired by the pair number count unit 113 are represented by a matrix (step S611).
- Subsequently, the feature
amount calculation unit 110 outputs the acquired feature amount 30 to an external device such as the information processing device 200 by the output unit 115 (step S612), and finishes the processing.
- In this manner, the feature amount calculation device 100 of this embodiment executes processing of receiving the structure specifying information that specifies the type of each of a plurality of atomic groups (amino acids) and the sequence of the plurality of atomic groups regarding the cyclic peptide, which is a cyclic molecule in which the plurality of atomic groups classified into a plurality of types is cyclically sequenced. Furthermore, when the feature amount calculation device 100 receives the structure specifying information, it executes processing of specifying an optional first type (amino acid A) and an optional second type (amino acid B) out of the plurality of types, and processing of specifying one or a plurality of first atomic groups classified into the first type and one or a plurality of second atomic groups classified into the second type out of the plurality of atomic groups, on the basis of the structure specifying information. Moreover, the feature amount calculation device 100 executes processing of calculating, on the basis of the structure specifying information, the number of pairs of the first atomic group and the second atomic group in which the distance n between the first atomic group and the second atomic group in the sequence is a predetermined distance.
- It is possible to calculate mutual similarity of a plurality of cyclic peptides on the basis of the feature amount acquired by applying this embodiment, and to apply the feature amount to processing of machine learning and the like.
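To compare feature amounts of rings that differ in size or composition, the counts first have to be placed in a common layout. The sketch below does this by treating each feature amount as a sparse vector keyed by (n, type pair) and comparing two such vectors with cosine similarity. Cosine similarity is a stand-in chosen for illustration: this passage does not name the measure used, and the function name `cosine_similarity`, the dictionary layout, and the sample values are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors stored as dicts.

    Keys missing from one dict are treated as zero, so feature amounts
    with different ranges of n or different type pairs can be compared.
    """
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an empty feature amount
    return dot / (norm_a * norm_b)


# Hypothetical feature amounts keyed by (n, (type, type)).
small = {(1, ("A", "A")): 1, (1, ("A", "B")): 2}
large = {
    (1, ("A", "A")): 2,
    (1, ("A", "B")): 2,
    (2, ("A", "A")): 1,
    (2, ("A", "B")): 1,
}

print(round(cosine_similarity(small, large), 3))
```

Because a different measure may have been used in the embodiment, values produced by this sketch are not expected to match any similarity values reported in the document.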
- FIG. 7A is a first diagram for illustrating processing using the feature amount. FIG. 7B is a second diagram for illustrating processing using the feature amount. FIG. 7C is a third diagram for illustrating processing using the feature amount.
- FIG. 7A, FIG. 7B, and FIG. 7C illustrate cases where the feature amount is acquired by applying this embodiment to a cyclic peptide 71, a cyclic peptide 72, and a cyclic peptide 73, respectively.
- A feature amount 31 illustrated in FIG. 7A is a feature amount acquired by applying this embodiment to the cyclic peptide 71 including two amino acids A and one amino acid B. Furthermore, a feature amount 32 illustrated in FIG. 7B is a feature amount acquired by applying this embodiment to the cyclic peptide 72 including three amino acids A and one amino acid B. Furthermore, a feature amount 33 illustrated in FIG. 7C is a feature amount acquired by applying this embodiment to the cyclic peptide 73 including two amino acids A and three amino acids B.
- In this embodiment, the information processing device 200 calculated the similarity of the cyclic peptides 71, 72, and 73 on the basis of the feature amounts output from the feature amount calculation device 100. For example, in this embodiment, the similarity of the cyclic peptides [...]
- In the example of FIGs. 7A to 7C, the similarity between the cyclic peptide 71 and the cyclic peptide 72 was 0.77, the similarity between the cyclic peptide 71 and the cyclic peptide 73 was 0.51, and the similarity between the cyclic peptide 72 and the cyclic peptide 73 was 0.50.
- In this manner, by using the feature amount to which this embodiment is applied, the similarity between cyclic peptides may be compared and examined regardless of the size and the like of the cyclic peptides. Furthermore, the information processing device 200 may perform machine learning on the basis of teacher data including attribute values of known cyclic peptides, and estimate the attribute values of the cyclic peptides 71, 72, and 73 on the basis of the feature amounts output from the feature amount calculation device 100. Furthermore, the information processing device 200 may perform machine learning on the basis of information regarding the feature amounts 31, 32, and 33 and the attribute values of the cyclic peptides 71, 72, and 73. The information processing device 200 is a computer including an input device, an output device, a drive device, an auxiliary storage device, a memory device, an arithmetic processing device, and an interface device connected to one another via a bus.
- Hereinafter, a second embodiment is described with reference to the drawings. The second embodiment differs from the first embodiment in that it specifies whether the distance n between amino acids is a distance in a first direction around the cycle of a cyclic molecule or a distance in a second direction opposite to the first direction. In the description of the second embodiment below, the differences from the first embodiment are described; a component having a functional configuration similar to that in the first embodiment is denoted by the same reference sign as in the description of the first embodiment, and the description thereof is omitted.
- FIGs. 8A to 8C are diagrams for illustrating a feature amount of the second embodiment. FIGs. 8A and 8B illustrate a state in which, in a sequence of amino acids, an amino acid of a type "A", an amino acid of a type "B", and an amino acid of a type "C" are bonded by an amide bond (-NHCO-).
- In this case, since the amino acids are bonded to each other by the amide bond, the structure differs between a case where the amino acid A, the amino acid B, and the amino acid C are sequenced in this order in a clockwise direction on the drawing and a case where the amino acid A, the amino acid B, and the amino acid C are sequenced in this order in a counterclockwise direction on the drawing.
- FIG. 8A illustrates an example of a case where the amino acid A, the amino acid B, and the amino acid C are sequenced in this order in the clockwise direction (direction of arrow Y1). In this case, an N-terminus of the amino acid A is bonded to a C-terminus of the amino acid B, and an N-terminus of the amino acid B is bonded to a C-terminus of the amino acid C.
- FIG. 8B illustrates a case where the amino acid A, the amino acid B, and the amino acid C are sequenced in this order in the counterclockwise direction (direction of arrow Y2). In this case, a C-terminus of the amino acid A is bonded to an N-terminus of the amino acid B, and a C-terminus of the amino acid B is bonded to an N-terminus of the amino acid C.
- Therefore, a pair of the amino acid A and the amino acid B with a distance n = 1 in FIG. 8A and a pair of the amino acid A and the amino acid B with a distance n = 1 in FIG. 8B have different structures.
- In this embodiment, focusing on this point, when determining the pair of amino acids, it is specified whether the distance between the amino acids is a distance in the clockwise direction or a distance in the counterclockwise direction. For example, in this embodiment, together with the structure specifying information 10, an input of direction specifying information specifying whether the distance between the amino acids is the distance in the clockwise direction or the distance in the counterclockwise direction is accepted.
- Then, in this embodiment, a feature amount of a cyclic peptide is calculated on the basis of the structure specifying information 10 and the direction specifying information.
- Furthermore, in this embodiment, since the direction of the distance between the amino acids is specified by the direction specifying information, even if the amino acids included in the pairs are the same, they are counted as different pairs.
- A cyclic peptide 80 illustrated in FIG. 8C includes the amino acid A, the amino acid B, the amino acid C, and two other amino acids.
- In this case, when the distance between the amino acids is measured in the clockwise direction, the pairs of the amino acid A and the amino acid C are the pair of the amino acid A and the amino acid C located two amino acids away from the amino acid A in the clockwise direction, and the pair of the amino acid C and the amino acid A located three amino acids away from the amino acid C in the clockwise direction.
- For example, in the cyclic peptide 80, when the distance in the clockwise direction is used as the distance between the amino acids, the pairs including the amino acid A and the amino acid C are the pair of the amino acid A and the amino acid C with a distance n = 2 and the pair of the amino acid C and the amino acid A with a distance n = 3.
- In this manner, in this embodiment, even when the types of the amino acids included in the pairs are the same, the pairs are counted separately because the direction in which the distance is measured is specified. Therefore, in this embodiment, the sequence of the amino acids may be expressed more accurately.
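The direction-dependent counting described for the cyclic peptide 80 can be illustrated with a minimal Python sketch. The function name `count_directed_pairs`, the list representation of the ring (types listed in clockwise order), and the labels "X" and "Y" for the two other amino acids are assumptions made for illustration only; the arrangement of the amino acids in FIG. 8C is also assumed to be A, B, C, X, Y in the clockwise direction.

```python
from collections import Counter


def count_directed_pairs(sequence, n, direction="clockwise"):
    """Count ordered pairs (first, second) in a cyclic sequence.

    `sequence` lists the amino acid types in clockwise order (assumed
    representation). `second` is the amino acid located n positions away
    from `first` in the given direction.
    """
    size = len(sequence)
    if not 1 <= n < size:
        raise ValueError("n must satisfy 1 <= n < len(sequence)")
    if direction not in ("clockwise", "counterclockwise"):
        raise ValueError("direction must be 'clockwise' or 'counterclockwise'")
    # Moving counterclockwise is modelled as a negative step on the ring.
    step = n if direction == "clockwise" else -n
    counts = Counter()
    for i, first in enumerate(sequence):
        second = sequence[(i + step) % size]
        counts[(first, second)] += 1
    return counts


# Hypothetical encoding of the cyclic peptide 80: A, B, C and two others.
peptide_80 = ["A", "B", "C", "X", "Y"]
clockwise_2 = count_directed_pairs(peptide_80, 2, "clockwise")
clockwise_3 = count_directed_pairs(peptide_80, 3, "clockwise")
print(clockwise_2[("A", "C")], clockwise_3[("C", "A")])  # prints: 1 1
```

Under these assumptions, the pair (A, C) appears only at n = 2 and the pair (C, A) only at n = 3 when counting clockwise, so the two are kept apart even though they involve the same two amino acid types.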
- Hereinafter, a functional configuration of a feature amount calculation unit 110A of this embodiment is described with reference to FIG. 9. FIG. 9 is a diagram for illustrating a function of the feature amount calculation unit of the second embodiment.
- The feature amount calculation unit 110A of this embodiment includes an input reception unit 111, a pair specification unit 112A, a pair number count unit 113, a feature amount acquisition unit 114, an output unit 115, and a direction specification unit 116.
- The pair specification unit 112A specifies another amino acid located in a position at a distance n from a certain amino acid in a direction specified by the direction specification unit 116 as an amino acid paired with the certain amino acid.
- The direction specification unit 116 specifies the direction when counting the distance between the amino acids in the cyclic peptide on the basis of the direction specifying information input from the information processing device 200 and the like.
- Hereinafter, processing of the feature amount calculation unit 110A of this embodiment is described with reference to FIG. 10. FIG. 10 is a flowchart for illustrating processing of the feature amount calculation device of the second embodiment.
- The feature amount calculation unit 110A of this embodiment receives an input of the structure specifying information 10 by the input reception unit 111 (step S1001). Subsequently, the feature amount calculation unit 110A receives an input of the direction specifying information by the input reception unit 111 (step S1002).
- Since the processing from step S1003 to step S1005 in FIG. 10 is similar to the processing from step S602 to step S604 in FIG. 6, the description thereof is omitted.
- Following step S1005, the feature amount calculation unit 110A refers to the direction specifying information input at step S1002 by the pair specification unit 112A, specifies the amino acid arranged in a position at the distance n in the specified direction from the amino acid of the type "A" (step S1006), and shifts to step S1007.
- Since the processing from step S1007 to step S1013 in FIG. 10 is similar to the processing from step S606 to step S612 in FIG. 6, the description thereof is omitted.
- In this manner, in this embodiment, when specifying another amino acid located in the position at the distance n from a certain amino acid, the other amino acid at the distance n is specified in the specified direction. Therefore, according to this embodiment, the structure of the cyclic peptide formed by the sequence of the amino acids may be reflected in the feature amount with high accuracy.
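The division of roles between the direction specification unit 116 and the pair specification unit 112A may be sketched roughly as follows. This is only an illustrative Python outline: the function names, the string values of the direction specifying information, the clockwise list representation of the ring, and the ordering of the resulting feature vector are all assumptions, and the loop structure does not reproduce the individual steps of FIG. 10.

```python
from itertools import product


def specify_direction(direction_info):
    """Role of the direction specification unit 116 (assumed encoding):
    turn the direction specifying information into a signed step."""
    signs = {"clockwise": 1, "counterclockwise": -1}
    if direction_info not in signs:
        raise ValueError("unknown direction specifying information")
    return signs[direction_info]


def specify_partner(sequence, index, n, sign):
    """Role of the pair specification unit 112A: index of the amino acid
    located n positions away from `index` in the specified direction."""
    return (index + sign * n) % len(sequence)


def calculate_feature_amount(sequence, direction_info, types, distances):
    """Count pairs for every (first type, second type, distance n)
    combination and return the counts as a flat list."""
    if any(not 1 <= n < len(sequence) for n in distances):
        raise ValueError("each n must satisfy 1 <= n < len(sequence)")
    sign = specify_direction(direction_info)
    feature = []
    for first_type, second_type, n in product(types, types, distances):
        count = 0
        for i, residue in enumerate(sequence):
            if residue != first_type:
                continue
            j = specify_partner(sequence, i, n, sign)
            if sequence[j] == second_type:
                count += 1
        feature.append(count)
    return feature


ring = ["A", "B", "C"]  # hypothetical three-residue cyclic peptide
cw = calculate_feature_amount(ring, "clockwise", ["A", "B"], [1])
ccw = calculate_feature_amount(ring, "counterclockwise", ["A", "B"], [1])
print(cw, ccw)  # prints: [0, 1, 0, 0] [0, 0, 1, 0]
```

In this sketch the same ring yields different feature vectors for the two directions, which corresponds to the point that the direction specifying information changes which amino acid is treated as the partner at the distance n.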
Claims (5)
- A feature amount calculation program comprising instructions which, when executed by a computer, cause the computer to execute processing comprising: receiving structure specifying information that specifies a type of each of a plurality of atomic groups and a sequence of the plurality of atomic groups regarding a cyclic molecule in which the plurality of atomic groups classified into a plurality of types is cyclically sequenced; specifying an optional first type and an optional second type in the plurality of types; specifying, on the basis of the structure specifying information, one or a plurality of first atomic groups classified into the first type and one or a plurality of second atomic groups classified into the second type out of the plurality of atomic groups; and calculating, on the basis of the structure specifying information, a number of pairs of the first atomic group and the second atomic group in which a mutual distance in the sequence between the first atomic group and the second atomic group is a predetermined distance.
- The non-transitory computer-readable recording medium according to claim 1, further causing the computer to execute the process comprising: receiving direction specifying information that specifies either a first direction along a cycle of the cyclic molecule or a second direction along the cycle opposite to the first direction; and calculating the distance in the direction specified by the direction specifying information.
- The non-transitory computer-readable recording medium according to claim 1 or 2, wherein each of the plurality of atomic groups is an amino acid, and the cyclic molecule is a cyclic peptide.
- A feature amount calculation method implemented by a computer, the feature amount calculation method comprising: receiving structure specifying information that specifies a type of each of a plurality of atomic groups and a sequence of the plurality of atomic groups regarding a cyclic molecule in which the plurality of atomic groups classified into a plurality of types is cyclically sequenced; specifying an optional first type and an optional second type in the plurality of types; specifying, on the basis of the structure specifying information, one or a plurality of first atomic groups classified into the first type and one or a plurality of second atomic groups classified into the second type out of the plurality of atomic groups; and calculating, on the basis of the structure specifying information, a number of pairs of the first atomic group and the second atomic group in which a mutual distance in the sequence between the first atomic group and the second atomic group is a predetermined distance.
- A feature amount calculation apparatus comprising: a reception unit of receiving structure specifying information that specifies a type of each of a plurality of atomic groups and a sequence of the plurality of atomic groups regarding a cyclic molecule in which the plurality of atomic groups classified into a plurality of types is cyclically sequenced; a first processing unit of specifying an optional first type and an optional second type in the plurality of types; a second processing unit of specifying, on the basis of the structure specifying information, one or a plurality of first atomic groups classified into the first type and one or a plurality of second atomic groups classified into the second type out of the plurality of atomic groups; and a third processing unit of calculating, on the basis of the structure specifying information, a number of pairs of the first atomic group and the second atomic group in which a mutual distance in the sequence between the first atomic group and the second atomic group is a predetermined distance.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2022010118A JP2023108850A (en) | 2022-01-26 | 2022-01-26 | Feature amount calculation program, feature amount calculation method, and feature amount calculation device |
Publications (1)
Publication Number | Publication Date |
---|---|
EP4220644A1 (en) | 2023-08-02 |
Family
ID=83508783
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP22198688.8A Pending EP4220644A1 (en) | 2022-01-26 | 2022-09-29 | Feature amount calculation program, feature amount calculation method, and feature amount calculation device |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230238076A1 (en) |
EP (1) | EP4220644A1 (en) |
JP (1) | JP2023108850A (en) |
CN (1) | CN116504332A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012509848A (en) | 2008-11-24 | 2012-04-26 | シドニー ウェスト エリア ヘルス サービス | Cyclic peptides and uses thereof |
WO2020058176A1 (en) * | 2018-09-21 | 2020-03-26 | Deepmind Technologies Limited | Machine learning for determining protein structures |
US20200110777A1 (en) * | 2017-06-28 | 2020-04-09 | Zhejiang University | System and Method of Graph Feature Extraction Based on Adjacency Matrix |
JP2020517290A (en) | 2017-04-26 | 2020-06-18 | フーナン ゾンセン ペプリブ バイオテック カンパニー リミテッド | Method for constructing peptide library |
WO2021106706A1 (en) * | 2019-11-28 | 2021-06-03 | フューチャー株式会社 | Amino acid sequence searching device, vaccine, amino acid sequence searching method, and amino acid sequence searching program |
- 2022
  - 2022-01-26 JP JP2022010118A patent/JP2023108850A/en active Pending
  - 2022-09-29 EP EP22198688.8A patent/EP4220644A1/en active Pending
  - 2022-10-03 US US17/937,571 patent/US20230238076A1/en active Pending
  - 2022-10-14 CN CN202211260719.XA patent/CN116504332A/en active Pending
Non-Patent Citations (4)
Title |
---|
CARHART ET AL., J. CHEM. INF., 1985 |
JAN-JORIS DEVOGELAER ET AL: "Co-crystal Prediction by Artificial Neural Networks**", ANGEWANDTE CHEMIE, WILEY - V C H VERLAG GMBH & CO. KGAA, DE, vol. 132, no. 48, 18 September 2020 (2020-09-18), pages 21895 - 21902, XP071383816, ISSN: 0044-8249, DOI: 10.1002/ANGE.202009467 * |
TAJIMI ET AL., BMC BIOINFORMATICS, vol. 19, 2018, pages 527 |
X. YANG ET AL., COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, vol. 18, 2020, pages 153 - 161 |
Also Published As
Publication number | Publication date |
---|---|
JP2023108850A (en) | 2023-08-07 |
US20230238076A1 (en) | 2023-07-27 |
CN116504332A (en) | 2023-07-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
 | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED |
 | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
 | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
 | 17P | Request for examination filed | Effective date: 20230905 |
 | RBV | Designated contracting states (corrected) | Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
 | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |