KR20140100190A

KR20140100190A - Apparatus and method for prediction of protein binding relationships

Info

Publication number: KR20140100190A
Application number: KR1020130013183A
Authority: KR
Inventors: 김대희
Original assignee: 한국전자통신연구원
Priority date: 2013-02-06
Filing date: 2013-02-06
Publication date: 2014-08-14

Abstract

Provided is an apparatus for predicting a protein binding relation. The apparatus for predicting the protein binding relation includes an arranging unit which generates a first structure related to a first protein and a second structure related to a second protein by modeling the first protein and the second protein used for binding and arranging the modeled results on a grid divided in a 3D space, and a searching unit which searches the binding relation between the first structure and the second structure.

Description

APPARATUS AND METHOD FOR PREDICTION OF PROTEIN BINDING RELATIONSHIPS

The present invention relates to the prediction of protein binding relationships, and more particularly to an apparatus and method for predicting, on a computer, whether specific proteins actually bind and in what binding pattern, without physically binding the proteins.

Protein-protein binding and protein-ligand binding are very important factors in protein function, drug side effects, drug selection, and so on.

In the past, the results could be obtained through actual binding between proteins in the laboratory, but this is a time-consuming task.

With advances in computer technology, it is now possible to predict the binding relationships between proteins on a PC, but this still takes a considerable amount of time.

Methods of predicting the binding relationship include a method that uses an atom-level model of each protein to determine, at each binding site, the position with the lowest entropy value while considering all involved forces such as hydrogen bonding, van der Waals forces, attractive forces, repulsive forces, and electrostatic forces, and a method that shapes the proteins into three-dimensional structural shapes to find the optimal complementary binding shape.

According to one aspect, there is provided a protein binding relationship prediction apparatus including an arrangement unit that models a first protein and a second protein used for binding and arranges them on a grid dividing a three-dimensional space to generate a first structure associated with the first protein and a second structure associated with the second protein, and a search unit that searches for a binding relationship between the first structure and the second structure.

According to an embodiment, the protein binding relationship predicting apparatus may further include an input unit for receiving at least one item related to the first protein and the second protein.

In this case, the at least one item may be at least one of atomic name, atomic number, atomic coordinates and radius of the first protein and the second protein.

According to one embodiment, the arrangement unit may generate the first structure and the second structure using the at least one item input from the input unit.

According to an embodiment, the search unit may search for the binding relationship between the first structure and the second structure by calculating a correlation coefficient between the first structure and the second structure.

According to one embodiment, the search unit may fix the position of one of the first structure and the second structure while changing the position of the other, and calculate the correlation coefficient between the first structure and the second structure.

According to one embodiment, when searching for the binding relationship between the first structure and the second structure, the search unit may designate the value of the boundary part of the first structure and the second structure as 1, designate the inner value of one of the first structure and the second structure as -1, and designate the inner value of the other as +1.

According to one embodiment, the search unit may search for the binding relationship using a plurality of GPUs (Graphics Processing Units).

According to another aspect, there is provided a method for predicting a protein binding relationship, including: a modeling step of modeling a first protein and a second protein used for binding; an arranging step of arranging each of the modeled first protein and second protein on a grid dividing a three-dimensional space to generate a first structure associated with the first protein and a second structure associated with the second protein; and a searching step of searching for a binding relationship between the generated first structure and second structure.

According to one embodiment, the method further includes receiving at least one item associated with the first protein and the second protein, and the arranging step may generate the first structure and the second structure using the at least one item.

According to an embodiment, the searching step may search for the binding relationship between the first structure and the second structure by calculating a correlation coefficient between the first structure and the second structure.

According to one embodiment, the searching step may fix the position of one of the first structure and the second structure while changing the position of the other, and calculate the correlation coefficient between the first structure and the second structure.

According to another aspect, there is provided a method for predicting protein binding relationships, including: a generation step of modeling a first protein and a second protein used for binding and arranging them on a grid dividing a three-dimensional space to generate a first structure associated with the first protein and a second structure associated with the second protein; a calculation step of calculating a correlation coefficient between the first structure and the second structure; and a search step of searching for a binding relationship between the first structure and the second structure based on the calculation result.

FIG. 1 is a block diagram showing an apparatus for predicting a protein binding relationship according to an embodiment.
FIG. 2 is a diagram for explaining a process of modeling based on at least one item of an input protein according to an embodiment.
FIG. 3 is a view for explaining a process of voxelization of a protein structure according to an embodiment.
FIG. 4 is a diagram for explaining a protein binding relationship search process according to an embodiment.
FIG. 5 is a flowchart illustrating a method of predicting a protein binding relationship according to an embodiment.
FIG. 6 is a flowchart showing a method of predicting a protein binding relationship according to another embodiment.

In the following, some embodiments will be described in detail with reference to the accompanying drawings. However, the invention is not restricted or limited by these embodiments. Like reference symbols in the drawings denote like elements.

The terms used in the following description were selected from general terms that are currently in wide use, with consideration given to their functions in the present invention, but they may vary depending on the intention or custom of those skilled in the art, the emergence of new technology, and the like.

Also, in certain cases, terms may have been chosen arbitrarily by the applicant for ease of understanding and/or convenience of explanation, in which case their meaning is given in the corresponding part of the description. Therefore, the terms used in the following description should be understood based on the meaning of each term and the contents of the entire specification, not simply on the name of the term.

Throughout the specification, the first structure means a three-dimensional structure generated by arranging a first protein used for binding in a three-dimensional space corresponding to the coordinates of the first protein.

In addition, throughout the specification, the second structure represents the three-dimensional structure generated for the second protein to be bound to the first protein.

FIG. 1 is a block diagram showing an apparatus 100 for predicting a protein binding relationship according to an embodiment.

The protein binding relationship prediction apparatus 100 can predict, on a computer, whether specific proteins can actually bind to each other, without physically binding the proteins in question.

The protein binding prediction apparatus 100 may include an input unit 110, an arrangement unit 120, and a search unit 130. However, the input unit 110 is an optional configuration. In some embodiments, the input unit 110 may be omitted.

The arrangement unit 120 models a first protein and a second protein used for binding and arranges them on a grid dividing a three-dimensional space to generate a first structure associated with the first protein and a second structure associated with the second protein.

The search unit 130 may search for a binding relationship between the first structure and the second structure.

The search unit 130 may calculate a correlation coefficient between the first structure and the second structure to search for the binding relationship between the first structure and the second structure.

In this case, the search unit 130 may fix the position of one of the first structure and the second structure while changing the position of the other, and calculate the correlation coefficient between the first structure and the second structure.

When searching for the binding relationship between the first structure and the second structure, the search unit 130 may designate the value of the boundary part of the first structure and the second structure as 1, set the inner value of one of the structures to -1, and set the inner value of the other to +1.

The search unit 130 may use at least one of a MIC (Many Integrated Core) processor or a plurality of GPUs (Graphics Processing Units) to search for the binding relationship between the first structure and the second structure, which can lower the complexity of the process.

The protein binding relationship prediction apparatus 100 according to another embodiment may further include an input unit 110.

The input unit 110 may receive at least one item associated with the first protein and the second protein.

Here, the at least one item may be at least one of an atomic name, an atomic number, an atomic coordinate, and a radius of the first protein and the second protein.

In this case, the arrangement unit 120 may generate the first structure and the second structure using the at least one item input from the input unit 110.

FIG. 2 is a diagram illustrating a process of modeling based on at least one item of an input protein according to an embodiment.

The protein binding relationship prediction apparatus 100 can predict the binding relationship between the proteins to be bound based on their three-dimensional structural shapes, using a parallel processing technique that combines GPUs and multicore processors.

In predicting the binding relationship between proteins, the protein binding prediction apparatus 100 can generate a model of a protein used for binding through the arrangement unit 120.

In order to generate a model of a protein used in the binding, the protein binding prediction apparatus 100 may receive at least one item 200 related to the protein through the input unit 110.

Referring to FIG. 2, the at least one item 200 may include at least one of an atomic name, an atomic number, an atomic coordinate, and a radius of the protein.

The at least one item 200 may be input in the form of a table as shown in FIG. 2, or a user may directly input items selected from those included in the table. These are merely examples, and the input is not limited to any one embodiment.

When the at least one item 200 associated with a protein used for binding is input, the arrangement unit 120 may generate a model 210 of the protein using the at least one item.

The generated protein model can be used for voxelization, in which the protein is arranged as a three-dimensional structure, and which can be performed as shown in FIG. 3.
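As a rough illustration of the input items, the sketch below stores one record per atom with the four items named above (atomic name, atomic number, atomic coordinates, radius). The `Atom` class, the `parse_atom_table` helper, the whitespace-separated column order, and the sample values are assumptions made for this example; the source does not specify an input format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Atom:
    """One row of the input table (hypothetical layout)."""
    name: str      # atomic name, e.g. "CA"
    number: int    # atomic number, as the item is named in the text
    x: float       # atomic coordinates
    y: float
    z: float
    radius: float  # atomic radius


def parse_atom_table(text: str) -> List[Atom]:
    """Parse rows of the assumed form: name number x y z radius."""
    atoms = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, number, x, y, z, radius = line.split()
        atoms.append(
            Atom(name, int(number), float(x), float(y), float(z), float(radius))
        )
    return atoms


# Illustrative values only, not taken from the source.
example = """
# name number x y z radius
N  7  0.0 0.0 0.0 1.55
CA 6  1.5 0.0 0.0 1.70
"""
model = parse_atom_table(example)
```

Keeping the record immutable (`frozen=True`) is a design choice for the sketch: the model is built once from the input and then only read during arrangement on the grid.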

FIG. 3 is a view for explaining a process of voxelization of a protein structure according to an embodiment.

The protein binding relationship predicting apparatus 100 can allocate and arrange the model of the protein generated in FIG. 2 to the divided grid in the three-dimensional space.

In FIG. 3, 310 and 320 show the structure of the protein in planar form in order to explain more easily the process of arranging the model on the grid.

The protein model 310, generated based on at least one item input for the protein used for binding, can be arranged in the grid space 320 according to the coordinates of the protein and displayed as 321.

When the protein model is generated as a three-dimensional structure, the protein model of FIG. 2 may be mapped onto the grid 330 dividing the three-dimensional space based on its coordinates (X coordinate, Y coordinate, Z coordinate) and displayed as 331.
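The mapping of a protein model onto the divided grid can be sketched as follows, assuming each atom is treated as a sphere and a grid cell counts as occupied when its center lies inside at least one sphere. The function name `voxelize`, the parameters `grid_size` and `spacing`, and the centering of the molecule in the grid are assumptions for illustration, not details given in the source.

```python
import numpy as np


def voxelize(coords, radii, grid_size=32, spacing=1.0):
    """Return a boolean (grid_size, grid_size, grid_size) occupancy grid.

    coords: (N, 3) atom centers; radii: (N,) atom radii, in the same units
    as spacing. A cell is occupied if its center lies inside any atom sphere.
    """
    coords = np.asarray(coords, dtype=float)
    radii = np.asarray(radii, dtype=float)
    # Assumption: center the molecule in the grid before mapping.
    shifted = coords - coords.mean(axis=0)
    half = grid_size * spacing / 2.0
    axis = (np.arange(grid_size) + 0.5) * spacing - half  # cell centers
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    occupied = np.zeros((grid_size,) * 3, dtype=bool)
    for (x, y, z), r in zip(shifted, radii):
        occupied |= (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2 <= r ** 2
    return occupied


# Two illustrative atoms on a small 16 x 16 x 16 grid.
grid = voxelize([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]], [1.55, 1.70], grid_size=16)
```

The loop over atoms keeps the example short; for large proteins one would normally restrict each sphere test to the cells near that atom.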

The first structure and the second structure generated for the first protein and the second protein used for binding are used for searching the binding relationship between the proteins.

FIG. 4 is a diagram for explaining a protein binding relationship search process according to an embodiment.

The protein binding relationship prediction apparatus 100 may search for the binding relationship between the voxelized three-dimensional protein structures by performing Fast Fourier Transform (FFT) operations on a GPU (Graphics Processing Unit).

In this process, the parts that are performed repeatedly, such as the FFT operations on the GPU, can be performed more quickly by parallel processing using a MIC (Many Integrated Core) processor or a plurality of GPUs.

CPU (Central Processing Unit) technology is evolving day by day, from single-core to dual-core, quad-core, and hexa-core designs. In particular, for simple FFT processing, using a GPU can yield a speed improvement of several times to several hundred times compared with a single CPU. In addition, as GPUs are added, a speed increase of (several to several hundred times) * (number of GPUs) can be expected.

FIG. 4 shows an embodiment for exploring a protein binding relationship using two GPUs and a dual core.

The protein binding relationship prediction apparatus 100 may calculate, through the search unit 130, a correlation coefficient between the three-dimensional structures (the first structure and the second structure) of the proteins generated in FIG. 3, and thereby search for a complementary binding of the first structure and the second structure.

Referring to FIG. 4, the voxelization of protein a in (1) and of protein b in (2), the DFT (Discrete Fourier Transform) operations, the conjugation and multiplication operation in (5), and the inverse FFT operation in (6) can be processed in parallel using the GPU. In particular, in (5), the multiplication between array elements can be performed at high speed using a plurality of GPUs.
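The transform, conjugate-multiply, and inverse-transform sequence described above corresponds to the standard FFT form of a cross-correlation. The sketch below shows that form on the CPU with NumPy as a stand-in; the function name `correlation_grid` is hypothetical, and the source's GPU implementation is not reproduced here.

```python
import numpy as np


def correlation_grid(a, b):
    """Circular cross-correlation of two real 3D grids via FFT.

    Returns c with c[s] = sum over x of a[x] * b[x + s], indices wrapping
    around the grid. One forward transform per grid, one element-wise
    conjugate multiplication, and one inverse transform give the score for
    every translation s at once.
    """
    fa = np.fft.fftn(a)                     # forward transform of grid a
    fb = np.fft.fftn(b)                     # forward transform of grid b
    product = np.conj(fa) * fb              # conjugation and multiplication
    return np.real(np.fft.ifftn(product))   # inverse transform


# Single-voxel example: the peak sits at the shift that maps a onto b.
a = np.zeros((4, 4, 4))
b = np.zeros((4, 4, 4))
a[1, 1, 1] = 1.0
b[3, 2, 1] = 1.0
c = correlation_grid(a, b)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(c), c.shape))
print(peak)  # (2, 1, 0)
```

The element-wise product is the step that parallelizes most directly, since every array element is independent of the others.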

The searching unit 130 may calculate the correlation coefficient while changing the position of one of the first structure and the second structure while fixing the position of the other.

For example, by performing the calculation process of (3) to (7), the correlation coefficient between protein a and protein b placed at various positions is calculated, and the position of protein b for the case having the most favorable correlation coefficient can be found.
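Locating the position of protein b from a grid of correlation values might look like the sketch below. It assumes that the highest value marks the best candidate and that indices past the midpoint of an axis represent negative shifts, because FFT-based correlation wraps around; both points, and the name `best_translation`, are assumptions rather than details from the source.

```python
import numpy as np


def best_translation(scores):
    """Return (shift, score) for the highest-scoring cell of a 3D score grid.

    The shift is reported as a signed offset per axis: an index i on an axis
    of length n is kept if i <= n // 2 and mapped to i - n otherwise.
    """
    scores = np.asarray(scores)
    idx = np.unravel_index(np.argmax(scores), scores.shape)
    shift = tuple(
        int(i) if i <= n // 2 else int(i) - n
        for i, n in zip(idx, scores.shape)
    )
    return shift, float(scores[idx])


# Illustrative score grid with one clear peak.
scores = np.zeros((8, 8, 8))
scores[7, 0, 2] = 5.0
shift, score = best_translation(scores)
print(shift, score)  # (-1, 0, 2) 5.0
```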

In this case, protein b can be rotated through a series of angles, and the processes of (3) to (7) repeated for each rotation.
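One way to rotate protein b before re-voxelizing it is to apply a rotation matrix to its atomic coordinates. The sketch below assumes a Z-Y-Z Euler angle convention and rotation about the centroid of the atoms; the source names three angles but does not state a convention, so `euler_zyz_matrix` and `rotate_coords` are illustrative only.

```python
import numpy as np


def euler_zyz_matrix(alpha_deg, beta_deg, theta_deg):
    """Rotation matrix Rz(alpha) @ Ry(beta) @ Rz(theta) (assumed convention)."""
    a, b, t = np.radians([alpha_deg, beta_deg, theta_deg])

    def rz(angle):
        c, s = np.cos(angle), np.sin(angle)
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

    def ry(angle):
        c, s = np.cos(angle), np.sin(angle)
        return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

    return rz(a) @ ry(b) @ rz(t)


def rotate_coords(coords, alpha_deg, beta_deg, theta_deg):
    """Rotate (N, 3) coordinates about their centroid."""
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)
    r = euler_zyz_matrix(alpha_deg, beta_deg, theta_deg)
    # Row vectors: v @ r.T is the same as (r @ v) for each atom.
    return (coords - center) @ r.T + center


rotated = rotate_coords([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]], 90, 0, 0)
```

Rotating the coordinates and voxelizing again avoids interpolating an already discretized grid, at the cost of one voxelization per rotation.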

As for the total number of rotations, with ranges of alpha = 0 to 360 degrees, beta = 0 to 180 degrees, and theta = 0 to 360 degrees, dividing each range into 20-degree steps, for example, requires 18 * 9 * 18 = 2916 rotations.
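The rotation count can be checked by enumerating the angle triples directly. The sketch assumes half-open ranges (the end angle of each range is excluded), which is what yields 18, 9, and 18 values per axis; the variable names are illustrative.

```python
from itertools import product

STEP = 20  # degrees, the example step size from the text

alphas = range(0, 360, STEP)  # 18 values: 0, 20, ..., 340
betas = range(0, 180, STEP)   # 9 values: 0, 20, ..., 160
thetas = range(0, 360, STEP)  # 18 values: 0, 20, ..., 340

# Every (alpha, beta, theta) combination that must be evaluated.
rotations = list(product(alphas, betas, thetas))
print(len(rotations))  # 2916
```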

The protein binding relationship prediction apparatus 100 can reduce the 2916 iterations above to 2916 / (number of GPUs) iterations per GPU, where the number of GPUs equals the number of cores.

For example, with two GPUs, the first GPU may compute half of the total number of rotations (which can be handled by an operation of the form for (i = 0; i < number of rotations / number of GPUs; i++)), and the second GPU may compute the other half (which can be handled by an operation of the form for (i = number of rotations / number of GPUs; i < number of rotations; i++)).

When the number of GPUs is extended and n GPUs are mounted, the calculation procedures of (3) to (7) described above can be shared by the n GPUs and processed in parallel.
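The division of rotations among n GPUs can be sketched as a function that returns the index range each device handles. The name `rotation_range` and the handling of a remainder (the first few devices take one extra rotation when the count does not divide evenly) are assumptions added for this example; the source only describes the evenly divided two-GPU case.

```python
def rotation_range(gpu_index, num_gpus, num_rotations):
    """Half-open range of rotation indices assigned to one GPU."""
    base, extra = divmod(num_rotations, num_gpus)
    # Devices with index < extra each take one additional rotation.
    start = gpu_index * base + min(gpu_index, extra)
    stop = start + base + (1 if gpu_index < extra else 0)
    return range(start, stop)


# Two GPUs and 2916 rotations, matching the example in the text.
first = rotation_range(0, 2, 2916)   # range(0, 1458)
second = rotation_range(1, 2, 2916)  # range(1458, 2916)
```

Each range is contiguous and the ranges do not overlap, so every device can loop over its own slice independently, and only the best score from each slice has to be compared at the end.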

The search unit 130 may assign initial values by designating the value of the boundary part of the first structure and the second structure as 1, the inner value of one of the first structure and the second structure as -1, and the inner value of the other as +1, and may calculate the correlation coefficient between the first structure and the second structure using these initial values.

However, this initial value assignment is only one embodiment, and various other values may be specified instead of these specific values.
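The value assignment described above can be sketched on an occupancy grid as follows. The sketch assumes a voxel belongs to the boundary when it is occupied and at least one of its six face neighbors is empty, and that empty voxels are given 0; the source does not define either point, and `label_structure` is a hypothetical name.

```python
import numpy as np


def label_structure(occupied, interior_value):
    """Return a float grid: 1 on the boundary, interior_value inside, 0 outside."""
    occupied = np.asarray(occupied, dtype=bool)
    # Pad with empty cells so voxels on the grid edge count as boundary.
    padded = np.pad(occupied, 1, constant_values=False)
    all_neighbors_full = np.ones_like(occupied)
    for axis in range(3):
        for shift in (-1, 1):
            neighbor = np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
            all_neighbors_full &= neighbor
    interior = occupied & all_neighbors_full
    boundary = occupied & ~all_neighbors_full
    grid = np.zeros(occupied.shape, dtype=float)
    grid[boundary] = 1.0
    grid[interior] = interior_value
    return grid


# A 3 x 3 x 3 block inside a 5 x 5 x 5 grid: one interior voxel, 26 boundary.
cube = np.zeros((5, 5, 5), dtype=bool)
cube[1:4, 1:4, 1:4] = True
first_structure = label_structure(cube, interior_value=-1.0)
second_structure = label_structure(cube, interior_value=+1.0)
```

With these values, overlap of the -1 interior with the +1 interior lowers the correlation, while boundary-to-boundary contact raises it.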

FIG. 5 is a flowchart illustrating a method of predicting a protein binding relationship according to an embodiment.

In step 510, the arrangement unit 120 may model the first protein and the second protein used for binding.

The arrangement unit 120 may perform a modeling process for the first protein and the second protein using at least one item inputted in association with the first protein and the second protein.

In step 520, the arrangement unit 120 may arrange the models of the first protein and the second protein on a grid dividing a three-dimensional space to generate a first structure associated with the first protein and a second structure associated with the second protein.

In step 530, the search unit 130 may search for a binding relationship between the first structure and the second structure.

FIG. 6 is a flowchart showing a method of predicting a protein binding relationship according to another embodiment.

In step 610, the arrangement unit 120 may model the first protein and the second protein used for binding and arrange them on a grid dividing a three-dimensional space to generate a first structure associated with the first protein and a second structure associated with the second protein.

The arrangement unit 120 may generate the first structure and the second structure using at least one item related to the first protein and the second protein received from the input unit 110.

In step 620, the search unit 130 may calculate a correlation coefficient between the first structure and the second structure.

In step 630, the search unit 130 may search for a binding relationship between the first structure and the second structure based on the calculation result of the correlation coefficient.

The detailed description and various embodiments of each step are as described above with reference to the drawings.

The apparatus described above may be implemented as a hardware component, a software component, and/or a combination of hardware components and software components. For example, the apparatus and components described in the embodiments may be implemented within a computer system using, for example, a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may execute an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device may be described as being used singly, but those skilled in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may comprise a plurality of processors, or one processor and one controller. Other processing configurations, such as a parallel processor, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or in a transmitted signal wave. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be those specially designed and configured for the embodiments, or may be known and available to those skilled in the art of computer software. Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine language code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. For example, appropriate results may be achieved even if the described techniques are performed in a different order than the described method, and/or if components of the described systems, structures, devices, circuits, and the like are combined in a different form or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims are also within the scope of the following claims.

Claims

A protein binding relationship prediction apparatus comprising:
an arrangement unit that models a first protein and a second protein used for binding and arranges them on a divided grid in a three-dimensional space to generate a first structure associated with the first protein and a second structure associated with the second protein; and
a search unit that searches for a binding relationship between the first structure and the second structure.