CN103544406B - Method for detecting DNA sequence similarity using a one-dimensional cellular neural network - Google Patents

Method for detecting DNA sequence similarity using a one-dimensional cellular neural network

Info

Publication number
CN103544406B
CN103544406B CN201310552472.3A CN201310552472A CN103544406B CN 103544406 B CN103544406 B CN 103544406B CN 201310552472 A CN201310552472 A CN 201310552472A CN 103544406 B CN103544406 B CN 103544406B
Authority
CN
China
Prior art keywords
cell
cnn1
sequence
boss
dna
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310552472.3A
Other languages
Chinese (zh)
Other versions
CN103544406A (en
Inventor
纪禄平
郝德水
周龙
黄青君
尹力
杨洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201310552472.3A
Publication of CN103544406A
Application granted
Publication of CN103544406B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for detecting DNA sequence similarity using a one-dimensional cellular neural network. A basic one-dimensional cellular neural network model is first designed, and this model is then used to construct a one-dimensional dual cellular neural network. The network is initialized with the information of the two DNA sequences to be compared; while the network runs, the cell states and outputs at each moment are recorded and assembled into an optimal output matrix. The elements of the optimal output matrix are then traversed to determine the best alignment path. Finally, gaps are inserted into the two sequences according to the alignment path so that the two sequences are globally aligned. After alignment, the global similarity is calculated from the number of aligned bases and the total number of bases. Comparative tests show that, while maintaining detection accuracy, the invention requires markedly less computation time than existing methods for longer DNA sequences.

Description

Method for detecting DNA sequence similarity using a one-dimensional cellular neural network
Technical field
The invention belongs to the field of DNA sequence similarity detection in bioinformatics and, more specifically, relates to a method for detecting DNA sequence similarity using a one-dimensional cellular neural network, used to detect the global similarity of a pair of DNA sequences.
Background art
In the 1970s, the advent of DNA sequencing methods produced large amounts of biomolecular sequence data. These data are growing at a geometric rate, and this has become the field of human activity that generates the largest volume of data. After the human genome sequence map was successfully drawn, genome projects for various animals and plants were launched one after another. However, data are not the same as knowledge or information; the task of studying and processing these data grows ever heavier, and efficient methods must be found to address this problem.
DNA exists in double-stranded form, with the strands joined by base pairing. Base pairing is specific: a G on one strand always pairs with a C on the other, and a T on one strand pairs with an A on the other. A DNA nucleotide sequence is therefore a character string composed of these 4 basic elements, and DNA sequence matching amounts to measuring the similarity between two sequences composed of the four characters A, C, G and T. Sequence alignment is the process of finding, by a specific algorithm, the maximum number of matching bases between two or more sequences. It is used to uncover structural or functional similarity between sequences, which is of great practical significance for search algorithms over biological databases, structure prediction of proteins or DNA, evolutionary analysis and functional analysis.
According to the number of biological sequences being compared, sequence alignment methods are divided into pairwise alignment and multiple sequence alignment. Pairwise alignment methods fall into three kinds: the dot matrix method, dynamic programming algorithms, and heuristic algorithms (BLAST, FASTA, etc.). Multiple sequence alignment is an NP-complete problem and remains an open challenge; its methods include exact alignment algorithms, iterative alignment algorithms, progressive alignment algorithms, heuristic algorithms, and graph-theory-based alignment algorithms.
Among pairwise alignment methods, the dot matrix method was first proposed by McIntyre and Gibbs in 1970 and is the most basic visual pairwise alignment method. Its advantage is that all possible matches between two sequences can be found directly, but its results are not accurate enough and it is suitable only for two short sequences, an obvious shortcoming given today's huge volumes of biological sequence data. The basic idea of dynamic programming is to decompose the problem into several subproblems, solve the subproblems first, store their solutions to avoid recomputation, and finally combine them to obtain the solution of the original problem. Dynamic programming yields the optimal alignment under a given scoring system, but when the problem is very large it becomes slow, and it is sensitive to parameter selection: a small change in the parameters can change the alignment considerably. The main dynamic programming algorithms for biological sequence alignment are the Needleman-Wunsch algorithm (NW algorithm), a global alignment algorithm proposed by Needleman and Wunsch in 1970, and the Smith-Waterman algorithm (SW algorithm), proposed by Smith and Waterman in 1981 for finding regions of local similarity. Heuristic pairwise algorithms include the FASTA algorithm, first proposed in 1985 and improved in 1988 by Pearson and Lipman, and the BLAST algorithm, proposed in 1990 by Altschul et al.
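As a point of reference for the dynamic programming approach described above, the following is a minimal sketch of Needleman-Wunsch global alignment scoring. The function name `nw_score` and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions and are not taken from this patent.

```python
def nw_score(s1, s2, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of s1 and s2.

    The scoring values are illustrative assumptions, not from the patent.
    """
    # prev[j] is the best score for aligning the processed prefix of s1 with s2[:j].
    prev = [j * gap for j in range(len(s2) + 1)]
    for i in range(1, len(s1) + 1):
        curr = [i * gap]  # aligning s1[:i] against an empty prefix of s2
        for j in range(1, len(s2) + 1):
            pair = match if s1[i - 1] == s2[j - 1] else mismatch
            diag = prev[j - 1] + pair   # align the two bases
            up = prev[j] + gap          # gap in s2
            left = curr[j - 1] + gap    # gap in s1
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

Every pair of prefixes is visited once, so the work grows with the product of the two sequence lengths; this is the cost that becomes noticeable for long sequences.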
However, when traditional alignment algorithms are applied to large pairwise alignment problems, the required time and storage grow exponentially with the number and length of the sequences. Better methods are therefore needed to improve the search speed of the algorithm and reduce computation time.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a method for detecting DNA sequence similarity using a one-dimensional cellular neural network, so as to reduce computation time.
To achieve the above object, the method of the invention for detecting DNA sequence similarity using a one-dimensional cellular neural network is characterized by comprising the following steps:
(1) Design the basic one-dimensional cellular neural network model
Single cells are connected in a chain, and the cells are numbered successively "..., i-1, i, i+1, ...", where the letter i denotes the sequence number of a cell;
In this basic model, the cell state is described by the following system of differential equations:
$$
\begin{cases}
C \dfrac{\partial x_i(t)}{\partial t} = -\dfrac{x_i(t)}{R_x} + A \otimes Y_i(t) + B \otimes U_i(t) + I_i \\
y_i(t) = f(x_i(t))
\end{cases}
\tag{1}
$$
In system (1), $t$ denotes time, $x_i$ denotes the state of cell $i$, $A$ is the feedback template, $B$ is the control template, $I_i$, $R_x$ and $C$ are three constants, and $f(x_i(t))$ is the output modulation function of the cell state; $Y_i(t)$ denotes the neighborhood output matrix of cell $i$, including the cell itself, and $U_i(t)$ denotes the neighborhood input of cell $i$, including the cell itself, expressed as:
$$
Y_i(t) = \begin{bmatrix} y_{i-1}(t) & y_i(t) & y_{i+1}(t) \end{bmatrix}, \qquad
U_i(t) = \begin{bmatrix} u_{i-1}(t) & u_i(t) & u_{i+1}(t) \end{bmatrix}
$$
$y_{i-1}(t)$, $y_i(t)$ and $y_{i+1}(t)$ denote the outputs of cells $i-1$, $i$ and $i+1$ respectively; $u_{i-1}$, $u_i$ and $u_{i+1}$ denote the inputs that cell $i$ receives from cells $i-1$, $i$ and $i+1$; and $\otimes$ denotes the matrix convolution operation;
The output modulation function $f(x_i(t))$ of a cell has the specific form:
$$
y_i(t) = f(x_i(t)) = \frac{1}{2}\left( |x_i(t) + 1| - |x_i(t) - 1| \right) \tag{2}
$$
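The output modulation function of formula (2) can be evaluated directly. The short sketch below is only an illustration; the function name `output_function` is a hypothetical choice.

```python
def output_function(x):
    """Piecewise-linear output of formula (2): f(x) = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1) - abs(x - 1))
```

For states between -1 and 1 the function returns the state itself; outside that range it saturates at -1 or 1.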
(2) Construct the one-dimensional dual cellular neural network
Using the one-dimensional cellular neural network model designed in step (1), first generate a main subnet CNN1 and a secondary subnet CNN2, and then build a one-dimensional dual cellular neural network from the two:
In the one-dimensional dual cellular neural network, the main subnet CNN1 is fixed, while the secondary subnet CNN2 can move parallel to CNN1. Each time the time $t$ increases by 1, CNN2 moves one step, and the distance of each step equals the distance between two adjacent cells in CNN1. CNN1 consists of cells 0, 1, 2, ..., m-1, and CNN2 consists of cells 0, 1, 2, ..., n-1;
In the one-dimensional dual cellular neural network, let $C = 1$ and $R_x = 1$; the differential equation describing the cell state then simplifies to:
$$
x_i(t+1) = \sum_{l \in L(i)} A \otimes Y_l(t) + \sum_{l \in L(i)} B \otimes U_l(t) + I_i \tag{3}
$$
In formula (3), $L(i)$ denotes the neighborhood of cell $i$ across the main subnet CNN1 and the secondary subnet CNN2, comprising cell $i$ itself, the previous cell $i-1$ and the next cell $i+1$ in CNN1, and the cell in CNN2 corresponding to cell $i$; $l$ denotes the $l$-th cell in the neighborhood of cell $i$, i.e. $l \in L(i)$;
At time $T = t+1$, the output $y_i(t+1)$ of cell $i$ is correspondingly redefined as:
$$
y_i(t+1) = f(x_i(t+1)) = \frac{1}{2}\left( |x_i(t+1) + 1| - |x_i(t+1) - 1| \right) \tag{4}
$$
(3) Using the one-dimensional dual cellular neural network built in step (2), perform global base alignment of the two DNA sequences whose similarity is to be detected;
3.1) Initialization of the dual cellular network
Let the two DNA base sequences to be matched, $S_1$ and $S_2$, contain $K_1$ and $K_2$ bases respectively, with base codes $S_1(k_1)$ and $S_2(k_2)$, where $0 \le k_1 \le K_1 - 1$ and $0 \le k_2 \le K_2 - 1$. The numbers of cells in the main subnet CNN1 and the secondary subnet CNN2 are initialized to $K_1 + 1$ and $K_2 + 1$ respectively, i.e. $m = K_1 + 1$ and $n = K_2 + 1$;
Let $u_1(i)$ and $u_2(j)$ denote the input of the $i$-th cell of CNN1 and of the $j$-th cell of CNN2, with $0 \le i \le K_1$ and $0 \le j \le K_2$. The inputs of the cells of CNN1 and CNN2 are assigned by formula (5) and formula (6) respectively:
where the symbol "*" indicates that the cell input $u$ is set to a null value;
The other constant parameter of CNN1 is initialized as $I_i = 2$; the feedback template $A$ and the control template $B$ used in CNN1 are initialized as the following two constant matrices:
$A = [0 \;\; 1 \;\; 0]$ and $B = [0 \;\; 1 \;\; {-1}]$
In addition, the initial state (at $t = 0$) of each cell $i$ of CNN1, $i = 0, 1, \ldots, K_1$, is set to $x_i(0) = 0$ and $y_i(0) = 0$. Cell 0 of CNN1 is aligned with cell $K_2$ of CNN2;
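A minimal sketch of the input initialization, assuming it follows the pattern of the worked example in this document ($u_1(0) = *$, $u_1(1) = S_1(0)$, ..., and likewise for $u_2$). The function name `init_inputs` and the use of Python lists are illustrative assumptions.

```python
def init_inputs(s1, s2, null="*"):
    """Build the cell inputs u1 (CNN1) and u2 (CNN2) for two base sequences.

    Assumed pattern, taken from the worked example: cell 0 receives the null
    input and cell k receives base k-1 of the corresponding sequence.
    """
    u1 = [null] + list(s1)  # m = K1 + 1 cells
    u2 = [null] + list(s2)  # n = K2 + 1 cells
    return u1, u2
```

For the example fragments AAGCTCTG and CAGCAT this gives 9 and 7 cell inputs, matching $m = 9$ and $n = 7$.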
3.2) Iteratively calculate the state and output of each cell of the main subnet CNN1 at each moment
Each time the time $t$ increases by 1, the secondary subnet CNN2 moves one step in the direction of increasing cell number of CNN1;
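The sliding geometry can be sketched as an index calculation. The sketch below assumes 0-based cell numbers, that cell 0 of CNN1 sits above cell $n-1$ (that is, cell $K_2$) of CNN2 at $t = 0$, and that CNN2 shifts by one cell per time step toward increasing CNN1 cell numbers. The function name `cell_below` and the formula $j = i + (n-1) - t$ are derived from these stated assumptions rather than quoted from the patent.

```python
def cell_below(i, t, n):
    """Return the number j of the CNN2 cell directly below CNN1 cell i at time t.

    Returns None when no CNN2 cell is below cell i at that time.
    Assumes 0-based numbering and the initial alignment described in step 3.1).
    """
    j = i + (n - 1) - t
    return j if 0 <= j < n else None
```

Under these assumptions cell 1 of CNN1 has a partner in CNN2 exactly for $t = 1$ to $n$, which agrees with the time range given for the first column of the output matrix in step 3.3).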
For the main subnet CNN1, if the cell $j_l$ of the secondary subnet CNN2 directly below cell $i$ exists, the three neighborhood cells of cell $i$ are selected, namely cells $i-1$ and $i$ of CNN1 and the cell $j_l$ of CNN2 directly below cell $i$. For $t = 1, 2, \ldots, m+n-1$, whenever the time $t$ and the cell number $i$ simultaneously satisfy $1 \le t \le m+n-1$ and $1 \le i \le m+1$, the optimal state $\bar{x}_i(t)$ and the optimal output of each cell are calculated; if the cell $j_l$ directly below cell $i$ does not exist, the optimal state and optimal output are not calculated;
The optimal state and the optimal output are calculated by the following formulas, respectively:
$$
\bar{x}_i(t) = \max\{\, x_{i-1}(t-2) + 2 I_i,\; x_{i-1}(t-1) - I_i,\; x_i(t-1) - I_i \,\} \tag{7}
$$
where the function $\max\{\ldots\}$ returns the maximum of its arguments, and $x_{i-1}(t-2)$, $x_{i-1}(t-1)$ and $x_i(t-1)$ are all calculated by formula (3);
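Formula (7) selects the largest of three candidate values. A minimal sketch of that single evaluation follows; the function and parameter names are hypothetical, and the default `bias=2` reflects the initialization $I_i = 2$ from step 3.1). It does not reproduce the full network iteration.

```python
def optimal_state(x_prev_cell_t2, x_prev_cell_t1, x_same_cell_t1, bias=2):
    """Evaluate formula (7) for one cell at one time step.

    x_prev_cell_t2 : x_{i-1}(t-2)
    x_prev_cell_t1 : x_{i-1}(t-1)
    x_same_cell_t1 : x_i(t-1)
    bias           : the constant I_i
    """
    return max(
        x_prev_cell_t2 + 2 * bias,
        x_prev_cell_t1 - bias,
        x_same_cell_t1 - bias,
    )
```

With all three earlier states equal to zero, the first candidate wins and the result is $2 I_i = 4$.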
3.3) Form the optimal cell output matrix
From the optimal states and optimal outputs of all cells of the main subnet CNN1 at each moment, calculated in step 3.2), the final optimal cell output matrix $S_y$ of CNN1 is formed as follows: column 1 holds the optimal outputs of cell 1 for $t = 1$ to $n$, column 2 holds the optimal outputs of cell 2 for $t = 2$ to $1+n$, ..., and column $m$ holds the optimal outputs of cell $m$ for $t = m$ to $m+n-1$;
3.4) Globally align the bases of the two DNA sequences
Starting from the element in the upper-left corner of the optimal output matrix $S_y$ obtained in step 3.3), traverse the matrix from left to right and from top to bottom, locate the matrix elements whose value is 1, and connect these elements in order to form the base alignment path $P$;
According to the base alignment path $P$, the DNA base sequences $S_1$ and $S_2$ are processed in three cases. Starting from the first element 1: if the next 1 lies directly below it, insert the symbol "*" at the current position of $S_1$; if the next 1 lies directly to its right, insert the symbol "*" at the current position of $S_2$; if the next 1 lies diagonally to its lower right, perform no operation at the current positions of $S_1$ and $S_2$.
After the first element of $S_y$ has been processed, the second element is processed according to the same three cases, and so on until all elements of $S_y$ with value 1 have been processed; at this point $S_1$ and $S_2$ are globally aligned in base order;
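The three gap-insertion cases can be sketched as follows. This is one reading of step 3.4), in which every step between consecutive 1-entries consumes one base from one or both sequences; the function name `align_from_matrix` is hypothetical. The matrix `TABLE_2` is the optimal output matrix given in Table 2 of the Example section, and the result agrees with Table 3 there.

```python
def align_from_matrix(s1, s2, s_y, gap="*"):
    """Insert gaps into s1 and s2 by following the 1-entries of s_y."""
    # A row-major scan (left to right, top to bottom) yields the path in order.
    path = [(r, c) for r, row in enumerate(s_y)
            for c, value in enumerate(row) if value == 1]
    out1, out2 = [], []
    p1 = p2 = 0  # index of the next unread base in s1 and s2
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = (r1 - r0, c1 - c0)
        if step == (1, 0):      # next 1 is directly below: gap in s1
            out1.append(gap)
            out2.append(s2[p2])
            p2 += 1
        elif step == (0, 1):    # next 1 is directly to the right: gap in s2
            out1.append(s1[p1])
            out2.append(gap)
            p1 += 1
        elif step == (1, 1):    # next 1 is to the lower right: no gap
            out1.append(s1[p1])
            out2.append(s2[p2])
            p1 += 1
            p2 += 1
        else:
            raise ValueError(f"unexpected step {step} in alignment path")
    return "".join(out1), "".join(out2)


# Optimal output matrix from Table 2 of the example (7 rows, 9 columns).
TABLE_2 = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1],
]

aligned_1, aligned_2 = align_from_matrix("AAGCTCTG", "CAGCAT", TABLE_2)
```

Rows of the matrix advance through $S_2$ and columns advance through $S_1$, so a downward step leaves $S_1$ waiting (gap in $S_1$) and a rightward step leaves $S_2$ waiting (gap in $S_2$).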
(4) Calculate the global similarity of the two DNA base sequences $S_1$ and $S_2$
Define the global similarity of sequences $S_1$ and $S_2$ as $SC(S_1, S_2)$; the global similarity of the two DNA base sequences is then calculated by the following formula:
$$
SC(S_1, S_2) = \frac{2 \times N_{match}}{\mathrm{Len}(S_1) + \mathrm{Len}(S_2)} \times 100\% \tag{9}
$$
where $N_{match}$ denotes the number of successfully matched base pairs after the two DNA base sequences $S_1$ and $S_2$ have been globally aligned, and $\mathrm{Len}(S_1)$ and $\mathrm{Len}(S_2)$ denote the actual lengths of $S_1$ and $S_2$ respectively.
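Formula (9) can be checked numerically on a pair of aligned sequences. The sketch below assumes that a matched base pair is an aligned position holding the same base in both rows and that "*" marks a gap; the function name `global_similarity` is hypothetical. The sample input is the aligned pair from Table 3 of the Example section.

```python
def global_similarity(aligned1, aligned2, gap="*"):
    """Similarity per formula (9), in percent, for two aligned sequences."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    # N_match: aligned positions where both rows hold the same base.
    n_match = sum(1 for a, b in zip(aligned1, aligned2) if a == b and a != gap)
    # Actual lengths exclude the inserted gap symbols.
    len1 = sum(1 for a in aligned1 if a != gap)
    len2 = sum(1 for b in aligned2 if b != gap)
    return 100.0 * 2 * n_match / (len1 + len2)


similarity = global_similarity("*AAGC*TCTG", "CA*GCAT***")
```

Here four positions match and the actual lengths are 8 and 6, so the value is $2 \times 4 / 14$, about 57.14%.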
The object of the invention is achieved as follows:
In the method of the invention for detecting DNA sequence similarity using a one-dimensional cellular neural network, a basic one-dimensional cellular neural network model is first designed, and this model is then used to construct a one-dimensional dual cellular neural network. The network is initialized with the information of the two DNA sequences to be compared; while the network runs, the cell states and outputs at each moment are recorded and assembled into an optimal output matrix. The elements of the optimal output matrix are then traversed to determine the best alignment path. Finally, gaps are inserted into the two sequences according to the alignment path so that the two sequences are globally aligned. After alignment, the global similarity is calculated from the number of aligned bases and the total number of bases. Comparative tests show that, while maintaining detection accuracy, the invention requires markedly less computation time than existing methods for longer DNA sequences.
Brief description of the drawings
Fig. 1 is a schematic diagram of the basic one-dimensional cellular neural network model of the invention;
Fig. 2 is a structural diagram of a single cell in the basic model shown in Fig. 1;
Fig. 3 is a schematic diagram of the one-dimensional dual cellular neural network constructed by the invention;
Fig. 4 is a flow chart of the global base alignment of the invention;
Fig. 5 shows the relative positions of the two subnets in the initial state (t = 0);
Fig. 6 shows the concatenation rule for the cell state matrix and the output matrix of the invention;
Fig. 7 compares the "total number of bases vs. computation time" curves of the three methods.
Detailed description of the embodiments
Specific embodiments of the invention are described below with reference to the drawings so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they would obscure the main content of the invention.
In this embodiment, single cells are connected in a chain, and the resulting basic one-dimensional cellular neural network model is shown in Fig. 1. The cells are numbered successively "..., i-1, i, i+1, ...", where the letter i denotes the sequence number of a cell.
In the basic one-dimensional cellular neural network model, the structure of a single cell is shown in Fig. 2, where $x_i$ denotes the state of cell $i$; $y_i$, $y_{i-1}$ and $y_{i+1}$ denote the feedback outputs that cell $i$ receives from its neighborhood cells $i$, $i-1$ and $i+1$ respectively; $u_i$, $u_{i-1}$ and $u_{i+1}$ denote the inputs that cell $i$ receives from its neighborhood cells $i$, $i-1$ and $i+1$ respectively; $A$ is the feedback template; $B$ is the control template; $I_i$, $R_x$ and $C$ are three constants; and $f(x_i)$ is the output modulation function of the cell state.
In this embodiment, based on the one-dimensional cellular neural network model shown in Fig. 1, a main subnet CNN1 and a secondary subnet CNN2 are first generated, and the two are then used to build the one-dimensional dual cellular neural network shown in Fig. 3.
As shown in Fig. 3, in the one-dimensional dual cellular neural network the main subnet CNN1 is fixed, while the secondary subnet CNN2 can move parallel to CNN1. CNN2 starts moving at $t = 1$; each time $t$ increases by 1, CNN2 moves one step, and the distance of each step equals the distance between two adjacent cells in CNN1. CNN1 consists of cells 0, 1, 2, ..., m-1; with $i$ denoting the cell number, $0 \le i \le m-1$. CNN2 consists of cells 0, 1, 2, ..., n-1; with $j$ denoting the cell number, $0 \le j \le n-1$. In Fig. 3, the characters $u$ and $y$ (subscripted with the cell number) still denote the inputs and outputs of the cells; solid arrows indicate that a connection input exists between cells, and dashed arrows indicate that no connection input exists.
As shown in Fig. 3, in formula (3) $L(i)$ denotes the neighborhood of cell $i$ across CNN1 and CNN2, comprising cell $i$ itself, the previous cell $i-1$ and the next cell $i+1$ in CNN1, and the cell in CNN2 corresponding to cell $i$; these are the four cells shown in black in Fig. 3. $l$ denotes the $l$-th cell in the neighborhood of cell $i$, i.e. $l \in L(i)$.
In this embodiment, the one-dimensional dual cellular neural network built in step (2) is used to perform global base alignment of the two DNA sequences whose similarity is to be detected; the corresponding flow is shown in Fig. 4. The generation of the optimal output matrix from the outputs at each moment and the global alignment are the same as described in the Summary and are not repeated here.
During alignment, when the dual cellular network is initialized, the relationship between CNN1 and CNN2 is as shown in Fig. 5: cell 0 of CNN1 is aligned with cell $K_2$ of CNN2.
The states and outputs of all cells of CNN1 at each moment, calculated according to step 3.2), are concatenated in the order shown in Fig. 6 to obtain the final optimal cell output matrix $S_y$ of CNN1.
The concatenation order of the optimal cell output matrix $S_y$ is:
Example
The implementation of the invention is further described below using two real DNA base sequences from the NCBI database.
The identifiers of the selected DNA base sequences are S62051 and NM_008134. For ease of presentation, only one fragment of each sequence is used to illustrate the procedure; the details of the two fragments are given in Table 1:
DNA base sequence | Number of bases | Fragment code
$S_1$ | 8 | AAGCTCTG
$S_2$ | 6 | CAGCAT
Table 1
The two DNA sequence fragments $S_1$ and $S_2$ in Table 1 contain $K_1 = 8$ and $K_2 = 6$ bases respectively. $S_1$ and $S_2$ are aligned by the four sub-steps of step (3), as follows:
1. Following step 3.1) of step (3), initialize the designed one-dimensional dual cellular neural network:
$m = 8 + 1 = 9$ and $n = 6 + 1 = 7$; in CNN1, $C = 1$, $R_x = 1$ and $I_i = 2$.
$u_1(0) = *$, $u_1(1) = S_1(0) = \mathrm{A}$, $u_1(2) = S_1(1) = \mathrm{A}$, ..., $u_1(8) = S_1(7) = \mathrm{G}$;
$u_2(0) = *$, $u_2(1) = S_2(0) = \mathrm{C}$, $u_2(2) = S_2(1) = \mathrm{A}$, ..., $u_2(6) = S_2(5) = \mathrm{T}$;
2. Following step 3.2) of step (3), run the dual cellular neural network iteratively and, according to formulas (7) and (8), calculate the optimal state and optimal output of each cell of the main subnet CNN1 at times $t = 1, 2, 3, \ldots, 14$;
3. Following step 3.3) of step (3), concatenate the optimal cell outputs at each moment to form the optimal output matrix $S_y$; the resulting $S_y$ is shown in Table 2.
1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0
0 0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 1 1 1
Table 2
4. Following step 3.4) of step (3), globally align the bases of the DNA base sequences $S_1$ and $S_2$; the two sequences after global alignment are shown in Table 3.
$S_1$: * A A G C * T C T G
$S_2$: C A * G C A T * * *
Table 3
5. Using formula (9) in step (4), calculate the global similarity of $S_1$ and $S_2$. From the global alignment result (Table 3), $N_{match} = 4$, $\mathrm{Len}(S_1) = 8$ and $\mathrm{Len}(S_2) = 6$, so the global similarity of the two sequences is $SC(S_1, S_2) = (2 \times 4) \div (8 + 6) \times 100\% = 57.14\%$.
In this embodiment, the method of the invention was also validated on a large number of real DNA sequences from the NCBI database and compared with the prior-art MILP method and SPA method. The main indicators compared were the computation time of each scheme and the global sequence similarity it obtained; the detailed comparison is given in Tables 4 and 5.
Table 4
Table 5
Table 4 compares the sequence similarity obtained by the invention and by the prior art; Table 5 compares the computation time of the invention and of the prior art (unit: milliseconds).
As shown in Tables 4 and 5, the sum of the lengths of the two DNA base sequences in each column increases from left to right. The two tables give, respectively, the similarity and the computation time obtained by the different methods for 6 pairs of DNA base sequences from the NCBI database. The similarity data in Table 4 show that for short sequence pairs (e.g. S62051, length 226, and NM_008134, length 625) the similarity obtained by the method of the invention agrees with that of the other two methods, but as the total sequence length increases (e.g. NG_009301, length 42028, and NM_000405, length 3690) the similarity obtained by the method of the invention is slightly higher than that of the other two methods. The main reason is that, when aligning DNA sequences, the method of the invention obtains more aligned base pairs than the other two methods. The computation time data in Table 5 likewise show that for short sequence pairs (e.g. S62051, length 226, and NM_008134, length 625) the three methods do not differ markedly; the method of the invention takes only about 10 milliseconds less than the other two. As the total sequence length increases (e.g. NG_009301, length 42028, and NM_000405, length 3690), however, the computation time required by the invention is about 33% of that of the SPA method and only about 45% of that of the MILP method.
To show how computation time varies with the total sequence length, the "total number of bases vs. computation time" curve of each of the three methods was plotted. The comparison in Fig. 7 shows that when the total length of the two sequences is small (e.g. less than 5000), the three curves essentially coincide, meaning that the computation times of the three methods do not differ significantly. As the total length of the two DNA sequences continues to increase, the time curves of the SPA and MILP methods climb steeply; the time curve of the invention also climbs, but far more gently than the other two. It follows that, when the total length of the two sequences is large, the computation time required by the invention is markedly lower than that of the SPA and MILP methods.
Although illustrative embodiments of the invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all innovations that make use of the inventive concept are protected.

Claims (1)

1. A method for detecting DNA sequence similarity using a one-dimensional cellular neural network, comprising the following steps:
(1) Design the basic one-dimensional cellular neural network model
Single cells are connected in a chain, and the cells are numbered successively "..., i-1, i, i+1, ...", where the letter i denotes the sequence number of a cell;
In this basic model, the cell state is described by the following system of differential equations:
$$
\begin{cases}
C \dfrac{\partial x_i(t)}{\partial t} = -\dfrac{x_i(t)}{R_x} + A \otimes Y_i(t) + B \otimes U_i(t) + I_i \\
y_i(t) = f(x_i(t))
\end{cases}
\tag{1}
$$
In system (1), $t$ denotes time, $x_i$ denotes the state of cell $i$, $A$ is the feedback template, $B$ is the control template, $I_i$, $R_x$ and $C$ are three constants, and $f(x_i(t))$ is the output modulation function of the cell state; $Y_i(t)$ denotes the neighborhood output matrix of cell $i$, including the cell itself, and $U_i(t)$ denotes the neighborhood input of cell $i$, including the cell itself, expressed as:
$$
Y_i(t) = \begin{bmatrix} y_{i-1}(t) & y_i(t) & y_{i+1}(t) \end{bmatrix}, \qquad
U_i(t) = \begin{bmatrix} u_{i-1}(t) & u_i(t) & u_{i+1}(t) \end{bmatrix}
$$
The output modulation function $f(x_i(t))$ of a cell has the specific form:
$$
y_i(t) = f(x_i(t)) = \frac{1}{2}\left( |x_i(t) + 1| - |x_i(t) - 1| \right) \tag{2}
$$
(2) Construct the one-dimensional dual cellular neural network
Using the one-dimensional cellular neural network model designed in step (1), first generate a main subnet CNN1 and a secondary subnet CNN2, and then build a one-dimensional dual cellular neural network from the two:
In the one-dimensional dual cellular neural network, the main subnet CNN1 is fixed, while the secondary subnet CNN2 can move parallel to CNN1; each time the time $t$ increases by 1, CNN2 moves one step, and the distance of each step equals the distance between two adjacent cells in CNN1; CNN1 consists of cells 0, 1, 2, ..., m-1, and CNN2 consists of cells 0, 1, 2, ..., n-1;
In the one-dimensional dual cellular neural network, let $C = 1$ and $R_x = 1$; the differential equation describing the cell state then simplifies to:
$$
x_i(t+1) = \sum_{l \in L(i)} A \otimes Y_l(t) + \sum_{l \in L(i)} B \otimes U_l(t) + I_i \tag{3}
$$
In formula (3), $L(i)$ denotes the neighborhood of cell $i$ across the main subnet CNN1 and the secondary subnet CNN2, comprising cell $i$ itself, the previous cell $i-1$ and the next cell $i+1$ in CNN1, and the cell in CNN2 corresponding to cell $i$; $l$ denotes the $l$-th cell in the neighborhood of cell $i$, i.e. $l \in L(i)$;
At time $T = t+1$, the output $y_i(t+1)$ of cell $i$ is correspondingly redefined as:
$$
y_i(t+1) = f(x_i(t+1)) = \frac{1}{2}\left( |x_i(t+1) + 1| - |x_i(t+1) - 1| \right) \tag{4}
$$
(3) Using the one-dimensional dual cellular neural network built in step (2), perform global base alignment of the two DNA sequences whose similarity is to be detected;
3.1) Initialization of the dual cellular network
Let the two DNA base sequences to be matched, $S_1$ and $S_2$, contain $K_1$ and $K_2$ bases respectively, with base codes $S_1(k_1)$ and $S_2(k_2)$, where $0 \le k_1 \le K_1 - 1$ and $0 \le k_2 \le K_2 - 1$; the numbers of cells in the main subnet CNN1 and the secondary subnet CNN2 are initialized to $K_1 + 1$ and $K_2 + 1$ respectively, i.e. $m = K_1 + 1$ and $n = K_2 + 1$;
Let $u_1(i)$ and $u_2(j)$ denote the input of the $i$-th cell of CNN1 and of the $j$-th cell of CNN2, with $0 \le i \le K_1$ and $0 \le j \le K_2$; the inputs of the cells of CNN1 and CNN2 are assigned by formula (5) and formula (6) respectively:
where the symbol "*" indicates that the cell input $u$ is set to a null value;
The other constant parameter of CNN1 is initialized as $I_i = 2$; the feedback template $A$ and the control template $B$ used in CNN1 are initialized as the following two constant matrices:
$A = [0 \;\; 1 \;\; 0]$ and $B = [0 \;\; 1 \;\; {-1}]$;
In addition, the initial state (at $t = 0$) of each cell $i$ of CNN1, $i = 0, 1, \ldots, K_1$, is set to $x_i(0) = 0$ and $y_i(0) = 0$; cell 0 of CNN1 is aligned with cell $K_2$ of CNN2;
3.2) Iteratively compute the state and output of each cell of the master subnet CNN1 at each time step
Each time the time t increases by 1, the slave subnet CNN2 moves one step along the master subnet CNN1 in the direction of increasing cell index;
For the master subnet CNN1, if the cell j_l directly below cell i exists, the 3-neighborhood of cell i consists of cells i-1 and i together with the cell j_l of the slave subnet CNN2 directly below i. At each time t (t = 1, 2, ..., m+n-1), when t and the cell index i simultaneously satisfy 1 ≤ t ≤ m+n-1 and 1 ≤ i ≤ m+1, the optimal state and the optimal output of each cell are computed; if the cell j_l directly below cell i does not exist, the optimal state and optimal output of that cell are not computed;
The optimal state and the optimal output are computed by the following formulas:
where the function max(...) returns the maximum of its input arguments, and x_{i-1}(t-2), x_{i-1}(t-1) and x_i(t-1) are all computed by formula (3);
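The sliding of CNN2 under CNN1 can be sketched as a simple index computation. Assuming 0-based cell indices and that one step equals one cell position, the cell of CNN2 directly below cell i of CNN1 at time t is j = i + K_2 - t, which exists only when 0 ≤ j ≤ K_2. The helper name `cell_below` is hypothetical, and the optimal state and output updates themselves are not shown because their formulas are not reproduced in this text.

```python
def cell_below(i, t, K2):
    """Index of the CNN2 cell under CNN1 cell i at time t, or None if there is none.

    Assumes CNN1 cell 0 is above CNN2 cell K2 at t = 0 and that CNN2 shifts
    by one cell toward larger CNN1 indices per time step.
    """
    j = i + K2 - t
    return j if 0 <= j <= K2 else None
```

Under this assumption, cell 1 has a partner exactly for t = 1, ..., n, which agrees with the time window used for column 1 of the output matrix in step 3.3).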
3.3) Form the optimal cell output matrix
From the optimal states and optimal outputs of all cells of the master subnet CNN1 at each time step, computed in step 3.2), form the final optimal cell output matrix S_y of CNN1 as follows: column 1 holds the optimal outputs of cell 1 from t = 1 to n, column 2 holds the optimal outputs of cell 2 from t = 2 to 1+n, ..., and column m holds the optimal outputs of cell m from t = m to m+n-1;
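The column layout of S_y can be sketched as follows. This is only an illustration: it assumes the optimal outputs of step 3.2) are available through a callable `y_opt(i, t)` (a hypothetical placeholder), and that every column holds n consecutive time steps, so that column i covers t = i, ..., i+n-1.

```python
def build_output_matrix(y_opt, m, n):
    """Return S_y as n rows by m columns.

    Column i (1-based) holds y_opt(i, t) for t = i .. i+n-1, so row r
    of column i is the output of cell i at time t = i + r.
    """
    return [[y_opt(i, i + r) for i in range(1, m + 1)] for r in range(n)]
```

With a dummy output function such as `lambda i, t: 100 * i + t`, each entry shows which cell and time step it came from, which makes the diagonal time offset between columns easy to inspect.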
3.4) Perform the global alignment of the bases of the two DNA sequences
Using the optimal output matrix S_y obtained in step 3.3), traverse the matrix starting from its upper-left element, from left to right and from top to bottom, and locate the elements of S_y whose value is 1; connecting these elements in order forms the base alignment path P;
According to the base alignment path P, the DNA base sequences S_1 and S_2 are processed in three cases. Starting from the first element equal to 1: if the next 1 lies directly below it, insert the symbol "*" at the current position of sequence S_1; if the next 1 lies directly to its right, insert the symbol "*" at the current position of sequence S_2; if the next 1 lies at its lower right, perform no operation on the current positions of S_1 and S_2;
After the first element of S_y has been processed, the second element is processed by the same three cases, and so on until all elements of S_y with value 1 have been processed; at that point sequences S_1 and S_2 are globally aligned in base order;
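The path extraction and gap insertion of step 3.4) can be sketched in Python as below. The function names are hypothetical. The sketch assumes that S_y has K_2 + 1 rows and K_1 + 1 columns, that the path is monotone (each step goes down, right, or down-right by one), and that a step consumes the next base of S_2 (down), of S_1 (right), or of both (down-right); the aligned strings are built up rather than edited in place.

```python
def path_from_matrix(S_y):
    """Positions (row, col) of all 1-entries, scanned left to right, top to bottom."""
    return [(r, c) for r, row in enumerate(S_y) for c, v in enumerate(row) if v == 1]


def align_along_path(s1, s2, path, gap="*"):
    """Build the two gapped sequences from consecutive path steps."""
    a1, a2 = [], []
    p1 = p2 = 0                                   # next unread base in s1 / s2
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = (r1 - r0, c1 - c0)
        if step == (1, 0):                        # next 1 is below: gap in S_1
            a1.append(gap)
            a2.append(s2[p2])
            p2 += 1
        elif step == (0, 1):                      # next 1 is to the right: gap in S_2
            a1.append(s1[p1])
            a2.append(gap)
            p1 += 1
        elif step == (1, 1):                      # lower right: pair the two bases
            a1.append(s1[p1])
            a2.append(s2[p2])
            p1 += 1
            p2 += 1
        else:
            raise ValueError(f"non-adjacent path step {step}")
    return "".join(a1), "".join(a2)
```

For S_1 = "ACGT" and S_2 = "AGT", a path that goes down-right, right, down-right, down-right yields the aligned pair "ACGT" and "A*GT".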
(4) Compute the global similarity of the two DNA base sequences S_1 and S_2
Define the global similarity of sequences S_1 and S_2 as SC(S_1, S_2); the global similarity of the two DNA base sequences is then computed by the following formula:
where N_match denotes the number of successfully matched base pairs after the global alignment of the two DNA base sequences S_1 and S_2, and Len(S_1) and Len(S_2) denote the actual lengths of sequences S_1 and S_2 respectively.
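The inputs to the similarity score can be sketched as follows. The helper names are hypothetical, and the sketch assumes that a matched pair is a column of the alignment where both sequences carry the same base, and that the actual length of a sequence excludes inserted "*" symbols. The formula for SC(S_1, S_2) itself is not reproduced in this text, so the sketch stops at N_match and the two lengths.

```python
def count_matches(a1, a2, gap="*"):
    """N_match: aligned positions where both sequences hold the same base."""
    if len(a1) != len(a2):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for b1, b2 in zip(a1, a2) if b1 == b2 and b1 != gap)


def actual_length(aligned, gap="*"):
    """Len(S): number of bases, ignoring inserted gap symbols."""
    return sum(1 for b in aligned if b != gap)
```

For the aligned pair "ACGT" and "A*GT" this gives N_match = 3, with lengths 4 and 3.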
CN201310552472.3A 2013-11-08 2013-11-08 A kind of one-dimensional cell neural network detects the method for DNA sequence dna similarity Expired - Fee Related CN103544406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310552472.3A CN103544406B (en) 2013-11-08 2013-11-08 A kind of one-dimensional cell neural network detects the method for DNA sequence dna similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310552472.3A CN103544406B (en) 2013-11-08 2013-11-08 A kind of one-dimensional cell neural network detects the method for DNA sequence dna similarity

Publications (2)

Publication Number Publication Date
CN103544406A CN103544406A (en) 2014-01-29
CN103544406B true CN103544406B (en) 2016-03-23

Family

ID=49967850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310552472.3A Expired - Fee Related CN103544406B (en) 2013-11-08 2013-11-08 A kind of one-dimensional cell neural network detects the method for DNA sequence dna similarity

Country Status (1)

Country Link
CN (1) CN103544406B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224826A (en) * 2015-09-07 2016-01-06 云南大学 A kind of DNA sequence dna similarity analysis method based on S-PCNN and huffman coding
CN107291690B (en) * 2017-05-26 2020-10-27 北京搜狗科技发展有限公司 Punctuation adding method and device and punctuation adding device
CN110222745B (en) * 2019-05-24 2021-04-30 中南大学 Similarity learning based and enhanced cell type identification method
CN110321722B (en) * 2019-07-08 2021-11-09 济南大学 DNA sequence similarity safe calculation method and system
CN111027570B (en) * 2019-11-20 2022-06-14 电子科技大学 Image multi-scale feature extraction method based on cellular neural network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0775972A2 (en) * 1995-11-27 1997-05-28 Canon Kabushiki Kaisha Digital image processor comprising a neural network
CN103020933A (en) * 2012-12-06 2013-04-03 天津师范大学 Multi-source image fusion method based on bionic visual mechanism
CN103218544A (en) * 2013-04-03 2013-07-24 河海大学 Gene identification method based on sequence similarity and periodicity of frequency spectrum 3

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Feature selection and parameter optimization for support vector machines:A new approach based on genetic algorithm with feature chromosomes;Mingyuan Zhao等;《Expert Systems with Applications》;20110531;第38卷(第5期);第5197-5204页 *
Global Asymptotic Stability of a General Class of Recurrent Neural Networks With Time-Varying Delays;Jinde Cao等;《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: FUNDAMENTAL THEORY AND APPLICATIONS》;20030131;第50卷(第1期);第34-44页 *
One-Dimensional Discrete-Time CNN with Multiplexed Template-Hardware;Gabriele Manganaro等;《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: FUNDAMENTAL THEORY AND APPLICATIONS》;20000531;第47卷(第5期);第764-769页 *
Sufficient Conditions for One-Dimensional Cellular Neural Networks to Perform Connected Component Detection;Norikazu Takahashi等;《Nonlinear Analysis: Real World Applications》;20101031;第11卷(第5期);第4202-4213页 *

Also Published As

Publication number Publication date
CN103544406A (en) 2014-01-29

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160323

Termination date: 20191108