CN108664768A - Protein classification method based on SAT and OBDD bucket elimination - Google Patents
- Publication number
- CN108664768A CN201810463426.9A
- Authority
- CN
- China
- Prior art keywords
- obdd
- variable
- constraint
- clause
- sat
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Landscapes
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Epidemiology (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Bioethics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Public Health (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention discloses a protein classification method based on SAT and OBDD bucket elimination. It models the problem as a Boolean satisfiability (SAT) problem and solves it with the symbolic algorithms of ordered binary decision diagrams (OBDDs) combined with bucket elimination. The method first builds a SAT model from the constraint relations among element positions in a candidate pattern and from a cardinality constraint. It then solves the model using OBDD symbolic techniques and their symbolic operations together with bucket elimination, and applies the solving technique to protein classification, where the characteristic information extracted from proteins is analyzed to classify them effectively. Aimed at protein classification, the invention studies the protein problem by solving the frequent sequence mining problem of pattern mining. During execution the algorithm effectively reduces the search space and improves solving efficiency, and it has good practicability.
Description
Technical field
The present invention relates to the technical fields of data mining and symbolic techniques, and in particular to a protein classification method based on SAT and OBDD bucket elimination.
Background art
Many medically important pathogenic bacteria have an additional "outer" membrane surrounding the cell. The proteins residing in this membrane (outer membrane proteins, OMPs) are major targets of antibiotic and vaccine design: because they are located on the bacterial surface, they are the most accessible targets for new drug development. With advances in genome sequencing and bioinformatics, biologists can now infer all the proteins that a given bacterium may produce and attempt to classify them by their location in the bacterial cell. However, current protein localization programs are least accurate when predicting OMPs, so better OMP classifiers need to be developed.
Data mining research has shown that frequent patterns perform well in helping to develop accurate and efficient classification algorithms. Many schemes that use frequently occurring itemsets for classification have been studied, as have techniques for mining frequent subsequences that satisfy user-specified constraints. These constraints are intended to select a subset of the frequent subsequences as classification features, so that features can be mined efficiently and a classifier can be built.
In practice, however, mining frequent sequences efficiently and accurately still faces many problems, and time and space complexity in particular remain to be improved. To solve the sequential pattern mining problem effectively, many researchers have adopted the basic idea of Boolean satisfiability (SAT): the pattern mining problem is converted into a satisfiability model, and the mining problem is solved by analyzing and solving the SAT model. The Boolean satisfiability problem asks whether a formula of classical propositional logic is consistent, and it is one of the most studied NP-complete decision problems, so solving it is inevitably limited by combinatorial complexity.
Summary of the invention
The problem to be solved by the present invention is that solving the sequential pattern mining problem in combination with the protein classification problem is inevitably limited by combinatorial complexity. To address it, the invention provides a protein classification method based on SAT and OBDD bucket elimination.
To solve the above problem, the present invention is realized by the following technical solution:
A protein classification method based on SAT and OBDD bucket elimination, comprising the following steps:
Step 1: given a protein sequence and a minimum support threshold, represent the positions of elements in a candidate pattern with propositional variables, derive the support of the candidate pattern with a cardinality constraint, and establish the SAT model of protein classification;
Step 2: convert all constraint clauses of the SAT model established in step 1 into Boolean function expressions and, using OBDD operations and reduction rules, represent all constraint clauses as OBDDs, obtaining the symbolic OBDD representation of the SAT model;
Step 3: based on the symbolic OBDD representation of the SAT model obtained in step 2, solve the SAT model through OBDD symbolic operations on the basis of bucket elimination.
In step 1 above, the SAT model comprises the following three constraints:
First constraint: the first symbol must be a solid (fixed) character;
Second constraint: a constraint composed of binary clauses captures the positions at which the candidate pattern does not occur;
Third constraint: when enumerating all patterns that are frequent with respect to the minimum support threshold λ, the candidate pattern must occur at least λ times.
The third constraint is expressed as a cardinality constraint.
The above step 4 is specifically as follows:
Step 4.1: sort the variables in all constraint clauses of the second constraint of the SAT model in ascending order of the number of constraint relations between each variable and the other variables, obtaining the variable order π: y_0 < y_1 < ... < y_{n-1};
Step 4.2: for all constraint clauses of the second constraint of the SAT model, when variable y_i is the variable with the smallest position in the variable order within the scope of constraint clause c_j, merge c_j into the OBDD variable bucket[y_i];
Step 4.3: based on the variable order π, eliminate the variables in all constraint clauses of the second constraint of the SAT model, namely:
Step 4.3.1: eliminate variable y_0 from the OBDD variable bucket[y_0] using the OBDD quantification operation, obtaining a new constraint clause g_0; variable y_1 is now the variable with the smallest position in the variable order in g_0, so g_0 is added to the OBDD variable bucket[y_1];
Step 4.3.2: after variable y_0 has been eliminated, eliminate variable y_1 from the OBDD variable bucket[y_1] using the OBDD quantification operation, obtaining a new constraint clause g_1; variable y_2 is now the variable with the smallest position in the variable order in g_1, so g_1 is added to the OBDD variable bucket[y_2];
And so on;
Step 4.3.n-1: after variable y_{n-3} has been eliminated, eliminate variable y_{n-2} from the OBDD variable bucket[y_{n-2}] using the OBDD quantification operation, obtaining a new constraint clause g_{n-2}; only variable y_{n-1} now remains, so g_{n-2} is added to the OBDD variable bucket[y_{n-1}];
Step 4.4: starting from the OBDD variable bucket[y_{n-1}] and ending at the OBDD variable bucket[y_0], that is, in the reverse of the variable order, conjoin the OBDD variables bucket[y_i] one by one; the OBDD finally obtained represents all solutions satisfying the second constraint;
Step 4.5: apply the OBDD AND operation to all constraint clauses of the first constraint, the solutions satisfying the second constraint, and all constraint clauses of the third constraint; the resulting OBDD represents all solutions of the SAT model satisfying every constraint, which completes the protein classification;
where i = 0, 1, ..., n-1, n is the number of variables in the second constraint, j = 1, 2, ..., m, and m is the number of constraint clauses in the second constraint.
In steps 4.2 and 4.3 above, constraints are merged into the OBDD variables by the OBDD AND operation.
Compared with the prior art, the present invention uses OBDD (ordered binary decision diagram) techniques and exploits the advantages of their operations. Based on symbolic constraint solving for frequent sequence mining, it applies a symbolic sequence mining algorithm to protein classification in bioinformatics, addressing the problem that existing approaches combining pattern mining with protein classification are inevitably limited by combinatorial complexity. In addition, because the invention incorporates bucket elimination, the intermediate OBDDs generated during computation may be smaller than those of direct symbolic solving. Since the running time of OBDD operations depends mainly on the sizes of the operand OBDDs, the algorithm can improve solving efficiency to some extent and outperforms the direct symbolic solving algorithm.
Description of the drawings
Fig. 1 is the flow chart of the protein classification method based on SAT and OBDD bucket elimination.
Fig. 2a shows the deletion rule among the OBDD reduction rules.
Fig. 2b shows the merging rule among the OBDD reduction rules.
Fig. 2c is the OBDD representation of the Boolean function f = (x_1 + x_2)·x_3.
Fig. 3a is the OBDD corresponding to the constraint clause of the first constraint: x_{a,0} ∨ x_{b,0}.
Fig. 3b is the OBDD corresponding to the constraint clause x_{a,0} → (y_2 ∧ y_3) of the second constraint.
Fig. 3c is the OBDD corresponding to the constraint clause x_{a,1} → (y_1 ∧ y_2 ∧ y_3) of the second constraint.
Fig. 3d is the OBDD corresponding to the constraint clause x_{b,0} → (y_0 ∧ y_1) of the second constraint.
Fig. 3e is the OBDD corresponding to the constraint clause x_{b,1} → (y_0 ∧ y_3) of the second constraint.
Fig. 3f is the OBDD corresponding to the constraint clause x_{b,2} → (y_2 ∧ y_3) of the second constraint.
Fig. 3g is the OBDD corresponding to the constraint clause of the third constraint: y_0 + y_1 + y_2 + y_3 ≤ 2.
Fig. 4 is the OBDD representation of the first constraint ANDed with the second constraint.
Fig. 5 is the OBDD representation of all solutions of the SAT model obtained by this method.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in more detail below with reference to a specific example and the accompanying drawings.
Boolean satisfiability, one of many complex problems in artificial intelligence and computer science, asks whether a formula of classical propositional logic is consistent, and it is one of the most studied NP-complete decision problems. SAT algorithms have been studied extensively and in depth, and many mature, efficient algorithms and SAT solvers exist. Because SAT problems are in general NP-hard, however, solving them is inevitably limited by combinatorial complexity, and for this reason symbolic techniques are introduced. Bucket elimination is introduced as well, to improve the solving efficiency of the symbolic techniques.
Ordered binary decision diagrams (OBDDs) and their extensions can represent and search state spaces or variable combinations implicitly, which mitigates or partly avoids combinatorial complexity; they are among the most effective symbolic techniques to date. An OBDD gives an efficient, canonical description of a Boolean function, and the OBDD data structure greatly simplifies complex operations on Boolean functions. Its main strengths are compactness and ease of manipulation: all operations are carried out on sets, which can alleviate the state explosion problem to some extent. Applying symbolic algorithms to the frequent sequence mining problem can therefore improve solving efficiency to some degree and reduce state-space complexity.
The present invention concerns the solution of pattern mining problems, more specifically a frequent sequence mining technique based on SAT and OBDD bucket elimination, and the application of that technique to protein classification in bioinformatics. The invention considers the positions of elements in a protein candidate pattern, uses a Boolean satisfiability model, solves it by combining symbolic techniques with bucket elimination, and illustrates the approach with a simple example. Concretely, the technique builds a SAT model from the constraint relations among element positions in a candidate pattern and from a cardinality constraint, then uses OBDD symbolic techniques and their symbolic operations together with bucket elimination to solve the frequent sequence mining problem, and finally applies the technique to the protein classification problem and solves it.
Referring to Fig. 1, a protein classification method based on SAT and OBDD bucket elimination comprises the following steps:
Step 1. Following the modeling approach of Boolean satisfiability, represent the positions of elements in a candidate pattern with propositional variables, derive the support of the candidate pattern with a cardinality constraint, and establish the SAT model of frequent pattern mining in a protein sequence, i.e. of protein classification.
Frequent pattern: given a sequence s, a pattern x and a minimum support threshold λ ≥ 1, x is a frequent pattern in s with respect to the support threshold λ if x occurs at no fewer than λ positions of s. The frequent pattern mining problem for a sequence consists of computing the set of all patterns that are frequent with respect to λ.
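The definition can be made concrete with a minimal sketch that counts occurrences directly. It is only an illustration under stated assumptions, not the SAT-based method of the invention: the function names `support` and `is_frequent` are hypothetical, the wildcard is written `*`, and an occurrence is assumed to lie entirely inside the sequence.

```python
def support(s, pattern, wildcard="*"):
    """Count the start positions l at which `pattern` matches the sequence `s`.

    A wildcard in the pattern matches any character of s.
    """
    n, m = len(s), len(pattern)
    return sum(
        all(p == wildcard or s[l + i] == p for i, p in enumerate(pattern))
        for l in range(n - m + 1)
    )


def is_frequent(s, pattern, lam):
    """A pattern is frequent when it occurs at no fewer than `lam` positions."""
    return support(s, pattern) >= lam


if __name__ == "__main__":
    for p in ["a", "b", "a*b", "aa", "ab"]:
        print(p, support("aabb", p), is_frequent("aabb", p, 2))
```

On the sequence aabb with λ = 2, which is the example used later in this description, this direct count marks a, b and a*b as frequent.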
Let Σ = {e_1, ..., e_m} be the alphabet, s a sequence over Σ of length n, and λ the minimum support threshold. Each character e occurring in the protein sequence s is associated with a set of k_e propositional variables x_{e,0}, ..., x_{e,k_e-1}. The variable x_{e,i} refers to character e at index i of the candidate pattern: x_{e,i} = 1 means that e is in the candidate pattern at index i, and x_{e,i} = 0 means that e is not at index i of the pattern. These indices correspond to all possible positions of e in a candidate pattern, so only this limited set of variables needs to be associated with each character e, which reduces the number of constraints and speeds up solving.
First constraint (constraint 1): the first symbol must be a solid character (i.e. not the wildcard).
Second constraint (constraint 2): a constraint composed of binary clauses captures the positions at which the candidate pattern does not occur. Here y_0, ..., y_{n-1} are n new propositional variables, and y_l = 1 whenever the candidate pattern does not occur at position l of s. By a standard equivalence of classical propositional logic, this constraint can be viewed as a set of binary clauses (the expression s_{l+i} ≠ e is a constant, i.e. s_{l+i} ≠ e ∈ {0,1}).
Third constraint (constraint 3): to enumerate all patterns in s that are frequent with respect to the support threshold λ, the model must state that the candidate pattern occurs at least λ times. This property is expressed by a cardinality constraint over y_0, ..., y_{n-1}. If the constraint is violated, there are at least n-λ+1 positions at which the candidate pattern does not occur, so it occurs at no more than λ-1 positions and is not frequent. Otherwise the candidate pattern occurs at no fewer than λ positions, i.e. the pattern is frequent. The constraint thus derives the support of the candidate pattern under consideration and decides whether it reaches the minimum support threshold.
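The formulas of the three constraints are not reproduced in this text. The following LaTeX is a reconstruction, offered as an assumption rather than as the original notation. It was chosen to agree with the worked clauses (1) to (7) of the aabb example in this description and with the statement that a violated third constraint leaves at least n-λ+1 positions without an occurrence. A position l+i beyond the end of s is assumed to count as a mismatch.

```latex
% Constraint 1: some solid character occupies index 0 of the candidate pattern
\bigvee_{e \in \Sigma} x_{e,0}

% Constraint 2: if e sits at index i, the pattern cannot occur at any position l
% whose character s_{l+i} differs from e (or lies beyond the end of s)
\bigwedge_{e \in \Sigma} \; \bigwedge_{i=0}^{k_e - 1}
  \Bigl( x_{e,i} \rightarrow \bigwedge_{\substack{0 \le l \le n-1 \\ s_{l+i} \ne e}} y_l \Bigr)

% Constraint 3: at most n - \lambda positions are without an occurrence
\sum_{l=0}^{n-1} y_l \le n - \lambda
```

For s = aabb and λ = 2 the third line gives y_0 + y_1 + y_2 + y_3 ≤ 2, which is clause (7) of the example.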
The problem of enumerating all frequent patterns in a given protein sequence is expressed by these three constraints: the first, second and third constraints.
In this embodiment, step 1 comprises the following sub-steps:
Step 1A. The protein sequence aabb and a minimum support threshold of 2 are given; the goal is to mine the frequent sequence patterns in the given sequence.
Step 1B. Enumerating the first constraint (the first symbol must be a solid character, not the wildcard) for the given sequence yields its constraint clause:
x_{a,0} ∨ x_{b,0} (1)
Step 1C. Compute k_e from the given sequence and enumerate all constraint clauses of the second constraint, which capture the positions at which the candidate pattern does not occur:
x_{a,0} → (y_2 ∧ y_3) (2)
x_{a,1} → (y_1 ∧ y_2 ∧ y_3) (3)
x_{b,0} → (y_0 ∧ y_1) (4)
x_{b,1} → (y_0 ∧ y_3) (5)
x_{b,2} → (y_2 ∧ y_3) (6)
Step 1D. Enumerating the third constraint for the given sequence, which requires the candidate pattern to occur at least λ times, yields its constraint clause:
y_0 + y_1 + y_2 + y_3 ≤ 2 (7)
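Clauses (2) to (7) can be regenerated mechanically from the sequence, which makes a useful check on the example. The sketch below is an illustration under assumptions: the helper names are hypothetical, the list of (character, index) variables is copied from the example rather than derived from k_e, and a position that runs past the end of the sequence is treated as a mismatch.

```python
def absent_positions(s, e, i):
    """Positions l at which a pattern holding character e at index i cannot occur."""
    n = len(s)
    return [l for l in range(n) if l + i >= n or s[l + i] != e]


def second_constraint(s, variables):
    """Map each variable (e, i) to the y indices that it implies."""
    return {(e, i): absent_positions(s, e, i) for e, i in variables}


def cardinality_bound(s, lam):
    """Right-hand side of the third constraint: y_0 + ... + y_{n-1} <= n - lam."""
    return len(s) - lam


# Variables taken from the worked example (an assumption, not derived from k_e).
EXAMPLE_VARS = [("a", 0), ("a", 1), ("b", 0), ("b", 1), ("b", 2)]

if __name__ == "__main__":
    for (e, i), ys in second_constraint("aabb", EXAMPLE_VARS).items():
        print(f"x_{{{e},{i}}} -> " + " & ".join(f"y_{l}" for l in ys))
    print("sum(y) <=", cardinality_bound("aabb", 2))
```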
Step 2. Convert the constraint clauses in the SAT model of the frequent protein sequence mining problem into Boolean functions and, using OBDD operations and reduction rules, create the symbolic OBDD representation of all constraint clauses.
Starting from the SAT model established for the frequent sequence mining problem, the constraint clauses are rewritten as Boolean functions and the model is then described symbolically with OBDDs. OBDDs have the following two reduction rules. Rule 1 (deletion rule): as shown in Fig. 2a, for a node u in the OBDD, if low(u) = high(u), delete node u and redirect all of its incoming edges to the node low(u). Rule 2 (merging rule): as shown in Fig. 2b, for nodes u and v in the OBDD, if var(u) = var(v), low(u) = low(v) and high(u) = high(v), delete one of the two nodes and redirect all incoming edges of the deleted node to the retained node. This rule applies equally to terminal nodes with the same label.
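The two reduction rules can be illustrated with a small node manager that applies them whenever a node is created. This is a teaching sketch, not the OBDD package used by the invention: the class `OBDD` and its methods are hypothetical, and the diagram is built by exhaustive Shannon expansion, which is exponential and only suitable for tiny functions.

```python
from itertools import product


class OBDD:
    """Tiny reduced OBDD: nodes are integers, 0 and 1 are the terminal nodes."""

    def __init__(self, order):
        self.order = list(order)   # variable order, smallest first
        self.nodes = {}            # node id -> (var, low, high)
        self.unique = {}           # (var, low, high) -> node id

    def mk(self, var, low, high):
        if low == high:                  # rule 1 (deletion): node is redundant
            return low
        key = (var, low, high)
        if key not in self.unique:       # rule 2 (merging): share equal nodes
            node = len(self.nodes) + 2
            self.nodes[node] = key
            self.unique[key] = node
        return self.unique[key]

    def build(self, f, i=0, env=None):
        """Shannon expansion of the predicate f along the variable order."""
        env = env or {}
        if i == len(self.order):
            return int(bool(f(env)))
        var = self.order[i]
        low = self.build(f, i + 1, {**env, var: 0})
        high = self.build(f, i + 1, {**env, var: 1})
        return self.mk(var, low, high)

    def evaluate(self, node, env):
        """Follow one path from `node` to a terminal under the assignment env."""
        while node > 1:
            var, low, high = self.nodes[node]
            node = high if env[var] else low
        return node


def agrees(bdd, root, f):
    """Check the diagram against f on every assignment."""
    envs = (dict(zip(bdd.order, bits))
            for bits in product((0, 1), repeat=len(bdd.order)))
    return all(bdd.evaluate(root, env) == int(bool(f(env))) for env in envs)


# f = (x1 + x2) . x3 under the order x1 < x2 < x3
bdd = OBDD(["x1", "x2", "x3"])
f = lambda e: (e["x1"] or e["x2"]) and e["x3"]
root = bdd.build(f)
```

For f = (x_1 + x_2)·x_3 under the order x_1 < x_2 < x_3, the manager keeps three internal nodes, one per variable, because the deletion rule removes the x_2 test on the x_1 = 1 branch and the merging rule shares the single x_3 node.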
An OBDD is a representation of a Boolean function; for example, Fig. 2c shows the OBDD of the Boolean function f = (x_1 + x_2)·x_3 under the variable order x_1 < x_2 < x_3. To describe the SAT model established in step 1 symbolically with OBDDs, the constraint clauses of the model must therefore first be converted into Boolean functions.
In this embodiment, step 2 comprises the following sub-steps:
Step 2A. The ordered binary decision diagram (OBDD) is an extended form of the binary decision diagram. The symbolic OBDD used in the present invention is described with reference to the drawings: Fig. 2c shows the OBDD representation of the Boolean function f = (x_1 + x_2)·x_3 under the variable order π: x_1 < x_2 < x_3.
Step 2B. The constraint clauses of the example can be expressed as Boolean functions. Clauses (2)-(6) become x_{a,0}' + y_2·y_3; x_{a,1}' + y_1·y_2·y_3; x_{b,0}' + y_0·y_1; x_{b,1}' + y_0·y_3; x_{b,2}' + y_2·y_3, where "·", "'" and "+" denote Boolean AND, NOT and OR respectively.
Step 2C. From these Boolean functions, the OBDD representations of constraint clauses (2)-(6) are obtained; they correspond to Fig. 3b to Fig. 3f respectively.
Step 2D. The constraint clause of the first constraint and that of the third constraint can be expressed directly as OBDDs, corresponding to Fig. 3a and Fig. 3g respectively.
Step 3. Based on the symbolic OBDD description of the SAT model, solve it through OBDD symbolic operations on the basis of bucket elimination.
Building on the symbolic OBDD representation of the constraints described in step 2, the present invention provides a symbolic OBDD technique combined with bucket elimination to solve the SAT model of frequent sequence mining. In essence, bucket elimination solves a problem through two core operations: a combination (sum) operation and a variable elimination operation. In solving a constraint satisfaction problem (CSP), these two operations correspond to the join and the projection of relations, which can be implemented simply by the OBDD AND operation and the OBDD quantification operation respectively.
In this embodiment, step 3 comprises the following sub-steps:
Step 3A. Sort the variables y_l in all constraint clauses of the second constraint of the SAT model in ascending order of the number of constraint relations between each variable and the other variables. For the example, sorting in ascending order of degree gives the variable order π: y_0 < y_1 < y_2 < y_3. The constraint clauses of the SAT model are then grouped according to this order: constraint (4) x_{b,0} → (y_0 ∧ y_1) and constraint (5) x_{b,1} → (y_0 ∧ y_3) form one group; constraint (3) x_{a,1} → (y_1 ∧ y_2 ∧ y_3) forms one group; constraint (2) x_{a,0} → (y_2 ∧ y_3) and constraint (6) x_{b,2} → (y_2 ∧ y_3) form one group.
Step 3B. Because y_0 is the variable with the smallest position in the variable order within the scopes of constraint clauses (4) and (5), these clauses are merged into the OBDD variable bucket[y_0] by the OBDD AND operation. Likewise, because y_2 is the smallest variable in the order within the scopes of constraint clauses (2) and (6), these clauses are merged into the OBDD variable bucket[y_2] by the OBDD AND operation.
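Steps 3A and 3B can be sketched as follows. Two points are assumptions made for the illustration: the "number of constraint relations" of a variable is taken to be the number of second-constraint clauses that mention it, and ties are broken by variable index. The names `CLAUSE_Y`, `variable_order` and `assign_buckets` are hypothetical.

```python
# y variables mentioned by each second-constraint clause of the example,
# keyed by the clause number used in the text.
CLAUSE_Y = {2: [2, 3], 3: [1, 2, 3], 4: [0, 1], 5: [0, 3], 6: [2, 3]}


def variable_order(clause_y, n):
    """Sort y_0..y_{n-1} by ascending degree (clauses mentioning the variable)."""
    degree = {v: sum(v in ys for ys in clause_y.values()) for v in range(n)}
    # sorted() is stable, so variables of equal degree keep their index order.
    return sorted(range(n), key=lambda v: degree[v])


def assign_buckets(clause_y, order):
    """Put each clause into the bucket of its earliest variable in the order."""
    rank = {v: r for r, v in enumerate(order)}
    buckets = {v: [] for v in order}
    for clause_id, ys in clause_y.items():
        first = min(ys, key=lambda v: rank[v])
        buckets[first].append(clause_id)
    return buckets


if __name__ == "__main__":
    order = variable_order(CLAUSE_Y, 4)
    print(order)
    print(assign_buckets(CLAUSE_Y, order))
```

Under these assumptions the sketch reproduces the grouping stated in the text: clauses (4) and (5) in bucket[y_0], clause (3) in bucket[y_1], clauses (2) and (6) in bucket[y_2], and an initially empty bucket[y_3].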
Step 3C. Based on the variable order π, eliminate the variables of the SAT model.
Eliminate variable y_0 from bucket[y_0] using the OBDD quantification operation, obtaining a new constraint clause g_0. Variable y_1 is now the smallest variable in the order in g_0, so g_0 is added to bucket[y_1], i.e.:
bucket[y_1] = bucket[y_1]·g_0,
After y_0 has been eliminated, eliminate y_1 from bucket[y_1] in the same way, and continue until y_2 has been eliminated. Finally, bucket[y_3] gives the values of variable y_3 that satisfy all constraints;
Step 3D. Starting from variable y_3, conjoin the OBDD variables bucket[y_i] (i = 3, 2, 1, 0) one by one in the reverse of the variable order π: first conjoin bucket[y_3] into bucket[y_2] to obtain a new bucket[y_2]; then conjoin the new bucket[y_2] into bucket[y_1] to obtain a new bucket[y_1]; finally conjoin the new bucket[y_1] into bucket[y_0] to obtain a new bucket[y_0]. This new OBDD variable bucket[y_0] is the OBDD representation of all solutions satisfying the second constraint;
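The elimination pass of step 3C and the backward conjunction of step 3D can be sketched on the example as follows. To stay self-contained the sketch swaps in explicit tables of satisfying assignments for OBDDs, so `conjoin` stands in for the OBDD AND operation and `exists` for the OBDD quantification operation. All names are hypothetical, and a derived clause with no remaining y variable is assumed to go into the last bucket.

```python
from itertools import product

TRUE = ((), {()})  # empty scope with one empty row: the constant true


def relation(scope, pred):
    """Table of the assignments over `scope` that satisfy `pred`."""
    rows = {t for t in product((0, 1), repeat=len(scope))
            if pred(dict(zip(scope, t)))}
    return tuple(scope), rows


def conjoin(r1, r2):
    """AND of two relations (natural join); stands in for the OBDD AND."""
    (s1, t1), (s2, t2) = r1, r2
    scope = s1 + tuple(v for v in s2 if v not in s1)
    rows = set()
    for t in product((0, 1), repeat=len(scope)):
        env = dict(zip(scope, t))
        if tuple(env[v] for v in s1) in t1 and tuple(env[v] for v in s2) in t2:
            rows.add(t)
    return scope, rows


def exists(r, var):
    """Existential quantification of `var` (projection onto the other variables)."""
    scope, rows = r
    keep = [k for k, v in enumerate(scope) if v != var]
    return tuple(scope[k] for k in keep), {tuple(t[k] for k in keep) for t in rows}


def implies(x, ys):
    return lambda env: (not env[x]) or all(env[y] for y in ys)


CLAUSES = [
    relation(("xa0", "y2", "y3"), implies("xa0", ("y2", "y3"))),              # (2)
    relation(("xa1", "y1", "y2", "y3"), implies("xa1", ("y1", "y2", "y3"))),  # (3)
    relation(("xb0", "y0", "y1"), implies("xb0", ("y0", "y1"))),              # (4)
    relation(("xb1", "y0", "y3"), implies("xb1", ("y0", "y3"))),              # (5)
    relation(("xb2", "y2", "y3"), implies("xb2", ("y2", "y3"))),              # (6)
]
ORDER = ["y0", "y1", "y2", "y3"]


def first_in_order(scope, order):
    """Earliest variable of `order` in `scope`; the last bucket as a fallback."""
    return next((v for v in order if v in scope), order[-1])


def bucket_eliminate(clauses, order):
    buckets = {v: [] for v in order}
    for r in clauses:
        buckets[first_in_order(r[0], order)].append(r)
    merged = {}
    for k, v in enumerate(order):            # forward pass (step 3C)
        b = TRUE
        for r in buckets[v]:
            b = conjoin(b, r)
        merged[v] = b
        if k + 1 < len(order):               # the last variable is not eliminated
            g = exists(b, v)
            buckets[first_in_order(g[0], order[k + 1:])].append(g)
    result = TRUE
    for v in reversed(order):                # backward conjunction (step 3D)
        result = conjoin(result, merged[v])
    return result


def as_assignments(r):
    """Scope-independent view of a relation, for comparing results."""
    scope, rows = r
    return {frozenset(zip(scope, t)) for t in rows}
```

Each derived clause g_i is implied by the bucket it came from, so the backward conjunction yields the same solution set as conjoining clauses (2) to (6) directly. Any saving comes from the smaller intermediate results, not from a different answer.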
Step 3E. Apply the OBDD AND operation to all constraint clauses of the first constraint, the solutions satisfying the second constraint, and all constraint clauses of the third constraint. The resulting OBDD represents all solutions of the SAT model satisfying every constraint, that is, all patterns in the given protein sequence that are frequent with respect to the minimum support threshold λ. The OBDD obtained by ANDing the first and second constraints is shown in Fig. 4, and the OBDD of all SAT solutions obtained by this method is shown in Fig. 5. In the figure, on any path from the root node to terminal node 1, the variables with value 1 form a solution of the SAT model, and variables missing from the path may take either value 0 or 1. The paths x_{a,0}' x_{b,0} x_{a,1}' x_{b,1}' x_{b,2}' y_0 y_1 and x_{a,0} x_{b,0}' x_{a,1}' x_{b,1}' y_2 y_3 represent the frequent patterns {x_{b,0}} and {{x_{a,0}}, {x_{a,0}, x_{b,2}}} respectively, corresponding to the patterns b and a, a*b.
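The result of the example can be cross-checked by enumerating every assignment of constraints (1) to (7) by brute force. This check is not part of the described method. The names below are hypothetical, the clause data is copied from the example, and the wildcard is written `*`.

```python
from itertools import product

X_VARS = [("a", 0), ("a", 1), ("b", 0), ("b", 1), ("b", 2)]
# Clauses (2)-(6): each x variable implies the listed y variables.
ABSENT = {("a", 0): [2, 3], ("a", 1): [1, 2, 3],
          ("b", 0): [0, 1], ("b", 1): [0, 3], ("b", 2): [2, 3]}
N, LAM = 4, 2  # sequence length of aabb and minimum support threshold


def satisfies(x, y):
    if not (x[("a", 0)] or x[("b", 0)]):               # clause (1)
        return False
    for var, positions in ABSENT.items():              # clauses (2)-(6)
        if x[var] and not all(y[l] for l in positions):
            return False
    return sum(y) <= N - LAM                           # clause (7)


def decode(x):
    """Turn the x variables set to 1 into a pattern string with wildcards."""
    chosen = {i: e for (e, i), bit in x.items() if bit}
    return "".join(chosen.get(i, "*") for i in range(max(chosen) + 1))


def frequent_patterns():
    found = set()
    for xbits in product((0, 1), repeat=len(X_VARS)):
        x = dict(zip(X_VARS, xbits))
        for y in product((0, 1), repeat=N):
            if satisfies(x, y):
                found.add(decode(x))
    return sorted(found)


if __name__ == "__main__":
    print(frequent_patterns())
```

The enumeration returns a, a*b and b, the same three patterns read off the paths of Fig. 5.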
Step 3F. The above procedure solves the frequent sequence mining problem: it yields all frequent sequences in the given sequence that meet the given minimum support threshold.
Applying the OBDD AND operation to all constraint clauses of the first constraint, the solutions satisfying the second constraint, and all constraint clauses of the third constraint yields an OBDD that represents all solutions of the SAT model satisfying every constraint, that is, all patterns in the given protein sequence that are frequent with respect to the minimum support threshold λ. In Fig. 5, on any path from the root node to terminal node 1, the variables with value 1 form a solution of the SAT model, and variables missing from the path may take either value 0 or 1. The frequent sequential patterns in the protein sequence are thus obtained by the above steps.
Having solved the frequent sequence mining problem as above, the present invention applies the frequent sequence mining technique based on SAT and OBDD bucket elimination to protein classification in bioinformatics. A protein is composed of a sequence of amino acids and can be abstracted as a sequence; a subset of its frequent subsequences is selected as classification features, so that features are mined efficiently and a classifier is built to classify proteins. In the present invention the protein problem is converted into a SAT problem, and the SAT model of the protein sequence is solved by the above method, thereby applying the technique to protein classification.
On the one hand, a SAT model can analyze and characterize constraint relations thoroughly. On the other hand, OBDD symbolic techniques greatly improve the ability to describe Boolean functions and functions over finite domains; they are efficient, compact and easy to manipulate, reduce the space required by the problem, and mitigate state explosion. Describing the frequent sequence mining problem with OBDDs within a constraint reasoning framework, and combining symbolic constraint solving with bucket elimination, substantially reduces the computational complexity of sequence mining algorithms. Extending the technique to protein classification in bioinformatics promotes the development of sequence mining, taps its application potential, and offers a new way of solving the problem.
The present invention discloses a new protein classification method based on SAT and OBDD bucket elimination. It uses a Boolean satisfiability (SAT) model and the symbolic solving algorithms of ordered binary decision diagrams (OBDDs) together with bucket elimination. The method first builds a SAT model from the constraint relations among element positions in a candidate pattern and from a cardinality constraint; it then solves the model using OBDD symbolic techniques and their symbolic operations combined with bucket elimination, and applies the solving technique to protein classification, where the characteristic information extracted from proteins is analyzed to classify them effectively. The method first combines SAT with symbolic techniques, which includes creating the OBDD representations of the constraint clauses, and then applies bucket elimination, sorting the variables in ascending order of the number of constraint relations between each variable and the others. During mining, OBDD operations are used to decide whether the constraints are satisfied, and all variables are searched so that every variable can be fully instantiated, finally yielding the frequent sequences that meet the given minimum support threshold. Building the SAT model characterizes the position information of elements in a candidate pattern clearly and directly, avoids complicating the problem, and is more declarative. The OBDD symbolic operations greatly improve the ability to describe Boolean functions and functions over finite domains, are efficient, compact and easy to manipulate, and mitigate state explosion. With good time and space efficiency, analyzing the positions at which frequent patterns may occur ensures the completeness of frequent sequence mining, so the method can be applied well to protein classification. Aimed at protein classification, the invention studies the protein problem by solving the frequent sequence mining problem of pattern mining. During execution the algorithm effectively reduces the search space and improves solving efficiency, and it has good practicability.
It should be noted that although the above embodiment of the present invention is illustrative, it does not limit the invention, and the invention is therefore not restricted to the specific implementation described above. Any other embodiment obtained by a person skilled in the art under the inspiration of the present invention, without departing from its principles, is regarded as falling within the protection of the invention.
Claims (5)
1. A protein classification method based on SAT and OBDD bucket elimination, characterized in that it comprises the following steps:
Step 1: given a protein sequence and a minimum support threshold, represent the positions of elements in a candidate pattern with propositional variables, derive the support of the candidate pattern with a cardinality constraint, and establish the SAT model of protein classification;
Step 2: convert all constraint clauses of the SAT model established in step 1 into Boolean function expressions and, using OBDD operations and reduction rules, represent all constraint clauses as OBDDs, obtaining the symbolic OBDD representation of the SAT model;
Step 3: based on the symbolic OBDD representation of the SAT model obtained in step 2, solve the SAT model through OBDD symbolic operations on the basis of bucket elimination.
2. The protein classification method based on SAT and OBDD bucket elimination according to claim 1, characterized in that, in step 1, the SAT model includes the following three constraints:
First constraint: the first symbol must be a fixed character;
Second constraint: the positions at which the candidate pattern does not occur are obtained by a constraint consisting of binary clauses;
Third constraint: when enumerating all frequent patterns with respect to the minimum support threshold λ, the candidate pattern must occur at least λ times.
3. The protein classification method based on SAT and OBDD bucket elimination according to claim 2, characterized in that the third constraint is obtained by a cardinality constraint.
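To make the "at least λ occurrences" requirement of the third constraint concrete, here is a minimal Python sketch of one textbook way to express an at-least-k cardinality constraint as CNF clauses. The text does not say which encoding is used, so the naive binomial encoding below is an assumption chosen for brevity. It grows combinatorially, and practical solvers usually rely on more compact encodings such as sequential counters. The function name and variable ids are hypothetical.

```python
from itertools import combinations


def at_least_k(variables, k):
    """Return CNF clauses (lists of positive variable ids) that force at
    least `k` of `variables` to be true.

    Naive binomial encoding: every subset of n - k + 1 variables must
    contain a true variable, otherwise at most k - 1 could be true.
    """
    n = len(variables)
    if k <= 0:
        return []      # no clause needed: trivially satisfied
    if k > n:
        return [[]]    # the empty clause: unsatisfiable
    return [list(subset) for subset in combinations(variables, n - k + 1)]


# Hypothetical example: four occurrence variables, threshold λ = 2.
clauses = at_least_k([1, 2, 3, 4], 2)
```

Each variable here would stand for "the candidate pattern occurs at position i", so the clauses hold exactly when the pattern occurs at λ or more positions.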
4. The protein classification method based on SAT and OBDD bucket elimination according to claim 2 or 3, characterized in that step 4 is as follows:
Step 4.1: sort the variables in all constraint clauses of the second constraint of the SAT model in ascending order of the number of constraint relations between each variable and the other variables, obtaining the variable order π: y_0 < y_1 < ... < y_{n-1};
Step 4.2: for all constraint clauses of the second constraint of the SAT model, when variable y_i is the lowest-ordered variable within the scope of constraint clause c_j, merge constraint clause c_j into OBDD variable bucket[y_i];
Step 4.3: based on the variable order π, eliminate the variables in all constraint clauses of the second constraint of the SAT model, namely:
Step 4.3.1: using the OBDD quantification operation, eliminate variable y_0 from OBDD variable bucket[y_0] and obtain a new constraint clause g_0; variable y_1 is now the lowest-ordered variable in g_0, so add g_0 to OBDD variable bucket[y_1];
Step 4.3.2: after eliminating variable y_0, using the OBDD quantification operation, eliminate variable y_1 from OBDD variable bucket[y_1] and obtain a new constraint clause g_1; variable y_2 is now the lowest-ordered variable in g_1, so add g_1 to OBDD variable bucket[y_2];
and so on;
Step 4.3.n-1: after eliminating variable y_{n-3}, using the OBDD quantification operation, eliminate variable y_{n-2} from OBDD variable bucket[y_{n-2}] and obtain a new constraint clause g_{n-2}; only variable y_{n-1} now remains, so add g_{n-2} to OBDD variable bucket[y_{n-1}];
Step 4.4: starting from OBDD variable bucket[y_{n-1}] and ending at OBDD variable bucket[y_0], i.e. in reverse variable order, conjoin the OBDD variable buckets bucket[y_i] one by one; the OBDD finally obtained is the OBDD representation of all solutions satisfying the second constraint;
Step 4.5: apply the OBDD AND operation to all constraint clauses of the first constraint, all solutions satisfying the second constraint, and all constraint clauses of the third constraint; the resulting OBDD represents all solutions of the SAT model that satisfy all constraints, which completes the protein classification;
where i = 0, 1, ..., n-1, n is the number of variables in the second constraint, j = 1, 2, ..., m, and m is the number of constraint clauses in the second constraint.
5. The protein classification method based on SAT and OBDD bucket elimination according to claim 4, characterized in that, in steps 4.2 and 4.3, constraints are merged into the OBDD variable buckets by the OBDD AND operation.
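The bucket-elimination procedure of steps 4.1 to 4.4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the invention: the tiny `BDD` class is a hypothetical stand-in for a real OBDD package, "number of constraint relations" is interpreted as the number of clauses a variable appears in, and a clause is a list of `(variable, is_positive)` literals. The example clauses are made up.

```python
from itertools import product
from operator import and_, or_


class BDD:
    """Minimal reduced ordered BDD; variables are integer levels 0..n-1."""

    def __init__(self, n):
        self.n = n
        self.nodes = {0: None, 1: None}  # id -> (level, low, high); 0 and 1 are terminals
        self.unique = {}                 # (level, low, high) -> id

    def mk(self, level, low, high):
        if low == high:                  # reduction: drop a test whose branches agree
            return low
        key = (level, low, high)
        if key not in self.unique:       # reduction: share structurally equal nodes
            self.unique[key] = len(self.nodes)
            self.nodes[self.unique[key]] = key
        return self.unique[key]

    def level(self, u):
        return self.n if u < 2 else self.nodes[u][0]

    def cofactors(self, u, level):
        return self.nodes[u][1:] if self.level(u) == level else (u, u)

    def apply(self, op, u, v, memo=None):
        memo = {} if memo is None else memo
        if u < 2 and v < 2:
            return int(op(bool(u), bool(v)))
        if (u, v) not in memo:
            top = min(self.level(u), self.level(v))
            (u0, u1), (v0, v1) = self.cofactors(u, top), self.cofactors(v, top)
            memo[u, v] = self.mk(top, self.apply(op, u0, v0, memo),
                                 self.apply(op, u1, v1, memo))
        return memo[u, v]

    def exists_top(self, u, level):
        # Existential quantification; valid because no variable ordered
        # before `level` occurs in u when bucket[level] is processed.
        low, high = self.cofactors(u, level)
        return self.apply(or_, low, high)

    def evaluate(self, u, values):
        while u >= 2:
            level, low, high = self.nodes[u]
            u = high if values[level] else low
        return bool(u)


def bucket_eliminate(n, clauses):
    # Step 4.1: ascending order by number of clauses each variable occurs in.
    degree = [sum(any(v == x for x, _ in c) for c in clauses) for v in range(n)]
    order = sorted(range(n), key=lambda v: (degree[v], v))
    level = {v: i for i, v in enumerate(order)}
    bdd = BDD(n)
    buckets = [1] * n                    # one OBDD per bucket, initially TRUE
    # Step 4.2: AND each clause into the bucket of its lowest-ordered variable.
    for clause in clauses:
        f = 0
        for v, positive in clause:
            lit = bdd.mk(level[v], 0, 1) if positive else bdd.mk(level[v], 1, 0)
            f = bdd.apply(or_, f, lit)
        i = min(level[v] for v, _ in clause)
        buckets[i] = bdd.apply(and_, buckets[i], f)
    # Step 4.3: eliminate y_0 .. y_{n-2}; each new constraint g goes to a later bucket.
    for i in range(n - 1):
        g = bdd.exists_top(buckets[i], i)
        if g == 0:
            return bdd, 0, level         # a bucket is unsatisfiable: no solutions
        if g != 1:
            j = bdd.level(g)
            buckets[j] = bdd.apply(and_, buckets[j], g)
    # Step 4.4: conjoin the buckets in reverse variable order.
    result = 1
    for i in reversed(range(n)):
        result = bdd.apply(and_, result, buckets[i])
    return bdd, result, level


def solutions(n, bdd, root, level):
    """Enumerate satisfying assignments as tuples indexed by original variable."""
    return {bits for bits in product([False, True], repeat=n)
            if bdd.evaluate(root, {level[v]: bits[v] for v in range(n)})}


# Hypothetical binary clauses over x0..x3: (x0|x1) (~x1|x2) (~x0|~x2) (x2|x3)
clauses = [[(0, True), (1, True)], [(1, False), (2, True)],
           [(0, False), (2, False)], [(2, True), (3, True)]]
bdd, root, level = bucket_eliminate(4, clauses)
sols = solutions(4, bdd, root, level)
```

Because the elimination order coincides with the OBDD variable order, the variable being eliminated is never below another variable of the function in its bucket, which keeps the quantification step to a single OR of two cofactors. Step 4.5 would then AND the result with the OBDDs of the first and third constraints.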
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810463426.9A CN108664768A (en) | 2018-05-15 | 2018-05-15 | Protein classification method based on SAT and OBDD bucket elimination |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108664768A (en) | 2018-10-16 |
Family
ID=63779580
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810463426.9A Pending CN108664768A (en) | 2018-05-15 | 2018-05-15 | Protein classification method based on SAT and OBDD bucket elimination |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108664768A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020178424A1 (en) * | 2001-04-06 | 2002-11-28 | Nec Usa, Inc. | Partition-based decision heuristics for SAT and image computation using SAT and BDDs |
US20120198399A1 (en) * | 2011-01-31 | 2012-08-02 | Sean Arash Safarpour | System, method and computer program for determining fixed value, fixed time, and stimulus hardware diagnosis |
CN104794370A (en) * | 2015-01-05 | 2015-07-22 | 中国人民解放军国防科学技术大学 | Construction method and device of protein classification model |
CN106126972A (en) * | 2016-06-21 | 2016-11-16 | 哈尔滨工业大学 | Hierarchical multi-label classification method for protein function prediction |
CN106650136A (en) * | 2016-12-29 | 2017-05-10 | 北京华大九天软件有限公司 | Method for detecting functional consistency of standard units of timing library and netlist library |
Non-Patent Citations (2)
Title |
---|
SAID JABBOUR ET AL: "Boolean Satisfiability for Sequence Mining", 《CIKM2013 PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT》 * |
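徐周波 (XU Zhoubo) et al.: "Symbolic OBDD bucket elimination algorithm for solving constraint satisfaction problems" (约束满足问题求解的符号OBDD桶消元算法), 《计算机科学》 (Computer Science) * |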
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20181016 |