CN108664768A - Protein classification method based on SAT and OBDD bucket elimination - Google Patents


Info

Publication number
CN108664768A
Authority
CN
China
Prior art keywords
obdd
variable
constraint
clause
sat
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810463426.9A
Other languages
Chinese (zh)
Inventor
徐周波
戴瑀君
梁轩瑜
宁黎华
刘桂珍
张鵾
杨健
黄文文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Application filed by Guilin University of Electronic Technology
Priority to CN201810463426.9A
Publication of CN108664768A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses a protein classification method based on SAT and OBDD bucket elimination. It uses a Boolean satisfiability (SAT) model together with a symbolic solving algorithm based on ordered binary decision diagrams (OBDDs) and bucket elimination. First, a SAT model is built from the constraint relations on element positions in candidate patterns and a cardinality constraint. The model is then solved with symbolic OBDD techniques and the associated symbolic operations combined with bucket elimination, and the solving technique is applied to protein classification, where it extracts feature information from proteins so that they can be classified effectively. Targeting the protein classification problem, the invention studies frequent sequence mining in protein mining by solving this model. During execution, the algorithm effectively reduces the search space and improves solving efficiency, giving it good practicality.

Description

Protein classification method based on SAT and OBDD bucket elimination
Technical field
The present invention relates to the fields of data mining and symbolic computation, and in particular to a protein classification method based on SAT and OBDD bucket elimination.
Background art
Many medically important pathogenic bacteria are surrounded by an additional "outer" membrane. The proteins residing in this membrane (outer membrane proteins, OMPs) are a main focus of antibiotic and vaccine design: because they are located on the bacterial surface, they are the most accessible targets for new drugs. With the development of genome sequencing and bioinformatics, biologists can now infer all proteins that a given bacterium may produce and try to classify where each protein is located in the bacterial cell. However, current protein localization programs are least accurate when predicting OMPs, so better OMP classifiers need to be developed.
Data mining research has shown that frequent patterns perform well in helping to build accurate and efficient classification algorithms. Researchers have studied many schemes that use frequently occurring itemsets for classification, as well as techniques for mining frequent subsequences that satisfy user-specified constraints. These constraints select a subset of the frequent subsequences as classification features, enabling efficient feature mining for building a classifier.
In practice, however, mining frequent sequences efficiently and accurately still faces many problems; in particular, time and space complexity remain to be improved. To solve sequential pattern mining effectively, many researchers adopt the idea of Boolean satisfiability (SAT): the pattern mining problem is converted into a satisfiability model, and the mining problem is solved by analyzing and solving that SAT model. The Boolean satisfiability problem asks whether a classical propositional formula is satisfiable. It is one of the most studied NP-complete decision problems, so solving it is inevitably limited by combinatorial complexity.
Summary of the invention
The problem addressed by the present invention is that, when sequential pattern mining is combined with protein classification, the solving process is inevitably limited by combinatorial complexity. To address this, the invention provides a protein classification method based on SAT and OBDD bucket elimination.
To solve the above problem, the present invention adopts the following technical solution:
The protein classification method based on SAT and OBDD bucket elimination comprises the following steps:
Step 1: given a protein sequence and a minimum support threshold, represent the positions of elements in a candidate pattern with propositional variables, derive the support of the candidate pattern with a cardinality constraint, and build the SAT model for protein classification;
Step 2: convert all constraint clauses of the SAT model built in Step 1 into Boolean function form, and use OBDD operations and reduction rules to represent each constraint clause as an OBDD, obtaining the symbolic OBDD representation of the SAT model;
Step 3: starting from the symbolic OBDD representation of the SAT model obtained in Step 2, solve the SAT model by symbolic OBDD operations within a bucket elimination framework.
In Step 1, the SAT model consists of the following three constraints:
First constraint: the first symbol must be a fixed character;
Second constraint: a set of binary clauses identifies the positions at which the candidate pattern does not occur;
Third constraint: when enumerating all frequent patterns with respect to the minimum support threshold λ, the candidate pattern must occur at least λ times.
The third constraint is expressed as a cardinality constraint.
The solving procedure proceeds as follows:
Step 4.1: sort the variables in all clauses of the second constraint of the SAT model in ascending order of the number of constraint relations each variable has with the other variables, obtaining the variable order π: y_0 < y_1 < ... < y_{n-1};
Step 4.2: for each clause c_j of the second constraint, if y_i is the lowest-ordered variable in the scope of c_j, merge c_j into the OBDD bucket bucket[y_i];
Step 4.3: following the order π, eliminate the variables occurring in the clauses of the second constraint, namely:
Step 4.3.1: eliminate y_0 from bucket[y_0] with the OBDD quantification operation, obtaining a new constraint g_0; since y_1 is now the lowest-ordered variable in g_0, add g_0 to bucket[y_1];
Step 4.3.2: after y_0 has been eliminated, eliminate y_1 from bucket[y_1] with the OBDD quantification operation, obtaining a new constraint g_1; since y_2 is now the lowest-ordered variable in g_1, add g_1 to bucket[y_2];
And so on;
Step 4.3.n-1: after y_{n-3} has been eliminated, eliminate y_{n-2} from bucket[y_{n-2}] with the OBDD quantification operation, obtaining a new constraint g_{n-2}; only y_{n-1} now remains, so add g_{n-2} to bucket[y_{n-1}];
Step 4.4: starting from bucket[y_{n-1}] and ending at bucket[y_0], that is, in reverse variable order, conjoin the buckets bucket[y_i] one by one; the final OBDD is the OBDD representation of all solutions satisfying the second constraint;
Step 4.5: apply the OBDD AND operation to all clauses of the first constraint, all solutions satisfying the second constraint, and all clauses of the third constraint; the resulting OBDD represents all solutions of the SAT model that satisfy every constraint, which completes the protein classification;
Here i = 0, 1, ..., n-1, where n is the number of variables in the second constraint, and j = 1, 2, ..., m, where m is the number of clauses in the second constraint.
In Steps 4.2 and 4.3, constraints are merged into an OBDD bucket with the OBDD AND operation.
Compared with the prior art, the present invention uses OBDD (ordered binary decision diagram) techniques and exploits the advantages of their operations. It builds on constraint-based symbolic solving of frequent sequence mining and applies the symbolic sequential mining algorithm to protein classification in bioinformatics, thereby addressing the problem that combining pattern mining with protein classification is inevitably limited by combinatorial complexity. In addition, by incorporating bucket elimination, the intermediate OBDDs produced during computation may be smaller than those of the direct symbolic solving method. Since the running time of OBDD-based operations depends mainly on the size of the OBDDs involved, the algorithm can improve solving efficiency to a certain extent and outperform direct symbolic solving.
Description of the drawings
Fig. 1 is a flow chart of the protein classification method based on SAT and OBDD bucket elimination.
Fig. 2a shows the deletion rule of the OBDD reduction rules.
Fig. 2b shows the merging rule of the OBDD reduction rules.
Fig. 2c is the OBDD representation of the Boolean function f = (x_1 + x_2) · x_3.
Fig. 3a is the OBDD representation of the first constraint's clause x_{a,0} ∨ x_{b,0}.
Fig. 3b is the OBDD representation of the second constraint's clause x_{a,0} → (y_2 ∧ y_3).
Fig. 3c is the OBDD representation of the second constraint's clause x_{a,1} → (y_1 ∧ y_2 ∧ y_3).
Fig. 3d is the OBDD representation of the second constraint's clause x_{b,0} → (y_0 ∧ y_1).
Fig. 3e is the OBDD representation of the second constraint's clause x_{b,1} → (y_0 ∧ y_3).
Fig. 3f is the OBDD representation of the second constraint's clause x_{b,2} → (y_2 ∧ y_3).
Fig. 3g is the OBDD representation of the third constraint's clause y_0 + y_1 + y_2 + y_3 ≤ 2.
Fig. 4 is the OBDD representation of the first constraint AND the second constraint.
Fig. 5 is the OBDD representation of all SAT solutions obtained by this method.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to a specific example and the accompanying drawings.
Boolean satisfiability, a general formulation underlying many complex problems in artificial intelligence and computer science, asks whether a classical propositional formula is satisfiable; it is one of the most studied NP-complete decision problems. Research on SAT algorithms is extensive and deep, and many mature, efficient algorithms and SAT solvers exist. However, because SAT is NP-hard in general, solving it is inevitably limited by combinatorial complexity. We therefore introduce symbolic techniques, and, to improve their efficiency, bucket elimination.
Ordered binary decision diagrams (OBDDs) and their extensions represent and search state spaces or variable combinations implicitly, which can alleviate, or partially avoid, combinatorial explosion; they are among the most effective symbolic techniques to date. OBDDs provide an efficient, canonical representation of Boolean functions, and the OBDD data structure greatly simplifies complex operations on Boolean functions. The main strengths of OBDDs are compactness and ease of manipulation: all operations act on whole sets at once, which can mitigate state explosion to a certain extent. For these reasons, applying symbolic algorithms to frequent sequence mining can improve solving efficiency and reduce state-space complexity to a certain extent.
The present invention concerns solving the pattern mining problem, more specifically a frequent sequence mining technique based on SAT and OBDD bucket elimination, and applies this technique to protein classification in bioinformatics. The invention considers the positions of elements in candidate protein patterns, models them as a Boolean satisfiability problem, solves the model by combining symbolic techniques with bucket elimination, and illustrates the approach with a simple example. Specifically, it builds a SAT model from the constraint relations on element positions in candidate patterns and a cardinality constraint, and solves frequent sequence mining using symbolic OBDD techniques and the associated symbolic operations together with bucket elimination; finally, the technique is applied to the protein classification problem, which is then solved.
Referring to Fig. 1, the protein classification method based on SAT and OBDD bucket elimination comprises the following steps:
Step 1. Following the modeling approach of Boolean satisfiability, represent the positions of elements in a candidate pattern with propositional variables, derive the support of the candidate pattern with a cardinality constraint, and build the SAT model of frequent pattern mining in protein sequences, i.e., of protein classification.
Frequent pattern: given a sequence s, a pattern x, and a minimum support threshold λ ≥ 1, x is a frequent pattern of s with respect to λ if x occurs in s at least λ times. The frequent pattern mining problem for a single sequence consists of computing the set of all patterns that are frequent with respect to λ.
Let Σ = {e_1, ..., e_m} be an alphabet, s a sequence over Σ of length n, and λ the minimum support threshold. Each character e occurring in the protein sequence s is associated with a set of k_e propositional variables x_{e,0}, ..., x_{e,k_e-1}. Variable x_{e,i} represents element e at position i of the candidate pattern: x_{e,i} = 1 means that e occurs in the candidate pattern at position i, and x_{e,i} = 0 means that it does not. These variables cover all possible positions of e in a candidate pattern, so only k_e variables need to be associated with each character e, which reduces the number of constraints and speeds up solving.
First constraint (constraint 1): the first symbol must be a fixed character (not a wildcard): ⋁_{e∈Σ} x_{e,0}
Second constraint (constraint 2): the following constraint, made of binary clauses, identifies the positions at which the candidate pattern does not occur: for every start position l (0 ≤ l ≤ n-1), every character e ∈ Σ, and every candidate position i of e (0 ≤ i < k_e), (x_{e,i} ∧ (s_{l+i} ≠ e)) → y_l.
Here y_0, ..., y_{n-1} are n new propositional variables: if the candidate pattern does not occur at position l of s, then y_l = 1 in the formula above. In classical propositional logic, (a ∧ b) → c ≡ ¬a ∨ ¬b ∨ c; since the expression s_{l+i} ≠ e is a constant (s_{l+i} ≠ e ∈ {0, 1}), the formula above can be regarded as a set of binary clauses.
Third constraint (constraint 3): when enumerating all frequent patterns of s with respect to the support threshold λ, the candidate pattern must occur at least λ times. This property is expressed by the following cardinality constraint: Σ_{l=0}^{n-1} y_l ≤ n - λ
If this constraint is violated, there are at least n - λ + 1 positions at which the candidate pattern does not occur, so it occurs at no more than λ - 1 positions and is not frequent. Otherwise there are at least λ positions at which it occurs, so the pattern is frequent. The cardinality constraint therefore captures the support of the candidate pattern and determines whether it is greater than or equal to the minimum support threshold.
The problem of enumerating all frequent patterns in a given protein sequence is thus expressed by three constraints: the first, second, and third constraints.
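To make the cardinality argument concrete, the following Python sketch counts the occurrences of a candidate pattern directly and checks that Σ y_l ≤ n - λ holds exactly when the pattern occurs at least λ times. It assumes that a pattern is written as a string in which `*` marks a wildcard position and that y_l is set to 1 precisely at the start positions where the pattern does not occur; the function names are illustrative and not taken from the patent.

```python
def occurrences(s, pattern):
    """Start positions l at which pattern matches s ('*' is an assumed wildcard symbol)."""
    hits = []
    for l in range(len(s)):
        if all(p == "*" or (l + i < len(s) and s[l + i] == p)
               for i, p in enumerate(pattern)):
            hits.append(l)
    return hits


def is_frequent(s, pattern, lam):
    """Decide frequency through the third constraint: sum(y) <= n - lam."""
    occ = occurrences(s, pattern)
    # y_l = 1 exactly where the pattern does not start (minimal choice of y).
    y = [0 if l in occ else 1 for l in range(len(s))]
    via_cardinality = sum(y) <= len(s) - lam
    # Sanity check: the cardinality test agrees with counting occurrences.
    assert via_cardinality == (len(occ) >= lam)
    return via_cardinality


print(occurrences("aabb", "a*b"), is_frequent("aabb", "ab", 2))  # [0, 1] False
```

Because sum(y) = n - (number of occurrences) under this minimal choice of y, the two tests are equivalent by construction; the SAT model only forces y_l = 1 at mismatches, and the cardinality bound then rules out patterns with too few occurrences.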
In this embodiment, Step 1 comprises the following sub-steps:
Step 1A. Given the protein sequence aabb and a minimum support threshold of 2, the goal is to mine the frequent sequential patterns of the given sequence.
Step 1B. From the given sequence, enumerate the first constraint (the first symbol must be a fixed character, not a wildcard), which gives the clause of the first constraint:
x_{a,0} ∨ x_{b,0}    (1)
Step 1C. Compute k_e from the given sequence and enumerate all clauses of the second constraint, which identify the positions at which the candidate pattern does not occur; this gives the clauses of the second constraint:
x_{a,0} → (y_2 ∧ y_3)    (2)
x_{a,1} → (y_1 ∧ y_2 ∧ y_3)    (3)
x_{b,0} → (y_0 ∧ y_1)    (4)
x_{b,1} → (y_0 ∧ y_3)    (5)
x_{b,2} → (y_2 ∧ y_3)    (6)
Step 1D. From the given sequence, enumerate the third constraint, which requires the candidate pattern to occur at least λ times; this gives the clause of the third constraint:
Third constraint: y_0 + y_1 + y_2 + y_3 ≤ 2    (7)
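The example model can be checked by brute force. The Python sketch below is an illustration, not the patented solver: it generates clauses (2) to (6) from the sequence aabb, adds clauses (1) and (7), and enumerates every assignment. Two assumptions are made: the candidate positions per character (k_a = 2, k_b = 3) are copied from the example's variables, since the general rule for k_e is not restated in the text, and a start position l whose offset l + i runs past the end of s is treated as a mismatch, which is what reproduces clauses (2) to (6).

```python
from itertools import product

SEQ, LAM = "aabb", 2
K = {"a": 2, "b": 3}  # assumed: taken from the example's variables x_{a,0..1}, x_{b,0..2}
n = len(SEQ)
X = [(e, i) for e in sorted(K) for i in range(K[e])]


def mismatch_positions(e, i):
    """Start positions l where e cannot sit at pattern offset i (past the end = mismatch)."""
    return [l for l in range(n) if l + i >= n or SEQ[l + i] != e]


# Second constraint: x_{e,i} -> AND of y_l over the mismatch positions.
SECOND = {xv: mismatch_positions(*xv) for xv in X}


def satisfies(x, y):
    first = any(x[(e, 0)] for e in K)                                   # clause (1)
    second = all(not x[xv] or all(y[l] for l in ls) for xv, ls in SECOND.items())
    third = sum(y) <= n - LAM                                           # clause (7)
    return first and second and third


def pattern_of(x):
    """Render the true x_{e,i} as a pattern string with '*' for unconstrained offsets."""
    cells = {i: e for (e, i), v in x.items() if v}
    return "".join(cells.get(i, "*") for i in range(max(cells) + 1))


patterns = set()
for xs in product([0, 1], repeat=len(X)):
    x = dict(zip(X, xs))
    if any(satisfies(x, y) for y in product([0, 1], repeat=n)):
        patterns.add(pattern_of(x))

print(sorted(patterns))  # ['a', 'a*b', 'b']
```

Exhaustive enumeration is only feasible for toy inputs; here it serves as a reference result that the OBDD-based solution of Step 3 should reproduce (patterns a, b, and a*b).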
Step 2. Convert the constraint clauses of the SAT model for frequent protein-sequence mining into Boolean functions, and use OBDD operations and reduction rules to build the symbolic OBDD representation of every constraint clause.
Starting from the SAT model of the frequent sequence mining problem, each constraint clause is converted into Boolean function form and then described symbolically as an OBDD. OBDDs are governed by the following two reduction rules. Rule 1 (deletion rule): as shown in Fig. 2a, for a node u in the OBDD, if low(u) = high(u), delete node u and redirect all incoming edges of u to low(u). Rule 2 (merging rule): as shown in Fig. 2b, for nodes u and v in the OBDD, if var(u) = var(v), low(u) = low(v), and high(u) = high(v), delete one of the two nodes and redirect all incoming edges of the deleted node to the retained node. This rule also applies to terminal nodes with the same label.
An OBDD is a representation of a Boolean function; for example, Fig. 2c shows the OBDD of f = (x_1 + x_2) · x_3 under the variable order x_1 < x_2 < x_3. Describing the SAT model built in Step 1 symbolically with OBDDs therefore first requires converting its constraint clauses into Boolean functions.
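The two reduction rules are usually enforced while nodes are created rather than afterwards. The minimal Python sketch below illustrates this: `mk` skips a node whose two children coincide (Rule 1) and consults a unique table so that identical (var, low, high) triples share one node (Rule 2). The class `BDD` and its helpers are hypothetical names for illustration, and `apply` is the standard recursive Shannon-expansion algorithm, not code from the patent. Built for f = (x_1 + x_2) · x_3 under x_1 < x_2 < x_3, it yields three internal nodes, with the x_3 node shared by the x_1 and x_2 nodes.

```python
import operator


class BDD:
    """Minimal reduced-OBDD sketch (hypothetical); ids 0 and 1 are the terminals."""

    def __init__(self, order):
        self.level = {v: k for k, v in enumerate(order)}  # variable -> rank in the order
        self.nodes = {0: None, 1: None}                   # id -> (var, low, high)
        self.unique = {}                                  # (var, low, high) -> id
        self.memo = {}                                    # cache for apply

    def mk(self, var, low, high):
        if low == high:                    # Rule 1 (deletion): the test is redundant
            return low
        key = (var, low, high)
        if key not in self.unique:         # Rule 2 (merging): reuse an identical node
            node_id = len(self.nodes)
            self.nodes[node_id] = key
            self.unique[key] = node_id
        return self.unique[key]

    def var(self, name):
        return self.mk(name, 0, 1)

    def _cofactors(self, u, var):
        """(low, high) of u w.r.t. var; u itself twice if u does not test var."""
        if u > 1 and self.nodes[u][0] == var:
            return self.nodes[u][1], self.nodes[u][2]
        return u, u

    def apply(self, op, u, v):
        """Combine two OBDDs with a binary Boolean operator on {0, 1}."""
        key = (op, u, v)
        if key not in self.memo:
            if u <= 1 and v <= 1:
                self.memo[key] = op(u, v)
            else:
                top = min((self.nodes[w][0] for w in (u, v) if w > 1), key=self.level.get)
                u0, u1 = self._cofactors(u, top)
                v0, v1 = self._cofactors(v, top)
                self.memo[key] = self.mk(top, self.apply(op, u0, v0), self.apply(op, u1, v1))
        return self.memo[key]


def evaluate(bdd, u, env):
    """Follow one root-to-terminal path under the assignment env."""
    while u > 1:
        var, low, high = bdd.nodes[u]
        u = high if env[var] else low
    return u


def internal_nodes(bdd, root):
    seen, stack = set(), [root]
    while stack:
        u = stack.pop()
        if u > 1 and u not in seen:
            seen.add(u)
            stack.extend(bdd.nodes[u][1:])
    return len(seen)


bdd = BDD(["x1", "x2", "x3"])
f = bdd.apply(operator.and_, bdd.apply(operator.or_, bdd.var("x1"), bdd.var("x2")), bdd.var("x3"))
print(internal_nodes(bdd, f))  # 3
```

Because both rules are applied inside `mk`, every OBDD built through `apply` is already reduced, and two equal functions under the same order end up as the same node id.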
In this embodiment, Step 2 comprises the following sub-steps:
Step 2A. The symbolic OBDD used in this invention, an extended form of ordered binary decision diagrams (OBDDs), is explained with reference to the drawings: Fig. 2c shows the OBDD of the Boolean function f = (x_1 + x_2) · x_3 under the variable order π: x_1 < x_2 < x_3.
Step 2B. The constraint clauses of the above example can be expressed as Boolean functions. Clauses (2)-(6) become x_{a,0}' + y_2 · y_3; x_{a,1}' + y_1 · y_2 · y_3; x_{b,0}' + y_0 · y_1; x_{b,1}' + y_0 · y_3; x_{b,2}' + y_2 · y_3, where "·", "'", and "+" denote Boolean AND, NOT, and inclusive OR, respectively.
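The rewriting in Step 2B rests on the identity x → (y_1 ∧ ... ∧ y_k) ≡ x' + y_1 · ... · y_k. A short truth-table check in Python, with hypothetical helper names, confirms it for the clause arities that occur in (2)-(6), namely k = 2 and k = 3:

```python
from itertools import product
from math import prod


def implication(x, ys):
    """Clause form used in (2)-(6): x -> (y_1 and ... and y_k)."""
    return int((not x) or all(ys))


def boolean_form(x, ys):
    """Step 2B form x' + y_1 . ... . y_k, using 0/1 arithmetic for NOT, AND, OR."""
    return (1 - x) | prod(ys)


def equivalent(k):
    """True if both forms agree on all 2^(k+1) assignments of x, y_1..y_k."""
    return all(
        implication(bits[0], bits[1:]) == boolean_form(bits[0], bits[1:])
        for bits in product([0, 1], repeat=k + 1)
    )


print(equivalent(2), equivalent(3))  # True True
```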
Step 2C. From these Boolean functions, the OBDD representations of clauses (2)-(6) are obtained; they are shown in Figs. 3b to 3f, respectively.
Step 2D. The clause of the first constraint and the clause of the third constraint can be represented directly as OBDDs, shown in Fig. 3a and Fig. 3g, respectively.
Step 3. Starting from the symbolic OBDD description of the SAT model, solve the SAT model by symbolic OBDD operations within the bucket elimination framework.
Based on the symbolic OBDD representations of the constraints obtained in Step 2, the invention combines symbolic OBDD techniques with bucket elimination to solve the SAT model of frequent sequence mining. Bucket elimination solves a problem through two core operations: combination and elimination. In solving constraint satisfaction problems (CSPs), these correspond to the join and projection operations on relations, which can be implemented simply by the OBDD AND operation and the OBDD quantification operation, respectively.
In this embodiment, Step 3 comprises the following sub-steps:
Step 3A. Sort the variables y_l occurring in the clauses of the second constraint in ascending order of the number of constraint relations between each variable and the other variables. For the example above, sorting by degree gives the variable order π: y_0 < y_1 < y_2 < y_3. The clauses of the SAT model are then grouped according to this order: clauses (4) x_{b,0} → (y_0 ∧ y_1) and (5) x_{b,1} → (y_0 ∧ y_3) form one group; clause (3) x_{a,1} → (y_1 ∧ y_2 ∧ y_3) forms another; and clauses (2) x_{a,0} → (y_2 ∧ y_3) and (6) x_{b,2} → (y_2 ∧ y_3) form a third.
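The ordering and grouping of Step 3A can be reproduced in a few lines of Python. The text does not define "number of constraint relations" precisely, so this sketch assumes it means the number of second-constraint clauses that mention the variable, with ties broken by the variable's index; under that assumption the order y_0 < y_1 < y_2 < y_3 and the grouping of the example come out as stated. Each clause is then placed in the bucket of its lowest-ordered y variable, as in Step 4.2. The names `CLAUSES`, `variable_order`, and `initial_buckets` are hypothetical.

```python
from collections import Counter

# Clauses (2)-(6): clause number -> (antecedent x variable, implied y variables).
CLAUSES = {
    2: ("x_a0", ("y2", "y3")),
    3: ("x_a1", ("y1", "y2", "y3")),
    4: ("x_b0", ("y0", "y1")),
    5: ("x_b1", ("y0", "y3")),
    6: ("x_b2", ("y2", "y3")),
}


def variable_order(clauses):
    """Ascending degree (assumed: number of clauses mentioning the variable), ties by index."""
    degree = Counter(y for _, ys in clauses.values() for y in ys)
    return sorted(degree, key=lambda y: (degree[y], int(y[1:])))


def initial_buckets(clauses, order):
    """Put each clause into the bucket of its lowest-ordered y variable."""
    rank = {y: k for k, y in enumerate(order)}
    buckets = {y: [] for y in order}
    for cid, (_, ys) in clauses.items():
        buckets[min(ys, key=rank.get)].append(cid)
    return buckets


order = variable_order(CLAUSES)
print(order)                             # ['y0', 'y1', 'y2', 'y3']
print(initial_buckets(CLAUSES, order))   # {'y0': [4, 5], 'y1': [3], 'y2': [2, 6], 'y3': []}
```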
Step 3B. Since y_0 is the lowest-ordered variable in the scopes of clauses (4) and (5), these clauses are merged into bucket[y_0] with the OBDD AND operation. Likewise, since y_2 is the lowest-ordered variable in the scopes of clauses (2) and (6), these clauses are merged into bucket[y_2] with the OBDD AND operation.
Step 3C. Following the order π, eliminate the variables of the SAT model.
Eliminate y_0 from bucket[y_0] with the OBDD quantification operation, obtaining a new constraint g_0. Since y_1 is now the lowest-ordered variable in g_0, add g_0 to bucket[y_1], i.e.:
bucket[y_1] = bucket[y_1] · g_0,
After y_0 has been eliminated, eliminate y_1 from bucket[y_1] in the same way, and continue until y_2 has been eliminated. The final bucket[y_3] gives the values of y_3 that satisfy all constraints;
Step 3D. Starting from y_3, conjoin the buckets bucket[y_i] (i = 3, 2, 1, 0) one by one in reverse order of π: first conjoin bucket[y_3] into bucket[y_2], obtaining a new bucket[y_2]; then conjoin the new bucket[y_2] into bucket[y_1], obtaining a new bucket[y_1]; finally conjoin the new bucket[y_1] into bucket[y_0], obtaining a new bucket[y_0]. This new bucket[y_0] is the OBDD representation of all solutions satisfying the second constraint;
Step 3E. Apply the OBDD AND operation to all clauses of the first constraint, all solutions satisfying the second constraint, and all clauses of the third constraint. The resulting OBDD represents all solutions of the SAT model that satisfy every constraint, that is, all frequent patterns of the given protein sequence with respect to the minimum support threshold λ. The OBDD obtained by conjoining the first and second constraints is shown in Fig. 4, and the OBDD of all SAT solutions obtained by this method is shown in Fig. 5. In the figure, the variables taking value 1 on any path from the root to terminal node 1 form a solution of the SAT model; a variable missing from a path may take either value 0 or 1. The paths x_{a,0}' x_{b,0} x_{a,1}' x_{b,1}' x_{b,2}' y_0 y_1 and x_{a,0} x_{b,0}' x_{a,1}' x_{b,1}' y_2 y_3 represent the frequent patterns {x_{b,0}} and {x_{a,0}}, {x_{a,0}, x_{b,2}}, respectively, corresponding to the patterns b, and a and a*b.
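The complete bucket-elimination schedule of Steps 3B to 3E can be traced on the example with the Python sketch below. For brevity it represents each constraint as a (scope, predicate) pair instead of an OBDD, so conjunction becomes a combined predicate and existential quantification becomes ∃v.f = f[v:=0] ∨ f[v:=1]; an OBDD package would perform the same two operations on shared graphs. This representation, the helper names, and the choice to drop a message g that no longer mentions any y variable (sound here because every g is implied by the buckets that are conjoined again in the backward phase) are assumptions of the sketch, not part of the patent.

```python
from itertools import product


def clause(x, ys):
    """x -> (y_1 and ... and y_k) as a (scope, predicate) pair."""
    return frozenset((x, *ys)), lambda a: (not a[x]) or all(a[y] for y in ys)


def conj(c1, c2):
    """Join of two constraints (the OBDD AND operation in the text)."""
    (s1, f1), (s2, f2) = c1, c2
    return s1 | s2, lambda a: f1(a) and f2(a)


def exists(var, c):
    """Existential quantification of var: the elimination (projection) step."""
    scope, f = c
    return scope - {var}, lambda a: f({**a, var: 0}) or f({**a, var: 1})


ORDER = ["y0", "y1", "y2", "y3"]
RANK = {y: k for k, y in enumerate(ORDER)}
SECOND = [clause("x_a0", ["y2", "y3"]), clause("x_a1", ["y1", "y2", "y3"]),
          clause("x_b0", ["y0", "y1"]), clause("x_b1", ["y0", "y3"]),
          clause("x_b2", ["y2", "y3"])]


def lowest_y(scope):
    ys = [v for v in scope if v in RANK]
    return min(ys, key=RANK.get) if ys else None


buckets = {y: None for y in ORDER}


def add_to_bucket(c):
    y = lowest_y(c[0])
    buckets[y] = c if buckets[y] is None else conj(buckets[y], c)


for c in SECOND:                       # Step 3B: each clause goes to its lowest y
    add_to_bucket(c)

messages = {}
for y in ORDER[:-1]:                   # Step 3C: eliminate y0, y1, y2 in order
    if buckets[y] is not None:
        g = exists(y, buckets[y])
        messages[y] = g[0]
        if lowest_y(g[0]) is not None:
            add_to_bucket(g)

second = None
for y in reversed(ORDER):              # Step 3D: conjoin buckets in reverse order
    if buckets[y] is not None:
        second = buckets[y] if second is None else conj(second, buckets[y])

FIRST = (frozenset({"x_a0", "x_b0"}), lambda a: a["x_a0"] or a["x_b0"])   # clause (1)
THIRD = (frozenset(ORDER), lambda a: sum(a[y] for y in ORDER) <= 2)      # clause (7)
scope, model = conj(conj(FIRST, second), THIRD)                          # Step 3E

CELL = {"x_a0": (0, "a"), "x_a1": (1, "a"), "x_b0": (0, "b"),
        "x_b1": (1, "b"), "x_b2": (2, "b")}
patterns = set()
variables = sorted(scope)
for bits in product([0, 1], repeat=len(variables)):
    a = dict(zip(variables, bits))
    if model(a):
        cells = dict(CELL[v] for v in CELL if a[v])
        patterns.add("".join(cells.get(i, "*") for i in range(max(cells) + 1)))

print(sorted(messages["y0"]), sorted(patterns))  # ['x_b0', 'x_b1', 'y1', 'y3'] ['a', 'a*b', 'b']
```

The message g_0 produced by eliminating y_0 mentions y_1 as its lowest-ordered variable, so it lands in bucket[y_1] exactly as described in Step 3C, and the final solution set matches the patterns b, a, and a*b read off Fig. 5.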
Step 3F. The above procedure yields a solving method for the frequent sequence mining problem, which finds all frequent sequences of the given sequence that satisfy the given minimum support threshold, achieving the solving objective.
Applying the OBDD AND operation to all clauses of the first constraint, all solutions satisfying the second constraint, and all clauses of the third constraint yields an OBDD representing all solutions of the SAT model that satisfy every constraint, that is, all frequent patterns of the given protein sequence with respect to the minimum support threshold λ. In Fig. 5, the variables taking value 1 on any path from the root to terminal node 1 form a solution of the SAT model, and a variable missing from a path may take either value 0 or 1. Following the above steps, the frequent sequential patterns of the protein sequence are obtained, achieving the solving objective.
Having solved the frequent sequence mining problem as above, the invention applies the frequent sequence mining technique based on SAT and OBDD bucket elimination to protein classification in bioinformatics. A protein is a chain of many amino acids and can therefore be abstracted as a sequence; a subset of its frequent subsequences is selected as classification features, enabling efficient feature mining for building a classifier that classifies proteins. In the invention, the protein problem is converted into a SAT problem, and the SAT model of the protein sequence is solved with the above method, so that the frequent sequence mining technique based on SAT and OBDD bucket elimination is applied to protein classification in bioinformatics.
On the one hand, the SAT model can fully analyze and characterize the constraint relations. On the other hand, symbolic OBDD techniques greatly improve the ability to describe Boolean functions and finite-domain functions; they are efficient, compact, and easy to manipulate, reduce the space requirements of the problem, and can alleviate state explosion. Describing frequent sequence mining with OBDDs and a constraint reasoning framework, and combining symbolic constraint solving with bucket elimination, is expected to substantially reduce the computational complexity of sequential mining algorithms. Extending the frequent sequence mining technique based on SAT and OBDD bucket elimination to protein classification in bioinformatics also promotes the development of sequential mining techniques, exploits their application potential, and provides a new approach to solving such problems.
The present invention discloses a new protein classification method based on SAT and OBDD bucket elimination. The method uses a Boolean satisfiability (SAT) model together with the symbolic solving algorithm of Ordered Binary Decision Diagrams (OBDD) and bucket elimination. First, a SAT model is built from the constraint relations on element positions in the candidate pattern and from a cardinality constraint. The model is then solved with OBDD symbolic techniques, including the various OBDD symbolic operations, combined with bucket elimination, and this solving technique is applied to protein classification: the characteristic information in the proteins is analyzed and extracted, and the proteins are classified effectively.

The method combines SAT with symbolic techniques. It builds OBDD representations of the constraint clauses and, following bucket elimination, sorts the variables in ascending order of the number of constraint relations between each variable and the other variables. During mining, OBDD operations decide whether the constraints are satisfied; all variables are searched until they are fully instantiated, finally yielding the frequent sequences that satisfy the given minimum support threshold. Building a SAT model describes the positions of the elements in a candidate pattern clearly and directly, avoids complicating the problem, and keeps the formulation declarative. The OBDD symbolic operations greatly improve the ability to represent Boolean functions and finite-domain functions; they are efficient, compact and easy to manipulate, and they mitigate the state explosion problem. With high time and space efficiency, the method analyzes every position at which a frequent pattern may occur, which guarantees the completeness of frequent sequence mining and makes the method well suited to protein classification.

Aimed at the protein classification problem, the invention studies the frequent sequence mining problem in protein data by solving the model. During execution of the algorithm the search space is effectively reduced and solving efficiency is improved, so the method has good practicability.
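The SAT model described above can be illustrated with a small, hypothetical encoding. The patent does not list its exact clauses, so the variable names (`x`, `p`), the at-most-one-symbol clauses and the clause that forces `p_l` true on a match are assumptions made for this sketch. The three constraints follow the claims: the first pattern symbol is a fixed character, binary clauses mark the locations where the pattern does not occur, and a cardinality condition requires at least λ occurrences. Models are enumerated by brute force, which only works for toy inputs; real protein sequences over the 20-letter amino-acid alphabet would need an OBDD or SAT solver instead.

```python
from itertools import product

def build_model(seq, alphabet, k):
    """Hypothetical SAT encoding for length-k candidate patterns over `alphabet`.

    ("x", j, a) is true when pattern position j holds symbol a; a position with
    no true x-variable is a wildcard '.'. ("p", l) is true when the pattern
    occurs at location l of seq. Literals are (variable, polarity) pairs."""
    locs = range(len(seq) - k + 1)
    xs = [("x", j, a) for j in range(k) for a in alphabet]
    ps = [("p", l) for l in locs]
    clauses = []
    for j in range(k):                      # assumption: at most one symbol per position
        for a in alphabet:
            for b in alphabet:
                if a < b:
                    clauses.append([(("x", j, a), False), (("x", j, b), False)])
    # First constraint: the first pattern symbol is a fixed (non-wildcard) character.
    clauses.append([(("x", 0, a), True) for a in alphabet])
    for l in locs:
        # Symbols that would make the pattern fail to match at location l.
        bad = [("x", j, a) for j in range(k) for a in alphabet if a != seq[l + j]]
        # Second constraint: binary clauses; choosing a mismatching symbol forces not p_l.
        clauses += [[(("p", l), False), (x, False)] for x in bad]
        # Assumption: p_l must be true whenever no mismatching symbol is chosen.
        clauses.append([(("p", l), True)] + [(x, True) for x in bad])
    return xs + ps, ps, clauses

def frequent_patterns(seq, alphabet, k, lam):
    """Enumerate models by brute force (toy sizes only) and decode the patterns."""
    variables, ps, clauses = build_model(seq, alphabet, k)
    found = set()
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        if not all(any(val[v] == pol for v, pol in c) for c in clauses):
            continue
        # Third constraint: cardinality, the pattern must occur at least lam times.
        if sum(val[p] for p in ps) < lam:
            continue
        found.add("".join(next((a for a in alphabet if val[("x", j, a)]), ".")
                          for j in range(k)))
    return found

print(sorted(frequent_patterns("ABAB", "AB", 2, 2)))  # ['A.', 'AB']
```

Each satisfying assignment fixes the pattern symbols, and the `p` variables are then fully determined, so every frequent pattern is reported exactly once.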
It should be noted that the above embodiments of the present invention are illustrative and do not limit the invention, so the invention is not restricted to the specific embodiments described above. Any other embodiment that those skilled in the art obtain under the inspiration of the present invention without departing from its principles is regarded as falling within the protection of the present invention.

Claims (5)

1. A protein classification method based on SAT and OBDD bucket elimination, characterized in that it comprises the following steps:
Step 1: according to the given protein sequences and the minimum support threshold, represent the positions of the elements in the candidate pattern with propositional variables, derive the support of the candidate pattern with a cardinality constraint, and establish the SAT model for protein classification;
Step 2: convert all constraint clauses of the SAT model established in step 1 into Boolean function expressions, and represent all constraint clauses in OBDD form using OBDD operations and reduction rules, obtaining the symbolic OBDD representation of the SAT model;
Step 3: based on the symbolic OBDD representation of the SAT model obtained in step 2, solve the SAT model through OBDD symbolic operations on the basis of bucket elimination.
2. The protein classification method based on SAT and OBDD bucket elimination according to claim 1, characterized in that, in step 1, the SAT model comprises the following three constraints:
First constraint: the first symbol must be a fixed character;
Second constraint: the positions at which the candidate pattern does not occur are expressed by constraints consisting of binary clauses;
Third constraint: when enumerating all frequent patterns with respect to the minimum support threshold λ, the candidate pattern must occur at least λ times.
3. The protein classification method based on SAT and OBDD bucket elimination according to claim 2, characterized in that the third constraint is obtained by a cardinality constraint.
4. The protein classification method based on SAT and OBDD bucket elimination according to claim 2 or 3, characterized in that step 4 is as follows:
Step 4.1: sort the variables in all constraint clauses of the second constraint of the SAT model in ascending order of the number of constraint relations between each variable and the other variables, obtaining the variable order π: y_0 < y_1 < ... < y_{n-1};
Step 4.2: for every constraint clause of the second constraint of the SAT model, when variable y_i is the lowest-ordered variable within the scope of constraint clause c_j, merge c_j into OBDD variable bucket [y_i];
Step 4.3: based on the variable order π, eliminate the variables in the constraint clauses of the second constraint of the SAT model, namely:
Step 4.3.1: eliminate variable y_0 from OBDD variable bucket [y_0] by the OBDD quantification operation, obtaining a new constraint clause g_0; y_1 is now the lowest-ordered variable in g_0, so g_0 is added to OBDD variable bucket [y_1];
Step 4.3.2: after y_0 has been eliminated, eliminate variable y_1 from OBDD variable bucket [y_1] by the OBDD quantification operation, obtaining a new constraint clause g_1; y_2 is now the lowest-ordered variable in g_1, so g_1 is added to OBDD variable bucket [y_2];
and so on;
Step 4.3.(n-1): after y_{n-3} has been eliminated, eliminate variable y_{n-2} from OBDD variable bucket [y_{n-2}] by the OBDD quantification operation, obtaining a new constraint clause g_{n-2}; only variable y_{n-1} remains, so g_{n-2} is added to OBDD variable bucket [y_{n-1}];
Step 4.4: starting from OBDD variable bucket [y_{n-1}] and ending at OBDD variable bucket [y_0], i.e., in reverse variable order, conjoin the OBDD variable buckets [y_i] one by one; the resulting OBDD represents all solutions satisfying the second constraint;
Step 4.5: perform the OBDD AND operation on all constraint clauses of the first constraint, the OBDD of all solutions satisfying the second constraint, and all constraint clauses of the third constraint; the resulting OBDD represents all solutions satisfying every constraint of the SAT model, which completes the protein classification;
where i = 0, 1, ..., n-1, n is the number of variables in the second constraint, j = 1, 2, ..., m, and m is the number of constraint clauses in the second constraint.
5. The protein classification method based on SAT and OBDD bucket elimination according to claim 4, characterized in that, in steps 4.2 and 4.3, constraints are merged into the OBDD variable buckets by the OBDD AND operation.
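Steps 4.1 to 4.4 of claim 4 can be sketched with a minimal reduced OBDD package. This is an illustrative sketch, not the inventors' implementation: the class `BDD`, the functions `degree_order` and `bucket_eliminate`, and the reading of "number of constraint relations" as the number of other variables sharing a clause with a variable are all assumptions. The OBDD variable order is taken equal to the elimination order π, so the root of each OBDD is its lowest-ordered variable; a new constraint g_i is placed in the bucket of that variable, which is y_{i+1} whenever y_{i+1} occurs in g_i. Step 4.5 would conjoin the result with the first- and third-constraint OBDDs in the same way.

```python
import operator

class BDD:
    """Minimal reduced OBDD (hypothetical helper). Node 0 is False, node 1 is True."""
    def __init__(self, order):
        self.level = {v: i for i, v in enumerate(order)}  # variable -> position in order
        self.nodes = [None, None]                         # node id -> (var, low, high)
        self.unique = {}                                  # (var, low, high) -> node id
        self.memo = {}                                    # cache for apply()

    def mk(self, var, low, high):
        if low == high:                                   # reduction: drop redundant test
            return low
        key = (var, low, high)
        if key not in self.unique:                        # reduction: share equal subgraphs
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def top(self, u):
        return self.level[self.nodes[u][0]] if u > 1 else len(self.level)

    def apply(self, op, u, w):                            # binary operation by Shannon expansion
        key = (op, u, w)
        if key not in self.memo:
            if u <= 1 and w <= 1:
                self.memo[key] = int(op(bool(u), bool(w)))
            else:
                lv = min(self.top(u), self.top(w))
                var = self.nodes[u][0] if self.top(u) == lv else self.nodes[w][0]
                u0, u1 = self.nodes[u][1:] if self.top(u) == lv else (u, u)
                w0, w1 = self.nodes[w][1:] if self.top(w) == lv else (w, w)
                self.memo[key] = self.mk(var, self.apply(op, u0, w0), self.apply(op, u1, w1))
        return self.memo[key]

    def restrict(self, u, v, val):
        if u <= 1 or self.top(u) > self.level[v]:
            return u                                      # v cannot occur below u
        var, lo, hi = self.nodes[u]
        if var == v:
            return hi if val else lo
        return self.mk(var, self.restrict(lo, v, val), self.restrict(hi, v, val))

    def exists(self, v, u):                               # quantification: u|v=0 OR u|v=1
        return self.apply(operator.or_, self.restrict(u, v, 0), self.restrict(u, v, 1))

    def count(self, u):                                   # models over all ordered variables
        cache = {0: 0, 1: 1}
        def go(u):
            if u not in cache:
                var, lo, hi = self.nodes[u]
                l = self.level[var]
                cache[u] = (go(lo) << (self.top(lo) - l - 1)) + (go(hi) << (self.top(hi) - l - 1))
            return cache[u]
        return go(u) << self.top(u)

def degree_order(variables, clauses):                     # Step 4.1 (assumed reading)
    nbrs = {v: set() for v in variables}
    for c in clauses:
        scope = {v for v, _ in c}
        for v in scope:
            nbrs[v] |= scope - {v}
    return sorted(variables, key=lambda v: len(nbrs[v]))

def bucket_eliminate(variables, clauses):
    """Return (bdd, root); root is the OBDD of all solutions, 0 if there are none."""
    order = degree_order(variables, clauses)
    bdd = BDD(order)
    buckets = {v: [] for v in order}
    def place(f):                          # an OBDD's root is its lowest-ordered variable
        if f > 1:
            buckets[bdd.nodes[f][0]].append(f)
    for c in clauses:                      # Step 4.2: each clause OBDD goes to its bucket
        f = 0
        for v, positive in c:
            lit = bdd.mk(v, 0, 1) if positive else bdd.mk(v, 1, 0)
            f = bdd.apply(operator.or_, f, lit)
        if f == 0:
            return bdd, 0                  # empty clause: unsatisfiable
        place(f)
    for y in order[:-1]:                   # Step 4.3: eliminate y_0 .. y_{n-2}
        h = 1
        for f in buckets[y]:
            h = bdd.apply(operator.and_, h, f)
        g = bdd.exists(y, h)               # new constraint g_i without y
        if g == 0:
            return bdd, 0                  # no solution exists
        place(g)
    result = 1                             # Step 4.4: conjoin buckets y_{n-1} back to y_0
    for y in reversed(order):
        for f in buckets[y]:
            result = bdd.apply(operator.and_, result, f)
    return bdd, result

clauses = [[("a", True), ("b", True)], [("b", False), ("c", True)], [("c", False), ("d", True)]]
bdd, solutions = bucket_eliminate(["a", "b", "c", "d"], clauses)
print(bdd.count(solutions))  # 5
```

Ties in the degree ordering keep the input order because Python's `sorted` is stable. The final conjunction contains every original clause, so the returned OBDD is exactly the solution set; the generated constraints g_i are implied by the clauses and only serve to detect unsatisfiability early.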
CN201810463426.9A 2018-05-15 2018-05-15 Protein classification method based on SAT and OBDD bucket elimination Pending CN108664768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810463426.9A CN108664768A (en) 2018-05-15 2018-05-15 Protein classification method based on SAT and OBDD bucket elimination

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810463426.9A CN108664768A (en) 2018-05-15 2018-05-15 Protein classification method based on SAT and OBDD bucket elimination

Publications (1)

Publication Number Publication Date
CN108664768A true CN108664768A (en) 2018-10-16

Family

ID=63779580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810463426.9A Pending CN108664768A (en) 2018-05-15 2018-05-15 Protein classification method based on SAT and OBDD bucket elimination

Country Status (1)

Country Link
CN (1) CN108664768A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020178424A1 (en) * 2001-04-06 2002-11-28 Nec Usa, Inc. Partition-based decision heuristics for SAT and image computation using SAT and BDDs
US20120198399A1 (en) * 2011-01-31 2012-08-02 Sean Arash Safarpour System, method and computer program for determining fixed value, fixed time, and stimulus hardware diagnosis
CN104794370A (en) * 2015-01-05 2015-07-22 National University of Defense Technology, PLA Construction method and device of protein classification model
CN106126972A (en) * 2016-06-21 2016-11-16 Harbin Institute of Technology Hierarchical multi-label classification method for protein function prediction
CN106650136A (en) * 2016-12-29 2017-05-10 Beijing Empyrean Software Co., Ltd. Method for detecting the functional consistency of standard cells between a timing library and a netlist library

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Said Jabbour et al.: "Boolean Satisfiability for Sequence Mining", CIKM 2013: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management *
Xu Zhoubo et al.: "Symbolic OBDD bucket elimination algorithm for solving constraint satisfaction problems", Computer Science (计算机科学) *

Similar Documents

Publication Publication Date Title
Langdon et al. Foundations of genetic programming
Li et al. History-based topological speciation for multimodal optimization
Givoni et al. Semi-supervised affinity propagation with instance-level constraints
JP2005276225A (en) Tree learning using table
Li et al. A survey of explainable graph neural networks: Taxonomy and evaluation metrics
Pio et al. Exploiting causality in gene network reconstruction based on graph embedding
CN109063094A (en) A method of establishing knowledge of TCM map
Oliver et al. Inferring decision graphs using the minimum message length principle
Dhar et al. Machine learning capabilities in medical diagnosis applications: Computational results for hepatitis disease
Métivier et al. A constraint programming approach for mining sequential patterns in a sequence database
Su et al. Improving structure mcmc for bayesian networks through markov blanket resampling
Li et al. Distance-enhanced graph neural network for link prediction
CN113241117B (en) Residual map-based convolutional neural network RNA-protein binding site discrimination method
Affeldt et al. Robust Reconstruction of Causal Graphical Models based on Conditional 2-point and 3-point Information.
Meqdad et al. New prediction method for data spreading in social networks based on machine learning algorithm
Rubert et al. Gene orthology inference via large-scale rearrangements for partially assembled genomes
CN108664768A (en) Protein classification method based on SAT and OBDD bucket elimination
Kattan et al. GP made faster with semantic surrogate modelling
Archambault et al. Smashing peacocks further: Drawing quasi-trees from biconnected components
Carfora et al. Model geometries in the space of Riemannian structures and Hamilton's flow
Le Clément et al. Constraint-based graph matching
Pérez et al. Extraction and reuse of design patterns from genetic algorithms using case-based reasoning
Papageorgiou et al. Complementary use of fuzzy decision trees and augmented fuzzy cognitive maps for decision making in medical informatics
Lv et al. Benchmarking Analysis of Evolutionary Neural Architecture Search
Sloper Techniques in parameterized algorithm design

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181016