CN106126973B - Gene correlation method based on R-SVM and TPR rules - Google Patents


Info

Publication number
CN106126973B
CN106126973B (application CN201610452114.9A)
Authority
CN
China
Prior art keywords
node
sample
attribute
svm
gene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610452114.9A
Other languages
Chinese (zh)
Other versions
CN106126973A (en)
Inventor
冯收
付平
徐明珠
郑文斌
石金龙
邱传良
于鸿杰
阳彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heilongjiang Jiancheng medical laboratory Co.,Ltd.
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN201610452114.9A priority Critical patent/CN106126973B/en
Publication of CN106126973A publication Critical patent/CN106126973A/en
Application granted granted Critical
Publication of CN106126973B publication Critical patent/CN106126973B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Landscapes

  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Genetics & Genomics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A gene function prediction method based on R-SVM and TPR rules, relating to methods for predicting gene function. The invention realizes the prediction of gene function and solves the multi-label and hierarchy-constraint problems that arise when gene function prediction is implemented with classification algorithms. The invention first takes genes of known function as training samples to compose a training set; for each node in the GO annotation scheme, it constructs a positive sample set and a negative sample set; for each node in the GO annotation scheme, it selects the attributes that contribute most to classifying the function of that node; a group of R-SVM classifiers is obtained by training and used to classify unknown samples, yielding a group of preliminary R-SVM classification results; the classification results are converted into posterior probability values, and a weighted TPR ensemble algorithm for the directed-acyclic-graph hierarchy is applied to realize the prediction of gene function. The invention is applicable to the prediction of gene function.

Description

Gene correlation method based on R-SVM and TPR rules
Technical field
The present invention relates to data mining in bioinformatics, and in particular to a method for predicting gene function.
Background technology
A gene is a DNA fragment with hereditary effect. Genes support the basic structure and performance of life, and store all the information of an organism's race, blood group, reproduction, growth, and apoptosis. They are also the internal factor determining life and health: illness, growth, decline, aging, and death are all related to genes. Clarifying the biological function of genes is therefore of great importance for understanding the biological processes within organisms, analyzing pathogenic mechanisms, developing new drugs, and many other aspects.
At present, the functions of many genes of many organisms, such as mouse and human, remain unknown, and a large amount of gene function annotation work remains to be done. How to predict and finally determine the biological function of genes has become a research emphasis of genomics. Since the gene function prediction problem can be converted into a classification problem in machine learning and data mining, function prediction based on classification is also a current research hotspot.
The main work of gene function prediction is to predict, according to known genes, their functional category information, and a gene function annotation scheme, the functions that genes of unknown function may have. In the gene function prediction problem, each gene is regarded as a sample, the functions possessed by a gene are regarded as its class labels, and the gene function annotation scheme is the set of all possible class labels. Predicting the function of a gene is precisely classifying that gene according to a certain function annotation scheme and obtaining the function class labels it has, so the gene function prediction problem can be treated as a classification problem. Gene function prediction methods based on classification mainly process gene-related data with various classification algorithms and then give the function class labels of genes of unknown function.
A function of a gene is also called a functional label, functional category, or function class label. One gene may have several functions at the same time, i.e., it has several function class labels simultaneously. These function class labels are not mutually independent; there are certain relationships between them, and they satisfy a specific hierarchical structure. Hierarchical structure relationships are generally divided into two kinds: tree structures and directed acyclic graph (DAG) structures. For gene function prediction, gene functions are generally classified according to a predefined annotation scheme. A gene function annotation scheme naturally carries a hierarchical structure: annotation according to the FunCat scheme satisfies a tree structure, while annotation according to the GO scheme satisfies a directed-acyclic-graph structure, so this hierarchical structure is predefined and known. Because of these features, the gene function classification problem does not belong to traditional binary classification but to a more challenging class of problems in machine learning and data mining: hierarchical multi-label classification (HMC).
Problems in the prior art: hierarchical multi-label classification combines the characteristics of multi-label classification and hierarchical classification. In such problems, each sample may carry several labels, a known predefined hierarchical relationship exists between the labels, and a sample may have two or more labels in any one layer at the same time. Classical binary classification algorithms and flat multi-label classification algorithms cannot be used directly to solve this problem. Because hierarchical multi-label classification combines the difficulties of multi-label and hierarchical classification, the resulting dataset imbalance problem, prediction depth problem, multi-label problem, hierarchy constraint problem, and prediction-consistency problem must each be taken into account. How to overcome these difficulties and design an effective hierarchical multi-label classification algorithm is an important task. At present, domestic research on this problem and the achievements obtained are limited, and there is still considerable research space.
Summary of the invention
The present invention realizes the prediction of gene function and solves the multi-label problem and the hierarchy constraint problem that arise when gene function prediction is implemented with classification algorithms.
In the GO annotation scheme, functions are gradually refined from top to bottom, so some genes may not have the function represented by a bottom-level node. For these nodes, the number of samples having the function is small while the number of samples not having it is large; this situation is called the dataset imbalance problem. Its presence reduces classification accuracy, so when constructing the positive and negative sample sets for a node, a certain strategy is needed to solve this problem.
Because a gene sample is represented by a multidimensional vector, a sample has many attribute values, and processing all of them introduces a large amount of computation. For different functional nodes, the attributes related to the function may differ, and some attributes may be irrelevant; processing irrelevant attributes reduces classifier performance. The sample attribute selection problem must therefore be solved for each functional node separately.
In the GO annotation scheme there are certain constraint relationships between functions, so the results given by the classifiers must also comply with this hierarchical relationship; this is another problem to be solved.
The gene function prediction method based on R-SVM and TPR rules includes the following steps:
Step 1: take genes of known function as training samples to compose a training set, and represent each gene as a multidimensional vector; each element of the vector is called an attribute. The contents of the vector are digitized representations of actual experimental results, all derived from standard biological databases.
In the field of machine learning, an attribute refers to a property or characteristic of a research object; it differs between objects and may change over time. A research object may have several properties or characteristics, so an object may have several different attributes. In practice, an attribute of an object is associated with a numerical or symbolic value according to a certain rule, and this value is called the value of the attribute; the same attribute may take different values for different objects, so each object can be represented by a multidimensional vector.
For the research and application background of this method, the research object is the gene, and the attributes of the research object include gene sequence length, molecular weight, the amino acid proportions of the encoded protein, and so on.
Each gene may have several functions; that is, when classifying, a gene is regarded as a sample, and each sample may have several class labels. These class labels are the terms of the GO annotation scheme, i.e., the nodes of the GO annotation scheme. The existing data can be regarded as a group of samples whose functions are already known, i.e., the class labels possessed by these samples are known. For an unknown gene sample, the aim is to obtain the function class labels it may have.
Step 2: in a classification problem, for a given class label, a sample having the label is called a positive sample, and the set of positive samples is called the positive sample set; a sample without the label is called a negative sample, and the set of negative samples is called the negative sample set. If the number of positive samples is far less than the number of negative samples, the problem is called an imbalanced-dataset problem, a positive/negative sample set imbalance problem, or a sample imbalance problem.
Each node in the GO annotation scheme represents a class label. For each node in the GO annotation scheme, construct the positive sample set and the negative sample set from the samples of the training set according to the improved siblings policy.
Step 3: for each node in the GO annotation scheme, perform attribute selection on the corresponding dataset, selecting the attributes that contribute most to classifying the function of that node.
Step 4: for each node in the GO annotation scheme, train an R-SVM classifier on the node's dataset, obtaining a group of R-SVM classifiers.
R-SVM uses a threshold adjustment technique to improve the ability of SVM to handle imbalanced datasets. R-SVM does not depend on assumptions about the distribution of the processed dataset and does not change that distribution, so it can be used to solve the imbalanced-dataset problem.
R-SVM selects a group of potential best SVM thresholds using the potential best threshold selection method, then computes the optimal threshold using the best threshold estimation method and applies it to the SVM.
Step 5: each node corresponds to one classifier, so all nodes of the GO annotation scheme yield a group of classifiers. Use the group of R-SVM classifiers obtained in the training stage to classify unknown samples, obtaining a group of preliminary R-SVM classification results.
Step 6: convert this group of R-SVM classification results into posterior probability values using the sigmoid method proposed by Platt.
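As an illustration of step 6, the following is a minimal sketch of Platt's sigmoid method: a sigmoid 1/(1 + exp(A·f + B)) is fitted to the SVM decision values. Platt's original algorithm uses a Newton-type optimizer with his regularized targets; here a plain gradient descent on the cross-entropy against those targets is used as a simplified stand-in, and the decision values and labels are invented for illustration.

```python
import math

def platt_transform(f, A, B):
    """Posterior P(y=1 | f) = 1 / (1 + exp(A*f + B)) for decision value f."""
    return 1.0 / (1.0 + math.exp(A * f + B))

def fit_platt(decision_values, labels, lr=0.05, iters=2000):
    """Fit (A, B) by batch gradient descent on the cross-entropy between
    the sigmoid outputs and Platt's regularized target probabilities."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)  # Platt's targets avoid 0 and 1
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]
    A, B = 0.0, 0.0
    for _ in range(iters):
        dA = dB = 0.0
        for f, t in zip(decision_values, targets):
            err = platt_transform(f, A, B) - t  # p - t
            dA += err * f
            dB += err
        A += lr * dA  # this sign descends the cross-entropy for this sigmoid
        B += lr * dB
    return A, B
```

Positive decision values should map to posteriors above 0.5 and negative ones below, with the fitted A typically negative for this parameterization.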
Step 7: apply the weighted TPR ensemble algorithm for the directed-acyclic-graph hierarchy, realizing the prediction of gene function while guaranteeing that the final prediction results satisfy the hierarchy constraints of the directed acyclic graph.
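The weighted TPR ensemble of step 7 can be sketched roughly as follows, following the TPR-w formulation from the hierarchical-ensemble literature as one plausible reading: each node's posterior is combined, bottom-up, with the averaged scores of its "positive" children (those above a threshold), and a final top-down pass caps every node by its parents so the output respects the DAG constraint. The patent's exact weighting may differ; the node names, weight, and posteriors below are illustrative.

```python
def leaves_first_order(children, nodes):
    """Topological order of a DAG with every descendant before its ancestors."""
    order, seen = [], set()
    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for c in children.get(n, []):
            visit(c)
        order.append(n)  # appended after all descendants -> leaves come first
    for n in nodes:
        visit(n)
    return order

def tpr_w(posteriors, children, parents, w=0.5, t=0.5):
    """Weighted TPR-style ensemble over a DAG of per-node posteriors."""
    order = leaves_first_order(children, list(posteriors))
    bar = {}
    for n in order:  # bottom-up: positive children push evidence to parents
        phi = [bar[c] for c in children.get(n, []) if bar[c] > t]
        if phi:
            bar[n] = w * posteriors[n] + (1 - w) * sum(phi) / len(phi)
        else:
            bar[n] = posteriors[n]
    for n in reversed(order):  # top-down: enforce the DAG (true-path) constraint
        for p in parents.get(n, []):
            bar[n] = min(bar[n], bar[p])
    return bar
```

After the top-down pass, no node can score above any of its ancestors, so a thresholded prediction automatically satisfies the hierarchy constraint.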
The present invention has the following effects:
The hierarchical multi-label classification method proposed by the invention can be used for the gene function prediction problem under the GO annotation scheme; it realizes the prediction of gene function, gives prediction results for the several functions a gene may have, and solves the multi-label problem in gene function prediction.
By means of the weighted TPR ensemble algorithm for the directed-acyclic-graph hierarchy, the proposed method solves the problem that the prediction results of existing gene function prediction methods may fail to satisfy the hierarchy constraints.
The positive/negative sample set construction method and the R-SVM classifier of the invention solve the dataset imbalance problem that arises when the GO annotation scheme is used for gene function prediction.
The invention can complete the vital task of gene function prediction, thereby alleviating the current problem that the mass data produced by high-throughput experiments cannot be processed in a timely and effective manner. It further provides a basis and direction for biological experimental verification, so that biological experiments can be carried out purposefully, greatly shortening the time needed for gene function annotation, saving experimental cost, and reducing expenditure; it has far-reaching practical significance for functional genomics research.
Description of the drawings
Fig. 1 is a schematic diagram of potential best threshold selection;
Fig. 2 is a part of the GO annotation graph of the biological process ontology in the GO annotation scheme.
Specific implementation mode
Specific implementation mode one:
This method is used to predict the functions of genes, where gene functions are defined by the GO gene function annotation scheme. The GO annotation scheme gives the functions a gene may have; these functions are all expressed with terms, each term denoting a kind of function. Fig. 2 is a part of the biological process ontology in the GO annotation scheme.
Each node in Fig. 2 is a term, representing a kind of function. In the GO structure graph, from top to bottom, the terms annotate protein function in progressively finer detail; the closer a term is to the bottom leaf nodes, the greater its functional information content and the more specific its functional interpretation. The GO annotation scheme satisfies the TPR (true path rule): if a term annotates a gene, its parent term and all higher-level terms also annotate the gene. For example, if response to stress (GO:0006950) annotates a gene, its parent term node response to stimulus (GO:0050896) also annotates the gene.
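The true path rule described above can be made concrete in a few lines of code: annotating a term implies annotating all of its ancestors, so a sample's annotation set is closed upward over the parent relation. The parent map below is a small illustrative fragment, not the full GO graph.

```python
def true_path_closure(annotations, parents):
    """Expand a set of GO term annotations with all of their ancestors,
    per the true path rule: annotating a term annotates its parents."""
    closed = set()
    stack = list(annotations)
    while stack:
        term = stack.pop()
        if term in closed:
            continue
        closed.add(term)
        stack.extend(parents.get(term, []))  # walk upward through the DAG
    return closed

# Illustrative fragment: response to stress -> response to stimulus -> biological_process
parents = {"GO:0006950": ["GO:0050896"],
           "GO:0050896": ["GO:0008150"]}
```

A gene annotated only with GO:0006950 is thus also annotated with GO:0050896 and GO:0008150.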
Each gene can be represented by a multidimensional numerical vector whose contents are digitized representations of real experimental results, all derived from standard biological databases.
Each gene may have several functions; when classifying, a gene is regarded as a sample, each sample may have several class labels, and these class labels are the terms of the GO annotation scheme. The existing data can be regarded as a group of samples whose functions, i.e., class labels, are known. For an unknown gene sample, the aim is to obtain the function class labels it may have.
Specific implementation mode two:
In this implementation mode, the specific process of constructing the positive sample set and negative sample set according to the improved siblings policy described in step 2 is as follows:
For each node in the GO annotation scheme, take the training samples belonging to the node as positive samples and the samples belonging to the node's sibling nodes as initial negative samples; then remove from the initial negative sample set the samples that also belong to the positive sample set, and take the result as the final negative sample set. If a node has no sibling nodes, trace upward and select the samples belonging to the siblings of its parent node as negative samples.
Expressed in symbols:
Tr+(cj) = *(cj)
where Tr denotes the training set containing all samples; node cj represents the corresponding class label; Tr+(cj) denotes the positive sample set of node cj; a further set denotes the samples belonging simultaneously to node cj and its sibling nodes, i.e., the samples that have both the class label cj and a sibling's class label; Tr−(cj) denotes the negative sample set of node cj; *(cj) denotes the set of the specific samples corresponding to node cj. The remaining symbols denote, respectively, the sibling nodes, the parent node (↑), the child nodes (↓), the ancestor nodes, the descendant nodes, and the removal of certain samples from a sample set.
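A rough sketch of the improved-siblings policy in code, under one plausible reading of the description above: positives are the samples annotated with the node, initial negatives are the samples annotated with its siblings, positives are then removed from the negatives, and a node without siblings climbs to its parent and uses the parent's siblings. The annotation data and graph structure are invented for illustration.

```python
def sibling_sets(node, annotations, parents, children):
    """Positive/negative training sets for one GO node.
    annotations: sample id -> set of annotated node labels."""
    def siblings(n):
        sibs = set()
        for p in parents.get(n, []):
            sibs.update(c for c in children.get(p, []) if c != n)
        return sibs

    # positives: every sample annotated with the node itself
    pos = {s for s, terms in annotations.items() if node in terms}

    # siblings of the node; if none exist, trace upward to the parent's siblings
    n, sibs = node, siblings(node)
    while not sibs and parents.get(n):
        n = parents[n][0]
        sibs = siblings(n)

    # initial negatives from siblings, minus anything that is also positive
    neg = {s for s, terms in annotations.items() if terms & sibs} - pos
    return pos, neg
```

Removing the overlap is what distinguishes the final negative set from the initial one: a sample annotated with both the node and a sibling ends up only in the positive set.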
The other steps and parameters are the same as in specific implementation mode one.
Specific implementation mode three:
The specific implementation process of step 3 of this implementation mode is as follows:
Step 3.1:
First, the information gain of each attribute is computed using the information-gain concept of the C4.5 decision tree algorithm, and the information gain ratio of each attribute is computed at the same time.
For a given node, let D be the sample set, Gain(R) the information gain of attribute R, and GainRatio(R) the information gain ratio of attribute R. The calculation formulas are:
Gain(R) = Info(D) − Info_R(D)
Info(D) = −Σ_{i=1..m} p_i log2(p_i)
Info_R(D) = Σ_{j=1..k} (|D_j| / |D|) · Info(D_j)
GainRatio(R) = Gain(R) / SplitInfo_R(D)
SplitInfo_R(D) = −Σ_{j=1..k} (|D_j| / |D|) log2(|D_j| / |D|)
where p_i denotes the proportion of samples belonging to class i in the sample set, m is the number of classes contained in the sample set, and Info(·) denotes the entropy of the sample set, i.e., the amount of information needed to separate the different classes of the sample set; k denotes the number of different values attribute R may take, D_j is the set of samples whose value of attribute R is j, and Info_R(·) denotes the entropy of the sample set with respect to attribute R, i.e., the amount of information still needed to separate the classes after partitioning by attribute R; SplitInfo_R(·) denotes the split information of attribute R; and |·| denotes the number of samples contained in a set.
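The entropy, information gain, and gain ratio quantities above are the standard C4.5 ones and can be computed directly; the following sketch treats an attribute as discrete, matching the formulas' use of the k distinct values of R.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Info(D): entropy of the class distribution of a sample set."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """GainRatio(R) = (Info(D) - Info_R(D)) / SplitInfo_R(D) for a
    discrete attribute given as one value per sample."""
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(values, labels):   # D_j: samples with attribute value j
        groups[v].append(y)
    info_r = sum(len(g) / n * entropy(g) for g in groups.values())
    split = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    gain = entropy(labels) - info_r
    return gain / split if split > 0 else 0.0
```

An attribute that splits the classes perfectly into two equal halves has gain ratio 1, while a constant attribute (undefined split information) is reported as 0.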
Step 3.2:
For a given node, after the information gain ratio of each attribute is obtained, select the attributes that contribute most to the classification result and reject irrelevant attributes; a larger information gain ratio indicates a larger contribution to the classification result. In order to choose an appropriate number of sample attributes, losing neither a large amount of sample information nor a sufficient number of attributes, two conditions are introduced: a minimum information gain ratio value and a minimum attribute quantity ratio value. The specific procedure for selecting the final attribute combination is:
Suppose each sample x_j can be represented by an n-dimensional vector containing n attributes, denoted (a_1, …, a_n). For node i, set the minimum information gain ratio value g_i, 0 < g_i ≤ 1, and the minimum attribute quantity ratio value q_i, 0 < q_i ≤ 1.
First, compute the minimum attribute quantity Q_i = n × q_i (rounded up) from the minimum attribute quantity ratio value q_i.
Then, sort the attributes by information gain ratio from large to small. Starting from the attribute with the largest information gain ratio, when the sum of the first several information gain ratio values is greater than or equal to the minimum information gain ratio value g_i, judge whether the number of these attributes exceeds the minimum attribute quantity Q_i; if not, continue selecting the attribute with the largest information gain ratio from the remaining attributes until the number of attributes is greater than or equal to Q_i. Then keep the attributes satisfying both conditions and reject the remaining attributes as irrelevant. This procedure retains the attributes with large information gain ratios, i.e., selects the attributes that contribute most to the classification result.
Illustration of step 3.2:
Case one:
Assume n = 10, i.e., there are 10 attributes, and for node i set g_i = 0.95 and q_i = 0.25; then Q_i = ⌈10 × 0.25⌉ = ⌈2.5⌉ = 3.
For node i, suppose the information gain ratios of the attributes are {0.4, 0.3, 0.1, 0.1, 0.05, 0.01, 0.01, 0.01, 0.01, 0.01}, summing to 1. The first 5 attributes are selected: the sum of their information gain ratios is 0.95, reaching g_i and thus meeting the minimum information gain ratio requirement, and the number of selected attributes is 5, which exceeds the minimum attribute quantity Q_i = 3. The first 5 attributes therefore represent the sample and the remaining 5 are discarded; after this operation, each sample becomes a 5-dimensional vector instead of a 10-dimensional one.
Case two:
Again assume n = 10, i.e., there are 10 attributes, and for node i set g_i = 0.95 and q_i = 0.25; then Q_i = ⌈10 × 0.25⌉ = ⌈2.5⌉ = 3.
For node i, suppose the information gain ratios of the attributes are {0.8, 0.15, 0.01, 0.02, 0.01, 0.01, 0, 0, 0, 0}, summing to 1. After sorting, the first 2 attributes already sum to 0.95, meeting the minimum information gain ratio requirement; but the number of selected attributes is 2, which is less than the minimum attribute quantity Q_i = 3, so the first 3 attributes are selected to represent the sample and the remaining 7 are discarded. After this operation, each sample becomes a 3-dimensional vector instead of a 10-dimensional one.
Step 3.3:
The processes described in steps 3.1 and 3.2 perform attribute selection for one node of the GO annotation scheme; repeat steps 3.1 and 3.2 to perform attribute selection for all nodes of the GO annotation scheme.
Other steps and parameters are the same as those of specific implementation mode one or two.
Specific implementation mode four:
The specific implementation process of step 4 of the present embodiment is as follows:
R-SVM uses a threshold adjustment technique to improve the ability of the SVM to process unbalanced data sets; R-SVM does not depend on any assumption about the distribution of the processed data set and does not change that distribution, so it can be used to solve the unbalanced-data-set problem;
R-SVM selects a group of potentially best SVM thresholds using the Potential Best Threshold Selection method, then calculates the optimal threshold using the Best Threshold Estimation method and applies it to the SVM; the detailed process is as follows:
Step 4.1, selecting a group of potentially best thresholds:
The unbalanced data set is processed with a standard SVM; the output values of all samples are obtained and sorted from high to low, and adjacent samples whose true labels differ are located; a threshold lying between two adjacent samples with different true labels is a potentially best threshold, since moving the threshold across such a boundary changes the prediction result; for each node in the GO annotation scheme, a group of potentially best thresholds, i.e., a potentially-best-threshold set, is thereby obtained;
Step 4.2, determining the optimal threshold by estimation:
For each node in the GO annotation scheme, the original training set is divided into several sub-training sets using the Partitioning (PT) method, i.e., the training set is divided into several non-overlapping subsets, each of which is then treated as a training subset; on each training subset, one best threshold is selected from the potentially-best-threshold set; finally, the best thresholds selected on all training subsets are averaged to give the final threshold θ;
The detailed process of "selecting a best threshold from the potentially-best-threshold set" is as follows: each threshold in the potentially-best-threshold set is applied to the training subset, and the threshold giving the best classification result is taken as the best threshold of that training subset;
Step 4.3, the SVM result is corrected using the final threshold θ; for node i, the prediction result for sample x_j is calculated as h_i^*(x_j)=h_i(x_j)−θ;
where h_i(·) is the classification function given by the SVM of node i, h_i(x_j) is the classification result given by the SVM for sample x_j, and h_i^*(x_j) is the corrected result, i.e., the result given by R-SVM; if h_i^*(x_j) is greater than or equal to 0, x_j is judged to be of the positive class, and if h_i^*(x_j) is less than 0, x_j is judged to be of the negative class.
An illustrative example:
Let X be a training set containing n samples, and let the number of sample labels be m, that is, there are m nodes; X={x_1,x_2,…,x_n}, and Y={y_11,y_12,…,y_1m,…,y_n1,y_n2,…,y_nm} is the set of true class labels corresponding to each sample, i.e., the nodes in the GO annotation scheme; x_j is a sample in the training set, and y_ji is the class label of sample x_j for node i, y_ji=1 indicating that the sample belongs to node i and y_ji=−1 indicating that it does not; Θ is the set formed by all possible SVM thresholds, and a threshold θ∈Θ is sought for which the classification performance of the SVM is best;
The steps by which R-SVM calculates the threshold θ are:
A. Taking Fig. 1 as an example, an ordinary SVM is first applied to the sample set to obtain a result for each sample; if there are 10 samples, 10 SVM outputs are obtained; the best threshold is considered to appear where the labels of two adjacent samples disagree, and in the example shown there are three such thresholds, called the potentially best thresholds;
B. The training set X is divided into S parts using the Partitioning (PT) method, which divides it into S non-overlapping training subsets;
C. Each potentially best threshold is verified separately on each training subset, and the best threshold for each subset is selected;
D. The best thresholds selected for the subsets are averaged to obtain the final threshold;
Assume there are three potentially best thresholds {1.1, 1.7, 2.1} and the training set is divided into 5 training subsets whose per-subset best thresholds are {1.1, 1.1, 1.7, 1.1, 2.1}; the final threshold is then θ=1.42;
E. The SVM result is corrected with this threshold using the formula h_i^*(x_j)=h_i(x_j)−θ.
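Steps A through D can be sketched in Python as follows; the helper names and the toy scores are illustrative assumptions of this sketch, and the averaging step reproduces the worked example's θ=1.42:

```python
def potential_thresholds(scores, labels):
    """Step A (sketch): midpoints between score-adjacent samples whose
    true labels differ, after sorting SVM outputs from high to low."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(scores[a] + scores[b]) / 2.0
            for a, b in zip(order, order[1:]) if labels[a] != labels[b]]

def best_threshold(scores, labels, candidates):
    """Step C (sketch): the candidate maximizing accuracy when a sample
    is predicted positive iff score - t >= 0."""
    def accuracy(t):
        return sum((1 if s - t >= 0 else -1) == y
                   for s, y in zip(scores, labels)) / len(scores)
    return max(candidates, key=accuracy)

# Step D, reproducing the worked example: average the per-subset choices
subset_best = [1.1, 1.1, 1.7, 1.1, 2.1]
theta = sum(subset_best) / len(subset_best)
print(round(theta, 2))  # 1.42
```

Correcting an SVM output then amounts to computing h − theta and reading off the sign, as in step E.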
Other steps and parameters are the same as those of specific implementation mode two or three.
Specific implementation mode five:
The specific implementation process of step 6 of the present embodiment is as follows:
Let X be a training set containing n samples, and let the number of sample labels be m, that is, there are m nodes; X={x_1,x_2,…,x_n}; Y={y_11,y_12,…,y_1m,…,y_n1,y_n2,…,y_nm} is the set of true class labels corresponding to each sample, i.e., the nodes in the GO annotation scheme; x_j is a sample in the training set, and y_ji is the class label of sample x_j for node i, y_ji=1 indicating that the sample belongs to node i and y_ji=−1 indicating that it does not;
For node i, the output value h_i^*(x_j) of the node's SVM for a sample x_j is converted to a probability value p_i(x_j) by the sigmoid formula p_i(x_j)=1/(1+exp(A·h_i^*(x_j)+B)), where A and B are two coefficients for converting the result.
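A minimal sketch of this sigmoid conversion, assuming the standard Platt-style form p = 1/(1 + exp(A·h + B)); the function name is illustrative:

```python
import math

def svm_output_to_probability(h, A, B):
    """Platt-style sigmoid conversion of an R-SVM output h to a
    probability, assuming the form p = 1 / (1 + exp(A*h + B))."""
    return 1.0 / (1.0 + math.exp(A * h + B))

# With A = -1, B = 0 this reduces to the plain logistic function
print(svm_output_to_probability(0.0, -1.0, 0.0))  # 0.5
```

With a negative A, larger SVM outputs map monotonically to larger probabilities, which is the behavior the later bottom-up and top-down passes rely on.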
Other steps and parameters are the same as those of one of specific implementation modes one to four.
Specific implementation mode six:
The solution procedure of the coefficients A and B in step 6 of the present embodiment is as follows:
For node i, the values of A and B can be obtained by minimizing the following cross-entropy objective over the training set: min_{A,B} −Σ_j [t_j·log p_i(x_j)+(1−t_j)·log(1−p_i(x_j))];
where t_j=(N_+ +1)/(N_+ +2) if y_ji=1 and t_j=1/(N_− +2) if y_ji=−1; N_+ is the number of samples in the sample set that belong to node i, and N_− is the number of samples in the sample set that do not belong to node i.
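As an illustrative sketch only, assuming A and B are fitted in Platt's standard way, i.e., by minimizing cross-entropy against the smoothed targets t+ = (N+ + 1)/(N+ + 2) and t− = 1/(N− + 2); plain gradient descent, the learning rate, the iteration count, and the toy data below stand in for details the text does not specify:

```python
import math

def fit_platt(outputs, labels, iters=2000, lr=0.05):
    """Fit A, B by gradient descent on the cross-entropy against Platt's
    smoothed targets (sketch; solver and hyperparameters are assumptions)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)   # smoothed positive target
    t_neg = 1.0 / (n_neg + 2.0)             # smoothed negative target
    targets = [t_pos if y == 1 else t_neg for y in labels]
    A, B = 0.0, 0.0
    for _ in range(iters):
        gA = gB = 0.0
        for h, t in zip(outputs, targets):
            p = 1.0 / (1.0 + math.exp(A * h + B))
            # dLoss/dA = (t - p) * h and dLoss/dB = (t - p) for this sigmoid
            gA += (t - p) * h
            gB += (t - p)
        A -= lr * gA
        B -= lr * gB
    return A, B

# Toy data: positive samples get larger SVM outputs than negative ones
A, B = fit_platt([2.0, 1.5, -1.0, -2.0], [1, 1, -1, -1])
```

Because the objective is convex in (A, B), the descent converges to a negative A here, so positive-class outputs map to probabilities above 0.5.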
Other steps and parameters are the same as those of one of specific implementation modes one to five.
Specific implementation mode seven:
The specific implementation process of step 7 of the present embodiment is as follows:
Step 7.1, a node in a directed acyclic graph structure may have multiple parent nodes, so there may be multiple paths from the root node to a given node; for this case, the level of a node is defined by the longest path from the root node to that node, so the number of levels in the directed acyclic graph structure depends on the node with the longest path in the directed acyclic graph; r is defined as the root node of the directed acyclic graph, node i is any (non-root) node in the directed acyclic graph, p(r,i) denotes a path from root node r to node i, and l(p(r,i)) denotes the length of that path; ψ(i), the function determining the level of node i, is ψ(i)=max_{p(r,i)} l(p(r,i));
The level of each node in the GO annotation scheme is obtained from ψ(i), the root node being defined as level 0, followed by levels 1, 2, and so on down to the bottom level of the GO annotation scheme;
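A small sketch of the level function ψ(i) as the longest-path depth from the root, computed by memoized recursion on an assumed toy DAG:

```python
from functools import lru_cache

def node_levels(parents, root):
    """Level psi(i) of every node: length of the LONGEST path from the
    root. parents maps each node to its list of parent nodes (sketch)."""
    @lru_cache(maxsize=None)
    def psi(i):
        if i == root:
            return 0
        return 1 + max(psi(p) for p in parents[i])
    return {i: psi(i) for i in parents}

# Toy GO-like DAG: "b" is reachable via a->b and via a->c->b, so psi(b) = 2
parents = {"a": [], "b": ["a", "c"], "c": ["a"]}
levels = node_levels(parents, "a")
print(levels)  # {'a': 0, 'b': 2, 'c': 1}
```

Taking the maximum over parents is what makes the longer of the two paths to "b" determine its level.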
Step 7.2, for the GO annotation scheme, the prediction result of each node depends on a bottom-up process: whether a sample is predicted as the positive class of a node depends on the node's base classifier together with the prediction results of its child nodes; whether a sample has the function represented by node i therefore depends not only on the result given by the classifier of that node but also on the results given by the classifiers of its child nodes;
For a sample x_j, the nodes are processed level by level upwards, starting from the nodes at the bottom level of the GO annotation scheme, and the combined result of the output of each node's classifier and the outputs of its child-node classifiers is calculated; the detailed process is:
For a node i in the GO annotation scheme, φ_i denotes the set of all child nodes of node i whose prediction result is the positive class, and q̄_i(x_j) denotes the classification result of node i after combining the child-node classifier results; q̄_i(x_j) is calculated as q̄_i(x_j)=ω·p_i(x_j)+((1−ω)/|φ_i|)·Σ_{c∈φ_i} q̄_c(x_j), with q̄_i(x_j)=p_i(x_j) when φ_i is empty;
where ω is a weight parameter used to balance the contributions of the base classifier and the child nodes to the final result; it can be set to 0.5 or adjusted according to actual conditions; through this step, the positive-class prediction results of the lower levels are transmitted level by level to the corresponding upper-level nodes;
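The weighted bottom-up combination can be sketched as follows; treating children with score ≥ 0.5 as "predicted positive", keeping a node's own posterior when it has no positive child, and ω = 0.5 are assumed conventions of this sketch:

```python
def bottom_up(prob, children, omega=0.5):
    """Weighted bottom-up pass (sketch): q(i) mixes node i's own posterior
    with the mean score of its positively-predicted children."""
    q = {}

    def combine(i):
        if i not in q:
            child_scores = [combine(c) for c in children[i]]
            phi = [v for v in child_scores if v >= 0.5]  # positive children
            if phi:
                q[i] = omega * prob[i] + (1 - omega) * sum(phi) / len(phi)
            else:
                q[i] = prob[i]   # no positive child: keep own posterior
        return q[i]

    for node in prob:
        combine(node)
    return q

prob = {"root": 0.4, "child1": 0.8, "child2": 0.2}
children = {"root": ["child1", "child2"], "child1": [], "child2": []}
q = bottom_up(prob, children)
print(round(q["root"], 3))  # 0.6 = 0.5*0.4 + 0.5*0.8
```

Only the confident child (0.8) enters the average for the root, so a strong positive prediction below can raise an uncertain parent above the decision boundary.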
Step 7.3, for the GO annotation scheme, a top-down process: its main goal is, after the bottom-up process, to pass results judged as the negative class at upper-level nodes down to the corresponding lower-level nodes; it also proceeds level by level, modifying the predicted value of each level of nodes, and the final prediction result is then obtained from the finally obtained predicted value of each node according to the corresponding threshold; the specific content is:
For a sample x_j, the final calculation result ȳ_i(x_j) is ȳ_i(x_j)=q̄_i(x_j) if i is the root node, and ȳ_i(x_j)=min(q̄_i(x_j), min_{p∈par(i)} ȳ_p(x_j)) otherwise;
where par(i) denotes the set of parent nodes of node i;
In the bottom-up process, the goal is to calculate q̄_i(x_j), the result combining the child-node results, from the classifier result of each node; the top-down process then calculates the final result ȳ_i(x_j) from q̄_i(x_j); ȳ_i(x_j) is the probability that the sample belongs to node i, a number greater than or equal to 0 and less than or equal to 1; if ȳ_i(x_j) is greater than or equal to 0.5 the sample belongs to the node, and if ȳ_i(x_j) is less than 0.5 it does not;
Step 7.4, for a sample x_j, the final calculation result of node i is ȳ_i(x_j); the number of labels in the GO annotation scheme is m, that is, there are m nodes; for a sample x_j, the final calculation result is therefore the vector (ȳ_1(x_j),…,ȳ_m(x_j));
Step 7.5, for a sample x_j, if ȳ_i(x_j) is greater than or equal to 0.5, the prediction is the positive class, i.e., the sample belongs to node i and carries the class label represented by node i; if ȳ_i(x_j) is less than 0.5, the prediction is the negative class, i.e., the sample does not belong to node i and does not carry the class label represented by node i; that is, the final prediction result Y_ji of the class label of sample x_j is Y_ji=1 if ȳ_i(x_j)≥0.5 and Y_ji=−1 otherwise;
Step 7.6, it is finally obtained which nodes in the GO annotation scheme a sample x_j belongs to, i.e., which class labels sample x_j carries; the final prediction result Y_j over all class labels of sample x_j can be expressed as Y_j={Y_j1,…,Y_ji,…,Y_jm}, realizing the label prediction of sample x_j, that is, the prediction of gene function.
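A sketch of the top-down pass and the final 0.5 thresholding; taking the minimum over all parents of a multi-parent node is an assumed reading of the formula, and the toy scores are illustrative:

```python
def top_down(q, parents, root):
    """Top-down pass (sketch): a node's final score is capped by its
    parents' final scores, enforcing the hierarchy constraint."""
    ybar = {}

    def final(i):
        if i not in ybar:
            if i == root:
                ybar[i] = q[i]
            else:
                ybar[i] = min(q[i], min(final(p) for p in parents[i]))
        return ybar[i]

    for node in q:
        final(node)
    return ybar

def predict_labels(ybar):
    """Final thresholding: positive class (+1) iff the score is >= 0.5."""
    return {i: (1 if v >= 0.5 else -1) for i, v in ybar.items()}

# A confident child (0.9) cannot be positive below a negative parent (0.4)
q = {"root": 0.4, "child": 0.9}
parents = {"root": [], "child": ["root"]}
print(predict_labels(top_down(q, parents, "root")))  # both -1
```

Because every node's final score is at most its parents' final scores, a positive node can never sit below a negative ancestor, which is the hierarchy constraint the method guarantees.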
Other steps and parameters are the same as those of one of specific implementation modes one to six.
Specific implementation mode eight:
The concrete values of the minimum information gain ratio value g_i and the minimum attribute count Q_i described in step 3.2 of the present embodiment are set by training repeatedly and choosing the values that give the highest accuracy; the detailed process is as follows:
First, initial values of the minimum information gain ratio value g_i and the minimum attribute count Q_i are chosen empirically; steps 4 to 7 are then carried out; after this process is completed, g_i and Q_i are adjusted according to the accuracy of the prediction result, and steps 4 to 7 are repeated; after several iterations, the concrete values of the two quantities are set to the setting that gives the highest prediction accuracy.
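The tuning loop of this embodiment amounts to a small search over (g_i, q_i) settings; `run_pipeline` below is a hypothetical stand-in for steps 4 to 7, mocked so the sketch is runnable:

```python
def tune(candidates, run_pipeline):
    """Pick the (g_i, q_i) pair whose pipeline run scores highest."""
    return max(candidates, key=run_pipeline)

def mock_pipeline(params):
    """Hypothetical stand-in for steps 4-7: a mocked accuracy surface
    that peaks at g_i = 0.95, q_i = 0.25 (illustration only)."""
    g_i, q_i = params
    return 1.0 - abs(g_i - 0.95) - abs(q_i - 0.25)

grid = [(g, q) for g in (0.90, 0.95, 0.99) for q in (0.20, 0.25, 0.30)]
print(tune(grid, mock_pipeline))  # (0.95, 0.25)
```

In practice each candidate setting would trigger a full train-and-predict cycle, so the grid is kept small and seeded from empirically chosen initial values, as the text describes.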
Other steps and parameters are the same as those of one of specific implementation modes one to seven.

Claims (7)

1. A gene correlation method based on R-SVM and TPR rules, characterized by comprising the following steps:
Step 1, using genes of known function as training samples to compose a training set, and expressing each gene as a multidimensional vector in which each element is referred to as an attribute;
Step 2, each node in the GO annotation scheme representing a class label, first constructing, for each node in the GO annotation scheme, a positive sample set and a negative sample set from the samples in the training set according to an improved sibling principle;
Step 3, for each node in the GO annotation scheme, carrying out attribute selection of the samples on the corresponding data set, selecting the attributes that contribute most to classifying the function of the node;
Step 4, for each node in the GO annotation scheme, training the data set of the node with an R-SVM classifier to obtain a group of R-SVM classifiers;
wherein R-SVM selects a group of potentially best SVM thresholds using a potential best threshold selection method, then calculates the optimal threshold using a best threshold estimation method and applies it to the SVM; the detailed process is as follows:
Step 4.1, selecting a group of potentially best thresholds:
processing the unbalanced data set with a standard SVM, obtaining the output values of all samples and sorting them from high to low, and locating adjacent samples whose true labels differ, a threshold lying between two adjacent samples with different true labels being a potentially best threshold; for each node in the GO annotation scheme, a group of potentially best thresholds, i.e., a potentially-best-threshold set, is thereby obtained;
Step 4.2, determining the optimal threshold by estimation:
for each node in the GO annotation scheme, dividing the original training set into several sub-training sets using the Partitioning method, i.e., dividing the training set into several non-overlapping subsets, each subset then being treated as a training subset; then, on each training subset, selecting one best threshold from the potentially-best-threshold set; finally, averaging the best thresholds selected on all training subsets to obtain the final threshold θ;
the detailed process of "selecting a best threshold from the potentially-best-threshold set" being as follows: each threshold in the potentially-best-threshold set is applied to the training subset, and the threshold giving the best classification result is taken as the best threshold of that training subset;
Step 4.3, correcting the SVM result with the final threshold θ, the prediction result of node i for sample x_j being calculated as h_i^*(x_j)=h_i(x_j)−θ;
where h_i(·) is the classification function given by the SVM of node i, h_i(x_j) is the classification result given by the SVM for sample x_j, and h_i^*(x_j) is the corrected result, i.e., the result given by R-SVM; if h_i^*(x_j) is greater than or equal to 0, x_j is judged to be of the positive class, and if h_i^*(x_j) is less than 0, x_j is judged to be of the negative class;
Step 5, each node corresponding to one classifier, all nodes in the GO annotation scheme yielding a group of classifiers; using the group of R-SVM classifiers obtained in the training stage to classify and predict unknown samples, obtaining a group of preliminary R-SVM classification results;
Step 6, converting the group of R-SVM classification results into posterior probability values using a sigmoid method;
Step 7, using a weighted TPR ensemble algorithm for the directed-acyclic-graph hierarchical structure to realize the prediction of gene function on the premise that the final prediction result satisfies the hierarchy constraint requirements of the directed acyclic graph.
2. The gene correlation method based on R-SVM and TPR rules according to claim 1, characterized in that the detailed process of constructing the positive sample set and the negative sample set according to the improved sibling principle in step 2 is as follows:
for each node in the GO annotation scheme, the samples in the training set that belong to the node are taken as positive samples, and the samples belonging to the sibling nodes of the node are taken as initial negative samples; samples that simultaneously belong to the positive sample set are then removed from the initial negative sample set, and the remainder is taken as the final negative sample set, i.e., the negative sample set; if a node has no sibling nodes, the method traces upwards and selects the samples of the sibling nodes of its parent node as negative samples;
Expressed in symbols:
Tr+(cj)=*(cj)
where Tr denotes the training set containing all samples; node c_j represents the corresponding class label; Tr+(c_j) denotes the positive sample set of node c_j; the set of samples belonging simultaneously to node c_j and its sibling nodes, i.e., samples carrying both the class label c_j and the class labels of its siblings, is the set removed from the initial negative samples; Tr−(c_j) denotes the negative sample set of node c_j; *(c_j) denotes the set of specific samples corresponding to node c_j; the remaining symbols denote, respectively, the sibling nodes, the parent node (↑), the ancestor nodes, and the operation of removing certain samples from a sample set.
3. The gene correlation method based on R-SVM and TPR rules according to claim 2, characterized in that the specific implementation process of step 3 is as follows:
Step 3.1,
first, the information gain of each attribute is calculated using the information gain concept of the C4.5 decision tree algorithm, and the gain ratio of each attribute is calculated at the same time;
for a given node, let D be the sample set, Gain(R) the information gain, and GainRatio the information gain ratio for attribute R; the calculation formulas are:
Gain(R)=Info(D)−Info_R(D), with Info(D)=−Σ_{i=1}^{m} p_i·log2(p_i), Info_R(D)=Σ_{j=1}^{k} (|D_j|/|D|)·Info(D_j), SplitInfo_R(D)=−Σ_{j=1}^{k} (|D_j|/|D|)·log2(|D_j|/|D|), and GainRatio=Gain(R)/SplitInfo_R(D);
where p_i denotes the proportion of samples of class i in the sample set and m is the number of classes contained in the sample set; Info(·) denotes the entropy of the sample set, i.e., the amount of information needed to separate the sample set into its different classes; k indicates that attribute R has k different values, and D_j is the sample set composed of the samples for which attribute R takes value j; Info_R(·) denotes the entropy of the sample set with respect to attribute R, i.e., the amount of information still needed to separate the sample set into its different classes after classification by attribute R; SplitInfo_R(·) denotes the split information of attribute R; and |·| denotes the number of samples contained in a set;
Step 3.2,
for a given node, after the information gain ratio value of each attribute is obtained, the attributes that contribute most to the classification result are selected and the irrelevant attributes are removed, a larger information gain ratio value indicating a larger contribution to the classification result; the specific operation for selecting the final attribute combination is:
let each sample x_j be expressible as an n-dimensional vector containing n attributes, expressed as (a_1,…,a_n); for node i, the minimum information gain ratio value is set to g_i, 0<g_i≤1, and the minimum attribute count ratio is set to q_i, 0<q_i≤1;
first, the minimum attribute count Q_i=n×q_i is calculated from the minimum attribute count ratio q_i;
then, the attributes are sorted in descending order of information gain ratio; starting from the attribute with the largest information gain ratio, attributes are accumulated until the sum of the selected information gain ratio values is greater than or equal to the minimum information gain ratio value g_i, at which point it is also judged whether the number of selected attributes reaches the minimum attribute count Q_i; if not, the attribute with the largest information gain ratio among the remaining attributes continues to be selected until the number of attributes is greater than or equal to Q_i; the attributes satisfying both conditions are then retained, and the remaining attributes are rejected as irrelevant attributes; this procedure keeps the attributes with large information gain ratio values, that is, it selects the attributes contributing most to the classification result of the sample;
Step 3.3,
the procedure described in step 3.1 and step 3.2 performs attribute selection for a single node in the GO annotation scheme; steps 3.1 and 3.2 are repeated to perform attribute selection for all nodes in the GO annotation scheme.
4. The gene correlation method based on R-SVM and TPR rules according to claim 3, characterized in that the specific implementation process of step 6 is as follows:
let X be a training set containing n samples, and let the number of sample labels be m, that is, there are m nodes; X={x_1,x_2,…,x_n}; Y={y_11,y_12,…,y_1m,…,y_n1,y_n2,…,y_nm} is the set of true class labels corresponding to each sample, i.e., the nodes in the GO annotation scheme; x_j is a sample in the training set, and y_ji is the class label of sample x_j for node i, y_ji=1 indicating that the sample belongs to node i and y_ji=−1 indicating that it does not;
for node i, the output value h_i^*(x_j) of the node's SVM for a sample x_j is converted to a probability value p_i(x_j) by the sigmoid formula p_i(x_j)=1/(1+exp(A·h_i^*(x_j)+B)), A and B being two coefficients for converting the result.
5. The gene correlation method based on R-SVM and TPR rules according to claim 4, characterized in that the solution procedure of the coefficients A and B in step 6 is as follows:
for node i, the values of A and B can be obtained by minimizing over the training set the cross-entropy objective min_{A,B} −Σ_j [t_j·log p_i(x_j)+(1−t_j)·log(1−p_i(x_j))];
where t_j=(N_+ +1)/(N_+ +2) if y_ji=1 and t_j=1/(N_− +2) if y_ji=−1; N_+ is the number of samples in the sample set that belong to node i, and N_− is the number of samples in the sample set that do not belong to node i.
6. The gene correlation method based on R-SVM and TPR rules according to claim 5, characterized in that the specific implementation process of step 7 is as follows:
Step 7.1, the level of a node is defined by the longest path from the root node to that node, so that the number of levels in the directed acyclic graph structure depends on the node with the longest path in the directed acyclic graph; r is defined as the root node of the directed acyclic graph, node i is any node in the directed acyclic graph, p(r,i) denotes a path from root node r to node i, and l(p(r,i)) denotes the length of that path; ψ(i), the function determining the level of node i, is ψ(i)=max_{p(r,i)} l(p(r,i));
the level of each node in the GO annotation scheme is obtained from ψ(i), the root node being defined as level 0, followed by levels 1, 2, and so on down to the bottom level of the GO annotation scheme;
Step 7.2, a bottom-up process for the GO annotation scheme:
for a sample x_j, the nodes are processed level by level upwards, starting from the nodes at the bottom level of the GO annotation scheme, and the combined result of each node and its child-node results is calculated; the detailed process is:
for a node i in the GO annotation scheme, φ_i denotes the set of all child nodes of node i whose prediction result is the positive class, and q̄_i(x_j) denotes the classification result of node i after combining the child-node classifier results; q̄_i(x_j) is calculated as q̄_i(x_j)=ω·p_i(x_j)+((1−ω)/|φ_i|)·Σ_{c∈φ_i} q̄_c(x_j), with q̄_i(x_j)=p_i(x_j) when φ_i is empty;
where ω is a weight parameter;
Step 7.3, a top-down process for the GO annotation scheme:
for a sample x_j, the final calculation result ȳ_i(x_j) is ȳ_i(x_j)=q̄_i(x_j) if i is the root node, and ȳ_i(x_j)=min(q̄_i(x_j), min_{p∈par(i)} ȳ_p(x_j)) otherwise;
where par(i) denotes the set of parent nodes of node i;
Step 7.4, for a sample x_j, the final calculation result of node i is ȳ_i(x_j); the number of labels in the GO annotation scheme is m, that is, there are m nodes; the final calculation result for a sample x_j is therefore the vector (ȳ_1(x_j),…,ȳ_m(x_j));
Step 7.5, for a sample x_j, if ȳ_i(x_j) is greater than or equal to 0.5, the prediction is the positive class, i.e., the sample belongs to node i and carries the class label represented by node i; if ȳ_i(x_j) is less than 0.5, the prediction is the negative class, i.e., the sample does not belong to node i and does not carry the class label represented by node i; that is, the final prediction result Y_ji of the class label of sample x_j is Y_ji=1 if ȳ_i(x_j)≥0.5 and Y_ji=−1 otherwise;
Step 7.6, it is finally obtained which nodes in the GO annotation scheme a sample x_j belongs to, i.e., which class labels sample x_j carries; the final prediction result Y_j over all class labels of sample x_j is expressed as Y_j={Y_j1,…,Y_ji,…,Y_jm}, realizing the label prediction of sample x_j, that is, the prediction of gene function.
7. The gene correlation method based on R-SVM and TPR rules according to claim 6, characterized in that the concrete values of the minimum information gain ratio value g_i and the minimum attribute count Q_i in step 3.2 are set by training repeatedly and choosing the values giving the highest accuracy; the detailed process is as follows:
first, initial values of the minimum information gain ratio value g_i and the minimum attribute count Q_i are chosen empirically; steps 4 to 7 are then carried out; after this process is completed, g_i and Q_i are adjusted according to the accuracy of the prediction result, and steps 4 to 7 are repeated; after several iterations, the concrete values of the two quantities are set to the setting giving the highest prediction accuracy.
CN201610452114.9A 2016-06-21 2016-06-21 Gene correlation method based on R-SVM and TPR rules Active CN106126973B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610452114.9A CN106126973B (en) 2016-06-21 2016-06-21 Gene correlation method based on R-SVM and TPR rules

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610452114.9A CN106126973B (en) 2016-06-21 2016-06-21 Gene correlation method based on R-SVM and TPR rules

Publications (2)

Publication Number Publication Date
CN106126973A CN106126973A (en) 2016-11-16
CN106126973B true CN106126973B (en) 2018-10-02

Family

ID=57471369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610452114.9A Active CN106126973B (en) 2016-06-21 2016-06-21 Gene correlation method based on R-SVM and TPR rules

Country Status (1)

Country Link
CN (1) CN106126973B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110400602B (en) * 2018-04-23 2022-03-25 深圳华大生命科学研究院 Sequencing data-based ABO blood group system typing method and application thereof
CN109686400B (en) * 2018-12-14 2023-04-28 山东浪潮科学研究院有限公司 Enrichment degree inspection method and device, readable medium and storage controller
CN111260120A (en) * 2020-01-12 2020-06-09 桂林理工大学 Weather data entropy value-based weather day prediction method
CN114528948B (en) * 2022-03-10 2023-07-11 电子科技大学 Sequential test sequence generation method of complex system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559294A (en) * 2013-11-12 2014-02-05 中国石油大学(北京) Construction method and device as well as sorting method and device for support vector machine sorter

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559294A (en) * 2013-11-12 2014-02-05 中国石油大学(北京) Construction method and device as well as sorting method and device for support vector machine sorter

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction; Benhui Chen, et al.; IEEJ Transactions on Electrical and Electronic Engineering; Dec. 31, 2012; pp. 183-189 *
A voting-based ensemble algorithm for imbalanced data classification; Li Yong; Science Technology and Engineering; Jul. 2014; Vol. 14, No. 21; pp. 275-279 *
Part appearance model based on R-SVM and SVDD; Han Guijin, et al.; Application Research of Computers; Apr. 2015; Vol. 32, No. 4; pp. 1272-1275 *
Research on classification-based protein function prediction techniques; Chen Yiming; China Doctoral Dissertations Full-text Database, Information Science and Technology; Jul. 15, 2012; No. 07; pp. I140-25 *
Research on complex identification algorithms based on protein networks; Dai Qiguo; China Doctoral Dissertations Full-text Database, Information Science and Technology; Mar. 15, 2016; No. 03; pp. I138-77 *

Also Published As

Publication number Publication date
CN106126973A (en) 2016-11-16

Similar Documents

Publication Publication Date Title
CN106126972B (en) A kind of level multi-tag sorting technique for protein function prediction
Karaboga et al. Fuzzy clustering with artificial bee colony algorithm
Tan et al. Evolutionary computing for knowledge discovery in medical diagnosis
CN106126973B (en) Gene correlation method based on R-SVM and TPR rules
Gao Improved ant colony clustering algorithm and its performance study
Garcia-Piquer et al. Large-scale experimental evaluation of cluster representations for multiobjective evolutionary clustering
Celik et al. Coabcminer: an algorithm for cooperative rule classification system based on artificial bee colony
Zemmal et al. A new hybrid system combining active learning and particle swarm optimisation for medical data classification
Cheu et al. On the two-level hybrid clustering algorithm
Martínez-Ballesteros et al. Improving a multi-objective evolutionary algorithm to discover quantitative association rules
CN110909785B (en) Multitask Triplet loss function learning method based on semantic hierarchy
Hernandez et al. A hybrid global-local approach for hierarchical classification
Junior et al. Attribute-based decision graphs: a framework for multiclass data classification
CN108388769A (en) The protein function module recognition method of label propagation algorithm based on side driving
CN117195027A (en) Cluster weighted clustering integration method based on member selection
Azarbad et al. Brain tissue segmentation using an unsupervised clustering technique based on PSO algorithm
Boonchuay et al. Minority split and gain ratio for a class imbalance
Glybovets et al. Evolutionary multimodal optimization
Andreeva et al. Data mining learning models and algorithms for medical applications
Bhat et al. OTU clustering: A window to analyse uncultured microbial world
Li et al. Data mining techniques for the practical bioinformatician
Almazini et al. Enhanced feature clustering method based on ant colony optimization for feature selection
Weise et al. Evolutionary data mining approaches for rule-based and tree-based classifiers
Bej Improved imbalanced classification through convex space learning
Breaban Clustering: evolutionary approaches

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210118

Address after: Building 9, accelerator, 14955 Zhongyuan Avenue, Songbei District, Harbin City, Heilongjiang Province

Patentee after: INDUSTRIAL TECHNOLOGY Research Institute OF HEILONGJIANG PROVINCE

Address before: 150001 No. 92 West straight street, Nangang District, Heilongjiang, Harbin

Patentee before: HARBIN INSTITUTE OF TECHNOLOGY

TR01 Transfer of patent right

Effective date of registration: 20211223

Address after: 150000 room 301-58, building 16, No. 1616, Chuangxin Road, Songbei District, Harbin City, Heilongjiang Province

Patentee after: Heilongjiang Jiancheng medical laboratory Co.,Ltd.

Address before: Building 9, accelerator, 14955 Zhongyuan Avenue, Songbei District, Harbin City, Heilongjiang Province

Patentee before: INDUSTRIAL TECHNOLOGY Research Institute OF HEILONGJIANG PROVINCE