CN110517786A - Method and device for mining key cancer biomarkers - Google Patents

Method and device for mining key cancer biomarkers

Info

Publication number
CN110517786A
Authority
CN
China
Prior art keywords
particle
population
cancer
biomarker
life span
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910814283.6A
Other languages
Chinese (zh)
Inventor
赵环宇
封晓娟
黎彤亮
庞超逸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute Of Applied Mathematics Hebei Academy Of Sciences
Original Assignee
Institute Of Applied Mathematics Hebei Academy Of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute Of Applied Mathematics Hebei Academy Of Sciences
Priority to CN201910814283.6A
Publication of CN110517786A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Abstract

The invention belongs to the field of data mining and provides a method and device for mining key cancer biomarkers. The method comprises: obtaining at least one cancer patient sample; taking all cancer patient samples as particles in a particle swarm and initializing each particle of the swarm; updating the velocity and position of each particle according to a Bayesian classification model, the biomarker data of each particle in the swarm, and the corresponding actual survival time; and repeating the particle update until a preset stopping condition is reached, outputting the current global extremum of the swarm, and taking the biomarkers selected in the current global extremum as the key biomarkers for predicting survival time. The application can mine key biomarker combinations from very large biomarker data sets, effectively reducing useless or redundant cancer biomarker data and thereby improving the accuracy of survival prediction for cancer patients.

Description

Method and device for mining key cancer biomarkers
Technical Field
The invention belongs to the field of data mining, and in particular relates to a method and device for mining key cancer biomarkers.
Background
In medicine, cancer refers to a malignant tumor originating from epithelial tissue. Cancer is often well concealed, so early screening and prevention require testing many types of biomarkers to verify whether they can assess the disease accurately and sensitively.
Many early studies on predicting the survival of cancer patients used statistical methods such as the exponential distribution method, logistic regression analysis, and Cox proportional hazards analysis. For example, one study used Cox regression to analyze 10 factors that may affect the postoperative survival of colorectal cancer patients in order to identify the statistically significant ones. Common univariate methods usually assume an original distribution for the data and then study statistical significance under that specific distribution. However, the assumed distribution of cancer data may not be well founded, and statistical methods often place strict requirements on the data. Because these methods need a large amount of biomarker data, much of which is redundant or useless, the accuracy of survival prediction for cancer patients suffers.
Summary of the Invention
In view of this, embodiments of the invention provide a method and device for mining key cancer biomarkers, to solve the prior-art problem that an excessive number of cancer biomarkers leads to low accuracy in predicting patient survival.
A first aspect of the embodiments of the invention provides a method for mining key cancer biomarkers, comprising:
S101: obtaining at least one cancer patient sample, each cancer patient sample including multiple biomarker data items and the corresponding actual survival time;
S102: taking all cancer patient samples as particles in a particle swarm, taking the cancer-related biomarkers as the vector components of each particle, and initializing each particle of the swarm;
S103: updating the velocity and position of each particle according to a Bayesian classification model, the biomarker data of each particle in the swarm, and the corresponding actual survival time;
S104: repeating S103 until a preset stopping condition is reached, outputting the current global extremum of the swarm, and taking the biomarkers selected in the current global extremum of the swarm as the key biomarkers for predicting survival time.
A second aspect of the embodiments of the invention provides a device for mining key cancer biomarkers, comprising:
a sample acquisition module, for obtaining at least one cancer patient sample, each cancer patient sample including multiple biomarker data items and the corresponding actual survival time;
a swarm initialization module, for taking all cancer patient samples as particles in a particle swarm, taking the cancer-related biomarkers as the vector components of each particle, and initializing each particle of the swarm;
a particle update module, for updating the velocity and position of each particle according to a Bayesian classification model, the biomarker data of each particle in the swarm, and the corresponding actual survival time;
a key biomarker acquisition module, for repeating the calculation of the particle update module until a preset stopping condition is reached, outputting the current global extremum of the swarm, and taking the biomarkers selected in the current global extremum of the swarm as the key biomarkers for predicting survival time.
A third aspect of the embodiments of the invention provides a terminal device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the steps of the method for mining key cancer biomarkers described above are carried out.
A fourth aspect of the embodiments of the invention provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the method for mining key cancer biomarkers described above are carried out.
An embodiment of the invention first obtains at least one cancer patient sample, each including multiple biomarker data items and the corresponding actual survival time. It then takes all cancer patient samples as particles in a particle swarm and initializes each particle; updates the velocity and position of each particle according to a Bayesian classification model, the biomarker data of each particle in the swarm, and the corresponding actual survival time; and repeats the particle update until a preset stopping condition is reached, outputting the current global extremum of the swarm and taking the biomarkers selected in it as the key biomarkers for predicting survival time. The embodiment uses the particle swarm optimization algorithm to capture the interactions between biomarkers and uses the Bayesian classification model to evaluate the biomarker subsets the algorithm selects, until the best global extremum is obtained. It can thus mine key biomarker combinations from very large biomarker data sets, effectively reducing useless or redundant cancer biomarker data and thereby improving the accuracy of survival prediction for cancer patients.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of the method for mining key cancer biomarkers provided by an embodiment of the invention;
Fig. 2 is a schematic diagram of the implementation flow of S103 in Fig. 1 provided by an embodiment of the invention;
Fig. 3 is a schematic diagram of the implementation flow of S201 in Fig. 2 provided by an embodiment of the invention;
Fig. 4 is a schematic diagram of the implementation flow of S202 in Fig. 2 provided by an embodiment of the invention;
Fig. 5 is a structural diagram of the device for mining key cancer biomarkers provided by an embodiment of the invention;
Fig. 6 is a schematic diagram of the terminal device provided by an embodiment of the invention;
Fig. 7 is a diagram of the overall mining strategy provided by an embodiment of the invention.
Detailed Description
In the following description, specific details such as particular system structures and techniques are set out for illustration rather than limitation, so that the embodiments of the invention can be thoroughly understood. However, it will be clear to those skilled in the art that the invention can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the invention.
The term "include" and any variants of it in the description, the claims, and the above drawings are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or other steps or units inherent to the process, method, product, or device. In addition, the terms "first", "second", "third" and so on are used to distinguish different objects, not to describe a particular order.
To illustrate the technical solutions of the invention, specific embodiments are described below.
Embodiment 1:
Fig. 1 shows the implementation flow of the method for mining key cancer biomarkers provided by an embodiment of the invention. For ease of description, only the parts related to the embodiment are shown. The flow is detailed as follows:
S101: obtaining at least one cancer patient sample, each cancer patient sample including multiple biomarker data items and the corresponding actual survival time.
This embodiment takes colorectal cancer as an example and mines the biomarkers that affect the survival time of colorectal cancer patients. Colorectal cancer usually involves a large number of biomarkers. Predicting the survival time of cancer patients purely from the original biomarkers requires a large amount of computation and is also disturbed by many erroneous or redundant data. This embodiment collects cancer patients whose actual survival time has already been recorded as the cancer patient samples. The raw data are then mined to obtain the key biomarkers that have a large influence on survival time.
The biomarker data set is $A = (a_{ij})_{n \times m}$ and the patient survival times are $S = (s_1, s_2, \ldots, s_n)^T$, where $m$ is the number of biomarkers, $n$ is the number of patient cases, $a_{ij}$ ($1 \le i \le n$, $1 \le j \le m$) is the measured value of the $j$-th biomarker of the $i$-th patient, and $s_i$ ($1 \le i \le n$) is the actual survival time of the $i$-th patient, which takes discrete values. The key biomarker mining method is designed with survival prediction for colorectal cancer patients as its target: it aims to find $k$ of the $m$ biomarkers, with $k < m$, such that the accuracy of predicting $S$ from the $k$ biomarkers is not lower than that obtained from all $m$ biomarkers.
S102: taking all cancer patient samples as particles in a particle swarm, taking the cancer-related biomarkers as the vector components of each particle, and initializing each particle of the swarm.
In this embodiment, the cancer patient samples are regarded as particles in the swarm and the cancer-related biomarkers as the vector components of each particle. Suppose there are $m$ cancer patient samples and $d$ biomarkers. The swarm then contains $m$ particles; particle $i$ has position vector $x_i(t) = (x_{i1}(t), x_{i2}(t), \ldots, x_{id}(t))$ and velocity vector $v_i(t) = (v_{i1}(t), v_{i2}(t), \ldots, v_{id}(t))$, $i = 1, 2, \ldots, m$. Here $x_{i1}(t)$ is the first component of the particle's position vector: $x_{i1}(t) = 1$ means the biomarker represented by that component is selected, and $x_{i1}(t) = 0$ means it is not selected, so the component values yield $m$ biomarker subsets. During the search, each particle updates its velocity and position by tracking two extrema. The first is the individual extremum, the best solution the particle itself has found so far, at position $pbest_i = (p_{i1}, p_{i2}, \ldots, p_{id})$, $i = 1, 2, \ldots, m$. The other is the global extremum, the best solution the whole swarm has found so far, at position $gbest = (g_1, g_2, \ldots, g_d)$.
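The mapping from a particle's 0/1 position vector to a biomarker subset can be sketched as follows. This is a minimal illustration only: the function name `selected_biomarkers` and the marker names are hypothetical and do not come from the patent.

```python
def selected_biomarkers(position, names):
    """Return the biomarkers whose position component equals 1.

    position: list of 0/1 flags, one per biomarker (the particle position).
    names:    list of biomarker names in the same order (hypothetical names).
    """
    return [name for bit, name in zip(position, names) if bit == 1]


# A particle over four hypothetical biomarkers that selects the 1st and 3rd.
subset = selected_biomarkers([1, 0, 1, 0], ["marker_1", "marker_2", "marker_3", "marker_4"])
```

With this encoding, every particle in the swarm corresponds to exactly one candidate biomarker subset.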
Before the particle swarm optimization algorithm starts, its maximum number of iterations MaxK is set. The swarm is initialized with population size $M = 100 - 60\,(MaxK - k)/MaxK$ in order to increase swarm diversity. The position vector and velocity vector of each particle are initialized, and the individual extremum $pbest_i$ and the global extremum $gbest$ are initialized to zero vectors.
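The initialization step might look like the sketch below. Several points are assumptions rather than statements from the patent: `k` is read as the current iteration index, positions are drawn uniformly from {0, 1}, velocities are drawn uniformly from [-v_max, v_max], and `v_max = 4.0` is an arbitrary illustrative value.

```python
import random


def population_size(k, max_k):
    """Population size M = 100 - 60 * (MaxK - k) / MaxK from the text.

    Assumption: k is the current iteration index.
    """
    return round(100 - 60 * (max_k - k) / max_k)


def init_swarm(num_particles, num_biomarkers, v_max=4.0, seed=0):
    """Create positions, velocities, pbest and gbest for the swarm."""
    rng = random.Random(seed)
    # Assumption: random 0/1 positions and random velocities within the limits.
    positions = [[rng.randint(0, 1) for _ in range(num_biomarkers)]
                 for _ in range(num_particles)]
    velocities = [[rng.uniform(-v_max, v_max) for _ in range(num_biomarkers)]
                  for _ in range(num_particles)]
    # The text initializes the individual and global extrema to zero vectors.
    pbest = [[0] * num_biomarkers for _ in range(num_particles)]
    gbest = [0] * num_biomarkers
    return positions, velocities, pbest, gbest
```

Under this reading of the formula, the population size grows from 40 at `k = 0` to 100 at `k = MaxK`.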
S103: updating the velocity and position of each particle according to a Bayesian classification model, the biomarker data of each particle in the swarm, and the corresponding actual survival time.
In this embodiment, the Bayesian classification model is used to evaluate the biomarker subset selected by the particle swarm optimization algorithm and to determine the class of each particle, from which the fitness of each particle is calculated.
Specifically, the Bayesian classification model chosen in this embodiment is a naive Bayes classification model.
S104: repeating S103 until a preset stopping condition is reached, outputting the current global extremum of the swarm, and taking the biomarkers selected in the current global extremum of the swarm as the key biomarkers for predicting survival time.
This embodiment performs feature selection by combining the particle swarm optimization algorithm with a naive Bayes classification model, for two reasons. First, a naive Bayes classification model can cope with missing values in the data set. Second, a naive Bayes classification model assumes that features are mutually independent, so a stochastic search algorithm is used for feature selection to cope with the high dimensionality of the data set. The overall mining strategy is shown in Fig. 7: the particle swarm search strategy searches over the biomarkers; an evaluation strategy such as the Bayesian classification model then determines the accuracy of the survival prediction obtained from the chosen biomarkers; a feedback strategy is determined from that accuracy; and the search is repeated until the key biomarkers are found.
This embodiment uses the particle swarm optimization algorithm to capture the interactions between biomarkers and uses the Bayesian classification model to evaluate the biomarker subsets the algorithm selects, until the best global extremum is obtained. It can thus mine key biomarker combinations from very large biomarker data sets, effectively reducing useless or redundant cancer biomarker data and thereby improving the accuracy of survival prediction for cancer patients.
In one embodiment of the invention, before S102, the method for mining key cancer biomarkers further includes: preprocessing the biomarker data in each cancer patient sample, and discretizing the actual survival time of each cancer patient sample.
In this embodiment, each cancer patient sample contains a large amount of biomarker data, so the obtained cancer patient samples need to be preprocessed. First, redundant and erroneous data in the samples are filtered out, especially biomarkers that the medical field has verified to be useless. Second, in the data set formed by the biomarkers of all cancer patient samples, biomarkers with data for fewer than 80% of the cases are removed. Finally, the actual survival time of each cancer patient sample is discretized; the samples can be divided into less than 3 years, 3 to 5 years, and more than 5 years.
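The last two preprocessing steps can be sketched as follows. The representation is an assumption made for illustration: missing measurements are encoded as `None`, survival time is given in years, and the class labels 0, 1, 2 are arbitrary names for the three intervals.

```python
def drop_sparse_biomarkers(rows, min_coverage=0.8):
    """Keep only biomarker columns measured for at least 80% of the cases.

    rows: one list per patient; a missing measurement is None (assumed encoding).
    Returns the filtered rows and the indices of the columns that were kept.
    """
    n = len(rows)
    keep = [j for j in range(len(rows[0]))
            if sum(row[j] is not None for row in rows) / n >= min_coverage]
    return [[row[j] for j in keep] for row in rows], keep


def discretize_survival(years):
    """Map survival time to a class: 0 (< 3 years), 1 (3 to 5), 2 (> 5)."""
    if years < 3:
        return 0
    if years <= 5:
        return 1
    return 2
```

For example, with five patients a biomarker measured for only three of them (60%) is dropped, while one measured for four of them (80%) is kept.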
In this embodiment, the biomarker data set obtained after preprocessing is denoted as set A, and the preprocessed cancer patient samples are used for swarm initialization and subsequent processing.
In one embodiment of the invention, Fig. 2 shows the specific implementation flow of S103 in Fig. 1, which includes:
S201: calculating the fitness of each particle in the swarm according to the Bayesian classification model, the biomarker data of each particle in the swarm, and the corresponding actual survival time;
S202: updating the velocity and position of each particle according to the fitness of each particle in the swarm.
In one embodiment of the invention, Fig. 3 shows the specific implementation flow of S201 in Fig. 2, detailed as follows:
S301: inputting the biomarker data corresponding to each particle into the Bayesian classification model, outputting the survival-time classification result of each particle, and obtaining the classification accuracy of each particle from its survival-time classification result and the corresponding actual survival time;
S302: calculating the fitness of each particle from its classification accuracy.
In this embodiment, the particles are classified using cross-validation. 80% of the particles in the swarm can be used as training samples: according to the biomarkers chosen by each training sample, the biomarker data of the corresponding cancer patients are input into the Bayesian classification model, and the model is trained on each particle's biomarker subset and actual survival time. The remaining 20% of the particles are then input into the trained Bayesian classification model to obtain survival-time classification results. The classification accuracy of each particle is then determined from its survival-time classification result and the corresponding actual survival time.
In this embodiment, the classification accuracy of each particle can be substituted into a fitness function to obtain the fitness of each particle.
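A minimal sketch of the fitness evaluation follows, under stated assumptions: biomarker values are already discrete, the first 80% of the patient samples are used for training and the rest for testing, and the fitness is taken to be the classification accuracy itself, since the patent does not give the fitness function. The naive Bayes classifier is a small hand-written one with add-one smoothing, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class value frequencies per column."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (column, class) -> value counts
    values = defaultdict(set)            # column -> distinct values seen
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            if v is not None:            # missing values are simply skipped
                value_counts[(j, c)][v] += 1
                values[j].add(v)
    return class_counts, value_counts, values


def predict_nb(model, row):
    """Return the class with the highest smoothed log posterior."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for j, v in enumerate(row):
            if v is None:
                continue
            counts = value_counts[(j, c)]
            denom = sum(counts.values()) + len(values[j]) + 1
            score += math.log((counts[v] + 1) / denom)
        if score > best_score:
            best, best_score = c, score
    return best


def fitness(mask, rows, labels, train_fraction=0.8):
    """Accuracy of naive Bayes restricted to the biomarkers selected by mask."""
    cols = [j for j, bit in enumerate(mask) if bit == 1]
    if not cols:
        return 0.0                       # an empty subset cannot predict
    projected = [[row[j] for j in cols] for row in rows]
    split = int(len(rows) * train_fraction)
    model = train_nb(projected[:split], labels[:split])
    hits = sum(predict_nb(model, r) == y
               for r, y in zip(projected[split:], labels[split:]))
    return hits / (len(rows) - split)
```

In this toy setting a mask that selects an informative biomarker scores higher than one that selects an uninformative biomarker, which is the signal the swarm search relies on.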
In one embodiment of the invention, Fig. 4 shows the specific implementation flow of S202 in Fig. 2, detailed as follows:
S401: updating the current individual extremum of each particle and the current global extremum of the swarm according to the fitness of each particle.
In this embodiment, let the fitness of particle $i$ be Fit[i]. The fitness Fit[i] of each particle is compared with the particle's current individual extremum pbest(i): if Fit[i] > pbest(i), Fit[i] replaces the particle's current individual extremum pbest(i); if Fit[i] < pbest(i), the particle's current individual extremum is left unchanged. The fitness Fit[i] of each particle is also compared with gbest: if Fit[i] > gbest, Fit[i] replaces gbest. This completes the update of the individual extrema and the global extremum.
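The comparison in S401 can be sketched as below. As an assumption of this sketch, each extremum is stored as a pair of a position and the fitness achieved there, so that the position is available later for the velocity update; the function and variable names are illustrative.

```python
def update_extrema(positions, fitnesses, pbest, pbest_fit, gbest, gbest_fit):
    """Update individual extrema in place and return the new global extremum.

    pbest / pbest_fit: best position and fitness seen so far for each particle.
    gbest / gbest_fit: best position and fitness seen so far by the swarm.
    """
    for i, (pos, fit) in enumerate(zip(positions, fitnesses)):
        if fit > pbest_fit[i]:       # strictly better: replace individual extremum
            pbest[i] = list(pos)
            pbest_fit[i] = fit
        if fit > gbest_fit:          # strictly better: replace global extremum
            gbest = list(pos)
            gbest_fit = fit
    return gbest, gbest_fit
```

A particle whose new fitness does not exceed its stored best keeps its previous individual extremum, as in the text.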
S402: updating the velocity and position of each particle according to the current individual extremum of each particle and the current global extremum of the swarm.
In this embodiment, the velocity of each particle is updated according to the velocity update formula, shown as formula (1).
$v_{ij}(t+1) = \omega \cdot v_{ij}(t) + c_1 \cdot rand() \cdot (pbest_{ij}(t) - x_{ij}(t)) + c_2 \cdot rand() \cdot (gbest_j(t) - x_{ij}(t))$    (1)
The velocity update has three parts. The first part is the inertia of the particle's current velocity. The second part is the particle's cognitive behavior, that is, the particle's own reasoning ability. The third part is the particle's social behavior, that is, information sharing and cooperation among particles. In formula (1), $\omega$ is the inertia weight: when $\omega$ is large the algorithm has stronger global search ability, and when $\omega$ is small the algorithm leans toward local search. The initial value of $\omega$ can therefore be 0.9, decreasing linearly to 0.4 as the number of iterations increases, so that the algorithm first emphasizes global search and quickly narrows the search space to a certain region, and then obtains a high-precision solution through local search. $c_1$ and $c_2$ are two learning factors; suitable values of $c_1$ and $c_2$ speed up convergence while making it less likely to fall into a local optimum, and both are generally set to 2. $rand()$ is a random number in (0, 1). Each velocity component $v_{ij}$ is limited to $[-v_{max}, v_{max}]$ to keep the particle from moving away from the search space. If a velocity component of the current particle exceeds the maximum velocity $v_{max}$ in that dimension, the component is limited to the maximum velocity.
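Formula (1), the linearly decreasing inertia weight, and the velocity limit can be sketched together as follows. The value `v_max = 4.0` is an assumption chosen for illustration, and `random.random()` returns values in [0, 1), which differs slightly from the open interval in the text.

```python
import random


def inertia_weight(k, max_k, w_start=0.9, w_end=0.4):
    """Inertia weight decreasing linearly from 0.9 to 0.4 over max_k iterations."""
    return w_start - (w_start - w_end) * k / max_k


def update_velocity(v, x, pbest, gbest, w, c1=2.0, c2=2.0, v_max=4.0, rng=random):
    """Apply formula (1) to one particle, then clamp to [-v_max, v_max]."""
    new_v = []
    for j in range(len(v)):
        vj = (w * v[j]                                     # inertia
              + c1 * rng.random() * (pbest[j] - x[j])      # cognitive part
              + c2 * rng.random() * (gbest[j] - x[j]))     # social part
        new_v.append(max(-v_max, min(v_max, vj)))          # velocity limit
    return new_v
```

When a particle already sits on both its individual extremum and the global extremum, the cognitive and social terms vanish and only the inertia term remains.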
In this embodiment, the position of each particle is updated by the position update formula, shown as formula (2).
$x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1)$    (2)
In formula (2), $x_{ij}(t)$ is the position of the particle at time $t$, $x_{ij}(t+1)$ is the position of the particle at time $t+1$, and $v_{ij}(t+1)$ is the velocity of the particle at time $t+1$. According to formula (2), the updated position vector of a particle is determined from its current position and its updated velocity vector.
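Formula (2) itself is a componentwise addition, sketched below. Because the position components are described earlier as 0/1 selection flags, some step is needed to map the sum back to 0 or 1; the patent text does not specify it, so the thresholding function here is purely a hypothetical choice for illustration.

```python
def update_position(x, v):
    """Formula (2): x_ij(t+1) = x_ij(t) + v_ij(t+1), applied per component."""
    return [xj + vj for xj, vj in zip(x, v)]


def to_selection_mask(x, threshold=0.5):
    """Hypothetical binarization (not in the source): select if above threshold."""
    return [1 if xj > threshold else 0 for xj in x]


# One particle over three biomarkers: the first is deselected, the second selected.
new_mask = to_selection_mask(update_position([1, 0, 0], [-0.8, 0.7, 0.2]))
```

Other binarization rules are possible, for example mapping the velocity through a sigmoid and sampling, as in common binary variants of particle swarm optimization.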
In one embodiment of the invention, the specific implementation flow of S104 in Fig. 1 includes:
repeating S103 until the number of iterations reaches the maximum number of iterations, then outputting the current global extremum of the swarm; or
repeating S103 until the prediction error rate is less than a preset error rate, then outputting the current global extremum of the swarm, where the prediction error rate is the overall error rate when the global extremum is used to predict the survival time of all cancer patient samples.
In this embodiment, the preset stopping condition includes a maximum number of iterations. When the number of iterations of the particle swarm optimization algorithm reaches the maximum, iteration stops, and the biomarkers selected in the current global extremum are taken as the key biomarkers of colorectal cancer.
In this embodiment, the preset stopping condition can also include a preset error rate. When the biomarkers selected by the current global extremum are substituted into the Bayesian classification model and the prediction error rate is less than the preset error rate, iteration stops.
In this embodiment, iteration stops and the global extremum is output as soon as either of the above preset stopping conditions is met.
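The two stopping conditions can be combined in a loop like the sketch below. The callable `step` is a hypothetical placeholder for one full pass of S103; it is assumed to return the current global extremum and the overall prediction error rate obtained with it.

```python
def run_until_stop(step, max_iterations, max_error_rate):
    """Repeat step(k) until either stopping condition is met.

    step: hypothetical callable; step(k) runs iteration k and returns
          (gbest, error_rate).
    Returns the last global extremum and the number of iterations performed.
    """
    gbest, iterations = None, 0
    for k in range(max_iterations):      # condition 1: iteration limit
        gbest, error_rate = step(k)
        iterations = k + 1
        if error_rate < max_error_rate:  # condition 2: error rate low enough
            break
    return gbest, iterations
```

Whichever condition is met first ends the loop, matching the "either condition" rule in the text.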
In one embodiment of the invention, after the first update of particle velocities and positions, it is judged whether the preset stopping condition is currently met. If not, the biomarker data corresponding to each updated particle are substituted into the Bayesian classification model to obtain the classification accuracy of each particle. The particles are sorted by classification accuracy from high to low, and the last w particles in this ordering are returned to S102 for re-initialization. The w re-initialized particles and the remaining m-w particles are then input into the Bayesian classification model, and the process of S103-S104 is repeated. After each calculation of the particles' classification accuracy, the w particles with the lowest accuracy are re-initialized, until the preset stopping condition is reached.
Further, w can be a fixed value throughout the calculation, or it can increase gradually with the number of iterations, so that a swarm that is gradually falling into a local optimum is broken up. This improves the diversity of the swarm, keeps it from falling into a local optimum, and improves the accuracy of biomarker mining.
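Re-initializing the w least accurate particles might be sketched as follows. Drawing the new positions uniformly from {0, 1} is an assumption of this sketch; the patent only says that the particles are returned to S102 for initialization.

```python
import random


def reinit_worst(positions, accuracies, w, rng=random):
    """Re-initialize in place the w particles with the lowest accuracy.

    Returns the indices of the particles that were re-initialized.
    """
    # Sort particle indices by classification accuracy, from high to low.
    order = sorted(range(len(positions)), key=lambda i: accuracies[i], reverse=True)
    worst = order[len(order) - w:] if w > 0 else []
    for i in worst:
        # Assumption: a fresh random 0/1 position for each re-initialized particle.
        positions[i] = [rng.randint(0, 1) for _ in positions[i]]
    return worst
```

Passing a larger w in later iterations gives the gradually increasing variant described above.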
Taking a practical application scenario as an example, the application compiled an initial data set from the clinical data features, treatment features, and biomarkers of colorectal cancer patients collected over 15 years by the oncology department of a university, and used it to verify the proposed method for mining key cancer biomarkers.
Specifically, the initial data set lists a large amount of biomarker information, and the same biomarker was measured by multiple people and in multiple ways, which ensures the diversity of the data. The initial data set contains 985 columns and 809 rows in total. Each row holds the feature values of a different patient and each column represents a colorectal cancer feature: columns 1 to 64 are clinical data features, columns 65 to 114 are treatment features, and columns 115 to 985 are biomarker data.
Because the initial data set is high-dimensional, has serious missing data, contains diverse biomarker measurements, and mixes continuous and discrete values, the data were preprocessed to obtain a standard data set of 845 columns and 265 rows, in which columns 1 to 64 are clinical features, columns 65 to 114 are the patients' treatment features, and the remaining columns are biomarker features.
The study takes survival of 5 years or more and survival of 3 to 5 years for colorectal cancer patients as its two targets. For each target, the method for mining key cancer biomarkers provided by the application filters out the redundant features in the standard data set and mines the key biomarkers that have an important influence on survival prediction. Table 1 lists the mining results. By comparison, the accuracy of predictions made with the mined key biomarkers is significantly improved.
Table 2 lists the 2 feature subsets selected by the disclosed method for predicting survival of 5 years or more; they contain 22 and 27 features respectively.
Table 3 lists the feature subsets selected by the disclosed method for predicting 3 to 5 year survival of colorectal cancer patients. Survival prediction using the 27 and 26 key biomarkers output by the method reached accuracies of 75.09% and 76.6% respectively.
In the study of discrete survival prediction for colorectal cancer patients at 5 years or more and at 3 to 5 years, the prediction accuracy of the embodiment exceeded 92% and 75% respectively, which demonstrates the feasibility and effectiveness of the mining method. In the key biomarker subsets mined by the method of this embodiment, clinical features account for a very small share, which shows the importance of biomarkers in predicting the survival of colorectal cancer patients. The biomarker subsets selected by the application also share some common biomarker information, such as "All-CT" and "Dukes", which indicates that these common biomarkers are important for discrete survival prediction in colorectal cancer. For these biomarkers, the composition of the colorectal cancer stages "A, B, C, D" varies greatly, which indicates that under different biomarker combinations the staging of colorectal cancer can be refined further.
Table 1: Mining results
Table 2: Feature subsets mined for predicting survival of 5 years or more
Table 3: Feature subsets mined for predicting 3 to 5 year survival
The phenomenon that same biomarker coexists there are statistical methods, such as " D2-40 ", this explanation are same The different statistics of a biomarker provide different advantageous informations for the prediction of survival of patients time.In general, this kind of The statistical information of biomarker is by carrying out discretization as a result, the process of this discretization is inevitable to successive value It will cause information loss, and the group credit union advantageous loss for making up this partial information to a certain extent of different statistics.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
As shown in Fig. 5, one embodiment of the present invention provides a device 100 for mining key cancer biomarkers, which is used to execute the method steps of the embodiment corresponding to Fig. 1 and comprises:
A sample acquisition module 110, configured to obtain at least one cancer patient sample, each cancer patient sample including a plurality of biomarker data and the corresponding actual survival time;
A particle swarm initialization module 120, configured to take all the cancer patient samples as the particles of a particle swarm, take the cancer-related biomarkers as the vector components of the particles, and initialize each particle of the particle swarm;
A particle update module 130, configured to update the velocity and position of each particle according to a Bayesian classification model, the biomarker data of each particle in the particle swarm, and the corresponding actual survival time;
A key biomarker acquisition module 140, configured to repeat the calculation of the particle update module until a preset cut-off condition is reached, output the current global extremum of the particle swarm, and take the biomarkers selected in the current global extremum of the particle swarm as the key biomarkers for predicting survival time.
In one embodiment of the present invention, the device 100 for mining key cancer biomarkers further comprises:
A data preprocessing module, configured to preprocess the biomarker data in each cancer patient sample and to discretize the actual survival time of each cancer patient sample.
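The discretization of survival time performed by the data preprocessing module can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the present application: the function name `discretize_survival`, the use of months as the unit, and the cut points at 36 and 60 months (chosen to mirror the 3 to 5 year and 5 year horizons discussed above) are all hypothetical.

```python
from bisect import bisect_right

def discretize_survival(months, cut_points):
    """Map a survival time to a class index.

    cut_points must be sorted in ascending order; a value equal to a
    cut point falls into the higher class.
    """
    return bisect_right(cut_points, months)

# Hypothetical cut points: class 0 = under 36 months,
# class 1 = 36 to under 60 months, class 2 = 60 months or more.
CUTS = [36, 60]
labels = [discretize_survival(m, CUTS) for m in (12, 36, 48, 60, 85)]
print(labels)  # [0, 1, 1, 2, 2]
```

Binning with sorted cut points keeps the class boundaries explicit, so a different survival horizon only requires a different `CUTS` list.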
In one embodiment of the present invention, the particle update module 130 in Fig. 5 further includes a structure for executing the method steps of the embodiment corresponding to Fig. 2, which comprises:
A fitness calculation unit, configured to calculate the fitness of each particle in the particle swarm according to the Bayesian classification model, the biomarker data of each particle in the particle swarm, and the corresponding actual survival time;
A particle update unit, configured to update the velocity and position of each particle according to the fitness of each particle in the particle swarm.
In one embodiment of the present invention, the fitness calculation unit further includes a structure for executing the method steps of the embodiment corresponding to Fig. 3, which comprises:
A classification accuracy calculation subunit, configured to input the biomarker data corresponding to each particle into the Bayesian classification model, output the survival time classification result of each particle, and obtain the classification accuracy of each particle from the survival time classification result of each particle and the corresponding actual survival time;
A fitness calculation subunit, configured to calculate the fitness of each particle according to the classification accuracy of each particle.
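The fitness calculation described above can be sketched as follows. This is a hedged illustration only: it assumes that a particle is encoded as a 0/1 mask over biomarker columns (a common binary particle swarm convention that this passage does not confirm), that the Bayesian classification model is a categorical naive Bayes classifier with Laplace smoothing, and that fitness equals accuracy measured on the training samples themselves. The name `nb_accuracy` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def nb_accuracy(rows, labels, mask):
    """Training-set accuracy of a categorical naive Bayes classifier
    restricted to the features whose mask bit is 1 (assumed fitness)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # a mask that selects nothing cannot classify
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature, class) -> value -> count
    levels = defaultdict(set)            # feature -> distinct values seen
    for row, c in zip(rows, labels):
        for j in cols:
            value_counts[(j, c)][row[j]] += 1
            levels[j].add(row[j])

    def predict(row):
        best, best_score = None, -math.inf
        for c, n_c in class_counts.items():
            score = math.log(n_c / len(labels))  # log prior
            for j in cols:
                # Laplace smoothing avoids zero probabilities
                seen = value_counts[(j, c)][row[j]]
                score += math.log((seen + 1) / (n_c + len(levels[j])))
            if score > best_score:
                best, best_score = c, score
        return best

    hits = sum(predict(r) == c for r, c in zip(rows, labels))
    return hits / len(labels)

# Toy data: feature 0 determines the survival class, feature 1 is noise.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 1, 1]
informative = nb_accuracy(rows, labels, (1, 0))
noise_only = nb_accuracy(rows, labels, (0, 1))
```

In this toy case the mask that keeps the informative feature scores higher than the mask that keeps only the noise feature, which is the signal the swarm would follow. Scoring on the training samples is a simplification; a held-out or cross-validated accuracy would be a more cautious fitness.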
In one embodiment of the present invention, the particle update unit further includes a structure for executing the method steps of the embodiment corresponding to Fig. 4, which comprises:
An extremum update subunit, configured to update the current individual extremum of each particle and the current global extremum of the particle swarm according to the fitness of each particle;
A particle update subunit, configured to update the velocity and position of each particle according to the current individual extremum of each particle and the current global extremum of the particle swarm.
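The two subunits above can be sketched as follows. The sketch uses the standard particle swarm velocity rule with a sigmoid transfer function for binary positions; this passage does not give the exact update formulas, so the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the velocity clamp `v_max`, the binary encoding, and the function names are all assumptions made for illustration.

```python
import math
import random

def update_bests(positions, fitness, pbest, pbest_fit, gbest, gbest_fit):
    """Refresh each particle's individual extremum (in place) and return
    the updated global extremum of the swarm and its fitness."""
    for i, (x, f) in enumerate(zip(positions, fitness)):
        if f > pbest_fit[i]:
            pbest[i], pbest_fit[i] = list(x), f
        if f > gbest_fit:
            gbest, gbest_fit = list(x), f
    return gbest, gbest_fit

def update_particle(x, v, pbest_i, gbest, rng, w=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """One velocity and position update for a binary particle."""
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest_i, gbest):
        vi = w * vi + c1 * rng.random() * (pi - xi) + c2 * rng.random() * (gi - xi)
        vi = max(-v_max, min(v_max, vi))        # clamp the velocity
        prob = 1.0 / (1.0 + math.exp(-vi))      # sigmoid transfer function
        new_v.append(vi)
        new_x.append(1 if rng.random() < prob else 0)
    return new_x, new_v

rng = random.Random(0)
positions = [[0, 1, 0], [1, 1, 0]]
velocities = [[0.0, 0.0, 0.0] for _ in positions]
pbest = [list(p) for p in positions]
pbest_fit = [0.0, 0.0]
gbest, gbest_fit = update_bests(positions, [0.6, 0.8], pbest, pbest_fit, None, -1.0)
positions[0], velocities[0] = update_particle(
    positions[0], velocities[0], pbest[0], gbest, rng)
```

Extrema are refreshed before the velocity update so that each particle is pulled toward the best masks found so far, both by itself and by the swarm.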
In one embodiment of the present invention, the key biomarker acquisition module 140 comprises:
A first global extremum output unit, configured to repeat S103 until the number of iterations reaches the maximum number of iterations, and then output the current global extremum of the particle swarm; or,
A second global extremum output unit, configured to repeat S103 until the prediction error rate is less than a preset error rate, and then output the current global extremum of the particle swarm, the prediction error rate being the overall prediction error rate obtained when survival time is predicted for all the cancer patient samples using the global extremum.
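The two cut-off conditions can be sketched in a single loop. This is a minimal illustration under assumptions: the text presents the two conditions as alternatives, whereas the hypothetical helper `run_until_cutoff` below supports either one or both together, and `step` stands for one execution of S103 that returns the prediction error rate of the current global extremum.

```python
def run_until_cutoff(step, max_iter, max_error=None):
    """Repeat step() until max_iter iterations have run or, if max_error
    is given, until the returned error rate drops below it."""
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    error = None
    for iteration in range(1, max_iter + 1):
        error = step()  # one swarm update; returns the current error rate
        if max_error is not None and error < max_error:
            break
    return iteration, error

# Hypothetical error rates produced by successive swarm updates.
errors = iter([0.5, 0.3, 0.1, 0.05])
stopped_at, final_error = run_until_cutoff(
    lambda: next(errors), max_iter=10, max_error=0.2)
print(stopped_at, final_error)  # 3 0.1
```

Keeping a maximum iteration count even when an error threshold is used guarantees termination if the threshold is never reached.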
In one embodiment, the device 100 for mining key cancer biomarkers further comprises other functional modules/units for implementing the method steps of each embodiment in Embodiment 1.
Fig. 6 is a schematic diagram of the terminal device provided by one embodiment of the present invention. As shown in Fig. 6, the terminal device 600 of this embodiment comprises a processor 60, a memory 61, and a computer program 62 that is stored in the memory 61 and can run on the processor 60. When executing the computer program 62, the processor 60 implements the steps of each of the above method embodiments for mining key cancer biomarkers, for example S101 to S104 shown in Fig. 1. Alternatively, when executing the computer program 62, the processor 60 implements the functions of each module/unit in each of the above device embodiments, for example the functions of modules 110 to 140 shown in Fig. 5.
The computer program 62 can be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution of the computer program 62 in the terminal device 600.
The terminal device 600 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will understand that Fig. 6 is only an example of the terminal device 600 and does not limit it; the terminal device may include more or fewer components than shown, combine certain components, or use different components. For example, the terminal device may also include input and output devices, network access devices, a bus, and the like.
The processor 60 may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 61 may be an internal storage unit of the terminal device 600, such as a hard disk or main memory of the terminal device 600. The memory 61 may also be an external storage device of the terminal device 600, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the terminal device 600. Further, the memory 61 may include both an internal storage unit of the terminal device 600 and an external storage device. The memory 61 is used to store the computer program and the other programs and data required by the terminal device. The memory 61 may also be used to temporarily store data that has been output or is about to be output.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is only an example. In practical applications, the above functions may be assigned to different functional units and modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed device/terminal device and method may be implemented in other ways. For example, the device/terminal device embodiments described above are merely illustrative. The division into modules or units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
If the integrated module/unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, all or part of the processes in the above method embodiments of the present invention may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be increased or reduced as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
The embodiments described above are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and they shall all fall within the protection scope of the present invention.

Claims (10)

1. A method for mining key cancer biomarkers, characterized by comprising:
S101: obtaining at least one cancer patient sample, each cancer patient sample including a plurality of biomarker data and the corresponding actual survival time;
S102: taking all the cancer patient samples as the particles of a particle swarm, taking the cancer-related biomarkers as the vector components of the particles, and initializing each particle of the particle swarm;
S103: updating the velocity and position of each particle according to a Bayesian classification model, the biomarker data of each particle in the particle swarm, and the corresponding actual survival time;
S104: repeating S103 until a preset cut-off condition is reached, outputting the current global extremum of the particle swarm, and taking the biomarkers selected in the current global extremum of the particle swarm as the key biomarkers for predicting survival time.
2. The method for mining key cancer biomarkers according to claim 1, characterized in that, before S102, the method further comprises:
preprocessing the biomarker data in each cancer patient sample, and discretizing the actual survival time of each cancer patient sample.
3. The method for mining key cancer biomarkers according to claim 1, characterized in that S103 comprises:
calculating the fitness of each particle in the particle swarm according to the Bayesian classification model, the biomarker data of each particle in the particle swarm, and the corresponding actual survival time;
updating the velocity and position of each particle according to the fitness of each particle in the particle swarm.
4. The method for mining key cancer biomarkers according to claim 3, characterized in that calculating the fitness of each particle in the particle swarm according to the Bayesian classification model, the biomarker data of each particle in the particle swarm, and the corresponding actual survival time comprises:
inputting the biomarker data corresponding to each particle into the Bayesian classification model, outputting the survival time classification result of each particle, and obtaining the classification accuracy of each particle from the survival time classification result of each particle and the corresponding actual survival time;
calculating the fitness of each particle according to the classification accuracy of each particle.
5. The method for mining key cancer biomarkers according to claim 3, characterized in that updating the velocity and position of each particle according to the fitness of each particle in the particle swarm comprises:
updating the current individual extremum of each particle and the current global extremum of the particle swarm according to the fitness of each particle;
updating the velocity and position of each particle according to the current individual extremum of each particle and the current global extremum of the particle swarm.
6. The method for mining key cancer biomarkers according to any one of claims 1 to 5, characterized in that repeating S103 until the preset cut-off condition is reached, outputting the current global extremum of the particle swarm, and taking the biomarkers selected in the current global extremum of the particle swarm as the key biomarkers for predicting survival time comprises:
repeating S103 until the number of iterations reaches the maximum number of iterations, and then outputting the current global extremum of the particle swarm; or,
repeating S103 until the prediction error rate is less than a preset error rate, and then outputting the current global extremum of the particle swarm, the prediction error rate being the overall prediction error rate obtained when survival time is predicted for all the cancer patient samples using the global extremum.
7. A device for mining key cancer biomarkers, characterized by comprising:
a sample acquisition module, configured to obtain at least one cancer patient sample, each cancer patient sample including a plurality of biomarker data and the corresponding actual survival time;
a particle swarm initialization module, configured to take all the cancer patient samples as the particles of a particle swarm, take the cancer-related biomarkers as the vector components of the particles, and initialize each particle of the particle swarm;
a particle update module, configured to update the velocity and position of each particle according to a Bayesian classification model, the biomarker data of each particle in the particle swarm, and the corresponding actual survival time;
a key biomarker acquisition module, configured to repeat the calculation of the particle update module until a preset cut-off condition is reached, output the current global extremum of the particle swarm, and take the biomarkers selected in the current global extremum of the particle swarm as the key biomarkers for predicting survival time.
8. The device for mining key cancer biomarkers according to claim 7, characterized in that the device further comprises:
a data preprocessing module, configured to preprocess the biomarker data in each cancer patient sample and to discretize the actual survival time of each cancer patient sample.
9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN201910814283.6A 2019-08-30 2019-08-30 Excavate the method and device of cancer key organism marker Pending CN110517786A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910814283.6A CN110517786A (en) 2019-08-30 2019-08-30 Excavate the method and device of cancer key organism marker

Publications (1)

Publication Number Publication Date
CN110517786A true CN110517786A (en) 2019-11-29

Family

ID=68628379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910814283.6A Pending CN110517786A (en) 2019-08-30 2019-08-30 Excavate the method and device of cancer key organism marker

Country Status (1)

Country Link
CN (1) CN110517786A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101763528A (en) * 2009-12-25 2010-06-30 深圳大学 Gene regulation and control network constructing method based on Bayesian network
CN101794115A (en) * 2010-03-08 2010-08-04 清华大学 Scheduling rule intelligent excavating method based on rule parameter global coordination optimization
CN103942599A (en) * 2014-04-23 2014-07-23 天津大学 Particle swarm optimization method based on survival of the fittest and step-by-step selection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
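Ren Shuxia (任淑霞): "Research on probability-based modeling and mining of uncertain temporal data" (基于概率的不确定时态数据建模与挖掘问题的研究), China Doctoral Dissertations Full-text Database, Information Science and Technology (《中国博士学位论文全文数据库信息科技辑》) *
Chen Zhendong (陈振东) et al.: "Introduction to Oncology" (《肿瘤学概论》), 31 March 2006, People's Military Medical Press (人民军医出版社) *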

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191129)