CN110517786A - Method and device for mining key cancer biomarkers - Google Patents
Method and device for mining key cancer biomarkers
- Publication number
- CN110517786A CN201910814283.6A CN201910814283A
- Authority
- CN
- China
- Prior art keywords
- particle
- population
- cancer
- biomarker
- life span
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Abstract
The invention belongs to the field of data mining and provides a method and device for mining key cancer biomarkers. The method comprises: obtaining at least one cancer patient sample; taking all cancer patient samples as the particles of a particle swarm and initializing each particle of the swarm; updating the velocity and position of each particle according to a Bayesian classification model, the biomarker data of each particle in the swarm, and the corresponding actual survival time; and repeating the particle update until a preset cut-off condition is reached, then outputting the current global extreme point (global best) of the swarm and taking the biomarkers selected in that global extreme point as the key biomarkers for predicting survival time. The application can mine key biomarker combinations from a massive biomarker data set, effectively reducing useless or redundant cancer biomarker data and thereby improving the accuracy of survival prediction for cancer patients.
Description
Technical field
The invention belongs to the field of data mining, and in particular relates to a method and device for mining key cancer biomarkers.
Background
Medically, cancer refers to a malignant tumor originating from epithelial tissue. Cancer is often hard to detect early, so early screening and prevention require testing various types of biomarkers to verify whether they can assess the disease accurately and sensitively.
Many early studies on predicting the survival of cancer patients used statistical methods such as the exponential distribution method, logistic regression analysis, and Cox proportional hazards model analysis. For example, one earlier study used Cox regression to analyze 10 factors that may affect the postoperative survival of colorectal cancer patients in order to identify the statistically significant ones. Common univariate methods usually assume an underlying distribution for the data and then study statistical significance under that specific distribution. However, the assumed distribution of cancer data may not be well founded, and statistical methods often place very strict requirements on the data. Moreover, these methods need a large amount of biomarker data, much of which is redundant or useless, and this degrades the accuracy of survival prediction for cancer patients.
Summary of the invention
In view of this, embodiments of the invention provide a method and device for mining key cancer biomarkers, to solve the prior-art problem that an excessive number of cancer biomarkers leads to low accuracy of patient survival prediction.
A first aspect of the embodiments of the invention provides a method for mining key cancer biomarkers, comprising:
S101: obtaining at least one cancer patient sample, each cancer patient sample including multiple biomarker data items and the corresponding actual survival time;
S102: taking all cancer patient samples as the particles of a particle swarm, taking the cancer-related biomarkers as the vector components of each particle, and initializing each particle of the swarm;
S103: updating the velocity and position of each particle according to a Bayesian classification model, the biomarker data of each particle in the swarm, and the corresponding actual survival time;
S104: repeating S103 until a preset cut-off condition is reached, outputting the current global extreme point of the swarm, and taking the biomarkers selected in the current global extreme point of the swarm as the key biomarkers for predicting survival time.
A second aspect of the embodiments of the invention provides a device for mining key cancer biomarkers, comprising:
a sample acquisition module, configured to obtain at least one cancer patient sample, each cancer patient sample including multiple biomarker data items and the corresponding actual survival time;
a particle swarm initialization module, configured to take all cancer patient samples as the particles of a particle swarm, take the cancer-related biomarkers as the vector components of each particle, and initialize each particle of the swarm;
a particle update module, configured to update the velocity and position of each particle according to a Bayesian classification model, the biomarker data of each particle in the swarm, and the corresponding actual survival time;
a key biomarker acquisition module, configured to repeat the calculation of the particle update module until a preset cut-off condition is reached, output the current global extreme point of the swarm, and take the biomarkers selected in the current global extreme point of the swarm as the key biomarkers for predicting survival time.
A third aspect of the embodiments of the invention provides a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the method for mining key cancer biomarkers described above are implemented.
A fourth aspect of the embodiments of the invention provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the method for mining key cancer biomarkers described above are implemented.
An embodiment of the invention first obtains at least one cancer patient sample, each cancer patient sample including multiple biomarker data items and the corresponding actual survival time. All cancer patient samples are then taken as the particles of a particle swarm, and each particle of the swarm is initialized. The velocity and position of each particle are updated according to a Bayesian classification model, the biomarker data of each particle in the swarm, and the corresponding actual survival time. The particle update is repeated until a preset cut-off condition is reached; the current global extreme point of the swarm is then output, and the biomarkers selected in it are taken as the key biomarkers for predicting survival time. The embodiment uses a particle swarm optimization algorithm to account for the interactions among biomarkers, and uses the Bayesian classification model to evaluate the biomarker subsets selected by the particle swarm optimization algorithm, until the optimal global extreme point is obtained. The embodiment can thus mine key biomarker combinations from a massive biomarker data set, effectively reducing useless or redundant cancer biomarker data and thereby improving the accuracy of survival prediction for cancer patients.
Brief description of the drawings
To describe the technical solutions in the embodiments of the invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings in the following description show only some embodiments of the invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of the method for mining key cancer biomarkers provided by an embodiment of the invention;
Fig. 2 is a flow diagram of the implementation of S103 in Fig. 1 provided by an embodiment of the invention;
Fig. 3 is a flow diagram of the implementation of S201 in Fig. 2 provided by an embodiment of the invention;
Fig. 4 is a flow diagram of the implementation of S202 in Fig. 2 provided by an embodiment of the invention;
Fig. 5 is a structural diagram of the device for mining key cancer biomarkers provided by an embodiment of the invention;
Fig. 6 is a schematic diagram of the terminal device provided by an embodiment of the invention;
Fig. 7 is a diagram of the overall mining strategy provided by an embodiment of the invention.
Detailed description of the embodiments
In the following description, specific details such as particular system structures and techniques are set forth for purposes of illustration rather than limitation, to provide a thorough understanding of the embodiments of the invention. However, it will be clear to those skilled in the art that the invention can also be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the invention.
The term "includes" and any variations of it in the description, the claims, and the above drawings are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device. In addition, the terms "first", "second", "third", and so on are used to distinguish different objects, not to describe a particular order.
To illustrate the technical solutions of the invention, specific embodiments are described below.
Embodiment 1:
Fig. 1 shows the implementation flow of the method for mining key cancer biomarkers provided by an embodiment of the invention. For ease of description, only the parts related to the embodiment are shown. The flow is detailed as follows:
S101: obtaining at least one cancer patient sample, each cancer patient sample including multiple biomarker data items and the corresponding actual survival time.
This embodiment takes colorectal cancer as an example and mines the biomarkers that affect the survival time of colorectal cancer patients. Colorectal cancer usually involves a large number of biomarkers. Predicting the survival time of cancer patients purely from the original biomarkers requires a large amount of computation and also suffers interference from many erroneous or redundant data items. This embodiment takes cancer patients whose actual survival time has already been recorded as the cancer patient samples. The raw data are mined to obtain the key biomarkers that have a large influence on survival time.
The biomarker data set is $A = (a_{ij})_{n \times m}$ and the survival times of the patients are $S = (s_1, s_2, \ldots, s_n)^T$, where m is the number of biomarkers, n is the number of patient cases, $a_{ij}$ ($1 \le i \le n$, $1 \le j \le m$) is the measured value of the j-th biomarker of the i-th patient, and $s_i$ ($1 \le i \le n$) is the actual survival time of the i-th patient. The actual survival time is a discrete value. The study designs a key-biomarker mining method whose target is predicting the survival time of colorectal cancer patients. It aims to find n biomarkers among the m biomarkers (here n denotes the size of the selected subset rather than the number of cases) such that n < m and the accuracy of predicting S from the n biomarkers is not lower than that obtained with all m.
S102: taking all cancer patient samples as the particles of a particle swarm, taking the cancer-related biomarkers as the vector components of each particle, and initializing each particle of the swarm.
In this embodiment, the cancer patient samples are regarded as the particles of the swarm and the cancer-related biomarkers as the vector components of each particle. Assuming there are m cancer patient samples and d biomarkers, the swarm contains m particles. The position vector of particle i is $x_i(t) = (x_{i1}(t), x_{i2}(t), \ldots, x_{id}(t))$, $i = 1, 2, \ldots, m$, and its velocity vector is $v_i(t) = (v_{i1}(t), v_{i2}(t), \ldots, v_{id}(t))$, $i = 1, 2, \ldots, m$. Here $x_{i1}(t)$ is one component of the particle's position vector: $x_{i1}(t) = 1$ means that the biomarker represented by that component is selected, and $x_{i1}(t) = 0$ means that it is not selected. The values of the components therefore yield m biomarker subsets. During the search, each particle updates its velocity and position by tracking two extreme values. The first is the individual extreme point, that is, the best solution found so far by the particle itself, whose position is $pbest_i = (p_{i1}, p_{i2}, \ldots, p_{id})$, $i = 1, 2, \ldots, m$. The other is the best solution found so far by the whole swarm, that is, the global extreme point, whose position is $gbest = (g_1, g_2, \ldots, g_d)$.
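The 0/1 encoding described above can be illustrated with a minimal sketch. The helper `selected_biomarkers` and the marker names are hypothetical and serve only to show how a position vector maps to a biomarker subset; they do not come from the patent.

```python
from typing import List, Sequence


def selected_biomarkers(position: Sequence[int], names: Sequence[str]) -> List[str]:
    """Return the biomarker names whose position component equals 1."""
    if len(position) != len(names):
        raise ValueError("position and names must have the same length")
    return [name for bit, name in zip(position, names) if bit == 1]


# Hypothetical marker names, for illustration only.
names = ["marker_a", "marker_b", "marker_c", "marker_d"]
x_i = [1, 0, 0, 1]  # position vector of one particle
subset = selected_biomarkers(x_i, names)
```

With d biomarkers, every particle therefore encodes one of the 2^d possible subsets, which is why an exhaustive search is impractical and a swarm search is used instead.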
Before the particle swarm optimization algorithm starts, its maximum number of iterations MaxK is set first. The swarm is initialized with population size M according to M = 100 - 60(MaxK - k)/MaxK, where k is the current iteration number, in order to increase the diversity of the swarm. The position vector and velocity vector of each particle are initialized, and the individual extreme points $pbest_i$ and the global extreme point gbest are initialized to zero vectors.
S103: updating the velocity and position of each particle according to a Bayesian classification model, the biomarker data of each particle in the swarm, and the corresponding actual survival time.
In this embodiment, the Bayesian classification model is used to evaluate the biomarker subsets selected by the particle swarm optimization algorithm and to determine the class of each particle, from which the fitness of each particle is calculated.
Specifically, the Bayesian classification model chosen in this embodiment is a naive Bayes classification model.
S104: repeating S103 until a preset cut-off condition is reached, outputting the current global extreme point of the swarm, and taking the biomarkers selected in the current global extreme point of the swarm as the key biomarkers for predicting survival time.
This embodiment combines the particle swarm optimization algorithm with a naive Bayes classification model for feature selection, for two reasons. First, the naive Bayes classification model can cope with missing values in the data set. Second, the naive Bayes classification model assumes that features are mutually independent, so a random search algorithm is used for feature selection to cope with the high dimensionality of the data set. The overall mining strategy is shown in Fig. 7: the biomarkers are searched by the particle swarm search strategy; an evaluation strategy such as the Bayesian classification model then determines the accuracy of the survival prediction obtained with the chosen biomarkers; a feedback strategy is determined from that accuracy; and the search is repeated until the key biomarkers are found.
This embodiment uses the particle swarm optimization algorithm to account for the interactions among biomarkers, and uses the Bayesian classification model to evaluate the biomarker subsets selected by the particle swarm optimization algorithm, until the optimal global extreme point is obtained. The embodiment can thus mine key biomarker combinations from a massive biomarker data set, effectively reducing useless or redundant cancer biomarker data and thereby improving the accuracy of survival prediction for cancer patients.
In one embodiment of the invention, before S102, the method for mining key cancer biomarkers further includes: preprocessing the biomarker data in each cancer patient sample, and discretizing the actual survival time of each cancer patient sample.
In this embodiment, each cancer patient sample contains a large amount of biomarker data, so the obtained cancer patient samples need to be preprocessed. During preprocessing, redundant and erroneous data in the samples are filtered out first, in particular biomarkers that the medical field has verified to be useless. Second, in the data set formed by the biomarkers of all cancer patient samples, biomarkers whose amount of data is less than 80% of the number of cases are removed. Finally, the actual survival time of the cancer patient samples is discretized; the samples can be divided into less than 3 years, 3~5 years, and more than 5 years.
In this embodiment, the preprocessed biomarker data set is denoted as set A, and the preprocessed cancer patient samples are used for swarm initialization and subsequent processing.
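The last two preprocessing rules can be sketched as below. This is a minimal illustration, not the patent's implementation: missing measurements are assumed to be represented by `None`, the class labels 0/1/2 are arbitrary, and assigning exactly 3 and exactly 5 years to the middle class is an assumption, since the text only names the three ranges.

```python
from typing import List, Optional, Sequence


def keep_well_observed(rows: Sequence[Sequence[Optional[float]]],
                       min_fraction: float = 0.8) -> List[int]:
    """Indices of biomarker columns observed in at least min_fraction of cases."""
    n_cases = len(rows)
    kept = []
    for j in range(len(rows[0])):
        observed = sum(1 for row in rows if row[j] is not None)
        if observed / n_cases >= min_fraction:
            kept.append(j)
    return kept


def discretize_survival(years: float) -> int:
    """Map survival time to 0 (< 3 years), 1 (3~5 years) or 2 (> 5 years)."""
    # ASSUMPTION: the boundaries 3 and 5 belong to the middle class.
    if years < 3:
        return 0
    if years <= 5:
        return 1
    return 2


# Toy data: 5 patients, 3 biomarkers; None marks a missing measurement.
rows = [
    [1.2, None, 0.4],
    [0.9, 3.1, 0.5],
    [1.1, None, 0.3],
    [1.0, 2.8, 0.6],
    [1.3, 3.0, 0.2],
]
kept_columns = keep_well_observed(rows)
labels = [discretize_survival(t) for t in (1.5, 3.0, 4.2, 5.0, 7.8)]
```

In the toy data the second biomarker is measured for only 60% of the cases, so it is dropped.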
In one embodiment of the invention, Fig. 2 shows the specific implementation flow of S103 in Fig. 1, which includes:
S201: calculating the fitness of each particle in the swarm according to the Bayesian classification model, the biomarker data of each particle in the swarm, and the corresponding actual survival time;
S202: updating the velocity and position of each particle according to the fitness of each particle in the swarm.
In one embodiment of the invention, Fig. 3 shows the specific implementation flow of S201 in Fig. 2, detailed as follows:
S301: inputting the biomarker data corresponding to each particle into the Bayesian classification model, outputting the survival-time classification result of each particle, and obtaining the classification accuracy of each particle from its survival-time classification result and the corresponding actual survival time;
S302: calculating the fitness of each particle from its classification accuracy.
In this embodiment, cross-validation is used to classify the particles. 80% of the particles in the swarm can be used as training samples: according to the biomarkers selected by each training sample, the biomarker data of the corresponding cancer patient are fed into the Bayesian classification model, and the model is trained on the biomarker subset and actual survival time corresponding to each particle. The remaining 20% of the particles are then fed into the trained Bayesian classification model to obtain survival-time classification results. The classification accuracy of each particle is then determined from its survival-time classification result and the corresponding actual survival time.
In this embodiment, the classification accuracy of each particle can be substituted into the fitness function to obtain the fitness of each particle.
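One way to read the fitness evaluation is sketched below in plain Python. It rests on several assumptions that the text does not confirm: the biomarker values are treated as discrete categories, the naive Bayes model uses Laplace smoothing, the 80/20 split simply takes the first 80% of the patients for training, and the fitness is taken to be the classification accuracy itself because the fitness function is not given. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import List, Sequence


def train_nb(X: Sequence[Sequence[int]], y: Sequence[int]):
    """Fit a categorical naive Bayes model by counting value frequencies."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    feature_values = defaultdict(set)    # feature index -> values seen in training
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            feature_values[j].add(value)
    return class_counts, value_counts, feature_values


def predict_nb(model, row: Sequence[int]) -> int:
    """Return the class with the highest log posterior for one sample."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for j, value in enumerate(row):
            # Laplace smoothing keeps unseen values from zeroing the product.
            numerator = value_counts[(j, label)][value] + 1
            denominator = count + len(feature_values[j])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def fitness(mask: Sequence[int], X: Sequence[Sequence[int]], y: Sequence[int],
            train_fraction: float = 0.8) -> float:
    """Accuracy on the held-out 20% using only the biomarkers selected by mask."""
    cols = [j for j, bit in enumerate(mask) if bit == 1]
    if not cols:
        return 0.0  # ASSUMPTION: an empty subset gets the worst fitness.
    Xs = [[row[j] for j in cols] for row in X]
    split = int(len(Xs) * train_fraction)
    model = train_nb(Xs[:split], y[:split])
    test_X, test_y = Xs[split:], y[split:]
    if not test_y:
        return 0.0
    correct = sum(predict_nb(model, r) == t for r, t in zip(test_X, test_y))
    return correct / len(test_y)


# Toy data: biomarker 0 determines the survival class, biomarker 1 is noise.
X = [[0, 1], [1, 0], [0, 0], [1, 1], [0, 1], [1, 0], [0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
informative = fitness([1, 0], X, y)
```

A single fixed split is used here for brevity; a fuller cross-validation would rotate the held-out portion and average the accuracies.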
In one embodiment of the invention, Fig. 4 shows the specific implementation flow of S202 in Fig. 2, detailed as follows:
S401: updating the current individual extreme point of each particle and the current global extreme point of the swarm according to the fitness of each particle.
In this embodiment, let the fitness of a particle be Fit[i]. The fitness Fit[i] of each particle is compared with the current individual extreme point of that particle: if Fit[i] > pbest(i), Fit[i] replaces the particle's current individual extreme point pbest(i); if Fit[i] < pbest(i), the particle's current individual extreme point is kept unchanged. The fitness Fit[i] of each particle is also compared with gbest: if Fit[i] > gbest, Fit[i] replaces gbest. This completes the update of the individual extreme points and the global extreme point.
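The comparison rule in S401 can be sketched as follows. The text compares fitness values directly with pbest(i) and gbest; this sketch assumes that the best fitness and the position that achieved it are stored side by side, which is a common convention but not something the text states. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def update_bests(positions: Sequence[Sequence[int]], fitnesses: Sequence[float],
                 pbest_pos: List[List[int]], pbest_fit: List[float],
                 gbest_pos: List[int], gbest_fit: float) -> Tuple[List[int], float]:
    """Update individual bests in place and return the new global best."""
    for i, (pos, fit) in enumerate(zip(positions, fitnesses)):
        if fit > pbest_fit[i]:      # Fit[i] > pbest(i): replace the individual best
            pbest_fit[i] = fit
            pbest_pos[i] = list(pos)
        if fit > gbest_fit:         # Fit[i] > gbest: replace the global best
            gbest_fit = fit
            gbest_pos = list(pos)
    return gbest_pos, gbest_fit


positions = [[1, 0], [0, 1]]
fitnesses = [0.6, 0.9]
pbest_pos = [[0, 0], [0, 0]]
pbest_fit = [0.7, 0.5]
gbest_pos, gbest_fit = update_bests(positions, fitnesses, pbest_pos, pbest_fit,
                                    gbest_pos=[0, 0], gbest_fit=0.7)
```

In the example the first particle keeps its earlier best (0.6 is below 0.7), while the second particle improves both its own best and the global best.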
S402: updating the velocity and position of each particle according to the current individual extreme point of each particle and the current global extreme point of the swarm.
In this embodiment, the velocity of each particle is updated according to the velocity update formula shown in formula (1).
$v_{ij}(t+1) = \omega \cdot v_{ij}(t) + c_1 \cdot rand() \cdot (pbest_{ij}(t) - x_{ij}(t)) + c_2 \cdot rand() \cdot (gbest_j(t) - x_{ij}(t))$ (1)
The velocity update has three parts. The first part is the inertia of the particle's current velocity. The second part is the cognitive behavior of the particle, that is, the particle's own thinking. The third part is the social behavior of the particle, that is, information sharing and cooperation among the particles. In formula (1), $\omega$ is the inertia weight: when $\omega$ is large the algorithm has stronger global search ability, and when $\omega$ is small the algorithm tends toward local search. The initial value of $\omega$ can therefore be 0.9 and decrease linearly to 0.4 as the number of iterations increases, so that the algorithm first emphasizes global search and quickly converges to a region of the search space, and then obtains a high-precision solution by local search. $c_1$ and $c_2$ are two learning factors; suitable values of $c_1$ and $c_2$ speed up convergence while making it less likely to fall into a local optimum, and both are generally set to 2. rand() is a random number in (0, 1). Each velocity component $v_{ij}$ of a particle is limited to $[-v_{max}, v_{max}]$ to keep the particle from leaving the search space: if the velocity of the current particle in some dimension exceeds the maximum velocity $v_{max}$ in that dimension, the velocity in that dimension is limited to the maximum velocity.
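Formula (1), together with the linearly decreasing inertia weight and the velocity clamp, can be sketched as below. The default `v_max=4.0` is an assumption because the text gives no value, the function names are hypothetical, and Python's `random()` returns values in [0, 1) rather than the open interval (0, 1) mentioned in the text.

```python
import random
from typing import List, Sequence


def inertia_weight(k: int, max_k: int, w_start: float = 0.9, w_end: float = 0.4) -> float:
    """Inertia weight decreasing linearly from w_start to w_end over max_k iterations."""
    return w_start - (w_start - w_end) * k / max_k


def update_velocity(v: Sequence[float], x: Sequence[float], pbest: Sequence[float],
                    gbest: Sequence[float], w: float, rng: random.Random,
                    c1: float = 2.0, c2: float = 2.0, v_max: float = 4.0) -> List[float]:
    """Apply formula (1) to every dimension, then clamp to [-v_max, v_max]."""
    new_v = []
    for j in range(len(v)):
        vj = (w * v[j]                                   # inertia
              + c1 * rng.random() * (pbest[j] - x[j])    # cognitive part
              + c2 * rng.random() * (gbest[j] - x[j]))   # social part
        new_v.append(max(-v_max, min(v_max, vj)))
    return new_v


rng = random.Random(1)
w = inertia_weight(k=10, max_k=100)
v_next = update_velocity([0.5, -1.0, 2.0], [1, 0, 1], [1, 1, 0], [0, 1, 0], w, rng)
```

When a particle already sits on both its individual best and the global best, the cognitive and social terms vanish and only the inertia term remains.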
In this embodiment, the position of each particle is updated by the position update formula shown in formula (2).
$x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1)$ (2)
In formula (2), $x_{ij}(t)$ is the position of the particle at time t, $x_{ij}(t+1)$ is the position of the particle at time t+1, and $v_{ij}(t+1)$ is the velocity of the particle at time t+1. According to formula (2), the updated position vector of a particle is determined from its current position and its updated velocity vector.
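Formula (2) can be sketched directly. Because the position components are meant to be 0/1 selection flags, some rule is needed to turn the updated real-valued position back into a selection; the text does not give one, so the threshold in `to_mask` is purely an assumption made for illustration, and both function names are hypothetical.

```python
from typing import List, Sequence


def update_position(x: Sequence[float], v_next: Sequence[float]) -> List[float]:
    """Formula (2): x_ij(t+1) = x_ij(t) + v_ij(t+1), applied per dimension."""
    return [xj + vj for xj, vj in zip(x, v_next)]


def to_mask(x: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn a real-valued position into a 0/1 selection mask."""
    # ASSUMPTION: a component above the threshold means "selected";
    # the source does not describe how positions are binarized.
    return [1 if xj > threshold else 0 for xj in x]


x_next = update_position([1.0, 0.0, 0.0], [-0.75, 0.25, 0.75])
mask = to_mask(x_next)
```

Other binarization rules, such as passing the velocity through a sigmoid and sampling, are common in binary particle swarm variants and could be substituted here.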
In one embodiment of the invention, the specific implementation flow of S104 in Fig. 1 includes:
repeating S103 until the number of iterations reaches the maximum number of iterations, then outputting the current global extreme point of the swarm; or
repeating S103 until the prediction error rate falls below a preset error rate, then outputting the current global extreme point of the swarm, where the prediction error rate is the overall error rate when survival time is predicted for all cancer patient samples using the global extreme point.
In this embodiment, the preset cut-off condition includes the maximum number of iterations: when the number of iterations of the particle swarm optimization algorithm reaches the maximum, iteration stops and the biomarkers selected in the current global extreme point are taken as the key biomarkers of colorectal cancer.
In this embodiment, the preset cut-off condition may also include a preset error rate: when the biomarkers selected by the current global extreme point are fed into the Bayesian classification model and the prediction error rate is below the preset error rate, the iterative computation stops.
In this embodiment, the iterative computation stops and the global extreme point is output as soon as either of the above preset cut-off conditions is met.
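The two cut-off conditions can be combined as in the following sketch. The `step` callable is a hypothetical stand-in for one full pass of S103 that returns the overall prediction error rate of the current global extreme point; the initial error of 1.0 is likewise an assumption.

```python
from typing import Callable, Tuple


def should_stop(k: int, max_k: int, error_rate: float, target_error: float) -> bool:
    """Stop when either preset cut-off condition is met."""
    return k >= max_k or error_rate < target_error


def run(step: Callable[[int], float], max_k: int, target_error: float) -> Tuple[int, float]:
    """Repeat the update step until a cut-off condition holds."""
    k, error = 0, 1.0  # ASSUMPTION: start from the worst possible error rate.
    while not should_stop(k, max_k, error, target_error):
        k += 1
        error = step(k)  # one iteration of S103, returning the new error rate
    return k, error


# Toy step whose error rate shrinks as 1/k, for illustration only.
stopped_by_error = run(lambda k: 1.0 / k, max_k=100, target_error=0.3)
stopped_by_budget = run(lambda k: 1.0 / k, max_k=2, target_error=0.3)
```

The first call ends because the error rate drops below the target; the second ends because the iteration budget is exhausted first.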
In one embodiment of the invention, after the velocity and position of the particles have been updated for the first time, it is judged whether the preset cut-off condition is currently met. If not, the biomarker data corresponding to each updated particle are fed into the Bayesian classification model to obtain the classification accuracy of each particle; the particles are sorted by classification accuracy from high to low; the last w particles in the ranking are returned to S102 and re-initialized; the w re-initialized particles and the remaining m-w particles are each fed into the Bayesian classification model; and the process of S103-S104 is repeated. After each calculation of the particles' classification accuracy, the w particles with the lowest accuracy are re-initialized, until the preset cut-off condition is reached.
Further, w may be a fixed value throughout the calculation, or it may increase gradually with the number of iterations, so that a swarm that is gradually falling into a local optimum is broken up. This increases the diversity of the swarm, keeps it from being trapped in a local optimum, and improves the accuracy of biomarker mining.
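The re-initialization of the w weakest particles can be sketched as follows. Drawing the new positions as random 0/1 masks is an assumption, the function name is hypothetical, and the velocities of the re-initialized particles, which would be reset in the same way, are omitted for brevity.

```python
import random
from typing import List, Sequence


def reinit_worst(positions: List[List[int]], accuracies: Sequence[float],
                 w: int, rng: random.Random) -> List[int]:
    """Re-initialize the w particles with the lowest accuracy; return their indices."""
    # Sort particle indices by classification accuracy, from high to low.
    order = sorted(range(len(positions)), key=lambda i: accuracies[i], reverse=True)
    worst = order[len(order) - w:] if w > 0 else []
    dim = len(positions[0])
    for i in worst:
        # ASSUMPTION: new positions are random 0/1 masks.
        positions[i] = [rng.randint(0, 1) for _ in range(dim)]
    return worst


rng = random.Random(0)
positions = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 0]]
accuracies = [0.9, 0.2, 0.7, 0.4]
reset = reinit_worst(positions, accuracies, w=2, rng=rng)
```

Letting w grow with the iteration count, as the text suggests, only requires passing a larger value on later iterations.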
Taking a practical application scenario as an example, the application collated the clinical data features, treatment features, and biomarkers of colorectal cancer patients recorded over the past 15 years by the oncology department of a certain university into an initial data set, and used it to verify the proposed method for mining key cancer biomarkers.
Specifically, the initial data set contains a large amount of biomarker information, and the same biomarker was measured by multiple people and in multiple ways, which ensures the diversity of the data. The initial data set contains 985 columns and 809 rows in total. Each row holds the feature values of a different patient and each column represents a colorectal cancer feature: columns 1~64 are clinical data features, columns 65~114 are treatment features, and columns 115~985 are biomarker data.
Because the initial data set is high-dimensional, has fairly severe missing data, contains diverse biomarker measurements, and mixes continuous and discrete values, the data were preprocessed to obtain a standard data set of 845 columns and 265 rows, in which columns 1~64 represent clinical features, columns 65~114 represent the patients' treatment features, and the rest are biomarker features.
Taking a survival of 5 years or more ("5~ years") and a survival of 3~5 years of colorectal cancer patients as the respective targets, the study used the method for mining key cancer biomarkers provided by the application to filter out the redundant features in the standard data set and to mine the key biomarkers that strongly influence survival prediction. Table 1 lists the mining results. It can be seen that, by comparison, prediction with the mined key biomarkers is clearly more accurate.
Table 2 lists the 2 feature subsets selected by the disclosed method for predicting 5~ year survival, which contain 22 and 27 features respectively.
Table 3 lists the feature subsets selected by the disclosed method for predicting 3~5 year survival of colorectal cancer patients. Using the 27 and 26 key biomarkers output by the method, the survival prediction accuracy reached 75.09% and 76.6% respectively.
In the study of discrete survival prediction for colorectal cancer patients with 5~ year and 3~5 year survival, the prediction accuracy of the embodiment exceeded 92% and 75% respectively, which demonstrates the feasibility and validity of the mining method. In the key biomarker subsets mined by the method of this embodiment, clinical features account for a very small share, which confirms the importance of biomarkers in predicting the survival of colorectal cancer patients. The biomarker subsets selected by the application also share some biomarker information, such as "All-CT" and "Dukes", which shows that these common biomarkers are important for discrete survival prediction in colorectal cancer. These biomarkers vary greatly across the colorectal cancer stages "A, B, C, D", which indicates that the staging of colorectal cancer can be further refined under different biomarker combinations.
Table 1: Mining results
Table 2: Feature subsets mined for 5~ year survival prediction
Table 3: Feature subsets mined for 3~5 year survival prediction
Several statistical measures of the same biomarker sometimes appear together in a subset, for example for "D2-40". This shows that different statistics of the same biomarker provide different useful information for predicting patient survival time. In general, the statistical information of such biomarkers is the result of discretizing continuous values. This discretization inevitably loses information, and combining different statistics compensates for part of that loss to some extent.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the invention.
As shown in Fig. 5, the device 100 for mining key cancer biomarkers provided by one embodiment of the present invention is used to execute the method steps of the embodiment corresponding to Fig. 1, and comprises:
a sample acquisition module 110, configured to obtain at least one cancer patient sample, each cancer patient sample including multiple biomarker data items and the corresponding actual survival time;
a population initialization module 120, configured to take all cancer patient samples as the particles in a population, take the cancer-related biomarkers as the vector components of each particle, and initialize each particle of the population;
a particle update module 130, configured to update the velocity and position of each particle according to a Bayesian classification model, the biomarker data of each particle in the population, and the corresponding actual survival time;
a key biomarker acquisition module 140, configured to repeat the calculation of the particle update module until a preset cut-off condition is reached, output the current global extreme point of the population, and take the biomarkers selected in the current global extreme point of the population as the key biomarkers for predicting survival time.
In one embodiment of the invention, the device 100 for mining key cancer biomarkers further includes:
a data preprocessing module, configured to preprocess the biomarker data in each cancer patient sample and to discretize the actual survival time of each cancer patient sample.
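The discretization performed by the data preprocessing module can be sketched as follows. This is a minimal illustration only: the class boundaries of 3 and 5 years, the function names, and the use of simple cut points for biomarker values are assumptions made for the example, not values fixed by this description.

```python
from bisect import bisect_right

# Hypothetical class boundaries in years; the description does not fix them.
SURVIVAL_BOUNDARIES = (3.0, 5.0)


def discretize_survival(years, boundaries=SURVIVAL_BOUNDARIES):
    """Map a continuous survival time to a class index (0, 1, 2, ...).

    A value equal to a boundary falls into the higher class.
    """
    return bisect_right(boundaries, years)


def discretize_marker(value, cut_points):
    """Map a continuous biomarker reading to an ordinal bin index.

    `cut_points` must be sorted in ascending order.
    """
    return bisect_right(cut_points, value)
```

With these assumed boundaries, a survival time below 3 years maps to class 0, a time from 3 up to but not including 5 years maps to class 1, and a time of 5 years or more maps to class 2.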
In one embodiment of the invention, the particle update module 130 in Fig. 5 further includes a structure for executing the method steps of the embodiment corresponding to Fig. 2, comprising:
a fitness computing unit, configured to calculate the fitness of each particle in the population according to the Bayesian classification model, the biomarker data of each particle in the population, and the corresponding actual survival time;
a particle updating unit, configured to update the velocity and position of each particle according to the fitness of each particle in the population.
In one embodiment of the invention, the fitness computing unit further includes a structure for executing the method steps of the embodiment corresponding to Fig. 3, comprising:
a classification accuracy computing subunit, configured to input the biomarker data corresponding to each particle into the Bayesian classification model, output the survival time classification result of each particle, and obtain the classification accuracy of each particle from its survival time classification result and the corresponding actual survival time;
a fitness computing subunit, configured to calculate the fitness of each particle according to its classification accuracy.
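The two subunits above can be illustrated with a small sketch. It rests on several assumptions that this description does not confirm: the Bayesian classification model is taken to be a categorical naive Bayes classifier with Laplace smoothing, each particle is read as a 0/1 mask over the biomarkers, the fitness is taken to be the classification accuracy itself, and accuracy is measured on the same samples used for fitting. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Fit a categorical naive Bayes model on tuples of discrete values."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for row, label in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, label)][v] += 1
            values_seen[j].add(v)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the highest log posterior for one row."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for j, v in enumerate(row):
            k = len(values_seen[j])
            # Laplace smoothing keeps unseen (value, class) pairs above zero.
            score += math.log((value_counts[(j, label)][v] + 1) / (n + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def fitness(mask, rows, labels):
    """Classification accuracy using only the biomarkers selected by `mask`.

    Assumption: fitness equals accuracy measured on the fitting samples.
    """
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # an empty selection cannot classify anything
    reduced = [tuple(r[j] for j in selected) for r in rows]
    model = train_naive_bayes(reduced, labels)
    hits = sum(predict(model, r) == y for r, y in zip(reduced, labels))
    return hits / len(labels)
```

A held-out or cross-validated accuracy would be a natural alternative to the in-sample accuracy used here; the sketch keeps the simpler form for brevity.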
In one embodiment of the invention, the particle updating unit further includes a structure for executing the method steps of the embodiment corresponding to Fig. 4, comprising:
an extreme point updating subunit, configured to update the current individual extreme point of each particle and the current global extreme point of the population according to the fitness of each particle;
a particle updating subunit, configured to update the velocity and position of each particle according to the current individual extreme point of each particle and the current global extreme point of the population.
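The extreme point update and the particle update can be sketched as below. The update formula is not given in this part of the description, so the sketch substitutes the standard binary particle swarm rule (inertia plus attraction to the individual and global extreme points, followed by sigmoid sampling of each bit). The parameters `w`, `c1`, `c2`, and `vmax` and both function names are assumptions chosen for illustration.

```python
import math
import random


def update_bests(positions, fitnesses, pbest, pbest_fit, gbest, gbest_fit):
    """Refresh each individual extreme point and the global extreme point.

    `pbest` and `pbest_fit` are updated in place; the (possibly new)
    global extreme point and its fitness are returned.
    """
    for i, (pos, fit) in enumerate(zip(positions, fitnesses)):
        if fit > pbest_fit[i]:
            pbest[i], pbest_fit[i] = list(pos), fit
        if fit > gbest_fit:
            gbest, gbest_fit = list(pos), fit
    return gbest, gbest_fit


def update_particle(pos, vel, pbest, gbest, rng,
                    w=0.7, c1=1.5, c2=1.5, vmax=4.0):
    """One binary-PSO move for a single particle (assumed update rule)."""
    new_pos, new_vel = [], []
    for x, v, p, g in zip(pos, vel, pbest, gbest):
        # Velocity: inertia plus pulls toward individual and global bests.
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-vmax, min(vmax, v))  # clamp to avoid saturating the sigmoid
        prob = 1.0 / (1.0 + math.exp(-v))
        new_vel.append(v)
        # Position: the bit is set with probability sigmoid(velocity).
        new_pos.append(1 if rng.random() < prob else 0)
    return new_pos, new_vel
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing selected biomarker subsets across experiments.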
In one embodiment of the invention, the key biomarker acquisition module 140 includes:
a first global extreme point output unit, configured to repeat S103 until the number of iterations reaches the maximum number of iterations, and then output the current global extreme point of the population; or
a second global extreme point output unit, configured to repeat S103 until the prediction error rate is less than a preset error rate, and then output the current global extreme point of the population, where the prediction error rate is the overall prediction error rate when the global extreme point is used to predict survival time for all cancer patient samples.
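The two cut-off conditions can be sketched as a single driver loop. The description presents them as alternatives; combining them behind an optional `target_error` argument is a design choice made here for compactness. The callables `step` and `error_rate` are hypothetical placeholders for one S103 update and for the overall prediction error rate of a global extreme point.

```python
def run_until_cutoff(step, error_rate, max_iterations=100, target_error=None):
    """Repeat `step` until a cut-off condition holds.

    step()            -- performs one update and returns the current
                         global extreme point (hypothetical callable).
    error_rate(gbest) -- overall prediction error rate obtained with
                         that global extreme point (hypothetical callable).
    Returns (global extreme point, iterations used, reason for stopping).
    """
    gbest = None
    for iteration in range(1, max_iterations + 1):
        gbest = step()
        # Second condition: stop early once the error rate is low enough.
        if target_error is not None and error_rate(gbest) < target_error:
            return gbest, iteration, "error_rate"
    # First condition: the iteration budget is exhausted.
    return gbest, max_iterations, "max_iterations"
```

Leaving `target_error` as `None` reproduces the first output unit (iteration limit only), while supplying a threshold reproduces the second, with the iteration limit acting as a safeguard against a threshold that is never reached.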
In one embodiment, the device 100 for mining key cancer biomarkers further includes other functional modules/units for implementing the method steps of each embodiment in Embodiment 1.
Fig. 6 is a schematic diagram of a terminal device provided by an embodiment of the present invention. As shown in Fig. 6, the terminal device 600 of this embodiment includes a processor 60, a memory 61, and a computer program 62 that is stored in the memory 61 and can run on the processor 60. When executing the computer program 62, the processor 60 implements the steps of the above method embodiments for mining key cancer biomarkers, such as S101 to S104 shown in Fig. 1. Alternatively, when executing the computer program 62, the processor 60 implements the functions of the modules/units in the above device embodiments, such as the functions of modules 110 to 140 shown in Fig. 5.
The computer program 62 may be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions; the instruction segments describe the execution of the computer program 62 in the terminal device 600.
The terminal device 600 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will understand that Fig. 6 is only an example of the terminal device 600 and does not limit it; the terminal device may include more or fewer components than shown, combine certain components, or use different components. For example, the terminal device may also include input/output devices, network access devices, a bus, and so on.
The processor 60 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 61 may be an internal storage unit of the terminal device 600, such as a hard disk or memory of the terminal device 600. The memory 61 may also be an external storage device of the terminal device 600, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the terminal device 600. Further, the memory 61 may include both an internal storage unit and an external storage device of the terminal device 600. The memory 61 is used to store the computer program and other programs and data required by the terminal device. The memory 61 may also be used to temporarily store data that has been output or is about to be output.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is only an example. In practical applications, the above functions may be assigned to different functional units or modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed device/terminal device and method may be implemented in other ways. For example, the device/terminal device embodiments described above are only illustrative. The division into modules or units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present invention may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium, and when executed by a processor it can implement the steps of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The embodiments described above are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be included within the protection scope of the present invention.
Claims (10)
1. A method for mining key cancer biomarkers, characterized by comprising:
S101: obtaining at least one cancer patient sample, each cancer patient sample including multiple biomarker data items and the corresponding actual survival time;
S102: taking all cancer patient samples as the particles in a population, taking the cancer-related biomarkers as the vector components of each particle, and initializing each particle of the population;
S103: updating the velocity and position of each particle according to a Bayesian classification model, the biomarker data of each particle in the population, and the corresponding actual survival time;
S104: repeating S103 until a preset cut-off condition is reached, outputting the current global extreme point of the population, and taking the biomarkers selected in the current global extreme point of the population as the key biomarkers for predicting survival time.
2. The method for mining key cancer biomarkers according to claim 1, characterized in that, before S102, the method further comprises:
preprocessing the biomarker data in each cancer patient sample, and discretizing the actual survival time of each cancer patient sample.
3. The method for mining key cancer biomarkers according to claim 1, characterized in that S103 comprises:
calculating the fitness of each particle in the population according to the Bayesian classification model, the biomarker data of each particle in the population, and the corresponding actual survival time;
updating the velocity and position of each particle according to the fitness of each particle in the population.
4. The method for mining key cancer biomarkers according to claim 3, characterized in that calculating the fitness of each particle in the population according to the Bayesian classification model, the biomarker data of each particle in the population, and the corresponding actual survival time comprises:
inputting the biomarker data corresponding to each particle into the Bayesian classification model, outputting the survival time classification result of each particle, and obtaining the classification accuracy of each particle from its survival time classification result and the corresponding actual survival time;
calculating the fitness of each particle according to its classification accuracy.
5. The method for mining key cancer biomarkers according to claim 3, characterized in that updating the velocity and position of each particle according to the fitness of each particle in the population comprises:
updating the current individual extreme point of each particle and the current global extreme point of the population according to the fitness of each particle;
updating the velocity and position of each particle according to the current individual extreme point of each particle and the current global extreme point of the population.
6. The method for mining key cancer biomarkers according to any one of claims 1 to 5, characterized in that repeating S103 until a preset cut-off condition is reached, outputting the current global extreme point of the population, and taking the biomarkers selected in the current global extreme point of the population as the key biomarkers for predicting survival time comprises:
repeating S103 until the number of iterations reaches the maximum number of iterations, and then outputting the current global extreme point of the population; or
repeating S103 until the prediction error rate is less than a preset error rate, and then outputting the current global extreme point of the population, where the prediction error rate is the overall prediction error rate when the global extreme point is used to predict survival time for all cancer patient samples.
7. A device for mining key cancer biomarkers, characterized by comprising:
a sample acquisition module, configured to obtain at least one cancer patient sample, each cancer patient sample including multiple biomarker data items and the corresponding actual survival time;
a population initialization module, configured to take all cancer patient samples as the particles in a population, take the cancer-related biomarkers as the vector components of each particle, and initialize each particle of the population;
a particle update module, configured to update the velocity and position of each particle according to a Bayesian classification model, the biomarker data of each particle in the population, and the corresponding actual survival time;
a key biomarker acquisition module, configured to repeat the calculation of the particle update module until a preset cut-off condition is reached, output the current global extreme point of the population, and take the biomarkers selected in the current global extreme point of the population as the key biomarkers for predicting survival time.
8. The device for mining key cancer biomarkers according to claim 7, characterized in that the device further comprises:
a data preprocessing module, configured to preprocess the biomarker data in each cancer patient sample and to discretize the actual survival time of each cancer patient sample.
9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910814283.6A CN110517786A (en) | 2019-08-30 | 2019-08-30 | Excavate the method and device of cancer key organism marker |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910814283.6A CN110517786A (en) | 2019-08-30 | 2019-08-30 | Excavate the method and device of cancer key organism marker |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110517786A true CN110517786A (en) | 2019-11-29 |
Family
ID=68628379
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910814283.6A Pending CN110517786A (en) | 2019-08-30 | 2019-08-30 | Excavate the method and device of cancer key organism marker |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110517786A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101763528A (en) * | 2009-12-25 | 2010-06-30 | 深圳大学 | Gene regulation and control network constructing method based on Bayesian network |
CN101794115A (en) * | 2010-03-08 | 2010-08-04 | 清华大学 | Scheduling rule intelligent excavating method based on rule parameter global coordination optimization |
CN103942599A (en) * | 2014-04-23 | 2014-07-23 | 天津大学 | Particle swarm optimization method based on survival of the fittest and step-by-step selection |
Non-Patent Citations (2)
Title |
---|
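Ren Shuxia (任淑霞): "Research on Probability-Based Modeling and Mining of Uncertain Temporal Data" (基于概率的不确定时态数据建模与挖掘问题的研究), China Doctoral Dissertations Full-text Database, Information Science and Technology (中国博士学位论文全文数据库信息科技辑) * |
Chen Zhendong et al. (陈振东等): "Introduction to Oncology" (肿瘤学概论), 31 March 2006, People's Military Medical Press (人民军医出版社) * |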
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191129 |