CN108280327B - Ex-warehouse method for improving sample diversity of sample library - Google Patents

Ex-warehouse method for improving sample diversity of sample library

Publication number
CN108280327B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810133242.6A
Other languages
Chinese (zh)
Other versions
CN108280327A (en)
Inventor
皇甫伟
李佳轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB
Priority to CN201810133242.6A
Publication of CN108280327A
Application granted
Publication of CN108280327B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Abstract

The invention provides an ex-warehouse method for improving the sample diversity of a sample library, which keeps the diversity of the samples remaining in the library as high as possible. The method comprises the following steps: determining an index for measuring sample diversity in the sample library; preliminarily selecting, from a predetermined sample library, the samples that meet the ex-warehouse condition; and, among the samples that meet the ex-warehouse condition, determining according to the diversity index the ex-warehouse samples whose removal leaves the remaining samples in the library with the highest diversity. The invention relates to the field of information management.

Description

Ex-warehouse method for improving sample diversity of sample library
Technical Field
The invention relates to the field of information management, in particular to an ex-warehouse method for improving the sample diversity of a sample library.
Background
A biological sample bank, also called a biobank, mainly refers to the standardized collection, processing, storage, and application of samples such as biological macromolecules, cells, tissues, and organs from healthy and diseased organisms (including human organ tissue, whole blood, plasma, serum, body fluids, and processed biological samples such as DNA, RNA, and protein), together with the clinical, pathological, treatment, follow-up, informed-consent, and other data related to those samples, and the associated quality control, information management, and application systems.
Fig. 1 is a schematic diagram of the warehousing and ex-warehouse process of a biological sample library. When samples are taken out of the library, the researcher first screens for samples that meet the research need, which raises the problem of sample diversity in the library. A biological sample library typically holds tens of thousands or even hundreds of thousands of samples, and each sample has tens or hundreds of attributes, such as sample type, sampling location, and the donor's blood type, sex, age, height, disease status, family history, and lifestyle. A researcher's screening condition may involve only a single attribute; for example, the request may be for type O blood samples with no requirement on any other attribute. In that case many samples satisfy the request, so the choice among them should take sample diversity into account: the diversity of the remaining samples should be preserved as far as possible so that more researchers' needs for samples with different attributes can be met.
At present, when samples are screened for ex-warehouse, sample diversity is not considered and no related method exists, so samples are generally picked sequentially or at random from the large number of samples that meet the screening condition. Because neither sequential nor random selection accounts for diversity, the diversity of the sample library is easily reduced: the attributes of the remaining samples become uniform and samples with certain attributes become scarce. This can bias research results obtained from the ex-warehouse samples and makes it difficult to meet the needs of subsequent researchers.
Disclosure of Invention
The invention aims to provide an ex-warehouse method for improving the sample diversity of a sample library, and to solve the problem in the prior art that selecting ex-warehouse samples sequentially or at random easily reduces the sample diversity of the library.
In order to solve the above technical problems, an embodiment of the present invention provides a method for improving sample diversity of a sample library, including:
determining an index for measuring the sample diversity in the sample library;
preliminarily selecting a sample meeting the ex-warehouse condition from a predetermined sample library;
and determining the ex-warehouse samples which enable the diversity of the residual samples in the sample library to be the highest according to the sample diversity index in the determined sample library in the samples meeting the ex-warehouse conditions.
Further, the determined index for measuring the sample diversity in the sample library is:
$$H = \sum_{j=1}^{m} H_j, \qquad H_j = -\sum_{i=1}^{n} P_i \ln P_i$$
wherein H represents the sample diversity index of the sample library, $H_j$ represents the sample diversity of the j-th attribute, m represents the total number of attributes of the samples in the sample library, n represents the number of sample types within a single attribute, and $P_i$ represents the proportion of individuals in the sample library that belong to the i-th type.
Further, $P_i = n_i/N$, wherein $n_i$ represents the number of individuals of the i-th type, and N represents the total number of individuals.
Further, the preliminary selection of the samples meeting the ex-warehouse conditions from the predetermined sample warehouse comprises:
inputting a warehouse-out request;
and according to the input ex-warehouse request, preliminarily selecting samples meeting ex-warehouse conditions from a predetermined sample library.
Further, among the samples meeting the ex-warehouse conditions, according to the determined sample diversity index in the sample library, determining the ex-warehouse samples which enable the residual sample diversity in the sample library to be the highest comprises the following steps:
and determining the ex-warehouse sample with the highest diversity of the residual samples in the sample library by using a genetic algorithm according to the determined sample diversity index in the sample library in the samples meeting the ex-warehouse condition.
Further, the determining the ex-warehouse sample with the highest diversity of the remaining samples in the sample warehouse by using the genetic algorithm comprises the following steps:
step 1, initializing a population;
step 2, setting a fitness function as a sample diversity index of residual samples in the sample library, and calculating the fitness value of individuals in the population according to the set fitness function, wherein the residual samples are all samples in the sample library except selected ex-warehouse samples;
step 3, performing selection, crossover, and mutation according to the calculated fitness values of the individuals, returning the new population generated by selection, crossover, and mutation to step 2, calculating the fitness values of the individuals in the new population, and judging whether a preset termination criterion is met; if the criterion is met, outputting the individual with the largest fitness value as the best individual, determining the ex-warehouse samples according to the best individual, and ending the iteration; otherwise, returning to step 2 to continue the iteration.
Further, the initializing population includes:
randomly generating an initial population, each individual in the population being composed of y numbers, each number representing the index of a sample meeting the ex-warehouse conditions and corresponding to an ex-warehouse identifier, the identifier indicating whether the corresponding sample is selected for ex-warehouse, and y representing the number of samples meeting the ex-warehouse conditions.
Further, the fitness function of the i-th individual $x_i$ in the population is $\mathrm{fitness}(x_i) = H_i$, wherein $H_i$ represents the sample diversity index of the samples that remain after the samples selected by $x_i$ are removed from the sample library as ex-warehouse samples.
Further, the judging whether the preset termination criterion is met or not, if so, outputting the individual with the maximum fitness value as the optimal individual, determining the ex-warehouse sample according to the optimal individual, and ending the iteration comprises:
judging whether the current iteration times is the preset maximum iteration times, if so, outputting the individual with the maximum fitness value as the optimal individual, determining ex-warehouse samples according to y ex-warehouse identifications corresponding to the optimal individual, and ending the iteration.
The technical scheme of the invention has the following beneficial effects:
In this scheme, an index for measuring sample diversity in the sample library is determined; samples meeting the ex-warehouse condition are preliminarily selected from a predetermined sample library; and, among those samples, the ex-warehouse samples that leave the remaining samples in the library with the highest diversity are determined according to the diversity index. The ex-warehouse samples thus satisfy the researcher's conditions, so the research is not biased, while the diversity of the remaining samples is kept as high as possible, so the needs of more researchers can be met.
Drawings
FIG. 1 is a schematic view of a process of warehousing and ex-warehouse of a biological sample warehouse;
FIG. 2 is a schematic flow chart illustrating an ex-warehouse method for improving sample diversity of a sample library according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a genetic algorithm provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a probability density curve when the number of ex-warehouses is k according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a relationship between the number k of ex-warehouse items and the success rate of ex-warehouse items under an extreme condition according to an embodiment of the present invention;
FIG. 6 is a schematic diagram comparing the reduction of the diversity index under random selection and under the genetic algorithm.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides an ex-warehouse method for improving the sample diversity of a sample library, addressing the problem in the prior art that selecting ex-warehouse samples sequentially or at random easily reduces the sample diversity of the library.
As shown in fig. 2, the method for improving sample diversity of a sample library provided in the embodiment of the present invention includes:
s101, determining an index for measuring sample diversity in a sample library;
s102, preliminarily selecting a sample meeting the ex-warehouse condition from a predetermined sample library;
and S103, determining ex-warehouse samples which enable the diversity of the residual samples in the sample library to be the highest according to the determined sample diversity index in the sample library in the samples meeting the ex-warehouse conditions.
The ex-warehouse method of this embodiment determines an index for measuring sample diversity in the sample library; preliminarily selects samples meeting the ex-warehouse condition from a predetermined sample library; and, among those samples, determines according to the diversity index the ex-warehouse samples that leave the remaining samples in the library with the highest diversity. The ex-warehouse samples thus satisfy the researcher's conditions, so the research is not biased, while the diversity of the remaining samples is kept as high as possible, so the needs of more researchers can be met.
In the foregoing specific embodiment of the method for improving sample diversity of a sample library, further, the index for measuring sample diversity in the sample library is determined as follows:
$$H = \sum_{j=1}^{m} H_j, \qquad H_j = -\sum_{i=1}^{n} P_i \ln P_i$$
wherein H represents the sample diversity index of the sample library, $H_j$ represents the sample diversity of the j-th attribute, m represents the total number of attributes of the samples in the sample library, n represents the number of sample types within a single attribute, and $P_i$ represents the proportion of individuals in the sample library that belong to the i-th type, with $P_i = n_i/N$, where $n_i$ is the number of individuals of the i-th type and N is the total number of individuals.
The sample diversity index in the sample library can be calculated through the sample diversity index formula, and the sample diversity index provided by the embodiment is multidimensional and multiattribute and can measure the sample diversity with multiple attributes.
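The calculation described above can be sketched as follows. This is a minimal illustration, assuming a Shannon-type index per attribute ($H_j = -\sum_i P_i \ln P_i$) summed over all attributes; the natural logarithm, the function names, and the tuple representation of a sample are assumptions made for the example, not details confirmed by the source.

```python
import math
from collections import Counter

def attribute_diversity(values):
    """Assumed Shannon-type diversity H_j = -sum(P_i * ln P_i) for one attribute."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def library_diversity(samples):
    """H = sum of H_j over all attributes; each sample is a tuple of attribute values."""
    if not samples:
        return 0.0
    n_attributes = len(samples[0])
    return sum(attribute_diversity([s[j] for s in samples])
               for j in range(n_attributes))

# Hypothetical toy library: (blood type, sex)
samples = [("O", "F"), ("A", "F"), ("O", "M"), ("B", "M")]
print(library_diversity(samples))
```

Because each attribute contributes its own term, the index grows when any attribute becomes more evenly distributed, which is what makes it usable across many attributes at once.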
In the foregoing embodiment of the method for increasing the diversity of samples in a sample library, further, the preliminary selecting of samples meeting the ex-warehouse conditions from a predetermined sample library includes:
inputting a warehouse-out request;
and according to the input ex-warehouse request, preliminarily selecting samples meeting ex-warehouse conditions from a predetermined sample library.
In this embodiment, assume that the sample library holds 10000 samples and that 20 attributes are recorded for each sample. A user (for example, a researcher) requests 15 samples of one type under one attribute, and 3000 samples meet this condition. The 15 samples can then be chosen from the 3000 that satisfy the ex-warehouse screening condition so that, after the 15 samples leave the library, the diversity of all remaining samples is as high as possible.
In the foregoing specific embodiment of the method for increasing the sample diversity of the sample library, further, the determining, according to the sample diversity index in the determined sample library, the ex-warehouse sample with the highest sample diversity in the sample library among the samples meeting the ex-warehouse condition includes:
and determining the ex-warehouse sample with the highest diversity of the residual samples in the sample library by using a genetic algorithm according to the determined sample diversity index in the sample library in the samples meeting the ex-warehouse condition.
In this embodiment, there are a great many ways to choose 15 samples from 3000, and computing the sample diversity for every choice would require a very large amount of work. Therefore a genetic algorithm (GA) is used to find the ex-warehouse samples that leave the remaining samples with the highest diversity (this may also be called a near-optimal ex-warehouse solution), and the ex-warehouse operation is then carried out.
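To give a sense of why exhaustive evaluation is impractical, the number of ways to choose 15 samples from 3000 can be counted directly. The snippet is only an illustration of the search-space size; the variable names are chosen for this example.

```python
import math

candidates = 3000   # samples meeting the ex-warehouse condition
requested = 15      # samples to take out

# Number of distinct 15-sample subsets of the 3000 candidates
n_choices = math.comb(candidates, requested)
print(f"{n_choices:.3e}")
```

The count is on the order of 10^40, so a heuristic search such as a genetic algorithm is a natural fit.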
In the aforementioned specific embodiment of the method for increasing the diversity of the samples in the sample library, further, the determining the ex-warehouse samples with the highest diversity of the remaining samples in the sample library by using the genetic algorithm comprises:
step 1, initializing a population;
step 2, setting a fitness function as a sample diversity index of residual samples in the sample library, and calculating the fitness value of individuals in the population according to the set fitness function, wherein the residual samples are all samples in the sample library except selected ex-warehouse samples;
step 3, performing selection, crossover, and mutation according to the calculated fitness values of the individuals, returning the new population generated by selection, crossover, and mutation to step 2, calculating the fitness values of the individuals in the new population, and judging whether a preset termination criterion is met; if the criterion is met, outputting the individual with the largest fitness value as the best individual, determining the ex-warehouse samples according to the best individual, and ending the iteration; otherwise, returning to step 2 to continue the iteration.
As shown in fig. 3, the specific step of determining the ex-warehouse sample with the highest diversity of the remaining samples in the sample library by using the genetic algorithm may include:
A11, initialization, generating an initial population: randomly generate an initial population consisting of z individuals. Each individual is composed of y numbers (for example, y decimal numbers); each number represents the index of a sample meeting the ex-warehouse condition and corresponds to an ex-warehouse identifier that indicates whether the corresponding sample is selected for ex-warehouse (for example, 1 means selected and 0 means not selected). Here y is the number of samples meeting the ex-warehouse condition, and the i-th individual in the population is denoted $x_i$.
The larger the population size z, the less likely it is that a single individual's optimal solution dominates the direction in which the whole population evolves. The smaller the population size z, the more slowly the optimal solution is found and the greater the risk of becoming trapped in a local optimum. The appropriate population size depends on the complexity of the individuals' genes and can be estimated by trial and error. In this example, z is 100.
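Step A11 might be sketched as below. The sketch assumes that an individual is stored as a list of y ex-warehouse flags (1 = selected, 0 = not selected) and that every individual selects exactly k samples, where k is the requested number; the fixed-k constraint and the helper names are assumptions for illustration.

```python
import random

def random_individual(y, k, rng):
    """Return y ex-warehouse flags with exactly k ones (k samples selected)."""
    chosen = set(rng.sample(range(y), k))
    return [1 if i in chosen else 0 for i in range(y)]

def init_population(z, y, k, seed=0):
    """Randomly generate z individuals, as in step A11."""
    rng = random.Random(seed)
    return [random_individual(y, k, rng) for _ in range(z)]

# Values taken from the running example: z = 100, y = 3000 candidates, k = 15 requested
population = init_population(z=100, y=3000, k=15)
print(len(population), sum(population[0]))
```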
A12, set the fitness function to the diversity index of the remaining samples, and calculate the fitness value of each individual in the population accordingly. The fitness function of the i-th individual $x_i$ is $\mathrm{fitness}(x_i) = H_i$, where $H_i$ is the sample diversity index of the samples that remain after the samples selected by $x_i$ are removed from the sample library as ex-warehouse samples.
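A minimal sketch of this fitness evaluation follows. It assumes a Shannon-type diversity index with natural logarithm, a library stored as a dictionary from sample id to attribute tuple, and an individual stored as a list of 0/1 flags aligned with the candidate ids; all of these representations are illustrative assumptions.

```python
import math
from collections import Counter

def diversity(samples):
    """Assumed Shannon-type index summed over attributes."""
    if not samples:
        return 0.0
    total, h = len(samples), 0.0
    for j in range(len(samples[0])):
        for count in Counter(s[j] for s in samples).values():
            p = count / total
            h -= p * math.log(p)
    return h

def fitness(individual, candidate_ids, library):
    """fitness(x_i) = H_i: diversity of what remains after removing the flagged samples."""
    removed = {cid for cid, flag in zip(candidate_ids, individual) if flag == 1}
    remaining = [attrs for sid, attrs in library.items() if sid not in removed]
    return diversity(remaining)

# Hypothetical library: id -> (blood type, sex); the request is for one type O sample
library = {0: ("O", "F"), 1: ("O", "M"), 2: ("O", "M"), 3: ("A", "F"), 4: ("B", "M")}
candidate_ids = [0, 1, 2]
print(fitness([1, 0, 0], candidate_ids, library))  # remove sample 0
print(fitness([0, 1, 0], candidate_ids, library))  # remove sample 1
```

In this toy library, removing sample 1 leaves the sexes balanced, so that individual scores higher than the one that removes sample 0.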
A13, selection: a roulette-wheel strategy is used, in which the selection probability of individual $x_i$ is
$$P(x_i) = \frac{\mathrm{fitness}(x_i)}{\sum_{j=1}^{z} \mathrm{fitness}(x_j)}$$
where $\mathrm{fitness}(x_i)$ is the fitness value of individual $x_i$. The roulette wheel is spun 25 times in total (in this embodiment the roulette selection probability is set to 0.25, which multiplied by the population size of 100 gives 25). In each round a random number r is generated, and individual i is selected when $q_{i-1} \le r < q_i$, where $q_i = \sum_{j=1}^{i} P(x_j)$ is the cumulative selection probability and $q_0 = 0$; in this way 25 better individuals are selected. A further 25 individuals are chosen at random from the remaining individuals, and the 50 selected individuals serve as parents.
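The roulette step can be sketched as follows, assuming that the selection probability is each individual's fitness divided by the total fitness and that the thresholds q_i are cumulative sums of those probabilities. The function name and the small example values are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def roulette_select(fitness_values, n_spins, rng):
    """Spin the wheel n_spins times; return the indices of the selected individuals."""
    total = sum(fitness_values)
    cumulative = list(accumulate(f / total for f in fitness_values))  # q_1 .. q_z
    picks = []
    for _ in range(n_spins):
        r = rng.random()
        # bisect_right finds the first i with r < q_i, i.e. q_{i-1} <= r < q_i
        i = bisect_right(cumulative, r)
        picks.append(min(i, len(cumulative) - 1))  # guard against rounding in q_z
    return picks

rng = random.Random(0)
fitness_values = [1.0, 0.0, 3.0]          # hypothetical fitness values
picks = roulette_select(fitness_values, 1000, rng)
print(picks.count(0), picks.count(1), picks.count(2))
```

An individual may be picked more than once, and fitness values are assumed to be non-negative with a positive sum.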
A14, crossover: single-point crossover is used, with a set crossover probability. The individuals selected in step A13 are randomly paired; for each pair a random number r is generated, and the pair is crossed if r is smaller than the crossover probability. Specifically, a crossover point is chosen at random within the individual string, and the parts of the two individuals before or after that point are exchanged to produce two new individuals. Crossover stops once the number of offspring reaches 50. An example of single-point crossover is given below:
Individual A: 1001↓111 → 1001000 (new individual)
Individual B: 0011↓000 → 0011111 (new individual)
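The example above can be reproduced with a short sketch. Individuals are written as strings here only to mirror the example; the helper names and the `p_cross` parameter are assumptions for illustration.

```python
import random

def single_point_crossover(parent_a, parent_b, point):
    """Exchange the tails of two equal-length individuals after `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def crossover_pair(parent_a, parent_b, p_cross, rng):
    """Cross the pair with probability p_cross at a random point, else return it unchanged."""
    if rng.random() < p_cross:
        point = rng.randrange(1, len(parent_a))
        return single_point_crossover(parent_a, parent_b, point)
    return parent_a, parent_b

child_a, child_b = single_point_crossover("1001111", "0011000", 4)
print(child_a, child_b)  # 1001000 0011111
```

Note that crossover can change how many samples an individual selects, so an implementation that must take out a fixed number of samples would likely need a repair step afterwards.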
A15, mutation: multi-point mutation is used, with a set mutation probability. For each individual a random number r is generated, and the individual is mutated if r is less than the mutation probability.
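One possible form of this mutation is sketched below. It assumes flag-list individuals and implements each mutation point as a swap between a selected and an unselected sample, so the number of selected samples stays constant; the swap design and the `n_points` parameter are assumptions, since the source does not spell out the mutation operator.

```python
import random

def mutate(individual, mutation_rate, n_points, rng):
    """With probability mutation_rate, swap n_points selected flags with unselected ones."""
    child = list(individual)
    if rng.random() >= mutation_rate:
        return child                      # no mutation this time
    ones = [i for i, flag in enumerate(child) if flag == 1]
    zeros = [i for i, flag in enumerate(child) if flag == 0]
    n = min(n_points, len(ones), len(zeros))
    for i, j in zip(rng.sample(ones, n), rng.sample(zeros, n)):
        child[i], child[j] = 0, 1         # deselect one sample, select another
    return child

rng = random.Random(1)
parent = [1, 1, 0, 0, 0, 0]
child = mutate(parent, mutation_rate=1.0, n_points=1, rng=rng)
print(parent, child)
```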
A16, return the new population generated by selection, crossover, and mutation to step A12 and calculate the fitness values of its individuals, then judge whether the preset termination criterion is met. If it is, output the individual with the largest fitness value as the best individual, determine the ex-warehouse samples from the y ex-warehouse identifiers of the best individual, and end the iteration; otherwise, return to step A12 and continue iterating.
In this embodiment, the preset termination criterion is whether the current iteration number is a preset maximum iteration number.
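Putting steps A11 to A16 together, a compact loop might look like the sketch below. It is an illustration under several stated assumptions rather than the patented procedure: a Shannon-type diversity index, roulette selection via weighted sampling, single-point crossover, a single-flag mutation, a repair step that forces exactly k selected samples, and elitism that carries the best individual forward. The parameter defaults and the toy library are hypothetical.

```python
import math
import random
from collections import Counter

def diversity(samples):
    """Assumed Shannon-type index summed over attributes."""
    total, h = len(samples), 0.0
    for j in range(len(samples[0])):
        for c in Counter(s[j] for s in samples).values():
            h -= (c / total) * math.log(c / total)
    return h

def fitness(ind, candidates, library):
    removed = {c for c, flag in zip(candidates, ind) if flag}
    return diversity([s for i, s in enumerate(library) if i not in removed])

def repair(ind, k, rng):
    """Force exactly k ones so every individual removes k samples (assumption)."""
    ones = [i for i, f in enumerate(ind) if f]
    zeros = [i for i, f in enumerate(ind) if not f]
    for i in rng.sample(ones, max(0, len(ones) - k)):
        ind[i] = 0
    for i in rng.sample(zeros, max(0, k - len(ones))):
        ind[i] = 1
    return ind

def run_ga(library, candidates, k, z=30, max_iter=40, p_cross=0.8, p_mut=0.2, seed=0):
    rng = random.Random(seed)
    y = len(candidates)
    pop = [repair([0] * y, k, rng) for _ in range(z)]            # A11
    best = max(pop, key=lambda ind: fitness(ind, candidates, library))
    for _ in range(max_iter):                                    # A16: fixed iteration count
        scores = [fitness(ind, candidates, library) for ind in pop]   # A12
        parents = rng.choices(pop, weights=scores, k=z)          # A13: roulette (needs positive scores)
        nxt = [list(best)]                                       # elitism (assumption)
        while len(nxt) < z:
            a, b = (list(p) for p in rng.sample(parents, 2))
            if rng.random() < p_cross:                           # A14: single-point crossover
                cut = rng.randrange(1, y)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                if rng.random() < p_mut:                         # A15: flip one flag
                    i = rng.randrange(y)
                    child[i] = 1 - child[i]
                nxt.append(repair(child, k, rng))
        pop = nxt[:z]
        best = max(pop, key=lambda ind: fitness(ind, candidates, library))
    removed = [c for c, flag in zip(candidates, best) if flag]
    return removed, fitness(best, candidates, library)

# Hypothetical library of 120 samples: (blood type, sex, age band)
library = [("ABO"[i % 3], "FM"[i % 2], i % 5) for i in range(120)]
candidates = [i for i, s in enumerate(library) if s[0] == "O"]
removed, score = run_ga(library, candidates, k=5)
print(removed, score)
```

Because the best individual is always carried into the next generation, the best fitness in this sketch never decreases from one iteration to the next.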
In this embodiment, the ex-warehouse success rate can be used to evaluate the method. When the samples in the biological sample library cannot satisfy an ex-warehouse request, the request is counted as rejected; otherwise it is counted as successful. Over many ex-warehouse request experiments, the success rate is the number of successful requests divided by the total number of requests. The higher the success rate relative to random ex-warehouse selection, the better the method.
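The success-rate metric can be illustrated with a small sketch. It assumes a request is a triple (attribute index, required value, number of samples) and, for simplicity, checks every request against the same unchanged library instead of removing samples after each request; the function names and data are hypothetical.

```python
def can_fulfil(library, attr_index, value, count):
    """True when at least `count` samples have `value` for the given attribute."""
    matches = sum(1 for sample in library if sample[attr_index] == value)
    return matches >= count

def success_rate(library, requests):
    """Number of successful requests divided by the total number of requests."""
    outcomes = [can_fulfil(library, *request) for request in requests]
    return sum(outcomes) / len(outcomes)

library = [("O", "F"), ("O", "M"), ("A", "F")]
requests = [(0, "O", 2), (0, "A", 2), (1, "F", 1), (0, "B", 1)]
rate = success_rate(library, requests)
print(rate)  # 0.5
```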
In this embodiment, the following specific examples are combined to prove that the higher the sample diversity is, the higher the ex-warehouse success rate is:
Let the total number of samples in the sample library be S. The samples have M attributes, denoted $A_1$ to $A_M$. If each attribute has N categories, the N categories of attribute $A_1$ are denoted $a_{11}$ to $a_{1N}$, and so on. Let the number of samples in category $a_{11}$ be $n_{11}$. Then
$$\sum_{j=1}^{N} n_{ij} = S$$
where i ranges from 1 to M. Assume that an ex-warehouse request selects each category of attribute $A_1$ with equal probability and that the number of ex-warehouse samples is k. In practice k generally follows a Poisson distribution:
$$P(X = k) = \frac{\lambda^{k}}{k!} e^{-\lambda}$$
where k is a value that X can take and λ is the average rate at which the random event occurs per unit time (or unit area). Fig. 4 is a schematic diagram of the probability density curve of the ex-warehouse number k.
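For reference, the Poisson probability of requesting exactly k samples can be evaluated numerically as sketched below. The log-space formulation is a design choice for this example to avoid overflow in k! for large k; λ = 100 is the typical value mentioned in the text.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = lam**k * exp(-lam) / k!, evaluated in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

lam = 100
# The probabilities over a wide enough range of k should sum to about 1
total = sum(poisson_pmf(k, lam) for k in range(400))
print(total, poisson_pmf(100, lam))
```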
First, consider the relationship between the ex-warehouse number k and the ex-warehouse success rate in two extreme cases:
first, all samples have the same category $a_{11}$ for attribute $A_1$; the diversity of attribute $A_1$ is then the lowest, and its diversity index is 0;
second, the categories of attribute $A_1$ are as evenly distributed as possible, that is, each category contains S/N samples; the diversity of attribute $A_1$, and hence its diversity index, is then the highest.
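The two extreme values can be checked with a short worked calculation. It assumes a Shannon-type index for a single attribute, $H_{A_1} = -\sum_{i=1}^{N} P_i \ln P_i$ with the convention $0 \ln 0 = 0$; the natural logarithm is an assumption made for this illustration.

```latex
% Case 1: every sample falls in category a_{11}, so P_1 = 1 and P_i = 0 for i > 1
H_{A_1} = -\,1 \cdot \ln 1 = 0

% Case 2: each category holds S/N samples, so P_i = \frac{S/N}{S} = \frac{1}{N}
H_{A_1} = -\sum_{i=1}^{N} \frac{1}{N} \ln \frac{1}{N} = \ln N
```

Under this assumption the index for one attribute ranges from 0 to $\ln N$, with the upper bound reached only by the uniform distribution.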
FIG. 5 shows the relationship between the ex-warehouse number k and the ex-warehouse success rate in the two extreme cases. From the probability distribution of the ex-warehouse number and this relationship, the ex-warehouse success rates $P_1$ and $P_2$ of the two cases can be calculated.
In the first case,
Figure BDA0001575537670000082
where p is as shown in fig. 5: for k in the interval from 0 to S/N, p is 1; for k in the interval from S/N to S, p is 0.
In the second case,
Figure BDA0001575537670000083
as shown in fig. 5, when k is in the interval from 0 to S, p is 1/N.
Wherein k obeys a Poisson distribution,
Figure BDA0001575537670000084
λ is typically 100, while S is much larger than λ,
Figure BDA0001575537670000085
is also much larger than λ,
Figure BDA0001575537670000086
It can be seen from the ex-warehouse success probabilities in the two extreme cases that the success rate when sample diversity is highest is much greater than the success rate when sample diversity is lowest.
Considering the general case below, the ex-warehouse success rate P can be expressed as:
Figure BDA0001575537670000087
It can be clearly seen that the more evenly the sample types are distributed and the higher the sample diversity index, the higher the ex-warehouse success rate.
In this example, the following specific examples are combined to demonstrate that applying a genetic algorithm can improve the sample diversity index of a sample library:
(1) Assume a total of 1000 samples, each with 16 attributes ($A_1$ to $A_{16}$); each attribute has 32 different values ($V_1$ to $V_{32}$), and all attributes of the samples are uniformly distributed.
(2) Randomly generate an ex-warehouse request, assuming that 15 samples whose attribute $A_1$ has the value $V_1$ are taken out, and calculate the diversity index of the samples remaining in the library. In simulation experiments, the average level over 100 random extractions is a diversity index of 1202.5, and the minimum reduction in the diversity index is 1195.2.
(3) Apply the genetic algorithm with 50 individuals in each generation: the best 20% of the parents are retained, other individuals are retained with a probability of 25%, the remaining places are filled by randomly crossing parents, and the mutation rate is 20%. Each mutation modifies one ex-warehouse sample. After 35 iterations the result is 1188.7.
As can be seen from fig. 6, applying the genetic algorithm effectively lessens the reduction in the diversity index, so that the diversity of the remaining samples is as high as possible. A good result is obtained within 12 iterations, and the near-optimal solution of the experiment is reached at 23 iterations.
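The random-extraction baseline of step (2) could be simulated along the following lines. This is a sketch only: it assumes a Shannon-type index with natural logarithm, builds its own synthetic library, and therefore is not expected to reproduce the figures reported above, whose scale depends on details the text does not give.

```python
import math
import random
from collections import Counter

def diversity(samples):
    """Assumed Shannon-type index summed over attributes."""
    total, h = len(samples), 0.0
    for j in range(len(samples[0])):
        for c in Counter(s[j] for s in samples).values():
            h -= (c / total) * math.log(c / total)
    return h

rng = random.Random(0)
# 1000 samples, 16 attributes, 32 values each. Attribute 0 cycles through the
# values so that value 0 (standing in for V_1) is guaranteed to have candidates.
library = [(i % 32,) + tuple(rng.randrange(32) for _ in range(15)) for i in range(1000)]
candidates = [i for i, s in enumerate(library) if s[0] == 0]
base = diversity(library)

reductions = []
for _ in range(100):                                  # 100 random extractions of 15 samples
    removed = set(rng.sample(candidates, 15))
    remaining = [s for i, s in enumerate(library) if i not in removed]
    reductions.append(base - diversity(remaining))

mean_reduction = sum(reductions) / len(reductions)
print(mean_reduction, min(reductions))
```

The smallest reduction found by random extraction gives a reference point against which a genetic-algorithm result on the same library can be compared.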
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (7)

1. An ex-warehouse method for improving sample diversity of a sample library, characterized by comprising the following steps:
determining an index for measuring the sample diversity in the sample library;
preliminarily selecting a sample meeting the ex-warehouse condition from a predetermined sample library;
determining ex-warehouse samples which enable the diversity of the residual samples in the sample library to be the highest according to the determined sample diversity index in the sample library in the samples which accord with the ex-warehouse conditions;
wherein, among the samples meeting the ex-warehouse conditions, according to the determined sample diversity index in the sample library, determining the ex-warehouse sample which enables the diversity of the residual samples in the sample library to be the highest comprises the following steps:
determining ex-warehouse samples which enable the diversity of residual samples in the sample library to be highest by utilizing a genetic algorithm according to the determined sample diversity index in the sample library in samples meeting the ex-warehouse conditions;
wherein the determining the ex-warehouse sample with the highest diversity of the residual samples in the sample warehouse by using the genetic algorithm comprises the following steps:
step 1, initializing a population;
step 2, setting a fitness function as a sample diversity index of residual samples in the sample library, and calculating the fitness value of individuals in the population according to the set fitness function, wherein the residual samples are all samples in the sample library except selected ex-warehouse samples;
step 3, performing selection, crossover, and mutation according to the calculated fitness values of the individuals, returning the new population generated by selection, crossover, and mutation to step 2, calculating the fitness values of the individuals in the new population, and judging whether a preset termination criterion is met; if the criterion is met, outputting the individual with the largest fitness value as the best individual, determining the ex-warehouse samples according to the best individual, and ending the iteration; otherwise, returning to step 2 to continue the iteration.
2. The method of claim 1, wherein the index for measuring the sample diversity in the sample library is determined as follows:
$$H = \sum_{j=1}^{m} H_j, \qquad H_j = -\sum_{i=1}^{n} P_i \ln P_i$$
wherein H represents the sample diversity index of the sample library, $H_j$ represents the sample diversity of the j-th attribute, m represents the total number of attributes of the samples in the sample library, n represents the number of sample types within a single attribute, and $P_i$ represents the proportion of individuals in the sample library that belong to the i-th type.
3. The ex-warehouse method for improving the sample diversity of the sample library according to claim 2, wherein $P_i = n_i/N$, wherein $n_i$ represents the number of individuals of the i-th type, and N represents the total number of individuals.
4. The method for improving the diversity of the samples in the sample library according to claim 1, wherein the preliminary selection of the samples meeting the ex-warehouse condition from the predetermined sample library comprises:
inputting an ex-warehouse request; and
preliminarily selecting, according to the input ex-warehouse request, samples meeting the ex-warehouse conditions from the predetermined sample library.
5. The ex-warehouse method for improving the sample diversity of the sample library according to claim 1, wherein initializing the population comprises:
randomly generating an initial population, wherein each individual in the population consists of y numbers, each number represents the serial number of a sample meeting the ex-warehouse conditions, each number corresponds to an ex-warehouse identification that identifies whether the corresponding sample is selected for ex-warehouse, and y represents the number of samples meeting the ex-warehouse conditions.
6. The method of claim 5, wherein the fitness function of the i-th individual x_i in the population is f(x_i) = H_i, where H_i indicates the sample diversity index of the samples that remain after the samples selected by x_i are removed from all samples in the sample library as ex-warehouse samples.
7. The ex-warehouse method for improving the sample diversity of the sample library according to claim 1, wherein judging whether the preset termination criterion is met, outputting the individual with the largest fitness value as the best individual if it is met, determining the ex-warehouse samples according to the best individual, and ending the iteration comprises:
judging whether the current number of iterations has reached the preset maximum number of iterations, and if so, outputting the individual with the largest fitness value as the best individual, determining the ex-warehouse samples according to the y ex-warehouse identifications of the best individual, and ending the iteration.
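
The procedure recited in claims 1 to 7 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several points are assumptions made only for illustration: the diversity formula appears in the source only as a figure reference, so the index is assumed here to be the sum over attributes of the Shannon entropy -sum(P_i * ln P_i); the number k of samples to remove, the tournament selection, the one-point crossover, the bit-flip mutation with repair, the retention of the best individual seen so far, and all function and parameter names (diversity_index, fitness, repair, select_outbound, pop_size, p_mut, seed) are hypothetical and do not come from the claims.

```python
import math
import random
from collections import Counter


def diversity_index(samples):
    """Assumed index: H = sum over attributes j of H_j = -sum_i P_i * ln(P_i)."""
    if not samples:
        return 0.0
    total = len(samples)
    h = 0.0
    for j in range(len(samples[0])):
        counts = Counter(sample[j] for sample in samples)
        h -= sum((c / total) * math.log(c / total) for c in counts.values())
    return h


def fitness(individual, library, eligible):
    """Fitness = diversity of the samples left after removing the flagged ones."""
    removed = {eligible[pos] for pos, flag in enumerate(individual) if flag}
    remaining = [s for idx, s in enumerate(library) if idx not in removed]
    return diversity_index(remaining)


def repair(flags, k, rng):
    """Force exactly k flags to 1 (assumed constraint, not stated in the claims)."""
    ones = [i for i, f in enumerate(flags) if f]
    zeros = [i for i, f in enumerate(flags) if not f]
    rng.shuffle(ones)
    rng.shuffle(zeros)
    while len(ones) > k:
        flags[ones.pop()] = 0
    while len(ones) < k:
        i = zeros.pop()
        flags[i] = 1
        ones.append(i)
    return flags


def select_outbound(library, eligible, k, pop_size=30, max_iter=50, p_mut=0.2, seed=0):
    """Return (library indices to take out, diversity of what remains)."""
    rng = random.Random(seed)
    y = len(eligible)  # number of samples meeting the ex-warehouse conditions

    def score(ind):
        return fitness(ind, library, eligible)

    # Step 1: random initial population; each individual holds y 0/1 flags.
    population = [repair([0] * y, k, rng) for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(max_iter):  # termination: preset maximum iteration count
        # Step 2: evaluate the fitness of every individual.
        scored = [(score(ind), ind) for ind in population]

        def pick():
            # Assumed selection operator: binary tournament.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        # Step 3: selection, one-point crossover, bit-flip mutation.
        children = []
        while len(children) < pop_size:
            cut = rng.randrange(y + 1)
            child = pick()[:cut] + pick()[cut:]
            if rng.random() < p_mut:
                child[rng.randrange(y)] ^= 1
            children.append(repair(child, k, rng))
        population = children
        candidate = max(population, key=score)
        if score(candidate) > score(best):  # assumed: keep best seen so far
            best = candidate
    chosen = [eligible[pos] for pos, flag in enumerate(best) if flag]
    return chosen, score(best)
```

In this sketch, library is a list of attribute tuples, eligible lists the library indices that meet the ex-warehouse conditions, and the sketch assumes 0 <= k <= len(eligible), at least one eligible sample, and pop_size of at least 2. A call such as select_outbound(library, eligible, k=2) returns the indices flagged for removal together with the diversity index of the remaining samples.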
CN201810133242.6A 2018-02-09 2018-02-09 Ex-warehouse method for improving sample diversity of sample library Expired - Fee Related CN108280327B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810133242.6A CN108280327B (en) 2018-02-09 2018-02-09 Ex-warehouse method for improving sample diversity of sample library

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810133242.6A CN108280327B (en) 2018-02-09 2018-02-09 Ex-warehouse method for improving sample diversity of sample library

Publications (2)

Publication Number Publication Date
CN108280327A (en) 2018-07-13
CN108280327B (en) 2020-12-29

Family

ID=62808239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810133242.6A Expired - Fee Related CN108280327B (en) 2018-02-09 2018-02-09 Ex-warehouse method for improving sample diversity of sample library

Country Status (1)

Country Link
CN (1) CN108280327B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102041303A (en) * 2009-10-10 2011-05-04 浙江万里学院 Method for analyzing genetic diversity of shellfish by using fAFLP (Fluorescent Amplified Fragment Length Polymorphism) labeling technology
CN103305506A (en) * 2011-03-03 2013-09-18 华中农业大学 SNP (single nucleotide polymorphism) molecular mark related to domesticating property of mandarin fish
WO2013051701A1 (en) * 2011-10-07 2013-04-11 Hayashi Naoki Optimal solution search method and optimal solution search device
CN102622649A (en) * 2012-03-07 2012-08-01 南京邮电大学 Comentropy-based improved evolutionary multi-objective optimization method
CN103642913A (en) * 2013-12-02 2014-03-19 中国农业科学院麻类研究所 Method for constructing boehmeria nivea core idioplasm by using EST-SSR (Expressed Sequence Tag-Simple Sequence Repeat) molecular marker
CN104450917A (en) * 2014-12-12 2015-03-25 江西省农业科学院农业应用微生物研究所 Method for constructing molecular map of bactrocera dorsalis intestinal bacteria colony
CN104946740A (en) * 2015-05-07 2015-09-30 辽宁省海洋水产科学研究院 Detection method for biological community structures of ocean brown tide

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Diversity study of multi-objective genetic algorithm based on Shannon entropy";E. J. Solteiro Pires等;《2014 Sixth World Congress on Nature and Biologically Inspired Computing (NaBIC 2014)》;20141016;第17-22页 *
"结合模糊熵和遗传算法的双阈值图像分割";郑毅等;《应用科学学报》;20140731;第32卷(第4期);第427-433页 *

Also Published As

Publication number Publication date
CN108280327A (en) 2018-07-13

Similar Documents

Publication Publication Date Title
US20020179097A1 (en) Method for providing clinical diagnostic services
US20030088320A1 (en) Unsupervised machine learning-based mathematical model selection
WO2002044715A1 (en) Methods for efficiently mining broad data sets for biological markers
WO2005024562A2 (en) System and method for pattern recognition in sequential data
CN114927162A (en) Multi-set correlation phenotype prediction method based on hypergraph representation and Dirichlet distribution
CN107918725B (en) DNA methylation prediction method for selecting optimal features based on machine learning
US20200227134A1 (en) Drug Efficacy Prediction for Treatment of Genetic Disease
CN103164631B (en) A kind of intelligent coordinate expression gene analyser
US7587280B2 (en) Genomic data mining using clustering logic and filtering criteria
CN116959585B (en) Deep learning-based whole genome prediction method
US20230073973A1 (en) Deep learning based system and method for prediction of alternative polyadenylation site
CN113160886A (en) Cell type prediction system based on single cell Hi-C data
De Sousa et al. An immune-evolutionary algorithm for multiple rearrangements of gene expression data
CN108280327B (en) Ex-warehouse method for improving sample diversity of sample library
Lorena et al. Evaluation of noise reduction techniques in the splice junction recognition problem
Ram et al. Causal modeling of gene regulatory network
JP2004355174A (en) Data analysis method and system
KR20230043071A (en) Variant Pathogenicity Scoring and Classification and Use Thereof
Ramachandran et al. Deep learning for better variant calling for cancer diagnosis and treatment
Mandal et al. An Approach towards Automated Disease Diagnosis & Drug Design Using Hybrid Rough-Decision Tree from Microarray Dataset
CN114512188B (en) DNA binding protein recognition method based on improved protein sequence position specificity matrix
Darvish et al. Discovering dynamic regulatory pathway by applying an auto regressive model to time series DNA microarray data
Tchakounte-Wakem A Comparison of Methods Taking into Account Asymmetry when Evaluating Differential Expression in Gene Expression Experiments
Dutta et al. Significance analysis of time‐series transcriptomic data: A methodology that enables the identification and further exploration of the differentially expressed genes at each time‐point
CN117877573A (en) Construction method of polygene genetic risk assessment model by utilizing isooctane model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20201229

Termination date: 20220209