CN115050437B - Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain - Google Patents
- Publication number
- CN115050437B (application number CN202210510098.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- biological
- bird nest
- bird
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H15/00—ICT specially adapted for medical reports, e.g. generation or transmission thereof
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Public Health (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Pathology (AREA)
- Biomedical Technology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The system comprises a plurality of data terminals, a block chain storage module and a biological data analysis platform. The data terminals are used for uploading biological data and displaying users' health reports; the block chain storage module is used for storing the biological data uploaded by each data terminal; the biological data analysis platform is used for establishing a disease prediction model based on a support vector machine from biological big data and for generating a user's health report from the biological data the user uploads. The beneficial effects of the invention are as follows: establishing a disease prediction model based on a support vector machine helps people understand the correlation between biological data and diseases and hence the pathogenesis of diseases, and plays an important role in the prevention, diagnosis, monitoring, prognosis and treatment of diseases.
Description
Technical Field
The invention relates to the field of biological big data, in particular to a biological big data analysis and disease accurate identification classification prediction system based on an algorithm and a block chain.
Background
With the rapid development of high-throughput biotechnology, the biomedical field has generated large amounts of biological data of many types. This data includes information that doctors and researchers use to understand what diseases patients suffer from and to determine which treatment regimens should be used in clinical management, so it is crucial to understanding human biology and the diseases we encounter. Advances in biotechnology now allow this information to be extracted readily, so large amounts of digital data are captured in machine-readable form. This "datafication" of biology produces different types of biological data that reflect the molecular events occurring in disease. When building a personalized disease treatment framework, biological big data must be analyzed in a meaningful and actionable way in order to capture the disease information it contains.
The support vector machine is a data mining technique that solves machine learning problems by means of optimization. It has developed considerably in recent years and has become an important method for addressing overfitting ("over-learning") and the curse of dimensionality. Applied to disease prediction, the support vector machine can effectively study the correlation between biological big data and diseases, thereby enabling effective prediction of diseases. The support vector machine does, however, face a parameter selection problem: the choice of parameters directly affects its prediction accuracy and generalization ability. In recent years many scholars have improved parameter optimization methods for the support vector machine. Optimizing its parameters with the cuckoo algorithm performs better than other methods, but the approach still has shortcomings, such as the low optimization accuracy and slow convergence of the cuckoo algorithm.
Disclosure of Invention
In view of the above problems, the present invention aims to provide a biological big data analysis and disease accurate identification classification prediction system based on an algorithm and a block chain.
The purpose of the invention is realized by the following technical scheme:
The biological big data analysis and disease accurate identification classification prediction system based on an algorithm and a block chain comprises a plurality of data terminals, a block chain storage module and a biological data analysis platform;
Data terminal: comprises a user login unit, a data uploading unit and a health report display unit. The user logs in at the data terminal with a login password through the user login unit; after logging in, the user uploads the user's identity information and biological data to the block chain storage module and the biological data analysis platform through the data uploading unit; the health report display unit is used for displaying the user's health report received by the data terminal;
Block chain storage module: used for storing the biological data uploaded by each data terminal;
Biological data analysis platform: comprises a biological database, a data preprocessing unit, a data analysis unit and a health report generation unit. The biological database stores biological big data carrying disease labels and, at a given update period, retrieves and stores the biological big data in the block chain storage module that has not yet been retrieved, thereby updating the biological big data in the biological database; the updated biological big data is input to the data preprocessing unit. Each time it receives new biological big data, the data preprocessing unit normalizes the received data and clusters the normalized data with a semi-supervised clustering algorithm, thereby labeling the biological data that carries no disease label; each cluster obtained is input to the data analysis unit as a sample subset. Each time it receives new sample subsets, the data analysis unit retrains and retests the support vector machine on the new sample subsets and their corresponding disease labels, thereby establishing a disease prediction model from the biological data. The health report generation unit inputs the received biological data of a user into the trained disease prediction model, generates the user's health report from the disease prediction result, and sends the health report to the data terminal at which the user is logged in for display.
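The preprocessing step described above can be sketched as follows. The text names neither a normalization scheme nor a particular semi-supervised clustering algorithm, so this sketch assumes min-max scaling and a seeded nearest-centroid assignment in which centroids start from the labeled samples; the function names `min_max_normalize`, `seeded_cluster_labels` and `sample_subsets` are hypothetical.

```python
from math import dist


def min_max_normalize(rows):
    """Scale every feature column to [0, 1] (assumed normalization scheme)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # A constant column has zero span; divide by 1.0 instead to avoid 0/0.
    spans = [(max(c) - lo) or 1.0 for c, lo in zip(cols, lows)]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def _centroid(points):
    return [sum(vals) / len(vals) for vals in zip(*points)]


def seeded_cluster_labels(rows, labels, iterations=10):
    """Assign a disease label to every unlabeled row (label None).

    Centroids start from the labeled rows; unlabeled rows join the nearest
    centroid, and centroids are recomputed until assignments stop changing.
    Known labels are never overwritten.
    """
    classes = sorted({lab for lab in labels if lab is not None})
    centroids = {
        c: _centroid([r for r, lab in zip(rows, labels) if lab == c])
        for c in classes
    }
    assigned = list(labels)
    for _ in range(iterations):
        updated = [
            lab if lab is not None
            else min(classes, key=lambda c: dist(row, centroids[c]))
            for row, lab in zip(rows, labels)
        ]
        if updated == assigned:
            break
        assigned = updated
        centroids = {
            c: _centroid([r for r, lab in zip(rows, assigned) if lab == c])
            for c in classes
        }
    return assigned


def sample_subsets(rows, labels):
    """Group rows by label so each cluster can be passed on as one sample subset."""
    subsets = {}
    for row, lab in zip(rows, labels):
        subsets.setdefault(lab, []).append(row)
    return subsets
```

Each value returned by `sample_subsets` corresponds to one sample subset handed to the data analysis unit; a production system would likely substitute a more robust semi-supervised method.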
Preferably, the health report of the user comprises the identity information of the user and the disease label of the user.
Preferably, the biological database comprises a labeled biological database and an unlabeled biological database. The labeled biological database is used for storing biological big data carrying disease labels. At each given update period, the unlabeled biological database retrieves and stores the biological big data in the block chain storage module that has not yet been retrieved, thereby updating the biological big data in the unlabeled biological database. The biological big data in the labeled biological database and the updated biological big data in the unlabeled biological database are input to the data preprocessing unit.
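The periodic update described above can be illustrated with a small sketch. The text does not describe the block structure or the retrieval mechanism, so the SHA-256 hash chain, the cursor-based retrieval and the names `BlockStore`, `UnlabeledDatabase` and `pull_new` are all assumptions made for illustration only.

```python
import hashlib
import json


class BlockStore:
    """Append-only, hash-chained record store (a stand-in for the block chain module)."""

    def __init__(self):
        self.blocks = []

    def append(self, record):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        self.blocks.append({"prev": prev_hash, "payload": payload, "hash": digest})

    def verify(self):
        """Recompute every hash; an edited payload breaks the chain."""
        prev_hash = "0" * 64
        for block in self.blocks:
            expected = hashlib.sha256(
                (prev_hash + block["payload"]).encode("utf-8")
            ).hexdigest()
            if block["prev"] != prev_hash or block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True


class UnlabeledDatabase:
    """Keeps a cursor so each update cycle fetches only records not yet retrieved."""

    def __init__(self, store):
        self.store = store
        self.cursor = 0
        self.records = []

    def pull_new(self):
        """Run once per update period; returns the newly retrieved records."""
        new_blocks = self.store.blocks[self.cursor:]
        fresh = [json.loads(block["payload"]) for block in new_blocks]
        self.records.extend(fresh)
        self.cursor = len(self.store.blocks)
        return fresh
```

Calling `pull_new()` twice in a row returns an empty list the second time, which mirrors the requirement that only data not yet retrieved is fetched in each period.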
Preferably, the penalty factor and the kernel function parameter of the support vector machine of the data analysis unit are optimized by using a cuckoo algorithm.
Preferably, in the cuckoo algorithm, let $x_i(t)$ denote the position of the $i$-th bird nest in the population retained after the $t$-th update using the Lévy flight mode, let $X_i(t)$ denote the position of the $i$-th bird nest in the population retained after the $t$-th iterative update, and let $p_a$ denote the discovery probability. For the bird nest position $x_i(t)$, a random number $rand$ between 0 and 1 is generated; when $rand \le p_a$, $X_i(t) = x_i(t)$; when $rand > p_a$, the value of $X_i(t)$ is determined in the following manner:
Let $x_j(t)$ denote the position of the $j$-th bird nest in the population retained after the $t$-th update using the Lévy flight mode. When $x_j(t)$ satisfies $f(x_j(t)) < f(x_i(t))$, the bird nest position $x_j(t)$ is added to the set $M_i(t)$, where $M_i(t)$ denotes the set of bird nest positions in the population that are better than $x_i(t)$, and $f(x_j(t))$ and $f(x_i(t))$ denote the fitness function values of $x_j(t)$ and $x_i(t)$ respectively. The bird nest positions in $M_i(t)$ are sorted by their Euclidean distance from $x_i(t)$, from near to far, to form the sequence $Q_i(t) = \{x_{i,l}(t),\ l = 1, 2, \ldots, n_i(t)\}$, where $x_{i,l}(t)$ denotes the $l$-th bird nest position in $Q_i(t)$ and $n_i(t)$ denotes the number of bird nest positions in $Q_i(t)$. Define $H_i(t)$ as the spatial detection coefficient of the bird nest position $x_i(t)$; the expression of $H_i(t)$ is:
[equation missing in source]
where $R_{i,l}(t)$ denotes the spatial radius of the bird nest position $x_{i,l}(t)$ centered on $x_i(t)$, with $R_{i,l}(t) = |x_{i,l}(t) - x_i(t)|$; $x_{i,n}(t)$ denotes the $n$-th bird nest position in $Q_i(t)$ and $R_{i,n}(t)$ denotes its spatial radius centered on $x_i(t)$, with $R_{i,n}(t) = |x_{i,n}(t) - x_i(t)|$; [symbol missing in source] denotes the mean of the spatial radii, centered on $x_i(t)$, of the first $k$ bird nest positions in $Q_i(t)$ [defining equation missing in source]; $k$ is a given positive integer satisfying $k < n_i(t)$; $\alpha$ and $\beta$ are weight coefficients satisfying $\alpha, \beta \in (0, 1)$ and $\alpha + \beta = 1$;
Let $J_i(t)$ denote the set of better bird nest positions in the population that participate in the random change of the bird nest position $x_i(t)$; the parameter $k_i(t)$ is used to determine the bird nest positions in $J_i(t)$, specifically:
(1) The value of the parameter $k_i(t)$ is determined from the spatial detection coefficient $H_i(t)$ of the bird nest position $x_i(t)$:
[equation missing in source]
where $k_i(t)$ denotes the local range control parameter used when the bird nest position $x_i(t)$ is randomly changed; [symbol missing in source] denotes the median of the spatial detection coefficients of the bird nest positions retained after the population is updated in the Lévy flight mode for the $t$-th time [defining equation missing in source], where $\mathrm{median}$ denotes the median function, $\lfloor \cdot \rfloor$ denotes rounding down, and $N$ denotes the number of bird nests in the population;
(2) The first $k_i(t)$ bird nest positions in the sequence $Q_i(t)$ are the better bird nest positions in the population that participate in the random change of $x_i(t)$; that is, the first $k_i(t)$ bird nest positions in $Q_i(t)$ are added to the set $J_i(t)$;
The bird nest position $x_i(t)$ is randomly changed in the following manner:
[equation missing in source]
where $\chi_i(t)$ denotes the new bird nest position obtained by randomly changing the bird nest position $x_i(t)$, $rand_1$ is a randomly generated number between 0 and 1, and [two symbols missing in source] are bird nest positions randomly selected from the set $J_i(t)$ [condition missing in source].
Let $f(\chi_i(t))$ denote the fitness function value of the bird nest position $\chi_i(t)$; when $f(\chi_i(t)) \ge f(x_i(t))$, $X_i(t) = x_i(t)$; when $f(\chi_i(t)) < f(x_i(t))$, $X_i(t) = \chi_i(t)$.
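The selection and acceptance rules above can be sketched in code. The formulas for the spatial detection coefficient, for $k_i(t)$ and for the random change itself are not recoverable from this text, so the sketch takes `k_i` as an input and assumes the conventional cuckoo-search step $\chi = x + rand_1 \cdot (x_a - x_b)$ in place of the patent's own formula; the function names are hypothetical.

```python
import random
from math import dist


def better_neighbours(i, nests, fitness):
    """Q_i(t): nests with lower fitness than nest i, nearest first."""
    f_i = fitness(nests[i])
    better = [x for j, x in enumerate(nests) if j != i and fitness(x) < f_i]
    return sorted(better, key=lambda x: dist(x, nests[i]))


def random_change(i, nests, fitness, k_i, p_a=0.25, rng=random):
    """Return X_i(t) for nest i after the discovery / random-change step."""
    x_i = nests[i]
    if rng.random() <= p_a:          # rand <= p_a: keep the Levy-flight result
        return x_i
    q_i = better_neighbours(i, nests, fitness)
    j_i = q_i[:k_i]                  # J_i(t): first k_i(t) entries of Q_i(t)
    if len(j_i) < 2:                 # too few candidates to form a difference
        return x_i
    x_a, x_b = rng.sample(j_i, 2)
    r = rng.random()
    # Assumed step (conventional cuckoo-search form), not the patent's formula.
    candidate = [xi + r * (a - b) for xi, a, b in zip(x_i, x_a, x_b)]
    # Greedy acceptance: keep the candidate only if its fitness is strictly lower.
    return candidate if fitness(candidate) < fitness(x_i) else x_i
```

Because acceptance is greedy, the fitness of a nest never increases in this step, and the best nest in the population (whose $M_i(t)$ is empty) is left unchanged.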
The beneficial effects created by the invention are as follows: the disease prediction model is established based on the support vector machine, so that accurate identification and prediction of diseases are realized, people can know the correlation between biological data and the diseases, the pathogenesis of the diseases is known, and the disease prediction model plays an important role in prevention, diagnosis, monitoring, prognosis and treatment of the diseases; the parameters of the support vector machine are optimized through an improved cuckoo algorithm, blindness of manual parameter selection is avoided, and prediction accuracy of the support vector machine is improved.
Drawings
The invention is further described with the aid of the accompanying drawings, in which, however, the embodiments do not constitute any limitation to the invention, and for a person skilled in the art, without inventive effort, further drawings may be derived from the following figures.
FIG. 1 is a schematic diagram of the present invention.
Detailed Description
The invention is further described with reference to the following examples.
Referring to FIG. 1, the biological big data analysis and disease accurate identification classification prediction system based on an algorithm and a block chain of this embodiment comprises a plurality of data terminals, a block chain storage module and a biological data analysis platform;
Data terminal: comprises a user login unit, a data uploading unit and a health report display unit. The user logs in at the data terminal with a login password through the user login unit; after logging in, the user uploads the user's identity information and biological data to the block chain storage module and the biological data analysis platform through the data uploading unit; the health report display unit is used for displaying the user's health report received by the data terminal;
Block chain storage module: used for storing the biological data uploaded by each data terminal;
Biological data analysis platform: comprises a biological database, a data preprocessing unit, a data analysis unit and a health report generation unit. The biological database stores biological big data carrying disease labels and, at a given update period, retrieves and stores the biological big data in the block chain storage module that has not yet been retrieved, thereby updating the biological big data in the biological database; the updated biological big data is input to the data preprocessing unit. Each time it receives new biological big data, the data preprocessing unit normalizes the received data and clusters the normalized data with a semi-supervised clustering algorithm, thereby labeling the biological data that carries no disease label; each class obtained by clustering is input to the data analysis unit as a sample subset. Each time it receives new sample subsets, the data analysis unit retrains and retests the support vector machine on the new sample subsets and their corresponding disease labels, thereby establishing a disease prediction model from the biological data. The health report generation unit inputs the received biological data of a user into the trained disease prediction model, generates the user's health report from the disease prediction result, and sends the health report to the data terminal at which the user is logged in for display.
Preferably, the health report of the user comprises the identity information of the user and the disease label of the user.
Preferably, the biological database comprises a labeled biological database and an unlabeled biological database. The labeled biological database is used for storing biological big data carrying disease labels. At each given update period, the unlabeled biological database retrieves and stores the biological big data in the block chain storage module that has not yet been retrieved, thereby updating the biological big data in the unlabeled biological database. The biological big data in the labeled biological database and the updated biological big data in the unlabeled biological database are input to the data preprocessing unit.
The preferred embodiment establishes a disease prediction model based on a support vector machine, realizes accurate identification and prediction of diseases, and is helpful for people to know the correlation between biological data and diseases, thereby helping people to know the pathogenesis of diseases and playing an important role in prevention, diagnosis, monitoring, prognosis and treatment of diseases.
Preferably, the penalty factor and the kernel function parameter of the support vector machine of the data analysis unit are optimized by using a cuckoo algorithm. In the cuckoo algorithm, the mean square error between the output value of the support vector machine and the expected output value is used as the fitness function; the smaller the fitness function value corresponding to a bird nest position, the better that bird nest position.
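As a minimal sketch of this fitness function, a bird nest position can be treated as the pair (penalty factor, kernel function parameter) and scored by mean square error. The callable `train_and_predict` is a hypothetical placeholder for whatever routine trains the support vector machine and returns its outputs; it is not part of the source.

```python
def mean_squared_error(predicted, expected):
    """Mean of squared differences between model outputs and expected outputs."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(expected)


def make_fitness(train_and_predict, features, expected):
    """Build f(position), where position = (penalty_factor, kernel_parameter).

    `train_and_predict` is a hypothetical user-supplied callable that trains a
    support vector machine with the given parameters and returns its outputs
    for `features`. A lower return value means a better bird nest position.
    """
    def fitness(position):
        penalty_factor, kernel_parameter = position
        outputs = train_and_predict(penalty_factor, kernel_parameter, features)
        return mean_squared_error(outputs, expected)

    return fitness
```

Keeping the training routine behind a callable means the same fitness wrapper can be reused with any support vector machine library.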
Preferably, in the cuckoo algorithm, let $x_i(t)$ denote the position of the $i$-th bird nest in the population retained after the $t$-th update using the Lévy flight mode, let $X_i(t)$ denote the position of the $i$-th bird nest in the population retained after the $t$-th iterative update, and let $p_a$ denote the discovery probability. For the bird nest position $x_i(t)$, a random number $rand$ between 0 and 1 is generated; when $rand \le p_a$, $X_i(t) = x_i(t)$; when $rand > p_a$, the value of $X_i(t)$ is determined in the following manner:
Let $x_j(t)$ denote the position of the $j$-th bird nest in the population retained after the $t$-th update using the Lévy flight mode. When $x_j(t)$ satisfies $f(x_j(t)) < f(x_i(t))$, the bird nest position $x_j(t)$ is added to the set $M_i(t)$, where $M_i(t)$ denotes the set of bird nest positions in the population that are better than $x_i(t)$, and $f(x_j(t))$ and $f(x_i(t))$ denote the fitness function values of $x_j(t)$ and $x_i(t)$ respectively. The bird nest positions in $M_i(t)$ are sorted by their Euclidean distance from $x_i(t)$, from near to far, to form the sequence $Q_i(t) = \{x_{i,l}(t),\ l = 1, 2, \ldots, n_i(t)\}$, where $x_{i,l}(t)$ denotes the $l$-th bird nest position in $Q_i(t)$ and $n_i(t)$ denotes the number of bird nest positions in $Q_i(t)$. Define $H_i(t)$ as the spatial detection coefficient of the bird nest position $x_i(t)$; the expression of $H_i(t)$ is:
[equation missing in source]
where $R_{i,l}(t)$ denotes the spatial radius of the bird nest position $x_{i,l}(t)$ centered on $x_i(t)$, with $R_{i,l}(t) = |x_{i,l}(t) - x_i(t)|$; $x_{i,n}(t)$ denotes the $n$-th bird nest position in $Q_i(t)$ and $R_{i,n}(t)$ denotes its spatial radius centered on $x_i(t)$, with $R_{i,n}(t) = |x_{i,n}(t) - x_i(t)|$; [symbol missing in source] denotes the mean of the spatial radii, centered on $x_i(t)$, of the first $k$ bird nest positions in $Q_i(t)$ [defining equation missing in source]; $k$ is a given positive integer satisfying $k \le n_i(t)$, and the value of $k$ may be taken as 5; $\alpha$ and $\beta$ are weight coefficients satisfying $\alpha, \beta \in (0, 1)$ and $\alpha + \beta = 1$;
Let $J_i(t)$ denote the set of better bird nest positions in the population that participate in the random change of the bird nest position $x_i(t)$; the parameter $k_i(t)$ is used to determine the bird nest positions in $J_i(t)$, specifically:
(1) The value of the parameter $k_i(t)$ is determined from the spatial detection coefficient $H_i(t)$ of the bird nest position $x_i(t)$:
[equation missing in source]
where $k_i(t)$ denotes the local range control parameter used when the bird nest position $x_i(t)$ is randomly changed; [symbol missing in source] denotes the median of the spatial detection coefficients of the bird nest positions retained after the population is updated in the Lévy flight mode for the $t$-th time [defining equation missing in source], where $\mathrm{median}$ denotes the median function, $\lfloor \cdot \rfloor$ denotes rounding down, and $N$ denotes the number of bird nests in the population;
(2) The first $k_i(t)$ bird nest positions in the sequence $Q_i(t)$ are the better bird nest positions in the population that participate in the random change of $x_i(t)$; that is, the first $k_i(t)$ bird nest positions in $Q_i(t)$ are added to the set $J_i(t)$;
The bird nest position $x_i(t)$ is randomly changed in the following manner:
[equation missing in source]
where $\chi_i(t)$ denotes the bird nest position obtained by randomly changing the bird nest position $x_i(t)$, $rand_1$ is a randomly generated number between 0 and 1, and [two symbols missing in source] are bird nest positions randomly selected from the set $J_i(t)$ [condition missing in source].
Let $f(\chi_i(t))$ denote the fitness function value of the bird nest position $\chi_i(t)$; when $f(\chi_i(t)) \ge f(x_i(t))$, $X_i(t) = x_i(t)$; when $f(\chi_i(t)) < f(x_i(t))$, $X_i(t) = \chi_i(t)$.
In this preferred embodiment, the penalty factor and the kernel function parameter of the support vector machine are optimized with the cuckoo algorithm, which avoids the blindness of selecting the parameters manually and improves the classification accuracy of the support vector machine. The traditional cuckoo algorithm suffers from insufficient local optimization accuracy and slow convergence, which can prevent it from finding the optimal parameters of the support vector machine. To improve the accuracy with which the cuckoo algorithm optimizes the support vector machine, the preferred embodiment therefore modifies the traditional cuckoo algorithm to improve its optimization accuracy and convergence speed, as follows.
After the traditional cuckoo algorithm updates the bird nest positions in the Lévy flight mode, it randomly changes the positions of some bird nests in the population: two bird nest positions are selected at random from the population and used to randomly change the current bird nest position. This way of changing is too random and lacks adaptivity, so it does little to improve local optimization accuracy or convergence speed. The preferred embodiment therefore selects two better bird nest positions from the population when randomly changing a bird nest position, which improves the convergence speed of the algorithm.
Further, to enhance the local optimization accuracy of the algorithm and keep it from falling into a local optimum during the random change, the preferred embodiment uses the defined spatial detection coefficient to measure the spatial overlap between a bird nest position and the better bird nest positions close to it in the population. When the spatial detection coefficient of a bird nest position is small, the overlap of the local space formed by that bird nest position and the nearby better bird nest positions is high; the parameter $k_i(t)$ is then made large, that is, more bird nest positions are selected from the sequence $Q_i(t)$ to participate in the random change, which increases the diversity of the population. When the spatial detection coefficient is large, the overlap is low; $k_i(t)$ is then made small, that is, fewer bird nest positions are selected from $Q_i(t)$ to participate in the random change, which strengthens the local search of the cuckoo during the random change and improves the optimization accuracy of the algorithm.
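The Lévy flight update that precedes the random change is referred to in the text but not spelled out. One common way to generate Lévy-distributed steps is Mantegna's algorithm; the sketch below assumes that method, an exponent `beta` of 1.5, a step scale of 0.01 and greedy per-nest retention, none of which is stated in the source, and the function names are hypothetical.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """Draw one Levy-distributed step length with Mantegna's algorithm."""
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_update(nests, fitness, step_scale=0.01, rng=random):
    """x_i(t): move each nest by a Levy step and keep the better position.

    The step is scaled by the offset from the current best nest, a common
    convention in cuckoo search (assumed here).
    """
    best = min(nests, key=fitness)
    updated = []
    for x in nests:
        trial = [
            xi + step_scale * levy_step(rng=rng) * (xi - bi)
            for xi, bi in zip(x, best)
        ]
        updated.append(trial if fitness(trial) < fitness(x) else x)
    return updated
```

The list returned by `levy_update` plays the role of the positions $x_i(t)$ on which the random change then operates.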
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the protection scope of the present invention, although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims (3)
1. A biological big data analysis and disease accurate identification classification prediction system based on an algorithm and a block chain, characterized by comprising a plurality of data terminals, a block chain storage module and a biological data analysis platform;
data terminal: comprises a user login unit, a data uploading unit and a health report display unit, wherein a user logs in at the data terminal with a login password through the user login unit, the user, after logging in at the data terminal, uploads the identity information of the user and the biological data of the user to the block chain storage module and the biological data analysis platform through the data uploading unit, and the health report display unit is used for displaying the health report of the user received by the data terminal;
block chain storage module: used for storing the biological data uploaded by each data terminal;
biological data analysis platform: comprises a biological database, a data preprocessing unit, a data analysis unit and a health report generation unit, wherein the biological database stores biological big data carrying disease labels and, at a given update period, retrieves and stores the biological big data in the block chain storage module that has not yet been retrieved, thereby updating the biological big data in the biological database; the updated biological big data in the biological database is input to the data preprocessing unit; each time new biological big data is received, the data preprocessing unit normalizes the received biological big data and clusters the normalized biological big data by adopting a semi-supervised clustering algorithm, thereby labeling the biological data without a disease label in the biological big data; each class obtained by clustering is input to the data analysis unit as a sample subset; when new sample subsets are received, the data analysis unit trains and tests the support vector machine again according to the new sample subsets and the disease labels corresponding to them, thereby establishing a disease prediction model from the biological data; the health report generation unit is used for inputting the received biological data of a user into the disease prediction model, generating the health report of the user according to the disease prediction result of the user, and sending the health report to the data terminal at which the user is logged in for display;
the penalty factor and the kernel function parameter of the support vector machine of the data analysis unit are optimized using a cuckoo search algorithm; in the cuckoo search algorithm, let $x_i(t)$ denote the position of the $i$-th nest in the population retained after the $t$-th update by Lévy flight, let $X_i(t)$ denote the position of the $i$-th nest in the population retained after the $t$-th iteration update, and let $p_a$ denote the discovery probability; for nest position $x_i(t)$, a random number $\mathrm{rand}$ between 0 and 1 is generated; when $\mathrm{rand} \le p_a$, $X_i(t) = x_i(t)$; when $\mathrm{rand} > p_a$, the value of $X_i(t)$ is determined in the following manner:
let $x_j(t)$ denote the position of the $j$-th nest in the population retained after the $t$-th update by Lévy flight; when nest position $x_j(t)$ satisfies $f(x_j(t)) < f(x_i(t))$, nest position $x_j(t)$ is added to the set $M_i(t)$, where $M_i(t)$ denotes the set of nest positions in the population that are superior to nest position $x_i(t)$, $f(x_j(t))$ denotes the fitness function value of nest position $x_j(t)$, and $f(x_i(t))$ denotes the fitness function value of nest position $x_i(t)$; the nest positions in $M_i(t)$ are sorted by their Euclidean distance from nest position $x_i(t)$, nearest first, to form the sequence $Q_i(t)$, expressed as $Q_i(t) = \{x_{i,l}(t),\ l = 1, 2, \ldots, n_i(t)\}$, where $x_{i,l}(t)$ denotes the $l$-th nest position in $Q_i(t)$ and $n_i(t)$ denotes the number of nest positions in $Q_i(t)$; define $H_i(t)$ as the spatial detection coefficient of nest position $x_i(t)$; then the expression of $H_i(t)$ is:
wherein $R_{i,l}(t)$ denotes the spatial radius of nest position $x_{i,l}(t)$ centered on nest position $x_i(t)$, with $R_{i,l}(t) = |x_{i,l}(t) - x_i(t)|$; let $x_{i,n}(t)$ denote the $n$-th nest position in sequence $Q_i(t)$, and let $R_{i,n}(t)$ denote the spatial radius of nest position $x_{i,n}(t)$ centered on nest position $x_i(t)$, with $R_{i,n}(t) = |x_{i,n}(t) - x_i(t)|$; the expression also uses the mean of the spatial radii, centered on nest position $x_i(t)$, of the first $k$ nest positions in sequence $Q_i(t)$; $k$ is a given positive integer satisfying $k < n_i(t)$; $\alpha$ and $\beta$ are weighting coefficients satisfying $\alpha, \beta \in (0, 1)$ and $\alpha + \beta = 1$;
let $J_i(t)$ denote the set of superior nest positions in the population that participate in the random change of nest position $x_i(t)$; the superior nest positions in set $J_i(t)$ are determined using the parameter $k_i(t)$, specifically:
(1) The value of the parameter $k_i(t)$ is determined from the spatial detection coefficient $H_i(t)$ of nest position $x_i(t)$:
in the formula, $k_i(t)$ denotes the local range control parameter used when nest position $x_i(t)$ is randomly changed; the formula uses the median of the spatial detection coefficients of the nest positions retained after the $t$-th update of the population by Lévy flight, where mean denotes the median function, $\lfloor \cdot \rfloor$ denotes rounding down, and $N$ denotes the number of nests in the population;
(2) The first $k_i(t)$ nest positions in sequence $Q_i(t)$ are the superior nest positions that participate in the random change of nest position $x_i(t)$; that is, the first $k_i(t)$ nest positions of sequence $Q_i(t)$ are added to set $J_i(t)$;
the nest position $x_i(t)$ is randomly changed in the following manner:
in the formula, $\chi_i(t)$ denotes the new nest position obtained by randomly changing nest position $x_i(t)$, $\mathrm{rand}_1$ is a randomly generated number between 0 and 1, the two other nest positions in the formula are each randomly selected from set $J_i(t)$, and
let $f(\chi_i(t))$ denote the fitness function value of nest position $\chi_i(t)$; when $f(\chi_i(t)) \ge f(x_i(t))$, $X_i(t) = x_i(t)$; when $f(\chi_i(t)) < f(x_i(t))$, $X_i(t) = \chi_i(t)$.
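The discovery step described in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the formulas for $H_i(t)$, $k_i(t)$ and the random change are not reproduced in this text, so `k_i` is passed in as a plain argument, and the random change uses the standard cuckoo search random walk, chi = x_i + rand_1 * (x_a - x_b), with x_a and x_b drawn from $J_i(t)$. The function names, and the rule of keeping $x_i(t)$ when $J_i(t)$ holds fewer than two nests, are hypothetical.

```python
import math
import random


def preferred_neighbors(i, nests, fitness, k_i):
    """Build J_i: the k_i nearest nests whose fitness is lower (better) than nest i."""
    x_i = nests[i]
    f_i = fitness(x_i)
    # M_i: every other nest with a strictly smaller fitness value.
    m_i = [x_j for j, x_j in enumerate(nests) if j != i and fitness(x_j) < f_i]
    # Q_i: M_i sorted by Euclidean distance to x_i, nearest first.
    q_i = sorted(m_i, key=lambda x_j: math.dist(x_j, x_i))
    # J_i: the first k_i entries of Q_i.
    return q_i[:k_i]


def discovery_step(i, nests, fitness, p_a, k_i, rng):
    """Return X_i for nest i, following the rand <= p_a / rand > p_a split of claim 1."""
    x_i = nests[i]
    if rng.random() <= p_a:
        return x_i  # nest kept unchanged
    j_i = preferred_neighbors(i, nests, fitness, k_i)
    if len(j_i) < 2:
        return x_i  # assumption: too few superior nests to draw a pair from
    x_a, x_b = rng.sample(j_i, 2)
    rand_1 = rng.random()
    # Assumed update rule (standard cuckoo search random walk).
    chi = tuple(v + rand_1 * (a - b) for v, a, b in zip(x_i, x_a, x_b))
    # Greedy selection: accept chi only if it is strictly better.
    return chi if fitness(chi) < fitness(x_i) else x_i
```

In an SVM setting each nest would hold a candidate (penalty factor, kernel parameter) pair and `fitness` would return a validation error, so lower values are better, matching the $f(x_j(t)) < f(x_i(t))$ test in the claim.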
2. The system according to claim 1, wherein the health report of the user comprises the identity information of the user and the disease label of the user.
3. The system according to claim 1, wherein the biological database comprises a labeled biological database and an unlabeled biological database; the labeled biological database stores biological data with disease labels; at every given update period, the unlabeled biological database retrieves and stores the biological data in the blockchain storage module that have not yet been retrieved, thereby updating the biological data in the unlabeled biological database; and the biological data in the labeled biological database and in the updated unlabeled biological database are input into the data preprocessing unit.
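The preprocessing that claim 1 applies to the labeled and unlabeled data of claim 3 might look roughly like the Python sketch below. The claims do not name a specific normalization or semi-supervised clustering algorithm, so min-max scaling and a nearest-seeded-centroid assignment are used here purely as stand-ins; all function names and the sample labels are hypothetical.

```python
import math


def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]


def label_by_seeded_centroids(labeled, unlabeled):
    """Group samples into one subset per disease label.

    labeled:   list of (features, disease_label) pairs
    unlabeled: list of feature tuples without a disease label
    Returns a dict mapping each disease label to its normalized samples.
    """
    rows = [x for x, _ in labeled] + list(unlabeled)
    norm = min_max_normalize(rows)
    n_lab = len(labeled)

    # Seed one cluster per disease label from the labeled samples.
    subsets = {}
    for x, (_, label) in zip(norm[:n_lab], labeled):
        subsets.setdefault(label, []).append(x)
    centroids = {
        label: tuple(sum(c) / len(xs) for c in zip(*xs))
        for label, xs in subsets.items()
    }

    # Stand-in for semi-supervised clustering: each unlabeled sample
    # takes the label of its nearest seeded centroid.
    for x in norm[n_lab:]:
        label = min(centroids, key=lambda lab: math.dist(x, centroids[lab]))
        subsets[label].append(x)
    return subsets
```

Each value of the returned dict plays the role of one sample subset handed to the data analysis unit, with the dict key serving as its disease label.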
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211657023.0A CN116130110A (en) | 2022-05-11 | 2022-05-11 | Biological big data analysis, disease precise identification, classification and prediction system based on algorithm and blockchain and application |
CN202210510098.XA CN115050437B (en) | 2022-05-11 | 2022-05-11 | Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210510098.XA CN115050437B (en) | 2022-05-11 | 2022-05-11 | Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211657023.0A Division CN116130110A (en) | 2022-05-11 | 2022-05-11 | Biological big data analysis, disease precise identification, classification and prediction system based on algorithm and blockchain and application |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115050437A (en) | 2022-09-13 |
CN115050437B (en) | 2023-04-07 |
Family
ID=83157943
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210510098.XA Active CN115050437B (en) | 2022-05-11 | 2022-05-11 | Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain |
CN202211657023.0A Pending CN116130110A (en) | 2022-05-11 | 2022-05-11 | Biological big data analysis, disease precise identification, classification and prediction system based on algorithm and blockchain and application |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211657023.0A Pending CN116130110A (en) | 2022-05-11 | 2022-05-11 | Biological big data analysis, disease precise identification, classification and prediction system based on algorithm and blockchain and application |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN115050437B (en) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7953613B2 (en) * | 2007-01-03 | 2011-05-31 | Gizewski Theodore M | Health maintenance system |
US20090092299A1 (en) * | 2007-10-03 | 2009-04-09 | Siemens Medical Solutions Usa, Inc. | System and Method for Joint Classification Using Feature Space Cluster Labels |
CN109346182B (en) * | 2018-08-28 | 2021-06-18 | 昆明理工大学 | CS-RF-based risk early warning method for thalassemia |
CN111696663A (en) * | 2020-05-26 | 2020-09-22 | 平安科技(深圳)有限公司 | Disease risk analysis method and device, electronic equipment and computer storage medium |
CN113901623A (en) * | 2021-10-18 | 2022-01-07 | 南京工程学院 | SVM power distribution network topology identification method based on cuckoo search algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN116130110A (en) | 2023-05-16 |
CN115050437A (en) | 2022-09-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: Algorithm and blockchain based biological big data analysis, precise disease identification, classification and prediction system; Granted publication date: 20230407; Pledgee: Hua Xia Bank Co.,Ltd. Kunming Branch; Pledgor: Yunnan Shengyue Information Technology Co.,Ltd.; Registration number: Y2024980005240 |