CN116130110A - Biological big data analysis, disease precise identification, classification and prediction system based on algorithm and blockchain and application

Publication number
CN116130110A
Authority
CN
China
Prior art keywords
biological
bird nest
data
bird
big data
Prior art date
Legal status
Pending
Application number
CN202211657023.0A
Other languages
Chinese (zh)
Inventor
樊心敏
Current Assignee
Yunnan Shengyue Information Technology Co ltd
Original Assignee
Yunnan Shengyue Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Yunnan Shengyue Information Technology Co ltd
Priority to CN202211657023.0A
Publication of CN116130110A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23: Updating
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/284: Relational databases
    • G06F16/285: Clustering or classification
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00: ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A biological big data analysis and precise disease identification, classification and prediction system based on an algorithm and blockchain, and an application thereof. The system comprises a plurality of data ends, a blockchain storage module and a biological data analysis platform. The data ends are used by users to upload biological data and to display the users' health reports; the blockchain storage module stores the biological data uploaded by each data end; the biological data analysis platform builds a disease prediction model based on a support vector machine from biological big data and generates a user's health report from the biological data uploaded by that user. Beneficial effects of the invention: building the disease prediction model on a support vector machine helps reveal the association between biological data and diseases and thus the pathogenesis of diseases, which plays an important role in the prevention, diagnosis, monitoring, prognosis and treatment of diseases.

Description

Biological big data analysis, disease precise identification, classification and prediction system based on algorithm and blockchain and application
Technical Field
The invention relates to the field of biological big data, and in particular to a biological big data analysis and precise disease identification, classification and prediction system based on an algorithm and blockchain.
Background
With the rapid development of high-throughput biotechnology, the biomedical field has generated large volumes of biological data of different types. These data help doctors and researchers understand which disease a patient has and determine the treatment options that should be used in clinical management, so they are important for understanding human biology and the diseases we encounter. Advances in biotechnology now make it easy to extract biological information, so large amounts of digital data are captured in machine-readable form. This "datafication" of biology produces different types of biological data that reflect the molecular events taking place in a disease. To build a personalized disease treatment framework, biological big data must be analyzed in a meaningful and actionable way so that disease information can be captured from it.
The support vector machine is a data mining technique that solves machine learning problems by optimization. It has developed considerably in recent years and has become an important method for dealing with problems such as overfitting and the curse of dimensionality. Applied to disease prediction, a support vector machine can effectively learn the association between biological big data and diseases and thereby predict diseases effectively. However, a support vector machine requires parameter selection, and different parameter choices directly affect its prediction accuracy and generalization ability. In recent years many researchers have improved methods for optimizing support vector machine parameters. Optimization based on the cuckoo algorithm performs better than other methods, but the cuckoo algorithm itself suffers from insufficient optimization accuracy and slow convergence.
Disclosure of Invention
In view of the above problems, the invention aims to provide a biological big data analysis and precise disease identification, classification and prediction system based on an algorithm and blockchain.
The aim of the invention is achieved by the following technical solution:
The biological big data analysis and precise disease identification, classification and prediction system based on an algorithm and blockchain comprises a plurality of data ends, a blockchain storage module and a biological data analysis platform;
Data end: comprises a user login unit, a data uploading unit and a health report display unit. A user logs in at the data end with a login password through the user login unit; after logging in, the user uploads the user's identity information and biological data to the blockchain storage module and the biological data analysis platform through the data uploading unit; the health report display unit displays the user's health report received by the data end;
Blockchain storage module: stores the biological data uploaded by each data end;
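The text does not say how the blockchain storage module is implemented. Purely as an illustration, the sketch below shows the basic hash-linking idea that such a store could rely on. The class name `BioChain`, its methods and the record fields are hypothetical, and consensus, networking and access control are deliberately left out.

```python
import hashlib
import json


class BioChain:
    """Toy append-only, hash-linked store (illustrative only)."""

    def __init__(self):
        self.blocks = []

    @staticmethod
    def _digest(index, prev_hash, record):
        # Hash the block contents deterministically.
        payload = json.dumps([index, prev_hash, record], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, record):
        """Add one uploaded record, linking it to the previous block."""
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        index = len(self.blocks)
        block = {
            "index": index,
            "prev_hash": prev_hash,
            "record": record,
            "hash": self._digest(index, prev_hash, record),
        }
        self.blocks.append(block)
        return block

    def is_valid(self):
        """Check that no stored block was altered after it was appended."""
        prev_hash = "0" * 64
        for i, block in enumerate(self.blocks):
            if block["index"] != i or block["prev_hash"] != prev_hash:
                return False
            expected = self._digest(block["index"], block["prev_hash"], block["record"])
            if block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True
```

Because every block stores the hash of its predecessor, changing an earlier record makes `is_valid()` fail, which is the property that makes such a store attractive for uploaded biological data.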
Biological data analysis platform: comprises a biological database, a data preprocessing unit, a data analysis unit and a health report generation unit. The biological database stores biological big data with disease labels and, at a given update period, retrieves from the blockchain storage module the biological data that has not yet been retrieved, thereby updating the biological big data in the biological database; the updated biological big data is input to the data preprocessing unit. Each time it receives new biological big data, the data preprocessing unit normalizes the received data and clusters the normalized data with a semi-supervised clustering algorithm, thereby labeling the biological data that has no disease label; each class obtained by clustering is input to the data analysis unit as a sample subset. Each time it receives new sample subsets, the data analysis unit retrains and tests the support vector machine on the new sample subsets and their corresponding disease labels, thereby building a disease prediction model based on biological data. The health report generation unit inputs the received biological data of a user into the current disease prediction model to predict the user's disease, generates the user's health report from the prediction result, and sends the health report to the user's data end for display.
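The normalization step is not specified further. A minimal sketch, assuming plain min-max scaling of each feature to the range [0, 1] (an assumption, as is the function name `min_max_normalize`), could look like this:

```python
def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    if not rows:
        return []
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]
```

Scaling features to a common range matters for the later steps because both distance-based clustering and kernel-based support vector machines are sensitive to feature magnitude.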
Preferably, the health report of the user comprises identity information of the user and a disease label of the user.
Preferably, the biological database comprises a labeled biological database and an unlabeled biological database. The labeled biological database stores biological big data with disease labels. At a given update period, the unlabeled biological database retrieves from the blockchain storage module the biological data that has not yet been retrieved and stores it, thereby updating the biological big data in the unlabeled biological database. The biological big data in the labeled biological database and in the updated unlabeled biological database is input to the data preprocessing unit.
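One simple way to retrieve only data that has not been retrieved before is to remember how far into the stored records the previous update reached. The sketch below assumes the storage module exposes its records as an ordered list; the class `UnlabeledStore` and its `sync` method are hypothetical names, and the scheduler that calls `sync` once per update period is not shown.

```python
class UnlabeledStore:
    """Keeps a cursor so each update pulls only not-yet-retrieved records."""

    def __init__(self):
        self.records = []
        self._next_index = 0  # index of the first record not yet retrieved

    def sync(self, chain_records):
        """Copy records added since the last call; return how many were new."""
        new_records = chain_records[self._next_index:]
        self.records.extend(new_records)
        self._next_index = len(chain_records)
        return len(new_records)
```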
Preferably, the penalty factors and kernel function parameters of the support vector machine of the data analysis unit are optimized by using a cuckoo algorithm.
Preferably, in the cuckoo algorithm, let $x_i(t)$ denote the position of the $i$-th bird nest retained in the population after the $t$-th update by Lévy flight, let $X_i(t)$ denote the position of the $i$-th bird nest retained in the population after the $t$-th iteration update, and let $p_a$ denote the discovery probability. For nest position $x_i(t)$, a random number $rand$ between 0 and 1 is generated. When $rand \le p_a$, $X_i(t) = x_i(t)$; when $rand > p_a$, the value of $X_i(t)$ is determined as follows:
Let $x_j(t)$ denote the position of the $j$-th bird nest retained in the population after the $t$-th update by Lévy flight. If $x_j(t)$ satisfies $f(x_j(t)) < f(x_i(t))$, then $x_j(t)$ is added to the set $M_i(t)$, where $M_i(t)$ denotes the set of nest positions in the population that are better than $x_i(t)$, and $f(x_j(t))$ and $f(x_i(t))$ denote the fitness function values of nest positions $x_j(t)$ and $x_i(t)$, respectively. The nest positions in $M_i(t)$ are sorted by their Euclidean distance to $x_i(t)$, from nearest to farthest, to form the sequence $Q_i(t) = \{x_{i,l}(t),\ l = 1, 2, \ldots, n_i(t)\}$, where $x_{i,l}(t)$ denotes the $l$-th nest position in $Q_i(t)$ and $n_i(t)$ denotes the number of nest positions in $Q_i(t)$. Define $H_i(t)$ as the spatial detection coefficient of nest position $x_i(t)$; the expression of $H_i(t)$ is:
Figure BDA0004013222150000031
where $R_{i,l}(t)$ denotes the spatial radius of nest position $x_{i,l}(t)$ centered on nest position $x_i(t)$, with $R_{i,l}(t) = |x_{i,l}(t) - x_i(t)|$; $x_{i,n}(t)$ denotes the $n$-th nest position in $Q_i(t)$ and $R_{i,n}(t)$ denotes its spatial radius centered on $x_i(t)$, with $R_{i,n}(t) = |x_{i,n}(t) - x_i(t)|$; $\bar{R}_i(t)$ denotes the mean of the spatial radii, centered on $x_i(t)$, of the first $k$ nest positions in $Q_i(t)$, and
Figure BDA0004013222150000033
$k$ is a given positive integer satisfying $k \le n_i(t)$; $\alpha$ and $\beta$ are weight coefficients satisfying $\alpha, \beta \in (0, 1)$ and $\alpha + \beta = 1$;
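The quantities defined in words above (the set of better nests, its distance-sorted sequence, and the mean radius of the first $k$ entries) can be sketched as follows. The expression for $H_i(t)$ itself is only available as a figure and is not reproduced, so the sketch stops at its inputs. The function names are hypothetical, positions are assumed to be tuples of floats, and $|\cdot|$ is read as the Euclidean norm.

```python
import math


def better_neighbours(i, nests, fitness):
    """Return Q_i: positions with lower fitness than nests[i], nearest first."""
    xi = nests[i]
    fi = fitness(xi)
    # M_i: every nest whose fitness value is strictly smaller (better).
    m_i = [x for j, x in enumerate(nests) if j != i and fitness(x) < fi]
    # Q_i: M_i ordered by Euclidean distance to x_i, nearest to farthest.
    return sorted(m_i, key=lambda x: math.dist(x, xi))


def mean_radius(xi, q_i, k):
    """Mean Euclidean distance from xi to the first k positions of Q_i."""
    first = q_i[:k]
    if not first:
        raise ValueError("Q_i is empty, so no radius can be computed")
    return sum(math.dist(x, xi) for x in first) / len(first)
```

The best nest in the population has an empty $Q_i(t)$; how that case is handled is not stated in the text, so the sketch raises an error rather than guessing.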
Let $J_i(t)$ denote the set of preferred nest positions in the population that participate in the random change of nest position $x_i(t)$. The parameter $k_i(t)$ is used to determine the preferred nest positions in $J_i(t)$, specifically:
(1) The value of the parameter $k_i(t)$ is determined from the spatial detection coefficient $H_i(t)$ of nest position $x_i(t)$:
Figure BDA0004013222150000034
where $k_i(t)$ denotes the local range control parameter of nest position $x_i(t)$ during the random change,
Figure BDA0004013222150000035
denotes the median of the spatial detection coefficients of the nest positions retained in the population after the $t$-th update by Lévy flight, and
Figure BDA0004013222150000036
where $\mathrm{median}$ denotes the median function, $\lfloor \cdot \rfloor$ denotes rounding down, and $N$ denotes the number of bird nests in the population;
(2) The first $k_i(t)$ nest positions in the sequence $Q_i(t)$ are the preferred nest positions that participate in the random change of nest position $x_i(t)$; that is, the first $k_i(t)$ nest positions in $Q_i(t)$ are selected and added to the set $J_i(t)$;
Nest position $x_i(t)$ is randomly changed in the following way:
Figure BDA0004013222150000041
where $X_i(t)$ denotes the new nest position obtained by randomly changing nest position $x_i(t)$, $rand_1$ is a randomly generated number between 0 and 1,
Figure BDA0004013222150000042
and
Figure BDA0004013222150000043
are nest positions randomly selected from the set $J_i(t)$, and
Figure BDA0004013222150000044
Let $f(X_i(t))$ denote the fitness function value of nest position $X_i(t)$. When $f(X_i(t)) \ge f(x_i(t))$, $X_i(t) = x_i(t)$; when $f(X_i(t)) < f(x_i(t))$, the new position is kept, i.e. $X_i(t) = X_i(t)$.
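The update formula for the random change is only available as a figure, so the sketch below substitutes the random-walk form used in standard cuckoo search, $x_i + rand_1 \cdot (x_a - x_b)$, with $x_a$ and $x_b$ drawn from $J_i(t)$. That substitution, the function name `random_change` and the handling of sets with fewer than two members are assumptions; only the greedy acceptance rule at the end is taken from the text.

```python
import random


def random_change(xi, j_i, fitness, rng=random):
    """Propose a new position from two members of J_i; keep it only if better."""
    if len(j_i) < 2:
        # Assumption: with fewer than two preferred positions, leave xi unchanged.
        return xi
    xa, xb = rng.sample(j_i, 2)
    r = rng.random()  # plays the role of rand_1, drawn from [0, 1)
    candidate = tuple(v + r * (a - b) for v, a, b in zip(xi, xa, xb))
    # Greedy acceptance as stated in the text: lower fitness value wins,
    # and ties keep the old position.
    return candidate if fitness(candidate) < fitness(xi) else xi
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which helps when comparing parameter settings.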
Beneficial effects of the invention: a disease prediction model is built on a support vector machine, which enables accurate identification and prediction of diseases, helps reveal the association between biological data and diseases and thus the pathogenesis of diseases, and plays an important role in the prevention, diagnosis, monitoring, prognosis and treatment of diseases. The parameters of the support vector machine are optimized by an improved cuckoo algorithm, which avoids the blindness of selecting parameters manually and improves the prediction accuracy of the support vector machine.
Drawings
The invention is further described with reference to the accompanying drawing. The embodiment shown in the drawing does not limit the invention in any way, and a person of ordinary skill in the art can derive other drawings from the following drawing without creative effort.
Fig. 1 is a schematic diagram of the structure of the present invention.
Detailed Description
The invention will be further described with reference to the following examples.
Referring to Fig. 1, the biological big data analysis and precise disease identification, classification and prediction system based on an algorithm and blockchain of this embodiment comprises a plurality of data ends, a blockchain storage module and a biological data analysis platform;
Data end: comprises a user login unit, a data uploading unit and a health report display unit. A user logs in at the data end with a login password through the user login unit; after logging in, the user uploads the user's identity information and biological data to the blockchain storage module and the biological data analysis platform through the data uploading unit; the health report display unit displays the user's health report received by the data end;
Blockchain storage module: stores the biological data uploaded by each data end;
Biological data analysis platform: comprises a biological database, a data preprocessing unit, a data analysis unit and a health report generation unit. The biological database stores biological big data with disease labels and, at a given update period, retrieves from the blockchain storage module the biological data that has not previously been retrieved, thereby updating the biological big data in the biological database; the updated biological big data is input to the data preprocessing unit. Each time it receives new biological big data, the data preprocessing unit normalizes the received data and clusters the normalized data with a semi-supervised clustering algorithm, thereby labeling the biological data that has no disease label; each class obtained by clustering is input to the data analysis unit as a sample subset. Each time it receives new sample subsets, the data analysis unit retrains and tests the support vector machine on the new sample subsets and their corresponding disease labels, thereby building a disease prediction model based on biological data. The health report generation unit inputs the received biological data of a user into the current disease prediction model to predict the user's disease, generates the user's health report from the prediction result, and sends the health report to the user's data end for display.
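The embodiment does not name the semi-supervised clustering algorithm. As one possible reading, the sketch below uses a seeded nearest-centroid scheme (a simple form of seeded k-means): labeled samples define the initial class centres, unlabeled samples are assigned to the nearest centre, and the centres are then recomputed. The choice of algorithm and the name `seeded_label` are assumptions.

```python
import math


def seeded_label(labeled, unlabeled, iterations=10):
    """labeled: list of (vector, label); unlabeled: list of vectors.
    Returns one label per unlabeled vector."""

    def centroid(vectors):
        return tuple(sum(column) / len(vectors) for column in zip(*vectors))

    # Group the labeled seeds by disease label and compute initial centres.
    groups = {}
    for vector, label in labeled:
        groups.setdefault(label, []).append(vector)
    centres = {label: centroid(vectors) for label, vectors in groups.items()}

    assigned = []
    for _ in range(iterations):
        # Assign every unlabeled vector to its nearest centre.
        assigned = [
            min(centres, key=lambda label: math.dist(v, centres[label]))
            for v in unlabeled
        ]
        # Recompute centres from the seeds plus the current assignments.
        members = {label: list(vectors) for label, vectors in groups.items()}
        for v, label in zip(unlabeled, assigned):
            members[label].append(v)
        new_centres = {label: centroid(vs) for label, vs in members.items()}
        if new_centres == centres:
            break
        centres = new_centres
    return assigned
```

Each resulting class, together with its label, would then form one sample subset passed on to the data analysis unit.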
Preferably, the health report of the user comprises identity information of the user and a disease label of the user.
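A health report with the two stated components could be represented as a small immutable record. The field names, the `build_report` helper and the shape of the `identity` dictionary are hypothetical; `predict` stands in for the current disease prediction model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthReport:
    user_id: str        # identity information (hypothetical field)
    user_name: str      # identity information (hypothetical field)
    disease_label: str  # label predicted by the disease prediction model


def build_report(identity, predict, features):
    """identity: dict with 'id' and 'name'; predict: callable returning a label."""
    return HealthReport(identity["id"], identity["name"], predict(features))
```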
Preferably, the biological database comprises a labeled biological database and an unlabeled biological database. The labeled biological database stores biological big data with disease labels. At a given update period, the unlabeled biological database retrieves from the blockchain storage module the biological data that has not previously been retrieved and stores it, thereby updating the biological big data in the unlabeled biological database. The biological big data in the labeled biological database and in the updated unlabeled biological database is input to the data preprocessing unit.
The preferred embodiment builds a disease prediction model on a support vector machine, which enables accurate identification and prediction of diseases and helps people understand the association between biological data and diseases and thus the pathogenesis of diseases; it plays an important role in the prevention, diagnosis, monitoring, prognosis and treatment of diseases.
Preferably, the penalty factor and kernel function parameters of the support vector machine of the data analysis unit are optimized with a cuckoo algorithm. In the cuckoo algorithm, the mean square error between the output value of the support vector machine and the expected output value is used as the fitness function; the smaller the fitness function value of a nest position, the better that nest position.
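In this setting each nest position is a candidate parameter pair, and its fitness is the mean square error of the model trained with those parameters. A minimal sketch follows; `train` is a hypothetical callback that would wrap whatever support vector machine implementation is used and return a fitted predictor, and no particular library is assumed.

```python
def mse_fitness(params, train, validation_x, validation_y):
    """params: (penalty factor, kernel parameter).
    train(c, kernel_param) must return a predictor f(x) -> float.
    Returns the mean square error on the validation data (lower is better)."""
    c, kernel_param = params
    predictor = train(c, kernel_param)
    errors = [
        (predictor(x) - y) ** 2
        for x, y in zip(validation_x, validation_y)
    ]
    return sum(errors) / len(errors)
```

Evaluating the error on data held out from training is a design choice of this sketch, made so that the search does not simply reward parameter pairs that overfit.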
Preferably, in the cuckoo algorithm, let $x_i(t)$ denote the position of the $i$-th bird nest retained in the population after the $t$-th update by Lévy flight, let $X_i(t)$ denote the position of the $i$-th bird nest retained in the population after the $t$-th iteration update, and let $p_a$ denote the discovery probability. For nest position $x_i(t)$, a random number $rand$ between 0 and 1 is generated. When $rand \le p_a$, $X_i(t) = x_i(t)$; when $rand > p_a$, the value of $X_i(t)$ is determined as follows:
Let $x_j(t)$ denote the position of the $j$-th bird nest retained in the population after the $t$-th update by Lévy flight. If $x_j(t)$ satisfies $f(x_j(t)) < f(x_i(t))$, then $x_j(t)$ is added to the set $M_i(t)$, where $M_i(t)$ denotes the set of nest positions in the population that are better than $x_i(t)$, and $f(x_j(t))$ and $f(x_i(t))$ denote the fitness function values of nest positions $x_j(t)$ and $x_i(t)$, respectively. The nest positions in $M_i(t)$ are sorted by their Euclidean distance to $x_i(t)$, from nearest to farthest, to form the sequence $Q_i(t) = \{x_{i,l}(t),\ l = 1, 2, \ldots, n_i(t)\}$, where $x_{i,l}(t)$ denotes the $l$-th nest position in $Q_i(t)$ and $n_i(t)$ denotes the number of nest positions in $Q_i(t)$. Define $H_i(t)$ as the spatial detection coefficient of nest position $x_i(t)$; the expression of $H_i(t)$ is:
Figure BDA0004013222150000061
where $R_{i,l}(t)$ denotes the spatial radius of nest position $x_{i,l}(t)$ centered on nest position $x_i(t)$, with $R_{i,l}(t) = |x_{i,l}(t) - x_i(t)|$; $x_{i,n}(t)$ denotes the $n$-th nest position in $Q_i(t)$ and $R_{i,n}(t)$ denotes its spatial radius centered on $x_i(t)$, with $R_{i,n}(t) = |x_{i,n}(t) - x_i(t)|$; $\bar{R}_i(t)$ denotes the mean of the spatial radii, centered on $x_i(t)$, of the first $k$ nest positions in $Q_i(t)$, and
Figure BDA0004013222150000063
$k$ is a given positive integer satisfying $k \le n_i(t)$, and may be set to 5; $\alpha$ and $\beta$ are weight coefficients satisfying $\alpha, \beta \in (0, 1)$ and $\alpha + \beta = 1$;
Let $J_i(t)$ denote the set of preferred nest positions in the population that participate in the random change of nest position $x_i(t)$. The parameter $k_i(t)$ is used to determine the preferred nest positions in $J_i(t)$, specifically:
(1) The value of the parameter $k_i(t)$ is determined from the spatial detection coefficient $H_i(t)$ of nest position $x_i(t)$:
Figure BDA0004013222150000064
where $k_i(t)$ denotes the local range control parameter of nest position $x_i(t)$ during the random change,
Figure BDA0004013222150000065
denotes the median of the spatial detection coefficients of the nest positions retained in the population after the $t$-th update by Lévy flight, and
Figure BDA0004013222150000071
where $\mathrm{median}$ denotes the median function, $\lfloor \cdot \rfloor$ denotes rounding down, and $N$ denotes the number of bird nests in the population;
(2) The first $k_i(t)$ nest positions in the sequence $Q_i(t)$ are the preferred nest positions that participate in the random change of nest position $x_i(t)$; that is, the first $k_i(t)$ nest positions in $Q_i(t)$ are selected and added to the set $J_i(t)$;
Nest position $x_i(t)$ is randomly changed in the following way:
Figure BDA0004013222150000073
where $X_i(t)$ denotes the new nest position obtained by randomly changing nest position $x_i(t)$, $rand_1$ is a randomly generated number between 0 and 1,
Figure BDA0004013222150000074
and
Figure BDA0004013222150000075
are nest positions randomly selected from the set $J_i(t)$, and
Figure BDA0004013222150000076
Let $f(X_i(t))$ denote the fitness function value of nest position $X_i(t)$. When $f(X_i(t)) \ge f(x_i(t))$, $X_i(t) = x_i(t)$; when $f(X_i(t)) < f(x_i(t))$, the new position is kept, i.e. $X_i(t) = X_i(t)$.
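The discovery-probability gate described at the start of this procedure can be sketched as one pass over the population. The function name `discovery_pass` is hypothetical, and `change` is a placeholder callback standing in for the random change and acceptance steps described above, whose formulas are given only as figures.

```python
import random


def discovery_pass(nests, p_a, change, rng=random):
    """Apply the p_a gate to every nest position.
    change(i, nests) returns the resulting position for nest i."""
    result = []
    for i, xi in enumerate(nests):
        if rng.random() <= p_a:
            # rand <= p_a: the position is carried over unchanged.
            result.append(xi)
        else:
            # rand > p_a: the position goes through the random change.
            result.append(change(i, nests))
    return result
```

Note that `change` always receives the population as it stood before the pass, so the order in which nests are visited does not affect the outcome.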
The preferred embodiment optimizes the penalty factor and kernel function parameters of the support vector machine with a cuckoo algorithm, which avoids the blindness of selecting the parameters manually and improves the classification accuracy of the support vector machine. The conventional cuckoo algorithm suffers from insufficient local optimization accuracy and slow convergence, which can easily prevent it from finding the optimal parameters of the support vector machine. To improve the accuracy with which the cuckoo algorithm optimizes the support vector machine, the preferred embodiment therefore modifies the conventional cuckoo algorithm with the aim of improving its optimization accuracy and convergence speed. Specifically, after the nest positions are updated by Lévy flight, the conventional cuckoo algorithm randomly changes some of the nest positions in the population, that is, it randomly selects two nest positions in the population and uses them to randomly change the current nest position. This way of changing positions is too random and lacks adaptivity, so it does little to improve local optimization accuracy or convergence speed. When a nest position is randomly changed, the preferred embodiment instead selects two better nest positions in the population to drive the change, which improves the convergence speed of the algorithm. Further, to strengthen the local optimization accuracy of the algorithm and to keep it from becoming trapped in a local optimum, the preferred embodiment uses the defined spatial detection coefficient to measure the degree of spatial overlap between a nest position and the better nest positions close to it in the population. A smaller spatial detection coefficient indicates a higher degree of local spatial overlap between the nest position and the nearby better nest positions; in that case the parameter $k_i(t)$ is made larger, that is, more nest positions in the sequence $Q_i(t)$ are selected to participate in the random change, which increases the diversity of the population. A larger spatial detection coefficient indicates a lower degree of local spatial overlap; in that case $k_i(t)$ is made smaller, that is, fewer nest positions in $Q_i(t)$ are selected to participate in the random change, which strengthens the local search of the cuckoo algorithm during the random change and improves the optimization accuracy of the algorithm.
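For reference, the Lévy flight update attributed above to the conventional cuckoo algorithm is commonly implemented with Mantegna's method for drawing heavy-tailed step lengths. The embodiment does not give its step-size rule, so the exponent 1.5, the scale 0.01, the offset from the best nest and the function names below are assumptions taken from common cuckoo search practice, not from this document.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """Draw one Lévy-distributed step length using Mantegna's algorithm."""
    numerator = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    denominator = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (numerator / denominator) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = 0.0
    while v == 0.0:  # avoid dividing by an exact zero draw
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_update(xi, best, fitness, scale=0.01, rng=random):
    """Move xi by Lévy steps scaled by its offset from the best nest;
    keep the move only if the fitness value (lower is better) improves."""
    candidate = tuple(
        v + scale * levy_step(rng=rng) * (v - b)
        for v, b in zip(xi, best)
    )
    return candidate if fitness(candidate) < fitness(xi) else xi
```

The heavy tail of the step distribution gives occasional long jumps, which is what provides global exploration before the random change stage refines positions locally.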
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the scope of the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (7)

1. The biological data analysis platform is characterized by comprising a biological database, a data preprocessing unit, a data analysis unit and a health report generation unit, wherein biological big data with disease labels are stored in the biological database, the biological big data which is not called in a block chain storage module is called according to a given updating period, so that the biological big data in the biological database is updated, the biological big data in the updated biological database is input into the data preprocessing unit, the data preprocessing unit performs normalization processing on the received biological big data when receiving new biological big data each time, a semi-supervised clustering algorithm is adopted to cluster the biological big data which is not provided with the disease labels in the biological big data, each class obtained by clustering is input into the data analysis unit as a sample subset, the data analysis unit retrains and tests a support vector machine according to the new sample subset and the corresponding disease labels, so that a disease prediction model based on the biological data is established, the health report generation unit is used for writing the received biological big data into the prediction model of a user to the user, and the health report generation unit displays the current disease prediction model of the user to the health report and displays the disease prediction model of the user at the user end according to the received disease report;
the penalty factor and kernel function parameters of the support vector machine of the data analysis unit are optimized using a cuckoo algorithm; in the cuckoo algorithm, let x_i(t) denote the position of the i-th bird nest retained after the population is updated for the t-th time using Lévy flight, X_i(t) denote the position of the i-th bird nest retained after the t-th iteration update, and p_a denote the discovery probability; for bird nest position x_i(t), a random number rand between 0 and 1 is generated; when rand ≤ p_a, X_i(t) = x_i(t); when rand > p_a, the value of X_i(t) is determined as follows:
let x_j(t) denote the position of the j-th bird nest retained after the population is updated for the t-th time using Lévy flight; if bird nest position x_j(t) satisfies f(x_j(t)) < f(x_i(t)), then x_j(t) is added to the set M_i(t), where M_i(t) denotes the set of bird nest positions in the population that are better than bird nest position x_i(t), f(x_j(t)) denotes the fitness function value of bird nest position x_j(t), and f(x_i(t)) denotes the fitness function value of bird nest position x_i(t); the bird nest positions in M_i(t) are sorted by their Euclidean distance to bird nest position x_i(t), from nearest to farthest, to form the sequence Q_i(t), expressed as Q_i(t) = {x_{i,l}(t), l = 1, 2, ..., n_i(t)}, where x_{i,l}(t) denotes the l-th bird nest position in Q_i(t) and n_i(t) denotes the number of bird nest positions in Q_i(t); define H_i(t) as the spatial detection coefficient of bird nest position x_i(t), with the expression:
Figure FDA0004013222140000021
where R_{i,l}(t) denotes the spatial radius of bird nest position x_{i,l}(t) centered on bird nest position x_i(t), with R_{i,l}(t) = |x_{i,l}(t) - x_i(t)|; x_{i,n}(t) denotes the n-th bird nest position in Q_i(t), and R_{i,n}(t) denotes the spatial radius of x_{i,n}(t) centered on x_i(t), with R_{i,n}(t) = |x_{i,n}(t) - x_i(t)|; \bar{R}_i(t) denotes the mean of the spatial radii, centered on x_i(t), of the first k bird nest positions in Q_i(t), and
Figure FDA0004013222140000022
k is a given positive integer satisfying k < n_i(t); α and β are weight coefficients satisfying α, β ∈ (0, 1) and α + β = 1;
let J_i(t) denote the set of preferred bird nest positions that participate in the random change of bird nest position x_i(t) in the population; the preferred bird nest positions in J_i(t) are determined using the parameter k_i(t), specifically:
(1) the value of the parameter k_i(t) is determined from the spatial detection coefficient H_i(t) of bird nest position x_i(t):
Figure FDA0004013222140000023
where k_i(t) denotes the local range control parameter for the random change of bird nest position x_i(t), H(t) denotes the median of the spatial detection coefficients of the bird nest positions retained after the population is updated for the t-th time using Lévy flight, and
Figure FDA0004013222140000024
where median denotes the median function,
Figure FDA0004013222140000025
denotes rounding down, and N denotes the number of bird nests in the population;
(2) the first k_i(t) bird nest positions in sequence Q_i(t) are the preferred bird nest positions that participate in the random change of bird nest position x_i(t); that is, the first k_i(t) bird nest positions in Q_i(t) are selected and added to the set J_i(t);
bird nest position x_i(t) is then randomly changed in the following manner:
Figure FDA0004013222140000026
where X_i(t) denotes the new bird nest position obtained by randomly changing bird nest position x_i(t), rand_1 is a randomly generated number between 0 and 1,
Figure FDA0004013222140000031
and
Figure FDA0004013222140000032
are bird nest positions randomly selected from the set J_i(t), and
Figure FDA0004013222140000033
let f(X_i(t)) denote the fitness function value of bird nest position X_i(t); when f(X_i(t)) ≥ f(x_i(t)), X_i(t) = x_i(t), i.e. the original position is kept; when f(X_i(t)) < f(x_i(t)), X_i(t) = X_i(t), i.e. the new position is kept.
2. The biological data analysis platform of claim 1, wherein the health report of the user includes the user's identity information and the user's disease label.
3. The biological data analysis platform according to claim 1 or 2, wherein the biological database comprises a labeled biological database and an unlabeled biological database; the labeled biological database is used for storing biological big data with disease labels; at each given update period, the unlabeled biological database retrieves and stores the biological big data in the blockchain storage module that has not yet been retrieved, so that the biological big data in the unlabeled biological database is updated; and the biological big data in the labeled biological database and the biological big data in the updated unlabeled biological database are input into the data preprocessing unit.
4. Use of the biological data analysis platform of any one of claims 1-3 for biological data analysis.
5. A biological big data analysis and accurate disease identification, classification and prediction system based on an algorithm and a blockchain, characterized by comprising a plurality of data ends, a blockchain storage module and a biological data analysis platform;
data end: comprises a user login unit, a data uploading unit and a health report display unit, wherein a user logs in at the data end with a login password through the user login unit; after logging in, the user uploads the user's identity information and biological data to the blockchain storage module and the biological data analysis platform through the data uploading unit; and the health report display unit is used for displaying the health report of the user received by the data end;
blockchain storage module: used for storing the biological data uploaded by each data end;
biological data analysis platform: comprises a biological database, a data preprocessing unit, a data analysis unit and a health report generation unit, wherein biological big data with disease labels is stored in the biological database; at each given update period, the biological big data in the blockchain storage module that has not yet been retrieved is retrieved, so that the biological big data in the biological database is updated; the biological big data in the updated biological database is input into the data preprocessing unit; each time the data preprocessing unit receives new biological big data, it normalizes the received biological big data and clusters the normalized biological big data using a semi-supervised clustering algorithm, thereby labeling the biological data that has no disease label; each class obtained by clustering is input into the data analysis unit as a sample subset; each time the data analysis unit receives a new sample subset, it retrains and tests a support vector machine according to the new sample subset and the corresponding disease labels, thereby establishing a disease prediction model based on biological data; the health report generation unit inputs the received biological data of a user into the current disease prediction model, generates a health report for the user according to the disease prediction result, and sends the health report to the user's data end for display;
the penalty factor and kernel function parameters of the support vector machine of the data analysis unit are optimized using a cuckoo algorithm; in the cuckoo algorithm, let x_i(t) denote the position of the i-th bird nest retained after the population is updated for the t-th time using Lévy flight, X_i(t) denote the position of the i-th bird nest retained after the t-th iteration update, and p_a denote the discovery probability; for bird nest position x_i(t), a random number rand between 0 and 1 is generated; when rand ≤ p_a, X_i(t) = x_i(t); when rand > p_a, the value of X_i(t) is determined as follows:
let x_j(t) denote the position of the j-th bird nest retained after the population is updated for the t-th time using Lévy flight; if bird nest position x_j(t) satisfies f(x_j(t)) < f(x_i(t)), then x_j(t) is added to the set M_i(t), where M_i(t) denotes the set of bird nest positions in the population that are better than bird nest position x_i(t), f(x_j(t)) denotes the fitness function value of bird nest position x_j(t), and f(x_i(t)) denotes the fitness function value of bird nest position x_i(t); the bird nest positions in M_i(t) are sorted by their Euclidean distance to bird nest position x_i(t), from nearest to farthest, to form the sequence Q_i(t), expressed as Q_i(t) = {x_{i,l}(t), l = 1, 2, ..., n_i(t)}, where x_{i,l}(t) denotes the l-th bird nest position in Q_i(t) and n_i(t) denotes the number of bird nest positions in Q_i(t); define H_i(t) as the spatial detection coefficient of bird nest position x_i(t), with the expression:
Figure FDA0004013222140000041
where R_{i,l}(t) denotes the spatial radius of bird nest position x_{i,l}(t) centered on bird nest position x_i(t), with R_{i,l}(t) = |x_{i,l}(t) - x_i(t)|; x_{i,n}(t) denotes the n-th bird nest position in Q_i(t), and R_{i,n}(t) denotes the spatial radius of x_{i,n}(t) centered on x_i(t), with R_{i,n}(t) = |x_{i,n}(t) - x_i(t)|; \bar{R}_i(t) denotes the mean of the spatial radii, centered on x_i(t), of the first k bird nest positions in Q_i(t), and
Figure FDA0004013222140000042
k is a given positive integer satisfying k < n_i(t); α and β are weight coefficients satisfying α, β ∈ (0, 1) and α + β = 1;
let J_i(t) denote the set of preferred bird nest positions that participate in the random change of bird nest position x_i(t) in the population; the preferred bird nest positions in J_i(t) are determined using the parameter k_i(t), specifically:
(1) the value of the parameter k_i(t) is determined from the spatial detection coefficient H_i(t) of bird nest position x_i(t):
Figure FDA0004013222140000051
where k_i(t) denotes the local range control parameter for the random change of bird nest position x_i(t), H(t) denotes the median of the spatial detection coefficients of the bird nest positions retained after the population is updated for the t-th time using Lévy flight, and
Figure FDA0004013222140000052
where median denotes the median function,
Figure FDA0004013222140000053
denotes rounding down, and N denotes the number of bird nests in the population;
(2) the first k_i(t) bird nest positions in sequence Q_i(t) are the preferred bird nest positions that participate in the random change of bird nest position x_i(t); that is, the first k_i(t) bird nest positions in Q_i(t) are selected and added to the set J_i(t);
bird nest position x_i(t) is then randomly changed in the following manner:
Figure FDA0004013222140000054
where X_i(t) denotes the new bird nest position obtained by randomly changing bird nest position x_i(t), rand_1 is a randomly generated number between 0 and 1, x_i^1(t) and
Figure FDA0004013222140000056
are bird nest positions randomly selected from the set J_i(t), and
Figure FDA0004013222140000055
let f(X_i(t)) denote the fitness function value of bird nest position X_i(t); when f(X_i(t)) ≥ f(x_i(t)), X_i(t) = x_i(t), i.e. the original position is kept; when f(X_i(t)) < f(x_i(t)), X_i(t) = X_i(t), i.e. the new position is kept;
the health report of the user includes the user's identity information and the user's disease label;
the biological database comprises a labeled biological database and an unlabeled biological database; the labeled biological database is used for storing biological big data with disease labels; at each given update period, the unlabeled biological database retrieves and stores the biological big data in the blockchain storage module that has not yet been retrieved, so that the biological big data in the unlabeled biological database is updated; and the biological big data in the labeled biological database and the biological big data in the updated unlabeled biological database are input into the data preprocessing unit.
6. The use of the algorithm and blockchain based biological big data analysis, disease precise identification classification prediction system of claim 5.
7. A computer carrier comprising the system of claim 5.
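
The nest-replacement step recited in claims 1 and 5 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the claimed implementation: the expressions for H_i(t), k_i(t) and the new position X_i(t) appear only as figures in the claims, so here k_i is passed in as a plain argument and the candidate position uses the biased random walk of the standard cuckoo search, x_i + rand_1 * (x_a - x_b), as a stand-in. The function names `preferred_sequence` and `abandon_step` are hypothetical.

```python
import math
import random


def preferred_sequence(nests, fitness, i):
    """Build Q_i: nests with a lower fitness value than nest i, nearest first."""
    x_i = nests[i]
    f_i = fitness(x_i)
    # M_i: every other nest whose fitness value is strictly smaller than f(x_i).
    better = [x for j, x in enumerate(nests) if j != i and fitness(x) < f_i]
    # Q_i: M_i sorted by Euclidean distance to x_i, from nearest to farthest.
    return sorted(better, key=lambda x: math.dist(x, x_i))


def abandon_step(nests, fitness, i, p_a, k_i, rng=random):
    """Return X_i for nest i; k_i is supplied by the caller (assumption)."""
    x_i = nests[i]
    # As recited in the claims: rand <= p_a keeps the current position.
    if rng.random() <= p_a:
        return x_i
    # J_i: the first k_i positions of Q_i.
    j_set = preferred_sequence(nests, fitness, i)[:k_i]
    if len(j_set) < 2:
        return x_i  # assumption: too few preferred nests to sample two of them
    x_a, x_b = rng.sample(j_set, 2)
    rand_1 = rng.random()
    # Assumed update rule (standard cuckoo search), not the claimed formula.
    candidate = tuple(c + rand_1 * (a - b) for c, a, b in zip(x_i, x_a, x_b))
    # Greedy selection: keep the candidate only if its fitness value is smaller.
    return candidate if fitness(candidate) < fitness(x_i) else x_i
```

In the claimed system the fitness function would presumably score a support vector machine trained with a given penalty factor and kernel parameter, so each nest position would be a pair of those two values; the sketch leaves the fitness function to the caller.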
CN202211657023.0A 2022-05-11 2022-05-11 Biological big data analysis, disease precise identification, classification and prediction system based on algorithm and blockchain and application Pending CN116130110A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211657023.0A CN116130110A (en) 2022-05-11 2022-05-11 Biological big data analysis, disease precise identification, classification and prediction system based on algorithm and blockchain and application

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211657023.0A CN116130110A (en) 2022-05-11 2022-05-11 Biological big data analysis, disease precise identification, classification and prediction system based on algorithm and blockchain and application
CN202210510098.XA CN115050437B (en) 2022-05-11 2022-05-11 Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202210510098.XA Division CN115050437B (en) 2022-05-11 2022-05-11 Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain

Publications (1)

Publication Number Publication Date
CN116130110A true CN116130110A (en) 2023-05-16

Family

ID=83157943

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210510098.XA Active CN115050437B (en) 2022-05-11 2022-05-11 Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain
CN202211657023.0A Pending CN116130110A (en) 2022-05-11 2022-05-11 Biological big data analysis, disease precise identification, classification and prediction system based on algorithm and blockchain and application

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202210510098.XA Active CN115050437B (en) 2022-05-11 2022-05-11 Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain

Country Status (1)

Country Link
CN (2) CN115050437B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7953613B2 (en) * 2007-01-03 2011-05-31 Gizewski Theodore M Health maintenance system
US20090092299A1 (en) * 2007-10-03 2009-04-09 Siemens Medical Solutions Usa, Inc. System and Method for Joint Classification Using Feature Space Cluster Labels
CN109346182B (en) * 2018-08-28 2021-06-18 昆明理工大学 CS-RF-based risk early warning method for thalassemia
CN111696663A (en) * 2020-05-26 2020-09-22 平安科技(深圳)有限公司 Disease risk analysis method and device, electronic equipment and computer storage medium
CN113901623A (en) * 2021-10-18 2022-01-07 南京工程学院 SVM power distribution network topology identification method based on cuckoo search algorithm

Also Published As

Publication number Publication date
CN115050437B (en) 2023-04-07
CN115050437A (en) 2022-09-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination