CN115050437B - Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain - Google Patents

Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain

Info

Publication number
CN115050437B
Authority
CN
China
Prior art keywords
data
biological
bird nest
bird
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210510098.XA
Other languages
Chinese (zh)
Other versions
CN115050437A (en
Inventor
罗学敏
李益非
樊心敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan Shengyue Information Technology Co ltd
Original Assignee
Yunnan Shengyue Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan Shengyue Information Technology Co ltd filed Critical Yunnan Shengyue Information Technology Co ltd
Priority to CN202211657023.0A priority Critical patent/CN116130110A/en
Priority to CN202210510098.XA priority patent/CN115050437B/en
Publication of CN115050437A publication Critical patent/CN115050437A/en
Application granted granted Critical
Publication of CN115050437B publication Critical patent/CN115050437B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23: Updating
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/284: Relational databases
    • G06F16/285: Clustering or classification
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00: ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The system comprises a plurality of data terminals, a block chain storage module and a biological data analysis platform, wherein the data terminals are used for uploading biological data and displaying health reports of users; the block chain storage module is used for storing the biological data uploaded by each data terminal; the biological data analysis platform is used for establishing a disease prediction model based on a support vector machine according to biological big data and generating a health report of a user according to biological data uploaded by the user. The invention has the beneficial effects that: the establishment of a disease prediction model based on a support vector machine is helpful for people to know the correlation between biological data and diseases, thereby helping people to know the pathogenesis of diseases and playing a very important role in the prevention, diagnosis, monitoring, prognosis and treatment of diseases.

Description

Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain
Technical Field
The invention relates to the field of biological big data, in particular to a biological big data analysis and disease accurate identification classification prediction system based on an algorithm and a block chain.
Background
With the rapid development of high-throughput biotechnology, the biomedical field has generated large amounts of biological data of many types, including the information that doctors and researchers use to understand which diseases patients suffer from and to determine the treatment regimens that should be used in clinical management. These biological data are therefore crucial to understanding human biology and the diseases we encounter. Advances in biotechnology now make it easy to extract this information, so that large amounts of digital data are captured in machine-readable form. This biology-specific "datafication" produces different types of biological data that reflect the molecular events occurring in disease. When building a personalized disease treatment framework, biological big data must be analyzed in a meaningful and actionable way in order to capture disease information from it.
The support vector machine is a data mining technique that solves machine learning problems by means of optimization. It has developed considerably in recent years and has become an important method for dealing with overfitting ("over-learning") and the curse of dimensionality. Using a support vector machine for disease prediction makes it possible to study the correlation between biological big data and diseases effectively and thus to predict diseases effectively. The support vector machine does, however, require parameter selection, and the choice of parameters directly affects its prediction accuracy and generalization ability. In recent years many researchers have improved parameter optimization methods for the support vector machine. Optimizing the parameters with the cuckoo algorithm gives better results than other methods, but the cuckoo algorithm has shortcomings of its own, such as low optimization accuracy and slow convergence.
Disclosure of Invention
In view of the above problems, the present invention aims to provide a biological big data analysis and disease accurate identification classification prediction system based on an algorithm and a block chain.
The purpose of the invention is realized by the following technical scheme:
the biological big data analysis and disease accurate identification classification prediction system based on the algorithm and the block chain comprises a plurality of data terminals, a block chain storage module and a biological data analysis platform;
a data terminal: comprising a user login unit, a data uploading unit and a health report display unit, wherein the user logs in at the data terminal with a login password through the user login unit; after logging in, the user uploads the user's identity information and biological data to the block chain storage module and the biological data analysis platform through the data uploading unit; and the health report display unit displays the health report of the user received by the data terminal;
a block chain storage module: used for storing the biological data uploaded by each data terminal;
a biological data analysis platform: comprising a biological database, a data preprocessing unit, a data analysis unit and a health report generation unit, wherein the biological database stores biological big data with disease labels and, at a given update period, retrieves and stores the biological big data in the block chain storage module that has not yet been retrieved, thereby updating the biological big data in the biological database; the updated biological big data in the biological database is input to the data preprocessing unit; each time the data preprocessing unit receives new biological big data, it normalizes the received data and clusters the normalized data by a semi-supervised clustering algorithm, so that the biological data without disease labels is labeled, and each cluster obtained is input to the data analysis unit as a sample subset; each time the data analysis unit receives new sample subsets, it retrains and retests the support vector machine according to the new sample subsets and their corresponding disease labels, thereby establishing the disease prediction model; the biological data received from a user is input into the trained disease prediction model to obtain the user's disease prediction result, and the health report generation unit generates the user's health report according to that result and sends it to the user's data terminal for display.
Preferably, the health report of the user comprises the identity information of the user and the disease label of the user.
Preferably, the biological database comprises a tagged biological database and an untagged biological database, the tagged biological database is used for storing biological big data with disease tags, the untagged biological database calls and stores the biological big data which is not called in the blockchain storage module at intervals of a given updating period, so as to update the biological big data in the untagged biological database, and the biological big data in the tagged biological database and the updated biological big data in the untagged biological database are input into the data preprocessing unit.
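The preprocessing described above (normalization followed by labelling of the untagged records) can be sketched as follows. The patent does not name the semi-supervised clustering algorithm, so this sketch substitutes a simple nearest-labelled-centroid assignment; the function names `min_max_normalize` and `label_by_nearest_centroid`, the labels and the toy data are all hypothetical.

```python
import math

def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def label_by_nearest_centroid(labeled, unlabeled):
    """Give each unlabeled sample the label of the closest class centroid.

    labeled: list of (features, label); unlabeled: list of features.
    Assumption: this stands in for the unspecified semi-supervised clustering.
    """
    groups = {}
    for features, label in labeled:
        groups.setdefault(label, []).append(features)
    centroids = {
        label: [sum(col) / len(col) for col in zip(*members)]
        for label, members in groups.items()
    }
    return [
        min(centroids, key=lambda lab: math.dist(x, centroids[lab]))
        for x in unlabeled
    ]

# Toy data: four tagged records and two untagged records (last two rows).
raw = [[1.0, 200.0], [2.0, 220.0], [9.0, 900.0], [10.0, 880.0],
       [1.5, 210.0], [9.5, 890.0]]
norm = min_max_normalize(raw)
tagged = [(norm[0], "healthy"), (norm[1], "healthy"),
          (norm[2], "disease"), (norm[3], "disease")]
predicted = label_by_nearest_centroid(tagged, norm[4:])
print(predicted)
```

Normalizing before measuring distances keeps a feature with a large numeric range from dominating the assignment, which is the usual reason for placing normalization ahead of clustering.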
Preferably, the penalty factor and the kernel function parameter of the support vector machine of the data analysis unit are optimized by using a cuckoo algorithm.
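To make the two optimized quantities concrete, the sketch below evaluates a Gaussian (RBF) kernel, a common kernel choice for support vector machines, and decodes a two-dimensional bird nest position into a penalty factor and a kernel parameter. The patent does not state which kernel is used or how a nest position encodes the parameters; the RBF kernel, the log-scale search ranges and the names `rbf_kernel` and `decode_nest` are assumptions made for illustration.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def decode_nest(position, c_range=(-2.0, 4.0), gamma_range=(-4.0, 1.0)):
    """Map a nest position in [0, 1]^2 to (C, gamma) on a log10 scale.

    The ranges are hypothetical search bounds, not values from the patent.
    """
    u, v = position
    c_exp = c_range[0] + u * (c_range[1] - c_range[0])
    g_exp = gamma_range[0] + v * (gamma_range[1] - gamma_range[0])
    return 10.0 ** c_exp, 10.0 ** g_exp

penalty, gamma = decode_nest((0.5, 0.8))
print(penalty, gamma)
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma))
```

Searching on a logarithmic scale is a common design choice because useful values of the penalty factor and the kernel parameter typically span several orders of magnitude.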
Preferably, in the cuckoo algorithm, let $x_i(t)$ denote the position of the $i$-th bird nest in the population retained after the $t$-th update by Lévy flight, let $X_i(t)$ denote the position of the $i$-th bird nest in the population retained after the $t$-th iterative update, and let $p_a$ denote the discovery probability. For bird nest position $x_i(t)$, a random number $rand$ between 0 and 1 is generated. When $rand \le p_a$, $X_i(t) = x_i(t)$; when $rand > p_a$, the value of $X_i(t)$ is determined in the following manner:
let $x_j(t)$ denote the position of the $j$-th bird nest in the population retained after the $t$-th update by Lévy flight. When $x_j(t)$ satisfies $f(x_j(t)) < f(x_i(t))$, $x_j(t)$ is added to the set $M_i(t)$, where $M_i(t)$ denotes the set of bird nest positions in the population that are better than $x_i(t)$, and $f(x_j(t))$ and $f(x_i(t))$ denote the fitness function values of $x_j(t)$ and $x_i(t)$ respectively. The bird nest positions in $M_i(t)$ are sorted by their Euclidean distance from $x_i(t)$, from nearest to farthest, to form the sequence $Q_i(t) = \{x_{i,l}(t),\ l = 1, 2, \ldots, n_i(t)\}$, where $x_{i,l}(t)$ denotes the $l$-th bird nest position in $Q_i(t)$ and $n_i(t)$ denotes the number of bird nest positions in $Q_i(t)$. Define $H_i(t)$ as the spatial detection coefficient of bird nest position $x_i(t)$; the expression of $H_i(t)$ is:
[Formula image: Figure BDA0003639087370000031]
where $R_{i,l}(t)$ denotes the spatial radius of bird nest position $x_{i,l}(t)$ centered at $x_i(t)$, with $R_{i,l}(t) = |x_{i,l}(t) - x_i(t)|$; $x_{i,n}(t)$ denotes the $n$-th bird nest position in $Q_i(t)$, and $R_{i,n}(t)$ denotes its spatial radius centered at $x_i(t)$, with $R_{i,n}(t) = |x_{i,n}(t) - x_i(t)|$;
[Formula image: Figure BDA0003639087370000032]
denotes the mean of the spatial radii, centered at $x_i(t)$, of the first $k$ bird nest positions in $Q_i(t)$, and
[Formula image: Figure BDA0003639087370000033]
$k$ is a given positive integer satisfying $k < n_i(t)$; $\alpha$ and $\beta$ are weight coefficients satisfying $\alpha, \beta \in (0, 1)$ and $\alpha + \beta = 1$;
let $J_i(t)$ denote the set of better bird nest positions in the population that participate in the random change of $x_i(t)$; the bird nest positions in $J_i(t)$ are determined by means of a parameter $k_i(t)$, specifically:
(1) The value of the parameter $k_i(t)$ is determined from the spatial detection coefficient $H_i(t)$ of bird nest position $x_i(t)$:
[Formula image: Figure BDA0003639087370000034]
where $k_i(t)$ denotes the local range control parameter used when bird nest position $x_i(t)$ is randomly changed,
[Formula image: Figure BDA0003639087370000035]
denotes the median of the spatial detection coefficients of the bird nest positions retained after the $t$-th update of the population by Lévy flight, and
[Formula image: Figure BDA0003639087370000036]
[Formula image: Figure BDA0003639087370000037]
where mean denotes the median function,
[Formula image: Figure BDA0003639087370000038]
denotes rounding down, and $N$ denotes the number of bird nests in the population;
(2) The first $k_i(t)$ bird nest positions in the sequence $Q_i(t)$ are the better bird nest positions that participate in the random change of $x_i(t)$; that is, the first $k_i(t)$ bird nest positions of $Q_i(t)$ are added to the set $J_i(t)$;
the bird nest position $x_i(t)$ is randomly changed in the following manner:
[Formula image: Figure BDA0003639087370000039]
where $\chi_i(t)$ denotes the new bird nest position obtained by randomly changing $x_i(t)$, $rand_1$ is a randomly generated number between 0 and 1,
[Formula image: Figure BDA0003639087370000041]
and
[Formula image: Figure BDA0003639087370000042]
are two bird nest positions randomly selected from the set $J_i(t)$, and
[Formula image: Figure BDA0003639087370000043]
Let $f(\chi_i(t))$ denote the fitness function value of bird nest position $\chi_i(t)$. When $f(\chi_i(t)) \ge f(x_i(t))$, $X_i(t) = x_i(t)$; when $f(\chi_i(t)) < f(x_i(t))$, $X_i(t) = \chi_i(t)$.
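The construction of the set of better bird nest positions and its distance-ordered sequence can be sketched as follows. This is a minimal illustration: the function name `better_nests_sorted` and the toy `sphere` fitness (standing in for the support vector machine error) are hypothetical, and the spatial detection coefficient is not computed because its formula is not reproduced in the text.

```python
import math

def better_nests_sorted(nests, fitness, i):
    """Return the sequence Q_i: nests fitter than nest i, nearest first."""
    xi = nests[i]
    fi = fitness(xi)
    # M_i: every other nest whose fitness value is strictly smaller (better).
    better = [x for j, x in enumerate(nests) if j != i and fitness(x) < fi]
    # Q_i: M_i ordered by Euclidean distance from x_i, nearest to farthest.
    return sorted(better, key=lambda x: math.dist(x, xi))

def sphere(x):
    """Toy fitness where smaller is better (assumption for the demo)."""
    return sum(v * v for v in x)

nests = [(3.0, 3.0), (0.5, 0.5), (2.0, 2.5), (4.0, 4.0), (1.0, 2.0)]
q0 = better_nests_sorted(nests, sphere, 0)
print(q0)
```

For the best nest in the population the sequence is empty, so an implementation needs a rule for that case; the text does not state one.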
The beneficial effects created by the invention are as follows: the disease prediction model is established based on the support vector machine, so that accurate identification and prediction of diseases are realized, people can know the correlation between biological data and the diseases, the pathogenesis of the diseases is known, and the disease prediction model plays an important role in prevention, diagnosis, monitoring, prognosis and treatment of the diseases; the parameters of the support vector machine are optimized through an improved cuckoo algorithm, blindness of manual parameter selection is avoided, and prediction accuracy of the support vector machine is improved.
Drawings
The invention is further described with the aid of the accompanying drawings, in which, however, the embodiments do not constitute any limitation to the invention, and for a person skilled in the art, without inventive effort, further drawings may be derived from the following figures.
FIG. 1 is a schematic diagram of the present invention.
Detailed Description
The invention is further described with reference to the following examples.
Referring to FIG. 1, the biological big data analysis and disease accurate identification classification prediction system based on the algorithm and the block chain of this embodiment comprises a plurality of data terminals, a block chain storage module and a biological data analysis platform;
a data terminal: comprising a user login unit, a data uploading unit and a health report display unit, wherein the user logs in at the data terminal with a login password through the user login unit; after logging in, the user uploads the user's identity information and biological data to the block chain storage module and the biological data analysis platform through the data uploading unit; and the health report display unit displays the health report of the user received by the data terminal;
a block chain storage module: used for storing the biological data uploaded by each data terminal;
a biological data analysis platform: comprising a biological database, a data preprocessing unit, a data analysis unit and a health report generation unit, wherein the biological database stores biological big data with disease labels and, at a given update period, retrieves and stores the biological big data in the block chain storage module that has not yet been retrieved, thereby updating the biological big data in the biological database; the updated biological big data in the biological database is input to the data preprocessing unit; each time the data preprocessing unit receives new biological big data, it normalizes the received data and clusters the normalized data by a semi-supervised clustering algorithm, so that the biological data without disease labels is labeled, and each cluster obtained is input to the data analysis unit as a sample subset; each time the data analysis unit receives new sample subsets, it retrains and retests the support vector machine according to the new sample subsets and their corresponding disease labels, thereby establishing the disease prediction model; the biological data received from a user is input into the trained disease prediction model to obtain the user's disease prediction result, and the health report generation unit generates the user's health report according to that result and sends it to the user's data terminal for display.
Preferably, the health report of the user comprises the identity information of the user and the disease label of the user.
Preferably, the biological database comprises a tagged biological database and an untagged biological database, the tagged biological database is used for storing biological big data with disease tags, the untagged biological database calls and stores biological big data which is not called in the blockchain storage module at intervals of a given updating period, so as to update the biological big data in the untagged biological database, and the biological big data in the tagged biological database and the biological big data in the updated untagged biological database are input to the data preprocessing unit.
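The storage and periodic update described above can be sketched as a hash-linked, append-only store plus a database that fetches only the records it has not fetched before. The patent gives no block chain implementation details, so everything here is an assumption: the class names `ChainStore` and `UntaggedDatabase`, the use of SHA-256, and the cursor that marks already-retrieved blocks.

```python
import hashlib
import json

def _digest(user_id, data, prev_hash):
    """SHA-256 over a canonical JSON encoding of one block's contents."""
    payload = json.dumps({"user": user_id, "data": data, "prev": prev_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

class ChainStore:
    """Append-only list of records, each linked to the previous by its hash."""
    GENESIS = "0" * 64

    def __init__(self):
        self.blocks = []

    def append(self, user_id, data):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        self.blocks.append({"user": user_id, "data": data, "prev": prev_hash,
                            "hash": _digest(user_id, data, prev_hash)})

    def verify(self):
        """Recompute every hash; any edited block breaks the chain."""
        prev_hash = self.GENESIS
        for b in self.blocks:
            if b["prev"] != prev_hash or b["hash"] != _digest(b["user"], b["data"], prev_hash):
                return False
            prev_hash = b["hash"]
        return True

class UntaggedDatabase:
    """Fetches only blocks that were not fetched in earlier update periods."""
    def __init__(self, store):
        self.store = store
        self.cursor = 0      # index of the first block not yet retrieved
        self.records = []

    def update(self):
        new_blocks = self.store.blocks[self.cursor:]
        self.records.extend(b["data"] for b in new_blocks)
        self.cursor = len(self.store.blocks)
        return len(new_blocks)

store = ChainStore()
store.append("user-1", [0.1, 0.2])
store.append("user-2", [0.3, 0.4])
db = UntaggedDatabase(store)
first = db.update()          # retrieves both existing blocks
store.append("user-1", [0.5, 0.6])
second = db.update()         # retrieves only the new block
print(first, second, store.verify())
```

In this sketch `update()` would be called once per update period; a real deployment would also need consensus between nodes and access control, neither of which is modelled here.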
The preferred embodiment establishes a disease prediction model based on a support vector machine, realizes accurate identification and prediction of diseases, and is helpful for people to know the correlation between biological data and diseases, thereby helping people to know the pathogenesis of diseases and playing an important role in prevention, diagnosis, monitoring, prognosis and treatment of diseases.
Preferably, the penalty factor and the kernel function parameter of the support vector machine of the data analysis unit are optimized by using a cuckoo algorithm. In the cuckoo algorithm, the mean square error between the actual output of the support vector machine and the expected output is used as the fitness function; the smaller the fitness function value of a bird nest position, the better that bird nest position.
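A mean-square-error fitness function of the kind described above can be sketched as follows. To keep the example self-contained, a toy threshold classifier stands in for the support vector machine; `make_fitness`, `threshold_model` and the sample data are hypothetical, and in practice the `predict` callback would train and evaluate a support vector machine with the parameters encoded by the nest position.

```python
def mean_squared_error(outputs, expected):
    """Average squared difference between model outputs and expected outputs."""
    if not outputs or len(outputs) != len(expected):
        raise ValueError("outputs and expected must be non-empty and equal in length")
    return sum((o - e) ** 2 for o, e in zip(outputs, expected)) / len(outputs)

def make_fitness(predict, samples, expected):
    """Build a fitness function over nest positions (smaller is better)."""
    def fitness(position):
        return mean_squared_error(predict(position, samples), expected)
    return fitness

def threshold_model(position, samples):
    """Toy stand-in for an SVM: outputs 1 when a sample exceeds position[0]."""
    return [1.0 if s > position[0] else 0.0 for s in samples]

samples = [0.1, 0.4, 0.6, 0.9]
expected = [0.0, 0.0, 1.0, 1.0]
fitness = make_fitness(threshold_model, samples, expected)

nests = [(0.2,), (0.5,), (0.8,)]
best_nest = min(nests, key=fitness)   # the nest with the smallest error wins
print(best_nest, fitness(best_nest))
```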
Preferably, in the cuckoo algorithm, let $x_i(t)$ denote the position of the $i$-th bird nest in the population retained after the $t$-th update by Lévy flight, let $X_i(t)$ denote the position of the $i$-th bird nest in the population retained after the $t$-th iterative update, and let $p_a$ denote the discovery probability. For bird nest position $x_i(t)$, a random number $rand$ between 0 and 1 is generated. When $rand \le p_a$, $X_i(t) = x_i(t)$; when $rand > p_a$, the value of $X_i(t)$ is determined in the following manner:
let $x_j(t)$ denote the position of the $j$-th bird nest in the population retained after the $t$-th update by Lévy flight. When $x_j(t)$ satisfies $f(x_j(t)) < f(x_i(t))$, $x_j(t)$ is added to the set $M_i(t)$, where $M_i(t)$ denotes the set of bird nest positions in the population that are better than $x_i(t)$, and $f(x_j(t))$ and $f(x_i(t))$ denote the fitness function values of $x_j(t)$ and $x_i(t)$ respectively. The bird nest positions in $M_i(t)$ are sorted by their Euclidean distance from $x_i(t)$, from nearest to farthest, to form the sequence $Q_i(t) = \{x_{i,l}(t),\ l = 1, 2, \ldots, n_i(t)\}$, where $x_{i,l}(t)$ denotes the $l$-th bird nest position in $Q_i(t)$ and $n_i(t)$ denotes the number of bird nest positions in $Q_i(t)$. Define $H_i(t)$ as the spatial detection coefficient of bird nest position $x_i(t)$; the expression of $H_i(t)$ is:
[Formula image: Figure BDA0003639087370000061]
where $R_{i,l}(t)$ denotes the spatial radius of bird nest position $x_{i,l}(t)$ centered at $x_i(t)$, with $R_{i,l}(t) = |x_{i,l}(t) - x_i(t)|$; $x_{i,n}(t)$ denotes the $n$-th bird nest position in $Q_i(t)$, and $R_{i,n}(t)$ denotes its spatial radius centered at $x_i(t)$, with $R_{i,n}(t) = |x_{i,n}(t) - x_i(t)|$;
[Formula image: Figure BDA0003639087370000062]
denotes the mean of the spatial radii, centered at $x_i(t)$, of the first $k$ bird nest positions in $Q_i(t)$, and
[Formula image: Figure BDA0003639087370000063]
$k$ is a given positive integer satisfying $k \le n_i(t)$, and $k$ may be taken as 5; $\alpha$ and $\beta$ are weight coefficients satisfying $\alpha, \beta \in (0, 1)$ and $\alpha + \beta = 1$;
let $J_i(t)$ denote the set of better bird nest positions in the population that participate in the random change of $x_i(t)$; the bird nest positions in $J_i(t)$ are determined by means of a parameter $k_i(t)$, specifically:
(1) The value of the parameter $k_i(t)$ is determined from the spatial detection coefficient $H_i(t)$ of bird nest position $x_i(t)$:
[Formula image: Figure BDA0003639087370000064]
where $k_i(t)$ denotes the local range control parameter used when bird nest position $x_i(t)$ is randomly changed,
[Formula image: Figure BDA0003639087370000065]
denotes the median of the spatial detection coefficients of the bird nest positions retained after the $t$-th update of the population by Lévy flight, and
[Formula image: Figure BDA0003639087370000066]
[Formula image: Figure BDA0003639087370000067]
where mean denotes the median function,
[Formula image: Figure BDA00036390873700000612]
denotes rounding down, and $N$ denotes the number of bird nests in the population;
(2) The first $k_i(t)$ bird nest positions in the sequence $Q_i(t)$ are the better bird nest positions that participate in the random change of $x_i(t)$; that is, the first $k_i(t)$ bird nest positions of $Q_i(t)$ are added to the set $J_i(t)$;
the bird nest position $x_i(t)$ is randomly changed in the following manner:
[Formula image: Figure BDA0003639087370000068]
where $\chi_i(t)$ denotes the new bird nest position obtained by randomly changing $x_i(t)$, $rand_1$ is a randomly generated number between 0 and 1,
[Formula image: Figure BDA0003639087370000069]
and
[Formula image: Figure BDA00036390873700000610]
are two bird nest positions randomly selected from the set $J_i(t)$, and
[Formula image: Figure BDA00036390873700000611]
Let $f(\chi_i(t))$ denote the fitness function value of bird nest position $\chi_i(t)$. When $f(\chi_i(t)) \ge f(x_i(t))$, $X_i(t) = x_i(t)$; when $f(\chi_i(t)) < f(x_i(t))$, $X_i(t) = \chi_i(t)$.
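The discovery test and the greedy acceptance rule described above can be sketched as follows. The patent's own random-change formula appears only as an image that is not reproduced in the text, so this sketch assumes the classic cuckoo-search form, candidate = x_i + rand_1 · (x_p − x_q), with x_p and x_q drawn from the set J_i(t). The function name `random_change`, the fallback for sets with fewer than two positions and the toy `sphere` fitness are also assumptions.

```python
import random

def random_change(xi, j_set, fitness, p_a, rng):
    """Return X_i(t) for one nest, following the keep/change rule in the text."""
    # rand <= p_a: the nest is kept unchanged. The len() check is an assumed
    # fallback, since two distinct positions are needed for the change.
    if rng.random() <= p_a or len(j_set) < 2:
        return xi
    xp, xq = rng.sample(j_set, 2)          # two positions drawn from J_i(t)
    rand1 = rng.random()
    # Assumed classic form of the random change (not the patent's formula).
    candidate = tuple(a + rand1 * (b - c) for a, b, c in zip(xi, xp, xq))
    # Greedy acceptance: adopt the candidate only if it is strictly better.
    return candidate if fitness(candidate) < fitness(xi) else xi

def sphere(x):
    """Toy fitness where smaller is better (assumption for the demo)."""
    return sum(v * v for v in x)

rng = random.Random(42)
xi = (3.0, 3.0)
j_set = [(2.0, 2.5), (1.0, 2.0), (0.5, 0.5)]
new_xi = random_change(xi, j_set, sphere, p_a=0.25, rng=rng)
print(new_xi, sphere(new_xi))
```

Because of the greedy acceptance, a nest's fitness value never gets worse in this step, whatever formula generates the candidate.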
In this preferred embodiment, the penalty factor and the kernel function parameter of the support vector machine are optimized with the cuckoo algorithm, which avoids the blindness of manual parameter selection and improves the classification accuracy of the support vector machine. The traditional cuckoo algorithm suffers from insufficient local optimization accuracy and slow convergence, which can prevent it from finding the optimal parameters of the support vector machine. To improve the accuracy with which the cuckoo algorithm optimizes the support vector machine, this preferred embodiment therefore improves the traditional cuckoo algorithm, aiming to raise both its optimization accuracy and its convergence speed, as follows.
After the traditional cuckoo algorithm updates the bird nest positions by Lévy flight, it randomly changes the positions of some bird nests in the population: two bird nest positions are selected at random from the population and used to randomly change the current bird nest position. This random change is too random and lacks adaptivity, so it does little to improve local optimization accuracy or convergence speed. In this preferred embodiment, when a bird nest position is randomly changed, two better bird nest positions are selected from the population to drive the change, which improves the convergence speed of the algorithm.
Further, to strengthen the local optimization accuracy of the algorithm while keeping it from falling into a local optimum, this preferred embodiment uses the defined spatial detection coefficient to measure the spatial overlap between the bird nest position and the better bird nest positions near it in the population. When the spatial detection coefficient of the bird nest position is small, the local space formed by the bird nest position and the nearby better bird nest positions overlaps heavily; the parameter $k_i(t)$ is then made large, that is, more bird nest positions are selected from the sequence $Q_i(t)$ to participate in the random change of the bird nest position, which increases the diversity of the population. When the spatial detection coefficient of the bird nest position is large, the overlap is small; the parameter $k_i(t)$ is then made small, that is, fewer bird nest positions are selected from $Q_i(t)$ to participate in the random change, which strengthens the local search of the cuckoo during the random change and improves the optimization accuracy of the algorithm.
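The Lévy flight update that precedes the random change is not spelled out in the text. A common way to realise it in cuckoo search is Mantegna's algorithm for heavy-tailed step lengths, sketched below. The exponent `beta = 1.5`, the step scale `0.01`, the move relative to the current best nest, and the names `levy_step` and `levy_update` are conventional choices assumed for illustration, not values taken from the patent.

```python
import math
import random

def levy_step(beta, rng):
    """Draw one heavy-tailed step length with Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero in the exact-zero case
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def levy_update(x, best, fitness, rng, step_scale=0.01, beta=1.5):
    """Propose a Lévy flight move for nest x and retain it only if it is better."""
    candidate = tuple(
        xd + step_scale * levy_step(beta, rng) * (xd - bd)
        for xd, bd in zip(x, best)
    )
    return candidate if fitness(candidate) < fitness(x) else x

def sphere(x):
    """Toy fitness where smaller is better (assumption for the demo)."""
    return sum(v * v for v in x)

rng = random.Random(0)
start = (2.0, -1.5)
best = (0.1, 0.2)
nest = start
for _ in range(200):
    nest = levy_update(nest, best, sphere, rng)
print(sphere(start), sphere(nest))
```

Most Lévy steps are short, which refines a nest locally, while the occasional very long step lets the search leave a poor region; the random change stage described above then works on the positions this update retains.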
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the protection scope of the present invention, although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (3)

1. The system for analyzing biological big data and accurately identifying, classifying and predicting diseases based on algorithms and block chains is characterized by comprising a plurality of data terminals, a block chain storage module and a biological data analysis platform;
a data terminal: comprising a user login unit, a data uploading unit and a health report display unit, wherein the user logs in at the data terminal with a login password through the user login unit; after logging in, the user uploads the user's identity information and biological data to the block chain storage module and the biological data analysis platform through the data uploading unit; and the health report display unit displays the health report of the user received by the data terminal;
a block chain storage module: used for storing the biological data uploaded by each data terminal;
a biological data analysis platform: comprising a biological database, a data preprocessing unit, a data analysis unit and a health report generation unit, wherein the biological database stores biological big data with disease labels and, at a given update period, retrieves and stores the biological big data in the block chain storage module that has not yet been retrieved, thereby updating the biological big data in the biological database; the updated biological big data in the biological database is input to the data preprocessing unit; each time the data preprocessing unit receives new biological big data, it normalizes the received data and clusters the normalized data by a semi-supervised clustering algorithm, so that the biological data without disease labels is labeled, and each cluster obtained is input to the data analysis unit as a sample subset; each time the data analysis unit receives new sample subsets, it retrains and retests the support vector machine according to the new sample subsets and their corresponding disease labels, thereby establishing the disease prediction model; the biological data received from a user is input into the trained disease prediction model to obtain the user's disease prediction result, and the health report generation unit generates the user's health report according to that result and sends it to the user's data terminal for display;
the penalty factor and the kernel function parameter of the support vector machine of the data analysis unit are optimized using a cuckoo algorithm; in the cuckoo algorithm, let x_i(t) denote the position of the i-th bird nest in the population retained after the t-th update using Lévy flight, let X_i(t) denote the position of the i-th bird nest in the population retained after the t-th iteration update, and let p_a denote the discovery probability; for bird nest position x_i(t), a random number rand between 0 and 1 is generated; when rand ≤ p_a, X_i(t) = x_i(t); when rand > p_a, the value of X_i(t) is determined in the following manner:
let x_j(t) denote the position of the j-th bird nest in the population retained after the t-th update using Lévy flight; when bird nest position x_j(t) satisfies f(x_j(t)) < f(x_i(t)), bird nest position x_j(t) is added to the set M_i(t), where M_i(t) denotes the set of bird nest positions in the population that are better than bird nest position x_i(t), f(x_j(t)) denotes the fitness function value of bird nest position x_j(t), and f(x_i(t)) denotes the fitness function value of bird nest position x_i(t); the bird nest positions in set M_i(t) are sorted by their Euclidean distance to bird nest position x_i(t), from near to far, to form a sequence Q_i(t), expressed as Q_i(t) = {x_{i,l}(t), l = 1, 2, ..., n_i(t)}, where x_{i,l}(t) denotes the l-th bird nest position in sequence Q_i(t) and n_i(t) denotes the number of bird nest positions in sequence Q_i(t); defining H_i(t) as the spatial detection coefficient of bird nest position x_i(t), the expression of H_i(t) is:
Figure FDA0003981915600000021
where R_{i,l}(t) denotes the spatial radius of bird nest position x_{i,l}(t) centered on bird nest position x_i(t), with R_{i,l}(t) = |x_{i,l}(t) - x_i(t)|; let x_{i,n}(t) denote the n-th bird nest position in sequence Q_i(t), and let R_{i,n}(t) denote the spatial radius of bird nest position x_{i,n}(t) centered on bird nest position x_i(t), with R_{i,n}(t) = |x_{i,n}(t) - x_i(t)|;
Figure FDA0003981915600000022
denotes the mean of the spatial radii, centered on bird nest position x_i(t), of the first k bird nest positions in sequence Q_i(t), and
Figure FDA0003981915600000023
k is a given positive integer satisfying k < n_i(t); α and β are weighting coefficients satisfying α, β ∈ (0, 1) and α + β = 1;
let J_i(t) denote the set of preferred bird nest positions in the population that participate in the random change of bird nest position x_i(t); the preferred bird nest positions in set J_i(t) are determined using the parameter k_i(t), specifically:
(1) The value of parameter k_i(t) is determined according to the spatial detection coefficient H_i(t) of bird nest position x_i(t):
Figure FDA0003981915600000024
where k_i(t) denotes the local range control parameter used when bird nest position x_i(t) is randomly changed,
Figure FDA0003981915600000025
denotes the median of the spatial detection coefficients of the bird nest positions retained after the population's t-th update using Lévy flight, and
Figure FDA0003981915600000026
where median denotes the median function,
Figure FDA0003981915600000027
denotes rounding down, and N denotes the number of bird nests in the population;
(2) The first k_i(t) bird nest positions in sequence Q_i(t) are the preferred bird nest positions that participate in the random change of bird nest position x_i(t); that is, the first k_i(t) bird nest positions in sequence Q_i(t) are selected and added to set J_i(t);
bird nest position x_i(t) is randomly changed in the following manner:
Figure FDA0003981915600000031
where χ_i(t) denotes the new bird nest position obtained by randomly changing bird nest position x_i(t), rand_1 is a randomly generated number between 0 and 1,
Figure FDA0003981915600000032
and
Figure FDA0003981915600000033
are bird nest positions randomly selected from set J_i(t), and
Figure FDA0003981915600000034
let f(χ_i(t)) denote the fitness function value of bird nest position χ_i(t); when f(χ_i(t)) ≥ f(x_i(t)), X_i(t) = x_i(t); when f(χ_i(t)) < f(x_i(t)), X_i(t) = χ_i(t).
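The discovery step of claim 1 can be illustrated with a minimal Python sketch, under stated assumptions. It covers only what the claim text states in words: building M_i(t), sorting it into Q_i(t), taking the first k_i(t) entries as J_i(t), and the greedy acceptance rule. The formulas for H_i(t), k_i(t) and the random change exist only as formula images, so k_i is passed in as a plain integer and the random change is a caller-supplied function. The function names and the stand-in `standard_random_walk` (the ordinary cuckoo-search biased random walk, not the claimed formula) are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Nest = Sequence[float]


def better_neighbors_sorted(i: int, nests: List[Nest],
                            fitness: Callable[[Nest], float]) -> List[Nest]:
    """Build Q_i(t): nests with a lower fitness value than nest i,
    sorted by Euclidean distance to nest i, nearest first."""
    xi = nests[i]
    fi = fitness(xi)
    m_i = [xj for j, xj in enumerate(nests) if j != i and fitness(xj) < fi]
    return sorted(m_i, key=lambda xj: math.dist(xj, xi))


def standard_random_walk(xi: Nest, j_i: List[Nest], rng: random.Random) -> List[float]:
    """Stand-in random change (assumption): x_i + rand_1 * (a - b),
    with a and b drawn at random from J_i(t)."""
    a, b = rng.choice(j_i), rng.choice(j_i)
    r = rng.random()
    return [x + r * (p - q) for x, p, q in zip(xi, a, b)]


def discovery_step(i: int, nests: List[Nest], fitness: Callable[[Nest], float],
                   p_a: float, k_i: int,
                   random_change: Callable[[Nest, List[Nest], random.Random], Nest],
                   rng: random.Random) -> Nest:
    """Return X_i(t) for nest i, following the rule stated in the claim."""
    xi = nests[i]
    if rng.random() <= p_a:      # rand <= p_a: keep x_i(t) unchanged
        return xi
    j_i = better_neighbors_sorted(i, nests, fitness)[:k_i]  # J_i(t)
    if not j_i:                  # assumption: no better nest, so keep x_i(t)
        return xi
    chi = random_change(xi, j_i, rng)
    # Greedy acceptance: take chi only if it has a strictly lower fitness value.
    return chi if fitness(chi) < fitness(xi) else xi
```

Passing the random change in as a function keeps the sketch independent of the formula that is not reproduced in the text; a faithful implementation would substitute the claimed expression and compute k_i(t) from H_i(t).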
2. The system according to claim 1, wherein the health report of the user comprises the identity information of the user and a disease label of the user.
3. The system according to claim 1, wherein the biological database comprises a labeled biological database and an unlabeled biological database; the labeled biological database is used for storing biological data with disease labels; the unlabeled biological database retrieves and stores, at each given update period, the biological data in the blockchain storage module that has not yet been retrieved, thereby updating the biological data in the unlabeled biological database; and the biological data in the labeled biological database and the biological data in the updated unlabeled biological database are input into the data preprocessing unit.
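The preprocessing flow described in claims 1 and 3 (normalize labeled and unlabeled data together, then label the unlabeled data by semi-supervised clustering) can be sketched as follows. The claims do not name the normalization method or the specific clustering algorithm, so min-max scaling and a seeded k-means variant are assumptions chosen for illustration; the function names and the example labels are hypothetical. Retraining the support vector machine on the resulting subsets is left out of the sketch.

```python
import math
from typing import List, Sequence


def min_max_normalize(rows: List[Sequence[float]]) -> List[List[float]]:
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]


def _centroid(points: List[Sequence[float]]) -> List[float]:
    return [sum(c) / len(points) for c in zip(*points)]


def seeded_kmeans_labels(labeled: List[Sequence[float]], labels: List[str],
                         unlabeled: List[Sequence[float]],
                         max_iter: int = 20) -> List[str]:
    """Assign a disease label to each unlabeled sample (assumed algorithm).

    Centroids start from the labeled samples of each class; labeled samples
    keep their labels, unlabeled samples move to the nearest centroid until
    the assignment stops changing."""
    classes = sorted(set(labels))
    centroids = {c: _centroid([p for p, y in zip(labeled, labels) if y == c])
                 for c in classes}
    assigned: List[str] = []
    for _ in range(max_iter):
        new_assigned = [min(classes, key=lambda c: math.dist(p, centroids[c]))
                        for p in unlabeled]
        if new_assigned == assigned:
            break
        assigned = new_assigned
        for c in classes:
            members = [p for p, y in zip(labeled, labels) if y == c]
            members += [p for p, y in zip(unlabeled, assigned) if y == c]
            centroids[c] = _centroid(members)
    return assigned
```

In this sketch each class, made up of its labeled samples plus the newly labeled ones, would form one sample subset passed on to the data analysis unit.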
CN202210510098.XA 2022-05-11 2022-05-11 Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain Active CN115050437B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211657023.0A CN116130110A (en) 2022-05-11 2022-05-11 Biological big data analysis, disease precise identification, classification and prediction system based on algorithm and blockchain and application
CN202210510098.XA CN115050437B (en) 2022-05-11 2022-05-11 Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210510098.XA CN115050437B (en) 2022-05-11 2022-05-11 Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202211657023.0A Division CN116130110A (en) 2022-05-11 2022-05-11 Biological big data analysis, disease precise identification, classification and prediction system based on algorithm and blockchain and application

Publications (2)

Publication Number Publication Date
CN115050437A CN115050437A (en) 2022-09-13
CN115050437B CN115050437B (en) 2023-04-07

Family

ID=83157943

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210510098.XA Active CN115050437B (en) 2022-05-11 2022-05-11 Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain
CN202211657023.0A Pending CN116130110A (en) 2022-05-11 2022-05-11 Biological big data analysis, disease precise identification, classification and prediction system based on algorithm and blockchain and application

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202211657023.0A Pending CN116130110A (en) 2022-05-11 2022-05-11 Biological big data analysis, disease precise identification, classification and prediction system based on algorithm and blockchain and application

Country Status (1)

Country Link
CN (2) CN115050437B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7953613B2 (en) * 2007-01-03 2011-05-31 Gizewski Theodore M Health maintenance system
US20090092299A1 (en) * 2007-10-03 2009-04-09 Siemens Medical Solutions Usa, Inc. System and Method for Joint Classification Using Feature Space Cluster Labels
CN109346182B (en) * 2018-08-28 2021-06-18 昆明理工大学 CS-RF-based risk early warning method for thalassemia
CN111696663A (en) * 2020-05-26 2020-09-22 平安科技(深圳)有限公司 Disease risk analysis method and device, electronic equipment and computer storage medium
CN113901623A (en) * 2021-10-18 2022-01-07 南京工程学院 SVM power distribution network topology identification method based on cuckoo search algorithm

Also Published As

Publication number Publication date
CN116130110A (en) 2023-05-16
CN115050437A (en) 2022-09-13

Similar Documents

Publication Publication Date Title
Chakraborty et al. Novel Enhanced-Grey Wolf Optimization hybrid machine learning technique for biomedical data computation
Mucherino et al. Data mining in agriculture
JP2006518062A (en) Prediction algorithm training and testing database optimization system and method
CN106021990B (en) A method of biological gene is subjected to classification and Urine scent with specific character
CN113517066B (en) Depression assessment method and system based on candidate gene methylation sequencing and deep learning
CN113113130A (en) Tumor individualized diagnosis and treatment scheme recommendation method
CN110853756B (en) Esophagus cancer risk prediction method based on SOM neural network and SVM
CN112037925B (en) LSTM algorithm-based early warning method for new major infectious diseases
CN110021341A (en) A kind of prediction technique of GPCR drug based on heterogeneous network and targeting access
Zaman et al. Codon based back propagation neural network approach to classify hypertension gene sequences
Hagiwara et al. BEANS: The benchmark of animal sounds
Bajaj et al. Heart Disease Prediction using Ensemble ML
CN113764034B (en) Method, device, equipment and medium for predicting potential BGC in genome sequence
CN113642613B (en) Medical disease feature selection method based on improved goblet sea squirt swarm algorithm
Kiran et al. A gradient boosted decision tree with binary spotted hyena optimizer for cardiovascular disease detection and classification
CN117198517B (en) Modeling method of motion reactivity assessment and prediction model based on machine learning
CN115050437B (en) Biological big data analysis and disease accurate identification classification prediction system based on algorithm and block chain
CN115985503B (en) Cancer prediction system based on ensemble learning
CN117351484A (en) Tumor stem cell characteristic extraction and classification system based on AI
CN116580848A (en) Multi-head attention mechanism-based method for analyzing multiple groups of chemical data of cancers
CN115206423A (en) Label guidance-based protein action relation prediction method
Barros et al. Supervised training of a simple digital assistant for a free crop clinic
Sharma et al. Prediction of cardiovascular diseases using genetic algorithm and deep learning techniques
CN113284620A (en) Method for establishing occupational health data analysis model
Madadi et al. Detecting retinal neural and stromal cell classes and ganglion cell subtypes based on transcriptome data with deep transfer learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Algorithm and blockchain based biological big data analysis, precise disease identification, classification and prediction system

Granted publication date: 20230407

Pledgee: Hua Xia Bank Co.,Ltd. Kunming Branch

Pledgor: Yunnan Shengyue Information Technology Co.,Ltd.

Registration number: Y2024980005240
