CN117708139A - Optimized storage and retrieval method for digital health database - Google Patents


Info

Publication number
CN117708139A
Authority
CN
China
Prior art keywords
data
health data
user health
user
day
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410165799.3A
Other languages
Chinese (zh)
Other versions
CN117708139B (en)
Inventor
肖俊
赵海珠
彭嘉聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jun'an Huier Health Technology Co ltd
Original Assignee
Beijing Jun'an Huier Health Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jun'an Huier Health Technology Co ltd
Priority to CN202410165799.3A
Publication of CN117708139A
Application granted
Publication of CN117708139B
Legal status: Active
Anticipated expiration

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to the technical field of optimized database storage, and in particular to an optimized storage and retrieval method for a digital health database. The method comprises: acquiring the user health data of each user; obtaining a daily heat index of the user health data from its basic data, health performance data and daily query count; clustering all heat indexes of each day and obtaining a category value for the heat index of each user health data from the clustering result; obtaining a daily query-count correction value of the user health data from its daily query trend and the fluctuation of its query counts; obtaining a daily retrieval weight of the user health data from the daily query-count correction value, the heat index and the corresponding category value; and readjusting the B+ tree structure of each day according to the daily retrieval weight and heat index of each user's health data. The invention aims to improve the retrieval speed of hot and cold data.

Description

Optimized storage and retrieval method for digital health database
Technical Field
The application relates to the technical field of database optimized storage, in particular to a digital health database optimized storage retrieval method.
Background
Digital health relies on information technology to continuously improve the level of medical services and optimize the user experience. Health data is the cornerstone of digital health, and database technology from the computer field is widely used in it.
A database, as a large structured data set, provides efficient data storage, management and retrieval. As the volume of health data grows, databases play an increasingly important role in digital health. With a database, a medical institution can effectively organize and store patients' electronic health records and quickly retrieve and update medical information. This not only improves the efficiency of medical services but also helps medical staff better understand patients' medical history and health condition.
Traditional databases often store data in a B+ tree, a data structure that effectively reduces the number of disk reads and writes. However, the data in the leaf nodes of the B+ tree is usually arranged sequentially and a leaf node is searched sequentially, so retrieval is inefficient and the hot or cold state of the data is not taken into account.
Disclosure of Invention
To solve the above technical problems, the invention provides an optimized storage and retrieval method for a digital health database.
The optimized storage and retrieval method for a digital health database of the invention adopts the following technical scheme:
one embodiment of the invention provides an optimized storage and retrieval method for a digital health database, comprising the following steps:
acquiring user health data of each user, the user health data comprising basic data and health performance data, and storing the user health data in a B+ tree;
obtaining a cold-hot proportionality coefficient of the user health data according to the degree of difference between the basic data and health performance data of the user health data and the normal reference range; obtaining a daily heat index of the user health data according to its cold-hot proportionality coefficient and daily query count; clustering the daily heat indexes of all user health data to obtain clusters, and obtaining the category value of the daily heat index of each user health data in each cluster according to the heat index difference between clusters;
obtaining a daily query-count correction value of the user health data according to its daily query trend over the acquisition days and the fluctuation of its daily query counts; arranging the daily query-count correction values of the user health data over the acquisition days in time order to form the correction value sequence of the user health data; obtaining the daily retrieval weight of the user health data according to its correction value sequence, daily heat index and corresponding category value;
and readjusting the B+ tree structure of each day according to the daily retrieval weight and heat index of each user's health data.
Preferably, the user health data includes basic data and health performance data, including:
the base data includes, but is not limited to, age, height, weight, number of queries per day over the number of days of collection; the health performance data includes, but is not limited to, blood glucose, blood lipid, systolic pressure, diastolic pressure.
Preferably, obtaining the cold-hot proportionality coefficient of the user health data according to the degree of difference between the basic data and health performance data of the user health data and the normal reference range comprises:
for each health performance data in the user health data, acquiring the minimum distance between that health performance data and its normal reference range; calculating the sum of the minimum distances of all health performance data in the user health data;
calculating the absolute value of the difference between the age in the user health data and the average age of all users; and taking the sum of the sum of minimum distances and the absolute value of the difference as the cold-hot proportionality coefficient of the user health data.
Preferably, acquiring the minimum distance between each health performance data and the normal reference range comprises:
setting the minimum distance to 0 when the health performance data falls within the normal reference range;
when the health performance data does not fall within the normal reference range, taking the smaller of the absolute differences between the health performance data and the minimum value and the maximum value of the normal reference range as the minimum distance.
Preferably, obtaining the daily heat index of the user health data according to its cold-hot proportionality coefficient and daily query count comprises:
acquiring the average query count of all user health data on the day before each day; taking the negative of the reciprocal of the product of this average query count and the cold-hot proportionality coefficient of the user health data as the exponent of an exponential function with the natural constant as its base;
acquiring the query count and the heat index of the user health data on the day before each day; and adding the product of the result of the exponential function and the heat index of the previous day to the query count of the previous day to obtain the daily heat index of the user health data.
Preferably, the obtaining the category value of the heat index of each user health data day in each cluster according to the heat index difference between clusters includes:
sequencing the heat indexes of the clustering centers corresponding to the clustering clusters from large to small to obtain a class sequence;
and taking the position order of each cluster center in the class sequence as the class value of the heat index of all user health data in the cluster to which each cluster center belongs.
Preferably, obtaining the daily query-count correction value of the user health data according to its daily query trend over the acquisition days and the fluctuation of its daily query counts comprises:
calculating the variance of the query counts over the acquisition days other than the j-th day, and taking the negative of this variance as the exponent of an exponential function with the natural constant as its base;
acquiring the query-count trend value of the user health data on the j-th day; calculating the product of the result of the exponential function and the query-count trend value; calculating the product of the query count of the j-th day and the difference obtained by subtracting the result of the exponential function from 1;
and taking the sum of the two products as the query-count correction value of the user health data on the j-th day.
Preferably, the acquiring the trend value of the query times on the j th day of the user health data includes:
acquiring the query counts of the user health data on the (j-1)-th day and the (j-2)-th day; and calculating the sum of the query count of the (j-1)-th day and the difference between the query counts of the (j-1)-th day and the (j-2)-th day as the query-count trend value of the user health data on the j-th day.
Preferably, the acquiring the daily retrieval weight of the user health data according to the correction value sequence of the user health data, the daily heat index and the corresponding category value includes:
acquiring a predicted value of the query times of each day by adopting an LSTM neural network for the correction value sequence; calculating the product of a preset regulating factor and the predicted value of the query times;
calculating the product of the daily heat index of the user health data and the difference obtained by subtracting the preset adjustment factor from 1; acquiring the category value of the daily heat index of the user health data; and taking the quotient of the sum of the two products and the category value as the daily retrieval weight of the user health data.
Preferably, readjusting the B+ tree structure of each day according to the daily retrieval weight and heat index of each user's health data comprises:
for the B+ tree storing the data of each day, sorting the user health data of all users by daily heat index from largest to smallest and storing the sorted result in the B+ tree;
and sorting the user health data within each leaf node of the B+ tree by daily retrieval weight from largest to smallest, and recording the median of all retrieval weights of each leaf node in that leaf node.
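The two-level ordering described in this step can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: leaf nodes are modelled as plain lists of a fixed capacity rather than nodes of a real B+ tree, and the names `build_leaves`, `capacity` and the record layout `(user_key, heat_index, retrieval_weight)` are hypothetical.

```python
from statistics import median

def build_leaves(records, capacity):
    """Pack records into leaves by heat index, then order each leaf by weight.

    records: list of (user_key, heat_index, retrieval_weight) tuples.
    Leaves are simplified to dicts; a real B+ tree would also keep index nodes.
    """
    # Global order: hottest data first.
    by_heat = sorted(records, key=lambda r: r[1], reverse=True)
    leaves = []
    for start in range(0, len(by_heat), capacity):
        chunk = by_heat[start:start + capacity]
        # Order inside a leaf: largest retrieval weight first.
        entries = sorted(chunk, key=lambda r: r[2], reverse=True)
        leaves.append({
            "entries": entries,
            "median_weight": median(r[2] for r in entries),
        })
    return leaves

# Hypothetical records, for illustration only.
records = [("u1", 9.0, 2.0), ("u2", 8.0, 5.0), ("u3", 1.0, 0.5), ("u4", 0.5, 0.75)]
leaves = build_leaves(records, capacity=2)
```

With these sample records the two hottest users share the first leaf, and inside that leaf `u2` precedes `u1` because its retrieval weight is larger.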
The invention has at least the following beneficial effects:
First, the invention constructs a cold-hot proportionality coefficient for each user's health data based on how much users at different ages care about their health and on whether they have unhealthy conditions. This captures how much users care about their health data and distinguishes hot data from cold data, which facilitates the subsequent construction of the query tree: hot data is given a larger retrieval weight so that it is easier for users to query. Second, a user's past query counts and the overall query activity of all users indirectly affect how likely the user is to query the health data on a given day, reflecting the influence of historical data on current data and of global data on local data. The daily heat index of the user health data is therefore obtained by iterating step by step over the query counts of previous days, and the rate at which heat decays is determined in combination with the overall hot or cold state of the database, which governs how likely cold data is to be queried. Whereas traditional hot and cold data classification considers only the frequency of data access, the invention introduces Newton's law of cooling to judge hot and cold data more accurately, so that the subsequently constructed B+ tree better matches the actual daily heat demand;
the invention also analyzes the historical access pattern of the user health data and suppresses the overall influence of burst queries, so that data is not given a high heat merely because its query count rises suddenly on one day, and the daily query count of the user health data is corrected more accurately. It takes into account the regularity of users' access to their own health data, namely that after accessing the data a user does not access it repeatedly on the following consecutive days. Furthermore, the retrieval weight is constructed by combining the predicted daily query count of the user health data with its heat index and category; based on the category of concern, the history and the predicted behavior of different users, the order of the user health data within the leaf nodes of the B+ tree is rearranged, so that the constructed B+ tree is more convenient for different kinds of users to query and the query speed is improved. Finally, the B+ tree structure is readjusted using the daily heat index and retrieval weight of each user's health data, and the median retrieval weight of each leaf node is stored in that leaf node, which reduces the number of disk reads and writes and effectively improves the retrieval speed of hot and cold data.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for optimizing storage and retrieval of a digital health database;
FIG. 2 is a schematic diagram of a burst in query counts;
FIG. 3 is a flow chart of constructing the daily retrieval weight of user health data;
FIG. 4 is a schematic diagram of the median Z of the retrieval weights of user health data stored in each leaf node of the B+ tree.
Detailed Description
In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following description refers to the specific implementation, structure, characteristics and effects of a digital health database optimized storage searching method according to the invention in combination with the accompanying drawings and the preferred embodiment. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the optimized storage and retrieval method of the digital health database provided by the invention with reference to the accompanying drawings.
The invention provides a digital health database optimized storage retrieval method.
Specifically, the following method for optimizing storage and retrieval of a digital health database is provided, please refer to fig. 1, and the method comprises the following steps:
step S001, collecting user health data of each user, including basic data and health performance data.
Health data such as the age, height, weight, blood glucose, blood lipid, systolic pressure and diastolic pressure of each user are collected. The mobile phone number registered by the user serves as the unique primary key and retrieval index of the user health data. A database of the user health data of each user is built from the collected data, the queries and modifications of each user's health data are monitored continuously for M days, and the finally modified data after the M days of monitoring is taken as that user's health data. Assume the database holds the user health data of N users in total. In this embodiment M takes the empirical value 30; the practitioner can set it according to the actual situation.
Here, the user's age, height, weight and similar data are called the user's basic data, and the user's blood glucose, blood lipid, systolic pressure, diastolic pressure and similar data are called the user's health performance data, denoted P.
The user health data of each user, comprising basic data and health performance data, is thus obtained, which allows the hot or cold state of each user's health data to be analyzed specifically.
Step S002, constructing the daily heat index and retrieval weight of each user's health data according to each user's degree of concern for their own health data, the historical access pattern of the user health data, and anomalies in its historical access.
In this embodiment, the user health data of each user is stored in a B+ tree so that users can query it conveniently. However, different groups of people care about their own health data to different degrees. For example, middle-aged and elderly people pay more attention to their health data and view it frequently, whereas younger people pay less attention and may query it only once in a long while. Health data is therefore divided into hot and cold: hot data that is queried frequently and cold data that is rarely accessed. If the user health data of each user is stored directly in the B+ tree, it cannot be stored optimally according to the characteristics of hot and cold data, and problems such as slow retrieval arise when users query the health database.
Newton's law of cooling describes how the temperature of an object changes under the influence of its surroundings. The change in surface temperature is affected by several inherent factors, such as the contact surface area of the object and the structural characteristics of its material, and the efficiency of heat conduction and exchange with the external environment differs between objects. Likewise, for different data records, the cold-hot proportionality coefficient varies with the inherent attributes of the data. In the original Newton's law of cooling, k denotes the heat transfer coefficient between the ambient temperature and the temperature of the object itself. A cold-hot proportionality coefficient can be calculated according to the importance of each data record; in this embodiment, taking user health data a as an example, the cold-hot proportionality coefficient $k_a$ is calculated as:
$$k_a = \left| age_a - mean(age) \right| + \sum_{i=1}^{n_a} DE(P_{ai}, \Omega_i);$$
$$DE(P_{ai}, \Omega_i) = \begin{cases} 0, & P_{ai} \in \Omega_i \\ \min\left( \left| P_{ai} - \Omega_{min,i} \right|, \left| P_{ai} - \Omega_{max,i} \right| \right), & P_{ai} \notin \Omega_i \end{cases};$$
where $k_a$ is the cold-hot proportionality coefficient of user health data a, $age_a$ is the age in user health data a, $mean(age)$ is the average age over all data, $n_a$ is the number of health performance data in user health data a, $P_{ai}$ is the i-th health performance data of user health data a, and $\Omega_i$ is the normal reference range of the i-th health performance data; in this embodiment the normal reference ranges are provided by healthcare professionals. $DE(P_{ai}, \Omega_i)$ is the minimum distance between $P_{ai}$ and the normal reference range $\Omega_i$, $\min()$ is the minimum function, $\Omega_{min,i}$ is the minimum value of the normal reference range of the i-th health performance data, and $\Omega_{max,i}$ is its maximum value. $\left| P_{ai} - \Omega_{min,i} \right|$ is referred to as the absolute value of the first difference of user health data a, and $\left| P_{ai} - \Omega_{max,i} \right|$ as the absolute value of the second difference of user health data a.
The logic of this calculation is as follows. The first term is the difference between the age $age_a$ in user health data a and the average age $mean(age)$. Guardians of young children may pay close attention to the child's health and query frequently, and older users may pay close attention to their own health data because their physical function is declining and also query frequently, so the larger this difference, the more likely the user's information is hot data. The second term is the difference $DE(P_{ai}, \Omega_i)$ between the health performance data P and the normal reference range. A healthy user, knowing that they are relatively healthy, queries their health data less often, whereas a patient with abnormal health performance data may query it frequently. The cold-hot proportionality coefficient $k_a$ therefore indicates the importance of user health data a: the larger $k_a$, the more important the data is to the user and the more often the user queries it, so a larger retrieval weight should be set for it when the query tree is subsequently built, making it easier for the user to query.
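The coefficient can be illustrated with a short Python sketch. It follows the stated rule (absolute age deviation plus the summed minimum distances to the reference ranges); the function names, the reference ranges in `RANGES` and the two sample users are hypothetical and chosen only for illustration.

```python
from statistics import mean

def min_distance(value, low, high):
    """DE(P, Omega): 0 inside [low, high], else distance to the nearer bound."""
    if low <= value <= high:
        return 0.0
    return min(abs(value - low), abs(value - high))

def cold_hot_coefficient(age, mean_age, readings, ranges):
    """k_a = |age - mean_age| + sum of each reading's minimum distance to its range."""
    deviation = sum(min_distance(readings[name], *ranges[name]) for name in readings)
    return abs(age - mean_age) + deviation

# Hypothetical reference ranges and users, for illustration only.
RANGES = {"glucose": (3.9, 6.1), "systolic": (90.0, 140.0)}
users = {
    "a": {"age": 70, "readings": {"glucose": 7.1, "systolic": 150.0}},
    "b": {"age": 30, "readings": {"glucose": 5.0, "systolic": 120.0}},
}
mean_age = mean(u["age"] for u in users.values())
k = {uid: cold_hot_coefficient(u["age"], mean_age, u["readings"], RANGES)
     for uid, u in users.items()}
```

Both users are equally far from the mean age, so user `a` ends up with the larger coefficient only because two readings fall outside their ranges. Note that the rule as stated sums quantities with different units without normalization.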
This law of change can be applied to the division of data into hot and cold. In addition, analysis of users' past query counts shows that the query count of the previous day influences the current day, and the overall query activity of all users also indirectly reflects how likely a user is to query the health data on the current day; the daily heat index can therefore be obtained by iterating step by step over the query counts of previous days. In this embodiment, taking user health data a on day t as an example, the heat index $T_a^{t}$ of user health data a on day t is constructed as:
$$T_a^{t} = T_a^{t-1} \times \exp\left( -\frac{1}{k_a \times \frac{1}{N} \sum_{b=1}^{N} H_b^{t-1}} \right) + H_a^{t-1};$$
where $T_a^{t}$ is the heat index of user health data a on day t, $T_a^{t-1}$ is the heat index of user health data a on day t-1, $\exp()$ is the exponential function with the natural constant e as its base, $k_a$ is the cold-hot proportionality coefficient of the user health data, N is the total number of user health data, $\sum_{b=1}^{N} H_b^{t-1}$ is the total query count of all user health data on day t-1, and $H_a^{t-1}$ is the query count of user health data a on day t-1. The heat index $T_a^{1}$ of user health data a on the first acquisition day is set to 0.
The first term, $T_a^{t-1} \times \exp(\cdot)$, is the decay term of the heat of the user health data: when a user health data has not been queried or modified for a long time, its heat index gradually decays to 0. The exponent takes into account the influence of the mean access count of all data on an individual datum. When the average query count of all data is large, that is, the database of user health data is in a hotter state, cold data is more likely to be queried; $\frac{1}{k_a \times \frac{1}{N} \sum_{b} H_b^{t-1}}$ is then smaller, and the more important the data (the larger $k_a$), the slower its heat decays. When the average query count of all data is small, that is, the database is in a colder state, cold data is less likely to be queried and the heat of the data decays quickly. The second term, $H_a^{t-1}$, represents the degree of attention paid to the data: the more often a user views their own health data, the larger $H_a^{t-1}$ and the higher the attention to that data. The formula thus judges the heat of a datum from its query history while incorporating the hot or cold state of the whole database and the datum's own proportionality coefficient. The higher the heat, the higher the frequency and probability with which the data is accessed, indicating that the data is queried frequently.
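The iteration can be sketched in Python as below. The sketch assumes the update rule stated in the disclosure: the previous day's heat is multiplied by exp(-1 / (k_a × mean query count of the previous day)) and the previous day's query count is added. The function names and sample numbers are hypothetical, and the handling of a zero product in the exponent is an assumption, since the source does not cover that case.

```python
import math

def next_heat(prev_heat, prev_queries, k_a, mean_queries_prev_day):
    """One step: T_t = T_{t-1} * exp(-1 / (k_a * mean)) + H_{t-1}."""
    scale = k_a * mean_queries_prev_day
    # Assumption: if scale is 0 the exponent tends to minus infinity,
    # so no heat is carried over from the previous day.
    decay = math.exp(-1.0 / scale) if scale > 0 else 0.0
    return prev_heat * decay + prev_queries

def heat_series(queries, k_a, mean_queries):
    """Heat index for every day; the first day is fixed at 0 as in the text."""
    heat = [0.0]
    for t in range(1, len(queries)):
        heat.append(next_heat(heat[-1], queries[t - 1], k_a, mean_queries[t - 1]))
    return heat

# Hypothetical data: two queries on day 1, none afterwards.
queries = [2, 0, 0, 0]
means = [1.0, 1.0, 1.0, 1.0]
fast = heat_series(queries, k_a=1.0, mean_queries=means)   # less important data
slow = heat_series(queries, k_a=10.0, mean_queries=means)  # more important data
```

After the queries stop, both series decay, but the series with the larger coefficient keeps more of its heat each day, which matches the intent that important data cools more slowly.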
To facilitate subsequent calculations, the daily heat indexes of all user health data are abstracted into three categories: care, general and disregard. The heat indexes of the N user health data on day t are clustered with the K-means clustering algorithm, with the number of cluster centers set to 3. The K-means algorithm takes unlabeled data as input and groups it into different categories; it is a known technique and is not described in detail. Clustering yields the centers of three clusters, whose heat values are denoted $cen_1$, $cen_2$, $cen_3$, with $cen_1 > cen_2 > cen_3$. According to its heat index, each day's user health data is assigned to one of the three cluster centers, giving three clusters denoted $cl_1$, $cl_2$, $cl_3$. $cl_1$ is the care category, and the category value of the heat index of every user health data in cluster $cl_1$ is 1; $cl_2$ is the general category, and the category value of the heat index of every user health data in cluster $cl_2$ is 2; $cl_3$ is the disregard category, and the category value of the heat index of every user health data in cluster $cl_3$ is 3. The category value of the heat index of user health data a is denoted $a_{cl}$.
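The clustering and ranking step might look like the following Python sketch. It uses a plain one-dimensional Lloyd iteration as a stand-in for whatever K-means implementation is used in practice; the deterministic seeding from order statistics, the function names and the sample heat values are assumptions made for illustration.

```python
def kmeans_1d(values, k=3, iterations=50):
    """Plain Lloyd iteration on scalars; returns (centers, cluster index per value)."""
    ordered = sorted(values)
    # Assumption: seed the centers at evenly spaced order statistics.
    centers = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    assign = []
    for _ in range(iterations):
        # Assign every value to its nearest center.
        assign = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            # Keep the old center when a cluster is empty.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign

def category_values(values, k=3):
    """Rank clusters by center, largest first: 1 = care, 2 = general, 3 = disregard."""
    centers, assign = kmeans_1d(values, k)
    order = sorted(range(k), key=lambda c: centers[c], reverse=True)
    rank = {cluster: position + 1 for position, cluster in enumerate(order)}
    return [rank[a] for a in assign]

# Hypothetical heat indexes of six users on one day.
heat_today = [9.0, 8.5, 4.0, 4.25, 0.0, 0.5]
categories = category_values(heat_today)
```

Ranking the clusters by their centers, rather than relying on cluster labels, keeps category 1 attached to the hottest group regardless of how the clustering numbers its clusters.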
In the above steps, the queries of each user's health data over the M days are monitored, where $H_a^{1}$ is the number of queries of user health data a on the first day. A user's access to their own health data may be bursty: for example, some users query rarely over a long period and then query often in the most recent few days. As shown in FIG. 2, the abscissa is the acquisition day and the ordinate $H_a$ is the query count of user health data a on each acquisition day. The query count is low in the early acquisition days and then rises suddenly in the later days.
Therefore, to reduce the influence of sudden abnormal query counts on the other, normal query counts, a query-count correction value of the user health data is constructed. In this embodiment, taking the query-count correction value $C_a^{j}$ of user health data a on the j-th day as an example:
$$C_a^{j} = \exp\left( -\sigma_{a,\setminus j}^{2} \right) \times D_a^{j} + \left( 1 - \exp\left( -\sigma_{a,\setminus j}^{2} \right) \right) \times H_a^{j};$$
$$D_a^{j} = H_a^{j-1} + \left( H_a^{j-1} - H_a^{j-2} \right);$$
where $C_a^{j}$ is the query-count correction value of user health data a on the j-th day, $\exp\left( -\sigma_{a,\setminus j}^{2} \right)$ is the exponential function with the natural constant e as its base and $-\sigma_{a,\setminus j}^{2}$ as its exponent, $\sigma_{a,\setminus j}^{2}$ is the variance of the query counts of user health data a excluding the data of the j-th day, $H_a^{j}$ is the query count of user health data a on the j-th day, and $D_a^{j}$ is the query-count trend value of user health data a on the j-th day. The daily query-count correction values of user health data a form a correction value sequence, denoted $Q_a = \{ C_a^{1}, C_a^{2}, \ldots, C_a^{M} \}$.
This formula corrects sudden changes in the query count of the user health data on a given day. Writing $\sigma^{2}$ for the variance of the query counts with the j-th day excluded, $H_a^{j}$ for the query count on the j-th day and $D_a^{j}$ for the query-count trend value on the j-th day, three cases arise:
(1) User health data a contains abnormal data in the acquisition days, and its query count $H_a^{j}$ on the j-th day is the abnormal data. The variance after removing the abnormal value becomes smaller, that is, $\sigma^{2}$ becomes smaller, the exponential function $\exp(-\sigma^{2})$ becomes larger, and $1 - \exp(-\sigma^{2})$ becomes smaller, so the abnormal value contributes less to the correction value; the further $H_a^{j}$ deviates from the other, normal data, the smaller its weight $1 - \exp(-\sigma^{2})$. The trend value $D_a^{j}$ is a simple prediction of the query count of the j-th day that uses the difference between the query counts of the (j-1)-th and (j-2)-th days as the trend. Its weight in the correction value is larger, so the correction value leans toward $D_a^{j}$.
(2) User health data a contains abnormal data in the acquisition days, but its query count $H_a^{j}$ on the j-th day is normal data. Because the abnormal value remains among the other days, the variance $\sigma^{2}$ is larger, the exponential function $\exp(-\sigma^{2})$ is smaller, and $1 - \exp(-\sigma^{2})$ is larger, so the normal value $H_a^{j}$ contributes more to the correction value, which leans toward $H_a^{j}$.
(3) User health data a contains no abnormal data in the acquisition days, so the query counts are similar on all acquisition days. Then $\sigma^{2}$ is small, $\exp(-\sigma^{2})$ is about 1, the weight of $D_a^{j}$ is larger, and the correction value leans toward the trend value $D_a^{j}$, which is itself normal data.
By the calculation of the formula, the influence of abnormal query data on the whole is restrained, and the condition that the data is endowed with larger heat degree due to the fact that the query times are suddenly increased in a certain day is avoided.
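A Python sketch of this correction follows. It blends the trend value and the raw count with weight exp(-variance), the variance being taken over all days except day j. Several points are assumptions rather than statements of the source: the trend value is read as a linear extrapolation from the two previous days, the population variance is used, and days without two predecessors are returned unchanged. Function names and sample data are hypothetical.

```python
import math
from statistics import pvariance

def trend_value(queries, j):
    """Assumed linear extrapolation from the two previous days (0-based j >= 2)."""
    return queries[j - 1] + (queries[j - 1] - queries[j - 2])

def corrected_queries(queries, j):
    """exp(-var) * trend + (1 - exp(-var)) * actual, var computed without day j."""
    if j < 2:
        # Assumption: no trend exists yet, so the raw count is kept.
        return float(queries[j])
    rest = queries[:j] + queries[j + 1:]
    weight = math.exp(-pvariance(rest))
    return weight * trend_value(queries, j) + (1.0 - weight) * queries[j]

# Hypothetical history with one burst on the fifth day (index 4).
burst = [1, 1, 1, 1, 9, 1, 1]
spike_corrected = corrected_queries(burst, 4)

# Hypothetical noisy history: the other days vary a lot, so day 5 is trusted.
noisy = [1, 9, 2, 8, 1, 9, 2]
noisy_corrected = corrected_queries(noisy, 4)
```

In the burst history the remaining days are identical, the weight of the trend value is 1 and the spike is replaced by the extrapolated value. In the noisy history the leave-one-out variance is large, so the corrected value stays close to the observed count.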
In addition, users' access to their own health data may follow a regular pattern. For example, some users query or modify their health data every other day; if such a user queried or modified the data on the previous day, it is not logical to assign the data a higher heat on the following day. The retrieval weight of the user health data is therefore constructed with this regularity of access taken into account. This embodiment takes the retrieval weight $W_{a,t}$ of user health data a on day t as an example:
$$W_{a,t} = \frac{(1-\beta) \times H_{a,t} + \beta \times LSTM(Q_{a,t})}{a_{cl,t}}$$
where $W_{a,t}$ is the retrieval weight of user health data a on day t; β is a preset adjustment factor with a value of 0.8; $H_{a,t}$ is the heat index of user health data a on day t; the LSTM (long short-term memory) network is a known technique and is not described again here; $Q_a$ is the correction value sequence of user health data a; $LSTM(Q_{a,t})$ is the query count on day t predicted from the correction value sequence $Q_a$; and $a_{cl,t}$ is the category value of the heat index of user health data a on day t. Here $(1-\beta) \times H_{a,t}$ is the first importance index and $\beta \times LSTM(Q_{a,t})$ is the second importance index. The flowchart for constructing the daily retrieval-weight index of the user health data is shown in FIG. 3.
In the above formula, the term $(1-\beta) \times H_{a,t}$ is interpreted as the influence of the past heat index of user health data a on the retrieval weight, and the term $\beta \times LSTM(Q_{a,t})$ as the influence of the query count predicted from the past queries of user health data a; because the past queries have already occurred, this second term gives the predicted current importance index. When user health data a is hot data, that is, it has recently been queried or modified many times, its heat index $H_{a,t}$ is larger, the query count predicted from the correction value sequence is larger, and the category value $a_{cl,t}$ of its heat index on day t is smaller, meaning that the data belongs to a category the user pays attention to. The retrieval weight $W_{a,t}$ is therefore larger, and the constructed B+ tree is more convenient for the user to query. When user health data a is cold data, that is, it has recently been queried or modified few times, its heat index $H_{a,t}$ is smaller, the query count predicted from the correction value sequence is smaller, and the category value $a_{cl,t}$ is larger, meaning that the data belongs to a category the user treats as ordinary or neglects. The retrieval weight $W_{a,t}$ is therefore smaller, so that hot and cold data can each be placed according to their characteristics when the B+ tree is constructed.
The retrieval weight and the heat index of each item of user health data on each day are thus obtained by the above calculation.
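The retrieval-weight calculation described in claim 9 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented implementation: the function name `retrieval_weight` is hypothetical, and the LSTM prediction is not modeled here but passed in as a plain number (`predicted_queries`), since the text treats the LSTM network as known technology.

```python
def retrieval_weight(heat_index, predicted_queries, category_value, beta=0.8):
    """Daily retrieval weight of one item of user health data.

    heat_index:        heat index of the data on day t.
    predicted_queries: query count for day t predicted from the correction
                       value sequence (assumed to come from an LSTM elsewhere).
    category_value:    rank of the data's cluster, 1 for the hottest cluster.
    beta:              preset adjustment factor; the embodiment uses 0.8.
    """
    if category_value < 1:
        raise ValueError("category_value is a 1-based rank and must be >= 1")

    first_importance = (1.0 - beta) * heat_index    # history term
    second_importance = beta * predicted_queries    # prediction term
    return (first_importance + second_importance) / category_value


# Hot data: high heat, high predicted count, hottest category (rank 1).
hot = retrieval_weight(heat_index=10.0, predicted_queries=5.0, category_value=1)

# Cold data: low heat, low predicted count, colder category (rank 3).
cold = retrieval_weight(heat_index=1.0, predicted_queries=0.5, category_value=3)
```

With these illustrative inputs the hot item receives a much larger weight than the cold one, which matches the qualitative behavior described for hot and cold data.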
Step S003: readjust the structure of the B+ tree according to the daily heat index and retrieval weight of each user's health data, and optimize the storage index of the user health data.
In this embodiment, the user health data are re-sorted every day by heat index from largest to smallest and stored in the B+ tree in that order. Within each leaf node of the B+ tree, the user health data are ordered by daily retrieval weight from largest to smallest. Data with a high retrieval weight are data that are frequently queried and modified and that will very probably be queried again in the future. The order of a B+ tree in a database is typically between several tens and several hundreds, so this ordering can reduce the number of disk I/O operations.
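The two-level ordering can be illustrated with a short Python sketch. It is a deliberate simplification: the leaf level of the B+ tree is modeled as a list of fixed-capacity lists, and internal nodes, keys, and disk pages are left out. The names `Record`, `arrange_leaves`, and `leaf_capacity` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    phone: str         # retrieval index of the user (registered phone number)
    heat_index: float  # daily heat index of the user's health data
    weight: float      # daily retrieval weight of the user's health data


def arrange_leaves(records, leaf_capacity):
    """Return leaf-level groups: filled in heat order, sorted by weight inside."""
    # Step 1: global order by heat index, largest first.
    by_heat = sorted(records, key=lambda r: r.heat_index, reverse=True)

    # Step 2: cut the ordered records into leaves of at most leaf_capacity.
    leaves = [by_heat[i:i + leaf_capacity]
              for i in range(0, len(by_heat), leaf_capacity)]

    # Step 3: inside each leaf, order by retrieval weight, largest first.
    return [sorted(leaf, key=lambda r: r.weight, reverse=True)
            for leaf in leaves]


records = [
    Record("a", 1.0, 0.2),
    Record("b", 9.0, 0.1),
    Record("c", 5.0, 0.9),
    Record("d", 7.0, 0.5),
]
leaves = arrange_leaves(records, leaf_capacity=2)
```

Here the hottest records "b" and "d" share the first leaf, and within that leaf "d" comes first because its retrieval weight is higher.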
When cold data are queried, their small retrieval weight places them toward the end of the leaf node, so a query has to traverse nearly the whole segment of the doubly linked list in the leaf node before reaching them, and queries for cold data are slow. To solve this problem, the median of the retrieval weights of all data in a leaf node is stored at that node, as shown in FIG. 4, where $Z_1$ denotes the median of the retrieval weights of all user health data under leaf node 1 of the B+ tree. Newly inserted data are given a retrieval weight of 0 and placed at the last position in the leaf node. For data retrieval, the mobile phone number registered by the user serves as the user's retrieval index; the retrieval weight and the retrieval index are bound together, and the bound result is used as a composite index for queries.
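One way the stored median could speed up cold-data lookups is sketched below in Python. The text does not spell out how the median is consulted, so the scan-direction rule here is an assumption: a lookup whose weight is below the leaf median scans from the tail of the leaf, and any other lookup scans from the head. The class `Leaf` and its methods are hypothetical, and retrieval weights are assumed to be non-negative.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class Leaf:
    # Each entry is (retrieval_weight, phone, payload), kept in descending weight.
    entries: list = field(default_factory=list)
    median_weight: float = 0.0

    def refresh(self):
        """Re-sort the leaf by weight and store the median weight at the node."""
        self.entries.sort(key=lambda e: e[0], reverse=True)
        self.median_weight = median(e[0] for e in self.entries) if self.entries else 0.0

    def insert_new(self, phone, payload):
        """New data gets weight 0 and goes to the last position in the leaf."""
        self.entries.append((0.0, phone, payload))
        self.median_weight = median(e[0] for e in self.entries)

    def find(self, weight, phone):
        """Look up by the composite (weight, phone) index.

        Assumption: the stored median picks the scan direction, so cold
        entries near the tail are reached without walking the whole leaf.
        Returns (payload, number_of_entries_visited).
        """
        from_head = weight >= self.median_weight
        order = self.entries if from_head else reversed(self.entries)
        for steps, (_, p, payload) in enumerate(order, start=1):
            if p == phone:
                return payload, steps
        return None, len(self.entries)


leaf = Leaf([(5.0, "p3", "c"), (9.0, "p1", "a"), (1.0, "p5", "e"),
             (7.0, "p2", "b"), (3.0, "p4", "d")])
leaf.refresh()
```

With the median stored at the node, the coldest entry "p5" and the hottest entry "p1" are both found after visiting a single entry, instead of the cold entry costing a scan of the whole leaf.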
The structure of the B+ tree is updated and adjusted in real time by calculating the daily heat index and retrieval weight of each user's health data.
This completes the embodiment.
In summary, the embodiment of the invention first constructs a cold-hot proportionality coefficient for each user's health data, based on how much users at different age stages care about their own health and on whether they have unhealthy conditions. This mines the degree to which users care about their health data and distinguishes hot data from cold data, so that hot data can be given a larger retrieval weight when the query tree is subsequently constructed, which makes queries more convenient for users. Based on each user's past query counts and on the overall query situation of all users' health data, the embodiment also considers the influence of historical data on current data and of the whole on the local part: historical data indirectly affect the likelihood that users query their own health data on the current day. The daily heat index of the user health data is therefore obtained by iterating step by step over the query counts of the preceding days, and the rate at which the likelihood of cold data being queried declines is obtained by combining the overall cold-hot condition of the database. Whereas the traditional division of hot and cold data considers only the frequency of data access, the embodiment introduces Newton's law of cooling to judge hot and cold data accurately, so that the constructed B+ tree better matches the actual daily heat demand.
By analyzing the historical access pattern of the user health data, the embodiment suppresses the overall influence of sudden queries on the health data, prevents data from being assigned a high heat because its query count spiked on a single day, and corrects the query count of the user health data on a given day more accurately. It also takes into account the regularity of users' access to their own health data, namely that a user who has accessed the data does not access it repeatedly on the following consecutive days. The retrieval weight is constructed by combining the predicted daily query count of the user health data, the heat index, and its category. According to the attention category, history, and prediction for the health data of different users, the order of the user health data within the leaf nodes of the B+ tree is rearranged, so that the constructed B+ tree is more convenient for different types of users to query and the query speed is improved. Finally, the structure of the B+ tree is readjusted using the daily heat index and retrieval weight of each user's health data, and the median retrieval weight of each leaf node is stored in that leaf node, which reduces the number of disk reads and writes and effectively improves the retrieval speed for both hot and cold data.
It should be noted that the order of the embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. The foregoing describes specific embodiments of this specification. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, the embodiments are described in a progressive manner. For parts that are the same or similar across embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the others.
The above embodiments only illustrate the technical solutions of the present application and do not limit them. Any modification of the technical solutions described in the foregoing embodiments, or equivalent replacement of some of their technical features, that does not cause the essence of the corresponding technical solution to depart from the scope of the technical solutions of the embodiments of the present application shall fall within the protection scope of the present application.

Claims (10)

1. The optimized storage and retrieval method for the digital health database is characterized by comprising the following steps of:
acquiring user health data of each user, wherein the user health data comprises basic data and health performance data, and storing the user health data in a B+ tree;
acquiring the cold-hot proportionality coefficient of the user health data according to the degree of difference between the basic data and health performance data of the user health data and the normal reference range; acquiring the daily heat index of the user health data according to the cold-hot proportionality coefficient of the user health data and the daily query counts; clustering the daily heat indexes of all user health data to obtain clusters, and obtaining the category value of the daily heat index of each user health data in each cluster according to the heat index differences between the clusters;
acquiring the daily query-count correction value of the user health data according to the daily query trend of the user health data over the acquisition days and the fluctuation of the daily query counts; forming the correction value sequence of the user health data from the daily query-count correction values over the acquisition days, arranged in chronological order; acquiring the daily retrieval weight of the user health data according to the correction value sequence of the user health data, the daily heat index, and the corresponding category value;
and readjusting the B+ tree structure of each day according to the retrieval weight and the heat index of the user health data of each user in each day.
2. The method for optimized storage retrieval of a digital health database as set forth in claim 1, wherein said user health data comprises basic data and health performance data, comprising:
the base data includes, but is not limited to, age, height, weight, number of queries per day over the number of days of collection; the health performance data includes, but is not limited to, blood glucose, blood lipid, systolic pressure, diastolic pressure.
3. The method for optimizing storage and retrieval of a digital health database according to claim 2, wherein the step of obtaining the heat and cold proportionality coefficient of the user health data according to the degree of difference between the basic data of the user health data, the health performance data and the normal reference range comprises the steps of:
for each health performance data in the user health data, acquiring the minimum distance between each health performance data and a normal reference range; calculating the sum of the minimum distances of all the health performance data in the user health data;
calculating the absolute value of the difference between the age in the user health data and the average age of all users; and taking the sum of the sum and the absolute value of the difference as a cold-hot proportionality coefficient of the user health data.
4. A method of optimizing storage and retrieval of a digital health database as claimed in claim 3, wherein said obtaining a minimum distance between each health performance data and a normal reference range comprises:
setting a minimum distance between each health performance data and the normal reference range to 0 when each health performance data belongs to the normal reference range;
when each health performance data does not belong to the normal reference range, the smaller of the absolute values of the differences between the health performance data and the minimum value and the maximum value of the normal reference range is taken as the minimum distance between each health performance data and the normal reference range.
5. The method for optimizing storage and retrieval of a digital health database according to claim 4, wherein the step of obtaining the daily heat index of the user health data according to the heat and cold proportionality coefficient of the user health data and the daily query number comprises the steps of:
acquiring the average query count of all user health data on the day before each day; taking the negative of the reciprocal of the product of this average query count and the cold-hot proportionality coefficient of the user health data as the exponent of an exponential function with the natural constant as base;
acquiring the query times and the heat index of the user health data in the day before each day; and adding the product of the calculation result of the index function and the heat index of the previous day to the query times of the previous day to obtain the heat index of the user health data every day.
6. The method for optimizing storage and retrieval of a digital health database according to claim 1, wherein the step of obtaining the category value of the daily heat index of each user health data in each cluster according to the heat index difference between clusters comprises the steps of:
sequencing the heat indexes of the clustering centers corresponding to the clustering clusters from large to small to obtain a class sequence;
and taking the position order of each cluster center in the class sequence as the class value of the heat index of all user health data in the cluster to which each cluster center belongs.
7. The method for optimizing storage and retrieval of a digital health database according to claim 1, wherein the step of obtaining the daily query number correction value of the user health data according to the daily query trend and daily query number fluctuation of the user health data in the acquisition days comprises the steps of:
calculating the variance of the query counts over the acquisition days other than day j, and taking the negative of this variance as the exponent of an exponential function with the natural constant as base;
acquiring a query frequency trend value of the j th day of the user health data; calculating the product of the calculation result of the exponential function and the inquiry times trend value; calculating the product of the difference value result of subtracting the calculation result of the exponential function from the number 1 and the query number of the j day;
the sum of the two products is taken as the correction value of the query times on the j th day of the user health data.
8. The method for optimizing storage and retrieval of a digital health database according to claim 7, wherein obtaining the trend value of the number of queries on the j-th day of the user health data comprises:
acquiring the query times of the j-1 th day and the j-2 th day of the user health data; and calculating the sum of the number of times of query times of the j-1 day and the number of times of query times of the j-2 day as the trend value of the query times of the j-th day of the user health data.
9. The method for optimizing storage and retrieval of a digital health database according to claim 1, wherein the step of obtaining the daily retrieval weight of the user health data according to the corrected value sequence of the user health data, the daily heat index and the corresponding category value comprises the steps of:
acquiring a predicted value of the query times of each day by adopting an LSTM neural network for the correction value sequence; calculating the product of a preset regulating factor and the predicted value of the query times;
calculating the product of the difference obtained by subtracting the preset adjustment factor from 1 and the daily heat index of the user health data; acquiring the category value of the daily heat index of the user health data; and taking the sum of the two products divided by the category value as the daily retrieval weight of the user health data.
10. The method for optimizing storage and retrieval of a digitized health database of claim 1 wherein said readjusting the daily b+ tree structure based on daily retrieval weights and heat indices of each user's user health data comprises:
for the structure of the B+ tree of the data storage of each day, sequencing the user health data of each user according to the heat index of each day from large to small, and storing the sequencing result in the B+ tree;
and sequencing the user health data of each user in each leaf node in the B+ tree from large to small in the daily retrieval weight, and marking the median of all the retrieval weights in each leaf node in the corresponding leaf node.
CN202410165799.3A 2024-02-05 2024-02-05 Optimized storage and retrieval method for digital health database Active CN117708139B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410165799.3A CN117708139B (en) 2024-02-05 2024-02-05 Optimized storage and retrieval method for digital health database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410165799.3A CN117708139B (en) 2024-02-05 2024-02-05 Optimized storage and retrieval method for digital health database

Publications (2)

Publication Number Publication Date
CN117708139A true CN117708139A (en) 2024-03-15
CN117708139B CN117708139B (en) 2024-05-03

Family

ID=90151984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410165799.3A Active CN117708139B (en) 2024-02-05 2024-02-05 Optimized storage and retrieval method for digital health database

Country Status (1)

Country Link
CN (1) CN117708139B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160071432A1 (en) * 2014-09-10 2016-03-10 Pathway Genomics Corporation Health and wellness management methods and systems useful for the practice thereof
CN108597605A (en) * 2018-03-19 2018-09-28 特斯联(北京)科技有限公司 A kind of life big data acquisition of personal health and analysis system
CN116226179A (en) * 2023-02-03 2023-06-06 西藏云图测绘有限公司 Database optimization method
CN116705337A (en) * 2023-08-07 2023-09-05 山东第一医科大学第一附属医院(山东省千佛山医院) Health data acquisition and intelligent analysis method

Also Published As

Publication number Publication date
CN117708139B (en) 2024-05-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant