WO2021052156A1 - Data analysis method, apparatus, device, and computer-readable storage medium - Google Patents

Data analysis method, apparatus, device, and computer-readable storage medium

Info

Publication number
WO2021052156A1
Authority
WO
WIPO (PCT)
Prior art keywords
time series
historical
index
change
slope
Prior art date
Application number
PCT/CN2020/112468
Inventor
赵惟
徐卓扬
左磊
孙行智
田静涛
胡岗
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Publication of WO2021052156A1


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • This application relates to the field of data analysis technology, and in particular to a data analysis method, device, equipment, and computer-readable storage medium.
  • The core of precision medicine is providing personalized treatment according to individual differences between patients, which is also the most difficult part of treatment.
  • How to divide patients (for example, 100 million diabetic patients) into several subgroups and formulate a different treatment for each subgroup to achieve the best therapeutic effect is a major challenge.
  • The inventors recognized that patients with chronic diseases often require multiple visits and continuous observation.
  • Existing patient grouping methods generally consider only the test and examination indicators from the current, single visit plus basic information, and ignore the correlation between the patient's previous indicators and current indicators. For chronic diseases, existing clustering methods are therefore subject to chance and randomness, and the resulting grouping recommendations are not highly reliable.
  • The main purpose of this application is to provide a data analysis method, device, equipment, and computer-readable storage medium, aiming to solve the existing technical problem that patient grouping results have low reliability.
  • An embodiment of the present application provides a data analysis method, which includes:
  • accessing a preset database, obtaining time-series sample indicators of historical patients from the preset database, and filtering the time-series sample indicators by a significance test to obtain time-series predictive indicators that are statistically associated with the health information of the historical patients;
  • based on a feature attribution method and the historical grouping results of the historical patients, analyzing the nonlinear relationship between the mean slope of value change and the historical grouping results, determining the classification control slope that characterizes the nonlinear relationship, and simulating a control trajectory line in a preset coordinate system according to the classification control slope;
  • An embodiment of the present application further provides a data analysis device, which includes:
  • an index acquisition module, configured to access a preset database, obtain time-series sample indicators of historical patients from the preset database, and filter the time-series sample indicators by a significance test to obtain time-series predictive indicators that are statistically associated with the health information of the historical patients;
  • a first analysis module, configured to analyze how the values of the time-series predictive indicators change over time and obtain the mean slope of value change corresponding to that change relationship;
  • a second analysis module, configured to analyze, based on a feature attribution method and the historical grouping results of the historical patients, the nonlinear relationship between the mean slope of value change and the historical grouping results, determine the classification control slope that characterizes the nonlinear relationship, and simulate a control trajectory line in a preset coordinate system according to the classification control slope;
  • a trajectory fitting module, configured to obtain time-series test indicators of the current patient according to the indicator type of the time-series predictive indicators, and fit a corresponding test trajectory line in the preset coordinate system according to the time-series test indicators;
  • a position comparison module, configured to compare the positions of the test trajectory line and the control trajectory line, and determine the grouping result of the current patient according to the positional relationship between the two lines and the historical grouping results of the historical patients.
  • An embodiment of the present application further provides data analysis equipment, including a processor, a memory, and a computer program stored in the memory and executable by the processor; when the computer program is executed by the processor, the steps of the above data analysis method are implemented.
  • An embodiment of the present application also provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the above data analysis method are implemented.
  • The embodiments of the application analyze the time-series sample indicators of historical patients with chronic diseases, whose values change over time, identify the time-series predictive indicators related to disease development, and determine the change trends of the time-series predictive indicators corresponding to different historical patient groups, which provides a reference basis for grouping chronic disease patients. The change trend of the current patient's time-series test indicators is then compared and matched against the change trends of the time-series predictive indicators of the historical patient groups to determine the current patient's grouping result. Because the embodiments group patients based on multiple test indicators, the adverse effect of the chance and randomness of single test data on grouping reliability is reduced, and the reliability of patient grouping is improved.
  • FIG. 1 is a schematic diagram of the hardware structure of the data analysis device involved in the solution of the embodiment of the application;
  • FIG. 2 is a schematic flowchart of the first embodiment of the data analysis method of this application;
  • FIG. 3 is a schematic plot of the SHAP value against the mean slope of value change K in the first embodiment of the data analysis method of this application;
  • FIG. 4 is a schematic diagram of the functional modules of the first embodiment of the data analysis device of this application.
  • the data analysis method involved in the embodiments of the present application is mainly applied to data analysis equipment, and the data analysis equipment may be a server, a personal computer (PC), a notebook computer, or other equipment with data processing functions.
  • FIG. 1 is a schematic diagram of the hardware structure of the data analysis device involved in the solution of the embodiment of the application.
  • The data analysis device may include a processor 1001 (for example, a central processing unit (CPU)), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • The communication bus 1002 is used to implement connection and communication between these components;
  • the user interface 1003 may include a display and an input unit such as a keyboard;
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface);
  • the memory 1005 may be a high-speed random access memory (RAM) or a non-volatile memory, such as disk storage; the memory 1005 may also be a storage device independent of the processor 1001.
  • The hardware structure shown in FIG. 1 does not constitute a limitation on the present application; the device may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
  • the memory 1005 as a computer-readable storage medium in FIG. 1 may include an operating system, a network communication module, and a computer program.
  • the network communication module can be used to connect to a preset database and perform data communication with the database; and the processor 1001 can call a computer program stored in the memory 1005 and execute the data analysis method provided in the embodiment of the present application.
  • the embodiment of the present application provides a data analysis method.
  • FIG. 2 is a schematic flowchart of the first embodiment of the data analysis method of this application.
  • the data analysis method includes the following steps:
  • Step S10: access a preset database, obtain time-series sample indicators of historical patients from the preset database, and filter the time-series sample indicators by a significance test to obtain time-series predictive indicators that are statistically associated with the health information of the historical patients;
  • This embodiment proposes a data analysis method based on the trajectory trends of risk indicators. By analyzing the time-series sample indicators of historical patients with chronic diseases, whose values change over time, it identifies the time-series predictive indicators related to disease development and determines the change trends of the time-series predictive indicators corresponding to different historical patient groups, providing a reference basis for grouping chronic disease patients. The change trend of the current patient's time-series test indicators is then compared and matched against the change trends of the time-series predictive indicators of the historical patient groups to determine the current patient's grouping result. Because this embodiment groups patients based on multiple test indicators, it reduces the adverse effect of the chance and randomness of single test data on grouping reliability, improves the reliability of patient grouping, and provides an effective reference basis for assessing patient health.
  • the data analysis method in this embodiment is implemented by a data analysis device.
  • the data analysis device may be a server, a personal computer, a notebook computer, or other devices.
  • a server is taken as an example for description.
  • the server is in communication connection with a preset database; the database stores several sample indicators provided by historical patients.
  • For example, the sample indicators of diabetic patients include glycosylated hemoglobin, blood glucose concentration, blood pressure, and so on;
  • the sample indicators of patients with chronic kidney disease include glomerular filtration rate and so on. Note that each type of test indicator contains several values measured at different test times and therefore forms a time series rather than a single test value; that is, each sample indicator is a time-series sample indicator.
  • The server in this embodiment can obtain time-series sample indicators from the preset database. Because these indicators span many categories, in practice not all of them are related to a given type of disease. The server can therefore screen the time-series sample indicators in the database, by significance testing or by manual labeling, for the time-series predictive indicators relevant to the patients' health, and use them as candidate risk factors for subsequent analysis. The patients' health can be obtained from the health information of the historical patients corresponding to the time-series sample indicators, so the time-series predictive indicators can be regarded as statistically associated (that is, with statistical significance) with the health information of the historical patients.
  • Specifically, the various time-series sample indicators can be used as feature variables, and the final health status of the historical patients (or disease diagnosis results, adverse events, death, and so on) as the outcome variable. The chi-squared test is then used to explore the relationship between each feature variable and the outcome variable: a feature variable whose chi-squared test yields P < 0.05 is identified as having a statistically significant effect on the outcome variable, and the time-series sample indicator corresponding to that feature variable is a time-series predictive indicator. Further, the relative risk (RR) or odds ratio (OR) can be used to analyze whether these feature variables have a positive or negative effect on the outcome variable, and thus whether each time-series sample indicator is a risk factor or a protective factor.
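The screening in Step S10 can be illustrated with a minimal Python sketch. It assumes, as a simplification not stated in the text, that each time-series sample indicator has already been dichotomized per patient into a boolean feature (for example, "indicator rose over the observation window") and that the outcome is a boolean health event. It runs a Pearson chi-squared test on the resulting 2×2 table without continuity correction, keeps features with P < 0.05, and uses the odds ratio to label each kept indicator as a risk or protective factor. All function and feature names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def contingency_2x2(exposed: Sequence[bool], outcome: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Count (a, b, c, d) = (exposed & event, exposed & no event,
    unexposed & event, unexposed & no event)."""
    a = b = c = d = 0
    for e, o in zip(exposed, outcome):
        if e and o:
            a += 1
        elif e:
            b += 1
        elif o:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared statistic (no continuity correction) and p-value, df = 1."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty row or column gives no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def screen_indicators(
    features: Dict[str, Sequence[bool]],
    outcome: Sequence[bool],
    alpha: float = 0.05,
) -> List[Tuple[str, float, float, float, str]]:
    """Return (name, p, odds ratio, relative risk, role) for indicators with p < alpha."""
    kept = []
    for name, exposed in features.items():
        a, b, c, d = contingency_2x2(exposed, outcome)
        _, p = chi_square_2x2(a, b, c, d)
        if p >= alpha:
            continue
        odds_ratio = (a * d) / (b * c) if b * c else math.inf
        relative_risk = (a / (a + b)) / (c / (c + d)) if c else math.inf
        role = "risk" if odds_ratio > 1 else "protective"
        kept.append((name, p, odds_ratio, relative_risk, role))
    return kept
```

For a 2×2 table the chi-squared tail probability with one degree of freedom equals erfc(sqrt(χ²/2)), which keeps the sketch dependency-free. A statistics library such as `scipy.stats.chi2_contingency` could be used instead; note that it applies Yates' continuity correction to 2×2 tables by default, so its p-values will differ slightly.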
  • Step S20: analyze how the values of the time-series predictive indicators change over time, and obtain the mean slope of value change corresponding to that change relationship;
  • Once the server obtains the time-series predictive indicators related to the patients' health, it can analyze how the values of these indicators change over time and characterize that change relationship by the slope of value change.
  • Specifically, time is used as the independent variable (x-axis) and the value of the time-series predictive indicator as the dependent variable (y-axis). The data points of each time-series predictive indicator are plotted in the preset coordinate system and connected in chronological order to obtain a predictive indicator line, and slope analysis is then performed on this line to determine the mean slope of value change of the predictive indicator.
  • The mean slope of value change represents how the value of the time-series predictive indicator changes over time. Note that when there are multiple types of time-series predictive indicators, the server analyzes each type separately and obtains multiple mean slopes of value change.
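One reading of the slope analysis above, sketched in Python: the mean slope of value change is taken as the average of the slopes of the segments joining consecutive (time, value) points of the predictive indicator line, computed separately for each indicator type. The text does not specify the exact averaging, so this segment-average definition and the function names are assumptions; a least-squares slope over all points would be a reasonable alternative.

```python
from typing import Dict, Sequence, Tuple


def mean_change_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Average slope of the segments joining consecutive (time, value) points.

    Assumes the points are sorted chronologically with strictly increasing times.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) points of equal length")
    slopes = [
        (v1 - v0) / (t1 - t0)
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    ]
    return sum(slopes) / len(slopes)


def mean_slopes_by_indicator(
    series: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> Dict[str, float]:
    """One mean slope per indicator type, e.g. {"hba1c": ..., "egfr": ...}."""
    return {name: mean_change_slope(t, v) for name, (t, v) in series.items()}
```

With evenly spaced measurements the segment average telescopes to (last value − first value) / (total elapsed time); with irregular spacing each interval counts equally regardless of its length, which is one reason a least-squares slope may be preferred.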
  • Before step S20, the method further includes:
  • When the server obtains the health-related time-series predictive indicators, it can first screen them for stability to make the analysis more accurate and reliable: indicators with large fluctuations are eliminated, leaving target predictive indicators that fluctuate smoothly and follow a monotonic change pattern, which are then analyzed. A monotonic change pattern is either a monotonic decline or a monotonic rise.
  • The following formula can be used to identify such indicators:
  • x(i+1) is the data value of the time series predictive index at time i+1
  • x(i) is the data value of the time series predictive index at time i
  • a is a constant greater than zero and close to zero
  • b is a constant less than zero and close to zero;
  • threshold1 and threshold2 are absolute-value thresholds on the rate of change, and both are constants greater than zero. Smooth fluctuation means that the absolute value of the rate of change of the time-series predictive indicator's value stays within a threshold.
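The formula itself is not reproduced in this text. The following Python sketch assumes one plausible form consistent with the definitions of a, b, threshold1, and threshold2: a series counts as a smooth monotonic rise if every step change x(i+1) - x(i) lies in (a, threshold1), and as a smooth monotonic decline if every step change lies in (-threshold2, b). Using raw step differences (rather than relative rates), the default constant values, and the function names are all assumptions for illustration; suitable thresholds would depend on each indicator's units.

```python
from typing import Dict, Optional, Sequence


def monotonic_trend(
    values: Sequence[float],
    a: float = 1e-6,           # small positive constant (assumed default)
    b: float = -1e-6,          # small negative constant (assumed default)
    threshold1: float = 1.0,   # largest allowed rise per step (assumed default)
    threshold2: float = 1.0,   # largest allowed fall per step, as an absolute value
) -> Optional[str]:
    """Classify a series as a smooth "rise", a smooth "decline", or None."""
    if not (a > 0 > b and threshold1 > 0 and threshold2 > 0):
        raise ValueError("require a > 0 > b and positive thresholds")
    diffs = [x1 - x0 for x0, x1 in zip(values, values[1:])]
    if not diffs:
        return None  # a single measurement has no trend
    if all(a < d < threshold1 for d in diffs):
        return "rise"
    if all(-threshold2 < d < b for d in diffs):
        return "decline"
    return None  # mixed direction, flat steps, or a jump beyond the thresholds


def select_target_indicators(series: Dict[str, Sequence[float]], **limits: float) -> Dict[str, str]:
    """Keep only indicators that pass the stability screen, mapped to their trend."""
    kept = {}
    for name, values in series.items():
        trend = monotonic_trend(values, **limits)
        if trend is not None:
            kept[name] = trend
    return kept
```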
  • the step S20 includes:
  • Once the server has obtained the target predictive indicators, it analyzes how their values change over time to obtain the corresponding mean slopes of value change.
  • the specific analysis process is as described above and will not be repeated here.
  • Step S30: based on a feature attribution method and the historical grouping results of the historical patients, analyze the nonlinear relationship between the mean slope of value change and the historical grouping results, determine the classification control slope that characterizes the nonlinear relationship, and simulate a control trajectory line in a preset coordinate system according to the classification control slope;
  • When the server obtains the mean slope of value change corresponding to the time-series predictive indicator (target predictive indicator), it analyzes, based on the SHAP feature attribution method and the historical patients' grouping results (that is, the historical grouping results of the historical patients corresponding to the time-series predictive indicator), the nonlinear relationship between the mean slope of value change and the historical patient grouping criterion (the patients' health status), and finds the classification control slope that characterizes this nonlinear relationship; the classification control slope may include an optimal control value k.
  • SHAP is a method for interpreting the output of a machine learning model. It computes the marginal contribution of a feature when the feature is added to the model, considers the feature's different marginal contributions across all feature orderings, and takes their average.
  • This average is the feature's SHAP value, which characterizes the nonlinear relationship between the feature and the outcome: the larger the SHAP value, the more positive the feature's effect on the outcome, and the smaller the value, the more negative its effect.
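The Shapley averaging described above can be made concrete with an exact, brute-force Python computation over all coalitions, using the standard weight |S|!(n-|S|-1)!/n! for a coalition S that excludes the feature. The value function is a toy stand-in supplied by the caller; in the setting of this application it would presumably be the grouping model's expected output when only the slope features in S are known, which is an assumption. Exact enumeration is exponential in the number of features, which is why SHAP implementations rely on model-specific algorithms or sampling approximations for real models.

```python
import itertools
import math
from typing import Callable, Dict, FrozenSet, Sequence

ValueFunction = Callable[[FrozenSet[str]], float]


def shapley_values(features: Sequence[str], value: ValueFunction) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition (feasible only for few features)."""
    n = len(features)
    result: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight of a coalition S of this size: |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of f when it joins coalition S.
                total += weight * (value(s | {f}) - value(s))
        result[f] = total
    return result
```

For an additive value function each feature's Shapley value equals its own weight, and when two features only matter jointly the credit is split equally between them; both are quick sanity checks for the sketch.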
  • For example, several slopes of value change K may be used as feature variables that form the full set N, with the historical grouping results of the historical patients as the outcome variable. One feature variable is selected at random from N as the current variable, and all subsets of N that contain the current variable (including N itself) are determined.
  • The marginal contributions of the current variable over these subsets are averaged, and this average is the SHAP value of the current variable. Repeating this for every feature variable yields the SHAP value of each slope of value change K with respect to the historical grouping result. A target variable with a typical effect on the outcome variable is then determined from the magnitudes of the SHAP values, and the mean slope of value change corresponding to the target variable is determined as a classification control slope: for example, the optimal control value k, the positive control value k1, which has a typical positive effect on the grouping outcome, and the negative control value k2, which has a typical negative effect on the grouping outcome. Once these control values are obtained, a predictive model for the relevant patient groups can be considered established. When the mean slope of value change of
  • The nonlinear relationship between the mean slope of value change K of an indicator and the historical grouping result is analyzed with the SHAP feature attribution method, and the server outputs this relationship as a plot of the SHAP value against the mean slope of value change K, as shown in FIG. 3. In FIG. 3, the x-axis represents the mean slope of value change K, and the y-axis represents the SHAP value of K with respect to the historical grouping results.
  • y represents the value trajectory of an indicator that has no obvious effect on the historical grouping result;
  • y1 represents the value trajectory of an indicator that has a significant positive effect on the historical grouping result;
  • y2 represents the value trajectory of an indicator that has a significant negative effect on the historical grouping result.
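Selecting the control values from the SHAP plot and turning them into control trajectory lines could look like the following Python sketch. The selection rule is an assumption, not taken from the text: k is the mean slope whose SHAP value is closest to zero (no obvious effect), k1 is the slope with the largest positive SHAP value, and k2 is the slope with the most negative SHAP value. Each control trajectory line is then a straight line with that slope through a chosen common intercept. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def pick_control_slopes(points: Sequence[Tuple[float, float]]) -> Dict[str, float]:
    """points: (mean slope of value change K, SHAP value of K) per historical patient.

    Hypothetical rule: k has the SHAP value closest to zero, k1 the largest
    SHAP value, and k2 the smallest (most negative) SHAP value.
    """
    if not points:
        raise ValueError("no (K, SHAP) points given")
    k = min(points, key=lambda p: abs(p[1]))[0]
    k1 = max(points, key=lambda p: p[1])[0]
    k2 = min(points, key=lambda p: p[1])[0]
    return {"k": k, "k1": k1, "k2": k2}


def control_trajectory(slope: float, intercept: float, times: Sequence[float]) -> List[float]:
    """Values of a straight control line y = intercept + slope * t at the given times."""
    return [intercept + slope * t for t in times]
```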
  • Step S40: obtain the time-series test indicators of the current patient according to the indicator type of the time-series predictive indicators, and fit the corresponding test trajectory line in the preset coordinate system according to the time-series test indicators;
  • Once the control trajectory lines are obtained, patients can be grouped according to the control trajectory lines and the current patient's time-series test indicators.
  • The server can obtain the current patient's time-series test indicators according to the indicator type of the time-series predictive indicators, that is, obtain the test indicators corresponding to the control trajectory line (for example, the indicators of diabetic patients include glycosylated hemoglobin, blood glucose concentration, blood pressure, and so on, and the indicators of chronic kidney disease patients include glomerular filtration rate and so on).
  • the step of obtaining the current patient's time-series test index according to the index type of the time-series predictive index includes:
  • The time-series test indicators may be automatically identified and filtered by the server from the current patient's physical examination data. Specifically, after the current patient undergoes a physical examination (or certain examination items), the patient can upload the examination data to a database (such as a hospital's medical system database), either personally or by authorizing someone else.
  • The server then connects to the database, obtains the current patient's periodic physical examination data within a preset period, and filters these data according to the indicator type of the time-series predictive indicators to obtain the time-series test indicators corresponding to that indicator type, on which subsequent analysis and processing are based. This improves the efficiency of indicator (data) acquisition and also makes it convenient for the current patient to provide relevant test indicator data.
  • the method further includes:
  • Each patient's physical examination data in the database is created and stored in a separate table under a distinct account identifier; the data are stored in encrypted form, and the decryption key is kept by the current patient, which improves the security of data storage.
  • Before acquiring the current patient's periodic physical examination data, the server first sends a data acquisition request to the current patient's patient terminal (such as a mobile phone or tablet computer) to obtain permission to retrieve the patient's physical examination data.
  • The patient terminal can then be operated to return the corresponding data permission information to the server.
  • The data permission information includes the patient account identifier and the patient data key.
  • When the server receives the data permission information, it can parse it to obtain the corresponding patient account identifier and patient data key.
  • the step of obtaining periodic physical examination data of the current patient in a preset period from the preset database includes:
  • When the server obtains the patient account identifier and the patient data key, it can access the preset database with the patient account identifier, query the corresponding data table (account data), and obtain the current patient's encrypted physical examination data;
  • the encrypted physical examination data is decrypted with the patient data key, and the current patient's periodic physical examination data within the preset period is obtained from the decryption result.
  • When the server obtains the current patient's encrypted physical examination data, it can decrypt the data with the patient data key and obtain the current patient's periodic physical examination data within the preset period from the decryption result.
  • When the server obtains the periodic physical examination data, it can filter them according to the indicator type of the time-series predictive indicators to obtain the time-series test indicators corresponding to that indicator type, and then fit the corresponding test trajectory line in the preset coordinate system, using the values of the time-series test indicators as the dependent variable (y-axis) and time as the independent variable (x-axis).
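Step S40 can be sketched in Python as two pieces: filtering the decrypted periodic examination records down to the indicator types of the time-series predictive indicators, and fitting a straight test trajectory line to each filtered series by ordinary least squares. The record layout (indicator type, time, value), the least-squares straight line, and the function names are assumptions; the text only says that a trajectory line is fitted.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Record = Tuple[str, float, float]  # hypothetical layout: (indicator type, time, value)


def filter_by_index_type(
    records: Iterable[Record], index_types: Sequence[str]
) -> Dict[str, List[Tuple[float, float]]]:
    """Keep only records whose type matches a predictive indicator, sorted by time."""
    series: Dict[str, List[Tuple[float, float]]] = {kind: [] for kind in index_types}
    for kind, t, v in records:
        if kind in series:
            series[kind].append((t, v))
    for points in series.values():
        points.sort()  # chronological order
    return series


def fit_test_trajectory(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares line value = intercept + slope * time; returns (slope, intercept)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        raise ValueError("all measurements share the same time")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in points)
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t
```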
  • Step S50: compare the positions of the test trajectory line and the control trajectory line, and determine the current patient's grouping result according to the positional relationship between the two lines and the historical grouping results of the historical patients.
  • Once the test trajectory line is obtained, it can be compared with the control trajectory line, and the trajectory type of the test trajectory line is determined from the positional relationship between the two;
  • different positional relationships correspond to different historical grouping results of the historical patients.
  • The current patient's grouping result can therefore be determined from the positional relationship, which identifies the patients similar to the current patient.
  • For example, suppose the historical grouping results of the historical patients include two results. In the preset coordinate system, the control trajectory line can divide a target quadrant into at least two sub-regions, each corresponding to one historical grouping result; the target sub-region in which the test trajectory line lies is then determined, and the historical grouping result corresponding to that sub-region is the current patient's grouping result. Note that, to make the positional relationship between the control trajectory line and the test trajectory line easier to compare, the two lines can be translated during the comparison so that they intersect the y-axis (or the x-axis) at the same point.
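Once both lines are translated to a common y-intercept, as described above, comparing positions in the first quadrant reduces to comparing slopes: for t > 0 the test line lies above a control line exactly when its slope is larger. The Python sketch below uses this to find the sub-region, and hence the historical grouping result, that contains the test line. The group labels, the tie-breaking rule (a test line lying exactly on a control line is assigned to the upper sub-region), and the function name are assumptions.

```python
import bisect
from typing import Sequence


def assign_group(test_slope: float, control_slopes: Sequence[float], groups: Sequence[str]) -> str:
    """Return the grouping result of the sub-region that contains the test line.

    Both lines are assumed to share the same y-intercept, so for t > 0 the
    test line lies above a control line exactly when its slope is larger.
    `groups` lists one label per sub-region, from the lowest to the highest.
    """
    cuts = sorted(control_slopes)
    if len(groups) != len(cuts) + 1:
        raise ValueError("need exactly one group label per sub-region")
    # bisect_right counts control lines at or below the test line (ties go up).
    return groups[bisect.bisect_right(cuts, test_slope)]
```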
  • After step S50, the method further includes:
  • When the server obtains the current patient's grouping result, it can send the result to the corresponding diagnosis and treatment terminal as a reference for medical staff in diagnosing and treating the current patient.
  • Upon receiving grouping adjustment information returned by the diagnosis and treatment terminal, the current patient's grouping result is adjusted according to that information, and the adjusted grouping result is associated with the time-series test indicators and stored in the preset database.
  • Medical staff may adjust the current patient's grouping result; when an adjustment is needed, they can return the corresponding grouping adjustment information to the server through the diagnosis and treatment terminal.
  • When the server receives the grouping adjustment information returned by the diagnosis and treatment terminal, it adjusts the current patient's grouping result accordingly and stores the adjusted grouping result together with the time-series test indicators in the database for subsequent reference. In this way, more sample data accumulate continuously from actual clinical practice, which facilitates later optimization and adjustment of the analysis process.
  • the data analysis method of this embodiment further includes:
  • The corresponding control trajectory line is re-acquired according to the time-series test indicators newly stored in the preset database and the grouping results corresponding to them.
  • The server also counts how many times it has received grouping adjustment information. When this count exceeds a preset threshold, the control trajectory line that was previously analyzed and determined, and is currently in use, can be considered not to match the actual situation. The server can then retrieve the newly stored time-series test indicators and their corresponding grouping results, re-analyze them to obtain a new control trajectory line, and use it for subsequent patient grouping; the control trajectory line is re-acquired through the steps described above, which are not repeated here. In this way, the control trajectory line is continuously optimized and adjusted according to actual clinical practice, improving the accuracy and reliability of patient grouping.
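The re-acquisition trigger can be sketched as a small Python class that stores newly arriving results, counts grouping adjustments, and re-derives the control slopes once the count exceeds the preset threshold. The `rebuild` callback stands in for re-running the analysis steps above; it, the class name, the sample layout, and the choice to reset the counter and the sample buffer after each rebuild are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # hypothetical: (time-series test values, grouping result)


@dataclass
class ControlLineMaintainer:
    threshold: int                                  # preset threshold on adjustment count
    rebuild: Callable[[List[Sample]], List[float]]  # stand-in for re-running the analysis
    control_slopes: List[float]
    adjustments: int = 0
    new_samples: List[Sample] = field(default_factory=list)

    def store(self, sample: Sample) -> None:
        """Record a newly stored result, whether or not it was adjusted."""
        self.new_samples.append(sample)

    def record_adjustment(self, sample: Sample) -> bool:
        """Store an adjusted result; rebuild the control lines once the count exceeds the threshold."""
        self.store(sample)
        self.adjustments += 1
        if self.adjustments <= self.threshold:
            return False
        self.control_slopes = self.rebuild(list(self.new_samples))
        self.adjustments = 0      # assumption: counting restarts after a rebuild
        self.new_samples.clear()  # assumption: only data after the rebuild count as new
        return True
```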
  • the time series predictive indexes related to disease development are identified, and the change trends of the time series predictive indexes corresponding to different historical patient groups are analyzed and determined, providing a reference basis for grouping patients with chronic diseases; the change trend of the current patient's time series test indexes is then compared and matched with the change trends of the time series predictive indexes corresponding to the historical patient groups to determine the grouping result of the current patient;
  • the embodiments of the application group patients based on their multiple test results. This reduces the adverse effect of the chance and randomness of a single test on grouping reliability, improves the reliability of patient grouping, and provides an effective reference basis for assessing the health of patients.
  • the method further includes:
  • the historical health data of patients of the same type is obtained from the preset database according to the grouping result of the current patient, and the historical health data is sent to the corresponding terminal.
  • when the server obtains the grouping result of the current patient, it can obtain the historical health data of patients of the same type from the database according to that grouping result. It then sends these historical health data to the corresponding terminal, such as the diagnosis and treatment terminal of the medical staff or the patient terminal of the current patient. This gives the people using that terminal a health reference and facilitates subsequent diagnosis and treatment.
  • the embodiment of the present application also provides a data analysis device.
  • FIG. 4 is a schematic diagram of the functional modules of the first embodiment of the data analysis device of this application.
  • the data analysis device includes:
  • the index acquisition module 10 is used to access a preset database, obtain time-series sample indexes of historical patients from the preset database, and filter the time-series sample indexes by means of a significance test to obtain time series predictive indexes that are statistically associated with the health information of the historical patients;
  • the first analysis module 20 is configured to analyze the change relationship of the numerical value of the time series prediction index over time, and obtain the mean value of the numerical change slope corresponding to the change relationship;
  • the second analysis module 30 is configured to analyze the nonlinear relationship between the mean value of the numerical change slope and the historical clustering result based on the feature attribution method and the historical clustering results of the historical patients, and determine the classification that characterizes the nonlinear relationship Control the slope, and simulate the control trajectory line in a preset coordinate system according to the classified control slope;
  • the trajectory fitting module 40 is configured to obtain the time series test index of the current patient according to the index type of the time series predictive index, and fit the corresponding test trajectory line in the preset coordinate system according to the time series test index;
  • the position comparison module 50 is used for position comparison of the inspection trajectory line and the control trajectory line, and according to the position relationship between the inspection trajectory line and the control trajectory line, and the historical grouping results of the historical patients Determine the grouping result of the current patient.
  • each virtual function module of the above-mentioned data analysis device is stored in the memory 1005 of the data analysis device shown in FIG. 1 and is used to realize all the functions of the computer program; when each module is executed by the processor 1001, the function of patient grouping can be realized.
  • the data analysis device further includes:
  • An index screening module which is used to perform stability screening on the time series prediction index to obtain a target prediction index that meets a preset change rule
  • the first analysis module 20 is also used to analyze the change relationship of the numerical value of the target predictive index over time, and obtain the mean value of the numerical change slope corresponding to the change relationship.
  • the preset change rule includes monotonic decline and/or monotonic rise
  • the index screening module is specifically configured to perform stability screening on the time series prediction index through a first formula to obtain a target prediction index that satisfies the monotonic declining law, and the first formula is
  • the stability screening of the time series prediction index is performed by a second formula to obtain a target prediction index that satisfies the monotonic rising law, and the second formula is
  • x(i+1) is the data value of the time series predictive index at time i+1
  • x(i) is the data value of the time series predictive index at time i;
  • a is a constant greater than zero, and b is a constant less than zero;
  • Both threshold1 and threshold2 are constants greater than zero.
  • the second analysis module 30 includes:
  • the slope determination unit is configured to use the mean value of the numerical change slope as a characteristic variable and the historical grouping result as an outcome variable, wherein the characteristic variables form a full set N. It selects a characteristic variable from N as the current variable α, determines all subsets Ri(γ+α) of N that include the current variable α, and determines the corresponding non-α subset Ri(γ) that does not include α. Through a preset algorithm it calculates the contribution F[Ri(γ+α)] of each Ri(γ+α) to the outcome variable and the contribution F[Ri(γ)] of each Ri(γ) to the outcome variable. It then calculates the contribution difference ΔFi between each F[Ri(γ+α)] and the corresponding F[Ri(γ)], and takes the mean of the ΔFi as the SHAP value of the current variable α;
  • the SHAP value of each characteristic variable in N is calculated in this way. According to the magnitude of these SHAP values, the target variable having a typical influence on the outcome variable is determined, and the mean value of the numerical change slope corresponding to the target variable is taken as the classification control slope.
  • the trajectory fitting module 40 includes a data acquisition unit
  • the data acquisition unit is configured to acquire periodic physical examination data of the current patient within a preset period from the preset database, and filter the periodic physical examination data according to the index type of the time-series predictive index to obtain the time series test index corresponding to the index type of the time series prediction index.
  • the data analysis device further includes:
  • the data sending module is used to obtain the historical health data of the same type of patients from the preset database according to the grouping result of the current patient, and send the historical health data to the corresponding terminal.
  • each module in the above-mentioned data analysis device corresponds to each step in the embodiment of the above-mentioned data analysis method, and the function and realization process thereof will not be repeated here.
  • the embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium of the present application stores a computer program, where the computer program, when executed by a processor, implements the steps of the above-mentioned data analysis method.
  • the technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that make a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) execute the method described in each embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

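A data analysis method, apparatus, device, and computer-readable storage medium. The method includes the following. Time-series sample indicators of historical chronic-disease patients, whose values change over time, are analyzed to identify time-series predictive indicators correlated with disease progression. The change trends of these indicators for different historical patient groups are then analyzed and determined, which provides a reference basis for grouping chronic-disease patients. Finally, the change trend over time of the current patient's time-series test indicators is compared and matched against the change trends of the time-series predictive indicators for the historical patient groups, so as to determine the current patient's grouping result.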

Description

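Data Analysis Method, Apparatus, Device, and Computer-Readable Storage Medium
Priority Information
This application claims priority to Chinese Patent Application No. 201910884245.8, filed with the China Patent Office on September 18, 2019 and entitled "Data Analysis Method, Apparatus, Device, and Computer-Readable Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of data analysis, and in particular to a data analysis method, apparatus, device, and computer-readable storage medium.
Background
The core of precision medicine is to provide individualized treatment according to each patient's individual differences, and this is also the most difficult part of treatment. For chronic diseases, it is a major challenge to divide patients (for example, some 100 million diabetic patients) into several subgroups and to formulate a different treatment for each subgroup so as to achieve the best therapeutic effect.
The inventors have realized that chronic-disease patients usually require multiple visits and continuous observation. Existing patient grouping methods, however, generally consider only the test and examination indicators and basic information from the current single visit, and ignore the correlation between the patient's previous indicators and the current ones. As a result, for chronic diseases, existing grouping methods are subject to chance and randomness, and the grouping recommendations they produce are not very reliable.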
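Summary
The main purpose of the present application is to provide a data analysis method, apparatus, device, and computer-readable storage medium, aiming to solve the technical problem that existing patient grouping results have low reliability.
To achieve the above purpose, an embodiment of the present application provides a data analysis method, which includes:
accessing a preset database, obtaining time-series sample indicators of historical patients from the preset database, and screening the time-series sample indicators by means of a significance test to obtain time-series predictive indicators that are statistically associated with the health information of the historical patients;
analyzing how the values of the time-series predictive indicators change over time, and obtaining the mean change slope corresponding to that change relationship;
analyzing, based on a feature attribution method and the historical grouping results of the historical patients, the nonlinear relationship between the mean change slope and the historical grouping results, determining a classification control slope that characterizes the nonlinear relationship, and simulating a control trajectory line in a preset coordinate system according to the classification control slope;
obtaining the current patient's time-series test indicators according to the indicator type of the time-series predictive indicators, and fitting a corresponding test trajectory line in the preset coordinate system according to the time-series test indicators;
comparing the positions of the test trajectory line and the control trajectory line, and determining the current patient's grouping result according to the positional relationship between the test trajectory line and the control trajectory line and the historical grouping results of the historical patients.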
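In addition, to achieve the above purpose, an embodiment of the present application further provides a data analysis apparatus, which includes:
an indicator acquisition module, configured to access a preset database, obtain time-series sample indicators of historical patients from the preset database, and screen the time-series sample indicators by means of a significance test to obtain time-series predictive indicators that are statistically associated with the health information of the historical patients;
a first analysis module, configured to analyze how the values of the time-series predictive indicators change over time and obtain the mean change slope corresponding to that change relationship;
a second analysis module, configured to analyze, based on a feature attribution method and the historical grouping results of the historical patients, the nonlinear relationship between the mean change slope and the historical grouping results, determine a classification control slope that characterizes the nonlinear relationship, and simulate a control trajectory line in a preset coordinate system according to the classification control slope;
a trajectory fitting module, configured to obtain the current patient's time-series test indicators according to the indicator type of the time-series predictive indicators, and fit a corresponding test trajectory line in the preset coordinate system according to the time-series test indicators;
a position comparison module, configured to compare the positions of the test trajectory line and the control trajectory line, and determine the current patient's grouping result according to the positional relationship between the test trajectory line and the control trajectory line and the historical grouping results of the historical patients.
In addition, to achieve the above purpose, an embodiment of the present application further provides a data analysis device. It includes a processor, a memory, and a computer program stored in the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the data analysis method described above.
In addition, to achieve the above purpose, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the data analysis method described above.
The embodiments of the present application first analyze time-series sample indicators of historical chronic-disease patients whose values change over time, and identify time-series predictive indicators correlated with disease progression. They then analyze and determine the change trends of the time-series predictive indicators corresponding to different historical patient groups, providing a reference basis for grouping chronic-disease patients. Finally, the change trend over time of the current patient's time-series test indicators is compared and matched against the change trends for the historical patient groups to determine the current patient's grouping result. Because the embodiments group patients based on multiple test results, the adverse effect of the chance and randomness of a single test on grouping reliability is reduced, and the reliability of patient grouping is improved.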
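Brief Description of the Drawings
FIG. 1 is a schematic diagram of the hardware structure of the data analysis device involved in the solutions of the embodiments of the present application;
FIG. 2 is a schematic flowchart of a first embodiment of the data analysis method of the present application;
FIG. 3 is a schematic plot of the SHAP value for K against the mean change slope K in the first embodiment of the data analysis method of the present application;
FIG. 4 is a schematic diagram of the functional modules of a first embodiment of the data analysis apparatus of the present application.
The realization of the purpose, the functional features, and the advantages of the present application will be further described with reference to the embodiments and the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.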
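The data analysis method involved in the embodiments of the present application is mainly applied to a data analysis device. This device may be a server, a personal computer (PC), a notebook computer, or another device with data processing capability.
Referring to FIG. 1, FIG. 1 is a schematic diagram of the hardware structure of the data analysis device involved in the solutions of the embodiments of the present application. In the embodiments of the present application, the data analysis device may include a processor 1001 (for example, a central processing unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless Fidelity, Wi-Fi, interface). The memory 1005 may be a high-speed random access memory (RAM) or a non-volatile memory such as a disk memory, and may optionally be a storage device independent of the processor 1001. Those skilled in the art will understand that the hardware structure shown in FIG. 1 does not limit the present application. The device may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
Still referring to FIG. 1, the memory 1005 in FIG. 1, as a computer-readable storage medium, may include an operating system, a network communication module, and a computer program. In FIG. 1, the network communication module may be used to connect to a preset database and exchange data with it, while the processor 1001 may call the computer program stored in the memory 1005 and execute the data analysis method provided by the embodiments of the present application.
Based on the above hardware architecture, embodiments of the data analysis method of the present application are proposed.
An embodiment of the present application provides a data analysis method.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a first embodiment of the data analysis method of the present application.
In this embodiment, the data analysis method includes the following steps: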
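Step S10: access a preset database, obtain time-series sample indicators of historical patients from the preset database, and screen the time-series sample indicators by means of a significance test to obtain time-series predictive indicators that are statistically associated with the health information of the historical patients;
For chronic diseases, it is a major challenge to divide patients (for example, some 100 million diabetic patients) into several subgroups and to formulate a different treatment for each subgroup so as to achieve the best therapeutic effect. Chronic-disease patients usually require multiple visits and continuous observation. Existing patient grouping methods, however, generally consider only the test and examination indicators and basic information of the current single visit, and ignore the correlation between the patient's previous indicators and the current ones. Consequently, existing grouping methods are subject to chance and randomness for chronic diseases, and their grouping recommendations are not very reliable.

To address this, this embodiment proposes a data analysis method based on the trajectory trends of risk indicators. It analyzes time-series sample indicators of historical chronic-disease patients whose values change over time, and identifies time-series predictive indicators correlated with disease progression. It then analyzes and determines the change trends of these indicators for different historical patient groups, which provides a reference basis for grouping chronic-disease patients. Finally, it compares and matches the change trend over time of the current patient's time-series test indicators against those trends to determine the current patient's grouping result. Because this embodiment groups patients based on multiple test results, the adverse effect of the chance and randomness of a single test on grouping reliability is reduced and the reliability of patient grouping is improved. This in turn provides an effective reference basis for assessing patients' health.
The data analysis method in this embodiment is implemented by a data analysis device, which may be a server, a personal computer, a notebook computer, or the like; a server is used as the example in this embodiment. The server is communicatively connected to a preset database that stores sample indicators provided by a number of historical patients. Historical patients with different disease types have different types of sample indicators. For example, the sample indicators of diabetic patients include glycated hemoglobin (HbA1c), blood glucose concentration, and blood pressure, while those of patients with chronic kidney disease include the glomerular filtration rate. Note that each type of test sample contains data values from multiple test times rather than the value of a single test, and is therefore ordered in time. That is, the sample indicators are time-series sample indicators.
The server in this embodiment can obtain time-series sample indicators from the preset database. There are many categories of these indicators, and in practice not all of them are correlated with a given disease. The server can therefore screen them, by significance testing or manual labeling, to select the time-series predictive indicators that are correlated with user health (adverse disease events, death outcomes), and use these as candidate risk factors for subsequent analysis. User health can be derived from the health information of the historical users to whom the time-series sample indicators belong, so the time-series predictive indicators can be regarded as statistically associated with the health information of the historical patients (that is, the association is statistically significant).

For example, when a significance test is used, each type of time-series sample indicator can be taken as a feature variable, and the final health status of the historical patients (or the disease diagnosis, adverse events, death, etc.) as the outcome variable. A chi-square test is then used to mine the relationship between the feature variables and the outcome variable. Feature variables whose chi-square P-value is < 0.05 are identified as having a statistically significant effect on the outcome variable, and the time-series sample indicators corresponding to those feature variables are the time-series predictive indicators. Further, the relative risk (RR) or odds ratio (OR) can be used to determine whether each feature variable has a positive or negative effect on the outcome variable, and thus whether the time-series sample indicator is a risk factor or a protective factor.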
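The chi-square screening described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation. It assumes each time-series sample indicator has already been reduced to a binary category (for example, mean change slope above or below a chosen cut point), so that it forms a 2x2 table against a binary outcome. The indicator names and counts are made up. With one degree of freedom the chi-square p-value equals erfc(sqrt(chi2/2)), so only the standard library is needed. OR > 1 is read as a risk factor and OR < 1 as a protective factor.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence on a 2x2 table (no continuity correction).

    Assumed layout: rows = indicator category (e.g. slope above / below a cut point),
    columns = outcome (adverse event yes / no):
        [[a, b],
         [c, d]]
    Returns (chi2, p_value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def relative_risk(a, b, c, d):
    """Risk of the outcome in row 1 divided by the risk in row 2."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of the outcome in row 1 divided by the odds in row 2 (b, c must be non-zero)."""
    return (a * d) / (b * c)


def screen_indicators(tables, alpha=0.05):
    """Keep indicators with p < alpha and label them 'risk' (OR > 1) or 'protective'.

    tables: dict mapping indicator name -> (a, b, c, d).
    """
    selected = {}
    for name, (a, b, c, d) in tables.items():
        _, p = chi_square_2x2(a, b, c, d)
        if p < alpha:
            direction = "risk" if odds_ratio(a, b, c, d) > 1 else "protective"
            selected[name] = (p, direction)
    return selected


# Hypothetical counts for two discretized indicators.
tables = {
    "hba1c_trend": (30, 20, 10, 40),
    "weight_trend": (25, 25, 24, 26),
}
print(screen_indicators(tables))
```

Only `hba1c_trend` passes the 0.05 cut here. A sparse table (expected counts below about 5) would call for Fisher's exact test instead, which this sketch does not cover.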
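Step S20: analyze how the values of the time-series predictive indicators change over time, and obtain the mean change slope corresponding to that change relationship;
In this embodiment, once the server has obtained the time-series predictive indicators correlated with user health, it can analyze how their values change over time and characterize this change relationship by a value-change slope. Time is taken as the independent variable (x-axis) and the value of the time-series predictive indicator as the dependent variable (y-axis). The data points of each time-series predictive indicator are plotted in a preset coordinate system and connected in chronological order to obtain a predictive indicator line. Slope analysis is then performed on this line to determine the mean change slope of the predictive indicator, which characterizes how the indicator's value changes over time. Note that when there are multiple types of time-series predictive indicators, the server analyzes each type separately and obtains multiple mean change slopes.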
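The following is a minimal sketch of the mean change slope. It assumes, since the text does not say, that the "slope analysis" of the predictive indicator line means averaging the slopes of the segments joining consecutive readings. An ordinary least-squares slope would be an equally plausible reading. The readings are hypothetical.

```python
def mean_change_slope(times, values):
    """Mean of the slopes of the segments joining consecutive (time, value) points.

    times must be strictly increasing (e.g. months since the first visit).
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    slopes = []
    for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]):
        if t1 <= t0:
            raise ValueError("times must be strictly increasing")
        slopes.append((v1 - v0) / (t1 - t0))
    return sum(slopes) / len(slopes)


def mean_slopes_by_indicator(series):
    """series: dict indicator name -> (times, values); returns name -> mean change slope."""
    return {name: mean_change_slope(t, v) for name, (t, v) in series.items()}


# Hypothetical HbA1c readings (%) at months 0, 3, 6 and 12.
print(mean_change_slope([0, 3, 6, 12], [8.0, 7.7, 7.4, 7.1]))
```

Each segment slope is divided by its own time gap, so irregular visit intervals are handled. However, one closely spaced pair of noisy readings can dominate the mean.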
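Further, indicators correlated with an irreversible change in a patient's condition generally change monotonically. This embodiment may therefore analyze only indicators that are relatively stable, with small fluctuations. Specifically, before step S20, the method further includes:
performing stability screening on the time-series predictive indicators to obtain target predictive indicators that satisfy a preset change rule;
In this embodiment, after obtaining the time-series predictive indicators correlated with health, the server may first perform stability screening on them to make the analysis more accurate and reliable. Indicators with large fluctuations are excluded, and target predictive indicators that fluctuate gently and follow a monotonic change rule are retained for analysis. The monotonic change rule includes monotonic decrease and monotonic increase. Indicators that fluctuate gently and change monotonically can be identified with the following formulas: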
For monotonically decreasing indicators:
max(x(i+1) - x(i)) < a, and
[Figure PCTCN2020112468-appb-000001]
For monotonically increasing indicators:
max(x(i+1) - x(i)) > b, and
[Figure PCTCN2020112468-appb-000002]
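In the above formulas, x(i+1) is the data value of the time-series predictive indicator at time i+1, and x(i) is its data value at time i. The constant a is greater than zero and close to zero, and b is less than zero and close to zero. threshold1 and threshold2 are thresholds on the absolute value of the rate of change, and both are constants greater than zero. Gentle fluctuation means that the absolute value of the indicator's rate of change is kept within a threshold.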
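The stability screening can be sketched as below. Two assumptions should be flagged. First, the second condition of each formula survives only as an image. This sketch therefore substitutes the bound max|x(i+1) - x(i)| < threshold on the step-to-step change (assuming equally spaced readings), based on the description of threshold1 and threshold2 as limits on the absolute rate of change. Second, the rising case is checked with min(x(i+1) - x(i)) > b rather than the printed max, because with b < 0 a max-based test would accept almost any series that rises at least once. The default thresholds are hypothetical.

```python
def _increments(values):
    """Step-to-step changes x(i+1) - x(i)."""
    if len(values) < 2:
        raise ValueError("need at least two readings")
    return [v1 - v0 for v0, v1 in zip(values, values[1:])]


def is_stable_decreasing(values, a, threshold1):
    """max(x(i+1) - x(i)) < a, plus an ASSUMED bound max|x(i+1) - x(i)| < threshold1."""
    d = _increments(values)
    return max(d) < a and max(abs(x) for x in d) < threshold1


def is_stable_increasing(values, b, threshold2):
    """ASSUMED min(x(i+1) - x(i)) > b (b < 0), plus max|x(i+1) - x(i)| < threshold2."""
    d = _increments(values)
    return min(d) > b and max(abs(x) for x in d) < threshold2


def screen_stable(indicators, a=0.05, b=-0.05, threshold1=0.5, threshold2=0.5):
    """Keep indicators whose readings pass either the decreasing or the increasing check."""
    return {
        name: vals
        for name, vals in indicators.items()
        if is_stable_decreasing(vals, a, threshold1)
        or is_stable_increasing(vals, b, threshold2)
    }


# Hypothetical readings: a gently rising series and a noisy one.
print(screen_stable({"up": [1.0, 1.2, 1.19, 1.5], "noisy": [5.0, 6.0, 4.0, 6.5]}))
```

The small tolerances a and b let a slightly non-monotonic reading (such as 1.2 followed by 1.19) pass as measurement noise.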
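Step S20 includes:
analyzing how the values of the target predictive indicators change over time, and obtaining the mean change slope corresponding to that change relationship;
After obtaining the target predictive indicators, the server can analyze how their values change over time and obtain the corresponding mean change slopes. The specific analysis process is as described above and is not repeated here.
Step S30: analyze, based on a feature attribution method and the historical grouping results of the historical patients, the nonlinear relationship between the mean change slope and the historical grouping results, determine a classification control slope that characterizes the nonlinear relationship, and simulate a control trajectory line in a preset coordinate system according to the classification control slope;
In this embodiment, the server first obtains the mean change slopes of the time-series predictive indicators (target predictive indicators). It then uses the SHAP feature attribution method and the historical grouping results of the historical patients to whom these indicators belong. With these, it analyzes the nonlinear relationship between the mean change slope and the historical patient grouping criterion (patient health status), and finds the classification control slopes that characterize this relationship. The classification control slopes may include an optimal control value k, a positive control value k1 with a typical positive influence on the classification outcome, and a negative control value k2 with a typical negative influence. Together they establish a model that predicts patient grouping from an indicator's mean change slope.

SHAP is a method for explaining the output of machine learning models. It computes the marginal contribution of a feature when that feature is added to the model, considers the feature's different marginal contributions across all feature orderings, and averages them. This average is the feature's SHAP value, which characterizes the nonlinear relationship between the feature and the outcome: the larger the SHAP value, the more positive the feature's influence on the outcome, and the smaller the value, the more negative its influence.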
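Specifically, in this embodiment, several mean change slopes K may be taken as feature variables, which form the full set N, and the historical grouping results of the historical patients are taken as the outcome variable. One feature variable is selected from N as the current variable α, and all subsets of N that contain α are determined (note that these include N itself). These subsets are denoted Ri(γ+α), and their number is denoted n. Removing α from each of them yields the corresponding non-α subset, denoted Ri(γ). A preset algorithm (such as LIME, DeepLIFT, Layer-Wise Relevance Propagation, or Classic Shapley Value Estimation) is then used to compute the contribution F[Ri(γ+α)] of each Ri(γ+α) to the outcome variable, and the contribution F[Ri(γ)] of each Ri(γ). The difference ΔFi between each F[Ri(γ+α)] and the corresponding F[Ri(γ)] is calculated, and the mean of the ΔFi is the SHAP value of the current variable α.

Proceeding in the same way yields the SHAP value of every feature variable, that is, the SHAP value of each mean change slope K with respect to the historical grouping results. The target variables that have a typical influence on the outcome variable are then determined from the magnitudes of the SHAP values. The mean change slopes corresponding to these target variables are taken as the classification control slopes, for example the optimal control value k, the positive control value k1 with a typical positive influence on the classification outcome, and the negative control value k2 with a typical negative influence. Once these control values are obtained, a predictive model for grouping the relevant patients is considered established: given an input mean slope, the model outputs a prediction according to how the input compares with its control values.

For example, consider a model that predicts the historical grouping result from an indicator's mean change slope K. The SHAP feature attribution method is used to analyze the nonlinear relationship between K and the historical grouping result. This relationship is illustrated by the plot of SHAP value for K against K output by the server, shown in FIG. 3. In FIG. 3, the x-axis is the mean change slope K and the y-axis is the degree of influence of K on the historical grouping result; y > 0 indicates a positive influence and y < 0 a negative influence. In this plot, the value of K at which the SHAP value equals 0 is found and denoted k. This is the cutoff value (the optimal control value k). For K > k and K < k, the change in value has, respectively, a positive or a negative effect on the classification result. It is therefore further necessary to find the mean slope with a typical positive effect, k1 (the positive control value), and the mean slope with a typical negative effect, k2 (the negative control value), as the basis for the center-line slopes of the classification. Here, the value of K at which the SHAP value equals 1 in the plot can be taken as k1, and the value of K at which the SHAP value equals -1 as k2.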
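The SHAP computation in this step can be illustrated with an exact Shapley value over all subsets. This is a sketch, not the applicant's code. The value function F is a toy callable standing in for the preset algorithm, and the feature names are hypothetical. Note that the classic Shapley value weights each ΔFi by |S|!(n-|S|-1)!/n!, whereas the text takes a plain mean of the ΔFi. The `weighted` flag shows both, and the majority-game example shows that they can differ. Enumeration is exponential in |N|, which is why SHAP tooling normally approximates it.

```python
from itertools import combinations
from math import factorial


def shapley_value(players, value, target, weighted=True):
    """Shapley value of `target` (the current variable alpha) in the full set `players` (N).

    value: callable mapping a frozenset of players to its contribution F[S].
    weighted=True  -> classic Shapley weights |S|! (n - |S| - 1)! / n!
    weighted=False -> plain mean of the Delta F_i over all subsets, as the text describes
    """
    if target not in players:
        raise ValueError("target must be one of the players")
    others = [p for p in players if p != target]
    n = len(players)
    total, count = 0.0, 0
    for r in range(len(others) + 1):
        for subset in combinations(others, r):
            without = frozenset(subset)                  # R_i(gamma)
            with_target = without | {target}             # R_i(gamma + alpha)
            delta = value(with_target) - value(without)  # Delta F_i
            if weighted:
                total += factorial(r) * factorial(n - r - 1) / factorial(n) * delta
            else:
                total += delta
                count += 1
    return total if weighted else total / count


# Toy additive game: each hypothetical slope feature contributes a fixed amount.
weights = {"K_hba1c": 0.5, "K_egfr": -0.2, "K_sbp": 0.1}


def additive(subset):
    return sum(weights[p] for p in subset)


# Toy majority game: any two of three features produce the outcome.
def majority(subset):
    return 1.0 if len(subset) >= 2 else 0.0


players = list(weights)
print({p: round(shapley_value(players, additive, p), 6) for p in players})
# Shapley gives about 0.3333 here, the plain mean gives 0.5.
print(shapley_value(["A", "B", "C"], majority, "A"),
      shapley_value(["A", "B", "C"], majority, "A", weighted=False))
```

In the majority game the weighted values for A, B, and C sum to v(N) - v(∅) = 1, as Shapley values must. The plain means sum to 1.5, so the two readings are not interchangeable in general.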
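After the classification control slopes (k, k1, k2) are obtained, the corresponding control trajectory lines can be fitted in the preset coordinate system according to them. These control trajectory lines divide the value-change trajectories of the time-series predictive indicators into three typical types. They can be written as y = k*x + b, y1 = k1*x + b1, and y2 = k2*x + b2, where b, b1, and b2 are constants. Here y represents indicator trajectories with no obvious influence on the historical grouping results, y1 represents trajectories with an obvious positive influence, and y2 represents trajectories with an obvious negative influence. These three control trajectory lines are the center lines of the data trajectories corresponding to the indicator change trends associated with the historical grouping results. Note that, in practice, the number of classification control slopes and control trajectory lines can be defined according to the actual situation.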
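Reading k, k1, and k2 off the SHAP-versus-K curve of FIG. 3 can be sketched with linear interpolation between sampled points. Several choices here are assumptions. The sample curve is made up. If the curve reaches a level more than once, the first crossing is returned. Giving all three control lines one shared intercept is also an assumption, since the text only states that the intercepts are constants.

```python
def k_at_shap_level(ks, shap_values, level):
    """First K at which the piecewise-linear SHAP-vs-K curve reaches `level`, else None."""
    pts = sorted(zip(ks, shap_values))
    for (k0, s0), (k1, s1) in zip(pts, pts[1:]):
        if s0 == level:
            return k0
        if (s0 - level) * (s1 - level) < 0:
            # Linear interpolation between the two bracketing sample points.
            return k0 + (level - s0) * (k1 - k0) / (s1 - s0)
    if pts and pts[-1][1] == level:
        return pts[-1][0]
    return None


def control_lines(ks, shap_values, intercept=0.0):
    """Return {'k': (slope, b), 'k1': ..., 'k2': ...} for SHAP levels 0, +1 and -1.

    All lines share one intercept here (an assumption).
    """
    levels = {"k": 0.0, "k1": 1.0, "k2": -1.0}
    lines = {}
    for name, level in levels.items():
        slope = k_at_shap_level(ks, shap_values, level)
        if slope is not None:
            lines[name] = (slope, intercept)
    return lines


# Hypothetical samples of the SHAP value for K at several mean change slopes K.
ks = [-0.3, -0.2, -0.1, 0.0, 0.1]
shap = [-1.5, -1.0, -0.4, 0.5, 1.2]
print(control_lines(ks, shap, intercept=7.5))
```

If the sampled curve never reaches +1 or -1, the corresponding line is simply omitted. This mirrors the remark that the number of control lines can be set according to the actual situation.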
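Step S40: obtain the current patient's time-series test indicators according to the indicator type of the time-series predictive indicators, and fit a corresponding test trajectory line in the preset coordinate system according to the time-series test indicators;
In this embodiment, once the control trajectory lines are obtained, patients can be grouped by combining the control trajectory lines with the current patient's time-series test indicators. First, the server obtains the current patient's time-series test indicators according to the indicator type of the time-series predictive indicators, that is, the test indicators corresponding to the control trajectory lines. For example, these are glycated hemoglobin, blood glucose concentration, and blood pressure for diabetic patients, and the glomerular filtration rate for patients with chronic kidney disease.
Specifically, the step of obtaining the current patient's time-series test indicators according to the indicator type of the time-series predictive indicators includes:
obtaining, from the preset database, the current patient's periodic physical examination data within a preset period, and filtering the periodic physical examination data according to the indicator type of the time-series predictive indicators to obtain the time-series test indicators corresponding to that indicator type.
In this embodiment, to make it easier for the current patient to provide information, the server can automatically identify and filter the time-series test indicators from the current patient's physical examination data. Specifically, after a physical examination (or certain medical checks), the current patient can upload the examination data to the database (such as a hospital's medical system database), either personally or by authorizing someone else. The server connects to the database and obtains the current patient's periodic physical examination data within a preset period. It then filters these data according to the indicator type of the time-series predictive indicators to obtain the corresponding time-series test indicators, and performs subsequent analysis on them. This improves the efficiency of obtaining indicators (data) and makes it easier for the current patient to provide the relevant test results.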
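The filtering of periodic examination data can be sketched as a date-window and indicator-type filter. The record layout (exam_date, indicator_name, value) and the indicator names are hypothetical. Decryption and permission handling are left out of this sketch.

```python
from datetime import date


def select_test_indicators(records, indicator_types, start, end):
    """Filter periodic exam records down to the predictive indicator types.

    records: iterable of (exam_date, indicator_name, value) tuples (hypothetical layout).
    Returns dict indicator_name -> list of (exam_date, value) sorted by date,
    restricted to start <= exam_date <= end.
    """
    wanted = set(indicator_types)
    series = {name: [] for name in wanted}
    for exam_date, name, value in records:
        if name in wanted and start <= exam_date <= end:
            series[name].append((exam_date, value))
    for points in series.values():
        points.sort()
    return series


records = [
    (date(2019, 6, 1), "HbA1c", 7.4),
    (date(2019, 1, 5), "HbA1c", 7.9),
    (date(2019, 3, 2), "ALT", 30.0),   # not a predictive indicator type, dropped
    (date(2018, 1, 1), "HbA1c", 8.5),  # outside the preset period, dropped
]
print(select_test_indicators(records, ["HbA1c"], date(2019, 1, 1), date(2019, 12, 31)))
```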
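Further, patients' physical examination data are private, so the security of the current patient's stored examination data can be improved through permissions and encryption. Specifically, before the step of obtaining the current patient's periodic physical examination data within the preset period from the preset database, the method further includes:
sending a data acquisition request to the patient terminal;
In this embodiment, each patient's physical examination data is stored in the database in tables keyed by that patient's account identifier. The data is stored in encrypted form, and the decryption key is kept by the patient, which improves the security of data storage. Before obtaining the current patient's periodic physical examination data, the server first sends a data acquisition request to the current patient's patient terminal (such as a mobile phone or tablet) to obtain permission to retrieve the examination data.
receiving data permission information returned by the patient terminal, and parsing the data permission information to obtain the corresponding patient account identifier and patient data key;
In this embodiment, if the current patient agrees to let the server retrieve the examination data, the patient can operate the patient terminal to return the corresponding data permission information to the server. This information includes the patient account identifier and the patient data key. On receiving the data permission information, the server parses it to obtain the patient account identifier and the patient data key.
The step of obtaining the current patient's periodic physical examination data within the preset period from the preset database includes:
accessing the preset database using the patient account identifier to obtain the current patient's encrypted physical examination data;
In this embodiment, after obtaining the patient account identifier and the patient data key, the server accesses the preset database with the patient account identifier, queries the corresponding data table (account data), and obtains the current patient's encrypted physical examination data;
decrypting the encrypted physical examination data with the patient data key, and obtaining the current patient's periodic physical examination data within the preset period from the decryption result.
In this embodiment, after obtaining the current patient's encrypted physical examination data, the server decrypts it with the patient data key and obtains the current patient's periodic physical examination data within the preset period from the decryption result.
After obtaining the periodic physical examination data, the server filters it according to the indicator type of the time-series predictive indicators to obtain the corresponding time-series test indicators. Then, taking the values of the time-series test indicators as the dependent variable (y-axis) and time as the independent variable (x-axis), it fits the corresponding test trajectory line in the preset coordinate system.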
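The text does not specify how the test trajectory line is fitted. A least-squares straight line is one natural choice and is assumed in this sketch. Dates are converted to elapsed days from the first examination, so the slope is expressed per day. Any other time unit works, provided the control slopes use the same one. The readings are hypothetical.

```python
from datetime import date


def to_elapsed_days(dates):
    """Convert examination dates to days since the earliest one (x-axis values)."""
    first = min(dates)
    return [(d - first).days for d in dates]


def fit_test_trajectory(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept through the (x, y) points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two points of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct time points")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical HbA1c results taken from the filtered periodic exam data.
dates = [date(2019, 1, 5), date(2019, 4, 5), date(2019, 7, 4)]
values = [7.9, 7.6, 7.4]
print(fit_test_trajectory(to_elapsed_days(dates), values))
```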
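Step S50: compare the positions of the test trajectory line and the control trajectory line, and determine the current patient's grouping result according to the positional relationship between the test trajectory line and the control trajectory line and the historical grouping results of the historical patients.
In this embodiment, once the test trajectory line is obtained, its position can be compared with that of the control trajectory lines, and its trajectory type is determined from their positional relationship. Different positional relationships correspond to different historical grouping results of the historical patients. Once the positional relationship is determined, the current patient's grouping result, and hence the group of similar patients, can therefore be determined from it.

Specifically, take a single control trajectory line as an example, where the historical grouping results comprise two outcomes. In the preset coordinate system, the control trajectory line divides a target quadrant into at least two sub-regions, each corresponding to one historical grouping result. The target sub-region in which the test trajectory line lies is then determined, and the historical grouping result corresponding to that sub-region is the current patient's grouping result. Note that, to make the positions of the control and test trajectory lines easier to compare, the two lines can be translated during the comparison so that they intersect the y-axis or the x-axis at the same point.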
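After the translation step, every line passes through the same point on the y-axis. In the quadrant x > 0, the test line therefore lies above a control line exactly when its slope is larger. The sub-region comparison thus reduces to locating the test slope among the sorted control slopes, as in this sketch. The group labels are hypothetical. Assigning a slope equal to a control slope to the upper region is an arbitrary tie-breaking choice.

```python
import bisect


def assign_group(test_slope, control_slopes, region_labels):
    """Map a test trajectory slope to the sub-region bounded by the control lines.

    Assumes all lines share one y-intercept, so above/below for x > 0 is decided by slope.
    region_labels[0] is the region below the lowest-slope control line and
    region_labels[-1] the region above the highest-slope one.
    """
    slopes = sorted(control_slopes)
    if len(region_labels) != len(slopes) + 1:
        raise ValueError("need exactly one more region label than control lines")
    return region_labels[bisect.bisect_right(slopes, test_slope)]


# One control line (k = -0.05) splitting the target quadrant into two historical groups.
print(assign_group(-0.08, [-0.05], ["group A", "group B"]))
```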
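Further, after step S50, the method further includes:
sending the current patient's grouping result to the corresponding diagnosis and treatment terminal;
In this embodiment, after obtaining the current patient's grouping result, the server can send it to the corresponding diagnosis and treatment terminal to give medical staff a reference for diagnosing and treating the current patient.
when grouping adjustment information returned by the diagnosis and treatment terminal is received, adjusting the current patient's grouping result according to the grouping adjustment information, and storing the current patient's adjusted grouping result in association with the time-series test indicators in the preset database.
In this embodiment, the grouping result provided by the server is only for reference, and medical staff may adjust it. When an adjustment is needed, they can return the corresponding grouping adjustment information to the server through the diagnosis and treatment terminal. On receiving it, the server adjusts the current patient's grouping result accordingly and stores the adjusted grouping result in association with the time-series test indicators in the database for subsequent reference. In this way, more sample data can be accumulated continuously from actual clinical practice, which facilitates later optimization and adjustment of the analysis process.
Still further, the data analysis method of this embodiment also includes:
when the number of times grouping adjustment information has been received exceeds a preset threshold, re-obtaining the corresponding control trajectory lines according to the time-series test indicators newly stored in the preset database and their corresponding grouping results.
In this embodiment, the server also counts how many times grouping adjustment information has been received. When this count exceeds a preset threshold, the control trajectory lines previously determined and currently in use can be considered inconsistent with the actual situation. The server can then retrieve the newly stored time-series test indicators and their corresponding grouping results, and re-analyze them to re-obtain the control trajectory lines for subsequent patient grouping. The re-acquisition process is as described in the steps above and is not repeated here. In this way, the control trajectory lines can be continuously optimized and adjusted according to actual clinical practice, improving the accuracy and reliability of patient grouping.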
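The recalibration trigger can be sketched as a counter that calls a refit routine once the number of grouping adjustments exceeds the preset threshold. `refit` is a hypothetical callable standing in for re-running steps S10 to S30. Two further choices are assumptions, because the text does not say what happens to the count afterward: the counter is reset after each refit, and the refit receives only the adjusted records collected since the last one.

```python
class RecalibrationMonitor:
    """Counts grouping adjustments and triggers a (hypothetical) refit of the control lines."""

    def __init__(self, threshold, refit):
        self.threshold = threshold    # the preset threshold on the adjustment count
        self.refit = refit            # callable(records) -> new control lines (hypothetical)
        self.adjustment_count = 0
        self.new_records = []         # (time-series test indicators, adjusted group) pairs

    def record_adjustment(self, test_indicators, corrected_group):
        """Store one adjusted result; return the refit result when the count exceeds the threshold."""
        self.new_records.append((test_indicators, corrected_group))
        self.adjustment_count += 1
        if self.adjustment_count > self.threshold:
            result = self.refit(list(self.new_records))
            self.adjustment_count = 0
            self.new_records.clear()
            return result
        return None


# Toy refit that just reports how many records it was given.
monitor = RecalibrationMonitor(threshold=2, refit=lambda recs: len(recs))
for group in ["group A", "group B", "group A"]:
    print(monitor.record_adjustment({"HbA1c": [7.9, 7.4]}, group))
```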
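This embodiment first analyzes time-series sample indicators of historical chronic-disease patients whose values change over time, and identifies time-series predictive indicators correlated with disease progression. It then analyzes and determines the change trends of these indicators for different historical patient groups, providing a reference basis for grouping chronic-disease patients. Finally, it compares and matches the change trend over time of the current patient's time-series test indicators against those trends to determine the current patient's grouping result. Because the embodiments of the present application group patients based on multiple test results, the adverse effect of the chance and randomness of a single test on grouping reliability is reduced and the reliability of patient grouping is improved. This in turn provides an effective reference basis for assessing patients' health.
Based on the embodiment shown in FIG. 2 above, a second embodiment of the data analysis method of the present application is proposed.
In this embodiment, after step S50, the method further includes:
obtaining historical health data of patients of the same type from the preset database according to the current patient's grouping result, and sending the historical health data to a corresponding terminal.
In this embodiment, after obtaining the current patient's grouping result, the server can obtain historical health data of patients of the same type from the database according to that result. It then sends these data to a corresponding terminal (such as the diagnosis and treatment terminal of medical staff or the current patient's patient terminal). This gives the people using those terminals a health reference and facilitates subsequent diagnosis and treatment.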
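In addition, an embodiment of the present application further provides a data analysis apparatus.
Referring to FIG. 4, FIG. 4 is a schematic diagram of the functional modules of a first embodiment of the data analysis apparatus of the present application.
In this embodiment, the data analysis apparatus includes:
an indicator acquisition module 10, configured to access a preset database, obtain time-series sample indicators of historical patients from the preset database, and screen the time-series sample indicators by means of a significance test to obtain time-series predictive indicators that are statistically associated with the health information of the historical patients;
a first analysis module 20, configured to analyze how the values of the time-series predictive indicators change over time and obtain the mean change slope corresponding to that change relationship;
a second analysis module 30, configured to analyze, based on a feature attribution method and the historical grouping results of the historical patients, the nonlinear relationship between the mean change slope and the historical grouping results, determine a classification control slope that characterizes the nonlinear relationship, and simulate a control trajectory line in a preset coordinate system according to the classification control slope;
a trajectory fitting module 40, configured to obtain the current patient's time-series test indicators according to the indicator type of the time-series predictive indicators, and fit a corresponding test trajectory line in the preset coordinate system according to the time-series test indicators;
a position comparison module 50, configured to compare the positions of the test trajectory line and the control trajectory line, and determine the current patient's grouping result according to the positional relationship between the test trajectory line and the control trajectory line and the historical grouping results of the historical patients.
The virtual functional modules of the above data analysis apparatus are stored in the memory 1005 of the data analysis device shown in FIG. 1 and are used to implement all functions of the computer program. When executed by the processor 1001, the modules implement the patient grouping function.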
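Further, the data analysis apparatus also includes:
an indicator screening module, configured to perform stability screening on the time-series predictive indicators to obtain target predictive indicators that satisfy a preset change rule;
the first analysis module 20 is further configured to analyze how the values of the target predictive indicators change over time and obtain the mean change slope corresponding to that change relationship.
Further, the preset change rule includes monotonic decrease and/or monotonic increase;
the indicator screening module is specifically configured to perform stability screening on the time-series predictive indicators using a first formula to obtain target predictive indicators that satisfy the monotonic-decrease rule, the first formula being
max(x(i+1) - x(i)) < a, and
[Figure PCTCN2020112468-appb-000003]
and/or to perform stability screening on the time-series predictive indicators using a second formula to obtain target predictive indicators that satisfy the monotonic-increase rule, the second formula being
max(x(i+1) - x(i)) > b, and
[Figure PCTCN2020112468-appb-000004]
where x(i+1) is the data value of the time-series predictive indicator at time i+1, and x(i) is its data value at time i;
a is a constant greater than zero, and b is a constant less than zero;
threshold1 and threshold2 are both constants greater than zero.
Further, the second analysis module 30 includes:
a slope determination unit, configured to take the mean change slopes as feature variables and the historical grouping results as the outcome variable, wherein the feature variables form a full set N. The unit selects one feature variable from N as the current variable α, determines all subsets Ri(γ+α) of N that contain α, and determines, for each Ri(γ+α), the corresponding non-α subset Ri(γ) that does not contain α. By a preset algorithm it calculates the contribution F[Ri(γ+α)] of each Ri(γ+α) to the outcome variable and the contribution F[Ri(γ)] of each Ri(γ) to the outcome variable. It calculates the contribution difference ΔFi between each F[Ri(γ+α)] and the corresponding F[Ri(γ)], and takes the mean of the ΔFi as the SHAP value of α. It calculates the SHAP value of each feature variable in N in this way, determines from the magnitudes of these SHAP values the target variable having a typical influence on the outcome variable, and takes the mean change slope corresponding to the target variable as the classification control slope.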
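Further, the trajectory fitting module 40 includes a data acquisition unit;
the data acquisition unit is configured to obtain, from the preset database, the current patient's periodic physical examination data within a preset period, and filter the periodic physical examination data according to the indicator type of the time-series predictive indicators to obtain the time-series test indicators corresponding to that indicator type.
Further, the data analysis apparatus also includes:
a data sending module, configured to obtain historical health data of patients of the same type from the preset database according to the current patient's grouping result, and send the historical health data to a corresponding terminal.
The functions of the modules in the above data analysis apparatus correspond to the steps in the embodiments of the data analysis method above, and their functions and implementation are not repeated here one by one.
In addition, an embodiment of the present application further provides a computer-readable storage medium, which may be non-volatile or volatile.
The computer-readable storage medium of the present application stores a computer program, wherein the computer program, when executed by a processor, implements the steps of the data analysis method described above.
For the method implemented when the computer program is executed, reference may be made to the embodiments of the data analysis method of the present application, which are not repeated here.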
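It should be noted that, herein, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion. A process, method, article, or system that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "including a …" does not exclude the presence of additional identical elements in the process, method, article, or system that includes it.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied as a software product. This product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of the present application.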

Claims (20)

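  1. A data analysis method, comprising:
    accessing a preset database, obtaining time-series sample indicators of historical patients from the preset database, and screening the time-series sample indicators by means of a significance test to obtain time-series predictive indicators that are statistically associated with health information of the historical patients;
    analyzing how values of the time-series predictive indicators change over time, and obtaining a mean change slope corresponding to the change relationship;
    analyzing, based on a feature attribution method and historical grouping results of the historical patients, a nonlinear relationship between the mean change slope and the historical grouping results, determining a classification control slope that characterizes the nonlinear relationship, and simulating a control trajectory line in a preset coordinate system according to the classification control slope;
    obtaining time-series test indicators of a current patient according to an indicator type of the time-series predictive indicators, and fitting a corresponding test trajectory line in the preset coordinate system according to the time-series test indicators;
    comparing positions of the test trajectory line and the control trajectory line, and determining a grouping result of the current patient according to a positional relationship between the test trajectory line and the control trajectory line and the historical grouping results of the historical patients.
  2. The data analysis method according to claim 1, wherein before the step of analyzing how the values of the time-series predictive indicators change over time and obtaining the mean change slope corresponding to the change relationship, the method further comprises:
    performing stability screening on the time-series predictive indicators to obtain target predictive indicators that satisfy a preset change rule;
    and the step of analyzing how the values of the time-series predictive indicators change over time and obtaining the mean change slope corresponding to the change relationship comprises:
    analyzing how values of the target predictive indicators change over time, and obtaining the mean change slope corresponding to the change relationship.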
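  3. The data analysis method according to claim 2, wherein the preset change rule comprises monotonic decrease and/or monotonic increase,
    and the step of performing stability screening on the time-series predictive indicators to obtain target predictive indicators that satisfy a preset change rule comprises:
    performing stability screening on the time-series predictive indicators using a first formula to obtain target predictive indicators that satisfy a monotonic-decrease rule, the first formula being
    [Figure PCTCN2020112468-appb-100001]
    and/or performing stability screening on the time-series predictive indicators using a second formula to obtain target predictive indicators that satisfy a monotonic-increase rule, the second formula being
    [Figure PCTCN2020112468-appb-100002]
    wherein x(i+1) is the data value of the time-series predictive indicator at time i+1, and x(i) is the data value of the time-series predictive indicator at time i;
    a is a constant greater than zero, and b is a constant less than zero;
    threshold1 and threshold2 are both constants greater than zero.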
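  4. The data analysis method according to claim 1, wherein the step of analyzing, based on the feature attribution method and the historical grouping results of the historical patients, the nonlinear relationship between the mean change slope and the historical grouping results, and determining the classification control slope that characterizes the nonlinear relationship comprises:
    taking the mean change slopes as feature variables and the historical grouping results as an outcome variable, wherein the feature variables form a full set N;
    selecting one feature variable from N as a current variable α, determining all subsets Ri(γ+α) of N that contain the current variable α, and determining, for each Ri(γ+α), the corresponding non-α subset Ri(γ) that does not contain the current variable α;
    calculating, by a preset algorithm, a contribution F[Ri(γ+α)] of each Ri(γ+α) to the outcome variable and a contribution F[Ri(γ)] of each Ri(γ) to the outcome variable;
    calculating a contribution difference ΔFi between each F[Ri(γ+α)] and the corresponding F[Ri(γ)], and taking the mean of the ΔFi as the SHAP value of the current variable α;
    calculating the SHAP value of each feature variable in N in this way, determining, according to the magnitudes of the SHAP values of the feature variables, a target variable having a typical influence on the outcome variable, and determining the mean change slope corresponding to the target variable as the classification control slope.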
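  5. The data analysis method according to claim 1, wherein the step of comparing the positions of the test trajectory line and the control trajectory line, and determining the grouping result of the current patient according to the positional relationship between the test trajectory line and the control trajectory line and the historical grouping results of the historical patients comprises:
    dividing a target quadrant of the preset coordinate system into at least two sub-regions by the control trajectory line, wherein the sub-regions correspond one-to-one to the historical grouping results of the historical patients;
    determining a target sub-region in which the test trajectory line lies, and determining the grouping result of the current patient according to the historical grouping result corresponding to the target sub-region.
  6. The data analysis method according to claim 1, wherein the step of obtaining the time-series test indicators of the current patient according to the indicator type of the time-series predictive indicators comprises:
    obtaining, from the preset database, periodic physical examination data of the current patient within a preset period, and filtering the periodic physical examination data according to the indicator type of the time-series predictive indicators to obtain the time-series test indicators corresponding to the indicator type of the time-series predictive indicators.
  7. The data analysis method according to any one of claims 1 to 6, wherein after the step of comparing the positions of the test trajectory line and the control trajectory line, and determining the grouping result of the current patient according to the positional relationship between the test trajectory line and the control trajectory line and the historical grouping results of the historical patients, the method further comprises:
    obtaining historical health data of patients of the same type from the preset database according to the grouping result of the current patient, and sending the historical health data to a corresponding terminal.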
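  8. A data analysis apparatus, wherein the data analysis apparatus comprises:
    an indicator acquisition module, configured to access a preset database, obtain time-series sample indicators of historical patients from the preset database, and screen the time-series sample indicators by means of a significance test to obtain time-series predictive indicators that are statistically associated with health information of the historical patients;
    a first analysis module, configured to analyze how values of the time-series predictive indicators change over time and obtain a mean change slope corresponding to the change relationship;
    a second analysis module, configured to analyze, based on a feature attribution method and historical grouping results of the historical patients, a nonlinear relationship between the mean change slope and the historical grouping results, determine a classification control slope that characterizes the nonlinear relationship, and simulate a control trajectory line in a preset coordinate system according to the classification control slope;
    a trajectory fitting module, configured to obtain time-series test indicators of a current patient according to an indicator type of the time-series predictive indicators, and fit a corresponding test trajectory line in the preset coordinate system according to the time-series test indicators;
    a position comparison module, configured to compare positions of the test trajectory line and the control trajectory line, and determine a grouping result of the current patient according to a positional relationship between the test trajectory line and the control trajectory line and the historical grouping results of the historical patients.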
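  9. A data analysis device, wherein the data analysis device comprises a processor, a memory, and a computer program stored in the memory and executable by the processor, and the computer program, when executed by the processor, implements the following steps:
    accessing a preset database, obtaining time-series sample indicators of historical patients from the preset database, and screening the time-series sample indicators by means of a significance test to obtain time-series predictive indicators that are statistically associated with health information of the historical patients;
    analyzing how values of the time-series predictive indicators change over time, and obtaining a mean change slope corresponding to the change relationship;
    analyzing, based on a feature attribution method and historical grouping results of the historical patients, a nonlinear relationship between the mean change slope and the historical grouping results, determining a classification control slope that characterizes the nonlinear relationship, and simulating a control trajectory line in a preset coordinate system according to the classification control slope;
    obtaining time-series test indicators of a current patient according to an indicator type of the time-series predictive indicators, and fitting a corresponding test trajectory line in the preset coordinate system according to the time-series test indicators;
    comparing positions of the test trajectory line and the control trajectory line, and determining a grouping result of the current patient according to a positional relationship between the test trajectory line and the control trajectory line and the historical grouping results of the historical patients.
  10. The data analysis device according to claim 9, wherein before the step of analyzing how the values of the time-series predictive indicators change over time and obtaining the mean change slope corresponding to the change relationship, the steps further comprise:
    performing stability screening on the time-series predictive indicators to obtain target predictive indicators that satisfy a preset change rule;
    and the step of analyzing how the values of the time-series predictive indicators change over time and obtaining the mean change slope corresponding to the change relationship comprises:
    analyzing how values of the target predictive indicators change over time, and obtaining the mean change slope corresponding to the change relationship.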
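  11. The data analysis device according to claim 10, wherein the preset change rule comprises monotonic decrease and/or monotonic increase,
    and the step of performing stability screening on the time-series predictive indicators to obtain target predictive indicators that satisfy a preset change rule comprises:
    performing stability screening on the time-series predictive indicators using a first formula to obtain target predictive indicators that satisfy a monotonic-decrease rule, the first formula being
    [Figure PCTCN2020112468-appb-100003]
    and/or performing stability screening on the time-series predictive indicators using a second formula to obtain target predictive indicators that satisfy a monotonic-increase rule, the second formula being
    [Figure PCTCN2020112468-appb-100004]
    wherein x(i+1) is the data value of the time-series predictive indicator at time i+1, and x(i) is the data value of the time-series predictive indicator at time i;
    a is a constant greater than zero, and b is a constant less than zero;
    threshold1 and threshold2 are both constants greater than zero.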
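  12. The data analysis device according to claim 9, wherein the step of analyzing, based on the feature attribution method and the historical grouping results of the historical patients, the nonlinear relationship between the mean change slope and the historical grouping results, and determining the classification control slope that characterizes the nonlinear relationship comprises:
    taking the mean change slopes as feature variables and the historical grouping results as an outcome variable, wherein the feature variables form a full set N;
    selecting one feature variable from N as a current variable α, determining all subsets Ri(γ+α) of N that contain the current variable α, and determining, for each Ri(γ+α), the corresponding non-α subset Ri(γ) that does not contain the current variable α;
    calculating, by a preset algorithm, a contribution F[Ri(γ+α)] of each Ri(γ+α) to the outcome variable and a contribution F[Ri(γ)] of each Ri(γ) to the outcome variable;
    calculating a contribution difference ΔFi between each F[Ri(γ+α)] and the corresponding F[Ri(γ)], and taking the mean of the ΔFi as the SHAP value of the current variable α;
    calculating the SHAP value of each feature variable in N in this way, determining, according to the magnitudes of the SHAP values of the feature variables, a target variable having a typical influence on the outcome variable, and determining the mean change slope corresponding to the target variable as the classification control slope.
  13. The data analysis device according to claim 9, wherein the step of comparing the positions of the test trajectory line and the control trajectory line, and determining the grouping result of the current patient according to the positional relationship between the test trajectory line and the control trajectory line and the historical grouping results of the historical patients comprises:
    dividing a target quadrant of the preset coordinate system into at least two sub-regions by the control trajectory line, wherein the sub-regions correspond one-to-one to the historical grouping results of the historical patients;
    determining a target sub-region in which the test trajectory line lies, and determining the grouping result of the current patient according to the historical grouping result corresponding to the target sub-region.
  14. The data analysis device according to claim 9, wherein the step of obtaining the time-series test indicators of the current patient according to the indicator type of the time-series predictive indicators comprises:
    obtaining, from the preset database, periodic physical examination data of the current patient within a preset period, and filtering the periodic physical examination data according to the indicator type of the time-series predictive indicators to obtain the time-series test indicators corresponding to the indicator type of the time-series predictive indicators.
  15. The data analysis device according to any one of claims 9 to 14, wherein after the step of comparing the positions of the test trajectory line and the control trajectory line, and determining the grouping result of the current patient according to the positional relationship between the test trajectory line and the control trajectory line and the historical grouping results of the historical patients, the steps further comprise:
    obtaining historical health data of patients of the same type from the preset database according to the grouping result of the current patient, and sending the historical health data to a corresponding terminal.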
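  16. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the following steps:
    accessing a preset database, obtaining time-series sample indicators of historical patients from the preset database, and screening the time-series sample indicators by means of a significance test to obtain time-series predictive indicators that are statistically associated with health information of the historical patients;
    analyzing how values of the time-series predictive indicators change over time, and obtaining a mean change slope corresponding to the change relationship;
    analyzing, based on a feature attribution method and historical grouping results of the historical patients, a nonlinear relationship between the mean change slope and the historical grouping results, determining a classification control slope that characterizes the nonlinear relationship, and simulating a control trajectory line in a preset coordinate system according to the classification control slope;
    obtaining time-series test indicators of a current patient according to an indicator type of the time-series predictive indicators, and fitting a corresponding test trajectory line in the preset coordinate system according to the time-series test indicators;
    comparing positions of the test trajectory line and the control trajectory line, and determining a grouping result of the current patient according to a positional relationship between the test trajectory line and the control trajectory line and the historical grouping results of the historical patients.
  17. The computer-readable storage medium according to claim 16, wherein before the step of analyzing how the values of the time-series predictive indicators change over time and obtaining the mean change slope corresponding to the change relationship, the steps further comprise:
    performing stability screening on the time-series predictive indicators to obtain target predictive indicators that satisfy a preset change rule;
    and the step of analyzing how the values of the time-series predictive indicators change over time and obtaining the mean change slope corresponding to the change relationship comprises:
    analyzing how values of the target predictive indicators change over time, and obtaining the mean change slope corresponding to the change relationship.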
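  18. The computer-readable storage medium according to claim 17, wherein the preset change rule comprises monotonic decrease and/or monotonic increase,
    and the step of performing stability screening on the time-series predictive indicators to obtain target predictive indicators that satisfy a preset change rule comprises:
    performing stability screening on the time-series predictive indicators using a first formula to obtain target predictive indicators that satisfy a monotonic-decrease rule, the first formula being
    [Figure PCTCN2020112468-appb-100005]
    and/or performing stability screening on the time-series predictive indicators using a second formula to obtain target predictive indicators that satisfy a monotonic-increase rule, the second formula being
    [Figure PCTCN2020112468-appb-100006]
    wherein x(i+1) is the data value of the time-series predictive indicator at time i+1, and x(i) is the data value of the time-series predictive indicator at time i;
    a is a constant greater than zero, and b is a constant less than zero;
    threshold1 and threshold2 are both constants greater than zero.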
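  19. The computer-readable storage medium according to claim 16, wherein the step of analyzing, based on the feature attribution method and the historical grouping results of the historical patients, the nonlinear relationship between the mean change slope and the historical grouping results, and determining the classification control slope that characterizes the nonlinear relationship comprises:
    taking the mean change slopes as feature variables and the historical grouping results as an outcome variable, wherein the feature variables form a full set N;
    selecting one feature variable from N as a current variable α, determining all subsets Ri(γ+α) of N that contain the current variable α, and determining, for each Ri(γ+α), the corresponding non-α subset Ri(γ) that does not contain the current variable α;
    calculating, by a preset algorithm, a contribution F[Ri(γ+α)] of each Ri(γ+α) to the outcome variable and a contribution F[Ri(γ)] of each Ri(γ) to the outcome variable;
    calculating a contribution difference ΔFi between each F[Ri(γ+α)] and the corresponding F[Ri(γ)], and taking the mean of the ΔFi as the SHAP value of the current variable α;
    calculating the SHAP value of each feature variable in N in this way, determining, according to the magnitudes of the SHAP values of the feature variables, a target variable having a typical influence on the outcome variable, and determining the mean change slope corresponding to the target variable as the classification control slope.
  20. The computer-readable storage medium according to claim 16, wherein the step of comparing the positions of the test trajectory line and the control trajectory line, and determining the grouping result of the current patient according to the positional relationship between the test trajectory line and the control trajectory line and the historical grouping results of the historical patients comprises:
    dividing a target quadrant of the preset coordinate system into at least two sub-regions by the control trajectory line, wherein the sub-regions correspond one-to-one to the historical grouping results of the historical patients;
    determining a target sub-region in which the test trajectory line lies, and determining the grouping result of the current patient according to the historical grouping result corresponding to the target sub-region.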
PCT/CN2020/112468 2019-09-18 2020-08-31 Data analysis method, apparatus, device and computer-readable storage medium WO2021052156A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910884245.8 2019-09-18
CN201910884245.8A CN110782989B (zh) 2019-09-18 2019-09-18 Data analysis method, apparatus, device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021052156A1 true WO2021052156A1 (zh) 2021-03-25

Family

ID=69384226

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/112468 WO2021052156A1 (zh) 2019-09-18 2020-08-31 Data analysis method, apparatus, device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN110782989B (zh)
WO (1) WO2021052156A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
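CN113159453A (zh) * 2021-05-17 2021-07-23 北京字跳网络技术有限公司 Resource data prediction method, apparatus, device and storage medium
CN116089491A (zh) * 2022-12-15 2023-05-09 清华大学 Retrieval and matching method and apparatus based on a time-series database
CN116682566A (zh) * 2023-08-03 2023-09-01 青岛市中医医院(青岛市海慈医院、青岛市康复医学研究所) Data processing method and system for hemodialysis
CN117150891A (zh) * 2023-08-15 2023-12-01 幂光新材料科技(上海)有限公司 Data-driven intelligent prediction method and system for LED lamp bead power
CN117373664A (zh) * 2023-10-09 2024-01-09 曜立科技(北京)有限公司 Digital-therapy-based early-warning system for analyzing post-coronary-surgery risk data
CN117708764A (zh) * 2024-02-06 2024-03-15 青岛天高智慧科技有限公司 Intelligent analysis method for student consumption data based on a campus all-in-one card
CN117854732A (zh) * 2024-03-08 2024-04-09 微脉技术有限公司 Chronic disease management method and system based on big data analysis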

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
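CN110782989B (zh) * 2019-09-18 2022-06-17 平安科技(深圳)有限公司 Data analysis method, apparatus, device and computer-readable storage medium
CN111401788B (zh) * 2020-04-10 2022-03-25 支付宝(杭州)信息技术有限公司 Attribution method and apparatus for business time-series indicators
CN111461055A (zh) * 2020-04-14 2020-07-28 上海异工同智信息科技有限公司 Method, apparatus and electronic device for identifying the state of a signal to be monitored
CN111755125B (zh) * 2020-07-07 2024-04-23 医渡云(北京)技术有限公司 Method, apparatus, medium and electronic device for analyzing patient measurement indicators
CN111816310A (zh) * 2020-07-16 2020-10-23 山东大学 System for calculating risk-factor contribution rates and predicting risk for bone marrow and blood diseases
CN112151136A (zh) * 2020-09-30 2020-12-29 上海依智医疗技术有限公司 Medical data processing method, apparatus and storage medium
CN114496264B (zh) * 2022-04-14 2022-07-19 深圳市瑞安医疗服务有限公司 Health index analysis method, apparatus, device and medium based on multidimensional data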

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130151516A1 (en) * 2011-12-08 2013-06-13 Electronics And Telecommunications Research Institute Clinical data analysis apparatus and clinical data analysis method
US20180150609A1 (en) * 2016-11-29 2018-05-31 Electronics And Telecommunications Research Institute Server and method for predicting future health trends through similar case cluster based prediction models
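CN108139383A (zh) * 2015-05-06 2018-06-08 普雷西恩医药控股有限责任公司 Research on a novel automated screening method for detecting FVIII inhibitors
CN109493979A (zh) * 2018-10-23 2019-03-19 平安科技(深圳)有限公司 Disease prediction method and apparatus based on intelligent decision-making
CN109509549A (zh) * 2018-05-28 2019-03-22 平安医疗健康管理股份有限公司 Method, apparatus, computer device and storage medium for evaluating diagnosis and treatment service providers
CN109634801A (zh) * 2018-10-31 2019-04-16 深圳壹账通智能科技有限公司 Data trend analysis method, system, computer apparatus and readable storage medium
CN110163195A (zh) * 2018-02-14 2019-08-23 中国医药大学附设医院 Liver cancer grouping prediction model, prediction system thereof, and liver cancer grouping determination method
CN110782989A (zh) * 2019-09-18 2020-02-11 平安科技(深圳)有限公司 Data analysis method, apparatus, device and computer-readable storage medium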

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110161094A1 (en) * 2002-08-23 2011-06-30 Dxcg, Inc. System and method for health care costs and outcomes modeling using dosage and routing pharmacy information
US20170342503A1 (en) * 2016-05-24 2017-11-30 The Board Of Regents Of The University Of Texas System Xrn2 as a determinant of sensitivity to dna damage
CN106778042A (zh) * 2017-01-26 2017-05-31 中电科软件信息服务有限公司 Similarity analysis method and system for cardio-cerebrovascular patients
WO2019160504A1 (en) * 2018-02-13 2019-08-22 Agency For Science, Technology And Research System and method for assessing clinical event risk based on heart rate complexity
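CN109817338A (zh) * 2019-02-13 2019-05-28 北京大学第三医院(北京大学第三临床医学院) System for assessing and alerting on the risk of chronic disease exacerbation
CN110085318A (zh) * 2019-03-12 2019-08-02 平安科技(深圳)有限公司 Method, apparatus and computer device for predicting future blood glucose values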

Non-Patent Citations (1)

Title
LIN YI, WANG ZHIBO: "Time Series Piecewise Linear Representation Method Based on First-order Filtering", COMPUTER ENGINEERING, vol. 42, no. 9, 15 September 2016 (2016-09-15), pages 151 - 157, XP055792728, ISSN: 1000-3428 *

Cited By (13)

Publication number Priority date Publication date Assignee Title
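CN113159453B (zh) * 2021-05-17 2024-04-30 北京字跳网络技术有限公司 Resource data prediction method, apparatus, device and storage medium
CN113159453A (zh) * 2021-05-17 2021-07-23 北京字跳网络技术有限公司 Resource data prediction method, apparatus, device and storage medium
CN116089491B (zh) * 2022-12-15 2024-01-30 清华大学 Retrieval and matching method and apparatus based on a time-series database
CN116089491A (zh) * 2022-12-15 2023-05-09 清华大学 Retrieval and matching method and apparatus based on a time-series database
CN116682566B (zh) * 2023-08-03 2023-10-31 青岛市中医医院(青岛市海慈医院、青岛市康复医学研究所) Data processing method and system for hemodialysis
CN116682566A (zh) * 2023-08-03 2023-09-01 青岛市中医医院(青岛市海慈医院、青岛市康复医学研究所) Data processing method and system for hemodialysis
CN117150891A (zh) * 2023-08-15 2023-12-01 幂光新材料科技(上海)有限公司 Data-driven method and system for intelligent prediction of LED lamp bead power
CN117150891B (zh) * 2023-08-15 2024-04-26 幂光新材料科技(上海)有限公司 Data-driven method and system for intelligent prediction of LED lamp bead power
CN117373664A (zh) * 2023-10-09 2024-01-09 曜立科技(北京)有限公司 Post-coronary-surgery risk data analysis and early-warning system based on digital therapeutics
CN117373664B (zh) * 2023-10-09 2024-05-28 曜立科技(北京)有限公司 Post-coronary-surgery risk data analysis and early-warning system based on digital therapeutics
CN117708764A (zh) * 2024-02-06 2024-03-15 青岛天高智慧科技有限公司 Intelligent analysis method for student consumption data based on a campus one-card system
CN117708764B (zh) * 2024-02-06 2024-05-03 青岛天高智慧科技有限公司 Intelligent analysis method for student consumption data based on a campus one-card system
CN117854732A (zh) * 2024-03-08 2024-04-09 微脉技术有限公司 Chronic disease management method and system based on big data analysis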

Also Published As

Publication number Publication date
CN110782989A (zh) 2020-02-11
CN110782989B (zh) 2022-06-17

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 20865041
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 20865041
Country of ref document: EP
Kind code of ref document: A1