CN111584063B - Method for evaluating statistical efficiency under different grouping sets - Google Patents


Info

Publication number
CN111584063B
Authority
CN
China
Prior art keywords
specific disease
probability
sampling time
disease number
statistical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910116297.0A
Other languages
Chinese (zh)
Other versions
CN111584063A (en)
Inventor
陈陪蓉
蔡宗宪
陈亮恭
彭莉宁
李威儒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Acer Inc
National Yang Ming University NYMU
Original Assignee
Acer Inc
National Yang Ming University NYMU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Acer Inc, National Yang Ming University NYMU filed Critical Acer Inc
Priority to CN201910116297.0A priority Critical patent/CN111584063B/en
Publication of CN111584063A publication Critical patent/CN111584063A/en
Application granted granted Critical
Publication of CN111584063B publication Critical patent/CN111584063B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Abstract

The invention discloses a method for evaluating statistical efficiency under different grouping sets. The method comprises: setting a plurality of first grouping ranges of a first grouping set corresponding to a sample space; setting a plurality of second grouping ranges of a second grouping set corresponding to the sample space; generating, according to the sample space, a plurality of first probability values and a plurality of first standard deviations corresponding to the first grouping ranges at each sampling time; generating, according to the sample space, a plurality of second probability values and a plurality of second standard deviations corresponding to the second grouping ranges at each sampling time; generating a plurality of statistical indexes corresponding to the first grouping set and the second grouping set; and outputting an efficiency ranking result of the first grouping set and the second grouping set according to the statistical indexes.

Description

Method for evaluating statistical efficiency under different grouping sets
Technical Field
The present invention relates to a method for evaluating statistical performance under different grouping sets, and more particularly, to a method for generating a statistical performance ranking result under different grouping sets according to statistical characteristics.
Background
Medical technology has advanced steadily alongside technology in general. In many current medical procedures, drug trials, and disease management programs, efficacy is evaluated using patients' medical statistics. For example, a physician may relate the number of diseases a patient has to the risk of mortality in order to establish different risk levels. In current medical practice, the Frailty Index is a common disease management index. The Frailty Index is defined as the number of diseases a patient has out of a specific disease set (e.g., 32 specific diseases), from which a relationship between the number of diseases and the risk of death is derived.
When the Frailty Index is used for disease management, the number of diseases and the risk of death can be divided into groups. For example, patients with 0 to 2 diseases have a low risk of mortality and may be regarded as low-risk patients. Patients with 3 to 5 diseases have a moderate risk of mortality and may be regarded as moderate-risk patients. Patients with 6 to 8 diseases have a high risk of mortality and may be regarded as high-risk patients. Patients with 9 or more diseases have an extremely high risk of mortality and may be regarded as extremely-high-risk patients. In other words, when 32 specific diseases are considered, the resulting set of disease-count ranges { [0 to 2], [3 to 5], [6 to 8], [9 to 32] } can be regarded as one grouping set. However, many grouping sets are possible for the 32 specific diseases; for example, { [0 to 1], [2 to 7], [8 to 10], [11 to 32] } is another grouping set.
Currently, when disease management is performed with the Frailty Index, the numbers of diseases corresponding to different death risks can only be set by manual judgment. For example, a physician decides manually whether the grouping set { [0 to 2], [3 to 5], [6 to 8], [9 to 32] } or the grouping set { [0 to 1], [2 to 7], [8 to 10], [11 to 32] } is better. There is currently no automated mechanism for judging the merits of different grouping sets. Disease management using the Frailty Index therefore often consumes considerable manpower. In addition, accuracy is low because no decision criterion is available for judging the quality of different grouping sets.
Disclosure of Invention
An embodiment of the invention provides a method for evaluating statistical performance under different grouping sets. The method comprises: setting a plurality of first grouping ranges of a first grouping set corresponding to a sample space; setting a plurality of second grouping ranges of a second grouping set corresponding to the sample space; generating, according to the sample space, a plurality of first probability values and a plurality of first standard deviations corresponding to the first grouping ranges at each sampling time; generating, according to the sample space, a plurality of second probability values and a plurality of second standard deviations corresponding to the second grouping ranges at each sampling time; generating a plurality of statistical indexes corresponding to the first grouping set and the second grouping set; and outputting a performance ranking result of the first grouping set and the second grouping set according to the statistical indexes. The sample space is the sample space of a stochastic process that varies over time. The statistical indexes are derived from the first probability values and/or the first standard deviations corresponding to the first grouping ranges at each sampling time and from the second probability values and/or the second standard deviations corresponding to the second grouping ranges at each sampling time.
Drawings
FIG. 1 is a block diagram of an embodiment of a statistical performance evaluation system of the present invention.
FIG. 2 is a schematic diagram showing how the statistical performance evaluation system of FIG. 1 generates, according to a sample space, a plurality of statistical properties corresponding to a plurality of first grouping ranges at each sampling time.
FIG. 3 is a schematic diagram showing how the statistical performance evaluation system of FIG. 1 generates, according to a sample space, a plurality of statistical properties corresponding to a plurality of second grouping ranges at each sampling time.
FIG. 4 is a flowchart of a method for evaluating statistical performance under different grouping sets of the statistical performance evaluation system of the present invention.
Wherein reference numerals are as follows:
100 Statistical performance evaluation system
10 Database
11 Processing device
11a Probability value calculation unit
11b Standard deviation calculation unit
11c Statistical index calculation unit
11d Statistical performance ranking unit
T1 to TN Sampling times
G1 First grouping set
S1G1, S2G1, S3G1 and S4G1 probability curves
AG1, BG1, CG1 and DG1 feature points
S1G2, S2G2, S3G2 and S4G2 probability curves
AG2, BG2, CG2 and DG2 feature points
Steps S401 to S405
Detailed Description
FIG. 1 is a block diagram of an embodiment of a statistical performance evaluation system 100 of the present invention. The statistical performance evaluation system 100 automatically generates statistical performance ranking results under different grouping sets. The system can be applied to the analysis of statistical data in any field. For convenience of description, however, it is described below as applied to disease management and mortality risk management in medicine. The statistical performance evaluation system 100 comprises a database 10 and a processing device 11. The database 10 may be any kind of storage, such as the memory of a cloud server or a hard disk storing patient data. The processing device 11 may be any hardware with computing capability, such as a computer, workstation, or server. The database 10 stores a sample space, together with a first grouping set and a second grouping set corresponding to the sample space. The database 10 may also store more than two grouping sets. For simplicity of description, however, the system is described below in terms of a comparison of two grouping sets. The processing device 11 is coupled to the database 10 and may include a probability value calculation unit 11a, a standard deviation calculation unit 11b, a statistical index calculation unit 11c, and a statistical performance ranking unit 11d. The probability value calculation unit 11a and the standard deviation calculation unit 11b are coupled to the database 10. The statistical index calculation unit 11c is coupled to the probability value calculation unit 11a and the standard deviation calculation unit 11b. The statistical performance ranking unit 11d is coupled to the statistical index calculation unit 11c.
In the statistical performance evaluation system 100, the probability value calculation unit 11a, the standard deviation calculation unit 11b, the statistical index calculation unit 11c, and the statistical performance ranking unit 11d are not limited to any particular form of module. For example, each of them may be a software module driven by a program, a hardware module, or a function in a programming language. Any reasonable modification of the modules or technology of the processing device 11 is within the scope of the present disclosure. The processing device 11 may set a plurality of first grouping ranges of the first grouping set corresponding to the sample space stored in the database 10, and a plurality of second grouping ranges of the second grouping set corresponding to the same sample space. According to the sample space, the probability value calculation unit 11a may generate a plurality of first probability values corresponding to the first grouping ranges at each sampling time and a plurality of second probability values corresponding to the second grouping ranges at each sampling time. Likewise, the standard deviation calculation unit 11b may generate a plurality of first standard deviations corresponding to the first grouping ranges at each sampling time and a plurality of second standard deviations corresponding to the second grouping ranges at each sampling time. The statistical index calculation unit 11c may generate a plurality of statistical indexes corresponding to the first grouping set and the second grouping set.
Then, the statistical performance ranking unit 11d may output the statistical performance ranking result of the first grouping set and the second grouping set according to the statistical indexes corresponding to the two grouping sets. In the statistical performance evaluation system 100, the sample space may be the sample space of a stochastic process that varies over time. The statistical indexes may be generated by the statistical index calculation unit 11c according to the first probability values and/or the first standard deviations corresponding to the first grouping ranges at each sampling time, and the second probability values and/or the second standard deviations corresponding to the second grouping ranges at each sampling time. Details of applying the statistical performance evaluation system 100 to disease management and mortality risk management are described below.
As mentioned above, the sample space may be the sample space of a stochastic process that varies over time. For example, the sample space includes data for a number of patients, and the data for each patient may record a physiological condition as a function of time. For example, a patient had 1 disease 3 years ago and 3 diseases 2 years ago. It is therefore expected that patients in the sample space are healthiest at the initial time (the average number of diseases is small). Patient health then gradually worsens over time (the average number of diseases grows), so the probability of death gradually rises. The first grouping set has a plurality of first grouping ranges. For example, the first grouping set may include four first grouping ranges, such as {0, [1-3], 4, [>=5]}. In other words, the first grouping set covers: (1) 0 diseases, (2) 1 to 3 diseases, (3) 4 diseases, and (4) 5 or more diseases. The second grouping set has a plurality of second grouping ranges. For example, the second grouping set may include four second grouping ranges, such as {[0-3], 4, [5-6], [>=7]}. In other words, the second grouping set covers: (1) 0 to 3 diseases, (2) 4 diseases, (3) 5 to 6 diseases, and (4) 7 or more diseases. The objective of the statistical performance evaluation system 100 is to provide a systematic, automatic method for comparing the statistical performance of the first grouping set with that of the second grouping set. In other words, the system can evaluate whether a given set of grouping ranges suits the sample space well enough to serve as a reference.
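As an illustration, the two example grouping sets can be encoded as lists of inclusive bounds, and a disease count can be mapped to the range that contains it. This is only a minimal sketch: the names `G1`, `G2`, and `range_index`, and the use of `None` for an open upper bound, are assumptions made for illustration and do not come from the patent.

```python
from typing import List, Optional, Tuple

# Each grouping range is an inclusive (low, high) pair; high=None means "or more".
GroupingSet = List[Tuple[int, Optional[int]]]

G1: GroupingSet = [(0, 0), (1, 3), (4, 4), (5, None)]  # {0, [1-3], 4, [>=5]}
G2: GroupingSet = [(0, 3), (4, 4), (5, 6), (7, None)]  # {[0-3], 4, [5-6], [>=7]}


def range_index(grouping_set: GroupingSet, disease_count: int) -> int:
    """Return the position of the grouping range that contains disease_count."""
    for i, (low, high) in enumerate(grouping_set):
        if disease_count >= low and (high is None or disease_count <= high):
            return i
    raise ValueError(f"disease count {disease_count} is not covered")


# The same patient (2 diseases) falls into different ranges under G1 and G2.
print(range_index(G1, 2), range_index(G2, 2))
```

Because the ranges within one grouping set do not overlap, the first match is the only match.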
FIG. 2 illustrates how the statistical performance evaluation system 100 generates, according to the sample space, a plurality of statistical properties corresponding to the first grouping ranges at each sampling time. As described above, the first grouping set G1 includes the first grouping ranges {0, [1-3], 4, [>=5]}. Therefore, at the sampling time T1, the probability value calculation unit 11a and the standard deviation calculation unit 11b can generate first probability values and first standard deviations for the first grouping ranges "0", "[1-3]", "4", and "[>=5]". For example, the probability curve corresponding to the first grouping range "0" is S1G1. At the sampling time T1, the corresponding feature point of the sample space is AG1, for which a first probability value (0.43) and a first standard deviation (0.025) may be generated. The probability curve corresponding to the first grouping range "[1-3]" is S2G1. At the sampling time T1, the corresponding feature point is BG1, for which a first probability value (0.60) and a first standard deviation (0.021) may be generated. The probability curve corresponding to the first grouping range "4" is S3G1. At the sampling time T1, the corresponding feature point is CG1, for which a first probability value (0.76) and a first standard deviation (0.0058) may be generated. The probability curve corresponding to the first grouping range "[>=5]" is S4G1. At the sampling time T1, the corresponding feature point is DG1, for which a first probability value (0.91) and a first standard deviation (0.0012) may be generated. The first probability value may be regarded as a risk of death.
It should be understood that the first grouping ranges "0", "[1-3]", "4", and "[>=5]" all belong to the first grouping set G1, and that the value ranges of the first grouping ranges in G1 do not overlap with each other. In FIG. 2, the X-axis is the time axis and the Y-axis is the survival rate, where survival rate = 1 - risk of death. Therefore, from probability curve S1G1 to probability curve S4G1, the probability of death becomes higher and the probability of survival lower as time passes. In general, the more diseases a patient has, the higher the risk of death and the lower the chance of survival. The statistical properties of the first grouping ranges "0", "[1-3]", "4", and "[>=5]" generated at the sampling time T1 by the probability value calculation unit 11a and the standard deviation calculation unit 11b are listed in Table 1.
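TABLE 1 (first grouping set G1 at sampling time T1)
First grouping range    First probability value    First standard deviation
0                       0.43                       0.025
[1-3]                   0.60                       0.021
4                       0.76                       0.0058
[>=5]                   0.91                       0.0012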
The first probability values and first standard deviations shown in Table 1 are for the sampling time T1. However, the probability value calculation unit 11a and the standard deviation calculation unit 11b may generate the corresponding first probability values and first standard deviations at every sampling time. For example, they can generate the first probability values and first standard deviations for the four first grouping ranges {0, [1-3], 4, [>=5]} at the sampling time T2 and, by analogy, at every later sampling time up to TN. The statistical performance evaluation system 100 places no limit on the number of sampling times N; N may be any positive integer.
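The passage above does not state how the probability values and standard deviations are estimated from the sample space. The sketch below therefore rests on loudly stated assumptions: each hypothetical patient record holds a fixed disease count and an optional time of death, the probability value is the fraction of patients in a range who died at or before the sampling time, and the standard deviation is the binomial standard error. The function name `death_risk_by_range` and the sample records are invented for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical record: (number of diseases, time of death or None if still alive).
Record = Tuple[int, Optional[float]]


def death_risk_by_range(
    records: List[Record],
    ranges: List[Tuple[int, Optional[int]]],
    sampling_times: List[float],
) -> Dict[float, List[Tuple[float, float]]]:
    """For each sampling time, return one (probability, std deviation) per range.

    Assumption: probability = fraction of the range's patients who died at or
    before the sampling time; std deviation = sqrt(p * (1 - p) / n).
    """
    result: Dict[float, List[Tuple[float, float]]] = {}
    for t in sampling_times:
        stats = []
        for low, high in ranges:
            # Death times of the patients whose disease count is in this range.
            members = [d for n, d in records
                       if n >= low and (high is None or n <= high)]
            if not members:
                stats.append((float("nan"), float("nan")))
                continue
            deaths = sum(1 for d in members if d is not None and d <= t)
            p = deaths / len(members)
            stats.append((p, math.sqrt(p * (1 - p) / len(members))))
        result[t] = stats
    return result


# Made-up records, for illustration only.
records = [(0, None), (0, 4.0), (2, 1.0), (2, None), (4, 0.5), (6, 0.5), (6, 3.0)]
g1 = [(0, 0), (1, 3), (4, 4), (5, None)]
print(death_risk_by_range(records, g1, [1.0, 5.0]))
```

A survival-analysis estimator could be substituted for the simple proportion without changing the shape of the output, which is one probability value and one standard deviation per grouping range per sampling time.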
FIG. 3 illustrates how the statistical performance evaluation system 100 generates, according to the sample space, a plurality of statistical properties corresponding to the second grouping ranges at each sampling time. As described above, the second grouping set G2 includes the second grouping ranges {[0-3], 4, [5-6], [>=7]}. Therefore, at the sampling time T1, the probability value calculation unit 11a and the standard deviation calculation unit 11b can generate second probability values and second standard deviations for the second grouping ranges "[0-3]", "4", "[5-6]", and "[>=7]". For example, the probability curve corresponding to the second grouping range "[0-3]" is S1G2. At the sampling time T1, the corresponding feature point of the sample space is AG2, for which a second probability value (0.33) and a second standard deviation (0.051) may be generated. The probability curve corresponding to the second grouping range "4" is S2G2. At the sampling time T1, the corresponding feature point is BG2, for which a second probability value (0.49) and a second standard deviation (0.028) may be generated. The probability curve corresponding to the second grouping range "[5-6]" is S3G2. At the sampling time T1, the corresponding feature point is CG2, for which a second probability value (0.60) and a second standard deviation (0.021) may be generated. The probability curve corresponding to the second grouping range "[>=7]" is S4G2. At the sampling time T1, the corresponding feature point is DG2, for which a second probability value (0.89) and a second standard deviation (0.0012) may be generated. The second probability value may be regarded as a risk of death.
It should be understood that the second grouping ranges "[0-3]", "4", "[5-6]", and "[>=7]" all belong to the second grouping set G2, and that the value ranges of the second grouping ranges in G2 do not overlap with each other. In FIG. 3, the X-axis is the time axis and the Y-axis is the survival rate, where survival rate = 1 - risk of death. Therefore, from probability curve S1G2 to probability curve S4G2, the probability of death becomes higher and the probability of survival lower as time passes. In general, the more diseases a patient has, the higher the risk of death and the lower the chance of survival. The statistical properties of the second grouping ranges "[0-3]", "4", "[5-6]", and "[>=7]" generated at the sampling time T1 by the probability value calculation unit 11a and the standard deviation calculation unit 11b are listed in Table 2.
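TABLE 2 (second grouping set G2 at sampling time T1)
Second grouping range    Second probability value    Second standard deviation
[0-3]                    0.33                        0.051
4                        0.49                        0.028
[5-6]                    0.60                        0.021
[>=7]                    0.89                        0.0012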
The second probability values and second standard deviations shown in Table 2 are for the sampling time T1. However, the probability value calculation unit 11a and the standard deviation calculation unit 11b may generate the corresponding second probability values and second standard deviations at every sampling time. For example, they can generate the second probability values and second standard deviations for the four second grouping ranges {[0-3], 4, [5-6], [>=7]} at the sampling time T2 and, by analogy, at every later sampling time up to TN. The algorithms that evaluate statistical performance from these statistical properties are described below.
In the statistical performance evaluation system 100, the first method of statistical performance evaluation is to calculate a discriminative power index. According to the first probability values corresponding to the first grouping ranges at each sampling time, the statistical index calculation unit 11c generates, for each sampling time, a plurality of first probability differences between adjacent first grouping ranges. For example, in Table 1, at the sampling time T1, the first probability difference between the first probability value (0.43) of the first grouping range "0" and the first probability value (0.60) of the first grouping range "[1-3]" is 0.60-0.43=0.17. The first probability difference between the first probability value (0.60) of the first grouping range "[1-3]" and the first probability value (0.76) of the first grouping range "4" is 0.76-0.60=0.16. The first probability difference between the first probability value (0.76) of the first grouping range "4" and the first probability value (0.91) of the first grouping range "[>=5]" is 0.91-0.76=0.15. Then, the statistical index calculation unit 11c may generate the average value and the standard deviation of the first probability differences for each sampling time. For example, the first probability differences {0.17, 0.16, 0.15} have an average value of 0.16 and a standard deviation of 0.01. Then, the statistical index calculation unit 11c may generate the first discriminative power index for each sampling time according to the average value and the standard deviation of the first probability differences. The first discriminative power index may be the ratio of the average value of the first probability differences to their standard deviation. For example, in Table 1, at the sampling time T1, the first discriminative power index is 0.16/0.01=16.
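The worked example above can be reproduced in a few lines. This is a minimal sketch, with the function name `discriminative_power` chosen for illustration. It assumes the sample standard deviation (divisor n-1), because that convention reproduces the stated value of 0.01 for the differences {0.17, 0.16, 0.15}; a population standard deviation would give about 0.0082 instead.

```python
from statistics import mean, stdev
from typing import Sequence


def discriminative_power(probabilities: Sequence[float]) -> float:
    """Ratio of the mean to the sample standard deviation of adjacent gaps.

    Needs at least three probability values (two gaps), and the gaps must not
    all be equal, otherwise the standard deviation is zero.
    """
    gaps = [b - a for a, b in zip(probabilities, probabilities[1:])]
    return mean(gaps) / stdev(gaps)


# First probability values at sampling time T1 for ranges 0, [1-3], 4, [>=5].
table1_t1 = [0.43, 0.60, 0.76, 0.91]
print(round(discriminative_power(table1_t1), 2))
```

A large value means the adjacent ranges are separated by gaps that are both wide on average and similar in size.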
In other words, at the sampling time T1, the first discriminative power index reflects the degree of dispersion of the feature points AG1, BG1, CG1, and DG1. The statistical index calculation unit 11c calculates the first discriminative power index for every sampling time T1 to TN: the first discriminative power index at the sampling time T1 is $D_1(T_1)$, the index at the sampling time T2 is $D_1(T_2)$, and so on, up to the index $D_1(T_N)$ at the sampling time TN. Finally, the statistical index calculation unit 11c obtains the minimum first discriminative power index over all sampling times T1 to TN of the first grouping set G1. The minimum first discriminative power index $\min D_1$ can therefore be expressed as:
$$\min D_1 = \min\{D_1(T_1), D_1(T_2), D_1(T_3), \ldots, D_1(T_N)\}$$
Thus, the minimum first discriminative power index $\min D_1$ of the first grouping set G1 is the discriminative power index at the time point where the probability feature points are least dispersed.
Similarly, according to the second probability values corresponding to the second grouping ranges at each sampling time, the statistical index calculation unit 11c generates, for each sampling time, a plurality of second probability differences between adjacent second grouping ranges. For example, in Table 2, at the sampling time T1, the second probability difference between the second probability value (0.33) of the second grouping range "[0-3]" and the second probability value (0.49) of the second grouping range "4" is 0.49-0.33=0.16. The second probability difference between the second probability value (0.49) of the second grouping range "4" and the second probability value (0.60) of the second grouping range "[5-6]" is 0.60-0.49=0.11. The second probability difference between the second probability value (0.60) of the second grouping range "[5-6]" and the second probability value (0.89) of the second grouping range "[>=7]" is 0.89-0.60=0.29. Then, the statistical index calculation unit 11c may generate the average value and the standard deviation of the second probability differences for each sampling time. For example, the second probability differences {0.16, 0.11, 0.29} have an average value of about 0.187 and a standard deviation of about 0.09. Then, the statistical index calculation unit 11c may generate the second discriminative power index for each sampling time according to the average value and the standard deviation of the second probability differences. The second discriminative power index may be the ratio of the average value of the second probability differences to their standard deviation. For example, in Table 2, at the sampling time T1, the second discriminative power index computed from these rounded values is 0.187/0.09≈2.07. In other words, at the sampling time T1, the second discriminative power index reflects the degree of dispersion of the feature points AG2, BG2, CG2, and DG2.
The statistical index calculation unit 11c likewise calculates the second discriminative power index for every sampling time T1 to TN: the second discriminative power index at the sampling time T1 is $D_2(T_1)$, the index at the sampling time T2 is $D_2(T_2)$, and so on, up to the index $D_2(T_N)$ at the sampling time TN. Finally, the statistical index calculation unit 11c obtains the minimum second discriminative power index over all sampling times T1 to TN of the second grouping set G2. The minimum second discriminative power index $\min D_2$ can therefore be expressed as:
$$\min D_2 = \min\{D_2(T_1), D_2(T_2), D_2(T_3), \ldots, D_2(T_N)\}$$
Therefore, the minimum second discriminative power index $\min D_2$ of the second grouping set G2 is the discriminative power index at the time point where the dispersion of the probability feature points is worst.
As described above, the statistical indexes generated by the statistical index calculation unit 11c include the minimum first discriminative power index $\min D_1$ of the first grouping set G1 and the minimum second discriminative power index $\min D_2$ of the second grouping set G2. If $\min D_1$ is greater than $\min D_2$, the statistical performance ranking unit 11d ranks the statistical performance of the first grouping set G1 above that of the second grouping set G2. In other words, the statistical performance ranking unit 11d selects the preferred grouping set by the rule:
$$\max\{\min D_1, \min D_2\}$$
That is, the statistical performance ranking unit 11d may use a max-min rule to select the preferred grouping set. The selected grouping set is the one whose discriminative power index at its worst time point, where the probability feature points are least dispersed, is the largest. The selected grouping set should therefore have acceptable statistical performance even at the time point where the dispersion of the probability feature points is worst.
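The max-min rule amounts to sorting the grouping sets by their worst-case discriminative power index. The sketch below assumes the per-sampling-time indexes have already been computed; the function name `rank_by_max_min` is hypothetical, and only the T1 values (16 and 2.07) come from the worked examples, while the later values are made up for illustration.

```python
from typing import Dict, List


def rank_by_max_min(indexes: Dict[str, List[float]]) -> List[str]:
    """Order grouping sets by their minimum discriminative power, best first."""
    return sorted(indexes, key=lambda name: min(indexes[name]), reverse=True)


# Discriminative power indexes per sampling time. The first entry of each list
# is the T1 value from the worked examples; the rest are invented.
indexes = {
    "G1": [16.0, 9.5, 7.2],
    "G2": [2.07, 3.1, 2.8],
}
print(rank_by_max_min(indexes))
```

Ranking on the minimum rather than the average keeps a grouping set from being preferred when it separates the risk levels well at most sampling times but poorly at one of them.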
In the statistical performance evaluation system 100, the second method of statistical performance evaluation is to calculate an error level index. Using a linear combination function, the statistical index calculation unit 11c generates a first standard deviation total coverage value for each sampling time from the first standard deviations corresponding to the first grouping ranges at that sampling time. For example, in Table 1, at the sampling time T1, given the first standard deviation (0.025) of the first grouping range "0", the first standard deviation (0.021) of the first grouping range "[1-3]", the first standard deviation (0.0058) of the first grouping range "4", and the first standard deviation (0.0012) of the first grouping range "[>=5]", the linear combination function 0.025+2×0.021+2×0.0058+0.0012 produces a first standard deviation total coverage value of 0.0798. The statistical index calculation unit 11c also obtains, for each sampling time, the maximum first probability difference among the first probability values corresponding to the first grouping ranges at that sampling time. For example, in Table 1, at the sampling time T1, given the first probability value (0.43) of the first grouping range "0", the first probability value (0.60) of the first grouping range "[1-3]", the first probability value (0.76) of the first grouping range "4", and the first probability value (0.91) of the first grouping range "[>=5]", the maximum first probability difference is 0.91-0.43=0.48. Then, the statistical index calculation unit 11c generates a first error level index for each sampling time according to the first standard deviation total coverage value and the maximum first probability difference.
The first error level index may be the ratio of the first standard deviation total coverage value to the maximum first probability gap. For example, in Table 1, at the sampling time T1, the first error level index is 0.0798/0.48 = 0.166. In other words, at the sampling time T1, the first error level index reflects how concentrated the sampling distributions corresponding to the probability curves S1G1, S2G1, S3G1 and S4G1 are in the sample space. The statistical index calculation unit 11c calculates the first error level index for all the sampling times T1 to TN: the first error level index at the sampling time T1 is E_1(T1), the first error level index at the sampling time T2 is E_1(T2), …, and the first error level index at the sampling time TN is E_1(TN). Finally, the statistical index calculation unit 11c obtains the maximum first error level index over all the sampling times T1 to TN of the first grouping set G1. The maximum first error level index maxE_1 can therefore be expressed as:
maxE_1 = max{E_1(T1), E_1(T2), E_1(T3), …, E_1(TN)}
Thus, the maximum first error level index maxE_1 of the first grouping set G1 is the error level index at the time point where the concentration of the sampling distribution is worst.
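The T1 calculation for the first grouping set can be reproduced with a short Python sketch. The function names and the `WEIGHTS` list are hypothetical; the weights 1, 2, 2, 1 are read off the linear combination quoted for Table 1, and reusing the same weights at every sampling time is an assumption.

```python
def error_level(std_devs, probabilities, weights):
    """Standard deviation total coverage value divided by the maximum probability gap."""
    # Linear combination of the per-range standard deviations.
    total_coverage = sum(w * s for w, s in zip(weights, std_devs))
    # Largest gap between any two probability values at this sampling time.
    max_gap = max(probabilities) - min(probabilities)
    return total_coverage / max_gap


def max_error_level(samples, weights):
    """Worst-case (maximum) error level index over all sampling times."""
    return max(error_level(s, p, weights) for s, p in samples)


WEIGHTS = [1, 2, 2, 1]  # assumption: taken from 0.025 + 2x0.021 + 2x0.0058 + 0.0012

# (standard deviations, probability values) of Table 1 at sampling time T1.
table1_t1 = ([0.025, 0.021, 0.0058, 0.0012], [0.43, 0.60, 0.76, 0.91])

e1_t1 = error_level(*table1_t1, WEIGHTS)
max_e1 = max_error_level([table1_t1], WEIGHTS)  # only T1 is available here
print(round(e1_t1, 3))  # 0.166
```

Passing one `(standard deviations, probability values)` pair per sampling time to `max_error_level` gives maxE_1 for the whole range T1 to TN.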
Similarly, using the linear combination function, the statistical index calculation unit 11c generates a second standard deviation total coverage value for each sampling time from the plurality of second standard deviations of the plurality of second grouping ranges at that sampling time. For example, in Table 2, at the sampling time T1, the second standard deviation is 0.051 for the second grouping range "[0-3]", 0.028 for the second grouping range "4", 0.021 for the second grouping range "[5-6]", and 0.0012 for the second grouping range "[>=7]", so the linear combination function 0.051 + 2×0.028 + 2×0.021 + 0.0012 yields a second standard deviation total coverage value of 0.1502. The statistical index calculation unit 11c also obtains a maximum second probability gap for each sampling time from the plurality of second probability values of the plurality of second grouping ranges at that sampling time. For example, in Table 2, at the sampling time T1, the second probability value is 0.33 for the second grouping range "[0-3]", 0.49 for the second grouping range "4", 0.60 for the second grouping range "[5-6]", and 0.89 for the second grouping range "[>=7]", so the maximum second probability gap is 0.89 - 0.33 = 0.56. Then, the statistical index calculation unit 11c generates a second error level index for each sampling time according to the second standard deviation total coverage value and the maximum second probability gap. The second error level index may be the ratio of the second standard deviation total coverage value to the maximum second probability gap. For example, in Table 2, at the sampling time T1, the second error level index is 0.1502/0.56 = 0.268.
In other words, at the sampling time T1, the second error level index reflects how concentrated the sampling distributions corresponding to the probability curves S1G2, S2G2, S3G2 and S4G2 are in the sample space. The statistical index calculation unit 11c calculates the second error level index for all the sampling times T1 to TN: the second error level index at the sampling time T1 is E_2(T1), the second error level index at the sampling time T2 is E_2(T2), …, and the second error level index at the sampling time TN is E_2(TN). Finally, the statistical index calculation unit 11c obtains the maximum second error level index over all the sampling times T1 to TN of the second grouping set G2. The maximum second error level index maxE_2 can therefore be expressed as:
maxE_2 = max{E_2(T1), E_2(T2), E_2(T3), …, E_2(TN)}
Thus, the maximum second error level index maxE_2 of the second grouping set G2 is the error level index at the time point where the concentration of the sampling distribution is worst.
As described above, the statistical indexes generated by the statistical index calculation unit 11c include the maximum first error level index maxE_1 of the first grouping set G1 and the maximum second error level index maxE_2 of the second grouping set G2. If the maximum first error level index maxE_1 is smaller than the maximum second error level index maxE_2, the statistical performance ranking unit 11d determines that the statistical performance of the first grouping set G1 is greater than the statistical performance of the second grouping set G2. In other words, the rule by which the statistical performance ranking unit 11d selects the preferred grouping set is as follows:
min{maxE_1, maxE_2}
In other words, the statistical performance ranking unit 11d may use the min-max rule to select the preferred grouping set. The selected grouping set is the one whose error level index is smallest at the time point where the concentration of the sampling distribution is worst. Because that worst-case error level is minimized, the selected grouping set should have acceptable statistical performance even at that time point.
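The min-max selection can be sketched as follows. The function and variable names are hypothetical, and the input lists hold only the T1 error level indexes worked out for Table 1 and Table 2 (0.166 and 0.268); a full run would supply one value per sampling time T1 to TN.

```python
def select_by_min_max(error_levels_by_set):
    """Pick the grouping set whose worst-case error level index is smallest."""
    # Worst case per grouping set: the maximum error level over all sampling times.
    worst_case = {name: max(levels) for name, levels in error_levels_by_set.items()}
    # Min-max rule: choose the grouping set with the smallest worst case.
    preferred = min(worst_case, key=worst_case.get)
    return preferred, worst_case


# Error level indexes per sampling time; only the T1 values are available here.
error_levels_by_set = {
    "G1": [0.166],  # E_1(T1) from Table 1
    "G2": [0.268],  # E_2(T1) from Table 2
}
preferred, worst_case = select_by_min_max(error_levels_by_set)
print(preferred, worst_case)
```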
In the statistical performance evaluation system 100, a third method of statistical performance evaluation is to calculate a comprehensive index. The comprehensive index may be a ranking sum or a custom weighted index, as described below. The statistical performance ranking unit 11d may obtain a first ranking sum value corresponding to the first grouping set G1 according to the plurality of statistical indexes (the discriminative power index and the error level index). Likewise, the statistical performance ranking unit 11d may obtain a second ranking sum value corresponding to the second grouping set G2 according to the plurality of statistical indexes, as shown in Table 3.
Table 3
The first ranking sum value and the second ranking sum value are two positive integers greater than two. In Table 3, the first ranking sum value (2) is smaller than the second ranking sum value (4), which indicates that the statistical performance of the first grouping set G1 is greater than the statistical performance of the second grouping set G2. The comprehensive index of the statistical performance evaluation system 100 may also be a custom weighted index. The statistical performance ranking unit 11d may set a plurality of weights corresponding to the plurality of statistical indexes. For example, weights may be set for the minimum first discriminative power index minD_1, the minimum second discriminative power index minD_2, the maximum first error level index maxE_1, and the maximum second error level index maxE_2 mentioned above. Then, using a linear or nonlinear combination function, the statistical performance ranking unit 11d may generate a first comprehensive index corresponding to the first grouping set G1 and a second comprehensive index corresponding to the second grouping set G2 according to the weights. The weights may be integers or floating-point numbers, and the first comprehensive index and the second comprehensive index may also be integers or floating-point numbers. The statistical performance ranking unit 11d may rank the statistical performance of the first grouping set G1 and the second grouping set G2 according to the result of comparing the first comprehensive index with the second comprehensive index.
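Both forms of the comprehensive index can be sketched in Python. Everything named here is hypothetical: the minD values are placeholders chosen for illustration, the maxE values are the T1 error level indexes from Tables 1 and 2, and the linear combination with a negative weight on the error level index (so that a higher comprehensive index is better) is one possible convention, not one stated in the text. Ties between grouping sets are not handled.

```python
def ranking_sums(indexes, higher_is_better):
    """Sum, for each grouping set, its rank (1 = best) under every statistical index."""
    names = list(next(iter(indexes.values())))
    totals = {name: 0 for name in names}
    for key, values in indexes.items():
        # Best grouping set first: descending if higher is better, else ascending.
        ordered = sorted(values, key=values.get, reverse=higher_is_better[key])
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return totals


def weighted_index(values, weights):
    """Linear combination of one grouping set's statistical indexes."""
    return sum(weights[key] * values[key] for key in weights)


indexes = {
    "minD": {"G1": 19.6, "G2": 2.46},    # placeholder values for illustration
    "maxE": {"G1": 0.166, "G2": 0.268},  # T1 error level indexes, Tables 1 and 2
}
higher_is_better = {"minD": True, "maxE": False}

sums = ranking_sums(indexes, higher_is_better)
print(sums)  # the smaller ranking sum indicates the better grouping set

weights = {"minD": 1.0, "maxE": -10.0}  # hypothetical custom weights
composite = {
    name: weighted_index({key: indexes[key][name] for key in indexes}, weights)
    for name in ("G1", "G2")
}
print(max(composite, key=composite.get))
```

With these inputs the ranking sums come out as 2 for G1 and 4 for G2, matching the values quoted from Table 3.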
Any reasonable change to the technical content of the method for evaluating statistical performance under different grouping sets falls within the scope of the disclosure. For example, in the probability models shown in FIG. 2 and FIG. 3 established by the processing device 11, samples corresponding to early sampling times (e.g., 0-100 days) may be ignored. The reason is that the four probability curves almost coincide between 0 and 100 days (the survival rate approaches 100%), so these samples have little reference value. Omitting them therefore reduces the computational complexity.
FIG. 4 is a flowchart of a method for evaluating statistical performance of the statistical performance evaluation system 100 according to the present invention under different grouping sets. The method for evaluating statistical performance includes steps S401 to S405. Any reasonable modification is within the scope of the present disclosure. Steps S401 to S405 are described below.
Step S401: setting a plurality of first grouping ranges of a first grouping set G1 corresponding to the sample space;
Step S402: setting a plurality of second grouping ranges of a second grouping set G2 corresponding to the sample space;
Step S403: generating a plurality of first probability values and a plurality of first standard deviations corresponding to the plurality of first grouping ranges at each sampling time according to the sample space;
Step S404: generating a plurality of second probability values and a plurality of second standard deviations corresponding to the plurality of second grouping ranges at each sampling time according to the sample space;
Step S405: generating a plurality of statistical indexes corresponding to the first grouping set and the second grouping set, and outputting the statistical performance ranking result of the first grouping set and the second grouping set according to the statistical indexes.
Details of steps S401 to S405 are described above and are not repeated here. The statistical performance evaluation system 100 can automatically evaluate the statistical performance under different grouping sets according to steps S401 to S405. Therefore, manual effort can be greatly reduced and the accuracy of the evaluation can be improved.
In summary, the present invention describes a method for evaluating statistical performance under different grouping sets and a statistical performance evaluation system. The statistical performance evaluation system can calculate a plurality of statistical indexes (such as a discrimination index, an error degree index and a comprehensive index) by using the probability value and/or the standard deviation of each sampling time of the sample space. Moreover, the statistical performance evaluation system can accurately and automatically sort the statistical performance of different grouping sets according to a plurality of statistical indexes. Therefore, when the statistical performance evaluation system is applied to medical science and technology, the optimal disease number grouping mode can be automatically determined, so that the risk difference of different risk levels is more obvious, the concentration of samples is higher, and the error value is smaller. Moreover, the statistical performance evaluation system can also be applied to data control of other physiological data (such as blood pressure values) and risk data (such as heart disease risk). In this application, the statistical performance evaluation system may automatically determine the optimal physiological data grouping pattern.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A method for evaluating statistical performance under different grouping sets, comprising:
setting a plurality of first specific disease number grouping ranges of a first specific disease number grouping set corresponding to a sample space stored in a database;
setting a plurality of second specific disease number grouping ranges of a second specific disease number grouping set corresponding to the sample space stored in the database;
generating, according to the sample space and by using a probability value calculation unit of a processor and a standard deviation calculation unit of the processor, a plurality of first specific disease number probability values and a plurality of first specific disease number standard deviations, respectively, corresponding to the plurality of first specific disease number grouping ranges at each sampling time;
generating, according to the sample space and by using the probability value calculation unit and the standard deviation calculation unit, a plurality of second specific disease number probability values and a plurality of second specific disease number standard deviations, respectively, corresponding to the plurality of second specific disease number grouping ranges at each sampling time;
generating, by using a statistical index calculation unit of the processor, a plurality of statistical indexes corresponding to the first specific disease number grouping set and the second specific disease number grouping set according to the plurality of first specific disease number grouping ranges and/or the plurality of first specific disease number probability values and/or the plurality of first specific disease number standard deviations corresponding to each sampling time, and the plurality of second specific disease number grouping ranges and/or the plurality of second specific disease number probability values and/or the plurality of second specific disease number standard deviations corresponding to each sampling time; and
outputting, by using a statistical performance ranking unit of the processor, a statistical performance ranking result of the first specific disease number grouping set and the second specific disease number grouping set according to the plurality of statistical indexes corresponding to the first specific disease number grouping set and the second specific disease number grouping set;
wherein the sample space is a sample space of a stochastic process that changes over time, the plurality of statistical indexes comprise a minimum first discriminative power index generated from the plurality of first specific disease number probability values over all sampling times of the first specific disease number grouping set and a minimum second discriminative power index generated from the plurality of second specific disease number probability values over all sampling times of the second specific disease number grouping set, and if the minimum first discriminative power index is greater than the minimum second discriminative power index, the statistical performance of the first specific disease number grouping set in evaluating probability dispersion is greater than the statistical performance of the second specific disease number grouping set in evaluating probability dispersion; and
the minimum first discriminative power index corresponds to the worst probability dispersion of the plurality of first specific disease number probability values of the first specific disease number grouping set over all the sampling times, and the minimum second discriminative power index corresponds to the worst probability dispersion of the plurality of second specific disease number probability values of the second specific disease number grouping set over all the sampling times.
2. The method as recited in claim 1, further comprising:
generating a plurality of first specific disease number probability gaps corresponding to a plurality of adjacent first specific disease number grouping ranges at each sampling time according to the first specific disease number probability values corresponding to the first specific disease number grouping ranges at each sampling time;
generating an average value and a standard deviation of the plurality of first specific disease number probability gaps corresponding to each sampling time;
generating a first discriminative power index corresponding to each sampling time according to the average value and the standard deviation of the plurality of first specific disease number probability gaps; and
obtaining the minimum first discriminative power index over all the sampling times of the first specific disease number grouping set;
wherein the first discriminative power index is a ratio of the average value of the plurality of first specific disease number probability gaps to the standard deviation of the plurality of first specific disease number probability gaps.
3. The method as recited in claim 2, further comprising:
generating a plurality of second specific disease number probability gaps corresponding to a plurality of adjacent second specific disease number grouping ranges at each sampling time according to the second specific disease number probability values corresponding to the second specific disease number grouping ranges at each sampling time;
generating an average value and a standard deviation of the plurality of second specific disease number probability gaps corresponding to each sampling time;
generating a second discriminative power index corresponding to each sampling time according to the average value and the standard deviation of the plurality of second specific disease number probability gaps; and
obtaining the minimum second discriminative power index over all the sampling times of the second specific disease number grouping set;
wherein the second discriminative power index is a ratio of the average value of the plurality of second specific disease number probability gaps to the standard deviation of the plurality of second specific disease number probability gaps.
4. The method as recited in claim 1, further comprising:
generating, by using a linear combination function, a first standard deviation total coverage value corresponding to each sampling time according to the plurality of first specific disease number standard deviations of the plurality of first specific disease number grouping ranges corresponding to each sampling time;
obtaining, from the plurality of first specific disease number probability values of the plurality of first specific disease number grouping ranges corresponding to each sampling time, a maximum first probability gap corresponding to each sampling time;
generating a first error level index corresponding to each sampling time according to the first standard deviation total coverage value and the maximum first probability gap; and
obtaining a maximum first error level index over all the sampling times of the first specific disease number grouping set;
wherein the first error level index is a ratio of the first standard deviation total coverage value to the maximum first probability gap.
5. The method as recited in claim 4, further comprising:
generating, by using the linear combination function, a second standard deviation total coverage value corresponding to each sampling time according to the plurality of second specific disease number standard deviations of the plurality of second specific disease number grouping ranges corresponding to each sampling time;
obtaining, from the plurality of second specific disease number probability values of the plurality of second specific disease number grouping ranges corresponding to each sampling time, a maximum second probability gap corresponding to each sampling time;
generating a second error level index corresponding to each sampling time according to the second standard deviation total coverage value and the maximum second probability gap; and
obtaining a maximum second error level index over all the sampling times of the second specific disease number grouping set;
wherein the second error level index is a ratio of the second standard deviation total coverage value to the maximum second probability gap.
6. The method of claim 5, wherein the plurality of statistical indexes include the maximum first error level index and the maximum second error level index, and wherein if the maximum first error level index is less than the maximum second error level index, the statistical performance of the first specific disease number grouping set is greater than the statistical performance of the second specific disease number grouping set when statistical performance is evaluated according to the error levels of the two grouping sets.
7. The method as recited in claim 1, further comprising:
setting a plurality of weights corresponding to the plurality of statistical indexes;
generating a first comprehensive index corresponding to the first specific disease number grouping set according to the plurality of weights; and
generating a second comprehensive index corresponding to the second specific disease number grouping set according to the plurality of weights;
wherein the plurality of weights are integers or floating-point numbers, and the first comprehensive index and the second comprehensive index are integers or floating-point numbers.
8. The method as recited in claim 1, further comprising:
obtaining a first ranking sum value corresponding to the first specific disease number grouping set according to the plurality of statistical indexes; and
obtaining a second ranking sum value corresponding to the second specific disease number grouping set according to the plurality of statistical indexes;
wherein the first ranking sum value and the second ranking sum value are two positive integers greater than two.
9. The method of claim 8, wherein if the first ranking sum value is less than the second ranking sum value, the statistical performance of the first specific disease number grouping set is greater than the statistical performance of the second specific disease number grouping set when statistical performance is evaluated according to the weighted ranking of the two grouping sets.
CN201910116297.0A 2019-02-15 2019-02-15 Method for evaluating statistical efficiency under different grouping sets Active CN111584063B (en)


Publications (2)

Publication Number Publication Date
CN111584063A CN111584063A (en) 2020-08-25
CN111584063B true CN111584063B (en) 2023-11-10

Family

ID=72125941


Country Status (1)

Country Link
CN (1) CN111584063B (en)



Also Published As

Publication number Publication date
CN111584063A (en) 2020-08-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant