CN115910364A - Medical inspection quality control model training method, medical inspection quality control method and system - Google Patents
- Publication number
- CN115910364A
- Application number
- CN202211449669.XA
- Authority
- CN
- China
- Prior art keywords
- quality control
- model
- value
- feature
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Apparatus For Radiation Diagnosis (AREA)
Abstract
The application provides a medical inspection quality control model training method, a medical inspection quality control method and a medical inspection quality control system. A training data set is obtained, and a multidimensional feature space corresponding to the training data set is determined according to a plurality of attribute features corresponding to the training data set; the feature space is segmented in the dimension corresponding to one or more attribute features according to a preset segmentation rule to obtain a plurality of subspaces; a predicted value corresponding to each subspace is determined according to the sample data corresponding to each subspace by using a preset regression analysis model, and a target training set is determined according to the predicted values and each sample data in the training data set; a quality control model is constructed and trained according to a preset mean model, the target training set and preset quality control requirements, and the trained quality control model is then used to monitor the real-time inspection data of a medical institution in real time. By breaking the feature space up into parts and performing regression analysis on each part, the technical problem that the recognition accuracy of the quality control model is not high enough is solved.
Description
Technical Field
The application relates to the technical field of medical quality control, in particular to a medical inspection quality control model training method, a medical inspection quality control method and a medical inspection quality control system.
Background
Real-time quality control based on patient data has become a research hotspot in inspection quality monitoring in the healthcare field and a main trend of future development. At the present stage, however, many obstacles remain in practical application; for example, the quality control effect of the quality control model is still not ideal.
A quality control model for real-time quality control based on patient data is trained directly on test data as samples, but these samples are inevitably affected by various factors, so the quality control information they express is not accurate enough. For example, the theoretically accurate value of a certain test item may be 3.2, while the values obtained when sampling the training samples fluctuate within the range of 2.8-3.6. In the healthcare field, the factors causing such errors may be operational or procedural, and some of them are well hidden. As a result, the identification precision and accuracy of the trained quality control model are poor, and the false positive rate is high.
Therefore, how to eliminate the influence of various known or unknown factors in the sample data on the accuracy of the information contained in the sample data when the quality control model is trained to obtain a quality control model with better quality control effect becomes a technical problem to be solved urgently.
Disclosure of Invention
The application provides a medical inspection quality control model training method, a medical inspection quality control method and a medical inspection quality control system, which aim to solve the technical problem that the identification accuracy of a quality control model is not high enough.
In a first aspect, the present application provides a method for training a quality control model for medical examination, comprising:
acquiring a training data set, and determining a multidimensional feature space corresponding to the training data set according to a plurality of attribute features corresponding to the training data set, wherein the training data set comprises: sample data collected by a plurality of different levels and/or different types of medical institutions;
segmenting the feature space in the dimension corresponding to one or more attribute features according to a preset segmentation rule to obtain a plurality of subspaces; determining a predicted value corresponding to each subspace according to sample data corresponding to each subspace by using a preset regression analysis model, and determining a target training set according to the predicted value and each sample data in the training data set;
and constructing and training a quality control model according to the preset mean model, the target training set and the preset quality control requirement, wherein the quality control model is used for carrying out real-time quality monitoring on the inspection data of the medical institution.
In one possible design, segmenting the feature space in the dimension corresponding to one or more attribute features according to a preset segmentation rule to obtain a plurality of subspaces, including:
calculating a loss function value corresponding to each attribute feature by using a preset loss function model, wherein the loss function value is used for representing the importance degree of the attribute feature to a segmentation feature space;
screening one or more target characteristics from the attribute characteristics according to a preset segmentation rule and a loss function value;
and carrying out one or more times of segmentation on the feature space in the dimension corresponding to the target feature to obtain a plurality of subspaces.
In one possible design, the attribute feature corresponds to a plurality of feature values, and the predetermined loss function model includes: an information entropy model, or a Gini coefficient model, or a sum of squared errors model;
when the loss function model is the information entropy model, calculating the loss function value corresponding to each attribute characteristic by using a preset loss function model, wherein the method comprises the following steps:
calculating the information entropy of each attribute feature according to each feature value of each attribute feature and an information entropy formula, and taking the information entropy as a loss function value;
when the loss function model is the Gini coefficient model, calculating a loss function value corresponding to each attribute feature by using a preset loss function model, wherein the method comprises the following steps:
calculating a Gini coefficient of each attribute feature according to each feature value of each attribute feature and a Gini coefficient formula, and taking the Gini coefficient as a loss function value;
when the loss function model is the sum of squared errors model, calculating the loss function value corresponding to each attribute feature by using a preset loss function model, wherein the method comprises the following steps:
calculating a mean value corresponding to each characteristic value under each attribute characteristic, and taking the difference between the characteristic value and the mean value as an error; the sum of the squares of the individual errors is taken as the loss function value.
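The three loss function models named above can be sketched as follows. This is a minimal illustration only: the source does not give the formulas, so the standard decision-tree definitions of Shannon entropy, Gini impurity and sum of squared errors are assumed, and the feature names `department` and `age` are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of feature values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gini(values):
    """Gini impurity: 1 minus the sum of squared value frequencies."""
    n = len(values)
    counts = Counter(values)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def sum_squared_errors(values):
    """Sum of squared differences between each feature value and the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

# Hypothetical attribute features of four samples.
department = ["outpatient", "outpatient", "inpatient", "inpatient"]
age = [20.0, 30.0, 40.0, 50.0]

print(entropy(department))       # 1.0 (two equally frequent values)
print(gini(department))          # 0.5
print(sum_squared_errors(age))   # 500.0
```

Entropy and Gini impurity suit categorical attribute features, while the sum of squared errors only makes sense for numeric ones.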
In one possible design, the preset segmentation rule includes: sorting the importance of all attribute features corresponding to the feature space for one time, then dividing the dimension corresponding to the attribute features for multiple times according to the sorting sequence, and dividing the space to be divided from one dimension for each division;
screening one or more target characteristics from the attribute characteristics according to a preset segmentation rule and a loss function value; performing one or more times of segmentation on the feature space in the dimension corresponding to the target feature to obtain a plurality of subspaces, including:
determining one or more target features and the importance ranking corresponding to each target feature according to the size of each loss function value;
and according to the importance ranking, sequentially and respectively segmenting the feature space at the corresponding dimensionality of the target feature to obtain a plurality of subspaces.
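The rank-once rule described above could look roughly like the sketch below. Several details are assumptions that the source leaves open: features are ranked by ascending loss value, the loss is the sum of squared errors, and each subspace is cut at the median of the current dimension. The feature names `age` and `hour` are hypothetical.

```python
from statistics import median

def sse(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def rank_features(samples, features):
    """Rank the attribute features once, smallest loss value first (assumed order)."""
    return sorted(features, key=lambda f: sse([s[f] for s in samples]))

def split_in_order(samples, ranked_features):
    """Cut every current subspace along one dimension per pass, in ranked order."""
    subspaces = [samples]
    for f in ranked_features:
        next_subspaces = []
        for space in subspaces:
            threshold = median(s[f] for s in space)  # assumed cut point
            low = [s for s in space if s[f] <= threshold]
            high = [s for s in space if s[f] > threshold]
            # Keep only non-empty parts, so a degenerate cut leaves the space as is.
            next_subspaces.extend(part for part in (low, high) if part)
        subspaces = next_subspaces
    return subspaces

# Hypothetical samples described by patient age and hour of sampling.
samples = [{"age": a, "hour": h} for a in (20, 40, 60, 80) for h in (8, 10, 14, 16)]
ranked = rank_features(samples, ["age", "hour"])
subspaces = split_in_order(samples, ranked)
print(ranked)                       # ['hour', 'age']
print([len(s) for s in subspaces])  # [4, 4, 4, 4]
```

Because the ranking is computed only once, this variant is cheaper than recursive segmentation, but it cannot adapt the order of the dimensions to the contents of each subspace.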
In another possible design, the preset segmentation rule includes: repeatedly calling a recursive segmentation mode to segment the feature space multiple times, wherein in each segmentation the loss function values of all attribute features of the space to be segmented are recalculated, the dimension of the attribute feature with the minimum loss function value is selected, and the space to be segmented is segmented in that dimension to obtain a plurality of new spaces to be segmented; the segmentation is repeated recursively until the loss function values of all attribute features of the current space to be segmented meet the requirement for stopping segmentation;
screening one or more target characteristics from the attribute characteristics according to a preset segmentation rule and a loss function value; performing one or more times of segmentation on the feature space in the dimension corresponding to the target feature to obtain a plurality of subspaces, including:
recursion segmentation is carried out on the characteristic space circularly, and in each recursion segmentation, the segmentation space obtained by the last segmentation is used as the current space to be segmented;
calculating a first loss function value of each first attribute characteristic corresponding to the current space to be divided by using a preset loss function model;
taking the first attribute characteristic corresponding to the minimum first loss function value as a target characteristic;
determining a segmentation threshold value on a dimension corresponding to the target feature according to a preset loss function model and each first feature value corresponding to the target feature;
segmenting the current space to be segmented according to a segmentation threshold value to obtain a new space to be segmented;
judging whether a second loss function value of each second attribute characteristic corresponding to the new space to be divided meets a preset stopping requirement or not according to a preset loss function model;
if so, stopping the segmentation, otherwise, taking the new space to be segmented as the current space to be segmented, and repeatedly carrying out recursive segmentation.
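The recursive segmentation loop above resembles the way a regression tree is grown. The sketch below uses a CART-style split as a stand-in, which is an assumption rather than the method of the application: the loss of an attribute feature is taken to be the smallest sum of squared errors of the detection values that a single threshold on that feature can achieve, and the stopping requirement is taken to be a loss ceiling `max_sse` or a minimum sample count `min_samples`. All names and data are hypothetical.

```python
def sse(values):
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(samples, features, target):
    """Return (loss, feature, threshold) with the smallest post-split loss."""
    best = None
    for f in features:
        levels = sorted({s[f] for s in samples})
        # Candidate thresholds lie midway between adjacent distinct feature values.
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [s[target] for s in samples if s[f] <= t]
            right = [s[target] for s in samples if s[f] > t]
            loss = sse(left) + sse(right)
            if best is None or loss < best[0]:
                best = (loss, f, t)
    return best

def segment(samples, features, target, max_sse=1.0, min_samples=4):
    """Recursively split until each subspace meets the stopping requirement."""
    values = [s[target] for s in samples]
    if sse(values) <= max_sse or len(samples) < min_samples:
        return [samples]
    split = best_split(samples, features, target)
    if split is None:  # every feature has a single value: nothing left to split
        return [samples]
    _, f, t = split
    left = [s for s in samples if s[f] <= t]
    right = [s for s in samples if s[f] > t]
    return (segment(left, features, target, max_sse, min_samples)
            + segment(right, features, target, max_sse, min_samples))

# Hypothetical albumin values that drop for older patients and shift slightly by sex.
samples = [{"age": a, "sex": x, "alb": (45.0 if a < 50 else 38.0) + 0.2 * x}
           for a in (20, 30, 60, 70) for x in (0, 1)]
subspaces = segment(samples, ["age", "sex"], "alb")
print(len(subspaces))                            # 2
print(sorted({s["age"] for s in subspaces[0]}))  # [20, 30]
```

In this data the first split falls on `age` at 45 because that cut removes almost all of the spread in `alb`; both halves then satisfy the loss ceiling, so recursion stops.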
In one possible design, determining a predicted value corresponding to each subspace according to sample data corresponding to each subspace by using a preset regression analysis model includes:
calculating the mean value of historical detection values in each sample data, and taking the mean value as a predicted value;
determining a target training set according to the predicted value and each sample data in the training data set, wherein the method comprises the following steps:
calculating the difference value between each historical detection value and the corresponding prediction value;
all differences are combined into a target training set.
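A minimal sketch of this step, assuming each subspace is already available as a list of its historical detection values (the numbers are made up for illustration):

```python
from statistics import mean

def build_target_training_set(subspaces):
    """Replace each historical detection value by its difference from the subspace mean."""
    residuals = []
    for values in subspaces:
        predicted = mean(values)  # predicted value of this subspace
        residuals.extend(v - predicted for v in values)
    return residuals

# Two hypothetical subspaces with different typical detection values.
subspaces = [[44.0, 45.0, 46.0], [37.0, 38.0, 39.0]]
target_training_set = build_target_training_set(subspaces)
print(target_training_set)  # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

Although the raw values of the two subspaces differ by about 7, their differences from the subspace means share the same scale and are centered on zero, which is what allows a single pair of control limits to be applied to them later.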
Optionally, the preset mean model includes an exponentially weighted floating mean model, and the preset quality control requirement includes: the false positive rate corresponding to the quality control model is less than or equal to a false positive rate threshold;
according to a preset mean value model, a target training set and preset quality control requirements, a quality control model is constructed and trained, and the method comprises the following steps:
inputting target training data in a target training set into an exponential weighted floating mean model to obtain a plurality of weighted floating means;
and adjusting the control upper limit value and/or the control lower limit value so that the ratio of the number of first weighted floating means, namely those larger than the control upper limit value or smaller than the control lower limit value, to the number of all weighted floating means is less than or equal to the false positive rate threshold.
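The training step above can be sketched as follows. The smoothing factor `lam`, the starting value, the symmetric limits around zero and the step-wise widening are all assumptions made for illustration; the source only requires that the share of weighted floating means outside the limits not exceed the false positive rate threshold.

```python
def ewma(values, lam=0.2, start=0.0):
    """Exponentially weighted floating mean: z_t = lam * x_t + (1 - lam) * z_{t-1}."""
    out, z = [], start
    for x in values:
        z = lam * x + (1 - lam) * z
        out.append(z)
    return out

def false_positive_rate(means, lower, upper):
    """Share of weighted floating means that fall outside [lower, upper]."""
    outside = sum(1 for m in means if m > upper or m < lower)
    return outside / len(means)

def fit_control_limits(means, fpr_threshold, step=0.01):
    """Widen symmetric limits around zero until the false positive rate is acceptable."""
    limit = 0.0
    while false_positive_rate(means, -limit, limit) > fpr_threshold:
        limit += step
    return -limit, limit

# Hypothetical target training data (residuals of in-control samples).
residuals = [1.0, -1.0, 0.5, -0.5, 2.0, -2.0, 0.0, 0.3, -0.3, 0.1]
means = ewma(residuals)
lower, upper = fit_control_limits(means, fpr_threshold=0.1)
print(false_positive_rate(means, lower, upper) <= 0.1)  # True
```

Because the training data is assumed to be in control, every mean outside the limits counts as a false alarm, so the tightest limits that keep the alarm share under the threshold are the ones returned.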
In a second aspect, the present application provides a medical inspection quality control method, comprising:
acquiring real-time inspection data of a target medical institution;
judging the quality control state of the real-time inspection data through a preset quality control model, wherein the preset quality control model comprises any one possible trained quality control model provided by the first aspect;
and if the quality control state is an out-of-control state, sending early warning information to the target medical institution.
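A rough sketch of the monitoring loop, assuming a trained model reduces to a subspace predicted value, a pair of control limits and a smoothing factor `lam`. For simplicity all incoming samples are assumed to fall into the same subspace, and `send_warning` is a hypothetical placeholder because the source does not specify the early-warning channel.

```python
def monitor(detection_values, predicted, lower, upper, lam=0.2):
    """Yield (index, weighted mean, state) for each incoming detection value."""
    z = 0.0
    for i, value in enumerate(detection_values):
        residual = value - predicted
        z = lam * residual + (1 - lam) * z
        state = "in control" if lower <= z <= upper else "out of control"
        yield i, z, state

def send_warning(institution, index, z):
    # Placeholder for the early-warning channel, which the source does not specify.
    return f"WARNING to {institution}: sample {index}, weighted mean {z:.2f}"

# Hypothetical real-time stream in which the analyzer starts to drift upward.
stream = [45.0, 45.5, 44.5, 49.0, 50.0, 51.0]
alerts = [send_warning("Hospital A", i, z)
          for i, z, state in monitor(stream, predicted=45.0, lower=-1.0, upper=1.0)
          if state == "out of control"]
print(len(alerts))  # 2
```

The first shifted value (49.0) does not trigger a warning on its own because the weighted mean reacts gradually; the warnings start at the second shifted value, once the drift has persisted.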
The application provides a medical quality control model training method. The technical problem of how to establish a standard reference system for the medical data quality control method is solved. The data clustering processing result with higher precision is used as the labeling reference system of the medical quality control method, so that the technical effects of the quality control method on identification accuracy and identification stability of various types of medical data are improved.
In a third aspect, the present application provides a medical inspection quality control system, comprising:
an acquisition module for acquiring a training data set, the training data set comprising: sample data collected by a plurality of medical institutions with different quality control levels;
a model training module to:
determining a multidimensional feature space corresponding to the training data set according to a plurality of attribute features corresponding to the training data set;
segmenting the feature space in the dimension corresponding to one or more attribute features according to a preset segmentation rule to obtain a plurality of subspaces; determining a predicted value corresponding to each subspace according to sample data corresponding to each subspace by using a preset regression analysis model, and determining a target training set according to the predicted value and each sample data in the training data set;
and constructing and training a quality control model according to the preset mean model, the target training set and the preset quality control requirement, wherein the quality control model is used for carrying out real-time quality monitoring on the inspection data of the medical institution.
In one possible design, the acquisition module is further configured to acquire real-time inspection data of the target medical institution;
the medical inspection quality control system further comprises:
the quality control monitoring module is used for:
judging the quality control state of the real-time inspection data through a quality control model;
and if the quality control state is an out-of-control state, sending early warning information to the target medical institution.
In a fourth aspect, the present application provides an electronic device, comprising:
a memory for storing program instructions;
and a processor for calling and executing the program instructions in the memory to execute any one of the possible medical inspection quality control model training methods provided by the first aspect and/or any one of the possible medical inspection quality control methods provided by the second aspect.
In a fifth aspect, the present application provides a readable storage medium storing a computer program, the computer program being used for executing any one of the possible medical inspection quality control model training methods provided in the first aspect and/or any one of the possible medical inspection quality control methods provided in the second aspect.
In a sixth aspect, the present application further provides a computer program product, which includes a computer program, and when executed by a processor, the computer program implements any one of the possible medical inspection quality control model training methods provided in the first aspect and/or any one of the possible medical inspection quality control methods provided in the second aspect.
The application provides a medical inspection quality control model training method, a medical inspection quality control method and a medical inspection quality control system. A training data set is obtained, and a multidimensional feature space corresponding to the training data set is determined according to a plurality of attribute features corresponding to the training data set, wherein the training data set comprises sample data collected by a plurality of different levels and/or different types of medical institutions; the feature space is segmented in the dimension corresponding to one or more attribute features according to a preset segmentation rule to obtain a plurality of subspaces; a predicted value corresponding to each subspace is determined according to the sample data corresponding to each subspace by using a preset regression analysis model, and a target training set is determined according to the predicted values and each sample data in the training data set; a quality control model is constructed and trained according to a preset mean model, the target training set and preset quality control requirements, and the trained quality control model is then used to monitor the real-time inspection data of a medical institution in real time. By breaking the feature space up into parts and performing regression analysis on each part, the technical problem that the recognition accuracy of the quality control model is not high enough is solved, and the technical effects of narrowing the control limits of the quality control model accurately and of improving the identification precision and accuracy are achieved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic flowchart of a method for training a quality control model for medical inspection according to an embodiment of the present application;
FIG. 2 is a schematic distribution diagram of a training data set in a feature space according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a segmentation of the feature space shown in FIG. 2 according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of another method for training a quality control model for medical inspection according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of another segmentation method for the feature space in S406 according to an embodiment of the present application;
FIG. 6-1a is a schematic diagram illustrating the fluctuation of the Albumin (ALB) test value when no regression analysis is performed on the training data of the quality control model provided in the present application;
FIG. 6-1b is a schematic diagram illustrating the fluctuation of the Albumin (ALB) test value when linear regression analysis is performed on the training data of the quality control model over the feature space as a whole;
FIG. 6-1c is a schematic diagram illustrating the fluctuation of the Albumin (ALB) test value when nonlinear regression analysis is performed on the training data of the quality control model in the subspaces obtained by segmenting the feature space;
FIG. 6-2a is a schematic diagram illustrating the fluctuation of the White Blood Cell (WBC) test value when no regression analysis is performed on the training data of the quality control model provided in the present application;
FIG. 6-2b is a schematic diagram illustrating the fluctuation of the White Blood Cell (WBC) test value when linear regression analysis is performed on the training data of the quality control model over the feature space as a whole;
FIG. 6-2c is a schematic diagram illustrating the fluctuation of the White Blood Cell (WBC) test value when nonlinear regression analysis is performed on the training data of the quality control model in the subspaces obtained by segmenting the feature space;
FIG. 7 is a schematic flowchart of a medical inspection quality control method according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a medical inspection quality control system according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an electronic device provided in the present application.
Specific embodiments of the present application have been shown by way of example in the drawings and will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, including but not limited to combinations of embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any inventive step are within the scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Various medical institutions, such as hospital laboratories, clinical laboratories, medical research institutions and third-party laboratories, may produce detection errors at any time while detecting and analyzing medical data. These detection errors may be caused by manual misoperation, by instrument errors, or by the statistical analysis algorithm itself. Some of these influencing factors can be discovered, while many more are well hidden and are referred to as unknown influencing factors.
The principle of real-time quality control based on patient data is as follows. The sample data is first subjected to truncation or tail treatment; the sample data, whose distribution is skewed, is then transformed toward a normal distribution and input into a preset mean model, which calculates a mean such as a floating mean or a weighted mean. From this, an acceptable distribution range of the true detection value can be obtained, for example the numerical range bounded by control limits that are set according to the false positive rate. If the mean falls outside this range, the quality control state is determined to be out of control. In this way, real-time quality control based on patient data attempts to weaken or eliminate the influence of the various detection errors.
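The conventional pipeline described in this paragraph can be illustrated with a small sketch. The specific choices are assumptions, not taken from the source: clamping (winsorization) is shown as one possible tail treatment next to truncation, a logarithm stands in for the normalizing transform, and the floating mean is a plain moving average over a fixed window.

```python
import math

def truncate(values, low, high):
    """Drop detection values outside the truncation limits."""
    return [v for v in values if low <= v <= high]

def winsorize(values, low, high):
    """Clamp out-of-range values to the limits instead of dropping them."""
    return [min(max(v, low), high) for v in values]

def moving_average(values, window):
    """Floating mean over the most recent `window` values."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

# Hypothetical right-skewed detection values containing one extreme result.
raw = [4.0, 5.0, 6.0, 50.0, 5.0, 4.0, 6.0]
kept = truncate(raw, 1.0, 20.0)
transformed = [math.log(v) for v in kept]  # assumed transform for right-skewed data
means = moving_average(transformed, window=3)
print(kept)        # [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]
print(len(means))  # 4
```

Each floating mean would then be compared with the control limits; a mean outside the limits marks the quality control state as out of control.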
However, some further studies show that the quality control identification accuracy of real-time quality control based on patient data is still not high enough, and it is still difficult to achieve satisfactory quality control identification accuracy for some inspection items with large autocorrelation influence. In the related art, the influence of autocorrelation is eliminated by applying a conventional statistical regression analysis to patient data.
However, through intensive research, the inventor of the present application has found that although a linear regression equation can capture the distribution trend of the sample data to a certain extent, it has two problems in practical application. On the one hand, the basic logic of linear regression is the assumption that the data follows a linear rule, but in many practical situations the distribution trend of the sample data is not linear. Analyzing a nonlinear distribution with linear regression, as the related art does, therefore cannot achieve an ideal regression analysis effect; the defect lies in the logic itself. On the other hand, the number of independent variables corresponding to the influencing factors determines the complexity of the linear regression equation. When there are too many independent variables, the equation either cannot be solved in practice or consumes too many computing resources and is computed too slowly, so that the final quality control model cannot be obtained and the quality control scheme cannot be put into practice.
Clearly, linear regression analysis treats the distribution of sample data in the field of medical quality control too simplistically. Can the problem then be solved by designing the regression equation as a curve equation instead? The inventor of the present application has found that the essence of the problem is that sample data of different test items are affected by different influence factors, so their distribution trends are not necessarily the same. Whether a linear regression equation or a nonlinear one in curve form (such as a quadratic, logarithmic, or exponential regression equation) is adopted, these conventional methods all assume that the sample data follow some fixed trend. In practice, however, it may be impossible to estimate whether the true distribution trend is linear or nonlinear, or which of a straight line, a quadratic curve, a logarithmic curve, an exponential curve, and so on the regression equation should be.
Therefore, for medical quality control sample data that are affected by many influence factors, a new regression analysis mode must be adopted, and the regression trend of the sample data cannot simply be presumed to be a straight line or a curve. The inventive concept of the present application is as follows:
The inventor of the present application has found that when many influence factors cause detection errors and these factors cannot be exhaustively enumerated or identified, the distribution of the sample data over the whole feature space appears disordered. In other words, the entropy of the sample data over the feature space is very high, and the higher the entropy, the more disordered the information expressed by the sample data. The entropy of the sample data therefore needs to be reduced so that a more accurate regression analysis of the distribution trend can be performed. The inventor has found that segmenting the feature space corresponding to the sample data can reduce the entropy of the sample data within the resulting subspaces. Whether the entropy decreases, and by how much, depends on the segmentation mode. After multiple segmentations, once the entropy of the sample data in a subspace has fallen to a certain degree, the distribution trend of the sample data in that subspace becomes clearly observable, and regression analysis on the subspace is relatively simple and relatively accurate. By breaking the whole into parts in this way, the coupling among the various influence factors can be separated. The residual of newly acquired sample data can be calculated simply by assigning the data to the corresponding subspace. The residual replaces the detection value of the sample data as the input to the mean value model of real-time quality control based on patient data, and the upper and lower control limits for medical quality control are set more accurately from the fluctuation of the residual mean, which effectively improves the identification accuracy of the quality control model.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a medical inspection quality control model training method according to an embodiment of the present application. As shown in Fig. 1, the specific steps of the medical inspection quality control model training method include:
S101, a training data set is obtained, and a multi-dimensional feature space corresponding to the training data set is determined according to a plurality of attribute features corresponding to the training data set.
In this step, the spatial dimensions of the feature space correspond to the attribute features, that is, one attribute feature corresponds to one dimension. The training data set includes sample data collected by a plurality of medical institutions of different levels and/or different types. The sample data are distributed in the feature space, and each sample can be regarded as one point in the feature space.
In this embodiment, one possible implementation manner of this step includes:
First, historical inspection data of test items are collected from 11 clinical laboratories of medical institutions of different levels and/or different types. Then, the historical inspection data are cleaned: invalid or obviously abnormal values are removed, and the data formats and units of the same inspection item are unified. Next, feature engineering is performed on the cleaned historical inspection data, and multiple attribute features are extracted through a feature correlation analysis model. The correlation between the attribute features is lower than a preset correlation threshold, that is, the attribute features are independent of each other. It should be noted that each attribute feature may correspond to a plurality of feature values. For example, gender corresponds to the two feature values "male" and "female", and age corresponds to the 121 feature values "0-120". A feature value may be a continuous numerical value such as age or a discrete value such as gender. Finally, feature values of text type are encoded according to a preset encoding mode, so that the historical inspection data are converted into feature vectors. The space corresponding to the feature vectors is the feature space, each dimension of which corresponds to one attribute feature, and each feature vector can be understood as one sample of the training data set.
In this embodiment, 8 attribute features are extracted from the historical inspection data, including: gender, age, patient type, diagnostic information, department type, laboratory test results, time of report, and equipment brand.
It should be noted that in this embodiment the quality control levels of the medical institutions differ, as do their patient sources. Existing medical quality control schemes, such as real-time quality control methods based on patient data, generally train the quality control model on data from only one hospital or from several hospitals in one region, and neglect the need for diverse and representative training samples. As a result, the trained quality control model has a narrow application range and poor universality. To promote the real-time quality control method based on patient data, a quality control model then has to be built for each hospital or each region, which is costly and inefficient and seriously hinders practical deployment of the quality control scheme. By contrast, the embodiment of the present application draws on a plurality of medical institutions of different levels and/or different types, so that samples come from institutions with different quality control levels. Because the quality control levels differ, the procedural influence factors that introduce detection errors are not identical, and because the patient sources differ, the individual variation among patients that introduces detection errors is not identical either. This greatly enriches sample diversity and brings more influence factors into the training process, so that the trained quality control model has a wider application range, higher universality, and better stability and robustness.
In the related art in the field of medical quality control, after a training data set is obtained, the sample data are assumed to follow a linear regression distribution, a regression equation corresponding to the sample data is obtained, and the influence of some influence factors of detection errors is identified through linear regression analysis. The linear regression assumption can reduce the influence of one or several influence factors to some extent, for example the influence of autocorrelation, but many other influence factors are ignored, so the identification accuracy of the trained quality control model is not high enough. To solve this problem, a new regression analysis mode is adopted: the feature space is segmented so that the information entropy of each resulting subspace is reduced, thereby separating the detection errors caused by the different effects of more influence factors. S102 to S104 below are the new regression analysis process provided by the embodiment of the present application.
S102, the feature space is segmented in the dimensions corresponding to one or more attribute features according to a preset segmentation rule to obtain a plurality of subspaces.
Fig. 2 is a schematic distribution diagram of a training data set in a feature space according to an embodiment of the present application. As shown in Fig. 2, for ease of understanding, it is assumed that each coordinate axis (horizontal or vertical) represents attribute features of at least two dimensions, so that the feature space 100 can be represented by a plane area, and the distribution of the sample data of the training data set in the feature space is given by the positions of the data points 101 on the plane. As Fig. 2 shows, the distribution of the data points 101 clearly does not satisfy a linear relationship, and it is also difficult to predict the distribution of the data points 101 with a curve.
Therefore, in the embodiment of the present application, a method of breaking up the whole into parts is adopted, and the feature space is segmented, so that the information entropy of the feature space, that is, the chaos degree of sample data distribution in the feature space is reduced.
Fig. 3 is a schematic diagram of segmenting the feature space shown in Fig. 2 according to an embodiment of the present application. As shown in Fig. 3, the feature space 100 is illustratively divided along the horizontal and vertical axis directions into four subspaces: a first subspace 110, a second subspace 120, a third subspace 130 and a fourth subspace 140. After such a segmentation, the degree of disorder of the data points 101 within each subspace is lower than their degree of disorder over the entire feature space 100. The dotted lines in the first subspace 110, the second subspace 120, the third subspace 130 and the fourth subspace 140 may be understood as the regression line or regression curve of each subspace.
It should be understood that Fig. 3 shows only a schematic segmentation. Those skilled in the art may use cuts perpendicular to the coordinate axes, non-perpendicular cuts, or even curved cuts. The information entropy or the Gini coefficient can be used as an evaluation index of the segmentation position and/or the segmentation mode.
Assuming that fig. 3 is the subspace obtained by the optimal partition, then, regression analysis may be performed on the four subspaces, for example, taking a mean value of sample data in each subspace as a predicted value, or calculating a linear or nonlinear regression equation corresponding to each subspace, so as to obtain a predicted value of each subspace. It should be noted that the purpose of the regression analysis is to predict the distribution of the sample data to obtain a predicted value, and the calculation of the regression equation is only one of the ways of obtaining the predicted value, but not the only way, that is, a person skilled in the art may also obtain the predicted value by using other regression analysis ways, such as a machine learning way and a neural network analysis way, without calculating the regression equation, which all belong to the protection scope of the embodiments of the present application.
It should be further noted that the feature space in this embodiment is a high-dimensional space. If each attribute feature corresponds to one coordinate axis, the high-dimensional feature space cannot be drawn as an image. To make the segmentation of the high-dimensional feature space in this step easier to understand, one possible implementation of this step specifically includes:
S1021, calculating a loss function value corresponding to each attribute feature by using a preset loss function model.
In this step, the loss function value characterizes how important the attribute feature is for segmenting the feature space. The preset loss function model includes an information entropy model, a Gini coefficient model, a sum of squared errors model, or the like. The smaller the loss function value, the higher the importance degree or level of the attribute feature.
When the loss function model is an information entropy model, the information entropy of each attribute feature can be calculated from the feature values of that attribute feature and the information entropy formula, and the information entropy is used as the loss function value. The calculation formula of the information entropy is shown as formula (1):
$$H(x) = -\sum_{i} P_i(x) \log P_i(x) \qquad (1)$$
where H(x) represents the information entropy corresponding to the attribute feature, i.e. the loss function value, and P_i(x) represents the probability or proportion with which the i-th feature value occurs in the feature space.
For example, for the attribute feature of gender, the corresponding feature values are "male" and "female". Assume that "male" is represented by code 1 and "female" by code 0. Then i in formula (1) takes two values, 0 and 1, and P_i(x) is the proportion of females and of males, respectively, in the training data set.
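The entropy calculation described above can be sketched as follows. This is only an illustration: the function name `information_entropy`, the base-2 logarithm, and the sample gender codes are assumptions that do not come from the application, which does not specify the logarithm base.

```python
import math
from collections import Counter

def information_entropy(values):
    """Entropy of the empirical distribution of feature values.

    The base-2 logarithm is an assumption made for this sketch.
    """
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Gender coded as 1 (male) and 0 (female), as in the example above.
balanced = [1, 0, 1, 0]   # hypothetical sample: half male, half female
skewed = [1, 1, 1, 0]     # hypothetical sample: three quarters male

print(information_entropy(balanced))  # 1.0
print(information_entropy(skewed))    # lower than the balanced case
```

A more balanced mix of feature values gives a higher entropy, which is why a feature with lower entropy is treated as more informative for segmentation.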
When the loss function model is a Gini coefficient model, the Gini coefficient of each attribute feature may be calculated from the feature values of that attribute feature and the Gini coefficient formula, and the Gini coefficient may be used as the loss function value. The calculation formula of the Gini coefficient is shown in formula (2):
$$G = 1 - \sum_{i} P_i^{2} \qquad (2)$$
where G is the Gini coefficient corresponding to the attribute feature, i.e. the loss function value, and P_i is the probability or proportion with which the i-th feature value occurs in the feature space. For example, for the attribute feature of gender, the corresponding feature values are "male" and "female". Assume that "male" is represented by code 1 and "female" by code 0. Then i in formula (2) takes two values, 0 and 1, and P_i is the proportion of females and of males, respectively, in the training data set.
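A minimal sketch of the Gini calculation follows. It assumes the form commonly used in decision-tree learning, one minus the sum of squared proportions. The function name `gini_coefficient` and the sample codes are illustrative.

```python
from collections import Counter

def gini_coefficient(values):
    """Gini value of the empirical distribution: 1 - sum of squared proportions.

    This standard decision-tree form is an assumption of the sketch.
    """
    counts = Counter(values)
    total = len(values)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

# Gender coded as 1 (male) and 0 (female).
print(gini_coefficient([1, 0, 1, 0]))  # 0.5
print(gini_coefficient([1, 1, 1, 1]))  # 0.0
```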
When the loss function model is the sum of squared errors model, the mean of the feature values under each attribute feature can be calculated, and the difference between each feature value and the mean is taken as an error. The sum of the squares of these errors is taken as the loss function value. The calculation formula of the sum of squared errors is shown as formula (3):
$$SE(x) = \sum_{i} \left(x_i - \bar{x}\right)^{2} \qquad (3)$$
where SE(x) is the sum of squared errors corresponding to the attribute feature, \bar{x} is the mean of all feature values under the attribute feature, and x_i is any feature value under the attribute feature.
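The sum of squared errors can be sketched in a few lines. The function name and the sample values are illustrative assumptions.

```python
def sum_of_squared_errors(values):
    """Sum of squared deviations of the feature values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

# Hypothetical feature values with mean 2.0.
print(sum_of_squared_errors([1.0, 2.0, 3.0]))  # 2.0
```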
S1022, screening one or more target features from the attribute features according to the preset segmentation rule and the loss function values.
S1023, performing one or more segmentations of the feature space in the dimensions corresponding to the target features to obtain a plurality of subspaces.
For S1022 to S1023, those skilled in the art can define the preset segmentation rule according to actual situations, and there are many embodiments, for example:
(1) The preset segmentation rule is to segment in the dimension corresponding to only one attribute feature at a time.
In this case each segmentation has exactly one target feature. The attribute feature with the minimum loss function value in S1021 may be selected as the target feature, and one or more segmentations may be performed in the dimension of that target feature. A single segmentation includes selecting one of the feature values of the target feature as the segmentation point. For example, for the target feature of age, any value in the range 0 to 99 can be selected as the segmentation point. Multiple segmentations in the same target feature dimension include selecting a plurality of segmentation points from all the feature values of the target feature. After segmentation on the target feature is finished, the remaining attribute feature with the minimum loss function value is selected for the next segmentation, and segmentation stops when the loss function value of the subspaces obtained is less than or equal to a preset threshold.
It should be noted that the loss function values of the attribute features can be calculated in two ways. In the first, the loss function values of all attribute features over the whole feature space are calculated once at the start and are not updated afterwards. In the second, after each segmentation is completed, the loss function values of the attribute features contained in the space to be segmented next are recalculated, and the attribute feature with the minimum loss function value is selected as the target feature for segmenting that space.
It should also be noted that at least one optimal segmentation point should exist in theory for each segmentation, so finding it can be treated as an optimization problem. For example, segmenting at an arbitrarily chosen feature value yields at least two subspaces, whose loss function values are calculated. Each of the feature values is tried in turn as the segmentation point, which gives the subspace loss function values for every candidate, and the feature value with the smallest loss function value is selected as the optimal segmentation point.
(2) The preset segmentation rule is to segment in the dimensions corresponding to a plurality of attribute features each time.
In this case each segmentation has at least two target features. The loss function values from S1021 are sorted from small to large, and the top N attribute features are selected as the target features, where N is greater than or equal to 2.
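Selecting target features by loss function value can be sketched as follows. The function name `select_target_features` and the loss values are hypothetical and serve only to illustrate the ascending sort. Setting `n=1` corresponds to rule (1), and `n>=2` corresponds to rule (2).

```python
def select_target_features(loss_by_feature, n=2):
    """Return the n attribute features with the smallest loss function values."""
    ranked = sorted(loss_by_feature, key=loss_by_feature.get)
    return ranked[:n]

# Hypothetical loss function values for four attribute features.
losses = {"gender": 0.99, "age": 0.42, "patient_type": 0.63, "department": 0.80}

print(select_target_features(losses, n=1))  # ['age']
print(select_target_features(losses, n=2))  # ['age', 'patient_type']
```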
S103, determining a predicted value corresponding to each subspace according to sample data corresponding to each subspace by using a preset regression analysis model.
In this step, the distribution trend of the sample data in each subspace obtained by segmenting the feature space is relatively clear. That is, the entropy of the sample data in the subspace is relatively small and the distribution is relatively ordered. As shown in Fig. 3, the data distribution in the first subspace 110, the second subspace 120 and the fourth subspace 140 can each be predicted with a different curve regression equation, and the data distribution in the third subspace 130 can be predicted with a linear regression equation, so that the predicted value corresponding to each subspace is obtained.
In this embodiment, instead of using a regression equation, the mean of the historical detection values of the sample data in each subspace may be calculated and taken as the predicted value. This reduces the resources consumed in calculating a regression equation and improves processing efficiency.
S104, determining a target training set according to the predicted values and each sample datum in the training data set.
In this embodiment, the differences between the historical detection value in each sample datum and the corresponding predicted value obtained in S103, that is, the residuals, are combined into the target training set.
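The residual construction can be sketched as follows for the simple case in which the predicted value of a subspace is the mean of its historical detection values. The function name, the subspace labels, and the albumin-like values are illustrative assumptions.

```python
from collections import defaultdict

def residuals_by_subspace(samples):
    """samples: list of (subspace_id, detection_value) pairs.

    The predicted value of a subspace is taken to be the mean of its
    detection values, and the residual is detection value minus prediction.
    """
    grouped = defaultdict(list)
    for subspace_id, value in samples:
        grouped[subspace_id].append(value)
    predicted = {k: sum(v) / len(v) for k, v in grouped.items()}
    return [value - predicted[subspace_id] for subspace_id, value in samples]

# Hypothetical samples already assigned to two subspaces "A" and "B".
samples = [("A", 45.0), ("A", 47.0), ("B", 38.0), ("B", 36.0)]
print(residuals_by_subspace(samples))  # [-1.0, 1.0, 1.0, -1.0]
```

The residuals are centred on zero within each subspace, which is what allows a single set of control limits to be applied across subspaces with different typical values.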
That is, unlike existing real-time quality control methods based on patient data, the quality control model here is trained on the residuals of the original data rather than on the original historical detection values. The quality control model determines that the quality control of the medical institution is out of control when the data it identifies fall outside the interval defined by the control limits. Using residuals as training data allows test items or analytes with skewed distributions to be converted to a normal distribution, which can noticeably improve identification sensitivity and accuracy.
S105, constructing and training a quality control model according to the preset mean value model, the target training set and the preset quality control requirement.
In this step, the quality control model is used for real-time quality monitoring of the inspection data of the medical institution. The preset mean value model includes: a floating average model (MA), a floating median model (MM), an exponentially weighted floating average model (EWMA), a floating standard deviation model (MovSD), a floating quantile model, and a floating outlier patient number model (MovSO).
In this embodiment, the preset mean value model is assumed to be an exponentially weighted floating mean model, and the preset quality control requirement includes: the false positive rate of the quality control model in a quality control test is less than or equal to a false positive rate threshold.
Specifically, one possible implementation of this step includes:
inputting the target training data in the target training set into the exponentially weighted floating mean model to obtain a plurality of weighted floating means;
adjusting the upper control limit and/or the lower control limit so that the ratio of the first weighted floating means, namely those greater than the upper control limit or less than the lower control limit, to all weighted floating means is less than or equal to the false positive rate threshold.
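The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the smoothing weight `alpha`, the initialisation of the mean at the first residual, the symmetric choice of limits from the sorted means, and the synthetic residuals are all choices made for the sketch and are not specified in the application.

```python
import random

def ewma(values, alpha=0.1):
    """Exponentially weighted floating means; alpha is an assumed weight."""
    means, current = [], values[0]
    for v in values:
        current = alpha * v + (1 - alpha) * current
        means.append(current)
    return means

def control_limits(means, max_false_positive_rate=0.05):
    """Pick lower and upper limits so that the share of means outside
    the limits does not exceed the false positive rate threshold."""
    ordered = sorted(means)
    n = len(ordered)
    k = int(n * max_false_positive_rate / 2)  # means allowed outside per side
    return ordered[k], ordered[n - 1 - k]

def false_positive_rate(means, lower, upper):
    """Ratio of means above the upper limit or below the lower limit."""
    outside = sum(1 for m in means if m < lower or m > upper)
    return outside / len(means)

# Synthetic in-control residuals, used only to exercise the sketch.
rng = random.Random(0)
residuals = [rng.gauss(0.0, 1.0) for _ in range(500)]
means = ewma(residuals, alpha=0.1)
lower, upper = control_limits(means, max_false_positive_rate=0.05)
print(lower, upper, false_positive_rate(means, lower, upper))
```

Choosing the limits from the empirical distribution of the weighted floating means guarantees the threshold on the training data. How well it holds on new data depends on how representative the training set is.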
The embodiment of the present application provides a medical inspection quality control model training method. A training data set is obtained, and a multi-dimensional feature space corresponding to the training data set is determined according to a plurality of attribute features corresponding to the training data set, where the training data set includes sample data collected by a plurality of medical institutions of different levels and/or different types. The feature space is segmented in the dimensions corresponding to one or more attribute features according to a preset segmentation rule to obtain a plurality of subspaces. A predicted value corresponding to each subspace is determined from the sample data of that subspace by using a preset regression analysis model, and a target training set is determined according to the predicted values and each sample datum in the training data set. A quality control model is constructed and trained according to a preset mean value model, the target training set and a preset quality control requirement, and the trained quality control model is then used for real-time quality monitoring of the real-time inspection data of a medical institution. By breaking the feature space into parts and performing regression analysis on each part, the technical problem that the identification accuracy of the quality control model is not high enough is solved, and the technical effects of accurately narrowing the control limits of the quality control model and improving identification precision and accuracy are achieved.
Fig. 4 is a flowchart illustrating another medical inspection quality control model training method according to an embodiment of the present application. As shown in Fig. 4, the specific steps of the medical inspection quality control model training method include:
S401, obtaining historical inspection data from a plurality of medical institutions of different levels and/or different types.
In this step, the medical institutions, which include hospital laboratories or clinical laboratories, differ in quality control level and in patient sources.
In this embodiment, the data collection is organized by the clinical testing center of Beijing, Hebei, Tianjin and Shandong Province in China. The participating laboratories range from laboratories of medium-sized hospitals to laboratories of large hospitals and also include laboratories of specialist hospitals and private hospitals, providing historical inspection data from 11 medical institutions in total. Each laboratory extracts its historical inspection data automatically every day or manually every week or month, and then transmits the data to a pre-designated database by e-mail or another data transmission channel.
It is noted that in this embodiment, the measurement results of 10 common analytes in serum, plasma or whole blood were collected, including: white blood cells (WBC), red blood cells (RBC), hematocrit (HCT), hemoglobin (HB), platelets (PLT), aspartate aminotransferase (AST), alanine aminotransferase (ALT), blood glucose (GLU), total protein (TP), and albumin (ALB). These analytes were chosen because they are traceable, standardized analytes and are the analytes most commonly required to be detected in the field of medical hygiene, so they can represent the distributions of the different test items common in laboratory medicine.
S402, preprocessing the historical inspection data.
In this step, the pretreatment comprises: data cleaning, data format or unit unification, error simulation and data encoding.
Data cleaning refers to removing data that lack patient information such as gender or age, implausible values such as an age of 120 or older, and meaningless symbolic data such as "-" and "/".
Unifying the data format or unit means unifying the expression formats and units of the same test item or analyte uploaded by the different medical institutions.
Error simulation means that manually confirmed accurate historical detection values of a test item or analyte are taken as standard values, and training values with a known degree of deviation are obtained by using formula (4):
$$x' = x \times (1 + P), \quad P = -50\%, -48\%, -46\%, \ldots, 46\%, 48\%, 50\% \qquad (4)$$
where x is a historical detection value, x' is the training value with the deviation added, and P is the deviation rate.
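Formula (4) can be sketched as follows. The function name `simulate_errors` and the example detection value of 40.0 are illustrative assumptions. The deviation rates run from -50% to 50% in steps of 2%, as listed in the formula.

```python
def simulate_errors(x, step_percent=2, limit_percent=50):
    """Return (P, x') pairs for P = -50%, -48%, ..., 48%, 50%."""
    out = []
    for p in range(-limit_percent, limit_percent + 1, step_percent):
        rate = p / 100
        out.append((rate, x * (1 + rate)))
    return out

# Hypothetical confirmed-accurate detection value.
biased = simulate_errors(40.0)
print(len(biased))   # 51
print(biased[0])     # (-0.5, 20.0)
print(biased[-1])    # (0.5, 60.0)
```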
The data coding means converting the text type data into numerical values according to a preset coding mode. Specifically, feature extraction is performed on the historical inspection data to obtain a plurality of attribute features, and then feature values of the attribute features are respectively encoded.
In this embodiment, there are 8 attribute features, and the encoding method of each attribute feature is as follows:
1. Hospital grade: Grade III Class A → 3, Grade II Class A → 2.
2. Hospital department: coded in three levels. The mean test value of each department is calculated and the quartiles are obtained. A department whose mean is greater than the upper quartile is assigned 1, one whose mean is less than the lower quartile is assigned -1, and one whose mean lies between the lower and upper quartiles is assigned 0.
3. Patient type: outpatient → 1, hospitalization → 2, emergency → 3, physical examination → 4.
4. Patient sex: male → 1, female → 2.
5. Age: counted in whole years with an upper limit of 120. An age of less than 1 year is coded as 0, an age of at least 1 year but less than 2 years is coded as 1, and so on.
6. Detection date: January, February and March → 1; April, May and June → 2; July, August and September → 3; October, November and December → 4.
7. Detection time: 24:00-6:00 → 1; 6:00-12:00 → 2; 12:00-18:00 → 3; 18:00-24:00 → 4.
8. Instrument name: coded by category according to brand.
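Part of the encoding scheme above can be sketched as follows. The record field names, the helper names, and the handling of boundary hours (for example, 6:00 exactly falling in the second slot) are assumptions of this sketch. Hospital grade, department, and instrument brand are omitted because their codes depend on data not given here.

```python
PATIENT_TYPE = {"outpatient": 1, "hospitalization": 2,
                "emergency": 3, "physical examination": 4}
SEX = {"male": 1, "female": 2}

def encode_quarter(month):
    """Months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4."""
    return (month - 1) // 3 + 1

def encode_time_slot(hour):
    """Hours 0-5 -> 1, 6-11 -> 2, 12-17 -> 3, 18-23 -> 4."""
    return hour // 6 + 1

def encode_record(record):
    """Turn one cleaned record (hypothetical field names) into a feature vector."""
    return [
        PATIENT_TYPE[record["patient_type"]],
        SEX[record["sex"]],
        int(record["age"]),              # whole years; under 1 year becomes 0
        encode_quarter(record["month"]),
        encode_time_slot(record["hour"]),
    ]

record = {"patient_type": "emergency", "sex": "female",
          "age": 0.5, "month": 5, "hour": 14}
print(encode_record(record))  # [3, 2, 0, 2, 3]
```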
S403, randomly shuffling the preprocessed data.
In this step, the random shuffling includes dividing the preprocessed data into a training set and a test set. Optionally, the training set can be further processed by N-fold splitting to obtain a plurality of first training sets and at least one detection set for use during training, which makes the data distribution in the first training sets more random and improves the training effect of the subsequent quality control model. It should be noted that the detection set is the data used to check the quality control performance of the quality control model during training, whereas the test set is the data used to further verify the application range and stability of the quality control model after training is completed.
For example, from the historical inspection data of the 11 hospitals in S401, the data of 8 hospitals are randomly selected as the training data set, and the data of the other 3 hospitals are used as the test set.
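The hospital-level split in this example can be sketched as follows. The hospital identifiers, the function name, and the fixed random seed are illustrative assumptions.

```python
import random

def split_hospitals(hospital_ids, n_train=8, seed=0):
    """Shuffle hospital identifiers and split them into training and test groups."""
    rng = random.Random(seed)      # fixed seed only for reproducibility
    shuffled = list(hospital_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Eleven hypothetical hospital identifiers H01 ... H11.
hospitals = [f"H{i:02d}" for i in range(1, 12)]
train, test = split_hospitals(hospitals)
print(len(train), len(test))  # 8 3
```

Splitting by hospital rather than by individual record keeps every test hospital unseen during training, which matches the stated aim of verifying the application range of the model.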
S404, screening and transforming the shuffled data to obtain a training data set, and determining a multi-dimensional feature space according to a plurality of attribute features corresponding to the training data set.
In this embodiment, screening the data includes truncating and/or winsorizing each first training set according to preset truncation parameters, such as an upper truncation limit and a lower truncation limit. Transforming the data includes converting the distribution of the first training set to a normal distribution by a Box-Cox transformation.
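The screening and transformation can be sketched as follows. The truncation limits, the sample values, and the fixed Box-Cox parameter `lam` are assumptions of this sketch. In practice the parameter is usually estimated from the data, for example by maximum likelihood, which is not shown here.

```python
import math

def truncate(values, lower, upper):
    """Drop values outside [lower, upper]."""
    return [v for v in values if lower <= v <= upper]

def winsorize(values, lower, upper):
    """Clamp values to [lower, upper] instead of dropping them."""
    return [min(max(v, lower), upper) for v in values]

def box_cox(values, lam):
    """Box-Cox transform of positive values for a given parameter lam."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# Hypothetical right-skewed detection values with one extreme result.
alt = [12.0, 18.0, 25.0, 40.0, 300.0]
print(truncate(alt, 5.0, 100.0))   # [12.0, 18.0, 25.0, 40.0]
print(winsorize(alt, 5.0, 100.0))  # [12.0, 18.0, 25.0, 40.0, 100.0]
print(box_cox(truncate(alt, 5.0, 100.0), 0.5))
```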
At this point the acquisition of the training data set is complete. That is, S401 to S404 constitute a specific embodiment of obtaining the training data set in S101.
S405, calculating a loss function value corresponding to each attribute feature in a feature space corresponding to the data set to be divided by using a preset loss function model.
In this step, the loss function value is used to represent the importance level of the attribute feature, and the smaller the loss function value, the higher the importance level.
In this embodiment, the calculation manner of the loss function value may refer to the formulas (1) to (3) in S1021, and is not described herein again.
After the loss function values of the attribute features are sorted from small to large, the importance levels with which the different attribute features affect the segmentation of the feature space are obtained. In other words, segmenting the feature space in the dimensions of different attribute features yields subspaces that differ in how well they lend themselves to regression analysis. Put another way, the order in which the feature space is segmented in the different dimensions is determined by the importance levels of the attribute features.
S406, one or more target features are screened out from the attribute features according to the preset segmentation rule and the loss function value, and the feature space is segmented for one time or multiple times in the dimension corresponding to the target features to obtain multiple subspaces.
In this embodiment, the preset segmentation rule includes: sorting all attribute features corresponding to the feature space by importance once, then segmenting the feature space multiple times in the dimensions corresponding to the attribute features according to the sorted order, where each segmentation divides the space to be segmented along one dimension.
The method specifically comprises the following steps:
S4061, according to the size of each loss function value, one or more target features and the importance ranking corresponding to each target feature are determined.
S4062, according to the importance ranking, the feature space is sequentially and respectively segmented at the corresponding dimensionality of the target feature, and a plurality of subspaces are obtained.
Specifically, in each segmentation, the dimension corresponding to the highest-ranked target feature that has not yet been segmented is taken as the target dimension, and multiple segmentations are performed in sequence according to the ranking order. In each segmentation, the optimal segmentation point on the target dimension is determined according to the optimal segmentation rule.
Optionally, the optimal segmentation rule includes:
for target features only containing discrete feature values, selecting each discrete feature value as a segmentation point to be selected, then calculating a loss function value of a subspace obtained after each segmentation point to be selected is segmented, and selecting the segmentation point to be selected corresponding to the minimum value as an optimal segmentation point;
for a target feature whose feature values are continuous, selecting a candidate segmentation point between any two adjacent sample values (i.e., sample feature values serving as training data) of the target feature in the feature space, for example, taking the midpoint, a trisection point, a quarter point, or the like of the two adjacent sample values as the candidate segmentation point, then calculating the loss function value of the subspaces obtained after segmenting at each candidate segmentation point, and selecting the candidate segmentation point corresponding to the minimum value as the optimal segmentation point.
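The optimal segmentation rule above can be illustrated with a short sketch. It assumes the sum of squared errors as the loss, midpoints as the candidate points for continuous features, and numerically encoded discrete features; the function names and the sample ages and test values are hypothetical and chosen only so that the best split falls at 69.5.

```python
def sse(values):
    """Sum of squared errors around the mean; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def candidate_points(feature_values, continuous):
    """Discrete: every distinct value. Continuous: midpoints of adjacent distinct values."""
    distinct = sorted(set(feature_values))
    if not continuous:
        return distinct
    return [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]

def best_split(feature_values, targets, continuous=True):
    """Return (split_point, loss) minimising SSE(left) + SSE(right), or None."""
    best = None
    for point in candidate_points(feature_values, continuous):
        left = [t for x, t in zip(feature_values, targets) if x <= point]
        right = [t for x, t in zip(feature_values, targets) if x > point]
        if not left or not right:
            continue  # a split must produce two non-empty subspaces
        loss = sse(left) + sse(right)
        if best is None or loss < best[1]:
            best = (point, loss)
    return best

# Hypothetical samples: age as the target feature, one test value per sample.
ages = [30, 45, 60, 69, 70, 82, 90]
values = [44.0, 43.5, 43.0, 42.5, 38.0, 37.5, 37.0]
point, loss = best_split(ages, values, continuous=True)
```

With these made-up samples the search returns the midpoint between 69 and 70, because that split leaves the least residual variation on both sides.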
For example, for 8 attribute features in S402, after sorting the loss function values corresponding to the 8 attribute features from small to large, the obtained importance ranking order is as follows: age, hospital department, patient gender, date of testing, time of testing, patient type, hospital grade, instrument name.
Then, according to this order, the first segmentation divides the feature space in the dimension of age; that is, age is the target feature of the first segmentation. According to the value range of the feature value corresponding to age in S402, which is [0,120], the optimal segmentation point is sought within [0,120]; this is the optimal segmentation point problem.
For ease of understanding, in this embodiment each segmentation is a binary segmentation, that is, the space to be segmented is divided into two new spaces to be segmented. The dimension of each segmentation is different, that is, the target features are different, and the subspaces are finally obtained.
For example, each specific age value in the training data set (the data set to be segmented), i.e., each feature value of age, is first taken as a candidate segmentation point, and the loss function values of the two new spaces to be segmented obtained after segmentation are calculated. Assuming there are 1000 candidate segmentation points, there are 2000 loss function values. The segmentation point corresponding to the minimum loss function value is then selected as the optimal segmentation point.
In this embodiment, assuming that the optimal segmentation point in the attribute feature of age is 69.5 years old, the first segmentation of the feature space is to combine sample data less than or equal to 69.5 years old into a new space to be segmented, which is denoted as A1 space to be segmented, and combine sample data greater than 69.5 years old into another new space to be segmented, which is denoted as A2 space to be segmented.
The second segmentation is then performed. According to the importance ranking, the target feature of the second segmentation is hospital department. Similarly, the optimal segmentation point is sought among all feature values corresponding to hospital department in the A1 space to be segmented and in the A2 space to be segmented, respectively, giving a first optimal segmentation point and a second optimal segmentation point, which may differ. The space to be segmented A1 is then divided into a space to be segmented B1 and a space to be segmented B2 according to the first optimal segmentation point, and the space to be segmented A2 is divided into a space to be segmented C1 and a space to be segmented C2 according to the second optimal segmentation point.
Next, binary segmentation is performed on the spaces to be segmented B1, B2, C1 and C2, respectively, in the dimension corresponding to patient gender. The binary segmentation is executed recursively in this manner until the stop-segmentation condition is satisfied.
The stop-segmentation condition includes: segmentation has been completed in the last dimension, i.e., the dimension corresponding to the instrument name; or the loss function values of the remaining unsegmented attribute dimensions in the new space to be segmented are less than or equal to a preset threshold.
In this embodiment, for steps S405 to S406, a CART regression decision tree algorithm may be executed using the scikit-learn toolkit to construct a binary decision tree, thereby implementing the partition of the feature space corresponding to the data set to be segmented.
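As a hedged illustration of this step, the sketch below fits a scikit-learn `DecisionTreeRegressor` on synthetic data. The feature encoding, the synthetic step at 69.5 years, and the `max_depth` and `min_samples_leaf` settings are assumptions made for the example. Note that scikit-learn chooses the split feature greedily at every node, so it does not reproduce the fixed one-time ordering of S4061 and S4062 exactly.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 400
age = rng.uniform(0, 120, size=n)
department = rng.integers(0, 5, size=n)   # hypothetical integer-coded departments
gender = rng.integers(0, 2, size=n)       # hypothetical 0/1 coding
X = np.column_stack([age, department, gender])

# Synthetic test values: a step at 69.5 years plus noise (illustrative only).
y = np.where(age <= 69.5, 43.0, 38.0) + rng.normal(0.0, 0.5, size=n)

# CART regression tree; each leaf plays the role of one subspace.
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=20, random_state=0)
tree.fit(X, y)

leaf_ids = tree.apply(X)      # leaf (subspace) index of every sample
predicted = tree.predict(X)   # mean of the training values in that leaf
residuals = y - predicted     # candidate target training data
```

Because each leaf predicts the mean of its own training samples, the residuals are centred on zero within every leaf.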
S407, determining a predicted value corresponding to each subspace according to sample data corresponding to each subspace by using a preset regression analysis model.
S408, determining a target training set according to the predicted value and each sample data in the training data set.
S409, constructing and training a quality control model according to the preset mean model, the target training set and the preset quality control requirement.
For the steps S407 to S409, the implementation principle and the noun explanation thereof can refer to S103 to S105, which are not described herein again.
It should be noted that, during training of the quality control model in S409, the detection set in S403 and the preset false positive rate threshold are used to continuously adjust the hyper-parameters of the quality control model, including the control upper limit and the control lower limit used to identify an out-of-control state. That is, a quality-control out-of-control state is determined when a residual or error in the target training set exceeds the control range defined by the control upper limit and the control lower limit.
After the false positive rate check on the detection set is passed, the quality control model also needs to pass a test on the historical detection values of the other 3 hospitals that did not participate in training. That is, the application range, universality, identification stability and robustness of the quality control model are tested, and the quality control model is determined to be qualified after training when the false positive rate is less than or equal to the false positive rate threshold.
In the training method of the medical quality control model provided by this embodiment, importance ranking is performed on each attribute feature in the feature space corresponding to the training data set, then the feature space is sequentially segmented according to the importance ranking to obtain a plurality of subspaces, then regression analysis is performed on the subspaces, and then a target training set is formed according to the result of the regression analysis to train the quality control model. The method and the device realize separation of the influence factors of various detection errors in the whole feature space, solve the technical problem that the influence of various influence factors cannot be eliminated by performing linear regression on the whole feature space, and achieve the technical effects of improving the quality of training data in a target training set and improving the identification accuracy and the identification precision of a quality control model.
In the embodiment shown in fig. 4, S4061 and S4062 adopt a one-time sorting and multi-time segmentation method for the high-dimensional feature space. The application also provides another segmentation mode, namely a recursive segmentation mode of reordering each segmentation. The following is a detailed description by way of example.
Fig. 5 is a schematic flow chart of another segmentation method for the feature space in S406 according to the embodiment of the present application. As shown in fig. 5, in this embodiment, the preset segmentation rule corresponding to this segmentation mode includes: repeatedly invoking a recursive segmentation mode to segment the feature space multiple times. In each recursive segmentation, the segmented space obtained from the previous segmentation is taken as the current space to be segmented; the loss function values of all attribute features of the current space to be segmented are recalculated; the dimension of the attribute feature with the minimum loss function value is selected; and the space to be segmented is segmented along that dimension to obtain a plurality of new spaces to be segmented. Segmentation is repeated in this recursive manner until the new spaces to be segmented meet the stop-segmentation requirement.
The specific steps of the feature space segmentation method include:
S501, calculating a first loss function value of each first attribute feature corresponding to the current space to be segmented by using a preset loss function model.
In this step, during the first segmentation, the feature space corresponding to the training data set is the current space to be segmented of this segmentation. After the first segmentation is executed, a plurality of segmentation spaces are obtained, and then when the segmentation spaces are segmented, the segmentation spaces are the current spaces to be segmented. The preset loss function model and the corresponding loss function value may refer to equations (1) to (3) in S1021.
For convenience of understanding, in this embodiment, assuming that the preset loss function model is a sum of squared errors model, the first loss function value of each attribute feature included in the space to be currently divided can be calculated by formula (3). For example, in the first cutting, the loss function values of all the attribute features of the feature space corresponding to the training data set need to be calculated.
Optionally, if it is specified that after the dimension corresponding to a certain attribute feature is segmented, the dimension is not segmented again, in the nth (N is greater than or equal to 2) segmentation, the number of attribute features corresponding to the current space to be segmented is less than the number of all attribute features of the feature space corresponding to the training data set.
S502, taking the first attribute feature corresponding to the minimum first loss function value as a target feature.
In this step, after each first loss function value of the current space to be segmented is recalculated each time, the minimum value is selected, and the first attribute feature corresponding to the minimum value is used as the target feature of the current segmentation, that is, the current segmentation can be performed in the dimension of the target feature.
For convenience of understanding, in the present embodiment, the dimension of the target feature is subjected to binary segmentation, that is, only the current space to be segmented is divided into two segmentation spaces. Thus, each recursive segmentation is a binary segmentation, and the final result of multiple recursive segmentations can be represented by a binary tree. The current space to be partitioned is a node of the binary tree, the subspace which is not partitioned at all is a leaf node of the binary tree, and the whole feature space corresponding to the training data set can be understood as a root node of the binary tree.
S503, determining a segmentation threshold value on the dimension corresponding to the target feature according to the preset loss function model and each first feature value corresponding to the target feature.
In this step, each first feature value is used as a candidate segmentation point, the loss function value corresponding to the segmented spaces obtained after segmenting at each candidate segmentation point is calculated by using the preset loss function model, the candidate segmentation point corresponding to the minimum loss function value is selected as the target segmentation point, and the numerical value corresponding to the target segmentation point is the segmentation threshold of the current segmentation.
In this embodiment, assume that the preset loss function model is a sum of squared errors model. If each candidate segmentation point divides the current space to be segmented into two segmented spaces, the sum of squared errors of each of the two segmented spaces is calculated according to formula (3), and the two sums are added to obtain a third loss function value for evaluating the segmentation effect of that point; the smaller the third loss function value, the better the segmentation effect. Using this principle, the third loss function value is calculated for each candidate segmentation point of the target feature, and the candidate segmentation point with the smallest third loss function value is selected as the target segmentation point.
S504, segmenting the current space to be segmented according to the segmentation threshold value to obtain a new space to be segmented.
In this embodiment, the current space to be segmented is subjected to binary segmentation according to the segmentation threshold; that is, samples whose feature value is less than or equal to the segmentation threshold form one new space to be segmented, and samples whose feature value is greater than the segmentation threshold form the other.
S505, judging whether a second loss function value of each second attribute feature corresponding to the new space to be segmented meets a preset stop requirement according to the preset loss function model.
In this step, the stop-segmentation requirement includes: all dimensions of the attribute features of the original feature space have been segmented, or the loss function values of the attribute features contained in the new space to be segmented are less than or equal to a preset stop threshold. Optionally, the preset stop threshold is 0.
And if the requirement for stopping the segmentation is met, stopping the segmentation, otherwise, taking the new space to be segmented as the current space to be segmented, and returning to S501 to repeatedly perform recursive segmentation.
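Steps S501 to S505 can be sketched as a small recursive routine. This is a simplified illustration under assumptions, not the embodiment itself: the loss of an attribute feature is taken to be the smallest summed squared error achievable by splitting on it, the stop test uses the squared error of the test values in the current space, and the feature names, sample rows and `min_size` parameter are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split_for_feature(rows, feature, target):
    """Best (threshold, loss) on one feature, or None if it has one distinct value."""
    distinct = sorted({r[feature] for r in rows})
    best = None
    for a, b in zip(distinct, distinct[1:]):
        threshold = (a + b) / 2.0
        left = [r[target] for r in rows if r[feature] <= threshold]
        right = [r[target] for r in rows if r[feature] > threshold]
        loss = sse(left) + sse(right)
        if best is None or loss < best[1]:
            best = (threshold, loss)
    return best

def recursive_split(rows, features, target, stop_threshold=0.0, min_size=2):
    """Return the leaf subspaces (each a list of rows)."""
    # Stop test (simplified S505): the space is small or already homogeneous.
    if len(rows) < min_size or sse([r[target] for r in rows]) <= stop_threshold:
        return [rows]
    # S501: recompute the loss of every feature on the current space.
    scored = []
    for feature in features:
        split = best_split_for_feature(rows, feature, target)
        if split is not None:
            scored.append((split[1], feature, split[0]))
    if not scored:
        return [rows]
    # S502 and S503: the feature with the smallest loss and its threshold.
    _, feature, threshold = min(scored)
    # S504: binary segmentation at the threshold.
    left = [r for r in rows if r[feature] <= threshold]
    right = [r for r in rows if r[feature] > threshold]
    return (recursive_split(left, features, target, stop_threshold, min_size)
            + recursive_split(right, features, target, stop_threshold, min_size))

rows = [
    {"age": 30, "dept": 0, "value": 44.0},
    {"age": 40, "dept": 1, "value": 44.0},
    {"age": 75, "dept": 0, "value": 38.0},
    {"age": 80, "dept": 1, "value": 36.0},
    {"age": 85, "dept": 0, "value": 38.0},
    {"age": 90, "dept": 1, "value": 36.0},
]
leaves = recursive_split(rows, ["age", "dept"], "value")
leaf_means = sorted(sum(r["value"] for r in leaf) / len(leaf) for leaf in leaves)
```

In this made-up data the first split is on age, but inside the older group the department becomes the better feature, which is the re-ranking behaviour that distinguishes this mode from the one-time ordering of fig. 4.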
Compared with the segmentation mode of the embodiment shown in fig. 4, the segmentation mode for the feature space provided by this embodiment takes into account that each segmentation may change the importance level of the attribute features. Therefore, in each segmentation, this embodiment recalculates the loss function value of every attribute feature corresponding to the current space to be segmented. This yields a more accurate segmentation and makes it easier to obtain more accurate predicted values in the subsequent regression analysis in each subspace, so that the target training data in the target training set of the quality control model are of higher quality, the training effect is better, and the identification accuracy and precision of the quality control model are further improved.
To verify that the training method of the medical quality control model provided by the application trains the quality control model better, i.e., that the quality control model achieves a better training effect, the fluctuation of the data distribution in the target training data set finally used to train the quality control model can be compared. That is, the smaller the fluctuation amplitude of the data distribution in the target training data set, the better the training effect that the target training data set can achieve.
FIG. 6-1a is a graph showing the fluctuation of the Albumin (ALB) test value in the absence of regression analysis on the training data of the quality control model provided in the present application. As shown in fig. 6-1a, the albumin ALB test value, without regression analysis, fluctuated as training data in the range of about 500 to 4000, with a fluctuation range of about 3500.
FIG. 6-1b is a schematic diagram illustrating fluctuation of an Albumin (ALB) test value when linear regression analysis is performed on the whole feature space according to the training data of the quality control model provided in the present application. As shown in FIG. 6-1b, when the linear regression analysis, i.e., the conventional statistical regression analysis, was performed on the albumin ALB test values, the values fluctuated in the range of-1500 to 1000 as training data, and the fluctuation range was about 2500.
Fig. 6-1c is a schematic diagram illustrating fluctuation of an Albumin (ALB) test value when non-linear regression analysis is performed in a subspace after feature space segmentation is performed on training data of a quality control model provided by the present application. As shown in FIG. 6-1c, after the non-linear regression analysis provided herein, the albumin ALB test values as training data fluctuated in the range of about -400 to about 400, with a fluctuation range of about 800.
Fig. 6-2a is a graph illustrating the fluctuation of White Blood Cell (WBC) test values without regression analysis of the training data of the quality control model provided in the present application. As shown in fig. 6-2a, the WBC test values were not regressed and fluctuated as training data in the range of about-2 to 6, which was about 8.
Fig. 6-2b is a schematic diagram illustrating fluctuation of White Blood Cell (WBC) test values when linear regression analysis is performed on the training data of the quality control model in the feature space as a whole according to the present application. As shown in FIG. 6-2b, after linear regression analysis, i.e., conventional statistical regression analysis, the WBC test values as training data fluctuated in the range of about -1 to about 1, with a fluctuation range of about 2.
Fig. 6-2c is a graph showing the fluctuation of White Blood Cell (WBC) test values of the training data of the quality control model when non-linear regression analysis is performed in the subspace after the feature space is segmented. As shown in FIG. 6-2c, after the non-linear regression analysis provided herein, the WBC test values as training data fluctuated in the range of approximately -0.6 to 0.6, with a fluctuation range of approximately 1.2.
As can be seen from fig. 6-1a to fig. 6-2c, after the feature space where the training data is located is segmented, the nonlinear regression analysis is performed in the subspace, so that the influence of other interference factors on the training data can be effectively reduced, the fluctuation range of the training data is reduced, the training accuracy is higher, and the stability and the universality of the quality control model are correspondingly improved.
Comparative analysis of the methods provided by the above embodiments of the present application and the related art:
The test result of a patient is influenced by various factors, where inherent factors include physiological characteristics, health status, detection environment and the like, and variable factors include errors in detection. The information of the quality control state is denoted B, the information of the other factors is denoted A, and the patient result is denoted C. In the space theory of information, C = A + B. The main idea of all PBRTQC methods that do not perform regression analysis is to infer B from the information of C. IFCC PBRTQC uses only the information of C, and its space has only one dimension; however, in terms of information energy C > B, which undoubtedly makes judging the quality control state very difficult. The L-RARQC method, which performs linear regression analysis on the whole feature space, establishes a regression model for the information of A and B. The process essentially fits the information of A into an equivalent value of the same kind as C and thereby realizes the effect of C-A, which clearly helps the judgment of B to a great extent. The L-RARQC method has two drawbacks. First, the information of A cannot be absolutely complete in reality; the completeness of the information of A is also represented by the shade of color in the graph. Second, the linear fitting method causes a large error in the conversion from A to C: the information of A has multiple dimensions, and a simple linear combination of multiple elements cannot accurately approximate the objective properties of C.
The method provided by the application (ML-NL-RARQC for short) fits the equivalent value of A by nonlinear means. The relationship between A and C is naturally complex, and ML-NL, i.e., nonlinear means based on machine learning, is an important method for quantitatively learning complex knowledge and can bring the equivalent value of A closer to the properties of C. In summary, the judgment accuracy for B is directly related to the dispersion of the information amount, and the information dispersions of the three methods are ordered as: IFCC PBRTQC >> L-RARQC > ML-NL-RARQC. Therefore, the accuracy of quality control is ordered as: IFCC PBRTQC << L-RARQC < ML-NL-RARQC.
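The decomposition used in this comparison can be written compactly. The notation below is one illustrative reading of the argument: the symbols \hat{A} (the fitted equivalent value of A) and r (the residual passed to quality control) are introduced here as assumptions and do not appear in the original text.

```latex
C = A + B, \qquad r = C - \hat{A} = B + \bigl(A - \hat{A}\bigr)
```

Under this reading, the closer the fitted value \hat{A} is to A, the closer the residual r is to the quality control information B, which is why a more accurate nonlinear fit is expected to make B easier to judge.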
In summary, the purpose of the present application is to introduce factors other than the instrument more comprehensively into the quality control calculation. The method is to convert these other factors more accurately into equivalent result values and then convert them into residuals using the actual result values. Information such as the patient's age, department and sex is not purely linearly related to the detection index; if linear regression is used, individual randomness is influenced by the overall linear correlation. Non-linear regression can be understood as a combination of multiple linear regression planes that divide different types of patients into different clusters, with each cluster regressed individually, resulting in a more accurate mapping.
After the quality control model is trained by the training method of the medical quality control model, the medical quality control can be carried out by using the quality control model.
Fig. 7 is a schematic flowchart of a medical examination and quality control method according to an embodiment of the present application. As shown in fig. 7, the medical quality control method includes the specific steps of:
S701, acquiring real-time inspection data of the target medical institution.
In this step, the target medical institution, such as a clinical laboratory of a certain hospital, uploads real-time inspection data acquired by each medical inspection device to the medical quality control system in real time, and the medical quality control system is preset with a quality control model trained by the training method of the medical quality control model. Therefore, the real-time quality control module of the medical quality control system can acquire real-time inspection data.
S702, judging the quality control state of the real-time inspection data through a preset quality control model.
In this step, the quality control state includes: an in-control state and an out-of-control state.
The preset quality control model is the quality control model trained by the above training method. The quality control model first calculates the difference between the real-time inspection data and the corresponding predicted value, then compares the difference, i.e., the residual or error, with the control upper limit and the control lower limit. If the difference is greater than the control upper limit or less than the control lower limit, the quality control state is judged to be an out-of-control state; otherwise it is an in-control state.
S703, if the quality control state is an out-of-control state, sending early warning information to the target medical institution.
According to the medical inspection quality control method provided by the embodiment of the application, the trained quality control model is used to judge the quality control state from the residual or error of the real-time inspection data. This identifies more accurately whether a quality control problem has occurred at the target medical institution, i.e., whether it is in an out-of-control state, so that an early warning can be issued in time and medical accidents caused by loss of control are avoided.
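The judgment in S702 and the warning in S703 can be sketched as follows. This is a minimal illustration assuming a single predicted value and fixed control limits; the function names, the message format and the sample values are hypothetical.

```python
def quality_control_state(test_value, predicted_value, lower_limit, upper_limit):
    """Classify one result by comparing its residual with the control limits."""
    residual = test_value - predicted_value
    if residual > upper_limit or residual < lower_limit:
        return "out-of-control"
    return "in-control"

def monitor(stream, predicted_value, lower_limit, upper_limit):
    """Return one early-warning message per out-of-control result."""
    warnings = []
    for index, value in enumerate(stream):
        state = quality_control_state(value, predicted_value, lower_limit, upper_limit)
        if state == "out-of-control":
            warnings.append(f"warning: result {index} (value {value}) is out of control")
    return warnings

# Hypothetical real-time values for one subspace whose predicted value is 40.0.
warnings = monitor([41.0, 39.5, 44.5, 36.0], predicted_value=40.0,
                   lower_limit=-3.0, upper_limit=3.0)
```

A deployed system would look up the predicted value from the subspace that each sample falls into and would deliver the warning to the target medical institution instead of collecting strings.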
Fig. 8 is a schematic structural diagram of a medical inspection and quality control system according to an embodiment of the present application. The medical inspection and quality control system 800 may be implemented in software, hardware, or a combination of both.
As shown in fig. 8, the medical inspection and quality control system 800 includes:
an obtaining module 801, configured to obtain a training data set, where the training data set includes: sample data collected by a plurality of medical institutions with different quality control levels;
a model training module 802 to:
determining a multidimensional feature space corresponding to the training data set according to a plurality of attribute features corresponding to the training data set;
segmenting the feature space in the dimension corresponding to one or more attribute features according to a preset segmentation rule to obtain a plurality of subspaces; determining a predicted value corresponding to each subspace according to sample data corresponding to each subspace by using a preset regression analysis model, and determining a target training set according to the predicted value and each sample data in the training data set;
and constructing and training a quality control model according to the preset mean model, the target training set and the preset quality control requirement, wherein the quality control model is used for carrying out real-time quality monitoring on the inspection data of the medical institution.
In one possible design, model training module 802 is to:
calculating a loss function value corresponding to each attribute feature by using a preset loss function model, wherein the loss function value is used for representing the importance degree of the attribute feature to a segmentation feature space;
screening one or more target characteristics from the attribute characteristics according to a preset segmentation rule and a loss function value;
and performing one or more times of segmentation on the feature space in the dimension corresponding to the target feature to obtain a plurality of subspaces.
In one possible design, the attribute feature corresponds to a plurality of feature values, and the predetermined loss function model includes: an information entropy model, or a Gini coefficient model, or a sum of squared errors model; correspondingly, when the loss function model is an information entropy model, the model training module 802 is configured to:
calculating the information entropy of each attribute feature according to each feature value of each attribute feature and an information entropy formula, and taking the information entropy as a loss function value;
when the loss function model is a Gini coefficient model, the model training module 802 is configured to:
calculating a Gini coefficient of each attribute feature according to each feature value of each attribute feature and a Gini coefficient formula, and taking the Gini coefficient as a loss function value;
when the loss function model is a sum of squared errors model, model training module 802 is configured to:
calculating a mean value corresponding to each characteristic value under each attribute characteristic, and taking the difference between the characteristic value and the mean value as an error; the sum of the squares of the individual errors is taken as the loss function value.
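The three loss function models named above can be sketched with their standard textbook definitions. Formulas (1) to (3) of the application are not reproduced in this section, so the exact form used here is an assumption; in particular, the Gini coefficient is taken to be the Gini impurity used by CART, and the function names are hypothetical.

```python
import math
from collections import Counter

def information_entropy(feature_values):
    """Shannon entropy (base 2) of the empirical distribution of the values."""
    total = len(feature_values)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(feature_values).values())

def gini_coefficient(feature_values):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(feature_values)
    return 1.0 - sum((count / total) ** 2
                     for count in Counter(feature_values).values())

def sum_of_squared_errors(values):
    """Sum of squared differences between each value and the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

entropy = information_entropy(["M", "F", "M", "F"])
gini = gini_coefficient(["M", "F", "M", "F"])
squared_error = sum_of_squared_errors([1.0, 2.0, 3.0])
```

All three measures are zero for a perfectly homogeneous set of values and grow as the values become more mixed or more spread out.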
In one possible design, the preset segmentation rule includes: sorting the importance of all attribute features corresponding to the feature space for one time, then dividing the dimension corresponding to the attribute features for multiple times according to the sorting sequence, and dividing the space to be divided from one dimension for each division;
correspondingly, the model training module 802 is configured to:
determining one or more target features and the importance ranking corresponding to each target feature according to the size of each loss function value;
and according to the importance ranking, sequentially and respectively segmenting the feature space at the corresponding dimensionality of the target feature to obtain a plurality of subspaces.
In another possible design, the preset segmentation rule includes: repeatedly invoking a recursive segmentation mode to segment the feature space multiple times, where in each segmentation the loss function values of all attribute features of the space to be segmented are recalculated, the dimension of the attribute feature with the minimum loss function value is selected, and the space to be segmented is segmented to obtain a plurality of new spaces to be segmented; segmentation is repeated in the recursive segmentation mode until the loss function values of all attribute features of the current space to be segmented meet the stop-segmentation requirement;
correspondingly, the model training module 802 is configured to:
recursion segmentation is carried out on the characteristic space circularly, and in each recursion segmentation, the segmentation space obtained by the last segmentation is used as the current space to be segmented;
calculating a first loss function value of each first attribute characteristic corresponding to the current space to be divided by using a preset loss function model;
taking the first attribute characteristic corresponding to the minimum first loss function value as a target characteristic;
determining a segmentation threshold value on a dimension corresponding to the target feature according to a preset loss function model and each first feature value corresponding to the target feature;
segmenting the current space to be segmented according to a segmentation threshold value to obtain a new space to be segmented;
judging, according to the preset loss function model, whether a second loss function value of each second attribute feature corresponding to the new space to be segmented meets a preset stop requirement;
if so, stopping the segmentation, otherwise, taking the new space to be segmented as the current space to be segmented, and repeatedly carrying out recursive segmentation.
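The recursive steps above can be sketched in the style of a regression tree. The following is only an illustration under assumptions that the application does not state: the loss of a feature is taken as the smallest summed squared error of the detection values over the two halves of a candidate threshold, features are numeric, each sample is a dictionary whose detection value sits under the hypothetical key `"value"`, and the stop requirement is a minimum subspace size or an already homogeneous subspace.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, feature):
    """Return (loss, threshold) minimising the summed SSE of the two halves."""
    best = (float("inf"), None)
    candidates = sorted({r[feature] for r in rows})
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [r["value"] for r in rows if r[feature] <= threshold]
        right = [r["value"] for r in rows if r[feature] > threshold]
        loss = sse(left) + sse(right)
        if loss < best[0]:
            best = (loss, threshold)
    return best


def recursive_split(rows, features, min_size=2, tol=1e-9):
    """Recursively segment rows and return the list of final subspaces."""
    # Assumed stop requirement: too few samples or already homogeneous.
    if len(rows) < 2 * min_size or sse([r["value"] for r in rows]) <= tol:
        return [rows]
    # Recalculate the loss of every feature on the current space and
    # take the feature with the minimum loss as the target feature.
    scored = [(best_split(rows, f), f) for f in features]
    (_, threshold), target = min(scored, key=lambda s: s[0][0])
    if threshold is None:
        return [rows]
    left = [r for r in rows if r[target] <= threshold]
    right = [r for r in rows if r[target] > threshold]
    if len(left) < min_size or len(right) < min_size:
        return [rows]
    return (recursive_split(left, features, min_size, tol)
            + recursive_split(right, features, min_size, tol))
```

The parameters `min_size` and `tol` stand in for the preset stop requirement; a real system would substitute whatever criterion its quality control requirements define.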
In one possible design, model training module 802 is to:
calculating the mean value of historical detection values in each sample data, and taking the mean value as a predicted value;
determining a target training set according to the predicted value and each sample data in the training data set, wherein the method comprises the following steps:
calculating the difference value between each historical detection value and the corresponding predicted value;
all differences are combined into a target training set.
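A minimal sketch of this residual step, assuming the mean is taken over the historical detection values of the samples in each subspace and that each sample is a dictionary with the detection value under the hypothetical key `"value"` (the function name is also hypothetical):

```python
def build_target_training_set(subspaces):
    """Return the differences between detection values and subspace means.

    Assumptions: `subspaces` is a list of non-empty lists of samples, and
    the predicted value of a subspace is the mean of its detection values.
    """
    residuals = []
    for rows in subspaces:
        values = [r["value"] for r in rows]
        predicted = sum(values) / len(values)
        residuals.extend(v - predicted for v in values)
    return residuals
```

Subtracting the subspace mean removes the part of each detection value that is explained by the attribute features, so the downstream mean model sees only the remaining variation.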
In one possible design, the preset mean model includes an exponential weighted floating mean model, and the preset quality control requirement includes: the false positive rate corresponding to the quality control model is less than or equal to a false positive rate threshold. Correspondingly, the model training module 802 is configured to:
inputting target training data in a target training set into an exponential weighted floating mean model to obtain a plurality of weighted floating means;
and adjusting the control upper limit value and/or the control lower limit value so that the ratio of the first weighted floating mean values, that is, those larger than the control upper limit value or smaller than the control lower limit value, to all weighted floating mean values is smaller than or equal to the false positive rate threshold value.
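The two steps above can be sketched as follows. This is a hedged illustration, not the application's procedure: the smoothing factor `lam`, the starting value, the symmetric widening of the limits around the centre, and the step size are all assumptions, and the function names are hypothetical.

```python
def ewma(series, lam=0.1, start=0.0):
    """Exponentially weighted means: z_t = lam * x_t + (1 - lam) * z_{t-1}."""
    out, z = [], start
    for x in series:
        z = lam * x + (1 - lam) * z
        out.append(z)
    return out


def false_positive_rate(means, lower, upper):
    """Share of weighted means above the upper or below the lower limit."""
    flagged = sum(1 for z in means if z > upper or z < lower)
    return flagged / len(means)


def fit_control_limits(means, max_fpr, step=0.01):
    """Widen symmetric limits until the false positive rate is acceptable.

    Assumption: the limits are symmetric about the average of the
    weighted means; the application only requires that the upper and/or
    lower limit be adjusted until the threshold is met.
    """
    centre = sum(means) / len(means)
    half_width = 0.0
    while false_positive_rate(means, centre - half_width,
                              centre + half_width) > max_fpr:
        half_width += step
    return centre - half_width, centre + half_width
```

Because the training data are assumed to be in control, every weighted mean that falls outside the limits counts as a false positive.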
In one possible design, the acquisition module 801 is further configured to acquire real-time inspection data of the target medical institution;
the medical inspection quality control system 800 further includes: a quality control monitoring module 803, configured to:
judging the quality control state of the real-time inspection data through a quality control model; and if the quality control state is an out-of-control state, sending early warning information to a target medical institution.
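A minimal monitoring sketch, assuming the quality control model reduces to an exponentially weighted mean with fixed control limits and that early warning is delivered through a caller-supplied callback. The class `QualityControlMonitor`, the function `monitor`, and the `notify` callback are hypothetical names used only for illustration.

```python
class QualityControlMonitor:
    """Tracks a weighted floating mean and reports the quality control state."""

    def __init__(self, lower, upper, lam=0.1, start=0.0):
        self.lower = lower
        self.upper = upper
        self.lam = lam  # assumed smoothing factor
        self.z = start

    def update(self, residual):
        """Fold one real-time residual into the mean and return the state."""
        self.z = self.lam * residual + (1 - self.lam) * self.z
        if self.z > self.upper or self.z < self.lower:
            return "out_of_control"
        return "in_control"


def monitor(stream, qc, notify):
    """Send an early warning for every out-of-control observation."""
    for residual in stream:
        if qc.update(residual) == "out_of_control":
            notify(qc.z)
```

In practice `notify` would forward the warning to the target medical institution; here it simply receives the current weighted mean.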
It should be noted that the medical examination quality control system provided in the embodiment shown in fig. 8 may execute the training method of the medical examination quality control model or the medical examination quality control method provided in any one of the above method embodiments, and the specific implementation principle, technical features, term explanation and technical effects thereof are similar and will not be described herein again.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 9, the electronic device 900 may include: at least one processor 901 and a memory 902. Fig. 9 shows an electronic device with one processor as an example.
The memory 902 is configured to store programs. In particular, a program may include program code, and the program code includes computer operating instructions.
The processor 901 is configured to execute computer-executable instructions stored in the memory 902 to implement the methods described in the above method embodiments.
The processor 901 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
Optionally, the memory 902 may be separate from or integrated with the processor 901. When the memory 902 is a device independent of the processor 901, the electronic device 900 may further include:
a bus 903 for connecting the processor 901 and the memory 902. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on, but this does not mean that there is only one bus or only one type of bus.
Optionally, in a specific implementation, if the memory 902 and the processor 901 are integrated on one chip, the memory 902 and the processor 901 may communicate through an internal interface.
An embodiment of the present application further provides a computer-readable storage medium, which may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk. Specifically, the computer-readable storage medium stores program instructions for the methods in the above method embodiments.
An embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the method in the foregoing method embodiments.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the scope of the technical solutions of the embodiments of the present application.
Claims (10)
1. A medical inspection quality control model training method is characterized by comprising the following steps:
acquiring a training data set, and determining a multidimensional feature space corresponding to the training data set according to a plurality of attribute features corresponding to the training data set, wherein the training data set comprises: sample data collected by a plurality of different levels and/or different types of medical institutions;
segmenting the feature space in the dimensionality corresponding to one or more attribute features according to a preset segmentation rule to obtain a plurality of subspaces; determining a predicted value corresponding to each subspace according to the sample data corresponding to each subspace by using a preset regression analysis model, and determining a target training set according to the predicted value and each sample data in the training data set;
and constructing and training a quality control model according to a preset mean value model, the target training set and preset quality control requirements, wherein the quality control model is used for carrying out real-time quality monitoring on the inspection data of the medical institution.
2. The medical inspection quality control model training method according to claim 1, wherein the segmenting the feature space in the dimension corresponding to one or more of the attribute features according to a preset segmentation rule to obtain a plurality of subspaces comprises:
calculating a loss function value corresponding to each attribute feature by using a preset loss function model, wherein the loss function value is used for representing the importance degree of the attribute feature to the division of the feature space;
screening one or more target characteristics from the attribute characteristics according to the preset segmentation rule and the loss function value;
and performing one or more times of segmentation on the feature space in the dimension corresponding to the target feature to obtain a plurality of subspaces.
3. The method as claimed in claim 2, wherein one of the attribute features corresponds to a plurality of feature values, and the predetermined loss function model comprises: an information entropy model, or a Gini coefficient model, or a sum of squared errors model;
when the loss function model is the information entropy model, calculating a loss function value corresponding to each attribute feature by using a preset loss function model, including:
calculating the information entropy of each attribute feature according to each feature value of each attribute feature and an information entropy formula, and taking the information entropy as the loss function value;
when the loss function model is the Gini coefficient model, calculating a loss function value corresponding to each attribute feature by using a preset loss function model, including:
calculating a Gini coefficient of each attribute feature according to each feature value of each attribute feature and a Gini coefficient formula, and taking the Gini coefficient as the loss function value;
when the loss function model is the sum of squared errors model, the calculating, by using a preset loss function model, a loss function value corresponding to each attribute feature includes:
calculating a mean value corresponding to each characteristic value under each attribute characteristic, and taking the difference between the characteristic value and the mean value as an error; and taking the square sum of each error as the loss function value.
4. The medical inspection quality control model training method according to claim 2, wherein the preset segmentation rule comprises: ranking all the attribute features corresponding to the feature space by importance once, then segmenting multiple times along the dimensions corresponding to the attribute features in the ranking order, each segmentation dividing the space to be segmented along one dimension;
screening one or more target characteristics from the attribute characteristics according to the preset segmentation rule and the loss function value; performing one or more segmentations on the feature space in the dimension corresponding to the target feature to obtain a plurality of subspaces, including:
determining one or more target features and an importance ranking corresponding to each target feature according to the size of each loss function value;
and according to the importance ranking, sequentially and respectively segmenting the feature space at the dimensionality corresponding to the target feature to obtain a plurality of subspaces.
5. The method for training the medical inspection quality control model according to claim 2, wherein the preset segmentation rule comprises: repeatedly calling a recursive segmentation mode, performing multiple segmentation on the feature space, taking the segmented space obtained by the last segmentation as the current space to be segmented in each recursive segmentation, recalculating loss function values of all attribute characteristics of the current space to be segmented, selecting the dimension of the attribute characteristic with the minimum loss function value, segmenting the current space to be segmented to obtain a plurality of new spaces to be segmented, and performing repeated segmentation in the recursive segmentation mode until the new spaces to be segmented meet the requirement of stopping segmentation;
screening one or more target characteristics from the attribute characteristics according to the preset segmentation rule and the loss function value; performing one or more segmentations on the feature space in the dimension corresponding to the target feature to obtain a plurality of subspaces, including:
when the current space to be segmented is segmented, calculating a first loss function value of each first attribute feature corresponding to the current space to be segmented by using the preset loss function model;
taking the first attribute feature corresponding to the minimum first loss function value as the target feature;
determining a segmentation threshold value on a dimension corresponding to the target feature according to the preset loss function model and each first feature value corresponding to the target feature;
segmenting the current space to be segmented according to the segmentation threshold value to obtain a new space to be segmented;
judging, according to the preset loss function model, whether a second loss function value of each second attribute feature corresponding to the new space to be segmented meets a preset stop requirement;
if so, stopping the segmentation, otherwise, taking the new space to be segmented as the current space to be segmented, and repeating the recursive segmentation.
6. The medical inspection quality control model training method according to any one of claims 1 to 5, wherein the determining the predicted value corresponding to each subspace according to the sample data corresponding to each subspace by using a preset regression analysis model comprises:
calculating the mean value of historical detection values in each sample data, and taking the mean value as the predicted value;
determining a target training set according to the predicted value and each sample data in the training data set, including:
calculating the difference value between each historical detection value and the corresponding predicted value;
combining all the difference values into the target training set.
7. The method for training the medical inspection quality control model according to any one of claims 1 to 5, wherein the preset mean value model comprises an exponential weighted floating mean model, and the preset quality control requirement comprises: the false positive rate corresponding to the quality control model is less than or equal to a false positive rate threshold value;
the method for constructing and training the quality control model according to the preset mean model, the target training set and the preset quality control requirement comprises the following steps:
inputting the target training data in the target training set into the exponential weighted floating mean model to obtain a plurality of weighted floating means;
and adjusting the control upper limit value and/or the control lower limit value so that the ratio of first weighted floating mean values that are larger than the control upper limit value or smaller than the control lower limit value to all the weighted floating mean values is smaller than or equal to the false positive rate threshold value.
8. A medical inspection quality control method is characterized by comprising the following steps:
acquiring real-time inspection data of a target medical institution;
judging the quality control state of the real-time inspection data through a preset quality control model, wherein the preset quality control model comprises a quality control model trained in any one of claims 1-7;
and if the quality control state is an out-of-control state, sending early warning information to the target medical institution.
9. A medical examination and quality control system, comprising:
an acquisition module configured to acquire a training data set, the training data set including: sample data collected by a plurality of medical institutions with different quality control levels;
a model training module to:
determining a multidimensional feature space corresponding to the training data set according to a plurality of attribute features corresponding to the training data set;
segmenting the feature space in the dimensionality corresponding to one or more attribute features according to a preset segmentation rule to obtain a plurality of subspaces; determining a predicted value corresponding to each subspace according to the sample data corresponding to each subspace by using a preset regression analysis model, and determining a target training set according to the predicted value and each sample data in the training data set;
and constructing and training a quality control model according to a preset mean value model, the target training set and preset quality control requirements, wherein the quality control model is used for carrying out real-time quality monitoring on the inspection data of the medical institution.
10. The medical examination and quality control system of claim 9, wherein the acquisition module is further configured to acquire real-time examination data of a target medical institution;
the medical examination and quality control system further comprises:
a quality control monitoring module configured to:
judging the quality control state of the real-time inspection data through the quality control model;
and if the quality control state is an out-of-control state, sending early warning information to the target medical institution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211449669.XA CN115910364A (en) | 2022-11-18 | 2022-11-18 | Medical inspection quality control model training method, medical inspection quality control method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115910364A true CN115910364A (en) | 2023-04-04 |
Family
ID=86484339
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211449669.XA Pending CN115910364A (en) | 2022-11-18 | 2022-11-18 | Medical inspection quality control model training method, medical inspection quality control method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115910364A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116844684A (en) * | 2023-05-18 | 2023-10-03 | 首都医科大学附属北京朝阳医院 | Quality control processing method, device, equipment and medium for medical inspection result |
CN116844684B (en) * | 2023-05-18 | 2024-04-02 | 首都医科大学附属北京朝阳医院 | Quality control processing method, device, equipment and medium for medical inspection result |
CN117973566A (en) * | 2024-04-01 | 2024-05-03 | 腾讯科技(深圳)有限公司 | Training data processing method and device and related equipment |
CN117973566B (en) * | 2024-04-01 | 2024-05-31 | 腾讯科技(深圳)有限公司 | Training data processing method and device and related equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115910364A (en) | Medical inspection quality control model training method, medical inspection quality control method and system | |
KR20190043135A (en) | Systems and methods for classifying biological particles | |
CN113053535B (en) | Medical information prediction system and medical information prediction method | |
Pedreira et al. | From big flow cytometry datasets to smart diagnostic strategies: The EuroFlow approach | |
Wang et al. | Change-point detection in multinomial data with a large number of categories | |
CN111243736A (en) | Survival risk assessment method and system | |
CN109920541A (en) | A kind of pathological diagnosis method based on data analysis | |
CN115691722B (en) | Quality control method, device, equipment, medium and program product for medical data detection | |
WO2023186051A1 (en) | Auxiliary diagnosis method and apparatus, and construction apparatus, analysis apparatus and related product | |
CN113392894A (en) | Cluster analysis method and system for multi-group mathematical data | |
CN107545133A (en) | A kind of Gaussian Blur cluster calculation method for antidiastole chronic bronchitis | |
Skitsan et al. | Evaluation of the Informative Features of Cardiac Studies Diagnostic Data using the Kullback Method. | |
CN116564409A (en) | Machine learning-based identification method for sequencing data of transcriptome of metastatic breast cancer | |
CN115620819A (en) | Biomarker reference interval calculation method and device | |
Navya et al. | Classification of blood cells into white blood cells and red blood cells from blood smear images using machine learning techniques | |
CN114707608A (en) | Medical quality control data processing method, apparatus, device, medium, and program product | |
CN112768058B (en) | Method and device for processing medical data of metering information type | |
US20230386665A1 (en) | Method and device for constructing autism spectrum disorder (asd) risk prediction model | |
CN108763864A (en) | A method of evaluation biological pathway sample state | |
Kareem | An evaluation algorithms for classifying leukocytes images | |
Mahdi et al. | A Customized Iomt-Cloud Based Healthcare System For Analyzing of Brain Signals Via Supervised Mining Algorithms | |
Degeest et al. | Feature ranking in changing environments where new features are introduced | |
Chen et al. | Conditional Similarity Triplets Enable Covariate-Informed Representations of Single-Cell Data | |
Vigil et al. | A combined neural network mechanism for categorizing the normal and cancer cells | |
TWI817795B (en) | Cancer progression discriminant method and system thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||