WO2020034801A1 - Medical feature screening method, device, computer device, and storage medium - Google Patents

Info

Publication number
WO2020034801A1
Authority
WO
WIPO (PCT)
Prior art keywords
medical
feature
subset
target
feature subset
Prior art date
Application number
PCT/CN2019/096262
Other languages
English (en)
French (fr)
Inventor
荣絮
冯骞
吴亚博
郑毅
Original Assignee
平安医疗健康管理股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安医疗健康管理股份有限公司
Publication of WO2020034801A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • The present application relates to a method, a device, a computer device, and a storage medium for screening medical features.
  • Machine learning is used to analyze and mine medical big data. Data and feature variables are the main inputs to a machine learning model, so the model depends heavily on data quality and on the choice of feature variables. Because medical data is both varied and voluminous, extracting the input features needed to train a machine learning model usually consumes a large amount of server resources, which lowers the server's operating efficiency.
  • A method, a device, a computer device, and a storage medium for screening medical features are provided.
  • A method for screening medical features includes:
  • obtaining raw medical data and preprocessing it to obtain preprocessed medical data;
  • invoking a preset script and inputting the preprocessed medical data into it, where the preset script performs feature construction according to a target feature type, and obtaining the initial medical features corresponding to the target feature type output by the preset script; and
  • generating a medical feature subset from the initial medical features, calculating the evaluation function value of the medical feature subset, and, when the evaluation function value reaches the stopping criterion, using the medical feature subset as the target medical feature set.
  • A medical feature screening device includes:
  • a preprocessing module, configured to obtain raw medical data and preprocess it to obtain preprocessed medical data;
  • a feature construction module, configured to invoke a preset script and input the preprocessed medical data into it, where the preset script performs feature construction according to a target feature type, and to obtain the initial medical features corresponding to the target feature type output by the preset script; and
  • a feature selection module, configured to generate a medical feature subset from the initial medical features, calculate the evaluation function value of the medical feature subset, and, when the evaluation function value reaches the stopping criterion, use the medical feature subset as the target medical feature set.
  • A computer device includes a memory and one or more processors. The memory stores computer-readable instructions that, when executed by the one or more processors, cause them to perform the following steps: obtaining raw medical data and preprocessing it to obtain preprocessed medical data; invoking a preset script and inputting the preprocessed medical data into it, where the preset script performs feature construction according to a target feature type, and obtaining the initial medical features corresponding to the target feature type output by the preset script; and generating a medical feature subset from the initial medical features, calculating the evaluation function value of the medical feature subset, and, when the evaluation function value reaches the stopping criterion, using the medical feature subset as the target medical feature set.
  • One or more non-volatile storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps: obtaining raw medical data and preprocessing it to obtain preprocessed medical data; invoking a preset script and inputting the preprocessed medical data into it, where the preset script performs feature construction according to a target feature type, and obtaining the initial medical features corresponding to the target feature type output by the preset script; and generating a medical feature subset from the initial medical features, calculating the evaluation function value of the medical feature subset, and, when the evaluation function value reaches the stopping criterion, using the medical feature subset as the target medical feature set.
  • FIG. 1 is an application scenario diagram of a medical feature screening method according to one or more embodiments.
  • FIG. 2 is a schematic flowchart of a medical feature screening method according to one or more embodiments.
  • FIG. 3 is a schematic flowchart of obtaining a medical initial feature according to one or more embodiments.
  • FIG. 4 is a schematic flowchart of obtaining a subset of medical features according to one or more embodiments.
  • FIG. 5 is a schematic flowchart of obtaining a target medical feature subset according to one or more embodiments.
  • FIG. 6 is a schematic flowchart of obtaining a medical feature subset in another embodiment.
  • FIG. 7 is a schematic flowchart of training a model according to a target medical feature set according to one or more embodiments.
  • FIG. 8 is a block diagram of a medical feature screening apparatus according to one or more embodiments.
  • FIG. 9 is a block diagram of a computer device according to one or more embodiments.
  • The medical feature screening method provided in this application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 over a network.
  • The server 104 obtains raw medical data and preprocesses it to obtain preprocessed medical data; performs feature construction on the preprocessed medical data according to the target feature type to obtain the corresponding initial medical features; generates a medical feature subset from the initial medical features and calculates its evaluation function value; and, when the evaluation function value reaches the stopping criterion, uses the medical feature subset as the target medical feature set.
  • The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device.
  • The server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
  • In one embodiment, as shown in FIG. 2, a medical feature screening method is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps:
  • S202: Obtain raw medical data and preprocess it to obtain preprocessed medical data.
  • Raw medical data is the unprocessed data generated while patients are treated in hospital, including insured-person information, diagnostic information, drug information, surgery information, expense settlement information, and doctor and hospital information.
  • The server obtains the raw medical data of a target time period from each hospital; the target time period may be one month, one quarter, or one year.
  • Incomplete, inconsistent, or duplicate data in the raw medical data is then handled: incomplete data is supplemented, inconsistent data is made consistent, and duplicate data is deleted.
  • The cleaned data is then standardized or normalized.
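  • The cleaning and normalization described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout, the field names, the choice of mean imputation, and min-max scaling are all assumptions, and consistency handling is left out because it depends on the specific data.

```python
from statistics import mean


def preprocess(records, numeric_fields):
    """Deduplicate records, fill missing numeric values, then min-max normalize."""
    # Delete exact duplicate records while keeping the original order.
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))
    if not unique:
        return []

    for field in numeric_fields:
        present = [r[field] for r in unique if r.get(field) is not None]
        # Supplement incomplete data with the mean of the observed values
        # (assumed strategy; the source does not specify one).
        fill = mean(present) if present else 0.0
        for r in unique:
            if r.get(field) is None:
                r[field] = fill
        # Normalize to [0, 1]; a constant column maps to 0.0.
        lo = min(r[field] for r in unique)
        hi = max(r[field] for r in unique)
        span = hi - lo
        for r in unique:
            r[field] = (r[field] - lo) / span if span else 0.0
    return unique
```

  • Each record here is a flat dictionary such as `{"patient": "p1", "cost": 100.0}`; both keys are hypothetical.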
  • S204: Invoke a preset script and input the preprocessed medical data into it. The preset script performs feature construction according to the target feature type, and the initial medical features corresponding to the target feature type are obtained from the script's output.
  • Feature construction refers to building new features from the characteristics of the raw medical data, grouped by similar attributes or similar categories.
  • The target feature types include the visit behavior type, the expense type, the medical item type, and the patient information type.
  • The visit behavior type reflects the characteristics of a patient's visit behavior, including the number of visits, the frequency of visits, the concentration of visits, and the self-consistency of visits.
  • The expense type reflects the patient's expense-related information, including the expenditure amount, the distribution of itemized charges, and the rate.
  • The medical item type reflects information related to the three social security catalogs, including surgery information features, drug information features, and examination item features.
  • The patient information type reflects the patient's sociodemographic information and other information not directly related to diagnosis and treatment, including age, gender, and whether the patient is a civil servant.
  • Specifically, the server invokes the preset script file and inputs the preprocessed medical data into it. The preset script computes, from the preprocessed medical data, the statistics corresponding to the visit behavior type, the expense type, the medical item type, and the patient information type. From these statistics it derives the initial medical features corresponding to each of the four types, and the preset script file then outputs the initial medical features obtained for the target feature types.
  • The preprocessed medical data is stored in a Hive database (Hive is a data warehouse tool built on Hadoop). The script is loaded into Hive in advance, and the server sends the preprocessed medical data in the Hive database to the script as an output stream. The script receives the preprocessed medical data as an input stream, performs feature construction according to the target feature type, and then writes the resulting initial medical features back to the Hive database as an output stream.
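  • A stream-oriented script of this kind might look like the sketch below. The source does not say which Hive mechanism is used; the sketch assumes a streaming setup such as Hive's TRANSFORM clause, where rows reach the script as tab-separated lines on standard input and results are read back from standard output. The three-column layout (patient_id, hospital_id, cost) and the two expense features are hypothetical.

```python
import sys
from collections import defaultdict


def build_expense_features(lines):
    """Aggregate tab-separated rows (patient_id, hospital_id, cost) per patient."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        patient_id, _hospital_id, cost = line.split("\t")
        totals[patient_id] += float(cost)
        counts[patient_id] += 1
    # Emit one row per patient: id, total expense, average expense per row.
    for patient_id in sorted(totals):
        total = totals[patient_id]
        yield f"{patient_id}\t{total:.2f}\t{total / counts[patient_id]:.2f}"


def main(stdin=None, stdout=None):
    """Read rows from the input stream and write feature rows to the output stream."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    for row in build_expense_features(stdin):
        stdout.write(row + "\n")
```

  • When deployed as a streaming script, `main()` would be called at the bottom of the file so that it reads from standard input; it is left uncalled here so the functions can be imported and tested.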
  • S206: Generate a medical feature subset from the initial medical features, calculate the evaluation function value of the medical feature subset, and, when the evaluation function value reaches the stopping criterion, use the medical feature subset as the target medical feature set.
  • The evaluation function evaluates the quality of a generated medical feature subset and falls into two categories: filters and wrappers.
  • A filter measures the quality of a medical feature subset by analyzing the subset's internal characteristics.
  • A wrapper uses the medical feature subset to classify a sample set and measures the quality of the subset by the classification accuracy.
  • Common evaluation functions include correlation, distance, information gain, consistency, and classifier error rate.
  • The stopping criterion is a preset threshold on the evaluation function value.
  • Specifically, the server uses a search algorithm to generate medical feature subsets from the initial medical features. The search algorithms include full search, heuristic search, and random search.
  • The evaluation function value of each generated medical feature subset is then calculated. When the evaluation function value reaches the preset threshold, the medical feature subset is used as the target medical feature set, that is, the screened medical feature set.
  • In one embodiment, the evaluation function used in feature selection differs according to the target machine learning model. For example, when the target machine learning model is a classification model, an information gain evaluation function may be used; for a different target model, a correlation evaluation function may be used. This makes the obtained feature set better suited to the target machine learning model.
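  • As an illustration of one filter-style evaluation function, the sketch below computes information gain for a discrete feature against class labels. It uses the standard definition (label entropy minus the expected label entropy after splitting on the feature); the function names are hypothetical, and the source does not state the exact formula it uses.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their expected entropy after splitting on the feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

  • To score a whole feature subset rather than a single feature, one option is to pass, for each sample, the tuple of that sample's values for the features in the subset.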
  • In the above medical feature screening method, raw medical data is obtained and preprocessed to obtain preprocessed medical data; feature construction is performed on the preprocessed medical data according to the target feature type to obtain the corresponding initial medical features; a medical feature subset is generated from the initial medical features and its evaluation function value is calculated; and, when the evaluation function value reaches the stopping criterion, the medical feature subset is used as the target medical feature set.
  • The initial medical features corresponding to the target feature type are obtained by a preset script, a medical feature subset is generated from those initial features, and the subset that meets the stopping criterion is used as the target medical feature set. Because the target medical feature set is derived from the initial medical features, the amount of data needed to obtain the medical features is reduced, which saves server resources and improves the server's efficiency when extracting medical features.
  • In one embodiment, as shown in FIG. 3, step S204, that is, performing feature construction on the preprocessed medical data according to the target feature type to obtain the corresponding initial medical features, includes the following steps:
  • The target feature type is obtained; the target feature types include the visit behavior type, the expense type, the medical item type, and the patient information type.
  • Specifically, the server obtains the preset target feature types, namely the visit behavior type, the expense type, the medical item type, and the patient information type.
  • S304: Calculate the target feature type data from the preprocessed medical data, and obtain the initial medical features from the target feature type data.
  • Specifically, the data for each target feature type is calculated from the preprocessed medical data, and the initial medical features are derived from that data.
  • For example, the visit behavior type data includes the number of visits, the visit frequency, and the location concentration. From the preprocessed medical data, the number and frequency of the patient's visits in the target time period are counted and the location concentration is calculated, and the initial medical features corresponding to these values are obtained.
  • In the above embodiment, the target feature type is obtained, the target feature type data is calculated from the preprocessed medical data, and the initial medical features are obtained from the target feature type data. Constructing the initial medical features from the preprocessed medical data in advance facilitates the subsequent screening of the initial medical features and improves efficiency.
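  • The visit behavior example can be sketched as below. The source names the three quantities but does not define them, so the definitions here are assumptions: frequency is taken as visits per day of the target period, and location concentration as the share of visits made to the most-visited hospital. The input layout is also hypothetical.

```python
from collections import Counter


def visit_behavior_features(visits, period_days):
    """Compute visit features for one patient.

    visits: list of (visit_day, hospital_id) tuples within the target period.
    period_days: length of the target period in days.
    """
    count = len(visits)
    if count == 0:
        return {"visit_count": 0, "visit_frequency": 0.0, "location_concentration": 0.0}
    per_hospital = Counter(hospital for _day, hospital in visits)
    # Share of visits at the single most-visited hospital (assumed definition).
    top_share = per_hospital.most_common(1)[0][1] / count
    return {
        "visit_count": count,
        "visit_frequency": count / period_days,
        "location_concentration": top_share,
    }
```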
  • In one embodiment, as shown in FIG. 4, step S206, that is, generating a medical feature subset from the initial medical features, includes the following steps:
  • S402: Randomly select a first target feature from the initial medical features, obtain a first medical feature subset from the first target feature, and calculate the evaluation function value of the first medical feature subset.
  • Specifically, the medical feature subset is initialized as empty. A feature is randomly selected from the initial medical features as the first target feature and added to the empty subset, giving a subset with a single feature, namely the first medical feature subset. An evaluation function is then used to calculate the evaluation function value of the first medical feature subset.
  • For example, a filter may calculate the distance between samples under the first medical feature subset to obtain the evaluation function value.
  • Alternatively, a wrapper may classify the sample set using the first medical feature subset and take the classification accuracy as the evaluation function value.
  • S404: Randomly select a second target feature from the initial medical features, add the second target feature to the first medical feature subset to obtain a second medical feature subset, and calculate the evaluation function value of the second medical feature subset.
  • Specifically, a feature is randomly selected from the initial medical features other than the first target feature and used as the second target feature. Adding it to the first medical feature subset gives the second medical feature subset, which therefore contains the first target feature and the second target feature.
  • An evaluation function is used to calculate the evaluation function value of the second medical feature subset.
  • S406: Compare the evaluation function value of the first medical feature subset with that of the second medical feature subset, obtain a target medical feature subset according to the comparison result, and use the target medical feature subset as the first medical feature subset.
  • Specifically, the target medical feature subset is chosen according to the magnitudes of the two evaluation function values and is then used as the first medical feature subset.
  • When traversal of the initial medical features is complete, the medical feature subset is obtained.
  • Specifically, the server determines whether all of the initial medical features have been traversed. If not, the target medical feature subset is used as the first medical feature subset and execution returns to step S404: a feature is randomly selected from the initial medical features that are not in the first medical feature subset and added to the first medical feature subset to obtain a second medical feature subset for the next iteration. Once all initial medical features have been traversed, the target medical feature subset obtained is used as the medical feature subset.
  • In one embodiment, the full set of initial medical features may instead be used as the starting medical feature subset. Features are then deleted one at a time: the evaluation function values of the medical feature subset before and after each deletion are calculated and compared to obtain the target medical feature subset, and the iteration repeats. When all features in the medical feature subset have been traversed, the target medical feature subset obtained is used as the medical feature subset.
  • In the above embodiment, a first target feature is randomly selected from the initial medical features, a first medical feature subset is obtained from it, and its evaluation function value is calculated; a second target feature is randomly selected and added to the first medical feature subset to obtain a second medical feature subset, whose evaluation function value is calculated; the two evaluation function values are compared, a target medical feature subset is obtained according to the comparison result, and the target medical feature subset is used as the first medical feature subset; execution then returns to step S404. When traversal of the initial medical features is complete, the medical feature subset is obtained. This procedure improves the efficiency of obtaining the medical feature subset.
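  • Steps S402 to S406 can be sketched as a random forward selection loop. The `evaluate` callable stands for whichever evaluation function is chosen and is hypothetical, as are the function and parameter names. Shuffling the features once and walking through them is one way to realize "randomly select until all features are traversed"; on a tie the larger subset wins, following the comparison rule that the first subset is kept only when its value is strictly greater.

```python
import random


def forward_select(initial_features, evaluate, seed=None):
    """Randomly traverse the features, keeping each one only if it does not hurt."""
    order = list(initial_features)
    if not order:
        raise ValueError("initial_features must not be empty")
    rng = random.Random(seed)
    rng.shuffle(order)  # random traversal order over the initial features

    # S402: the first subset holds one randomly chosen feature.
    best = [order[0]]
    best_value = evaluate(best)

    for feature in order[1:]:
        # S404: add another randomly chosen feature to form the second subset.
        candidate = best + [feature]
        value = evaluate(candidate)
        # S406: keep the first subset only if its value is strictly greater.
        if not best_value > value:
            best, best_value = candidate, value
    return best, best_value
```

  • Each feature is evaluated once, so the loop calls `evaluate` as many times as there are initial features.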
  • In one embodiment, a full search algorithm may be used to generate the medical feature subset from the initial medical features: all medical feature subsets of the initial medical features are enumerated, the evaluation function value of each is calculated, and the subset with the largest evaluation function value is used as the final medical feature subset. This yields a more accurate medical feature subset.
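  • A minimal sketch of the full search, assuming a hypothetical `evaluate` callable that scores a tuple of feature names. It enumerates every non-empty subset, so it evaluates 2^n - 1 subsets for n initial features and is only practical when n is small.

```python
from itertools import combinations


def exhaustive_select(initial_features, evaluate):
    """Return the non-empty subset with the largest evaluation function value."""
    features = list(initial_features)
    best, best_value = None, float("-inf")
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            value = evaluate(subset)
            if value > best_value:
                best, best_value = subset, value
    return best, best_value
```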
  • In one embodiment, a random search algorithm may be used to generate the medical feature subset from the initial medical features. First, multiple medical feature subsets are randomly generated and the evaluation function value of each is calculated. Subsets whose evaluation function value is below a preset threshold are deleted, and the remaining subsets breed the next generation of medical feature subsets through random crossover, random mutation, and similar operations. The calculation is then iterated. When the preset number of generations is reached, the evaluation function values of the medical feature subsets are calculated, and the subset with the largest evaluation function value is used as the final medical feature subset. This improves the accuracy of the medical feature subset obtained.
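  • The random search described above resembles a genetic algorithm, and can be sketched as follows. The population size, single-point crossover, single-bit mutation, and the fallback that keeps the best subset when none passes the threshold are all assumptions made for the illustration; `evaluate` is again a hypothetical scoring callable.

```python
import random


def genetic_select(features, evaluate, threshold, generations=20, population=12, seed=0):
    """Evolve boolean feature masks and return the best subset found."""
    rng = random.Random(seed)
    n = len(features)

    def random_mask():
        mask = [rng.random() < 0.5 for _ in range(n)]
        if not any(mask):
            mask[rng.randrange(n)] = True  # never start from an empty subset
        return mask

    def subset(mask):
        return [f for f, keep in zip(features, mask) if keep]

    def score(mask):
        return evaluate(subset(mask))

    pool = [random_mask() for _ in range(population)]
    for _ in range(generations):
        # Delete subsets below the threshold; keep the best one if none survive.
        survivors = [m for m in pool if score(m) >= threshold] or [max(pool, key=score)]
        children = []
        while len(survivors) + len(children) < population:
            a, b = rng.choice(survivors), rng.choice(survivors)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]          # random single-point crossover
            i = rng.randrange(n)
            child[i] = not child[i]            # random single-bit mutation
            if not any(child):
                child[rng.randrange(n)] = True  # repair an empty subset
            children.append(child)
        pool = survivors + children
    best = max(pool, key=score)
    return subset(best), score(best)
```

  • Because survivors are carried into the next generation unchanged, the best value in the pool never decreases from one generation to the next.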
  • In one embodiment, as shown in FIG. 5, step S406, that is, comparing the evaluation function value of the first medical feature subset with that of the second medical feature subset and obtaining the target medical feature subset according to the comparison result, includes the following steps:
  • When the evaluation function value of the first medical feature subset is greater than that of the second medical feature subset, the first medical feature subset evaluates better and is used as the target medical feature subset.
  • When the evaluation function value of the first medical feature subset is not greater than that of the second medical feature subset, the second medical feature subset evaluates better than the first and is used as the target medical feature subset.
  • In the above embodiment, the target medical feature subset is chosen according to the comparison result: the first medical feature subset when its evaluation function value is greater, and the second medical feature subset otherwise. Selecting whichever subset evaluates better makes the resulting medical feature subset more accurate.
  • In one embodiment, as shown in FIG. 6, generating a medical feature subset from the initial medical features includes the following steps:
  • S604: Train a support vector machine model on the current medical feature set to obtain the weight coefficient of each feature, calculate each feature's score from its weight coefficient, rank the features by score, and identify the feature with the lowest score.
  • Specifically, the initial medical features are taken as the current medical feature set, and medical sample data described by the current medical feature set is obtained. A formula is used to train the support vector machine model, where x_i is the i-th medical sample, y_i is the classification or prediction result corresponding to the i-th medical sample, N is the number of medical samples, and α_i is a Lagrangian multiplier. A second formula gives the value of the weight coefficient ω, and a third gives the ranking criterion score of each feature, from which the feature with the smallest ranking score is found.
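  • The formulas referred to in this passage are not reproduced in the text. For orientation only, the standard SVM recursive feature elimination formulation uses the same symbols (x_i, y_i, N, α_i, and a weight vector ω), so the omitted formulas may correspond to the following. This is an assumption rather than something the source confirms; C denotes the usual soft-margin penalty and c_k the ranking score of the k-th feature, both introduced here for illustration.

```latex
\begin{aligned}
&\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\,\alpha_j\,y_i\,y_j\,(x_i\cdot x_j)\;-\;\sum_{i=1}^{N}\alpha_i
\qquad\text{subject to}\quad \sum_{i=1}^{N}\alpha_i\,y_i=0,\quad 0\le\alpha_i\le C,\\[4pt]
&\omega=\sum_{i=1}^{N}\alpha_i\,y_i\,x_i,\\[4pt]
&c_k=\omega_k^{2}.
\end{aligned}
```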
  • The feature with the lowest score is deleted from the current medical feature set to obtain the current medical feature subset, and it is determined whether the number of features in the current medical feature subset meets the preset feature number. When it does, the current medical feature subset is used as the medical feature subset.
  • Specifically, the current medical feature set is updated by deleting the feature with the lowest score, which gives the current medical feature subset. The number of features in the current medical feature subset is then checked against the preset feature number, and when the preset number is met, the current medical feature subset is used as the medical feature subset.
  • In the above embodiment, the current medical feature set is obtained from the initial medical features; a support vector machine model is trained on it to obtain the weight coefficient of each feature; each feature's score is calculated from its weight coefficient and the features are ranked to find the one with the lowest score; that feature is deleted to obtain the current medical feature subset; and when the number of features in the current medical feature subset meets the preset feature number, the current medical feature subset is used as the medical feature subset. This improves the efficiency of obtaining the medical feature subset.
  • In one embodiment, the method further includes the following steps:
  • When the number of features in the current medical feature subset does not meet the preset feature number, return to the step of training the support vector machine model on the current medical feature set to obtain the weight coefficient of each feature, calculating each feature's score from its weight coefficient, and ranking the features by score to find the feature with the lowest score.
  • When the number of features in the current medical feature subset meets the preset feature number, the current medical feature subset is used as the medical feature subset.
  • Specifically, when the preset feature number is not met, the support vector machine model must be retrained on the current medical feature subset for the next iteration, that is, execution returns to step S604. When the preset feature number is met, the iteration stops and the current medical feature subset is used as the medical feature subset.
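  • The elimination loop can be sketched as follows. To keep the example self-contained, the SVM is replaced by a tiny linear soft-margin classifier trained with sub-gradient descent; this is a stand-in chosen for illustration, not the training formula of the source, and in practice a library SVM would normally be used. Labels are assumed to be +1 or -1, each feature's score is taken as the square of its weight, and all function names and hyperparameters are hypothetical.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Train a small linear soft-margin SVM by sub-gradient descent; return the weights."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            for k in range(dim):
                # Regularization term plus hinge-loss term when the margin is violated.
                grad = reg * w[k] - (y * x[k] if margin < 1 else 0.0)
                w[k] -= lr * grad
            if margin < 1:
                b += lr * y
    return w


def svm_rfe(samples, labels, feature_names, n_keep):
    """Repeatedly drop the lowest-scoring feature until n_keep features remain."""
    current = list(range(len(feature_names)))  # indices in the current feature set
    while len(current) > n_keep:
        reduced = [[row[k] for k in current] for row in samples]
        w = train_linear_svm(reduced, labels)   # retrain on the current feature set
        scores = [wk * wk for wk in w]          # ranking score of each feature
        worst = min(range(len(current)), key=scores.__getitem__)
        del current[worst]                      # delete the lowest-scoring feature
    return [feature_names[k] for k in current]
```

  • The model is retrained after every deletion, which mirrors the return to the training step described above; `n_keep` plays the role of the preset feature number.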
  • In one embodiment, as shown in FIG. 7, after step S206, that is, after generating a medical feature subset from the initial medical features, calculating its evaluation function value, and, when the evaluation function value reaches the stopping criterion, using the medical feature subset as the target medical feature set, the method further includes the following steps:
  • S702: Obtain medical data, and obtain the data corresponding to the target medical feature set from the medical data.
  • Specifically, the medical big data of hospitals across the country is obtained from the National Medical Fund Database, and the data corresponding to the target medical feature set is obtained from it.
  • The data corresponding to the target medical feature set is input into the target machine learning model for training to obtain the trained target machine learning model. Target machine learning models include a medical expense prediction model and a medical data anomaly detection model.
  • For example, when the target machine learning model is a medical expense prediction model, the hospital's medical data for different time periods is obtained, the data corresponding to the target medical feature set is extracted from it, and the medical costs in the different time periods are obtained. The data corresponding to the target medical feature set in one time period is used as the input of the target machine learning model, and the medical cost in the next time period is used as the training label. The target machine learning model is trained with a logistic regression algorithm, and when a preset condition is reached, the trained target machine learning model is obtained.
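  • The pairing of one period's features with the next period's cost can be sketched as below. The dictionary layout and the function name are hypothetical, and the period keys are assumed to sort chronologically (for example ISO-style strings such as "2019-01"). The model fitting itself is left out.

```python
def build_training_pairs(period_features, period_costs):
    """Pair the features of period t with the medical cost of period t + 1.

    period_features: {period_key: feature_vector}
    period_costs: {period_key: medical_cost}
    """
    periods = sorted(period_features)
    inputs, labels = [], []
    for current, following in zip(periods, periods[1:]):
        if following in period_costs:
            inputs.append(period_features[current])   # model input
            labels.append(period_costs[following])    # training label
    return inputs, labels
```

  • The last period contributes no training pair because its following period's cost is not yet known.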
  • The medical data anomaly detection model may be trained with a supervised machine learning algorithm, the random forest algorithm. The data corresponding to the target medical feature set and the anomaly detection results of the medical data are obtained from the medical data; the former is used as the input of the random forest algorithm and the latter as the training labels, and the trained target machine learning model is obtained. Alternatively, an unsupervised machine learning algorithm, the isolation forest algorithm, may be used: the data corresponding to the target medical feature set is obtained directly from the medical data and used to build an isolation forest according to the isolation forest algorithm, which gives the trained medical data anomaly detection model.
  • In the above embodiment, the data corresponding to the target medical feature set is obtained from the medical data and input into the target machine learning model for training, and the trained target machine learning model is obtained. Target machine learning models include a medical expense prediction model and a medical data anomaly detection model. Because the target medical feature set can be used directly to train the target machine learning model, the efficiency of obtaining the machine learning model is improved.
  • It should be understood that although the steps in the flowcharts of FIGS. 2-7 are displayed in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict ordering on the execution of these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-7 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily performed at the same moment but may be performed at different times, and they are not necessarily performed sequentially: they may be performed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
  • In one embodiment, as shown in FIG. 8, a medical feature screening device 800 is provided, including a preprocessing module 802, a feature construction module 804, and a feature selection module 806, where:
  • The preprocessing module 802 is configured to obtain raw medical data and preprocess it to obtain preprocessed medical data.
  • The feature construction module 804 is configured to perform feature construction on the preprocessed medical data according to the target feature type to obtain the initial medical features corresponding to the target feature type.
  • The feature selection module 806 is configured to generate a medical feature subset from the initial medical features, calculate the evaluation function value of the medical feature subset, and, when the evaluation function value reaches the stopping criterion, use the medical feature subset as the target medical feature set.
  • in the medical feature screening device 800, the raw medical data is pre-processed by the pre-processing module 802, the initial medical features are obtained by the feature construction module 804, and the feature selection module 806 generates a medical feature subset and calculates its evaluation function value; when the evaluation function value of the medical feature subset reaches the stopping criterion, the medical feature subset is used as the target medical feature set, and training the machine learning model with the target medical feature set can improve the performance of the machine learning model.
  • the pre-processing module 802 is further configured to obtain a target feature type.
  • the target feature type data is calculated based on the preprocessed medical data, and the medical initial features are obtained based on the target feature type data.
  • the feature selection module 806 is further configured to randomly select a first target feature in the initial medical features, obtain a first subset of medical features according to the first target feature, and calculate an evaluation function value of the first subset of medical features.
  • the second target feature in the initial medical features is randomly selected, the second target feature is added to the first medical feature subset, the second medical feature subset is obtained, and the evaluation function value of the second medical feature subset is calculated.
  • the evaluation function value of the first medical feature subset is compared with the evaluation function value of the second medical feature subset. According to the comparison result, the target medical feature subset is obtained, and the target medical feature subset is used as the first medical feature subset.
  • the feature selection module 806 is further configured to use the first medical feature subset as the target medical feature subset when the evaluation function value of the first medical feature subset is greater than the evaluation function value of the second medical feature subset. When the evaluation function value of the first medical feature subset is not greater than the evaluation function value of the second medical feature subset, the second medical feature subset is taken as the target medical feature subset.
  • the feature selection module 806 is further configured to obtain the current medical feature set according to the initial medical features.
  • the support vector machine model is trained according to the current medical feature set to obtain the weight coefficient of each feature, and the score of the corresponding feature is calculated according to the weight coefficient, and the features are ranked according to the score to obtain the feature with the lowest score.
  • the feature with the lowest score is deleted from the current medical feature set to obtain the current medical feature subset. It is determined whether the number of features in the current medical feature subset meets the preset number of features. When it does, the current medical feature subset is used as the medical feature subset.
  • the feature selection module 806 is further configured to: when the number of features in the current medical feature subset does not meet the preset number of features, return to the step of training a support vector machine model according to the current medical feature set to obtain the weight coefficient of each feature, calculating the score of the corresponding feature according to the weight coefficient, ranking the features according to the score, and obtaining the feature with the lowest score; when the number of features in the current medical feature subset meets the preset number of features, the current medical feature subset is used as the medical feature subset.
  • the medical feature screening device 800 further includes:
  • a feature set data obtaining module for obtaining medical data, and obtaining data corresponding to a target medical feature set according to the medical data
  • a training module is configured to input data corresponding to a target medical feature set into a target machine learning model for training to obtain a trained target machine learning model.
  • the target machine learning model includes a medical expense prediction model and a medical data abnormality detection model.
  • Each module in the above-mentioned medical feature screening device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in, or independent of, the processor of the computer device in hardware form, or may be stored in the memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 9.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium.
  • the database of the computer device is used to store medical big data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by a processor to implement a medical feature screening method.
  • the structure shown in FIG. 9 is only a block diagram of the part of the structure related to the scheme of the present application, and does not constitute a limitation on the computer device to which the scheme of the present application is applied.
  • a specific computer device may include more or fewer parts than shown in the figure, or combine certain parts, or have a different arrangement of parts.
  • a computer device includes a memory and one or more processors.
  • Computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the one or more processors, the one or more processors implement the steps of the medical feature screening method provided in any embodiment of the present application.
  • One or more non-volatile computer-readable storage media store computer-readable instructions, and when the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the medical feature screening method provided in any embodiment of the present application.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

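A medical feature screening method and device, a computer device, and a storage medium. The method includes: obtaining raw medical data and pre-processing the raw medical data to obtain pre-processed medical data; invoking a preset script and inputting the pre-processed medical data into the preset script, the preset script being used to perform feature construction according to a target feature type, to obtain the initial medical features that correspond to the target feature type and are output by the preset script; and generating a medical feature subset from the initial medical features, calculating an evaluation function value of the medical feature subset, and, when the evaluation function value of the medical feature subset reaches a stopping criterion, using the medical feature subset as the target medical feature set.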

Description

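Medical feature screening method, device, computer device, and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 201810925041X, entitled "Medical feature screening method, device, computer device, and storage medium", filed with the China Patent Office on August 14, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to a medical feature screening method and device, a computer device, and a storage medium.
Background
At present, in the medical field, machine learning is used to analyze, mine, and gain insight from medical big data. Because machine learning algorithms are at the core and data and feature variables are the main inputs to the algorithm model, the results depend heavily on data quality and on the feature variables. Because medical data is of many types and large in volume, obtaining the input features needed to train a machine learning model usually consumes a large amount of server running resources to extract features from the medical data, which reduces the running efficiency of the server.
Summary
According to various embodiments disclosed in the present application, a medical feature screening method and device, a computer device, and a storage medium are provided.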
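A medical feature screening method includes:
obtaining raw medical data, and pre-processing the raw medical data to obtain pre-processed medical data;
invoking a preset script, and inputting the pre-processed medical data into the preset script, the preset script being used to perform feature construction according to a target feature type, to obtain the initial medical features that correspond to the target feature type and are output by the preset script; and
generating a medical feature subset from the initial medical features, calculating an evaluation function value of the medical feature subset, and, when the evaluation function value of the medical feature subset reaches a stopping criterion, using the medical feature subset as the target medical feature set.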
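A medical feature screening device includes:
a pre-processing module, configured to obtain raw medical data and pre-process the raw medical data to obtain pre-processed medical data;
a feature construction module, configured to invoke a preset script and input the pre-processed medical data into the preset script, the preset script being used to perform feature construction according to a target feature type, to obtain the initial medical features that correspond to the target feature type and are output by the preset script; and
a feature selection module, configured to generate a medical feature subset from the initial medical features, calculate an evaluation function value of the medical feature subset, and, when the evaluation function value of the medical feature subset reaches a stopping criterion, use the medical feature subset as the target medical feature set.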
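A computer device includes a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the processors, cause the one or more processors to perform the following steps: obtaining raw medical data, and pre-processing the raw medical data to obtain pre-processed medical data; invoking a preset script, and inputting the pre-processed medical data into the preset script, the preset script being used to perform feature construction according to a target feature type, to obtain the initial medical features that correspond to the target feature type and are output by the preset script; and generating a medical feature subset from the initial medical features, calculating an evaluation function value of the medical feature subset, and, when the evaluation function value of the medical feature subset reaches a stopping criterion, using the medical feature subset as the target medical feature set.
One or more non-volatile storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps: obtaining raw medical data, and pre-processing the raw medical data to obtain pre-processed medical data; invoking a preset script, and inputting the pre-processed medical data into the preset script, the preset script being used to perform feature construction according to a target feature type, to obtain the initial medical features that correspond to the target feature type and are output by the preset script; and generating a medical feature subset from the initial medical features, calculating an evaluation function value of the medical feature subset, and, when the evaluation function value of the medical feature subset reaches a stopping criterion, using the medical feature subset as the target medical feature set.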
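Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.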
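FIG. 1 is a diagram of an application scenario of the medical feature screening method according to one or more embodiments.
FIG. 2 is a schematic flowchart of the medical feature screening method according to one or more embodiments.
FIG. 3 is a schematic flowchart of obtaining initial medical features according to one or more embodiments.
FIG. 4 is a schematic flowchart of obtaining a medical feature subset according to one or more embodiments.
FIG. 5 is a schematic flowchart of obtaining a target medical feature subset according to one or more embodiments.
FIG. 6 is a schematic flowchart of obtaining a medical feature subset in another embodiment.
FIG. 7 is a schematic flowchart of training a model with the target medical feature set according to one or more embodiments.
FIG. 8 is a block diagram of a medical feature screening device according to one or more embodiments.
FIG. 9 is a block diagram of a computer device according to one or more embodiments.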
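Detailed Description
To make the technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application and not to limit it.
The medical feature screening method provided in the present application can be applied in the application environment shown in FIG. 1. The terminal 102 communicates with the server 104 over a network. The server 104 obtains raw medical data and pre-processes it to obtain pre-processed medical data; performs feature construction on the pre-processed medical data according to the target feature types to obtain the initial medical features corresponding to the target feature types; and generates a medical feature subset from the initial medical features, calculates the evaluation function value of the medical feature subset, and, when that value reaches the stopping criterion, uses the medical feature subset as the target medical feature set. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, and the server 104 may be implemented as a standalone server or as a server cluster composed of multiple servers.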
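In one embodiment, as shown in FIG. 2, a medical feature screening method is provided. Taking the application of the method to the server in FIG. 1 as an example, the method includes the following steps:
S202: obtain raw medical data, and pre-process the raw medical data to obtain pre-processed medical data.
Raw medical data refers to the unprocessed data generated while a patient is treated for a disease in a hospital, including insured-person information, diagnosis information, drug information, surgery information, expense settlement information, and doctor and hospital information.
Specifically, the server obtains from each hospital the raw medical data of a target time period, which may be one month, one quarter, one year, and so on. Incomplete, inconsistent, or duplicate data in the raw medical data is then processed: incomplete data is supplemented, inconsistent data is made consistent, and duplicate data is deleted. The processed raw medical data is then standardized or normalized.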
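The pre-processing in S202 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the record layout (a list of dicts), the function name `preprocess`, the use of the column mean to supplement missing values, and min-max scaling as the normalization are all assumptions. Consistency handling is omitted because the source does not say how inconsistent data is reconciled.

```python
def preprocess(records, numeric_fields):
    """Deduplicate records, fill missing numeric values, then min-max normalize.

    `records` is a list of dicts; `numeric_fields` names the numeric columns.
    Both are hypothetical layouts chosen for this sketch.
    """
    # 1. Delete duplicate records, keeping the first occurrence.
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))
    if not unique:
        return []

    for field in numeric_fields:
        present = [r[field] for r in unique if r.get(field) is not None]
        # 2. Supplement incomplete data (column mean is an assumed strategy).
        mean = sum(present) / len(present) if present else 0.0
        for r in unique:
            if r.get(field) is None:
                r[field] = mean
        # 3. Min-max normalization to [0, 1]; a constant column maps to 0.0.
        lo = min(r[field] for r in unique)
        hi = max(r[field] for r in unique)
        span = hi - lo
        for r in unique:
            r[field] = (r[field] - lo) / span if span else 0.0
    return unique
```

Standardization (zero mean, unit variance) could replace step 3 where the downstream model prefers it; the source allows either.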
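S204: invoke a preset script, and input the pre-processed medical data into the preset script, where the preset script performs feature construction according to the target feature types, to obtain the initial medical features that correspond to the target feature types and are output by the preset script.
Feature construction refers to building new features, on the basis of the features of the raw medical data, from similar attributes or similar categories. The target feature types include a visit behavior type, an expense type, a medical item type, and a patient information type. The visit behavior type reflects features of a patient's visit behavior, including the number of visits, visit frequency, concentration of visit locations, and self-consistency of visit behavior. The expense type reflects a patient's expense-related information, including expense amount, itemized distribution, and rate. The medical item type reflects features related to the three social insurance catalogs, including surgery information features, drug information features, and examination item features. The patient information type reflects a patient's sociodemographic information and other information not directly related to diagnosis and treatment, including age, sex, and whether the patient is a civil servant.
Specifically, the server invokes a preset script file and inputs the pre-processed medical data into it. The preset script collects statistics on the data in the pre-processed medical data that corresponds to the visit behavior type, the expense type, the medical item type, the patient information type, and so on, and from these statistics obtains the initial medical features corresponding to each of these types. The preset script file then outputs the initial medical features corresponding to the target feature types. For example, the pre-processed medical data is stored in a Hive database (Hive is a data warehouse tool based on Hadoop), and the script is loaded into Hive in advance. The server hands the pre-processed medical data in the Hive database to the script as an output stream; the script receives it as an input stream, performs feature construction according to the target feature types, and then writes the resulting initial medical features back to the Hive database as an output stream.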
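A streaming script of the kind described above might look like the sketch below. It assumes that rows arrive as tab-separated text lines, which is how Hive's `TRANSFORM ... USING` clause passes rows to an external script; the three-column layout (`patient_id`, `hospital_id`, `cost`) and the function name `build_initial_features` are hypothetical.

```python
from collections import defaultdict


def build_initial_features(lines):
    """Aggregate tab-separated rows into per-patient features.

    Input rows:  patient_id <TAB> hospital_id <TAB> cost   (assumed layout)
    Output rows: patient_id <TAB> visit_count <TAB> total_cost
    """
    visits = defaultdict(int)
    total_cost = defaultdict(float)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines in the stream
        patient_id, _hospital_id, cost = line.split("\t")
        visits[patient_id] += 1
        total_cost[patient_id] += float(cost)
    for pid in sorted(visits):
        yield f"{pid}\t{visits[pid]}\t{total_cost[pid]:.2f}"


# In a streaming job the script would read standard input and print each
# output row, e.g.:
#     for row in build_initial_features(sys.stdin):
#         print(row)
```

Keeping the aggregation in a generator over lines makes the same function usable both inside the streaming job and in a unit test.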
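S206: generate a medical feature subset from the initial medical features, calculate the evaluation function value of the medical feature subset, and, when the evaluation function value of the medical feature subset reaches the stopping criterion, use the medical feature subset as the target medical feature set.
The evaluation function is used to judge how good a medical feature subset is, and falls into two broad categories: filters and wrappers. A filter measures the quality of a medical feature subset by analyzing the intrinsic characteristics of the subset. A wrapper uses the medical feature subset to classify a sample set and measures the quality of the subset by the classification accuracy. Common evaluation functions include correlation, distance, information gain, consistency, and classifier error rate. The stopping criterion is a preset threshold on the evaluation function value.
Specifically, the server uses a search algorithm to generate medical feature subsets from the initial medical features; search algorithms include complete search, heuristic search, and random search. The server uses the evaluation function to calculate the evaluation function value of each generated subset, and when the value reaches the preset threshold, the subset is used as the target medical feature set, which is the screened medical feature set. The evaluation function used for feature selection differs according to the target machine learning model to be trained. In some embodiments, when the target machine learning model is a classification model, an information gain evaluation function may be used. In some embodiments, when the target machine learning model is a prediction model, a correlation evaluation function may be used, so that the resulting feature set better suits the target machine learning model.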
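Two of the filter-style evaluation functions named above, information gain and correlation, can be sketched in a few lines. The source does not define either function precisely, so the standard textbook definitions are assumed here (entropy-based information gain for a discrete feature, and the Pearson coefficient for correlation); `meets_stopping_criterion` is a hypothetical helper that expresses the threshold test.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting `labels` on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def meets_stopping_criterion(value, threshold):
    """The stopping criterion is a preset threshold on the evaluation value."""
    return value >= threshold
```

A feature that splits the labels perfectly has an information gain equal to the label entropy, while a feature independent of the labels has a gain of zero.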
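In the above medical feature screening method, raw medical data is obtained and pre-processed to obtain pre-processed medical data; feature construction is performed on the pre-processed medical data according to the target feature types to obtain the corresponding initial medical features; and a medical feature subset is generated from the initial medical features, its evaluation function value is calculated, and, when that value reaches the stopping criterion, the subset is used as the target medical feature set. The initial medical features are obtained with the preset script, and the medical feature subset that meets the stopping criterion becomes the target medical feature set. Because the target medical feature set is derived from the initial medical features, less data is used when obtaining medical features, which saves server running resources and improves the running efficiency of the server when extracting medical features.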
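In one embodiment, as shown in FIG. 3, step S204, that is, performing feature construction on the pre-processed medical data according to the target feature types to obtain the initial medical features corresponding to the target feature types, includes the steps:
S302: obtain the target feature types.
The target feature types include the visit behavior type, the expense type, the medical item type, and the patient information type.
Specifically, the server obtains the preset target feature types, which include the visit behavior type, the expense type, the medical item type, and the patient information type.
S304: calculate target feature type data from the pre-processed medical data, and obtain the initial medical features from the target feature type data.
Specifically, target feature type data is calculated from the pre-processed medical data, and the initial medical features are obtained from the target feature type data. For example, the visit behavior type data includes the number of visits, visit frequency, and location concentration; the number and frequency of a patient's visits in the target time period are counted from the pre-processed medical data, and the location concentration is calculated. The initial medical features corresponding to the data are obtained from these statistics.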
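The visit-behavior example can be made concrete with a short sketch. The source does not define "location concentration"; the sum of squared visit shares per hospital (a Herfindahl-style index, 1.0 when every visit is at one hospital) is an assumption used only for illustration, as are the function and key names.

```python
from collections import Counter


def visit_behavior_features(hospital_ids, period_days):
    """Compute visit count, frequency, and location concentration.

    `hospital_ids` lists the hospital of each visit by one patient within a
    target period of `period_days` days (hypothetical input layout).
    """
    n = len(hospital_ids)
    if n == 0:
        return {"visit_count": 0, "visit_frequency": 0.0,
                "location_concentration": 0.0}
    counts = Counter(hospital_ids)
    # Assumed definition: sum of squared shares of visits per hospital.
    concentration = sum((c / n) ** 2 for c in counts.values())
    return {
        "visit_count": n,
        "visit_frequency": n / period_days,  # visits per day
        "location_concentration": concentration,
    }
```

For a patient with three visits to one hospital and one visit to another, the shares are 0.75 and 0.25, so the concentration is 0.5625 + 0.0625 = 0.625.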
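In the above embodiment, the target feature types are obtained, target feature type data is calculated from the pre-processed medical data, and the initial medical features are obtained from the target feature type data. Obtaining the initial medical features according to preset feature types means that the raw medical data can be feature-pre-processed in advance, which makes the subsequent screening of the initial medical features easier and improves efficiency.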
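In one embodiment, as shown in FIG. 4, step S206, that is, generating a medical feature subset from the initial medical features, includes the steps:
S402: randomly select a first target feature from the initial medical features, obtain a first medical feature subset from the first target feature, and calculate the evaluation function value of the first medical feature subset.
Specifically, the medical feature subset is initialized as empty, and one feature is randomly selected from the initial medical features as the first target feature and added to the initialized subset, giving a subset with a single feature, namely the first medical feature subset. The evaluation function is used to calculate the evaluation function value of the first medical feature subset. In one embodiment, a filter is used to calculate the inter-sample distance of the first medical feature subset as the evaluation function value. In another embodiment, a wrapper is used to classify the sample set according to the first medical feature subset, and the classification accuracy is used as the evaluation function value.
S404: randomly select a second target feature from the initial medical features, add the second target feature to the first medical feature subset to obtain a second medical feature subset, and calculate the evaluation function value of the second medical feature subset.
Specifically, one feature is randomly selected from the initial medical features other than the first target feature and used as the second target feature, and the second target feature is added to the first medical feature subset to obtain the second medical feature subset, which therefore includes the first target feature and the second target feature. The evaluation function is used to calculate the evaluation function value of the second medical feature subset.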
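S406: compare the evaluation function value of the first medical feature subset with the evaluation function value of the second medical feature subset, obtain the target medical feature subset according to the comparison result, and use the target medical feature subset as the first medical feature subset.
Specifically, the evaluation function values of the first and second medical feature subsets are calculated with the same evaluation function and compared; the target medical feature subset is obtained according to which value is larger, and the target medical feature subset is used as the first medical feature subset.
S408: return to the step of randomly selecting a second target feature from the initial medical features, adding the second target feature to the first medical feature subset to obtain a second medical feature subset, and calculating the evaluation function value of the second medical feature subset; when traversal of the initial medical features is complete, the medical feature subset is obtained.
Specifically, when the target medical feature subset is obtained, the server determines whether the initial medical features have all been traversed. If not, the target medical feature subset is used as the first medical feature subset and execution returns to step S404: a feature is randomly selected from the initial medical features excluding those already in the first medical feature subset and is put into the first medical feature subset to obtain the second feature subset, and the calculation is iterated. When all of the initial medical features have been traversed, the resulting target medical feature subset is used as the medical feature subset.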
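In some embodiments, the initial medical features may be used as the medical feature subset, and one feature is randomly deleted from the medical feature subset at a time. The evaluation function value of the subset before deletion and that of the subset after deletion are calculated and compared to obtain the target medical feature subset, and the iteration is repeated. When all of the features in the medical feature subset have been traversed, the resulting target medical feature subset is used as the medical feature subset.
In the above embodiment, a first target feature is randomly selected from the initial medical features, a first medical feature subset is obtained from it, and its evaluation function value is calculated; a second target feature is randomly selected from the initial medical features and added to the first medical feature subset to obtain a second medical feature subset, whose evaluation function value is calculated; the two evaluation function values are compared, the target medical feature subset is obtained according to the comparison result, and the target medical feature subset is used as the first medical feature subset; execution then returns to step S404, and when traversal of the initial medical features is complete the medical feature subset is obtained. This can improve the efficiency of obtaining the medical feature subset.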
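Steps S402 to S408, together with the comparison rule of S502 and S504, can be sketched as a random forward search. This is a minimal sketch under stated assumptions: the function name, the `evaluate` callback (any evaluation function that maps a list of feature names to a number), and the `seed` parameter are hypothetical, and shuffling once and then popping is used as an equivalent of repeated random selection without replacement.

```python
import random


def random_forward_selection(initial_features, evaluate, seed=None):
    """Grow a feature subset by trying each initial feature once, in random order."""
    rng = random.Random(seed)
    remaining = list(initial_features)
    rng.shuffle(remaining)                 # random traversal order

    first = [remaining.pop()]              # S402: first target feature
    first_score = evaluate(first)

    while remaining:                       # S408: loop until all are traversed
        candidate = remaining.pop()        # S404: second target feature
        second = first + [candidate]
        second_score = evaluate(second)
        # S406 with S502/S504: keep the first subset only if it is strictly
        # better; otherwise the second subset becomes the new first subset.
        if not first_score > second_score:
            first, first_score = second, second_score
    return first, first_score
```

Because the first randomly chosen feature is never removed, the result depends on the traversal order; the source accepts this in exchange for evaluating only one candidate subset per feature.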
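In some embodiments, a complete search algorithm may be used to generate the medical feature subset from the initial medical features. All medical feature subsets are enumerated from the initial medical features, the evaluation function value of each subset is calculated, and the subset with the largest evaluation function value is used as the final medical feature subset, which can give a more accurate medical feature subset.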
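The complete search can be sketched with the standard library's `itertools.combinations`. The function name and the `evaluate` callback are hypothetical. Note that n initial features give 2^n - 1 non-empty subsets, so this approach is only practical for a small number of features.

```python
from itertools import combinations


def exhaustive_search(initial_features, evaluate):
    """Enumerate every non-empty subset and return the best-scoring one."""
    best_subset, best_score = None, float("-inf")
    for size in range(1, len(initial_features) + 1):
        for subset in combinations(initial_features, size):
            score = evaluate(subset)
            if score > best_score:  # ties keep the earlier (smaller) subset
                best_subset, best_score = list(subset), score
    return best_subset, best_score
```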
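In some embodiments, a random search algorithm may be used to generate the medical feature subset from the initial medical features. Multiple medical feature subsets are first generated at random and the evaluation function value of each is calculated. Subsets whose evaluation function value is below a preset threshold are deleted, and the remaining subsets breed the next generation of medical feature subsets through methods such as random crossover and random mutation, after which the calculation is iterated. When the preset number of generations is reached, the evaluation function values of the medical feature subsets are calculated and the subset with the largest value is used as the final medical feature subset, which can improve the accuracy of the resulting medical feature subset.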
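The random search with crossover and mutation can be sketched as below. The source gives no parameter values or operator details, so the population size, mutation rate, single-point crossover, bit-mask encoding, and the fallback that keeps the two best subsets when fewer than two pass the threshold are all assumptions made so that the sketch runs. It needs at least two initial features.

```python
import random


def random_search_with_breeding(initial_features, evaluate, threshold,
                                generations=20, population_size=12,
                                mutation_rate=0.1, seed=0):
    """Evolve feature subsets encoded as boolean masks over the initial features."""
    rng = random.Random(seed)
    n = len(initial_features)

    def subset(mask):
        return [f for f, keep in zip(initial_features, mask) if keep]

    def score(mask):
        return evaluate(subset(mask))

    def repair(mask):
        if not any(mask):                  # never allow an empty subset
            mask[rng.randrange(n)] = True
        return mask

    population = [repair([rng.random() < 0.5 for _ in range(n)])
                  for _ in range(population_size)]
    for _ in range(generations):
        # Delete subsets whose evaluation value is below the preset threshold.
        survivors = [m for m in population if score(m) >= threshold]
        if len(survivors) < 2:             # assumed fallback so breeding can go on
            survivors = sorted(population, key=score, reverse=True)[:2]
        children = []
        while len(survivors) + len(children) < population_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)      # random single-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]       # random mutation
            children.append(repair(child))
        population = survivors + children
    best = max(population, key=score)
    return subset(best), score(best)
```

Survivors are carried into the next generation unchanged, so the best evaluation value in the population never decreases from one generation to the next.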
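In one embodiment, as shown in FIG. 5, step S406, that is, comparing the evaluation function value of the first medical feature subset with the evaluation function value of the second medical feature subset and obtaining the target medical feature subset according to the comparison result, includes the steps:
S502: when the evaluation function value of the first medical feature subset is greater than the evaluation function value of the second medical feature subset, use the first medical feature subset as the target medical feature subset.
Specifically, when the evaluation function value of the first medical feature subset is greater than that of the second medical feature subset, the first medical feature subset is evaluated as better than the second, so the better-evaluated first medical feature subset is used as the target medical feature subset.
S504: when the evaluation function value of the first medical feature subset is not greater than the evaluation function value of the second medical feature subset, use the second medical feature subset as the target medical feature subset.
Specifically, when the evaluation function value of the first medical feature subset is not greater than that of the second medical feature subset, the second medical feature subset is evaluated as better than the first, and the second medical feature subset is used as the target medical feature subset.
In this embodiment, the first medical feature subset is used as the target medical feature subset when its evaluation function value is greater than that of the second medical feature subset, and the second medical feature subset is used as the target medical feature subset otherwise. Obtaining different target medical feature subsets according to the comparison result yields the better-evaluated target medical feature subset, so that the medical feature subset finally obtained is more accurate.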
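In one embodiment, as shown in FIG. 6, generating a medical feature subset from the initial medical features includes the steps:
S602: obtain the current medical feature set from the initial medical features.
S604: train a support vector machine model on the current medical feature set to obtain the weight coefficient of each feature, calculate the score of the corresponding feature from the weight coefficient, and rank the features by score to obtain the feature with the lowest score.
A support vector machine is a supervised machine learning algorithm based on statistical learning theory, and is a binary classification model. Its basic model is defined as the linear classifier with the largest margin in the feature space. The linear function g(x) = w^T x + b and f(x) = sgn(g(x)) are used as the classifier.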
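Specifically, the initial medical features are initialized to obtain the current medical feature set, medical sample data carrying the current medical feature set is obtained, and the formula
Figure PCTCN2019096262-appb-000001
is used to train the support vector machine model, where x_i is the i-th medical sample, y_i is the classification or prediction result corresponding to the i-th medical sample, N is the number of medical samples, and α_i is the Lagrange multiplier. The formula
Figure PCTCN2019096262-appb-000002
is used to calculate the value of the weight coefficient ω. The formula
Figure PCTCN2019096262-appb-000003
is used to calculate the ranking criterion score of each feature, and the feature with the lowest ranking score is found.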
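The three formula images are not reproduced in this text. As an assumption only, in the standard SVM-RFE formulation that the surrounding definitions (x_i, y_i, N, α_i, ω) appear to follow, the dual training problem, the weight vector, and the ranking criterion would read as follows; the penalty parameter C and the score symbol c_k do not appear in the source and are introduced here for illustration.

```latex
% Assumed standard SVM-RFE forms; not recovered from the original figures.
\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}
    \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j) \;-\; \sum_{i=1}^{N}\alpha_i
\qquad \text{s.t.}\quad \sum_{i=1}^{N}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C

\omega = \sum_{i=1}^{N} \alpha_i\, y_i\, x_i

c_k = \omega_k^{2}
```

Under this reading, the feature with the smallest c_k contributes least to the decision function and is the one deleted in step S606.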
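S606: delete the feature with the lowest score from the current medical feature set to obtain the current medical feature subset, and determine whether the number of features in the current medical feature subset meets the preset number of features; when it does, use the current medical feature subset as the medical feature subset.
Specifically, the current medical feature set is updated, that is, the feature with the lowest score is deleted from the current medical feature set to obtain the current medical feature subset, and it is determined whether the number of features in the current medical feature subset meets the preset number of features. When it does, the current medical feature subset is used as the medical feature subset.
In the above embodiment, the current medical feature set is obtained from the initial medical features; a support vector machine model is trained on the current medical feature set to obtain the weight coefficient of each feature, the score of the corresponding feature is calculated from the weight coefficient, and the features are ranked by score to obtain the feature with the lowest score; the feature with the lowest score is deleted from the current medical feature set to obtain the current medical feature subset, and when the number of features in that subset meets the preset number of features, the current medical feature subset is used as the medical feature subset. This can improve the efficiency of obtaining the medical feature subset.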
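In one embodiment, after determining whether the number of features in the current medical feature subset meets the preset number of features, the method further includes the step:
when the number of features in the current medical feature subset does not meet the preset number of features, return to the step of training a support vector machine model on the current medical feature set to obtain the weight coefficient of each feature, calculating the score of the corresponding feature from the weight coefficient, and ranking the features by score to obtain the feature with the lowest score; when the number of features in the current medical feature subset meets the preset number of features, use the current medical feature subset as the medical feature subset.
Specifically, when the number of features in the current medical feature subset does not meet the preset number of features, the support vector machine model needs to be retrained with the current medical feature subset for the next iteration, that is, execution returns to step S604. When the number of features in the current medical feature subset meets the preset number of features, the iteration stops and the current medical feature subset is used as the medical feature subset.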
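The elimination loop of S602 to S606 can be sketched in pure Python. Several things here are assumptions rather than the source's method: the linear SVM is trained by full-batch subgradient descent on the regularized hinge loss instead of the dual problem, the score of a feature is taken to be its squared weight, "meets the preset number of features" is read as "has been reduced to that number", and all function names and hyperparameters are hypothetical.

```python
def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Fit a linear SVM (labels in {-1, +1}) by subgradient descent on hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wk for wk in w], 0.0   # gradient of the L2 regularizer
        for xi, yi in zip(X, y):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1:                     # sample violates the margin
                for k in range(d):
                    gw[k] -= yi * xi[k] / n
                gb -= yi / n
        w = [wk - lr * gk for wk, gk in zip(w, gw)]
        b -= lr * gb
    return w, b


def svm_rfe(X, y, feature_names, preset_count):
    """Repeatedly drop the feature with the smallest squared SVM weight."""
    current = list(range(len(feature_names)))         # S602: current feature set
    while len(current) > preset_count:                 # S606: count check
        Xc = [[row[k] for k in current] for row in X]
        w, _ = train_linear_svm(Xc, y)                 # S604: train, get weights
        scores = [wk ** 2 for wk in w]                 # assumed ranking criterion
        worst = min(range(len(current)), key=lambda i: scores[i])
        del current[worst]                             # S606: delete lowest score
    return [feature_names[k] for k in current]
```

Retraining after every deletion is what distinguishes recursive elimination from a single ranking pass: the weights of the remaining features can change once a feature is removed.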
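In one embodiment, as shown in FIG. 7, after step S206, that is, after generating a medical feature subset from the initial medical features, calculating the evaluation function value of the medical feature subset, and, when that value reaches the stopping criterion, using the medical feature subset as the target medical feature set, the method further includes the steps:
S702: obtain medical data, and obtain the data corresponding to the target medical feature set from the medical data.
Specifically, the medical big data of hospitals across the country is obtained from the national medical fund database, and the data corresponding to the target medical feature set is obtained from that medical big data.
S704: input the data corresponding to the target medical feature set into the target machine learning model for training to obtain the trained target machine learning model, where the target machine learning model includes a medical expense prediction model and a medical data anomaly detection model.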
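Specifically, when the target machine learning model is the medical expense prediction model, the medical data of a hospital in different time periods is obtained, the data corresponding to the target medical feature set is obtained from the medical data, and the medical expenses in the different time periods are obtained from the medical data. The data corresponding to the target medical feature set in one time period is used as the input of the target machine learning model, and the medical expenses in the next time period are used as the label for training, where the target machine learning model is trained with a logistic regression algorithm. When a preset condition is reached, the trained target machine learning model is obtained.
When the target machine learning model is the medical data anomaly detection model, the model may be trained with the supervised random forest algorithm: the data corresponding to the target medical feature set is obtained from the medical data together with the anomaly detection results of the medical data, the data corresponding to the target medical feature set is used as the input of the random forest algorithm, and the corresponding anomaly detection results are used as labels for training; when a preset condition is reached, the trained target machine learning model is obtained. The unsupervised isolation forest algorithm may also be used: the data corresponding to the target medical feature set is obtained directly from the medical data and an isolation forest is built from it according to the isolation forest algorithm, giving the trained medical data anomaly detection model.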
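For the expense prediction model, the pairing of one period's features with the next period's expenses can be sketched as below. Only the construction of the training pairs is shown; the function name and the dict layout, with period keys that sort chronologically (for example "2018Q1"), are assumptions, and the model fitting itself is left to whichever learning library is used.

```python
def build_cost_training_pairs(period_features, period_costs):
    """Pair the features of period t with the medical expenses of period t+1.

    `period_features` maps a period key to a feature vector and
    `period_costs` maps a period key to the expenses of that period.
    """
    periods = sorted(period_features)
    samples, labels = [], []
    for current, following in zip(periods, periods[1:]):
        if following in period_costs:       # skip periods without a label
            samples.append(period_features[current])
            labels.append(period_costs[following])
    return samples, labels
```

The last period contributes no training sample because its following period's expenses are not yet known; it is the natural input for a prediction once the model is trained.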
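In the above embodiment, medical data is obtained, and the data corresponding to the target medical feature set is obtained from the medical data; the data corresponding to the target medical feature set is input into the target machine learning model for training to obtain the trained target machine learning model, where the target machine learning model includes a medical expense prediction model and a medical data anomaly detection model. The target medical feature set can be used directly to train the target machine learning model, which can improve the efficiency of obtaining the machine learning model.
It should be understood that although the steps in the flowcharts of FIGS. 2-7 are displayed sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-7 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be performed at different times, and their execution order is not necessarily sequential: they may be performed in turn or alternately with at least part of other steps or of the sub-steps or stages of other steps.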
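In one embodiment, as shown in FIG. 8, a medical feature screening device 800 is provided, including a pre-processing module 802, a feature construction module 804, and a feature selection module 806, where:
the pre-processing module 802 is configured to obtain raw medical data and pre-process the raw medical data to obtain pre-processed medical data.
The feature construction module 804 is configured to perform feature construction on the pre-processed medical data according to the target feature types to obtain the initial medical features corresponding to the target feature types.
The feature selection module 806 is configured to generate a medical feature subset from the initial medical features, calculate the evaluation function value of the medical feature subset, and, when that value reaches the stopping criterion, use the medical feature subset as the target medical feature set.
In the medical feature screening device 800, the raw medical data is pre-processed by the pre-processing module 802, the initial medical features are obtained by the feature construction module 804, and finally the feature selection module 806 generates a medical feature subset and calculates its evaluation function value; when that value reaches the stopping criterion, the medical feature subset is used as the target medical feature set. Training a machine learning model with this target medical feature set can improve the performance of the machine learning model.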
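In one embodiment, the pre-processing module 802 is further configured to obtain the target feature types, calculate target feature type data from the pre-processed medical data, and obtain the initial medical features from the target feature type data.
In one embodiment, the feature selection module 806 is further configured to: randomly select a first target feature from the initial medical features, obtain a first medical feature subset from the first target feature, and calculate the evaluation function value of the first medical feature subset; randomly select a second target feature from the initial medical features, add the second target feature to the first medical feature subset to obtain a second medical feature subset, and calculate the evaluation function value of the second medical feature subset; compare the evaluation function value of the first medical feature subset with that of the second medical feature subset, obtain the target medical feature subset according to the comparison result, and use the target medical feature subset as the first medical feature subset; and return to the step of randomly selecting a second target feature from the initial medical features, adding it to the first medical feature subset to obtain a second medical feature subset, and calculating the evaluation function value of the second medical feature subset, the medical feature subset being obtained when traversal of the initial medical features is complete.
In one embodiment, the feature selection module 806 is further configured to use the first medical feature subset as the target medical feature subset when the evaluation function value of the first medical feature subset is greater than that of the second medical feature subset, and to use the second medical feature subset as the target medical feature subset when the evaluation function value of the first medical feature subset is not greater than that of the second medical feature subset.
In one embodiment, the feature selection module 806 is further configured to: obtain the current medical feature set from the initial medical features; train a support vector machine model on the current medical feature set to obtain the weight coefficient of each feature, calculate the score of the corresponding feature from the weight coefficient, and rank the features by score to obtain the feature with the lowest score; and delete the feature with the lowest score from the current medical feature set to obtain the current medical feature subset, determine whether the number of features in the current medical feature subset meets the preset number of features, and, when it does, use the current medical feature subset as the medical feature subset.
In one embodiment, the feature selection module 806 is further configured to: when the number of features in the current medical feature subset does not meet the preset number of features, return to the step of training a support vector machine model on the current medical feature set to obtain the weight coefficient of each feature, calculating the score of the corresponding feature from the weight coefficient, and ranking the features by score to obtain the feature with the lowest score; and, when the number of features in the current medical feature subset meets the preset number of features, use the current medical feature subset as the medical feature subset.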
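In one embodiment, the medical feature screening device 800 further includes:
a feature set data obtaining module, configured to obtain medical data and obtain the data corresponding to the target medical feature set from the medical data; and
a training module, configured to input the data corresponding to the target medical feature set into the target machine learning model for training to obtain the trained target machine learning model, where the target machine learning model includes a medical expense prediction model and a medical data anomaly detection model.
For the specific limitations of the medical feature screening device, see the limitations of the medical feature screening method above; they are not repeated here. Each module in the above medical feature screening device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, the processor of the computer device in hardware form, or may be stored in the memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.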
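In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database of the computer device is used to store medical big data. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer-readable instructions, when executed by the processor, implement a medical feature screening method.
A person skilled in the art will understand that the structure shown in FIG. 9 is only a block diagram of the part of the structure related to the scheme of the present application and does not limit the computer device to which the scheme is applied; a specific computer device may include more or fewer parts than shown in the figure, or combine certain parts, or have a different arrangement of parts.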
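A computer device includes a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to implement the steps of the medical feature screening method provided in any embodiment of the present application.
One or more non-volatile computer-readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to implement the steps of the medical feature screening method provided in any embodiment of the present application.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).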
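The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be understood as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. The protection scope of this patent application shall therefore be subject to the appended claims.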

Claims (20)

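  1. A medical feature screening method, comprising:
    obtaining raw medical data, and pre-processing the raw medical data to obtain pre-processed medical data;
    invoking a preset script, and inputting the pre-processed medical data into the preset script, the preset script being used to perform feature construction according to a target feature type, to obtain the initial medical features that correspond to the target feature type and are output by the preset script; and
    generating a medical feature subset from the initial medical features, calculating an evaluation function value of the medical feature subset, and, when the evaluation function value of the medical feature subset reaches a stopping criterion, using the medical feature subset as a target medical feature set.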
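  2. The method according to claim 1, wherein performing feature construction on the pre-processed medical data according to the target feature type to obtain the initial medical features corresponding to the target feature type comprises:
    obtaining the target feature type; and
    calculating target feature type data from the pre-processed medical data, and obtaining the initial medical features from the target feature type data.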
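  3. The method according to claim 1, wherein generating the medical feature subset from the initial medical features comprises:
    randomly selecting a first target feature from the initial medical features, obtaining a first medical feature subset from the first target feature, and calculating an evaluation function value of the first medical feature subset;
    randomly selecting a second target feature from the initial medical features, adding the second target feature to the first medical feature subset to obtain a second medical feature subset, and calculating an evaluation function value of the second medical feature subset;
    comparing the evaluation function value of the first medical feature subset with the evaluation function value of the second medical feature subset, obtaining a target medical feature subset according to the comparison result, and using the target medical feature subset as the first medical feature subset; and
    returning to the step of randomly selecting a second target feature from the initial medical features, adding the second target feature to the first medical feature subset to obtain a second medical feature subset, and calculating the evaluation function value of the second medical feature subset, and, when traversal of the initial medical features is complete, obtaining the medical feature subset.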
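  4. The method according to claim 3, wherein comparing the evaluation function value of the first medical feature subset with the evaluation function value of the second medical feature subset and obtaining the target medical feature subset according to the comparison result comprises:
    when the evaluation function value of the first medical feature subset is greater than the evaluation function value of the second medical feature subset, using the first medical feature subset as the target medical feature subset; and
    when the evaluation function value of the first medical feature subset is not greater than the evaluation function value of the second medical feature subset, using the second medical feature subset as the target medical feature subset.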
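  5. The method according to claim 1, wherein generating the medical feature subset from the initial medical features comprises:
    obtaining a current medical feature set from the initial medical features;
    training a support vector machine model on the current medical feature set to obtain a weight coefficient of each feature, calculating a score of the corresponding feature from the weight coefficient, and ranking the features by score to obtain the feature with the lowest score; and
    deleting the feature with the lowest score from the current medical feature set to obtain a current medical feature subset, determining whether the number of features in the current medical feature subset meets a preset number of features, and, when the number of features in the current medical feature subset meets the preset number of features, using the current medical feature subset as the medical feature subset.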
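  6. The method according to claim 5, further comprising, after determining whether the number of features in the current medical feature subset meets the preset number of features:
    when the number of features in the current medical feature subset does not meet the preset number of features, returning to the step of training a support vector machine model on the current medical feature set to obtain the weight coefficient of each feature, calculating the score of the corresponding feature from the weight coefficient, and ranking the features by score to obtain the feature with the lowest score, and, when the number of features in the current medical feature subset meets the preset number of features, using the current medical feature subset as the medical feature subset.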
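  7. The method according to claim 1, further comprising, after generating the medical feature subset from the initial medical features, calculating the evaluation function value of the medical feature subset, and, when the evaluation function value of the medical feature subset reaches the stopping criterion, using the medical feature subset as the target medical feature set:
    obtaining medical data, and obtaining data corresponding to the target medical feature set from the medical data; and
    inputting the data corresponding to the target medical feature set into a target machine learning model for training to obtain a trained target machine learning model, the target machine learning model comprising a medical expense prediction model and a medical data anomaly detection model.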
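  8. A medical feature screening device, comprising:
    a pre-processing module, configured to obtain raw medical data and pre-process the raw medical data to obtain pre-processed medical data;
    a feature construction module, configured to invoke a preset script and input the pre-processed medical data into the preset script, the preset script being used to perform feature construction according to a target feature type, to obtain the initial medical features that correspond to the target feature type and are output by the preset script; and
    a feature selection module, configured to generate a medical feature subset from the initial medical features, calculate an evaluation function value of the medical feature subset, and, when the evaluation function value of the medical feature subset reaches a stopping criterion, use the medical feature subset as a target medical feature set.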
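  9. The device according to claim 8, wherein the feature selection module is further configured to: randomly select a first target feature from the initial medical features, obtain a first medical feature subset from the first target feature, and calculate an evaluation function value of the first medical feature subset; randomly select a second target feature from the initial medical features, add the second target feature to the first medical feature subset to obtain a second medical feature subset, and calculate an evaluation function value of the second medical feature subset; compare the evaluation function value of the first medical feature subset with the evaluation function value of the second medical feature subset, obtain a target medical feature subset according to the comparison result, and use the target medical feature subset as the first medical feature subset; and return to the step of randomly selecting a second target feature from the initial medical features, adding the second target feature to the first medical feature subset to obtain a second medical feature subset, and calculating the evaluation function value of the second medical feature subset, and, when traversal of the initial medical features is complete, obtain the medical feature subset.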
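  10. The device according to claim 8, further comprising:
    a feature set data obtaining module, configured to obtain medical data and obtain data corresponding to the target medical feature set from the medical data; and
    a training module, configured to input the data corresponding to the target medical feature set into a target machine learning model for training to obtain a trained target machine learning model, the target machine learning model comprising a medical expense prediction model and a medical data anomaly detection model.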
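  11. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    obtaining raw medical data, and pre-processing the raw medical data to obtain pre-processed medical data;
    invoking a preset script, and inputting the pre-processed medical data into the preset script, the preset script being used to perform feature construction according to a target feature type, to obtain the initial medical features that correspond to the target feature type and are output by the preset script; and
    generating a medical feature subset from the initial medical features, calculating an evaluation function value of the medical feature subset, and, when the evaluation function value of the medical feature subset reaches a stopping criterion, using the medical feature subset as a target medical feature set.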
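  12. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    obtaining the target feature type; and
    calculating target feature type data from the pre-processed medical data, and obtaining the initial medical features from the target feature type data.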
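  13. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    randomly selecting a first target feature from the initial medical features, obtaining a first medical feature subset from the first target feature, and calculating an evaluation function value of the first medical feature subset;
    randomly selecting a second target feature from the initial medical features, adding the second target feature to the first medical feature subset to obtain a second medical feature subset, and calculating an evaluation function value of the second medical feature subset;
    comparing the evaluation function value of the first medical feature subset with the evaluation function value of the second medical feature subset, obtaining a target medical feature subset according to the comparison result, and using the target medical feature subset as the first medical feature subset; and
    returning to the step of randomly selecting a second target feature from the initial medical features, adding the second target feature to the first medical feature subset to obtain a second medical feature subset, and calculating the evaluation function value of the second medical feature subset, and, when traversal of the initial medical features is complete, obtaining the medical feature subset.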
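  14. The computer device according to claim 13, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    when the evaluation function value of the first medical feature subset is greater than the evaluation function value of the second medical feature subset, using the first medical feature subset as the target medical feature subset; and
    when the evaluation function value of the first medical feature subset is not greater than the evaluation function value of the second medical feature subset, using the second medical feature subset as the target medical feature subset.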
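  15. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    obtaining a current medical feature set from the initial medical features;
    training a support vector machine model on the current medical feature set to obtain a weight coefficient of each feature, calculating a score of the corresponding feature from the weight coefficient, and ranking the features by score to obtain the feature with the lowest score; and
    deleting the feature with the lowest score from the current medical feature set to obtain a current medical feature subset, determining whether the number of features in the current medical feature subset meets a preset number of features, and, when the number of features in the current medical feature subset meets the preset number of features, using the current medical feature subset as the medical feature subset.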
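  16. One or more non-volatile computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining raw medical data, and pre-processing the raw medical data to obtain pre-processed medical data;
    invoking a preset script, and inputting the pre-processed medical data into the preset script, the preset script being used to perform feature construction according to a target feature type, to obtain the initial medical features that correspond to the target feature type and are output by the preset script; and
    generating a medical feature subset from the initial medical features, calculating an evaluation function value of the medical feature subset, and, when the evaluation function value of the medical feature subset reaches a stopping criterion, using the medical feature subset as a target medical feature set.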
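  17. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    obtaining the target feature type; and
    calculating target feature type data from the pre-processed medical data, and obtaining the initial medical features from the target feature type data.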
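  18. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    randomly selecting a first target feature from the initial medical features, obtaining a first medical feature subset from the first target feature, and calculating an evaluation function value of the first medical feature subset;
    randomly selecting a second target feature from the initial medical features, adding the second target feature to the first medical feature subset to obtain a second medical feature subset, and calculating an evaluation function value of the second medical feature subset;
    comparing the evaluation function value of the first medical feature subset with the evaluation function value of the second medical feature subset, obtaining a target medical feature subset according to the comparison result, and using the target medical feature subset as the first medical feature subset; and
    returning to the step of randomly selecting a second target feature from the initial medical features, adding the second target feature to the first medical feature subset to obtain a second medical feature subset, and calculating the evaluation function value of the second medical feature subset, and, when traversal of the initial medical features is complete, obtaining the medical feature subset.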
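  19. The storage medium according to claim 18, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    when the evaluation function value of the first medical feature subset is greater than the evaluation function value of the second medical feature subset, using the first medical feature subset as the target medical feature subset; and
    when the evaluation function value of the first medical feature subset is not greater than the evaluation function value of the second medical feature subset, using the second medical feature subset as the target medical feature subset.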
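  20. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    obtaining a current medical feature set from the initial medical features;
    training a support vector machine model on the current medical feature set to obtain a weight coefficient of each feature, calculating a score of the corresponding feature from the weight coefficient, and ranking the features by score to obtain the feature with the lowest score; and
    deleting the feature with the lowest score from the current medical feature set to obtain a current medical feature subset, determining whether the number of features in the current medical feature subset meets a preset number of features, and, when the number of features in the current medical feature subset meets the preset number of features, using the current medical feature subset as the medical feature subset.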
PCT/CN2019/096262 2018-08-14 2019-07-17 医疗特征筛选方法、装置、计算机设备和存储介质 WO2020034801A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810925041.X 2018-08-14
CN201810925041.XA CN109065175A (zh) 2018-08-14 2018-08-14 医疗特征筛选方法、装置、计算机设备和存储介质

Publications (1)

Publication Number Publication Date
WO2020034801A1 true WO2020034801A1 (zh) 2020-02-20

Family

ID=64678403

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/096262 WO2020034801A1 (zh) 2018-08-14 2019-07-17 医疗特征筛选方法、装置、计算机设备和存储介质

Country Status (2)

Country Link
CN (1) CN109065175A (zh)
WO (1) WO2020034801A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109065175A (zh) * 2018-08-14 2018-12-21 平安医疗健康管理股份有限公司 医疗特征筛选方法、装置、计算机设备和存储介质
CN111178656A (zh) * 2019-07-31 2020-05-19 腾讯科技(深圳)有限公司 信用模型训练方法、信用评分方法、装置及电子设备
CN110738573A (zh) * 2019-09-06 2020-01-31 平安医疗健康管理股份有限公司 基于分类器的数据处理方法、设备、存储介质及装置
CN110706810A (zh) * 2019-09-29 2020-01-17 大连鸾实科技有限公司 孕产时间的估计方法、装置、计算机设备及存储介质
CN110993117A (zh) * 2019-12-26 2020-04-10 北京亚信数据有限公司 一种基于医疗大数据的非正常医保识别方法及装置
CN117558461B (zh) * 2024-01-12 2024-03-29 四川互慧软件有限公司 不同地域的同类蛇伤医疗方案选择方法、装置及电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170147753A1 (en) * 2015-11-25 2017-05-25 Electronics And Telecommunications Research Institute Method for searching for similar case of multi-dimensional health data and apparatus for the same
CN107066781A (zh) * 2016-11-03 2017-08-18 西南大学 基于遗传和环境相关的结直肠癌数据模型的分析方法
CN107785057A (zh) * 2017-06-19 2018-03-09 平安医疗健康管理股份有限公司 医疗数据处理方法、装置、存储介质和计算机设备
CN108346474A (zh) * 2018-03-14 2018-07-31 湖南省蓝蜻蜓网络科技有限公司 基于单词的类内分布与类间分布的电子病历特征选择方法
CN108389626A (zh) * 2018-02-09 2018-08-10 上海长江科技发展有限公司 基于人工智能的脑卒中筛查方法及系统
CN109065175A (zh) * 2018-08-14 2018-12-21 平安医疗健康管理股份有限公司 医疗特征筛选方法、装置、计算机设备和存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102631194B (zh) * 2012-04-13 2013-11-13 西南大学 一种用于心电特征选择的禁忌搜索方法
CN106778861A (zh) * 2016-12-12 2017-05-31 齐鲁工业大学 一种关键特征的筛选方法
CN106874663A (zh) * 2017-01-26 2017-06-20 中电科软件信息服务有限公司 心脑血管疾病风险预测方法及系统
CN106778042A (zh) * 2017-01-26 2017-05-31 中电科软件信息服务有限公司 心脑血管患者相似性分析方法及系统

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170147753A1 (en) * 2015-11-25 2017-05-25 Electronics And Telecommunications Research Institute Method for searching for similar case of multi-dimensional health data and apparatus for the same
CN107066781A (zh) * 2016-11-03 2017-08-18 西南大学 基于遗传和环境相关的结直肠癌数据模型的分析方法
CN107785057A (zh) * 2017-06-19 2018-03-09 平安医疗健康管理股份有限公司 医疗数据处理方法、装置、存储介质和计算机设备
CN108389626A (zh) * 2018-02-09 2018-08-10 上海长江科技发展有限公司 基于人工智能的脑卒中筛查方法及系统
CN108346474A (zh) * 2018-03-14 2018-07-31 湖南省蓝蜻蜓网络科技有限公司 基于单词的类内分布与类间分布的电子病历特征选择方法
CN109065175A (zh) * 2018-08-14 2018-12-21 平安医疗健康管理股份有限公司 医疗特征筛选方法、装置、计算机设备和存储介质

Also Published As

Publication number Publication date
CN109065175A (zh) 2018-12-21

Similar Documents

Publication Publication Date Title
WO2020034801A1 (zh) 医疗特征筛选方法、装置、计算机设备和存储介质
US20210257066A1 (en) Machine learning based medical data classification method, computer device, and non-transitory computer-readable storage medium
US11694109B2 (en) Data processing apparatus for accessing shared memory in processing structured data for modifying a parameter vector data structure
WO2021169111A1 (zh) 简历筛选方法、装置、计算机设备和存储介质
CN108536800B (zh) 文本分类方法、系统、计算机设备和存储介质
CN110504028A (zh) 一种疾病问诊方法、装置、系统、计算机设备和存储介质
CN108491511B (zh) 基于图数据的数据挖掘方法和装置、模型训练方法和装置
CN112037912A (zh) 基于医疗知识图谱的分诊模型训练方法、装置及设备
CN112016318B (zh) 基于解释模型的分诊信息推荐方法、装置、设备及介质
WO2021003938A1 (zh) 图像分类方法、装置、计算机设备和存储介质
CN109308488B (zh) 乳腺超声图像处理装置、方法、计算机设备及存储介质
CN108520041B (zh) 文本的行业分类方法、系统、计算机设备和存储介质
US10430716B2 (en) Data driven featurization and modeling
CN111145910A (zh) 基于人工智能的异常案例识别方法、装置、计算机设备
CN112035611B (zh) 目标用户推荐方法、装置、计算机设备和存储介质
CN111180086B (zh) 数据匹配方法、装置、计算机设备和存储介质
WO2022057309A1 (zh) 肺部特征识别方法、装置、计算机设备及存储介质
CN112035614B (zh) 测试集生成方法、装置、计算机设备和存储介质
CN112992377A (zh) 药物治疗结果预测模型生成方法、装置、终端及存储介质
US20210357729A1 (en) System and method for explaining the behavior of neural networks
Shams et al. REM: An integrative rule extraction methodology for explainable data analysis in healthcare
US20210397905A1 (en) Classification system
WO2020132918A1 (zh) 药品预测方法、装置、计算机设备及存储介质
CN109493975B (zh) 基于xgboost模型的慢性病复发预测方法、装置和计算机设备
CN115827877A (zh) 一种提案辅助并案的方法、装置、计算机设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19849572

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19849572

Country of ref document: EP

Kind code of ref document: A1