CN111159181A - Medical data screening method and device, storage medium and electronic equipment - Google Patents

Info

Publication number
CN111159181A
Authority
CN
China
Prior art keywords
data
medical data
field
field data
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911311334.XA
Other languages
Chinese (zh)
Inventor
陆可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201911311334.XA
Publication of CN111159181A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Pathology (AREA)
  • Software Systems (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The disclosure relates to a medical data screening method, a medical data screening device, a storage medium and electronic equipment, which are used for improving the efficiency and accuracy of medical data screening. The method comprises the following steps: acquiring medical data to be processed; inputting the medical data into a rule determination model to obtain a target inspection rule aiming at the medical data; according to the target inspection rule, inspecting and screening the medical data; the rule determination model is used for obtaining target inspection rules for medical data by: determining the data distribution characteristics of each field data in the medical data; for each field data in the medical data, determining target field data with data distribution characteristics similar to those of the field data in the field data of the plurality of sample medical data, and acquiring a preset inspection rule corresponding to the target field data; and determining the acquired preset inspection rule as a target inspection rule for the medical data.

Description

Medical data screening method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a medical data screening method and apparatus, a storage medium, and an electronic device.
Background
With the continuous development of the medical technology level, a large amount of medical data is generated, and more medical data are stored in a database for subsequent research analysis and storage management. Therefore, how to screen valuable data from massive medical data for research analysis and storage management also becomes a problem which needs to be solved urgently. In the related art, the same data inspection rule is adopted for inspection and screening of different field data included in medical data. If the difference of the data distribution characteristics between the field data is large, the error of the inspection screening result is possibly large, and therefore subsequent research analysis and storage management cannot be well performed according to the screened data.
Disclosure of Invention
The disclosure aims to provide a medical data screening method, a medical data screening device, a storage medium and electronic equipment so as to provide a new medical data screening mode.
To achieve the above object, in a first aspect, the present disclosure provides a medical data screening method, including:
acquiring medical data to be processed;
inputting the medical data into a rule determination model to obtain a target inspection rule aiming at the medical data;
according to the target inspection rule, inspecting and screening the medical data;
the rule determination model comprises a plurality of sample medical data, a plurality of field data included in each sample medical data correspond to preset test rules respectively, and the rule determination model is used for obtaining target test rules for the medical data in the following mode:
determining data distribution characteristics of each field of data in the medical data;
for each field data in the medical data, determining target field data with data distribution characteristics similar to those of the field data in the field data of the plurality of sample medical data, and acquiring a preset inspection rule corresponding to the target field data;
and determining the acquired preset inspection rule as a target inspection rule aiming at the medical data.
Optionally, the preset verification rule corresponding to the sample medical data is determined by:
for each sample medical data in the plurality of sample medical data, respectively performing examination screening on field data included in the sample medical data through a plurality of different examination rules;
for each field data included in the sample medical data, determining, among a plurality of inspection result values obtained for the field data through the plurality of different inspection rules, the inspection rule corresponding to a target inspection result value as the preset inspection rule corresponding to the field data, wherein the target inspection result value is the inspection result value that represents the greatest data significance.
Optionally, the method further comprises:
classifying the field data of the plurality of sample medical data;
establishing a dictionary model for storage according to the data distribution characteristics of each type of field data after classification and a preset inspection rule corresponding to each type of field data;
the determining, for each field data in the medical data, target field data with data distribution characteristics similar to those of the field data in the field data of the plurality of sample medical data, and acquiring a preset test rule corresponding to the target field data, includes:
for each field data in the medical data, taking the data distribution characteristics of the field data as an index, and searching in the dictionary model to determine target field data with the data distribution characteristics similar to the data distribution characteristics of the field data;
and determining a preset inspection rule corresponding to the target field data in the dictionary model.
Optionally, the inputting the medical data into a rule determination model to obtain a target verification rule for the medical data includes:
classifying the medical data according to a preset data feature classification rule to obtain a plurality of data sets in which the medical data satisfy different data feature conditions;
and respectively inputting the data of the plurality of data sets into the rule determination model to obtain a target inspection rule aiming at the medical data.
Optionally, the classifying the medical data according to a preset data feature classification rule includes:
classifying the medical data through a decision tree model or a binary classification model, wherein the decision tree model or the binary classification model is obtained by training classified sample medical data.
In a second aspect, the present disclosure also provides a medical data screening apparatus, the apparatus comprising:
the acquisition module is used for acquiring medical data to be processed;
the determining module is used for inputting the medical data into a rule determining model to obtain a target inspection rule aiming at the medical data;
the screening module is used for carrying out inspection screening on the medical data according to the target inspection rule;
the rule determination model comprises a plurality of sample medical data, a plurality of field data included in each sample medical data correspond to preset test rules respectively, and the determination module comprises:
the first determining submodule is used for determining the data distribution characteristics of each field of data in the medical data;
the second determining submodule is used for determining, for each field data in the medical data, target field data with data distribution characteristics similar to those of the field data among the field data of the plurality of sample medical data, and acquiring a preset inspection rule corresponding to the target field data;
and the third determining submodule is used for determining the acquired preset inspection rule as a target inspection rule aiming at the medical data.
Optionally, the preset verification rule corresponding to the sample medical data is determined by:
the sample testing module is used for testing and screening field data included in the sample medical data through a plurality of different testing rules respectively aiming at each sample medical data in the plurality of sample medical data;
the storage module is used for determining, for each field data included in the sample medical data and among a plurality of inspection result values obtained for the field data through the plurality of different inspection rules, the inspection rule corresponding to a target inspection result value as the preset inspection rule corresponding to the field data, wherein the target inspection result value is the inspection result value that represents the greatest data significance.
Optionally, the apparatus further comprises:
the classification module is used for classifying the field data of the plurality of sample medical data;
the model establishing module is used for establishing a dictionary model for storage according to the data distribution characteristics of each type of field data after classification processing and the preset inspection rule corresponding to each type of field data;
the second determination submodule is configured to:
for each field data in the medical data, taking the data distribution characteristics of the field data as an index, and searching in the dictionary model to determine target field data with the data distribution characteristics similar to the data distribution characteristics of the field data;
and determining a preset inspection rule corresponding to the target field data in the dictionary model.
In a third aspect, the present disclosure also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any one of the methods of the first aspect.
In a fourth aspect, the present disclosure also provides an electronic device, including:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of any one of the methods of the first aspect.
With the above technical solution, after medical data to be processed is acquired, the medical data is input into the rule determination model to obtain a target inspection rule for the medical data, and the medical data is then inspected and screened according to the target inspection rule. Because the rule determination model is established and a corresponding inspection rule is obtained for each field data in the medical data, each field data can be inspected and screened with its own inspection rule. This avoids the result errors caused by inspecting and screening different field data with the same inspection rule, improves the efficiency and accuracy of medical data screening, and allows research analysis and storage management to be carried out better on the screened medical data.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow chart illustrating a method of medical data screening according to an exemplary embodiment of the present disclosure;
FIG. 2 is a diagram illustrating results of classifying medical data through a decision tree in a medical data screening method according to an exemplary embodiment of the present disclosure;
FIG. 3 is a flow chart illustrating a method of medical data screening according to another exemplary embodiment of the present disclosure;
FIG. 4 is a block diagram illustrating a medical data screening apparatus according to an exemplary embodiment of the present disclosure;
FIG. 5 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure;
FIG. 6 is a block diagram illustrating an electronic device according to another exemplary embodiment of the present disclosure.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
With the continuous development of medical technology, a large amount of medical data is generated, and more and more medical data is stored in databases for subsequent research analysis and management. How to screen valuable data out of massive medical data for research analysis and storage management has therefore become a problem that urgently needs to be solved. In the related art, the same data inspection rule is adopted to inspect and screen the different field data included in medical data; for example, the Wald test is applied to all of the field data. If the data distribution characteristics of the field data differ greatly, the inspection and screening result may contain large errors, so that subsequent research analysis and storage management cannot be performed well on the screened data.
For example, when diabetes data is analyzed, the F-test is used to inspect and screen the age field, the gender field, and so on. In this case, the gender field takes only two values while the age field takes many values, so the data distribution characteristics of the two fields differ greatly. If the F-test is used for both fields, it may be possible to obtain only the influence of gender on diabetes, and not the influence of age on diabetes.
In addition, the related art may try to determine the inspection rule corresponding to the field data based on experience, for example, by first applying the Wald test, then applying the F-test if the result does not meet the requirement, and so on until the result meets the requirement. For massive amounts of medical data this approach is very time consuming, and it requires sufficient experience to carry out the medical data screening process effectively.
In view of this, embodiments of the present disclosure provide a method, an apparatus, a storage medium, and an electronic device for screening medical data, so as to improve efficiency and accuracy of screening medical data, and better perform research analysis and storage management according to the screened medical data.
It should first be understood that the medical data screening method in the embodiments of the present disclosure may be applied to any electronic device, such as a computer or a server, which is not limited in the embodiments of the present disclosure. If the medical data screening method is applied to a computer, research and analysis can be performed according to the screened medical data. If the medical data screening method is applied to a server, storage management can be performed on the screened medical data.
Fig. 1 is a flow chart illustrating a method of medical data screening according to an exemplary embodiment of the present disclosure. Referring to fig. 1, the medical data screening method includes:
Step 101, acquiring medical data to be processed.
Step 102, inputting the medical data into the rule determination model to obtain a target inspection rule for the medical data.
Step 103, inspecting and screening the medical data according to the target inspection rule.
The rule determination model comprises a plurality of sample medical data, a plurality of field data included in each sample medical data correspond to preset inspection rules respectively, and the rule determination model is used for obtaining target inspection rules for the medical data in the following mode:
determining the data distribution characteristics of each field data in the medical data;
for each field data in the medical data, determining target field data with data distribution characteristics similar to those of the field data in the field data of the plurality of sample medical data, and acquiring a preset inspection rule corresponding to the target field data;
and determining the acquired preset inspection rule as a target inspection rule aiming at the medical data.
With this method, after the medical data to be processed is acquired, the medical data can be input into the rule determination model to obtain the target inspection rule for the medical data, and the medical data is then inspected and screened according to the target inspection rule. Because the rule determination model is established and a corresponding inspection rule is obtained for each field data in the medical data, each field data can be inspected and screened with its own inspection rule. This avoids the result errors caused by inspecting and screening different field data with the same inspection rule, improves the efficiency and accuracy of medical data screening, and allows research analysis and storage management to be carried out better on the screened medical data.
In order to make the medical data screening method provided in the embodiments of the present disclosure more understandable to those skilled in the art, the above steps are exemplified in detail below.
In step 101, the medical data to be processed may be acquired from a hospital database according to actual conditions; the manner of acquiring the medical data is not limited in the embodiments of the present disclosure. By way of example, if the medical data screening method in the embodiments of the present disclosure is applied to a client (such as a computer), a user may trigger a request for acquiring medical data at the client, where the request may include identification information of the medical data to be acquired, such as the name and age of the patient corresponding to the medical data. The client can then send the request to the corresponding database, and the database can search the stored medical data for the medical data that matches the identification information included in the request and return that medical data to the client. Alternatively, if the medical data screening method in the embodiments of the present disclosure is applied to a server, the medical data may be sent to the server by a client, that is, the server may acquire the medical data to be processed by receiving it.
It should be understood that the medical data in the embodiments of the present disclosure may be medical data stored in a database in the form of a data table after data integration. Each row of the medical data table may correspond to medical data of a single user, and each column may correspond to a single field data, which may be, for example, an age field data, a gender field data, a specific medical index field data, and the like. The data distribution characteristic difference between each field data is possibly large, so that the corresponding inspection rule can be determined for each field data respectively for inspection and screening, the efficiency and the accuracy of medical data screening are improved, and the field-by-field research analysis and storage of the medical data are better performed.
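As a minimal sketch of the table layout described above, the following Python fragment turns a small table (one row per user, one column per field) into per-field data distribution characteristics. The function name `distribution_features`, the chosen statistics (count, ratio of distinct values, mean, standard deviation) and the sample values are assumptions made for illustration only; the disclosure does not fix a particular set of characteristics.

```python
from statistics import mean, pstdev


def distribution_features(values):
    """Summarise one column (one field data) as a small feature dict.

    The statistics chosen here are illustrative assumptions, not a set
    prescribed by the disclosure.
    """
    count = len(values)
    return {
        "count": count,                              # absolute number of records
        "distinct_ratio": len(set(values)) / count,  # relative number of distinct values
        "mean": mean(values),
        "std": pstdev(values),
    }


# Hypothetical medical data table: each row is one user, each column one field.
table = [
    {"age": 34, "gender": 0},
    {"age": 51, "gender": 1},
    {"age": 67, "gender": 1},
    {"age": 45, "gender": 0},
]

# Regroup the rows into columns, then describe each column separately.
columns = {name: [row[name] for row in table] for name in table[0]}
features = {name: distribution_features(col) for name, col in columns.items()}
```

In this toy table the gender column takes two values and the age column takes four, so their `distinct_ratio` values differ, which is the kind of difference between fields that motivates choosing an inspection rule per field.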
In particular, the medical data may be input into the rule determination model, resulting in target verification rules for the medical data. The rule determination model comprises a plurality of sample medical data, and a plurality of field data included in each sample medical data correspond to preset inspection rules respectively.
In a possible manner, the preset inspection rule corresponding to the sample medical data may be determined as follows: first, for each sample medical data in the plurality of sample medical data, the field data included in the sample medical data is inspected and screened through a plurality of different inspection rules; then, for each field data included in the sample medical data, among the plurality of inspection result values obtained for the field data through the plurality of different inspection rules, the inspection rule corresponding to a target inspection result value is determined as the preset inspection rule corresponding to the field data, where the target inspection result value is the inspection result value that represents the greatest data significance.
For example, the sample medical data may be medical data of a plurality of users acquired in advance from a hospital database; the manner of acquiring the sample medical data is not limited in the embodiments of the present disclosure. After the plurality of sample medical data are acquired, the field data included in the sample medical data can be inspected and screened through a plurality of different inspection rules, such as the Wald test, the quartile test, the median test, the F-test, and the restricted cubic spline test. The specific inspection process for each of these inspection rules is similar to that in the related art and is not described here again.
Then, for each field data included in the sample medical data, among the plurality of inspection result values obtained through the plurality of different inspection rules, the inspection rule corresponding to the maximum inspection result value may be determined and stored as the preset inspection rule corresponding to the field data. The inspection result value may be of different types for different inspection rules. For example, for the Wald test, the quartile test, the median test, and the restricted cubic spline test, the inspection result value may be a P value. For the F-test, by contrast, the inspection result value may be a chi-squared value. It should be understood that the greater the significance of the data, the greater the P value, and the smaller the chi-squared value.
Correspondingly, when field data in the sample medical data is inspected and screened through the Wald test, the quartile test, the median test, and the restricted cubic spline test, the inspection rule with the maximum P value may be determined and stored as the preset inspection rule of the field data. Alternatively, when the field data is also inspected and screened through the F-test, if the data significance represented by the resulting chi-squared value is greater than the data significance represented by the P values obtained through the other inspection rules, the F-test may be determined and stored as the preset inspection rule corresponding to the field data.
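The selection of a preset inspection rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: `select_preset_rule`, the `rules` mapping and the `significance` callback are hypothetical names, and the three rules are stand-ins that return fixed numbers instead of running real statistical tests. How each rule's result value (a P value or a chi-squared value) is converted into a comparable significance score is left to the caller.

```python
def select_preset_rule(field_values, rules, significance):
    """Return the name of the rule whose result represents the greatest significance.

    rules:        mapping of rule name -> callable(field_values) -> result value
    significance: callable(rule_name, result_value) -> comparable score
                  (hypothetical hook; the mapping from a result value to a
                  significance score is an assumption supplied by the caller)
    """
    best_name, best_score = None, None
    for name, rule in rules.items():
        score = significance(name, rule(field_values))
        if best_score is None or score > best_score:
            best_name, best_score = name, score
    return best_name


# Stand-in rules that return fixed values; they do NOT perform real tests.
rules = {
    "wald": lambda values: 0.2,
    "median": lambda values: 0.6,
    "f_test": lambda values: 0.4,
}

# Assumption for this demo only: the result value is used directly as the score.
chosen = select_preset_rule([34, 51, 67, 45], rules, lambda name, value: value)
```

Passing the score conversion in as a callback keeps the selection loop independent of the type of result value each inspection rule produces.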
In the above case, the target inspection rule may be determined as follows: the data distribution characteristics of each field data in the medical data are determined, where the data distribution characteristics may include, for example, the data distribution condition, the average value, the absolute number, and the relative number of the field data. Then, for each field data in the medical data, matching is performed among the field data of the plurality of sample medical data according to those data distribution characteristics, and the sample field data whose data distribution characteristics have a similarity to those of the field data greater than or equal to a preset similarity is determined as the target field data. The preset similarity may be set according to actual conditions, and is not limited in the embodiments of the present disclosure.
Then, a pre-set inspection rule of the pre-stored target field data may be acquired, and finally the acquired pre-set inspection rule is determined as a target inspection rule for the medical data. That is, in the embodiment of the present disclosure, the target verification rule may include a preset verification rule corresponding to each of the plurality of target field data, so that each field data in the medical data may be verified and filtered according to the target verification rule.
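The matching step above can be sketched as follows. The disclosure does not define how similarity is computed, so the measure used here (1 / (1 + Euclidean distance) over the feature values), the choice of the most similar candidate when several pass the threshold, and all names and sample numbers are assumptions for illustration.

```python
import math


def similarity(a, b):
    """Map the Euclidean distance between two feature dicts into (0, 1].

    This particular measure is an assumption; any similarity that can be
    compared against a preset threshold would fit the description.
    """
    distance = math.sqrt(sum((a[key] - b[key]) ** 2 for key in a))
    return 1.0 / (1.0 + distance)


def find_target_rule(field_features, sample_fields, min_similarity):
    """Return the preset rule of the most similar sample field, or None.

    Only sample fields whose similarity is >= min_similarity qualify as
    target field data.
    """
    best = None
    for sample in sample_fields:
        score = similarity(field_features, sample["features"])
        if score >= min_similarity and (best is None or score > best[0]):
            best = (score, sample["rule"])
    return None if best is None else best[1]


# Hypothetical sample field data with already-determined preset rules.
sample_fields = [
    {"features": {"mean": 0.5, "distinct_ratio": 0.01}, "rule": "f_test"},
    {"features": {"mean": 50.0, "distinct_ratio": 0.6}, "rule": "wald"},
]

matched = find_target_rule({"mean": 0.48, "distinct_ratio": 0.02}, sample_fields, 0.9)
unmatched = find_target_rule({"mean": 10.0, "distinct_ratio": 0.3}, sample_fields, 0.9)
```

Returning `None` when no sample field reaches the preset similarity leaves the fallback behaviour open, since the text does not say what happens in that case.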
In other possible manners, to avoid comparing the field data of the medical data one by one with the field data included in the sample medical data, and to improve the efficiency of matching the target inspection rule, the field data of the plurality of sample medical data may also be classified, and a dictionary model may then be established and stored according to the data distribution characteristics of each class of field data after classification and the preset inspection rule corresponding to each class of field data. Accordingly, determining, for each field data in the medical data, the target field data with similar data distribution characteristics among the field data of the plurality of sample medical data, and acquiring the corresponding preset inspection rule, may be: for each field data in the medical data, taking the data distribution characteristics of the field data as an index and searching in the dictionary model to determine the target field data whose data distribution characteristics are similar to those of the field data, and then determining the preset inspection rule corresponding to the target field data in the dictionary model.
For example, the classification processing on the plurality of sample medical data may be performed through a random forest model and a data distribution characteristic of field data in the plurality of sample medical data, or may be performed through other classification models and a data distribution characteristic of field data in the plurality of sample medical data, which is not limited by the embodiment of the disclosure.
It should be appreciated that a random forest model is a classifier that trains and predicts samples using multiple decision trees to balance errors and produce high accuracy classification results. Therefore, preferably, the field data of the plurality of sample medical data can be classified and processed through the random forest model and the data distribution characteristics of the field data in the plurality of sample medical data, so that the accuracy of the subsequent determination of the target inspection rule is improved, the screening accuracy of the medical data is improved, and research analysis and storage management can be better performed according to the screened medical data.
For example, the dictionary model may include data distribution characteristics of each type of field data after the classification processing and a preset verification rule corresponding to each type of field data. The data distribution characteristic of each type of field data may be an average data distribution characteristic of all field data in the type of field data, or may be a data distribution characteristic of a certain field data randomly determined in each type of field data, and the like, which is not limited in this disclosure. In addition, since the data distribution characteristics of each field data in each type of field data are similar, the preset verification rule corresponding to each field data in each type of field data may be the same, so that the preset verification rule corresponding to each type of field data may be the preset verification rule corresponding to any field data in the type of field data.
After the field data of the sample medical data have been classified and the dictionary model has been obtained in the above manner, then for each field data in the medical data, the data distribution characteristics of that field data can be used as an index for matching and searching in the dictionary model. Target field data whose data distribution characteristics are similar to those of the field data is thereby obtained quickly, the target inspection rule is determined more efficiently from the preset inspection rule of the target field data, and the medical data is then inspected and screened according to the target inspection rule, which improves both the screening efficiency and the accuracy of the medical data.
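The lookup described above can be sketched as a nearest-match search. The description does not say how similarity is measured; this sketch assumes Euclidean distance between feature tuples, and the model contents and names are hypothetical.

```python
import math

# Hypothetical dictionary model: class -> stored (mean, std) features and rule.
DICTIONARY_MODEL = {
    "continuous": {"features": (30.0, 7.0), "rule": "wald"},
    "categorical": {"features": (0.5, 0.5), "rule": "chi_square"},
}

def find_target_rule(field_features, model=DICTIONARY_MODEL):
    """Return the class and preset rule whose stored features are nearest."""
    best_class = min(
        model,
        # Assumption: similarity is Euclidean distance between feature tuples.
        key=lambda name: math.dist(field_features, model[name]["features"]),
    )
    return best_class, model[best_class]["rule"]
```

For example, `find_target_rule((41.0, 10.0))` matches the `continuous` entry, so the rule stored for that entry becomes the target inspection rule for the queried field.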
In practical applications, the following situation may arise. For example, when analyzing the influence of age on diabetes, for the medical data of patients with diabetes, the influence of age on diabetes can be obtained after Wald test screening is performed on the age field. However, for medical data covering both categories, diabetic patients and non-patients, the influence of age on diabetes may not be obtainable after Wald test screening of the age field. That is, in practical applications, even if different inspection rules are used to inspect and screen different field data, subsequent research analysis and storage management may still not be performed well according to the screened data.
In order to solve the above problem, in the embodiment of the present disclosure, the medical data may be classified according to a preset data feature classification rule to obtain a plurality of data sets of the medical data that satisfy different data feature conditions, and the data of the plurality of data sets is then input into the rule determination model, so as to obtain a target inspection rule for the medical data.
That is to say, in the embodiment of the present disclosure, the inspection rule is not only determined according to the field data; before the inspection rule is determined, the medical data may also be divided according to different data characteristic conditions as the actual situation requires, so as to obtain a more accurate medical data screening result and to better perform subsequent research analysis and storage management according to the screened data.
For example, the preset data feature classification rule may be set according to actual conditions, and the embodiment of the present disclosure does not limit this. For example, in the above example, the medical data may be classified according to whether diabetes is present, and so on.
In a possible manner, classifying the medical data according to the preset data feature classification rule may be: classifying the medical data by a decision tree model or a binary classification model, where the decision tree model or the binary classification model may be obtained by training on classified sample medical data.
For example, in the process of analyzing "whether diabetes occurs" on the medical data, classifying the medical data by the decision tree model can yield the result shown in fig. 2 (fig. 2 shows only part of the classification result). Referring to fig. 2, certain medical data may first be divided into two data sets by the age field, according to whether the age is greater than 50 years. Each of the two data sets may then be further divided according to gender, giving two data sets whose gender is female and male, respectively. Finally, each gender-divided data set may be further divided according to whether the blood sugar is greater than 30, giving two data sets with blood sugar greater than 30 and blood sugar less than or equal to 30. In this way, a plurality of data sets satisfying different data characteristic conditions in the medical data can be obtained, the data of these data sets can then be input into the rule determination model to obtain the target inspection rule for the medical data, and the screening accuracy of the medical data is further improved.
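The three-level division described for fig. 2 can be sketched as follows. The split conditions are hard-coded here for illustration, whereas in the disclosure they come from a trained decision tree model; the record layout and function name are hypothetical, and the unit of the blood sugar threshold is not given in the description.

```python
from collections import defaultdict

def partition_records(records):
    """Split records into data sets by age, then gender, then blood sugar."""
    data_sets = defaultdict(list)
    for record in records:
        key = (
            "age>50" if record["age"] > 50 else "age<=50",
            record["gender"],
            "blood_sugar>30" if record["blood_sugar"] > 30 else "blood_sugar<=30",
        )
        data_sets[key].append(record)
    return dict(data_sets)

# Hypothetical medical data records.
records = [
    {"age": 62, "gender": "female", "blood_sugar": 35},
    {"age": 58, "gender": "female", "blood_sugar": 12},
    {"age": 44, "gender": "male", "blood_sugar": 31},
    {"age": 67, "gender": "female", "blood_sugar": 40},
]
data_sets = partition_records(records)
```

Each value of `data_sets` is one data set satisfying a distinct combination of data characteristic conditions and could be passed to the rule determination model separately.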
The method for screening medical data in the embodiment of the present disclosure is explained below by another exemplary embodiment. Referring to fig. 3, the medical data screening method may include:
Step 301, for each sample medical data in the plurality of sample medical data, performing a test screening on field data included in the sample medical data through a plurality of different test rules.
Step 302, for each field data, determining a checking rule corresponding to the target checking result value as a preset checking rule corresponding to the field data among a plurality of checking result values obtained for the field data by a plurality of different checking rules.
Step 303, classifying the plurality of sample medical data according to the random forest model and the data distribution characteristics of the field data in the plurality of sample medical data.
Step 304, establishing a dictionary model for storage according to the data distribution characteristics of each type of field data after classification and the preset inspection rule corresponding to each type of field data.
Step 305, acquiring medical data to be processed.
Step 306, classifying the medical data through the decision tree model to obtain a plurality of data sets of which the medical data meet different data characteristic conditions.
Step 307, inputting the data of the plurality of data sets into the rule determination model.
Step 308, determining the data distribution characteristics of each field data in the medical data.
Step 309, for each field data in the medical data, determining target field data with data distribution characteristics similar to those of the field data in the field data of the plurality of sample medical data, and acquiring a preset inspection rule corresponding to the target field data.
Step 310, determining the acquired preset inspection rule as a target inspection rule for the medical data.
Step 311, checking and screening the medical data according to the target checking rule.
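Steps 308 to 310 can be sketched end to end as below, assuming steps 301 to 304 have already produced a dictionary model. The choice of mean and population standard deviation as the data distribution characteristics, Euclidean distance as the similarity measure, and every name and value in the sketch are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

# Assumed output of steps 301-304: (distribution features, preset rule) entries.
DICTIONARY_MODEL = [((30.0, 7.0), "wald"), ((0.5, 0.5), "chi_square")]

def distribution_features(values):
    """Step 308: summarise one field as (mean, population standard deviation)."""
    return (mean(values), pstdev(values))

def target_rules(medical_data):
    """Steps 309-310: choose a preset rule for each field by nearest features."""
    rules = {}
    for field in medical_data[0]:
        features = distribution_features([row[field] for row in medical_data])
        _, rule = min(
            DICTIONARY_MODEL,
            key=lambda entry: math.dist(features, entry[0]),
        )
        rules[field] = rule
    return rules

# Hypothetical medical data to be processed (step 305).
medical_data = [
    {"age": 61, "diabetic": 1},
    {"age": 47, "diabetic": 0},
    {"age": 55, "diabetic": 1},
    {"age": 39, "diabetic": 0},
]
rules = target_rules(medical_data)
```

Step 311 would then apply, field by field, the statistical test named in `rules`; that part is omitted because the description does not fix how each test is computed.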
The detailed description of the above steps is given above for illustrative purposes, and will not be repeated here. It will also be appreciated that for simplicity of explanation, the above-described method embodiments are all presented as a series of acts or combination of acts, but those skilled in the art will recognize that the present disclosure is not limited by the order of acts or combination of acts described above. Further, those skilled in the art will also appreciate that the embodiments described above are preferred embodiments and that the steps involved are not necessarily required for the present disclosure.
In the above manner, a corresponding inspection rule can be applied to each field data for inspection screening, which avoids the result errors caused by applying the same inspection rule to different field data, improves the screening efficiency and accuracy of the medical data, and allows research analysis and management to be performed better according to the screened medical data.
Based on the same inventive concept, the embodiment of the present disclosure further provides a medical data screening apparatus, which may be implemented as part or all of an electronic device by hardware, software, or a combination of the two. Referring to fig. 4, the medical data screening apparatus 400 may include:
an obtaining module 401, configured to obtain medical data to be processed;
a determining module 402, configured to input the medical data into a rule determining model, so as to obtain a target inspection rule for the medical data;
a screening module 403, configured to perform inspection screening on the medical data according to the target inspection rule;
wherein the rule determination model includes a plurality of sample medical data, a plurality of field data included in each sample medical data respectively correspond to a preset test rule, and the determination module 402 includes:
the first determining sub-module 4021 is used for determining the data distribution characteristics of each field of data in the medical data;
the second determining sub-module 4022 is configured to determine, for each field data in the medical data, target field data whose data distribution characteristics are similar to those of the field data among the field data of the plurality of sample medical data, and acquire a preset inspection rule corresponding to the target field data;
the third determining sub-module 4023 is configured to determine the acquired preset inspection rule as a target inspection rule for the medical data.
Optionally, the preset testing rule corresponding to the sample medical data is determined by the following modules:
the sample testing module is used for performing test screening, for each sample medical data in the plurality of sample medical data, on the field data included in the sample medical data through a plurality of different testing rules;
the storage module is used for determining, for each field data included in the sample medical data and among the plurality of test result values obtained for the field data through the plurality of different testing rules, the testing rule corresponding to a target test result value as the preset testing rule corresponding to the field data, where the target test result value is the test result value that represents the greatest data significance.
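The selection performed by the storage module can be sketched as follows. The description does not define how a test result value expresses significance; this sketch assumes each value is a p-value, so that the smallest value is the most significant. The rule names other than the Wald test and all numbers are hypothetical.

```python
def choose_preset_rules(results_by_field):
    """For each field, keep the rule whose result value is the most significant.

    Assumption: every result value is a p-value, so smaller means more significant.
    """
    return {
        field: min(result_values, key=result_values.get)
        for field, result_values in results_by_field.items()
    }

# Hypothetical result values from running several rules on each sample field.
results_by_field = {
    "age": {"wald": 0.003, "chi_square": 0.04, "t_test": 0.2},
    "gender": {"wald": 0.3, "chi_square": 0.01, "t_test": 0.5},
}
preset_rules = choose_preset_rules(results_by_field)
```

If the result values were test statistics where larger means more significant, `max` would replace `min`.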
Optionally, the apparatus 400 further comprises:
the classification module is used for classifying the field data of the plurality of sample medical data;
the model establishing module is used for establishing a dictionary model for storage according to the data distribution characteristics of each type of field data after classification processing and the preset inspection rule corresponding to each type of field data;
the second determining sub-module 4022 is configured to:
for each field data in the medical data, taking the data distribution characteristics of the field data as an index, and searching in the dictionary model to determine target field data with the data distribution characteristics similar to the data distribution characteristics of the field data;
and determining a preset inspection rule corresponding to the target field data in the dictionary model.
Optionally, the determining module 402 is configured to:
classifying the medical data according to a preset data characteristic classification rule to obtain a plurality of data sets of which the medical data meet different data characteristic conditions;
and respectively inputting the data of the plurality of data sets into the rule determination model to obtain a target inspection rule aiming at the medical data.
Optionally, the determining module 402 is configured to:
classifying the medical data through a decision tree model or a binary classification model, wherein the decision tree model or the binary classification model is obtained by training classified sample medical data.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Based on the same inventive concept, an embodiment of the present disclosure further provides an electronic device, including:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of any of the above medical data screening methods.
In one possible approach, a block diagram of the electronic device may be as shown in fig. 5. Referring to fig. 5, the electronic device 500 may include: a processor 501 and a memory 502. The electronic device 500 may also include one or more of a multimedia component 503, an input/output (I/O) interface 504, and a communication component 505.
The processor 501 is configured to control the overall operation of the electronic device 500, so as to complete all or part of the steps of the medical data screening method. The memory 502 is used to store various types of data to support operation of the electronic device 500; such data may include, for example, instructions for any application or method operating on the electronic device 500 and application-related data such as contact data, sent and received messages, pictures, audio, video, and so forth. The memory 502 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 503 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 502 or transmitted through the communication component 505. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 504 provides an interface between the processor 501 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual buttons or physical buttons. The communication component 505 is used for wired or wireless communication between the electronic device 500 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or the like, or a combination of one or more of them, which is not limited herein.
Accordingly, the communication component 505 may include a Wi-Fi module, a Bluetooth module, an NFC module, and the like.
In an exemplary embodiment, the electronic device 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the medical data screening method described above.
In another exemplary embodiment, a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the medical data screening method described above is also provided. For example, the computer readable storage medium may be the memory 502 described above including program instructions executable by the processor 501 of the electronic device 500 to perform the medical data screening method described above.
In another possible approach, a block diagram of the electronic device may be as shown in fig. 6. Referring to fig. 6, the electronic device 600 may be provided as a server. The electronic device 600 includes one or more processors 622 and a memory 632 for storing computer programs executable by the processor 622. The computer program stored in the memory 632 may include one or more modules that each correspond to a set of instructions. Further, the processor 622 may be configured to execute the computer program to perform the medical data screening method described above.
Additionally, the electronic device 600 may also include a power component 626 that may be configured to perform power management of the electronic device 600 and a communication component 650 that may be configured to enable communication, e.g., wired or wireless communication, of the electronic device 600. The electronic device 600 may also include input/output (I/O) interfaces 658. The electronic device 600 may operate based on an operating system stored in the memory 632, such as Windows Server, Mac OS X™, Unix™, Linux™, and the like.
In another exemplary embodiment, a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the medical data screening method described above is also provided. For example, the computer readable storage medium may be the memory 632 described above that includes program instructions executable by the processor 622 of the electronic device 600 to perform the medical data screening methods described above.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned medical data screening method when executed by the programmable apparatus.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings; however, the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.
It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various combinations that are possible in the present disclosure are not described again.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (10)

1. A method of medical data screening, the method comprising:
acquiring medical data to be processed;
inputting the medical data into a rule determination model to obtain a target inspection rule aiming at the medical data;
according to the target inspection rule, inspecting and screening the medical data;
the rule determination model comprises a plurality of sample medical data, a plurality of field data included in each sample medical data correspond to preset test rules respectively, and the rule determination model is used for obtaining target test rules for the medical data in the following mode:
determining data distribution characteristics of each field of data in the medical data;
for each field data in the medical data, determining target field data with data distribution characteristics similar to those of the field data in the field data of the plurality of sample medical data, and acquiring a preset inspection rule corresponding to the target field data;
and determining the acquired preset inspection rule as a target inspection rule aiming at the medical data.
2. The method of claim 1, wherein the preset test rule corresponding to the sample medical data is determined by:
for each sample medical data in the plurality of sample medical data, respectively performing test screening on field data included in the sample medical data through a plurality of different test rules;
for each field data included in the sample medical data, determining, among a plurality of test result values obtained for the field data through the plurality of different test rules, the test rule corresponding to a target test result value as the preset test rule corresponding to the field data, wherein the target test result value is the test result value that represents the greatest data significance.
3. The method of claim 1, further comprising:
classifying the field data of the plurality of sample medical data;
establishing a dictionary model for storage according to the data distribution characteristics of each type of field data after classification and a preset inspection rule corresponding to each type of field data;
the determining, for each field data in the medical data, target field data with data distribution characteristics similar to those of the field data in the field data of the plurality of sample medical data, and acquiring a preset test rule corresponding to the target field data, includes:
for each field data in the medical data, taking the data distribution characteristics of the field data as an index, and searching in the dictionary model to determine target field data with the data distribution characteristics similar to the data distribution characteristics of the field data;
and determining a preset inspection rule corresponding to the target field data in the dictionary model.
4. The method of any of claims 1-3, wherein the inputting the medical data into a rule determination model to obtain a target inspection rule for the medical data comprises:
classifying the medical data according to a preset data characteristic classification rule to obtain a plurality of data sets of which the medical data meet different data characteristic conditions;
and respectively inputting the data of the plurality of data sets into the rule determination model to obtain a target inspection rule aiming at the medical data.
5. The method of claim 4, wherein the classifying the medical data according to the preset data feature classification rule comprises:
classifying the medical data through a decision tree model or a binary classification model, wherein the decision tree model or the binary classification model is obtained by training classified sample medical data.
6. A medical data screening apparatus, the apparatus comprising:
the acquisition module is used for acquiring medical data to be processed;
the determining module is used for inputting the medical data into a rule determining model to obtain a target inspection rule aiming at the medical data;
the screening module is used for carrying out inspection screening on the medical data according to the target inspection rule;
the rule determination model comprises a plurality of sample medical data, a plurality of field data included in each sample medical data correspond to preset test rules respectively, and the determination module comprises:
the first determining submodule is used for determining the data distribution characteristics of each field of data in the medical data;
the second determining submodule is used for determining, for each field data in the medical data, target field data with data distribution characteristics similar to those of the field data in the field data of the plurality of sample medical data, and acquiring a preset inspection rule corresponding to the target field data;
and the third determining submodule is used for determining the acquired preset inspection rule as a target inspection rule aiming at the medical data.
7. The apparatus of claim 6, wherein the preset testing rule corresponding to the sample medical data is determined by the following modules:
the sample testing module is used for performing test screening, for each sample medical data in the plurality of sample medical data, on the field data included in the sample medical data through a plurality of different testing rules;
the storage module is used for determining, for each field data included in the sample medical data and among the plurality of test result values obtained for the field data through the plurality of different testing rules, the testing rule corresponding to a target test result value as the preset testing rule corresponding to the field data, wherein the target test result value is the test result value that represents the greatest data significance.
8. The apparatus of claim 6, further comprising:
the classification module is used for classifying the field data of the plurality of sample medical data;
the model establishing module is used for establishing a dictionary model for storage according to the data distribution characteristics of each type of field data after classification processing and the preset inspection rule corresponding to each type of field data;
the second determination submodule is configured to:
for each field data in the medical data, taking the data distribution characteristics of the field data as an index, and searching in the dictionary model to determine target field data with the data distribution characteristics similar to the data distribution characteristics of the field data;
and determining a preset inspection rule corresponding to the target field data in the dictionary model.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
10. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 5.
CN201911311334.XA 2019-12-18 2019-12-18 Medical data screening method and device, storage medium and electronic equipment Pending CN111159181A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911311334.XA CN111159181A (en) 2019-12-18 2019-12-18 Medical data screening method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911311334.XA CN111159181A (en) 2019-12-18 2019-12-18 Medical data screening method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN111159181A true CN111159181A (en) 2020-05-15

Family

ID=70557795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911311334.XA Pending CN111159181A (en) 2019-12-18 2019-12-18 Medical data screening method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111159181A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101990208A (en) * 2009-07-31 2011-03-23 中国移动通信集团公司 Automatic data checking method, system and equipment
CN104182958A (en) * 2013-05-21 2014-12-03 索尼公司 Target detection method and device
US20150052601A1 (en) * 2012-03-30 2015-02-19 The University Of North Carolina At Chapel Hill Methods, systems, and computer readable media for rapid filtering of opaque data traffic
AU2016200847A1 (en) * 2005-11-29 2016-02-25 PhysIQ Inc. Residual-based monitoring of human health
CN106708909A (en) * 2015-11-18 2017-05-24 阿里巴巴集团控股有限公司 Data quality detection method and apparatus
CN109388675A (en) * 2018-10-12 2019-02-26 平安科技(深圳)有限公司 Data analysing method, device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
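潘晓平 (Pan Xiaoping): "队列人群冠心病、脑卒中死亡的统计分析方法探讨" (Discussion of statistical analysis methods for coronary heart disease and stroke deaths in a cohort population) *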

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200515