CN112652375B - Medicine recommendation method, device, electronic equipment and storage medium - Google Patents
Medicine recommendation method, device, electronic equipment and storage medium
- Publication number
- CN112652375B (CN202110022884.0A)
- Authority
- CN
- China
- Prior art keywords
- related information
- class
- recommendation
- patient
- quasi
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
- G16H20/10—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioethics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Computer Hardware Design (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Computer Security & Cryptography (AREA)
- Chemical & Material Sciences (AREA)
- Medicinal Chemistry (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention provides a drug recommendation method, a drug recommendation device, electronic equipment, and a storage medium. The method comprises: acquiring related information of a target object; performing privacy-preserving preprocessing on the related information; and generating drug recommendation information for the target object based on the preprocessed related information and a model based on a gradient boosting decision tree (GBDT) algorithm. The method can recommend drugs to patients accurately and reliably while effectively protecting their privacy: different types of data each receive appropriate privacy protection processing, the recommendation algorithm is robust, and patient privacy is protected more effectively without losing much accuracy.
Description
Technical Field
The present invention relates to the field of medical service recommendation, and in particular to a drug recommendation method, a device, electronic equipment, and a storage medium.
Background
To recommend drugs accurately, a medical recommendation system must collect patients' medical data, and more detailed information generally yields more accurate recommendations. However, the privacy risks that arise when the system collects and processes patient data are often underestimated or ignored, and leakage of patient privacy is a serious problem. An effective method for protecting user privacy in drug recommendation is therefore important.
To address privacy concerns, privacy-preserving algorithms have attracted growing attention, including techniques such as data anonymization (also known as de-identification), data scrambling, data encryption, and access control. With the growth of big data, differential privacy has attracted extensive attention from researchers and has been applied to the field of medical recommendation.
Many studies of privacy-preserving recommendation algorithms make recommendations from a user scoring matrix, and when they apply privacy processing to the data before recommendation they handle only a single data type. For example, a random perturbation technique is applied to numerical user data, and a user scoring matrix is then created. Drug recommendation data, however, contains both categorical and numerical data, which must be processed separately, and categorical data cannot be used to generate a user scoring matrix.
In view of this, it is necessary to process each data type with a suitable privacy protection technique, to select a more robust recommendation algorithm, and to take inference attacks on the recommendation system into account, so that patient privacy is protected more effectively without losing much accuracy.
Disclosure of Invention
In view of the problems in the prior art, embodiments of the invention provide a drug recommendation method, a device, electronic equipment, and a storage medium. They address the defects of the prior art, in which different types of data cannot each receive appropriate privacy protection processing, the recommendation algorithm has low robustness and poor accuracy, and patient privacy cannot be effectively protected, and they make it possible to recommend drugs to patients accurately and reliably while effectively protecting patient privacy.
Specifically, the embodiments of the invention provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a drug recommendation method, including:
acquiring related information of a target object;
performing privacy-preserving preprocessing on the related information; and
generating drug recommendation information for the target object based on the related information subjected to the privacy-preserving preprocessing and a model based on a gradient boosting decision tree algorithm.
Further, the privacy-preserving preprocessing of the related information includes:
determining the quasi-identifier attributes in the related information;
adding random perturbation to the numerical quasi-identifier attributes in the related information; and
performing K-anonymization on the categorical quasi-identifier attributes in the related information.
Further, the model based on the gradient boosting decision tree algorithm comprises a model based on the gradient boosting decision tree algorithm with a differential privacy protection function.
Further, before adding random perturbation to the numerical quasi-identifier attributes in the related information and performing K-anonymization on the categorical quasi-identifier attributes in the related information, the privacy-preserving preprocessing of the related information further includes:
generating a patient-drug matrix C based on the demographic information of the patients, the condition attributes of the patients, and the attributes of the drugs used,
where n represents the number of attributes of each patient and m represents the number of patients.
Further, adding random perturbation to the numerical quasi-identifier attributes in the related information includes:
determining a perturbation range [-γ, γ]; and
adding random numbers uniformly distributed in [-γ, γ] to the numerical quasi-identifier attributes in the matrix C.
Further, performing K-anonymization on the categorical quasi-identifier attributes in the related information includes:
Step 1), anonymizing the categorical quasi-identifier attributes in the related information using the KACA algorithm, and generating an initial equivalence class set $X = \{x_1, x_2, x_3, \ldots, x_n\}$ based on the matrix C, where X is the set of equivalence classes, $x_1, x_2, x_3, \ldots, x_n$ are the equivalence classes, and the tuples in each equivalence class have equal quasi-identifier values;
Step 2), selecting an equivalence class $x_i$ whose number of tuples is smaller than K, calculating the distances between $x_i$ and the other equivalence classes in X, and finding the equivalence class $x_j$ closest to $x_i$;
if the total number of tuples in $x_i$ and $x_j$ is less than K, merging $x_i$ and $x_j$ into one class and deleting $x_j$ from X;
if the total number of tuples in $x_i$ and $x_j$ is greater than or equal to K, selecting the $K - |x_i|$ tuples of $x_j$ nearest to $x_i$ to form an equivalence class $x_{j1}$, merging $x_i$ and $x_{j1}$ into one class, and deleting $x_{j1}$ from X;
Step 3), repeating step 2) until X contains no equivalence class with fewer than K tuples, so that each tuple shares its quasi-identifier values with at least K-1 other records in its equivalence class;
Step 4), generalizing each equivalence class in X.
Further, the method further comprises:
after anonymizing the categorical quasi-identifier attributes in the matrix C, performing one-hot encoding on the categorical quasi-identifier attributes of the K-anonymous matrix C to form a feature matrix E.
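The one-hot encoding step can be illustrated with a minimal sketch. It is not the patented implementation: the function name `one_hot_encode`, the column layout, and the sample values are assumptions chosen only for illustration, and only the Python standard library is used.

```python
def one_hot_encode(rows, categorical_cols):
    """Replace each categorical column with one 0/1 column per distinct value."""
    # Collect the distinct values of each categorical column, in sorted order
    categories = {
        col: sorted({row[col] for row in rows}) for col in categorical_cols
    }
    encoded = []
    for row in rows:
        features = []
        for col, value in enumerate(row):
            if col in categories:
                # One indicator per known category of this column
                features.extend(1 if value == cat else 0 for cat in categories[col])
            else:
                # Numerical columns are copied through unchanged
                features.append(value)
        encoded.append(features)
    return encoded, categories


# Hypothetical rows of a K-anonymous matrix C: [age, sex, generalized address]
C = [
    [34.2, 0, "District A"],
    [35.1, 1, "District A"],
    [61.7, 1, "District B"],
]
E, categories = one_hot_encode(C, categorical_cols=[2])
```

In this sketch the address column expands into two indicator columns, one per generalized address value, while the numerical columns pass through as they are.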
Further, the method further comprises:
training a model based on a gradient boosting decision tree algorithm with a differential privacy protection function, where the training includes:
inputting the feature matrix E into the model, with the quasi-identifier attributes and the patient condition attributes as the features of the model and the drug name as the prediction target of the model;
iteratively updating the strong learner using the residuals; and
applying differential privacy processing to the resulting strong learner based on the Laplace mechanism.
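The Laplace mechanism named in the last step can be sketched as follows. The text does not state where in the strong learner the noise is injected, so this sketch assumes, purely for illustration, that noise is added to the leaf output values of a fitted tree; the names `laplace_noise` and `privatize_leaf_values` and the `sensitivity` and `epsilon` values are likewise assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_leaf_values(leaf_values, sensitivity, epsilon, rng=None):
    """Add Laplace(sensitivity / epsilon) noise to every leaf value."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in leaf_values]


# Hypothetical leaf outputs of one tree in the boosted ensemble
leaves = [0.8, -0.3, 0.1]
noisy = privatize_leaf_values(
    leaves, sensitivity=1.0, epsilon=0.5, rng=random.Random(0)
)
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of accuracy, which is the trade-off the method aims to balance.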
Further, the related information includes demographic information, treatment information, and medication information, the demographic information including the age, sex, zip code, and address of the target object.
The age, sex, and zip code of the target object are numerical quasi-identifier attributes, and the address of the target object is a categorical quasi-identifier attribute.
In a second aspect, the present invention provides a drug recommendation device comprising:
an acquisition unit for acquiring related information of a target object;
a privacy-preserving preprocessing unit for performing privacy-preserving preprocessing on the related information; and
a drug recommendation unit for generating drug recommendation information for the target object based on the related information subjected to the privacy-preserving preprocessing and a model based on a gradient boosting decision tree algorithm.
In a third aspect, the present invention provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the drug recommendation method described above when executing the program.
In a fourth aspect, the present invention provides a non-transitory computer-readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the drug recommendation method described above.
According to the drug recommendation method, device, electronic equipment, and storage medium provided by the invention, privacy-preserving preprocessing is performed on the related information, and drug recommendation information for the target object is generated based on the preprocessed related information and a model based on a gradient boosting decision tree algorithm. The method can recommend drugs to patients accurately and reliably while effectively protecting their privacy: different types of data each receive appropriate privacy protection processing, the recommendation algorithm is robust, and patient privacy is protected more effectively without losing much accuracy.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a drug recommendation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a differential privacy recommendation method based on a K-anonymity model and random perturbation according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a method for training a GBDT model with differential privacy according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a drug recommendation device according to an embodiment of the present invention; and
FIG. 5 is a schematic structural diagram of an electronic device for drug recommendation according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
The privacy risks that arise when the system collects and processes patient data are often underestimated or ignored, and leakage of patient privacy is a serious problem. An effective method for protecting user privacy in drug recommendation is therefore important.
Many studies of privacy-preserving recommendation algorithms make recommendations from a user scoring matrix, and when they apply privacy processing to the data before recommendation they handle only a single data type. For example, a random perturbation technique is applied to numerical user data, and a user scoring matrix is then created. Drug recommendation data, however, contains both categorical and numerical data, which must be processed separately, and categorical data cannot be used to generate a user scoring matrix.
The improved drug recommendation method can therefore resist inference attacks on the recommendation system while losing essentially no accuracy in the recommendation results, and it has the beneficial effects of protecting patient privacy more effectively and of high algorithm robustness. To this end, the present invention provides a drug recommendation method, a device, electronic equipment, and a storage medium, which are explained in detail below through specific embodiments.
FIG. 1 shows a schematic diagram of a drug recommendation method. As shown in FIG. 1, the drug recommendation method provided by the embodiment of the invention comprises the following steps:
Step 110: acquiring related information of a target object;
Step 120: performing privacy-preserving preprocessing on the related information;
Step 130: generating drug recommendation information for the target object based on the related information subjected to the privacy-preserving preprocessing and a model based on a gradient boosting decision tree algorithm.
In step 110, in one example, the target object is a patient, preferably a patient who has had a disease, has been treated at hospitals or clinics, and has treatment information (electronic medical records) stored in the systems of those hospitals or clinics. The target object may also be a person with no disease history, in which case the drug recommendation method of the present invention can recommend an appropriate drug directly from the target object's personal information, such as age. In that case, the patient's drug and condition attributes may be left blank, set to 0, or set to other suitable values or codes.
In one example, the related information includes demographic information, treatment information, and medication information, the demographic information including the age, gender, zip code, address, and so on of the target object. The age, gender, and zip code of the target object are numerical quasi-identifier attributes, and the address of the target object is a categorical quasi-identifier attribute. The related information may also include other types of data besides age, gender, zip code, and address.
The drug recommendation method provided by the invention applies different privacy processing methods to the numerical quasi-identifier attributes and the categorical quasi-identifier attributes, so that the privacy of the individual is comprehensively protected.
In step 120, privacy-preserving preprocessing is performed on the related information. Specifically, in one example, the privacy-preserving preprocessing includes adding random perturbation to the numerical data in the related information and/or performing K-anonymization on the categorical data in the related information.
For the numerical and categorical data in the related information, a random perturbation technique and a K-anonymization technique are applied respectively, so that the system is prevented from acquiring the patient's real sensitive data. Random perturbation is simple to perform and efficient, and is suitable for numerical data. K-anonymization divides the data into equivalence classes and then generalizes them, and is suitable for data of different types.
In step 130, drug recommendation information for the target object is generated based on the related information subjected to the privacy-preserving preprocessing and a model based on a gradient boosting decision tree algorithm.
After privacy processing by the random perturbation technique and the K-anonymization technique, the patient's personal information has a degree of protection against theft. Finally, to resist inference attacks on the recommendation system, a gradient boosting decision tree (GBDT) model that satisfies differential privacy is used to predict the drug list. The collaborative filtering algorithm based on the GBDT model can process numerical data and categorical data at the same time, so that patient privacy is effectively protected.
FIG. 2 shows a schematic diagram of a differential privacy recommendation method based on a K-anonymity model and random perturbation according to an embodiment of the present invention.
As shown in FIG. 2, demographic data and treatment information are obtained from the patients' electronic medical records. The electronic medical records and the drug data are the input of the model, which generates a patient-drug matrix C. Random perturbation is added to the numerical data in the related information and/or K-anonymization is applied to the categorical data in the related information. The categorical data in the K-anonymous matrix C are then one-hot encoded to form the feature matrix E. A GBDT model with differential privacy protection is trained, and the trained model is used to predict drugs and generate a drug recommendation list.
In this embodiment, privacy protection techniques are combined with the gradient boosting decision tree algorithm to predict drugs and generate the recommendation list, so that drugs are recommended to patients more effectively while patient privacy is protected.
Based on the foregoing embodiment, in the drug recommendation method according to another embodiment of the present invention, the privacy-preserving preprocessing of the related information includes:
determining the quasi-identifier attributes in the related information;
adding random perturbation to the numerical quasi-identifier attributes in the related information; and
performing K-anonymization on the categorical quasi-identifier attributes in the related information.
A quasi-identifier (QI) is an attribute that, combined with certain external information, can identify a user record with high probability: no single column locates an individual, but several columns together can potentially identify one. In one example, the system can potentially identify an individual through multiple columns of information, and the attributes that thereby reveal the individual's sensitive information are quasi-identifier attributes; that is, a quasi-identifier attribute is one through which an attack on the system can easily expose an individual's sensitive information. In one example, multiple quasi-identifier attributes are correlated with one another. In one example, the quasi-identifier attributes are the individual's age, gender, zip code, and address.
K denotes the minimum number of records that are indistinguishable on the quasi-identifiers: the data to be released must contain at least K records that cannot be told apart by their quasi-identifier values, so that an attacker cannot determine which specific individual a piece of private information belongs to, and personal privacy is protected. Through the parameter K, K-anonymity specifies the maximum risk of information leakage that the user can accept. K-anonymization protects individual privacy to some extent but also reduces the usability of the data, so research on K-anonymization has focused mainly on improving data usability while protecting private information.
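The K-anonymity condition described above can be checked with a short sketch: a table is K-anonymous if every combination of quasi-identifier values appears in at least K records. The function name `is_k_anonymous`, the field names, and the sample records are hypothetical and serve only as an illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical released records with generalized quasi-identifiers
records = [
    {"age": "30-39", "address": "District A", "drug": "drug_1"},
    {"age": "30-39", "address": "District A", "drug": "drug_2"},
    {"age": "60-69", "address": "District B", "drug": "drug_3"},
]

# The single record in ("60-69", "District B") violates 2-anonymity
violates = not is_k_anonymous(records, ["age", "address"], k=2)
```

Here the third record is the only one in its group, so an attacker who knows that person's age range and district could link the drug to that individual; this is the situation K-anonymization is meant to rule out.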
For the numerical data and the categorical data, a random perturbation technique and a K-anonymization technique are applied respectively, so that the system is prevented from acquiring the patient's real sensitive data. Random perturbation is simple to perform and efficient, and is suitable for numerical data. K-anonymization divides the data into equivalence classes and then generalizes them, and is suitable for data of different types.
In the above embodiment, applying different processing to the different kinds of data in the patient's information improves the protection of patient privacy.
Based on the above embodiment, in the drug recommendation method provided by another embodiment of the present invention, the model based on the gradient boosting decision tree algorithm has a differential privacy protection function.
In this embodiment, the K-anonymity and random perturbation techniques are combined with the differential privacy technique, which effectively prevents the system from acquiring the patient's real data and addresses inference attacks on the recommendation system.
Based on the above embodiment, in the drug recommendation method provided in another embodiment of the present invention, before adding random perturbation to the numerical quasi-identifier attributes in the related information and performing K-anonymization on the categorical data in the related information, the privacy-preserving preprocessing of the related information further includes:
generating a patient-drug matrix $C = (A_{ij})$ of size $m \times n$ based on the demographic information of the patients, the condition attributes of the patients, and the attributes of the drugs used,
where n represents the number of attributes of each patient and m represents the number of patients. In one example, $A_{11}$ represents the age of the first patient, $A_{12}$ the sex of the first patient, $A_{13}$ the zip code of the first patient, $A_{14}$ the address of the first patient, $A_{15}$ the condition attribute of the first patient, and $A_{16}$ the drug attribute of the first patient.
Before the matrix C is generated, the patients' demographic data, treatment information, and drug data are cleaned: null data are removed and the gender attribute is converted to a number (in one example, male is 0 and female is 1). The patients' demographic information (age, sex, zip code, address) is retained, along with the patient condition attributes and the attributes of the drugs used.
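The cleaning and matrix construction described above might look like the following sketch. The record field names (`age`, `sex`, `zip_code`, `address`, `condition`, `drug`), the function name `build_patient_drug_matrix`, and the sample data are assumptions made for illustration; the sketch also assumes that every non-null sex value is either "male" or "female".

```python
def build_patient_drug_matrix(raw_records):
    """Clean raw records and return matrix C with one row per patient."""
    # Column order follows the example: age, sex, zip code, address,
    # condition attribute, drug attribute
    columns = ("age", "sex", "zip_code", "address", "condition", "drug")
    sex_codes = {"male": 0, "female": 1}
    matrix = []
    for record in raw_records:
        # Remove records that have null data in any retained attribute
        if any(record.get(col) in (None, "") for col in columns):
            continue
        row = [record[col] for col in columns]
        row[1] = sex_codes[record["sex"]]  # digitize the sex attribute
        matrix.append(row)
    return matrix


# Hypothetical raw records extracted from electronic medical records
raw = [
    {"age": 34, "sex": "male", "zip_code": 100080, "address": "District A",
     "condition": "condition_1", "drug": "drug_1"},
    {"age": 61, "sex": "female", "zip_code": 100190, "address": None,
     "condition": "condition_2", "drug": "drug_2"},
    {"age": 47, "sex": "female", "zip_code": 100084, "address": "District B",
     "condition": "condition_1", "drug": "drug_3"},
]
C = build_patient_drug_matrix(raw)
```

The second record is dropped because its address is null, so the resulting matrix has m = 2 rows and n = 6 columns.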
In the above embodiment, the patient-drug matrix C is formed by compiling the patients' data, which provides the data basis for the subsequent random perturbation and K-anonymization.
Based on the above embodiment, in the drug recommendation method according to another embodiment of the present invention, adding random perturbation to the numerical quasi-identifier attributes in the related information includes:
determining a perturbation range [-γ, γ], where γ is set to 1; and
adding random numbers uniformly distributed in [-γ, γ] to the numerical quasi-identifier attributes in the matrix C.
The size of the perturbation range [-γ, γ] can be chosen according to the actual situation, and the number of random numbers added can be adjusted. In one example, values may instead be added to the numerical data in the matrix C according to a predetermined rule, as long as the numerical data in the matrix C are perturbed.
In the above embodiment, patient privacy is protected by adding random perturbation to the numerical data.
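The perturbation step can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `perturb_numeric`, the column indices, and the sample rows are hypothetical, and uniform noise is drawn with the standard library's `random.uniform`.

```python
import random


def perturb_numeric(matrix, numeric_cols, gamma=1.0, rng=None):
    """Return a copy of the matrix with uniform noise from [-gamma, gamma]
    added to each numerical quasi-identifier column."""
    rng = rng or random.Random()
    perturbed = []
    for row in matrix:
        new_row = list(row)  # leave the input matrix unmodified
        for col in numeric_cols:
            new_row[col] = row[col] + rng.uniform(-gamma, gamma)
        perturbed.append(new_row)
    return perturbed


# Hypothetical rows of matrix C: [age, sex, zip code, address]
C = [
    [34, 0, 100080, "District A"],
    [47, 1, 100084, "District B"],
]
C_perturbed = perturb_numeric(C, numeric_cols=[0, 1, 2], gamma=1.0,
                              rng=random.Random(42))
```

Each perturbed value stays within γ of the original, so a larger γ hides the true values better but moves the training data further from the real records.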
Based on the above embodiment, in the drug recommendation method provided in another embodiment of the present invention, performing K-anonymization on the categorical quasi-identifier attributes in the related information includes:
Step 1), anonymizing the categorical quasi-identifier attributes in the related information using the KACA algorithm, and generating an initial equivalence class set $X = \{x_1, x_2, x_3, \ldots, x_n\}$ based on the matrix C, where X is the set of equivalence classes, $x_1, x_2, x_3, \ldots, x_n$ are the equivalence classes, and the tuples in each equivalence class have equal quasi-identifier values;
Step 2), when X contains equivalence classes with fewer than K tuples, selecting such an equivalence class $x_i$, calculating the distances between $x_i$ and the other equivalence classes in X, and finding the equivalence class $x_j$ closest to $x_i$;
if the total number of tuples in $x_i$ and $x_j$ is less than K, merging $x_i$ and $x_j$ into one class and deleting $x_j$ from X, i.e., $X = X - x_j$;
if the total number of tuples in $x_i$ and $x_j$ is greater than or equal to K, selecting the $K - |x_i|$ tuples of $x_j$ nearest to $x_i$ to form an equivalence class $x_{j1}$, merging $x_i$ and $x_{j1}$ into one class, and deleting $x_{j1}$ from X, i.e., $x_i = x_i \cup x_{j1}$, $X = X - x_{j1}$;
Step 3), repeating step 2) until X contains no equivalence class with fewer than K tuples, so that each tuple shares its quasi-identifier values with at least K-1 other records in its equivalence class; that is, the loop continues until the equivalence classes in X satisfy K-anonymity or no new equivalence class can be constructed, at which point each remaining record is inserted into the equivalence class nearest to it;
Step 4), generalizing each equivalence class in X and outputting the anonymized table T'.
Specifically, in another example, an equivalence class t of size s is randomly selected, where s is greater than 1 and less than K, and the distance Dist(t_1, t_2) between t and each of the other equivalence classes is calculated. The distance between equivalence classes is represented by the distance between tuples, and the distance between two tuples is the sum of the distances from each tuple to their closest common generalization t_12:
$$\mathrm{Dist}(t_1, t_2) = \mathrm{Distortion}(t_1, t_{12}) + \mathrm{Distortion}(t_2, t_{12})$$
Here, Distortion(D, D') denotes the degree of distortion incurred when the data D are generalized into D', where D' is the generalization table of D, t_i is the i-th tuple in D, t_i' is the generalized tuple of t_i, and t_i' belongs to D'. The distortion is obtained by computing the weighted hierarchical distance (WHD) between each tuple and its generalized tuple in the final generalization table and summing the results.
WHD(p, q) denotes the distance from X_p to X_q (p > q). Let h be the highest generalization level of the categorical quasi-identifier attribute S, X_h the corresponding generalization domain, and w_{j,j-1} the generalization weight between X_j and X_{j-1} (2 ≤ j ≤ h); the parameter β in the weight definition is simply taken as 1.
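The WHD and weight formulas themselves are not reproduced in this text. The sketch below is therefore an assumption: it uses the height-weighted form commonly given in the KACA literature, w_{j,j-1} = 1/(j-1)^β, and normalizes the weights crossed between levels q and p by the total weight of the hierarchy. The function names are illustrative.

```python
def level_weight(j, beta=1.0):
    """Assumed weight w_{j,j-1} between levels j and j-1: 1 / (j - 1) ** beta."""
    return 1.0 / (j - 1) ** beta


def whd(p, q, h, beta=1.0):
    """Weighted hierarchical distance of generalizing from level p to level q (p >= q).

    Levels run from 1 (most general) to h (most specific), with h >= 2.
    """
    if h < 2 or not (1 <= q <= p <= h):
        raise ValueError("require h >= 2 and 1 <= q <= p <= h")
    crossed = sum(level_weight(j, beta) for j in range(q + 1, p + 1))
    total = sum(level_weight(j, beta) for j in range(2, h + 1))
    return crossed / total
```

Under this assumed form, generalizations near the root of the hierarchy cost more than generalizations near the leaves, and generalizing from the most specific level all the way to the root gives a distance of 1.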
The equivalence class t1 with the smallest Dist value from t is found, and t is merged into t1 so that t and t1 form one class, which is then generalized.
The merging and generalizing steps are repeated until no equivalence class with fewer than K tuples remains in the equivalence class set X, so that each tuple has the same quasi-identifier values as at least K-1 other records in its equivalence class, and all equivalence classes in X satisfy K-anonymity.
Returning the anonymized matrix C.
The basic idea of the KACA algorithm is as follows. The original data set is partitioned according to the similarity of the quasi-identifiers, and tuples with identical quasi-identifiers are grouped into one class, yielding a number of initial equivalence classes. Next, an arbitrary equivalence class containing fewer than k tuples is selected, the equivalence class closest to it is found, and the two are merged; this operation is repeated until no further merging is possible, so that each resulting equivalence class contains between k and 2k-1 tuples. Finally, the tuples in each equivalence class are generalized to the same quasi-identifier values, and the anonymized table is generated. The algorithm makes the anonymized data within one equivalence class as similar as possible on the quasi-identifiers while keeping data in different equivalence classes clearly distinct, which preserves the usability of the data.
In the above embodiment, the KACA algorithm is used to partition the equivalence classes. KACA is a typical local recoding algorithm that partitions equivalence classes using a clustering approach, and the equivalence classes it produces have higher data availability and smaller information loss. Generating the equivalence classes with KACA therefore improves data precision. Through the K-anonymity mechanism, each record in the table has the same quasi-identifier values as at least K-1 other records in the table, so an attacker cannot tell whether a particular person is in the published data, cannot confirm whether a given person has a sensitive attribute, and cannot determine which person a given record corresponds to, thereby protecting the patient's privacy.
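The merge loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the KACA algorithm itself: the Distortion-based distance is replaced by a plain Hamming distance between quasi-identifier tuples, generalization is reduced to replacing disagreeing positions with "*", and tuples are moved out of the nearest class only when that class would still keep at least k tuples (otherwise the two classes are merged whole, which guarantees termination). All function names are hypothetical.

```python
from collections import defaultdict


def generalize(tuples):
    """Keep a value where all tuples agree; otherwise suppress the position with '*'."""
    return tuple(v[0] if len(set(v)) == 1 else "*" for v in zip(*tuples))


def distance(a, b):
    """Hamming distance between two (possibly generalized) quasi-identifier tuples."""
    return sum(x != y for x, y in zip(a, b))


def kaca_sketch(tuples, k):
    """Return (generalized quasi-identifier, class size) for each final equivalence class."""
    if len(tuples) < k:
        raise ValueError("need at least k tuples")
    # Step 1: tuples with identical quasi-identifiers form the initial classes.
    groups = defaultdict(list)
    for t in tuples:
        groups[t].append(t)
    classes = list(groups.values())
    # Steps 2-3: repeatedly enlarge a class that has fewer than k tuples.
    while True:
        xi = next((c for c in classes if len(c) < k), None)
        if xi is None:
            break
        rep = generalize(xi)
        others = [c for c in classes if c is not xi]
        xj = min(others, key=lambda c: distance(rep, generalize(c)))
        need = k - len(xi)
        if len(xj) - need >= k:
            # Move only the tuples of xj nearest to xi; xj stays k-anonymous.
            xj.sort(key=lambda t: distance(rep, t))
            xi.extend(xj[:need])
            del xj[:need]
        else:
            # Merge the two classes completely.
            xi.extend(xj)
            classes = [c for c in classes if c is not xj]
    # Step 4: generalize each class to a common quasi-identifier value.
    return [(generalize(c), len(c)) for c in classes]
```

Every class returned by this sketch contains at least k tuples, and a class formed by a whole merge contains at most 2k-1, matching the size range given in the description.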
Based on the above embodiment, in the drug recommendation method provided by another embodiment of the present invention, after the categorical quasi-identifier attributes in the matrix C have been anonymized, one-hot encoding is performed on the categorical quasi-identifier attributes in the K-anonymous matrix C to form the feature matrix E.
In the above embodiment, the categorical data in the matrix C satisfying K-anonymity are one-hot encoded to form the feature matrix E. The one-hot encoder "binarizes" each category so that it can be used as a feature for model training, which facilitates training and provides the sample data for the subsequent model training.
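As a small illustration of one-hot encoding, the sketch below encodes a single categorical column using only the standard library. The function name `one_hot` and the choice to order categories alphabetically are assumptions for the example.

```python
def one_hot(values):
    """One-hot encode a list of categorical values.

    Returns the sorted category list and one binary row per input value.
    """
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one position is set for each value
        rows.append(row)
    return categories, rows
```

For instance, a generalized address column with the values "Beijing", "Shanghai", "Beijing" yields two binary columns, one per distinct address, which can then be placed alongside the perturbed numerical columns in the feature matrix E.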
Based on the above embodiment, in the drug recommendation method according to another embodiment of the present invention, the method further includes:
Training a model based on a gradient boosting decision tree (GBDT) algorithm with a differential privacy protection function, wherein the training comprises the following steps:
Inputting the feature matrix E into the model, wherein the quasi-identifier attributes and the patient condition attributes are taken as the features of the model, and the medicine name is taken as the prediction target of the model;
Iteratively updating the strong learner using the residuals;
and performing differential privacy processing on the resulting strong learner based on the Laplace mechanism.
FIG. 3 illustrates a schematic diagram of a method of training GBDT models with differential privacy, as provided by one embodiment.
As shown in fig. 3, the encoded feature matrix E is fed into the model, the parameters and the learner are initialized, and the strong learner is iteratively updated using the residuals, with each iteration satisfying differential privacy; the final strong learner is then generated, completing the model training.
Specifically, in one example, training a model based on a gradient boosting decision tree algorithm with a differential privacy protection function includes the following steps:
Step 1: dividing the data into a training set and a test set in a ratio of 7:3, inputting the training set into the model, and using the test set to check the prediction accuracy of the trained model.
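The 7:3 split in step 1 can be sketched as below. Shuffling before splitting and the fixed default seed are assumptions for the example; the text only specifies the ratio.

```python
import random


def split_train_test(rows, train_ratio=0.7, seed=0):
    """Shuffle the rows and split them into training and test sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)      # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]
```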
Step 2: initializing the parameters. The number of iterations is set to T, i.e., T GBDT classification trees are generated; the tree depth is d, the learning rate is r, the privacy budget is ε, and the privacy budget of each iteration is ε/T.
The learner is then initialized, where N is the number of samples and f_0(x) is the initial weak learner.
Step 3: iteratively updating the strong learner using the residuals, where each iteration satisfies differential privacy. The residual r_im is obtained from the negative gradient,
where m = 1, 2, …, T is the iteration index. The residuals obtained in one iteration are used as the training data of the next learner, and the learner is updated to f_m(x),
where J is the number of leaf nodes.
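The formulas for steps 2 and 3 are not reproduced in this text. As a hedged reconstruction, assuming the standard gradient boosting formulation with a generic loss L, leaf regions R_{mj}, and leaf values c_{mj} (symbols that do not appear in the original), they would usually take the following form, where r is the learning rate and is distinct from the residual r_{im}:

```latex
\begin{aligned}
f_0(x) &= \arg\min_{c} \sum_{i=1}^{N} L(y_i, c) \\
r_{im} &= -\left[ \frac{\partial L\bigl(y_i, f(x_i)\bigr)}{\partial f(x_i)} \right]_{f = f_{m-1}},
          \qquad i = 1, \dots, N,\; m = 1, \dots, T \\
f_m(x) &= f_{m-1}(x) + r \sum_{j=1}^{J} c_{mj}\, \mathbb{1}\bigl(x \in R_{mj}\bigr)
\end{aligned}
```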
Differential privacy processing is applied to the resulting learner, specifically by means of the Laplace mechanism,
in which Laplace noise is added, Δf is the sensitivity, and ε is the privacy budget.
Here Δf denotes the global sensitivity, i.e., the maximum Manhattan (L1) distance between f(D_1) and f(D_2) over neighboring data sets; |D_1 Δ D_2| denotes the size of the symmetric difference, and |D_1 Δ D_2| = 1 means that D_1 and D_2 are neighboring data sets; p(x) is the probability density function of the Laplace distribution, which has mean 0 and variance 2b^2.
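A minimal sketch of the Laplace mechanism is given below. It assumes the usual scale b = Δf/ε, which is consistent with the stated variance 2b^2 but is not spelled out in this text, and it samples Laplace noise as the difference of two exponential variables. The function names are illustrative.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Return value + Laplace noise with scale b = sensitivity / epsilon (assumed form)."""
    rng = rng or random.Random()
    return value + laplace_noise(sensitivity / epsilon, rng)


def per_iteration_budget(epsilon, num_iterations):
    """Split the total privacy budget evenly across iterations (epsilon / T in the text)."""
    return epsilon / num_iterations
```

With a total budget ε spread over T trees, each tree would call `laplace_mechanism` with `per_iteration_budget(epsilon, T)`, so a larger T means more noise per tree for the same overall budget.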
Step 4: forming the final strong learner. The results of the T trees are linearly summed to obtain the final strong learner f(x).
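The residual update and the final linear sum can be illustrated with a toy booster. This sketch makes several simplifying assumptions that are not in the original: it uses squared loss on a single numeric feature (the patent's model is a classifier over medicine names), each tree is a one-split stump, and the optional `noise` callable stands in for per-iteration Laplace noise on the leaf values. It is not a differentially private implementation.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump; returns (threshold, left_value, right_value)."""
    best = None
    for thr in sorted(set(xs))[:-1]:  # needs at least two distinct feature values
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    return best[1:]


def train_boosting(xs, ys, T=20, lr=0.5, noise=None):
    """Boost T stumps on the residuals; returns (f0, lr, trees)."""
    f0 = sum(ys) / len(ys)  # initial weak learner: a constant
    preds = [f0] * len(xs)
    trees = []
    for _ in range(T):
        # For squared loss, the negative gradient is the plain residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lv, rv = fit_stump(xs, residuals)
        if noise is not None:
            lv, rv = lv + noise(), rv + noise()  # hypothetical per-leaf perturbation
        trees.append((thr, lv, rv))
        preds = [p + lr * (lv if x <= thr else rv) for x, p in zip(xs, preds)]
    return f0, lr, trees


def predict(model, x):
    """Final strong learner: initial value plus the scaled sum of all tree outputs."""
    f0, lr, trees = model
    return f0 + sum(lr * (lv if x <= thr else rv) for thr, lv, rv in trees)
```

On a toy data set where the target is 0 for small feature values and 1 for large ones, the summed stumps converge to those targets as T grows, which shows how each tree corrects the residual left by the previous ones.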
In the above embodiment, Laplace noise is added to the learner through the Laplace mechanism, so that the learner provides differential privacy protection.
Based on the above embodiment, in the medication recommendation method provided in another embodiment of the present invention, the related information includes demographic information, treatment information, and medication information, and the demographic information includes the age, sex, zip code, and address of the target object.
The age, sex, and zip code of the target object are numerical quasi-identifier attributes, and the address of the target object is a categorical quasi-identifier attribute.
In the above-described embodiments, the mapping between the input and output of the model is established from the demographic information, treatment information, and medication information: the quasi-identifier attributes and the patient condition attributes are the features of the model, and the medication names are its prediction targets, which enables the model to make accurate medication recommendations.
In summary, the present invention identifies the patient's quasi-identifier attributes, comprising numerical data (age, sex, zip code) and categorical data (address), from the patient's demographic data, condition data, and medication data, and applies random perturbation to the numerical data and K-anonymity to the categorical data. The privacy-processed demographic data and condition data of the patient are used as the features of the training samples and the drug name as the training target; these are fed into a GBDT model that satisfies differential privacy, and a drug recommendation list is finally obtained by prediction, so that drugs are better recommended to the patient while the patient's privacy is protected.
According to the medicine recommendation method, device, electronic equipment and storage medium, privacy protection preprocessing is performed on the related information, and medicine recommendation information for the target object is generated based on the privacy-preprocessed related information and a model based on a gradient boosting decision tree algorithm. The method can recommend medicines to patients accurately and reliably while effectively protecting their privacy; different types of data receive appropriate privacy protection processing; the recommendation algorithm is robust; and patient privacy is protected more effectively without losing much accuracy.
The medicine recommendation device provided by the invention is described below; the medicine recommendation device described below and the medicine recommendation method described above may be referred to in correspondence with each other.
Fig. 4 illustrates a schematic structural diagram of a medicine recommendation device. As shown in fig. 4, the device may include:
an acquiring unit 410, configured to acquire related information of a target object;
a privacy protection preprocessing unit 420, configured to perform privacy protection preprocessing on the related information;
and a medicine recommendation unit 430, configured to generate medicine recommendation information for the target object based on the related information subjected to privacy protection preprocessing and a model based on a gradient boosting decision tree algorithm.
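The three units can be pictured as a simple pipeline. The sketch below is only an illustration of that structure: the class name, the use of plain callables for the units, and the `run` method are assumptions, and the text does not specify any programming interface.

```python
class MedicineRecommendationDevice:
    """Chains the acquiring, privacy-preprocessing, and recommendation units."""

    def __init__(self, acquire, preprocess, recommend):
        self.acquire = acquire        # stands in for acquiring unit 410
        self.preprocess = preprocess  # stands in for privacy protection preprocessing unit 420
        self.recommend = recommend    # stands in for medicine recommendation unit 430

    def run(self, target_id):
        info = self.acquire(target_id)
        protected = self.preprocess(info)
        return self.recommend(protected)
```

Keeping the units as separate callables mirrors the description, in which the recommendation unit only ever sees data that has already passed through the privacy protection preprocessing unit.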
Fig. 5 illustrates a schematic structural diagram of an electronic device. As shown in fig. 5, the electronic device may include a processor 510, a communication interface 520, a memory 530, and a communication bus 550, wherein the processor 510, the communication interface 520, and the memory 530 communicate with each other through the communication bus 550. The processor 510 may invoke logic instructions in the memory 530 to perform a medication recommendation method comprising:
Acquiring related information of a target object;
performing privacy protection preprocessing on the related information;
Generating drug recommendation information for the target object based on the related information subjected to privacy protection preprocessing and a model based on a gradient boosting decision tree algorithm.
It will be appreciated that the refinement and expansion functions that the computer program may perform are as described with reference to the above embodiments.
Based on the same inventive concept, a further embodiment of the present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements all the steps of the above-mentioned medication recommendation method.
It will be appreciated that the refinement and expansion functions that the computer program may perform are as described with reference to the above embodiments.
Based on the same inventive concept, a further embodiment of the present invention provides a computer program product comprising a computer program which, when executed by a processor, implements all the steps of the above-mentioned medication recommendation method.
It will be appreciated that the refinement and expansion functions that the computer program may perform are as described with reference to the above embodiments.
Further, the logic instructions in the memory described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiments of the invention. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or of course by means of hardware. Based on this understanding, the above technical solution, in essence, or the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform the medication recommendation method described in the respective embodiments or in some parts of the embodiments.
Moreover, in the present invention, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Furthermore, in the present disclosure, descriptions of the terms "one embodiment," "some embodiments," "examples," "particular examples," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification, and the features of the different embodiments or examples, may be combined by those skilled in the art provided they do not contradict one another.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (6)
1. A method of medication recommendation, comprising:
Acquiring related information of a target object;
performing privacy protection preprocessing on the related information;
generating drug recommendation information for the target object based on the related information that has undergone privacy protection preprocessing and a model based on a gradient boosting decision tree algorithm, wherein,
The privacy protection preprocessing of the related information comprises the following steps: determining the quasi-identifier attributes in the related information; adding random perturbation to the numerical quasi-identifier attributes in the related information; performing K-anonymization on the categorical quasi-identifier attributes in the related information,
The model based on the gradient boosting decision tree algorithm comprises a model based on the gradient boosting decision tree algorithm with a differential privacy protection function;
Before adding random perturbation to the numerical quasi-identifier attributes in the related information and performing K-anonymization on the categorical quasi-identifier attributes in the related information, the privacy protection preprocessing of the related information further comprises: generating a patient-medication matrix C based on the demographic information of the patient, the condition attributes of the patient, and the attributes of the medications used,
Wherein n represents the number of attributes of a patient, m represents the number of patients, and each element A is one of the patient's demographic information, the patient's condition attributes, and the attributes of the medications used;
The adding of random perturbation to the numerical quasi-identifier attributes in the related information comprises: determining a perturbation range [-γ, γ]; adding random numbers uniformly distributed in [-γ, γ] to the numerical quasi-identifier attributes in the matrix C, wherein γ takes the value 1;
The K-anonymization of the categorical quasi-identifier attributes in the related information comprises the following steps:
Step 1), anonymizing the categorical quasi-identifier attributes in the related information using the KACA algorithm, and generating an initial equivalence class set X = {x_1, x_2, x_3, …, x_n} based on the matrix C, where X is the equivalence class set, x_1, x_2, x_3, …, x_n are equivalence classes, and the tuples in each equivalence class have equal quasi-identifier values;
Step 2), selecting an equivalence class x_i with fewer than K tuples, calculating the distances between x_i and the other equivalence classes in the equivalence class set X, and finding the equivalence class x_j closest to x_i; if the combined number of tuples in x_i and x_j is less than K, merging x_i and x_j into one class and deleting x_j from the equivalence class set X; if the combined number of tuples in x_i and x_j is greater than or equal to K, selecting the K - |x_i| tuples in x_j closest to x_i to form an equivalence class x_j1, merging x_i and x_j1 into one class, and deleting x_j1 from the equivalence class set X;
Step 3), repeating step 2) until no equivalence class with fewer than K tuples remains in the equivalence class set X, so that each tuple has the same quasi-identifier values as at least K-1 other records in its equivalence class;
Step 4), generalizing each equivalence class in the equivalence class set X,
wherein the related information comprises demographic information, treatment information and medicine information, the demographic information comprises the age, sex, zip code and address of the target object, the age, sex and zip code of the target object are numerical quasi-identifier attributes, and the address of the target object is a categorical quasi-identifier attribute.
2. The method for recommending medicine according to claim 1, characterized in that the method further comprises:
and after anonymizing the classified quasi-identifier attribute in the matrix C, performing One-hot encoding on the classified quasi-identifier attribute in the matrix C meeting k anonymity to form a feature matrix E.
3. The method for recommending medicine according to claim 2, characterized in that the method further comprises:
Training a model based on a gradient boosting decision tree algorithm with a differential privacy protection function, wherein the training comprises the following steps:
Inputting the feature matrix E into the model, wherein the quasi-identifier attributes and the patient condition attributes are taken as the features of the model, and the medicine name is taken as the prediction target of the model;
Iteratively updating the strong learner using the residuals;
and performing differential privacy processing on the resulting strong learner based on the Laplace mechanism.
4. A pharmaceutical recommendation device for implementing the method of claim 1, comprising:
The acquisition unit is used for acquiring the related information of the target object;
the privacy protection preprocessing unit is used for carrying out privacy protection preprocessing on the related information;
And the medicine recommendation unit is used for generating medicine recommendation information for the target object based on the related information subjected to privacy protection preprocessing and a model based on a gradient boosting decision tree algorithm.
5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the medication recommendation method of any of claims 1-3 when executing the program.
6. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements the steps of the medication recommendation method according to any of claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110022884.0A CN112652375B (en) | 2021-01-08 | 2021-01-08 | Medicine recommendation method, device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110022884.0A CN112652375B (en) | 2021-01-08 | 2021-01-08 | Medicine recommendation method, device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112652375A (en) | 2021-04-13 |
CN112652375B (en) | 2024-08-27 |
Family
ID=75367653
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110022884.0A Active CN112652375B (en) | 2021-01-08 | 2021-01-08 | Medicine recommendation method, device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112652375B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113094497B (en) * | 2021-06-07 | 2021-09-14 | 华中科技大学 | Electronic health record recommendation method and shared edge computing platform |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110825955A (en) * | 2019-06-27 | 2020-02-21 | 安徽师范大学 | Distributed differential privacy recommendation method based on location based service |
CN111753543A (en) * | 2020-06-24 | 2020-10-09 | 北京百度网讯科技有限公司 | Medicine recommendation method and device, electronic equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20190124195A (en) * | 2019-10-28 | 2019-11-04 | (주)이지서티 | Improved K-anonymity Model based Dataset De-identification Method and Apparatus |
CN111931233B (en) * | 2020-08-12 | 2022-11-15 | 哈尔滨工业大学(深圳) | Information recommendation method and system based on block chain and localized differential privacy protection |
Also Published As
Publication number | Publication date |
---|---|
CN112652375A (en) | 2021-04-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||