CN112652375A - Medicine recommendation method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN112652375A
Authority
CN
China
Prior art keywords
quasi
related information
information
equivalence class
patient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110022884.0A
Other languages
Chinese (zh)
Inventor
李建强
李媛
王延安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202110022884.0A
Publication of CN112652375A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Computer Security & Cryptography (AREA)
  • Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a medicine recommendation method, a medicine recommendation device, electronic equipment and a storage medium. The method comprises the following steps: acquiring related information of a target object; performing privacy protection preprocessing on the related information; and generating drug recommendation information for the target object based on the preprocessed related information and a model based on a gradient boosting decision tree algorithm. The method recommends drugs to patients accurately and reliably while effectively protecting patient privacy: it applies appropriate privacy protection processing to different types of data, uses a robust recommendation algorithm, and protects patient privacy more effectively without losing too much accuracy.

Description

Medicine recommendation method and device, electronic equipment and storage medium
Technical Field
The invention relates to the field of medical service recommendation, in particular to a medicine recommendation method and device, electronic equipment and a storage medium.
Background
To recommend drugs accurately according to a patient's condition, a medical recommendation system needs to collect the patient's medical data, and more detailed information yields more accurate recommendations. However, the privacy risks that arise when such systems collect and process patient data are often underestimated or ignored, and leakage of patient privacy is a serious problem. It is therefore important to adopt an effective method to protect user privacy in drug recommendation.
Privacy protection techniques such as data anonymization (also called de-identification), data scrambling, data encryption and access control have therefore drawn increasing attention. With the growth of big data, differential privacy has attracted wide attention from scholars and has been applied to the field of medical recommendation.
At present, many studies on privacy-preserving recommendation algorithms make recommendations from a user rating matrix, and when privacy processing is applied to the data before recommendation, only a single data type is handled. For example, numerical user data is processed with a random perturbation technique, and a user rating matrix is then created. For drug recommendation, however, the data contains both categorical and numerical attributes that must be processed separately, and categorical data cannot be used to generate a user rating matrix.
In view of this, it is necessary to process the data with different privacy protection techniques according to its type, to select a robust recommendation algorithm, and at the same time to account for inference attacks on the recommendation system, so that patient privacy is protected more effectively without losing too much accuracy.
Disclosure of Invention
Aiming at the problems in the prior art, embodiments of the present invention provide a method and an apparatus for recommending a drug, an electronic device, and a storage medium, so as to solve the defects in the prior art that appropriate privacy protection processing cannot be performed on different types of data, a recommendation algorithm has low robustness and poor precision, and privacy of a patient cannot be effectively protected, and achieve the effect of accurately and reliably recommending a drug to a patient and effectively protecting privacy of the patient.
Specifically, the embodiment of the invention provides the following technical scheme:
in a first aspect, an embodiment of the present invention provides a method for recommending a medication, including:
acquiring related information of a target object;
carrying out privacy protection preprocessing on the related information;
generating drug recommendation information for the target object based on the related information that has undergone privacy protection preprocessing and a model based on a gradient boosting decision tree algorithm.
Further, the privacy protection preprocessing the related information comprises:
determining quasi-identifier attributes in the relevant information;
adding random disturbance to the numerical quasi-identifier attribute in the related information;
and performing K-anonymization processing on the classified quasi-identifier attribute in the related information.
Further, the model based on the gradient boost decision tree algorithm comprises a model based on the gradient boost decision tree algorithm with a differential privacy protection function.
Further, before adding random perturbation to the numerical quasi-identifier attribute in the related information and performing K-anonymization on the classified quasi-identifier attribute in the related information, the performing privacy-preserving preprocessing on the related information further includes:
based on the patient's demographic information, the patient's condition attributes, and the drug attributes used, a patient-drug matrix C is generated,
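$$
C = \begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mn}
\end{bmatrix}
$$
where $A_{ij}$ is the j-th attribute of the i-th patient, n represents the number of attributes of the patient and m represents the number of patients.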
Further, the adding random perturbations to the numeric quasi-identifier attributes in the relevant information comprises:
determining a disturbance range [ -gamma, gamma ];
adding random numbers uniformly distributed in [ - γ, γ ] to the numeric quasi-identifier attribute in the matrix C.
Further, the K-anonymizing the classified quasi-identifier attribute in the related information includes:
step 1), anonymizing the categorical quasi-identifier attributes in the related information by using the KACA algorithm: generating an initial set of equivalence classes $X = \{x_1, x_2, x_3, \ldots, x_n\}$ based on the matrix C, where X is the set of equivalence classes, $x_1, x_2, x_3, \ldots, x_n$ are the equivalence classes, and the tuples in each equivalence class have equal quasi-identifier values;
step 2), selecting an equivalence class $x_i$ whose number of tuples is less than K, calculating the distance between $x_i$ and each of the other equivalence classes in the set X, and finding the equivalence class $x_j$ nearest to $x_i$;
if the total number of tuples in $x_i$ and $x_j$ is less than K, merging $x_i$ and $x_j$ into one class and deleting $x_j$ from the set X;
if the total number of tuples in $x_i$ and $x_j$ is greater than or equal to K, selecting from $x_j$ the $K - |x_i|$ tuples nearest to $x_i$ to form an equivalence class $x_{j1}$, merging $x_i$ and $x_{j1}$ into one class, and deleting $x_{j1}$ from the set X;
step 3), repeating step 2) until the set X contains no equivalence class with fewer than K tuples, so that each tuple has the same quasi-identifier values as at least K-1 other records in X;
step 4), generalizing each equivalence class in the set X.
Further, the method further comprises:
after the categorical quasi-identifier attributes in the matrix C have been anonymized, performing one-hot encoding on the categorical quasi-identifier attributes in the K-anonymous matrix C to form a feature matrix E.
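As a rough illustration of the one-hot encoding step, the sketch below encodes one categorical column of an already anonymized matrix. The function name `one_hot_encode`, the sample column and the `"*"` marker for a generalized value are assumptions made for this example only; the patent text does not prescribe an encoding routine.

```python
def one_hot_encode(column):
    """Return (categories, rows): one 0/1 indicator vector per value in the column."""
    categories = sorted(set(column))                 # fixed, reproducible column order
    index = {value: i for i, value in enumerate(categories)}
    rows = []
    for value in column:
        row = [0] * len(categories)
        row[index[value]] = 1                        # exactly one position is set
        rows.append(row)
    return categories, rows

# Hypothetical gender column after K-anonymization ("*" marks a generalized value).
gender_column = ["F", "*", "F", "M"]
categories, encoded = one_hot_encode(gender_column)
```

Concatenating the encoded columns with the perturbed numerical columns would give one possible layout for the feature matrix E.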
Further, the method further comprises:
training a model with a differential privacy protection function based on a gradient boost decision tree algorithm, wherein the training of the model with the differential privacy protection function based on the gradient boost decision tree algorithm comprises the following steps:
inputting the feature matrix E into a model, wherein quasi-identifier attributes and patient condition attributes serve as features of the model, and drug names serve as predicted targets of the model;
iteratively updating the strong learner by using the residuals;
and carrying out differential privacy processing on the obtained strong learner based on a Laplace mechanism.
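The training steps above can be pictured with a very small sketch: depth-1 trees (stumps) are fitted to the residuals one after another, and Laplace noise is then added to the leaf values of the resulting strong learner. Everything here is an assumption made for illustration: the function names, the numeric target standing in for a drug label, the stump learner, and the noise scale `sensitivity / epsilon`. The sketch does not by itself establish a differential privacy guarantee (for instance, the split thresholds are not perturbed), and it is not the patent's own implementation.

```python
import random

def fit_stump(X, residuals):
    """Find the (feature, threshold) split that minimises squared error on the residuals."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X})[:-1]:   # keep both sides non-empty
            left = [r for row, r in zip(X, residuals) if row[f] <= thr]
            right = [r for row, r in zip(X, residuals) if row[f] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, f, thr, lm, rm)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]

def train_gbdt(X, y, n_trees=10, lr=0.5):
    """Boosting loop: each stump is fitted to the residuals of the current strong learner."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        f, thr, lv, rv = fit_stump(X, residuals)
        stumps.append([f, thr, lv, rv])
        pred = [p + lr * (lv if row[f] <= thr else rv) for p, row in zip(pred, X)]
    return {"base": base, "lr": lr, "stumps": stumps}

def laplace(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def privatize(model, epsilon, sensitivity, rng):
    """Add Laplace noise to every leaf value (assumed scale: sensitivity / epsilon)."""
    scale = sensitivity / epsilon
    noisy = [[f, thr, lv + laplace(scale, rng), rv + laplace(scale, rng)]
             for f, thr, lv, rv in model["stumps"]]
    return {**model, "stumps": noisy}

def predict(model, row):
    return model["base"] + model["lr"] * sum(
        lv if row[f] <= thr else rv for f, thr, lv, rv in model["stumps"])

# Hypothetical toy data: one feature, target 1 if a given drug was prescribed.
X = [[1], [2], [3], [4]]
y = [0, 0, 1, 1]
noisy_model = privatize(train_gbdt(X, y), epsilon=1.0, sensitivity=1.0, rng=random.Random(0))
```

A smaller `epsilon` gives larger noise and therefore stronger perturbation of the predictions; how the sensitivity should be bounded for a real model is left open here.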
Further, the related information includes demographic information, treatment information, and medication information, the demographic information includes age, gender, zip code, and address of the target subject,
the target object is classified into a plurality of types, wherein the target object is classified into a plurality of types, and the target object is classified into a plurality of types.
In a second aspect, the present invention provides a medication recommendation device comprising:
an acquisition unit configured to acquire related information of a target object;
the privacy protection preprocessing unit is used for carrying out privacy protection preprocessing on the related information;
a drug recommendation unit for generating drug recommendation information for the target subject based on the relevant information subjected to privacy protection preprocessing and a model based on a gradient boosting decision tree algorithm.
In a third aspect, the present invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the medication recommendation method as described above when executing the program.
In a fourth aspect, the invention provides a non-transitory computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, performs the steps of the medication recommendation method as described above.
According to the medicine recommendation method, the device, the electronic equipment and the storage medium, privacy protection preprocessing is performed on the related information, and medicine recommendation information aiming at the target object is generated based on the related information subjected to privacy protection preprocessing and a model based on a gradient boosting decision tree algorithm. The medicine recommending method can accurately and reliably recommend the medicine to the patient and can effectively protect the privacy of the patient; the method can perform appropriate privacy protection processing on different types of data; the robustness of the recommendation algorithm is strong; the privacy of the patient is more effectively protected without losing too much precision.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a method for recommending medications according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a differential privacy recommendation method based on a k-anonymity model and random perturbation according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a method for training a GBDT model with differential privacy according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a medication recommendation device according to an embodiment of the present invention; and
fig. 5 is a schematic structural diagram of an electronic device for recommending medications according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The privacy risks that arise when systems collect and process patient data are often underestimated or overlooked, leading to serious problems with patient privacy disclosure, and it is therefore important to take effective methods to protect the privacy of users in drug recommendations.
At present, many studies on privacy-preserving recommendation algorithms make recommendations from a user rating matrix, and when privacy processing is applied to the data before recommendation, only a single data type is handled. For example, numerical user data is processed with a random perturbation technique, and a user rating matrix is then created. For drug recommendation, however, the data contains both categorical and numerical attributes that must be processed separately, and categorical data cannot be used to generate a user rating matrix.
Therefore, the improved medicine recommendation method has the beneficial effects that the inference attack of a recommendation system can be resisted better on the premise that the accuracy of a recommendation result is not lost basically when the medicine recommendation is carried out, the privacy of a patient is protected more effectively, and the algorithm robustness is strong. To this end, the present invention provides a method, an apparatus, an electronic device and a storage medium for recommending a medicine, and the contents provided by the present invention will be explained and explained in detail through specific embodiments.
Fig. 1 shows a schematic diagram of a medication recommendation method. As shown in fig. 1, the method for recommending a drug provided by the embodiment of the present invention includes the following steps:
step 110: acquiring related information of a target object;
step 120: carrying out privacy protection preprocessing on the related information;
step 130: generating drug recommendation information for the target object based on the related information that has undergone privacy protection preprocessing and a model based on a gradient boosting decision tree algorithm.
In step 110, the target object is a patient. In one example, the target object is preferably a patient who has been ill, has been treated at hospitals or clinics, and has treatment information (electronic medical records) stored in the systems of those hospitals and clinics. The target object may also be a person without a history of disease; in that case the drug recommendation method of the present invention can recommend an appropriate drug directly from the personal information of the target object, such as age, and the drug and condition attributes of the patient may be empty or set to 0 or another suitable value or code.
In one example, the relevant information includes demographic information including age, gender, zip code, address, etc. of the target subject, therapy information, and medication information. The target object is classified into a plurality of types, wherein the target object is classified into a plurality of types, and the target object is classified into a plurality of types. Of course, the relevant information may also include other types of data besides age, gender, zip code, and address.
The medicine recommendation method disclosed by the invention is used for respectively carrying out different privacy processing methods on the numerical quasi-identifier attribute and the classification quasi-identifier attribute, so that the privacy of an individual is comprehensively protected.
In step 120, privacy preserving preprocessing is performed on the related information, and in one example, privacy preserving preprocessing includes adding random disturbance to numerical data in the related information and/or performing K-anonymization on classified data in the related information.
And aiming at numerical data and classified data in the related information, privacy processing is performed by respectively adopting a random disturbance technology and a k anonymity technology, so that the system is prevented from acquiring real and sensitive data of the patient. The random disturbance technology is simple to operate, high in efficiency and suitable for processing numerical data, the k anonymization technology divides data according to equivalence classes, and then data are generalized and suitable for processing classified data.
In step 130, drug recommendation information for the target subject is generated based on the relevant information that has been pre-processed for privacy protection and a model based on a gradient boosting decision tree algorithm.
After the patient's personal information has undergone privacy processing by random perturbation and K-anonymization, it has a certain resistance to theft. Finally, to resist inference attacks on the recommendation system, a gradient boosting decision tree (GBDT) model that satisfies differential privacy is used to predict the drug list. The collaborative filtering algorithm based on the GBDT model can process numerical data and categorical data simultaneously, so patient privacy is effectively protected.
Fig. 2 is a schematic diagram illustrating a differential privacy recommendation method based on a k-anonymity model and random perturbation according to an embodiment of the present invention.
As shown in fig. 2, demographic data and treatment information may be obtained from the patient's electronic medical record. The electronic medical record and the drug data of the patient are used as input, from which a patient-drug matrix C is generated. Random perturbation is added to the numerical data in the related information and/or K-anonymization is performed on the categorical data in the related information. One-hot encoding is then applied to the categorical data in the K-anonymous matrix C to form a feature matrix E. A GBDT model with differential privacy protection is trained on E, and the trained model is used to predict drugs and generate a drug recommendation list.
In the embodiment, the privacy protection technology and the gradient boosting decision tree algorithm are combined to predict the medicine and generate the recommendation list, so that medicine recommendation is better performed on the patient, and the privacy of the patient is protected.
Based on the foregoing embodiment, in a medication recommendation method provided by another embodiment of the present invention, the performing privacy protection preprocessing on the related information includes:
determining quasi-identifier attributes in the relevant information;
adding random disturbance to the numerical quasi-identifier attribute in the related information;
and performing K-anonymization processing on the classified quasi-identifier attribute in the related information.
A quasi-identifier (QI) is an attribute that, combined with certain external information, can determine a user record with high probability: a single column does not locate an individual, but several columns together can potentially identify one. In one example, a quasi-identifier attribute is an attribute through which an attack on the system is liable to reveal an individual's sensitive information. In one example, the quasi-identifier attributes are correlated with one another. In one example, the quasi-identifier attributes are the age, gender, zip code, and address of the individual.
K is the minimum number of records that must be indistinguishable on the quasi-identifiers: for every record in the published data there are at least K records with the same quasi-identifier values, so an attacker cannot tell which specific individual the private information belongs to, and the privacy of the individual is protected. Through the parameter K, K-anonymity specifies the maximum information disclosure risk that a user can bear. K-anonymization protects individual privacy to some extent, but it also reduces the usability of the data. Research on K-anonymization has therefore focused mainly on improving data usability while protecting private information.
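The K-anonymity condition described above can be checked mechanically. The following is a minimal sketch; the function name `is_k_anonymous`, the record layout and the sample values are hypothetical and chosen only to illustrate the definition.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical published table: gender and zip code are the quasi-identifiers.
records = [
    {"gender": "F", "zip": "100**", "condition": "flu"},
    {"gender": "F", "zip": "100**", "condition": "asthma"},
    {"gender": "M", "zip": "101**", "condition": "flu"},
    {"gender": "M", "zip": "101**", "condition": "diabetes"},
]
two_anonymous = is_k_anonymous(records, ["gender", "zip"], k=2)
three_anonymous = is_k_anonymous(records, ["gender", "zip"], k=3)
```

Here each quasi-identifier combination is shared by two records, so the table satisfies the condition for K = 2 but not for K = 3.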
Aiming at numerical data and classified data, a random disturbance technology and a K anonymity technology are respectively adopted for privacy processing, so that the system is prevented from acquiring real and sensitive data of a patient. The random disturbance technology is simple to operate, high in efficiency and suitable for processing numerical data, the K anonymization technology divides data according to the equivalence class, and then generalizes the data, and the K anonymization technology is suitable for processing the classified data.
In the embodiment, different data in the information of the patient are processed differently, so that the privacy protection effect of the patient is improved.
Based on the foregoing embodiment, in the drug recommendation method provided by another embodiment of the present invention, the model based on the gradient boosting decision tree algorithm has a differential privacy protection function.
In this embodiment, K-anonymity and random perturbation are combined with differential privacy, which effectively prevents the system from acquiring the patient's real data and resists inference attacks on the recommendation system.
Based on the foregoing embodiment, in a drug recommendation method provided by another embodiment of the present invention, before adding random disturbance to a numerical type quasi-identifier attribute in the related information and performing K-anonymization processing on classified data in the related information, performing privacy protection preprocessing on the related information further includes:
based on the patient's demographic information, the patient's condition attributes, and the drug attributes used, a patient-drug matrix C is generated,
$$
C = \begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mn}
\end{bmatrix}
$$
where n represents the number of attributes of the patient and m represents the number of patients. In one example, $A_{11}$ represents the age of the first patient, $A_{12}$ the gender of the first patient, $A_{13}$ the zip code of the first patient, $A_{14}$ the address of the first patient, $A_{15}$ the condition attribute of the first patient, and $A_{16}$ the drug attribute of the first patient.
Wherein, before generating the matrix C, the method further comprises cleaning the patient demographic data, treatment information, and drug data, removing null data, and digitizing the gender attribute (in one example, male is 0, female is 1); patient demographic information (age, sex, zip code, address) is retained, along with patient condition attributes and drug attributes used.
In the above embodiment, the patient-drug matrix C is formed by performing statistics on the patient data, which provides a data base for subsequent random perturbation and K anonymization.
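A minimal sketch of how the cleaning and matrix construction described above might look, assuming each raw record is a dictionary. The field names, the sample records and the function name `build_patient_drug_matrix` are hypothetical; only the column order (age, gender, zip code, address, condition, drug) and the gender encoding (male 0, female 1) come from the example in the text.

```python
def build_patient_drug_matrix(raw_records):
    """Clean raw records and return the rows of a patient-drug matrix C (one row per patient)."""
    columns = ["age", "gender", "zip_code", "address", "condition", "drug"]
    gender_code = {"male": 0, "female": 1}       # encoding taken from the example in the text
    matrix = []
    for record in raw_records:
        # Cleaning: skip any record with a missing or empty attribute.
        if any(record.get(col) in (None, "") for col in columns):
            continue
        row = [record[col] for col in columns]
        row[1] = gender_code[record["gender"]]   # digitize the gender attribute
        matrix.append(row)
    return matrix

# Hypothetical raw records; the second one has a null address and is dropped.
raw = [
    {"age": 34, "gender": "male", "zip_code": "100124", "address": "Chaoyang",
     "condition": "hypertension", "drug": "amlodipine"},
    {"age": 58, "gender": "female", "zip_code": "100083", "address": None,
     "condition": "diabetes", "drug": "metformin"},
    {"age": 41, "gender": "female", "zip_code": "100871", "address": "Haidian",
     "condition": "asthma", "drug": "salbutamol"},
]
C = build_patient_drug_matrix(raw)
```

Each row of `C` corresponds to one patient (m rows) and each column to one attribute (n columns).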
Based on the foregoing embodiment, in a medication recommendation method provided by another embodiment of the present invention, the adding a random disturbance to a numerical quasi-identifier attribute in the related information includes:
determining a disturbance range [ -gamma, gamma ];
adding random numbers uniformly distributed in [ - γ, γ ] to the numeric quasi-identifier attribute in the matrix C.
The size of the perturbation range [ - γ, γ ] can be determined according to the actual situation, and the number of random numbers to be added can be adjusted and changed. Of course, in an example, numerical values may also be added to the numerical data in the matrix C according to a predetermined rule, as long as the effect of disturbing the numerical data in the matrix C is achieved.
In the above embodiment, by adding random perturbations to the numerical data, protection of the privacy of the patient is achieved.
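The perturbation step can be sketched in a few lines. The function name `perturb_numeric`, the sample ages and the choice γ = 3 are assumptions for illustration; the text leaves the size of the range to be chosen according to the actual situation.

```python
import random

def perturb_numeric(values, gamma, rng=None):
    """Add an independent random number, uniform on [-gamma, gamma], to each value."""
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    rng = rng or random.Random()
    return [v + rng.uniform(-gamma, gamma) for v in values]

# Hypothetical numerical quasi-identifier column of the matrix C (patient ages).
ages = [34, 58, 41, 67]
noisy_ages = perturb_numeric(ages, gamma=3.0, rng=random.Random(0))
```

A larger γ hides the true values better but moves the perturbed values further from the originals, so γ trades privacy against recommendation accuracy.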
Based on the foregoing embodiment, in a drug recommendation method provided by another embodiment of the present invention, the performing K-anonymization on the classified quasi-identifier attribute in the related information includes:
step 1), anonymizing the classified quasi-identifier attribute in the related information by using a KACA algorithm, and generating an initial equivalence class X ═ X based on a matrix C1,x2,x3,…,xnWhere X is the set of equivalence classes, X1,x2,x3,…,xnFor equivalence classes, the quasi-identifiers of the tuples in each equivalence class are equal in value;
step 2), when the equivalent classes with the tuple number smaller than K exist in the equivalent class set X, selecting the equivalent classes X with the tuple number smaller than KiCalculating the equivalence class xiAnd the equivalence class X is divided from the equivalence class set XiDistance of other equivalence classes, find distance xiNearest equivalence class xj
If the equivalence class xiAnd xjIs less than K, the equivalence class xiAnd xjMerging into one class and deleting the equivalence class X in the equivalence class set XjI.e. X ═ X-Xj
If the equivalence class xiAnd xjOf a common tupleA number greater than or equal to K, at xjIn selecting the distance xiThe most recent K- | xi| tuples form equivalence class xj1Will be of the equivalence class xiAnd xj1Merging into one class and deleting the equivalence class X in the equivalence class set Xj1I.e. xi=xi∪xj1,X=X-xj1
Step 3), repeating step 2) until the set X contains no equivalence class with fewer than K tuples, so that each tuple has the same quasi-identifiers as at least K-1 other records in X; once the equivalence classes in X satisfy K-anonymity or no new equivalence class can be constructed, any remaining records are inserted into their nearest equivalence classes.
Step 4), generalizing each equivalence class in the set X and outputting an anonymized table T.
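The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, a count of differing attributes stands in for the WHD-based distortion distance, generalization is reduced to suppressing differing values to "*", and, to guarantee termination, tuples are split off from the nearest class only when that class keeps at least K tuples (otherwise the two classes are merged whole).

```python
from collections import defaultdict

def hamming(a, b):
    """Stand-in tuple distance: number of attributes whose values differ."""
    return sum(x != y for x, y in zip(a, b))

def class_distance(c1, c2):
    """Stand-in class distance: smallest tuple distance between the classes."""
    return min(hamming(a, b) for a in c1 for b in c2)

def kaca_sketch(records, k):
    # Step 1: tuples with identical quasi-identifier values form initial classes.
    groups = defaultdict(list)
    for rec in records:
        groups[rec].append(rec)
    classes = list(groups.values())

    # Steps 2-3: repeat until no class has fewer than k tuples.
    while len(classes) > 1:
        xi = next((c for c in classes if len(c) < k), None)
        if xi is None:
            break
        xj = min((c for c in classes if c is not xi),
                 key=lambda c: class_distance(xi, c))
        need = k - len(xi)
        if len(xj) - need >= k:
            # Donor class stays valid: move only the `need` tuples nearest to xi.
            xj.sort(key=lambda t: min(hamming(t, s) for s in xi))
            xi.extend(xj[:need])
            del xj[:need]
        else:
            # Otherwise merge the two classes entirely.
            xi.extend(xj)
            classes = [c for c in classes if c is not xj]

    # Step 4: generalize each class; values that differ are suppressed to "*".
    table = []
    for c in classes:
        general = tuple(v[0] if len(set(v)) == 1 else "*" for v in zip(*c))
        table.extend([general] * len(c))
    return classes, table

# Hypothetical categorical quasi-identifiers (city, district).
records = [("Beijing", "Haidian"), ("Beijing", "Haidian"), ("Beijing", "Chaoyang"),
           ("Shanghai", "Pudong"), ("Shanghai", "Pudong"), ("Shanghai", "Minhang")]
classes, table = kaca_sketch(records, k=2)
```

With K = 2 the six records end up in two classes of three tuples each, which is within the [K, 2K-1] size range mentioned for KACA. The output table groups rows by class rather than preserving the input order.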
Specifically, in another example, an equivalence class T of size s is randomly selected, where s is greater than 1 and less than K, and the distance $\mathrm{Dist}(t_1, t_2)$ between T and each of the other equivalence classes is calculated. The distance between equivalence classes is represented by the distance between two tuples, i.e., the sum of the distortions incurred when each of the two tuples is generalized to their nearest common generalization $t_{12}$:
$$\mathrm{Dist}(t_1, t_2) = \mathrm{Distortion}(t_1, t_{12}) + \mathrm{Distortion}(t_2, t_{12})$$
Figure BDA0002889293150000101
where $\mathrm{Distortion}(D, D')$ represents the degree of distortion when data $D$ is generalized into $D'$, $D'$ is the generalized table of $D$, $t_i$ is the i-th tuple in $D$, and $t'_i \in D'$ is the generalized tuple of $t_i$. The distortion value is obtained by calculating the weighted hierarchical distance (WHD) between each tuple and the final generalized table and then summing these distances:
Figure BDA0002889293150000102
where $\mathrm{WHD}(p, q)$ represents the weighted hierarchical distance of generalizing from $X_p$ to $X_q$ ($p > q$), $h$ is the highest generalization level of the categorical quasi-identifier attribute S, $X_h$ is the generalization domain, and $w_{j,j-1}$ is the weight between $X_j$ and $X_{j-1}$ ($2 \le j \le h$),
Figure BDA0002889293150000103
where $\beta$ is simply taken to be 1.
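The formula images for WHD and its weights are not reproduced in this text. As a hedged illustration, the sketch below assumes the form commonly given for KACA: WHD(p, q) is the sum of w(j, j-1) for j from q+1 to p, divided by the sum of w(j, j-1) for j from 2 to h, with w(j, j-1) = 1/(j-1)^β. It also assumes that level 1 is the most general level and that β is an integer. The function names are hypothetical, and the patent's own figures may differ.

```python
from fractions import Fraction

def level_weight(j, beta=1):
    """Assumed weight between hierarchy levels j and j-1: 1 / (j-1)**beta."""
    return Fraction(1, (j - 1) ** beta)

def whd(p, q, h, beta=1):
    """Assumed weighted hierarchical distance of generalizing level p to level q."""
    if not (1 <= q <= p <= h and h >= 2):
        raise ValueError("expected 1 <= q <= p <= h and h >= 2")
    num = sum(level_weight(j, beta) for j in range(q + 1, p + 1))
    den = sum(level_weight(j, beta) for j in range(2, h + 1))
    return num / den

# Hypothetical four-level hierarchy with beta = 1.
full_generalization = whd(4, 1, h=4)   # generalize from the most specific level to the root
one_step_near_leaf = whd(4, 3, h=4)    # one step up from the most specific level
one_step_near_root = whd(2, 1, h=4)    # one step up just below the root
```

Under these assumptions a generalization step close to the root costs more than a step close to the leaves, which is the usual motivation for weighting the levels.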
The equivalence class T1 with the smallest distance Dist to T is found, T and T1 are merged into one class, and the merged class is generalized.
The merging and generalization steps are repeated until the set X contains no equivalence class with fewer than K tuples, so that each tuple has the same quasi-identifiers as at least K-1 other records in X and all equivalence classes in X satisfy K-anonymity.
The anonymized matrix C is then returned.
The basic idea of the KACA algorithm is as follows. First, the original data set is partitioned according to the similarity of the quasi-identifiers: tuples with identical quasi-identifiers are grouped into one class, yielding a number of initial equivalence classes. Next, an equivalence class containing fewer than K tuples is selected at random, the equivalence class nearest to it is found, and the two are merged; this operation is repeated until no new equivalence class can be formed by merging, so that each resulting equivalence class contains between K and 2K-1 tuples. Finally, the tuples of each equivalence class are generalized to the same values on the quasi-identifiers, and an anonymized table is generated. The algorithm makes the anonymized data within one equivalence class as similar as possible on the quasi-identifiers, while data in different equivalence classes differ more, which preserves the availability of the data.
In the above embodiment, the KACA algorithm is used to partition the equivalence classes. KACA is a typical local recoding algorithm that partitions equivalence classes using a clustering approach, and the equivalence classes it produces have high data availability and low information loss. Generating the equivalence classes with KACA therefore improves data accuracy. Through the K-anonymization mechanism, each record in the table has the same quasi-identifiers as at least K-1 other records in the table, so an attacker cannot determine whether a given person is in the published data, cannot confirm whether a given person has a particular sensitive attribute, and cannot confirm which person a given record corresponds to. The privacy of the patient is thereby protected.
Based on the above embodiment, in the drug recommendation method provided in another embodiment of the present invention, after the categorical quasi-identifier attributes in the matrix C are anonymized, one-hot encoding is performed on the categorical quasi-identifier attributes in the K-anonymous matrix C to form a feature matrix E.
In the above embodiment, the categorical data in the K-anonymous matrix C are one-hot encoded to form the feature matrix E. One-hot encoding binarizes the categories so that they can be used as features, which provides the sample data for subsequent model training.
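One-hot encoding of a single categorical column can be sketched as follows. This is a minimal standard-library illustration; the function name `one_hot_encode` and the sample address values (including the generalized value "*") are hypothetical.

```python
def one_hot_encode(column):
    """Map each category to a 0/1 indicator vector, one position per category."""
    categories = sorted(set(column))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for value in column:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows

# Hypothetical anonymized address column.
addresses = ["Beijing", "Shanghai", "Beijing", "*"]
categories, encoded = one_hot_encode(addresses)
```

Each encoded row contains exactly one 1, so the encoded columns can be appended to the numerical features to form a matrix such as E.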
Based on the above embodiment, in a drug recommendation method provided by another embodiment of the present invention, the method further includes:
training a model with a differential privacy protection function based on a gradient boosting decision tree algorithm, wherein the training comprises the following steps:
inputting the feature matrix E into the model, wherein the quasi-identifier attributes and patient condition attributes serve as features of the model, and drug names serve as the prediction targets of the model;
iteratively updating the strong learner by using residuals;
and performing differential privacy processing on the obtained strong learner based on a Laplace mechanism.
Fig. 3 is a schematic diagram illustrating a method for training a GBDT model with differential privacy according to an embodiment.
As shown in fig. 3, the encoded feature matrix E is input into the model, the parameters and the learner are initialized, and the strong learner is solved iteratively using residuals, with each iteration satisfying differential privacy; the final strong learner is then generated and model training is complete.
Specifically, in one example, training a gradient boosting decision tree algorithm-based model with differential privacy protection function includes the following steps:
Step 1, dividing the feature matrix E into a training set and a test set at a ratio of 7:3, inputting the training set into the model, and using the test set to check the prediction accuracy of the trained model.
Step 2, initializing parameters. The number of iterations is set to T, i.e., T GBDT classification trees are generated; the tree depth is d, the learning rate is r, the total privacy budget is ε, and the privacy budget of each iteration is ε/T.
Figure BDA0002889293150000121
where N is the number of samples and $f_0(x)$ is the initial weak learner.
Step 3, iteratively solving for the strong learner by using residuals, with each iteration satisfying differential privacy. The residual $r_{im}$ is obtained from the negative gradient:
Figure BDA0002889293150000122
where $m = 1, 2, \ldots, T$ is the iteration index; the residuals obtained in each iteration are used as the training data of the next learner, and the learner is updated to $f_m(x)$:
Figure BDA0002889293150000131
where J is the number of leaf nodes.
Differential privacy processing is then applied to the obtained learner, specifically by using the Laplace mechanism as follows:
Figure BDA0002889293150000132
wherein
Figure BDA0002889293150000133
is the Laplace noise, $\Delta f$ is the sensitivity, and $\varepsilon$ is the privacy budget:
Figure BDA0002889293150000134
Figure BDA0002889293150000135
where $\Delta f$ represents the global sensitivity, i.e., the Manhattan distance between $f(D_1)$ and $f(D_2)$; $|D_1 \Delta D_2|$ denotes the size of the symmetric difference of $D_1$ and $D_2$, and $|D_1 \Delta D_2| = 1$ indicates that $D_1$ and $D_2$ are neighboring data sets,
Figure BDA0002889293150000136
$p(c)$ is the probability density function of the Laplace distribution with expectation 0 and variance $2b^2$.
Step 4, forming the final strong learner. The results of the T trees are summed linearly to obtain the final strong learner $f(c)$:
Figure BDA0002889293150000137
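The formula images referenced in steps 2 to 4 are not reproduced in this text. For orientation only, the textbook forms of the corresponding quantities can be written as below. The loss function $L$, the leaf regions $R_{mj}$, the leaf values $c_{mj}$, and the symbol $F$ are notation assumed here, and the patent's own figures may differ, for example in how the learning rate r enters the update or in whether the per-iteration budget ε/T replaces ε in the noise scale.

```latex
\begin{align*}
f_0(x) &= \arg\min_{c} \sum_{i=1}^{N} L(y_i, c) \\
r_{im} &= -\left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f = f_{m-1}},
          \qquad m = 1, 2, \ldots, T \\
f_m(x) &= f_{m-1}(x) + \sum_{j=1}^{J} c_{mj}\, \mathbb{1}[x \in R_{mj}] \\
\tilde{f}(D) &= f(D) + \mathrm{Lap}\!\left( \frac{\Delta f}{\varepsilon} \right) \\
\Delta f &= \max_{|D_1 \Delta D_2| = 1} \lVert f(D_1) - f(D_2) \rVert_1 \\
p(x) &= \frac{1}{2b} \exp\!\left( -\frac{|x|}{b} \right), \qquad b = \frac{\Delta f}{\varepsilon} \\
F(x) &= f_0(x) + \sum_{m=1}^{T} \sum_{j=1}^{J} c_{mj}\, \mathbb{1}[x \in R_{mj}]
\end{align*}
```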
In the above embodiment, Laplace noise is added to the learner through the Laplace mechanism, so that the learner provides differential privacy protection.
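The noise-adding step can be sketched as follows. This is a minimal standard-library illustration, assuming that the noise is added to a single numeric output of a tree (here a hypothetical leaf value) and that each of the T iterations receives the budget ε/T from step 2. The function names, the sensitivity of 1.0, and the sample numbers are assumptions, not values from the patent.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def privatize(value, sensitivity, epsilon, rng):
    """Laplace mechanism: add noise with scale sensitivity / epsilon."""
    return value + laplace_noise(sensitivity / epsilon, rng)

total_budget = 1.0                            # hypothetical overall privacy budget (epsilon)
num_trees = 10                                # hypothetical iteration count T
per_tree_budget = total_budget / num_trees    # epsilon / T, as in step 2

rng = random.Random(42)
noisy_leaf = privatize(0.8, sensitivity=1.0, epsilon=per_tree_budget, rng=rng)
```

Because the noise scale is Δf/ε, splitting the budget across more trees increases the noise added in each iteration, which is the accuracy cost of stronger privacy.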
Based on the above embodiment, in the drug recommendation method according to another embodiment of the present invention, the related information includes demographic information, treatment information, and drug information, the demographic information includes the age, sex, zip code, and address of the target object,
wherein the age, gender, and zip code are numerical quasi-identifier attributes, and the address is a categorical quasi-identifier attribute.
In the above-described embodiment, the mapping between the input and output of the model is established from the demographic information, the treatment information, and the drug information: the quasi-identifier attributes and the patient condition attributes serve as the features of the model, and the drug name serves as the prediction target, thereby enabling the model to make accurate drug recommendations.
In summary, based on the patient's demographic data, disease data, and drug data, the quasi-identifier attributes that require privacy processing are identified; they comprise numerical data (age, gender, and zip code) and categorical data (address). The numerical data are protected by adding random perturbations, and the categorical data are protected by K-anonymization. The patient's demographic data and disease data after privacy processing are used as the features of the training samples, and the drug name is used as the training target; training is carried out in a GBDT model that satisfies differential privacy, and a drug recommendation list is finally obtained by prediction.
According to the medicine recommendation method, the device, the electronic equipment and the storage medium, privacy protection preprocessing is performed on the related information, and medicine recommendation information aiming at the target object is generated based on the related information subjected to privacy protection preprocessing and a model based on a gradient boosting decision tree algorithm. The medicine recommending method can accurately and reliably recommend the medicine to the patient and can effectively protect the privacy of the patient; the method can perform appropriate privacy protection processing on different types of data; the robustness of the recommendation algorithm is strong; the privacy of the patient is more effectively protected without losing too much precision.
The following describes the medication recommendation device provided by the present invention; the medication recommendation device described below and the drug recommendation method described above may be referred to in correspondence with each other.
Fig. 4 illustrates a physical structure diagram of a medication recommendation device, and as shown in fig. 4, the device may include:
an obtaining unit 410, configured to obtain relevant information of a target object;
a privacy protection preprocessing unit 420, configured to perform privacy protection preprocessing on the relevant information;
a drug recommendation unit 430 configured to generate drug recommendation information for the target subject based on the relevant information subjected to privacy protection preprocessing and a model based on a gradient boosting decision tree algorithm.
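The three units can be pictured as a simple pipeline. The sketch below is only an illustration of that structure: the class name, the stub functions, and the returned values are hypothetical placeholders, and the real units would perform the preprocessing and GBDT prediction described in the method embodiments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class MedicationRecommendationDevice:
    obtain: Callable[[str], Dict]            # role of obtaining unit 410
    preprocess: Callable[[Dict], Dict]       # role of privacy protection preprocessing unit 420
    recommend: Callable[[Dict], List[str]]   # role of drug recommendation unit 430

    def run(self, patient_id: str) -> List[str]:
        info = self.obtain(patient_id)
        protected = self.preprocess(info)
        return self.recommend(protected)

# Hypothetical stand-ins for the three units.
def obtain_stub(patient_id):
    return {"id": patient_id, "age": 40, "address": "Beijing Haidian"}

def preprocess_stub(info):
    protected = dict(info)
    protected.pop("id")            # drop the direct identifier
    protected["address"] = "*"     # placeholder for a generalized address
    return protected

def recommend_stub(protected):
    return ["drug_a"]              # placeholder for the model's prediction

device = MedicationRecommendationDevice(obtain_stub, preprocess_stub, recommend_stub)
recommendations = device.run("patient-001")
```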
Fig. 5 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 5: a processor (processor)510, a communication Interface (Communications Interface)520, a memory (memory)530, and a communication bus 550, wherein the processor 510, the communication Interface 520, and the memory 530 communicate with each other via the communication bus 550. Processor 510 may invoke logic instructions in memory 530 to perform a medication recommendation method comprising:
acquiring related information of a target object;
carrying out privacy protection preprocessing on the related information;
generating drug recommendation information for the target subject based on the relevant information that has been pre-processed for privacy protection and based on a model of a gradient boosting decision tree algorithm.
It will be appreciated that the detailed functions and extended functions that the computer program may perform may be as described with reference to the above embodiments.
Based on the same inventive concept, yet another embodiment of the present invention provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs all the steps of the above-described medication recommendation method.
It will be appreciated that the detailed functions and extended functions that the computer program may perform may be as described with reference to the above embodiments.
Based on the same inventive concept, yet another embodiment of the present invention provides a computer program product comprising a computer program which, when executed by a processor, implements all the steps of the above-described medication recommendation method.
It will be appreciated that the detailed functions and extended functions that the computer program may perform may be as described with reference to the above embodiments.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware. Based on such understanding, the above technical solutions may be embodied, in whole or in part, in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the medication recommendation method according to the embodiments or some parts of the embodiments.
Moreover, in the present invention, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Furthermore, in the present disclosure, reference to the description of the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (12)

1. A method for medication recommendation, comprising:
acquiring related information of a target object;
carrying out privacy protection preprocessing on the related information;
generating drug recommendation information for the target subject based on the relevant information that has been pre-processed for privacy protection and based on a model of a gradient boosting decision tree algorithm.
2. The medication recommendation method according to claim 1, wherein said privacy preserving preprocessing of said related information comprises:
determining quasi-identifier attributes in the relevant information;
adding random disturbance to the numerical quasi-identifier attribute in the related information;
and performing K-anonymization processing on the categorical quasi-identifier attributes in the related information.
3. The medication recommendation method according to claim 1, wherein the model based on a gradient boosting decision tree algorithm comprises a model based on a gradient boosting decision tree algorithm with a differential privacy protection function.
4. The medication recommendation method of claim 2, wherein prior to adding random perturbations to numeric quasi-identifier attributes in the related information and K-anonymizing categorical quasi-identifier attributes in the related information, said privacy-preserving preprocessing the related information further comprises:
based on the patient's demographic information, the patient's condition attributes, and the drug attributes used, a patient-drug matrix C is generated,
Figure FDA0002889293140000011
where n represents the number of attributes of the patient and m represents the number of patients.
5. The medication recommendation method according to claim 4, wherein said adding random perturbations to numerical quasi-identifier attributes in said related information comprises:
determining a disturbance range [ -gamma, gamma ];
adding random numbers uniformly distributed in [ - γ, γ ] to the numeric quasi-identifier attribute in the matrix C.
6. The medication recommendation method according to claim 4, wherein said K-anonymizing the categorical quasi-identifier attribute in the related information comprises:
step 1), anonymizing the categorical quasi-identifier attributes in the related information by using the KACA algorithm, and generating an initial set of equivalence classes $X = \{x_1, x_2, x_3, \ldots, x_n\}$ based on the matrix C, where $X$ is the set of equivalence classes, $x_1, x_2, x_3, \ldots, x_n$ are the equivalence classes, and the tuples in each equivalence class have equal quasi-identifier values;
step 2), selecting an equivalence class $x_i$ with fewer than K tuples, calculating the distance between $x_i$ and each of the other equivalence classes in the set $X$, and finding the equivalence class $x_j$ nearest to $x_i$;
if the total number of tuples in $x_i$ and $x_j$ is less than K, merging $x_i$ and $x_j$ into one class and deleting $x_j$ from the set $X$;
if the total number of tuples in $x_i$ and $x_j$ is greater than or equal to K, selecting from $x_j$ the $K - |x_i|$ tuples nearest to $x_i$ to form an equivalence class $x_{j1}$, merging $x_i$ and $x_{j1}$ into one class, and deleting $x_{j1}$ from the set $X$;
step 3), repeating step 2) until the set $X$ contains no equivalence class with fewer than K tuples, so that each tuple has the same quasi-identifiers as at least K-1 other records in the set $X$;
step 4), generalizing each equivalence class in the set $X$.
7. The medication recommendation method according to claim 6, further comprising:
after anonymizing the categorical quasi-identifier attributes in the matrix C, performing one-hot encoding on the categorical quasi-identifier attributes in the matrix C that satisfies K-anonymity to form a feature matrix E.
8. The medication recommendation method according to claim 7, further comprising:
training a model with a differential privacy protection function based on a gradient boosting decision tree algorithm, wherein the training comprises the following steps:
inputting the feature matrix E into the model, wherein the quasi-identifier attributes and patient condition attributes serve as features of the model, and drug names serve as the prediction targets of the model;
iteratively updating the strong learner by using residuals;
and performing differential privacy processing on the obtained strong learner based on a Laplace mechanism.
9. A medication recommendation method according to any of claims 1-8, characterized in that said related information comprises demographic information, treatment information, and medication information, said demographic information comprising age, gender, zip code, and address of said target subject,
wherein the age, gender, and zip code are numerical quasi-identifier attributes, and the address is a categorical quasi-identifier attribute.
10. A medication recommendation device, comprising:
an acquisition unit configured to acquire related information of a target object;
the privacy protection preprocessing unit is used for carrying out privacy protection preprocessing on the related information;
a drug recommendation unit for generating drug recommendation information for the target subject based on the relevant information subjected to privacy protection preprocessing and a model based on a gradient boosting decision tree algorithm.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the medication recommendation method according to any one of claims 1-9.
12. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the medication recommendation method according to any one of claims 1-9.
CN202110022884.0A 2021-01-08 2021-01-08 Medicine recommendation method and device, electronic equipment and storage medium Pending CN112652375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110022884.0A CN112652375A (en) 2021-01-08 2021-01-08 Medicine recommendation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110022884.0A CN112652375A (en) 2021-01-08 2021-01-08 Medicine recommendation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112652375A true CN112652375A (en) 2021-04-13

Family

ID=75367653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110022884.0A Pending CN112652375A (en) 2021-01-08 2021-01-08 Medicine recommendation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112652375A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113094497A (en) * 2021-06-07 2021-07-09 华中科技大学 Electronic health record recommendation method and shared edge computing platform



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination