CN115859372A - Medical data desensitization method and system - Google Patents

Info

Publication number
CN115859372A
Authority
CN
China
Prior art keywords
data
information
medical
text
model
Prior art date
2023-03-04
Legal status
Granted
Application number
CN202310199626.9A
Other languages
Chinese (zh)
Other versions
CN115859372B (en)
Inventor
李睿
胡其桐
刘瑞华
郑名扬
唐学文
Current Assignee
Chengdu Angels Biomedical Technology Co ltd
Original Assignee
Chengdu Angels Biomedical Technology Co ltd
Priority date
2023-03-04
Filing date
2023-03-04
Publication date
2023-03-28
Application filed by Chengdu Angels Biomedical Technology Co ltd
Priority to CN202310199626.9A
Publication of CN115859372A
Application granted
Publication of CN115859372B
Active legal status

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention belongs to the technical field of data processing and discloses a medical data desensitization method and system. The method comprises the following steps: classifying the acquired medical data into text data and non-text data; extracting keywords from the text data, keeping the rest of the original text unchanged, and taking the extracted keywords together with the non-text data as the data to be desensitized; classifying the data to be desensitized into personal identity information, personal medical information, date information, address information and other information; and desensitizing the classified information. The medical data desensitization system comprises a data classification module, a sensitive word extraction module, a field classification module and a data desensitization module. The method and system can desensitize medical data fully automatically: the user only needs to input the medical fields contained in the medical data, and desensitization can be performed on diverse medical data.

Description

Medical data desensitization method and system
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a medical data desensitization method and system.
Background
Medical data contains a large amount of information about a patient's individual characteristics, such as the patient's name, contact phone number, place of birth, movement trajectory and description of illness. This information must be kept private, because its disclosure could harm the patient. Existing text data desensitization algorithms are based on a pattern matching mechanism that statically matches keywords in the text. This leads to three problems:
1. Private information with weak regularity, such as names, cannot be matched accurately. For example, under a rule that treats every occurrence of the surname character "张" (Zhang) together with the two characters that follow it as a person's name, the text "静脉曲张现象明显" ("the varicose vein phenomenon is obvious") contains "张现象", so the method erroneously treats part of the phrase "varicose vein" as a person's name and deletes it;
2. Sensitive data cannot be judged dynamically from context. For example, the patient's description of illness may contain the phrase "People's Hospital of South County, Sichuan Province". The existing method masks "South County" as sensitive information, leaving only "Sichuan Province [masked] People's Hospital". However, this hospital name does not refer to the patient's place of birth and does not require desensitization; moreover, the masked phrase no longer identifies a specific hospital and can be confused with the many people's hospitals in Sichuan Province;
3. Static matching rules require the person performing desensitization to enumerate every possible format of sensitive information in advance, and omissions are still possible when the text data takes diverse forms. For example, the patient's visit date may appear in the description of the condition written out in words as "May 6, 2023" rather than in a standard format such as 2023/5/6 or 2023/05/06, which makes it difficult to match statically.
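A minimal sketch of the kind of static pattern matching criticized above, assuming two hypothetical rules (a surname rule for "张" and a slash-separated date rule); neither the rules nor the example strings come from the patent. It reproduces the false positive of problem 1 and the miss of problem 3:

```python
import re

# Hypothetical static rules in the style criticized above (not from the patent):
# rule 1: the surname character 张 plus the next two characters is a person's name;
# rule 2: a date must look like 2023/5/6 or 2023/05/06.
NAME_RULE = re.compile(r"张..")
DATE_RULE = re.compile(r"\d{4}/\d{1,2}/\d{1,2}")

def static_mask(text: str) -> str:
    """Replace every span matched by a static rule with asterisks of equal length."""
    for rule in (NAME_RULE, DATE_RULE):
        text = rule.sub(lambda m: "*" * len(m.group()), text)
    return text

# Problem 1, false positive: "varicose vein phenomenon is obvious" contains 张现象.
print(static_mask("静脉曲张现象明显"))  # → 静脉曲***明显
# Problem 3, false negative: a date written out with characters is not matched.
print(static_mask("2023年5月6日就诊"))  # unchanged
# Only the anticipated slash format is caught.
print(static_mask("2023/05/06就诊"))
```

Both failures come from the rules looking only at surface characters, which is the gap the context-aware models in the following sections are meant to close.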
Disclosure of Invention
The present invention aims to solve the above technical problems at least to some extent. To this end, it provides a medical data desensitization method and system.
The technical scheme adopted by the invention is as follows:
A method of desensitizing medical data comprises the following steps:
S1, classifying the acquired medical data into text data and non-text data;
S2, extracting keywords from the text data, keeping the rest of the original text unchanged, and taking the extracted keywords together with the non-text data as the data to be desensitized;
S3, classifying the data to be desensitized into personal identity information, personal medical information, date information, address information and other information;
S4, desensitizing the classified information: personal identity information is encrypted, personal medical information and date information are fuzzified, address information is masked, leaving a partial text description, and other information is kept in its original form.
Preferably, in step S2, the Pointer Network model is used to modify the Attention mechanism of the Transformer-based BERT model, yielding a BERT-Pointer Network model; the BERT-Pointer Network model converts text information into word vectors based on context information and extracts the keywords in the text data.
Preferably, step S3 comprises: the BERT model converts text information into word vectors based on context information; the PCA model carries out principal component decomposition on the output result of the BERT model, combines similar medical fields and deletes irrelevant medical fields; and clustering the output result of the PCA model by using a clustering algorithm.
Preferably, a clustering algorithm computes the cosine distance between new data and each of the four types of data (personal identity information, personal medical information, date information and address information); if the distance to one type is the smallest and is below a preset threshold, the new data is assigned to that type; if the distances to all four types exceed the preset threshold, the new data is marked as other information.
Preferably, in step S1, the text data and the non-text data are classified according to the field names of the medical data.
A medical data desensitization system, comprising:
the data classification module is used for classifying the acquired medical data into text data and non-text data;
the sensitive word extraction module is used for extracting keywords from the text data, sending the extracted keywords to the field classification module, and keeping the rest of the original text unchanged;
the field classification module is used for classifying the non-text data and the keywords into personal identity information, personal medical information, date information, address information and other information;
the data desensitization module is used for desensitizing the information classified by the field classification module: personal identity information is encrypted, personal medical information and date information are fuzzified, address information is masked, leaving a partial text description, and other information is kept in its original form.
Preferably, the sensitive word extraction module includes:
a Pointer Network model, used for modifying the Attention mechanism of the Transformer-based BERT model to obtain the BERT-Pointer Network model;
and the BERT-Pointer Network model is used for converting text information into word vectors based on context information and extracting keywords.
Preferably, the field classification module comprises:
the BERT model is used for converting text information into word vectors based on context information;
the PCA model is used for carrying out principal component decomposition on the output result of the BERT model, combining similar medical fields and deleting irrelevant medical fields;
and the clustering model is used for clustering the output result of the PCA model.
Preferably, the medical data desensitization system further comprises an output module, used for writing the desensitized information back to its original position in the medical data.
The invention has the beneficial effects that:
1. The medical data desensitization method and system can desensitize medical data fully automatically: the user only needs to input the medical fields contained in the medical data, and desensitization can be performed on diverse medical data (including text data and non-text data).
2. The BERT-Pointer Network model adopted by the invention extracts sensitive keywords from medical text data for desensitization. By improving on the BERT model, it makes better use of the surrounding medical context when extracting sensitive keywords. Compared with the traditional pattern recognition algorithm, recognition accuracy is improved by 81% and recognition speed is improved 13-fold; compared with the BERT model, recognition accuracy is improved by 22%.
Drawings
FIG. 1 is a functional block diagram of a medical data desensitization system of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should also be noted that, in some embodiments, the functions/acts noted may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or may sometimes be executed in the reverse order, depending on the functionality/acts involved.
As shown in FIG. 1, the medical data desensitization method of this embodiment includes the following steps:
S1, classifying the acquired medical data into text data and non-text data according to the name of each field. As shown in Table 1, the fields named "name", "sex", "date of birth", "identification number", "body temperature" and "blood pressure" are non-text data, while the field named "medical condition description" is text data.
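Step S1 can be sketched as a lookup on field names. The set `TEXT_FIELDS` and the record values below are hypothetical; the text only states that the split is made by field name:

```python
# Hypothetical configuration: names of fields whose values are free text.
TEXT_FIELDS = {"medical condition description"}

def split_record(record: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split one record into (text_data, non_text_data) by field name."""
    text_data = {k: v for k, v in record.items() if k in TEXT_FIELDS}
    non_text_data = {k: v for k, v in record.items() if k not in TEXT_FIELDS}
    return text_data, non_text_data

# Illustrative record; the values are made up for the example.
record = {
    "name": "Zhang Qiang",
    "sex": "male",
    "body temperature": "36.8",
    "medical condition description": "Admitted on March 14, 2018 ...",
}
text_data, non_text_data = split_record(record)
print(sorted(text_data))      # ['medical condition description']
print(sorted(non_text_data))  # ['body temperature', 'name', 'sex']
```

Splitting by field name keeps S1 cheap and deterministic; only the text fields are passed on to the keyword extraction model in S2.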
TABLE 1 original medical data record sheet
S2, using the BERT-Pointer Network model to convert text information into word vectors based on context information, extracting the keywords in the text data, keeping the rest of the original text unchanged, and taking the extracted keywords together with the non-text data as the data to be desensitized. For the record in Table 1, the extracted keywords are "Zhang Qiang", "March 14, 2018" and "March 21".
In particular, medical text can often be segmented in more than one way. For example, "People's Hospital of South County, Sichuan Province" can be split into "South County, Sichuan Province" and "People's Hospital" and encoded separately, or encoded directly as a whole. To better process medical text with such complex structure, the BERT model is optimized and extended with a Pointer Network model. The BERT model is based mainly on the Attention mechanism; the Pointer Network model modifies this Attention mechanism within the Transformer framework, yielding the BERT-Pointer Network model. This overcomes the inability of the traditional seq2seq framework to handle output sequences whose length varies with the length of the input sequence; in addition, the new Attention mechanism incorporates context better during encoding and can resolve overlapping labels.
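The conventional Attention architecture is as follows:

$$u^{i}_{j} = v^{T}\tanh\left(W_{1} e_{j} + W_{2} d_{i}\right), \quad j \in \{1, \dots, n\}$$
$$a^{i}_{j} = \operatorname{softmax}\left(u^{i}_{j}\right), \quad j \in \{1, \dots, n\}$$
$$d'_{i} = \sum_{j=1}^{n} a^{i}_{j} e_{j}$$

where $e_j$ and $d_i$ are state quantities, $v$, $W_1$ and $W_2$ are learnable parameters, $a^{i}_{j}$ are the weights, and $n$ is the length of the input sequence. The Pointer Network model simplifies this architecture: it discards the third-layer weighting (the weighted sum $d'_i$) and uses the result of the softmax as a pointer to a specific element of the input sequence.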
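To trace the conventional (additive) attention step numerically, here is a pure-Python sketch of one decoder step i: scores u^i_j = v^T tanh(W1 e_j + W2 d_i), softmax weights a^i_j over j, and the weighted sum d'_i. The function names and the toy matrices are illustrative assumptions, not trained parameters:

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [x / total for x in exps]

def additive_attention(e, d_i, v, W1, W2):
    """One decoder step of conventional additive attention.

    e: encoder states e_1..e_n; d_i: decoder state at step i.
    Returns the scores u^i_j, the weights a^i_j and the context vector d'_i.
    """
    w2d = matvec(W2, d_i)
    # u^i_j = v^T tanh(W1 e_j + W2 d_i)
    u = [sum(vk * math.tanh(p + q) for vk, p, q in zip(v, matvec(W1, e_j), w2d)) for e_j in e]
    a = softmax(u)  # a^i_j, normalized over j
    dim = len(e[0])
    # Third step, d'_i = sum_j a^i_j e_j (the step the Pointer Network drops).
    d_prime = [sum(a_j * e_j[k] for a_j, e_j in zip(a, e)) for k in range(dim)]
    return u, a, d_prime

identity = [[1.0, 0.0], [0.0, 1.0]]
u, a, d_prime = additive_attention(
    [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0], [1.0, 1.0], identity, identity
)
print([round(x, 3) for x in a])
```

The first two toy encoder states receive equal, larger weights and the zero state receives the smallest weight, so d'_i is a blend dominated by the first two states.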
The improved Attention mechanism is given by:

$$u^{i}_{j} = v^{T}\tanh\left(W_{1} e_{j} + W_{2} d_{i}\right), \quad j \in \{1, \dots, n\}$$
$$p\left(C_{i} \mid C_{1}, \dots, C_{i-1}\right) = \operatorname{softmax}\left(u^{i}\right)$$

The output sequence of the Pointer Network model is drawn from the input sequence; $i$ ranges up to a preset maximum, which is the length of the output sequence; the score $u^{i}_{j}$ is the Attention mask for the $j$-th input vector; $T$ denotes matrix transposition; $e_j$ and $d_i$ are state quantities; $v$, $W_1$ and $W_2$ are learnable parameters; $C_1$, $C_{i-1}$ and $C_i$ are random variables, each denoting an item of the input sequence; $p$ denotes the probability distribution, and $p(C_i \mid C_1, \dots, C_{i-1})$ is the conditional probability of the $i$-th item given the first $i-1$ items.
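The Pointer Network variant can be sketched by stopping after the softmax: the distribution over input positions j is used directly, and the decoder points at the most probable input element. Greedy argmax decoding, the token list and the toy encoder states are assumptions for illustration, not the patent's trained BERT-Pointer Network:

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def pointer_distribution(e, d_i, v, W1, W2):
    """p(C_i | C_1..C_{i-1}) = softmax(u^i) over input positions j.

    Same scores as conventional attention, u^i_j = v^T tanh(W1 e_j + W2 d_i),
    but no weighted sum of encoder states is formed: the softmax itself is
    the pointer distribution.
    """
    w2d = matvec(W2, d_i)
    u = [sum(vk * math.tanh(p + q) for vk, p, q in zip(v, matvec(W1, e_j), w2d)) for e_j in e]
    m = max(u)
    exps = [math.exp(x - m) for x in u]
    total = sum(exps)
    return [x / total for x in exps]

def point(tokens, e, d_i, v, W1, W2):
    """Greedy step: return the input token with the highest pointer probability."""
    p = pointer_distribution(e, d_i, v, W1, W2)
    j = max(range(len(p)), key=p.__getitem__)
    return tokens[j], p

identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = ["patient", "Zhang", "Qiang", "admitted"]
encoder_states = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [0.0, 0.0]]  # toy values
token, p = point(tokens, encoder_states, [0.0, 0.0], [1.0, 1.0], identity, identity)
print(token)  # Zhang
```

Because the output is always an index into the input, the number of candidate outputs grows with the input length, which is the seq2seq limitation the text says the Pointer Network removes.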
The BERT-Pointer Network model can encode at two levels: "People's Hospital of South County, Sichuan Province" can be encoded as a whole, or "South County, Sichuan Province" and "People's Hospital" can be encoded separately. This resolves the segmentation ambiguity of medical text data.
S3, classifying the data to be desensitized into personal identity information (such as name and identification card number), personal medical information (such as age), date information, address information and other information.
Specifically, after some initialization processing, the BERT model converts text information into word vectors based on context information. Because the BERT model is based mainly on the Attention mechanism, it can run in parallel and has access to global information. The Attention function is defined as follows:

$$\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$$

where $Q$ (Query) represents the information present in the input text; $K$ (Key) represents content information, i.e. semantic information; the softmax term expresses the matching degree between Query and Key; $V$ (Value) represents the information itself, which is weighted by this matching degree; and $d_k$ is the dimension of the key vectors. The BERT model also incorporates position information, so the influence of context on the result is fully taken into account, and its output contains a probability distribution over character information of the same type.
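The Attention function can be checked with a small pure-Python sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, computed one query row at a time. Matrices are plain lists of lists and the inputs are toy values, not BERT weights:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [x / total for x in exps]

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v (lists of lists).
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Matching degree of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([sum(w * row[c] for w, row in zip(weights, V)) for c in range(len(V[0]))])
    return out

K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(scaled_dot_product_attention([[1.0, 0.0]], K, V))  # leans toward the first value row
print(scaled_dot_product_attention([[0.0, 0.0]], K, V))  # [[5.0, 5.0]]
```

A query that matches no key equally well (the zero query) spreads its weight uniformly, so the output is the plain average of the value rows.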
The PCA model performs principal component decomposition on the output of the BERT model, merging similar medical fields and deleting irrelevant ones. For example, the output of the BERT model may include: age, date of birth, cell phone number, time of the patient's visit, etc. After PCA processing, "cell phone number" is deleted as noise unrelated to medical services, and "age" and "date of birth" are automatically merged as similar medical fields.
The principal component decomposition of the PCA model seeks a set of orthonormal basis vectors such that, after the data points are projected onto the space spanned by this basis, the projected points are spread as far apart as possible:

$$\max_{W}\ \operatorname{tr}\left(W^{T} X X^{T} W\right) \quad \text{s.t.} \quad W^{T} W = I$$

where $X = (x_1, x_2, \dots, x_m)$ is the (centered) data and $W = (w_1, w_2, \dots, w_d)$. Applying the Lagrange multiplier method to this problem yields

$$X X^{T} w_i = \lambda_i w_i$$

where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$ are the sorted eigenvalues of the covariance matrix $X X^{T}$. The projection matrix $W^{*} = (w_1, w_2, \dots, w_k)$ formed by the eigenvectors corresponding to the first $k$ eigenvalues is selected and used to extract the principal components of the word vectors.
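A minimal pure-Python sketch of this PCA step, under stated assumptions: samples are rows (so the covariance is computed as X^T X / m on centered data, which has the same eigenvectors as the column-sample form XX^T above), and the top-k eigenvalues are distinct and nonzero. Power iteration with deflation stands in for a library eigen-solver; it is not necessarily how the patent computes the decomposition:

```python
import math
import random

def center_and_covariance(X):
    """Center the row-sample matrix X (m x d); return (d x d covariance, centered X)."""
    m, d = len(X), len(X[0])
    mean = [sum(x[c] for x in X) / m for c in range(d)]
    Xc = [[x[c] - mean[c] for c in range(d)] for x in X]
    cov = [[sum(r[a] * r[b] for r in Xc) / m for b in range(d)] for a in range(d)]
    return cov, Xc

def top_k_eigen(C, k, iters=500, seed=0):
    """Top-k (eigenvalue, eigenvector) pairs of a symmetric PSD matrix.

    Power iteration plus deflation; assumes the top-k eigenvalues are distinct
    and nonzero so that each power iteration converges.
    """
    rng = random.Random(seed)
    d = len(C)
    C = [row[:] for row in C]
    pairs = []
    for _ in range(k):
        w = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w_new = [sum(C[a][b] * w[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w_new)) or 1.0
            w = [x / norm for x in w_new]
        lam = sum(w[a] * sum(C[a][b] * w[b] for b in range(d)) for a in range(d))
        pairs.append((lam, w))
        # Deflation: subtract lam * w w^T so the next pass finds the next eigenvector.
        C = [[C[a][b] - lam * w[a] * w[b] for b in range(d)] for a in range(d)]
    return pairs  # descending eigenvalue order under the assumptions above

def pca_project(X, k):
    """Project the centered samples onto the first k principal axes w_1..w_k."""
    cov, Xc = center_and_covariance(X)
    pairs = top_k_eigen(cov, k)
    axes = [w for _, w in pairs]
    Z = [[sum(wi * xi for wi, xi in zip(w, x)) for w in axes] for x in Xc]
    return Z, [lam for lam, _ in pairs]

Z, lams = pca_project([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], k=1)
print(round(lams[0], 6), [round(z[0], 4) for z in Z])
```

For points on the diagonal, all variance lies along (1, 1)/sqrt(2), so a single principal component captures the data completely.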
The output of the PCA model is then clustered with the Single-Pass online text clustering algorithm. First, four categories of medical data are defined: personal identity information, personal medical information, date information and address information. Training data are provided for each category and processed by the BERT-PCA model, giving the vector-space representation of each data item together with its category. The Single-Pass algorithm then computes the cosine distance between new data and each of the four types; if the distance to one type is the smallest and is below a preset threshold, the new data is assigned to that type; if the distances to all four types exceed the preset threshold, the new data is marked as other information.
Specifically, corresponding medical field training data are provided for each of the four categories (personal identity information, personal medical information, date information and address information), for example: name, age, date of visit and pharmacy address.
The training data are fed into the BERT-PCA model, which vectorizes the text and normalizes the principal components, yielding a reference vector for each type of data, for example: {1 0 1 0 1 0 1 0}, {0 1 0 1 0 1 0 1}, {0 0 0 0 1 1 1 1} and {1 1 1 1 0 0 0 0}. These four vectors are the reference vectors of the four types of data.
When a new medical field is added, such as "date of surgery", it is first vectorized by the BERT-PCA model, for example as {1 1 1 0 1 0 0 0}. This new vector is then compared with each of the reference vectors above by computing their cosine distances; it is found to be very close to the date reference vector {1 1 1 1 0 0 0 0}, so "date of surgery" is marked as a medical field of the "date information" category.
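The assignment rule of S3 (nearest reference vector by cosine distance, falling back to "other information" above a threshold) can be sketched as follows. The 4-dimensional reference vectors and the threshold 0.3 are hypothetical, not the patent's values; the sketch covers only this assignment step, not the cluster-creation step of a full Single-Pass algorithm:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def assign_category(vec, references, threshold):
    """Nearest reference category if its distance is below threshold, else 'other information'."""
    best_cat, best_dist = None, float("inf")
    for cat, ref in references.items():
        dist = cosine_distance(vec, ref)
        if dist < best_dist:
            best_cat, best_dist = cat, dist
    return best_cat if best_dist < threshold else "other information"

# Hypothetical reference vectors, one per category.
references = {
    "personal identity information": [1.0, 0.0, 0.0, 0.0],
    "personal medical information": [0.0, 1.0, 0.0, 0.0],
    "date information": [0.0, 0.0, 1.0, 0.0],
    "address information": [0.0, 0.0, 0.0, 1.0],
}
print(assign_category([0.1, 0.0, 0.9, 0.0], references, threshold=0.3))  # date information
print(assign_category([0.5, 0.5, 0.5, 0.5], references, threshold=0.3))  # other information
```

The threshold is what routes fields that resemble no category to "other information", where they are kept unchanged in S4.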
S4, desensitizing the classified information. Personal identity information is encrypted with an encryption algorithm to prevent leakage of personal privacy information. Personal medical information is fuzzified by a purpose-designed randomization algorithm, which protects personal privacy while keeping the data usable for subsequent intelligent medical services. For example, for a patient's age, the system can add random noise of up to ±5% of the real age, which hides the real age while keeping the processed data close to the real data.
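A sketch of the age fuzzification, assuming uniform noise in [-5%, +5%] of the true age and rounding to a whole year; the text specifies the ±5% range but not the noise distribution or the rounding:

```python
import random

def fuzz_age(age, rel_noise=0.05, rng=random):
    """Add uniform noise in [-rel_noise*age, +rel_noise*age] and round to a whole year.

    The uniform distribution and the rounding are assumptions; only the
    +/-5% range comes from the text.
    """
    noisy = age + rng.uniform(-rel_noise * age, rel_noise * age)
    return round(noisy)

print(fuzz_age(60, rng=random.Random(7)))  # an integer in [57, 63]
```

Passing a seeded `random.Random` makes runs reproducible for testing; in production a fresh generator per run keeps the noise unpredictable.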
Date information includes the patient's visit date, operation date, CT scan date, etc., and is fuzzified according to the applicable laws and regulations. For example, only the year and month of the original data are kept and the day is randomized within the same month, so a visit date of March 14, 2018 may be fuzzified to March 11, 2018.
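A sketch of the date rule in the example above: keep year and month, and draw the day uniformly within that month (the uniform draw is an assumption). `calendar.monthrange` returns the number of days in the month as its second element, so the result is always a valid date:

```python
import calendar
import datetime
import random

def fuzz_date(d, rng=random):
    """Keep year and month; replace the day with a uniformly random day of that month."""
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=rng.randint(1, days_in_month))

print(fuzz_date(datetime.date(2018, 3, 14), rng=random.Random(0)))  # some day in March 2018
```

Leap years are handled automatically, since February 2020 yields 29 as the upper bound.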
Address information includes the patient's home address, the address of the pharmacy where medicine was bought, etc., and is masked according to the applicable laws and regulations. For example, only the province-level and city-level information in the original data is kept, and information below the city level (county, district, etc.) is masked. Thus, if a patient buys medicine at No. 29, Lane 9, Yongtai Road, South Village, Yuxiang Street, Yuhua District, Shijiazhuang City, Hebei Province, the pharmacy address is masked so that only "Hebei Province, Shijiazhuang City" remains.
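A sketch of province/city-level masking for Chinese addresses. It assumes addresses follow the "<province>省<city>市<lower levels>" convention with an optional province, so municipalities such as 北京市 also work; autonomous regions and other formats are not handled, and the mask string `***` is a hypothetical choice:

```python
import re

# Assumed convention: optional "<province>省", then "<city>市", then lower levels.
PROVINCE_CITY = re.compile(r"^((?:[^省市]+省)?[^省市]+市)")

def mask_address(address: str, mask: str = "***") -> str:
    """Keep the province and city prefix; replace everything below city level with mask."""
    m = PROVINCE_CITY.match(address)
    if m is None:
        return mask  # no recognizable city level: mask the whole address
    kept = m.group(1)
    return kept + (mask if len(kept) < len(address) else "")

print(mask_address("河北省石家庄市裕华区"))  # 河北省石家庄市***
print(mask_address("北京市朝阳区"))          # 北京市***
```

Falling back to masking the whole string when no city level is found errs on the side of privacy rather than leaking an unparsed address.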
Other information is kept as is. After desensitization, the medical data in Table 1 become those shown in Table 2.
TABLE 2 medical data record sheet after desensitization treatment
Finally, the processed medical data can be written into a desensitized medical health database for access by intelligent medical developers; this database contains no private information of patients or doctors.
This embodiment also provides a medical data desensitization system that uses the above method. It comprises an acquisition module, a data classification module, a sensitive word extraction module, a field classification module, a data desensitization module and an output module. The acquisition module acquires medical data; the data classification module classifies the acquired medical data into text data and non-text data; the sensitive word extraction module extracts keywords from the text data, sends them to the field classification module, and keeps the rest of the original text unchanged; the field classification module classifies the non-text data and the keywords into personal identity information, personal medical information, date information, address information and other information; the data desensitization module desensitizes the classified information: personal identity information is encrypted, personal medical information and date information are fuzzified, address information is masked, leaving a partial text description, and other information is kept in its original form. The output module writes the desensitized information back to its original position in the medical data.
The sensitive word extraction module comprises a Pointer Network model and a BERT-Pointer Network model: the Pointer Network model is used to modify the Attention mechanism of the Transformer-based BERT model, yielding the BERT-Pointer Network model, and the BERT-Pointer Network model converts text information into word vectors based on context information and extracts keywords.
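How the modules fit together can be sketched with plain callables standing in for the trained models. The function `desensitize_record`, the stand-in classifier and handlers, and the rule that non-text fields are classified by field name while extracted keywords are classified by their own text are all assumptions made for illustration:

```python
from typing import Callable

Record = dict[str, str]
Handler = Callable[[str], str]

def desensitize_record(
    record: Record,
    is_text_field: Callable[[str], bool],          # data classification module (stand-in)
    extract_keywords: Callable[[str], list[str]],  # sensitive word extraction module (stand-in)
    classify: Callable[[str], str],                # field classification module (stand-in)
    handlers: dict[str, Handler],                  # data desensitization module, one handler per category
) -> Record:
    """Desensitize one record and write each result back to its original field (output module)."""
    out: Record = {}
    for field, value in record.items():
        if is_text_field(field):
            # Only the extracted keywords are desensitized; the surrounding text is kept.
            for kw in extract_keywords(value):
                value = value.replace(kw, handlers[classify(kw)](kw))
            out[field] = value
        else:
            # Non-text values are desensitized according to their field's category.
            out[field] = handlers[classify(field)](value)
    return out

# Toy stand-ins (hypothetical, for illustration only).
toy_handlers = {
    "personal identity information": lambda s: "*" * len(s),
    "date information": lambda s: "<date>",
    "other information": lambda s: s,
}

def toy_classify(s: str) -> str:
    if s in {"name", "Zhang Qiang"}:
        return "personal identity information"
    if s == "2018-03-14":
        return "date information"
    return "other information"

record = {"name": "Zhang Qiang", "sex": "male", "description": "Zhang Qiang admitted 2018-03-14."}
out = desensitize_record(
    record,
    is_text_field=lambda f: f == "description",
    extract_keywords=lambda t: [w for w in ("Zhang Qiang", "2018-03-14") if w in t],
    classify=toy_classify,
    handlers=toy_handlers,
)
print(out)
```

Keeping each module behind a callable makes it possible to swap a rule-based stand-in for the BERT-based models without changing the orchestration.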
The field classification module comprises a BERT model, a PCA model and a clustering model, wherein the BERT model is used for converting text information into word vectors based on context information; the PCA model is used for carrying out principal component decomposition on an output result of the BERT model, combining similar medical fields and deleting irrelevant medical fields; the clustering model is used for clustering the output result of the PCA model.
The invention is not limited to the above optional embodiments. Anyone may derive products in various other forms in light of the present invention, and any change in shape or structure that falls within the scope defined by the claims falls within the protection scope of the present invention.

Claims (10)

1. A method of desensitizing medical data, comprising the steps of:
S1, classifying the acquired medical data into text data and non-text data;
S2, extracting keywords from the text data, keeping the rest of the original text unchanged, and taking the extracted keywords together with the non-text data as the data to be desensitized;
S3, classifying the data to be desensitized into personal identity information, personal medical information, date information, address information and other information;
S4, desensitizing the classified information: personal identity information is encrypted, personal medical information and date information are fuzzified, address information is masked, leaving a partial text description, and other information is kept in its original form.
2. The medical data desensitization method according to claim 1, wherein in step S2, the Pointer Network model is used to modify the Attention mechanism of the Transformer-based BERT model to obtain a BERT-Pointer Network model; the BERT-Pointer Network model converts text information into word vectors based on context information and extracts the keywords in the text data.
3. The medical data desensitization method according to claim 2, wherein the improved Attention mechanism is given by:

$$u^{i}_{j} = v^{T}\tanh\left(W_{1} e_{j} + W_{2} d_{i}\right), \quad j \in \{1, \dots, n\}$$
$$p\left(C_{i} \mid C_{1}, \dots, C_{i-1}\right) = \operatorname{softmax}\left(u^{i}\right)$$

wherein the output sequence of the Pointer Network model is drawn from the input sequence; $i$ ranges up to a preset maximum, which is the length of the output sequence; $n$ is the length of the input sequence; the score $u^{i}_{j}$ is the Attention mask for the $j$-th input vector; $T$ denotes matrix transposition; $e_j$ and $d_i$ are state quantities; $v$, $W_1$ and $W_2$ are learnable parameters; $C_1$, $C_{i-1}$ and $C_i$ are random variables, each denoting an item of the input sequence; $p$ denotes the probability distribution, and $p(C_i \mid C_1, \dots, C_{i-1})$ is the conditional probability of the $i$-th item given the first $i-1$ items.
4. A method of desensitizing medical data according to claim 1, wherein step S3 comprises: the BERT model converts text information into word vectors based on context information; the PCA model carries out principal component decomposition on the output result of the BERT model, combines similar medical fields and deletes irrelevant medical fields; and clustering the output result of the PCA model by using a clustering algorithm.
5. The medical data desensitization method according to claim 4, wherein a clustering algorithm computes the cosine distance between new data and each of the four types of data (personal identity information, personal medical information, date information and address information); if the distance to one type is the smallest and is below a preset threshold, the new data is assigned to that type; if the distances to all four types exceed the preset threshold, the new data is marked as other information.
6. A method of desensitizing medical data according to claim 1, wherein in step S1, textual data and non-textual data are classified according to the name of each field of the medical data.
7. A medical data desensitization system, comprising:
the data classification module is used for classifying the acquired medical data into text data and non-text data;
the sensitive word extraction module is used for extracting keywords from the text data, sending the extracted keywords to the field classification module, and keeping the rest of the original text unchanged;
the field classification module is used for classifying the non-text data and the keywords into personal identity information, personal medical information, date information, address information and other information;
the data desensitization module is used for desensitizing the information classified by the field classification module: personal identity information is encrypted, personal medical information and date information are fuzzified, address information is masked, leaving a partial text description, and other information is kept in its original form.
8. The medical data desensitization system according to claim 7, wherein the sensitive word extraction module includes:
a Pointer Network model, used for modifying the Attention mechanism of the Transformer-based BERT model to obtain the BERT-Pointer Network model;
and the BERT-Pointer Network model is used for converting text information into word vectors based on context information and extracting keywords.
9. The medical data desensitization system according to claim 7, wherein the field classification module includes:
the BERT model is used for converting text information into word vectors based on context information;
the PCA model is used for carrying out principal component decomposition on the output result of the BERT model, combining similar medical fields and deleting irrelevant medical fields;
and the clustering model is used for clustering the output result of the PCA model.
10. The medical data desensitization system according to claim 7, further comprising an output module, used for writing the desensitized information back to its original position in the medical data.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310199626.9A CN115859372B (en) 2023-03-04 2023-03-04 Medical data desensitization method and system

Publications (2)

Publication Number Publication Date
CN115859372A true CN115859372A (en) 2023-03-28
CN115859372B CN115859372B (en) 2023-04-25

Family

ID=85659891

Country Status (1)

Country Link
CN (1) CN115859372B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117112858A (en) * 2023-10-24 2023-11-24 武汉博特智能科技有限公司 Object screening method based on association rule mining, processor and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050187796A1 (en) * 1999-06-23 2005-08-25 Visicu, Inc. System and method for displaying a health status of hospitalized patients
CN107145799A (en) * 2017-05-04 2017-09-08 山东浪潮云服务信息科技有限公司 Data desensitization method and device
CN109784015A (en) * 2018-12-27 2019-05-21 腾讯科技(深圳)有限公司 Authentication and identification method and device
CN110135189A (en) * 2019-04-28 2019-08-16 上海市第六人民医院 Patient privacy information desensitization method for medical text
CN110289059A (en) * 2019-06-13 2019-09-27 北京百度网讯科技有限公司 Medical data processing method, device, storage medium and electronic equipment
CN113065330A (en) * 2021-03-22 2021-07-02 四川大学 Method for extracting sensitive information from unstructured data
CN114595689A (en) * 2022-02-28 2022-06-07 深圳依时货拉拉科技有限公司 Data processing method, data processing device, storage medium and computer equipment
CN115186051A (en) * 2022-03-08 2022-10-14 马上消费金融股份有限公司 Sensitive word detection method and device and computer readable storage medium
CN115188440A (en) * 2021-12-31 2022-10-14 阳江市人民医院 Intelligent matching method for similar medical records
CN115618371A (en) * 2022-07-11 2023-01-17 上海期货信息技术有限公司 Desensitization method and device for non-text data and storage medium
CN115687980A (en) * 2022-11-11 2023-02-03 中国农业银行股份有限公司 Desensitization classification method of data table, and classification model training method and device

Non-Patent Citations (5)

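Title
Pedro J. Rosa et al.: "Effects of fear-relevant stimuli on attention: Integrating gaze data with subliminal exposure" *
唐迪 (Tang Di) et al.: "Development trends of data desensitization technology" *
张勇 (Zhang Yong): "Research and implementation of finance-oriented text analysis and summary generation technology" *
谢沂林 (Xie Yilin) et al.: "Electronic medical record storage method based on graph database" *
郑旭如 (Zheng Xuru): "Research on data desensitization based on deep learning" *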

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117112858A (en) * 2023-10-24 2023-11-24 武汉博特智能科技有限公司 Object screening method based on association rule mining, processor and storage medium
CN117112858B (en) * 2023-10-24 2024-02-02 武汉博特智能科技有限公司 Object screening method based on association rule mining, processor and storage medium

Also Published As

Publication number Publication date
CN115859372B (en) 2023-04-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information
Inventor after: Li Rui, Hu Qitong, Liu Ruihua, Zheng Mingyang
Inventor before: Li Rui, Hu Qitong, Liu Ruihua, Zheng Mingyang, Tang Xuewen