CN113257375A - Method for desensitizing sudden acute infectious disease data - Google Patents
Method for desensitizing sudden acute infectious disease data
- Publication number
- CN113257375A (application number CN202110516944.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- desensitization
- file
- field
- mode
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Landscapes
- Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- Theoretical Computer Science (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Public Health (AREA)
- Primary Health Care (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Epidemiology (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention relates to a method for desensitizing data on sudden acute infectious diseases, comprising the following steps: 101. inputting file data: clicking the control for selecting the original data and selecting the file to be desensitized; 102. field desensitization setting: setting a data saving mode, selecting the fields to be desensitized, and selecting a desensitization mode; 103. outputting the desensitized file data: setting an output path and starting data desensitization, after which the desensitization result file is displayed under the target path. The desensitization mode can be chosen freely to suit the data characteristics of different sudden acute infectious diseases, and the required data can be selected and extracted from the complete data table to generate a new table, so that desensitization, extraction, and arrangement of the analysis data are completed quickly while the original data remain intact.
Description
Technical Field
The invention belongs to the technical field of medical equipment, and particularly relates to a data desensitization method for sudden acute infectious diseases, which desensitizes in batches data that may contain patient privacy information or that are irrelevant to statistical analysis.
Background
Etiological data and epidemiological survey data are the basis for statistical analysis of sudden acute infectious diseases. Most of these data, however, involve private information, so sensitive data must be desensitized before analysis. At present, etiological or epidemiological survey data obtained from laboratory testing are usually uploaded to the epidemic data monitoring network developed by the national disease control authority, which desensitizes the sensitive data. In general, the data are arranged into an Excel sheet, an authorized account is used to log in to the epidemic reporting system, and the data are entered and uploaded manually. The system automatically hides personal information such as names, identity card numbers, and home addresses, and generates a unique number for each record, by which the complete original record can be found in the database. However, the case information accepted by the national epidemic reporting system is fixed, and desensitization is not customized to different types of infectious disease.
Laboratory data on sudden acute infectious diseases and epidemiological survey data generally contain patients' private information and cannot be analyzed or used directly. At present, desensitization methods for laboratory pathogen data and epidemiological survey data are not mature.
At present, local disease control centers and laboratories still pick out the relevant data manually when statistically analyzing part of a data set, which increases the workload of researchers and easily leads to poor data consistency. In addition, although sensitive data have been processed in the data downloaded from the epidemic data monitoring system, the data format is fixed: the data are often redundant for statistical analysis, which hinders the analysis, and they are incompatible with other data (such as weather cloud pictures, air detection indexes, logistics information, or geospatial information) in multi-scale computation.
Disclosure of Invention
To solve the above problems, a primary object of the present invention is to provide a method for desensitizing sudden acute infectious disease data that gives researchers a simple and visual means of processing data and improves their work efficiency while ensuring information security.
Another object of the invention is to provide a method for desensitizing sudden acute infectious disease data that desensitizes infectious disease data stored as csv or Excel files and provides safe and effective data to the units that use them.
Still another object of the invention is to provide a method for desensitizing sudden acute infectious disease data in which the generated data format is compatible with other data in computation.
In order to achieve the above object, the technical solution of the present invention is as follows.
A method for desensitizing sudden acute infectious disease data, the method comprising the following steps:
101. Inputting file data: clicking the control for selecting the original data and selecting the file to be desensitized;
the file to be desensitized is a csv or Excel file; desensitizing infectious disease data stored as csv or Excel files allows safe and effective data to be provided to users quickly and conveniently.
102. Field desensitization setting: setting a data saving mode, selecting the fields to be desensitized, and selecting a desensitization mode;
the data saving mode is either batch saving or merged saving.
Desensitization rules fall into two classes: (1) recoverable, meaning that the desensitized data can be restored to the original sensitive data by some means; (2) unrecoverable, meaning that the desensitized part of the data cannot be restored by any means.
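The two classes of rule can be illustrated with a minimal Python sketch. It assumes base64 as the recoverable transform (the algorithm the description names for the encrypted field mode) and full masking as the unrecoverable one; the function names `encode_field`, `decode_field`, and `mask_field` are hypothetical and do not come from the patent.

```python
import base64


def encode_field(value: str) -> str:
    """Recoverable: base64-encode the UTF-8 bytes of the value."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_field(token: str) -> str:
    """Inverse of encode_field: restores the original text."""
    return base64.b64decode(token.encode("ascii")).decode("utf-8")


def mask_field(value: str, mask_char: str = "*") -> str:
    """Unrecoverable: every character is replaced, only the length survives."""
    return mask_char * len(value)


name = "Zhang San"  # illustrative value only
token = encode_field(name)
print(token)                        # encoded form written to the result file
print(decode_field(token) == name)  # the original can be restored
print(mask_field(name))             # the original cannot be restored
```

Because base64 needs no key, anyone holding the output can apply `decode_field`; this is the sense in which the description says the encryption method itself has to be protected.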
There are four desensitization modes: original text output, character replacement, field encryption, and hiding.
Specifically, the raw data are read in batches, the data obtained are organized into a list of fields, and the result file is generated using one of the following modes:
Original text output mode: the data read are only concatenated and written to the result file unchanged. This mode is mainly for data items that do not involve personal privacy and are needed for analysis.
Character replacement mode: for each record read, the content of the field marked for replacement is replaced and output as masking characters (for example "*"), which hides the sensitive information (the original text cannot be restored from the output); all fields are then concatenated, and the complete record with the replaced characters is written to the file. This mode is mainly for data items such as names, identity card numbers, and home addresses.
Encrypted field mode: for each record read, the field marked for encryption is encrypted in full by a specific encryption algorithm and output as a fully encrypted string; all fields are then concatenated, with the encrypted string taking the place of the original field, and written to the file. This mode can hide data items such as names, identity card numbers, and home addresses. Compared with character replacement, it makes it easier to match records to the corresponding patient in the analysis stage and to mine more related data. Its disadvantage is that the output can be decrypted, so the encryption method must be protected.
Hide-this-item mode: the data item is not read and is not written to the generated file. This mode is mainly for data items that are redundant for the analysis.
Further, the encryption and decryption algorithm of the "encrypted field" mode uses standard base64 encoding, which can be decoded with a corresponding decoder, so this desensitization is recoverable; the replacement algorithm of the "character replacement" mode replaces the part to be desensitized with defined characters or character strings; the "original text output" and "hide this item" modes concatenate the data read and generate a new data file. Original text output, character replacement, and hiding are unrecoverable desensitization.
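The four modes can be sketched per record as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the mode constants, the function names `desensitize_row` and `desensitize_csv`, the fixed mask string, and the sample column names are all hypothetical; only the use of base64 for the encrypted field mode is taken from the description.

```python
import base64
import csv
import io

# Hypothetical labels for the four modes named in the description.
ORIGINAL, REPLACE, ENCRYPT, HIDE = "original", "replace", "encrypt", "hide"


def desensitize_row(row, modes, mask="***"):
    """Apply the mode chosen for each field; unlisted fields pass through."""
    out = {}
    for field, value in row.items():
        mode = modes.get(field, ORIGINAL)
        if mode == HIDE:
            continue  # the item is not written at all
        if mode == REPLACE:
            out[field] = mask  # unrecoverable masking
        elif mode == ENCRYPT:
            out[field] = base64.b64encode(value.encode("utf-8")).decode("ascii")
        else:
            out[field] = value  # original text output
    return out


def desensitize_csv(text, modes):
    """Read CSV text, desensitize every record, and return new CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    kept = [f for f in reader.fieldnames if modes.get(f, ORIGINAL) != HIDE]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=kept, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(desensitize_row(row, modes))
    return buffer.getvalue()


sample = (
    "name,id_number,onset_date,remark\n"
    "Zhang San,110101199001011234,2021-03-01,none\n"
)
modes = {"name": REPLACE, "id_number": ENCRYPT, "remark": HIDE}
print(desensitize_csv(sample, modes))
```

In this sketch the input text is never modified; a new table is built from it, which mirrors the stated goal of keeping the original data complete.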
103. Outputting desensitized file data: setting an output path and starting data desensitization; after desensitization finishes, the desensitization result file is displayed under the target path.
Specifically, in this step an output path is set, data desensitization is started once the path has been selected, and a progress bar shows the conversion process. After the conversion finishes, the result file is displayed under the target path.
The invention has the beneficial effects that:
The desensitization mode can be chosen freely to suit the data characteristics of different sudden acute infectious diseases, and the required data can be selected and extracted from the complete data table to generate a new table, so that desensitization, extraction, and arrangement of the analysis data are completed quickly while the original data remain intact.
In addition, the invention is developed for the computational requirements of multiple metadata and multiple spatiotemporal scales, so that the generated data format is compatible with other data (such as weather cloud pictures, air detection indexes, logistics information, or geospatial information) in computation.
Drawings
FIG. 1 is a schematic diagram of data desensitization implemented by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention discloses a method for desensitizing sudden acute infectious disease data, which comprises the following steps:
101. Inputting file data: clicking the control for selecting the original data and selecting the file to be desensitized;
the file to be desensitized must be a csv or Excel file; desensitizing infectious disease data stored as csv or Excel files allows safe and effective data to be provided to the units that use them quickly and conveniently.
102. Field desensitization setting: setting a data saving mode, selecting the fields to be desensitized, and selecting a desensitization mode;
the data saving mode is either batch saving or merged saving.
There are four desensitization modes: original text output, character replacement, field encryption, and hiding.
103. Outputting desensitized file data: setting an output path and starting data desensitization; after desensitization finishes, the desensitization result file is displayed under the target path.
The specific implementation is shown in fig. 1.
2.1 Inputting file data. Clicking the button for selecting the original data opens a path selection window, in which the csv or Excel input file to be desensitized is selected. The list on the right then shows all field names.
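Populating the field list from a csv input can be sketched as below. The sketch assumes that the first row of the file holds the field names; the function name `list_field_names` and the sample column names are hypothetical. Reading Excel files would need a third-party library and is not shown.

```python
import csv
import io


def list_field_names(csv_file):
    """Return the field names from the header row of an open CSV file."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty list if the file has no rows
    return [name.strip() for name in header]


# In-memory stand-in for a file chosen in the path selection window.
sample = io.StringIO(
    "case_id,name,id_number,home_address,onset_date\n"
    "1,Zhang San,110101199001011234,Example Road 1,2021-03-01\n"
)
print(list_field_names(sample))
```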
2.2 Field desensitization setting. First, the data saving mode (batch saving or merged saving) is set on the left; then the fields to be desensitized are selected in the list on the right, and a desensitization mode (original text output, character replacement, encrypted field, or hide this item) is set.
Original text output mode: the data read are only concatenated and written to the result file unchanged. This mode is mainly for data items that do not involve personal privacy and are needed for analysis.
Character replacement mode: for each record read, the content of the field marked for replacement is replaced and output as masking characters (for example "*"), which hides the sensitive information (the original text cannot be restored from the output); all fields are then concatenated, and the complete record with the replaced characters is written to the file. This mode is mainly for data items such as names, identity card numbers, and home addresses.
Encrypted field mode: for each record read, the field marked for encryption is encrypted in full by a specific encryption algorithm and output as a fully encrypted string; all fields are then concatenated, with the encrypted string taking the place of the original field, and written to the file. This mode can hide data items such as names, identity card numbers, and home addresses. Compared with character replacement, it makes it easier to match records to the corresponding patient in the analysis stage and to mine more related data. Its disadvantage is that the output can be decrypted, so the encryption method must be protected.
Hide-this-item mode: the data item is not read and is not written to the generated file. This mode is mainly for data items that are redundant for the analysis.
The encryption and decryption algorithm of the "encrypted field" mode uses standard base64 encoding, which can be decoded with a corresponding decoder, so this desensitization is recoverable; the replacement algorithm of the "character replacement" mode replaces the part to be desensitized with defined characters or character strings; the "original text output" and "hide this item" modes concatenate the data read and generate a new data file. Original text output, character replacement, and hiding are unrecoverable desensitization.
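The description says that character replacement substitutes "the part needing desensitization" with defined characters, without fixing which part that is. One possible reading, offered here only as an assumption, is partial masking that keeps a few leading and trailing characters. The function name `mask_middle` and its default parameters are hypothetical.

```python
def mask_middle(value, keep_head=3, keep_tail=4, mask_char="*"):
    """Mask the middle of a string, keeping a few characters at each end.

    Values too short to keep both ends are masked completely.
    """
    if len(value) <= keep_head + keep_tail:
        return mask_char * len(value)
    hidden = len(value) - keep_head - keep_tail
    tail = value[len(value) - keep_tail:]  # empty string when keep_tail is 0
    return value[:keep_head] + mask_char * hidden + tail


print(mask_middle("110101199001011234"))  # made-up 18-character ID number
print(mask_middle("Li"))                  # short values are masked in full
```

Keeping part of the value lets records still be grouped or spot-checked, at the cost of leaking those characters; masking the whole field, as the character replacement paragraph describes, leaks nothing but the fact that a value exists.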
2.3 Outputting the desensitized file data. An output path is set and data desensitization is started with a click; a progress bar pops up to show the conversion process. After the conversion finishes, the result file is displayed under the target path.
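The output step, together with the two saving modes, can be sketched as follows. The patent does not define batch and merged saving, so this sketch assumes that batch saving writes one result file per input table and merged saving writes all records into a single file; it also assumes that all tables share the same fields when merged. The function name `save_results`, the file naming scheme, and the text progress counter (standing in for the progress bar) are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def save_results(tables, output_dir, merged=False):
    """Write already-desensitized tables (name -> list of dict rows) as CSV.

    merged=False: one file per table (assumed meaning of batch saving).
    merged=True: all rows in one file (assumed meaning of merged saving).
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    if merged:
        groups = {"merged": [row for rows in tables.values() for row in rows]}
    else:
        groups = tables
    total = sum(len(rows) for rows in groups.values())
    done = 0
    written = []
    for name, rows in groups.items():
        if not rows:
            continue  # nothing to write for an empty table
        path = output_dir / f"{name}_desensitized.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            for row in rows:
                writer.writerow(row)
                done += 1
                print(f"\rprogress: {done}/{total}", end="")
        written.append(path)
    print()
    return written


tables = {
    "lab_a": [{"name": "***", "onset_date": "2021-03-01"}],
    "lab_b": [{"name": "***", "onset_date": "2021-03-02"}],
}
with tempfile.TemporaryDirectory() as target:
    for path in save_results(tables, target, merged=False):
        print(path.name)
```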
The invention can serve as a batch desensitization tool for csv and Excel files; any field can be selected for desensitization, and a secret key can also be configured for encrypting and restoring field information.
To desensitize, a csv or Excel file is selected; the system automatically extracts the data items of the infectious disease data through structured extraction of attribute information and lists them, and one of the four output modes ("original text output", "hide this item", "character replacement", and "encrypted field") is selected in the interface. Clicking to start desensitization then produces the desensitized result file, which gives researchers an intuitive and concise way to desensitize laboratory data.
In summary, the desensitization mode can be chosen freely to suit the data characteristics of different sudden acute infectious diseases, and the required data can be selected and extracted from the complete data table to generate a new Excel table, so that desensitization, extraction, and arrangement of the analysis data are completed quickly while the original data remain intact.
The method is developed for the computational requirements of multiple metadata and multiple spatiotemporal scales, so that the generated data format is compatible with other data (such as weather cloud pictures, air detection indexes, logistics information, or geospatial information) in computation.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (7)
1. A method for desensitizing data of emergent acute infectious diseases, which is characterized by comprising the following steps:
101. inputting file data: clicking the control for selecting the original data and selecting the file to be desensitized;
102. field desensitization setting: setting a data saving mode, selecting the fields to be desensitized, and selecting a desensitization mode;
103. outputting desensitized file data: setting an output path and starting data desensitization, and displaying the desensitization result file under the target path after desensitization finishes.
2. The method for desensitizing sudden acute infectious disease data according to claim 1, wherein in step 101, the desensitization file is a csv or excel file.
3. The method for desensitizing sudden acute infectious disease data according to claim 1, wherein in step 102 the data saving mode is either batch saving or merged saving.
4. The method for desensitizing sudden acute infectious disease data according to claim 1, wherein in step 102 there are four desensitization modes: original text output, character replacement, encrypted field, and hiding.
5. The method for desensitizing sudden acute infectious disease data according to claim 4, wherein the four desensitization modes are:
original text output mode: the data read are only concatenated and written to the result file unchanged, mainly for data items that do not involve personal privacy and are needed for analysis;
character replacement mode: for each record read, the content of the field marked for replacement is replaced and output in replaced form, hiding the sensitive information; all fields are then concatenated, and the complete record with the replaced characters is written to the file;
encrypted field mode: for each record read, the field marked for encryption is encrypted in full by an encryption and decryption algorithm and output as a fully encrypted string; all fields are then concatenated, with the encrypted string taking the place of the original field, and written to the file;
hide-this-item mode: the data item is not read and is not written to the generated file.
6. The method for desensitizing sudden acute infectious disease data according to claim 5, wherein the encryption and decryption algorithm of the encrypted field mode uses standard base64 encoding, which can be decoded with a corresponding decoder, so that this desensitization is recoverable; the replacement algorithm of the character replacement mode replaces the part to be desensitized with defined characters or character strings; the original text output and hiding modes concatenate the data read and generate a new data file; and original text output, character replacement, and hiding are unrecoverable desensitization.
7. The method for desensitizing sudden acute infectious disease data according to claim 1, wherein in step 103 an output path is set, data desensitization is started after the output path has been selected, a progress bar shows the conversion process, and the result file is displayed under the target path after the conversion finishes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110516944.4A CN113257375A (en) | 2021-05-12 | 2021-05-12 | Method for desensitizing sudden acute infectious disease data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113257375A true CN113257375A (en) | 2021-08-13 |
Family
ID=77222987
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110516944.4A Pending CN113257375A (en) | 2021-05-12 | 2021-05-12 | Method for desensitizing sudden acute infectious disease data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113257375A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008029419A (en) * | 2006-07-26 | 2008-02-14 | Fujifilm Corp | Data management device for diagnostic reading and data management method for diagnostic reading |
CN107145799A (en) * | 2017-05-04 | 2017-09-08 | 山东浪潮云服务信息科技有限公司 | A kind of data desensitization method and device |
CN109033873A (en) * | 2018-07-19 | 2018-12-18 | 四川长虹智慧健康科技有限公司 | A kind of data desensitization method preventing privacy compromise |
CN110008751A (en) * | 2019-04-11 | 2019-07-12 | 中国联合网络通信集团有限公司 | A kind of data desensitization method and system |
CN111506808A (en) * | 2020-02-23 | 2020-08-07 | 北京三快在线科技有限公司 | User data processing method, two-dimensional code display method, system and device |
CN111899893A (en) * | 2020-09-29 | 2020-11-06 | 南京汉卫公共卫生研究院有限公司 | Infectious disease early warning decision platform system |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115114669A (en) * | 2022-08-23 | 2022-09-27 | 山东双仁信息技术有限公司 | Interactive content desensitization method and system |
CN115114669B (en) * | 2022-08-23 | 2023-02-10 | 山东双仁信息技术有限公司 | Interactive content desensitization method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||