CN113257375A - Method for desensitizing sudden acute infectious disease data - Google Patents

Method for desensitizing sudden acute infectious disease data

Info

Publication number
CN113257375A
Authority
CN
China
Prior art keywords
data
desensitization
file
field
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110516944.4A
Other languages
Chinese (zh)
Inventor
王娇
王慧
李颖
刘宏图
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Institute for Viral Disease Control and Prevention Chinese Center for Disease Control and Prevention
Original Assignee
National Institute for Viral Disease Control and Prevention Chinese Center for Disease Control and Prevention
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention
Priority to CN202110516944.4A
Publication of CN113257375A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Epidemiology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to a method for desensitizing sudden acute infectious disease data, which comprises the following steps: 101. inputting file data: clicking the button for selecting original data and selecting the file to be desensitized; 102. field desensitization setting: setting a data saving mode, selecting the fields to desensitize, and selecting a desensitization mode; 103. outputting desensitized file data: setting an output path, starting data desensitization, and displaying the desensitization result file under the target path after desensitization is finished. The invention allows the desensitization mode to be chosen freely according to the data characteristics of different sudden acute infectious diseases, can select and extract useful data from a complete data table to generate a new table, and quickly completes desensitization, extraction and arrangement of analysis data while keeping the original data intact.

Description

Method for desensitizing sudden acute infectious disease data
Technical Field
The invention belongs to the technical field of medical equipment and in particular relates to a method for desensitizing sudden acute infectious disease data. The method performs batch desensitization of data items that may carry private patient information or that are irrelevant to statistical analysis.
Background
Etiological data and epidemiological survey data are the basis for statistical analysis of sudden acute infectious diseases. Most of these data involve private information, however, so sensitive items must be desensitized before analysis. At present, etiological or epidemiological survey data obtained from laboratory testing are usually uploaded to the epidemic data monitoring network developed by the national disease control authority, which desensitizes the sensitive data. In the general process, the data are arranged into an Excel sheet, an authorized account is used to log in to the epidemic reporting system, and the data are entered and uploaded manually. The system automatically hides personal information such as names, identity card numbers and home addresses, and generates a unique number for each record so that the complete original record can be retrieved from the database by that number. The case information uploaded through the national epidemic reporting system has a fixed format, and desensitization is not customized to the type of infectious disease.
Laboratory data and epidemiological investigation data on sudden acute infectious diseases generally contain private patient information and cannot be analyzed or used directly. Desensitization methods for laboratory pathogen data and epidemiological investigation data are not yet mature.
Local disease control centers and laboratories still pick out the useful data by hand when performing statistical analysis on part of the data, which increases the workload of researchers and easily leads to poor data consistency. In addition, although sensitive items have been processed in the data downloaded from the epidemic data monitoring system, the data format is fixed. Redundant data often appear during statistical analysis, which hinders the analysis, and the data become incompatible when combined in multi-scale computation with other data (such as weather cloud images, air quality indices, logistics information or geospatial information).
Disclosure of Invention
To solve the above problems, a primary object of the present invention is to provide a method for desensitizing sudden acute infectious disease data that gives researchers a simple, visual means of processing data and improves their work efficiency while ensuring information security.
Another object of the invention is to provide a method for desensitizing sudden acute infectious disease data that desensitizes infectious disease data stored as csv or excel files and provides safe and effective data to the units that use them.
Still another object of the invention is to provide a method for desensitizing sudden acute infectious disease data in which the generated data format is compatible with other data during computation.
In order to achieve the above object, the technical solution of the present invention is as follows.
A method for desensitizing sudden acute infectious disease data, the method comprising the following steps:
101. Inputting file data; clicking the button for selecting original data and selecting the file to be desensitized;
the file to be desensitized is a csv or excel file; infectious disease data stored as csv or excel files are desensitized, so that safe and effective data can be provided to the user quickly and conveniently.
102. Field desensitization setting; setting a data saving mode, selecting the fields to desensitize, and selecting a desensitization mode;
the data saving mode is either batch saving or merged saving.
There are two classes of desensitization rules: (1) recoverable, meaning that the desensitized data can be restored to the original sensitive data by some means; (2) unrecoverable, meaning that the desensitized part of the data cannot be restored by any means.
There are four desensitization modes: original text output, character replacement, encrypted field, and hide this item.
Specifically, the raw data are read in batches, the data obtained are organized into a field list, and a result file is generated for each field in one of the following modes:
Original text output mode: the data read are only spliced and written to the result file as original text without processing. This mode is mainly for data items that do not involve personal privacy and are needed for analysis.
Character replacement mode: for each record read, the field to be replaced is output as the replacement character (for example "x"), which hides the sensitive information so that the original text cannot be restored from the output. All fields are then spliced together and the complete record with the replaced characters is written to the file. This mode is mainly for data items such as names, identity card numbers and home addresses.
Encrypted field mode: for each record read, the field to be encrypted is fully encrypted with a specific encryption algorithm and output as a fully encrypted string. All fields are then spliced together, with the encrypted string in place of the original field value, and written to the file. This mode can hide data items such as names, identity card numbers and home addresses; compared with character replacement, it makes it easier to match patient information during analysis and to mine more related data. Its disadvantage is that the result can be decrypted, so the encryption method itself must be protected.
Hide this item mode: the data item is not read and is not written to the generated file. This mode is mainly for data items that are redundant for the analysis.
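The four modes can be illustrated with a minimal Python sketch that applies one mode per field to a single record. The mode labels, the function name `desensitize_row`, the pass-through default for unconfigured fields, and the choice to mask with one `*` per original character are all assumptions made for illustration; the encrypted field branch uses base64, which is the encoding the description specifies for that mode.

```python
import base64

# Mode names are illustrative labels, not identifiers from the patent.
ORIGINAL, REPLACE, ENCRYPT, HIDE = "original", "replace", "encrypt", "hide"


def desensitize_row(row, modes, mask="*"):
    """Return a new dict with each field handled by its configured mode.

    Fields without a configured mode are passed through unchanged
    (an assumption; the source does not state a default).
    """
    out = {}
    for name, value in row.items():
        mode = modes.get(name, ORIGINAL)
        if mode == HIDE:
            continue  # the item is not written to the result at all
        if mode == REPLACE:
            # Assumed masking rule: one mask character per original character.
            out[name] = mask * len(value)
        elif mode == ENCRYPT:
            # Base64 is reversible, so this branch is recoverable desensitization.
            out[name] = base64.b64encode(value.encode("utf-8")).decode("ascii")
        else:
            out[name] = value  # original text output
    return out
```

A record such as `{"name": "Li Hua", "onset_date": "2021-03-01"}` with `modes = {"name": REPLACE}` would keep the onset date as is and mask the name.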
Further, the 'encrypted field' mode uses standard base64 encoding as its encryption and decryption algorithm; the result can be decoded with a corresponding decoder, so this mode is recoverable desensitization. The 'character replacement' mode uses a replacement algorithm in which the part requiring desensitization is replaced with defined characters or character strings. The 'original text output' and 'hide this item' modes splice the data read and generate a new data file. Original text output, character replacement and hiding are unrecoverable desensitization.
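The recoverable nature of the base64-based mode can be shown with a short round trip. This is a sketch using Python's standard `base64` module; the helper names `encode_field` and `decode_field` and the sample value are hypothetical.

```python
import base64


def encode_field(value: str) -> str:
    """Encode a field value with standard base64 (recoverable)."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_field(token: str) -> str:
    """Restore the original field value from its base64 form."""
    return base64.b64decode(token.encode("ascii")).decode("utf-8")


# A made-up 18-digit identifier used only as sample input.
token = encode_field("110101199001011234")
restored = decode_field(token)
```

Because base64 involves no key, anyone who knows the encoding can run the decoder, which matches the description's remark that the encryption method itself has to be protected.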
103. Outputting desensitized file data; and setting an output path, starting data desensitization, and displaying a desensitization result file under the target path after the desensitization is finished.
Specifically, in this step, an output path is set, data desensitization is started after the output path is selected, and the conversion process is displayed by a progress bar. And after the conversion is finished, displaying the result file under the target path.
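The output step with progress reporting might look like the following sketch. It writes already desensitized records to a CSV file at the chosen path and calls a callback with the fraction completed, which a user interface could feed into a progress bar. The function name `write_with_progress` and the callback convention are assumptions, not part of the described tool.

```python
import csv


def write_with_progress(rows, fieldnames, out_path, on_progress=None):
    """Write rows to out_path as CSV, reporting the fraction completed."""
    total = len(rows)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for done, row in enumerate(rows, start=1):
            writer.writerow(row)
            if on_progress is not None:
                on_progress(done / total)  # value in (0, 1], 1.0 when finished
    return out_path
```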
The invention has the beneficial effects that:
The invention allows the desensitization mode to be chosen freely according to the data characteristics of different sudden acute infectious diseases, can select and extract useful data from a complete data table to generate a new table, and quickly completes desensitization, extraction and arrangement of analysis data while keeping the original data intact.
The invention is also designed for computation over multiple kinds of metadata and multiple spatio-temporal scales, so that the generated data format is compatible with other data (such as weather cloud images, air quality indices, logistics information or geospatial information) during computation.
Drawings
FIG. 1 is a schematic diagram of data desensitization implemented by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention discloses a method for desensitizing sudden acute infectious disease data, which comprises the following steps:
101. Inputting file data; clicking the button for selecting original data and selecting the file to be desensitized;
the file to be desensitized must be a csv or excel file; infectious disease data stored as csv or excel files are desensitized, so that safe and effective data can be provided to the using unit quickly and conveniently.
102. Field desensitization setting; setting a data saving mode, selecting the fields to desensitize, and selecting a desensitization mode;
the data saving mode is either batch saving or merged saving.
There are four desensitization modes: original text output, character replacement, encrypted field, and hide this item.
103. Outputting desensitized file data; and setting an output path, starting data desensitization, and displaying a desensitization result file under the target path after the desensitization is finished.
The specific implementation is shown in fig. 1.
2.1 Input file data. Clicking the button for selecting the original data pops up a path selection window, in which the csv or excel input file to be desensitized is selected. The list on the right then shows all field names.
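Populating the field list from a selected csv file can be sketched as reading its header row. The function name `list_field_names` and the UTF-8 default encoding are assumptions; excel files would need a third-party reader and are not covered here.

```python
import csv


def list_field_names(path, encoding="utf-8"):
    """Return the header row of a CSV file as a list of field names."""
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        # An empty file has no header row, so fall back to an empty list.
        return next(reader, [])
```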
2.2 Field desensitization setting. First, the data saving mode (batch saving or merged saving) is set on the left; then the fields requiring desensitization are selected in the list on the right and a desensitization mode is set for each (original text output, character replacement, encrypted field, or hide this item).
Original text output mode: the data read are only spliced and written to the result file as original text without processing. This mode is mainly for data items that do not involve personal privacy and are needed for analysis.
Character replacement mode: for each record read, the field to be replaced is output as the replacement character (for example "x"), which hides the sensitive information so that the original text cannot be restored from the output. All fields are then spliced together and the complete record with the replaced characters is written to the file. This mode is mainly for data items such as names, identity card numbers and home addresses.
Encrypted field mode: for each record read, the field to be encrypted is fully encrypted with a specific encryption algorithm and output as a fully encrypted string. All fields are then spliced together, with the encrypted string in place of the original field value, and written to the file. This mode can hide data items such as names, identity card numbers and home addresses; compared with character replacement, it makes it easier to match patient information during analysis and to mine more related data. Its disadvantage is that the result can be decrypted, so the encryption method itself must be protected.
Hide this item mode: the data item is not read and is not written to the generated file. This mode is mainly for data items that are redundant for the analysis.
The 'encrypted field' mode uses standard base64 encoding as its encryption and decryption algorithm; the result can be decoded with a corresponding decoder, so this mode is recoverable desensitization. The 'character replacement' mode uses a replacement algorithm in which the part requiring desensitization is replaced with defined characters or character strings. The 'original text output' and 'hide this item' modes splice the data read and generate a new data file. Original text output, character replacement and hiding are unrecoverable desensitization.
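Replacing only "the part needing desensitization" with defined characters could be done as in the sketch below, which keeps a few leading and trailing characters visible and masks the rest. The function `mask_part` and its `keep_head` and `keep_tail` parameters are hypothetical; the description does not say which part of a field is replaced.

```python
def mask_part(value, keep_head=0, keep_tail=0, mask="*"):
    """Replace the middle of value with mask characters.

    keep_head and keep_tail are hypothetical parameters giving how many
    leading and trailing characters stay visible.
    """
    if keep_head + keep_tail >= len(value):
        return mask * len(value)  # too short to leave anything visible
    hidden = len(value) - keep_head - keep_tail
    # Slice from an explicit index so that keep_tail == 0 yields an empty tail.
    return value[:keep_head] + mask * hidden + value[len(value) - keep_tail:]
```

For instance, `mask_part("110101199001011234", 6, 4)` leaves the first six and last four characters and masks the eight in between. The masked characters are discarded, so the result cannot be restored from the output alone.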
2.3 Output desensitized file data. An output path is set and desensitization is started with a click; a progress bar pops up to show the conversion process. After the conversion is finished, the result file is displayed under the target path.
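The two saving modes are not defined in detail in the description. The sketch below assumes that batch saving writes one result file per input file and that merged saving combines all inputs into a single result file; the function name `save_results` and the output file names are likewise hypothetical.

```python
import csv
import os


def save_results(tables, out_dir, merged=False):
    """Write desensitized tables under out_dir and return the paths written.

    tables maps an input file name to a (fieldnames, rows) pair. Treating
    "batch" as one result file per input and "merged" as one combined
    result file is an assumption; the source does not define the modes.
    """
    if not tables:
        return []
    os.makedirs(out_dir, exist_ok=True)
    if merged:
        groups = {"merged_result.csv": list(tables.values())}
    else:
        groups = {f"desensitized_{name}": [table] for name, table in tables.items()}
    written = []
    for file_name, group in groups.items():
        path = os.path.join(out_dir, file_name)
        fieldnames = group[0][0]  # assumes merged inputs share one header
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            for _, rows in group:
                writer.writerows(rows)
        written.append(path)
    return written
```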
The invention can be used as a batch desensitization tool for csv and excel files, can desensitize any selected field, and can also be configured with a secret key for encrypting and restoring field information.
During desensitization, the csv or Excel file to be desensitized is selected. The system extracts the attribute information in a structured way and lists the data items in the infectious disease data, and one of four output modes ('original text output', 'hide this item', 'character replacement' and 'encrypted field') is selected in the interface. Clicking to start desensitization then produces the desensitized result file, which gives researchers a visual and concise way to desensitize laboratory data.
In short, the invention allows the desensitization mode to be chosen freely according to the data characteristics of different sudden acute infectious diseases, can select and extract useful data from a complete data table to generate a new excel table, and quickly completes desensitization, extraction and arrangement of analysis data while keeping the original data intact.
The method is designed for computation over multiple kinds of metadata and multiple spatio-temporal scales, so that the generated data format is compatible with other data (such as weather cloud images, air quality indices, logistics information or geospatial information) during computation.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A method for desensitizing sudden acute infectious disease data, characterized by comprising the following steps:
101. inputting file data; clicking the button for selecting original data, and selecting the file to be desensitized;
102. field desensitization setting; setting a data saving mode, selecting the fields to desensitize, and selecting a desensitization mode;
103. outputting desensitized file data; setting an output path, starting data desensitization, and displaying the desensitization result file under the target path after desensitization is finished.
2. The method for desensitizing sudden acute infectious disease data according to claim 1, wherein in step 101, the desensitization file is a csv or excel file.
3. The method for desensitizing sudden acute infectious disease data according to claim 1, wherein in step 102, the data saving mode is either batch saving or merged saving.
4. The method for desensitizing sudden acute infectious disease data according to claim 1, wherein in step 102, the desensitization mode is one of four modes: original text output, character replacement, encrypted field, and hide this item.
5. The method for desensitizing sudden acute infectious disease data according to claim 4, characterized in that the four desensitization modes are:
original text output mode: the data read are only spliced and written to the result file as original text without processing, mainly for data items that do not involve personal privacy and are needed for analysis;
character replacement mode: for each record read, the field to be replaced is output as replacement characters so that the sensitive information is hidden, all fields are spliced, and the complete record with the replaced characters is written to the file;
encrypted field mode: for each record read, the field to be encrypted is fully encrypted by an encryption and decryption algorithm and output as a fully encrypted string, all fields are spliced, and the complete record with the encrypted string in place of the original field is written to the file;
hide this item mode: the data item is not read and is not written to the generated file.
6. The method for desensitizing sudden acute infectious disease data according to claim 5, wherein the encryption and decryption algorithm of the encrypted field mode uses standard base64 encoding, and the result can be decoded with a corresponding decoder, which is recoverable desensitization; the character replacement mode uses a replacement algorithm that replaces the part requiring desensitization with a defined character or character string; the original text output and hide this item modes splice the data read and generate a new data file; and the original text output, the character replacement and the hiding are unrecoverable desensitization.
7. The method for desensitizing sudden acute infectious disease data according to claim 1, wherein in step 103, an output path is set, data desensitization is started after the output path is selected, the conversion process is displayed by a progress bar, and after the conversion is finished the result file is displayed under the target path.
CN202110516944.4A 2021-05-12 2021-05-12 Method for desensitizing sudden acute infectious disease data Pending CN113257375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110516944.4A CN113257375A (en) 2021-05-12 2021-05-12 Method for desensitizing sudden acute infectious disease data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110516944.4A CN113257375A (en) 2021-05-12 2021-05-12 Method for desensitizing sudden acute infectious disease data

Publications (1)

Publication Number Publication Date
CN113257375A true CN113257375A (en) 2021-08-13

Family

ID=77222987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110516944.4A Pending CN113257375A (en) 2021-05-12 2021-05-12 Method for desensitizing sudden acute infectious disease data

Country Status (1)

Country Link
CN (1) CN113257375A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008029419A (en) * 2006-07-26 2008-02-14 Fujifilm Corp Data management device for diagnostic reading and data management method for diagnostic reading
CN107145799A (en) * 2017-05-04 2017-09-08 山东浪潮云服务信息科技有限公司 A kind of data desensitization method and device
CN109033873A (en) * 2018-07-19 2018-12-18 四川长虹智慧健康科技有限公司 A kind of data desensitization method preventing privacy compromise
CN110008751A (en) * 2019-04-11 2019-07-12 中国联合网络通信集团有限公司 A kind of data desensitization method and system
CN111506808A (en) * 2020-02-23 2020-08-07 北京三快在线科技有限公司 User data processing method, two-dimensional code display method, system and device
CN111899893A (en) * 2020-09-29 2020-11-06 南京汉卫公共卫生研究院有限公司 Infectious disease early warning decision platform system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115114669A (en) * 2022-08-23 2022-09-27 山东双仁信息技术有限公司 Interactive content desensitization method and system
CN115114669B (en) * 2022-08-23 2023-02-10 山东双仁信息技术有限公司 Interactive content desensitization method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination