CN117494193A - Combined desensitization checking method for medical data - Google Patents

Combined desensitization checking method for medical data

Publication number
CN117494193A
Authority
CN
China
Prior art keywords
data
splicing
desensitization
sensitive information
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311410452.2A
Other languages
Chinese (zh)
Inventor
柏志安
朱立峰
于若颖
朱铁兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ruinjin Hospital Affiliated to Shanghai Jiaotong University School of Medicine Co Ltd
Original Assignee
Ruinjin Hospital Affiliated to Shanghai Jiaotong University School of Medicine Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ruinjin Hospital Affiliated to Shanghai Jiaotong University School of Medicine Co Ltd
Priority to CN202311410452.2A
Publication of CN117494193A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6227Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The embodiment of the invention discloses a combined desensitization checking method for medical data. The method comprises the following steps: acquiring target data to be exported, and identifying the field data of each data field item that has a target attribute in the target data to be exported; splicing the field data according to at least one preset splicing rule to obtain at least one data splicing character string; and performing either sensitive information identification alone, or sensitive information identification followed by desensitization processing, on each data splicing character string to complete the desensitization check of the target data to be exported. The technical scheme of the embodiment performs a desensitization check on the data to be exported and avoids incomplete desensitization, thereby reducing the possibility of sensitive data leakage and improving data security.

Description

Combined desensitization checking method for medical data
Technical Field
The embodiment of the invention relates to the technical field of data processing, in particular to a combined desensitization checking method for medical data.
Background
Medical data must be desensitized when it is exported to an external database in order to protect patient privacy. At present, the medical data desensitization systems used in hospitals generally bind a desensitization rule to each data field, so that desensitization is applied uniformly at the field level or the field content is identified and processed uniformly.
However, when medical data is entered manually, sensitive information about a doctor or patient may be entered into a field that is not subject to desensitization, or the content of one sensitive information field may be split up and entered into several different fields. This leaves a loophole in the data desensitization process, and the security of the data desensitization system needs to be improved.
Disclosure of Invention
The embodiment of the invention provides a combined desensitization checking method, device, equipment and medium for medical data, which can perform desensitization checking on data to be exported, avoid incomplete desensitization of the data, reduce the possibility of sensitive data leakage and improve the data security.
In a first aspect, an embodiment of the present invention provides a combined desensitization checking method for medical data, including:
acquiring target data to be exported, and identifying field data of each data field item with target attribute in the target data to be exported;
splicing the field data according to at least one preset splicing rule to obtain at least one data splicing character string;
and carrying out sensitive information identification or sensitive information identification and desensitization treatment on each data splicing character string to finish the desensitization checking process of the target data to be exported.
In a second aspect, an embodiment of the present invention provides a combined desensitization checking apparatus for medical data, the apparatus comprising:
the field data acquisition module is used for acquiring target data to be exported and identifying field data of each data field item with target attribute in the target data to be exported;
the field splicing module is used for splicing the field data according to at least one preset splicing rule to obtain at least one data splicing character string;
and the data desensitization checking and processing module is used for carrying out sensitive information identification or sensitive information identification and desensitization processing on each data splicing character string to finish the desensitization checking process of the target data to be exported.
In a third aspect, an embodiment of the present invention further provides a computer apparatus, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a combined desensitization check method for medical data as provided by any embodiment of the invention.
In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a combined desensitization check method for medical data as provided by any of the embodiments of the present invention.
The embodiments of the above invention have the following advantages or benefits:
According to the embodiment of the invention, target data to be exported is obtained, and the field data of each data field item with the target attribute in the target data to be exported is identified; the field data is spliced according to at least one preset splicing rule to obtain at least one data splicing character string; and sensitive information identification, or sensitive information identification and desensitization processing, is performed on each data splicing character string to complete the desensitization check of the target data to be exported. The technical scheme of the embodiment of the invention solves the problem of incomplete desensitization of exported data in hospital medical data desensitization: the data to be exported can be checked for desensitization, incomplete desensitization is avoided, the possibility of sensitive data leakage is reduced, and data security is improved.
Drawings
FIG. 1 is a flow chart of a combined desensitization check method for medical data provided by an embodiment of the invention;
FIG. 2 is a flow chart of a combined desensitization check method for medical data provided by an embodiment of the invention;
FIG. 3 is a flow chart of a combined desensitization check method for medical data provided by an embodiment of the invention;
FIG. 4 is a schematic structural diagram of a combined desensitization checking device for medical data according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Fig. 1 is a flowchart of a combined desensitization checking method for medical data according to an embodiment of the present invention. This embodiment is applicable to scenarios in which data that is to be exported outside the original data system is checked for sensitive information in order to enhance data security. The method can be executed by a combined desensitization checking device for medical data; the device can be configured in medical data management software, implemented in software and/or hardware, and integrated into computer equipment with an application development function.
As shown in fig. 1, the combined desensitization checking method of medical data includes the steps of:
S110, acquiring target data to be exported, and identifying field data of each data field item with target attribute in the target data to be exported.
Wherein the target data to be exported may be data that needs to be exported from a data management system. For example, in the medical data field, a medical data management system stores a large amount of medical data, which may be automatically acquired or manually entered patient information and clinical test data. When some data in the medical data management system needs to be exported outside the system for data use or analysis, the "some data" herein is the target data to be exported. Wherein the target data to be exported may include one or more pieces of medical data corresponding to at least one patient.
It will be appreciated that the target data to be exported is the data as it stands immediately before it is exported from the medical data management system. The target data to be exported may be medical data that has already undergone desensitization processing, or medical data that has not.
The target attribute may be information indicating how the data was input into the data management system, and is used to decide whether a desensitization check is to be performed. Specifically, if the target attribute indicates that the content of a data field item is mainly filled in manually, the data of that field item needs a desensitization check. If the target attribute indicates that the content of a data field item is mainly extracted and filled in automatically, for example when test equipment uploads a test result to an associated computer and the test data is filled in automatically, the data of that field item cannot be modified manually and no desensitization check is needed.
Identifying the field data of each data field item with the target attribute in the target data to be exported means finding all data content that needs a desensitization check. When the content of a data field item is stored in character form in the medical data management system, the corresponding field data can be read directly from the database of the medical data management system. For picture or PDF data under a data field item with the target attribute, the data content must first be recognized and extracted, and field data in character form corresponding to the picture or PDF is generated, to facilitate the subsequent desensitization check.
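The field selection in S110 can be sketched in Python as follows. This is only an illustration under stated assumptions: the `FieldItem` class, the `manually_entered` flag (standing in for the target attribute), and the sample field names and values are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldItem:
    name: str
    value: str
    manually_entered: bool  # hypothetical flag standing in for the "target attribute"


def select_fields_to_check(record):
    """Return {field name: field data} for every non-empty field that has the target attribute."""
    return {f.name: f.value for f in record if f.manually_entered and f.value}


record = [
    FieldItem("chief_complaint", "chest pain, call 1380013", True),
    FieldItem("remarks", "8000 tonight", True),
    FieldItem("wbc_count", "6.1", False),  # filled in automatically by test equipment
]
to_check = select_fields_to_check(record)
print(to_check)
```

Only the two manually filled fields are passed on to the splicing step; the automatically filled test value is skipped.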
S120, splicing the field data according to at least one preset splicing rule to obtain at least one data splicing character string.
The preset splicing rule may be a preset field data splicing order that specifies the splicing position of each data field item with the target attribute and the number of times it is spliced. The splicing positions and the number of data field items may differ between preset splicing rules.
When determining the preset splicing rules, once the number of times each data field item is spliced into the final data splicing character string has been fixed, the number of data field items to be spliced is known, and the data field items can be arranged and combined over the splicing positions so that every arrangement is enumerated. The at least one preset splicing rule may be one or more of these arrangements of the data field items to be spliced. One data field item may be placed at any one or more positions in the splicing order.
The field data corresponding to all or some of the data field items in the content to be checked is then spliced according to the field splicing order in the preset splicing rule to obtain at least one corresponding data splicing character string.
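As a minimal sketch of the enumeration described above, the splicing rules can be generated with `itertools.permutations`, treating each rule as an ordered tuple of field names. The function names and sample data are hypothetical, and each field is used at most once per rule here, which is a simplification of the text.

```python
from itertools import permutations


def enumerate_splicing_rules(field_names, length):
    """Exhaust every ordering of `length` distinct field items (one rule per ordering)."""
    return list(permutations(field_names, length))


def splice(field_data, rule, sep=""):
    """Concatenate the field data in the order given by one splicing rule."""
    return sep.join(field_data[name] for name in rule)


field_data = {
    "complaint": "call 1380013",
    "remarks": "8000 tonight",
    "history": "none",
}
rules = enumerate_splicing_rules(list(field_data), 2)
spliced = [splice(field_data, rule) for rule in rules]
print(len(rules), spliced[0])
```

To let one field item occupy several positions in the same rule, as the text allows, `itertools.product(field_names, repeat=length)` could be used instead, at the cost of more rules.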
S130, performing sensitive information identification, or sensitive information identification and desensitization processing, on each data splicing character string to complete the desensitization check of the target data to be exported.
During sensitive information identification, target sensitive information can be looked for in each data splicing character string according to the data characteristics of particular kinds of sensitive information. For example, for sensitive information such as a telephone number or an identity card number, the data splicing character string can be searched for substrings that consist of numeric characters and have the length of a telephone number or identity card number, which yields the corresponding suspected sensitive data. The suspected sensitive data is then matched against a desensitization dictionary to determine the sensitive information in the data splicing character string. The sensitive information found can then be desensitized, which completes the desensitization check of the target data to be exported.
Alternatively, each data splicing character string can be matched directly against the content of the desensitization dictionary to determine whether it contains sensitive information.
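The benefit of checking spliced strings rather than individual fields can be illustrated with direct dictionary matching. The sketch below assumes the desensitization dictionary is a plain set of strings; the function name and the sample values are hypothetical.

```python
def find_dictionary_hits(strings, desensitization_dictionary):
    """Return every dictionary entry that occurs as a substring of any input string."""
    return {
        entry
        for s in strings
        for entry in desensitization_dictionary
        if entry in s
    }


dictionary = {"13800138000", "Zhang San"}
fields = ["call 1380013", "8000 tonight"]  # a phone number split across two fields

per_field_hits = find_dictionary_hits(fields, dictionary)
spliced_hits = find_dictionary_hits(["".join(fields)], dictionary)
print(per_field_hits, spliced_hits)
```

Checked field by field, neither fragment matches the dictionary; only the spliced string reveals the telephone number.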
According to the technical scheme of this embodiment, target data to be exported is obtained, and the field data of each data field item with the target attribute in the target data to be exported is identified; the field data is spliced according to at least one preset splicing rule to obtain at least one data splicing character string; and sensitive information identification, or sensitive information identification and desensitization processing, is performed on each data splicing character string to complete the desensitization check of the target data to be exported. The technical scheme of the embodiment of the invention solves the problem of incomplete desensitization of exported data in hospital medical data desensitization: sensitive information can be checked for before the data is exported, so that sensitive information is prevented from being mixed into the exported data, the possibility of sensitive data leakage is reduced, and data security is improved.
Fig. 2 is a flowchart of a combined desensitization checking method for medical data according to an embodiment of the present invention, and further describes a field data splicing process based on the above embodiment. The method can be executed by a combined desensitization checking device of medical data, the device can be configured in software applied to medical data management, realized by means of software and/or hardware, and integrated in computer equipment with application development functions.
As shown in fig. 2, the combined desensitization checking method of medical data includes the steps of:
S210, acquiring target data to be exported, and identifying field data of each data field item with target attribute in the target data to be exported.
S220, classifying the field data according to the data types, and splicing the field data of the same data type according to the corresponding splicing sequence in at least one preset splicing rule to obtain at least one data splicing character string.
Each field data is data content that needs a desensitization check. The data types may include numeric, character, special symbol, and the like. Splicing the field data of the same data type according to the corresponding splicing order in at least one preset splicing rule means that each preset splicing rule defines the splicing order of the field items separately for each data type; that is, field data of the same type are spliced together, yielding a plurality of data splicing character strings. The number and positions of the field data spliced into each data splicing character string may differ.
Classifying the data fields reduces the total number of splicing rules, allows different types of sensitive information to be checked in a more targeted way, and improves field data processing efficiency.
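A rough sketch of type-based grouping before splicing is given below. The three type labels follow the text, but the classification heuristic in `data_type`, the function names, and the sample fields are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import permutations


def data_type(value):
    """Crude classifier (an assumption): all digits, no letters or digits, or everything else."""
    stripped = value.replace(" ", "")
    if stripped.isdigit():
        return "numeric"
    if stripped and not any(ch.isalnum() for ch in stripped):
        return "symbol"
    return "character"


def splice_by_type(field_data):
    """Group field data by type, then splice every ordering within each group."""
    groups = defaultdict(list)
    for value in field_data.values():
        groups[data_type(value)].append(value)
    return {kind: ["".join(p) for p in permutations(values)]
            for kind, values in groups.items()}


field_data = {"bed_no": "1380013", "ext": "8000", "note": "stable", "name_part": "Zhang"}
result = splice_by_type(field_data)
print(result)
```

In this example the four fields would give 24 full orderings if spliced without grouping, whereas grouping by type leaves 2 + 2 = 4 spliced strings to check.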
S230, performing sensitive information identification, or sensitive information identification and desensitization processing, on each data splicing character string to complete the desensitization check of the target data to be exported.
Specifically, when sensitive information identification is performed on each data splicing character string, the data splicing character string can be looked up in a desensitization dictionary to determine whether it contains sensitive information; and/or each sensitive word in the desensitization dictionary can be searched for in the data splicing character string to determine the sensitive information it contains; and/or the data splicing character string can be segmented into words and the segmentation results matched against the desensitization dictionary to determine the sensitive information it contains. Word segmentation of the data splicing character string can be performed by a pre-trained text processing model with a text word segmentation function.
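The three matching modes can be compared side by side in a short sketch. The function names are hypothetical, and the regular-expression tokenizer is only a stand-in for the pre-trained word segmentation model mentioned in the text.

```python
import re


def match_whole_string(spliced, dictionary):
    """Mode 1: look the whole spliced string up in the dictionary."""
    return spliced in dictionary


def match_dictionary_words(spliced, dictionary):
    """Mode 2: search for each dictionary word inside the spliced string."""
    return {word for word in dictionary if word in spliced}


def match_tokens(spliced, dictionary, tokenize=None):
    """Mode 3: segment the spliced string, then look each token up in the dictionary."""
    tokenize = tokenize or (lambda s: re.findall(r"\w+", s))  # stand-in tokenizer
    return {token for token in tokenize(spliced) if token in dictionary}


dictionary = {"13800138000", "Zhang San"}
spliced = "Zhang San, tel 13800138000"
print(match_whole_string(spliced, dictionary))
print(match_dictionary_words(spliced, dictionary))
print(match_tokens(spliced, dictionary))
```

With this naive tokenizer the name is split into two tokens and is missed by the third mode, which suggests why the modes may be worth combining and why segmentation quality matters.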
The desensitization dictionary may be a first desensitization dictionary generated in advance from the sensitive information of at least one set of data to be exported; it may also be a second desensitization dictionary generated in real time from the sensitive information of the target data to be exported, which is acquired on the basis of that target data. The first desensitization dictionary can be understood as a dictionary containing the sensitive information of a plurality of sets of data to be exported (including the target data to be exported). Matching against the first desensitization dictionary makes it possible to determine whether the target data to be exported contains sensitive information of the data object associated with it, and also whether it contains sensitive information of other data objects. The second desensitization dictionary contains only the sensitive information of the data object associated with the target data to be exported, so its detection of sensitive information is more targeted.
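The difference between the two dictionaries can be sketched as follows, assuming patient master records are available as simple dictionaries. The record layout, the key names, and `build_dictionary` are hypothetical.

```python
def build_dictionary(patients):
    """Collect the known sensitive values (name, phone, ID number) of the given patients."""
    entries = set()
    for patient in patients:
        for key in ("name", "phone", "id_number"):
            value = patient.get(key)
            if value:
                entries.add(value)
    return entries


all_patients = [
    {"patient_id": "P1", "name": "Zhang San", "phone": "13800138000"},
    {"patient_id": "P2", "name": "Li Si", "phone": "13900139000"},
]
export_patient_ids = {"P1"}  # data objects associated with the target data to be exported

# First dictionary: built in advance over many data objects.
first_dictionary = build_dictionary(all_patients)
# Second dictionary: built at export time, restricted to the exported data objects.
second_dictionary = build_dictionary(
    p for p in all_patients if p["patient_id"] in export_patient_ids
)
print(sorted(first_dictionary), sorted(second_dictionary))
```

The first dictionary can also catch another patient's details that were entered into this record by mistake; the second is smaller and targets only the exported patients.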
In addition, sensitive information identification for each data splicing character string can also be performed by deep learning. For example, word segmentation is first performed on the data splicing character string to obtain a target word segmentation result; the target word segmentation result is then input into a preset sensitive information identification model, and suspected sensitive data in the data splicing character string is determined from the model output; finally, the sensitive information in the data splicing character string is determined based on the suspected sensitive data. Whether the suspected sensitive data is in fact sensitive information can be determined by matching it against a preset desensitization dictionary or by manual confirmation, which yields the sensitive information identification result.
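The three stages of this flow (segmentation, model scoring, confirmation) can be sketched with pluggable callables. No real model is used here: `toy_tokenize` and `toy_model` are placeholder stand-ins for the pre-trained components, and the score threshold is an assumption.

```python
import re
from typing import Callable, List, Optional


def identify_sensitive(
    spliced: str,
    tokenize: Callable[[str], List[str]],
    model: Callable[[List[str]], List[float]],
    confirm: Optional[Callable[[str], bool]] = None,
    threshold: float = 0.5,
) -> List[str]:
    """Segment, score each token with the model, then confirm the suspected tokens."""
    tokens = tokenize(spliced)
    scores = model(tokens)
    suspects = [t for t, s in zip(tokens, scores) if s >= threshold]
    if confirm is None:
        return suspects
    return [t for t in suspects if confirm(t)]


def toy_tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text)


def toy_model(tokens: List[str]) -> List[float]:
    # Placeholder scoring rule, not a trained model: long digit runs look suspicious.
    return [1.0 if t.isdigit() and len(t) >= 11 else 0.0 for t in tokens]


dictionary = {"13800138000"}
text = "tel 13800138000 or 13900139000 ward 12"
suspected = identify_sensitive(text, toy_tokenize, toy_model)
confirmed = identify_sensitive(text, toy_tokenize, toy_model, confirm=lambda t: t in dictionary)
print(suspected, confirmed)
```

The `confirm` callable plays the role of the dictionary match; a manual review step could be plugged in at the same point.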
According to the technical scheme of this embodiment, target data to be exported is obtained, and the field data of each data field item with the target attribute in the target data to be exported is identified; the field data is classified by data type, and field data of the same data type is spliced according to the corresponding splicing order in at least one preset splicing rule to obtain at least one data splicing character string; and sensitive information identification, or sensitive information identification and desensitization processing, is performed on each data splicing character string to complete the desensitization check of the target data to be exported. The technical scheme of the embodiment of the invention solves the problem of incomplete desensitization of exported data in hospital medical data desensitization: sensitive information can be checked for before the data is exported, so that sensitive information is prevented from being mixed into the exported data, the possibility of sensitive data leakage is reduced, and data security is improved.
Fig. 3 is a flowchart of a combined desensitization checking method for medical data according to an embodiment of the present invention, and further describes a process of identifying and desensitizing sensitive information based on the above embodiment. The method can be executed by a combined desensitization checking device of medical data, the device can be configured in software applied to medical data management, realized by means of software and/or hardware, and integrated in computer equipment with application development functions.
As shown in fig. 3, the combined desensitization checking method of medical data includes the steps of:
S310, acquiring target data to be exported, and identifying field data of each data field item with target attribute in the target data to be exported.
S320, splicing the field data according to at least one preset splicing rule to obtain at least one data splicing character string.
S330, extracting suspected sensitive data in the data splicing character string according to the character type and/or the character string length of the sub-character string in the data splicing character string.
The character types can be numerical type and non-numerical type, and the character string length can be the conventional character string length corresponding to any sensitive information type. Sensitive information can be rapidly screened through data characteristics such as character types, character string lengths and the like, so that suspected sensitive data are extracted from the data splicing character strings.
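A minimal sketch of S330 using regular expressions is shown below. The lengths are assumptions chosen for illustration (11 digits for a mainland China mobile number, 18 characters for a resident identity card number whose last character may be X); the pattern table and function name are hypothetical.

```python
import re

# Assumed formats: 11-digit mobile number; 17 digits plus a digit or X for an ID number.
# The lookarounds make sure a match is not part of a longer run of digits.
PATTERNS = {
    "phone": re.compile(r"(?<!\d)\d{11}(?!\d)"),
    "id_card": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
}


def extract_suspects(spliced):
    """Return (category, substring) pairs whose character type and length look sensitive."""
    return [
        (category, match.group())
        for category, pattern in PATTERNS.items()
        for match in pattern.finditer(spliced)
    ]


print(extract_suspects("call 13800138000 tonight"))
print(extract_suspects("ward 12 bed 3"))
```

The extracted pairs are only suspected sensitive data; they still have to be confirmed against the desensitization dictionary in the next step.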
S340, matching the suspected sensitive data with a desensitization dictionary, and determining sensitive information in the data splicing character string.
The desensitization dictionary may be a first desensitization dictionary generated in advance from the sensitive information of at least one set of data to be exported; it may also be a second desensitization dictionary generated in real time from the sensitive information of the target data to be exported, which is acquired on the basis of that target data. The first desensitization dictionary can be understood as a dictionary containing the sensitive information of a plurality of sets of data to be exported (including the target data to be exported). Matching against the first desensitization dictionary makes it possible to determine whether the target data to be exported contains sensitive information of the data object associated with it, and also whether it contains sensitive information of other data objects. The second desensitization dictionary contains only the sensitive information of the data object associated with the target data to be exported, so its detection of sensitive information is more targeted.
If the suspected sensitive data is successfully matched with content in the corresponding desensitization dictionary, it can be determined to be sensitive information.
S350, matching a target desensitization rule according to the category of the sensitive information, and carrying out desensitization processing on the sensitive information based on the target desensitization rule.
Different types of sensitive information can correspond to different desensitization rules, for example replacing the sensitive information with special symbols or deleting it outright. The type of the sensitive information is determined first, the target desensitization rule is then matched according to that type, and the sensitive information is desensitized based on the target desensitization rule.
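Category-based rule selection in S350 can be sketched as a lookup table from category to a rule function. The categories, the two rules (masking with a symbol and deletion), and the fallback to deletion for unknown categories are assumptions made for illustration.

```python
def mask_middle(value, keep=3, symbol="*"):
    """Replace everything except the first and last `keep` characters with a symbol."""
    if len(value) <= 2 * keep:
        return symbol * len(value)
    return value[:keep] + symbol * (len(value) - 2 * keep) + value[-keep:]


def delete(value):
    """Remove the sensitive value entirely."""
    return ""


# Hypothetical mapping from sensitive information category to desensitization rule.
RULES = {"phone": mask_middle, "name": delete}


def desensitize(text, findings):
    """Apply the rule matched by each finding's category to the text."""
    for category, value in findings:
        rule = RULES.get(category, delete)  # unknown categories fall back to deletion
        text = text.replace(value, rule(value))
    return text


findings = [("name", "Zhang San"), ("phone", "13800138000")]
print(desensitize("Zhang San call 13800138000", findings))
```

This sketch operates on a single string; writing the result back to the original fields when the sensitive value was split across them is not shown.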
According to the technical scheme of this embodiment, target data to be exported is obtained, and the field data of each data field item with the target attribute in the target data to be exported is identified; the field data is spliced according to at least one preset splicing rule to obtain at least one data splicing character string; suspected sensitive data is extracted from the data splicing character string according to the character type and/or the character string length of its substrings; the suspected sensitive data is matched with a desensitization dictionary to determine the sensitive information in the data splicing character string; and a target desensitization rule is matched according to the category of the sensitive information, and the sensitive information is desensitized based on the target desensitization rule. The technical scheme of the embodiment of the invention solves the problem of incomplete desensitization of exported data in hospital medical data desensitization: sensitive information can be checked for before the data is exported, so that sensitive information is prevented from being mixed into the exported data, the possibility of sensitive data leakage is reduced, and data security is improved.
Fig. 4 is a schematic structural diagram of a combined desensitization checking device for medical data provided by an embodiment of the invention. The device is applicable to scenarios in which data to be exported outside the original data system is checked for sensitive information in order to enhance data security. It can be configured in medical data management software, implemented in software and/or hardware, and integrated into computer equipment with an application development function.
As shown in fig. 4, the combined desensitization checking device for medical data includes: a field data acquisition module 410, a field splicing module 420, and a data desensitization checking and processing module 430.
The field data acquisition module 410 is configured to obtain target data to be exported, and identify field data of each data field item having a target attribute in the target data to be exported; the field splicing module 420 is configured to splice the field data according to at least one preset splicing rule to obtain at least one data splicing character string; the data desensitization checking and processing module 430 is configured to perform sensitive information identification, or sensitive information identification and desensitization processing, on each data splicing character string, so as to complete the desensitization check of the target data to be exported.
According to the technical scheme of this embodiment, target data to be exported is obtained, and the field data of each data field item with the target attribute in the target data to be exported is identified; the field data is spliced according to at least one preset splicing rule to obtain at least one data splicing character string; and sensitive information identification, or sensitive information identification and desensitization processing, is performed on each data splicing character string to complete the desensitization check of the target data to be exported. The technical scheme of the embodiment of the invention solves the problem of incomplete desensitization of exported data in hospital medical data desensitization: sensitive information can be checked for before the data is exported, so that sensitive information is prevented from being mixed into the exported data, the possibility of sensitive data leakage is reduced, and data security is improved.
In an alternative embodiment, the field splicing module 420 may be configured to:
splicing all or part of field data corresponding to the data field items according to the field splicing sequence in the preset splicing rule to obtain corresponding data splicing character strings;
wherein one of the data field items may be disposed at any one or more sequence positions in the splice order.
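As a hedged illustration of order-based splicing, the sketch below treats a splicing rule as a list of field names (an assumed representation). A rule may cover all or only some fields, and the same field name may appear at more than one position.

```python
from typing import Dict, List

def splice_by_order(field_data: Dict[str, str], order: List[str], sep: str = "") -> str:
    """Concatenate field values following `order`; a field name may repeat."""
    return sep.join(field_data[name] for name in order if name in field_data)

# Hypothetical field values and rules.
field_data = {"name": "AB", "phone": "123", "addr": "XY"}
rules = [
    ["name", "phone", "addr"],   # all fields
    ["phone", "name"],           # a subset, in a different order
    ["name", "phone", "name"],   # one field placed at two positions
]
spliced = [splice_by_order(field_data, rule) for rule in rules]
```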
In an alternative embodiment, the field splicing module 420 may be further configured to:
classifying the field data according to the data type;
and splicing the field data of the same data type according to the corresponding splicing sequence to obtain the data splicing character string.
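A minimal sketch of type-based splicing follows. The type rule (all digits versus everything else) is an assumption made for the example; the embodiment does not specify how data types are defined.

```python
from collections import defaultdict
from typing import Dict, List

def classify_value(value: str) -> str:
    """Hypothetical type rule: all digits -> 'numeric', anything else -> 'text'."""
    return "numeric" if value.isdigit() else "text"

def splice_by_type(field_data: Dict[str, str], order: List[str]) -> Dict[str, str]:
    """Group field values by type, then splice each group in the given order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in order:
        if name in field_data:
            groups[classify_value(field_data[name])].append(field_data[name])
    return {dtype: "".join(values) for dtype, values in groups.items()}

# Hypothetical fields: a phone-like number and a name, each split over two fields.
field_data = {"f1": "1380013", "f2": "Li", "f3": "8000", "f4": "Lei"}
by_type = splice_by_type(field_data, ["f1", "f2", "f3", "f4"])
```

Grouping by type keeps digit fragments adjacent to each other, so a number split across non-adjacent fields is reassembled in the numeric string.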
In an alternative embodiment, the data desensitization verification and processing module 430 may be configured to:
extracting suspected sensitive data in the data splicing character string according to the character type and/or the character string length of the sub-character string in the data splicing character string;
and matching the suspected sensitive data with a desensitization dictionary, and determining sensitive information in the data splicing character string.
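The two-stage identification above can be sketched as follows. The patterns (an 11-digit run and an 18-character identifier-like run) are assumed heuristics for character type and length, not rules taken from the embodiment.

```python
import re
from typing import List, Set

# Assumed heuristics based on character type and substring length.
SUSPECT_PATTERNS = [
    re.compile(r"(?<!\d)\d{11}(?!\d)"),        # 11-digit run, e.g. phone-like
    re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),  # 18-character ID-like run
]

def extract_suspects(spliced: str) -> List[str]:
    """Stage 1: pull out substrings whose type and length look sensitive."""
    suspects: List[str] = []
    for pattern in SUSPECT_PATTERNS:
        suspects.extend(pattern.findall(spliced))
    return suspects

def confirm(suspects: List[str], dictionary: Set[str]) -> List[str]:
    """Stage 2: keep only suspects that appear in the desensitization dictionary."""
    return [s for s in suspects if s in dictionary]

spliced = "bed12 tel13800138000 room5"
suspects = extract_suspects(spliced)
confirmed = confirm(suspects, {"13800138000"})
```

Pre-filtering by type and length narrows the dictionary lookup to a few candidates instead of scanning the whole string against every entry.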
In an alternative embodiment, the data desensitization verification and processing module 430 may be configured to:
matching the data splicing character string against a desensitization dictionary to determine whether the data splicing character string contains sensitive information; and/or
matching each sensitive word of the desensitization dictionary within the data splicing character string to determine the sensitive information in the data splicing character string; and/or
performing word segmentation on the data splicing character string and matching the word segmentation results against the desensitization dictionary to determine the sensitive information in the data splicing character string.
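The three dictionary-matching options can be compared with a short sketch. The regular-expression tokenizer is only a stand-in for a real word segmenter, and the dictionary contents are made up.

```python
import re
from typing import List, Set

def whole_string_match(spliced: str, dictionary: Set[str]) -> bool:
    """Option 1: look the entire spliced string up in the dictionary."""
    return spliced in dictionary

def sensitive_word_scan(spliced: str, dictionary: Set[str]) -> List[str]:
    """Option 2: search for every dictionary word inside the spliced string."""
    return sorted(word for word in dictionary if word in spliced)

def segment_and_match(spliced: str, dictionary: Set[str]) -> List[str]:
    """Option 3: segment the string, then look each token up."""
    tokens = re.findall(r"[A-Za-z]+|\d+", spliced)  # stand-in segmenter
    return [t for t in tokens if t in dictionary]

dictionary = {"13800138000", "LiLei"}
spliced = "LiLei13800138000"

is_entry = whole_string_match(spliced, dictionary)
scanned = sensitive_word_scan(spliced, dictionary)
segmented = segment_and_match(spliced, dictionary)
```

In this example the whole-string lookup misses, while both the word scan and the segment-based lookup find the two entries, which suggests why the options may be combined.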
In an alternative embodiment, the data desensitization verification and processing module 430 may also be configured to:
performing word segmentation processing on the data spliced character string to obtain a target word segmentation result;
inputting the target word segmentation result into a preset sensitive information identification model, and determining suspected sensitive data in the data splicing character string according to a model output result;
and determining sensitive information in the data splicing character string based on the suspected sensitive data.
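A rough sketch of the model-based path is given below. The embodiment does not describe the recognition model, so `toy_model` is a placeholder scoring function, and the tokenizer, threshold, and final acceptance step are all assumptions.

```python
import re
from typing import Callable, List

def segment(spliced: str) -> List[str]:
    """Stand-in segmenter; a real system would use a proper tokenizer."""
    return re.findall(r"[A-Za-z]+|\d+", spliced)

def toy_model(tokens: List[str]) -> List[float]:
    """Placeholder for the preset recognition model: long digit runs score high."""
    return [0.9 if t.isdigit() and len(t) >= 11 else 0.1 for t in tokens]

def recognize(spliced: str,
              model: Callable[[List[str]], List[float]],
              threshold: float = 0.5) -> List[str]:
    tokens = segment(spliced)                      # target word segmentation result
    scores = model(tokens)                         # model output
    suspects = [t for t, s in zip(tokens, scores) if s >= threshold]
    # Final decision step: suspects are accepted as-is here (an assumption);
    # a dictionary lookup or manual review could be applied instead.
    return suspects

result = recognize("ward7 tel13800138000", toy_model)
```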
In an alternative embodiment, the desensitization dictionary comprises:
a first desensitization dictionary, generated in advance from sensitive information of data to be exported, that contains sensitive information of at least one item of target data to be exported; and/or
a second desensitization dictionary, generated in real time from the acquired target data to be exported, that contains sensitive information of at least one item of target data to be exported.
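One possible way to combine a pre-generated dictionary with one built at export time is sketched below. How the second dictionary is actually generated is not specified here; collecting the values of designated sensitive fields from the export batch is an assumption, as are all names and sample values.

```python
from typing import Dict, Iterable, List, Set

# First dictionary: prepared ahead of time (contents are made-up examples).
PREBUILT_DICTIONARY: Set[str] = {"13800138000"}

def build_runtime_dictionary(records: Iterable[Dict[str, str]],
                             sensitive_fields: List[str]) -> Set[str]:
    """Second dictionary: collect known-sensitive values from the export batch."""
    return {rec[f] for rec in records for f in sensitive_fields if rec.get(f)}

# Hypothetical export batch: the name also leaks into a free-text note.
records = [
    {"name": "LiLei", "note": "seen with LiLei's family"},
    {"name": "HanMei", "note": ""},
]
runtime_dictionary = build_runtime_dictionary(records, ["name"])
dictionary = PREBUILT_DICTIONARY | runtime_dictionary

leaks = [w for w in sorted(dictionary) if w in records[0]["note"]]
```

Building the second dictionary from the batch itself lets the check catch values, such as a patient name repeated in a note, that no pre-generated list could contain.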
In an alternative embodiment, the data desensitization verification and processing module 430 may also be configured to:
matching a target desensitization rule according to the category of the sensitive information;
and desensitizing the sensitive information based on the target desensitizing rule.
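Category-based rule selection can be sketched as a lookup table from category to masking function. The categories, the masking formats, and the `categorize` helper are hypothetical; the embodiment does not fix any of them.

```python
import re
from typing import Callable, Dict

def mask_phone(value: str) -> str:
    """Keep the first three and last four digits of an 11-digit number."""
    return value[:3] + "*" * 4 + value[7:]

def mask_name(value: str) -> str:
    """Keep only the first character of a name."""
    return value[0] + "*" * (len(value) - 1)

# Hypothetical mapping from sensitive-information category to rule.
RULES: Dict[str, Callable[[str], str]] = {"phone": mask_phone, "name": mask_name}

def categorize(value: str) -> str:
    """Assumed categorizer: 11 digits -> 'phone', otherwise 'name'."""
    return "phone" if re.fullmatch(r"\d{11}", value) else "name"

def desensitize(text: str, sensitive: str) -> str:
    """Match the target rule by category, then mask every occurrence."""
    rule = RULES[categorize(sensitive)]
    return text.replace(sensitive, rule(sensitive))

masked_phone = desensitize("LiLei tel 13800138000", "13800138000")
masked_name = desensitize("LiLei tel 13800138000", "LiLei")
```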
In an alternative embodiment, the target data to be exported includes medical data that has undergone desensitization processing and medical data that has not undergone desensitization processing.
In an alternative embodiment, the field data acquisition module 410 may be further configured to:
and identifying and extracting the picture or PDF type data under each data field item with the target attribute in the target data to be exported, and generating field data corresponding to the picture or PDF type data under each data field item.
The combined desensitization checking device for the medical data provided by the embodiment of the invention can execute the combined desensitization checking method for the medical data provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention. Fig. 5 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in fig. 5 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention. The computer device 12 may be any terminal device with computing power, such as an intelligent controller, a server, a mobile phone, and the like.
As shown in Fig. 5, the computer device 12 takes the form of a general-purpose computing device. Components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that connects the various system components (including the system memory 28 and the processing units 16).
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The system memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be appreciated that although not shown in fig. 5, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running a program stored in the system memory 28, for example, implementing a combined desensitization check method of medical data provided by the present embodiment, the method including:
acquiring target data to be exported, and identifying field data of each data field item with target attribute in the target data to be exported;
splicing the field data according to at least one preset splicing rule to obtain at least one data splicing character string;
and performing sensitive information identification, or sensitive information identification followed by desensitization processing, on each data splicing character string to complete the desensitization check of the target data to be exported.
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a combined desensitization check method of medical data as provided by any embodiment of the present invention, comprising:
acquiring target data to be exported, and identifying field data of each data field item with target attribute in the target data to be exported;
splicing the field data according to at least one preset splicing rule to obtain at least one data splicing character string;
and performing sensitive information identification, or sensitive information identification followed by desensitization processing, on each data splicing character string to complete the desensitization check of the target data to be exported.
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium may be, for example, but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
It will be appreciated by those of ordinary skill in the art that the modules or steps of the invention described above may be implemented on a general-purpose computing device. They may be centralized on a single computing device or distributed over a network of computing devices. Alternatively, they may be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they may each be fabricated as an individual integrated circuit module; or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (10)

1. A method of combined desensitization verification of medical data, the method comprising:
acquiring target data to be exported, and identifying field data of each data field item with target attribute in the target data to be exported;
splicing the field data according to at least one preset splicing rule to obtain at least one data splicing character string;
and carrying out sensitive information identification or sensitive information identification and desensitization treatment on each data splicing character string to finish the desensitization checking process of the target data to be exported.
2. The method of claim 1, wherein the splicing each field data according to at least one preset splicing rule to obtain at least one data splice string comprises:
splicing all or part of field data corresponding to the data field items according to the field splicing sequence in the preset splicing rule to obtain corresponding data splicing character strings;
wherein one of the data field items may be disposed at any one or more sequence positions in the splice order.
3. The method according to claim 2, wherein the splicing the corresponding field data according to the splicing order of the data field items in the preset splicing rule to obtain a corresponding data splicing string includes:
classifying the field data according to the data type;
and splicing the field data of the same data type according to the corresponding splicing sequence to obtain the data splicing character string.
4. The method of claim 1, wherein said identifying sensitive information for each of said data splice strings comprises:
extracting suspected sensitive data in the data splicing character string according to the character type and/or the character string length of the sub-character string in the data splicing character string;
and matching the suspected sensitive data with a desensitization dictionary, and determining sensitive information in the data splicing character string.
5. The method of claim 1, wherein said identifying sensitive information for each of said data splice strings comprises:
matching the data splicing character string against a desensitization dictionary to determine whether the data splicing character string contains sensitive information; and/or
matching each sensitive word of the desensitization dictionary within the data splicing character string to determine the sensitive information in the data splicing character string; and/or
performing word segmentation on the data splicing character string and matching the word segmentation results against the desensitization dictionary to determine the sensitive information in the data splicing character string.
6. The method of claim 1, wherein said identifying sensitive information for each of said data splice strings further comprises:
performing word segmentation processing on the data spliced character string to obtain a target word segmentation result;
inputting the target word segmentation result into a preset sensitive information identification model, and determining suspected sensitive data in the data splicing character string according to a model output result;
and determining sensitive information in the data splicing character string based on the suspected sensitive data.
7. The method of claim 4 or 5, wherein the desensitization dictionary comprises:
a first desensitization dictionary, generated in advance from sensitive information of data to be exported, that contains sensitive information of at least one item of target data to be exported; and/or
a second desensitization dictionary, generated in real time from the acquired target data to be exported, that contains sensitive information of at least one item of target data to be exported.
8. The method according to any one of claims 4 to 6, wherein the desensitizing process for each of the data splice strings comprises:
matching a target desensitization rule according to the category of the sensitive information;
and desensitizing the sensitive information based on the target desensitizing rule.
9. The method of claim 1, wherein the target data to be exported includes medical data that has undergone desensitization processing and medical data that has not undergone desensitization processing.
10. The method as recited in claim 1, further comprising:
and identifying and extracting the picture or PDF type data under each data field item with the target attribute in the target data to be exported, and generating field data corresponding to the picture or PDF type data under each data field item.
CN202311410452.2A 2023-10-27 2023-10-27 Combined desensitization checking method for medical data Pending CN117494193A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311410452.2A CN117494193A (en) 2023-10-27 2023-10-27 Combined desensitization checking method for medical data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311410452.2A CN117494193A (en) 2023-10-27 2023-10-27 Combined desensitization checking method for medical data

Publications (1)

Publication Number Publication Date
CN117494193A true CN117494193A (en) 2024-02-02

Family

ID=89681921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311410452.2A Pending CN117494193A (en) 2023-10-27 2023-10-27 Combined desensitization checking method for medical data

Country Status (1)

Country Link
CN (1) CN117494193A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination