WO2022088754A1 - File desensitization method, apparatus, and storage medium - Google Patents

File desensitization method, apparatus, and storage medium

Info

Publication number
WO2022088754A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
desensitization
read
desensitized
completed
Prior art date
Application number
PCT/CN2021/105808
Other languages
English (en)
French (fr)
Inventor
李永辉
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP21884491.8A (EP4227838A4)
Publication of WO2022088754A1
Priority to US18/307,986 (US20230315906A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/11File system administration, e.g. details of archiving or snapshots
    • G06F16/122File system administration, e.g. details of archiving or snapshots using management policies
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062Securing storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062Securing storage systems
    • G06F3/0623Securing storage systems in relation to content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0643Management of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065Replication mechanisms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • the present application relates to the field of computer technology, and in particular, to a file desensitization method, device, and storage medium.
  • the present application proposes a file desensitization method, device and storage medium.
  • An embodiment of the present application provides a file desensitization method. The method is executed by a storage device and includes: receiving a file read command sent by a host, where the file read command requests a file to be read; in response to the file read command, obtaining the file to be read, where at least a part of the data blocks in the file to be read have already been desensitized; desensitizing the data blocks of the file to be read that have not yet been desensitized; and sending the desensitized file to be read to the host.
  • The file desensitization method in this embodiment is performed by a storage device. The storage device receives a file read command sent by a host and, in response, obtains the file to be read, in which at least a part of the data blocks have already been desensitized. It then desensitizes the data blocks that have not yet been desensitized and sends the desensitized file to be read to the host, so that the file to be read is desensitized incrementally. This makes full use of previous desensitization results, which improves the efficiency of file desensitization, and it combines online desensitization with offline desensitization, so that product development or testing activities can be carried out at any time.
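  • The incremental read path described above can be sketched as follows. This is a minimal illustration, not the patented implementation: `StoredFile`, `handle_read`, and the digit-masking rule `mask_digits` are hypothetical names, and the per-block `done` list stands in for whatever state the storage device actually keeps.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StoredFile:
    """Hypothetical in-memory stand-in for a file held by the storage device."""
    blocks: List[bytes]
    done: List[bool] = field(default_factory=list)  # True = block already desensitized

    def __post_init__(self) -> None:
        if not self.done:
            self.done = [False] * len(self.blocks)


def handle_read(f: StoredFile, desensitize: Callable[[bytes], bytes]) -> bytes:
    """Desensitize only the blocks not yet processed, then return the whole file."""
    for i, block in enumerate(f.blocks):
        if not f.done[i]:
            f.blocks[i] = desensitize(block)
            f.done[i] = True
    return b"".join(f.blocks)


calls = []  # records every block the rule was actually applied to


def mask_digits(block: bytes) -> bytes:
    """Toy desensitization rule (an assumption): replace ASCII digits with '*'."""
    calls.append(block)
    return bytes(ord("*") if 48 <= b <= 57 else b for b in block)


# The first block was desensitized earlier (for example, offline); the second was not.
f = StoredFile(blocks=[b"id=***;", b"tel=5550100;"], done=[True, False])
first = handle_read(f, mask_digits)
second = handle_read(f, mask_digits)  # nothing left to desensitize on the second read
```

    In this sketch the second read reuses the earlier result, so the rule runs once in total even though the file is read twice.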
  • The method further includes: saving the desensitized file to be read in the hard disk of the storage device.
  • The storage device may save each desensitization result (that is, the file to be read that has been desensitized) in the hard disk.
  • In this way, a clean backup can be gradually formed, which includes files that have been desensitized and non-sensitive files. Subsequent development or testing activities can be carried out based on the clean backup, which reduces unnecessary repeated desensitization, reduces the impact of online desensitization on IO efficiency during development or testing, and improves file reading efficiency.
  • The method further includes: acquiring a bitmap of the file to be read, where the bitmap indicates whether each data block included in the file to be read has completed desensitization processing; and identifying, according to the bitmap, the data blocks in the file to be read that have not yet been desensitized.
  • A bitmap can be set for the file to be read. By checking the bitmap, the data blocks that have not yet been desensitized can be determined simply and quickly, which improves processing efficiency.
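  • As a minimal sketch of the bitmap lookup, assume one flag bit per data block, with bit `i` (counting from the least significant bit) set to 1 once block `i` has been desensitized. The text does not specify the bit layout, so this layout and the name `pending_blocks` are assumptions made for illustration.

```python
def pending_blocks(bitmap: int, block_count: int) -> list[int]:
    """Return indices of blocks whose flag bit is 0 (not yet desensitized)."""
    return [i for i in range(block_count) if not (bitmap >> i) & 1]


# Blocks 0 and 2 are already desensitized; blocks 1 and 3 are not.
bitmap = 0b0101
todo = pending_blocks(bitmap, block_count=4)
```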
  • The file to be read has a desensitization mark, which indicates whether the file to be read has been desensitized. The method further includes: before desensitizing the data blocks of the file to be read that have not yet been desensitized, determining, according to the desensitization mark, that the file to be read is a file whose desensitization has not yet been completed.
  • Because the file to be read has a desensitization mark, its desensitization state can be determined from the mark, which enables management and identification of the desensitization state of the file to be read.
  • The method further includes: modifying the flag bit in the bitmap that corresponds to each data block that has completed desensitization, to indicate that the corresponding data block has completed desensitization.
  • After a data block has been desensitized, the flag bit corresponding to that data block in the bitmap can be modified to indicate that the data block has been desensitized.
  • In this way, the bitmap can be updated promptly as the desensitization state of the data blocks in the file to be read changes, which improves the accuracy of the bitmap of the file to be read.
  • The method further includes: modifying the desensitization mark of the file to be read that has been desensitized, to indicate that the file to be read has been desensitized.
  • When desensitization of the file to be read has been completed, that is, when the bitmap of the file to be read indicates that every data block it contains has been desensitized, the desensitization mark of the file to be read can be modified to indicate that desensitization has been completed. In this way, the desensitization mark can be updated promptly as the desensitization state of the file to be read changes, which improves the accuracy of the mark.
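  • The bitmap update and the file-level mark update can be sketched together. The sketch assumes the same layout as before (bit `i` set means block `i` is desensitized); `FileState` and `mark_block_done` are hypothetical names and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    """Hypothetical per-file desensitization state."""
    block_count: int
    bitmap: int = 0             # bit i set = block i desensitized (assumed layout)
    desensitized: bool = False  # file-level desensitization mark


def mark_block_done(state: FileState, index: int) -> None:
    """Set the flag bit for one block; flip the file mark once every bit is set."""
    if not 0 <= index < state.block_count:
        raise IndexError(index)
    state.bitmap |= 1 << index
    all_done = (1 << state.block_count) - 1
    if state.bitmap == all_done:
        state.desensitized = True


state = FileState(block_count=3, bitmap=0b001)
mark_block_done(state, 1)
partial = state.desensitized  # block 2 is still pending at this point
mark_block_done(state, 2)
```

    Checking the file-level mark first lets a reader skip the bitmap scan entirely for files that are already fully desensitized.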
  • At least a part of the data blocks in the to-be-read file is desensitized through offline desensitization.
  • online desensitization can be performed on the basis of offline desensitization, so that offline desensitization and online desensitization can jointly maintain a desensitization result.
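  • One way to picture offline and online desensitization maintaining a single result is to let both paths share the same per-block state. In this sketch, `offline_pass`, `online_read`, the `budget` parameter, and the `redact` rule are all illustrative assumptions.

```python
from typing import Callable, List

Rule = Callable[[bytes], bytes]


def offline_pass(blocks: List[bytes], done: List[bool], rule: Rule, budget: int) -> int:
    """Background pass: desensitize up to `budget` pending blocks; return the count processed."""
    processed = 0
    for i in range(len(blocks)):
        if processed == budget:
            break
        if not done[i]:
            blocks[i] = rule(blocks[i])
            done[i] = True
            processed += 1
    return processed


def online_read(blocks: List[bytes], done: List[bool], rule: Rule) -> bytes:
    """Read path: finish whatever the offline pass has not reached, then return the file."""
    for i in range(len(blocks)):
        if not done[i]:
            blocks[i] = rule(blocks[i])
            done[i] = True
    return b"".join(blocks)


def redact(block: bytes) -> bytes:
    """Toy rule (an assumption): blank out the literal word 'secret'."""
    return block.replace(b"secret", b"******")


blocks = [b"secret-a ", b"secret-b ", b"secret-c"]
done = [False, False, False]
n_offline = offline_pass(blocks, done, redact, budget=2)  # offline covers two blocks
result = online_read(blocks, done, redact)                # the read completes the third
```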
  • The file to be read includes at least one of a text file, a report file, a picture file, an audio file, and a video file.
  • The file to be read may include at least one of a text file, a report file, a picture file, an audio file, and a video file, and may also include office documents, XML (Extensible Markup Language) files, HTML (HyperText Markup Language) files, and other unstructured data. Therefore, various kinds of unstructured data can be desensitized by the above method.
  • An embodiment of the present application provides a file desensitization device. The device is applied to a storage device and includes: a command receiving module, configured to receive a file read command sent by a host, where the file read command requests a file to be read; a file acquisition module, configured to obtain the file to be read in response to the file read command, where at least a part of the data blocks in the file to be read have already been desensitized; a desensitization processing module, configured to desensitize the data blocks of the file to be read that have not yet been desensitized; and a file sending module, configured to send the desensitized file to be read to the host.
  • The file desensitization device of this embodiment is applied to a storage device. It receives a file read command sent by a host and, in response, obtains the file to be read, in which at least a part of the data blocks have already been desensitized. It then desensitizes the data blocks that have not yet been desensitized and sends the desensitized file to be read to the host, so that the file to be read is desensitized incrementally. This makes full use of previous desensitization results, which improves the efficiency of file desensitization, and it combines online desensitization with offline desensitization, so that product development or testing activities can be carried out at any time.
  • The device further includes: a file storage module, configured to store the desensitized file to be read on the hard disk of the storage device.
  • The storage device may save each desensitization result (that is, the file to be read that has been desensitized) in the hard disk.
  • In this way, a clean backup can be gradually formed, which includes files that have been desensitized and non-sensitive files. Subsequent development or testing activities can be carried out based on the clean backup, which reduces unnecessary repeated desensitization, reduces the impact of online desensitization on IO efficiency during development or testing, and improves file reading efficiency.
  • The device further includes: a bitmap acquisition module, configured to acquire a bitmap of the file to be read, where the bitmap indicates whether each data block contained in the file to be read has completed desensitization processing; and a data block identification module, configured to identify, according to the bitmap, the data blocks in the file to be read that have not yet been desensitized.
  • A bitmap can be set for the file to be read. By checking the bitmap, the data blocks that have not yet been desensitized can be determined simply and quickly, which improves processing efficiency.
  • The file to be read has a desensitization mark, which indicates whether the file to be read has been desensitized. The device further includes: a file desensitization identification module, configured to determine, according to the desensitization mark and before the data blocks of the file to be read that have not yet been desensitized are desensitized, that the file to be read is a file whose desensitization has not yet been completed.
  • Because the file to be read has a desensitization mark, its desensitization state can be determined from the mark, which enables management and identification of the desensitization state of the file to be read.
  • The device further includes: a bitmap modification module, configured to modify the flag bit in the bitmap that corresponds to each data block that has completed desensitization, to indicate that the corresponding data block has completed desensitization.
  • After a data block has been desensitized, the flag bit corresponding to that data block in the bitmap can be modified to indicate that the data block has been desensitized.
  • In this way, the bitmap can be updated promptly as the desensitization state of the data blocks in the file to be read changes, which improves the accuracy of the bitmap of the file to be read.
  • The device further includes: a desensitization mark modification module, configured to modify the desensitization mark of the file to be read that has been desensitized, to indicate that desensitization of the file to be read has been completed.
  • When desensitization of the file to be read has been completed, that is, when the bitmap of the file to be read indicates that every data block it contains has been desensitized, the desensitization mark of the file to be read can be modified to indicate that desensitization has been completed. In this way, the desensitization mark can be updated promptly as the desensitization state of the file to be read changes, which improves the accuracy of the mark.
  • At least a part of the data blocks in the to-be-read file are desensitized through offline desensitization.
  • online desensitization can be performed on the basis of offline desensitization, so that offline desensitization and online desensitization can jointly maintain a desensitization result.
  • The file to be read includes at least one of a text file, a report file, a picture file, an audio file, and a video file.
  • The file to be read may include at least one of a text file, a report file, a picture file, an audio file, and a video file, and may also include office documents, XML (Extensible Markup Language) files, HTML (HyperText Markup Language) files, and other unstructured data. Therefore, various kinds of unstructured data can be desensitized by the above method.
  • Embodiments of the present application provide a file desensitization device, including a processor and a memory for storing instructions executable by the processor, where the processor is configured to implement, when executing the instructions, the file desensitization method of the first aspect or of one or more of the possible implementations of the first aspect.
  • The file desensitization device of this embodiment is applied to a storage device. It receives a file read command sent by a host and, in response, obtains the file to be read, in which at least a part of the data blocks have already been desensitized. It then desensitizes the data blocks that have not yet been desensitized and sends the desensitized file to be read to the host, so that the file to be read is desensitized incrementally. This makes full use of previous desensitization results, which improves the efficiency of file desensitization, and it combines online desensitization with offline desensitization, so that product development or testing activities can be carried out at any time.
  • Embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored. When the computer program instructions are executed by a processor, the file desensitization method of the first aspect or of one or more of the possible implementations of the first aspect is implemented.
  • In this embodiment, a file to be read is obtained in which at least a part of the data blocks have already been desensitized; the data blocks that have not yet been desensitized are then desensitized, and the desensitized file to be read is sent to the host. The file to be read is thus desensitized incrementally, which makes full use of previous desensitization results, improves the efficiency of file desensitization, and combines online desensitization with offline desensitization, so that product development or testing activities can be carried out at any time.
  • Embodiments of the present application provide a computer program product, comprising computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code. When the computer-readable code is stored in an electronic device, the processor in the electronic device executes the file desensitization method of the first aspect or of one or more of the possible implementations of the first aspect.
  • In this embodiment, a file to be read is obtained in which at least a part of the data blocks have already been desensitized; the data blocks that have not yet been desensitized are then desensitized, and the desensitized file to be read is sent to the host. The file to be read is thus desensitized incrementally, which makes full use of previous desensitization results, improves the efficiency of file desensitization, and combines online desensitization with offline desensitization, so that product development or testing activities can be carried out at any time.
  • FIG. 1 shows a schematic diagram of an application scenario of a file desensitization method according to an embodiment of the present application.
  • FIG. 2 shows a schematic diagram of an application scenario of a file desensitization method according to an embodiment of the present application.
  • FIG. 3 shows a schematic diagram of an application scenario of a file desensitization method according to an embodiment of the present application.
  • FIG. 4 shows a flowchart of a file desensitization method according to an embodiment of the present application.
  • FIG. 5 shows a schematic diagram of an application of a file desensitization method according to an embodiment of the present application.
  • FIG. 6 shows a schematic diagram of an application of a file desensitization method according to an embodiment of the present application.
  • FIG. 7 shows a schematic diagram of a processing procedure of a file desensitization method according to an embodiment of the present application.
  • FIG. 8 shows a block diagram of a file desensitization apparatus according to an embodiment of the present application.
  • Desensitization processing refers to identifying the sensitive information in the object to be desensitized (such as data or files) and modifying it by blocking, obfuscation, or similar means, so as to hide the sensitive information and protect it reliably.
  • Desensitization processing can include online desensitization and offline desensitization.
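  • As a small illustration of identifying sensitive information and modifying it by blocking, the sketch below masks phone numbers and e-mail addresses in text. The two patterns and the masking format are assumptions chosen for the example; the application does not prescribe specific recognition rules.

```python
import re

# Assumed patterns: an 11-digit phone number and a simple e-mail address.
PHONE = re.compile(r"\b(\d{3})\d{4}(\d{4})\b")
EMAIL = re.compile(r"\b[\w.+-]+@([\w-]+(?:\.[\w-]+)+)\b")


def desensitize_text(text: str) -> str:
    """Block the middle digits of phone numbers and the local part of e-mail addresses."""
    text = PHONE.sub(r"\1****\2", text)
    return EMAIL.sub(r"***@\1", text)


masked = desensitize_text("Call 13812345678 or mail alice@example.com")
```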
  • Online desensitization means that when the host reads or accesses a file, the desensitization engine (which performs the desensitization processing) identifies the sensitive information in that file in real time, desensitizes it, and sends the desensitized file to the host for use. That is, during online desensitization the desensitization engine must run a desensitization process for every file read or access by the host, which is inefficient and may repeat the same desensitization unnecessarily.
  • For example, when the host reads the same file multiple times, the desensitization engine desensitizes that file multiple times.
  • In addition, the desensitization engine needs to intercept the file IO, and the file can be sent to the host only after desensitization is completed, which significantly reduces IO efficiency during development or testing.
  • Offline desensitization means that before product development or testing, the desensitization engine desensitizes all files used in the development or testing process through enumeration, traversal, etc., and saves the files that have been desensitized. After offline desensitization is completed, product development or testing activities will be carried out based on the saved files that have been desensitized, and online desensitization will not be required during development or testing.
  • For example, the backup data that includes the files used during product development or testing, such as a backup LUN (Logical Unit Number; the backup data is stored on the device corresponding to the logical unit number), is desensitized offline.
  • The desensitized image is a purified image that does not contain sensitive information, and development or testing activities are carried out based on the purified image.
  • The drawback of offline desensitization is that development or testing activities can start only after offline desensitization is completed, so the waiting time is long; if offline desensitization has not been performed, development or testing activities cannot be carried out.
  • To address these problems, the present application provides a file desensitization method.
  • The file desensitization method in the embodiments of the present application can be applied to a storage device. The storage device receives a file read command sent by a host and, in response, obtains the file to be read, in which at least a part of the data blocks have already been desensitized. It then desensitizes the data blocks that have not yet been desensitized and sends the desensitized file to be read to the host. The file to be read is thus desensitized incrementally, which makes full use of previous desensitization results, improves the efficiency of file desensitization, and combines online desensitization with offline desensitization, so that product development or testing activities can be carried out at any time.
  • FIG. 1 shows a schematic diagram of an application scenario of a file desensitization method according to an embodiment of the present application.
  • The application scenario shown in FIG. 1 provides a desensitization system, including a production device 10, a storage device 20 and a host 30.
  • The file desensitization method in this embodiment is applied to the storage device 20 and can be executed by the desensitization engine 21 in the storage device 20.
  • The production device 10 may be any form of electronic device, such as a server, a desktop computer, a mainframe computer, or any other type of computing device that includes a processor and memory.
  • The production device 10 saves the production data in the storage device 40. The production data is the original file from the user and may contain information related to the user's privacy. To prevent leakage of user privacy, the production data needs to be desensitized.
  • The storage device 40 can be either a memory located inside the production device 10 or a memory of a device located outside the production device 10. The external device can communicate with the production device 10 and stores production data from the production device 10.
  • the external device may be any form of electronic device, such as a server, desktop computer, mainframe computer, storage array, and any other type of computing device that includes a processor and memory.
  • the host 30 mainly refers to a development/testing server, and a developer/tester obtains files from the storage device 20 through the host 30 for product development or testing.
  • the files provided by the storage device 20 to the host 30 are files after desensitization processing, so as to protect user privacy.
  • the host 30 may be any form of electronic device, such as a server, a desktop computer, a mobile device, and any other type of computing device that includes a processor and memory.
  • the storage device 20 may be a network attached storage (Network Attached Storage, NAS) device, and the NAS device is a dedicated high-performance file storage device, which provides file data for user access through a network and a file sharing protocol.
  • the protocols used between it and the host 30 include the TCP/IP protocol for data transmission, and the CIFS and NFS protocols for network file services.
  • the host 30 is configured with an NFS/CIFS client, and the NFS/CIFS client specifies the file name, location or other attributes in the read command to access a certain file.
  • The storage device 20 is configured with an NFS/CIFS server, and the NFS/CIFS server parses the read command. Because the file system records the location of the file in the hard disk, the storage device 20 can convert the file name and location in the received read command into the address of the file and thereby obtain the file.
  • The storage device 20 may also be a storage area network (Storage Area Network, SAN) device, which communicates with the host 30 through a fiber channel network.
  • The SAN device in this embodiment has a file system and can perform file access.
  • The storage device 20 may also be another device with storage functions, and the desensitization system may include one or more storage devices 20. This embodiment does not limit the number of storage devices 20.
  • The production data stored in the production device 10 is original data. If the original data were desensitized directly, the data would be destroyed and would be difficult to restore. Therefore, in the application scenario shown in FIG. 1, the production data is sent to the storage device 20 as a copy and saved there, and the storage device 20 desensitizes the copy, not the original data itself.
  • FIG. 2 shows a schematic diagram of an application scenario of a file desensitization method according to an embodiment of the present application.
  • The application scenario shown in FIG. 2 is similar to that in FIG. 1, except that the production device 10 can store the production data directly in the storage device 20.
  • The storage device 20 creates a copy of the production data, and the copy is desensitized.
  • In this scenario, both the production data and the copy of the production data are located in the storage device 20.
  • FIG. 3 shows a schematic diagram of an application scenario of a file desensitization method according to an embodiment of the present application.
  • the application scenario shown in FIG. 3 provides a desensitization system, including a production device 10 , a storage device 20 , a host 30 and a desensitization device 50 .
  • the difference from the application scenario shown in FIG. 1 is that the application scenario shown in FIG. 3 provides an independent desensitization device 50 for desensitizing files read by the host 30 .
  • the desensitization device 50 includes the desensitization engine 21 .
  • the desensitization device 50 may be any form of electronic device, such as a server, a desktop computer, a mobile device, and any other type of computing device that includes a processor and memory.
  • the production device 10 , the storage device 20 (excluding the desensitization engine), and the host 30 in the application scenario shown in FIG. 3 are similar to those in FIG. 1 , and will not be repeated here.
  • the desensitization system may include at least two desensitization engines.
  • for example, the desensitization system includes two desensitization engines: a first desensitization engine located in the desensitization device and a second desensitization engine located in the storage device. The two desensitization engines can execute different desensitization processes. For example, the first desensitization engine performs online desensitization and the second performs offline desensitization, or the first performs offline desensitization and the second performs online desensitization.
  • the two desensitization engines can also execute the same desensitization process; for example, both can perform online desensitization and offline desensitization. It should be noted that those skilled in the art can set the number of desensitization engines in the desensitization system and the desensitization processing performed by each engine according to the actual situation, which is not limited in this embodiment.
  • FIG. 4 shows a flowchart of a file desensitization method according to an embodiment of the present application. As shown in FIG. 4 , the file desensitization method is executed by the storage device 20 , and the method includes steps S11 to S14 .
  • step S11 a file read command sent by the host is received, where the file read command is used to request a file to be read.
  • when a developer or tester obtains a file from the storage device through the host, the host can send a file read command to the storage device through an IO path, and the storage device can receive the file read command sent by the host, where the file read command is used to request the file to be read.
  • the file to be read may include at least one of a text file, a report file, a picture file, an audio file, and a video file.
  • the files to be read may also include office documents, XML (Extensible Markup Language) files, HTML (HyperText Markup Language) files and other unstructured data. This embodiment does not limit the specific type of the file to be read.
  • step S12 in response to the file read command, the to-be-read file is acquired, and at least a part of the data blocks in the to-be-read file have completed desensitization processing.
  • the storage device may, in response to the read file command, determine the storage address of the file to be read according to the file name, location and other information in the file read command, and According to the storage address, the file to be read is obtained.
  • the file to be read has been desensitized.
  • the file to be read can be divided into multiple data blocks, and some of the data blocks have been desensitized.
  • At least a part of the data blocks in the to-be-read file can be desensitized through offline desensitization.
  • before product development or testing starts, the file to be read can be desensitized by offline desensitization.
  • the file to be read can be divided into multiple data blocks of a certain size (for example, 4 KB). For example, the file to be read is divided into 5 data blocks and is desensitized through offline desensitization.
  • if product development or testing needs to start when 3 data blocks have been desensitized, offline desensitization can be stopped, the 3 desensitized data blocks can be saved on the hard disk of the storage device, and the desensitization status of each data block can be recorded through a bitmap, array, matrix, etc.
  • the desensitization process is then switched to online desensitization; that is, during development or testing, the 2 data blocks of the file to be read that have not yet been desensitized are desensitized online.
  • online desensitization can be performed on the basis of offline desensitization, so that offline desensitization and online desensitization can jointly maintain a desensitization result.
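  • The hand-over from offline to online desensitization described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function names, the digit-blanking rule in `desensitize_block`, and the use of a Python list as the per-block status record are all assumptions made for the example.

```python
BLOCK_SIZE = 4 * 1024  # 4 KB, the example block size mentioned in the text


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split file content into fixed-size data blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def desensitize_block(block: bytes) -> bytes:
    """Hypothetical placeholder rule: replace every ASCII digit with '#'."""
    return bytes(ord("#") if 48 <= b <= 57 else b for b in block)


def desensitize_pending(blocks: list, status: list, limit=None) -> int:
    """Desensitize blocks whose status is False, processing at most `limit`."""
    done = 0
    for i, finished in enumerate(status):
        if finished:
            continue  # reuse the earlier result, never redo a block
        if limit is not None and done >= limit:
            break
        blocks[i] = desensitize_block(blocks[i])
        status[i] = True  # record the desensitization status of this block
        done += 1
    return done


data = b"7" * (5 * BLOCK_SIZE)      # a file of exactly 5 data blocks
blocks = split_blocks(data)
status = [False] * len(blocks)      # per-block desensitization status

offline_done = desensitize_pending(blocks, status, limit=3)  # offline pass, stopped early
online_done = desensitize_pending(blocks, status)            # online pass on the next read
```

  • Because both passes consult the same status record, the online pass touches only the 2 blocks left over by the offline pass.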
  • At least a part of the data blocks in the to-be-read file can be desensitized through online desensitization.
  • the host and the storage device support both the overall access to the file to be read and the independent access to each data block in the file to be read.
  • for example, the file to be read includes 5 data blocks and has not undergone offline desensitization. When the host reads the file for the first time, it reads only the first data block of the file to be read.
  • the storage device desensitizes the first data block online and sends it to the host, stores the desensitized first data block on the hard disk of the storage device, and records its desensitization status. When the host reads the file again and requests the whole file to be read, the storage device may acquire the file to be read in response to the file read command and perform online desensitization, where desensitization processing has already been completed on one data block (i.e., the first data block) of the file to be read.
  • step S13 desensitization processing is performed on data blocks of the to-be-read file that have not been desensitized yet.
  • the storage device can identify the data blocks of the file to be read that have not been desensitized yet, and perform desensitization processing on the data blocks of the file to be read that have not been desensitized.
  • desensitization processing may be performed by means of numeric/string replacement, invalidation, randomization, offset and rounding, mask masking, and the like.
  • the specific manner of desensitization treatment will be exemplarily described below with reference to specific examples.
  • Example 1 Desensitization by numeric/string replacement.
  • Numeric/string substitution refers to the use of fixed imaginary values in place of real numeric values or strings.
  • for example, the file before desensitization is shown in Table 1 below.
  • the file after the mobile phone numbers have been desensitized is shown in Table 2 below.
  • in Table 2, the desensitized mobile phone numbers are all displayed as 13800138000, and their true values have been hidden.
  • Table 1
    | serial number | code | name | address | phone number | identification number | operation time |
    | 1 | 100000 | Zhang San | City A, Sichuan Province | 138####8611 | 51132119######0672 | 2020-5-30 15:01:56 |
    | 2 | 100001 | Li Si | City A, Sichuan Province | 133####6953 | 51132119######0611 | 2020-6-30 16:56:03 |
    | 3 | 100002 | Wang Wu | Shenzhen District B | 186####9898 | 51121019######5582 | 2020-4-30 16:01:50 |
    | 4 | 100003 | Zhao Liu | Shenzhen District C | 180####9465 | 46003319######0651 | 2020-7-30 16:15:03 |
    | 5 | 100004 | Qian Qi | Shenzhen District D | 181####7898 | 46003119######0818 | 2020-9-30 17:20:50 |
  • Table 2
    | serial number | code | name | address | phone number | identification number | operation time |
    | 1 | 100000 | Zhang San | City A, Sichuan Province | 13800138000 | 51132119######0672 | 2020-5-30 15:01:56 |
    | 2 | 100001 | Li Si | City A, Sichuan Province | 13800138000 | 51132119######0611 | 2020-6-30 16:56:03 |
    | 3 | 100002 | Wang Wu | Shenzhen District B | 13800138000 | 51121019######5582 | 2020-4-30 16:01:50 |
    | 4 | 100003 | Zhao Liu | Shenzhen District C | 13800138000 | 46003319######0651 | 2020-7-30 16:15:03 |
    | 5 | 100004 | Qian Qi | Shenzhen District D | 13800138000 | 46003119######0818 | 2020-9-30 17:20:50 |
  • # represents any one of the numbers 0-9.
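  • A minimal sketch of numeric/string replacement is given below, assuming that a mobile phone number can be recognised as a run of exactly 11 digits starting with 1. The regular expression, the function name `replace_phones` and the sample row (with made-up digits in place of the # placeholders) are illustrative assumptions; only the fixed value 13800138000 comes from Table 2.

```python
import re

FIXED_PHONE = "13800138000"  # fixed fictitious value shown in Table 2

# Assumption: a mobile number is exactly 11 consecutive digits starting with 1.
# The lookarounds keep the pattern from matching inside longer digit runs,
# such as an 18-digit identification number.
PHONE_RE = re.compile(r"(?<!\d)1\d{10}(?!\d)")


def replace_phones(text: str) -> str:
    """Replace every detected phone number with the same fixed value."""
    return PHONE_RE.sub(FIXED_PHONE, text)


# Hypothetical sample row; the digits are invented for the example.
row = "1 100000 Zhang San 13812348611 511321199001010672 2020-5-30 15:01:56"
clean_row = replace_phones(row)
```

  • Every phone number maps to the same value, so the result carries no information about the original number.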
  • Example 2 Desensitization treatment by nullification.
  • Desensitization through invalidation refers to desensitizing sensitive data by truncating, encrypting or hiding it, so that it no longer has any use value, for example, replacing the real value with *****.
  • Data invalidation is basically similar to the effect achieved by data replacement.
  • the files after hidden desensitization of addresses are shown in Table 3 below.
  • Table 3
    | serial number | code | name | address | phone number | identification number | operation time |
    | 1 | 100000 | Zhang San | ****** | 138####8611 | 51132119######0672 | 2020-5-30 15:01:56 |
    | 2 | 100001 | Li Si | ****** | 133####6953 | 51132119######0611 | 2020-6-30 16:56:03 |
    | 3 | 100002 | Wang Wu | ****** | 186####9898 | 51121019######5582 | 2020-4-30 16:01:50 |
    | 4 | 100003 | Zhao Liu | ****** | 180####9465 | 46003319######0651 | 2020-7-30 16:15:03 |
    | 5 | 100004 | Qian Qi | ****** | 181####7898 | 46003119######0818 | 2020-9-30 17:20:50 |
  • truncated desensitization may also be performed on the address field, that is, only part of the information is hidden.
  • Table 4
    | serial number | code | name | address | phone number | identification number | operation time |
    | 1 | 100000 | Zhang San | Sichuan Province*** | 138####8611 | 51132119######0672 | 2020-5-30 15:01:56 |
    | 2 | 100001 | Li Si | Sichuan Province*** | 133####6953 | 51132119######0611 | 2020-6-30 16:56:03 |
    | 3 | 100002 | Wang Wu | Shenzhen*** | 186####9898 | 51121019######5582 | 2020-4-30 16:01:50 |
    | 4 | 100003 | Zhao Liu | Shenzhen*** | 180####9465 | 46003319######0651 | 2020-7-30 16:15:03 |
    | 5 | 100004 | Qian Qi | Shenzhen*** | 181####7898 | 46003119######0818 | 2020-9-30 17:20:50 |
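  • The two invalidation variants above (hiding the whole field and truncating it) might be sketched as follows. The function names `hide` and `truncate` and the `keep` parameter are hypothetical and introduced only for illustration.

```python
def hide(value: str, mask: str = "******") -> str:
    """Hide the whole field: the output does not depend on the input at all."""
    return mask


def truncate(value: str, keep: int, mask: str = "***") -> str:
    """Keep only the first `keep` characters and hide the rest."""
    return value[:keep] + mask


hidden = hide("Shenzhen District B")
partial = truncate("Shenzhen District B", keep=len("Shenzhen"))
```

  • Truncation keeps coarse information (here the city) that may still be useful for testing, while hiding removes the field content entirely.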
  • Desensitization through randomization refers to using random data to replace the real value/string, maintaining the randomness of the replaced value to simulate the authenticity of the sample. For example, use a randomly generated name instead of a real name, or a random number in a specified range instead of a real value.
  • for example, the A1(rand(A1.len())+1)+B1(rand(B1.len())+1) function can be used to randomize the name, that is, to generate names as random combinations of entries from the external dictionary tables A1 ("last name.txt") and B1 ("first name.txt").
  • the files after desensitization of names through randomization are shown in Table 5 below.
  • Table 5
    | serial number | code | name | address | phone number | identification number | operation time |
    | 1 | 100000 | Zhang Yi | City A, Sichuan Province | 138####8611 | 51132119######0672 | 2020-5-30 15:01:56 |
    | 2 | 100001 | Li Er | City A, Sichuan Province | 133####6953 | 51132119######0611 | 2020-6-30 16:56:03 |
    | 3 | 100002 | Wang San | Shenzhen District B | 186####9898 | 51121019######5582 | 2020-4-30 16:01:50 |
    | 4 | 100003 | Zhao Si | Shenzhen District C | 180####9465 | 46003319######0651 | 2020-7-30 16:15:03 |
    | 5 | 100004 | Qian Wu | Shenzhen District D | 181####7898 | 46003119######0818 | 2020-9-30 17:20:50 |
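  • A possible Python counterpart of the dictionary-based randomization above is sketched here. The two in-memory lists stand in for the external dictionary files and their contents are made up; the function name `random_name` is likewise an assumption.

```python
import random

# Stand-ins for the external dictionary tables; contents are illustrative only.
SURNAMES = ["Zhang", "Li", "Wang", "Zhao", "Qian"]   # plays the role of "last name.txt"
GIVEN_NAMES = ["Yi", "Er", "San", "Si", "Wu"]        # plays the role of "first name.txt"


def random_name(rng: random.Random) -> str:
    """Combine a random surname with a random given name."""
    return f"{rng.choice(SURNAMES)} {rng.choice(GIVEN_NAMES)}"


rng = random.Random(0)  # seeded only so that a test run is repeatable
names = [random_name(rng) for _ in range(5)]
```

  • Unlike the sample in Table 5, this sketch does not try to preserve the original surname; whether that is desirable depends on the desensitization policy.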
  • Offset and rounding refer to changing numeric data by shifting or rounding it, for example, changing the date 2018-01-02 8:12:25 to 2018-01-02 8:00:00. Offset and rounding keep the data secure while preserving the approximate authenticity of the range, which is of great value in big data environments.
  • for example, the string(operatetime,"yyyy-MM-dd HH:00:00") function can be used to format the operation time into the "yyyy-MM-dd HH:00:00" format according to the offset and rounding rules.
  • the files after desensitization of the operation time through offset and rounding are shown in Table 6.
  • Table 6
    | serial number | code | name | address | phone number | identification number | operation time |
    | 1 | 100000 | Zhang San | City A, Sichuan Province | 138####8611 | 51132119######0672 | 2020-5-30 15:00:00 |
    | 2 | 100001 | Li Si | City A, Sichuan Province | 133####6953 | 51132119######0611 | 2020-6-30 16:00:00 |
    | 3 | 100002 | Wang Wu | Shenzhen District B | 186####9898 | 51121019######5582 | 2020-4-30 16:00:00 |
    | 4 | 100003 | Zhao Liu | Shenzhen District C | 180####9465 | 46003319######0651 | 2020-7-30 16:00:00 |
    | 5 | 100004 | Qian Qi | Shenzhen District D | 181####7898 | 46003119######0818 | 2020-9-30 17:00:00 |
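  • The rounding rule used for the operation time can be sketched with the standard `datetime` module. The function name `truncate_to_hour` is an assumption, and the output is zero-padded in line with the "yyyy-MM-dd HH:00:00" pattern, so it differs slightly in appearance from the unpadded dates in Table 6.

```python
from datetime import datetime


def truncate_to_hour(timestamp: str) -> str:
    """Drop minutes and seconds, keeping the date and the hour."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H:00:00")


rounded = truncate_to_hour("2020-5-30 15:01:56")
```

  • The exact moment of the operation is hidden, while hour-level statistics remain usable.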
  • Mask masking is a powerful tool for desensitizing part of account data, such as desensitization of bank card numbers or ID numbers.
  • Full or partial masking can be specified (the range of masking, such as last X bits, middle X bits, etc.). For example, mask and desensitize the date of birth of the ID number.
  • the left(string(idnumber),6)+"********"+right(string(idnumber),4) function can be used to mask the ID number.
  • the documents after desensitizing the ID number through mask masking are shown in Table 7 below.
  • Table 7
    | serial number | code | name | address | phone number | identification number | operation time |
    | 1 | 100000 | Zhang San | City A, Sichuan Province | 138####8611 | 511321********0672 | 2020-5-30 15:01:56 |
    | 2 | 100001 | Li Si | City A, Sichuan Province | 133####6953 | 511321********0611 | 2020-6-30 16:56:03 |
    | 3 | 100002 | Wang Wu | Shenzhen District B | 186####9898 | 511210********5582 | 2020-4-30 16:01:50 |
    | 4 | 100003 | Zhao Liu | Shenzhen District C | 180####9465 | 460033********0651 | 2020-7-30 16:15:03 |
    | 5 | 100004 | Qian Qi | Shenzhen District D | 181####7898 | 460031********0818 | 2020-9-30 17:20:50 |
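  • A minimal Python equivalent of the left(...)+"********"+right(...) masking expression is sketched below. The function name `mask_id`, its parameters, and the sample identification number (whose middle digits are invented) are assumptions for illustration.

```python
def mask_id(id_number: str, head: int = 6, tail: int = 4, fill: str = "*") -> str:
    """Keep the first `head` and last `tail` characters and mask the middle."""
    if len(id_number) <= head + tail:
        # Too short to keep both ends safely: mask everything.
        return fill * len(id_number)
    middle = len(id_number) - head - tail
    return id_number[:head] + fill * middle + id_number[-tail:]


masked = mask_id("511321199001010672")  # hypothetical 18-digit ID number
```

  • For an 18-digit ID number the masked middle is the 8-digit date of birth, which matches the layout shown in Table 7.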
  • step S14 the file to be read that has been desensitized is sent to the host.
  • after the data blocks that had not yet been desensitized are processed, the file to be read for which desensitization processing has been completed is obtained, and the storage device can send this file to the host for use in product development or testing.
  • the file desensitization method in this embodiment is performed by a storage device. The storage device can receive a file read command sent by a host and, in response to the file read command, obtain a file to be read, where at least a part of the data blocks in the file to be read has already been desensitized. The storage device then desensitizes the data blocks in the file to be read that have not yet been desensitized and sends the fully desensitized file to be read to the host.
  • in this way, the file to be read is desensitized incrementally. This not only makes full use of previous desensitization results to improve the processing efficiency of file desensitization, but also combines online desensitization with offline desensitization, so that product development or testing activities can be carried out at any time.
  • the method may further include: saving the desensitized file to be read in the hard disk of the storage device.
  • the storage device may save each desensitization result (that is, the file to be read that has been desensitized) in the hard disk.
  • a specific location can be selected to store desensitized files.
  • a clean backup can be gradually formed, and the clean backup includes files that have been desensitized and non-sensitive files. Subsequent development or testing activities can be carried out based on the purification backup, which can not only reduce unnecessary multiple desensitization, but also reduce the impact of online desensitization on IO efficiency during development or testing, and improve file reading efficiency.
  • in a possible implementation, the file to be read has a desensitization mark, and the desensitization mark is used to indicate whether the file to be read has completed desensitization processing.
  • the method may further include: before performing desensitization processing on the data blocks of the file to be read that have not yet been desensitized, determining, according to the desensitization mark, whether the file to be read is a file whose desensitization has not yet been completed.
  • the file to be read has a desensitization mark, and the desensitization mark can be used to indicate whether the file to be read has completed the desensitization process.
  • the desensitization flag can be set to 1 or 0.
  • a desensitization flag of 1 indicates that the desensitization process of the file to be read has been completed, that is, all data blocks of the file to be read have been desensitized; a desensitization flag of 0 indicates that the file to be read has not yet completed desensitization, that is, at least one data block in the file to be read has not been desensitized.
  • alternatively, a desensitization flag of 1 may indicate that the desensitization process has not been completed for the file to be read, and a desensitization flag of 0 may indicate that the file to be read has been desensitized.
  • the desensitization mark can also be set to other values. Those skilled in the art can set the value of the desensitization marker and the desensitization state corresponding to each value according to the actual situation, which is not limited in this application.
  • for example, if the desensitization flag can be set to 1 (indicating that the desensitization process has been completed) or 0 (indicating that the desensitization process has not been completed), it can be determined whether the desensitization flag of the file to be read is 0.
  • if the desensitization flag is 0, the file to be read has not yet completed desensitization, and step S13 can be executed to desensitize the data blocks of the file to be read that have not been desensitized;
  • if the desensitization flag is 1, no desensitization processing is required, and the file to be read can be directly sent to the host.
  • the file to be read has a desensitization mark, and the desensitization state of the file to be read can be determined according to the desensitization mark, so that the management and identification of the desensitization state of the file to be read can be realized.
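  • The flag check described above can be sketched at file granularity as follows. The `FileEntry` class, the `read_file` function and the 1/0 convention chosen here are assumptions for the example; as the text notes, the opposite convention is equally possible.

```python
from dataclasses import dataclass

DONE, PENDING = 1, 0  # one possible convention for the desensitization flag


@dataclass
class FileEntry:
    name: str
    content: str
    flag: int = PENDING  # desensitization mark of the whole file


def read_file(entry: FileEntry, desensitize) -> str:
    """Return desensitized content, running `desensitize` only when needed."""
    if entry.flag == PENDING:
        entry.content = desensitize(entry.content)  # corresponds to step S13
        entry.flag = DONE                           # update the mark afterwards
    return entry.content                            # flag == DONE: send directly
```

  • A second read of the same entry returns the saved result without invoking the desensitization routine again.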
  • the method may further include: acquiring a bitmap of the to-be-read file, where the bitmap is used to indicate whether each data block included in the to-be-read file has completed desensitization processing; and identifying, according to the bitmap, the data blocks in the to-be-read file that have not been desensitized yet.
  • a bitmap can be used to indicate whether each data block included in the to-be-read file has completed desensitization processing.
  • the bitmap may include several small squares, each small square corresponds to the address of a data block in the file to be read, and each small square records a flag bit with a value of 0 or 1. It can be used to indicate whether the data block stored at the address corresponding to the small square where the flag bit is recorded has completed the desensitization process.
  • the number of small squares in the bitmap of the file to be read is consistent with the number of data blocks contained in the file to be read.
  • the value of the flag bit in the bitmap can be 1 or 0. A value of 1 can indicate that the corresponding data block has completed desensitization processing and 0 that it has not; alternatively, 1 can indicate that the corresponding data block has not been desensitized yet and 0 that it has been desensitized.
  • a bitmap of the file to be read can be obtained, and according to the bitmap, the data blocks that have not been desensitized in the file to be read can be identified. For example, if the value of a flag bit in the bitmap is 1 (indicating that the corresponding data block has completed desensitization processing) or 0 (indicating that it has not), it can be checked for each flag bit in the bitmap whether its value is 0, and the data blocks corresponding to the flag bits with value 0 are determined to be the data blocks that have not been desensitized in the file to be read.
  • a bitmap can be set for the file to be read, and by identifying the bitmap of the file to be read, the data blocks that have not been desensitized in the file to be read can be determined, which is simple and fast, and can improve processing efficiency.
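  • One way to sketch the bitmap lookup is to store the flag bits in a Python integer, with bit i describing data block i. The bit order, the 1 = desensitized convention and the function name `pending_blocks` are assumptions for the example.

```python
def pending_blocks(bitmap: int, block_count: int) -> list:
    """Return indices of data blocks whose flag bit is 0 (not yet desensitized)."""
    return [i for i in range(block_count) if not (bitmap >> i) & 1]


# Five data blocks; blocks 0, 1 and 2 were already desensitized offline.
bitmap = 0b00111
todo = pending_blocks(bitmap, block_count=5)
```

  • Here `todo` holds the indices 3 and 4, so only those two blocks need to be processed on the next read.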
  • the method may further include: modifying the flag bit in the bitmap corresponding to the data block that has completed the desensitization process, to indicate that the corresponding data block has completed the desensitization process.
  • after a data block has been desensitized, the flag bit corresponding to that data block in the bitmap can be modified to indicate that the data block has completed desensitization processing.
  • the bitmap can be updated in time according to the change of the desensitization state of the data blocks in the file to be read, thereby improving the accuracy of the bitmap of the file to be read.
  • the method may further include: modifying the desensitization mark of the to-be-read file that has been desensitized to indicate that the desensitization process of the to-be-read file has been completed.
  • when the desensitization process of the file to be read has been completed, that is, when the bitmap of the file to be read indicates that desensitization has been completed for every data block included in it, the desensitization mark of the file to be read can be modified to indicate that desensitization has been completed. In this way, the desensitization mark of the file to be read can be updated in time according to the change of the desensitization state of the file to be read, thereby improving the accuracy of the desensitization mark.
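  • Keeping the bitmap and the file-level mark consistent could look like the sketch below, again assuming an integer bitmap in which bit i set to 1 means that data block i has been desensitized. The helper names are hypothetical.

```python
def mark_done(bitmap: int, index: int) -> int:
    """Set the flag bit of one data block after it has been desensitized."""
    return bitmap | (1 << index)


def all_done(bitmap: int, block_count: int) -> bool:
    """True when every one of the `block_count` flag bits is set."""
    return bitmap == (1 << block_count) - 1


bitmap, file_flag = 0b00111, 0             # 3 of 5 blocks done, file not finished
bitmap = mark_done(bitmap, 3)
after_fourth = all_done(bitmap, 5)         # block 4 is still pending
bitmap = mark_done(bitmap, 4)
file_flag = 1 if all_done(bitmap, 5) else 0  # update the file-level mark
```

  • Deriving the file-level mark from the bitmap means the two records cannot drift apart.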
  • a desensitization metafile may also be set for the file to be read, which is used to identify the desensitization state of the file to be read and the desensitization state of each data block included in the file to be read.
  • the desensitization metafile may be represented as a bitmap, array, matrix, tensor, etc. The present application does not limit the specific representation of the desensitization metafile.
  • the desensitization metafile of the file to be read may include desensitization marks and bitmaps.
  • the desensitization marks and bitmaps are similar to the above, and will not be repeated here.
  • incremental desensitization may be performed by combining online desensitization with offline desensitization, or by combining multiple rounds of online desensitization.
  • incremental desensitization that combines online and offline desensitization may include performing offline desensitization on the backup data first and then online desensitization, performing online desensitization on the backup data first and then offline desensitization, or performing multiple rounds of online desensitization and multiple rounds of offline desensitization on the backup data, that is, switching the desensitization processing of the backup data between offline desensitization and online desensitization multiple times.
  • the offline desensitization is performed on the backup data first, and then the online desensitization is performed.
  • the specific process is similar to the above-mentioned processing process, and will not be repeated here.
  • when the backup data is desensitized online first and then offline, the backup data may not be desensitized before product development or testing begins. After product development or testing begins, when the host reads a file or a data block in a file, the file or data block to be read is desensitized online and then sent to the host; at the same time, the desensitized file or data block is saved and its desensitization status is updated. Then, while product development or testing is stopped, offline desensitization is performed on the files or data blocks in the backup data that have not been desensitized.
  • when product development or testing resumes, any file or data block read by the host that has not yet been desensitized is sent to the host after online desensitization, and the desensitized file or data block is saved and its desensitization status is updated, so that desensitization proceeds together with product development or testing.
  • in this way, the desensitization process of the backup data is switched between offline desensitization and online desensitization multiple times until all files in the backup data have been desensitized.
  • incremental desensitization that combines multiple rounds of online desensitization may work as follows: in the process of product development or testing, the host reads files or data blocks from the backup data multiple times; each time, the file or data block read by the host is desensitized online and sent to the host, the desensitized file or data block is saved, and its desensitization status is updated, so that the next time the host reads a file or data block, desensitization can be performed according to the latest desensitization status.
  • the document desensitization method combining online desensitization and offline desensitization and the document desensitization method combining multiple online desensitization will be exemplarily described below with reference to specific examples.
  • FIG. 5 shows a schematic diagram of an application of a method for desensitizing a document according to an embodiment of the present application.
  • the backup data before desensitization includes five files, namely F1, F2, F3, F4 and F5, and the backup data is used for product development or testing.
  • before product development or testing starts, offline desensitization of the backup data can be performed, and the files or data blocks that have been desensitized can be saved.
  • when the host reads F2, the storage device reads the saved F2, of which one data block has already been desensitized, and performs online desensitization on the two data blocks in F2 that have not yet been desensitized; after the desensitization processing is completed, the storage device can send the desensitized F2 (i.e., F2') to the host, and at the same time update the desensitization mark and bitmap of F2 and save F2'.
  • when the host reads F4, the storage device reads the saved F4, of which three data blocks have already been desensitized, and desensitizes the one data block in F4 that has not yet been desensitized; after the desensitization processing is completed, the desensitized F4 (i.e., F4') can be sent to the host, and at the same time the desensitization mark and bitmap of F4 are updated and F4' is saved.
  • the storage device reads F5 from the backup data, desensitizes F5 online, and then sends the desensitized F5 (i.e., F5') to the host, and at the same time updates the desensitization mark and bitmap of F5 and saves F5'.
  • in this way, online desensitization and offline desensitization can be combined: the result of each desensitization process is saved and the desensitization status of the file is updated at the same time, so that online desensitization can build on the results of offline desensitization, and it is not necessary to wait for offline desensitization to complete before starting product development or testing. Likewise, offline desensitization can build on the results of online desensitization. The two thus desensitize the files incrementally and complement each other, gradually forming a clean backup until all files have been desensitized.
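  • The interplay of an interrupted offline pass with later online reads can be sketched as a small in-memory model. The `BackupStore` class, its method names, the digit-blanking rule and the sample file contents are all assumptions introduced for illustration; a real storage device would persist blocks and status on disk.

```python
def desensitize(block: str) -> str:
    """Hypothetical rule used only in this sketch: blank out digits."""
    return "".join("#" if c.isdigit() else c for c in block)


class BackupStore:
    """Toy model of backup data with per-block desensitization status."""

    def __init__(self, files: dict):
        self.blocks = {name: list(blks) for name, blks in files.items()}
        self.status = {name: [False] * len(blks) for name, blks in files.items()}

    def _process(self, name: str, budget=None) -> int:
        """Desensitize pending blocks of one file; return how many were done."""
        done = 0
        for i, finished in enumerate(self.status[name]):
            if finished:
                continue
            if budget is not None and done >= budget:
                break
            self.blocks[name][i] = desensitize(self.blocks[name][i])
            self.status[name][i] = True  # save the result and its status together
            done += 1
        return done

    def offline_pass(self, budget: int) -> int:
        """Background pass over all files, stopped after `budget` blocks."""
        used = 0
        for name in self.blocks:
            used += self._process(name, budget - used)
            if used >= budget:
                break
        return used

    def online_read(self, name: str) -> str:
        """Host read: finish the pending blocks, then return the clean file."""
        self._process(name)
        return "".join(self.blocks[name])


store = BackupStore({"F1": ["a1", "b2"], "F2": ["c3", "d4", "e5"]})
store.offline_pass(budget=3)             # finishes F1 and one block of F2
f2_status_before = list(store.status["F2"])
f2_clean = store.online_read("F2")       # online pass handles the other two
```

  • After the online read, a further offline pass finds nothing left to do for F2, which is the sense in which the two modes maintain a single desensitization result.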
  • FIG. 6 shows a schematic diagram of an application of a method for desensitizing a document according to an embodiment of the present application.
  • the backup data before desensitization processing includes five files, namely F1, F2, F3, F4 and F5, and the backup data is used for product development or testing. Offline desensitization is not performed prior to product development or testing.
  • when the host reads files for the first time, it reads F1; the storage device reads F1 from the backup data, desensitizes F1 online, and then sends the desensitized F1 (that is, F1') to the host, and at the same time updates the desensitization mark and bitmap of F1 and saves F1';
  • when the host reads files for the second time, it reads F2 and F3; the storage device reads F2 and F3 from the backup data, desensitizes F2 and F3 online respectively, and then sends the desensitized F2 (that is, F2') and the desensitized F3 (that is, F3') to the host, and at the same time updates the desensitization marks and bitmaps of F2 and F3 and saves F2' and F3';
  • when the host reads files for the third time, it reads F4 and F5; the storage device reads F4 and F5 from the backup data, desensitizes F4 and F5 online respectively, and then sends the desensitized F4 (that is, F4') and the desensitized F5 (that is, F5') to the host, and at the same time updates the desensitization marks and bitmaps of F4 and F5 and saves F4' and F5'.
  • the files that have been desensitized each time can also be saved, so that multiple online desensitization can also be performed incrementally.
  • FIG. 7 shows a schematic diagram of a processing procedure of a document desensitization method according to an embodiment of the present application.
  • in the process of product development or testing, the host can read files from the storage device through the IO path. The storage device can receive a file read command sent by the host in step S701; in step S702, in response to the file read command, obtain the file to be read, where at least a part of the data blocks in the file to be read has been desensitized; in step S703, obtain the desensitization mark of the file to be read; and in step S704, determine whether the desensitization mark is 1, where the desensitization mark takes the value 1 or 0.
  • if the desensitization flag is 1, the desensitization process has been completed for the file to be read, and the storage device does not need to perform the desensitization process again, and in step S709, the file to be read that has been desensitized can be directly sent to the host;
  • if the desensitization flag is 0, the file to be read has not yet completed desensitization, and the storage device can execute step S705 to obtain the bitmap of the file to be read, in step S706 identify, according to the bitmap, the data blocks that have not been desensitized in the file to be read, and in step S707 desensitize the data blocks that have not been desensitized in the file to be read;
  • after the desensitization is complete, the storage device can execute step S708 to send the file to be read that has been desensitized to the host for product development or testing.
  • step S709 may also be performed to update the desensitization mark and bitmap of the file to be read that has completed the desensitization process, and save them in the hard disk of the storage device.
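  • The read-handling flow of FIG. 7 can be summarised in a short sketch. The `StoredFile` class, the `handle_read` function, the digit-masking rule and the sample data are assumptions for illustration; the comments describe the stages in words rather than mapping them to individual step numbers.

```python
from dataclasses import dataclass, field


@dataclass
class StoredFile:
    blocks: list                                # data blocks of the file
    bitmap: list = field(default_factory=list)  # 1 = block already desensitized
    flag: int = 0                               # 1 = whole file desensitized

    def __post_init__(self):
        if not self.bitmap:
            self.bitmap = [0] * len(self.blocks)


def desensitize(block: str) -> str:
    """Hypothetical rule used only in this sketch: mask digits."""
    return "".join("*" if c.isdigit() else c for c in block)


def handle_read(store: dict, name: str) -> str:
    f = store[name]                  # acquire the file to be read
    if f.flag != 1:                  # check the desensitization mark
        # Use the bitmap to find the blocks that are still pending.
        pending = [i for i, bit in enumerate(f.bitmap) if bit == 0]
        for i in pending:
            f.blocks[i] = desensitize(f.blocks[i])
            f.bitmap[i] = 1          # update the bitmap block by block
        f.flag = 1                   # update the mark; results stay in the store
    return "".join(f.blocks)         # send the desensitized file to the host


store = {"F2": StoredFile(blocks=["id ***;", "tel 138;"], bitmap=[1, 0])}
reply = handle_read(store, "F2")
```

  • The first block is returned untouched because its flag bit was already 1, so only the second block is processed.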
  • online desensitization can identify the desensitization state of the file to be read and the desensitization state of each data block contained in the file to be read in real time through the desensitization mark and bitmap of the file to be read, whereby, unnecessary or repeated desensitization processing can be reduced, and the impact on the user's production environment (such as product development or test environment) can be reduced.
  • when the host and the storage device support independent access to the data blocks in the file to be read, whether a data block to be accessed has been desensitized can be determined according to the flag bit corresponding to that data block in the bitmap of the file to be read. If the flag bit indicates that the data block has been desensitized (for example, the flag bit is 1), the storage device can directly send the data block to the host without performing desensitization again; if the flag bit indicates that the data block has not yet been desensitized (for example, the flag bit is 0), the storage device can desensitize the data block and send the desensitized data block to the host. At the same time, the storage device can also save the desensitized data block on the hard disk and update the corresponding flag bit in the bitmap.
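  • Block-level access as described above might be sketched as follows, assuming the bitmap is a list with 1 for desensitized blocks. The function name `read_block` and the example rule are hypothetical.

```python
def read_block(blocks: list, bitmap: list, index: int, desensitize) -> str:
    """Serve one data block, desensitizing it first only if its flag bit is 0."""
    if bitmap[index] == 0:
        blocks[index] = desensitize(blocks[index])  # save the desensitized block
        bitmap[index] = 1                           # update its flag bit
    return blocks[index]


blocks = ["alice 1234", "bob 5678"]
bitmap = [0, 0]


def rule(text: str) -> str:
    """Illustrative rule only: replace every digit with '#'."""
    return "".join("#" if c.isdigit() else c for c in text)


second = read_block(blocks, bitmap, 1, rule)
```

  • Only the requested block changes state; the other block stays pending until it is read or an offline pass reaches it.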
  • the storage device when the storage device saves the desensitized file or data block to the hard disk, it can also save the attribute information of the desensitized file or data block to the hard disk at the same time.
  • the attribute information of the file or data block may include the name, size (eg size, length, etc.), location, etc. of the file or data block. This application does not limit the specific attribute information of the file or data block.
  • online desensitization and offline desensitization can be combined, and the files or data blocks that have been desensitized can be saved. This not only achieves incremental desensitization of the backup data used in product development or testing, improves the processing efficiency of file desensitization, and reduces the impact of online desensitization on the user's production environment (such as the development or test environment), but also allows desensitization to switch between online and offline desensitization at any time, so that product development or testing activities can be carried out at any time.
  • FIG. 8 shows a block diagram of a file desensitization apparatus according to an embodiment of the present application.
  • the file desensitization apparatus is applied to a storage device and can be implemented by a desensitization engine; the apparatus includes:
  • the command receiving module 81, configured to receive a read file command sent by the host, where the read file command is used to request a file to be read;
  • the file acquisition module 82, configured to acquire the to-be-read file in response to the read file command, where at least a part of the data blocks in the to-be-read file have completed desensitization processing;
  • the desensitization processing module 83 is used to desensitize the data blocks of the to-be-read file that have not been desensitized;
  • the file sending module 84 is configured to send the file to be read for which the desensitization process has been completed to the host.
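The division into modules 81 to 84 can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the class name, the method names, the dictionary-based storage, and the representation of a file as a list of text blocks are all hypothetical and are not taken from the application.

```python
from typing import Callable


class FileDesensitizationApparatus:
    """Toy model of modules 81-84; a file is a list of text blocks (assumption)."""

    def __init__(self, files: dict[str, list[str]], done: dict[str, list[bool]],
                 desensitize: Callable[[str], str]):
        self.files = files            # blocks of each file as stored on disk
        self.done = done              # per-block flag: True once desensitized
        self.desensitize = desensitize

    def receive_command(self, command: dict) -> str:
        """Module 81: extract the requested file name from a read file command."""
        return command["file"]

    def acquire_file(self, name: str) -> list[str]:
        """Module 82: fetch the (possibly partly desensitized) file."""
        return self.files[name]

    def desensitize_pending(self, name: str) -> None:
        """Module 83: process only the blocks that are not yet desensitized."""
        blocks, flags = self.files[name], self.done[name]
        for i, finished in enumerate(flags):
            if not finished:
                blocks[i] = self.desensitize(blocks[i])
                flags[i] = True

    def send_file(self, name: str) -> list[str]:
        """Module 84: hand the fully desensitized file back to the host."""
        return list(self.files[name])

    def handle(self, command: dict) -> list[str]:
        """Run the four modules in order for one read file command."""
        name = self.receive_command(command)
        self.acquire_file(name)
        self.desensitize_pending(name)
        return self.send_file(name)
```

In this sketch `str.upper` can stand in for a real desensitization routine, for example `FileDesensitizationApparatus({"F": ["X", "y"]}, {"F": [True, False]}, str.upper).handle({"file": "F"})`.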
  • the file desensitization apparatus may be located inside the storage device, and may be implemented by hardware, software, or a combination of software and hardware.
  • the file desensitization apparatus may also be located outside the storage device and be implemented as an independent desensitization device, which can be any form of electronic device, such as a server, a desktop computer, a mobile device, or any other type of computing device that includes a processor and memory.
  • the present application does not limit the location or specific implementation of the file desensitization apparatus.
  • the apparatus may further include: a file storage module, configured to save the to-be-read file for which the desensitization process has been completed in the hard disk of the storage device.
  • the apparatus may further include: a bitmap acquisition module, configured to acquire a bitmap of the to-be-read file, where the bitmap is used to indicate whether desensitization processing has been completed for each data block contained in the to-be-read file; and a data block identification module, configured to identify, according to the bitmap, the data blocks in the to-be-read file that have not been desensitized.
  • the file to be read has a desensitization mark, and the desensitization mark is used to indicate whether the file to be read has completed desensitization processing; the apparatus may further include: a file desensitization identification module, configured to determine, according to the desensitization mark and before the desensitization process is performed on the data blocks of the to-be-read file that have not been desensitized, that the to-be-read file is a file that has not yet completed desensitization processing.
  • the apparatus may further include: a desensitization mark modification module, configured to modify the desensitization mark of the to-be-read file that has been desensitized, to indicate that the to-be-read file has completed desensitization processing.
  • the apparatus may further include: a bitmap modification module, configured to modify the flag bits in the bitmap corresponding to the data blocks that have been desensitized, to indicate that the corresponding data blocks have completed desensitization processing.
  • At least a part of the data blocks in the to-be-read file is desensitized through offline desensitization.
  • the to-be-read file includes at least one of a text file, a report file, a picture file, an audio file, and a video file.
  • each of the above-mentioned modules in the file desensitization device can be implemented by the CPU calling program instructions.
  • An embodiment of the present application provides a file desensitization device, including: a processor and a memory for storing instructions executable by the processor; wherein the processor is configured to implement the above method when executing the instructions.
  • Embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, implement the above method.
  • Embodiments of the present application provide a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the above method.
  • a computer-readable storage medium may be a tangible device that can hold and store instructions for use by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), Digital Video Disc (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in grooves on which instructions are stored, and any suitable combination of the foregoing.
  • Computer readable program instructions or code described herein may be downloaded to various computing/processing devices from a computer readable storage medium, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from a network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • the computer program instructions used to perform the operations of the present application may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, via the Internet using an Internet service provider).
  • in some embodiments, electronic circuits, such as programmable logic circuits, Field-Programmable Gate Arrays (FPGA), or Programmable Logic Arrays (PLA), are personalized by utilizing state information of the computer-readable program instructions; the electronic circuits can then execute the computer readable program instructions to implement various aspects of the present application.
  • These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer readable program instructions can also be stored in a computer readable storage medium; the instructions cause a computer, programmable data processing apparatus, and/or other equipment to operate in a specific manner, so that the computer readable medium on which the instructions are stored constitutes an article of manufacture that includes instructions implementing various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • Computer readable program instructions can also be loaded onto a computer, other programmable data processing apparatus, or other equipment to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process , thereby causing instructions executing on a computer, other programmable data processing apparatus, or other device to implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in hardware that performs the corresponding functions or actions (for example, circuits or ASICs (Application-Specific Integrated Circuits)), or can be implemented by a combination of hardware and software, such as firmware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

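A file desensitization method, apparatus, and storage medium. The method is performed by a storage device and includes: receiving a read file command sent by a host, where the read file command is used to request a file to be read (S11); in response to the read file command, obtaining the file to be read, where at least some of the data blocks in the file to be read have already been desensitized (S12); desensitizing the data blocks of the file to be read that have not yet been desensitized (S13); and sending the desensitized file to be read to the host (S14). The method desensitizes the file to be read incrementally, so that earlier desensitization results are fully reused and the efficiency of file desensitization is improved.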

Description

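File desensitization method, apparatus, and storage medium
This application claims priority to Chinese Patent Application No. 202011166187.4, filed with the Chinese Patent Office on October 27, 2020 and entitled "File desensitization method, apparatus, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of computer technologies, and in particular to a file desensitization method, apparatus, and storage medium.
Background
During product development or testing, files usually need to be obtained from a storage device for processing. Because a file may contain sensitive information involving user privacy, a file containing sensitive information needs to be desensitized before it is used for product development or testing, in order to protect user privacy. However, current desensitization processing is time-consuming and inefficient, and inefficient desensitization may also reduce input/output (IO) efficiency during development or testing.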
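Summary
In view of this, this application proposes a file desensitization method, apparatus, and storage medium.
According to a first aspect, an embodiment of this application provides a file desensitization method. The method is performed by a storage device and includes: receiving a read file command sent by a host, where the read file command is used to request a file to be read; in response to the read file command, obtaining the file to be read, where at least some of the data blocks in the file to be read have already been desensitized; desensitizing the data blocks of the file to be read that have not yet been desensitized; and sending the desensitized file to be read to the host.
The file desensitization method of this embodiment is performed by a storage device. The storage device may receive a read file command sent by a host and, in response to the read file command, obtain the file to be read, where at least some of the data blocks in the file have already been desensitized. It then desensitizes the data blocks that have not yet been desensitized and sends the desensitized file to the host. The file to be read is thus desensitized incrementally, which not only makes full use of earlier desensitization results and improves the efficiency of file desensitization, but also combines online and offline desensitization so that product development or testing can proceed at any time.
According to the first aspect, in a first possible implementation of the file desensitization method, the method further includes: saving the desensitized file to be read in a hard disk of the storage device.
In this embodiment, the storage device may save each desensitization result (that is, the desensitized file to be read) in the hard disk. As incremental desensitization continues, a sanitized backup gradually forms, which includes the desensitized files and the non-sensitive files. Subsequent development or testing can be based on this sanitized backup, which not only reduces unnecessary repeated desensitization but also reduces the impact of online desensitization on IO efficiency during development or testing and improves file read efficiency.
According to the first aspect, in a second possible implementation of the file desensitization method, the method further includes: obtaining a bitmap of the file to be read, where the bitmap indicates whether each data block contained in the file to be read has been desensitized; and identifying, according to the bitmap, the data blocks in the file to be read that have not yet been desensitized.
In this embodiment, a bitmap may be set for the file to be read, and the data blocks that have not yet been desensitized can be determined by examining the bitmap. This is simple and fast, and therefore improves processing efficiency.
According to the first aspect, in a third possible implementation of the file desensitization method, the file to be read has a desensitization mark, and the desensitization mark indicates whether the file to be read has been fully desensitized;
the method further includes: before desensitizing the data blocks of the file to be read that have not yet been desensitized, determining, according to the desensitization mark, that the file to be read is a file whose desensitization has not been completed.
In this embodiment, the file to be read has a desensitization mark, and the desensitization state of the file can be determined from the mark, so that the desensitization state of the file to be read can be managed and identified.
According to the second possible implementation of the first aspect, in a fourth possible implementation of the file desensitization method, the method further includes: modifying the flag bits in the bitmap that correspond to the data blocks that have been desensitized, to indicate that the corresponding data blocks have been desensitized.
In this embodiment, for a data block in the file to be read that has been desensitized, the corresponding flag bit in the bitmap may be modified to indicate that the data block has been desensitized. In this way, the bitmap is updated promptly as the desensitization state of the data blocks changes, which improves the accuracy of the bitmap of the file to be read.
According to the third possible implementation of the first aspect, in a fifth possible implementation of the file desensitization method, the method further includes: modifying the desensitization mark of the desensitized file to be read, to indicate that the file to be read has been desensitized.
In this embodiment, when the file to be read has been fully desensitized, that is, when the bitmap of the file to be read indicates that every data block it contains has been desensitized, the desensitization mark of the file may be modified to indicate that desensitization is complete. In this way, the desensitization mark is updated promptly as the desensitization state of the file changes, which improves the accuracy of the desensitization mark of the file to be read.
According to the first aspect, in a sixth possible implementation of the file desensitization method, at least some of the data blocks in the file to be read are desensitized through offline desensitization.
In this way, online desensitization can be performed on top of offline desensitization, so that offline and online desensitization jointly maintain a single desensitization result.
According to the first aspect or one or more of the possible implementations of the first aspect, in a seventh possible implementation of the file desensitization method, the file to be read includes at least one of a text file, a report file, a picture file, an audio file, and a video file.
In this embodiment, the file to be read may include at least one of a text file, a report file, a picture file, an audio file, and a video file, and may also include other unstructured data such as office documents, XML (Extensible Markup Language) files, and HTML (HyperText Markup Language) files. Various kinds of unstructured data can therefore be desensitized in the foregoing manner.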
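According to a second aspect, an embodiment of this application provides a file desensitization apparatus. The apparatus is applied to a storage device and includes: a command receiving module, configured to receive a read file command sent by a host, where the read file command is used to request a file to be read; a file acquisition module, configured to obtain the file to be read in response to the read file command, where at least some of the data blocks in the file to be read have already been desensitized; a desensitization processing module, configured to desensitize the data blocks of the file to be read that have not yet been desensitized; and a file sending module, configured to send the desensitized file to be read to the host.
The file desensitization apparatus of this embodiment is applied to a storage device. It may receive a read file command sent by a host and, in response to the read file command, obtain the file to be read, where at least some of the data blocks in the file have already been desensitized. It then desensitizes the data blocks that have not yet been desensitized and sends the desensitized file to the host. The file to be read is thus desensitized incrementally, which not only makes full use of earlier desensitization results and improves the efficiency of file desensitization, but also combines online and offline desensitization so that product development or testing can proceed at any time.
According to the second aspect, in a first possible implementation of the file desensitization apparatus, the apparatus further includes: a file storage module, configured to save the desensitized file to be read in a hard disk of the storage device.
In this embodiment, the storage device may save each desensitization result (that is, the desensitized file to be read) in the hard disk. As incremental desensitization continues, a sanitized backup gradually forms, which includes the desensitized files and the non-sensitive files. Subsequent development or testing can be based on this sanitized backup, which not only reduces unnecessary repeated desensitization but also reduces the impact of online desensitization on IO efficiency during development or testing and improves file read efficiency.
According to the second aspect, in a second possible implementation of the file desensitization apparatus, the apparatus further includes: a bitmap acquisition module, configured to obtain a bitmap of the file to be read, where the bitmap indicates whether each data block contained in the file to be read has been desensitized; and a data block identification module, configured to identify, according to the bitmap, the data blocks in the file to be read that have not yet been desensitized.
In this embodiment, a bitmap may be set for the file to be read, and the data blocks that have not yet been desensitized can be determined by examining the bitmap. This is simple and fast, and therefore improves processing efficiency.
According to the second aspect, in a third possible implementation of the file desensitization apparatus, the file to be read has a desensitization mark, and the desensitization mark indicates whether the file to be read has been fully desensitized;
the apparatus further includes: a file desensitization identification module, configured to determine, according to the desensitization mark and before the data blocks of the file to be read that have not yet been desensitized are desensitized, that the file to be read is a file whose desensitization has not been completed.
In this embodiment, the file to be read has a desensitization mark, and the desensitization state of the file can be determined from the mark, so that the desensitization state of the file to be read can be managed and identified.
According to the second possible implementation of the second aspect, in a fourth possible implementation of the file desensitization apparatus, the apparatus further includes: a bitmap modification module, configured to modify the flag bits in the bitmap that correspond to the data blocks that have been desensitized, to indicate that the corresponding data blocks have been desensitized.
In this embodiment, for a data block in the file to be read that has been desensitized, the corresponding flag bit in the bitmap may be modified to indicate that the data block has been desensitized. In this way, the bitmap is updated promptly as the desensitization state of the data blocks changes, which improves the accuracy of the bitmap of the file to be read.
According to the third possible implementation of the second aspect, in a fifth possible implementation of the file desensitization apparatus, the apparatus further includes: a desensitization mark modification module, configured to modify the desensitization mark of the desensitized file to be read, to indicate that the file to be read has been desensitized.
In this embodiment, when the file to be read has been fully desensitized, that is, when the bitmap of the file to be read indicates that every data block it contains has been desensitized, the desensitization mark of the file may be modified to indicate that desensitization is complete. In this way, the desensitization mark is updated promptly as the desensitization state of the file changes, which improves the accuracy of the desensitization mark of the file to be read.
According to the second aspect, in a sixth possible implementation of the file desensitization apparatus, at least some of the data blocks in the file to be read are desensitized through offline desensitization.
In this way, online desensitization can be performed on top of offline desensitization, so that offline and online desensitization jointly maintain a single desensitization result.
According to the second aspect or one or more of the possible implementations of the second aspect, in a seventh possible implementation of the file desensitization apparatus, the file to be read includes at least one of a text file, a report file, a picture file, an audio file, and a video file.
In this embodiment, the file to be read may include at least one of a text file, a report file, a picture file, an audio file, and a video file, and may also include other unstructured data such as office documents, XML (Extensible Markup Language) files, and HTML (HyperText Markup Language) files. Various kinds of unstructured data can therefore be desensitized in the foregoing manner.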
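According to a third aspect, an embodiment of this application provides a file desensitization apparatus, including a processor and a memory for storing instructions executable by the processor, where the processor is configured to implement, when executing the instructions, the file desensitization method of the first aspect or of one or more of the possible implementations of the first aspect.
The file desensitization apparatus of this embodiment is applied to a storage device. It may receive a read file command sent by a host and, in response to the read file command, obtain the file to be read, where at least some of the data blocks in the file have already been desensitized. It then desensitizes the data blocks that have not yet been desensitized and sends the desensitized file to the host. The file to be read is thus desensitized incrementally, which not only makes full use of earlier desensitization results and improves the efficiency of file desensitization, but also combines online and offline desensitization so that product development or testing can proceed at any time.
According to a fourth aspect, an embodiment of this application provides a non-volatile computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the file desensitization method of the first aspect or of one or more of the possible implementations of the first aspect.
According to the embodiments of this application, a read file command sent by a host is received and, in response to the read file command, the file to be read is obtained, where at least some of the data blocks in the file have already been desensitized. The data blocks that have not yet been desensitized are then desensitized, and the desensitized file is sent to the host. The file to be read is thus desensitized incrementally, which not only makes full use of earlier desensitization results and improves the efficiency of file desensitization, but also combines online and offline desensitization so that product development or testing can proceed at any time.
According to a fifth aspect, an embodiment of this application provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code. When the computer-readable code runs in an electronic device, a processor in the electronic device performs the file desensitization method of the first aspect or of one or more of the possible implementations of the first aspect.
According to the embodiments of this application, a read file command sent by a host is received and, in response to the read file command, the file to be read is obtained, where at least some of the data blocks in the file have already been desensitized. The data blocks that have not yet been desensitized are then desensitized, and the desensitized file is sent to the host. The file to be read is thus desensitized incrementally, which not only makes full use of earlier desensitization results and improves the efficiency of file desensitization, but also combines online and offline desensitization so that product development or testing can proceed at any time.
These and other aspects of this application will be clearer and easier to understand from the following description of the embodiment(s).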
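Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of this application together with the specification, and serve to explain the principles of this application.
FIG. 1 is a schematic diagram of an application scenario of a file desensitization method according to an embodiment of this application.
FIG. 2 is a schematic diagram of an application scenario of a file desensitization method according to an embodiment of this application.
FIG. 3 is a schematic diagram of an application scenario of a file desensitization method according to an embodiment of this application.
FIG. 4 is a flowchart of a file desensitization method according to an embodiment of this application.
FIG. 5 is a schematic diagram of an application of a file desensitization method according to an embodiment of this application.
FIG. 6 is a schematic diagram of an application of a file desensitization method according to an embodiment of this application.
FIG. 7 is a schematic diagram of the processing procedure of a file desensitization method according to an embodiment of this application.
FIG. 8 is a block diagram of a file desensitization apparatus according to an embodiment of this application.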
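Detailed Description
Various exemplary embodiments, features, and aspects of this application are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.
The word "exemplary" as used herein means "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred over or better than other embodiments.
In addition, numerous specific details are given in the following detailed description to better explain this application. A person skilled in the art should understand that this application can be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to a person skilled in the art are not described in detail, so as to highlight the gist of this application.
In the related art, desensitization refers to identifying sensitive information in an object to be desensitized (for example, data or a file) and modifying that information by masking, obfuscation, or similar means, so as to hide the sensitive information and protect it reliably.
Desensitization may include online desensitization and offline desensitization. During product development or testing, online desensitization means that when a host reads or accesses a file, a desensitization engine (which performs the desensitization processing) identifies the sensitive information in that file in real time, desensitizes it, and sends the desensitized file to the host for use. In other words, with online desensitization the engine must run once for every file read or access by the host, which is inefficient and may involve unnecessary repeated desensitization. For example, when the host reads the same file several times, the engine desensitizes that file several times. In addition, with online desensitization the engine has to intercept the file on the IO path and can send it to the host only after desensitization completes, which significantly reduces IO efficiency during development or testing.
Offline desensitization means that before product development or testing, the desensitization engine desensitizes all files to be used in development or testing, by enumeration, traversal, or similar means, and saves the desensitized files. After offline desensitization finishes, development or testing is based on the saved desensitized files, and no online desensitization is needed during development or testing.
For example, for backup data that is used in product development or testing and contains multiple files, such as a backup LUN (Logical Unit Number; the backup data is stored on the device corresponding to that logical unit number), offline desensitization may create a desensitization image on the basis of the backup LUN and write the desensitized files into the image. After offline desensitization completes, the desensitization image is a sanitized image that contains no sensitive information, and development or testing is based on the sanitized image.
However, the limitation of offline desensitization is that development or testing can start only after offline desensitization completes, so the waiting time is long, and if offline desensitization is not performed, development or testing cannot be carried out at all.
To solve the foregoing technical problems, this application provides a file desensitization method. The file desensitization method of the embodiments of this application can be applied to a storage device. The storage device may receive a read file command sent by a host and, in response to the read file command, obtain the file to be read, where at least some of the data blocks in the file have already been desensitized. It then desensitizes the data blocks that have not yet been desensitized and sends the desensitized file to the host. The file to be read is thus desensitized incrementally, which not only makes full use of earlier desensitization results and improves the efficiency of file desensitization, but also combines online and offline desensitization so that product development or testing can proceed at any time.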
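FIG. 1 is a schematic diagram of an application scenario of a file desensitization method according to an embodiment of this application. The application scenario shown in FIG. 1 provides a desensitization system that includes a production device 10, a storage device 20, and a host 30. The file desensitization method of this embodiment is applied to the storage device 20 and may be performed by a desensitization engine 21 in the storage device 20.
The production device 10 may be any form of electronic device, such as a server, a desktop computer, a mainframe computer, or any other type of computing device that includes a processor and memory. The production device 10 stores production data in a storage apparatus 40. The production data consists of original files from users, which may contain information involving user privacy. To prevent leakage of user privacy, the production data needs to be desensitized. The storage apparatus 40 may be a memory inside the production device 10, or the memory of a device external to the production device 10; the external device can communicate with the production device 10 and is used to store production data from the production device 10. The external device may be any form of electronic device, such as a server, a desktop computer, a mainframe computer, a storage array, or any other type of computing device that includes a processor and memory.
The host 30 mainly refers to a development/test server. Developers and testers use the host 30 to obtain files from the storage device 20 for product development or testing. In this embodiment, the files that the storage device 20 provides to the host 30 are files that have been desensitized, so as to protect user privacy. In terms of product form, the host 30 may be any form of electronic device, such as a server, a desktop computer, a mobile device, or any other type of computing device that includes a processor and memory.
The storage device 20 may be a Network Attached Storage (NAS) device. A NAS device is a dedicated high-performance file storage device that provides file data to users over a network using file-sharing protocols. The protocols used between it and the host 30 include TCP/IP for data transmission and CIFS and NFS for network file services. In this case, an NFS/CIFS client is configured in the host 30, and the host accesses a file through the NFS/CIFS client by specifying the file name, location, or other attributes in a read command. Correspondingly, an NFS/CIFS server is configured in the storage device 20 and parses the read command. Because the file system records where the file is located on the hard disk, the storage device 20 can convert the file name and location in the received read command into the address of the file in order to obtain the file.
The storage device 20 may also be a Storage Area Network (SAN) device, which communicates with the host 30 over a Fibre Channel network. The SAN device in this embodiment has a file system and supports file access. Besides NAS and SAN devices, the storage device 20 may be any other device with a storage function, and the desensitization system may contain one or more storage devices 20; this embodiment does not limit the number of storage devices 20.
Because the production data stored in the production device 10 is original data, desensitizing the original data directly would destroy it and make it difficult to recover. Therefore, in the application scenario shown in FIG. 1, the production data is sent to the storage device 20 and saved as a copy, and the storage device 20 desensitizes the copy rather than the original data itself.
FIG. 2 is a schematic diagram of an application scenario of a file desensitization method according to an embodiment of this application. The application scenario shown in FIG. 2 is similar to that of FIG. 1, except that the production device 10 may store the production data directly in the storage device 20. To keep the original production data intact, the storage device 20 creates a copy of the production data and desensitizes the copy. In other words, in the scenario shown in FIG. 2, both the production data and the copy of the production data are located in the storage device 20. A copy can be created in various ways, such as replication, snapshot, or cloning, which are not described one by one here.
FIG. 3 is a schematic diagram of an application scenario of a file desensitization method according to an embodiment of this application. The application scenario shown in FIG. 3 provides a desensitization system that includes a production device 10, a storage device 20, a host 30, and a desensitization device 50. Unlike the scenario shown in FIG. 1, the scenario shown in FIG. 3 provides an independent desensitization device 50 for desensitizing the files read by the host 30. The desensitization device 50 includes the desensitization engine 21.
In terms of product form, the desensitization device 50 may be any form of electronic device, such as a server, a desktop computer, a mobile device, or any other type of computing device that includes a processor and memory. The production device 10, the storage device 20 (which here does not include a desensitization engine), and the host 30 in the scenario shown in FIG. 3 are all similar to those in FIG. 1 and are not described again here.
In other possible application scenarios, the desensitization system may include at least two desensitization engines. For example, assume that the desensitization system includes two desensitization engines: a first desensitization engine located in the desensitization device and a second desensitization engine located in the storage device. The two engines may perform different desensitization processing; for example, the first engine performs online desensitization and the second performs offline desensitization, or the first performs offline desensitization and the second performs online desensitization. The two engines may also perform the same desensitization processing; for example, both may perform online and offline desensitization. It should be noted that a person skilled in the art may set the number of desensitization engines in the system and the processing performed by each engine according to the actual situation; this embodiment imposes no limitation in this respect.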
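FIG. 4 is a flowchart of a file desensitization method according to an embodiment of this application. As shown in FIG. 4, the file desensitization method is performed by the storage device 20 and includes steps S11 to S14.
In step S11, a read file command sent by a host is received, where the read file command is used to request a file to be read.
In this embodiment, when a developer or tester obtains a file from the storage device through the host, the host may send a read file command to the storage device over the IO path, and the storage device receives the read file command, which is used to request a file to be read.
In a possible implementation, the file to be read may include at least one of a text file, a report file, a picture file, an audio file, and a video file. The file to be read may also include other unstructured data such as office documents, XML (Extensible Markup Language) files, and HTML (HyperText Markup Language) files. This embodiment does not limit the specific type of the file to be read.
In step S12, in response to the read file command, the file to be read is obtained, where at least some of the data blocks in the file to be read have already been desensitized.
In this embodiment, after receiving the read file command sent by the host, the storage device may, in response to the command, determine the storage address of the file to be read from information in the command such as the file name and location, and obtain the file to be read from that storage address.
At least some of the data blocks in the file to be read have already been desensitized. In other words, the file to be read can be divided into multiple data blocks, some of which have already been desensitized.
In a possible implementation, at least some of the data blocks in the file to be read may have been desensitized through offline desensitization.
For example, when no product development or testing is in progress, the file to be read may be desensitized through offline desensitization. Specifically, the file to be read may be divided into multiple data blocks of a certain size (for example, 4 KB), say 5 data blocks, and desensitized offline. Suppose product development or testing needs to start when 3 of the data blocks have been desensitized. In that case, offline desensitization may be stopped, the 3 desensitized data blocks saved in the hard disk of the storage device, and the desensitization state of each data block recorded by means of a bitmap, an array, a matrix, or the like. Desensitization is then switched to online desensitization; that is, during development or testing, online desensitization handles the 2 data blocks of the file to be read that have not yet been desensitized.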
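As a rough illustration of the example above, the following Python sketch splits a file into fixed-size blocks and records which blocks have been desensitized. The 4 KB block size and the 5-block, 3-done scenario come from the example; the names `BLOCK_SIZE` and `split_blocks` and the list-based state record are assumptions made only for illustration.

```python
BLOCK_SIZE = 4 * 1024  # 4 KB, as in the example; the real size is a design choice


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split file content into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


# A 20 KB file yields 5 blocks.
blocks = split_blocks(bytes(20 * 1024))

# Offline desensitization finished blocks 0-2 before it was stopped.
state = [1, 1, 1, 0, 0]  # 1 = desensitized, 0 = not yet desensitized

# Online desensitization later only needs to handle the remaining blocks.
pending = [i for i, done in enumerate(state) if not done]
```

Here `pending` holds the indices of the two blocks that online desensitization would still have to process.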
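In this way, online desensitization can be performed on top of offline desensitization, so that offline and online desensitization jointly maintain a single desensitization result.
In a possible implementation, at least some of the data blocks in the file to be read may have been desensitized through online desensitization.
For example, assume that the host and the storage device support both access to the file to be read as a whole and independent access to each of its data blocks, that the file to be read contains 5 data blocks, and that it has not undergone offline desensitization. When the host reads for the first time, it reads the first data block of the file. The storage device may desensitize that first data block online, send the desensitized block to the host, save it in the hard disk of the storage device, and record its desensitization state. When the host reads again, it reads the whole file. The storage device may, in response to the read file command, obtain the file to be read and desensitize it online; at that point one data block of the file (the first data block) has already been desensitized.
In this way, incremental desensitization can continue through online desensitization on top of earlier online desensitization, so that multiple rounds of online desensitization also jointly maintain a single desensitization result.
In step S13, the data blocks of the file to be read that have not yet been desensitized are desensitized.
In this embodiment, the storage device may identify the data blocks of the file to be read that have not yet been desensitized and desensitize them.
In a possible implementation, desensitization may be performed by value/string replacement, invalidation, randomization, offset and rounding, mask shielding, and the like. The specific manners of desensitization are described below by way of example.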
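Example 1: desensitization by value/string replacement.
Value/string replacement means replacing the real value or string with a fixed fictitious value. For example, the file before desensitization is shown in Table 1. For the mobile numbers in Table 1, the function mobile=13800138000 may be used to assign the value 13800138000 to the mobile number field. The file after the mobile numbers are desensitized by value replacement is shown in Table 2, where every mobile number is displayed as 13800138000 and the real values are hidden.
Table 1: File before desensitization
No. | Code | Name | Address | Mobile number | ID number | Operation time
1 | 100000 | 张三 | 四川省A市 | 138####8611 | 51132119######0672 | 2020-5-30 15:01:56
2 | 100001 | 李四 | 四川省A市 | 133####6953 | 51132119######0611 | 2020-6-30 16:56:03
3 | 100002 | 王五 | 深圳市B区 | 186####9898 | 51121019######5582 | 2020-4-30 16:01:50
4 | 100003 | 赵六 | 深圳市C区 | 180####9465 | 46003319######0651 | 2020-7-30 16:15:03
5 | 100004 | 钱七 | 深圳市D区 | 181####7898 | 46003119######0818 | 2020-9-30 17:20:50
Table 2: File after the mobile numbers are desensitized
No. | Code | Name | Address | Mobile number | ID number | Operation time
1 | 100000 | 张三 | 四川省A市 | 13800138000 | 51132119######0672 | 2020-5-30 15:01:56
2 | 100001 | 李四 | 四川省A市 | 13800138000 | 51132119######0611 | 2020-6-30 16:56:03
3 | 100002 | 王五 | 深圳市B区 | 13800138000 | 51121019######5582 | 2020-4-30 16:01:50
4 | 100003 | 赵六 | 深圳市C区 | 13800138000 | 46003319######0651 | 2020-7-30 16:15:03
5 | 100004 | 钱七 | 深圳市D区 | 13800138000 | 46003119######0818 | 2020-9-30 17:20:50
Here, # represents any one of the digits 0 to 9.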
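The replacement in Example 1 can be sketched in a few lines of Python. The record layout (a dictionary with a `mobile` key) and the function name `replace_mobile` are hypothetical; only the fixed value is taken from Table 2.

```python
FIXED_MOBILE = "13800138000"  # fictitious value shown in Table 2


def replace_mobile(record: dict) -> dict:
    """Return a copy of the record whose mobile number is replaced by a fixed value."""
    masked = dict(record)          # leave the original record untouched
    masked["mobile"] = FIXED_MOBILE
    return masked
```

Because every record receives the same value, the original numbers cannot be recovered from the output, at the cost of losing any variation in that column.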
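Example 2: desensitization by invalidation.
Desensitization by invalidation means desensitizing sensitive data by truncating, encrypting, or hiding it so that it no longer has any exploitable value, for example replacing the real address with ******. The effect of data invalidation is essentially similar to that of data replacement. For example, the addresses in Table 1 may be desensitized by hiding. Specifically, the function address="*******" may be used to hide the address field and achieve the desensitization effect. The file after the addresses are desensitized by hiding is shown in Table 3.
Table 3: File after the addresses are desensitized
No. | Code | Name | Address | Mobile number | ID number | Operation time
1 | 100000 | 张三 | ******* | 138####8611 | 51132119######0672 | 2020-5-30 15:01:56
2 | 100001 | 李四 | ******* | 133####6953 | 51132119######0611 | 2020-6-30 16:56:03
3 | 100002 | 王五 | ******* | 186####9898 | 51121019######5582 | 2020-4-30 16:01:50
4 | 100003 | 赵六 | ******* | 180####9465 | 46003319######0651 | 2020-7-30 16:15:03
5 | 100004 | 钱七 | ******* | 181####7898 | 46003119######0818 | 2020-9-30 17:20:50
Optionally, in the foregoing example, the address field may instead be desensitized by truncation, that is, by hiding only part of the information. Specifically, the function address=left(address,3)+"******" may be used to keep the leftmost three characters of the source address string and append ******. The file after the addresses are desensitized by truncation is shown in Table 4.
Table 4: File after the addresses are desensitized
No. | Code | Name | Address | Mobile number | ID number | Operation time
1 | 100000 | 张三 | 四川省*** | 138####8611 | 51132119######0672 | 2020-5-30 15:01:56
2 | 100001 | 李四 | 四川省*** | 133####6953 | 51132119######0611 | 2020-6-30 16:56:03
3 | 100002 | 王五 | 深圳市*** | 186####9898 | 51121019######5582 | 2020-4-30 16:01:50
4 | 100003 | 赵六 | 深圳市*** | 180####9465 | 46003319######0651 | 2020-7-30 16:15:03
5 | 100004 | 钱七 | 深圳市*** | 181####7898 | 46003119######0818 | 2020-9-30 17:20:50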
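Both variants of Example 2 can be sketched in Python as follows. The function names and the `keep` and `mask` parameters are assumptions; the defaults mirror the expressions address="*******" and left(address,3)+"******" quoted above (the tables print a shorter run of asterisks, which would correspond to a different `mask` value).

```python
def hide_address(address: str) -> str:
    """Hiding: discard the whole value and return a constant placeholder."""
    return "*******"


def truncate_address(address: str, keep: int = 3, mask: str = "******") -> str:
    """Truncation: keep the leftmost `keep` characters and mask the rest."""
    return address[:keep] + mask
```

For instance, `truncate_address("四川省A市")` keeps the three-character province prefix and masks the rest, so coarse location survives while the precise address does not.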
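Example 3: desensitization by randomization.
Desensitization by randomization means replacing real values/strings with random data, keeping the replacement values random so as to simulate the realism of the sample. For example, randomly generated names are used in place of real names, or random numbers within a specified range are used in place of real values.
Specifically, the function A1(rand(A1.len())+1)+B1(rand(B1.len())+1) may be used to randomize the names (names are generated by random combination from the external dictionary tables A1 "姓氏.txt" (surnames) and B1 "名字.txt" (given names)). The file after the names are desensitized by randomization is shown in Table 5.
Table 5: File after the names are desensitized
No. | Code | Name | Address | Mobile number | ID number | Operation time
1 | 100000 | 张一 | 四川省A市 | 138####8611 | 51132119######0672 | 2020-5-30 15:01:56
2 | 100001 | 李二 | 四川省A市 | 133####6953 | 51132119######0611 | 2020-6-30 16:56:03
3 | 100002 | 王三 | 深圳市B区 | 186####9898 | 51121019######5582 | 2020-4-30 16:01:50
4 | 100003 | 赵四 | 深圳市C区 | 180####9465 | 46003319######0651 | 2020-7-30 16:15:03
5 | 100004 | 钱五 | 深圳市D区 | 181####7898 | 46003119######0818 | 2020-9-30 17:20:50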
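Example 4: desensitization by offset and rounding.
Offset and rounding means changing numeric data by random shifting, for example changing the date 2018-01-02 8:12:25 to 2018-01-02 8:00:00. Offset and rounding preserves data security while keeping the range of the data roughly realistic, which is of great value in big-data environments.
Specifically, the function string(operatetime,"yyyy-MM-dd HH:00:00") may be used to format the value into the "yyyy-MM-dd HH:00:00" format according to the offset and rounding rule. The file after the operation times are desensitized by offset and rounding is shown in Table 6.
Table 6: File after the operation times are desensitized
No. | Code | Name | Address | Mobile number | ID number | Operation time
1 | 100000 | 张三 | 四川省A市 | 138####8611 | 51132119######0672 | 2020-5-30 15:00:00
2 | 100001 | 李四 | 四川省A市 | 133####6953 | 51132119######0611 | 2020-6-30 16:00:00
3 | 100002 | 王五 | 深圳市B区 | 186####9898 | 51121019######5582 | 2020-4-30 16:00:00
4 | 100003 | 赵六 | 深圳市C区 | 180####9465 | 46003319######0651 | 2020-7-30 16:00:00
5 | 100004 | 钱七 | 深圳市D区 | 181####7898 | 46003119######0818 | 2020-9-30 17:00:00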
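The rounding in Example 4 amounts to truncating each timestamp to the start of its hour. The Python sketch below shows one way to do this with the standard `datetime` module; the function name and the zero-padded output format are assumptions (Table 6 prints months without a leading zero).

```python
from datetime import datetime


def round_down_to_hour(timestamp: str) -> str:
    """Parse a 'YYYY-M-D H:MM:SS' timestamp and zero out minutes and seconds."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    rounded = parsed.replace(minute=0, second=0, microsecond=0)
    return rounded.strftime("%Y-%m-%d %H:%M:%S")
```

The hour is preserved, so time-of-day statistics remain usable, while the exact moment of the operation is no longer disclosed.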
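Example 5: desensitization by mask shielding.
Mask shielding is a powerful tool for desensitizing part of the information in account-type data, such as bank card numbers or ID numbers. Either full masking or partial masking (with a masked range such as the last X digits or the middle X digits) can be specified. For example, the date of birth in an ID number is masked.
Specifically, the function left(string(idnumber),6)+"********"+right(string(idnumber),4) may be used to mask the ID number. The file after the ID numbers are desensitized by mask shielding is shown in Table 7.
Table 7: File after the ID numbers are desensitized
No. | Code | Name | Address | Mobile number | ID number | Operation time
1 | 100000 | 张三 | 四川省A市 | 138####8611 | 511321********0672 | 2020-5-30 15:01:56
2 | 100001 | 李四 | 四川省A市 | 133####6953 | 511321********0611 | 2020-6-30 16:56:03
3 | 100002 | 王五 | 深圳市B区 | 186####9898 | 511210********5582 | 2020-4-30 16:01:50
4 | 100003 | 赵六 | 深圳市C区 | 180####9465 | 460033********0651 | 2020-7-30 16:15:03
5 | 100004 | 钱七 | 深圳市D区 | 181####7898 | 460031********0818 | 2020-9-30 17:20:50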
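The masking expression of Example 5 translates directly into string slicing. In the Python sketch below, the function name is hypothetical and the sample ID number used in the usage note is made up for illustration.

```python
def mask_id_number(id_number: str) -> str:
    """Keep the first 6 and last 4 characters and mask the 8 in between."""
    return id_number[:6] + "********" + id_number[-4:]
```

For an 18-character ID number such as the made-up value `"511321199001010672"`, the eight masked characters are exactly the date-of-birth segment, and the result is `"511321********0672"`.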
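It should be noted that although specific manners of desensitization are illustrated by the foregoing examples, a person skilled in the art should understand that desensitization is not limited to these manners and may include others. When performing desensitization, a person skilled in the art may select one or more specific manners according to the actual situation; this embodiment imposes no limitation in this respect.
In step S14, the desensitized file to be read is sent to the host.
In this embodiment, once all data blocks in the file to be read have been desensitized, the desensitized file to be read is obtained, and the storage device may send it to the host for product development or testing.
The file desensitization method of this embodiment is performed by a storage device. The storage device may receive a read file command sent by a host and, in response to the read file command, obtain the file to be read, where at least some of the data blocks in the file have already been desensitized. It then desensitizes the data blocks that have not yet been desensitized and sends the desensitized file to the host. The file to be read is thus desensitized incrementally, which not only makes full use of earlier desensitization results and improves the efficiency of file desensitization, but also combines online and offline desensitization so that product development or testing can proceed at any time.
In a possible implementation, the method may further include: saving the desensitized file to be read in the hard disk of the storage device.
In this embodiment, the storage device may save each desensitization result (that is, the desensitized file to be read) in the hard disk. For example, a specific location may be chosen for storing the files whose desensitization is complete. As incremental desensitization continues, a sanitized backup gradually forms, which includes the desensitized files and the non-sensitive files. Subsequent development or testing can be based on this sanitized backup, which not only reduces unnecessary repeated desensitization but also reduces the impact of online desensitization on IO efficiency during development or testing and improves file read efficiency.
In a possible implementation, the file to be read has a desensitization mark, and the desensitization mark indicates whether the file to be read has been fully desensitized;
the method may further include: before desensitizing the data blocks of the file to be read that have not yet been desensitized, determining, according to the desensitization mark, that the file to be read is a file whose desensitization has not been completed.
In this embodiment, the file to be read has a desensitization mark that indicates whether the file has been fully desensitized. For example, the mark may be set to 1 or 0: a mark of 1 indicates that the file to be read has been desensitized, that is, all of its data blocks have been desensitized; a mark of 0 indicates that the file has not been fully desensitized, that is, at least one of its data blocks has not yet been desensitized. Alternatively, a mark of 1 may indicate that desensitization is not complete and a mark of 0 that it is complete.
It should be noted that the desensitization mark may also take other values. A person skilled in the art may set the values of the mark and the desensitization state corresponding to each value according to the actual situation; this application imposes no limitation in this respect.
When the file to be read has a desensitization mark, whether the file is one whose desensitization has not been completed can be determined from the mark before the data blocks that have not yet been desensitized are processed. For example, the mark may be set to 1 (desensitization completed) or 0 (desensitization not completed), and the storage device checks whether the mark of the file to be read is 0. If the mark is 0, the file is determined to be one whose desensitization has not been completed, and step S13 can then be performed to desensitize the data blocks that have not yet been desensitized. If the mark is 1, no desensitization is needed and the file to be read can be sent directly to the host.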
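The check on the desensitization mark can be sketched as follows. The convention 1 = completed and 0 = not completed follows the example in the text; the function names and the idea of passing the desensitization routine in as a callable are assumptions for illustration.

```python
from typing import Callable

DONE, NOT_DONE = 1, 0  # mark values used in the example above


def needs_desensitization(mark: int) -> bool:
    """A file needs processing only when its mark says it is not yet complete."""
    return mark == NOT_DONE


def read_file(content: str, mark: int, desensitize: Callable[[str], str]) -> str:
    """Return the content to send to the host, desensitizing only when required."""
    if needs_desensitization(mark):
        return desensitize(content)   # corresponds to performing step S13
    return content                    # already clean: send as-is
```

With `str.upper` standing in for a real desensitization routine, `read_file("abc", 1, str.upper)` returns the content unchanged, while `read_file("abc", 0, str.upper)` processes it first.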
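According to this embodiment, the file to be read has a desensitization mark, and the desensitization state of the file can be determined from the mark, so that the desensitization state of the file to be read can be managed and identified.
In a possible implementation, the method may further include: obtaining a bitmap of the file to be read, where the bitmap indicates whether each data block contained in the file to be read has been desensitized; and identifying, according to the bitmap, the data blocks in the file to be read that have not yet been desensitized.
In this embodiment, a bitmap may be used to indicate whether each data block contained in the file to be read has been desensitized. The bitmap may include a number of cells, each corresponding to the address of one data block in the file to be read. Each cell records a flag bit with the value 0 or 1, which indicates whether the data block stored at the address corresponding to that cell has been desensitized. The number of cells in the bitmap of the file to be read equals the number of data blocks the file contains.
The flag bits in the bitmap may take the value 1 or 0. A value of 1 may indicate that the corresponding data block has been desensitized and 0 that it has not; alternatively, 1 may indicate that the corresponding data block has not been desensitized and 0 that it has.
It should be noted that a person skilled in the art may determine the values of the flag bits in the bitmap and the desensitization state corresponding to each value according to the actual situation; this application imposes no limitation in this respect.
In this embodiment, the bitmap of the file to be read may be obtained, and the data blocks that have not yet been desensitized may be identified according to the bitmap. For example, if the flag bits in the bitmap take the value 1 (the corresponding data block has been desensitized) or 0 (the corresponding data block has not been desensitized), each flag bit can be checked for the value 0, and the data blocks corresponding to the flag bits with the value 0 are determined to be the data blocks that have not yet been desensitized.
According to this embodiment, a bitmap may be set for the file to be read, and the data blocks that have not yet been desensitized can be determined by examining the bitmap. This is simple and fast, and therefore improves processing efficiency.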
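Identifying pending blocks from a bitmap can be sketched with plain integer bit operations. In the Python example below the bitmap is stored in an `int` with bit i describing block i; this encoding and the function name `pending_blocks` are assumptions, and the convention 1 = desensitized follows the example in the text.

```python
def pending_blocks(bitmap: int, block_count: int) -> list[int]:
    """Return the indices of blocks whose flag bit is 0 (not yet desensitized).

    Bit i of `bitmap` (bit 0 is the least significant bit) describes block i.
    """
    return [i for i in range(block_count) if not (bitmap >> i) & 1]
```

For a 5-block file whose first three blocks are done, `pending_blocks(0b00111, 5)` yields the indices 3 and 4.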
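In a possible implementation, the method may further include: modifying the flag bits in the bitmap that correspond to the data blocks that have been desensitized, to indicate that the corresponding data blocks have been desensitized.
In this embodiment, for a data block in the file to be read that has been desensitized, the corresponding flag bit in the bitmap may be modified to indicate that the data block has been desensitized. In this way, the bitmap is updated promptly as the desensitization state of the data blocks changes, which improves the accuracy of the bitmap of the file to be read.
In a possible implementation, the method may further include: modifying the desensitization mark of the desensitized file to be read, to indicate that the file to be read has been desensitized.
In this embodiment, when the file to be read has been fully desensitized, that is, when the bitmap of the file to be read indicates that every data block it contains has been desensitized, the desensitization mark of the file may be modified to indicate that desensitization is complete. In this way, the desensitization mark is updated promptly as the desensitization state of the file changes, which improves the accuracy of the desensitization mark of the file to be read.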
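Updating the bitmap and deriving the file-level mark from it can be sketched as below. As before, the integer encoding of the bitmap (bit i for block i, 1 = desensitized) and the function names are assumptions for illustration.

```python
def mark_block_done(bitmap: int, index: int) -> int:
    """Set the flag bit of block `index` to 1 and return the updated bitmap."""
    return bitmap | (1 << index)


def file_mark(bitmap: int, block_count: int) -> int:
    """Return 1 when every one of the block_count flag bits is set, else 0."""
    full = (1 << block_count) - 1   # e.g. 0b11111 for 5 blocks
    return 1 if bitmap & full == full else 0
```

Starting from `0b00111` for a 5-block file, marking block 3 leaves the file mark at 0, and marking block 4 as well raises it to 1.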
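In a possible implementation, a desensitization metafile may also be set for the file to be read, to identify the desensitization state of the file and of each data block it contains. The metafile can be represented in various ways, such as a bitmap, an array, a matrix, or a tensor; this application does not limit the specific representation of the desensitization metafile.
In a possible implementation, the desensitization metafile of the file to be read may include the desensitization mark and the bitmap, which are similar to those described above and are not described again here.
In a possible implementation, incremental desensitization may be performed by combining online and offline desensitization, or by combining multiple rounds of online desensitization. Incremental desensitization that combines online and offline desensitization may include performing offline desensitization on the backup data first and online desensitization afterwards, performing online desensitization first and offline desensitization afterwards, or performing multiple rounds of both, that is, the desensitization of the backup data switches between offline and online desensitization multiple times.
The case in which offline desensitization is performed on the backup data first and online desensitization afterwards is similar to the processing described above and is not described again here.
In the case in which online desensitization is performed on the backup data first and offline desensitization afterwards, the backup data has not been desensitized before product development or testing starts. After development or testing starts, when the host reads a file or a data block in a file, the file or data block is desensitized online and then sent to the host; at the same time, the desensitized file or data block is saved and its desensitization state updated. Then, while development or testing is stopped, the files or data blocks in the backup data that have not yet been desensitized are desensitized offline.
In the case in which multiple rounds of online and offline desensitization are performed on the backup data, while no product development or testing is in progress (for example, before development or testing starts, or while it is stopped), the files or data blocks in the backup data that have not yet been desensitized are desensitized offline, the desensitized files or data blocks are saved, and their desensitization state is updated. When development or testing is needed, offline desensitization can be stopped at any time, and the files or data blocks that the host reads and that have not yet been desensitized are desensitized online and then sent to the host; at the same time, the desensitized files or data blocks are saved and their desensitization state is updated. As development or testing stops and resumes, the desensitization of the backup data thus switches between offline and online desensitization multiple times, until all files in the backup data have been desensitized.
In incremental desensitization that combines multiple rounds of online desensitization, when the host reads files or data blocks from the backup data multiple times during product development or testing, each file or data block the host reads is desensitized online and sent to the host, after which the desensitized file or data block is saved and its desensitization state updated. The next time the host reads a file or data block, the desensitization state of that file or data block can be determined from the latest state information, and only the files or data blocks that have not yet been desensitized are desensitized online. Each round of online desensitization therefore builds on the previous one, which achieves incremental desensitization across multiple rounds of online desensitization.
The file desensitization method combining online and offline desensitization and the file desensitization method combining multiple rounds of online desensitization are described below by way of specific examples.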
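FIG. 5 is a schematic diagram of an application of the file desensitization method according to an embodiment of this application. As shown in FIG. 5, the backup data before desensitization includes five files, F1, F2, F3, F4, and F5, and is used for product development or testing. Before development or testing starts, offline desensitization may be performed on the backup data and the desensitized files saved. Suppose the developers/testers start development or testing at the point where offline desensitization has finished F1 and is processing F2 (for example, F2 includes three data blocks, of which one has been desensitized and two have not). In this case, offline desensitization may be stopped; the fully desensitized F1 (that is, F1′) and the F2 with one desensitized data block are saved; the desensitization flags and bitmaps of F1 and F2 are updated; and desensitization switches from offline to online. The backup data, the fully desensitized files (for example, F1′), and the partially desensitized files (for example, F2) are stored in different storage paths or storage locations.
During product development or testing, the host reads F2. The storage device reads the saved F2, in which one data block has been desensitized, and performs online desensitization on the two data blocks of F2 that have not. After desensitization is complete, the storage device may send the desensitized F2 (that is, F2′) to the host, and at the same time update the desensitization flag and bitmap of F2 and save F2′.
After F2 is read, development or testing pauses, and offline desensitization of the backup data may continue. Suppose the developers/testers start development or testing again at the point where offline desensitization has finished F3 and is processing F4 (for example, F4 includes four data blocks, of which three have been desensitized and one has not). In this case, offline desensitization may be stopped; the fully desensitized F3 (that is, F3′) and the F4 with three desensitized data blocks are saved; the desensitization flags and bitmaps of F3 and F4 are updated; and desensitization switches from offline to online.
During product development or testing, the host reads F4. The storage device reads the saved F4, in which three data blocks have been desensitized, and performs online desensitization on the one data block of F4 that has not. After desensitization is complete, the storage device may send the desensitized F4 (that is, F4′) to the host, and at the same time update the desensitization flag and bitmap of F4 and save F4′. The host then reads F5. The storage device reads F5 from the backup data, performs online desensitization on it, and sends the desensitized F5 (that is, F5′) to the host, while updating the desensitization flag and bitmap of F5 and saving F5′.
In this embodiment, file desensitization can combine online and offline desensitization, save the result of each desensitization pass, and update the desensitization state of the files. Online desensitization can thus use the results of offline desensitization, so development or testing need not wait for offline desensitization to finish; offline desensitization can likewise use the results of online desensitization. The two proceed incrementally on each other's results and gradually form a sanitized backup, until all files have been desensitized.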
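The FIG. 5 walkthrough can be replayed as a small simulation over per-file bitmaps. This is a sketch under stated assumptions: the block counts of F2 (three) and F4 (four) come from the example, while the single-block F1, F3, and F5, the `budget` parameter that models an interrupted offline pass, and both function names are invented for illustration.

```python
from typing import Dict, List

# F2 (3 blocks) and F4 (4 blocks) follow the example in the text;
# single-block F1, F3 and F5 are an assumption made for this sketch.
bitmaps: Dict[str, List[int]] = {
    "F1": [0], "F2": [0, 0, 0], "F3": [0], "F4": [0, 0, 0, 0], "F5": [0],
}


def offline_pass(budget: int) -> int:
    """Desensitize up to `budget` pending blocks in file order; return blocks done."""
    done = 0
    for bits in bitmaps.values():
        for i, bit in enumerate(bits):
            if done == budget:          # development/testing starts: stop offline work
                return done
            if bit == 0:
                bits[i] = 1
                done += 1
    return done


def online_read(name: str) -> int:
    """Desensitize only the pending blocks of the requested file; return blocks done."""
    bits = bitmaps[name]
    pending = [i for i, bit in enumerate(bits) if bit == 0]
    for i in pending:
        bits[i] = 1
    return len(pending)


log = [
    ("offline", offline_pass(2)),    # F1 done, one block of F2 done, then interrupted
    ("read F2", online_read("F2")),  # the two remaining blocks are handled online
    ("offline", offline_pass(4)),    # F3 done, three blocks of F4 done, then interrupted
    ("read F4", online_read("F4")),  # the one remaining block is handled online
    ("read F5", online_read("F5")),  # F5 was never touched offline
]
```

Because both paths read and write the same bitmaps, neither repeats work the other has already finished, which is the incremental property the example describes.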
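FIG. 6 is a schematic diagram of an application of the file desensitization method according to an embodiment of this application. As shown in FIG. 6, the backup data before desensitization includes five files, F1, F2, F3, F4, and F5, and is used for product development or testing. No offline desensitization is performed before development or testing. During development or testing, when the host reads files for the first time, it reads F1; the storage device reads F1 from the backup data, performs online desensitization on it, and sends the desensitized F1 (that is, F1′) to the host, while updating the desensitization flag and bitmap of F1 and saving F1′.
When the host reads files for the second time, it reads F2 and F3; the storage device reads F2 and F3 from the backup data, performs online desensitization on each, and sends the desensitized F2 (that is, F2′) and the desensitized F3 (that is, F3′) to the host, while updating the desensitization flags and bitmaps of F2 and F3 and saving F2′ and F3′.
When the host reads files for the third time, it reads F4 and F5; the storage device reads F4 and F5 from the backup data, performs online desensitization on each, and sends the desensitized F4 (that is, F4′) and the desensitized F5 (that is, F5′) to the host, while updating the desensitization flags and bitmaps of F4 and F5 and saving F4′ and F5′.
In this embodiment, even when file desensitization consists only of online desensitization, each desensitized file can be saved, so that multiple rounds of online desensitization can also proceed incrementally.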
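A minimal sketch of the online-only case follows. The `mask` function is a placeholder (the text does not specify the desensitization algorithm), the dictionary names are hypothetical, and the final repeated read of F1 and F3 is added here only to show that saved results are reused; it is not part of FIG. 6.

```python
from typing import Dict, List

backup: Dict[str, str] = {f"F{i}": f"raw-{i}" for i in range(1, 6)}
sanitized: Dict[str, str] = {}   # saved results, i.e. F1', F2', ...
mask_calls: List[str] = []       # records which files were actually desensitized


def mask(data: str) -> str:
    """Placeholder desensitization; the real algorithm is not specified in the text."""
    return data.replace("raw", "masked")


def read_files(names: List[str]) -> List[str]:
    """Serve a host read, desensitizing online only what has no saved result yet."""
    out = []
    for name in names:
        if name not in sanitized:
            sanitized[name] = mask(backup[name])   # online desensitization, then save
            mask_calls.append(name)
        out.append(sanitized[name])
    return out


read_files(["F1"])                # first read
read_files(["F2", "F3"])          # second read
read_files(["F4", "F5"])          # third read
again = read_files(["F1", "F3"])  # illustrative repeat: served from saved results
```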
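FIG. 7 is a schematic diagram of the processing procedure of the file desensitization method according to an embodiment of this application. As shown in FIG. 7, during product development or testing, the host may read files from the storage device over the IO path. In step S701, the storage device receives a read-file command sent by the host. In step S702, in response to the read-file command, it obtains the to-be-read file, in which at least some data blocks have been desensitized. In step S703, it obtains the desensitization flag of the to-be-read file, and in step S704 it determines whether the flag is 1. The flag takes the value 1 or 0: 1 indicates that the to-be-read file has been fully desensitized, and 0 indicates that it has not.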
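If the desensitization flag is 1, the to-be-read file has been fully desensitized and the storage device does not need to desensitize it again; in step S708, it may send the desensitized to-be-read file directly to the host.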
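If the desensitization flag is not 1 (that is, it is 0), the to-be-read file has not been fully desensitized. The storage device may perform step S705 to obtain the bitmap of the to-be-read file, identify in step S706, according to the bitmap, the data blocks in the file that have not been desensitized, and desensitize those data blocks in step S707.
Once all data blocks of the to-be-read file have been desensitized, the to-be-read file can be considered fully desensitized, and the storage device may perform step S708 to send the desensitized to-be-read file to the host for use in product development or testing.
After step S707, step S709 may also be performed to update the desensitization flag and bitmap of the desensitized to-be-read file and save the file to a hard disk of the storage device.
In this embodiment, online desensitization can use the desensitization flag and bitmap of the to-be-read file to identify, in real time, the desensitization state of the file and of each data block it contains. This reduces unnecessary or repeated desensitization and lessens the impact on the user's production environment (for example, a product development or test environment).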
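The flag-then-bitmap decision flow of FIG. 7 might look roughly like the following. The `StoredFile` type, the `handle_read` function, and the `redact` callback are hypothetical names for this sketch; receiving the command and fetching the file (S701, S702) are assumed to have happened already, and persisting to disk is reduced to in-memory updates.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StoredFile:
    blocks: List[bytes]
    flag: int            # desensitization flag: 1 = complete, 0 = not complete
    bitmap: List[int]    # one bit per data block


def handle_read(f: StoredFile, desensitize: Callable[[bytes], bytes]) -> List[bytes]:
    """Sketch of S703-S709 for a file that was already fetched in S701-S702."""
    if f.flag == 1:                        # S703-S704: check the file-level flag
        return list(f.blocks)              # already complete: send without reprocessing
    for i, bit in enumerate(f.bitmap):     # S705-S706: find the pending blocks
        if bit == 0:
            f.blocks[i] = desensitize(f.blocks[i])   # S707: desensitize that block
            f.bitmap[i] = 1                          # S709: update the bitmap
    f.flag = 1                             # S709: update the flag
    return list(f.blocks)                  # S708: send the desensitized file


calls: List[bytes] = []


def redact(block: bytes) -> bytes:
    """Placeholder desensitization that overwrites every byte with 'X'."""
    calls.append(block)
    return b"X" * len(block)


stored = StoredFile(blocks=[b"XX", b"secret", b"secret"], flag=0, bitmap=[1, 0, 0])
first = handle_read(stored, redact)
second = handle_read(stored, redact)   # flag is now 1, so nothing is reprocessed
```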
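In a possible implementation, when the host and the storage device support independent access to data blocks in the to-be-read file, whether a data block has been desensitized can be determined from the flag bit that corresponds to that data block in the bitmap of the to-be-read file. If the flag bit indicates that the data block has been desensitized (for example, the flag bit is 1), the storage device may send the data block directly to the host without desensitizing it again. If the flag bit indicates that the data block has not been desensitized (for example, the flag bit is 0), the storage device may desensitize the data block and send the desensitized data block to the host; the storage device may also save the desensitized data block to the hard disk and update the corresponding flag bit in the bitmap.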
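For block-level access, the same check applies to a single flag bit. The function below is a hypothetical sketch: `read_block`, its parameters, and the sample `id=` strings are all invented for illustration, and saving to the hard disk is represented by writing the result back into the list.

```python
from typing import Callable, List


def read_block(blocks: List[str], bitmap: List[int], index: int,
               desensitize: Callable[[str], str]) -> str:
    """Return one block, desensitizing and saving it first if its flag bit is 0."""
    if bitmap[index] == 0:
        blocks[index] = desensitize(blocks[index])  # save the desensitized block
        bitmap[index] = 1                           # update the corresponding flag bit
    return blocks[index]


blocks = ["id=1234", "id=5678", "id=9012"]
bitmap = [0, 0, 0]
first = read_block(blocks, bitmap, 1, lambda s: "id=****")
```

Only the requested block is touched; the other blocks stay in their original state until they are read or processed offline.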
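In a possible implementation, when the storage device saves a desensitized file or data block to the hard disk, it may also save the attribute information of that file or data block to the hard disk. The attribute information of a file or data block may include its name, dimensions (for example, size or length), and location. This application does not limit the specific attribute information of a file or data block.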
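One way to picture saving attribute information alongside the data is a sidecar file, sketched below. The JSON format, the `.attrs.json` suffix, the function name, and the sample file name are assumptions for illustration; the text only says that attributes such as name, size, and location may be saved.

```python
import json
import tempfile
from pathlib import Path


def save_with_attributes(directory: Path, name: str, data: bytes) -> Path:
    """Write the desensitized data plus a JSON sidecar holding its attributes."""
    data_path = directory / name
    data_path.write_bytes(data)
    attrs = {"name": name, "size": len(data), "location": str(data_path)}
    attr_path = directory / (name + ".attrs.json")   # hypothetical sidecar naming
    attr_path.write_text(json.dumps(attrs), encoding="utf-8")
    return attr_path


with tempfile.TemporaryDirectory() as tmp:
    attr_file = save_with_attributes(Path(tmp), "F2_prime", b"masked-bytes")
    loaded = json.loads(attr_file.read_text(encoding="utf-8"))
```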
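The file desensitization method according to the embodiments of this application can combine online and offline desensitization and save the files or data blocks that have been desensitized. This not only enables incremental desensitization of the backup data used in product development or testing, improves the efficiency of file desensitization, and reduces the impact of online desensitization on the user's production environment (for example, the user's development or test environment), but also allows desensitization to switch between online and offline at any time, so that development or testing activities can proceed at any time.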
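FIG. 8 is a block diagram of a file desensitization apparatus according to an embodiment of this application. As shown in FIG. 8, the file desensitization apparatus is applied to a storage device and may be implemented by a desensitization engine. The apparatus includes:
a command receiving module 81, configured to receive a read-file command sent by a host, where the read-file command is used to request a to-be-read file;
a file obtaining module 82, configured to obtain the to-be-read file in response to the read-file command, where at least some data blocks in the to-be-read file have been desensitized;
a desensitization processing module 83, configured to desensitize the data blocks of the to-be-read file that have not been desensitized; and
a file sending module 84, configured to send the desensitized to-be-read file to the host.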
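In a possible implementation, the file desensitization apparatus may be located inside the storage device and may be implemented in hardware, in software, or in a combination of the two. The file desensitization apparatus may also be located outside the storage device and implemented as an independent desensitization device, which may be any form of electronic device, for example a server, a desktop computer, a mobile device, or any other type of computing device that includes a processor and a memory. This application does not limit the location or specific implementation of the file desensitization apparatus.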
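In a possible implementation, the apparatus may further include a file storage module, configured to save the desensitized to-be-read file to a hard disk of the storage device.
In a possible implementation, the apparatus may further include: a bitmap obtaining module, configured to obtain a bitmap of the to-be-read file, where the bitmap indicates whether each data block contained in the to-be-read file has been desensitized; and a data block identification module, configured to identify, according to the bitmap, the data blocks in the to-be-read file that have not been desensitized.
In a possible implementation, the to-be-read file has a desensitization flag, and the desensitization flag indicates whether the to-be-read file has been fully desensitized.
The apparatus may further include a file desensitization identification module, configured to determine, according to the desensitization flag and before the data blocks of the to-be-read file that have not been desensitized are desensitized, that the to-be-read file is a file that has not been fully desensitized.
In a possible implementation, the apparatus may further include a desensitization flag modification module, configured to modify the desensitization flag of the desensitized to-be-read file to indicate that the to-be-read file has been fully desensitized.
In a possible implementation, the apparatus may further include a bitmap modification module, configured to modify the flag bits in the bitmap that correspond to desensitized data blocks to indicate that those data blocks have been desensitized.
In a possible implementation, at least some data blocks in the to-be-read file are desensitized through offline desensitization.
In a possible implementation, the to-be-read file includes at least one of a text file, a report file, a picture file, an audio file, and a video file.
In a possible implementation, each of the foregoing modules in the file desensitization apparatus may be implemented by a CPU invoking program instructions.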
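It should be noted that the functions or modules of the apparatus provided in the embodiments of this application may be used to perform the methods described in the foregoing method embodiments. For their specific implementation, refer to the descriptions of the foregoing method embodiments; for brevity, details are not repeated here.
An embodiment of this application provides a file desensitization apparatus, including a processor and a memory configured to store instructions executable by the processor, where the processor is configured to implement the foregoing method when executing the instructions.
An embodiment of this application provides a non-volatile computer-readable storage medium that stores computer program instructions, where the foregoing method is implemented when the computer program instructions are executed by a processor.
An embodiment of this application provides a computer program product, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code, where, when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device performs the foregoing method.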
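The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical coding device such as a punch card or a raised structure in a groove on which instructions are stored, and any suitable combination of the foregoing.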
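The computer-readable program instructions or code described here may be downloaded from a computer-readable storage medium to each computing/processing device, or downloaded to an external computer or external storage device over a network, for example the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in that computing/processing device.
The computer program instructions used to perform the operations of this application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may be executed entirely on a user's computer, partly on a user's computer, as a stand-alone software package, partly on a user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet by using an Internet service provider). In some embodiments, an electronic circuit, for example a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is customized by using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of this application.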
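Aspects of this application are described here with reference to flowcharts and/or block diagrams of the method, apparatus (system), and computer program product according to the embodiments of this application. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. The computer-readable program instructions may also be stored in a computer-readable storage medium. These instructions cause a computer, a programmable data processing apparatus, and/or another device to work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that contains instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, so that a series of operation steps are performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings show the possible architectures, functions, and operations of apparatuses, systems, methods, and computer program products according to multiple embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of an instruction, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by hardware that performs the corresponding functions or actions (for example, a circuit or an application-specific integrated circuit (ASIC)), or by a combination of hardware and software, such as firmware.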
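Although the present invention is described here with reference to the embodiments, a person skilled in the art, in practicing the claimed invention, can understand and implement other variations of the disclosed embodiments by studying the accompanying drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil several functions recited in the claims. The fact that certain measures are recited in mutually different dependent claims does not mean that these measures cannot be combined to good effect.
The embodiments of this application have been described above. The foregoing description is illustrative, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to a person of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used in this document were chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies in the market, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed in this document.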

Claims (18)

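  1. A file desensitization method, wherein the method is performed by a storage device and comprises:
    receiving a read-file command sent by a host, wherein the read-file command is used to request a to-be-read file;
    obtaining the to-be-read file in response to the read-file command, wherein at least some data blocks in the to-be-read file have been desensitized;
    desensitizing the data blocks of the to-be-read file that have not been desensitized; and
    sending the desensitized to-be-read file to the host.
  2. The method according to claim 1, further comprising:
    saving the desensitized to-be-read file to a hard disk of the storage device.
  3. The method according to claim 1, further comprising:
    obtaining a bitmap of the to-be-read file, wherein the bitmap indicates whether each data block contained in the to-be-read file has been desensitized; and
    identifying, according to the bitmap, the data blocks in the to-be-read file that have not been desensitized.
  4. The method according to claim 1, wherein the to-be-read file has a desensitization flag, and the desensitization flag indicates whether the to-be-read file has been fully desensitized,
    and the method further comprises:
    before the desensitizing of the data blocks of the to-be-read file that have not been desensitized, determining, according to the desensitization flag, that the to-be-read file is a file that has not been fully desensitized.
  5. The method according to claim 3, further comprising:
    modifying the flag bits in the bitmap that correspond to desensitized data blocks to indicate that the corresponding data blocks have been desensitized.
  6. The method according to claim 4, further comprising:
    modifying the desensitization flag of the desensitized to-be-read file to indicate that the to-be-read file has been fully desensitized.
  7. The method according to claim 1, wherein at least some data blocks in the to-be-read file are desensitized through offline desensitization.
  8. The method according to any one of claims 1 to 7, wherein the to-be-read file comprises at least one of a text file, a report file, a picture file, an audio file, and a video file.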
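  9. A file desensitization apparatus, wherein the apparatus is applied to a storage device and comprises:
    a command receiving module, configured to receive a read-file command sent by a host, wherein the read-file command is used to request a to-be-read file;
    a file obtaining module, configured to obtain the to-be-read file in response to the read-file command, wherein at least some data blocks in the to-be-read file have been desensitized;
    a desensitization processing module, configured to desensitize the data blocks of the to-be-read file that have not been desensitized; and
    a file sending module, configured to send the desensitized to-be-read file to the host.
  10. The apparatus according to claim 9, further comprising:
    a file storage module, configured to save the desensitized to-be-read file to a hard disk of the storage device.
  11. The apparatus according to claim 9, further comprising:
    a bitmap obtaining module, configured to obtain a bitmap of the to-be-read file, wherein the bitmap indicates whether each data block contained in the to-be-read file has been desensitized; and
    a data block identification module, configured to identify, according to the bitmap, the data blocks in the to-be-read file that have not been desensitized.
  12. The apparatus according to claim 9, wherein the to-be-read file has a desensitization flag, and the desensitization flag indicates whether the to-be-read file has been fully desensitized,
    and the apparatus further comprises:
    a file desensitization identification module, configured to determine, according to the desensitization flag and before the data blocks of the to-be-read file that have not been desensitized are desensitized, that the to-be-read file is a file that has not been fully desensitized.
  13. The apparatus according to claim 11, further comprising:
    a bitmap modification module, configured to modify the flag bits in the bitmap that correspond to desensitized data blocks to indicate that the corresponding data blocks have been desensitized.
  14. The apparatus according to claim 12, further comprising:
    a desensitization flag modification module, configured to modify the desensitization flag of the desensitized to-be-read file to indicate that the to-be-read file has been fully desensitized.
  15. The apparatus according to claim 9, wherein at least some data blocks in the to-be-read file are desensitized through offline desensitization.
  16. The apparatus according to any one of claims 9 to 15, wherein the to-be-read file comprises at least one of a text file, a report file, a picture file, an audio file, and a video file.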
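  17. A file desensitization apparatus, comprising:
    a processor; and
    a memory configured to store instructions executable by the processor;
    wherein the processor is configured to implement the method according to any one of claims 1 to 8 when executing the instructions.
  18. A non-volatile computer-readable storage medium storing computer program instructions, wherein the method according to any one of claims 1 to 8 is implemented when the computer program instructions are executed by a processor.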
PCT/CN2021/105808 2020-10-27 2021-07-12 文件脱敏方法、装置及存储介质 WO2022088754A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21884491.8A EP4227838A4 (en) 2020-10-27 2021-07-12 FILE DESENSITIZATION METHOD AND APPARATUS AND STORAGE MEDIUM
US18/307,986 US20230315906A1 (en) 2020-10-27 2023-04-27 File anonymization method and apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011166187.4 2020-10-27
CN202011166187.4A CN114491612A (zh) 2020-10-27 2020-10-27 文件脱敏方法、装置及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/307,986 Continuation US20230315906A1 (en) 2020-10-27 2023-04-27 File anonymization method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2022088754A1 true WO2022088754A1 (zh) 2022-05-05

Family

ID=81381805

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/105808 WO2022088754A1 (zh) 2020-10-27 2021-07-12 文件脱敏方法、装置及存储介质

Country Status (4)

Country Link
US (1) US20230315906A1 (zh)
EP (1) EP4227838A4 (zh)
CN (1) CN114491612A (zh)
WO (1) WO2022088754A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115952542A (zh) * 2022-12-30 2023-04-11 上海爱数信息技术股份有限公司 一种数据处理方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108829789A (zh) * 2018-06-01 2018-11-16 平安普惠企业管理有限公司 日志处理方法、装置、计算机设备和存储介质
CN110110543A (zh) * 2019-03-14 2019-08-09 深圳壹账通智能科技有限公司 数据处理方法、装置、服务器及存储介质
CN110472434A (zh) * 2019-07-12 2019-11-19 北京字节跳动网络技术有限公司 数据脱敏方法、系统、介质和电子设备
CN110502515A (zh) * 2019-08-15 2019-11-26 中国平安财产保险股份有限公司 数据采集方法、装置、设备及计算机可读存储介质
CN110532799A (zh) * 2019-07-31 2019-12-03 平安科技(深圳)有限公司 数据脱敏控制方法、电子装置及计算机可读存储介质
CN110795756A (zh) * 2019-09-25 2020-02-14 江苏满运软件科技有限公司 一种数据脱敏方法、装置、计算机设备及计算机可读存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9338220B1 (en) * 2011-03-08 2016-05-10 Ciphercloud, Inc. System and method to anonymize data transmitted to a destination computing device
WO2017077600A1 (ja) * 2015-11-04 2017-05-11 株式会社 東芝 匿名化システム
CN109964228B (zh) * 2016-09-21 2023-03-28 万事达卡国际股份有限公司 用于数据双重匿名化的方法和系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4227838A4

Also Published As

Publication number Publication date
EP4227838A4 (en) 2023-12-06
EP4227838A1 (en) 2023-08-16
CN114491612A (zh) 2022-05-13
US20230315906A1 (en) 2023-10-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21884491

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021884491

Country of ref document: EP

Effective date: 20230508

NENP Non-entry into the national phase

Ref country code: DE