CN112016127A - Method and device for identifying and separating sensitive data of backup system - Google Patents

Info

Publication number
CN112016127A
Authority
CN
China
Prior art keywords
sensitive data
data
sensitive
identifying
algorithm
Legal status
Pending
Application number
CN202011057703.XA
Other languages
Chinese (zh)
Inventor
张玉启
任伟
王传安
Current Assignee
Shenzhen Chaoshu Software Technology Co ltd
Original Assignee
Shenzhen Chaoshu Software Technology Co ltd
Priority date
2020-09-30
Filing date
2020-09-30
Publication date
2020-12-01
Application filed by Shenzhen Chaoshu Software Technology Co ltd
Priority to CN202011057703.XA
Publication of CN112016127A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and device for identifying and separating sensitive data of a backup system comprise a sensitive data identifier, a sensitive data algorithm selector, a sensitive data separator, and a sensitive data storage policy device. The sensitive data identifier recognizes sensitive data in the backup and offers 5 different algorithms; the sensitive data algorithm selector determines which one or more of these algorithms are applied to classify the data as general data or sensitive data. The sensitive data separator separates the data into general data and sensitive data. The sensitive data storage policy device sets whether the sensitive data are stored: storing them allows them to be conveniently restored or used later, while not storing them prevents them from leaking, thereby improving the security of the backup data.

Description

Method and device for identifying and separating sensitive data of backup system
Technical Field
The invention relates to the technical field of big data and new-generation information technology, and in particular to a method and device for identifying and separating sensitive data of a backup system.
Background
With the development of 5G, big data, the industrial Internet, the mobile Internet, the digital economy, and digital industrialization, data volumes keep growing. Data is a fundamental resource and an important productive force.
The data accumulated in a system is a valuable asset, and once it is lost, the loss is immeasurable. Data loss has many causes, such as hardware failure, operating errors, software defects, computer viruses, hacker attacks, and natural disasters. The main problems IT application systems face in data protection are:
Funding for data protection is limited, and data protection mechanisms are immature.
Once data is lost, it is extremely difficult or even impossible to recover.
Hardware or system failures are generally called hard errors, while human mistakes, software bugs, and damage caused by viruses or hackers are called soft errors or logical errors. Together with disasters, the causes of data loss that must be guarded against fall into three categories: (1) hard errors (44%); (2) soft/logical errors (53%); (3) disasters (3%).
In the September 11 attacks of 2001, almost all government agencies and enterprises without remote backups suffered huge data losses, and some companies that kept their core technology in the buildings without backups even went out of business because they could not continue operating.
The Wenchuan earthquake of 12 May 2008 caused devastating damage to archival records. In the Yushu earthquake, 13,000 files in the archives were buried in the ruins when office buildings collapsed, and the remaining 12,000 files were left at risk after the earthquake.
On 11 March 2011, a magnitude 8.9 earthquake struck northeastern Japan, threatening the many technology companies that had established their data centers in Tokyo.
At 8:02 on 20 April 2013, a magnitude 7.0 earthquake struck Lushan County, Ya'an City, Sichuan Province (30.3° N, 103.0° E), at a focal depth of 13 km. The epicenter was about 100 km away. Strong shaking was felt in Chengdu, Chongqing, and Baoji, Hanzhong, and Ankang in Shaanxi. According to an emergency report from the Ya'an city government, more than 99% of houses collapsed in Longmen Township, Lushan County, at the epicenter; the township health center and its inpatient department stopped working, and water and power were cut off … …
These painful lessons teach us that a disaster can strike at any time. If a computer system cannot withstand a disaster, services will be disrupted and data lost, bringing huge economic losses and a loss of trust. At the same time, work and daily life depend on computer systems to an unprecedented degree; this dependence brings convenience, but the ability to withstand risk once a problem occurs has become a key issue that must be seriously addressed.
Data backup is an effective way to prevent data loss, but backing up data also backs up any sensitive data it contains, and that sensitive data can easily leak.
With the arrival of the Internet, the mobile Internet, the Internet of Things, 5G, cloud computing, and big data, we live in a ubiquitous digital environment. People's work, life, study, and entertainment all depend on data, but while people enjoy the convenience of digital technology, the leakage of sensitive data has become an increasingly prominent problem.
Many people have received spam messages and fraudulent or marketing calls, all of which indicate that their data has been leaked. Meanwhile, the MIUI 12 controversy exposed further problems with apps acquiring excessive system permissions. For several years, the Ministry of Industry and Information Technology has been running special campaigns against such apps and has opened official public accounts encouraging users to report violations.
During China's 2020 Two Sessions, a draft law protecting personal privacy and personal data attracted the attention of many media outlets, including foreign media. A news agency reported on 26 May 2020 that this would be the first time China established rights to personal privacy and personal data, noting that as more and more people in this country of 1.4 billion have gone digital, they have become more exposed to leaks of personal information and to hacking.
The Civil Code confirms that individuals have the right to privacy and makes data collectors responsible for protecting individuals' personal information; such data may not be collected, disclosed, or traded without consent, so that personal privacy can be effectively protected.
If backed-up data contain sensitive data, the sensitive data can be identified and separated so that the backup is divided into general data and sensitive data, and the sensitive data can optionally be left out of the backup, which solves the problem of sensitive data leaking.
Analysis and comparison with the closest prior art
At present, the closest prior art consists of a data desensitization method and data desensitization server in data transmission, application No. CN201710103860.1 (hereinafter "Reference 1"), and a data classification processing method and apparatus, application No. CN201610720678.6 (hereinafter "Reference 2").
Reference 1 concerns a data desensitization method and server, and Reference 2 concerns a data classification processing method and device; both differ substantially from the present solution.
Reference 1 is an invention patent application whose embodiment provides a data desensitization method and server for data transmission. The method determines the type of a data transmission request containing sensitive data sent by a user terminal; if the request is a data download request and the total amount of data to be downloaded is greater than or equal to a first threshold, the database query instruction in the download request is desensitized. The corresponding data are then retrieved from the target database using the desensitized query instruction and sent to the user terminal.
The method described here differs from Reference 1. Reference 1 desensitizes data in transit and then sends them to the user terminal, whereas the present method identifies backup data and separates them into general data and sensitive data, involves no transmission, and allows the sensitive data not to be stored.
Reference 2 is also an invention patent application. It discloses a data classification processing method and device, the method comprising: receiving data classification item data input by a user, the data comprising a classification item code and a classification item name; acquiring information on historical data classification items, which comprise a plurality of records, each including at least a historical classification item code, a historical classification item name, and a classification item version; determining whether the records contain a first history record whose code is the same as the classification item code; if so, determining whether the classification item name is the same as the historical classification item name of the first history record; and if not, adding the data classification item data to the historical data classification items as a new record, whose classification item version is the next version after that of the first history record.
The method described here also differs from Reference 2, which provides a method and device for processing data classification in which the classification item data include classification item codes and names. The present method identifies backup data and separates them into general data and sensitive data; it involves no data classification scheme, classification item codes, or classification item names, and allows the sensitive data not to be stored.
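For contrast with the present method, the versioning rule summarized for Reference 2 can be sketched as follows. This is a minimal illustration, not Reference 2's implementation: the names `ClassificationRecord` and `add_classification_item` are hypothetical, and two details the summary leaves open are assumptions labeled in the code (which record counts as the "first history record", and what happens when no record shares the code).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationRecord:
    code: str      # classification item code
    name: str      # classification item name
    version: int   # classification item version


def add_classification_item(
    history: list[ClassificationRecord], code: str, name: str
) -> list[ClassificationRecord]:
    """Return the history after applying the update rule summarized for Reference 2."""
    same_code = [r for r in history if r.code == code]
    if not same_code:
        # Not covered by the summary; assumed here: a new code starts at version 1.
        return history + [ClassificationRecord(code, name, 1)]
    # Assumption: the "first history record" is the latest version with this code.
    first = max(same_code, key=lambda r: r.version)
    if first.name == name:
        return history  # same code and same name: nothing to add
    # Name changed: append a new record carrying the next version number.
    return history + [ClassificationRecord(code, name, first.version + 1)]
```

The sketch makes the difference visible: Reference 2 tracks versions of classification codes and names, while the present method never assigns classification codes and only splits data into general and sensitive parts.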
Disclosure of Invention
To solve the above technical problems, the invention provides a method and device for identifying and separating sensitive data of a backup system, comprising a sensitive data identifier, a sensitive data algorithm selector, a sensitive data separator, and a sensitive data storage policy device, wherein: the sensitive data identifier identifies sensitive data in the backup and has a plurality of different algorithms; the sensitive data algorithm selector selects the algorithm or algorithms used to identify sensitive data in the backup; the sensitive data separator separates the backed-up data into general data and sensitive data; and the sensitive data storage policy device selects whether to store the sensitive data.
The data identified and separated by the method and device include structured data, semi-structured data, and unstructured data.
The sensitive data identification uses a plurality of different algorithms, one or more of which can be chosen through the sensitive data algorithm selector.
The method and device separate backup data into general data and sensitive data; when the general data are stored, the sensitive data can either be stored alongside them or not be stored.
Drawings
Fig. 1 is a structural diagram of the method and device for identifying and separating sensitive data of a backup system.
Fig. 2 is a flow diagram of the sensitive data storage policy device.
Detailed Description
Fig. 1 is a structural diagram of the method and device for identifying and separating sensitive data of a backup system. The whole system consists of a sensitive data identifier, a sensitive data algorithm selector, a sensitive data separator, and a sensitive data storage policy device, wherein: the sensitive data identifier identifies sensitive data in the backup and has a plurality of different algorithms; the sensitive data algorithm selector selects the algorithm or algorithms used to identify sensitive data in the backup; the sensitive data separator separates the backed-up data into general data and sensitive data; and the sensitive data storage policy device selects whether to store the sensitive data.
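The four components of Fig. 1 can be wired together as in the following minimal sketch. It assumes that a backed-up record is a mapping from field names to string values and that each recognition algorithm is a simple predicate on a value; the class and function names, the two example patterns, and the record layout are all hypothetical and only illustrate the data flow (identifier, then selector, then separator, then storage policy device).

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: a recognizer returns True if a value looks sensitive.
Recognizer = Callable[[str], bool]


@dataclass
class SensitiveDataIdentifier:
    algorithms: dict[str, Recognizer]  # built-in algorithms, keyed by name


@dataclass
class SensitiveDataAlgorithmSelector:
    chosen: list[str]  # names of the algorithms to apply

    def select(self, identifier: SensitiveDataIdentifier) -> list[Recognizer]:
        return [identifier.algorithms[name] for name in self.chosen]


class SensitiveDataSeparator:
    def separate(self, record: dict[str, str], recognizers: list[Recognizer]):
        """Split one record (field -> value) into general and sensitive fields."""
        general, sensitive = {}, {}
        for key, value in record.items():
            target = sensitive if any(r(value) for r in recognizers) else general
            target[key] = value
        return general, sensitive


@dataclass
class SensitiveDataStoragePolicy:
    keep_sensitive: bool  # the user's choice

    def apply(self, general: dict, sensitive: dict) -> dict:
        # General data are always stored; sensitive data only if the user keeps them.
        return {"general": general, "sensitive": sensitive if self.keep_sensitive else None}


def backup_record(record, identifier, selector, separator, policy):
    recognizers = selector.select(identifier)
    general, sensitive = separator.separate(record, recognizers)
    return policy.apply(general, sensitive)


# Two illustrative recognizers (rough patterns, not production-grade validation).
identifier = SensitiveDataIdentifier({
    "email": lambda v: re.fullmatch(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", v) is not None,
    "cn_mobile": lambda v: re.fullmatch(r"1[3-9]\d{9}", v) is not None,
})

record = {"user_id": "u42", "email": "li@example.com", "phone": "13812345678", "note": "renewal"}
selector = SensitiveDataAlgorithmSelector(["email", "cn_mobile"])
print(backup_record(record, identifier, selector, SensitiveDataSeparator(),
                    SensitiveDataStoragePolicy(keep_sensitive=False)))
# {'general': {'user_id': 'u42', 'note': 'renewal'}, 'sensitive': None}
```

Keeping each component behind its own small interface mirrors the structure in Fig. 1: the selector can change which algorithms run without touching the separator, and the storage decision is made only at the last step.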
Sensitive data are identified by the sensitive data identifier, which has several built-in algorithms, specifically:
Algorithm 1 automatically identifies and separates built-in types of sensitive data, including: email address, Chinese address, company name, organization name, Chinese personal name, mobile phone number, landline number, date, tax number, unified social credit code, ID card number, monetary amount, bank card number, permanent residence permit, Hong Kong and Macao travel permit, Chinese passport, postal code, military officer ID, IP address, MAC address, license plate number, vehicle frame number, integer, customer name, fund code, fund name, stock code, and stock name;
Algorithm 2 automatically identifies and separates sensitive data using custom types based on regular expressions, i.e. it identifies sensitive data precisely according to regular expressions set by the user;
Algorithm 3 uses custom types based on a feature dictionary, automatically identifying and separating sensitive data according to a feature dictionary set by the user, where a feature dictionary is a set of data values that meet extremely strict conditions;
Algorithm 4 uses mixed custom types: several single rules set by the user are combined to identify multiple segments of sensitive data stored in one data unit;
and Algorithm 5 uses custom types based on user-defined functions: identification and separation rules for sensitive data can be developed and customized to the user's actual needs.
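The five algorithm types can be illustrated with the sketch below. It assumes every rule maps one data unit (a string) to a list of `(start, end, label)` matches, which is what lets the mixed type of Algorithm 4 combine several single rules over the same unit. Only three of the built-in types of Algorithm 1 are shown: an email pattern, a rough mainland mobile-number pattern, and the 18-character resident ID number with its standard check character (ISO 7064 MOD 11-2 weights). All function names, the `employee_ids` rule, and the sample text are hypothetical.

```python
import re
from typing import Callable, Iterable

Match = tuple[int, int, str]            # (start, end, label) within one data unit
Rule = Callable[[str], list[Match]]

# Algorithm 1 (built-in types): only three of the listed types are sketched here.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CN_MOBILE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")
CN_ID_CARD = re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)")
_ID_WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
_ID_CHECK = "10X98765432"


def id_checksum_ok(number: str) -> bool:
    """Check character of an 18-character resident ID number (ISO 7064 MOD 11-2)."""
    total = sum(int(d) * w for d, w in zip(number[:17], _ID_WEIGHTS))
    return _ID_CHECK[total % 11] == number[17].upper()


def builtin(text: str) -> list[Match]:
    found = [(m.start(), m.end(), "email") for m in EMAIL.finditer(text)]
    found += [(m.start(), m.end(), "mobile") for m in CN_MOBILE.finditer(text)]
    found += [(m.start(), m.end(), "id_card") for m in CN_ID_CARD.finditer(text)
              if id_checksum_ok(m.group())]  # drop 18-digit runs that fail the checksum
    return found


# Algorithm 2: a custom type defined by a user-supplied regular expression.
def regex_type(pattern: str, label: str) -> Rule:
    compiled = re.compile(pattern)
    return lambda text: [(m.start(), m.end(), label) for m in compiled.finditer(text)]


# Algorithm 3: a custom type defined by a user-supplied feature dictionary of exact terms.
def dictionary_type(terms: Iterable[str], label: str) -> Rule:
    # Longest terms first, so the longer entry wins when two entries share a prefix.
    ordered = sorted(terms, key=len, reverse=True)
    return regex_type("|".join(re.escape(t) for t in ordered), label)


# Algorithm 4: a mixed type that combines several single rules over one data unit.
def mixed_type(*rules: Rule) -> Rule:
    return lambda text: sorted(m for rule in rules for m in rule(text))


# Algorithm 5: a custom type backed by an arbitrary user function returning (start, end) spans.
def custom_type(fn: Callable[[str], list[tuple[int, int]]], label: str) -> Rule:
    return lambda text: [(start, end, label) for start, end in fn(text)]


def employee_ids(text: str) -> list[tuple[int, int]]:
    """Hypothetical user function: words of the form EMP followed by digits."""
    spans, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        pos = start + len(word)
        if word.startswith("EMP") and word[3:].isdigit():
            spans.append((start, pos))
    return spans


text = "Contact li@example.com or 13812345678, ID 11010519491231002X, client Acme Fund, EMP0042"
rule = mixed_type(builtin, dictionary_type(["Acme Fund"], "customer"),
                  custom_type(employee_ids, "employee"))
print([label for _, _, label in rule(text)])
# ['email', 'mobile', 'id_card', 'customer', 'employee']
```

Returning positions rather than a yes/no answer is a design choice of this sketch: the separator needs to know exactly which segments of a data unit to split off, which matters when one unit holds several pieces of sensitive data.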
The sensitive data algorithm selector selects one or more of these algorithms and applies them to the backup data, which may be structured, semi-structured, or unstructured.
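Applying a selected set of algorithms to the three kinds of data could look like the sketch below. It assumes that structured rows and semi-structured documents are available as Python dicts and lists (for example after `json.loads`) and unstructured data as plain strings; every string leaf is scanned under a JSONPath-like address. The registry names, patterns, and the merge-overlapping-spans behavior are assumptions made for illustration, not details taken from the application.

```python
import json
import re
from typing import Any, Callable, Iterator

Span = tuple[int, int]
Algorithm = Callable[[str], list[Span]]


def _regex(pattern: str) -> Algorithm:
    compiled = re.compile(pattern)
    return lambda text: [m.span() for m in compiled.finditer(text)]


# Hypothetical registry; the names and patterns are illustrative only.
REGISTRY: dict[str, Algorithm] = {
    "email": _regex(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "cn_mobile": _regex(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "long_number": _regex(r"\d{6,}"),
}


def select(names: list[str]) -> Algorithm:
    """Combine the chosen algorithms; overlapping or touching spans are merged."""
    chosen = [REGISTRY[name] for name in names]

    def combined(text: str) -> list[Span]:
        merged: list[Span] = []
        for start, end in sorted(s for alg in chosen for s in alg(text)):
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
            else:
                merged.append((start, end))
        return merged

    return combined


def text_fields(data: Any, path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (path, text) for each string leaf of a row, a JSON document, or plain text."""
    if isinstance(data, dict):
        for key, value in data.items():
            yield from text_fields(value, f"{path}.{key}")
    elif isinstance(data, list):
        for i, value in enumerate(data):
            yield from text_fields(value, f"{path}[{i}]")
    elif isinstance(data, str):
        yield path, data
    # Other leaf types (numbers, booleans, None) are skipped in this sketch.


def scan(data: Any, names: list[str]) -> dict[str, list[Span]]:
    detect = select(names)
    return {path: spans for path, text in text_fields(data) if (spans := detect(text))}


row = {"id": "7", "phone": "13812345678"}                              # structured
doc = json.loads('{"user": {"contacts": ["li@example.com", "n/a"]}}')  # semi-structured
note = "call 13812345678 today"                                        # unstructured
print(scan(row, ["cn_mobile"]))                  # {'$.phone': [(0, 11)]}
print(scan(doc, ["email"]))                      # {'$.user.contacts[0]': [(0, 14)]}
print(scan(note, ["cn_mobile", "long_number"]))  # {'$': [(5, 16)]}
```

Merging spans means that when two selected algorithms flag the same segment (here the mobile number is also a long digit run), the segment is reported only once.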
The identified data are separated into general data and sensitive data. The sensitive data storage policy device determines whether the sensitive data are stored: when the general data are stored, the sensitive data can be stored with them or not stored at all, and in the latter case the sensitive data cannot leak through the backup.
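For unstructured text, one way to separate a data unit into a general part and a sensitive part is to cut out each sensitive span and leave a placeholder behind, as in this minimal sketch. The placeholder format `[[S0]]`, `[[S1]]`, … and the function names are hypothetical; the sketch assumes the spans have already been found (and merged, if they could overlap) and that the original text does not itself contain placeholder tokens.

```python
def separate_text(text: str, spans: list[tuple[int, int]]) -> tuple[str, dict[str, str]]:
    """Split text into a general part with placeholders and a sensitive part.

    Assumes the spans do not overlap (merge them first if they might).
    """
    general_parts: list[str] = []
    sensitive: dict[str, str] = {}
    last = 0
    for i, (start, end) in enumerate(sorted(spans)):
        token = f"[[S{i}]]"  # hypothetical placeholder format
        general_parts += [text[last:start], token]
        sensitive[token] = text[start:end]
        last = end
    general_parts.append(text[last:])
    return "".join(general_parts), sensitive


def restore_text(general: str, sensitive: dict[str, str]) -> str:
    """Rebuild the original text; only possible if the sensitive part was stored."""
    for token, value in sensitive.items():
        general = general.replace(token, value)
    return general


text = "call 13812345678 or mail li@example.com"
general, sensitive = separate_text(text, [(5, 16), (25, 39)])
print(general)    # call [[S0]] or mail [[S1]]
print(sensitive)  # {'[[S0]]': '13812345678', '[[S1]]': 'li@example.com'}
```

The placeholders keep the general part readable and show where data were removed; if the sensitive part is kept, `restore_text` reassembles the original, which corresponds to the "later recovery or use" case described for the storage policy device.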
Fig. 2 is a flow diagram of the sensitive data storage policy device, which decides whether the sensitive data are stored together with the general data. Requirements differ between users: some need to retain sensitive data for later recovery or use, while others prioritize data security and choose not to store sensitive data at all, which improves the security of the backup data. The choice is left to users, who can decide based on the circumstances of their own industry and business.
Finally, it should be emphasized that the embodiments described above with reference to the drawings do not limit this application; those skilled in the art can make various modifications or alterations without departing from the spirit and scope of the claims of this application. Other embodiments obtained by those skilled in the art from the technical solutions of this disclosure also fall within its scope of protection.
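The decision in Fig. 2 reduces to a single branch, sketched below under stated assumptions: the user's choice is modeled as a hypothetical `SensitivePolicy` enum (which could be read from a per-user configuration value such as `"discard"`), and the backup is modeled as a set of files named `general.json` and `sensitive.json`. Those names and the JSON format are illustrative only; the application does not specify a storage format.

```python
import json
from enum import Enum


class SensitivePolicy(Enum):
    STORE = "store"      # keep sensitive data for later recovery or use
    DISCARD = "discard"  # back up general data only


def plan_backup(general: dict, sensitive: dict, policy: SensitivePolicy) -> dict[str, str]:
    """Return the files to write (file name -> JSON text) under the user's policy."""
    files = {"general.json": json.dumps(general, ensure_ascii=False, sort_keys=True)}
    if policy is SensitivePolicy.STORE and sensitive:
        files["sensitive.json"] = json.dumps(sensitive, ensure_ascii=False, sort_keys=True)
    # Under DISCARD, nothing sensitive is written, so the backup cannot leak it.
    return files


# The policy could come from a per-user configuration value such as "discard".
policy = SensitivePolicy("discard")
print(sorted(plan_backup({"note": "renewal"}, {"phone": "13812345678"}, policy)))
# ['general.json']
```

General data are always written; only the sensitive part depends on the user's choice, which matches the flow where the selection right is handed to the user.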

Claims (5)

1. A method and device for identifying and separating sensitive data of a backup system, characterized by comprising a sensitive data identifier, a sensitive data algorithm selector, a sensitive data separator, and a sensitive data storage policy device, wherein:
the sensitive data identifier is used to identify sensitive data in the backup and has a plurality of different algorithms;
the sensitive data algorithm selector is used to select the algorithm or algorithms for identifying sensitive data in the backup;
the sensitive data separator is used to separate the backed-up data into general data and sensitive data;
and the sensitive data storage policy device is used to select whether to store the sensitive data.
2. The sensitive data identifier of claim 1, characterized in that the data identified and separated comprise structured data, semi-structured data, and unstructured data.
3. A method and device for identifying and separating sensitive data of a backup system, characterized in that the sensitive data identifier of the backup system has a plurality of different algorithms, one or more of which can be selected by the sensitive data algorithm selector.
4. A method and device for identifying and separating sensitive data of a backup system, characterized in that the sensitive data identification of the backup system has a plurality of different algorithms, specifically:
Algorithm 1 automatically identifies and separates built-in types of sensitive data, including: email address, Chinese address, company name, organization name, Chinese personal name, mobile phone number, landline number, date, tax number, unified social credit code, ID card number, monetary amount, bank card number, permanent residence permit, Hong Kong and Macao travel permit, Chinese passport, postal code, military officer ID, IP address, MAC address, license plate number, vehicle frame number, integer, customer name, fund code, fund name, stock code, and stock name;
Algorithm 2 automatically identifies and separates sensitive data using custom types based on regular expressions, i.e. it identifies sensitive data precisely according to regular expressions set by the user;
Algorithm 3 uses custom types based on a feature dictionary, automatically identifying and separating sensitive data according to a feature dictionary set by the user, where a feature dictionary is a set of data values that meet extremely strict conditions;
Algorithm 4 uses mixed custom types: several single rules set by the user are combined to identify multiple segments of sensitive data stored in one data unit;
and Algorithm 5 uses custom types based on user-defined functions: identification and separation rules for sensitive data can be developed and customized to the user's actual needs.
5. The sensitive data storage policy device of claim 1, characterized in that, when the general data are stored, the sensitive data can be selected either to be stored together with them or not to be stored.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011057703.XA CN112016127A (en) 2020-09-30 2020-09-30 Method and device for identifying and separating sensitive data of backup system

Publications (1)

Publication Number Publication Date
CN112016127A (en) 2020-12-01

Family

ID=73528115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011057703.XA Pending CN112016127A (en) 2020-09-30 2020-09-30 Method and device for identifying and separating sensitive data of backup system

Country Status (1)

Country Link
CN (1) CN112016127A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107526836A (en) * 2017-09-07 2017-12-29 山东省城市商业银行合作联盟有限公司 Bank's retail deposit business datum analysis system and method based on big data
CN109344258A (en) * 2018-11-28 2019-02-15 中国电子科技网络信息安全有限公司 A kind of intelligent self-adaptive sensitive data identifying system and method
CN109582861A (en) * 2018-10-29 2019-04-05 复旦大学 A kind of data-privacy information detecting system
CN109684469A (en) * 2018-12-13 2019-04-26 平安科技(深圳)有限公司 Filtering sensitive words method, apparatus, computer equipment and storage medium
CN111143884A (en) * 2019-12-31 2020-05-12 北京懿医云科技有限公司 Data desensitization method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201201