CN110147680B - Method for optimizing data extraction - Google Patents


Info

Publication number
CN110147680B
CN110147680B CN201910456202.XA CN201910456202A CN110147680B CN 110147680 B CN110147680 B CN 110147680B CN 201910456202 A CN201910456202 A CN 201910456202A CN 110147680 B CN110147680 B CN 110147680B
Authority
CN
China
Prior art keywords
data
desensitization
field
encryption
result set
Legal status
Active
Application number
CN201910456202.XA
Other languages
Chinese (zh)
Other versions
CN110147680A (en)
Inventor
王冠中
杨超
Current Assignee
Inspur Software Co Ltd
Original Assignee
Inspur Software Co Ltd
Priority date
2019-05-29
Filing date
2019-05-29
Publication date
2022-07-26
Application filed by Inspur Software Co Ltd
Priority to CN201910456202.XA
Publication of CN110147680A
Application granted
Publication of CN110147680B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a method for optimizing data extraction, which relates to the technical field of data processing. The technical solution comprises the following steps: collecting data; judging whether the collected data are non-private data or private data; if the collected data are non-private data, outputting them directly and generating a result set; if the collected data are private data, further judging whether the private data have an association requirement; if an association requirement exists, encrypting the private data by means of data encryption; if no association requirement exists, desensitizing the private data by means of data desensitization; and, after encryption or desensitization, outputting the data and generating the result set. The data in the result set can then be output externally. The invention can encrypt the collected private data while still collecting complete data, preventing private data from being output directly and thereby leaking user privacy.

Description

Method for optimizing data extraction
Technical Field
The invention relates to the technical field of data processing, in particular to a method for optimizing data extraction.
Background
A credit information platform uses large amounts of enterprise and personal data, and as the volume of collected data grows, some private data relating to personal privacy is inevitably collected. During collection and use, such data is generally extracted and passed directly to wherever it is needed, which easily leaks personal privacy.
With the development of network technology, the importance of protecting private data has become increasingly apparent. At present, credit information platforms usually desensitize private data. The desensitization operation typically provides methods for about ten common types of private data in total, such as ID card numbers, contact phone numbers, email addresses, and home addresses; if a project has special private data that needs desensitization, additional methods must be added. When encrypted fields are configured in the desensitization stage, the input and output of each field must be maintained one-to-one, so when the data volume is large and there are many fields, maintenance takes a long time.
Disclosure of Invention
The invention provides a method for optimizing data extraction to address the needs and shortcomings of the prior art.
The invention discloses a method for optimizing data extraction, and the technical solution adopted to solve the above technical problems is as follows:
A method for optimizing data extraction, the method comprising:
1) collecting data;
2) judging whether the collected data is non-private data or private data:
2a) if the collected data is non-private data, executing step 3);
2b) if the collected data is private data, further judging whether the private data has an association requirement:
2b-1) if an association requirement exists, encrypting the private data by means of data encryption and executing step 3);
2b-2) if no association requirement exists, desensitizing the private data by means of data desensitization and executing step 3);
3) outputting the data and generating a result set, the data in which can be output externally.
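The decision flow of steps 1) to 3) can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the patented implementation: the `CollectedItem` record, its `isPrivate` and `needsAssociation` flags, and the placeholder transforms passed to `extract` are hypothetical, because the text does not say how privacy or an association requirement is determined. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class ExtractionFlow {

    // Hypothetical description of one collected value; the patent does not define these flags.
    record CollectedItem(String value, boolean isPrivate, boolean needsAssociation) {}

    // Routes each collected item through step 2) and appends the output to the result set (step 3).
    static List<String> extract(List<CollectedItem> items,
                                UnaryOperator<String> encrypt,
                                UnaryOperator<String> desensitize) {
        List<String> resultSet = new ArrayList<>();
        for (CollectedItem item : items) {
            String out;
            if (!item.isPrivate()) {
                out = item.value();                     // 2a) non-private: output directly
            } else if (item.needsAssociation()) {
                out = encrypt.apply(item.value());      // 2b-1) association required: encrypt
            } else {
                out = desensitize.apply(item.value());  // 2b-2) no association: desensitize
            }
            resultSet.add(out);                         // 3) generate the result set
        }
        return resultSet;
    }

    public static void main(String[] args) {
        List<CollectedItem> items = List.of(
                new CollectedItem("Acme Ltd", false, false),
                new CollectedItem("13812345678", true, true),
                new CollectedItem("13812345678", true, false));
        // Placeholder transforms, for illustration only.
        System.out.println(extract(items, v -> "ENC(" + v + ")", v -> "***"));
    }
}
```

Running `main` prints `[Acme Ltd, ENC(13812345678), ***]`. Passing the two transforms in as functions keeps the routing logic separate from whichever cipher and desensitization rules are actually chosen.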
Specifically, the process of encrypting the private data includes:
reading all field information of the private data through JAVA;
screening out the fields to be encrypted according to the passed-in parameters;
encrypting the fields to be encrypted with the SM4 national cryptographic symmetric algorithm;
adding the encrypted fields to the specified fields of the result set;
outputting the data contained in the processed result set externally.
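The field-level encryption steps can be sketched as follows. Several assumptions are made for illustration: a row is a `Map` from field name to value, the passed-in parameter is a `Set` of field names, encrypted values are written back under the same field name, and AES-GCM from the JDK stands in for SM4, because the standard JDK does not ship SM4 (an SM4-capable provider such as Bouncy Castle would be needed in practice). The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class FieldEncryptor {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_BYTES = 12;   // standard GCM nonce length
    private static final int TAG_BITS = 128;  // GCM authentication tag length

    // Stand-in for SM4: AES-GCM is used because the JDK provides AES but not SM4.
    static String encrypt(String plain, SecretKey key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            // Prepend the IV so the value can be decrypted later.
            byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    static String decrypt(String encoded, SecretKey key) {
        try {
            byte[] all = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, all, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(all, IV_BYTES, all.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    // Reads every field of a row, encrypts only the fields named in the passed-in parameter,
    // and copies the others unchanged into the result-set row.
    static Map<String, String> process(Map<String, String> row, Set<String> fieldsToEncrypt, SecretKey key) {
        Map<String, String> resultRow = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : row.entrySet()) {
            String value = field.getValue();
            resultRow.put(field.getKey(), fieldsToEncrypt.contains(field.getKey()) ? encrypt(value, key) : value);
        }
        return resultRow;
    }

    public static void main(String[] args) {
        byte[] keyBytes = new byte[16];
        RANDOM.nextBytes(keyBytes);
        SecretKey key = new SecretKeySpec(keyBytes, "AES");
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", "Zhang San");
        row.put("idCard", "110101199001011234");
        Map<String, String> out = process(row, Set.of("idCard"), key);
        System.out.println(out.get("name"));                  // copied unchanged
        System.out.println(decrypt(out.get("idCard"), key));  // round-trips to the original value
    }
}
```

Because the IV is random, equal plaintexts produce different ciphertexts. If the association requirement were meant to allow joins on the ciphertext itself, a deterministic scheme would be needed instead; the text does not say which is intended.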
Specifically, when the SM4 national cryptographic symmetric algorithm is used to encrypt the fields to be encrypted:
the encryption key supports two modes, a fixed key and a user-defined key;
the fixed key is stored in a designated path;
user-defined keys are applied to specific types of private data.
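One possible way to realise the two key modes is sketched below. The file layout (a Base64-encoded key in a single file at the designated path), the lookup of user-defined keys by data-type name, and the class name `KeyResolver` are hypothetical; the text only states that the fixed key is stored in a designated path and that user-defined keys apply to specific data types. The sketch only resolves a 128-bit key labelled "SM4" and performs no encryption itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.Map;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

public class KeyResolver {
    private static final int SM4_KEY_BYTES = 16;   // SM4 uses a 128-bit key

    private final Path fixedKeyPath;               // designated path holding the Base64-encoded fixed key
    private final Map<String, byte[]> customKeys;  // user-defined keys indexed by private-data type

    KeyResolver(Path fixedKeyPath, Map<String, byte[]> customKeys) {
        this.fixedKeyPath = fixedKeyPath;
        this.customKeys = Map.copyOf(customKeys);
    }

    // Returns the user-defined key for this data type if one exists, otherwise the fixed key.
    SecretKey keyFor(String dataType) {
        byte[] raw = customKeys.get(dataType);
        if (raw == null) {
            try {
                raw = Base64.getDecoder().decode(Files.readString(fixedKeyPath).trim());
            } catch (IOException e) {
                throw new UncheckedIOException("cannot read fixed key from " + fixedKeyPath, e);
            }
        }
        if (raw.length != SM4_KEY_BYTES) {
            throw new IllegalArgumentException("SM4 keys must be 16 bytes");
        }
        return new SecretKeySpec(raw, "SM4");
    }

    public static void main(String[] args) throws IOException {
        Path keyFile = Files.createTempFile("fixed-sm4", ".key");
        Files.writeString(keyFile, Base64.getEncoder().encodeToString(new byte[16]));
        KeyResolver resolver = new KeyResolver(keyFile, Map.of("bankCard", new byte[16]));
        System.out.println(resolver.keyFor("bankCard").getAlgorithm());   // user-defined key
        System.out.println(resolver.keyFor("phone").getEncoded().length); // fixed key read from the file
        Files.delete(keyFile);
    }
}
```

Running `main` prints `SM4` and then `16`. Reading the fixed key on each call keeps the sketch short; a real deployment would more likely cache it and protect the file's permissions.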
Optionally, the process of desensitizing the private data includes:
reading all field information of the private data through JAVA;
screening out the fields to be desensitized and their desensitization types;
performing, in JAVA, the corresponding desensitization operation on each field to be desensitized according to its desensitization type;
adding the desensitized fields to the specified fields of the result set;
outputting the data contained in the processed result set externally.
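The type-driven desensitization steps above can be sketched as follows, assuming a row is a `Map` of field names to values and that the screening result is a map from field name to desensitization type. The types `PHONE`, `ID_CARD`, `NAME` and `EMAIL` and their masking patterns are hypothetical examples, and only masking is shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class DesensitizationPipeline {

    // Keeps `head` leading and `tail` trailing characters and masks the rest with '*'.
    static String maskMiddle(String v, int head, int tail) {
        if (v.length() <= head + tail) {
            return "*".repeat(v.length());  // too short to keep anything safely: mask it all
        }
        return v.substring(0, head) + "*".repeat(v.length() - head - tail) + v.substring(v.length() - tail);
    }

    // Hypothetical desensitization types and the masking pattern chosen for each.
    enum DesensitizationType {
        PHONE(v -> maskMiddle(v, 3, 4)),
        ID_CARD(v -> maskMiddle(v, 6, 4)),
        NAME(v -> maskMiddle(v, 1, 0)),
        EMAIL(v -> {
            int at = v.indexOf('@');
            return at <= 0 ? maskMiddle(v, 1, 0) : maskMiddle(v.substring(0, at), 1, 0) + v.substring(at);
        });

        final UnaryOperator<String> rule;

        DesensitizationType(UnaryOperator<String> rule) {
            this.rule = rule;
        }
    }

    // Reads every field, applies the rule of the field's screened type, and copies other fields unchanged.
    static Map<String, String> desensitize(Map<String, String> row, Map<String, DesensitizationType> screened) {
        Map<String, String> resultRow = new LinkedHashMap<>();
        row.forEach((field, value) -> {
            DesensitizationType type = screened.get(field);
            resultRow.put(field, type == null ? value : type.rule.apply(value));
        });
        return resultRow;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", "Zhang San");
        row.put("phone", "13812345678");
        row.put("city", "Jinan");
        Map<String, DesensitizationType> screened =
                Map.of("name", DesensitizationType.NAME, "phone", DesensitizationType.PHONE);
        System.out.println(desensitize(row, screened));
    }
}
```

Running `main` prints `{name=Z********, phone=138****5678, city=Jinan}`. Attaching each rule to its type means supporting a new kind of private data only requires adding one enum constant.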
Further, during the desensitization of the private data:
when the fields to be desensitized are screened out, they are classified into desensitization types according to the screening results;
desensitization rules are defined according to the desensitization types, the rules including masking, deformation, substitution, randomization, format-preserving encryption, and data encryption algorithms;
in JAVA, at least one desensitization rule is selected according to the desensitization type of each field to be desensitized, and the desensitization operation is performed on that field.
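Selecting at least one rule per desensitization type can be sketched as below. The text names the rule families but not their behaviour, so these interpretations are assumptions: masking keeps the last four characters, deformation is read as generalization (rounding a whole number down to the nearest thousand), substitution draws from a hypothetical `SURROGATE_NAMES` dictionary, and randomization replaces digits and letters with random ones of the same kind while keeping the layout. Format-preserving encryption and data encryption algorithms are omitted because they require a cipher. Switch expressions require Java 14 or later.

```java
import java.util.List;
import java.util.Random;
import java.util.function.UnaryOperator;

public class DesensitizationRules {

    // Hypothetical dictionary of replacement values used by SUBSTITUTION.
    private static final List<String> SURROGATE_NAMES = List.of("Li Si", "Wang Wu", "Zhao Liu");

    // Four of the rule families named in the text (cipher-based rules omitted).
    enum Rule { MASKING, DEFORMATION, SUBSTITUTION, RANDOMIZATION }

    static UnaryOperator<String> implementation(Rule rule, Random rnd) {
        return switch (rule) {
            // Masking: hide everything except the last four characters.
            case MASKING -> v -> v.length() <= 4
                    ? "*".repeat(v.length())
                    : "*".repeat(v.length() - 4) + v.substring(v.length() - 4);
            // Deformation, read as generalization: round a whole number down to the nearest thousand.
            case DEFORMATION -> v -> v.matches("\\d{1,18}") ? String.valueOf(Long.parseLong(v) / 1000 * 1000) : v;
            // Substitution: replace the value with a surrogate chosen deterministically from the dictionary.
            case SUBSTITUTION -> v -> SURROGATE_NAMES.get(Math.floorMod(v.hashCode(), SURROGATE_NAMES.size()));
            // Randomization: swap each digit or ASCII letter for a random one of the same kind.
            case RANDOMIZATION -> v -> {
                StringBuilder sb = new StringBuilder(v.length());
                for (char c : v.toCharArray()) {
                    if (c >= '0' && c <= '9') sb.append((char) ('0' + rnd.nextInt(10)));
                    else if (c >= 'a' && c <= 'z') sb.append((char) ('a' + rnd.nextInt(26)));
                    else if (c >= 'A' && c <= 'Z') sb.append((char) ('A' + rnd.nextInt(26)));
                    else sb.append(c);  // separators such as '-' keep their position
                }
                return sb.toString();
            };
        };
    }

    // Applies the rules selected for a field's desensitization type, in order; at least one is required.
    static String apply(String value, List<Rule> selected, Random rnd) {
        if (selected.isEmpty()) {
            throw new IllegalArgumentException("select at least one desensitization rule");
        }
        String out = value;
        for (Rule r : selected) {
            out = implementation(r, rnd).apply(out);
        }
        return out;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        System.out.println(apply("13812345678", List.of(Rule.MASKING), rnd));
        System.out.println(apply("12345", List.of(Rule.DEFORMATION), rnd));
        System.out.println(apply("Zhang San", List.of(Rule.SUBSTITUTION), rnd));
        System.out.println(apply("6222-0202-1234", List.of(Rule.RANDOMIZATION, Rule.MASKING), rnd));
    }
}
```

The first two lines of output are `*******5678` and `12000`; the last two vary with the hash and the random generator. Chaining rules in a list mirrors the "at least one rule" wording, so a type can combine, for example, randomization followed by masking.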
Optionally, the process of desensitizing the private data includes:
locating the sensitive fields in given private data;
formulating generation rules for the sensitive fields and storing them in a sensitive field generation rule base;
reading the original private data; whenever a sensitive field is found, calling the sensitive field generation rule base and using the generation rule corresponding to that field to generate a new field different from the sensitive field; replacing the sensitive field with the new field according to a given transformation rule until all sensitive fields in the original private data have been replaced, forming desensitized fields; storing the desensitized fields in a result set; and outputting the data contained in the result set.
Further, the formulated sensitive field generation rules include masking, deformation, substitution, randomization, format-preserving encryption, and data encryption algorithms.
Specifically, the data can be collected from an external device connected through a universal interface, or entered manually.
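The rule-base variant of desensitization can be sketched as below, assuming sensitive fields are identified by name and that a generation rule is a function from the original value to its replacement. The class `SensitiveFieldRuleBase`, the sample rules, and the full-masking fallback are hypothetical, and because the text does not specify how the replacement transformation works, the generated value here simply takes the original's place in the result set.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class SensitiveFieldRuleBase {

    // Sensitive field generation rule base: field name -> rule that generates the replacement value.
    private final Map<String, UnaryOperator<String>> rules = new HashMap<>();

    // Formulates a generation rule for one located sensitive field and stores it in the rule base.
    void register(String sensitiveField, UnaryOperator<String> generationRule) {
        rules.put(sensitiveField, generationRule);
    }

    // Reads the original record; every field found in the rule base is replaced by a generated value.
    Map<String, String> desensitize(Map<String, String> original) {
        Map<String, String> resultSet = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : original.entrySet()) {
            UnaryOperator<String> rule = rules.get(field.getKey());
            String value = field.getValue();
            if (rule != null) {
                String generated = rule.apply(value);
                // The text requires a new value different from the sensitive one;
                // if the rule returns its input unchanged, fall back to full masking.
                value = generated.equals(value) ? "*".repeat(Math.max(1, value.length())) : generated;
            }
            resultSet.put(field.getKey(), value);
        }
        return resultSet;
    }

    public static void main(String[] args) {
        SensitiveFieldRuleBase ruleBase = new SensitiveFieldRuleBase();
        ruleBase.register("phone", v -> v.substring(0, 3) + "****" + v.substring(7));
        ruleBase.register("homeAddress", v -> v.split(" ")[0] + " ***");
        Map<String, String> record = new LinkedHashMap<>();
        record.put("phone", "13812345678");
        record.put("homeAddress", "Jinan Lixia District");
        record.put("industry", "Software");
        System.out.println(ruleBase.desensitize(record));
    }
}
```

Running `main` prints `{phone=138****5678, homeAddress=Jinan ***, industry=Software}`. Keeping the rules in a lookup table means new sensitive fields are handled by registering a rule rather than by maintaining each field's input and output by hand.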
Compared with the prior art, the method for optimizing data extraction has the following beneficial effects:
The method collects data, determines the type of the collected data, and further determines whether private data has an association requirement; according to that requirement, it either encrypts or desensitizes the private data before output. It can thus encrypt the collected private data while still collecting complete data, preventing private data from being output directly and leaking user privacy.
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
Detailed Description
To make the technical solutions, the technical problems to be solved, and the technical effects of the present invention clearer, the technical solutions of the present invention are described below clearly and completely with reference to specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the present invention.
Embodiment one:
With reference to FIG. 1, this embodiment provides a method for optimizing data extraction, the method comprising:
1) collecting data by a credit information platform;
2) judging whether the collected data is non-private data or private data:
2a) if the collected data is non-private data, executing step 3);
2b) if the collected data is private data, further judging whether the private data has an association requirement:
2b-1) if an association requirement exists, encrypting the private data by means of data encryption and executing step 3);
2b-2) if no association requirement exists, desensitizing the private data by means of data desensitization and executing step 3);
3) outputting the data and generating a result set, the data in which can be output externally.
In this embodiment, the process of encrypting the private data includes:
reading all field information of the private data through JAVA;
screening out the fields to be encrypted according to the passed-in parameters;
encrypting the fields to be encrypted with the SM4 national cryptographic symmetric algorithm;
adding the encrypted fields to the specified fields of the result set;
outputting the data contained in the processed result set externally.
In this embodiment, when the SM4 national cryptographic symmetric algorithm is used to encrypt the fields to be encrypted:
the encryption key supports two modes, a fixed key and a user-defined key;
the fixed key is stored in a designated path;
the user-defined key is applied to specific types of private data.
In this embodiment, the process of desensitizing the private data includes:
reading all field information of the private data through JAVA;
screening out the fields to be desensitized and their desensitization types;
performing, in JAVA, the corresponding desensitization operation on each field to be desensitized according to its desensitization type;
adding the desensitized fields to the specified fields of the result set;
outputting the data contained in the processed result set externally.
During the desensitization of the private data:
when the fields to be desensitized are screened out, they are classified into desensitization types according to the screening results;
desensitization rules are defined according to the desensitization types, the rules including masking, deformation, substitution, randomization, format-preserving encryption, and data encryption algorithms;
in JAVA, at least one desensitization rule is selected according to the desensitization type of each field to be desensitized, and the desensitization operation is performed on that field.
In this embodiment, the credit information platform may be connected to an external device through a general-purpose interface, so as to collect data.
Embodiment two:
With reference to FIG. 1, this embodiment provides a method for optimizing data extraction, the method comprising:
1) collecting data;
2) judging whether the collected data is non-private data or private data:
2a) if the collected data is non-private data, executing step 3);
2b) if the collected data is private data, further judging whether the private data has an association requirement:
2b-1) if an association requirement exists, encrypting the private data by means of data encryption and executing step 3);
2b-2) if no association requirement exists, desensitizing the private data by means of data desensitization and executing step 3);
3) outputting the data and generating a result set, the data in which can be output externally.
In this embodiment, the process of encrypting the private data includes:
reading all field information of the private data through JAVA;
screening out the fields to be encrypted according to the passed-in parameters;
encrypting the fields to be encrypted with the SM4 national cryptographic symmetric algorithm;
adding the encrypted fields to the specified fields of the result set;
outputting the data contained in the processed result set externally.
In this embodiment, when the SM4 national cryptographic symmetric algorithm is used to encrypt the fields to be encrypted:
the encryption key supports two modes, a fixed key and a user-defined key;
the fixed key is stored in a designated path;
the user-defined key is applied to specific types of private data.
In this embodiment, the process of desensitizing the private data includes:
locating the sensitive fields in given private data;
formulating generation rules for the sensitive fields and storing them in a sensitive field generation rule base;
reading the original private data; whenever a sensitive field is found, calling the sensitive field generation rule base and using the generation rule corresponding to that field to generate a new field different from the sensitive field; replacing the sensitive field with the new field according to a given transformation rule until all sensitive fields in the original private data have been replaced, forming desensitized fields; storing the desensitized fields in a result set; and outputting the data contained in the result set.
The formulated sensitive field generation rules include masking, deformation, substitution, randomization, format-preserving encryption, and data encryption algorithms.
In this embodiment, the credit information platform not only connects to the external device through the universal interface to collect data, but also collects data through manual input.
In summary, the method for optimizing data extraction of the invention collects data, determines the type of the collected data, further determines whether private data has an association requirement, and, according to that requirement, outputs the collected data safely after encryption or desensitization.
The principle and embodiments of the present invention have been described in detail through specific examples, which are only intended to help understand the core technical content of the present invention and do not limit its scope of protection; the technical solution of the present invention is not limited to the above specific embodiments. Any improvements and modifications made by those skilled in the art on the basis of the above embodiments without departing from the principle of the present invention shall fall within the scope of protection of the present invention.

Claims (6)

1. A method for optimizing data extraction, the method comprising:
1) collecting data;
2) judging whether the collected data is non-private data or private data:
2a) if the collected data is non-private data, executing step 3);
2b) if the collected data is private data, further judging whether the private data has an association requirement:
2b-1) if an association requirement exists, encrypting the private data by means of data encryption, the specific encryption process comprising: reading all field information of the private data through JAVA, screening out the fields to be encrypted according to the passed-in parameters, encrypting those fields with the SM4 national cryptographic symmetric algorithm, adding the encrypted fields to the specified fields of the result set, and outputting the data contained in the processed result set; and executing step 3);
2b-2) if no association requirement exists, desensitizing the private data by means of data desensitization, the specific desensitization process comprising: reading all field information of the private data through JAVA, screening out the fields to be desensitized and their desensitization types, performing in JAVA the corresponding desensitization operation on those fields according to the desensitization types, adding the desensitized fields to the specified fields of the result set, and outputting the data contained in the processed result set externally; and executing step 3);
3) outputting the data and generating a result set, the data in which can be output externally.
2. The method for optimizing data extraction as claimed in claim 1, wherein, when the SM4 algorithm is used to encrypt the fields to be encrypted:
the encryption key supports two modes, a fixed key and a user-defined key;
the fixed key is stored in a designated path;
the user-defined key is applied to specific types of private data.
3. The method for optimizing data extraction as claimed in claim 1, wherein, during the desensitization of the private data:
when the fields to be desensitized are screened out, they are classified into desensitization types according to the screening results;
desensitization rules are defined according to the desensitization types, the rules including masking, deformation, substitution, randomization, format-preserving encryption, and data encryption algorithms;
in JAVA, at least one desensitization rule is selected according to the desensitization type of each field to be desensitized, and the desensitization operation is performed on that field.
4. The method for optimizing data extraction as claimed in claim 1, wherein the process of desensitizing the private data includes:
locating the sensitive fields in given private data;
formulating generation rules for the sensitive fields and storing them in a sensitive field generation rule base;
reading the original private data; whenever a sensitive field is found, calling the sensitive field generation rule base and using the generation rule corresponding to that field to generate a new field different from the sensitive field; replacing the sensitive field with the new field according to a given transformation rule until all sensitive fields in the original private data have been replaced, forming desensitized fields; storing the desensitized fields in a result set; and outputting the data contained in the result set.
5. The method for optimizing data extraction as claimed in claim 4, wherein the formulated sensitive field generation rules include masking, deformation, substitution, randomization, format-preserving encryption, and data encryption algorithms.
6. The method for optimizing data extraction as claimed in claim 1, wherein the data can be collected from an external device connected through a universal interface, or entered manually.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910456202.XA CN110147680B (en) 2019-05-29 2019-05-29 Method for optimizing data extraction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910456202.XA CN110147680B (en) 2019-05-29 2019-05-29 Method for optimizing data extraction

Publications (2)

Publication Number Publication Date
CN110147680A 2019-08-20
CN110147680B 2022-07-26

Family

ID=67593715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910456202.XA Active CN110147680B (en) 2019-05-29 2019-05-29 Method for optimizing data extraction

Country Status (1)

Country Link
CN (1) CN110147680B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203145A (en) * 2016-08-04 2016-12-07 北京网智天元科技股份有限公司 Data desensitization method and relevant device
CN108418676A (en) * 2018-01-26 2018-08-17 山东超越数控电子股份有限公司 A kind of data desensitization method based on permission
CN109614816A (en) * 2018-11-19 2019-04-12 平安科技(深圳)有限公司 Data desensitization method, device and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9467279B2 (en) * 2014-09-26 2016-10-11 Intel Corporation Instructions and logic to provide SIMD SM4 cryptographic block cipher functionality


Also Published As

Publication number Publication date
CN110147680A (en) 2019-08-20

Similar Documents

Publication Publication Date Title
CN110933063B (en) Data encryption method, data decryption method and equipment
Anwar et al. Forensic SIM card cloning using authentication algorithm
CN109698884B (en) Fraud call identification method and system
EP3220573A1 (en) Method and system for controlling encryption of information and analyzing information as well as terminal
CN102387343A (en) Terminal device, server, data processing system, data processing method, and program
CN109271798A (en) Sensitive data processing method and system
WO2020233014A1 (en) Message sending method and apparatus, and computer device and storage medium
CN106101092A (en) A kind of information evaluation processing method and first instance
CN112529586B (en) Transaction information management method, device, equipment and storage medium
CN112039902A (en) Data encryption method and device
CN113836578A (en) Method and system for maintaining security of sensitive data of big data
CN112287371B (en) Method and device for storing industrial data and computer equipment
CN110147680B (en) Method for optimizing data extraction
Mudgal et al. Application of genetic algorithm in cryptanalysis of mono-alphabetic substitution cipher
CN116861477A (en) Data processing method, system, terminal and storage medium based on privacy protection
CN115422579A (en) Data encryption storage and query method and system after storage
CN102768671B (en) File processing method and system
CN110990848A (en) Sensitive word encryption method and device based on hive data warehouse and storage medium
CN113946862A (en) Data processing method, device and equipment and readable storage medium
CN115292729A (en) Privacy-protecting multi-party data processing method, device and equipment
CN113674083A (en) Internet financial platform credit risk monitoring method, device and computer system
CN103902921A (en) File encryption method and system
CN108632228B (en) Decision engine scheduling method and system
CN111914271B (en) Privacy protection system and method for big data release
Gardazi et al. Email system architecture for HITECH compliance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 271000 Langchao Science and Technology Park, 527 Dongyue Street, Tai'an City, Shandong Province

Applicant after: INSPUR SOFTWARE Co.,Ltd.

Address before: No. 1036 Langchao Road, Shandong High-tech Zone, Ji'nan, Shandong

Applicant before: INSPUR SOFTWARE Co.,Ltd.

GR01 Patent grant