CN115935421B - Data product release method, system and storage medium - Google Patents

Info

Publication number
CN115935421B
Authority
CN
China
Prior art keywords
data
product
distributed
information
compliance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211664319.5A
Other languages
Chinese (zh)
Other versions
CN115935421A (en)
Inventor
顾逸圣
刘汪根
陆懿庭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Transwarp Technology Shanghai Co Ltd
Original Assignee
Transwarp Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Transwarp Technology Shanghai Co Ltd
Priority to CN202211664319.5A
Publication of CN115935421A
Application granted
Publication of CN115935421B
Active
Anticipated expiration

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Storage Device Security (AREA)

Abstract

The invention discloses a data product release method, a system, and a storage medium. The method comprises: performing static desensitization and security policy configuration on data to be released according to the acquired metadata information of that data and the acquired data product compliance requirements, thereby determining and storing compliance data to be released; determining data communication information according to the storage mode of the compliance data to be released, and determining version information according to the number of times the compliance data to be released has been modified; packaging the metadata information, security policy, data communication information, and version information into a data product to be released, and defining a consumption mode of the data product to be released according to the data communication information; and determining a product number of the data product to be released according to the metadata information, security policy, data communication information, version information, and consumption mode, and releasing the numbered data product to a data market in the data product release system. The technical scheme provided by the embodiments of the invention ensures compliance throughout the data transaction process.

Description

Data product release method, system and storage medium
Technical Field
The present invention relates to the field of data security technologies, and in particular, to a method, a system, and a storage medium for publishing a data product.
Background
Today, data is regarded as a new factor of production and has become a key element in the high-quality development and comprehensive digitization of the economy and society. Data resources are increasingly a production factor and strategic asset of human society, and the opening and circulation of data are the precondition and foundation for realizing its value. Under the relevant laws and regulations on information and data security protection, how to open and share data safely, so that the data elements of each industry can realize their value, has gradually become a problem to be solved.
In data transactions, national laws and regulations impose strict security compliance requirements on every step. Current data transaction and circulation schemes, however, each focus on a single isolated process in the data life cycle. Examples include adding noise to the personal information in the traded data, and using blockchain smart contract technology to record the application, approval, and transaction processes so that information published on the chain can be accessed in real time and frozen when a risk is found.
However, the prior art does not systematically constrain the compliance of data products in accordance with national laws and regulations, and improvements to individual isolated processes are difficult to combine into a better overall result. For example, if the personal information in the data is processed with differential privacy, the usability of that information is greatly limited and user-defined query analysis of the data cannot be supported; blockchain technology, for its part, cannot meet the requirements of high throughput, low storage cost, and easy deployment. The existing data transaction process is therefore subject to many limitations, and a data product obtained by improving only an isolated process may pose security risks to data transactions because of insufficient security checks.
Disclosure of Invention
The invention provides a data product release method, a system and a storage medium, which ensure the compliance of data release and the safety in the data release process and reduce the risk of data circulation.
In a first aspect, an embodiment of the present invention provides a data product release method, applied to a data providing end of a data product release system, where the method includes:
extracting data information from the acquired data to be released, and determining the corresponding metadata information;
performing static desensitization and security policy configuration on the data to be released according to the metadata information and the acquired data product compliance requirements, and determining and storing the compliance data to be released;
determining data communication information according to the storage mode of the compliance data to be released, and determining version information according to the number of modifications of the compliance data to be released;
packaging the metadata information, security policy, data communication information, and version information into a data product to be released, and defining a consumption mode of the data product to be released according to the data communication information;
and determining a product number of the data product to be released according to the metadata information, security policy, data communication information, version information, and consumption mode, and releasing the numbered data product to a data market in the data product release system.
In a second aspect, an embodiment of the present invention further provides a data product release method, applied to a data market of a data product release system, where the method includes:
after receiving a data product application request from a data demand end of the data product release system, checking the data consumption environment of the data demand end;
if the data consumption environment is consistent with the consumption mode of the data product corresponding to the data product application request, and includes an encrypted database and a development tool corresponding to that consumption mode, pushing the security policy of the data product to the encrypted database;
and sending the connection information of the encrypted database and the data communication information of the data product to the data providing end corresponding to the data product, so that the encrypted database receives the compliance data corresponding to the data product.
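As an illustration only, the market-side steps of the second aspect might be sketched as follows. The class `ConsumptionEnvironment`, the function `handle_application`, the dictionary keys, and the example URLs are all hypothetical names chosen for this sketch; the patent does not prescribe any particular data structures or interfaces.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConsumptionEnvironment:
    """Hypothetical description of a data demand end's consumption environment."""
    supported_modes: set
    encrypted_db_url: Optional[str] = None               # None means no encrypted database
    dev_tools: dict = field(default_factory=dict)        # consumption mode -> tool name
    installed_policies: list = field(default_factory=list)


def handle_application(product: dict, env: ConsumptionEnvironment,
                       provider_inbox: list) -> bool:
    """Check the environment, push the policy, then notify the data providing end."""
    mode = product["consumption_mode"]
    environment_ok = (
        mode in env.supported_modes            # consistent with the consumption mode
        and env.encrypted_db_url is not None   # has an encrypted database
        and mode in env.dev_tools              # has a matching development tool
    )
    if not environment_ok:
        return False
    # Push the security policy to the demand end's encrypted database.
    env.installed_policies.append(product["security_policy"])
    # Send both pieces of connection information to the providing end only;
    # the data communication information is never handed to the demand end.
    provider_inbox.append({
        "encrypted_db": env.encrypted_db_url,
        "communication_info": product["communication_info"][mode],
    })
    return True
```

In this sketch the data providing end would read `provider_inbox` and push the compliance data to the encrypted database itself, which keeps the storage location hidden from the data demand end.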
In a third aspect, an embodiment of the present invention further provides a data product release system, including at least one data providing end, at least one data demand end, and a data market, where the data providing ends and data demand ends are located in mutually isolated domains;
the data providing end is configured to perform data information extraction, data compliance processing, data communication information determination, version information determination, and consumption mode determination on the acquired data to be released, generate a data product to be released that carries a product number, and release the data product to the data market;
the data market is configured to, on receiving a data product application request from a data demand end, determine the target data product according to the request and check the data consumption environment of the data demand end against the consumption mode corresponding to the target data product;
the data demand end is configured to, when it has the data consumption environment, receive the security policy of the target data product pushed by the data market and store the security policy in the encrypted database;
the data market is further configured to send the connection information of the encrypted database and the data communication information of the target data product to the data providing end;
the data providing end is further configured to push the compliance data corresponding to the target data product to the encrypted database according to the connection information of the encrypted database and the data communication information of the target data product;
wherein the data product to be released comprises the metadata information, security policy, data communication information, and version information, and the product number comprises the metadata information, security policy, data communication information, version information, and consumption mode.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing computer instructions that cause a processor to execute the data product release method according to any embodiment of the present invention.
According to the data product release method, system, and storage medium provided by the embodiments of the invention, data information is extracted from the acquired data to be released and the corresponding metadata information is determined; static desensitization and security policy configuration are performed on the data to be released according to the metadata information and the acquired data product compliance requirements, and the compliance data to be released is determined and stored; data communication information is determined according to the storage mode of the compliance data to be released, and version information is determined according to its number of modifications; the metadata information, security policy, data communication information, and version information are packaged into a data product to be released, and a consumption mode of the data product is defined according to the data communication information; and a product number of the data product to be released is determined according to the metadata information, security policy, data communication information, version information, and consumption mode, and the numbered data product is released to a data market in the data product release system.
With this technical scheme, after the data source has been developed at the data providing end to obtain the data to be released, basic information is extracted from that data, and static desensitization and security policy configuration are performed on it according to the extracted basic information and the data product compliance requirements obtained in advance. On the basis that the configured data to be released meets the requirements of national laws and regulations, corresponding security policies can thus be configured dynamically for different types of data demand ends. The data product to be released is then numbered by combining the data communication information determined from the storage mode of the data to be released, the version information determined from the number of modifications, the consumption mode defined from the data communication information, and so on, which completes the release of the data product to the data market. The released data product therefore fully contains the parameters needed to keep the data compliant over the whole data life cycle, which solves the problem that security controls on data circulation are isolated from one another and the security of a released data product is difficult to guarantee. In addition, by sending to the data demand end that requests a data product the corresponding compliance data, which has been statically desensitized and carries a security policy, the data providing end gains finer control over how the data demand end uses the data. This ensures compliance throughout the data transaction process and avoids security risks when the released data product is used.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a data product release method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a data product release method according to a second embodiment of the present invention;
FIG. 3 is a schematic flowchart of determining statically desensitized data by performing de-identification processing on the data to be released in the second embodiment of the present invention;
FIG. 4 is a flowchart of a data product release method according to a third embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a data product release system according to a fourth embodiment of the present invention;
FIG. 6 is an example structure diagram of a data product release system according to the fourth embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
FIG. 1 is a flowchart of a data product release method according to the first embodiment of the present invention. This embodiment is applicable to the case in which a data provider builds a data product and completes its release. The method may be applied to a data providing end of a data product release system, and the data providing end may be implemented in software and/or hardware. The data providing end may be deployed in a private domain logically isolated from the public network, or in any other environment that can ensure the privacy of the data inside the data providing end; the embodiment of the present invention does not limit this.
As shown in FIG. 1, the data product release method provided in the first embodiment of the present invention specifically includes the following steps:
S101, extracting data information from the acquired data to be released, and determining the corresponding metadata information.
In this embodiment, the data providing end may be understood as the entity in the data product release system that generates and releases data products and provides externally the actual data corresponding to those products. The data demand end may be understood as the entity in the data product release system that acquires the actual data of a data product and carries out development on it. A data product may be understood as the carrier through which the data providing end shares with the data demand end the basic information of the actual data, together with the manner in which it is provided and the associated requirements. The data to be released may be understood as data that the data providing end obtains by developing the acquired raw data, from which a data product can be generated and which, after processing, can be provided to the data demand end. The metadata information may be understood as information extracted from the data to be released that describes its characteristics. For example, it may be descriptive information such as database, table, and data field names, or items such as the file name, size, creation time, creator, column names of a database table, and sample data; it may also be used to describe the basic information of the data product corresponding to the data to be released.
Specifically, after the acquired raw data has been developed with a development tool in the data providing end, the data to be released, which can be used to generate a data product for external release, is obtained. Content describing the characteristics of the data to be released is then extracted by preset data identification rules or by manual annotation, yielding the metadata information corresponding to the data to be released.
The data product release system may be deployed as a private cloud. A single data providing end may create a logically isolated enterprise tenant on the cloud as its service execution environment. When generating data to be released, a database may be created directly under the tenant of the data providing end and used as the data source storing the raw data, or the raw data may be imported into the database of the data providing end with a data import tool inside or outside the tenant. A developer at the data providing end then develops the raw data in the database with a development tool provided in the data providing end, and the data obtained after development is determined to be the data to be released. Optionally, the development tool in the data providing end may be an Extract-Transform-Load (ETL) tool, a Structured Query Language (SQL) editor, or the like, which is not limited in this embodiment. The development tool may control data visibility by means of strongly isolated workspaces, so that different developers can, each in their own workspace, associate data sources located inside the data providing end or at a remote end with the development tool and carry out development.
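As a minimal sketch of the rule-based extraction in S101, the following assumes that the data to be released sits in a relational table and uses an in-memory SQLite database as a stand-in for the database of the data providing end. The names `TableMetadata` and `extract_metadata`, and the example table, are hypothetical and not taken from the patent.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Hypothetical container for the metadata information of one table."""
    table: str
    columns: list            # (column name, declared type) pairs
    row_count: int
    sample_rows: list = field(default_factory=list)


def extract_metadata(conn, table, sample_size=3):
    """Collect column names, row count, and a few sample rows for a table."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    columns = [(row[1], row[2]) for row in conn.execute(f'PRAGMA table_info("{table}")')]
    row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    samples = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (sample_size,)).fetchall()
    return TableMetadata(table, columns, row_count, samples)


# Stand-in for data developed at the data providing end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Alice", "13800000001"), (2, "Bob", "13800000002")],
)
meta = extract_metadata(conn, "customers")
```

The table name is interpolated directly into the SQL text here only because it comes from the sketch itself; a real extraction rule would need to validate identifiers before using them.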
S102, performing static desensitization and security policy configuration on the data to be released according to the metadata information and the acquired data product compliance requirements, and determining and storing the compliance data to be released.
In this embodiment, the data product compliance requirements may be understood as the conditions, set according to national laws and regulations or adapted to the actual situation, under which a released data product does not leak private information and satisfies sensitivity requirements, such as that a specific natural person cannot be identified without additional information. Static desensitization may be understood as a desensitization scheme that uses a preset model, a desensitization algorithm, or another processing method to mask or transform sensitive data and lower its sensitivity level; it is suitable for data governed by general desensitization rules, such as personal information. The security policy may be understood as a desensitization and protection policy, set according to national laws and regulations, industry guidelines, and the actual circumstances of different data demand ends, that governs how a data demand end accesses the data corresponding to a data product. The security policy can be adjusted dynamically for different parties, for example to deny access to, or desensitize at access time, fields of a certain security level or of a certain field type. The compliance data to be released may be understood as data that has undergone compliance processing, that can be stored at the data providing end as the actual data corresponding to a data product, and that can be provided to the data demand end.
Specifically, the metadata information is used to distinguish the sensitivity of the various items in the data to be released. The items that require static desensitization are determined according to the data product compliance requirements, and operations such as masking and transformation are applied to them to lower the sensitivity of the data that meets the static desensitization conditions. At the same time, security policies for the different applicable conditions are determined according to the data product compliance requirements and configured on the statically desensitized data, yielding the compliance data to be released, which is stored in the database of the data providing end.
In the embodiment of the present invention, static desensitization and security policy configuration are each performed on the data to be released according to the data product compliance requirements, so that the data is desensitized according to the different conditions that apply. This ensures the compliance and adaptability of the data corresponding to the released data product, strengthens the control the data providing end has over how the data demand end uses the data, and ensures that the data is used safely.
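The two parts of S102 can be illustrated with a small sketch. It assumes that static desensitization is a per-field masking or pseudonymization rule and that the security policy assigns each field a security level and each type of data demand end a maximum visible level. The rule set, the level numbers, the salt, and every function name below are assumptions made for illustration; the patent does not fix a particular algorithm.

```python
import hashlib


def mask_phone(value):
    """Keep the first 3 and last 4 characters and mask the rest."""
    if len(value) <= 7:
        return "*" * len(value)
    return value[:3] + "*" * (len(value) - 7) + value[-4:]


def pseudonymize(value, salt):
    """Replace a value with a truncated salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


# Hypothetical static desensitization rules, keyed by field name.
STATIC_RULES = {
    "phone": mask_phone,
    "name": lambda v: pseudonymize(v, "demo-salt"),
}


def static_desensitize(rows, rules):
    """Apply the matching rule to every field that has one."""
    return [
        {key: rules[key](val) if key in rules else val for key, val in row.items()}
        for row in rows
    ]


# Hypothetical security policy: field security levels plus the highest level
# that each type of data demand end is allowed to see.
SECURITY_POLICY = {
    "field_levels": {"id": 1, "name": 3, "phone": 3, "city": 1},
    "max_level_by_consumer": {"internal": 3, "partner": 2, "public": 1},
}


def visible_fields(policy, consumer_type):
    """Return the fields a given type of data demand end may access."""
    limit = policy["max_level_by_consumer"][consumer_type]
    return sorted(f for f, lvl in policy["field_levels"].items() if lvl <= limit)
```

Static rules are applied once before storage, whereas the policy is evaluated per requester at access time, which mirrors the distinction the text draws between static desensitization and a dynamically adjustable security policy.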
S103, determining data communication information according to a storage mode of the to-be-released compliance data, and determining version information according to the modification times of the to-be-released compliance data.
In this embodiment, the data communication information may be understood as the connection information describing the actual data that the data product points to. It records where that data is located in the data providing end and may take different forms depending on the delivery form, that is, it reflects how the compliance data to be released is stored in the data providing end under each delivery form. Note that the data communication information is not exposed to the data demand end and is invoked only after the delivery form has been determined; it is therefore treated as encrypted by default to prevent leakage. The version information may be understood as information recording the versions of the data product itself and the actual content corresponding to each version. Note that the version information is updated with every modification or parameter change made to the compliance data to be released.
Specifically, the compliance data to be released is stored in the database of the data providing end, and the way its storage location is defined and accessed differs according to the data delivery method, such as API delivery or federated learning delivery. The set of storage modes corresponding to the different delivery methods is determined to be the data communication information of the compliance data to be released. The version information of the current compliance data to be released is determined from the number of modifications made to it since its development was completed: the version is the first version once desensitization and security policy configuration have been completed for the first time, and the version information is incremented with each subsequent modification of the compliance data to be released.
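A possible way to keep track of both pieces of information in S103 is sketched below, under the assumption that the data communication information is a mapping from delivery method to storage location and that the version is simply the modification count plus one. The class `ComplianceDataset`, the `v<N>` version format, and the example locations are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ComplianceDataset:
    """Hypothetical record for one set of compliance data to be released."""
    name: str
    communication_info: dict = field(default_factory=dict)  # delivery method -> location
    modification_count: int = 0

    @property
    def version(self):
        # The first version exists once desensitization and policy
        # configuration are complete; each modification increments it.
        return f"v{self.modification_count + 1}"

    def register_delivery(self, delivery_method, location):
        """Record where the data is stored for one delivery method."""
        self.communication_info[delivery_method] = location

    def record_modification(self):
        """Count one modification or parameter change."""
        self.modification_count += 1


dataset = ComplianceDataset("customers_compliant")
dataset.register_delivery("api", "https://provider.example/api/customers")
dataset.register_delivery("federated_learning", "fl://provider.example/jobs/customers")
dataset.record_modification()
```

Because the data communication information is meant to stay hidden from the data demand end, a real record would store these locations encrypted rather than in plain text as in this sketch.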
S104, packaging the metadata information, security policy, data communication information, and version information into a data product to be released, and defining a consumption mode of the data product to be released according to the data communication information.
In this embodiment, the data product to be released may be understood as a carrier that has not yet been released to the public network for other entities to access; it contains the basic information of the compliance data stored at the data providing end, together with the manner in which that data is provided and the associated requirements. The consumption mode may be understood as the form in which the data corresponding to the data product is delivered from the data providing end to the data demand end.
Specifically, the metadata information, security policy, data communication information, and version information corresponding to the compliance data to be released are packaged into a set. This set is the data product to be released corresponding to that compliance data; it contains information introducing the content of the compliance data to be released and the ways in which it can be released. Because the data communication information of the data product to be released covers several different storage modes, release approval is required before the product is released to the data market. At this point, the personnel in the data providing end who take part in approving the sharing process register in the data market and enter their organizational relationships, and then, according to the importance of the data product to be released and the nature of each data demand end, define different consumption modes of the same data product for different data demand ends.
For example, the same data product to be released may be delivered in different forms. If every delivery form recorded in the data communication information were exposed as a consumption mode of that product, any data demand end applying for it could consume the corresponding data through the least secure consumption mode. Therefore, to ensure consumption security for different types of data demand ends, several different consumption modes may be defined for the same data product to be released according to the different delivery forms.
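The packaging step and the per-requester consumption modes can be sketched as follows. The sketch assumes that approvers supply a list of approved delivery forms for each type of data demand end and that a consumption mode is only valid if the data communication information actually contains that delivery form. The function names and consumer categories are hypothetical.

```python
def package_product(metadata, security_policy, communication_info, version):
    """Bundle the four components into one data product to be released."""
    return {
        "metadata": metadata,
        "security_policy": security_policy,
        "communication_info": communication_info,
        "version": version,
    }


def define_consumption_modes(product, approved_modes_by_consumer):
    """Keep, per type of data demand end, only approved and available delivery forms."""
    available = set(product["communication_info"])
    return {
        consumer: [mode for mode in modes if mode in available]
        for consumer, modes in approved_modes_by_consumer.items()
    }


product = package_product(
    metadata={"table": "customers"},
    security_policy={"max_level": 2},
    communication_info={
        "api": "https://provider.example/api/customers",
        "federated_learning": "fl://provider.example/jobs/customers",
    },
    version="v1",
)
modes = define_consumption_modes(
    product,
    {
        "partner": ["federated_learning"],
        "internal": ["api", "federated_learning", "file_export"],
    },
)
```

Here a partner is limited to federated learning even though API delivery exists, which is the situation the preceding example describes: the least secure delivery form is not automatically offered to every applicant.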
S105, determining the product number of the data product to be distributed according to the metadata information, the security policy, the data communication information, the version information and the consumption mode, and distributing the numbered data product to be distributed to a data market in the data product distribution system.
In this embodiment, the product number may be understood as a number that uniquely identifies the data product to be released in the data market and at the data providing end. The data market may be understood as an entity deployed in the public network that hosts multiple data products for access by the data demand ends connected to it.
Specifically, according to preset product numbering rules, numbers are defined for the different types of metadata information, the different types of security policy, the different storage modes corresponding to the data communication information, the version number, and the different consumption modes, yielding a product number that uniquely identifies the data product to be released. The product number can serve as a digest of the data product to be released that summarizes its basic information, so that data demand ends can search for the product once it has been released in the data market. The data product to be released is labeled with the product number, that is, once the product number and the data product are in one-to-one correspondence, the numbered data product is released to the data market in the data product release system so that data demand ends connected to the data market can access it.
For example, let CI denote the data communication information of the data product to be released, PS the security policy, V the version information, CM the consumption mode, and AI the metadata information. The product number PC may then be expressed as:
PC = digest(CI, AI, PS, V, CM).
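One way to realize such a digest, offered purely as a sketch, is to serialize the five components into a canonical JSON string and hash it. The patent does not name a digest algorithm or a serialization format; SHA-256 and JSON with sorted keys are assumptions made here, and the function name `product_number` is hypothetical.

```python
import hashlib
import json


def product_number(ci, ai, ps, v, cm):
    """Compute PC = digest(CI, AI, PS, V, CM) as a SHA-256 hex string.

    Sorting keys and fixing separators gives a canonical serialization,
    so equal inputs produce the same product number.
    """
    payload = json.dumps(
        {"CI": ci, "AI": ai, "PS": ps, "V": v, "CM": cm},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


pc = product_number(
    ci={"api": "https://provider.example/api/customers"},
    ai={"table": "customers", "columns": ["id", "name", "phone"]},
    ps={"max_level": 2},
    v="v1",
    cm=["api"],
)
```

With this construction, any change to a component, such as a new version, yields a different product number, while the key order inside each component does not affect the result.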
according to the technical scheme, the corresponding metadata information is determined by extracting the data information of the acquired data to be distributed; according to the metadata information and the acquired compliance requirements of the data products, static desensitization and security policy configuration are carried out on the data to be released, and the compliance data to be released is determined and stored; determining data communication information according to a storage mode of the compliance data to be released, and determining version information according to the modification times of the compliance data to be released; packaging and determining metadata information, security policy, data communication information and version information as data products to be distributed, and defining a consumption mode of the data products to be distributed according to the data communication information; and determining the product number of the data product to be distributed according to the metadata information, the security policy, the data communication information, the version information and the consumption mode, and distributing the numbered data product to be distributed to a data market in the data product distribution system. 
By adopting this technical scheme, after the data source has been developed and obtained at the data providing end, basic information is extracted from the data to be released, and static desensitization and security policy configuration are performed on the data to be released according to the extracted basic information and the data product compliance requirements obtained in advance. In this way, on the basis that the configured data to be released meets the requirements of national laws and regulations, corresponding security policies are dynamically configured for different types of data demand ends. The data product to be released is then numbered by combining the data communication information determined according to the storage mode of the data to be released, the version information determined according to the number of modifications, the consumption mode defined according to the data communication information, and so on, and the release of the data product to the data market is completed. The parameters that keep the data compliant throughout the whole data life cycle are thus fully contained in the released data product, which solves the problem that security control over data circulation is fragmented and the security of released data products is difficult to guarantee. By sending the statically desensitized compliance data, which corresponds to the data product and carries the security policy, to the data demand end requesting the data product, the data providing end gains finer control over how the data demand end uses the data, the compliance of the whole data transaction process is ensured, and the released data product is protected from security risks during use.
Example two
Fig. 2 is a flowchart of a data product publishing method provided by a second embodiment of the present invention. The technical solution of this embodiment is further optimized on the basis of the alternative technical solutions described above. It specifies how the data type and security level of the data to be published are determined according to the metadata information and the data product compliance requirements, so that de-identification processing can be performed on the data to be published according to the determined data type and security level to achieve static desensitization, and it further specifies how a security policy is formulated for the data to be published according to the data product compliance requirements. It also specifies how the consumption mode of the data product to be distributed is defined according to different data communication information, so that compliance processing of the data product development process is more systematic, the data providing end gains finer control over how the data demand end uses the data, the compliance of the whole data transaction process is ensured, and the distributed data product is protected from security risks during use.
As shown in fig. 2, a data product publishing method provided in a second embodiment of the present invention specifically includes the following steps:
S201, extracting data information of the acquired data to be distributed, and determining corresponding metadata information.
S202, determining at least one data type of the data to be published according to the metadata information.
In this embodiment, the data type may be specifically understood as a preset, defined basic classification of data for different industries. Exemplary data types include name, gender, client name, face, ID, and the like. The data types determined for structured data to be distributed and for unstructured data to be distributed may differ, which is not limited in the embodiment of the present invention.
Specifically, according to the data basic information included in the metadata information corresponding to the data to be distributed, the data to be distributed is classified, and the data to be distributed belonging to the same classification is classified into the same data type.
S203, identifying the data to be distributed, of which the data type belongs to the preset personal information type.
Wherein the identification process includes a direct identifier identification process and a quasi-identifier identification process.
In this embodiment, the preset personal information type may be specifically understood as a type of information, preset according to the actual situation, that can point directly or indirectly to a specific natural person. A direct identifier may be specifically understood as an identifier marking data that contains personal information and can directly locate a natural person. A quasi-identifier may be specifically understood as an identifier marking data that contains personal information but cannot by itself directly locate a natural person.
Specifically, the data in the data to be distributed whose data type belongs to the preset personal information type is determined, and it is further determined whether that data type is one that can directly locate a natural person. If so, a direct identifier is added to the data corresponding to the data type; if not, a quasi-identifier is added to the data corresponding to the data type. Illustratively, data of the direct identifier type may be data such as a name or an identity card number, which directly indicates the identity of a natural person or narrows the identity down to a small group of natural persons; data of the quasi-identifier type may be data such as age, which cannot directly indicate the identity of a natural person or which points to a larger group of people.
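The marking step in S203 can be sketched as a two-stage lookup: first test whether a data type belongs to the preset personal information types, then test whether it directly locates a natural person. The type sets and the function name below are hypothetical; only name, identity card number and age are taken from the examples in this embodiment.

```python
# Hypothetical preset sets. The embodiment names only name and identity card
# number (direct identifiers) and age (quasi-identifier) as examples.
PERSONAL_TYPES = {"name", "id_card_number", "age"}
DIRECT_TYPES = {"name", "id_card_number"}


def mark_identifier(data_type):
    """Return 'direct', 'quasi', or None for non-personal data types."""
    if data_type not in PERSONAL_TYPES:
        return None  # not a preset personal information type: no identifier
    return "direct" if data_type in DIRECT_TYPES else "quasi"


marks = {t: mark_identifier(t) for t in ["name", "age", "order_total"]}
```

In this sketch `order_total` stands for any data type outside the preset personal information types, which receives no identifier and is therefore not passed to de-identification.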
S204, determining the security level of the data corresponding to each data type in the data to be issued according to each data type and the compliance requirement of the data product.
In this embodiment, the security level is specifically defined as a level predefined according to actual conditions or laws and regulations to indicate importance of data.
Specifically, security levels are defined in advance for the different data types according to the data product compliance requirements. After the data types of the data to be distributed have been determined, the security level defined for each data type is taken as the security level of the corresponding data in the data to be distributed. Furthermore, after the security levels have been matched according to the data types, the security level of particular data can be adjusted manually. The manually adjusted security level must still meet the requirements of laws and regulations, and a security level can only be adjusted upward, never downward.
In the embodiment of the invention, the security level of the data of each data type in the data to be distributed is determined by combining data type matching with manual adjustment, so that the determined security levels better fit the production requirements of actual data products. Because manual adjustment can only raise and never lower a security level, the security of processing data according to the security level is improved.
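The combination of type-based matching and upward-only manual adjustment can be sketched as follows. The level table and function names are hypothetical, and higher numbers are assumed to mean more sensitive data.

```python
# Hypothetical levels predefined per data type from the compliance requirements.
LEVEL_BY_TYPE = {"name": 2, "id_card_number": 3, "age": 1}


def matched_levels(data_types):
    """Match each data type to its predefined security level."""
    return {t: LEVEL_BY_TYPE[t] for t in data_types}


def adjust_level(levels, data_type, new_level):
    """Manually adjust one level; lowering is rejected, raising is allowed."""
    if new_level < levels[data_type]:
        raise ValueError("security level can only be adjusted upward")
    updated = dict(levels)
    updated[data_type] = new_level
    return updated


levels = matched_levels(["name", "age"])
raised = adjust_level(levels, "age", 2)

try:
    adjust_level(levels, "name", 1)
    lowering_rejected = False
except ValueError:
    lowering_rejected = True
```

The check against laws and regulations during manual adjustment is not modeled here; in practice it would be an additional validation inside `adjust_level`.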
It should be clear that there is no obvious sequential relationship between S203 and S204, and they may be executed simultaneously or according to different sequences.
S205, performing de-identification processing on the data to be issued, and determining static desensitization data.
In this embodiment, the de-identification process is specifically understood as a technique of deleting or transforming the direct identifier or the quasi-identifier so that an attacker cannot identify a specific subject from the de-identified personal data. Static desensitization data is specifically understood to be data to be distributed after the personal information type data is de-identified.
Specifically, de-identification processing is performed on the data marked with a direct identifier or a quasi-identifier in the data to be issued, and the data to be issued after de-identification is determined to be static desensitization data.
Further, fig. 3 is a schematic flow chart of determining static desensitized data by performing de-identification processing on data to be published according to a second embodiment of the present invention, as shown in fig. 3, and specifically includes the following steps:
S2051, inputting the to-be-processed release data with the direct identifier or the quasi-identifier in the to-be-released data into a preset de-identification model.
In this embodiment, the to-be-processed release data may be specifically understood as the data marked with an identifier in the data to be released. The preset de-identification model may be specifically understood as a preset or pre-trained model capable of performing de-identification processing on data marked with different types of identifiers. Optionally, the preset de-identification model may be a K-anonymity model, an L-diversity model, and the like, which is not limited in this embodiment of the present invention.
S2052, determining de-identification data according to the output result, and de-identification rating corresponding to the de-identification data.
Specifically, the output result of the preset de-identification model includes the de-identification data obtained by de-identifying the input data. While the data is being de-identified, the output de-identification data is also rated according to pre-input data sharing type information, so that the de-identification rating indicates whether the input data has achieved the purpose of de-identification. Optionally, the pre-input data sharing type information may be a type, set according to the actual situation, indicating the extent to which the data to be distributed is allowed to be provided to the data demand end. The rating standard of the de-identification rating can be preset according to the national standard and can also be adjusted according to the actual situation, which is not limited in the embodiment of the present invention.
S2053, determining static desensitization data according to the de-identification data, the de-identification rating and the data to be distributed.
Specifically, whether the de-identification data has achieved de-identification is determined according to the de-identification rating. If so, the corresponding data in the data to be issued is replaced by the de-identification data, and the data to be issued after replacement is determined to be the static desensitization data. If not, the de-identification data is considered not to have achieved de-identification and must be adjusted again until it does; the corresponding data in the data to be issued is then replaced by the de-identification data, and the data to be issued after replacement is determined to be the static desensitization data.
Further, the static desensitization data is determined according to the de-identification data, the de-identification rating and the data to be distributed, and the method can be realized in the following way:
If the de-identification rating does not meet the preset rating condition, the de-identification data is used as new to-be-processed release data, the parameters of the preset de-identification model are adjusted, and execution returns to the step of inputting the to-be-processed release data with a direct identifier or a quasi-identifier in the to-be-released data into the preset de-identification model. Otherwise, the to-be-processed release data is replaced by the de-identification data, and the to-be-released data after replacement is determined to be static desensitization data.
In this embodiment, the preset rating condition may be specifically understood as a condition preset according to an actual situation, for evaluating whether the output de-identified data achieves the purpose of de-identification.
Specifically, if the de-identification rating does not meet the preset rating condition, the de-identification data output by the preset de-identification model can be considered to not achieve the expected de-identification purpose, in order to avoid the waste of the de-identification processing operation, the de-identification data obtained currently is used as new to-be-processed release data in the to-be-released data, parameters in the preset de-identification model are adjusted to enable the de-identification granularity of the preset de-identification model to be correspondingly adjusted, the new to-be-processed release data is further input into the preset de-identification model after parameter adjustment until the de-identification rating output by the model can meet the preset rating condition, at the moment, the corresponding data in the to-be-released data is replaced by the de-identification data, and the replaced to-be-released data is determined to be static desensitization data.
In the embodiment of the invention, the de-identification result of the data to be issued is evaluated through the de-identification rating. If the purpose of de-identification has not been achieved, de-identification is completed on the basis of the previous de-identification result, which avoids wasting data processing resources, reduces the sensitivity of the output static desensitization data, and ensures the compliance of the data.
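The loop formed by S2051 to S2053 can be sketched with a simple k-anonymity-style check on a single quasi-identifier (age). In this sketch the "model" generalizes ages into buckets, the "rating" is the size of the smallest group, and the adjusted parameter is the bucket width. All function names, the doubling rule and the sample data are assumptions for illustration, not the patented model.

```python
from collections import Counter


def generalize(ages, width):
    """De-identify ages by replacing each with the lower bound of its bucket."""
    return [(a // width) * width for a in ages]


def rating(values):
    """Rate the output as the size of the smallest group (the k in k-anonymity)."""
    return min(Counter(values).values())


def static_desensitize(ages, k_required, width=5, max_rounds=10):
    """Repeat de-identification until the rating meets the preset condition."""
    pending = list(ages)  # to-be-processed release data
    for _ in range(max_rounds):
        deidentified = generalize(pending, width)
        if rating(deidentified) >= k_required:
            return deidentified, width  # replaces the original values
        # Rating not met: reuse the current output as the new input and
        # adjust the model parameter (coarser buckets) before retrying.
        pending = deidentified
        width *= 2
    raise RuntimeError("rating condition not met within max_rounds")


ages = [23, 24, 27, 31, 36, 38]
result, final_width = static_desensitize(ages, k_required=3)
```

With the sample data, the first round (width 5) leaves groups of size 1, so the output is fed back in with width 10, which produces two groups of three records and satisfies the condition. Doubling the width keeps each new bucket boundary aligned with the previous ones, so reprocessing the earlier output gives the same result as reprocessing the original data.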
S206, determining the security policy of the static desensitization data according to the security level and the compliance requirement of the data product, and configuring the security policy and the static desensitization data correspondingly to obtain the compliance data to be issued.
Specifically, the static desensitization data has only completed desensitization of the personal information type data in the data to be released; data with a higher security level, that is, higher sensitivity, has not been desensitized. To ensure that data which must not be used cannot be accessed once the data product is provided to the data demand end, the security policies required for different data demand ends are customized according to the security level of the other data and the data product compliance requirements. The security policies are then configured in the static desensitization data, and the configured static desensitization data is determined as the compliance data to be released.
Further, the security policy of the static desensitization data is determined according to the security level and the compliance requirement of the data product, and the method specifically comprises the following situations:
a. and determining a security access level according to the compliance requirement of the data product, and determining the security access level as a first security policy of the static desensitization data.
In this embodiment, the security access level may be specifically understood as a preset security level, set according to regulations, of the data that a data demand end is permitted to access. It should be clear that the security access level is the most basic requirement on which security policy making is based, that is, the security access level must be included in the data product compliance requirements.
Specifically, sensitive data with high security level can be prevented from being accessed on the most basic level through the security access level, so that corresponding security access levels can be determined for different data demand ends according to the compliance requirement of data products, and the security level equal to the security access level in static desensitization data is determined as a first security policy. The data demand end applicable to the first security policy can only access the data with the security level being the security access level in the static desensitization data, so that the security of the sensitive data is fundamentally ensured.
b. If the field access control requirements are included in the data product compliance requirements, the security access level and the field access control requirements are determined to be a second security policy for the static desensitized data.
In this embodiment, the field access control requirement is specifically understood as a requirement for performing access control on a certain field in specific data.
Specifically, when the data product compliance requirements include a field access control requirement, a higher requirement on data compliance can be considered to have been set. This is a special case built on the first security policy: with the first security policy satisfied as a basis, the data providing end considers that access to certain fields of the static desensitization data must additionally be controlled, so the security access level and the field access control requirement are together determined as the second security policy of the static desensitization data. A data demand end subject to the second security policy can access the data in the static desensitization data whose security level matches the security access level, except that, depending on the field access control requirement, it cannot access certain fields whose security level matches the security access level, or it can access certain fields whose security level does not match the security access level.
c. If the user field access control requirements are included in the data product compliance requirements, the security access level and the user field access control requirements are determined to be a third security policy for the static desensitized data.
In this embodiment, the user field access control requirement is specifically understood as a requirement for a specific user, so that the user performs access control on a certain field in specific data.
Specifically, when the data product compliance requirements include a user field access control requirement, a higher requirement on data compliance can be considered to have been set. This is a special case built on the first security policy: with the first security policy satisfied as a basis, the data providing end does not require field access control on the static desensitization data in general, but access to certain fields must still be controlled when the static desensitization data is provided to a specific user, so the security access level and the user field access control requirement are together determined as the third security policy of the static desensitization data. A data demand end subject to the third security policy can access the data in the static desensitization data whose security level matches the security access level, except that, depending on the user field access control requirement, it cannot access certain fields whose security level matches the security access level, or it can access certain fields whose security level does not match the security access level.
d. If the data product compliance requirements include field access control requirements and user field access control requirements, determining the security access level, the field access control requirements and the user field access control requirements as a fourth security policy for the static desensitized data.
Specifically, when the data product compliance requirements include both field access control requirements and user field access control requirements, a higher requirement on data compliance can be considered to have been set. This is a special case built on the second security policy: with the second security policy satisfied as a basis, access to certain fields of the static desensitization data must still be controlled when the data is provided to a specific user, so the security access level, the field access control requirements and the user field access control requirements are together determined as the fourth security policy of the static desensitization data. A data demand end subject to the fourth security policy can access the data in the static desensitization data whose security level matches the security access level, and is additionally subject both to access control on certain fields of the static desensitization data and to user-specific access control on particular fields.
Illustratively, assume that the security access levels are 1, 2 and 3, that name, identification card and gender are data types with security level 2, that the field access control requirement prohibits access to the identification card field, and that the user field access control requirement prohibits access to the gender information. Under these assumptions, a data demand end executing the first security policy can access the data with security levels 1 to 3 in the static desensitization data; a data demand end executing the second security policy can access the data with security levels 1 to 3 in the static desensitization data but cannot access the identification card information; a data demand end executing the third security policy can access the data with security levels 1 to 3 in the static desensitization data but cannot access the gender information; and a data demand end executing the fourth security policy can access the data with security levels 1 to 3 in the static desensitization data but cannot access either the identification card information or the gender information.
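The worked example above can be reproduced with a small sketch in which a policy is a set of accessible security levels plus two optional deny lists. The function and variable names are hypothetical, and the user-specific rule is simplified to a deny list already resolved for the requesting user.

```python
# Field levels taken from the example: all three fields have security level 2.
FIELD_LEVEL = {"name": 2, "id_card": 2, "gender": 2}


def accessible_fields(allowed_levels, field_deny=(), user_deny=()):
    """Return the fields a data demand end may read under one policy."""
    blocked = set(field_deny) | set(user_deny)
    return sorted(
        field
        for field, level in FIELD_LEVEL.items()
        if level in allowed_levels and field not in blocked
    )


levels = {1, 2, 3}  # security access levels from the example
first = accessible_fields(levels)
second = accessible_fields(levels, field_deny=["id_card"])
third = accessible_fields(levels, user_deny=["gender"])
fourth = accessible_fields(levels, field_deny=["id_card"], user_deny=["gender"])
```

Here `first` contains all three fields, `second` omits `id_card`, `third` omits `gender`, and `fourth` contains only `name`, matching the four cases described in the example. The case where a field access control requirement grants access to a field outside the security access level is not modeled.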
S207, storing the to-be-issued compliance data in a database in the data providing end.
S208, determining data communication information according to a storage mode of the to-be-released compliance data, and determining version information according to the modification times of the to-be-released compliance data.
S209, packaging the metadata information, security policies, data connectivity information and version information as the data product to be distributed.
S210, determining at least one data delivery mode of the data product to be distributed according to the data communication information.
In this embodiment, the delivery mode may be specifically understood as information describing how the data product is delivered to the data-requiring end, and exemplary delivery modes may include gateway call, API access, privacy calculation, federal learning, etc., which is not limited by the embodiment of the present invention.
Specifically, at least one feasible data delivery mode of the data product to be distributed is determined according to the different storage forms contained in the data communication information. Illustratively, one data delivery mode supports direct access to the encrypted database of the data providing end through an SQL gateway; another data delivery mode supports accessing data through an API, in which case the exposed API path, the access parameters, the corresponding database query statement and the like must additionally be configured; the data delivery modes also include privacy calculation, federal learning and the like.
S211, auditing each data delivery mode according to the property of the data product to be distributed corresponding to the target data demand end, and determining the data delivery mode passing the auditing as the consumption mode of the data product to be distributed.
In this embodiment, the target data request end may be specifically understood as a data request end that can apply for a data product to be distributed in the data market.
Specifically, for a data product to be distributed, the data demand end that is expected to apply for the product after it is distributed to the data market is determined and taken as the target data demand end. The security requirements of the target data demand end for the data transmission process and the data use process are then determined according to the nature of the target data demand end, the different data delivery modes of the data product to be distributed are audited against these security requirements, and each data delivery mode that meets the security requirements is determined as a consumption mode of the data product to be distributed.
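Steps S210 and S211 can be sketched as a lookup followed by a filter. The mapping from storage form to delivery modes, the mode names, and the set of modes approved for the target data demand end are all hypothetical; the embodiment lists the delivery modes but does not state which storage form supports which mode.

```python
# Hypothetical mapping from storage form to feasible delivery modes.
MODES_BY_STORAGE = {
    "encrypted_database": ["sql_gateway", "api"],
    "object_store": ["privacy_computing", "federated_learning"],
}


def candidate_modes(storage_forms):
    """S210: collect feasible delivery modes for the given storage forms."""
    modes = []
    for form in storage_forms:
        for mode in MODES_BY_STORAGE.get(form, []):
            if mode not in modes:
                modes.append(mode)
    return modes


def audit(modes, approved_for_target):
    """S211: keep only the delivery modes that pass the audit."""
    return [m for m in modes if m in approved_for_target]


candidates = candidate_modes(["encrypted_database"])
consumption = audit(candidates, approved_for_target={"api", "federated_learning"})
```

In this sketch the audit is reduced to membership in an approved set; a real audit would derive that set from the security requirements of the target data demand end.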
S212, determining the product number of the data product to be distributed according to the metadata information, the security policy, the data communication information, the version information and the consumption mode, and distributing the numbered data product to be distributed to a data market in the data product distribution system.
According to the technical scheme of this embodiment, the data to be issued is classified by data type and graded by security level, so that the sensitivity of the data of each data type is determined; the data belonging to the preset personal information type is identified and de-identified to obtain the corresponding static desensitization data; and security policies with different priorities are configured for the static desensitization data according to the security level and the field access control requirements and/or user field access control requirements in the data product compliance requirements, thereby realizing dynamic desensitization of the issued data product. Meanwhile, the consumption mode of the data product to be issued is defined according to the different data communication information, so that compliance processing of the data product development process is more systematic, the data providing end gains finer control over how the data demand end uses the data, the compliance of the whole data transaction process is ensured, and the issued data product is protected from security risks during use.
Example three
Fig. 4 is a flowchart of a data product publishing method according to a third embodiment of the present invention. The method may be applied to a data market of a data product publishing system, and the data market may be implemented by software and/or hardware. The data market may be configured in a public network environment separate from the data providing end and the data demand end, and a data market publishing end and a data market subscribing end may each be configured in the public network: the data providing end publishes data products through the data market publishing end, and the data demand end browses and applies for data products by accessing the data market subscribing end.
As shown in fig. 4, the data product publishing method provided in the third embodiment of the present invention specifically includes the following steps:
S301, after receiving a data product application request of a data demand end of a data product release system, checking a data consumption environment of the data demand end.
In this embodiment, the data product application request may be specifically understood as information sent to the data market by a data demand end accessing the data market in order to request a data product published in the data market. The data consumption environment may be specifically understood as the configuration that the data demand end provides to support consumption of the data product.
Specifically, each data demand end connected to the data market can browse a plurality of data products issued on the data market, after determining the required data products, the data demand end sends a data product application request to the corresponding data market for the required data products, and at this time, the data market checks the data consumption environment of the data demand end according to the configuration conditions required by the delivery of the data products corresponding to the data product application request.
S302, if the data consumption environment is consistent with the consumption mode of the data product corresponding to the data product application request and comprises an encryption database and a development tool corresponding to the consumption mode, the security policy of the data product is pushed to the encryption database.
Specifically, when the data consumption environment of the data demand end is consistent with the consumption mode of the data product corresponding to the data product application request, the data product can be considered deliverable to the data demand end for development. Because the compliance data corresponding to the data product must be transmitted to the data demand end for development and use, the data demand end needs an encryption database and a development tool corresponding to the consumption mode; otherwise, the actual data corresponding to the data product cannot be delivered to the data demand end. When the data demand end has the encryption database and the development tool corresponding to the consumption mode, the data market pushes the security policy of the data product to the encryption database of the data demand end, so that the data demand end can store and configure the security policy accordingly. After the connection mode of the encryption database has been registered in the corresponding working areas of all development tools, each development tool can acquire data from the encryption database for development in accordance with the security policy.
S303, transmitting the connection information of the encryption database and the data connection information of the data product to a data providing end corresponding to the data product so that the encryption database receives the compliance data corresponding to the data product.
In this embodiment, compliance data may be understood as, in particular, actual data corresponding to a data product, which has been configured by static desensitization and security policies, stored in a database at the data provider.
Specifically, the data market can acquire the connection information of the encrypted database at the data demand end in the checking process, meanwhile, the data communication information of the data product is the information visible only to the data providing end corresponding to the data product, and the data market provides the connection information and the data communication information to the data providing end, so that the data providing end can establish a secure data link between the database at the data providing end and the encrypted database at the data demand end according to the information, and further the encrypted database at the data demand end can receive the compliance data corresponding to the data product through the secure data link.
Further, after checking the data consumption environment of the data demand end, the method further comprises:
If the data consumption environment is inconsistent with the consumption mode of the data product corresponding to the data product application request, rejecting the data product application request.
Specifically, if the consumption environment of the data in the data demand end is inconsistent with the consumption mode of the data product corresponding to the data product application request, the data demand end can be considered to be unable to apply for the data product, or the data demand end can be considered to be unable to realize delivery of the data product corresponding to the compliance data even if the data product is applied, and at this time, the data market refuses the data product application request of the data demand end so as to reduce the data processing pressure caused by incorrect application to the data providing end.
Further, after checking the data consumption environment of the data demand end, the method further comprises:
if the data consumption environment is consistent with the consumption mode of the data product corresponding to the data product application request and does not comprise an encryption database and a development tool corresponding to the consumption mode, the data consumption environment is built or prompted to be built at the data demand end.
Specifically, if the data consumption environment of the data demand end is consistent with the consumption mode of the data product corresponding to the data product application request, the data product can be considered deliverable to the data demand end for development. Because the compliance data corresponding to the data product must be transmitted to the data demand end for development and use, the data demand end needs to be provided with an encrypted database and a development tool corresponding to the consumption mode; otherwise the data product cannot be delivered. When the data demand end lacks the encrypted database and the development tool corresponding to the consumption mode, the data market can, according to the deployment position of the data demand end, either notify the data product release system to construct them at the data demand end or prompt the data demand end to construct the data consumption environment itself.
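The three outcomes of the environment check described above can be sketched as follows. This is a minimal illustration only: the names `Decision`, `ConsumptionEnvironment` and `check_environment` are hypothetical, and modelling "consistent with the consumption mode" as set membership is an assumption, since the embodiment does not fix how consistency is tested.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    REJECT = "reject"                    # environment inconsistent with the consumption mode
    BUILD_OR_PROMPT = "build_or_prompt"  # consistent, but database or tools are missing
    PUSH_POLICY = "push_policy"          # consistent and fully equipped


@dataclass
class ConsumptionEnvironment:
    # Hypothetical description of what the data demand end reports to the data market.
    supported_modes: set = field(default_factory=set)
    has_encrypted_database: bool = False
    development_tools: set = field(default_factory=set)


def check_environment(env, consumption_mode, required_tools):
    """Return the data market's decision for one data product application request."""
    # Assumption: "consistent" means the mode is among those the environment supports.
    if consumption_mode not in env.supported_modes:
        return Decision.REJECT
    # Consistent, but the encrypted database or a required development tool is missing.
    if not env.has_encrypted_database or not required_tools <= env.development_tools:
        return Decision.BUILD_OR_PROMPT
    return Decision.PUSH_POLICY
```

Whether `BUILD_OR_PROMPT` leads to automatic construction or to a prompt depends on the deployment position of the data demand end, which this sketch deliberately leaves to the caller.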
For example, when the data providing end and the data demand end are deployed in the same enterprise cloud as two mutually isolated enterprise tenants, the data market, after receiving a data product application request, first detects whether a data sandbox exists at the data demand end that sent the request; if not, it notifies the data product release system to build a sandbox instance at the data demand end. The data market then checks, according to the consumption mode of the requested data product, whether the sandbox at the data demand end has an encrypted database and development tools that satisfy the consumption mode; if not, the encrypted database and the development tools are created in the sandbox through the data product release system and associated with the data sandbox. The data sandbox automatically registers the connection mode of the encrypted database in the work areas of all development tools, so that during subsequent data development at the data demand end the development tools can directly extract and use the compliance data stored in the sandbox in accordance with the security policy configured in the encrypted database.
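The sandbox preparation steps in this example can be sketched as an idempotent routine. All names here (`Sandbox`, `DevelopmentTool`, `provision_sandbox`, the `encdb://` URL form) are hypothetical placeholders rather than part of the described system; the sketch only shows the order of the steps and the registration of the database connection in every tool's work area.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DevelopmentTool:
    name: str
    # Connections registered in this tool's work area (hypothetical representation).
    workspace_connections: list = field(default_factory=list)


@dataclass
class Sandbox:
    encrypted_db_url: Optional[str] = None
    tools: dict = field(default_factory=dict)


def provision_sandbox(sandbox, required_tools, db_url):
    """Prepare a sandbox for one consumption mode; safe to call repeatedly."""
    if sandbox is None:
        sandbox = Sandbox()                      # build the sandbox instance
    if sandbox.encrypted_db_url is None:
        sandbox.encrypted_db_url = db_url        # create the encrypted database
    for name in required_tools:
        # Create only the development tools that are still missing.
        sandbox.tools.setdefault(name, DevelopmentTool(name))
    # Register the database connection in the work area of every tool.
    for tool in sandbox.tools.values():
        if sandbox.encrypted_db_url not in tool.workspace_connections:
            tool.workspace_connections.append(sandbox.encrypted_db_url)
    return sandbox
```

Calling the routine a second time on an already prepared sandbox changes nothing, which matches the "detect first, create only if absent" behaviour of the example.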
Alternatively, the data market may be divided into a publishing-end data market and a subscribing-end data market. The publishing-end data market is deployed in the same enterprise cloud as the data providing end and receives the data products released by the data providing end; the subscribing-end data market is deployed in association with the data demand end and receives the data product application requests of the data demand end. The two markets are connected through an encrypted private line or VPN, over which data products and request information are synchronized between them. In this arrangement, when the data market detects that the data consumption environment of the data demand end does not meet the delivery requirements of the data product, it cannot directly notify the data product release system to construct the data consumption environment at the data demand end. Instead, the data market sends prompt information to the data demand end, so that a worker can construct the data consumption environment after receiving it; the subsequent security policy issuing and compliance data delivery are executed once the data consumption environment has been constructed.
According to the technical scheme of this embodiment, after the data market receives the data product application request of the data demand end, it checks and configures the data consumption environment of the data demand end. After the check succeeds, the security policy of the data product is pushed to the encrypted database corresponding to the data demand end; once the security policy configuration at the data demand end is complete, the connection information of the encrypted database and the data communication information are sent to the data providing end to deliver the compliance data, so as to ensure the delivery success rate of the data product.
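The ordering that this scheme relies on, namely policy first and compliance data second, can be illustrated with a small in-memory sketch. The dictionaries, the function names `handle_application` and `provider_push`, and the use of the database dictionary itself as a stand-in for its connection information are all assumptions made for illustration.

```python
def provider_push(encrypted_db, connectivity_info):
    """Hypothetical data providing end: refuses to deliver before a policy is configured."""
    if "policy" not in encrypted_db:
        raise RuntimeError("security policy must be configured before delivery")
    encrypted_db["rows"] = [f"compliance data from {connectivity_info}"]


def handle_application(request, market_catalog, demand_end, provider):
    """Process one data product application request and return the steps performed."""
    product = market_catalog[request["product_number"]]
    if product["consumption_mode"] not in demand_end["supported_modes"]:
        return ["rejected"]
    if demand_end.get("encrypted_db") is None:
        return ["environment_missing"]
    events = []
    # 1. Push the security policy to the encrypted database first.
    demand_end["encrypted_db"]["policy"] = product["security_policy"]
    events.append("policy_pushed")
    # 2. Only then hand the connection information and the data communication
    #    information to the data providing end, which pushes the compliance data.
    provider(demand_end["encrypted_db"], product["connectivity_info"])
    events.append("compliance_data_delivered")
    return events
```

Because `provider_push` raises if no policy is present, swapping the two numbered steps would fail immediately, which makes the intended order explicit.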
Example IV
Fig. 5 is a schematic structural diagram of a data product release system according to a fourth embodiment of the present invention. As shown in Fig. 5, the data product release system includes at least one data providing end 41, at least one data demand end 42 and a data market 43, each data providing end 41 and each data demand end 42 being located in mutually isolated domains. One data providing end 41 and one data demand end 42 are taken as an example in this embodiment.
The data providing end 41 is configured to perform data information extraction, data compliance processing, data connectivity information determination, version information determination and consumption mode determination on the acquired data to be distributed, generate a data product to be distributed including a product number, and distribute the data product to be distributed to the data market 43;
a data market 43, configured to determine a target data product according to the data product application request when receiving the data product application request of the data demand end 42, and check a data consumption environment of the data demand end 42 according to a consumption mode corresponding to the target data product;
the data demand end 42 is configured to receive a security policy of a target data product pushed by the data market when the data demand end has a data consumption environment, and store the security policy in the encryption database;
a data market 43 for transmitting the connection information of the encrypted database and the data connectivity information of the target data product to the data providing terminal 41;
the data providing end 41 is configured to push compliance data corresponding to the target data product to the encrypted database according to the connection information of the encrypted database and the data connection information of the target data product;
the product to be distributed comprises metadata information, a security policy, data communication information and version information, and the product number comprises the metadata information, the security policy, the data communication information, the version information and a consumption mode.
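The relationship between the packaged fields and the product number can be sketched as below. The embodiment only states which items the product number is determined from; deriving it by hashing a canonical JSON form, the 16-character length, and the names `DataProduct` and `product_number` are assumptions. For brevity the sketch also keeps the consumption mode alongside the four packaged fields.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DataProduct:
    metadata: dict            # metadata information extracted from the data
    security_policy: dict     # policy configured during compliance processing
    connectivity_info: str    # where the actual data lives at the data providing end
    version: int              # derived from the number of modifications
    consumption_mode: str     # form in which the data is delivered


def product_number(product):
    """Derive a deterministic number from the five items (hashing is an assumption)."""
    canonical = json.dumps(asdict(product), sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

With this choice, any change to the version information or to the security policy yields a different product number, so two releases of the same data are distinguishable in the data market.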
Further, the data market 43 is further configured to reject the data product application request of the data demand end 42 if the data consumption environment is inconsistent with the consumption mode after checking the data consumption environment of the data demand end 42 according to the consumption mode corresponding to the target data product.
Further, the data market 43 is further configured to, after checking the data consumption environment of the data demand end 42 according to the consumption mode corresponding to the target data product, if the data consumption environment is consistent with the consumption mode and the data demand end 42 does not include the encryption database and the development tool corresponding to the consumption mode, construct or prompt the data consumption environment at the data demand end 42.
According to the technical scheme of this embodiment, basic information is extracted from the data to be distributed, and static desensitization and security policy configuration are performed on that data according to the extracted information and the pre-acquired data product compliance requirements. On the basis that the resulting compliance data satisfies national laws and regulations, corresponding security policies can thus be configured dynamically for different types of data demand ends. The data product to be distributed is then numbered by combining the data communication information determined from the storage mode of the compliance data to be distributed, the version information determined from the number of modifications, and the consumption mode defined according to the data communication information, and is released to the data market. After receiving a data product application request from the data demand end, the data market checks and configures the data consumption environment of the data demand end; after the check succeeds, it pushes the security policy of the data product into the encrypted database corresponding to the data demand end, and once the security policy configuration is complete, it sends the connection information of the encrypted database and the data communication information to the data providing end to deliver the compliance data, which ensures the delivery success rate of the data product. Because the security policy of the released data product is configured at the data demand end, the compliance data corresponding to the data product remains compliant in subsequent use, so that the whole data transaction process is compliant and the released data product carries no security risk during use.
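The security policy configuration summarized above distinguishes four policy variants, which are set out in claim 6. A minimal sketch of that selection follows; the function name, the dictionary keys and the shape of the rule arguments are hypothetical.

```python
def select_security_policy(access_level, field_rules=None, user_field_rules=None):
    """Return (variant, policy) following the four cases of claim 6."""
    # The security access level is part of every variant.
    policy = {"security_access_level": access_level}
    if field_rules:
        policy["field_access_control"] = field_rules
    if user_field_rules:
        policy["user_field_access_control"] = user_field_rules

    if field_rules and user_field_rules:
        variant = "fourth"   # level + field rules + user field rules
    elif user_field_rules:
        variant = "third"    # level + user field rules
    elif field_rules:
        variant = "second"   # level + field rules
    else:
        variant = "first"    # level only
    return variant, policy
```

For instance, a compliance requirement that only restricts individual fields yields the second variant, while one that also restricts fields per user yields the fourth.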
Fig. 6 is an exemplary structural diagram of a data product release system according to the fourth embodiment of the present invention. As shown in Fig. 6, the data product release system includes a data providing end 51, a publishing-end data market 52 and a data demand end 531, which are deployed in the same enterprise cloud; a data demand end 532 in the form of a customized all-in-one machine, which is independent of the enterprise cloud; and a subscribing-end data market 54 corresponding to the data demand end 532. The subscribing-end data market 54 may be deployed inside or outside the data demand end 532, which is not limited in this embodiment of the present invention.
The data providing end 51 includes a development tool 511, a compliance tool 512 and a pushing tool 513. A worker at the data providing end 51 develops the data acquired from a data source through the development tool 511, and performs static desensitization and security policy configuration on the developed data to be distributed through the compliance tool 512 to obtain and store the compliance data to be distributed. The metadata information, security policy, data communication information and version information corresponding to the compliance data to be distributed are determined and packaged to obtain the data product to be distributed. After the consumption mode of the data product to be distributed is obtained through auditing, its product number is determined according to the metadata information, security policy, data communication information, version information and consumption mode, and the numbered data product is released to the publishing-end data market 52. The publishing-end data market 52 transmits the data products released in it to the subscribing-end data market 54 over an encrypted private line or VPN, so that the data products can be accessed by the data demand end 531 through the publishing-end data market 52 or by the data demand end 532 through the subscribing-end data market 54.
After receiving an application request for a data product, the publishing-end data market 52 or the subscribing-end data market 54 checks the data consumption environment of the corresponding data demand end 531 or data demand end 532. After the check succeeds, it issues the security policy of the data product to the encrypted database or secure storage and computing environment, and sends the connection information of the data demand end 531 or data demand end 532 together with the data communication information of the data product to the pushing tool 513 of the data providing end 51, so that the data providing end 51 can push the compliance data corresponding to the data product to the corresponding data demand end through the pushing tool 513.
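The synchronization of data products and request information between the two markets of Fig. 6 can be sketched with an in-memory stand-in. The class `DataMarket` and its methods are hypothetical, and the direct attribute writes merely stand in for transfer over the encrypted private line or VPN, which is not modelled here.

```python
class DataMarket:
    """Hypothetical in-memory market; `peers` stands in for the private-line/VPN link."""

    def __init__(self, name):
        self.name = name
        self.products = {}
        self.requests = []
        self.peers = []

    def link(self, other):
        # Connect two markets in both directions.
        self.peers.append(other)
        other.peers.append(self)

    def publish(self, number, product):
        # Record the product locally and synchronize it to every linked market.
        self.products[number] = product
        for peer in self.peers:
            peer.products[number] = product

    def apply(self, number, demand_end):
        # Accept an application only for a known product, then synchronize the request.
        if number not in self.products:
            raise KeyError(number)
        request = (number, demand_end)
        self.requests.append(request)
        for peer in self.peers:
            peer.requests.append(request)
        return request
```

In this sketch a product published on the publishing-end market becomes visible on the subscribing-end market, and an application filed on the subscribing end becomes visible on the publishing end, where the data providing end can act on it.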
The data product release system provided by the embodiment of the invention can execute the data product release method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
In some embodiments, the data product distribution method may be implemented as a computer program, which is tangibly embodied on a computer-readable storage medium, such as a storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed onto the data product distribution system via the ROM and/or the communication unit. One or more of the steps of the data product distribution method described above may be performed when the computer program is loaded into RAM and executed by a processor. Alternatively, in other embodiments, the processor may be configured to perform the data product publishing method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, which is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (13)

1. A data product distribution method, applied to a data providing end of a data product distribution system, the method comprising:
extracting data information of the acquired data to be distributed, and determining corresponding metadata information;
performing static desensitization and security policy configuration on the data to be issued according to the metadata information and the acquired compliance requirements of the data products, and determining and storing the compliance data to be issued;
determining data communication information according to a storage mode of the to-be-distributed compliance data, and determining version information according to the modification times of the to-be-distributed compliance data;
determining the metadata information, the security policy, the data communication information and the version information package as data products to be distributed, and defining a consumption mode of the data products to be distributed according to the data communication information;
determining the product number of the data product to be distributed according to the metadata information, the security policy, the data communication information, the version information and the consumption mode, and distributing the numbered data product to be distributed to a data market in the data product distribution system;
The data communication information is connection information used for describing the position of actual data pointed by a data product in the data providing end;
and the consumption mode is a form that the data providing end delivers the corresponding data of the data product to be distributed to the data demand end.
2. The method of claim 1, wherein the statically desensitizing and security policy configuring the data to be distributed according to the metadata information and the acquired data product compliance requirements, determining and storing the compliance data to be distributed, comprises:
determining at least one data type of the data to be distributed according to the metadata information;
determining the security level of the data corresponding to each data type in the data to be distributed according to each data type and the compliance requirement of the data product;
de-identifying the data to be distributed, and determining static desensitization data;
determining a security policy of the static desensitization data according to the security level and the compliance requirement of the data product, and configuring the security policy and the static desensitization data correspondingly to obtain compliance data to be issued;
and storing the to-be-issued compliance data in a database in the data providing end.
3. The method of claim 2, further comprising, after said determining at least one data type of said data to be distributed based on said metadata information:
identifying the data to be distributed, of which the data type belongs to a preset personal information type;
wherein the identification process includes a direct identifier identification process and a quasi identifier identification process;
wherein the direct identifier identification process is a process of carrying out identifier identification on data containing personal information capable of directly locating natural people;
the quasi-identifier identification process is a process of carrying out identifier identification on data containing personal information that cannot directly locate a natural person.
4. The method of claim 3, wherein said de-identifying the data to be distributed, and determining static desensitization data, comprises:
inputting the to-be-processed release data with the direct identifier or the quasi-identifier in the data to be distributed into a preset de-identification model;
determining de-identified data and de-identified ratings corresponding to the de-identified data according to the output result;
and determining static desensitization data according to the de-identification data, the de-identification rating and the data to be distributed.
5. The method of claim 4, wherein the determining static desensitization data from the de-identified data, the de-identified ratings, and the data to be distributed comprises:
if the de-identification rating does not meet a preset rating condition, taking the de-identified data as new to-be-processed release data, adjusting parameters in the preset de-identification model, and returning to the step of inputting the to-be-processed release data with a direct identifier or a quasi-identifier in the data to be distributed into the preset de-identification model;
otherwise, replacing the to-be-processed release data with the de-identified data, and determining the replaced data to be distributed as static desensitization data.
6. The method of claim 2, wherein said determining a security policy for the static desensitization data based on the security level and the data product compliance requirements comprises:
determining a security access level according to the data product compliance requirements, and determining the security access level as a first security policy of the static desensitization data;
if the data product compliance requirements include field access control requirements, determining the security access level and the field access control requirements as a second security policy of the static desensitized data;
If the data product compliance requirements include user field access control requirements, determining the security access level and the user field access control requirements as a third security policy of the static desensitized data;
and if the data product compliance requirements comprise field access control requirements and user field access control requirements, determining the security access level, the field access control requirements and the user field access control requirements as a fourth security policy of the static desensitized data.
7. The method of claim 1, wherein the defining a consumption pattern of the data product to be distributed according to the data connectivity information comprises:
determining at least one data delivery mode of the data product to be distributed according to the data communication information;
and according to the property of the data product to be distributed corresponding to the target data demand end, auditing each data delivery mode, and determining the data delivery mode passing the auditing as the consumption mode of the data product to be distributed.
8. A data product distribution method, for use in a data market of a data product distribution system, the method comprising:
After receiving a data product application request of a data demand end of the data product release system, checking a data consumption environment of the data demand end;
if the data consumption environment is consistent with the consumption mode of the data product corresponding to the data product application request and comprises an encryption database and a development tool corresponding to the consumption mode, pushing the security policy of the data product to the encryption database;
transmitting the connection information of the encryption database and the data communication information of the data product to a data providing end corresponding to the data product so that the encryption database receives the compliance data corresponding to the data product;
the data communication information is connection information used for describing the position of the actual data pointed by the data product in the data providing end;
the consumption mode is a form that a data providing end delivers corresponding data of the data product to the data demand end;
the data consumption environment is a configuration provided by the data demand end for supporting consumption of the data product.
9. The method of claim 8, further comprising, after said checking the data consumption environment of the data-requiring end:
If the data consumption environment is inconsistent with the consumption mode of the data product corresponding to the data product application request, rejecting the data product application request;
if the data consumption environment is consistent with the consumption mode of the data product corresponding to the data product application request and does not comprise an encryption database and a development tool corresponding to the consumption mode, the data consumption environment is built or prompted to be built at the data demand end.
10. A data product distribution system comprising at least one data providing end, at least one data requesting end, and a data market, each said data providing end and each said data requesting end being located in mutually isolated domains;
the data providing end is used for carrying out data information extraction, data compliance processing, data communication information determination, version information determination and consumption mode determination on the acquired data to be distributed, generating a data product to be distributed, which contains a product number, and distributing the data product to be distributed to the data market;
the data market is used for determining a target data product according to the data product application request when the data product application request of the data demand end is received, and checking the data consumption environment of the data demand end according to the consumption mode corresponding to the target data product;
The data demand end is used for receiving the security policy of the target data product pushed by the data market when the data consumption environment is provided, and storing the security policy into an encryption database;
the data market is used for sending the connection information of the encryption database and the data communication information of the target data product to the data providing end;
the data providing end is used for pushing the compliance data corresponding to the target data product to the encryption database according to the connection information of the encryption database and the data communication information of the target data product;
the data product to be distributed comprises metadata information, a security policy, data communication information and version information, and the product number comprises the metadata information, the security policy, the data communication information, the version information and the consumption mode;
the data communication information is connection information used for describing the position of the actual data pointed by the data product to be distributed in the data providing end;
the consumption mode is a form that the data providing end delivers the corresponding data of the data product to be distributed to the data demand end.
11. The system of claim 10, wherein
the data market is further configured to, when the data demand end does not have the data consumption environment and the data consumption environment in the data demand end is arranged through a data sandbox, establish a sandbox at the data demand end through the data product release system, and establish an encrypted database and at least one development tool corresponding to the consumption mode in the sandbox;
the data demand end is used for registering the connection mode of the encryption database into each development tool after the encryption database is created, so that when data development is carried out through each development tool, the corresponding data of the target data product is read based on the security policy in the encryption database.
12. The system of claim 10, wherein
the data market is further configured to send a data consumption environment construction prompt message to the data demand end when the data demand end does not have the data consumption environment and the data consumption environment of the data demand end is arranged through the data all-in-one machine, so that the data demand end constructs the data consumption environment according to the data consumption environment construction prompt message.
13. A computer readable storage medium storing computer instructions for causing a processor to implement the data product release method of any one of claims 1-9 when executed.
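By way of illustration only, the iterative de-identification recited in claims 4 and 5 can be sketched as follows. The masking model, its `keep` parameter, the rating formula, the threshold and the `max_rounds` safeguard are all assumptions introduced for the sketch; the claims do not specify the preset de-identification model or its rating condition.

```python
def mask_model(value, keep):
    """Toy de-identification model: keep the first `keep` characters, mask the rest."""
    masked = value[:keep] + "*" * max(len(value) - keep, 0)
    # Toy rating: fraction of masked characters (an assumption, not from the claims).
    rating = masked.count("*") / len(masked) if masked else 1.0
    return masked, rating


def deidentify(value, min_rating=0.6, keep=6, max_rounds=10):
    """Re-run the model with adjusted parameters until the rating condition is met."""
    pending = value
    for _ in range(max_rounds):          # max_rounds is a safeguard added in this sketch
        output, rating = mask_model(pending, keep)
        if rating >= min_rating:
            return output, rating
        pending = output                 # de-identified data becomes the new input
        keep = max(keep - 2, 0)          # adjust the model parameter
    raise RuntimeError("rating condition not met within max_rounds")


def static_desensitize(record, identifier_fields):
    """Replace each identified field with its de-identified value."""
    result = dict(record)
    for name in identifier_fields:
        result[name], _ = deidentify(str(record[name]))
    return result
```

Fields that carry neither a direct identifier nor a quasi-identifier are copied through unchanged, so the result corresponds to the replaced data to be distributed of claim 5.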
CN202211664319.5A 2022-12-23 2022-12-23 Data product release method, system and storage medium Active CN115935421B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211664319.5A CN115935421B (en) 2022-12-23 2022-12-23 Data product release method, system and storage medium

Publications (2)

Publication Number Publication Date
CN115935421A (en) 2023-04-07
CN115935421B (en) 2024-01-30

Family

ID=86700738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211664319.5A Active CN115935421B (en) 2022-12-23 2022-12-23 Data product release method, system and storage medium

Country Status (1)

Country Link
CN (1) CN115935421B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020133346A1 (en) * 2018-12-29 2020-07-02 Nokia Shanghai Bell Co., Ltd. Data sharing
CN112732811A (en) * 2020-12-31 2021-04-30 广西中科曙光云计算有限公司 Data open platform
CN112818390A (en) * 2021-01-26 2021-05-18 支付宝(杭州)信息技术有限公司 Data information publishing method, device and equipment based on privacy protection
CN114021184A (en) * 2021-10-28 2022-02-08 深圳乐信软件技术有限公司 Data management method and device, electronic equipment and storage medium
CN114077610A (en) * 2020-08-14 2022-02-22 深信服科技股份有限公司 Data publishing method and related device
CN115129716A (en) * 2022-06-27 2022-09-30 浪潮工业互联网股份有限公司 Data management method, equipment and storage medium for industrial big data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2017201850B2 (en) * 2016-03-21 2020-10-29 Vireshwar K. ADHAR Method and system for digital privacy management

Also Published As

Publication number Publication date
CN115935421A (en) 2023-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant