CN114638005A - Data processing method, device and system based on block chain and storage medium

Info

Publication number: CN114638005A
Application number: CN202210301183.5A
Authority: CN
Inventors: 喻聪龙; 王辰淅; 杨卓; 李正煜
Original assignee: Ant Blockchain Technology Shanghai Co Ltd
Current assignee: Ant Blockchain Technology Shanghai Co Ltd
Priority date: 2022-03-25
Filing date: 2022-03-25
Publication date: 2022-06-17

Abstract

One or more embodiments of the present specification provide a blockchain-based data processing method, apparatus, system, and storage medium. The method includes: obtaining, through a blockchain system, data to be processed from a data processing requester, wherein the data to be processed is ciphertext data obtained by encrypting target data, and the target data comprises an encryption key configured by the data processing requester together with sampled data and/or metadata of the data to be classified and graded; decrypting the data to be processed to obtain the target data; determining, according to the target data, the sensitive information type and the sensitive information level to which the data to be classified and graded belongs; encrypting the sensitive information type and the sensitive information level by using the encryption key; and returning the encrypted sensitive information type and the encrypted sensitive information level to the data processing requester through the blockchain system.

Description

Data processing method, device and system based on block chain and storage medium

Technical Field

The present disclosure relates to the field of data processing, and in particular, to a method, an apparatus, and a system for processing data based on a block chain, and a storage medium.

Background

With the development of internet technology, more and more data is processed over the internet, and data security has become a key concern. Data classification and grading is the premise and foundation for managing data in every industry, is basic work for establishing a unified and complete security protection framework for the data life cycle, and supports each industry in formulating targeted data security control measures. It is therefore necessary to provide a technical solution that classifies and grades data safely, reliably, and accurately.

Disclosure of Invention

An object of one or more embodiments of the present specification is to provide a blockchain-based data processing method, including: acquiring, through a blockchain system, data to be processed from a data processing requester, where the data to be processed is ciphertext data obtained by encrypting target data, and the target data comprises an encryption key configured by the data processing requester and sampled data and/or metadata of the data to be classified and graded; decrypting the data to be processed to obtain the target data; determining, according to the target data, the sensitive information type and the sensitive information level to which the data to be classified and graded belongs; encrypting the sensitive information type and the sensitive information level by using the encryption key; and returning the encrypted sensitive information type and the encrypted sensitive information level to the data processing requester through the blockchain system.

An object of one or more embodiments of the present specification is to provide a blockchain-based data processing method, including: acquiring metadata and/or sampled data of the data to be classified and graded, and acquiring a pre-configured encryption key; taking the acquired data as target data; packaging and encrypting the target data to obtain data to be processed; sending the data to be processed to a blockchain node through a blockchain system, where the data to be processed is used by the blockchain node to determine the sensitive information type and the sensitive information level of the data to be classified and graded; and receiving the encrypted sensitive information type and the encrypted sensitive information level of the data to be classified and graded returned by the blockchain node, where the encrypted sensitive information type and the encrypted sensitive information level are obtained by encryption based on the encryption key.

It is an object of one or more embodiments of the present specification to provide a blockchain-based data processing system including a transaction device of a data processing requester and a blockchain node. The transaction device of the data processing requester acquires metadata and/or sampled data of the data to be classified and graded, acquires a pre-configured encryption key, takes the acquired data as target data, packages and encrypts the target data to obtain data to be processed, and sends the data to be processed to the blockchain node through a blockchain system. The blockchain node acquires the data to be processed, decrypts it to obtain the target data, determines, according to the target data, the sensitive information type and the sensitive information level to which the data to be classified and graded belongs, encrypts the sensitive information type and the sensitive information level by using the encryption key, and returns the encrypted sensitive information type and the encrypted sensitive information level to the data processing requester through the blockchain system. The transaction device of the data processing requester receives the encrypted sensitive information type and the encrypted sensitive information level returned by the blockchain node.

It is an object of one or more embodiments of the present specification to provide a blockchain-based data processing apparatus including: a first acquisition unit, which acquires, through a blockchain system, data to be processed from a data processing requester, where the data to be processed is ciphertext data obtained by encrypting target data, and the target data comprises an encryption key configured by the data processing requester and sampled data and/or metadata of the data to be classified and graded; a first processing unit, which decrypts the data to be processed to obtain the target data and determines, according to the target data, the sensitive information type and the sensitive information level of the data to be classified and graded; and a first sending unit, which encrypts the sensitive information type and the sensitive information level by using the encryption key and returns the encrypted sensitive information type and the encrypted sensitive information level to the data processing requester through the blockchain system.

It is an object of one or more embodiments of the present specification to provide a blockchain-based data processing apparatus including: a second acquisition unit, which acquires metadata and/or sampled data of the data to be classified and graded and acquires a pre-configured encryption key; a second processing unit, which takes the acquired data as target data and packages and encrypts the target data to obtain data to be processed; a second sending unit, which sends the data to be processed to a blockchain node through a blockchain system, where the data to be processed is used by the blockchain node to determine the sensitive information type and the sensitive information level of the data to be classified and graded; and a data receiving unit, which receives the encrypted sensitive information type and the encrypted sensitive information level of the data to be classified and graded returned by the blockchain node, where the encrypted sensitive information type and the encrypted sensitive information level are obtained by encryption based on the encryption key.

It is an object of one or more embodiments of the present specification to provide a data processing apparatus comprising: a processor; and a memory arranged to store computer executable instructions which, when executed, cause the processor to carry out the steps of the data processing method described above.

It is an object of one or more embodiments of the present specification to provide a storage medium for storing computer-executable instructions which, when executed by a processor, implement the steps of the data processing method described above.

Drawings

In order to more clearly illustrate one or more embodiments of the present specification or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings described below illustrate only some of the embodiments of the present specification, and those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a schematic diagram of an application scenario of a blockchain-based data processing method according to one or more embodiments of the present specification;

Fig. 2 is a schematic flowchart of a blockchain-based data processing method according to one or more embodiments of the present specification;

Fig. 3 is a schematic flowchart of another blockchain-based data processing method according to one or more embodiments of the present specification;

Fig. 4 is a schematic flowchart of yet another blockchain-based data processing method according to one or more embodiments of the present specification;

Fig. 5 is a schematic structural diagram of a blockchain-based data processing apparatus according to one or more embodiments of the present specification;

Fig. 6 is a schematic structural diagram of another blockchain-based data processing apparatus according to one or more embodiments of the present specification;

Fig. 7 is a schematic structural diagram of a data processing device according to one or more embodiments of the present specification.

Detailed Description

In order that those skilled in the art may better understand the technical solutions in one or more embodiments of the present specification, those technical solutions are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present specification. All other embodiments that a person skilled in the art can derive from the embodiments of the present specification without inventive effort shall fall within the scope of protection of this document. It should be noted that the embodiments of the present specification, and the features of those embodiments, may be combined with each other where no conflict arises. One or more embodiments of the present specification are described in detail below with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of an application scenario of a blockchain-based data processing method according to one or more embodiments of the present specification. As shown in Fig. 1, the scenario includes a transaction device of a data processing requester and a blockchain system. The data processing requester may be an enterprise, a group, an organization, an individual, or the like. The transaction device of the data processing requester may be a terminal device such as a mobile phone, a tablet computer, a desktop computer, or a portable notebook computer, or may be a server. When the transaction device is a terminal device, a transaction-related application may be installed on it; the application may be a standalone application program (App for short) or an applet embedded in another application program. The blockchain system includes at least one blockchain node (only one is shown in Fig. 1). A data classification and grading scheme is pre-deployed in the blockchain node, and the scheme may be deployed by a large enterprise in the industry.

In one implementation of this specification, the transaction device of the data processing requester acquires metadata and/or sampled data of the data to be classified and graded, acquires a pre-configured encryption key, takes the acquired data as target data, and packages and encrypts the target data to obtain the data to be processed. In response to a transaction initiation operation by the data processing requester, the transaction device then initiates, to the blockchain system, a transaction that invokes an on-chain smart contract; the transaction carries the data to be processed, and the smart contract is used for classifying and grading the data. After receiving the transaction, a blockchain node in the blockchain system broadcasts it in the blockchain network in a peer-to-peer (P2P) manner. Once the transaction passes consensus verification, the blockchain node runs the corresponding smart contracts: it decrypts the data to be processed to obtain the target data; using the pre-deployed data classification and grading scheme, determines, according to the target data, the sensitive information type and the sensitive information level to which the data to be classified and graded belongs; encrypts the sensitive information type and the sensitive information level by using the encryption key; and runs a pre-deployed smart contract for sending information, which returns the encrypted sensitive information type and the encrypted sensitive information level to the data processing requester.

In one embodiment of this specification, the blockchain node has a TEE (Trusted Execution Environment) and may execute the above process inside it. That is, within the TEE the blockchain node runs the corresponding smart contracts, decrypts the data to be processed to obtain the target data, uses the pre-deployed data classification and grading scheme to determine, according to the target data, the sensitive information type and the sensitive information level to which the data to be classified and graded belongs, encrypts the sensitive information type and the sensitive information level by using the encryption key, and runs the pre-deployed smart contract for sending information to return the encrypted sensitive information type and the encrypted sensitive information level to the data processing requester.

In this way, the data processing requester can, based on the blockchain system, have a blockchain node classify and grade its data and determine the sensitive information type and the sensitive information level to which the data to be classified and graded belongs. Because classification and grading take place in the blockchain environment, the process is recorded on the chain as evidence, which improves the credibility of the result and yields a safe, reliable, and accurate way of classifying and grading data. Deploying the data classification and grading scheme on the chain also lets a large number of data processing requesters classify and grade their data efficiently and safely, which increases the service scale of data classification and grading. In addition, when data is classified and graded inside the TEE, the data processing requester can use the data classification and grading scheme safely and efficiently, which further improves the privacy and reliability of data classification and grading.

Based on the above application scenario architecture, one or more embodiments of the present specification provide a blockchain-based data processing method. Fig. 2 is a schematic flowchart of the blockchain-based data processing method according to one or more embodiments of the present specification. As shown in Fig. 2, the method may include the following steps:

Step S202: acquiring, through a blockchain system, data to be processed from a data processing requester, where the data to be processed is ciphertext data obtained by encrypting target data, and the target data comprises an encryption key configured by the data processing requester and sampled data and/or metadata of the data to be classified and graded;

Step S204: decrypting the data to be processed to obtain the target data, and determining, according to the target data, the sensitive information type and the sensitive information level of the data to be classified and graded;

Step S206: encrypting the sensitive information type and the sensitive information level by using the encryption key, and returning the encrypted sensitive information type and the encrypted sensitive information level to the data processing requester through the blockchain system.

In this embodiment, the data processing requester can, based on the blockchain system, have a blockchain node classify and grade its data and determine the sensitive information type and the sensitive information level to which the data to be classified and graded belongs. Because classification and grading take place in the blockchain environment, the process is recorded on the chain as evidence, which improves the credibility of the result and yields a safe, reliable, and accurate way of classifying and grading data. Deploying the data classification and grading scheme on the chain also lets a large number of data processing requesters classify and grade their data efficiently and safely, which increases the service scale of data classification and grading. Furthermore, when the blockchain node has a TEE, the data is classified and graded inside the TEE, so that the data processing requester can use the data classification and grading scheme safely and efficiently, which further improves the privacy and reliability of data classification and grading.

The method of Fig. 2 is applied to, and performed by, a blockchain node. The method flow of Fig. 2 is described in detail below.

In step S202, the blockchain node obtains the data to be processed of the data processing requester through the blockchain system. The data to be processed is ciphertext data obtained by encrypting the target data, and the target data comprises an encryption key configured by the data processing requester and sampled data and/or metadata of the data to be classified and graded. Specifically, the transaction device of the data processing requester acquires metadata and/or sampled data of the data to be classified and graded, acquires a pre-configured encryption key, takes the acquired data as target data, and packages and encrypts the target data to obtain the data to be processed. In response to a transaction initiation operation by the data processing requester, the transaction device initiates, to the blockchain system, a transaction that invokes an on-chain smart contract; the transaction carries the data to be processed, and the smart contract is used for classifying and grading the data. After receiving the transaction, the blockchain node broadcasts it in the blockchain network in a peer-to-peer (P2P) manner and, once the transaction passes consensus verification, acquires the data to be processed from the transaction.

The target data may take one of three forms. In the first case, the target data includes the encryption key configured by the data processing requester and sampled data of the data to be classified and graded. In the second case, the target data includes the encryption key configured by the data processing requester and metadata of the data to be classified and graded. In the third case, the target data includes the encryption key configured by the data processing requester, sampled data of the data to be classified and graded, and metadata of the data to be classified and graded.
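As a non-limiting illustration, the three forms of the target data can be modelled as one record that always carries the encryption key and carries sampled data, metadata, or both. The class name `TargetData`, its field names, and the use of JSON for packaging are assumptions made for this sketch and are not specified by the embodiments.

```python
import json
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TargetData:
    """Hypothetical container for the target data; all names are illustrative."""
    encryption_key: str                           # key configured by the requester (hex text here)
    sampled_data: Optional[Sequence[str]] = None  # sampled data, if supplied
    metadata: Optional[dict] = None               # metadata, if supplied

    def __post_init__(self) -> None:
        # The key is always present, together with sampled data and/or metadata.
        if not self.encryption_key:
            raise ValueError("an encryption key is required")
        if self.sampled_data is None and self.metadata is None:
            raise ValueError("sampled data and/or metadata is required")

    def case(self) -> str:
        """Report which of the three forms this record takes."""
        if self.sampled_data is not None and self.metadata is not None:
            return "key+sampled+metadata"
        if self.sampled_data is not None:
            return "key+sampled"
        return "key+metadata"

    def pack(self) -> bytes:
        """Package the record before encryption (JSON is an assumption)."""
        samples = None if self.sampled_data is None else list(self.sampled_data)
        body = {
            "encryption_key": self.encryption_key,
            "sampled_data": samples,
            "metadata": self.metadata,
        }
        return json.dumps(body, sort_keys=True).encode("utf-8")
```

For instance, `TargetData("ab12", metadata={"content": "name"}).case()` reports the metadata-only form, in which the blockchain node would have to obtain the sampled data itself.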

In step S204, the blockchain node runs a pre-deployed smart contract for decryption and decrypts the data to be processed by using a decryption key agreed in advance with the data processing requester, thereby obtaining the target data. It then runs a pre-deployed smart contract for classification and grading and determines, according to the target data, the sensitive information type and the sensitive information level of the data to be classified and graded. The data processing requester and the blockchain node agree on one or more sets of keys in advance, and these keys may be symmetric or asymmetric. The data processing requester packages and encrypts the target data with the pre-agreed encryption key to obtain the data to be processed, and the blockchain node decrypts the data to be processed with the pre-agreed decryption key to obtain the target data. The sensitive information type to which the data to be classified and graded belongs may be, for example, a name, an identification number, a mobile phone number, or an address, or may be, for example, personal natural information or personal basic profile information. The sensitive information level to which the data to be classified and graded belongs may be, for example, level 1, level 2, or level 3.

In step S206, the blockchain node runs a pre-deployed smart contract for encryption and encrypts the determined sensitive information type and sensitive information level by using the encryption key contained in the target data, obtaining the encrypted sensitive information type and the encrypted sensitive information level. The encryption key is configured by the data processing requester and may be a symmetric key or an asymmetric key. Finally, the blockchain node runs a pre-deployed smart contract for sending information and returns the encrypted sensitive information type and the encrypted sensitive information level to the data processing requester.
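The round trip of steps S202 to S206 can be sketched as follows, purely for illustration. The sketch assumes symmetric keys on both legs, and `toy_cipher` is an insecure placeholder (XOR with a hash-derived keystream) standing in for whatever real cipher an implementation would use. The function names, the JSON field names, and the type-to-level table in `classify` are all hypothetical.

```python
import hashlib
import json


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher: XOR with a SHA-256 counter keystream. NOT secure.

    It is its own inverse, so the same call encrypts and decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def requester_pack(agreed_key: bytes, result_key: bytes,
                   sampled_data, metadata) -> bytes:
    """Requester side: package the target data and encrypt it."""
    target = {
        "encryption_key": result_key.hex(),  # key the node will use for the reply
        "sampled_data": sampled_data,
        "metadata": metadata,
    }
    return toy_cipher(agreed_key, json.dumps(target).encode("utf-8"))


def classify(target: dict) -> dict:
    """Hypothetical stand-in for the pre-deployed classification and grading scheme."""
    content = (target.get("metadata") or {}).get("content")
    table = {"name": ("name", 2), "id_number": ("identification number", 3)}
    info_type, level = table.get(content, ("unknown", 1))
    return {"sensitive_information_type": info_type,
            "sensitive_information_level": level}


def node_process(agreed_key: bytes, pending: bytes) -> bytes:
    """Node side: S204 (decrypt, classify and grade) then S206 (encrypt the result)."""
    target = json.loads(toy_cipher(agreed_key, pending).decode("utf-8"))
    result = classify(target)
    result_key = bytes.fromhex(target["encryption_key"])
    return toy_cipher(result_key, json.dumps(result).encode("utf-8"))
```

The requester would then recover the result by applying its own key to the reply, for example `json.loads(toy_cipher(result_key, reply))`. Carrying the reply key inside the encrypted target data means only the requester and the node ever see it.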

As can be seen from the above description, the target data may contain three different combinations of data. The process of step S204 in Fig. 2 is described in detail below according to the data contained in the target data.

In one case, the target data includes the encryption key configured by the data processing requester, sampled data of the data to be classified and graded, and metadata of the data to be classified and graded. Correspondingly, in step S204, determining the sensitive information type and the sensitive information level to which the data to be classified and graded belongs according to the target data specifically includes:

(a1) determining the sensitive information type to which the sampled data belongs in at least one of the following ways: determining the sensitive information type to which the sampled data belongs according to the metadata; determining the sensitive information type to which the sampled data belongs by comparing the sampled data with a pre-configured data source; determining the sensitive information type to which the sampled data belongs according to a first sensitive information type determination model defined by the data processing requester; determining the sensitive information type to which the sampled data belongs according to a pre-configured second sensitive information type determination model;

(a2) determining the sensitive information level of the sampled data according to the sensitive information type of the sampled data;

(a3) determining the sensitive information type and the sensitive information level of the data to be classified and graded according to the sensitive information type and the sensitive information level of the sampled data.
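Actions (a1) to (a3) can be illustrated with the following sketch. It treats each way of determining the type as a function that returns either a type or `None`, tries the configured functions in order, maps the type to a level with a lookup table, and takes the most frequent type among the samples as the type of the whole data set. The majority vote, the lookup table `LEVEL_BY_TYPE`, the metadata field `content`, the data source `KNOWN_NAMES`, and the rule inside `by_model` are all assumptions of this sketch; the embodiments do not prescribe them.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

# A determiner inspects one sample plus the metadata and may return a type.
Determiner = Callable[[str, dict], Optional[str]]

KNOWN_NAMES = {"Zhang San", "Li Si"}                    # hypothetical pre-configured data source
LEVEL_BY_TYPE = {"name": 2, "mobile phone number": 3}   # hypothetical type-to-level table


def by_metadata(sample: str, metadata: dict) -> Optional[str]:
    """Way 1: read the type from the metadata (field name is an assumption)."""
    return metadata.get("content")


def by_data_source(sample: str, metadata: dict) -> Optional[str]:
    """Way 2: compare the sample with a pre-configured data source."""
    return "name" if sample in KNOWN_NAMES else None


def by_model(sample: str, metadata: dict) -> Optional[str]:
    """Ways 3 and 4: placeholder for a requester-defined or pre-configured model."""
    return "mobile phone number" if sample.isdigit() and len(sample) == 11 else None


def determine_type(sample: str, metadata: dict,
                   determiners: Sequence[Determiner]) -> Optional[str]:
    """(a1) Return the first type produced by any configured determiner."""
    for determiner in determiners:
        found = determiner(sample, metadata)
        if found is not None:
            return found
    return None


def classify_dataset(samples: Sequence[str], metadata: dict,
                     determiners: Sequence[Determiner]
                     ) -> Tuple[Optional[str], Optional[int]]:
    """(a1)-(a3): type per sample, then level, then aggregate by majority vote."""
    votes = Counter(
        t for t in (determine_type(s, metadata, determiners) for s in samples)
        if t is not None
    )
    if not votes:
        return None, None
    info_type = votes.most_common(1)[0][0]
    return info_type, LEVEL_BY_TYPE.get(info_type)  # (a2) level follows from type
```

Passing a subset of the determiners, such as `[by_data_source, by_model]`, corresponds to using "at least one" of the ways listed in (a1).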

In another case, the target data includes the encryption key configured by the data processing requester and metadata of the data to be classified and graded. Correspondingly, in step S204, determining the sensitive information type and the sensitive information level to which the data to be classified and graded belongs according to the target data specifically includes:

(b1) acquiring the data storage address of the data to be classified and graded from the metadata, and acquiring sampled data of the data to be classified and graded based on the data storage address;

(b2) determining the sensitive information type to which the sampled data belongs in at least one of the following ways: determining the sensitive information type to which the sampled data belongs according to the metadata; determining the sensitive information type to which the sampled data belongs by comparing the sampled data with a pre-configured data source; determining the sensitive information type to which the sampled data belongs according to a first sensitive information type determination model defined by the data processing requester; determining the sensitive information type to which the sampled data belongs according to a pre-configured second sensitive information type determination model;

(b3) determining the sensitive information level of the sampled data according to the sensitive information type of the sampled data;

(b4) determining the sensitive information type and the sensitive information level of the data to be classified and graded according to the sensitive information type and the sensitive information level of the sampled data.

Comparing the two processes (a1) to (a3) and (b1) to (b4), the difference is as follows. In the former, the target data includes sampled data of the data to be classified and graded, and the sensitive information type of the sampled data is determined directly in at least one of the four ways. In the latter, the target data does not include sampled data, so the sampled data is first obtained according to the metadata of the data to be classified and graded, and the sensitive information type of the sampled data is then determined in at least one of the four ways. What the two processes have in common is that, after the sensitive information type of the sampled data is determined, the sensitive information level of the sampled data is determined according to that type, and the sensitive information type and the sensitive information level of the data to be classified and graded are determined according to the sensitive information type and the sensitive information level of the sampled data. These two processes are described in detail below.

In action (b1) above, the metadata contains the data storage address of the data to be classified and graded, and the data storage address may include the IP address of a data storage server and a data storage directory. The data storage address of the data to be classified and graded is first obtained from the metadata, and the sampled data of the data to be classified and graded is then obtained based on the data storage address. Acquiring the sampled data of the data to be classified and graded based on the data storage address specifically includes:

(b11) acquiring the data to be classified and graded according to the data storage address, and acquiring the data content description information of the data to be classified and graded from the metadata;

(b12) when it is determined, according to the data content description information, that a data screening rule corresponding to the data to be classified and graded is configured, screening the data to be classified and graded according to the corresponding data screening rule to obtain data matching the data content description information, and sampling the obtained data to obtain the sampled data of the data to be classified and graded;

(b13) when it is determined, according to the data content description information, that no data screening rule corresponding to the data to be classified and graded is configured, sampling the data to be classified and graded to obtain the sampled data of the data to be classified and graded.

First, in action (b11), the data stored at the data storage address, that is, the data to be classified and graded, is obtained. Metadata is data that describes data, and the metadata of the data to be classified and graded includes data content description information. For example, data content description information of "name" indicates that the data content of the data to be classified and graded is names. The data content description information of the data to be classified and graded recorded in the metadata is thereby acquired.

In this embodiment, at least one data screening rule is pre-configured, and each data screening rule applies to a corresponding kind of data content. For example, the data content "name" is configured with one data screening rule, and the data content "identity card" is configured with another. A data screening rule is used to select the data that conforms to the corresponding data content, so that dirty data is removed. In this embodiment, whether a data screening rule corresponding to the data to be classified and graded is configured is determined according to the data content description information of the data to be classified and graded.

If such a rule is configured, then in action (b12) the data to be classified and graded is screened according to the corresponding data screening rule to obtain the data matching its data content description information, and the obtained data is sampled to obtain the sampled data of the data to be classified and graded. If no such rule is configured, then in action (b13) the data to be classified and graded is sampled directly to obtain the sampled data.
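The branch between actions (b12) and (b13) can be sketched as below, by way of example only. The function `sample_dataset`, the representation of a data screening rule as a predicate, the rule registry keyed by data content description, and the use of seeded random sampling are assumptions of this sketch.

```python
import random
from typing import Callable, Dict, List, Sequence

# A data screening rule is modelled here as a predicate over one data item.
ScreeningRule = Callable[[str], bool]


def sample_dataset(data: Sequence[str],
                   content_description: str,
                   rules: Dict[str, ScreeningRule],
                   sample_size: int,
                   seed: int = 0) -> List[str]:
    """Sample the data, screening it first when a matching rule is configured."""
    rule = rules.get(content_description)
    if rule is not None:
        # (b12): keep only items that match the data content description.
        candidates = [item for item in data if rule(item)]
    else:
        # (b13): no rule configured, so sample the data directly.
        candidates = list(data)
    # Extract a predetermined amount of data (fewer if not enough remain).
    count = min(sample_size, len(candidates))
    return random.Random(seed).sample(candidates, count)
```

With `rules = {"name": lambda s: 2 <= len(s) <= 3}`, a call for the content description `"name"` samples only from items of two or three characters, while a call for a description with no rule samples from everything.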

In one embodiment, the data content description information of the data to be classified and graded indicates that the data content is identification numbers, and the data screening rule corresponding to the data to be classified and graded includes a regular expression. Screening the data to be classified and graded according to the corresponding data screening rule to obtain data matching the data content description information specifically includes: judging whether each item of the data to be classified and graded conforms to the regular expression, and, if so, screening out the conforming item as data matching the data content description information of the data to be classified and graded.

For example, a regular expression can be derived from identity card data. If an item of the data to be classified and graded conforms to the regular expression, its data content is an identity card number, and the conforming item is screened out as data matching the data content description information of the data to be classified and graded, thereby eliminating dirty data from the data to be classified and graded.
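A regular-expression screening rule of this kind might look like the following sketch. The pattern shown (17 digits followed by a digit or the letter X) is a simplified assumption chosen for illustration; the regular expression actually derived in the embodiment is not reproduced in this text. The function name `screen_by_regex` is likewise hypothetical.

```python
import re
from typing import List, Pattern, Sequence

# Hypothetical, simplified pattern for an 18-character identification number.
ID_PATTERN: Pattern[str] = re.compile(r"\d{17}[\dXx]")


def screen_by_regex(items: Sequence[str],
                    pattern: Pattern[str] = ID_PATTERN) -> List[str]:
    """Keep only the items that match the pattern in full; the rest is dirty data."""
    return [item for item in items if pattern.fullmatch(item)]
```

Using `fullmatch` rather than `search` ensures that an item merely containing an 18-character run, such as a longer free-text field, is not kept.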

In one embodiment, the data content description information of the data to be classified and graded indicates that the data content is names, and the data screening rule corresponding to the data to be classified and graded includes a normal distribution function. Screening the data to be classified and graded according to the corresponding data screening rule to obtain data matching the data content description information specifically includes: substituting the length of an item of the data to be classified and graded into the normal distribution function to obtain a function result, and, if the function result meets a preset result requirement, screening out that item as data matching the data content description information.

Specifically, a normal distribution function is obtained by sampling name lengths and fitting them. Under this function, names of one character or four characters have a low probability, and names of two or three characters have a high probability. Assume that the probability of the name length fits the normal distribution function:

wherein the number u is 2.5,

is 1, x is the name length, e.g., 1, 2, 3, 4, 5, f (x) is the corresponding probability.

In this embodiment, the length of each item of the data to be classified and graded is substituted into the normal distribution function to obtain a function result. If the function result meets the preset result requirement, for example if the function result is greater than 0.5, the item is screened out as data matching the data content description information "name", thereby eliminating dirty data from the data to be classified and graded. Here, the length of an item refers to the length of the data stored at a given row and column of the data storage table.
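A minimal sketch of the length-based screening follows, assuming the function is the usual normal probability density with mean 2.5 and standard deviation 1. The names `name_length_density` and `screen_names` are hypothetical. Note that with a standard deviation of 1 this density peaks at roughly 0.399, so the sketch uses an assumed threshold of 0.3 instead of the 0.5 mentioned as an example in the text.

```python
import math

MU = 2.5      # mean name length, as stated in the text
SIGMA = 1.0   # standard deviation, as stated in the text


def name_length_density(x, mu=MU, sigma=SIGMA):
    """Normal probability density evaluated at name length x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def screen_names(values, threshold=0.3):
    """Keep values whose length scores above the threshold (0.3 is an assumed value)."""
    return [v for v in values if name_length_density(len(v)) > threshold]


candidates = ["李", "张伟", "王小明", "欧阳娜娜娜", "N/A-unknown"]
kept = screen_names(candidates)
print(kept)
```

With these parameters, lengths 2 and 3 score about 0.352 and lengths 1 and 4 score about 0.130, which matches the statement that two- and three-character names are the likely ones.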

In the above-described operation (b12), the data obtained by the screening is sampled to obtain sampled data of the classified data. In the above-mentioned operation (b13), the classification data to be classified is directly sampled, and the sampled data of the classification data to be classified is obtained. In both actions, a predetermined amount of data may be extracted as the data is sampled.

Although the metadata records the data content description information of the data to be classified and graded, that data may still be mixed with dirty data. Therefore, in this embodiment, when a data screening rule corresponding to the data to be classified and graded is configured, the data is first screened according to that rule to obtain data matching the data content description information, and the screened data is then sampled to obtain the sample data. In this way, not all of the data to be classified and graded needs to be obtained, and the quality of the data source and the accuracy of classification and grading are improved with lower resource consumption (CPU, memory, and the like).

In the above action (a1) or (b2), the sensitive information type to which the sample data belongs is determined according to at least one of the following determination manners:

(1) determining the type of sensitive information to which the sample data belongs according to the metadata;

(2) determining the type of sensitive information to which the sampled data belongs in a mode of comparing the sampled data with a pre-configured data source;

(3) determining a sensitive information type to which the sampled data belongs according to a first sensitive information type determination model defined by a data processing requester;

(4) determining a sensitive information type to which the sampling data belongs according to a pre-configured second sensitive information type determination model;

Determining the sensitive information type to which the sample data belongs according to at least one of the above determination manners specifically includes:

obtaining data content description information of the data to be classified and graded from the metadata; obtaining a pre-configured rule for determining the classification manner based on the data content description information; selecting at least one of the above determination manners as a target determination manner according to the data content description information and the rule; and determining the sensitive information type to which the sample data belongs according to the target determination manner.

Specifically, the data content description information of the data to be classified and graded is first obtained from the metadata; for example, the data content description information is "mobile phone number". Next, a rule pre-configured by the data processing requester for determining the classification manner based on the data content description information is obtained. The rule indicates which of the four manners is to be selected, according to the data content description information, to determine the sensitive information type to which the sample data belongs. Then, according to the obtained data content description information and the rule, at least one of the four determination manners is selected as the target determination manner, and the sensitive information type to which the sample data belongs is determined according to the target determination manner.
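The selection of a target determination manner can be illustrated with a simple lookup table. This is only a sketch: the table contents, the `content_description` metadata key, the fallback choice and the function name are all assumptions for illustration, not details given in the specification.

```python
# Hypothetical rule table configured by the data processing requester:
# data content description -> determination manners (1)-(4) to apply.
MANNER_RULES = {
    "mobile phone number": [3, 4],
    "nationality": [2],
    "manager": [1],
}
DEFAULT_MANNERS = [4]  # assumption: fall back to the pre-configured models


def select_target_manners(metadata, rules=MANNER_RULES, default=DEFAULT_MANNERS):
    """Return the determination manners selected for the metadata's content description."""
    description = metadata.get("content_description")
    return list(rules.get(description, default))


targets = select_target_manners({"content_description": "mobile phone number"})
print(targets)
```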

In one embodiment, in the manner (1) above, the type of sensitive information to which the sample data belongs is determined according to the metadata, and specifically:

(11) acquiring data source description information of classified data to be classified from metadata, and acquiring a preset rule for determining the type of sensitive information based on the data source description information;

(12) and determining the sensitive information type of the sampled data according to the data source description information and the rule for determining the sensitive information type based on the data source description information.

First, data source description information of the data to be classified and graded is obtained from the metadata. The data source description information describes the data source of the data, which includes the library name, table name, row name, column name, data name, and the like; for example, the data source description information may be "xx. Next, a rule pre-configured by the data processing requester for determining the sensitive information type based on the data source description information is acquired. The rule indicates which sensitive information type the data belongs to when its data source description information contains given content.

And finally, determining the sensitive information type of the sampled data according to the acquired data source description information and a rule for determining the sensitive information type based on the data source description information. For example, the rule for determining the sensitive information type based on the data source description information indicates that the sensitive information type to which the hierarchical data to be classified belongs is "manager" when the library name of the data source is xx, the table name is desens, the row name is rule, the column name is info, and the data name is winner.

In this embodiment, the sensitive information type of the sample data can be determined according to the data source description information of the classified data recorded in the metadata of the classified data to be classified and the rule, which is configured in advance by the data processing requester, for determining the sensitive information type based on the data source description information, and the determination mode is simple and efficient.
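The rule lookup based on data source description information might look like the sketch below. The field names (`library`, `table`, `row`, `column`, `name`) and the function name are hypothetical; the rule values reuse the example given above.

```python
# Each rule pairs a set of required source fields with a sensitive information type.
# The values mirror the example in the text; the dictionary keys are assumed names.
SOURCE_RULES = [
    (
        {"library": "xx", "table": "desens", "row": "rule", "column": "info", "name": "winner"},
        "manager",
    ),
]


def type_from_data_source(source, rules=SOURCE_RULES):
    """Return the type of the first rule whose fields all match, or None if no rule matches."""
    for condition, sensitive_type in rules:
        if all(source.get(field) == expected for field, expected in condition.items()):
            return sensitive_type
    return None


source_description = {
    "library": "xx", "table": "desens", "row": "rule", "column": "info", "name": "winner",
}
detected = type_from_data_source(source_description)
print(detected)
```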

In another embodiment, in the above mode (1), the sensitive information type to which the sample data belongs is determined according to the metadata, specifically:

(13) obtaining remark information of the classified data to be classified from the metadata, and obtaining a preset rule for determining the type of the sensitive information based on the remark information;

(14) and determining the sensitive information type of the sampled data according to the remark information and the rule for determining the sensitive information type based on the remark information.

Firstly, obtaining remark information of the classified data to be classified from the metadata, wherein the remark information can be added by database maintenance personnel and is used for recording remarks of the classified data to be classified. And acquiring a rule which is configured in advance by the data processing requester and is used for determining the type of the sensitive information based on the remark information, wherein the rule is used for indicating the type of the sensitive information of the classified data when the remark information of the classified data comprises what content.

And finally, determining the sensitive information type of the sampled data according to the acquired remark information and a rule for determining the sensitive information type based on the remark information. For example, it is indicated in the rule for determining the sensitive information type based on the remark information that when "owner" is included in the remark information, the sensitive information type to which the hierarchical data to be classified belongs is "manager".

In the embodiment, the sensitive information type of the sample data can be determined according to the remark information of the classified data to be recorded in the metadata of the classified data to be classified and the rule which is pre-configured by the data processing requester and is used for determining the sensitive information type based on the remark information, and the determination mode is simple and efficient.
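A remark-based rule can be sketched as a keyword lookup. The mapping reuses the "owner" to "manager" example from the text; the function name and the case-insensitive substring matching are assumptions.

```python
# Keyword found in the remark -> sensitive information type (example from the text).
REMARK_RULES = {"owner": "manager"}


def type_from_remark(remark, rules=REMARK_RULES):
    """Return the type for the first keyword contained in the remark, or None."""
    text = (remark or "").lower()
    for keyword, sensitive_type in rules.items():
        if keyword in text:
            return sensitive_type
    return None


remark_type = type_from_remark("Owner of the desensitization rule")
print(remark_type)
```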

In an embodiment, in the above mode (2), the sensitive information type to which the sample data belongs is determined by comparing the sample data with a preconfigured data source, specifically:

(21) for each data source configured in advance, comparing the sampled data with the data in the data source to determine the same data as the data in the data source in the sampled data and determine the proportion of the same data in the sampled data;

(22) and determining the sensitive information type of the sampled data according to each proportion and the sensitive information type corresponding to each data source.

First, a plurality of data sources are configured in advance in a blockchain node, and each data source stores a plurality of pieces of data. For each data source, the sampled data is compared with the data in the data source so as to determine the same data in the sampled data as the data in the data source and determine the data volume ratio of the same data in the sampled data. For example, for any data source, the sampled data is compared with the data in the data source, it is determined that there are 100 pieces of data in the sampled data that are the same as the data in the data source, and there are 200 pieces of data in the sampled data, so that it is determined that the data amount of the same data in the sampled data accounts for 50%. Through this process, a proportion that results in the same data can be determined for each data source.

Then, the highest proportion is selected from the proportions, the data source corresponding to the highest proportion is determined, and the sensitive information type corresponding to that data source is determined as the sensitive information type to which the sample data belongs. For example, there are 3 data sources, namely constellation, ethnicity and nationality, with proportions of 20%, 90% and 10% respectively. The data source with the 90% proportion is selected, and its corresponding sensitive information type is the sensitive information type to which the sample data belongs.

In the embodiment, the sensitive information type of the sampled data is determined by pre-configuring a plurality of data sources and calculating the proportion of the same data in the sampled data as the data in each data source, and the determination method is simple, rapid and efficient.
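The comparison with pre-configured data sources can be sketched as below. The function names and the sample contents are hypothetical; the proportion is computed as in the text (matching items divided by all sampled items), and the type of the data source with the highest proportion is returned.

```python
def match_ratio(sampled, source_values):
    """Share of sampled items that also appear in the data source."""
    if not sampled:
        return 0.0
    source_set = set(source_values)
    return sum(1 for value in sampled if value in source_set) / len(sampled)


def type_by_data_source(sampled, sources):
    """Return (best_type, ratios); `sources` maps a sensitive type to its reference values."""
    ratios = {name: match_ratio(sampled, values) for name, values in sources.items()}
    best_type = max(ratios, key=ratios.get)  # on a tie, the first configured source wins
    return best_type, ratios


# Illustrative reference data; real data sources would hold far more entries.
sources = {
    "constellation": ["Aries", "Leo", "Virgo"],
    "nationality": ["China", "France", "Brazil"],
}
sampled = ["China", "France", "Leo", "n/a"]
best_type, ratios = type_by_data_source(sampled, sources)
print(best_type, ratios)
```

A practical variant might also require the best proportion to exceed a minimum value before accepting it; the text does not state such a requirement, so the sketch omits it.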

In an embodiment, in the above manner (3), the determining, according to the first sensitive information type determination model customized by the data processing requester, the sensitive information type to which the sample data belongs is specifically:

(31) the sampling data are sequentially input into each first sensitive information type determining model, and a first processing result of each first sensitive information type determining model on the sampling data is obtained;

(32) determining the type of sensitive information to which the sampled data belongs according to each first processing result; the first sensitive information type determining model corresponds to the sensitive information types one by one.

In this embodiment, a plurality of first sensitive information type determination models are trained in advance, and each first sensitive information type determination model corresponds to one sensitive information type and is used for identifying whether the sample data is the sensitive information type. The training process of the first sensitive information type determination model will be described later.

In the action (31), the sample data is sequentially input into each first sensitive information type determination model for processing, and a first processing result of each first sensitive information type determination model on the sample data is obtained, where the first processing result may include "yes" and "no", where "yes" indicates that the sample data belongs to the sensitive information type corresponding to the corresponding first sensitive information type determination model, and "no" indicates that the sample data does not belong to the sensitive information type corresponding to the corresponding first sensitive information type determination model.

In act (32), a type of sensitive information to which the sampled data belongs is determined based on the respective first processing results. Specifically, the sensitive information type corresponding to the first sensitive information type determination model with the processing result of yes is determined as the sensitive information type to which the sample data belongs.
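Actions (31) and (32) can be sketched as a loop over one binary model per sensitive information type. In this sketch the "models" are plain predicate functions standing in for trained classifiers, and the patterns inside them are assumptions used only to make the example runnable.

```python
import re


def phone_model(sample):
    """Stand-in for a trained model: 'yes' if every item looks like an 11-digit mobile number."""
    return bool(sample) and all(re.fullmatch(r"1\d{10}", v) for v in sample)


def id_number_model(sample):
    """Stand-in for a trained model: 'yes' if every item looks like an 18-character ID number."""
    return bool(sample) and all(re.fullmatch(r"\d{17}[\dXx]", v) for v in sample)


# One model per sensitive information type (one-to-one correspondence).
MODELS = {
    "mobile phone number": phone_model,
    "identification number": id_number_model,
}


def types_from_models(sample, models=MODELS):
    """Feed the sample to each model in turn and collect the types whose result is 'yes'."""
    return [sensitive_type for sensitive_type, model in models.items() if model(sample)]


result = types_from_models(["13800138000", "13912345678"])
print(result)
```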

The first sensitive information type determination model is a user-defined model and can be obtained by training in the following mode: the method comprises the steps of obtaining a model training sample uploaded by a data processing requester through a block chain system and a sensitive information type label to which the model training sample belongs, and training a first sensitive information type determination model according to the model training sample and the sensitive information type label to which the model training sample belongs.

Specifically, the sensitive information type may be an identification card, a name, a mobile phone number, or the like. The first sensitive information type determination model may be a classification model. Taking the mobile phone number as an example of the sensitive information type, the corresponding first sensitive information type determination model may be obtained by training in the following manner. First, the data processing requester initiates a transaction on the blockchain system. The transaction carries a plurality of mobile phone numbers and a plurality of character strings other than mobile phone numbers, together with the sensitive information type label "1" for the mobile phone numbers and the sensitive information type label "0" for the other character strings. The blockchain node acquires the transaction and thereby obtains the mobile phone numbers with their label "1" and the other character strings with their label "0". A pre-constructed classification model is then trained with the acquired data until the model converges, and the converged classification model is used as the first sensitive information type determination model for identifying mobile phone numbers.

In the embodiment, the model training samples provided by the data processing requester and the sensitive information type labels to which the model training samples belong are used for training to obtain the first sensitive information type determination model defined by the data processing requester, and the sensitive information types to which the sampled data belongs are identified by using the trained first sensitive information type determination models, so that the method has the advantages of accuracy and reliability in identification.

In an embodiment, in the above manner (4), the sensitive information type to which the sample data belongs is determined according to a second sensitive information type determination model configured in advance, and specifically:

(41) the sampling data are sequentially input into each second sensitive information type determining model, and a second processing result of each second sensitive information type determining model on the sampling data is obtained;

(42) determining the type of the sensitive information to which the sampled data belongs according to each second processing result; and the second sensitive information type determining model corresponds to the sensitive information types one by one.

In this embodiment, a plurality of second sensitive information type determination models are trained in advance. Each second sensitive information type determination model corresponds to one sensitive information type and is used for identifying whether the sample data is of that sensitive information type. The training process of the second sensitive information type determination model will be described later. The two kinds of models differ as follows: the first sensitive information type determination model is customized by the data processing requester and is trained with training samples provided by the data processing requester, whereas the second sensitive information type determination model is trained by the blockchain node with training samples that the node obtains itself, and its training process is independent of the data processing requester.

In the action (41), the sampled data is sequentially input into each second sensitive information type determination model for processing, and a second processing result of each second sensitive information type determination model on the sampled data is obtained, where the second processing result may include "yes" and "no", where "yes" indicates that the sampled data belongs to the sensitive information type corresponding to the corresponding second sensitive information type determination model, and "no" indicates that the sampled data does not belong to the sensitive information type corresponding to the corresponding second sensitive information type determination model.

In act (42), a sensitive information type to which the sampled data belongs is determined based on the respective second processing results. Specifically, the sensitive information type corresponding to the second sensitive information type determination model with the processing result of yes is determined as the sensitive information type to which the sample data belongs.

The sensitive information type can be an identity card, a name, a mobile phone number and the like. The second sensitive information type determination model may be a classification model, and for example, the sensitive information type is a mobile phone number, and the corresponding second sensitive information type determination model may be obtained by training in the following manner. The method comprises the steps of obtaining a plurality of mobile phone numbers as positive samples, obtaining other character strings except the mobile phone numbers as negative samples, setting sample labels on the positive samples and the negative samples respectively, training a pre-constructed classification model by using the positive samples and the negative samples after the labels are set until the model converges, and taking the classification model with the converged training as a second sensitive information type determination model for identifying the mobile phone numbers.

In this embodiment, the second sensitive information type determination models trained by the blockchain node are used to identify the sensitive information type to which the sample data belongs, which makes the identification accurate and reliable.

As can be seen from the above description, in the above action (a1) or (b2), the sensitive information type to which the sample data belongs is determined according to at least one of the above four determination manners (1) - (4). In this embodiment, after determining the sensitive information type to which the sample data belongs according to at least two of the above four determination manners, and obtaining at least two sensitive information types, a pre-configured type screening policy may also be obtained, where the policy may be configured by the data processing requester, and according to the type screening policy, a sensitive information type is screened from the at least two sensitive information types as a final sensitive information type of the sample data.

When multiple sensitive information types have been determined in multiple manners, the type screening policy may specify which manner's result prevails, so that the sensitive information type determined in that manner is used as the final sensitive information type of the sample data. For example, the type screening policy may record the priorities of the four determination manners (1) to (4); when multiple sensitive information types have been determined in multiple manners, the sensitive information type determined in the manner with the highest priority is taken as the final sensitive information type of the sample data.
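A priority-based type screening policy can be sketched as follows. The priority order and the function name are assumptions for illustration; the text only states that the policy may record priorities of manners (1) to (4).

```python
# Hypothetical priority order of determination manners, highest priority first.
MANNER_PRIORITY = [3, 4, 2, 1]


def pick_final_type(candidates, priority=MANNER_PRIORITY):
    """Pick the type produced by the highest-priority manner.

    `candidates` maps a determination manner (1-4) to the type it produced.
    """
    for manner in priority:
        if manner in candidates:
            return candidates[manner]
    return None


final_type = pick_final_type({1: "manager", 2: "nationality", 4: "mobile phone number"})
print(final_type)
```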

In the above action (a2) or (b3), the sensitive information level to which the sample data belongs is determined according to the sensitive information type to which the sample data belongs, and the action is specifically: and searching a preset type level mapping relation according to the sensitive information type of the sample data to obtain the sensitive information level of the sample data.

The sensitive information type to which the sample data belongs may include a sensitive information field, such as a mobile phone number or an identification number. In an embodiment, a pre-configured mapping relationship may be searched according to the sensitive information field to obtain sensitive information descriptions one or more levels above the field. For example, the level immediately above is personal basic information, company basic information, and the like, and the level above that is personal attribute information, company financial information, and the like. These upper-level sensitive information descriptions are also used as the sensitive information type.
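Both lookups, type to level and field to upper-level descriptions, can be sketched with two dictionaries. The numeric levels and the parent assignments below are hypothetical values chosen for illustration; only the category names come from the text.

```python
# Hypothetical type -> level mapping (the numeric levels are assumed values).
TYPE_LEVEL = {"mobile phone number": 3, "identification number": 4}

# Hypothetical field -> parent description mapping, using the categories named in the text.
PARENT_TYPE = {
    "mobile phone number": "personal basic information",
    "identification number": "personal basic information",
    "personal basic information": "personal attribute information",
}


def level_for_type(sensitive_type, mapping=TYPE_LEVEL):
    """Look up the sensitive information level for a type; None if the type is not mapped."""
    return mapping.get(sensitive_type)


def upper_level_descriptions(sensitive_type, parents=PARENT_TYPE):
    """Walk up the parent mapping and return the descriptions from nearest to farthest."""
    chain = []
    current = sensitive_type
    while current in parents and parents[current] not in chain:
        current = parents[current]
        chain.append(current)
    return chain


level = level_for_type("mobile phone number")
upper = upper_level_descriptions("mobile phone number")
print(level, upper)
```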

In the above-mentioned action (a3) or (b4), the sensitive information type and the sensitive information level to which the classified data belongs are determined according to the sensitive information type and the sensitive information level to which the sample data belongs, and specifically, the sensitive information type and the sensitive information level to which the sample data belongs are used as the sensitive information type and the sensitive information level to which the classified data belongs. Since the sampling data is from the classified data to be classified and is the sampling representation of the classified data to be classified, the sensitive information type and the sensitive information level to which the sampling data belongs are used as the sensitive information type and the sensitive information level to which the classified data belongs.

In another case, the target data includes an encryption key configured by the data processing requester and metadata of the data to be classified and graded. Correspondingly, in step S204, determining the sensitive information type and the sensitive information level to which the data belongs according to the target data specifically includes:

(c1) determining, according to the metadata, the sensitive information type to which the sample data belongs in at least one of the following determination manners: acquiring data source description information of the data to be classified and graded from the metadata, acquiring a pre-configured rule for determining the sensitive information type based on the data source description information, and determining the sensitive information type to which the sample data belongs according to the data source description information and that rule; or acquiring remark information of the data to be classified and graded from the metadata, acquiring a pre-configured rule for determining the sensitive information type based on the remark information, and determining the sensitive information type to which the sample data belongs according to the remark information and that rule;

(c2) determining the sensitive information level of the sampled data according to the sensitive information type of the sampled data;

(c3) and determining the sensitive information type and the sensitive information level of the classified data according to the sensitive information type and the sensitive information level of the sampled data.

In this case, the data content description information of the data to be classified and graded may be obtained from the metadata, and a pre-configured rule for determining the classification manner based on the data content description information may be obtained. According to the data content description information and the rule, at least one of the two determination manners may be selected as a target determination manner, and the sensitive information type to which the sample data belongs may be determined according to the target determination manner and the metadata.

Correspondingly, after the sensitive information type to which the sample data belongs is determined in both of the two determination manners and two sensitive information types are obtained, a pre-configured type screening policy may be obtained. The policy may be configured by the data processing requester, and one sensitive information type is screened from the obtained sensitive information types according to the type screening policy as the final sensitive information type of the sample data.

The above is a process of determining the sensitive information type and the sensitive information level to which the classified data belongs in the case where the target data includes the encryption key and the metadata, and the detailed process of the process is the same as the foregoing description, and reference may be made to the foregoing description, and will not be repeated here.

In another case, the target data includes an encryption key configured by the data processing requester, and sample data of the classified data. Correspondingly, in step S204, according to the target data, the sensitive information type and the sensitive information level to which the classified data belongs are determined, specifically:

(d1) determining, according to the sample data, the sensitive information type to which the sample data belongs in any one of the following determination manners: comparing the sample data with a pre-configured data source; using a first sensitive information type determination model defined by the data processing requester; or using a pre-configured second sensitive information type determination model;

(d2) determining the sensitive information level of the sample data according to the sensitive information type of the sample data;

(d3) and determining the sensitive information type and the sensitive information level of the classified data according to the sensitive information type and the sensitive information level of the sampled data.

The above is a process of determining the sensitive information type and the sensitive information level to which the classified data belongs in the case that the target data includes an encryption key and sample data, and the detailed process of the process is the same as the foregoing description, and reference may be made to the foregoing description, and the description is not repeated here.

In one embodiment, the blockchain node has a trusted execution environment (TEE). The TEE decrypts the data to be processed to obtain the target data, determines the sensitive information type and the sensitive information level of the data to be classified and graded according to the target data, and encrypts the sensitive information type and the sensitive information level with the encryption key. The act of returning the encrypted sensitive information type and sensitive information level to the data processing requester may also be performed by the TEE.

In this embodiment, when the blockchain node has a TEE, the data is classified and graded inside the TEE, so that the data processing requester can use the data classification and grading scheme safely and efficiently, and the privacy and reliability of data classification and grading are further improved.

In one embodiment, after the blockchain node returns the encrypted sensitive information type and the encrypted sensitive information level to the data processing requester through the blockchain system in step S208, the data processing requester may decrypt the received information with the decryption key corresponding to the encryption key to obtain the sensitive information type and the sensitive information level of the data to be classified and graded. The data is thereby classified and graded by means of the data classification and grading service on the blockchain.

In one embodiment, after the encrypted sensitive information type and the encrypted sensitive information level are returned to the data processing requester through the blockchain system, the method further includes: acquiring the acquisition time of the data to be processed and the generation time of the encrypted sensitive information type and the encrypted sensitive information level; generating a data processing record according to the data to be processed, the acquisition time, the generation time, the encrypted sensitive information type and the encrypted sensitive information level; and storing the data processing record in the blockchain system.

Specifically, the time at which the data to be processed was received is taken as its acquisition time, and the generation time of the encrypted sensitive information type and the encrypted sensitive information level is obtained. The data to be processed, the acquisition time, the generation time, the encrypted sensitive information type and the encrypted sensitive information level are combined into a data processing record, which is stored in the blockchain system. This keeps a complete record of the data processing process and facilitates later tracing.
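Assembling such a data processing record can be sketched as below. The field names, the hex and JSON encodings, and the SHA-256 record identifier are all assumptions for illustration. The specification does not prescribe a record format, and writing the record to the blockchain system is not shown.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_processing_record(pending_data, acquired_at, encrypted_type, encrypted_level, generated_at):
    """Combine the five items named in the text into one serialized record.

    Returns (record_id, payload); record_id is a SHA-256 digest of the payload
    (an assumed convention, useful as a lookup key when tracing later).
    """
    record = {
        "pending_data": pending_data.hex(),        # ciphertext received from the requester
        "acquired_at": acquired_at.isoformat(),    # time the ciphertext was received
        "generated_at": generated_at.isoformat(),  # time the encrypted result was produced
        "encrypted_type": encrypted_type.hex(),
        "encrypted_level": encrypted_level.hex(),
    }
    payload = json.dumps(record, sort_keys=True)
    record_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return record_id, payload


acquired = datetime(2022, 3, 25, 8, 0, 0, tzinfo=timezone.utc)
generated = datetime(2022, 3, 25, 8, 0, 2, tzinfo=timezone.utc)
record_id, payload = build_processing_record(b"\x01\x02", acquired, b"\xaa", b"\xbb", generated)
print(record_id, payload)
```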

The above describes the blockchain-based data processing procedure from the perspective of the blockchain node; the following describes it from the perspective of the data processing requester. Fig. 3 is a flowchart illustrating another blockchain-based data processing method according to one or more embodiments of the present disclosure. As shown in fig. 3, the method may include the following steps:

step S302, obtaining metadata and/or sampling data of the classified data, and obtaining a pre-configured encryption key;

step S304, the acquired data is used as target data, and the target data is packaged and encrypted to obtain data to be processed;

step S306, sending the data to be processed to a block chain node through a block chain system; the data to be processed is used for determining the sensitive information type and the sensitive information level of the classified data by the block chain node;

step S308, receiving the encrypted sensitive information type and the encrypted sensitive information level of the classified data returned by the block chain node; the encrypted sensitive information type and the encrypted sensitive information level are obtained by encrypting based on the encryption key.

The method of fig. 3 may be performed by a transaction device of the data processing requester. In step S302, the transaction device of the data processing requester obtains the pre-configured encryption key, and obtains the metadata and/or the sample data of the data to be classified. The metadata may be obtained from the database that stores the data to be classified, as the metadata describing that data. Obtaining the sample data of the data to be classified specifically includes:

(1) acquiring data content description information of the classified data from metadata of the classified data;

(2) when it is determined, according to the data content description information, that a data screening rule corresponding to the data to be classified is configured, screening the data to be classified according to that rule to obtain data matching the data content description information, and sampling the obtained data to obtain the sample data of the data to be classified;

(3) when it is determined, according to the data content description information, that no data screening rule corresponding to the data to be classified is configured, sampling the data to be classified directly to obtain its sample data.
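Actions (1) to (3) can be sketched as follows. This is an illustrative outline only: the function name `sample_for_classification`, the rule table keyed by content description, and the use of uniform random sampling are assumptions, since the specification does not fix a sampling strategy.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

# A screening rule decides whether one value matches the content description.
ScreeningRule = Callable[[str], bool]


def sample_for_classification(values: Sequence[str],
                              content_description: str,
                              rules: Dict[str, ScreeningRule],
                              sample_size: int,
                              seed: Optional[int] = None) -> List[str]:
    """Screen first if a rule is configured for the description, then sample."""
    rule = rules.get(content_description)
    if rule is not None:
        # Action (2): keep only values that match the content description.
        candidates = [v for v in values if rule(v)]
    else:
        # Action (3): no rule configured, so sample the data directly.
        candidates = list(values)
    rng = random.Random(seed)
    return rng.sample(candidates, min(sample_size, len(candidates)))
```

Screening before sampling keeps non-matching values out of the sample, so the later type determination sees only plausible candidates.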

Where the data content description information indicates that the data content is an identity card number, the corresponding data screening rule includes a regular expression. Screening the data to be classified according to the corresponding data screening rule to obtain data matching the data content description information includes: determining whether each item of the data to be classified conforms to the regular expression; and, if it does, selecting that item as data matching the data content description information.
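A regular-expression screen of this kind might look like the sketch below. The pattern is an assumption rather than the one used in the embodiments: it targets the 18-character resident identity number layout (region code, birth date, sequence code, check character) and does not verify the check character.

```python
import re
from typing import Iterable, List

# Assumed pattern for an 18-character identity card number.
ID_NUMBER_PATTERN = re.compile(
    r"[1-9][0-9]{5}"                # 6-digit region code
    r"(?:19|20)[0-9]{2}"            # birth year
    r"(?:0[1-9]|1[0-2])"            # birth month
    r"(?:0[1-9]|[12][0-9]|3[01])"   # birth day
    r"[0-9]{3}[0-9Xx]"              # sequence code and check character
)


def screen_id_numbers(values: Iterable[str]) -> List[str]:
    """Return the values (whitespace-trimmed) that fully match the pattern."""
    trimmed = (v.strip() for v in values)
    return [v for v in trimmed if ID_NUMBER_PATTERN.fullmatch(v)]
```

Using `fullmatch` rather than `search` avoids accepting longer strings that merely contain an 18-character run.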

Where the data content description information indicates that the data content is a name, the corresponding data screening rule includes a normal distribution function. Screening the data to be classified according to the corresponding data screening rule to obtain data matching the data content description information includes: substituting the length of each item of the data to be classified into the normal distribution function to obtain a function result; and, if the function result meets a preset result requirement, selecting that item as data matching the data content description information.
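The length-based screen can be illustrated with the normal probability density. The mean, standard deviation and threshold below are purely illustrative assumptions, as is the reading of "meets a preset result requirement" as "density at or above a threshold"; the specification gives no concrete values.

```python
from statistics import NormalDist
from typing import Iterable, List

# Hypothetical parameters: name lengths centred on 3 characters.
NAME_LENGTH_DIST = NormalDist(mu=3.0, sigma=1.0)
MIN_DENSITY = 0.1  # assumed "preset result" threshold


def screen_names(values: Iterable[str]) -> List[str]:
    """Keep values whose length has a density of at least MIN_DENSITY."""
    return [v for v in values if NAME_LENGTH_DIST.pdf(len(v)) >= MIN_DENSITY]
```

With these parameters, lengths 2 to 4 pass (density about 0.24 to 0.40), while lengths 1 and 5 (about 0.054) and anything longer are filtered out.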

The process by which the transaction device of the data processing requester acquires the sample data is the same as the process by which the blockchain node acquires the sample data; reference may therefore be made to the sampling process of actions (b11) to (b13) described above, which is not repeated here.

In step S304, the transaction device uses the acquired data as target data, and packs and encrypts the target data using a key agreed in advance with the blockchain node to obtain the data to be processed. In step S306, the transaction device initiates a transaction on the blockchain system, the transaction carrying the data to be processed, so that the data to be processed is sent through the blockchain system to the blockchain node for classification and grading. In step S308, the transaction device receives the encrypted sensitive information type and the encrypted sensitive information level of the data to be classified returned by the blockchain node; the encrypted sensitive information type and the encrypted sensitive information level are obtained by encryption with the encryption key.

Fig. 4 is a flowchart illustrating a method for processing data based on a blockchain according to one or more embodiments of the present disclosure, where the method may include the following steps, as shown in fig. 4:

step S402, the transaction device of the data processing requester acquires the metadata and the sample data of the data to be classified and a pre-configured encryption key;

step S404, the transaction device of the data processing requester encrypts the metadata, the sample data and the encryption key by using an RSA public key agreed with the TEE side of the blockchain node;

step S406, the transaction device of the data processing requester initiates a transaction on the blockchain system, and sends the encrypted data to the blockchain node through the blockchain system;

step S408, the TEE side in the blockchain node calls a pre-deployed smart contract, and decrypts the received data by using the RSA private key paired with the public key agreed with the transaction device to obtain the metadata, the sample data and the encryption key;

step S410, the TEE side in the blockchain node calls a pre-deployed smart contract, and determines the sensitive information type and the sensitive information level of the data to be classified according to the metadata and the sample data;

step S412, the TEE side in the blockchain node calls a pre-deployed smart contract, and encrypts the sensitive information type and the sensitive information level of the data to be classified using the encryption key;

step S414, the TEE side in the block chain node initiates a transaction in the block chain system, and returns the encrypted sensitive information type and the encrypted sensitive information level to the data processing requester;

in step S416, the transaction device of the data processing requester decrypts the encrypted sensitive information type and the encrypted sensitive information level by using the decryption key corresponding to the encryption key, so as to obtain a plaintext.
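The message flow of steps S402 to S416 can be traced with the sketch below. It is not a secure implementation: `toy_cipher` is a symmetric XOR keystream used as a placeholder for both the RSA transport encryption and the result encryption, so a single shared `transport_key` stands in for the RSA key pair. All function names and the JSON packing format are assumptions.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 counter keystream. NOT secure; placeholder only.
    Applying it twice with the same key restores the input."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def requester_pack(transport_key: bytes, result_key: bytes,
                   metadata: Dict[str, str], samples: List[str]) -> bytes:
    """S402-S404: pack metadata, samples and the result key, then encrypt."""
    target = {"metadata": metadata, "samples": samples, "result_key": result_key.hex()}
    return toy_cipher(transport_key, json.dumps(target).encode("utf-8"))


def node_process(transport_key: bytes, ciphertext: bytes,
                 classify: Callable[[Dict[str, str], List[str]], Tuple[str, int]]) -> bytes:
    """S408-S412: decrypt, classify, and encrypt the result with the requester's key."""
    target = json.loads(toy_cipher(transport_key, ciphertext).decode("utf-8"))
    info_type, info_level = classify(target["metadata"], target["samples"])
    result = json.dumps({"type": info_type, "level": info_level}).encode("utf-8")
    return toy_cipher(bytes.fromhex(target["result_key"]), result)


def requester_unpack(result_key: bytes, encrypted_result: bytes) -> Dict[str, object]:
    """S416: decrypt the returned type and level."""
    return json.loads(toy_cipher(result_key, encrypted_result).decode("utf-8"))
```

The point of the design is that the result key travels inside the encrypted request, so only the requester and the TEE side can read the returned type and level.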

In summary, the above data processing method based on the block chain has at least the following beneficial effects:

(1) data classification and grading are implemented on the basis of the TEE and the blockchain platform, which effectively addresses the trust problem between users and providers of the classification and grading service and allows small and medium-sized enterprises to use the service in a compliant, secure and efficient manner;

(2) data is classified and graded inside the TEE, so that users can securely and efficiently use the sophisticated classification and grading services of large enterprises, which solves the problem of trusted classification and grading in complex scenarios;

(3) classification and grading are performed on-chain, and a record of each use of the service is stored on the blockchain, which further improves credibility; a single on-chain deployment of the service can be used efficiently and securely by a large number of users;

(4) rule-based screening before sampling, using data feature distributions or regular-expression rules, improves the quality of the data source and the accuracy of classification and grading at a small resource cost (CPU, memory, and the like);

(5) metadata rules and AI models are used jointly for classification and grading, which increases recognition speed, reduces system performance overhead and improves the user experience while maintaining recognition accuracy.

One or more embodiments of the present specification further provide a blockchain-based data processing system, including a transaction device of a data processing requester and a blockchain node;

the transaction equipment of the data processing requester acquires metadata and/or sampling data of the classified data to be classified and acquires a pre-configured encryption key; the acquired data is used as target data, and the target data is packed and encrypted to obtain data to be processed; sending data to be processed to a block chain node through a block chain system;

the block chain node acquires data to be processed, decrypts the data to be processed to obtain target data, and determines the sensitive information type and the sensitive information level of the classified data according to the target data; encrypting the sensitive information type and the sensitive information level by using the encryption key, and returning the encrypted sensitive information type and the encrypted sensitive information level to the data processing requester through the block chain system;

and the transaction device of the data processing requester receives the encrypted sensitive information type and the encrypted sensitive information level returned by the blockchain node, and decrypts them to obtain the sensitive information type and the sensitive information level.

The detailed description of the system may refer to the preceding method sections and will not be repeated here.

One or more embodiments of the present specification further provide a data processing apparatus based on a blockchain, which is applied to the above-mentioned blockchain node, and fig. 5 is a schematic structural diagram of a data processing apparatus based on a blockchain provided in one or more embodiments of the present specification, as shown in fig. 5, the apparatus includes:

a first obtaining unit 51, which obtains the data to be processed of the data processing requester through the block chain system; the data to be processed is ciphertext data obtained by encrypting target data; the target data comprises an encryption key configured by the data processing requester, and sample data and/or metadata of the classified data to be classified;

the first processing unit 52 is configured to decrypt the to-be-processed data to obtain the target data, and determine a sensitive information type and a sensitive information level to which the to-be-classified data belongs according to the target data;

the first sending unit 53 encrypts the sensitive information type and the sensitive information level by using the encryption key, and returns the encrypted sensitive information type and the encrypted sensitive information level to the data processing requester through a blockchain system.

Optionally, the target data comprises the metadata and the sample data; the first processing unit 52 is configured to perform:

determining the sensitive information type to which the sampled data belongs according to at least one of the following determination modes: determining the type of sensitive information to which the sample data belongs according to the metadata; determining the type of the sensitive information to which the sampled data belongs by comparing the sampled data with a pre-configured data source; determining the sensitive information type of the sampling data according to a first sensitive information type determination model defined by the data processing requester; determining a sensitive information type to which the sampling data belongs according to a second sensitive information type determination model configured in advance;

determining the sensitive information level of the sample data according to the sensitive information type of the sample data;

and determining the sensitive information type and the sensitive information level of the classified data according to the sensitive information type and the sensitive information level of the sampled data.

Optionally, the target data comprises the metadata; the first processing unit 52 is configured to perform:

acquiring a data storage address of the classified data to be classified from the metadata, and acquiring sampling data of the classified data to be classified based on the data storage address;

determining the sensitive information type to which the sample data belongs according to at least one of the following determination modes: determining the sensitive information type of the sample data according to the metadata; determining the type of the sensitive information to which the sampled data belongs by comparing the sampled data with a pre-configured data source; determining the sensitive information type of the sampling data according to a first sensitive information type determination model defined by the data processing requester; determining a sensitive information type to which the sampling data belongs according to a second sensitive information type determination model configured in advance;

Optionally, the first processing unit 52 is configured to perform:

acquiring data source description information of the classified data to be classified from the metadata, and acquiring a preset rule for determining the type of sensitive information based on the data source description information;

and determining the sensitive information type of the sampling data according to the data source description information and the rule for determining the sensitive information type based on the data source description information.

Optionally, the first processing unit 52 is configured to perform:

obtaining remark information of the classified data to be classified from the metadata, and obtaining a preset rule for determining the type of sensitive information based on the remark information;

and determining the sensitive information type of the sampling data according to the remark information and the rule for determining the sensitive information type based on the remark information.

Optionally, the first processing unit 52 is configured to perform:

for each pre-configured data source, comparing the sample data with the data in the data source to identify the items of the sample data that also appear in the data source, and determining the proportion of those items in the sample data;

and determining the sensitive information type of the sample data according to the proportion and the sensitive information type corresponding to the data source.
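A minimal sketch of this comparison follows, assuming each pre-configured data source is an in-memory set of known values keyed by its sensitive information type, and that a type is accepted only when its proportion reaches a threshold. The names `match_ratio` and `type_by_data_source` and the default threshold of 0.8 are hypothetical.

```python
from typing import Dict, Optional, Sequence, Set


def match_ratio(samples: Sequence[str], source: Set[str]) -> float:
    """Proportion of sample items that also appear in the data source."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s in source) / len(samples)


def type_by_data_source(samples: Sequence[str],
                        sources: Dict[str, Set[str]],
                        threshold: float = 0.8) -> Optional[str]:
    """Return the type whose source matches best, if it reaches the threshold."""
    best_type: Optional[str] = None
    best_ratio = 0.0
    for info_type, source in sources.items():
        ratio = match_ratio(samples, source)
        if ratio > best_ratio:
            best_type, best_ratio = info_type, ratio
    return best_type if best_ratio >= threshold else None
```

For example, if three of four sampled values appear in a city-name source, the proportion is 0.75, which is accepted at a threshold of 0.7 but rejected at 0.8.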

Optionally, the first processing unit 52 is configured to perform:

sequentially inputting the sampling data into each first sensitive information type determining model, and acquiring a first processing result of each first sensitive information type determining model on the sampling data;

determining the type of sensitive information to which the sampling data belongs according to each first processing result; and the first sensitive information type determining model corresponds to the sensitive information types one by one.
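The one-model-per-type arrangement can be sketched as below. It assumes that each model returns a score between 0 and 1 for the whole sample and that the highest-scoring type above a minimum score is chosen; neither the score semantics nor the selection rule is stated in the specification, and the two example models are simple stand-ins rather than real classifiers.

```python
from typing import Callable, Dict, List, Optional

# One model per sensitive information type; each returns a score in [0, 1].
TypeModel = Callable[[List[str]], float]


def determine_type(samples: List[str], models: Dict[str, TypeModel],
                   min_score: float = 0.5) -> Optional[str]:
    """Run every model in turn and pick the best type at or above min_score."""
    best_type: Optional[str] = None
    best_score: Optional[float] = None
    for info_type, model in models.items():
        score = model(samples)  # processing result of this model
        if score >= min_score and (best_score is None or score > best_score):
            best_type, best_score = info_type, score
    return best_type


def fraction(samples: List[str], predicate: Callable[[str], bool]) -> float:
    """Share of samples satisfying the predicate (0.0 for an empty sample)."""
    return sum(1 for s in samples if predicate(s)) / len(samples) if samples else 0.0


# Hypothetical stand-in models for illustration only.
EXAMPLE_MODELS: Dict[str, TypeModel] = {
    "phone_number": lambda s: fraction(s, lambda v: v.isdigit() and len(v) == 11),
    "email": lambda s: fraction(s, lambda v: "@" in v),
}
```

The same loop would serve for the second (pre-configured) models, since both sets correspond one-to-one with sensitive information types.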

Optionally, the first processing unit 52 is configured to perform:

sequentially inputting the sampled data into each second sensitive information type determination model, and acquiring a second processing result of each second sensitive information type determination model on the sampled data;

determining the type of the sensitive information to which the sampling data belongs according to each second processing result; and the second sensitive information type determining model corresponds to the sensitive information types one by one.

The data processing apparatus in this embodiment can implement the processes of the data processing method applied to the blockchain node described above and achieve the same effects and functions, which are not repeated here.

One or more embodiments of the present specification further provide a block chain-based data processing apparatus, which is applied to the transaction device of the data processing requester, and fig. 6 is a schematic structural diagram of another block chain-based data processing apparatus provided in one or more embodiments of the present specification, as shown in fig. 6, where the apparatus includes:

a second obtaining unit 61, configured to obtain metadata and/or sample data of the classified data, and obtain a pre-configured encryption key;

the second processing unit 62 is configured to use the acquired data as target data, and to pack and encrypt the target data to obtain data to be processed;

a second sending unit 63, configured to send the data to be processed to a blockchain node through a blockchain system; the data to be processed is used for the block chain node to determine the sensitive information type and the sensitive information level of the classified data;

the data receiving unit 64 is used for receiving the encrypted sensitive information type and the encrypted sensitive information level of the classified data returned by the block chain node; the encrypted sensitive information type and the encrypted sensitive information level are obtained by encrypting based on the encryption key.

Optionally, the second obtaining unit 61 is configured to perform:

acquiring data content description information of the classified data from metadata of the classified data;

when determining that a data screening rule corresponding to the classified data is configured according to the data content description information, screening the classified data to be classified according to the corresponding data screening rule to obtain data matched with the data content description information, and sampling the obtained data to obtain sampled data of the classified data;

and sampling the classified data to be classified when determining that a data screening rule corresponding to the classified data to be classified is not configured according to the data content description information, so as to obtain sampled data of the classified data to be classified.

The data processing apparatus in this embodiment can implement the processes of the data processing method applied to the data processing requester described above and achieve the same effects and functions, which are not repeated here.

Fig. 7 is a schematic structural diagram of a data processing device according to one or more embodiments of the present disclosure. As shown in fig. 7, data processing devices may differ considerably depending on configuration or performance, and may include one or more processors 1001 and a memory 1002, in which one or more applications or data may be stored. The memory 1002 may be transient storage or persistent storage. An application program stored in the memory 1002 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the data processing device. Further, the processor 1001 may be configured to communicate with the memory 1002 and to execute, on the data processing device, the series of computer-executable instructions in the memory 1002. The data processing device may also include one or more power supplies 1003, one or more wired or wireless network interfaces 1004, one or more input/output interfaces 1005, one or more keyboards 1006, and the like.

In a particular embodiment, the data processing apparatus is a blockchain node as described above, comprising a processor and a memory arranged to store computer executable instructions which, when executed, cause the processor to carry out the following procedure:

acquiring data to be processed of a data processing requester through a block chain system; the data to be processed is ciphertext data obtained by encrypting target data; the target data comprises an encryption key configured by the data processing requester, and sample data and/or metadata of the classified data to be classified;

decrypting the data to be processed to obtain the target data, and determining the sensitive information type and the sensitive information level of the classified data according to the target data;

and encrypting the sensitive information type and the sensitive information level by using the encryption key, and returning the encrypted sensitive information type and the encrypted sensitive information level to the data processing requester through a block chain system.

Optionally, when the computer-executable instructions are executed, the target data comprises the metadata and the sample data, and determining, according to the target data, the sensitive information type and the sensitive information level of the data to be classified comprises:

determining the sensitive information type to which the sample data belongs according to at least one of the following determination modes: determining the sensitive information type to which the sample data belongs according to the metadata; determining the sensitive information type to which the sample data belongs by comparing the sample data with a pre-configured data source; determining the sensitive information type of the sample data according to a first sensitive information type determination model defined by the data processing requester; determining the sensitive information type to which the sample data belongs according to a second sensitive information type determination model configured in advance;

Optionally, when the computer-executable instructions are executed, the target data comprises the metadata, and determining, according to the target data, the sensitive information type and the sensitive information level of the data to be classified comprises:

determining the sensitive information type to which the sample data belongs according to at least one of the following determination modes: determining the sensitive information type of the sample data according to the metadata; determining the sensitive information type to which the sample data belongs by comparing the sample data with a pre-configured data source; determining the sensitive information type of the sample data according to a first sensitive information type determination model defined by the data processing requester; determining the sensitive information type to which the sample data belongs according to a second sensitive information type determination model configured in advance;

Optionally, the computer-executable instructions, when executed, determine a sensitive information type to which the sample data belongs based on the metadata, comprising:

Optionally, when executed, the computer-executable instructions determine the type of sensitive information to which the sample data belongs by comparing the sample data with a preconfigured data source, including:

Optionally, when executed, the computer-executable instructions determine the sensitive information type to which the sample data belongs according to a first sensitive information type determination model customized by the data processing requester, including:

Optionally, when executed, the computer-executable instructions determine, according to a second sensitive information type determination model configured in advance, a sensitive information type to which the sample data belongs, including:

the sampling data are sequentially input into each second sensitive information type determining model, and a second processing result of each second sensitive information type determining model on the sampling data is obtained;

The data processing device provided in this embodiment can implement each process of the data processing method applied to the block chain node, and achieve the same effect and function, which is not repeated here.

In a specific embodiment, the data processing apparatus is the transaction apparatus of the data processing requester, comprising a processor and a memory arranged to store computer executable instructions, which when executed cause the processor to implement the following process:

acquiring metadata and/or sampling data of classified data to be classified, and acquiring a pre-configured encryption key;

the acquired data is used as target data, and the target data is packed and encrypted to obtain data to be processed;

sending the data to be processed to a block chain node through a block chain system; the data to be processed is used for the block chain node to determine the sensitive information type and the sensitive information level of the classified data;

receiving the encrypted sensitive information type and the encrypted sensitive information level of the classified data returned by the block chain node; the encrypted sensitive information type and the encrypted sensitive information level are obtained by encrypting based on the encryption key.

Optionally, the computer executable instructions, when executed, obtain sample data of the classified data, comprising:

The data processing apparatus provided in this embodiment can implement the above-described processes of the data processing method applied to the data processing requester, and achieve the same effects and functions, which are not repeated here.

Further, one or more embodiments of the present specification further provide a storage medium for storing computer-executable instructions. In a specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and the computer-executable instructions stored on the storage medium, when executed by a processor, implement the following process:

Optionally, when the computer-executable instructions stored on the storage medium are executed by a processor, the target data includes the metadata and the sample data, and determining, according to the target data, the sensitive information type and the sensitive information level of the data to be classified includes:

determining the sensitive information type to which the sampled data belongs according to at least one of the following determination modes: determining the sensitive information type of the sample data according to the metadata; determining the type of the sensitive information to which the sampled data belongs by comparing the sampled data with a pre-configured data source; determining the sensitive information type of the sampling data according to a first sensitive information type determination model defined by the data processing requester; determining a sensitive information type to which the sampling data belongs according to a second sensitive information type determination model configured in advance;

Optionally, when the computer-executable instructions stored on the storage medium are executed by a processor, the target data includes the metadata, and determining, according to the target data, the sensitive information type and the sensitive information level of the data to be classified includes:

Optionally, the storage medium stores computer-executable instructions that, when executed by the processor, determine a sensitive information type to which the sample data belongs based on the metadata, including:

Optionally, the storage medium stores computer executable instructions that, when executed by the processor, determine the type of sensitive information to which the sample data belongs by comparing the sample data with a preconfigured data source, including:

Optionally, the storage medium stores computer-executable instructions that, when executed by the processor, determine a sensitive information type to which the sample data belongs according to a first sensitive information type determination model customized by the data processing requester, including:

Optionally, the storage medium stores computer-executable instructions that, when executed by the processor, determine a sensitive information type to which the sample data belongs according to a second sensitive information type determination model configured in advance, including:

The storage medium provided in this embodiment can implement the above-described processes of the data processing method applied to the blockchain node, and achieve the same effects and functions, which are not repeated here.

In a specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and the computer-executable instructions stored on the storage medium, when executed by the processor, implement the following process:

using the acquired data as target data, and packing and encrypting the target data to obtain data to be processed;

Optionally, the storage medium stores computer-executable instructions that, when executed by the processor, obtain sample data of the classified data, including:

when determining that a data screening rule corresponding to the classified data to be classified is configured according to the data content description information, screening the classified data to be classified according to the corresponding data screening rule to obtain data matched with the data content description information, and sampling the obtained data to obtain sampled data of the classified data to be classified;

The storage medium provided in this embodiment can implement the above-described processes of the data processing method applied to the data processing requester, and achieve the same effects and functions, which are not repeated here.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, a transistor or a switch) or a software improvement (an improvement to a method flow). However, as technology has advanced, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Thus, it cannot be said that an improvement to a method flow cannot be implemented by a hardware physical module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer "integrates" a digital system onto a single PLD by programming it, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Furthermore, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly carried out with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can readily be obtained simply by lightly programming the method flow into an integrated circuit using one of the hardware description languages above.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing the various functions may also be regarded as structures within the hardware component. The means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various units by function, and the units are described separately. Of course, when implementing one or more embodiments of this specification, the functions of the units may be implemented in the same one or more pieces of software and/or hardware.

As will be appreciated by one skilled in the art, one or more embodiments of this specification may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of this specification may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

One or more embodiments of this specification are described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to one or more embodiments of the specification. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

One or more embodiments of this specification can be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of this specification can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.

The embodiments in this specification are described in a progressive manner; for parts that are the same or similar among the embodiments, reference may be made from one to another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.

The above description is merely illustrative of one or more embodiments of this specification and is not intended to limit them. Various modifications and variations of one or more embodiments of this specification will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of one or more embodiments of this specification shall fall within the scope of the claims of one or more embodiments of this specification.

Claims

1. A data processing method based on a block chain comprises the following steps:

2. The method of claim 1, the target data comprising the metadata and the sample data; according to the target data, determining the sensitive information type and the sensitive information level of the classified data, wherein the steps of:

determining the sensitive information type to which the sampled data belongs according to at least one of the following determination modes: determining the type of sensitive information to which the sample data belongs according to the metadata; determining the type of sensitive information to which the sampled data belongs in a mode of comparing the sampled data with a pre-configured data source; determining the sensitive information type of the sampling data according to a first sensitive information type determination model defined by the data processing requester; determining a sensitive information type to which the sampling data belongs according to a pre-configured second sensitive information type determination model;

3. The method of claim 1, the target data comprising the metadata; according to the target data, determining the sensitive information type and the sensitive information level of the classified data, wherein the steps of:

4. The method of claim 2 or 3, determining the type of sensitive information to which the sample data belongs based on at least one of the following determinations, comprising:

acquiring data content description information of the classified data to be classified from the metadata, and acquiring a preset rule for determining a classification mode based on the data content description information;

and according to the data content description information and the rule for determining a classification mode based on the data content description information, selecting at least one of the determination modes as a target determination mode, and determining the sensitive information type of the sample data according to the target determination mode.

5. The method of claim 2 or 3, determining from the metadata a type of sensitive information to which the sample data belongs, comprising:

6. The method of claim 2 or 3, determining from the metadata a type of sensitive information to which the sample data belongs, comprising:

7. The method of claim 2 or 3, wherein determining the sensitive information type to which the sample data belongs by comparing the sample data with a preconfigured data source comprises:

for each data source configured in advance, comparing the sampled data with the data in the data source to determine the same data as the data in the data source in the sampled data and determine the proportion of the same data in the sampled data;
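
As a non-limiting illustration, the comparison step recited above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `match_proportions`, the exact-match comparison, and the in-memory dictionary of data sources are hypothetical and not taken from the specification. The sketch stops at the proportions, since this step recites only how they are determined.

```python
from typing import Dict, Iterable, List


def match_proportions(samples: List[str],
                      sources: Dict[str, Iterable[str]]) -> Dict[str, float]:
    """For each pre-configured data source, return the share of sample
    values that also appear in that source (hypothetical helper)."""
    if not samples:
        # No sample data: nothing can match any source.
        return {name: 0.0 for name in sources}
    proportions = {}
    for name, values in sources.items():
        known = set(values)  # assumption: exact-match comparison
        same = sum(1 for value in samples if value in known)
        proportions[name] = same / len(samples)
    return proportions


# Hypothetical data sources and sample data, for illustration only.
example_sources = {
    "names": ["Alice", "Bob", "Carol", "Dave"],
    "cities": ["Paris", "Rome"],
}
example_samples = ["Alice", "Bob", "Paris", "Carol"]
example_result = match_proportions(example_samples, example_sources)
```

With these hypothetical inputs, three of the four sample values appear in the "names" source and one appears in the "cities" source.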

8. The method of claim 2 or 3, wherein the determining the sensitive information type to which the sample data belongs according to the first sensitive information type determination model customized by the data processing requester comprises:

9. The method of claim 8, wherein the first sensitive information type determination model is trained by:

obtaining a model training sample uploaded by the data processing requester through a block chain system and a sensitive information type label to which the model training sample belongs;

and training the first sensitive information type determination model according to the model training sample and the sensitive information type label to which the model training sample belongs.

10. The method of claim 2 or 3, wherein determining the sensitive information type to which the sample data belongs according to a second sensitive information type determination model configured in advance comprises:

11. The method of claim 3, obtaining sample data of the classified hierarchical data based on the data storage address, comprising:

acquiring the classified data to be classified according to the data storage address, and acquiring data content description information of the classified data to be classified from the metadata;

12. The method of claim 11, wherein the data content description information indicates that the data content is an identification number; the corresponding data screening rule comprises a regular expression; screening the classified data to be classified according to the corresponding data screening rule to obtain data matched with the data content description information, wherein the data screening method comprises the following steps:

judging whether the classified data to be classified conforms to the regular expression or not;

and if so, screening the classified data to be classified as the data matched with the data content description information.
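
As a non-limiting illustration, regular-expression screening of this kind can be sketched as follows. The pattern is an assumption chosen only for the example (eighteen characters: seventeen digits followed by a digit or the letter X); the claim does not specify a particular regular expression, and the function name `screen_by_regex` is hypothetical.

```python
import re
from typing import List, Pattern

# Assumed pattern for illustration: 17 digits plus a final digit or X.
ID_PATTERN = re.compile(r"[0-9]{17}[0-9Xx]")


def screen_by_regex(values: List[str],
                    pattern: Pattern[str] = ID_PATTERN) -> List[str]:
    """Keep only the values that conform to the regular expression in full."""
    return [value for value in values if pattern.fullmatch(value)]


# Hypothetical inputs: one well-formed value and two that do not conform.
screened = screen_by_regex(["11010519491231002X", "hello", "123"])
```

`fullmatch` is used rather than `search` so that a longer string which merely contains a conforming substring is not screened in.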

13. The method of claim 11, wherein the data content description information indicates that the data content is a name; the corresponding data screening rule comprises a normal distribution function; screening the classified data to be classified according to the corresponding data screening rule to obtain data matched with the data content description information, wherein the data screening method comprises the following steps:

substituting the length of the classified data to be classified into the normal distribution function to calculate to obtain a function result;

and if the function result meets the requirement of a preset result, screening the classified data to be classified as the data matched with the data content description information.
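
As a non-limiting illustration, length-based screening with a normal distribution function can be sketched as follows. The mean, standard deviation, and threshold are hypothetical values chosen for the example, and "meets the requirement of a preset result" is assumed here to mean that the density at the given length is at least a preset threshold; the claim fixes none of these.

```python
from statistics import NormalDist
from typing import List

# Hypothetical parameters: typical name length about 3 characters.
NAME_LENGTH_DIST = NormalDist(mu=3.0, sigma=1.0)
MIN_DENSITY = 0.05  # hypothetical preset result requirement


def screen_names(values: List[str],
                 dist: NormalDist = NAME_LENGTH_DIST,
                 min_density: float = MIN_DENSITY) -> List[str]:
    """Keep values whose length yields a sufficiently high density under
    the normal distribution function."""
    return [value for value in values if dist.pdf(len(value)) >= min_density]


# Short values pass; the long value falls far out in the tail.
kept = screen_names(["Ann", "Li", "not-a-name-0123456789"])
```

In practice the distribution parameters would presumably be fitted to the name lengths expected for the data in question.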

14. The method of claim 1, the target data comprising the metadata; according to the target data, determining the sensitive information type and the sensitive information level of the classified data, wherein the steps of:

and determining the sensitive information type of the sample data according to the metadata according to at least one of the following determination modes: acquiring data source description information of the classified data to be classified from the metadata, acquiring a preset rule for determining the type of sensitive information based on the data source description information, and determining the type of the sensitive information to which the sampled data belongs according to the data source description information and the rule for determining the type of the sensitive information based on the data source description information; acquiring remark information of the classified data to be classified from the metadata, acquiring a preset rule for determining the type of sensitive information based on the remark information, and determining the type of the sensitive information to which the sampled data belongs according to the remark information and the rule for determining the type of the sensitive information based on the remark information;

15. The method of claim 1, the target data comprising the sample data; according to the target data, determining the sensitive information type and the sensitive information level of the classified data, wherein the sensitive information type and the sensitive information level comprise the following steps:

determining the sensitive information type of the sample data according to any one of the following determination modes: determining the type of sensitive information to which the sampled data belongs in a mode of comparing the sampled data with a pre-configured data source; determining the sensitive information type of the sampling data according to a first sensitive information type determination model defined by the data processing requester; determining a sensitive information type to which the sampling data belongs according to a pre-configured second sensitive information type determination model;

16. The method of claim 1, the blockchain node having a Trusted Execution Environment (TEE); wherein the steps of decrypting the data to be processed to obtain the target data, determining the sensitive information type and the sensitive information level of the classified data according to the target data, and encrypting the sensitive information type and the sensitive information level by using the encryption key are executed through the TEE.

17. The method of claim 1, further comprising, after returning the encrypted sensitive information type and the encrypted sensitive information level to the data processing requestor through a blockchain system:

acquiring the acquisition time of the data to be processed and the generation time of the encrypted sensitive information type and the encrypted sensitive information level;

and generating a data processing record according to the data to be processed, the acquisition time, the generation time, the encrypted sensitive information type and the encrypted sensitive information grade, and storing the data processing record in a block chain system.
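
As a non-limiting illustration, generating such a data processing record can be sketched as follows. The field names, the use of a SHA-256 digest in place of the full data to be processed, and the JSON serialization are assumptions made for the example; writing the record to the block chain system is not shown.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProcessingRecord:
    """Hypothetical record layout; field names are illustrative."""
    pending_data_sha256: str   # assumption: store a digest, not the data
    acquired_at: str           # acquisition time of the data to be processed
    generated_at: str          # generation time of the encrypted result
    encrypted_type: str        # hex-encoded encrypted sensitive information type
    encrypted_level: str       # hex-encoded encrypted sensitive information level


def build_record(pending: bytes, acquired_at: str, generated_at: str,
                 encrypted_type: bytes, encrypted_level: bytes) -> ProcessingRecord:
    """Assemble the record from the five inputs named in the claim."""
    return ProcessingRecord(
        pending_data_sha256=hashlib.sha256(pending).hexdigest(),
        acquired_at=acquired_at,
        generated_at=generated_at,
        encrypted_type=encrypted_type.hex(),
        encrypted_level=encrypted_level.hex(),
    )


def serialize(record: ProcessingRecord) -> str:
    """Deterministic JSON form suitable for storing as a record payload."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical values, for illustration only.
record = build_record(b"abc", "2022-03-25T10:00:00Z", "2022-03-25T10:00:01Z",
                      b"\x01", b"\x02")
payload = serialize(record)
```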

18. A data processing method based on a block chain comprises the following steps:

receiving the encrypted sensitive information type and the encrypted sensitive information level of the classified data returned by the block chain node; and the encrypted sensitive information type and the encrypted sensitive information level are obtained by encrypting based on the encryption key.

19. The method of claim 18, obtaining sample data of the classified data, comprising:

20. A data processing system based on a block chain comprises a transaction device of a data processing requester and a block chain node;

the transaction equipment of the data processing requester acquires metadata and/or sampling data of the classified data to be classified, and acquires a pre-configured encryption key; the acquired data is used as target data, and the target data is packed and encrypted to obtain data to be processed; sending the data to be processed to a block chain node through a block chain system;

the block chain node acquires the data to be processed, decrypts the data to be processed to obtain the target data, and determines the sensitive information type and the sensitive information level of the classified data according to the target data; encrypting the sensitive information type and the sensitive information level by using the encryption key, and returning the encrypted sensitive information type and the encrypted sensitive information level to the data processing requester through a block chain system;

and the transaction equipment of the data processing requester receives the encrypted sensitive information type and the encrypted sensitive information level returned by the block chain node.
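
As a non-limiting illustration, the exchange between the transaction equipment and the block chain node can be sketched as follows. Several parts are assumptions made only so the example is self-contained: `toy_encrypt` and `toy_decrypt` are an insecure stand-in for a real cipher; `node_key` is a hypothetical key protecting the data to be processed, since the claims do not state which key is used for that step; `classify` is a hypothetical callback standing in for the determination of type and level; and transport through the block chain system is omitted.

```python
import hashlib
import hmac
import json
import os
from typing import Callable, List, Tuple


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from HMAC-SHA256 in counter mode (NOT for production)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hmac.new(key, nonce + counter.to_bytes(8, "big"),
                            hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Stand-in cipher: random nonce followed by plaintext XOR keystream."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def requester_build_request(node_key: bytes, requester_key: bytes,
                            metadata: dict, samples: List[str]) -> bytes:
    """Requester side: pack key, metadata, and sample data, then encrypt."""
    target = {"encryption_key": requester_key.hex(),
              "metadata": metadata, "sample_data": samples}
    return toy_encrypt(node_key, json.dumps(target).encode("utf-8"))


def node_process(node_key: bytes, pending: bytes,
                 classify: Callable[[dict, List[str]], Tuple[str, int]]) -> bytes:
    """Node side: decrypt, classify, encrypt the result with the requester's key."""
    target = json.loads(toy_decrypt(node_key, pending).decode("utf-8"))
    info_type, info_level = classify(target["metadata"], target["sample_data"])
    result = json.dumps({"type": info_type, "level": info_level}).encode("utf-8")
    return toy_encrypt(bytes.fromhex(target["encryption_key"]), result)


def requester_read_result(requester_key: bytes, encrypted_result: bytes) -> dict:
    """Requester side: decrypt the returned type and level."""
    return json.loads(toy_decrypt(requester_key, encrypted_result).decode("utf-8"))


# Hypothetical keys and classifier, for illustration only.
demo_node_key, demo_requester_key = b"n" * 32, b"r" * 32
demo_pending = requester_build_request(demo_node_key, demo_requester_key,
                                       {"column": "id_no"}, ["sample value"])
demo_encrypted = node_process(demo_node_key, demo_pending,
                              lambda metadata, samples: ("identity", 3))
demo_result = requester_read_result(demo_requester_key, demo_encrypted)
```

Because the requester's encryption key travels inside the encrypted target data, only the node that can decrypt the data to be processed is able to produce a result that the requester can read.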

21. A blockchain-based data processing apparatus comprising:

the first acquisition unit acquires data to be processed of a data processing requester through a block chain system; the data to be processed is ciphertext data obtained by encrypting target data; the target data comprises an encryption key configured by the data processing requester, and sample data and/or metadata of the classified data to be classified;

the first processing unit is used for decrypting the data to be processed to obtain the target data and determining the sensitive information type and the sensitive information level of the classified data according to the target data;

and the first sending unit encrypts the sensitive information type and the sensitive information level by using the encryption key, and returns the encrypted sensitive information type and the encrypted sensitive information level to the data processing requester through a block chain system.

22. A blockchain-based data processing apparatus comprising:

the second acquisition unit is used for acquiring metadata and/or sampling data of the classified data to be classified and acquiring a pre-configured encryption key;

the second processing unit is used for taking the acquired data as target data, and packaging and encrypting the target data to obtain data to be processed;

the second sending unit is used for sending the data to be processed to the block chain nodes through the block chain system; the data to be processed is used for the block chain node to determine the sensitive information type and the sensitive information level of the classified data;

the data receiving unit is used for receiving the encrypted sensitive information type and the encrypted sensitive information level of the classified data returned by the block chain node; the encrypted sensitive information type and the encrypted sensitive information level are obtained by encrypting based on the encryption key.

23. A data processing apparatus comprising:

a processor; and

a memory arranged to store computer executable instructions which, when executed, cause the processor to carry out the steps of the data processing method of any one of claims 1 to 17 or any one of claims 18 to 19.

24. A storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the data processing method of any one of claims 1 to 17 or any one of claims 18 to 19.