Disclosure of Invention
The invention aims to provide a data management authority processing method and system based on big data, so as to solve the problems described above.
The embodiment of the application is realized in the following way:
In a first aspect, an embodiment of the present application provides a data management authority processing method based on big data, applied to a data management system, the method including:
responding to the permission allocation instruction, and acquiring a service interaction data set to be allocated;
projecting the service interaction data set to be distributed into a private array value domain of service interaction data of a service interaction data set category based on a private data projection variable of a private feature extraction network, and obtaining a private description array of the service interaction data set to be distributed according to a projection result, wherein a debugging template of the private feature extraction network comprises a service interaction data set template Eg_A without annotation information;
performing private data type recognition on the private description array based on a private type recognition network, and determining a recognition result of the service interaction data set to be allocated under the preset private data types of the private type recognition network, wherein the private type recognition network is obtained by debugging with the private description arrays that the private feature extraction network extracts from a business interaction data set template Eg_B, and the business interaction data set template Eg_B carries private data type annotation information;
determining an actual private data category of the to-be-allocated business interaction data set based on the identification result of the to-be-allocated business interaction data set;
determining a target business interaction data authority allocation mode corresponding to the actual private data category in a preset business interaction data authority allocation mode;
and determining the data management authority of the service interaction data set to be distributed according to the target service interaction data authority distribution mode.
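The steps of the first aspect can be sketched end to end as follows. This is only a minimal illustration under stated assumptions: the category names, the allocation mode names, and the two stand-in functions `extract_description` and `recognize` are hypothetical placeholders for the private feature extraction network and the private type recognition network, which in practice would be learned models.

```python
from typing import Dict, List, Sequence

# Hypothetical private data categories and allocation modes (illustration only).
ALLOCATION_MODES: Dict[str, str] = {
    "public": "read_write_all",
    "internal": "read_only_staff",
    "sensitive": "restricted_owner_only",
}
CATEGORIES: List[str] = list(ALLOCATION_MODES)


def extract_description(dataset: Sequence[float]) -> List[float]:
    """Stand-in for the private feature extraction network.

    Returns a two-element description array: the mean and the spread.
    """
    mean = sum(dataset) / len(dataset)
    spread = max(dataset) - min(dataset)
    return [mean, spread]


def recognize(description: Sequence[float]) -> List[float]:
    """Stand-in for the private type recognition network.

    Returns one score per preset category, in the order of CATEGORIES.
    """
    mean, _spread = description
    return [1.0 - mean, 1.0 - abs(mean - 0.5) * 2.0, mean]


def allocate_authority(dataset: Sequence[float]) -> str:
    """Extract, recognize, pick the actual category, look up the target mode."""
    description = extract_description(dataset)
    scores = recognize(description)
    best = max(range(len(scores)), key=scores.__getitem__)
    actual_category = CATEGORIES[best]
    return ALLOCATION_MODES[actual_category]
```

Under these assumptions, the actual private data category is simply the highest-scoring preset category, and the target allocation mode is a table lookup keyed by that category.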
As one implementation manner, the private feature extraction network includes at least two cascaded description array extraction networks, the private data projection variables include at least two private data projection sub-variables, each private data projection sub-variable corresponds to one description array extraction network, and different description array extraction networks are used for extracting private description arrays of different layers of the service interaction data set to be allocated;
the projecting, based on the private data projection variable of the private feature extraction network, of the service interaction data set to be allocated into the private array value domain of service interaction data of the service interaction data set category, and the obtaining of the private description array of the service interaction data set to be allocated according to the projection result, comprise:
based on the private data projection sub-variables of the description array extraction networks and the position distribution of the description array extraction networks, projecting the service interaction data set to be distributed into the private array value domain of the service interaction data set category, and obtaining the private description arrays extracted by the description array extraction networks according to projection results;
and processing the private description arrays extracted by one or more description array extraction networks in the description array extraction network to obtain the private description arrays of the service interaction data set to be distributed, wherein the one or more description array extraction networks comprise the last description array extraction network in the position distribution.
As an implementation manner, before the service interaction data set to be allocated is projected, based on the private data projection variable of the private feature extraction network, into the private array value domain of service interaction data of the service interaction data set category and the private description array of the service interaction data set to be allocated is obtained according to the projection result, the method further includes:
optimizing the private feature extraction network to be optimized based on the business interaction data set template Eg_A to obtain an optimized private feature extraction network, wherein the business interaction data set template Eg_A does not carry private data category annotation information, and the private feature extraction network to be optimized is obtained by pre-optimization on business data of the same data type as the business interaction data set template Eg_A;
based on the private data projection variable of the private feature extraction network, projecting the service interaction data set template Eg_B into a private data array value domain of service interaction data of a service interaction data set class, and obtaining a second private description array of the service interaction data set template Eg_B according to a projection result;
carrying out private data type recognition on the second private description array by adopting a private type recognition network to be optimized, and determining an estimated recognition result of the business interaction data set template Eg_B under a preset private data type of the private type recognition network to be optimized;
and optimizing the network parameter of the private type recognition network based on the private data type annotation information of the business interaction data set template Eg_B and the estimated recognition result to obtain an optimized private type recognition network.
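The second stage above, in which the recognition network is optimized on description arrays of the annotated template Eg_B while the extraction network is held fixed, can be illustrated with a small sketch. Everything here is an assumption made for illustration: `project` is a frozen stand-in for the optimized private feature extraction network, the recognizer is a two-class logistic regression, and the learning rate and epoch count are arbitrary.

```python
import math
from typing import List, Sequence, Tuple

Sample = Sequence[float]


def project(sample: Sample) -> List[float]:
    """Frozen stand-in for the already optimized feature extraction network."""
    return [sum(sample) / len(sample), max(sample)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_recognizer(
    eg_b: Sequence[Tuple[Sample, int]], epochs: int = 500, lr: float = 0.5
) -> Tuple[List[float], float]:
    """Fit the recognizer on (second description array, annotation) pairs."""
    features = [(project(x), label) for x, label in eg_b]
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for feat, label in features:
            # Estimated recognition result for this template.
            estimate = sigmoid(sum(w * f for w, f in zip(weights, feat)) + bias)
            # Gradient of the cross-entropy loss with respect to the logit.
            err = estimate - label
            weights = [w - lr * err * f for w, f in zip(weights, feat)]
            bias -= lr * err
    return weights, bias


def predict(weights: Sequence[float], bias: float, sample: Sample) -> int:
    feat = project(sample)
    return int(sigmoid(sum(w * f for w, f in zip(weights, feat)) + bias) >= 0.5)
```

Only the recognizer's weights and bias change during this stage; the projection is never updated, which mirrors the separation between the two networks.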
As an implementation manner, the optimizing of the private feature extraction network to be optimized based on the business interaction data set template Eg_A to obtain an optimized private feature extraction network includes:
performing data segmentation operation on the service interaction data set template Eg_A to obtain interaction grouping data of the service interaction data set template Eg_A;
based on a plurality of interactive grouping data conversion logics in the preset interactive grouping data conversion logics, converting the interactive grouping data to obtain a converted business interactive data set template Eg_A, wherein the plurality of interactive grouping data conversion logics comprise target conversion logics;
determining the interactive grouping data in the business interaction data set template Eg_A that was converted based on the target conversion logic as the interactive grouping data to be estimated;
adopting a privacy feature extraction network to be optimized, and estimating the interaction grouping data to be estimated according to the converted business interaction data set template Eg_A to obtain estimated privacy data output by the privacy feature extraction network to be optimized;
determining an error of the privacy feature extraction network to be optimized based on the estimated privacy data and the interaction grouping data to be estimated;
and optimizing the network parameters of the private feature extraction network to be optimized based on the error to obtain an optimized private feature extraction network.
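The annotation-free optimization above resembles a masked-prediction scheme, and the data preparation and error computation can be sketched as follows. This is a hedged illustration, not the application's implementation: the target conversion logic is assumed to be masking, the only other conversion logic is "keep unchanged", and `estimate` is a trivial stand-in for the network (it predicts the mean of the visible values).

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = None  # value written by the assumed target conversion logic (masking)
Group = List[Optional[float]]


def segment(template: Sequence[float], size: int) -> List[List[float]]:
    """Data segmentation: split the template into interaction grouping data."""
    return [list(template[i:i + size]) for i in range(0, len(template), size)]


def convert(
    groups: Sequence[Sequence[float]], mask_ratio: float, rng: random.Random
) -> Tuple[List[Group], List[int]]:
    """Apply 'mask' (target logic) to some groups and 'keep' to the others."""
    n_mask = max(1, int(len(groups) * mask_ratio))
    masked_idx = sorted(rng.sample(range(len(groups)), n_mask))
    converted: List[Group] = [
        [MASK] * len(g) if i in masked_idx else list(g)
        for i, g in enumerate(groups)
    ]
    return converted, masked_idx


def estimate(converted: Sequence[Group], index: int) -> List[float]:
    """Stand-in for the network: predict the visible mean for a masked group."""
    visible = [v for g in converted for v in g if v is not MASK]
    mean = sum(visible) / len(visible)
    return [mean] * len(converted[index])


def reconstruction_error(
    groups: Sequence[Sequence[float]],
    converted: Sequence[Group],
    masked_idx: Sequence[int],
) -> float:
    """Mean squared error between estimates and the original masked groups."""
    squared = []
    for i in masked_idx:
        for pred, truth in zip(estimate(converted, i), groups[i]):
            squared.append((pred - truth) ** 2)
    return sum(squared) / len(squared)
```

The error is computed only on the groups handled by the target conversion logic, so no category annotation is needed: the original data itself supplies the supervision.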
As an implementation manner, the privacy feature extraction network to be optimized comprises a pre-estimation network and a privacy description array extraction module, wherein the privacy description array extraction module comprises a privacy data projection variable;
the estimating, with the private feature extraction network to be optimized, of the interaction grouping data to be estimated according to the converted business interaction data set template Eg_A to obtain the estimated private data output by the private feature extraction network to be optimized comprises:
projecting the converted business interaction data set template Eg_A into the private array value domain of service interaction data of the service interaction data set category through the private data projection variable of the private description array extraction module, and obtaining a first private description array of the converted business interaction data set template Eg_A according to the projection result;
estimating the interaction grouping data to be estimated based on the first privacy description array through the estimation network to obtain estimated privacy data;
the optimizing the network parameter of the private feature extraction network to be optimized based on the error to obtain an optimized private feature extraction network comprises the following steps:
optimizing the parameter values of the estimation network and the private data projection variable of the private description array extraction module based on the error to obtain an optimized private feature extraction network.
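The joint update described above, in which the same error adjusts both the estimation network and the projection variable, can be shown with a deliberately tiny model. The scalar parameters `w` (projection variable) and `v` (estimation network parameter), the squared-error loss, and the learning rate are all assumptions chosen to keep the sketch readable; a real network would have many parameters and use automatic differentiation.

```python
from typing import Sequence, Tuple

Pair = Tuple[float, float]  # (converted input, original value to be estimated)


def mean_error(pairs: Sequence[Pair], w: float, v: float) -> float:
    """Mean squared error of the estimate v * (w * x) against the target."""
    return sum((v * w * x - t) ** 2 for x, t in pairs) / len(pairs)


def train_jointly(
    pairs: Sequence[Pair],
    w: float = 0.5,
    v: float = 0.5,
    lr: float = 0.05,
    epochs: int = 500,
) -> Tuple[float, float]:
    """Update the projection variable w and the estimator parameter v together."""
    for _ in range(epochs):
        for x, target in pairs:
            description = w * x            # first private description array
            estimate = v * description     # estimated private data
            residual = estimate - target
            # Gradients of residual**2 with respect to v and w.
            grad_v = 2.0 * residual * description
            grad_w = 2.0 * residual * v * x
            v -= lr * grad_v
            w -= lr * grad_w
    return w, v
```

For example, with targets equal to twice the input, the product `w * v` moves toward 2, showing that the error signal reaches the projection variable as well as the estimator.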
As one implementation, the private description array extraction module includes at least two description array extraction networks, the private data projection variables include at least two private data projection sub-variables, and each description array extraction network includes a private data projection sub-variable;
the projecting of the converted business interaction data set template Eg_A into the private array value domain of service interaction data of the service interaction data set category through the private data projection variable of the private description array extraction module, and the obtaining of the first private description array of the converted business interaction data set template Eg_A according to the projection result, comprise:
based on the private data projection sub-variables of the description array extraction networks and the position distribution of the description array extraction networks, projecting the converted service interaction data set template Eg_A into the private array value domain of the service interaction data set class to obtain private description arrays extracted by the description array extraction networks;
processing the private description arrays extracted by one or more of the description array extraction networks to obtain the first private description array of the converted business interaction data set template Eg_A, wherein the one or more description array extraction networks include the description array extraction network that is last in the position distribution;
the optimizing of the parameter values of the estimation network and the private data projection variable of the private description array extraction module based on the error to obtain an optimized private feature extraction network comprises:
optimizing the parameter values of the estimation network and the private data projection sub-variables of the description array extraction networks based on the error to obtain an optimized private feature extraction network.
As an implementation manner, after the network parameters of the private category identification network are optimized based on the private data category annotation information of the business interaction data set template Eg_B and the estimated identification result to obtain the optimized private category identification network, the method further includes:
projecting the service interaction data set template Eg_A into the private array value domain of service interaction data of the service interaction data set category through the private data projection variable of the private feature extraction network, and obtaining a third private description array of the service interaction data set template Eg_A according to the projection result;
performing private data category identification on the third private description array through the private category identification network, and determining an identification result of the business interaction data set template Eg_A under the preset private data categories of the private category identification network;
setting private data category annotation information for the service interaction data set template Eg_A based on the identification result of the service interaction data set template Eg_A to obtain a service interaction data set annotation template Eg_A';
and optimizing the privacy class identification network based on the business interaction data set annotation template Eg_A' and the business interaction data set template Eg_B.
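The refinement above amounts to a pseudo-labelling loop: the identification network labels the unannotated template Eg_A, and the resulting Eg_A' is combined with Eg_B for a further round of optimization. The sketch below assumes the description arrays have already been extracted and uses a nearest-centroid classifier as a hypothetical stand-in for the private category identification network.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Labelled = Tuple[Vector, str]


def fit_centroids(labelled: Sequence[Labelled]) -> Dict[str, List[float]]:
    """Compute one centroid per category from labelled description arrays."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids: Dict[str, List[float]], vec: Vector) -> str:
    """Return the category whose centroid is closest to the description array."""
    def squared_distance(center: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def pseudo_label_and_refit(
    eg_b: Sequence[Labelled], eg_a: Sequence[Vector]
) -> Tuple[Dict[str, List[float]], List[Labelled]]:
    """Label Eg_A with the current model, then refit on Eg_A' together with Eg_B."""
    centroids = fit_centroids(eg_b)
    eg_a_annotated = [(vec, classify(centroids, vec)) for vec in eg_a]
    return fit_centroids(list(eg_b) + eg_a_annotated), eg_a_annotated
```

A practical variant might keep only high-confidence pseudo-labels; the source does not state such a filter, so none is applied here.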
As an embodiment, the method further comprises:
receiving a service interaction data set corresponding to a private data class and a corresponding private class mark uploaded by a third-party service terminal, wherein the private class mark comprises a service interaction node;
the service interaction data set is used as a new service interaction data set template and is stored to a service interaction data set template library together with the corresponding privacy class mark;
and acquiring a privacy class mark of the service interaction data set to be distributed, taking the service interaction data set to be distributed as a newly-added service interaction data set template, and storing the newly-added service interaction data set template and the corresponding privacy class mark into the service interaction data set template library.
As an embodiment, the method further comprises:
when the time elapsed between the latest optimization time of the private class identification network and the time corresponding to the current node reaches the set network optimization interval, selecting, from the service interaction data set template library, the service interaction data set templates whose service interaction nodes fall after the latest optimization time, to obtain target service interaction data set templates;
and optimizing the privacy class identification network based on the target service interaction data set template.
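The template library and the interval-triggered re-optimization can be sketched as below. The sketch assumes that a service interaction node can be modelled as a numeric timestamp, which the source does not state explicitly; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Template:
    data: List[float]
    category: str       # private class mark
    node_time: float    # service interaction node, modelled as a timestamp (assumption)


class TemplateLibrary:
    """Minimal in-memory service interaction data set template library."""

    def __init__(self) -> None:
        self._items: List[Template] = []

    def add(self, template: Template) -> None:
        self._items.append(template)

    def after(self, moment: float) -> List[Template]:
        """Templates whose service interaction node is later than `moment`."""
        return [t for t in self._items if t.node_time > moment]


def templates_for_reoptimization(
    library: TemplateLibrary, now: float, last_optimized: float, interval: float
) -> List[Template]:
    """Return target templates once the optimization interval has elapsed."""
    if now - last_optimized < interval:
        return []  # interval not reached yet, nothing to do
    return library.after(last_optimized)
```

Selecting only templates newer than the latest optimization time keeps each re-optimization round focused on data the identification network has not yet seen.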
In a second aspect, embodiments of the present application provide a data management system comprising a processor and a memory, the memory storing a computer program for execution by the processor to implement the method described above.
According to the data management authority processing method and system based on big data provided by the embodiments of the application, the service interaction data set to be allocated is projected into the private array value domain of service interaction data of the service interaction data set category based on the private data projection variable of the private feature extraction network, and the private description array of the service interaction data set to be allocated is obtained according to the projection result, where the debugging templates of the private feature extraction network include a business interaction data set template Eg_A without annotation information. Private data type recognition is then performed on the private description array based on the private type recognition network, and the recognition result of the service interaction data set to be allocated under the preset private data types of the private type recognition network is determined; the private type recognition network is obtained by debugging with the private description arrays that the private feature extraction network extracts from the business interaction data set template Eg_B, which carries private data type annotation information. The actual private data category of the service interaction data set to be allocated is determined based on that recognition result, the target business interaction data authority allocation mode corresponding to the actual private data category is determined among the preset business interaction data authority allocation modes, and the data management authority of the service interaction data set to be allocated is determined according to the target business interaction data authority allocation mode.
Through the above process, identification of a service interaction data set in the embodiments of the application consists of extracting the private description array and recognizing the private description array, and the two steps rely on different networks. The private feature extraction network is optimized so that it accurately extracts private description arrays for the service interaction data set category; its optimization therefore does not use the private data categories of the service interaction data sets and relies only on service interaction data set templates without annotation information. Because the private description arrays are accurate, the private type recognition network can achieve good recognition capability even when the number of annotated templates is insufficient, which reduces the dependence on manual labor and improves the recognition of private data categories.
In the following description, other features will be partially set forth. Upon review of the ensuing disclosure and the accompanying figures, those skilled in the art will in part discover these features or will be able to ascertain them through production or use thereof. The features of the present application may be implemented and obtained by practicing or using the various aspects of the methods, tools, and combinations that are set forth in the detailed examples described below.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings in the embodiments of the present application. The terminology used in the description of the embodiments of the application is for the purpose of describing particular embodiments of the application only and is not intended to be limiting of the application.
The data management authority processing method based on big data in the embodiments of the application is executed by a data management system, which includes but is not limited to a server, a personal computer, a notebook computer, a tablet computer, a smart phone, and the like. As shown in Fig. 1, the data management authority processing method based on big data provided in the embodiments of the application includes the following steps:
100: in response to a permission allocation instruction, acquiring a business interaction data set to be allocated.
In the embodiments of the application, a service interaction data set is, for example, a set of data generated when a user interacts with a background server through an internet platform, such as online office data, e-commerce transaction data, social interaction data, audit data, and the like. A service interaction data set to be allocated must be assigned corresponding data management authority so that it can be managed safely and reasonably. The data management authority may, for example, grant different personnel different data states and different data processing authorities (such as different query fields, display fields, processing fields, and the like) for different data privacy categories; the specific content of the management authority is not limited.
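One possible way to represent such authority is a table keyed by data privacy category and personnel role, with separate query, display, and processing field sets. The category names, role names, and field names below are invented for illustration and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class FieldPermissions:
    query: FrozenSet[str] = frozenset()
    display: FrozenSet[str] = frozenset()
    process: FrozenSet[str] = frozenset()


# Hypothetical policy: privacy category -> role -> permitted fields.
POLICY: Dict[str, Dict[str, FieldPermissions]] = {
    "low_privacy": {
        "analyst": FieldPermissions(
            query=frozenset({"order_id", "amount"}),
            display=frozenset({"order_id", "amount"}),
            process=frozenset({"amount"}),
        ),
    },
    "high_privacy": {
        "analyst": FieldPermissions(
            query=frozenset({"order_id"}),
            display=frozenset({"order_id"}),
        ),
    },
}

ACTIONS = ("query", "display", "process")


def can(category: str, role: str, action: str, field_name: str) -> bool:
    """Check whether a role may perform an action on a field for a category."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    permissions = POLICY.get(category, {}).get(role)
    if permissions is None:
        return False  # deny by default for unknown categories or roles
    return field_name in getattr(permissions, action)
```

The deny-by-default branch is a design choice of this sketch: a data set whose category or role is not covered by the policy receives no access.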
200: based on a private data projection variable of a private feature extraction network, projecting a service interaction data set to be distributed into a private array value domain of service interaction data of a service interaction data set category, and obtaining a private description array of the service interaction data set to be distributed according to a projection result, wherein a debugging template of the private feature extraction network comprises a service interaction data set template Eg_A without annotation information.
It should be noted that the private feature extraction network may be built from any feasible machine learning model, for example a neural network model such as a CNN, RNN, DNN, BERT, or Transformer, and extracts a private description array from the service interaction data set to be allocated. The private description array is the vector representation obtained by performing feature extraction on the data of the service interaction data set to be allocated. Depending on the form and composition of the data, it may take different forms, for example a one-dimensional array (a vector), a two-dimensional array (a matrix), or a multidimensional tensor, which is not specifically limited. The debugging templates (that is, training samples) of the private feature extraction network include a business interaction data set template Eg_A without annotation information, where annotation information refers, for example, to the various labels that indicate ground-truth information.
The private feature extraction network and the private category identification network are both deployed in the data management system. The private description array may be extracted from the service interaction data set to be allocated by a feature extraction network such as a convolutional neural network. The private array value domain of the service interaction data is the data domain spanned by the dimensions of the private description array of the service interaction data. These dimensions are learned by the network from the characteristics of the service interaction data; service interaction data of different categories have different characteristics, so the dimensions of their private description arrays, and the meanings of those dimensions, differ as well. In the embodiments of the application, the private array value domain of a network that extracts private description arrays depends on the category of the templates used when the network's extraction performance is optimized. In the application, the private feature extraction network is optimized with the business interaction data set template Eg_A without annotation information, so the dimensions of the private array value domain reflect well the characteristics of service interaction data of the service interaction data set category, which allows an accurate vector description of the service interaction data set.
The data in a service interaction data set to be processed may take several forms; e-commerce data, for example, may include text comment data, voice consultation data, collection behavior data, and the like. Therefore, as an implementation manner, before the service interaction data set to be allocated is projected, based on the private data projection variable of the private feature extraction network, into the private array value domain of service interaction data of the service interaction data set category and the private description array of the service interaction data set to be allocated is obtained according to the projection result, the method provided by the embodiments of the application further includes: determining the type of the composition data of the service interaction data set to be allocated; and acquiring the private feature extraction network corresponding to that type of composition data, and using it as the network that extracts the private description array for that type of service interaction data set. It can be understood that different types of composition data may each be given their own private feature extraction network, so as to improve the efficiency of private feature extraction.
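Choosing an extraction network by composition data type can be sketched as a simple registry lookup. The type names, the type detection rule, and the two toy extractors below are assumptions for illustration; in practice each entry would be a separately optimized private feature extraction network.

```python
from typing import Callable, Dict, List, Sequence, Union

Data = Union[str, Sequence[str]]
Extractor = Callable[[Data], List[float]]


def text_extractor(data: Data) -> List[float]:
    """Toy extractor for text comment data: character count and word count."""
    text = str(data)
    return [float(len(text)), float(len(text.split()))]


def behavior_extractor(data: Data) -> List[float]:
    """Toy extractor for behavior data: event count and distinct event count."""
    events = list(data)
    return [float(len(events)), float(len(set(events)))]


# Hypothetical registry: composition data type -> extraction network.
EXTRACTORS: Dict[str, Extractor] = {
    "text": text_extractor,
    "behavior": behavior_extractor,
}


def detect_type(data: Data) -> str:
    """Determine the composition data type (a deliberately simple rule)."""
    return "text" if isinstance(data, str) else "behavior"


def extract(data: Data) -> List[float]:
    """Look up the extractor for the detected type and apply it."""
    return EXTRACTORS[detect_type(data)](data)
```

A new composition data type, such as voice consultation data, would be supported by registering one more extractor, without changing the dispatch logic.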
In addition, because of data differences (such as structural differences) between data from different sources, if the number of networks of the adopted privacy feature extraction network is small, inaccurate privacy description array extraction may be caused. Based on this, the service interaction data set to be allocated may be subjected to a private feature extraction by a private feature extraction network comprising at least two description array extraction networks, in other words, the private feature extraction network comprises at least two cascaded description array extraction networks, the private data projection variables comprise at least two private data projection sub-variables, each private data projection sub-variable corresponds to one description array extraction network, and different description array extraction networks are used for extracting the private description arrays of different layers of the service interaction data set to be allocated.
In some technical solutions that may be implemented independently, projecting the service interaction data set to be allocated, based on the private data projection variable of the private feature extraction network, into the private array value domain of service interaction data of the service interaction data set category, and obtaining the private description array of the service interaction data set to be allocated according to the projection result, may specifically include: based on the private data projection sub-variables of the description array extraction networks and the position distribution of the description array extraction networks (such as how they are connected to one another), projecting the service interaction data set to be allocated into the private array value domain of the service interaction data set category, and obtaining the private description array extracted by each description array extraction network according to the projection result; and processing the private description arrays extracted by one or more of the description array extraction networks to obtain the private description array of the service interaction data set to be allocated. The one or more description array extraction networks include the description array extraction network that is last in the position distribution. Different description array extraction networks extract private description arrays of different layers of the service interaction data set to be allocated, and the description array extraction networks are connected in a preset order.
In the application, the private data projection variable of the private feature extraction network is the parameter used to perform the private mapping, and may be obtained through deep learning. The description array extraction networks may include, for example, a data source description array extraction network, a data structure description array extraction network, a data semantic description array extraction network, and the like. In the embodiments of the application, to improve the accuracy of identifying the privacy category of the service interaction data set, the description arrays extracted by several networks are integrated to obtain the private description array, so that the private description array covers both shallow-layer and bottom-layer feature information, which improves its accuracy. As one embodiment, the description array extraction networks include a cascaded data source description array extraction network, data structure description array extraction network, and data semantic description array extraction network. The corresponding private data projection sub-variables are, respectively, a data source projection parameter, a data structure projection parameter, and a data semantic projection parameter.
Based on the private data projection sub-variables of the description array extraction networks and the position distribution of the description array extraction networks, projecting the service interaction data set to be allocated into the private array value domain of the service interaction data set category, and obtaining the private description array extracted by each description array extraction network according to the projection result, includes: loading the service interaction data set to be allocated into the data source description array extraction network and, through the data source projection parameter values of that network, projecting the service interaction data set to be allocated into the data source array value domain of service interaction data of the service interaction data set category to obtain the data source description array of the service interaction data set to be allocated; loading the data source description array into the data structure description array extraction network and, through the data structure projection parameter values of that network, projecting the data source description array into the data structure array value domain of service interaction data of the service interaction data set category to obtain the data structure description array of the service interaction data set to be allocated; and loading the data structure description array into the data semantic description array extraction network and, through the data semantic projection parameter values of that network, projecting the data structure description array into the data semantic array value domain of service interaction data of the service interaction data set category to obtain the data semantic description array of the service interaction data set to be allocated.
Processing the private description arrays extracted by one or more of the description array extraction networks to obtain the private description array of the service interaction data set to be allocated includes: integrating (for example, concatenating or adding) the data source description array extracted by the data source description array extraction network, the data structure description array extracted by the data structure description array extraction network, and the data semantic description array extracted by the data semantic description array extraction network to obtain the private description array. The description array value domain in which the private description array lies is the private array value domain in which service interaction data of the service interaction data set category is to be described.
As an implementation manner, only the private description array extracted by the last description array extraction network is used as the private description array, which improves the efficiency of private data category identification. In some technical solutions that can be implemented independently, the processing of the private description arrays extracted by one or more of the description array extraction networks to obtain the private description array of the service interaction data set to be allocated includes: taking the data semantic description array as the private description array of the service interaction data set to be allocated.
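The cascade and the integration step can be illustrated with plain linear projections. This is a simplified sketch: the three weight matrices are hypothetical stand-ins for the data source, data structure, and data semantic projection parameters, and real extraction networks would normally include nonlinear layers.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def project(matrix: Matrix, vector: Sequence[float]) -> List[float]:
    """Linear projection: one output value per matrix row."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hypothetical projection sub-variables (illustration only).
SOURCE_W: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]   # 3 inputs -> 2 outputs
STRUCTURE_W: Matrix = [[1.0, 1.0], [1.0, -1.0]]         # 2 -> 2
SEMANTIC_W: Matrix = [[0.5, 0.5], [0.0, 1.0]]           # 2 -> 2


def extract_cascade(dataset: Sequence[float]) -> List[List[float]]:
    """Run the three cascaded stages; each stage consumes the previous output."""
    source = project(SOURCE_W, dataset)
    structure = project(STRUCTURE_W, source)
    semantic = project(SEMANTIC_W, structure)
    return [source, structure, semantic]


def integrate(arrays: Sequence[Sequence[float]], mode: str = "concat") -> List[float]:
    """Combine stage outputs by concatenation, element-wise addition, or last only."""
    if mode == "concat":
        return [value for array in arrays for value in array]
    if mode == "add":
        # Element-wise addition requires all stage outputs to have equal length.
        return [sum(values) for values in zip(*arrays)]
    if mode == "last":
        return list(arrays[-1])
    raise ValueError(f"unknown mode: {mode}")
```

Concatenation preserves every stage's information at the cost of a longer array, addition keeps the length fixed, and the `last` mode uses only the output of the final network in the cascade.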
In some technical schemes which can be implemented independently, the output of the last description array extraction network in position distribution in the description array extraction network of the private feature extraction network is connected with the input of the private category identification network, so that the data semantic description array can be output from the private feature extraction network to the private category identification network, and meanwhile, the error in optimizing the private category identification network can be used for optimizing the weight parameter of the private feature extraction network so as to improve the accuracy of the private feature extraction network.
In some technical solutions that can be implemented independently, in addition to projecting the service interaction data set to be allocated into the private array value domain of the service interaction data of the service interaction data set category based on the private data projection variable of the private feature extraction network and obtaining the private description array of the service interaction data set to be allocated according to the projection result, the method may further include a network optimization step:
optimizing the private feature extraction network to be optimized based on the service interaction data set template Eg_A to obtain an optimized private feature extraction network, wherein the service interaction data set template Eg_A carries no private data category annotation information, and the private feature extraction network to be optimized is obtained by pre-optimization on service data of the same data type as the service interaction data set template Eg_A;
projecting the service interaction data set template Eg_B into the private array value domain of the service interaction data of the service interaction data set category based on the private data projection variable of the private feature extraction network, and obtaining a second private description array of the service interaction data set template Eg_B according to the projection result;
performing private data category recognition on the second private description array with the private type recognition network to be optimized, and determining an estimated recognition result of the service interaction data set template Eg_B under the preset private data categories of the private type recognition network to be optimized;
and optimizing the network parameters of the private type recognition network based on the private data category annotation information of the service interaction data set template Eg_B and the estimated recognition result to obtain an optimized private type recognition network.
The labels Eg_A and Eg_B serve only to distinguish the two service interaction data set templates.
The private feature extraction network may first be pre-optimized on service data of the same data type as the service interaction data set template Eg_A and then optimized on the service interaction data set template Eg_A itself, where service data of the same data type means service data of the same kind as the service interaction data set template Eg_A. For example, when the service interaction data set template Eg_A is transaction data in the e-commerce field, the service data used to pre-optimize the private feature extraction network is also transaction data in the e-commerce field. In addition, the error of the private type recognition network can be determined based on the private data category annotation information of the service interaction data set template Eg_B and the estimated recognition result, and the network parameters of the private type recognition network are then optimized based on this error to obtain the optimized private type recognition network. Alternatively, the parameters of the private feature extraction network may be optimized at the same time according to this error.
Specifically, the error of the private type recognition network under the current network parameters can be obtained from the private data category annotation information and the estimated recognition result of the service interaction data set template Eg_B by using an error function such as a cross-entropy error function or a log-likelihood error function. The network parameters of the private type recognition network are then optimized according to the error, and the process returns to the step of performing private data category recognition on the second private description array with the private type recognition network to be optimized and determining the estimated recognition result of the service interaction data set template Eg_B under the preset private data categories of the private type recognition network to be optimized. This is iterated until the optimization cutoff requirement is met (for example, the maximum number of iterations is reached or the error value meets a preset error threshold), yielding the optimized private type recognition network. The estimated recognition result may indicate only one private data category or may indicate several private data categories.
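The iteration described above can be sketched with a deliberately small stand-in for the private type recognition network. The sketch assumes a single linear layer with softmax output trained by plain gradient descent on the cross-entropy error; the function names, learning rate, cutoff values, and example arrays are all hypothetical and are not taken from the embodiments.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scores_for(weights: List[List[float]], array: Sequence[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, array)) for row in weights]


def optimize_recognizer(arrays, labels, num_categories, learning_rate=0.5,
                        max_iterations=500, error_threshold=0.05
                        ) -> Tuple[List[List[float]], float]:
    """Iterate until the error threshold or the iteration limit is reached."""
    dim = len(arrays[0])
    weights = [[0.0] * dim for _ in range(num_categories)]
    error = float("inf")
    for _ in range(max_iterations):
        gradients = [[0.0] * dim for _ in range(num_categories)]
        error = 0.0
        for array, label in zip(arrays, labels):
            probabilities = softmax(scores_for(weights, array))
            error -= math.log(probabilities[label])  # cross-entropy term
            for c in range(num_categories):
                delta = probabilities[c] - (1.0 if c == label else 0.0)
                for j in range(dim):
                    gradients[c][j] += delta * array[j]
        error /= len(arrays)
        if error <= error_threshold:  # cutoff: error meets the threshold
            break
        for c in range(num_categories):
            for j in range(dim):
                weights[c][j] -= learning_rate * gradients[c][j] / len(arrays)
    # `error` is the value measured before the last parameter update, if any.
    return weights, error


def recognize(weights: List[List[float]], array: Sequence[float]) -> int:
    scores = scores_for(weights, array)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical second private description arrays of Eg_B and their annotations.
second_arrays = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
annotations = [0, 0, 1, 1]
weights, error = optimize_recognizer(second_arrays, annotations, num_categories=2)
```

Both cutoff conditions from the text appear here: the loop bound plays the role of the maximum number of iterations, and the early exit plays the role of the preset error threshold.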
As an implementation manner, if a description array is extracted from each service interaction data set as a whole, the extraction quality of the private feature extraction network is low; the data in the service interaction data set can therefore be segmented before processing. In some technical solutions that can be implemented independently, when the private feature extraction network is optimized based on the service interaction data set template Eg_A, a data segmentation operation is performed on the service interaction data set template Eg_A to improve the description array extraction performance of the private feature extraction network. Optimizing the private feature extraction network to be optimized based on the service interaction data set template Eg_A to obtain the optimized private feature extraction network specifically includes:
performing a data segmentation operation on the service interaction data set template Eg_A to obtain interaction grouping data of the service interaction data set template Eg_A;
converting the interaction grouping data based on a plurality of interaction grouping data conversion logics among preset interaction grouping data conversion logics to obtain a converted service interaction data set template Eg_A, wherein the plurality of interaction grouping data conversion logics includes a target conversion logic;
determining the interaction grouping data in the service interaction data set template Eg_A that is converted based on the target conversion logic as interaction grouping data to be estimated;
estimating, with the private feature extraction network to be optimized, the interaction grouping data to be estimated according to the converted service interaction data set template Eg_A to obtain estimated private data output by the private feature extraction network to be optimized;
determining an error of the private feature extraction network to be optimized based on the estimated private data and the interaction grouping data to be estimated;
and optimizing the network parameters of the private feature extraction network to be optimized based on the error to obtain the optimized private feature extraction network.
When the interaction grouping data is converted, if only one conversion logic (replacement rule) is used, for example only the target conversion logic, the private feature extraction network may adapt to that conversion logic relatively quickly and fail to be optimized well. The interaction grouping data conversion logics therefore include at least the target conversion logic and one or more perturbation conversion logics. Converting interaction grouping data based on a perturbation conversion logic yields perturbed interaction grouping data. Perturbed interaction grouping data interferes with the private feature extraction network when it estimates the estimated private data, so that the network can only understand the data from its context, which strengthens the recognition and extraction performance of the private feature extraction network. A perturbation conversion logic may convert the interaction grouping data with set data, or may convert the interaction grouping data based on data unrelated to the service interaction data set to be allocated. Converting interaction grouping data based on the target conversion logic yields the interaction grouping data to be estimated; the target conversion logic may convert the interaction grouping data based on a preset field set. The interaction grouping data conversion logics may also include an interaction grouping data conversion strategy corresponding to each conversion logic. The interaction grouping data conversion strategy can be used to determine which interaction grouping data is converted under the corresponding conversion logic, for example the proportion, structure, and properties of the converted interaction grouping data.
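The segmentation and conversion steps above resemble masked-input pre-training, and can be sketched as follows under several assumptions: a record is segmented on a `|` separator, the target conversion logic replaces a group with a hypothetical preset field `<MASK>`, the perturbation logic replaces a group with unrelated data, and the conversion strategy is reduced to two proportions. None of these specifics come from the embodiments.

```python
import random
from typing import List, Tuple

MASK_FIELD = "<MASK>"  # assumed preset field used by the target conversion logic
UNRELATED_POOL = ["weather=rain", "color=blue"]  # assumed unrelated data


def segment(record: str) -> List[str]:
    """Data segmentation: split one record into interaction grouping data."""
    return record.split("|")


def convert_groups(groups: List[str], target_ratio: float = 0.3,
                   perturb_ratio: float = 0.1, seed: int = 0
                   ) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Apply target and perturbation conversion logic to the groups.

    Returns the converted groups and the (position, original group) pairs
    converted by the target logic, i.e. the groups to be estimated.
    """
    rng = random.Random(seed)
    converted = list(groups)
    to_estimate: List[Tuple[int, str]] = []
    for position, group in enumerate(groups):
        draw = rng.random()
        if draw < target_ratio:  # target conversion logic
            converted[position] = MASK_FIELD
            to_estimate.append((position, group))
        elif draw < target_ratio + perturb_ratio:  # perturbation logic
            converted[position] = rng.choice(UNRELATED_POOL)
    return converted, to_estimate


record = "user=u17|item=book|amount=42.00|address=12 Elm St"
groups = segment(record)
converted, to_estimate = convert_groups(groups)
```

Only the groups replaced by the target logic are returned for estimation; perturbed groups act purely as noise that the network has to see through.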
As an implementation manner, estimating the interaction grouping data to be estimated with the private feature extraction network to be optimized according to the converted service interaction data set template Eg_A, to obtain the estimated private data output by the private feature extraction network to be optimized, may specifically include: projecting the converted service interaction data set template Eg_A into the private array value domain of the service interaction data of the service interaction data set category through the private data projection variable of the private description array extraction module, and obtaining a first private description array of the converted service interaction data set template Eg_A according to the projection result; and estimating the interaction grouping data to be estimated based on the first private description array through the estimation network to obtain the estimated private data.
Further, optimizing the network parameters of the private feature extraction network to be optimized based on the error to obtain the optimized private feature extraction network may specifically include: optimizing the parameters of the estimation network and the private data projection variable of the private description array extraction module based on the error to obtain the optimized private feature extraction network.
The estimation network is the network, within the private feature extraction network to be optimized, that performs the estimation of the interaction grouping data to be estimated. There may be one or more pieces of interaction grouping data to be estimated, and the error is obtained from the difference between each piece of estimated private data and the corresponding interaction grouping data to be estimated. The conversion ratio may differ for interaction grouping data to be estimated of different composition types. When the error is obtained, a sub-error can be computed for each composition type from the difference between the estimated private data of that type and the corresponding interaction grouping data to be estimated, and the sub-errors are combined by weighted summation; the principle for assigning the weights is not limited.
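The weighted summation of sub-errors can be illustrated as follows. The text does not fix how the difference is measured or how weights are chosen, so this sketch assumes a mean squared difference per composition type and hand-picked weights; the type names and values are hypothetical.

```python
from typing import Dict, List


def weighted_error(estimates: Dict[str, List[float]],
                   targets: Dict[str, List[float]],
                   weights: Dict[str, float]) -> float:
    """Combine per-type sub-errors by weighted summation."""
    total = 0.0
    for data_type, weight in weights.items():
        pairs = list(zip(estimates[data_type], targets[data_type]))
        # Assumed sub-error: mean squared difference for this composition type.
        sub_error = sum((e - t) ** 2 for e, t in pairs) / len(pairs)
        total += weight * sub_error
    return total


# Hypothetical numeric encodings of estimated and original grouping data.
estimates = {"amount": [1.0, 2.0], "address": [0.0]}
targets = {"amount": [1.0, 4.0], "address": [1.0]}
weights = {"amount": 0.5, "address": 2.0}

total_error = weighted_error(estimates, targets, weights)
```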
In some embodiments, the private description array extraction module includes at least two description array extraction networks, the private data projection variable includes at least two private data projection sub-variables, and each description array extraction network has one private data projection sub-variable. Optionally, projecting the converted service interaction data set template Eg_A into the private array value domain of the service interaction data of the service interaction data set category through the private data projection variable of the private description array extraction module, and obtaining the first private description array of the converted service interaction data set template Eg_A according to the projection result, may specifically include: projecting the converted service interaction data set template Eg_A into the private array value domain of the service interaction data of the service interaction data set category based on the private data projection sub-variable of each description array extraction network and the position distribution of the description array extraction networks, to obtain the private description array extracted by each description array extraction network; and processing the private description arrays extracted by one or more of the description array extraction networks to obtain the first private description array of the converted service interaction data set template Eg_A, wherein the one or more description array extraction networks include the description array extraction network that is last in position.
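One way to picture the position-ordered projection is a chain in which each description array extraction network applies its own projection sub-variable to the output of the network before it. The sketch below assumes, purely for illustration, that each sub-variable is a matrix and each projection is linear; the embodiments do not specify either, and all names and numbers are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def project(sub_variable: Matrix, array: Sequence[float]) -> List[float]:
    """Apply one projection sub-variable (assumed to be a matrix)."""
    return [sum(w * v for w, v in zip(row, array)) for row in sub_variable]


def extract_all(sub_variables: List[Matrix],
                encoded_template: Sequence[float]) -> List[List[float]]:
    """Run the extraction networks in position order and keep every output."""
    outputs: List[List[float]] = []
    current = list(encoded_template)
    for sub_variable in sub_variables:  # list order = position distribution
        current = project(sub_variable, current)
        outputs.append(current)
    return outputs


# Two hypothetical extraction networks: 2 -> 3 dimensions, then 3 -> 1.
sub_variables = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[1.0, 1.0, 1.0]],
]
outputs = extract_all(sub_variables, [2.0, 3.0])

# The first private description array must involve the last network's output;
# here it is taken to be that output alone.
first_private_description_array = outputs[-1]
```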
In some technical solutions that can be implemented independently, optimizing the parameters of the estimation network and the private data projection variable of the private description array extraction module based on the error to obtain the optimized private feature extraction network includes: optimizing the parameters of the estimation network and the private data projection sub-variable of each description array extraction network based on the error to obtain the optimized private feature extraction network.
In the embodiments of the present application, to increase the number of templates used in optimization, prevent overfitting, and ultimately improve the accuracy of the network, the service interaction data set template Eg_A without annotation information can be recognized and annotated by the private type recognition network, and the private type recognition network is then optimized again with the annotated service interaction data set template Eg_A and the service interaction data set template Eg_B. In some technical solutions that can be implemented independently, after the network parameters of the private type recognition network are optimized based on the private data category annotation information of the service interaction data set template Eg_B and the estimated recognition result to obtain the optimized private type recognition network, the method further includes:
projecting the service interaction data set template Eg_A into the private array value domain of the service interaction data of the service interaction data set category through the private data projection variable of the private feature extraction network, and obtaining a third private description array of the service interaction data set template Eg_A according to the projection result;
performing private data category recognition on the third private description array through the private type recognition network, and determining the recognition result of the service interaction data set template Eg_A under the preset private data categories of the private type recognition network;
setting private data category annotation information for the service interaction data set template Eg_A based on the recognition result of the service interaction data set template Eg_A to obtain a service interaction data set annotation template Eg_A';
and optimizing the private type recognition network based on the service interaction data set annotation template Eg_A' and the service interaction data set template Eg_B.
In some technical solutions that can be implemented independently, when private data category annotation information is set for the service interaction data set template Eg_A based on its recognition result to obtain the service interaction data set annotation template Eg_A', private data category annotation information that represents only one private data category can be matched to each service interaction data set template in Eg_A based on the recognition result. When the private type recognition network is optimized based on the service interaction data set annotation template Eg_A' and the service interaction data set template Eg_B, the network may be optimized with Eg_A' and Eg_B separately, or Eg_A' and Eg_B may be mixed to form a service interaction data set template Eg_C and the private type recognition network optimized based on Eg_C.
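The annotation and mixing steps above correspond to what is commonly called pseudo-labeling. A minimal sketch follows, assuming the recognition result is a mapping from category to score and that the single category with the highest score becomes the annotation; `stub_recognize` merely stands in for the two networks and is entirely hypothetical, as are the example templates.

```python
import random
from typing import Callable, Dict, List, Tuple

Annotated = Tuple[str, str]  # (template, private data category)


def annotate_templates(unlabeled: List[str],
                       recognize: Callable[[str], Dict[str, float]]
                       ) -> List[Annotated]:
    """Give each Eg_A template exactly one category: the highest-scoring one."""
    annotated: List[Annotated] = []
    for template in unlabeled:
        scores = recognize(template)
        category = max(scores, key=scores.get)
        annotated.append((template, category))
    return annotated


def mix_templates(annotated_a: List[Annotated], labeled_b: List[Annotated],
                  seed: int = 0) -> List[Annotated]:
    """Mix Eg_A' and Eg_B into one shuffled template set Eg_C."""
    mixed = list(annotated_a) + list(labeled_b)
    random.Random(seed).shuffle(mixed)
    return mixed


def stub_recognize(template: str) -> Dict[str, float]:
    # Placeholder for feature extraction followed by category recognition.
    if "phone" in template:
        return {"contact": 0.8, "payment": 0.2}
    return {"contact": 0.1, "payment": 0.9}


eg_a = ["phone=555-0100", "card=4111"]
eg_b = [("phone=555-0199", "contact"), ("card=4000", "payment")]

eg_a_annotated = annotate_templates(eg_a, stub_recognize)
eg_c = mix_templates(eg_a_annotated, eg_b)
```

Mixing into a single set Eg_C lets one optimization pass see both manually annotated and automatically annotated templates; optimizing on Eg_A' and Eg_B separately is the other option the text allows.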
200: Performing private data category recognition on the private description array based on a private type recognition network, and determining the recognition result of the service interaction data set to be allocated under the preset private data categories of the private type recognition network, wherein the private type recognition network is obtained by debugging with the private description array that the private feature extraction network extracts from the service interaction data set template Eg_B, and the service interaction data set template Eg_B carries private data category annotation information.
It should be noted that the private type recognition network may be built from any feasible machine learning model, for example a neural network model such as a CNN, RNN, DNN, BERT, or Transformer, and is used to perform private data category recognition based on the private description array. The private type recognition network is obtained by debugging with the service interaction data set template Eg_B, which carries private data category annotation information.
When the private type recognition network is optimized, the private data category annotation information includes a private data indicator that indicates the private data category of the service interaction data set template Eg_B; its specific form is not limited and may be, for example, 0001. In some technical solutions that can be implemented independently, the private type recognition network provided in the embodiments of the present application may include a plurality of cascaded fully connected layers and an additional classification layer, or may be implemented by logistic regression, naive Bayes, or the like.
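The "cascaded fully connected layers plus classification layer" option can be sketched as a small forward pass. The ReLU activation, the softmax classification layer, and every weight value below are assumptions made for illustration; the embodiments do not prescribe them.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def dense(weights: List[List[float]], biases: List[float],
          inputs: Sequence[float]) -> List[float]:
    """One fully connected layer without activation."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(hidden_layers: List[Layer], classifier: Layer,
              description_array: Sequence[float]) -> List[float]:
    """Cascade the hidden layers, then apply the classification layer."""
    hidden = list(description_array)
    for weights, biases in hidden_layers:
        hidden = relu(dense(weights, biases, hidden))
    return softmax(dense(classifier[0], classifier[1], hidden))


# Hypothetical parameters: one hidden layer and a two-category classifier.
hidden_layers = [([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])]
classifier = ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0])

probabilities = recognize(hidden_layers, classifier, [0.9, 0.1])
```

The output is one probability per preset private data category, which is one possible form of the recognition result used in the following steps.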
300: Determining the actual private data category of the service interaction data set to be allocated based on the recognition result of the service interaction data set to be allocated.
400: Determining, among the preset service interaction data authority allocation modes, the target service interaction data authority allocation mode corresponding to the actual private data category.
In this embodiment of the present application, different private data categories correspond to different authority allocation modes, for example different data processing rights or different data viewing ranges.
500: Determining the data management authority of the service interaction data set to be allocated according to the target service interaction data authority allocation mode.
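Steps 300 to 500 amount to a lookup from the recognized category to a preset allocation mode. The following sketch assumes the recognition result is a mapping from category to score and that the highest-scoring category is taken as the actual private data category; the category names, rights, and viewing ranges are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class AllocationMode:
    processing_rights: FrozenSet[str]  # e.g. which operations are allowed
    viewing_range: str                 # e.g. who may view the data


# Hypothetical preset authority allocation modes, one per private data category.
PRESET_MODES: Dict[str, AllocationMode] = {
    "payment": AllocationMode(frozenset({"read"}), "owner_only"),
    "contact": AllocationMode(frozenset({"read", "export"}), "department"),
    "public": AllocationMode(frozenset({"read", "export", "edit"}), "organization"),
}


def actual_category(recognition_result: Dict[str, float]) -> str:
    """Step 300 (assumed rule): pick the category with the highest score."""
    return max(recognition_result, key=recognition_result.get)


def allocate(recognition_result: Dict[str, float]) -> Tuple[str, AllocationMode]:
    """Steps 400 and 500: look up the target mode for the actual category."""
    category = actual_category(recognition_result)
    return category, PRESET_MODES[category]


category, mode = allocate({"payment": 0.7, "contact": 0.2, "public": 0.1})
```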
In some independent embodiments, the data of service interaction data sets is time-sensitive, so an optimized network may no longer be suitable after a period of time and needs to be updated and optimized at a certain frequency. In addition, some data in a service interaction data set may be difficult to recognize during classification, and human experts can assist in the recognition. The method provided in the embodiments of the present application therefore further includes:
receiving a service interaction data set corresponding to a private data category and a corresponding private category mark uploaded by a third-party service terminal, wherein the private category mark includes a service interaction node;
storing the service interaction data set, as a newly added service interaction data set template, together with the corresponding private category mark in a service interaction data set template library;
acquiring the private category mark of the service interaction data set to be allocated, and storing the service interaction data set to be allocated, as a newly added service interaction data set template, together with the corresponding private category mark in the service interaction data set template library;
when the interval between the time corresponding to the current node and the latest optimization time of the private type recognition network reaches the set network optimization interval, determining, in the service interaction data set template library, the service interaction data set templates whose service interaction nodes are later than the latest optimization time to obtain target service interaction data set templates;
and updating and optimizing the private type recognition network based on the target service interaction data set templates.
In some technical solutions that can be implemented independently, to avoid degradation of the private feature extraction network and maintain its performance, the private feature extraction network may also be updated and optimized when the private type recognition network is updated and optimized based on the target service interaction data set templates. The service interaction data set corresponding to a private data category uploaded by the third-party service terminal is, for example, a service interaction data set whose private data category has been confirmed by an expert. The service interaction data set template library is used to store service interaction data set templates.
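The update trigger described above can be sketched as a simple filter over the template library. The sketch assumes that the service interaction node in each private category mark can be read as a timestamp; the `Template` fields, the function name, and the example times are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Template:
    data: str
    category_mark: str
    interaction_time: float  # assumed reading of the "service interaction node"


def select_update_templates(library: List[Template], current_time: float,
                            last_optimized: float, interval: float
                            ) -> Optional[List[Template]]:
    """Return the target templates once the optimization interval has elapsed.

    Returns None while the interval has not yet been reached.
    """
    if current_time - last_optimized < interval:
        return None
    return [t for t in library if t.interaction_time > last_optimized]


library = [
    Template("order-1", "payment", 10.0),
    Template("order-2", "contact", 25.0),
    Template("order-3", "payment", 31.0),
]

due = select_update_templates(library, current_time=50.0,
                              last_optimized=20.0, interval=30.0)
not_due = select_update_templates(library, current_time=40.0,
                                  last_optimized=20.0, interval=30.0)
```

Only templates added after the latest optimization time are selected, so each update pass works on data the network has not yet been optimized on.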
In summary, in the big-data-based data management authority processing method and system provided in the embodiments of the present application, the service interaction data set to be allocated is projected into the private array value domain of the service interaction data of the service interaction data set category based on the private data projection variable of the private feature extraction network, and the private description array of the service interaction data set to be allocated is obtained according to the projection result, where the debugging template of the private feature extraction network includes the service interaction data set template Eg_A without annotation information. Private data category recognition is performed on the private description array based on the private type recognition network, and the recognition result of the service interaction data set to be allocated under the preset private data categories of the private type recognition network is determined, where the private type recognition network is obtained by debugging with the private description array that the private feature extraction network extracts from the service interaction data set template Eg_B, and the service interaction data set template Eg_B carries private data category annotation information. The actual private data category of the service interaction data set to be allocated is determined based on its recognition result, the target service interaction data authority allocation mode corresponding to the actual private data category is determined among the preset service interaction data authority allocation modes, and the data management authority of the service interaction data set to be allocated is determined according to the target service interaction data authority allocation mode.
Through the above process, recognition of a service interaction data set in the embodiments of the present application consists of extracting the private description array and recognizing the private description array, and the two are performed by different networks. The private feature extraction network is optimized so that it accurately extracts the private description array of the service interaction data set category; its optimization therefore does not use the private data categories of the service interaction data sets and uses only service interaction data set templates without annotation information. Because the private description array is accurate, the private type recognition network can acquire good recognition capability even when the number of annotated templates is insufficient, which reduces the dependence on manual work and improves the recognition of private data categories.
Based on the same principle as the method shown in fig. 1, there is also provided in an embodiment of the present application a data management right processing apparatus 10, as shown in fig. 2, the apparatus 10 includes:
the data acquisition module 11 is used for responding to the authority allocation instruction and acquiring a service interaction data set to be allocated;
the array extraction module 12 is configured to project, based on a private data projection variable of a private feature extraction network, the service interaction data set to be allocated into the private array value domain of the service interaction data of the service interaction data set category, and obtain a private description array of the service interaction data set to be allocated according to the projection result, where a debugging template of the private feature extraction network includes a service interaction data set template Eg_A without annotation information;
the class identification module 13 is configured to identify a private data class of the private description array based on a private class identification network, and determine an identification result of the service interaction data set to be allocated under a preset private data class of the private class identification network; the private type recognition network is obtained by debugging a private description array extracted by a business interaction data set template Eg_B through the private feature extraction network, wherein the business interaction data set template Eg_B carries private data type annotation information; determining an actual private data category of the to-be-allocated business interaction data set based on the identification result of the to-be-allocated business interaction data set;
the permission distribution module 14 is configured to determine, among the preset service interaction data authority allocation modes, the target service interaction data authority allocation mode corresponding to the actual private data category, and to determine the data management authority of the service interaction data set to be allocated according to the target service interaction data authority allocation mode.
Since in the description of the method, each step has been described, a detailed description of the specific implementation principle of each module is not repeated here.
The above embodiment describes the data management authority processing apparatus 10 from the viewpoint of a virtual module, and the following describes a data management system from the viewpoint of a physical module, specifically as follows:
an embodiment of the present application provides a data management system. As shown in fig. 3, the data management system 100 includes a processor 101 and a memory 103, where the processor 101 is coupled to the memory 103, for example via a bus 102. Optionally, the data management system 100 may also include a transceiver 104. It should be noted that, in practical applications, the number of transceivers 104 is not limited to one, and the structure of the data management system 100 does not constitute a limitation on the embodiments of the present application.
The processor 101 may be a CPU, a general purpose processor, a GPU, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with the disclosure of the present application. The processor 101 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 102 may include a path for transferring information between the aforementioned components. Bus 102 may be a PCI bus, an EISA bus, or the like, and may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 3, but this does not mean that there is only one bus or only one type of bus.
Memory 103 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 103 is used for storing application program codes for executing the present application and is controlled to be executed by the processor 101. The processor 101 is configured to execute application code stored in the memory 103 to implement what is shown in any of the method embodiments described above.
An embodiment of the present application provides a data management system including: one or more processors; a memory; and one or more computer programs, where the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and when executed by the processors implement the following. The service interaction data set to be allocated is projected into the private array value domain of the service interaction data of the service interaction data set category based on the private data projection variable of the private feature extraction network, and the private description array of the service interaction data set to be allocated is obtained according to the projection result, where the debugging template of the private feature extraction network includes the service interaction data set template Eg_A without annotation information. Private data category recognition is performed on the private description array based on the private type recognition network, and the recognition result of the service interaction data set to be allocated under the preset private data categories of the private type recognition network is determined, where the private type recognition network is obtained by debugging with the private description array that the private feature extraction network extracts from the service interaction data set template Eg_B, and the service interaction data set template Eg_B carries private data category annotation information. The actual private data category of the service interaction data set to be allocated is determined based on its recognition result, the target service interaction data authority allocation mode corresponding to the actual private data category is determined among the preset service interaction data authority allocation modes, and the data management authority of the service interaction data set to be allocated is determined according to the target service interaction data authority allocation mode.
Through the above process, recognition of a service interaction data set in the embodiments of the present application consists of extracting the private description array and recognizing the private description array, and the two are performed by different networks. The private feature extraction network is optimized so that it accurately extracts the private description array of the service interaction data set category; its optimization therefore does not use the private data categories of the service interaction data sets and uses only service interaction data set templates without annotation information. Because the private description array is accurate, the private type recognition network can acquire good recognition capability even when the number of annotated templates is insufficient, which reduces the dependence on manual work and improves the recognition of private data categories.
Embodiments of the present application provide a computer readable storage medium having a computer program stored thereon, which when executed on a processor, enables the processor to perform the corresponding content of the foregoing method embodiments.
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
The foregoing describes only some of the embodiments of the present application. It should be noted that a person skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications shall also fall within the protection scope of the present application.