CN113742758A

CN113742758A - Data set authority management and control method, system and storage medium based on central station

Info

Publication number: CN113742758A
Application number: CN202111297295.XA
Authority: CN
Inventors: 陈盛慧; 纪德良; 蒋震宇; 陈立; 朱世鹏; 于亚丰; 林捷; 曹燕萍; 严伟; 钱华; 徐志安; 阳东; 解林超; 汪娟玉; 杨春晨; 陈辉; 陈怀狮; 陈依婷
Original assignee: Zhejiang Huayun Information Technology Co Ltd
Current assignee: Zhejiang Huayun Information Technology Co Ltd
Priority date: 2021-11-04
Filing date: 2021-11-04
Publication date: 2021-12-03
Anticipated expiration: 2041-11-04
Also published as: CN113742758B

Abstract

The invention provides a data set authority control method, system and storage medium based on a middle station. The method comprises the following steps: the middle station mirrors a received data source to obtain a plurality of data sources and processes each of them to obtain a plurality of different data sets; determining the data range of each data set according to a specific dimension of that data set; where data conversion is possible between two data sets, determining, according to the conversion direction, which is the parent data set and which is the child data set; otherwise, generating a heterogeneous data set corresponding to the data set; classifying the heterogeneous data set as a second-class child data set; generating and storing a data conversion strategy and a comparison strategy, and generating a first-class child interface and a second-class child interface based on the parent data set together with the data conversion strategy and the comparison strategy, respectively; and setting a first child authority range and a second child authority range on the first-class and second-class child interfaces, respectively, generating a parent interface based on the parent data set, and setting the parent authority range on the parent interface, the first-class child interface and the second-class child interface.

Description

Data set authority management and control method, system and storage medium based on central station

Technical Field

The invention relates to the technical field of data processing, and in particular to a data set authority management and control method, system and storage medium based on a middle station.

Background

With the continuous improvement of enterprise systems and rising requirements for information security, traditional interface-level authority functions no longer meet the needs of new forms of business development. Authority security now requires finer granularity: different authorities must be granted according to user identity in order to achieve strong management and control.

Although data-level authority is not a new concept and has been implemented in many system services, it is usually tied closely to specific business logic, so it cannot be abstracted into a basic service or a technical platform capability and shared. Moreover, system services that did not consider data authority at the outset may incur high modification costs when it has to be added later. As a large state-owned energy enterprise, the State Grid attaches particular importance to enterprise security.

For new data sources, how to configure their data authority securely and quickly is an urgent problem to be solved.

Disclosure of Invention

The embodiments of the invention provide a data set authority control method, system and storage medium based on a middle station. When source data is stored, it is processed and distributed into data sets, and corresponding interfaces and authorities can be configured quickly for each data set according to the data it contains, so that the source data is stored securely and can be configured and called quickly.

In a first aspect, an embodiment of the present invention provides a data set authority control method based on a middle station, including:

the middle station mirrors a received data source to obtain a plurality of data sources, and processes each of the data sources to obtain a plurality of different data sets, each data set containing its corresponding specific data;

determining a data range of each data set according to a specific dimension of each data set;

if data conversion can be carried out between any two data sets according to a data conversion strategy, determining that the two data sets are a parent data set and a child data set respectively according to the data conversion direction;

if data conversion can not be carried out between any two data sets according to the data conversion strategy, generating a heterogeneous data set corresponding to the data set;

comparing the heterogeneous data set with the parent data set to obtain a difference data set, and, if the difference data set contains no data from the heterogeneous data set, classifying the heterogeneous data set as a second-class child data set;

generating and storing a data conversion strategy corresponding to the first-class child data set and a comparison strategy corresponding to the second-class child data set; generating a first-class child interface based on the parent data set and the data conversion strategy, and a second-class child interface based on the parent data set and the comparison strategy; generating a parent interface based on the parent data set; and deleting the data in the first-class and second-class child data sets;

and acquiring a first child authority range, a second child authority range and a parent authority range for the first-class child data set, the second-class child data set and the parent data set, respectively; setting the first and second child authority ranges on the first-class and second-class child interfaces, respectively; and setting the parent authority range on the parent interface, the first-class child interface and the second-class child interface.

Optionally, in a possible implementation of the first aspect, generating the first-class child interface and the second-class child interface based on the parent data set, the data conversion policy and the comparison policy includes:

extracting the storage addresses of the parent data set and the data conversion strategy to obtain a parent address and a conversion address, respectively, and generating the first-class child interface from the parent address and the conversion address;

and extracting the storage addresses of the parent data set and the comparison strategy to obtain a parent address and a comparison address, respectively, and generating the second-class child interface from the parent address and the comparison address.

Optionally, in a possible implementation of the first aspect, the step in which the middle station mirrors the received data source to obtain a plurality of data sources and processes each of them to obtain a plurality of different data sets, each containing its corresponding specific data, includes:

extracting sensitive data from the data source according to a preset sensitivity condition, and determining the number of masks and the mask positions according to the data quantity of the sensitive data;

masking the sensitive data in the data source based on the number of masks and the mask positions to obtain a corresponding first mask data set, and generating a second-class child data set from the first mask data set; and/or

extracting part of the data in the data source according to a preset partial-data condition, and obtaining a corresponding primary partial data set from that data;

extracting sensitive data from the primary partial data set according to a preset sensitivity condition, and determining the number of masks and the mask positions according to the data quantity of the sensitive data;

masking sensitive data in the primary partial data set based on the mask number and the mask position to obtain a corresponding second mask data set;

generating a third-class child data set based on the second mask data set.

Optionally, in a possible implementation of the first aspect, a third-class child interface is generated according to the storage address of the third-class child data set.

Optionally, in a possible implementation manner of the first aspect, extracting sensitive data in the primary partial data set according to a preset sensitivity condition, and determining the number of masks and the mask positions according to the data quantity value of the sensitive data includes:

extracting text information meeting a preset text format in the primary partial data set, and determining the number of texts in the text information;

if the preset text format is a character form, calculating the number of masks by the following formula:

[formula not reproduced in this text]

wherein the terms of the formula are:

the number of masks to be calculated;

the number of mask comparisons to be considered;

the number of texts that are sensitive data in the current partial data set;

the number of masks of the k-th text information in the previously stored partial data set;

the number of texts of the i-th text information in the previously stored partial data set;

the preset weight value of the h-th character form;

the preset proportion value of the h-th character form;

and determining the mask position to be a front, middle or rear position based on the type of character form.
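The masking steps above can be illustrated with a minimal Python sketch. It assumes the number of masks has already been computed (the formula is not reproduced in this text), models the preset sensitivity condition as a hypothetical list of field names, and takes the mask position (front, middle or rear) as a parameter, because the text does not say which character form maps to which position. The names `mask_text`, `mask_records` and `third_class_child` and the sample records are illustrative only.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, str]


def mask_text(text: str, count: int, position: str, mask_char: str = "*") -> str:
    """Replace `count` characters of `text` with `mask_char` at the front, middle or rear."""
    count = max(0, min(count, len(text)))  # never mask more characters than exist
    if position == "front":
        start = 0
    elif position == "middle":
        start = (len(text) - count) // 2
    elif position == "rear":
        start = len(text) - count
    else:
        raise ValueError(f"unknown mask position: {position!r}")
    return text[:start] + mask_char * count + text[start + count:]


def mask_records(records: Sequence[Record], sensitive_fields: Sequence[str],
                 count: int, position: str) -> List[Record]:
    """Mask the sensitive fields of every record (yields a mask data set)."""
    masked = []
    for record in records:
        masked_record = dict(record)  # leave the mirrored source untouched
        # ASSUMPTION: the sensitivity condition is a fixed list of field names.
        for field in sensitive_fields:
            if field in masked_record:
                masked_record[field] = mask_text(masked_record[field], count, position)
        masked.append(masked_record)
    return masked


def third_class_child(records: Sequence[Record], keep: Callable[[Record], bool],
                      sensitive_fields: Sequence[str], count: int, position: str) -> List[Record]:
    """Extract a primary partial data set with `keep`, then mask it."""
    primary_partial = [r for r in records if keep(r)]
    return mask_records(primary_partial, sensitive_fields, count, position)


source = [
    {"household": "H001", "town": "Town 1", "phone": "13800001234"},
    {"household": "H002", "town": "Town 2", "phone": "13900005678"},
]
print(mask_text("13800001234", 4, "middle"))  # 138****1234
print(third_class_child(source, lambda r: r["town"] == "Town 1", ["phone"], 4, "rear"))
```

Copying each record before masking keeps the mirrored data source intact, which matches the idea that the mirror, not the original, is what gets processed.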

Optionally, in a possible implementation of the first aspect, a mask-number correction value input by the user for text information of the h-th character form is received, and the preset weight value of the h-th character form is corrected based on the correction value by the following formula:

[formula not reproduced in this text]

wherein the terms of the formula are:

the mask-number correction value;

the corrected preset weight value;

the correction weight applied when the number of masks is increased;

the correction weight applied when the number of masks is decreased.
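The correction formula is not reproduced in this text, so the following Python sketch only illustrates the direction logic described for this step: the user's correction value is compared with the computed number of masks, and a separate correction weight is applied for an increase and for a decrease. The proportional update rule and the function name `correct_weight` are hypothetical assumptions, not the patent's formula.

```python
def correct_weight(weight: float, computed_masks: int, corrected_masks: int,
                   increase_weight: float, decrease_weight: float) -> float:
    """Hypothetical update of the preset weight of one character form.

    ASSUMPTION: the weight moves in proportion to the relative gap between the
    user's correction and the computed mask count, scaled by a separate
    correction weight for increases and for decreases.
    """
    if computed_masks <= 0:
        raise ValueError("computed_masks must be positive")
    relative_gap = (corrected_masks - computed_masks) / computed_masks
    if relative_gap > 0:  # user wants more masks: raise the weight
        return weight * (1 + increase_weight * relative_gap)
    if relative_gap < 0:  # user wants fewer masks: lower the weight
        return weight * (1 + decrease_weight * relative_gap)
    return weight  # correction agrees with the computed value


print(correct_weight(0.5, 4, 6, increase_weight=0.2, decrease_weight=0.1))
```

Keeping two separate correction weights lets the adjustment be asymmetric, for example cautious when reducing masks and quicker when adding them, which is the behaviour the text attributes to this step.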

Optionally, in a possible implementation manner of the first aspect, if data conversion can be performed between any two data sets according to a data conversion policy, determining that the two data sets are a parent data set and a child data set respectively according to a data conversion direction includes:

traversing all data conversion strategies, sequentially carrying out data conversion on each data set to obtain a converted data set, and comparing the converted data set with other unconverted data sets;

if the converted data set is the same as other unconverted data sets, taking the converted data set as a parent data set and taking the data set before conversion as a child data set;

if data conversion cannot be performed between any two data sets according to the data conversion strategy, generating a heterogeneous data set corresponding to the data set comprises:

traversing all the data conversion strategies, sequentially carrying out data conversion on one data set to obtain a converted data set, and comparing the converted data set with other unconverted data sets;

if the converted data set is not identical to the other data sets, a heterogeneous data set is derived based on the data set.

Optionally, in a possible implementation manner of the first aspect, the method further includes:

acquiring historical operation data of a user, wherein the historical operation data comprises data query frequency and data query duration;

calculating the processing load of the first-class child data set or the second-class child data set by the following formula:

[formula not reproduced in this text]

wherein the terms of the formula are:

the processing load of the first-class or second-class child data set;

the amount of data in the first-class or second-class child data set;

the processing efficiency of the processing device;

the cache space of the processing device; T is the number of days;

the number of queries on the x-th day; Z is the number of data queries;

the duration of the o-th data query;

if the processing load is greater than a preset value, the first-class or second-class child data set is not deleted but is classified as a flat data set and stored.
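Because the load formula itself is not reproduced, the Python sketch below covers only the parts the definitions make concrete: aggregating the historical query counts over T days and the durations of Z queries, and the final comparison with the preset value. Treating the aggregates as means, the function names, and the label `"derived"` for a child data set that is deleted and regenerated through its interface are assumptions; the processing load itself is supplied by the caller.

```python
from typing import Sequence


def mean_daily_queries(daily_query_counts: Sequence[int]) -> float:
    """Average number of queries per day over T = len(daily_query_counts) days."""
    T = len(daily_query_counts)
    return sum(daily_query_counts) / T if T else 0.0


def mean_query_duration(durations: Sequence[float]) -> float:
    """Average duration over Z = len(durations) historical queries."""
    Z = len(durations)
    return sum(durations) / Z if Z else 0.0


def storage_decision(processing_load: float, preset_value: float) -> str:
    """Keep a heavy child data set as a flat data set; otherwise delete it and derive it online."""
    # ASSUMPTION: "derived" is an illustrative label, not a term from the text.
    return "flat" if processing_load > preset_value else "derived"


history_counts = [3, 5, 4]      # hypothetical queries on days 1..T
history_durations = [1.5, 2.5]  # hypothetical seconds per query
print(mean_daily_queries(history_counts), mean_query_duration(history_durations))  # 4.0 2.0
print(storage_decision(processing_load=12.0, preset_value=10.0))  # flat
```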

In a second aspect, an embodiment of the present invention provides a data set authority control device based on a middle station, including:

the mirror image module is used for mirroring a received data source to obtain a plurality of data sources, and processing each of the data sources to obtain a plurality of different data sets, each data set containing its corresponding specific data;

the dimension determining module is used for determining the data range of each data set according to the specific dimension of each data set;

the judgment determining module is used for determining that two data sets are respectively a parent data set and a child data set according to the data conversion direction when data conversion can be carried out between any two data sets according to a data conversion strategy;

the judging and generating module is used for generating a heterogeneous data set corresponding to any two data sets when the data conversion can not be carried out between the two data sets according to the data conversion strategy;

the comparison module is used for comparing the heterogeneous data set with the parent data set to obtain a difference data set and, if the difference data set contains no data from the heterogeneous data set, classifying the heterogeneous data set as a second-class child data set;

the interface generation module is used for generating and storing a data conversion strategy and a comparison strategy corresponding to the first-class and second-class child data sets, respectively; generating a first-class child interface and a second-class child interface based on the parent data set together with the data conversion strategy and the comparison strategy, respectively; generating a parent interface based on the parent data set; and deleting the data in the first-class and second-class child data sets;

and the correspondence module is used for acquiring a first child authority range, a second child authority range and a parent authority range for the first-class child data set, the second-class child data set and the parent data set, respectively; setting the first and second child authority ranges on the first-class and second-class child interfaces, respectively; and setting the parent authority range on the parent interface, the first-class child interface and the second-class child interface.

In a third aspect, an embodiment of the present invention provides a readable storage medium storing a computer program which, when executed by a processor, implements the method of the first aspect and its various possible designs.

With the data set authority management and control method, system and storage medium based on a middle station, the data source is processed according to different policies to obtain a plurality of data sets that together cover the data visible to roles with different authorities. Dividing the data source in this way lets each role quickly see the data within its authority range. The data sets are then classified according to the data conversion policy and the comparison policy, and some of them are deleted. As a result, each role can quickly obtain the data of its authority, and the volume of stored data is reduced.

When sensitive data is processed, the number of masks is calculated from historical behavior, so that the masks hide the sensitive content of a sensitive data set while exposing the part of the content that has statistical significance. The mask position is determined by the type of character form, so that sensitive content is hidden in a targeted, strategic manner and staff in the corresponding roles can still carry out data statistics conveniently.

The invention can determine the number of masks by active learning. When a user considers that the output number of masks does not suit the current scenario, the user can input a mask-number correction value that fits the scenario better. The direction in which the preset weight value should change is determined by comparing the correction value with the calculated number of masks, and different weights and change amplitudes are used depending on whether the number of masks is increased or decreased, so that the corrected preset weight value is more accurate and better fits the usage scenario.

According to the invention, when the data sets are classified, the processing load of each data set is evaluated. If the processing load of a data set is too large, it is not classified further but is stored directly as a flat data set. This prevents long call times caused by lengthy online processing when such data is requested, and ensures that users in the corresponding roles can quickly and effectively call the data their authority allows them to see.

Drawings

FIG. 1 is a flowchart of a data set authority management method based on a middle station;

FIG. 2 is a block diagram of two major components included in the middle station;

FIG. 3 is a flowchart of rights configuration;

FIG. 4 is a flow chart of rights access;

FIG. 5 is a flowchart of rights interception;

FIG. 6 is a block diagram of a data set authority management device based on a middle station.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "first," "second," "third," "fourth," and the like in the description, the claims and the drawings, if any, are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments of the invention described herein can operate in sequences other than those illustrated or described herein.

It should be understood that, in various embodiments of the present invention, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

It should be understood that in the present application, "comprising" and "having" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that, in the present invention, "a plurality" means two or more. "And/or" merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "Comprises A, B and C" and "comprises A, B, C" mean that all three of A, B and C are comprised; "comprises A, B or C" means that one of A, B and C is comprised; "comprises A, B and/or C" means that any one, any two, or all three of A, B and C are comprised.

It should be understood that in the present invention, "B corresponding to A", "A corresponds to B", or "B corresponds to A" means that B is associated with A and that B can be determined from A. Determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information. A matching B means that the similarity between A and B is greater than or equal to a preset threshold.

As used herein, "if" may be interpreted as "when", "upon", "in response to determining" or "in response to detecting", depending on the context.

The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.

The invention provides a data set authority control method based on a middle station. As shown in FIG. 1, the method specifically comprises the following steps:

Step S110: the middle station mirrors a received data source to obtain a plurality of data sources, and processes each of them to obtain a plurality of different data sets, each containing its corresponding specific data. The data set authority control method provided by the invention can be implemented on a middle station, for example the middle station of an enterprise.

Because the data is updated daily and monthly, the middle station first mirrors and copies the data sources after receiving them; the data sources can be regarded as the most basic, unprocessed data. In the invention, a data source can be, for example, the electricity consumption data of an area, such as Fangshan District of Beijing. The data source contains the electricity usage of each household in the district, which can be regarded as an electricity consumption value.

Each data set contains different data. The following steps describe in detail how the different data sets are obtained by processing the data sources.

Step S120: determining the data range of each data set according to the specific dimension of that data set. The data in each data set may be processed, for example by deleting part of the data or converting part of the data. For example, Fangshan District has a district-level statistics department, a district-level power supply unit and several town-level power supply units, each of which can be regarded as a role. The authority of the district-level power supply unit covers the electricity consumption data of all households in the district, while each town-level power supply unit can see the electricity consumption data of the households in its own town. The specific dimension can be regarded as the role dimension corresponding to each data set, and the data range of a data set is determined by that role dimension: it may be the electricity consumption data of all households in a district, or of all households in a particular town. The electricity consumption data of all households in a town is a subset of that of all households in the district.

Four data sets are used as an example. Data set 1 is the data that the district-level power supply unit is authorized to call and view; it can be regarded as the source data. Data set 2 is the data that the district-level statistics department can call and view, data set 3 is the data that the power supply unit of town 1 can call and view, and data set 4 is the data that the power supply unit of town 2 can call and view.

In one practical scenario, the district-level statistics department is an external body that has the right and obligation to compile data statistics, but sensitive data must not be leaked to it, so the data visible to it must be desensitized; that is, data set 2 is the desensitized data of data set 1. According to normal departmental responsibilities, each town-level power supply unit may only view the electricity consumption data of its own town: data set 3 is the town 1 data in data set 1, and data set 4 is the town 2 data in data set 1.
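The four example data sets can be sketched in Python as follows. The record fields (`household`, `town`, `phone`, `kwh`), the town names, the sample values, and the phone-number masking used as the desensitization step are hypothetical stand-ins; the text does not specify the record layout or the desensitization rule.

```python
import copy
from typing import Dict, List

Record = Dict[str, object]


def mirror(source: List[Record]) -> List[Record]:
    """Step S110: work on a deep copy so the received source stays untouched."""
    return copy.deepcopy(source)


def desensitize(records: List[Record]) -> List[Record]:
    """ASSUMPTION: desensitization keeps the first 3 and last 4 digits of the phone number."""
    result = []
    for record in records:
        phone = str(record["phone"])
        hidden = max(len(phone) - 7, 0)
        result.append({**record, "phone": phone[:3] + "*" * hidden + phone[3 + hidden:]})
    return result


def town_subset(records: List[Record], town: str) -> List[Record]:
    """Keep only the households of one town (a pure deletion of the other rows)."""
    return [dict(r) for r in records if r["town"] == town]


def build_data_sets(source: List[Record]) -> Dict[int, List[Record]]:
    mirrored = mirror(source)
    return {
        1: mirrored,                         # district-level power supply unit: all households
        2: desensitize(mirrored),            # district-level statistics department
        3: town_subset(mirrored, "Town 1"),  # power supply unit of town 1
        4: town_subset(mirrored, "Town 2"),  # power supply unit of town 2
    }


source = [
    {"household": "H001", "town": "Town 1", "phone": "13800001234", "kwh": 210},
    {"household": "H002", "town": "Town 1", "phone": "13800005678", "kwh": 185},
    {"household": "H003", "town": "Town 2", "phone": "13900009999", "kwh": 240},
]
data_sets = build_data_sets(source)
print(data_sets[2][0]["phone"])              # 138****1234
print(len(data_sets[3]), len(data_sets[4]))  # 2 1
```

Data set 2 differs from data set 1 only in representation, while data sets 3 and 4 differ only by deleted rows; this is the distinction that the classification in steps S130 to S150 relies on.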

Step S130, if data conversion can be performed between any two data sets according to the data conversion policy, determining that the two data sets are a parent data set and a child data set respectively according to the data conversion direction.

Step S130 specifically includes:

All data conversion policies are traversed to convert each data set in turn, yielding converted data sets, and each converted data set is compared with the other, unconverted data sets. The data conversion policies may be preset. For example, a data-desensitization conversion policy exists between data set 2 and data set 1: each time a new data source is obtained, the data within the corresponding authority is determined for data set 2 and desensitized according to this policy.

If a converted data set is identical to another, unconverted data set, the converted data set is taken as the parent data set and the data set before conversion as the child data set. The invention acquires a plurality of data sets and then looks for inverse conversions between them. Here an inverse conversion may be a one-time inverse conversion, i.e., a single inverse step, for example converting desensitized data back into the data before desensitization. When a data set, after inverse conversion, is identical to another data set, this shows that it was converted from that other data set; the data set obtained by inverse conversion is therefore taken as the parent data set and the data set before conversion as the child data set. For example, if data set 2 can be converted back into data set 1, data set 1 is the parent data set and data set 2 is the child data set. This completes the classification along the data set conversion dimension.

Step S140: if data conversion cannot be performed between any two data sets according to the data conversion policy, generating a heterogeneous data set corresponding to the data set. For example, data set 3 cannot be turned into data set 1 or any other data set by inverse conversion, so data set 3 is a heterogeneous data set.

Step S140 specifically includes:

All data conversion policies are traversed to convert one data set in turn, yielding converted data sets, and each converted data set is compared with the other, unconverted data sets. There are many data conversion policies, such as desensitizing data or translating data. A data conversion policy changes only the form in which data is expressed; it neither deletes nor adds data.

In the invention, each data set is inverse-converted according to each of the data conversion policies and then compared with each of the other data sets in turn. This makes the comparison between data sets more thorough and avoids omissions.

If the converted data set is not identical to any other data set, a heterogeneous data set is derived from it. If, after all data conversion policies have been traversed and applied inversely, the converted data set still differs from every other data set, then it was not obtained directly by converting another data set, and the data set is classified as heterogeneous; data sets 3 and 4 are examples.
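Steps S130 and S140 can be sketched in Python as a nested traversal: every data set is passed through every inverse conversion policy and compared with every other data set. The sketch assumes each policy is supplied as a one-step inverse function and, to obtain a runnable reversible example, models desensitization as hypothetical token replacement backed by a lookup vault (reversible tokenization). It also assumes that a data set is heterogeneous only if it appears in no parent/child pair, so that a parent such as data set 1 is not itself reported as heterogeneous.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
DataSet = List[Record]
InversePolicy = Callable[[DataSet], DataSet]


def classify(data_sets: Dict[str, DataSet],
             inverse_policies: Dict[str, InversePolicy]
             ) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Return (parent, child, policy) pairs and the names of heterogeneous data sets."""
    pairs: List[Tuple[str, str, str]] = []
    for name, data_set in data_sets.items():
        for policy_name, invert in inverse_policies.items():
            try:
                converted = invert(data_set)  # one-step inverse conversion
            except (KeyError, TypeError, ValueError):
                continue  # this policy cannot be applied to this data set
            for other_name, other in data_sets.items():
                if other_name != name and converted == other:
                    pairs.append((other_name, name, policy_name))  # (parent, child, policy)
    # ASSUMPTION: anything not in a parent/child pair is heterogeneous.
    related = {n for parent, child, _ in pairs for n in (parent, child)}
    heterogeneous = [n for n in data_sets if n not in related]
    return pairs, heterogeneous


# Hypothetical reversible desensitization: phone numbers replaced by vault tokens.
vault = {"T1": "13800001234", "T2": "13900005678"}


def detokenize(data_set: DataSet) -> DataSet:
    return [{**r, "phone": vault[r["phone"]]} for r in data_set]


data_sets = {
    "data set 1": [{"town": "Town 1", "phone": "13800001234"},
                   {"town": "Town 2", "phone": "13900005678"}],
    "data set 2": [{"town": "Town 1", "phone": "T1"}, {"town": "Town 2", "phone": "T2"}],
    "data set 3": [{"town": "Town 1", "phone": "13800001234"}],
}
pairs, heterogeneous = classify(data_sets, {"desensitization": detokenize})
print(pairs)          # [('data set 1', 'data set 2', 'desensitization')]
print(heterogeneous)  # ['data set 3']
```

A policy that does not apply to a data set (here, detokenizing raw phone numbers) raises and is simply skipped, which mirrors the text's rule that only a successful one-step inverse conversion establishes a parent/child relationship.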

Step S150: comparing the heterogeneous data set with the parent data set to obtain a difference data set, and, if the difference data set contains no data from the heterogeneous data set, classifying the heterogeneous data set as a second-class child data set. After the heterogeneous data sets are obtained, each is compared with the parent data set to obtain a difference data set, which shows whether the data in the heterogeneous data set is entirely contained in the parent data set. For example, if the parent data set is data set 1 and the heterogeneous data set is data set 3, data set 1 fully contains data set 3; data set 3 is therefore considered to be obtained from data set 1 by a single deletion operation and is classified as a second-class child data set.
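The containment test of step S150 amounts to a set difference: the heterogeneous data set minus the parent data set. In this Python sketch records are compared by their full field contents; the helper names and sample records are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def _record_key(record: Record) -> Tuple:
    """Hashable, field-order-independent key for a record."""
    return tuple(sorted(record.items()))


def difference_data_set(heterogeneous: List[Record], parent: List[Record]) -> List[Record]:
    """Records of the heterogeneous data set that the parent data set does not contain."""
    parent_keys = {_record_key(r) for r in parent}
    return [r for r in heterogeneous if _record_key(r) not in parent_keys]


def is_second_class_child(heterogeneous: List[Record], parent: List[Record]) -> bool:
    """An empty difference means the data set is a pure deletion of the parent."""
    return not difference_data_set(heterogeneous, parent)


data_set_1 = [{"town": "Town 1", "kwh": 210}, {"town": "Town 2", "kwh": 240}]
data_set_3 = [{"town": "Town 1", "kwh": 210}]
print(is_second_class_child(data_set_3, data_set_1))  # True
```

Hashing the parent's records once keeps the comparison linear in the size of both data sets rather than quadratic.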

Step S160: generating and storing the data conversion policy corresponding to the first-class child data set and the comparison policy corresponding to the second-class child data set; generating a first-class child interface and a second-class child interface based on the parent data set together with the data conversion policy and the comparison policy, respectively; generating a parent interface based on the parent data set; and deleting the data in the first-class and second-class child data sets. For example, if a first-class child data set is the desensitized data of its parent data set, its data conversion policy is a desensitization conversion policy, which may be implemented in source code; the middle station stores this source-code desensitization policy. Likewise, if a second-class child data set is part of the data in its parent data set, its policy is a data comparison policy, which may also be implemented in source code and stored by the middle station.

Wherein, step S160 includes:

The storage addresses of the parent data set and the data conversion policy are extracted to obtain a parent address and a conversion address, respectively, and the first-class child interface is generated from the parent address and the conversion address. Data can be stored according to its characteristics: the first-class child interface of the first-class child data set can consist of the address of the corresponding parent data set and the address of the data conversion policy. When the interface is called, the parent data set is converted online into the first-class child data set through the conversion policy, so the first-class child data set itself need not be stored, which reduces the total volume of stored data. The parent interface can be the address of the corresponding parent data set.

Extracting the storage addresses of the parent data set and of the comparison policy to obtain a parent address and a comparison address respectively, and generating the sub-class-two interface from the parent address and the comparison address. The sub-class-two interface of the sub-class-two data set consists of the address of the corresponding parent data set and the address of the comparison policy: when it is called, the sub-class-two data set is derived on line from the parent data set through the comparison policy, so the sub-class-two data set itself need not be stored and the total amount of stored data is reduced.
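The address-based interfaces described for step S160 can be sketched as follows. This is a minimal illustration under stated assumptions: the middle station's storage is modelled by two hypothetical in-memory dictionaries keyed by "address", policies are plain Python callables, and the class names, addresses and field names are all illustrative rather than taken from the method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]

# Hypothetical in-memory stores standing in for the middle station's storage;
# the keys play the role of storage addresses.
DATA_STORE: Dict[str, List[Record]] = {}
POLICY_STORE: Dict[str, Callable[[List[Record]], List[Record]]] = {}


@dataclass(frozen=True)
class ParentInterface:
    parent_address: str

    def fetch(self) -> List[Record]:
        return list(DATA_STORE[self.parent_address])


@dataclass(frozen=True)
class SubClassInterface:
    """Sub-class-one (conversion) or sub-class-two (comparison) interface.

    Only two addresses are kept; the child data set is rebuilt from the parent
    on every call instead of being stored.
    """
    parent_address: str
    policy_address: str  # conversion address or comparison address

    def fetch(self) -> List[Record]:
        parent_rows = DATA_STORE[self.parent_address]
        policy = POLICY_STORE[self.policy_address]
        return policy(parent_rows)


# Example registration (all addresses and field names are illustrative).
DATA_STORE["parent/electricity"] = [
    {"phone": "13577689980", "town": "Han Village Town", "kwh": 120},
    {"phone": "13900001111", "town": "Other Town", "kwh": 80},
]
# Conversion policy: desensitize phone numbers (keep the first 6 digits).
POLICY_STORE["policy/desensitize"] = lambda rows: [
    {**r, "phone": r["phone"][:6] + "*****"} for r in rows
]
# Comparison policy: keep only the records of one town.
POLICY_STORE["policy/han-village-only"] = lambda rows: [
    r for r in rows if r["town"] == "Han Village Town"
]

sub_class_one = SubClassInterface("parent/electricity", "policy/desensitize")
sub_class_two = SubClassInterface("parent/electricity", "policy/han-village-only")
```

The design choice mirrors the text: storing two addresses instead of the child rows trades some computation at query time for less stored data.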

Step S170: acquiring a sub-class-one authority range, a sub-class-two authority range and a parent authority range for the sub-class-one data set, the sub-class-two data set and the parent data set respectively; setting the sub-class-one authority range on the sub-class-one interface and the sub-class-two authority range on the sub-class-two interface; and setting the parent authority range on the parent interface, the sub-class-one interface and the sub-class-two interface.

The sub-class-one data set can be regarded as data set 2, and the sub-class-one authority range as the role corresponding to data set 2, for example a district-level statistics department. When the middle station determines that a data access request comes from the role of the district-level statistics department, it calls the corresponding sub-class-one interface. The sub-class-two data set can be regarded as data set 3, and the sub-class-two authority range as the role corresponding to data set 3, for example a town power supply station. When the middle station determines that a data access request comes from the role of the town power supply station, it calls the corresponding sub-class-two interface.
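The authority assignment of step S170 and the role-based routing described above can be sketched as a lookup table. This is a minimal sketch, assuming authority ranges are sets of role names; the role name `municipal_data_admin` for the parent range and the function names are hypothetical.

```python
def build_authority_table(parent_range, sub_one_range, sub_two_range):
    """Attach authority ranges (sets of role names) to interfaces, as in step S170."""
    parent_range = set(parent_range)
    return {
        # The parent authority range is set on all three interfaces.
        "parent_interface": parent_range,
        "sub_class_one_interface": set(sub_one_range) | parent_range,
        "sub_class_two_interface": set(sub_two_range) | parent_range,
    }


def interfaces_for(role, table):
    """Return the interfaces the middle station may call for this role."""
    return sorted(name for name, roles in table.items() if role in roles)


table = build_authority_table(
    parent_range={"municipal_data_admin"},  # hypothetical role name
    sub_one_range={"district_statistics_department"},
    sub_two_range={"town_power_supply_station"},
)
print(interfaces_for("town_power_supply_station", table))
# ['sub_class_two_interface']
```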

In the technical solution provided by the present invention, preferably, step S110 specifically includes:

Extracting sensitive data from the data source according to preset sensitivity conditions, and determining the number and position of masks according to the data quantity value of the sensitive data. The preset sensitivity conditions include a plurality of preset text formats; the present invention takes mobile phone numbers and addresses as examples. Suppose a user in an electricity consumption data set has registered an electricity account with a mobile phone number, and consumption is recorded under that account. The data then contain the user's mobile phone number and electricity consumption address, for example the mobile phone number 13577689980 and the address No. 88, Zone 1, Han Village, Han Village Town, Hanshan District, Beijing. Both the mobile phone number and the address are sensitive data and must not be disclosed arbitrarily. The preset text format of a mobile phone number is 1XXXXXXXXXX, i.e. the format is triggered whenever 11 consecutive digits beginning with 1 occur. The preset text format of an address may be "XX City XX District XX Town XX Street/Village": when a city, a district, a town and a street or village appear in sequence, the text is considered to match the address format.
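The two preset text formats can be approximated with regular expressions. This is a rough sketch, not the method's actual matcher: it assumes addresses are written with the Chinese administrative markers 市 (city), 区 (district), 镇 (town) and 街/村 (street/village), and the pattern names and `find_sensitive` are hypothetical.

```python
import re

# Mobile phone format 1XXXXXXXXXX: 11 consecutive digits starting with 1,
# not embedded in a longer run of digits.
PHONE_PATTERN = re.compile(r"(?<!\d)1\d{10}(?!\d)")

# Address format "XX City XX District XX Town XX Street/Village" written with
# the markers 市 (city), 区 (district), 镇 (town), 街/村 (street/village);
# anything after the street/village marker (zone, house number) is included.
ADDRESS_PATTERN = re.compile(r"[^\s市]+市[^\s区]+区[^\s镇]+镇[^\s街村]+[街村]\S*")


def find_sensitive(text):
    """Return (kind, value) pairs for every phone number and address found."""
    hits = [("phone", m.group()) for m in PHONE_PATTERN.finditer(text)]
    hits += [("address", m.group()) for m in ADDRESS_PATTERN.finditer(text)]
    return hits


print(find_sensitive("13577689980 甲市乙区丙镇丁村1区88号"))
```

Because the address pattern has no left boundary, text glued directly in front of the city name would be swallowed into the match; a production matcher would need a dictionary of place names or a tokenizer.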

Masking the sensitive data in the data source based on the mask number and mask position to obtain a corresponding first mask data set, and generating a sub-class-one data set from the first mask data set. Once the mask number and mask position are obtained, the sensitive data are masked accordingly: for the mobile phone number 13577689980 with a mask number of 5 and a rear mask position, the masked result is 135776*****. When data sets are generated from the data source, each data set is processed according to its own requirements; for example, data set 2 requires desensitized data, so the data of the data source are desensitized to obtain the data in data set 2.
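Masking by count and position, as in the 13577689980 example, can be sketched as a single string operation. The function name `mask`, the "*" mask symbol and the rule for centring a middle mask are assumptions for illustration.

```python
def mask(text, count, position, symbol="*"):
    """Replace `count` characters of `text` at the front, middle or rear position."""
    count = max(0, min(count, len(text)))  # never mask more characters than exist
    if position == "front":
        start = 0
    elif position == "middle":
        start = (len(text) - count) // 2  # centre the masked run
    elif position == "rear":
        start = len(text) - count
    else:
        raise ValueError(f"unknown mask position: {position!r}")
    return text[:start] + symbol * count + text[start + count:]


print(mask("13577689980", 5, "rear"))    # 135776*****
print(mask("13577689980", 4, "middle"))  # 135****9980
```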

Extracting part of the data in the data source according to preset partial-extraction conditions, and obtaining a corresponding primary partial data set from the extracted data. The primary partial data set is intermediate data. For example, a town statistics station in Han Village Town needs all desensitized electricity consumption data of the town, while the data source contains all non-desensitized data of the whole region. The town statistics station corresponds to a role authority, which corresponds to data set 5, and the data in data set 5 are the desensitized data of that town. The electricity consumption of all households in Han Village Town is extracted according to the partial-extraction conditions; many extraction methods exist in the prior art and are not described here. The primary partial data set then contains the non-desensitized electricity consumption data of all households in the town.

Extracting sensitive data from the primary partial data set according to the preset sensitivity conditions, and determining the number and position of masks according to the data quantity value of the sensitive data. Sensitive data may be extracted from the primary partial data set in the same way as from the data source, so the details are not repeated here.

Masking the sensitive data in the primary partial data set based on the mask number and mask position to obtain a corresponding second mask data set. The second mask data set may be obtained in the same way as the first mask data set.

Generating a sub-class-three data set from the second mask data set. A sub-class-three data set is obtained by processing the data source in two or more steps, and it can be stored directly once obtained. A sub-class-three interface is then generated from the storage address of the sub-class-three data set, so when a role corresponding to the sub-class-three interface accesses data set 5, the interface calls the stored sub-class-three data set directly.
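The two-step construction of a sub-class-three data set (partial extraction, then masking, then direct storage) can be sketched as below. Assumptions: records are dicts with `phone` and `town` fields, the storage is a hypothetical dictionary `STORAGE`, and the mask count is fixed at 5 rear characters for illustration instead of being computed by the mask-number formula.

```python
def mask_rear(text, count):
    """Hide the last `count` characters (rear mask position)."""
    count = min(count, len(text))
    return text[: len(text) - count] + "*" * count


STORAGE = {}  # hypothetical store: storage address -> stored data set


def build_sub_class_three(source, town, address, mask_count=5):
    # Comparison step: extract the primary partial data set for one town.
    partial = [row for row in source if row["town"] == town]
    # Conversion step: mask phone numbers to obtain the second mask data set.
    masked = [{**row, "phone": mask_rear(row["phone"], mask_count)} for row in partial]
    # Unlike sub-class-one/two data, the result is stored directly...
    STORAGE[address] = masked
    # ...and the sub-class-three interface is simply its storage address.
    return address


source = [
    {"phone": "13577689980", "town": "Han Village Town", "kwh": 120},
    {"phone": "13900001111", "town": "Other Town", "kwh": 80},
]
interface = build_sub_class_three(source, "Han Village Town", "store/data-set-5")
print(STORAGE[interface])
```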

The method does not store the sub-class-one and sub-class-two data sets, but it does store the sub-class-three data set, because the sub-class-three data set is obtained from the parent data through two processing operations, namely a comparison and a conversion. The method treats the parent data as fixed, i.e. as the data source: every generated data set is compared with the data source and classified as a sub-class-one, sub-class-two or sub-class-three data set.
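The classification rule above can be sketched as a function of the processing steps that produced a data set. This assumes each generated data set carries an ordered list of the operations applied to the data source, each named "conversion" or "comparison"; the function name and return values are hypothetical.

```python
KNOWN_STEPS = {"conversion": "sub-class-one", "comparison": "sub-class-two"}


def classify_derived(steps):
    """Classify a generated data set by the processing applied to the data source.

    Returns (class name, whether the data set is stored).
    """
    unknown = [s for s in steps if s not in KNOWN_STEPS]
    if unknown:
        raise ValueError(f"unknown processing steps: {unknown}")
    if not steps:
        return "parent", True  # the data source itself
    if len(steps) == 1:
        # One conversion or one comparison: rebuilt on line, not stored.
        return KNOWN_STEPS[steps[0]], False
    # Two or more steps (e.g. comparison then conversion): stored directly.
    return "sub-class-three", True
```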

Extracting the sensitive data in the primary partial data set according to the preset sensitivity condition, and determining the number and positions of masks according to the data quantity value of the sensitive data, specifically includes:

extracting text information matching a preset text format from the primary partial data set, and determining the number of texts (characters) in the text information. For example, if the preset text format is a mobile phone number and the text information is 13577689980, the number of texts is 11.

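The number of masks is then calculated by the mask-number formula, wherein the quantities involved are: the calculated number of masks; the number of historical masking records used for comparison; the number of texts of the sensitive data in the current primary partial data set; the number of texts of the i-th text information in previously stored primary partial data sets; the preset weight value of the h-th character form; and the preset proportion value of the h-th character form.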

The historical term of this formula gives the proportion of masks applied in previous masking operations; each masking operation records the number of masks applied to each type of text information. Because the mask number is calculated from historical behaviour when sensitive data are processed, the masking can hide the sensitive content of a sensitive data set while exposing the part that has statistical value. The mask position is determined by the type of character form, so the sensitive content is hidden in a targeted, strategic way and staff in the corresponding roles can still carry out data statistics conveniently.

For example, after masking, the address No. 88, Zone 1, Han Village, Han Village Town, Hanshan District, Beijing may retain only Han Village, Han Village Town, Hanshan District, Beijing, with the zone and house number hidden. Although the masked address does not identify the household, it shows which village the user belongs to, so village-level electricity statistics remain possible. In this way the method hides the sensitive content of the sensitive data set while exposing content of statistical value.

Determining the mask position as the front, middle or rear position based on the type of character form. The position for each type may be preset, for example the middle position for mobile phone numbers and the rear position for addresses, and can be set according to different requirements.

In one possible implementation, a mask number correction value input by a user for text information of the h-th character form is received, and the preset weight value of the h-th character form is corrected based on the mask number correction value by a correction formula, wherein the quantities involved are: the mask number correction value; the corrected preset weight value; the weight applied when the correction increases the preset weight value; and the weight applied when the correction decreases it. The formula also defines limiting conditions so that overfitting is avoided.

The number of masks can thus be determined by active learning. When a user considers that the output mask number does not suit the current scenario, the user can input a mask number correction value that suits it better. The correction value is compared with the calculated mask number, and the comparison determines the direction in which the preset weight value changes. Different weights and change amplitudes are used depending on whether the weight is increased or decreased, so that the corrected preset weight value is more accurate and better fits the usage scenario.
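The correction formula itself is not reproduced in this text, so the following is only a hypothetical sketch of the behaviour described: the sign of the gap between the user's correction value and the calculated mask number sets the direction of change, separate rates are used for increases and decreases, and a clamp stands in for the limiting condition against overfitting. The parameters `up_rate`, `down_rate` and `max_step` are assumptions, not values from the method.

```python
def correct_weight(weight, calculated, corrected,
                   up_rate=0.1, down_rate=0.05, max_step=0.2):
    """Hypothetical asymmetric correction of a preset weight value.

    up_rate / down_rate: assumed weights for increasing / decreasing corrections.
    max_step: assumed bound on the relative change, guarding against overfitting.
    """
    if calculated <= 0:
        raise ValueError("calculated mask number must be positive")
    relative_gap = (corrected - calculated) / calculated  # sign gives the trend
    rate = up_rate if relative_gap > 0 else down_rate
    step = max(-max_step, min(max_step, rate * relative_gap))
    return weight * (1 + step)
```

With these assumed rates, a user who doubles the calculated mask number raises the weight by 10%, while halving it lowers the weight by only 2.5%; no single correction moves the weight by more than 20%.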

In one possible embodiment, the invention further comprises calculating the processing load of the sub-class-one or sub-class-two data set, wherein the quantities involved are: the amount of data in the sub-class-one or sub-class-two data set; the processing efficiency of the processing unit; the number of days T; the number of queries on day x; the number of data queries Z; and the duration of the o-th data query.

The average number of data queries per day made to the middle station in actual use is obtained from the daily query counts, and the average time the middle station spends on each query is obtained from the query durations. Since different companies run middle stations with different configurations, the invention acquires the processing efficiency and cache space of the middle station's processing unit, where the processing efficiency may be the CPU frequency and the cache space may be the CPU cache. When calculating the processing load of the sub-class-one or sub-class-two data set, both the data amount of that data set and the performance of the middle station are taken into account. This prevents the middle station from crashing under excessive load when the sub-class-one or sub-class-two data set is generated, and keeps the method stable under a certain level of concurrency.

If the processing load is greater than a preset value, the sub-class-one or sub-class-two data set is not deleted but is classified as a flat data set and stored. A load greater than the preset value indicates that generating the sub-class-one or sub-class-two data set is too heavy relative to the configuration of the middle station, so the data set is stored as a flat data set. The interface of the flat data set is associated with the corresponding role, so that when the role accesses data it reads the flat data set directly through that interface. The corresponding roles may be, for example, the district statistics department and the town power supply station.
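The storage decision can be sketched as follows. The averages over T days and over Z queries follow the description, but the way they are combined with the data amount and processing efficiency is a hypothetical assumption, since the load formula is not reproduced in this text; the function names and the returned labels "flat" and "virtual" are also illustrative.

```python
def average_daily_queries(daily_query_counts):
    """Mean number of queries per day over T days."""
    return sum(daily_query_counts) / len(daily_query_counts)


def average_query_duration(query_durations):
    """Mean duration of the Z recorded data queries."""
    return sum(query_durations) / len(query_durations)


def processing_load(data_amount, processing_efficiency, daily_query_counts, query_durations):
    # HYPOTHETICAL combination: the inputs come from the text, the formula is assumed.
    return (data_amount / processing_efficiency
            * average_daily_queries(daily_query_counts)
            * average_query_duration(query_durations))


def storage_mode(load, preset_value):
    """'flat': keep and store the data set; 'virtual': delete it, serve it on line."""
    return "flat" if load > preset_value else "virtual"


load = processing_load(1000, 2.0, [10, 20], [0.5, 1.5])
print(load, storage_mode(load, 5000))  # 7500.0 flat
```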

According to the technical scheme provided by the invention, the data which can be accessed by different roles can be stored in different modes according to the data volume and the configuration of the middle station.

As shown in fig. 2, the middle station of the present invention may further include the following two main components:

the authority configuration center component, an online authority configuration service through which all authority-related configuration and management operations can be performed; and

the authority access component, which consists of a plurality of processing units corresponding to different authority access methods.

The authority configuration flow, shown in fig. 3, includes the following steps:

Configuring data sources: configuring the data sources that require authority management.

Configuring data permissions: configuring the data permissions held by different subjects according to business requirements. Subjects are divided into five types, namely tenants, applications, groups, roles and users, in order of priority from low to high; when permissions of the same type conflict, one configuration is selected according to this priority. Permission types are divided into table level, row level and column level. The table level is the coarsest granularity, and row-level and column-level permissions take effect only when table-level permission is held. Row-level permission returns the matching rows according to conditions on the current user, on the current group and its parent group, or on the current group and its subgroups. Column-level permission supports a variety of desensitization rules, for example the default rules "156****2345" and "ge 11". Configuration supports a visual mode and a developer mode: the visual mode handles simple configurations, and the developer mode supports complex configurations such as multi-table association.

Testing and release: after configuration is complete, the policy can be tested online, and once it passes the test it can be released and take effect.
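The priority rule for conflicting permissions and the table-level prerequisite can be sketched as below. Assumptions: each configuration entry is a `(subject_type, permission_type, rule)` tuple, rule values such as "allow" are hypothetical strings, ties between subjects of equal priority keep the first entry, and a missing or non-"allow" table-level rule yields no access.

```python
# Subject types from lowest to highest priority, as listed in the configuration step.
PRIORITY = {"tenant": 1, "application": 2, "group": 3, "role": 4, "user": 5}


def resolve(configs):
    """Pick one rule per permission type from (subject_type, permission_type, rule) entries.

    When several subjects configure the same permission type, the rule from the
    higher-priority subject wins; on a tie the first entry is kept.
    """
    chosen = {}
    for subject, permission_type, rule in configs:
        current = chosen.get(permission_type)
        if current is None or PRIORITY[subject] > PRIORITY[current[0]]:
            chosen[permission_type] = (subject, rule)
    resolved = {ptype: rule for ptype, (_, rule) in chosen.items()}
    # Row- and column-level rules only apply on top of table-level access.
    if resolved.get("table") != "allow":
        return {}
    return resolved
```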

The authority access flow, shown in fig. 4, includes the following steps:

Selecting an access mode: each business system selects the most suitable access mode according to its actual requirements. The available modes are as follows.

SDK call: in this mode the business system uses the auth-SDK provided by the authority configuration center to perform the required authority processing.

Connection pool replacement: in this mode the business system modifies its dependencies and replaces its database connection pool with the auth-pool provided by the authority configuration center.

Connection address replacement: in this mode the business system modifies its configuration and replaces its database connection address with the auth-proxy address provided by the authority configuration center.

Runtime injection: in this mode the business system modifies its startup script so that the auth-agent provided by the authority configuration center is injected dynamically as a Java agent.

The authority interception flow, shown in fig. 5, includes the following steps:

Starting the service: the corresponding service is started.

Pulling the authority configuration: after the service starts, it periodically pulls the authority data configured in the authority configuration center (or receives it by push) and caches it locally.

Receiving a request: the service starts the subsequent operations after receiving a data processing request.

Global SQL interception: implemented by the authority access component.

SQL AST parsing: the SQL is parsed into an abstract syntax tree.

Checking whether the interception configuration is hit: the abstract syntax tree produced by parsing is compared with the authority configuration pulled from the authority configuration center to determine whether the request requires authority processing.

Rewriting the SQL: if authority processing is required, SQL fragments that filter according to the authority are added to the requested SQL according to the configured logic.

Executing the SQL: the rewritten SQL is executed and the result is returned.
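The hit check and SQL rewriting steps of the interception flow can be sketched on a toy syntax tree. This is an illustration under stated assumptions: a frozen dataclass `Select` stands in for the parsed abstract syntax tree of a single-table SELECT (no real SQL parser is used), the cached configuration layout is hypothetical, and `mask_middle` is a hypothetical database function for column desensitization.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Select:
    """Toy stand-in for the parsed AST of a single-table SELECT."""
    table: str
    columns: Tuple[str, ...]
    where: Tuple[str, ...] = ()  # predicates joined with AND


def render(query):
    sql = f"SELECT {', '.join(query.columns)} FROM {query.table}"
    if query.where:
        sql += " WHERE " + " AND ".join(f"({p})" for p in query.where)
    return sql


def apply_permissions(query, config, attrs):
    """Add row filters and column masks from the cached authority configuration."""
    rule = config.get(query.table)
    if rule is None:
        return query  # interception configuration not hit: run the SQL unchanged
    where = query.where
    if "row_filter" in rule:
        # Illustration only: real code should bind parameters instead of formatting text.
        where = where + (rule["row_filter"].format(**attrs),)
    masks = rule.get("column_masks", {})
    columns = tuple(masks[c].format(col=c) if c in masks else c for c in query.columns)
    return replace(query, columns=columns, where=where)


config = {  # hypothetical cached configuration
    "power_usage": {
        "row_filter": "town = '{town}'",
        "column_masks": {"phone": "mask_middle({col}, 4) AS {col}"},  # mask_middle is hypothetical
    }
}
query = Select("power_usage", ("account", "phone", "kwh"), ("kwh > 100",))
print(render(apply_permissions(query, config, {"town": "Han Village Town"})))
# SELECT account, mask_middle(phone, 4) AS phone, kwh FROM power_usage WHERE (kwh > 100) AND (town = 'Han Village Town')
```

Working on a tree rather than on the raw SQL string is what lets the filter be appended safely to an existing WHERE clause; a real implementation would use a full SQL parser for this step.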

An embodiment of the present invention further provides a data set authority management and control device based on a middle station, as shown in fig. 6, including:

the judgment and determination module, configured to determine, if data conversion can be performed between any two data sets according to a data conversion strategy, that the two data sets are respectively a parent data set and a child data set according to the data conversion direction;

the judgment and generation module, configured to generate a heterogeneous data set corresponding to any two data sets if data conversion cannot be performed between them according to the data conversion strategy;

the interface generation module, configured to generate and store the data conversion strategy and the comparison strategy corresponding to the sub-class-one data set and the sub-class-two data set, generate a sub-class-one interface and a sub-class-two interface respectively based on the parent data set, the data conversion strategy and the comparison strategy, and delete the data in the sub-class-one data set and the sub-class-two data set; and

the correspondence module, configured to acquire a sub-class-one authority range, a sub-class-two authority range and a parent authority range for the sub-class-one data set, the sub-class-two data set and the parent data set respectively, set the sub-class-one and sub-class-two authority ranges on the sub-class-one and sub-class-two interfaces respectively, generate a parent interface based on the parent data set, and set the parent authority range on the parent interface, the sub-class-one interface and the sub-class-two interface.

The readable storage medium may be a computer storage medium or a communication medium. Communication media includes any medium that facilitates transfer of a computer program from one place to another. Computer storage media may be any available media that can be accessed by a general purpose or special purpose computer. For example, a readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuits (ASIC). Additionally, the ASIC may reside in user equipment. Of course, the processor and the readable storage medium may also reside as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

The present invention also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the device may read the execution instructions from the readable storage medium, and the execution of the execution instructions by the at least one processor causes the device to implement the methods provided by the various embodiments described above.

In the above embodiments of the terminal or the server, it should be understood that the Processor may be a Central Processing Unit (CPU), other general-purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A data set authority management and control method based on a middle station, characterized by comprising the following steps:

2. The middle-station-based data set authority management and control method according to claim 1, wherein

generating the sub-class-one interface and the sub-class-two interface respectively based on the parent data set, the data conversion strategy and the comparison strategy comprises the following steps:

3. The middle-station-based data set authority management and control method according to claim 1, wherein

the middle station performing mirror image processing on the received data source to obtain a plurality of data sources, and processing the plurality of data sources respectively to obtain a plurality of different data sets, each data set having corresponding specific data, comprises:

extracting sensitive data from the data source according to a preset sensitivity condition, and determining the number and position of masks according to the data quantity value of the sensitive data in the data source;

extracting sensitive data from the primary partial data set according to a preset sensitivity condition, and determining the number and position of masks according to the data quantity value of the sensitive data in the primary partial data set;

generating a sub-class-three data set based on the second mask data set.

4. The middle-station-based data set authority management and control method according to claim 3, wherein

a sub-class-three interface is generated according to the storage address of the sub-class-three data set.

5. The middle-station-based data set authority management and control method according to claim 3, wherein

extracting the sensitive data in the primary partial data set according to the preset sensitivity condition, and determining the number and positions of masks according to the data quantity value of the sensitive data, comprises the following steps:

wherein the quantities involved are: the calculated number of masks; the number of historical masking records used for comparison; the number of texts of the sensitive data in the current primary partial data set; the number of texts of the i-th text information in previously stored primary partial data sets; the preset weight value of the h-th character form; and the preset proportion value of the h-th character form;

6. The middle-station-based data set authority management and control method according to claim 5, wherein

a mask number correction value input by a user for text information of the h-th character form is received, and the preset weight value of the h-th character form is corrected based on the mask number correction value by a correction formula,

wherein the quantities involved are: the mask number correction value; the corrected preset weight value; the weight applied when the correction increases the preset weight value; and the weight applied when the correction decreases the preset weight value.

7. The middle-station-based data set authority management and control method according to claim 1, wherein

determining, if data conversion can be performed between any two data sets according to a data conversion strategy, that the two data sets are respectively a parent data set and a child data set according to the data conversion direction comprises:

8. The middle-station-based data set authority management and control method according to claim 7, further comprising:

wherein the quantities involved are: the amount of data in the sub-class-one or sub-class-two data set; the processing efficiency of the processing unit; the number of days T; the number of queries on day x; the number of data queries Z; and the duration of the o-th data query;

if the processing load is greater than a preset value,

9. A data set authority management and control device based on a middle station, comprising:

10. A storage medium, characterized in that a computer program is stored in the storage medium, and the computer program, when executed by a processor, implements the method of any one of claims 1 to 8.