CN116595567A - Dynamic data isolation method and system based on multiple data sources - Google Patents

Dynamic data isolation method and system based on multiple data sources

Info

Publication number
CN116595567A
CN116595567A (application CN202310868566.5A)
Authority
CN
China
Prior art keywords
target
data
feature
text
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310868566.5A
Other languages
Chinese (zh)
Inventor
张福军
方晓明
王驰旭
何伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Green City Technology Industry Service Group Co ltd
Original Assignee
Green City Technology Industry Service Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Green City Technology Industry Service Group Co ltd filed Critical Green City Technology Industry Service Group Co ltd
Priority to CN202310868566.5A priority Critical patent/CN116595567A/en
Publication of CN116595567A publication Critical patent/CN116595567A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data isolation, and in particular to a dynamic data isolation method based on multiple data sources. The method comprises the following steps: collecting the dynamic data of all data sources into target dynamic data, and performing data state labeling and data splitting on the target dynamic data to obtain a target text data set, a target picture data set and a target audio data set; encoding the target text data set, the target picture data set and the target audio data set into a standard modal feature set with a cross-modal encoder; performing feature clustering on the standard modal feature set to obtain a set of modal feature classes, extracting the key features of the isolation key of each data isolation area, and encrypting and isolating the corresponding data in the target dynamic data according to the key features and the modal feature classes. The invention also provides a dynamic data isolation system based on multiple data sources. The invention can improve the security of dynamic data isolation.

Description

Dynamic data isolation method and system based on multiple data sources
Technical Field
The invention relates to the technical field of data isolation, in particular to a dynamic data isolation method and system based on multiple data sources.
Background
Dynamic data from multiple data sources refers to data that changes and is updated over time and originates from several different data sources. These sources may include different systems, applications, sensors or devices whose data changes dynamically in time; during data processing and sharing, the dynamic data of the different data sources therefore needs to be isolated in order to keep the data generated in real time secure.
In practical applications, data isolation methods that work per data source cannot discover the internal associations between different data sources. Isolating the dynamic data of a single data source in a single data isolation area may therefore cause data privacy leakage, so the security achieved during dynamic data isolation is relatively low.
Disclosure of Invention
The invention provides a dynamic data isolation method and system based on multiple data sources, whose main aim is to solve the problem of low security during dynamic data isolation.
In order to achieve the above object, the present invention provides a method for dynamic data isolation based on multiple data sources, comprising:
Collecting dynamic data of all data sources into target dynamic data, carrying out data state labeling on the target dynamic data to obtain target labeling data, and splitting the target labeling data into a target text data set, a target picture data set and a target audio data set according to data types;
performing text vectorization operation on the target text data set to obtain a target text feature set, sequentially performing picture denoising and picture feature extraction operation on the target picture data set to obtain a target picture feature set, and sequentially performing audio standardization and frequency domain feature extraction operation on the target audio data set to obtain a target audio feature set;
mapping the target text feature set into a standard text feature set, the target picture feature set into a standard picture feature set and the target audio feature set into a standard audio feature set respectively by using a pre-trained cross-modal encoder, and aggregating the standard text feature set, the standard picture feature set and the standard audio feature set into a standard modal feature set;
acquiring the total number of all the data isolation areas, carrying out feature clustering on the standard modal feature set according to the total number of the areas and a preset modal reconstruction distance algorithm to obtain a modal feature class set, collecting isolation keys of all the data isolation areas into an isolation key set, and extracting a key feature set corresponding to the isolation key set;
Selecting the mode feature classes in the mode feature classes one by one as target mode feature classes, screening the isolation key corresponding to the target mode feature class from the isolation key set according to the key feature set to serve as a target isolation key, screening the data corresponding to the target mode feature class from the target labeling data to serve as target data to be isolated, encrypting and isolating the target data to be isolated by utilizing the target isolation key, and ending data isolation when the target mode feature class is the last mode feature class in the mode feature classes.
Optionally, the performing data state labeling on the target dynamic data to obtain target labeling data includes:
selecting data in the target dynamic data one by one as target data, and sequentially acquiring a data time stamp, a data position and a data type of the target data;
performing time marking on the target data according to the data time stamp to obtain target time data;
performing position marking on the target time data according to the data position to obtain target position data;
and performing type labeling on the target position data according to the data type to obtain target type data, and collecting all the target type data into target labeling data.
Optionally, the performing text vectorization operation on the target text data set to obtain a target text feature set includes:
selecting text data in the target text data set one by one as target text data, and performing text word segmentation on the target text data to obtain a target text word sequence;
screening stop words and garbled words out of the target text word sequence in sequence to obtain a target filtered word sequence;
vectorizing each word in the target filtered word sequence to obtain a target word vector sequence;
and splicing the word vectors in the target word vector sequence according to the sequence number order to obtain target text features, and collecting all the target text features into a target text feature set.
Optionally, sequentially performing the picture denoising and picture feature extraction operations on the target picture data set to obtain a target picture feature set includes:
selecting the picture data in the target picture data set one by one as target picture data, and carrying out picture graying operation on the target picture data to obtain a target gray picture;
performing contrast enhancement and median filtering operation on the target gray level picture in sequence to obtain a target denoising picture;
And sequentially carrying out picture stretching and texture feature extraction operations on the target denoising picture to obtain target picture features, and collecting all the target picture features into a target picture feature set.
Optionally, the sequentially performing audio normalization and frequency domain feature extraction on the target audio data set to obtain a target audio feature set, including:
selecting the audio data in the target audio data set one by one as target audio data, and sequentially resampling and effectively extracting the target audio data to obtain target extracted audio;
performing filtering and denoising operations on the target extracted audio to obtain target denoised audio;
sequentially performing audio windowing and audio framing operations on the target denoised audio to obtain target framing audio;
sequentially performing frequency domain conversion and Mel filtering operation on the target framing audio to obtain target frequency spectrum audio;
and sequentially carrying out logarithmic compression and cepstrum coefficient calculation on the target frequency spectrum audio to obtain target audio characteristics, and collecting all the target audio characteristics into a target audio characteristic set.
Optionally, the mapping the target text feature set to a standard text feature set, the target picture feature set to a standard picture feature set, and the target audio feature set to a standard audio feature set with a pre-trained cross-modality encoder respectively includes:
Mapping the target text feature set into a dimension-reducing text feature set, mapping the target picture feature set into a dimension-reducing picture feature set and mapping the target audio feature set into a dimension-reducing audio feature set by utilizing a full connection layer of a pre-trained cross-mode encoder;
mapping the dimension-reduced text feature set into an attention text feature set, the dimension-reduced picture feature set into an attention picture feature set, and the dimension-reduced audio feature set into an attention audio feature set by using a self-attention layer of the cross-modal encoder;
weighting operation is carried out on each dimension-reduced text feature in the dimension-reduced text feature set according to the attention text feature set, so that a standard text feature set is obtained;
weighting each dimension-reduced picture feature in the dimension-reduced picture feature set according to the attention picture feature set to obtain a standard picture feature set;
and carrying out weighted operation on each dimension-reduction audio feature in the dimension-reduction audio feature set according to the attention audio feature set to obtain a standard audio feature set.
Optionally, the weighting operation is performed on each dimension-reduced text feature in the dimension-reduced text feature set according to the attention text feature set to obtain a standard text feature set, which includes:
Selecting the attention text features in the attention text feature set one by one as target attention text features, and taking the query vector in the target attention text feature as a text query vector, the key vector in the target attention text feature as a text key vector and the value vector in the target attention text feature as a text value vector;
querying the attention picture feature corresponding to the target attention text feature from the attention picture feature set according to the text key vector and the text value vector as a related attention picture feature, taking the key vector of the related attention picture feature as a picture key vector, and screening the dimension-reduced picture feature corresponding to the related attention picture feature out of the dimension-reduced picture feature set as a related dimension-reduced picture feature;
inquiring attention audio features corresponding to the target attention text features from the attention audio feature set according to the text key vectors and the text value vectors to serve as relevant attention audio features, taking key vectors of the relevant attention audio features as audio key vectors, and screening out dimension-reducing audio features corresponding to the relevant attention audio features from the dimension-reducing audio feature set to serve as relevant dimension-reducing audio features;
Taking the inner product of the text query vector and the picture key vector as a picture attention weight, and taking the product of the picture attention weight and the related dimension-reduction picture feature as a weighted dimension-reduction picture feature;
taking the inner product of the text query vector and the audio key vector as an audio attention weight, and taking the product of the audio attention weight and the related dimension-reduction audio feature as a weighted dimension-reduction audio feature;
and taking the average vector of the target attention text feature, the weighted dimension-reduced picture feature and the weighted dimension-reduced audio feature as a standard text feature, and collecting all the standard text features into a standard text feature set.
Optionally, the extracting the key feature set corresponding to the isolated key set includes:
performing global pooling operation on each isolation key in the isolation key set to obtain a primary key feature set;
establishing a primary key feature matrix from the primary key feature set, where the primary key feature matrix has the form:

$X = \left[x_{ij}\right]_{n \times m}$

wherein $X$ is the primary key feature matrix, $x_{ij}$ is the $j$-th dimension feature of the $i$-th primary key feature, $n$ is the total number of features in the primary key feature set, and $m$ is the feature dimension of each primary key feature in the primary key feature set;
calculating the spatial variance of the primary key feature matrix by using the following maximum variance algorithm:

$\sigma^{2} = \max_{W} \frac{1}{d} \sum_{i=1}^{n} \left\| W^{T}\left(x_{i} - \mu\right) \right\|^{2}$

wherein $\sigma^{2}$ is the spatial variance, $\max$ is the maximum function, $W$ is the projection matrix of the primary key feature matrix, $d$ is the feature dimension of the projection matrix, $\mu$ is the mean vector of the primary key feature matrix $X$, $x_{i}$ is the $i$-th primary key feature (the $i$-th row of $X$), and $T$ denotes transposition;
calculating a standard projection matrix of the primary key feature matrix according to the spatial variance;
performing feature projection on the primary key feature matrix by using the standard projection matrix to obtain a dimension-reduction key feature matrix;
and extracting a key feature set from the dimension-reduction key feature matrix.
Optionally, the encrypting and isolating the target data to be isolated by using the target isolation key includes:
selecting data in the target data to be isolated one by one as target encrypted data, and splitting the target encrypted data to obtain a plurality of target encrypted data blocks;
performing byte substitution, row shifting, column mixing and round key addition on each target encrypted data block in sequence by using the target isolation key to obtain ciphertext data;
And storing all ciphertext data into a data isolation area corresponding to the target isolation key, and ending encryption isolation.
In order to solve the above problems, the present invention further provides a dynamic data isolation system based on multiple data sources, the system comprising:
the data labeling module is used for collecting the dynamic data of all the data sources into target dynamic data, carrying out data state labeling on the target dynamic data to obtain target labeling data, and splitting the target labeling data into a target text data set, a target picture data set and a target audio data set according to data types;
the feature extraction module is used for carrying out text vectorization operation on the target text data set to obtain a target text feature set, sequentially carrying out picture denoising and picture feature extraction operation on the target picture data set to obtain a target picture feature set, and sequentially carrying out audio standardization and frequency domain feature extraction operation on the target audio data set to obtain a target audio feature set;
the modal coding module is used for mapping the target text feature set into a standard text feature set, mapping the target picture feature set into a standard picture feature set and mapping the target audio feature set into a standard audio feature set by utilizing a pre-trained cross-modal coder, and converging the standard text feature set, the standard picture feature set and the standard audio feature set into a standard modal feature set;
The feature clustering module is used for acquiring the total number of the areas of all the data isolation areas, carrying out feature clustering on the standard modal feature sets according to the total number of the areas and a preset modal reconstruction distance algorithm to obtain modal feature class sets, collecting the isolation keys of all the data isolation areas into isolation key sets, and extracting key feature sets corresponding to the isolation key sets;
the data isolation module is used for selecting the mode feature classes in the mode feature class set one by one as target mode feature classes, screening the isolation key corresponding to the target mode feature class from the isolation key set according to the key feature set to be used as a target isolation key, screening the data corresponding to the target mode feature class from the target labeling data to be used as target data to be isolated, encrypting and isolating the target data to be isolated by utilizing the target isolation key, and ending data isolation when the target mode feature class is the last mode feature class in the mode feature class set.
In the embodiment of the invention, collecting the dynamic data of all data sources into target dynamic data and labeling the data state of the target dynamic data to obtain target labeled data facilitates the subsequent feature analysis and data isolation of the dynamic data, and splitting the target labeled data into a target text data set, a target picture data set and a target audio data set according to data type facilitates the subsequent feature extraction and data isolation of the different types of data. Performing text vectorization on the target text data set to obtain a target text feature set, sequentially performing picture denoising and picture feature extraction on the target picture data set to obtain a target picture feature set, and sequentially performing audio standardization and frequency domain feature extraction on the target audio data set to obtain a target audio feature set convert the text, picture and audio data into corresponding features and reduce the data dimension, which facilitates the subsequent extraction of cross-modal features. Mapping the target text feature set into a standard text feature set, the target picture feature set into a standard picture feature set and the target audio feature set into a standard audio feature set with a pre-trained cross-modal encoder, and aggregating the standard text feature set, the standard picture feature set and the standard audio feature set into a standard modal feature set, realizes the cross-modal mapping of text, picture and audio features, which facilitates classification according to the internal features of the different modal data.
By obtaining the total number of data isolation areas, performing feature clustering on the standard modal feature set according to that total number and a preset modal reconstruction distance algorithm to obtain a set of modal feature classes, collecting the isolation keys of all data isolation areas into an isolation key set and extracting the key feature set corresponding to the isolation key set, the data can be classified according to the associations between the internal structural content features of different data sources and different types of dynamic data, and the isolation keys are dimension-reduced and encoded according to their own features, which facilitates the subsequent matching of isolation area data with isolation keys and thereby improves data security; the isolation key corresponding to each target modal feature class is then screened out of the isolation key set according to the key feature set as the target isolation key. Therefore, the dynamic data isolation method and system based on multiple data sources can solve the problem of low security during dynamic data isolation.
Drawings
FIG. 1 is a flow chart of a dynamic data isolation method based on multiple data sources according to an embodiment of the present application;
fig. 2 is a schematic flow chart of extracting a feature set of a target picture according to an embodiment of the application;
FIG. 3 is a flowchart illustrating a method for extracting a target audio feature set according to an embodiment of the present application;
FIG. 4 is a functional block diagram of a dynamic data isolation system based on multiple data sources according to an embodiment of the present application;
the achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The embodiment of the application provides a dynamic data isolation method based on multiple data sources. The execution subject of the multi-data source-based dynamic data isolation method includes, but is not limited to, at least one of a server, a terminal, and the like, which can be configured to execute the method provided by the embodiment of the application. In other words, the multi-data source based dynamic data isolation method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The service end includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like. The server may be an independent server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
Referring to fig. 1, a flow chart of a dynamic data isolation method based on multiple data sources according to an embodiment of the invention is shown. In this embodiment, the method for dynamic data isolation based on multiple data sources includes:
s1, collecting dynamic data of all data sources into target dynamic data, carrying out data state labeling on the target dynamic data to obtain target labeling data, and splitting the target labeling data into a target text data set, a target picture data set and a target audio data set according to data types.
In the embodiment of the invention, a data source is a system, device, application or service that stores or generates data. A plurality of data sources are used, and they may come from different databases, file systems, sensors, APIs, web services or other data storage or generation devices and systems. Dynamic data refers to data that changes and is updated at different points in time, whose value, attributes, state or characteristics change over time.
In the embodiment of the present invention, the performing data state labeling on the target dynamic data to obtain target labeling data includes:
Selecting data in the target dynamic data one by one as target data, and sequentially acquiring a data time stamp, a data position and a data type of the target data;
performing time marking on the target data according to the data time stamp to obtain target time data;
performing position marking on the target time data according to the data position to obtain target position data;
and performing type labeling on the target position data according to the data type to obtain target type data, and collecting all the target type data into target labeling data.
In detail, the data timestamp is the real time corresponding to the target dynamic data; by labeling the dynamic data with the data timestamp, dynamic data can be turned into static data, which facilitates subsequent feature extraction and data isolation. The data position refers to the data source location of the target dynamic data and its relative position within the target dynamic data; labeling the target time data with its position according to the data position to obtain the target position data facilitates subsequent data positioning and data recovery. The data type refers to the storage format of the data, such as TXT and BLOG text data, JPEG and PNG picture data, and WAV and MP3 audio data.
In detail, the data timestamp of the target data may be obtained with the stat function on Linux or the GetFileTime function on Windows, the data position may be obtained with the realpath function on Linux or the GetFullPathName function on Windows, and the data type may be determined from the file extension of the data using a regular expression.
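As an illustrative, non-limiting sketch of this data state labeling, assuming every piece of dynamic data arrives as a file path; the function name label_data_state and the extension lists are hypothetical choices for illustration:

```python
import os
import re
import time
from dataclasses import dataclass

# Hypothetical extension lists used only to demonstrate type labeling by file extension.
TEXT_EXT = re.compile(r"\.(txt|blog)$", re.IGNORECASE)
PICTURE_EXT = re.compile(r"\.(jpe?g|png)$", re.IGNORECASE)
AUDIO_EXT = re.compile(r"\.(wav|mp3)$", re.IGNORECASE)

@dataclass
class LabeledItem:
    path: str
    timestamp: str   # time label derived from the data timestamp
    location: str    # position label (absolute path of the data)
    dtype: str       # type label: "text", "picture", "audio" or "unknown"

def label_data_state(path: str) -> LabeledItem:
    info = os.stat(path)                                   # data timestamp, as with the Linux stat call
    timestamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(info.st_mtime))
    location = os.path.realpath(path)                      # data position, as with realpath
    if TEXT_EXT.search(path):
        dtype = "text"
    elif PICTURE_EXT.search(path):
        dtype = "picture"
    elif AUDIO_EXT.search(path):
        dtype = "audio"
    else:
        dtype = "unknown"
    return LabeledItem(path, timestamp, location, dtype)
```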
In the embodiment of the present invention, splitting the target annotation data into the target text data set, the target picture data set and the target audio data set according to the data types refers to collecting all data with text data in the target annotation data into the target text data set, collecting all data with picture data in the target annotation data into the target picture data set, and collecting all data with audio data in the target annotation data into the target audio data set.
In the embodiment of the invention, the dynamic data of all data sources are collected into the target dynamic data, the data state of the target dynamic data is marked to obtain the target marked data, the subsequent feature analysis and data isolation of the dynamic data can be facilitated, and the target marked data is split into the target text data set, the target picture data set and the target audio data set according to the data types, so that the subsequent feature extraction and data isolation of different types of data can be facilitated.
S2, performing text vectorization operation on the target text data set to obtain a target text feature set, sequentially performing picture denoising and picture feature extraction operation on the target picture data set to obtain a target picture feature set, and sequentially performing audio standardization and frequency domain feature extraction operation on the target audio data set to obtain a target audio feature set.
In the embodiment of the invention, the target text feature set is a feature set formed by integrating a plurality of target text features, and each target text feature is a feature obtained by vectorizing one target text data in the target text data set.
In the embodiment of the present invention, the text vectorization operation is performed on the target text data set to obtain a target text feature set, including:
selecting text data in the target text data set one by one as target text data, and performing text word segmentation on the target text data to obtain a target text word sequence;
screening stop words and garbled words out of the target text word sequence in sequence to obtain a target filtered word sequence;
vectorizing each word in the target filtered word sequence to obtain a target word vector sequence;
And splicing the word vectors in the target word vector sequence according to the sequence number order to obtain target text features, and collecting all the target text features into a target text feature set.
In the embodiment of the invention, the target text data may be segmented into a target text word sequence using the jieba word segmentation algorithm or a conditional random field algorithm (Conditional Random Fields, CRF).
In detail, a stop-word-list-based screening method may be used to remove stop words from the target text word sequence; stop words are common words that do not help text analysis, such as "yes", "in" and the like. Garbled words are words that cannot be parsed normally because of incompatible encodings or other problems during text processing and appear as strings of garbled symbols, so no effective semantic analysis can be performed on them. One-hot encoding or the Word2Vec algorithm may be used to vectorize each word in the target filtered word sequence to obtain the target word vector sequence.
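A minimal sketch of this text vectorization step, assuming jieba and gensim's Word2Vec are available; the stop-word list, the printable-character check for garbled words and the 32-dimensional embedding size are assumptions for illustration:

```python
import jieba
import numpy as np
from gensim.models import Word2Vec

STOP_WORDS = {"的", "是", "在", "了"}   # illustrative stop-word list

def vectorize_texts(texts, dim=32):
    token_lists = []
    for text in texts:
        words = [w for w in jieba.lcut(text)                     # text word segmentation
                 if w.strip() and w not in STOP_WORDS and w.isprintable()]
        token_lists.append(words)
    model = Word2Vec(token_lists, vector_size=dim, min_count=1)  # word vector model
    features = []
    for words in token_lists:
        vecs = [model.wv[w] for w in words]
        # splice the word vectors in sequence order into one target text feature
        features.append(np.concatenate(vecs) if vecs else np.zeros(dim))
    return features
```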
In detail, referring to fig. 2, the sequentially performing the operations of denoising the image and extracting the image feature from the target image dataset to obtain a target image feature set includes:
S21, selecting the picture data in the target picture data set one by one as target picture data, and performing picture graying operation on the target picture data to obtain a target gray picture;
s22, performing contrast enhancement and median filtering operation on the target gray level picture in sequence to obtain a target denoising picture;
and S23, sequentially carrying out picture stretching and texture feature extraction operations on the target denoising picture to obtain target picture features, and collecting all the target picture features into a target picture feature set.
In the embodiment of the invention, the picture graying operation may be performed on the target picture data with a weighted average method or an average method to obtain a target gray picture, the contrast enhancement operation may be performed on the target gray picture with linear stretching or histogram equalization, the picture stretching operation on the target denoised picture means stretching the picture to a specified size and aspect ratio, and the texture feature extraction operation may be performed with a Gray-Level Co-occurrence Matrix (GLCM) or a Gabor filter to obtain the target picture features.
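The picture pipeline may be sketched as follows, assuming OpenCV and scikit-image; the 256x256 target size, the 3x3 median kernel and the chosen GLCM properties are assumptions for illustration:

```python
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def picture_features(image_path, size=(256, 256)):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)        # picture graying
    enhanced = cv2.equalizeHist(gray)                   # contrast enhancement (histogram equalization)
    denoised = cv2.medianBlur(enhanced, 3)              # median filtering
    stretched = cv2.resize(denoised, size)              # picture stretching to a specified size
    glcm = graycomatrix(stretched, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)   # gray-level co-occurrence matrix
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.array([graycoprops(glcm, p)[0, 0] for p in props])   # target picture feature
```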
Specifically, referring to fig. 3, the sequentially performing audio normalization and frequency domain feature extraction on the target audio data set to obtain a target audio feature set includes:
s31, selecting the audio data in the target audio data set one by one as target audio data, and sequentially resampling and effectively extracting the target audio data to obtain target extracted audio;
s32, filtering and denoising operations are carried out on the target extracted audio to obtain target denoised audio;
s33, sequentially performing audio windowing and audio framing operations on the target denoised audio to obtain target framing audio;
s34, sequentially performing frequency domain conversion and Mel filtering operation on the target framing audio to obtain target frequency spectrum audio;
and S35, sequentially carrying out logarithmic compression and cepstrum coefficient calculation on the target frequency spectrum audio to obtain target audio features, and collecting all the target audio features into a target audio feature set.
In detail, the target audio data may be resampled with polynomial interpolation (Polynomial Interpolation) or Fourier transform interpolation (Fourier Transform Interpolation); valid audio extraction removes the silent segments from the target audio data and keeps only the audio that carries a signal.
Specifically, a statistical filter or a time-domain filtering method may be used to filter and denoise the target extracted audio to obtain the target denoised audio, and a Hanning window or a rectangular window may be used to sequentially perform the audio windowing and audio framing operations on the target denoised audio to obtain the target framing audio.
In detail, the target framing audio may be subjected to frequency domain conversion by using a fast fourier transform (Fast Fourier Transform, abbreviated as FFT), a mel filter operation may be performed by using a mel filter bank to obtain a target spectrum audio, and a cepstrum coefficient calculation may be performed by using a discrete cosine transform (Discrete Cosine Transform, abbreviated as DCT) to obtain a target audio feature.
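A compact sketch of this audio pipeline (resampling, silence trimming, windowed framing, FFT, Mel filtering, logarithmic compression and DCT cepstral coefficients), assuming librosa and SciPy; the sampling rate, frame length, hop length and filter counts are illustrative assumptions:

```python
import numpy as np
import librosa
from scipy.fftpack import dct

def audio_features(path, sr=16000, frame_len=400, hop=160, n_mels=26, n_mfcc=13):
    y, _ = librosa.load(path, sr=sr)                         # resample to a fixed rate
    y, _ = librosa.effects.trim(y)                           # keep the valid (non-silent) audio
    n_frames = 1 + (len(y) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(frame_len)                  # audio framing and windowing
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # frequency-domain conversion
    mel_fb = librosa.filters.mel(sr=sr, n_fft=frame_len, n_mels=n_mels)
    mel_spec = spectrum @ mel_fb.T                           # Mel filtering
    log_mel = np.log(mel_spec + 1e-10)                       # logarithmic compression
    mfcc = dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_mfcc]  # cepstrum coefficients
    return mfcc.mean(axis=0)                                 # one target audio feature per clip
```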
According to the embodiment of the invention, the target text data set is subjected to text vectorization operation to obtain the target text feature set, the target picture data set is subjected to picture denoising and picture feature extraction operation in sequence to obtain the target picture feature set, and the target audio data set is subjected to audio standardization and frequency domain feature extraction operation in sequence to obtain the target audio feature set, so that the text data, the picture data and the audio data can be respectively converted into corresponding features, and the data dimension is reduced, thereby facilitating the subsequent cross-modal feature extraction.
S3, mapping the target text feature set into a standard text feature set, mapping the target picture feature set into a standard picture feature set and mapping the target audio feature set into a standard audio feature set, respectively using a pre-trained cross-mode encoder, and converging the standard text feature set, the standard picture feature set and the standard audio feature set into a standard mode feature set.
In the embodiment of the invention, the cross-modal encoder is an unsupervised neural network pre-trained on text features, picture features and audio features with a cross-entropy loss function; it can convert text, picture and audio features into unified cross-modal features and comprises a fully connected layer, a hidden layer, a self-attention layer and an output layer.
In an embodiment of the present invention, the mapping the target text feature set to a standard text feature set, the target picture feature set to a standard picture feature set, and the target audio feature set to a standard audio feature set by using a pre-trained cross-mode encoder respectively includes:
Mapping the target text feature set into a dimension-reducing text feature set, mapping the target picture feature set into a dimension-reducing picture feature set and mapping the target audio feature set into a dimension-reducing audio feature set by utilizing a full connection layer of a pre-trained cross-mode encoder;
mapping the dimension-reduced text feature set into an attention text feature set, the dimension-reduced picture feature set into an attention picture feature set, and the dimension-reduced audio feature set into an attention audio feature set by using a self-attention layer of the cross-modal encoder;
weighting operation is carried out on each dimension-reduced text feature in the dimension-reduced text feature set according to the attention text feature set, so that a standard text feature set is obtained;
weighting each dimension-reduced picture feature in the dimension-reduced picture feature set according to the attention picture feature set to obtain a standard picture feature set;
and carrying out weighted operation on each dimension-reduction audio feature in the dimension-reduction audio feature set according to the attention audio feature set to obtain a standard audio feature set.
In detail, the fully connected layer (Fully Connected Layer) is a basic neural network layer that connects every node of the previous layer to every node of the current layer. Using the fully connected layer of the pre-trained cross-modal encoder to map the target text feature set into a dimension-reduced text feature set, the target picture feature set into a dimension-reduced picture feature set and the target audio feature set into a dimension-reduced audio feature set means mapping the text features in the target text feature set, the picture features in the target picture feature set and the audio features in the target audio feature set to the same feature dimension and feature range with the fully connected layer.
Specifically, the mapping the dimension-reduced text feature set to an attention text feature set, the dimension-reduced picture feature set to an attention picture feature set, and the dimension-reduced audio feature set to an attention audio feature set by using the self-attention layer of the cross-modal encoder respectively refers to calculating a Query vector (Query), a Key vector (Key), and a Value vector (Value) for each dimension-reduced feature in the dimension-reduced text feature set, the dimension-reduced picture feature set, and the dimension-reduced audio feature set by using the self-attention layer, and assembling the Query vector (Query), the Key vector (Key), and the Value vector (Value) into the attention feature corresponding to the dimension-reduced feature.
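A minimal PyTorch sketch of the layers just described — a fully connected projection per modality followed by self-attention projections that produce query, key and value vectors for each dimension-reduced feature. The hidden size of 128 and the class name are assumptions for illustration:

```python
import torch
import torch.nn as nn

class CrossModalEncoder(nn.Module):
    def __init__(self, text_dim, pic_dim, audio_dim, hidden=128):
        super().__init__()
        # fully connected layers: map every modality to the same feature dimension and range
        self.fc_text = nn.Linear(text_dim, hidden)
        self.fc_pic = nn.Linear(pic_dim, hidden)
        self.fc_audio = nn.Linear(audio_dim, hidden)
        # self-attention projections producing query, key and value vectors
        self.q = nn.Linear(hidden, hidden)
        self.k = nn.Linear(hidden, hidden)
        self.v = nn.Linear(hidden, hidden)

    def attend(self, reduced):
        # assemble the attention feature (query, key, value) for each dimension-reduced feature
        return self.q(reduced), self.k(reduced), self.v(reduced)

    def forward(self, text_feats, pic_feats, audio_feats):
        text_r = self.fc_text(text_feats)      # dimension-reduced text features
        pic_r = self.fc_pic(pic_feats)         # dimension-reduced picture features
        audio_r = self.fc_audio(audio_feats)   # dimension-reduced audio features
        return (text_r, pic_r, audio_r), (self.attend(text_r), self.attend(pic_r), self.attend(audio_r))
```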
In the embodiment of the present invention, the weighting operation is performed on each dimension-reduced text feature in the dimension-reduced text feature set according to the attention text feature set to obtain a standard text feature set, including:
selecting the attention text features in the attention text feature set one by one as target attention text features, and taking the query vector in the target attention text feature as a text query vector, the key vector in the target attention text feature as a text key vector and the value vector in the target attention text feature as a text value vector;
querying the attention picture feature corresponding to the target attention text feature from the attention picture feature set according to the text key vector and the text value vector as a related attention picture feature, taking the key vector of the related attention picture feature as a picture key vector, and screening the dimension-reduced picture feature corresponding to the related attention picture feature out of the dimension-reduced picture feature set as a related dimension-reduced picture feature;
querying the attention audio feature corresponding to the target attention text feature from the attention audio feature set according to the text key vector and the text value vector as a related attention audio feature, taking the key vector of the related attention audio feature as an audio key vector, and screening the dimension-reduced audio feature corresponding to the related attention audio feature out of the dimension-reduced audio feature set as a related dimension-reduced audio feature;
taking the inner product of the text query vector and the picture key vector as a picture attention weight, and taking the product of the picture attention weight and the related dimension-reduction picture feature as a weighted dimension-reduction picture feature;
taking the inner product of the text query vector and the audio key vector as an audio attention weight, and taking the product of the audio attention weight and the related dimension-reduction audio feature as a weighted dimension-reduction audio feature;
And taking the average vector of the target attention text feature, the weighted dimension-reduced picture feature and the weighted dimension-reduced audio feature as a standard text feature, and collecting all the standard text features into a standard text feature set.
Specifically, the dimension-reduced picture features in the dimension-reduced picture feature set are weighted according to the attention picture feature set to obtain the standard picture feature set, and the dimension-reduced audio features in the dimension-reduced audio feature set are weighted according to the attention audio feature set to obtain the standard audio feature set; both follow the method, described above, of weighting each dimension-reduced text feature in the dimension-reduced text feature set according to the attention text feature set to obtain the standard text feature set, and are not repeated here.
In the embodiment of the invention, the cross-modal mapping of the text, the picture and the audio features can be realized by mapping the target text feature set into the standard text feature set, mapping the target picture feature set into the standard picture feature set and mapping the target audio feature set into the standard audio feature set respectively by utilizing the pre-trained cross-modal encoder, and converging the standard text feature set, the standard picture feature set and the standard audio feature set into the standard modal feature set, thereby facilitating classification according to the internal features of different modal data.
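The weighting described above can be sketched as follows. It assumes the related picture and audio features have already been paired with each text feature by index, and it uses the dimension-reduced text feature in place of the target attention text feature when forming the average; both are simplifications for illustration, and all tensor names are hypothetical:

```python
import torch

def standard_text_features(text_q, text_reduced, pic_k, pic_reduced, audio_k, audio_reduced):
    pic_w = (text_q * pic_k).sum(dim=-1, keepdim=True)      # picture attention weight (inner product)
    audio_w = (text_q * audio_k).sum(dim=-1, keepdim=True)  # audio attention weight (inner product)
    weighted_pic = pic_w * pic_reduced                      # weighted dimension-reduced picture feature
    weighted_audio = audio_w * audio_reduced                # weighted dimension-reduced audio feature
    # average of the text feature and the two weighted cross-modal features
    return (text_reduced + weighted_pic + weighted_audio) / 3.0
```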
S4, obtaining the total number of all the data isolation areas, carrying out feature clustering on the standard mode feature set according to the total number of the areas and a preset mode reconstruction distance algorithm to obtain a mode feature class set, collecting isolation keys of all the data isolation areas into an isolation key set, and extracting a key feature set corresponding to the isolation key set.
In the embodiment of the invention, the data isolation area refers to an area for data storage or data isolation of dynamic data, and the total number of the areas refers to the total area number of the data isolation area.
In the embodiment of the present invention, the feature clustering is performed on the standard modal feature set according to the total number of the regions and a preset modal reconstruction distance algorithm to obtain a modal feature class set, including:
splitting all the features in the standard modal feature set into a total number of modal feature groups of the region, and randomly selecting modal center features for each modal feature group;
and calculating the reconstruction distance between each modal feature and each modal center feature by using the following modal reconstruction distance algorithm:
$D = \sqrt{\sum_{j=1}^{m}\left(x_{j} - c_{j}\right)^{2}}$

wherein $D$ is the reconstruction distance, $m$ is the total feature dimension of the modal feature (the total feature dimension of the modal feature and that of the modal center feature are equal), $x_{j}$ is the $j$-th dimension feature vector of the modal feature, and $c_{j}$ is the $j$-th dimension feature vector of the modal center feature;
updating the modal feature groups into standard modal feature groups one by one according to the reconstruction distance;
calculating standard modal center features of each standard modal feature group, and calculating center reconstruction distances between the standard modal center features and the corresponding modal center features one by one;
and iteratively updating each standard modal feature group into a corresponding modal feature class according to all the center reconstruction distances, and collecting all the modal feature classes into a modal feature class set.
Specifically, updating the modal feature groups into standard modal feature groups one by one according to the reconstruction distance refers to reassigning each feature in the standard modal feature set to a modal feature group corresponding to a modal center feature closest to the reconstruction distance, so as to obtain a standard modal feature group.
In detail, the standard modal center feature is a mean feature of the standard modal feature group, and the center reconstruction distance is a reconstruction distance between the standard modal center feature and the corresponding modal center feature.
In the embodiment of the invention, the reconstruction distance between each modal feature and each modal center feature is calculated by utilizing the modal reconstruction distance algorithm, so that the feature distance between each modal feature can be calculated, and the similarity between each modal feature is determined according to the feature distance, thereby realizing feature clustering.
Specifically, iteratively updating each standard modal feature group into a corresponding modal feature class according to all the center reconstruction distances means calculating the sum of all the center reconstruction distances; when the sum is greater than a preset distance threshold, the standard modal feature groups are taken as the modal feature groups, the standard modal center features are taken as the modal center features, and the process returns to the step of calculating the reconstruction distance between each modal feature and each modal center feature with the modal reconstruction distance algorithm, until the sum is less than or equal to the distance threshold, at which point each standard modal feature group is taken as the corresponding modal feature class.
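A k-means style sketch of this clustering, with one cluster per data isolation area; the plain Euclidean form of the reconstruction distance, the stopping threshold and the iteration cap are assumptions, since the embodiment does not fix these values:

```python
import numpy as np

def cluster_modal_features(features, n_areas, threshold=1e-3, max_iter=100):
    rng = np.random.default_rng(0)
    centers = features[rng.choice(len(features), n_areas, replace=False)]  # random modal centre features
    labels = np.zeros(len(features), dtype=int)
    for _ in range(max_iter):
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)  # reconstruction distances
        labels = dists.argmin(axis=1)                    # reassign each feature to its nearest centre
        new_centers = np.array([features[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                                for k in range(n_areas)])
        shift = np.linalg.norm(new_centers - centers, axis=1).sum()   # sum of centre reconstruction distances
        centers = new_centers
        if shift <= threshold:                           # stop once the centres no longer move
            break
    return labels, centers                               # modal feature classes and their centres
```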
In detail, an isolation key is the key used for data protection and data encryption in each data isolation area; the isolation keys of different data isolation areas are not equal, and each isolation key consists of a character string containing digits and symbols.
In the embodiment of the present invention, the extracting the key feature set corresponding to the isolated key set includes:
performing global pooling operation on each isolation key in the isolation key set to obtain a primary key feature set;
establishing a primary key feature matrix from the primary key feature set, where the primary key feature matrix has the form:

$X = \left[x_{ij}\right]_{n \times m}$

wherein $X$ is the primary key feature matrix, $x_{ij}$ is the $j$-th dimension feature of the $i$-th primary key feature, $n$ is the total number of features in the primary key feature set, and $m$ is the feature dimension of each primary key feature in the primary key feature set;
calculating the spatial variance of the primary key feature matrix by using the following maximum variance algorithm:

$\sigma^{2} = \max_{W} \frac{1}{d} \sum_{i=1}^{n} \left\| W^{T}\left(x_{i} - \mu\right) \right\|^{2}$

wherein $\sigma^{2}$ is the spatial variance, $\max$ is the maximum function, $W$ is the projection matrix of the primary key feature matrix, $d$ is the feature dimension of the projection matrix, $\mu$ is the mean vector of the primary key feature matrix $X$, $x_{i}$ is the $i$-th primary key feature (the $i$-th row of $X$), and $T$ denotes transposition;
calculating a standard projection matrix of the primary key feature matrix according to the spatial variance;
performing feature projection on the primary key feature matrix by using the standard projection matrix to obtain a dimension-reduction key feature matrix;
And extracting a key feature set from the dimension-reduction key feature matrix.
In detail, establishing the primary key feature matrix from the primary key feature set means splitting each feature in the primary key feature set by feature dimension, which facilitates the subsequent feature dimension reduction.
Specifically, by calculating the spatial variance of the primary key feature matrix with the maximum variance algorithm, the primary key feature matrix can be mapped into the corresponding feature space, which facilitates principal component analysis.
In detail, the standard projection matrix of the primary key feature matrix can be calculated from the spatial variance with the Lagrangian multiplier method, and performing feature projection on the primary key feature matrix with the standard projection matrix to obtain the dimension-reduced key feature matrix means multiplying the primary key feature matrix by the standard projection matrix.
The method of extracting the key feature set from the dimension-reduced key feature matrix is the reverse of the step of establishing the primary key feature matrix from the primary key feature set, and is not described again here.
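A sketch of this key feature extraction, assuming average pooling over fixed windows of each key's byte values and an eigendecomposition of the covariance matrix as the maximum-variance projection; the pooling window and the number of retained components are assumptions for illustration:

```python
import numpy as np

def key_feature_set(isolation_keys, window=4, n_components=2):
    primary = []
    for key in isolation_keys:
        values = np.frombuffer(key.encode("utf-8"), dtype=np.uint8).astype(float)
        values = np.pad(values, (0, (-len(values)) % window))
        primary.append(values.reshape(-1, window).mean(axis=1))   # global (average) pooling
    length = min(len(p) for p in primary)
    X = np.array([p[:length] for p in primary])                   # primary key feature matrix
    Xc = X - X.mean(axis=0)                                       # subtract the mean vector
    cov = Xc.T @ Xc / max(len(X) - 1, 1)                          # covariance of the centred matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]      # standard projection matrix
    return Xc @ W                                                 # dimension-reduced key feature matrix
```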
In the embodiment of the invention, the feature clustering is carried out on the standard modal feature set according to the total number of the areas and a preset modal reconstruction distance algorithm by acquiring the total number of the areas of all the data isolation areas, so as to obtain a modal feature class set, the isolation keys of all the data isolation areas are collected into an isolation key set, the key feature set corresponding to the isolation key set is extracted, the data classification can be realized according to the degree of association between the internal structural content features of different data sources and different types of dynamic data, and the dimension reduction coding of the isolation keys is realized according to the features of the keys, thereby facilitating the matching of the data of the subsequent isolation areas and the isolation keys, and further improving the data security.
S5, selecting the modal feature classes in the modal feature class set one by one as target modal feature classes, screening the isolation key corresponding to the target modal feature class from the isolation key set according to the key feature set to serve as a target isolation key, screening the data corresponding to the target modal feature class from the target labeling data to serve as target data to be isolated, encrypting and isolating the target data to be isolated by utilizing the target isolation key, and ending data isolation when the target modal feature class is the last modal feature class in the modal feature class set.
In the embodiment of the present invention, screening the isolation key corresponding to the target modal feature class out of the isolation key set according to the key feature set as the target isolation key means normalizing the key feature set to obtain a key code set and assigning the isolation keys according to the distance between each key code in the key code set and the cluster center of each modal feature class in the modal feature class set.
Specifically, screening the data corresponding to the target modal feature class out of the target labeling data as the target data to be isolated means selecting, according to the data state labels of all the features in the target modal feature class, the corresponding labeled data as the target data to be isolated.
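A sketch of this key-to-class matching, assuming min-max normalization of the key features into key codes, that the key codes and the cluster centers live in the same feature space, and that each class is paired with the nearest unused key; all three are assumptions for illustration:

```python
import numpy as np

def match_keys_to_classes(key_features, class_centers):
    lo, hi = key_features.min(axis=0), key_features.max(axis=0)
    codes = (key_features - lo) / np.where(hi - lo == 0, 1, hi - lo)   # normalised key codes
    assignments, used = {}, set()
    for idx, center in enumerate(class_centers):
        dist = np.linalg.norm(codes - center, axis=1)    # distance of every key code to this cluster centre
        nearest = next(int(k) for k in dist.argsort() if int(k) not in used)
        assignments[idx] = nearest                       # target isolation key for this modal feature class
        used.add(nearest)
    return assignments
```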
In the embodiment of the present invention, the encrypting and isolating the target data to be isolated by using the target isolation key includes:
selecting data in the target data to be isolated one by one as target encrypted data, and splitting the target encrypted data to obtain a plurality of target encrypted data blocks;
performing byte substitution, row shifting, column mixing and round key addition on each target encrypted data block in sequence by using the target isolation key to obtain ciphertext data;
And storing all ciphertext data into a data isolation area corresponding to the target isolation key, and ending encryption isolation.
In detail, the target encrypted data may be split according to data byte length to obtain a plurality of target encrypted data blocks, that is, 128-bit target encrypted data blocks.
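Because byte substitution, row shifting, column mixing and round key addition are the round operations of AES, the encryption isolation can be sketched with the Python cryptography package as follows; the zero-byte padding of the last block and the ECB-style per-block handling are illustrative assumptions rather than the exact scheme of the embodiment:

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_isolate(data: bytes, isolation_key: bytes) -> list:
    cipher = Cipher(algorithms.AES(isolation_key), modes.ECB())   # isolation_key must be 16/24/32 bytes
    blocks = [data[i:i + 16] for i in range(0, len(data), 16)]    # split into 128-bit data blocks
    ciphertext_blocks = []
    for block in blocks:
        encryptor = cipher.encryptor()
        ciphertext_blocks.append(encryptor.update(block.ljust(16, b"\x00")) + encryptor.finalize())
    return ciphertext_blocks

# Usage sketch: ciphertext_blocks = encrypt_isolate(b"dynamic data to isolate", os.urandom(16))
```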
In the embodiment of the invention, the mode feature classes in the mode feature class set are selected one by one to serve as target mode feature classes, the isolation key corresponding to the target mode feature class is selected from the isolation key set according to the key feature set to serve as a target isolation key, the data corresponding to the target mode feature class is selected from the target labeling data to serve as target data to be isolated, the target isolation key is utilized to encrypt and isolate the target data to be isolated, the encryption isolation of the data can be realized according to the data features, the scattered isolation of the data is realized, and the security of the data isolation is improved.
The embodiment of the invention collects the dynamic data of all data sources into the target dynamic data, carries out data state labeling on the target dynamic data to obtain the target labeling data, can facilitate the subsequent feature analysis and data isolation on the dynamic data, can facilitate the subsequent feature extraction and data isolation on different types of data by splitting the target labeling data into the target text data set, the target picture data set and the target audio data set according to the data types, carries out text vectorization operation on the target text data set to obtain the target text feature set, carries out picture denoising and picture feature extraction operation on the target picture data set in sequence to obtain the target picture feature set, carries out audio standardization and frequency domain feature extraction operation on the target audio data set in sequence, the method comprises the steps of obtaining a target audio feature set, converting text data, picture data and audio data into corresponding features respectively, reducing data dimension, facilitating subsequent extraction of cross-modal features, mapping the target text feature set into a standard text feature set, mapping the target picture feature set into a standard picture feature set and mapping the target audio feature set into a standard audio feature set respectively by utilizing a pre-trained cross-modal encoder, converging the standard text feature set, the standard picture feature set and the standard audio feature set into a standard modal feature set, and realizing cross-modal mapping of texts, pictures and audio features, thereby facilitating classification according to internal features of different modal data.
The total number of all data isolation areas is then obtained, the standard modal feature set is clustered according to that total number and a preset modal reconstruction distance algorithm to obtain a modal feature class set, the isolation keys of the data isolation areas are collected into an isolation key set, and the key feature set corresponding to the isolation key set is extracted. This classifies the data according to the relevance between the internal structural and content features of the dynamic data of different data sources and different types, and applies dimension-reduction coding to the isolation keys according to the features of the keys, which facilitates the subsequent matching between isolation area data and isolation keys and improves data security. The isolation key corresponding to the target modal feature class is then screened out of the isolation key set according to the key feature set and used as the target isolation key. Therefore, the dynamic data isolation method based on multiple data sources can solve the problem of low security when dynamic data isolation is performed.
FIG. 4 is a functional block diagram of a dynamic data isolation system based on multiple data sources according to an embodiment of the present invention.
The multiple data source based dynamic data isolation system 100 of the present invention may be installed in an electronic device. Depending on the functions implemented, the multi-data source based dynamic data isolation system 100 may include a data annotation module 101, a feature extraction module 102, a modality encoding module 103, a feature clustering module 104, and a data isolation module 105. A module of the invention, which may also be referred to as a unit, is a series of computer program segments stored in the memory of the electronic device that can be executed by the processor of the electronic device and perform a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
the data labeling module 101 is configured to aggregate dynamic data of all data sources into target dynamic data, perform data state labeling on the target dynamic data to obtain target labeling data, and split the target labeling data into a target text data set, a target picture data set and a target audio data set according to data types;
the feature extraction module 102 is configured to perform text vectorization operation on the target text data set to obtain a target text feature set, sequentially perform image denoising and image feature extraction operation on the target image data set to obtain a target image feature set, and sequentially perform audio normalization and frequency domain feature extraction operation on the target audio data set to obtain a target audio feature set;
The modality encoding module 103 is configured to map, using a pre-trained cross-modal encoder, the target text feature set to a standard text feature set, the target picture feature set to a standard picture feature set and the target audio feature set to a standard audio feature set, respectively, and to aggregate the standard text feature set, the standard picture feature set and the standard audio feature set into a standard modal feature set;
the feature clustering module 104 is configured to obtain a total number of regions of all data isolation regions, perform feature clustering on the standard modal feature set according to the total number of regions and a preset modal reconstruction distance algorithm to obtain a modal feature class set, collect isolation keys of each data isolation region into an isolation key set, and extract a key feature set corresponding to the isolation key set;
the data isolation module 105 is configured to select the mode feature classes in the mode feature class set one by one as a target mode feature class, screen an isolation key corresponding to the target mode feature class from the isolation key set according to the key feature set as a target isolation key, screen data corresponding to the target mode feature class from the target annotation data as target data to be isolated, encrypt and isolate the target data to be isolated by using the target isolation key, and end data isolation when the target mode feature class is the last mode feature class in the mode feature class set.
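To make the division of labour among the five modules concrete, the following skeleton shows one possible way the system 100 could be assembled in code. It is a hypothetical sketch: the class names, method names and data flow are illustrative assumptions and are not taken from the patent.

```python
# Hypothetical composition of modules 101-105; all identifiers are illustrative.
class DynamicDataIsolationSystem:
    def __init__(self, annotator, extractor, encoder, clusterer, isolator):
        self.annotator = annotator    # data annotation module 101
        self.extractor = extractor    # feature extraction module 102
        self.encoder = encoder        # modality encoding module 103
        self.clusterer = clusterer    # feature clustering module 104
        self.isolator = isolator      # data isolation module 105

    def run(self, data_sources, isolation_areas):
        # 101: aggregate, label and split the dynamic data by type
        texts, pictures, audio = self.annotator.label_and_split(data_sources)
        # 102: per-modality feature extraction
        features = self.extractor.extract(texts, pictures, audio)
        # 103: cross-modal encoding into the standard modal feature set
        modal_features = self.encoder.encode(features)
        # 104: clustering and key feature extraction
        classes, key_features = self.clusterer.cluster(modal_features, isolation_areas)
        # 105: per-class encryption into the matching isolation area
        self.isolator.isolate(classes, key_features, isolation_areas)
```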
In detail, each module in the multi-data-source-based dynamic data isolation system 100 in the embodiment of the present invention adopts the same technical means as the multi-data-source-based dynamic data isolation method described in fig. 1 to 3, and can produce the same technical effects, which are not described herein.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, system and method may be implemented in other manners. For example, the system embodiments described above are merely illustrative, e.g., the division of the modules is merely a logical function division, and other manners of division may be implemented in practice.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in the form of hardware, or in the form of hardware combined with software functional modules.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. Multiple units or systems set forth in the system embodiments may also be implemented by one unit or system in software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present application without departing from the spirit and scope of the technical solution of the present application.

Claims (10)

1. A method of dynamic data isolation based on multiple data sources, the method comprising:
s1: collecting dynamic data of all data sources into target dynamic data, carrying out data state labeling on the target dynamic data to obtain target labeling data, and splitting the target labeling data into a target text data set, a target picture data set and a target audio data set according to data types;
s2: performing text vectorization operation on the target text data set to obtain a target text feature set, sequentially performing picture denoising and picture feature extraction operation on the target picture data set to obtain a target picture feature set, and sequentially performing audio standardization and frequency domain feature extraction operation on the target audio data set to obtain a target audio feature set;
s3: mapping the target text feature set into a standard text feature set, mapping the target picture feature set into a standard picture feature set, mapping the target audio feature set into a standard audio feature set, aggregating the standard text feature set, the standard picture feature set and the standard audio feature set into a standard mode feature set by using a pre-trained cross-mode encoder respectively;
S4: acquiring the total number of all the data isolation areas, carrying out feature clustering on the standard modal feature set according to the total number of the areas and a preset modal reconstruction distance algorithm to obtain a modal feature class set, collecting isolation keys of all the data isolation areas into an isolation key set, and extracting a key feature set corresponding to the isolation key set;
s5: selecting the mode feature classes in the mode feature class set one by one as a target mode feature class, screening the isolation key corresponding to the target mode feature class from the isolation key set according to the key feature set to serve as a target isolation key, screening the data corresponding to the target mode feature class from the target labeling data to serve as target data to be isolated, encrypting and isolating the target data to be isolated by utilizing the target isolation key, and ending data isolation when the target mode feature class is the last mode feature class in the mode feature class set.
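Step S4 above fixes the number of feature classes to the number of data isolation areas, while the modal reconstruction distance algorithm itself is not expanded at this point. The sketch below is therefore only an approximation: it clusters the standard modal features with ordinary Euclidean k-means, one cluster per isolation area, with scikit-learn as an assumed dependency.

```python
# Approximate sketch of step S4: one feature class per data isolation area.
import numpy as np
from sklearn.cluster import KMeans


def cluster_modal_features(standard_modal_features: np.ndarray, num_isolation_areas: int):
    """standard_modal_features: (n_samples, feature_dim) array (assumed layout)."""
    km = KMeans(n_clusters=num_isolation_areas, n_init=10, random_state=0)
    labels = km.fit_predict(standard_modal_features)
    # Collect the samples of each cluster into one modal feature class
    return [standard_modal_features[labels == k] for k in range(num_isolation_areas)]
```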
2. The method for isolating dynamic data based on multiple data sources according to claim 1, wherein said performing data state labeling on said target dynamic data to obtain target labeling data comprises:
Selecting data in the target dynamic data one by one as target data, and sequentially acquiring a data time stamp, a data position and a data type of the target data;
performing time marking on the target data according to the data time stamp to obtain target time data;
performing position marking on the target time data according to the data position to obtain target position data;
and performing type labeling on the target position data according to the data type to obtain target type data, and collecting all the target type data into target labeling data.
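Read together, claim 2 describes a per-record labeling chain: timestamp, then position, then type. A minimal sketch of that chain is given below; the record field names and label keys are illustrative assumptions rather than terms from the patent.

```python
# Hedged sketch of the time -> position -> type labeling chain of claim 2.
def label_dynamic_data(target_dynamic_data: list[dict]) -> list[dict]:
    target_labeling_data = []
    for record in target_dynamic_data:                    # select data one by one as target data
        labeled = dict(record)
        labeled["time_label"] = record["timestamp"]       # time labeling -> target time data
        labeled["position_label"] = record["position"]    # position labeling -> target position data
        labeled["type_label"] = record["data_type"]       # type labeling -> target type data
        target_labeling_data.append(labeled)
    return target_labeling_data
```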
3. The method for dynamic data isolation based on multiple data sources of claim 1, wherein said performing text vectorization on said target text data set to obtain a target text feature set comprises:
selecting text data in the target text data set one by one as target text data, and performing text word segmentation on the target text data to obtain a target text word sequence;
screening out stop words and garbled words from the target text word sequence in sequence to obtain a target filtered word sequence;
vectorizing each word in the target filtered word sequence to obtain a target word vector sequence;
And splicing the word vectors in the target word vector sequence according to the sequence number order to obtain target text features, and collecting all the target text features into a target text feature set.
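Claim 3 amounts to a word-level vectorization pipeline: segment, filter, look up word vectors and concatenate them in order. The sketch below is one hedged reading of it, assuming `jieba` for word segmentation, a stop-word set and a pre-built word-vector table; all three are assumptions, not elements of the claim.

```python
# Hedged sketch of claim 3's text vectorization (assumed dependencies: jieba, numpy).
import numpy as np
import jieba


def text_to_feature(text: str, stop_words: set[str],
                    word_vectors: dict[str, np.ndarray]) -> np.ndarray:
    words = jieba.lcut(text)                                        # text word segmentation
    filtered = [w for w in words if w not in stop_words and w in word_vectors]
    vectors = [word_vectors[w] for w in filtered]                   # vectorize each word
    # Splice the word vectors in sequence order into one target text feature
    return np.concatenate(vectors) if vectors else np.zeros(0)
```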
4. The method for dynamic data isolation based on multiple data sources according to claim 1, wherein sequentially performing image denoising and image feature extraction operations on the target image dataset to obtain a target image feature set comprises:
selecting the picture data in the target picture data set one by one as target picture data, and carrying out picture graying operation on the target picture data to obtain a target gray picture;
performing contrast enhancement and median filtering operation on the target gray level picture in sequence to obtain a target denoising picture;
and sequentially carrying out picture stretching and texture feature extraction operations on the target denoising picture to obtain target picture features, and collecting all the target picture features into a target picture feature set.
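Claim 4 chains graying, contrast enhancement, median filtering, stretching and texture feature extraction. The sketch below follows that chain with OpenCV and scikit-image as assumed dependencies; the choice of a gray-level co-occurrence matrix (GLCM) as the texture descriptor and the 128x128 stretch size are illustrative assumptions.

```python
# Hedged sketch of claim 4's picture feature extraction (assumed: OpenCV, scikit-image >= 0.19).
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops


def picture_to_feature(picture_bgr: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(picture_bgr, cv2.COLOR_BGR2GRAY)   # picture graying
    enhanced = cv2.equalizeHist(gray)                      # contrast enhancement
    denoised = cv2.medianBlur(enhanced, 3)                 # median filtering
    stretched = cv2.resize(denoised, (128, 128))           # picture stretching
    # Texture feature extraction via a gray-level co-occurrence matrix (illustrative choice)
    glcm = graycomatrix(stretched, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    props = ("contrast", "energy", "homogeneity")
    return np.array([graycoprops(glcm, p)[0, 0] for p in props])
```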
5. The method for dynamic data isolation based on multiple data sources of claim 1, wherein said sequentially performing audio normalization and frequency domain feature extraction operations on said target audio data set to obtain a target audio feature set comprises:
Selecting the audio data in the target audio data set one by one as target audio data, and sequentially performing resampling and effective-segment extraction on the target audio data to obtain target extracted audio;
performing filtering and denoising operations on the target extracted audio to obtain target denoised audio;
sequentially performing audio windowing and audio framing operations on the target denoised audio to obtain target framing audio;
sequentially performing frequency domain conversion and Mel filtering operation on the target framing audio to obtain target frequency spectrum audio;
and sequentially carrying out logarithmic compression and cepstrum coefficient calculation on the target frequency spectrum audio to obtain target audio characteristics, and collecting all the target audio characteristics into a target audio characteristic set.
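The resampling, framing/windowing, frequency-domain conversion, Mel filtering, logarithmic compression and cepstral coefficient steps of claim 5 correspond to the standard MFCC pipeline, so a compact sketch can delegate most of the chain to an audio library. The example below assumes `librosa`; the sampling rate and coefficient count are illustrative.

```python
# Hedged sketch of claim 5's audio feature extraction (assumed dependency: librosa).
import librosa


def audio_to_feature(audio_path: str, sr: int = 16000, n_mfcc: int = 13):
    signal, _ = librosa.load(audio_path, sr=sr)    # resampling to a common rate
    signal, _ = librosa.effects.trim(signal)       # keep the effective (non-silent) segment
    # Windowing, framing, frequency-domain conversion (FFT), Mel filtering,
    # log compression and the cepstral-coefficient DCT all happen inside mfcc().
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)   # target audio features
```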
6. The multi-data source based dynamic data isolation method of claim 1, wherein the mapping the target text feature set to a standard text feature set, the target picture feature set to a standard picture feature set, and the target audio feature set to a standard audio feature set with a pre-trained cross-modality encoder, respectively, comprises:
mapping the target text feature set into a dimension-reducing text feature set, mapping the target picture feature set into a dimension-reducing picture feature set and mapping the target audio feature set into a dimension-reducing audio feature set by utilizing a full connection layer of a pre-trained cross-mode encoder;
Mapping the dimension-reduced text feature set into an attention text feature set, the dimension-reduced picture feature set into an attention picture feature set, and the dimension-reduced audio feature set into an attention audio feature set by using a self-attention layer of the cross-modal encoder respectively;
weighting operation is carried out on each dimension-reduced text feature in the dimension-reduced text feature set according to the attention text feature set, so that a standard text feature set is obtained;
weighting operation is carried out on each dimension-reduced picture feature in the dimension-reduced picture feature set according to the attention picture feature set, so that a standard picture feature set is obtained;
and carrying out weighted operation on each dimension-reduction audio feature in the dimension-reduction audio feature set according to the attention audio feature set to obtain a standard audio feature set.
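Claim 6 describes the cross-modal encoder as a fully connected dimension-reduction layer followed by a self-attention layer whose outputs weight the reduced features. The PyTorch sketch below is one minimal reading of that structure for a single modality; the hidden size, head count and the way the attention weights are applied are assumptions.

```python
# Hedged PyTorch sketch of the per-modality encoder structure in claim 6.
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    def __init__(self, in_dim: int, hidden: int = 256, heads: int = 4):
        super().__init__()
        self.fc = nn.Linear(in_dim, hidden)                          # fully connected layer
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, sequence, in_dim) target features of one modality
        reduced = self.fc(feats)                                     # dimension-reduced features
        _, weights = self.attn(reduced, reduced, reduced)            # self-attention weights
        # Weight each dimension-reduced feature by its average attention weight
        return reduced * weights.mean(dim=1).unsqueeze(-1)           # standard features
```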
7. The method for dynamic data isolation based on multiple data sources of claim 6, wherein said weighting each dimension-reduced text feature in said dimension-reduced text feature set according to said attention text feature set to obtain a standard text feature set comprises:
selecting the attention text features in the attention text feature set one by one as a target attention text feature, taking the query vector in the target attention text feature as a text query vector, the key vector in the target attention text feature as a text key vector and the value vector in the target attention text feature as a text value vector;
Inquiring the attention picture feature corresponding to the target attention text feature from the attention picture feature set according to the text key vector and the text value vector to serve as a relevant attention picture feature, taking the key vector of the relevant attention picture feature as a picture key vector, and screening out the dimension-reduced picture feature corresponding to the relevant attention picture feature from the dimension-reduced picture feature set to serve as a relevant dimension-reduced picture feature;
inquiring attention audio features corresponding to the target attention text features from the attention audio feature set according to the text key vectors and the text value vectors to serve as relevant attention audio features, taking key vectors of the relevant attention audio features as audio key vectors, and screening out dimension-reducing audio features corresponding to the relevant attention audio features from the dimension-reducing audio feature set to serve as relevant dimension-reducing audio features;
taking the inner product of the text query vector and the picture key vector as a picture attention weight, and taking the product of the picture attention weight and the related dimension-reduction picture feature as a weighted dimension-reduction picture feature;
taking the inner product of the text query vector and the audio key vector as an audio attention weight, and taking the product of the audio attention weight and the related dimension-reduction audio feature as a weighted dimension-reduction audio feature;
And taking the target attention text feature, the weighted dimension-reduction picture feature and the average vector of the weighted dimension-reduction audio feature as standard text features, and collecting all the standard text features into a standard text feature set.
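Claim 7 weights the cross-modal features with simple inner products: the text query vector scores the picture and audio key vectors, the scores scale the corresponding dimension-reduced features, and the three modalities are averaged. A minimal numpy sketch of that arithmetic is shown below; the vectors are assumed to share one common dimension, which the claim does not state explicitly.

```python
# Hedged numpy sketch of the weighting and averaging in claim 7.
import numpy as np


def standard_text_feature(text_query, text_feature,
                          picture_key, picture_feature,
                          audio_key, audio_feature):
    picture_weight = float(np.dot(text_query, picture_key))    # picture attention weight
    audio_weight = float(np.dot(text_query, audio_key))        # audio attention weight
    weighted_picture = picture_weight * picture_feature        # weighted dimension-reduced picture feature
    weighted_audio = audio_weight * audio_feature               # weighted dimension-reduced audio feature
    # Average vector of the target attention text feature and the two weighted features
    return (text_feature + weighted_picture + weighted_audio) / 3.0
```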
8. The method for dynamic data isolation based on multiple data sources according to claim 1, wherein the extracting the key feature set corresponding to the isolated key set comprises:
performing global pooling operation on each isolation key in the isolation key set to obtain a primary key feature set;
establishing a primary key feature matrix according to the primary key feature set, wherein the primary key feature matrix is:
$$F=\begin{bmatrix} f_{1,1} & \cdots & f_{1,m} \\ \vdots & \ddots & \vdots \\ f_{n,1} & \cdots & f_{n,m} \end{bmatrix}$$
wherein $F$ refers to the primary key feature matrix, $f_{i,j}$ means the $j$-th dimension feature of the $i$-th primary key feature, $n$ is the total number of features of the primary key feature set, and $m$ is the total feature dimension of each primary key feature in the primary key feature set;
calculating the spatial variance of the primary key feature matrix by using the following maximum variance algorithm:
$$\sigma^{2}=\max_{W}\ \frac{1}{d}\,\operatorname{tr}\!\left(W^{\mathrm{T}}(F-\mu)^{\mathrm{T}}(F-\mu)\,W\right)$$
wherein $\sigma^{2}$ refers to the spatial variance, $\max$ is the maximum function, $W$ is the projection matrix of the primary key feature matrix, $d$ means the feature dimension of the projection matrix, $\mu$ is the mean vector of the primary key feature matrix $F$, and $\mathrm{T}$ refers to the transpose symbol;
calculating a standard projection matrix of the primary key feature matrix according to the spatial variance;
performing feature projection on the primary key feature matrix by using the standard projection matrix to obtain a dimension-reduction key feature matrix;
and extracting a key feature set from the dimension-reduction key feature matrix.
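Taken together, claim 8's global pooling followed by a maximum-variance projection is essentially a PCA-style dimension reduction of the isolation keys. The numpy sketch below is a hedged illustration under stated assumptions: each isolation key is treated as a flat numeric array whose length is a multiple of four, groups of four values are mean-pooled as the "global pooling", and the projection keeps the top-variance eigenvectors.

```python
# Hedged PCA-style sketch of claim 8's key feature extraction (numpy only).
import numpy as np


def extract_key_features(isolation_keys: np.ndarray, out_dim: int = 8) -> np.ndarray:
    """isolation_keys: (n_keys, key_len) array; key_len assumed to be a multiple of 4."""
    # Global pooling: mean-pool groups of 4 key values into the primary key features
    primary = isolation_keys.reshape(isolation_keys.shape[0], -1, 4).mean(axis=2)
    # Maximum-variance projection: keep the directions of largest spatial variance
    centered = primary - primary.mean(axis=0)
    covariance = centered.T @ centered / len(centered)
    eigvals, eigvecs = np.linalg.eigh(covariance)
    projection = eigvecs[:, np.argsort(eigvals)[::-1][:out_dim]]   # standard projection matrix
    return centered @ projection                                   # dimension-reduced key features
```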
9. The method for dynamically isolating data based on multiple data sources according to claim 1, wherein said encrypting and isolating said target data to be isolated using said target isolation key comprises:
selecting data in the target data to be isolated one by one as target encrypted data, and splitting the target encrypted data to obtain a plurality of target encrypted data blocks;
performing byte substitution, row shifting, column mixing and round key addition on each target encrypted data block in sequence by using the target isolation key to obtain ciphertext data;
and storing all ciphertext data into a data isolation area corresponding to the target isolation key, and ending encryption isolation.
10. A multiple data source based dynamic data isolation system, the system comprising:
The data labeling module is used for collecting the dynamic data of all the data sources into target dynamic data, carrying out data state labeling on the target dynamic data to obtain target labeling data, and splitting the target labeling data into a target text data set, a target picture data set and a target audio data set according to the data types;
the feature extraction module is used for carrying out text vectorization operation on the target text data set to obtain a target text feature set, sequentially carrying out picture denoising and picture feature extraction operation on the target picture data set to obtain a target picture feature set, and sequentially carrying out audio standardization and frequency domain feature extraction operation on the target audio data set to obtain a target audio feature set;
the modal coding module is used for mapping the target text feature set into a standard text feature set, mapping the target picture feature set into a standard picture feature set and mapping the target audio feature set into a standard audio feature set by utilizing a pre-trained cross-modal coder, and converging the standard text feature set, the standard picture feature set and the standard audio feature set into a standard modal feature set;
The feature clustering module is used for acquiring the total number of the areas of all the data isolation areas, carrying out feature clustering on the standard modal feature sets according to the total number of the areas and a preset modal reconstruction distance algorithm to obtain modal feature class sets, collecting the isolation keys of all the data isolation areas into isolation key sets, and extracting key feature sets corresponding to the isolation key sets;
the data isolation module is used for selecting the mode feature classes in the mode feature class set one by one as target mode feature classes, screening the isolation key corresponding to the target mode feature class from the isolation key set according to the key feature set to be used as a target isolation key, screening the data corresponding to the target mode feature class from the target labeling data to be used as target data to be isolated, encrypting and isolating the target data to be isolated by utilizing the target isolation key, and ending data isolation when the target mode feature class is the last mode feature class in the mode feature class set.