CN116049766A - Multisource data anomaly monitoring system based on digital twin - Google Patents


Info

Publication number
CN116049766A
Authority
CN
China
Prior art keywords
data
encryption
sub
acquisition
module
Legal status
Withdrawn
Application number
CN202310124265.1A
Other languages
Chinese (zh)
Inventor
张学银
王尚文
郭群浩
Current Assignee
Shenzhen Zhongxun Wanglian Technology Co ltd
Original Assignee
Shenzhen Zhongxun Wanglian Technology Co ltd
Application filed by Shenzhen Zhongxun Wanglian Technology Co ltd
Priority to CN202310124265.1A
Publication of CN116049766A
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/544 Buffers; Shared memory; Pipes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a digital-twin-based multi-source data anomaly monitoring system in the technical field of digital twins, intended to address anomaly detection in data and data security. The system comprises a twin data acquisition system, an acquisition data extraction system, an anomaly monitoring system and a data encryption storage system. Compressing the acquired data effectively reduces the data volume and the storage space required, improves transmission, storage and processing efficiency, and reduces the data damage rate. Anomaly detection is performed on the data separately by a local detection module, a proximity detection module, a fusion detection module and a mixed anomaly detection module. Layer-by-layer encryption gives normal data better security during final transmission, so that changes of position during transmission do not increase the risk of leakage; after encryption the confidential information in the data is better preserved and the data are effectively protected.

Description

Multisource data anomaly monitoring system based on digital twin
Technical Field
The invention relates to the technical field of digital twins, and in particular to a digital-twin-based multi-source data anomaly monitoring system.
Background
A digital twin makes full use of data such as physical models, sensor updates and operating history, integrates multi-disciplinary, multi-physical-quantity, multi-scale and multi-probability simulation processes, and completes a mapping in virtual space, thereby reflecting the full life-cycle process of the corresponding physical equipment. Existing detection of anomalous data in digital twin data still has the following problems:
1. When twin data are acquired, the acquisition mode is too limited, so acquisition is inaccurate; moreover, the data are not processed after acquisition, so the acquired data become damaged.
2. When anomaly detection is performed on multi-source data, detection methods depend too heavily on parameters and data views deviate from the modeling samples, so the detected anomalies are inaccurate.
3. After anomaly detection, when the anomaly-free multi-source data are transmitted, layer-by-layer processing of the data reduces their security, so data leakage occurs when the normal data are finally transmitted.
Disclosure of Invention
The invention aims to provide a digital-twin-based multi-source data anomaly monitoring system that compresses data to effectively reduce data volume and storage space, improves transmission, storage and processing efficiency after compression, and reduces the data damage rate.
In order to achieve the above purpose, the present invention provides the following technical solutions:
the digital-twin-based multi-source data anomaly monitoring system comprises a twin data acquisition system, an acquisition data extraction system, an anomaly monitoring system and a data encryption storage system;
the twin data acquisition system acquires different types of twin data using different acquisition modes and processes each type of acquired data;
the acquisition data extraction system extracts, type by type, the data processed by the twin data acquisition system;
the anomaly monitoring system separately examines each data set extracted by the acquisition data extraction system and stores anomalous and anomaly-free data separately;
and the data encryption storage system receives the sub-data sets in which the anomaly monitoring system detected no anomaly and encrypts them.
Preferably, the twin data acquisition system comprises:
the data acquisition module is used for:
collecting data through four types of created threads, namely an acquisition thread, a compression thread, a reorganization thread and a network transmission thread;
the data buffer module is used for:
initializing several buffers for the data acquired by the acquisition threads of the data acquisition module. There are three types of data buffer, namely an acquisition buffer, a compression buffer and a transmission buffer; the buffers share data among the threads, and each buffer uses locking to guarantee mutually exclusive access by multiple threads;
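The buffer arrangement above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: each buffer is assumed to be a FIFO guarded by a single `threading.Condition`, and the class `LockedBuffer`, the helper `acquisition_thread` and the data-type names are hypothetical.

```python
import threading
from collections import deque


class LockedBuffer:
    """FIFO buffer shared between threads; one lock serialises every access."""

    def __init__(self, name):
        self.name = name
        self._items = deque()
        self._cond = threading.Condition(threading.Lock())

    def put(self, item):
        with self._cond:  # only one thread may touch the deque at a time
            self._items.append(item)
            self._cond.notify()  # wake one consumer waiting in get()

    def get(self, timeout=None):
        """Remove and return the oldest item, or None if the timeout expires."""
        with self._cond:
            if not self._cond.wait_for(lambda: len(self._items) > 0, timeout=timeout):
                return None
            return self._items.popleft()

    def __len__(self):
        with self._cond:
            return len(self._items)


# The three buffer types named in the text.
acquisition_buffer = LockedBuffer("acquisition")
compression_buffer = LockedBuffer("compression")
transmission_buffer = LockedBuffer("transmission")


def acquisition_thread(data_type, readings, buffer):
    """Hypothetical acquisition thread: tags each reading with its data type."""
    for value in readings:
        buffer.put((data_type, value))


if __name__ == "__main__":
    workers = [threading.Thread(target=acquisition_thread,
                                args=(t, range(100), acquisition_buffer))
               for t in ("temperature", "vibration", "current")]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(len(acquisition_buffer))  # 300
```

One lock per buffer is the simplest way to obtain the mutual exclusion the text describes; finer-grained or lock-free queues would be a separate design choice.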
the data compression module is used for:
assigning the buffered data read from the data buffer module to different sub-data sets, and then compressing each assigned sub-data set in its compression area;
the compression area contains as many compression threads as there are data acquisition types; in the main loop of each compression thread, acquired data are taken from the acquisition buffer and compressed to obtain compressed data;
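One compression thread per acquisition type could be organised as in the sketch below. It assumes per-type `queue.Queue` objects (which lock internally) standing in for the acquisition buffer, `zlib` as the codec, JSON-serialised batches of a fixed size, and a sentinel object that ends each thread's main loop; the patent specifies none of these, and the names `compression_worker` and `run_compression` are hypothetical.

```python
import json
import queue
import threading
import zlib

STOP = object()  # sentinel that ends a compression thread's main loop


def compress_batch(batch):
    """Serialise a batch of readings as JSON and deflate it with zlib."""
    return zlib.compress(json.dumps(batch).encode("utf-8"))


def compression_worker(data_type, in_q, out_q, batch_size):
    """Main loop of one compression thread (one thread per acquisition type)."""
    batch = []
    while True:
        item = in_q.get()
        if item is STOP:
            break
        batch.append(item)
        if len(batch) == batch_size:
            out_q.put((data_type, compress_batch(batch)))
            batch = []
    if batch:  # flush the last, partially filled batch
        out_q.put((data_type, compress_batch(batch)))


def run_compression(samples_by_type, batch_size=32):
    """Start one compression thread per data type and collect (type, blob) pairs."""
    out_q = queue.Queue()
    in_queues = {t: queue.Queue() for t in samples_by_type}
    workers = [threading.Thread(target=compression_worker,
                                args=(t, in_queues[t], out_q, batch_size))
               for t in samples_by_type]
    for w in workers:
        w.start()
    for t, samples in samples_by_type.items():
        for s in samples:
            in_queues[t].put(s)
        in_queues[t].put(STOP)
    for w in workers:
        w.join()
    blocks = []
    while not out_q.empty():
        blocks.append(out_q.get())
    return blocks
```

Because each type has a single worker, the returned `(data_type, compressed_bytes)` pairs stay in acquisition order within each type, although blocks of different types may interleave.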
a compression reorganization module for:
packing the compressed sub-data sets from the data compression module by type, and compressing and reorganizing the packed sub-data sets;
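Packing the compressed sub-data sets by type could use a simple length-prefixed container, as sketched below. The byte layout (a 2-byte name length, the UTF-8 type name, a 4-byte block count, then each block as a 4-byte length followed by its bytes) is a hypothetical format chosen for illustration; the sketch covers grouping and serialisation only and does not recompress the package.

```python
import struct
from collections import defaultdict


def pack_by_type(blocks):
    """Group (data_type, compressed_bytes) pairs by type and serialise them into one package."""
    grouped = defaultdict(list)
    for data_type, blob in blocks:
        grouped[data_type].append(blob)
    out = bytearray()
    for data_type in sorted(grouped):  # deterministic type order
        name = data_type.encode("utf-8")
        out += struct.pack(">H", len(name)) + name
        out += struct.pack(">I", len(grouped[data_type]))
        for blob in grouped[data_type]:
            out += struct.pack(">I", len(blob)) + blob
    return bytes(out)


def unpack_by_type(package):
    """Inverse of pack_by_type: return {data_type: [compressed_bytes, ...]}."""
    result, pos = {}, 0
    while pos < len(package):
        (name_len,) = struct.unpack_from(">H", package, pos)
        pos += 2
        name = package[pos:pos + name_len].decode("utf-8")
        pos += name_len
        (count,) = struct.unpack_from(">I", package, pos)
        pos += 4
        blobs = []
        for _ in range(count):
            (size,) = struct.unpack_from(">I", package, pos)
            pos += 4
            blobs.append(package[pos:pos + size])
            pos += size
        result[name] = blobs
    return result
```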
a storage module for:
storing the compressed and reorganized data sets.
Preferably, the acquisition data extraction system includes:
a multi-data receiving module for:
receiving the compressed sub-data sets from the twin data acquisition system;
a classification module for:
classifying the compressed sub-data sets received by the multi-data receiving module, where each class of compressed sub-data set has its own independent processing module.
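A classification module that gives each class its own independent processing module can be reduced to a dispatcher keyed by class label. In this hypothetical sketch the classifying function and the per-class handlers are supplied by the caller, and sub-data sets whose class has no registered handler are set aside rather than dropped.

```python
from collections import defaultdict


class ClassificationModule:
    """Routes received sub-data sets to one independent handler per class."""

    def __init__(self, classify):
        self.classify = classify  # hypothetical: callable mapping a sub-data set to a class label
        self.handlers = {}
        self.unrouted = []        # sub-data sets whose class has no registered handler

    def register(self, label, handler):
        self.handlers[label] = handler

    def dispatch(self, subsets):
        grouped = defaultdict(list)
        for subset in subsets:
            grouped[self.classify(subset)].append(subset)
        results = {}
        for label, group in grouped.items():
            handler = self.handlers.get(label)
            if handler is None:
                self.unrouted.extend(group)
            else:
                results[label] = handler(group)
        return results
```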
Preferably, the anomaly monitoring system includes:
the classified data receiving module is used for:
receiving the sub-data sets by class;
the detection module is used for:
detecting the different types of anomaly separately in the different sub-data sets.
Preferably, the classified data receiving module includes:
a data classification and screening sub-module for:
classifying the received sub-data sets and treating them as classified data to be stored;
a storage area sensing sub-module for:
obtaining the capacity coefficient of each storage area for the classified data, where the capacity coefficient represents the used space of the storage area, and searching for an available target storage area;
a setting control sub-module for:
setting the number of copies and the storage-awareness policy used when the classified data are stored, where the storage-awareness policy determines which data nodes of the storage area store the classified data;
an execution storage sub-module for:
storing the classified data, together with the configured number of copies, in the target storage area, and recording the storage information for the stored data and its copies.
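The storage-area sensing, copy-number setting and storage execution steps might fit together as in the sketch below. It assumes the capacity coefficient is used space divided by total space, a hypothetical 0.8 ceiling on that coefficient, least-used-first placement of each copy on a distinct area, and an in-memory record list in place of the storage information log; none of these values come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class StorageArea:
    name: str
    used_bytes: int
    total_bytes: int
    records: list = field(default_factory=list)

    @property
    def capacity_coefficient(self):
        # fraction of the area already in use
        return self.used_bytes / self.total_bytes


def find_target_areas(areas, needed_bytes, max_coefficient=0.8):
    """Areas that can take needed_bytes without exceeding max_coefficient, least-used first."""
    fits = [a for a in areas
            if (a.used_bytes + needed_bytes) / a.total_bytes <= max_coefficient]
    return sorted(fits, key=lambda a: a.capacity_coefficient)


def store_classified(data_id, size, areas, copies=2, max_coefficient=0.8):
    """Place `copies` replicas on distinct target areas and record where they went."""
    targets = find_target_areas(areas, size, max_coefficient)[:copies]
    if len(targets) < copies:
        raise RuntimeError(f"only {len(targets)} area(s) available for {copies} copies")
    log = []
    for replica, area in enumerate(targets):
        area.used_bytes += size
        area.records.append((data_id, replica))
        log.append({"data_id": data_id, "replica": replica, "area": area.name})
    return log
```

The check for enough target areas runs before anything is written, so a request that cannot be satisfied leaves every area unchanged.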
Preferably, the detection module includes:
the local detection module is used for:
treating each sample in the sub-data set as a node in a graph, considering only the local neighborhood of each node when constructing the edges between nodes, and thereby constructing an asymmetric weighted directed graph on the data set under examination;
applying a custom random walk process to the graph so that the walker jumps with high probability from nodes corresponding to normal samples to nodes corresponding to anomalous samples;
because the asymmetric relations in the local information graph may cause the random walk to converge abnormally, two different types of restart vectors are provided, based on the principle that anomalous samples should be visited with higher weight, to ensure that nodes corresponding to anomalous samples are selected with higher probability when the walker restarts;
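The local detection idea can be illustrated with a biased random walk with restart on a k-nearest-neighbour graph. The construction below is an assumption, not the patent's definition: an edge i → j exists when i is one of j's k nearest neighbours (so edges are directed and asymmetric), transition weights grow with the target's local sparsity so that walkers step toward isolated samples, and the two restart vectors weight nodes by sparsity or by inverse in-degree. The stationary visiting probability serves as the anomaly score; `local_walk_scores` and its parameters are hypothetical.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbours of point i (Euclidean distance)."""
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def local_walk_scores(points, k=3, alpha=0.85, restart="sparsity", iters=500, tol=1e-12):
    """Stationary visiting probability of a biased random walk with restart.
    Higher values mean the node is visited more often, i.e. more anomalous."""
    n = len(points)
    nbrs = [knn(points, i, k) for i in range(n)]
    # Local sparsity: mean distance from a node to its k nearest neighbours.
    sparsity = [sum(math.dist(points[i], points[j]) for j in nbrs[i]) / k for i in range(n)]

    # Directed edges built from local neighbourhoods only:
    # i -> j whenever i is among j's k nearest neighbours.
    out_edges = [[] for _ in range(n)]
    for j in range(n):
        for i in nbrs[j]:
            out_edges[i].append(j)

    # Two kinds of restart vector, both putting more weight on likely anomalies.
    if restart == "sparsity":
        raw = sparsity
    elif restart == "indegree":
        # len(out_edges[j]) equals how many nodes list j as a neighbour
        raw = [1.0 / (1 + len(out_edges[j])) for j in range(n)]
    else:
        raise ValueError("restart must be 'sparsity' or 'indegree'")
    total = sum(raw)
    r = [x / total for x in raw] if total > 0 else [1.0 / n] * n

    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        dangling = 0.0
        for i in range(n):
            if not out_edges[i]:
                dangling += pi[i]  # a node with no out-edge sends its mass through r
                continue
            # Transition weight grows with the target's sparsity, so walkers
            # standing next to an isolated sample tend to step onto it.
            w = [sparsity[j] + 1e-12 for j in out_edges[i]]
            w_sum = sum(w)
            for j, wj in zip(out_edges[i], w):
                nxt[j] += pi[i] * wj / w_sum
        nxt = [alpha * (nxt[j] + dangling * r[j]) + (1 - alpha) * r[j] for j in range(n)]
        converged = sum(abs(a - b) for a, b in zip(nxt, pi)) < tol
        pi = nxt
        if converged:
            break
    return pi
```

An isolated sample rarely appears in anyone else's neighbour list, so it has no out-edges; its mass is redistributed through the restart vector, which already favours it.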
the proximity detection module is used for:
analyzing, with the data model of each sub-data set, the parameters of the distribution of the data samples, and observing that the scores obtained by different types of samples under different proximity graphs follow different patterns of change;
constructing a parameter difference value from these patterns of change;
when describing the relations among samples in different types of data sets, the data model may freely choose the required proximity measure;
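The text does not say how the score or the parameter difference value is computed. The sketch below assumes a mean k-nearest-neighbour distance score evaluated on several proximity graphs (each pairing of a pluggable distance function with a neighbourhood size k), rank-normalises each graph's scores so that different measures become comparable, and reports the spread of each sample's scores across graphs as a stand-in for the difference value. The function names and the choice of Euclidean, Manhattan and Chebyshev distances are illustrative.

```python
import math


def euclidean(a, b):
    return math.dist(a, b)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def knn_score(points, metric, k):
    """Mean distance from each point to its k nearest neighbours under `metric`."""
    scores = []
    for i, p in enumerate(points):
        d = sorted(metric(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(d[:k]) / k)
    return scores


def rank_normalise(scores):
    """Map scores to [0, 1] by rank (needs at least two samples)."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    ranks = [0.0] * len(scores)
    for r, i in enumerate(order):
        ranks[i] = r / (len(scores) - 1)
    return ranks


def proximity_scores(points, metrics, ks=(2, 3, 5)):
    """Score every sample on each (metric, k) proximity graph; return each sample's
    mean normalised score and the spread (max - min) of its scores across graphs."""
    per_graph = [rank_normalise(knn_score(points, m, k)) for m in metrics for k in ks]
    mean = [sum(col) / len(col) for col in zip(*per_graph)]
    spread = [max(col) - min(col) for col in zip(*per_graph)]
    return mean, spread
```

A sample that is anomalous under every measure keeps a high mean and a small spread, while a large spread flags a sample whose status depends on the chosen proximity measure.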
the fusion detection module is used for:
performing fusion detection on the multiple sub-data sets by fusing them into an extended feature space;
computing, with fuzzy clustering in this space, the membership of each sample to the multiple cluster structures implied in the data set, which describes the sample's membership behavior toward each cluster structure in the different views;
marking, according to the fuzzy clustering results, samples whose behavior is inconsistent across views as anomalous objects;
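A minimal sketch of fusion detection follows, assuming that the extended feature space is the concatenation of z-scored views, that the fuzzy clustering is standard fuzzy c-means with fuzzifier k, and that a sample's behaviour in each view is its membership vector to the fused cluster centers restricted to that view's coordinates. A sample is flagged when two of its per-view membership vectors differ by more than a hypothetical L1 threshold; `fusion_detect` and its defaults are not from the patent.

```python
import math


def zscore_columns(rows):
    """Standardise every feature column so that no view dominates the fused space."""
    stats = []
    for col in zip(*rows):
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((x - mu) ** 2 for x in col) / len(col)) or 1.0
        stats.append((mu, sd))
    return [[(x - mu) / sd for x, (mu, sd) in zip(row, stats)] for row in rows]


def memberships(point, centers, k=2.0):
    """Standard fuzzy c-means membership of one point to each center (fuzzifier k > 1)."""
    d = [math.dist(point, c) for c in centers]
    zeros = [x == 0.0 for x in d]
    if any(zeros):  # the point coincides with one or more centers
        return [1.0 / sum(zeros) if z else 0.0 for z in zeros]
    p = 2.0 / (k - 1.0)
    return [1.0 / sum((di / dl) ** p for dl in d) for di in d]


def fuzzy_cmeans(points, n_clusters, k=2.0, iters=100):
    """Fuzzy c-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < n_clusters:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        u = [memberships(p, centers, k) for p in points]
        for i in range(n_clusters):
            w = [row[i] ** k for row in u]
            total = sum(w)
            centers[i] = [sum(wj * p[f] for wj, p in zip(w, points)) / total
                          for f in range(len(points[0]))]
    return centers


def fusion_detect(views, n_clusters, k=2.0, threshold=0.5):
    """views: one feature matrix per source, all with the same m rows.
    Returns one flag per sample: True when its memberships disagree across views."""
    normed = [zscore_columns(v) for v in views]
    fused = [sum(rows, []) for rows in zip(*normed)]  # extended feature space
    centers = fuzzy_cmeans(fused, n_clusters, k)
    # Column range that each view occupies inside the fused space.
    spans, start = [], 0
    for v in normed:
        spans.append((start, start + len(v[0])))
        start += len(v[0])
    flags = []
    for j in range(len(fused)):
        per_view = [memberships(normed[v][j], [c[a:b] for c in centers], k)
                    for v, (a, b) in enumerate(spans)]
        # Largest L1 distance between this sample's membership vectors in two views.
        gap = max(sum(abs(x - y) for x, y in zip(u1, u2))
                  for u1 in per_view for u2 in per_view)
        flags.append(gap > threshold)
    return flags
```

Slicing the fused centers per view keeps the cluster labels aligned across views, so no label matching is needed before comparing membership vectors.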
the mixed anomaly detection module is used for:
for each sub-data set, using the samples themselves as the dictionary, learning the mutual representation relations among samples through low-rank representation, and constructing a similarity matrix among the samples from these relations;
applying affinity propagation clustering to the similarity matrices corresponding to the different views in the overall feature space to obtain the cluster representative point (exemplar) of each sample;
defining each sample's deviation from its cluster center as its attribute anomaly score and the inconsistency of its behavior across views as its class anomaly score; the attribute and class anomaly scores together determine the sample's degree of anomaly.
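The mixed detector can be sketched with plain affinity propagation. Two substitutions are assumptions: the low-rank representation step is replaced by negative squared Euclidean similarity with the median similarity as the preference, and the class anomaly score is taken as one minus the Jaccard overlap of a sample's cluster co-members between views, which avoids matching cluster labels across views. The weight `beta` that combines the two scores is hypothetical.

```python
import math
import statistics


def similarity_matrix(points):
    """Negative squared Euclidean similarity; diagonal (preference) = median similarity."""
    n = len(points)
    S = [[-math.dist(p, q) ** 2 for q in points] for p in points]
    pref = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = pref
    return S


def affinity_propagation(S, damping=0.5, iters=200):
    """Plain affinity propagation; returns the exemplar index of every sample."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        for i in range(n):  # responsibilities r(i, k)
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != first)
            for k in range(n):
                new = S[i][k] - (second if k == first else vals[first])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):  # availabilities a(i, k)
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # sum over i' != k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        exemplars = [max(range(n), key=lambda k: A[k][k] + R[k][k])]
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


def class_scores(labelings):
    """1 - Jaccard overlap of a sample's cluster co-members, averaged over view pairs."""
    m = len(labelings[0])
    pairs = [(a, b) for a in range(len(labelings)) for b in range(a + 1, len(labelings))]
    scores = []
    for j in range(m):
        total = 0.0
        for a, b in pairs:
            ga = {i for i in range(m) if labelings[a][i] == labelings[a][j]}
            gb = {i for i in range(m) if labelings[b][i] == labelings[b][j]}
            total += 1.0 - len(ga & gb) / len(ga | gb)
        scores.append(total / len(pairs))
    return scores


def mixed_scores(views, beta=0.5):
    """Return (attribute scores, class scores, combined scores) for m samples."""
    m = len(views[0])
    labelings, attr = [], [0.0] * m
    for v in views:
        exemplar = affinity_propagation(similarity_matrix(v))
        labelings.append(exemplar)
        for j in range(m):
            attr[j] += math.dist(v[j], v[exemplar[j]])  # deviation from its exemplar
    top = max(attr) or 1.0
    attr = [a / top for a in attr]
    cls = class_scores(labelings)
    combined = [beta * a + (1 - beta) * c for a, c in zip(attr, cls)]
    return attr, cls, combined
```

The sketch needs at least two views, and its loops are quadratic in the number of samples, so it is meant for small illustrative data sets.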
Preferably, the data encryption storage system encrypts the anomaly-free data as follows:
taking the anomaly-free data as original data, dividing the original data into segments of a preset length to be encrypted, and establishing an encryption sequence from these segments;
establishing a first encryption matrix, sized according to the number of segments to be encrypted, with the encryption sequence as its first column;
adjusting, based on a preset interference factor, the first position of each segment to be encrypted in its corresponding row of the first encryption matrix, and performing the first encryption on the adjusted segments;
obtaining the corresponding interference passwords from a preset password set according to the number of interferences, and rolling a random number with an encryption die;
marking, in each row of the first encryption matrix, a second position separated from the first position by the random number;
entering the interference passwords into the second positions in turn to perform the second encryption, thereby establishing a second encryption matrix;
determining that the current degree of encryption is insufficient when the number of blank positions in the second encryption matrix (the first number) is greater than the number of non-blank positions (the second number);
successively removing the outermost ring of positions of the second encryption matrix to generate a plurality of sub-encryption matrices;
calculating the rank of each sub-encryption matrix and generating a rank password from all of these ranks;
entering the rank password into the second encryption matrix to perform the third encryption, obtaining a third encryption matrix;
randomly generating a matrix of the same shape as the third encryption matrix (a homotype matrix);
marking the remaining blank positions of the third encryption matrix and superposing the marked third encryption matrix on the homotype matrix to obtain the superposition numbers produced at the marked positions;
entering the superposition numbers into the remaining blank positions to obtain a full encryption matrix;
and extracting the data at each position of the full encryption matrix to obtain the encrypted data.
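Several of the steps above are concrete enough to sketch: the blank versus non-blank test, peeling the outermost ring to form sub-encryption matrices, the rank of each sub-matrix, and filling the remaining blanks from a random matrix of the same shape. In the sketch below blanks are `None`, ranks are computed exactly over the rationals with blanks treated as 0, the rank password is simply the concatenated ranks, and each filled blank receives a hypothetical marker value plus the homotype matrix entry. It reproduces only the matrix bookkeeping and makes no claim about the strength of the scheme.

```python
import random
from fractions import Fraction

BLANK = None  # an unfilled position in an encryption matrix


def needs_more_encryption(matrix):
    """True when blank positions outnumber non-blank ones ('degree insufficient')."""
    blanks = sum(v is BLANK for row in matrix for v in row)
    return blanks > sum(len(row) for row in matrix) - blanks


def peel(matrix):
    """Strip the outermost ring repeatedly; return each successive inner sub-matrix."""
    subs = []
    m = [row[:] for row in matrix]
    while len(m) > 2 and len(m[0]) > 2:
        m = [row[1:-1] for row in m[1:-1]]
        subs.append(m)
    return subs


def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals (blanks count as 0)."""
    rows = [[Fraction(0 if v is BLANK else v) for v in row] for row in matrix]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def rank_password(matrix):
    """Hypothetical rank password: ranks of all peeled sub-matrices, concatenated."""
    return "".join(str(rank(s)) for s in peel(matrix))


def fill_blanks(matrix, seed=None, marker=1, high=255):
    """Superpose a random same-shape (homotype) matrix onto the marked blanks."""
    rng = random.Random(seed)
    homotype = [[rng.randint(0, high) for _ in row] for row in matrix]
    return [[marker + h if v is BLANK else v for v, h in zip(row, hrow)]
            for row, hrow in zip(matrix, homotype)]
```

For a 5 × 5 matrix, `peel` yields the inner 3 × 3 and 1 × 1 matrices, so the rank password has two digits.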
Preferably, for the feature space formed by a data set of m samples, the samples are described as having n cluster structures in the different views;
the membership of each sample is calculated using the following formula:
[formula image BDA0004081250950000061 not reproduced]
where
[formula image BDA0004081250950000062 not reproduced]
in which ω_j is the membership of the jth sample; n is the number of cluster structures; m is the number of samples; γ_ij is the clustering value of the jth sample with respect to the ith cluster structure; k is the fuzzy weighting smoothness index of the model; and d_ij is the distance difference between the jth sample and the ith cluster structure;
and if, according to this calculation, the membership of a sample is below a preset membership threshold, the sample is regarded as behaving inconsistently across views and is marked as an anomalous object.
Compared with the prior art, the invention has the following beneficial effects:
1. In the digital-twin-based multi-source data anomaly monitoring system provided by the invention, data are collected by four types of threads (acquisition, compression, reorganization and network transmission threads) and buffered in three types of initialized buffers (acquisition, compression and transmission buffers). The buffers share data among threads and use locking to guarantee mutually exclusive multi-thread access. The buffered data are compressed by as many compression threads as there are acquisition types; in each compression thread's main loop, acquired data are taken from the acquisition buffer and compressed. Compression effectively reduces data volume and storage space, improves transmission, storage and processing efficiency, and reduces the data damage rate.
2. In the system provided by the invention, detection attends not only to the nodes, edges and inter-node correlations of a traditional basic graph model but also strengthens the use of each sample's local neighborhood information: only the local neighborhood of each node is considered when constructing edges, an asymmetric weighted directed graph is built on the data set under examination, and a custom random walk makes the walker jump with high probability from nodes of normal samples to nodes of anomalous samples. In the proximity detection module, the data model of each sub-data set analyzes the parameters of the sample distribution; different types of samples show different patterns of score change under different proximity graphs, and the model can freely choose the required proximity measure when describing relations among samples in different types of data sets. Because different models can use different proximity measures to compute similarities or distances between samples, they adapt more flexibly to the different cluster structures of different types of data. The fusion detection module marks samples whose behavior is inconsistent across views as anomalous objects according to fuzzy clustering, so that detection adapts to both the correlations and the inconsistencies among data from different sources, which improves effectiveness. The mixed anomaly detection module applies affinity propagation clustering to the similarity matrices of the different views in the overall feature space to obtain a cluster representative point for each sample.
Each sample's deviation from its cluster center is defined as its attribute anomaly score, and the inconsistency of its behavior across views as its class anomaly score. As a result, the model performs better than algorithms that define the degree of anomaly by inconsistency alone, because the analysis of anomalous objects in multi-source data attends both to inconsistent behavior across views and to samples that deviate severely within the views.
3. In the system provided by the invention, after the first encryption matrix is established, the first position on each of its rows is adjusted and the first encryption is performed; a second position separated from the first by a random number is marked in each row, and interference passwords are entered into the second positions in turn for the second encryption, establishing the second encryption matrix. When the number of blank positions in the second encryption matrix exceeds the number of non-blank positions, the current degree of encryption is judged insufficient: the outermost ring of the second matrix is removed successively to generate sub-encryption matrices, the rank of each is calculated to generate a rank password, and the rank password is entered into the second encryption matrix for the third encryption. This layer-by-layer encryption gives normal data better security during final transmission, so that changes of position during transmission do not increase the leakage risk; the confidential information of the data is effectively preserved after encryption, and data security is better protected.
Drawings
FIG. 1 is a schematic overall flow chart of the present invention;
FIG. 2 is a schematic diagram of a twin data acquisition system module according to the present invention;
FIG. 3 is a schematic diagram of a system for extracting collected data according to the present invention;
FIG. 4 is a schematic diagram of an anomaly monitoring system according to the present invention;
FIG. 5 is a schematic diagram of a detection module according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
To solve the prior-art problem that twin data are acquired inaccurately because the acquisition mode is too limited, and that the acquired data are damaged because they are not processed after acquisition, this embodiment provides the following technical solution, referring to FIGs. 1 and 3:
The digital-twin-based multi-source data anomaly monitoring system comprises a twin data acquisition system, an acquisition data extraction system, an anomaly monitoring system and a data encryption storage system. The twin data acquisition system acquires different types of twin data using different acquisition modes and processes each type of acquired data; the acquisition data extraction system extracts, type by type, the data processed by the twin data acquisition system; the anomaly monitoring system separately examines each data set extracted by the acquisition data extraction system and stores anomalous and anomaly-free data separately; and the data encryption storage system receives the sub-data sets in which the anomaly monitoring system detected no anomaly and encrypts them.
The twin data acquisition system comprises: a data acquisition module, which collects data through four types of created threads, namely acquisition, compression, reorganization and network transmission threads; a data buffer module, which initializes several buffers for the data acquired by the acquisition threads of the data acquisition module, namely an acquisition buffer, a compression buffer and a transmission buffer, which share data among the threads and use locking to guarantee mutually exclusive multi-thread access; a data compression module, which reads multiple groups of data from the buffers of the data buffer module, assigns the buffered data to different sub-data sets, and compresses each assigned sub-data set in its compression area, where the compression area contains as many compression threads as there are acquisition types and each compression thread, in its main loop, takes acquired data from the acquisition buffer and compresses it; a compression reorganization module, which packs the compressed sub-data sets from the data compression module by type and compresses and reorganizes the packed sub-data sets; and a storage module, which stores the compressed and reorganized data sets.
The acquisition data extraction system comprises: a multi-data receiving module, which receives the compressed sub-data sets from the twin data acquisition system; and a classification module, which classifies the compressed sub-data sets received by the multi-data receiving module, each class of compressed sub-data set having its own independent processing module.
Specifically, data are acquired by the four types of threads (acquisition, compression, reorganization and network transmission threads) and buffered in the initialized acquisition, compression and transmission buffers, which share the data and use locking to guarantee mutually exclusive multi-thread access. The buffered data are compressed by as many compression threads as there are acquisition types, each of which takes acquired data from the acquisition buffer in its main loop and compresses it. The classification module then sorts the compressed data into classes, and each class of compressed sub-data set has its own independent processing module.
To solve the prior-art problem that anomalies detected in multi-source data are inaccurate because detection methods depend too heavily on parameters and data views deviate from the modeling samples, this embodiment provides the following technical solution, referring to FIGs. 4 and 5:
The anomaly monitoring system comprises: a classified data receiving module, which receives the sub-data sets by class; and a detection module, which detects the different types of anomaly separately in the different sub-data sets.
The detection module comprises the following modules. The local detection module treats each sample in a sub-data set as a node in a graph, considers only the local neighborhood of each node when constructing edges, and thereby builds an asymmetric weighted directed graph on the data set under examination; a custom random walk is then applied to the graph so that the walker jumps with high probability from nodes of normal samples to nodes of anomalous samples. Because the asymmetric relations in the local information graph may cause the random walk to converge abnormally, two types of restart vectors are provided, based on the principle that anomalous samples should be visited with higher weight, so that nodes of anomalous samples are selected with higher probability when the walker restarts. The proximity detection module uses the data model of each sub-data set to analyze the parameters of the sample distribution, observes that different types of samples show different patterns of score change under different proximity graphs, and constructs a parameter difference value from these patterns; when describing relations among samples in different types of data sets, the data model may freely choose the required proximity measure.
The fusion detection module fuses the multiple sub-data sets into an extended feature space, uses fuzzy clustering in that space to compute each sample's membership to the cluster structures implied in the data set, which describes its membership behavior toward each cluster structure in the different views, and marks samples whose behavior is inconsistent across views as anomalous objects. The mixed anomaly detection module uses the samples of each sub-data set themselves as the dictionary, learns the mutual representation relations among samples through low-rank representation, and builds a similarity matrix among the samples from these relations; it then applies affinity propagation clustering to the similarity matrices of the different views in the overall feature space to obtain each sample's cluster representative point. Each sample's deviation from its cluster center is its attribute anomaly score and the inconsistency of its behavior across views is its class anomaly score; together the two scores determine the sample's degree of anomaly.
Specifically, the local detection module focuses on the nodes, edges and node correlations of a conventional basic graph model of the data, and it strengthens the local neighborhood information of the samples. Only the local neighborhood of each node is considered when constructing edges, an asymmetric weighted directed graph is built on the data set to be detected, and a custom random walk process is applied so that the random walker jumps with high probability from nodes of normal samples to nodes of abnormal samples. The proximity detection module analyzes the distribution parameters of the data samples and finds that the scores obtained by the data models in the sub-data sets under different proximity graphs follow different patterns of change. Each data model can freely choose the proximity measure it needs when describing the relations among samples in different types of data sets. Different models can therefore compute similarity or distance relations among samples with different proximity measures. This makes the models more flexible when facing different types of nodes, and each model exhibits its own clustering behavior with respect to the cluster structures. The fusion detection module fuses the sub-data sets and performs clustering in the fused space. It marks samples whose behavior is inconsistent across views as abnormal objects according to the fuzzy clustering computation. Detection therefore adapts to both the correlations and the inconsistencies between data from different sources, which improves its effectiveness. The hybrid anomaly detection module applies affinity propagation clustering to the similarity matrices corresponding to the different views in the overall feature space to obtain a cluster representative point for each sample.
The deviation of a sample from its cluster center is defined as its attribute anomaly score, and the inconsistency of its behavior across views is defined as its category anomaly score. The model therefore performs better than an algorithm that uses inconsistency alone to define the degree of anomaly. When analyzing abnormal objects in multi-source data, it attends both to the inconsistency of sample behavior across views and to samples that deviate severely within individual views.
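Combining the attribute anomaly score with the category anomaly score can be sketched as follows, assuming the per-view exemplars have already been found. Affinity propagation itself is not implemented here; its output is taken as input, with one exemplar index per sample per view. The following choices are assumptions and do not come from the patent: Euclidean distance to the exemplar, Jaccard dissimilarity of cluster-mates between view pairs as the inconsistency measure, and the weight `beta`.

```python
import math
from itertools import combinations

def attribute_scores(views, exemplars):
    """Distance of each sample to its exemplar, averaged over the views."""
    m, v = len(views[0]), len(views)
    scores = [0.0] * m
    for X, ex in zip(views, exemplars):
        for j in range(m):
            scores[j] += math.dist(X[j], X[ex[j]]) / v
    return scores

def category_scores(exemplars):
    """Mean Jaccard dissimilarity of a sample's cluster-mates across pairs of views."""
    m = len(exemplars[0])
    pairs = list(combinations(range(len(exemplars)), 2))
    scores = [0.0] * m
    for a, b in pairs:
        for j in range(m):
            # Samples sharing j's exemplar form j's cluster in that view.
            mates_a = {i for i in range(m) if exemplars[a][i] == exemplars[a][j]}
            mates_b = {i for i in range(m) if exemplars[b][i] == exemplars[b][j]}
            scores[j] += (1 - len(mates_a & mates_b) / len(mates_a | mates_b)) / len(pairs)
    return scores

def hybrid_scores(views, exemplars, beta=0.5):
    """Weighted sum of the normalised attribute score and the category score."""
    att = attribute_scores(views, exemplars)
    top = max(att) or 1.0
    cat = category_scores(exemplars)
    return [beta * a / top + (1 - beta) * c for a, c in zip(att, cat)]
```

Comparing cluster-mate sets rather than raw labels means the cluster numbering in one view does not need to match the numbering in another. In the usage example, the sample that switches clusters between the two views gets the highest combined score.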
To address the problem in the prior art of how to store and manage classified data by category after it is received, so as to improve data security, this embodiment provides the following technical solution:
the classified data receiving module comprises: a data classification and screening sub-module, which screens the received sub-data sets by category and takes the classified data as classified data to be stored; a storage area sensing sub-module, which obtains the capacity coefficient of each storage area for classified data, where the capacity coefficient represents the used space of the storage area, and searches for an available target storage area; a control setting sub-module, which sets the copy number and the storage awareness policy used when storing the classified data, where the storage awareness policy includes determining the data nodes of the classified data storage area that will store the classified data; and an execution storage sub-module, which stores the classified data to be stored, with the set number of copies, in the target storage area and records the storage information of the operation, covering the data to be stored and its copies.
All the procedural data of the classification operation is available once a piece of data has been received, so classified receiving and storage are comprehensive. In use, data can be retrieved by category, which prevents interference between data of different categories, makes retrieval more accurate and simplifies operation. Setting a copy number for the classified data at storage time protects the classified data and prevents incompleteness caused by data loss, which enhances the reliability and practicality of the system.
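The storage area sensing and execution storage steps can be sketched as choosing the least-used areas for the requested number of copies. This is a hypothetical illustration. The names `StorageArea` and `store_classified`, the `max_coeff` cut-off, and the treatment of each area as a single data node are assumptions. The capacity coefficient is computed as used/total space, following the statement that it represents the used space of the area.

```python
from dataclasses import dataclass, field

@dataclass
class StorageArea:
    name: str
    capacity: int                      # total space in bytes (hypothetical unit)
    used: int = 0
    stored: dict = field(default_factory=dict)

    @property
    def capacity_coefficient(self) -> float:
        # Fraction of the area already used.
        return self.used / self.capacity

def store_classified(category, payload, areas, copies=2, max_coeff=0.9):
    """Store `copies` replicas of payload on distinct areas with the lowest capacity coefficient."""
    size = len(payload)
    # Storage area sensing: keep areas that are not nearly full and can fit the payload.
    candidates = [a for a in areas
                  if a.capacity_coefficient < max_coeff and a.used + size <= a.capacity]
    candidates.sort(key=lambda a: a.capacity_coefficient)
    if len(candidates) < copies:
        raise RuntimeError("not enough storage areas for the requested copy number")
    # Execution storage: write each replica and record where it went.
    record = {"category": category, "size": size, "copies": copies, "areas": []}
    for area in candidates[:copies]:
        area.stored.setdefault(category, []).append(payload)
        area.used += size
        record["areas"].append(area.name)
    return record
```

For example, with areas A (50% used), B (10% used) and C (95% used), two copies of a 30-byte record go to B and then A, and C is skipped as nearly full.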
In the prior art, after anomaly detection has been performed on multi-source data and no anomaly is found, the data is processed layer by layer during transmission. This processing reduces its security and can lead to leakage of normal data when it is finally transmitted. To solve this problem, and referring to Fig. 1, this embodiment provides the following technical solution:
the data encryption storage system encrypts anomaly-free data by the following steps: taking the anomaly-free data as original data, dividing it into segments to be encrypted of a preset length, and establishing an encryption sequence from these segments; establishing a first encryption matrix according to the number of segments, with the encryption sequence as its first column; adjusting the first position of each segment on its corresponding row of the first encryption matrix based on a preset interference factor, and performing a first encryption on the adjusted segment; obtaining the corresponding interference password from a preset password set based on the number of interferences, and rolling an encryption die to obtain a random number; marking, in each row of the first encryption matrix, a second position separated from the first position by the random number; inputting the interference passwords into the second positions in turn for a second encryption, thereby establishing a second encryption matrix; determining that the current degree of encryption is insufficient when the number of blank positions in the second encryption matrix (the first number) is greater than the number of non-blank positions (the second number); successively removing the outermost ring of positions of the second encryption matrix to generate a plurality of sub-encryption matrices; calculating the encryption rank of each sub-encryption matrix and generating a rank password from all the encryption ranks; inputting the rank password into the second encryption matrix for a third encryption to obtain a third encryption matrix; randomly generating a homotype matrix (a matrix of the same dimensions) based on the specification of the third encryption matrix; marking the remaining blank positions of the third encryption matrix and superposing the marked third encryption matrix with the homotype matrix to obtain the superposition numbers generated at the marked positions; inputting the superposition numbers into the remaining blank positions to obtain a full encryption matrix; and extracting the data at each position of the full encryption matrix to obtain the encrypted data.
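The matrix bookkeeping of the first steps above can be sketched in Python. The sketch builds the matrix, places each segment at its first position, places an interference password at a randomly offset second position, and applies the blank-count test. It only models the layout. The first and second encryption transforms are not specified in the text, so cells hold the raw segments and passwords. Several details are assumptions: the rule for the first position (`interference_factor * (row + 1) % width`), choosing passwords by row index, the matrix `width`, and using a seeded `random.Random` with offsets 1..width-1 as the "encryption die".

```python
import random

BLANK = None  # an empty matrix position

def build_second_matrix(data, seg_len, width, interference_factor, passwords, seed):
    """Place segments and interference passwords; report whether encryption is insufficient."""
    if width < 2:
        raise ValueError("width must be at least 2")
    # Encryption sequence: the data split into fixed-length segments.
    segments = [data[i:i + seg_len] for i in range(0, len(data), seg_len)]
    # First encryption matrix: one row per segment. The sequence would start in
    # column 0; here each segment is moved straight to its adjusted first position.
    matrix = [[BLANK] * width for _ in range(len(segments))]
    die = random.Random(seed)  # stands in for the "encryption die"
    for r, seg in enumerate(segments):
        # Assumed rule: the interference factor determines the first position.
        first = (interference_factor * (r + 1)) % width
        matrix[r][first] = seg  # the first-encryption transform itself is not shown
        # Second position: a random offset of 1..width-1, so it never hits the first.
        second = (first + die.randint(1, width - 1)) % width
        matrix[r][second] = passwords[r % len(passwords)]
    blanks = sum(cell is BLANK for row in matrix for cell in row)
    filled = len(matrix) * width - blanks
    # "Encryption degree insufficient" when blank positions outnumber non-blank ones.
    return matrix, blanks > filled
```

With 10 bytes split into 4-byte segments and a width of 8, there are 3 rows with 2 filled cells each. The 18 remaining blanks outnumber the 6 filled cells, so the function reports that the encryption degree is insufficient, which triggers the later steps.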
Specifically, after the first encryption matrix is built, the segment at the adjusted first position on each row undergoes the first encryption. A second position separated from the first position by the random number is marked in each row, and the interference passwords are input into the second positions in turn for the second encryption, establishing the second encryption matrix. If the number of blank positions in the second encryption matrix exceeds the number of non-blank positions, the current degree of encryption is insufficient. In that case, the outermost positions of the second encryption matrix are removed in turn to generate a plurality of sub-encryption matrices, and the encryption rank of each is calculated. A rank password is generated from these ranks and input into the second encryption matrix for the third encryption, yielding the third encryption matrix. This layer-by-layer encryption improves the security of normal data during final transmission and ensures that changes of position during transmission do not increase the risk of leakage. It also allows the data to be stored more safely after encryption and effectively protects the data.
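Peeling the matrix into sub-matrices and deriving a rank password can be sketched as follows. The sketch assumes the matrix cells have already been mapped to integers, a mapping the text does not specify. Other assumptions are that the sub-matrices are the successive inner matrices left after each ring is stripped, that rank is the ordinary linear-algebra rank computed exactly with `fractions.Fraction`, and that the ranks are joined with `-`.

```python
from fractions import Fraction

def rank(mat):
    """Rank of an integer matrix via Gaussian elimination over exact rationals."""
    rows = [[Fraction(x) for x in row] for row in mat]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate column c from every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def peel(mat):
    """Strip the outermost ring repeatedly; return every inner sub-matrix produced."""
    subs = []
    while len(mat) > 2 and len(mat[0]) > 2:
        mat = [row[1:-1] for row in mat[1:-1]]
        subs.append(mat)
    return subs

def rank_password(mat):
    """Join the ranks of the peeled sub-matrices into a 'rank password' string."""
    return "-".join(str(rank(s)) for s in peel(mat))
```

For the 5×5 matrix holding 1..25 row by row, peeling yields a 3×3 matrix of rank 2 and then a 1×1 matrix of rank 1, giving the password `"2-1"`. The separator keeps ranks of 10 or more unambiguous.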
The fuzzy clustering computes the membership degrees of the samples to the multiple cluster structures implied in the data set. For the feature space formed by a data set containing m samples, the samples are described as having n cluster structures in different views;
the membership of each sample is calculated using the following formula:
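[Formula: membership ω_j of the j-th sample; shown only as an image (BDA0004081250950000141) in the original publication]
wherein
[Formula: auxiliary definition; shown only as an image (BDA0004081250950000142) in the original publication]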
In the above, ω_j denotes the membership of the j-th sample; n denotes the number of cluster structures; m denotes the number of samples; γ_ij denotes the cluster number relating the j-th sample to the i-th cluster structure; k denotes the intra-model fuzzy weighting smoothness index; and d_ij denotes the distance difference between the j-th sample and the i-th cluster structure.
Based on this calculation, if the membership of a sample is smaller than a preset membership threshold, the sample is treated as one whose behavior is inconsistent across views and is marked as an abnormal object.
The formula thus computes each sample's membership degrees to the cluster structures implied in the data set and describes its membership behavior toward each cluster structure in different views. If the membership falls outside the preset threshold range, the sample's behavior is inconsistent, and samples with inconsistent behavior across views are marked as abnormal objects. The formula enables quantitative analysis of the samples and improves the accuracy of the analysis, so that abnormal samples can be identified accurately.
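The patent's formula survives only as an image, so the sketch below substitutes the standard fuzzy c-means membership as a stand-in. That membership is u_ij = 1 / Σ_l (d_ij / d_lj)^(2/(k−1)) with fuzzifier k > 1. It works on a single view, uses plain Euclidean distances to given cluster centres, and takes ω_j as the sample's strongest membership, max_i u_ij. Each of these choices is an assumption, as is the `threshold` value. The role of γ_ij is not modelled.

```python
import math

def fcm_memberships(samples, centers, k=2.0):
    """Standard fuzzy c-means membership u[i][j] of sample j in cluster i (fuzzifier k > 1)."""
    n, m = len(centers), len(samples)
    u = [[0.0] * m for _ in range(n)]
    for j, x in enumerate(samples):
        d = [math.dist(x, c) for c in centers]
        if 0.0 in d:
            # A sample exactly on a centre belongs fully to that cluster.
            u[d.index(0.0)][j] = 1.0
            continue
        for i in range(n):
            u[i][j] = 1.0 / sum((d[i] / d[l]) ** (2.0 / (k - 1.0)) for l in range(n))
    return u

def flag_inconsistent(samples, centers, threshold=0.6, k=2.0):
    """Return each sample's strongest membership and the indices below the threshold."""
    u = fcm_memberships(samples, centers, k)
    omega = [max(u[i][j] for i in range(len(centers))) for j in range(len(samples))]
    return omega, [j for j, w in enumerate(omega) if w < threshold]
```

With centres at (0, 0) and (10, 0), the samples (1, 0) and (9, 0) each get a strongest membership of 81/82. The midpoint (5, 0) splits 0.5/0.5 and is flagged as ambiguous.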
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. A digital twin-based multi-source data anomaly monitoring system, comprising a twin data acquisition system, an acquisition data extraction system, an anomaly monitoring system and a data encryption storage system;
the twin data acquisition system is used for acquiring different types of twin data in different acquisition modes and processing each type of acquired data;
the acquisition data extraction system is used for separately extracting the different types of data processed by the twin data acquisition system;
the anomaly monitoring system is used for separately detecting the different data sets extracted by the acquisition data extraction system, and separately storing data with anomalies and data without anomalies;
and the data encryption storage system is used for receiving the sub-data sets in which the anomaly monitoring system detected no anomaly, and encrypting those anomaly-free sub-data sets.
2. The digital twin-based multi-source data anomaly monitoring system of claim 1, wherein the twin data acquisition system comprises:
a data acquisition module for:
collecting data according to different types of creation modes, the creation modes comprising four types of threads, namely acquisition threads, compression threads, reorganization threads and network transmission threads;
a data buffer module for:
initializing buffers for the data acquired by the acquisition threads of the data acquisition module, the buffers being of three types, namely an acquisition buffer, a compression buffer and a transmission buffer, which are used for sharing data among multiple threads;
a data compression module for:
reading multiple groups of data from the buffers of the data buffer module, assigning the groups of buffered data to different sub-data sets, and then compressing the assigned sub-data sets in a compression area;
the compression area comprises as many compression threads as there are types of acquired data; in the main loop of each compression thread, acquired data is taken from the acquisition buffer and compressed to obtain compressed data;
a compression reorganization module for:
packing the compressed sub-data sets from the data compression module according to their types, and compressing and reorganizing the packed sub-data sets;
a classified storage module for:
storing the compressed and reorganized data sets.
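The thread and buffer arrangement of claim 2 can be sketched as a producer/consumer pipeline built from Python's `threading` and `queue` modules. This is a minimal sketch under several assumptions. Each data type gets one acquisition thread and one compression thread, `zlib` stands in for the unspecified compressor, and a sentinel object marks the end of each stream. The main thread performs the reorganization step by regrouping compressed records by type. The network transmission thread and the transmission buffer are omitted.

```python
import queue
import threading
import zlib

STOP = object()  # sentinel marking the end of one data stream

def acquisition_thread(source_type, readings, acq_buf):
    # Acquisition thread: push raw readings into the acquisition buffer.
    for r in readings:
        acq_buf.put((source_type, r))
    acq_buf.put(STOP)

def compression_thread(acq_buf, comp_buf):
    # Main loop: take from the acquisition buffer, compress, push to the compression buffer.
    while True:
        item = acq_buf.get()
        if item is STOP:
            comp_buf.put(STOP)
            return
        source_type, payload = item
        comp_buf.put((source_type, zlib.compress(payload)))

def run_pipeline(sources):
    """sources: {type_name: [bytes, ...]} -> {type_name: [compressed bytes, ...]}"""
    comp_buf = queue.Queue()  # shared compression buffer
    threads = []
    for source_type, readings in sources.items():
        acq_buf = queue.Queue()  # one acquisition buffer per data type (assumption)
        threads.append(threading.Thread(target=acquisition_thread,
                                        args=(source_type, readings, acq_buf)))
        threads.append(threading.Thread(target=compression_thread,
                                        args=(acq_buf, comp_buf)))
    for t in threads:
        t.start()
    # Reorganization: regroup compressed records by data type until every stream ends.
    packed, remaining = {}, len(sources)
    while remaining:
        item = comp_buf.get()
        if item is STOP:
            remaining -= 1
            continue
        source_type, blob = item
        packed.setdefault(source_type, []).append(blob)
    for t in threads:
        t.join()
    return packed
```

A single compression thread per type reads from a FIFO queue, so records of one type keep their acquisition order even though different types interleave in the shared compression buffer.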
3. The digital twin-based multi-source data anomaly monitoring system of claim 1, wherein the acquisition data extraction system comprises:
a multi-data receiving module for:
receiving the compressed sub-data sets from the twin data acquisition system;
a classification module for:
classifying the plurality of compressed sub-data sets received by the multi-data receiving module, each classified compressed sub-data set having an independent processing module.
4. The digital twin-based multi-source data anomaly monitoring system of claim 1, wherein the anomaly monitoring system comprises:
a classified data receiving module for:
receiving the sub-data sets according to their classifications;
a detection module for:
separately detecting different types of anomalies in the different sub-data sets.
5. The digital twin-based multi-source data anomaly monitoring system of claim 4, wherein the classified data receiving module comprises:
a data classification and screening sub-module for:
classifying the received sub-data sets to obtain classified data to be stored;
a storage area sensing sub-module for:
acquiring the capacity coefficients of the storage areas for the classified data, the capacity coefficient of a storage area representing its used space, and searching for an available target storage area;
a control setting sub-module for:
setting the copy number and the storage awareness policy for storing the classified data, the storage awareness policy including determining the data nodes of the classified data storage area that store the classified data;
an execution storage sub-module for:
storing the classified data to be stored, with the set number of copies, in the target storage area, and recording the storage information of the data to be stored and of its copies.
6. The digital twin-based multi-source data anomaly monitoring system of claim 4, wherein the detection module comprises:
a local detection module for:
treating each sample in the sub-data set as a node in a graph, considering only the local neighborhood of each node when constructing the edges between nodes, and constructing an asymmetric weighted directed graph on the data set to be detected;
applying a custom random walk process to the graph, so that the random walker jumps with high probability from nodes corresponding to normal samples to nodes corresponding to abnormal samples;
a proximity detection module for:
analyzing the distribution parameters of the data samples according to the data models in the sub-data sets;
wherein, when describing the relations among samples in different types of data sets, each data model freely selects the required proximity measure;
a fusion detection module for:
performing fusion detection on the plurality of sub-data sets by fusing them into an extended feature space;
calculating, by fuzzy clustering in this space, the membership degrees of each sample to the multiple cluster structures implied in the data set, and describing the sample's membership behavior toward each cluster structure in different views;
marking samples whose behavior is inconsistent across views as abnormal objects according to the fuzzy clustering calculation;
a hybrid anomaly detection module for:
learning, from the sub-data sets, the mutual representation relations among samples through low-rank representation with the samples as the dictionary, and constructing a similarity matrix among the samples from these relations;
obtaining a cluster representative point for each sample by applying affinity propagation clustering to the similarity matrices corresponding to different views in an overall feature space;
wherein the deviation of a sample from its cluster center is defined as its attribute anomaly score, the inconsistency of its behavior across views is defined as its category anomaly score, and the attribute anomaly score and the category anomaly score jointly determine the degree of anomaly of the sample.
7. The digital twin-based multi-source data anomaly monitoring system of claim 1, wherein the process by which the data encryption storage system encrypts anomaly-free data comprises:
taking the anomaly-free data as original data, dividing it into segments to be encrypted of a preset length, and establishing an encryption sequence from the segments;
establishing a first encryption matrix according to the number of segments, with the encryption sequence as the first column of the matrix;
adjusting the first position of each segment on its corresponding row of the first encryption matrix based on a preset interference factor, and performing a first encryption on the adjusted segment;
obtaining the corresponding interference password from a preset password set based on the number of interferences, and rolling an encryption die to obtain a random number;
marking, in each row of the first encryption matrix, a second position separated from the first position by the random number;
inputting the interference passwords into the second positions in turn for a second encryption, and establishing a second encryption matrix;
determining that the current degree of encryption is insufficient when the first number, being the number of blank positions in the second encryption matrix, is greater than the second number, being the number of non-blank positions;
successively removing the outermost ring of positions of the second encryption matrix to generate a plurality of sub-encryption matrices;
calculating the encryption rank of each sub-encryption matrix, and generating a rank password from all the encryption ranks;
inputting the rank password into the second encryption matrix for a third encryption to obtain a third encryption matrix;
randomly generating a homotype matrix based on the specification of the third encryption matrix;
marking the remaining blank positions of the third encryption matrix, and superposing the marked third encryption matrix with the homotype matrix to obtain the superposition numbers generated at the marked positions;
inputting the superposition numbers into the remaining blank positions to obtain a full encryption matrix;
and extracting the data at each position of the full encryption matrix to obtain the encrypted data.
8. The digital twin-based multi-source data anomaly monitoring system of claim 6, wherein, for the feature space formed by a data set containing m samples, the samples are described as having n cluster structures in different views;
the membership of each sample is calculated using the following formula:
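[Formula: membership ω_j of the j-th sample; shown only as an image (FDA0004081250940000051) in the original publication]
wherein
[Formula: auxiliary definition; shown only as an image (FDA0004081250940000052) in the original publication]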
in the above, ω_j denotes the membership of the j-th sample; n denotes the number of cluster structures; m denotes the number of samples; γ_ij denotes the cluster number relating the j-th sample to the i-th cluster structure; k denotes the intra-model fuzzy weighting smoothness index; and d_ij denotes the distance difference between the j-th sample and the i-th cluster structure;
and, according to the calculation, if the membership of a sample is smaller than a preset membership threshold, the sample is taken as a sample whose behavior is inconsistent across views and is marked as an abnormal object.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310124265.1A CN116049766A (en) 2023-02-16 2023-02-16 Multisource data anomaly monitoring system based on digital twin

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310124265.1A CN116049766A (en) 2023-02-16 2023-02-16 Multisource data anomaly monitoring system based on digital twin

Publications (1)

Publication Number Publication Date
CN116049766A true CN116049766A (en) 2023-05-02

Family

ID=86120060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310124265.1A Withdrawn CN116049766A (en) 2023-02-16 2023-02-16 Multisource data anomaly monitoring system based on digital twin

Country Status (1)

Country Link
CN (1) CN116049766A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117078048A (en) * 2023-10-17 2023-11-17 深圳市福山自动化科技有限公司 Digital twinning-based intelligent city resource management method and system
CN117078048B (en) * 2023-10-17 2024-01-26 深圳市福山自动化科技有限公司 Digital twinning-based intelligent city resource management method and system
CN117657186A (en) * 2023-12-08 2024-03-08 无锡梁溪智慧环境发展有限公司 Intelligent regulation and control method and system for vehicle-mounted transmitting end

Similar Documents

Publication Publication Date Title
Wu et al. Accurate Markov boundary discovery for causal feature selection
CN116049766A (en) Multisource data anomaly monitoring system based on digital twin
Rodriguez et al. Patent clustering and outlier ranking methodologies for attributed patent citation networks for technology opportunity discovery
WO2002073446A1 (en) Data mining application with improved data mining algorithm selection
CN111612041A (en) Abnormal user identification method and device, storage medium and electronic equipment
CN112528035A (en) Knowledge graph reasoning method and device based on relational attention and computer equipment
CN111292008A (en) Privacy protection data release risk assessment method based on knowledge graph
CN106060008A (en) Network invasion abnormity detection method
Wang et al. An improved data characterization method and its application in classification algorithm recommendation
CN111767192B (en) Business data detection method, device, equipment and medium based on artificial intelligence
Wang et al. Towards a hierarchical bayesian model of multi-view anomaly detection
Benkessirat et al. Fundamentals of feature selection: an overview and comparison
Loganathan et al. Development of machine learning based framework for classification and prediction of students in virtual classroom environment
CN113822336A (en) Cloud hard disk fault prediction method, device and system and readable storage medium
Jha et al. Criminal behaviour analysis and segmentation using k-means clustering
CN110135196B (en) Data fusion tamper-proof method based on input data compression representation correlation analysis
Liu et al. Correlation-based feature partition regression method for unsupervised anomaly detection
CN117009509A (en) Data security classification method, apparatus, device, storage medium and program product
Marian et al. Software defect detection using self-organizing maps
Muruzábal et al. On the visualization of outliers via self-organizing maps
Fornells et al. Unsupervised case memory organization: Analysing computational time and soft computing capabilities
Wu et al. Fragmentary multi-instance classification
Rahman Supervised machine learning algorithms for credit card fraudulent transaction detection: A comparative survey
Bond et al. An unsupervised machine learning approach for ground‐motion spectra clustering and selection
CN115831339B (en) Medical system risk management and control pre-prediction method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20230502