CN116341016B - Big data secure storage method and system - Google Patents

Info

Publication number
CN116341016B
Authority
CN
China
Prior art keywords
response
original data
trend
item
residual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310626386.6A
Other languages
Chinese (zh)
Other versions
CN116341016A (en
Inventor
荆海伟
李少敏
宋士彪
马海峰
鲁宽
李司慧
李福蕾
王静
董冉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Dalu Mechanism & Electron Co ltd
Original Assignee
Jinan Dalu Mechanism & Electron Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Dalu Mechanism & Electron Co ltd
Priority to CN202310626386.6A
Publication of CN116341016A
Application granted
Publication of CN116341016B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Optimization (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Algebra (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Storage Device Security (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of data processing, and in particular to a big data secure storage method and system. The method comprises: decomposing the original data sequence to obtain trend items and residual items; obtaining the degree of abnormality of each original data; obtaining a loss-benefit value of the original data according to the responding trend items and the degree of abnormality; obtaining a true loss-benefit value of each trend item according to the loss-benefit value of the original data and the responding trend items; obtaining a true loss-benefit value of each residual item according to the responding residual items; and obtaining a corrected value of the original data according to the true loss-benefit values of the trend items and residual items, then decomposing the corrected values and storing them securely. By correcting the original data, the method obtains stable trend item data and small residual item data, which reduces the amount of ciphertext data and improves ciphertext storage efficiency.

Description

Big data secure storage method and system
Technical Field
The invention relates to the technical field of data processing, in particular to a big data safe storage method and a big data safe storage system.
Background
With the development of computer technology, industrial production has gradually entered an age of automation and informatization, characterized by intelligent production and intelligent monitoring, in which the data generated are stored mainly by computer. Large-scale enterprise energy consumption monitoring generates a large volume of monitoring data, so a computer system is required for analysis and storage. In addition, the energy consumption data of a large enterprise generally reflect, to some extent, the capacity and production allocation of the enterprise and thus involve its private information, so the energy consumption data need to be encrypted during storage.
Secure data storage mainly relies on transforming the data according to the relationships among them, so that the original data information is hidden. Current monitoring data are time series data, so the transformation can be carried out directly with a conventional time series decomposition such as STL decomposition. However, the decomposition items obtained by conventional STL decomposition depend on the original distribution of the data, so they may include unstable trend items and oversized residual items; the decomposed data are then large in volume, which is unfavorable for storing the current large volume of energy consumption monitoring data. The invention therefore corrects the original data directly, which helps the decomposition yield decomposition item data that are convenient to store.
Disclosure of Invention
The invention provides a big data safe storage method and a big data safe storage system, which aim to solve the existing problems.
The big data secure storage method and system of the invention adopt the following technical scheme:
An embodiment of the invention provides a big data secure storage method, the method comprising the following steps:
monitoring the energy consumption of different production lines through energy consumption monitoring equipment to obtain an original data sequence;
decomposing the original data sequence to obtain all trend items and all residual items; obtaining the degree of abnormality of each original data according to the residual item corresponding to that original data, the trend item difference corresponding to that original data, and the trend item differences of its nearest neighboring trend items; determining, for each original data, the trend items and residual items that respond to its correction; obtaining a loss-benefit value of the original data according to the degree of abnormality of the original data corresponding to each responding trend item, the difference between the trend item difference after the response and the trend item difference before the response, and the difference between the residual item after the response and the residual item before the response; calculating the true loss-benefit value of each trend item in each response according to the loss-benefit value of the original data and the changes in the trend item difference in that response and in the adjacent responses; calculating the true loss-benefit value of each residual item according to the change in the residual item before and after the response and the changes in the residual item before and after the adjacent responses;
calculating the true loss-benefit value of correcting the original data according to the true loss-benefit values of all responding trend items and residual items corresponding to the original data;
calculating the corrected value of each original data according to the true loss-benefit value of the correction and the original data;
decomposing the corrected values of the original data to obtain a decomposed trend item sequence, a decomposed residual item sequence and a decomposed periodic item sequence; and taking the decomposed trend item sequence and the decomposed residual item sequence as ciphertext data, taking the decomposed periodic item sequence as a key, and storing them securely.
Preferably, the decomposing the original data sequence to obtain all trend items and all residual items includes the following specific steps:
using STL decomposition to obtain a trend item sequence and a residual item sequence for the original data sequence; each element in the trend item sequence is noted as a trend item; each element in the sequence of residual items is denoted as a residual item.
Preferably, the obtaining the abnormality degree of each original data according to the residual term corresponding to each original data in the original data sequence, the difference value of the trend term corresponding to each original data and the difference value of the nearest neighboring trend term around the trend term includes the following specific steps:
recording the ratio of the residual items corresponding to the original data in the original data sequence to the maximum value of the residual items as a first ratio;
acquiring the difference between the trend item difference corresponding to the original data and each trend item difference among the nearest neighboring trend items, and recording it as the first difference of that trend item difference; recording the average of the first differences over all trend item differences among the nearest neighboring trend items as a first average value, and recording the product of the first ratio and the first average value as the degree of abnormality of each original data.
Preferably, the loss-benefit value of the original data is obtained, according to the degree of abnormality of the original data corresponding to each responding trend item, the difference between the trend item difference after the response and before the response, and the difference between the residual item after the response and before the response, by the following specific formula:
the number of trend item responses corresponding to each original data is recorded as K, and the number of residual item responses corresponding to each responding trend item is recorded as N;
In the formula, $\Delta T'_u$ represents the difference of the u-th responding trend item after one correction; $\Delta T_u$ represents the trend item difference corresponding to $\Delta T'_u$ before the response; $P_u$ represents the degree of abnormality of the original data corresponding to the u-th responding trend item; $R'_{x,y}$ represents the y-th responding residual item in the x-th cycle; $R_{x,y}$ represents the residual item corresponding to $R'_{x,y}$ before the response; and $G$ represents the loss-benefit value of any one original data in the original data sequence.
Preferably, calculating the true loss-benefit value of each trend item in each response, according to the loss-benefit value of the original data and the changes in the trend item difference in that response and in the adjacent responses, uses the following specific formula:
In the formula, $G_c$ represents the loss-benefit value of the original data corresponding to the s-th trend item in the adjacent c-th response; $D_{s,j}$ represents the absolute value of the reduction in the difference of the s-th trend item in the j-th response; $D_{s,c}$ represents the absolute value of the change in the difference of the s-th trend item before and after the c-th response, among the responses adjacent to the j-th response; $Z_{s,j}$ represents the true loss-benefit value of the s-th trend item in the j-th response; and Q represents the number of adjacent responses.
Preferably, calculating the true loss-benefit value of each residual item, according to the change in the residual item before and after the response and the changes in the residual item before and after the adjacent responses, uses the following specific formula:
In the formula, $R'_{s,j}$ represents the s-th responding residual item after the j-th response; $R_{s,j}$ represents the residual item corresponding to $R'_{s,j}$ before the response; $R'_{s,c}$ represents the s-th residual item after the c-th response, among the responses adjacent to the j-th response; $G_c$ represents the loss-benefit value of the original data corresponding to the c-th response; $W_{s,j}$ represents the true loss-benefit value of the s-th residual item in the j-th response; and Q represents the number of adjacent responses.
Preferably, the true loss-benefit value of correcting the original data is the sum of the true loss-benefit values of the trend items and of the residual items over all responses corresponding to the original data.
Preferably, calculating the corrected value of each original data according to the true loss-benefit value of the correction and the original data includes the following specific steps:
calculating the final correction amount of each original data according to the true loss-benefit value of the correction and the residual item, and recording the sum of the final correction amount and the original data as the corrected value of each original data.
Preferably, calculating the final correction amount of each original data according to the true loss-benefit value of the correction and the residual item includes the following specific steps:
taking the negation (opposite number) of the residual item corresponding to the original data as the correction amount of the original data, mapping the true loss-benefit value of the correction to obtain a mapped true loss-benefit value, and taking the product of the mapped true loss-benefit value and the correction amount as the final correction amount of the original data.
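The correction step above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the true loss-benefit value is "mapped", so min-max normalisation to [0, 1] is assumed here, and the function name `corrected_values` is hypothetical.

```python
def corrected_values(original, residual, true_loss_benefit):
    """Corrected value = original + mapped loss-benefit * (-residual)."""
    lo, hi = min(true_loss_benefit), max(true_loss_benefit)
    span = hi - lo
    # Assumption: the unspecified "mapping" is min-max normalisation to [0, 1].
    mapped = [(g - lo) / span if span else 0.0 for g in true_loss_benefit]
    # The correction amount is the negation (opposite number) of the residual;
    # the final correction amount scales it by the mapped loss-benefit value.
    return [x + m * (-r) for x, m, r in zip(original, mapped, residual)]


# Three data points: the one with the largest loss-benefit value has its
# residual removed entirely, the one with the smallest is left unchanged.
print(corrected_values([10.0, 12.0, 11.0], [2.0, -1.0, 4.0], [0.0, 5.0, 10.0]))
```

With this mapping, a correction judged most beneficial is applied in full and one judged least beneficial is suppressed, which matches the stated goal of scaling the correction.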
Preferably, the system comprises the following modules:
the data acquisition module monitors the energy consumption of different production lines through the energy consumption monitoring equipment to obtain an original data sequence;
the data processing module is used for decomposing the original data sequence to obtain all trend items and all residual items; obtaining the degree of abnormality of each original data according to the residual item corresponding to each original data in the original data sequence; obtaining a loss-benefit value of the original data according to the degree of abnormality of the original data corresponding to each responding trend item, the difference between the trend item difference after the response and the trend item difference before the response, and the difference between the residual item after the response and the residual item before the response; calculating the true loss-benefit value of each trend item in each response according to the loss-benefit value of the original data and the changes in the trend item difference in that response and in the adjacent responses; calculating the true loss-benefit value of each residual item according to the change in the residual item before and after the response and the changes in the residual item before and after the adjacent responses; obtaining corrected values of the original data according to the true loss-benefit values of all responding trend items and residual items corresponding to the original data; and decomposing the corrected values of the original data to obtain a decomposed trend item sequence, a decomposed residual item sequence and a decomposed periodic item sequence;
the data security protection module takes the decomposed trend item sequence and the decomposed residual item sequence as ciphertext data, takes the decomposed periodic item sequence as a key, and stores them securely.
The technical scheme of the invention has the beneficial effects that:
(1) By correcting the original data, stable trend item data and small residual item data are obtained, namely the ciphertext data amount is reduced, and the ciphertext data storage efficiency is effectively improved.
(2) By analyzing how the correction of the original data changes the multiple corresponding decomposition items, the true loss-benefit value of the original data is obtained. This avoids the superposition of correction effects caused by correcting several original data at once; the correction is scaled accordingly, so that unnecessary corrections do not affect other corrections, which makes the correction more favorable for ciphertext data storage.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart showing the steps of a method for securely storing big data according to the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following detailed description refers to specific implementation, structure, characteristics and effects of a big data security storage method and system according to the invention with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the big data security storage method and system provided by the invention with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of steps of a method for securely storing big data according to an embodiment of the present invention is shown, the method includes the following steps:
Step S001: acquiring the original energy monitoring data of a large enterprise.
The energy consumption monitoring equipment is an integrated equipment which utilizes digital technology and informatization means to collect, analyze, process, display and store various physical quantity signal data in real time, and generally comprises a sensor, a digital acquisition card, a digital control device, a computer processing unit, display equipment, a storage unit, communication equipment, a human-computer interface and the like.
The energy monitoring equipment mainly has three functions: firstly, collecting physical quantity signals on line in real time; secondly, data processing; thirdly, remote data transmission. The real-time on-line acquisition of physical quantity signals comprises the real-time acquisition of various different electrical signals (such as voltage, current, power factor and the like) in an electrical loop; the data processing comprises the steps of processing the original data and reading the original data into meaningful information; remote data transmission involves the transmission of processed information to a central station for use by a host computer or other application.
Generally, the production of a large enterprise involves multiple production lines, and energy consumption must be monitored on all of them at the same time. The energy consumption of the different production lines is therefore monitored by the energy consumption monitoring equipment, and the monitoring data obtained form the original data sequence to be processed in the subsequent steps. It should be noted that the original data sequence is a time series in which power varies with time. Each data in the original data sequence is referred to as original data in this embodiment.
Step S002: performing STL decomposition on the original data to obtain a trend item sequence, a period item sequence and a residual item sequence.
It should be noted that, when STL-decomposed data are stored, the more stable the trend item sequence is, the easier it is to construct an expression for it, which facilitates storage of the decomposition items (trend items, residual items and period items); in addition, the smaller the residual item sequence is, the easier it is to store. In the invention, the correction amount of the original data is obtained from the loss-benefit value of the correction and the original data are corrected accordingly, which yields a trend item sequence and a residual item sequence that are convenient to store and that together form the stored ciphertext data.
The process of implementing data encryption in combination with STL decomposition is as follows:
Because a production line runs continuously in time, the energy consumption data obtained are time series data with an obvious distribution pattern, which is mainly reflected in the trend and the periodic distribution characteristics of the data. When studying the trend and periodic characteristics of time series data, STL decomposition is often used to obtain them. The specific decomposition process is as follows: apply STL decomposition to the original data sequence to obtain a trend item sequence, a period item sequence and a residual item sequence.
The trend item sequence, the period item sequence and the residual item sequence are time series of equal length. Each data in these sequences is denoted as a trend item, a period item and a residual item respectively, and each original data in the original data sequence corresponds one-to-one to a trend item in the trend item sequence, a period item in the period item sequence and a residual item in the residual item sequence.
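To make the one-to-one correspondence concrete, the sketch below decomposes a series into three equal-length sequences. It is not STL itself: it swaps in a classical additive decomposition (clipped moving-average trend plus per-phase means) so that the example stays dependency-free. The function name `decompose` and the sample series are illustrative assumptions.

```python
from statistics import mean


def decompose(series, period):
    """Classical additive decomposition, a simplified stand-in for STL.

    Returns (trend, seasonal, residual), each as long as `series`, such that
    series[i] == trend[i] + seasonal[i] + residual[i] up to rounding.
    """
    n = len(series)
    half = period // 2
    # Trend item: centred moving average, window clipped at the edges.
    trend = [mean(series[max(0, i - half):i + half + 1]) for i in range(n)]
    detrended = [x - t for x, t in zip(series, trend)]
    # Period item: mean detrended value at each phase of the cycle.
    phase_mean = [mean(detrended[p::period]) for p in range(period)]
    seasonal = [phase_mean[i % period] for i in range(n)]
    # Residual item: whatever trend and period items do not explain.
    residual = [x - t - s for x, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual


# Hypothetical power readings: linear growth plus a 4-step cycle.
series = [10 + 0.5 * i + [0, 3, -1, -2][i % 4] for i in range(24)]
trend, seasonal, residual = decompose(series, period=4)
print(len(trend), len(seasonal), len(residual))
```

For real STL, a library implementation such as `statsmodels.tsa.seasonal.STL` exposes the same three components.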
In energy consumption data analysis, the energy consumption trend and the consumption differences between time periods are the main concern, since they effectively reflect the capacity information of the factory. In the decomposition items obtained by STL decomposition, the overall data distribution is made up of the trend item sequence and the period item sequence. The trend item sequence alone cannot reflect the overall magnitude of the data, that is, it cannot represent the real energy consumption, while the period item sequence alone cannot reflect how the data change, that is, it cannot reflect the energy consumption trend. The residual item sequence mainly reflects how the original data deviate from the regular distribution; as an abrupt-change feature it has no meaning for energy consumption by itself, and in the decomposition relation it mainly reflects capacity changes caused by equipment or other factors in the production process.
The three decomposition sequences together make up the original data and each of them is indispensable, so when any one of them is missing the original data cannot be accurately recovered, that is, the energy consumption data cannot be obtained. This property is used to encrypt the original data: the trend item sequence and the residual item sequence are saved as ciphertext data and the period item sequence is used as the key. Only one cycle of data needs to be saved for the period item sequence, so the amount of key data is small.
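A minimal sketch of this split, assuming an additive decomposition in which the period item sequence repeats exactly every `period` samples. The names `split_for_storage` and `restore` are hypothetical, and the integer values are made up for the example.

```python
def split_for_storage(trend, seasonal, residual, period):
    """Keep trend and residual items as ciphertext, one seasonal cycle as key."""
    ciphertext = {"trend": list(trend), "residual": list(residual)}
    key = list(seasonal[:period])  # one cycle suffices; the rest repeats
    return ciphertext, key


def restore(ciphertext, key):
    """Rebuild the series by tiling the key over the ciphertext length."""
    period = len(key)
    pairs = zip(ciphertext["trend"], ciphertext["residual"])
    return [t + key[i % period] + r for i, (t, r) in enumerate(pairs)]


trend = [10, 11, 12, 13]
seasonal = [2, -2, 2, -2]
residual = [1, 0, -1, 0]
ciphertext, key = split_for_storage(trend, seasonal, residual, period=2)
print(key)                       # [2, -2]
print(restore(ciphertext, key))  # [13, 9, 13, 11]
```

Without the key, only `trend[i] + residual[i]` is available, which differs from the original series by the missing period item at every position.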
Step S003: screening the original data for abnormal data to obtain the data with a larger degree of decomposition abnormality.
The original data as a whole may contain data with large consecutive variations that severely disrupt the regularity of the data and degrade the decomposition. This can lead to an unstable trend item sequence and oversized residual items, which is unfavorable for storing the encrypted ciphertext data. The invention therefore screens out and corrects the data in the original data that break the regular distribution before storing the ciphertext data.
The data anomaly screening operation is as follows:
Data in the original data that break the data distribution pattern are called distribution-abnormal data; they manifest mainly as an unstable trend item sequence and oversized residual items.
The decomposition abnormality of each original data is expressed as:
$$P_i = \frac{R_i}{R_{\max}} \times \frac{1}{n}\sum_{v=1}^{n}\left|\Delta T_i - \Delta T_{i,v}\right|$$
where $R_i$ represents the residual item corresponding to the i-th original data in the original data sequence; $R_{\max}$ represents the maximum value in the residual item sequence; $R_i / R_{\max}$ indicates how close the residual is to the maximum, and the larger it is, the larger the residual and the greater the degree of abnormality of the corresponding original data; $\Delta T_i$ represents the trend item difference corresponding to the i-th original data; $\Delta T_{i,v}$ represents the v-th trend item difference among the nearest neighboring trend items of $\Delta T_i$, where the nearest neighboring trend items are all trend items of the trend item sequence within the time window of length n centered on the i-th original data; this embodiment takes n=9 as an example.
$\left|\Delta T_i - \Delta T_{i,v}\right|$ represents the difference between the two trend item differences, and $\frac{1}{n}\sum_{v=1}^{n}\left|\Delta T_i - \Delta T_{i,v}\right|$ represents the average difference between $\Delta T_i$ and the n nearest neighboring trend item differences. The larger this value, the less stable the trend item and the greater the degree of abnormality of the corresponding original data. $P_i$ is the degree of abnormality of the i-th original data.
The original data are then screened according to their degree of abnormality: the original data are sorted by degree of abnormality in descending order and the first 40% are selected as abnormal data. The correction of original data described below in this embodiment is performed only on the abnormal data; non-abnormal data are not corrected. For convenience, the abnormal data continue to be referred to as original data in the following description.
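The anomaly degree and the 40% screening can be sketched as below. Several details are assumptions rather than statements from the text: the "trend item difference" is taken to be the first difference between adjacent trend items, residual magnitudes are used in the ratio, and the window is clipped at the ends of the series. The names `anomaly_degrees` and `screen` are hypothetical.

```python
def anomaly_degrees(trend, residual, n=9):
    """First ratio (residual / max residual) times first average value."""
    size = len(trend)
    # Assumption: trend item difference = first difference of the trend.
    diff = [0.0] + [trend[i] - trend[i - 1] for i in range(1, size)]
    # Assumption: residual magnitudes are compared; guard against all zeros.
    r_max = max(abs(r) for r in residual) or 1.0
    half = n // 2
    degrees = []
    for i in range(size):
        window = diff[max(0, i - half):i + half + 1]  # clipped at the edges
        first_average = sum(abs(diff[i] - d) for d in window) / len(window)
        first_ratio = abs(residual[i]) / r_max
        degrees.append(first_ratio * first_average)
    return degrees


def screen(degrees, fraction=0.4):
    """Indices of the `fraction` of data with the largest anomaly degree."""
    count = int(len(degrees) * fraction)
    order = sorted(range(len(degrees)), key=lambda i: degrees[i], reverse=True)
    return sorted(order[:count])


trend = [1.0, 2.0, 3.0, 4.0, 10.0, 5.0, 6.0, 7.0, 8.0, 9.0]
residual = [0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0]
degrees = anomaly_degrees(trend, residual)
print(screen(degrees))
```

In this toy series only index 4 has both a large residual and an unstable trend, so it receives the only non-zero anomaly degree.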
Step S004: calculating the number of data correction responses in each decomposition item sequence after STL decomposition.
When the original data are corrected, the main purpose is to obtain, from STL decomposition, a trend item sequence that is as stable as possible and ciphertext data that are as small as possible. Because different data in the original data affect the decomposition differently, the invention determines the optimal correction for each original data according to the effect of that single data on the decomposition process.
It is known that in STL decomposition, a general trend term sequence is obtained by a moving average method, where one original data affects a plurality of surrounding data trend terms, and the specific impact data amount is determined by the number K of steps in the moving average method, that is, a single original data correction causes K trend term responses.
Then, when the period items are calculated, the K changed trend items enter the calculation of every cycle, so there are $K \times N$ period item responses, where N represents the number of cycles obtained.
Finally, since the residual items are directly tied to the period items obtained, the number of residual item responses is also $K \times N$.
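The response count can be checked numerically. The sketch below follows the text's moving-average view of the trend (it is not a full STL implementation) and counts how many trend items change when one data point is perturbed. The function names and the sample series are illustrative assumptions.

```python
def moving_average(series, k):
    """Centred moving average with odd window k, clipped at the edges."""
    half = k // 2
    out = []
    for i in range(len(series)):
        window = series[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out


def responding_trend_items(series, index, delta, k):
    """Indices of trend items that change when series[index] shifts by delta."""
    before = moving_average(series, k)
    changed = list(series)
    changed[index] += delta
    after = moving_average(changed, k)
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


series = [float(i % 7) for i in range(30)]
responses = responding_trend_items(series, index=15, delta=1.0, k=5)
print(responses)       # [13, 14, 15, 16, 17]
print(len(responses))  # 5, i.e. K
```

With K responding trend items and N cycles, the period item and residual item response counts described above would each be `len(responses) * N`.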
Step S005: a correction response loss benefit value is calculated for the STL decomposition data.
Because the decomposition involves relations among many data, changing a single datum changes several values in the different decomposed sequences, that is, it produces several data responses. The effect of an original data correction must therefore be judged from these responses so that the correction can be adjusted.
First, a preliminary correction is derived from the residual item: the correction of the i-th original datum is $-r_i$, where $r_i$ denotes the i-th residual item. Correcting one original datum produces several responses in the decomposed data, that is, several data change. The more stable the trend item sequence is after the change, and the smaller the residuals are after the change, the more the correction benefits data storage and the better its effect.
When any one original datum in the original data sequence is changed, there are K trend item responses and K×N residual item responses, and the loss benefit value of the correction is judged from them. The quantities involved are as follows.
$\Delta t'_u$ denotes the difference of the u-th responding trend item after one correction and reflects the stability of the trend item; $\Delta t_u$ denotes the same trend item difference before the response. $\Delta t_u - \Delta t'_u$ is the reduction of the trend item difference brought by the preliminary correction; the larger it is, the more effective the response to the current correction. $Y_u$ denotes the degree of abnormality of the original datum corresponding to $\Delta t'_u$; the larger $Y_u$, the more a reduction of that trend item difference contributes to the smoothness of the trend item sequence. $\sum_{u=1}^{K} Y_u (\Delta t_u - \Delta t'_u)$ is the effective reduction of the K trend item differences; the larger it is, the more stable the trend item sequence becomes and the larger the loss benefit value of the current correction. $r'_{x,y}$ denotes the y-th responding residual item in the x-th period and $r_{x,y}$ the corresponding residual item before the response. $r_{x,y} - r'_{x,y}$ is the reduction of the residual item during the response; the larger it is, the smaller the corrected residual item and the easier the residual is to store. $\sum_{y=1}^{K} (r_{x,y} - r'_{x,y})$ is the reduction of the K responding residual items in the x-th period, and $\sum_{x=1}^{N} \sum_{y=1}^{K} (r_{x,y} - r'_{x,y})$ is the reduction of all responding residual items over the N periods; the larger it is, the larger the loss benefit value of the current correction. $F$ denotes the loss benefit value of correcting any one original datum in the original data sequence.
Step S006: the true loss of the STL decomposition data is calculated.
In practice, many original data need to be corrected. Because a single correction causes responses in several decomposed data, responses may overlap: the same decomposed datum responds to the corrections of different original data, so the loss benefit values of different corrections overlap.
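Response overlap can be made concrete with a short sketch. It assumes that each correction reaches the trend items inside a centred window of K around the corrected datum, as in a moving average; the function name is hypothetical.

```python
def overlapping_responses(corrected_indices, k, length):
    """Map each trend index to the corrected data indices that reach it, keeping overlaps only."""
    half = k // 2
    hits = {}
    for idx in corrected_indices:
        # Trend items inside the window around the corrected datum respond to it.
        for t in range(max(0, idx - half), min(length, idx + half + 1)):
            hits.setdefault(t, []).append(idx)
    # A trend item has overlapping responses when more than one correction reaches it.
    return {t: sources for t, sources in hits.items() if len(sources) > 1}
```

For example, with K = 5, corrections of the data at positions 5 and 7 both reach the trend items at positions 5, 6 and 7, so those three items respond twice.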
Can the loss benefit values of different corrections simply be superposed? Evidently not: the degree of change of a datum may differ between responses, and so may the direction of change. When the change of one decomposed datum is repeated across several responses, some of those responses are unnecessary and the corresponding loss benefit values are not real. When the same decomposed datum changes in opposite directions across responses, its loss benefit values in those responses conflict and are likewise not real. Therefore, for corrections with overlapping responses, the true loss benefit must be judged from the degree and direction of the data change. It is expressed through the relation between the changes of each decomposed datum in different responses; for a single STL decomposed datum it is exhibited mainly by the trend item and the residual item, as follows:
(1) True loss benefit exhibited by trend items:
In a trend item response, the main effect is a reduction of the difference between consecutive trend items; the larger the reduction, the larger the loss benefit value of the trend item. Across multiple responses, the reduction in one response may repeat the reduction in other responses.
First, all original data in the original data sequence are corrected in turn, so each trend item responds multiple times, that is, each trend item has overlapping responses. Taking the s-th trend item as an example:
where $d_{s,j}$ denotes the absolute value of the reduction of the difference of the s-th trend item in the j-th response, that is, the absolute value of the difference between the trend difference of the s-th trend item before and after the j-th response;
the Q responses nearest to the j-th response are taken as the adjacent responses of the j-th response; $d_{s,c}$ denotes the absolute value of the difference between the trend difference of the s-th trend item before and after the c-th adjacent response of the j-th response. This embodiment takes Q = 6 as an example.
$F_c$ denotes the loss benefit value of the original datum whose correction makes the s-th trend item generate the c-th adjacent response; the larger $F_c$, the more $d_{s,c}$ is trusted. The term comparing $d_{s,j}$ with $d_{s,c}$ expresses the extent to which $d_{s,j}$ is contained in $d_{s,c}$; the larger its value, the lower the degree of containment. Taken over the Q adjacent responses, these terms express the relation between the difference reduction of the s-th trend item in the j-th response and its reductions in the other responses, and so reflect the true loss benefit of the s-th trend item in the j-th response, denoted $T_{s,j}$.
The average of the true loss benefit of the s-th trend item over all responses is recorded as the average true loss benefit of the s-th trend item.
(2) True loss benefit exhibited by residual items:
The loss benefit of residual data is expressed mainly by the direction and degree of change of the residual; the greater the reduction of the residual, the higher the true loss benefit value. The true loss benefit across multiple responses depends mainly on the residual reduction in the different responses. The changes of one residual in multiple responses may overlap, which affects the loss benefit of the residual response. The true loss benefit of each residual item in the different responses is:
where $r'_{s,j}$ denotes the s-th responding residual after the j-th response and $r_{s,j}$ the corresponding residual item before that response; $r_{s,j} - r'_{s,j}$ is the residual reduction, and the larger it is, the larger the true loss benefit of the s-th responding residual after the j-th response. $r'_{s,c}$ denotes the s-th residual item after the c-th adjacent response of the j-th response, and the corresponding reduction is the reduction of the s-th residual in that adjacent response. The term comparing the two reductions expresses the extent to which the reduction in the j-th response is contained in the reduction in the c-th adjacent response; the larger its value, the lower the degree of containment. $F_c$ denotes the loss benefit value of the original datum corresponding to the c-th response; the larger $F_c$, the more the reduction in the c-th adjacent response is trusted. Taken over the Q adjacent corrections, these terms express the relative containment of the reduction of the s-th residual in the j-th correction with respect to its reductions in the other Q corrections; the larger the value, the lower the degree of containment and the larger the loss benefit of the s-th residual in the j-th response. $R_{s,j}$ denotes the true loss benefit of the s-th residual in the j-th response.
The average of the true loss benefit of the s-th residual item over all responses is recorded as the average true loss benefit of the s-th residual item.
(3) True loss benefit of each response.
The analysis above gives the true loss benefit of each decomposition item in each correction of the original data. Since one response of an original datum always changes several decomposition items, and what is finally needed is a loss benefit value per original datum, the true loss benefit value of each correction must be determined from the true loss benefit of all the decomposition items involved, so that the correction of the original data can then be determined from it.
The true loss benefit value of the response of each original datum is determined jointly by the true loss benefit of the decomposition items in that response.
When the i-th original datum is corrected, let $\bar{T}_{i,s}$ be the average true loss benefit of the s-th of its K responding trend items and $\bar{R}_{i,s}$ the average true loss benefit of the s-th of its N×K responding residual items. The corrected true loss benefit value $PF_i$ of the i-th original datum is then:
$$PF_i = \sum_{s=1}^{K} \bar{T}_{i,s} + \sum_{s=1}^{N \times K} \bar{R}_{i,s}$$
where $\bar{T}_{i,s}$ denotes the average true loss benefit of the s-th trend item responding to the i-th original datum, $\sum_{s=1}^{K} \bar{T}_{i,s}$ the sum of the average true loss benefit of all trend items corresponding to the i-th original datum, $\bar{R}_{i,s}$ the average true loss benefit of the s-th residual item in the correction of the i-th original datum, and $\sum_{s=1}^{N \times K} \bar{R}_{i,s}$ the sum of the true loss benefit of all responding residual items over the N periods. $PF_i$ is thus the sum of the true loss benefit of the trend items and residual items of all responses to the correction of the i-th original datum, that is, the true loss benefit value of that correction.
Step S007: calculate the final correction of the original data.
The steps above give the loss benefit value of each original data correction, which reflects how much the current correction benefits the storage of the trend items and residual items; the larger the value, the greater the benefit. In practice, the preliminary correction is therefore scaled according to its loss benefit value to obtain a correction more favourable to data storage. The final correction is:
$$\Delta x_i = f(PF_i) \cdot (-r_i)$$
where $r_i$ denotes the residual item corresponding to the i-th original datum, $-r_i$ is the original data correction directly reflected by the residual item, $f(PF_i)$ denotes the mapping of $PF_i$ into the interval (0, 2), for which an existing function is adopted directly, and $\Delta x_i$ denotes the final correction of the i-th original datum. Scaling $-r_i$ by the mapped loss benefit value effectively improves the storage efficiency of the trend items and residual items of the original data, which facilitates the storage of the ciphertext data obtained by encrypting the original data.
It should be noted that the correction operations described in steps S004 to S007 apply only to the abnormal data. The final correction of non-abnormal data is set directly to 0, that is, non-abnormal data are not corrected in this embodiment.
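The scaling step can be sketched as follows. The text only states that an existing function maps the true loss benefit value into (0, 2); the scaled logistic function used here is an assumption chosen for illustration, and all names are hypothetical.

```python
import math


def map_to_0_2(pf):
    """Assumed mapping into (0, 2): a logistic function scaled by 2."""
    pf = max(-60.0, min(60.0, pf))  # clamp so math.exp cannot overflow
    return 2.0 / (1.0 + math.exp(-pf))


def final_corrections(resid, pf, abnormal):
    """Final correction per datum: mapped PF times the opposite of the residual item.

    Non-abnormal data receive a final correction of 0.
    """
    abnormal = set(abnormal)
    return [
        map_to_0_2(pf[i]) * -resid[i] if i in abnormal else 0.0
        for i in range(len(resid))
    ]
```

With this mapping a true loss benefit value of 0 leaves the preliminary correction unchanged, positive values enlarge it toward twice its size, and negative values shrink it toward 0.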
Step S008: encrypt and securely store the original data using the original data and the final corrections.
After the final correction of the original data is determined by the steps above, the original data is corrected directly to obtain the corrected original data:
$$x'_i = x_i + \Delta x_i$$
where $x_i$ denotes the i-th original datum, $\Delta x_i$ the final correction corresponding to the i-th original datum, and $x'_i$ the corrected value of the i-th original datum.
In this way all original data to be corrected are corrected. STL decomposition is then performed on the corrected original data sequence to obtain a corrected trend item sequence, a corrected residual item sequence and a corrected periodic item sequence. The corrected trend item sequence and the corrected residual item sequence are stored as ciphertext; the corrected periodic item sequence and all final corrections are used as the key, which is stored in an offline device, for example a removable USB flash drive, so that the key cannot be stolen over the network and the security of data storage is ensured.
When decryption is needed, the stored trend item sequence, residual item sequence and periodic item sequence are recombined into the corrected original data sequence, and the original data sequence, that is, the plaintext data, is then obtained as the difference between the corrected original data sequence and the stored final corrections.
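The storage and recovery round trip can be sketched as below. To stay self-contained, the sketch replaces STL with a simple additive decomposition (moving-average trend, per-phase mean as periodic item, remainder as residual); this stand-in, the window and period values, and all names are assumptions and not the decomposition used in the embodiment.

```python
def decompose(values, period, k):
    """Additive stand-in for STL: returns (trend, periodic, resid) that sum to values."""
    half = k // 2
    trend = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        trend.append(sum(window) / len(window))
    detrended = [v - t for v, t in zip(values, trend)]
    # Periodic item: mean of the detrended values at each phase of the period.
    phase_mean = [
        sum(detrended[p::period]) / len(detrended[p::period]) for p in range(period)
    ]
    periodic = [phase_mean[i % period] for i in range(len(values))]
    resid = [d - s for d, s in zip(detrended, periodic)]
    return trend, periodic, resid


def encrypt(values, corrections, period=4, k=5):
    """Correct the data, decompose it, and split the result into ciphertext and key."""
    corrected = [v + c for v, c in zip(values, corrections)]
    trend, periodic, resid = decompose(corrected, period, k)
    ciphertext = {"trend": trend, "resid": resid}
    key = {"periodic": periodic, "corrections": list(corrections)}  # kept offline
    return ciphertext, key


def decrypt(ciphertext, key):
    """Recombine the three sequences, then subtract the stored final corrections."""
    corrected = [
        t + s + r
        for t, s, r in zip(ciphertext["trend"], key["periodic"], ciphertext["resid"])
    ]
    return [c - d for c, d in zip(corrected, key["corrections"])]
```

Without the periodic item sequence and the final corrections, the stored trend and residual sequences alone do not reproduce the plaintext, which is the property the offline key relies on.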
Finally, the present embodiment provides a big data secure storage system, which includes the following modules:
the data acquisition module monitors the energy consumption of different production lines through the energy consumption monitoring equipment to obtain an original data sequence;
the data processing module is used for decomposing the original data sequence to obtain all trend items and all residual items; obtaining the degree of abnormality of each original data according to the residual error item corresponding to each original data in the original data sequence; obtaining a loss benefit value of the original data according to the abnormality degree of the original data corresponding to the trend item generating the response, the difference between the trend item difference value after generating the response and the trend item difference value before generating the response, and the difference between the residual item after generating the response and the residual item before generating the response; calculating the true damage value of each trend item in each response according to the damage value of the original data and the difference between the trend item differences of each response and the adjacent responses; calculating the true damage value of the residual value according to the residual item difference value before and after the response and the adjacent residual item difference value before and after the response; obtaining corrected values of the original data according to the true loss and benefit values of all response trend items and residual items corresponding to the original data; decomposing the corrected value of the original data, and obtaining a decomposed trend item sequence, a decomposed residual item sequence and a decomposed periodic item sequence;
and the data security protection module takes the decomposed trend item sequence and the decomposed residual item sequence as ciphertext data, takes the decomposed periodic item sequence as a secret key and performs security storage.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (5)

1. A method for securely storing big data, the method comprising the steps of:
monitoring the energy consumption of different production lines through energy consumption monitoring equipment to obtain an original data sequence; arranging according to the degree of abnormality from large to small, selecting data with a preset proportion arranged at the front as abnormal data, and setting the final correction of non-abnormal data to 0 directly for the subsequent correction operation only aiming at the abnormal data, namely, not performing the correction operation on the non-abnormal data;
decomposing the original data sequence to obtain all trend items and all residual items; obtaining the degree of abnormality of each original data according to the residual error item corresponding to each original data in the original data sequence, the trend item difference value corresponding to each original data and the nearest adjacent trend item difference value around the trend item; determining to generate a corresponding trend term and residual term for each original data; obtaining a loss benefit value of the original data according to the abnormality degree of the original data corresponding to the trend item generating the response, the difference between the trend item difference value after generating the response and the trend item difference value before generating the response, and the difference between the residual item after generating the response and the residual item before generating the response; calculating the true damage value of each trend item in each response according to the damage value of the original data and the difference between the trend item differences of each response and the adjacent responses; calculating the true damage value of the residual value according to the residual item difference value before and after the response and the adjacent residual item difference value before and after the response;
calculating the real damage and benefit value after the original data correction according to the real damage and benefit values of all response trend items and residual items corresponding to the original data;
calculating the corrected value of each original data according to the corrected real damage value and the original data;
decomposing the corrected value of the original data, and obtaining a decomposed trend item sequence, a decomposed residual item sequence and a decomposed periodic item sequence; the decomposed trend item sequence and the decomposed residual item sequence are used as ciphertext data, and the decomposed periodic item sequence is used as a secret key and is stored safely;
wherein, the response is: the change of single data is that a plurality of data changes exist in different decomposition data, namely the response corresponding to a plurality of data; the adjacent secondary responses are: a response that is chronologically adjacent to the current response;
the specific formulas are as follows:
the response number of the trend item corresponding to each original data is recorded as K, and the response number of the residual item corresponding to the trend item when responding is recorded as N;
wherein the method comprises the steps ofRepresenting the difference of the responsive u-th trend term after one correction; />Representation->Trend term differences prior to response; />Representation->The degree of abnormality of the corresponding raw data; />Representing the corresponding y-th responsive residual term in the x-th cycle,/->Representation->Residual items corresponding before the response, +.>A loss value representing any one of the original data in the original data sequence;
the real loss benefit value of each trend item in each response is calculated according to the loss benefit value of the original data, the difference between the trend item difference value of each response and the adjacent trend item difference value of each response, and the specific formula is as follows:
loss value representing the original data corresponding to the acquisition of the s-th trend term in the adjacent c-th response, is->Absolute value representing the corresponding decrease in the difference of the s-th trend term in the j-th response,/->When the s trend item is in the c response in the adjacent response of the j response, the difference value of the front trend difference value and the rear trend difference value is taken as an absolute value, and the difference value is +.>Representing the actual profit and loss of the s-th trend item in the j-th response; q represents the number of adjacent responses;
the method comprises the following specific steps of:
wherein the method comprises the steps ofRepresents the s response residual error after the j response, and +.>Representation->Responsive to the previously corresponding residual term, +.>Representing the s-th residual term after the c-th response in the neighbor response of the j-th response,/th residual term after the c-th response>A loss value representing the original data corresponding to the c-th response,/th response>The true profit and loss of the jth response of the s-th residual error is represented; q represents the number of adjacent responses;
the real damage value after the original data is corrected refers to the sum of the real damage values of all response trend items corresponding to the original data and the real damage values of residual items;
the method comprises the following specific steps of: and calculating the final correction of each original data according to the corrected real damage value and the residual error item, and recording the sum of the final correction and the original data as the corrected value of each original data.
2. The method for securely storing big data according to claim 1, wherein the decomposing of the original data sequence to obtain all trend items and all residual items comprises the following specific steps:
using STL decomposition to obtain a trend item sequence and a residual item sequence for the original data sequence; each element in the trend item sequence is noted as a trend item; each element in the sequence of residual items is denoted as a residual item.
3. The method for securely storing big data according to claim 1, wherein the obtaining the degree of abnormality of each original data according to the residual term corresponding to each original data in the original data sequence, the difference value of the trend term corresponding to each original data, and the difference value of the nearest neighboring trend term around the trend term comprises the following specific steps:
recording the ratio of the residual items corresponding to the original data in the original data sequence to the maximum value of the residual items as a first ratio;
acquiring differences between trend item differences corresponding to the original data and each trend item difference in the nearest neighboring trend items around, and recording the differences as first differences of each trend item difference in the nearest neighboring trend items around; and (3) marking the average value of the first differences of the differences of all the trend items in the nearest neighboring trend items around as a first average value, and marking the product of the first ratio and the first average value as the degree of abnormality of each piece of original data.
4. The method for securely storing big data according to claim 1, wherein the calculating the final correction of each original data according to the corrected real damage value and residual term comprises the following specific steps:
and taking the opposite number of the residual error item corresponding to the original data as the correction number of the original data, mapping the corrected real damage value to obtain a mapped real damage value, and taking the product of the mapped real damage value and the correction number of the original data as the final correction number of the original data.
5. A big data secure storage system, the system comprising the following modules:
the data acquisition module monitors the energy consumption of different production lines through the energy consumption monitoring equipment to obtain an original data sequence; arranging according to the degree of abnormality from large to small, selecting data with a preset proportion arranged at the front as abnormal data, and setting the final correction of non-abnormal data to 0 directly for the subsequent correction operation only aiming at the abnormal data, namely, not performing the correction operation on the non-abnormal data;
the data processing module is used for decomposing the original data sequence to obtain all trend items and all residual items; obtaining the degree of abnormality of each original data according to the residual error item corresponding to each original data in the original data sequence; obtaining a loss benefit value of the original data according to the abnormality degree of the original data corresponding to the trend item generating the response, the difference between the trend item difference value after generating the response and the trend item difference value before generating the response, and the difference between the residual item after generating the response and the residual item before generating the response; calculating the true damage value of each trend item in each response according to the damage value of the original data and the difference between the trend item differences of each response and the adjacent responses; calculating the true damage value of the residual value according to the residual item difference value before and after the response and the adjacent residual item difference value before and after the response; calculating the real damage and benefit value after the original data correction according to the real damage and benefit values of all response trend items and residual items corresponding to the original data; calculating the corrected value of each original data according to the corrected real damage value and the original data; decomposing the corrected value of the original data, and obtaining a decomposed trend item sequence, a decomposed residual item sequence and a decomposed periodic item sequence;
wherein, the response is: the change of single data is that a plurality of data changes exist in different decomposition data, namely the response corresponding to a plurality of data; the adjacent secondary responses are: a response that is chronologically adjacent to the current response;
the specific formulas are as follows:
the response number of the trend item corresponding to each original data is recorded as K, and the response number of the residual item corresponding to the trend item when responding is recorded as N;
wherein the method comprises the steps ofRepresenting the difference of the responsive u-th trend term after one correction; />Representation->Trend term differences prior to response; />Representation->The degree of abnormality of the corresponding raw data; />Representing the corresponding y-th responsive residual term in the x-th cycle,/->Representation->Residual items corresponding before the response, +.>A loss value representing any one of the original data in the original data sequence;
the real loss benefit value of each trend item in each response is calculated according to the loss benefit value of the original data, the difference between the trend item difference value of each response and the adjacent trend item difference value of each response, and the specific formula is as follows:
loss value representing the original data corresponding to the acquisition of the s-th trend term in the adjacent c-th response, is->Absolute value representing the corresponding decrease in the difference of the s-th trend term in the j-th response,/->When the s trend item is in the c response in the adjacent response of the j response, the difference value of the front trend difference value and the rear trend difference value is taken as an absolute value, and the difference value is +.>Representing the actual profit and loss of the s-th trend item in the j-th response; q represents the number of adjacent responses
The method comprises the following specific steps of:
wherein the method comprises the steps ofRepresents the s response residual error after the j response, and +.>Representation->Responsive to the previously corresponding residual term, +.>Representing the s-th residual term after the c-th response in the neighbor response of the j-th response,/th residual term after the c-th response>A loss value representing the original data corresponding to the c-th response,/th response>The true profit and loss of the jth response of the s-th residual error is represented; q represents the number of adjacent responses;
the real damage value after the original data is corrected refers to the sum of the real damage values of all response trend items corresponding to the original data and the real damage values of residual items;
the method comprises the following specific steps of: calculating the final correction of each original data according to the corrected real damage value and residual error item, and recording the sum of the final correction and the original data as the corrected value of each original data;
and the data security protection module takes the decomposed trend item sequence and the decomposed residual item sequence as ciphertext data, takes the decomposed periodic item sequence as a secret key and performs security storage.
CN202310626386.6A 2023-05-31 2023-05-31 Big data secure storage method and system Active CN116341016B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310626386.6A CN116341016B (en) 2023-05-31 2023-05-31 Big data secure storage method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310626386.6A CN116341016B (en) 2023-05-31 2023-05-31 Big data secure storage method and system

Publications (2)

Publication Number Publication Date
CN116341016A CN116341016A (en) 2023-06-27
CN116341016B true CN116341016B (en) 2023-08-11

Family

ID=86893393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310626386.6A Active CN116341016B (en) 2023-05-31 2023-05-31 Big data secure storage method and system

Country Status (1)

Country Link
CN (1) CN116341016B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117807279B (en) * 2024-02-29 2024-05-14 辽宁云也智能信息科技有限公司 Data retrieval method for highway quality detection

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190070728A (en) * 2017-12-13 2019-06-21 주식회사 케이티 Method and Apparatus for Checking of Error of Time Series Data
CN113228006A (en) * 2018-12-17 2021-08-06 华为技术有限公司 Apparatus and method for detecting anomalies in successive events and computer program product thereof
CN113987941A (en) * 2021-10-29 2022-01-28 新智我来网络科技有限公司 Time series prediction method, device, computer equipment and readable storage medium
CN114662696A (en) * 2020-12-23 2022-06-24 微软技术许可有限责任公司 Time series exception ranking
CN115617867A (en) * 2022-09-22 2023-01-17 南京上铁电子工程有限公司 Time series prediction method, electronic device and storage medium
CN115964614A (en) * 2022-11-07 2023-04-14 中国人民解放军海军工程大学 CEEMDAN decomposition and period item extraction method, system, device and medium
CN116032476A (en) * 2023-03-30 2023-04-28 北京点聚信息技术有限公司 Electronic contract content intelligent encryption method based on sequence decomposition
CN116049905A (en) * 2023-04-03 2023-05-02 西安中创博远网络科技有限公司 Tamper-proof system based on detecting system file change

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8676964B2 (en) * 2008-07-31 2014-03-18 Riverbed Technology, Inc. Detecting outliers in network traffic time series

Non-Patent Citations (1)

Title
Comparison of Statistical and Deterministic Smoothing Methods to Reduce the Uncertainty of Performance Loss Rate Estimates; Philip Ingenhoven et al.; IEEE Journal of Photovoltaics; entire document *

Also Published As

Publication number Publication date
CN116341016A (en) 2023-06-27

Similar Documents

Publication Publication Date Title
CN109902832B (en) Training method of machine learning model, anomaly prediction method and related devices
Faisal et al. Data-stream-based intrusion detection system for advanced metering infrastructure in smart grid: A feasibility study
CN116341016B (en) Big data secure storage method and system
WO2022142120A1 (en) Data detection method and apparatus based on artificial intelligence, and server and storage medium
Li et al. Multi-agent system based distributed pattern search algorithm for non-convex economic load dispatch in smart grid
US20100275147A1 (en) Industrial energy demand management and services
US9255948B2 (en) Data converting device, data processing device, power consumption processing system and computer program product
CN109711155A (en) Early warning determination method and apparatus
WO2019036095A1 (en) Deep convolutional neural network based anomaly detection for transactive energy systems
CN115766104A (en) Self-adaptive generation method based on improved Q-learning network security decision
CN112131274B (en) Method, device, equipment and readable storage medium for detecting abnormal points of time sequence
CN115907307B (en) Power grid real-time data interaction-oriented online analysis method for carbon emission flow of power system
CN116049905B (en) Tamper-proof system based on detecting system file change
CN114598556B (en) IT infrastructure configuration integrity protection method and protection system
CN113282356B (en) Method, system and storage medium for executing local distributed analysis in real time
CN115208604A (en) Method, device and medium for detecting AMI network intrusion
CN115238574A (en) Digital twin-based power transmission line data management method
CN112329025B (en) Power terminal bypass safety analysis method and power terminal bypass safety analysis system
CN111984982A (en) Method for hiding information, electronic equipment and computer readable storage medium
CN117395279B (en) Building intelligent energy management system based on Internet of things
CN109450617A (en) Encryption and decryption method and device, electronic equipment, computer readable storage medium
CN116992274B (en) Short-term wind speed prediction method and system based on improved principal component regression model
US20230342453A1 (en) Cross-layer anomaly detection in industrial control networks
CN116032457A (en) Chaotic stream encryption method, device, equipment and medium based on Tent mapping
Su Intelligent Network Security Situation Prediction Method Based on Deep Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant