CN117151745B - Method and system for realizing marketing event data real-time processing based on data stream engine - Google Patents

Info

Publication number
CN117151745B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311442334.XA
Other languages
Chinese (zh)
Other versions
CN117151745A (en)
Inventor
孙钢
裘炜浩
阮栩翔
金良峰
王波
杨柳欣
林深
郭烨烨
蒋自若
杨帆
李乘风
苏文军
顾琼予
金晟
赵梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Zhejiang Electric Power Co Ltd
Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
State Grid Zhejiang Electric Power Co Ltd
Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd
Application filed by State Grid Zhejiang Electric Power Co Ltd and Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd
Priority to CN202311442334.XA
Publication of CN117151745A
Application granted
Publication of CN117151745B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S40/00 Systems for electrical power generation, transmission, distribution or end-user application management characterised by the use of communication or information technologies, or communication or information technology specific aspects supporting them
    • Y04S40/20 Information technology specific aspects, e.g. CAD, simulation, modelling, system security

Abstract

The invention discloses a method and a system for real-time processing of marketing event data based on a data stream engine. A data stream engine cannot work well when the incoming data stream is problematic. The real-time processing method of the invention comprises the following steps: an unordered data stream analysis step, in which an unordered data stream is converted into an ordered data stream; fitting a least-squares circle by means of an ORIGIN model and identifying false data injection attacks in real time; invoking an attack classifier (the ICON model) to classify the real-time false data injection attacks; and retrieving the control signal using the classified false data injection attack obtained in the preceding two steps, and sending the retrieved control signal to the control unit. The invention effectively processes unordered data streams, avoids information asymmetry, and weakens the influence of the arrival order of the data stream; it detects and identifies real-time false data injection attacks that are otherwise unobservable in the power system, reveals the characteristics of the attacks, and prevents them from affecting the power system.

Description

Method and system for realizing marketing event data real-time processing based on data stream engine
Technical Field
The invention relates to the field of data processing, in particular to a method and a system for realizing marketing event data real-time processing based on a data stream engine.
Background
A data stream engine is a computing engine for processing real-time data streams. It provides an architecture and tools for processing large-scale data streams in real time, supporting data processing with high throughput, low latency, and high reliability. It is commonly used for real-time monitoring and analysis, real-time data processing, and the like, and is characterized by real-time performance, scalability, and fault tolerance.
However, a data stream engine cannot work well when the incoming data stream is problematic. Specifically, the problems are as follows: (1) the data stream is unordered, and if an unordered data stream enters the system, the effect of the model is greatly reduced; (2) real-time False Data Injection (FDI) attacks are present in the data stream: an intruder can interfere with the operation of the control system by initiating an FDI attack, which inevitably affects the power grid system.
Therefore, the data stream needs to be preprocessed before the corresponding calculation and processing are performed on the real-time data stream.
Disclosure of Invention
The invention aims to process data streams input into a power grid system, and provides a method and a system for realizing marketing event data real-time processing based on a data stream engine, which convert unordered data streams into ordered data streams and detect unobservable real-time FDI attacks in the power grid system.
In order to achieve the above purpose, the present invention adopts the following technical scheme: a method for realizing real-time processing of marketing event data based on a data stream engine comprises the following steps:
step 1), an unordered data stream analysis step, namely converting unordered data streams into ordered data streams;
step 2), fitting a least square circle by means of an ORIGIN model, and identifying false data injection attack in real time;
step 3), calling an attack classifier ICON model to classify real-time false data injection attacks;
step 4), retrieving the control signal using the classified false data injection attack obtained in the preceding two steps, and sending the retrieved control signal to a control unit.
The invention converts unordered data streams into ordered data streams, identifies false data injection attacks in real time, classifies the detected FDI attacks, and removes them. After the data stream has undergone these two treatments, its quality is greatly improved, which facilitates subsequent operations.
Further, the step 1) includes:
step 11), dividing the window into a plurality of check points, and reserving an aggregation result at each check point in an on-demand manner;
step 12), sliding window aggregation;
step 13), converting unordered data stream into ordered data stream based on sliding window.
In sliding window aggregation, the output result at every moment is not retained; instead, an aggregation result (for example, the maximum value) is retained at each checkpoint on demand. This saves system memory, and the aggregation result does not necessarily change even when late-arriving data exist, so interference of late arrivals with the system is avoided. With sliding window aggregation, an operator sees the output over a long period rather than at a single moment, which also facilitates the subsequent conversion of the unordered data stream into an ordered data stream. Because of the sliding window aggregation, the operator can see the output from a few seconds before the current state, so the process of converting the data stream from unordered to ordered can be observed more clearly.
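As a rough illustration of turning an unordered stream into an ordered one, the sketch below buffers records in a min-heap and releases a record only once no later arrival can overtake it. This is a generic bounded-lateness buffer, not the checkpoint-based procedure of the invention; the function name `reorder`, the `max_delay` parameter, and the sample records are assumptions made for the example.

```python
import heapq

def reorder(stream, max_delay):
    """Yield (timestamp, value) records in timestamp order.

    Assumption: no record arrives more than max_delay seconds
    behind the newest timestamp seen so far.
    """
    heap = []
    newest = float("-inf")
    for record in stream:
        heapq.heappush(heap, record)
        newest = max(newest, record[0])
        # Records at or before newest - max_delay can no longer be overtaken.
        while heap and heap[0][0] <= newest - max_delay:
            yield heapq.heappop(heap)
    # End of stream: flush whatever is still buffered, in order.
    while heap:
        yield heapq.heappop(heap)

# Hypothetical arrivals: the 15 s record shows up after the 18 s record.
arrivals = [(16, 4), (18, 7), (15, 9), (19, 5), (20, 1)]
ordered = list(reorder(arrivals, max_delay=4))
```

The buffer trades latency for order: a larger `max_delay` tolerates later arrivals but holds records longer before emitting them.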
Further, in step 2), a least-squares circle is fitted using the ORIGIN model. When the power grid system is attacked, a deviation arises between the interrupt signal and the control signal; if the deviation is greater than the threshold set by the system, the power grid system is considered to be under a real-time false data injection attack.
An intruder can interfere with the operation of the control system by initiating a False Data Injection (FDI) attack, which inevitably affects the real-time processing of marketing event data. The ORIGIN model is used to detect and identify real-time FDI attacks in the power system that are otherwise unobservable. If the deviation of the interrupt signal from the control signal is greater than the set threshold, the system is deemed to be under a real-time FDI attack. For normal signals, the path remains centered at the origin even when the radius of the signal changes.
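The idea of fitting a least-squares circle and checking whether its center has drifted from the origin can be sketched as follows. The text does not give the fitting procedure, so this example assumes the algebraic (Kasa) least-squares fit and treats the distance of the fitted center from the origin as the deviation; `fit_circle`, `is_attacked`, the threshold value, and the sample data are all illustrative.

```python
import cmath
import math

def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_circle(samples):
    """Algebraic least-squares fit of x^2 + y^2 + D*x + E*y + F = 0.

    samples are complex phasor values; returns (center, radius).
    """
    xs = [z.real for z in samples]
    ys = [z.imag for z in samples]
    q = [x * x + y * y for x, y in zip(xs, ys)]
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = [
        [sum(x * x for x in xs), sxy, sum(xs)],
        [sxy, sum(y * y for y in ys), sum(ys)],
        [sum(xs), sum(ys), float(len(samples))],
    ]
    rhs = [-sum(x * w for x, w in zip(xs, q)),
           -sum(y * w for y, w in zip(ys, q)),
           -sum(q)]
    det = det3(a)
    coeffs = []
    for col in range(3):  # Cramer's rule, one unknown per column
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / det)
    d, e, f = coeffs
    center = complex(-d / 2, -e / 2)
    radius = math.sqrt(abs(center) ** 2 - f)
    return center, radius

def is_attacked(samples, threshold):
    """Assumed rule: flag when the fitted center leaves the origin."""
    center, _ = fit_circle(samples)
    return abs(center) > threshold

normal = [cmath.exp(2j * cmath.pi * k / 12) for k in range(12)]
attacked = [z + (0.3 + 0.2j) for z in normal]  # hypothetical injected offset
```

Because only the center is compared, a change in signal radius alone does not trigger the flag, which matches the remark that normal signals stay centered at the origin.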
Further, in step 2), a threshold is set and the input is the PMU data P_0. A FIFO queue Q of fixed size is initialized. First, c_0, the center of the circle, is calculated on P_0, and the sliding window is updated. Then the corresponding quantities are calculated on the updated sliding window and passed into the FIFO queue Q. If the calculated deviation is greater than the threshold and the number of elements in the queue Q satisfies the set condition, the intrusion detection flag is set, that is, the power grid system is considered to be under a real-time false data injection attack.
Further, in step 3), when the power grid system is under a real-time false data injection attack, that is, when the intrusion detection flag is set, an attack instance is created and the attack classifier is invoked. The classifier, ICON, is an ensemble learner that performs semi-supervised multi-class classification using cross-relative validation; it can also operate in a non-stationary environment and can update itself without relying on external updates. This helps reveal the characteristics of the attack, including the direction, magnitude, and ratio of the injected false data.
Further, in step 3), a sensitivity parameter is set and a similarity measure is calculated. The class set is updated according to formula (1), which distinguishes three cases:
(1)
The symbols in formula (1) denote, respectively: the original set; a new class; the number of elements in the i-th class; the memory size of the i-th class; and the removal of the oldest pattern from the i-th class.
The three cases of comparing the similarity measure with the sensitivity parameter are:
(3.1) if the similarity measure is greater than the sensitivity parameter, the real-time false data injection attack is assigned to a new class;
(3.2) if the similarity measure is less than or equal to the sensitivity parameter and the number of elements in the i-th class is less than the memory size of that class, the real-time false data injection attack is assigned to the i-th class;
(3.3) if the similarity measure is less than or equal to the sensitivity parameter and the number of elements in the i-th class is greater than or equal to the memory size of that class, the oldest pattern in the i-th class is removed and the real-time false data injection attack is assigned to the i-th class.
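The three cases above can be sketched as a small update routine over per-class memories. This is an illustrative reading of the cases, not the ICON model itself: the function `update_classes`, its arguments, and the sample patterns are hypothetical, and the similarity value and the candidate class index are assumed to have been computed elsewhere.

```python
from collections import deque

def update_classes(classes, pattern, class_index, similarity,
                   sensitivity, memory_size):
    """Apply cases (3.1)-(3.3); return the index of the assigned class."""
    if similarity > sensitivity:
        # Case (3.1): the attack is assigned to a new class.
        classes.append(deque([pattern]))
        return len(classes) - 1
    members = classes[class_index]
    if len(members) >= memory_size:
        # Case (3.3): the class memory is full, drop the oldest pattern.
        members.popleft()
    # Cases (3.2) and (3.3): assign the attack to the i-th class.
    members.append(pattern)
    return class_index

classes = [deque(["a1", "a2"])]
first = update_classes(classes, "a3", 0, similarity=0.1,
                       sensitivity=0.5, memory_size=2)
second = update_classes(classes, "b1", 0, similarity=0.9,
                        sensitivity=0.5, memory_size=2)
third = update_classes(classes, "b2", 1, similarity=0.2,
                       sensitivity=0.5, memory_size=2)
```

Bounding each class memory and evicting the oldest pattern keeps the classifier responsive to recent attack behavior, which fits the statement that it can operate in a non-stationary environment.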
Still further, a training sample is selected to create the first class. For the remaining samples, the cross-correlation relationship is calculated by formula (2), the number of classes being the class number before time t; the symmetry of this cross-correlation relationship is called the similarity measure. The cross-correlation relationship is calculated as follows:
(2)
where the asterisk denotes the complex conjugate, the shifted term is the (i+m)-th element of the sequence, and N is the length;
the similarity measure is calculated as follows:
(3)
where card denotes the cardinality of a set, the indexed vector is the i-th output vector, ρ is the last output vector, and the ceiling brackets denote rounding up.
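For orientation, a discrete cross-correlation of the kind described for formula (2) can be sketched as below. The exact formula is not reproduced in the text, so this assumes the standard definition, in which the first sequence is conjugated and the second is shifted by m, with terms outside the sequence treated as zero; the function name and the sample sequences are illustrative. The similarity measure of formula (3) is not attempted here.

```python
def cross_correlation(f, g, m):
    """Standard discrete cross-correlation at lag m.

    Assumed form: sum over i of conj(f[i]) * g[i + m],
    skipping terms where i + m falls outside g.
    """
    return sum(
        f[i].conjugate() * g[i + m]
        for i in range(len(f))
        if 0 <= i + m < len(g)
    )

f = [1 + 1j, 2 + 0j]
g = [1 + 1j, 2 + 0j]
lag0 = cross_correlation(f, g, 0)
lag1 = cross_correlation(f, g, 1)
```

At lag 0 a sequence correlated with itself gives its energy (a real number), which is why the lag-0 value is a natural reference point when comparing patterns.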
Further, in step 4), once the attack sample has been classified, the original phasor measurement can be estimated; the injected error is calculated dynamically and removed sample by sample, and the control signal is thereby retrieved and then sent to the control unit.
Further, retrieving the control signal and sending it to the control unit specifically comprises: when the power grid system suffers a real-time false data injection attack, that is, when the intrusion detection flag is set, the original phasor measurement is calculated for every sample using formula (4), and the control signal S is retrieved sample by sample; after the control signal has been retrieved, it is sent to the control unit;
(4)
where the symbols denote, respectively, the deviation, the coordinates before and after the intrusion, the imaginary part, and the number of elements in Q; Q is the FIFO queue.
The invention also provides a system for real-time processing of marketing event data based on a data stream engine, which is used to implement the above method.
The invention has the following beneficial effects: the invention effectively processes the unordered data stream, avoids the condition of asymmetric information and weakens the influence of the arrival order of the data stream; the real-time FDI attack which cannot be observed in the power system is detected and identified, the characteristics of the attack are revealed, and the influence on the power system is avoided.
Drawings
FIG. 1 is a graph of the relevant types of marketing event data in an embodiment of the present invention;
FIG. 2 is a current state diagram of CPiX when processing a data stream ACQ in accordance with an embodiment of the present invention;
FIG. 3 is a graph of the intermediate results of the effective update of CPiX on non-FIFO streams as a window slides in an embodiment of the present invention;
FIG. 4 is a general block diagram of CPiX in an embodiment of the present invention;
FIG. 5 is a diagram of an exemplary CPiX for running ACQ on a data stream at 0-18 seconds in an embodiment of the present invention;
FIG. 6 is a diagram of an exemplary CPiX for operating an ACQ on a data stream at 2-20 seconds in an embodiment of the present invention;
FIG. 7 is a diagram of an exemplary CPiX for running ACQ on a data stream at 4-22 seconds in an embodiment of the present invention;
FIG. 8 is a diagram of an exemplary CPiX for operating an ACQ on a data stream at 6-24 seconds in an embodiment of the present invention;
FIG. 9 is a diagram of an example CPiX of an ACQ running on a data stream when a checkpoint is fully processed in an embodiment of the present invention;
FIG. 10 is a diagram comparing the control signal and the interrupt signal in the real-time identification step for false data injection attacks in an embodiment of the present invention;
FIG. 11 is a graph of the deviation between the control signal and the interrupt signal, fitted with a least-squares circle, in the real-time identification step for false data injection attacks in an embodiment of the present invention;
FIG. 12 is an explanatory diagram of real-time FDI attack classification steps in an embodiment of the invention;
FIG. 13 is a flow chart of a method for implementing real-time processing of marketing event data based on a data stream engine according to the present invention.
Detailed Description
The invention will be further described with reference to the drawings and detailed description.
The embodiment provides a method for realizing real-time processing of marketing event data based on a data stream engine, which processes data streams input into a power grid system.
A marketing event is a particular activity or situation associated with a marketing campaign, and a marketing event data stream is the continuous flow and recording of data associated with a marketing campaign. The data structure of a marketing event comprises basic fields such as ID, type, and timestamp, and is characterized by time ordering and real-time performance. Combined with power grid data, marketing event data supports electricity price adjustment by the power company, feedback on electricity consumption, and visualization and real-time monitoring of the power grid; it therefore has an important influence on the whole power system.
As shown in FIG. 1, marketing event data comprises types such as text and numbers, and the present invention considers the grid operation data within the numeric data. Grid operation data can be affected by factors such as unpredictable attacks, most commonly hacking. Hacking can interfere with and take control of the power system, creating a series of security problems that lead to the leakage of marketing event data and the violation of privacy. For example, the control signal is affected, so that a large deviation arises between the interrupt signal and the control signal, and the interrupt signal fluctuates far more than the control signal. As another example, the marketing event data may include unordered data streams, which, unlike ordered data streams, are not first in, first out; this can cause marketing event data to be lost or overwritten and memory usage to grow, so the unordered data streams need to be converted before the data are formally processed.
On the one hand, the type of the data stream is judged and the unordered data streams in the marketing events are converted into ordered data streams; on the other hand, real-time FDI attacks are detected, so that marketing event data and the power system are protected from their influence.
Specifically, a method for implementing real-time processing of marketing event data based on a data stream engine, as shown in fig. 13, includes:
step 1), an unordered data stream analysis step, namely converting unordered data streams into ordered data streams;
step 2), fitting a least square circle by means of an ORIGIN model, and identifying false data injection attack in real time;
step 3), calling an attack classifier ICON model to classify real-time false data injection attacks;
step 4), retrieving the control signal using the classified false data injection attack obtained in the preceding two steps, and sending the retrieved control signal to a control unit.
The invention comprises the following specific steps:
1. Unordered data stream analysis step. As shown in FIGS. 2 to 4:
(1.1) The window is divided into several checkpoints, and an aggregation result is retained at each checkpoint on demand. As shown in FIG. 4, the entire window is divided into k checkpoints; the first checkpoint is the oldest checkpoint, the k-th checkpoint is the current checkpoint, and each checkpoint comprises |n/k| indices. The aggregation result in each checkpoint is retained by taking the maximum: the maximum of every two seconds gives a p-value; the maximum of all p-values in the first checkpoint gives the t-value; the maximum of all p-values in each of the second to k-th checkpoints gives a c-value; the maximum of all c-values is retained as the g-value; and finally the larger of the t-value and the g-value is taken as the aggregation result. This treatment has two benefits: first, it saves system memory; second, even when late-arriving data exist, the aggregation result most likely remains unchanged, so interference of late arrivals with the system is avoided.
(1.2) Sliding window aggregation. As shown in FIG. 2, the p-value, t-value, and g-value are again obtained by taking the maximum. The current window shows the output for 0 to 18 seconds, while the records (15 s, 9), (19 s, 5), and (20 s, 1) are waiting to enter the system. After 2 seconds, the three records enter the system and the operator can see the output of the next window, 2 to 20 seconds.
(1.3) Unordered data stream conversion. As shown in FIG. 3, once the three records have entered the system, the operator sees the output for 2 to 20 seconds. The output for 0 to 2 seconds is no longer visible, so the p-value 18 for 0 to 2 seconds is erased, which changes the final t-value from 18 to 16. For the records at 19 seconds and 20 seconds, the maximum is taken and the p-value equals 5, which has no effect on the g-value. For the record at 15 seconds, the p-value at index 8 is replaced by the 15-second value because it is greater than the original p-value, but this has no effect on the final g-value. This confirms what is stated in (1.1): retaining the aggregation result at each checkpoint on demand avoids interference of late arrivals with the system. After operation (1.3), the 15-second record has been delivered to its corresponding position, and the unordered data stream has been successfully converted into an ordered data stream.
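The walk-through in (1.1) to (1.3) can be sketched in a few lines. The example below is a simplified illustration rather than the CPiX algorithm: it keeps one maximum per two-second partition, treats the first three partitions of the window as the oldest checkpoint, and recomputes the aggregates directly instead of using a binary tree. The initial p-values are hypothetical (the figure data are not available), chosen so that the oldest p-value is 18 and the next is 16; only the three waiting records come from the text.

```python
import math

SLIDE = 2    # seconds covered by one partition (one p-value)
RANGE = 18   # window length in seconds

def partition_index(ts):
    """1-based index of the partition covering (SLIDE*(i-1), SLIDE*i]."""
    return max(1, math.ceil(ts / SLIDE))

def slide_window(p_values, window_end):
    """Drop p-values of partitions that fall before the new window start."""
    first = (window_end - RANGE) // SLIDE + 1
    for i in [i for i in p_values if i < first]:
        del p_values[i]

def apply_records(p_values, records):
    """Fold records, late or not, into the p-value of their partition."""
    for ts, value in records:
        i = partition_index(ts)
        p_values[i] = max(p_values.get(i, float("-inf")), value)

def aggregate(p_values, per_checkpoint):
    """Return (t_value, g_value, result) from the current p-values."""
    ordered = [p_values[i] for i in sorted(p_values)]
    t_value = max(ordered[:per_checkpoint])   # oldest checkpoint
    g_value = max(ordered[per_checkpoint:])   # max over the other c-values
    return t_value, g_value, max(t_value, g_value)

# Hypothetical p-values for the window 0 to 18 s.
p_values = {1: 18, 2: 16, 3: 3, 4: 7, 5: 2, 6: 11, 7: 4, 8: 6, 9: 9}
before = aggregate(p_values, per_checkpoint=3)
slide_window(p_values, window_end=20)
apply_records(p_values, [(15, 9), (19, 5), (20, 1)])
after = aggregate(p_values, per_checkpoint=3)
```

With these values the t-value drops from 18 to 16 after the slide, the late 15-second record raises the p-value at index 8 to 9, and the g-value stays the same, mirroring the behavior described above.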
In step 1, the algorithm involved is the CPiX algorithm, which is specifically as follows:
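Algorithm 1 CPiX
Input: an ACQ with a window range and a slide, an aggregation function, and a data stream
Output: query results
1:
2:
3:
4: number of partitions, determined from the window range and the slide
5: number of checkpoints
6: construct a double-precision array
7: construct a double-precision array
8: while the window slides and new stream records arrive do
9:   purge expired records by deleting the oldest p-value (expValue) from the binary tree, formula (5)
10:   for each p-value in p1 do
11:     update the tree, formula (6)
12:   end for
13:   for each record in p2 do
14:     update the affected partition, formula (7)
15:     update the affected checkpoint, formula (8)
16:     update the g-value, formula (9)
17:   end for
18:   if the current checkpoint has been fully processed then
19:     for each partition in the oldest checkpoint do
20:       build the binary tree, formula (10)
21:     end for
22:     for each of the k c-values do
23:       recalculate the g-value, formula (11)
24:     end for
25:   end if
26:   calculate the result of the ACQ, formula (12)
27: end while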
Algorithm 1 is explained below; in this embodiment, the aggregation operation takes the maximum of all values.
Here p1 is the oldest checkpoint and p2 denotes the other checkpoints. FIG. 5 shows the initial state.
Expired records are purged by deleting the oldest p-value (expValue) from the binary tree (algorithm line 9).
(5)
For each aggregate value of p1 (p-value) (algorithm 1, line 10), the tree is updated using equation (6) (algorithm lines 10-12), as shown in FIG. 6.
(6)
For all records in p2 (algorithm line 13), pValue represents the value to be aggregated into the affected partition, with index pIndex. Likewise, cIndex represents the index of the corresponding affected checkpoint. At this point the following procedure will be performed:
update each affected partition (algorithm line 14) by equation (7):
(7)
update each affected checkpoint (algorithm line 15) by equation (8):
(8)
the two-step operation is shown in fig. 7.
The g-value is updated by equation (9) (algorithm line 16), as shown in fig. 8:
(9)
If the current checkpoint has been fully processed, that is, after every |n/k| slides (algorithm line 18), the following two operations are performed:
a binary tree is created from all partitions (p-values) in the oldest checkpoint by formula (10) (algorithm lines 19 to 21), as shown in FIG. 9.
(10)
The g-value is recalculated by aggregating all k c-values using formula (11) (algorithm lines 22 to 24):
(11)
The result of the ACQ is calculated by formula (12) (algorithm line 26):
(12)
2. Real-time identification of false data injection attacks. In FIG. 10, the solid line is the control signal and the dashed line is the interrupt signal. When the power grid system is under a real-time FDI attack, the interrupt signal no longer coincides with the control signal, and it fluctuates far more than the control signal. In addition, a least-squares circle is fitted using the ORIGIN model, as shown in FIG. 11: when the power grid system is attacked, a deviation arises between the interrupt signal and the control signal, and if the deviation is greater than the threshold set by the system, the power grid system is considered to be under a real-time FDI attack. For normal signals, the path remains centered at the origin even when the radius of the signal changes. Further, as shown in the first half of FIG. 12, a threshold is set and the input is the PMU data P_0. A FIFO (first in, first out) queue Q of fixed size is initialized. First, c_0, the center of the circle, is calculated on P_0, and the sliding window is updated. Then the corresponding quantities are calculated on the updated sliding window and passed into the FIFO queue Q. If the calculated deviation is greater than the threshold and the number of elements in the queue Q satisfies the set condition, the intrusion detection flag is set, that is, the power grid system is considered to be under a real-time FDI attack.
The algorithm of the ORIGIN model is as follows:
Algorithm 2 ORIGIN
Input: PMU (phasor measurement unit) data P_0, threshold
Output: intrusion detection result F, attack classification result h, retrieved control signal S
Definition: the training set, which is inserted into the FIFO (first in, first out) queue
Initialization:
1: initialize the FIFO queue Q to the empty set
2: set the intrusion detection flag F = 0
3: set the counter j = 1
Training:
4: calculate c_0 on P_0
Testing:
5: while P holds (P being the changed value of P_0) do
6:   update the sliding window
7:   perform the calculation on the updated sliding window
8:
9:
10:   update the FIFO queue Q
11:   if the deviation is greater than the threshold and the number of elements in Q satisfies the set condition then
12:     set the intrusion detection flag F = 1
13:     create an attack instance
14:     call the attack classifier
15:     calculate the original phasor measurement to retrieve the control signal S
16:   end if
17: end while
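The testing loop of Algorithm 2 can be sketched as a streaming detector. Several details are assumptions because the corresponding symbols are not legible in the listing: the deviation is taken as the distance between the window's center estimate and c_0, the queue condition is read as "Q is full", and the center estimator is passed in as `center_fn`. The example uses the window centroid as a simple stand-in for the least-squares circle center, which is only adequate when the window spans whole revolutions of evenly spaced samples.

```python
import cmath
from collections import deque

def detect(samples, c0, threshold, window_size, queue_size, center_fn):
    """Return one intrusion flag (0 or 1) per incoming sample."""
    window = deque(maxlen=window_size)   # sliding window over PMU samples
    queue = deque(maxlen=queue_size)     # FIFO queue Q of recent deviations
    flags = []
    for z in samples:
        window.append(z)
        if len(window) < window_size:
            flags.append(0)              # not enough data yet
            continue
        deviation = abs(center_fn(window) - c0)
        queue.append(deviation)
        # Assumed reading of the rule: deviation over threshold and Q full.
        attacked = deviation > threshold and len(queue) == queue_size
        flags.append(1 if attacked else 0)
    return flags

def centroid(window):
    """Stand-in center estimate (not the least-squares circle fit)."""
    return sum(window) / len(window)

# Hypothetical stream: two clean revolutions, then two with an injected offset.
clean = [cmath.exp(2j * cmath.pi * k / 8) for k in range(16)]
shifted = [z + 0.5 for z in clean]
flags = detect(clean + shifted, c0=0j, threshold=0.2,
               window_size=8, queue_size=4, center_fn=centroid)
```

In this run the flag rises a few samples after the offset begins, once enough shifted samples have entered the window to move the center estimate past the threshold.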
3. Real-time FDI attack classification step. As shown in the second half of Fig. 12, when the grid system is subject to a real-time FDI attack, that is, when the intrusion detection flag is set, an attack example … is created and the attack classifier … is then invoked. The classifier is the ICON ensemble learner, which performs semi-supervised multi-class classification using cross-correlation validation. It can operate in a non-stationary environment and update itself without relying on external updates. This helps reveal the characteristics of the attack, including the direction, magnitude, and ratio of the injected false data. Furthermore, a sensitivity parameter … is set and a similarity measure … is calculated. Comparing the similarity measure with the sensitivity parameter gives three cases:
(3.1) if the similarity measure … is greater than the sensitivity parameter …, the real-time false data injection attack is assigned to a new class;
(3.2) if the similarity measure … is less than or equal to the sensitivity parameter …, and the number of elements in the i-th class … is less than the memory size of …, the real-time false data injection attack is assigned to class …;
(3.3) if the similarity measure … is less than or equal to the sensitivity parameter …, and the number of elements in the i-th class … is greater than or equal to the memory size of …, the oldest pattern in class … is removed and the real-time false data injection attack is assigned to class ….
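The three cases above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `assign_attack`, the list-of-deques layout, and the rule for picking the candidate class i (the class with the smallest similarity measure) are all hypothetical, and the similarity function is supplied by the caller because its formula is not recoverable here.

```python
from collections import deque


def assign_attack(classes, sample, similarity, sensitivity, memory_size):
    """Place one attack example into a class and return the class index.

    classes    -- list of deques, one per known attack class (assumed layout)
    similarity -- callable(sample, members) -> float, assumed to be given
    """
    if not classes:
        # No class yet: the first example creates the first class.
        classes.append(deque([sample]))
        return 0
    scores = [similarity(sample, members) for members in classes]
    # Assumption: the candidate class i is the one with the smallest measure.
    i = min(range(len(classes)), key=scores.__getitem__)
    if scores[i] > sensitivity:
        # Case (3.1): open a new class for this attack.
        classes.append(deque([sample]))
        return len(classes) - 1
    if len(classes[i]) >= memory_size:
        # Case (3.3): memory is full, so drop the oldest pattern first.
        classes[i].popleft()
    # Cases (3.2) and (3.3): add the attack to class i.
    classes[i].append(sample)
    return i
```

Bounding each class by a memory size keeps the classifier's state constant in size, which is one way a learner can keep adapting in a non-stationary stream.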
The algorithm of the ICON model is as follows:
Algorithm 3: ICON
Input: attack example …; sensitivity parameter …
Output: attack classification result
Definition: … is the training set inserted into the FIFO queue; … is the returned label
Initialization:
1: initialize an empty set E_0
2: initialize the memory of the first class
Training:
3: while … do the following
4: compute …
5: …
6: compute the similarity measure …
7: update … using equation (1)
8: end while
Testing:
9: while … do the following
10: …
11: if … then
12: …
13: …
14: else
15: …
16: if … then
17: …
18: else
19: …
20: end if
21: end if
22: return …
23: end while
Specifically, a training sample … is selected from … to create the first class …. For …, the cross-correlation relation … is calculated, where … and … is the number of classes before time t; the symmetry of this cross-correlation relation is referred to as the similarity measure. The cross-correlation relation is calculated as follows:
where the asterisk denotes the complex conjugate, … denotes the (i+m)-th element of …, and N is the length;
the similarity measure is calculated as follows, where card denotes the cardinality of a set.
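The cross-correlation relation described above (complex conjugate of one sequence multiplied by the (i+m)-th element of the other, summed over a length N) matches the standard discrete cross-correlation. A minimal Python sketch follows; the function name `cross_correlation` and the zero-padding at the boundaries are assumptions, since the original formula and its boundary handling are not recoverable here.

```python
def cross_correlation(f, g, m):
    """Return the sum over i of conj(f[i]) * g[i + m], for 0 <= i < N.

    N is taken as len(f). Indices that fall outside g contribute zero
    (an assumed boundary rule).
    """
    total = 0j
    for i in range(len(f)):
        k = i + m
        if 0 <= k < len(g):
            total += f[i].conjugate() * g[k]
    return total
```

At lag m = 0 with f equal to g, the result is the signal energy, which is real and non-negative; this is a quick sanity check on the conjugation.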
4. Control signal retrieval step. As shown in the second half of Fig. 12, when the grid system is subject to a real-time FDI attack, that is, when the intrusion detection flag is set, for any …, the original phasor measurement … is calculated using the following formula, and the control signal S is retrieved by sampling. After the control signal is retrieved, it is sent to the control unit. Through these operations, the signal retrieval module can recover the original control signal and eliminate the injected false data.
where … denotes the deviation, … and … denote the coordinates before and after the intrusion, … denotes the imaginary part, … denotes the number of elements in Q, and Q is the FIFO queue.

Claims (7)

1. A method for realizing real-time processing of marketing event data based on a data stream engine is characterized by comprising the following steps:
step 1), an unordered data stream analysis step, namely converting unordered data streams into ordered data streams;
step 2), fitting a least square circle by means of an ORIGIN model, and identifying false data injection attack in real time;
step 3), calling an attack classifier ICON model to classify real-time false data injection attacks;
step 4), retrieving the control signal by using the classified false data injection attack obtained in the previous two steps, and sending the retrieved control signal to a control unit;
in the step 3), when the power grid system suffers a real-time false data injection attack, that is, when the intrusion detection flag is set, an attack example … is created, and the attack classifier …, the ICON ensemble learner, is then invoked, which performs semi-supervised multi-class classification using cross-correlation validation;
in the step 3), a sensitivity parameter … is set and a similarity measure … is calculated, giving three cases:
(1)
wherein … represents the original set; … represents a new class; … represents the number of elements in the i-th class …; … represents the memory size of …; … represents removing the oldest pattern in class …; and … represents the attack example;
the three cases of comparing the similarity measure with the sensitivity parameter are:
(3.1) if the similarity measure … is greater than the sensitivity parameter …, the real-time false data injection attack is assigned to a new class;
(3.2) if the similarity measure … is less than or equal to the sensitivity parameter …, and the number of elements in the i-th class … is less than the memory size of …, the real-time false data injection attack is assigned to class …;
(3.3) if the similarity measure … is less than or equal to the sensitivity parameter …, and the number of elements in the i-th class … is greater than or equal to the memory size of …, the oldest pattern in class … is removed and the real-time false data injection attack is assigned to class …;
a training sample … is selected from … to create the first class …; for …, the cross-correlation relation … is calculated, wherein … and … is the number of classes before time t; the symmetry of the cross-correlation relation is called the similarity measure, and the cross-correlation relation is calculated as follows:
(2)
wherein the asterisk denotes the complex conjugate, … denotes the (i+m)-th element of …, and N is the length;
the similarity measure is calculated as follows:
(3)
wherein … represents the cardinality of a set; … represents the i-th output vector; ρ represents the last output vector; and … represents rounding up.
2. The method for implementing the real-time processing of the marketing event data based on the data stream engine according to claim 1, wherein the step 1) comprises:
step 11), dividing the window into a plurality of checkpoints, and retaining the aggregation result at each checkpoint on demand;
step 12), sliding window aggregation;
step 13), converting unordered data stream into ordered data stream based on sliding window.
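Steps 11) to 13) can be illustrated with a short Python sketch. It shows one common approach and is not necessarily the claimed procedure: events are buffered in a heap and released once they are older than an assumed lateness bound, partial sums are kept per fixed-size pane (playing the role of a checkpoint), and a window result is combined from those partial sums. The names `reorder`, `pane_sums`, `window_sum` and `max_delay`, and the use of integer timestamps, are hypothetical.

```python
import heapq


def reorder(events, max_delay):
    """Yield (timestamp, value) pairs in timestamp order.

    events    -- iterable of (timestamp, value), possibly out of order
    max_delay -- assumed bound on how late an event may arrive
    """
    heap, seq, latest = [], 0, None
    for ts, value in events:
        heapq.heappush(heap, (ts, seq, value))  # seq breaks timestamp ties
        seq += 1
        latest = ts if latest is None else max(latest, ts)
        # Release everything that can no longer be overtaken.
        while heap and heap[0][0] <= latest - max_delay:
            t, _, v = heapq.heappop(heap)
            yield t, v
    while heap:  # flush the remainder at end of stream
        t, _, v = heapq.heappop(heap)
        yield t, v


def pane_sums(ordered, pane):
    """Keep one partial sum per pane; pane index is timestamp // pane."""
    sums = {}
    for ts, value in ordered:
        key = ts // pane
        sums[key] = sums.get(key, 0) + value
    return sums


def window_sum(sums, last_pane, panes_per_window):
    """Combine the partial sums of the panes covered by one window."""
    first = last_pane - panes_per_window + 1
    return sum(sums.get(k, 0) for k in range(first, last_pane + 1))
```

An event that arrives more than `max_delay` late is still emitted, but after later timestamps, so the bound should be chosen from the observed lateness of the stream.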
3. The method for implementing real-time processing of marketing event data based on a data stream engine according to claim 1, wherein in the step 2), a least squares circle is fitted by means of the ORIGIN model; when the power grid system is attacked, a deviation arises between the interrupt signal and the control signal, and if the deviation is greater than a threshold set by the system, the power grid system is considered to be under a real-time false data injection attack.
4. The method for realizing real-time processing of marketing event data based on the data stream engine according to claim 3, wherein in the step 2), the threshold is set to …; the input is PMU data P_0; a FIFO queue Q of size … is initialized; first, a least squares circle is fitted to P_0 and c_0 is calculated, c_0 being the center of the circle, and the sliding window … is updated; then … and … are calculated according to …, and … is pushed into the FIFO queue Q; if … is greater than the threshold … and the number of elements in queue Q is …, the intrusion detection flag is set, that is, the power grid system is under a real-time false data injection attack.
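The least squares circle fit that yields the center c_0 can be sketched in Python as follows. This uses the algebraic (Kasa-style) least squares fit in mean-centered coordinates, which is one standard choice; the claim does not state which fitting variant is used. Treating each phasor as an (x, y) point, with x the real part and y the imaginary part, is an assumption, and `fit_circle` is a hypothetical name.

```python
import math


def fit_circle(points):
    """Fit a circle to (x, y) points; return (center_x, center_y, radius)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    suu = svv = suv = suuu = svvv = suvv = svuu = 0.0
    for x, y in points:
        u, v = x - mx, y - my  # coordinates relative to the centroid
        suu += u * u
        svv += v * v
        suv += u * v
        suuu += u * u * u
        svvv += v * v * v
        suvv += u * v * v
        svuu += v * u * u
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("points are collinear or too few to fit a circle")
    # Solve the 2x2 normal equations for the center offset (Cramer's rule).
    b1 = 0.5 * (suuu + suvv)
    b2 = 0.5 * (svvv + svuu)
    uc = (b1 * svv - b2 * suv) / det
    vc = (suu * b2 - suv * b1) / det
    radius = math.sqrt(uc * uc + vc * vc + (suu + svv) / n)
    return mx + uc, my + vc, radius
```

For points that lie exactly on a circle, this fit recovers the center and radius up to floating-point error; with noisy data it returns the algebraic least squares estimate.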
5. The method for real-time processing marketing event data based on the data stream engine according to claim 1, wherein in the step 4), once the attack sample is classified, the original phasor measurement value can be estimated, and the injected error and the search control signal can be dynamically calculated and removed in the sampling by the sampling manner; and sends the control signal to the control unit after retrieving it.
6. The method for realizing real-time processing of marketing event data based on the data stream engine according to claim 5, wherein the retrieving control signal is sent to the control unit, and the specific contents are as follows: when the power grid system suffers from real-time false data injection attack, namely the intrusion detection mark is set, the method is applicable to anyThe raw phasor measurement is calculated using the formula +.>And searching the control signal S in the sampling mode; after retrieving the control signal, sending it to the control unit;
(4)
wherein,indicating deviation->And->Representing the coordinates before and after intrusion->Representing imaginary part, < >>Representation ofQThe number of the elements in the process is equal to the number of the elements in the process,Qis a FIFO queue.
7. A system for realizing real-time processing of marketing event data based on a data stream type engine, which is characterized by being used for realizing the method for realizing real-time processing of marketing event data based on the data stream type engine according to any one of claims 1-6.
CN202311442334.XA 2023-11-01 2023-11-01 Method and system for realizing marketing event data real-time processing based on data stream engine Active CN117151745B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311442334.XA CN117151745B (en) 2023-11-01 2023-11-01 Method and system for realizing marketing event data real-time processing based on data stream engine

Publications (2)

Publication Number Publication Date
CN117151745A CN117151745A (en) 2023-12-01
CN117151745B true CN117151745B (en) 2024-03-29

Family

ID=88908631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311442334.XA Active CN117151745B (en) 2023-11-01 2023-11-01 Method and system for realizing marketing event data real-time processing based on data stream engine

Country Status (1)

Country Link
CN (1) CN117151745B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant