CN113239351A - Novel data pollution attack defense method for Internet of things system - Google Patents

Publication number
CN113239351A
Authority
CN
China
Prior art keywords
server
result
data
terminal
update
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011443807.4A
Other languages
Chinese (zh)
Other versions
CN113239351B (en)
Inventor
王骞
刘歌灵
赵令辰
胡胜山
姜建林
沈超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN202011443807.4A
Publication of CN113239351A
Application granted
Publication of CN113239351B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Virology (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention provides a novel data pollution attack defense method for an Internet of Things system. The server collects the updates of all terminals in the current round of training and randomly groups them to form secondary results. For each secondary result, terminals are randomly selected from the set of all terminals according to a preset rule and authorized, and each selected terminal detects and evaluates the secondary results it receives. The server collects the terminals' evaluation reports on the secondary results and counts the number of times each update is judged to be potentially malicious. The server then adjusts the weights by adding a penalty coefficient to each update based on the number of times that update was judged to be malicious, and aggregates the secondary results by calculating an average with the penalty coefficients as weights. The method resists data pollution attacks that may arise in statistical analysis in Internet of Things scenarios, and it can detect abnormal information uploaded by a user in collaborative learning whether or not the data held by different terminals are independent and identically distributed.

Description

Novel data pollution attack defense method for Internet of things system
Technical Field
The invention belongs to the field of network security, and particularly relates to a novel data pollution attack defense method for an Internet of things system.
Background
In recent years, deep learning has achieved great success in applications such as image understanding, speech recognition, and cancer analysis, largely because of the large amount of data acquired through Internet services. The Internet of Things, with its large number of diverse sensor devices, contributes substantially to this data acquisition. In general, the more diverse the data available for statistical analysis, the more accurate the results, which encourages companies and institutions to collect as much data from terminals as possible. These data are typically generated by users' personal sensors and devices, such as GPS receivers, cameras, smartphones, and heart rate monitors. Because each terminal device collects data locally and shares only its analysis result rather than the collected data, the data collected by a large number of terminal devices can be fully utilized. In each round of optimizing the analysis result, the server randomly selects a subset of the terminal devices; each selected device downloads the current global analysis result, computes an updated result based on its local data, and sends it back to the server, which aggregates the updates to construct an improved global analysis result. To protect the privacy of the terminal devices, the design ensures that the server cannot see the devices' local data.
A drawback of this design is that it opens the door to data pollution attacks. Because each terminal device can influence the global analysis result and the server cannot see each device's data analysis process, an attacker can construct a malicious result with an implanted backdoor in order to control the behavior of the global result. There is evidence that attacker-tailored updates can control how the global analysis result classifies attacker-specified data without affecting its performance on normal tasks. This benefits attackers in many scenarios, such as a merchandise recommendation system built jointly by multiple shopping malls.
To address this problem, existing research statistically analyzes the updates of all terminal devices to find anomalous ones. In practice, however, the data held by terminal devices are usually not independent and identically distributed (non-IID), so different updates naturally differ greatly, and the practical effect of existing methods is very limited.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a novel data pollution attack defense method for an Internet of Things system that can detect abnormal updates whether the collected data are independent and identically distributed (IID) or non-IID. In addition, while checking for abnormal updates, the method prevents malicious users from stealing other participants' information, protecting user privacy.
The technical scheme of the invention is a novel data pollution attack defense method for an Internet of Things system, comprising the following steps:
Step 1, after collecting the K updates of all terminals in the current round of training, the server randomly divides the updates into d groups, each containing u = K/d updates, and aggregates the u updates of each group to form d secondary results, where d is a preset value;
Step 2, in the t-th round of result optimization, for each secondary result, e terminals are randomly selected from the set S_t of all terminals according to a preset rule and authorized, and each terminal detects and evaluates the m secondary results it receives, where m < d;
Step 3, the server collects the terminals' evaluation reports on the secondary results using majority voting: each terminal is required to submit a binary matrix reporting the performance of each result it evaluated, stating whether the data samples of the relevant categories are accurately fitted by the corresponding statistical result; from the collected binary matrices, the server counts the number of times each update is judged to be potentially malicious;
Step 4, after obtaining this estimate of how likely each update is to be malicious, the server adjusts the weight of each result in the subsequent averaging by adding a penalty coefficient to each update based on the number of times it was judged to be malicious;
Step 5, an average is calculated using the penalty coefficients as weights, and the secondary results are aggregated as follows,
[Formula image BDA0002823519090000021 not reproduced in this text]
where t is the number of the current update round, d is the number of secondary results, w_{t+1} is the updated global analysis result of round t+1 that the server obtains by aggregating the grouped updates it received, w_t is the server's global result of round t, and c_i^{t+1} is the penalty coefficient applied to the i-th secondary result w_i^{t+1} of round t+1.
Furthermore, a hierarchical method is adopted: based on the observed returned data, a penalty function is used to calculate the penalty coefficient of the i-th update in the current round.
[Formula image BDA0002823519090000024 not reproduced in this text]
Furthermore, the method is applicable to both IID and non-IID environments.
In step 1, for the non-IID case, the server sums the binary matrices received from the terminal devices, which record their data set information, to obtain a count matrix; it then sorts the count values of the categories in descending order and assigns the minimum value to d.
For the non-IID case, the following protocol is established to dynamically assign secondary results to suitable terminal devices for evaluation:
each terminal device first reports to the server whether the amount of locally collected data in each data category exceeds a certain threshold, so that the server can assign each secondary result to terminal devices holding a large amount of collected data in the same data categories for evaluation;
in each round of updating, the server performs this process once and selects a new batch of terminal devices according to the update results, realizing dynamic allocation of the secondary results.
The invention exploits two facts: data pollution attacks in an Internet of Things system cause differences in the update results, and terminal devices can verify the updates of other terminal devices. Through cross-validation among clients and by reducing the weight of abnormal devices, the invention reduces the influence of attacks on the global analysis result, thereby defending against malicious terminal devices and ensuring the security of the system.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Fig. 2 is a schematic diagram of allocating detection tasks to terminal devices in the embodiment of the present invention.
Detailed Description
The technical solution of the present invention is specifically described below with reference to the accompanying drawings and examples.
The invention provides a novel defense method against pollution attacks based on client-side cross-validation. It builds on a data statistical analysis framework for Internet of Things systems and takes into account both the independence of each terminal device's data collection and local computation and the fact that abnormal results fit normal data differently. The method fully considers the distribution characteristics of terminal device data in IID and non-IID environments, sets penalty coefficients by majority voting, and adjusts the weights of the received updates during aggregation. The defense achieves a higher success rate and also applies when the data of the terminal devices are non-IID.
The embodiment of the invention provides a novel data pollution attack defense method for an Internet of Things system, comprising the following steps:
Step 1, the terminal devices participating in training train the analysis result on their local data sets according to the normal procedure, where G_t is the global analysis result of the t-th round, D_local is the data collected locally by a terminal device, and L_{t+1} is the update submitted for the analysis result of round t+1.
The server collects the updates uploaded by the terminal devices and sends the updated analysis results to terminal devices for detection. The server randomly divides the K updates into d parts, and each part is aggregated into a secondary result by averaging its u = K/d updates. Each secondary result is then randomly assigned to e terminal devices (that is, a subset of the terminal devices is selected for evaluation), and each terminal device evaluates m (m < d) analysis results. In the non-IID case, the choice of d is constrained. Because the data held by the terminal devices are uneven, it is difficult to balance the task load of each device; in particular, when d > em, the terminal devices available for evaluation are not sufficient to evaluate all secondary results. To solve this problem, the invention determines d as follows: the server sums the binary matrices received from the terminal devices, which record their data set information, to obtain a count matrix, then sorts the count values of the categories in descending order and assigns the minimum value to d.
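The grouping and averaging described in this step can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `form_secondary_results`, the representation of an update as a flat list of floats, and the round-robin split used when K is not a multiple of d are all assumptions.

```python
import random
from typing import List, Optional, Tuple

Update = List[float]  # assumed representation: a flattened parameter update


def form_secondary_results(
    updates: List[Update], d: int, seed: Optional[int] = None
) -> Tuple[List[List[int]], List[Update]]:
    """Randomly split K updates into d groups and average each group.

    Returns the group membership (indices into `updates`) and the d
    averaged secondary results.
    """
    if d <= 0 or d > len(updates):
        raise ValueError("d must be between 1 and the number of updates")
    rng = random.Random(seed)
    order = list(range(len(updates)))
    rng.shuffle(order)
    # Round-robin slicing gives groups of size K/d (sizes differ by at most
    # one when K is not divisible by d; that case is an assumption here).
    groups = [order[g::d] for g in range(d)]
    dim = len(updates[0])
    results = [
        [sum(updates[k][j] for k in members) / len(members) for j in range(dim)]
        for members in groups
    ]
    return groups, results
```

Returning the group membership alongside the averages lets a later step exclude the owners of a group's updates from evaluating that group.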
Ideally, every updated analysis result would be sent to all terminal devices, which would give a high attack detection success rate. In practice, however, computation and communication costs make it infeasible for every terminal device to download and evaluate all updates. The invention therefore selects only a subset of the terminal devices to evaluate each analysis result. For IID data, a small subset of terminal devices is randomly selected for evaluation; a high success rate can still be achieved provided the selected devices hold a large number of data samples. Since the motivation of the deep learning framework is precisely to collect more information from many terminal devices in order to optimize the analysis results, the premise that a selected terminal device holds a large number of data samples is easily satisfied.
Building on the IID case, the invention adds improvements for the case where the terminal device data are non-IID. With non-IID data, some data categories are unique to certain terminal devices, and analysis results obtained from such data cannot be evaluated directly on the data sets of other terminal devices. The invention designs a protocol that dynamically assigns secondary results to suitable terminal devices, so that each terminal device only needs to evaluate analysis results built on data categories within its own data set. Specifically, the invention uses the following method to judge whether a terminal device is suitable for evaluating a given analysis result:
First, each terminal device informs the server whether its number of data samples in each data category exceeds a corresponding threshold, with 1 meaning the threshold is exceeded and 0 meaning it is not; the feedback of the terminal devices is thus represented as a binary matrix. Because a terminal device's local data may change in each round of training, the binary matrices must be updated accordingly, and the server then selects the terminal devices whose data categories are most similar to those of the secondary result to perform the evaluation task. Reporting a binary matrix instead of the amount of data in each category prevents information about a terminal device's database from being leaked and exploited by an attacker, protecting the data privacy of the terminal devices.
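The binary reporting, the choice of d, and the similarity-based selection of evaluators might look like the sketch below. The helper names are hypothetical, and measuring "similarity" as the number of shared categories is an assumption, since the text does not define the similarity measure.

```python
from typing import Dict, List


def presence_vector(class_counts: List[int], thresholds: List[int]) -> List[int]:
    """Device side: report 1 for each category whose sample count exceeds
    its threshold and 0 otherwise, instead of revealing the raw counts."""
    return [1 if n > t else 0 for n, t in zip(class_counts, thresholds)]


def choose_d(presence_vectors: List[List[int]]) -> int:
    """Server side: sum the binary vectors per category, sort the counts in
    descending order, and take the minimum as d."""
    counts = [sum(column) for column in zip(*presence_vectors)]
    return sorted(counts, reverse=True)[-1]


def rank_evaluators(
    result_classes: List[int], presence_by_device: Dict[str, List[int]]
) -> List[str]:
    """Order devices by how many categories they share with a secondary
    result (overlap count is an assumed similarity measure)."""
    def overlap(vec: List[int]) -> int:
        return sum(a & b for a, b in zip(result_classes, vec))

    return sorted(
        presence_by_device,
        key=lambda dev: (-overlap(presence_by_device[dev]), dev),
    )
```

For example, with three devices reporting `[1, 0, 1]`, `[0, 1, 1]`, and `[0, 1, 1]`, the per-category counts are 1, 2, and 3, so `choose_d` returns 1.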
Step 2, the server distributes the secondary results to the selected terminal devices for evaluation. The task assignment strategy must satisfy three conditions. (1) No update may be assigned to its owner, i.e., the terminal device that submitted it, for evaluation; otherwise an attacker could undermine detection by submitting forged evaluation results. (2) The number of analysis results assigned to the same terminal device for evaluation should not exceed a predetermined threshold; if one terminal device had to evaluate many more updates than the others, waiting for it to submit its results would significantly delay the round of training. (3) The communication cost should be kept as low as possible.
The first two requirements are easily met by adding the corresponding constraints to the assignment function. For the third requirement, the invention groups and aggregates updates to form secondary results. This is why a series of updates is aggregated into one secondary result for evaluation, rather than each update being evaluated individually.
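The first two constraints can be sketched as a simple assignment routine. This is an illustrative sketch only: `assign_evaluators` is a hypothetical name, and preferring the least-loaded eligible devices (with random tie-breaking) is an added assumption to keep the load balanced, not something the text prescribes.

```python
import random
from typing import Dict, Hashable, List, Optional


def assign_evaluators(
    groups: List[List[Hashable]],
    devices: List[Hashable],
    e: int,
    m: int,
    seed: Optional[int] = None,
) -> Dict[int, List[Hashable]]:
    """Pick e evaluators for each secondary result.

    groups[i] lists the devices whose updates were aggregated into secondary
    result i. Constraint (1): owners never evaluate their own group.
    Constraint (2): no device evaluates more than m secondary results.
    """
    rng = random.Random(seed)
    load = {dev: 0 for dev in devices}
    assignment: Dict[int, List[Hashable]] = {}
    for idx, owners in enumerate(groups):
        eligible = [dev for dev in devices if dev not in owners and load[dev] < m]
        if len(eligible) < e:
            raise ValueError(f"not enough eligible evaluators for result {idx}")
        rng.shuffle(eligible)                     # random choice among ties
        eligible.sort(key=lambda dev: load[dev])  # stable sort: least loaded first
        chosen = eligible[:e]
        for dev in chosen:
            load[dev] += 1
        assignment[idx] = chosen
    return assignment
```

The routine raises an error instead of silently relaxing a constraint, so a caller can react by lowering e or raising m.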
In the non-IID case, if some terminal devices hold a large number of data categories, their evaluations can be reused across categories to save the communication cost of assigning tasks to new terminal devices. This process is illustrated in Fig. 2:
For secondary result w_i, the server distributes all evaluation tasks among eight terminal devices, and the task for each category is received by four terminal devices (i.e., e = 4) for evaluation. Since the first terminal device holds data of categories 2 and 4, it is assigned the task of evaluating how accurately the secondary result classifies these two digits. For each of these categories, the server then only needs to select three other terminal devices that hold the data, rather than four. For the digit 1, for example, the evaluations are assigned to the second, fourth, sixth, and seventh terminal devices.
Each terminal device evaluates the secondary results and generates a report. In evaluating malicious updates, the invention mainly considers two attacks. In the first, part of the samples are misclassified into a specific class with high probability. In the second, the accuracy of the overall analysis result is significantly reduced. In the invention, each terminal device that receives a detection task fits its local data samples with each assigned secondary result and evaluates both the classification accuracy of each secondary result (to counter the second attack) and whether samples with particular features tend to be misclassified in a targeted direction (to counter the first attack). If only a few of the terminal devices are attackers, the evaluation results will clearly be highly accurate. However, an attacker can launch a Sybil attack, which increases the proportion of malicious terminal devices.
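A device-side evaluation covering both attack types could be sketched as below. The function name, the `predict` callable standing in for a secondary result, and both threshold values are hypothetical; the text does not specify how accuracy or targeted misclassification is thresholded.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Sequence


def evaluate_secondary_result(
    predict: Callable[[object], Hashable],
    samples: Sequence[object],
    labels: Sequence[Hashable],
    acc_threshold: float = 0.8,   # assumed value, not from the source
    bias_threshold: float = 0.1,  # assumed value, not from the source
) -> Dict[Hashable, int]:
    """Return a binary report per locally held category: 1 if the secondary
    result fits the local samples, 0 if it looks anomalous."""
    report: Dict[Hashable, int] = {}
    for cls in sorted(set(labels)):
        preds = [predict(x) for x, y in zip(samples, labels) if y == cls]
        # Second attack type: overall accuracy on this category drops.
        accuracy = sum(p == cls for p in preds) / len(preds)
        # First attack type: errors concentrate on one particular wrong class.
        wrong = Counter(p for p in preds if p != cls)
        top_wrong_share = max(wrong.values()) / len(preds) if wrong else 0.0
        fits = accuracy >= acc_threshold and top_wrong_share <= bias_threshold
        report[cls] = 1 if fits else 0
    return report
```

Only categories present in the local labels appear in the report, which matches the idea that a device evaluates results only within the range of its own data set.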
In the embodiment of the invention, the proportion of attacker-controlled terminal devices among all terminal devices is denoted p, and p_evd denotes the probability that a poisoned secondary result is assigned to malicious terminal devices, calculated according to the following formula:
[Formula image BDA0002823519090000051 not reproduced in this text]
where K is the total number of updates, u is the number of updates aggregated in each group, i identifies the i-th update in the aggregate (i = 1, 2, …, u), and e is the number of users assigned to each sub-model.
For example, suppose the server selects 100 terminal devices for detection in the current round, 10 of which are malicious, and each update is delegated to 3 terminal devices. Calculating p_evd with the above formula, the probability that a secondary result containing a poisoned update is delegated to a malicious terminal device is 0.12. The probability that it evades detection, i.e., that all 3 clients evaluating the update are malicious, is 0.0003, which is negligible. In fact, finding only a fraction of all poisoned updates is sufficient to mitigate pollution attacks. The invention is therefore effective even if an attacker compromises multiple clients.
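To give a feel for why evasion is unlikely, the sketch below computes the chance that every evaluator of one secondary result is malicious under a simplified model: evaluators are drawn uniformly without replacement from all devices. This model is an assumption and is not the formula from the missing image, so its output need not match the figures quoted above; what it shows is that the probability falls quickly as e grows.

```python
from math import comb


def prob_all_evaluators_malicious(n_devices: int, n_malicious: int, e: int) -> float:
    """Probability that all e evaluators, drawn uniformly without replacement
    from n_devices, belong to the n_malicious attacker-controlled devices.
    Simplified hypergeometric model (an assumption, see lead-in)."""
    if not 0 <= n_malicious <= n_devices or not 0 <= e <= n_devices:
        raise ValueError("invalid population or sample size")
    return comb(n_malicious, e) / comb(n_devices, e)


# With 100 devices, 10 of them malicious, and e = 3 evaluators per result,
# the simplified model gives a probability below one in a thousand.
p = prob_all_evaluators_malicious(100, 10, 3)
```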
Step 3, the server collects the feedback of the terminal devices that performed evaluation tasks and aggregates the global analysis result according to this feedback. Typically, the analysis result is updated by averaging all uploaded results. The invention instead uses a weighted average, in which the weight of each update is determined by the evaluation results, reducing the influence of malicious updates on the global analysis result. The invention adopts a majority voting strategy: each terminal device expresses its evaluation as a binary matrix, in which 1 indicates that the evaluated analysis result classifies the relevant data samples accurately and 0 indicates that it does not. The server sums these matrices to calculate the number of times each update is judged to be potentially malicious.
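The server-side counting can be sketched as follows. The data layout (a mapping from secondary-result index to the binary vectors returned by its evaluators) and the rule that a report counts as a flag when it contains at least one 0 are assumptions made for illustration.

```python
from typing import Dict, List


def count_flags(reports: Dict[int, List[List[int]]]) -> Dict[int, int]:
    """For each secondary result, count how many evaluators reported it as
    potentially malicious (assumed rule: the report contains at least one 0)."""
    return {
        result_id: sum(1 for vec in vectors if 0 in vec)
        for result_id, vectors in reports.items()
    }


# Secondary result 0 is flagged by two of its three evaluators,
# secondary result 1 by none.
flags = count_flags({
    0: [[1, 1], [1, 0], [0, 0]],
    1: [[1, 1], [1, 1], [1, 1]],
})
```

Because a secondary result aggregates several updates, the count obtained for a group applies to every update in that group.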
Step 4, to adjust the weights of the updates, the invention adds a penalty coefficient to each update according to the count; the smaller the coefficient, the lower the share of the update in the aggregation. Because reports may include false positives, the penalty should be as mild as possible when the count is low. Based on the returned results, the invention designs a hierarchical method to determine the coefficient c, guided by three considerations: (1) when the reports are highly credible, the coefficient should drop sharply as the number of abnormal reports increases; (2) if more than half of the terminal devices evaluating the same secondary result report it as abnormal, it should be discarded during aggregation; (3) if only one client reports an anomaly, the penalty should not be too severe, since that report may be inaccurate. A weighting coefficient is therefore set that is positively correlated with the total number of terminal devices evaluating the secondary result and negatively correlated with the number of potentially malicious participants, calculated as follows:
[Formula image BDA0002823519090000061 not reproduced in this text]
where e is the number of terminal devices evaluating the result, and the second quantity in the formula is the number of devices that report the update as potentially malicious. When only one terminal device reports the i-th update as malicious, i.e., when this number equals 1, the penalty coefficient of that update is initialized to 0.5;
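Since the penalty function itself is not available in this text, the following is only a hypothetical function that satisfies the stated properties: a coefficient of 0.5 for a single report, 0 once more than half of the e evaluators report an anomaly, and values in between that decrease with the report count and increase with e. The linear shape and the coefficient of 1.0 for zero reports are assumptions, not the patent's formula.

```python
def penalty_coefficient(e: int, n_reports: int) -> float:
    """Hypothetical tiered penalty coefficient (NOT the patent's formula).

    e         -- number of devices that evaluated the secondary result
    n_reports -- number of those devices that reported it as anomalous
    """
    if e <= 0 or not 0 <= n_reports <= e:
        raise ValueError("need e > 0 and 0 <= n_reports <= e")
    if n_reports == 0:
        return 1.0   # assumed: no report means full weight
    if n_reports > e / 2:
        return 0.0   # majority reports an anomaly: discard the result
    # Starts at 0.5 for one report and shrinks linearly with more reports;
    # for a fixed report count it is larger when more devices evaluated.
    return 0.5 * (1.0 - 2.0 * (n_reports - 1) / e)
```

With e = 8, this sketch yields 0.5 for one report, 0.375 for two, and 0 for five or more.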
Step 5, the aggregation of the secondary results is given by the expression:
[Formula image BDA0002823519090000065 not reproduced in this text]
where t is the number of rounds updated so far, d is the number of secondary results, w_{t+1} is the updated global analysis result of round t+1 that the server obtains by aggregating the grouped updates it received, w_t is the server's global result of round t, and c_i^{t+1} is the penalty coefficient applied to the i-th secondary result w_i^{t+1} of round t+1. The new global analysis result is obtained by adding, to the original global analysis result, the sum of the products of all secondary results containing updates and their corresponding penalty coefficients.
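Read literally, the verbal description above suggests an aggregation rule of the following form. This is a reconstruction from the prose alone: the original formula image is not available here, and whether it includes a normalization factor (for example a 1/d average, which the phrase "calculating an average value" might suggest) is an open assumption.

```latex
\[
  w_{t+1} \;=\; w_t \;+\; \sum_{i=1}^{d} c_i^{\,t+1}\, w_i^{\,t+1}
\]
```

Here d is the number of secondary results, and a coefficient of 0 removes the corresponding secondary result from the sum entirely.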
The novel defense method provided by the invention is used to complete the optimization of the analysis result. Repeating the above process a number of times yields the final analysis result.
Referring to FIG. 1, the embodiment illustrates a typical statistical optimization procedure, in which Δ_i is the analysis result submitted to the server by the i-th terminal device participating in the training process, W_i is the i-th secondary result that the server forms by grouping and aggregating the received updates, and c_i is the penalty coefficient applied to the secondary result W_i. The corresponding signal transmission process is as follows:
① The user trains a model locally and uploads it to the server
② The server aggregates all updates to generate sub-models
③ The server sends the sub-models to the users
④ The user evaluates the received sub-models and generates a report
⑤ The user returns the report to the server
⑥ The server calculates penalty coefficients for all sub-models
⑦ The server aggregates the global model, obtaining W, the sum of the products of each secondary result W_i received by the server and its corresponding penalty coefficient c_i
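The final aggregation in the last step of the flow can be sketched as a weighted sum. The sketch treats each secondary result W_i as an update to be added to the current global result and applies no normalization; both choices are assumptions drawn from a literal reading of the description, and `aggregate` is a hypothetical name.

```python
from typing import List, Sequence


def aggregate(
    global_result: Sequence[float],
    secondary_results: Sequence[Sequence[float]],
    penalties: Sequence[float],
) -> List[float]:
    """Return the global result plus the sum of c_i * W_i over all
    secondary results (unnormalized, see the assumptions above)."""
    if len(secondary_results) != len(penalties):
        raise ValueError("one penalty coefficient is needed per secondary result")
    new_global = list(global_result)
    for w_i, c_i in zip(secondary_results, penalties):
        for j, value in enumerate(w_i):
            new_global[j] += c_i * value
    return new_global


# The second secondary result has coefficient 0 and is effectively discarded.
updated = aggregate([1.0, 1.0], [[2.0, 0.0], [0.0, 4.0]], [0.5, 0.0])
```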
For reference, the implementation procedures for the cases where the data collected by different devices are IID and non-IID are given below.
When the collected data are IID, the method comprises the following steps:
Step 1, after collecting the K updates of all terminal devices in the current round of training, the server randomly divides the updates into d groups, each containing u = K/d updates, and aggregates the u updates of each group, forming d secondary results;
Step 2, in the t-th round of training, for each secondary result, e terminal devices are randomly selected from the set S_t of all terminal devices according to a preset rule and authorized, and the secondary result is distributed to them for detection; each terminal device evaluates m (m < d) secondary results;
Step 3, the server collects the terminal devices' evaluations of the secondary results. The invention adopts majority voting, requiring each terminal device to submit a binary matrix reporting the performance of each secondary result it evaluated; this binary matrix states whether the data samples of the relevant categories are accurately classified; from these binary matrices, the server can easily count the number of times each update is judged to be potentially malicious;
Step 4, after obtaining the estimated degree of maliciousness of the result uploaded by each terminal, the server adjusts the weight of each update in the subsequent averaging, so as to reduce the influence of malicious results on the global analysis result; to adjust the weights in a principled way, a penalty coefficient is added to each update based on the number of times it was judged to be malicious; the invention uses a hierarchical method which, based on the observed returned data, applies the penalty function
[Formula image BDA0002823519090000071 not reproduced in this text]
to calculate the penalty coefficient of the i-th update of round t, where e is the number of terminal devices evaluating the secondary result and the second quantity in the formula is the number of terminal devices that report the i-th update of round t as potentially malicious. When only one terminal device reports the i-th update as malicious, i.e., when this number equals 1, the penalty coefficient is initialized to 0.5;
Step 5, the average is calculated over the updates with their adjusted weights, and the analysis results are aggregated. The aggregation of the analysis results in the invention is formulated as
[Formula image BDA0002823519090000076 not reproduced in this text]
where w_{t+1} is the global analysis result of round t+1, i.e., the updated global analysis result; w_t is the global analysis result of round t, i.e., the analysis result before updating; d is the number of secondary results formed by grouping and aggregating the updates; w_i^{t+1} is the i-th secondary result in the (t+1)-th round of training; and c_i^{t+1} is the penalty coefficient corresponding to that secondary result in the (t+1)-th round of training.
In the non-independent and identically distributed (non-IID) case, the method comprises the following steps:
step 1, forming the secondary results to be assigned to the terminal devices for detection. Because the terminal devices' datasets are imbalanced across categories, the parameters used when aggregating updates into secondary results are chosen differently than in the IID case. Here the value of d (the number of secondary results formed) is determined by a dedicated calculation: the server first sums the binary vectors obtained from the terminal devices, where each entry of a vector indicates whether the device holds collected data of the corresponding category, or whether its update to the analysis result covers that category; the server then sorts the resulting per-category counts in descending order and takes the minimum value, i.e. the count for the data category covered by the fewest updates; this minimum is used as d, and the updates are grouped and aggregated to obtain d secondary results;
step 2, selecting suitable terminal devices for the detection task; because the datasets held by different terminal devices are not independent and identically distributed, samples of some categories may be absent from some terminal devices' datasets; the invention provides a protocol to address this, dynamically assigning secondary results to suitable terminal devices for evaluation; under this protocol, each terminal device first reports to the server whether the amount of locally collected data in each data category exceeds a certain threshold, so that the server can assign each secondary result to terminal devices holding a large amount of collected data in the same data categories; the server performs this process once in every update round, selecting a new batch of terminal devices according to the update results, which realizes the dynamic assignment of secondary results;
to distribute the detection and evaluation tasks for the secondary results reasonably, the unevenness in the data categories collected by the terminal devices can be exploited: more evaluation tasks are assigned to terminal devices that hold more data categories, which reduces the communication cost;
steps 3-5, the server aggregates the terminal devices' updates to form a new overall analysis result; because the weight adjustment and aggregation operations are the same as in the independent and identically distributed case, refer to steps 3-5 of that case above; they are not repeated here.
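The non-IID steps 1 and 2 can be illustrated with a short Python sketch. It is written under assumptions: the function names, the dictionary of per-terminal threshold flags, and the idea that each secondary result is tagged with the list of categories it covers are hypothetical conveniences, not details given in the text.

```python
import random
from typing import Dict, List, Sequence


def choose_d(category_vectors: List[Sequence[int]]) -> int:
    """Sum the terminals' binary category vectors and return the smallest count."""
    counts = [sum(column) for column in zip(*category_vectors)]
    # Sorting in descending order puts the least-covered category last.
    return sorted(counts, reverse=True)[-1]


def pick_evaluators(
    result_categories: Sequence[int],
    sufficient: Dict[str, Sequence[int]],
    e: int,
    rng: random.Random,
) -> List[str]:
    """Pick up to e terminals that hold enough data in every category of a result.

    `sufficient[tid][c]` is 1 when terminal `tid` reported that its local data
    for category `c` exceeds the threshold (hypothetical encoding).
    """
    eligible = [
        tid
        for tid, flags in sufficient.items()
        if all(flags[c] for c in result_categories)
    ]
    return rng.sample(eligible, min(e, len(eligible)))


# Three terminals, three data categories.
vectors = [[1, 1, 0], [1, 0, 1], [1, 1, 1]]
d = choose_d(vectors)  # per-category counts are [3, 2, 2]

flags = {"a": [1, 1, 0], "b": [1, 0, 1], "c": [1, 1, 1]}
evaluators = pick_evaluators([0, 2], flags, e=2, rng=random.Random(0))
```

Re-running `pick_evaluators` each round with freshly reported flags corresponds to the per-round reselection described in step 2. The sketch does not handle a category that no terminal holds, in which case the minimum count would be 0.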
In a specific implementation, a person skilled in the art can use computer software technology to run the above process automatically. System devices that implement the method, such as a computer-readable storage medium storing a computer program corresponding to the technical solution of the invention, or a computer device that includes and runs such a program, also fall within the scope of protection of the invention.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (6)

1. A novel data pollution attack defense method for an Internet of things system is characterized by comprising the following steps:
step 1, after collecting the K updates of all terminals in the current round of training, the server randomly divides the updates into d groups, each containing u = K/d updates; d secondary results are formed by aggregating the u updates of each group; wherein d is a preset value;
step 2, in the t-th round of optimizing the result, for each secondary result, e terminals are randomly selected according to a preset rule from the set $S_t$ of all terminals and authorized, and each terminal detects and evaluates m received secondary results, where m < d;
Step 3, the server collects the terminals' evaluation reports on the secondary results, wherein majority voting is adopted, each terminal is required to submit a binary matrix reporting the performance of each result it evaluated, and the binary matrix declares whether the data samples of the relevant categories accurately fit the corresponding statistical results; from the collected binary matrices, the server counts the number of times a given update is judged to be potentially malicious;
step 4, after the evaluation of how likely each update is to be malicious is obtained, the server adjusts the weight of each result in the subsequent averaging, assigning a penalty coefficient to each update based on the number of times it is judged to be malicious;
step 5, a weighted average is calculated using the penalty coefficients as weights, and the secondary results are aggregated as follows,
Figure RE-FDA0003136988690000011
wherein t is the update round number, d represents the number of terminal devices, $w_{t+1}$ is the global analysis result of round t+1 formed after the server groups and aggregates the received updates, $w_t$ is the server's global result of round t, and $c_i^{t+1}$ denotes the penalty coefficient applied to the secondary result $w_i^{t+1}$ of the i-th terminal device in round t+1.
2. The novel data pollution attack defense method for the internet of things system according to claim 1, characterized in that: a layered method is adopted in which a penalty function calculates, from the observed returned data, the penalty coefficient of the i-th update in the current round
Figure RE-FDA0003136988690000014
3. The novel data pollution attack defense method for the internet of things system according to claim 1 or 2, characterized in that: the method is used in an independent and identically distributed (IID) environment or a non-IID environment.
4. The novel data pollution attack defense method for the internet of things system according to claim 3, characterized in that: for the non-IID case, the following protocol is established to dynamically assign the secondary results to suitable terminal devices for evaluation,
each terminal device first reports to the server whether the amount of locally collected data in each data category exceeds a certain threshold, so that the server can assign each secondary result to terminal devices holding a large amount of collected data in the same data categories for evaluation;
the server performs this process once in every update round, selecting a new batch of terminal devices according to the update results, which realizes the dynamic assignment of the secondary results.
5. The novel data pollution attack defense method for the internet of things system according to claim 3, characterized in that: in step 1, for the non-IID case, the server sums the binary matrices recording dataset information that it receives from the terminal devices to obtain a count matrix, then sorts the values for each category in the count matrix in descending order, and assigns the smallest value to d.
6. The novel data pollution attack defense method for the internet of things system according to claim 3, characterized in that: for the non-IID case, the following protocol is established to dynamically assign the secondary results to suitable terminal devices for evaluation,
each terminal device first reports to the server whether the amount of locally collected data in each data category exceeds a certain threshold, so that the server can assign each secondary result to terminal devices holding a large amount of collected data in the same data categories for evaluation;
the server performs this process once in every update round, selecting a new batch of terminal devices according to the update results, which realizes the dynamic assignment of the secondary results.
CN202011443807.4A 2020-12-08 2020-12-08 Novel data pollution attack defense method for Internet of things system Active CN113239351B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011443807.4A CN113239351B (en) 2020-12-08 2020-12-08 Novel data pollution attack defense method for Internet of things system

Publications (2)

Publication Number Publication Date
CN113239351A true CN113239351A (en) 2021-08-10
CN113239351B CN113239351B (en) 2022-05-13

Family

ID=77129904

Country Status (1)

Country Link
CN (1) CN113239351B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant