CN117009960B - Data security cleaning method based on artificial intelligence - Google Patents

Data security cleaning method based on artificial intelligence

Info

Publication number
CN117009960B
Authority
CN
China
Prior art keywords
node
data
primary
global data
primary node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311102120.8A
Other languages
Chinese (zh)
Other versions
CN117009960A (en)
Inventor
辛继胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Industry Technical College
Original Assignee
Guangdong Industry Technical College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Industry Technical College
Priority to CN202311102120.8A
Publication of CN117009960A
Application granted
Publication of CN117009960B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/556 Detecting local intrusion or implementing counter-measures involving covert channels, i.e. data leakage between processes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/552 Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/604 Tools and structures for managing or administering access control systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Automation & Control Theory (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of data cleaning, and in particular to a data security cleaning method based on artificial intelligence. The method comprises the following steps: a primary node cleans the data it stores and reports its cleaning task to the secondary node that manages it; after the primary node finishes cleaning, it sends cleaning completion information for the current cleaning task to the secondary node; after the primary node receives the local data to-be-integrated files sent by the secondary node, it integrates and updates them to form a global data to-be-integrated file, and then sends a first global data request to the secondary node that manages it; and after the primary node receives the remaining global data to-be-integrated files, it integrates and updates them to form the global data file of the cleaning task, and then sends the global data file to the data master node through the secondary node. The method allows distributed data cleaning to be implemented more securely.

Description

Data security cleaning method based on artificial intelligence
Technical Field
The application relates to the technical field of data cleaning, in particular to a data security cleaning method based on artificial intelligence.
Background
Data cleaning refers to the inspection, modification, conversion, and integration of data collected from different sources in order to improve the quality and usability of the data. Its purpose is to eliminate errors, duplicates, inconsistencies, and missing values so that the data can be analyzed and mined effectively.
Because the amount of data on the internet is huge, a single machine often cannot hold or process it. Distributed data cleaning techniques have therefore emerged, which spread the work across the computing resources of multiple nodes and thereby improve the efficiency and effectiveness of data cleaning.
However, distributed data cleaning also brings problems and challenges, and at present it cannot be implemented securely.
Disclosure of Invention
To solve, or at least partially solve, the above technical problem, the application provides a data security cleaning method based on artificial intelligence, which allows distributed data cleaning to be implemented more securely.
In a first aspect, the present application provides an artificial intelligence based data security cleaning method, performed by a primary node, comprising:
the primary node cleans the data it stores and reports its cleaning task to the secondary node that manages it;
after the primary node finishes cleaning, it sends cleaning completion information for the current cleaning task to the secondary node that manages it, so as to request the local data to-be-integrated files of the same cleaning task that the secondary node has received;
after the primary node receives the local data to-be-integrated files sent by the secondary node, it integrates and updates them according to the data it has cleaned, to form a global data to-be-integrated file;
after the global data to-be-integrated file is formed, the primary node sends a first global data request to the secondary node that manages it, so that the secondary node requests the remaining global data to-be-integrated files of the cleaning task from the data master node; the data master node vets the primary node through a preset request window identification model to decide whether to send the remaining global data to-be-integrated files;
after the primary node receives the remaining global data to-be-integrated files, it integrates and updates them according to the global data to-be-integrated file it has formed, to form the global data file of the cleaning task;
and after the global data file is formed, the primary node sends it to the data master node through the secondary node.
Optionally, the method further comprises:
after receiving a first local data transfer instruction sent by the secondary node, the primary node packages the data it has cleaned into a local data to-be-integrated file and sends the file to the secondary node;
and after receiving a second local data transfer instruction sent by the secondary node, the primary node sends its global data to-be-integrated file to the secondary node.
In a second aspect, the present application provides an artificial intelligence based data security cleaning method, performed by a secondary node, comprising:
the secondary node reports the cleaning tasks of all primary nodes it manages to the data master node;
after the secondary node receives cleaning completion information for the current cleaning task from a primary node, it determines whether any other primary node under it has not yet finished the same cleaning task;
if the primary node is the last one under the secondary node to finish that cleaning task, the secondary node sends the primary node the local data to-be-integrated files of that cleaning task that it has received;
after receiving the first global data request, the secondary node sends a second global data request to the data master node, the second global data request including the identifier of the primary node that initiated the global data request;
and after the secondary node receives the remaining global data to-be-integrated files sent by the data master node, it forwards them, according to a preset rule, to a primary node that it manages.
Optionally, the method further comprises:
after the secondary node receives a risk control signal sent by the data master node, it sends a second local data transfer instruction to the risk-controlled primary node so as to obtain that node's global data to-be-integrated file;
and the secondary node also manages a dedicated isolation node, and forwards the global data to-be-integrated file of the risk-controlled primary node to the isolation node for quarantine.
Optionally, the method further comprises:
after receiving a global data transfer instruction sent by the data master node, the secondary node sends a second local data transfer instruction to the primary node that sent the global data request so as to obtain that node's global data to-be-integrated file, and forwards the file to the data master node.
Optionally, forwarding the remaining global data to-be-integrated files, according to a preset rule, to a primary node managed by the secondary node includes:
obtaining the number of global data requests sent so far by the primary node that sent the current global data request, and comparing it with the average number of global data requests sent by the primary nodes under the secondary node;
when the number of global data requests sent by the requesting primary node exceeds the average number sent by the other primary nodes by more than a preset value, sending a second local data transfer instruction to the requesting primary node to extract its data, and sending the global data to-be-integrated files from the data master node and from the requesting primary node to another primary node managed by the secondary node for global data merging;
otherwise, forwarding the remaining global data to-be-integrated files sent by the data master node directly to the requesting primary node.
In a third aspect, the present application provides an artificial intelligence based data security cleaning method, performed by a data master node, including:
the data master node obtains server operation data of the primary nodes through a transparent monitoring layer;
when the data master node receives a second global data request sent by a secondary node, it applies a preset request window identification model to the server operation data of the primary node that initiated the global data request, so as to judge whether that primary node is in a data request window period;
if the primary node is judged to be in the data request window period, the data master node checks whether the primary node that initiated the global data request is, globally, the last primary node to finish the same cleaning task;
if the primary node is judged to be the last of all primary nodes to finish the same cleaning task, the data master node sends the remaining global data to-be-integrated files of that cleaning task that it has received to the secondary node that manages the primary node.
Optionally, the method further comprises: if the primary node is not the last of all primary nodes to finish the same cleaning task, the data master node sends a global data transfer instruction to the secondary node.
Optionally, the method further comprises:
if the primary node is judged not to be in the data request window period, the data master node sends a risk warning signal to staff and sends a risk control signal to the secondary node, indicating which primary node is under risk control.
Optionally, the server operation data includes: the change in disk read-write utilization, the change in network upload rate, the change in network download rate, and the change in CPU utilization within a preset time window;
the data is input into the request window identification model to identify whether the primary node is in the state of waiting for data after cleaning;
when the primary node is identified as being in the state of waiting for data after cleaning, it is judged to be in a data request window period.
Compared with the prior art, the technical scheme provided by the application has the following advantages:
In the prior art there are two mainstream data cleaning approaches. In the first, all data is concentrated on a single server node for processing. In the second, multiple nodes are connected to a master database so that data can be scheduled among them; every node has full access rights to the master database and can therefore see the data of other nodes.
However, because the identity of content providers on the internet is unknown, data obtained from the whole network may contain disguised malicious programs such as viruses, trojans, and worms.
If the prior art is used to clean web-wide data directly, then as soon as one node triggers a malicious program and falls under its control, all stored data may be leaked and all nodes may eventually be controlled by the malicious program.
In the data security cleaning method based on artificial intelligence provided by the application, each primary node that performs data cleaning cleans only the data stored on itself, while data scheduling is performed by the secondary nodes and the data master node, which do not perform data cleaning tasks.
Therefore, even if a primary node performing a data cleaning task is controlled by a malicious program, that node does not know what data the other nodes hold, so any leak is confined to the compromised primary node and the rest of the data in the global system is not exposed. Moreover, the data master node applies risk control, through the preset request window identification model, to primary nodes that request data held outside their own secondary node, which prevents a compromised primary node from tricking the data master node into releasing data by fabricating a data cleaning task.
Therefore, the data security cleaning method based on artificial intelligence provided by the embodiment of the application can implement the distributed data cleaning technology more safely.
Drawings
FIG. 1 is a first signaling diagram of the artificial intelligence based data security cleaning method provided in an embodiment of the present application;
FIG. 2 is a second signaling diagram of the artificial intelligence based data security cleaning method provided in an embodiment of the present application;
FIG. 3 is a third signaling diagram of the artificial intelligence based data security cleaning method provided in an embodiment of the present application;
FIG. 4 is a fourth signaling diagram of the artificial intelligence based data security cleaning method provided in an embodiment of the present application;
FIG. 5 is a fifth signaling diagram of the artificial intelligence based data security cleaning method provided in an embodiment of the present application.
Detailed Description
The technical solutions in the present application will be described below with reference to the accompanying drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the application. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
The system structure for data cleaning in the embodiment of the application comprises primary nodes, secondary nodes, and a data master node.
The security policy of each secondary node is configured so that it communicates only with the primary nodes it manages and with the data master node.
Each primary node communicates with only one secondary node, i.e., each primary node belongs to exactly one secondary node. Each secondary node can manage a plurality of primary nodes.
Preferably, a secondary node and the primary nodes it manages are located in the same local area network. In the method provided by the application, the most frequent transfers are those of local data to-be-integrated files, and with this arrangement they take place mainly inside the local area network. Because the speed of the internal network is not capped by the bandwidth of the external network, these transfers are faster than they would be over the external network, which improves the data cleaning speed of the whole distributed cleaning system.
The secondary nodes communicate with the data master node over a VPN.
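The communication constraints described above can be illustrated with a short Python sketch. It is only a minimal model under stated assumptions: the class name `Topology`, the identifier `MASTER`, and the node ids are hypothetical and do not come from the application, and the sketch models only which pairs of nodes may talk to each other, not the VPN or LAN transport.

```python
from dataclasses import dataclass, field

MASTER = "master"  # hypothetical identifier for the data master node


@dataclass
class Topology:
    # Maps each primary node id to the single secondary node that manages it.
    parent_of: dict[str, str] = field(default_factory=dict)

    def attach(self, primary: str, secondary: str) -> None:
        """Register a primary node under exactly one secondary node."""
        if primary in self.parent_of:
            raise ValueError(f"{primary} already belongs to {self.parent_of[primary]}")
        self.parent_of[primary] = secondary

    def may_communicate(self, a: str, b: str) -> bool:
        """Allow only primary <-> its own secondary, and secondary <-> master."""
        secondaries = set(self.parent_of.values())
        for x, y in ((a, b), (b, a)):
            if self.parent_of.get(x) == y:
                return True  # a primary node talking to its managing secondary node
            if x in secondaries and y == MASTER:
                return True  # a secondary node talking to the data master node
        return False
```

Under this model a primary node can reach neither another primary node nor the data master node directly, which is the property the later risk-control steps rely on.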
As shown in FIG. 1, the data security cleaning method based on artificial intelligence provided in the embodiment of the application includes:
the primary node reports the cleaning task it executes to the secondary node that manages it;
the secondary node reports the cleaning tasks of all primary nodes it manages to the data master node.
The cleaning task is determined by the network data stored in the primary node. For example, for the text of novels crawled from the network, cleaning tasks may be divided by novel title: data with the same title belongs to the same cleaning task, and the title serves as the identifier that distinguishes cleaning tasks. Cleaning tasks may also be divided by the website that provided the web page data, with the website name serving as the identifier.
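As an illustration of the two ways of deriving a cleaning task identifier mentioned above, the following Python sketch groups crawled records by title or by website. The record fields `title` and `url` and the `title:`/`site:` prefixes are assumptions made for the example; the application does not specify a record format.

```python
from urllib.parse import urlparse


def task_id_by_title(record: dict) -> str:
    """Use the (normalized) novel title as the cleaning task identifier."""
    return "title:" + record["title"].strip().lower()


def task_id_by_site(record: dict) -> str:
    """Use the host name of the source website as the cleaning task identifier."""
    return "site:" + (urlparse(record["url"]).hostname or "")


def group_by_task(records: list[dict], key_fn) -> dict[str, list[dict]]:
    """Group records so that each group corresponds to one cleaning task."""
    groups: dict[str, list[dict]] = {}
    for record in records:
        groups.setdefault(key_fn(record), []).append(record)
    return groups
```

Either function can be passed as `key_fn`; records that map to the same identifier are treated as belonging to the same cleaning task.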
The data master node obtains server operation data of the primary nodes through the transparent monitoring layer.
Specifically, the transparent monitoring layer is deployed on each primary node to obtain that node's server operation data. In the embodiment of the application, the transparent monitoring layer is the Zabbix monitoring tool, which monitors the server operation data of the primary node and its changes, and sends the server operation data to the data master node.
The primary node then cleans the data it stores. This data is either network data that the primary node crawled from the internet in a preliminary data acquisition stage, or data obtained by dividing all network data into a plurality of file packages in advance and uploading the packages to different primary nodes.
The data cleaning methods include, but are not limited to, extracting web page text content from a web page file with XPath expressions, removing advertisement or watermark text from the web page text content with regular expressions, and de-duplicating the web page text content with the pandas data processing library.
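As a minimal illustration of the three cleaning steps named above, the following Python sketch uses only the standard library: the limited XPath subset of `xml.etree.ElementTree` stands in for a full XPath engine, and an insertion-ordered dictionary stands in for pandas de-duplication. The page layout, the `content` class name, and the advertisement pattern are assumptions for illustration and are not taken from the application; real web pages are often not well-formed XML and would need an HTML parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical advertisement/watermark pattern: "[AD: ...]" tags and bare "www." links.
AD_PATTERN = re.compile(r"\[AD:[^\]]*\]|www\.\S+")


def extract_paragraphs(page: str) -> list[str]:
    """Select the text of <p> elements inside <div class="content"> via an XPath-style query."""
    root = ET.fromstring(page)  # assumes the page is well-formed XML/XHTML
    nodes = root.findall(".//div[@class='content']/p")
    return ["".join(node.itertext()) for node in nodes]


def clean_page(page: str) -> list[str]:
    """Extract paragraphs, strip advertisement text, and drop duplicates (order preserved)."""
    seen: dict[str, None] = {}
    for text in extract_paragraphs(page):
        text = AD_PATTERN.sub("", text)   # remove advertisement or watermark text
        text = " ".join(text.split())     # normalize whitespace
        if text:
            seen.setdefault(text, None)   # dict keys keep first-seen order
    return list(seen)
```

With pandas available, the de-duplication step could instead be written with `Series.drop_duplicates()`; the dictionary is used here only to keep the sketch dependency-free.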
After the primary node finishes the current data cleaning task, it sends cleaning completion information for the current cleaning task to the secondary node that manages it.
The secondary node first determines whether any other primary node under it has not yet finished the same cleaning task. In the embodiment of the present application, "the same cleaning task" means a cleaning task with the same identifier.
Referring to FIG. 2, if the secondary node determines that the primary node is not the last one under it to finish the cleaning task, it sends a first local data transfer instruction to the primary node.
After receiving the first local data transfer instruction, the primary node packages the data it has cleaned into a local data to-be-integrated file and sends the file to the secondary node.
The secondary node stores the local data to-be-integrated files according to the identifier of the cleaning task.
If the secondary node determines that the primary node is the last one under it to finish the cleaning task, it sends all the local data to-be-integrated files of that cleaning task that it has stored to the primary node.
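The bookkeeping that the secondary node performs in this step can be sketched in Python as follows. This is a simplified, in-memory model under stated assumptions: the class name, method names, and the string action labels are hypothetical, and files are represented by plain strings rather than real transfers.

```python
from collections import defaultdict


class SecondaryNode:
    def __init__(self, tasks_by_primary: dict[str, set[str]]):
        # task id -> primary nodes that have not yet reported completion
        self.pending: dict[str, set[str]] = defaultdict(set)
        for primary, tasks in tasks_by_primary.items():
            for task in tasks:
                self.pending[task].add(primary)
        # task id -> local data to-be-integrated files collected so far
        self.stored: dict[str, list[str]] = defaultdict(list)

    def on_cleaning_complete(self, primary: str, task: str):
        """Decide how to answer a primary node's cleaning completion message."""
        self.pending[task].discard(primary)
        if self.pending[task]:
            # Other primary nodes are still cleaning: collect this node's data.
            return ("first_local_data_transfer_instruction", None)
        # This node finished last: hand it everything stored for the task.
        return ("send_local_files", list(self.stored[task]))

    def on_local_file(self, task: str, file: str) -> None:
        """Store a local data to-be-integrated file under its task identifier."""
        self.stored[task].append(file)
```

In this model only the last primary node to finish a task under a given secondary node ever receives other nodes' cleaned data for that task.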
After the primary node receives the local data to-be-integrated files sent by the secondary node, it integrates and updates them according to the data it has cleaned, to form a global data to-be-integrated file.
After the global data to-be-integrated file is formed, the primary node sends a first global data request to the secondary node that manages it. After the secondary node receives the first global data request, it sends a second global data request to the data master node; the second global data request includes the identifier of the primary node that initiated the global data request.
After the data master node receives the second global data request sent by the secondary node, it parses the request to obtain the identifier of the primary node that initiated the global data request.
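The relationship between the first and second global data requests can be illustrated with two small message types. The field names and the `wrap_request` helper are hypothetical; the application states only that the second request carries the identifier of the initiating primary node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstGlobalDataRequest:
    task_id: str  # identifier of the cleaning task being merged


@dataclass(frozen=True)
class SecondGlobalDataRequest:
    task_id: str
    primary_id: str    # identifier of the primary node that initiated the request
    secondary_id: str  # secondary node forwarding the request (an assumed extra field)


def wrap_request(request: FirstGlobalDataRequest,
                 primary_id: str,
                 secondary_id: str) -> SecondGlobalDataRequest:
    """Build the request the secondary node sends to the data master node."""
    return SecondGlobalDataRequest(request.task_id, primary_id, secondary_id)
```

In this sketch the secondary node supplies `primary_id` itself rather than reading it from the message body. This is a design assumption of the example, chosen so that a primary node cannot claim another node's identity.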
Using the primary node identifier, the data master node applies the preset request window identification model to the server operation data of the primary node that initiated the global data request, so as to judge whether that primary node is in a data request window period.
Specifically, the server operation data includes: the change in disk read-write utilization, the change in network upload rate, the change in network download rate, and the change in CPU utilization within a preset time window;
the data is input into the request window identification model to identify whether the primary node is in the state of waiting for data after cleaning;
when the primary node is identified as being in the state of waiting for data after cleaning, it is judged to be in a data request window period.
Specifically, the preset request window identification model is obtained as follows:
S1, data preparation: collect and organize, for servers in different states, the changes in disk read-write utilization, network upload rate, network download rate, and CPU utilization over the length of the preset time window.
The different states include "waiting for data after cleaning", "data being transferred to the primary node from elsewhere", and "data cleaning in progress".
In this embodiment of the present application, the preset time window is the 10 minutes preceding the time point at which the primary node issues the global data request.
S2, feature extraction: useful features are extracted from the data. In the embodiment of the present application these include the rate and trend of change of the server data over the time window. These features are used as inputs to the model.
S3, data division: the data set is divided into a training set and a test set. In the embodiment of the application, the split ratio is 8:2.
S4, constructing an LSTM model: the network structure of the LSTM model is constructed. LSTM is a recurrent neural network suited to processing sequence data.
S5, training and tuning the model: the model is trained on the training set and its performance is assessed on the test set. The hyperparameters of the model are adjusted until the trained request window identification model predicts the state accurately.
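Steps S2 and S3 above can be sketched in plain Python as follows. The metric field names, the specific definitions of "rate" (mean absolute step-to-step change) and "trend" (net change over the window), and the shuffling seed are assumptions of this example; the application does not define them. The LSTM of steps S4 and S5 is deliberately not shown, since its architecture and hyperparameters are not specified.

```python
import random
from statistics import mean

# Hypothetical field names for the four monitored metrics.
METRICS = ("disk_io", "net_up", "net_down", "cpu")


def window_features(samples: list[dict]) -> list[float]:
    """S2: for each metric, compute a rate and a trend over the time window.

    `samples` holds one reading per sampling instant, oldest first.
    """
    features = []
    for metric in METRICS:
        series = [sample[metric] for sample in samples]
        steps = [abs(b - a) for a, b in zip(series, series[1:])]
        features.append(mean(steps) if steps else 0.0)  # rate: mean absolute change per step
        features.append(series[-1] - series[0])         # trend: net change across the window
    return features


def split_dataset(examples: list, train_ratio: float = 0.8, seed: int = 0):
    """S3: shuffle and split labeled examples into training and test sets (8:2 by default)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

A sequence model would more typically consume the raw per-instant readings; the summary features here only show one possible reading of "rate and trend" in S2.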
Referring to FIG. 3, if the request window identification model determines that the primary node is not in the data request window period, the data master node sends a risk warning signal to staff and sends a risk control signal to the secondary node, indicating which primary node is under risk control.
After receiving the risk control signal sent by the data master node, the secondary node sends a second local data transfer instruction to the risk-controlled primary node so as to obtain that node's global data to-be-integrated file;
the secondary node also manages a dedicated isolation node, and forwards the global data to-be-integrated file of the risk-controlled primary node to the isolation node for quarantine.
The advantage of this is that the request window identification model can determine whether a global data request was initiated by the primary node after following the normal flow. In the method provided in the embodiment of the application, a primary node controlled by a malicious program that wants to obtain the data of other nodes would have to do so by sending the secondary node a global data request forged for another cleaning task, or by sending a global data request directly to the data master node.
Such abnormal behavior can be identified by the preset request window identification model, because a primary node can legitimately send a global data request only when it is in the state of waiting for data after cleaning, and never in any other state.
When a primary node controlled by a malicious program sends a forged global data request for another cleaning task to the secondary node or directly to the data master node, the operating state of its server does not change accordingly. The request window identification model running on the data master node can therefore identify the request and prevent the compromised primary node from obtaining the data of other nodes by deception.
Referring to FIG. 4, if the primary node is judged to be in the data request window period, the data master node checks whether it is the last of all primary nodes to finish the same cleaning task. If it is not, the data master node issues a global data transfer instruction to the secondary node.
After receiving the global data transfer instruction sent by the data master node, the secondary node sends a second local data transfer instruction to the primary node that sent the global data request so as to obtain that node's global data to-be-integrated file, and forwards the file to the data master node.
If the primary node is judged to be the last of all primary nodes to finish the same cleaning task, the data master node sends the remaining global data to-be-integrated files of that cleaning task that it has received to the secondary node that manages the primary node.
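The three outcomes available to the data master node when it receives a second global data request (FIG. 3 and FIG. 4) can be summarized in a short Python sketch. The function and enum names are hypothetical, `in_request_window` is a stand-in callable for the request window identification model, and `pending_primaries` is an assumed bookkeeping structure; none of these is specified by the application.

```python
from enum import Enum


class Decision(Enum):
    SEND_REMAINING_FILES = "send remaining global data to-be-integrated files"
    GLOBAL_DATA_TRANSFER = "issue global data transfer instruction"
    RISK_CONTROL = "warn staff and send risk control signal"


def handle_second_global_request(primary_id, task_id, in_request_window, pending_primaries):
    """Decide how the data master node responds to a second global data request.

    in_request_window: callable(primary_id) -> bool, standing in for the model's verdict.
    pending_primaries: dict mapping task id -> set of primary ids that have not yet
                       finished that task anywhere in the system.
    """
    if not in_request_window(primary_id):
        # The node is not in the "waiting for data after cleaning" state.
        return Decision.RISK_CONTROL
    still_cleaning = pending_primaries.get(task_id, set()) - {primary_id}
    if still_cleaning:
        # Not globally last: pull this node's file up to the data master node.
        return Decision.GLOBAL_DATA_TRANSFER
    # Globally last: release the remaining files for the final merge.
    return Decision.SEND_REMAINING_FILES
```

The window check comes first in this sketch, so a request that fails it never reaches the step where data could be released.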
After the secondary node receives the remaining global data to-be-integrated files sent by the data master node, it obtains the number of global data requests sent so far by the primary node that sent the current global data request, and compares it with the average number of global data requests sent by the primary nodes under the secondary node.
When the difference between the number of global data requests sent by the requesting primary node and the average number sent by the other primary nodes is lower than a preset value, the secondary node forwards the remaining global data to-be-integrated files sent by the data master node directly to the requesting primary node.
And after the primary node receives the rest of global data files to be integrated, integrating and updating the rest of global data files according to the global data files to be integrated formed by the primary node so as to form the global data file of the cleaning task.
And after the integrated updating is completed to form the global data file, the global data file is sent to a secondary node.
And after receiving the global data file, the secondary node sends the global data file to a data master node for warehousing and storage.
Referring to fig. 5, when the difference between the number of global data requests sent by the requesting primary node and the average number sent by the other primary nodes is higher than the preset value, the secondary node sends a second local data transfer instruction to the requesting primary node and extracts that node's data; it then sends the global data to-be-integrated files held by the data master node and by that primary node to other primary nodes managed by the secondary node for global data merging.
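The forwarding rule of figs. 4 and 5 can be sketched as follows. The function `choose_merge_node` and its parameters are hypothetical. The description only says the files go to "other primary nodes" when the threshold is exceeded, so picking the other node with the fewest past requests is an assumption made for this example. A difference exactly equal to the preset value is treated as the direct-forwarding case, in line with the "otherwise" wording of claim 4.

```python
from statistics import mean


def choose_merge_node(request_counts: dict[str, int], requester: str, preset_value: float) -> str:
    """Return the id of the primary node that should merge the global data."""
    others = {node: count for node, count in request_counts.items() if node != requester}
    if not others:
        return requester  # no alternative node exists under this secondary node
    difference = request_counts[requester] - mean(others.values())
    if difference > preset_value:
        # The requester has merged disproportionately often: redirect the merge
        # to the other node with the fewest requests (assumed selection policy).
        return min(others, key=lambda node: (others[node], node))
    return requester  # forward the remaining files directly to the requester
```

Redirecting the merge in this way is what keeps any single primary node from accumulating most of the global data over time.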
The benefit of this arrangement is as follows. Whenever a primary node requests the remaining global data to-be-integrated files and forms the global data file, that node is necessarily the last primary node to finish the cleaning task, so its subsequent data cleaning tasks are necessarily delayed. Without intervention, the same primary node would therefore tend always to be the last to complete a given cleaning task and would always be responsible for merging the global data. Most of the global data would then be transmitted to that primary node, and if it were controlled by a malicious program, most of the data could be stolen by exploiting this weakness.
Through the scheduling performed by the secondary node, the same node is prevented from always requesting and merging the global data to-be-integrated files, so most of the data cannot be stolen through any single primary node.
In general, the advantage of the data security cleaning method based on artificial intelligence provided by the application is that each primary node performing data cleaning cleans only the data it stores itself, while data scheduling is handled by the secondary nodes and the data master node, which do not perform data cleaning tasks.
Therefore, even if a primary node performing a data cleaning task is controlled by a malicious program, that node does not know what data the other nodes hold, so any data leakage is confined to the compromised primary node and the other data in the global system is not leaked. Moreover, when a primary node requests data beyond the secondary node to which it belongs, the data master node subjects it to risk control through the preset request window identification model, which prevents a compromised primary node from tricking the data master node into releasing data by fabricating a data cleaning task.
Therefore, the data security cleaning method based on artificial intelligence provided by the embodiments of the application allows distributed data cleaning to be carried out more securely.
It should be noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. In addition, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element. Moreover, in the description of the embodiments of the present application, "/" means "or" unless otherwise indicated; for example, A/B may represent A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. Also, in the description of the embodiments of the present application, "plurality" means two or more.
The foregoing is merely a specific embodiment of the application to enable one skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A data security cleaning method based on artificial intelligence, characterized in that it is executed by a primary node and comprises the following steps:
the primary node performs data cleaning on the data it stores and reports the cleaning task to the secondary node that manages the primary node;
after the primary node finishes cleaning, it sends cleaning completion information for the current cleaning task to the secondary node that manages it, so as to request the local data to-be-integrated files of the same cleaning task received by the secondary node;
after the primary node receives the local data to-be-integrated files sent by the secondary node, it integrates and updates them with the data it has cleaned to form a global data to-be-integrated file;
after the integration and update is complete and the global data to-be-integrated file is formed, the primary node sends a first global data request to the secondary node that manages it, so that the secondary node requests the remaining global data to-be-integrated files of the cleaning task from the data master node; the data master node examines the primary node through a preset request window identification model to determine whether to send the remaining global data to-be-integrated files;
after the primary node receives the remaining global data to-be-integrated files, it integrates and updates them with the global data to-be-integrated file it formed, so as to form the global data file of the cleaning task;
after the integration and update is complete and the global data file is formed, the primary node sends the global data file to the data master node through the secondary node;
after receiving a first local data transfer instruction sent by the secondary node, the primary node forms a local data to-be-integrated file from the data it has cleaned and sends the file to the secondary node;
and after receiving a second local data transfer instruction sent by the secondary node, the primary node sends the global data to-be-integrated file to the secondary node.
2. A data security cleaning method based on artificial intelligence, characterized in that it is executed by a secondary node and comprises the following steps:
the secondary node reports the cleaning tasks of all primary nodes that it manages to the data master node;
after the secondary node receives cleaning completion information for the current cleaning task from a primary node, it judges whether any other primary node under the secondary node has not yet finished the same cleaning task as that primary node;
if the primary node is the last primary node under the secondary node to complete the same cleaning task, the secondary node sends the local data to-be-integrated files of the same cleaning task that it has received to the primary node;
after receiving the first global data request, the secondary node sends a second global data request to the data master node, wherein the second global data request comprises the identifier of the primary node that initiated the first global data request;
after receiving the remaining global data to-be-integrated files sent by the data master node, the secondary node forwards them to a primary node that it manages according to a preset rule;
after receiving the global data transfer instruction sent by the data master node, the secondary node sends a second local data transfer instruction to the primary node that sent the global data merging request, so as to obtain the global data to-be-integrated file of that primary node, and forwards the global data to-be-integrated file to the data master node.
3. The artificial intelligence based data security cleaning method of claim 2, further comprising:
after the secondary node receives the risk-control signal sent by the data master node, the secondary node sends a second local data transfer instruction to the risk-controlled primary node so as to obtain the global data to-be-integrated file of the risk-controlled primary node;
and the secondary node manages an isolation node dedicated to isolation and forwards the global data to-be-integrated file of the risk-controlled primary node to the isolation node for isolation.
4. The artificial intelligence based data security cleaning method of claim 2, wherein forwarding the remaining global data to-be-integrated files to a primary node managed by the secondary node according to a preset rule comprises:
obtaining the number of times the primary node currently sending the first global data request has sent the first global data request, and comparing it with the average number of times the primary nodes of the secondary node have sent the first global data request;
when the difference between the number of times the requesting primary node has sent the first global data request and the average number of times the other primary nodes have sent the first global data request is higher than a preset value, sending a second local data transfer instruction to the requesting primary node and extracting the data in the requesting primary node; and sending the global data to-be-integrated files held by the data master node and by that primary node to other primary nodes managed by the secondary node for global data merging;
otherwise, directly forwarding the remaining global data to-be-integrated files sent by the data master node to the primary node that sent the first global data request.
5. A data security cleaning method based on artificial intelligence, characterized in that it is executed by a data master node and comprises the following steps:
the data master node obtains server operation data of the primary node through a transparent monitoring layer;
when the data master node receives a second global data request sent by the secondary node, the data master node identifies the server operation data of the primary node that initiated the global data request through a preset request window identification model, so as to judge whether that primary node is in a data request window period;
if the primary node is judged to be in the data request window period, the data master node checks whether the primary node that initiated the global data request is the last primary node globally to complete the same cleaning task;
if the primary node is judged to be the last of all primary nodes to complete the same cleaning task, the data master node sends the remaining global data to-be-integrated files of the same cleaning task that it has received to the secondary node that manages the primary node;
if the primary node is judged not to be the last of all primary nodes to complete the same cleaning task, the data master node sends a global data transfer instruction to the secondary node.
6. The artificial intelligence based data security cleaning method of claim 5, further comprising:
if the primary node is judged not to be in the data request window period, the data master node sends a risk warning signal to staff and sends a risk-control signal to the secondary node to indicate the risk-controlled primary node.
7. The artificial intelligence based data security cleaning method of claim 5, wherein the server operation data comprises: the change in disk read/write occupancy, the change in network upload rate, the change in network download rate, and the change in CPU occupancy within a preset time window;
the server operation data is input into the request window identification model to identify whether the primary node is in a state of waiting for data after cleaning;
when the primary node is identified as being in a state of waiting for data after cleaning, the primary node is judged to be in the data request window period.
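The feature set named in claim 7 can be illustrated with a minimal sketch. The claims describe a preset request window identification model but do not disclose its form, so the threshold rule in `in_request_window` is only a stand-in and not the patented model. The metric keys (`disk_io`, `upload`, `download`, `cpu`), the normalisation of each metric to the range 0 to 1, and the default threshold are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class WindowFeatures:
    """Changes in server operation data over a preset time window."""
    disk_io_change: float
    upload_change: float
    download_change: float
    cpu_change: float


def extract_features(samples: list[dict[str, float]]) -> WindowFeatures:
    """Compute each metric's change between the first and last sample of the window."""
    first, last = samples[0], samples[-1]
    return WindowFeatures(
        disk_io_change=last["disk_io"] - first["disk_io"],
        upload_change=last["upload"] - first["upload"],
        download_change=last["download"] - first["download"],
        cpu_change=last["cpu"] - first["cpu"],
    )


def in_request_window(features: WindowFeatures, drop: float = -0.3) -> bool:
    """Stand-in rule: treat a sharp fall in CPU and disk load as 'waiting for data after cleaning'."""
    return features.cpu_change <= drop and features.disk_io_change <= drop
```

A trained classifier over the same four features could replace the stand-in rule without changing the surrounding flow.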
CN202311102120.8A 2023-08-30 2023-08-30 Data security cleaning method based on artificial intelligence Active CN117009960B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311102120.8A CN117009960B (en) 2023-08-30 2023-08-30 Data security cleaning method based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN117009960A CN117009960A (en) 2023-11-07
CN117009960B true CN117009960B (en) 2024-02-02

Family

ID=88561848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311102120.8A Active CN117009960B (en) 2023-08-30 2023-08-30 Data security cleaning method based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN117009960B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102156893A (en) * 2011-03-24 2011-08-17 大连海事大学 Cleaning system and method thereof for data acquired by RFID device under network
WO2020207371A1 (en) * 2019-04-08 2020-10-15 阿里巴巴集团控股有限公司 Data processing system and method, apparatus, and electronic device
CN111898706A (en) * 2020-08-24 2020-11-06 深圳市富之富信息科技有限公司 Intelligent iterative deployment method and device of model, computer equipment and storage medium

Also Published As

Publication number Publication date
CN117009960A (en) 2023-11-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant