WO2024045642A1 - Data anomaly detection method and apparatus - Google Patents

Data anomaly detection method and apparatus

Info

Publication number
WO2024045642A1
Authority
WO
WIPO (PCT)
Prior art keywords
copy
abnormal
relative
anomaly detection
data
Prior art date
Application number
PCT/CN2023/089425
Other languages
English (en)
Chinese (zh)
Inventor
王银续
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2024045642A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions

Definitions

  • The present application relates to the field of computer technology, and in particular to a data anomaly detection method and device.
  • Data backup technology is widely used in the field of computer technology.
  • The currently common data backup method is to generate data copies of the data to be backed up at different times.
  • The data to be backed up as it was at a given time can then be restored from the data copy taken at that time, which gives the data to be backed up disaster recovery capability.
  • For example, all log data of a server at a certain time can be packaged into a compressed file, and this compressed file can be used as the data copy corresponding to that time.
  • The embodiments of the present application provide a data anomaly detection method to solve the technical problem that the existing technology lacks an accurate means of detection.
  • In a first aspect, embodiments of the present application provide a data anomaly detection method, which is applied to a data anomaly detection device.
  • The method includes:
  • Obtaining first target data, which is related to the file changes of the first copy relative to the second copy.
  • The first copy and the second copy are data copies of the same data to be backed up at the first time and the second time, respectively, the second time being before the first time.
  • Obtaining first information according to the first target data and an anomaly detection model: when the input of the anomaly detection model is the first target data, the output of the anomaly detection model is the first information. Based on the first information, the abnormal status of the first copy is determined.
  • Because the first information is obtained from the first target data and the anomaly detection model, drawing on the knowledge the model has learned, the abnormal status of the first copy can be determined more accurately.
  • The first target data is the file changes of the first copy relative to the second copy.
  • Obtaining the first target data includes:
  • The first target data is obtained based on the file changes of the first copy relative to the second copy and the file changes of the first copy relative to the third copy at the third time, the third time being before the second time.
  • In this way, the first target data draws on both the file changes of the first copy relative to the second copy and the file changes of the first copy relative to the third copy at the third time.
  • These richer file changes across multiple sets of copies indicate more accurately how the first copy has changed compared with earlier copies, so that more accurate first information is output when the data is input into the anomaly detection model.
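
As a minimal sketch of building the first target data from two sets of file changes: the source only says the data is obtained "based on" both sets, so the concatenation below is an assumption, as are the index names `file_count` and `total_size_mb` and the helper names.

```python
def file_changes(newer: dict[str, float], older: dict[str, float]) -> list[float]:
    """Change value of each file statistical index between two copies."""
    return [newer[name] - older[name] for name in sorted(newer)]


def first_target_data(first: dict[str, float],
                      second: dict[str, float],
                      third: dict[str, float]) -> list[float]:
    # Assumption: the two sets of changes are simply concatenated
    # into one feature vector for the anomaly detection model.
    return file_changes(first, second) + file_changes(first, third)


# Hypothetical file statistics of three copies of the same data.
third_copy = {"file_count": 100, "total_size_mb": 500.0}
second_copy = {"file_count": 102, "total_size_mb": 505.0}
first_copy = {"file_count": 60, "total_size_mb": 910.0}

features = first_target_data(first_copy, second_copy, third_copy)
```

Here the first two entries of `features` describe the first copy relative to the second copy and the last two describe it relative to the third copy.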
  • Determining the abnormal status of the first copy based on the first information includes:
  • If the first information indicates that the first copy is abnormal, it is determined that the first copy is abnormal, where the first target data is the file changes of the first copy relative to the second copy.
  • The method further includes: obtaining second information according to second target data and the anomaly detection model, where the second target data is related to the file changes of the second copy relative to the third copy at a third time.
  • The third time is before the second time.
  • Determining the abnormal status of the first copy based on the first information includes:
  • If the first information indicates that the first copy is abnormal and the second information indicates that the second copy is normal, it is determined that the first copy is abnormal; or, if the first information indicates that the first copy is abnormal and the second information indicates that the second copy is abnormal, it is determined that there is a possibility that the first copy is abnormal.
  • In this way, the abnormal status of the first copy is determined with reference to both the first information and the second information.
  • The second information serves as a reference for the first information; with this richer output from the anomaly detection model, the abnormal status of the first copy can be determined more accurately.
  • The method further includes:
  • Third information is obtained according to third target data and the anomaly detection model.
  • The third target data is related to the file changes of the fourth copy at the fourth time relative to the first copy.
  • The fourth time is after the first time. When the input of the anomaly detection model is the third target data, the output of the anomaly detection model is the third information.
  • Determining the abnormal status of the first copy based on the first information includes:
  • If the first information indicates that the first copy is abnormal and the third information indicates that the fourth copy is abnormal, it is determined that the first copy is abnormal; or, if the first information indicates that the first copy is abnormal and the third information indicates that the fourth copy is normal, it is determined that there is a possibility that the first copy is abnormal.
  • In this way, the abnormal status of the first copy is determined with reference to both the first information and the third information.
  • The third information serves as a reference for the first information; with this richer output from the anomaly detection model, the abnormal status of the first copy can be determined more accurately.
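
The cross-checking rules above, against the earlier second copy and the later fourth copy, can be sketched as a small decision function. The source states the two rules separately; letting either "possibly abnormal" signal take precedence when both are supplied is an assumption, as are the names `Verdict` and `judge_first_copy` and the `NORMAL` result for the case where the first information does not indicate an abnormality.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    NORMAL = "normal"
    POSSIBLY_ABNORMAL = "possibly abnormal"
    ABNORMAL = "abnormal"


def judge_first_copy(first_abnormal: bool,
                     second_abnormal: Optional[bool] = None,
                     fourth_abnormal: Optional[bool] = None) -> Verdict:
    """Combine the model outputs; None means that output was not obtained."""
    if not first_abnormal:
        # Assumption: the source only describes the abnormal case.
        return Verdict.NORMAL
    # An abnormal earlier copy, or a normal later copy, weakens the finding.
    if second_abnormal is True or fourth_abnormal is False:
        return Verdict.POSSIBLY_ABNORMAL
    return Verdict.ABNORMAL
```

For example, `judge_first_copy(True, second_abnormal=False)` yields `Verdict.ABNORMAL`, while `judge_first_copy(True, fourth_abnormal=False)` yields `Verdict.POSSIBLY_ABNORMAL`.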
  • A target change value is obtained, where the target change value is related to the index change value of the first copy relative to the second copy; if the target change value is outside a set value range, it is determined that there is a possibility of an abnormality in the first copy, and the step of obtaining the first information according to the first target data and the anomaly detection model is performed.
  • In this way, a preliminary judgment can be made based on whether the target change value is within the set value range.
  • The step of obtaining the first information according to the first target data and the anomaly detection model is performed only when an abnormality is possible, which can improve the accuracy of detection for the first copy.
  • The target change value is an index change value of the first copy relative to the second copy.
  • Alternatively, the target change value is obtained based on the index change value of the first copy relative to the second copy and the index change value of the second copy relative to the third copy at the third time.
  • The third time is before the second time.
  • The file change of the first copy relative to the second copy is any of the following: the change value of a file statistical index of the first copy relative to the second copy; the absolute value of that change value; the rate of change of the file statistical index of the first copy relative to the second copy; the absolute value of that rate of change.
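
The four forms listed above can be computed from one file statistical index as follows. This is a sketch under assumptions: the source does not define the rate of change, so dividing the change by the older copy's value is a choice made here, and the function name `change_forms` is hypothetical.

```python
def change_forms(new_value: float, old_value: float) -> dict[str, float]:
    """Four ways to express the change of one file statistical index."""
    change = new_value - old_value
    # Assumption: rate of change is relative to the older copy's value,
    # which must be non-zero.
    rate = change / old_value
    return {
        "change": change,
        "abs_change": abs(change),
        "rate": rate,
        "abs_rate": abs(rate),
    }


# Example: the file count drops from 100 in the second copy to 60 in the first.
forms = change_forms(60, 100)
```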
  • The anomaly detection model is obtained through machine learning training based on a training data set.
  • The training samples in the training data set include file changes between data copies and the abnormal status of the data copies.
  • Because the training samples include both the file changes between data copies and the abnormal status of the data copies, the anomaly detection model can learn which characteristics of file changes between data copies correspond to which abnormal status.
  • An anomaly detection model for detecting the abnormal status of data copies can therefore be trained.
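
To make the shape of such a training set concrete, here is a minimal sketch in which each sample pairs file-change features with an abnormal-status label. A plain logistic regression is swapped in for the model purely for brevity; the source describes a machine learning model such as a neural network, and the sample values, step size, and function names are hypothetical.

```python
import math

# Each training sample: (file-change features, label), where label 1 = abnormal.
samples = [
    ([0.02, 0.01], 0),
    ([0.05, 0.03], 0),
    ([0.90, 0.80], 1),
    ([0.70, 0.95], 1),
]


def predict(weights: list[float], bias: float, x: list[float]) -> float:
    """Probability that a copy with file changes x is abnormal."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs: int = 2000, step: float = 0.5):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in data:
            error = predict(weights, bias, x) - label
            # Gradient step on the log-loss for one sample.
            weights = [w - step * error * v for w, v in zip(weights, x)]
            bias -= step * error
    return weights, bias


weights, bias = train(samples)
```

After training, `predict(weights, bias, x)` above 0.5 can be read as "abnormal" for a new set of file changes `x`.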
  • In a second aspect, this application provides a data anomaly detection device, including:
  • A processing module configured to obtain first target data.
  • The first target data is related to the file changes of the first copy relative to the second copy.
  • The first copy and the second copy are data copies of the same data to be backed up at the first time and the second time, respectively, the second time being before the first time;
  • A determination module configured to obtain first information according to the first target data and an anomaly detection model.
  • When the input of the anomaly detection model is the first target data,
  • the output of the anomaly detection model is the first information; the determination module is further configured to determine the abnormal status of the first copy according to the first information.
  • The processing module is further configured to obtain the first target data based on the file changes of the first copy relative to the second copy and the file changes of the first copy relative to the third copy at a third time, the third time being before the second time.
  • The determination module is specifically used to:
  • If the first information indicates that the first copy is abnormal, determine that the first copy is abnormal, where the first target data is the file changes of the first copy relative to the second copy;
  • The processing module is also used to:
  • Obtain second information according to second target data and the anomaly detection model.
  • The second target data is related to the file changes of the second copy relative to the third copy at a third time.
  • The third time is before the second time. When the input of the anomaly detection model is the second target data, the output of the anomaly detection model is the second information;
  • The determination module is specifically used to:
  • If the first information indicates that the first copy is abnormal and the second information indicates that the second copy is normal, determine that the first copy is abnormal; or, if the first information indicates that the first copy is abnormal and the second information indicates that the second copy is abnormal, determine that there is a possibility that the first copy is abnormal.
  • The processing module is also used to:
  • Obtain third information according to third target data and the anomaly detection model.
  • The third target data is related to the file changes of the fourth copy at the fourth time relative to the first copy.
  • The fourth time is after the first time. When the input of the anomaly detection model is the third target data, the output of the anomaly detection model is the third information;
  • The determination module is specifically used to:
  • If the first information indicates that the first copy is abnormal and the third information indicates that the fourth copy is abnormal, determine that the first copy is abnormal; or, if the first information indicates that the first copy is abnormal and the third information indicates that the fourth copy is normal, determine that there is a possibility that the first copy is abnormal.
  • The processing module is also used to:
  • Obtain a target change value, where the target change value is related to the index change value of the first copy relative to the second copy;
  • The determination module is also used to:
  • If the target change value is outside the set value range, determine that there is a possibility of an abnormality in the first copy, and perform the step of obtaining the first information based on the first target data and the anomaly detection model.
  • The target change value is an index change value of the first copy relative to the second copy, or is obtained based on the index change value of the first copy relative to the second copy and the index change value of the second copy relative to the third copy at a third time, the third time being before the second time.
  • The file changes of the first copy relative to the second copy are any of the following:
  • The change value of a file statistical index of the first copy relative to the second copy; the absolute value of that change value; the rate of change of the file statistical index of the first copy relative to the second copy; the absolute value of that rate of change.
  • The anomaly detection model is obtained through machine learning training based on a training data set.
  • The training samples in the training data set include file changes between data copies and the abnormal status of the data copies.
  • In a third aspect, an electronic device is provided, which includes: one or more processors; one or more memories; wherein the one or more memories store one or more computer instructions.
  • The one or more computer instructions, when executed by the one or more processors, cause the electronic device to perform the method described in any implementation of the first aspect.
  • In a fourth aspect, a computer-readable storage medium is provided that includes computer instructions that, when run on a computer, cause the computer to execute the method described in any implementation of the first aspect.
  • In a fifth aspect, this application provides a chip.
  • The chip includes a memory and a processor.
  • The memory is used to store computer instructions.
  • The processor is used to call and run the computer instructions from the memory to execute any possible method in the first aspect.
  • In a sixth aspect, the present application provides a computer program product which, when read and executed by a computer, causes the computer to perform any possible method in the first aspect.
  • Figure 1a is a schematic structural diagram of a neural network model.
  • Figure 1b is an example diagram of a neural network model.
  • Figure 2 is a schematic diagram of a system architecture applicable to the embodiment of the present application.
  • Figure 3 is a schematic flowchart of steps corresponding to a data anomaly detection method provided by an embodiment of the present application.
  • Figure 4 is a schematic structural diagram of a data anomaly detection device provided by an embodiment of the present application.
  • Figure 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • “Multiple” in the embodiments of this application means two or more.
  • “Plural” in the embodiments of this application can also be understood as “at least two”.
  • “At least one” can be understood as one or more, for example one, two, or more.
  • For example, “including at least one” means including one, two, or more, without limiting which ones are included.
  • For example, if at least one of A, B, and C is included, then what is included can be A, B, C, A and B, A and C, B and C, or A and B and C.
  • Other descriptions similar to “at least one” are understood in the same way.
  • “And/or” describes the relationship between associated objects and indicates that three relationships are possible.
  • For example, “A and/or B” can mean any of three situations: A exists alone, A and B exist simultaneously, or B exists alone.
  • The character "/", unless otherwise specified, generally indicates that the associated objects are in an "or" relationship.
  • Ordinal numbers such as “first” and “second” mentioned in the embodiments of this application are used to distinguish multiple objects and are not used to limit the order, timing, priority, or importance of those objects.
  • The currently common data backup method is to generate data copies of the data to be backed up at different times.
  • When the data to be backed up is attacked through virus infection or other means, the data to be backed up as it was at a given time can be restored from the data copy taken at that time, which gives the data to be backed up disaster recovery capability.
  • The data to be backed up can take various forms.
  • For example, it can be data generated by one or more devices within a preset period, such as business data generated by all devices of an enterprise in 2022, or log data accumulated by certain departments of an organization as of 2022.
  • It can also be all data generated by a device up to 2022.
  • The embodiments of this application do not limit the form of the data to be backed up.
  • The embodiments of the present application also do not limit the form of the data copy corresponding to the data to be backed up.
  • The data copy at a certain time may be the data to be backed up at that time, or it may be a compressed form of the data to be backed up at that time.
  • For example, all log data of a server at a certain time can be packaged into a compressed file, and this compressed file can be used as the data copy corresponding to that time.
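
As an illustration of the log-packaging example above, the following sketch uses Python's standard `zipfile` module to pack a directory of log files into one compressed file that serves as the data copy for that time. The directory layout, file names, and the helper `make_copy` are hypothetical; the source does not prescribe a compression format.

```python
import tempfile
import zipfile
from pathlib import Path


def make_copy(log_dir: Path, copy_path: Path) -> Path:
    """Package every file under log_dir into one compressed data copy."""
    with zipfile.ZipFile(copy_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(log_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to log_dir so the copy is relocatable.
                archive.write(path, arcname=path.relative_to(log_dir).as_posix())
    return copy_path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    logs = root / "logs"
    logs.mkdir()
    (logs / "app.log").write_text("started\n")
    (logs / "error.log").write_text("none\n")

    # The archive is written outside the log directory it packages.
    copy_path = make_copy(logs, root / "copy_t1.zip")
    with zipfile.ZipFile(copy_path) as archive:
        names = sorted(archive.namelist())
```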
  • This application provides a data anomaly detection method, which can be implemented based on artificial intelligence.
  • Artificial intelligence is a technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. It is a branch of computer science that attempts to understand the essence of intelligence and to produce intelligent machines that can respond in a manner similar to human intelligence. Research in this field includes robotics, language recognition, image recognition, natural language processing, and expert systems.
  • The data anomaly detection method provided by this application can perform detection through a machine learning model.
  • The machine learning model can be a neural network model; the following uses the neural network model as an example.
  • A neural network model is a machine learning model that simulates the structure of the brain. In machine learning, neural network models are often used to model relatively complex tasks. The size of a neural network model, including its depth and width, can be adjusted according to the application domain and the problem size. Because of their strong expressive power, neural network models are widely used in speech recognition, image recognition, natural language processing, advertising, and other application fields.
  • The structure of the neural network model includes multiple layers: the first layer is the input layer, the last layer is the output layer, the middle includes zero or more hidden layers, and each layer includes one or more nodes.
  • The input layer size is determined by the number of input variables, and the output layer size is determined by the number of categories.
  • The hidden layer includes multiple neurons. Adjusting the number of neurons adjusts the complexity and expressive power of the neural network model. Generally speaking, the wider and deeper the neural network model is, the stronger its modeling ability, but the higher the cost of training it.
  • The training process of a neural network model iteratively adjusts each parameter value in the model, based on the inputs and outputs of the training samples, until convergence. It is also called the learning process of the neural network model.
  • The anomaly detection model is obtained through machine learning training based on the training data set.
  • The training samples in the training data set include file changes between data copies and the abnormal status of the data copies.
  • In this way, the neural network model can learn the characteristics of the file changes between data copies that correspond to a data copy being in an abnormal state.
  • The trained neural network model serves as the anomaly detection model for detecting the abnormal state of data copies.
  • The structure of the neural network model can be as shown in Figure 1a.
  • The neural network model 100 has N processing layers, where N ≥ 3 and N is a natural number.
  • The first layer of the neural network model is the input layer 101, which is responsible for receiving input signals.
  • The last layer of the neural network model 100 is the output layer 103, which outputs the processing results of the neural network model.
  • The layers other than the first and last layers are the intermediate layers 104, which together form the hidden layer 102.
  • Each intermediate layer can receive input signals and output signals, and the hidden layer is responsible for processing the input signals.
  • Each layer represents a logical level of signal processing. Through multiple layers, the data signal can be processed by multi-level logic.
  • Figure 1b shows a relatively simple neural network model, which includes an input layer, two hidden layers, and an output layer.
  • The input layer has three nodes: node A0, node A1, and node A2;
  • the first hidden layer includes two nodes: node B0 and node B1;
  • the second hidden layer includes two nodes: node C0 and node C1;
  • the output layer includes one node, D0.
  • The line segments connecting nodes between different layers in the neural network model are called edges.
  • Each edge has a corresponding edge weight.
  • The edge weight represents how much the node closer to the input layer contributes to the node farther from the input layer, among the two nodes connected by the corresponding edge.
  • Specifically, in Figure 1b, W_{0,0} represents the edge weight from node A0 of the input layer to node B0 of the first hidden layer, U_{0,0} represents the edge weight from node B0 of the first hidden layer to node C0 of the second hidden layer,
  • and V_{0,0} represents the edge weight from node C0 of the second hidden layer to node D0 of the output layer.
  • The edge weight of each edge in the neural network model can also be called a parameter value of the neural network model.
  • The precision of the parameter values of the neural network model (which can also be called the precision of the neural network model) can be FP32, FP16, or another precision, and is not specifically limited.
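
The layout of Figure 1b can be sketched as three weight tables, one per pair of adjacent layers, with a forward pass through them. The sigmoid activation, the absence of bias terms, and all numeric weights are assumptions made for illustration; the source names the edge weights W, U, and V but does not give an activation function.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs: list[float], weights: list[list[float]]) -> list[float]:
    """weights[i][j] is the edge weight from lower node i to higher node j."""
    return [
        sigmoid(sum(inputs[i] * weights[i][j] for i in range(len(inputs))))
        for j in range(len(weights[0]))
    ]


# Hypothetical edge weights for the 3-2-2-1 network of Figure 1b.
W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # A0..A2 -> B0..B1
U = [[0.1, 0.2], [0.3, 0.4]]              # B0..B1 -> C0..C1
V = [[0.5], [0.6]]                        # C0..C1 -> D0


def forward(a: list[float]):
    b = layer(a, W)
    c = layer(b, U)
    d = layer(c, V)
    return b, c, d


b, c, d = forward([1.0, 0.0, 1.0])
```

With this indexing, `W[0][0]` plays the role of W_{0,0}, the edge from node A0 to node B0.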
  • The process of training the neural network model in Figure 1b can be as follows:
  • Step 1: Randomly select a training sample from the entire training sample set.
  • The training sample contains all input features.
  • The input features can take a value of 0 or 1, or they can be floating point numbers; each training sample may or may not have an expected output value.
  • Step 2: Perform forward calculation on the first hidden layer.
  • The forward calculation uses the edge weights of all incoming edges of the hidden layer and the activation values of the lower-layer nodes.
  • An incoming edge is an edge from a node in a lower layer to a node in the current layer.
  • An outgoing edge is an edge from a node in the current layer to a node in a higher layer.
  • Taking a given layer in the neural network model as the reference, layers closer to the input layer are called lower layers, and layers farther from the input layer are called higher layers.
  • Step 3: Similarly, perform forward calculation on the second hidden layer.
  • Step 4: The output layer includes only one node, D0; the activation value and residual value of node D0 are calculated.
  • The residual value represents the difference between the observed value and the predicted value.
  • The residual value ED0 of node D0 can be calculated based on the expected output value of the training sample and the calculated activation value of node D0.
  • Step 5: Perform reverse calculation on the second hidden layer: calculate the residual value of each node of that hidden layer based on the residual value of the output layer and the edge weights of the outgoing edges of the second hidden layer, and adjust the edge weights of the corresponding outgoing edges.
  • Specifically, since the output layer has only one node D0, the residual value ED0 of node D0 is multiplied by the edge weight V_{0,0} of the outgoing edge of node C0, and the product is substituted into the residual calculation function to obtain the residual value EC0 of node C0.
  • When an edge weight is adjusted, the current edge weight minus an intermediate parameter is used as the updated edge weight.
  • The intermediate parameter is the preset step size multiplied by the residual value of the higher-layer node of the edge and then multiplied by the activation value of the lower-layer node of the edge.
  • For example, to adjust the edge weight V_{0,0}, the intermediate parameter is subtracted from V_{0,0}.
  • Here the intermediate parameter is the preset step size multiplied by the residual value of node D0 and then multiplied by the activation value of node C0.
  • The edge weight V_{1,0} is adjusted similarly.
  • Step 6: Similarly, perform reverse calculation on the first hidden layer: calculate the residual value of each node of that hidden layer based on the residual value of each node of the second hidden layer and the edge weights of the outgoing edges of the first hidden layer, and adjust the edge weights of the corresponding outgoing edges.
  • Specifically, when calculating the residual value of node B0, the residual value of node C0 is multiplied by the edge weight U_{0,0} of the outgoing edge from node B0 to node C0, and the residual value of node C1 is multiplied by the edge weight U_{0,1} of the outgoing edge from node B0 to node C1.
  • The two products are summed, that is, EC0*U_{0,0} + EC1*U_{0,1}.
  • The sum is substituted into the residual calculation function to obtain the residual value EB0 of node B0.
  • The residual value EB1 of node B1 can be calculated similarly.
  • Step 7: Perform reverse calculation on the input layer and adjust the edge weights from the input layer to the first hidden layer.
  • Step 8: Return to step 1 to train on the next training sample.
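
Steps 1 to 8 above can be sketched for the 3-2-2-1 network of Figure 1b as follows. This is an illustration under assumptions, not the source's implementation: a sigmoid activation, a squared-error objective, and a "residual calculation function" that multiplies by the sigmoid derivative are all choices made here, and the two training samples are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W, U, V):
    """Steps 2-4: forward calculation through both hidden layers and D0."""
    b = [sigmoid(sum(x[i] * W[i][j] for i in range(3))) for j in range(2)]
    c = [sigmoid(sum(b[i] * U[i][j] for i in range(2))) for j in range(2)]
    d = sigmoid(sum(c[i] * V[i][0] for i in range(2)))
    return b, c, d


def train_step(x, y, W, U, V, step=0.5):
    b, c, d = forward(x, W, U, V)
    ed = (d - y) * d * (1 - d)  # residual of D0 (assumed form)
    # Step 5: residuals of C0, C1 from ED0 and V, then adjust V.
    ec = [ed * V[i][0] * c[i] * (1 - c[i]) for i in range(2)]
    for i in range(2):
        V[i][0] -= step * ed * c[i]
    # Step 6: residuals of B0, B1, e.g. EB0 from EC0*U[0][0] + EC1*U[0][1].
    eb = [sum(ec[j] * U[i][j] for j in range(2)) * b[i] * (1 - b[i]) for i in range(2)]
    for i in range(2):
        for j in range(2):
            U[i][j] -= step * ec[j] * b[i]
    # Step 7: adjust the edge weights from the input layer.
    for i in range(3):
        for j in range(2):
            W[i][j] -= step * eb[j] * x[i]
    return (d - y) ** 2  # squared error before this update


random.seed(0)
W = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
U = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
V = [[random.uniform(-1, 1)] for _ in range(2)]
samples = [([1.0, 0.0, 1.0], 1.0), ([0.0, 1.0, 0.0], 0.0)]

first_loss = sum(train_step(x, y, W, U, V) for x, y in samples)
for _ in range(5000):
    x, y = random.choice(samples)  # steps 1 and 8: pick the next sample
    train_step(x, y, W, U, V)
last_loss = sum((forward(x, W, U, V)[2] - y) ** 2 for x, y in samples)
```

Each weight update is the current edge weight minus the step size times the residual of the higher-layer node times the activation of the lower-layer node, matching the adjustment rule described in step 5.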
  • Figure 1b is just a very simple neural network model.
  • The width and depth of the neural network model can be adjusted according to specific needs.
  • The width and depth of the neural network model can be much larger than those in Figure 1b.
  • Figure 2 is a schematic diagram of a system architecture applicable to the embodiment of the present application.
  • The system architecture may include a server 201, a network 203, and one or more terminal devices, such as a first terminal device 2021 and a second terminal device 2022. The network 203 is used to provide a medium for communication links between the first terminal device 2021, the second terminal device 2022, and the server 201.
  • Network 203 may include various connection types, such as wired links, wireless communication links, and fiber optic cables.
  • The terminal device in the system architecture shown in Figure 2 is a device with wireless transceiver functions. It can be deployed on land, including indoors or outdoors, handheld or vehicle-mounted; on water (such as on ships); or in the air (such as on aircraft, balloons, and satellites).
  • The terminal device may be a mobile phone, a tablet computer (Pad), a computer with wireless transceiver function, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device,
  • wireless terminal equipment in industrial control, self-driving, remote medical, smart grid, transportation safety, smart city, or smart home settings, or hardware such as chips in the aforementioned equipment.
  • Figure 2 takes two terminal devices and a server as an example.
  • The specific scenario corresponding to Figure 2 can be as follows:
  • Either of the first terminal device 2021 and the second terminal device 2022 can send local data to be backed up to the server, and the server 201 can generate the data copy of the data to be backed up at a given time based on the terminal device's data to be backed up at that time.
  • The data to be backed up may be the data to be backed up of one terminal device, and the data copy may be a data copy of that terminal device's data to be backed up.
  • For example, the first copy is a data copy of the data to be backed up of the first terminal device 2021.
  • The data to be backed up may also be the data to be backed up of both terminal devices, and the data copies may be data copies of the data to be backed up of both terminal devices.
  • For example, the first copy is a data copy of the data to be backed up of the first terminal device 2021 and the second terminal device 2022.
  • The server 201 can store the data copies at each time and perform anomaly detection on them. The embodiments of the present application place no specific limit on when the server 201 checks the data copy at each time. For example, the server 201 can check all data copies or a part of the data copies according to a preset period. The server 201 may also perform data anomaly detection on the data copy named in a detection request from the terminal device.
  • This application provides a data anomaly detection method, which can be executed by the server shown in Figure 2.
  • Step 301 Obtain the target indicator value.
  • Step 302 Determine whether there is a possibility of abnormality in the first copy. If yes, execute step 303; otherwise, end the process.
  • Step 303 Obtain the first target data.
  • Step 304 Obtain the first information according to the first target data and the anomaly detection model.
  • Step 305 Obtain the second target data.
  • Step 306 Obtain the second information according to the second target data and the anomaly detection model.
  • Step 307 Obtain the third target data.
  • Step 308 Obtain the third information according to the third target data and the anomaly detection model.
  • Step 309 Determine whether the first copy is abnormal.
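The control flow of steps 301 to 309 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `run_detection`, the `Model` callable interface (returning `True` for "abnormal"), and the stand-in `toy_model` are all assumptions made for the example. Step 309 itself is left to the caller, which receives the collected pieces of information.

```python
from typing import Callable, Optional, Sequence, Tuple

Features = Sequence[float]
# Hypothetical model interface: returns True when the input looks abnormal.
Model = Callable[[Features], bool]


def run_detection(
    model: Model,
    first_target: Features,
    target_indicator: Optional[float] = None,
    value_range: Optional[Tuple[float, float]] = None,
    second_target: Optional[Features] = None,
    third_target: Optional[Features] = None,
) -> Optional[Tuple[bool, Optional[bool], Optional[bool]]]:
    """Return (first, second, third) information, or None if the process ends at step 302."""
    # Steps 301-302 (optional): cheap pre-check on the target indicator value.
    if target_indicator is not None and value_range is not None:
        low, high = value_range
        if low <= target_indicator <= high:
            return None  # no possibility of abnormality: end the process
    # Steps 303-304: the model maps the first target data to the first information.
    first_info = model(first_target)
    # Steps 305-308 (optional): the same model is applied to the other target data.
    second_info = model(second_target) if second_target is not None else None
    third_info = model(third_target) if third_target is not None else None
    # Step 309 would combine these pieces of information into a verdict.
    return first_info, second_info, third_info


def toy_model(features: Features) -> bool:
    # Stand-in for illustration only: flags a large net loss of files.
    return sum(features) < -5000


result = run_detection(toy_model, [-10000.0, 0.0],
                       target_indicator=-10000, value_range=(-6000, 8000))
skipped = run_detection(toy_model, [-100.0, 0.0],
                        target_indicator=-100, value_range=(-6000, 8000))
```

Passing the model in as a callable keeps the sketch independent of any particular model type, which the description does not specify.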
  • The target indicator value (also referred to as the target change value) is related to the indicator change value of the first copy relative to the second copy.
  • The target indicator value may be the indicator change value of the first copy relative to the second copy, or may be obtained from the indicator change value of the first copy relative to the second copy and the indicator change value of the second copy relative to the third copy at the third moment. The second moment is before the first moment, and the third moment is before the second moment.
  • The indicator change value of the first copy relative to the second copy may be used to represent the change of a certain file statistical indicator of the first copy relative to the second copy.
  • The indicator change value of the first copy relative to the second copy can be, for example: the change in the number of files with a set file name of the first copy relative to the second copy; the change in the number of files with a set file path; the change in the number of files with a set file suffix; the change in the number of files with a set file modification time; the change in the number of files with a set file size; the change in the number of files with a set file type; etc.
  • For example, if the indicator is the change in the number of files with a set file suffix of the first copy relative to the second copy, the set file suffix is "doc", the number of "doc" files in the first copy is 80,000, and the number of "doc" files in the second copy is 90,000, then the change in the number of "doc" files of the first copy relative to the second copy is -10,000.
  • The above description of the indicator change value of the first copy relative to the second copy takes only the change in the number of files for a file statistical indicator as an example.
  • The indicator change value of the first copy relative to the second copy can also be expressed in other ways.
  • For example, the indicator change value of the first copy relative to the second copy may also be: the absolute value of the change value of a single file statistical indicator of the first copy relative to the second copy; the change rate of a single file statistical indicator of the first copy relative to the second copy; or the absolute value of the change rate of a single file statistical indicator of the first copy relative to the second copy.
  • In the "doc" example above, expressed as a change rate, the indicator change value of the first copy relative to the second copy can also be -1/8 (that is, -10,000/80,000).
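The four forms of the indicator change value can be worked through on the "doc" example. The sketch below is illustrative only; the helper names are hypothetical, and the choice of the first copy's count as the denominator of the change rate is an assumption inferred from the -1/8 figure in the text (dividing by the second copy's count would give -1/9 instead).

```python
from fractions import Fraction


def change_value(count_first: int, count_second: int) -> int:
    """Signed change of one file statistic, first copy relative to second copy."""
    return count_first - count_second


def change_rate(count_first: int, count_second: int) -> Fraction:
    # Assumption: the rate divides by the first copy's count, which is the
    # choice that reproduces the -1/8 figure quoted in the text.
    # Raises ZeroDivisionError if the first copy has no matching files.
    return Fraction(change_value(count_first, count_second), count_first)


doc_first, doc_second = 80_000, 90_000
delta = change_value(doc_first, doc_second)
rate = change_rate(doc_first, doc_second)
forms = {
    "change value": delta,
    "absolute change value": abs(delta),
    "change rate": rate,
    "absolute change rate": abs(rate),
}
print(forms)
```

`Fraction` is used so that the rate stays an exact rational number and can be compared directly with the -1/8 stated in the description.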
  • The target indicator value can also be obtained from the change value of a single file statistical indicator of the first copy relative to the second copy combined with other information.
  • For example, the target indicator value can also take duration into account: the change value of a single file statistical indicator of the first copy relative to the second copy is divided by the duration between the first moment and the second moment.
  • The target indicator value can also be obtained based on the indicator change value of the first copy relative to the second copy and the indicator change value of the second copy relative to the third copy at the third time.
  • For example, the target indicator value can be obtained as a weighted average of the indicator change value of the first copy relative to the second copy and the indicator change value of the second copy relative to the third copy at the third time.
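The two derived forms of the target indicator value just described, normalising by duration and taking a weighted average over consecutive copies, can be sketched as below. The function names, the default weight, the 24-hour interval and the -2,000 change for the older pair of copies are hypothetical values chosen for illustration; the description does not fix any of them.

```python
def per_unit_time(change: float, duration: float) -> float:
    """Change value divided by the duration between the two backup moments."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return change / duration


def weighted_target(change_1_vs_2: float, change_2_vs_3: float,
                    weight_recent: float = 0.5) -> float:
    """Weighted average of two consecutive indicator change values.

    weight_recent is the (assumed) weight given to the first-vs-second change;
    the second-vs-third change receives the remainder.
    """
    if not 0.0 <= weight_recent <= 1.0:
        raise ValueError("weight_recent must lie in [0, 1]")
    return weight_recent * change_1_vs_2 + (1.0 - weight_recent) * change_2_vs_3


# Hypothetical inputs: backups 24 hours apart; older pair of copies changed by -2,000.
rate_per_hour = per_unit_time(-10_000, 24)
smoothed = weighted_target(-10_000, -2_000, weight_recent=0.75)
```

Dividing by the duration makes copies taken at irregular intervals comparable, while the weighted average damps a single unusual interval with the one before it.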
  • Step 302 can be implemented as follows:
  • If the target indicator value is outside the set value range, it is determined that there is a possibility of an abnormality in the first copy.
  • For example, the target indicator value may be the indicator change value of the first copy relative to the second copy.
  • If the change in the number of files with the set file suffix of the first copy relative to the second copy is -10000 and the set value range is [-6000, 8000], then it is determined that there is a possibility of an abnormality in the first copy.
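The range check of step 302 can be written in a few lines, following the example in which -10000 falls outside [-6000, 8000] and the first copy is therefore flagged. The function name is hypothetical, and treating the range as closed at both ends is an assumption based on the square-bracket notation.

```python
def may_be_abnormal(target_indicator: float, low: float, high: float) -> bool:
    """Step 302 pre-check: True when the value lies outside the closed range [low, high]."""
    return not (low <= target_indicator <= high)


flagged = may_be_abnormal(-10_000, -6_000, 8_000)      # the example from the text
not_flagged = may_be_abnormal(-3_000, -6_000, 8_000)   # hypothetical in-range value
```

Only copies for which this check returns `True` would go on to the model-based steps, which keeps the more expensive detection off copies with ordinary change levels.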
  • Steps 301 to 302 are all optional steps.
  • Execution may start directly from step 303; that is, instead of first screening by the target indicator value, anomaly detection may be performed directly through the first target data.
  • The first target data is related to the file changes of the first copy relative to the second copy.
  • The file changes of the first copy relative to the second copy can be described by the file statistical change of the first copy relative to the second copy.
  • The file statistical change of the first copy relative to the second copy refers to the change in one or more file statistical indicators of the first copy relative to the second copy. For the change value of any file statistical indicator of the first copy relative to the second copy, refer to the description above.
  • The file statistical change of the first copy relative to the second copy may also include the duration between the first time and the second time.
  • The first target data may be the file statistical change of the first copy relative to the second copy.
  • For example, when the file statistical change of the first copy relative to the second copy consists of the change values of multiple file statistical indicators of the first copy relative to the second copy, it may be {X1, X2, X3, X4, X5, T}.
  • Here, X1 represents the change in the number of files with the set file name of the first copy relative to the second copy, X4 represents the change in the number of files with the set file modification time of the first copy relative to the second copy, and T represents the duration between the first moment and the second moment.
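A vector of the form {X1, ..., X5, T} can be assembled from per-copy file counts as sketched below. The function name, the dictionary keys and all counts are hypothetical. The text only names X1 (set file name) and X4 (set file modification time); the positions given here to the other indicators are an assumption that follows the order in which the indicators are listed earlier in the description.

```python
from typing import List, Mapping, Sequence


def build_first_target_data(
    stats_first: Mapping[str, int],
    stats_second: Mapping[str, int],
    indicators: Sequence[str],
    t_first: float,
    t_second: float,
) -> List[float]:
    """Return [X1, ..., Xn, T]: per-indicator count changes followed by the duration."""
    changes = [float(stats_first[name] - stats_second[name]) for name in indicators]
    return changes + [float(t_first - t_second)]


# Assumed indicator order; only positions 1 and 4 are stated in the text.
INDICATORS = ["file_name", "file_path", "file_suffix", "modification_time", "file_size"]

first_copy = {"file_name": 120, "file_path": 300, "file_suffix": 80_000,
              "modification_time": 500, "file_size": 40}
second_copy = {"file_name": 100, "file_path": 310, "file_suffix": 90_000,
               "modification_time": 450, "file_size": 40}

# Hypothetical timestamps in hours: the second moment precedes the first.
vector = build_first_target_data(first_copy, second_copy, INDICATORS, 48.0, 24.0)
```

Keeping the indicator order in one explicit list means the same layout can be reused when the second and third target data are built for the other pairs of copies.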
  • The form of the file statistical change of the first copy relative to the second copy is also not limited to the change value. For example, it can also be: the absolute value of the change value of the file statistical indicators of the first copy relative to the second copy; the change rate of the file statistical indicators of the first copy relative to the second copy; or the absolute value of the change rate of the file statistical indicators of the first copy relative to the second copy.
  • The first target data can also be obtained as follows:
  • The first target data is obtained from the file statistical change of the first copy relative to the second copy and the file statistical change of the first copy relative to the third copy at a third time, where the third time is before the second time.
  • For example, the first target data is obtained as a weighted average of the file statistical change of the first copy relative to the second copy and the file statistical change of the first copy relative to the third copy at the third time.
  • The change value of a first file statistical indicator in the first target data may be obtained as a weighted average of the change value of the first file statistical indicator of the first copy relative to the second copy and the change value of the first file statistical indicator of the first copy relative to the third copy, where the first file statistical indicator is any file statistical indicator.
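The per-indicator weighted average described above can be sketched as an element-wise blend of two change vectors. The function name, the equal default weights and the sample numbers are assumptions for illustration; the description does not say how the weights are chosen or whether the duration component is averaged in the same way.

```python
from typing import List, Sequence


def blend_changes(
    vs_second: Sequence[float],
    vs_third: Sequence[float],
    weight_second: float = 0.5,
) -> List[float]:
    """Element-wise weighted average of two file-statistic change vectors.

    vs_second: changes of the first copy relative to the second copy.
    vs_third:  changes of the first copy relative to the third copy.
    """
    if len(vs_second) != len(vs_third):
        raise ValueError("both vectors must cover the same indicators")
    w = weight_second
    return [w * a + (1.0 - w) * b for a, b in zip(vs_second, vs_third)]


# Hypothetical two-indicator example.
blended = blend_changes([20.0, -10000.0], [30.0, -12000.0], weight_second=0.5)
```

Each position in the result corresponds to one file statistical indicator, so the blended vector can be fed to the anomaly detection model in place of the plain first-versus-second change vector.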
  • In step 304, when the input of the anomaly detection model is the first target data, the output of the anomaly detection model is the first information, and the first information is related to the abnormal state of the first copy.
  • The first information may indicate whether the first copy is abnormal.
  • The result indicated by the first information alone may not be enough to determine whether the first copy is abnormal, in which case the determination may be made in combination with other information.
  • For example, steps 305 to 308 may be executed to determine whether the first copy is abnormal.
  • For the method of obtaining the second target data, refer to the method of obtaining the first target data in step 303.
  • The second target data is related to the file changes of the second copy relative to the third copy at the third time.
  • The second target data may be the file changes of the second copy relative to the third copy at the third time, or may be obtained based on the file changes of the second copy relative to the third copy at the third time and the file changes of the second copy relative to a copy at a time before the third time.
  • In step 306, when the input of the anomaly detection model is the second target data, the output of the anomaly detection model is the second information.
  • For the method of obtaining the third target data, refer to the method of obtaining the first target data in step 303.
  • The third target data is related to the file changes of the fourth copy at the fourth time relative to the first copy.
  • The third target data may be the file changes of the fourth copy at the fourth time relative to the first copy, or may be obtained based on the file changes of the fourth copy relative to the first copy and the file changes of the first copy relative to the second copy.
  • In step 308, when the input of the anomaly detection model is the third target data, the output of the anomaly detection model is the third information.
  • Steps 305 to 308 are all optional steps, and the implementation of step 309 may vary.
  • Step 309 can be implemented as follows:
  • If the first information indicates that the first copy is abnormal, it is determined that the first copy is abnormal, where the first target data is the file changes of the first copy relative to the second copy.
  • Step 309 can also be implemented as follows:
  • If the first information indicates that the first copy is abnormal and the second information indicates that the second copy is normal, it is determined that the first copy is abnormal; or, if the first information indicates that the first copy is abnormal and the second information indicates that the second copy is abnormal, it is determined that there is a possibility that the first copy is abnormal.
  • Step 309 can also be implemented as follows:
  • If the first information indicates that the first copy is abnormal and the third information indicates that the fourth copy is abnormal, it is determined that the first copy is abnormal; or, if the first information indicates that the first copy is abnormal and the third information indicates that the fourth copy is normal, it is determined that there is a possibility that the first copy is abnormal.
  • When steps 305 to 308 are all executed, the abnormal state of the first copy can be determined based on the first information, the second information, and the third information.
  • In that case, step 309 can be implemented as follows:
  • If the first information indicates that the first copy is abnormal, the second information indicates that the second copy is normal, and the third information indicates that the fourth copy is abnormal, it is determined that the first copy is abnormal; or, if the first information indicates that the first copy is abnormal, the second information indicates that the second copy is normal, and the third information indicates that the fourth copy is normal, it is determined that there is a possibility that the first copy is abnormal.
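The variants of step 309 can be folded into one decision function, shown here as a sketch under stated assumptions. The names `Verdict` and `decide` are hypothetical. Two cases are not covered by the description and are filled in by assumption, as marked in the comments: what to return when the first information does not report an abnormality, and what to return when all three pieces of information are available and the second copy is itself reported abnormal.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ABNORMAL = "abnormal"
    POSSIBLY_ABNORMAL = "possibly abnormal"
    NOT_FLAGGED = "not flagged"


def decide(
    first_abnormal: bool,
    second_abnormal: Optional[bool] = None,   # second information, if step 306 ran
    fourth_abnormal: Optional[bool] = None,   # third information, if step 308 ran
) -> Verdict:
    if not first_abnormal:
        # Assumption: the text only describes cases where the first
        # information reports an abnormality.
        return Verdict.NOT_FLAGGED
    # The earlier copy supports the finding when it is absent or reported normal.
    earlier_supports = second_abnormal is None or second_abnormal is False
    # The later copy supports the finding when it is absent or reported abnormal.
    later_supports = fourth_abnormal is None or fourth_abnormal is True
    if earlier_supports and later_supports:
        return Verdict.ABNORMAL
    # Includes the assumed case "second copy abnormal" when all three are present.
    return Verdict.POSSIBLY_ABNORMAL


only_first = decide(True)
with_all = decide(True, second_abnormal=False, fourth_abnormal=True)
uncertain = decide(True, second_abnormal=False, fourth_abnormal=False)
```

Using `None` for information that was not collected lets the same function serve the variant that uses only the first information as well as the variants that add the second information, the third information, or both.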
  • This application provides a data anomaly detection device, including:
  • The processing module 401 is used to obtain the first target data.
  • The first target data is related to the file changes of the first copy relative to the second copy.
  • The first copy and the second copy are data copies of the same data to be backed up at a first time and a second time, respectively, and the second time is before the first time;
  • The determination module 402 is configured to obtain first information according to the first target data and the anomaly detection model, where, when the input of the anomaly detection model is the first target data, the output of the anomaly detection model is the first information, and to determine the abnormal status of the first copy according to the first information.
  • The processing module 401 is also configured to obtain the first target data based on the file changes of the first copy relative to the second copy and the file changes of the first copy relative to the third copy at a third time, where the third time is before the second time.
  • The determination module 402 is specifically used to:
  • If the first information indicates that the first copy is abnormal, determine that the first copy is abnormal, where the first target data is the file changes of the first copy relative to the second copy;
  • The processing module 401 is also used to:
  • Obtain second information according to the second target data and the anomaly detection model.
  • The second target data is related to the file changes of the second copy relative to the third copy at a third time.
  • The third time is before the second time. When the input of the anomaly detection model is the second target data, the output of the anomaly detection model is the second information;
  • The determination module 402 is specifically used to:
  • If the first information indicates that the first copy is abnormal and the second information indicates that the second copy is normal, determine that the first copy is abnormal; or, if the first information indicates that the first copy is abnormal and the second information indicates that the second copy is abnormal, determine that there is a possibility that the first copy is abnormal.
  • The processing module 401 is also used to:
  • Obtain third information according to the third target data and the anomaly detection model.
  • The third target data is related to the file changes of the fourth copy at the fourth time relative to the first copy.
  • The fourth time is after the first time. When the input of the anomaly detection model is the third target data, the output of the anomaly detection model is the third information;
  • The determination module 402 is specifically used to:
  • If the first information indicates that the first copy is abnormal and the third information indicates that the fourth copy is abnormal, determine that the first copy is abnormal; or, if the first information indicates that the first copy is abnormal and the third information indicates that the fourth copy is normal, determine that there is a possibility that the first copy is abnormal.
  • The processing module 401 is also used to:
  • Obtain a target change value, the target change value being related to an indicator change value of the first copy relative to the second copy;
  • The determination module 402 is also used to:
  • If the target change value is outside the set value range, determine that there is a possibility of an abnormality in the first copy, and perform the step of obtaining the first information based on the first target data and the anomaly detection model.
  • The target change value is an indicator change value of the first copy relative to the second copy, or is obtained based on the indicator change value of the first copy relative to the second copy and the indicator change value of the second copy relative to the third copy at a third time, where the third time is before the second time.
  • The file changes of the first copy relative to the second copy are any of the following:
  • The change value of the file statistical indicators of the first copy relative to the second copy; the absolute value of the change value of the file statistical indicators of the first copy relative to the second copy; the change rate of the file statistical indicators of the first copy relative to the second copy; the absolute value of the change rate of the file statistical indicators of the first copy relative to the second copy.
  • An embodiment of the present application also provides an electronic device, which may have a structure as shown in Figure 5.
  • The electronic device may be a computer device, or may be a chip or chip system that can support the computer device in implementing the above method.
  • The electronic device shown in Figure 5 may include at least one processor 501, which is configured to be coupled with a memory and to read and execute instructions in the memory to implement the steps of the data anomaly detection method provided by the embodiments of the present application.
  • The electronic device may also include a communication interface 502 for supporting the electronic device in receiving or sending signaling or data.
  • The communication interface 502 in the electronic device can be used to interact with other electronic devices.
  • The processor 501 may be used to cause the electronic device to perform the steps in the method shown in Figure 3.
  • The electronic device may also include a memory 503 in which computer instructions are stored.
  • The memory 503 may be coupled with the processor 501 and/or the communication interface 502 to support the processor 501 in calling the computer instructions in the memory 503 to implement the steps in the method shown in Figure 3. In addition, the memory 503 can also be used to store data involved in the method embodiments of the present application, for example, data and instructions necessary to support the communication interface 502 in implementing interaction, and/or configuration information necessary for the electronic device to execute the method described in the embodiments of this application.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • Computer instructions are stored on the computer-readable storage medium. When these computer instructions are called and executed by a computer, they can cause the computer to complete the methods involved in any of the above method embodiments and in any possible design of the method embodiments.
  • The computer-readable storage medium is not limited. For example, it may be RAM (random-access memory), ROM (read-only memory), etc.
  • This application also provides a chip, which may include a processor and an interface circuit, for completing the methods involved in the above method embodiments and in any possible implementation of the method embodiments, where "coupling" means that two components are joined to each other directly or indirectly, and this joining may be fixed or removable.
  • The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented using software, they may be implemented in whole or in part in the form of computer instructions.
  • When the computer instructions are loaded and executed on a computer, processes or functions described in accordance with embodiments of the present invention are generated in whole or in part.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center through wired (such as coaxial cable, optical fiber) or wireless (such as infrared, wireless, microwave, etc.) means.
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • The available media may be magnetic media (e.g., floppy disk, hard disk, tape), optical media (e.g., DVD), or semiconductor media (e.g., Solid State Disk (SSD)), etc.
  • The steps of the methods or algorithms described in the embodiments of this application can be directly embedded in hardware, in software units executed by processors, or in a combination of the two.
  • The software unit may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, register, hard disk, removable disk, CD-ROM or any other form of storage medium in the art.
  • The storage medium can be connected to the processor, so that the processor can read information from the storage medium and can store and write information to the storage medium.
  • The storage medium can also be integrated into the processor.
  • The processor and the storage medium can be installed in an ASIC, and the ASIC can be installed in the terminal device.
  • The processor and the storage medium may also be provided in different components in the terminal device.
  • These computer instructions may also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of a flow diagram and/or one or more blocks of a block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The present invention discloses a data anomaly detection method and apparatus. The method comprises: obtaining first target data, the first target data being related to file changes of a first copy relative to a second copy, the first copy and the second copy being data copies of the same data to be backed up at a first moment and a second moment respectively, and the second moment being before the first moment; obtaining first information according to the first target data and an anomaly detection model, an output of the anomaly detection model being the first information when an input of the anomaly detection model is the first target data; and determining an abnormal state of the first copy according to the first information. An anomaly detection model carries learned knowledge about the abnormal states of a large number of copies, so the first information can be obtained from the first target data by drawing on the knowledge learned by the anomaly detection model, which makes it possible to determine the abnormal state of a first copy accurately.
PCT/CN2023/089425 2022-08-31 2023-04-20 Data anomaly detection method and apparatus WO2024045642A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211053661.1 2022-08-31
CN202211053661.1A CN117707840A (zh) 2022-08-31 2022-08-31 Data anomaly detection method and apparatus

Publications (1)

Publication Number Publication Date
WO2024045642A1 true WO2024045642A1 (fr) 2024-03-07

Family

ID=90100307

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/089425 WO2024045642A1 (fr) 2022-08-31 2023-04-20 Data anomaly detection method and apparatus

Country Status (2)

Country Link
CN (1) CN117707840A (fr)
WO (1) WO2024045642A1 (fr)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11170104B1 (en) * 2015-08-21 2021-11-09 Amazon Technologies, Inc. Identifying attacks on file systems
US20190042744A1 (en) * 2017-08-02 2019-02-07 Code 42 Software, Inc. Ransomware attack onset detection
US20190108341A1 (en) * 2017-09-14 2019-04-11 Commvault Systems, Inc. Ransomware detection and data pruning management
US20200034537A1 (en) * 2018-07-30 2020-01-30 Rubrik, Inc. Ransomware infection detection in filesystems
US20200057843A1 (en) * 2018-08-17 2020-02-20 Citrix Systems, Inc. Secure file sharing using semantic watermarking
US20210044603A1 (en) * 2019-08-07 2021-02-11 Rubrik, Inc. Anomaly and ransomware detection
CN112416891A (zh) * 2020-11-26 2021-02-26 北京天融信网络安全技术有限公司 数据检测方法、装置、电子设备及可读存储介质

Also Published As

Publication number Publication date
CN117707840A (zh) 2024-03-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23858683

Country of ref document: EP

Kind code of ref document: A1