CN114863394A - Abnormality detection method, abnormality detection device, electronic apparatus, and computer-readable storage medium - Google Patents

Abnormality detection method, abnormality detection device, electronic apparatus, and computer-readable storage medium

Info

Publication number
CN114863394A
CN114863394A (application CN202210467552.8A)
Authority
CN
China
Prior art keywords
semantic
scene
sensors
data
semantics
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210467552.8A
Other languages
Chinese (zh)
Inventor
彭磊
舒洪峰
赵子琪
徐婷
崔允端
朱栋文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Smart City Technology Development Group Co ltd
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Smart City Technology Development Group Co ltd
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Smart City Technology Development Group Co ltd and Shenzhen Institute of Advanced Technology of CAS
Priority to CN202210467552.8A
Publication of CN114863394A

Classifications

    • G06V20/584: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads, of vehicle lights or traffic lights
    • G06N3/045: Computing arrangements based on biological models; neural networks; combinations of networks
    • G06T7/66: Image analysis; analysis of geometric attributes of image moments or centre of gravity
    • G06V10/56: Extraction of image or video features relating to colour
    • G06V10/761: Image or video pattern matching; proximity, similarity or dissimilarity measures
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/82: Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y02P90/02: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation; total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The embodiments of the invention disclose an anomaly detection method, an anomaly detection device, an electronic device, and a computer-readable storage medium. The method includes: acquiring N data, wherein the N data correspond to N sensors one to one, and N is a positive integer greater than 1; and determining abnormal sensors among the N sensors according to the N data and an adjacency matrix between the N sensors. According to the embodiments of the invention, anomaly detection can be performed by combining the sensor data with the adjacency matrix between the sensors; because the adjacency matrix encodes rich association relationships between the sensors, detection accuracy can be improved.

Description

Abnormality detection method, abnormality detection device, electronic apparatus, and computer-readable storage medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to an anomaly detection method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of internet of things (IoT) technology, it has been widely applied in fields such as automation, medical health, smart energy, and smart manufacturing.
Due to factors internal or external to some internet of things systems, the sensor data acquired by these systems may contain anomalies (such as noise and adversarial samples), and this anomalous data may cause the system to make wrong decisions. For example, in an autonomous driving scenario, anomalous data may cause the autopilot system to turn on the car's wipers when it is not raining. To address this problem, detecting sensor anomalies in internet of things systems is very important.
A common method for detecting sensor anomalies is to analyze the data stream of a single sensor and determine, through statistical, classification, or clustering methods (such as a support vector machine), whether the stream contains data that does not conform to its distribution (i.e., anomalous data); if such data exist, the sensor is determined to be abnormal. This method is ineffective at identifying anomalous data that conforms to the distribution (such as adversarial samples), resulting in low accuracy.
Disclosure of Invention
The embodiments of the invention disclose an anomaly detection method and apparatus, an electronic device, and a computer-readable storage medium, which are used to improve detection accuracy.
The first aspect discloses an anomaly detection method, which may be applied to an electronic device, a module (e.g., a chip) in the electronic device, or a logic module or software that can implement all or part of the functions of the electronic device; the description below takes application to the electronic device as an example. The method may include the following steps:
acquiring N data, wherein the N data correspond to the N sensors one to one, and N is a positive integer greater than 1;
and determining abnormal sensors in the N sensors according to the N data and the adjacency matrix among the N sensors.
In the embodiment of the invention, when the electronic device determines the abnormal sensors among the N sensors, it uses the adjacency matrix between the N sensors. Because the adjacency matrix encodes rich association relationships among the N sensors, the electronic device can associate the N data through the adjacency matrix, determine the abnormal sensors among the N sensors more accurately, and thus improve detection accuracy.
As a possible implementation, the determining an abnormal sensor of the N sensors according to the N data and the adjacency matrix between the N sensors includes:
obtaining scene semantics according to the N data and the adjacency matrix between the N sensors, wherein the scene semantics include the semantics of the N sensors;
and determining abnormal sensors in the N sensors according to the scene semantics.
In the embodiment of the invention, the electronic device can obtain the scene semantics according to the N data and the adjacency matrix. Because the scene semantics are jointly described by the N sensors and include the semantics of each of them, the electronic device can determine the abnormal sensors among the N sensors more accurately through the scene semantics.
As a possible implementation, the obtaining scene semantics according to the N data and the adjacency matrix between the N sensors includes:
obtaining N semantic vectors according to the N data, wherein the N semantic vectors correspond to the N sensors one to one;
based on the N semantic vectors and the adjacency matrix between the N sensors, scene semantics are obtained.
In the embodiment of the invention, the electronic device can obtain N semantic vectors from the N data and then fuse the N semantic vectors based on the adjacency matrix to obtain the scene semantics. Because the scene semantics fuse the semantic vectors of all the sensors, they carry richer and more accurate information, which facilitates anomaly detection by the electronic device.
As a possible implementation, the obtaining N semantic vectors according to the N data includes:
obtaining N semantics according to the N data, wherein the N semantics correspond to the N sensors one to one;
and obtaining the N semantic vectors according to the N semantics and the semantic library.
In the embodiment of the invention, because the dimensions, data formats, and data contents of different sensors' data can differ, the electronic device can first obtain N semantics from the N data and then obtain N semantic vectors from the N semantics, so that the semantics of the N sensors can be fused.
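The mapping from per-sensor semantics to semantic vectors via a semantic library can be sketched as a lookup table of embeddings; the library contents, labels, and vector dimensions below are illustrative assumptions, not the patent's prescribed implementation:

```python
import numpy as np

# Hypothetical semantic library: each semantic label maps to a fixed embedding,
# so heterogeneous sensors with different data formats share one vector space.
SEMANTIC_LIBRARY = {
    "raining":     np.array([1.0, 0.0, 0.0]),
    "not_raining": np.array([0.0, 1.0, 0.0]),
    "wet_road":    np.array([0.0, 0.0, 1.0]),
}

def semantics_to_vectors(semantics):
    """Map the N per-sensor semantics to N semantic vectors, one per sensor."""
    return [SEMANTIC_LIBRARY[s] for s in semantics]

vectors = semantics_to_vectors(["raining", "wet_road"])
```

In practice the library would be learned (e.g., embeddings produced by a model) rather than hand-written, but the lookup role is the same.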
As a possible implementation, the deriving scene semantics based on the N semantic vectors and the adjacency matrix between the N sensors includes:
concatenating the N semantic vectors to obtain sensor semantic features;
multiplying the adjacency matrix by the sensor semantic features to obtain original scene semantics;
and inputting the original scene semantics into an encoder to obtain the scene semantics, wherein the dimension of the scene semantics is smaller than that of the original scene semantics.
In the embodiment of the invention, the electronic device can concatenate the N semantic vectors to obtain the sensor semantic features and then multiply them by the adjacency matrix to obtain the original scene semantics. The electronic device can then input the original scene semantics into the encoder to obtain the embedded scene semantics (i.e., the scene semantics), whose dimension can be smaller than that of the original scene semantics, thereby improving the efficiency of anomaly detection.
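The concatenate-and-multiply step can be illustrated with a minimal numpy sketch; the row-stacking layout and the 3-sensor toy adjacency matrix are assumptions for illustration:

```python
import numpy as np

def raw_scene_semantics(semantic_vectors, adjacency):
    # Stack the N semantic vectors into an (N, d) sensor semantic feature
    # matrix, then multiply by the adjacency matrix so each sensor's row
    # aggregates the semantics of the sensors it is associated with.
    features = np.stack(semantic_vectors)   # (N, d)
    return adjacency @ features             # (N, d) original scene semantics

# Toy example: 3 sensors with 2-dimensional semantic vectors.
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
vecs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
raw = raw_scene_semantics(vecs, A)
```

An encoder (e.g., a small neural network) would then compress `raw` into a lower-dimensional scene-semantics embedding.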
As a possible implementation, the method further comprises:
acquiring N ground-truth data and a scene label corresponding to the ground-truth data, wherein the N ground-truth data correspond to the N sensors one to one;
obtaining original scene semantics according to the N ground-truth data;
inputting the original scene semantics into an initial encoder to obtain scene semantics;
inputting the scene semantics into a classifier to obtain a predicted scene label;
determining a loss based on the predicted scene label and the scene label;
and optimizing the parameters of the initial encoder according to the loss to obtain the encoder.
In the embodiment of the invention, the electronic device can train the initial encoder with the ground-truth data to obtain more accurate scene semantics and a scene semantic centroid for each scene, thereby improving the accuracy of anomaly detection.
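The training procedure above (an encoder plus a classifier head optimised against the scene label) can be sketched with a linear encoder and a softmax classifier; the dimensions, learning rate, single-sample loop, and plain gradient descent are illustrative assumptions, not the patent's specified training setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

D, d, M = 8, 3, 2                           # raw dim, embedded dim, scene count (assumed)
W_enc = rng.normal(scale=0.1, size=(d, D))  # initial encoder (linear, for illustration)
W_cls = rng.normal(scale=0.1, size=(M, d))  # classifier head used only during training

x = rng.normal(size=D)                      # original scene semantics from ground-truth data
label = 1                                   # scene label for this sample

for _ in range(300):
    z = W_enc @ x                           # scene semantics (embedding)
    p = softmax(W_cls @ z)                  # predicted scene-label distribution
    loss = -np.log(p[label])                # cross-entropy between prediction and label
    g = p.copy(); g[label] -= 1.0           # gradient of the loss w.r.t. the logits
    dz = W_cls.T @ g                        # gradient w.r.t. the embedding
    W_cls -= 0.3 * np.outer(g, z)           # optimise classifier head...
    W_enc -= 0.3 * np.outer(dz, x)          # ...and encoder parameters
```

After training, the classifier head is discarded; the encoder output for each scene's ground-truth data can then be averaged to form that scene's semantic centroid.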
As a possible implementation, the method further comprises:
calculating the similarity between the scene semantics and M scene semantic centroids to obtain M similarities, wherein the M scene semantic centroids correspond to M scenes one to one, a scene semantic centroid is the standard scene semantics corresponding to a scene, and M is a positive integer greater than or equal to 1;
the determining abnormal sensors among the N sensors according to the scene semantics includes:
and in a case where all of the M similarities are less than a similarity threshold, determining abnormal sensors among the N sensors according to the scene semantics.
In the embodiment of the invention, the electronic device can calculate the similarity between the scene semantics and each of the M scene semantic centroids to obtain M similarities, and then compare the M similarities with a similarity threshold. If any of the M similarities is greater than or equal to the similarity threshold, the electronic device can determine that there is no abnormal sensor among the N sensors and no response is needed; if all M similarities are less than the similarity threshold, the electronic device can determine that an abnormal sensor exists among the N sensors and then locate the abnormal sensors according to the scene semantics. Because the electronic device proceeds to locate abnormal sensors only when all M similarities are below the similarity threshold, the efficiency of anomaly detection can be improved.
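The centroid-similarity check can be sketched as follows; the patent does not commit to a specific similarity measure, so cosine similarity and the 0.9 threshold are assumptions:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def scene_is_abnormal(scene_semantics, centroids, threshold=0.9):
    """Return True when the scene semantics is far from every known scene
    semantic centroid, i.e. all M similarities fall below the threshold."""
    sims = [cosine(scene_semantics, c) for c in centroids]
    return all(s < threshold for s in sims), sims

centroids = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # M = 2 known scenes
abnormal, sims = scene_is_abnormal(np.array([0.99, 0.05]), centroids)
```

Here the scene semantics nearly matches the first centroid, so `abnormal` is False and no response is needed; only an abnormal result triggers the per-sensor localisation step.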
As a possible implementation, the original scene semantics include N scene semantic vectors, which correspond to the N sensors one to one; the determining abnormal sensors among the N sensors according to the scene semantics includes:
determining the scene semantic centroid with the highest similarity to the scene semantics, to obtain a first scene semantic centroid;
acquiring N standard semantic vectors corresponding to the first scene semantic centroid, wherein the N standard semantic vectors correspond to the N sensors one to one;
calculating semantic distances between the N scene semantic vectors and the N standard semantic vectors to obtain N semantic distances, wherein the N semantic distances correspond to the N sensors one to one;
and determining the k largest semantic distances among the N semantic distances and determining the sensors corresponding to those k semantic distances as abnormal sensors, where k is a positive integer greater than or equal to 1.
In the embodiment of the invention, in a case where the electronic device determines that abnormal sensors exist among the N sensors, the electronic device can determine the scene semantic centroid most similar to the scene semantics to obtain the first scene semantic centroid. Thereafter, the electronic device can determine N semantic distances based on the N scene semantic vectors and the N standard semantic vectors corresponding to the first scene semantic centroid, and determine the sensors corresponding to the k largest semantic distances as abnormal sensors. Because a standard semantic vector accurately reflects the scene semantic vector a sensor should produce under normal conditions, the electronic device can find the sensors most likely to be abnormal by calculating the semantic distance between each scene semantic vector and its standard semantic vector.
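Locating the abnormal sensors via the k largest semantic distances can be sketched as follows; Euclidean distance is an assumption, since the patent only requires a semantic distance:

```python
import numpy as np

def locate_abnormal_sensors(scene_vectors, standard_vectors, k=1):
    """Compute N semantic distances between each sensor's scene semantic vector
    and the corresponding standard semantic vector of the nearest scene semantic
    centroid, then report the sensors with the k largest distances as abnormal."""
    dists = [float(np.linalg.norm(s - t))
             for s, t in zip(scene_vectors, standard_vectors)]
    ranked = sorted(range(len(dists)), key=lambda i: dists[i], reverse=True)
    return ranked[:k], dists

# Sensor 1's scene semantic vector deviates strongly from its standard vector.
scene    = [np.array([1.0, 0.0]), np.array([0.0, 5.0]), np.array([0.0, 1.0])]
standard = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.0, 1.0])]
abnormal_idx, dists = locate_abnormal_sensors(scene, standard, k=1)
```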
As a possible embodiment, the N sensors include a first sensor and a second sensor;
in a case where the sensing ranges of the first sensor and the second sensor intersect, the value corresponding to the first sensor and the second sensor in the adjacency matrix is 1;
in a case where the sensing ranges of the first sensor and the second sensor do not intersect, the value corresponding to the first sensor and the second sensor in the adjacency matrix is 0.
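Building the adjacency matrix from sensing-range intersections can be sketched in one dimension; the interval representation of a sensing range is an illustrative simplification, since real sensing ranges would be 2-D or 3-D regions:

```python
import numpy as np

def build_adjacency(ranges):
    """ranges: one (lo, hi) sensing interval per sensor.
    A[i][j] = 1 when the sensing ranges of sensors i and j intersect, else 0."""
    n = len(ranges)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            overlap_lo = max(ranges[i][0], ranges[j][0])
            overlap_hi = min(ranges[i][1], ranges[j][1])
            A[i, j] = 1 if overlap_lo <= overlap_hi else 0
    return A

# Sensors 0 and 1 overlap on [5, 10]; sensor 2's range is disjoint from both.
A = build_adjacency([(0, 10), (5, 15), (20, 30)])
```

Note the matrix is symmetric and every sensor's range intersects itself, so the diagonal is all ones.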
A second aspect discloses an abnormality detection device, which may be an electronic apparatus or a module (e.g., a chip) in the electronic apparatus. The apparatus may include:
as a possible implementation manner, the obtaining unit is configured to obtain N data, where the N data correspond to the N sensors one to one, and N is a positive integer greater than 1;
and a determining unit, configured to determine abnormal sensors among the N sensors according to the N data and the adjacency matrix between the N sensors.
As a possible embodiment, the determining unit determining an abnormal sensor among the N sensors according to the N data and the adjacency matrix between the N sensors includes:
obtaining scene semantics according to the N data and the adjacency matrix between the N sensors, wherein the scene semantics include the semantics of the N sensors;
and determining abnormal sensors in the N sensors according to the scene semantics.
As a possible implementation, the determining unit obtains the scene semantics according to the N data and the adjacency matrix between the N sensors includes:
obtaining N semantic vectors according to the N data, wherein the N semantic vectors correspond to the N sensors one to one;
based on the N semantic vectors and the adjacency matrix between the N sensors, scene semantics are obtained.
As a possible implementation, the determining unit obtains N semantic vectors according to the N data includes:
obtaining N semantics according to the N data, wherein the N semantics correspond to the N sensors one to one;
and obtaining the N semantic vectors according to the N semantics and the semantic library.
As a possible implementation, the determining unit obtains the scene semantics based on the N semantic vectors and the adjacency matrix between the N sensors, including:
concatenating the N semantic vectors to obtain sensor semantic features;
multiplying the adjacency matrix by the sensor semantic features to obtain original scene semantics;
and inputting the original scene semantics into an encoder to obtain the scene semantics, wherein the dimension of the scene semantics is smaller than that of the original scene semantics.
As a possible implementation, the apparatus further comprises:
a processing unit, configured to calculate the similarity between the scene semantics and M scene semantic centroids to obtain M similarities, wherein the M scene semantic centroids correspond to M scenes one to one, a scene semantic centroid is the standard scene semantics corresponding to a scene, and M is a positive integer greater than or equal to 1;
the determining unit determining an abnormal sensor among the N sensors according to the scene semantics includes:
and in a case where all of the M similarities are less than a similarity threshold, determining abnormal sensors among the N sensors according to the scene semantics.
As a possible implementation, the original scene semantics include N scene semantic vectors, which correspond to the N sensors one to one; the determining unit determining an abnormal sensor among the N sensors according to the scene semantics includes:
determining the scene semantic centroid with the highest similarity to the scene semantics, to obtain a first scene semantic centroid;
acquiring N standard semantic vectors corresponding to the first scene semantic centroid, wherein the N standard semantic vectors correspond to the N sensors one to one;
calculating semantic distances between the N scene semantic vectors and the N standard semantic vectors to obtain N semantic distances, wherein the N semantic distances correspond to the N sensors one to one;
and determining the k largest semantic distances among the N semantic distances and determining the sensors corresponding to those k semantic distances as abnormal sensors, where k is a positive integer greater than or equal to 1.
A third aspect discloses an electronic device, comprising: a processor and a memory. The memory is used for storing computer programs, and the processor is used for calling the computer programs. When the processor executes the computer program stored in the memory, the processor is caused to execute the anomaly detection method disclosed in the first aspect or any embodiment of the first aspect.
A fourth aspect discloses a computer-readable storage medium having stored thereon a computer program or computer instructions which, when executed, implement the anomaly detection method as disclosed in the above aspects.
A fifth aspect discloses a chip comprising a processor for executing a program stored in a memory, which program, when executed, causes the chip to carry out the above method.
As a possible implementation, the memory is located off-chip.
A sixth aspect discloses a computer program product comprising computer program code which, when executed, causes the above-mentioned anomaly detection method to be performed.
Drawings
FIG. 1 is a schematic flow chart of an anomaly detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a topology of a sensor network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a scene semantic cluster according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a network architecture of a scene semantic centroid extractor disclosed in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a semantic distance disclosed in an embodiment of the present invention;
FIG. 6 is a schematic diagram of a scene semantic cluster according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a SSC-IDS and SSF-IDS comparison disclosed in an embodiment of the present invention;
FIG. 8 is a schematic diagram of another SSC-IDS and SSF-IDS comparison disclosed in embodiments of the present invention;
FIG. 9 is a comparison of different k values according to the embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an anomaly detection device according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The embodiment of the invention discloses an anomaly detection method and device, electronic equipment and a computer readable storage medium, which are used for improving detection accuracy. The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments. All other embodiments derived by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application. The terms "first," "second," "third," and the like in the description, claims, and drawings of this application are used to distinguish between different objects and do not necessarily describe a particular order. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, article, or apparatus comprising a series of steps or elements is not limited to the listed steps or elements but may include steps or elements not listed, or other steps or elements inherent to such process, method, article, or apparatus.
Only some, but not all, of the material relevant to the present application is shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
As used in this specification, the terms "component," "module," "system," "unit," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a unit may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer, and units may be localized on one computer and/or distributed between two or more computers. In addition, these units may execute from various computer-readable media having various data structures stored thereon. The units may communicate by way of local and/or remote processes, such as in accordance with a signal having one or more data packets (e.g., data from one unit interacting with another unit in a local system, a distributed system, and/or across a network).
For a better understanding of the embodiments of the present invention, some terms and related technologies of the embodiments of the present invention will be described below.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a broad range of fields, including both hardware-level and software-level technologies. Artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other fields. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and weakly and strongly supervised learning.
With the popularization of internet of things technology, it is increasingly applied in fields such as automation, medical health, energy, and manufacturing, and analyzing the data acquired by the large number of sensors in an internet of things system can help the whole system make effective decisions. However, large-scale internet of things devices (i.e., sensors) are vulnerable to attack and damage, so the acquired data may include anomalous data such as noise (e.g., erroneous values or incomplete information) and adversarial samples, and this anomalous data may cause the internet of things system to make incorrect decisions. For example, in a medical internet of things system, an adversarial-sample attack may cause a computed tomography (CT) image to be misclassified, increasing the risk of misdiagnosis. As another example, machine learning models for autonomous driving are also quite vulnerable to adversarial-sample deception, leading to false decisions and an increased risk of traffic accidents.
In short, because internet of things devices are easily attacked and damaged, sensor data may contain anomalous data, which poses a security threat to applications in the internet of things system. For high-risk applications (such as autonomous driving and telemedicine), anomalous data can cause serious consequences. The internet of things system therefore needs to be highly robust and able to identify anomalous data effectively, so that it can make reliable decisions based on correct data, improving the safety of the system.
An anomaly detection (intrusion detection) technology is one way to solve the above problems, and the robustness of the internet of things system can be improved through the anomaly detection technology.
At present, anomaly detection in internet of things systems mainly relies on analyzing the data of a single sensor and detecting intrusion of the sensor with outlier detection methods. Specifically, for the data stream of a sensor, methods such as statistics, classification, and clustering can be used to perform anomaly detection; based on the historical data of the sensor and the prediction of future data, it can be determined whether the data stream includes outliers (i.e., anomalous data) that do not conform to the distribution rule of the data stream. If data that obviously does not conform to the distribution rule exists in the data stream, the sensor can be considered to have been invaded, or the sensor is an anomalous sensor. However, for adversarial sample attacks, there is no anomaly from the viewpoint of the data stream distribution (i.e., no outlier that violates the data stream distribution rule), so the above methods cannot detect them effectively, resulting in low accuracy.
For better understanding of the embodiments of the present invention, the following description will exemplarily describe a scenario in which the embodiments of the present invention are applicable.
Illustratively, the anomaly detection method provided by the embodiment of the invention can improve the accuracy of anomaly detection and can be used for anomaly detection of a sensor network comprising a large number of sensors. Moreover, the method can also be applied to multi-source heterogeneous sensor networks.
For example, with the development of automobile intelligence, in order to facilitate the realization of functions such as automatic driving and assistant driving, more and more sensors are mounted on intelligent automobiles. A smart car may be equipped with one or more laser radars (lidar), one or more cameras (camera), one or more radars (e.g., millimeter wave radars), etc., which may together form a sensor network. Aiming at an automatic driving scene, sensors on the intelligent automobile can be in the same space-time environment, and the functions of automatic parking, emergency braking and the like can be realized through the technologies of machine learning, deep learning and the like through data acquired by the sensors. However, if the sensor data acquired by the autopilot system is incorrect, the autopilot system may be caused to make some incorrect decisions, thereby increasing the driving safety hazard. According to the anomaly detection method provided by the embodiment of the invention, the anomaly detection can be carried out on the sensor on the intelligent automobile based on the acquired sensor data, and the abnormal sensor can be found out, so that the automatic driving system can be assisted to make a correct decision.
The anomaly detection system realized by the anomaly detection method provided by the embodiment of the invention may be referred to as a Scene Semantic Centroid-based Intrusion Detection System (SSC-IDS).
When the anomaly detection method provided by the embodiment of the invention is used for carrying out anomaly detection on sensor data, the anomaly detection can be carried out based on the scene semantics of a sensor network (system), for example, the anomaly detection is carried out aiming at the scene semantics of the sensor network on an intelligent automobile. Under the condition that the data of all the sensors are normal data, the scene semantics corresponding to the data also need to be normal scene semantics; in the case where the abnormal data is included in the data of all the sensors, the scene semantics corresponding to the data should be abnormal scene semantics. Thus, if the scene semantics determined from the data of the sensor are anomalous scene semantics, it indicates that an anomalous sensor may be included in the sensor network. For abnormal scene semantics, a semantic distance between the semantics of each sensor and the standard sensor semantics may be calculated, and k sensors with a larger semantic distance may be determined as abnormal sensors, k being an integer greater than 1.
It should be understood that the anomaly detection method provided by the embodiment of the invention is applicable to all sensor systems under the Internet of things, and the anomaly detection based on scene semantics can greatly improve the efficiency and accuracy of anomaly detection and improve the quality of anomaly alarm of the sensor systems. Meanwhile, when the sensor system is detected to be abnormal, the method can also determine the sensor which is most likely to be abnormal.
The abnormality detection method provided by the embodiment of the invention can be executed by an electronic device, which includes but is not limited to a terminal device or a server. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, big data and artificial intelligence platforms, and the like. The terminal device may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, a smart television, and the like. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein. Meanwhile, the electronic device may also be a chip, a chip system, or a processor supporting the electronic device in implementing the anomaly detection method, or may be a logic module or software capable of implementing all or part of the functions of the electronic device.
Referring to fig. 1, fig. 1 is a schematic flow chart of an anomaly detection method according to an embodiment of the present invention. As shown in fig. 1, the abnormality detection method may include the following steps.
101. N data are obtained, and the N data correspond to the N sensors one to one.
In order to detect whether an abnormal sensor is included in a sensor network (e.g., a sensor network of an intelligent vehicle) or whether the sensor network is invaded (e.g., a network attack), the electronic device may acquire N data (i.e., data collected by N sensors) in real time, where the N data corresponds to the N sensors one to one. The N sensors may belong to the same sensor network. The N sensors can be located in the same space-time environment, the N data can also be data collected in the same space-time environment, and N is a positive integer greater than 1.
It should be understood that different types of sensors (i.e., multi-source heterogeneous sensors) may be included in the N sensors, such as a temperature sensor, a laser radar, a camera, a millimeter wave radar, and the like, and the embodiments of the present invention are not limited herein. Therefore, the data formats, data dimensions, and information richness of the data collected by the N sensors may be different. For example, the temperature sensor may collect real-time temperature information (e.g., 10 degrees) in the current space-time environment; the millimeter wave radar can collect the physical environment information around, and can determine whether the surroundings include obstacles or not by analyzing based on the information; the camera may collect surrounding images or video information, and compared to the millimeter wave radar, the information collected by the camera may include color information, such as color of traffic lights in front of the camera and countdown information.
102. And determining abnormal sensors in the N sensors according to the N data and the adjacent matrix among the N sensors.
Since the N sensors may belong to the same sensor network and may be located in the same space-time environment, in order to more accurately determine an abnormal sensor among the N sensors, the electronic device may first determine an adjacency matrix among the N sensors, and then, the electronic device may determine an abnormal sensor among the N sensors according to the N data and the adjacency matrix among the N sensors.
The adjacency matrix between the N sensors may describe the spatial topology of the sensor network and include rich association information between the sensors, the size of the adjacency matrix being N × N. The sensing ranges of some or all of the N sensors may intersect, and in the case that the sensing ranges of two sensors intersect, the corresponding values of the two sensors in the adjacent matrix may be 1; in the case where there is no intersection in the sensing ranges of the two sensors, the corresponding values of the two sensors in the adjacency matrix may be 0. Specifically, the N sensors may include a first sensor and a second sensor. In the case where there is an intersection between the sensing ranges of the first sensor and the second sensor, the value corresponding to the first sensor and the second sensor in the adjacency matrix may be 1; in the case where there is no intersection between the sensing ranges of the first sensor and the second sensor, the value corresponding to the first sensor and the second sensor in the adjacency matrix may be 0.
It should be noted that the sensing range of the sensor can be understood as the spatial range of the data collected by the sensor. For example, for a camera and a millimeter wave radar on a smart car, if the camera can collect an image of a first area, the millimeter wave radar can collect physical environment information of a second area, and if the first area and the second area have the same part, it can be considered that the sensing ranges of the camera and the millimeter wave radar are crossed. Assuming that the same part of the first area and the second area is a third area (namely the part where the sensing range of the camera and the millimeter wave radar is crossed is the third area), and a traffic light exists in the third area, the fact that a traffic light exists in the third area can be determined through information collected by the millimeter wave radar, and the fact that a traffic light exists in the third area and the state of the traffic light can be determined through information collected by the camera.
The electronic device can model the sensor network as a graph structure, denoted as G = (V, E), where V is the set of sensor nodes and E is the set of edges. Meanwhile, the connection relationship information between sensor nodes is described using the adjacency matrix of the graph. For example, suppose that the sensor system of a smart car includes 6 cameras, 5 radars, and 1 lidar mounted above the roof of the car; the cameras are a front camera, a left front camera, a right front camera, a rear camera, a left rear camera, and a right rear camera, and the 5 radars are a front radar, a left front radar, a right front radar, a left rear radar, and a right rear radar. An edge can be established between two sensors whose sensing ranges intersect, and the resulting topological structure can be as shown in fig. 2. As can be seen from fig. 2, there is a connection line (i.e., an edge) between the lidar and all the cameras and radars, indicating that the sensing range of the lidar intersects with the sensing range of any one of the cameras or radars. In fig. 2, there may be an edge between the left rear radar and each of the rear camera, the left rear camera, and the lidar; that is, the sensing range of the left rear radar may intersect with the sensing ranges of the rear camera, the left rear camera, and the lidar. The other sensors in fig. 2 can be understood similarly, and a detailed description thereof is omitted.
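As a minimal illustrative sketch (the sensor names and overlap pairs below are assumptions for demonstration, not the exact topology of fig. 2), the adjacency matrix of such a sensor graph G = (V, E) can be built from a list of sensing-range overlaps:

```python
# Build the adjacency matrix A of a sensor graph G = (V, E).
# Sensor names and overlap pairs are illustrative assumptions.
sensors = ["front_camera", "front_radar", "lidar", "rear_camera"]
index = {name: i for i, name in enumerate(sensors)}

# Pairs whose sensing ranges intersect (edges in E).
overlaps = [
    ("front_camera", "front_radar"),
    ("lidar", "front_camera"),
    ("lidar", "front_radar"),
    ("lidar", "rear_camera"),
]

N = len(sensors)
A = [[0] * N for _ in range(N)]
for u, v in overlaps:
    i, j = index[u], index[v]
    A[i][j] = A[j][i] = 1  # a_ij = 1 when the sensing ranges intersect

print(A)
```

The matrix is symmetric by construction, matching the undirected edges of the topology.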
The electronic device may determine an abnormal sensor of the N sensors according to the N data and the adjacency matrix between the N sensors, and specifically include the following steps: the electronic device obtains scene semantics from the N data and the adjacency matrix between the N sensors, and then the electronic device can determine an abnormal sensor of the N sensors according to the scene semantics. The scene semantics include the semantics of the N sensors.
The electronic device may obtain scene semantics from the N data and the adjacency matrix between the N sensors. Specifically, the electronic device may obtain N semantic vectors according to the N data, and then may obtain scene semantics based on the N semantic vectors and the adjacency matrix between the N sensors. The N semantic vectors correspond to the N sensors one-to-one.
In order to obtain the N semantic vectors, the electronic device may first obtain N semantics according to the N data, and then obtain the N semantic vectors according to the N semantics and the semantic library. The N semantics correspond to the N sensors one to one. The electronic device may convert the sensor data into semantics. For example, for an image captured by a camera, the electronic device may convert the image into semantics through a machine learning algorithm, a deep learning algorithm, or the like.
The semantics of a sensor can be expressed as

Sem_i = {sem_i^1, sem_i^2, …, sem_i^k, …}

where Sem_i is the semantics of the ith sensor in the sensor network (i.e., the N sensors mentioned above), and sem_i^k is the kth semantic among all the semantics contained in the sensor data.
After the electronic equipment obtains the semantics of the sensor, the semantics of the sensor can be coded through the semantic library corresponding to the sensor. The encoding method may be one-hot (one-hot) encoding. It should be noted that the types of sensors are different, and the semantic library used for encoding is different. Meanwhile, the same type of sensors can share the same semantic library.
The electronic device can encode the N semantics through one-hot encoding to obtain N semantic vectors. The semantic vector (code) of the ith sensor can be expressed as code_i. The size of code_i is the same as that of the semantic library used for encoding. Each position in code_i can represent one semantic: if the sensor semantics Sem_i include the semantic corresponding to a certain position, the value of that position can be 1; if Sem_i do not include it, the value of that position can be 0.
For example, the semantic library for a camera may be { sunny day, rainy day, …, snowy day, cloudy day }, the size of the semantic library may be 1000, and the semantic of one camera is { sunny day }, so that one-hot encoding of the semantic of the camera through the semantic library may obtain a 1000-dimensional vector [1,0, …,0, 0], where only the position corresponding to the sunny day has a value of 1 (i.e., the first position has a value of 1), and the other positions have all values of 0.
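The encoding just described can be sketched as follows. The four-entry semantic library is an illustrative assumption (a real library may hold on the order of 1000 entries, as in the example above):

```python
# One-hot encode a sensor's semantics against its semantic library.
# The library and semantics below are illustrative assumptions.
semantic_library = ["sunny", "rainy", "snowy", "cloudy"]

def encode(semantics, library):
    """Return code_i: 1 at positions whose semantic is present in Sem_i, else 0."""
    present = set(semantics)
    return [1 if sem in present else 0 for sem in library]

code = encode({"sunny"}, semantic_library)
print(code)  # [1, 0, 0, 0]
```

Sensors of the same type would share one library, so their codes are directly comparable position by position.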
After the electronic device obtains the semantic vectors of the N sensors (i.e., the N semantic vectors), the scene semantics may be obtained based on the N semantic vectors and the adjacency matrix between the N sensors. Specifically, the electronic device can splice the N semantic vectors to obtain sensor semantic features; then, multiplying the adjacent matrixes among the N sensors by the semantic features of the sensors to obtain original scene semantics; and then, the electronic equipment can input the original scene semantics into the encoder to obtain the scene semantics. The dimensions of the scene semantics are smaller than the dimensions of the original scene semantics. The following describes the steps of obtaining scene semantics by the electronic device in detail.
The electronic device may splice the N semantic vectors to obtain the sensor semantic features, which can be expressed as a multi-dimensional vector S:

S = (code_1, code_2, …, code_N)^T

where N is the number of sensors included in the sensor network, and S is the sensor semantic feature composed of the semantic vectors of all sensors, which may also be referred to as the attribute matrix of the sensor network. In the sensor network, the adjacency matrix of the topology G may be denoted as A:

A = (a_ij), i, j = 1, …, N

where a_ij = 1 represents that the sensing ranges of sensor node i and sensor node j overlap (i.e., the sensing ranges intersect), and a_ij = 0 indicates that the sensing ranges of sensor node i and sensor node j do not intersect.
The original scene semantics can be obtained from the adjacency matrix A and the sensor semantic features S, and can be represented as v, where v = A × S. It can be seen that the original scene semantics may be a graph vector; the dimension of v may be N × M, where M may be the size of the largest semantic library among the semantic libraries corresponding to the N sensors. The ith row of v may be denoted as v_i. The original scene semantics may include N scene semantic vectors that correspond one-to-one with the N sensors; in particular, a row of the original scene semantics may be a scene semantic vector with dimension 1 × M.
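The product v = A × S can be sketched directly with matrix multiplication; the small matrices below are illustrative assumptions (N = 3 sensors, M = 4):

```python
import numpy as np

# Compute the original scene semantics v = A x S, where A (N x N) is the
# adjacency matrix and S (N x M) stacks the semantic vectors code_1..code_N.
# All values below are illustrative assumptions.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])          # 3 sensors, every pair's ranges intersect
S = np.array([[1, 0, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 0]])       # 3 semantic vectors, M = 4

v = A @ S                          # original scene semantics, shape N x M
print(v)
```

Each row v_i thus aggregates the semantic vectors of sensor i's neighbors, which is why v carries the association information of the topology.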
Because the sensor semantic features S are obtained through one-hot encoding, and S has the characteristics of high dimension and sparseness, the original scene semantics v also have the characteristics of high dimension and sparseness. In order to reduce the dimensionality of the original scene semantics v and improve processing efficiency, the original scene semantics v may be encoded by an encoder. The encoder may be a Graph Neural Network (GNN), which may be referred to as ScSem2Vec.
The electronic device may compress the original scene semantics v from a high-dimensional sparse space to a low-dimensional dense space through ScSem2Vec. The number of layers of the encoder may be 3, 5, or another number of layers, which is not limited herein in the embodiments of the present invention. In the encoder, each layer may aggregate the information of the neighbor nodes of each sensor node according to the correlation between sensor nodes. In the kth layer network of the encoder, the following formula (1) can be adopted to calculate the correlation coefficient e_ij^(k) between node i and node j:

e_ij^(k) = Sigmoid(σ(W^(k) · [h_i^(k-1) ‖ h_j^(k-1)]))    (1)

where W^(k) are trainable parameters, ‖ denotes vector concatenation, Sigmoid is the sigmoid function, and σ may be the activation function ReLU or another activation function, which is not limited herein in the embodiment of the present invention. When k = 1, h_i^(0) and h_j^(0) may be the data of the ith and jth rows in the original scene semantics.
The electronic device may normalize the correlation coefficients e_ij^(k); the normalization can be expressed as the following formula (2):

α_ij^(k) = exp(e_ij^(k)) / Σ_{l∈N_i} exp(e_il^(k))    (2)

where α_ij^(k) may be the normalization factor between node i and node j, and N_i is the set of neighbor nodes of node (i.e., sensor node) i. The node representation at the kth layer can then be written as h_i^(k), specifically as shown in the following formula (3):

h_i^(k) = σ(Σ_{j∈N_i} α_ij^(k) · W^(k) · h_j^(k-1))    (3)
The output obtained by the electronic device through the last layer of network of the encoder is an embedded representation of the original scene semantics, i.e. the scene semantics.
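The layer-wise aggregation just described can be sketched as follows, assuming a GAT-style parameterization (a gated correlation coefficient per neighbor pair, softmax normalization over the neighborhood, then weighted aggregation). This is a hedged sketch, not the embodiment's exact parameterization: the weights are random placeholders and the linear transform inside the aggregation step is omitted for brevity.

```python
import numpy as np

# Hedged sketch of one encoder layer in the spirit of formulas (1)-(3).
# Weights are random placeholders; the real trainable form may differ.
rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encoder_layer(H, A, W):
    """H: node features (N x d); A: adjacency (N x N); W: weights (2*d,)."""
    N = H.shape[0]
    H_out = np.zeros_like(H, dtype=float)
    for i in range(N):
        neighbors = np.flatnonzero(A[i])
        # correlation coefficient e_ij from concatenated node features
        e = np.array([sigmoid(relu(np.concatenate([H[i], H[j]])) @ W)
                      for j in neighbors])
        # softmax normalization over the neighborhood of node i
        alpha = np.exp(e) / np.exp(e).sum()
        # weighted aggregation of neighbor features (linear map omitted)
        H_out[i] = relu((alpha[:, None] * H[neighbors]).sum(axis=0))
    return H_out

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
H = rng.normal(size=(3, 4))       # e.g., rows of the original scene semantics
W = rng.normal(size=(8,))
H1 = encoder_layer(H, A, W)
print(H1.shape)  # (3, 4)
```

Stacking several such layers and taking the last output would yield the embedded representation described above.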
It should be understood that, in order that the embedded scene semantics obtained after compression by the encoder cluster together for original scene semantics belonging to the same scene, the encoder may be trained first. In the encoder training process, a large amount of truth data can be used for training, cross entropy can be used as the loss function, and scene semantic classification can be performed on the output of the last layer of the encoder through a classifier.
Specifically, the electronic device may obtain N pieces of truth value data and a scene tag corresponding to the truth value data; according to the N truth value data, the electronic equipment can obtain original scene semantics; then, the electronic device can input the original scene semantic into an initial encoder to obtain the scene semantic; then, the electronic device can input the scene semantics into a classifier to obtain a predicted scene label; the electronic device may then determine a loss based on the predicted scene tag and the scene tag; finally, the electronic device may optimize parameters of the initial encoder according to the loss to obtain the encoder. The N true value data correspond to the N sensors one by one.
For example, for an automatic driving scenario, eight scene semantic categories may be included, which may be daytime straight ahead, daytime parking, daytime turning, daytime overtaking, nighttime straight ahead, nighttime parking, nighttime turning, and nighttime overtaking, respectively. For each category, a large amount of truth data needs to be collected, each truth data may correspond to a scene tag (for example, data collected in a daytime straight scene may be labeled), and an encoder may be trained through the truth data and the corresponding scene tag.
It can be understood that, in the iterative training process of the model (i.e., the initial encoder), when the total loss value is smaller than the preset model error, the electronic device may stop the training, and obtain a trained encoder. Optionally, the electronic device may set a preset iteration number of the initial encoder, record a training iteration number of the initial encoder, stop training the initial encoder when the training iteration number is equal to the preset iteration number, and determine a model with the training iteration number equal to the preset iteration number as the finally trained encoder. Meanwhile, in the model training process, the total loss values of a plurality of groups of different data can be obtained first, and then the total loss values of the plurality of groups of data are averaged to obtain an average total loss value. The model parameters may then be optimized based on the average total loss value.
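The stopping logic described above (stop when the average total loss over several groups of data falls below the preset model error, or when a preset iteration count is reached) can be sketched as follows. The loss values here are a synthetic placeholder, not real encoder training:

```python
# Hedged sketch of the iterative-training stopping criterion described
# above. Each element of loss_batches_per_iter holds the total losses of
# several groups of data for one iteration; we average them and compare
# against a preset model error. Values are illustrative assumptions.
def train(loss_batches_per_iter, preset_error=0.1, preset_iters=100):
    for iteration, batch_losses in enumerate(loss_batches_per_iter, start=1):
        avg_total_loss = sum(batch_losses) / len(batch_losses)
        if avg_total_loss < preset_error:
            return iteration, avg_total_loss   # average loss below preset error
        if iteration == preset_iters:
            return iteration, avg_total_loss   # preset iteration count reached
    return iteration, avg_total_loss

# Synthetic decaying losses: 4 groups per iteration, loss 1/(t+1).
history = [[1.0 / (t + 1)] * 4 for t in range(50)]
stop_iter, final_loss = train(history)
print(stop_iter)
```

With these synthetic losses, the average first drops below 0.1 at iteration 11 (loss 1/11), so training stops there rather than at the iteration cap.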
It should be noted that, through the trained encoder, for a large amount of truth data under different scenes, the scene semantics under each scene may form a scene semantic cluster. The definition of scene semantic clusters may be: given a set of clusters C = {C_1, …, C_k}, the set of scene semantics {v_1, v_2, …, v_N} of N samples forms the k corresponding scene semantic clusters; that is, each scene semantic v_i belongs to exactly one cluster in C. A scene semantic cluster may have the property shown in the following formula (4):

dist(v_i, v_j) < dist(v_i, v_l),  for any v_i, v_j ∈ C_m and any v_l ∉ C_m    (4)

where dist(v_i, v_j) is the semantic distance between the scene semantic samples v_i and v_j. The semantic distance between two scene semantic samples can be measured by cosine similarity, calculated as shown in the following formula (5); a scene semantic sample can be a scene semantic determined from the truth data of the sensors.

cos θ_{i,j} = (v_i · v_j) / (‖v_i‖ · ‖v_j‖)    (5)

where θ_{i,j} is the included angle between the scene semantic samples v_i and v_j, ‖v_i‖ is the modulo length of v_i, ‖v_j‖ is the modulo length of v_j, and v_i · v_j is the dot product between v_i and v_j.
To facilitate the understanding of anomaly detection on scene semantics, the concept of the scene semantic centroid is described below. The scene semantic centroid of a scene may be the center vector of the scene semantic cluster of the scene, and the scene semantic centroid can be calculated as shown in the following formula (6):

centroid_i = (1/Y) · Σ_{j=1}^{Y} v_j    (6)

where class_i represents a scene class, centroid_i is the scene semantic centroid corresponding to class_i, Y is the number of scene semantic samples used for calculating the scene semantic centroid corresponding to the scene, and v_j is the jth scene semantic sample.
It should be appreciated that the larger Y (i.e., the more truth data corresponding to a scene), the more accurate the calculated semantic centroid of the scene may be. Meanwhile, when the Y value is larger than a certain fixed value, the scene semantic centroid converges. The scene semantic centroid is an expected value of all scene semantic samples in the scene semantic cluster to which the scene semantic centroid belongs, and can be standard scene semantics corresponding to the scene. Also, it should be understood that each component of the scene semantic centroid may correspond to the semantic expectation of each single sensor in each truth data sample for the scene, as shown in equation (7) below.
centroid_i = (1/Y) · Σ_{j=1}^{Y} (v_j^1, v_j^2, …, v_j^N)^T    (7)

where (v_j^1, v_j^2, …, v_j^N)^T represents, under the scene class_i, the per-sensor semantics of the jth scene semantic sample, and T denotes the transpose of the matrix. The jth scene semantic sample v_j can be written as shown in the following formula (8):

v_j = (v_j^1, v_j^2, …, v_j^N)^T    (8)

where N is the number of sensor nodes, and v_j^k is the semantic corresponding to the kth sensor in the jth scene semantic sample.
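The centroid computation described above (the mean of the Y scene semantic samples of a scene) can be sketched in a few lines; the sample values are illustrative assumptions:

```python
import numpy as np

# Sketch of the scene semantic centroid: the mean of the Y scene semantic
# samples belonging to one scene. Sample values are illustrative.
samples = np.array([[1.0, 0.0, 2.0],
                    [3.0, 0.0, 0.0],
                    [2.0, 0.0, 1.0]])   # Y = 3 scene semantic samples

centroid = samples.mean(axis=0)          # center vector of the cluster
print(centroid)                          # [2. 0. 1.]
```

As noted above, the larger Y is, the closer this empirical mean comes to the expectation of the cluster, i.e., the standard scene semantics of the scene.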
It should be noted that the distribution plane of a scene semantic cluster may be fixed. When the acquired sensor data does not include abnormal data, the scene semantics obtained from the sensor data should be located within the distribution plane of the scene semantic cluster; when the acquired sensor data includes abnormal data, the scene semantics obtained from the sensor data should be located outside the distribution plane of the scene semantic cluster. Therefore, the distribution plane of the scene semantic cluster can be used as a standard for anomaly detection, and anomaly detection for sensors can be realized by out-of-distribution (OOD) detection. Referring to fig. 3, fig. 3 is a schematic diagram of scene semantic clusters according to an embodiment of the disclosure. As shown in fig. 3, assume that 4 scenes are included, corresponding to 4 different clusters. Except for the abnormal data, all data in the figure are normal data (i.e., truth data). As can be seen from the three-dimensional visualization, the normal data are distributed relatively densely in the three-dimensional space, forming a cluster around the scene semantic centroid of each scene (i.e., a scene semantic cluster), while the abnormal data are obviously isolated and at a longer distance from each scene semantic centroid. When the three-dimensional data are compressed to a two-dimensional plane for visualization (i.e., the right two-dimensional visualization image in fig. 3), it can be seen that each cluster surrounds its scene semantic centroid and is substantially distributed in a circle, and the abnormal data are far from the scene semantic centroids.
The electronic device may determine an anomalous sensor among the N sensors based on the scene semantics. Specifically, the electronic device may calculate the similarity between the scene semantics and M scene semantic centroids to obtain M similarities. The electronic device may then compare the M similarities with a similarity threshold: when at least one of the M similarities is greater than or equal to the similarity threshold, it may be indicated that the N acquired data are normal data, and it may be considered that there is no abnormal sensor among the N sensors; when the M similarities are all smaller than the similarity threshold, it may be indicated that abnormal data exist in the acquired N data, and it may be considered that an abnormal sensor exists among the N sensors, whereupon the electronic device may determine the abnormal sensor among the N sensors according to the scene semantics. The M scene semantic centroids correspond one-to-one to M scenes, a scene semantic centroid is the standard scene semantics corresponding to a scene, and M is a positive integer greater than or equal to 1.
The electronic device can calculate the cosine similarity between the scene semantics and the M scene semantic centroids, using cosine similarity as the way of measuring similarity. The cosine similarity cos θ_i between the scene semantics v and the scene semantic centroid centroid_i can be calculated as shown in the following formula (9):

cos θ_i = (v · centroid_i) / (‖v‖ · ‖centroid_i‖)    (9)

where θ_i is the included angle between the scene semantics v and the scene semantic centroid centroid_i, ‖centroid_i‖ is the modulo length of centroid_i, and ‖v‖ is the modulo length of the scene semantics v.
It should be understood that the cosine similarity threshold may be 0.5, may also be 0.7, may also be other values, and may be selected according to actual situations, and the embodiment of the present invention is not limited herein.
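The decision just described — compute the cosine similarity between the scene semantics v and each of the M centroids, and treat v as abnormal only when all M similarities fall below the threshold — can be sketched as follows. The vectors and the 0.5 threshold are illustrative assumptions:

```python
import numpy as np

# Sketch of the OOD decision: v is abnormal scene semantics only if its
# cosine similarity to every scene semantic centroid is below the
# threshold. Vectors and the threshold are illustrative assumptions.
def cosine(u, w):
    return float(u @ w / (np.linalg.norm(u) * np.linalg.norm(w)))

def is_abnormal(v, centroids, threshold=0.5):
    sims = [cosine(v, c) for c in centroids]
    return all(s < threshold for s in sims), sims

centroids = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # M = 2 scenes
normal_v = np.array([0.9, 0.1])      # close to the first centroid
abnormal_v = np.array([-1.0, -1.0])  # far from both centroids

print(is_abnormal(normal_v, centroids)[0])    # False
print(is_abnormal(abnormal_v, centroids)[0])  # True
```

A single similarity above the threshold suffices to classify the data as normal, which matches the decision rule described above.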
After the electronic device determines that the scene semantics are abnormal scene semantics, the abnormal sensors among the N sensors may be determined according to the scene semantics. Specifically, the electronic device may determine the scene semantic centroid with the highest similarity to the scene semantics, obtaining a first scene semantic centroid; then, the electronic device may obtain the N standard semantic vectors corresponding to the first scene semantic centroid; then, the electronic device can calculate the semantic distances between the N scene semantic vectors and the N standard semantic vectors to obtain N semantic distances; then, the electronic device may determine the largest k semantic distances among the N semantic distances and may determine the sensors corresponding to these k semantic distances as anomalous sensors. Here, k is a positive integer greater than or equal to 1; the N standard semantic vectors correspond one-to-one to the N sensors, the N semantic distances correspond one-to-one to the N sensors, and the N scene semantic vectors correspond one-to-one to the N sensors and are the scene semantic vectors of the original scene semantics. The first scene semantic centroid may be the scene semantic centroid corresponding to a first scene (e.g., daytime straight ahead, daytime overtaking, etc.). The steps of the electronic device identifying the abnormal sensor will be described in detail below.
Specifically, the higher the similarity between the scene semantics and a certain scene semantic centroid, the more likely it is that the abnormal scene semantics belong to that scene. Therefore, after the electronic device determines that the scene semantics are abnormal scene semantics, the scene semantic centroid with the highest similarity (for example, the highest cosine similarity) to the scene semantics can be determined, obtaining the first scene semantic centroid. The scene semantics can be a vector obtained by compressing the original scene semantics through the encoder; accordingly, the scene semantic centroid can also be regarded as a vector obtained by compressing the standard original scene semantics through the encoder. The standard original scene semantics may include N standard semantic vectors, and the N standard semantic vectors correspond one-to-one to the N sensors.
After the electronic device determines the scene semantic centroid with the highest similarity, it may acquire the standard semantic vectors corresponding to that scene semantic centroid. Specifically, the electronic device may obtain the standard semantic vectors corresponding to the scene semantic centroid through a decoder. Because the electronic device compresses the original scene semantics through the encoder to obtain the embedded scene semantics, the electronic device may also restore the embedded scene semantics to their original structure (i.e., the original scene semantics) through a decoder corresponding to the encoder. This decoder may be referred to as Vec2ScSem.
It should be appreciated that, in order for the decoder to be able to restore the embedded scene semantics to the original scene semantics, the decoder may be trained on truth data. During decoder training, the electronic device can freeze the parameters of the encoder. The decoder needs to reconstruct the original input of the scene semantic centroid (i.e., the original scene semantics), and can therefore use an inverse encoding process corresponding to the encoder. In the k-th layer network of the decoder, the correlation coefficient between node i and node j can be calculated by the following formula (10):
e_ij^(k) = Sigmoid( (a^(k))^T [ W^(k) h_i^(k) ‖ W^(k) h_j^(k) ] )    (10)

where W^(k) and a^(k) are trainable parameters of the k-th layer network of the decoder, and ‖ denotes vector concatenation. Sigmoid is the Sigmoid function. σ (used in formula (12)) may be the activation function ReLU, or may be another activation function; the embodiment of the present invention is not limited herein. When k = 1, h_i^(1) and h_j^(1) may be the data of the i-th and j-th rows of the scene semantics.
The electronic device may normalize the correlation coefficients e_ij^(k); the normalization can be expressed as the following formula (11).
α_ij^(k) = exp( e_ij^(k) ) / Σ_{u ∈ N_i} exp( e_iu^(k) )    (11)

where α_ij^(k) is the normalized coefficient between node i and node j, and N_i is the set of neighbor nodes of node i. The node representation h_i^(k-1) of the (k-1)-th layer reconstructed by the k-th layer network of the decoder can be expressed as shown in the following formula (12).
h_i^(k-1) = σ( Σ_{j ∈ N_i} α_ij^(k) W^(k) h_j^(k) )    (12)

where σ may be the ReLU activation function.
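One decoder layer in the spirit of formulas (10)-(12) can be sketched as follows. This is a NumPy sketch under assumed shapes; the parameter names `W`, `a`, the neighbor dictionary, and the toy values are illustrative assumptions, not the embodiment's actual network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decoder_layer(H, W, a, neighbors):
    """One graph-attention-style decoder layer.

    H         : (N, d_in) node features h_i^(k) for the N sensor nodes
    W         : (d_in, d_out) trainable weight matrix of this layer
    a         : (2 * d_out,) trainable attention vector
    neighbors : dict mapping node i to an iterable of its neighbor indices
    Returns an (N, d_out) array of reconstructed features h_i^(k-1).
    """
    HW = H @ W                         # W^(k) h^(k) for every node
    N = H.shape[0]
    out = np.zeros((N, W.shape[1]))
    for i in range(N):
        nbrs = list(neighbors[i])
        # formula (10): correlation coefficient between node i and each neighbor j
        e = np.array([sigmoid(a @ np.concatenate([HW[i], HW[j]])) for j in nbrs])
        # formula (11): softmax normalization over the neighborhood of node i
        alpha = np.exp(e) / np.exp(e).sum()
        # formula (12): weighted aggregation of neighbor features, then ReLU
        agg = sum(alpha[t] * HW[j] for t, j in enumerate(nbrs))
        out[i] = np.maximum(0.0, agg)
    return out

# Tiny deterministic example: 3 sensor nodes, identity weights, zero attention
H = np.ones((3, 2))
W_k = np.eye(2)
a_k = np.zeros(4)
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
H_prev = decoder_layer(H, W_k, a_k, neighbors)
```

With zero attention parameters every neighbor receives equal weight, so the layer reduces to a ReLU-activated neighborhood average.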
The output obtained by the electronic device through the last layer of network of the decoder is the original scene semantic corresponding to the scene semantic, and the original scene semantic may include N standard semantic components (i.e., N standard semantic vectors).
It should be understood that in order to obtain the correct original scene semantics after the scene semantics are decoded by the decoder, the reconstruction error can be minimized as an optimization target in the decoder training process, as shown in the following formula (13).
L_rec = Σ_i ‖ v_i − v̂_i ‖₂²    (13)

where v_i may be the original scene semantics obtained by the decoder, v̂_i may be the original scene semantics input to the encoder (i.e., the original scene semantics derived from the sensor truth data), and ‖·‖₂ is the L2 norm (modulus).
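The reconstruction-error objective of formula (13) can be sketched as a small helper (the arrays below are illustrative placeholders for decoder output and encoder input, not real sensor semantics):

```python
import numpy as np

def reconstruction_loss(V, V_hat):
    """Formula (13): sum over nodes of the squared L2 norm of the
    difference between the decoder output V (reconstructed original
    scene semantics) and the encoder input V_hat (truth-data original
    scene semantics)."""
    return float(np.sum(np.linalg.norm(V - V_hat, axis=1) ** 2))

V = np.array([[1.0, 2.0], [3.0, 4.0]])      # decoder output v_i
V_hat = np.array([[1.0, 2.0], [3.0, 3.0]])  # original scene semantics fed to the encoder
loss = reconstruction_loss(V, V_hat)
```

During training this loss would be minimized with respect to the decoder parameters while the encoder parameters stay frozen, as the text describes.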
After the decoder is trained through a large amount of truth data, a trained decoder can be obtained, and the electronic equipment can decode the scene semantic centroids through the decoder to obtain N standard semantic vectors corresponding to each scene semantic centroid. It should be noted that the number of network layers of the encoder and the decoder may be the same.
The electronic device can compare the scene semantics with the first scene semantic centroid at the level of sensor semantics. Specifically, assume that the N standard semantic components obtained by decoding the first scene semantic centroid through the decoder are respectively recorded as ŝ_1, ŝ_2, …, ŝ_N, and the N scene semantic components of the original scene semantics corresponding to the scene semantics v_a (i.e., the scene semantic component of each sensor) are recorded as s_1, s_2, …, s_N. The electronic device may calculate the semantic distance of each sensor based on the N scene semantic components of the original scene semantics and the N standard semantic components, as shown in the following formula (14).

d_i = √( Σ_{m=1}^{l} ( s_{i,m} − ŝ_{i,m} )² )    (14)

where d_i is the distance between the scene semantic vector s_i of the i-th sensor and the standard semantic vector ŝ_i corresponding to that sensor in the first scene, l is the dimension of the single-sensor semantic code, s_{i,m} is the value of the m-th dimension of s_i, and ŝ_{i,m} is the value of the m-th dimension of ŝ_i.
The electronic device calculates semantic distances between the N scene semantic components of the original scene semantic and the N standard semantic components, and N semantic distances can be obtained. The electronic device may then compare the magnitudes of the N semantic distances, and may determine the farthest k semantic distances (i.e., the k largest semantic distances). Thereafter, the electronic device may determine k sensors corresponding to the k most distant semantic distances and may determine the k sensors as anomalous sensors or intruded sensors.
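The per-sensor distance of formula (14) plus the top-k selection can be sketched as follows (a minimal NumPy sketch; the sample vectors are hypothetical):

```python
import numpy as np

def anomalous_sensors(S, S_std, k=1):
    """Trace abnormal sensors by semantic distance.

    S     : (N, l) scene semantic vectors of the N sensors
    S_std : (N, l) standard semantic vectors decoded from the first
            scene semantic centroid
    Returns (per-sensor distances, indices of the k sensors with the
    largest semantic distance).
    """
    d = np.sqrt(np.sum((S - S_std) ** 2, axis=1))  # formula (14) per sensor
    top_k = np.argsort(d)[::-1][:k]                # k largest semantic distances
    return d, top_k

# Three sensors; the third deviates strongly from its standard vector
S = np.array([[0.1, 0.0], [0.0, 0.2], [3.0, 4.0]])
S_std = np.zeros((3, 2))
d, top_k = anomalous_sensors(S, S_std, k=1)
```

With k = 1 (the default suggested later in the text), only the single most deviant sensor is flagged.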
Referring to fig. 5, fig. 5 is a schematic diagram of semantic distances according to an embodiment of the disclosure. The semantic distances in fig. 5 are normalized semantic distances. In fig. 5, L may represent a laser radar; C1, C2, C3, C4, C5, C6 may represent a front camera, a left front camera, a right front camera, a rear camera, a left rear camera, and a right rear camera, respectively; R1, R2, R3, R4, R5 may represent a front radar, a left front radar, a right front radar, a left rear radar, and a right rear radar, respectively. As shown in fig. 5, the semantic distances of L and C6 are significantly larger than the distances between the normal sensors and their standard semantic vectors, so L and C6 may be abnormal sensors or sensors that have suffered an intrusion.
It should be understood that the scene semantic centroids and the corresponding semantic components can be obtained by combining ScSem2Vec (the encoder) and Vec2ScSem (the decoder); the two model structures can be collectively referred to as a scene semantic centroid extractor, and their specific network architecture can be as shown in fig. 4. As shown in fig. 4, the overall network architecture may include four parts. In the first part, the electronic device may input the original scene semantics into the encoder to obtain embedded scene semantics (i.e., the scene semantics), and the scene semantics under different scenes may form scene semantic clusters through a large amount of truth data. In the encoder training process, the embedded scene semantics can be input into a classifier to obtain a predicted scene label; then, a loss can be calculated based on the predicted scene label and the real scene label, and the trained encoder is obtained by optimizing the encoder with this loss. The original scene semantics are derived by the electronic device based on sensor data. In the second part, using the trained encoder, the scene semantics obtained from a large amount of truth data form scene semantic clusters, and the scene semantic centroid corresponding to each scene can be calculated based on the scene semantic clusters. In the third part, a decoder can be trained; the parameters of the encoder can be frozen during training. Embedded scene semantics can be obtained from truth data and input into the decoder to obtain predicted original scene semantics; the electronic device can then compute a reconstruction error (i.e., reconstruction loss) based on the original scene semantics and the predicted original scene semantics, and the decoder can be optimized with this error to obtain the trained decoder.
In the fourth part, through the trained decoder, the electronic device can input the scene semantic centroid into the decoder to obtain the standard original scene semantics, wherein the standard original scene semantics comprises N standard semantic vectors.
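The centroid computation of the second part can be sketched as follows. Taking the mean of each cluster is an assumption for illustration; the embodiment only states that a centroid is calculated per scene semantic cluster, and the embedding values and scene labels below are hypothetical:

```python
import numpy as np

def scene_semantic_centroids(embeddings, labels):
    """Compute one scene semantic centroid per scene as the mean of the
    embedded scene semantics belonging to that scene's cluster."""
    centroids = {}
    for lab in sorted(set(labels)):
        members = [e for e, l in zip(embeddings, labels) if l == lab]
        centroids[lab] = np.mean(members, axis=0)
    return centroids

# Hypothetical embedded scene semantics and their scene labels
embeddings = [np.array([0.0, 0.0]), np.array([2.0, 2.0]), np.array([10.0, 10.0])]
labels = ["daytime_straight", "daytime_straight", "nighttime_straight"]
c = scene_semantic_centroids(embeddings, labels)
```

Each resulting centroid would then be decoded by Vec2ScSem in the fourth part to recover its N standard semantic vectors.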
It should also be understood that, in the embodiment of the present invention, the scene semantic centroid and the standard semantic component may be obtained in advance and may be used directly, so that the anomaly detection efficiency may be greatly improved by using the anomaly detection method disclosed in the embodiment of the present invention. Specifically, the electronic device may obtain the trained encoder and decoder in advance through pre-training, and may obtain the distribution plane of the normal scene semantic samples, and the scene semantic centroid.
In order to verify the effectiveness of the anomaly detection method provided by the invention, in the embodiment of the invention the method is verified on the public autonomous driving data set nuScenes. The nuScenes data set comprises multiple types of sensors, provides the data collected by these sensors during autonomous driving, and has already been timestamp-aligned. Meanwhile, the nuScenes data set includes a variety of scenes.
At verification time, data from eight different scenes were chosen to train the encoder and decoder. The eight scenes can be daytime straight-ahead driving, daytime parking, daytime turning, daytime overtaking, nighttime straight-ahead driving, nighttime parking, nighttime turning, and nighttime overtaking. Referring to fig. 6, fig. 6 is a schematic diagram of scene semantic clusters according to an embodiment of the present invention. Fig. 6 is a visual presentation of the scene semantic clusters corresponding to the eight scenes on a two-dimensional plane; as shown in fig. 6, the eight scenes form eight scene semantic clusters through the trained encoder, and the scene semantic centroid of each scene semantic cluster can be obtained.
For example, for a sample in the daytime straight-ahead scene, the data of one or more sensors may be replaced with data from the nighttime straight-ahead scene, so that in the constructed abnormal sample the data of some sensors comes from the daytime straight-ahead scene while the data of the other sensors comes from the nighttime straight-ahead scene. In addition, data of different sensors may be cross-mixed to construct abnormal samples; for example, the data of five sensors, i.e., the front camera, the right front camera, the front radar, the left front radar, and the laser radar, may be cross-mixed. Referring to fig. 6 again, the abnormal scene semantics in fig. 6 are the scene semantics obtained from the abnormal samples. As can be seen from fig. 6, although a constructed abnormal sample seemingly exhibits no anomaly (i.e., each data stream conforms to its normal distribution), the scene semantics corresponding to the abnormal sample fall outside every scene semantic cluster.
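The abnormal-sample construction described above can be sketched as a simple sensor-channel swap (the sensor names and payload strings are illustrative placeholders, not the actual nuScenes record format):

```python
import copy

def make_abnormal_sample(base_sample, other_sample, swap_sensors):
    """Construct an abnormal sample by replacing the data of selected
    sensors in one scene's sample with the corresponding data from
    another scene's sample."""
    abnormal = copy.deepcopy(base_sample)   # keep the original intact
    for sensor in swap_sensors:
        abnormal[sensor] = other_sample[sensor]
    return abnormal

# Hypothetical daytime / nighttime straight-ahead samples
day = {"front_camera": "day_img", "front_radar": "day_pts", "lidar": "day_cloud"}
night = {"front_camera": "night_img", "front_radar": "night_pts", "lidar": "night_cloud"}
sample = make_abnormal_sample(day, night, ["front_camera", "lidar"])
```

Each per-sensor stream in `sample` is individually normal, which is exactly why per-stream detectors struggle with such mixtures.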
In the present example, the experimental effect was evaluated by the Detection Rate (DR) and the False Positive Rate (FPR). The detection rate represents the percentage of real abnormal data detected to the total number of abnormalities, and the false positive rate represents the percentage of normal data misclassified as abnormal data. The calculation formulas of the detection rate and the false positive rate are shown in the following formulas (15) and (16).
DR = TP / (TP + FN)    (15)

FPR = FP / (FP + TN)    (16)
where TP (true positive) represents a true positive, i.e., a correctly classified (detected) positive sample; TN (true negative) represents a true negative, i.e., a correctly classified negative sample; FP (false positive) represents a false positive, i.e., a negative sample misclassified as positive; and FN (false negative) represents a false negative, i.e., a positive sample misclassified as negative.
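Formulas (15) and (16) translate directly into code (the counts below are hypothetical, chosen only to exercise the formulas):

```python
def detection_rate(tp, fn):
    """Formula (15): DR = TP / (TP + FN) -- fraction of real anomalies detected."""
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    """Formula (16): FPR = FP / (FP + TN) -- fraction of normal data misclassified."""
    return fp / (fp + tn)

# Hypothetical confusion-matrix counts
dr = detection_rate(tp=97, fn=3)
fpr = false_positive_rate(fp=2, tn=98)
```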
In the embodiment of the invention, the proposed anomaly detection method is compared with a conventional anomaly detection method. For a sensor network, the conventional approach is to analyze the data stream of each single sensor; if any sensor detects an abnormal value, the intrusion detection system reports that an anomaly has occurred. With the anomaly detection method disclosed in the embodiment of the invention, the intrusion detection system reports an anomaly when abnormal scene semantics are detected. An intrusion detection system based on the conventional anomaly detection method may be referred to as an SSF-IDS (Single Sensor Flow Intrusion Detection System).
Referring to FIG. 7, FIG. 7 is a diagram illustrating a comparison of SSC-IDS and SSF-IDS according to an embodiment of the present invention. DR in fig. 7 is the detection rate of SSC-IDS and SSF-IDS for detecting sensor network anomalies (i.e., detecting whether any sensor in the sensor network is abnormal, without identifying which specific sensor is abnormal). As shown in FIG. 7, the detection rate of SSF-IDS is about 81.1%, while the detection rate of SSC-IDS, the intrusion detection system based on scene semantic centroids proposed by the embodiment of the invention, is about 97.6%. Therefore, the detection effect of the SSC-IDS is better than that of the SSF-IDS; false alarms of the sensor system can be avoided, and the quality of the intrusion detection system can be improved.
Further, the traceability of SSC-IDS and SSF-IDS to the abnormal sensor can be compared. The SSF-IDS judges whether a single sensor is abnormal based on the data of that single sensor, while the SSC-IDS detects which sensor among all sensors is abnormal based on the data of all sensors.
Referring to FIG. 8, FIG. 8 is a diagram illustrating a comparison of SSC-IDS and SSF-IDS according to another embodiment of the present invention. DR in FIG. 8 is the detection rate of SSC-IDS and SSF-IDS for detecting an anomalous sensor (i.e., identifying the specific abnormal sensor in the sensor network). For the single data streams of the front camera, the right front camera, the front radar, the left front radar, and the laser radar, SSF-IDS can be adopted to detect whether each sensor is abnormal, with an average detection rate of about 57.5%. The detection rate of SSC-IDS is about 95.7%.
It can be seen that SSC-IDS is more effective in detection than SSF-IDS. The SSC-IDS can determine abnormal sensors based on the standard semantic components of the scene semantic centroid, can effectively distinguish adversarial samples from normal data, can improve the accuracy of anomaly detection, and can improve the quality of the intrusion detection system.
It is assumed that the detection rate for an abnormal scenario (i.e., the detection rate of only detecting whether the sensor network is abnormal) is denoted DR_scene, and the detection rate for sensing intrusion (i.e., the detection rate of detecting which specific sensor in the sensor network is abnormal) is denoted DR_sensing. In the embodiment of the invention, the overall detection rate DR_total is used to evaluate the comprehensive performance of SSC-IDS and SSF-IDS; the overall detection rate is calculated as shown in the following formula (17).

DR_total = DR_scene × DR_sensing    (17)
Through a large number of test samples, the overall detection rate of SSF-IDS is about 46.3%, while the overall detection rate of SSC-IDS is about 93.4%. It can be seen that the SSC-IDS can effectively detect anomalies in the sensor network and can find the specific abnormal sensor.
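Formula (17) can be checked against the rates reported earlier in the text (scene-level detection rate ≈ 97.6% and sensor-level detection rate ≈ 95.7% for SSC-IDS):

```python
dr_scene = 0.976     # scene-level detection rate of SSC-IDS (from fig. 7)
dr_sensing = 0.957   # sensor-level detection rate of SSC-IDS (from fig. 8)
dr_total = dr_scene * dr_sensing  # formula (17)
# 0.976 * 0.957 ≈ 0.934, matching the reported overall detection rate of SSC-IDS
```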
It should be noted that the hyperparameter k set when determining an abnormal sensor through the SSC-IDS may be 1, in which case the sensor corresponding to the largest semantic distance is determined as the abnormal sensor.
Different values of k can yield different detection results. Referring to fig. 9, fig. 9 is a schematic diagram illustrating a comparison of different k values according to an embodiment of the present invention. As shown in fig. 9, as the k value increases (from 1 to 5), the detection rate of SSC-IDS improves only slightly, since more potentially intruded sensors can be flagged. At the same time, however, more normal sensors may be misjudged as abnormal, resulting in a significant increase in the false positive rate, which reduces detection efficiency.
It should be understood that, according to practical situations, it is found that there are fewer cases where multiple abnormal sensors exist simultaneously (e.g., multiple sensors are invaded simultaneously), and therefore, it may be preferable to take a smaller value of k, such as 1, 2, etc. The specific number of k values can be determined according to actual situations, and the embodiment of the present invention is not limited herein.
In the embodiment of the invention, the electronic device can first construct a sensor network topology graph and obtain an adjacency matrix comprising the connection relations among the sensors; then, the electronic device can fuse the semantics of the sensors to obtain scene semantics; then, the electronic device can perform anomaly detection on the scene semantics based on the scene semantic centroids; when the scene semantics are detected to be abnormal scene semantics, the electronic device may trace the abnormal sensors based on the abnormal scene semantics, that is, by calculating the semantic distances between the semantic components of the abnormal scene semantics and the standard semantic components, determine the sensors corresponding to the largest k semantic distances as the abnormal sensors. Therefore, the electronic device can carry out a consistency check based on the scene semantics described by the sensor network (i.e., a similarity comparison with each scene semantic centroid), can effectively detect whether an abnormal sensor exists in the sensor network or whether the sensor system has been intruded, can trace the abnormal sensor, and can give an alarm. Moreover, this scene-semantics-based detection method can detect adversarial-sample attacks.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an abnormality detection device according to an embodiment of the present invention. The abnormality detection device may be an electronic device or a module in the electronic device. As shown in fig. 10, the apparatus may include:
an obtaining unit 1001 configured to obtain N data, where the N data correspond to N sensors one to one, and N is a positive integer greater than 1;
a determining unit 1002, configured to determine an abnormal sensor of the N sensors according to the N data and the adjacency matrix between the N sensors.
In one embodiment, the determining unit 1002 determines an abnormal sensor of the N sensors according to the N data and the adjacency matrix between the N sensors includes:
obtaining scene semantics according to the N data and the adjacent matrixes among the N sensors, wherein the scene semantics comprise the semantics of the N sensors;
and determining abnormal sensors in the N sensors according to the scene semantics.
In one embodiment, the determining unit 1002 deriving scene semantics from the N data and the adjacency matrix between the N sensors includes:
obtaining N semantic vectors according to the N data, wherein the N semantic vectors correspond to the N sensors one by one;
based on the N semantic vectors and the adjacency matrix between the N sensors, scene semantics are obtained.
In one embodiment, the determining unit 1002 obtains N semantic vectors according to the N data, including:
obtaining N semantemes according to the N data, wherein the N semantemes correspond to the N sensors one by one;
and obtaining the N semantic vectors according to the N semantics and the semantic library.
In one embodiment, the determining unit 1002 obtains scene semantics based on the N semantic vectors and the adjacency matrix between the N sensors, including:
splicing the N semantic vectors to obtain sensor semantic features;
multiplying the adjacent matrix and the sensor semantic features to obtain original scene semantics;
and inputting the original scene semantic into an encoder to obtain the scene semantic, wherein the dimension of the scene semantic is smaller than that of the original scene semantic.
In one embodiment, the apparatus may further comprise:
the processing unit 1003 is configured to calculate similarities between the scene semantics and M scene semantic centroids to obtain M similarities, where the M scene semantic centroids correspond to M scenes one to one, the scene semantic centroid is a standard scene semantic corresponding to a scene, and M is a positive integer greater than or equal to 1;
the determining unit 1002 determining an abnormal sensor among the N sensors according to the scene semantics includes:
and under the condition that the M similarity is smaller than a similarity threshold value, determining abnormal sensors in the N sensors according to the scene semantics.
In one embodiment, the original scene semantics include N scene semantic vectors, the N scene semantic vectors corresponding one-to-one with the N sensors; the determining unit 1002 determining an abnormal sensor among the N sensors according to the scene semantics includes:
determining a scene semantic centroid with the highest scene semantic similarity to obtain a first scene semantic centroid;
acquiring N standard semantic vectors corresponding to the semantic centroid of the first scene, wherein the N standard semantic vectors correspond to the N sensors one by one;
calculating semantic distances between the N scene semantic vectors and the N standard semantic vectors to obtain N semantic distances, wherein the N semantic distances correspond to the N sensors one by one;
determining the largest k semantic distances in the N semantic distances, determining the sensors corresponding to the k semantic distances as abnormal sensors, wherein k is a positive integer greater than or equal to 1.
More detailed descriptions about the obtaining unit 1001, the determining unit 1002, and the processing unit 1003 can be directly obtained by referring to the related descriptions in the embodiment of the method shown in fig. 1, which are not repeated herein.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 11, the electronic device 1100 may include: at least one processor 1101, such as a CPU, at least one memory 1105, at least one communication bus 1102. Optionally, the electronic device 1100 may also include at least one network interface 1104, user interface 1103. Wherein a communication bus 1102 is used to enable connective communication between these components. The user interface 1103 may include a display screen (display) and a keyboard (keyboard), and the network interface 1104 may optionally include a standard wired interface and a wireless interface (e.g., WI-FI interface). The memory 1105 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). The memory 1105 may optionally also be at least one storage device located remotely from the aforementioned processor 1101. As shown in fig. 11, the memory 1105, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the electronic device 1100 shown in fig. 11, the network interface 1104 may provide a network communication function; the user interface 1103 is primarily used as an interface for providing input to a user.
In one embodiment, the processor 1101 may be configured to invoke a device control application stored in the memory 1105 and may implement:
acquiring N data, wherein the N data correspond to N sensors one by one, and N is a positive integer greater than 1;
and determining abnormal sensors in the N sensors according to the N data and the adjacency matrix among the N sensors.
It should be understood that the electronic device 1100 described in this embodiment of the present application may be used to execute the method performed by the electronic device in the embodiment of the method in fig. 1, and reference may be made to the relevant description, which is not described herein again.
Embodiments of the present invention also disclose a computer-readable storage medium having instructions stored thereon, which when executed perform the method in the above method embodiments.
The embodiment of the invention also discloses a computer program product comprising instructions, and the instructions are executed to execute the method in the embodiment of the method.
The above-mentioned embodiments, objects, technical solutions and advantages of the present application are further described in detail, it should be understood that the above-mentioned embodiments are only examples of the present application, and are not intended to limit the scope of the present application, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present application should be included in the scope of the present application.

Claims (10)

1. An abnormality detection method characterized by comprising:
acquiring N data, wherein the N data correspond to N sensors one by one, and N is a positive integer greater than 1;
determining an anomalous sensor of the N sensors from the N data and an adjacency matrix between the N sensors.
2. The method of claim 1, wherein said determining an anomalous sensor of the N sensors from the N data and an adjacency matrix between the N sensors comprises:
obtaining scene semantics according to the N data and an adjacent matrix among the N sensors, wherein the scene semantics comprise the semantics of the N sensors;
and determining abnormal sensors in the N sensors according to the scene semantics.
3. The method of claim 2, wherein deriving scene semantics from the N data and a adjacency matrix between the N sensors comprises:
obtaining N semantic vectors according to the N data, wherein the N semantic vectors correspond to the N sensors one by one;
and obtaining scene semantics based on the N semantic vectors and the adjacency matrix between the N sensors.
4. The method of claim 3, wherein the deriving N semantic vectors from the N data comprises:
obtaining N semantemes according to the N data, wherein the N semantemes correspond to the N sensors one by one;
and obtaining the N semantic vectors according to the N semantics and the semantic library.
5. The method of claim 3, wherein the deriving scene semantics based on the N semantic vectors and the adjacency matrix between the N sensors comprises:
splicing the N semantic vectors to obtain sensor semantic features;
multiplying the adjacent matrix and the sensor semantic features to obtain original scene semantics;
and inputting the original scene semantic into an encoder to obtain the scene semantic, wherein the dimension of the scene semantic is smaller than that of the original scene semantic.
6. The method of claim 5, further comprising:
calculating the similarity between the scene semantics and M scene semantics centroids to obtain M similarities, wherein the M scene semantics centroids correspond to the M scenes one by one, the scene semantics centroids are standard scene semantics corresponding to the scenes, and M is a positive integer greater than or equal to 1;
the determining, from the scene semantics, an anomalous sensor of the N sensors comprises:
and under the condition that the M similarity degrees are all smaller than a similarity threshold value, determining abnormal sensors in the N sensors according to the scene semantics.
7. The method of claim 6, wherein the original scene semantics comprise N scene semantic vectors, the N scene semantic vectors corresponding one-to-one to the N sensors; the determining, from the scene semantics, an anomalous sensor of the N sensors comprises:
determining a scene semantic centroid with the highest scene semantic similarity to obtain a first scene semantic centroid;
acquiring N standard semantic vectors corresponding to the semantic centroid of the first scene, wherein the N standard semantic vectors correspond to the N sensors one by one;
calculating semantic distances between the N scene semantic vectors and the N standard semantic vectors to obtain N semantic distances, wherein the N semantic distances correspond to the N sensors one by one;
determining the largest k semantic distances in the N semantic distances, determining the sensors corresponding to the k semantic distances as abnormal sensors, wherein k is a positive integer greater than or equal to 1.
8. An abnormality detection device characterized by comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring N data, the N data correspond to N sensors one by one, and N is a positive integer greater than 1;
a determining unit for determining abnormal sensors among the N sensors according to the N data and an adjacency matrix among the N sensors.
9. An electronic device, comprising: a memory and a processor; wherein:
the memory for storing a computer program, the computer program comprising program instructions;
the processor is configured to invoke the program instructions to cause the electronic device to perform the method of any of claims 1-7.
10. A computer-readable storage medium, in which a computer program or computer instructions are stored which, when executed, implement the method according to any one of claims 1 to 7.
CN202210467552.8A 2022-04-29 2022-04-29 Abnormality detection method, abnormality detection device, electronic apparatus, and computer-readable storage medium Pending CN114863394A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210467552.8A CN114863394A (en) 2022-04-29 2022-04-29 Abnormality detection method, abnormality detection device, electronic apparatus, and computer-readable storage medium


Publications (1)

Publication Number Publication Date
CN114863394A true CN114863394A (en) 2022-08-05

Family

ID=82634621


Country Status (1)

Country Link
CN (1) CN114863394A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116701910A (en) * 2023-06-06 2023-09-05 山东省计算中心(国家超级计算济南中心) Dual-feature selection-based countermeasure sample generation method and system
CN116701910B (en) * 2023-06-06 2024-01-05 山东省计算中心(国家超级计算济南中心) Dual-feature selection-based countermeasure sample generation method and system
CN116502171A (en) * 2023-06-29 2023-07-28 北京国旭网络科技有限公司 Network security information dynamic detection system based on big data analysis algorithm
CN116502171B (en) * 2023-06-29 2023-09-01 北京国旭网络科技有限公司 Network security information dynamic detection system based on big data analysis algorithm

Similar Documents

Publication Publication Date Title
Nayak et al. A comprehensive review on deep learning-based methods for video anomaly detection
Yuan Video-based smoke detection with histogram sequence of LBP and LBPV pyramids
Li et al. Spatio-temporal unity networking for video anomaly detection
CN109558823B (en) Vehicle identification method and system for searching images by images
CN114863394A (en) Abnormality detection method, abnormality detection device, electronic apparatus, and computer-readable storage medium
Lin et al. A real-time vehicle counting, speed estimation, and classification system based on virtual detection zone and YOLO
CN107818307B (en) Multi-label video event detection method based on LSTM network
Zhang et al. Surveillance video anomaly detection via non-local U-Net frame prediction
CN114863357A (en) Method and system for identifying and alarming external force damage of power transmission line
CN111612100A (en) Object re-recognition method and device, storage medium and computer equipment
Xiong et al. Contrastive learning for automotive mmWave radar detection points based instance segmentation
Dai Uncertainty-aware accurate insulator fault detection based on an improved YOLOX model
Sriram et al. Analytical review and study on object detection techniques in the image
Liu et al. Real-time anomaly detection on surveillance video with two-stream spatio-temporal generative model
Zhang et al. Spatiotemporal adaptive attention 3D multiobject tracking for autonomous driving
Cheng et al. Language-guided 3d object detection in point cloud for autonomous driving
Rahman et al. Predicting driver behaviour at intersections based on driver gaze and traffic light recognition
Oreski YOLO* C—Adding context improves YOLO performance
CN116994209A (en) Image data processing system and method based on artificial intelligence
Huang et al. EST-YOLOv5s: SAR Image Aircraft Target Detection Model Based on Improved YOLOv5s
Harsha et al. Distinctly trained multi-source CNN for multi-camera based vehicle tracking system
Salazar-Gomez et al. High-level camera-LiDAR fusion for 3D object detection with machine learning
Wang et al. Object detection in 3D point cloud based on ECA mechanism
Yatbaz et al. Run-time Monitoring of 3D Object Detection in Automated Driving Systems Using Early Layer Neural Activation Patterns
Xu et al. Find the centroid: A vision‐based approach for optimal object grasping

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination