WO2018105320A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program Download PDF

Info

Publication number
WO2018105320A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
learning dictionary
noise
training data
learning
Prior art date
Application number
PCT/JP2017/040727
Other languages
French (fr)
Japanese (ja)
Inventor
良太 高橋
崇光 佐々木
Original Assignee
Panasonic Intellectual Property Corporation of America
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP2017207085A (JP6782679B2)
Application filed by Panasonic Intellectual Property Corporation of America
Priority to CN201780022736.0A (CN109074519B)
Priority to EP17877549.0A (EP3553712B1)
Publication of WO2018105320A1
Priority to US16/255,877 (US10601852B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Description

  • This disclosure relates to an abnormality detection technology used in an in-vehicle network or the like.
  • Automobiles are equipped with a large number of electronic control units (Electronic Control Units, hereinafter referred to as ECUs) for controlling various systems.
  • The ECUs are connected to an in-vehicle network, and communication is performed through the in-vehicle network in order to realize various functions of the automobile.
  • One such in-vehicle network standard is CAN (Controller Area Network).
  • A network conforming to the CAN protocol can be constructed as a closed communication path on a single vehicle.
  • However, it is not uncommon for the in-vehicle network of each automobile to be built and installed as a network that can be accessed from the outside.
  • For example, an in-vehicle network may be provided with a port for extracting information flowing through the network for the purpose of diagnosing each in-vehicle system, or a car navigation system with a wireless LAN function may be connected to it. Allowing external access to the in-vehicle network can improve convenience for automobile users, but it also increases threats.
  • An attack frame is an abnormal frame that differs in some way from a normal frame that flows through an in-vehicle network that is not attacked.
  • A technique is disclosed in which an abnormal data detection process for a frame flowing on a CAN bus (hereinafter also referred to as a CAN message or simply a message) is executed using an evaluation model obtained as a result of learning using learning data (see Patent Document 1 and Patent Document 2).
  • This disclosure provides an information processing apparatus and the like that are useful for detecting anomalies due to attacks in an in-vehicle network of a vehicle such as an automobile.
  • An information processing apparatus according to one aspect of the present disclosure includes a processor, and the processor executes: a data element acquisition step of receiving input of N data elements (N is an integer of 2 or more), each of which is an M-dimensional vector (M is an integer of 2 or more), used as training data for Isolation Forest; a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and includes the first region, into L^M third regions (L is an integer of 4 or more) of equal size; a first noise addition step of acquiring the number S (S is an integer of 0 or more) of data elements included in each of the third regions and adding noise elements to each of the third regions that includes fewer data elements than a first threshold T (T is a natural number); a generation step of generating noise-added training data including the data elements and the noise elements; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  • An information processing method according to one aspect of the present disclosure is an information processing method executed using an information processing apparatus including a processor, and includes: a data element acquisition step in which the processor receives input of N data elements (N is an integer of 2 or more), each of which is an M-dimensional vector (M is an integer of 2 or more), used as training data for Isolation Forest; a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and includes the first region, into L^M third regions (L is an integer of 4 or more) of equal size; a noise addition step of acquiring the number S (S is an integer of 0 or more) of data elements included in each of the third regions and adding noise elements to each of the third regions that includes fewer data elements than a first threshold T (T is a natural number); a generation step of generating noise-added training data; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  • According to the present disclosure, it is possible to provide an information processing apparatus and the like that can quickly provide a learning dictionary which is used for detecting abnormalities due to attacks in an in-vehicle network of a vehicle such as an automobile and which has a reduced false detection rate.
  • FIG. 1A is a block diagram illustrating a configuration example of an abnormality detection system including an information processing device according to Embodiment 1.
  • FIG. 1B is a block diagram illustrating a configuration example of an abnormality detection system including the information processing apparatus according to Embodiment 1.
  • FIG. 1C is a block diagram illustrating a configuration example of an abnormality detection system including the information processing apparatus according to Embodiment 1.
  • FIG. 2 is a block diagram illustrating a configuration example of an abnormality determination unit and a learning unit that configure the above-described abnormality detection system.
  • FIG. 3 is a schematic diagram for explaining a learning dictionary generated by the learning unit using training data.
  • FIG. 4 is a schematic diagram for explaining the abnormality determination by the abnormality determination unit.
  • FIG. 5 is a diagram showing a data flow in the learning unit that generates the learning dictionary.
  • FIG. 6 is a diagram illustrating a data flow in the abnormality determination unit that performs abnormality determination.
  • FIG. 7 is a diagram illustrating an example of an inappropriate determination boundary that does not fit the distribution of training data.
  • FIG. 8 is a flowchart illustrating an example of a training data processing method for obtaining an appropriate learning dictionary, which is executed in the abnormality detection system.
  • FIG. 9A is a diagram illustrating an example of training data before normalization distributed in an M-dimensional space.
  • FIG. 9B is a diagram illustrating an example of training data after normalization distributed in an M-dimensional space.
  • FIG. 9C is a diagram illustrating an example of training data after addition of noise elements distributed in the M-dimensional space.
  • FIG. 10 is a flowchart showing another example of the training data processing method for obtaining an appropriate learning dictionary, which is executed in the abnormality detection system.
  • FIG. 11A is a diagram for explaining an example of division of an M-dimensional region in the M-dimensional space.
  • FIG. 11B is a diagram for describing an example of training data after adding noise elements distributed in an M-dimensional space.
  • FIG. 12A is a diagram illustrating a determination boundary of a learning dictionary generated using training data without adding noise and a determination boundary of a learning dictionary generated using the same training data added with noise.
  • FIG. 12B is a diagram illustrating a determination boundary of a learning dictionary generated using training data without adding noise and a determination boundary of a learning dictionary generated using the same training data added with noise.
  • FIG. 12C is a bar graph showing a false detection rate in an abnormality detection test performed using each learning dictionary whose determination boundaries are shown in FIGS. 12A and 12B.
  • FIG. 13 is a flowchart illustrating an example of a processing method, executed in the abnormality detection system according to the second embodiment, for determining which training data processing method to select and whether to perform a parameter search in each processing method.
  • FIG. 14 is a flowchart illustrating an example of a processing method for obtaining a more appropriate learning dictionary, which is executed in the abnormality detection system according to the second embodiment.
  • FIG. 15 is a flowchart illustrating another example of the processing method for obtaining a more appropriate learning dictionary, which is executed in the abnormality detection system according to the second embodiment.
  • Another approach is to monitor CAN messages flowing through the in-vehicle network.
  • This method can be realized by adding a monitoring ECU (node) to each vehicle, and is relatively easy to introduce.
  • Proposed monitoring methods can be roughly divided into three types: rule-based methods, methods that use the data transmission cycle, and methods that detect outliers in message contents using LOF (Local Outlier Factor).
  • The rule-based method and the method using the data transmission cycle can deal with known attack patterns, but to detect an unknown attack pattern, detection based on the content of the message, such as a method using LOF, is necessary.
  • However, an ECU connected to an in-vehicle network does not always have sufficient data processing capacity and storage capacity, and even in such an execution environment, a method is not practical if it cannot perform detection at the speed required for a car traveling on a road at several tens of kilometers per hour.
  • Therefore, the present inventors conceived of using an anomaly detection algorithm called Isolation Forest or iForest (see Non-Patent Document 1), which requires less retained data and a smaller amount of calculation than LOF, as an anomaly detection method for an in-vehicle network. Furthermore, the present inventors propose techniques that, when using Isolation Forest, enable anomaly detection to be executed at the required speed and with the highest possible accuracy even with limited computer resources.
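  • As a concrete illustration of how such an Isolation Forest detector can be trained and applied, the following is a minimal sketch in Python using scikit-learn's IsolationForest; the synthetic data, parameter values, and variable names are illustrative assumptions and are not taken from the patent text.

```python
# Minimal sketch: train an Isolation Forest on normal CAN payload feature
# vectors and score unseen messages. Values and names are illustrative only.
import numpy as np
from sklearn.ensemble import IsolationForest

# Training data: N rows of M-dimensional feature vectors (stand-in for normal traffic).
rng = np.random.default_rng(0)
train = rng.normal(loc=0.5, scale=0.1, size=(1000, 8))

model = IsolationForest(n_estimators=100, random_state=0)
model.fit(train)

# Score unseen messages: predict() returns +1 for inliers, -1 for outliers.
unseen = np.vstack([rng.normal(0.5, 0.1, size=(3, 8)),     # normal-looking
                    rng.uniform(-2.0, 2.0, size=(2, 8))])  # abnormal-looking
print(model.predict(unseen))
print(model.decision_function(unseen))  # higher value = more normal
```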
  • An information processing apparatus according to an aspect of the present disclosure is an information processing apparatus including a processor, and the processor executes: a data element acquisition step of receiving input of N data elements (N is an integer of 2 or more), each of which is an M-dimensional vector (M is an integer of 2 or more), used as training data for Isolation Forest; a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and includes the first region, into L^M third regions (L is an integer of 4 or more) of equal size; a first noise addition step of acquiring the number S (S is an integer of 0 or more) of data elements included in each of the third regions and adding noise elements to each of the third regions that includes fewer data elements than a first threshold T (T is a natural number); a generation step of generating noise-added training data including the data elements and the noise elements; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  • Further, the processor may execute a first determination step of determining whether N is equal to or greater than a predetermined second threshold value, and when it is determined in the first determination step that N is not equal to or greater than the second threshold value, the processor may execute the generation step and the learning dictionary data output step after executing the division step and the first noise addition step.
  • When it is determined in the first determination step that N is equal to or greater than the second threshold value, the processor may execute the generation step and the learning dictionary data output step after executing a second noise addition step of adding K noise elements (K is a natural number smaller than N), each of which is an M-dimensional vector, to the second region at a density different from that of the data elements.
  • Further, when it is determined in the first determination step that N is not equal to or greater than the second threshold value, the processor may execute a test data acquisition step of receiving input of test data for Isolation Forest and a second determination step of determining whether N is equal to or greater than a predetermined third threshold value. When it is determined in the second determination step that N is not equal to or greater than the third threshold value, the processor may execute the set of the above steps a plurality of times using different values of L in the division step to output a plurality of learning dictionary data, execute abnormality detection on the test data using each of the plurality of learning dictionary data, execute an evaluation step of evaluating each of the plurality of learning dictionary data based on the result of the abnormality detection, and execute a learning dictionary data selection step of selecting the best learning dictionary data from the plurality of learning dictionary data based on the result of the evaluation step. When it is determined in the second determination step that N is equal to or greater than the third threshold value, the set may be executed once using a predetermined value of L in the division step.
  • the learning dictionary can be generated at a speed suitable for the execution environment.
  • the processor may determine the number of different values of L so as to have a negative correlation with the value of N.
  • the learning dictionary can be generated at a speed suitable for the execution environment.
  • For example, the processor may determine, as the value of the first threshold T, any number smaller than the median of the numbers of data elements included in the third regions within the first region.
  • the learning dictionary can be generated at a speed suitable for the execution environment.
  • Further, when it is determined in the first determination step that N is equal to or greater than the second threshold value, the processor may execute a test data acquisition step of receiving input of test data for Isolation Forest and a third determination step of determining whether N is equal to or greater than a predetermined fourth threshold value. When it is determined in the third determination step that N is not equal to or greater than the fourth threshold value, the processor may execute the set of the above steps a plurality of times using different values of K in the second noise addition step to output a plurality of learning dictionary data, execute abnormality detection on the test data using each of the plurality of learning dictionary data, execute an evaluation step of evaluating each of the plurality of learning dictionary data based on the result of the abnormality detection, and execute a learning dictionary data selection step of selecting the best learning dictionary data from the plurality of learning dictionary data based on the result of the evaluation step. When it is determined in the third determination step that N is equal to or greater than the fourth threshold value, the set may be executed once using a predetermined value of K in the second noise addition step.
  • the learning dictionary can be generated at a speed suitable for the execution environment.
  • the processor may determine the number of different values of K so as to have a negative correlation with the value of N.
  • the learning dictionary can be generated at a speed suitable for the execution environment.
  • For example, the first region may be a region defined by the hypercube [0, 1]^M in the M-dimensional space, and the second region may be a region defined by the hypercube [-0.5, 1.5]^M in the M-dimensional space.
  • An abnormality detection system according to one aspect of the present disclosure includes any one of the information processing apparatuses described above and an abnormality determination device that includes a memory storing the learning dictionary data output from the information processing apparatus and a processor and that is connected to a network, and the processor of the abnormality determination device acquires data flowing through the network and executes abnormality determination of the acquired data based on the learning dictionary data stored in the memory.
  • abnormality detection is performed using a learning dictionary that is updated quickly in consideration of accuracy.
  • An information processing method according to an aspect of the present disclosure is an information processing method executed using an information processing apparatus including a processor, and includes: a data element acquisition step in which the processor receives input of N data elements (N is an integer of 2 or more), each of which is an M-dimensional vector (M is an integer of 2 or more), used as training data for Isolation Forest; a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and includes the first region, into L^M third regions (L is an integer of 4 or more) of equal size; a noise addition step of acquiring the number S (S is an integer of 0 or more) of data elements included in each of the third regions and adding noise elements to each of the third regions that includes fewer data elements than a first threshold T (T is a natural number); a generation step of generating noise-added training data; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  • a program according to an aspect of the present disclosure is a program that causes a processor included in a computer to execute the above information processing method.
  • FIGS. 1A to 1C are block diagrams respectively showing a configuration example of an abnormality detection system including an information processing apparatus according to Embodiment 1.
  • 1A to 1C show abnormality detection systems 100A, 100B, and 100C having different configurations, respectively.
  • the anomaly detection systems 100A to 100C are systems that detect an anomaly of data flowing through a network to be monitored using an algorithm called Isolation Forest, and each includes an anomaly determination unit 110 and a learning unit 120.
  • the abnormality determination unit 110 determines whether data flowing through the in-vehicle network 210 included in the vehicle 20 is normal or abnormal.
  • the vehicle 20 is an automobile, for example.
  • the in-vehicle network 210 is a network corresponding to, for example, a CAN standard, and includes a bus, a plurality of ECUs and diagnostic ports connected to the bus in each of the configuration examples of FIGS. 1A to 1C.
  • the plurality of ECUs include ECUs having different functions such as an ECU that collects and analyzes measurement data from various sensors, an ECU that controls an engine, an ECU that controls a brake, and an ECU that monitors a network.
  • the data flowing through the in-vehicle network 210 is message data flowing through the bus.
  • the learning unit 120 performs prior learning for the abnormality determination unit 110 to perform the above determination. More specifically, the learning unit 120 learns using the training data, and generates a learning dictionary that the abnormality determination unit 110 uses for the above determination.
  • the generated learning dictionary data (hereinafter also referred to as learning dictionary data) is stored, for example, in a storage device (not shown).
  • The abnormality determination unit 110 reads the learning dictionary from the storage device, and determines whether unknown data to be judged as normal or abnormal, that is, message data acquired from the in-vehicle network 210, is abnormal based on whether or not it deviates from the learning dictionary. More specifically, the learning dictionary generated by the learning unit 120 includes a plurality of binary trees, and the abnormality determination unit 110 determines whether the data is abnormal using the average value of the scores calculated from the plurality of binary trees. This binary tree used in Isolation Forest is called an Isolation Tree or iTree.
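  • For background, the standard Isolation Forest anomaly score from Non-Patent Document 1 (the iForest paper) is reproduced below; the patent text itself does not spell out this formula, so it is shown here only as supplementary context for the score averaged over the iTrees.

```latex
% Standard Isolation Forest anomaly score (Liu et al.), shown for background.
% E(h(x)) is the average path length of sample x over the iTrees, and c(n) is
% the average path length of an unsuccessful search in a binary search tree of
% n samples, with H(i) the harmonic number (approximately ln(i) + 0.5772).
s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}, \qquad
c(n) = 2H(n-1) - \frac{2(n-1)}{n}
```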
  • The abnormality determination unit 110 and the learning unit 120 are functional components provided by a processor that reads and executes a predetermined program. The configuration examples shown in FIGS. 1A to 1C differ in where the processors that provide these functional components are located.
  • the learning unit 120 is provided by a processor and a memory included in the external server 10 that is a so-called server computer outside the vehicle 20.
  • the external server 10 is one example of the information processing apparatus in the present embodiment.
  • the learning unit 120 acquires, for example, a message flowing through the in-vehicle network 210 as training data from the vehicle 20 via the communication network.
  • the learning unit 120 also outputs Isolation Forest learning dictionary data generated using the training data, and provides it to the abnormality determination unit 110 of the vehicle 20 via the communication network.
  • The learning dictionary data is stored in a storage device, such as a flash memory of a microcontroller included in a monitoring ECU for network monitoring connected to the in-vehicle network 210, and the abnormality determination unit 110 is provided by the processor of the microcontroller.
  • The abnormality determination unit 110 performs abnormality determination on the message acquired from the bus using the learning dictionary data acquired from the storage device.
  • learning dictionary data updated after shipment of the vehicle 20 can be provided to the abnormality determination unit 110.
  • both the abnormality determination unit 110 and the learning unit 120 are provided by a processor and a memory included in the external server 10 outside the vehicle 20.
  • Such an external server 10 is also an example of the information processing apparatus in the present embodiment.
  • the learning unit 120 acquires, for example, a message flowing through the in-vehicle network 210 as training data from the vehicle 20 via the communication network.
  • The learning unit 120 outputs the learning dictionary data of the Isolation Forest generated using the training data, but the output destination is not outside the external server 10 but a storage device provided in the external server 10, for example, a hard disk drive (not illustrated).
  • the abnormality determination is performed on the external server 10 instead of on the vehicle 20. That is, the message flowing through the in-vehicle network 210 is transmitted to the external server 10 via the communication network. This message received by the external server 10 is input to the abnormality determination unit 110.
  • Abnormality determination unit 110 acquires learning dictionary data from the storage device, performs abnormality determination of the message using the learning dictionary data, and transmits the result to vehicle 20 via the communication network.
  • the learning dictionary data used by the abnormality determination unit 110 in the external server 10 is updated as needed.
  • both the abnormality determination unit 110 and the learning unit 120 are provided by a microcontroller provided in a monitoring ECU that is connected to the in-vehicle network 210 of the vehicle 20 and monitors the in-vehicle network 210.
  • the monitoring ECU 10 is one example of the information processing apparatus in the present embodiment.
  • the learning unit 120 directly acquires and uses, for example, a message flowing through the in-vehicle network 210 as training data.
  • The learning unit 120 outputs the learning dictionary data of Isolation Forest generated using the training data, but the output destination is not outside the vehicle 20 but a storage device on the vehicle 20, for example, a flash memory in the monitoring ECU.
  • learning dictionary generation and abnormality determination are performed on the vehicle 20.
  • the learning unit 120 acquires message data flowing through the in-vehicle network 210 to which the monitoring ECU is connected, and uses it as training data to generate a learning dictionary.
  • the generated learning dictionary data is stored in the storage device of the monitoring ECU.
  • the abnormality determination unit 110 further acquires learning dictionary data from the storage device, and executes abnormality determination of the message using the learning dictionary data.
  • the learning dictionary data used by the abnormality determination unit 110 on the vehicle 20 can be updated.
  • each configuration shown in FIGS. 1A to 1C may be a configuration that can be dynamically changed on the vehicle 20 instead of a fixed configuration on the vehicle 20 after shipment. For example, depending on the communication speed between the vehicle 20 and the external server 10, the usage rate of the computer resources of the monitoring ECU, the remaining power amount when the vehicle 20 is an electric vehicle, or the operation of the driver, between these configurations Switching may be possible.
  • FIG. 2 is a block diagram illustrating a configuration example of the abnormality determination unit 110 and the learning unit 120 included in the abnormality detection system 100.
  • the learning unit 120 includes a training data receiving unit 122 and a learning dictionary generating unit 124.
  • the training data receiving unit 122 receives input of training data.
  • the training data here is two or more M-dimensional vectors, and M is an integer of 2 or more.
  • the value of each dimension is a value of each byte from the beginning of the payload of the CAN message having a maximum of 8 bytes, for example.
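  • A minimal sketch of this feature extraction is given below, assuming that each CAN message payload of up to 8 bytes is mapped to one byte value per dimension and that missing bytes are padded with 0; the function name and the padding choice are assumptions made for illustration.

```python
# Sketch: map a CAN payload (at most 8 bytes) to an 8-dimensional feature
# vector, one byte value per dimension. Zero-padding is an illustrative choice.
from typing import List

def payload_to_vector(payload: bytes, dims: int = 8) -> List[int]:
    if len(payload) > dims:
        raise ValueError("CAN payload is at most 8 bytes")
    return [payload[i] if i < len(payload) else 0 for i in range(dims)]

print(payload_to_vector(bytes([0x12, 0x34, 0xFF])))
# -> [18, 52, 255, 0, 0, 0, 0, 0]
```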
  • the learning unit 120 generates learning dictionary data using the training data received by the training data receiving unit 122, and outputs the learning dictionary data to a storage unit 112 of the abnormality determination unit 110 described later.
  • In FIG. 3, the data elements are a point group distributed in the M-dimensional space, each point is indicated by a white circle, and the learning dictionary is represented by a boundary in the M-dimensional space indicated by a thick solid line.
  • this boundary is also referred to as a determination boundary.
  • the determination boundary is a boundary line.
  • the abnormality determination unit 110 includes a storage unit 112, a determination target data reception unit 114, a determination target data conversion unit 116, and a determination execution unit 118.
  • the storage unit 112 stores the learning dictionary data output from the learning unit 120 as described above. In addition, data used for conversion of determination target data described later is also stored in the storage unit 112.
  • the determination target data receiving unit 114 acquires data that is a target of abnormality determination, that is, a CAN message from the in-vehicle network 210.
  • the determination target data conversion unit 116 converts the CAN message received by the determination target data reception unit 114 into a format for processing by the determination execution unit 118. In this conversion, for example, extraction of a determination target portion from the CAN message, normalization using the data for conversion of the determination target data, and the like are performed. The normalization will be described later.
  • The determination execution unit 118 performs abnormality determination, that is, determines whether the determination target data is normal or abnormal, based on the learning dictionary stored as learning dictionary data in the storage unit 112.
  • FIG. 4 is a schematic diagram for explaining this abnormality determination.
  • In FIG. 4, two pieces of data, determination target data A and determination target data B, are plotted in the M-dimensional space based on their values.
  • the determination execution unit 118 determines whether each data is normal or abnormal based on whether the data is positioned inside or outside the determination boundary of the learning dictionary, and outputs the result.
  • determination target data A located inside the determination boundary is determined to be normal
  • determination target data B positioned outside the determination boundary is determined to be abnormal.
  • When an abnormality is detected, the monitoring ECU including the abnormality determination unit 110 and the learning unit 120 executes, for example, another program that receives the determination result as input and outputs an error message to the bus, or transmits a command that restricts part or all of the functions of another ECU or shifts another ECU to a special operation mode corresponding to the abnormality.
  • the notification of abnormality occurrence toward the driver of the vehicle 20 may be issued by display on the instrument panel or by voice.
  • information regarding the occurrence of an abnormality may be recorded in a log. This log is acquired and used, for example, by a mechanic of the vehicle 20 through a diagnostic port included in the in-vehicle network 210.
  • Each component of the abnormality determination unit 110 and the learning unit 120 executes a part of the Isolation Forest algorithm, and cooperates as described above to execute the entire Isolation Forest algorithm.
  • FIG. 5 is a diagram illustrating a data flow in the learning unit 120 that generates the learning dictionary.
  • FIG. 6 is a diagram illustrating a data flow in the abnormality determination unit 110 that performs abnormality determination. These diagrams are based on a sequence diagram showing the flow of data, and are also represented in a form that also serves as a flowchart showing the processing order in each unit.
  • The training data receiving unit 122 receives input and acquires the training data (step S51). If the learning dictionary is generated before the vehicle 20 is shipped, the input source of the training data at this stage is, for example, a location in a storage device that is manually specified or preset. If the learning dictionary is generated after the vehicle 20 is shipped, the input source is, for example, the in-vehicle network 210 to which the monitoring ECU including the learning unit 120 is connected.
  • The learning dictionary generation unit 124 normalizes the input training data (step S52), and generates a learning dictionary by the Isolation Forest method using the normalized training data (step S53).
  • Normalization is a calculation process that converts the original distribution range of the input training data in the M-dimensional space so that the distribution range falls within a predetermined region in the same space while maintaining the relative positional relationship of the training data.
  • the generated learning dictionary data is transferred to the abnormality determination unit 110 (step S54), and the abnormality determination unit 110 stores the learning dictionary data in the storage unit 112 (step S55).
  • the data used in the normalization calculation process is also passed from the learning unit 120 to the abnormality determination unit 110.
  • This data includes the maximum and minimum values of each component of the feature vector necessary for conversion.
  • normalization of unknown data that is a determination target is executed using this data.
  • The determination target data receiving unit 114 acquires the data of a CAN message that is a target of abnormality determination from the in-vehicle network 210 (step S61).
  • Next, the determination execution unit 118 reads the learning dictionary data stored in the storage unit 112 (step S62). Further, the determination target data conversion unit 116 reads data such as coefficients used for normalization of the training data from the storage unit 112, and normalizes the determination target data, that is, the acquired CAN message data, using this data (step S63). The determination execution unit 118 then determines whether the normalized data is normal or abnormal based on the learning dictionary data (step S64).
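  • A minimal sketch of this determination flow (steps S61 to S64) is shown below: the per-dimension minimum and maximum values saved at training time are reused to normalize the incoming message before it is judged with the stored model. The variable names and the use of scikit-learn are illustrative assumptions, not part of the patent.

```python
# Sketch of the determination flow (steps S61-S64): normalize an incoming
# message with the min/max values saved during training, then judge it with
# the stored Isolation Forest model. Names are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

# --- values assumed to have been produced and stored by the learning unit ---
train = np.random.default_rng(1).normal(0.5, 0.1, size=(500, 8))
feat_min, feat_max = train.min(axis=0), train.max(axis=0)   # "data for conversion"
train_norm = (train - feat_min) / (feat_max - feat_min)     # first region [0, 1]^M
model = IsolationForest(n_estimators=100, random_state=1).fit(train_norm)

# --- abnormality determination unit ---
def is_abnormal(message_vector: np.ndarray) -> bool:
    normalized = (message_vector - feat_min) / (feat_max - feat_min)  # step S63
    return model.predict(normalized.reshape(1, -1))[0] == -1          # step S64

print(is_abnormal(np.full(8, 0.5)))   # typically False (normal)
print(is_abnormal(np.full(8, 5.0)))   # typically True  (abnormal)
```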
  • the above is the outline of the abnormality detection process including the steps from the generation of the learning dictionary using the training data to the abnormality determination using this learning dictionary, which is executed in the abnormality detection system 100.
  • By using the Isolation Forest method for this abnormality detection, the load on computer resources is reduced compared with conventional methods, and the processing can be executed at higher speed.
  • FIG. 7 shows an example of such an inappropriate determination boundary that does not fit the distribution of the training data.
  • In this case, an erroneous determination is made in which data is determined to be abnormal although it is actually normal.
  • data elements indicated by black circles are data elements determined to be abnormal data, and many of these are actually normal data elements.
  • Such erroneous detection based on an erroneous determination that normal data is abnormal is hereinafter referred to as overdetection.
  • Such a learning dictionary that causes erroneous determination may occur when, for example, the amount of abnormal data included in the training data is insufficient.
  • The following describes the processing performed in the abnormality detection system 100 in order to obtain an appropriate learning dictionary even in such a case.
  • FIG. 8 is a flowchart showing a first processing method which is an example of a training data processing method for obtaining the appropriate learning dictionary described above.
  • the first processing method is executed by the learning dictionary generation unit 124 in the learning unit 120 after receiving the training data of Isolation Forest consisting of two or more M-dimensional vectors.
  • the processing by the learning dictionary generation unit 124 may be described as the processing of the learning unit 120.
  • the learning unit 120 reads parameters used for this processing (step S80). Details of the parameters will be described in the following steps.
  • the learning unit 120 acquires the number of data elements of the input training data (step S81).
  • the learning unit 120 determines the number of noise elements added to the training data based on the number of data elements (step S82).
  • the noise element is also an M-dimensional vector.
  • the parameter acquired in step S80 is used to determine the number of noise elements in step S82, and is a real number greater than 0 and less than 1, for example.
  • As the number of noise elements, a value obtained by multiplying the number of data elements acquired in step S81 by this parameter and rounding the result to an integer is used. That is, the number of noise elements is determined to be smaller than the number of data elements of the training data.
  • Next, the learning unit 120 normalizes the input training data (step S83). FIG. 9B shows an example of the training data after normalization distributed on a two-dimensional plane.
  • The distribution range of the training data distributed as shown in FIG. 9A before normalization is converted so as to cover the [0, 1]^2 region on the two-dimensional plane.
  • Such a region is an example of the first region in the present embodiment.
  • Next, the learning unit 120 adds the number of noise elements determined in step S82 over a second region that is larger than the first region and includes the first region, that is, a two-dimensional plane region in this example (step S84).
  • FIG. 9C is an example of training data after addition of noise elements distributed in the M-dimensional space, and the noise elements are indicated by dotted outline circles distributed in the two-dimensional plane.
  • In this example, the noise elements are added so as to be distributed over the region [-0.5, 1.5]^2.
  • Such a region is an example of the second region in the present embodiment.
  • As shown in FIG. 9C, as a result of the process of step S84, a smaller number of noise elements than the number of data elements of the original training data is added so as to be distributed over an area wider than the distribution range of the original training data. The distribution density of the noise elements is therefore lower than the distribution density of the data elements of the original training data. In addition, the noise elements are added so as to have a uniform distribution as a whole over the above-described region.
  • Next, the learning unit 120 generates noise-added training data that includes both the training data elements and the noise elements, each of which is an M-dimensional vector (in this example, a two-dimensional vector) in the second region (step S85).
  • the learning unit 120 generates the learning dictionary data for Isolation Forest using the noise-added training data generated in step S85, and outputs the learning dictionary data (step S86).
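  • The first processing method described above can be sketched as follows, assuming a noise-count parameter between 0 and 1 and a second region of [-0.5, 1.5]^M; the function names and the choice of parameter value are illustrative assumptions, not values from the patent.

```python
# Sketch of the first processing method (steps S81-S86): normalize the training
# data into [0, 1]^M, then add round(p * N) uniformly distributed noise elements
# over the larger region [-0.5, 1.5]^M. Names and p = 0.3 are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

def first_processing_method(train: np.ndarray, p: float = 0.3, seed: int = 0):
    rng = np.random.default_rng(seed)
    n, m = train.shape
    # Normalization into the first region [0, 1]^M (step S83).
    lo, hi = train.min(axis=0), train.max(axis=0)
    normalized = (train - lo) / (hi - lo)
    # Number of noise elements, smaller than N (step S82).
    k = int(round(p * n))
    # Uniform noise over the second region [-0.5, 1.5]^M (step S84).
    noise = rng.uniform(-0.5, 1.5, size=(k, m))
    # Noise-added training data (step S85) and learning dictionary (step S86).
    noise_added = np.vstack([normalized, noise])
    return IsolationForest(n_estimators=100, random_state=seed).fit(noise_added)

model = first_processing_method(np.random.default_rng(2).normal(0.5, 0.1, (1000, 2)))
```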
  • In the present embodiment, steps S82 and S84 are an example of the second noise addition step, step S85 is an example of the generation step, and step S86 is an example of the learning dictionary data output step.
  • In this way, the learning unit 120 does not use the normalized training data as it is, as in conventional methods. Instead, the learning unit 120 generates the learning dictionary using data in which noise has been added to a region that includes the periphery of the distribution range of the normalized training data in the M-dimensional space.
  • the abnormality detection system 100 can perform abnormality detection with a reduced overdetection rate.
  • In the above example, the number of noise elements, which is smaller than the number of data elements of the original training data, is determined by using a parameter that takes a real value greater than 0 and less than 1.
  • the method of determining the number of noise elements is not limited to this.
  • the number of noise elements may be obtained by subtracting a certain number from the number of data elements of training data.
  • the number of training data may be divided into a plurality of ranges, and a predetermined number of noise elements may be used for each range.
  • the correspondence between the number of training data and the number of noise elements is stored in the memory of the information processing apparatus, for example, included in a data table.
  • the first processing method has been described by taking an example in which the data elements of the training data are two-dimensional vectors, but the idea based on the first processing method can be generalized and applied to higher dimensional spaces.
  • The first processing method can also be applied to training data consisting of vectors of three or more dimensions. If the training data is an M-dimensional vector, the range of the first region is read as [0, 1]^M and the range of the second region as [-0.5, 1.5]^M. That is, the first region is an M-dimensional space region defined by a first hypercube in the M-dimensional space, and the second region is an M-dimensional space region defined by a second hypercube that is larger than the first hypercube in the M-dimensional space.
  • FIG. 10 is a flowchart showing a second processing method as another example of the training data processing method for obtaining the appropriate learning dictionary described above.
  • the second processing method is also executed by the learning dictionary generation unit 124 in the learning unit 120 after receiving the training data of Isolation Forest composed of two or more M-dimensional vectors.
  • the processing by the learning dictionary generation unit 124 may be described as the processing of the learning unit 120.
  • a case where the second processing method is also started from the initial state of the training data shown in FIG. 9A will be described as an example.
  • the description of the steps common to the first processing method may be simplified.
  • the learning unit 120 reads parameters used for this processing (step S100). Details of the parameters will be described in the following steps.
  • the learning unit 120 normalizes the input training data (step S101).
  • The content of this process is the same as in the first processing method, and FIG. 9B shows an example of the training data after normalization distributed on a two-dimensional plane. The distribution range of the training data distributed as shown in FIG. 9A before normalization is converted so as to cover the [0, 1]^2 region on the two-dimensional plane. Such a region is an example of the first region in the present embodiment.
  • Next, the learning unit 120 sets a second region, which is larger than the first region and includes the first region (in this example, a two-dimensional plane region), and divides the second region into third regions that are M-dimensional hypercubes of equal size (step S102).
  • FIG. 11A is a diagram for explaining the second region and the third regions in the two-dimensional plane. In the example shown in FIG. 11A, the second region is the area [-0.5, 1.5]^2, and each third region is a sub-region obtained by dividing the second region into 64 equal areas.
  • the parameter acquired in step S100 is used to determine the number of third regions obtained by dividing the second region in step S102, and the value of this parameter is 8 in the example of FIG. 11A.
  • The number of third regions is this parameter raised to the Mth power; in this example it is 8 squared, that is, 64.
  • Next, the learning unit 120 acquires the number S of data elements of the training data included in each third region (step S103).
  • the learning unit 120 determines a first threshold value T (T is a natural number) that is a threshold value for the data elements of the training data in each third region (step S104). For example, the parameter acquired in step S100 is used to determine the first threshold T.
  • This parameter and the parameter used in step S102 may be the same or different. If they are different, this parameter may be calculated from the parameter used in step S102.
  • For example, the parameter may specify the number of data elements of the training data included in a particular third region within the first region.
  • Alternatively, a specific rank may be indicated among the third regions when they are arranged in order of the number of data elements of the training data they contain.
  • In that case, the number of data elements of the training data included in the third region of this specific rank is used as the first threshold.
  • The rank may be indicated by a position counted from the minimum value or the maximum value, or by a position counted upward or downward from the average value or the median value.
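  • As one concrete reading of the median-based option (choosing T below the median of the per-region counts, as mentioned in the summary), the small sketch below computes such a threshold; the specific choice of "one less than the median, but at least 1" is an assumption made for illustration.

```python
# Sketch: derive the first threshold T from the counts of training-data
# elements in the third regions inside the first region, taking a value just
# below the median. "median - 1, but at least 1" is an illustrative choice.
import numpy as np

def threshold_from_counts(counts_in_first_region: np.ndarray) -> int:
    median = int(np.median(counts_in_first_region))
    return max(1, median - 1)   # T is a natural number smaller than the median

print(threshold_from_counts(np.array([12, 9, 10, 15, 8, 11])))  # -> 9
```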
  • Using the above S and T, the learning unit 120 determines whether it is necessary to add noise elements to each third region, determines the number of noise elements to be added to each third region, and executes the procedure for adding the noise elements.
  • First, the learning unit 120 checks whether there is a third region for which the determination of whether noise elements need to be added has not yet been made (step S105). If there is such a region (YES in step S105), the learning unit 120 selects one of them (step S106) and determines whether the number S of data elements of the training data in that third region is smaller than the first threshold T (step S107).
  • When the number S of data elements of the training data in the selected third region is smaller than the first threshold T (YES in step S107), (T - S) noise elements are added so that the total number of data elements and noise elements in that third region becomes T (step S108).
  • If the number S of data elements of the training data in the third region is equal to or greater than the first threshold T (NO in step S107), no noise elements are added to that region, and the learning unit 120 again checks whether there is an unprocessed third region (step S105).
  • FIG. 11B is a diagram for describing an example of training data and noise elements distributed in a two-dimensional space in the case of NO in step S105.
  • the noise element is indicated by a dotted outline circle.
  • For example, in one of the hatched third regions within the first region, T - S = 3 noise elements are added, and in another, T - S = 1 noise element is added.
  • Since S is 9 or more in all the other third regions within the first region, no noise elements are added to them. The other hatched third regions are outside the first region and contain no data elements of the training data, so nine noise elements are added to each of them.
  • Each noise element is a random value according to a uniform distribution within its third region.
  • When there is no unprocessed third region remaining (NO in step S105), the learning unit 120 generates noise-added training data that includes the data elements of the training data and the added noise elements (step S109).
  • the learning unit 120 generates the learning dictionary data for Isolation Forest using the noise-added training data generated in step S109, and outputs the learning dictionary data (step S110).
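  • Under the same assumptions as the earlier sketches, the second processing method (division into L^M cells and per-cell noise addition up to T) could look roughly like the following; L = 8, the way T is supplied, and the helper names are illustrative choices and not taken from the patent.

```python
# Sketch of the second processing method (steps S101-S110): divide the second
# region [-0.5, 1.5]^M into L^M equal cells, count training elements per cell,
# and top up every cell holding fewer than T elements with uniform noise so the
# cell reaches T elements. L = 8 and the names are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

def second_processing_method(train: np.ndarray, L: int = 8, T: int = 9, seed: int = 0):
    rng = np.random.default_rng(seed)
    n, m = train.shape
    lo, hi = train.min(axis=0), train.max(axis=0)
    normalized = (train - lo) / (hi - lo)               # first region [0, 1]^M (S101)
    cell_width = 2.0 / L                                 # second region spans [-0.5, 1.5]
    # Count training elements per cell (S103).
    idx = np.clip(np.floor((normalized + 0.5) / cell_width).astype(int), 0, L - 1)
    counts = {}
    for row in idx:
        counts[tuple(row)] = counts.get(tuple(row), 0) + 1
    # Per-cell noise addition (S105-S108): bring every cell up to T elements.
    noise_rows = []
    for cell in np.ndindex(*([L] * m)):
        s = counts.get(cell, 0)
        if s < T:
            low = -0.5 + np.array(cell) * cell_width
            noise_rows.append(rng.uniform(low, low + cell_width, size=(T - s, m)))
    noise_added = np.vstack([normalized] + noise_rows)   # generation step (S109)
    return IsolationForest(n_estimators=100, random_state=seed).fit(noise_added)  # S110

model = second_processing_method(np.random.default_rng(3).normal(0.5, 0.1, (500, 2)))
```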
  • In the present embodiment, step S101 is an example of the normalization step, step S102 is an example of the division step, steps S103 to S108 are an example of the first noise addition step, step S109 is an example of the generation step, and step S110 is an example of the learning dictionary data output step.
  • In the second processing method as well, the learning unit 120 does not use the normalized training data as it is, as in conventional methods. Instead, the learning unit 120 generates the learning dictionary using data in which noise has been added to a region that includes the periphery of the distribution range of the normalized training data in the M-dimensional space.
  • the abnormality detection system 100 can perform abnormality detection with a reduced overdetection rate.
  • Furthermore, in the second processing method, the number of noise elements added in the first region where the training data is distributed is determined according to the density of each subdivided region. The second processing method therefore suppresses the occurrence of places where data elements and noise elements are overcrowded, which can occur in the first region with the first processing method.
  • In Isolation Forest, a place where vector data is overcrowded in the training data tends to fall inside the determination boundary. Therefore, if data elements and noise elements become overcrowded, the possibility of an erroneous determination in which even abnormal data is determined to be normal increases.
  • Such erroneous detection based on an erroneous determination that abnormal data is normal is also referred to as detection omission, in contrast to the above-described overdetection.
  • In the abnormality detection system 100, in which the abnormality determination of unknown data is performed based on a learning dictionary generated by executing the second processing method, abnormality detection can therefore be performed while suppressing the occurrence of overdetection and also suppressing the possibility of detection omission.
  • Although the second processing method has also been described using an example in which the data elements of the training data are two-dimensional vectors, the idea based on this processing method can be generalized and applied to higher-dimensional spaces, and the second processing method can also be applied to training data consisting of vectors of three or more dimensions.
  • If the training data is an M-dimensional vector, the range of the first region is read as [0, 1]^M and the range of the second region as [-0.5, 1.5]^M. That is, the first region is an M-dimensional space region defined by a first hypercube in the M-dimensional space, and the second region is an M-dimensional space region defined by a second hypercube that is larger than the first hypercube in the M-dimensional space.
  • FIGS. 12A and 12B are diagrams showing the determination boundary of a learning dictionary generated using training data without adding noise and the determination boundary of a learning dictionary generated using the same training data to which noise has been added by the above processing methods.
  • the training data 1 in FIG. 12A and the training data 2 in FIG. 12B are different types of data acquired from the same vehicle-mounted network. Comparing the training data 1 and the training data 2, the training data 1 has data elements distributed almost uniformly from the center to the periphery of the distribution, and the training data 2 has a sparse distribution of data elements at the periphery. It can be said that the training data 2 is more likely to contain outliers than the training data 1.
  • a circle indicates a data element of training data.
  • a solid line box represents a decision boundary of a learning dictionary generated using training data without adding noise
  • a broken line box represents a decision boundary of a learning dictionary generated using training data added with noise. Noise elements are not shown in each figure.
  • FIG. 12C shows the false detection rate in this abnormality detection test.
  • the left column of each training data is the false detection rate in the learning dictionary obtained without adding noise to the training data
  • the right column is the false detection rate in the learning dictionary obtained by adding noise to the training data.
  • In summary, a small number of data elements that deviate to some extent from the original training data, which contains many normal data elements, are added in the data space at a density lower than that of the original training data. These added data elements are referred to above as noise elements. In an abnormality detection system that uses a learning dictionary generated from such noise-added training data, abnormality detection with a reduced false detection rate can be performed.
  • The first processing method and the second processing method described in Embodiment 1 differ in the algorithms of the programs executed in the information processing apparatus to realize them, and they can be executed selectively, for example, by switching the program read by a given processor.
  • The time required for adding noise elements depends more strongly on the number of training data in the second processing method than in the first processing method, and it becomes longer as the training data increases. That is, the processing load on the processor is larger in the second processing method.
  • With both methods, the detection accuracy (low false detection rate) of the generated learning dictionary is improved compared with the conventional method as described above, but the second processing method is superior in this respect.
  • It is therefore conceivable to always execute the second processing method in the abnormality detection system.
  • In a configuration such as the abnormality detection system 100A in FIG. 1A or the abnormality detection system 100B in FIG. 1B, in which the learning unit 120 is provided by the external server 10, the difference in processing load described above is unlikely to be a problem.
  • However, in a configuration such as the abnormality detection system 100C in FIG. 1C, it is assumed that there is a limit to computer resources such as the processor operation speed. That is, in a traveling vehicle, there is a possibility that the learning dictionary cannot be generated or updated at the necessary speed with the second processing method.
  • In the first processing method, the parameter used to determine the number of noise elements can take a real number larger than 0 and smaller than 1. However, it is difficult to predict in advance which value in this range will generate a learning dictionary more suitable for anomaly detection. To find out, for example, the accuracy of anomaly detection performed on test data may be compared across a plurality of learning dictionaries generated with different parameter values. As a matter of course, however, if such a comparison is made to search for the optimum parameter, it takes more time until the learning dictionary used for abnormality detection is determined. If determination of the learning dictionary is delayed, abnormality detection either cannot be executed until the learning dictionary is determined or must be performed using an old learning dictionary.
  • In the second processing method as well, there are parameters: the value L that determines the division in the division step, and the value used to determine the first threshold T.
  • The former is set so that, for example, the first region is divided into two or more parts in each dimension and at least one third region lies on each side outside the first region; thus L can take an integer value of 4 or more. The latter is, for example, a value used for specifying any one of the third regions in the second region, and can therefore take a real value that is 1 or more and less than or equal to the number of third regions in the second region.
  • By searching over such parameters, a learning dictionary capable of detecting an abnormality with higher accuracy may be obtained.
  • However, it then takes more time until the learning dictionary used for abnormality detection is determined. As a result, the execution of abnormality detection is delayed or accuracy is sacrificed.
  • Therefore, the inventors came up with a method for quickly deciding, in the abnormality detection system, which training data processing method to select and whether to perform a parameter search in each processing method, so that the abnormality detection system can perform abnormality detection at the required speed and with the highest possible accuracy.
  • FIG. 13 is a flowchart illustrating an example of a processing method, executed in the abnormality detection system 100, for determining which training data processing method to select and whether to perform a parameter search in each processing method.
  • This processing method includes a step executed by the learning dictionary generation unit 124 in the learning unit 120 after receiving the training data of Isolation Forest composed of two or more M-dimensional vectors.
  • the processing by the learning dictionary generation unit 124 will be described as the processing of the learning unit 120.
  • the following description may be made as processing by the abnormality determination unit 110.
  • training data receiving unit 122 has already received training data in the initial state.
  • the learning unit 120 acquires the number N of data elements of training data (step S130).
  • the learning unit 120 determines whether N is greater than or equal to a predetermined second threshold (step S131).
  • The second threshold is a threshold used for determining whether to use the first processing method or the second processing method as the training data processing method. It is determined, for example, according to available computer resources such as the computing capability of the processor that implements the learning unit 120, and is stored in the memory of the information processing apparatus. By using a predetermined threshold in this way, a quick determination can be made.
  • When it is determined that N is equal to or greater than the second threshold, that is, when the number of data elements of the training data is large, the learning unit 120 selects the first processing method, which can be completed in a shorter time (step S132).
  • When it is determined that N is not equal to or greater than the second threshold, that is, when the number of data elements of the training data is small, the learning unit 120 selects the second processing method, which provides a learning dictionary capable of detecting an abnormality with higher accuracy (step S133).
  • the learning unit 120 determines whether N is greater than or equal to a predetermined third threshold (step S134).
  • the third threshold value is a threshold value used for determining whether or not to search for a parameter when executing each processing method of training data.
  • the third threshold is determined by available computer resources such as the computing capability of the processor that implements the learning unit 120 and stored in the memory of the information processing apparatus.
  • The third threshold may be related to the second threshold, or the two may be values independent of each other.
  • When it is determined that N is greater than or equal to the third threshold, that is, when the number of data elements of the training data is large, the learning unit 120 decides not to execute a parameter search so that the processing can be completed in a shorter time (step S135).
  • When it is determined that N is not greater than or equal to the third threshold, that is, when the number of data elements of the training data is small, the learning unit 120 decides to perform a parameter search in order to obtain a learning dictionary capable of detecting an abnormality with higher accuracy (step S136).
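  • As an illustration only (not part of the disclosure), the decision flow of steps S130 to S136 could be sketched in Python as follows; the threshold values and the function name are hypothetical placeholders, and in practice the thresholds would be determined from the available computer resources as described above.

```python
# Minimal sketch of the decision flow of FIG. 13 (steps S130-S136).
# SECOND_THRESHOLD and THIRD_THRESHOLD are hypothetical values chosen here
# for illustration; in practice they would be set from the available
# computing resources and stored in the memory of the apparatus.

SECOND_THRESHOLD = 50_000   # chooses first vs. second processing method
THIRD_THRESHOLD = 10_000    # chooses whether to run a parameter search


def decide_processing(training_data):
    n = len(training_data)                  # step S130: number of data elements N

    if n >= SECOND_THRESHOLD:               # step S131
        method = "first"                    # step S132: faster, uniform noise
    else:
        method = "second"                   # step S133: grid-based noise, higher accuracy

    if n >= THIRD_THRESHOLD:                # step S134
        do_parameter_search = False         # step S135: finish sooner
    else:
        do_parameter_search = True          # step S136: search for better parameters

    return method, do_parameter_search
```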
  • When learning dictionary data is generated and output (step S137) through steps S132 and S135, the learning unit 120 executes the first processing method shown in the flowchart of FIG. 8. When learning dictionary data is generated and output (step S137) through steps S133 and S135, the learning unit 120 executes the second processing method shown in the flowchart of FIG. 10.
  • FIG. 14 is a flowchart of the first processing method including parameter search, which is executed in the abnormality detection system 100.
  • steps common to the first processing method shown in the flowchart of FIG. 8 are denoted by common reference numerals, and detailed description thereof is omitted.
  • in this case, the learning unit 120 executes the set of steps S82 and S84 to S86 a plurality of times while changing the parameter value.
  • a plurality of learning dictionary data generated and output as a result are stored in the storage unit 112 of the abnormality determination unit 110. Further, from the learning unit 120, the data used for normalization in step S83 is also provided to the abnormality determination unit 110 and stored in the storage unit 112.
  • It is assumed that the abnormality determination unit 110 has acquired test data for Isolation Forest. This test data is, for example, input to the abnormality determination unit 110 in advance and stored in the storage unit 112. If it is determined in step S131 that N is not greater than or equal to the second threshold, the abnormality determination unit 110 reads and acquires this test data from the storage unit 112. The abnormality determination unit 110 then normalizes the test data using the data used for normalization in step S83, and executes abnormality determination on the test data using each learning dictionary data (step S140).
  • The learning unit 120 evaluates the abnormality determination performed in step S140 with each learning dictionary data, and based on the evaluation result selects the best learning dictionary data as the learning dictionary data to be used for actual abnormality detection (step S141). For this evaluation, a known evaluation scale such as recall or F-measure can be used, for example. Note that step S141 may be performed by the abnormality determination unit 110.
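  • As a rough sketch of how steps S140 and S141 could look in practice, the following Python fragment assumes scikit-learn's IsolationForest as the learned model, labelled test data, and the F-measure as the evaluation scale; the function names, the noise-addition routine, and the candidate parameter values are hypothetical and only illustrate the select-the-best-dictionary idea.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score


def add_uniform_noise(train, k, rng):
    """Hypothetical noise addition (cf. steps S82/S84): append k noise
    vectors drawn uniformly from an assumed second region [-0.5, 1.5]^M."""
    m = train.shape[1]
    noise = rng.uniform(-0.5, 1.5, size=(k, m))
    return np.vstack([train, noise])


def select_best_dictionary(train, x_test, y_test, k_candidates, seed=0):
    """Sketch of steps S140/S141: build one model per candidate parameter
    value, score each on labelled test data with the F-measure, and keep
    the best one.  y_test uses 1 for anomalous and 0 for normal samples."""
    rng = np.random.default_rng(seed)
    best_model, best_score = None, -1.0
    for k in k_candidates:
        noisy = add_uniform_noise(train, k, rng)
        model = IsolationForest(n_estimators=100, random_state=seed).fit(noisy)
        # predict() returns +1 (normal) / -1 (anomaly); map to 1 = anomaly.
        pred = (model.predict(x_test) == -1).astype(int)
        score = f1_score(y_test, pred)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score
```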
  • Note that steps S82 and S84 are examples of the second noise addition step in the present embodiment, step S85 is an example of the generation step, and step S86 is an example of the learning dictionary data output step. Step S131 is an example of the first determination step in the present embodiment, and step S134 is an example of the second determination step. Steps S140 and S141 are examples in the present embodiment corresponding to the test data acquisition step, the evaluation step, and the learning dictionary data selection step.
  • One difference from the case where the first processing method is executed through steps S132 and S135 is whether the set of steps S82 and S84 to S86 is executed only once or a plurality of times before the learning dictionary data used for abnormality detection is output. Another difference is that a plurality of learning dictionary data are evaluated using test data, and the best learning dictionary data is selected as the learning dictionary data used for abnormality detection based on the result of the evaluation.
  • FIG. 15 is a flowchart of the second processing method including parameter search, which is executed in the abnormality detection system 100.
  • steps common to the second processing method shown in the flowchart of FIG. 10 are denoted by common reference numerals, and detailed description thereof is omitted.
  • in this case, the learning unit 120 executes the set of steps S102 to S110 a plurality of times while changing the combination of the two types of parameter values.
  • a plurality of learning dictionary data generated and output as a result are stored in the storage unit 112 of the abnormality determination unit 110. Further, from the learning unit 120, the data used for normalization in step S101 is also provided to the abnormality determination unit 110 and stored in the storage unit 112.
  • the processing of steps S150 and S151 is the same as that of steps S140 and S141, respectively.
  • Note that step S102 is an example of the division step in the present embodiment, steps S103 to S108 are an example of the first noise addition step, step S109 is an example of the generation step, and step S110 is an example of the learning dictionary data output step. Step S131 is an example of the first determination step in the present embodiment, and step S134 is an example of the second determination step. Steps S150 and S151 are examples in the present embodiment corresponding to the test data acquisition step, the evaluation step, and the learning dictionary data selection step.
  • One difference from the case where the second processing method is executed through steps S133 and S135 is whether the set of steps S102 to S110 is executed only once or a plurality of times before the learning dictionary data used for abnormality detection is output. Another difference is that a plurality of learning dictionary data are evaluated using test data, and the best learning dictionary data is selected as the learning dictionary data used for abnormality detection based on the result of the evaluation.
  • Regarding time cost, it is largest when the second processing method is executed together with a parameter search, and also large when the first processing method is executed together with a parameter search. The time costs of the remaining two patterns are significantly smaller. The second threshold and the third threshold may be values independent of each other, but they may also be determined in consideration of the magnitude relationship of these time costs.
  • The threshold used in step S134 may be switched according to the determination result in step S131, that is, depending on whether the first processing method or the second processing method is used for adding noise. For example, when the second processing method is used, the third threshold may be used, and when the first processing method is used, a fourth threshold, which is another predetermined threshold, may be used instead of the third threshold. Step S134 in the case where the fourth threshold is used in this way is an example of the third determination step in the present embodiment.
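  • For illustration, the switching of the threshold used in step S134 according to the selected processing method might be expressed as below; both threshold values are hypothetical, and the disclosure does not specify their magnitude relationship.

```python
# Hypothetical values; the disclosure only states that the fourth threshold
# is another predetermined threshold, used when the first processing method
# was selected in step S131.
THIRD_THRESHOLD = 10_000
FOURTH_THRESHOLD = 30_000


def parameter_search_threshold(method):
    # Step S134 uses the fourth threshold for the first processing method
    # and the third threshold for the second processing method.
    return FOURTH_THRESHOLD if method == "first" else THIRD_THRESHOLD
```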
  • In the above description, two determinations are made: the determination of the noise addition processing method and the determination of whether or not to execute a parameter search for each processing method. However, both are not mandatory; the time cost may be adjusted by only one of these determinations.
  • Further, the number of parameter values tried in the search may be changed in stages. That is, as the number of data elements of the training data increases, the number of parameter values tried may be reduced. This number may be a value calculated from the number of data elements, or may be a value determined in advance for each predetermined range of the number of data elements. That is, it is sufficient that there is a negative correlation between the number of data elements of the training data and the number of parameter values tried. Thereby, when there are many data elements in the training data, the increase in the load of calculation processing can be suppressed so that the time required to determine the learning dictionary data does not become too long.
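  • A minimal sketch of such a negative correlation between the number of data elements and the number of parameter values tried is given below; the constants are hypothetical and purely illustrative.

```python
# Hypothetical rule: the more training data elements there are, the fewer
# parameter values are tried, so that dictionary generation does not take
# too long. Both constants are illustrative only.

def num_parameter_candidates(n_data_elements, max_candidates=10, scale=1_000):
    # Negative correlation: the candidate count shrinks as N grows, but never below 1.
    return max(1, max_candidates - n_data_elements // scale)
```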
  • In the above description, whether to execute the first processing method or the second processing method for the training data processing is determined according to the result of comparing the number N of data elements of the training data with the second threshold, but the present disclosure is not limited to this. For example, the options may be two: executing either the first processing method or the second processing method, or not executing the training data processing at all.
  • Embodiments 1 and 2 have been described as examples of the technology according to the present disclosure.
  • the technology according to the present disclosure is not limited to this, and can also be applied to embodiments in which changes, replacements, additions, omissions, and the like are appropriately performed.
  • the following modifications are also included in one embodiment of the present disclosure.
  • the system LSI, with which part or all of the components constituting each of the above devices may be configured, is an ultra-multifunctional LSI manufactured by integrating a plurality of components on a single chip.
  • Specifically, the system LSI is a computer system including a microprocessor, a ROM, a RAM, and the like. A computer program is recorded in this RAM, and the system LSI achieves its functions by the microprocessor operating according to the computer program recorded in the RAM.
  • each part of the constituent elements constituting each of the above devices may be individually made into one chip, or may be made into one chip so as to include a part or all of them.
  • Although the term system LSI is used here, it may also be called IC, LSI, super LSI, or ultra LSI depending on the degree of integration.
  • The method of circuit integration is not limited to LSIs, and implementation using dedicated circuitry or general-purpose processors is also possible. An FPGA (Field Programmable Gate Array) or a reconfigurable processor in which the connection and settings of the circuit cells inside the LSI can be reconfigured may also be used.
  • If integrated circuit technology that replaces LSI emerges as a result of advances in semiconductor technology or other derived technology, that technology may naturally be used to integrate the functional blocks. Application of biotechnology is one such possibility.
  • each of the above devices may be constituted by an IC card or a single module that can be attached to and detached from each device.
  • This IC card or module is a computer system including a microprocessor, ROM, RAM, and the like. Further, this IC card or module may include the super multifunctional LSI described above. The IC card or the module achieves its function by the microprocessor operating according to the computer program. This IC card or this module may have tamper resistance.
  • each component may be configured by dedicated hardware or may be realized by executing a software program suitable for each component.
  • Each component may be realized by a program executor such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.
  • the software that realizes the information processing apparatus of the above-described embodiment is a program as follows.
  • That is, this program causes a computer to execute an information processing method including: a data element acquisition step of receiving input of N data elements (N is an integer equal to or greater than 2) that are M-dimensional vectors (M is an integer equal to or greater than 2) used as training data for Isolation Forest; a normalization step of normalizing the training data so as to be distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and includes the first region, into third regions that are M-dimensional hypercubes of equal size; a first noise addition step of acquiring the number S of data elements included in each of the third regions and adding, in a uniform distribution, (T - S) noise elements that are M-dimensional vectors to each third region that includes fewer data elements than a first threshold T; a generation step of generating noise-added training data including the data elements and the noise elements; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
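  • The following Python sketch is one possible reading of the division step and the first noise addition step above, assuming, as in the embodiments, that the first region is [0, 1]^M and the second region is [-0.5, 1.5]^M; it is illustrative only and practical just for small M, since the number of third regions grows as L^M.

```python
import numpy as np
from itertools import product


def add_grid_noise(train, L=4, T=1, low=-0.5, high=1.5, rng=None):
    """Illustrative division step and first noise addition step.

    train : (N, M) array already normalized so that the data lie in the
    first region [0, 1]^M.  The second region [low, high]^M is split into
    L^M equal hypercubes (third regions); every cell holding fewer than T
    data elements receives (T - S) noise vectors drawn uniformly from
    within that cell.
    """
    rng = rng or np.random.default_rng(0)
    m = train.shape[1]
    edges = np.linspace(low, high, L + 1)        # cell boundaries per dimension

    # Count the data elements S contained in each third region.
    idx = np.clip(np.digitize(train, edges) - 1, 0, L - 1)
    counts = {}
    for cell in map(tuple, idx.tolist()):
        counts[cell] = counts.get(cell, 0) + 1

    noise_blocks = []
    for cell in product(range(L), repeat=m):     # every third region
        s = counts.get(cell, 0)
        if s < T:                                # sparse cell: top it up to T
            lo = edges[np.array(cell)]
            hi = edges[np.array(cell) + 1]
            noise_blocks.append(rng.uniform(lo, hi, size=(T - s, m)))

    return np.vstack([train] + noise_blocks) if noise_blocks else train
```

  • For example, with M = 2, L = 4 and T = 1, every cell of the 4 x 4 grid that contains no data element receives one uniformly placed noise vector, which is intended to keep the judgment boundary of the subsequently generated learning dictionary closer to the actual distribution of the training data.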
  • As described in the above embodiments, the present disclosure can be implemented as an information processing apparatus that generates learning dictionary data using training data and provides the learning dictionary data to an abnormality determination apparatus that performs abnormality determination. It can also be realized as an abnormality detection system including this information processing apparatus and the abnormality determination apparatus.
  • This abnormality determination apparatus is, for example, the monitoring ECU that realizes the abnormality determination unit connected to the in-vehicle network 210 in the abnormality detection system configured as shown in FIG. 1A or FIG. 1C. In the abnormality detection system configured as shown in FIG. 1B, it is the external server 10 that realizes the abnormality determination unit.
  • This network is typically an in-vehicle CAN network as described above, but is not limited thereto.
  • a network such as CAN-FD (CAN with Flexible Data rate), FlexRay, Ethernet, LIN (Local Interconnect Network), MOST (Media Oriented Systems Transport) may be used.
  • an in-vehicle network combining these networks as sub-networks with a CAN network may be used.
  • each component may be a circuit.
  • a plurality of components may constitute one circuit as a whole, or may constitute separate circuits.
  • Each circuit may be a general-purpose circuit or a dedicated circuit.
  • a process executed by a specific component may be executed by another component instead of the specific component.
  • the order of the plurality of processes may be changed, and the plurality of processes may be executed in parallel.
  • This disclosure can be used for an in-vehicle network system including an in-vehicle network.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

Provided is an information processing device, comprising a processor. The processor: receives an input of data elements which are two or more vectors which are used as training data; normalizes the training data so as to distribute same across a first region; segments a multidimensional second region which contains the first region into third regions which are hypercubes of equivalent size; acquires the number S of data elements which each of the third regions includes; adds in a uniform distribution, to each of the third regions which include fewer than a first threshold T of the data elements, (T – S) noise elements which are vectors; generates noise-added training data which includes the vectors in the second region; and using the generated noise-added training data, generates and outputs Isolation Forest learning dictionary data.

Description

情報処理装置、情報処理方法及びプログラムInformation processing apparatus, information processing method, and program
 本開示は、車載ネットワーク等で用いられる異常検知技術に関する。 This disclosure relates to an abnormality detection technology used in an in-vehicle network or the like.
 電子化が進んだ自動車において、車載ネットワークの重要性は以前にまして高い。 In an automobile that has become more electronic, the importance of the in-vehicle network is higher than before.
 自動車には各種のシステムを制御する多数の電子制御ユニット(Electronic Control Unit、以下ECUと表記する)が搭載されている。ECU間では車載ネットワークに接続され、自動車の諸機能を実現するためにこの車載ネットワークを介して通信が行われている。CAN(Controller Area Network)は、このような車載ネットワークの規格のひとつで、標準的な技術として多くの国及び地域で採用されている。 The automobile is equipped with a large number of electronic control units (Electronic Control Units, hereinafter referred to as ECUs) for controlling various systems. The ECUs are connected to an in-vehicle network, and communication is performed through the in-vehicle network in order to realize various functions of the automobile. CAN (Controller Area Network) is one of such in-vehicle network standards, and is adopted as a standard technology in many countries and regions.
 CANのプロトコルに準拠するネットワークは1台の車上で閉じた通信経路として構築可能である。しかしながら、各自動車には外部からのアクセスが可能なネットワークとして構築され搭載されるのが珍しくない。例えば車載ネットワークには、ネットワークを流れる情報を車載の各システムの診断に利用する目的で取り出すためのポートが設置されたり、無線LANを提供する機能を備えるカーナビゲーションシステムが接続されたりしている。車載ネットワークへの外部からのアクセスが可能になることで自動車のユーザにとっての利便性は向上し得るが、その一方で脅威も増大する。 A network conforming to the CAN protocol can be constructed as a closed communication path on a single vehicle. However, it is not uncommon for each automobile to be built and installed as a network that can be accessed from the outside. For example, an in-vehicle network is provided with a port for taking out information flowing through the network for the purpose of diagnosis for each in-vehicle system, or a car navigation system having a function of providing a wireless LAN is connected. Allowing external access to the in-vehicle network can improve convenience for automobile users, but also increases threats.
 例えば2013年には、車載ネットワークの外部からの駐車支援機能等の悪用による不正な車両制御が可能であることが実証された。また、2015年には特定の車種の遠隔からの不正制御が可能であることが実証され、この実証が発端となって当該車種のリコールに発展した。 For example, in 2013, it was proved that unauthorized vehicle control by misuse of the parking support function from the outside of the in-vehicle network is possible. In 2015, it was demonstrated that remote control of a specific vehicle type is possible, and this verification started as a recall for that vehicle type.
 このような外部からのアクセスによる車両の不正制御は、自動車業界にとっては看過できない問題であり、車載ネットワークのセキュリティ対策は急務な状況にある。 Such unauthorized control of vehicles by external access is a problem that cannot be overlooked by the automobile industry, and security measures for in-vehicle networks are urgently needed.
 車載ネットワークへの攻撃の一手法としては、ネットワークに接続されるECUに外部からアクセスしてこのECUを乗っ取り、このECUから攻撃のためのフレーム(以下では攻撃フレームともいう)を送信させて自動車を不正に制御するものがある。攻撃フレームは、攻撃されていない車載ネットワークを流れる正常なフレームとは何らかの点で異なる異常なフレームである。 One method of attacking the in-vehicle network is to access the ECU connected to the network from the outside, take over the ECU, and transmit a frame for attack (hereinafter also referred to as an attack frame) from the ECU to Some of them are illegally controlled. An attack frame is an abnormal frame that differs in some way from a normal frame that flows through an in-vehicle network that is not attacked.
 このような車載ネットワークでの異常検知のための技術として、CANのバス上を流れるフレーム(以下、CANメッセージ又は単にメッセージともいう)に対する異常データ検知処理を、学習データを用いた学習の結果として得る評価モデルを用いて実行する技術が開示されている(特許文献1、特許文献2参照)。 As a technique for detecting an abnormality in such an in-vehicle network, an abnormal data detection process for a frame (hereinafter also referred to as a CAN message or simply a message) flowing on a CAN bus is obtained as a result of learning using learning data. A technique to be executed using an evaluation model is disclosed (see Patent Document 1 and Patent Document 2).
特開2015-026252号公報Japanese Patent Laying-Open No. 2015-026252 特開2015-170121号公報JP2015-170121A
 車載ネットワークへの攻撃及び攻撃に対抗するためのセキュリティ技術は研究途上であって特許文献1、2の技術で十分とは限らず、更なる研究開発が望まれている。 The attacks on the in-vehicle network and the security technology to counter the attacks are still under research, and the techniques of Patent Documents 1 and 2 are not necessarily sufficient, and further research and development are desired.
 本開示は、自動車等の車両の車載ネットワークにおける攻撃による異常検知のために有用な情報処理装置等を提供する。 This disclosure provides an information processing apparatus and the like that are useful for detecting anomalies due to attacks in an in-vehicle network of a vehicle such as an automobile.
 上記課題を解決するために、本開示の一態様に係る情報処理装置は、プロセッサを備える情報処理装置であって、前記プロセッサは、Isolation Forestの訓練データとして用いられるN個(Nは2以上の整数)のM次元のベクトル(Mは2以上の整数)であるデータ要素の入力を受けるデータ要素取得ステップと、前記訓練データをM次元の第一領域に渡って分布させるよう正規化する正規化ステップと、前記第一領域より大きく前記第一領域を包含するM次元の第二領域を、大きさの等しいLM個(Lは4以上の整数)のM次元の超立方体である第三領域に分割する分割ステップと、前記第三領域のそれぞれが含む前記データ要素の個数S(Sは0以上の整数)を取得し、前記第三領域のうち、第一閾値T(Tは自然数)より少ない個数の前記データ要素を含む第三領域のそれぞれに、(T-S)個のM次元のベクトルであるノイズ要素を一様分布で付加する第一ノイズ付加ステップと、前記データ要素及び前記ノイズ要素を含むノイズ付加訓練データを生成する生成ステップと、前記ノイズ付加訓練データを用いてIsolation Forestの学習辞書データを生成して出力する学習辞書データ出力ステップとを実行する。 In order to solve the above problem, an information processing apparatus according to an aspect of the present disclosure is an information processing apparatus including a processor, and the processor uses N pieces of N (N is 2 or more) used as training data for Isolation Forest. (Integer) M-dimensional vector (M is an integer equal to or larger than 2), a data element acquisition step that receives input of data elements, and normalization that normalizes the training data to be distributed over the first area of M dimensions Step and M-dimensional second region larger than the first region and including the first region into a third region which is an LM hypercube of LM pieces (L is an integer of 4 or more) having the same size. A division step for dividing and the number S of data elements included in each of the third regions (S is an integer of 0 or more) are obtained, and among the third regions, a first threshold T (T is a natural number) A first noise adding step of adding (TS) M-dimensional vector noise elements in a uniform distribution to each of the third regions including a smaller number of the data elements; A generation step of generating noise-added training data including a noise element and a learning dictionary data output step of generating and outputting Ilation Forest learning dictionary data using the noise-added training data are executed.
 また、本開示の一態様に係る情報処理方法は、プロセッサを備える情報処理装置を用いて実行される情報処理方法であって、このプロセッサに、Isolation Forestの訓練データとして用いられるN個(Nは2以上の整数)のM次元のベクトル(Mは2以上の整数)であるデータ要素の入力を受けさせるデータ要素取得ステップと、訓練データをM次元の第一領域に渡って分布させるよう正規化させる正規化ステップと、第一領域より大きく第一領域を包含するM次元の第二領域を、大きさの等しいLM個(Lは4以上の整数)のM次元の超立方体である第三領域に分割させる分割ステップと、第三領域のそれぞれが含むデータ要素の個数S(Sは0以上の整数)を取得させ、第三領域のうち、第一閾値T(Tは自然数)より少ない個数のデータ要素を含む第三領域のそれぞれに、(T-S)個のM次元のベクトルであるノイズ要素を一様分布で付加させる第一ノイズ付加ステップと、データ要素及びノイズ要素を含むノイズ付加訓練データを生成させる生成ステップと、ノイズ付加訓練データを用いてIsolation Forestの学習辞書データを生成して出力させる学習辞書データ出力ステップとを含む。 Further, an information processing method according to an aspect of the present disclosure is an information processing method executed using an information processing apparatus including a processor, and the processor uses N pieces (N is used as training data for Isolation Forest). A data element acquisition step that receives an input of a data element that is an M-dimensional vector (M is an integer of 2 or more) and normalization so that the training data is distributed over the first area of the M dimension. And a normalizing step for making an M-dimensional second region larger than the first region and including the first region, the third region being an M-dimensional hypercube of LM pieces (L is an integer of 4 or more) having the same size And the number of data elements included in each of the third regions (S is an integer greater than or equal to 0) is obtained, and among the third regions, the first threshold T (T is a natural number) is obtained. A first noise adding step of adding noise elements as (TS) M-dimensional vectors in a uniform distribution to each of the third regions including a small number of data elements, and including the data elements and the noise elements A generation step for generating noise-added training data and a learning dictionary data output step for generating and outputting Isolation Forest learning dictionary data using the noise-added training data are included.
 なお、これらの包括的または具体的な態様は、システム、装置、方法、集積回路、コンピュータプログラム、又はコンピュータ読み取り可能なCD-ROMなどの非一時的な記録媒体で実現されてもよく、システム、装置、方法、集積回路、コンピュータプログラムおよび記録媒体の任意な組み合わせで実現されてもよい。 Note that these comprehensive or specific aspects may be realized by a system, apparatus, method, integrated circuit, computer program, or non-transitory recording medium such as a computer-readable CD-ROM. The present invention may be realized by any combination of an apparatus, a method, an integrated circuit, a computer program, and a recording medium.
 本開示によれば、自動車等の車両の車載ネットワークにおける攻撃による異常検知に用いられて誤検知率が抑えられた学習辞書を迅速に提供な情報処理装置等が提供される。 According to the present disclosure, there is provided an information processing apparatus and the like that can quickly provide a learning dictionary that is used for abnormality detection due to an attack in an in-vehicle network of a vehicle such as an automobile and that has a reduced false detection rate.
図1Aは、実施の形態1における情報処理装置を含む異常検知システムの構成例を示すブロック図である。1A is a block diagram illustrating a configuration example of an abnormality detection system including an information processing device according to Embodiment 1. FIG. 図1Bは、実施の形態1における情報処理装置を含む異常検知システムの構成例を示すブロック図である。FIG. 1B is a block diagram illustrating a configuration example of an abnormality detection system including the information processing apparatus according to Embodiment 1. 図1Cは、実施の形態1における情報処理装置を含む異常検知システムの構成例を示すブロック図である。FIG. 1C is a block diagram illustrating a configuration example of an abnormality detection system including the information processing apparatus according to Embodiment 1. 図2は、上記の異常検知システムを構成する異常判定部及び学習部の構成例を示すブロック図である。FIG. 2 is a block diagram illustrating a configuration example of an abnormality determination unit and a learning unit that configure the above-described abnormality detection system. 図3は、上記の学習部が訓練データを用いて生成した学習辞書を説明するための模式図である。FIG. 3 is a schematic diagram for explaining a learning dictionary generated by the learning unit using training data. 図4は、上記の異常判定部による異常判定を説明するための模式図である。FIG. 4 is a schematic diagram for explaining the abnormality determination by the abnormality determination unit. 図5は、学習辞書を生成する上記の学習部でのデータの流れを示す図である。FIG. 5 is a diagram showing a data flow in the learning unit that generates the learning dictionary. 図6は、異常判定を行う上記の異常判定部でのデータの流れを示す図である。FIG. 6 is a diagram illustrating a data flow in the abnormality determination unit that performs abnormality determination. 図7は、訓練データの分布にフィットしていない不適切な判定境界の例を示す図である。FIG. 7 is a diagram illustrating an example of an inappropriate determination boundary that does not fit the distribution of training data. 図8は、上記の異常検知システムにおいて実行される、適切な学習辞書を得るための訓練データの処理方法の一例を示すフロー図である。FIG. 8 is a flowchart illustrating an example of a training data processing method for obtaining an appropriate learning dictionary, which is executed in the abnormality detection system. 図9Aは、M次元空間に分布する正規化前の訓練データの例を示す図である。FIG. 9A is a diagram illustrating an example of training data before normalization distributed in an M-dimensional space. 図9Bは、M次元空間に分布する正規化後の訓練データの例を示す図である。FIG. 9B is a diagram illustrating an example of training data after normalization distributed in an M-dimensional space. 図9Cは、M次元空間に分布するノイズ要素の付加後の訓練データの例を示す図である。FIG. 9C is a diagram illustrating an example of training data after addition of noise elements distributed in the M-dimensional space. 図10は、上記の異常検知システムにおいて実行される、適切な学習辞書を得るための訓練データの処理方法の他の一例を示すフロー図である。FIG. 10 is a flowchart showing another example of the training data processing method for obtaining an appropriate learning dictionary, which is executed in the abnormality detection system. 図11Aは、M次元空間におけるM次元領域の分割の例を説明するための図である。FIG. 11A is a diagram for explaining an example of division of an M-dimensional region in the M-dimensional space. 図11Bは、M次元空間に分布するノイズ要素の付加後の訓練データの例を説明するための図である。FIG. 11B is a diagram for describing an example of training data after adding noise elements distributed in an M-dimensional space. 図12Aは、ノイズを付加しない訓練データを用いて生成した学習辞書の判定境界と、同じ訓練データにノイズを付加したものを用いて生成した学習辞書の判定境界とを示す図である。FIG. 12A is a diagram illustrating a determination boundary of a learning dictionary generated using training data without adding noise and a determination boundary of a learning dictionary generated using the same training data added with noise. 図12Bは、ノイズを付加しない訓練データを用いて生成した学習辞書の判定境界と、同じ訓練データにノイズを付加したものを用いて生成した学習辞書の判定境界とを示す図である。FIG. 
12B is a diagram illustrating a determination boundary of a learning dictionary generated using training data without adding noise and a determination boundary of a learning dictionary generated using the same training data added with noise. 図12Cは、図12A及び図12Bに判定境界を示す各学習辞書を用いてなされた異常検知試験での誤検知率を示す棒グラフである。FIG. 12C is a bar graph showing a false detection rate in an abnormality detection test performed using each learning dictionary whose determination boundaries are shown in FIGS. 12A and 12B. 図13は、実施の形態2における異常検知システムにおいて実行される、訓練データの処理方法の選択及び各処理方法でのパラメータの探索の実行の有無に関する決定のための処理方法の一例を示すフロー図である。FIG. 13 is a flowchart illustrating an example of a processing method for determining whether to select a training data processing method and whether to perform parameter search in each processing method, which is executed in the abnormality detection system according to the second embodiment. It is. 図14は、実施の形態2における異常検知システムにおいて実行される、より適切な学習辞書を得るための処理方法の一例を示すフロー図である。FIG. 14 is a flowchart illustrating an example of a processing method for obtaining a more appropriate learning dictionary, which is executed in the abnormality detection system according to the second embodiment. 図15は、実施の形態2における異常検知システムにおいて実行される、より適切な学習辞書を得るための処理方法の他の例を示すフロー図である。FIG. 15 is a flowchart illustrating another example of the processing method for obtaining a more appropriate learning dictionary, which is executed in the abnormality detection system according to the second embodiment.
 (本開示の基礎になった知見等)
 車載ネットワークのセキュリティ対策として提案されている手法は、大きくに二つに分けられる。
(Knowledge that became the basis of this disclosure)
Methods proposed as security measures for in-vehicle networks can be broadly divided into two.
 ひとつはメッセージの暗号化又は送信元の認証を利用するものである。ただし、この技術には、理論上は有効であるがECUの実装の変更が必要なものもあり、また、自動車1台当たりに搭載されるECUは数百を超える場合があることから、早期の普及は難しい。 One is to use message encryption or sender authentication. However, some of these technologies are theoretically effective but need to be changed in the mounting of the ECU, and the number of ECUs mounted per vehicle may exceed several hundreds. Dissemination is difficult.
 もうひとつは、車載ネットワークを流れるCANメッセージを監視するものである。この手法は、監視用のECU(ノード)を各自動車に追加することで実現可能であり、導入は比較的容易である。提案されているこのような手法をさらに分類すると、ルールベースの手法、データの送信周期を利用する手法、LOF(Local Outlier Factor)を用いてメッセージの内容の外れ値を検知する手法の三種類に大きく分けることができる。 The other is to monitor CAN messages flowing through the in-vehicle network. This method can be realized by adding a monitoring ECU (node) to each vehicle, and is relatively easy to introduce. The proposed method can be further classified into three types: a rule-based method, a method that uses the data transmission cycle, and a method that detects outliers in message contents using LOF (Local Outer Factor). It can be roughly divided.
 これらの三種類の手法のうち、ルールベースの手法及びデータの送信周期を利用する手法では既知の攻撃パターンに対応することができるが、未知の攻撃パターンを検知するには、LOFを利用する手法のようにメッセージの内容に基づく検知が必要である。 Among these three methods, the rule-based method and the method using the data transmission cycle can deal with a known attack pattern, but a method using LOF to detect an unknown attack pattern. Thus, detection based on the content of the message is necessary.
 ただし、LOFを利用する手法では、CANメッセージの評価のために大量の正常データを保持しておく必要があり、要求される計算量が大きい。しかしながら、車載ネットワークに接続されるECUは、データの処理能力及び記憶領域の容量がふんだんであるとは限らず、そのような実行環境でも時速数十km以上で道路を走る自動車で要求される速さで検知が可能な手法でなければ実用的ではない。 However, in the method using LOF, it is necessary to store a large amount of normal data for the evaluation of the CAN message, and a large amount of calculation is required. However, the ECU connected to the in-vehicle network does not always have sufficient data processing capacity and storage capacity, and even in such an execution environment, the speed required for a car traveling on a road at several tens of kilometers per hour is required. If it is not possible to detect, it is not practical.
 そこで本発明者らは、LOFよりも要求される保持データが少なく、計算量の小さいIsolation Forest又はiForest(非特許文献1参照)と呼ばれる異常検知アルゴリズムを車載ネットワークの異常検知の手法に利用することに想到した。また、さらに本発明者らは、Isolation Forestを利用する上で、限られた計算機資源で実行される場合であっても、必要な速さで、かつ極力高い精度での異常検知の実行を可能にする技術を提案する。 Therefore, the present inventors use an abnormality detection algorithm called Isolation Forest or iForest (see Non-Patent Document 1), which requires less retained data than LOF and requires a small amount of calculation, as an abnormality detection method for an in-vehicle network. I came up with it. Furthermore, the present inventors can execute anomaly detection at the required speed and with the highest possible accuracy even when executed with limited computer resources when using Isolation Forest. Propose technology to make
 本開示の一態様に係る情報処理装置は、プロセッサを備える情報処理装置であって、このプロセッサは、Isolation Forestの訓練データとして用いられるN個(Nは2以上の整数)のM次元のベクトル(Mは2以上の整数)であるデータ要素の入力を受けるデータ要素取得ステップと、訓練データをM次元の第一領域に渡って分布させるよう正規化する正規化ステップと、第一領域より大きく第一領域を包含するM次元の第二領域を、大きさの等しいLM個(Lは4以上の整数)のM次元の超立方体である第三領域に分割する分割ステップと、第三領域のそれぞれが含むデータ要素の個数S(Sは0以上の整数)を取得し、第三領域のうち、第一閾値T(Tは自然数)より少ない個数のデータ要素を含む第三領域のそれぞれに、(T-S)個のM次元のベクトルであるノイズ要素を一様分布で付加する第一ノイズ付加ステップと、データ要素及びノイズ要素を含むノイズ付加訓練データを生成する生成ステップと、ノイズ付加訓練データを用いてIsolation Forestの学習辞書データを生成して出力する学習辞書データ出力ステップとを実行する。 An information processing apparatus according to an aspect of the present disclosure is an information processing apparatus including a processor, and the processor uses N (N is an integer of 2 or more) M-dimensional vectors (N is an integer of 2 or more) used as training data for Isolation Forest. A data element acquisition step that receives an input of a data element that is an integer greater than or equal to 2), a normalization step that normalizes the training data to be distributed over the M-dimensional first region, and a step larger than the first region. A division step of dividing an M-dimensional second region including one region into third regions which are LM pieces (L is an integer of 4 or more) having the same size, and each of the third regions; The number S of data elements included in S is acquired (S is an integer equal to or greater than 0), and each of the third areas including a number of data elements smaller than the first threshold T (T is a natural number) among the third areas. A (TS) M-dimensional vector of noise elements added in a uniform distribution, a first noise adding step, a data element and a noise adding training data including the noise elements, a generating step, a noise A learning dictionary data output step of generating and outputting the learning dictionary data of the Isolation Forest using the additional training data is executed.
 これにより、より低い誤検知率でのIsolation Forestの実行を可能にする学習辞書を得ることができる。 This makes it possible to obtain a learning dictionary that enables execution of Isolation Forest with a lower false detection rate.
 また例えば、プロセッサは、Nが所定の第二閾値以上であるか否かを判定する第一判定ステップを実行し、第一判定ステップにおいてNが第二閾値以上ではないと判定した場合、分割ステップ及び第一ノイズ付加ステップを実行してから生成ステップ及び学習辞書データ出力ステップを実行してもよい。 Also, for example, the processor executes a first determination step for determining whether N is equal to or greater than a predetermined second threshold value, and if it is determined in the first determination step that N is not equal to or greater than the second threshold value, the division step The generation step and the learning dictionary data output step may be executed after executing the first noise addition step.
 これにより、例えば訓練データのデータ要素の個数が、プロセッサの負荷状況に対して過大である場合は、この訓練データを用いた学習辞書データの生成を延期することができる。 Thereby, for example, when the number of data elements of training data is excessive with respect to the load state of the processor, generation of learning dictionary data using the training data can be postponed.
 また例えば、プロセッサは、第一判定ステップにおいてNが第二閾値以上であると判定した場合、K個(KはNより小さい自然数)のM次元のベクトルであるノイズ要素を第二領域内に一様な密度で付加する第二ノイズ付加ステップを実行してから生成ステップ及び学習辞書データ出力ステップを実行してもよい。 Further, for example, when the processor determines that N is equal to or greater than the second threshold value in the first determination step, the noise element that is an M-dimensional vector of K pieces (K is a natural number smaller than N) is set in the second region. The generation step and the learning dictionary data output step may be executed after executing the second noise addition step for adding at a different density.
 これにより、訓練データの大きさで変わる処理負荷に応じてノイズの付加方法を切り替えることができ、学習辞書を実行環境に適した速さで生成することができる。 This makes it possible to switch the noise addition method according to the processing load that changes depending on the size of the training data, and to generate the learning dictionary at a speed suitable for the execution environment.
 また例えば、プロセッサはさらに、第一判定ステップにおいてNが第二閾値以上でないと判定した場合、Isolation Forestのテスト用データの入力を受けるテスト用データ取得ステップと、Nが所定の第三閾値以上であるか否かを判定する第二判定ステップとを実行し、第二判定ステップにおいてNが第三閾値以上でないと判定した場合、分割ステップ、第一ノイズ付加ステップ、生成ステップ、及び学習辞書データ出力ステップのセットを、分割ステップで値の異なるLを用いて複数回実行して複数の学習辞書データを出力し、さらに、複数の学習辞書データのそれぞれを用いてテスト用データに対する異常検知を実行し、異常検知の結果に基づいて複数の学習辞書データのそれぞれを評価する評価ステップと、評価ステップの結果に基づいて複数の学習辞書データから最良の学習辞書データを選択する学習辞書データ選択ステップとを実行し、第二判定ステップにおいてNが第三閾値以上であると判定した場合、分割ステップで所定の値であるLを用いてセットを1回実行してもよい。 Further, for example, when the processor further determines that N is not equal to or greater than the second threshold value in the first determination step, the processor receives a test data in the Isolation Forest, and N is equal to or greater than a predetermined third threshold value. A second determination step for determining whether or not there is present, and when it is determined in the second determination step that N is not equal to or greater than a third threshold value, a division step, a first noise addition step, a generation step, and learning dictionary data output A set of steps is executed a plurality of times using L having different values in the division step to output a plurality of learning dictionary data, and further, abnormality detection for the test data is executed using each of the plurality of learning dictionary data. An evaluation step for evaluating each of the plurality of learning dictionary data based on the result of the abnormality detection, and an evaluation step A learning dictionary data selection step for selecting the best learning dictionary data from a plurality of learning dictionary data based on the result of the search, and if it is determined in the second determination step that N is greater than or equal to the third threshold, The set may be executed once using L which is a predetermined value in the step.
 これにより、訓練データの大きさで変わる処理負荷に応じて、複数の学習辞書データを生成して最適なものを出力するか、ひとつの学習辞書データを生成して出力するかを切り替えることができる。したがって、学習辞書を実行環境に適した速さで生成することができる。 This makes it possible to switch between generating a plurality of learning dictionary data and outputting the optimal one, or generating and outputting a single learning dictionary data according to the processing load that changes depending on the size of the training data. . Therefore, the learning dictionary can be generated at a speed suitable for the execution environment.
 また例えば、プロセッサは、第二判定ステップにおいてNが第三閾値以上でないと判定した場合、Nの値と負の相関を有するようLの異なる値の個数を決定してもよい。 For example, if the processor determines that N is not equal to or greater than the third threshold in the second determination step, the processor may determine the number of different values of L so as to have a negative correlation with the value of N.
 これにより、訓練データが大きければ、第三領域への分割数を減らすことで処理負荷が減る。したがって、学習辞書を実行環境に適した速さで生成することができる。 ∙ As a result, if the training data is large, the processing load is reduced by reducing the number of divisions into the third area. Therefore, the learning dictionary can be generated at a speed suitable for the execution environment.
 また例えば、プロセッサは、第一ノイズ付加ステップにおいて、第一領域内にある第三領域のそれぞれに含まれるデータ要素の個数の中央値より小さい個数のいずれかを第一閾値Tの値として決定してもよい。 Further, for example, in the first noise addition step, the processor determines, as the value of the first threshold value T, any number smaller than the median number of data elements included in each of the third areas in the first area. May be.
 これにより、訓練データが大きければ、ノイズ要素が付加される第三領域の個数を減らすことで処理負荷の増大を抑えることができる。したがって、学習辞書を実行環境に適した速さで生成することができる。 Therefore, if the training data is large, an increase in processing load can be suppressed by reducing the number of third regions to which noise elements are added. Therefore, the learning dictionary can be generated at a speed suitable for the execution environment.
 また例えば、プロセッサは、第一判定ステップにおいてNが第二閾値以上であると判定した場合、Isolation Forestのテスト用データの入力を受けるテスト用データ取得ステップと、Nが所定の第四閾値以上であるか否かを判定する第三判定ステップとを実行し、第三判定ステップにおいてNが第四閾値以上でないと判定した場合、第二ノイズ付加ステップ、生成ステップ、及び学習辞書データ出力ステップのセットを、第二ノイズ付加ステップで値の異なるKを用いて複数回実行して複数の学習辞書データを出力し、さらに、複数の学習辞書データのそれぞれを用いてテスト用データに対する異常検知を実行して複数の学習辞書データのそれぞれを評価する評価ステップと、評価ステップの結果に基づいて複数の学習辞書データから最良の学習辞書データを選択する学習辞書データ選択ステップとを実行し、第三判定ステップにおいてNが第四閾値以上であると判定した場合、第二ノイズ付加ステップで所定の値であるKを用いてセットを1回実行してもよい。 In addition, for example, when the processor determines that N is equal to or greater than the second threshold value in the first determination step, the processor receives the test data for Isolation Forest test data, and N is equal to or greater than a predetermined fourth threshold value. A third determination step for determining whether or not there is present, and if it is determined in the third determination step that N is not greater than or equal to the fourth threshold, a set of a second noise addition step, a generation step, and a learning dictionary data output step Is executed a plurality of times using K having different values in the second noise addition step to output a plurality of learning dictionary data, and further, abnormality detection is performed on the test data using each of the plurality of learning dictionary data. An evaluation step for evaluating each of a plurality of learning dictionary data, and a plurality of learning dictionaries based on the result of the evaluation step. A learning dictionary data selection step for selecting the best learning dictionary data from the data, and if it is determined in the third determination step that N is equal to or greater than a fourth threshold, K is a predetermined value in the second noise addition step. The set may be executed once using.
 これにより、訓練データの大きさで変わる処理負荷に応じて、複数の学習辞書データを生成して最適なものを出力するか、ひとつの学習辞書データを生成して出力するかを切り替えることができる。したがって、学習辞書を実行環境に適した速さで生成することができる。 This makes it possible to switch between generating a plurality of learning dictionary data and outputting the optimal one, or generating and outputting a single learning dictionary data according to the processing load that changes depending on the size of the training data. . Therefore, the learning dictionary can be generated at a speed suitable for the execution environment.
 また例えば、プロセッサは、第三判定ステップにおいてNが第四閾値以上でないと判定した場合、Nの値と負の相関を有するようKの異なる値の個数を決定してもよい。 For example, if the processor determines that N is not equal to or greater than the fourth threshold in the third determination step, the processor may determine the number of different values of K so as to have a negative correlation with the value of N.
 これにより、生成する学習辞書の個数を減らすことで処理負荷の増大を抑えることができる。したがって、学習辞書を実行環境に適した速さで生成することができる。 This makes it possible to suppress an increase in processing load by reducing the number of learning dictionaries to be generated. Therefore, the learning dictionary can be generated at a speed suitable for the execution environment.
 また例えば、第一領域をM次元の空間における[0,1]Mの超立方体で画定される領域とすると、第二領域は、このM次元の空間において[-0.5,1.5]Mの超立方体で画定される領域であってもよい。 For example, if the first region is a region defined by a hypercube of [0, 1] M in an M-dimensional space, the second region is [−0.5, 1.5] in the M-dimensional space. It may be a region defined by a hypercube of M.
 これにより、学習辞書の生成に利用可能な訓練データに外れ値が少ない場合であっても、より低い誤検知率での異常検知を可能にする学習辞書を得ることができる。 Thereby, even if there are few outliers in the training data that can be used to generate the learning dictionary, it is possible to obtain a learning dictionary that enables anomaly detection with a lower false detection rate.
 また、本開示の一態様に係る異常検知システムは、上記に記載の情報処理装置のいずれかと、情報処理装置から出力された学習辞書データを記憶するメモリ及びプロセッサを備え、ネットワークに接続される異常判定装置であって、プロセッサは、ネットワークを流れるデータを取得し、取得されたデータの異常判定をメモリに記憶されている学習辞書データに基づいて実行する異常判定装置とを備える。 An abnormality detection system according to an aspect of the present disclosure includes any one of the information processing apparatuses described above and a memory and a processor that store learning dictionary data output from the information processing apparatus, and is connected to a network. The determination device, the processor includes an abnormality determination device that acquires data flowing through a network and executes an abnormality determination of the acquired data based on learning dictionary data stored in a memory.
 これにより、精度を考慮した上で迅速に更新される学習辞書を利用して異常検知が実行される。 Thus, abnormality detection is performed using a learning dictionary that is updated quickly in consideration of accuracy.
 また、本開示の一態様に係る情報処理方法は、プロセッサを備える情報処理装置を用いて実行される情報処理方法であって、このプロセッサに、Isolation Forestの訓練データとして用いられるN個(Nは2以上の整数)のM次元のベクトル(Mは2以上の整数)であるデータ要素の入力を受けさせるデータ要素取得ステップと、訓練データをM次元の第一領域に渡って分布させるよう正規化させる正規化ステップと、第一領域より大きく第一領域を包含するM次元の第二領域を、大きさの等しいLM個(Lは4以上の整数)のM次元の第三領域に分割させる分割ステップと、第三領域のそれぞれが含むデータ要素の個数S(Sは0以上の整数)を取得させ、第三領域のうち、第一閾値T(Tは自然数)より少ない個数のデータ要素を含む第三領域のそれぞれに、(T-S)個のM次元のベクトルであるノイズ要素を一様分布で付加させる第一ノイズ付加ステップと、データ要素及びノイズ要素を含むノイズ付加訓練データを生成させる生成ステップと、ノイズ付加訓練データを用いてIsolation Forestの学習辞書データを生成して出力させる学習辞書データ出力ステップとを含む。 Further, an information processing method according to an aspect of the present disclosure is an information processing method executed using an information processing apparatus including a processor, and the processor uses N pieces (N is used as training data for Isolation Forest). A data element acquisition step that receives an input of a data element that is an M-dimensional vector (M is an integer of 2 or more) and normalization so that the training data is distributed over the first area of the M dimension. Normalization step, and dividing the M-dimensional second area larger than the first area and including the first area into LM third areas (L is an integer of 4 or more) having the same size. The step S and the number S of data elements included in each of the third regions (S is an integer greater than or equal to 0) are acquired, and the number of data elements smaller than the first threshold T (T is a natural number) among the third regions A first noise addition step for adding (TS) M-dimensional vector noise elements in a uniform distribution to each of the third regions including the data elements, and noise addition including the data elements and the noise elements A generation step of generating training data, and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data are included.
 また、本開示の一態様に係るプログラムは、コンピュータが備えるプロセッサに、上記の情報処理方法を実行させるプログラムである。 Also, a program according to an aspect of the present disclosure is a program that causes a processor included in a computer to execute the above information processing method.
 このような方法又はプログラムによっても、より低い誤検知率でのIsolation Forestの実行を可能にする学習辞書を得ることができる。 Also by such a method or program, it is possible to obtain a learning dictionary that enables execution of Isolation Forest with a lower false detection rate.
 なお、これらの全般的又は具体的な態様は、システム、方法、集積回路、コンピュータプログラム、又はコンピュータで読み取り可能なCD-ROM等の記録媒体のいずれで実現されてもよく、システム、方法、集積回路、コンピュータプログラム又は記録媒体の任意な組み合わせで実現されてもよい。 These general or specific aspects may be realized by any of a system, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM. You may implement | achieve with arbitrary combinations of a circuit, a computer program, or a recording medium.
 以下、実施の形態に係る情報処理装置、情報処理方法等について、図面を参照しながら説明する。ここで示す実施の形態は、いずれも本開示の一具体例を示すものである。したがって、以下の実施の形態で示される数値、構成要素、構成要素の配置及び接続形態、並びに、ステップ(工程)及びステップの順序等は、一例であって本開示を限定するものではない。 Hereinafter, an information processing apparatus, an information processing method, and the like according to embodiments will be described with reference to the drawings. Each of the embodiments shown here shows a specific example of the present disclosure. Therefore, numerical values, components, arrangement and connection forms of components, and steps (processes) and order of steps shown in the following embodiments are merely examples, and do not limit the present disclosure.
 また、以下の実施の形態における構成要素のうち、独立請求項に記載されていない構成要素については、任意に付加可能な構成要素である。各図は模式図であり、必ずしも厳密に図示されたものではない。 In addition, among the constituent elements in the following embodiments, constituent elements that are not described in the independent claims can be arbitrarily added. Each figure is a schematic diagram and is not necessarily shown strictly.
 また、以下に含まれるCAN及びIsolation Forestに関する説明は、本開示の理解の一助を主な趣旨とするものであり、この説明のうち請求項に含まれない事項については、本開示を限定する趣旨で記載されるものではない。 In addition, the explanation regarding CAN and Isolation Forest included in the following is mainly intended to assist understanding of the present disclosure, and matters not included in the claims of this description are intended to limit the present disclosure. It is not described in.
 (実施の形態1)
 [構成]
 [概要]
 図1Aから図1Cは、実施の形態1における情報処理装置を含む異常検知システムの一構成例をそれぞれ示すブロック図である。
(Embodiment 1)
[Constitution]
[Overview]
1A to 1C are block diagrams respectively showing a configuration example of an abnormality detection system including an information processing apparatus according to Embodiment 1.
 図1Aから図1Cには、構成の異なる異常検知システム100A、100B、及び100Cがそれぞれ示される。 1A to 1C show abnormality detection systems 100A, 100B, and 100C having different configurations, respectively.
 異常検知システム100A~100Cは、監視対象であるネットワークを流れるデータの異常を、Isolation Forestと呼ばれるアルゴリズムを用いて検知するシステムであり、いずれも異常判定部110及び学習部120を備える。 The anomaly detection systems 100A to 100C are systems that detect an anomaly of data flowing through a network to be monitored using an algorithm called Isolation Forest, and each includes an anomaly determination unit 110 and a learning unit 120.
 異常判定部110は、車両20が備える車載ネットワーク210を流れるデータが正常か異常かを判定する。車両20は例えば自動車である。 The abnormality determination unit 110 determines whether data flowing through the in-vehicle network 210 included in the vehicle 20 is normal or abnormal. The vehicle 20 is an automobile, for example.
 車載ネットワーク210は、例えばCANの規格に対応するネットワークであり、図1Aから図1Cの各構成例では、バスと、このバスに接続される複数のECU及び診断用ポートとを含む。複数のECUには、各種のセンサから測定データを収集して分析するECU、エンジンを制御するECU、ブレーキを制御するECU、ネットワークを監視するECU等の、機能の異なるECUが含まれる。車載ネットワーク210を流れるデータとは、バスを流れるメッセージのデータである。 The in-vehicle network 210 is a network corresponding to, for example, a CAN standard, and includes a bus, a plurality of ECUs and diagnostic ports connected to the bus in each of the configuration examples of FIGS. 1A to 1C. The plurality of ECUs include ECUs having different functions such as an ECU that collects and analyzes measurement data from various sensors, an ECU that controls an engine, an ECU that controls a brake, and an ECU that monitors a network. The data flowing through the in-vehicle network 210 is message data flowing through the bus.
 学習部120は、異常判定部110が上記の判定を行うための事前の学習を行う。より具体的には、学習部120は、訓練データを用いて学習し異常判定部110が上記の判定に用いる学習辞書を生成する。生成された学習辞書のデータ(以下、学習辞書データともいう)は、例えば記憶装置(図示なし)に格納される。 The learning unit 120 performs prior learning for the abnormality determination unit 110 to perform the above determination. More specifically, the learning unit 120 learns using the training data, and generates a learning dictionary that the abnormality determination unit 110 uses for the above determination. The generated learning dictionary data (hereinafter also referred to as learning dictionary data) is stored, for example, in a storage device (not shown).
 異常判定部110は、記憶装置から学習辞書を読み込み、正常か異常かの判定の対象である未知のデータ、つまり車載ネットワーク210から取得したメッセージのデータがこの学習辞書に照らして逸脱しているか否かに基づいて異常であるか否かを判定する。より詳細には、学習部120が生成する学習辞書は複数の二分木からなり、異常判定部110は、これらの複数の二分木から算出したスコアの平均値を用いてデータが異常であるか否かを判定する。なお、Isolation Forestで用いられるこの二分木は、Isolation Tree又はiTreeと呼ばれる。 The abnormality determination unit 110 reads the learning dictionary from the storage device, and whether or not unknown data that is a target of normality or abnormality, that is, message data acquired from the in-vehicle network 210 deviates from the learning dictionary. Whether or not it is abnormal is determined based on whether or not it is abnormal. More specifically, the learning dictionary generated by the learning unit 120 includes a plurality of binary trees, and the abnormality determination unit 110 uses the average value of the scores calculated from the plurality of binary trees to determine whether the data is abnormal. Determine whether. In addition, this binary tree used in Isolation Forest is called Isolation Tree or iTree.
 異常判定部110及び学習部120は、所定のプログラムを読み込んで実行するプロセッサによって提供される機能的な構成要素である。そして図1Aから図1Cの各構成例では、これらのプロセッサの機能的な構成要素を提供するプロセッサの場所が異なる。 The abnormality determination unit 110 and the learning unit 120 are functional components provided by a processor that reads and executes a predetermined program. In each configuration example shown in FIGS. 1A to 1C, the locations of the processors that provide the functional components of these processors are different.
 図1Aに示される構成例では、学習部120が、車両20の外部にある、いわゆるサーバコンピュータである外部サーバ10が備えるプロセッサ及びメモリによって提供される。外部サーバ10は、本実施の形態における情報処理装置の例のひとつである。 In the configuration example shown in FIG. 1A, the learning unit 120 is provided by a processor and a memory included in the external server 10 that is a so-called server computer outside the vehicle 20. The external server 10 is one example of the information processing apparatus in the present embodiment.
 この場合、学習部120は例えば車載ネットワーク210を流れるメッセージを訓練データとして通信網を経由して車両20から取得する。また学習部120は、この訓練データを用いて生成したIsolation Forestの学習辞書データを出力し、通信網を経由して車両20の異常判定部110に提供する。 In this case, the learning unit 120 acquires, for example, a message flowing through the in-vehicle network 210 as training data from the vehicle 20 via the communication network. The learning unit 120 also outputs Isolation Forest learning dictionary data generated using the training data, and provides it to the abnormality determination unit 110 of the vehicle 20 via the communication network.
 また、車両20では、学習辞書データは例えば車載ネットワーク210に接続されるネットワーク監視用の監視ECUが備えるマイクロコントローラのフラッシュメモリ等の記憶装置に格納され、このマイクロコントローラのプロセッサによって異常判定部110が提供される。異常判定部110は、バスから取得したメッセージに対して、この記憶装置から学習辞書データを取得した学習辞書データを用いてメッセージの異常判定を実行する。 In the vehicle 20, the learning dictionary data is stored in a storage device such as a flash memory of a microcontroller included in a monitoring ECU for network monitoring connected to the in-vehicle network 210, and the abnormality determination unit 110 is operated by the processor of the microcontroller. Provided. The abnormality determination unit 110 performs message abnormality determination on the message acquired from the bus using the learning dictionary data acquired from the learning dictionary data from the storage device.
 なお、このような構成では、車両20の出荷後に更新された学習辞書データを異常判定部110に提供することができる。 In such a configuration, learning dictionary data updated after shipment of the vehicle 20 can be provided to the abnormality determination unit 110.
 図1Bに示される構成例では、異常判定部110及び学習部120の両方が、車両20の外部にある外部サーバ10が備えるプロセッサ及びメモリによって提供される。このような外部サーバ10も、本実施の形態における情報処理装置の例のひとつである。 1B, both the abnormality determination unit 110 and the learning unit 120 are provided by a processor and a memory included in the external server 10 outside the vehicle 20. Such an external server 10 is also an example of the information processing apparatus in the present embodiment.
 この場合も、学習部120は例えば車載ネットワーク210を流れるメッセージを訓練データとして通信網を経由して車両20から取得する。また学習部120は、この訓練データを用いて生成したIsolation Forestの学習辞書データを出力するが、出力先は外部サーバ10の外ではなく、例えば外部サーバ10が備えるハードディスクドライブ等の記憶装置(図示なし)に格納される。 Also in this case, the learning unit 120 acquires, for example, a message flowing through the in-vehicle network 210 as training data from the vehicle 20 via the communication network. The learning unit 120 outputs the learning dictionary data of the Isolation Forest generated using the training data, but the output destination is not outside the external server 10, but a storage device (for example, a hard disk drive provided in the external server 10 (illustrated) None).
 この構成では、異常判定は車両20上ではなく、外部サーバ10で行われる。つまり、車載ネットワーク210を流れるメッセージは、通信網を介して外部サーバ10に送信される。外部サーバ10が受信したこのメッセージは、異常判定部110に入力される。異常判定部110は、記憶装置から学習辞書データを取得し、この学習辞書データを用いてメッセージの異常判定を実行し、その結果を通信網を介して車両20に送信する。 In this configuration, the abnormality determination is performed on the external server 10 instead of on the vehicle 20. That is, the message flowing through the in-vehicle network 210 is transmitted to the external server 10 via the communication network. This message received by the external server 10 is input to the abnormality determination unit 110. Abnormality determination unit 110 acquires learning dictionary data from the storage device, performs abnormality determination of the message using the learning dictionary data, and transmits the result to vehicle 20 via the communication network.
 なお、このような構成では、外部サーバ10において異常判定部110が利用する学習辞書データは随時更新される。 In such a configuration, the learning dictionary data used by the abnormality determination unit 110 in the external server 10 is updated as needed.
 In the configuration example shown in FIG. 1C, both the abnormality determination unit 110 and the learning unit 120 are provided by a microcontroller included in a monitoring ECU, that is, an ECU that is connected to the in-vehicle network 210 of the vehicle 20 and monitors the in-vehicle network 210. The monitoring ECU 10 is one example of the information processing apparatus in the present embodiment.
 In this case, the learning unit 120 directly acquires and uses, for example, messages flowing through the in-vehicle network 210 as training data. The learning unit 120 outputs the Isolation Forest learning dictionary data generated using this training data, but the output destination is not outside the vehicle 20; the data is stored in a storage device on the vehicle 20, for example a flash memory in the monitoring ECU.
 In this configuration, both the generation of the learning dictionary and the abnormality determination are performed on the vehicle 20. For example, in the monitoring ECU, the learning unit 120 acquires the data of messages flowing through the in-vehicle network 210 to which the monitoring ECU is connected and uses it as training data to generate a learning dictionary. The generated learning dictionary data is stored in the storage device of the monitoring ECU. In the monitoring ECU, the abnormality determination unit 110 further acquires the learning dictionary data from the storage device and executes abnormality determination of messages using this learning dictionary data.
 Even with such a configuration, the learning dictionary data used by the abnormality determination unit 110 on the vehicle 20 can be updated.
 Each configuration shown in FIGS. 1A to 1C need not be fixed in the vehicle 20 after shipment, and may be dynamically changeable on the vehicle 20. For example, switching among these configurations may be possible depending on the communication speed between the vehicle 20 and the external server 10, the usage rate of the computer resources of the monitoring ECU, the remaining battery charge when the vehicle 20 is an electric vehicle, or an operation by the driver.
 [Configuration of abnormality determination unit and learning unit]
 The configurations of the abnormality determination unit 110 and the learning unit 120, which are components of each of the abnormality detection systems 100A, 100B, and 100C described in the overview of the configurations, will now be described. In the following, the abnormality detection systems 100A, 100B, and 100C, without specifying any one of them or when referring to all of them collectively, are also referred to as the abnormality detection system 100.
 FIG. 2 is a block diagram illustrating a configuration example of the abnormality determination unit 110 and the learning unit 120 included in the abnormality detection system 100.
 As shown in FIG. 2, the learning unit 120 includes a training data receiving unit 122 and a learning dictionary generating unit 124.
 The training data receiving unit 122 receives input of training data. The training data here consists of two or more M-dimensional vectors, where M is an integer of 2 or more. The value of each dimension is, for example, the value of each byte counted from the beginning of the payload of a CAN message, whose payload is at most 8 bytes.
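 Purely as an illustration of this feature extraction, the following is a minimal sketch in Python of converting a CAN payload into an M-dimensional vector. The zero-padding of shorter payloads and the function name are assumptions made for this sketch only; they are not specified in the embodiment.

```python
def payload_to_vector(payload: bytes, m: int = 8) -> list[float]:
    """Convert a CAN payload (up to 8 bytes) into an M-dimensional vector.

    Each dimension holds the value of one byte counted from the start of
    the payload; shorter payloads are padded with zero bytes here, which
    is an assumption made only for this illustration.
    """
    padded = payload.ljust(m, b"\x00")
    return [float(b) for b in padded[:m]]

# Example: a 4-byte payload becomes an 8-dimensional feature vector.
vec = payload_to_vector(bytes([0x12, 0x34, 0x56, 0x78]))
```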
 The learning unit 120 generates learning dictionary data using the training data received by the training data receiving unit 122, and outputs this learning dictionary data to the storage unit 112 of the abnormality determination unit 110, described later.
 FIG. 3 is a schematic diagram for explaining the data elements of training data in the case of M = 2 and a learning dictionary generated using this training data. In FIG. 3, the data elements form a point group distributed in the M-dimensional space, each point being indicated by a white circle, and the learning dictionary is a boundary in the M-dimensional space indicated by a thick solid line. This boundary is hereinafter also referred to as the determination boundary. When M = 2, the determination boundary is a boundary line.
 As further shown in FIG. 2, the abnormality determination unit 110 includes a storage unit 112, a determination target data receiving unit 114, a determination target data conversion unit 116, and a determination execution unit 118.
 The storage unit 112 stores the learning dictionary data output from the learning unit 120 as described above. Data used for the conversion of the determination target data, described later, is also stored in the storage unit 112.
 The determination target data receiving unit 114 acquires the data subject to abnormality determination, that is, a CAN message, from the in-vehicle network 210.
 The determination target data conversion unit 116 converts the CAN message received by the determination target data receiving unit 114 into a format that can be processed by the determination execution unit 118. This conversion includes, for example, extraction of the portion to be judged from the CAN message and normalization using the above-mentioned data for conversion of the determination target data. Normalization is described later.
 The determination execution unit 118 determines whether the determination target data is normal or abnormal, that is, performs abnormality determination, based on the learning dictionary stored as learning dictionary data in the storage unit 112.
 FIG. 4 is a schematic diagram for explaining this abnormality determination. In FIG. 4, two pieces of data, determination target data A and determination target data B, are plotted in the M-dimensional space based on their values.
 The determination execution unit 118 determines whether each piece of data is normal or abnormal based on whether it is located inside or outside the determination boundary of the learning dictionary, and outputs the result. In this example, determination target data A, located inside the determination boundary, is determined to be normal, and determination target data B, located outside the determination boundary, is determined to be abnormal. When data is determined to be abnormal, the monitoring ECU including the abnormality determination unit 110 and the learning unit executes, for example, another program that receives this determination result as input and outputs an error message to the bus, or transmits a command for restricting part or all of the functions of other ECUs or for shifting other ECUs to a special operation mode for handling abnormalities. A notification of the occurrence of the abnormality may also be issued to the driver of the vehicle 20 by a display on the instrument panel or by voice. In addition, information on the occurrence of the abnormality may be recorded in a log. This log is acquired and used, for example, by a mechanic of the vehicle 20 through a diagnostic port included in the in-vehicle network 210.
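 As an illustration only, the following is a minimal sketch in Python of this inside/outside judgment, assuming scikit-learn's IsolationForest as a stand-in for the learning dictionary and placeholder training vectors; the embodiment does not name any particular library, and the concrete values here are not from the embodiment.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Placeholder training vectors standing in for normalized CAN feature vectors
# (the real system would use the normalized, noise-added training data).
rng = np.random.default_rng(0)
training_vectors = rng.uniform(0.3, 0.7, size=(200, 2))

# The fitted model plays the role of the learning dictionary here.
model = IsolationForest(random_state=0).fit(training_vectors)

def judge(normalized_vector) -> str:
    """predict() returns +1 for points inside the learned boundary (normal)
    and -1 for points outside it (abnormal)."""
    label = model.predict([normalized_vector])[0]
    return "normal" if label == 1 else "abnormal"

print(judge([0.5, 0.5]))    # likely "normal"
print(judge([0.95, 0.05]))  # likely "abnormal"
```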
 Each component of the abnormality determination unit 110 and the learning unit 120 executes a part of the Isolation Forest algorithm, and they cooperate as described above to execute the Isolation Forest algorithm as a whole.
 [Outline of processing in the abnormality detection system]
 The flows of data in the abnormality determination unit 110 and the learning unit 120, which include the components described above, are shown in FIGS. 5 and 6. FIG. 5 is a diagram illustrating the flow of data in the learning unit 120, which generates the learning dictionary. FIG. 6 is a diagram illustrating the flow of data in the abnormality determination unit 110, which performs abnormality determination. These figures are drawn essentially as sequence diagrams showing the flow of data, but also serve as flowcharts showing the order of processing in each unit.
 As shown in FIG. 5, in the learning unit 120 that generates the learning dictionary, the training data receiving unit 122 first receives input and acquires training data (step S51). If the learning dictionary is generated before the vehicle 20 is shipped, the input source of the training data is, for example, a location in a storage device that is manually specified or preset at that stage. If the learning dictionary is generated after the vehicle 20 is shipped, the input source is, for example, the in-vehicle network 210 to which the monitoring ECU including the learning unit 120 is connected.
 Next, in the learning unit 120, the learning dictionary generation unit 124 normalizes the input training data (step S52) and generates a learning dictionary by the Isolation Forest method using the normalized training data (step S53). Normalization here is a calculation that transforms the original distribution range of the input training data in the M-dimensional space so that the distribution range falls within a predetermined region of the same space while preserving the relative positional relationships among the training data.
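 The following is a minimal sketch in Python of this normalization, assuming simple per-dimension min-max scaling; the embodiment only states that the maximum and minimum values of each component are used, so the guard against a zero-width range and the concrete numbers are assumptions for this sketch.

```python
import numpy as np

def fit_minmax(train: np.ndarray):
    """Compute per-dimension minimum and maximum of the training data.
    These values are later passed to the abnormality determination unit
    so that unknown data can be normalized in the same way."""
    return train.min(axis=0), train.max(axis=0)

def normalize(x: np.ndarray, lo: np.ndarray, hi: np.ndarray) -> np.ndarray:
    """Map each dimension into [0, 1] while preserving relative positions.
    The guard against a zero range is an assumption for this sketch."""
    span = np.where(hi - lo == 0, 1.0, hi - lo)
    return (x - lo) / span

train = np.array([[10.0, 200.0], [20.0, 220.0], [15.0, 260.0]])
lo, hi = fit_minmax(train)
normalized_train = normalize(train, lo, hi)  # all values now lie in [0, 1]
```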
 The generated learning dictionary data is passed to the abnormality determination unit 110 (step S54), where it is stored in the storage unit 112 (step S55). Together with the learning dictionary data, the data used in the above normalization calculation is also passed from the learning unit 120 to the abnormality determination unit 110. This data includes the maximum and minimum values of each component of the feature vectors needed for the conversion. The abnormality determination unit 110 uses this data to normalize the unknown data subject to determination.
 As shown in FIG. 6, in the abnormality determination unit 110 that performs abnormality determination, the determination target data receiving unit 114 first acquires the data of a CAN message subject to abnormality determination from the in-vehicle network 210 (step S61).
 Next, in the abnormality determination unit 110, the determination execution unit 118 reads the learning dictionary data stored in the storage unit 112 (step S62). The determination target data conversion unit 116 reads the data, such as coefficients, used for normalizing the training data from the storage unit 112, and normalizes the determination target data, that is, the data of the acquired CAN message, using this data (step S63). The determination execution unit 118 then determines whether the normalized data is normal or abnormal based on the learning dictionary data (step S64).
 The above is an outline of the abnormality detection processing executed in the abnormality detection system 100, including the steps from the generation of the learning dictionary using training data to the abnormality determination using this learning dictionary. By adopting the Isolation Forest method for this abnormality detection, the load on computer resources is reduced compared with conventional methods, and the processing can be executed faster.
 However, with the Isolation Forest algorithm, the determination boundary of the learning dictionary obtained as a result of learning may not fit properly to the distribution of the normal training data in the M-dimensional space. FIG. 7 shows an example of such an inappropriate determination boundary. When the determination boundary lies inside the outer edge of the distribution of normal data elements in this way, abnormality determination produces erroneous judgments in which data that is actually normal is judged to be abnormal. In the example of FIG. 7, the data elements indicated by filled black circles are those judged to be abnormal data, and many of them are actually normal data elements. Hereinafter, such erroneous detection, in which normal data is wrongly judged to be abnormal, is also referred to as overdetection.
 Such a learning dictionary, which causes erroneous determinations, can arise, for example, when the amount of abnormal data included in the training data is insufficient. The processing performed in the abnormality detection system 100 to obtain an appropriate learning dictionary even in such cases is described below.
 [Processing for obtaining an appropriate learning dictionary]
 Two examples of processing methods for obtaining an appropriate learning dictionary in the present embodiment are described below.
 [First processing method]
 FIG. 8 is a flowchart showing the first processing method, an example of a training data processing method for obtaining the appropriate learning dictionary described above.
 The first processing method is executed by the learning dictionary generation unit 124 in the learning unit 120 after it has received input of Isolation Forest training data consisting of two or more M-dimensional vectors. In the following, however, processing by the learning dictionary generation unit 124 may be described as processing by the learning unit 120. FIG. 9A shows an example of the initial state of input training data distributed in the M-dimensional space, that is, a two-dimensional plane, when M = 2.
 First, the learning unit 120 reads the parameters used for this processing (step S80). The details of the parameters are described in the following steps.
 Next, the learning unit 120 acquires the number of data elements of the input training data (step S81).
 Next, the learning unit 120 determines the number of noise elements to be added to the training data based on the number of data elements (step S82). A noise element is also an M-dimensional vector. The parameter acquired in step S80 is used to determine the number of noise elements in step S82, and is, for example, a real number greater than 0 and less than 1. The number of noise elements added to the training data is obtained by multiplying the number of data elements acquired in step S81 by this parameter and rounding the result to an integer. That is, the number of noise elements is determined so as to be smaller than the number of data elements of the training data.
 Next, the learning unit 120 normalizes the training data (step S83). FIG. 9B shows an example of training data after normalization distributed on a two-dimensional plane. In this example, the distribution range of the training data, which before normalization was distributed as shown in FIG. 9A, has been transformed so that it spans the region [0,1]^2 in the two-dimensional plane. Such a region is an example of the first region in the present embodiment.
 Next, the learning unit 120 adds the number of noise elements determined in step S82 across an M-dimensional region, in this example a region of the two-dimensional plane, that is larger than the first region and includes the first region (step S84). FIG. 9C shows an example of the training data after the noise elements have been added; the noise elements, distributed in the two-dimensional plane, are indicated by circles with broken-line outlines. In this example, the noise elements are added so as to be distributed over the region [-0.5,1.5]^2. Such a region is an example of the second region in the present embodiment.
 As shown in FIG. 9C, as a result of step S84, a smaller number of noise elements than the data elements of the original training data are added so as to be distributed over a region wider than the distribution range of the original training data. Therefore, the distribution density of the noise elements is lower than that of the data elements of the original training data. The noise elements are also added so that, as a whole, they follow a uniform distribution within this region.
 Next, the learning unit 120 generates noise-added training data containing the elements, all of which are M-dimensional vectors, located within the second region, that is, both the data elements of the training data and the noise elements, all of which in this example are two-dimensional vectors (step S85).
 Finally, the learning unit 120 generates Isolation Forest learning dictionary data using the noise-added training data generated in step S85, and outputs this learning dictionary data (step S86).
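 As an illustration only, the following is a minimal sketch in Python of steps S81 to S86, assuming the training data is already normalized to [0,1]^M, the parameter is a ratio in (0,1), and scikit-learn's IsolationForest stands in for the learning dictionary generation; none of these library or parameter names appear in the embodiment itself.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def first_processing_method(train: np.ndarray, ratio: float = 0.1, seed: int = 0):
    """Sketch of the first processing method.

    train : (N, M) array of training vectors already normalized to [0, 1]^M.
    ratio : parameter in (0, 1) used to decide the noise-element count (S82).
    """
    rng = np.random.default_rng(seed)
    n, m = train.shape
    n_noise = int(round(n * ratio))                    # S82: fewer noise elements than data elements
    noise = rng.uniform(-0.5, 1.5, size=(n_noise, m))  # S84: uniform noise over the second region
    noise_added = np.vstack([train, noise])            # S85: noise-added training data
    model = IsolationForest(random_state=seed).fit(noise_added)  # S86: learning dictionary
    return model

# Usage with placeholder data normalized to [0, 1]^2.
train = np.random.default_rng(1).uniform(0.0, 1.0, size=(500, 2))
dictionary = first_processing_method(train)
```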
 Of the above steps, step S82 and step S84 are examples of the second noise addition step, step S85 of the generation step, and step S86 of the learning dictionary data output step in the present embodiment.
 In other words, the learning unit 120 does not use the normalized training data as-is, as has conventionally been done. Instead, the learning unit 120 generates the learning dictionary using data obtained by adding noise to a region of the M-dimensional space that includes the surroundings of the distribution range of the normalized training data.
 By generating a learning dictionary using such noise-added training data, even when the training data contains little abnormal data, the system avoids obtaining a learning dictionary in which a large amount of normal data lies outside the determination boundary, as shown in FIG. 7. As a result, the abnormality detection system 100 can perform abnormality detection with a reduced overdetection rate.
 In the above description of the first processing method, the number of noise elements, smaller than the number of data elements of the original training data, was determined using a parameter taking a real value greater than 0 and less than 1, but the method of determining the number of noise elements is not limited to this. For example, the number of noise elements may be the number of data elements of the training data minus a fixed number. Alternatively, the number of training data elements may be divided into several ranges, with a predetermined number of noise elements used for each range. Such a correspondence between the number of training data elements and the number of noise elements is, for example, included in a data table and stored in the memory of the information processing apparatus. A sketch of such a table lookup is shown below.
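 The following is a minimal sketch of that table-based variation in Python. The embodiment only states that such a table exists; the range boundaries and noise counts below are hypothetical placeholders.

```python
# Hypothetical lookup table mapping ranges of the training-data count N
# to a predetermined number of noise elements (the actual values are not
# specified in the embodiment and are placeholders here).
NOISE_COUNT_TABLE = [
    (0,      1_000,  50),
    (1_000,  10_000, 200),
    (10_000, None,   500),
]

def noise_count_from_table(n: int) -> int:
    """Return the predetermined noise-element count for the range containing n."""
    for low, high, count in NOISE_COUNT_TABLE:
        if n >= low and (high is None or n < high):
            return count
    raise ValueError("no range matches the training-data count")
```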
 The first processing method has been described taking as an example the case where the data elements of the training data are two-dimensional vectors, but the idea underlying the first processing method can be generalized to higher-dimensional spaces, and the first processing method can also be applied to training data consisting of vectors of three or more dimensions. If the training data consists of M-dimensional vectors, the range of the first region above is read as [0,1]^M and the range of the second region as [-0.5,1.5]^M. That is, the first region is a region of the M-dimensional space defined by a first hypercube, a hypercube in the M-dimensional space, and the second region is a region of the M-dimensional space defined by a second hypercube, a hypercube in the M-dimensional space that is larger than and contains the first hypercube.
 [Second processing method]
 FIG. 10 is a flowchart showing the second processing method, another example of a training data processing method for obtaining the appropriate learning dictionary described above.
 The second processing method is also executed by the learning dictionary generation unit 124 in the learning unit 120 after it has received input of Isolation Forest training data consisting of two or more M-dimensional vectors. In the following, however, processing by the learning dictionary generation unit 124 may be described as processing by the learning unit 120. The second processing method is also explained taking as an example the initial state of the training data shown in FIG. 9A. The explanation of steps common to the first processing method may be abbreviated.
 First, the learning unit 120 reads the parameters used for this processing (step S100). The details of the parameters are described in the following steps.
 Next, the learning unit 120 normalizes the input training data (step S101). The content of this step is the same as in the first processing method, and FIG. 9B shows an example of training data after normalization distributed on a two-dimensional plane. The distribution range of the training data, which before normalization was distributed as shown in FIG. 9A, has been transformed so that it spans the region [0,1]^2 in the two-dimensional plane. Such a region is an example of the first region in the present embodiment.
 Next, the learning unit 120 sets a second region, an M-dimensional region (in this example a region of the two-dimensional plane) that is larger than the first region and includes the first region, and divides the second region into third regions, which are M-dimensional hypercubes of equal size (step S102). FIG. 11A is a diagram for explaining the second region and the third regions in the two-dimensional plane. In the example shown in FIG. 11A, the second region is the region [-0.5,1.5]^2, and the third regions are the sub-regions obtained by dividing the second region into 64 parts.
 Here, a parameter acquired in step S100 is used to determine the number of third regions obtained by dividing the second region in step S102. In the example of FIG. 11A, the value of this parameter is 8, and the number of divisions is 8 to the power of M, that is, 8 squared = 64 in this example.
 Next, the learning unit 120 acquires the number S (S is an integer of 0 or more) of training data elements included in each of the third regions (step S103). At this point, there are no training data elements in the third regions outside the first region, so S = 0 for all such third regions.
 Next, the learning unit 120 determines a first threshold T (T is a natural number), which is a threshold relating to the number of training data elements in each third region (step S104). The first threshold T is determined, for example, using a parameter acquired in step S100. It may be the same as or different from the parameter used in step S102; if different, it may be calculated from the parameter used in step S102.
 To give a more specific example, the parameter used in step S104 may specify the number of training data elements included in one of the third regions within the first region. As a concrete example, it may indicate a specific rank when the numbers of training data elements included in the third regions are sorted by size. In this case, the number of training data elements included in the third region at that specific rank is used as the first threshold. The rank may be expressed, for example, as the position counted from the minimum or maximum value, or as the position counted upward or downward starting from the mean or median.
 From here, the learning unit 120 uses S and T as described above to judge whether noise elements need to be added to each third region, determine the number of noise elements to be added to each third region, and add the noise elements.
 First, the learning unit 120 checks whether there is a third region for which the need to add noise elements has not yet been judged (step S105). If there is (YES in step S105), it selects one such third region (step S106) and judges whether the number S of training data elements in that third region is smaller than the first threshold T (step S107).
 If the number S of training data elements in that third region is smaller than the first threshold T (YES in step S107), (T - S) noise elements are added so that the total number of data elements and noise elements in that third region becomes T (step S108).
 If the number S of training data elements in that third region is equal to or greater than the first threshold T (NO in step S107), the learning unit 120 checks whether there are further unprocessed third regions (step S105).
 When the processing from step S105 to step S107 or S108 has been performed for all third regions (NO in step S105), the learning unit 120 generates noise-added training data containing the data elements and noise elements located within the second region (step S109). FIG. 11B is a diagram for explaining an example of training data and noise elements distributed in the two-dimensional space when the result of step S105 is NO. In FIG. 11B as well, the noise elements are indicated by circles with broken-line outlines.
 The example of FIG. 11B is for the case where the first threshold T = 9. The third region at the lower left corner of the first region contained S = 6 training data elements, so T - S = 3 noise elements have been added. Another third region in the lower left part of the first region contained S = 8 training data elements, so T - S = 1 noise element has been added. No noise elements have been added to the other third regions within the first region, because S was 9 or more for all of them. The other, hatched third regions lie outside the first region and contain no training data elements, so 9 noise elements have been added to each of them. In each third region, the noise elements are placed as random values following a uniform distribution within that region.
 Finally, the learning unit 120 generates Isolation Forest learning dictionary data using the noise-added training data generated in step S109, and outputs this learning dictionary data (step S110).
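 As an illustration only, the following is a minimal sketch in Python of steps S102 to S110 under the same assumptions as the earlier sketch of the first processing method (input already normalized to [0,1]^M, scikit-learn's IsolationForest as a stand-in for the learning dictionary). The default values l = 8 and t = 9 follow the example in FIGS. 11A and 11B; the exhaustive loop over all cells is practical only for small M.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def second_processing_method(train: np.ndarray, l: int = 8, t: int = 9, seed: int = 0):
    """Sketch of the second processing method (steps S102 to S110).

    train : (N, M) array normalized to [0, 1]^M (first region).
    l     : divisions per dimension of the second region [-0.5, 1.5]^M (S102).
    t     : first threshold T; cells with fewer elements are topped up (S104-S108).
    """
    rng = np.random.default_rng(seed)
    n, m = train.shape
    cell = 2.0 / l                        # edge length of each third region
    noise_parts = []
    for idx in np.ndindex(*([l] * m)):    # iterate over all l**m third regions
        low = np.array(idx) * cell - 0.5
        high = low + cell
        inside = np.all((train >= low) & (train < high), axis=1)
        s = int(inside.sum())             # S103: element count of this cell
        if s < t:                         # S107/S108: add (T - S) uniform noise elements
            noise_parts.append(rng.uniform(low, high, size=(t - s, m)))
    noise = np.vstack(noise_parts) if noise_parts else np.empty((0, m))
    noise_added = np.vstack([train, noise])                       # S109
    return IsolationForest(random_state=seed).fit(noise_added)    # S110
```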
 Of the above steps, step S101 is an example of the normalization step, step S102 of the division step, steps S103 to S108 of the first noise addition step, step S109 of the generation step, and step S110 of the learning dictionary data output step in the present embodiment.
 In the second processing method as well, the learning unit 120 does not use the normalized training data as-is, as has conventionally been done. Instead, the learning unit 120 generates the learning dictionary using data obtained by adding noise to a region of the M-dimensional space that includes the surroundings of the distribution range of the normalized training data.
 By generating a learning dictionary using such noise-added training data, even when the training data contains little abnormal data, the system avoids obtaining a learning dictionary in which a large amount of normal data lies outside the determination boundary, as shown in FIG. 7. As a result, the abnormality detection system 100 can perform abnormality detection with a reduced overdetection rate.
 Furthermore, unlike the first processing method, in the second processing method the number of noise elements added within the first region, where the training data is distributed, is determined according to the density of each subdivided region. The second processing method therefore suppresses the overcrowding of data elements and noise elements that can occur within the first region under the first processing method. In Isolation Forest, locations where vector data is overcrowded in the training data tend to end up inside the determination boundary. Therefore, when data elements and noise elements are prone to overcrowding, the likelihood increases of the erroneous judgment that even abnormal data is determined to be normal. Erroneous detection arising from the erroneous judgment that abnormal data is normal is hereinafter also referred to as missed detection, in contrast to the overdetection described above. An abnormality detection system 100 that performs abnormality determination of unknown data based on a learning dictionary generated by executing the second processing method can perform abnormality detection that suppresses the occurrence of overdetection while also suppressing the possibility of missed detection.
 As with the first processing method, the idea underlying the second processing method can be generalized to higher-dimensional spaces, and the second processing method can also be applied to training data consisting of vectors of three or more dimensions. If the training data consists of M-dimensional vectors, the range of the first region above is read as [0,1]^M and the range of the second region as [-0.5,1.5]^M. That is, the first region is a region of the M-dimensional space defined by a first hypercube, a hypercube in the M-dimensional space, and the second region is a region of the M-dimensional space defined by a second hypercube, a hypercube in the M-dimensional space that is larger than and contains the first hypercube.
 [Effects]
 An actual example of the effect of adding noise to the training data by the second processing method described above is now presented.
 FIGS. 12A and 12B show the determination boundary of a learning dictionary generated using training data without added noise, and the determination boundary of a learning dictionary generated using the same training data with noise added by the above processing method. Training data 1 in FIG. 12A and training data 2 in FIG. 12B are different types of data acquired from the in-vehicle network of the same actual vehicle. Comparing training data 1 and training data 2, the data elements of training data 1 are distributed almost uniformly from the center of the distribution to its periphery, whereas the distribution of data elements in training data 2 becomes sparse at the periphery. It can also be said that training data 2 is more likely than training data 1 to contain outliers.
 In both FIGS. 12A and 12B, the circles indicate data elements of the training data. The solid outline is the determination boundary of the learning dictionary generated using training data without added noise, and the broken outline is the determination boundary of the learning dictionary generated using training data with added noise. The noise elements are not shown in either figure.
 As can be seen from these figures, inside the determination boundary of the learning dictionary obtained with added noise lie all of the training data inside the determination boundary of the learning dictionary obtained without added noise, as well as much of the training data outside it.
 Furthermore, to confirm whether the learning dictionary obtained with added noise is indeed more appropriate, the inventors conducted an abnormality detection test with each learning dictionary using test data. FIG. 12C shows the false detection rates in this abnormality detection test. For each training data set, the left bar is the false detection rate with the learning dictionary obtained without adding noise to the training data, and the right bar is the false detection rate with the learning dictionary obtained by adding noise to the training data.
 As can be seen from FIG. 12C, the false detection rate with the learning dictionary obtained by adding noise shows a substantial improvement over that of the learning dictionary obtained without adding noise. In other words, the learning dictionary obtained with added noise is the more appropriate one. This improvement is seen even for training data 2, which is more likely to contain outliers and for which the false detection rate was already somewhat low with the learning dictionary obtained without adding noise. For abnormality detection in a vehicle traveling at several tens of kilometers per hour or more, it is highly important that false detections, whether overdetections or missed detections, be kept low.
 On the other hand, it is not always easy to collect training data with sufficient variation, including, for example, abnormal data originating from abnormalities at the application layer, from a network conforming to a standard such as CAN. Training data resembling the abnormal data generated by unknown attack patterns is even harder to prepare. In other words, with the conventional use of such training data for generating a learning dictionary with Isolation Forest, it has been difficult to suppress the false detection rate in abnormality detection.
 By executing the processing method of the present embodiment, however, data elements that deviate to some extent from the original training data, which contains many normal data elements, are added to the data space in a smaller quantity and at a lower density than the original training data. These added data elements are referred to above as noise elements. An abnormality detection system using a learning dictionary generated from this training data can then perform abnormality detection with a false detection rate lower than before.
 (Embodiment 2)
 The first processing method and the second processing method described in Embodiment 1 differ in the algorithm of the program executed in the information processing apparatus to realize each, and they can be executed selectively, for example by switching the program read by a given processor.
 However, there are the following differences between the first processing method and the second processing method.
 First, as for the time required to add noise elements, the second processing method depends more strongly on the amount of training data than the first processing method, and it takes longer as the training data increases. In other words, the second processing method places a larger processing load on the processor.
 On the other hand, with regard to the detection accuracy of the generated learning dictionary (how low its false detection rate is), both methods improve on the conventional approach as described above, but the second processing method is superior.
 From the viewpoint of accuracy, it would be desirable for the abnormality detection system always to execute the second processing method. The difference in processing load described above is unlikely to be a problem in the abnormality detection system 100A of FIG. 1A or the abnormality detection system 100B of FIG. 1B, because sufficient computer resources can easily be provided there. In a configuration such as the abnormality detection system 100C of FIG. 1C, however, it is conceivable that computer resources, such as the computational speed of the processor, are limited. That is, in a traveling vehicle, the second processing method may not be able to generate or update the learning dictionary at the necessary speed.
 In addition to the difference in processing method, the parameters of each processing method also affect the time cost and accuracy of detection in the abnormality detection system.
 In the first processing method, the parameter used to determine the number of noise elements can take any real value greater than 0 and less than 1. However, it is difficult to predict in advance which value in this range will yield a learning dictionary better suited to abnormality detection. To find out, one can, for example, compare the accuracy of abnormality detection on test data across multiple learning dictionaries generated with different parameter values. Naturally, however, if such comparisons are made to search for the optimal parameter, it takes more time before the learning dictionary used for abnormality detection is decided. If the decision of the learning dictionary is delayed, either abnormality detection cannot be executed until the learning dictionary is decided, or it is executed using an old learning dictionary and accuracy suffers.
 In the second processing method, there are the parameter used to determine the number of third regions obtained by dividing the second region and the parameter used to determine the first threshold T. Of these two parameters, the former, denoted L, can take an integer value of 4 or more, assuming that each dimension is divided at least once within the first region (giving two or more third regions) and that there is at least one third region on each side outside the first region, so that at least four third regions are lined up in total. The latter, if it is a value used to identify one of the third regions in the second region, can take a real value of at least 1 and at most the number of third regions in the second region. The same considerations as for the first processing method apply to these parameters: a search may yield a learning dictionary capable of more accurate abnormality detection, but it takes more time before the learning dictionary used for abnormality detection is decided. Therefore, either the execution of abnormality detection is delayed or accuracy is sacrificed.
 Taking these points into account, the inventors conceived a technique for having the abnormality detection system make a quick decision on the choice of the training data processing method and on whether to execute a parameter search, so that the abnormality detection system can perform abnormality detection at the required speed and with the highest possible accuracy.
 Such an abnormality detection system is described below. Since the configuration of the abnormality detection system of the present embodiment may be the same as that of Embodiment 1, its description is omitted and it is referred to as the abnormality detection system 100; its operation is described below.
 [Operation]
 The overall processing in the abnormality detection system 100 for making a quick decision on the choice of the training data processing method and on whether to execute a parameter search is described below, and the processing for the parameter search is described within that explanation.
 FIG. 13 is a flowchart showing an example of the processing method, executed in the abnormality detection system 100, for deciding on the choice of the training data processing method and on whether to execute a parameter search for each processing method.
 This processing method includes steps executed by the learning dictionary generation unit 124 in the learning unit 120 after it has received input of Isolation Forest training data consisting of two or more M-dimensional vectors. In the following, however, processing by the learning dictionary generation unit 124 is described as processing by the learning unit 120. There are also steps executed by the components of the abnormality determination unit 110, which may be described below as processing by the abnormality determination unit 110.
 In the following description, it is also assumed that, in the initial state, the training data receiving unit 122 has already received input of the training data.
 First, the learning unit 120 acquires the number N of data elements of the training data (step S130).
 Next, the learning unit 120 determines whether N is equal to or greater than a predetermined second threshold (step S131). The second threshold is a threshold used to decide whether the first processing method or the second processing method is used as the training data processing method; it is determined, for example, according to the available computer resources, such as the computing capability of the processor that implements the learning unit 120, and is stored in the memory of the information processing apparatus. Using a predetermined threshold in this way allows a quick decision.
 If it is determined that N is equal to or greater than the second threshold, that is, if the number of training data elements is large, the learning unit 120 selects the first processing method, which can be completed in a shorter time (step S132).
 If it is determined that N is not equal to or greater than the second threshold, that is, if the number of training data elements is small, the learning unit 120 selects the second processing method, which yields a learning dictionary capable of more accurate abnormality detection (step S133).
 Next, the learning unit 120 determines whether N is equal to or greater than a predetermined third threshold (step S134). The third threshold is a threshold used to decide whether to execute a parameter search when executing each training data processing method. Like the second threshold, the third threshold is determined, for example, according to the available computer resources, such as the computing capability of the processor that implements the learning unit 120, and is stored in the memory of the information processing apparatus. It may be related to the second threshold, or the two may be independent values. Using a predetermined threshold in this way allows a quick decision.
 If it is determined that N is equal to or greater than the third threshold, that is, if the number of training data elements is large, the learning unit 120 decides not to execute a parameter search so that processing can be completed in a shorter time (step S135).
 If it is determined that N is not equal to or greater than the third threshold, that is, if the number of training data elements is small, the learning unit 120 executes a parameter search to obtain a learning dictionary capable of more accurate abnormality detection (step S136). A sketch of this selection logic is shown below.
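 As an illustration only, the following is a minimal sketch in Python of the decisions in steps S130 to S136. The threshold values are hypothetical placeholders; the embodiment only states that they are determined from the available computer resources.

```python
# Hypothetical threshold values; the embodiment only states that they are
# determined from the available computer resources.
SECOND_THRESHOLD = 10_000   # chooses between first and second processing methods
THIRD_THRESHOLD = 5_000     # chooses whether to run a parameter search

def choose_strategy(n: int) -> tuple[str, bool]:
    """Return (processing method, whether to run a parameter search) for a
    training data set with n elements, following steps S131 to S136."""
    method = "first" if n >= SECOND_THRESHOLD else "second"   # S131-S133
    search = n < THIRD_THRESHOLD                              # S134-S136
    return method, search

print(choose_strategy(20_000))  # ('first', False)
print(choose_strategy(3_000))   # ('second', True)
```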
 When learning dictionary data is generated and output (step S137) after passing through steps S132 and S135, the learning unit 120 executes the first processing method shown in the flowchart of FIG. 8.
 When learning dictionary data is generated and output (step S137) after passing through steps S133 and S135, the learning unit 120 executes the second processing method shown in the flowchart of FIG. 10.
 When learning dictionary data is generated and output (step S137) after passing through steps S132 and S136, the learning unit 120 executes the first processing method shown in the flowchart of FIG. 14. FIG. 14 is a flowchart of the first processing method including a parameter search, executed in the abnormality detection system 100. In the flowchart of FIG. 14, the steps common to the first processing method shown in the flowchart of FIG. 8 are denoted by the same reference symbols, and their detailed description is omitted.
 図14のフロー図に示される第一処理方法では、学習部120は、S82、S84~S86の工程のセットを、パラメータの値を入れ替えて複数回実行する。その結果として生成され出力される複数の学習辞書データは、異常判定部110の蓄積部112に保存される。また、学習部120からは、ステップS83で正規化に用いられたデータも異常判定部110に提供されて蓄積部112に保存される。 In the first processing method shown in the flowchart of FIG. 14, the learning unit 120 executes a set of steps S82 and S84 to S86 a plurality of times by exchanging parameter values. A plurality of learning dictionary data generated and output as a result are stored in the storage unit 112 of the abnormality determination unit 110. Further, from the learning unit 120, the data used for normalization in step S83 is also provided to the abnormality determination unit 110 and stored in the storage unit 112.
 The abnormality determination unit 110 has acquired test data for Isolation Forest. This test data is, for example, input to the abnormality determination unit 110 in advance and stored in the storage unit 112, and when it is determined in step S131 that N is not equal to or greater than the second threshold, the abnormality determination unit 110 reads and acquires this test data from the storage unit 112. The abnormality determination unit 110 then normalizes the test data using the data used for normalization in step S83, and executes abnormality determination on the test data using each set of learning dictionary data (step S140).
 Finally, the learning unit 120 evaluates the abnormality determinations made in step S140 with each set of learning dictionary data, and based on the evaluation results selects the best learning dictionary data as the learning dictionary data to be used for actual anomaly detection (step S141). Known evaluation measures such as recall and the F-measure can be used for this evaluation. Note that step S141 may instead be performed by the abnormality determination unit 110.
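 Steps S140 and S141 can be sketched as follows, again with scikit-learn's IsolationForest standing in for the learning dictionary. The assumption that the test data carries ground-truth labels of +1 for normal and -1 for anomalous follows scikit-learn's convention and is not prescribed by the embodiment; the function name is likewise a placeholder.

from sklearn.metrics import f1_score, recall_score


def select_best_dictionary(dictionaries, test_norm, test_labels):
    # dictionaries: list of (parameter value, fitted model) pairs.
    # test_norm: test data normalized with the same data as the training data.
    # test_labels: assumed ground truth, +1 for normal and -1 for anomalous.
    best = None
    best_score = -1.0
    for param, model in dictionaries:
        predicted = model.predict(test_norm)   # IsolationForest: +1 normal, -1 anomaly
        f_measure = f1_score(test_labels, predicted, pos_label=-1)
        recall = recall_score(test_labels, predicted, pos_label=-1)
        if f_measure > best_score:             # here the F-measure drives the selection
            best, best_score = (param, model, recall), f_measure
    return best, best_score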
 Of the above steps, steps S82 and S84 are examples in this embodiment of the second noise addition step, step S85 is an example of the generation step, and step S86 is an example of the learning dictionary data output step. Step S131 is an example in this embodiment of the first determination step, and step S134 is an example of the second determination step. Steps S140 and S141 are examples in this embodiment corresponding to the test data acquisition step, the evaluation step, and the learning dictionary data selection step.
 One of the differences from the case where the first processing method is executed via steps S132 and S135 lies in whether the set of steps S82 and S84 to S86 is executed only once or multiple times before the learning dictionary data used for anomaly detection is output. Another difference is that multiple sets of learning dictionary data are evaluated using the test data, and the best learning dictionary data is selected, based on the results of this evaluation, as the learning dictionary data used for anomaly detection.
 When generating and outputting learning dictionary data (step S137) via steps S133 and S136, the learning unit 120 executes the second processing method shown in the flowchart of FIG. 15. FIG. 15 is a flowchart of the second processing method including a parameter search, executed in the anomaly detection system 100. In the flowchart of FIG. 15, steps shared with the second processing method shown in the flowchart of FIG. 10 are denoted by the same reference signs, and their detailed description is omitted.
 In the second processing method shown in the flowchart of FIG. 15, the learning unit 120 executes the set of steps S102 to S110 multiple times, swapping in different combinations of values for two types of parameters. The multiple sets of learning dictionary data generated and output as a result are stored in the storage unit 112 of the abnormality determination unit 110. The learning unit 120 also provides the data used for normalization in step S101 to the abnormality determination unit 110, where it is stored in the storage unit 112.
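 The sweep over combinations of two parameter values can be pictured as a simple grid. Reading the two parameters as the division count L and the per-cell threshold T is an assumption made only for this sketch; the grid values and function names are placeholders, and one pass of steps S102 to S110 is represented by a caller-supplied function.

from itertools import product


def dictionary_candidates(build_dictionary, l_values=(4, 8), t_values=(2, 5)):
    # build_dictionary(l_value, t_value) stands in for one pass of steps
    # S102 to S110 and returns one set of learning dictionary data.
    # One candidate dictionary is produced per (L, T) combination.
    return [((l, t), build_dictionary(l, t)) for l, t in product(l_values, t_values)]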
 The contents of steps S150 and S151 are the same as those of steps S140 and S141, respectively.
 Of the above steps, step S102 is an example in this embodiment of the division step, steps S103 to S108 are examples of the first noise addition step, step S109 is an example of the generation step, and step S110 is an example of the learning dictionary data output step. Step S131 is an example in this embodiment of the first determination step, and step S134 is an example of the second determination step. Steps S150 and S151 are examples in this embodiment corresponding to the test data acquisition step, the evaluation step, and the learning dictionary data selection step.
 One of the differences from the case where the second processing method is executed via steps S133 and S135 lies in whether the set of steps S102 to S110 is executed only once or multiple times before the learning dictionary data used for anomaly detection is output. Another difference is that multiple sets of learning dictionary data are evaluated using the test data, and the best learning dictionary data is selected, based on the results of this evaluation, as the learning dictionary data used for anomaly detection.
 As described above, in the flowchart shown in FIG. 13 there are two processing methods for adding noise, and for each processing method there are two cases depending on whether a parameter search is executed. In other words, there are four processing patterns by which the learning dictionary data used for anomaly detection is determined and anomaly detection becomes executable. Of these processing patterns, the one with the largest time cost is the pattern in which the second processing method is executed together with a parameter search. The next largest time cost arises when the first processing method is executed together with a parameter search. The time costs of the remaining two patterns are much smaller than those of these two. Although the second threshold and the third threshold may be independent values as stated above, they may also be determined in consideration of this ordering of time costs.
 The threshold used in step S134 may also be switched according to the result of the determination in step S131, that is, according to whether the first processing method or the second processing method is used for adding noise. For example, the third threshold may be used when the second processing method is used, and a fourth threshold, which is another predetermined threshold, may be used instead of the third threshold when the first processing method is used. Step S134 in the case where the fourth threshold is used in this way is an example of the third determination step in this embodiment.
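 A minimal sketch of this variant, with the function name and the concrete threshold values chosen here only for illustration:

def threshold_for_step_s134(uses_second_processing_method, third_threshold=50_000, fourth_threshold=200_000):
    # The threshold compared against N in step S134 depends on which
    # noise-addition method was selected in step S131.
    return third_threshold if uses_second_processing_method else fourth_threshold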
 In the flowchart of FIG. 13, two determinations are made: the choice of the noise addition processing method and the decision whether to execute a parameter search for each processing method. However, both are not essential for adjusting the time cost; the time cost may be adjusted with only one of these determinations.
 In the flowchart of FIG. 13, the only options prepared for the parameter search are to execute it or not to execute it, but the number of parameter values swapped in for the search may instead be changed in stages, for example according to the number of data elements in the training data. That is, the larger the number of data elements in the training data, the smaller the number of swapped-in parameter values may be made. In this case, the number of parameter values may be a value calculated from the number of data elements, or a value determined in advance for each predetermined range of the number of data elements. In short, it suffices that there is a negative correlation between the number of data elements in the training data and the number of parameter values. As a result, when the training data contains many data elements, the increase in processing load is suppressed so that the time required to determine the learning dictionary data does not become excessively long.
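 One possible staged schedule is sketched below; the breakpoints and counts are assumptions chosen only to illustrate the negative correlation described above.

def num_parameter_candidates(n_elements):
    # Number of parameter values swapped in during the search, decreasing
    # in stages as the training data grows.
    if n_elements < 5_000:
        return 8
    if n_elements < 20_000:
        return 4
    if n_elements < 100_000:
        return 2
    return 1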
 In the flowchart of FIG. 13, whether the first processing method or the second processing method is executed to process the training data is selected according to the result of comparing the number N of data elements in the training data with the second threshold, but the selection is not limited to this. For example, there may additionally be an option of not processing the training data at all. Such a decision may be made, for example, when the load placed on the processor of the information processing device by other processing is heavy, in which case the current learning dictionary continues to be used for anomaly detection and the generation of a new learning dictionary for updating is postponed. The options may also be two: executing either the first processing method or the second processing method, and not processing the training data.
 (Other embodiments)
 As described above, Embodiments 1 and 2 have been presented as examples of the technology according to the present disclosure. However, the technology according to the present disclosure is not limited to these, and is also applicable to embodiments obtained by making changes, substitutions, additions, omissions, and the like as appropriate. For example, the following modifications are also included in an embodiment of the present disclosure.
 Some or all of the constituent elements of each device in the above embodiments may be implemented as a single system LSI (Large Scale Integration) circuit. A system LSI is a super-multifunctional LSI manufactured by integrating multiple components on a single chip, and is specifically a computer system including a microprocessor, ROM, RAM, and so on. A computer program is recorded in the RAM, and the system LSI achieves its functions by the microprocessor operating according to this computer program. The constituent elements of each device may each be implemented as an individual chip, or some or all of them may be integrated into a single chip. Although the term system LSI is used here, the circuit may also be called an IC, an LSI, a super LSI, or an ultra LSI depending on the degree of integration. The method of circuit integration is not limited to LSI; implementation with a dedicated circuit or a general-purpose processor is also possible. An FPGA (Field Programmable Gate Array) that can be programmed after the LSI is manufactured, or a reconfigurable processor in which the connections and settings of circuit cells inside the LSI can be reconfigured, may also be used. Furthermore, if circuit integration technology that replaces LSI emerges through progress in semiconductor technology or another derived technology, the functional blocks may naturally be integrated using that technology. Application of biotechnology or the like is also a possibility.
 Some or all of the constituent elements of each of the above devices may be implemented as an IC card attachable to and detachable from each device, or as a stand-alone module. The IC card or module is a computer system including a microprocessor, ROM, RAM, and so on, and may include the super-multifunctional LSI described above. The IC card or module achieves its functions by the microprocessor operating according to a computer program. The IC card or module may be tamper-resistant.
 In the above embodiments, each constituent element may be configured with dedicated hardware, or may be realized by executing a software program suitable for that constituent element. Each constituent element may be realized by a program execution unit such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or semiconductor memory. The software that realizes the information processing device and the like of the above embodiments is the following program.
 That is, this program causes a computer to execute an information processing method including: a data element acquisition step of receiving input of N data elements (N is an integer of 2 or more) that are M-dimensional vectors (M is an integer of 2 or more) used as training data for Isolation Forest; a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and encompasses the first region, into L^M third regions (L is an integer of 4 or more) that are M-dimensional hypercubes of equal size; a first noise addition step of obtaining a number S (S is an integer of 0 or more) of the data elements contained in each of the third regions and adding, in a uniform distribution, (T - S) noise elements that are M-dimensional vectors to each third region containing fewer of the data elements than a first threshold T (T is a natural number); a generation step of generating noise-added training data including the data elements and the noise elements; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
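 As a concrete, non-authoritative reading of this method, the following Python sketch walks through the steps using numpy and scikit-learn. The fitted IsolationForest model stands in for the learning dictionary data, MinMaxScaler stands in for the normalization data, the default values of L and T are placeholders, the [-0.5, 1.5]^M second region follows the example in claim 9, and enumerating all L^M cells as done here is only practical for small M; none of these choices are prescribed by the disclosure.

from itertools import product

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import MinMaxScaler


def build_learning_dictionary(train, l_value=4, t_value=3, seed=0):
    rng = np.random.default_rng(seed)

    # Normalization step: distribute the training data over the first region [0, 1]^M.
    scaler = MinMaxScaler().fit(train)
    train_norm = scaler.transform(train)
    n, m = train_norm.shape

    # Division step: split the second region [-0.5, 1.5]^M into L^M hypercubes
    # of equal side length.
    low, high = -0.5, 1.5
    cell_size = (high - low) / l_value

    # Count the number S of data elements contained in each cell.
    cell_index = np.floor((train_norm - low) / cell_size).astype(int)
    cell_index = np.clip(cell_index, 0, l_value - 1)
    counts = {}
    for idx in map(tuple, cell_index):
        counts[idx] = counts.get(idx, 0) + 1

    # First noise addition step: every cell with S < T receives (T - S)
    # uniformly distributed noise elements.
    noise_blocks = []
    for cell in product(range(l_value), repeat=m):
        s = counts.get(cell, 0)
        if s < t_value:
            lo = low + np.array(cell) * cell_size
            noise_blocks.append(rng.uniform(lo, lo + cell_size, size=(t_value - s, m)))

    # Generation step: the data elements plus the noise elements.
    noisy_train = np.vstack([train_norm] + noise_blocks) if noise_blocks else train_norm

    # Learning dictionary data output step: the fitted model stands in for
    # the learning dictionary data.
    model = IsolationForest(n_estimators=100, random_state=seed).fit(noisy_train)
    return scaler, model

 Used together with the parameter sweep and evaluation sketches shown earlier, such a function would be called once per candidate parameter combination, and the scaler would also be handed to the determination side so that judgment target data and test data can be normalized with the same data as the training data.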
 The present disclosure can also be realized as an information processing device that generates learning dictionary data using training data and provides this learning dictionary data to an abnormality determination device that executes abnormality determination, as described in the above embodiments. It can further be realized as an anomaly detection system including this information processing device and the abnormality determination device. The abnormality determination device is, for example, a monitoring ECU that realizes the abnormality determination unit connected to the in-vehicle network 210 in an anomaly detection system configured as shown in FIG. 1A or FIG. 1C, and is the external server 10 that realizes the abnormality determination unit in an anomaly detection system configured as shown in FIG. 1B. In either case, it includes a memory storing the learning dictionary data output from the information processing device and a processor, and is connected to a network. This network is typically the in-vehicle CAN network described above, but is not limited thereto.
 For example, the network may be a network such as CAN-FD (CAN with Flexible Data rate), FlexRay, Ethernet, LIN (Local Interconnect Network), or MOST (Media Oriented Systems Transport). Alternatively, it may be an in-vehicle network in which these networks are combined, as sub-networks, with a CAN network.
 In the above embodiments, each constituent element may be a circuit. Multiple constituent elements may constitute a single circuit as a whole, or may each constitute a separate circuit. Each circuit may be a general-purpose circuit or a dedicated circuit.
 The information processing device and other aspects according to one or more aspects have been described above based on the embodiments, but the present disclosure is not limited to these embodiments. Forms obtained by applying various modifications conceived by those skilled in the art to the embodiments, and forms constructed by combining constituent elements of different embodiments, may also be included within the scope of one or more aspects, as long as they do not depart from the gist of the present disclosure.
 For example, in the above embodiments, a process executed by a specific constituent element may be executed by another constituent element instead. The order of multiple processes may be changed, and multiple processes may be executed in parallel.
 The present disclosure is applicable to an in-vehicle network system including an in-vehicle network.
 DESCRIPTION OF REFERENCE SIGNS
 10 external server
 20 vehicle
 100, 100A, 100B, 100C anomaly detection system
 110 abnormality determination unit
 112 storage unit
 114 determination target data reception unit
 116 determination target data conversion unit
 118 determination execution unit
 120 learning unit
 122 training data reception unit
 124 learning dictionary generation unit
 210 in-vehicle network

Claims (13)

  1.  An information processing device comprising a processor, wherein the processor executes:
     a data element acquisition step of receiving input of N data elements (N is an integer of 2 or more) that are M-dimensional vectors (M is an integer of 2 or more) used as training data for Isolation Forest;
     a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region;
     a division step of dividing an M-dimensional second region, which is larger than the first region and encompasses the first region, into L^M third regions (L is an integer of 4 or more) that are M-dimensional hypercubes of equal size;
     a first noise addition step of obtaining a number S (S is an integer of 0 or more) of the data elements contained in each of the third regions and adding, in a uniform distribution, (T - S) noise elements that are M-dimensional vectors to each third region containing fewer of the data elements than a first threshold T (T is a natural number);
     a generation step of generating noise-added training data including the data elements and the noise elements; and
     a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  2.  The information processing device according to claim 1, wherein the processor
     executes a first determination step of determining whether N is equal to or greater than a predetermined second threshold, and,
     when it is determined in the first determination step that N is not equal to or greater than the second threshold, executes the division step and the first noise addition step before executing the generation step and the learning dictionary data output step.
  3.  The information processing device according to claim 2, wherein,
     when it is determined in the first determination step that N is equal to or greater than the second threshold, the processor executes a second noise addition step of adding, in a uniform distribution within the second region, K noise elements (K is a natural number smaller than N) that are M-dimensional vectors, before executing the generation step and the learning dictionary data output step.
  4.  The information processing device according to any one of claims 1 to 3, wherein the processor further
     executes, when it is determined in the first determination step that N is not equal to or greater than the second threshold, a test data acquisition step of receiving input of test data for Isolation Forest and a second determination step of determining whether N is equal to or greater than a predetermined third threshold,
     when it is determined in the second determination step that N is not equal to or greater than the third threshold, executes the set of the division step, the first noise addition step, the generation step, and the learning dictionary data output step multiple times using different values of L in the division step to output a plurality of sets of the learning dictionary data, and further executes an evaluation step of executing anomaly detection on the test data using each of the plurality of sets of learning dictionary data and evaluating each of the plurality of sets of learning dictionary data based on results of the anomaly detection, and a learning dictionary data selection step of selecting the best learning dictionary data from the plurality of sets of learning dictionary data based on a result of the evaluation step, and
     when it is determined in the second determination step that N is equal to or greater than the third threshold, executes the set once using a predetermined value of L in the division step.
  5.  The information processing device according to claim 4, wherein, when it is determined in the second determination step that N is not equal to or greater than the third threshold, the processor determines the number of the different values of L so as to have a negative correlation with the value of N.
  6.  The information processing device according to any one of claims 1 to 5, wherein, in the first noise addition step, the processor determines, as the value of the first threshold T, a number smaller than the median of the numbers of the data elements contained in the respective third regions located within the first region.
  7.  The information processing device according to any one of claims 1 to 3, wherein the processor
     executes, when it is determined in the first determination step that N is equal to or greater than the second threshold, a test data acquisition step of receiving input of test data for Isolation Forest and a third determination step of determining whether N is equal to or greater than a predetermined fourth threshold,
     when it is determined in the third determination step that N is not equal to or greater than the fourth threshold, executes the set of the second noise addition step, the generation step, and the learning dictionary data output step multiple times using different values of K in the second noise addition step to output a plurality of sets of the learning dictionary data, and further executes an evaluation step of executing anomaly detection on the test data using each of the plurality of sets of learning dictionary data and evaluating each of the plurality of sets of learning dictionary data, and a learning dictionary data selection step of selecting the best learning dictionary data from the plurality of sets of learning dictionary data based on a result of the evaluation step, and
     when it is determined in the third determination step that N is equal to or greater than the fourth threshold, executes the set once using a predetermined value of K in the second noise addition step.
  8.  The information processing device according to claim 7, wherein, when it is determined in the third determination step that N is not equal to or greater than the fourth threshold, the processor determines the number of the different values of K so as to have a negative correlation with the value of N.
  9.  The information processing device according to any one of claims 1 to 8, wherein, when the first region is a region defined by a [0, 1]^M hypercube in an M-dimensional space, the second region is a region defined by a [-0.5, 1.5]^M hypercube in the space.
  10.  An anomaly detection system comprising:
     the information processing device according to any one of claims 1 to 9; and
     an abnormality determination device that includes a memory storing learning dictionary data output from the information processing device and a processor, and that is connected to a network, wherein the processor acquires data flowing through the network and executes abnormality determination on the acquired data based on the learning dictionary data stored in the memory.
  11.  The anomaly detection system according to claim 10, wherein the network is an in-vehicle Controller Area Network.
  12.  An information processing method executed using an information processing device comprising a processor, the method comprising causing the processor to execute:
     a data element acquisition step of receiving input of N data elements (N is an integer of 2 or more) that are M-dimensional vectors (M is an integer of 2 or more) used as training data for Isolation Forest;
     a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region;
     a division step of dividing an M-dimensional second region, which is larger than the first region and encompasses the first region, into L^M third regions (L is an integer of 4 or more) that are M-dimensional hypercubes of equal size;
     a first noise addition step of obtaining a number S (S is an integer of 0 or more) of the data elements contained in each of the third regions and adding, in a uniform distribution, (T - S) noise elements that are M-dimensional vectors to each third region containing fewer of the data elements than a first threshold T (T is a natural number);
     a generation step of generating noise-added training data including the data elements and the noise elements; and
     a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  13.  A program that causes a processor included in a computer to execute the information processing method according to claim 12.
PCT/JP2017/040727 2016-12-06 2017-11-13 Information processing device, information processing method, and program WO2018105320A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201780022736.0A CN109074519B (en) 2016-12-06 2017-11-13 Information processing apparatus, information processing method, and program
EP17877549.0A EP3553712B1 (en) 2016-12-06 2017-11-13 Information processing device, information processing method, and program
US16/255,877 US10601852B2 (en) 2016-12-06 2019-01-24 Information processing device, information processing method, and recording medium storing program

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201662430570P 2016-12-06 2016-12-06
US62/430570 2016-12-06
JP2017-207085 2017-10-26
JP2017207085A JP6782679B2 (en) 2016-12-06 2017-10-26 Information processing equipment, information processing methods and programs

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/255,877 Continuation US10601852B2 (en) 2016-12-06 2019-01-24 Information processing device, information processing method, and recording medium storing program

Publications (1)

Publication Number Publication Date
WO2018105320A1 true WO2018105320A1 (en) 2018-06-14

Family

ID=62491996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/040727 WO2018105320A1 (en) 2016-12-06 2017-11-13 Information processing device, information processing method, and program

Country Status (1)

Country Link
WO (1) WO2018105320A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013084175A (en) * 2011-10-12 2013-05-09 Sony Corp Information processing apparatus, information processing method, and program
US20150078654A1 (en) * 2013-09-13 2015-03-19 Interra Systems, Inc. Visual Descriptors Based Video Quality Assessment Using Outlier Model
JP2016133895A (en) * 2015-01-16 2016-07-25 キヤノン株式会社 Information processing device, information processing method, and program

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAGA, TOMOYUKI ET AL.: "Proposal of statistical abnormality detecting system for vehicle-mounted network using Cloud", SCIS2016 SYMPOSIUM ON CRYPTOGRAPHY AND INFORMATION SECURITY, 19 January 2016 (2016-01-19), pages 1 - 8, XP009515155 *
LIU, FEI TONY ET AL.: "Isolation-Based Anomaly Detection", ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA (TKDD), vol. 6, no. 1, 1 March 2012 (2012-03-01), pages 1 - 39, XP055492079 *
See also references of EP3553712A4 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109345137A (en) * 2018-10-22 2019-02-15 广东精点数据科技股份有限公司 A kind of rejecting outliers method based on agriculture big data
CN109508738A (en) * 2018-10-31 2019-03-22 北京国双科技有限公司 A kind of information processing method and relevant device
CN109948738A (en) * 2019-04-11 2019-06-28 合肥工业大学 Energy consumption method for detecting abnormality, the apparatus and system of coating drying room
CN110243599A (en) * 2019-07-02 2019-09-17 西南交通大学 Multidimensional peels off train EMU axle box bearing temperature anomaly state monitoring method
CN114019940A (en) * 2020-03-02 2022-02-08 阿波罗智联(北京)科技有限公司 Method and apparatus for detecting anomalies
CN114035544A (en) * 2020-03-02 2022-02-11 阿波罗智联(北京)科技有限公司 Method and apparatus for detecting anomalies
WO2023127111A1 (en) * 2021-12-28 2023-07-06 富士通株式会社 Generation method, generation program, and information processing device

Similar Documents

Publication Publication Date Title
JP6782679B2 (en) Information processing equipment, information processing methods and programs
WO2018105320A1 (en) Information processing device, information processing method, and program
US10437992B2 (en) Anomaly detection for vehicular networks for intrusion and malfunction detection
CN107005790B (en) Collaborative security in wireless sensor networks
JP2023068037A (en) Vehicle abnormality detection server, vehicle abnormality detection system, and vehicle abnormality detection method
CA2995864A1 (en) Multi-modal, multi-disciplinary feature discovery to detect cyber threats in electric power grid
US11595431B2 (en) Information processing apparatus, moving apparatus, and method
US20210273966A1 (en) Anomaly detection method and anomaly detection device
EP4075726A1 (en) Unified multi-agent system for abnormality detection and isolation
US11856006B2 (en) Abnormal communication detection apparatus, abnormal communication detection method and program
CN113328985B (en) Passive Internet of things equipment identification method, system, medium and equipment
Zhang et al. Dual generative adversarial networks based unknown encryption ransomware attack detection
WO2020208639A2 (en) A system and method for detection of anomalous controller area network (can) messages
JP6939898B2 (en) Bit assignment estimation device, bit assignment estimation method, program
US11972334B2 (en) Method and apparatus for generating a combined isolation forest model for detecting anomalies in data
US11227051B2 (en) Method for detecting computer virus, computing device, and storage medium
JP2021179935A (en) Vehicular abnormality detection device and vehicular abnormality detection method
JP2015526826A (en) System and method for state-based test case generation for software verification
US20230007034A1 (en) Attack analyzer, attack analysis method and attack analysis program
Kleberger et al. Towards designing secure in-vehicle network architectures using community detection algorithms
US9218486B2 (en) System and method for operating point and box enumeration for interval bayesian detection
JP2022153081A (en) Attack analysis device, attack analysis method, and attack analysis program
JP6979630B2 (en) Monitoring equipment, monitoring methods and programs
US20240202336A1 (en) Method and system for incremental centroid clustering
Wang et al. An Intrusion Detection System Based on the Double-Decision-Tree Method for In-Vehicle Network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17877549

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017877549

Country of ref document: EP

Effective date: 20190708