WO2018105320A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program Download PDF

Info

Publication number
WO2018105320A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
learning dictionary
noise
training data
learning
Prior art date
Application number
PCT/JP2017/040727
Other languages
French (fr)
Japanese (ja)
Inventor
良太 高橋
崇光 佐々木
Original Assignee
Panasonic Intellectual Property Corporation of America
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP2017207085A (JP6782679B2)
Application filed by Panasonic Intellectual Property Corporation of America
Priority to CN201780022736.0A (CN109074519B)
Priority to EP17877549.0A (EP3553712B1)
Publication of WO2018105320A1
Priority to US16/255,877 (US10601852B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Description

  • This disclosure relates to an abnormality detection technology used in an in-vehicle network or the like.
  • Automobiles are equipped with a large number of electronic control units (Electronic Control Units, hereinafter referred to as ECUs) for controlling various systems.
  • The ECUs are connected to an in-vehicle network, and communication is performed through the in-vehicle network in order to realize various functions of the automobile.
  • One such in-vehicle network standard is CAN (Controller Area Network).
  • A network conforming to the CAN protocol can be constructed as a closed communication path on a single vehicle.
  • However, it is not uncommon for the in-vehicle network of each automobile to be built and installed as a network that can be accessed from the outside.
  • For example, an in-vehicle network may be provided with a port for extracting information flowing through the network for the purpose of diagnosing each in-vehicle system, or a car navigation system with a wireless LAN function may be connected to it. Allowing external access to the in-vehicle network can improve convenience for automobile users, but it also increases threats.
  • An attack frame is an abnormal frame that differs in some way from a normal frame that flows through an in-vehicle network that is not attacked.
  • A technique is disclosed in which an abnormal data detection process for a frame flowing on a CAN bus (hereinafter also referred to as a CAN message or simply a message) is executed using an evaluation model obtained as a result of learning using learning data (see Patent Document 1 and Patent Document 2).
  • This disclosure provides an information processing apparatus and the like that are useful for detecting anomalies due to attacks in an in-vehicle network of a vehicle such as an automobile.
  • An information processing apparatus according to one aspect of the present disclosure includes a processor, and the processor executes: a data element acquisition step of receiving input of N data elements (N is an integer of 2 or more), each of which is an M-dimensional vector (M is an integer of 2 or more), used as training data for Isolation Forest; a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and includes the first region, into L^M third regions (L is an integer of 4 or more) of equal size; a first noise addition step of acquiring the number S (S is an integer of 0 or more) of data elements included in each of the third regions and adding noise elements to each of the third regions that includes fewer data elements than a first threshold T (T is a natural number); a generation step of generating noise-added training data including the data elements and the noise elements; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  • An information processing method according to one aspect of the present disclosure is an information processing method executed using an information processing apparatus including a processor, and includes: a data element acquisition step in which the processor receives input of N data elements (N is an integer of 2 or more), each of which is an M-dimensional vector (M is an integer of 2 or more), used as training data for Isolation Forest; a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and includes the first region, into L^M third regions (L is an integer of 4 or more) of equal size; a noise addition step of acquiring the number S (S is an integer of 0 or more) of data elements included in each of the third regions and adding noise elements to each of the third regions that includes fewer data elements than a first threshold T (T is a natural number); a generation step of generating noise-added training data; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  • According to the present disclosure, it is possible to provide an information processing apparatus and the like that can quickly provide a learning dictionary which is used for detecting abnormalities due to attacks in an in-vehicle network of a vehicle such as an automobile and which has a reduced false detection rate.
  • FIG. 1A is a block diagram illustrating a configuration example of an abnormality detection system including an information processing device according to Embodiment 1.
  • FIG. 1B is a block diagram illustrating a configuration example of an abnormality detection system including the information processing apparatus according to Embodiment 1.
  • FIG. 1C is a block diagram illustrating a configuration example of an abnormality detection system including the information processing apparatus according to Embodiment 1.
  • FIG. 2 is a block diagram illustrating a configuration example of an abnormality determination unit and a learning unit that configure the above-described abnormality detection system.
  • FIG. 3 is a schematic diagram for explaining a learning dictionary generated by the learning unit using training data.
  • FIG. 4 is a schematic diagram for explaining the abnormality determination by the abnormality determination unit.
  • FIG. 5 is a diagram showing a data flow in the learning unit that generates the learning dictionary.
  • FIG. 6 is a diagram illustrating a data flow in the abnormality determination unit that performs abnormality determination.
  • FIG. 7 is a diagram illustrating an example of an inappropriate determination boundary that does not fit the distribution of training data.
  • FIG. 8 is a flowchart illustrating an example of a training data processing method for obtaining an appropriate learning dictionary, which is executed in the abnormality detection system.
  • FIG. 9A is a diagram illustrating an example of training data before normalization distributed in an M-dimensional space.
  • FIG. 9B is a diagram illustrating an example of training data after normalization distributed in an M-dimensional space.
  • FIG. 9C is a diagram illustrating an example of training data after addition of noise elements distributed in the M-dimensional space.
  • FIG. 10 is a flowchart showing another example of the training data processing method for obtaining an appropriate learning dictionary, which is executed in the abnormality detection system.
  • FIG. 11A is a diagram for explaining an example of division of an M-dimensional region in the M-dimensional space.
  • FIG. 11B is a diagram for describing an example of training data after adding noise elements distributed in an M-dimensional space.
  • FIG. 12A is a diagram illustrating a determination boundary of a learning dictionary generated using training data without adding noise and a determination boundary of a learning dictionary generated using the same training data added with noise.
  • FIG. 12B is a diagram illustrating a determination boundary of a learning dictionary generated using training data without adding noise and a determination boundary of a learning dictionary generated using the same training data added with noise.
  • FIG. 12C is a bar graph showing a false detection rate in an abnormality detection test performed using each learning dictionary whose determination boundaries are shown in FIGS. 12A and 12B.
  • FIG. 13 is a flowchart illustrating an example of a processing method, executed in the abnormality detection system according to the second embodiment, for determining which training data processing method to select and whether to perform a parameter search in each processing method.
  • FIG. 14 is a flowchart illustrating an example of a processing method for obtaining a more appropriate learning dictionary, which is executed in the abnormality detection system according to the second embodiment.
  • FIG. 15 is a flowchart illustrating another example of the processing method for obtaining a more appropriate learning dictionary, which is executed in the abnormality detection system according to the second embodiment.
  • Another approach is to monitor CAN messages flowing through the in-vehicle network.
  • This method can be realized by adding a monitoring ECU (node) to each vehicle, and is relatively easy to introduce.
  • Proposed monitoring methods can be roughly divided into three types: rule-based methods, methods that use the data transmission cycle, and methods that detect outliers in message contents using LOF (Local Outlier Factor).
  • The rule-based method and the method using the data transmission cycle can deal with known attack patterns, but to detect an unknown attack pattern, detection based on the content of the message, such as a method using LOF, is necessary.
  • However, an ECU connected to an in-vehicle network does not always have sufficient data processing capacity and storage capacity, and even in such an execution environment, a method is not practical if it cannot perform detection at the speed required for a car traveling on a road at several tens of kilometers per hour.
  • Therefore, the present inventors conceived of using an anomaly detection algorithm called Isolation Forest or iForest (see Non-Patent Document 1), which requires less retained data and a smaller amount of calculation than LOF, as an anomaly detection method for an in-vehicle network. Furthermore, the present inventors propose techniques that, when using Isolation Forest, enable anomaly detection to be executed at the required speed and with the highest possible accuracy even with limited computer resources.
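  • As a concrete illustration of how such an Isolation Forest detector can be trained and applied, the following is a minimal sketch in Python using scikit-learn's IsolationForest; the synthetic data, parameter values, and variable names are illustrative assumptions and are not taken from the patent text.

```python
# Minimal sketch: train an Isolation Forest on normal CAN payload feature
# vectors and score unseen messages. Values and names are illustrative only.
import numpy as np
from sklearn.ensemble import IsolationForest

# Training data: N rows of M-dimensional feature vectors (stand-in for normal traffic).
rng = np.random.default_rng(0)
train = rng.normal(loc=0.5, scale=0.1, size=(1000, 8))

model = IsolationForest(n_estimators=100, random_state=0)
model.fit(train)

# Score unseen messages: predict() returns +1 for inliers, -1 for outliers.
unseen = np.vstack([rng.normal(0.5, 0.1, size=(3, 8)),     # normal-looking
                    rng.uniform(-2.0, 2.0, size=(2, 8))])  # abnormal-looking
print(model.predict(unseen))
print(model.decision_function(unseen))  # higher value = more normal
```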
  • An information processing apparatus according to an aspect of the present disclosure is an information processing apparatus including a processor, and the processor executes: a data element acquisition step of receiving input of N data elements (N is an integer of 2 or more), each of which is an M-dimensional vector (M is an integer of 2 or more), used as training data for Isolation Forest; a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and includes the first region, into L^M third regions (L is an integer of 4 or more) of equal size; a first noise addition step of acquiring the number S (S is an integer of 0 or more) of data elements included in each of the third regions and adding noise elements to each of the third regions that includes fewer data elements than a first threshold T (T is a natural number); a generation step of generating noise-added training data including the data elements and the noise elements; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  • Further, the processor may execute a first determination step of determining whether N is equal to or greater than a predetermined second threshold value, and when it is determined in the first determination step that N is not equal to or greater than the second threshold value, the processor may execute the generation step and the learning dictionary data output step after executing the division step and the first noise addition step.
  • When it is determined in the first determination step that N is equal to or greater than the second threshold value, the processor may execute the generation step and the learning dictionary data output step after executing a second noise addition step of adding K noise elements (K is a natural number smaller than N), each of which is an M-dimensional vector, to the second region at a density different from that of the data elements.
  • Further, when it is determined in the first determination step that N is not equal to or greater than the second threshold value, the processor may execute a test data acquisition step of receiving input of test data for Isolation Forest and a second determination step of determining whether N is equal to or greater than a predetermined third threshold value. When it is determined in the second determination step that N is not equal to or greater than the third threshold value, the processor may execute the set of the above steps a plurality of times using different values of L in the division step to output a plurality of learning dictionary data, execute abnormality detection on the test data using each of the plurality of learning dictionary data, execute an evaluation step of evaluating each of the plurality of learning dictionary data based on the result of the abnormality detection, and execute a learning dictionary data selection step of selecting the best learning dictionary data from the plurality of learning dictionary data based on the result of the evaluation step. When it is determined in the second determination step that N is equal to or greater than the third threshold value, the set may be executed once using a predetermined value of L in the division step.
  • the learning dictionary can be generated at a speed suitable for the execution environment.
  • the processor may determine the number of different values of L so as to have a negative correlation with the value of N.
  • the learning dictionary can be generated at a speed suitable for the execution environment.
  • For example, the processor may determine, as the value of the first threshold T, any number smaller than the median of the numbers of data elements included in the third regions within the first region.
  • the learning dictionary can be generated at a speed suitable for the execution environment.
  • Further, when it is determined in the first determination step that N is equal to or greater than the second threshold value, the processor may execute a test data acquisition step of receiving input of test data for Isolation Forest and a third determination step of determining whether N is equal to or greater than a predetermined fourth threshold value. When it is determined in the third determination step that N is not equal to or greater than the fourth threshold value, the processor may execute the set of the above steps a plurality of times using different values of K in the second noise addition step to output a plurality of learning dictionary data, execute abnormality detection on the test data using each of the plurality of learning dictionary data, execute an evaluation step of evaluating each of the plurality of learning dictionary data based on the result of the abnormality detection, and execute a learning dictionary data selection step of selecting the best learning dictionary data from the plurality of learning dictionary data based on the result of the evaluation step. When it is determined in the third determination step that N is equal to or greater than the fourth threshold value, the set may be executed once using a predetermined value of K in the second noise addition step.
  • the learning dictionary can be generated at a speed suitable for the execution environment.
  • the processor may determine the number of different values of K so as to have a negative correlation with the value of N.
  • the learning dictionary can be generated at a speed suitable for the execution environment.
  • For example, the first region may be a region defined by the hypercube [0, 1]^M in the M-dimensional space, and the second region may be a region defined by the hypercube [-0.5, 1.5]^M in the M-dimensional space.
  • An abnormality detection system according to one aspect of the present disclosure includes any one of the information processing apparatuses described above and an abnormality determination device that includes a memory storing the learning dictionary data output from the information processing apparatus and a processor and that is connected to a network, and the processor of the abnormality determination device acquires data flowing through the network and executes abnormality determination of the acquired data based on the learning dictionary data stored in the memory.
  • abnormality detection is performed using a learning dictionary that is updated quickly in consideration of accuracy.
  • An information processing method according to an aspect of the present disclosure is an information processing method executed using an information processing apparatus including a processor, and includes: a data element acquisition step in which the processor receives input of N data elements (N is an integer of 2 or more), each of which is an M-dimensional vector (M is an integer of 2 or more), used as training data for Isolation Forest; a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and includes the first region, into L^M third regions (L is an integer of 4 or more) of equal size; a noise addition step of acquiring the number S (S is an integer of 0 or more) of data elements included in each of the third regions and adding noise elements to each of the third regions that includes fewer data elements than a first threshold T (T is a natural number); a generation step of generating noise-added training data; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  • a program according to an aspect of the present disclosure is a program that causes a processor included in a computer to execute the above information processing method.
  • FIGS. 1A to 1C are block diagrams respectively showing a configuration example of an abnormality detection system including an information processing apparatus according to Embodiment 1.
  • 1A to 1C show abnormality detection systems 100A, 100B, and 100C having different configurations, respectively.
  • the anomaly detection systems 100A to 100C are systems that detect an anomaly of data flowing through a network to be monitored using an algorithm called Isolation Forest, and each includes an anomaly determination unit 110 and a learning unit 120.
  • the abnormality determination unit 110 determines whether data flowing through the in-vehicle network 210 included in the vehicle 20 is normal or abnormal.
  • the vehicle 20 is an automobile, for example.
  • the in-vehicle network 210 is a network corresponding to, for example, a CAN standard, and includes a bus, a plurality of ECUs and diagnostic ports connected to the bus in each of the configuration examples of FIGS. 1A to 1C.
  • the plurality of ECUs include ECUs having different functions such as an ECU that collects and analyzes measurement data from various sensors, an ECU that controls an engine, an ECU that controls a brake, and an ECU that monitors a network.
  • the data flowing through the in-vehicle network 210 is message data flowing through the bus.
  • the learning unit 120 performs prior learning for the abnormality determination unit 110 to perform the above determination. More specifically, the learning unit 120 learns using the training data, and generates a learning dictionary that the abnormality determination unit 110 uses for the above determination.
  • the generated learning dictionary data (hereinafter also referred to as learning dictionary data) is stored, for example, in a storage device (not shown).
  • The abnormality determination unit 110 reads the learning dictionary from the storage device, and determines whether unknown data to be judged as normal or abnormal, that is, message data acquired from the in-vehicle network 210, is abnormal based on whether or not it deviates from the learning dictionary. More specifically, the learning dictionary generated by the learning unit 120 includes a plurality of binary trees, and the abnormality determination unit 110 determines whether the data is abnormal using the average value of the scores calculated from the plurality of binary trees. This binary tree used in Isolation Forest is called an Isolation Tree or iTree.
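  • For background, the standard Isolation Forest anomaly score from Non-Patent Document 1 (the iForest paper) is reproduced below; the patent text itself does not spell out this formula, so it is shown here only as supplementary context for the score averaged over the iTrees.

```latex
% Standard Isolation Forest anomaly score (Liu et al.), shown for background.
% E(h(x)) is the average path length of sample x over the iTrees, and c(n) is
% the average path length of an unsuccessful search in a binary search tree of
% n samples, with H(i) the harmonic number (approximately ln(i) + 0.5772).
s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}, \qquad
c(n) = 2H(n-1) - \frac{2(n-1)}{n}
```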
  • The abnormality determination unit 110 and the learning unit 120 are functional components provided by a processor that reads and executes a predetermined program. The configuration examples shown in FIGS. 1A to 1C differ in where the processors that provide these functional components are located.
  • the learning unit 120 is provided by a processor and a memory included in the external server 10 that is a so-called server computer outside the vehicle 20.
  • the external server 10 is one example of the information processing apparatus in the present embodiment.
  • the learning unit 120 acquires, for example, a message flowing through the in-vehicle network 210 as training data from the vehicle 20 via the communication network.
  • the learning unit 120 also outputs Isolation Forest learning dictionary data generated using the training data, and provides it to the abnormality determination unit 110 of the vehicle 20 via the communication network.
  • The learning dictionary data is stored in a storage device, such as a flash memory of a microcontroller included in a monitoring ECU for network monitoring connected to the in-vehicle network 210, and the abnormality determination unit 110 is provided by the processor of the microcontroller.
  • The abnormality determination unit 110 performs abnormality determination on the message acquired from the bus using the learning dictionary data acquired from the storage device.
  • learning dictionary data updated after shipment of the vehicle 20 can be provided to the abnormality determination unit 110.
  • both the abnormality determination unit 110 and the learning unit 120 are provided by a processor and a memory included in the external server 10 outside the vehicle 20.
  • Such an external server 10 is also an example of the information processing apparatus in the present embodiment.
  • the learning unit 120 acquires, for example, a message flowing through the in-vehicle network 210 as training data from the vehicle 20 via the communication network.
  • The learning unit 120 outputs the learning dictionary data of the Isolation Forest generated using the training data, but the output destination is not outside the external server 10 but a storage device provided in the external server 10, for example, a hard disk drive (not illustrated).
  • the abnormality determination is performed on the external server 10 instead of on the vehicle 20. That is, the message flowing through the in-vehicle network 210 is transmitted to the external server 10 via the communication network. This message received by the external server 10 is input to the abnormality determination unit 110.
  • Abnormality determination unit 110 acquires learning dictionary data from the storage device, performs abnormality determination of the message using the learning dictionary data, and transmits the result to vehicle 20 via the communication network.
  • the learning dictionary data used by the abnormality determination unit 110 in the external server 10 is updated as needed.
  • both the abnormality determination unit 110 and the learning unit 120 are provided by a microcontroller provided in a monitoring ECU that is connected to the in-vehicle network 210 of the vehicle 20 and monitors the in-vehicle network 210.
  • the monitoring ECU 10 is one example of the information processing apparatus in the present embodiment.
  • the learning unit 120 directly acquires and uses, for example, a message flowing through the in-vehicle network 210 as training data.
  • The learning unit 120 outputs the learning dictionary data of Isolation Forest generated using the training data, but the output destination is not outside the vehicle 20 but a storage device on the vehicle 20, for example, a flash memory in the monitoring ECU.
  • learning dictionary generation and abnormality determination are performed on the vehicle 20.
  • the learning unit 120 acquires message data flowing through the in-vehicle network 210 to which the monitoring ECU is connected, and uses it as training data to generate a learning dictionary.
  • the generated learning dictionary data is stored in the storage device of the monitoring ECU.
  • the abnormality determination unit 110 further acquires learning dictionary data from the storage device, and executes abnormality determination of the message using the learning dictionary data.
  • the learning dictionary data used by the abnormality determination unit 110 on the vehicle 20 can be updated.
  • each configuration shown in FIGS. 1A to 1C may be a configuration that can be dynamically changed on the vehicle 20 instead of a fixed configuration on the vehicle 20 after shipment. For example, depending on the communication speed between the vehicle 20 and the external server 10, the usage rate of the computer resources of the monitoring ECU, the remaining power amount when the vehicle 20 is an electric vehicle, or the operation of the driver, between these configurations Switching may be possible.
  • FIG. 2 is a block diagram illustrating a configuration example of the abnormality determination unit 110 and the learning unit 120 included in the abnormality detection system 100.
  • the learning unit 120 includes a training data receiving unit 122 and a learning dictionary generating unit 124.
  • the training data receiving unit 122 receives input of training data.
  • the training data here is two or more M-dimensional vectors, and M is an integer of 2 or more.
  • the value of each dimension is a value of each byte from the beginning of the payload of the CAN message having a maximum of 8 bytes, for example.
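  • A minimal sketch of this feature extraction is given below, assuming that each CAN message payload of up to 8 bytes is mapped to one byte value per dimension and that missing bytes are padded with 0; the function name and the padding choice are assumptions made for illustration.

```python
# Sketch: map a CAN payload (at most 8 bytes) to an 8-dimensional feature
# vector, one byte value per dimension. Zero-padding is an illustrative choice.
from typing import List

def payload_to_vector(payload: bytes, dims: int = 8) -> List[int]:
    if len(payload) > dims:
        raise ValueError("CAN payload is at most 8 bytes")
    return [payload[i] if i < len(payload) else 0 for i in range(dims)]

print(payload_to_vector(bytes([0x12, 0x34, 0xFF])))
# -> [18, 52, 255, 0, 0, 0, 0, 0]
```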
  • the learning unit 120 generates learning dictionary data using the training data received by the training data receiving unit 122, and outputs the learning dictionary data to a storage unit 112 of the abnormality determination unit 110 described later.
  • In FIG. 3, the data elements are a point group distributed in the M-dimensional space, each point is indicated by a white circle, and the learning dictionary is represented by a boundary in the M-dimensional space indicated by a thick solid line.
  • this boundary is also referred to as a determination boundary.
  • the determination boundary is a boundary line.
  • the abnormality determination unit 110 includes a storage unit 112, a determination target data reception unit 114, a determination target data conversion unit 116, and a determination execution unit 118.
  • the storage unit 112 stores the learning dictionary data output from the learning unit 120 as described above. In addition, data used for conversion of determination target data described later is also stored in the storage unit 112.
  • the determination target data receiving unit 114 acquires data that is a target of abnormality determination, that is, a CAN message from the in-vehicle network 210.
  • the determination target data conversion unit 116 converts the CAN message received by the determination target data reception unit 114 into a format for processing by the determination execution unit 118. In this conversion, for example, extraction of a determination target portion from the CAN message, normalization using the data for conversion of the determination target data, and the like are performed. The normalization will be described later.
  • The determination execution unit 118 performs abnormality determination, that is, determines whether the determination target data is normal or abnormal, based on the learning dictionary stored as learning dictionary data in the storage unit 112.
  • FIG. 4 is a schematic diagram for explaining this abnormality determination.
  • In FIG. 4, two pieces of data, determination target data A and determination target data B, are plotted in the M-dimensional space based on their values.
  • the determination execution unit 118 determines whether each data is normal or abnormal based on whether the data is positioned inside or outside the determination boundary of the learning dictionary, and outputs the result.
  • determination target data A located inside the determination boundary is determined to be normal
  • determination target data B positioned outside the determination boundary is determined to be abnormal.
  • When an abnormality is detected, the monitoring ECU including the abnormality determination unit 110 and the learning unit 120 executes, for example, another program that receives the determination result as input and outputs an error message to the bus, or transmits a command that restricts part or all of the functions of another ECU or shifts another ECU to a special operation mode corresponding to the abnormality.
  • the notification of abnormality occurrence toward the driver of the vehicle 20 may be issued by display on the instrument panel or by voice.
  • information regarding the occurrence of an abnormality may be recorded in a log. This log is acquired and used, for example, by a mechanic of the vehicle 20 through a diagnostic port included in the in-vehicle network 210.
  • Each component of the abnormality determination unit 110 and the learning unit 120 executes a part of the Isolation Forest algorithm, and cooperates as described above to execute the entire Isolation Forest algorithm.
  • FIG. 5 is a diagram illustrating a data flow in the learning unit 120 that generates the learning dictionary.
  • FIG. 6 is a diagram illustrating a data flow in the abnormality determination unit 110 that performs abnormality determination. These diagrams are based on a sequence diagram showing the flow of data, and are also represented in a form that also serves as a flowchart showing the processing order in each unit.
  • The training data receiving unit 122 receives input and acquires the training data (step S51). If the learning dictionary is generated before the vehicle 20 is shipped, the input source of the training data at this stage is, for example, a location in a storage device that is manually specified or preset. If the learning dictionary is generated after the vehicle 20 is shipped, the input source is, for example, the in-vehicle network 210 to which the monitoring ECU including the learning unit 120 is connected.
  • The learning dictionary generation unit 124 normalizes the input training data (step S52), and generates a learning dictionary by the Isolation Forest method using the normalized training data (step S53).
  • Normalization is a calculation process that converts the original distribution range of the input training data in the M-dimensional space so that the distribution range falls within a predetermined region in the same space while maintaining the relative positional relationship of the training data.
  • the generated learning dictionary data is transferred to the abnormality determination unit 110 (step S54), and the abnormality determination unit 110 stores the learning dictionary data in the storage unit 112 (step S55).
  • the data used in the normalization calculation process is also passed from the learning unit 120 to the abnormality determination unit 110.
  • This data includes the maximum and minimum values of each component of the feature vector necessary for conversion.
  • normalization of unknown data that is a determination target is executed using this data.
  • The determination target data receiving unit 114 acquires the data of a CAN message that is a target of abnormality determination from the in-vehicle network 210 (step S61).
  • Next, the determination execution unit 118 reads the learning dictionary data stored in the storage unit 112 (step S62). Further, the determination target data conversion unit 116 reads data such as coefficients used for normalization of the training data from the storage unit 112, and normalizes the determination target data, that is, the acquired CAN message data, using this data (step S63). The determination execution unit 118 then determines whether the normalized data is normal or abnormal based on the learning dictionary data (step S64).
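  • A minimal sketch of this determination flow (steps S61 to S64) is shown below: the per-dimension minimum and maximum values saved at training time are reused to normalize the incoming message before it is judged with the stored model. The variable names and the use of scikit-learn are illustrative assumptions, not part of the patent.

```python
# Sketch of the determination flow (steps S61-S64): normalize an incoming
# message with the min/max values saved during training, then judge it with
# the stored Isolation Forest model. Names are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

# --- values assumed to have been produced and stored by the learning unit ---
train = np.random.default_rng(1).normal(0.5, 0.1, size=(500, 8))
feat_min, feat_max = train.min(axis=0), train.max(axis=0)   # "data for conversion"
train_norm = (train - feat_min) / (feat_max - feat_min)     # first region [0, 1]^M
model = IsolationForest(n_estimators=100, random_state=1).fit(train_norm)

# --- abnormality determination unit ---
def is_abnormal(message_vector: np.ndarray) -> bool:
    normalized = (message_vector - feat_min) / (feat_max - feat_min)  # step S63
    return model.predict(normalized.reshape(1, -1))[0] == -1          # step S64

print(is_abnormal(np.full(8, 0.5)))   # typically False (normal)
print(is_abnormal(np.full(8, 5.0)))   # typically True  (abnormal)
```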
  • the above is the outline of the abnormality detection process including the steps from the generation of the learning dictionary using the training data to the abnormality determination using this learning dictionary, which is executed in the abnormality detection system 100.
  • By using the Isolation Forest method for this abnormality detection, the load on computer resources is reduced compared with conventional methods, and the processing can be executed at higher speed.
  • FIG. 7 shows an example of such an inappropriate determination boundary that does not fit the distribution of the training data.
  • In this case, an erroneous determination is made in which data is determined to be abnormal although it is actually normal.
  • data elements indicated by black circles are data elements determined to be abnormal data, and many of these are actually normal data elements.
  • Such erroneous detection based on an erroneous determination that normal data is abnormal is hereinafter referred to as overdetection.
  • Such a learning dictionary that causes erroneous determination may occur when, for example, the amount of abnormal data included in the training data is insufficient.
  • The following describes the processing performed in the abnormality detection system 100 in order to obtain an appropriate learning dictionary even in such a case.
  • FIG. 8 is a flowchart showing a first processing method which is an example of a training data processing method for obtaining the appropriate learning dictionary described above.
  • the first processing method is executed by the learning dictionary generation unit 124 in the learning unit 120 after receiving the training data of Isolation Forest consisting of two or more M-dimensional vectors.
  • the processing by the learning dictionary generation unit 124 may be described as the processing of the learning unit 120.
  • the learning unit 120 reads parameters used for this processing (step S80). Details of the parameters will be described in the following steps.
  • the learning unit 120 acquires the number of data elements of the input training data (step S81).
  • the learning unit 120 determines the number of noise elements added to the training data based on the number of data elements (step S82).
  • the noise element is also an M-dimensional vector.
  • the parameter acquired in step S80 is used to determine the number of noise elements in step S82, and is a real number greater than 0 and less than 1, for example.
  • As the number of noise elements, a value obtained by multiplying the number of data elements acquired in step S81 by this parameter and rounding the result to an integer is used. That is, the number of noise elements is determined to be smaller than the number of data elements of the training data.
  • Next, the learning unit 120 normalizes the input training data (step S83). FIG. 9B shows an example of the training data after normalization distributed on a two-dimensional plane.
  • The distribution range of the training data distributed as shown in FIG. 9A before normalization is converted so as to cover the [0, 1]^2 region on the two-dimensional plane.
  • Such a region is an example of the first region in the present embodiment.
  • Next, the learning unit 120 adds the number of noise elements determined in step S82 over a second region that is larger than the first region and includes the first region, that is, a two-dimensional plane region in this example (step S84).
  • FIG. 9C is an example of training data after addition of noise elements distributed in the M-dimensional space, and the noise elements are indicated by dotted outline circles distributed in the two-dimensional plane.
  • In this example, the noise elements are added so as to be distributed over the region [-0.5, 1.5]^2.
  • Such a region is an example of the second region in the present embodiment.
  • As shown in FIG. 9C, as a result of the process of step S84, a smaller number of noise elements than the number of data elements of the original training data is added so as to be distributed over an area wider than the distribution range of the original training data. The distribution density of the noise elements is therefore lower than the distribution density of the data elements of the original training data. In addition, the noise elements are added so as to have a uniform distribution as a whole over the above-described region.
  • Next, the learning unit 120 generates noise-added training data that includes both the training data elements and the noise elements, each of which is an M-dimensional vector (in this example, a two-dimensional vector) in the second region (step S85).
  • the learning unit 120 generates the learning dictionary data for Isolation Forest using the noise-added training data generated in step S85, and outputs the learning dictionary data (step S86).
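  • The first processing method described above can be sketched as follows, assuming a noise-count parameter between 0 and 1 and a second region of [-0.5, 1.5]^M; the function names and the choice of parameter value are illustrative assumptions, not values from the patent.

```python
# Sketch of the first processing method (steps S81-S86): normalize the training
# data into [0, 1]^M, then add round(p * N) uniformly distributed noise elements
# over the larger region [-0.5, 1.5]^M. Names and p = 0.3 are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

def first_processing_method(train: np.ndarray, p: float = 0.3, seed: int = 0):
    rng = np.random.default_rng(seed)
    n, m = train.shape
    # Normalization into the first region [0, 1]^M (step S83).
    lo, hi = train.min(axis=0), train.max(axis=0)
    normalized = (train - lo) / (hi - lo)
    # Number of noise elements, smaller than N (step S82).
    k = int(round(p * n))
    # Uniform noise over the second region [-0.5, 1.5]^M (step S84).
    noise = rng.uniform(-0.5, 1.5, size=(k, m))
    # Noise-added training data (step S85) and learning dictionary (step S86).
    noise_added = np.vstack([normalized, noise])
    return IsolationForest(n_estimators=100, random_state=seed).fit(noise_added)

model = first_processing_method(np.random.default_rng(2).normal(0.5, 0.1, (1000, 2)))
```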
  • In the present embodiment, steps S82 and S84 are an example of the second noise addition step, step S85 is an example of the generation step, and step S86 is an example of the learning dictionary data output step.
  • In this way, the learning unit 120 does not use the normalized training data as it is, as in conventional methods. Instead, the learning unit 120 generates the learning dictionary using data in which noise has been added to a region that includes the periphery of the distribution range of the normalized training data in the M-dimensional space.
  • the abnormality detection system 100 can perform abnormality detection with a reduced overdetection rate.
  • In the above example, the number of noise elements, which is smaller than the number of data elements of the original training data, is determined by using a parameter that takes a real value greater than 0 and less than 1.
  • the method of determining the number of noise elements is not limited to this.
  • the number of noise elements may be obtained by subtracting a certain number from the number of data elements of training data.
  • the number of training data may be divided into a plurality of ranges, and a predetermined number of noise elements may be used for each range.
  • the correspondence between the number of training data and the number of noise elements is stored in the memory of the information processing apparatus, for example, included in a data table.
  • the first processing method has been described by taking an example in which the data elements of the training data are two-dimensional vectors, but the idea based on the first processing method can be generalized and applied to higher dimensional spaces.
  • The first processing method can also be applied to training data consisting of vectors of three or more dimensions. If the training data is an M-dimensional vector, the range of the first region is read as [0, 1]^M and the range of the second region as [-0.5, 1.5]^M. That is, the first region is an M-dimensional space region defined by a first hypercube in the M-dimensional space, and the second region is an M-dimensional space region defined by a second hypercube that is larger than the first hypercube in the M-dimensional space.
  • FIG. 10 is a flowchart showing a second processing method as another example of the training data processing method for obtaining the appropriate learning dictionary described above.
  • the second processing method is also executed by the learning dictionary generation unit 124 in the learning unit 120 after receiving the training data of Isolation Forest composed of two or more M-dimensional vectors.
  • the processing by the learning dictionary generation unit 124 may be described as the processing of the learning unit 120.
  • a case where the second processing method is also started from the initial state of the training data shown in FIG. 9A will be described as an example.
  • the description of the steps common to the first processing method may be simplified.
  • the learning unit 120 reads parameters used for this processing (step S100). Details of the parameters will be described in the following steps.
  • the learning unit 120 normalizes the input training data (step S101).
  • The content of this process is the same as in the first processing method, and FIG. 9B shows an example of the training data after normalization distributed on a two-dimensional plane. The distribution range of the training data distributed as shown in FIG. 9A before normalization is converted so as to cover the [0, 1]^2 region on the two-dimensional plane. Such a region is an example of the first region in the present embodiment.
  • Next, the learning unit 120 sets a second region, which is larger than the first region and includes the first region (in this example, a two-dimensional plane region), and divides the second region into third regions that are M-dimensional hypercubes of equal size (step S102).
  • FIG. 11A is a diagram for explaining the second region and the third regions in the two-dimensional plane. In the example shown in FIG. 11A, the second region is the area [-0.5, 1.5]^2, and each third region is a sub-region obtained by dividing the second region into 64 equal areas.
  • the parameter acquired in step S100 is used to determine the number of third regions obtained by dividing the second region in step S102, and the value of this parameter is 8 in the example of FIG. 11A.
  • The number of third regions is this parameter raised to the Mth power; in this example it is 8 squared, that is, 64.
  • Next, the learning unit 120 acquires the number S of data elements of the training data included in each third region (step S103).
  • the learning unit 120 determines a first threshold value T (T is a natural number) that is a threshold value for the data elements of the training data in each third region (step S104). For example, the parameter acquired in step S100 is used to determine the first threshold T.
  • This parameter and the parameter used in step S102 may be the same or different. If they are different, this parameter may be calculated from the parameter used in step S102.
  • For example, the parameter may specify the number of data elements of the training data included in a particular third region within the first region.
  • Alternatively, a specific rank may be indicated among the third regions when they are arranged in order of the number of data elements of the training data they contain.
  • In that case, the number of data elements of the training data included in the third region of this specific rank is used as the first threshold.
  • The rank may be indicated by a position counted from the minimum value or the maximum value, or by a position counted upward or downward from the average value or the median value.
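  • As one concrete reading of the median-based option (choosing T below the median of the per-region counts, as mentioned in the summary), the small sketch below computes such a threshold; the specific choice of "one less than the median, but at least 1" is an assumption made for illustration.

```python
# Sketch: derive the first threshold T from the counts of training-data
# elements in the third regions inside the first region, taking a value just
# below the median. "median - 1, but at least 1" is an illustrative choice.
import numpy as np

def threshold_from_counts(counts_in_first_region: np.ndarray) -> int:
    median = int(np.median(counts_in_first_region))
    return max(1, median - 1)   # T is a natural number smaller than the median

print(threshold_from_counts(np.array([12, 9, 10, 15, 8, 11])))  # -> 9
```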
  • Using the above S and T, the learning unit 120 determines whether it is necessary to add noise elements to each third region, determines the number of noise elements to be added to each third region, and executes the procedure for adding the noise elements.
  • First, the learning unit 120 checks whether there is a third region for which the determination of whether noise elements need to be added has not yet been made (step S105). If there is such a region (YES in step S105), the learning unit 120 selects one of them (step S106) and determines whether the number S of data elements of the training data in that third region is smaller than the first threshold T (step S107).
  • When the number S of data elements of the training data in the selected third region is smaller than the first threshold T (YES in step S107), (T - S) noise elements are added so that the total number of data elements and noise elements in that third region becomes T (step S108).
  • If the number S of data elements of the training data in the third region is equal to or greater than the first threshold T (NO in step S107), no noise elements are added to that region, and the learning unit 120 again checks whether there is an unprocessed third region (step S105).
  • FIG. 11B is a diagram for describing an example of training data and noise elements distributed in a two-dimensional space in the case of NO in step S105.
  • the noise element is indicated by a dotted outline circle.
  • For example, in one of the hatched third regions within the first region, T - S = 3 noise elements are added, and in another, T - S = 1 noise element is added.
  • Since S is 9 or more in all the other third regions within the first region, no noise elements are added to them. The other hatched third regions are outside the first region and contain no data elements of the training data, so nine noise elements are added to each of them.
  • Each noise element is a random value according to a uniform distribution within its third region.
  • When there is no unprocessed third region remaining (NO in step S105), the learning unit 120 generates noise-added training data that includes the data elements of the training data and the added noise elements (step S109).
  • the learning unit 120 generates the learning dictionary data for Isolation Forest using the noise-added training data generated in step S109, and outputs the learning dictionary data (step S110).
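  • Under the same assumptions as the earlier sketches, the second processing method (division into L^M cells and per-cell noise addition up to T) could look roughly like the following; L = 8, the way T is supplied, and the helper names are illustrative choices and not taken from the patent.

```python
# Sketch of the second processing method (steps S101-S110): divide the second
# region [-0.5, 1.5]^M into L^M equal cells, count training elements per cell,
# and top up every cell holding fewer than T elements with uniform noise so the
# cell reaches T elements. L = 8 and the names are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

def second_processing_method(train: np.ndarray, L: int = 8, T: int = 9, seed: int = 0):
    rng = np.random.default_rng(seed)
    n, m = train.shape
    lo, hi = train.min(axis=0), train.max(axis=0)
    normalized = (train - lo) / (hi - lo)               # first region [0, 1]^M (S101)
    cell_width = 2.0 / L                                 # second region spans [-0.5, 1.5]
    # Count training elements per cell (S103).
    idx = np.clip(np.floor((normalized + 0.5) / cell_width).astype(int), 0, L - 1)
    counts = {}
    for row in idx:
        counts[tuple(row)] = counts.get(tuple(row), 0) + 1
    # Per-cell noise addition (S105-S108): bring every cell up to T elements.
    noise_rows = []
    for cell in np.ndindex(*([L] * m)):
        s = counts.get(cell, 0)
        if s < T:
            low = -0.5 + np.array(cell) * cell_width
            noise_rows.append(rng.uniform(low, low + cell_width, size=(T - s, m)))
    noise_added = np.vstack([normalized] + noise_rows)   # generation step (S109)
    return IsolationForest(n_estimators=100, random_state=seed).fit(noise_added)  # S110

model = second_processing_method(np.random.default_rng(3).normal(0.5, 0.1, (500, 2)))
```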
  • In the present embodiment, step S101 is an example of the normalization step, step S102 is an example of the division step, steps S103 to S108 are an example of the first noise addition step, step S109 is an example of the generation step, and step S110 is an example of the learning dictionary data output step.
  • In the second processing method as well, the learning unit 120 does not use the normalized training data as it is, as in conventional methods. Instead, the learning unit 120 generates the learning dictionary using data in which noise has been added to a region that includes the periphery of the distribution range of the normalized training data in the M-dimensional space.
  • the abnormality detection system 100 can perform abnormality detection with a reduced overdetection rate.
  • Furthermore, in the second processing method, the number of noise elements added in the first region where the training data is distributed is determined according to the density of each subdivided region. The second processing method therefore suppresses the occurrence of places where data elements and noise elements are overcrowded, which can occur in the first region with the first processing method.
  • In Isolation Forest, a place where vector data is overcrowded in the training data tends to fall inside the determination boundary. Therefore, if data elements and noise elements become overcrowded, the possibility of an erroneous determination in which even abnormal data is determined to be normal increases.
  • Such erroneous detection based on an erroneous determination that abnormal data is normal is also referred to as detection omission, in contrast to the above-described overdetection.
  • In the abnormality detection system 100, in which the abnormality determination of unknown data is performed based on a learning dictionary generated by executing the second processing method, abnormality detection can therefore be performed while suppressing the occurrence of overdetection and also suppressing the possibility of detection omission.
  • Although the second processing method has also been described using an example in which the data elements of the training data are two-dimensional vectors, the idea based on this processing method can be generalized and applied to higher-dimensional spaces, and the second processing method can also be applied to training data consisting of vectors of three or more dimensions.
  • If the training data is an M-dimensional vector, the range of the first region is read as [0, 1]^M and the range of the second region as [-0.5, 1.5]^M. That is, the first region is an M-dimensional space region defined by a first hypercube in the M-dimensional space, and the second region is an M-dimensional space region defined by a second hypercube that is larger than the first hypercube in the M-dimensional space.
  • FIGS. 12A and 12B are diagrams showing the determination boundary of a learning dictionary generated using training data without adding noise and the determination boundary of a learning dictionary generated using the same training data to which noise has been added by the above processing methods.
  • the training data 1 in FIG. 12A and the training data 2 in FIG. 12B are different types of data acquired from the same vehicle-mounted network. Comparing the training data 1 and the training data 2, the training data 1 has data elements distributed almost uniformly from the center to the periphery of the distribution, and the training data 2 has a sparse distribution of data elements at the periphery. It can be said that the training data 2 is more likely to contain outliers than the training data 1.
  • a circle indicates a data element of training data.
  • a solid line box represents a decision boundary of a learning dictionary generated using training data without adding noise
  • a broken line box represents a decision boundary of a learning dictionary generated using training data added with noise. Noise elements are not shown in each figure.
  • FIG. 12C shows the false detection rate in this abnormality detection test.
  • the left column of each training data is the false detection rate in the learning dictionary obtained without adding noise to the training data
  • the right column is the false detection rate in the learning dictionary obtained by adding noise to the training data.
  • In summary, a small number of data elements that deviate to some extent from the original training data, which contains many normal data elements, are added in the data space at a density lower than that of the original training data. These added data elements are referred to above as noise elements. In an abnormality detection system that uses a learning dictionary generated from such noise-added training data, abnormality detection with a reduced false detection rate can be performed.
  • The first processing method and the second processing method described in Embodiment 1 differ in the algorithms of the programs executed in the information processing apparatus to realize them, and they can be executed selectively, for example, by switching the program read by a given processor.
  • The time required for adding noise elements depends more strongly on the number of training data in the second processing method than in the first processing method, and it becomes longer as the training data increases. That is, the processing load on the processor is larger in the second processing method.
  • With both methods, the detection accuracy (low false detection rate) of the generated learning dictionary is improved compared with the conventional method as described above, but the second processing method is superior in this respect.
  • It is therefore conceivable to always execute the second processing method in the abnormality detection system.
  • In a configuration such as the abnormality detection system 100A in FIG. 1A or the abnormality detection system 100B in FIG. 1B, in which the learning unit 120 is provided by the external server 10, the difference in processing load described above is unlikely to be a problem.
  • However, in a configuration such as the abnormality detection system 100C in FIG. 1C, it is assumed that there is a limit to computer resources such as the processor operation speed. That is, in a traveling vehicle, there is a possibility that the learning dictionary cannot be generated or updated at the necessary speed with the second processing method.
  • In the first processing method, the parameter used to determine the number of noise elements can take a real number larger than 0 and smaller than 1. However, it is difficult to predict in advance which value in this range will generate a learning dictionary more suitable for anomaly detection. To find out, for example, the accuracy of anomaly detection performed on test data may be compared across a plurality of learning dictionaries generated with different parameter values. As a matter of course, however, if such a comparison is made to search for the optimum parameter, it takes more time until the learning dictionary used for abnormality detection is determined. If determination of the learning dictionary is delayed, abnormality detection either cannot be executed until the learning dictionary is determined or must be performed using an old learning dictionary.
  • In the second processing method as well, there are parameters: the value L that determines the division in the division step, and the value used to determine the first threshold T.
  • The former is set so that, for example, the first region is divided into two or more parts in each dimension and at least one third region lies on each side outside the first region; thus L can take an integer value of 4 or more. The latter is, for example, a value used for specifying any one of the third regions in the second region, and can therefore take a real value that is 1 or more and less than or equal to the number of third regions in the second region.
  • By searching over such parameters, a learning dictionary capable of detecting an abnormality with higher accuracy may be obtained.
  • However, it then takes more time until the learning dictionary used for abnormality detection is determined. As a result, the execution of abnormality detection is delayed or accuracy is sacrificed.
  • Therefore, the inventors came up with a method for quickly deciding, in the abnormality detection system, which training data processing method to select and whether to perform a parameter search in each processing method, so that the abnormality detection system can perform abnormality detection at the required speed and with the highest possible accuracy.
  • FIG. 13 is a flowchart illustrating an example of a processing method, executed in the abnormality detection system 100, for determining which training data processing method to select and whether to perform a parameter search in each processing method.
  • This processing method includes a step executed by the learning dictionary generation unit 124 in the learning unit 120 after receiving the training data of Isolation Forest composed of two or more M-dimensional vectors.
  • the processing by the learning dictionary generation unit 124 will be described as the processing of the learning unit 120.
  • the following description may be made as processing by the abnormality determination unit 110.
  • training data receiving unit 122 has already received training data in the initial state.
  • the learning unit 120 acquires the number N of data elements of training data (step S130).
  • the learning unit 120 determines whether N is greater than or equal to a predetermined second threshold (step S131).
  • The second threshold is a threshold used for determining whether to use the first processing method or the second processing method as the training data processing method. It is determined, for example, according to available computer resources such as the computing capability of the processor that implements the learning unit 120, and is stored in the memory of the information processing apparatus. By using a predetermined threshold in this way, a quick determination can be made.
  • When it is determined that N is equal to or greater than the second threshold, that is, when the number of data elements of the training data is large, the learning unit 120 selects the first processing method, which can be completed in a shorter time (step S132).
  • When it is determined that N is not equal to or greater than the second threshold, that is, when the number of data elements of the training data is small, the learning unit 120 selects the second processing method, which provides a learning dictionary capable of detecting an abnormality with higher accuracy (step S133).
  • the learning unit 120 determines whether N is greater than or equal to a predetermined third threshold (step S134).
  • the third threshold value is a threshold value used for determining whether or not to search for a parameter when executing each processing method of training data.
  • the third threshold is determined by available computer resources such as the computing capability of the processor that implements the learning unit 120 and stored in the memory of the information processing apparatus.
  • The third threshold may be related to the second threshold, or the two may be values independent of each other.
  • When it is determined that N is greater than or equal to the third threshold, that is, when the number of data elements of the training data is large, the learning unit 120 decides not to execute a parameter search so that the processing can be completed in a shorter time (step S135).
  • When it is determined that N is not greater than or equal to the third threshold, that is, when the number of data elements of the training data is small, the learning unit 120 decides to perform a parameter search in order to obtain a learning dictionary capable of detecting an abnormality with higher accuracy (step S136).
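  • As an illustration only (not part of the disclosure), the decision flow of steps S130 to S136 could be sketched in Python as follows; the threshold values and the function name are hypothetical placeholders, and in practice the thresholds would be determined from the available computer resources as described above.

```python
# Minimal sketch of the decision flow of FIG. 13 (steps S130-S136).
# SECOND_THRESHOLD and THIRD_THRESHOLD are hypothetical values chosen here
# for illustration; in practice they would be set from the available
# computing resources and stored in the memory of the apparatus.

SECOND_THRESHOLD = 50_000   # chooses first vs. second processing method
THIRD_THRESHOLD = 10_000    # chooses whether to run a parameter search


def decide_processing(training_data):
    n = len(training_data)                  # step S130: number of data elements N

    if n >= SECOND_THRESHOLD:               # step S131
        method = "first"                    # step S132: faster, uniform noise
    else:
        method = "second"                   # step S133: grid-based noise, higher accuracy

    if n >= THIRD_THRESHOLD:                # step S134
        do_parameter_search = False         # step S135: finish sooner
    else:
        do_parameter_search = True          # step S136: search for better parameters

    return method, do_parameter_search
```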
  • When learning dictionary data is generated and output (step S137) through steps S132 and S135, the learning unit 120 executes the first processing method shown in the flowchart of FIG. 8. When learning dictionary data is generated and output (step S137) through steps S133 and S135, the learning unit 120 executes the second processing method shown in the flowchart of FIG. 10.
  • FIG. 14 is a flowchart of the first processing method including parameter search, which is executed in the abnormality detection system 100.
  • steps common to the first processing method shown in the flowchart of FIG. 8 are denoted by common reference numerals, and detailed description thereof is omitted.
  • in this case, the learning unit 120 executes the set of steps S82 and S84 to S86 a plurality of times while changing the parameter value.
  • a plurality of learning dictionary data generated and output as a result are stored in the storage unit 112 of the abnormality determination unit 110. Further, from the learning unit 120, the data used for normalization in step S83 is also provided to the abnormality determination unit 110 and stored in the storage unit 112.
  • It is assumed that the abnormality determination unit 110 has acquired test data for Isolation Forest. This test data is, for example, input to the abnormality determination unit 110 in advance and stored in the storage unit 112. If it is determined in step S131 that N is not greater than or equal to the second threshold, the abnormality determination unit 110 reads and acquires this test data from the storage unit 112. The abnormality determination unit 110 then normalizes the test data using the data used for normalization in step S83, and executes abnormality determination on the test data using each learning dictionary data (step S140).
  • The learning unit 120 evaluates the abnormality determination performed in step S140 with each learning dictionary data, and based on the evaluation result selects the best learning dictionary data as the learning dictionary data to be used for actual abnormality detection (step S141). For this evaluation, a known evaluation scale such as recall or F-measure can be used, for example. Note that step S141 may be performed by the abnormality determination unit 110.
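  • As a rough sketch of how steps S140 and S141 could look in practice, the following Python fragment assumes scikit-learn's IsolationForest as the learned model, labelled test data, and the F-measure as the evaluation scale; the function names, the noise-addition routine, and the candidate parameter values are hypothetical and only illustrate the select-the-best-dictionary idea.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score


def add_uniform_noise(train, k, rng):
    """Hypothetical noise addition (cf. steps S82/S84): append k noise
    vectors drawn uniformly from an assumed second region [-0.5, 1.5]^M."""
    m = train.shape[1]
    noise = rng.uniform(-0.5, 1.5, size=(k, m))
    return np.vstack([train, noise])


def select_best_dictionary(train, x_test, y_test, k_candidates, seed=0):
    """Sketch of steps S140/S141: build one model per candidate parameter
    value, score each on labelled test data with the F-measure, and keep
    the best one.  y_test uses 1 for anomalous and 0 for normal samples."""
    rng = np.random.default_rng(seed)
    best_model, best_score = None, -1.0
    for k in k_candidates:
        noisy = add_uniform_noise(train, k, rng)
        model = IsolationForest(n_estimators=100, random_state=seed).fit(noisy)
        # predict() returns +1 (normal) / -1 (anomaly); map to 1 = anomaly.
        pred = (model.predict(x_test) == -1).astype(int)
        score = f1_score(y_test, pred)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score
```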
  • Note that steps S82 and S84 are examples of the second noise addition step in the present embodiment, step S85 is an example of the generation step, and step S86 is an example of the learning dictionary data output step. Step S131 is an example of the first determination step in the present embodiment, and step S134 is an example of the second determination step. Steps S140 and S141 are examples in the present embodiment corresponding to the test data acquisition step, the evaluation step, and the learning dictionary data selection step.
  • One difference from the case where the first processing method is executed through steps S132 and S135 is whether the set of steps S82 and S84 to S86 is executed only once or a plurality of times before the learning dictionary data used for abnormality detection is output. Another difference is that a plurality of learning dictionary data are evaluated using test data, and the best learning dictionary data is selected as the learning dictionary data used for abnormality detection based on the result of the evaluation.
  • FIG. 15 is a flowchart of the second processing method including parameter search, which is executed in the abnormality detection system 100.
  • steps common to the second processing method shown in the flowchart of FIG. 10 are denoted by common reference numerals, and detailed description thereof is omitted.
  • in this case, the learning unit 120 executes the set of steps S102 to S110 a plurality of times while changing the combination of the two types of parameter values.
  • a plurality of learning dictionary data generated and output as a result are stored in the storage unit 112 of the abnormality determination unit 110. Further, from the learning unit 120, the data used for normalization in step S101 is also provided to the abnormality determination unit 110 and stored in the storage unit 112.
  • the processing of steps S150 and S151 is the same as that of steps S140 and S141, respectively.
  • Note that step S102 is an example of the division step in the present embodiment, steps S103 to S108 are an example of the first noise addition step, step S109 is an example of the generation step, and step S110 is an example of the learning dictionary data output step. Step S131 is an example of the first determination step in the present embodiment, and step S134 is an example of the second determination step. Steps S150 and S151 are examples in the present embodiment corresponding to the test data acquisition step, the evaluation step, and the learning dictionary data selection step.
  • One difference from the case where the second processing method is executed through steps S133 and S135 is whether the set of steps S102 to S110 is executed only once or a plurality of times before the learning dictionary data used for abnormality detection is output. Another difference is that a plurality of learning dictionary data are evaluated using test data, and the best learning dictionary data is selected as the learning dictionary data used for abnormality detection based on the result of the evaluation.
  • Regarding time cost, it is largest when the second processing method is executed together with a parameter search, and also large when the first processing method is executed together with a parameter search. The time costs of the remaining two patterns are significantly smaller. The second threshold and the third threshold may be values independent of each other, but they may also be determined in consideration of the magnitude relationship of these time costs.
  • The threshold used in step S134 may be switched according to the determination result in step S131, that is, depending on whether the first processing method or the second processing method is used for adding noise. For example, when the second processing method is used, the third threshold may be used, and when the first processing method is used, a fourth threshold, which is another predetermined threshold, may be used instead of the third threshold. Step S134 in the case where the fourth threshold is used in this way is an example of the third determination step in the present embodiment.
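  • For illustration, the switching of the threshold used in step S134 according to the selected processing method might be expressed as below; both threshold values are hypothetical, and the disclosure does not specify their magnitude relationship.

```python
# Hypothetical values; the disclosure only states that the fourth threshold
# is another predetermined threshold, used when the first processing method
# was selected in step S131.
THIRD_THRESHOLD = 10_000
FOURTH_THRESHOLD = 30_000


def parameter_search_threshold(method):
    # Step S134 uses the fourth threshold for the first processing method
    # and the third threshold for the second processing method.
    return FOURTH_THRESHOLD if method == "first" else THIRD_THRESHOLD
```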
  • In the above description, two determinations are made: the determination of the noise addition processing method and the determination of whether or not to execute a parameter search for each processing method. However, both are not mandatory; the time cost may be adjusted by only one of these determinations.
  • Further, the number of parameter values tried in the search may be changed in stages. That is, as the number of data elements of the training data increases, the number of parameter values tried may be reduced. This number may be a value calculated from the number of data elements, or may be a value determined in advance for each predetermined range of the number of data elements. That is, it is sufficient that there is a negative correlation between the number of data elements of the training data and the number of parameter values tried. Thereby, when there are many data elements in the training data, the increase in the load of calculation processing can be suppressed so that the time required to determine the learning dictionary data does not become too long.
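  • A minimal sketch of such a negative correlation between the number of data elements and the number of parameter values tried is given below; the constants are hypothetical and purely illustrative.

```python
# Hypothetical rule: the more training data elements there are, the fewer
# parameter values are tried, so that dictionary generation does not take
# too long. Both constants are illustrative only.

def num_parameter_candidates(n_data_elements, max_candidates=10, scale=1_000):
    # Negative correlation: the candidate count shrinks as N grows, but never below 1.
    return max(1, max_candidates - n_data_elements // scale)
```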
  • In the above description, whether to execute the first processing method or the second processing method for the training data processing is determined according to the result of comparing the number N of data elements of the training data with the second threshold, but the present disclosure is not limited to this. For example, the options may be two: executing either the first processing method or the second processing method, or not executing the training data processing at all.
  • Embodiments 1 and 2 have been described as examples of the technology according to the present disclosure.
  • the technology according to the present disclosure is not limited to this, and can also be applied to embodiments in which changes, replacements, additions, omissions, and the like are appropriately performed.
  • the following modifications are also included in one embodiment of the present disclosure.
  • the system LSI, with which part or all of the components constituting each of the above devices may be configured, is an ultra-multifunctional LSI manufactured by integrating a plurality of components on a single chip.
  • Specifically, the system LSI is a computer system including a microprocessor, a ROM, a RAM, and the like. A computer program is recorded in this RAM, and the system LSI achieves its functions by the microprocessor operating according to the computer program recorded in the RAM.
  • each part of the constituent elements constituting each of the above devices may be individually made into one chip, or may be made into one chip so as to include a part or all of them.
  • Although the term system LSI is used here, it may also be called IC, LSI, super LSI, or ultra LSI depending on the degree of integration.
  • The method of circuit integration is not limited to LSIs, and implementation using dedicated circuitry or general-purpose processors is also possible. An FPGA (Field Programmable Gate Array) or a reconfigurable processor in which the connection and settings of the circuit cells inside the LSI can be reconfigured may also be used.
  • If integrated circuit technology that replaces LSI emerges as a result of advances in semiconductor technology or other derived technology, that technology may naturally be used to integrate the functional blocks. Application of biotechnology is one such possibility.
  • each of the above devices may be constituted by an IC card or a single module that can be attached to and detached from each device.
  • This IC card or module is a computer system including a microprocessor, ROM, RAM, and the like. Further, this IC card or module may include the super multifunctional LSI described above. The IC card or the module achieves its function by the microprocessor operating according to the computer program. This IC card or this module may have tamper resistance.
  • each component may be configured by dedicated hardware or may be realized by executing a software program suitable for each component.
  • Each component may be realized by a program executor such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.
  • the software that realizes the information processing apparatus of the above-described embodiment is a program as follows.
  • That is, this program causes a computer to execute an information processing method including: a data element acquisition step of receiving input of N data elements (N is an integer equal to or greater than 2) that are M-dimensional vectors (M is an integer equal to or greater than 2) used as training data for Isolation Forest; a normalization step of normalizing the training data so as to be distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and includes the first region, into third regions that are M-dimensional hypercubes of equal size; a first noise addition step of acquiring the number S of data elements included in each of the third regions and adding, in a uniform distribution, (T - S) noise elements that are M-dimensional vectors to each third region that includes fewer data elements than a first threshold T; a generation step of generating noise-added training data including the data elements and the noise elements; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
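  • The following Python sketch is one possible reading of the division step and the first noise addition step above, assuming, as in the embodiments, that the first region is [0, 1]^M and the second region is [-0.5, 1.5]^M; it is illustrative only and practical just for small M, since the number of third regions grows as L^M.

```python
import numpy as np
from itertools import product


def add_grid_noise(train, L=4, T=1, low=-0.5, high=1.5, rng=None):
    """Illustrative division step and first noise addition step.

    train : (N, M) array already normalized so that the data lie in the
    first region [0, 1]^M.  The second region [low, high]^M is split into
    L^M equal hypercubes (third regions); every cell holding fewer than T
    data elements receives (T - S) noise vectors drawn uniformly from
    within that cell.
    """
    rng = rng or np.random.default_rng(0)
    m = train.shape[1]
    edges = np.linspace(low, high, L + 1)        # cell boundaries per dimension

    # Count the data elements S contained in each third region.
    idx = np.clip(np.digitize(train, edges) - 1, 0, L - 1)
    counts = {}
    for cell in map(tuple, idx.tolist()):
        counts[cell] = counts.get(cell, 0) + 1

    noise_blocks = []
    for cell in product(range(L), repeat=m):     # every third region
        s = counts.get(cell, 0)
        if s < T:                                # sparse cell: top it up to T
            lo = edges[np.array(cell)]
            hi = edges[np.array(cell) + 1]
            noise_blocks.append(rng.uniform(lo, hi, size=(T - s, m)))

    return np.vstack([train] + noise_blocks) if noise_blocks else train
```

  • For example, with M = 2, L = 4 and T = 1, every cell of the 4 x 4 grid that contains no data element receives one uniformly placed noise vector, which is intended to keep the judgment boundary of the subsequently generated learning dictionary closer to the actual distribution of the training data.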
  • As described in the above embodiments, the present disclosure can be implemented as an information processing apparatus that generates learning dictionary data using training data and provides the learning dictionary data to an abnormality determination apparatus that performs abnormality determination. It can also be realized as an abnormality detection system including this information processing apparatus and the abnormality determination apparatus.
  • This abnormality determination apparatus is, for example, the monitoring ECU that realizes the abnormality determination unit connected to the in-vehicle network 210 in the abnormality detection system configured as shown in FIG. 1A or FIG. 1C. In the abnormality detection system configured as shown in FIG. 1B, it is the external server 10 that realizes the abnormality determination unit.
  • This network is typically an in-vehicle CAN network as described above, but is not limited thereto.
  • a network such as CAN-FD (CAN with Flexible Data rate), FlexRay, Ethernet, LIN (Local Interconnect Network), MOST (Media Oriented Systems Transport) may be used.
  • an in-vehicle network combining these networks as sub-networks with a CAN network may be used.
  • each component may be a circuit.
  • a plurality of components may constitute one circuit as a whole, or may constitute separate circuits.
  • Each circuit may be a general-purpose circuit or a dedicated circuit.
  • a process executed by a specific component may be executed by another component instead of the specific component.
  • the order of the plurality of processes may be changed, and the plurality of processes may be executed in parallel.
  • This disclosure can be used for an in-vehicle network system including an in-vehicle network.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

Provided is an information processing device, comprising a processor. The processor: receives an input of data elements which are two or more vectors which are used as training data; normalizes the training data so as to distribute same across a first region; segments a multidimensional second region which contains the first region into third regions which are hypercubes of equivalent size; acquires the number S of data elements which each of the third regions includes; adds in a uniform distribution, to each of the third regions which include fewer than a first threshold T of the data elements, (T – S) noise elements which are vectors; generates noise-added training data which includes the vectors in the second region; and using the generated noise-added training data, generates and outputs Isolation Forest learning dictionary data.

Description

情報処理装置、情報処理方法及びプログラムInformation processing apparatus, information processing method, and program
 本開示は、車載ネットワーク等で用いられる異常検知技術に関する。 This disclosure relates to an abnormality detection technology used in an in-vehicle network or the like.
 電子化が進んだ自動車において、車載ネットワークの重要性は以前にまして高い。 In an automobile that has become more electronic, the importance of the in-vehicle network is higher than before.
 自動車には各種のシステムを制御する多数の電子制御ユニット(Electronic Control Unit、以下ECUと表記する)が搭載されている。ECU間では車載ネットワークに接続され、自動車の諸機能を実現するためにこの車載ネットワークを介して通信が行われている。CAN(Controller Area Network)は、このような車載ネットワークの規格のひとつで、標準的な技術として多くの国及び地域で採用されている。 The automobile is equipped with a large number of electronic control units (Electronic Control Units, hereinafter referred to as ECUs) for controlling various systems. The ECUs are connected to an in-vehicle network, and communication is performed through the in-vehicle network in order to realize various functions of the automobile. CAN (Controller Area Network) is one of such in-vehicle network standards, and is adopted as a standard technology in many countries and regions.
 CANのプロトコルに準拠するネットワークは1台の車上で閉じた通信経路として構築可能である。しかしながら、各自動車には外部からのアクセスが可能なネットワークとして構築され搭載されるのが珍しくない。例えば車載ネットワークには、ネットワークを流れる情報を車載の各システムの診断に利用する目的で取り出すためのポートが設置されたり、無線LANを提供する機能を備えるカーナビゲーションシステムが接続されたりしている。車載ネットワークへの外部からのアクセスが可能になることで自動車のユーザにとっての利便性は向上し得るが、その一方で脅威も増大する。 A network conforming to the CAN protocol can be constructed as a closed communication path on a single vehicle. However, it is not uncommon for each automobile to be built and installed as a network that can be accessed from the outside. For example, an in-vehicle network is provided with a port for taking out information flowing through the network for the purpose of diagnosis for each in-vehicle system, or a car navigation system having a function of providing a wireless LAN is connected. Allowing external access to the in-vehicle network can improve convenience for automobile users, but also increases threats.
 例えば2013年には、車載ネットワークの外部からの駐車支援機能等の悪用による不正な車両制御が可能であることが実証された。また、2015年には特定の車種の遠隔からの不正制御が可能であることが実証され、この実証が発端となって当該車種のリコールに発展した。 For example, in 2013, it was proved that unauthorized vehicle control by misuse of the parking support function from the outside of the in-vehicle network is possible. In 2015, it was demonstrated that remote control of a specific vehicle type is possible, and this verification started as a recall for that vehicle type.
 このような外部からのアクセスによる車両の不正制御は、自動車業界にとっては看過できない問題であり、車載ネットワークのセキュリティ対策は急務な状況にある。 Such unauthorized control of vehicles by external access is a problem that cannot be overlooked by the automobile industry, and security measures for in-vehicle networks are urgently needed.
 車載ネットワークへの攻撃の一手法としては、ネットワークに接続されるECUに外部からアクセスしてこのECUを乗っ取り、このECUから攻撃のためのフレーム(以下では攻撃フレームともいう)を送信させて自動車を不正に制御するものがある。攻撃フレームは、攻撃されていない車載ネットワークを流れる正常なフレームとは何らかの点で異なる異常なフレームである。 One method of attacking the in-vehicle network is to access the ECU connected to the network from the outside, take over the ECU, and transmit a frame for attack (hereinafter also referred to as an attack frame) from the ECU to Some of them are illegally controlled. An attack frame is an abnormal frame that differs in some way from a normal frame that flows through an in-vehicle network that is not attacked.
 このような車載ネットワークでの異常検知のための技術として、CANのバス上を流れるフレーム(以下、CANメッセージ又は単にメッセージともいう)に対する異常データ検知処理を、学習データを用いた学習の結果として得る評価モデルを用いて実行する技術が開示されている(特許文献1、特許文献2参照)。 As a technique for detecting an abnormality in such an in-vehicle network, an abnormal data detection process for a frame (hereinafter also referred to as a CAN message or simply a message) flowing on a CAN bus is obtained as a result of learning using learning data. A technique to be executed using an evaluation model is disclosed (see Patent Document 1 and Patent Document 2).
特開2015-026252号公報Japanese Patent Laying-Open No. 2015-026252 特開2015-170121号公報JP2015-170121A
 車載ネットワークへの攻撃及び攻撃に対抗するためのセキュリティ技術は研究途上であって特許文献1、2の技術で十分とは限らず、更なる研究開発が望まれている。 The attacks on the in-vehicle network and the security technology to counter the attacks are still under research, and the techniques of Patent Documents 1 and 2 are not necessarily sufficient, and further research and development are desired.
 本開示は、自動車等の車両の車載ネットワークにおける攻撃による異常検知のために有用な情報処理装置等を提供する。 This disclosure provides an information processing apparatus and the like that are useful for detecting anomalies due to attacks in an in-vehicle network of a vehicle such as an automobile.
 上記課題を解決するために、本開示の一態様に係る情報処理装置は、プロセッサを備える情報処理装置であって、前記プロセッサは、Isolation Forestの訓練データとして用いられるN個(Nは2以上の整数)のM次元のベクトル(Mは2以上の整数)であるデータ要素の入力を受けるデータ要素取得ステップと、前記訓練データをM次元の第一領域に渡って分布させるよう正規化する正規化ステップと、前記第一領域より大きく前記第一領域を包含するM次元の第二領域を、大きさの等しいLM個(Lは4以上の整数)のM次元の超立方体である第三領域に分割する分割ステップと、前記第三領域のそれぞれが含む前記データ要素の個数S(Sは0以上の整数)を取得し、前記第三領域のうち、第一閾値T(Tは自然数)より少ない個数の前記データ要素を含む第三領域のそれぞれに、(T-S)個のM次元のベクトルであるノイズ要素を一様分布で付加する第一ノイズ付加ステップと、前記データ要素及び前記ノイズ要素を含むノイズ付加訓練データを生成する生成ステップと、前記ノイズ付加訓練データを用いてIsolation Forestの学習辞書データを生成して出力する学習辞書データ出力ステップとを実行する。 In order to solve the above problem, an information processing apparatus according to an aspect of the present disclosure is an information processing apparatus including a processor, and the processor uses N pieces of N (N is 2 or more) used as training data for Isolation Forest. (Integer) M-dimensional vector (M is an integer equal to or larger than 2), a data element acquisition step that receives input of data elements, and normalization that normalizes the training data to be distributed over the first area of M dimensions Step and M-dimensional second region larger than the first region and including the first region into a third region which is an LM hypercube of LM pieces (L is an integer of 4 or more) having the same size. A division step for dividing and the number S of data elements included in each of the third regions (S is an integer of 0 or more) are obtained, and among the third regions, a first threshold T (T is a natural number) A first noise adding step of adding (TS) M-dimensional vector noise elements in a uniform distribution to each of the third regions including a smaller number of the data elements; A generation step of generating noise-added training data including a noise element and a learning dictionary data output step of generating and outputting Ilation Forest learning dictionary data using the noise-added training data are executed.
 また、本開示の一態様に係る情報処理方法は、プロセッサを備える情報処理装置を用いて実行される情報処理方法であって、このプロセッサに、Isolation Forestの訓練データとして用いられるN個(Nは2以上の整数)のM次元のベクトル(Mは2以上の整数)であるデータ要素の入力を受けさせるデータ要素取得ステップと、訓練データをM次元の第一領域に渡って分布させるよう正規化させる正規化ステップと、第一領域より大きく第一領域を包含するM次元の第二領域を、大きさの等しいLM個(Lは4以上の整数)のM次元の超立方体である第三領域に分割させる分割ステップと、第三領域のそれぞれが含むデータ要素の個数S(Sは0以上の整数)を取得させ、第三領域のうち、第一閾値T(Tは自然数)より少ない個数のデータ要素を含む第三領域のそれぞれに、(T-S)個のM次元のベクトルであるノイズ要素を一様分布で付加させる第一ノイズ付加ステップと、データ要素及びノイズ要素を含むノイズ付加訓練データを生成させる生成ステップと、ノイズ付加訓練データを用いてIsolation Forestの学習辞書データを生成して出力させる学習辞書データ出力ステップとを含む。 Further, an information processing method according to an aspect of the present disclosure is an information processing method executed using an information processing apparatus including a processor, and the processor uses N pieces (N is used as training data for Isolation Forest). A data element acquisition step that receives an input of a data element that is an M-dimensional vector (M is an integer of 2 or more) and normalization so that the training data is distributed over the first area of the M dimension. And a normalizing step for making an M-dimensional second region larger than the first region and including the first region, the third region being an M-dimensional hypercube of LM pieces (L is an integer of 4 or more) having the same size And the number of data elements included in each of the third regions (S is an integer greater than or equal to 0) is obtained, and among the third regions, the first threshold T (T is a natural number) is obtained. A first noise adding step of adding noise elements as (TS) M-dimensional vectors in a uniform distribution to each of the third regions including a small number of data elements, and including the data elements and the noise elements A generation step for generating noise-added training data and a learning dictionary data output step for generating and outputting Isolation Forest learning dictionary data using the noise-added training data are included.
 なお、これらの包括的または具体的な態様は、システム、装置、方法、集積回路、コンピュータプログラム、又はコンピュータ読み取り可能なCD-ROMなどの非一時的な記録媒体で実現されてもよく、システム、装置、方法、集積回路、コンピュータプログラムおよび記録媒体の任意な組み合わせで実現されてもよい。 Note that these comprehensive or specific aspects may be realized by a system, apparatus, method, integrated circuit, computer program, or non-transitory recording medium such as a computer-readable CD-ROM. The present invention may be realized by any combination of an apparatus, a method, an integrated circuit, a computer program, and a recording medium.
 本開示によれば、自動車等の車両の車載ネットワークにおける攻撃による異常検知に用いられて誤検知率が抑えられた学習辞書を迅速に提供な情報処理装置等が提供される。 According to the present disclosure, there is provided an information processing apparatus and the like that can quickly provide a learning dictionary that is used for abnormality detection due to an attack in an in-vehicle network of a vehicle such as an automobile and that has a reduced false detection rate.
図1Aは、実施の形態1における情報処理装置を含む異常検知システムの構成例を示すブロック図である。1A is a block diagram illustrating a configuration example of an abnormality detection system including an information processing device according to Embodiment 1. FIG. 図1Bは、実施の形態1における情報処理装置を含む異常検知システムの構成例を示すブロック図である。FIG. 1B is a block diagram illustrating a configuration example of an abnormality detection system including the information processing apparatus according to Embodiment 1. 図1Cは、実施の形態1における情報処理装置を含む異常検知システムの構成例を示すブロック図である。FIG. 1C is a block diagram illustrating a configuration example of an abnormality detection system including the information processing apparatus according to Embodiment 1. 図2は、上記の異常検知システムを構成する異常判定部及び学習部の構成例を示すブロック図である。FIG. 2 is a block diagram illustrating a configuration example of an abnormality determination unit and a learning unit that configure the above-described abnormality detection system. 図3は、上記の学習部が訓練データを用いて生成した学習辞書を説明するための模式図である。FIG. 3 is a schematic diagram for explaining a learning dictionary generated by the learning unit using training data. 図4は、上記の異常判定部による異常判定を説明するための模式図である。FIG. 4 is a schematic diagram for explaining the abnormality determination by the abnormality determination unit. 図5は、学習辞書を生成する上記の学習部でのデータの流れを示す図である。FIG. 5 is a diagram showing a data flow in the learning unit that generates the learning dictionary. 図6は、異常判定を行う上記の異常判定部でのデータの流れを示す図である。FIG. 6 is a diagram illustrating a data flow in the abnormality determination unit that performs abnormality determination. 図7は、訓練データの分布にフィットしていない不適切な判定境界の例を示す図である。FIG. 7 is a diagram illustrating an example of an inappropriate determination boundary that does not fit the distribution of training data. 図8は、上記の異常検知システムにおいて実行される、適切な学習辞書を得るための訓練データの処理方法の一例を示すフロー図である。FIG. 8 is a flowchart illustrating an example of a training data processing method for obtaining an appropriate learning dictionary, which is executed in the abnormality detection system. 図9Aは、M次元空間に分布する正規化前の訓練データの例を示す図である。FIG. 9A is a diagram illustrating an example of training data before normalization distributed in an M-dimensional space. 図9Bは、M次元空間に分布する正規化後の訓練データの例を示す図である。FIG. 9B is a diagram illustrating an example of training data after normalization distributed in an M-dimensional space. 図9Cは、M次元空間に分布するノイズ要素の付加後の訓練データの例を示す図である。FIG. 9C is a diagram illustrating an example of training data after addition of noise elements distributed in the M-dimensional space. 図10は、上記の異常検知システムにおいて実行される、適切な学習辞書を得るための訓練データの処理方法の他の一例を示すフロー図である。FIG. 10 is a flowchart showing another example of the training data processing method for obtaining an appropriate learning dictionary, which is executed in the abnormality detection system. 図11Aは、M次元空間におけるM次元領域の分割の例を説明するための図である。FIG. 11A is a diagram for explaining an example of division of an M-dimensional region in the M-dimensional space. 図11Bは、M次元空間に分布するノイズ要素の付加後の訓練データの例を説明するための図である。FIG. 11B is a diagram for describing an example of training data after adding noise elements distributed in an M-dimensional space. 図12Aは、ノイズを付加しない訓練データを用いて生成した学習辞書の判定境界と、同じ訓練データにノイズを付加したものを用いて生成した学習辞書の判定境界とを示す図である。FIG. 12A is a diagram illustrating a determination boundary of a learning dictionary generated using training data without adding noise and a determination boundary of a learning dictionary generated using the same training data added with noise. 図12Bは、ノイズを付加しない訓練データを用いて生成した学習辞書の判定境界と、同じ訓練データにノイズを付加したものを用いて生成した学習辞書の判定境界とを示す図である。FIG. 
12B is a diagram illustrating a determination boundary of a learning dictionary generated using training data without adding noise and a determination boundary of a learning dictionary generated using the same training data added with noise. 図12Cは、図12A及び図12Bに判定境界を示す各学習辞書を用いてなされた異常検知試験での誤検知率を示す棒グラフである。FIG. 12C is a bar graph showing a false detection rate in an abnormality detection test performed using each learning dictionary whose determination boundaries are shown in FIGS. 12A and 12B. 図13は、実施の形態2における異常検知システムにおいて実行される、訓練データの処理方法の選択及び各処理方法でのパラメータの探索の実行の有無に関する決定のための処理方法の一例を示すフロー図である。FIG. 13 is a flowchart illustrating an example of a processing method for determining whether to select a training data processing method and whether to perform parameter search in each processing method, which is executed in the abnormality detection system according to the second embodiment. It is. 図14は、実施の形態2における異常検知システムにおいて実行される、より適切な学習辞書を得るための処理方法の一例を示すフロー図である。FIG. 14 is a flowchart illustrating an example of a processing method for obtaining a more appropriate learning dictionary, which is executed in the abnormality detection system according to the second embodiment. 図15は、実施の形態2における異常検知システムにおいて実行される、より適切な学習辞書を得るための処理方法の他の例を示すフロー図である。FIG. 15 is a flowchart illustrating another example of the processing method for obtaining a more appropriate learning dictionary, which is executed in the abnormality detection system according to the second embodiment.
 (本開示の基礎になった知見等)
 車載ネットワークのセキュリティ対策として提案されている手法は、大きくに二つに分けられる。
(Knowledge that became the basis of this disclosure)
Methods proposed as security measures for in-vehicle networks can be broadly divided into two.
 ひとつはメッセージの暗号化又は送信元の認証を利用するものである。ただし、この技術には、理論上は有効であるがECUの実装の変更が必要なものもあり、また、自動車1台当たりに搭載されるECUは数百を超える場合があることから、早期の普及は難しい。 One is to use message encryption or sender authentication. However, some of these technologies are theoretically effective but need to be changed in the mounting of the ECU, and the number of ECUs mounted per vehicle may exceed several hundreds. Dissemination is difficult.
 もうひとつは、車載ネットワークを流れるCANメッセージを監視するものである。この手法は、監視用のECU(ノード)を各自動車に追加することで実現可能であり、導入は比較的容易である。提案されているこのような手法をさらに分類すると、ルールベースの手法、データの送信周期を利用する手法、LOF(Local Outlier Factor)を用いてメッセージの内容の外れ値を検知する手法の三種類に大きく分けることができる。 The other is to monitor CAN messages flowing through the in-vehicle network. This method can be realized by adding a monitoring ECU (node) to each vehicle, and is relatively easy to introduce. The proposed method can be further classified into three types: a rule-based method, a method that uses the data transmission cycle, and a method that detects outliers in message contents using LOF (Local Outer Factor). It can be roughly divided.
 これらの三種類の手法のうち、ルールベースの手法及びデータの送信周期を利用する手法では既知の攻撃パターンに対応することができるが、未知の攻撃パターンを検知するには、LOFを利用する手法のようにメッセージの内容に基づく検知が必要である。 Among these three methods, the rule-based method and the method using the data transmission cycle can deal with a known attack pattern, but a method using LOF to detect an unknown attack pattern. Thus, detection based on the content of the message is necessary.
 ただし、LOFを利用する手法では、CANメッセージの評価のために大量の正常データを保持しておく必要があり、要求される計算量が大きい。しかしながら、車載ネットワークに接続されるECUは、データの処理能力及び記憶領域の容量がふんだんであるとは限らず、そのような実行環境でも時速数十km以上で道路を走る自動車で要求される速さで検知が可能な手法でなければ実用的ではない。 However, in the method using LOF, it is necessary to store a large amount of normal data for the evaluation of the CAN message, and a large amount of calculation is required. However, the ECU connected to the in-vehicle network does not always have sufficient data processing capacity and storage capacity, and even in such an execution environment, the speed required for a car traveling on a road at several tens of kilometers per hour is required. If it is not possible to detect, it is not practical.
 そこで本発明者らは、LOFよりも要求される保持データが少なく、計算量の小さいIsolation Forest又はiForest(非特許文献1参照)と呼ばれる異常検知アルゴリズムを車載ネットワークの異常検知の手法に利用することに想到した。また、さらに本発明者らは、Isolation Forestを利用する上で、限られた計算機資源で実行される場合であっても、必要な速さで、かつ極力高い精度での異常検知の実行を可能にする技術を提案する。 Therefore, the present inventors use an abnormality detection algorithm called Isolation Forest or iForest (see Non-Patent Document 1), which requires less retained data than LOF and requires a small amount of calculation, as an abnormality detection method for an in-vehicle network. I came up with it. Furthermore, the present inventors can execute anomaly detection at the required speed and with the highest possible accuracy even when executed with limited computer resources when using Isolation Forest. Propose technology to make
 本開示の一態様に係る情報処理装置は、プロセッサを備える情報処理装置であって、このプロセッサは、Isolation Forestの訓練データとして用いられるN個(Nは2以上の整数)のM次元のベクトル(Mは2以上の整数)であるデータ要素の入力を受けるデータ要素取得ステップと、訓練データをM次元の第一領域に渡って分布させるよう正規化する正規化ステップと、第一領域より大きく第一領域を包含するM次元の第二領域を、大きさの等しいLM個(Lは4以上の整数)のM次元の超立方体である第三領域に分割する分割ステップと、第三領域のそれぞれが含むデータ要素の個数S(Sは0以上の整数)を取得し、第三領域のうち、第一閾値T(Tは自然数)より少ない個数のデータ要素を含む第三領域のそれぞれに、(T-S)個のM次元のベクトルであるノイズ要素を一様分布で付加する第一ノイズ付加ステップと、データ要素及びノイズ要素を含むノイズ付加訓練データを生成する生成ステップと、ノイズ付加訓練データを用いてIsolation Forestの学習辞書データを生成して出力する学習辞書データ出力ステップとを実行する。 An information processing apparatus according to an aspect of the present disclosure is an information processing apparatus including a processor, and the processor uses N (N is an integer of 2 or more) M-dimensional vectors (N is an integer of 2 or more) used as training data for Isolation Forest. A data element acquisition step that receives an input of a data element that is an integer greater than or equal to 2), a normalization step that normalizes the training data to be distributed over the M-dimensional first region, and a step larger than the first region. A division step of dividing an M-dimensional second region including one region into third regions which are LM pieces (L is an integer of 4 or more) having the same size, and each of the third regions; The number S of data elements included in S is acquired (S is an integer equal to or greater than 0), and each of the third areas including a number of data elements smaller than the first threshold T (T is a natural number) among the third areas. A (TS) M-dimensional vector of noise elements added in a uniform distribution, a first noise adding step, a data element and a noise adding training data including the noise elements, a generating step, a noise A learning dictionary data output step of generating and outputting the learning dictionary data of the Isolation Forest using the additional training data is executed.
 これにより、より低い誤検知率でのIsolation Forestの実行を可能にする学習辞書を得ることができる。 This makes it possible to obtain a learning dictionary that enables execution of Isolation Forest with a lower false detection rate.
 また例えば、プロセッサは、Nが所定の第二閾値以上であるか否かを判定する第一判定ステップを実行し、第一判定ステップにおいてNが第二閾値以上ではないと判定した場合、分割ステップ及び第一ノイズ付加ステップを実行してから生成ステップ及び学習辞書データ出力ステップを実行してもよい。 Also, for example, the processor executes a first determination step for determining whether N is equal to or greater than a predetermined second threshold value, and if it is determined in the first determination step that N is not equal to or greater than the second threshold value, the division step The generation step and the learning dictionary data output step may be executed after executing the first noise addition step.
 これにより、例えば訓練データのデータ要素の個数が、プロセッサの負荷状況に対して過大である場合は、この訓練データを用いた学習辞書データの生成を延期することができる。 Thereby, for example, when the number of data elements of training data is excessive with respect to the load state of the processor, generation of learning dictionary data using the training data can be postponed.
 また例えば、プロセッサは、第一判定ステップにおいてNが第二閾値以上であると判定した場合、K個(KはNより小さい自然数)のM次元のベクトルであるノイズ要素を第二領域内に一様な密度で付加する第二ノイズ付加ステップを実行してから生成ステップ及び学習辞書データ出力ステップを実行してもよい。 Further, for example, when the processor determines that N is equal to or greater than the second threshold value in the first determination step, the noise element that is an M-dimensional vector of K pieces (K is a natural number smaller than N) is set in the second region. The generation step and the learning dictionary data output step may be executed after executing the second noise addition step for adding at a different density.
 これにより、訓練データの大きさで変わる処理負荷に応じてノイズの付加方法を切り替えることができ、学習辞書を実行環境に適した速さで生成することができる。 This makes it possible to switch the noise addition method according to the processing load that changes depending on the size of the training data, and to generate the learning dictionary at a speed suitable for the execution environment.
 また例えば、プロセッサはさらに、第一判定ステップにおいてNが第二閾値以上でないと判定した場合、Isolation Forestのテスト用データの入力を受けるテスト用データ取得ステップと、Nが所定の第三閾値以上であるか否かを判定する第二判定ステップとを実行し、第二判定ステップにおいてNが第三閾値以上でないと判定した場合、分割ステップ、第一ノイズ付加ステップ、生成ステップ、及び学習辞書データ出力ステップのセットを、分割ステップで値の異なるLを用いて複数回実行して複数の学習辞書データを出力し、さらに、複数の学習辞書データのそれぞれを用いてテスト用データに対する異常検知を実行し、異常検知の結果に基づいて複数の学習辞書データのそれぞれを評価する評価ステップと、評価ステップの結果に基づいて複数の学習辞書データから最良の学習辞書データを選択する学習辞書データ選択ステップとを実行し、第二判定ステップにおいてNが第三閾値以上であると判定した場合、分割ステップで所定の値であるLを用いてセットを1回実行してもよい。 Further, for example, when the processor further determines that N is not equal to or greater than the second threshold value in the first determination step, the processor receives a test data in the Isolation Forest, and N is equal to or greater than a predetermined third threshold value. A second determination step for determining whether or not there is present, and when it is determined in the second determination step that N is not equal to or greater than a third threshold value, a division step, a first noise addition step, a generation step, and learning dictionary data output A set of steps is executed a plurality of times using L having different values in the division step to output a plurality of learning dictionary data, and further, abnormality detection for the test data is executed using each of the plurality of learning dictionary data. An evaluation step for evaluating each of the plurality of learning dictionary data based on the result of the abnormality detection, and an evaluation step A learning dictionary data selection step for selecting the best learning dictionary data from a plurality of learning dictionary data based on the result of the search, and if it is determined in the second determination step that N is greater than or equal to the third threshold, The set may be executed once using L which is a predetermined value in the step.
 これにより、訓練データの大きさで変わる処理負荷に応じて、複数の学習辞書データを生成して最適なものを出力するか、ひとつの学習辞書データを生成して出力するかを切り替えることができる。したがって、学習辞書を実行環境に適した速さで生成することができる。 This makes it possible to switch between generating a plurality of learning dictionary data and outputting the optimal one, or generating and outputting a single learning dictionary data according to the processing load that changes depending on the size of the training data. . Therefore, the learning dictionary can be generated at a speed suitable for the execution environment.
 また例えば、プロセッサは、第二判定ステップにおいてNが第三閾値以上でないと判定した場合、Nの値と負の相関を有するようLの異なる値の個数を決定してもよい。 For example, if the processor determines that N is not equal to or greater than the third threshold in the second determination step, the processor may determine the number of different values of L so as to have a negative correlation with the value of N.
 これにより、訓練データが大きければ、第三領域への分割数を減らすことで処理負荷が減る。したがって、学習辞書を実行環境に適した速さで生成することができる。 ∙ As a result, if the training data is large, the processing load is reduced by reducing the number of divisions into the third area. Therefore, the learning dictionary can be generated at a speed suitable for the execution environment.
 また例えば、プロセッサは、第一ノイズ付加ステップにおいて、第一領域内にある第三領域のそれぞれに含まれるデータ要素の個数の中央値より小さい個数のいずれかを第一閾値Tの値として決定してもよい。 Further, for example, in the first noise addition step, the processor determines, as the value of the first threshold value T, any number smaller than the median number of data elements included in each of the third areas in the first area. May be.
 これにより、訓練データが大きければ、ノイズ要素が付加される第三領域の個数を減らすことで処理負荷の増大を抑えることができる。したがって、学習辞書を実行環境に適した速さで生成することができる。 Therefore, if the training data is large, an increase in processing load can be suppressed by reducing the number of third regions to which noise elements are added. Therefore, the learning dictionary can be generated at a speed suitable for the execution environment.
 また例えば、プロセッサは、第一判定ステップにおいてNが第二閾値以上であると判定した場合、Isolation Forestのテスト用データの入力を受けるテスト用データ取得ステップと、Nが所定の第四閾値以上であるか否かを判定する第三判定ステップとを実行し、第三判定ステップにおいてNが第四閾値以上でないと判定した場合、第二ノイズ付加ステップ、生成ステップ、及び学習辞書データ出力ステップのセットを、第二ノイズ付加ステップで値の異なるKを用いて複数回実行して複数の学習辞書データを出力し、さらに、複数の学習辞書データのそれぞれを用いてテスト用データに対する異常検知を実行して複数の学習辞書データのそれぞれを評価する評価ステップと、評価ステップの結果に基づいて複数の学習辞書データから最良の学習辞書データを選択する学習辞書データ選択ステップとを実行し、第三判定ステップにおいてNが第四閾値以上であると判定した場合、第二ノイズ付加ステップで所定の値であるKを用いてセットを1回実行してもよい。 In addition, for example, when the processor determines that N is equal to or greater than the second threshold value in the first determination step, the processor receives the test data for Isolation Forest test data, and N is equal to or greater than a predetermined fourth threshold value. A third determination step for determining whether or not there is present, and if it is determined in the third determination step that N is not greater than or equal to the fourth threshold, a set of a second noise addition step, a generation step, and a learning dictionary data output step Is executed a plurality of times using K having different values in the second noise addition step to output a plurality of learning dictionary data, and further, abnormality detection is performed on the test data using each of the plurality of learning dictionary data. An evaluation step for evaluating each of a plurality of learning dictionary data, and a plurality of learning dictionaries based on the result of the evaluation step. A learning dictionary data selection step for selecting the best learning dictionary data from the data, and if it is determined in the third determination step that N is equal to or greater than a fourth threshold, K is a predetermined value in the second noise addition step. The set may be executed once using.
 これにより、訓練データの大きさで変わる処理負荷に応じて、複数の学習辞書データを生成して最適なものを出力するか、ひとつの学習辞書データを生成して出力するかを切り替えることができる。したがって、学習辞書を実行環境に適した速さで生成することができる。 This makes it possible to switch between generating a plurality of learning dictionary data and outputting the optimal one, or generating and outputting a single learning dictionary data according to the processing load that changes depending on the size of the training data. . Therefore, the learning dictionary can be generated at a speed suitable for the execution environment.
 また例えば、プロセッサは、第三判定ステップにおいてNが第四閾値以上でないと判定した場合、Nの値と負の相関を有するようKの異なる値の個数を決定してもよい。 For example, if the processor determines that N is not equal to or greater than the fourth threshold in the third determination step, the processor may determine the number of different values of K so as to have a negative correlation with the value of N.
 これにより、生成する学習辞書の個数を減らすことで処理負荷の増大を抑えることができる。したがって、学習辞書を実行環境に適した速さで生成することができる。 This makes it possible to suppress an increase in processing load by reducing the number of learning dictionaries to be generated. Therefore, the learning dictionary can be generated at a speed suitable for the execution environment.
 また例えば、第一領域をM次元の空間における[0,1]Mの超立方体で画定される領域とすると、第二領域は、このM次元の空間において[-0.5,1.5]Mの超立方体で画定される領域であってもよい。 For example, if the first region is a region defined by a hypercube of [0, 1] M in an M-dimensional space, the second region is [−0.5, 1.5] in the M-dimensional space. It may be a region defined by a hypercube of M.
 これにより、学習辞書の生成に利用可能な訓練データに外れ値が少ない場合であっても、より低い誤検知率での異常検知を可能にする学習辞書を得ることができる。 Thereby, even if there are few outliers in the training data that can be used to generate the learning dictionary, it is possible to obtain a learning dictionary that enables anomaly detection with a lower false detection rate.
 また、本開示の一態様に係る異常検知システムは、上記に記載の情報処理装置のいずれかと、情報処理装置から出力された学習辞書データを記憶するメモリ及びプロセッサを備え、ネットワークに接続される異常判定装置であって、プロセッサは、ネットワークを流れるデータを取得し、取得されたデータの異常判定をメモリに記憶されている学習辞書データに基づいて実行する異常判定装置とを備える。 An abnormality detection system according to an aspect of the present disclosure includes any one of the information processing apparatuses described above and a memory and a processor that store learning dictionary data output from the information processing apparatus, and is connected to a network. The determination device, the processor includes an abnormality determination device that acquires data flowing through a network and executes an abnormality determination of the acquired data based on learning dictionary data stored in a memory.
 これにより、精度を考慮した上で迅速に更新される学習辞書を利用して異常検知が実行される。 Thus, abnormality detection is performed using a learning dictionary that is updated quickly in consideration of accuracy.
 また、本開示の一態様に係る情報処理方法は、プロセッサを備える情報処理装置を用いて実行される情報処理方法であって、このプロセッサに、Isolation Forestの訓練データとして用いられるN個(Nは2以上の整数)のM次元のベクトル(Mは2以上の整数)であるデータ要素の入力を受けさせるデータ要素取得ステップと、訓練データをM次元の第一領域に渡って分布させるよう正規化させる正規化ステップと、第一領域より大きく第一領域を包含するM次元の第二領域を、大きさの等しいLM個(Lは4以上の整数)のM次元の第三領域に分割させる分割ステップと、第三領域のそれぞれが含むデータ要素の個数S(Sは0以上の整数)を取得させ、第三領域のうち、第一閾値T(Tは自然数)より少ない個数のデータ要素を含む第三領域のそれぞれに、(T-S)個のM次元のベクトルであるノイズ要素を一様分布で付加させる第一ノイズ付加ステップと、データ要素及びノイズ要素を含むノイズ付加訓練データを生成させる生成ステップと、ノイズ付加訓練データを用いてIsolation Forestの学習辞書データを生成して出力させる学習辞書データ出力ステップとを含む。 Further, an information processing method according to an aspect of the present disclosure is an information processing method executed using an information processing apparatus including a processor, and the processor uses N pieces (N is used as training data for Isolation Forest). A data element acquisition step that receives an input of a data element that is an M-dimensional vector (M is an integer of 2 or more) and normalization so that the training data is distributed over the first area of the M dimension. Normalization step, and dividing the M-dimensional second area larger than the first area and including the first area into LM third areas (L is an integer of 4 or more) having the same size. The step S and the number S of data elements included in each of the third regions (S is an integer greater than or equal to 0) are acquired, and the number of data elements smaller than the first threshold T (T is a natural number) among the third regions A first noise addition step for adding (TS) M-dimensional vector noise elements in a uniform distribution to each of the third regions including the data elements, and noise addition including the data elements and the noise elements A generation step of generating training data, and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data are included.
 また、本開示の一態様に係るプログラムは、コンピュータが備えるプロセッサに、上記の情報処理方法を実行させるプログラムである。 Also, a program according to an aspect of the present disclosure is a program that causes a processor included in a computer to execute the above information processing method.
 このような方法又はプログラムによっても、より低い誤検知率でのIsolation Forestの実行を可能にする学習辞書を得ることができる。 Also by such a method or program, it is possible to obtain a learning dictionary that enables execution of Isolation Forest with a lower false detection rate.
 なお、これらの全般的又は具体的な態様は、システム、方法、集積回路、コンピュータプログラム、又はコンピュータで読み取り可能なCD-ROM等の記録媒体のいずれで実現されてもよく、システム、方法、集積回路、コンピュータプログラム又は記録媒体の任意な組み合わせで実現されてもよい。 These general or specific aspects may be realized by any of a system, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM. You may implement | achieve with arbitrary combinations of a circuit, a computer program, or a recording medium.
 以下、実施の形態に係る情報処理装置、情報処理方法等について、図面を参照しながら説明する。ここで示す実施の形態は、いずれも本開示の一具体例を示すものである。したがって、以下の実施の形態で示される数値、構成要素、構成要素の配置及び接続形態、並びに、ステップ(工程)及びステップの順序等は、一例であって本開示を限定するものではない。 Hereinafter, an information processing apparatus, an information processing method, and the like according to embodiments will be described with reference to the drawings. Each of the embodiments shown here shows a specific example of the present disclosure. Therefore, numerical values, components, arrangement and connection forms of components, and steps (processes) and order of steps shown in the following embodiments are merely examples, and do not limit the present disclosure.
 また、以下の実施の形態における構成要素のうち、独立請求項に記載されていない構成要素については、任意に付加可能な構成要素である。各図は模式図であり、必ずしも厳密に図示されたものではない。 In addition, among the constituent elements in the following embodiments, constituent elements that are not described in the independent claims can be arbitrarily added. Each figure is a schematic diagram and is not necessarily shown strictly.
 また、以下に含まれるCAN及びIsolation Forestに関する説明は、本開示の理解の一助を主な趣旨とするものであり、この説明のうち請求項に含まれない事項については、本開示を限定する趣旨で記載されるものではない。 In addition, the explanation regarding CAN and Isolation Forest included in the following is mainly intended to assist understanding of the present disclosure, and matters not included in the claims of this description are intended to limit the present disclosure. It is not described in.
 (実施の形態1)
 [構成]
 [概要]
 図1Aから図1Cは、実施の形態1における情報処理装置を含む異常検知システムの一構成例をそれぞれ示すブロック図である。
(Embodiment 1)
[Constitution]
[Overview]
1A to 1C are block diagrams respectively showing a configuration example of an abnormality detection system including an information processing apparatus according to Embodiment 1.
 図1Aから図1Cには、構成の異なる異常検知システム100A、100B、及び100Cがそれぞれ示される。 1A to 1C show abnormality detection systems 100A, 100B, and 100C having different configurations, respectively.
 異常検知システム100A~100Cは、監視対象であるネットワークを流れるデータの異常を、Isolation Forestと呼ばれるアルゴリズムを用いて検知するシステムであり、いずれも異常判定部110及び学習部120を備える。 The anomaly detection systems 100A to 100C are systems that detect an anomaly of data flowing through a network to be monitored using an algorithm called Isolation Forest, and each includes an anomaly determination unit 110 and a learning unit 120.
 異常判定部110は、車両20が備える車載ネットワーク210を流れるデータが正常か異常かを判定する。車両20は例えば自動車である。 The abnormality determination unit 110 determines whether data flowing through the in-vehicle network 210 included in the vehicle 20 is normal or abnormal. The vehicle 20 is an automobile, for example.
 車載ネットワーク210は、例えばCANの規格に対応するネットワークであり、図1Aから図1Cの各構成例では、バスと、このバスに接続される複数のECU及び診断用ポートとを含む。複数のECUには、各種のセンサから測定データを収集して分析するECU、エンジンを制御するECU、ブレーキを制御するECU、ネットワークを監視するECU等の、機能の異なるECUが含まれる。車載ネットワーク210を流れるデータとは、バスを流れるメッセージのデータである。 The in-vehicle network 210 is a network corresponding to, for example, a CAN standard, and includes a bus, a plurality of ECUs and diagnostic ports connected to the bus in each of the configuration examples of FIGS. 1A to 1C. The plurality of ECUs include ECUs having different functions such as an ECU that collects and analyzes measurement data from various sensors, an ECU that controls an engine, an ECU that controls a brake, and an ECU that monitors a network. The data flowing through the in-vehicle network 210 is message data flowing through the bus.
 学習部120は、異常判定部110が上記の判定を行うための事前の学習を行う。より具体的には、学習部120は、訓練データを用いて学習し異常判定部110が上記の判定に用いる学習辞書を生成する。生成された学習辞書のデータ(以下、学習辞書データともいう)は、例えば記憶装置(図示なし)に格納される。 The learning unit 120 performs prior learning for the abnormality determination unit 110 to perform the above determination. More specifically, the learning unit 120 learns using the training data, and generates a learning dictionary that the abnormality determination unit 110 uses for the above determination. The generated learning dictionary data (hereinafter also referred to as learning dictionary data) is stored, for example, in a storage device (not shown).
 異常判定部110は、記憶装置から学習辞書を読み込み、正常か異常かの判定の対象である未知のデータ、つまり車載ネットワーク210から取得したメッセージのデータがこの学習辞書に照らして逸脱しているか否かに基づいて異常であるか否かを判定する。より詳細には、学習部120が生成する学習辞書は複数の二分木からなり、異常判定部110は、これらの複数の二分木から算出したスコアの平均値を用いてデータが異常であるか否かを判定する。なお、Isolation Forestで用いられるこの二分木は、Isolation Tree又はiTreeと呼ばれる。 The abnormality determination unit 110 reads the learning dictionary from the storage device, and whether or not unknown data that is a target of normality or abnormality, that is, message data acquired from the in-vehicle network 210 deviates from the learning dictionary. Whether or not it is abnormal is determined based on whether or not it is abnormal. More specifically, the learning dictionary generated by the learning unit 120 includes a plurality of binary trees, and the abnormality determination unit 110 uses the average value of the scores calculated from the plurality of binary trees to determine whether the data is abnormal. Determine whether. In addition, this binary tree used in Isolation Forest is called Isolation Tree or iTree.
 異常判定部110及び学習部120は、所定のプログラムを読み込んで実行するプロセッサによって提供される機能的な構成要素である。そして図1Aから図1Cの各構成例では、これらのプロセッサの機能的な構成要素を提供するプロセッサの場所が異なる。 The abnormality determination unit 110 and the learning unit 120 are functional components provided by a processor that reads and executes a predetermined program. In each configuration example shown in FIGS. 1A to 1C, the locations of the processors that provide the functional components of these processors are different.
 図1Aに示される構成例では、学習部120が、車両20の外部にある、いわゆるサーバコンピュータである外部サーバ10が備えるプロセッサ及びメモリによって提供される。外部サーバ10は、本実施の形態における情報処理装置の例のひとつである。 In the configuration example shown in FIG. 1A, the learning unit 120 is provided by a processor and a memory included in the external server 10 that is a so-called server computer outside the vehicle 20. The external server 10 is one example of the information processing apparatus in the present embodiment.
 この場合、学習部120は例えば車載ネットワーク210を流れるメッセージを訓練データとして通信網を経由して車両20から取得する。また学習部120は、この訓練データを用いて生成したIsolation Forestの学習辞書データを出力し、通信網を経由して車両20の異常判定部110に提供する。 In this case, the learning unit 120 acquires, for example, a message flowing through the in-vehicle network 210 as training data from the vehicle 20 via the communication network. The learning unit 120 also outputs Isolation Forest learning dictionary data generated using the training data, and provides it to the abnormality determination unit 110 of the vehicle 20 via the communication network.
 また、車両20では、学習辞書データは例えば車載ネットワーク210に接続されるネットワーク監視用の監視ECUが備えるマイクロコントローラのフラッシュメモリ等の記憶装置に格納され、このマイクロコントローラのプロセッサによって異常判定部110が提供される。異常判定部110は、バスから取得したメッセージに対して、この記憶装置から学習辞書データを取得した学習辞書データを用いてメッセージの異常判定を実行する。 In the vehicle 20, the learning dictionary data is stored in a storage device such as a flash memory of a microcontroller included in a monitoring ECU for network monitoring connected to the in-vehicle network 210, and the abnormality determination unit 110 is operated by the processor of the microcontroller. Provided. The abnormality determination unit 110 performs message abnormality determination on the message acquired from the bus using the learning dictionary data acquired from the learning dictionary data from the storage device.
 なお、このような構成では、車両20の出荷後に更新された学習辞書データを異常判定部110に提供することができる。 In such a configuration, learning dictionary data updated after shipment of the vehicle 20 can be provided to the abnormality determination unit 110.
 図1Bに示される構成例では、異常判定部110及び学習部120の両方が、車両20の外部にある外部サーバ10が備えるプロセッサ及びメモリによって提供される。このような外部サーバ10も、本実施の形態における情報処理装置の例のひとつである。 1B, both the abnormality determination unit 110 and the learning unit 120 are provided by a processor and a memory included in the external server 10 outside the vehicle 20. Such an external server 10 is also an example of the information processing apparatus in the present embodiment.
 この場合も、学習部120は例えば車載ネットワーク210を流れるメッセージを訓練データとして通信網を経由して車両20から取得する。また学習部120は、この訓練データを用いて生成したIsolation Forestの学習辞書データを出力するが、出力先は外部サーバ10の外ではなく、例えば外部サーバ10が備えるハードディスクドライブ等の記憶装置(図示なし)に格納される。 Also in this case, the learning unit 120 acquires, for example, a message flowing through the in-vehicle network 210 as training data from the vehicle 20 via the communication network. The learning unit 120 outputs the learning dictionary data of the Isolation Forest generated using the training data, but the output destination is not outside the external server 10, but a storage device (for example, a hard disk drive provided in the external server 10 (illustrated) None).
 この構成では、異常判定は車両20上ではなく、外部サーバ10で行われる。つまり、車載ネットワーク210を流れるメッセージは、通信網を介して外部サーバ10に送信される。外部サーバ10が受信したこのメッセージは、異常判定部110に入力される。異常判定部110は、記憶装置から学習辞書データを取得し、この学習辞書データを用いてメッセージの異常判定を実行し、その結果を通信網を介して車両20に送信する。 In this configuration, the abnormality determination is performed on the external server 10 instead of on the vehicle 20. That is, the message flowing through the in-vehicle network 210 is transmitted to the external server 10 via the communication network. This message received by the external server 10 is input to the abnormality determination unit 110. Abnormality determination unit 110 acquires learning dictionary data from the storage device, performs abnormality determination of the message using the learning dictionary data, and transmits the result to vehicle 20 via the communication network.
 なお、このような構成では、外部サーバ10において異常判定部110が利用する学習辞書データは随時更新される。 In such a configuration, the learning dictionary data used by the abnormality determination unit 110 in the external server 10 is updated as needed.
 In the configuration example shown in FIG. 1C, both the abnormality determination unit 110 and the learning unit 120 are provided by a microcontroller included in a monitoring ECU, that is, an ECU that is connected to the in-vehicle network 210 of the vehicle 20 and monitors the in-vehicle network 210. The monitoring ECU 10 is one example of the information processing apparatus in the present embodiment.
 In this case, the learning unit 120 directly acquires and uses, for example, messages flowing through the in-vehicle network 210 as training data. The learning unit 120 outputs the Isolation Forest learning dictionary data generated using this training data, but the output destination is not outside the vehicle 20; the data is stored in a storage device on the vehicle 20, for example a flash memory in the monitoring ECU.
 In this configuration, both the generation of the learning dictionary and the abnormality determination are performed on the vehicle 20. For example, in the monitoring ECU, the learning unit 120 acquires the data of messages flowing through the in-vehicle network 210 to which the monitoring ECU is connected and uses it as training data to generate a learning dictionary. The generated learning dictionary data is stored in the storage device of the monitoring ECU. In the monitoring ECU, the abnormality determination unit 110 further acquires the learning dictionary data from the storage device and executes abnormality determination of messages using this learning dictionary data.
 Even with such a configuration, the learning dictionary data used by the abnormality determination unit 110 on the vehicle 20 can be updated.
 Each configuration shown in FIGS. 1A to 1C need not be fixed in the vehicle 20 after shipment, and may be dynamically changeable on the vehicle 20. For example, switching among these configurations may be possible depending on the communication speed between the vehicle 20 and the external server 10, the usage rate of the computer resources of the monitoring ECU, the remaining battery charge when the vehicle 20 is an electric vehicle, or an operation by the driver.
 [Configuration of abnormality determination unit and learning unit]
 The configurations of the abnormality determination unit 110 and the learning unit 120, which are components of each of the abnormality detection systems 100A, 100B, and 100C described in the overview of the configurations, will now be described. In the following, the abnormality detection systems 100A, 100B, and 100C, without specifying any one of them or when referring to all of them collectively, are also referred to as the abnormality detection system 100.
 FIG. 2 is a block diagram illustrating a configuration example of the abnormality determination unit 110 and the learning unit 120 included in the abnormality detection system 100.
 As shown in FIG. 2, the learning unit 120 includes a training data receiving unit 122 and a learning dictionary generating unit 124.
 The training data receiving unit 122 receives input of training data. The training data here consists of two or more M-dimensional vectors, where M is an integer of 2 or more. The value of each dimension is, for example, the value of each byte counted from the beginning of the payload of a CAN message, whose payload is at most 8 bytes.
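 Purely as an illustration of this feature extraction, the following is a minimal sketch in Python of converting a CAN payload into an M-dimensional vector. The zero-padding of shorter payloads and the function name are assumptions made for this sketch only; they are not specified in the embodiment.

```python
def payload_to_vector(payload: bytes, m: int = 8) -> list[float]:
    """Convert a CAN payload (up to 8 bytes) into an M-dimensional vector.

    Each dimension holds the value of one byte counted from the start of
    the payload; shorter payloads are padded with zero bytes here, which
    is an assumption made only for this illustration.
    """
    padded = payload.ljust(m, b"\x00")
    return [float(b) for b in padded[:m]]

# Example: a 4-byte payload becomes an 8-dimensional feature vector.
vec = payload_to_vector(bytes([0x12, 0x34, 0x56, 0x78]))
```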
 The learning unit 120 generates learning dictionary data using the training data received by the training data receiving unit 122, and outputs this learning dictionary data to the storage unit 112 of the abnormality determination unit 110, described later.
 FIG. 3 is a schematic diagram for explaining the data elements of training data in the case of M = 2 and a learning dictionary generated using this training data. In FIG. 3, the data elements form a point group distributed in the M-dimensional space, each point being indicated by a white circle, and the learning dictionary is a boundary in the M-dimensional space indicated by a thick solid line. This boundary is hereinafter also referred to as the determination boundary. When M = 2, the determination boundary is a boundary line.
 As further shown in FIG. 2, the abnormality determination unit 110 includes a storage unit 112, a determination target data receiving unit 114, a determination target data conversion unit 116, and a determination execution unit 118.
 The storage unit 112 stores the learning dictionary data output from the learning unit 120 as described above. Data used for the conversion of the determination target data, described later, is also stored in the storage unit 112.
 The determination target data receiving unit 114 acquires the data subject to abnormality determination, that is, a CAN message, from the in-vehicle network 210.
 The determination target data conversion unit 116 converts the CAN message received by the determination target data receiving unit 114 into a format that can be processed by the determination execution unit 118. This conversion includes, for example, extraction of the portion to be judged from the CAN message and normalization using the above-mentioned data for conversion of the determination target data. Normalization is described later.
 The determination execution unit 118 determines whether the determination target data is normal or abnormal, that is, performs abnormality determination, based on the learning dictionary stored as learning dictionary data in the storage unit 112.
 FIG. 4 is a schematic diagram for explaining this abnormality determination. In FIG. 4, two pieces of data, determination target data A and determination target data B, are plotted in the M-dimensional space based on their values.
 The determination execution unit 118 determines whether each piece of data is normal or abnormal based on whether it is located inside or outside the determination boundary of the learning dictionary, and outputs the result. In this example, determination target data A, located inside the determination boundary, is determined to be normal, and determination target data B, located outside the determination boundary, is determined to be abnormal. When data is determined to be abnormal, the monitoring ECU including the abnormality determination unit 110 and the learning unit executes, for example, another program that receives this determination result as input and outputs an error message to the bus, or transmits a command for restricting part or all of the functions of other ECUs or for shifting other ECUs to a special operation mode for handling abnormalities. A notification of the occurrence of the abnormality may also be issued to the driver of the vehicle 20 by a display on the instrument panel or by voice. In addition, information on the occurrence of the abnormality may be recorded in a log. This log is acquired and used, for example, by a mechanic of the vehicle 20 through a diagnostic port included in the in-vehicle network 210.
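 As an illustration only, the following is a minimal sketch in Python of this inside/outside judgment, assuming scikit-learn's IsolationForest as a stand-in for the learning dictionary and placeholder training vectors; the embodiment does not name any particular library, and the concrete values here are not from the embodiment.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Placeholder training vectors standing in for normalized CAN feature vectors
# (the real system would use the normalized, noise-added training data).
rng = np.random.default_rng(0)
training_vectors = rng.uniform(0.3, 0.7, size=(200, 2))

# The fitted model plays the role of the learning dictionary here.
model = IsolationForest(random_state=0).fit(training_vectors)

def judge(normalized_vector) -> str:
    """predict() returns +1 for points inside the learned boundary (normal)
    and -1 for points outside it (abnormal)."""
    label = model.predict([normalized_vector])[0]
    return "normal" if label == 1 else "abnormal"

print(judge([0.5, 0.5]))    # likely "normal"
print(judge([0.95, 0.05]))  # likely "abnormal"
```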
 Each component of the abnormality determination unit 110 and the learning unit 120 executes a part of the Isolation Forest algorithm, and they cooperate as described above to execute the Isolation Forest algorithm as a whole.
 [Outline of processing in the abnormality detection system]
 The flows of data in the abnormality determination unit 110 and the learning unit 120, which include the components described above, are shown in FIGS. 5 and 6. FIG. 5 is a diagram illustrating the flow of data in the learning unit 120, which generates the learning dictionary. FIG. 6 is a diagram illustrating the flow of data in the abnormality determination unit 110, which performs abnormality determination. These figures are drawn essentially as sequence diagrams showing the flow of data, but also serve as flowcharts showing the order of processing in each unit.
 As shown in FIG. 5, in the learning unit 120 that generates the learning dictionary, the training data receiving unit 122 first receives input and acquires training data (step S51). If the learning dictionary is generated before the vehicle 20 is shipped, the input source of the training data is, for example, a location in a storage device that is manually specified or preset at that stage. If the learning dictionary is generated after the vehicle 20 is shipped, the input source is, for example, the in-vehicle network 210 to which the monitoring ECU including the learning unit 120 is connected.
 Next, in the learning unit 120, the learning dictionary generation unit 124 normalizes the input training data (step S52) and generates a learning dictionary by the Isolation Forest method using the normalized training data (step S53). Normalization here is a calculation that transforms the original distribution range of the input training data in the M-dimensional space so that the distribution range falls within a predetermined region of the same space while preserving the relative positional relationships among the training data.
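 The following is a minimal sketch in Python of this normalization, assuming simple per-dimension min-max scaling; the embodiment only states that the maximum and minimum values of each component are used, so the guard against a zero-width range and the concrete numbers are assumptions for this sketch.

```python
import numpy as np

def fit_minmax(train: np.ndarray):
    """Compute per-dimension minimum and maximum of the training data.
    These values are later passed to the abnormality determination unit
    so that unknown data can be normalized in the same way."""
    return train.min(axis=0), train.max(axis=0)

def normalize(x: np.ndarray, lo: np.ndarray, hi: np.ndarray) -> np.ndarray:
    """Map each dimension into [0, 1] while preserving relative positions.
    The guard against a zero range is an assumption for this sketch."""
    span = np.where(hi - lo == 0, 1.0, hi - lo)
    return (x - lo) / span

train = np.array([[10.0, 200.0], [20.0, 220.0], [15.0, 260.0]])
lo, hi = fit_minmax(train)
normalized_train = normalize(train, lo, hi)  # all values now lie in [0, 1]
```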
 The generated learning dictionary data is passed to the abnormality determination unit 110 (step S54), where it is stored in the storage unit 112 (step S55). Together with the learning dictionary data, the data used in the above normalization calculation is also passed from the learning unit 120 to the abnormality determination unit 110. This data includes the maximum and minimum values of each component of the feature vectors needed for the conversion. The abnormality determination unit 110 uses this data to normalize the unknown data subject to determination.
 As shown in FIG. 6, in the abnormality determination unit 110 that performs abnormality determination, the determination target data receiving unit 114 first acquires the data of a CAN message subject to abnormality determination from the in-vehicle network 210 (step S61).
 Next, in the abnormality determination unit 110, the determination execution unit 118 reads the learning dictionary data stored in the storage unit 112 (step S62). The determination target data conversion unit 116 reads the data, such as coefficients, used for normalizing the training data from the storage unit 112, and normalizes the determination target data, that is, the data of the acquired CAN message, using this data (step S63). The determination execution unit 118 then determines whether the normalized data is normal or abnormal based on the learning dictionary data (step S64).
 The above is an outline of the abnormality detection processing executed in the abnormality detection system 100, including the steps from the generation of the learning dictionary using training data to the abnormality determination using this learning dictionary. By adopting the Isolation Forest method for this abnormality detection, the load on computer resources is reduced compared with conventional methods, and the processing can be executed faster.
 However, with the Isolation Forest algorithm, the determination boundary of the learning dictionary obtained as a result of learning may not fit properly to the distribution of the normal training data in the M-dimensional space. FIG. 7 shows an example of such an inappropriate determination boundary. When the determination boundary lies inside the outer edge of the distribution of normal data elements in this way, abnormality determination produces erroneous judgments in which data that is actually normal is judged to be abnormal. In the example of FIG. 7, the data elements indicated by filled black circles are those judged to be abnormal data, and many of them are actually normal data elements. Hereinafter, such erroneous detection, in which normal data is wrongly judged to be abnormal, is also referred to as overdetection.
 Such a learning dictionary, which causes erroneous determinations, can arise, for example, when the amount of abnormal data included in the training data is insufficient. The processing performed in the abnormality detection system 100 to obtain an appropriate learning dictionary even in such cases is described below.
 [Processing for obtaining an appropriate learning dictionary]
 Two examples of processing methods for obtaining an appropriate learning dictionary in the present embodiment are described below.
 [First processing method]
 FIG. 8 is a flowchart showing the first processing method, an example of a training data processing method for obtaining the appropriate learning dictionary described above.
 The first processing method is executed by the learning dictionary generation unit 124 in the learning unit 120 after it has received input of Isolation Forest training data consisting of two or more M-dimensional vectors. In the following, however, processing by the learning dictionary generation unit 124 may be described as processing by the learning unit 120. FIG. 9A shows an example of the initial state of input training data distributed in the M-dimensional space, that is, a two-dimensional plane, when M = 2.
 First, the learning unit 120 reads the parameters used for this processing (step S80). The details of the parameters are described in the following steps.
 Next, the learning unit 120 acquires the number of data elements of the input training data (step S81).
 Next, the learning unit 120 determines the number of noise elements to be added to the training data based on the number of data elements (step S82). A noise element is also an M-dimensional vector. The parameter acquired in step S80 is used to determine the number of noise elements in step S82, and is, for example, a real number greater than 0 and less than 1. The number of noise elements added to the training data is obtained by multiplying the number of data elements acquired in step S81 by this parameter and rounding the result to an integer. That is, the number of noise elements is determined so as to be smaller than the number of data elements of the training data.
 Next, the learning unit 120 normalizes the training data (step S83). FIG. 9B shows an example of training data after normalization distributed on a two-dimensional plane. In this example, the distribution range of the training data, which before normalization was distributed as shown in FIG. 9A, has been transformed so that it spans the region [0,1]^2 in the two-dimensional plane. Such a region is an example of the first region in the present embodiment.
 Next, the learning unit 120 adds the number of noise elements determined in step S82 across an M-dimensional region, in this example a region of the two-dimensional plane, that is larger than the first region and includes the first region (step S84). FIG. 9C shows an example of the training data after the noise elements have been added; the noise elements, distributed in the two-dimensional plane, are indicated by circles with broken-line outlines. In this example, the noise elements are added so as to be distributed over the region [-0.5,1.5]^2. Such a region is an example of the second region in the present embodiment.
 As shown in FIG. 9C, as a result of step S84, a smaller number of noise elements than the data elements of the original training data are added so as to be distributed over a region wider than the distribution range of the original training data. Therefore, the distribution density of the noise elements is lower than that of the data elements of the original training data. The noise elements are also added so that, as a whole, they follow a uniform distribution within this region.
 Next, the learning unit 120 generates noise-added training data containing the elements, all of which are M-dimensional vectors, located within the second region, that is, both the data elements of the training data and the noise elements, all of which in this example are two-dimensional vectors (step S85).
 Finally, the learning unit 120 generates Isolation Forest learning dictionary data using the noise-added training data generated in step S85, and outputs this learning dictionary data (step S86).
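 As an illustration only, the following is a minimal sketch in Python of steps S81 to S86, assuming the training data is already normalized to [0,1]^M, the parameter is a ratio in (0,1), and scikit-learn's IsolationForest stands in for the learning dictionary generation; none of these library or parameter names appear in the embodiment itself.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def first_processing_method(train: np.ndarray, ratio: float = 0.1, seed: int = 0):
    """Sketch of the first processing method.

    train : (N, M) array of training vectors already normalized to [0, 1]^M.
    ratio : parameter in (0, 1) used to decide the noise-element count (S82).
    """
    rng = np.random.default_rng(seed)
    n, m = train.shape
    n_noise = int(round(n * ratio))                    # S82: fewer noise elements than data elements
    noise = rng.uniform(-0.5, 1.5, size=(n_noise, m))  # S84: uniform noise over the second region
    noise_added = np.vstack([train, noise])            # S85: noise-added training data
    model = IsolationForest(random_state=seed).fit(noise_added)  # S86: learning dictionary
    return model

# Usage with placeholder data normalized to [0, 1]^2.
train = np.random.default_rng(1).uniform(0.0, 1.0, size=(500, 2))
dictionary = first_processing_method(train)
```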
 Of the above steps, step S82 and step S84 are examples of the second noise addition step, step S85 of the generation step, and step S86 of the learning dictionary data output step in the present embodiment.
 In other words, the learning unit 120 does not use the normalized training data as-is, as has conventionally been done. Instead, the learning unit 120 generates the learning dictionary using data obtained by adding noise to a region of the M-dimensional space that includes the surroundings of the distribution range of the normalized training data.
 By generating a learning dictionary using such noise-added training data, even when the training data contains little abnormal data, the system avoids obtaining a learning dictionary in which a large amount of normal data lies outside the determination boundary, as shown in FIG. 7. As a result, the abnormality detection system 100 can perform abnormality detection with a reduced overdetection rate.
 In the above description of the first processing method, the number of noise elements, smaller than the number of data elements of the original training data, was determined using a parameter taking a real value greater than 0 and less than 1, but the method of determining the number of noise elements is not limited to this. For example, the number of noise elements may be the number of data elements of the training data minus a fixed number. Alternatively, the number of training data elements may be divided into several ranges, with a predetermined number of noise elements used for each range. Such a correspondence between the number of training data elements and the number of noise elements is, for example, included in a data table and stored in the memory of the information processing apparatus. A sketch of such a table lookup is shown below.
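 The following is a minimal sketch of that table-based variation in Python. The embodiment only states that such a table exists; the range boundaries and noise counts below are hypothetical placeholders.

```python
# Hypothetical lookup table mapping ranges of the training-data count N
# to a predetermined number of noise elements (the actual values are not
# specified in the embodiment and are placeholders here).
NOISE_COUNT_TABLE = [
    (0,      1_000,  50),
    (1_000,  10_000, 200),
    (10_000, None,   500),
]

def noise_count_from_table(n: int) -> int:
    """Return the predetermined noise-element count for the range containing n."""
    for low, high, count in NOISE_COUNT_TABLE:
        if n >= low and (high is None or n < high):
            return count
    raise ValueError("no range matches the training-data count")
```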
 The first processing method has been described taking as an example the case where the data elements of the training data are two-dimensional vectors, but the idea underlying the first processing method can be generalized to higher-dimensional spaces, and the first processing method can also be applied to training data consisting of vectors of three or more dimensions. If the training data consists of M-dimensional vectors, the range of the first region above is read as [0,1]^M and the range of the second region as [-0.5,1.5]^M. That is, the first region is a region of the M-dimensional space defined by a first hypercube, a hypercube in the M-dimensional space, and the second region is a region of the M-dimensional space defined by a second hypercube, a hypercube in the M-dimensional space that is larger than and contains the first hypercube.
 [Second processing method]
 FIG. 10 is a flowchart showing the second processing method, another example of a training data processing method for obtaining the appropriate learning dictionary described above.
 The second processing method is also executed by the learning dictionary generation unit 124 in the learning unit 120 after it has received input of Isolation Forest training data consisting of two or more M-dimensional vectors. In the following, however, processing by the learning dictionary generation unit 124 may be described as processing by the learning unit 120. The second processing method is also explained taking as an example the initial state of the training data shown in FIG. 9A. The explanation of steps common to the first processing method may be abbreviated.
 First, the learning unit 120 reads the parameters used for this processing (step S100). The details of the parameters are described in the following steps.
 Next, the learning unit 120 normalizes the input training data (step S101). The content of this step is the same as in the first processing method, and FIG. 9B shows an example of training data after normalization distributed on a two-dimensional plane. The distribution range of the training data, which before normalization was distributed as shown in FIG. 9A, has been transformed so that it spans the region [0,1]^2 in the two-dimensional plane. Such a region is an example of the first region in the present embodiment.
 Next, the learning unit 120 sets a second region, an M-dimensional region (in this example a region of the two-dimensional plane) that is larger than the first region and includes the first region, and divides the second region into third regions, which are M-dimensional hypercubes of equal size (step S102). FIG. 11A is a diagram for explaining the second region and the third regions in the two-dimensional plane. In the example shown in FIG. 11A, the second region is the region [-0.5,1.5]^2, and the third regions are the sub-regions obtained by dividing the second region into 64 parts.
 Here, a parameter acquired in step S100 is used to determine the number of third regions obtained by dividing the second region in step S102. In the example of FIG. 11A, the value of this parameter is 8, and the number of divisions is 8 to the power of M, that is, 8 squared = 64 in this example.
 Next, the learning unit 120 acquires the number S (S is an integer of 0 or more) of training data elements included in each of the third regions (step S103). At this point, there are no training data elements in the third regions outside the first region, so S = 0 for all such third regions.
 Next, the learning unit 120 determines a first threshold T (T is a natural number), which is a threshold relating to the number of training data elements in each third region (step S104). The first threshold T is determined, for example, using a parameter acquired in step S100. It may be the same as or different from the parameter used in step S102; if different, it may be calculated from the parameter used in step S102.
 To give a more specific example, the parameter used in step S104 may specify the number of training data elements included in one of the third regions within the first region. As a concrete example, it may indicate a specific rank when the numbers of training data elements included in the third regions are sorted by size. In this case, the number of training data elements included in the third region at that specific rank is used as the first threshold. The rank may be expressed, for example, as the position counted from the minimum or maximum value, or as the position counted upward or downward starting from the mean or median.
 From here, the learning unit 120 uses S and T as described above to judge whether noise elements need to be added to each third region, determine the number of noise elements to be added to each third region, and add the noise elements.
 First, the learning unit 120 checks whether there is a third region for which the need to add noise elements has not yet been judged (step S105). If there is (YES in step S105), it selects one such third region (step S106) and judges whether the number S of training data elements in that third region is smaller than the first threshold T (step S107).
 If the number S of training data elements in that third region is smaller than the first threshold T (YES in step S107), (T - S) noise elements are added so that the total number of data elements and noise elements in that third region becomes T (step S108).
 If the number S of training data elements in that third region is equal to or greater than the first threshold T (NO in step S107), the learning unit 120 checks whether there are further unprocessed third regions (step S105).
 When the processing from step S105 to step S107 or S108 has been performed for all third regions (NO in step S105), the learning unit 120 generates noise-added training data containing the data elements and noise elements located within the second region (step S109). FIG. 11B is a diagram for explaining an example of training data and noise elements distributed in the two-dimensional space when the result of step S105 is NO. In FIG. 11B as well, the noise elements are indicated by circles with broken-line outlines.
 The example of FIG. 11B is for the case where the first threshold T = 9. The third region at the lower left corner of the first region contained S = 6 training data elements, so T - S = 3 noise elements have been added. Another third region in the lower left part of the first region contained S = 8 training data elements, so T - S = 1 noise element has been added. No noise elements have been added to the other third regions within the first region, because S was 9 or more for all of them. The other, hatched third regions lie outside the first region and contain no training data elements, so 9 noise elements have been added to each of them. In each third region, the noise elements are placed as random values following a uniform distribution within that region.
 Finally, the learning unit 120 generates Isolation Forest learning dictionary data using the noise-added training data generated in step S109, and outputs this learning dictionary data (step S110).
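 As an illustration only, the following is a minimal sketch in Python of steps S102 to S110 under the same assumptions as the earlier sketch of the first processing method (input already normalized to [0,1]^M, scikit-learn's IsolationForest as a stand-in for the learning dictionary). The default values l = 8 and t = 9 follow the example in FIGS. 11A and 11B; the exhaustive loop over all cells is practical only for small M.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def second_processing_method(train: np.ndarray, l: int = 8, t: int = 9, seed: int = 0):
    """Sketch of the second processing method (steps S102 to S110).

    train : (N, M) array normalized to [0, 1]^M (first region).
    l     : divisions per dimension of the second region [-0.5, 1.5]^M (S102).
    t     : first threshold T; cells with fewer elements are topped up (S104-S108).
    """
    rng = np.random.default_rng(seed)
    n, m = train.shape
    cell = 2.0 / l                        # edge length of each third region
    noise_parts = []
    for idx in np.ndindex(*([l] * m)):    # iterate over all l**m third regions
        low = np.array(idx) * cell - 0.5
        high = low + cell
        inside = np.all((train >= low) & (train < high), axis=1)
        s = int(inside.sum())             # S103: element count of this cell
        if s < t:                         # S107/S108: add (T - S) uniform noise elements
            noise_parts.append(rng.uniform(low, high, size=(t - s, m)))
    noise = np.vstack(noise_parts) if noise_parts else np.empty((0, m))
    noise_added = np.vstack([train, noise])                       # S109
    return IsolationForest(random_state=seed).fit(noise_added)    # S110
```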
 Of the above steps, step S101 is an example of the normalization step, step S102 of the division step, steps S103 to S108 of the first noise addition step, step S109 of the generation step, and step S110 of the learning dictionary data output step in the present embodiment.
 In the second processing method as well, the learning unit 120 does not use the normalized training data as-is, as has conventionally been done. Instead, the learning unit 120 generates the learning dictionary using data obtained by adding noise to a region of the M-dimensional space that includes the surroundings of the distribution range of the normalized training data.
 By generating a learning dictionary using such noise-added training data, even when the training data contains little abnormal data, the system avoids obtaining a learning dictionary in which a large amount of normal data lies outside the determination boundary, as shown in FIG. 7. As a result, the abnormality detection system 100 can perform abnormality detection with a reduced overdetection rate.
 Furthermore, unlike the first processing method, in the second processing method the number of noise elements added within the first region, where the training data is distributed, is determined according to the density of each subdivided region. The second processing method therefore suppresses the overcrowding of data elements and noise elements that can occur within the first region under the first processing method. In Isolation Forest, locations where vector data is overcrowded in the training data tend to end up inside the determination boundary. Therefore, when data elements and noise elements are prone to overcrowding, the likelihood increases of the erroneous judgment that even abnormal data is determined to be normal. Erroneous detection arising from the erroneous judgment that abnormal data is normal is hereinafter also referred to as missed detection, in contrast to the overdetection described above. An abnormality detection system 100 that performs abnormality determination of unknown data based on a learning dictionary generated by executing the second processing method can perform abnormality detection that suppresses the occurrence of overdetection while also suppressing the possibility of missed detection.
 As with the first processing method, the idea underlying the second processing method can be generalized to higher-dimensional spaces, and the second processing method can also be applied to training data consisting of vectors of three or more dimensions. If the training data consists of M-dimensional vectors, the range of the first region above is read as [0,1]^M and the range of the second region as [-0.5,1.5]^M. That is, the first region is a region of the M-dimensional space defined by a first hypercube, a hypercube in the M-dimensional space, and the second region is a region of the M-dimensional space defined by a second hypercube, a hypercube in the M-dimensional space that is larger than and contains the first hypercube.
 [Effects]
 An actual example of the effect of adding noise to the training data by the second processing method described above is now presented.
 FIGS. 12A and 12B show the determination boundary of a learning dictionary generated using training data without added noise, and the determination boundary of a learning dictionary generated using the same training data with noise added by the above processing method. Training data 1 in FIG. 12A and training data 2 in FIG. 12B are different types of data acquired from the in-vehicle network of the same actual vehicle. Comparing training data 1 and training data 2, the data elements of training data 1 are distributed almost uniformly from the center of the distribution to its periphery, whereas the distribution of data elements in training data 2 becomes sparse at the periphery. It can also be said that training data 2 is more likely than training data 1 to contain outliers.
 In both FIGS. 12A and 12B, the circles indicate data elements of the training data. The solid outline is the determination boundary of the learning dictionary generated using training data without added noise, and the broken outline is the determination boundary of the learning dictionary generated using training data with added noise. The noise elements are not shown in either figure.
 As can be seen from these figures, inside the determination boundary of the learning dictionary obtained with added noise lie all of the training data inside the determination boundary of the learning dictionary obtained without added noise, as well as much of the training data outside it.
 Furthermore, to confirm whether the learning dictionary obtained with added noise is indeed more appropriate, the inventors conducted an abnormality detection test with each learning dictionary using test data. FIG. 12C shows the false detection rates in this abnormality detection test. For each training data set, the left bar is the false detection rate with the learning dictionary obtained without adding noise to the training data, and the right bar is the false detection rate with the learning dictionary obtained by adding noise to the training data.
 As can be seen from FIG. 12C, the false detection rate with the learning dictionary obtained by adding noise shows a substantial improvement over that of the learning dictionary obtained without adding noise. In other words, the learning dictionary obtained with added noise is the more appropriate one. This improvement is seen even for training data 2, which is more likely to contain outliers and for which the false detection rate was already somewhat low with the learning dictionary obtained without adding noise. For abnormality detection in a vehicle traveling at several tens of kilometers per hour or more, it is highly important that false detections, whether overdetections or missed detections, be kept low.
 On the other hand, it is not always easy to collect training data with sufficient variation, including, for example, abnormal data originating from abnormalities at the application layer, from a network conforming to a standard such as CAN. Training data resembling the abnormal data generated by unknown attack patterns is even harder to prepare. In other words, with the conventional use of such training data for generating a learning dictionary with Isolation Forest, it has been difficult to suppress the false detection rate in abnormality detection.
 By executing the processing method of the present embodiment, however, data elements that deviate to some extent from the original training data, which contains many normal data elements, are added to the data space in a smaller quantity and at a lower density than the original training data. These added data elements are referred to above as noise elements. An abnormality detection system using a learning dictionary generated from this training data can then perform abnormality detection with a false detection rate lower than before.
 (Embodiment 2)
 The first processing method and the second processing method described in Embodiment 1 differ in the algorithm of the program executed in the information processing apparatus to realize each, and they can be executed selectively, for example by switching the program read by a given processor.
 However, there are the following differences between the first processing method and the second processing method.
 First, as for the time required to add noise elements, the second processing method depends more strongly on the amount of training data than the first processing method, and it takes longer as the training data increases. In other words, the second processing method places a larger processing load on the processor.
 On the other hand, with regard to the detection accuracy of the generated learning dictionary (how low its false detection rate is), both methods improve on the conventional approach as described above, but the second processing method is superior.
 From the viewpoint of accuracy, it would be desirable for the abnormality detection system always to execute the second processing method. The difference in processing load described above is unlikely to be a problem in the abnormality detection system 100A of FIG. 1A or the abnormality detection system 100B of FIG. 1B, because sufficient computer resources can easily be provided there. In a configuration such as the abnormality detection system 100C of FIG. 1C, however, it is conceivable that computer resources, such as the computational speed of the processor, are limited. That is, in a traveling vehicle, the second processing method may not be able to generate or update the learning dictionary at the necessary speed.
 In addition to the difference in processing method, the parameters of each processing method also affect the time cost and accuracy of detection in the abnormality detection system.
 In the first processing method, the parameter used to determine the number of noise elements can take any real value greater than 0 and less than 1. However, it is difficult to predict in advance which value in this range will yield a learning dictionary better suited to abnormality detection. To find out, one can, for example, compare the accuracy of abnormality detection on test data across multiple learning dictionaries generated with different parameter values. Naturally, however, if such comparisons are made to search for the optimal parameter, it takes more time before the learning dictionary used for abnormality detection is decided. If the decision of the learning dictionary is delayed, either abnormality detection cannot be executed until the learning dictionary is decided, or it is executed using an old learning dictionary and accuracy suffers.
 In the second processing method, there are the parameter used to determine the number of third regions obtained by dividing the second region and the parameter used to determine the first threshold T. Of these two parameters, the former, denoted L, can take an integer value of 4 or more, assuming that each dimension is divided at least once within the first region (giving two or more third regions) and that there is at least one third region on each side outside the first region, so that at least four third regions are lined up in total. The latter, if it is a value used to identify one of the third regions in the second region, can take a real value of at least 1 and at most the number of third regions in the second region. The same considerations as for the first processing method apply to these parameters: a search may yield a learning dictionary capable of more accurate abnormality detection, but it takes more time before the learning dictionary used for abnormality detection is decided. Therefore, either the execution of abnormality detection is delayed or accuracy is sacrificed.
 Taking these points into account, the inventors conceived a technique for having the abnormality detection system make a quick decision on the choice of the training data processing method and on whether to execute a parameter search, so that the abnormality detection system can perform abnormality detection at the required speed and with the highest possible accuracy.
 Such an abnormality detection system is described below. Since the configuration of the abnormality detection system of the present embodiment may be the same as that of Embodiment 1, its description is omitted and it is referred to as the abnormality detection system 100; its operation is described below.
 [Operation]
 The overall processing in the abnormality detection system 100 for making a quick decision on the choice of the training data processing method and on whether to execute a parameter search is described below, and the processing for the parameter search is described within that explanation.
 FIG. 13 is a flowchart showing an example of the processing method, executed in the abnormality detection system 100, for deciding on the choice of the training data processing method and on whether to execute a parameter search for each processing method.
 This processing method includes steps executed by the learning dictionary generation unit 124 in the learning unit 120 after it has received input of Isolation Forest training data consisting of two or more M-dimensional vectors. In the following, however, processing by the learning dictionary generation unit 124 is described as processing by the learning unit 120. There are also steps executed by the components of the abnormality determination unit 110, which may be described below as processing by the abnormality determination unit 110.
 In the following description, it is also assumed that, in the initial state, the training data receiving unit 122 has already received input of the training data.
 First, the learning unit 120 acquires the number N of data elements of the training data (step S130).
 Next, the learning unit 120 determines whether N is equal to or greater than a predetermined second threshold (step S131). The second threshold is a threshold used to decide whether the first processing method or the second processing method is used as the training data processing method; it is determined, for example, according to the available computer resources, such as the computing capability of the processor that implements the learning unit 120, and is stored in the memory of the information processing apparatus. Using a predetermined threshold in this way allows a quick decision.
 If it is determined that N is equal to or greater than the second threshold, that is, if the number of training data elements is large, the learning unit 120 selects the first processing method, which can be completed in a shorter time (step S132).
 If it is determined that N is not equal to or greater than the second threshold, that is, if the number of training data elements is small, the learning unit 120 selects the second processing method, which yields a learning dictionary capable of more accurate abnormality detection (step S133).
 Next, the learning unit 120 determines whether N is equal to or greater than a predetermined third threshold (step S134). The third threshold is a threshold used to decide whether to execute a parameter search when executing each training data processing method. Like the second threshold, the third threshold is determined, for example, according to the available computer resources, such as the computing capability of the processor that implements the learning unit 120, and is stored in the memory of the information processing apparatus. It may be related to the second threshold, or the two may be independent values. Using a predetermined threshold in this way allows a quick decision.
 If it is determined that N is equal to or greater than the third threshold, that is, if the number of training data elements is large, the learning unit 120 decides not to execute a parameter search so that processing can be completed in a shorter time (step S135).
 If it is determined that N is not equal to or greater than the third threshold, that is, if the number of training data elements is small, the learning unit 120 executes a parameter search to obtain a learning dictionary capable of more accurate abnormality detection (step S136). A sketch of this selection logic is shown below.
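 As an illustration only, the following is a minimal sketch in Python of the decisions in steps S130 to S136. The threshold values are hypothetical placeholders; the embodiment only states that they are determined from the available computer resources.

```python
# Hypothetical threshold values; the embodiment only states that they are
# determined from the available computer resources.
SECOND_THRESHOLD = 10_000   # chooses between first and second processing methods
THIRD_THRESHOLD = 5_000     # chooses whether to run a parameter search

def choose_strategy(n: int) -> tuple[str, bool]:
    """Return (processing method, whether to run a parameter search) for a
    training data set with n elements, following steps S131 to S136."""
    method = "first" if n >= SECOND_THRESHOLD else "second"   # S131-S133
    search = n < THIRD_THRESHOLD                              # S134-S136
    return method, search

print(choose_strategy(20_000))  # ('first', False)
print(choose_strategy(3_000))   # ('second', True)
```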
 When learning dictionary data is generated and output (step S137) after passing through steps S132 and S135, the learning unit 120 executes the first processing method shown in the flowchart of FIG. 8.
 When learning dictionary data is generated and output (step S137) after passing through steps S133 and S135, the learning unit 120 executes the second processing method shown in the flowchart of FIG. 10.
 When learning dictionary data is generated and output (step S137) after passing through steps S132 and S136, the learning unit 120 executes the first processing method shown in the flowchart of FIG. 14. FIG. 14 is a flowchart of the first processing method including a parameter search, executed in the abnormality detection system 100. In the flowchart of FIG. 14, the steps common to the first processing method shown in the flowchart of FIG. 8 are denoted by the same reference symbols, and their detailed description is omitted.
 図14のフロー図に示される第一処理方法では、学習部120は、S82、S84~S86の工程のセットを、パラメータの値を入れ替えて複数回実行する。その結果として生成され出力される複数の学習辞書データは、異常判定部110の蓄積部112に保存される。また、学習部120からは、ステップS83で正規化に用いられたデータも異常判定部110に提供されて蓄積部112に保存される。 In the first processing method shown in the flowchart of FIG. 14, the learning unit 120 executes a set of steps S82 and S84 to S86 a plurality of times by exchanging parameter values. A plurality of learning dictionary data generated and output as a result are stored in the storage unit 112 of the abnormality determination unit 110. Further, from the learning unit 120, the data used for normalization in step S83 is also provided to the abnormality determination unit 110 and stored in the storage unit 112.
 The abnormality determination unit 110 has acquired test data for Isolation Forest. This test data is, for example, input to the abnormality determination unit 110 in advance and stored in the storage unit 112, and when it is determined in step S131 that N is not equal to or greater than the second threshold, the abnormality determination unit 110 reads and acquires this test data from the storage unit 112. The abnormality determination unit 110 then normalizes the test data using the data used for normalization in step S83, and executes abnormality determination on the test data using each set of learning dictionary data (step S140).
 Finally, the learning unit 120 evaluates the abnormality determinations made in step S140 with each set of learning dictionary data, and based on the evaluation results selects the best learning dictionary data as the learning dictionary data to be used for actual anomaly detection (step S141). Known evaluation measures such as recall and the F-measure can be used for this evaluation. Note that step S141 may instead be performed by the abnormality determination unit 110.
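 Steps S140 and S141 can be sketched as follows, again with scikit-learn's IsolationForest standing in for the learning dictionary. The assumption that the test data carries ground-truth labels of +1 for normal and -1 for anomalous follows scikit-learn's convention and is not prescribed by the embodiment; the function name is likewise a placeholder.

from sklearn.metrics import f1_score, recall_score


def select_best_dictionary(dictionaries, test_norm, test_labels):
    # dictionaries: list of (parameter value, fitted model) pairs.
    # test_norm: test data normalized with the same data as the training data.
    # test_labels: assumed ground truth, +1 for normal and -1 for anomalous.
    best = None
    best_score = -1.0
    for param, model in dictionaries:
        predicted = model.predict(test_norm)   # IsolationForest: +1 normal, -1 anomaly
        f_measure = f1_score(test_labels, predicted, pos_label=-1)
        recall = recall_score(test_labels, predicted, pos_label=-1)
        if f_measure > best_score:             # here the F-measure drives the selection
            best, best_score = (param, model, recall), f_measure
    return best, best_score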
 Of the above steps, steps S82 and S84 are examples in this embodiment of the second noise addition step, step S85 is an example of the generation step, and step S86 is an example of the learning dictionary data output step. Step S131 is an example in this embodiment of the first determination step, and step S134 is an example of the second determination step. Steps S140 and S141 are examples in this embodiment corresponding to the test data acquisition step, the evaluation step, and the learning dictionary data selection step.
 One of the differences from the case where the first processing method is executed via steps S132 and S135 lies in whether the set of steps S82 and S84 to S86 is executed only once or multiple times before the learning dictionary data used for anomaly detection is output. Another difference is that multiple sets of learning dictionary data are evaluated using the test data, and the best learning dictionary data is selected, based on the results of this evaluation, as the learning dictionary data used for anomaly detection.
 When generating and outputting learning dictionary data (step S137) via steps S133 and S136, the learning unit 120 executes the second processing method shown in the flowchart of FIG. 15. FIG. 15 is a flowchart of the second processing method including a parameter search, executed in the anomaly detection system 100. In the flowchart of FIG. 15, steps shared with the second processing method shown in the flowchart of FIG. 10 are denoted by the same reference signs, and their detailed description is omitted.
 In the second processing method shown in the flowchart of FIG. 15, the learning unit 120 executes the set of steps S102 to S110 multiple times, swapping in different combinations of values for two types of parameters. The multiple sets of learning dictionary data generated and output as a result are stored in the storage unit 112 of the abnormality determination unit 110. The learning unit 120 also provides the data used for normalization in step S101 to the abnormality determination unit 110, where it is stored in the storage unit 112.
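 The sweep over combinations of two parameter values can be pictured as a simple grid. Reading the two parameters as the division count L and the per-cell threshold T is an assumption made only for this sketch; the grid values and function names are placeholders, and one pass of steps S102 to S110 is represented by a caller-supplied function.

from itertools import product


def dictionary_candidates(build_dictionary, l_values=(4, 8), t_values=(2, 5)):
    # build_dictionary(l_value, t_value) stands in for one pass of steps
    # S102 to S110 and returns one set of learning dictionary data.
    # One candidate dictionary is produced per (L, T) combination.
    return [((l, t), build_dictionary(l, t)) for l, t in product(l_values, t_values)]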
 The contents of steps S150 and S151 are the same as those of steps S140 and S141, respectively.
 Of the above steps, step S102 is an example in this embodiment of the division step, steps S103 to S108 are examples of the first noise addition step, step S109 is an example of the generation step, and step S110 is an example of the learning dictionary data output step. Step S131 is an example in this embodiment of the first determination step, and step S134 is an example of the second determination step. Steps S150 and S151 are examples in this embodiment corresponding to the test data acquisition step, the evaluation step, and the learning dictionary data selection step.
 One of the differences from the case where the second processing method is executed via steps S133 and S135 lies in whether the set of steps S102 to S110 is executed only once or multiple times before the learning dictionary data used for anomaly detection is output. Another difference is that multiple sets of learning dictionary data are evaluated using the test data, and the best learning dictionary data is selected, based on the results of this evaluation, as the learning dictionary data used for anomaly detection.
 As described above, in the flowchart shown in FIG. 13 there are two processing methods for adding noise, and for each processing method there are two cases depending on whether a parameter search is executed. In other words, there are four processing patterns by which the learning dictionary data used for anomaly detection is determined and anomaly detection becomes executable. Of these processing patterns, the one with the largest time cost is the pattern in which the second processing method is executed together with a parameter search. The next largest time cost arises when the first processing method is executed together with a parameter search. The time costs of the remaining two patterns are much smaller than those of these two. Although the second threshold and the third threshold may be independent values as stated above, they may also be determined in consideration of this ordering of time costs.
 The threshold used in step S134 may also be switched according to the result of the determination in step S131, that is, according to whether the first processing method or the second processing method is used for adding noise. For example, the third threshold may be used when the second processing method is used, and a fourth threshold, which is another predetermined threshold, may be used instead of the third threshold when the first processing method is used. Step S134 in the case where the fourth threshold is used in this way is an example of the third determination step in this embodiment.
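 A minimal sketch of this variant, with the function name and the concrete threshold values chosen here only for illustration:

def threshold_for_step_s134(uses_second_processing_method, third_threshold=50_000, fourth_threshold=200_000):
    # The threshold compared against N in step S134 depends on which
    # noise-addition method was selected in step S131.
    return third_threshold if uses_second_processing_method else fourth_threshold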
 In the flowchart of FIG. 13, two determinations are made: the choice of the noise addition processing method and the decision whether to execute a parameter search for each processing method. However, both are not essential for adjusting the time cost; the time cost may be adjusted with only one of these determinations.
 In the flowchart of FIG. 13, the only options prepared for the parameter search are to execute it or not to execute it, but the number of parameter values swapped in for the search may instead be changed in stages, for example according to the number of data elements in the training data. That is, the larger the number of data elements in the training data, the smaller the number of swapped-in parameter values may be made. In this case, the number of parameter values may be a value calculated from the number of data elements, or a value determined in advance for each predetermined range of the number of data elements. In short, it suffices that there is a negative correlation between the number of data elements in the training data and the number of parameter values. As a result, when the training data contains many data elements, the increase in processing load is suppressed so that the time required to determine the learning dictionary data does not become excessively long.
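 One possible staged schedule is sketched below; the breakpoints and counts are assumptions chosen only to illustrate the negative correlation described above.

def num_parameter_candidates(n_elements):
    # Number of parameter values swapped in during the search, decreasing
    # in stages as the training data grows.
    if n_elements < 5_000:
        return 8
    if n_elements < 20_000:
        return 4
    if n_elements < 100_000:
        return 2
    return 1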
 In the flowchart of FIG. 13, whether the first processing method or the second processing method is executed to process the training data is selected according to the result of comparing the number N of data elements in the training data with the second threshold, but the selection is not limited to this. For example, there may additionally be an option of not processing the training data at all. Such a decision may be made, for example, when the load placed on the processor of the information processing device by other processing is heavy, in which case the current learning dictionary continues to be used for anomaly detection and the generation of a new learning dictionary for updating is postponed. The options may also be two: executing either the first processing method or the second processing method, and not processing the training data.
 (Other embodiments)
 As described above, Embodiments 1 and 2 have been presented as examples of the technology according to the present disclosure. However, the technology according to the present disclosure is not limited to these, and is also applicable to embodiments obtained by making changes, substitutions, additions, omissions, and the like as appropriate. For example, the following modifications are also included in an embodiment of the present disclosure.
 Some or all of the constituent elements of each device in the above embodiments may be implemented as a single system LSI (Large Scale Integration) circuit. A system LSI is a super-multifunctional LSI manufactured by integrating multiple components on a single chip, and is specifically a computer system including a microprocessor, ROM, RAM, and so on. A computer program is recorded in the RAM, and the system LSI achieves its functions by the microprocessor operating according to this computer program. The constituent elements of each device may each be implemented as an individual chip, or some or all of them may be integrated into a single chip. Although the term system LSI is used here, the circuit may also be called an IC, an LSI, a super LSI, or an ultra LSI depending on the degree of integration. The method of circuit integration is not limited to LSI; implementation with a dedicated circuit or a general-purpose processor is also possible. An FPGA (Field Programmable Gate Array) that can be programmed after the LSI is manufactured, or a reconfigurable processor in which the connections and settings of circuit cells inside the LSI can be reconfigured, may also be used. Furthermore, if circuit integration technology that replaces LSI emerges through progress in semiconductor technology or another derived technology, the functional blocks may naturally be integrated using that technology. Application of biotechnology or the like is also a possibility.
 Some or all of the constituent elements of each of the above devices may be implemented as an IC card attachable to and detachable from each device, or as a stand-alone module. The IC card or module is a computer system including a microprocessor, ROM, RAM, and so on, and may include the super-multifunctional LSI described above. The IC card or module achieves its functions by the microprocessor operating according to a computer program. The IC card or module may be tamper-resistant.
 In the above embodiments, each constituent element may be configured with dedicated hardware, or may be realized by executing a software program suitable for that constituent element. Each constituent element may be realized by a program execution unit such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or semiconductor memory. The software that realizes the information processing device and the like of the above embodiments is the following program.
 That is, this program causes a computer to execute an information processing method including: a data element acquisition step of receiving input of N data elements (N is an integer of 2 or more) that are M-dimensional vectors (M is an integer of 2 or more) used as training data for Isolation Forest; a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and encompasses the first region, into L^M third regions (L is an integer of 4 or more) that are M-dimensional hypercubes of equal size; a first noise addition step of obtaining a number S (S is an integer of 0 or more) of the data elements contained in each of the third regions and adding, in a uniform distribution, (T - S) noise elements that are M-dimensional vectors to each third region containing fewer of the data elements than a first threshold T (T is a natural number); a generation step of generating noise-added training data including the data elements and the noise elements; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
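 As a concrete, non-authoritative reading of this method, the following Python sketch walks through the steps using numpy and scikit-learn. The fitted IsolationForest model stands in for the learning dictionary data, MinMaxScaler stands in for the normalization data, the default values of L and T are placeholders, the [-0.5, 1.5]^M second region follows the example in claim 9, and enumerating all L^M cells as done here is only practical for small M; none of these choices are prescribed by the disclosure.

from itertools import product

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import MinMaxScaler


def build_learning_dictionary(train, l_value=4, t_value=3, seed=0):
    rng = np.random.default_rng(seed)

    # Normalization step: distribute the training data over the first region [0, 1]^M.
    scaler = MinMaxScaler().fit(train)
    train_norm = scaler.transform(train)
    n, m = train_norm.shape

    # Division step: split the second region [-0.5, 1.5]^M into L^M hypercubes
    # of equal side length.
    low, high = -0.5, 1.5
    cell_size = (high - low) / l_value

    # Count the number S of data elements contained in each cell.
    cell_index = np.floor((train_norm - low) / cell_size).astype(int)
    cell_index = np.clip(cell_index, 0, l_value - 1)
    counts = {}
    for idx in map(tuple, cell_index):
        counts[idx] = counts.get(idx, 0) + 1

    # First noise addition step: every cell with S < T receives (T - S)
    # uniformly distributed noise elements.
    noise_blocks = []
    for cell in product(range(l_value), repeat=m):
        s = counts.get(cell, 0)
        if s < t_value:
            lo = low + np.array(cell) * cell_size
            noise_blocks.append(rng.uniform(lo, lo + cell_size, size=(t_value - s, m)))

    # Generation step: the data elements plus the noise elements.
    noisy_train = np.vstack([train_norm] + noise_blocks) if noise_blocks else train_norm

    # Learning dictionary data output step: the fitted model stands in for
    # the learning dictionary data.
    model = IsolationForest(n_estimators=100, random_state=seed).fit(noisy_train)
    return scaler, model

 Used together with the parameter sweep and evaluation sketches shown earlier, such a function would be called once per candidate parameter combination, and the scaler would also be handed to the determination side so that judgment target data and test data can be normalized with the same data as the training data.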
 The present disclosure can also be realized as an information processing device that generates learning dictionary data using training data and provides this learning dictionary data to an abnormality determination device that executes abnormality determination, as described in the above embodiments. It can further be realized as an anomaly detection system including this information processing device and the abnormality determination device. The abnormality determination device is, for example, a monitoring ECU that realizes the abnormality determination unit connected to the in-vehicle network 210 in an anomaly detection system configured as shown in FIG. 1A or FIG. 1C, and is the external server 10 that realizes the abnormality determination unit in an anomaly detection system configured as shown in FIG. 1B. In either case, it includes a memory storing the learning dictionary data output from the information processing device and a processor, and is connected to a network. This network is typically the in-vehicle CAN network described above, but is not limited thereto.
 For example, the network may be a network such as CAN-FD (CAN with Flexible Data rate), FlexRay, Ethernet, LIN (Local Interconnect Network), or MOST (Media Oriented Systems Transport). Alternatively, it may be an in-vehicle network in which these networks are combined, as sub-networks, with a CAN network.
 In the above embodiments, each constituent element may be a circuit. Multiple constituent elements may constitute a single circuit as a whole, or may each constitute a separate circuit. Each circuit may be a general-purpose circuit or a dedicated circuit.
 The information processing device and other aspects according to one or more aspects have been described above based on the embodiments, but the present disclosure is not limited to these embodiments. Forms obtained by applying various modifications conceived by those skilled in the art to the embodiments, and forms constructed by combining constituent elements of different embodiments, may also be included within the scope of one or more aspects, as long as they do not depart from the gist of the present disclosure.
 For example, in the above embodiments, a process executed by a specific constituent element may be executed by another constituent element instead. The order of multiple processes may be changed, and multiple processes may be executed in parallel.
 The present disclosure is applicable to an in-vehicle network system including an in-vehicle network.
 DESCRIPTION OF REFERENCE SIGNS
 10 external server
 20 vehicle
 100, 100A, 100B, 100C anomaly detection system
 110 abnormality determination unit
 112 storage unit
 114 determination target data reception unit
 116 determination target data conversion unit
 118 determination execution unit
 120 learning unit
 122 training data reception unit
 124 learning dictionary generation unit
 210 in-vehicle network

Claims (13)

  1.  An information processing device comprising a processor, wherein the processor executes:
     a data element acquisition step of receiving input of N data elements (N is an integer of 2 or more) that are M-dimensional vectors (M is an integer of 2 or more) used as training data for Isolation Forest;
     a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region;
     a division step of dividing an M-dimensional second region, which is larger than the first region and encompasses the first region, into L^M third regions (L is an integer of 4 or more) that are M-dimensional hypercubes of equal size;
     a first noise addition step of obtaining a number S (S is an integer of 0 or more) of the data elements contained in each of the third regions and adding, in a uniform distribution, (T - S) noise elements that are M-dimensional vectors to each third region containing fewer of the data elements than a first threshold T (T is a natural number);
     a generation step of generating noise-added training data including the data elements and the noise elements; and
     a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  2.  The information processing device according to claim 1, wherein the processor
     executes a first determination step of determining whether N is equal to or greater than a predetermined second threshold, and,
     when it is determined in the first determination step that N is not equal to or greater than the second threshold, executes the division step and the first noise addition step before executing the generation step and the learning dictionary data output step.
  3.  The information processing device according to claim 2, wherein,
     when it is determined in the first determination step that N is equal to or greater than the second threshold, the processor executes a second noise addition step of adding, in a uniform distribution within the second region, K noise elements (K is a natural number smaller than N) that are M-dimensional vectors, before executing the generation step and the learning dictionary data output step.
  4.  The information processing device according to any one of claims 1 to 3, wherein the processor further
     executes, when it is determined in the first determination step that N is not equal to or greater than the second threshold, a test data acquisition step of receiving input of test data for Isolation Forest and a second determination step of determining whether N is equal to or greater than a predetermined third threshold,
     when it is determined in the second determination step that N is not equal to or greater than the third threshold, executes the set of the division step, the first noise addition step, the generation step, and the learning dictionary data output step multiple times using different values of L in the division step to output a plurality of sets of the learning dictionary data, and further executes an evaluation step of executing anomaly detection on the test data using each of the plurality of sets of learning dictionary data and evaluating each of the plurality of sets of learning dictionary data based on results of the anomaly detection, and a learning dictionary data selection step of selecting the best learning dictionary data from the plurality of sets of learning dictionary data based on a result of the evaluation step, and
     when it is determined in the second determination step that N is equal to or greater than the third threshold, executes the set once using a predetermined value of L in the division step.
  5.  The information processing device according to claim 4, wherein, when it is determined in the second determination step that N is not equal to or greater than the third threshold, the processor determines the number of the different values of L so as to have a negative correlation with the value of N.
  6.  The information processing device according to any one of claims 1 to 5, wherein, in the first noise addition step, the processor determines, as the value of the first threshold T, a number smaller than the median of the numbers of the data elements contained in the respective third regions located within the first region.
  7.  The information processing device according to any one of claims 1 to 3, wherein the processor
     executes, when it is determined in the first determination step that N is equal to or greater than the second threshold, a test data acquisition step of receiving input of test data for Isolation Forest and a third determination step of determining whether N is equal to or greater than a predetermined fourth threshold,
     when it is determined in the third determination step that N is not equal to or greater than the fourth threshold, executes the set of the second noise addition step, the generation step, and the learning dictionary data output step multiple times using different values of K in the second noise addition step to output a plurality of sets of the learning dictionary data, and further executes an evaluation step of executing anomaly detection on the test data using each of the plurality of sets of learning dictionary data and evaluating each of the plurality of sets of learning dictionary data, and a learning dictionary data selection step of selecting the best learning dictionary data from the plurality of sets of learning dictionary data based on a result of the evaluation step, and
     when it is determined in the third determination step that N is equal to or greater than the fourth threshold, executes the set once using a predetermined value of K in the second noise addition step.
  8.  The information processing device according to claim 7, wherein, when it is determined in the third determination step that N is not equal to or greater than the fourth threshold, the processor determines the number of the different values of K so as to have a negative correlation with the value of N.
  9.  The information processing device according to any one of claims 1 to 8, wherein, when the first region is a region defined by a [0, 1]^M hypercube in an M-dimensional space, the second region is a region defined by a [-0.5, 1.5]^M hypercube in the space.
  10.  An anomaly detection system comprising:
     the information processing device according to any one of claims 1 to 9; and
     an abnormality determination device that includes a memory storing learning dictionary data output from the information processing device and a processor, and that is connected to a network, wherein the processor acquires data flowing through the network and executes abnormality determination on the acquired data based on the learning dictionary data stored in the memory.
  11.  The anomaly detection system according to claim 10, wherein the network is an in-vehicle Controller Area Network.
  12.  An information processing method executed using an information processing device comprising a processor, the method comprising causing the processor to execute:
     a data element acquisition step of receiving input of N data elements (N is an integer of 2 or more) that are M-dimensional vectors (M is an integer of 2 or more) used as training data for Isolation Forest;
     a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region;
     a division step of dividing an M-dimensional second region, which is larger than the first region and encompasses the first region, into L^M third regions (L is an integer of 4 or more) that are M-dimensional hypercubes of equal size;
     a first noise addition step of obtaining a number S (S is an integer of 0 or more) of the data elements contained in each of the third regions and adding, in a uniform distribution, (T - S) noise elements that are M-dimensional vectors to each third region containing fewer of the data elements than a first threshold T (T is a natural number);
     a generation step of generating noise-added training data including the data elements and the noise elements; and
     a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  13.  A program that causes a processor included in a computer to execute the information processing method according to claim 12.
PCT/JP2017/040727 2016-12-06 2017-11-13 Information processing device, information processing method, and program WO2018105320A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201780022736.0A CN109074519B (en) 2016-12-06 2017-11-13 Information processing apparatus, information processing method, and program
EP17877549.0A EP3553712B1 (en) 2016-12-06 2017-11-13 Information processing device, information processing method, and program
US16/255,877 US10601852B2 (en) 2016-12-06 2019-01-24 Information processing device, information processing method, and recording medium storing program

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201662430570P 2016-12-06 2016-12-06
US62/430570 2016-12-06
JP2017-207085 2017-10-26
JP2017207085A JP6782679B2 (en) 2016-12-06 2017-10-26 Information processing equipment, information processing methods and programs

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/255,877 Continuation US10601852B2 (en) 2016-12-06 2019-01-24 Information processing device, information processing method, and recording medium storing program

Publications (1)

Publication Number Publication Date
WO2018105320A1 true WO2018105320A1 (en) 2018-06-14

Family

ID=62491996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/040727 WO2018105320A1 (en) 2016-12-06 2017-11-13 Information processing device, information processing method, and program

Country Status (1)

Country Link
WO (1) WO2018105320A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013084175A (en) * 2011-10-12 2013-05-09 Sony Corp Information processing apparatus, information processing method, and program
US20150078654A1 (en) * 2013-09-13 2015-03-19 Interra Systems, Inc. Visual Descriptors Based Video Quality Assessment Using Outlier Model
JP2016133895A (en) * 2015-01-16 2016-07-25 キヤノン株式会社 Information processing device, information processing method, and program

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAGA, TOMOYUKI ET AL.: "Proposal of statistical abnormality detecting system for vehicle-mounted network using Cloud", SCIS2016 SYMPOSIUM ON CRYPTOGRAPHY AND INFORMATION SECURITY, 19 January 2016 (2016-01-19), pages 1 - 8, XP009515155 *
LIU, FEI TONY ET AL.: "Isolation-Based Anomaly Detection", ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA (TKDD), vol. 6, no. 1, 1 March 2012 (2012-03-01), pages 1 - 39, XP055492079 *
See also references of EP3553712A4 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109345137A (en) * 2018-10-22 2019-02-15 广东精点数据科技股份有限公司 A kind of rejecting outliers method based on agriculture big data
CN109508738A (en) * 2018-10-31 2019-03-22 北京国双科技有限公司 A kind of information processing method and relevant device
CN109948738A (en) * 2019-04-11 2019-06-28 合肥工业大学 Energy consumption method for detecting abnormality, the apparatus and system of coating drying room
CN110243599A (en) * 2019-07-02 2019-09-17 西南交通大学 Multidimensional peels off train EMU axle box bearing temperature anomaly state monitoring method
CN114019940A (en) * 2020-03-02 2022-02-08 阿波罗智联(北京)科技有限公司 Method and apparatus for detecting anomalies
CN114035544A (en) * 2020-03-02 2022-02-11 阿波罗智联(北京)科技有限公司 Method and apparatus for detecting anomalies
WO2023127111A1 (en) * 2021-12-28 2023-07-06 富士通株式会社 Generation method, generation program, and information processing device

Similar Documents

Publication Publication Date Title
JP6782679B2 (en) Information processing equipment, information processing methods and programs
WO2018105320A1 (en) Information processing device, information processing method, and program
US10437992B2 (en) Anomaly detection for vehicular networks for intrusion and malfunction detection
CN107005790B (en) Collaborative security in wireless sensor networks
JP2023068037A (en) Vehicle abnormality detection server, vehicle abnormality detection system, and vehicle abnormality detection method
CA2995864A1 (en) Multi-modal, multi-disciplinary feature discovery to detect cyber threats in electric power grid
US11595431B2 (en) Information processing apparatus, moving apparatus, and method
US20210273966A1 (en) Anomaly detection method and anomaly detection device
EP4075726A1 (en) Unified multi-agent system for abnormality detection and isolation
US11856006B2 (en) Abnormal communication detection apparatus, abnormal communication detection method and program
CN113328985B (en) Passive Internet of things equipment identification method, system, medium and equipment
Zhang et al. Dual generative adversarial networks based unknown encryption ransomware attack detection
WO2020208639A2 (en) A system and method for detection of anomalous controller area network (can) messages
JP6939898B2 (en) Bit assignment estimation device, bit assignment estimation method, program
US11972334B2 (en) Method and apparatus for generating a combined isolation forest model for detecting anomalies in data
US11227051B2 (en) Method for detecting computer virus, computing device, and storage medium
JP2021179935A (en) Vehicular abnormality detection device and vehicular abnormality detection method
JP2015526826A (en) System and method for state-based test case generation for software verification
US20230007034A1 (en) Attack analyzer, attack analysis method and attack analysis program
Kleberger et al. Towards designing secure in-vehicle network architectures using community detection algorithms
US9218486B2 (en) System and method for operating point and box enumeration for interval bayesian detection
JP2022153081A (en) Attack analysis device, attack analysis method, and attack analysis program
JP6979630B2 (en) Monitoring equipment, monitoring methods and programs
US20240202336A1 (en) Method and system for incremental centroid clustering
Wang et al. An Intrusion Detection System Based on the Double-Decision-Tree Method for In-Vehicle Network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17877549

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017877549

Country of ref document: EP

Effective date: 20190708