WO2023112333A1 - Estimation device, estimation method, and estimation program - Google Patents
Estimation device, estimation method, and estimation program
- Publication number
- WO2023112333A1, PCT/JP2021/046840
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- packet data
- abnormal
- byte
- normal
- data
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/02—Capturing of monitoring data
- H04L43/028—Capturing of monitoring data by filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
Definitions
- the present invention relates to an estimation device, an estimation method, and an estimation program.
- OT-IDS: Operational Technology Intrusion Detection System
- Unexpected operations, such as a temperature setting value changed by one digit through unauthorized rewriting, may cause a serious accident. It is therefore desirable to detect, without omission, unauthorized rewriting of even 1 byte of the payload, which corresponds to the contents of communication. Precise analysis of payload contents is thus essential for anomaly detection systems for industrial and building network control systems.
- The estimation device includes an extraction unit that extracts, based on a natural language processing model, a predetermined number of similar normal packet data having a relatively high degree of similarity with abnormal packet data from among a plurality of normal packet data.
- The estimation device also includes an estimation unit that extracts, from the similar normal packet data extracted by the extraction unit, same-length packet data having the same packet length as the abnormal packet data, and estimates an abnormal byte location by comparing the abnormal packet data with the same-length packet data byte by byte.
- FIG. 1 is a block diagram of an information processing device according to an embodiment.
- FIG. 2 is a block diagram showing the details of the question generator.
- FIG. 3 is a block diagram of a machine learning device that learns a question generation model.
- FIG. 4 is a diagram showing an example of question-answer learning data.
- FIG. 5 is an image diagram of learning data for learning the question generation model.
- FIG. 6 is a diagram illustrating an example of question sentence creation by the information processing apparatus according to the embodiment.
- FIG. 7 is a flowchart of question generation processing by the information processing apparatus according to the embodiment.
- FIG. 8 is a flowchart of machine learning processing by the machine learning device according to the embodiment.
- FIG. 9 is a diagram showing experimental results using the information processing apparatus according to the embodiment.
- FIG. 10 is a diagram illustrating an example of a computer that executes an information processing program.
- An estimation device 1 according to an embodiment of the present invention will be described with reference to FIG.
- the estimating device 1 estimates and outputs an abnormal byte in the abnormal packet.
- The estimation device 1 compares an abnormal packet determined to be abnormal by another system with normal packets determined to be normal by another system, and estimates an abnormal byte, an inserted byte location, or a deleted byte location in the input abnormal packet.
- normal packets and abnormal packets are each collected in a communication network of one operation technology.
- the other system may use any method to determine whether the packet is normal or abnormal, and the determination method does not matter in the embodiment of the present invention.
- The estimation device 1 holds the following data: model data 11, normal vector data group 12, normal packet data group 13, abnormal packet data 15, abnormal vector data 16, similar normal packet data group 17, abnormal bytes 18, and insertion/deletion byte locations 19.
- The estimation device 1 also includes a conversion unit 21, a generation unit 22, an extraction unit 23, and an estimation unit 24.
- the model data 11 specifies a model for converting packet data into vector data.
- the vector data associates each byte of the packet data with each vector representing the characteristics of the value of each byte.
- the model data 11 is generated by learning the value of each byte of a plurality of normal packet data of the normal vector data group 12 by the generation unit 22 which will be described later.
- the characteristics of each byte value are calculated by comparing with each byte value of a plurality of normal packet data.
- the model data 11 specifies a model that converts each byte of the input packet data into an appropriate fixed-length vector in consideration of the positional relationship of each byte.
- An appropriate fixed-length vector means a vector that allows the estimation unit 24, described later, to detect the presence of an abnormal byte location by comparing the abnormal vector data 16 with normal vector data. For example, as shown in FIG. 2, suppose there is fixed-length packet data whose first byte value is "2e", whose second byte value is "3f", whose third byte value is "00", and so on. The model converts each byte of this packet data into a 784-dimensional vector that characterizes the value of that byte.
- the model data 11 is generated by BERT, for example.
- BERT is a natural language processing model.
- each byte of packet data is considered a word.
- a model generated using BERT converts the packet data into vector data.
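The idea of treating each byte as a word can be illustrated with a minimal sketch. It only shows how a packet might be rendered as a "sentence" of byte tokens before being fed to a BERT-style model; the two-character hexadecimal token format and the function names `packet_to_tokens` and `packet_to_sentence` are assumptions made for illustration and are not taken from the source.

```python
def packet_to_tokens(packet: bytes) -> list[str]:
    """Render each byte as a two-character lowercase hex token (assumed format)."""
    return [f"{value:02x}" for value in packet]


def packet_to_sentence(packet: bytes) -> str:
    """Join the byte tokens with spaces so the packet reads like a sentence of words."""
    return " ".join(packet_to_tokens(packet))


# The byte values from the FIG. 2 example: "2e", "3f", "00".
example_packet = bytes([0x2E, 0x3F, 0x00])
example_tokens = packet_to_tokens(example_packet)
example_sentence = packet_to_sentence(example_packet)
```

With this representation the vocabulary has at most 256 byte tokens plus whatever special tokens the model requires.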
- the normal packet data group 13 includes data of multiple packets identified as normal packets in other systems.
- The normal packet data group 13 may be the normal packet data group used for BERT learning, or may be a packet data group most recently determined to be normal by the estimation device 1.
- the normal vector data group 12 includes multiple normal vector data.
- Normal vector data is data obtained by converting normal packet data included in the normal packet data group 13 using a model specified by the model data 11 .
- the normal vector data group 12 is referred to when the generator 22 generates the model data 11 or when the extractor 23 extracts a similar normal vector data group similar to the abnormal vector data 16 .
- Both the generator 22 and the extractor 23 may refer to a plurality of normal vector data included in the normal vector data group 12 .
- a plurality of normal vector data included in the normal vector data group 12 may be divided into a plurality of groups, and the generation unit 22 may refer to one group and the extraction unit 23 may refer to another group.
- the abnormal packet data 15 is data of packets identified as abnormal packets in other systems.
- the estimation device 1 estimates an abnormal byte 18 and an insertion/deletion byte location 19 for one piece of abnormal packet data 15 .
- the abnormal vector data 16 is data obtained by converting the abnormal packet data 15 using a model specified by the model data 11 .
- the abnormal vector data 16 associates the identifier of the position of each byte of the abnormal packet data 15 with each vector representing the characteristics of the value of each byte.
- the similar-normal packet data group 17 is a set of normal packet data before conversion of the similar-normal vector data group.
- the similar normal vector data group is a set of data having a relatively high degree of similarity with the abnormal vector data 16 among the plurality of normal vector data included in the normal vector data group 12 .
- The similar normal vector data group is a set containing a predetermined number of normal vector data, taken in descending order of similarity starting from the normal vector data with the highest similarity to the abnormal vector data 16, among the plurality of normal vector data included in the normal vector data group 12.
- the predetermined number can be 100, for example.
- the group of similar normal vector data may be a set of a predetermined number of normal vector data among normal vector data whose similarity is higher than a predetermined threshold.
- the similar-normal packet data group 17 includes the same number of normal packet data as the predetermined number of normal vector data included in the similar-normal vector data group. That is, the similar normal packet data group 17 includes a predetermined number of normal packet data.
- the abnormal byte 18 is data specifying a byte that is presumed to be abnormal among the bytes of the abnormal packet data 15 .
- the abnormal byte 18 is specified, for example, by comparing each byte of the abnormal packet data 15 and normal packet data having the same length as the abnormal packet data 15 included in the similar normal packet data group 17 one by one.
- the inserted/deleted byte location 19 is an inserted byte location suspected of inserting an extra byte in the abnormal packet data 15 or a deleted byte location suspected of deleting a normal byte.
- the insertion/deletion byte location 19 is estimated, for example, by calculating the Edit Distance between the abnormal packet data 15 and similar normal packet data included in the similar normal packet data group 17 .
- the conversion unit 21 converts the abnormal packet data 15 into abnormal vector data 16 using the model specified by the model data 11 . For example, as shown in FIG. 2, the conversion unit 21 converts each byte value of the abnormal packet data 15 into a 784-dimensional vector. The conversion unit 21 associates the position of each byte of the abnormal packet data 15 with the 784-dimensional vector converted from that byte, and outputs the abnormal vector data 16 .
- the generation unit 22 learns the values of each byte of multiple normal packet data in the normal vector data group 12 and generates a model specified by the model data 11 .
- the model converts the packet data into vector data that associates each byte of the packet data with each vector that characterizes the value of each byte.
- the generation unit 22 generates a model according to BERT, for example.
- the generation unit 22 may preliminarily learn the characteristics of each byte value in normal packet data by solving auxiliary tasks such as MLM (Masked Language Model) or NSP (Next Sentence Prediction).
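As a rough illustration of the MLM auxiliary task applied to byte tokens, the sketch below hides a fraction of the tokens of one packet and records the original values as prediction targets. The 15% mask ratio, the `[MASK]` token string, and the function name `mask_tokens` are assumptions borrowed from common BERT practice rather than from the source, and the usual random-replacement variants of BERT masking are omitted for brevity.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed special token, as used in common BERT vocabularies


def mask_tokens(tokens: list[str], mask_ratio: float = 0.15, seed: int = 0):
    """Return (masked_tokens, labels), where labels maps masked position -> original token."""
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one token, and never more tokens than the packet contains.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]   # target the model should predict
        masked[pos] = MASK_TOKEN    # input the model actually sees
    return masked, labels


tokens = ["2e", "3f", "00", "01", "a0", "ff", "10", "11", "12", "13", "14", "15"]
masked, labels = mask_tokens(tokens)
```

A model trained to recover the hidden bytes from their neighbors has to learn which byte values are plausible at each position of a normal packet.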
- the generation unit 22 uses these auxiliary tasks to identify the validity of data within a packet and the validity of consecutive packets, and the generation unit 22 generates a model that identifies normal vector data.
- the auxiliary tasks listed here are only examples, and the generator 22 may learn by solving other auxiliary tasks.
- the extraction unit 23 extracts a predetermined number of normal vector data having a relatively high degree of similarity with the abnormal vector data 16 from a plurality of normal vector data in the normal vector data group 12 .
- the extraction unit 23 treats the extracted predetermined number of normal vector data as a group of similar normal vector data.
- a relatively high degree of similarity means that the degree of similarity between the abnormal vector data 16 and certain normal vector data is higher than the degree of similarity between the abnormal vector data 16 and other normal vector data.
- the extraction unit 23 may extract a predetermined number of normal vector data in descending order of similarity from normal vector data with the highest similarity to the abnormal vector data 16 .
- the predetermined number can be 100, for example.
- the extraction unit 23 may extract a predetermined number of normal vector data from normal vector data whose similarity is higher than a threshold.
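The two selection rules described above (top entries in descending order of similarity, optionally restricted to entries above a threshold) can be sketched as follows. The function name `select_similar` and its parameters are hypothetical, and the similarity scores are assumed to have been computed beforehand.

```python
import heapq
from typing import Optional


def select_similar(scores: list[float], k: int = 100,
                   threshold: Optional[float] = None) -> list[int]:
    """Return indices of up to k highest-scoring entries, best first.

    If threshold is given, only entries whose similarity is higher than it qualify.
    """
    candidates = [
        (index, score)
        for index, score in enumerate(scores)
        if threshold is None or score > threshold
    ]
    top = heapq.nlargest(k, candidates, key=lambda item: item[1])
    return [index for index, _ in top]


scores = [0.2, 0.9, 0.5, 0.7]
top_two = select_similar(scores, k=2)
above_threshold = select_similar(scores, k=3, threshold=0.6)
```

The returned indices can then be used to look up the corresponding normal packet data before conversion.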
- the extraction unit 23 calculates the degree of similarity between the abnormal vector data 16 and each normal vector data in the normal vector data group 12 .
- the extraction unit 23 may calculate the degree of similarity with some normal vector data in the normal vector data group 12 .
- The subset of normal vector data is, for example, multiple normal vector data obtained by extracting multiple representative packet data from the multiple normal packet data using MMD-Critic (MMD: Maximum Mean Discrepancy) and converting each extracted representative packet data with the model.
- Alternatively, the subset of normal vector data is multiple normal vector data obtained by extracting, from the multiple normal packet data, normal packet data having the same packet length as the abnormal packet data 15 and converting each extracted normal packet data with the model.
- The extraction unit 23 may use BERTScore as the degree of similarity. Alternatively, the extraction unit 23 may calculate, for each byte of the abnormal vector data 16, the degree of similarity between the vector of the abnormal vector data 16 and the vector of the normal vector data, and then calculate the degree of similarity between the abnormal vector data 16 and the normal vector data from the per-byte similarities. Cosine similarity may be used as the similarity between the vectors of each byte. The degree of similarity between the abnormal vector data 16 and the normal vector data is, for example, the average of the similarities calculated for each byte. If the number of vectors of the abnormal vector data 16 differs from the number of vectors of the normal vector data, the similarity may be calculated over the smaller number of vectors. The number of vectors of each vector data equals the number of bytes of the packet data before conversion.
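The averaged per-byte cosine similarity described here can be written as a small sketch. It assumes that each vector data item is a list of per-byte vectors given as plain Python lists of floats; the function names `cosine` and `packet_similarity` are hypothetical, and BERTScore is not covered.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def packet_similarity(abnormal_vecs: list[list[float]],
                      normal_vecs: list[list[float]]) -> float:
    """Average the per-byte cosine similarity over the shorter of the two sequences."""
    n = min(len(abnormal_vecs), len(normal_vecs))
    if n == 0:
        return 0.0
    total = sum(cosine(abnormal_vecs[i], normal_vecs[i]) for i in range(n))
    return total / n


# Toy 2-dimensional vectors stand in for the per-byte vectors produced by the model.
abnormal_vecs = [[1.0, 0.0], [0.0, 1.0]]
normal_vecs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
similarity = packet_similarity(abnormal_vecs, normal_vecs)
```

In the toy example only the first two byte positions are compared, because the abnormal vector data has fewer vectors.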
- the extraction unit 23 acquires from the normal packet data group 13 a predetermined number of normal packet data before conversion of the predetermined number of normal vector data included in the similar normal vector data group. Then, the extraction unit 23 sets the obtained predetermined number of normal packet data as similar-normal packet data, and sets the set of similar-normal packet data as a similar-normal packet data group 17 .
- The estimation unit 24 compares the abnormal packet data 15 with the similar normal packet data included in the similar normal packet data group 17, and estimates the abnormal byte 18 included in the abnormal packet data 15 or the insertion/deletion byte location 19 in the abnormal packet data 15. Details of the estimation by the estimation unit 24 are described below.
- the estimation unit 24 has a length comparison unit 241, an abnormal byte estimation unit 242, and an insertion/deletion byte location estimation unit 243, as shown in FIG.
- the length comparison unit 241 compares the packet lengths of the abnormal packet data 15 and a predetermined number of similar normal packet data included in the similar normal packet data group 17 .
- The length comparison unit 241 determines that byte rewriting has occurred when, among the predetermined number of similar normal packet data, the number having the same packet length as the abnormal packet data 15 is equal to or greater than a predetermined determination threshold.
- the determination threshold is a parameter that can be specified.
- the determination threshold can be, for example, a fixed value of 50%. Alternatively, the threshold may be specified by a predetermined calculation.
- For example, a plurality of pairs of mutually similar normal packet data may be extracted, and the threshold may be specified from the lowest of the similarities between the vectors of the two similar normal packet data corresponding to a predetermined byte.
- the extracted normal packet data having the same packet length as the abnormal packet data 15 is hereinafter referred to as "same-length normal packet data". Then, the length comparing section 241 outputs the abnormal packet data 15 and the same-length normal packet data to the abnormal byte estimating section 242 .
- On the other hand, when the number of same-length normal packet data is less than the determination threshold, the length comparison unit 241 determines that byte insertion or deletion has occurred, or that the abnormal packet data 15 is completely different from the normal packet data. Then, the length comparison unit 241 outputs the abnormal packet data 15 and all of the predetermined number of similar normal packet data included in the similar normal packet data group 17 to the insertion/deletion byte location estimation unit 243.
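The branching performed by the length comparison unit 241 can be sketched as below. The function name `split_by_length`, the returned labels, and the interpretation of the 50% threshold as a ratio of the similar normal packet data are assumptions made for illustration.

```python
def split_by_length(abnormal: bytes, similar_normals: list[bytes],
                    ratio_threshold: float = 0.5) -> tuple[str, list[bytes]]:
    """Decide which estimator should handle the abnormal packet.

    Returns ("rewrite", same-length packets) when enough similar normal packets
    share the abnormal packet's length, else ("insert_or_delete", all packets).
    """
    same_length = [p for p in similar_normals if len(p) == len(abnormal)]
    enough = len(same_length) >= ratio_threshold * len(similar_normals)
    if similar_normals and enough:
        return "rewrite", same_length
    return "insert_or_delete", list(similar_normals)


abnormal = bytes([1, 2, 3])
mostly_same = [bytes([1, 2, 4]), bytes([1, 2, 3, 4]), bytes([9, 2, 3])]
mostly_other = [bytes([1]), bytes([1, 2]), bytes([1, 2, 3])]
rewrite_case = split_by_length(abnormal, mostly_same)
insert_delete_case = split_by_length(abnormal, mostly_other)
```

In the first case two of three similar packets share the abnormal packet's length, so the same-length packets go to the byte-by-byte comparison; in the second case only one of three does, so all packets go to the edit distance comparison.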
- the abnormal byte estimation unit 242 acquires the abnormal packet data 15 and the same-length normal packet data from the length comparison unit 241 . Then, the abnormal byte estimator 242 compares the acquired same-length normal packet data with the abnormal packet data 15 in order to identify the byte rewrite location in the abnormal packet data 15 .
- the abnormal byte estimation unit 242 directly treats the value of the first byte of the same-length normal packet data extracted for comparison as a number between 0 and 255 to calculate the interquartile range. Then, the abnormal byte estimation unit 242 determines whether or not the numerical value of the first byte of the abnormal packet data 15 is included in the calculated interquartile range. When the numerical value of the first byte of the abnormal packet data 15 is included in the calculated interquartile range, the abnormal byte estimation unit 242 determines that the first byte is normal. Further, when the numerical value of the first byte of the abnormal packet data 15 is not included in the calculated interquartile range, the abnormal byte estimation unit 242 regards the first byte as an abnormal byte.
- The abnormal byte estimation unit 242 likewise compares each subsequent byte in order (the second byte, the third byte, and so on), determines whether each byte is an abnormal byte location, and estimates the abnormal bytes.
- Although the interquartile range is used here to compare each byte, the anomaly detection method that can be used is not particularly limited as long as it can handle one-dimensional data; for example, a nonparametric method or the naive Bayes method may be used.
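The per-byte interquartile range check can be sketched with the standard library. This sketch reads "included in the interquartile range" as lying between the first and third quartiles, computes the quartiles with `statistics.quantiles` using the inclusive method, reports 0-based byte positions, and needs at least two same-length normal packets; all of these choices, as well as the function name `abnormal_positions_iqr`, are assumptions.

```python
import statistics


def abnormal_positions_iqr(abnormal: bytes,
                           same_length_normals: list[bytes]) -> list[int]:
    """Return 0-based positions whose value falls outside [Q1, Q3] of the normal values."""
    positions = []
    for n in range(len(abnormal)):
        # Treat the n-th byte of every normal packet as a number between 0 and 255.
        column = [packet[n] for packet in same_length_normals]
        q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
        if not (q1 <= abnormal[n] <= q3):
            positions.append(n)
    return positions


normals = [
    bytes([0x2E, 0x3F, 10]),
    bytes([0x2E, 0x3F, 12]),
    bytes([0x2E, 0x3F, 11]),
    bytes([0x2E, 0x3F, 13]),
]
rewritten = bytes([0x2E, 0x3F, 200])
flagged = abnormal_positions_iqr(rewritten, normals)
```

Because the first two byte positions are constant across the normal packets, their quartiles collapse to a single value, so any change in those positions is flagged as well.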
- If there is same-length normal packet data whose number of abnormal byte locations is less than a certain value, the abnormal byte estimation unit 242 selects, from among such data, the normal packet data with the fewest abnormal byte locations as the final similar normal packet data. The abnormal byte estimation unit 242 then estimates the abnormal byte locations stored for the final similar normal packet data as the abnormal byte 18.
- If there is no same-length normal packet data whose number of abnormal byte locations is less than the certain value, the abnormal byte estimation unit 242 treats the abnormal packet data 15 as having no final similar normal packet data.
- The certain value is a configurable parameter and can be set, for example, to about 1/3 to 1/2 of the packet length.
- the inserted/deleted byte location estimation unit 243 acquires the abnormal packet data 15 and a predetermined number of normal packet data from the length comparison unit 241 . Then, the inserted/deleted byte location estimation unit 243 compares the acquired normal packet data and the abnormal packet data 15 in order to specify the insertion byte location or deleted byte location of the bytes in the abnormal packet data 15 .
- the inserted/deleted byte location estimation unit 243 calculates the edit distance between the abnormal packet data 15 and each normal packet data using dynamic programming.
- the inserted/deleted byte position estimation unit 243 can specify an inserted byte position suspected of being inserted or a deleted byte position suspected of being deleted by calculating an edit distance.
- If there is normal packet data whose edit distance is less than a certain distance, the insertion/deletion byte location estimation unit 243 selects, from among such data, the normal packet data with the shortest edit distance as the final similar normal packet data. Then, the insertion/deletion byte location estimation unit 243 estimates the insertion/deletion byte location 19 using the edit distance of the selected final similar normal packet data.
- If there is no normal packet data whose edit distance is less than the certain distance, the insertion/deletion byte location estimation unit 243 treats the abnormal packet data 15 as having no final similar normal packet data.
- The certain distance is a configurable parameter and can be set, for example, to about 1/3 to 1/2 of the packet length.
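The edit distance comparison can be illustrated with a minimal dynamic programming sketch. It computes the Levenshtein distance between the abnormal packet and one normal packet, backtracks through the table to list suspected inserted or deleted positions (0-based, relative to the abnormal packet), and then picks the closest normal packet under a distance limit. The function names, the operation labels, and the default limit of 1/3 of the packet length are assumptions chosen for illustration.

```python
def edit_script(abnormal: bytes, normal: bytes):
    """Return (distance, ops); ops are (kind, position in the abnormal packet)."""
    m, n = len(abnormal), len(normal)
    # dp[i][j] = edit distance between abnormal[:i] and normal[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if abnormal[i - 1] == normal[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # extra byte in abnormal
                           dp[i][j - 1] + 1,         # byte missing from abnormal
                           dp[i - 1][j - 1] + cost)  # match or rewritten byte
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        cost = 1 if (i > 0 and j > 0 and abnormal[i - 1] != normal[j - 1]) else 0
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + cost:
            if cost:
                ops.append(("rewritten", i - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("inserted", i - 1))
            i -= 1
        else:
            ops.append(("deleted", i))  # a byte is missing just before index i
            j -= 1
    ops.reverse()
    return dp[m][n], ops


def estimate_insert_delete(abnormal: bytes, similar_normals: list[bytes],
                           max_ratio: float = 1 / 3):
    """Return the ops against the closest normal packet, or None if none is close enough."""
    limit = max_ratio * len(abnormal)
    best = None
    for normal in similar_normals:
        distance, ops = edit_script(abnormal, normal)
        if distance < limit and (best is None or distance < best[0]):
            best = (distance, ops)
    return None if best is None else best[1]


inserted_case = edit_script(bytes([1, 2, 9, 3, 4]), bytes([1, 2, 3, 4]))
deleted_case = edit_script(bytes([1, 2, 4]), bytes([1, 2, 3, 4]))
```

When neighboring bytes have equal values, several edit scripts share the same distance, so the reported position is only one of the equally plausible locations.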
- FIG. 3 is a flowchart of estimation processing by the estimation device according to the embodiment. Next, the flow of estimation processing by the estimation device according to the present embodiment will be described with reference to FIG.
- the conversion unit 21 converts the abnormal packet data 15 into abnormal vector data 16 (step S1).
- the extracting unit 23 extracts a predetermined number of normal vector data similar to the abnormal vector data 16 converted in step S1 from the normal vector data group 12 (step S2) to form a similar normal vector data group.
- The extraction unit 23 acquires from the normal packet data group 13 the predetermined number of normal packet data before conversion that correspond to the normal vector data contained in the similar normal vector data group (step S3), treats them as similar normal packet data, and sets the set of them as the similar normal packet data group 17.
- The length comparison unit 241 of the estimation unit 24 compares the packet length of the abnormal packet data 15 with that of each similar normal packet data included in the similar normal packet data group 17 (step S4). Then, the length comparison unit 241 determines whether the number of same-length normal packet data in the similar normal packet data group 17, that is, normal packet data having the same packet length as the abnormal packet data 15, exceeds the determination threshold (step S5).
- When the number of same-length normal packet data exceeds the determination threshold (step S5: Yes), the length comparison unit 241 determines that the abnormal byte 18 exists. The length comparison unit 241 then transmits the abnormal packet data 15 and the same-length normal packet data to the abnormal byte estimation unit 242. The abnormal byte estimation unit 242 acquires the abnormal packet data 15 and the same-length normal packet data, and executes the abnormal byte estimation process (step S6).
- When the number of same-length normal packet data does not exceed the determination threshold (step S5: No), the length comparison unit 241 determines that there is an inserted byte location or a deleted byte location.
- The length comparison unit 241 then transmits the abnormal packet data 15 and all of the predetermined number of similar normal packet data included in the similar normal packet data group 17 to the insertion/deletion byte location estimation unit 243.
- the insertion/deletion byte location estimation unit 243 acquires the abnormal packet data 15 and a predetermined number of similar normal packet data, and executes insertion/deletion byte location estimation processing (step S7).
- FIG. 4 is a flowchart of abnormal byte estimation processing.
- the flow shown in FIG. 4 corresponds to an example of the abnormal byte estimation process executed in step S6 in FIG.
- the abnormal byte estimation unit 242 selects one unselected same-length normal packet data from the same-length normal packet data (step S101).
- the abnormal byte estimating unit 242 sets n, which is a parameter representing the position of the byte to be compared, to 1 (step S102).
- the abnormal byte estimation unit 242 compares the n-th byte of the selected same-length normal packet data and the abnormal packet data 15 (step S103). For example, the abnormal byte estimator 242 directly treats the n-th byte value of the selected same-length normal packet data as a number between 0 and 255 to calculate the interquartile range. Then, the abnormal byte estimation unit 242 determines whether or not the n-th byte number of the abnormal packet data 15 is included in the calculated interquartile range.
- the abnormal byte estimation unit 242 uses the comparison result to determine whether the n-th byte is an abnormal byte (step S104). For example, if the calculated interquartile range includes the numerical value of the n-th byte of the abnormal packet data 15, the abnormal byte estimation unit 242 determines that the n-th byte is normal. Conversely, if the n-th byte value of the abnormal packet data 15 is not included in the calculated interquartile range, the abnormal byte estimation unit 242 regards the n-th byte as an abnormal byte. If the n-th byte is not an abnormal byte (step S104: No), the abnormal byte estimation unit 242 proceeds to step S106.
- If the n-th byte is an abnormal byte (step S104: Yes), the abnormal byte estimation unit 242 stores the n-th byte in the abnormal packet data 15 as an abnormal byte location (step S105) and proceeds to step S106.
- the abnormal byte estimation unit 242 determines whether or not the n-th byte is the last byte in the abnormal packet data 15 (step S106). If the n-th byte is not the last byte (step S106: No), the abnormal byte estimation unit 242 increments n by 1 (step S107) and returns to step S103.
- If the n-th byte is the last byte (step S106: Yes), the abnormal byte estimation unit 242 determines whether all the same-length normal packet data have been selected (step S108). If unselected same-length normal packet data remains (step S108: No), the abnormal byte estimation unit 242 returns to step S101.
- When all the same-length normal packet data have been selected (step S108: Yes), the abnormal byte estimation unit 242 determines whether there is same-length normal packet data for which the number of stored abnormal byte locations is less than the certain value (step S109).
- If such same-length normal packet data exists (step S109: Yes), the abnormal byte estimation unit 242 selects, from among the same-length normal packet data whose number of abnormal byte locations is less than the certain value, the one with the fewest abnormal byte locations as the final similar normal packet data. Then, the abnormal byte estimation unit 242 estimates the abnormal byte locations of the abnormal packet data 15 stored for the final similar normal packet data as the abnormal byte 18 (step S110), and terminates the abnormal byte estimation process.
- If no such same-length normal packet data exists (step S109: No), the abnormal byte estimation unit 242 determines that there is no final similar normal packet data.
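The loop structure of this flow (steps S101 to S110) can be sketched as follows. For brevity the sketch counts a byte as abnormal when it simply differs from the corresponding byte of the selected same-length normal packet, which is a simplified stand-in for the interquartile range test; that simplification, the function name `estimate_abnormal_bytes`, and the default limit of 1/3 of the packet length are assumptions.

```python
def estimate_abnormal_bytes(abnormal: bytes, same_length_normals: list[bytes],
                            max_ratio: float = 1 / 3):
    """Return 0-based abnormal byte positions, or None if no packet is close enough."""
    limit = max_ratio * len(abnormal)
    best = None
    for normal in same_length_normals:             # S101, S108: visit every packet
        positions = [
            n for n in range(len(abnormal))        # S102 to S107: walk over the bytes
            if abnormal[n] != normal[n]            # S103, S104: simplified byte test
        ]                                          # S105: remember abnormal locations
        # S109, S110: keep the packet with the fewest abnormal locations under the limit.
        if len(positions) < limit and (best is None or len(positions) < len(best)):
            best = positions
    return best


abnormal = bytes([1, 2, 3, 4, 5, 6])
candidates = [bytes([1, 2, 9, 4, 5, 6]), bytes([9, 9, 9, 4, 5, 6])]
estimated = estimate_abnormal_bytes(abnormal, candidates)
```

A return value of `None` corresponds to the case in which there is no final similar normal packet data, while an empty list means that an identical normal packet was found.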
- FIG. 5 is a flowchart of insertion/deletion byte location estimation processing.
- the flow shown in FIG. 5 corresponds to an example of insertion/deletion byte location estimation processing executed in step S7 in FIG.
- the insertion/deletion byte location estimation unit 243 selects one unselected normal packet data from a predetermined number of similar normal packet data (step S201).
- the inserted/deleted byte location estimation unit 243 calculates the edit distance between the abnormal packet data 15 and the selected similar normal packet data using dynamic programming (step S202).
- the insertion/deletion byte position estimation unit 243 determines whether or not all the predetermined number of similar normal packet data have been selected (step S203). If unselected normal packet data remains among the predetermined number of similar normal packet data (step S203: NO), the insertion/deletion byte position estimation unit 243 returns to step S201.
- When all the predetermined number of similar normal packet data have been selected (step S203: Yes), the insertion/deletion byte location estimation unit 243 determines whether there is similar normal packet data whose edit distance is less than the certain distance (step S204).
- If such similar normal packet data exists (step S204: Yes), the insertion/deletion byte location estimation unit 243 selects, from among the similar normal packet data whose edit distance is less than the certain distance, the one with the shortest edit distance as the final similar normal packet data. Then, the insertion/deletion byte location estimation unit 243 estimates the insertion/deletion byte location 19 using the edit distance between the selected final similar normal packet data and the abnormal packet data 15 (step S205), and terminates the insertion/deletion byte location estimation processing.
- If there is no similar normal packet data whose edit distance is less than the certain distance (step S204: No), the insertion/deletion byte location estimation unit 243 determines that there is no final similar normal packet data.
- FIG. 6 is a diagram showing experimental results when rewriting bytes.
- FIG. 7 is a diagram showing experimental results when random bytes are inserted.
- FIG. 8 is a diagram showing experimental results when bytes are deleted.
- the vertical axis represents the rate of estimation success, and the horizontal axis represents the number of deleted bytes.
- The estimation success rate is the fraction of successful estimations, where 1 means that the estimation succeeded for all 100 cases of abnormal packet data 15.
- For byte rewriting, the estimation device 1 achieves an estimation success rate of about 90% even when five locations are rewritten. Since the packet length is 12 to 25, the estimation accuracy can be considered quite good even with rewriting at five locations.
- the estimation device 1 can estimate about 90% up to two rewrites. However, the estimation accuracy drops when inserting at 3 or more locations, and the estimation accuracy drops to 50% when rewriting at 5 locations.
- One reason for this is thought to be that the data packets with packet lengths of 12, 14, or 15, which account for the majority, were mixed due to the insertion of random bytes, and the extraction of the similar normal packet data group 17 by BERT did not work well. be done. That is, if the packet length is not affected by the insertion of random bytes, it is considered that the estimation accuracy is improved even if the insertion is made at three or more locations.
- FIG. 9 is a diagram for explaining the cause of the decrease in accuracy of byte deletion.
- The estimation device 1 may estimate byte 130 in the determination result packet data 103 as the deleted location. In this experiment, such estimates are not treated as correct, which lowers the measured estimation accuracy.
- It is considered that the estimation device 1 can ensure high estimation accuracy regardless of whether the detected anomaly is byte rewriting, random byte insertion, or byte deletion.
- The estimation device 1 uses BERT to extract a predetermined number of similar normal packet data similar to the detected abnormal packet data 15. It then estimates tampered locations by comparing the abnormal packet data 15 with the similar normal packet data byte by byte, and estimates inserted and deleted byte locations using edit distance calculation. This makes it possible to accurately estimate the abnormal byte location or the insertion/deletion byte location 19 for packets of any communication protocol.
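The byte-by-byte comparison can be illustrated with a minimal sketch. It assumes a simple range check as the per-position anomaly test: a byte is flagged when its numeric value lies outside the values observed at the same position in the same-length similar normal packets. The source does not specify the detector, so this rule and the function name `estimate_abnormal_bytes` are assumptions for illustration only.

```python
from typing import List, Sequence


def estimate_abnormal_bytes(abnormal: bytes, similar_normals: Sequence[bytes]) -> List[int]:
    """Return the positions whose value falls outside the range seen in normal packets."""
    # Only packets with the same length can be compared position by position.
    same_length = [p for p in similar_normals if len(p) == len(abnormal)]
    if not same_length:
        return []
    suspicious: List[int] = []
    for pos, value in enumerate(abnormal):
        observed = [p[pos] for p in same_length]
        # Assumed rule: treat each byte as a number and flag out-of-range values.
        if value < min(observed) or value > max(observed):
            suspicious.append(pos)
    return suspicious
```

A range check misses tampered values that happen to fall inside the observed range, so a stricter rule such as set membership may suit fields that take only a few discrete values.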
- Each component of each illustrated device is functionally conceptual and does not necessarily need to be physically configured as illustrated.
- The specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of the devices can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.
- All or any part of each processing function performed by each device can be realized by a CPU (Central Processing Unit) and a program analyzed and executed by the CPU, or can be realized as hardware using wired logic.
- In one embodiment, the estimation device 1 can be implemented by installing, on a desired computer, an information processing program that executes the above-described question generation process as package software or online software.
- For example, a computer can be made to function as the estimation device 1 by having it execute the above estimation processing program.
- The computer referred to here includes desktop and notebook personal computers.
- Computers also include smartphones, mobile communication terminals such as mobile phones and PHS (Personal Handy-phone System) terminals, and slate terminals such as PDAs (Personal Digital Assistants).
- The estimation device 1 may be implemented as a web server, or as a cloud that provides, on an outsourced basis, services related to the above-described management processing.
- FIG. 10 is a diagram showing an example of a computer that executes an estimation processing program.
- The computer 1000 has, for example, a memory 1010 and a CPU 1020.
- The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.
- The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012.
- The ROM 1011 stores a boot program such as a BIOS (Basic Input Output System).
- The hard disk drive interface 1030 is connected to a hard disk drive 1090.
- The disk drive interface 1040 is connected to a disk drive 1100.
- A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100.
- The serial port interface 1050 is connected to an input unit 1200 such as a mouse 1110 and a keyboard 1120, for example.
- The video adapter 1060 is connected to an output unit 1300 such as a display 1130.
- The hard disk drive 1090 stores, for example, an OS 1091, application programs 1092, program modules 1093, and program data 1094. That is, a program that defines each process of the estimation device 1 is implemented as a program module 1093 in which computer-executable code is described. The program modules 1093 are stored, for example, on the hard disk drive 1090.
- The hard disk drive 1090 stores a program module 1093 for executing processing similar to that of the functional configuration of the estimation device 1.
- The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
- The setting data used in the processing of the above-described embodiment is stored as program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 then reads the program modules 1093 and program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary and executes the processes of the above-described embodiment.
- The program modules 1093 and program data 1094 are not limited to being stored on the hard disk drive 1090; they may be stored, for example, on a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program modules 1093 and program data 1094 may be stored on another computer connected via a network (a LAN (Local Area Network), a WAN (Wide Area Network), or the like) and read from that computer by the CPU 1020 through the network interface 1070.
- Reference signs: 1 estimation device; 11 model data; 12 normal vector data group; 13 normal packet data group; 15 abnormal packet data; 16 abnormal vector data; 17 similar normal packet data group; 18 abnormal byte; 19 insertion/deletion byte location; 21 conversion unit; 22 generation unit; 23 extraction unit; 24 estimation unit; 241 length comparison unit; 242 abnormal byte estimation unit; 243 insertion/deletion byte location estimation unit
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Environmental & Geological Engineering (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Description
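An estimation device 1 according to an embodiment of the present invention is described with reference to FIG. 1. When an abnormal packet is input, the estimation device 1 estimates and outputs the abnormal bytes in that packet. The estimation device 1 compares an abnormal packet that another system has determined to be abnormal with normal packets that the other system has determined to be normal, and estimates the abnormal bytes, the inserted byte locations, or the deleted byte locations in the input abnormal packet. For example, the normal packets and the abnormal packets are each collected on a single operational technology communication network. The other system may determine whether a packet is normal or abnormal by any method; the embodiment of the present invention does not depend on the determination method.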
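FIG. 3 is a flowchart of the estimation processing performed by the estimation device according to the embodiment. Next, the flow of the estimation processing performed by the estimation device according to the present embodiment is described with reference to FIG. 3.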
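Next, the results of an experiment in which the estimation device 1 according to the present embodiment estimated the abnormal byte 18 or the insertion/deletion byte location 19 are described. The experiment was conducted under the following conditions. BERT had been trained in advance on 30,000 items of Modbus/TCP data. The data set consisted of 400 items of normal packet data and 100 items of abnormal packet data 15. The abnormal packet data 15 were input one at a time for cause estimation. In this experiment, only an exact match was counted as a successful estimation.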
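As described above, the estimation device 1 according to the present embodiment uses BERT to extract a predetermined number of similar normal packet data similar to the detected abnormal packet data 15. The estimation device 1 then estimates tampered locations by comparing the abnormal packet data 15 with the similar normal packet data byte by byte, and estimates inserted and deleted byte locations using edit distance calculation. This makes it possible to accurately estimate the abnormal byte location or the insertion/deletion byte location 19 for packets of any communication protocol.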
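The extraction of similar normal packet data can be sketched as follows. The sketch assumes that the per-byte vectors have already been produced by the model (BERT itself is not invoked), that each packet's vectors are mean-pooled into one vector, and that cosine similarity is the similarity measure. The source states only that packets with relatively high similarity are extracted, so the pooling, the measure, and the names `mean_pool`, `cosine`, and `extract_similar` are all assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(vector_data: Sequence[Sequence[float]]) -> Vector:
    """Average the per-byte vectors of one packet into a single vector."""
    dim = len(vector_data[0])
    return [sum(v[d] for v in vector_data) / len(vector_data) for d in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def extract_similar(
    abnormal_vectors: Sequence[Sequence[float]],
    normal_vectors: Sequence[Sequence[Sequence[float]]],
    normal_packets: Sequence[bytes],
    k: int,
) -> List[bytes]:
    """Return the k normal packets whose pooled vectors are closest to the abnormal one."""
    query = mean_pool(abnormal_vectors)
    scored = [(cosine(query, mean_pool(v)), idx) for idx, v in enumerate(normal_vectors)]
    # Highest similarity first; the original packets are looked up by index.
    scored.sort(key=lambda s: s[0], reverse=True)
    return [normal_packets[idx] for _, idx in scored[:k]]
```

Mean pooling lets packets of different lengths be compared with a single fixed-size vector, which matters here because inserted or deleted bytes change the packet length.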
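Each component of each illustrated device is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of the devices can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of each processing function performed by each device can be realized by a CPU (Central Processing Unit) and a program analyzed and executed by the CPU, or can be realized as hardware using wired logic.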
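In one embodiment, the estimation device 1 can be implemented by installing, on a desired computer, an information processing program that executes the above-described question generation process as package software or online software. For example, a computer can be made to function as the estimation device 1 by having it execute the above estimation processing program. The computer referred to here includes desktop and notebook personal computers. Computers also include smartphones, mobile communication terminals such as mobile phones and PHS (Personal Handy-phone System) terminals, and slate terminals such as PDAs (Personal Digital Assistants). The estimation device 1 may be implemented as a web server, or as a cloud that provides, on an outsourced basis, services related to the above-described management processing.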
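11 Model data
12 Normal vector data group
13 Normal packet data group
15 Abnormal packet data
16 Abnormal vector data
17 Similar normal packet data group
18 Abnormal byte
19 Insertion/deletion byte location
21 Conversion unit
22 Generation unit
23 Extraction unit
24 Estimation unit
241 Length comparison unit
242 Abnormal byte estimation unit
243 Insertion/deletion byte location estimation unit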
Claims (7)
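- An estimation device comprising: an extraction unit that, based on a natural language processing model, extracts from a plurality of normal packet data a predetermined number of similar normal packet data having a relatively high degree of similarity to abnormal packet data; and an estimation unit that extracts, from the similar normal packet data extracted by the extraction unit, same-length packet data having the same packet length as the abnormal packet data, and estimates an abnormal byte location by comparing the abnormal packet data with the same-length packet data byte by byte.
- The estimation device according to claim 1, wherein the estimation unit performs one-dimensional anomaly detection in which each byte of the abnormal packet data and of the same-length packet data is treated as a numerical value and compared.
- The estimation device according to claim 1 or 2, wherein the extraction unit, when the number of same-length packet data among the predetermined number of similar normal packet data is less than a determination threshold, estimates the abnormal byte location based on the edit distance between the abnormal packet data and the similar normal packet data.
- The estimation device according to any one of claims 1 to 3, wherein, using the natural language processing model, which converts packet data into vector data in which vectors representing the features of the value of each byte of the packet data are associated with the respective bytes, the predetermined number of similar normal vector data having a relatively high degree of similarity to abnormal vector data, into which the abnormal packet data has been converted using the natural language processing model, are identified from among a plurality of normal vector data into which the plurality of normal packet data have been converted, and the normal packet data from which the similar normal vector data were converted are extracted as the similar normal packet data.
- The estimation device according to any one of claims 1 to 4, wherein the extraction unit uses Bidirectional Encoder Representations from Transformers (BERT) as the natural language processing model.
- An estimation method comprising: extracting, based on a natural language processing model, from a plurality of normal packet data a predetermined number of similar normal packet data having a relatively high degree of similarity to abnormal packet data; extracting, from the similar normal packet data, same-length packet data having the same packet length as the abnormal packet data; and estimating an abnormal byte location by comparing the abnormal packet data with the same-length packet data byte by byte.
- An estimation program that causes a computer to execute processing comprising: extracting, based on a natural language processing model, from a plurality of normal packet data a predetermined number of similar normal packet data having a relatively high degree of similarity to abnormal packet data; extracting, from the similar normal packet data, same-length packet data having the same packet length as the abnormal packet data; and estimating an abnormal byte location by comparing the abnormal packet data with the same-length packet data byte by byte.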
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/046840 WO2023112333A1 (ja) | 2021-12-17 | 2021-12-17 | Estimation device, estimation method, and estimation program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/046840 WO2023112333A1 (ja) | 2021-12-17 | 2021-12-17 | Estimation device, estimation method, and estimation program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023112333A1 (ja) | 2023-06-22 |
Family
ID=86773961
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/046840 WO2023112333A1 (ja) | Estimation device, estimation method, and estimation program | 2021-12-17 | 2021-12-17 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023112333A1 (ja) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2019110513A (ja) * | 2017-12-15 | 2019-07-04 | Panasonic Intellectual Property Corporation of America | Anomaly detection method, learning method, anomaly detection device, and learning device |
CN112365338A (zh) * | 2020-11-11 | 2021-02-12 | 平安普惠企业管理有限公司 | Artificial-intelligence-based data fraud detection method, apparatus, terminal, and medium |
- 2021-12-17 WO PCT/JP2021/046840 patent/WO2023112333A1/ja active Application Filing
Non-Patent Citations (1)
Title |
---|
YAMANAKA YUUKI, YAMADA MASANORI, TAKAHASHI TOMOKATSU, NAGAI TOMOHIRO: "Utilizing BERT for Feature Extraction of Packet Payload", PROCEEDINGS OF THE ANNUAL CONFERENCE OF JSAI, THE JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE, 11 June 2021 (2021-06-11), pages 1 - 3, XP093015030, Retrieved from the Internet <URL:https://www.jstage.jst.go.jp/article/pjsai/JSAI2021/0/JSAI2021_1F2GS10a04/_pdf/-char/en> [retrieved on 20230118], DOI: 10.11517/pjsai.JSAI2021.0_1F2GS10a04 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
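EP3796176B1 (en) | | Fault root cause analysis method and apparatus |
CN111191767B (zh) | | Vectorization-based method for determining malicious traffic attack types |
JP3832281B2 (ja) | | Outlier rule generation device and outlier detection device, outlier rule generation method and outlier detection method therefor, and program therefor |
CN110298035B (zh) | | Artificial-intelligence-based character vector definition method, apparatus, device, and storage medium |
CN108733508B (zh) | | Method and system for controlling data backup |
CN110912908B (zh) | | Network protocol anomaly detection method and apparatus, computer device, and storage medium |
CN109981625B (zh) | | Log template extraction method based on online hierarchical clustering |
CN103870751A (zh) | | Intrusion detection method and system |
CN108491228B (zh) | | Binary vulnerability code clone detection method and system |
CN113518063A (zh) | | Network intrusion detection method and system based on data augmentation and BiLSTM |
EP3404572A1 (en) | | Attack code detection device, attack code detection method, and attack code detection program |
CN113656807A (zh) | | Vulnerability management method, apparatus, device, and storage medium |
CN111984792A (zh) | | Website classification method and apparatus, computer device, and storage medium |
CN117688342B (zh) | | Model-based device state prediction method, electronic device, and storage medium |
WO2023112333A1 (ja) | | Estimation device, estimation method, and estimation program |
CN117909864A (zh) | | Power fault prediction system and method |
CN111274202A (zh) | | Electronic contract generation method and apparatus, computer device, and storage medium |
AU2021479523A1 (en) | | Estimation device, estimation method, and estimation program |
JP7444287B2 (ja) | | Estimation device, estimation method, and estimation program |
CN113360899B (zh) | | Machine behavior identification method and system |
CN115147020A (zh) | | Renovation data processing method, apparatus, device, and storage medium |
CN114968719A (zh) | | Thread running state classification method and apparatus, computer device, and storage medium |
CN110727538B (zh) | | Fault localization system and method based on model hit probability distribution |
CN113407495A (zh) | | Simhash-based file similarity determination method and system |
WO2023112227A1 (ja) | | Anomaly detection device, anomaly detection method, and anomaly detection program |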
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21968246; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2023567501; Country of ref document: JP; Kind code of ref document: A |
| WWE | WIPO information: entry into national phase | Ref document number: AU2021479523; Country of ref document: AU |
| WWE | WIPO information: entry into national phase | Ref document number: 2021968246; Country of ref document: EP |
| ENP | Entry into the national phase | Ref document number: 2021968246; Country of ref document: EP; Effective date: 20240617 |