WO2022239166A1 - Extraction method, extraction device, and extraction program - Google Patents
Extraction method, extraction device, and extraction program
- Publication number
- WO2022239166A1 (application PCT/JP2021/018127)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- ioc
- feature information
- extraction unit
- dns
- domain name
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/554—Detecting local intrusion or implementing counter-measures involving event detection and direct action
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/034—Test or assess a computer or a system
Definitions
- the present invention relates to an extraction method, an extraction device, and an extraction program.
- SOC security operation center
- As described in References 1 and 2 below, analysts who process a large number of alerts on a daily basis experience so-called alert fatigue, which leads to analyst burnout.
- Reference 1 S. C. Sundaramurthy, A. G. Bardas, J. Case, X. Ou, M. Wesch, J. McHugh, and S. R. Rajagopalan, “A human capital model for mitigating security analyst burnout,” Proc. SOUPS, 2015.
- Reference 2 Ponemon Institute, “Improving the Effectiveness of the Security Operations Center,” 2019.
- As described in Non-Patent Documents 1 to 5, a technology has been proposed that distinguishes truly malicious alerts from non-malicious false-positive alerts by estimating the anomaly score and maliciousness score of each security-related alert from past alerts.
- As described in Non-Patent Documents 6 to 8, there is also a known technology that supports analysts' subsequent processes by extracting the information most relevant to each security alert.
- the technology described in the prior art documents above employs feature information needed to determine whether an IOC is abnormal or malicious. However, whether an IOC is abnormal or malicious and whether the IOC requires further investigation by an analyst are independent questions.
- the extraction method is executed by an extraction device and includes an acquisition step of acquiring observation results by a predetermined organization for an IOC (Indicator of Compromise) included in cybersecurity information, and a creation step of creating feature information of the IOC based on information obtained from the observation results acquired in the acquisition step.
- IOC Indicator of Compromise
- useful feature information can be obtained for determining the priority of IOC investigations.
- FIG. 1 is a diagram explaining a security system.
- FIG. 2 is a diagram showing an example of an alert monitor screen.
- FIG. 3 is a diagram showing an example of an IOC checker screen.
- FIG. 4 is a diagram illustrating a configuration example of a determination device according to the first embodiment.
- FIG. 5 is a diagram illustrating an example of a request period.
- FIG. 6 is a flowchart showing the flow of learning processing.
- FIG. 7 is a flowchart showing the flow of processing for extracting feature information.
- FIG. 8 is a flowchart showing the flow of prediction processing.
- FIG. 9 is a diagram illustrating an example of a computer that executes a determination program.
- the determination device functions as an extraction device.
- FIG. 1 is a diagram explaining a security system.
- the security system 1 performs automatic analysis by an analysis engine or analysis by an analyst based on predetermined information generated in the security appliance of the customer organization.
- Security appliances include, for example, intrusion prevention systems (IPS), proxies, sandboxes, and unified threat management (UTM).
- IPS intrusion prevention system
- UTM unified threat management
- the SOC analyzes security-related information obtained from security appliances in real time.
- security related information includes security logs and alerts.
- the SOC is used as an outsourced SOC provided by a large-scale MSSP (Managed Security Service Provider).
- MSSP Managed Security Service Provider
- this embodiment is also applicable to an in-house SOC.
- the customer organization's security appliance sends alerts and security logs to the SOC's analysis engine 10 (step S1).
- the security system 1 can process security logs in the same manner as alerts.
- the analysis engine 10 performs automatic analysis (step S2).
- the analysis engine 10 responds to alerts by performing analysis based on known malicious characteristics and pre-defined rules and blacklists.
- the analysis engine 10 may perform analysis using a function called SOAR (Security Orchestration, Automation, and Response).
- SOAR Security Orchestration, Automation, and Response
- the analysis engine 10 sends an alert that satisfies a predetermined condition to the determination device 20, alert monitor 30, or IOC checker 40 (step S3).
- FIG. 2 is a diagram showing an example of an alert monitor screen.
- the alert monitor 30 displays the date of the event that caused the alert (Date), the customer name (Customer), the device that sent the alert (Device), the name of the alert (Alert Name), an overview of the event that triggered the alert, the status, and so on.
- the IOC checker 40 displays information about the IOC (Indicator of Compromise) included in the alert.
- FIG. 3 is a diagram showing an example of an IOC checker screen.
- the IOC includes domain names, IP addresses, URLs, file hash values, etc.
- the IOC checker 40 can show the status of the investigation in the SOC (Status), the SOC's most recent judgment on the maliciousness of the IOC (SOC Last Decision), the latest threat intelligence result for the IOC (Detection in TI), etc.
- the analyst uses tools dedicated to IOC evaluation, such as the alert monitor 30 and the IOC checker 40, to triage (evaluate) IOCs for alerts that could not be processed by the analysis engine 10.
- the determination device 20 determines an IOC with a high priority and notifies the analyst of it. This can prevent multiple analysts from manually evaluating the same IOC at the SOC.
- with the determination device 20, IOCs with high priority can be analyzed preferentially, which increases the reduction in the analyst's workload.
- the determination device 20 learns the model or predicts the priority of the IOC using the model (step S4). The determination device 20 then determines the IOCs with higher priority based on the prediction result and issues a notification of the determined IOCs (step S5).
- the determination device 20 notifies the analyst of the determined IOCs via the IOC checker 40.
- the analyst performs analysis based on the notified priority (step S6).
- the analyst may also search a threat intelligence service (e.g., VirusTotal (https://www.virustotal.com/)) during the analysis (step S7).
- Some threat intelligence services provide scores regarding the level and severity of threats. However, such a score by itself does not necessarily determine the analyst's next action.
- an IOC related to an attack that uses a vulnerability that has already been patched may have a high score for being malicious, but it is not an immediate threat from the perspective of protecting customer organizations.
- the determination of the higher-priority IOCs by the determination device 20 is useful for securing time for the analyst's decisions and for reducing the investigation work for each IOC.
- the analyst ultimately determines whether the alert to be analyzed and the IOC included in the alert are malicious or non-malicious, further determines whether reporting to the customer is necessary, and, if a report is necessary, reports to the system administrator of the customer organization or the like (step S8).
- the conditions for triggering alerts in the analysis engine 10 can be changed based on the results.
- the IOC can be used in the analysis engine 10 as a custom blacklist or custom signature.
- logs containing the same IOC can then be detected automatically for other customers of the SOC. Additionally, if the evaluation finds that an IOC is a false positive or has a low threat level, the SIEM logic that triggers the alert can be changed to prevent the same false positive alert from occurring again, reducing analyst workload.
- FIG. 4 is a diagram showing a configuration example of the determination device according to the first embodiment.
- the determination device 20 has a feature information extraction unit 21, a label assignment unit 22, a learning unit 23, a prediction unit 24, and model information 25.
- the determination device 20 performs model learning processing using a machine learning method and prediction processing using the learned model.
- in the learning process, the feature information extraction unit 21, the label assignment unit 22, and the learning unit 23 are used. In the prediction process, the feature information extraction unit 21 and the prediction unit 24 are used.
- the feature information extraction unit 21 extracts feature information from the IOCs included in the information on cybersecurity. For example, the information on cybersecurity is an alert obtained from the analysis engine 10.
- the feature information extraction unit 21 extracts information characterizing the IOC (hereinafter referred to as feature information) from the IOCs included in past alerts obtained from the analysis engine 10.
- the feature information may be the domain name, IP address, URL, file hash value, etc. included in the IOC.
- the feature information extraction unit 21 extracts feature information from alerts that have occurred during a predetermined number of days.
- the feature information extraction unit 21 functions as an extraction device having an acquisition unit and a creation unit.
- the acquisition unit acquires observation results by a predetermined organization for IOCs included in cybersecurity information.
- the creation unit creates feature information of the IOC based on information obtained from the acquired observation results.
- the feature information extraction unit 21 creates feature information based on the observation results (items 1, 2, 3) of the threat intelligence service or the observation results (items 4, 5) of networks such as the Internet.
- the feature information of items 1, 2 and 3 is feature information focusing on the characteristics of threats already observed by the threat intelligence service in relation to each IOC.
- the feature information extraction unit 21 acquires the detection status of items related to the IOC by the threat intelligence service.
- the feature information extraction unit 21 creates feature information based on the detection situation.
- the threat intelligence service may be prepared by the customer organization or may be provided by an external organization.
- a threat intelligence service is a service such as VirusTotal that can acquire threat information related to domain names, IP addresses, URLs, and file hash values.
- the feature information extraction unit 21 refers to the threat intelligence service, counts (1) the detected URLs containing the domain name, (2) the detected files communicating with the domain name, (3) the detected files downloaded from the domain name, and (4) the detected files referring to the domain name, and uses each count as feature information.
- the feature information extraction unit 21 obtains, for example, four pieces of feature information. According to the feature information of item 1, it is possible to identify whether the IOC is associated with known threats.
- the detected URLs containing the domain name in (1) are defined as those, among the URLs that share the domain name part, detected by at least one of the detection engines on the threat intelligence service.
- the detected files communicating with the domain name in (2) are defined as those, among the files that communicate with the domain name, detected by at least one of the detection engines on the threat intelligence service.
- the detected files downloaded from the domain name in (3) are defined as those, among the files acquired from the domain name, detected by at least one of the detection engines on the threat intelligence service.
- the detected files referring to the domain name in (4) are defined as those, among the files that contain the character string of the domain name, detected by at least one of the detection engines on the threat intelligence service.
- the feature information extraction unit 21 refers to the threat intelligence service and counts (1) the non-detected URLs containing the domain name, (2) the non-detected files communicating with the domain name, (3) the non-detected files downloaded from the domain name, and (4) the non-detected files referring to the domain name.
- the feature information in item 2 corresponds to feature information in which the part "detected" in item 1 is replaced with "not detected".
- a URL or file not detected means that it was examined by the threat intelligence service, but was not detected as malicious or suspicious by any detection engine.
- the feature information extraction unit 21 obtains, for example, four pieces of feature information. According to the feature information of item 2, it is possible to identify whether the IOC is benign or legitimate.
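The counting in items 1 and 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the category names and the input shape (for each category, a list holding the number of detection engines that flagged each URL or file) are assumptions made for the example, and no real threat intelligence API is called.

```python
from typing import Dict, List

# Hypothetical category names mirroring (1) to (4) in the text.
CATEGORIES = ("urls", "communicating_files", "downloaded_files", "referring_files")


def count_detected_and_undetected(reports: Dict[str, List[int]]) -> Dict[str, int]:
    """Build the item 1 and item 2 counts for one domain name.

    `reports` maps each category to a list with one entry per URL or file:
    the number of detection engines that flagged it (assumed input shape).
    """
    features: Dict[str, int] = {}
    for category in CATEGORIES:
        engine_hits = reports.get(category, [])
        # Item 1: flagged by at least one detection engine.
        features[f"detected_{category}"] = sum(1 for hits in engine_hits if hits >= 1)
        # Item 2: examined, but flagged by no detection engine.
        features[f"undetected_{category}"] = sum(1 for hits in engine_hits if hits == 0)
    return features
```

Each domain name yields four "detected" counts and four "non-detected" counts, matching the four pieces of feature information described for each of the two items.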
- the feature information extraction unit 21 collects the number of detections, which indicates how many of the detection engines in the threat intelligence service detected each of (1) to (4) of item 1.
- the feature information extraction unit 21 calculates 5 statistics (average value, minimum value, maximum value, standard deviation, variance) for the 4 types of detection numbers, and creates a total of 20 pieces of feature information.
- the feature information extraction unit 21 creates feature information based on information obtained from observation results and statistics calculated from the information.
- according to the feature information of item 3, it is possible to distinguish whether a detected URL or file is a major threat detected by many detection engines or a minor threat detected by only a few detection engines.
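The 4 x 5 = 20 statistics of item 3 can be sketched as below. The text does not say whether the standard deviation and variance are population or sample statistics, nor how an empty list is handled; this sketch assumes population statistics and zeros for empty input, and the category names are hypothetical.

```python
import statistics
from typing import Dict, List


def five_statistics(values: List[float]) -> Dict[str, float]:
    """Mean, minimum, maximum, standard deviation, and variance of `values`."""
    if not values:
        # Assumption: categories with no observations contribute zeros.
        return {"mean": 0.0, "min": 0.0, "max": 0.0, "std": 0.0, "var": 0.0}
    return {
        "mean": statistics.fmean(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "std": statistics.pstdev(values),     # population standard deviation (assumed)
        "var": statistics.pvariance(values),  # population variance (assumed)
    }


def detection_count_features(counts_by_category: Dict[str, List[int]]) -> Dict[str, float]:
    """Flatten the five statistics of every category into one feature dict."""
    features: Dict[str, float] = {}
    for category, counts in counts_by_category.items():
        for stat_name, value in five_statistics(counts).items():
            features[f"{category}_{stat_name}"] = value
    return features
```

With four categories as input, the result holds 20 entries, one per combination of category and statistic.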
- a network is, for example, the Internet.
- the feature information extraction unit 21 uses a Passive DNS (Domain Name System) database to acquire information on how many times each IOC has been referred to in a certain network.
- DNS Domain Name System
- a Passive DNS database is a database that records the correspondence between domain names and IP addresses and their histories from the DNS messages that are actually exchanged by observing the communication in any cache DNS server or authoritative DNS server.
- the Passive DNS database may be prepared by the customer organization or may be provided by an external organization.
- the feature information extraction unit 21 extracts a total of 147 pieces of feature information (7 for item 4 and 140 for item 5) across the five items included in items 4 and 5, as feature information related to communication characteristics observed within the network.
- the feature information extraction unit 21 acquires a DNS (Domain Name System) record corresponding to the domain name associated with the IOC as an observation result, and creates, for example, seven pieces of feature information based on the number of times the DNS record information has been changed.
- the feature information extraction unit 21 refers to the Passive DNS database and, for each of seven types of DNS resource records (A, AAAA, CNAME, MX, NS, SOA, TXT), counts the number of resource record changes from a certain point in the past to the present as feature information.
- the feature information extraction unit 21 counts the number of changes of the IOC corresponding to the domain name "example.com" as one.
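A minimal sketch of the item 4 change count follows. The input shape is an assumption: for each resource record type, a chronologically ordered list of the record values observed in a Passive DNS database. No particular Passive DNS product or API is implied, and the IP addresses in the usage note are illustrative.

```python
from typing import Dict, List

RECORD_TYPES = ("A", "AAAA", "CNAME", "MX", "NS", "SOA", "TXT")


def count_record_changes(history: Dict[str, List[str]]) -> Dict[str, int]:
    """Count value changes per record type (seven features in total).

    `history` maps a record type to the values observed over time, oldest
    first (hypothetical shape). A change is counted each time a value
    differs from the one observed just before it.
    """
    features: Dict[str, int] = {}
    for rtype in RECORD_TYPES:
        values = history.get(rtype, [])
        features[f"changes_{rtype}"] = sum(
            1 for previous, current in zip(values, values[1:]) if previous != current
        )
    return features
```

For instance, an A record history of `["192.0.2.1", "192.0.2.2"]` yields one change for type A and zero for the other six types.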
- the feature information extraction unit 21 may extract the above feature information after associating the IOC with the domain name.
- the feature information extraction unit 21 associates the IOC with the domain name part "www.example.com" and counts.
- for an IP address, the feature information extraction unit 21 can refer to the DNS reverse lookup record to obtain the corresponding domain name, or can use the Passive DNS database to extract the domain name linked to the IP address.
- the feature information extraction unit 21 can refer to the threat intelligence service to extract the domain name of the destination of communication of the file or the source of the download of the file.
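The association between an IOC and a domain name might be sketched as below. The function name, the type-detection heuristics (a `://` marker for URLs, 32/40/64 hexadecimal characters for file hashes), and the two injected lookup callables are all assumptions for illustration; the lookups stand in for a reverse DNS or Passive DNS query and a threat intelligence query, neither of which is performed here.

```python
import ipaddress
import re
from typing import Callable, Optional
from urllib.parse import urlsplit

# Assumed hash lengths: MD5 (32), SHA-1 (40), SHA-256 (64) hexadecimal characters.
HASH_PATTERN = re.compile(r"[0-9a-fA-F]{32}|[0-9a-fA-F]{40}|[0-9a-fA-F]{64}")


def _is_ip(value: str) -> bool:
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def domain_for_ioc(
    ioc: str,
    reverse_lookup: Optional[Callable[[str], Optional[str]]] = None,
    file_lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return the domain name to associate with an IOC, or None if unknown."""
    if "://" in ioc:
        # URL: use the host part, e.g. "www.example.com".
        return urlsplit(ioc).hostname
    if _is_ip(ioc):
        # IP address: delegate to a reverse DNS or Passive DNS lookup.
        return reverse_lookup(ioc) if reverse_lookup else None
    if HASH_PATTERN.fullmatch(ioc):
        # File hash: delegate to a threat intelligence lookup.
        return file_lookup(ioc) if file_lookup else None
    # Otherwise treat the IOC itself as a domain name.
    return ioc.lower().rstrip(".")
```

Passing the lookups as parameters keeps the sketch runnable offline and makes the external dependencies explicit.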
- the feature information extraction unit 21 acquires DNS (Domain Name System) records corresponding to domain names associated with IOCs as observation results, and creates, for example, 140 pieces of feature information based on the number of times of use and the period of use of the DNS records.
- the feature information extraction unit 21 creates, for example, 35 pieces of feature information based on the average value, minimum value, maximum value, standard deviation, and variance of past DNS query counts.
- the feature information extraction unit 21 first refers to the Passive DNS database as in item 4 and, for each of the seven types of DNS resource records (A, AAAA, CNAME, MX, NS, SOA, TXT), counts the number of DNS queries for each combination.
- the number of DNS queries is defined as the number of times a combination of resource records (e.g., "example.com", A record, "192.0.2.1") is observed in the Passive DNS database.
- FIG. 5 shows an example of two past DNS A records for "example.com" and their DNS query counts (5,000 and 15,000, respectively).
- the feature information extraction unit 21 calculates five statistics (mean value, minimum value, maximum value, standard deviation, variance) for seven types of resource records, and creates a total of 35 pieces of feature information.
- the feature information extraction unit 21 creates, for example, 35 pieces of feature information based on the average value, minimum value, maximum value, standard deviation, and variance of the elapsed days from the first DNS query.
- the feature information extraction unit 21 first refers to the Passive DNS database as in item 4 and, for each of the seven types of DNS resource records (A, AAAA, CNAME, MX, NS, SOA, TXT), extracts the date on which the first DNS query for each combination was made.
- the feature information extraction unit 21 calculates the number of days that have elapsed from the date to the day on which feature information is to be extracted.
- FIG. 5 shows an example of two records for "example.com" and the date when the first DNS query was observed.
- the feature information extraction unit 21 counts the number of days from 2019-10-31 to 2020-06-01 for the first record and the number of days from 2020-01-24 to 2020-06-01 for the second record.
- the feature information extraction unit 21 creates 35 pieces of feature information based on the average value, minimum value, maximum value, standard deviation, and variance of the number of days elapsed since the last DNS query.
- the feature information extraction unit 21 extracts feature information by changing "first DNS query" in item 5-2 to "last DNS query".
- the feature information extraction unit 21 counts the number of days from 2020-01-23 to 2020-06-01 for the first record and the number of days from 2020-04-01 to 2020-06-01 for the second record.
- the feature information extraction unit 21 calculates five statistics (mean value, minimum value, maximum value, standard deviation, variance) of the number of days counted for each of the seven types of resource records, and creates a total of 35 pieces of feature information.
- the feature information extraction unit 21 creates 35 pieces of feature information based on the mean, minimum, maximum, standard deviation, and variance of the period during which the DNS queries were observed.
- the feature information extraction unit 21 obtains the date of the first DNS query as in item 5-2 and the date of the last DNS query as in item 5-3.
- the feature information extraction unit 21 counts the number of days from 2019-10-31 to 2020-01-23 for the first record and the number of days from 2020-01-24 to 2020-04-01 for the second record.
- the feature information extraction unit 21 calculates five statistics (mean value, minimum value, maximum value, standard deviation, variance) of the number of days counted for each of the seven types of resource records, and creates a total of 35 pieces of feature information.
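The 140 features of item 5 (query counts, days since the first query, days since the last query, and observed period, each summarized by five statistics for seven record types) can be sketched as follows. The record class, field names, population statistics, and zero-filling for record types with no observations are assumptions made for this example.

```python
import statistics
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

RECORD_TYPES = ("A", "AAAA", "CNAME", "MX", "NS", "SOA", "TXT")


@dataclass
class PassiveDnsRecord:
    """One (name, type, value) combination seen in Passive DNS (assumed shape)."""
    rtype: str
    rdata: str
    query_count: int
    first_seen: date
    last_seen: date


def five_statistics(values: List[float]) -> Dict[str, float]:
    if not values:
        return {"mean": 0.0, "min": 0.0, "max": 0.0, "std": 0.0, "var": 0.0}
    return {
        "mean": statistics.fmean(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "std": statistics.pstdev(values),     # population statistics (assumed)
        "var": statistics.pvariance(values),
    }


def usage_features(records: List[PassiveDnsRecord], extraction_day: date) -> Dict[str, float]:
    """Return 7 record types x 4 series x 5 statistics = 140 features."""
    features: Dict[str, float] = {}
    for rtype in RECORD_TYPES:
        rows = [r for r in records if r.rtype == rtype]
        series = {
            "query_count": [r.query_count for r in rows],
            "days_since_first": [(extraction_day - r.first_seen).days for r in rows],
            "days_since_last": [(extraction_day - r.last_seen).days for r in rows],
            "active_days": [(r.last_seen - r.first_seen).days for r in rows],
        }
        for series_name, values in series.items():
            for stat_name, value in five_statistics(values).items():
                features[f"{rtype}_{series_name}_{stat_name}"] = value
    return features
```

Using the two A records from the FIG. 5 example with 2020-06-01 as the extraction day, the days since the first query are 214 and 129, the days since the last query are 130 and 61, and the observed periods are 84 and 68 days.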
- the label assignment unit 22 gives each IOC a label according to the amount of work required to respond to the associated alert.
- the label is assumed to be binary data indicating whether the priority is high or not.
- the label assignment unit 22 assigns a label indicating that the priority is high to IOCs that have consumed a large amount of the analyst's work in the past, and assigns a label indicating that the priority is not high to other IOCs.
- the label assignment unit 22 assigns a label indicating high priority to an IOC for which the number of manual investigations that occurred within a certain period for the related alerts is equal to or greater than a predetermined value, and assigns a label indicating that the priority is not high to an IOC for which that number is less than the predetermined value.
- priority a label indicating high priority
- non-priority a label indicating low priority
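The labeling rule can be sketched as below. The function name, the half-open window convention, and the representation of manual investigations as a list of dates are assumptions; the text only states that the count within a certain period is compared with a predetermined value.

```python
from datetime import date, timedelta
from typing import List


def assign_label(
    investigation_dates: List[date],
    window_start: date,
    window_days: int,
    threshold: int,
) -> str:
    """Label an IOC from its past manual investigations.

    Counts investigations with window_start <= day < window_start + window_days
    (window convention assumed) and compares the count with `threshold`.
    """
    window_end = window_start + timedelta(days=window_days)
    count = sum(1 for day in investigation_dates if window_start <= day < window_end)
    return "priority" if count >= threshold else "non-priority"
```

An IOC investigated twice within a seven-day window receives the "priority" label when the threshold is 2 and "non-priority" when the threshold is 3.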
- the learning unit 23 uses learning data obtained by combining the feature information extracted by the feature information extraction unit 21 and the label assigned by the label assignment unit 22 to learn a model that outputs a label from the IOC feature information.
- the learning unit 23 creates and updates models by supervised machine learning.
- the model information 25 is information including parameters for constructing a model.
- the learning unit 23 creates and updates model information 25 .
- the learning unit 23 can employ any known supervised machine learning algorithm. In this embodiment, the learning unit 23 adopts standard logistic regression.
- Logistic regression is scalable and fast, so it is suitable for making predictions for the IOCs contained in a large number of alerts from many customers, as in an SOC environment.
- logistic regression is also known to be highly interpretable.
- by its nature, the output of logistic regression can be interpreted as the probability that the input IOC should be prioritized, and the model can indicate which features among the feature information of each IOC contribute to the result.
- the learning unit 23 particularly uses logistic regression with L1 regularization.
- the learning unit 23 models the conditional probability of the label y shown in equation (1) as shown in equation (2).
- θ is the parameter of the logistic regression model.
- σ is the sigmoid function. All features of x are assumed to be normalized to the range [0,1].
- the learning unit 23 uses a set of training data to obtain the parameter θ that minimizes the objective function of equation (4), into which the hyperparameter λ that determines the degree of regularization is introduced.
- the L1 regularization term adds a penalty to the objective function and has the effect of identifying and reducing feature information that does not contribute significantly.
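The numbered equations are not reproduced in this text. As a reading aid, the standard textbook form of L1-regularized logistic regression consistent with the description is shown below, writing θ for the model parameters, σ for the sigmoid function, and λ for the regularization hyperparameter. The training set size N, the samples (x_i, y_i), and the 1/N scaling are assumptions, and the patent's own equations may be arranged or numbered differently.

```latex
\begin{align}
p(y = 1 \mid x; \theta) &= \sigma(\theta^{\top} x), \\
\sigma(z) &= \frac{1}{1 + e^{-z}}, \\
J(\theta) &= -\frac{1}{N} \sum_{i=1}^{N} \Bigl[ y_i \log \sigma(\theta^{\top} x_i)
  + (1 - y_i) \log\bigl(1 - \sigma(\theta^{\top} x_i)\bigr) \Bigr]
  + \lambda \lVert \theta \rVert_1 .
\end{align}
```

The term λ‖θ‖₁ is the penalty: larger values of λ drive more components of θ to exactly zero, which is how weakly contributing features are dropped.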
- the prediction unit 24 uses the model trained by the learning unit 23 to predict the label from the IOC feature information.
- the prediction unit 24 inputs the IOCs included in alerts newly generated in real time and the corresponding feature information into the model learned by the learning unit 23, and predicts which IOCs will consume a large amount of the analyst's work in the future.
- the prediction unit 24 makes predictions using a logistic regression model constructed based on the model information 25.
- the prediction unit 24 predicts the probability that the analyst will manually analyze the target IOC K times or more within P days (where P is an integer).
- using the parameter θ determined by the learning unit 23, the prediction unit 24 obtains the probability p that the feature information vector x corresponding to the IOC is "priority", and defines the predicted label ŷ (y with a hat) by equation (5).
- based on the labels predicted by the prediction unit 24, the determination device 20 outputs the IOCs that are expected to lead to repeated investigations by SOC analysts, that is, the IOCs predicted with the "priority" label, in descending order of probability p, and presents them to the analyst.
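The prediction and ranking step might look like the sketch below. Equation (5) is not reproduced in this text, so the 0.5 decision threshold is an assumption, as are the function names, the optional bias term, and the dictionary input that maps each IOC to its normalized feature vector.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(theta: Sequence[float], x: Sequence[float], bias: float = 0.0) -> float:
    """Probability p that feature vector x belongs to the "priority" class."""
    return sigmoid(bias + sum(t * v for t, v in zip(theta, x)))


def rank_priority_iocs(
    ioc_features: Dict[str, Sequence[float]],
    theta: Sequence[float],
    bias: float = 0.0,
    threshold: float = 0.5,  # assumed cut-off for the "priority" label
) -> List[Tuple[str, float]]:
    """Return (IOC, p) pairs predicted as "priority", highest p first."""
    scored = [(ioc, predict_probability(theta, x, bias)) for ioc, x in ioc_features.items()]
    priority = [(ioc, p) for ioc, p in scored if p >= threshold]
    return sorted(priority, key=lambda pair: pair[1], reverse=True)
```

The returned list corresponds to what would be shown to the analyst: only IOCs predicted as "priority", ordered by descending probability.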
- the analyst can use the information presented by the determination device 20 to prioritize the investigation targets and efficiently perform triage and detailed analysis.
- the analyst can investigate IOCs with high priority and reflect the results in the analysis engine 10.
- the analysis engine 10 can automatically process alerts containing the same IOC, avoiding the need for the analyst to manually investigate the IOC every time, and reducing the amount of operation of the SOC as a whole.
- the analyst investigates IOCs determined to have a high priority, and based on the results, causes the analysis engine 10 to automatically analyze the IOCs. As a result, the IOC will not be handed over to other analysts, thus reducing the amount of work.
- the determination device 20 re-executes the learning process offline periodically (for example, once a day) to update the model information 25.
- the determination device 20 performs learning processing using data for a predetermined period before and after the feature information extraction point shown in FIG. 5.
- the determination device 20 performs the learning process using data for F+L days, which is F days up to the point of feature information extraction and L days from the point of feature information extraction (where F and L are integers).
- when processing the IOCs included in alerts from the customer organization in real time, that is, when performing the prediction process, the determination device 20 extracts the feature information.
- the determination device 20 calculates, from the extracted feature information, the probability p that the analyst will conduct K or more manual investigations within the next P days.
- the determination device 20 repeats the above prediction process for each IOC received in real time. As a result, a list of IOCs to be investigated preferentially by the analyst is displayed on the screen of the IOC checker 40 as shown in FIG. 3 and continuously updated.
- FIG. 6 is a flowchart showing the flow of learning processing. As shown in FIG. 6, first, the determination device 20 receives an input of past alerts (step S101).
- the determination device 20 extracts feature information from the IOC included in the input alert (step S102). Subsequently, the determination device 20 assigns a correct label regarding priority based on the amount of work of the analyst for each IOC (step S103).
- the determination device 20 learns a model that outputs a priority-related label from the feature information using the correct label (step S104).
- FIG. 7 is a flowchart showing the flow of processing for extracting feature information.
- the processing in FIG. 7 corresponds to step S102 in FIG. 6.
- the determination device 20 acquires the IOC observation results (step S102a).
- the determination device 20 creates feature information based on the detection status reported by the threat intelligence service (items 1, 2, and 3) (step S102b). Further, the determination device 20 creates feature information based on the DNS record corresponding to the domain name associated with the IOC (items 4 and 5) (step S102c).
- FIG. 8 is a flowchart showing the flow of prediction processing. As shown in FIG. 8, the determination device 20 first receives an input of the most recent alert (step S201).
- the determination device 20 extracts feature information from the IOC included in the input alert (step S202). Subsequently, the determination device 20 extracts a correct label based on the analyst's amount of work for each IOC (step S203).
- the determination device 20 inputs the feature information into the trained model and predicts a priority label (step S204).
- the determination device 20 can notify the SOC analyst of high-priority IOCs based on the predicted label.
- the feature information extraction unit 21 acquires observation results by a predetermined organization with respect to IOCs included in cybersecurity-related information.
- the feature information extraction unit 21 creates IOC feature information based on information obtained from the acquired observation results.
- the feature information extraction unit 21 acquires the detection status of items related to the IOC by the threat intelligence service.
- the feature information extraction unit 21 creates feature information based on the detection situation.
- the feature information extraction unit 21 can reflect in the feature information whether the IOC is malicious or benign, and the degree of threat the IOC poses.
- the feature information extraction unit 21 acquires the DNS record corresponding to the domain name associated with the IOC as an observation result.
- the feature information extraction unit 21 creates feature information based on the number of changes in DNS record information.
- the feature information extraction unit 21 can distinguish domain names whose DNS records are changed frequently from domain names that are used stably.
- the feature information extraction unit 21 acquires the DNS record corresponding to the domain name associated with the IOC as an observation result.
- the feature information extraction unit 21 creates feature information based on the number of times of use and the period of use of the DNS record.
- the feature information extraction unit 21 can reflect DNS trends for the domain name in the feature information.
- the feature information extraction unit 21 creates feature information based on information obtained from observation results and statistics calculated from the information.
- the feature information extraction unit 21 can obtain more feature information from limited information.
- each component of each device illustrated is functionally conceptual, and does not necessarily need to be physically configured as illustrated.
- the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of them can be functionally or physically distributed or integrated.
- all or any part of each processing function performed by each device can be realized by a CPU (Central Processing Unit) and a program analyzed and executed by the CPU, or can be realized as hardware using wired logic. Note that the program may be executed not only by the CPU but also by other processors such as a GPU.
- the determination device 20 can be implemented by installing, on a desired computer, a determination program that executes the determination process described above as packaged software or online software.
- causing an information processing device to execute the determination program makes that information processing device function as the determination device 20.
- the information processing device referred to here includes a desktop or notebook personal computer.
- information processing devices also include mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) terminals, and slate terminals such as PDAs (Personal Digital Assistants).
- the determination device 20 can also be implemented as a determination server device that uses a terminal device used by a user as a client and provides the client with services related to the above-described determination processing.
- the determination server device is implemented as a server device that provides a determination service that receives security alerts as inputs and outputs high-priority IOCs.
- the determination server device may be implemented as a web server, or may be implemented as a cloud that provides services related to the above-described determination processing through outsourcing.
- FIG. 9 is a diagram showing an example of a computer that executes a determination program.
- the computer 1000 has a memory 1010 and a CPU 1020, for example.
- the computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.
- the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012.
- the ROM 1011 stores a boot program such as BIOS (Basic Input Output System).
- the hard disk drive interface 1030 is connected to the hard disk drive 1090.
- the disk drive interface 1040 is connected to the disk drive 1100.
- a removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1100.
- Serial port interface 1050 is connected to mouse 1110 and keyboard 1120, for example.
- Video adapter 1060 is connected to display 1130, for example.
- the hard disk drive 1090 stores, for example, an OS 1091, application programs 1092, program modules 1093, and program data 1094. That is, the program that defines each process of the determination device 20 is implemented as a program module 1093 in which computer-executable code is described.
- the program modules 1093 are stored, for example, on the hard disk drive 1090.
- the hard disk drive 1090 stores a program module 1093 for executing processing similar to the functional configuration of the determination device 20.
- the hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
- the setting data used in the processing of the above-described embodiment is stored as program data 1094 in the memory 1010 or the hard disk drive 1090, for example. Then, the CPU 1020 reads the program modules 1093 and program data 1094 stored in the memory 1010 and the hard disk drive 1090 to the RAM 1012 as necessary, and executes the processes of the above-described embodiments.
- the program modules 1093 and program data 1094 are not limited to being stored in the hard disk drive 1090, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program modules 1093 and program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.). Program modules 1093 and program data 1094 may then be read by CPU 1020 through network interface 1070 from other computers.
- security system; 10 analysis engine; 20 determination device; 21 feature information extraction unit; 22 labeling unit; 23 learning unit; 24 prediction unit; 25 model information; 30 alert monitor; 40 IOC checker
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
Reference 1: S. C. Sundaramurthy, A. G. Bardas, J. Case, X. Ou, M. Wesch, J. McHugh, and S. R. Rajagopalan, “A human capital model for mitigating security analyst burnout,” Proc. SOUPS, 2015.
Reference 2: Ponemon Institute, “Improving the Effectiveness of the Security Operations Center,” 2019.
Reference 3: F. B. Kokulu, A. Soneji, T. Bao, Y. Shoshitaishvili, Z. Zhao, A. Doupe, and G.-J. Ahn, “Matched and Mismatched SOCs: A Qualitative Study on Security Operations Center Issues,” Proc. ACM CCS, 2019.
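First, a security system including the determination device according to the first embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating the security system.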
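When there is a domain-name IOC, the feature information extraction unit 21 refers to the threat intelligence service and counts each of the following four matters, using the counts as feature information: (1) detected URLs that contain the domain name, (2) detected files that communicated with the domain name, (3) detected files downloaded from the domain name, and (4) detected files that mention the domain name.
When there is a domain-name IOC, the feature information extraction unit 21 refers to the threat intelligence service and counts each of the following four matters, using the counts as feature information: (1) undetected URLs that contain the domain name, (2) undetected files that communicated with the domain name, (3) undetected files downloaded from the domain name, and (4) undetected files that mention the domain name.
For each of matters (1) to (4) of item 1, the feature information extraction unit 21 collects the detection count, that is, how many of the multiple detection engines available in the threat intelligence service made the detection.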
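The three kinds of counts described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the category names, the shape of `report` (one engine hit count per URL or file), and the choice to summarize per-item engine counts by their maximum and mean are all assumptions made for this example.

```python
from statistics import mean

# Hypothetical category names standing in for matters (1) to (4) in the text.
CATEGORIES = ("urls", "communicating_files", "downloaded_files", "referring_files")


def detection_features(report):
    """Build count features for one domain-name IOC.

    `report` maps each category to a list of per-item engine hit counts,
    i.e. how many detection engines flagged that URL or file (0 = undetected).
    """
    features = {}
    for category in CATEGORIES:
        hits = report.get(category, [])
        detected = [h for h in hits if h > 0]
        # Number of detected URLs/files for this matter.
        features[f"{category}_detected"] = len(detected)
        # Number of undetected URLs/files for this matter.
        features[f"{category}_undetected"] = len(hits) - len(detected)
        # Engine detection counts, summarized here as max and mean (an assumption).
        features[f"{category}_engines_max"] = max(detected, default=0)
        features[f"{category}_engines_mean"] = mean(detected) if detected else 0.0
    return features


example = {
    "urls": [3, 0, 12],
    "communicating_files": [0, 0],
    "downloaded_files": [45],
}
print(detection_features(example))
```

A category missing from the report simply yields zero counts, so every IOC produces a feature vector of the same length.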
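The feature information extraction unit 21 acquires, as an observation result, the DNS (Domain Name System) record corresponding to the domain name associated with the IOC, and creates, for example, seven pieces of feature information based on the number of changes to the information in the DNS record.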
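As a rough illustration of a change-count feature, the sketch below counts how often the observed record values differ between consecutive observations. The text does not enumerate the seven features, so this computes only a single count for one record sequence; the snapshot format and the function name `count_record_changes` are hypothetical.

```python
def count_record_changes(snapshots):
    """Count how often the DNS record set changed between consecutive observations.

    `snapshots` is a chronologically ordered list of (date, values) pairs,
    where `values` is the collection of record values observed on that date.
    """
    changes = 0
    previous = None
    for _date, values in snapshots:
        current = frozenset(values)  # order of values within a snapshot is ignored
        if previous is not None and current != previous:
            changes += 1
        previous = current
    return changes


# Documentation-range IP addresses used as placeholder record values.
a_records = [
    ("2021-05-01", ["192.0.2.10"]),
    ("2021-05-02", ["192.0.2.10"]),
    ("2021-05-03", ["198.51.100.7"]),
    ("2021-05-04", ["198.51.100.7", "203.0.113.5"]),
]
print(count_record_changes(a_records))  # 2
```

A domain whose records change often would get a high count, while a stably used domain would stay near zero.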
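The feature information extraction unit 21 acquires, as an observation result, the DNS (Domain Name System) record corresponding to the domain name associated with the IOC, and creates, for example, 140 pieces of feature information based on the number of times and the period of use of the DNS record.
The feature information extraction unit 21 creates, for example, 35 pieces of feature information based on the mean, minimum, maximum, standard deviation, and variance of the number of past DNS queries.
The feature information extraction unit 21 creates, for example, 35 pieces of feature information based on the mean, minimum, maximum, standard deviation, and variance of the number of days elapsed since the first DNS query.
The feature information extraction unit 21 creates 35 pieces of feature information based on the mean, minimum, maximum, standard deviation, and variance of the number of days elapsed since the last DNS query.
The feature information extraction unit 21 creates 35 pieces of feature information based on the mean, minimum, maximum, standard deviation, and variance of the period during which DNS queries existed.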
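The five statistics named above can be computed over the four quantities (query count, days since the first query, days since the last query, and active period) roughly as follows. This is a sketch under assumptions: the record fields `query_count`, `first_seen`, and `last_seen` are hypothetical, population (rather than sample) standard deviation and variance are used, and the grouping that yields 35 features per quantity is not reproduced because the text does not spell it out.

```python
from datetime import date
from statistics import mean, pstdev, pvariance

STAT_NAMES = ("mean", "min", "max", "std", "var")


def summarize(values, prefix):
    """Return mean, min, max, standard deviation, and variance of `values`."""
    if not values:
        # No observations: fall back to zeros (an assumption of this sketch).
        return {f"{prefix}_{name}": 0.0 for name in STAT_NAMES}
    stats = (mean(values), min(values), max(values), pstdev(values), pvariance(values))
    return {f"{prefix}_{name}": value for name, value in zip(STAT_NAMES, stats)}


def dns_usage_features(records, today):
    """Summarize usage counts and usage periods over the DNS records of one domain."""
    features = {}
    features.update(summarize([r["query_count"] for r in records], "queries"))
    features.update(summarize([(today - r["first_seen"]).days for r in records], "since_first"))
    features.update(summarize([(today - r["last_seen"]).days for r in records], "since_last"))
    features.update(summarize([(r["last_seen"] - r["first_seen"]).days for r in records], "active_days"))
    return features


records = [
    {"query_count": 10, "first_seen": date(2021, 5, 1), "last_seen": date(2021, 5, 3)},
    {"query_count": 30, "first_seen": date(2021, 5, 2), "last_seen": date(2021, 5, 10)},
]
features = dns_usage_features(records, today=date(2021, 5, 12))
print(features["queries_mean"], features["active_days_max"])
```

Deriving several statistics from each raw quantity is one way to obtain more feature information from a limited set of observations.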
FIG. 6 is a flowchart showing the flow of the learning process. As shown in FIG. 6, the determination device 20 first receives past alerts as input (step S101).
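The learning flow (label assignment from analyst workload, then model training) might look like the following sketch. It assumes that an IOC is labeled high priority when analysts investigated it manually at least K times, and it uses a small hand-written logistic regression so the example has no external dependencies. The toy data, the threshold value, and all function names are hypothetical, and the embodiment's actual model and hyperparameters may differ.

```python
import math


def assign_label(manual_investigations, k):
    """Label an IOC 1 (high priority) if analysts investigated it at least k times."""
    return 1 if manual_investigations >= k else 0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, epochs=2000, lr=0.1):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_probability(model, x):
    """Return the predicted probability that an IOC is high priority."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy data: (feature vector, number of manual investigations observed afterwards).
past_iocs = [
    ([0.1, 0.0], 0),
    ([0.2, 0.1], 1),
    ([0.8, 0.9], 5),
    ([0.9, 0.7], 4),
]
K = 3  # hypothetical threshold on the number of manual investigations
rows = [features for features, _ in past_iocs]
labels = [assign_label(count, K) for _, count in past_iocs]
model = train_logistic_regression(rows, labels)
print(labels, round(predict_probability(model, [0.85, 0.8]), 2))
```

At prediction time, the same `predict_probability` call would be applied to the feature vector of each newly received IOC, and IOCs could be ranked by the resulting probability.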
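As described so far, the feature information extraction unit 21 acquires observation results by a predetermined organization for IOCs included in cybersecurity-related information. The feature information extraction unit 21 creates feature information of the IOC based on information obtained from the acquired observation results.
Each component of each illustrated device is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to the illustrated one, and all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of each processing function performed by each device may be realized by a CPU (Central Processing Unit) and a program analyzed and executed by that CPU, or may be realized as hardware using wired logic. Note that the program may be executed not only by a CPU but also by another processor such as a GPU.
In one embodiment, the determination device 20 can be implemented by installing, on a desired computer, a determination program that executes the above determination process as packaged software or online software. For example, by causing an information processing device to execute the determination program, the information processing device can be made to function as the determination device 20. The information processing device referred to here includes desktop and notebook personal computers. In addition, the category of information processing devices includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) terminals, as well as slate terminals such as PDAs (Personal Digital Assistants).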
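10 Analysis engine
20 Determination device
21 Feature information extraction unit
22 Labeling unit
23 Learning unit
24 Prediction unit
25 Model information
30 Alert monitor
40 IOC checker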
Claims (7)
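- An extraction method executed by an extraction device, the method comprising:
an acquisition step of acquiring an observation result, obtained by a predetermined organization, for an IOC (Indicator of Compromise) included in information related to cybersecurity; and
a creation step of creating feature information of the IOC based on information obtained from the observation result acquired in the acquisition step.
- The extraction method according to claim 1, wherein the acquisition step acquires a detection status, reported by a threat intelligence service, of matters related to the IOC, and
the creation step creates the feature information based on the detection status.
- The extraction method according to claim 1, wherein the acquisition step acquires, as the observation result, a DNS (Domain Name System) record corresponding to a domain name associated with the IOC, and
the creation step creates the feature information based on the number of changes to the information in the DNS record.
- The extraction method according to claim 1, wherein the acquisition step acquires, as the observation result, a DNS (Domain Name System) record corresponding to a domain name associated with the IOC, and
the creation step creates the feature information based on the number of times and the period of use of the DNS record.
- The extraction method according to any one of claims 1 to 4, wherein the creation step creates the feature information based on information obtained from the observation result and on statistics calculated from that information.
- An extraction device comprising:
an acquisition unit that acquires an observation result, obtained by a predetermined organization, for an IOC (Indicator of Compromise) included in information related to cybersecurity; and
a creation unit that creates feature information of the IOC based on information obtained from the observation result acquired by the acquisition unit.
- An extraction program that causes a computer to execute:
an acquisition procedure of acquiring an observation result, obtained by a predetermined organization, for an IOC (Indicator of Compromise) included in information related to cybersecurity; and
a creation procedure of creating feature information of the IOC based on information obtained from the observation result acquired in the acquisition procedure.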
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023520666A JP7563587B2 (ja) | 2021-05-12 | 2021-05-12 | Extraction method, extraction device, and extraction program |
US18/290,029 US20240289446A1 (en) | 2021-05-12 | 2021-05-12 | Extraction method, extraction device, and extraction program |
PCT/JP2021/018127 WO2022239166A1 (ja) | 2021-05-12 | 2021-05-12 | Extraction method, extraction device, and extraction program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/018127 WO2022239166A1 (ja) | 2021-05-12 | 2021-05-12 | Extraction method, extraction device, and extraction program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022239166A1 (ja) | 2022-11-17 |
Family
ID=84028059
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/018127 WO2022239166A1 (ja) | Extraction method, extraction device, and extraction program | 2021-05-12 | 2021-05-12 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20240289446A1 (ja) |
JP (1) | JP7563587B2 (ja) |
WO (1) | WO2022239166A1 (ja) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
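WO2016117132A1 (ja) * | 2015-01-23 | 2016-07-28 | UBIC, Inc. | Email analysis system, control method for email analysis system, and control program for email analysis system |
WO2018235252A1 (ja) * | 2017-06-23 | 2018-12-27 | NEC Corporation | Analysis device, log analysis method, and recording medium |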
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9407645B2 (en) | 2014-08-29 | 2016-08-02 | Accenture Global Services Limited | Security threat information analysis |
US10681071B1 (en) | 2016-08-02 | 2020-06-09 | ThreatConnect, Inc. | Enrichment and analysis of cybersecurity threat intelligence and orchestrating application of threat intelligence to selected network security events |
US10469509B2 (en) | 2016-12-29 | 2019-11-05 | Chronicle Llc | Gathering indicators of compromise for security threat detection |
US11194905B2 (en) | 2019-04-09 | 2021-12-07 | International Business Machines Corporation | Affectedness scoring engine for cyber threat intelligence services |
-
2021
- 2021-05-12 JP JP2023520666A patent/JP7563587B2/ja active Active
- 2021-05-12 WO PCT/JP2021/018127 patent/WO2022239166A1/ja active Application Filing
- 2021-05-12 US US18/290,029 patent/US20240289446A1/en active Pending
Non-Patent Citations (1)
Title |
---|
SHIBAHARA, TOSHIKI; KODERA, HIROKAZU; CHIBA, DAIKI; AKIYAMA, MITSUAKI; HATO, KUNIO; SÖDERSTRÖM, OLA; DALEK, DANIEL; MURATA, MASAYU: "Efficient Incident Detection by Predicting Potential Important Alerts", PROCEEDINGS OF COMPUTER SECURITY SYMPOSIUM 2019; OCTOBER 21-24, 2019, vol. 2019, 14 October 2019 (2019-10-14) - 24 October 2019 (2019-10-24), pages 1092 - 1099, XP009535472 * |
Also Published As
Publication number | Publication date |
---|---|
JPWO2022239166A1 (ja) | 2022-11-17 |
JP7563587B2 (ja) | 2024-10-08 |
US20240289446A1 (en) | 2024-08-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Usman et al. | Intelligent dynamic malware detection using machine learning in IP reputation for forensics data analytics | |
JP6916300B2 (ja) | Gathering indicators of compromise for security threat detection | |
US11750659B2 (en) | Cybersecurity profiling and rating using active and passive external reconnaissance | |
US20220014560A1 (en) | Correlating network event anomalies using active and passive external reconnaissance to identify attack information | |
Shen et al. | ATTACK2VEC: Leveraging temporal word embeddings to understand the evolution of cyberattacks | |
US11245713B2 (en) | Enrichment and analysis of cybersecurity threat intelligence and orchestrating application of threat intelligence to selected network security events | |
CA2998749C (en) | Systems and methods for security and risk assessment and testing of applications | |
US12041091B2 (en) | System and methods for automated internet-scale web application vulnerability scanning and enhanced security profiling | |
US20210360032A1 (en) | Cybersecurity risk analysis and anomaly detection using active and passive external reconnaissance | |
US8312536B2 (en) | Hygiene-based computer security | |
US10862906B2 (en) | Playbook based data collection to identify cyber security threats | |
US20200004957A1 (en) | Machine learning-based security alert escalation guidance | |
JP6401424B2 (ja) | Log analysis device, log analysis method, and log analysis program | |
WO2007109721A2 (en) | Tactical and strategic attack detection and prediction | |
US11374946B2 (en) | Inline malware detection | |
CN111651591A (zh) | Network security analysis method and device | |
US11201875B2 (en) | Web threat investigation using advanced web crawling | |
Walker et al. | Cuckoo’s malware threat scoring and classification: Friend or foe? | |
US20240259414A1 (en) | Comprehensible threat detection | |
EP3799367B1 (en) | Generation device, generation method, and generation program | |
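WO2022239166A1 (ja) | Extraction method, extraction device, and extraction program | |
US20220245249A1 (en) | Specific file detection baked into machine learning pipelines | |
WO2022239161A1 (ja) | Extraction method, extraction device, and extraction program | |
WO2022239162A1 (ja) | Determination method, determination device, and determination program | |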
EP3999985A1 (en) | Inline malware detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21941900; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2023520666; Country of ref document: JP |
| | WWE | WIPO information: entry into national phase | Ref document number: 18290029; Country of ref document: US |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 21941900; Country of ref document: EP; Kind code of ref document: A1 |