WO2013014987A1 - Information identification method, program, and system - Google Patents
Information identification method, program, and system
- Publication number
- WO2013014987A1 (PCT/JP2012/061294)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- classifier
- training
- test
- time
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/552—Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
Definitions
- the present invention relates to identification of information by supervised machine learning, and more particularly to a technique for dealing with an attack caused by malicious alteration of information.
- the information sent by the applicant for assessment and credit includes yes / no answers to questions, numerical values such as age and annual income, and other descriptive text information.
- a predetermined operator inputs it with a computer keyboard or runs it through OCR (optical character recognition) to digitize the information.
- D_training = {(x_1, y_1), ..., (x_n, y_n)}
- where y_i ∈ C, and C is the set of class labels; for example, C = {0, 1}, with 1 meaning pass and 0 meaning fail.
- the training (teacher) data includes pass (label 1) data 102, 104, 106, 108 and fail (label 0) data 110, 112, 114; these individually correspond to individual pieces of application information.
- the supervised machine learning system forms a classifier using such training data.
- where the feature vector of the application information is x and its label is y, the classifier corresponds to a function h such that h : x → y.
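The notation above can be made concrete with a small, purely illustrative sketch (the feature values and the rule inside h are invented here, not taken from the patent):

```python
# Minimal rendering of D_training = {(x_1, y_1), ..., (x_n, y_n)},
# with y_i in C = {0, 1}: label 1 = pass, label 0 = fail.
D_training = [
    ([1.0, 0.0, 35.0], 1),  # x_i: digitized answers, age, ...; y_i = pass
    ([0.0, 1.0, 22.0], 0),  # y_i = fail
]

def h(x):
    """Stand-in for the learned function h : x -> y (here a fixed rule)."""
    return 1 if x[0] > 0.5 else 0

predicted = [h(x) for x, _ in D_training]
```

A real h is of course learned from D_training by the classifier generation routine rather than written by hand.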
- the above prior art can detect a malicious attack in specific situations, but it is limited in that it assumes special properties of the data, such as homogeneity or individually anomalous records; and although it assesses susceptibility to attack, it cannot detect that an attack is being concentrated on obtaining false passes.
- an object of the present invention is to provide a technique capable of detecting, with high accuracy, falsely-accepted data created in a malicious manner in the process of examining and assessing application documents by supervised machine learning.
- Another object of the present invention is to prevent the spread of damage due to inevitable error discrimination in the process of examining and assessing application documents by supervised machine learning.
- Another object of the present invention is to avoid a situation in which, in the process of examining and assessing application documents by supervised machine learning, the user remains unaware of the damage.
- the present invention has been made to solve the above-described problems. According to the present invention, data is first recorded with a time attached, both when preparing the teacher (training) data and when preparing the test data. This time is, for example, the time at which the data was input.
- the system according to the present invention clusters the learning data of the target class (typically a passing class). Similarly, the test data of the target class (typically a passing class) is clustered.
- for each subclass obtained by the above clustering, the system according to the present invention aggregates the identification probability density of the training data over time frames of various positions and widths, and of the test data over the most recent time frames of various widths.
- the system according to the present invention treats the ratio of the probability densities between training and test as a relative frequency for each subclass in each time frame, detects as an anomaly any input for which this relative frequency increases to a statistically significant degree, and raises an alert so that it can be investigated in detail whether an attack is under way. That is, according to the insight of the present invention, in such a case there is a high probability that a malicious actor who has studied the training data is behind the input.
- data is recorded by adding time both when preparing learning data and when preparing test data.
- by comparing the per-cluster frequency for each time frame between the training data and the test data, potentially malicious data can be detected with high accuracy, without assuming special properties such as individually anomalous records; as a result, the reliability of the examination can be improved. Collusion among attackers can also be taken into account.
- Referring to FIG. 4, there is shown a block diagram of the computer hardware for realizing the system configuration and processing according to an embodiment of the present invention.
- a CPU 404, a main memory (RAM) 406, a hard disk drive (HDD) 408, a keyboard 410, a mouse 412, and a display 414 are connected to the system bus 402.
- the CPU 404 is preferably based on a 32-bit or 64-bit architecture; for example, an Intel Pentium (trademark) 4, Core (trademark) 2 Duo, Xeon (trademark), or AMD Athlon (trademark) can be used.
- the main memory 406 preferably has a capacity of 4 GB or more.
- the hard disk drive 408 can store training data and test data of a large volume of application information, such as insurance claim assessments at an insurance company or loan and credit card examinations at a financial company; it is desirable for it to have a capacity of, for example, 500 GB or more.
- the hard disk drive 408 stores an operating system in advance, although not individually illustrated.
- the operating system may be any compatible with the CPU 404, such as Linux (trademark), Microsoft Windows XP (trademark), Windows (trademark) 2000, Apple Computer Mac OS (trademark).
- the hard disk drive 408 may also store program language processing systems such as C, C ++, C #, and Java (trademark). This programming language processing system is used to create and maintain a processing routine or tool according to the present invention, which will be described later.
- the hard disk drive 408 may further include a text editor for writing source code for compiling with a program language processing system, and a development environment such as Eclipse (trademark).
- the keyboard 410 and the mouse 412 are used to start programs (not shown) that are loaded from the hard disk drive 408 into the main memory 406 by the operating system and displayed on the display 414, and to input characters.
- the display 414 is preferably a liquid crystal display, and an arbitrary resolution such as XGA (1024 × 768) or UXGA (1600 × 1200) can be used. Although not shown, the display 414 is used to display clusters containing false-pass data that may have been maliciously created.
- FIG. 5 is a functional block diagram showing the processing routines, the training data 502, and the test data 504 according to the present invention.
- the routines are written in an existing programming language such as C, C++, C#, or Java (trademark), are stored in the hard disk drive 408 in an executable binary format, and are loaded into the main memory 406 and executed by the action of an operating system (not shown) in response to operation of the mouse 412 or keyboard 410.
- the training data 502 is stored in the hard disk drive 408 and has the following data structure.
- D^(training) = {(x_1^(training), y_1^(training), t_1^(training)), ..., (x_n^(training), y_n^(training), t_n^(training))}
- where x_i^(training) is the feature vector of the i-th training data, y_i^(training) is the class label of the i-th training data, and t_i^(training) is the time stamp of the i-th training data.
- the time stamp t i (training) is preferably the date and time when the application information is input, and has a date + time format, for example.
- the classifier generation routine 506 has the function of generating, from the training data 502, the classification parameters 508 that the classifier 510 uses to classify the test data 504.
- the test data 504 is stored in the hard disk drive 408 and has a data structure as shown below.
- D'^(test) = {(x_1^(test), t_1^(test)), ..., (x_m^(test), t_m^(test))}
- x i (test) is a feature vector of the i-th test data
- t i (test) is a time stamp of the i-th test data.
- the time stamp t i (test) is preferably the date and time when the application information is input, and has a date + time format, for example.
- classification techniques include, as linear classifiers, Fisher's linear discriminant, logistic regression, the naive Bayes classifier, and the perceptron, as well as quadratic classifiers, k-nearest neighbors, boosting, decision trees, neural networks, Bayesian networks, support vector machines, and hidden Markov models. The present invention can use any of these techniques, but in this embodiment a support vector machine is used. For details, see Christopher M. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
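The embodiment uses a support vector machine, typically via a library implementation; as a self-contained illustration of the same role h : x → y, the following sketch trains another of the classifiers listed above, a perceptron (all data, names, and parameters are illustrative, not from the patent):

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Train a perceptron h: x -> y on (feature vector, label) pairs.

    The perceptron is one of the linear classifiers listed above; the
    embodiment itself uses a support vector machine instead.
    """
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred  # 0 when correct; +/-1 drives the update
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Toy "application" feature vectors with pass (1) / fail (0) labels.
train = [([0.0, 0.1], 0), ([0.2, 0.0], 0), ([0.9, 1.0], 1), ([1.0, 0.8], 1)]
h = train_perceptron(train)
labels = [h(x) for x, _ in train]
```

On this linearly separable toy set the perceptron converges and reproduces the training labels; the trained closure then classifies unseen feature vectors.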
- the classifier 510 reads the test data 504, assigns a class label, and generates classified data 512 as follows.
- D^(test) = {(x_1^(test), y_1^(test), t_1^(test)), ..., (x_m^(test), y_m^(test), t_m^(test))}
- the cluster analysis routine 514 defines a distance, such as the Euclidean or Manhattan distance, between the feature vectors of the training data 502, uses this distance to perform clustering by a known method such as K-means, and generates the partition data 516 describing the clustering.
- the partition data 516 is preferably stored on the hard disk drive 408. Since the partition data 516 defines positional information such as the boundary or center of each cluster, arbitrary data can be compared with the partition data 516 to determine which cluster it belongs to. That is, the partition data 516 serves as a sub-classifier.
- the clustering technique usable in the present invention is not limited to K-means; any clustering technique suitable for the present invention, such as a Gaussian mixture model, agglomerative clustering, divisive clustering, or a self-organizing map, can be used.
- a divided data group may be obtained by grid division.
- Cluster analysis routine 514 writes partition data 516 representing the clustered result to hard disk drive 408.
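As a rough sketch of how the partition data 516 can act as a sub-classifier, the following minimal K-means implementation computes cluster centers and assigns any vector to its nearest center (a real system would use a library implementation; the data and function names are invented):

```python
import math

def kmeans(points, k, iters=20):
    """Minimal K-means: returns cluster centers (the 'partition data').

    Initial centers are simply the first k points; each iteration assigns
    every point to its nearest center, then recomputes the centers.
    """
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster went empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def subclass(x, centers):
    """Sub-classifier: map a feature vector to its nearest center's index."""
    return min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))

# Two well-separated groups within one class of the training data.
pts = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.9)]
centers = kmeans(pts, k=2)
```

Storing only `centers` is enough to reassign any later (test) datum to a subclass, which is exactly the role the partition data plays below.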
- the time series analysis routine 518 reads the training data 502, computes, for each cluster (subclass) according to the partition data 516, the frequency of data in each predetermined time window together with other statistics, and preferably saves the result to the hard disk drive 408 as the time series data 520.
- the time series analysis routine 522 reads the test data 504, computes, for each cluster (subclass) according to the partition data 516, the frequency of data in each predetermined time window together with other statistics, and preferably saves the result to the hard disk drive 408 as the time series data 524.
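The per-subclass, per-time-window frequency computation performed by the time series analysis routines can be sketched as follows (the function name, API, and data are illustrative, not from the patent):

```python
from collections import Counter

def window_frequencies(data, t_start, width, g):
    """Count inputs per subclass in the time window [t_start, t_start + width).

    data: iterable of (feature_vector, timestamp) pairs; g maps a feature
    vector to a subclass index (the sub-classifier).
    """
    freq = Counter()
    for x, t in data:
        if t_start <= t < t_start + width:
            freq[g(x)] += 1
    return freq

# Toy stream: the subclass is just the first feature; timestamps are integers.
stream = [([0], 1), ([0], 2), ([1], 2), ([0], 7)]
freqs = window_frequencies(stream, t_start=0, width=5, g=lambda x: x[0])
```

Sliding `t_start` across the recorded time stamps yields the time series data per subclass; the item at t = 7 falls outside the window above and is not counted.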
- the abnormality detection routine 526 has the function of computing a statistic from corresponding clusters and corresponding time windows of the time series data 520 and the time series data 524, and of starting the alarm routine 528 when the value exceeds a predetermined threshold.
- the alarm routine 528 has a function of displaying a cluster in which an abnormality is detected and a time window on the display 414 to notify the operator.
- FIG. 6 is a flowchart of the training data analysis process.
- first, the classifier generation routine 506 generates the classifier 510 by generating the classification parameters 508.
- in step 604, the cluster analysis routine 514 generates a sub-classifier, i.e., the partition data 516 for clustering.
- in step 606, the time series analysis routine 518 generates the time series data 520 by computing input frequency statistics for each subclass and time window.
- FIG. 7 is a flowchart showing the process of step 604 in detail. In this process, within the loop over the classes from step 702 to step 706, the cluster analysis routine 514 generates, in step 704, a sub-classifier for the data of each class.
- FIG. 8 is a flowchart of the process for analyzing the test data. Steps 802 to 810 form a loop over all data included in the test data 504.
- in step 804, the classifier 510 classifies an individual datum of the test data 504. Then, in step 806, the time series analysis routine 522 assigns the classified datum to a subclass based on the partition data 516 (i.e., clustering), and in step 808 the time series analysis routine 522 increments the input frequency of that subclass in the current time window while shifting the time window.
- when the processing loop from step 802 to step 810 has been completed for all data included in the test data 504, the time series analysis routine 522 writes the time series data 524 to the hard disk drive 408.
- FIG. 9 is a diagram showing a flowchart of processing for the abnormality detection routine 526 to detect the possibility of abnormality in a predetermined time window.
- the anomaly detection routine 526 calculates the ratio of test input frequency to training data frequency over the time window.
- the abnormality detection routine 526 calculates a statistically significant increase score in each subclass.
- statistically significant means that a sufficient number of samples are available.
- the significant frequency-increase score could be a simple ratio, but in this embodiment the following formula is used for a more accurate calculation.
- W be the width of the time window.
- the function g () is a function for obtaining a subclass.
- mode is either training meaning training data or test meaning test data.
- the probability of occurrence of input data with label j is defined as follows.
- the abnormal increase score is defined by the following formula. In this equation, E () represents the expected value and ⁇ () represents the variance.
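The formulas themselves appear only as images in the published document and are not reproduced in this text; a hedged reconstruction consistent with the surrounding definitions (W, g(), mode, E(), σ()) might read:

```latex
% Occurrence probability of inputs with label (subclass) j in the time
% window (t - W, t], for mode in {training, test}:
p_j^{(\mathrm{mode})}(t) =
  \frac{\bigl|\{\, i : g(x_i^{(\mathrm{mode})}) = j,\;
                 t - W < t_i^{(\mathrm{mode})} \le t \,\}\bigr|}
       {\bigl|D^{(\mathrm{mode})}\bigr|}

% Relative frequency and abnormal increase score for subclass j:
r_j(t) = \frac{p_j^{(\mathrm{test})}(t)}{p_j^{(\mathrm{training})}(t)},
\qquad
s_j(t) = \frac{r_j(t) - E[\,r_j\,]}{\sqrt{\sigma(\,r_j\,)}}
```

This reconstruction is an assumption; the authoritative formulas are in the published patent images.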
- this formula basically uses a moving average of the frequency and its variance, but periodic fluctuations in the relative frequency may also be taken into account by applying a frequency transform such as the wavelet transform.
- in step 906, the abnormality detection routine 526 determines whether the value of the abnormal increase score exceeds the threshold. If so, the alarm routine 528 is activated in step 908, and a message that the subclass may contain illegal data is displayed on the display 414.
- the score may also be weighted by a cost according to the size of each sample, or natural fluctuations may be distinguished by exploiting characteristics of the tampering that would constitute an attack.
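A minimal numeric sketch of this moving-average-and-variance score (the window handling, epsilon guard, and threshold value are assumptions, not from the patent):

```python
import statistics

def abnormal_increase_score(ratios):
    """Score the latest relative frequency against its history.

    ratios: time-ordered relative frequencies r_j = p_test / p_train for one
    subclass. The history supplies the moving average E() and variance sigma()
    described in the text; a tiny epsilon guards against zero variance.
    """
    history, latest = ratios[:-1], ratios[-1]
    mean = statistics.mean(history)
    var = statistics.pvariance(history)
    return (latest - mean) / (var ** 0.5 + 1e-9)

# A stable subclass followed by a sudden jump in test-vs-training frequency:
score = abnormal_increase_score([1.0, 1.1, 0.9, 1.0, 3.0])

THRESHOLD = 3.0  # illustrative value only
alarm = score > THRESHOLD  # steps 906/908: raise the alarm if exceeded
```

The jump to 3.0 against a history hovering around 1.0 produces a score far above the threshold, which is the condition under which the alarm routine would be started.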
- FIG. 10 is a diagram showing, for the training data and the test data, the distribution of data over time for each subclass A1, A2, ..., An of class A.
- the processing of the present invention detects the possibility of abnormality by the ratio of the frequency in a predetermined time window in the same subclass of the same class between the training data and the test data.
- FIG. 11 shows an example in which such a possible abnormality is detected. That is, in a specific time frame, the abnormality detection routine 526 detects that the frequency of the test data is substantially larger than the frequency of the training data in the fourth cluster (subclass), as indicated by reference numeral 1104, and notifies the alarm routine 528 of the possible presence of illegal data.
- the operator can thus narrow the investigation down to the data in that cluster and time window, which may contain the problem.
- once the misidentified data that enabled the attack is identified, its labels are corrected and the data is moved to the fail class.
- 506 ... Classifier generation routine
- 510 ... Classifier
- 514 ... Cluster analysis routine
- 516 ... Partition data
- 518, 522 ... Time series analysis routines
- 520, 524 ... Time series data
- 526 ... Abnormality detection routine
Description
D_training = {(x_1, y_1), ..., (x_n, y_n)}
Here, y_i ∈ C, where C is the set of class labels; for example, C = {0, 1}, with 1 meaning pass and 0 meaning fail.
This corresponds to a function h such that h : x → y.
D^(training) = {(x_1^(training), y_1^(training), t_1^(training)), ..., (x_n^(training), y_n^(training), t_n^(training))}
Here, x_i^(training) is the feature vector of the i-th training data, y_i^(training) is its class label, and t_i^(training) is its time stamp. The feature vectors x_i^(training) (i = 1, ..., n) are preferably generated automatically by computer processing from the items of the electronic application information, using techniques such as text mining where necessary. The class labels y_i^(training) (i = 1, ..., n) are set according to the prior visual judgment of skilled specialist assessors of the application information. The time stamp t_i^(training) is preferably the date and time when the application information was input, for example in a date + time format.
D'^(test) = {(x_1^(test), t_1^(test)), ..., (x_m^(test), t_m^(test))}
Here, x_i^(test) is the feature vector of the i-th test data and t_i^(test) is its time stamp. The feature vectors x_i^(test) (i = 1, ..., m) are preferably generated automatically by computer processing from the items of the electronic application information. The time stamp t_i^(test) is preferably the date and time when the application information was input, for example in a date + time format.
D^(test) = {(x_1^(test), y_1^(test), t_1^(test)), ..., (x_m^(test), y_m^(test), t_m^(test))}
Here, mode is either training, meaning training data, or test, meaning test data. The occurrence probability of input data with label j is defined as follows.
Then the abnormal increase score is defined by the following formula.
In this formula, E() represents the expected value and σ() the variance.
408 ... Hard disk drive
502 ... Training data
504 ... Test data
506 ... Classifier generation routine
510 ... Classifier
514 ... Cluster analysis routine
516 ... Partition data
518, 522 ... Time series analysis routines
520, 524 ... Time series data
526 ... Abnormality detection routine
Claims (18)
- An information identification method for detecting, by computer processing, an attack with fraudulent data on a classifier constructed by supervised machine learning, comprising the steps of:
preparing a plurality of training data items, each including feature data, a label, and a time;
constructing a classifier using the training data;
constructing, using the training data, a sub-classifier while classifying the data of a class classified by the classifier into subclasses;
preparing a plurality of test data items, each including feature data, a label, and a time;
classifying the plurality of test data items using the classifier;
classifying the thus classified plurality of test data items into subclasses using the sub-classifier;
computing, in a time window of a predetermined width, for each identical subclass, statistical data representing the frequency of the test data relative to the training data; and
in response to the value of the statistical data exceeding a predetermined threshold, raising an alert of a possible attack with fraudulent data. - The method according to claim 1, wherein the feature data is represented by a feature vector obtained by digitizing the answers to question items in financial application documents, and the classes include a pass class and a fail class.
- The method according to claim 1, wherein the classifier is constituted by a support vector machine.
- The method according to claim 1, wherein the sub-classifier uses the K-means algorithm.
- The method according to claim 2, wherein the fraudulent data is false-pass data.
- The method according to claim 1, wherein the statistical data is computed using a moving average of the frequency and its variance.
- An information identification program for detecting, by computer processing, an attack with fraudulent data on a classifier constructed by supervised machine learning,
the program causing the computer to execute the steps of:
preparing a plurality of training data items, each including feature data, a label, and a time;
constructing a classifier using the training data;
constructing, using the training data, a sub-classifier while classifying the data of a class classified by the classifier into subclasses;
preparing a plurality of test data items, each including feature data, a label, and a time;
classifying the plurality of test data items using the classifier;
classifying the thus classified plurality of test data items into subclasses using the sub-classifier;
computing, in a time window of a predetermined width, for each identical subclass, statistical data representing the frequency of the test data relative to the training data; and
in response to the value of the statistical data exceeding a predetermined threshold, raising an alert of a possible attack with fraudulent data. - The program according to claim 7, wherein the feature data is represented by a feature vector obtained by digitizing the answers to question items in financial application documents, and the classes include a pass class and a fail class.
- The program according to claim 7, wherein the classifier is constituted by a support vector machine.
- The program according to claim 7, wherein the sub-classifier uses the K-means algorithm.
- The program according to claim 8, wherein the fraudulent data is false-pass data.
- The program according to claim 7, wherein the statistical data is computed using a moving average of the frequency and its variance.
- An information identification system for detecting, by computer processing, an attack with fraudulent data on a classifier constructed by supervised machine learning, comprising:
storage means;
a plurality of training data items stored in the storage means, each including feature data, a label, and a time;
a classifier constructed using the training data;
a sub-classifier, constructed using the training data, for classifying the data of a class classified by the classifier into subclasses;
subclass data of the training data, created by applying the sub-classifier to the training data and stored in the storage means;
a plurality of test data items stored in the storage means, each including feature data, a label, and a time;
subclass data of the test data, created by applying the sub-classifier to the test data and stored in the storage means;
means for computing, in a time window of a predetermined width, for each identical subclass, statistical data representing the frequency of the test data relative to the training data; and
means for raising an alert, in response to the value of the statistical data exceeding a predetermined threshold, of a possible attack with fraudulent data. - The system according to claim 13, wherein the feature data is represented by a feature vector obtained by digitizing the answers to question items in financial application documents, and the classes include a pass class and a fail class.
- The system according to claim 13, wherein the classifier is constituted by a support vector machine.
- The system according to claim 13, wherein the sub-classifier uses the K-means algorithm.
- The system according to claim 14, wherein the fraudulent data is false-pass data.
- The system according to claim 13, wherein the statistical data is computed using a moving average of the frequency and its variance.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE112012003110.5T DE112012003110T5 (de) | 2011-07-25 | 2012-04-26 | Verfahren, Programmprodukt und System zur Datenidentifizierung |
US14/234,747 US9471882B2 (en) | 2011-07-25 | 2012-04-26 | Information identification method, program product, and system using relative frequency |
GB1401147.2A GB2507217A (en) | 2011-07-25 | 2012-04-26 | Information identification method, program and system |
JP2013525603A JP5568183B2 (ja) | 2011-07-25 | 2012-04-26 | 情報識別方法、プログラム及びシステム |
CN201280036705.8A CN103703487B (zh) | 2011-07-25 | 2012-04-26 | 信息识别方法以及系统 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011162082 | 2011-07-25 | ||
JP2011-162082 | 2011-07-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013014987A1 true WO2013014987A1 (ja) | 2013-01-31 |
Family
ID=47600847
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2012/061294 WO2013014987A1 (ja) | 2011-07-25 | 2012-04-26 | 情報識別方法、プログラム及びシステム |
Country Status (6)
Country | Link |
---|---|
US (1) | US9471882B2 (ja) |
JP (1) | JP5568183B2 (ja) |
CN (1) | CN103703487B (ja) |
DE (1) | DE112012003110T5 (ja) |
GB (1) | GB2507217A (ja) |
WO (1) | WO2013014987A1 (ja) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105653740A (zh) * | 2016-03-22 | 2016-06-08 | 中南林业科技大学 | 一种用于文本挖掘的系统 |
JP2020160546A (ja) * | 2019-03-25 | 2020-10-01 | 株式会社日立製作所 | 業務の外れケース抽出支援システムおよび業務の外れケース抽出支援方法 |
CN111797260A (zh) * | 2020-07-10 | 2020-10-20 | 宁夏中科启创知识产权咨询有限公司 | 基于图像识别的商标检索方法及系统 |
JP2021018757A (ja) * | 2019-07-23 | 2021-02-15 | イチロウホールディングス株式会社 | リース契約システム及びリース契約プログラム |
WO2021111540A1 (ja) | 2019-12-04 | 2021-06-10 | 富士通株式会社 | 評価方法、評価プログラム、および情報処理装置 |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10535014B2 (en) | 2014-03-10 | 2020-01-14 | California Institute Of Technology | Alternative training distribution data in machine learning |
US10558935B2 (en) * | 2013-11-22 | 2020-02-11 | California Institute Of Technology | Weight benefit evaluator for training data |
US9953271B2 (en) | 2013-11-22 | 2018-04-24 | California Institute Of Technology | Generation of weights in machine learning |
US9858534B2 (en) | 2013-11-22 | 2018-01-02 | California Institute Of Technology | Weight generation in machine learning |
US20150206064A1 (en) * | 2014-01-19 | 2015-07-23 | Jacob Levman | Method for supervised machine learning |
TWI528216B (zh) * | 2014-04-30 | 2016-04-01 | 財團法人資訊工業策進會 | 隨選檢測惡意程式之方法、電子裝置、及使用者介面 |
US9686312B2 (en) * | 2014-07-23 | 2017-06-20 | Cisco Technology, Inc. | Verifying network attack detector effectiveness |
CN104616031B (zh) * | 2015-01-22 | 2018-06-12 | 哈尔滨工业大学深圳研究生院 | 迁移学习方法及装置 |
US10713140B2 (en) | 2015-06-10 | 2020-07-14 | Fair Isaac Corporation | Identifying latent states of machines based on machine logs |
US10282458B2 (en) * | 2015-06-15 | 2019-05-07 | Vmware, Inc. | Event notification system with cluster classification |
US10296982B1 (en) | 2015-10-15 | 2019-05-21 | State Farm Mutual Automobile Insurance Company | Using images and voice recordings to facilitate underwriting life insurance |
US10360093B2 (en) * | 2015-11-18 | 2019-07-23 | Fair Isaac Corporation | Detecting anomalous states of machines |
US10410113B2 (en) * | 2016-01-14 | 2019-09-10 | Preferred Networks, Inc. | Time series data adaptation and sensor fusion systems, methods, and apparatus |
JP6719724B2 (ja) * | 2016-02-05 | 2020-07-08 | 富士ゼロックス株式会社 | データ分類装置およびプログラム |
CN109074517B (zh) * | 2016-03-18 | 2021-11-30 | 谷歌有限责任公司 | 全局归一化神经网络 |
CN106383812B (zh) * | 2016-08-30 | 2020-05-26 | 泰康保险集团股份有限公司 | 新契约保单测试方法及装置 |
JP6858798B2 (ja) * | 2017-02-02 | 2021-04-14 | 日本電信電話株式会社 | 特徴量生成装置、特徴量生成方法及びプログラム |
KR20190126430A (ko) | 2017-03-31 | 2019-11-11 | 쓰리엠 이노베이티브 프로퍼티즈 컴파니 | 이미지 기반 위조 검출 |
CN109409529B (zh) * | 2018-09-13 | 2020-12-08 | 北京中科闻歌科技股份有限公司 | 一种事件认知分析方法、系统及存储介质 |
JP7331369B2 (ja) * | 2019-01-30 | 2023-08-23 | 日本電信電話株式会社 | 異常音追加学習方法、データ追加学習方法、異常度算出装置、指標値算出装置、およびプログラム |
US11715030B2 (en) | 2019-03-29 | 2023-08-01 | Red Hat, Inc. | Automatic object optimization to accelerate machine learning training |
US11966851B2 (en) | 2019-04-02 | 2024-04-23 | International Business Machines Corporation | Construction of a machine learning model |
CN110012013A (zh) * | 2019-04-04 | 2019-07-12 | 电子科技大学成都学院 | 一种基于knn的虚拟平台威胁行为分析方法及系统 |
CN111046379B (zh) * | 2019-12-06 | 2021-06-18 | 支付宝(杭州)信息技术有限公司 | 一种对抗攻击的监测方法和装置 |
CN111046957B (zh) * | 2019-12-13 | 2021-03-16 | 支付宝(杭州)信息技术有限公司 | 一种模型盗用的检测、模型的训练方法和装置 |
US11481679B2 (en) * | 2020-03-02 | 2022-10-25 | Kyndryl, Inc. | Adaptive data ingestion rates |
US20230132720A1 (en) * | 2021-10-29 | 2023-05-04 | Intuit Inc. | Multiple input machine learning framework for anomaly detection |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010128674A (ja) * | 2008-11-26 | 2010-06-10 | Nec Corp | コンピュータネットワーク、異常検出装置、異常検出方法および異常検出プログラム |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1376420A1 (en) | 2002-06-19 | 2004-01-02 | Pitsos Errikos | Method and system for classifying electronic documents |
US8239677B2 (en) * | 2006-10-10 | 2012-08-07 | Equifax Inc. | Verification and authentication systems and methods |
JP2009048402A (ja) | 2007-08-20 | 2009-03-05 | Fujitsu Ltd | 申請手続不正リスク評価装置 |
CN102449660B (zh) * | 2009-04-01 | 2015-05-06 | I-切塔纳私人有限公司 | 用于数据检测的系统和方法 |
US20110218948A1 (en) * | 2009-12-15 | 2011-09-08 | Fabricio Benevenuto De Souza | Methods for detecting spammers and content promoters in online video social networks |
- 2012-04-26 DE DE112012003110.5T patent/DE112012003110T5/de active Pending
- 2012-04-26 GB GB1401147.2A patent/GB2507217A/en not_active Withdrawn
- 2012-04-26 CN CN201280036705.8A patent/CN103703487B/zh active Active
- 2012-04-26 US US14/234,747 patent/US9471882B2/en active Active
- 2012-04-26 JP JP2013525603A patent/JP5568183B2/ja active Active
- 2012-04-26 WO PCT/JP2012/061294 patent/WO2013014987A1/ja active Application Filing
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010128674A (ja) * | 2008-11-26 | 2010-06-10 | Nec Corp | コンピュータネットワーク、異常検出装置、異常検出方法および異常検出プログラム |
Non-Patent Citations (2)
Title |
---|
HIROAKI OYA ET AL.: "A Technique to Reduce False Positives of Network IDS with Machine Learning", TRANSACTIONS OF INFORMATION PROCESSING SOCIETY OF JAPAN, vol. 45, no. 8, 15 August 2004 (2004-08-15), pages 2105 - 2112 * |
STIJN VIAENE ET AL.: "Strategies for detecting fraudulent claims in the automobile insurance industry", EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, vol. 176, 2007, pages 565 - 583 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105653740A (zh) * | 2016-03-22 | 2016-06-08 | 中南林业科技大学 | 一种用于文本挖掘的系统 |
JP2020160546A (ja) * | 2019-03-25 | 2020-10-01 | 株式会社日立製作所 | 業務の外れケース抽出支援システムおよび業務の外れケース抽出支援方法 |
JP7171482B2 (ja) | 2019-03-25 | 2022-11-15 | 株式会社日立製作所 | 業務の外れケース抽出支援システムおよび業務の外れケース抽出支援方法 |
JP2021018757A (ja) * | 2019-07-23 | 2021-02-15 | イチロウホールディングス株式会社 | リース契約システム及びリース契約プログラム |
JP7198405B2 (ja) | 2019-07-23 | 2023-01-04 | イチロウホールディングス株式会社 | リース契約システム及びリース契約プログラム |
WO2021111540A1 (ja) | 2019-12-04 | 2021-06-10 | 富士通株式会社 | 評価方法、評価プログラム、および情報処理装置 |
CN111797260A (zh) * | 2020-07-10 | 2020-10-20 | 宁夏中科启创知识产权咨询有限公司 | 基于图像识别的商标检索方法及系统 |
Also Published As
Publication number | Publication date |
---|---|
JP5568183B2 (ja) | 2014-08-06 |
US9471882B2 (en) | 2016-10-18 |
DE112012003110T5 (de) | 2014-04-10 |
CN103703487B (zh) | 2016-11-02 |
GB201401147D0 (en) | 2014-03-12 |
US20140180980A1 (en) | 2014-06-26 |
GB2507217A (en) | 2014-04-23 |
CN103703487A (zh) | 2014-04-02 |
JPWO2013014987A1 (ja) | 2015-02-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 12817705 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2013525603 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 1401147 Country of ref document: GB Kind code of ref document: A Free format text: PCT FILING DATE = 20120426 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1401147.2 Country of ref document: GB |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14234747 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1120120031105 Country of ref document: DE Ref document number: 112012003110 Country of ref document: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 12817705 Country of ref document: EP Kind code of ref document: A1 |
|
ENPC | Correction to former announcement of entry into national phase, pct application did not enter into the national phase |
Ref country code: GB |