TWI747334B

TWI747334B - Fraud measurement detection device, method, program product and computer readable medium

Info

Publication number: TWI747334B
Application number: TW109120434A
Authority: TW
Inventors: 王其宏
Original assignee: 王其宏
Priority date: 2020-06-17
Filing date: 2020-06-17
Publication date: 2021-11-21
Also published as: TW202201254A

Abstract

A device, method, program product and computer-readable medium for detecting fraudulent measurement data. The device includes a data collection module, a storage module, a data pre-processing module, a calibration module and a fraud judgment module. When a measurement interval is judged to be suspicious and the data distribution is also identified as suspicious, the fraud judgment module infers the fraudulent data from the distribution. The invention allows the validity of measurement data to be analyzed remotely in real time, which prevents manually manipulated data, reflects the true quality level, reveals possible quality problems early, and reduces the cost of redundant inspection.

Description

Device, method, program product and computer readable medium for detecting data fraud

The present invention relates to a device, method, program product and computer-readable medium for machine learning and fraud identification, and in particular to a device, method, program product and computer-readable medium for detecting data fraud.

In the past, fraud identification has mostly been applied in the banking, securities and insurance industries. Examples include US 8,862,526 B2, US 10,115,111 B2 and US 10,325,271 B2, in which a system generates a causal model corresponding to a user and uses the causal model to predict the user's expected behavior during the next event. The Hidden Markov Model (HMM), which is widely used for speech recognition and joint probability prediction, is used in CN 105608536 A for risk and error prediction in other industries, and is also used for fraud identification in WO 2019037205 A1.

Fraudulent behavior or intent involving measurement equipment is mostly detected by hardware means, for example in EP 2205947 A1 and EP 1564533 B1, but these patents do not take the general applicability of the equipment into account.

Therefore, to solve the problem of repeated inspection caused by mistrust in the supply chain, to reduce the quality inspection cost arising from repeated inspection, and to shorten delivery time, the inventor proposes a data fraud detection device for detecting a measurement behavior of a measurement device. The data fraud detection device comprises: a measurement data acquisition module, which reads a measurement data and a measurement interval measured by the measurement device; a storage module, signal-connected to the measurement data acquisition module, which stores the measurement data and the measurement interval; a measurement data processing module, signal-connected to the storage module, which calculates a z-score distribution and obtains a feature value set from a measurement data distribution of the measurement data and a corresponding specification value; a simulated data generation module, signal-connected to the measurement data processing module, which generates distributions of a simulated normal data and a simulated fraud data and obtains a simulated feature value set of the simulated normal data and the simulated fraud data; a measurement data classification module, signal-connected to the measurement data processing module and the simulated data generation module, which performs training and feature selection on the feature value set or the simulated feature value set to establish a classification model that classifies whether the measurement data distribution is a suspicious data distribution; a measurement interval anomaly detection module, signal-connected to the storage module, which establishes an unsupervised learning model from the measurement interval to determine whether the measurement interval is a suspicious interval; and a fraud data estimation module, signal-connected to the measurement interval anomaly detection module, the measurement data classification module and the storage module, wherein, when the unsupervised learning model determines that the measurement interval is the suspicious interval and the classification model classifies the measurement data distribution as the suspicious data distribution, the fraud data estimation module infers the corresponding fraud data with a hidden Markov model.

Further, a calibration module is signal-connected to the measurement data acquisition module and the fraud data estimation module. The calibration module includes a simple object and obtains a calibration parameter by using the simple object. The hidden Markov model is obtained by establishing models from the calibration parameter of the calibration module and from the Baum-Welch algorithm respectively and comparing them with the yield of an initial condition, and the fraud data is inferred from the hidden Markov model by the Viterbi algorithm.

Further, the calibration module issues a calibration instruction and determines the calibration parameter according to the result of executing the calibration instruction on the simple object.

Wherein, the simulated data generation module uses random numbers to generate the simulated fraud data.

Wherein, the measurement data acquisition module reads the measurement data using OCR (Optical Character Recognition) technology.

Wherein, the measurement data classification module establishes the classification model with the Support Vector Machine (SVM) and/or Random Forests (RF) algorithm.

Wherein, the measurement interval anomaly detection module establishes the unsupervised learning model with the Isolation Forests (IF) and/or z-score algorithm.

Wherein, the measurement data classification module uses bootstrap sampling to obtain the feature value set and the simulated feature value set.

Wherein, the measurement data processing module performs a z-score conversion on the measurement data distribution and the specification value.

The inventor further proposes a data fraud detection method, comprising: using a measurement data acquisition module to read a measurement data and a measurement interval measured by a measurement device; using a storage module to store the measurement data and the measurement interval; using a measurement data processing module to calculate a feature value set from a measurement data distribution of the measurement data and a corresponding specification value; using a measurement data classification module to perform training and feature selection on the feature value set, or on a simulated feature value set of a simulated normal data and a simulated fraud data, to establish a classification model that classifies whether the measurement data distribution is a suspicious data distribution; using a measurement interval anomaly detection module to establish an unsupervised learning model from the measurement interval to determine whether the measurement interval is a suspicious interval; and, when the unsupervised learning model determines that the measurement interval is the suspicious interval and the classification model classifies the measurement data distribution as the suspicious data distribution, using a fraud data estimation module to infer the corresponding fraud data with a hidden Markov model.

Further, a calibration module uses a simple object to obtain a calibration parameter; the hidden Markov model is obtained by establishing models from the calibration parameter of the calibration module and from the Baum-Welch algorithm and comparing them with the yield of an initial condition, and the fraud data is inferred from the hidden Markov model by the Viterbi algorithm.

Wherein, the simulated normal data and the simulated fraud data are a historical measurement data and a historical measurement interval stored in the storage module, or the distributions of the simulated normal data and the simulated fraud data are generated by a simulated data generation module.

Wherein, the measurement data acquisition module reads the measurement data using OCR technology.

Wherein, the measurement data classification module establishes the classification model with the support vector machine and/or random forest algorithm.

Wherein, the measurement interval anomaly detection module establishes the unsupervised learning model with the isolation forest and/or z-score algorithm.

The inventor further provides a program product which, after being loaded into a computer device, executes the aforementioned data fraud detection method.

The inventor further provides a computer-readable medium which, after being loaded into a computer device, executes the aforementioned data fraud detection method.
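The Viterbi inference named in the device and method above can be illustrated with a minimal sketch. The state names (`honest`, `fraud`), the OK/NG observation alphabet and every probability in the usage note are hypothetical values chosen for illustration, not parameters from this patent; in the described device the matrices would come from the calibration parameter or from Baum-Welch training.

```python
def viterbi(observations, states, start_prob, trans_prob, emit_prob):
    """Return the most likely hidden-state path and its probability.

    All probability tables are plain dicts; their contents are assumptions
    supplied by the caller, not values taken from the patent.
    """
    # best[t][s] = (probability of the best path ending in s at time t,
    #               previous state on that path)
    first = observations[0]
    best = [{s: (start_prob[s] * emit_prob[s][first], None) for s in states}]
    for obs in observations[1:]:
        row = {}
        for s in states:
            row[s] = max(
                (best[-1][p][0] * trans_prob[p][s] * emit_prob[s][obs], p)
                for p in states
            )
        best.append(row)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for row in reversed(best[1:]):
        path.append(row[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]
```

A call such as `viterbi(["NG", "NG", "NG"], ("honest", "fraud"), start, trans, emit)` returns the state sequence that best explains the observed OK/NG results under the supplied matrices.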

The above technical features achieve the following effects:

1. The validity of quality inspection measurement data can be analyzed remotely and in real time, which prevents fraud through falsified data, reflects the true quality level in real time, reveals possible quality problems early, and reduces the resources and cost required for repeated quality inspection.

2. The simulated data generation module generates simulated normal data and simulated fraud data, and the measurement data classification module performs machine learning on them to establish the classification model. This shortens the time required to establish the classification model and makes the data fraud detection device more practical.

3. Using an unsupervised learning model to identify suspicious intervals allows an objective decision on whether a measurement interval is suspicious, and lets both the motive for fraud and the actually observed suspicious phenomenon serve as references for judging whether data is fraudulent.

4. The calibration module obtains the quality inspector's calibration parameters for the simple object as a parameter reference for establishing the hidden Markov model. This adds an option besides the Baum-Welch algorithm and improves the stability of the data fraud detection device.

100: Data fraud detection device

1: Measurement data acquisition module

11: Measurement data

12: Measurement interval

2: Storage module

3: Measurement data processing module

31: Historical measurement data distribution

32: Historical measurement data z-score distribution

33: Nominal value

34: Lower limit value

35: Upper limit value

36: z-score nominal value

37: z-score lower limit value

38: z-score upper limit value

39: z-score feature value set

4: Simulated data generation module

41: Simulated data z-score distribution

42: Simulated feature value set

5: Measurement data classification module

S501: Training

S502: Feature selection operation

51: Classification model

52: Optimal feature value set

6: Measurement interval anomaly detection module

61: Unsupervised learning model

S601: Isolation forest algorithm

S602: z-score algorithm

62: Contamination parameter

63: Anomaly score

64: Threshold parameter

7: Calibration module

71: Simple object

72: Blind test box

73: Calibration instruction analysis generator

74: Inner diameter

75: Outer diameter

76: Column height

77: Calibration parameter

78: Calibration parameter table

79: Incentive index

8: Fraud data estimation module

81: Hidden Markov model

811: State transition matrix

812: Observation probability matrix

813: Initial state

82: Observation sequence

821: Possible sampled OK/NG set

[Figure 1] is a system block diagram of an embodiment of the present invention.

[Figure 2] is the first functional block diagram of the embodiment of the present invention.

[Figure 3] is the second functional block diagram of the embodiment of the present invention.

[Figure 4] is the third functional block diagram of the embodiment of the present invention.

[Figure 5] is the fourth functional block diagram of the embodiment of the present invention.

[Figure 6] is the fifth functional block diagram of the embodiment of the present invention.

[Figure 7] is the sixth functional block diagram of the embodiment of the present invention.

[Figure 8] is a schematic diagram of the implementation of the calibration module of the embodiment of the present invention.

[Figure 9] is a flow chart of the calibration of the embodiment of the present invention.

[Figure 10] is a functional block diagram of the fraud data estimation module of the embodiment of the present invention.

[Figure 11] is a flow chart of the fraud data inference of the embodiment of the present invention.

In view of the above technical features, the main effects of the data fraud detection device, method, program product and computer-readable medium of the present invention will be clearly presented in the following embodiment.

Referring to Figure 1, a data fraud detection device 100 according to an embodiment of the present invention is disclosed. It executes a data fraud detection method to detect a measurement behavior, and can also be implemented as a program product or a computer-readable medium that executes the data fraud detection method after being loaded into a computer device. The data fraud detection device 100 comprises: a measurement data acquisition module 1, a storage module 2, a measurement data processing module 3, a simulated data generation module 4, a measurement data classification module 5, a measurement interval anomaly detection module 6, a calibration module 7 and a fraud data estimation module 8.

Referring to Figures 1 and 2, the measurement data acquisition module 1 reads a measurement data 11 displayed by a measurement device during quality inspection, together with a measurement interval 12. The measurement data acquisition module 1 can be any electronic device with a camera function, including but not limited to a smartphone or a tablet computer, and application software can be installed on it to read in images and read the measurement data 11 using OCR technology. The measurement interval 12 is the time between the confirmation of the previous measurement data 11 and the confirmation of the current measurement data 11. After each image capture, the measurement data acquisition module 1 can prompt the quality inspector operating the measurement device, by voice, image, light or similar means, to confirm whether to continue capturing images. Alternatively, the quality inspector can preset the number of image captures, and after the quality inspector confirms, the measurement data acquisition module 1 automatically proceeds until that number of captures is complete.
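As a small illustration of the measurement interval 12 defined above, the following sketch derives intervals from a list of confirmation timestamps. The function name and the use of seconds as the unit are assumptions made for this example; the text does not specify how the timestamps are recorded.

```python
from datetime import datetime


def measurement_intervals(confirm_times):
    """Seconds elapsed between each confirmation and the one before it.

    `confirm_times` is a chronologically ordered list of datetime objects,
    one per confirmed measurement (a hypothetical input format).
    """
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(confirm_times, confirm_times[1:])
    ]
```

With k confirmed measurements this yields k - 1 intervals, since the first measurement has no predecessor.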

The storage module 2 is signal-connected to the measurement data acquisition module 1. After obtaining the measurement data 11 and the measurement interval 12, the measurement data acquisition module 1 sends them to the storage module 2, where they are stored as historical measurement data. The historical measurement data stored in the storage module 2 can be classified in subsequent steps.

Referring to Figures 1 and 3, the measurement data processing module 3 is signal-connected to the storage module 2. The measurement data processing module 3 can first generate a historical measurement data distribution 31 from the historical measurement data stored in the storage module 2, calculate the mean and standard deviation, perform a z-score conversion, and obtain a historical measurement data z-score distribution 32. The measurement data processing module 3 can also perform the z-score conversion on preset specification values, such as a nominal value 33, a lower limit value 34 and an upper limit value 35, to obtain a z-score nominal value 36, a z-score lower limit value 37 and a z-score upper limit value 38 respectively. After obtaining the historical measurement data z-score distribution 32 from the historical measurement data distribution 31 formed by a set measurement quantity (for example, 30 pieces), the measurement data processing module 3 can also calculate a z-score feature value set 39 of the historical measurement data z-score distribution 32. The z-score feature value set 39 includes, but is not limited to, descriptive statistics such as skewness and kurtosis. If the measurement quantity has not been reached (for example, fewer than 30 pieces but more than 15), the measurement data processing module 3 can use bootstrapping to resample up to the measurement quantity and calculate the feature values, repeat this, for example, 10,000 times, and then take the average, relying on the central limit theorem, to generate the z-score feature value set 39.
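The z-score conversion, the feature calculation and the bootstrap averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions: the sample standard deviation is used, skewness and excess kurtosis are computed from central moments, and the function names are hypothetical. The text does not fix any of these choices.

```python
import random
import statistics


def z_scores(values):
    """Convert raw measurements to z-scores (sample standard deviation assumed)."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    if std == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - mean) / std for v in values]


def to_z(spec_value, values):
    """Express a specification value (nominal, lower or upper limit) on the
    z-score scale of the measured values."""
    return (spec_value - statistics.fmean(values)) / statistics.stdev(values)


def feature_set(values):
    """Skewness and excess kurtosis of the z-score distribution."""
    z = z_scores(values)
    n = len(z)
    mean = sum(z) / n
    m2 = sum((x - mean) ** 2 for x in z) / n
    m3 = sum((x - mean) ** 3 for x in z) / n
    m4 = sum((x - mean) ** 4 for x in z) / n
    return {"skewness": m3 / m2 ** 1.5, "kurtosis": m4 / m2 ** 2 - 3.0}


def bootstrap_feature_set(values, target=30, rounds=10_000, seed=None):
    """Average the features over `rounds` resamples of size `target`,
    each drawn with replacement from the available measurements."""
    rng = random.Random(seed)
    totals = {"skewness": 0.0, "kurtosis": 0.0}
    used = 0
    for _ in range(rounds):
        sample = rng.choices(values, k=target)
        if len(set(sample)) < 2:  # degenerate resample: z-scores undefined
            continue
        for name, value in feature_set(sample).items():
            totals[name] += value
        used += 1
    if used == 0:
        raise ValueError("no usable resample was drawn")
    return {name: total / used for name, total in totals.items()}
```

The defaults `target=30` and `rounds=10_000` mirror the example figures in the text; other descriptive statistics could be added to the returned dictionary in the same way.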

Referring to Figures 1 and 4, together with Figure 3, the simulated data generation module 4 is signal-connected to the measurement data processing module 3 and can use random numbers to generate a simulated data z-score distribution 41. If a value in the simulated data z-score distribution 41 is lower than the z-score lower limit value 37 or higher than the z-score upper limit value 38, it is handled in ways that include, but are not limited to, the following: drawing a new random sample, replacing it with the z-score nominal value 36, or replacing it with z=0. The simulated data generation module 4 then calculates a simulated feature value set 42 of the simulated data z-score distribution 41. The simulated feature value set 42 includes, but is not limited to, descriptive statistics such as skewness and kurtosis.
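The three ways of handling out-of-limit values listed above can be sketched as one generator with a selectable policy. Drawing from a standard normal distribution, the policy names and the function signature are assumptions made for this example; the text only states that random numbers are used.

```python
import random

POLICIES = ("resample", "nominal", "zero")


def simulate_z_distribution(n, z_lower, z_upper, z_nominal=0.0,
                            policy="resample", seed=None):
    """Generate n simulated z-scores, handling out-of-limit draws by policy.

    resample: draw again until the value falls inside the limits
    nominal:  replace the value with the z-score nominal value
    zero:     replace the value with z = 0
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if z_lower >= z_upper:
        raise ValueError("z_lower must be below z_upper")
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # standard normal draw (an assumption)
        if not z_lower <= z <= z_upper:
            if policy == "resample":
                while not z_lower <= z <= z_upper:
                    z = rng.gauss(0.0, 1.0)
            elif policy == "nominal":
                z = z_nominal
            else:
                z = 0.0
        data.append(z)
    return data
```

The replacement policies pile values up at a single point, which changes shape statistics such as kurtosis; that is the kind of difference a classifier trained on the simulated feature value set 42 could pick up.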

Referring to Figures 1 and 5, the measurement data classification module 5 is signal-connected to the measurement data processing module 3 and the simulated data generation module 4. The measurement data classification module 5 performs training S501 and a feature selection operation S502 on the z-score feature value set 39 or the simulated feature value set 42 by supervised learning, and thereby establishes a classification model 51. The classification model 51 can be established with at least the support vector machine or the random forest algorithm. When the support vector machine algorithm is used, Recursive Feature Elimination (RFE) with Cross Validation (CV) can be used to find an optimal feature value set 52 for establishing the classification model 51. When the random forest algorithm is used, feature importance can be calculated, and the features whose importance is greater than the mean importance of the set are taken as the optimal feature value set 52 and used for training to establish the classification model 51. The classification model 51 can be used to classify whether a newly input optimal feature value set 52 is a suspicious data distribution. Because the simulated data generation module 4 first generates the simulated normal data and the simulated fraud data and the measurement data classification module 5 then performs machine learning on them to establish the classification model 51, the time required to establish the classification model 51 is shortened and the data fraud detection device 100 becomes more practical.
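The selection rule described for the random forest case, keeping features whose importance exceeds the mean importance, can be sketched in a few lines. The importances are taken as given input here; how they are computed (for example by a trained random forest) is outside this sketch, and the feature name `range` in the usage note is a hypothetical addition.

```python
def select_features_above_mean(importances):
    """Return the names of features whose importance exceeds the mean.

    `importances` maps feature name to an importance score produced by
    some already trained model (assumed input, not computed here).
    """
    if not importances:
        return []
    mean_importance = sum(importances.values()) / len(importances)
    return [name for name, score in importances.items()
            if score > mean_importance]
```

For instance, with importances `{"skewness": 0.5, "kurtosis": 0.3, "range": 0.2}` the mean is one third, so only `skewness` is retained for the optimal feature value set.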

Referring to Figures 1 and 6, the measurement interval anomaly detection module 6 is signal-connected to the storage module 2 and can establish an unsupervised learning model 61 with the isolation forest algorithm S601 or the z-score algorithm S602. When the isolation forest algorithm S601 is used, the measurement interval anomaly detection module 6 first calculates an anomaly score 63 from the measurement intervals 12 stored in the storage module 2 and a preset contamination parameter 62 (for example, 3%), and marks each measurement interval 12 whose anomaly score 63 is negative as an outlier to establish the unsupervised learning model 61. The measurement interval anomaly detection module 6 can also compute the quartiles of the distribution of the anomaly score 63 and use the third quartile plus, and the first quartile minus, a multiple (for example, 2 times) of the difference between the third and first quartiles as new upper and lower outlier limits, which defines outliers better by relative score. When the z-score algorithm S602 is used, a threshold parameter 64 (for example, 3) is first chosen, and after the z-score conversion any measurement interval 12 beyond ±3 is marked as an outlier.
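The two thresholding rules above, quartile-based limits on the anomaly scores and a z-score threshold on the intervals, can be sketched with the standard library. The isolation forest itself is not implemented here: `anomaly_scores` is assumed to come from an already fitted model. The quartile method (Python's default "exclusive" method) and the sample standard deviation are also assumptions.

```python
import statistics


def iqr_fences(anomaly_scores, k=2.0):
    """Lower and upper outlier limits: Q1 - k*(Q3 - Q1) and Q3 + k*(Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(anomaly_scores, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def zscore_outlier_indices(intervals, threshold=3.0):
    """Indices of measurement intervals whose |z-score| exceeds the threshold."""
    mean = statistics.fmean(intervals)
    std = statistics.stdev(intervals)
    if std == 0:
        return []  # identical intervals: nothing stands out
    return [i for i, t in enumerate(intervals)
            if abs((t - mean) / std) > threshold]
```

The defaults `k=2.0` and `threshold=3.0` follow the example values given in the text. Note that with a small number of intervals a single extreme value may not be able to reach a z-score of 3, so the z-score rule needs a reasonably long history.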

Referring to Figures 5 and 6, together with Figures 1 to 3, when a measurement interval 12 is found to be an outlier, that is, a suspicious interval, the measurement interval anomaly detection module 6 communicates with the fraud data estimation module 8 so that the measurement data classification module 5 activates the classification model 51 to classify the optimal feature value set 52 obtained from the historical measurement data distribution 31 (for example, the past 30 pieces) stored in the storage module 2. Using the unsupervised learning model 61 to identify the suspicious interval allows an objective decision on whether the measurement interval 12 is the suspicious interval, and lets both the motive for fraud and the actually classified suspicious data distribution serve as references for judging whether data is fraud data. For example, if the measurement data 11 exceeds the specification value, the quality inspector has a motive to commit fraud. Fraudulent acts, such as replacing the measurement data 11 with the specification value or reusing measurement data 11 that meets the specification value, lengthen the measurement interval 12 and distort the historical measurement data distribution 31.
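The two-stage gating described above, in which the classifier runs only after an interval is flagged and fraud inference follows only when both checks agree, can be sketched as below. The callables and the returned labels are hypothetical stand-ins for the unsupervised learning model 61 and the classification model 51.

```python
def screen_measurement(interval, recent_features,
                       is_suspicious_interval, is_suspicious_distribution):
    """Decide what to do with the latest measurement.

    is_suspicious_interval:     callable(interval) -> bool
    is_suspicious_distribution: callable(features) -> bool
    Both are assumed to wrap already trained models.
    """
    if not is_suspicious_interval(interval):
        return "normal"
    # The classifier is consulted only after the interval is flagged.
    if not is_suspicious_distribution(recent_features):
        return "suspicious interval only"
    return "infer fraud data"
```

Only the last outcome would hand the observation sequence to the hidden Markov model for inference.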

Referring to Figures 7 to 9, together with Figures 1 and 2, the calibration module 7 is signal-connected to the measurement data acquisition module 1. The calibration module 7 includes a plurality of simple objects 71 of known geometric dimensions, a blind test box 72, and a calibration instruction analysis generator 73. The simple objects 71 may be, for example, five hollow cylinders that look identical and differ only slightly in size, each having at least three mutually independent dimensions: an inner diameter 74, an outer diameter 75, and a column height 76. Before the quality inspector uses the data fraud detection device 100 for the first time, the calibration instruction analysis generator 73 issues a calibration instruction through the measurement data acquisition module 1, for example to measure one dimension of the simple objects 71 (such as the inner diameter 74) in sequence several times, so as to collect an initial distribution of the measurement interval 12. This distribution is sent to the calibration instruction analysis generator 73, which generates a personalized behavior test item from it: the calibration module 7 prompts the quality inspector and, at different levels of an incentive index 79 (for example i%), motivates the inspector to randomly pick simple objects 71 within a given time (for example m seconds) and measure another dimension (such as the outer diameter 75) until the measurement results satisfy a preset condition (for example, n results greater than or less than d). The collected results, including the number of measurements, the measurement data 11, and the measurement interval 12, determine a calibration parameter 77 stored in a calibration parameter table 78. The calibration parameter 77 characterizes how the quality inspector behaves in response to the calibration instruction, and can serve as an input to an observation probability matrix 812 of a hidden Markov model 81 (see Figure 10 for the hidden Markov model 81 and the observation probability matrix 812). In the calibration parameter table 78, the column headings may be, for example, A, B, C, D, ... and the row headings 6, 7, ..., so that each calibration parameter 77 corresponds to exactly one column heading and one row heading; cell A6, for instance, corresponds to column heading A and row heading 6. In this example cell A6 holds 0.996, cell B6 holds 0.985, cell C6 holds 0.95, and cell D6 holds 0.1, and these values serve as calibration parameters 77. In actual implementation, the calibration parameter 77 and the calibration parameter table 78 are not limited to this example.
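The calibration parameter table 78 can be pictured as a small lookup keyed by column letter and row number. The following sketch is only an illustration: the dictionary layout and the helper name `lookup_calibration_parameter` are assumptions, and only the four row-6 values quoted in the description are taken from the text.

```python
# Hypothetical in-memory form of calibration parameter table 78.
# Only the row-6 values come from the description; the layout is assumed.
CALIBRATION_TABLE = {
    6: {"A": 0.996, "B": 0.985, "C": 0.95, "D": 0.1},
}


def lookup_calibration_parameter(cell: str) -> float:
    """Return the calibration parameter stored in a cell such as 'A6'."""
    column, row = cell[0], int(cell[1:])
    return CALIBRATION_TABLE[row][column]


print(lookup_calibration_parameter("B6"))
```

Keying by row first keeps all parameters for one batch size together, which is one possible reading of how rows 6 and 7 are used in the worked cases.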

The computation and decision process of the calibration instruction analysis generator 73 is further illustrated as follows:

1. After the initial measurement intervals 12 are received, an overall probability density function p(t) is formed, where p is assumed to be a normal distribution (μ, σ).

2. Inverse distribution: [equation not reproduced], where f is the frequency probability density function and b is the reciprocal of t.

3. Given a measurement interval 12 of, for example, m seconds, the new distribution becomes f_m(mμ, mσ).

4. A test with a very low probability of completion is designed: within the given m seconds, carry out a number of trials so as to obtain at least n results smaller than d. Check, for q=3, whether the result for n+q trials is below the incentive index 79, for example i%. Then compute the cumulative distribution function P and take min m = P(0.95 | (n+q)μ, (n+q)σ).

5. For example, suppose the five sizes are, in order, v, w, x, y, and z, with w<d<x. Taking n>5 as the usual number of inspections in a quality-inspection batch, so that n=6, the probabilities of completion are as follows:

(1) n=6, q=0: [equation not reproduced]. This is the case of drawing an object smaller than d six times in a row.

(2) n=6, q=1: [equation not reproduced].

(3) n=6, q=2: 3.10%.

(4) n=6, q=3: 9.91%.

(5) From the above results, in this case (n=6) the result is judged suspicious in three situations: q=0; q=1 or 2; or the test ends within the time limit only after more than 9 trials. In these situations the calibration parameter 77 refers to cells A6, B6, and C6 of the calibration parameter table 78, respectively. For all other results, the calibration parameter 77 refers to cell D6 of the calibration parameter table 78.

6. Consider another case with n=7 and, as before, w<d<x. The probabilities of completion are as follows:

(1) n=7, q=0: [equation not reproduced]. This is the case of drawing an object smaller than d seven times in a row.

(2) n=7, q=1: [equation not reproduced].

(3) n=7, q=2: [equation not reproduced].

(4) n=7, q=3: [equation not reproduced].

(5) From the above results, in this case (n=7) the result is judged suspicious when q=0, when q=1, 2, or 3, or when the test ends within the time limit only after more than 10 trials. However, because the result checked at q=3 is not greater than the incentive index 79 (set to 5% in this case), no calibration instruction is generated under this incentive index 79.

7. Consider a further case with n=7, this time with x<d<y. The probabilities of completion are as follows:

(1) n=7, q=0: [equation not reproduced]. This is the case of drawing an object smaller than d seven times in a row.

(2) n=7, q=1: [equation not reproduced].

(3) n=7, q=2: 12.54%.

(4) n=7, q=3: 15.05%.

(5) From the above results, in this case the test is more likely to be completed without being judged suspicious. The calibration parameter 77 is set to C7=0.95 only when the test ends within the time limit after more than 10 trials; otherwise the calibration parameters 77 are A7=B7=D7=0.1.
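The completion probabilities in steps 5 to 7 can be approximated with a short calculation. The sketch assumes that each draw is independent and that the chance p of drawing an object smaller than d is simply the share of the five objects below d (2/5 when w<d<x, 3/5 when x<d<y). It then treats "the n-th qualifying result arrives on trial n+q" as a negative binomial event. The function name `completion_probability` and this reading of the formulas are assumptions, since the original expressions are not reproduced in the text. Under them the sketch matches the quoted 3.10%, 12.54%, and 15.05%; for n=6, q=3 it gives about 4.95% rather than the quoted 9.91%, so that figure presumably rests on a different expression.

```python
from math import comb


def completion_probability(n: int, q: int, p: float) -> float:
    """Probability that the n-th qualifying result arrives on trial n + q.

    Assumes independent draws with a fixed chance p of a qualifying result
    (negative binomial form). This is an interpretation of the description,
    not a formula taken from it.
    """
    return comb(n + q - 1, q) * p**n * (1 - p) ** q


# Five objects: w < d < x puts 2 of 5 below d, x < d < y puts 3 of 5 below d.
print(f"{completion_probability(6, 2, 2 / 5):.2%}")  # n=6, q=2, w<d<x
print(f"{completion_probability(7, 2, 3 / 5):.2%}")  # n=7, q=2, x<d<y
print(f"{completion_probability(7, 3, 3 / 5):.2%}")  # n=7, q=3, x<d<y
```

With the same function, the n=7, w<d<x case of step 6 gives roughly 2.97% at q=3, which is consistent with the statement that the result does not exceed a 5% incentive index.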

In some situations, if recalibration is considered necessary to confirm the quality inspector's behavior, the incentive index 79 may be changed to generate a new calibration instruction and the calibration repeated.

Referring to Figures 7, 10, and 11, together with Figures 1, 5, and 6, the fraud data estimation module 8 is signal-connected to the calibration module 7, the measurement interval anomaly detection module 6, the measurement data classification module 5, and the storage module 2. When the measurement interval 12 is found to be the suspicious interval, the measurement interval anomaly detection module 6 communicates with the fraud data estimation module 8 so that the measurement data classification module 5 starts the classification model 51. If the optimal feature value set 52 is also classified as the suspicious data distribution, the fraud data estimation module 8 takes the stored measurement intervals 12 corresponding to the optimal feature value set 52, converts them with the unsupervised learning model 61 into an observation sequence 82, and then builds the hidden Markov model 81.
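One way to picture the conversion of stored measurement intervals into an observation sequence is a simple z-score rule, which the claims name as one option for the unsupervised learning model. The sketch below is a minimal illustration under assumptions: the function name `to_observation_sequence`, the sample intervals, and the cut-off `z_limit=3.0` are all hypothetical, and an isolation forest could be used instead.

```python
from statistics import mean, stdev


def to_observation_sequence(history, recent, z_limit=3.0):
    """Label each recent measurement interval as 'normal' or 'suspicious'.

    history: past intervals (seconds) assumed to reflect honest measurement.
    z_limit: hypothetical cut-off on |z|; the description states no value.
    """
    mu, sigma = mean(history), stdev(history)
    return [
        "suspicious" if abs((t - mu) / sigma) > z_limit else "normal"
        for t in recent
    ]


# Made-up intervals for illustration only.
history = [10.2, 9.8, 10.5, 9.9, 10.1, 10.4, 9.7, 10.0]
print(to_observation_sequence(history, [10.3, 1.2, 9.9]))
```

The resulting labels play the role of the observations (normal interval or suspicious interval) that the hidden Markov model is later decoded against.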

The hidden Markov model 81 (λ) comprises a state transition matrix 811 (A), the observation probability matrix 812 (B), and an initial state 813 (Π), and is usually written λ=(A, B, Π). The state transition matrix 811 may be a 2D matrix whose rows are the current state and whose columns are the next state, expressing the probabilities of the outcome of the current sampling and of the next one. In the example matrix [matrix not reproduced], A[1,2]=0.7 means that the probability that the current sample is OK (good) and the next one is NG (defective) is 0.7. Because the sampling results are independent events and the two outcomes are mutually exclusive, each row sums to 1. The observation probability matrix 812 may likewise be a 2D matrix whose rows are states and whose columns are observations. In the example matrix [matrix not reproduced], B[2,2]=0.9 means that the probability of observing the suspicious interval when the sample is NG is 0.9.
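The row and column convention can be made concrete with nested lists. In the sketch below only A[1,2]=0.7 and B[2,2]=0.9 (1-based indices) come from the text; the second row of A, the first row of B, and the helper `is_row_stochastic` are placeholder assumptions added so that the row-sum property can be checked.

```python
# States: index 0 = OK, index 1 = NG.
# Observations: index 0 = normal interval, index 1 = suspicious interval.
# A[1,2] = 0.7 and B[2,2] = 0.9 in the text are 1-based; Python is 0-based.
A = [
    [0.3, 0.7],  # current OK -> next OK / next NG
    [0.3, 0.7],  # current NG -> next OK / next NG (assumed values)
]
B = [
    [0.9, 0.1],  # state OK -> normal / suspicious (assumed values)
    [0.1, 0.9],  # state NG -> normal / suspicious
]


def is_row_stochastic(matrix, tol=1e-9):
    """True when every entry is non-negative and every row sums to 1."""
    return all(
        min(row) >= 0 and abs(sum(row) - 1.0) <= tol for row in matrix
    )


print(is_row_stochastic(A), is_row_stochastic(B))
```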

Referring to Figures 7, 10, and 11, together with Figure 3, a yield (OK, NG) can be calculated from the historical measurement data z-score distribution 32 corresponding to the observation sequence 82, together with the z-score lower limit 37 and the z-score upper limit 38 [example values not reproduced]. This yield serves as the initial state 813 and is replicated across each row to form the state transition matrix 811 [example matrix not reproduced]. The corresponding observation probability matrix 812 is built from the calibration parameter 77 obtained by the calibration module 7, for example using cell B6 of the calibration parameter table 78 [example matrix not reproduced], where the OK row may be preset in this example [preset values not reproduced]. Once the hidden Markov model 81 (λ) has been built in this way, the Viterbi algorithm is applied to the observation sequence 82 to obtain a possible sampling OK/NG set 821, from which a yield is calculated. Separately, the same yield of the historical measurement data z-score distribution 32 is taken as the initial state 813 (Π), and a hidden Markov model 81 is obtained with the Baum-Welch algorithm; the yield corresponding to that model is compared with the initial state 813 to decide which hidden Markov model 81 is the most likely one to use and which fraud data it yields. The calibration parameter 77 thus provides a reference option, besides the Baum-Welch algorithm, for building the hidden Markov model 81, which improves the stability of the data fraud detection device 100 (see Figure 1).
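The construction and decoding just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the yield of 0.8, the preset OK row (0.9, 0.1), and the choice to fill the NG row as (1 - parameter, parameter) are hypothetical, and only the value 0.985 of cell B6 comes from the text. The names `build_hmm` and `viterbi` are likewise illustrative.

```python
def build_hmm(yield_ok, calib_param, ok_row=(0.9, 0.1)):
    """Assemble (pi, A, B): pi is the yield, A repeats pi on every row.

    The NG row of B is filled from the calibration parameter (assumption).
    """
    pi = [yield_ok, 1.0 - yield_ok]
    A = [pi[:], pi[:]]
    B = [list(ok_row), [1.0 - calib_param, calib_param]]
    return pi, A, B


def viterbi(obs, pi, A, B):
    """Most likely state path (0 = OK, 1 = NG) for observations (0 or 1)."""
    n_states = len(pi)
    prob = [pi[s] * B[s][obs[0]] for s in range(n_states)]
    back = []
    for o in obs[1:]:
        step, new_prob = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this step.
            best = max(range(n_states), key=lambda r: prob[r] * A[r][s])
            step.append(best)
            new_prob.append(prob[best] * A[best][s] * B[s][o])
        back.append(step)
        prob = new_prob
    state = max(range(n_states), key=lambda s: prob[s])
    path = [state]
    for step in reversed(back):  # Walk the back-pointers to the start.
        state = step[state]
        path.append(state)
    return path[::-1]


# Observations: 0 = normal interval, 1 = suspicious interval.
pi, A, B = build_hmm(yield_ok=0.8, calib_param=0.985)
path = viterbi([0, 1, 1, 0], pi, A, B)
ok_share = path.count(0) / len(path)  # yield implied by the decoded path
print(path, ok_share)
```

The decoded path stands in for the possible sampling OK/NG set 821, and `ok_share` is the yield that would be compared against the initial state. A Baum-Welch fit is not shown here.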

Referring again to Figures 1 and 2, the data fraud detection device 100 allows the validity of the measurement data 11 from quality inspection to be analyzed remotely and in real time. This prevents fraud through manually falsified data, reflects the true quality level in real time, reveals possible quality problems early, and reduces the resources and cost of repeated inspection.

From the description of the above embodiments, the operation, use, and effects of the present invention can be fully understood. The embodiments described above are, however, only preferred embodiments of the present invention and do not limit the scope of its implementation; simple equivalent changes and modifications made in accordance with the claims and the description of the invention all fall within the scope of the present invention.

100: Data fraud detection device

1: Measurement data acquisition module

2: Storage module

3: Measurement data processing module

4: Simulated data generation module

5: Measurement data classification module

7: Calibration module

8: Fraud data estimation module

Claims

A device for detecting data fraud, used to detect a measurement behavior of a measurement device, the device comprising: a measurement data acquisition module that reads measurement data measured by the measurement device and a measurement interval; a storage module, signal-connected to the measurement data acquisition module, that stores the measurement data and the measurement interval; a measurement data processing module, signal-connected to the storage module, that calculates a z-score distribution and obtains a feature value set from a measurement data distribution of the measurement data and a corresponding specification value; a simulated data generation module, signal-connected to the measurement data processing module, that generates distributions of simulated normal data and simulated fraud data and obtains a simulated feature value set of the simulated normal data and the simulated fraud data; a measurement data classification module, signal-connected to the measurement data processing module and the simulated data generation module, that performs training and feature selection on the feature value set or the simulated feature value set to build a classification model for classifying whether the measurement data distribution is a suspicious data distribution; a measurement interval anomaly detection module, signal-connected to the storage module, that builds an unsupervised learning model from the measurement interval to determine whether the measurement interval is a suspicious interval; a fraud data estimation module, signal-connected to the measurement interval anomaly detection module, the measurement data classification module, and the storage module, wherein, when the unsupervised learning model determines that the measurement interval is the suspicious interval and the classification model classifies the measurement data distribution as the suspicious data distribution, the fraud data estimation module uses a hidden Markov model to estimate the corresponding fraud data; and a calibration module, signal-connected to the measurement data acquisition module and the fraud data estimation module, the calibration module including a simple object and cooperating with the simple object to obtain a calibration parameter, wherein the hidden Markov model is obtained by building models from the calibration parameter of the calibration module and from the Baum-Welch algorithm and comparing them against the yield of an initial condition, and the fraud data is estimated from the hidden Markov model using the Viterbi algorithm.

The device for detecting data fraud of claim 1, wherein the calibration module further issues a calibration instruction and determines the calibration parameter from the result of executing the calibration instruction on the simple object.

The device for detecting data fraud of claim 1, wherein the simulated data generation module generates the simulated fraud data using random numbers.

The device for detecting data fraud of claim 1, wherein the measurement data acquisition module uses OCR technology to read the measurement data.

The device for detecting data fraud of claim 1, wherein the measurement data classification module uses a support vector machine and/or a random forest algorithm to build the classification model.

The device for detecting data fraud of claim 1, wherein the measurement interval anomaly detection module uses an isolation forest and/or a z-score algorithm to build the unsupervised learning model.

The device for detecting data fraud of claim 1, wherein the measurement data classification module uses bootstrap sampling to obtain the feature value set and the simulated feature value set.

The device for detecting data fraud of claim 1, wherein the measurement data processing module performs a z-score distribution conversion on the measurement data distribution and the specification value.

A method for detecting data fraud, comprising: using a measurement data acquisition module to read measurement data measured by a measurement device and a measurement interval; using a storage module to store the measurement data and the measurement interval; using a measurement data processing module to calculate a feature value set from a measurement data distribution of the measurement data and a corresponding specification value; using a measurement data classification module to perform training and feature selection on the feature value set, or on a simulated feature value set of simulated normal data and simulated fraud data, and to build a classification model for classifying whether the measurement data distribution is a suspicious data distribution; using a measurement interval anomaly detection module to build an unsupervised learning model from the measurement interval to determine whether the measurement interval is a suspicious interval; using a calibration module in cooperation with a simple object to obtain a calibration parameter, and obtaining a hidden Markov model by building models from the calibration parameter of the calibration module and from the Baum-Welch algorithm and comparing them against the yield of an initial condition; and, when the unsupervised learning model determines that the measurement interval is the suspicious interval and the classification model classifies the measurement data distribution as the suspicious data distribution, using a fraud data estimation module to estimate the corresponding fraud data from the hidden Markov model with the Viterbi algorithm.

The method for detecting data fraud of claim 9, wherein the calibration module further issues a calibration instruction and determines the calibration parameter from the result of executing the calibration instruction on the simple object.

The method for detecting data fraud of claim 9, wherein the simulated normal data and the simulated fraud data are historical measurement data and historical measurement intervals stored in the storage module, or the distributions of the simulated normal data and the simulated fraud data are generated by a simulated data generation module.

The method for detecting data fraud of claim 9, wherein the measurement data acquisition module uses OCR technology to read the measurement data.

The method for detecting data fraud of claim 9, wherein the measurement data classification module uses a support vector machine and/or a random forest algorithm to build the classification model.

The method for detecting data fraud of claim 9, wherein the measurement interval anomaly detection module uses an isolation forest and/or a z-score algorithm to build the unsupervised learning model.

The method for detecting data fraud of claim 9, wherein the measurement data classification module uses bootstrap sampling to obtain the feature value set and the simulated feature value set.

The method for detecting data fraud of claim 9, wherein the measurement data processing module performs a z-score distribution conversion on the measurement data distribution and the specification value.

A program product which, after being loaded into a computer device, executes the method for detecting data fraud of any one of claims 9 to 16.

A computer-readable medium which, after being loaded into a computer device, executes the method for detecting data fraud of any one of claims 9 to 16.