TWI839813B - Electronic computing device for verifying user identity, update method of discriminant model thereof and computer program product

Info

Publication number: TWI839813B
Application number: TW111130919A
Authority: TW
Inventors: 梁德容
Original assignee: 國立中央大學
Priority date: 2022-08-17
Filing date: 2022-08-17
Publication date: 2024-04-21
Also published as: TW202409863A

Abstract

An electronic computing device for verifying user identity, a method for updating its discriminant model, and a computer program product are disclosed herein. The electronic computing device simplifies the previous training data set according to a clustering-based pooling algorithm and adds new training data to the previous training data set, thereby producing an updated training data set. In addition, the electronic computing device executes the updated discriminant model to determine whether a user is a legitimate user according to at least one piece of behavior data corresponding to the behavioral characteristics of the user.

Description

Electronic computing device for verifying user identity, updating method of its discrimination model and computer program product

The present disclosure relates to an electronic computing device that uses a discriminant model to verify a user's identity, a method for updating the discriminant model, and a computer program product. More specifically, the present disclosure relates to an electronic computing device that updates a training data set with a clustering-based pooling strategy in order to retrain the discriminant model, together with the corresponding update method and computer program product.

Among current approaches to verifying user identity in order to protect the information security of electronic computing devices, two-stage verification already exists. In the first stage, the user is verified on the basis of information such as biometrics, character passwords, or swipe patterns, to establish the user's preliminary legitimacy. However, because the information the user must provide in the first stage may be stolen and misused, the second stage further attempts to identify the user from the behavioral characteristics the user exhibits while actually operating the electronic computing device, and determines whether those characteristics meet the requirements. The second-stage verification may be, for example, the user identification scheme described in Republic of China (Taiwan) Invention Patent No. I512550, entitled "Method and module for identifying users of mobile devices, computer program product".

The second-stage verification can be achieved by training a machine-learning-based discriminant model with the existing operation data of a specific user (for example, a user who should be regarded as legitimate or as illegitimate), so that the discriminant model can then be used to judge subsequent user behavior feature data. However, even the same user may operate the device differently in different usage scenarios, so over time the user's behavior feature data gradually becomes noisy and unreliable. Because the discriminant model has no way of adapting to changes in the user's behavior, its error rate gradually rises. It is therefore clearly necessary to retrain the discriminant model.

In theory, for the discriminant model to adapt to changes in the user's behavior, all of the feature data produced while the user has used the electronic computing device should be used as training data for retraining. However, doing so greatly increases the time required for training and consumes a large share of the device's computing resources. If the training data is simply cut down for that reason, the accuracy of the retrained discriminant model drops instead. In view of this, how to retrain the model quickly to cope with changes in the user's operating behavior while maintaining discrimination accuracy is a technical problem that urgently needs to be solved in this field.

To solve at least the above problems, the present invention provides an electronic computing device for verifying user identity. The electronic computing device may include a memory and a processor electrically connected to the memory. The memory may be used to store a discriminant model and a training data set, wherein the training data set includes a plurality of pieces of training data. The processor may be used to run the discriminant model to determine whether a user is a legitimate user based on at least one piece of behavior data. The at least one piece of behavior data may correspond to the behavioral characteristics of the user. In addition, the processor may be used to simplify an original training data set according to a clustering-based pooling algorithm, add a plurality of pieces of new training data to the original training data set to generate the training data set, and train an original discriminant model with the training data set to generate the discriminant model.

To solve at least the above problems, the present invention also provides a method for updating a discriminant model. The discriminant model may be used to verify the identity of a user of an electronic computing device. The method may include the following steps: simplifying, by the electronic computing device, an original training data set according to a clustering-based pooling algorithm, and adding a plurality of pieces of new training data to the original training data set to generate a training data set, wherein the training data set includes a plurality of pieces of training data; and training, by the electronic computing device, an original discriminant model with the training data set to generate the discriminant model.

To solve at least the above problems, the present invention also provides a computer program product. After an electronic computing device loads the computer program product, it can execute the following instructions: simplifying an original training data set according to a clustering-based pooling algorithm, and adding a plurality of pieces of new training data to the original training data set to generate a training data set, wherein the training data set includes a plurality of pieces of training data; and training an original discriminant model with the training data set to generate a discriminant model for verifying the identity of a user of the electronic computing device.

In summary, the electronic computing device for verifying user identity, the method for updating the discriminant model, and the corresponding computer program product provided by the present disclosure reduce the amount of data in the training data set through a clustering-based pooling strategy, so that the amount of training data fed to the discriminant model does not grow substantially as the number of training cycles increases. Because the clustering-based pooling strategy keeps the training data representative while reducing its volume, the equal error rate (EER) of the discriminant model at inference time does not rise significantly. Accordingly, the electronic computing device, the update method, and the corresponding computer program product provided by the present disclosure solve the above problems in the technical field to which the present invention belongs.

This summary describes the core concept of the present invention as a whole and covers the problems it can solve, the means it can adopt, and the effects it can achieve, so as to give a person having ordinary skill in the art a basic understanding of the invention. It should be understood, however, that the summary is not intended to cover all embodiments of the invention; it only presents the core concept in a simple form as an introduction to the detailed description that follows. The detailed techniques and embodiments of the invention are described below with reference to the drawings, so that a person having ordinary skill in the art can understand the technical features of the claimed invention.

The electronic computing device for verifying user identity, the method for updating the discriminant model, and the corresponding computer program product provided by the present invention are explained below through embodiments. These embodiments are not intended to limit the present invention to any environment, application, or manner described in them. The description of the embodiments therefore serves only to explain the present invention and not to limit its scope. It should be understood that, in the following embodiments and drawings, elements not directly related to the present invention are omitted and not shown, and that the size of each element and the size ratios between elements are merely illustrative and do not limit the scope of the present invention.

FIG. 1 is a schematic diagram of an electronic computing device for verifying user identity according to one or more embodiments of the present invention. The content shown in FIG. 1 only illustrates embodiments of the present invention and is not intended to limit the present invention.

Referring to FIG. 1, an electronic computing device 1 may basically include a memory 11 and a processor 12, and the memory 11 may be electrically connected to the processor 12. The electrical connection between the memory 11 and the processor 12 may be direct (i.e., not made through other elements) or indirect (i.e., made through other elements). The electronic computing device 1 may be any of various types of computing device, such as but not limited to a laptop computer, a mobile phone, or a portable electronic accessory (a watch, glasses, etc.). When a user U1 operates the electronic computing device 1, the electronic computing device 1 may verify the identity of the user U1 based on at least one piece of behavior data (e.g., behavior data BD1) provided by the user U1.

The memory 11 may be used to store data generated by the electronic computing device 1, data received from an external device, or data entered by the user. For example, the memory 11 may be used to store a training data set 111 and a discriminant model 112. The memory 11 may include a first-level memory (also called main memory or internal memory), and the processor 12 can directly read the instruction sets stored in the first-level memory and execute them when needed. The memory 11 may optionally include a second-level memory (also called external memory or auxiliary memory), which can transfer stored data to the first-level memory through a data buffer. For example, the second-level memory may be, but is not limited to, a hard disk or an optical disc. The memory 11 may optionally include a third-level memory, that is, a storage device that can be plugged directly into or removed from the computer, such as a portable hard drive.

The processor 12 may be a microprocessor or microcontroller with signal processing capability. A microprocessor or microcontroller is a programmable special-purpose integrated circuit that can compute, store, and perform input/output, and that can accept and process various coded instructions to carry out logical and arithmetic operations and output the corresponding results. The processor 12 can be programmed to interpret various instructions in order to process data in the electronic computing device 1 and execute various computing procedures or programs.

In some embodiments, the electronic computing device 1 may further include an input element 13 and an output element 14, both of which may be electrically connected to the memory 11 and the processor 12. The input element 13 may be used to receive specific information from an external device or a user, and the output element 14 may be used to output specific information to an external device or a user. For example, the input element 13 may be a touchpad, a keyboard, a gyroscope, an altimeter, or any other element that the user U1 can operate to generate corresponding data, or a combination thereof, and the output element 14 may be a display, a speaker, a vibrator, or a similar element, or a combination thereof. In some embodiments, the input element 13 and the output element 14 may be integrated into a touch display.

FIG. 2 is a flowchart of retraining the discriminant model and verifying user identity according to one or more embodiments of the present invention. The content shown in FIG. 2 only illustrates embodiments of the present invention and is not intended to limit the present invention.

Referring to FIG. 1 and FIG. 2 together, the electronic computing device 1 may retrain the discriminant model and verify user identity as shown in a process 2. First, so that the discriminant model 112 can learn the behavioral characteristics of legitimate and/or illegitimate users, the processor 12 of the electronic computing device 1 may collect (e.g., through the input element 13) behavior data from users of the corresponding type (i.e., action 201).

The behavior data may be data that specific elements of the electronic computing device 1 generate in response to the user's operations while the device is in use. For example, the behavior data may be the various data generated by the input element 13 (e.g., the aforementioned touchpad, touch display, keyboard, gyroscope, altimeter, etc.) in response to the user's operations.

After enough behavior data has been collected, the processor 12 may preprocess the collected behavior data (i.e., action 202) to convert it into corresponding feature data. For example, the processor 12 may extract features from the collected behavior data for items such as (but not limited to) touch trajectory; the start point, end point, speed, acceleration, and duration of a swipe on the screen; touch pressure; device tilt angle; device altitude; and the speed, acceleration, and angular velocity of device movement, and then convert each feature into training data in histogram form. Because a histogram presents the distribution of each feature as a set of bars of different heights, the extracted features are more robust against noise and/or interference in the data. Training data in histogram form also helps the discriminant model 112 learn more efficiently to distinguish legitimate users from illegitimate users.
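The conversion of raw behavior data into histogram-form feature data can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `to_histogram`, the bin range, the number of bins, the normalization, and the sample swipe speeds are all assumptions made for the example.

```python
from typing import List, Sequence


def to_histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Turn raw measurements of one feature into a normalized histogram."""
    if bins <= 0 or hi <= lo:
        raise ValueError("need bins > 0 and hi > lo")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp out-of-range readings into the first or last bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    total = len(values)
    # Normalize so that sessions with different numbers of readings are comparable.
    return [c / total for c in counts] if total else [0.0] * bins


# Hypothetical swipe speeds (arbitrary units) recorded during one usage session.
swipe_speeds = [0.5, 1.2, 1.4, 2.8, 3.1, 3.3, 3.9]
speed_hist = to_histogram(swipe_speeds, lo=0.0, hi=4.0, bins=4)
```

A single outlier reading changes only one bar slightly, which is one way to see why histogram-form features tolerate noise better than raw values.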

The pieces of histogram-form training data produced by the processor 12 may form the training data set 111 and be stored in the memory 11. In some embodiments, the histogram-form training data may be stored in the memory 11 in a format such as (but not limited to) ".csv".

After preprocessing, the processor 12 may train the discriminant model 112 with the training data in the training data set 111 (i.e., action 203). In some embodiments, the discriminant model 112 may be a machine learning model that classifies data based on linear discriminant analysis (LDA). LDA performs its classification computations on a covariance matrix, for example to calculate the within-class variance. In other words, both in the retraining phase of the discriminant model 112 and in the phase of actually verifying user identity after retraining, the processor 12 may calculate the within-class variance from the covariance matrix.

However, when the amount of behavior data or feature data is large, computing the covariance matrix becomes expensive and time-consuming. To address this, in some embodiments the discriminant model 112 of the present disclosure may instead adopt a lightweight form of linear discriminant analysis. FIG. 3 is a schematic diagram of the correlation of feature data in histogram form according to one or more embodiments of the present invention. The content shown in FIG. 3 only illustrates embodiments of the present invention and is not intended to limit the present invention. Referring to FIG. 3, for feature data in histogram form, the highest correlations all lie on the diagonal of the matrix (as indicated by the diagonal line in the figure). In the lightweight linear discriminant analysis, therefore, to save computing resources and shorten the training and processing time, the processor 12 may choose to process only the entries on the diagonal rather than the entire covariance matrix. Accordingly, in some embodiments, the processor 12 may calculate the within-class variance from a diagonal covariance matrix both when training the discriminant model 112 and when actually verifying user identity after retraining.
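A linear discriminant classifier restricted to a diagonal covariance matrix can be sketched as follows. This is only one common way to write such a classifier, offered as an assumption about what the lightweight mode could look like; the function names, the small `eps` term added to each variance, the use of class priors, and the toy data are not taken from the patent.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def fit_diagonal_lda(samples: Dict[str, List[Vector]],
                     eps: float = 1e-6) -> Tuple[dict, List[float], dict]:
    """Estimate class means, pooled per-feature within-class variances, and priors."""
    dim = len(next(iter(samples.values()))[0])
    total = sum(len(rows) for rows in samples.values())
    means = {
        label: [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
        for label, rows in samples.items()
    }
    # Only the diagonal of the within-class covariance matrix is computed,
    # so the cost grows linearly with the number of features.
    variances = []
    for j in range(dim):
        ss = sum((r[j] - means[label][j]) ** 2
                 for label, rows in samples.items() for r in rows)
        variances.append(ss / max(total - len(samples), 1) + eps)
    priors = {label: len(rows) / total for label, rows in samples.items()}
    return means, variances, priors


def classify(x: Vector, means: dict, variances: List[float], priors: dict) -> str:
    """Pick the class with the highest linear discriminant score."""
    def score(label: str) -> float:
        dist = sum((xj - mj) ** 2 / vj
                   for xj, mj, vj in zip(x, means[label], variances))
        return math.log(priors[label]) - 0.5 * dist
    return max(means, key=score)


# Hypothetical two-bin histograms for each class of user.
train = {
    "legitimate": [[0.10, 0.90], [0.20, 0.80], [0.15, 0.85]],
    "illegitimate": [[0.80, 0.20], [0.90, 0.10], [0.70, 0.30]],
}
model = fit_diagonal_lda(train)
label = classify([0.12, 0.88], *model)
```

With d features, the full covariance matrix has d × d entries while the diagonal has only d, which is where the saving in this sketch comes from.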

Referring again to FIG. 1 and FIG. 2, because training data accumulates over time, it is necessary to pool the training data to improve training efficiency. Specifically, the processor 12 may use a clustering-based pooling algorithm to reduce the amount of data in the training data set 111. The clustering-based pooling algorithm may proceed as follows:
(1) For the training data set of the current training cycle, set a specific number of cluster centers.
(2) Assign each piece of training data in the training data set to its nearest cluster center.
(3) Sum and average the positions of all pieces of training data in the same cluster to obtain the recalculated mean of that cluster, and assign that mean as the new cluster center.
(4) Repeat steps (2) and (3) until every cluster center converges.
(5) Replace the training data in each cluster with that cluster's center to update the training data set.

The details of how the processor 12 executes the clustering-based pooling algorithm are as follows. First, the processor 12 may determine a plurality of cluster centers in the training data set 111 and, according to those cluster centers, divide the plurality of pieces of training data in the training data set 111 into a plurality of clusters.

For example, if the processor 12 determines 30 cluster centers, the data in the training data set 111 is divided into 30 clusters accordingly. Concretely, for each piece of training data, the processor 12 may calculate the distance between that piece of training data and every cluster center, and associate the piece of training data with the cluster center at the smallest distance (that is, assign it to the cluster to which that center belongs). The distance may be, for example but not limited to, a Euclidean distance.

After the plurality of clusters has been formed, the processor 12 may, for each cluster, sum and average all the training data in it to obtain the mean data of that cluster. The processor 12 may use the mean data calculated for each cluster as the representative data of that cluster (i.e., action 204).

In some embodiments, regarding action 204, after calculating the mean data of each cluster, the processor 12 may instead first use each piece of mean data as the new cluster center of the corresponding cluster and repeat the steps of "calculating distances to form clusters", "summing and averaging to obtain each cluster's mean data", and "updating the cluster centers with the mean data" until the next calculated mean data equals the current cluster centers (meaning that the cluster centers no longer change, in other words, that they have converged). The last calculated mean data is then used as the representative data of each cluster.

The processor 12 may then replace the training data in each cluster with the last calculated mean data of that cluster, thereby simplifying the training data set 111. In addition to simplifying the training data set 111, the processor 12 may add the training data to be introduced in the next training cycle to the current training data set 111 to update it (i.e., action 205). The processor 12 may then retrain the discriminant model 112 with the updated training data set 111 (i.e., perform action 203 again).
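The pooling and update steps described above can be sketched as a k-means style loop. The sketch rests on several assumptions that the text does not specify: initial cluster centers are taken at evenly spaced positions in the data, a cluster that ends up empty is dropped, `max_iter` guards against an unexpectedly long loop, and the names `pool_by_clustering`, `old_set`, and `new_samples` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def nearest(point: Vector, centers: List[Vector]) -> int:
    """Index of the center with the smallest Euclidean distance to point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def pool_by_clustering(data: List[Vector], k: int, max_iter: int = 100) -> List[Vector]:
    """Replace data with at most k cluster means."""
    if k <= 0:
        raise ValueError("k must be positive")
    if k >= len(data):
        return [list(p) for p in data]  # nothing to reduce
    # Assumed initialization: evenly spaced samples serve as the first centers.
    centers = [list(data[i * len(data) // k]) for i in range(k)]
    clusters: List[List[Vector]] = []
    for _ in range(max_iter):
        # Assign every piece of training data to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in data:
            clusters[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its cluster.
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # centers no longer change: converged
            break
        centers = new_centers
    # Each non-empty cluster is represented by its mean alone.
    return [c for c, members in zip(centers, clusters) if members]


# Hypothetical previous training set with two obvious groups.
old_set = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
pooled = pool_by_clustering(old_set, k=2)

# Hypothetical new data for the next cycle is appended after pooling.
new_samples = [[0.2, 0.1], [4.9, 5.2]]
updated_set = pooled + new_samples
```

In this toy run six old entries are reduced to two means before the two new entries are added, so the retraining input has four entries instead of eight.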

The amount of data in the training data set 111 after processing with the clustering-based pooling strategy of the present disclosure may be as shown in Table 1 below, and the amount of data in the same training data set without that processing may be as shown in Table 2 below:

Training cycle   New training data   Pooled training data   Total training data
1                48                  0                      48
2                48                  30                     78
3                48                  30                     78
4                48                  30                     78
5                48                  30                     78
6                48                  30                     78
7                48                  30                     78
<Table 1>

Training cycle   Existing training data   New training data   Total training data
1                48                       0                   48
2                48                       48                  96
3                48                       48                  144
4                48                       48                  192
5                48                       48                  240
6                48                       48                  288
7                48                       48                  336
<Table 2>

As the example in Table 1 shows, the processor 12 may retrain the discriminant model 112 over multiple cycles. Because the processor 12 reduces the data in the training data set 111 to 30 entries in every round, the total amount of training data stays at 78 entries when 48 new entries are added in each round. In the example in Table 2, by contrast, no clustering-based pooling is applied, so the amount of training data grows by another 48 entries every round and reaches 336 entries by the seventh cycle. Over time, this significantly degrades the retraining efficiency of the discriminant model 112.
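The totals in Table 1 and Table 2 follow from simple bookkeeping, which can be reproduced with the short sketch below. The function name `training_set_sizes` and its parameters are hypothetical; the figures 48, 30, and 7 come from the tables.

```python
from typing import List, Optional


def training_set_sizes(cycles: int, new_per_cycle: int,
                       pooled_size: Optional[int] = None) -> List[int]:
    """Total training entries per cycle, with or without pooling between cycles."""
    sizes = []
    carried = 0  # entries carried over from the previous cycle
    for _ in range(cycles):
        total = carried + new_per_cycle
        sizes.append(total)
        # With pooling, the carried-over data shrinks to at most pooled_size entries.
        carried = total if pooled_size is None else min(total, pooled_size)
    return sizes


with_pooling = training_set_sizes(7, 48, pooled_size=30)
without_pooling = training_set_sizes(7, 48)
# with_pooling    -> [48, 78, 78, 78, 78, 78, 78]
# without_pooling -> [48, 96, 144, 192, 240, 288, 336]
```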

After the discriminant model 112 has been retrained, the processor 12 may execute the discriminant model 112 to classify the behavior data of a specific user and thereby verify whether that user is a legitimate user (i.e., action 206). Specifically, the processor 12 may convert the behavior data of the specific user (e.g., the behavior data BD1 provided by the user U1 in FIG. 1) into histogram-form data in the same way as in the training phase, then calculate the difference between the converted histogram data and the histogram data of the legitimate user or the illegitimate user defined during training, and on that basis verify whether the specific user is a legitimate user or an illegitimate user.

If the specific user is determined to be an illegitimate user, the processor 12 may issue a corresponding warning message to notify the user that he or she can no longer use the electronic computing device 1, and lock the electronic computing device 1 (i.e., action 207). The warning message may be, for example, the warning message A1 issued by the output element 14 in FIG. 1. If, on the other hand, the user is determined to be a legitimate user, the user may be allowed to continue using the electronic computing device 1 (i.e., action 208).
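The verification and response flow of actions 206 to 208 can be sketched as follows. The text does not say how the difference between histograms is measured at this stage, so the Euclidean distance used here is an assumption, as are the function names, the returned strings, and the reference histograms.

```python
import math
from typing import Dict, List


def verify(hist: List[float], references: Dict[str, List[float]]) -> str:
    """Action 206: return the label whose reference histogram is closest to hist."""
    return min(references, key=lambda label: math.dist(hist, references[label]))


def handle_session(hist: List[float], references: Dict[str, List[float]]) -> str:
    """Decide what the device does after verification."""
    if verify(hist, references) == "illegitimate":
        return "alert_and_lock"  # action 207: warn the user and lock the device
    return "allow"               # action 208: let the user continue


# Hypothetical reference histograms learned during training.
references = {
    "legitimate": [0.15, 0.85],
    "illegitimate": [0.80, 0.20],
}
decision_owner = handle_session([0.20, 0.80], references)
decision_other = handle_session([0.75, 0.25], references)
```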

In some embodiments, the electronic computing device 1 may further include a transceiver (not shown) electrically connected to the memory 11 and the processor 12, and the processor 12 may use the transceiver to notify the administrator of the electronic computing device 1 that an illegitimate user is using the device under the administrator's management. The transceiver may include a transmitter and a receiver. For wireless communication, the transceiver may include, but is not limited to, communication elements such as an antenna, an amplifier, a modulator, a demodulator, a detector, an analog-to-digital converter, and a digital-to-analog converter. For wired communication, the transceiver may be, for example but not limited to, a gigabit Ethernet transceiver, a gigabit interface converter (GBIC), a small form-factor pluggable (SFP) transceiver, or a ten gigabit small form-factor pluggable (XFP) transceiver.

It should be understood that FIG. 2 uses a dashed line to separate two phases, "the electronic computing device 1 retrains the discriminant model 112" and "the electronic computing device 1 verifies user identity", and that these two phases can be carried out independently. In detail, action 206 of the verification phase is drawn after action 203 of the retraining phase in FIG. 2 only to express that the electronic computing device 1 can use the retrained discriminant model 112 to verify user identity. It does not mean that the electronic computing device 1 must perform the retraining of actions 201 to 205 before every verification of user identity (i.e., actions 206, 207, and 208).

FIG. 4 is a flowchart depicting a method for updating a discriminant model according to one or more embodiments of the present invention. The content shown in FIG. 4 is only intended to illustrate embodiments of the present invention, not to limit the present invention.

Referring to FIG. 4, a second embodiment of the present disclosure is a method 4 for updating a discriminant model. The discriminant model can be used to verify the identity of a user of an electronic computing device. Method 4 may include the following steps: simplifying, by the electronic computing device, an original training data set according to a clustering-based pooling algorithm, and adding a plurality of pieces of new training data to the original training data set to generate a training data set (labeled 401), wherein the training data set includes a plurality of pieces of training data; and training, by the electronic computing device, an original discriminant model with the training data set to generate the discriminant model (labeled 402).

In some embodiments, when simplifying the original training data set, method 4 may further include the following steps: (A) determining, by the electronic computing device, a plurality of centroids among a plurality of pieces of training data; (B) dividing, by the electronic computing device, the plurality of pieces of training data into a plurality of clusters according to the plurality of centroids, wherein the number of clusters is the same as the number of centroids; (C) for each cluster, summing and then averaging, by the electronic computing device, all the training data in the cluster to obtain a piece of average data for the cluster; (D) updating, by the electronic computing device, the plurality of centroids with the plurality of pieces of average data; (E) repeating, by the electronic computing device, step (B), step (C), and step (D) in sequence until the plurality of pieces of average data converge to be identical to the plurality of centroids; and (F) replacing, by the electronic computing device, all the training data in each cluster with the average data of that cluster, thereby simplifying the original training data set.
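Steps (A) to (F) follow the familiar k-means pattern, and a minimal Python sketch may make the loop easier to follow. The function name `pool_by_clustering`, the choice of the first `k` samples as initial centroids, the squared Euclidean distance, the `max_iter` safeguard, and the handling of empty clusters are all assumptions made for illustration; the text does not specify them.

```python
def pool_by_clustering(data, k, max_iter=100):
    """Reduce `data` (a list of equal-length numeric vectors) to k averaged vectors."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and len(data)")
    # (A) Pick initial centroids; taking the first k samples is an assumption.
    centroids = [list(v) for v in data[:k]]
    for _ in range(max_iter):
        # (B) Assign every sample to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in data:
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(v)
        # (C) Average each cluster; an empty cluster keeps its old centroid.
        means = [
            [sum(col) / len(members) for col in zip(*members)] if members else c
            for members, c in zip(clusters, centroids)
        ]
        # (E) Stop once the averages are identical to the centroids.
        if means == centroids:
            break
        # (D) Use the averages as the new centroids.
        centroids = means
    # (F) Each cluster is now represented by its average alone.
    return centroids


samples = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
print(pool_by_clustering(samples, 2))  # [[0.0, 1.0], [10.0, 11.0]]
```

In this sketch four samples are pooled into two averaged samples, so the simplified set is half the size of the input.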

In some embodiments, in addition to steps (A) to (D) above, when dividing the plurality of pieces of training data in the original training data set into the plurality of clusters, method 4 may further include the following steps: (B1) calculating, by the electronic computing device, a gap between each piece of training data in the original training data set and each of the centroids; and (B2) associating, by the electronic computing device, each piece of training data in the original training data set with the centroid having the smallest gap, thereby forming the plurality of clusters.
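Steps (B1) and (B2) can be sketched in isolation with a pluggable gap measure. This passage does not say how the gap is computed, so the Euclidean default (`math.dist`) and the Manhattan alternative below are assumptions, as are the names `assign_to_nearest` and `manhattan`.

```python
import math


def assign_to_nearest(data, centroids, gap=math.dist):
    """Return one list of samples per centroid, following steps (B1) and (B2)."""
    clusters = [[] for _ in centroids]
    for sample in data:
        # (B1) Gap between this sample and every centroid.
        gaps = [gap(sample, c) for c in centroids]
        # (B2) Associate the sample with the centroid whose gap is smallest.
        clusters[gaps.index(min(gaps))].append(sample)
    return clusters


def manhattan(p, q):
    """Sum of absolute per-bin differences, one possible gap for histograms."""
    return sum(abs(a - b) for a, b in zip(p, q))


histograms = [[4, 0, 0], [3, 1, 0], [0, 1, 3]]
centers = [[4, 0, 0], [0, 0, 4]]
print(assign_to_nearest(histograms, centers, gap=manhattan))
# [[[4, 0, 0], [3, 1, 0]], [[0, 1, 3]]]
```

Ties are resolved in favor of the lower-indexed centroid here, which is a side effect of `list.index` rather than anything the text prescribes.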

In some embodiments of method 4, the plurality of pieces of training data may be histogram data. In addition, in some embodiments, the original training data set and the training data set contain the same number of pieces of training data.
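One way to read the equal-size condition: if the original set holds N pieces of data and M new pieces are added, the simplification has to leave N - M pieces so that the updated set again holds N. The sketch below illustrates only that arithmetic. The name `update_training_set`, the `simplify(data, k)` callable, and the error raised when there is no room left are hypothetical, and the truncating stand-in used in the usage lines is a placeholder for a real pooling routine.

```python
def update_training_set(original, new_data, simplify):
    """Return an updated set with the same number of entries as `original`.

    `simplify(data, k)` is a hypothetical callable that reduces `data` to k
    entries, for example by clustering-based pooling.
    """
    k = len(original) - len(new_data)
    if k < 1:
        # Assumption: at least one pooled entry from the original set must remain.
        raise ValueError("more new data than the original set can absorb")
    return simplify(original, k) + list(new_data)


def keep_first(data, k):
    """Placeholder for a pooling routine: simply keeps the first k entries."""
    return list(data[:k])


old = [[1.0], [2.0], [3.0], [4.0]]
new = [[9.0]]
updated = update_training_set(old, new, keep_first)
print(len(old), len(updated))  # 4 4
```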

In some embodiments, method 4 may further include the following step: when training the original discriminant model, calculating, by the electronic computing device, an intra-class variance based on a diagonal covariance matrix.
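Using a diagonal covariance matrix means that only the per-feature variances are kept and the covariances between features are treated as zero. The sketch below shows that computation for a single class. The function name and the choice of the population variance (dividing by n rather than n - 1) are assumptions, since the text does not state which estimator is used.

```python
def diagonal_within_class_variance(samples):
    """Per-feature variance of one class, ignoring cross-feature covariance."""
    n = len(samples)
    columns = list(zip(*samples))  # one tuple of values per feature
    means = [sum(col) / n for col in columns]
    # Only the diagonal of the covariance matrix is kept: one variance per feature.
    return [
        sum((x - m) ** 2 for x in col) / n
        for col, m in zip(columns, means)
    ]


legit_user = [[1.0, 2.0], [3.0, 2.0]]
print(diagonal_within_class_variance(legit_user))  # [1.0, 0.0]
```

For d features this stores d values instead of a d-by-d matrix, which is the usual practical motivation for the diagonal form.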

Each embodiment of method 4 essentially corresponds to an embodiment of the electronic computing device 1. Therefore, based solely on the above description of the electronic computing device 1, a person having ordinary skill in the art to which the present invention pertains can fully understand and implement all corresponding embodiments of method 4, even though not every embodiment of method 4 is described in detail above.

In some embodiments, method 4 may be implemented as a computer program product. When the computer program product is loaded into an electronic computing device, a plurality of program instructions contained in the computer program product can execute method 4 described in the second embodiment. The computer program product may be stored in a non-transitory tangible machine-readable medium, such as (but not limited to) a read-only memory (ROM), a flash memory, a floppy disk, a portable hard disk, a magnetic tape, a network-accessible database, or any other storage medium that is well known to a person having ordinary skill in the art to which the present invention pertains and that has the same function. Because networks are now widespread, computer software can be stored on a recording medium or transmitted and provided directly over a network without being stored on a recording medium. A computer program product refers to any item that carries a computer-readable program, regardless of its external form.

The above embodiments are only used to exemplify some implementations of the present invention and to explain its technical features, not to limit its scope of protection. Any change or equivalent arrangement that a person having ordinary skill in the art to which the present invention pertains can readily accomplish falls within the scope claimed by the present invention, and the scope of protection of the present invention is defined by the claims.

The reference numerals are as follows:
1: Electronic computing device
11: Memory
111: Training data set
112: Discriminant model
12: Processor
13: Input element
14: Output element
2: Process
201~208: Actions
4: Method
401~402: Steps
A1: Warning message
BD1: Behavior data
U1: User

FIG. 1 is a schematic diagram depicting an electronic computing device for verifying user identity according to one or more embodiments of the present invention.
FIG. 2 is a flowchart depicting the retraining of a discriminant model and the verification of user identity according to one or more embodiments of the present invention.
FIG. 3 is a schematic diagram depicting the correlation of feature data in histogram form according to one or more embodiments of the present invention.
FIG. 4 is a flowchart depicting a method for updating a discriminant model according to one or more embodiments of the present invention.

None.

4: Method for updating a discriminant model
401~402: Steps

Claims

An electronic computing device for verifying user identity, comprising: a memory for storing a discriminant model and a training data set, wherein the training data set comprises a plurality of pieces of training data; and a processor, electrically connected to the memory, for executing the discriminant model to determine whether a user is a legitimate user based on at least one piece of behavior data, wherein the at least one piece of behavior data corresponds to behavior characteristics of the user; wherein the processor is further configured to: simplify an original training data set according to a clustering-based pooling algorithm, and add a plurality of pieces of new training data to the original training data set to generate the training data set; and train an original discriminant model with the training data set to generate the discriminant model; wherein, when simplifying the original training data set, the processor is further configured to: (A) determine a plurality of centroids among a plurality of pieces of training data; (B) divide the plurality of pieces of training data into a plurality of clusters according to the plurality of centroids, wherein the number of clusters is the same as the number of centroids; (C) for each cluster, sum and then average all the training data in the cluster to obtain a piece of average data for the cluster; (D) update the plurality of centroids with the plurality of pieces of average data; (E) repeat steps (B), (C), and (D) in sequence until the plurality of pieces of average data converge to be identical to the plurality of centroids; and (F) replace all the training data in each cluster with the average data of that cluster, thereby simplifying the original training data set.

The electronic computing device as described in claim 1, wherein, when dividing the plurality of pieces of training data into the plurality of clusters, the processor is further configured to: (B1) calculate a gap between each piece of training data in the original training data set and each of the centroids; and (B2) associate each piece of training data in the original training data set with the centroid having the smallest gap, thereby forming the plurality of clusters.

The electronic computing device as described in claim 1, wherein the plurality of pieces of training data are histogram data.

The electronic computing device as described in claim 1, wherein the original training data set and the training data set contain the same number of pieces of training data.

The electronic computing device as described in claim 1, wherein the processor is further configured to: calculate an intra-class variance based on a diagonal covariance matrix when training the original discriminant model and when making a determination based on the at least one piece of behavior data.

A method for updating a discriminant model, the discriminant model being used to verify the identity of a user of an electronic computing device, the method comprising the following steps: simplifying, by the electronic computing device, an original training data set according to a clustering-based pooling algorithm, and adding a plurality of pieces of new training data to the original training data set to generate a training data set, wherein the training data set includes a plurality of pieces of training data; and training, by the electronic computing device, an original discriminant model with the training data set to generate the discriminant model; wherein, when simplifying the original training data set, the method further comprises the following steps: (A) determining, by the electronic computing device, a plurality of centroids among the plurality of pieces of training data; (B) dividing, by the electronic computing device, the plurality of pieces of training data into a plurality of clusters according to the plurality of centroids, wherein the number of clusters is the same as the number of centroids; (C) for each cluster, summing and then averaging, by the electronic computing device, all the training data in the cluster to obtain a piece of average data for the cluster; (D) updating, by the electronic computing device, the plurality of centroids with the plurality of pieces of average data; (E) repeating, by the electronic computing device, steps (B), (C), and (D) in sequence until the plurality of pieces of average data converge to be identical to the plurality of centroids; and (F) replacing, by the electronic computing device, all the training data in each cluster with the average data of that cluster, thereby simplifying the original training data set.

The method as described in claim 6, wherein, when the plurality of pieces of training data in the original training data set are divided into the plurality of clusters, the method further comprises the following steps: (B1) calculating, by the electronic computing device, a gap between each piece of training data in the original training data set and each of the centroids; and (B2) associating, by the electronic computing device, each piece of training data in the original training data set with the centroid having the smallest gap, thereby forming the plurality of clusters.

The method as described in claim 6, wherein the plurality of training data are histogram data.

The method as described in claim 6, wherein the original training data set and the training data set have the same amount of training data.

The method as described in claim 6, further comprising the following step: when training the original discriminant model, calculating, by the electronic computing device, an intra-class variance based on a diagonal covariance matrix.

A computer program product which, after being loaded into an electronic computing device, causes the electronic computing device to execute the following instructions: simplifying an original training data set according to a clustering-based pooling algorithm, and adding a plurality of pieces of new training data to the original training data set to generate a training data set, wherein the training data set includes a plurality of pieces of training data; and training an original discriminant model with the training data set to generate a discriminant model for verifying the identity of a user of the electronic computing device; wherein, for when the electronic computing device simplifies the original training data set, the computer program product further includes the following instructions: (A) determining a plurality of centroids among the plurality of pieces of training data; (B) dividing the plurality of pieces of training data into a plurality of clusters according to the plurality of centroids, wherein the number of clusters is the same as the number of centroids; (C) for each cluster, summing and then averaging all the training data in the cluster to obtain a piece of average data for the cluster; (D) updating the plurality of centroids with the plurality of pieces of average data; (E) repeating steps (B), (C), and (D) in sequence until the plurality of pieces of average data converge to be identical to the plurality of centroids; and (F) replacing all the training data in each cluster with the average data of that cluster, thereby simplifying the original training data set.

The computer program product as described in claim 11, wherein, for when the electronic computing device divides the plurality of pieces of training data in the original training data set into the plurality of clusters, the computer program product further comprises the following instructions: (B1) calculating a gap between each piece of training data in the original training data set and each of the centroids; and (B2) associating each piece of training data in the original training data set with the centroid having the smallest gap, thereby forming the plurality of clusters.

The computer program product as described in claim 11, wherein the plurality of pieces of training data are histogram data.

The computer program product as described in claim 11, wherein the original training data set and the training data set contain the same number of pieces of training data.

The computer program product as described in claim 11, further comprising the following instruction: when the electronic computing device trains the original discriminant model, calculating an intra-class variance based on a diagonal covariance matrix.