TWI729412B - Fraud phone call detecting method and electronic device

Info

Publication number: TWI729412B
Application number: TW108120828A
Authority: TW
Inventors: 束宜鵬; 張守全; 林本源; 林錦誼; 林相宇; 羅湘盈; 劉育維; 古唐瑜; 陳靖姿; 邱佳宜
Original assignee: 遠傳電信股份有限公司
Priority date: 2019-06-17
Filing date: 2019-06-17
Publication date: 2021-06-01
Also published as: TW202101952A

Abstract

A fraud phone call detecting method and an electronic device are provided. The method includes: obtaining a plurality of voice call detail records; obtaining a plurality of call features of each of a plurality of phone numbers according to the plurality of voice call detail records; inputting the plurality of call features of each of the plurality of phone numbers to a trained model to obtain a determining result; and outputting the determining result, wherein the determining result indicates which of the plurality of phone numbers belong to fraud phone calls.

Description

Fraudulent call detection method and electronic device

The present invention relates to a fraudulent call detection method and an electronic device.

Fraudulent calls are an important concern in current public-safety work. However, there is currently no effective method for identifying whether a phone number is being used for fraudulent calls. How to identify whether a phone number is a fraudulent caller is therefore one of the problems that those skilled in the art seek to solve.

The present invention provides a fraudulent call detection method and an electronic device that can effectively identify phone numbers used for fraudulent calls by means of a trained model.

The present invention provides a fraudulent call detection method for an electronic device. The method includes: obtaining a plurality of voice call detail records; obtaining, according to the plurality of voice call detail records, a plurality of call features of each of a plurality of phone numbers; inputting the plurality of call features of each of the plurality of phone numbers into a trained model to obtain a judgment result; and outputting the judgment result, wherein the judgment result indicates which of the plurality of phone numbers belong to fraudulent calls.

The present invention provides an electronic device. The electronic device includes an input/output circuit and a processor. The processor is coupled to the input/output circuit. The input/output circuit obtains a plurality of voice call detail records. The processor obtains, according to the plurality of voice call detail records, a plurality of call features of each of a plurality of phone numbers. The processor inputs the plurality of call features of each of the plurality of phone numbers into a trained model to obtain a judgment result. The input/output circuit outputs the judgment result, wherein the judgment result indicates which of the plurality of phone numbers belong to fraudulent calls.

Based on the above, the fraudulent call detection method and the electronic device of the present invention can effectively identify phone numbers used for fraudulent calls by means of the trained model, thereby preventing at an early stage the harm that fraudulent calls cause to the public.

To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below in conjunction with the accompanying drawings.

Reference will now be made in detail to exemplary embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, elements/components with the same reference numbers in the drawings and embodiments represent the same or similar parts.

FIG. 1 is a block diagram of an electronic device according to an embodiment of the present invention.

Referring to FIG. 1, the electronic device 100 includes a processor 20, an input/output circuit 22, and a storage circuit 24. The input/output circuit 22 and the storage circuit 24 are each coupled to the processor 20. The electronic device 100 is, for example, an electronic or mobile device such as a desktop computer, a server, a mobile phone, a tablet computer, or a notebook computer, and is not limited thereto.

The processor 20 may be a central processing unit (CPU), or another programmable general-purpose or special-purpose microprocessor, a digital signal processor (DSP), a programmable controller, an application-specific integrated circuit (ASIC), another similar component, or a combination of the above components.

The input/output circuit 22 is, for example, an input interface or circuit for obtaining relevant data from outside the electronic device 100 or from other sources. The input/output circuit 22 may also be an output interface or circuit that transmits data generated by the electronic device 100 to other electronic devices, and is not limited thereto.

The storage circuit 24 may be any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, a similar component, or a combination of the above components.

In this exemplary embodiment, a plurality of code segments are stored in the storage circuit 24 of the electronic device 100, and after the code segments are installed they are executed by the processor 20. For example, the storage circuit 24 includes a plurality of modules, each composed of one or more code segments, and these modules respectively perform the operations of the fraudulent call detection method applied to the electronic device 100. However, the present invention is not limited thereto, and the operations of the electronic device 100 may also be implemented in other hardware forms.

FIG. 2 is a flowchart of model training according to an embodiment of the present invention.

Referring to FIG. 2, assume that the electronic device 100 operates in the equipment room of a telecommunications operator. When a phone number provided by the operator places a call, a corresponding call detail record (CDR) is stored in a database of the operator (also called the first database). When training the model, the processor 20 may, for example, first obtain from the first database the CDRs of phone numbers that were previously (for example, manually) identified as being used for fraudulent calls (also called the first CDRs). The processor 20 may then perform a feature extraction operation (step S201). In more detail, a CDR may include both data related to voice calls and data related to network use. The feature extraction operation here mainly extracts the voice-related data in the CDRs (also called the voice call detail records), which may include, for example, the phone number, the time of the call, the duration of the call, and the base station used during the call.
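As a non-limiting illustration, the feature extraction of step S201 can be sketched as a filter that keeps only the voice-related records. The record layout below (the `type`, `caller`, `callee`, `start`, `duration_s`, and `base_station` fields) is a hypothetical schema chosen for the example and is not specified by the description.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class VoiceRecord:
    caller: str          # calling phone number
    callee: str          # called phone number
    start: datetime      # time at which the call started
    duration_s: float    # call duration in seconds
    base_station: str    # base station used during the call


def extract_voice_records(cdrs):
    """Keep only voice-related CDRs and convert them to VoiceRecord objects."""
    records = []
    for row in cdrs:
        if row.get("type") != "voice":  # skip network-related records
            continue
        records.append(VoiceRecord(
            caller=row["caller"],
            callee=row["callee"],
            start=datetime.fromisoformat(row["start"]),
            duration_s=float(row["duration_s"]),
            base_station=row["base_station"],
        ))
    return records


# Hypothetical sample CDRs: one voice record and one network record.
SAMPLE_CDRS = [
    {"type": "voice", "caller": "0912345678", "callee": "0223456789",
     "start": "2019-06-17T09:05:00", "duration_s": 42, "base_station": "BS-01"},
    {"type": "data", "caller": "0912345678", "bytes": 1024},
]

voice = extract_voice_records(SAMPLE_CDRS)
```

Only the first sample row survives the filter, because the second row carries network data rather than a voice call.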

After the voice-related data (that is, the voice call detail records) are extracted from the first CDRs, the processor 20 may perform a feature pre-processing operation (step S203). Because a phone number generates one CDR each time it places a call, this step mainly performs a data aggregation operation that consolidates the voice call detail records belonging to the same phone number, thereby obtaining a plurality of call features for that phone number. The processor 20 may therefore, for example, perform the data aggregation operation on the voice call detail records of the phone numbers previously identified as fraudulent to obtain a plurality of call features for each such phone number (also called the first call features). The call features may include at least one of the following, measured for a phone number over a period of time (for example, one week): the number of calls per day, the time period of the calls, the average length of a single call, the standard deviation of the length of a single call, the average total talk time per day, the diversity of the called numbers, the number of base stations used, and the third to fourth digits of the calling number.
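As a non-limiting illustration, the data aggregation of step S203 amounts to grouping voice records by calling number before any per-number feature is computed. The dictionary keys used here are hypothetical.

```python
from collections import defaultdict


def aggregate_by_number(voice_records):
    """Group voice call detail records by the calling phone number."""
    grouped = defaultdict(list)
    for rec in voice_records:
        grouped[rec["caller"]].append(rec)
    return dict(grouped)


# Hypothetical records: two calls from one number, one call from another.
records = [
    {"caller": "0912345678", "duration_s": 40},
    {"caller": "0912345678", "duration_s": 35},
    {"caller": "0933111222", "duration_s": 300},
]

grouped = aggregate_by_number(records)
```

Each value in `grouped` is the list of records from which the call features of one phone number can then be derived.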

The number of calls per day is, for example, the average number of calls per day within one week. Because a phone number used for fraudulent calls usually places more calls per day than an ordinary, non-fraudulent phone number, the number of calls per day in the first call features is greater than a threshold (also called the first threshold). The present invention does not limit the value of the first threshold.
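As a non-limiting illustration, the number of calls per day can be computed from the call start times of one phone number. The value of `FIRST_THRESHOLD` is purely hypothetical, since the description does not fix the first threshold.

```python
from collections import Counter
from datetime import datetime


def avg_calls_per_day(call_starts, window_days=7):
    """Average number of outgoing calls per day over the observation window."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return len(call_starts) / window_days


def calls_by_date(call_starts):
    """Count the calls placed on each calendar day."""
    return Counter(ts.date() for ts in call_starts)


FIRST_THRESHOLD = 50.0  # hypothetical value, not taken from the description

# Hypothetical week: 60 calls per day, one per minute from 09:00 to 09:59.
starts = [datetime(2019, 6, 17 + d, 9, m) for d in range(7) for m in range(60)]
rate = avg_calls_per_day(starts)
exceeds_first_threshold = rate > FIRST_THRESHOLD
```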

The time period of the calls is, for example, the average time of the first call of each day and the average time of the last call of each day for a phone number over a period of time (for example, one week). Because fraudulent calls are usually placed between 8 a.m. and 6 p.m., the time period of the calls in the first call features falls within a specific period (for example, between 8 a.m. and 6 p.m.). The present invention does not limit what this specific period is.
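As a non-limiting illustration, the time period of the calls can be expressed as two averages in minutes after midnight. The 08:00 to 18:00 window follows the example in the description; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def call_period(call_starts):
    """Return (mean first-call time, mean last-call time) in minutes after midnight."""
    if not call_starts:
        raise ValueError("at least one call is required")
    by_day = defaultdict(list)
    for ts in call_starts:
        by_day[ts.date()].append(ts.hour * 60 + ts.minute)
    firsts = [min(minutes) for minutes in by_day.values()]
    lasts = [max(minutes) for minutes in by_day.values()]
    return sum(firsts) / len(firsts), sum(lasts) / len(lasts)


def within_window(first, last, start_min=8 * 60, end_min=18 * 60):
    """Check whether the average calling period lies inside the given window."""
    return start_min <= first and last <= end_min


# Hypothetical two-day sample.
starts = [
    datetime(2019, 6, 17, 9, 0), datetime(2019, 6, 17, 17, 0),
    datetime(2019, 6, 18, 10, 0), datetime(2019, 6, 18, 16, 0),
]
first, last = call_period(starts)   # 570.0 and 990.0, i.e. 09:30 and 16:30
in_office_hours = within_window(first, last)
```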

The average length of a single call is, for example, the sum of the durations of all calls placed by a phone number over a period of time (for example, one week) divided by the number of calls in that period. Because the average length of a single fraudulent call is usually lower than that of an ordinary, non-fraudulent call, the average length of a single call in the first call features is less than a threshold (also called the second threshold). The present invention does not limit the value of the second threshold.

The standard deviation of the length of a single call is, for example, the standard deviation computed over the durations of all calls placed by a phone number over a period of time (for example, one week). Because the standard deviation of the length of a single fraudulent call is usually lower than that of an ordinary, non-fraudulent call, the standard deviation of the length of a single call in the first call features is less than a threshold (also called the third threshold). The present invention does not limit the value of the third threshold.
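As a non-limiting illustration, the average and the standard deviation of the length of a single call can be computed with the Python standard library. The description does not say whether a population or a sample standard deviation is intended; the population form (`pstdev`) is assumed here.

```python
from statistics import mean, pstdev


def duration_stats(durations_s):
    """Return (mean, population standard deviation) of call durations in seconds."""
    if not durations_s:
        raise ValueError("at least one call duration is required")
    return mean(durations_s), pstdev(durations_s)


# Hypothetical durations: scripted calls of similar length versus varied ordinary calls.
scripted_mean, scripted_sd = duration_stats([30, 30, 30])
ordinary_mean, ordinary_sd = duration_stats([10, 20, 30])
```

Calls of identical length give a standard deviation of zero, which is the kind of low spread that the third threshold is meant to capture.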

The average total talk time per day is, for example, the sum of the durations of all calls placed by a phone number over a period of time (for example, one week) divided by the number of days in that period. Because the average total talk time per day of a fraudulent phone number is usually greater than that of an ordinary, non-fraudulent phone number, the average total talk time per day in the first call features is greater than a threshold (also called the fourth threshold). The present invention does not limit the value of the fourth threshold.
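As a non-limiting illustration, the average total talk time per day divides the summed durations by the number of days rather than by the number of calls. The sample figures are hypothetical and only show how a number can combine short individual calls with a large daily total.

```python
def avg_daily_talk_time(durations_s, window_days=7):
    """Total talk time in the window divided by the number of days, in seconds."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return sum(durations_s) / window_days


# Hypothetical week: 420 calls of 40 s each versus 14 calls of 300 s each.
many_short_calls = [40] * 420
few_long_calls = [300] * 14

busy_daily = avg_daily_talk_time(many_short_calls)   # 2400.0 s per day
quiet_daily = avg_daily_talk_time(few_long_calls)    # 600.0 s per day
```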

The diversity of the called numbers may be the number of regions in which the called numbers are located, or the number of groups (for example, groups distinguished by age or occupation) to which the owners of the called numbers belong. Because the diversity of the called numbers of fraudulent calls is usually greater than that of ordinary, non-fraudulent calls, the diversity of the called numbers in the first call features is greater than a threshold (also called the fifth threshold). The present invention does not limit the value of the fifth threshold.
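As a non-limiting illustration, the diversity of the called numbers can be counted as the number of distinct regions reached. The lookup by two-digit prefix and the `REGION_BY_PREFIX` table are assumptions made for the example; the description does not state how a region or group is determined.

```python
# Illustrative prefix table; a real deployment would use the operator's own data.
REGION_BY_PREFIX = {"02": "Taipei", "04": "Taichung", "07": "Kaohsiung"}


def region_of(number):
    """Map a called number to a region, or 'unknown' if the prefix is not listed."""
    return REGION_BY_PREFIX.get(number[:2], "unknown")


def callee_diversity(callees):
    """Number of distinct regions among the called numbers."""
    return len({region_of(n) for n in callees})


# Hypothetical called numbers: two in one region, one each in two others.
callees = ["0223456789", "0423456789", "0723456789", "0229876543"]
diversity = callee_diversity(callees)
```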

The number of base stations used may be the average number of base stations used by a phone number over a period of time (for example, one week). Because the equipment room of a fraud ring is usually set up at a fixed location, the proportion of base stations used by fraudulent calls (for example, the ratio between the number of calls placed and the number of base stations used by those calls) is usually smaller than the number of base stations used by ordinary, non-fraudulent calls. The number of base stations used in the first call features is therefore less than a threshold (also called the sixth threshold). The present invention does not limit the value of the sixth threshold.
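As a non-limiting illustration, the base-station feature can be computed from the list of base stations recorded for each call. Expressing the ratio as distinct base stations per call is one possible reading of the description and is an assumption of this sketch.

```python
def base_station_stats(stations):
    """Return (distinct base stations, distinct base stations per call)."""
    if not stations:
        raise ValueError("at least one call is required")
    distinct = len(set(stations))
    return distinct, distinct / len(stations)


# Hypothetical data: a caller at a fixed site versus a caller who moves around.
fixed_site = ["BS-01"] * 98 + ["BS-02"] * 2
moving_user = ["BS-01", "BS-07", "BS-03", "BS-07", "BS-12"]

fixed_count, fixed_ratio = base_station_stats(fixed_site)      # 2 and 0.02
moving_count, moving_ratio = base_station_stats(moving_user)   # 4 and 0.8
```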

The third to fourth digits of the calling number are, for example, the third and fourth digits of the calling number counted from left to right. Because a fraud ring may apply for a large batch of phone numbers with similar digits at one time, the third to fourth digits of the phone numbers used for fraudulent calls usually match a specific value. The present invention does not limit what this specific value is.
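As a non-limiting illustration, the third to fourth digits are a two-character slice of the calling number, and counting them across numbers shows when many numbers share the same value. The sample numbers are hypothetical.

```python
from collections import Counter


def third_fourth_digits(number):
    """Return the third and fourth digits of a phone number, counted from the left."""
    if len(number) < 4:
        raise ValueError("phone number is too short")
    return number[2:4]


# Hypothetical calling numbers, three of which share the same third and fourth digits.
numbers = ["0912345678", "0912345679", "0912345680", "0988123456"]
counts = Counter(third_fourth_digits(n) for n in numbers)
most_common_value, occurrences = counts.most_common(1)[0]
```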

The processor 20 then trains a model according to the first call features of each phone number identified as fraudulent (step S205) to generate and obtain the trained model (step S207). The model generated in step S207 can be used to predict whether a phone number is used for fraudulent calls.
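As a non-limiting illustration of steps S205 to S207, the sketch below fits a one-split decision stump in plain Python. This is a deliberately simplified stand-in for a decision-tree-based learner and is not the model of the embodiment; the feature layout, the sample values, and the use of non-fraudulent examples alongside the fraudulent ones are all assumptions of the example.

```python
def train_stump(X, y):
    """Fit a one-split stump that minimises training errors on labelled rows."""
    best = None  # (errors, feature index, threshold, fraud_if_greater)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for greater in (True, False):
                preds = [(row[f] > t) == greater for row in X]
                errors = sum(p != bool(label) for p, label in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, greater)
    _, f, t, greater = best
    return {"feature": f, "threshold": t, "fraud_if_greater": greater}


def predict_stump(model, row):
    """Return True if the stump classifies the feature row as fraudulent."""
    return (row[model["feature"]] > model["threshold"]) == model["fraud_if_greater"]


# Hypothetical features per number: [calls per day, mean call length in seconds].
X = [[120, 35], [95, 40], [4, 180], [7, 240]]
y = [1, 1, 0, 0]  # 1 = previously identified as fraudulent, 0 = not

model = train_stump(X, y)
```

On this toy data the stump splits on the calls-per-day feature; a practical system would use a full tree-ensemble library instead.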

In more detail, FIG. 3 is a flowchart of the use of the model according to an embodiment of the present invention.

Referring to FIG. 3, the processor 20 may, for example, first obtain from the first database the CDRs of the phone numbers to be identified. The processor 20 may then perform a feature extraction operation (step S301). As described above, a CDR may include both data related to voice calls and data related to network use. The feature extraction operation here mainly extracts the voice-related data in the CDRs (also called the voice call detail records), which may include, for example, the phone number, the time of the call, the duration of the call, and the base station used during the call. In this embodiment, the processor 20 also stores the extracted voice call detail records in another database (also called the second database) for subsequent processing and analysis.
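As a non-limiting illustration, storing the extracted voice records in the second database can be sketched with an in-memory SQLite database. The table name `voice_cdr` and its columns are hypothetical; the description does not specify the database type or schema.

```python
import sqlite3


def store_voice_records(conn, records):
    """Create the voice-record table if needed and insert the given rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS voice_cdr ("
        "caller TEXT, callee TEXT, start TEXT, duration_s REAL, base_station TEXT)"
    )
    conn.executemany("INSERT INTO voice_cdr VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()


# An in-memory database stands in for the second database.
conn = sqlite3.connect(":memory:")
store_voice_records(conn, [
    ("0912345678", "0223456789", "2019-06-17T09:05:00", 42.0, "BS-01"),
    ("0912345678", "0287654321", "2019-06-17T09:12:00", 38.0, "BS-01"),
])

stored = conn.execute("SELECT COUNT(*) FROM voice_cdr").fetchone()[0]
```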

After step S301, the processor 20 may perform a feature pre-processing operation (step S303). As described above, because a phone number generates one CDR each time it places a call, this step mainly performs a data aggregation operation that consolidates the voice call detail records belonging to the same phone number, thereby obtaining a plurality of call features for that phone number. The processor 20 may therefore, for example, perform the data aggregation operation on the voice call detail records of the phone numbers to be identified to obtain a plurality of call features for each of those phone numbers. Similarly, the call features may include at least one of the following, measured for a phone number over a period of time (for example, one week): the number of calls per day, the time period of the calls, the average length of a single call, the standard deviation of the length of a single call, the average total talk time per day, the diversity of the called numbers, the number of base stations used, and the third to fourth digits of the calling number.

The processor 20 may then input the call features of each phone number to be identified, obtained in step S303, into the trained model generated in FIG. 2 (step S307) to obtain a judgment result. The input/output circuit 22 then outputs the judgment result, which indicates which of the phone numbers submitted for identification belong to fraudulent calls.

It should be noted that the present invention uses a decision-tree-based machine learning algorithm (for example, LightGBM) to train the model. However, the present invention is not limited thereto, and in other embodiments other machine learning or deep learning algorithms may also be used to generate the model.

FIG. 4 is a schematic diagram of model prediction accuracy according to an embodiment of the present invention.

Referring to FIG. 4, in the matrix M1, column C1 represents the number of phone numbers that the model predicts (or identifies) as fraudulent, and column C2 represents the number that the model predicts (or identifies) as non-fraudulent. Row R1 represents the number of phone numbers that are actually fraudulent, and row R2 represents the number that are actually non-fraudulent. Columns C1 to C2 and rows R1 to R2 thus form four blocks BK1 to BK4 in the matrix M1, each holding a count of phone numbers, from which the prediction accuracy of the model can be evaluated. In this embodiment, assume that 40,545 phone numbers are judged. If the model generated with LightGBM is used for prediction, block BK1 contains 205 phone numbers, block BK2 contains 141, block BK3 contains 97, and block BK4 contains 40,102. The proportion of all phone numbers that the model correctly predicts as fraudulent or correctly predicts as non-fraudulent is called the accuracy, calculated as (205+40102)/(205+40102+97+141) = 99.4%. The proportion of the phone numbers predicted as fraudulent that are actually fraudulent is called the precision, calculated as 205/(205+97), which the original text reports as 68.6% (the quotient of these figures is approximately 67.9%). In particular, experiments on the present invention show that the model generated with LightGBM achieves relatively high accuracy and precision.
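As a non-limiting illustration, the two metrics can be recomputed from the four block counts. The mapping of BK1 to BK4 onto true positives, false negatives, false positives, and true negatives is inferred from the formulas in the description and is an assumption of this sketch.

```python
def accuracy(tp, fn, fp, tn):
    """Share of all phone numbers that the model classifies correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def precision(tp, fp):
    """Share of the numbers predicted as fraudulent that are actually fraudulent."""
    return tp / (tp + fp)


# Block counts from FIG. 4 (assumed layout: BK1=TP, BK2=FN, BK3=FP, BK4=TN).
BK1, BK2, BK3, BK4 = 205, 141, 97, 40102

total = BK1 + BK2 + BK3 + BK4          # 40545 phone numbers
acc = accuracy(BK1, BK2, BK3, BK4)     # about 0.994
prec = precision(BK1, BK3)             # about 0.679
```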

FIG. 5 is a flowchart of a fraudulent call detection method according to an embodiment of the present invention.

Referring to FIG. 5, in step S501, the input/output circuit 22 obtains a plurality of voice call detail records. In step S503, the processor 20 obtains, according to the voice call detail records, the call features of each of a plurality of phone numbers. In step S505, the processor 20 inputs the call features of each phone number into the trained model to obtain a judgment result, which indicates which of the phone numbers belong to fraudulent calls. Finally, in step S507, the input/output circuit 22 outputs the judgment result.
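As a non-limiting illustration, steps S501 to S507 can be chained in a single function. The three features computed, the record keys, and the rule used in place of the trained model are all hypothetical; any callable that maps a feature dictionary to a boolean could be substituted.

```python
from collections import defaultdict
from statistics import mean


def detect_fraud(voice_records, model, window_days=7):
    """Return the sorted phone numbers that the model judges to be fraudulent."""
    # S501: voice_records are the voice call detail records already obtained.
    # S503: aggregate records per calling number and derive call features.
    by_caller = defaultdict(list)
    for rec in voice_records:
        by_caller[rec["caller"]].append(rec)
    features = {
        number: {
            "calls_per_day": len(recs) / window_days,
            "mean_duration_s": mean(r["duration_s"] for r in recs),
            "base_stations": len({r["base_station"] for r in recs}),
        }
        for number, recs in by_caller.items()
    }
    # S505: feed each number's features to the trained model.
    judgment = {number: model(feats) for number, feats in features.items()}
    # S507: output the numbers judged to be fraudulent.
    return sorted(number for number, flagged in judgment.items() if flagged)


def toy_model(feats):
    """Hypothetical stand-in for the trained model."""
    return feats["calls_per_day"] > 5 and feats["mean_duration_s"] < 60


records = (
    [{"caller": "0912345678", "duration_s": 30, "base_station": "BS-01"}] * 70
    + [{"caller": "0933111222", "duration_s": 200, "base_station": "BS-09"}] * 3
)
flagged = detect_fraud(records, toy_model)
```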

In summary, the fraudulent call detection method and the electronic device of the present invention can effectively identify phone numbers used for fraudulent calls by means of the trained model, thereby preventing at an early stage the harm that fraudulent calls cause to the public.

Although the present invention has been disclosed by way of the above embodiments, they are not intended to limit the present invention. Anyone with ordinary knowledge in the relevant technical field may make minor changes and refinements without departing from the spirit and scope of the present invention, and the scope of protection of the present invention shall therefore be as defined by the appended claims.

100: electronic device; 20: processor; 22: input/output circuit; 24: storage circuit; S201~S207, S301~S307, S501~S507: steps; BK1~BK4: blocks; C1~C2: columns; R1~R2: rows; M1: matrix

FIG. 1 is a block diagram of an electronic device according to an embodiment of the present invention. FIG. 2 is a flowchart of model training according to an embodiment of the present invention. FIG. 3 is a flowchart of the use of the model according to an embodiment of the present invention. FIG. 4 is a schematic diagram of model prediction accuracy according to an embodiment of the present invention. FIG. 5 is a flowchart of a fraudulent call detection method according to an embodiment of the present invention.

S501~S507: steps

Claims

A fraudulent call detection method for an electronic device, the method comprising: obtaining a plurality of voice call detail records; obtaining, according to the plurality of voice call detail records, a plurality of call features of each of a plurality of phone numbers provided by a telecommunications company, wherein the plurality of call features include the average number of base stations used by a phone number within one week and the third to fourth digits of the calling number; inputting the plurality of call features of each of the plurality of phone numbers into a trained model to obtain a judgment result; and outputting the judgment result, wherein the judgment result indicates which of the plurality of phone numbers belong to fraudulent calls.

The fraudulent call detection method according to claim 1, wherein the step of obtaining the plurality of voice call detail records includes: extracting the plurality of voice call detail records from a plurality of call detail records stored in a first database, and storing the plurality of voice call detail records in a second database.

The fraudulent call detection method according to claim 1, wherein the step of obtaining, according to the plurality of voice call detail records, the plurality of call features of each of the plurality of phone numbers provided by the telecommunications company includes: performing a feature extraction operation and a data aggregation operation on the plurality of voice call detail records to obtain the plurality of call features of each of the plurality of phone numbers.

The fraudulent call detection method according to claim 1, wherein the plurality of call features further include at least one of the number of calls per day, the time period of the calls, the average length of a single call, the standard deviation of the length of a single call, the average total talk time per day, and the diversity of the called numbers.

The fraudulent call detection method according to claim 1, wherein before the step of inputting the plurality of call features of each of the plurality of phone numbers into the trained model, the method further includes: performing a feature extraction operation and a data aggregation operation on a plurality of first voice call detail records of phone numbers previously identified as fraudulent to obtain a plurality of first call features; and training a model according to the plurality of first call features to obtain the trained model.

The fraudulent call detection method according to claim 5, wherein, in the plurality of first call features, the number of calls per day is greater than a first threshold, the time period of the calls is within a specific period, the average length of a single call is less than a second threshold, the standard deviation of the length of a single call is less than a third threshold, the average total talk time per day is greater than a fourth threshold, the diversity of the called numbers is greater than a fifth threshold, the average number of base stations used is greater than a sixth threshold, and/or the third to fourth digits of the calling number match a specific value.

An electronic device comprising: an input/output circuit; and a processor coupled to the input/output circuit, wherein the input/output circuit obtains a plurality of voice call detail records, the processor obtains, according to the plurality of voice call detail records, a plurality of call features of each of a plurality of phone numbers provided by a telecommunications company, wherein the plurality of call features include the average number of base stations used by a phone number within one week and the third to fourth digits of the calling number, the processor inputs the plurality of call features of each of the plurality of phone numbers into a trained model to obtain a judgment result, and the input/output circuit outputs the judgment result, wherein the judgment result indicates which of the plurality of phone numbers belong to fraudulent calls.

The electronic device according to claim 7, wherein in the operation of obtaining the plurality of voice call detail records, the processor extracts the plurality of voice call detail records from a plurality of call detail records stored in a first database, and stores the plurality of voice call detail records in a second database through the input/output circuit.

The electronic device according to claim 7, wherein in the operation of obtaining, according to the plurality of voice call detail records, the plurality of call features of each of the plurality of phone numbers provided by the telecommunications company, the processor performs a feature extraction operation and a data aggregation operation on the plurality of voice call detail records to obtain the plurality of call features of each of the plurality of phone numbers.

The electronic device according to claim 7, wherein the plurality of call features further include at least one of the number of calls per day, the time period of the calls, the average length of a single call, the standard deviation of the length of a single call, the average total talk time per day, and the diversity of the called numbers.

The electronic device according to claim 7, wherein before the operation of inputting the plurality of call features of each of the plurality of phone numbers into the trained model, the processor performs a feature extraction operation and a data aggregation operation on a plurality of first voice call detail records of phone numbers previously identified as fraudulent to obtain a plurality of first call features, and the processor trains a model according to the plurality of first call features to obtain the trained model.

The electronic device according to claim 11, wherein, in the plurality of first call features, the number of calls per day is greater than a first threshold, the time period of the calls is within a specific period, the average length of a single call is less than a second threshold, the standard deviation of the length of a single call is less than a third threshold, the average total talk time per day is greater than a fourth threshold, the diversity of the called numbers is greater than a fifth threshold, the average number of base stations used is greater than a sixth threshold, and/or the third to fourth digits of the calling number match a specific value.