TWI434197B - Knowledge camouflage method - Google Patents

Info

Publication number
TWI434197B
Authority
TW
Taiwan
Prior art keywords
data
knowledge
original
protection
original data
Prior art date
Application number
TW99111192A
Other languages
Chinese (zh)
Other versions
TW201135507A (en)
Inventor
Tung Hsiao Chen
Jeanne Chen
Yuan Hung Kao
Original Assignee
Tung Hsiao Chen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tung Hsiao Chen
Priority to TW99111192A
Publication of TW201135507A
Application granted
Publication of TWI434197B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Description

Knowledge camouflage method

The present invention relates to a knowledge camouflage method, and more particularly to a knowledge camouflage method that uses data camouflage techniques to defeat data mining.

With the steady advance of information technology and data processing, enterprises have markedly improved their ability to collect, store, and apply data. With the rapid growth and spread of the Internet in particular, the knowledge economy has become global, and whoever holds knowledge may be able to create a new economy. In the era of the knowledge economy, an enterprise that can efficiently convert its large volumes of data into valuable knowledge can respond quickly to future developments and follow the pulse of the market, thereby raising its competitiveness.

However, relying only on manual work and personal experience to process large amounts of data is no longer practical; it has been replaced by knowledge extraction that combines artificial intelligence with automated tools. Data mining technology has therefore flourished and is widely valued and applied in both academia and industry.

Data mining is a specialized information technology. Its main purpose is to extract, from large volumes of data, information that is useful or interesting to the user and, together with the enterprise's domain know-how, to turn that information into knowledge that supports decision-making and helps managers run and develop the business. Depending on the type of data and the needs of the user, data mining techniques can be divided into six categories: clustering, classification, regression, time-series analysis, association rules, and sequence discovery.

Each of these techniques has its own characteristics and analytical functions. Users can choose different analysis tools according to their needs and mine different types of knowledge from large amounts of data. For example, classification studies the features of data whose classes are already known and then uses those features to predict the class of new data. Clustering instead uses the feature attributes of the data itself to find similarities between records and groups the records according to their degree of similarity.

Data mining is currently applied mainly in finance, distribution, manufacturing, and bioinformatics, and is also very useful in other fields. It helps users extract meaningful knowledge from data as a basis for decisions or analysis. Conversely, if such valuable data is accidentally lost or leaked, a malicious party can equally use data mining to extract the knowledge contained in the data for improper gain or malicious damage, causing the victim to suffer losses or even lose competitiveness. When using data mining, therefore, more attention should be paid to the knowledge security issues that it raises.

At the Oracle OpenWorld conference held in San Francisco in 2005, Oracle chief executive Larry Ellison stressed: "As more companies put business applications on the network and allow employees to connect to those applications from home or from branch offices around the world, the security risk to databases will rise. To reduce that risk, enterprises should encrypt their databases." Ellison also advised enterprises to prohibit users from making backups without encryption, because losing an unencrypted backup is equivalent to losing the enterprise's important information.

In the past, research on data mining focused mostly on improving mining techniques or on technology integration and innovation. How to protect the knowledge contained in data from being easily mined received little attention. Now that the Internet is well developed, most enterprises have connected their databases to it, which raises the security risk to their data. Security measures for this data still centre on access control and system risk management and do not consider how to protect the knowledge itself. In other words, anyone who is authorized to access the data can mine the knowledge in it. In fact, being authorized to access data and being authorized to know the enterprise's knowledge are entirely different levels of privilege. How to protect the knowledge contained in data while still using data mining is therefore the problem to be solved.

In view of the above problems of the prior art, the object of the present invention is to provide a knowledge camouflage method that solves the knowledge security problem of data mining.

According to an object of the present invention, a knowledge camouflage method is proposed. Original data is first input. At least one data camouflage method is used to add interference data to the original data or to scramble it, and modification information is recorded as the basis for removing the interference data and restoring the data. The modified original data is the protected data, which is output when it reaches a knowledge camouflage target.

The original data is a data set that contains knowledge.

The data camouflage method adds interference data or scrambles the original data by adding a plurality of modification records, or by scrambling the field or position order of the values of the original data.

The interference data is derived from the original data by formula, for example by an offset or by random oscillation, and is added to the original data to produce the protected data. The knowledge in the protected data then differs markedly from the knowledge in the original data, which camouflages the latter.

Scrambling the original data means rearranging and recombining the fields or positions of its values to produce the protected data. The knowledge in the protected data then differs markedly from the knowledge in the original data, which camouflages the latter.
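The scrambling variant can be illustrated with a small Python sketch. It is only one possible reading of the text: the function names `scramble_columns` and `restore_columns` are hypothetical, and the choice of a seeded field permutation as the recorded modification information is an assumption, not something the patent specifies.

```python
import random

def scramble_columns(rows, seed):
    """Permute the field order of every record (assumed scrambling scheme).

    Returns the protected rows and the permutation, which plays the role
    of the recorded modification information.
    """
    n_fields = len(rows[0])
    perm = list(range(n_fields))
    random.Random(seed).shuffle(perm)          # reproducible permutation
    protected = [[row[p] for p in perm] for row in rows]
    return protected, perm

def restore_columns(protected, perm):
    """Invert the recorded permutation to recover the original field order."""
    restored = []
    for row in protected:
        original = [None] * len(perm)
        for new_pos, old_pos in enumerate(perm):
            original[old_pos] = row[new_pos]
        restored.append(original)
    return restored

rows = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]
protected, perm = scramble_columns(rows, seed=7)
assert restore_columns(protected, perm) == rows
```

The values themselves are untouched; only their positions change, so the permutation alone is enough to undo the camouflage.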

The process of generating the protected data may be recorded in full, or a modification-generating formula may serve as the modification information, so that the protected data can be restored to the original data from the modification information.

When the protected data needs to be restored, the modification information is used to remove the interference data from the protected data, or to restore the fields or positions of the original values, so that the protected data is returned to the original data.

As described above, the knowledge camouflage method of the present invention may have one or more of the following advantages:

(1) It effectively camouflages the knowledge in the original data so that data mining techniques cannot analyse it correctly.

(2) When data camouflage is applied, the effect of the knowledge camouflage can be adjusted flexibly according to the user's needs.

(3) The knowledge camouflage is reversible: the protected data can be restored to the original data.

Refer to Figure 1, which is a flowchart of the knowledge camouflage method of the present invention. The method includes the following steps: (S10) input original data to be camouflaged by a data camouflage method; (S11) use a data camouflage method to scramble the original data, and record modification information as the changes are made; (S12) the original data, once camouflaged, becomes protected data; (S13) determine whether the protected data meets a termination condition, and if so go to step (S14), otherwise return to step (S11); and (S14) output the protected data.
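The control flow of steps S10 to S14 can be sketched as a small Python loop. This is a minimal illustration under stated assumptions: `camouflage`, `add_decoy`, and `mean_at_least_5` are hypothetical names, the decoy value and the mean-based target are placeholders, and the iteration cap stands in for an execution-count limit.

```python
def camouflage(original, disguise_step, reached_target, max_rounds=100):
    """Generic S10-S14 loop: keep modifying until the termination condition holds."""
    protected = list(original)                         # S10: input original data
    log = []                                           # recorded modification information
    for _ in range(max_rounds):                        # iteration cap (assumed)
        protected, change = disguise_step(protected)   # S11/S12: modify and record
        log.append(change)
        if reached_target(protected):                  # S13: termination condition
            break
    return protected, log                              # S14: output protected data

def add_decoy(data):
    """Hypothetical disguise step: append one fixed decoy value."""
    decoy = 10.0
    return data + [decoy], decoy

def mean_at_least_5(data):
    """Hypothetical camouflage target: the mean has been pushed to 5 or more."""
    return sum(data) / len(data) >= 5.0

protected, log = camouflage([1.0, 2.0, 3.0], add_decoy, mean_at_least_5)
restored = protected[:len(protected) - len(log)]       # undo using the log
assert restored == [1.0, 2.0, 3.0]
```

Because every change is logged, the original data can be recovered by removing exactly the records the log accounts for.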

In the above steps, the original data is a data set that contains knowledge. The data camouflage method derives, from the original data, the modification information used to generate interference data or to rearrange the original values, and the original data is modified into protected data according to that information. It is then determined whether the protected data meets the termination condition: if so, the protected data is output; if not, the method returns to the previous step and continues modifying the data until the termination condition is met. The ways in which the data camouflage method adds interference data or scrambles the original data include adding at least one modification field, adding a plurality of modification records, or modifying the field or position of the original values.

The data camouflage method derives by formula the modification information used to modify the original data and, according to that information, adds interference data or changes the fields or positions of the original values to produce the protected data. The process of generating the protected data may be recorded in full, or a modification-generating formula may serve as the modification information, so that the protected data can be restored to the original data from the modification information. The termination condition is either a threshold on the expected interference effect or a limit on the number of iterations, set by the user to meet the knowledge camouflage target.

The knowledge camouflage method is illustrated next with one algorithm as an example. Actual implementations are not limited to this method or to any particular data mining method, and repeated parts are not described again.

Refer to Figure 2, which is a flowchart of an embodiment of the present invention that performs data camouflage against a clustering algorithm. The method includes the following steps: (S20) input original data; (S21) use a clustering algorithm to obtain the cluster centroids of the original data and basic information; (S22) set a termination condition, such that generation of the protected data stops once the modified data satisfies it; (S23) generate modification information using a random number generator and a seed value; (S24) add the modification information to the original data at random positions to form protected data; (S25) apply the clustering algorithm again to obtain the cluster centroids of the protected data; (S26) determine whether the protected data meets the termination condition, and if so go to step (S28), otherwise go to step (S27); (S27) delete the modification information from the protected data, restoring it to the original data, and return to step (S23) to regenerate the modification information; and (S28) generate the modification information from the basic information of the original data, edit the original data with it, and produce and output the protected data.

In the above steps, the original data is first defined as $D$:

$D = \{d_i,\ i = 1, 2, 3, \ldots, n\}$

Analysis with the clustering algorithm yields the first set of cluster centroids:

$C = \{c_1, c_2, \ldots, c_k\}$

together with the number of records in each cluster, $cn_1, cn_2, \ldots, cn_k$, where $k$ is the number of clusters. Each centroid is:

$c_j = (c_{j1}, c_{j2}, \ldots, c_{jm})$

where $1 \le j \le k$. Because the clustering algorithm represents all the data points of a cluster by its centroid, a large amount of computation is saved.

The basic information $o_j$ is the data point of cluster $j$ that lies closest to its centroid $c_j$. That is, if $o_j = d_i$, then $d_i$ must satisfy $\lVert d_i - c_j \rVert = \min_{d \in \text{cluster } j} \lVert d - c_j \rVert$.
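Selecting the basic information for one cluster amounts to a nearest-point search. The sketch below assumes Euclidean distance, which the text does not state explicitly, and the function name `nearest_to_centroid` and the sample points are hypothetical.

```python
import math

def nearest_to_centroid(points, centroid):
    """Return the member of a cluster closest to its centroid (Euclidean distance assumed)."""
    return min(points, key=lambda p: math.dist(p, centroid))

# Hypothetical two-field records belonging to one cluster
cluster = [(5.1, 3.5), (4.9, 3.0), (5.0, 3.4)]
o_j = nearest_to_centroid(cluster, (5.0, 3.41))
```

Using an actual record rather than the centroid itself as the base means that anything generated from it starts from a value that really occurs in the data.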

The basic information $o_j$ serves as the base material for generating the modification data of cluster $j$. The offset $e_j$ of cluster $j$ is then computed from a user-specified offset ratio $r$ and the basic information $o_j$ by the following formula: [equation not preserved in this copy]

The iteration count is set to $t = 1$ and a centroid threshold $T$ is set. The increment multiplier $b_j$ for the number of modification records is set equal to $t$; once cluster $j$ has reached the modification target of the termination condition, $b_j$ no longer changes. The number of modification records required for each cluster is then computed as follows: [equation not preserved in this copy]

The result is the modification information for each cluster in the $t$-th iteration.

In one embodiment of the present invention, the modification information is denoted $Y$. $Y$ is added to the original data $D$ at random positions to produce the protected data $D'$, with $D' = D \cup Y$. The clustering algorithm is then applied again to obtain the second set of cluster centroids of $D'$, $C' = \{c'_1, c'_2, c'_3, \ldots, c'_k\}$. It is then determined whether the termination condition is met, that is, whether each $c'_j$ is very close to $(c_j + c_j \times r)$, for example $\lVert c'_j - (c_j + c_j \times r) \rVert \le T$ for $j \in [1, k]$. If the termination condition has not been met, the modification information $Y$ generated in this round is discarded, the iteration count $t$ is set to $t + 1$, and $Y$ is regenerated. Once the termination condition is met, the protected data $D'$ is output.
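The termination test compares each new centroid with its shifted target. The following Python sketch assumes Euclidean distance and assumes that the two centroid lists are given in matching cluster order; the function name `reached_target` and the sample values are illustrative only.

```python
import math

def reached_target(old_centroids, new_centroids, r, T):
    """True when every new centroid lies within T of its target c_j + c_j * r.

    Assumes both lists are in the same cluster order, which a real
    clustering run does not guarantee without matching labels first.
    """
    for c, c_new in zip(old_centroids, new_centroids):
        target = [x + x * r for x in c]       # shifted centroid position
        if math.dist(c_new, target) > T:
            return False
    return True

old = [(5.0, 3.41)]
done = reached_target(old, [(5.5, 3.75)], r=0.1, T=0.05)       # close to the target
not_done = reached_target(old, [(5.0, 3.41)], r=0.1, T=0.05)   # centroid has not moved
```

A smaller threshold forces the centroids closer to their targets, at the cost of more rounds of regenerating the modification information.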

The data camouflage design for the clustering algorithm includes a user-defined offset ratio $r$, with $r \neq 0$. It is used to compute the position to which each cluster centroid is to be shifted, which serves as the target for camouflaging the clustering knowledge. The user can adjust the value of $r$ as needed.

This design lets the user adjust the offset ratio $r$ flexibly and, from $r$ and the basic information $o_j$, generate modification information $Y$ that closely resembles the original data $D$. The effect is then evaluated by checking whether the newly computed cluster centroids have reached the offset target. If they have not, more modification information $Y$ is added so that each cluster centroid moves toward the offset target set by the user. This completes the data camouflage and ensures that the cluster centroids of the original data $D$, that is, its clustering knowledge, cannot be correctly extracted if the data leaks.

Refer to Figure 3, which is a flowchart of an embodiment of the present invention that removes the protection applied against a clustering algorithm. The method includes the following steps: (S30) regenerate the modification information using the original random number generator and the original seed value Seed; (S31) delete the modification information from the protected data; and (S32) according to the modification information, restore the protected data from which the modification information has been deleted to the original data.

In the above steps, the seed value Seed, the protected data $D'$, the offset ratio $r$, and the configured random number generator are used to regenerate the modification information $Y$ of the protected data $D'$, and the restored original data $D$ is then output.
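The seed-based restoration can be illustrated with the sketch below. It is an assumed scheme, not the patent's exact procedure: decoy records are drawn around a base record with a uniform spread, inserted at random positions, and later removed by replaying the same seeded generator. The names `make_decoys`, `protect`, `restore`, and the `spread` parameter are hypothetical.

```python
import random

def make_decoys(base, spread, count, rng):
    """Generate decoy records near a base record (assumed uniform noise)."""
    return [tuple(round(x + rng.uniform(-spread, spread), 2) for x in base)
            for _ in range(count)]

def protect(original, base, spread, count, seed):
    """Insert seeded decoys at seeded random positions."""
    rng = random.Random(seed)
    protected = list(original)
    for decoy in make_decoys(base, spread, count, rng):
        protected.insert(rng.randint(0, len(protected)), decoy)
    return protected

def restore(protected, base, spread, count, seed):
    """Replay the same generator to find and remove the decoys."""
    rng = random.Random(seed)
    decoys = make_decoys(base, spread, count, rng)     # same draws as in protect()
    size = len(protected) - count                      # length before any insertion
    positions = []
    for _ in range(count):
        positions.append(rng.randint(0, size))
        size += 1
    data = list(protected)
    removed = [data.pop(pos) for pos in reversed(positions)]
    if removed[::-1] != decoys:                        # wrong seed or altered data
        raise ValueError("protected data does not match the given seed")
    return data

original = [(5.1, 3.5), (4.9, 3.0), (5.0, 3.4)]
protected = protect(original, base=(5.0, 3.4), spread=0.5, count=4, seed=42)
assert restore(protected, base=(5.0, 3.4), spread=0.5, count=4, seed=42) == original
```

Only the seed and the generation parameters need to be kept secret; the decoys themselves never have to be stored, because they can be regenerated on demand.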

Refer to Figure 4, which is a table of the original data in this embodiment. It lists the clusters, centroid values, and per-cluster record counts of the Iris data (Iris Plants Database). In this embodiment the clustering parameter $k$ is 3, and the Iris cluster centroids are $C = \{c_1, c_2, c_3\}$. The data is first set as the original data $D$, with $n = 150$ records, and analysed with the clustering algorithm to obtain the correct centroids $C$. The parameters are then set: the offset ratio $r$ is 0.1, the total number of iterations is 100, and the Seed of the random number generator is chosen by the user.

Refer to Figure 5, which is a table of the knowledge camouflage effect in this embodiment. It shows the cluster centroids $C'$ obtained by re-clustering the modified, protected data $D'$. A total of 115 modification records were added to reach the centroid offset target. Compare the first modified centroid $c'_1$ with the original centroid $c_1$: the cluster of $c_1$ originally held 50 records, and 40 modification records were added to it, so that every field of $c'_1$ exceeds the corresponding field of $c_1$ by more than 0.1 times its value, as the following calculations show:

(5.71-5.00)/5.00=0.142>0.1

(3.84-3.41)/3.41=0.126>0.1

(1.74-1.46)/1.46=0.191>0.1

(0.31-0.24)/0.24=0.291>0.1

The same holds for the other two cluster centroids, $c'_2$ and $c'_3$, which confirms that the invention achieves the intended offset ratio $r = 0.1$. By setting the offset ratio $r$, the user can determine where the cluster centroids end up after camouflage. This effectively meets the user's actual needs and also lets the user choose a plausible offset direction, which protects the data through camouflage and misleads unauthorized users.
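The four ratios quoted for the first centroid can be rechecked in a few lines of Python. The field values are the ones given in the text; the variable names are illustrative.

```python
# Field values of the first cluster centroid before and after camouflage,
# as quoted in the text
before = [5.00, 3.41, 1.46, 0.24]
after = [5.71, 3.84, 1.74, 0.31]
r = 0.1  # offset ratio used in the embodiment

# Relative shift of each field
shifts = [(a - b) / b for a, b in zip(after, before)]

# Every field has moved by more than the offset ratio
assert all(s > r for s in shifts)
```

The check only confirms that each field moved by more than the chosen ratio; it says nothing about how close the centroid is to the exact target, which is what the threshold on the centroid distance controls.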

Refer to Figure 6, which is a table of example modification information in this embodiment. Because the modification information is based on the original record closest to each cluster centroid, plus random values within a specified range, it closely resembles the original data and is hard to tell apart from it. The figure lists three records each from the original data and from the added modification information so that the two can be compared. In every group the values of the same field are very close, so the generated modification information cannot easily be filtered out.

The above is illustrative only and not limiting. Any equivalent modification or change that does not depart from the spirit and scope of the present invention shall fall within the scope of the appended claims.

S10~S14: steps

S20~S28: steps

S30~S32: steps

Figure 1 is a flowchart of the knowledge camouflage method of the present invention; Figure 2 is a flowchart of the method of the present invention for data camouflage against a clustering algorithm; Figure 3 is a flowchart of the method of the present invention for removing that camouflage; Figure 4 is a table of the original data set in the embodiment; Figure 5 is a table of the knowledge camouflage effect in the embodiment; and Figure 6 is a table of example interference data in the embodiment.

Claims (9)

1. A knowledge camouflage method, comprising the following steps: inputting original data; using a data camouflage method to add interference data to the original data or to scramble the original data, and recording modification information as the basis for restoring the data; the data after data camouflage being protected data; and, when the protected data needs to be restored, using the recorded modification information to restore the protected data to the original data.

2. The knowledge camouflage method of claim 1, wherein the original data is a data set that contains knowledge.

3. The knowledge camouflage method of claim 1, wherein the ways of using the data camouflage method to add interference data or to scramble the original data include adding at least one modification field, adding a plurality of interference records, or modifying the field or position order of the values of the original data.

4. The knowledge camouflage method of claim 3, wherein the interference data is derived by formula from the original data so as to approximate the original data, and is added to the original data to produce the protected data.

5. The knowledge camouflage method of claim 3, wherein the original data is scrambled by modifying the fields or positions of its values to produce the protected data.

6. The knowledge camouflage method of claim 4 or 5, wherein the process of generating the protected data may be recorded in full, or a modification-generating formula may serve as the modification information, so that the protected data can be restored to the original data from the modification information.

7. The knowledge camouflage method of claim 4, wherein the protected data is a data set containing the original data and the interference data, and the original data itself is not modified, added to, or reduced.

8. The knowledge camouflage method of claim 5, wherein the protected data is obtained by rearranging or recombining the fields or positions of the values of the original data, and the values of the original data are not modified.

9. The knowledge camouflage method of claim 1, wherein, when the protected data is restored, the modification information is used to remove the interference data from the protected data or to restore the fields or positions of the values of the original data.
TW99111192A 2010-04-09 2010-04-09 Knowledge camouflage method TWI434197B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
TW99111192A (TWI434197B (en)) | 2010-04-09 | 2010-04-09 | Knowledge camouflage method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
TW99111192A (TWI434197B (en)) | 2010-04-09 | 2010-04-09 | Knowledge camouflage method

Publications (2)

Publication Number | Publication Date
TW201135507A (en) | 2011-10-16
TWI434197B (en) | 2014-04-11

Family

ID=46751918

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
TW99111192A (TWI434197B (en)) | Knowledge camouflage method | 2010-04-09 | 2010-04-09

Country Status (1)

Country | Link
TW (1) | TWI434197B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
TWI820007B (en) * | 2017-03-08 | 2023-11-01 | 香港商阿里巴巴集團服務有限公司 | Method and device for displaying contact information and method and device for displaying information

Also Published As

Publication number | Publication date
TW201135507A (en) | 2011-10-16

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees