CN108351946A - Systems and methods for anonymizing log entries - Google Patents

Info

Publication number
CN108351946A
Authority
CN
China
Prior art keywords
data
log entries
data field
anonymization
field
Prior art date
Legal status
Granted
Application number
CN201680062430.3A
Other languages
Chinese (zh)
Other versions
CN108351946B (en)
Inventor
M·斯珀特斯
W·E·索贝尔
Current Assignee
CA Inc
Original Assignee
Symantec Corp
Priority date
Filing date
Publication date
Application filed by Symantec Corp
Publication of CN108351946A
Application granted
Publication of CN108351946B
Current legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L63/105 Multiple levels of security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2101 Auditing as a secondary aspect

Landscapes

  • Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Debugging And Monitoring (AREA)
  • Storage Device Security (AREA)

Abstract

The disclosed computer-implemented method for anonymizing log entries may include (1) detecting a data pattern in a set of log entries that record events performed by at least one process executing on at least one device, (2) identifying, within the data pattern, at least one data field in the log entries that contains variable data, (3) evaluating the data field that contains variable data to determine whether the data field contains sensitive data, and (4) in response to determining whether the data field contains sensitive data, applying a data anonymization policy to the data field to anonymize the log entries. Various other methods, systems, and computer-readable media are also disclosed.

Description

Systems and methods for anonymizing log entries
Background
System operation logs (such as security system logs) generally contain valuable information about system operations. For example, a system administrator may monitor security logs to verify that a security system is operating normally, diagnose operational or performance problems, identify system vulnerabilities, identify the sources of security threats, and/or perform forensic analysis of security breaches. Administrators may also mine security log entries to discover new types of security threats. In addition, data analysts may mine system operation logs to analyze user behavior and/or system performance.
However, system operation logs often include sensitive information, such as personally identifiable information (PII) or infrastructure-related information (such as network addresses or server names). Unfortunately, this information may enable an attacker to map an internal network and search for weak points. Log information may also expose work schedules, personal relationships, or other information that can be exploited in social engineering attacks. Consequently, unprotected security logs may become an information source for targeted threats. The instant disclosure, therefore, identifies and addresses a need for additional and improved systems and methods for safeguarding log entries.
Summary
As will be described in greater detail below, the instant disclosure describes various systems and methods for anonymizing log entries by identifying fields in log entries that may contain sensitive information and then applying a data anonymization policy that anonymizes the sensitive information. The systems and methods described herein may use various machine learning techniques to identify sensitive information and to distinguish it from other variable data. The systems and methods described herein may also monitor data logs for new entries, determine whether the new entries contain sensitive information, and, upon identifying a new data field that contains sensitive information, anonymize existing log file entries.
In one example, a computer-implemented method for anonymizing log entries may include (1) detecting a data pattern in a set of log entries that record events performed by one or more processes executing on one or more devices, (2) identifying, within the data pattern, one or more data fields in the log entries that contain variable data, (3) evaluating the data fields that contain variable data to determine whether the data fields contain sensitive data, and (4) in response to determining whether the data fields contain sensitive data, applying a data anonymization policy to the data fields to anonymize the log entries.
In some examples, detecting the data pattern in the log entries may include performing a source template learning analysis on the log entries. In some examples, detecting the data pattern in the log entries may include performing a longest common subsequence analysis on the log entries. In one embodiment, the computer-implemented method may further include (1) receiving log entries from additional processes executing on one or more additional devices, (2) matching the log entries to a data pattern in a set of data patterns previously identified in log entries, (3) identifying the data anonymization policy that corresponds to the data pattern, and (4) anonymizing the log entries by applying the corresponding data anonymization policy.
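Steps (2) through (4) of that embodiment could look roughly like the following minimal sketch. It assumes, purely for illustration, that previously identified patterns are stored as templates with `<name>` placeholders for variable fields, each paired with a per-field replacement policy; `KNOWN_PATTERNS`, `compile_template`, and `anonymize_entry` are hypothetical names, and the sample templates are invented.

```python
import re

# Hypothetical pattern store: template with <field> placeholders, plus the
# replacement text the chosen anonymization policy assigns to each field.
KNOWN_PATTERNS = [
    ("Accepted password for <user> from <ip> port <port>", {"user": "<USER>", "ip": "<IP>"}),
    ("Disk <disk> is <pct> percent full", {}),  # no sensitive fields
]


def compile_template(template):
    """Turn 'text <field> text' into a regex with one named group per field."""
    parts = re.split(r"<(\w+)>", template)  # even indices: literal text, odd: field names
    regex = "".join(
        re.escape(part) if i % 2 == 0 else rf"(?P<{part}>\S+)"
        for i, part in enumerate(parts)
    )
    return re.compile(regex)


COMPILED = [(compile_template(t), policy) for t, policy in KNOWN_PATTERNS]


def anonymize_entry(line):
    """Match a new entry to a known pattern and apply that pattern's policy."""
    for regex, policy in COMPILED:
        match = regex.fullmatch(line)
        if match:
            out = line
            # Replace right-to-left so earlier character offsets stay valid.
            for name in sorted(match.groupdict(), key=match.start, reverse=True):
                if name in policy:
                    out = out[:match.start(name)] + policy[name] + out[match.end(name):]
            return out
    return None  # unknown format: hand off to pattern detection instead
```

For example, `anonymize_entry("Accepted password for alice from 10.0.0.5 port 22")` yields `"Accepted password for <USER> from <IP> port 22"`, while an entry matching no stored pattern returns `None`, which is where a fresh pattern-detection pass would take over.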
In one embodiment, the computer-implemented method may further include (1) determining a threshold number of privacy contexts in which a data pattern must be found for the data pattern to be considered anonymized, (2) detecting the data pattern in a set of privacy contexts, (3) determining that the number of privacy contexts containing the data pattern exceeds the privacy context threshold, and (4) in response to determining that the number of privacy contexts containing the data pattern exceeds the privacy context threshold, determining that the data pattern is anonymized.

In one embodiment, evaluating the data field determines that the data field contains sensitive data, and the data anonymization policy anonymizes the data field by (1) hashing the data field with a one-way hash, (2) encrypting the data field with reversible encryption, (3) replacing the data field with random data, (4) replacing the data field with static data, (5) removing the data field, and/or (6) generalizing the data field.
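The privacy-context threshold test described above resembles a k-anonymity check: a pattern seen in enough distinct contexts no longer points to any one of them. A minimal sketch follows, assuming that a "privacy context" is something like a distinct customer or organization identifier; the function name and sample data are hypothetical.

```python
from collections import defaultdict


def anonymized_patterns(observations, threshold):
    """Return the patterns found in more than `threshold` distinct privacy contexts.

    observations: iterable of (pattern, context) pairs, for example
    (field value, organization id). Repeats within one context count once.
    """
    contexts = defaultdict(set)
    for pattern, context in observations:
        contexts[pattern].add(context)
    return {p for p, ctxs in contexts.items() if len(ctxs) > threshold}
```

With observations `[("chrome.exe", "orgA"), ("chrome.exe", "orgB"), ("chrome.exe", "orgC"), ("payroll-db01", "orgA"), ("payroll-db01", "orgA")]` and a threshold of 2, only `"chrome.exe"` qualifies: it appears in three contexts, whereas the server name appears in just one and would still be treated as identifying.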
In one embodiment, evaluating the data field determines that the data field contains enumerated data and therefore does not contain sensitive data, and the data anonymization policy does not modify the data field. In another embodiment, evaluating the data field determines that the data field contains data of a known data type that does not include sensitive data, and the data anonymization policy does not modify the data field. In addition, evaluating a data field may determine that the data field now contains sensitive data, even if the data field was previously determined not to contain sensitive data. The data anonymization policy may then anonymize that data field in a set of existing log entries.
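The text does not say how enumerated data is recognized. One simple heuristic, assumed here, treats a field as enumerated when it takes only a few distinct values relative to the number of entries; both thresholds are hypothetical. The second function sketches the re-anonymization case: once a field is reclassified as sensitive, the policy is re-applied to entries already stored (modeled here as a list of dicts).

```python
def looks_enumerated(values, max_distinct=10, max_ratio=0.05):
    """Heuristic: few distinct values, both absolutely and relative to volume."""
    distinct = len(set(values))
    return distinct <= max_distinct and distinct <= max_ratio * len(values)


def reanonymize(entries, field, anonymize):
    """Re-apply an anonymization function to one field of stored entries."""
    return [{**e, field: anonymize(e[field])} if field in e else e for e in entries]
```

A status field alternating between `"OK"` and `"FAIL"` over 200 entries counts as enumerated, while 200 distinct IP addresses do not. Note that with very few entries the ratio test fails, so the heuristic withholds a verdict until enough data has been seen.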
In one embodiment, a system for implementing the above-described method may include several modules stored in memory, such as (1) a pattern module that detects a data pattern in a set of log entries that record events performed by one or more processes executing on one or more devices, (2) a field analysis module that identifies, within the data pattern, one or more data fields in the log entries that contain variable data, (3) a data analysis module that evaluates the data fields that contain variable data to determine whether the data fields contain sensitive data, and (4) an anonymization module that, in response to determining whether the data fields contain sensitive data, applies a data anonymization policy to the data fields to anonymize the log entries. The system may also include at least one physical processor configured to execute the pattern module, the field analysis module, the data analysis module, and the anonymization module.
In some examples, the above-described method may be encoded as computer-readable instructions on a non-transitory computer-readable medium. For example, a computer-readable medium may include one or more computer-executable instructions that, when executed by at least one processor of a computing device, may cause the computing device to (1) detect a data pattern in a set of log entries that record events performed by one or more processes executing on one or more devices, (2) identify, within the data pattern, one or more data fields in the log entries that contain variable data, (3) evaluate the data fields that contain variable data to determine whether the data fields contain sensitive data, and (4) in response to determining whether the data fields contain sensitive data, apply a data anonymization policy to the data fields to anonymize the log entries.
Features from any of the above-mentioned embodiments may be used in combination with one another in accordance with the general principles described herein. These and other embodiments, features, and advantages will be more fully understood upon reading the following detailed description in conjunction with the accompanying drawings and claims.
Brief description of the drawings
The accompanying drawings illustrate a number of exemplary embodiments and are a part of the specification. Together with the following description, these drawings demonstrate and explain various principles of the instant disclosure.
Fig. 1 is a block diagram of an exemplary system for anonymizing log entries.
Fig. 2 is a block diagram of an additional exemplary system for anonymizing log entries.
Fig. 3 is a flow diagram of an exemplary method for anonymizing log entries.
Fig. 4 is a block diagram of exemplary log data.
Fig. 5 is a block diagram of an exemplary computing system capable of implementing one or more of the embodiments described and/or illustrated herein.
Fig. 6 is a block diagram of an exemplary computing network capable of implementing one or more of the embodiments described and/or illustrated herein.
Throughout the drawings, identical reference characters and descriptions indicate similar, but not necessarily identical, elements. While the exemplary embodiments described herein are susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. However, the exemplary embodiments described herein are not intended to be limited to the particular forms disclosed. Rather, the instant disclosure covers all modifications, equivalents, and alternatives falling within the scope of the appended claims.
Detailed description
The present disclosure is generally directed to systems and methods for anonymizing log entries. As will be explained in greater detail below, by applying machine learning techniques, the systems and methods described herein may identify data fields that contain personally identifiable information or other sensitive data and anonymize those data fields by applying a selected data anonymization policy. Data anonymization policies may be customized in many ways, including by data type, by the desired level of security, by plans for future data mining of the log files, and so on. The systems and methods described herein may also apply data anonymization processes to measures beyond data field anonymization. In addition, the systems and methods described herein may continuously monitor new log entries for sensitive data in new or existing data fields and, upon identifying new sensitive information, re-apply the data anonymization policy to the existing set of security logs.
The following will provide, with reference to Fig. 1, Fig. 2, and Fig. 4, detailed descriptions of exemplary systems for anonymizing log entries. Detailed descriptions of corresponding computer-implemented methods will also be provided in connection with Fig. 3. In addition, detailed descriptions of an exemplary computing system and network architecture capable of implementing one or more of the embodiments described herein will be provided in connection with Fig. 5 and Fig. 6, respectively.
Fig. 1 is a block diagram of an exemplary system 100 for anonymizing log entries. As illustrated in this figure, exemplary system 100 may include one or more modules 102 for performing one or more tasks. For example, and as will be explained in greater detail below, exemplary system 100 may include a pattern module 104 that detects a data pattern in a plurality of log entries that record events performed by one or more processes executing on one or more devices. Exemplary system 100 may additionally include a field analysis module 106 that identifies, within the data pattern, one or more data fields in the log entries that contain variable data. Exemplary system 100 may also include a data analysis module 108 that evaluates the data fields that contain variable data to determine whether the data fields contain sensitive data. Exemplary system 100 may additionally include an anonymization module 110 that, in response to determining whether the data fields contain sensitive data, applies a data anonymization policy to the data fields to anonymize the log entries. Although illustrated as separate elements, one or more of modules 102 in Fig. 1 may represent portions of a single module or application.
In certain embodiments, one or more of modules 102 in Fig. 1 may represent one or more software applications or programs that, when executed by a computing device, may cause the computing device to perform one or more tasks. For example, and as will be described in greater detail below, one or more of modules 102 may represent software modules stored on and configured to run on one or more computing devices, such as the devices illustrated in Fig. 2 (e.g., computing device 202 and/or server 206), computing system 510 in Fig. 5, and/or portions of exemplary network architecture 600 in Fig. 6. One or more of modules 102 in Fig. 1 may also represent all or portions of one or more special-purpose computers configured to perform one or more tasks.
As illustrated in Fig. 1, exemplary system 100 may also include one or more databases, such as database 120. In one example, database 120 may be configured to store one or more system operation logs, data anonymization policy information, system log entry pattern data, and/or data that helps identify sensitive data (such as personally identifiable information).
Database 120 may represent portions of a single database or computing device or a plurality of databases or computing devices. For example, database 120 may represent a portion of server 206 in Fig. 2, computing system 510 in Fig. 5, and/or portions of exemplary network architecture 600 in Fig. 6. Alternatively, database 120 in Fig. 1 may represent one or more physically separate devices capable of being accessed by a computing device, such as server 206 in Fig. 2, computing system 510 in Fig. 5, and/or portions of exemplary network architecture 600 in Fig. 6.
Exemplary system 100 in Fig. 1 may be implemented in a variety of ways. For example, all or a portion of exemplary system 100 may represent portions of exemplary system 200 in Fig. 2. As shown in Fig. 2, system 200 may include a computing device 202 in communication with a server 206 via a network 204. In one example, computing device 202 may be programmed with one or more of modules 102 and/or may store all or a portion of the data in database 120. Additionally or alternatively, server 206 may be programmed with one or more of modules 102 and/or may store all or a portion of the data in database 120.
In one embodiment, one or more of module 102 of Fig. 1 is passing through at least the one of computing device 202 A processor and/or when being executed by server 206 so that computing device 202 and/or server 206 being capable of anonymization daily records Entry.For example, and as will be hereinafter described in greater detail, one or more of module 102 can make computing device 202 and/or 206 anonymization journal entries of server.For example, and as will be hereinafter described in greater detail, mode module 104 The data pattern 210 in multiple journal entries 208 can be detected, multiple journal entries record in one or more equipment by holding The event that capable one or more processes execute.Then field analysis module 106 can identify multiple days in data pattern 210 Include one or more data fields 212 of variable data in will entry 208.Then data analysis module 108 can assess packet Data field 212 containing variable data is to determine whether data field 212 includes sensitive data.Finally, anonymization module 110 can In response to determining whether data field 212 includes sensitive data, data anonymous strategy 214 is applied to data field 212 With the multiple journal entries 208 of anonymization, to create anonymization journal entries 216.
Computing device 202 generally represents any type or form of computing device capable of reading computer-executable instructions. Examples of computing device 202 include, without limitation, laptops, tablets, desktops, servers, cellular phones, personal digital assistants (PDAs), multimedia players, embedded systems, wearable devices (e.g., smart watches, smart glasses, etc.), gaming consoles, combinations of one or more of the same, exemplary computing system 510 in Fig. 5, or any other suitable computing device.
Server 206 generally represents any type or form of computing device capable of receiving, storing, and/or comparing data. Examples of server 206 include, without limitation, application servers and database servers configured to provide various database services and/or run certain software applications.
Network 204 generally represents any medium or architecture capable of facilitating communication or data transfer. Examples of network 204 include, without limitation, an intranet, a Wide Area Network (WAN), a Local Area Network (LAN), a Personal Area Network (PAN), the Internet, Power Line Communications (PLC), a cellular network (e.g., a Global System for Mobile Communications (GSM) network), exemplary network architecture 600 in Fig. 6, or the like. Network 204 may facilitate communication or data transfer using wireless or wired connections. In one embodiment, network 204 may facilitate communication between computing device 202 and server 206.
Fig. 3 is a flow diagram of an exemplary computer-implemented method 300 for anonymizing log entries. The steps shown in Fig. 3 may be performed by any suitable computer-executable code and/or computing system. In some embodiments, the steps shown in Fig. 3 may be performed by one or more of the components of system 100 in Fig. 1, system 200 in Fig. 2, computing system 510 in Fig. 5, and/or portions of exemplary network architecture 600 in Fig. 6.
As illustrated in Fig. 3, at step 302, one or more of the systems described herein may detect a data pattern in a plurality of log entries that record events performed by one or more processes executing on one or more devices. For example, pattern module 104 may, as part of computing device 202 in Fig. 2, detect data pattern 210 in log entries 208. Log files containing log entries 208 may reside on computing device 202, server 206, and/or one or more other computing devices. Furthermore, the log entries may be created by one or more processes executing on any of these devices.
Pattern module 104 may detect the data pattern in log entries in a variety of ways. For example, for some processes, particularly widely used programs, pattern module 104 may obtain the data patterns of log entries from program documentation or other publicly available sources. In other examples, pattern module 104 may apply a text analysis program to existing log entries to identify the fixed and variable portions of the log entries. This approach may suffice to identify data patterns for programs with few variable data fields or a small number of log entry formats.
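The simple fixed-versus-variable analysis described above can be approximated in a few lines. This minimal sketch assumes whitespace-tokenized entries and groups them by token count and first word (a hypothetical grouping key); within each group, a position where every entry agrees is treated as fixed text and any other position as a variable field. It suits the easy case the paragraph describes, and it will mis-group unrelated messages that happen to share a length and first word, which is why harder logs call for the techniques that follow.

```python
from collections import defaultdict


def infer_templates(entries):
    """Infer one template per group of structurally similar log entries.

    Entries are grouped by (token count, first token). Inside a group, a token
    position shared by every entry stays as fixed text; otherwise it becomes <*>.
    """
    groups = defaultdict(list)
    for entry in entries:
        tokens = entry.split()
        if tokens:
            groups[(len(tokens), tokens[0])].append(tokens)
    templates = []
    for rows in groups.values():
        template = [col[0] if len(set(col)) == 1 else "<*>" for col in zip(*rows)]
        templates.append(" ".join(template))
    return templates
```

Given the invented entries `"Disk sda1 is 91 percent full"`, `"Disk sdb2 is 97 percent full"`, `"User alice logged out"`, and `"User bob logged out"`, this yields `"Disk <*> is <*> percent full"` and `"User <*> logged out"`. A group with a single entry comes back entirely fixed, since nothing varies yet.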
For log files whose structure is not easily discerned, such as log files with a large number of log entry formats or with many variable data fields containing a mix of enumerated data and personally identifiable information, pattern module 104 may apply any of a variety of machine learning algorithms to detect the data patterns in the log entries. In some examples, pattern module 104 may detect the data pattern in the log entries by performing a source template learning analysis on the log entries. As used herein, the term "source template learning analysis" generally refers to a method for decomposing text that shares common word combinations into message variants.
Fig. 4 is a block diagram of exemplary log entry data 400 that pattern module 104 may use in a source template learning analysis to detect log entry formats. Log entry data 400 may include a set of log entries 402 that have previously been identified as sharing many words compared with other log entries. In this example, log entry variant tree 404 represents a data structure configured to represent the hierarchy of log entry formats that the set contains. In this structure, when a word appears in a message, the message may be associated with that word. For example, messages m1, m2, and m3 contain the word "available," so these messages are associated with the node in log entry variant tree 404 that represents the word "available." Given each parent node, pattern module 104 may find the most frequent word combination associated with its child nodes and make that combination a child node. Pattern module 104 may repeat this process until all messages have been associated. Pattern module 104 may then prune tree branches that contain many variants. For example, the messages in log entries 402 may include many IP addresses and virtual path numbers. Pattern module 104 may identify these variant groups as data fields that contain variable data.
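The patent describes the variant tree only at the level above, so the following is one possible reading rather than the described implementation. It assumes that inserting each message's words into a prefix tree in order of overall word frequency (the ordering trick used by FP-trees) stands in for "make the most frequent word combination a child node," and that a node with more than a hypothetical `max_children` children marks a variable field to prune. As in source template learning, word order within a message is ignored.

```python
from collections import Counter


def build_variant_tree(messages):
    """Build a prefix tree of words, inserting each message's most frequent words first.

    Words shared by many messages end up near the root; rare words such as
    addresses and counters end up in the lower branches.
    """
    freq = Counter(word for msg in messages for word in set(msg.split()))
    root = {}
    for msg in messages:
        # Descending overall frequency, ties broken alphabetically for determinism.
        words = sorted(set(msg.split()), key=lambda w: (-freq[w], w))
        node = root
        for word in words:
            node = node.setdefault(word, {})
    return root


def find_variable_nodes(tree, max_children=2, path=()):
    """Yield (path, variant_words) for nodes that branch into too many variants.

    Such a branch is pruned and reported as a variable data field; everything
    beneath it is treated as variable too.
    """
    if len(tree) > max_children:
        yield path, sorted(tree)
        return
    for word, child in tree.items():
        yield from find_variable_nodes(child, max_children, path + (word,))
```

For three invented messages of the form `"path 3 available at 10.0.0.5"`, the shared words `at`, `available`, and `path` form a single trunk, and the node after `path` branches into the three IP addresses, so it is reported as a variable field. The path numbers sit beneath that branch and are swept into the same pruned subtree.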
In some examples, pattern module 104 may detect the data pattern in the log entries by performing a longest common subsequence analysis on the log entries. As used herein, the term "longest common subsequence analysis" generally refers to an algorithm, used as the basis of several text analysis utilities, for finding the longest sequence of words common to a set of messages. Longest common subsequence analysis differs from source template learning analysis in that it takes into account the order in which words appear in the messages, whereas source template learning considers only whether words appear, regardless of order.
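A word-level longest common subsequence can be computed with the textbook dynamic-programming table. The sketch below is a generic instance of that algorithm, not code from the patent; the second helper, a hypothetical name, turns the shared subsequence into a template by collapsing each run of non-shared tokens into `<*>`.

```python
def lcs_words(a, b):
    """Return one longest common subsequence of two token lists."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover the tokens themselves.
    i = j = 0
    common = []
    while i < m and j < n:
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return common


def template_from_lcs(tokens, common):
    """Keep tokens that match the common subsequence in order; collapse the rest to <*>."""
    template = []
    k = 0
    for tok in tokens:
        if k < len(common) and tok == common[k]:
            template.append(tok)
            k += 1
        elif not template or template[-1] != "<*>":
            template.append("<*>")
    return template
```

For `"user alice logged in from 10.0.0.5"` and `"user bob logged in from 192.168.1.7"`, the common subsequence is `user logged in from`, giving the template `user <*> logged in from <*>`. Folding `lcs_words` over more than two entries with `functools.reduce` yields a subsequence common to all of them, though not necessarily the longest one.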
At step 304, one or more of the systems described herein may identify, within the data pattern, one or more data fields in the plurality of log entries that contain variable data. For example, field analysis module 106 may, as part of computing device 202 in Fig. 2, identify, within data pattern 210, one or more data fields 212 in log entries 208 that contain variable data.
Field analysis module 106 may identify data fields that contain variable data in a variety of ways. For example, field analysis module 106 may identify data fields that contain variable data as part of the analysis that pattern module 104 performs in step 302 to identify the data patterns in the set of log entries. For some programs, field analysis module 106 may identify data fields that contain variable data from program documentation. In other examples, field analysis module 106 may apply a text analysis program (such as the diff utility on UNIX or LINUX systems) to existing log entries to identify data fields that contain variable data.
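As a rough stand-in for running `diff` over log entries, Python's standard `difflib.SequenceMatcher` can compare two entries token by token and report the spans that differ. This is a minimal sketch: the function name is hypothetical, and comparing only two entries at a time is a simplification.

```python
import difflib


def variable_spans(entry_a, entry_b):
    """Return the (a_text, b_text) token spans that differ between two log entries."""
    a, b = entry_a.split(), entry_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete', or 'insert'
            spans.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return spans
```

Comparing `"Connection from 10.1.2.3 port 5512 accepted"` with `"Connection from 10.9.8.7 port 6001 accepted"` reports the address pair and the port pair as differing spans, which a field analysis step would then treat as variable data fields.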
In some examples, as previously mentioned, field analysis module 106 may use machine learning algorithms (such as source template learning analysis and longest common subsequence analysis) to identify data fields that contain variable data. For example, as shown in Fig. 4, a source template learning analysis may determine that the IP addresses and virtual path numbers in messages m1-m11 introduce many text-variant nodes into log entry variant tree 404 and therefore represent data fields that contain variable data.
At step 306, one or more of the systems described herein may evaluate the data fields that contain variable data to determine whether the data fields contain sensitive data. For example, data analysis module 108 may, as part of computing device 202 in Fig. 2, evaluate data fields 212 to determine whether data fields 212 contain sensitive data.
As used herein, the term "sensitive data" generally refers to private data whose public disclosure may cause harm to an individual or entity. Sensitive data may include personally identifiable information (PII), infrastructure-related data (such as internal IP addresses, user names, or server names), or data protected from disclosure by law, contract, or organizational policy.
Data analysis module 108 may determine that a data field containing variable data contains sensitive data in a variety of ways. For example, program documentation or other publicly available information may indicate that a particular data field in the log entries may contain sensitive data. In another example, data analysis module 108 may search a database or network directory service to determine whether a data field contains personally identifiable information, user names, server names, etc. In another example, data analysis module 108 may use network diagnostics to determine whether a data field contains network infrastructure information (such as IP addresses internal to an organization). Data analysis module 108 may determine that internal IP addresses constitute sensitive information, whereas external IP addresses do not.
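The internal-versus-external IP distinction can be sketched with Python's standard `ipaddress` module. This assumes the organization's internal address space is known; the RFC 1918 ranges below are a hypothetical default, and the `directory_users` set is a stand-in for a real directory-service query.

```python
import ipaddress

# Hypothetical internal address space; a real deployment would read this from
# network configuration or diagnostics rather than hard-coding RFC 1918 ranges.
INTERNAL_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_internal_ip(value):
    """Return True if value parses as an IP address inside an internal network."""
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return False  # not an IP address at all
    return any(addr in net for net in INTERNAL_NETWORKS)


def field_is_sensitive(values, directory_users):
    """Flag a field if any observed value is an internal IP or a known user name.

    directory_users stands in for a directory-service lookup (lowercase names).
    """
    return any(is_internal_ip(v) or v.lower() in directory_users for v in values)
```

Under these assumptions, `10.1.2.3` is flagged as internal and `8.8.8.8` is not, and a field containing `KPAULSEN` is flagged when that name appears in the directory set.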
At step 308, one or more of the systems described herein may, in response to determining whether the data fields contain sensitive data, apply a data anonymization policy to the data fields to anonymize the plurality of log entries. For example, anonymization module 110 may, as part of computing device 202 in Fig. 2, apply data anonymization policy 214 to data fields 212 in response to determining whether data fields 212 contain sensitive data, thereby anonymizing log entries 208 and producing anonymized log entries 216.
Anonymization module 110 may apply the data anonymization policy to the data field in a variety of ways. For example, anonymization module 110 may apply the same data anonymization policy to all sensitive data, or it may apply different data anonymization policies depending on the type of data. In one embodiment, the data field assessment may determine that the data field includes sensitive data. In this embodiment, the data anonymization policy may anonymize the data field using one or more data anonymization techniques. The choice of technique may depend on, for example, the level of security the data field requires, whether some of the information in the data field must be retained for later analysis of the log entries, or any other criterion.
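The selection criteria above (required security level, and whether information must survive for later analysis) could be encoded as a small decision function. The rule below is a hypothetical sketch, not the ordering prescribed by the disclosure: the enum, the function name, and the boolean requirement flags are all illustrative assumptions, and it covers only a subset of the techniques described in the following paragraphs.

```python
from enum import Enum


class Technique(Enum):
    LEAVE_UNCHANGED = "leave unchanged"
    REVERSIBLE_ENCRYPTION = "reversible encryption"
    ONE_WAY_HASH = "one-way hash"
    GENERALIZE = "generalize"
    REMOVE = "remove"


def select_technique(sensitive: bool, keep_correlation: bool,
                     allow_recovery: bool, keep_category: bool) -> Technique:
    """Pick a technique from hypothetical per-field requirements."""
    if not sensitive:
        return Technique.LEAVE_UNCHANGED
    if keep_correlation and allow_recovery:
        return Technique.REVERSIBLE_ENCRYPTION  # trusted analysts may decrypt
    if keep_correlation:
        return Technique.ONE_WAY_HASH  # same input always yields same token
    if keep_category:
        return Technique.GENERALIZE  # e.g. keep only the subnet
    return Technique.REMOVE  # nothing in the field is needed for analysis
```

Checking the least-protective option first and falling through to removal means that any field whose requirements are not stated ends up with the most conservative treatment.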
In one example, anonymization module 110 may anonymize the data field by transforming it with a one-way hash function. Using a one-way hash can facilitate later analysis of the log entries while protecting the sensitive data from disclosure. Because a hashing algorithm produces the same hash value for the same input every time it is applied, hashing the data preserves the information that every occurrence of a given hash value refers to the same source text, without disclosing that source text. For example, for the user name KPAULSEN, the MD5 hashing algorithm produces the hash value "d0d4742e5beb935cf3272c4e77215f18". A person analyzing the log entries later will recognize that this hash value refers to the same user in every instance, without learning the user name.
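A minimal sketch of hash-based anonymization with Python's `hashlib` and `hmac` modules follows. The log format `"<action> user=<name>"` and all function names are assumptions made for illustration. `hash_field` mirrors the unkeyed MD5 example in the text. `keyed_hash_field` is an alternative that is not described in the text: because user names are easy to guess, an unkeyed hash can be reversed by hashing candidate names, and a secret key prevents that.

```python
import hashlib
import hmac


def hash_field(value: str) -> str:
    """Unkeyed one-way hash (MD5, mirroring the example in the text)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def keyed_hash_field(key: bytes, value: str) -> str:
    """Keyed variant (HMAC-SHA256): guessed names cannot be checked without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_user_entries(entries: list[str], key: bytes) -> list[str]:
    """Hash the value after ' user=' in entries shaped like '<action> user=<name>'."""
    result = []
    for entry in entries:
        action, sep, user = entry.partition(" user=")
        if sep:  # the entry has a user field
            entry = f"{action} user={keyed_hash_field(key, user)}"
        result.append(entry)
    return result
```

After anonymization, both KPAULSEN entries carry the same token and the JDOE entry carries a different one. That is the correlation the paragraph says an analyst can rely on.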
In another example, anonymization module 110 may anonymize the data field by encrypting it with reversible encryption. As with hashing, anonymizing a data field with a reversible encryption algorithm can preserve the correspondence between the encrypted values and the source text without disclosing the source text, provided the algorithm always maps the same source text to the same ciphertext under a given key. Unlike hashing, however, reversible encryption allows a trusted data analyst who holds the appropriate encryption key to decrypt the ciphertext and recover the source text.
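To show the property this paragraph relies on (deterministic under a key, yet reversible by a key holder) using only the standard library, the sketch below builds a four-round Feistel network with an HMAC-SHA256 round function. It is a teaching sketch, not a vetted cipher. It leaks the length of each value, and it offers little protection for very short values. A real deployment would use an audited deterministic mode such as AES-SIV (RFC 5297) from a maintained cryptography library. All names here are hypothetical.

```python
import hashlib
import hmac

ROUNDS = 4  # an even count, so the final split point equals the initial one


def _round_fn(key: bytes, rnd: int, half: bytes, length: int) -> bytes:
    """Keyed round function: HMAC-SHA256 in counter mode, cut to `length` bytes."""
    out = b""
    counter = 0
    while len(out) < length:
        msg = bytes([rnd]) + counter.to_bytes(4, "big") + half
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_field(key: bytes, value: str) -> str:
    """Deterministically encrypt a field value; returns a hex token."""
    data = value.encode("utf-8")
    mid = len(data) // 2
    left, right = data[:mid], data[mid:]
    for rnd in range(ROUNDS):
        # Feistel step: (L, R) -> (R, L xor F(R))
        left, right = right, _xor(left, _round_fn(key, rnd, right, len(left)))
    return (left + right).hex()


def decrypt_field(key: bytes, token: str) -> str:
    """Invert encrypt_field for a holder of the same key."""
    data = bytes.fromhex(token)
    mid = len(data) // 2
    left, right = data[:mid], data[mid:]
    for rnd in reversed(range(ROUNDS)):
        # Undo (L, R) -> (R, L xor F(R)): L = R' xor F(L'), R = L'
        left, right = _xor(right, _round_fn(key, rnd, left, len(right))), left
    return (left + right).decode("utf-8")
```

Under one key, `encrypt_field(key, "KPAULSEN")` always returns the same token, so entries remain correlatable. `decrypt_field` with that key returns the original name.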
In another example, anonymization module 110 may anonymize the data field by replacing it with random data. In this way, anonymization module 110 can protect the sensitive data in the data field without maintaining a relationship between the anonymized data and the source data, as hashing does. Anonymizing a data field with random data still preserves the information that the data field contains variable data. As discussed above, a machine learning algorithm such as source template learning can identify variable data fields when analyzing log entries.
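Random replacement can be sketched with Python's `secrets` module, as shown below. The function name is hypothetical. Two design choices are assumptions rather than anything the text requires: each replacement is drawn fresh, so repeated values are not linked to each other, and the original length is kept so that the field still looks like variable data of the same shape. Keeping the length does disclose it, so a fixed-length token is the safer choice when length matters.

```python
import secrets


def random_replace(value: str) -> str:
    """Replace a value with fresh random hex of the same length (no link to source)."""
    # token_hex(n) returns 2*n hex characters; round up, then trim to length.
    return secrets.token_hex((len(value) + 1) // 2)[:len(value)]
```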
In another example, anonymization module 110 may anonymize the data field by generalizing it. Data generalization is an anonymization technique that replaces specific sensitive data with more general data identifying the category to which the specific data belongs, without disclosing the data itself. For example, anonymization module 110 may anonymize the internal IP address "208.65.13.15" as "208.65.13.XXX". A person analyzing the anonymized log data will be able to identify the subnet of the computing device but not the specific device. In another example, anonymization module 110 may replace a user name with the name of the user's department.
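Both generalization examples above can be sketched in a few lines. `generalize_ipv4` masks the host octet exactly as in the "208.65.13.XXX" example. The `DEPARTMENT_OF` table is a hypothetical stand-in for a directory-service lookup, and its entries are invented for illustration.

```python
import ipaddress

# Hypothetical stand-in for a directory service mapping users to departments.
DEPARTMENT_OF = {"KPAULSEN": "Finance", "JDOE": "Engineering"}


def generalize_ipv4(value: str) -> str:
    """Keep the first three octets of an IPv4 address and mask the host octet."""
    addr = ipaddress.IPv4Address(value)  # raises ValueError on malformed input
    octets = str(addr).split(".")
    return ".".join(octets[:3] + ["XXX"])


def generalize_user(user: str) -> str:
    """Replace a user name with its department, or a generic label if unknown."""
    return DEPARTMENT_OF.get(user, "Unknown department")
```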
Some simple anonymization technologies effectively anonymization sensitive data, but little or no be preserved for later The information of analysis.In one example, anonymization module 110 can carry out anonymization number by using static data replacement data field According to field.For example, anonymization module 110 can use character string " [IP address] " to replace IP address.In another example, anonymous Anonymization data field can be carried out simply by removing data field by changing module 110.
In one embodiment, the systems described herein may use a statistical heuristic to determine whether the data anonymization policy has achieved a desired level of anonymization. For example, the systems described herein may (1) determine a threshold number of privacy contexts in which a data pattern must be found for the data pattern to be considered anonymized, (2) detect the data pattern in a plurality of privacy contexts, (3) determine that the number of privacy contexts containing the data pattern exceeds the privacy context threshold, and (4) in response to determining that the number of privacy contexts containing the data pattern exceeds the privacy context threshold, determine that the data pattern is anonymized. As used herein, the term "privacy context" generally refers to an environment containing personal information that must be anonymized. For example, the log files from one business may constitute a privacy context. If a data pattern is found in a sufficient number of privacy contexts, the data pattern may be considered free of personally identifiable information and therefore sufficiently anonymized. For example, as part of computing device 202 in FIG. 2, anonymization module 110 may determine that a data pattern is sufficiently anonymized if the data pattern is found in 50% of the privacy contexts.
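The four-step heuristic above can be sketched as a count over privacy contexts. The sketch makes several assumptions. Each privacy context is modeled as a list of log lines keyed by an organization name. "Detecting" a pattern is reduced to a substring test. The threshold is expressed as a fraction of all contexts, defaulting to the 50% figure from the example. The choice to count a pattern that lands exactly on the threshold as anonymized is a judgment call.

```python
def is_anonymized(pattern: str, contexts: dict[str, list[str]],
                  min_fraction: float = 0.5) -> bool:
    """Return True if `pattern` occurs in at least `min_fraction` of the contexts."""
    if not contexts:
        return False  # no evidence either way; treat as not anonymized
    threshold = min_fraction * len(contexts)  # step 1: context threshold
    hits = sum(  # step 2: detect the pattern in each privacy context
        1 for entries in contexts.values()
        if any(pattern in entry for entry in entries)
    )
    return hits >= threshold  # steps 3 and 4: compare and decide
```

A pattern such as "service started" that appears in most organizations' logs passes. A user name that appears in only one organization's logs does not.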
In one embodiment, the data field assessment may determine that the data field contains enumerated data and therefore does not contain sensitive data. In this embodiment, the data anonymization policy may leave the data field unchanged. For example, as part of computing device 202 in FIG. 2, data analysis module 108 may determine that, although the data field contains variable data, the number of variants is relatively small, indicating that the field contains enumerated data with only a few possible values. For example, data analysis module 108 may determine that the data field contains only the values "installed", "refreshed", and "not installed", and that no further operation is needed to anonymize the data field.
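A simple way to express the "few variants" test is to count distinct values over a sample of entries. The sketch below uses hypothetical thresholds: at most five distinct values, judged over at least twenty observations. It also declines to decide on small samples, because a handful of entries cannot separate an enumeration from free text.

```python
def is_enumerated(values: list[str], max_distinct: int = 5,
                  min_observations: int = 20) -> bool:
    """Heuristic: few distinct values across many entries suggests an enum."""
    if len(values) < min_observations:
        return False  # too few samples to tell an enum from free text
    return len(set(values)) <= max_distinct
```

A status field that only ever takes three install-state values across thirty entries would be classified as enumerated and left unchanged. A field with a different user name in every entry would not.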
In another embodiment, the data field assessment may determine that the data field contains data of a known data type that does not include sensitive data. In this embodiment, the data anonymization policy may leave the data field unchanged. For example, program documentation or other publicly available sources may indicate that the data field contains data of a particular type that is not considered sensitive, in which case the data anonymization policy for the data field need not take any further action to anonymize it.
In one embodiment, the systems described herein may monitor a log file for newly added entries and apply data anonymization policies to anonymize those entries as needed. For example, the systems described herein may (1) receive an additional log entry from a process executing on an additional device, (2) match the additional log entry to a data pattern in a set of data patterns previously identified from the plurality of log entries, (3) identify the data anonymization policy that corresponds to the data pattern, and (4) anonymize the additional log entry by applying the corresponding data anonymization policy. For example, as part of computing device 202 in FIG. 2, pattern module 104 may receive a log entry from a device and determine that the log entry matches a previously identified data pattern. Anonymization module 110 may then apply the data anonymization policy that corresponds to the identified data pattern to anonymize the log entry.
In one embodiment, the data field assessment may determine that a data field now contains sensitive data, even though the data field was previously determined not to contain sensitive data. In this example, the data anonymization policy may anonymize the data field in a plurality of existing log entries. For example, while monitoring a log file for new entries, the systems described herein may identify sensitive data in a field previously determined to contain enumerated data or other non-sensitive data. Specifically, as part of computing device 202 in FIG. 2, anonymization module 110 may anonymize the data field in both new and existing log entries to achieve the desired level of data anonymization.
As explained in greater detail above, the systems and methods described herein may anonymize log entries by first identifying data fields that contain sensitive information and then applying a data anonymization policy to anonymize the sensitive data. The systems and methods described herein may apply machine learning algorithms or other techniques to identify data fields in log entries that contain variable data. The systems and methods described herein may also use a variety of techniques to identify sensitive data within those data fields. In addition, the systems and methods described herein may select data anonymization policies that provide various levels of data security or that facilitate later analysis of the log entries. The systems and methods described herein may also evaluate the data anonymization process to verify that it meets or exceeds a desired measure of data anonymization. Finally, the systems and methods described herein may continue to monitor log files in order to anonymize new log entries or to determine when existing log entries should be reprocessed to maintain the desired level of data anonymization.
FIG. 5 is a block diagram of an exemplary computing system 510 capable of implementing one or more of the embodiments described and/or illustrated herein. For example, all or a portion of computing system 510 may, alone or in combination with other elements, perform and/or be a means for performing one or more of the steps described herein (such as one or more of the steps illustrated in FIG. 3). All or a portion of computing system 510 may also perform and/or be a means for performing any other steps, methods, or processes described and/or illustrated herein.
Computing system 510 broadly represents any single-processor or multi-processor computing device or system capable of executing computer-readable instructions. Examples of computing system 510 include, without limitation, workstations, laptops, client-side terminals, servers, distributed computing systems, handheld devices, or any other computing system or device. In its most basic configuration, computing system 510 may include at least one processor 514 and a system memory 516.
Processor 514 generally represents any type or form of physical processing unit (e.g., a hardware-implemented central processing unit) capable of processing data or interpreting and executing instructions. In certain embodiments, processor 514 may receive instructions from a software application or module. These instructions may cause processor 514 to perform the functions of one or more of the exemplary embodiments described and/or illustrated herein.
System memory 516 generally represents any type or form of volatile or non-volatile storage device or medium capable of storing data and/or other computer-readable instructions. Examples of system memory 516 include, without limitation, Random Access Memory (RAM), Read Only Memory (ROM), flash memory, or any other suitable memory device. Although not required, in certain embodiments computing system 510 may include both a volatile memory unit (such as system memory 516) and a non-volatile storage device (such as primary storage device 532, as described in detail below). In one example, one or more of modules 102 from FIG. 1 may be loaded into system memory 516.
In certain embodiments, exemplary computing system 510 may also include one or more components or elements in addition to processor 514 and system memory 516. For example, as illustrated in FIG. 5, computing system 510 may include a memory controller 518, an Input/Output (I/O) controller 520, and a communication interface 522, each of which may be interconnected via a communication infrastructure 512. Communication infrastructure 512 generally represents any type or form of infrastructure capable of facilitating communication between one or more components of a computing device. Examples of communication infrastructure 512 include, without limitation, a communication bus (such as an Industry Standard Architecture (ISA), Peripheral Component Interconnect (PCI), PCI Express (PCIe), or similar bus) and a network.
Memory controller 518 generally represents any type or form of device capable of handling memory or data or of controlling communication between one or more components of computing system 510. For example, in certain embodiments memory controller 518 may control communication between processor 514, system memory 516, and I/O controller 520 via communication infrastructure 512.
I/O controller 520 generally represents any type or form of module capable of coordinating and/or controlling the input and output functions of a computing device. For example, in certain embodiments I/O controller 520 may control or facilitate the transfer of data between one or more elements of computing system 510, such as processor 514, system memory 516, communication interface 522, display adapter 526, input interface 530, and storage interface 534.
Communication interface 522 broadly represents any type or form of communication device or adapter capable of facilitating communication between exemplary computing system 510 and one or more additional devices. For example, in certain embodiments communication interface 522 may facilitate communication between computing system 510 and a private or public network including additional computing systems. Examples of communication interface 522 include, without limitation, a wired network interface (such as a network interface card), a wireless network interface (such as a wireless network interface card), a modem, and any other suitable interface. In at least one embodiment, communication interface 522 may provide a direct connection to a remote server via a direct link to a network, such as the Internet. Communication interface 522 may also provide such a connection indirectly through, for example, a local area network (such as an Ethernet network), a personal area network, a telephone or cable network, a cellular telephone connection, a satellite data connection, or any other suitable connection.
In certain embodiments, communication interface 522 may also represent a host adapter configured to facilitate communication between computing system 510 and one or more additional network or storage devices via an external bus or communication channel. Examples of host adapters include, without limitation, Small Computer System Interface (SCSI) host adapters, Universal Serial Bus (USB) host adapters, Institute of Electrical and Electronics Engineers (IEEE) 1394 host adapters, Advanced Technology Attachment (ATA), Parallel ATA (PATA), Serial ATA (SATA), and External SATA (eSATA) host adapters, Fibre Channel interface adapters, Ethernet adapters, and the like. Communication interface 522 may also allow computing system 510 to engage in distributed or remote computing. For example, communication interface 522 may receive instructions from a remote device or send instructions to a remote device for execution.
As illustrated in FIG. 5, computing system 510 may also include at least one display device 524 coupled to communication infrastructure 512 via a display adapter 526. Display device 524 generally represents any type or form of device capable of visually displaying information forwarded by display adapter 526. Similarly, display adapter 526 generally represents any type or form of device configured to forward graphics, text, and other data from communication infrastructure 512 (or from a frame buffer, as known in the art) for display on display device 524.
As illustrated in FIG. 5, exemplary computing system 510 may also include at least one input device 528 coupled to communication infrastructure 512 via an input interface 530. Input device 528 generally represents any type or form of input device capable of providing input, either computer-generated or human-generated, to exemplary computing system 510. Examples of input device 528 include, without limitation, a keyboard, a pointing device, a speech recognition device, or any other input device.
As illustrated in FIG. 5, exemplary computing system 510 may also include a primary storage device 532 and a backup storage device 533 coupled to communication infrastructure 512 via a storage interface 534. Storage devices 532 and 533 generally represent any type or form of storage device or medium capable of storing data and/or other computer-readable instructions. For example, storage devices 532 and 533 may be a magnetic disk drive (e.g., a so-called hard drive), a solid-state drive, a floppy disk drive, a magnetic tape drive, an optical disk drive, a flash drive, or the like. Storage interface 534 generally represents any type or form of interface or device for transferring data between storage devices 532 and 533 and other components of computing system 510. In one example, database 120 from FIG. 1 may be stored in primary storage device 532.
In certain embodiments, storage devices 532 and 533 may be configured to read from and/or write to a removable storage unit configured to store computer software, data, or other computer-readable information. Examples of suitable removable storage units include, without limitation, a floppy disk, a magnetic tape, an optical disk, a flash memory device, and the like. Storage devices 532 and 533 may also include other similar structures or devices that allow computer software, data, or other computer-readable instructions to be loaded into computing system 510. For example, storage devices 532 and 533 may be configured to read and write software, data, or other computer-readable information. Storage devices 532 and 533 may also be a part of computing system 510 or may be separate devices accessed through other interface systems.
Many other devices or subsystems may be connected to computing system 510. Conversely, not all of the components and devices illustrated in FIG. 5 need to be present to practice the embodiments described and/or illustrated herein. The devices and subsystems referenced above may also be interconnected in ways different from that shown in FIG. 5. Computing system 510 may also employ any number of software, firmware, and/or hardware configurations. For example, one or more of the exemplary embodiments disclosed herein may be encoded as a computer program (also referred to as computer software, a software application, computer-readable instructions, or computer control logic) on a computer-readable medium. As used herein, the term "computer-readable medium" generally refers to any form of device, carrier, or medium capable of storing or carrying computer-readable instructions. Examples of computer-readable media include, without limitation, transmission-type media, such as carrier waves, and non-transitory-type media, such as magnetic-storage media (e.g., hard disk drives, tape drives, and floppy disks), optical-storage media (e.g., Compact Disks (CDs), Digital Video Disks (DVDs), and Blu-ray Disks), electronic-storage media (e.g., solid-state drives and flash media), and other distribution systems.
A computer-readable medium containing a computer program may be loaded into computing system 510. All or a portion of the computer program stored on the computer-readable medium may then be stored in system memory 516 and/or various portions of storage devices 532 and 533. When executed by processor 514, a computer program loaded into computing system 510 may cause processor 514 to perform and/or be a means for performing the functions of one or more of the exemplary embodiments described and/or illustrated herein. Additionally or alternatively, one or more of the exemplary embodiments described and/or illustrated herein may be implemented in firmware and/or hardware. For example, computing system 510 may be configured as an Application-Specific Integrated Circuit (ASIC) adapted to implement one or more of the exemplary embodiments disclosed herein.
FIG. 6 is a block diagram of an exemplary network architecture 600 in which client systems 610, 620, and 630 and servers 640 and 645 may be coupled to a network 650. As detailed above, all or a portion of network architecture 600 may, alone or in combination with other elements, perform and/or be a means for performing one or more of the steps disclosed herein (such as one or more of the steps illustrated in FIG. 3). All or a portion of network architecture 600 may also be used to perform and/or be a means for performing other steps and features set forth in the present disclosure.
Client systems 610, 620, and 630 generally represent any type or form of computing device or system, such as exemplary computing system 510 in FIG. 5. Similarly, servers 640 and 645 generally represent computing devices or systems, such as application servers or database servers, configured to provide various database services and/or run certain software applications. Network 650 generally represents any telecommunication or computer network, including, for example, an intranet, a WAN, a LAN, a PAN, or the Internet. In one example, client systems 610, 620, and/or 630 and/or servers 640 and/or 645 may include all or a portion of system 100 from FIG. 1.
As illustrated in FIG. 6, one or more storage devices 660(1)-(N) may be directly attached to server 640. Similarly, one or more storage devices 670(1)-(N) may be directly attached to server 645. Storage devices 660(1)-(N) and storage devices 670(1)-(N) generally represent any type or form of storage device or medium capable of storing data and/or other computer-readable instructions. In certain embodiments, storage devices 660(1)-(N) and storage devices 670(1)-(N) may represent Network-Attached Storage (NAS) devices configured to communicate with servers 640 and 645 using various protocols, such as Network File System (NFS), Server Message Block (SMB), or Common Internet File System (CIFS).
Servers 640 and 645 may also be connected to a Storage Area Network (SAN) fabric 680. SAN fabric 680 generally represents any type or form of computer network or architecture capable of facilitating communication between a plurality of storage devices. SAN fabric 680 may facilitate communication between servers 640 and 645 and a plurality of storage devices 690(1)-(N) and/or an intelligent storage array 695. SAN fabric 680 may also facilitate, via network 650 and servers 640 and 645, communication between client systems 610, 620, and 630 and storage devices 690(1)-(N) and/or intelligent storage array 695 in such a manner that devices 690(1)-(N) and array 695 appear as locally attached devices to client systems 610, 620, and 630. As with storage devices 660(1)-(N) and storage devices 670(1)-(N), storage devices 690(1)-(N) and intelligent storage array 695 generally represent any type or form of storage device or medium capable of storing data and/or other computer-readable instructions.
In certain embodiments, and with reference to exemplary computing system 510 of FIG. 5, a communication interface, such as communication interface 522 in FIG. 5, may be used to provide connectivity between each of client systems 610, 620, and 630 and network 650. Client systems 610, 620, and 630 may be able to access information on server 640 or 645 using, for example, a web browser or other client software. Such software may allow client systems 610, 620, and 630 to access data hosted by server 640, server 645, storage devices 660(1)-(N), storage devices 670(1)-(N), storage devices 690(1)-(N), or intelligent storage array 695. Although FIG. 6 depicts the use of a network (such as the Internet) for exchanging data, the embodiments described and/or illustrated herein are not limited to the Internet or to any particular network-based environment.
In at least one embodiment, all or a portion of one or more of the exemplary embodiments disclosed herein may be encoded as a computer program and loaded onto and executed by server 640, server 645, storage devices 660(1)-(N), storage devices 670(1)-(N), storage devices 690(1)-(N), intelligent storage array 695, or any combination thereof. All or a portion of one or more of the exemplary embodiments disclosed herein may also be encoded as a computer program, stored in server 640, run by server 645, and distributed to client systems 610, 620, and 630 over network 650.
As detailed above, computing system 510 and/or one or more components of network architecture 600 may perform and/or be a means for performing, either alone or in combination with other elements, one or more steps of an exemplary method for anonymizing log entries.
While the foregoing disclosure sets forth various embodiments using specific block diagrams, flowcharts, and examples, each block diagram component, flowchart step, operation, and/or component described and/or illustrated herein may be implemented, individually and/or collectively, using a wide range of hardware, software, or firmware configurations (or any combination thereof). In addition, any disclosure of components contained within other components should be considered exemplary in nature, since many other architectures can be implemented to achieve the same functionality.
In some instances, all or part of of the exemplary system 100 in Fig. 1 can indicate cloud computing environment or be based on The part of the environment of network.Cloud computing environment can provide various services and applications by internet.These clothes based on cloud Business (such as software services, platform services, foundation structure services etc.) web browser or other remote interfaces can be passed through It accesses.Various functionality described herein can be provided by remote desktop environment or any other computing environment based on cloud.
In various embodiments, all or a portion of exemplary system 100 in FIG. 1 may facilitate multi-tenancy within a cloud-based computing environment. In other words, the software modules described herein may configure a computing system (e.g., a server) to facilitate multi-tenancy for one or more of the functions described herein. For example, one or more of the software modules described herein may program a server to enable two or more clients (e.g., customers) to share an application that is running on the server. A server programmed in this manner may share an application, operating system, processing system, and/or storage system among multiple customers (i.e., tenants). One or more of the modules described herein may also partition the data and/or configuration information of a multi-tenant application for each customer such that one customer cannot access the data and/or configuration information of another customer.
According to various embodiments, all or a portion of exemplary system 100 in FIG. 1 may be implemented within a virtual environment. For example, the modules and/or data described herein may reside and/or execute within a virtual machine. As used herein, the term "virtual machine" generally refers to any operating system environment that is abstracted from computing hardware by a virtual machine manager (e.g., a hypervisor). Additionally or alternatively, the modules and/or data described herein may reside and/or execute within a virtualization layer. As used herein, the term "virtualization layer" generally refers to any data layer and/or application layer that overlays and/or is abstracted from an operating system environment. A virtualization layer may be managed by a software virtualization solution (e.g., a file system filter) that presents the virtualization layer as though it were part of an underlying base operating system. For example, a software virtualization solution may redirect calls that are initially directed to locations within a base file system and/or registry to locations within a virtualization layer.
In some instances, all or part of of the exemplary system 100 in Fig. 1 can indicate the portion of mobile computing environment Point.Mobile computing environment can be realized that these equipment include mobile phone, tablet computer, electronics by a variety of mobile computing devices Book reader, personal digital assistant, wearable computing devices are (for example, computing device, smartwatch with head-mounted display Deng), etc..In some instances, mobile computing environment can have one or more distinguishing characteristics, including (for example) supply battery Electricity dependence, at any given time only present a foreground application, remote management feature, touch screen feature, (for example, By offers such as global positioning system, gyroscope, accelerometers) modification of position and movement data, limitation to system-level configuration And/or limitation third party software checks the restricted platform of the ability of the behavior of other applications, limits the installation of application program Control device (for example, only installation from approved application program shop application program), etc..It is as described herein each Kind function is provided to mobile computing environment and/or can be interacted with mobile computing environment.
In addition, all or a portion of exemplary system 100 in FIG. 1 may represent portions of, interact with, consume data produced by, and/or produce data consumed by one or more systems for information management. As used herein, the term "information management" may refer to the protection, organization, and/or storage of data. Examples of systems for information management may include, without limitation, storage systems, backup systems, archival systems, replication systems, high-availability systems, data search systems, virtualization systems, and the like.
In some embodiments, all or a portion of exemplary system 100 in FIG. 1 may represent portions of, produce data protected by, and/or communicate with one or more systems for information security. As used herein, the term "information security" may refer to the control of access to protected data. Examples of systems for information security may include, without limitation, systems providing managed security services, data loss prevention systems, identity authentication systems, access control systems, encryption systems, policy compliance systems, intrusion detection and prevention systems, electronic discovery systems, and the like.
According to some examples, all or a portion of exemplary system 100 in FIG. 1 may represent portions of, communicate with, and/or receive protection from one or more systems for endpoint security. As used herein, the term "endpoint security" may refer to the protection of endpoint systems from unauthorized and/or illegitimate use, access, and/or control. Examples of systems for endpoint protection may include, without limitation, anti-malware systems, user authentication systems, encryption systems, privacy systems, spam-filtering services, and the like.
The process parameters and sequence of steps described and/or illustrated herein are given by way of example only and can be varied as desired. For example, while the steps illustrated and/or described herein may be shown or discussed in a particular order, these steps do not necessarily need to be performed in the order illustrated or discussed. The various exemplary methods described and/or illustrated herein may also omit one or more of the steps described or illustrated herein, or include additional steps beyond those disclosed.
While various embodiments have been described and/or illustrated herein in the context of fully functional computing systems, one or more of these exemplary embodiments may be distributed as a program product in a variety of forms, regardless of the particular type of computer-readable media used to actually carry out the distribution. The embodiments disclosed herein may also be implemented using software modules that perform certain tasks. These software modules may include script, batch, or other executable files that may be stored on a computer-readable storage medium or in a computing system. In some embodiments, these software modules may configure a computing system to perform one or more of the exemplary embodiments disclosed herein.
In addition, one or more of the modules described herein may transform data, physical devices, and/or representations of physical devices from one form to another. For example, one or more of the modules recited herein may:

- receive operational log entries to be transformed;
- transform the log entries;
- output a result of the transformation that anonymizes the log entries;
- use the result of the transformation to anonymize one or more data logs;
- store the result of the transformation to protect personally identifiable information.

Additionally or alternatively, one or more of the modules recited herein may transform a processor, volatile memory, non-volatile memory, and/or any other portion of a physical computing device from one form to another by executing on the computing device, storing data on the computing device, and/or otherwise interacting with the computing device.
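The following minimal Python sketch illustrates the transformation described above. It applies the anonymization techniques listed in claims 6 and 15 to a single field value. The policy names (`hash`, `random`, `static`, `remove`, `generalize`, `keep`), the HMAC key, and the choice to generalize IPv4 addresses to a /24 network are assumptions made for illustration, not details taken from the patent. Reversible encryption is omitted because the Python standard library has no symmetric cipher. A third-party package such as `cryptography` (for example its `Fernet` class) could fill that role.

```python
import hashlib
import hmac
import ipaddress
import secrets


def apply_policy(value: str, policy: str, key: bytes = b"demo-key") -> str:
    """Anonymize one field value. Policy names and key are hypothetical."""
    if policy == "hash":
        # One-way keyed hash (HMAC-SHA256): deterministic, not reversible.
        return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    if policy == "random":
        # Random replacement: no linkage between entries with the same value.
        return secrets.token_hex(8)
    if policy == "static":
        # Static replacement with a fixed placeholder.
        return "<REDACTED>"
    if policy == "remove":
        # Drop the field's content entirely.
        return ""
    if policy == "generalize":
        # Generalize an IPv4 address to its /24 network (assumed granularity).
        return str(ipaddress.ip_network(f"{value}/24", strict=False))
    if policy == "keep":
        # Enumerated or known non-sensitive data is left unchanged.
        return value
    raise ValueError(f"unknown policy: {policy}")
```

The hash is keyed (HMAC) rather than applied to the bare value, and this is a deliberate design choice. A plain hash of a low-entropy value such as a username can often be reversed by hashing candidate names. The keyed form stays deterministic, so entries remain correlatable, without that weakness as long as the key stays secret.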
The preceding description has been provided to enable others skilled in the art to best utilize various aspects of the exemplary embodiments disclosed herein. This exemplary description is not intended to be exhaustive or to be limited to any precise form disclosed. Many modifications and variations are possible without departing from the spirit and scope of the present disclosure. The embodiments disclosed herein should be considered in all respects illustrative and not restrictive. Reference should be made to the appended claims and their equivalents in determining the scope of the present disclosure.
Unless otherwise noted, the terms "connected to" and "coupled to" (and their derivatives), as used in the specification and claims, are to be construed as permitting both direct and indirect (i.e., via other elements or components) connection. In addition, the terms "a" or "an," as used in the specification and claims, are to be construed as meaning "at least one of." Finally, for ease of use, the terms "including" and "having" (and their derivatives), as used in the specification and claims, are interchangeable with and have the same meaning as the word "comprising."

Claims (20)

1. A computer-implemented method for anonymizing log entries, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
detecting a data pattern in a plurality of log entries, the plurality of log entries recording events performed by at least one process executing on at least one device;
identifying, within the data pattern, at least one data field in the plurality of log entries that contains variable data;
evaluating the data field that contains variable data to determine whether the data field contains sensitive data; and
applying, in response to determining whether the data field contains sensitive data, a data anonymization policy to the data field to anonymize the plurality of log entries.
2. The computer-implemented method of claim 1, wherein detecting the data pattern in the plurality of log entries comprises performing a message template learning analysis on the plurality of log entries.
3. The computer-implemented method of claim 1, wherein detecting the data pattern in the plurality of log entries comprises performing a longest common subsequence analysis on the plurality of log entries.
4. The computer-implemented method of claim 1, further comprising:
receiving a log entry for an additional process executing on an additional device;
matching the log entry to a data pattern within a set of data patterns previously identified in the plurality of log entries;
identifying a data anonymization policy that corresponds to the data pattern; and
anonymizing the log entry by applying the corresponding data anonymization policy.
5. The computer-implemented method of claim 1, further comprising:
determining a threshold number of privacy contexts in which the data pattern must be found for the data pattern to be considered anonymized;
detecting the data pattern in a plurality of privacy contexts;
determining that the number of privacy contexts containing the data pattern exceeds the privacy-context threshold; and
in response to determining that the number of privacy contexts containing the data pattern exceeds the privacy-context threshold, determining that the data pattern is anonymized.
6. The computer-implemented method of claim 1, wherein:
the evaluation of the data field determines that the data field contains sensitive data; and
the data anonymization policy anonymizes the data field by at least one of:
encrypting the data field using a one-way hash;
encrypting the data field using reversible encryption;
replacing the data field with random data;
replacing the data field with static data;
removing the data field; and
generalizing the data field.
7. The computer-implemented method of claim 1, wherein:
the evaluation of the data field determines that the data field contains enumerated data and therefore does not contain sensitive data; and
the data anonymization policy leaves the data field unchanged.
8. The computer-implemented method of claim 1, wherein:
the evaluation of the data field determines that the data field contains data of a known data type that does not include sensitive data; and
the data anonymization policy leaves the data field unchanged.
9. The computer-implemented method of claim 1, wherein:
the evaluation of the data field determines that the data field now contains sensitive data, even though the data field was previously determined not to contain sensitive data; and
the data anonymization policy anonymizes the data field in a plurality of existing log entries.
10. A system for anonymizing log entries, the system comprising:
a pattern module, stored in memory, that detects a data pattern in a plurality of log entries, the plurality of log entries recording events performed by at least one process executing on at least one device;
a field analysis module, stored in memory, that identifies, within the data pattern, at least one data field in the plurality of log entries that contains variable data;
a data analysis module, stored in memory, that evaluates the data field that contains variable data to determine whether the data field contains sensitive data;
an anonymization module, stored in memory, that, in response to determining whether the data field contains sensitive data, applies a data anonymization policy to the data field to anonymize the plurality of log entries; and
at least one physical processor configured to execute the pattern module, the field analysis module, the data analysis module, and the anonymization module.
11. The system of claim 10, wherein the pattern module detects the data pattern in the plurality of log entries by performing a message template learning analysis on the plurality of log entries.
12. The system of claim 10, wherein the pattern module detects the data pattern in the plurality of log entries by performing a longest common subsequence analysis on the plurality of log entries.
13. The system of claim 10, further comprising:
receiving a log entry for an additional process executing on an additional device;
matching the log entry to a data pattern within a set of data patterns previously identified in the plurality of log entries;
identifying a data anonymization policy that corresponds to the data pattern; and
anonymizing the log entry by applying the corresponding data anonymization policy.
14. The system of claim 10, further comprising:
determining a threshold number of privacy contexts in which the data pattern must be found for the data pattern to be considered anonymized;
detecting the data pattern in a plurality of privacy contexts;
determining that the number of privacy contexts containing the data pattern exceeds the privacy-context threshold; and
in response to determining that the number of privacy contexts containing the data pattern exceeds the privacy-context threshold, determining that the data pattern is anonymized.
15. The system of claim 10, wherein:
the evaluation of the data field determines that the data field contains sensitive data; and
the data anonymization policy anonymizes the data field by at least one of:
encrypting the data field using a one-way hash;
encrypting the data field using reversible encryption;
replacing the data field with random data;
replacing the data field with static data;
removing the data field; and
generalizing the data field.
16. The system of claim 10, wherein:
the evaluation of the data field determines that the data field contains enumerated data and therefore does not contain sensitive data; and
the data anonymization policy leaves the data field unchanged.
17. The system of claim 10, wherein:
the evaluation of the data field determines that the data field contains data of a known data type that does not include sensitive data; and
the data anonymization policy leaves the data field unchanged.
18. The system of claim 10, wherein:
the evaluation of the data field determines that the data field now contains sensitive data, even though the data field was previously determined not to contain sensitive data; and
the data anonymization policy anonymizes the data field in a plurality of existing log entries.
19. A non-transitory computer-readable medium comprising one or more computer-readable instructions that, when executed by at least one processor of a computing device, cause the computing device to:
detect a data pattern in a plurality of log entries, the plurality of log entries recording events performed by at least one process executing on at least one device;
identify, within the data pattern, at least one data field in the plurality of log entries that contains variable data;
evaluate the data field that contains variable data to determine whether the data field contains sensitive data; and
in response to determining whether the data field contains sensitive data, apply a data anonymization policy to the data field to anonymize the plurality of log entries.
20. The non-transitory computer-readable medium of claim 19, wherein the one or more computer-readable instructions cause the computing device to detect the data pattern in the plurality of log entries by performing a message template learning analysis on the plurality of log entries.
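The method of claims 1 and 3 can be illustrated with a minimal Python sketch. It is not the patented implementation. Whitespace tokenization, the greedy alignment of each entry against the learned template, the `max_enum` threshold, the numeric-type rule, and the `anon-` token format are all assumptions made for illustration.

The sketch works in four steps:

1. It learns a template of constant tokens by folding a longest common subsequence over the entries (claim 3).
2. It treats the gaps between template tokens as the variable data fields (claim 1).
3. It evaluates each field with simple heuristics that echo claims 7 and 8.
4. It replaces sensitive fields with a keyed one-way hash, one of the options in claim 6.

`pattern_is_anonymous` is a hypothetical helper mirroring the privacy-context threshold of claim 5.

```python
import hashlib
import hmac
import re
from typing import Dict, List, Sequence

NUMERIC = re.compile(r"^\d+(ms|s)?$")  # assumed "known non-sensitive" data type


def lcs(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Longest common subsequence of two token lists (dynamic programming)."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]  # dp[i][j] = LCS length of a[i:], b[j:]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            dp[i][j] = dp[i + 1][j + 1] + 1 if a[i] == b[j] else max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def split_fields(tokens: List[str], template: List[str]) -> Dict[int, str]:
    """Greedily align tokens to the template; gap k holds the variable
    tokens found just before template[k]. (Greedy matching can misalign
    when a variable token equals the next constant token.)"""
    fields: Dict[int, List[str]] = {}
    k = 0
    for tok in tokens:
        if k < len(template) and tok == template[k]:
            k += 1
        else:
            fields.setdefault(k, []).append(tok)
    return {gap: " ".join(toks) for gap, toks in fields.items()}


def render(template: List[str], fields: Dict[int, str]) -> str:
    """Rebuild an entry: gap k is emitted just before template token k."""
    out: List[str] = []
    for k in range(len(template) + 1):
        if fields.get(k):
            out.append(fields[k])
        if k < len(template):
            out.append(template[k])
    return " ".join(out)


def is_sensitive(values: List[str], max_enum: int = 3) -> bool:
    """Heuristic evaluation with assumed thresholds (not from the patent)."""
    if len(set(values)) <= max_enum:
        return False  # few distinct values: treat as enumerated data (cf. claim 7)
    if all(NUMERIC.match(v) for v in values):
        return False  # known non-sensitive data type (cf. claim 8)
    return True  # otherwise err on the side of anonymizing


def anonymize(entries: List[str], key: bytes = b"demo-key") -> List[str]:
    if not entries:
        return []
    token_lists = [e.split() for e in entries]
    template = token_lists[0]
    for toks in token_lists[1:]:
        template = lcs(template, toks)  # constant tokens shared by every entry
    parsed = [split_fields(t, template) for t in token_lists]
    gaps = {g for p in parsed for g in p}
    sensitive = {g for g in gaps if is_sensitive([p[g] for p in parsed if g in p])}

    def mask(v: str) -> str:
        # Keyed one-way hash: the same input maps to the same token.
        return "anon-" + hmac.new(key, v.encode("utf-8"), hashlib.sha256).hexdigest()[:8]

    return [render(template, {g: mask(v) if g in sensitive else v for g, v in p.items()})
            for p in parsed]


def pattern_is_anonymous(contexts: Sequence[str], threshold: int) -> bool:
    """cf. claim 5: a pattern seen in more than `threshold` distinct privacy
    contexts (e.g. hypothetical customer IDs) is treated as anonymized."""
    return len(set(contexts)) > threshold
```

Consider four login lines of the form `user <email> logged in from <ip> status <OK|FAIL>`. On these, the sketch should replace the e-mail and IP fields with `anon-` tokens and keep the status field as-is, because only two distinct status values appear. With very few entries almost every field looks enumerated, so this heuristic needs far more data, or a different evaluation rule, before it can be trusted.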
CN201680062430.3A 2015-11-20 2016-09-27 System and method for anonymizing log entries Active CN108351946B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US14/947915 2015-11-20
US14/947,915 US10326772B2 (en) 2015-11-20 2015-11-20 Systems and methods for anonymizing log entries
PCT/US2016/053995 WO2017087074A1 (en) 2015-11-20 2016-09-27 Systems and methods for anonymizing log entries

Publications (2)

Publication Number Publication Date
CN108351946A true CN108351946A (en) 2018-07-31
CN108351946B CN108351946B (en) 2022-03-08

Family

ID=57137266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201680062430.3A Active CN108351946B (en) 2015-11-20 2016-09-27 System and method for anonymizing log entries

Country Status (5)

Country Link
US (1) US10326772B2 (en)
EP (1) EP3378007B1 (en)
JP (1) JP2019500679A (en)
CN (1) CN108351946B (en)
WO (1) WO2017087074A1 (en)

Families Citing this family (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10223429B2 (en) * 2015-12-01 2019-03-05 Palantir Technologies Inc. Entity data attribution using disparate data sets
US10419401B2 (en) * 2016-01-08 2019-09-17 Capital One Services, Llc Methods and systems for securing data in the public cloud
US10192278B2 (en) * 2016-03-16 2019-01-29 Institute For Information Industry Traceable data audit apparatus, method, and non-transitory computer readable storage medium thereof
US10754983B2 (en) * 2017-03-31 2020-08-25 Interset Software Inc. Anonymization of sensitive data for use in user interfaces
US11023594B2 (en) * 2017-05-22 2021-06-01 Georgetown University Locally private determination of heavy hitters
US11062041B2 (en) * 2017-07-27 2021-07-13 Citrix Systems, Inc. Scrubbing log files using scrubbing engines
US10469307B2 (en) * 2017-09-26 2019-11-05 Cisco Technology, Inc. Predicting computer network equipment failure
US11263341B1 (en) * 2017-10-11 2022-03-01 Snap Inc. Identifying personally identifiable information within an unstructured data store
US10565398B2 (en) * 2017-10-26 2020-02-18 Sap Se K-anonymity and L-diversity data anonymization in an in-memory database
US10333902B1 (en) * 2017-12-19 2019-06-25 International Business Machines Corporation Data sanitization system for public host platform
US11907941B2 (en) * 2018-01-04 2024-02-20 Micro Focus Llc Anonymization of data fields in transactions
DK3800856T3 (en) * 2018-02-20 2023-08-28 Darktrace Holdings Ltd Cyber security appliance for a cloud infrastructure
US11301568B1 (en) * 2018-04-05 2022-04-12 Veritas Technologies Llc Systems and methods for computing a risk score for stored information
US11113417B2 (en) * 2018-07-10 2021-09-07 Sap Se Dynamic data anonymization using taint tracking
US20200125725A1 (en) * 2018-10-19 2020-04-23 Logrhythm, Inc. Generation and maintenance of identity profiles for implementation of security response
US11030350B2 (en) * 2018-11-29 2021-06-08 Visa International Service Association System, method, and apparatus for securely outputting sensitive information
US11803481B2 (en) 2019-02-28 2023-10-31 Hewlett Packard Enterprise Development Lp Data anonymization for a document editor
US11151285B2 (en) 2019-03-06 2021-10-19 International Business Machines Corporation Detecting sensitive data exposure via logging
US11188680B2 (en) 2019-09-20 2021-11-30 International Business Machines Corporation Creating research study corpus
US11328089B2 (en) 2019-09-20 2022-05-10 International Business Machines Corporation Built-in legal framework file management
US11106813B2 (en) * 2019-09-20 2021-08-31 International Business Machines Corporation Credentials for consent based file access
US11327665B2 (en) 2019-09-20 2022-05-10 International Business Machines Corporation Managing data on volumes
US11321488B2 (en) 2019-09-20 2022-05-03 International Business Machines Corporation Policy driven data movement
US11443056B2 (en) 2019-09-20 2022-09-13 International Business Machines Corporation File access restrictions enforcement
US11861493B2 (en) * 2019-12-30 2024-01-02 Micron Technology, Inc. Machine learning models based on altered data and systems and methods for training and using the same
EP3905087B1 (en) * 2020-04-27 2023-01-18 Brighter AI Technologies GmbH Method and system for selective and privacy-preserving anonymization
US11586486B2 (en) * 2020-08-24 2023-02-21 Vmware, Inc. Methods and systems that efficiently cache log/event messages in a distributed log-analytics system
US11874951B2 (en) * 2021-03-16 2024-01-16 Tata Consultancy Services Limited System and method for risk aware data anonymization
US20220327237A1 (en) * 2021-04-13 2022-10-13 Bi Science (2009) Ltd System and a method for identifying private user information
WO2023016641A1 (en) 2021-08-11 2023-02-16 Telefonaktiebolaget Lm Ericsson (Publ) Handling of logged restricted information based on tag syntax allocation
US12019782B1 (en) * 2021-11-19 2024-06-25 Trend Micro Incorporated Privacy protection for customer events logs of cybersecurity events
EP4235474A1 (en) 2022-02-24 2023-08-30 Fundación Tecnalia Research & Innovation Method and system for anonymising event logs
US20230315884A1 (en) * 2022-04-01 2023-10-05 Blackberry Limited Event data processing
US12013970B2 (en) * 2022-05-16 2024-06-18 Bank Of America Corporation System and method for detecting and obfuscating confidential information in task logs
KR20240013440A (en) * 2022-07-22 2024-01-30 쿠팡 주식회사 Electronic apparatus for processing data and method thereof
US20240070322A1 (en) * 2022-08-30 2024-02-29 Vmware, Inc. System and method for anonymizing sensitive information in logs of applications

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136189A (en) * 2011-11-28 2013-06-05 国际商业机器公司 Confidential information identifying method, information processing apparatus, and program
US20140237620A1 (en) * 2011-09-28 2014-08-21 Tata Consultancy Services Limited System and method for database privacy protection
US20140304825A1 (en) * 2011-07-22 2014-10-09 Vodafone Ip Licensing Limited Anonymization and filtering data
US20150302206A1 (en) * 2014-04-22 2015-10-22 International Business Machines Corporation Method and system for hiding sensitive data in log files

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002032473A (en) * 2000-07-18 2002-01-31 Fujitsu Ltd System and program storage medium for medical information processing
US7174507B2 (en) * 2003-02-10 2007-02-06 Kaidara S.A. System method and computer program product for obtaining structured data from text
US20040199828A1 (en) * 2003-04-01 2004-10-07 International Business Machines Corporation Method and apparatus for tracing troubleshooting events for aiding technical assistance
AU2004313518A1 (en) * 2003-12-15 2005-07-28 Evolveware Information Technology (India) Pty. Ltd An apparatus for migration and conversion of software code from any source platform to any target platform
US8412671B2 (en) * 2004-08-13 2013-04-02 Hewlett-Packard Development Company, L.P. System and method for developing a star schema
US20070299881A1 (en) * 2006-06-21 2007-12-27 Shimon Bouganim System and method for protecting selected fields in database files
US8341405B2 (en) * 2006-09-28 2012-12-25 Microsoft Corporation Access management in an off-premise environment
JP2008312156A (en) * 2007-06-18 2008-12-25 Hitachi Information & Control Solutions Ltd Information processing apparatus, encryption processing method, and encryption processing program
US8001136B1 (en) * 2007-07-10 2011-08-16 Google Inc. Longest-common-subsequence detection for common synonyms
US8166313B2 (en) 2008-05-08 2012-04-24 Fedtke Stephen U Method and apparatus for dump and log anonymization (DALA)
AU2011201369A1 (en) * 2010-03-25 2011-10-13 Rl Solutions Systems and methods for redacting sensitive data entries
US8544104B2 (en) * 2010-05-10 2013-09-24 International Business Machines Corporation Enforcement of data privacy to maintain obfuscation of certain data
JP2013186508A (en) * 2012-03-06 2013-09-19 Mitsubishi Denki Information Technology Corp Data processing system and log data management device
US20130332194A1 (en) * 2012-06-07 2013-12-12 Iquartic Methods and systems for adaptive ehr data integration, query, analysis, reporting, and crowdsourced ehr application development
JPWO2014181541A1 (en) * 2013-05-09 2017-02-23 日本電気株式会社 Information processing apparatus and anonymity verification method for verifying anonymity
US9448859B2 (en) * 2013-09-17 2016-09-20 Qualcomm Incorporated Exploiting hot application programming interfaces (APIs) and action patterns for efficient storage of API logs on mobile devices for behavioral analysis
US9965606B2 (en) * 2014-02-07 2018-05-08 Bank Of America Corporation Determining user authentication based on user/device interaction
US9378079B2 (en) * 2014-09-02 2016-06-28 Microsoft Technology Licensing, Llc Detection of anomalies in error signals of cloud based service
US9838359B2 (en) * 2015-10-29 2017-12-05 Ca, Inc. Separation of IoT network thing identification data at a network edge device
US10338977B2 (en) * 2016-10-11 2019-07-02 Oracle International Corporation Cluster-based processing of unstructured log messages
US11057344B2 (en) * 2016-12-30 2021-07-06 Fortinet, Inc. Management of internet of things (IoT) by security fabric

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110874477A (en) * 2018-08-29 2020-03-10 北京京东尚科信息技术有限公司 Log data encryption method and device, electronic equipment and medium
CN111314292A (en) * 2020-01-15 2020-06-19 上海观安信息技术股份有限公司 Data security inspection method based on sensitive data identification
CN112800003A (en) * 2021-01-20 2021-05-14 华云数据(厦门)网络有限公司 Recommendation method for creating snapshot, snapshot creation method and device and electronic equipment
CN112783850A (en) * 2021-02-09 2021-05-11 珠海豹趣科技有限公司 File enumeration method and device based on USN log, electronic equipment and storage medium
CN112883389A (en) * 2021-02-09 2021-06-01 上海凯馨信息科技有限公司 Reversible desensitization algorithm supporting feature preservation
CN112783850B (en) * 2021-02-09 2023-09-22 珠海豹趣科技有限公司 File enumeration method and device based on USN (universal serial bus) log, electronic equipment and storage medium
CN113452674A (en) * 2021-05-21 2021-09-28 南京逸智网络空间技术创新研究院有限公司 Galois field-based flow log multi-view anonymization method
CN113452674B (en) * 2021-05-21 2024-05-07 南京逸智网络空间技术创新研究院有限公司 Galois field-based flow log multi-view anonymization method

Also Published As

Publication number Publication date
US10326772B2 (en) 2019-06-18
US20170149793A1 (en) 2017-05-25
CN108351946B (en) 2022-03-08
EP3378007B1 (en) 2022-01-19
JP2019500679A (en) 2019-01-10
EP3378007A1 (en) 2018-09-26
WO2017087074A1 (en) 2017-05-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200103

Address after: California, USA

Applicant after: CA, Inc.

Address before: California, USA

Applicant before: Symantec Corporation

GR01 Patent grant