CN108351946A - System and method for anonymizing log entries - Google Patents
- Publication number
- CN108351946A (application CN201680062430.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- journal entries
- data field
- anonymization
- field
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/10—Network architectures or network communication protocols for network security for controlling access to devices or network resources
- H04L63/105—Multiple levels of security
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/069—Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2101—Auditing as a secondary aspect
Landscapes
- Engineering & Computer Science (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Medical Informatics (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Debugging And Monitoring (AREA)
- Storage Device Security (AREA)
Abstract
The invention discloses a computer-implemented method for anonymizing log entries. The method may include: (1) detecting a data pattern in a set of log entries that record events performed by at least one process executing on at least one device, (2) identifying, within the data pattern, at least one data field in the log entries that contains variable data, (3) evaluating the data field containing variable data to determine whether the data field contains sensitive data, and (4) in response to determining whether the data field contains sensitive data, applying a data anonymization policy to the data field to anonymize the log entries. Various other methods, systems, and computer-readable media are also disclosed.
Description
Background
System operation logs (such as security system logs) generally contain valuable information about system operation. For example, a system administrator may monitor security logs to verify that security systems are operating normally, diagnose operational or performance issues, identify system weaknesses, identify the sources of security threats, and/or conduct forensic analysis of security breaches. Administrators may also mine security log entries to discover new types of security threats. In addition, data analysts may mine system operation logs to analyze user behavior and/or system performance.
However, system operation logs often include sensitive information, such as personally identifiable information (PII) or infrastructure-related information (such as network addresses or server names). Unfortunately, this information may enable an attacker to map an internal network and search for weak points. Log information may also expose work schedules, personal relationships, or other information usable in social engineering attacks. Therefore, if security logs are unprotected, they may become a source of information for targeted threats. Accordingly, the present disclosure identifies and addresses a need for additional and improved systems and methods for safeguarding log entries.
Summary
As will be described in greater detail below, the present disclosure describes various systems and methods for anonymizing log entries by identifying fields in log entries that may contain sensitive information and then applying a data anonymization policy that anonymizes that sensitive information. The systems and methods described herein may use various machine learning techniques to identify sensitive information and distinguish it from other variable data. The systems and methods described herein may also monitor data logs for new entries, determine whether the new entries contain sensitive information, and, upon identifying a new data field that contains sensitive information, anonymize existing log file entries.
In one example, a computer-implemented method for anonymizing log entries may include: (1) detecting a data pattern in a set of log entries that record events performed by one or more processes executing on one or more devices, (2) identifying, within the data pattern, one or more data fields in the log entries that contain variable data, (3) evaluating the data fields containing variable data to determine whether the data fields contain sensitive data, and (4) in response to determining whether the data fields contain sensitive data, applying a data anonymization policy to the data fields to anonymize the log entries.
In some examples, detecting the data pattern in the log entries may include performing a source template learning analysis on the log entries. In some examples, detecting the data pattern in the log entries may include performing a longest common subsequence analysis on the log entries. In one embodiment, the computer-implemented method may also include: (1) receiving log entries from an additional process executing on one or more additional devices, (2) matching those log entries with a data pattern previously identified in the set of log entries, (3) identifying the data anonymization policy that corresponds to the data pattern, and (4) anonymizing the log entries by applying the corresponding data anonymization policy.
In one embodiment, the computer-implemented method may also include: (1) determining a privacy context threshold, that is, the number of privacy contexts in which a data pattern must be found for the data pattern to be considered anonymized, (2) detecting the data pattern in a set of privacy contexts, (3) determining that the number of privacy contexts containing the data pattern exceeds the privacy context threshold, and (4) in response to determining that the number of privacy contexts containing the data pattern exceeds the privacy context threshold, determining that the data pattern is anonymized. In one embodiment, evaluating the data field determines that the data field contains sensitive data, and the data anonymization policy anonymizes the data field by: (1) hashing the data field with a one-way hash, (2) encrypting the data field with reversible encryption, (3) replacing the data field with random data, (4) replacing the data field with static data, (5) removing the data field, and/or (6) generalizing the data field.
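The privacy context threshold resembles a k-anonymity test: a pattern counts as anonymized once it has been seen in more than a threshold number of distinct contexts. This sketch assumes a privacy context is represented by an identifier such as a host or customer ID and reads "exceeds" as strictly greater than; the representation and the function name are illustrative assumptions.

```python
from typing import Iterable, Tuple

def pattern_is_anonymized(
    observations: Iterable[Tuple[str, str]],
    pattern: str,
    threshold: int,
) -> bool:
    """Return True if `pattern` occurs in more than `threshold` distinct privacy contexts.

    `observations` is a stream of (context_id, pattern) pairs, for example the
    host on which each pattern was seen (an assumed representation).
    """
    # Count distinct contexts, not raw occurrences: many hits from one
    # context do not make a pattern less identifying.
    contexts = {ctx for ctx, pat in observations if pat == pattern}
    return len(contexts) > threshold
```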
In one embodiment, evaluating the data field determines that the data field contains enumerated data and therefore does not contain sensitive data, and the data anonymization policy does not modify the data field. In another embodiment, evaluating the data field determines that the data field contains data of a known data type that does not contain sensitive data, and the data anonymization policy does not modify the data field. In addition, even if a data field was previously determined not to contain sensitive data, evaluating the data field may determine that the data field now contains sensitive data. The data anonymization policy may then anonymize that data field in a set of existing log entries.
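The text does not say how enumerated data is recognized. One plausible heuristic, assumed here, is cardinality: a field whose observed values come from a small, repeating set (log levels, status codes) is treated as enumerated, while a field with many distinct values remains a candidate for sensitive-data evaluation. The `max_distinct` cutoff and the repetition rule are hypothetical tuning choices.

```python
from typing import Sequence

def looks_enumerated(values: Sequence[str], max_distinct: int = 5) -> bool:
    """Heuristic: a field is enumerated if it takes few distinct values that repeat.

    Requiring each value to appear at least twice on average keeps a short
    list of unique user names from being mistaken for an enumeration.
    """
    distinct = len(set(values))
    return distinct <= max_distinct and len(values) >= 2 * distinct
```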
In one embodiment, a system for implementing the above-described method may include several modules stored in memory, such as: (1) a pattern module that detects a data pattern in a set of log entries that record events performed by one or more processes executing on one or more devices, (2) a field analysis module that identifies, within the data pattern, one or more data fields in the log entries that contain variable data, (3) a data analysis module that evaluates the data fields containing variable data to determine whether the data fields contain sensitive data, and (4) an anonymization module that, in response to determining whether the data fields contain sensitive data, applies a data anonymization policy to the data fields to anonymize the log entries. The system may also include at least one physical processor configured to execute the pattern module, the field analysis module, the data analysis module, and the anonymization module.
In some examples, the above-described method may be encoded as computer-readable instructions on a non-transitory computer-readable medium. For example, a computer-readable medium may include one or more computer-executable instructions that, when executed by at least one processor of a computing device, may cause the computing device to: (1) detect a data pattern in a set of log entries that record events performed by one or more processes executing on one or more devices, (2) identify, within the data pattern, one or more data fields in the log entries that contain variable data, (3) evaluate the data fields containing variable data to determine whether the data fields contain sensitive data, and (4) in response to determining whether the data fields contain sensitive data, apply a data anonymization policy to the data fields to anonymize the log entries.
Features from any of the above-mentioned embodiments may be used in combination with one another in accordance with the general principles described herein. These and other embodiments, features, and advantages will be more fully understood upon reading the following detailed description in conjunction with the accompanying drawings and claims.
Description of the drawings
The accompanying drawings illustrate a number of exemplary embodiments and are a part of the specification. Together with the following description, these drawings demonstrate and explain various principles of the present disclosure.
Fig. 1 is a block diagram of an exemplary system for anonymizing log entries.
Fig. 2 is a block diagram of an additional exemplary system for anonymizing log entries.
Fig. 3 is a flow diagram of an exemplary method for anonymizing log entries.
Fig. 4 is a block diagram of exemplary log data.
Fig. 5 is a block diagram of an exemplary computing system capable of implementing one or more of the embodiments described and/or illustrated herein.
Fig. 6 is a block diagram of an exemplary computing network capable of implementing one or more of the embodiments described and/or illustrated herein.
Throughout the drawings, identical reference characters and descriptions indicate similar, but not necessarily identical, elements. While the exemplary embodiments described herein are susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. However, the exemplary embodiments described herein are not intended to be limited to the particular forms disclosed. Rather, the present disclosure covers all modifications, equivalents, and alternatives falling within the scope of the appended claims.
Detailed description
The present disclosure is generally directed to systems and methods for anonymizing log entries. As will be explained in greater detail below, by applying machine learning techniques, the systems and methods described herein may identify data fields that contain personally identifiable information or other sensitive data and anonymize those data fields by applying a selected data anonymization policy. Data anonymization policies may be customized in many ways, including according to data type, desired security level, plans for future data mining of the log files, and so on. The systems and methods described herein may also apply data anonymization processes so as to exceed a data field anonymization metric. In addition, the systems and methods described herein may continuously monitor new log entries for sensitive data in new or existing data fields and, upon identifying new sensitive information, reapply the data anonymization policy to the existing set of security logs.
Detailed descriptions of exemplary systems for anonymizing log entries are provided below with reference to Fig. 1, Fig. 2, and Fig. 4. Detailed descriptions of corresponding computer-implemented methods are provided in connection with Fig. 3. In addition, detailed descriptions of an exemplary computing system and network architecture capable of implementing one or more of the embodiments described herein are provided in connection with Fig. 5 and Fig. 6, respectively.
Fig. 1 is a block diagram of an exemplary system 100 for anonymizing log entries. As illustrated in this figure, exemplary system 100 may include one or more modules 102 for performing one or more tasks. For example, and as will be explained in greater detail below, exemplary system 100 may include a pattern module 104 that detects a data pattern in a plurality of log entries that record events performed by one or more processes executing on one or more devices. Exemplary system 100 may additionally include a field analysis module 106 that identifies, within the data pattern, one or more data fields in the plurality of log entries that contain variable data. Exemplary system 100 may also include a data analysis module 108 that evaluates the data fields containing variable data to determine whether they contain sensitive data. Exemplary system 100 may additionally include an anonymization module 110 that, in response to determining whether a data field contains sensitive data, applies a data anonymization policy to the data field to anonymize the plurality of log entries. Although illustrated as separate elements, one or more of modules 102 in Fig. 1 may represent portions of a single module or application.
In certain embodiments, one or more of modules 102 in Fig. 1 may represent one or more software applications or programs that, when executed by a computing device, may cause the computing device to perform one or more tasks. For example, and as will be described in greater detail below, one or more of modules 102 may represent software modules stored and configured to run on one or more computing devices, such as the devices illustrated in Fig. 2 (e.g., computing device 202 and/or server 206), computing system 510 in Fig. 5, and/or portions of exemplary network architecture 600 in Fig. 6. One or more of modules 102 in Fig. 1 may also represent all or portions of one or more special-purpose computers configured to perform one or more tasks.
As illustrated in Fig. 1, exemplary system 100 may also include one or more databases, such as database 120. In one example, database 120 may be configured to store one or more system operation logs, data anonymization policy information, system log entry pattern data, and/or data that helps identify sensitive data (such as personally identifiable information).
Database 120 may represent portions of a single database or computing device or a plurality of databases or computing devices. For example, database 120 may represent a portion of server 206 in Fig. 2, computing system 510 in Fig. 5, and/or portions of exemplary network architecture 600 in Fig. 6. Alternatively, database 120 in Fig. 1 may represent one or more physically separate devices capable of being accessed by a computing device, such as server 206 in Fig. 2, computing system 510 in Fig. 5, and/or portions of exemplary network architecture 600 in Fig. 6.
Exemplary system 100 in Fig. 1 may be implemented in a variety of ways. For example, all or a portion of exemplary system 100 may represent portions of exemplary system 200 in Fig. 2. As shown in Fig. 2, system 200 may include a computing device 202 in communication with a server 206 via a network 204. In one example, computing device 202 may be programmed with one or more of modules 102 and/or may store all or a portion of the data in database 120. Additionally or alternatively, server 206 may be programmed with one or more of modules 102 and/or may store all or a portion of the data in database 120.
In one embodiment, one or more of modules 102 from Fig. 1 may, when executed by at least one processor of computing device 202 and/or server 206, enable computing device 202 and/or server 206 to anonymize log entries. For example, and as will be described in greater detail below, pattern module 104 may detect a data pattern 210 in a plurality of log entries 208 that record events performed by one or more processes executing on one or more devices. Field analysis module 106 may then identify, within data pattern 210, one or more data fields 212 in log entries 208 that contain variable data. Data analysis module 108 may then evaluate data fields 212 to determine whether they contain sensitive data. Finally, anonymization module 110 may, in response to determining whether data fields 212 contain sensitive data, apply a data anonymization policy 214 to data fields 212 to anonymize log entries 208, thereby creating anonymized log entries 216.
Computing device 202 generally represents any type or form of computing device capable of reading computer-executable instructions. Examples of computing device 202 include, without limitation, laptops, tablets, desktops, servers, cellular phones, personal digital assistants (PDAs), multimedia players, embedded systems, wearable devices (e.g., smart watches, smart glasses, etc.), gaming consoles, combinations of one or more of the same, exemplary computing system 510 in Fig. 5, or any other suitable computing device.
Server 206 generally represents any type or form of computing device capable of receiving, storing, and/or comparing data. Examples of server 206 include, without limitation, application servers and database servers configured to provide various database services and/or run certain software applications.
Network 204 generally represents any medium or architecture capable of facilitating communication or data transfer. Examples of network 204 include, without limitation, an intranet, a wide area network (WAN), a local area network (LAN), a personal area network (PAN), the Internet, power line communications (PLC), a cellular network (e.g., a Global System for Mobile Communications (GSM) network), exemplary network architecture 600 in Fig. 6, or the like. Network 204 may facilitate communication or data transfer using wireless or wired connections. In one embodiment, network 204 may facilitate communication between computing device 202 and server 206.
Fig. 3 is a flow diagram of an exemplary computer-implemented method 300 for anonymizing log entries. The steps shown in Fig. 3 may be performed by any suitable computer-executable code and/or computing system. In some embodiments, the steps shown in Fig. 3 may be performed by one or more of the components of system 100 in Fig. 1, system 200 in Fig. 2, computing system 510 in Fig. 5, and/or portions of exemplary network architecture 600 in Fig. 6.
As illustrated in Fig. 3, at step 302, one or more of the systems described herein may detect a data pattern in a plurality of log entries that record events performed by one or more processes executing on one or more devices. For example, as part of computing device 202 in Fig. 2, pattern module 104 may detect data pattern 210 in log entries 208. The log files containing log entries 208 may reside on computing device 202, server 206, and/or one or more other computing devices. Furthermore, the log entries may be created by one or more processes executing on any of these devices.
Pattern module 104 may detect the data pattern in log entries in a variety of ways. For example, for some processes, especially widely used programs, pattern module 104 may obtain the data patterns of log entries from program documentation or other publicly available sources. In other examples, pattern module 104 may apply a text analysis program to existing log entries to identify the fixed and variable portions of the log entries. This approach may be sufficient to identify data patterns for programs that have few variable data fields or a small number of log entry formats.
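For a program with a single, simple log format, the fixed/variable split described above can be computed by comparing entries token by token. This minimal sketch assumes all entries of one format split into the same number of whitespace-separated tokens; a position that holds the identical token in every entry is treated as fixed, and any other position as variable. Formats with multi-word values would need a smarter tokenizer.

```python
from typing import List

def split_fixed_variable(entries: List[str]) -> str:
    """Build a template from same-format entries; positions that vary become '<*>'.

    Assumes every entry splits into the same number of whitespace-separated tokens.
    """
    rows = [entry.split() for entry in entries]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("entries do not share a single token layout")
    template = []
    for column in zip(*rows):
        # A position is fixed only if every entry has the identical token there.
        template.append(column[0] if len(set(column)) == 1 else "<*>")
    return " ".join(template)
```

For example, two `Accepted password for ... from ...` lines that differ only in user name and address yield `Accepted password for <*> from <*>`.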
For log files whose structure is not easily discernible, such as log files with a large number of log entry formats or with many variable data fields containing a mixture of enumerated data and personally identifiable information, pattern module 104 may apply any of a variety of machine learning algorithms to detect data patterns in the log entries. In some examples, pattern module 104 may detect the data pattern in the plurality of log entries by performing a source template learning analysis on the log entries. As used herein, the term "source template learning analysis" generally refers to a method of decomposing text that shares common word combinations into message variants.
Fig. 4 is a block diagram of exemplary log entry data 400 that pattern module 104 may use in a source template learning analysis to detect log entry formats. Log entry data 400 may include a set of log entries 402 that have previously been identified as having many words in common compared with other log entries. In this example, log entry variant tree 404 is a data structure configured to represent the hierarchy of log entry formats contained in the set. When a word appears in a message, the message may be associated with that word. For example, messages m1, m2, and m3 contain the word "available", so these messages are associated with the node representing the word "available" in log entry variant tree 404. For each parent node, pattern module 104 may find the most frequent word combination among the messages associated with that node and make that combination a child node. Pattern module 104 may repeat this process until all messages are associated with nodes. Pattern module 104 may then prune tree branches that contain many variants. For example, the messages in log entries 402 may include many IP addresses and virtual path numbers. Pattern module 104 may identify these variant groups as data fields containing variable data.
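The variant-tree procedure above can be illustrated with a greatly simplified sketch. It assumes whitespace tokenization and a hypothetical `min_support` of 2: at each node, the messages that share the most frequent remaining word are peeled off into a child, and splitting stops once every remaining word is rare, which plays the role of pruning the many-variant branches. Messages left unassigned at an inner node are simply dropped here. `Node`, `build_tree`, and `templates` are hypothetical names, not the patent's implementation.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    words: Tuple[str, ...]       # words shared by every message under this node
    messages: List[List[str]]    # tokenized messages associated with this node
    children: List["Node"] = field(default_factory=list)

def build_tree(messages: List[List[str]], words: Tuple[str, ...] = (),
               min_support: int = 2) -> Node:
    """Repeatedly peel off the messages that share the most frequent remaining word."""
    node = Node(words, messages)
    remaining = messages
    while remaining:
        counts = Counter(w for m in remaining for w in set(m) if w not in words)
        if not counts:
            break
        word, freq = counts.most_common(1)[0]
        if freq < min_support:
            break  # everything left is rare: treat it as variant data (pruned)
        group = [m for m in remaining if word in m]
        remaining = [m for m in remaining if word not in m]
        node.children.append(build_tree(group, words + (word,), min_support))
    return node

def templates(node: Node) -> List[str]:
    """Render each leaf as a template; tokens not on the leaf's word path become '<*>'."""
    if not node.children:
        if not node.messages:
            return []
        return [" ".join(t if t in node.words else "<*>" for t in node.messages[0])]
    out: List[str] = []
    for child in node.children:
        out.extend(templates(child))
    return out
```

Because only word membership drives the splits, two messages with the same words in a different order land in the same branch; that is the key difference from the order-aware analysis discussed next.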
In some examples, pattern module 104 may detect the data pattern in the plurality of log entries by performing a longest common subsequence analysis on the log entries. As used herein, the term "longest common subsequence analysis" generally refers to an algorithm (the basis of several text analysis utilities) for finding the longest sequence of words common to a set of messages. Longest common subsequence analysis differs from source template learning analysis in that it considers the order in which words appear in a message, whereas source template learning considers only whether words appear, without regard to order.
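A word-level longest common subsequence can be computed with the classic dynamic-programming table. The sketch below is a textbook LCS over token lists; folding it pairwise across a message set is an assumption made for illustration, and it yields a common subsequence of all messages but not necessarily the longest one once there are more than two messages.

```python
from typing import List

def lcs_words(a: List[str], b: List[str]) -> List[str]:
    """Longest common subsequence of two token lists: order matters, gaps allowed."""
    # dp[i][j] holds the LCS length of a[i:] and b[j:].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the front to recover one LCS.
    out: List[str] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

def common_words(messages: List[str]) -> List[str]:
    """Fold pairwise LCS over a message set to approximate their shared word sequence."""
    result = messages[0].split()
    for message in messages[1:]:
        result = lcs_words(result, message.split())
    return result
```

Order sensitivity shows up directly: `lcs_words("a b c".split(), "c b a".split())` has length 1, although both lists contain the same three words.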
At step 304, one or more of the systems described herein may identify, within the data pattern, one or more data fields in the plurality of log entries that contain variable data. For example, as part of computing device 202 in Fig. 2, field analysis module 106 may identify, within data pattern 210, one or more data fields 212 in log entries 208 that contain variable data.
Field analysis module 106 may identify data fields containing variable data in a variety of ways. For example, field analysis module 106 may identify data fields containing variable data as part of the analysis that pattern module 104 performs at step 302 to identify data patterns in the set of log entries. For some programs, field analysis module 106 may identify data fields containing variable data from program documentation. In other examples, field analysis module 106 may apply a text analysis program (such as the diff utility on UNIX or LINUX systems) to existing log entries to identify data fields containing variable data.
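Python's standard `difflib` can stand in for a diff-style comparison of two entries of the same format. This sketch assumes whitespace tokenization; every non-`equal` opcode returned by `SequenceMatcher.get_opcodes()` marks a run of tokens that differs between the entries, which is a candidate variable data field. The function name is hypothetical.

```python
import difflib
from typing import List, Tuple

def variable_spans(line_a: str, line_b: str) -> List[Tuple[str, str]]:
    """Return (value_in_a, value_in_b) pairs for token runs that differ between two lines."""
    a, b = line_a.split(), line_b.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # 'replace', 'delete', or 'insert': tokens that vary between the lines
            spans.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return spans
```

Comparing `job 17 finished in 3s` with `job 18 finished in 12s` yields two spans, one for the job number and one for the duration.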
In some examples, as previously described, field analysis module 106 may use machine learning algorithms (such as source template learning analysis and longest common subsequence analysis) to identify data fields containing variable data. For example, as shown in Fig. 4, source template learning analysis may determine that the IP addresses and virtual path numbers in messages m1-m11 introduce many text variant nodes into log entry variant tree 404 and therefore represent data fields containing variable data.
At step 306, one or more of the systems described herein may evaluate the data field containing variable data to determine whether the data field contains sensitive data. For example, as part of computing device 202 in Fig. 2, data analysis module 108 may evaluate data field 212 to determine whether data field 212 contains sensitive data.
As used herein, the phrase "sensitive data" generally refers to proprietary data whose public disclosure may cause harm to an individual or entity. Sensitive data may include personally identifiable information (PII), infrastructure-related data (such as internal IP addresses, user names, or server names), or data protected from disclosure by law, contract, or organizational policy.
Data analysis module 108 may determine that a data field containing variable data contains sensitive data in a variety of ways. For example, program documentation or other publicly available information may indicate that a particular data field in log entries may contain sensitive data. In another example, data analysis module 108 may search a database or network directory service to determine whether the data field contains personally identifiable information, user names, server names, and so on. In another example, data analysis module 108 may use network diagnostics to determine whether the data field contains network infrastructure information (such as IP addresses internal to an organization). Data analysis module 108 may determine that internal IP addresses constitute sensitive information, whereas external IP addresses do not.
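The directory lookup and the internal-versus-external IP test above can be combined into one field check. This sketch uses Python's standard `ipaddress` module and assumes the RFC 1918 private ranges stand in for "internal"; a real deployment would substitute the organization's own address plan. The set of directory names is a hypothetical export from a directory service.

```python
import ipaddress
from typing import Iterable, Sequence, Set

# Assumption: RFC 1918 ranges stand in for the organization's internal networks.
DEFAULT_INTERNAL_NETS = tuple(
    ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)

def is_internal_ip(value: str, internal_nets: Sequence = DEFAULT_INTERNAL_NETS) -> bool:
    """True if `value` parses as an IP address inside one of the internal networks."""
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return False  # not an IP address at all
    return any(addr in net for net in internal_nets)

def field_is_sensitive(values: Iterable[str], directory_names: Set[str],
                       internal_nets: Sequence = DEFAULT_INTERNAL_NETS) -> bool:
    """Flag a field as sensitive if any observed value is a known user or server
    name (hypothetical directory export) or an internal IP address."""
    return any(v in directory_names or is_internal_ip(v, internal_nets) for v in values)
```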
At step 308, one or more of the systems described herein may, in response to determining whether the data field contains sensitive data, apply a data anonymization policy to the data field to anonymize the plurality of log entries. For example, as part of computing device 202 in Fig. 2, anonymization module 110 may, in response to determining whether data field 212 contains sensitive data, apply data anonymization policy 214 to data field 212 to anonymize log entries 208, thereby generating anonymized log entries 216.
Anonymization module 110 may apply a data anonymization policy to a data field in a variety of ways. For example, anonymization module 110 may apply the same data anonymization policy to all sensitive data, or apply different data anonymization policies depending on the data type. In one embodiment, evaluating the data field may determine that the data field contains sensitive data. In this embodiment, the data anonymization policy may anonymize the data field using one or more data anonymization techniques. The choice of technique may depend on, for example, the security level required for the data field, whether some information in the data field must be retained for later analysis of the log entries, or any other criterion.
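A policy that varies by data type can be represented as a lookup from a field's detected type to a technique. The type labels, the technique chosen for each, the 16-character truncated SHA-256, and the conservative fallback are all hypothetical choices for illustration, not values from the patent.

```python
import hashlib
from typing import Callable, Dict

def hash_value(v: str) -> str:
    """Deterministic one-way digest: keeps linkability across entries."""
    return hashlib.sha256(v.encode("utf-8")).hexdigest()[:16]

def static_value(v: str) -> str:
    """Replace with a fixed placeholder."""
    return "<ANON>"

def remove_value(v: str) -> str:
    """Drop the value entirely."""
    return ""

# Hypothetical policy: which technique to use for each detected data type.
POLICY: Dict[str, Callable[[str], str]] = {
    "username": hash_value,    # later analysis may need to link a user's events
    "hostname": static_value,  # hide entirely; linkability not needed
    "password": remove_value,  # should never survive in logs
}

def apply_policy(field_type: str, value: str, sensitive: bool) -> str:
    if not sensitive:
        return value  # non-sensitive fields pass through unchanged
    technique = POLICY.get(field_type, static_value)  # conservative default
    return technique(value)
```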
In one example, anonymization module 110 may anonymize the data field by hashing it with a one-way hash function. Using a one-way hash may facilitate later analysis of the log entries while protecting the sensitive data from disclosure. Because a hashing algorithm produces the same hash value for the same data every time it is applied, hashing the data preserves the information that the same hash value refers to the same source text in each case, without revealing the source text. For example, for the user name KPAULSEN, the MD5 hashing algorithm produces the hash value "d0d4742e5beb935cf3272c4e77215f18". A person analyzing the log entries later will recognize that the hash value refers to the same user in each case, without knowing the user name.
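The following sketch shows one-way hashing of a user-name field with Python's standard `hashlib`, assuming a simple `key=value` log layout with the user name under the key `user`. The function names and sample entries are hypothetical. MD5 is used only to mirror the example above. Because user names have little entropy, an unkeyed hash can be reversed by hashing candidate names, so a keyed hash (for example via Python's `hmac` module) is a common hardening.

```python
import hashlib


def hash_field(value, algorithm="md5"):
    """Replace a field value with its hex digest under a one-way hash."""
    h = hashlib.new(algorithm)
    h.update(value.encode("utf-8"))
    return h.hexdigest()


# Hypothetical sample log entries in a key=value layout.
log_entries = [
    "user=KPAULSEN action=login",
    "user=JDOE action=login",
    "user=KPAULSEN action=logout",
]


def anonymize_user(entry):
    """Hash the `user` value; other fields pass through in their original order."""
    fields = dict(part.split("=", 1) for part in entry.split())
    fields["user"] = hash_field(fields["user"])
    return " ".join(f"{k}={v}" for k, v in fields.items())


anonymized = [anonymize_user(e) for e in log_entries]
```

Entries 0 and 2 end up with the same hashed user token and entry 1 with a different one. This is exactly the correlation an analyst relies on, while the name itself is not disclosed.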
In another example, anonymization module 110 may anonymize the data field by encrypting it with reversible encryption. As with hashing, anonymizing the data field with an encryption algorithm can preserve the correspondence between the encrypted value and the source text without revealing the source text, provided the encryption is deterministic (that is, the same source text always yields the same ciphertext). With reversible encryption, however, a trusted data analyst can use a specific encryption key to decrypt the ciphertext and recreate the source text.
In another example, anonymization module 110 may anonymize the data field by replacing it with random data. In this way, anonymization module 110 can protect the sensitive data in the data field without maintaining, as hashing does, a relationship between the anonymized data and the source data. Anonymizing the data field with random data still preserves the information that the data field contains variable data. As discussed above, a machine learning algorithm such as source template learning can identify variable data fields while analyzing the log entries.
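Random replacement can be sketched with Python's standard `secrets` module, assuming dictionary-shaped log entries. The function name `randomize_fields` and the 8-byte token length are hypothetical choices. Each call draws fresh randomness, so two occurrences of the same user name no longer correlate. That is the trade-off against hashing described above.

```python
import secrets


def randomize_fields(entry, sensitive_fields, nbytes=8):
    """Return a copy of a dict-shaped log entry with sensitive fields randomized.

    Each sensitive value is replaced by a fresh random hex string, so no link
    to the source value (or between repeated occurrences) is kept. The field
    itself remains, signalling that it held variable data.
    """
    return {
        key: (secrets.token_hex(nbytes) if key in sensitive_fields else value)
        for key, value in entry.items()
    }
```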
In another example, anonymization module 110 may anonymize the data field by generalizing it. Data generalization is an anonymization technique that replaces specific sensitive data with more general data identifying the category of that data, without revealing the data itself. For example, anonymization module 110 may anonymize the internal IP address "208.65.13.15" as "208.65.13.XXX". A person analyzing the anonymized log data will be able to identify the subnetwork of the computing device but not the specific device. In another example, anonymization module 110 may replace a user name with the name of the user's department.
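Both generalization examples above can be sketched in a few lines. The function names, the `keep_octets` parameter, and the `DEPARTMENTS` directory are hypothetical. In practice the user-to-department mapping would come from an organizational directory, which this sketch does not model.

```python
import ipaddress


def generalize_ipv4(value, keep_octets=3, mask="XXX"):
    """Keep the leading octets of an IPv4 address and mask the rest."""
    octets = str(ipaddress.IPv4Address(value)).split(".")  # validates the address
    return ".".join(octets[:keep_octets] + [mask] * (4 - keep_octets))


# Hypothetical directory used to generalize user names to departments.
DEPARTMENTS = {"KPAULSEN": "Finance", "JDOE": "Engineering"}


def generalize_user(user, directory=DEPARTMENTS, unknown="[unknown department]"):
    """Replace a user name with the name of that user's department."""
    return directory.get(user, unknown)
```

For example, `generalize_ipv4("208.65.13.15")` yields `"208.65.13.XXX"`, matching the text. Lowering `keep_octets` widens the generalization.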
Some simple anonymization technologies effectively anonymization sensitive data, but little or no be preserved for later
The information of analysis.In one example, anonymization module 110 can carry out anonymization number by using static data replacement data field
According to field.For example, anonymization module 110 can use character string " [IP address] " to replace IP address.In another example, anonymous
Anonymization data field can be carried out simply by removing data field by changing module 110.
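Static replacement and field removal are the simplest techniques to express. The sketch below assumes dictionary-shaped log entries and uses hypothetical function names. Both functions return copies so the original entry is left untouched.

```python
def replace_static(entry, field, placeholder):
    """Return a copy of the entry with one field replaced by a fixed string."""
    out = dict(entry)
    if field in out:
        out[field] = placeholder
    return out


def remove_field(entry, field):
    """Return a copy of the entry with the field dropped entirely."""
    return {key: value for key, value in entry.items() if key != field}
```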
In one embodiment, the systems described herein may use a statistical heuristic to determine whether a data anonymization policy has achieved a desired level of anonymization. For example, the systems described herein may (1) determine a privacy-context threshold, that is, the number of privacy contexts in which a data pattern must be found for the data pattern to be considered anonymized, (2) detect the data pattern in a plurality of privacy contexts, (3) determine that the number of privacy contexts containing the data pattern exceeds the privacy-context threshold, and (4) in response to determining that the number of privacy contexts containing the data pattern exceeds the privacy-context threshold, determine that the data pattern is anonymized. As used herein, the term "privacy context" generally refers to an environment containing personal information that must be anonymized. For example, the log files from one business may constitute a privacy context. If a data pattern is found in a sufficient number of privacy contexts, the data pattern may be considered free of personally identifiable information and therefore sufficiently anonymized. For example, anonymization module 110 may, as part of computing device 202 in FIG. 2, determine that a data pattern is sufficiently anonymized if it is found in 50% of privacy contexts.
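The four-step heuristic can be sketched as follows. Assumptions: each privacy context (for example, one business's logs) is a list of log lines, a data pattern is a regular expression, and the threshold is expressed as a fraction of contexts (0.5, matching the 50% example). The function names are hypothetical. The text says the count must exceed the threshold but also speaks of contexts in which the pattern "must be found"; treating the threshold as a minimum that may be met is a choice made here.

```python
import re


def count_contexts_with_pattern(pattern, contexts):
    """Count privacy contexts in which any log line matches the pattern."""
    regex = re.compile(pattern)
    return sum(1 for lines in contexts if any(regex.search(line) for line in lines))


def is_sufficiently_anonymized(pattern, contexts, threshold_fraction=0.5):
    """Decide whether a data pattern is common enough to be non-identifying."""
    if not contexts:
        return False
    threshold = threshold_fraction * len(contexts)           # (1) threshold number
    found = count_contexts_with_pattern(pattern, contexts)   # (2) detect the pattern
    return found >= threshold                                 # (3)-(4) compare, decide
```

A pattern that shows up across many unrelated businesses (such as a generic service message) passes, while a single customer's user name does not.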
In one embodiment, the data field assessment may determine that the data field includes enumerated data and therefore does not include sensitive data. In this embodiment, the data anonymization policy may leave the data field unchanged. For example, data analysis module 108 may, as part of computing device 202 in FIG. 2, determine that although the data field contains variable data, the number of variants is relatively small, indicating that the field holds only a few enumerated possible values. For instance, data analysis module 108 may determine that the data field contains only the values "installed", "refreshed", and "not installed", so no further operation is needed to anonymize the data field.
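One simple way to flag enumerated fields is to compare the number of distinct values with the number of observations. This is a minimal sketch: the function name and both thresholds (`max_distinct=5`, `min_samples=20`) are hypothetical tuning choices, not values from the text.

```python
def looks_enumerated(values, max_distinct=5, min_samples=20):
    """Heuristic: many observations but only a handful of distinct values.

    Returns False when there are too few samples to judge, so that a field
    is not declared non-sensitive on thin evidence.
    """
    values = list(values)
    if len(values) < min_samples:
        return False
    return len(set(values)) <= max_distinct
```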
In another embodiment, the data field assessment may determine that the data field does not include sensitive data because it contains data of a known data type. In this embodiment, the data anonymization policy may leave the data field unchanged. For example, program documentation or another publicly available source may indicate that the data field contains a specific data type that is not considered sensitive, and the data anonymization policy for the data field need not take any further action to anonymize it.
In one embodiment, the systems described herein may monitor a log file for newly added entries and apply data anonymization policies to anonymize the new log entries as needed. For example, the systems described herein may (1) receive an additional log entry for a process executing on an additional device, (2) match the log entry to a data pattern within a set of data patterns previously identified in the plurality of log entries, (3) identify the data anonymization policy corresponding to the data pattern, and (4) anonymize the log entry by applying the corresponding data anonymization policy. For example, pattern module 104 may, as part of computing device 202 in FIG. 2, receive a log entry from a device and determine that the data pattern of the log entry matches a previously identified data pattern. Anonymization module 110 may then apply the data anonymization policy corresponding to the identified data pattern to anonymize the log entry.
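The match-then-apply flow for new entries can be sketched with Python's `re` module. The sketch assumes that previously identified data patterns are stored as regular expressions whose sensitive parts are named groups, each paired with a per-group technique. The catalog `PATTERN_POLICIES` and the function `anonymize_new_entry` are hypothetical. Returning `None` for an unmatched line, so the caller can route it to pattern learning, is also an assumption.

```python
import hashlib
import re

# Hypothetical catalog of previously identified patterns and their policies.
PATTERN_POLICIES = [
    (re.compile(r"^User (?P<user>\S+) logged in from (?P<ip>\S+)$"),
     {"user": lambda v: hashlib.md5(v.encode("utf-8")).hexdigest(),
      "ip": lambda v: "[IP address]"}),
]


def identity(value):
    return value


def anonymize_new_entry(line, catalog=PATTERN_POLICIES):
    """Match a new log line to a known pattern and apply that pattern's policy."""
    for regex, policy in catalog:                      # (2) try each known pattern
        m = regex.match(line)
        if m is None:
            continue
        pieces, last = [], 0
        # Rebuild the line left to right, replacing each named group.
        for name in sorted(regex.groupindex, key=m.start):
            start, end = m.span(name)
            pieces.append(line[last:start])
            technique = policy.get(name, identity)     # (3) policy for this part
            pieces.append(technique(m.group(name)))    # (4) apply it
            last = end
        pieces.append(line[last:])
        return "".join(pieces)
    return None  # unknown pattern: caller may queue it for pattern learning
```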
In one embodiment, the data field assessment may determine that a data field now includes sensitive data even though the data field was previously determined not to include sensitive data. In this example, the data anonymization policy may anonymize the data field in a plurality of existing log entries. For example, while monitoring a log file for new entries, the systems described herein may identify sensitive data in a field previously determined to contain enumerated data or other non-sensitive data. In particular, anonymization module 110 may, as part of computing device 202 in FIG. 2, anonymize the data field in both the new log entries and the existing log entries to achieve the desired level of data anonymization.
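Reclassifying a field as sensitive has two effects: future entries must pick up the new policy, and stored entries must be reprocessed. The sketch below assumes dictionary-shaped entries held in a list and a mutable policy table mapping field names to techniques. All names are hypothetical. It applies the technique once to values that were previously left unchanged, so it should not be rerun on the same entries.

```python
def identity(value):
    return value


def reclassify_as_sensitive(field, technique, existing_entries, policy_table):
    """Mark a field sensitive and re-anonymize it in already-stored entries.

    `policy_table` maps field name -> technique and is consulted for new
    entries; `existing_entries` is a list of dicts updated in place.
    """
    policy_table[field] = technique          # future entries use the new policy
    for entry in existing_entries:           # reprocess entries already stored
        if field in entry:
            entry[field] = technique(entry[field])
    return existing_entries


def anonymize_entry(entry, policy_table):
    """Anonymize a new dict-shaped entry using the current policy table."""
    return {key: policy_table.get(key, identity)(value) for key, value in entry.items()}
```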
As in greater detail, system and method as described herein can be by identifying first comprising sensitive information above
Data field and then application carry out anonymization journal entries for the data anonymous strategy of anonymization sensitive data.This paper institutes
The system and method stated can apply machine learning algorithm or include the data field of variable data in journal entries for identification
Other technologies.System and method as described herein can also identify the sensitive data in data field using various technologies.
Additionally, system and method as described herein can select data anonymous strategy to provide various data security levels or advantageous
In the analysis later of journal entries.System and method as described herein can also assess data anonymous process to verify the mistake
Journey satisfaction or the expectation measured value more than data anonymous.Additionally, system and method as described herein can continue to monitor day
Will file so as to the new journal entries of anonymization or determines when to handle existing journal entries again to keep desired
Data anonymous rank.
FIG. 5 is a block diagram of an exemplary computing system 510 capable of implementing one or more of the embodiments described and/or illustrated herein. For example, all or a portion of computing system 510 may perform and/or be a means for performing, either alone or in combination with other elements, one or more of the steps described herein (such as one or more of the steps illustrated in FIG. 3). All or a portion of computing system 510 may also perform and/or be a means for performing any other steps, methods, or processes described and/or illustrated herein.
Computing system 510 broadly represents any single-processor or multi-processor computing device or system capable of executing computer-readable instructions. Examples of computing system 510 include, without limitation, workstations, laptops, client-side terminals, servers, distributed computing systems, handheld devices, or any other computing system or device. In its most basic configuration, computing system 510 may include at least one processor 514 and a system memory 516.
Processor 514 generally represents any type or form of physical processing unit (e.g., a hardware-implemented central processing unit) capable of processing data or interpreting and executing instructions. In certain embodiments, processor 514 may receive instructions from a software application or module. These instructions may cause processor 514 to perform the functions of one or more of the exemplary embodiments described and/or illustrated herein.
System memory 516 generally represents any type or form of volatile or non-volatile storage device or medium capable of storing data and/or other computer-readable instructions. Examples of system memory 516 include, without limitation, Random Access Memory (RAM), Read-Only Memory (ROM), flash memory, or any other suitable memory device. Although not required, in certain embodiments computing system 510 may include both a volatile memory unit (such as system memory 516) and a non-volatile storage device (such as primary storage device 532, as detailed below). In one example, one or more of modules 102 from FIG. 1 may be loaded into system memory 516.
In certain embodiments, exemplary computing system 510 may also include one or more components or elements in addition to processor 514 and system memory 516. For example, as illustrated in FIG. 5, computing system 510 may include a memory controller 518, an Input/Output (I/O) controller 520, and a communication interface 522, each of which may be interconnected via a communication infrastructure 512.
Communication infrastructure 512 generally represents any type or form of infrastructure capable of facilitating communication between one or more components of a computing device. Examples of communication infrastructure 512 include, without limitation, a communication bus (such as an Industry Standard Architecture (ISA), Peripheral Component Interconnect (PCI), PCI Express (PCIe), or similar bus) and a network.
Memory controller 518 generally represents any type or form of device capable of handling memory or data or controlling communication between one or more components of computing system 510. For example, in certain embodiments memory controller 518 may control communication between processor 514, system memory 516, and I/O controller 520 via communication infrastructure 512.
I/O controller 520 generally represents any type or form of module capable of coordinating and/or controlling the input and output functions of a computing device. For example, in certain embodiments I/O controller 520 may control or facilitate the transfer of data between one or more elements of computing system 510, such as processor 514, system memory 516, communication interface 522, display adapter 526, input interface 530, and storage interface 534.
Communication interface 522 broadly represents any type or form of communication device or adapter capable of facilitating communication between exemplary computing system 510 and one or more additional devices. For example, in certain embodiments communication interface 522 may facilitate communication between computing system 510 and a private or public network including additional computing systems. Examples of communication interface 522 include, without limitation, a wired network interface (such as a network interface card), a wireless network interface (such as a wireless network interface card), a modem, and any other suitable interface. In at least one embodiment, communication interface 522 may provide a direct connection to a remote server via a direct link to a network, such as the Internet. Communication interface 522 may also provide such a connection indirectly through, for example, a local area network (such as an Ethernet network), a personal area network, a telephone or cable network, a cellular telephone connection, a satellite data connection, or any other suitable connection.
In certain embodiments, communication interface 522 may also represent a host adapter configured to facilitate communication between computing system 510 and one or more additional network or storage devices via an external bus or communication channel. Examples of host adapters include, without limitation, Small Computer System Interface (SCSI) host adapters, Universal Serial Bus (USB) host adapters, Institute of Electrical and Electronics Engineers (IEEE) 1394 host adapters, Advanced Technology Attachment (ATA), Parallel ATA (PATA), Serial ATA (SATA), and External SATA (eSATA) host adapters, Fibre Channel interface adapters, Ethernet adapters, or the like. Communication interface 522 may also allow computing system 510 to engage in distributed or remote computing. For example, communication interface 522 may receive instructions from a remote device or send instructions to a remote device for execution.
As illustrated in FIG. 5, computing system 510 may also include at least one display device 524 coupled to communication infrastructure 512 via a display adapter 526. Display device 524 generally represents any type or form of device capable of visually displaying information forwarded by display adapter 526. Similarly, display adapter 526 generally represents any type or form of device configured to forward graphics, text, and other data from communication infrastructure 512 (or from a frame buffer, as known in the art) for display on display device 524.
As illustrated in FIG. 5, exemplary computing system 510 may also include at least one input device 528 coupled to communication infrastructure 512 via an input interface 530. Input device 528 generally represents any type or form of input device capable of providing input, either computer-generated or human-generated, to exemplary computing system 510. Examples of input device 528 include, without limitation, a keyboard, a pointing device, a speech recognition device, or any other input device.
As illustrated in FIG. 5, exemplary computing system 510 may also include a primary storage device 532 and a backup storage device 533 coupled to communication infrastructure 512 via a storage interface 534. Storage devices 532 and 533 generally represent any type or form of storage device or medium capable of storing data and/or other computer-readable instructions. For example, storage devices 532 and 533 may be a magnetic disk drive (e.g., a so-called hard drive), a solid-state drive, a floppy disk drive, a magnetic tape drive, an optical disk drive, a flash drive, or the like. Storage interface 534 generally represents any type or form of interface or device for transferring data between storage devices 532 and 533 and other components of computing system 510. In one example, database 120 from FIG. 1 may be stored in primary storage device 532.
In certain embodiments, storage devices 532 and 533 may be configured to read from and/or write to a removable storage unit configured to store computer software, data, or other computer-readable information. Examples of suitable removable storage units include, without limitation, a floppy disk, a magnetic tape, an optical disk, a flash memory device, or the like. Storage devices 532 and 533 may also include other similar structures or devices for allowing computer software, data, or other computer-readable instructions to be loaded into computing system 510. For example, storage devices 532 and 533 may be configured to read and write software, data, or other computer-readable information. Storage devices 532 and 533 may also be a part of computing system 510 or may be separate devices accessed through other interface systems.
Many other devices or subsystems may be connected to computing system 510. Conversely, all of the components and devices illustrated in FIG. 5 need not be present to practice the embodiments described and/or illustrated herein. The devices and subsystems referenced above may also be interconnected in ways different from that shown in FIG. 5. Computing system 510 may also employ any number of software, firmware, and/or hardware configurations. For example, one or more of the exemplary embodiments disclosed herein may be encoded as a computer program (also referred to as computer software, software applications, computer-readable instructions, or computer control logic) on a computer-readable medium. As used herein, the term "computer-readable medium" generally refers to any form of device, carrier, or medium capable of storing or carrying computer-readable instructions. Examples of computer-readable media include, without limitation, transmission-type media, such as carrier waves, and non-transitory-type media, such as magnetic-storage media (e.g., hard disk drives, tape drives, and floppy disks), optical-storage media (e.g., Compact Disks (CDs), Digital Video Disks (DVDs), and Blu-ray disks), electronic-storage media (e.g., solid-state drives and flash media), and other distribution systems.
The computer-readable medium containing the computer program may be loaded into computing system 510. All or a portion of the computer program stored on the computer-readable medium may then be stored in system memory 516 and/or various portions of storage devices 532 and 533. When executed by processor 514, a computer program loaded into computing system 510 may cause processor 514 to perform and/or be a means for performing the functions of one or more of the exemplary embodiments described and/or illustrated herein. Additionally or alternatively, one or more of the exemplary embodiments described and/or illustrated herein may be implemented in firmware and/or hardware. For example, computing system 510 may be configured as an Application-Specific Integrated Circuit (ASIC) adapted to implement one or more of the exemplary embodiments disclosed herein.
FIG. 6 is a block diagram of an exemplary network architecture 600 in which client systems 610, 620, and 630 and servers 640 and 645 may be coupled to a network 650. As detailed above, all or a portion of network architecture 600 may perform and/or be a means for performing, either alone or in combination with other elements, one or more of the steps disclosed herein (such as one or more of the steps illustrated in FIG. 3). All or a portion of network architecture 600 may also be used to perform and/or be a means for performing other steps and features set forth in the present disclosure.
Client systems 610, 620, and 630 generally represent any type or form of computing device or system, such as exemplary computing system 510 in FIG. 5. Similarly, servers 640 and 645 generally represent computing devices or systems, such as application servers or database servers, configured to provide various database services and/or run certain software applications. Network 650 generally represents any telecommunication or computer network, including, for example, an intranet, a WAN, a LAN, a PAN, or the Internet. In one example, client systems 610, 620, and/or 630 and/or servers 640 and/or 645 may include all or a portion of system 100 from FIG. 1.
As illustrated in FIG. 6, one or more storage devices 660(1)-(N) may be directly attached to server 640. Similarly, one or more storage devices 670(1)-(N) may be directly attached to server 645. Storage devices 660(1)-(N) and storage devices 670(1)-(N) generally represent any type or form of storage device or medium capable of storing data and/or other computer-readable instructions. In certain embodiments, storage devices 660(1)-(N) and storage devices 670(1)-(N) may represent Network-Attached Storage (NAS) devices configured to communicate with servers 640 and 645 using various protocols, such as Network File System (NFS), Server Message Block (SMB), or Common Internet File System (CIFS).
Servers 640 and 645 may also be connected to a Storage Area Network (SAN) fabric 680. SAN fabric 680 generally represents any type or form of computer network or architecture capable of facilitating communication between a plurality of storage devices. SAN fabric 680 may facilitate communication between servers 640 and 645 and a plurality of storage devices 690(1)-(N) and/or an intelligent storage array 695. SAN fabric 680 may also facilitate, via network 650 and servers 640 and 645, communication between client systems 610, 620, and 630 and storage devices 690(1)-(N) and/or intelligent storage array 695 in such a manner that devices 690(1)-(N) and array 695 appear as locally attached devices to client systems 610, 620, and 630. As with storage devices 660(1)-(N) and storage devices 670(1)-(N), storage devices 690(1)-(N) and intelligent storage array 695 generally represent any type or form of storage device or medium capable of storing data and/or other computer-readable instructions.
In certain embodiments, and with reference to exemplary computing system 510 of FIG. 5, a communication interface, such as communication interface 522 in FIG. 5, may be used to provide connectivity between each client system 610, 620, and 630 and network 650. Client systems 610, 620, and 630 may be able to access information on server 640 or 645 using, for example, a web browser or other client software. Such software may allow client systems 610, 620, and 630 to access data hosted by server 640, server 645, storage devices 660(1)-(N), storage devices 670(1)-(N), storage devices 690(1)-(N), or intelligent storage array 695. Although FIG. 6 depicts the use of a network (such as the Internet) for exchanging data, the embodiments described and/or illustrated herein are not limited to the Internet or any particular network-based environment.
In at least one embodiment, all or a portion of one or more of the exemplary embodiments disclosed herein may be encoded as a computer program and loaded onto and executed by server 640, server 645, storage devices 660(1)-(N), storage devices 670(1)-(N), storage devices 690(1)-(N), intelligent storage array 695, or any combination thereof. All or a portion of one or more of the exemplary embodiments disclosed herein may also be encoded as a computer program, stored in server 640, run by server 645, and distributed to client systems 610, 620, and 630 over network 650.
As detailed above, computing system 510 and/or one or more components of network architecture 600 may perform and/or be a means for performing, either alone or in combination with other elements, one or more steps of an exemplary method for anonymizing log entries.
While the foregoing disclosure sets forth various embodiments using specific block diagrams, flowcharts, and examples, each block diagram component, flowchart step, operation, and/or component described and/or illustrated herein may be implemented, individually and/or collectively, using a wide range of hardware, software, or firmware (or any combination thereof) configurations. In addition, any disclosure of components contained within other components should be considered exemplary in nature, since many other architectures can be implemented to achieve the same functionality.
In some examples, all or a portion of exemplary system 100 in FIG. 1 may represent portions of a cloud-computing or network-based environment. Cloud-computing environments may provide various services and applications via the Internet. These cloud-based services (e.g., software as a service, platform as a service, infrastructure as a service, etc.) may be accessible through a web browser or other remote interface. Various functions described herein may be provided through a remote desktop environment or any other cloud-based computing environment.
In various embodiments, all or a portion of exemplary system 100 in FIG. 1 may facilitate multi-tenancy within a cloud-based computing environment. In other words, the software modules described herein may configure a computing system (e.g., a server) to facilitate multi-tenancy for one or more of the functions described herein. For example, one or more of the software modules described herein may program a server to enable two or more clients (e.g., customers) to share an application that is running on the server. A server programmed in this manner may share an application, operating system, processing system, and/or storage system among multiple customers (i.e., tenants). One or more of the modules described herein may also partition the data and/or configuration information of a multi-tenant application for each customer such that one customer cannot access the data and/or configuration information of another customer.
According to various embodiments, all or a portion of exemplary system 100 in FIG. 1 may be implemented within a virtual environment. For example, the modules and/or data described herein may reside and/or execute within a virtual machine. As used herein, the term "virtual machine" generally refers to any operating system environment that is abstracted from computing hardware by a virtual machine manager (e.g., a hypervisor). Additionally or alternatively, the modules and/or data described herein may reside and/or execute within a virtualization layer. As used herein, the term "virtualization layer" generally refers to any data layer and/or application layer that overlays and/or is abstracted from an operating system environment. A virtualization layer may be managed by a software virtualization solution (e.g., a file system filter) that presents the virtualization layer as though it were part of an underlying base operating system. For example, a software virtualization solution may redirect calls that are initially directed to locations within a base file system and/or registry to locations within a virtualization layer.
In some examples, all or a portion of exemplary system 100 in FIG. 1 may represent portions of a mobile computing environment. Mobile computing environments may be implemented by a wide range of mobile computing devices, including mobile phones, tablet computers, e-book readers, personal digital assistants, wearable computing devices (e.g., computing devices with a head-mounted display, smartwatches, etc.), and the like. In some examples, mobile computing environments may have one or more distinct features, including, for example, reliance on battery power, presenting only one foreground application at any given time, remote management features, touchscreen features, location and movement data (e.g., provided by Global Positioning Systems, gyroscopes, accelerometers, etc.), restricted platforms that restrict modifications to system-level configurations and/or that limit the ability of third-party software to inspect the behavior of other applications, controls to restrict the installation of applications (e.g., to only originate from approved application stores), etc. Various functions described herein may be provided for a mobile computing environment and/or may interact with a mobile computing environment.
In addition, all or a portion of exemplary system 100 in FIG. 1 may represent portions of, interact with, consume data produced by, and/or produce data consumed by one or more systems for information management. As used herein, the term "information management" may refer to the protection, organization, and/or storage of data. Examples of systems for information management may include, without limitation, storage systems, backup systems, archival systems, replication systems, high-availability systems, data search systems, virtualization systems, and the like.
In some embodiments, all or a portion of exemplary system 100 in FIG. 1 may represent portions of, produce data protected by, and/or communicate with one or more systems for information security. As used herein, the term "information security" may refer to the control of access to protected data. Examples of systems for information security may include, without limitation, systems providing managed security services, data loss prevention systems, identity authentication systems, access control systems, encryption systems, policy compliance systems, intrusion detection and prevention systems, electronic discovery systems, and the like.
According to some examples, all or a portion of exemplary system 100 in FIG. 1 may represent portions of, communicate with, and/or receive protection from one or more systems for endpoint security. As used herein, the term "endpoint security" may refer to the protection of endpoint systems from unauthorized and/or illegitimate use, access, and/or control. Examples of systems for endpoint protection may include, without limitation, anti-malware systems, user authentication systems, encryption systems, privacy systems, spam-filtering services, and the like.
The procedure parameter and sequence of steps for being described herein and/or illustrating only provide and can be according to need by way of example
Change.For example, although the step of as shown herein and/or description can be shown or be discussed with particular order, these steps differ
It is fixed to need to execute by the sequence for illustrating or discussing.The various illustrative methods for being described herein and/or illustrating can also be omitted herein
Description or one or more of the step of illustrate, or other than those of disclosed step further include other step.
While various embodiments have been described and/or illustrated herein in the context of fully functional computing systems, one or more of these exemplary embodiments may be distributed as a program product in a variety of forms, regardless of the particular type of computer-readable media used to actually carry out the distribution. The embodiments disclosed herein may also be implemented using software modules that perform certain tasks. These software modules may include script, batch, or other executable files that may be stored on a computer-readable storage medium or in a computing system. In some embodiments, these software modules may configure a computing system to perform one or more of the exemplary embodiments disclosed herein.
In addition, one or more of the modules described herein may transform data, physical devices, and/or representations of physical devices from one form to another. For example, one or more of the modules described herein may receive log entries to be transformed, transform the log entries, output a result of the transformation to anonymize the log entries, use the result of the transformation to anonymize one or more data logs, and store the result of the transformation to protect personally identifiable information. Additionally or alternatively, one or more of the modules described herein may transform a processor, volatile memory, non-volatile memory, and/or any other portion of a physical computing device from one form to another by executing on the computing device, storing data on the computing device, and/or otherwise interacting with the computing device.
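The transformation described above (receive log entries, detect a shared pattern, find the variable fields, decide which are sensitive, rewrite them) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. Entries are split on whitespace. The pattern is built by folding a pairwise longest common subsequence (LCS) over the entries. The claims name LCS analysis, but the pairwise fold is an assumption and only approximates a true multi-sequence LCS. A field counts as enumerated when it takes at most `MAX_ENUM_VALUES` distinct values, all-digit fields count as a known non-sensitive type, and every remaining field is replaced with a keyed one-way hash. `SECRET_KEY`, `MAX_ENUM_VALUES`, and all function names are hypothetical.

```python
import hashlib
import hmac
import re
from typing import List

WILDCARD = "<*>"
MAX_ENUM_VALUES = 2  # hypothetical cut-off for "enumerated" fields
SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical key


def lcs(a: List[str], b: List[str]) -> List[str]:
    """Longest common subsequence of two token lists (dynamic programming)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]  # dp[i][j] = LCS length of a[i:], b[j:]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < m and j < n:  # walk the table to recover one LCS
        if a[i] == b[j]:
            out.append(a[i])
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def build_template(tokens: List[str], constants: List[str]) -> List[str]:
    """Keep constant tokens; collapse each run of other tokens into one wildcard."""
    template, k = [], 0
    for tok in tokens:
        if k < len(constants) and tok == constants[k]:
            template.append(tok)
            k += 1
        elif not template or template[-1] != WILDCARD:
            template.append(WILDCARD)
    return template


def is_sensitive(values: List[str]) -> bool:
    """Heuristic evaluation of one variable field across all entries."""
    if len(set(values)) <= MAX_ENUM_VALUES:
        return False  # treated as enumerated data
    if all(v.isdigit() for v in values):
        return False  # treated as a known non-sensitive type (e.g. durations)
    return True


def pseudonym(value: str) -> str:
    """Keyed one-way hash: equal inputs always map to equal pseudonyms."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return "anon:" + digest.hexdigest()[:12]


def anonymize_entries(entries: List[str]) -> List[str]:
    if not entries:
        return []
    token_lists = [e.split() for e in entries]
    constants = token_lists[0]
    for toks in token_lists[1:]:  # fold pairwise LCS over all entries
        constants = lcs(constants, toks)
    template = build_template(token_lists[0], constants)
    parts = ["(.+?)" if t == WILDCARD else re.escape(t) for t in template]
    pattern = re.compile(" ".join(parts))

    rows = []  # captured variable-field values, one tuple per entry
    for toks in token_lists:
        m = pattern.fullmatch(" ".join(toks))
        if m is None:
            raise ValueError(f"entry does not fit the learned pattern: {toks!r}")
        rows.append(m.groups())

    sensitive = [is_sensitive(list(col)) for col in zip(*rows)]
    result = []
    for row in rows:
        values = iter(pseudonym(v) if s else v for v, s in zip(row, sensitive))
        result.append(" ".join(next(values) if t == WILDCARD else t for t in template))
    return result
```

For entries such as `User alice logged in from 10.0.0.1 status OK after 120 ms` and `User bob logged in from 10.0.0.7 status FAIL after 95 ms`, the learned template is `User <*> logged in from <*> status <*> after <*> ms`. The distinct-value heuristic only becomes meaningful over a reasonably large sample. With two entries, no field can exceed two distinct values, so every field would be kept. This is one reason a real system would need a more robust sensitivity test.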
The preceding description has been provided to enable others skilled in the art to best utilize various aspects of the exemplary embodiments disclosed herein. This exemplary description is not intended to be exhaustive or to be limited to any precise form disclosed. Many modifications and variations are possible without departing from the spirit and scope of the present disclosure. The embodiments disclosed herein should be considered in all respects illustrative and not restrictive. Reference should be made to the appended claims and their equivalents in determining the scope of the present disclosure.
Unless otherwise noted, the terms "connected to" and "coupled to" (and their derivatives), as used in the specification and claims, are to be construed as permitting both direct and indirect (i.e., via other elements or components) connection. In addition, the terms "a" or "an", as used in the specification and claims, are to be construed as meaning "at least one of". Finally, for ease of use, the terms "including" and "having" (and their derivatives), as used in the specification and claims, are interchangeable with and have the same meaning as the word "comprising".
Claims (20)
1. A computer-implemented method for anonymizing log entries, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
detecting a data pattern in a plurality of log entries, the plurality of log entries recording events performed by at least one process executing on at least one device;
identifying, within the data pattern, at least one data field in the plurality of log entries that contains variable data;
evaluating the data field that contains variable data to determine whether the data field contains sensitive data;
in response to determining whether the data field contains sensitive data, applying a data anonymization policy to the data field to anonymize the plurality of log entries.
2. The computer-implemented method of claim 1, wherein detecting the data pattern in the plurality of log entries comprises performing a message template learning analysis on the plurality of log entries.
3. The computer-implemented method of claim 1, wherein detecting the data pattern in the plurality of log entries comprises performing a longest common subsequence analysis on the plurality of log entries.
4. The computer-implemented method of claim 1, further comprising:
receiving a log entry of an additional process executing on an additional device;
matching the log entry with a data pattern in a set of data patterns previously identified in the plurality of log entries;
identifying a data anonymization policy that corresponds to the data pattern;
anonymizing the log entry by applying the corresponding data anonymization policy.
5. The computer-implemented method of claim 1, further comprising:
determining a threshold number of privacy contexts in which the data pattern must be found in order for the data pattern to be considered anonymous;
detecting the data pattern in a plurality of privacy contexts;
determining that the number of privacy contexts containing the data pattern exceeds the privacy context threshold;
in response to determining that the number of privacy contexts containing the data pattern exceeds the privacy context threshold, determining that the data pattern is anonymous.
6. The computer-implemented method of claim 1, wherein:
the data field evaluation determines that the data field contains sensitive data;
the data anonymization policy anonymizes the data field by at least one of:
encrypting the data field using a one-way hash;
encrypting the data field using reversible encryption;
replacing the data field with random data;
replacing the data field with static data;
removing the data field;
generalizing the data field.
7. The computer-implemented method of claim 1, wherein:
the data field evaluation determines that the data field contains enumerated data and therefore does not contain sensitive data;
the data anonymization policy does not alter the data field.
8. The computer-implemented method of claim 1, wherein:
the data field evaluation determines that the data field contains data of a known data type that does not contain sensitive data;
the data anonymization policy does not alter the data field.
9. The computer-implemented method of claim 1, wherein:
the data field evaluation determines that the data field now contains sensitive data, even though the data field was previously determined not to contain sensitive data;
the data anonymization policy anonymizes the data field in a plurality of existing log entries.
10. A system for anonymizing log entries, the system comprising:
a pattern module, stored in memory, that detects a data pattern in a plurality of log entries, the plurality of log entries recording events performed by at least one process executing on at least one device;
a field analysis module, stored in memory, that identifies, within the data pattern, at least one data field in the plurality of log entries that contains variable data;
a data analysis module, stored in memory, that evaluates the data field that contains variable data to determine whether the data field contains sensitive data;
an anonymization module, stored in memory, that, in response to determining whether the data field contains sensitive data, applies a data anonymization policy to the data field to anonymize the plurality of log entries;
at least one physical processor configured to execute the pattern module, the field analysis module, the data analysis module, and the anonymization module.
11. The system of claim 10, wherein the pattern module detects the data pattern in the plurality of log entries by performing a message template learning analysis on the plurality of log entries.
12. The system of claim 10, wherein the pattern module detects the data pattern in the plurality of log entries by performing a longest common subsequence analysis on the plurality of log entries.
13. The system of claim 10, further comprising:
receiving a log entry of an additional process executing on an additional device;
matching the log entry with a data pattern in a set of data patterns previously identified in the plurality of log entries;
identifying a data anonymization policy that corresponds to the data pattern;
anonymizing the log entry by applying the corresponding data anonymization policy.
14. The system of claim 10, further comprising:
determining a threshold number of privacy contexts in which the data pattern must be found in order for the data pattern to be considered anonymous;
detecting the data pattern in a plurality of privacy contexts;
determining that the number of privacy contexts containing the data pattern exceeds the privacy context threshold;
in response to determining that the number of privacy contexts containing the data pattern exceeds the privacy context threshold, determining that the data pattern is anonymous.
15. The system of claim 10, wherein:
the data field evaluation determines that the data field contains sensitive data;
the data anonymization policy anonymizes the data field by at least one of:
encrypting the data field using a one-way hash;
encrypting the data field using reversible encryption;
replacing the data field with random data;
replacing the data field with static data;
removing the data field;
generalizing the data field.
16. The system of claim 10, wherein:
the data field evaluation determines that the data field contains enumerated data and therefore does not contain sensitive data;
the data anonymization policy does not alter the data field.
17. The system of claim 10, wherein:
the data field evaluation determines that the data field contains data of a known data type that does not contain sensitive data;
the data anonymization policy does not alter the data field.
18. The system of claim 10, wherein:
the data field evaluation determines that the data field now contains sensitive data, even though the data field was previously determined not to contain sensitive data;
the data anonymization policy anonymizes the data field in a plurality of existing log entries.
19. A non-transitory computer-readable medium comprising one or more computer-readable instructions that, when executed by at least one processor of a computing device, cause the computing device to:
detect a data pattern in a plurality of log entries, the plurality of log entries recording events performed by at least one process executing on at least one device;
identify, within the data pattern, at least one data field in the plurality of log entries that contains variable data;
evaluate the data field that contains variable data to determine whether the data field contains sensitive data;
in response to determining whether the data field contains sensitive data, apply a data anonymization policy to the data field to anonymize the plurality of log entries.
20. The non-transitory computer-readable medium of claim 19, wherein the one or more computer-readable instructions cause the computing device to detect the data pattern in the plurality of log entries by performing a message template learning analysis on the plurality of log entries.
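Claims 6 and 15 list six ways a data anonymization policy may rewrite a sensitive field. The following is a minimal Python sketch of each under stated assumptions. The one-way hash is a keyed HMAC-SHA256 (the key `SECRET_KEY` is hypothetical). "Reversible encryption" is replaced by a reversible lookup-table pseudonymizer (`TokenVault`, hypothetical) because the Python standard library has no block cipher; it is a stand-in, not encryption. Generalization is shown only for IP addresses. The policy names are illustrative.

```python
import hashlib
import hmac
import ipaddress
import secrets
from typing import Dict, Optional

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical key


class TokenVault:
    """Reversible pseudonymization: stores value <-> token so tokens can be reversed."""

    def __init__(self) -> None:
        self._forward: Dict[str, str] = {}
        self._reverse: Dict[str, str] = {}

    def protect(self, value: str) -> str:
        if value not in self._forward:
            token = "tok_" + secrets.token_hex(6)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def reveal(self, token: str) -> str:
        return self._reverse[token]


def generalize(value: str) -> str:
    """Generalize an IP address to its network (/24 for IPv4, /48 for IPv6)."""
    try:
        ip = ipaddress.ip_address(value)
    except ValueError:
        return "[GENERALIZED]"  # hypothetical fallback for non-IP values
    prefix = 24 if ip.version == 4 else 48
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def apply_policy(value: str, policy: str, vault: Optional[TokenVault] = None) -> str:
    if policy == "hash":  # one-way, but equal inputs stay linkable
        return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    if policy == "reversible":  # recoverable by whoever holds the vault
        if vault is None:
            raise ValueError("reversible policy needs a TokenVault")
        return vault.protect(value)
    if policy == "random":  # unlinkable: a fresh value every time
        return secrets.token_hex(6)
    if policy == "static":
        return "[REDACTED]"
    if policy == "remove":
        return ""
    if policy == "generalize":
        return generalize(value)
    raise ValueError(f"unknown policy: {policy}")
```

Each option trades utility against privacy differently. A keyed hash keeps equal values linkable across entries, random replacement breaks that linkage, and generalization preserves coarse information such as the subnet. A production system would replace `TokenVault` with authenticated encryption under a managed key.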
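Claims 4 and 9 describe reusing previously learned patterns. A new log entry is matched against known data patterns and anonymized with the corresponding policy. If a field previously judged non-sensitive later turns out to contain sensitive data, existing entries are re-anonymized. This Python sketch assumes that each learned pattern is stored as a regular expression with one capture group per variable field and a per-field policy function. `PatternRegistry`, `keep`, and `redact` are hypothetical names. The sketch also assumes the raw entries are still available for reprocessing, which the claims do not specify.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, List, Optional

Policy = Callable[[str], str]


def keep(value: str) -> str:
    return value  # field judged non-sensitive


def redact(value: str) -> str:
    return "[REDACTED]"  # field judged sensitive


@dataclass
class LearnedPattern:
    regex: re.Pattern
    policies: List[Policy]  # one policy per capture group


def _rewrite(match: re.Match, policies: List[Policy]) -> str:
    """Apply each field's policy to its captured span, keeping the constant text."""
    out, last = [], 0
    for group, policy in enumerate(policies, start=1):
        start, end = match.span(group)
        out.append(match.string[last:start])
        out.append(policy(match.group(group)))
        last = end
    out.append(match.string[last:])
    return "".join(out)


class PatternRegistry:
    def __init__(self) -> None:
        self.patterns: List[LearnedPattern] = []

    def register(self, regex: str, policies: List[Policy]) -> int:
        compiled = re.compile(regex)
        if compiled.groups != len(policies):
            raise ValueError("need exactly one policy per variable field")
        self.patterns.append(LearnedPattern(compiled, list(policies)))
        return len(self.patterns) - 1

    def anonymize(self, entry: str) -> Optional[str]:
        """Match a new entry against known patterns; None means no pattern fits."""
        for pattern in self.patterns:
            match = pattern.regex.fullmatch(entry)
            if match:
                return _rewrite(match, pattern.policies)
        return None

    def reclassify(self, pattern_id: int, field: int, policy: Policy,
                   existing: List[str]) -> List[str]:
        """Change one field's policy, then re-anonymize stored raw entries of that pattern."""
        pattern = self.patterns[pattern_id]
        pattern.policies[field] = policy
        rewritten = []
        for entry in existing:
            match = pattern.regex.fullmatch(entry)
            if match:
                rewritten.append(_rewrite(match, pattern.policies))
        return rewritten
```

Returning `None` for an unmatched entry, rather than passing it through unchanged, leaves the caller to learn a new pattern before the entry is stored. As a result, unanonymized text is not emitted by default.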
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/947915 | 2015-11-20 | ||
US14/947,915 US10326772B2 (en) | 2015-11-20 | 2015-11-20 | Systems and methods for anonymizing log entries |
PCT/US2016/053995 WO2017087074A1 (en) | 2015-11-20 | 2016-09-27 | Systems and methods for anonymizing log entries |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108351946A true CN108351946A (en) | 2018-07-31 |
CN108351946B CN108351946B (en) | 2022-03-08 |
Family ID: 57137266
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201680062430.3A Active CN108351946B (en) | 2015-11-20 | 2016-09-27 | System and method for anonymizing log entries |
Country Status (5)
Country | Link |
---|---|
US (1) | US10326772B2 (en) |
EP (1) | EP3378007B1 (en) |
JP (1) | JP2019500679A (en) |
CN (1) | CN108351946B (en) |
WO (1) | WO2017087074A1 (en) |
Families Citing this family (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10223429B2 (en) * | 2015-12-01 | 2019-03-05 | Palantir Technologies Inc. | Entity data attribution using disparate data sets |
US10419401B2 (en) * | 2016-01-08 | 2019-09-17 | Capital One Services, Llc | Methods and systems for securing data in the public cloud |
US10192278B2 (en) * | 2016-03-16 | 2019-01-29 | Institute For Information Industry | Traceable data audit apparatus, method, and non-transitory computer readable storage medium thereof |
US10754983B2 (en) * | 2017-03-31 | 2020-08-25 | Interset Software Inc. | Anonymization of sensitive data for use in user interfaces |
US11023594B2 (en) * | 2017-05-22 | 2021-06-01 | Georgetown University | Locally private determination of heavy hitters |
US11062041B2 (en) * | 2017-07-27 | 2021-07-13 | Citrix Systems, Inc. | Scrubbing log files using scrubbing engines |
US10469307B2 (en) * | 2017-09-26 | 2019-11-05 | Cisco Technology, Inc. | Predicting computer network equipment failure |
US11263341B1 (en) * | 2017-10-11 | 2022-03-01 | Snap Inc. | Identifying personally identifiable information within an unstructured data store |
US10565398B2 (en) * | 2017-10-26 | 2020-02-18 | Sap Se | K-anonymity and L-diversity data anonymization in an in-memory database |
US10333902B1 (en) * | 2017-12-19 | 2019-06-25 | International Business Machines Corporation | Data sanitization system for public host platform |
US11907941B2 (en) * | 2018-01-04 | 2024-02-20 | Micro Focus Llc | Anonymization of data fields in transactions |
DK3800856T3 (en) * | 2018-02-20 | 2023-08-28 | Darktrace Holdings Ltd | Cyber security appliance for a cloud infrastructure |
US11301568B1 (en) * | 2018-04-05 | 2022-04-12 | Veritas Technologies Llc | Systems and methods for computing a risk score for stored information |
US11113417B2 (en) * | 2018-07-10 | 2021-09-07 | Sap Se | Dynamic data anonymization using taint tracking |
US20200125725A1 (en) * | 2018-10-19 | 2020-04-23 | Logrhythm, Inc. | Generation and maintenance of identity profiles for implementation of security response |
US11030350B2 (en) * | 2018-11-29 | 2021-06-08 | Visa International Service Association | System, method, and apparatus for securely outputting sensitive information |
US11803481B2 (en) | 2019-02-28 | 2023-10-31 | Hewlett Packard Enterprise Development Lp | Data anonymization for a document editor |
US11151285B2 (en) | 2019-03-06 | 2021-10-19 | International Business Machines Corporation | Detecting sensitive data exposure via logging |
US11188680B2 (en) | 2019-09-20 | 2021-11-30 | International Business Machines Corporation | Creating research study corpus |
US11328089B2 (en) | 2019-09-20 | 2022-05-10 | International Business Machines Corporation | Built-in legal framework file management |
US11106813B2 (en) * | 2019-09-20 | 2021-08-31 | International Business Machines Corporation | Credentials for consent based file access |
US11327665B2 (en) | 2019-09-20 | 2022-05-10 | International Business Machines Corporation | Managing data on volumes |
US11321488B2 (en) | 2019-09-20 | 2022-05-03 | International Business Machines Corporation | Policy driven data movement |
US11443056B2 (en) | 2019-09-20 | 2022-09-13 | International Business Machines Corporation | File access restrictions enforcement |
US11861493B2 (en) * | 2019-12-30 | 2024-01-02 | Micron Technology, Inc. | Machine learning models based on altered data and systems and methods for training and using the same |
EP3905087B1 (en) * | 2020-04-27 | 2023-01-18 | Brighter AI Technologies GmbH | Method and system for selective and privacy-preserving anonymization |
US11586486B2 (en) * | 2020-08-24 | 2023-02-21 | Vmware, Inc. | Methods and systems that efficiently cache log/event messages in a distributed log-analytics system |
US11874951B2 (en) * | 2021-03-16 | 2024-01-16 | Tata Consultancy Services Limited | System and method for risk aware data anonymization |
US20220327237A1 (en) * | 2021-04-13 | 2022-10-13 | Bi Science (2009) Ltd | System and a method for identifying private user information |
WO2023016641A1 (en) | 2021-08-11 | 2023-02-16 | Telefonaktiebolaget Lm Ericsson (Publ) | Handling of logged restricted information based on tag syntax allocation |
US12019782B1 (en) * | 2021-11-19 | 2024-06-25 | Trend Micro Incorporated | Privacy protection for customer events logs of cybersecurity events |
EP4235474A1 (en) | 2022-02-24 | 2023-08-30 | Fundación Tecnalia Research & Innovation | Method and system for anonymising event logs |
US20230315884A1 (en) * | 2022-04-01 | 2023-10-05 | Blackberry Limited | Event data processing |
US12013970B2 (en) * | 2022-05-16 | 2024-06-18 | Bank Of America Corporation | System and method for detecting and obfuscating confidential information in task logs |
KR20240013440A (en) * | 2022-07-22 | 2024-01-30 | 쿠팡 주식회사 | Electronic apparatus for processing data and method thereof |
US20240070322A1 (en) * | 2022-08-30 | 2024-02-29 | Vmware, Inc. | System and method for anonymizing sensitive information in logs of applications |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103136189A (en) * | 2011-11-28 | 2013-06-05 | 国际商业机器公司 | Confidential information identifying method, information processing apparatus, and program |
US20140237620A1 (en) * | 2011-09-28 | 2014-08-21 | Tata Consultancy Services Limited | System and method for database privacy protection |
US20140304825A1 (en) * | 2011-07-22 | 2014-10-09 | Vodafone Ip Licensing Limited | Anonymization and filtering data |
US20150302206A1 (en) * | 2014-04-22 | 2015-10-22 | International Business Machines Corporation | Method and system for hiding sensitive data in log files |
Family Cites Families (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002032473A (en) * | 2000-07-18 | 2002-01-31 | Fujitsu Ltd | System and program storage medium for medical information processing |
US7174507B2 (en) * | 2003-02-10 | 2007-02-06 | Kaidara S.A. | System method and computer program product for obtaining structured data from text |
US20040199828A1 (en) * | 2003-04-01 | 2004-10-07 | International Business Machines Corporation | Method and apparatus for tracing troubleshooting events for aiding technical assistance |
AU2004313518A1 (en) * | 2003-12-15 | 2005-07-28 | Evolveware Information Technology (India) Pty. Ltd | An apparatus for migration and conversion of software code from any source platform to any target platform |
US8412671B2 (en) * | 2004-08-13 | 2013-04-02 | Hewlett-Packard Development Company, L.P. | System and method for developing a star schema |
US20070299881A1 (en) * | 2006-06-21 | 2007-12-27 | Shimon Bouganim | System and method for protecting selected fields in database files |
US8341405B2 (en) * | 2006-09-28 | 2012-12-25 | Microsoft Corporation | Access management in an off-premise environment |
JP2008312156A (en) * | 2007-06-18 | 2008-12-25 | Hitachi Information & Control Solutions Ltd | Information processing apparatus, encryption processing method, and encryption processing program |
US8001136B1 (en) * | 2007-07-10 | 2011-08-16 | Google Inc. | Longest-common-subsequence detection for common synonyms |
US8166313B2 (en) | 2008-05-08 | 2012-04-24 | Fedtke Stephen U | Method and apparatus for dump and log anonymization (DALA) |
AU2011201369A1 (en) * | 2010-03-25 | 2011-10-13 | Rl Solutions | Systems and methods for redacting sensitive data entries |
US8544104B2 (en) * | 2010-05-10 | 2013-09-24 | International Business Machines Corporation | Enforcement of data privacy to maintain obfuscation of certain data |
JP2013186508A (en) * | 2012-03-06 | 2013-09-19 | Mitsubishi Denki Information Technology Corp | Data processing system and log data management device |
US20130332194A1 (en) * | 2012-06-07 | 2013-12-12 | Iquartic | Methods and systems for adaptive ehr data integration, query, analysis, reporting, and crowdsourced ehr application development |
JPWO2014181541A1 (en) * | 2013-05-09 | 2017-02-23 | 日本電気株式会社 | Information processing apparatus and anonymity verification method for verifying anonymity |
US9448859B2 (en) * | 2013-09-17 | 2016-09-20 | Qualcomm Incorporated | Exploiting hot application programming interfaces (APIs) and action patterns for efficient storage of API logs on mobile devices for behavioral analysis |
US9965606B2 (en) * | 2014-02-07 | 2018-05-08 | Bank Of America Corporation | Determining user authentication based on user/device interaction |
US9378079B2 (en) * | 2014-09-02 | 2016-06-28 | Microsoft Technology Licensing, Llc | Detection of anomalies in error signals of cloud based service |
US9838359B2 (en) * | 2015-10-29 | 2017-12-05 | Ca, Inc. | Separation of IoT network thing identification data at a network edge device |
US10338977B2 (en) * | 2016-10-11 | 2019-07-02 | Oracle International Corporation | Cluster-based processing of unstructured log messages |
US11057344B2 (en) * | 2016-12-30 | 2021-07-06 | Fortinet, Inc. | Management of internet of things (IoT) by security fabric |
2015
- 2015-11-20: US application US14/947,915, published as US10326772B2 (Active)
2016
- 2016-09-27: CN application CN201680062430.3A, published as CN108351946B (Active)
- 2016-09-27: WO application PCT/US2016/053995, published as WO2017087074A1 (Application Filing)
- 2016-09-27: JP application JP2018523029A, published as JP2019500679A (Pending)
- 2016-09-27: EP application EP16781925.9A, published as EP3378007B1 (Active)
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110874477A (en) * | 2018-08-29 | 2020-03-10 | 北京京东尚科信息技术有限公司 | Log data encryption method and device, electronic equipment and medium |
CN111314292A (en) * | 2020-01-15 | 2020-06-19 | 上海观安信息技术股份有限公司 | Data security inspection method based on sensitive data identification |
CN112800003A (en) * | 2021-01-20 | 2021-05-14 | 华云数据(厦门)网络有限公司 | Recommendation method for creating snapshot, snapshot creation method and device and electronic equipment |
CN112783850A (en) * | 2021-02-09 | 2021-05-11 | 珠海豹趣科技有限公司 | File enumeration method and device based on USN log, electronic equipment and storage medium |
CN112883389A (en) * | 2021-02-09 | 2021-06-01 | 上海凯馨信息科技有限公司 | Reversible desensitization algorithm supporting feature preservation |
CN112783850B (en) * | 2021-02-09 | 2023-09-22 | 珠海豹趣科技有限公司 | File enumeration method and device based on USN log, electronic equipment and storage medium |
CN113452674A (en) * | 2021-05-21 | 2021-09-28 | 南京逸智网络空间技术创新研究院有限公司 | Galois field-based flow log multi-view anonymization method |
CN113452674B (en) * | 2021-05-21 | 2024-05-07 | 南京逸智网络空间技术创新研究院有限公司 | Galois field-based flow log multi-view anonymization method |
Also Published As
Publication number | Publication date |
---|---|
US10326772B2 (en) | 2019-06-18 |
US20170149793A1 (en) | 2017-05-25 |
CN108351946B (en) | 2022-03-08 |
EP3378007B1 (en) | 2022-01-19 |
JP2019500679A (en) | 2019-01-10 |
EP3378007A1 (en) | 2018-09-26 |
WO2017087074A1 (en) | 2017-05-26 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108351946A (en) | System and method for anonymizing log entries | |
US9245123B1 (en) | Systems and methods for identifying malicious files | |
US9401925B1 (en) | Systems and methods for detecting security threats based on user profiles | |
US8925037B2 (en) | Systems and methods for enforcing data-loss-prevention policies using mobile sensors | |
JP6122555B2 (en) | System and method for identifying compromised private keys | |
JP6101408B2 (en) | System and method for detecting attacks on computing systems using event correlation graphs | |
US9077747B1 (en) | Systems and methods for responding to security breaches | |
US9652597B2 (en) | Systems and methods for detecting information leakage by an organizational insider | |
JP6703616B2 (en) | System and method for detecting security threats | |
CN108701188A (en) | Systems and methods for modifying file backups in response to detecting potential ransomware | |
US10410158B1 (en) | Systems and methods for evaluating cybersecurity risk | |
CN108293044A (en) | System and method for detecting malware infection via domain name service flow analysis | |
US9323930B1 (en) | Systems and methods for reporting security vulnerabilities | |
US9749299B1 (en) | Systems and methods for image-based encryption of cloud data | |
CN108292133A (en) | System and method for identifying compromised device in industrial control system | |
US9652615B1 (en) | Systems and methods for analyzing suspected malware | |
US10425435B1 (en) | Systems and methods for detecting anomalous behavior in shared data repositories | |
US9973525B1 (en) | Systems and methods for determining the risk of information leaks from cloud-based services | |
US10313386B1 (en) | Systems and methods for assessing security risks of users of computer networks of organizations | |
CN109997138A (en) | Systems and methods for detecting malicious processes on computing devices | |
US9569617B1 (en) | Systems and methods for preventing false positive malware identification | |
US10366344B1 (en) | Systems and methods for selecting features for classification | |
US9659176B1 (en) | Systems and methods for generating repair scripts that facilitate remediation of malware side-effects | |
US20190311136A1 (en) | Systems and methods for utilizing an information trail to enforce data loss prevention policies on potentially malicious file activity | |
US9754086B1 (en) | Systems and methods for customizing privacy control systems |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20200103. Address after: California, USA. Applicant after: CA, INC. Address before: California, USA. Applicant before: Symantec Corporation |
GR01 | Patent grant | |