US20130031111A1 - System, method, and computer program product for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database - Google Patents
- Publication number
- US20130031111A1 (application US13/645,164)
- Authority
- US
- United States
- Prior art keywords
- client system
- bloom filter
- database
- objects
- prevalence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/278—Data partitioning, e.g. horizontal or vertical partitioning
Definitions
- the present invention relates to networks, and more particularly to increasing the efficiency of networks given the growing use of cloud-based technologies.
- in the context of network security, the threat landscape has grown exponentially over the last few years. The threat landscape has grown so much that most Anti-Virus vendors are evaluating and implementing various technologies to mitigate the unmatched growth in the number of threats. As the threat landscape grows, so does the need to mitigate the threats associated with that growth.
- it may be desirable to keep the lookup rate to less than a certain number of lookups per day. For example, it may be desirable to keep the lookup rate to less than ten lookups per day per client.
- harsh criteria are often used to keep the lookup rates low.
- as a result, many problematic items (e.g. malware, infected files, etc.) are not examined and are missed on the client systems.
- a system, method, and computer program product are provided for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database.
- a database including a plurality of known objects is identified. Additionally, the database is segmented into a plurality of segments. Furthermore, each of the plurality of known objects is assigned to one of the plurality of segments, based at least in part on a prevalence associated with each of the plurality of known objects.
- FIG. 1 illustrates a network architecture, in accordance with one embodiment.
- FIG. 2 shows a representative hardware environment that may be associated with the servers and/or clients of FIG. 1 , in accordance with one embodiment.
- FIG. 3 shows a method for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database, in accordance with one embodiment.
- FIG. 4 shows a method for reducing a number of signature lookups required by a system, in accordance with one embodiment.
- FIG. 5 shows a system for reducing a number of signature lookups required by a system and for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database, in accordance with one embodiment.
- FIG. 1 illustrates a network architecture 100 , in accordance with one embodiment.
- a plurality of networks 102 is provided.
- the networks 102 may each take any form including, but not limited to a local area network (LAN), a wireless network, a wide area network (WAN) such as the Internet, peer-to-peer network, etc.
- servers 104 which are capable of communicating over the networks 102 .
- clients 106 are also coupled to the networks 102 and the servers 104 .
- Such servers 104 and/or clients 106 may each include a desktop computer, lap-top computer, hand-held computer, mobile phone, personal digital assistant (PDA), peripheral (e.g. printer, etc.), any component of a computer, and/or any other type of logic.
- at least one gateway 108 is optionally coupled therebetween.
- FIG. 2 shows a representative hardware environment that may be associated with the servers 104 and/or clients 106 of FIG. 1 , in accordance with one embodiment.
- Such figure illustrates a typical hardware configuration of a workstation in accordance with one embodiment having a central processing unit 210 , such as a microprocessor, and a number of other units interconnected via a system bus 212 .
- the workstation shown in FIG. 2 includes a Random Access Memory (RAM) 214 , Read Only Memory (ROM) 216 , an I/O adapter 218 for connecting peripheral devices such as disk storage units 220 to the bus 212 , a user interface adapter 222 for connecting a keyboard 224 , a mouse 226 , a speaker 228 , a microphone 232 , and/or other user interface devices such as a touch screen (not shown) to the bus 212 , communication adapter 234 for connecting the workstation to a communication network 235 (e.g., a data processing network) and a display adapter 236 for connecting the bus 212 to a display device 238 .
- the workstation may have resident thereon any desired operating system. It will be appreciated that an embodiment may also be implemented on platforms and operating systems other than those mentioned.
- One embodiment may be written using JAVA, C, and/or C++ language, or other programming languages, along with an object oriented programming methodology.
- Object oriented programming (OOP) has become increasingly used to develop complex applications.
- FIG. 3 shows a method 300 for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database, in accordance with one embodiment.
- the method 300 may be implemented in the context of the architecture and environment of FIGS. 1 and/or 2 . Of course, however, the method 300 may be carried out in any desired environment.
- a database including a plurality of known objects is identified. See operation 302 .
- the objects may include any item capable of being stored in the database.
- the known objects may include files or programs.
- the known objects may include objects that are known to be non-malicious.
- the database may include a whitelist database.
- a whitelist refers to any data structure that identifies one or more objects that are known to be non-malicious objects.
- the database may include a blacklist database.
- a blacklist refers to any data structure that identifies one or more objects that are known to be malicious, unsafe, or undesirable objects.
- the known objects may include objects that are known to be malicious.
- the objects may include whitelisted objects.
- the whitelisted objects may be defined utilizing a Bloom filter.
- the Bloom filter may be utilized as a whitelist to offset a high false positive rate.
- the database is segmented into a plurality of segments. See operation 304 .
- the database may be segmented into any number of segments.
- each of the plurality of known objects is assigned to one of the plurality of segments, based at least in part on a prevalence associated with each of the plurality of known objects. See operation 306 .
- the prevalence may be indicative of the extent to which the object is accessed and/or utilized.
- the prevalence may include a high prevalence or a low prevalence.
- a high prevalence may indicate that an object is accessed and/or utilized regularly, or more than a predetermined amount.
- a low prevalence may indicate that an object is not accessed and/or utilized frequently, or less than a predetermined amount.
- the prevalence information may be obtained utilizing client system based antivirus software.
- the segments may then be allocated such that at least one of the segments corresponds to low prevalence objects. Additionally, at least one of the segments may correspond to high prevalence objects.
- the method 300 may further include determining whether to perform a lookup operation on the database.
- a Bloom filter may be utilized to determine whether to perform the lookup operation on the database.
- the Bloom filter may be stored on a client system.
- the Bloom filter may also be associated with and/or represent a blacklist.
- a server system may be configured to update the Bloom filter stored on the client system.
- the updating may include pushing hashes stored as Bloom filter bit vectors to the client system.
- Bloom filter updates may be sent along with additional software updates.
- FIG. 4 shows a method 400 for reducing a number of signature lookups required by a system, in accordance with one embodiment.
- the method 400 may be implemented in the context of the architecture and environment of FIGS. 1-3 . Of course, however, the method 400 may be carried out in any desired environment. It should also be noted that the aforementioned definitions may apply during the present description.
- a signature lookup request is received. See operation 402 .
- the signature lookup request is then analyzed using a Bloom filter stored on a client device. See operation 404 .
- the Bloom filter may be stored on a client system.
- the client system may include a computer, a PDA, a mobile phone, or any other type of computing device.
- a server system may update the Bloom filter stored on the client system.
- the updating may include pushing hashes stored as Bloom filter bit vectors to the client system.
- Bloom filter updates may be sent along with additional software updates.
- the Bloom filter may be associated with a blacklist.
- the number of network based lookups in a network may be reduced.
- the number of network based lookups in a cloud is very high. This is because the network based lookups involve signature lookups across a network for each file scanned on a system (e.g. a client system, etc.).
- many problematic items (e.g. malware, etc.) are not examined and are missed on the client systems.
- an antivirus DAT (AV DAT) based scanning model may be implemented.
- the high number of specific signatures based on checksum or hash functions (e.g. Cyclic Redundancy Check, Message Digest Algorithm, etc.) in the DAT set may inflate the size of the DATs.
- large memory footprints of DATs may be a challenge on some low memory systems and on systems with slower connectivity to the Internet for downloading these files.
- the DAT releases may have to be very frequent to achieve the existing performance levels of real time lookups.
- a set membership data structure may be utilized, such as a Bloom filter, that provides large savings in space, potentially at the expense of false positives.
- the MD5 hashes may be stored as a Bloom filter bit vector and pushed very frequently to the client system. Since, in some cases, an MD5 lookup may only need to determine whether the MD5 for a given file is present in the bad file set, this lookup may be accomplished with the local Bloom filter.
- Bloom filters may be used to determine whether a lookup should occur. Where false positive rates are not an issue, Bloom filters that have high compression, and therefore a very small size, may be utilized. Additionally, in one embodiment, Bloom filter updates may be streamed between DAT releases to keep lookups near real time.
- FIG. 5 shows a system 500 for reducing a number of signature lookups required by a system and for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database, in accordance with one embodiment.
- the system 500 may be implemented in the context of the architecture and environment of FIGS. 1-4 .
- the system 500 may be implemented in the context of any desired environment.
- the aforementioned definitions may apply during the present description.
- the system 500 may include memory 502 .
- the memory 502 may be allocated to a database dedicated to a whitelist database 504 and/or a blacklist database 506 .
- the whitelist database 504 and the blacklist database 506 may be located on a client system 508 .
- the client system 508 may be in communication with a server system 510 over a network 512 .
- the whitelist database 504 may be segmented into segments that include high prevalence objects and low prevalence objects. This may be implemented to store a large whitelist of programs on a host in a memory efficient way.
- AV: Anti-Virus
- One issue with this approach is the increasing number of signatures a client side solution needs to carry every time an AV vendor analyzes and deems a file/network packet/behavior as malign. These are typically delivered to the client computers in the form of signature updates.
- This technology is generally more proactive in mitigating the threats, as anything new entering a system is deemed suspicious.
- This technique again calls for the client systems to carry the most recent updates of a whitelist database.
- a whitelist database would carry signatures of all the files that are known to be benign. However, with time and advancements in technology, the number of “good” files is also expected to increase. Thus, white listing may also see exponentially increasing updates.
- new proactive techniques may have higher than usual false positive rates.
- a technique may be implemented to store a large whitelist of programs on a host in a memory efficient way. Therefore, the ability to store a large whitelist on a host allows proactive techniques to be more aggressive against new and potentially unseen malware samples.
- a whitelist database may be segmented into several parts, including high prevalence parts and low prevalence parts. In one embodiment, this information may be collected through the client side AV software whenever programs are executed on the system.
- a first segment may contain signatures for files that are most prevalent (e.g. information on all Microsoft Office files, Adobe files, all system .dlls loaded by these applications, etc.).
- a second segment may contain signatures for the files that are not prevalent.
- the first segment may contain files released as part of operating systems and as part of widely used software applications. In general, this set of files would not change frequently. As an option, these files may be delivered to the client systems incrementally, at longer intervals of time, as a Bloom filter bit vector representing the MD5 values of all the whitelisted files. In this way, the overhead of delivering large signature files may be reduced.
- Bloom filters may be used as a whitelist to offset the high false positive rate that an aggressive proactive test may introduce. For example, if a data mining technique detects a 90% true positive rate (TPR) at a 1% false positive rate (FPR), a Bloom filter may be used to make sure that the detections do not include known good applications. In this way, in the worst case, the heuristic would merely be ineffective for a file on which the Bloom filter returns a false positive.
- the blacklist 506 may be represented by a set membership data structure, such as a Bloom filter, that provides large savings in space.
- the MD5 hashes may be stored as a Bloom filter bit vector and pushed very frequently to the client system 508 by the server system 510 over the network 512. Since, in some cases, an MD5 lookup may only need to determine whether the MD5 for a given file is present in the bad file set, this lookup may be accomplished with the local Bloom filter.
- the Bloom filter may be used to determine if a lookup should occur.
- Bloom filters that have high compression and a small size may be utilized.
- Bloom filter updates may be streamed between DAT releases provided by the server system 510 .
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Transfer Between Computers (AREA)
Abstract
A system, method, and computer program product are provided for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database. In operation, a database including a plurality of known objects is identified. Additionally, the database is segmented into a plurality of segments. Furthermore, each of the plurality of known objects is assigned to one of the plurality of segments, based at least in part on a prevalence associated with each of the plurality of known objects.
Description
- The present invention relates to networks, and more particularly to increasing the efficiency of networks given the growing use of cloud-based technologies.
- In the context of network security, the threat landscape has grown exponentially over the last few years. The threat landscape has grown so much that most Anti-Virus vendors are evaluating and implementing various technologies to mitigate the unmatched growth in the number of threats. As the threat landscape grows, so does the need to mitigate the threats associated with that growth.
- Currently, the number of network based lookups required in a network cloud is very high. These network based lookups include performing signature lookups across a network for each file scanned on a system (e.g. a client computer, etc.). Thus, as the number of threats increases, the number of lookups required to ensure the network is secure also increases.
- In some cases, however, it may be desirable to keep the lookup rate to less than a certain number of lookups per day. For example, it may be desirable to keep the lookup rate to less than ten lookups per day per client. Thus, harsh criteria are often used to keep the lookup rates low. As a result, many problematic items (e.g. malware, infected files, etc.) are not examined and such items are missed on the client systems. There is thus a need for overcoming these and/or other issues associated with the prior art.
- A system, method, and computer program product are provided for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database. In operation, a database including a plurality of known objects is identified. Additionally, the database is segmented into a plurality of segments. Furthermore, each of the plurality of known objects is assigned to one of the plurality of segments, based at least in part on a prevalence associated with each of the plurality of known objects.
- FIG. 1 illustrates a network architecture, in accordance with one embodiment.
- FIG. 2 shows a representative hardware environment that may be associated with the servers and/or clients of FIG. 1, in accordance with one embodiment.
- FIG. 3 shows a method for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database, in accordance with one embodiment.
- FIG. 4 shows a method for reducing a number of signature lookups required by a system, in accordance with one embodiment.
- FIG. 5 shows a system for reducing a number of signature lookups required by a system and for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database, in accordance with one embodiment.
- FIG. 1 illustrates a network architecture 100, in accordance with one embodiment. As shown, a plurality of networks 102 is provided. In the context of the present network architecture 100, the networks 102 may each take any form including, but not limited to, a local area network (LAN), a wireless network, a wide area network (WAN) such as the Internet, a peer-to-peer network, etc.
- Coupled to the networks 102 are servers 104 which are capable of communicating over the networks 102. Also coupled to the networks 102 and the servers 104 is a plurality of clients 106. Such servers 104 and/or clients 106 may each include a desktop computer, lap-top computer, hand-held computer, mobile phone, personal digital assistant (PDA), peripheral (e.g. printer, etc.), any component of a computer, and/or any other type of logic. In order to facilitate communication among the networks 102, at least one gateway 108 is optionally coupled therebetween.
- FIG. 2 shows a representative hardware environment that may be associated with the servers 104 and/or clients 106 of FIG. 1, in accordance with one embodiment. Such figure illustrates a typical hardware configuration of a workstation in accordance with one embodiment having a central processing unit 210, such as a microprocessor, and a number of other units interconnected via a system bus 212.
- The workstation shown in FIG. 2 includes a Random Access Memory (RAM) 214, Read Only Memory (ROM) 216, an I/O adapter 218 for connecting peripheral devices such as disk storage units 220 to the bus 212, a user interface adapter 222 for connecting a keyboard 224, a mouse 226, a speaker 228, a microphone 232, and/or other user interface devices such as a touch screen (not shown) to the bus 212, a communication adapter 234 for connecting the workstation to a communication network 235 (e.g., a data processing network) and a display adapter 236 for connecting the bus 212 to a display device 238.
- The workstation may have resident thereon any desired operating system. It will be appreciated that an embodiment may also be implemented on platforms and operating systems other than those mentioned. One embodiment may be written using JAVA, C, and/or C++ language, or other programming languages, along with an object oriented programming methodology. Object oriented programming (OOP) has become increasingly used to develop complex applications.
- Of course, the various embodiments set forth herein may be implemented utilizing hardware, software, or any desired combination thereof. For that matter, any type of logic may be utilized which is capable of implementing the various functionality set forth herein.
- FIG. 3 shows a method 300 for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database, in accordance with one embodiment. As an option, the method 300 may be implemented in the context of the architecture and environment of FIGS. 1 and/or 2. Of course, however, the method 300 may be carried out in any desired environment.
- As shown, a database including a plurality of known objects is identified. See operation 302. The objects may include any item capable of being stored in the database. For example, in various embodiments, the known objects may include files or programs.
- In either case, the known objects may include objects that are known to be non-malicious. In this case, the database may include a whitelist database. In the context of the present description, a whitelist refers to any data structure that identifies one or more objects that are known to be non-malicious objects.
- As another option, the database may include a blacklist database. In the context of the present description, a blacklist refers to any data structure that identifies one or more objects that are known to be malicious, unsafe, or undesirable objects. In this case, the known objects may include objects that are known to be malicious.
- In one embodiment, the objects may include whitelisted objects. As an option, the whitelisted objects may be defined utilizing a Bloom filter. In this case, the Bloom filter may be utilized as a whitelist to offset a high false positive rate.
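As a minimal sketch of how a whitelist might offset a high false positive rate, the Python function below clears any heuristically flagged object whose MD5 appears on the whitelist. The function name `final_verdict` and the string verdicts are hypothetical. The `whitelist` parameter accepts any container supporting `in`, so a Bloom filter that implements `__contains__` could be passed; the demo uses a plain `set` as a stand-in, which is an assumption.

```python
from typing import Container


def final_verdict(md5_hex: str, heuristic_flagged: bool,
                  whitelist: Container[str]) -> str:
    """Combine an aggressive heuristic with a whitelist (hypothetical helper)."""
    if not heuristic_flagged:
        return "clean"
    # A flagged object that is whitelisted is treated as a heuristic
    # false positive and cleared.
    if md5_hex in whitelist:
        return "clean"
    return "suspicious"


# Illustrative hash only (the MD5 of an empty input), used as a stand-in entry.
whitelist = {"d41d8cd98f00b204e9800998ecf8427e"}
print(final_verdict("d41d8cd98f00b204e9800998ecf8427e", True, whitelist))
print(final_verdict("0123456789abcdef0123456789abcdef", True, whitelist))
```

If the whitelist is a Bloom filter, a filter false positive clears a flagged file that is not actually whitelisted, so the heuristic is ineffective for that file; the whitelist can never add a new detection.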
- As shown further in FIG. 3, the database is segmented into a plurality of segments. See operation 304. The database may be segmented into any number of segments.
- Furthermore, each of the plurality of known objects is assigned to one of the plurality of segments, based at least in part on a prevalence associated with each of the plurality of known objects. See operation 306. The prevalence may be indicative of the extent to which the object is accessed and/or utilized.
- For example, the prevalence may include a high prevalence or a low prevalence. In this case, a high prevalence may indicate that an object is accessed and/or utilized regularly, or more than a predetermined amount. A low prevalence may indicate that an object is not accessed and/or utilized frequently, or less than a predetermined amount. In one embodiment, the prevalence information may be obtained utilizing client system based antivirus software.
- The segments may then be allocated such that at least one of the segments corresponds to low prevalence objects. Additionally, at least one of the segments may correspond to high prevalence objects.
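Operations 304 and 306 could be sketched as follows, assuming prevalence is simply the number of execution reports collected from client AV software for each object, and that a single hypothetical threshold separates high from low prevalence. The function name, the two-segment layout, the threshold value, and the file names (standing in for MD5 hashes) are illustrative assumptions, not taken from the source.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def segment_by_prevalence(known_objects: Iterable[str],
                          execution_reports: Iterable[str],
                          threshold: int) -> Dict[str, Set[str]]:
    """Assign each known object to a 'high' or 'low' prevalence segment.

    execution_reports holds one entry per observed execution, as might be
    collected by client-side AV software (an assumption for this sketch).
    """
    prevalence = Counter(execution_reports)
    segments: Dict[str, Set[str]] = {"high": set(), "low": set()}
    for obj in known_objects:
        # Objects never reported by any client count as 0 and land in "low".
        segment = "high" if prevalence[obj] >= threshold else "low"
        segments[segment].add(obj)
    return segments


reports = ["kernel32.dll"] * 50 + ["winword.exe"] * 20 + ["rare_tool.exe"] * 2
print(segment_by_prevalence(
    ["kernel32.dll", "winword.exe", "rare_tool.exe", "never_seen.exe"],
    reports, threshold=10))
```

The source leaves the number of segments open, so a deployment might use more than two segments or a percentile cut instead of a fixed count.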
- In one embodiment, the method 300 may further include determining whether to perform a lookup operation on the database. As an option, a Bloom filter may be utilized to determine whether to perform the lookup operation on the database. In this case, the Bloom filter may be stored on a client system. The Bloom filter may also be associated with and/or represent a blacklist.
- Furthermore, a server system may be configured to update the Bloom filter stored on the client system. In this case, the updating may include pushing hashes stored as Bloom filter bit vectors to the client system. As an option, Bloom filter updates may be sent along with additional software updates.
- More illustrative information will now be set forth regarding various optional architectures and features with which the foregoing technique may or may not be implemented, per the desires of the user. It should be strongly noted that the following information is set forth for illustrative purposes and should not be construed as limiting in any manner. Any of the following features may be optionally incorporated with or without the exclusion of other features described.
- FIG. 4 shows a method 400 for reducing a number of signature lookups required by a system, in accordance with one embodiment. As an option, the method 400 may be implemented in the context of the architecture and environment of FIGS. 1-3. Of course, however, the method 400 may be carried out in any desired environment. It should also be noted that the aforementioned definitions may apply during the present description.
- As shown, a signature lookup request is received. See operation 402. The signature lookup request is then analyzed using a Bloom filter stored on a client device. See operation 404.
- It is then determined whether to perform a lookup based on the analysis. See operation 406. If it is determined that a lookup is to be performed, the lookup is performed. See operation 408.
- Thus, it may be determined whether to perform a lookup operation on a database utilizing a Bloom filter. In this case, the Bloom filter may be stored on a client system. The client system may include a computer, a PDA, a mobile phone, or any other type of computing device.
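Operations 402 through 408 amount to a guard placed in front of the network lookup. The sketch below assumes the local filter supports Python's `in` operator (a `set` stands in for the Bloom filter in the usage example) and that `remote_lookup` is a hypothetical callable performing the server-side signature lookup; neither name comes from the source.

```python
from typing import Callable, Container, List


def handle_signature_lookup(md5_hex: str,
                            local_filter: Container[str],
                            remote_lookup: Callable[[str], bool]) -> bool:
    """Sketch of operations 402-408; returns True if the file is confirmed bad."""
    # Operation 404: consult the locally stored Bloom filter.
    if md5_hex not in local_filter:
        # Bloom filters have no false negatives, so a miss means the hash was
        # not on the blacklist as of the last filter update: skip the network.
        return False
    # Operations 406/408: a possible hit (which may be a false positive)
    # is confirmed with a network lookup.
    return remote_lookup(md5_hex)


network_calls: List[str] = []


def fake_remote_lookup(md5_hex: str) -> bool:
    # Hypothetical server: only "bad" is truly blacklisted; "fp" simulates
    # a hash that the local filter wrongly reports as present.
    network_calls.append(md5_hex)
    return md5_hex == "bad"


local = {"bad", "fp"}
for h in ("clean", "bad", "fp"):
    print(h, handle_signature_lookup(h, local, fake_remote_lookup))
print("network lookups:", network_calls)
```

Only the two filter hits reach the network; the clean file is resolved locally.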
- Furthermore, a server system may update the Bloom filter stored on the client system. In this case, the updating may include pushing hashes stored as Bloom filter bit vectors to the client system. As an option, Bloom filter updates may be sent along with additional software updates. Furthermore, the Bloom filter may be associated with a blacklist.
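Pushing hashes as bit vectors suggests a simple client-side merge: if the server builds each incremental update with the same bit-vector size and the same hash functions as the client's filter, the client can OR the pushed vector into its copy, because OR-ing two such Bloom filters yields the filter of the union of their entries. This incremental-OR scheme and the function name `apply_filter_update` are assumptions for illustration; the source only states that bit vectors are pushed.

```python
def apply_filter_update(local_bits: bytearray, pushed_bits: bytes) -> None:
    """Merge a pushed Bloom filter bit vector into the client's copy in place.

    Assumes both vectors were built with the same size m, the same k and the
    same hash functions, so OR-ing them represents the union of both sets.
    """
    if len(local_bits) != len(pushed_bits):
        raise ValueError("bit vectors must have the same length")
    for i, byte in enumerate(pushed_bits):
        local_bits[i] |= byte


client_filter = bytearray(b"\x01\x00")
apply_filter_update(client_filter, bytes([0x02, 0x80]))
print(client_filter.hex())  # "0380"
```

A standard Bloom filter cannot remove entries this way, so a hash dropped from the blacklist would require pushing a rebuilt full vector.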
- Using the method 400, the number of network based lookups in a network may be reduced. For example, in many systems, the number of network based lookups in a cloud is very high. This is because the network based lookups involve signature lookups across a network for each file scanned on a system (e.g. a client system, etc.).
- In some cases, however, it may be desirable to keep the lookup rate to less than a certain number of lookups per day. For example, it may be desirable to keep the lookup rate to less than ten lookups per day per client. To accomplish this, harsh criteria are often used to keep the lookup rates low. As a result, many problematic items (e.g. malware, etc.) are not examined and such items are missed on the client systems.
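One rough way to relate such a lookup budget to a local Bloom filter: if a client scans N files per day, b of which really are blacklisted, and the filter has false positive rate p, the expected number of network lookups is about b + p(N - b). The sketch below computes that, plus the largest p that fits a given budget. The function names and the scan volume in the example are hypothetical, not data from the source.

```python
def expected_lookups(files_scanned: int, blacklisted: int, fpr: float) -> float:
    """Expected network lookups: every true hit plus the filter's false positives."""
    clean = files_scanned - blacklisted
    return blacklisted + fpr * clean


def max_fpr_for_budget(files_scanned: int, blacklisted: int,
                       budget: float) -> float:
    """Largest filter false positive rate that keeps expected lookups within budget."""
    if budget < blacklisted:
        raise ValueError("budget is smaller than the number of true hits")
    clean = files_scanned - blacklisted
    if clean == 0:
        return 1.0  # no clean files, so false positives cannot occur
    return (budget - blacklisted) / clean


# Hypothetical client scanning 10,000 clean files per day.
print(expected_lookups(10_000, 0, 0.01))
print(max_fpr_for_budget(10_000, 0, 10))
```

Under these assumed numbers, a 1% filter would still trigger about 100 lookups a day, while a ten-per-day budget calls for p of about 0.1%.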
- By performing a lookup of file signatures that are available locally on a client machine, better results may be achieved. In some cases, an antivirus DAT (AV DAT) based scanning model may be implemented. In these cases, the high number of specific signatures based on checksum or hash functions (e.g. Cyclic Redundancy Check, Message Digest Algorithm, etc.) in the DAT set may inflate the size of the DATs, making the DAT set computationally and economically infeasible. For example, large memory footprints of DATs may be a challenge on some low memory systems and on systems with slower connectivity to the Internet for downloading these files.
- Additionally, the DAT releases may have to be very frequent to achieve the existing performance levels of real time lookups. Thus, there is a need for a relatively smaller DAT size that is sufficient to determine if the MD5 being looked up is present in a blacklist database.
- Accordingly, in one embodiment, a set membership data structure, such as a Bloom filter, may be utilized that provides large savings in space, potentially at the expense of false positives. As an option, the MD5 hashes may be stored as a Bloom filter bit vector and pushed very frequently to the client system. Since, in some cases, an MD5 lookup may only need to determine whether the MD5 for a given file is present in the bad file set, this lookup may be accomplished with the local Bloom filter.
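The following is a minimal sketch of such a set membership structure: a Bloom filter whose bit vector is indexed directly from MD5 digests. The class name, the parameters `m_bits` and `k`, and the use of double hashing (deriving the k bit positions from the two 64-bit halves of the digest) are assumptions made for illustration; the source does not specify how bit positions are computed.

```python
import hashlib
import math
from typing import Iterator


class Md5BloomFilter:
    """Minimal Bloom filter keyed by 16-byte MD5 digests (illustrative sketch)."""

    def __init__(self, m_bits: int, k: int) -> None:
        self.m = m_bits
        self.k = k
        # The bit vector that a server could push to clients.
        self.bits = bytearray(math.ceil(m_bits / 8))

    def _positions(self, digest: bytes) -> Iterator[int]:
        # Double hashing: derive k bit positions from the digest's two halves.
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 to be nonzero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, digest: bytes) -> None:
        for pos in self._positions(digest):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, digest: bytes) -> bool:
        # True means "possibly present"; False means "definitely not added".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(digest))


blacklist = Md5BloomFilter(m_bits=1 << 16, k=7)
bad = hashlib.md5(b"example malicious file contents").digest()
blacklist.add(bad)
print(bad in blacklist)  # True: added items are always reported present
```

A query can return a false positive but never a false negative for a hash that was added, which is why a hit is still confirmed with a network lookup.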
- Because a hit still needs to be confirmed by a lookup, Bloom filters may be used to determine whether a lookup should occur. Where false positive rates are not an issue, Bloom filters that have high compression, and therefore a very small size, may be utilized. Additionally, in one embodiment, Bloom filter updates may be streamed between DAT releases to keep lookups near real time.
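The compression versus false positive trade-off can be quantified with the standard Bloom filter sizing formulas: for a target false positive rate p, the optimal size is about -ln(p)/(ln 2)^2 bits per stored item, with k of about (bits per item) × ln 2 hash functions. The sketch compares that against storing each raw 128-bit MD5 hash; the chosen p values are illustrative.

```python
import math

MD5_BITS = 128  # size of one raw MD5 hash


def bloom_bits_per_item(p: float) -> float:
    """Optimal Bloom filter bits per stored item for false positive rate p."""
    return -math.log(p) / (math.log(2) ** 2)


def optimal_k(bits_per_item: float) -> int:
    """Optimal number of hash functions, rounded to an integer (at least 1)."""
    return max(1, round(bits_per_item * math.log(2)))


for p in (0.1, 0.01, 0.001):
    b = bloom_bits_per_item(p)
    print(f"p={p}: {b:.1f} bits/item, k={optimal_k(b)}, "
          f"{MD5_BITS / b:.1f}x smaller than a raw MD5 list")
```

At p = 1% this works out to roughly 9.6 bits per item with 7 hash functions, about 13 times smaller than a list of raw MD5 values (ignoring container overhead); accepting a higher p shrinks the filter further.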
- FIG. 5 shows a system 500 for reducing a number of signature lookups required by a system and for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database, in accordance with one embodiment. As an option, the system 500 may be implemented in the context of the architecture and environment of FIGS. 1-4. Of course, however, the system 500 may be implemented in the context of any desired environment. Once again, the aforementioned definitions may apply during the present description.
- As shown, the system 500 may include memory 502. The memory 502 may be allocated to a whitelist database 504 and/or a blacklist database 506. The whitelist database 504 and the blacklist database 506 may be located on a client system 508. The client system 508 may be in communication with a server system 510 over a network 512.
- As shown, the whitelist database 504 may be segmented into segments that include high prevalence objects and low prevalence objects. This may be implemented to store a large whitelist of programs on a host in a memory efficient way.
- For example, in the context of network security, the threat landscape has grown exponentially and continues to grow. The threats have grown so much that almost all Anti-Virus (AV) vendors are currently evaluating and implementing various technologies to mitigate the unmatched growth in the number of threats. Behavioral detection, automated signature creation, heuristic detections, black listing packers, etc. are a few of the recent innovations that most AV vendors implement.
- Many of these innovations use the "blacklisting" approach, in which a threat is detected and mitigated if it is known to be malicious. These threats may include a file, a network packet, a particular behavior, etc.
- Blacklisting generally only promises to detect a threat if the threat has previously been analyzed by the AV provider and deemed malign. One issue with this approach is the ever-increasing number of signatures a client-side solution needs to carry, since the set grows every time an AV vendor analyzes a file, network packet, or behavior and deems it malign. These signatures are typically delivered to client computers in the form of signature updates.
- The exponential growth in the threat landscape has also resulted in exponential growth in the number of signatures carried by these AV solutions. As another approach, a whitelisting technique may be implemented to keep a system free from threats. Such systems are generally based on the premise that anything not known to be benign could be malicious.
- This technology is generally more proactive in mitigating threats, as anything new entering a system is deemed suspicious. This technique, again, requires client systems to carry the most recent updates of a whitelist database, which would carry signatures of all the files that are known to be benign. However, as technology advances, the number of "good" files is also expected to increase. Thus, whitelisting may also see exponentially increasing updates.
- Also, new proactive techniques may have higher-than-usual false positive rates. To mitigate this, a technique may be implemented to store a large whitelist of programs on a host in a memory-efficient way. The ability to store a large whitelist on a host therefore allows proactive techniques to be more aggressive against new and potentially unseen malware samples.
- In the case of good files, that is, files that are benign in nature, a whitelist database may be segmented into several parts, including high prevalence parts and low prevalence parts. In one embodiment, the prevalence information may be collected through the client-side AV software whenever programs are executed on the system. A first segment may contain signatures for the files that are most prevalent (e.g. information on all Microsoft Office files, Adobe files, all system .dll files loaded by these applications, etc.). A second segment may contain signatures for the files that are not prevalent.
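A minimal sketch of the prevalence-based split follows. It assumes prevalence is reported as a count of client systems on which a file (identified by its MD5 hex string) was observed executing, and it assumes a single hypothetical cutoff `threshold`; the source does not specify how prevalence is measured or where the boundary between segments lies.

```python
from typing import Dict, List, Tuple


def segment_by_prevalence(
    prevalence: Dict[str, int], threshold: int
) -> Tuple[List[str], List[str]]:
    """Split whitelisted MD5 hex strings into (high, low) prevalence segments.

    `prevalence` maps an MD5 to the number of client systems that reported it.
    """
    high = sorted(h for h, count in prevalence.items() if count >= threshold)
    low = sorted(h for h, count in prevalence.items() if count < threshold)
    return high, low


high, low = segment_by_prevalence({"aa": 120000, "bb": 3, "cc": 45000}, threshold=10000)
print(high, low)  # ['aa', 'cc'] ['bb']
```

The high segment changes rarely and can be shipped as a compact, infrequently updated bit vector, while the low segment can be handled separately.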
- By the nature of computing system design, the first segment may contain files released as part of operating systems and as part of widely used software applications. In general, this set of files would not change frequently. As an option, these files may be delivered to the client systems incrementally, at longer intervals, as a Bloom filter bit vector representing the MD5 values of all the whitelisted files. In this way, the overhead of delivering large signature files may be reduced.
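To give a rough sense of the space savings, the standard sizing formulas for a Bloom filter holding n entries at a target false positive rate p are m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash positions. The sketch below applies them; the figures of one million MD5 entries and a 1% false positive rate are illustrative assumptions, not values from the source.

```python
import math
from typing import Tuple


def bloom_parameters(n_items: int, fp_rate: float) -> Tuple[int, int]:
    """Optimal bit count m and hash count k for n items at false positive rate p."""
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round((m / n_items) * math.log(2)))
    return m, k


n = 1_000_000
m, k = bloom_parameters(n, 0.01)
raw_bytes = n * 16  # each MD5 digest is 128 bits = 16 bytes
print(m // 8, k, raw_bytes)  # 1198132 7 16000000 (about 1.2 MB of bits vs 16 MB of digests)
```

Lowering the target false positive rate grows m only logarithmically, which is why a filter can be much smaller than the raw hash list it summarizes.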
- In one embodiment, Bloom filters may be used as a whitelist to offset the high false positive rate an aggressive proactive test may introduce. For example, if a data mining technique detects malware at a 90% true positive rate (TPR) with a 1% false positive rate (FPR), a Bloom filter may be used to make sure that the files flagged by the technique do not include known good applications. In this way, the worst case is that a Bloom filter false positive causes a flagged file to be treated as whitelisted, so the heuristic is merely ineffective for that file rather than raising a false alarm.
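The whitelist veto described above might look like the following sketch. It assumes the heuristic hands over the raw contents of each flagged file and that `whitelist_might_contain` is a local Bloom filter query over MD5 digests of known good files; both are illustrative assumptions rather than details from the source.

```python
import hashlib
from typing import Callable, Iterable, List


def filter_heuristic_detections(
    flagged_files: Iterable[bytes],
    whitelist_might_contain: Callable[[bytes], bool],
) -> List[bytes]:
    """Drop heuristic detections whose MD5 appears in the whitelist filter."""
    kept = []
    for content in flagged_files:
        digest = hashlib.md5(content).digest()
        # A whitelist hit suppresses the alert. If that hit is a Bloom filter
        # false positive, the worst case is a missed detection, not a false alarm.
        if not whitelist_might_contain(digest):
            kept.append(content)
    return kept
```

For example, with a whitelist containing the digest of `b"office.exe"`, the call `filter_heuristic_detections([b"office.exe", b"dropper.bin"], good.__contains__)` keeps only `b"dropper.bin"`.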
- Furthermore, with respect to the system 500, the blacklist database 506 may be represented by a set membership data structure, such as a Bloom filter, that provides large savings in space. The MD5 hashes may be stored as a Bloom filter bit vector and pushed very frequently to the client system 508 by the server system 510 over the network 512. Since, in some cases, an MD5 lookup may only need to determine whether the MD5 for a given file is present in the bad file set, this lookup may be accomplished with the local Bloom filter.
- In this way, the Bloom filter may be used to determine whether a lookup should occur. As an option, Bloom filters with high compression and a small size may be utilized. In one embodiment, Bloom filter updates may be streamed between DAT releases provided by the server system 510.
- While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
Claims (21)
1-20. (canceled)
21. A method, comprising:
accessing a database that is provided for a client system; and
assigning a plurality of known objects to one of a plurality of segments based, at least in part, on prevalence information associated with each of the plurality of known objects, wherein the database includes whitelisted objects, which are non-malicious objects and which are defined utilizing a Bloom filter of the client system, and wherein the Bloom filter is further configured to determine whether to perform a lookup operation on the database.
22. The method of claim 21, wherein the prevalence information is obtained using antivirus software of the client system.
23. The method of claim 21, wherein the database includes a whitelist database.
24. The method of claim 21, wherein an antivirus DAT model is implemented in conjunction with the Bloom filter and the database.
25. The method of claim 21, wherein the Bloom filter is implemented in conjunction with MD5 hashes, which are communicated to the client system.
26. The method of claim 21, wherein Bloom filter updates are streamed to the client system between DAT releases.
27. The method of claim 21, wherein the Bloom filter is utilized as a whitelist to offset a false positive rate.
28. The method of claim 21, wherein a server updates the Bloom filter for the client system.
29. The method of claim 21, wherein updating the Bloom filter includes communicating hashes stored as Bloom filter bit vectors to the client system.
30. The method of claim 21, wherein Bloom filter updates are sent along as part of additional software updates for the client system.
31. The method of claim 21, wherein the Bloom filter is utilized as a blacklist.
32. The method of claim 21, further comprising:
receiving a signature lookup request; and
analyzing the signature lookup request using the Bloom filter.
33. The method of claim 21, wherein the client system is a selected one of a group of client devices, the group consisting of:
a) a computer;
b) a personal digital assistant (PDA); and
c) a mobile phone.
34. The method of claim 21, wherein Bloom filter bit vectors, representing MD5 values of whitelist files, are provided to the client system.
35. A method, comprising:
receiving a signature lookup request;
analyzing the signature lookup request using a Bloom filter provided for a client system; and
performing a lookup in a database based on the signature lookup request, wherein the database comprises a blacklist database and a whitelist database, which is segmented into high prevalence objects and low prevalence objects, wherein the high prevalence objects reflect objects that are accessed more frequently than objects included in the low prevalence objects.
36. A client system, comprising:
a processor; and
a memory coupled to the processor, wherein the client system is configured to:
access a database that is provided for the client system; and
assign a plurality of known objects to one of a plurality of segments based, at least in part, on prevalence information associated with each of the plurality of known objects, wherein the database includes whitelisted objects, which are non-malicious objects and which are defined utilizing a Bloom filter of the client system, and wherein the Bloom filter is further configured to determine whether to perform a lookup operation on the database.
37. The client system of claim 36, wherein an antivirus DAT model is implemented in conjunction with the Bloom filter and the database.
38. The client system of claim 36, wherein the Bloom filter is implemented in conjunction with MD5 hashes, which are communicated to the client system, and wherein Bloom filter updates are streamed to the client system between DAT releases.
39. The client system of claim 36, wherein the client system is further configured to:
receive a signature lookup request; and
analyze the signature lookup request using the Bloom filter.
40. The client system of claim 36, wherein Bloom filter bit vectors, representing MD5 values of whitelist files, are provided to the client system.
Priority Applications (1)
Application Number | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|
US13/645,164 | US20130031111A1 | 2009-10-26 | 2012-10-04 | System, method, and computer program product for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database |
Applications Claiming Priority (2)
Application Number | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|
US12/605,678 | US8306988B1 | 2009-10-26 | 2009-10-26 | System, method, and computer program product for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database |
US13/645,164 | US20130031111A1 | 2009-10-26 | 2012-10-04 | System, method, and computer program product for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database |
Related Parent Applications (1)
Application Number | Relation | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
US12/605,678 | Continuation | US8306988B1 | 2009-10-26 | 2009-10-26 | System, method, and computer program product for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database |
Publications (1)
Publication Number | Publication Date |
---|---|
US20130031111A1 | 2013-01-31 |
Family ID: 47075547
Family Applications (2)
Application Number | Status | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
US12/605,678 | Active 2030-12-15 | US8306988B1 | 2009-10-26 | 2009-10-26 | System, method, and computer program product for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database |
US13/645,164 | Abandoned | US20130031111A1 | 2009-10-26 | 2012-10-04 | System, method, and computer program product for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database |
Family Applications Before (1)
Application Number | Status | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
US12/605,678 | Active 2030-12-15 | US8306988B1 | 2009-10-26 | 2009-10-26 | System, method, and computer program product for segmenting a database based, at least in part, on a prevalence associated with known objects included in the database |
Country Status (1)
Country | Link |
---|---|
US (2) | US8306988B1 |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120144189A1 (en) * | 2009-08-11 | 2012-06-07 | Zhong Zhen | Wlan authentication method, wlan authentication server, and terminal |
US8539583B2 (en) | 2009-11-03 | 2013-09-17 | Mcafee, Inc. | Rollback feature |
US20130246423A1 (en) * | 2011-01-24 | 2013-09-19 | Rishi Bhargava | System and method for selectively grouping and managing program files |
US8843496B2 (en) | 2010-09-12 | 2014-09-23 | Mcafee, Inc. | System and method for clustering host inventories |
WO2019237362A1 (en) | 2018-06-15 | 2019-12-19 | Nokia Technologies Oy | Privacy-preserving content classification |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9262624B2 (en) * | 2011-09-16 | 2016-02-16 | Mcafee, Inc. | Device-tailored whitelists |
US10291633B1 (en) | 2016-10-18 | 2019-05-14 | The United States Of America As Represented By The Secretary Of The Army | Bandwidth conserving signature deployment with signature set and network security |
CN115827702B (en) * | 2023-01-13 | 2023-05-16 | 中国人民解放军61660部队 | Software white list query method based on bloom filter |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050257266A1 (en) * | 2003-06-11 | 2005-11-17 | Cook Randall R | Intrustion protection system utilizing layers and triggers |
US20090293125A1 (en) * | 2008-05-21 | 2009-11-26 | Symantec Corporation | Centralized Scanner Database With Qptimal Definition Distribution Using Network Queries |
US20100083376A1 (en) * | 2008-09-26 | 2010-04-01 | Symantec Corporation | Method and apparatus for reducing false positive detection of malware |
US8375450B1 (en) * | 2009-10-05 | 2013-02-12 | Trend Micro, Inc. | Zero day malware scanner |
- 2009-10-26: US12/605,678 (US8306988B1), status: Active
- 2012-10-04: US13/645,164 (US20130031111A1), status: Abandoned
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8589675B2 (en) * | 2009-08-11 | 2013-11-19 | Huawei Device Co., Ltd. | WLAN authentication method by a subscriber identifier sent by a WLAN terminal |
US20120144189A1 (en) * | 2009-08-11 | 2012-06-07 | Zhong Zhen | Wlan authentication method, wlan authentication server, and terminal |
US9852296B2 (en) | 2009-11-03 | 2017-12-26 | Mcafee, Llc | Rollback feature |
US9032523B2 (en) | 2009-11-03 | 2015-05-12 | Mcafee, Inc. | Rollback feature |
US9607150B2 (en) | 2009-11-03 | 2017-03-28 | Mcafee, Inc. | Rollback feature |
US9703958B2 (en) | 2009-11-03 | 2017-07-11 | Mcafee, Inc. | Rollback feature |
US8539583B2 (en) | 2009-11-03 | 2013-09-17 | Mcafee, Inc. | Rollback feature |
US8843496B2 (en) | 2010-09-12 | 2014-09-23 | Mcafee, Inc. | System and method for clustering host inventories |
US20130246423A1 (en) * | 2011-01-24 | 2013-09-19 | Rishi Bhargava | System and method for selectively grouping and managing program files |
US9075993B2 (en) * | 2011-01-24 | 2015-07-07 | Mcafee, Inc. | System and method for selectively grouping and managing program files |
WO2019237362A1 (en) | 2018-06-15 | 2019-12-19 | Nokia Technologies Oy | Privacy-preserving content classification |
US20210256126A1 (en) * | 2018-06-15 | 2021-08-19 | Nokia Technologies Oy | Privacy-preserving content classification |
EP3807798A4 (en) * | 2018-06-15 | 2022-01-26 | Nokia Technologies OY | Privacy-preserving content classification |
Also Published As
Publication number | Publication date |
---|---|
US8306988B1 (en) | 2012-11-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |