CA2935130A1 - Encrypted data - computer virus, malware and ransom ware detection system - Google Patents


Info

Publication number
CA2935130A1
Authority
CA
Canada
Prior art keywords
encrypted
virus
scanning
malware
data
Prior art date
Legal status
Abandoned
Application number
CA2935130A
Other languages
French (fr)
Inventor
Mirza Kamaludeen
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual
Priority to CA2935130A
Publication of CA2935130A1
Abandoned (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements

Abstract

There is provided apparatus, systems and methods suitable for detecting malware in encrypted data. The term malware refers to any computer code or instructions that perform actions unauthorized or unwanted by the user, including computer viruses, worms, Trojans and exploits.
Potential applications include, but are not limited to, secure and remote storage systems such as email servers and cloud servers, and encrypted devices such as hard drives, USB flash drives and smart cards. Three approaches, with different properties, are provided for verifying standard hash signatures and generic signatures, such as code snippet patterns, using encrypted indices and homomorphic encryption. The entire scanning process is executed in the encrypted domain. One or more methods and systems are provided for scanning files for viruses and malware. Note that in our system the data to be scanned is encrypted under the data owner's private key; this is not to be confused with polymorphic viruses, which are encrypted by the malware writer or by the malware code itself. If a signature compatible with our system can be devised for a polymorphic virus, it can also be detected under the present invention.
This system allows encrypted files to be stored on cloud computing/storage servers, or to pass into a secure system, by providing assurance that the encrypted files are free of the viruses and malware whose signatures are available to the scanner. When coupled with the invention in "Auditing using Encrypted Indices", it may also assure the file owner that the resident files were not changed or tampered with.

One approach involves the data owner encrypting the data files and their indices in accordance with an encryption scheme and sending them to a storage repository, device, or cloud storage servers.
Another consists of storing data encrypted using homomorphic encryption, with an anti-virus as a service provider sending virus/malware signatures, encrypted under the data owner's public key, for scanning on the storage device/server.
This invention covers three solutions for performing virus scanning on stored data:
1) For a private encrypted data storage service/device, where the data owner possesses the anti-virus tools and database.
2) For a public encrypted data storage service/device, where the data owner requests the scanning service from an anti-virus provider, which controls the anti-virus tools and database.
3) For a public unencrypted data storage service/device, where the data owner requests the scanning service from the anti-virus provider, which controls the anti-virus tools and database.

Description

ENCRYPTED DATA - COMPUTER VIRUS, MALWARE AND RANSOM WARE DETECTION SYSTEM
DESCRIPTION OF INVENTION
Cloud computing has garnered much attention in recent years for its many advantages. In particular, outsourcing one's computing resources to a cloud service operator is often far more efficient and cost-effective than maintaining an in-house data center.
The storage of data on remote servers controlled by third parties, however, raises security and privacy concerns. As such, data stored in the cloud is often required to be encrypted. The challenge of working with encrypted data on remote servers has been the focus of much research in recent years.
Open questions include: How can we selectively access the encrypted information? How can we ensure that data integrity is maintained without keeping a copy of the data? How can we perform deduplication on encrypted data to reduce storage cost? In this invention, we consider the problem of malware detection on encrypted data. The issue is particularly relevant where organizations use cloud storage to archive data or to back up their systems. It is not unusual for companies to archive a significant amount of data in case the information is needed again in the future, and such data may not be accessed for long periods of time. While most malware is identified soon after release, some remains dormant for a long time before it is discovered.
Should the archive contain malware, retrieval could activate it. Similarly, restoring a system from a backup containing malware could also be problematic. In these scenarios, a methodology to detect malware in the encrypted archive would be invaluable. While it is theoretically possible to retrieve the entire archive and scan it locally, the communication cost would be prohibitive in practice. Furthermore, anti-virus companies often share information on newly discovered malware to help protect users against the latest threats. This shared information has traditionally been considered trustworthy. Recently, however, it has emerged that a reputed anti-virus company may have intentionally injected false positives in an attempt to harm the reputation of its competitors, resulting in a spike of legitimate files being identified as malware in the early 2000s. Such malicious behavior from a partner had not been considered in the past. One of the main reasons a malicious party was able to launch such an attack was its direct access to the competitors' anti-virus programs and malware databases. This access allows an attacker to reverse engineer the software, to learn how the identification algorithms work and to mislead them. Malware writers have likewise always been able to do the same, either to verify that their latest malware evades detection or to be alerted when their released malware has been identified. These are all good reasons why an anti-virus tool can be invaluable when offered as a cloud service, where the detection algorithms and virus database can be kept private and server-side updates are seamless and invisible to the public.
Modern malware identification techniques include signature-based detection, heuristics-based detection, behavioural-based detection (including sandbox detection) and data-mining techniques (AI). All of these solutions target unencrypted data. While each technique has its strengths and flaws, combining them has been an effective strategy in practice. A malware's standard signature often refers to the file's hash signature. While its detection capability on its own is very limited due to the proliferation of polymorphic viruses, it is still one of the most reliable tools available and is highly effective when combined with other identification techniques. The combination of the various properties of a malware, behavioural and structural, is often referred to as a generic signature, which is one of the most commonly used identification techniques in anti-virus software today. Our solutions for encrypted cloud storage are restricted to detecting malware based on structural rather than behavioural properties. More specifically, in addition to identifying perfect matches of malware, the solutions allow detection of sequences of potentially malicious computer instructions, which may appear at different positions in a file. That is, our system can detect a computer virus that attaches itself to and hides in legitimate data files, a common malware behaviour. Malware that evolves and has several versions in the wild also exhibits such properties. By extension, our schemes also allow the use of wildcards. Our proposed solution in a private scanning model relies on encrypted indexes, a technique originating from the field of keyword search over encrypted data.
4.0 SUMMARY OF INVENTION
We propose solutions for performing virus and malware detection in the following scenarios:

- A malware scanner for encrypted cloud storage accessed solely by the data owner
- An Anti-Virus as a service provider operating over encrypted cloud storage
- An Anti-Virus service provider operating over unencrypted cloud services
In general, our solutions target the following features:
A. Privacy
  A.1. User data
  A.2. Anti-Virus detection algorithm
  A.3. Malware/Virus database
  A.4. Invisibility of software updates
B. Protection against reverse-engineering and gaming of the detection algorithm
C. Ease of updates for software as a service
D. Ease of collection of virus data for analysis
Our private malware scanning solution is based on encrypted indexes and achieves performance comparable to leading keyword search algorithms. The private solution aims to achieve A.1 while maintaining good performance. Our scheme for Anti-Virus as a service, based on homomorphic encryption, further protects against malicious agents by performing a portion of the detection algorithm on an Anti-Virus server and by performing all operations in the encrypted domain to protect privacy. The scheme achieves all listed features at the cost of higher complexity. Our final solution, implementing Anti-Virus as a service on unencrypted data, aims to achieve A.2, A.4, B, C and D, with the detection algorithm partially executed on an Anti-Virus server.

6.0 BRIEF DESCRIPTION OF DRAWINGS
Figure 1 - Block diagram of a prior art Virus and Malware scanning system on unencrypted data, where data, signatures and identification system reside on the same computer system
Figure 2 - Block diagram of an embodiment of the Setup system for encrypting the data source and associated indices for private storage
Figure 3 - Block diagram of an embodiment of the Virus and Malware scanning system for private storage
Figure 4 - Block diagram of an embodiment of the Setup system for encrypting the data source for public storage and scanning by an anti-virus as a service provider
Figure 5 - Block diagram of an embodiment of the Virus and Malware scanning system in encrypted public storage by an anti-virus as a service provider
Figure 6 - Block diagram of an embodiment of the Virus and Malware scanning system in unencrypted public storage by an anti-virus as a service provider
Figure 7 - Flow diagram illustrating the operations of the Setup system for encrypting the data source and associated indices for private storage in an embodiment of the Virus and Malware scanning system for scanning for malware hash signatures in private storage
Figure 8A, 8B - Flow diagram illustrating the operations for scanning for malware hash signatures in an embodiment of the Virus and Malware scanning system in private storage
Figure 9A, 9B - Flow diagram illustrating the operations for scanning for byte strings or code snippets in an embodiment of the Virus and Malware scanning system in private storage
Figure 10 - Flow diagram illustrating the operations in the Setup system of an embodiment of the Virus and Malware scanning system in encrypted public storage by an anti-virus as a service provider
Figure 11 - Flow diagram illustrating the operations for scanning for malware hash signatures in an embodiment of the Virus and Malware scanning system in unencrypted public storage by an anti-virus as a service provider
Figure 12 - Flow diagram illustrating the operations for scanning byte strings or code snippets in an embodiment of the Virus and Malware scanning system in unencrypted public storage by an anti-virus as a service provider

7.0 DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
For private scanning, the data is first indexed to create multiple indexes: one set consists of blocks of data across multiple files, while the other consists of blocks of data within individual files.
The data and the indices are encrypted using a standard encryption algorithm in the encryptor.
During scanning, virus/malware signatures are parsed into blocks and encrypted with the encryptor. The encrypted signature blocks are sent to the storage server/device, where the indices are referenced and the encrypted entries are returned to the user. The user decrypts the entries and proceeds with further queries on other indices as required. Given the required index entries, the user identifies matching signatures.
For public scanning with anti-virus as a service, the user encrypts the data using homomorphic encryption, stores the data on the storage server/device and publishes the public key. To perform a virus scan, the Anti-Virus server retrieves the public key of the user requesting the scan and encrypts the virus/malware signatures. It then transfers the encrypted signatures to the storage server/device. The storage server/device, exploiting the properties of homomorphic encryption, performs the scan and sends the encrypted results back to the user. The user decrypts the scan results and sends them to the Anti-Virus server for analysis. The Anti-Virus server determines whether the data is infected, using its own detection algorithm, based on the scan results.
For an unencrypted data service, the user uploads unencrypted data to the storage server/device. To perform a virus scan, the Anti-Virus server sends the virus/malware signatures to the storage server/device, where the scan is performed. The scan results are then returned to the Anti-Virus server, which determines whether the data is infected, using its own detection algorithm, based on the scan results.
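For the unencrypted service, the storage-side step reduces to ordinary hash comparison and byte-string search. The Python sketch below (Python 3.9+) illustrates that step under stated assumptions: hash signatures are SHA-256 hex digests, generic signatures are literal byte strings without wildcards, and `scan_unencrypted` and its result layout are hypothetical names chosen for illustration. The Anti-Virus server's own detection logic, which interprets the returned hits, is deliberately not modelled.

```python
import hashlib

def scan_unencrypted(files: dict[str, bytes],
                     hash_signatures: set[str],
                     byte_strings: list[bytes]) -> dict[str, dict]:
    """Storage-side scan of plaintext files (hypothetical helper).

    For each file, reports whether its SHA-256 digest is a known malware
    hash and which byte-string signatures (by index) occur in it. The
    Anti-Virus server, not the storage server, decides what the hits mean.
    """
    results = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        hits = [k for k, pattern in enumerate(byte_strings) if pattern in data]
        results[name] = {"hash_match": digest in hash_signatures,
                         "byte_string_hits": hits}
    return results


# Example: one file matches a known hash, another contains a byte string.
files = {"a.bin": b"known-bad-sample", "b.doc": b"header EVIL_CODE trailer"}
known_hashes = {hashlib.sha256(b"known-bad-sample").hexdigest()}
report = scan_unencrypted(files, known_hashes, [b"EVIL_CODE", b"\x90\x90"])
print(report)
```

Returning only match flags and signature indices keeps the interpretation on the Anti-Virus server side, in line with the division of roles described above.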
One embodiment of the method for private scanning implemented on a Cloud Storage system is:
1) Attach the standard hash signature to each file.
2) Parse all files in the data set into blocks of size n to generate an index A, mapping blocks to files.
3) Parse each file into blocks of size n to generate an index B_i, mapping blocks to block locations.
4) Encrypt all files and indices using any encryption scheme.
5) Upload the files and indices to the cloud server.

For a standard hash signature such as SHA-256:
1) Parse the standard signature into unique blocks of size n, including padding if necessary.
2) Encrypt the blocks and query the cloud server for the encrypted signature blocks.
3) The cloud server looks up the encrypted blocks in index A and returns the encrypted entries.
4) The data owner decrypts the entries.
5) If a different encryption key is used for encrypting index B_i, re-encrypt the blocks under the new encryption key.
6) Query the cloud server for the encrypted blocks in the candidate files.
7) The cloud server looks up the encrypted blocks in index B_i and returns the encrypted entries.
8) The data owner decrypts the results; matches are identified where the block locations correspond to the hash signature block locations.
9) The anti-virus detection algorithm may request further scans based on the results.
10) Potential viruses/malware are identified.

For a generic signature such as a byte string or code snippet:
1) Parse the byte string into unique blocks of size n.
2) Encrypt the blocks and query the cloud server for the encrypted signature blocks.
3) The cloud server looks up the encrypted blocks in index A and returns the encrypted entries.
4) The data owner decrypts the entries.
5) If a different encryption key is used for encrypting index B_i, re-encrypt the blocks under the new encryption key.
6) Query the cloud server for the encrypted blocks in the candidate files.
7) The cloud server looks up the encrypted blocks in index B_i and returns the encrypted entries.
8) The data owner decrypts the results; matches are identified where the block locations appear as specified in the generic signature.
   a. For example, with block size = 64, the byte string {0x2AF3, 0xB027, 0xAA83} would be matched if a sequence is found in the decrypted block locations such as 2, 3, 4 or 87, 88, 89.
9) The anti-virus detection algorithm may request further scans based on the results.
10) Potential viruses/malware are identified.

One embodiment of the method for public scanning with an anti-virus service provider implemented on a Cloud Storage system is:
1) Attach the standard hash signature to each file.
2) Encrypt all files using homomorphic encryption.
3) Upload the files to the cloud server.
4) Publish the public key used.

For a standard hash signature such as SHA-256:
1) Using the data owner's public key, the Anti-Virus server encrypts the negated standard virus/malware signatures, $E(-\mathrm{Malware}_{\mathrm{sig}})$, and sends them to the cloud server.
2) The cloud server executes the following operations for each file in the data set:
   i. Compute $E(Q_i) = E(R) \cdot \big(E(\mathrm{File}_{i,\mathrm{sig}}) + E(-\mathrm{Malware}_{\mathrm{sig}})\big)$, where $R$ is a random value and $\mathrm{File}_{i,\mathrm{sig}}$ is the hash signature of the $i$th file.
   ii. The results are sent to the data owner.
3) The data owner decrypts the results; matches are identified where $Q_i = 0$.
4) If required, the data owner sends the list of results to the Anti-Virus provider.
5) If required, the Anti-Virus provider may request further scans based on the results.
6) The Anti-Virus provider identifies potential viruses/malware based on the results.
7) The data owner and/or cloud server are then notified.

For a generic signature such as a byte string or code snippet:
1) Using the data owner's public key, the Anti-Virus server encrypts a negated byte string, $E(-y)$, of size $n_y$ and sends it to the cloud server.
2) The cloud server executes the following operations for each file in the data set:
   i. Compute $E(Q_{i,j}) = E(R) \cdot \big(E(\mathrm{File}_{i,j}) + E(-y)\big)$, where $R$ is a random value and $\mathrm{File}_{i,j}$ is the $n_y$ blocks at the $j$th byte location of the $i$th file.
   ii. The results are sent to the data owner.
3) The data owner decrypts the results; matches are identified where $Q_{i,j} = 0$.
4) If required, the data owner sends the list of results to the Anti-Virus provider.
5) If required, the Anti-Virus provider may request further scans based on the results.
6) The Anti-Virus provider identifies potential viruses/malware based on the results.
7) The data owner and/or cloud server are then notified.

One embodiment of the method for public scanning on unencrypted data implemented on a Cloud Storage system is:
1) Attach the standard hash signature to each file, if desired, for efficiency.
2) Upload the files to the cloud server.

For a standard hash signature such as SHA-256:
1) The Anti-Virus server sends virus/malware signatures to the cloud server.
2) The cloud server computes, if necessary, the hash signatures of the user files and compares them with the virus/malware signatures.
3) If necessary, results are returned to the Anti-Virus server for further analysis.
4) If necessary, further scans are requested by the Anti-Virus server.
5) Based on the scan results, viruses/malware are identified and the cloud server/user are notified.

For a generic signature such as a byte string or code snippet:
1) The Anti-Virus server sends byte strings to the cloud server.
2) The cloud server scans the user files for the byte strings.
3) Results are returned to the Anti-Virus server for analysis.
4) If necessary, results are returned to the Anti-Virus server for further analysis.
5) If necessary, further scans are requested by the Anti-Virus server.
6) Based on the scan results, viruses/malware are identified and the cloud server/user are notified.

In many cases, an organization is the sole user accessing and writing data onto the cloud server. We describe here an efficient solution to provide malware scanning capability in this scenario, where security concerns are restricted to those regarding the cloud operator.
A. Communication Model
Our communication model involves two parties, as shown in figure 2, where the data owner encrypts and uploads the data to the cloud server and subsequently requests virus scanning by following a communication protocol. The data owner is assumed to have a set of virus definitions consisting of standard signatures, such as MD4 or SHA-256 hashes, and generic signatures describing snippets of malicious code. The solution must also be sound, that is, have a communication cost lower than the size of the data set.
In terms of security, we assume the cloud operator to be semi-honest: it follows our protocol without deviation but is interested in learning information about the stored data. The main requirement of our scheme is that the content of the files stored on the cloud server remains private and that no information is leaked as a result of the malware scanning setup and protocol.
B. Malware scanning protocol
Our proposed solution is based on encrypted indexes, using techniques similar to [13] and [16], and allows for standard and generic signature (with wildcards) based detection. Briefly, two indexes are used: a block-to-file index, $I$, and a block location index, $I_L$.
The block-to-file index enables a coarse scan to quickly identify potentially infected files in the data set, while the block location index allows a detailed scan of individual files for virus identification. Our scheme uses symmetric encryption to protect the content of the files and indexes. During setup, the $N$ files forming the data set to be uploaded are parsed and indexed as $n$-bit blocks, resulting in a block-to-file index and $N$ block location indexes. The indexes and files are then encrypted and uploaded to the server. To perform a virus scan, the data owner first queries the block-to-file index, $I$, for files containing instruction blocks corresponding to virus definitions, followed by a query on $I_L$ to determine code sequence matches. The queries sent to the cloud service provider contain encrypted blocks, and the returned entries are also encrypted. To speed up standard signature detection, a hash signature, such as SHA-256, is attached at the beginning of every file.
Our scheme achieves the required functionalities without additional complexity in terms of storage, communication and computational cost when compared to search schemes for encrypted cloud storage [11].
A detailed description of the algorithms is as follows. A file collection, $D = \{D_1, D_2, \ldots, D_N\}$, is parsed into $n$-bit blocks, $x_j$. A block-to-file index, $I$, is generated mapping blocks to files such that $I(x_j) = \{d_1, d_2, \ldots, d_N\}$, where $d_i = 1$ if $x_j$ is found in file $D_i$ and $d_i = 0$ otherwise. The resulting index is encrypted and uploaded to the cloud server:
$$I(H_K(x_j)) = \{E_K(d_1, d_2, \ldots, d_N)\}. \quad (4)$$
In addition, a block location index, $I_{L(i)}$, is generated for each file:
$$I_{L(i)}(H_K(D_i \| x_j)) = \{E_K(l_1, l_2, \ldots, l_{n_{x_j}})\}, \quad (5)$$
where $H_K(\cdot)$ is a cryptographically secure keyed hash function, $D_i \| x_j$ denotes the file identifier concatenated with the block, and $l_1, \ldots, l_{n_{x_j}}$ are the locations of $x_j$ in the file with identifier $D_i$.
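The index construction in equations (4) and (5) can be sketched in Python (3.9+) as follows. This is a minimal illustration under stated assumptions: HMAC-SHA256 stands in for the keyed hash $H_K$, blocks are taken as $n$ bytes rather than $n$ bits for simplicity, and the encryption of entry values with $E_K$ is omitted, although in the scheme both indexes' entries would be encrypted before upload. The helper names (`keyed_hash`, `split_blocks`, `build_indexes`) are hypothetical.

```python
import hashlib
import hmac

def keyed_hash(key: bytes, *parts: bytes) -> bytes:
    """H_K(.) as HMAC-SHA256; parts are length-prefixed so D_i || x_j is unambiguous."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(len(part).to_bytes(4, "big"))
        mac.update(part)
    return mac.digest()

def split_blocks(data: bytes, n: int) -> list[bytes]:
    """Split data into n-byte blocks, zero-padding the final block."""
    blocks = [data[i:i + n] for i in range(0, len(data), n)]
    if blocks and len(blocks[-1]) < n:
        blocks[-1] = blocks[-1].ljust(n, b"\0")
    return blocks

def build_indexes(key: bytes, files: dict[bytes, bytes], n: int):
    """Return (file_ids, I, I_L) with plaintext entries (E_K omitted).

    I maps H_K(x_j) -> [d_1, ..., d_N]; I_L[D_i] maps H_K(D_i || x_j) -> locations.
    Locations are 1-based, matching the numbering used in the text.
    """
    file_ids = sorted(files)                      # fixes the order d_1..d_N
    block_to_file: dict[bytes, list[int]] = {}
    block_location: dict[bytes, dict[bytes, list[int]]] = {}
    for i, fid in enumerate(file_ids):
        loc_index: dict[bytes, list[int]] = {}
        for loc, block in enumerate(split_blocks(files[fid], n), start=1):
            bits = block_to_file.setdefault(keyed_hash(key, block), [0] * len(file_ids))
            bits[i] = 1
            loc_index.setdefault(keyed_hash(key, fid, block), []).append(loc)
        block_location[fid] = loc_index
    return file_ids, block_to_file, block_location


key = b"\x01" * 32
ids, I, I_L = build_indexes(key, {b"f1": b"AAAABBBBAAAA", b"f2": b"CCCCAAAA"}, n=4)
print(I[keyed_hash(key, b"AAAA")])                    # [1, 1]
print(I_L[b"f1"][keyed_hash(key, b"f1", b"AAAA")])    # [1, 3]
```

Because the block location lookup key includes the file identifier, the same block yields unrelated lookup keys in different files, so the per-file indexes do not reveal cross-file block repetition by themselves.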
To perform a standard signature scan for a malware with an $n_h$-bit hash signature, $\mathrm{Sig}(y) = \{y_1, y_2, \ldots, y_{n_h/n}\}$, the user begins by sending the keyed hash of the signature blocks, $H_K(\mathrm{Sig}(y))$, to the cloud.
The corresponding encrypted index entries in $I$ are returned to the data owner.
A set of candidate files, $D_c$, is then found by decrypting the entries and computing their intersection:
$$D_K(I(H_K(y_1))) \;\&\; D_K(I(H_K(y_2))) \;\&\; \cdots \;\&\; D_K(I(H_K(y_{n_h/n}))),$$
where $\&$ denotes a bitwise AND operation. The data owner then sends $H_K(D_i \| \mathrm{Sig}(y))$, where $D_i \in D_c$, to query the block locations of the signature. Upon receiving the encrypted location entries, $I_{L(i)}(H_K(D_i \| y_k)) = E_K(l^{y_k})$, from the cloud server, the data owner decrypts them and identifies file $D_i$ as a match if $1, 2, \ldots, n_h/n$ are respectively in $D_K(E_K(l^{y_1})), D_K(E_K(l^{y_2})), \ldots, D_K(E_K(l^{y_{n_h/n}}))$. Note that multiple malware can be verified simultaneously by sending multiple keyed hash signatures at the same time.
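The data owner's side of the standard signature scan can be sketched as two small checks on already-decrypted entries. This is an illustrative Python sketch, assuming each decrypted block-to-file entry is represented as an integer bitmask whose bit $i$ is $d_i$ for the $i$th file, and that decrypted block locations are 1-based sets; the function names are hypothetical.

```python
import operator
from functools import reduce

def candidate_files(decrypted_entries: list[int], file_ids: list[str]) -> list[str]:
    """Intersect decrypted block-to-file entries with a bitwise AND.

    Bit i of each entry is d_i for file_ids[i]. At least one entry
    (one signature block) is required.
    """
    mask = reduce(operator.and_, decrypted_entries)
    return [fid for i, fid in enumerate(file_ids) if (mask >> i) & 1]

def is_hash_signature_match(decrypted_locations: list[set[int]]) -> bool:
    """Signature block y_k must occur at location k, because the hash
    signature is attached at the beginning of every file."""
    return all(k in locs for k, locs in enumerate(decrypted_locations, start=1))


print(candidate_files([0b111, 0b101, 0b100], ["f1", "f2", "f3"]))   # ['f3']
print(is_hash_signature_match([{1, 7}, {2}, {3, 40}]))               # True
```

The coarse AND step only narrows the search; a file can contain every signature block somewhere without containing the signature itself, which is why the location check follows.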
To perform a generic signature scan for a malware with the characteristic code snippets $y_a = \{y_1, y_2, \ldots, y_{q'}\}$ and $y_b = \{y'_1, y'_2, \ldots, y'_{q'}\}$, the user computes and sends $H_K(y_a \cup y_b)$ to the cloud. The cloud server returns the corresponding encrypted index entries to the data owner, who then finds the set of candidate files from their intersection in the same way as for a standard signature scan. To determine the existence of the code snippets, the block locations are then queried by sending $H_K(D_i \| y_k)$, where $y_k \in y_a \cup y_b$, to the cloud server.
Upon decrypting the location information, the locations of the first block, $y_1$, of $y_a$ are extracted. For each location in $\mathrm{Loc}(y_1) = \{l_1, l_2, \ldots, l_m\}$, the data owner verifies whether the following block is $y_2$, that is, whether $l_k + 1 \in \mathrm{Loc}(y_2)$. For each candidate found, the data owner continues with the following block.
The process iterates until the last block in the snippet is verified or until no candidates remain.
Files where the blocks of the code snippets appear in the specified order are identified as matches.
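The iterative location check above can be illustrated with a short Python sketch operating on decrypted location sets. It assumes 1-based block locations and a snippet without wildcards; `snippet_start_locations` is a hypothetical name. Carrying the original start location and testing `start + offset` is equivalent to advancing each candidate by one block per step.

```python
def snippet_start_locations(block_locations: list[set[int]]) -> list[int]:
    """Return start locations s such that snippet block k (1-based) occurs at s + k - 1.

    block_locations[k] holds the decrypted locations of the (k+1)-th snippet
    block in one candidate file; the walk stops early once no candidates remain.
    """
    candidates = set(block_locations[0])
    for offset, locs in enumerate(block_locations[1:], start=1):
        candidates = {start for start in candidates if start + offset in locs}
        if not candidates:
            break
    return sorted(candidates)


# Mirrors the example in the text: matches starting at locations 2 and 87.
print(snippet_start_locations([{2, 40, 87}, {3, 88}, {4, 50, 89}]))   # [2, 87]
```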
An alternative approach for verifying code snippets, which may require less communication, proceeds as before to identify candidate files. Instead of querying the locations of all blocks within candidates, a random block of the code snippet, $y_r$, is queried with $H_K(D_i \| y_r)$. Given the locations, the owner returns, for each match,
$$\big(H(E_{K_{D_i}, j_s}(y_1, y_2, \ldots, y_q)),\ i,\ j_s\big), \quad (7)$$
where $i$ is the index of the matched file and $j_s$ is the expected starting location of the code snippet. $E_{K_{D_i}, j_s}(\cdot)$ represents the symmetric encryption of the code snippet at location $j_s$ of file $D_i$. The cloud then computes $H(E(x_{j_s}, x_{j_s+1}, \ldots, x_{j_s+q-1}))$, where $E(x_j)$ is the $j$th stored block in file $i$.
Matches are found where the following equality holds:
$$H(E(x_{j_s}, \ldots, x_{j_s+q-1})) = H(E_{K_{D_i}, j_s}(y_1, \ldots, y_q)). \quad (8)$$
The communication cost of the alternative approach depends on the frequency of the chosen block. Instead of choosing the block at random, one way to ensure a lower communication cost is to select the block with the lowest frequency in the file set. To do so, a block frequency list would be stored locally by the data owner. To allow fast computation of encrypted block sequences, the data should be encrypted using a block cipher in counter mode.
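Counter mode matters here because the ciphertext of block $j$ depends only on the key, the file and the position $j$, so the data owner can compute the expected ciphertext of a snippet at a candidate location $j_s$ without downloading anything. The Python sketch below illustrates equations (7) and (8) under loudly stated assumptions: HMAC-SHA256 over the file identifier and block counter is a toy stand-in for a real block cipher such as AES in CTR mode (it is not the cipher prescribed by the text), a block is 16 bytes, SHA-256 plays the role of $H$, and all names are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16  # bytes per block; an assumption for illustration

def keystream(key: bytes, file_id: bytes, j: int) -> bytes:
    """Toy counter-mode keystream for block j of a file (stand-in for AES-CTR)."""
    msg = file_id + b"|" + j.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:BLOCK]

def encrypt_at(key: bytes, file_id: bytes, blocks: list[bytes], start: int) -> list[bytes]:
    """Encrypt consecutive blocks as if stored from 1-based location `start`."""
    return [bytes(b ^ k for b, k in zip(blk, keystream(key, file_id, start + off)))
            for off, blk in enumerate(blocks)]

def seq_hash(blocks: list[bytes]) -> bytes:
    """H over a sequence of ciphertext blocks."""
    return hashlib.sha256(b"".join(blocks)).digest()


# Data owner uploads the file encrypted block by block.
key, fid = b"k" * 32, b"file-42"
plain = [bytes([v]) * BLOCK for v in range(8)]
stored = encrypt_at(key, fid, plain, start=1)            # held by the cloud

# Owner expects the snippet (blocks 4 and 5) to start at j_s = 4, cf. eq. (7).
snippet, js = plain[3:5], 4
owner_hash = seq_hash(encrypt_at(key, fid, snippet, js))

# Cloud hashes its stored blocks j_s .. j_s+q-1 and compares, cf. eq. (8).
cloud_hash = seq_hash(stored[js - 1: js - 1 + len(snippet)])
print(cloud_hash == owner_hash)                          # True
```

The cloud only ever sees hashes of ciphertext, and a mismatch at the wrong offset follows from the position-dependent keystream.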
The scheme is fairly efficient, requiring mainly decryption of index entries and lookup of sequences by the data owner. Any symmetric encryption algorithm can be used, and security is easily observed since the index entries are encrypted as a whole. The setup and protocol are flexible and can also be used for keyword and phrase searches with a proper choice of parameters. A hierarchical setup of indexes could also lead to better efficiency.
[Fig. 2. Communication model for three-party malware scanning service on encrypted data]
7.1 ANTI-VIRUS AS A SERVICE FOR ENCRYPTED CLOUD STORAGE
While simple, the previous scenario requires the data owner to manage the malware scanning process, including the anti-virus software and database. Furthermore, anti-virus software designed to run entirely client-side exposes the virus database and detection algorithms to the public, including malicious users who may use the information to aid in malware development or to disrupt normal operation of the anti-virus software.
Therefore, we propose implementing anti-virus as a cloud service, where only essential scanning is performed client-side and malware detection based on the scan results is performed by the anti-virus company's server. Aside from simplifying the frequent updates currently required by anti-virus software, denying malicious parties access to a significant portion of the scanning algorithm would also hinder efforts to produce malware that evades detection and protect against reverse-engineering of the detection algorithms. In addition, our solution maintains user privacy by requiring that the data stored on the cloud be encrypted under the data owner's private key.
A. Communication Model
We consider a three-party model, as shown in figure 2, where the data owner encrypts and uploads the data to the cloud server and the anti-virus company offers a malware scanning service on the encrypted data. Virus scanning is performed by following a communication protocol. The anti-virus company controls the malware database and the detection algorithm. The data owner generally initiates the process by sending a scan request to the anti-virus server. Then, a set of encrypted signatures/code sequences is sent to the cloud for testing. The encrypted results are sent back to the data owner, who sends the decrypted scan results to the anti-virus server for analysis.
In terms of security, we assume both the cloud operator and the anti-virus service provider to be semi-honest: they follow our protocol without deviation but are interested in learning information about the stored data. The main requirement of our scheme is that the content of the files stored on the cloud server remains private and that no information is leaked as a result of the malware scanning setup and protocol, except where malware matches are found.
Another highly desirable property is that the virus definitions and malware identification methodologies, which include the weighting of the various matches and the combinations of results that lead to positive identifications, are not leaked as a result of the scanning protocol.
B. Malware scanning protocol
Our proposed solution is based on homomorphic encryption. Without loss of generality, we describe the scheme using Paillier's cryptosystem. Briefly, the scheme works as follows. The data owner encrypts the data set using Paillier's cryptosystem and sends the result to the cloud.
The public key is also uploaded with the data set. A hash signature, such as SHA-256, is attached to each file to enable standard signature verification. To perform a malware scan, the anti-virus company encrypts the standard and generic signatures using the data owner's public key and sends them to the cloud for testing. The cloud computes the difference between the encrypted data and the test sequences. Since the data and the hash signatures/code snippets are both encrypted, the cloud gains no information on either and sends the results back to the data owner.
Using the homomorphic property that $E(X) + E(-Y)$ is an encryption of 0 if $X = Y$, the data owner decrypts the results and sends the individual test results to the anti-virus server. Finally, the anti-virus server sends the malware scan results to the data owner or the cloud, depending on the end user agreement.
A detailed description of the algorithms is as follows. We first generate the public key, $(n, g)$, and the private key, $(\lambda, \mu)$, for a Paillier cryptosystem that accepts $n'$-bit plaintexts. Given a file collection, $D = \{D_1, D_2, \ldots, D_N\}$, a hash signature, $\mathrm{Sig}(D_i)$, such as SHA-256, is computed and attached to each file. The file collection is then encrypted as $n'$-bit blocks, $x_j$. Recall that the ciphertext, $c$, for a given plaintext, $m$, is given by $c = g^m r^n \bmod n^2$, where $r \in \mathbb{Z}_n^*$ is chosen at random.
To perform a standard malware signature scan, the anti-virus server uses the user's public key to encrypt the negation of an $n_h$-bit hash signature, $\mathrm{Sig}(y) = \{y_1, y_2, \ldots, y_{n'_h}\}$. This results in the encrypted signature:

$$C_{\mathrm{Sig}(y)} = \{g^{-y_1} r_1^{n},\ g^{-y_2} r_2^{n},\ \ldots,\ g^{-y_{n'_h}} r_{n'_h}^{n}\} \qquad (9)$$

where $n'_h = n_h / n'$ is the number of blocks needed to represent the hash signature. The anti-virus server then sends the encrypted hash signature to the cloud. For each file, the cloud storage provider computes

$$Q_{\mathrm{Sig}(D_i)} = \{Q_1, Q_2, \ldots, Q_{n'_h}\} = C_{\mathrm{Sig}(D_i)} + C_{\mathrm{Sig}(y)} \qquad (10)$$

It then homomorphically multiplies each block by a random value before passing the result to the data owner:

$$Q'_{\mathrm{Sig}(D_i)} = \{r'_1 Q_1,\ r'_2 Q_2,\ \ldots,\ r'_{n'_h} Q_{n'_h}\} \qquad (11)$$

where the values $r'_j \in \mathbb{Z}_n^{*}$ are randomly generated. The data owner decrypts each result, and a match is detected if

$$D(Q'_{\mathrm{Sig}(D_i)}) = \{0, 0, \ldots, 0\}. \qquad (12)$$

The scan results are sent back to the anti-virus server as a bit sequence

$$Q_{\mathrm{Std}}(\mathrm{Sig}(y)) = \{Q_{D_1}, Q_{D_2}, \ldots, Q_{D_n}\} \qquad (13)$$

where $Q_{D_i}$ is a single bit, set to 1 if a match is detected for $D_i$ and to 0 otherwise.
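The three-party exchange in equations (9) to (13) can be simulated in a single process as follows. This is a minimal sketch under stated assumptions. It reuses a toy Paillier instance with insecure primes, uses 32-bit blocks so that a SHA-256 signature occupies $n'_h = 8$ blocks, and takes the sample files and the "known-bad" signature as hypothetical data. The names `av_encrypt_negated`, `cloud_test` and `owner_bit` are illustrative and do not come from the source.

```python
import hashlib
import math
import secrets

# --- Toy Paillier with g = n + 1 (ASSUMPTION: insecure demo primes) ---
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)
MU = pow((pow(N + 1, LAM, N2) - 1) // N, -1, N)

def rand_unit():
    """Uniform random element of Z_N^* (used for encryption and blinding)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return r

def enc(m):
    return pow(N + 1, m % N, N2) * pow(rand_unit(), N, N2) % N2

def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

def blocks(digest: bytes, size: int = 4):
    """Split a 32-byte digest into 8 big-endian 32-bit blocks."""
    return [int.from_bytes(digest[i:i + size], "big") for i in range(0, len(digest), size)]

def av_encrypt_negated(sig_blocks):
    # Anti-virus server, eq. (9): encrypt -y_j for every signature block.
    return [enc(-y) for y in sig_blocks]

def cloud_test(enc_file_sig, enc_neg_sig):
    # Cloud, eq. (10): ciphertext product = homomorphic sum -> E(d_j - y_j).
    # Eq. (11): raise to a random unit r'_j -> E(r'_j * (d_j - y_j)).
    return [pow(cd * cy % N2, rand_unit(), N2) for cd, cy in zip(enc_file_sig, enc_neg_sig)]

def owner_bit(blinded):
    # Data owner, eq. (12): a match iff every block decrypts to 0.
    return int(all(dec(q) == 0 for q in blinded))

files = [b"benign document", b"EVIL PAYLOAD", b"another benign file"]
# Upload: each file's SHA-256 signature is stored encrypted, block by block.
stored = [[enc(b) for b in blocks(hashlib.sha256(f).digest())] for f in files]
# Hypothetical known-bad signature held by the anti-virus server.
enc_neg = av_encrypt_negated(blocks(hashlib.sha256(b"EVIL PAYLOAD").digest()))
# Eq. (13): one bit per file, forwarded to the anti-virus server.
q_std = [owner_bit(cloud_test(s, enc_neg)) for s in stored]
print(q_std)  # [0, 1, 0]
```

Each $r'_j$ is a unit modulo $n$, so blinding can never turn a nonzero difference into zero. It only hides the size of the difference from the data owner.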
Based on the scan results, the anti-virus server determines whether malware has been detected.
The previous step can be performed more efficiently, at the cost of a small chance of a false positive, by aggregating the encrypted blocks. That is, the cloud computes

$$Q_A = \sum_{i=1}^{n'_h} Q_i \qquad (14)$$

and returns $Q'_A = r_A Q_A$ to the data owner, where $r_A$ is a random value. A match is identified if

$$D(Q'_A) = 0. \qquad (15)$$

For clarity, the description of generic signatures uses this aggregated approach. Note that multiple malware samples can be verified simultaneously by sending multiple keyed hash signatures at the same time.
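The aggregated test in equations (14) and (15) differs from the blockwise test only in what the cloud returns: a single blinded ciphertext per file instead of one per block. The following is a minimal sketch under stated assumptions. It uses a toy Paillier instance, the two-block signature and the three candidate inputs are hypothetical, and the function names are illustrative. The third case reproduces the false-positive risk noted above.

```python
import math
import secrets

# --- Toy Paillier with g = n + 1 (ASSUMPTION: insecure demo primes) ---
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)
MU = pow((pow(N + 1, LAM, N2) - 1) // N, -1, N)

def rand_unit():
    """Uniform random element of Z_N^*."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return r

def enc(m):
    return pow(N + 1, m % N, N2) * pow(rand_unit(), N, N2) % N2

def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

def cloud_test_aggregated(enc_file_sig, enc_neg_sig):
    # Eq. (14): Q_A = sum of Q_i, i.e. the product of the ciphertexts E(d_i - y_i).
    q_a = 1  # 1 is a valid encryption of 0 (g^0 * 1^n)
    for cd, cy in zip(enc_file_sig, enc_neg_sig):
        q_a = q_a * cd % N2 * cy % N2
    # Q'_A = r_A * Q_A: blind with a random unit before sending to the owner.
    return pow(q_a, rand_unit(), N2)

def owner_bit_aggregated(q_a_blinded):
    # Eq. (15): a match iff D(Q'_A) == 0.
    return int(dec(q_a_blinded) == 0)

sig = [6, 6]                      # hypothetical two-block signature
neg = [enc(-y) for y in sig]
exact = [enc(6), enc(6)]          # identical blocks
other = [enc(6), enc(9)]          # differs in the second block
cancel = [enc(5), enc(7)]         # differences -1 and +1 cancel in the sum
results = [owner_bit_aggregated(cloud_test_aggregated(f, neg)) for f in (exact, other, cancel)]
print(results)  # [1, 0, 1]
```

In this sketch, the data owner decrypts one value per file instead of $n'_h$ values. The price is that any set of block differences summing to zero is reported as a match.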
Generic signature verification with code snippets proceeds in the same manner, using a sliding-window approach. For the malware block sequence $y_a = \{y_1, y_2, \ldots, y_{n'_a}\}$, the anti-virus server encrypts the negated sequence using the data owner's public key to obtain

$$C_{y_a} = \{g^{-y_1} r_1^{n},\ g^{-y_2} r_2^{n},\ \ldots,\ g^{-y_{n'_a}} r_{n'_a}^{n}\} \qquad (16)$$

and sends it to the cloud server. For an encrypted file $C_{D_i} = \{C_{x_1}, C_{x_2}, \ldots, C_{x_{n_{D_i}}}\}$, the cloud computes

$$Q(j_s) = \{Q_1, Q_2, \ldots, Q_{n'_a}\} = C_{D_i}^{(j_s)} + C_{y_a} \qquad (17)$$

where $C_{D_i}^{(j_s)} = \{C_{x_{j_s}}, C_{x_{j_s+1}}, \ldots, C_{x_{j_s+n'_a-1}}\}$, for the starting block $j_s = 1$ to $n_{D_i} - n'_a + 1$.

The results are then aggregated, and

$$Q'_A(j_s) = r_A \sum_{i=1}^{n'_a} Q_i(j_s) \qquad (18)$$

is sent back to the data owner, who decrypts the results to find matches where

$$D(Q'_A(j_s)) = 0. \qquad (19)$$

The individual results are sent to the anti-virus server as a bit sequence:

$$Q_{\mathrm{Gen}}(D_i, y_a) = \{Q_{D_i}(1), Q_{D_i}(2), \ldots, Q_{D_i}(n_{D_i} - n'_a + 1)\} \qquad (20)$$

where $Q_{D_i}(j_s)$ is a single bit, set to 1 if a match is detected for $D_i$ at position $j_s$ and to 0 otherwise. Based on the scan results, the anti-virus server determines whether malware has been detected.
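The sliding-window scan in equations (16) to (20) can be sketched as follows. This is a minimal illustration under stated assumptions. It uses a toy Paillier instance, treats each byte as one block so that the window advances one byte at a time (the source leaves the block size open), and takes the file contents and the snippet as hypothetical data. Window positions are 0-based in the code, whereas $j_s$ starts at 1 in the equations.

```python
import math
import secrets

# --- Toy Paillier with g = n + 1 (ASSUMPTION: insecure demo primes) ---
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)
MU = pow((pow(N + 1, LAM, N2) - 1) // N, -1, N)

def rand_unit():
    """Uniform random element of Z_N^*."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return r

def enc(m):
    return pow(N + 1, m % N, N2) * pow(rand_unit(), N, N2) % N2

def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

def cloud_sliding_window(enc_file, enc_neg_snippet):
    """Eqs. (17)-(18): for every start position, add the window to the negated
    snippet, aggregate the blocks and blind the sum. One ciphertext per window."""
    k = len(enc_neg_snippet)
    out = []
    for js in range(len(enc_file) - k + 1):   # n_D - n'_a + 1 windows
        q_a = 1                               # encryption of 0
        for cx, cy in zip(enc_file[js:js + k], enc_neg_snippet):
            q_a = q_a * cx % N2 * cy % N2
        out.append(pow(q_a, rand_unit(), N2))
    return out

def owner_bits(blinded):
    # Eqs. (19)-(20): bit j_s is 1 iff D(Q'_A(j_s)) == 0.
    return [int(dec(q) == 0) for q in blinded]

file_bytes = b"xxHELLOyy"
snippet = b"HELLO"                               # hypothetical code snippet y_a
enc_file = [enc(b) for b in file_bytes]          # one byte per block (assumption)
enc_neg = [enc(-b) for b in snippet]
q_gen = owner_bits(cloud_sliding_window(enc_file, enc_neg))
print(q_gen)  # [0, 0, 1, 0, 0]
```

With one-byte blocks, the aggregated test is more prone to false positives. For example, a window containing the snippet's bytes in a different order, such as `b"OLLEH"`, also decrypts to 0.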
Note that the cloud can access only the encrypted data and the encrypted code sequences and hash signatures under test. Without the user's private key, the cloud cannot learn their content. Similarly, the anti-virus server is never granted access to the encrypted data. It receives only the test results, in the form of a bit sequence representing match versus non-match. Each encrypted sequence test result, $Q'_A$, is first randomized so that no information is divulged except in the event that $Q_A = 0$. Thus, apart from the event of a match, little is revealed about the content of the data under test.
7.2 ANTI-VIRUS AS A SERVICE FOR UNENCRYPTED CLOUD STORAGE
Despite the promise of better user privacy, most of today's cloud storage providers do not offer encryption services in which the data owner controls the private key. In fact, most providers continue to work with unencrypted data, both for efficiency and because various functionalities are available only while data remains unencrypted. Nonetheless, anti-virus as a service would still be valuable in an unencrypted cloud. It keeps the malware scanning tools easy to update, hinders the viability testing of malware, and helps prevent reverse engineering of detection algorithms. The scenario is also of interest in applications where privacy is not a concern.
Furthermore, detection can be based on the behavioral as well as the structural characteristics of malware, using techniques that currently do not work on encrypted data.
A. Communication Model
The communication model for anti-virus as a service is the same three-party model as in the encrypted case, shown in figure 2. The data owner uploads the data to the cloud server, and the anti-virus company offers a malware scanning service on the data. Virus scanning follows a communication protocol. The anti-virus company controls the malware database and the detection algorithm, while client scanning software runs on the cloud server.
Unlike the encrypted case, the unencrypted data carries no privacy or security requirements. The objective is to hide the malware detection algorithm as much as possible.
B. Malware scanning protocol
To illustrate the technique, consider the following set of rules:

- sudo followed by self-decryption and execution (Behavioral), Weight = 2
- SHA-256 hash signature is {x, y or z} (Structural), Weight = 1
- Code snippet {cs1, wildcard1 bytes, cs2} (Structural), Weight = 1

A score of 2 results in a positive malware match. In standalone anti-virus software, the entire sequence of tests is performed locally and is visible to anyone monitoring the software. As a cloud service, the scanning client may report that sudo has been called, that a file with hash x has been found, or that a code snippet has been detected. However, only the anti-virus server can decide, based on the scan results, whether a positive malware match has occurred.
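The server-side decision for the example rules can be sketched as follows. This is a minimal illustration under stated assumptions. The dictionary keys are hypothetical test identifiers that the client reports, and the sketch reads "a score of 2" as "a score of at least 2". The point is that the weights and the threshold exist only on the anti-virus server; the client reports which tests fired and never learns how they combine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    description: str
    kind: str      # "Behavioral" or "Structural"
    weight: int

# The three example rules; the dictionary keys are hypothetical test identifiers.
RULES = {
    "sudo_self_decrypt_exec": Rule("sudo followed by self-decryption and execution", "Behavioral", 2),
    "sha256_in_xyz": Rule("SHA-256 hash signature is x, y or z", "Structural", 1),
    "snippet_cs1_wild_cs2": Rule("code snippet {cs1, wildcard1 bytes, cs2}", "Structural", 1),
}
MATCH_SCORE = 2  # ASSUMPTION: a score of at least 2 is a positive malware match

def server_verdict(reported_tests: set[str]) -> bool:
    """Evaluated only on the anti-virus server. The client reports which tests
    fired; the weights and the threshold never leave the server."""
    score = sum(RULES[t].weight for t in reported_tests if t in RULES)
    return score >= MATCH_SCORE

print(server_verdict({"sha256_in_xyz"}))                          # False
print(server_verdict({"sha256_in_xyz", "snippet_cs1_wild_cs2"}))  # True
print(server_verdict({"sudo_self_decrypt_exec"}))                 # True
```

Because the rule table lives only on the server, the vendor could change a weight or add a rule without any change visible to the client.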
While our simple example contains only a few rules, a practical anti-virus may perform hundreds of tests. An individual positive test result is not necessarily a factor in a positive malware match, and vice versa. Separating the scanning process in this way effectively turns the anti-virus server into a black box for a malicious user. Because software updates are easy to deploy and invisible from outside, the behavior of the anti-virus server can also change without an outsider becoming aware of it. This greatly complicates any attempt to reverse engineer the detection logic or to test malware viability without being detected.
It should be noted, however, that the client scanner must perform behavioral detection itself, because executing code is time-sensitive. Behavior deemed high-risk may nevertheless be interrupted to await further instructions from the anti-virus server. These instructions may include structural verification of the executing file, code or RAM, or permission for the executing code to continue running under restricted conditions while further suspicious behavior is monitored.

8.0 CONCLUSION
In this paper, we explored the problem of malware detection on cloud services and proposed three anti-virus solutions for them. Our solution based on encrypted search provides an intuitive approach to malware detection on encrypted cloud storage whose access is limited to the data owner. The scheme allows any symmetric encryption algorithm to be used, with performance comparable to the leading keyword and phrase search algorithms.
Aside from user privacy, we also examined the disadvantages of implementing anti-virus as software that performs scanning locally. In particular, the current approach exposes the virus database and detection algorithm. This potentially helps malware writers evade detection and malicious agents reverse engineer the detection algorithms. We therefore propose an anti-virus as a service solution which, in addition to mitigating these risks, also eases the frequent updates that this critical service requires. Our solution is based on homomorphic encryption and is demonstrated using Paillier's cryptosystem. Detection is performed in the encrypted domain, ensuring privacy.
While encryption leads to greater security and privacy, many valuable functionalities are currently not possible in the encrypted domain, and unencrypted cloud services will continue to operate for the foreseeable future. Nonetheless, implementing anti-virus as a cloud service also has merit for unencrypted cloud services. In addition, operating over unencrypted data allows behavioral detection, which plays a significant role in today's anti-virus software.

Claims (36)

Encrypted Data - Computer Virus, Malware AND Ransom Ware Detection System
1.0 CLAIMS
The scope of the following claims should not be limited by the examples provided herein, but should be given the broadest interpretation consistent with the description as a whole.
1. A method of searching, scanning and detecting computer viruses and malware in encrypted data. The method comprises:
- Generating and encrypting indices for a data set
- Scanning the indices for encrypted blocks of virus/malware signatures
- Decrypting index entries and identifying matching signatures and patterns
2. A method according to claim 1 where the encryptor conforms to a symmetric encryption scheme.
3. A method according to claim 1 where the virus/malware signatures correspond to hash signatures such as SHA-256, byte strings, usage of wildcards or byte-string distance/difference.
4. A method according to claim 1 where multiple indices are used to enable multi-level scanning, as a trade-off between privacy and efficiency.
5. A method according to claim 1 where indices and virus/malware signatures are not encrypted but stored and processed locally to increase efficiency.
6. A method of claim 1 for scanning encrypted files for viruses based on operations with their encrypted indices and at least one encrypted virus signature.
7. A system of claim 1 for scanning encrypted files for viruses based on operations with their encrypted indices and at least one encrypted virus signature.
8. A method of claim 1 for scanning encrypted files for malware based on operations with their encrypted indices and at least one encrypted malware signature.
9. A system of claim 1 for scanning encrypted files for malware based on operations with their encrypted indices and at least one encrypted malware signature.
10. A method of claim 1 for scanning encrypted files for viruses without scanning or transporting the encrypted files themselves.
11. A system of claim 1 for scanning encrypted files for viruses without scanning or transporting the encrypted files themselves.
12. A method of searching, scanning and detecting computer viruses and malware in encrypted data. The method comprises:
- Encrypting the data set using homomorphic encryption
- Encrypting virus/malware signatures using homomorphic encryption
- Performing pattern-matching computations in the encrypted domain between the encrypted data set and the encrypted virus/malware signatures
- Decrypting the matching results and passing them to a detection system that identifies computer viruses and malware.
13. A method according to claim 12 where the public key is published on a trusted third party such as a certificate authority.
14. A method according to claim 12 where the virus/malware signatures correspond to hash signatures such as SHA-256, byte strings, usage of wildcards or byte-string distance/difference.
15. A method according to claim 12 where the encrypted results are labelled with temporary identifiers, whose mapping to the corresponding files and byte locations is relayed to the anti-virus server.
16. A method of claim 12 for scanning encrypted files for viruses based on matching encrypted data with at least one encrypted virus signature.
17. A system of claim 12 for scanning encrypted files for viruses based on matching encrypted data with at least one encrypted virus signature.
18. A method of claim 12 for scanning encrypted files for malware based on matching encrypted data with at least one encrypted malware signature.
19. A system of claim 12 for scanning encrypted files for malware based on matching encrypted data with at least one encrypted malware signature.
20. A method of claim 12 for scanning encrypted files for viruses without scanning or transporting the encrypted files themselves.
21. A system of claim 12 for scanning encrypted files for viruses without scanning or transporting the encrypted files themselves.
22. A method of searching, scanning and detecting computer viruses and malware in unencrypted data. The method comprises:
- Transferring virus and malware signatures to the storage server/device
- Performing pattern matching of the virus/malware signatures on the data set
- Analyzing the pattern-matching results to identify viruses/malware.
23. A method of claim 22 for scanning encrypted files for viruses based on matching encrypted data with at least one encrypted virus signature.
24. A system of claim 22 for scanning encrypted files for viruses based on matching encrypted data with at least one encrypted virus signature.
25. A method of claim 22 for scanning encrypted files for malware based on matching encrypted data with at least one encrypted malware signature.
26. A system of claim 22 for scanning encrypted files for malware based on matching encrypted data with at least one encrypted malware signature.
27. A method of claim 22 for scanning encrypted files for viruses without scanning or transporting the encrypted files themselves.
28. A system of claim 22 for scanning encrypted files for viruses without scanning or transporting the encrypted files themselves.
29. A cloud-based virus and malware scanning system pertaining to claims 1, 12 and 22 for encrypted data.
30. A network-based virus and malware scanning system pertaining to claims 1, 12 and 22 for encrypted data.
31. A device-based virus and malware scanning system pertaining to claims 1, 12 and 22 for encrypted data.
32. A method or system pertaining to claims 1, 12 and 22 implemented as an embedded integrated circuit.
33. A pre-screening method for encrypted files to ensure no manipulation (integrity), based on operations on the data.
34. A pre-screening system for encrypted files to ensure no manipulation (integrity), based on operations on the data.
35. An audit method for encrypted files to ensure no manipulation (integrity), based on operations on the files' encrypted indices.
36. An audit system for encrypted files to ensure no manipulation (integrity), based on operations on the files' encrypted indices.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA2935130A CA2935130A1 (en) 2016-07-26 2016-07-26 Encrypted data - computer virus, malware and ransom ware detection system

Publications (1)

Publication Number Publication Date
CA2935130A1 (en) 2018-01-26

Family

ID=61008613

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2935130A Abandoned CA2935130A1 (en) 2016-07-26 2016-07-26 Encrypted data - computer virus, malware and ransom ware detection system

Country Status (1)

Country Link
CA (1) CA2935130A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3623981A1 (en) * 2018-09-12 2020-03-18 British Telecommunications public limited company Index based ransomware categorisation
CN111417121A (en) * 2020-02-17 2020-07-14 西安电子科技大学 Multi-malware hybrid detection method, system and device with privacy protection function
CN111522778A (en) * 2020-04-27 2020-08-11 广州大学 File migration method
EP3937419A1 (en) * 2020-07-07 2022-01-12 Samsung Electronics Co., Ltd. Electronic device using homomorphic encryption and encrypted data processing method thereof
US11270016B2 (en) 2018-09-12 2022-03-08 British Telecommunications Public Limited Company Ransomware encryption algorithm determination
US11449612B2 (en) 2018-09-12 2022-09-20 British Telecommunications Public Limited Company Ransomware remediation
US11677757B2 (en) 2017-03-28 2023-06-13 British Telecommunications Public Limited Company Initialization vector identification for encrypted malware traffic detection
CN116992447A (en) * 2023-09-21 2023-11-03 北京安天网络安全技术有限公司 Malicious file detection method, electronic equipment and storage medium
US11824967B2 (en) 2020-07-07 2023-11-21 Samsung Electronics Co., Ltd. Electronic device using homomorphic encryption and encrypted data processing method thereof

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11677757B2 (en) 2017-03-28 2023-06-13 British Telecommunications Public Limited Company Initialization vector identification for encrypted malware traffic detection
EP3623981A1 (en) * 2018-09-12 2020-03-18 British Telecommunications public limited company Index based ransomware categorisation
US11270016B2 (en) 2018-09-12 2022-03-08 British Telecommunications Public Limited Company Ransomware encryption algorithm determination
US11449612B2 (en) 2018-09-12 2022-09-20 British Telecommunications Public Limited Company Ransomware remediation
CN111417121A (en) * 2020-02-17 2020-07-14 西安电子科技大学 Multi-malware hybrid detection method, system and device with privacy protection function
CN111417121B (en) * 2020-02-17 2022-04-12 西安电子科技大学 Multi-malware hybrid detection method, system and device with privacy protection function
CN111522778A (en) * 2020-04-27 2020-08-11 广州大学 File migration method
CN111522778B (en) * 2020-04-27 2022-04-19 广州大学 File migration method
EP3937419A1 (en) * 2020-07-07 2022-01-12 Samsung Electronics Co., Ltd. Electronic device using homomorphic encryption and encrypted data processing method thereof
US11824967B2 (en) 2020-07-07 2023-11-21 Samsung Electronics Co., Ltd. Electronic device using homomorphic encryption and encrypted data processing method thereof
CN116992447A (en) * 2023-09-21 2023-11-03 北京安天网络安全技术有限公司 Malicious file detection method, electronic equipment and storage medium
CN116992447B (en) * 2023-09-21 2023-12-15 北京安天网络安全技术有限公司 Malicious file detection method, electronic equipment and storage medium

Legal Events

Date Code Title Description
FZDE Dead

Effective date: 20190726