CN111417121B - Multi-malware hybrid detection method, system and device with privacy protection function - Google Patents


Info

Publication number
CN111417121B
Authority
CN
China
Prior art keywords
software
detection
data
client
malicious
Legal status
Active
Application number
CN202010097900.8A
Other languages
Chinese (zh)
Other versions
CN111417121A (en)
Inventor
王静雯 (Wang Jingwen)
闫峥 (Yan Zheng)
于熙洵 (Yu Xixun)
彭立 (Peng Li)
魏文涛 (Wei Wentao)
Current Assignee
Xidian University
Original Assignee
Xidian University
Application filed by Xidian University
Priority to CN202010097900.8A
Publication of CN111417121A
Application granted
Publication of CN111417121B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/02 Protecting privacy or anonymity, e.g. protecting personally identifiable information [PII]
    • H04W12/04 Key management, e.g. using generic bootstrapping architecture [GBA]
    • H04W12/041 Key generation or derivation
    • H04W12/12 Detection or prevention of fraud
    • H04W12/128 Anti-malware arrangements, e.g. protection against SMS fraud or mobile malware
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Bioethics (AREA)
  • Virology (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention belongs to the technical field of malware detection and discloses a multi-malware hybrid detection method, system and device with privacy protection. A third party generates a public/private key pair with a homomorphic encryption algorithm and publishes the public key. Each client collects behavior data on how its user group uses software, performs a preliminary calculation, adds a self-generated random number, encrypts the result with the third party's public key, and uploads it to the server. The server applies a reputation evaluation algorithm to the encrypted data uploaded by the user group, performs interactive decryption with the third party by exploiting the homomorphic property, computes the reputation values of the different software, and determines the software detection order from these values. During detection, the server follows this order and retrieves from the client the API usage frequency data obtained by decompiling each software APK, then performs static detection with a static learning model. If the static result is non-malicious, the server uses the encrypted system-call data collected by the client, together with the client's public key, to perform real-time detection with the homomorphic property and a dynamic learning model.

Description

Multi-malware hybrid detection method, system and device with privacy protection function
Technical Field
The invention belongs to the technical field of malware detection, and particularly relates to a multi-malware hybrid detection method, system and device with privacy protection.
Background
Closest prior art: mobile malware is application software that runs on a mobile terminal with communication capability, such as a smartphone, and exhibits malicious behaviors such as eavesdropping on calls, stealing user information, destroying user data, using payment services without permission, sending spam, pushing advertisements or fraudulent information, disrupting the operation of the terminal, and undermining Internet security. With the widespread adoption of smartphones and the rapid growth of the mobile application industry, mobile malware has flooded application markets. It often introduces security holes into mobile devices, causing users economic loss, privacy leakage and other problems. Establishing mechanisms that protect users by discovering and blocking malware has therefore become a focus for many security researchers. Because Android is currently the most widely used mobile system, this work focuses on malware detection for Android. In practice, software differs in popularity, so once malicious behavior occurs, the scope of impact and the resulting harm differ as well. An optimized scheme should therefore introduce a privacy-preserving mechanism for evaluating the influence of software and detect widely used software first, so that threats are discovered and defended against early and serious damage is reduced. Moreover, malware detection is computationally expensive and is therefore often provided through a cloud server, yet the data users upload to that server may reveal private information, which raises privacy concerns.
Android malware detection schemes fall into three categories: static detection, dynamic detection and hybrid detection. Existing schemes are briefly reviewed below.
(1) Static detection identifies malware by searching for malicious features and malicious code segments without executing the application. Existing schemes usually decompile the application's APK file to obtain information such as permissions and code, and search it for malicious features. Sujithra and Padmavathi [1] use permission information together with classification and optimization methods from machine learning to detect Android malware. They decompile the APK file, extract permission-related information from the configuration file as features, perform feature selection, and train a machine-learning classifier that separates normal software from malware, which is then used for detection. However, if the APK file was produced with obfuscation, the method may fail to obtain correct configuration information and therefore fail to detect the malware. Kapse et al. [2] decompile the application to obtain configuration-file data related to permissions, components and API calls. Weights are assigned according to the malicious behaviors of malware, with the largest weights given to the permissions and APIs most common in malware. A weight threshold is determined by analyzing malware and normal software, and software is classified as malicious or not against that threshold. This scheme resists obfuscation attacks but cannot detect application behavior defined at runtime.
Arp et al. [3] perform static analysis using a broad set of features taken from configuration files and code. The APK file is decompiled to obtain permissions, API calls, network addresses and other information from the configuration file and code; this information is mapped into a vector space, and a machine-learning model is trained for later detection. However, this approach cannot detect malware that uses obfuscation or dynamic code loading. An API-level static detection method decompiles the APK files of malicious and normal samples, extracts the APIs, packages and parameters in the code as features, and trains a KNN classifier to decide whether software is malicious; because it relies on KNN, its computational cost is high. Wu et al. [5] designed DroidMat, a static detection system that decompiles the APK file to extract requested permissions, intents and other information from the manifest, obtains the API calls of each component, models them with the K-means algorithm, and finally classifies the software with KNN. This approach likewise cannot detect dynamically loaded malicious code. In summary, static detection is simple and efficient at the data-extraction stage, but it is easily deceived by obfuscation and cannot detect application behavior defined at runtime.
(2) Dynamic detection executes the application in an isolated environment, captures its dynamic behavior, and detects malware from that behavior. Burguera et al. [6] analyze dynamic software behavior by crowdsourcing system calls from many real users to a central server. The Linux system calls of an application are collected and sent to the central server, which detects malicious software with a machine-learning clustering algorithm. However, the system calls collected from users reveal how they use the software; this is behavioral privacy information, and the scheme does not protect it. Shabtai et al. [7] propose a behavior-based Android malware detection system that continuously monitors device state such as power consumption and CPU usage, and then uses machine-learning algorithms to distinguish normal software from malware. The work remains theoretical, however, and was not evaluated on a real data set. Zhao et al. [8] designed AntiMalDroid, an SVM-based detection framework with a training phase and a detection phase. In the training phase, software samples whose labels are known are executed, their behaviors and characteristics are monitored and used as features, and an SVM detection model is trained. In the detection phase, the trained model is applied to the software under test. The detection process of this scheme is time-consuming.
Dini et al. [9] designed a multi-layer Android malware detector that monitors the system at both the kernel and user layers and uses machine learning to distinguish normal from malicious behavior. At the kernel layer, it evaluates system calls, running processes, CPU usage and similar information; at the user layer, it evaluates keystrokes, dialed numbers, SMS messages, and Bluetooth and Wi-Fi traffic sent and received. The process is complex, however, and unsuitable for real-time detection in real scenarios. In summary, dynamic detection is more accurate than static detection but consumes a large amount of computing resources.
(3) Hybrid detection combines static and dynamic detection to balance their respective strengths and weaknesses, using data such as opcodes, text information, system calls and administrator privileges. Martinelli et al. [10] propose a hybrid detection architecture. The software under test is first decompiled to obtain its opcodes, and an SVM classifier divides applications into normal and malicious. For software classified as normal, dynamic detection collects its text information, system calls and administrator privileges as features, and a classifier together with security policies decides whether the software is malicious. However, the scheme does not rank software by influence before detection so that high-influence software is checked first, which would minimize the damage caused by malicious behavior. Moreover, the system calls used in real-time detection reveal how users use the application, which is privacy-sensitive, so privacy protection must also be considered. Yuan et al. [11] extract features by static and dynamic analysis respectively and detect malware with deep learning. Permissions and sensitive APIs are extracted by decompiling the application's APK file, and dynamic behavior is extracted in a sandbox, a secure isolated execution environment; deep learning then serves as the detection algorithm.
This scheme has no real-time detection stage, and deep learning is comparatively complex. Bläsing et al. [12] proposed AASandbox, which tests software with a combination of static and dynamic analysis. The static part analyzes the decompiled dex file; the dynamic part executes the application in an isolated sandbox and analyzes its interactions with the underlying system. However, this solution neither detects the application at real runtime nor considers detection priority or privacy protection. In summary, existing solutions neither order software for detection according to its influence nor address privacy protection during detection.
In summary, the prior art has the following problems:
(1) It lacks a scheme that effectively detects whether an application is malicious both before installation and use and while it is running in real time.
(2) It does not consider detection priority: popular, widely used software should be checked first to prevent damage caused by malicious behavior.
(3) It does not protect user privacy when malware detection is performed on a cloud server.
The difficulty of solving these technical problems is:
how to design an effective scheme that detects an application both before installation and during real-time operation, how to determine a detection priority order that reduces the damage caused by malicious applications, and how to protect user privacy throughout the detection process.
The significance of solving these technical problems is:
(1) Detecting applications both before installation and during real-time operation ensures complete and effective detection and prevents damage caused by malicious applications.
(2) Protecting users' private information during detection prevents their privacy from being violated.
Disclosure of Invention
To address these problems in the prior art, the invention provides a multi-malware hybrid detection method, system and device with privacy protection.
The invention is implemented as a multi-malware hybrid detection method with privacy protection, comprising the following steps:
Firstly, a third party generates a public/private key pair with a homomorphic encryption key-generation algorithm and publishes the public key to all clients and the server;
Secondly, each client collects behavior data on how its user group uses different software, performs a simple preliminary calculation, adds a self-generated random number, encrypts the result with the third party's public key, and uploads it to the server;
Thirdly, using a reputation evaluation algorithm, the server computes the reputation values of the different software from the encrypted data uploaded by the user group, through interactive decryption with the third party based on the homomorphic addition property, without obtaining any client's private data; it then sorts the software by reputation value to determine the detection order;
Fourthly, during detection, the server follows this order and interacts with each client to retrieve the API usage frequency data obtained by decompiling the software APK, and performs static detection on each software with a static learning model. For software whose static result is non-malicious, it performs real-time detection with the homomorphic addition property and a dynamic learning model, using the encrypted system-call data collected by the client and the public key that the client generates with the homomorphic encryption key-generation algorithm and publishes.
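The four steps above fix a detection order: software is checked in descending order of reputation value, statically first, and in real time only when the static verdict is non-malicious. The sketch below illustrates only that control flow. The functions `static_is_malicious` and `realtime_is_malicious` are hypothetical stand-ins for the static and dynamic learning models, and the reputation values are made-up inputs rather than outputs of the reputation algorithm.

```python
from typing import Callable, Dict, List, Tuple


def detection_order(reputation: Dict[str, float]) -> List[str]:
    """Return software IDs sorted by reputation value, highest (most influential) first."""
    return sorted(reputation, key=lambda app: reputation[app], reverse=True)


def hybrid_detect(
    reputation: Dict[str, float],
    static_is_malicious: Callable[[str], bool],    # hypothetical static model
    realtime_is_malicious: Callable[[str], bool],  # hypothetical dynamic model
) -> List[Tuple[str, str]]:
    """Check apps in priority order; run real-time detection only if static detection says benign."""
    verdicts = []
    for app in detection_order(reputation):
        if static_is_malicious(app):
            verdicts.append((app, "malicious (static)"))
        elif realtime_is_malicious(app):
            verdicts.append((app, "malicious (real-time)"))
        else:
            verdicts.append((app, "non-malicious"))
    return verdicts


if __name__ == "__main__":
    reputation = {"app.a": 0.31, "app.b": 0.87, "app.c": 0.55}  # hypothetical values
    print(hybrid_detect(reputation,
                        static_is_malicious=lambda app: app == "app.c",
                        realtime_is_malicious=lambda app: app == "app.a"))
    # prints [('app.b', 'non-malicious'), ('app.c', 'malicious (static)'), ('app.a', 'malicious (real-time)')]
```

Sorting once up front keeps the scheduling separate from the detectors, so either model can be swapped without touching the ordering logic.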
Further, the reputation evaluation of the method quantifies the popularity of software from how users use it and expresses that popularity as a reputation value; a semi-trusted third party is introduced, and privacy is protected by homomorphic encryption.
The reputation evaluation further comprises:
Firstly, the third party generates a public/private key pair (PK_p, SK_p) with the homomorphic encryption key-generation algorithm KeyGen and publishes the public key PK_p to all clients and the server;
Secondly, each client collects the number of uses, the usage duration and the usage frequency of the software within a given time window and, using the formulas below, preliminarily computes its recommendation reputation value s_k for the software, the aggregated reputation value [Formula image BDA0002385835880000041], and their product [Formula image BDA0002385835880000042];
Thirdly, the client encrypts the data with the third party's public key PK_p and a self-generated random number r_k, obtaining HE(s_k + r_k) and [Formula image BDA0002385835880000051], and sends the encrypted data together with r_k to the server;
Fourthly, after receiving the data from all clients, the server sums all the random numbers to obtain [Formula image BDA0002385835880000052], uses the homomorphic property to compute [Formula image BDA0002385835880000053] and [Formula image BDA0002385835880000054], and sends them to the third party for decryption;
Fifthly, the third party decrypts the received ciphertexts with its private key SK_p, obtaining [Formula image BDA0002385835880000055] and [Formula image BDA0002385835880000056], and sends them to the server;
Sixthly, after receiving the data from the third party, the server subtracts the sum of random numbers it already knows, [Formula image BDA0002385835880000057], obtaining [Formula image BDA0002385835880000058] and [Formula image BDA0002385835880000059]. It then computes the reputation value R(i) of the application with the following formula, thereby obtaining the reputation value of the software without learning the data sent by any individual client:
[Formula image BDA00023858358800000510]
where [Formula image BDA00023858358800000511], and S_k is computed as follows:
[Formula image BDA00023858358800000512]
where [Formula image BDA00023858358800000513]; γ is initialized to 0 and incremented by 1 (γ = γ + 1) whenever y < 0; the parameters are thr = 3, δ = 0.05 and μ = 0.1.
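The masking steps above (clients add random numbers, the server multiplies ciphertexts and subtracts the summed masks, the third party decrypts only masked sums) can be sketched as follows. This is a minimal sketch under several assumptions: the text names only "a homomorphic encryption algorithm", so textbook Paillier (additively homomorphic) is swapped in with tiny, insecure demo primes; inputs are already-scaled integers; the product term gets its own mask `r2_k`, which the text does not specify; and because the formula for R(i) is not reproduced, the sketch stops at the two recovered sums. All names are hypothetical.

```python
import math
import secrets
from typing import List, Tuple


class ToyPaillier:
    """Textbook Paillier with g = n + 1. Demo-sized Mersenne primes: NOT secure."""

    def __init__(self, p: int = 2**31 - 1, q: int = 2**61 - 1):
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        self.mu = pow(self.lam, -1, self.n)  # valid because g = n + 1

    def encrypt(self, m: int) -> int:  # public-key operation
        r = secrets.randbelow(self.n - 1) + 1
        while math.gcd(r, self.n) != 1:
            r = secrets.randbelow(self.n - 1) + 1
        return pow(self.n + 1, m % self.n, self.n2) * pow(r, self.n, self.n2) % self.n2

    def add(self, c1: int, c2: int) -> int:  # HE(a) * HE(b) = HE(a + b)
        return c1 * c2 % self.n2

    def decrypt(self, c: int) -> int:  # private-key operation (third party only)
        m = (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n
        return m - self.n if m > self.n // 2 else m  # map back to signed integers


Upload = Tuple[int, int, int, int]


def client_upload(he: ToyPaillier, s_k: int, t_k: int) -> Upload:
    """Client k: mask s_k and s_k*t_k, encrypt them; the masks go to the server in the clear."""
    r_k, r2_k = secrets.randbelow(10**9), secrets.randbelow(10**9)
    return he.encrypt(s_k + r_k), he.encrypt(s_k * t_k + r2_k), r_k, r2_k


def server_aggregate(he: ToyPaillier, uploads: List[Upload]) -> Upload:
    """Server: multiply ciphertexts (adds plaintexts) and sum the masks; it never decrypts."""
    c_s, c_st = he.encrypt(0), he.encrypt(0)
    mask_s = mask_st = 0
    for cs, cst, r, r2 in uploads:
        c_s, c_st = he.add(c_s, cs), he.add(c_st, cst)
        mask_s += r
        mask_st += r2
    return c_s, c_st, mask_s, mask_st


def reputation_sums(clients: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Recover (sum of s_k, sum of s_k*t_k) without revealing any single client's values."""
    he = ToyPaillier()  # in the protocol only the third party holds the private key
    uploads = [client_upload(he, s, t) for s, t in clients]
    c_s, c_st, mask_s, mask_st = server_aggregate(he, uploads)
    masked_s, masked_st = he.decrypt(c_s), he.decrypt(c_st)  # third party sees masked sums only
    return masked_s - mask_s, masked_st - mask_st  # server removes the masks it knows


if __name__ == "__main__":
    # Hypothetical scaled (s_k, t_k) pairs from three clients.
    print(reputation_sums([(800, 650), (300, 900), (500, 120)]))  # prints (1600, 850000)
```

The two sums are the kind of aggregates a weighted reputation formula would consume; how R(i) combines them is given only in the formula images.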
Further, the second step comprises:
1) Within a given time window, the number of uses N_i(t), the usage duration UT_i(t) and the usage frequency FE_i(t) of the software are monitored and used to quantify the usage behavior UB, reflection behavior RB and association behavior CB of the software, as follows:
a) UB is quantified as the UB component of the user's personal trust value for software i at time t:
[Formula image BDA00023858358800000514]
b) RB is quantified as the RB component of the user's personal trust value for software i at time t:
$T_i(t)^{RB} = 2\left(d_t\{N_i(t) + UT_i(t) + FE_i(t)\}\right)$;
where [Formula image BDA0002385835880000061]
c) CB is quantified as the CB component of the user's personal trust value for software i at time t:
[Formula image BDA0002385835880000062]
[Formula image BDA0002385835880000063]
where [Formula image BDA0002385835880000064]
2) The trust value T_i(t) of the user for software i is computed from the quantized values of UB, RB and CB:
[Formula image BDA0002385835880000065]
where [Formula image BDA0002385835880000066]
3) The reputation value R(i) of software i depends on the recommended trust values S_k given to software i by the users of the software and on the aggregated reputation value [Formula image BDA0002385835880000067] computed from the users' usage experience:
[Formula image BDA0002385835880000068]
where [Formula image BDA0002385835880000069]
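The UB, RB and CB formulas survive only as images, but their raw inputs, the count N_i(t), duration UT_i(t) and frequency FE_i(t) observed in a time window, are described in the text. The sketch below computes those inputs from a usage log. Assumptions (not from the patent): usage is logged as hypothetical `(start, end)` sessions in seconds, a session counts if it starts inside the window, its duration is clipped at the window end, and FE_i(t) is taken to mean sessions per hour.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UsageStats:
    count: int        # N_i(t): sessions started in the window
    duration: float   # UT_i(t): seconds of use inside the window
    frequency: float  # FE_i(t): sessions per hour (assumed definition)


def usage_stats(sessions: List[Tuple[float, float]],
                window_start: float, window_end: float) -> UsageStats:
    """Summarize (start, end) usage sessions of one app over one time window."""
    in_window = [(s, e) for s, e in sessions if window_start <= s < window_end]
    count = len(in_window)
    duration = sum(min(e, window_end) - s for s, e in in_window)  # clip at window end
    hours = (window_end - window_start) / 3600
    return UsageStats(count, duration, count / hours if hours > 0 else 0.0)


if __name__ == "__main__":
    log = [(0, 600), (3600, 4200), (7000, 8000)]  # hypothetical sessions, seconds
    print(usage_stats(log, 0, 7200))
    # prints UsageStats(count=3, duration=1400, frequency=1.5)
```

Computing these values on the client keeps the raw session log on the device; only the derived reputation quantities enter the encrypted protocol.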
Further, the static detection of the method is performed after an application is downloaded but before it is installed and used: the APK file is decompiled to obtain the dangerous-level permissions and the corresponding API information as features, and malware detection is completed with a machine-learning method, as follows:
Firstly, the server decompiles the APKs of existing normal software and known malware, obtains the occurrence frequency of the system APIs corresponding to the dangerous permissions used, takes these frequencies as features, and runs a supervised machine-learning algorithm to train a classifier for static detection;
Secondly, the client decompiles the downloaded APK, counts the occurrences of the APIs corresponding to dangerous-level permissions in the file, and uploads the counts to the server for detection;
Thirdly, after receiving the data from the client, the server classifies the software with the offline-trained classifier, determines whether it is malicious, and returns the result to the client.
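The static pipeline above (count dangerous-permission API calls in the decompiled APK, then classify the count vector with a supervised model) can be sketched as below. Assumptions: the four-entry `DANGEROUS_APIS` list is a hypothetical, illustrative feature set (these are real Android methods guarded by the noted permissions, but the patent does not list its features); decompiled code is assumed to be smali text; and because the text says only "a supervised learning algorithm", a simple nearest-centroid classifier is swapped in for illustration.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical feature set: Android APIs guarded by dangerous permissions.
DANGEROUS_APIS = [
    "Landroid/telephony/SmsManager;->sendTextMessage",           # SEND_SMS
    "Landroid/telephony/TelephonyManager;->getDeviceId",         # READ_PHONE_STATE
    "Landroid/location/LocationManager;->getLastKnownLocation",  # location permissions
    "Landroid/media/AudioRecord;->startRecording",               # RECORD_AUDIO
]


def api_features(smali_text: str) -> List[int]:
    """Client side: count occurrences of each dangerous API in decompiled smali code."""
    return [smali_text.count(api) for api in DANGEROUS_APIS]


class NearestCentroid:
    """Stand-in supervised classifier: predict the label whose class mean is closest."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "NearestCentroid":
        self.centroids: Dict[str, List[float]] = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x: Sequence[float]) -> str:
        return min(self.centroids, key=lambda lab: math.dist(x, self.centroids[lab]))


if __name__ == "__main__":
    # Hypothetical offline training counts (columns follow DANGEROUS_APIS order).
    X = [[0, 0, 0, 0], [1, 0, 0, 0], [5, 3, 2, 1], [6, 4, 1, 2]]
    y = ["benign", "benign", "malicious", "malicious"]
    model = NearestCentroid().fit(X, y)  # server side, offline
    uploaded = api_features(
        "invoke-virtual {v0}, Landroid/telephony/SmsManager;->sendTextMessage(...)V\n" * 4
        + "invoke-virtual {v1}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;\n" * 3
    )
    print(uploaded, model.predict(uploaded))  # server side, per uploaded vector
```

Only the count vector leaves the device, which matches the split in the text: decompilation and counting on the client, classification on the server.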
Further, the real-time detection of the method uses system-call sequence data collected while the software runs, and reports any malicious behavior to the client as soon as it is found. Because the collected system-call information reveals the user's behavioral privacy in using the software, the detection proceeds as follows:
Firstly, a preparation stage consisting of two parts, model training on the server and key generation on the client;
Offline training: the server runs the existing samples of normal and malicious software in simulation and obtains their system-call sequences, selects a feature set from these sequences with a feature-selection algorithm, and represents each sample as a feature vector over that set. It then trains a model with the SVM algorithm, obtaining the values ω and b of the decision function used for real-time detection; the decision function is:
[Formula image BDA0002385835880000071]
Key generation: the client generates a public/private key pair (PK_p, SK_p) with the homomorphic encryption key-generation algorithm KeyGen and publishes the public key to the server;
Secondly, the client obtains the system-call sequence produced while the software is used within a given time window, counts the occurrences x_i (i = 1, …, n) of each feature in the feature set, encrypts each feature value x_i with the public key PK_p to obtain the encrypted feature vector [HE(x_1), HE(x_2), ..., HE(x_n)], and sends it to the server;
Thirdly, after receiving the encrypted data from the client, the server encrypts the value b obtained in offline training with the public key PK_p to obtain HE(b), and computes HE(ω·x + b) using the homomorphic property:
[Formula image BDA0002385835880000072]
It then sends HE(ω·x + b) to the client;
Fourthly, the client decrypts the data with its private key SK_p to obtain ω·x + b, and determines from the decision function whether the software is malicious.
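The four real-time steps above amount to evaluating a linear SVM on an encrypted feature vector: the server combines HE(b) with each HE(x_i) raised to the weight ω_i, and only the client can decrypt the score. The sketch below shows one way this can work. Assumptions: textbook Paillier with tiny, insecure demo primes stands in for the unspecified homomorphic scheme; the features are hypothetical system-call bigrams; real-valued ω and b are scaled by a hypothetical `SCALE` and rounded to integers; and a positive score is taken to mean malicious. The weights are made up, not trained.

```python
import math
import secrets
from collections import Counter
from typing import List, Sequence, Tuple


class ToyPaillier:
    """Textbook Paillier with g = n + 1. Demo-sized Mersenne primes: NOT secure."""

    def __init__(self, p: int = 2**31 - 1, q: int = 2**61 - 1):
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        self.mu = pow(self.lam, -1, self.n)

    def encrypt(self, m: int) -> int:
        r = secrets.randbelow(self.n - 1) + 1
        while math.gcd(r, self.n) != 1:
            r = secrets.randbelow(self.n - 1) + 1
        return pow(self.n + 1, m % self.n, self.n2) * pow(r, self.n, self.n2) % self.n2

    def add(self, c1: int, c2: int) -> int:      # HE(a) * HE(b) = HE(a + b)
        return c1 * c2 % self.n2

    def mul_const(self, c: int, k: int) -> int:  # HE(a) ** k = HE(k * a), k may be negative
        return pow(c, k % self.n, self.n2)

    def decrypt(self, c: int) -> int:
        m = (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n
        return m - self.n if m > self.n // 2 else m


SCALE = 1000  # hypothetical fixed-point scale for the real-valued SVM parameters


def feature_counts(syscalls: Sequence[str], feature_set: Sequence[Tuple[str, str]]) -> List[int]:
    """Client: x_i = occurrences of each feature (assumed here to be system-call bigrams)."""
    bigrams = Counter(zip(syscalls, syscalls[1:]))
    return [bigrams[f] for f in feature_set]


def server_score(he: ToyPaillier, enc_x: List[int], w: Sequence[float], b: float) -> int:
    """Server: compute HE(SCALE * (w.x + b)) using only public-key operations."""
    c = he.encrypt(round(b * SCALE))
    for c_i, w_i in zip(enc_x, w):
        c = he.add(c, he.mul_const(c_i, round(w_i * SCALE)))
    return c


def realtime_detect(syscalls: Sequence[str], feature_set: Sequence[Tuple[str, str]],
                    w: Sequence[float], b: float) -> Tuple[int, bool]:
    he = ToyPaillier()  # client's key pair (PK_p, SK_p)
    enc_x = [he.encrypt(x) for x in feature_counts(syscalls, feature_set)]  # client side
    score = he.decrypt(server_score(he, enc_x, w, b))  # client decrypts the scaled score
    return score, score > 0  # assumed convention: positive side = malicious


if __name__ == "__main__":
    features = [("open", "read"), ("read", "sendto"), ("socket", "connect")]  # hypothetical
    w, b = [-0.5, 1.2, 0.8], -1.0  # hypothetical trained parameters
    trace = ["open", "read", "sendto", "socket", "connect", "read", "sendto"]
    print(realtime_detect(trace, features, w, b))  # prints (1700, True)
```

Here the server never sees the feature counts, and the client learns only the score, not the individual weights.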
Another object of the invention is to provide a program storage medium for receiving user input, the stored computer program causing an electronic device to execute the following steps:
Firstly, the client collects behavior data on how the user group uses different software, performs a simple preliminary calculation, and uploads the result to the server;
Secondly, the server computes the reputation values of the different software from the uploaded data with a reputation evaluation algorithm, sorts the software by reputation value, and determines the detection order;
Thirdly, during detection, the server follows this order and interacts with the client to retrieve the API usage-count data obtained by decompiling each software APK, and performs static detection on each software with a static learning model. For software whose static result is non-malicious, the server retrieves the related system-call data collected by the client and performs real-time detection with a dynamic learning model.
Another object of the invention is to provide a computer program product stored on a computer-readable medium, comprising a computer-readable program that provides a user input interface and implements the multi-malware hybrid detection method when executed on an electronic device.
Another object of the present invention is to provide a multi-malware hybrid detection system implementing the multi-malware hybrid detection method, including:
the reputation evaluation module, used to quantify the popularity and influence of software, determine the detection order, and detect software with high reputation values first;
the static detection module, used to decompile the APK after the software is downloaded and examine it with a static detection method; if the software is malicious, the user is advised not to install it and the software is deleted directly, while the remaining software, detected as normal, is installed;
the real-time detection module, used to monitor the system call sequence of the software within a specified time window while the user uses it, collect the data, and perform real-time detection; if malware is detected, the result is fed back to the user.
Another object of the present invention is to provide a multi-malware hybrid detection apparatus equipped with the multi-malware hybrid detection system, the multi-malware hybrid detection apparatus including:
the client, installed on every user device, responsible for collecting each device's data for priority evaluation and malware detection and sending it to the server;
the server, which uses the priority-evaluation data uploaded by the clients to calculate the reputation value of each piece of software and evaluate its priority, judges whether software is malicious from the uploaded detection data, and returns the result to the client;
the third-party module, used to assist the interactive computation between the client and the server and to protect the client's privacy during priority evaluation and malware detection.
In summary, the advantages and positive effects of the invention are as follows. The prior art lacks a scheme that can effectively detect, in a real application scenario, whether software is malicious both before installation and use and while it is running. Existing malware detection schemes fall broadly into three categories: static detection, dynamic detection, and hybrid detection. Static detection has the advantage that detection data are convenient and fast to acquire, but it is vulnerable to obfuscation, and it cannot detect behavior that is determined only at runtime. Dynamic detection can make up for these shortcomings; however, it generally runs the program in an isolated environment to acquire data, so it is not real-time detection in the user's actual usage scenario. Hybrid detection combines static and dynamic detection and balances their strengths and weaknesses, but still does not support real-time detection in a real scenario. The invention therefore provides a scheme that effectively performs both static detection and real-time detection in real application scenarios.
Existing malware detection schemes do not take into account differences in software priority in real scenarios. In practice, software differs in influence: highly influential software is popular, downloaded often, and used frequently, so if it behaves maliciously, the damage to users is large. Malware detection should therefore check highly influential software first, to minimize the damage to users.
Existing malware detection schemes do not consider the privacy of user data. In practice, users rely on cloud services or other third-party services to aggregate and evaluate how malicious various kinds of software are and to outsource malware detection. However, when interacting with such services, users upload data that implicitly carries their private information, which may lead to privacy violations. The invention therefore provides a privacy-preserving malware detection scheme that addresses users' concerns and prevents their privacy from being violated.
The invention provides a detection system that detects malicious applications both before installation and use and during real-time operation. A privacy-preserving scheme for ranking software detection priority optimizes the overall detection process for multiple malware: the influence of each piece of software is evaluated before detection, and software with strong influence is detected first. A privacy protection mechanism is added to protect data that may reveal user privacy during priority ranking and during real-time malware detection.
Compared with the prior art, the invention has the following advantages:
(1) More comprehensive detection: before installation, some malware can be found by decompiling and examining the APK file, but some applications may dynamically load code at runtime to carry out malicious behavior. To address this, the invention also performs real-time detection on the system call sequence collected while the software runs, which makes detection more comprehensive.
(2) Reduced damage: detection takes time. The invention evaluates the influence of software before detection and detects highly influential software first, so that if such an application is problematic, it is found and stopped early. This ordering reduces the damage caused by malicious activity.
(3) Privacy protection: with the continuing development of data mining and related technologies, leaked data that implicitly carries user privacy can easily lead to privacy violations. The invention introduces a privacy-preserving algorithm to protect user data and user privacy.
(4) Effectiveness: both the detection based on system API data obtained by decompiling the APK and the detection based on collected system call sequences are effective malware detection schemes.
(5) Flexibility: the detection system can detect software both before installation and during use, reducing the risk that users run malware. Meanwhile, pre-installation detection filters out part of the malware, which reduces the number of programs that need real-time detection and thus the runtime overhead of malware detection.
Table 3 Comparative analysis of existing work and the present invention (table image not reproduced)
Note: × indicates not mentioned, and √ indicates addressed.
[1] M. Sujithra and G. Padmavathi, “Enhanced Permission Based Malware Detection in Mobile Devices Using Optimized Random Forest Classifier with PSO-GA,” Research Journal of Applied Sciences, Engineering and Technology, vol. 12, no. 7, pp. 732-741, 2016.
[2] G. Kapse and A. Gupta, “Detection of Malware on Android based on Application Features,” International Journal of Computer Science and Information Technologies, vol. 6, no. 4, pp. 3561-3564, 2015.
[3] D. Arp, M. Spreitzenbarth, M. Hubner, H. Gascon, and K. Rieck, “Drebin: Effective and explainable detection of android malware in your pocket,” in Network and Distributed System Security Symposium, 2014, vol. 14, pp. 23-26.
[4] Y. Aafer, W. Du, and H. Yin, “DroidAPIMiner: Mining API-level features for robust malware detection in android,” in Security and Privacy in Communication Networks - 9th International ICST Conference, SecureComm 2013, Revised Selected Papers, 2013, vol. 127, pp. 86-103.
[5] D. Wu, C. Mao, T. Wei, H. Lee, and K. Wu, “DroidMat: Android malware detection through manifest and API calls tracing,” in 2012 Seventh Asia Joint Conference on Information Security, Tokyo, 2012, pp. 62-69.
[6] I. Burguera, U. Zurutuza, and S. Nadjm-Tehrani, “Crowdroid: behavior-based malware detection system for Android,” in Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, 2011, pp. 15-26.
[7] A. Shabtai, U. Kanonov, Y. Elovici, C. Glezer, and Y. Weiss, “Andromaly: a behavioral malware detection framework for android devices,” Journal of Intelligent Information Systems, vol. 38, no. 1, pp. 161-190, 2012.
[8] M. Zhao, F. Ge, T. Zhang, and Z. Yuan, “AntiMalDroid: An efficient SVM-based malware detection framework for android,” Communications in Computer and Information Science, vol. 243, pp. 158-166, 2011.
[9] G. Dini, F. Martinelli, A. Saracino, and D. Sgandurra, “MADAM: a multi-level anomaly detector for android malware,” in International Conference on Mathematical Methods, Models and Architectures for Computer Network Security, 2012, pp. 240-253.
[10] F. Martinelli, F. Mercaldo, and A. Saracino, “BRIDEMAID: a hybrid tool for accurate detection of Android malware,” in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, 2017, pp. 899-901.
[11] Z. Yuan, Y. Lu, Z. Wang, and Y. Xue, “Droid-Sec: deep learning in android malware detection,” ACM SIGCOMM Computer Communication Review, vol. 44, no. 4, pp. 371-372, 2014.
[12] T. Bläsing, L. Batyuk, A. D. Schmidt, S. A. Camtepe, and S. Albayrak, “An android application sandbox system for suspicious software detection,” in Proceedings of the 5th IEEE International Conference on Malicious and Unwanted Software, 2010, pp. 55-62.
[13] Z. Yan, P. Zhang, and R. H. Deng, “TruBeRepec: a trust-behavior-based reputation and recommender system for mobile applications,” Personal and Ubiquitous Computing, vol. 16, no. 5, pp. 485-506, 2012.
Drawings
Fig. 1 is a flowchart of a multi-malware hybrid detection method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a multi-malware hybrid detection system according to an embodiment of the present invention.
In the figure: 1. reputation evaluation module; 2. static detection module; 3. real-time detection module.
Fig. 3 is a schematic structural diagram of a multi-malware hybrid detection apparatus according to an embodiment of the present invention.
In the figure: 4. client; 5. server; 6. third-party module.
Fig. 4 is a schematic diagram of a multi-malware hybrid detection system according to an embodiment of the present invention.
Fig. 5 is a flowchart of an implementation of a multi-malware hybrid detection method according to an embodiment of the present invention.
Fig. 6 is a flowchart of the interaction in the reputation evaluation module according to an embodiment of the present invention.
Fig. 7 is a flowchart of the real-time detection interaction according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a method, system, and apparatus for privacy-preserving multi-malware hybrid detection, based on a privacy-enhanced multi-malware hybrid detection system with priority evaluation. The invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, a multi-malware hybrid detection method provided in an embodiment of the present invention includes the following steps:
S101: a third party generates a public/private key pair with a homomorphic encryption key generation algorithm and publishes the public key to all clients and the server;
S102: each client collects behavior data on how a user group uses different software, performs a simple preliminary calculation, masks the results with a self-generated random number, encrypts them with the public key from the third party, and uploads the result to the server;
S103: using a reputation evaluation algorithm, the homomorphic addition property, and interactive decryption with the third party, the server computes the reputation value of each piece of software from the clients' encrypted uploads without obtaining the clients' private information, sorts the software by reputation value, and determines the detection order;
S104: during detection, the server interacts with the client in that order to obtain, for each piece of software, the API usage-frequency data extracted by decompiling its APK, and performs static detection with a static learning model; for software whose static detection result is non-malicious, real-time detection is performed with the homomorphic addition property and a dynamic learning model on the encrypted system-call data collected by the client, encrypted under the public key that the client generated with the homomorphic encryption key generation algorithm and published.
As shown in fig. 2, the multi-malware hybrid detection system provided in the embodiment of the present invention includes:
the reputation evaluation module 1, used to quantify the popularity and influence of software, determine the detection order, and detect software with high reputation values first;
the static detection module 2, used to decompile the APK after the software is downloaded and examine it with a static detection method; if the software is malicious, the user is advised not to install it and the software is deleted directly, while the remaining software, detected as normal, is installed;
the real-time detection module 3, used to monitor the system call sequence of the software within a specified time window while the user uses it, collect the data, and perform real-time detection; if malware is detected, the result is fed back to the user.
As shown in Fig. 3, the multi-malware hybrid detection apparatus provided by the embodiment of the present invention includes:
the client 4, installed on every user device, responsible for collecting each device's data for priority evaluation and malware detection and sending it to the server;
the server 5, which uses the priority-evaluation data uploaded by the clients to calculate the reputation value of each piece of software and evaluate its priority, judges whether software is malicious from the uploaded detection data, and returns the result to the client;
the third-party module 6, used to assist the interactive computation between the client and the server and to protect the client's privacy during priority evaluation and malware detection.
The technical solution of the present invention is further described below with reference to the accompanying drawings.
Table 1 Abbreviations (table image not reproduced)
Table 2 Symbols and definitions (table image not reproduced)
1. Background knowledge
The invention draws on reputation evaluation, supervised learning, and homomorphic encryption, which are briefly introduced here.
1.1 Reputation evaluation
The invention uses the reputation evaluation method of [13] to quantify the influence of software. That method defines reputation as the degree to which the public believes an application can complete a task as expected, and reputation evaluation as the quantification of reputation from attributes that affect the reputation value. It quantifies the reputation value of software from how the public uses it: the higher the reputation value, the more the software is used by the public and the stronger its influence. The method is described below.
1) By monitoring the number of times $N_i(t)$ a user uses software i within a given time window, the usage duration $UT_i(t)$, and the usage frequency $FE_i(t)$, the usage behavior UB, reflection behavior RB, and association behavior CB of the software are quantified as follows:
a) UB is quantified as the UB component of the user's personal trust value for software i at time t:
[formula image not reproduced]
b) RB is quantified as the RB component of the user's personal trust value for software i at time t:
$T_i(t)^{RB} = 2(dt\{N_i(t) + UT_i(t) + FE_i(t)\})$ (2)
where
[formula image not reproduced]
c) CB is quantified as the CB component of the user's personal trust value for software i at time t:
[formula image not reproduced]
where
[formula image not reproduced]
2) From the quantified UB, RB, and CB values, the user's trust value $T_i(t)$ for software i is calculated as:
[formula image not reproduced]
where
[formula image not reproduced]
3) The reputation value R(i) of software i is determined by the recommended trust value $s_k$ of each user k who uses the software and by the aggregated reputation value calculated from that user's usage experience. It is calculated as follows:
[formula image not reproduced]
where
[formula images not reproduced]
where
[formula image not reproduced]
$s_k$ is calculated as follows:
[formula image not reproduced]
where
[formula image not reproduced]
the initial value of γ is 0, and γ is increased by 1 (γ = γ + 1) whenever y < 0; thr = 3, δ = 0.05, and μ = 0.1.
The resulting reputation value R(i) represents the popularity of the software and measures its influence.
1.2 Supervised learning
In the invention, the detection models are obtained with supervised learning, a machine learning technique. Supervised learning trains a function (model) from a training data set whose classes are known; the trained function (model) can then predict the class of new, unlabeled data. Each sample in the training set consists of an input object (usually a feature vector) and an output value (the result associated with the input object, usually called a label). Depending on whether the output value is continuous or discrete, supervised learning is divided into regression and classification. The invention uses the support vector machine (SVM) classification method, briefly described below.
Given a training data set:
$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$$
where $x_i$ is a feature vector and $y_i$ is the output value, i.e. the class label, usually $y_i \in \{+1, -1\}$: $y_i = +1$ means $x_i$ is a positive instance, and $y_i = -1$ means it is a negative instance.
The goal of the learning algorithm is to find a separating hyperplane in the feature space that divides the instances into the two classes. SVMs can be linear or nonlinear; the invention uses a linear SVM.
For a given training data set, a linear SVM learns the separating hyperplane by maximizing the margin, which is equivalent to solving a convex quadratic programming problem ($\omega$ and $x$ are vectors, and $\omega \cdot x$ is their inner product):
$$\omega \cdot x + b = 0 \quad (8)$$
The corresponding classification decision function is:
$$f(x) = \operatorname{sign}(\omega \cdot x + b) \quad (9)$$
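As a rough illustration of formulas (8) and (9), the Java sketch below trains a linear SVM and evaluates its decision function. It assumes labels +1/-1 (for example, +1 for malicious) and, instead of a convex QP solver, uses the Pegasos stochastic sub-gradient method on the soft-margin hinge-loss objective; this solver, the class name LinearSvm, and all parameters are illustrative assumptions, not the invention's specified method.

```java
import java.util.Random;

/**
 * Hypothetical linear SVM sketch for f(x) = sign(w·x + b) (formula (9)).
 * Training uses Pegasos (stochastic sub-gradient on the L2-regularized hinge loss),
 * swapped in for a QP solver. The bias is learned by appending a constant
 * feature 1 to every sample, so b is also (lightly) regularized.
 */
final class LinearSvm {
    final double[] w; // weight vector ω
    final double b;   // bias b

    private LinearSvm(double[] w, double b) { this.w = w; this.b = b; }

    /** Trains on samples x[i] with labels y[i] in {+1, -1}. */
    static LinearSvm train(double[][] x, int[] y, double lambda, int iterations, long seed) {
        int d = x[0].length;
        double[] wAug = new double[d + 1];              // last entry plays the role of b
        Random rnd = new Random(seed);
        for (int t = 1; t <= iterations; t++) {
            int i = rnd.nextInt(x.length);              // pick a random training sample
            double eta = 1.0 / (lambda * t);            // Pegasos step size
            double margin = y[i] * dotAug(wAug, x[i]);  // uses w before the update
            double shrink = 1.0 - eta * lambda;         // step from the L2 regularizer
            for (int j = 0; j <= d; j++) wAug[j] *= shrink;
            if (margin < 1.0) {                         // hinge loss active: move toward y_i x_i
                for (int j = 0; j < d; j++) wAug[j] += eta * y[i] * x[i][j];
                wAug[d] += eta * y[i];
            }
        }
        double[] w = new double[d];
        System.arraycopy(wAug, 0, w, 0, d);
        return new LinearSvm(w, wAug[d]);
    }

    private static double dotAug(double[] wAug, double[] xi) {
        double s = wAug[wAug.length - 1];               // bias term (constant feature 1)
        for (int j = 0; j < xi.length; j++) s += wAug[j] * xi[j];
        return s;
    }

    /** ω·x + b. */
    double score(double[] x) {
        double s = b;
        for (int j = 0; j < x.length; j++) s += w[j] * x[j];
        return s;
    }

    /** Formula (9): +1 if ω·x + b >= 0, otherwise -1 (ties mapped to +1). */
    int predict(double[] x) { return score(x) >= 0 ? 1 : -1; }
}
```

For example, on two well-separated point clouds around (2.5, 2.5) and (-2.5, -2.5), `LinearSvm.train(xs, ys, 0.01, 5000, 42L)` yields a model whose `predict` returns +1 for the first cloud and -1 for the second. Folding b into the weight vector keeps the update to a single loop, at the cost of also regularizing b.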
1.3 Homomorphic encryption
Homomorphic encryption is a class of encryption algorithms that support certain operations directly on ciphertexts: decrypting the result of a computation on ciphertexts yields the same result as performing the corresponding computation on the plaintexts. This property lets a third party be entrusted with processing data without revealing it, which enables privacy protection. The invention therefore protects privacy-related data with the Paillier homomorphic encryption algorithm. The Paillier cryptosystem works as follows:
Key generation (KeyGen):
1) Randomly select two large primes p and q satisfying $\gcd(pq, (p-1)(q-1)) = 1$.
2) Compute $n = pq$ and $\lambda = \operatorname{lcm}(p-1, q-1)$.
3) Select a random integer $g \in \mathbb{Z}_{n^2}^*$ such that the modular inverse $\mu = (L(g^{\lambda} \bmod n^2))^{-1} \bmod n$ exists, where $L(x) = \frac{x-1}{n}$.
The public key is $(n, g)$ and the private key is $(\lambda, \mu)$.
Encryption (Enc): for a message m to be encrypted, randomly select an integer r with $0 < r < n$ and $r \in \mathbb{Z}_n^*$, i.e. $\gcd(r, n) = 1$. Then encrypt m with the public key $(n, g)$: $c = g^m \cdot r^n \bmod n^2$, obtaining the ciphertext c.
Decryption (Dec): decrypt the ciphertext c with the private key $(\lambda, \mu)$: $m = L(c^{\lambda} \bmod n^2) \cdot \mu \bmod n$, where $L(x) = \frac{x-1}{n}$.
The invention achieves data privacy protection through the additive homomorphic property of Paillier encryption (ciphertext products and powers are taken mod $n^2$):
$$HE(m_1) \cdot HE(m_2) = HE(m_1 + m_2)$$
$$HE(m)^k = HE(k \cdot m)$$
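The KeyGen, Enc, and Dec steps and the two homomorphic properties can be sketched in Java with `java.math.BigInteger`. This is a minimal, non-hardened sketch: it assumes g = n + 1 (a common choice that satisfies the condition in step 3), uses demo-sized keys, and the class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

/**
 * Hypothetical textbook Paillier sketch following KeyGen/Enc/Dec in 1.3.
 * Assumes g = n + 1; no input validation; illustrative only, not hardened crypto code.
 */
final class Paillier {
    private static final BigInteger ONE = BigInteger.ONE;
    private final SecureRandom rnd = new SecureRandom();
    final BigInteger n, nSquared, g;      // public key (n, g)
    private final BigInteger lambda, mu;  // private key (λ, μ)

    Paillier(int primeBits) {
        BigInteger p, q;
        do { // KeyGen 1): primes p, q with gcd(pq, (p-1)(q-1)) = 1
            p = BigInteger.probablePrime(primeBits, rnd);
            q = BigInteger.probablePrime(primeBits, rnd);
        } while (p.equals(q)
                || !p.multiply(q).gcd(p.subtract(ONE).multiply(q.subtract(ONE))).equals(ONE));
        n = p.multiply(q);                                   // KeyGen 2): n = pq
        nSquared = n.multiply(n);
        BigInteger pm1 = p.subtract(ONE), qm1 = q.subtract(ONE);
        lambda = pm1.multiply(qm1).divide(pm1.gcd(qm1));    // λ = lcm(p-1, q-1)
        g = n.add(ONE);                                       // KeyGen 3): assumed g = n + 1
        mu = L(g.modPow(lambda, nSquared)).modInverse(n);    // μ = (L(g^λ mod n^2))^-1 mod n
    }

    /** L(x) = (x - 1) / n. */
    private BigInteger L(BigInteger x) { return x.subtract(ONE).divide(n); }

    /** Enc: c = g^m * r^n mod n^2 with random r in Z*_n. */
    BigInteger encrypt(BigInteger m) {
        BigInteger r;
        do { r = new BigInteger(n.bitLength(), rnd); }
        while (r.signum() == 0 || r.compareTo(n) >= 0 || !r.gcd(n).equals(ONE));
        return g.modPow(m, nSquared).multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    /** Dec: m = L(c^λ mod n^2) * μ mod n. */
    BigInteger decrypt(BigInteger c) {
        return L(c.modPow(lambda, nSquared)).multiply(mu).mod(n);
    }

    /** HE(m1) * HE(m2) mod n^2 = HE(m1 + m2). */
    BigInteger add(BigInteger c1, BigInteger c2) { return c1.multiply(c2).mod(nSquared); }

    /** HE(m)^k mod n^2 = HE(k * m). */
    BigInteger multiplyByConstant(BigInteger c, BigInteger k) { return c.modPow(k, nSquared); }
}
```

For instance, with `he = new Paillier(256)`, decrypting `he.add(he.encrypt(a), he.encrypt(b))` returns a + b (mod n), and decrypting `he.multiplyByConstant(he.encrypt(a), k)` returns k·a (mod n). With g = n + 1, $g^m \bmod n^2 = 1 + mn$, which is why this choice is popular.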
2. System structure and interaction process. The system comprises three entities, whose functions are shown in Fig. 4:
Client: installed on every user device; responsible for collecting each device's data for priority evaluation and malware detection and sending it to the server;
Server: uses the priority-evaluation data uploaded by the clients to calculate the reputation value of each piece of software and evaluate its priority; judges whether software is malicious from the uploaded detection data and returns the result to the client;
Third party: assists the interactive computation between client and server, protecting client privacy during priority evaluation and malware detection.
In this system, the client is installed on each user device, has the same privileges as the device, and is trusted. The server and the third party are semi-trusted: both are curious about the content of the data sent by clients and may try to snoop on user private information uploaded by the client. Because of their respective interests, the three parties do not collude with one another, and each performs its functions and tasks completely and correctly. Different entities in the system communicate over secure channels. However, the data a client uploads to the server implicitly carries the user's behavioral privacy and must be protected; here, the semi-trusted third party is responsible for protecting the user's privacy.
3. Scheme flow and detailed design
The invention provides a hybrid Android malware detection system with priority ranking and privacy protection; the overall flow is shown in Fig. 5. First, clients collect behavior data on how a user group uses different software, perform a simple preliminary calculation, and upload the results to the server. The server then uses the reputation evaluation algorithm to calculate the reputation value of each piece of software from the uploaded data, sorts the software by reputation value, and determines the detection order. During detection, the server interacts with the client in that order to obtain, for each piece of software, the API usage-frequency data extracted by decompiling its APK, and performs static detection with the static learning model; for software whose static detection result is non-malicious, real-time detection is performed with the dynamic learning model on the system-call data collected by the client.
The scheme consists of three functional modules: reputation evaluation, static detection, and real-time detection. The reputation evaluation module quantifies the popularity and influence of software and determines the detection order, so that software with high reputation values is detected first. After software is downloaded, its APK is decompiled and examined with a static detection method; if the software is malicious, the user is advised not to install it and to delete it directly, and the remaining software detected as normal is installed. While the user uses the software, its system call sequence is monitored within a specified time window and the data are used for real-time detection. If malware is detected, the result is fed back to the user.
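The module ordering described above can be summarized by a small Java sketch: software is sorted by reputation value in descending order, checked statically in that order, and only software that passes is handed on to real-time monitoring. The `App` class, the `staticIsMalicious` predicate, and the method names are hypothetical stand-ins for the reputation evaluation and static detection modules.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical record of one piece of software and its reputation value R(i). */
final class App {
    final String id;
    final double reputation;
    App(String id, double reputation) { this.id = id; this.reputation = reputation; }
}

final class HybridDetectionPipeline {
    /** Detection order: highest reputation value first (stable for ties). */
    static List<App> detectionOrder(List<App> apps) {
        List<App> ordered = new ArrayList<>(apps);
        ordered.sort(Comparator.comparingDouble((App a) -> a.reputation).reversed());
        return ordered;
    }

    /**
     * Runs static detection in reputation order. Apps judged malicious are added to
     * {@code flagged} (user advised not to install; delete); the rest are returned,
     * in order, as candidates for real-time monitoring after installation.
     */
    static List<App> staticPhase(List<App> apps, Predicate<App> staticIsMalicious, List<App> flagged) {
        List<App> toMonitor = new ArrayList<>();
        for (App app : detectionOrder(apps)) {
            if (staticIsMalicious.test(app)) flagged.add(app);
            else toMonitor.add(app);
        }
        return toMonitor;
    }
}
```

Sorting once and then iterating keeps the priority logic separate from the detectors, so the static classifier can be swapped without touching the ordering.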
3.1 Reputation evaluation module. This module quantifies software popularity from users' usage of the software, expressed as a reputation value. A higher reputation value means the software is more popular, so malicious behavior would cause greater damage, and such software should be detected first. However, the data a user sends to the server reveals how the user uses the software, which is private and must be protected. A semi-trusted third party is therefore introduced, and privacy protection is achieved with homomorphic encryption. The interaction of this module is shown in Fig. 6 and described in detail below:
First step (key generation): the third party generates a public/private key pair $(PK_p, SK_p)$ with the homomorphic encryption key generation algorithm KeyGen in 1.3 and publishes the public key $PK_p$ to all clients and the server.
Second step (data collection): the client collects the number of uses, usage duration, and usage frequency of the software within a given time window and, following formulas (1)-(5) and (7) in 1.1, performs the preliminary calculation of the software's recommended trust value $s_k$, its aggregated reputation value, and the product of the two.
Third step (data upload): using the public key $PK_p$ provided by the third party and a self-generated random number $r_k$, the client encrypts the data to obtain $HE(s_k + r_k)$ and the corresponding ciphertext for the product, and sends these ciphertexts together with $r_k$ to the server.
Fourth step (preliminary calculation): after receiving data from all clients, the server sums all the random numbers to obtain $\sum_k r_k$. Using the homomorphic property in 1.3, it computes $HE\left(\sum_k (s_k + r_k)\right) = \prod_k HE(s_k + r_k)$ and, in the same way, the homomorphic sum of the product ciphertexts, and sends both to the third party for decryption.
Fifth step (data decryption): the third party decrypts the received ciphertexts with its private key $SK_p$, obtaining $\sum_k (s_k + r_k)$ and the corresponding decrypted sum for the products, and sends them to the server.
Sixth step (final calculation): after receiving the data from the third party, the server subtracts the known $\sum_k r_k$ to obtain $\sum_k s_k$ and the corresponding sum for the products, and then calculates the application's reputation value R(i) with formula (6) in 1.1. The server thus obtains the reputation value of the software without learning the data sent by any individual client.
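The six steps can be simulated in one process with the Java sketch below, which shows why, absent collusion, neither the server nor the third party sees an individual client's value: the server holds only ciphertexts and the masks $r_k$, while the third party decrypts only masked sums. Several points are assumptions rather than details from the text: $s_k$ and the product are taken as pre-scaled non-negative integers, the product ciphertext is assumed to be masked with the same $r_k$, g = n + 1 is used, and formula (6), which turns the two sums into R(i), is not reproduced. All names are hypothetical.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.List;

/** Hypothetical simulation of the masked aggregation in 3.1 for one piece of software. */
final class MaskedReputationAggregation {

    /** Third party: generates and holds the Paillier key pair (g = n + 1 assumed). */
    static final class ThirdParty {
        private final SecureRandom rnd = new SecureRandom();
        final BigInteger n;                        // public key, given to clients and server
        private final BigInteger n2, lambda, mu;   // private key (λ, μ)

        ThirdParty(int primeBits) {
            // p, q of equal bit length, so gcd(pq, (p-1)(q-1)) = 1 holds
            BigInteger p = BigInteger.probablePrime(primeBits, rnd), q;
            do { q = BigInteger.probablePrime(primeBits, rnd); } while (q.equals(p));
            n = p.multiply(q);
            n2 = n.multiply(n);
            BigInteger pm1 = p.subtract(BigInteger.ONE), qm1 = q.subtract(BigInteger.ONE);
            lambda = pm1.multiply(qm1).divide(pm1.gcd(qm1)); // λ = lcm(p-1, q-1)
            mu = lambda.modInverse(n);                        // with g = n + 1, L(g^λ mod n^2) = λ mod n
        }

        /** Fifth step: decrypts an aggregated, still masked ciphertext. */
        BigInteger decrypt(BigInteger c) {
            return c.modPow(lambda, n2).subtract(BigInteger.ONE).divide(n).multiply(mu).mod(n);
        }
    }

    /** Paillier encryption under public key n with g = n + 1: c = g^m * r^n mod n^2. */
    static BigInteger encrypt(BigInteger n, BigInteger m, SecureRandom rnd) {
        BigInteger n2 = n.multiply(n), r;
        do { r = new BigInteger(n.bitLength(), rnd); }
        while (r.signum() == 0 || r.compareTo(n) >= 0 || !r.gcd(n).equals(BigInteger.ONE));
        return n.add(BigInteger.ONE).modPow(m, n2).multiply(r.modPow(n, n2)).mod(n2);
    }

    /** One client's upload: HE(s_k + r_k), HE(p_k + r_k), and r_k in the clear. */
    static final class Upload {
        final BigInteger encS, encP, r;
        Upload(BigInteger encS, BigInteger encP, BigInteger r) { this.encS = encS; this.encP = encP; this.r = r; }
    }

    /** Third step (client): mask s_k and the product p_k with r_k, then encrypt. */
    static Upload clientUpload(BigInteger n, long s, long p, SecureRandom rnd) {
        BigInteger r = new BigInteger(64, rnd);   // self-generated mask r_k
        return new Upload(encrypt(n, BigInteger.valueOf(s).add(r), rnd),
                          encrypt(n, BigInteger.valueOf(p).add(r), rnd), r);
    }

    /** Steps four to six: returns {sum of s_k, sum of p_k}; the server never decrypts. */
    static BigInteger[] aggregate(ThirdParty tp, List<Upload> uploads) {
        BigInteger n2 = tp.n.multiply(tp.n);
        BigInteger sumR = BigInteger.ZERO, accS = BigInteger.ONE, accP = BigInteger.ONE;
        for (Upload u : uploads) {                        // fourth step (server)
            sumR = sumR.add(u.r);
            accS = accS.multiply(u.encS).mod(n2);         // HE(a) * HE(b) = HE(a + b)
            accP = accP.multiply(u.encP).mod(n2);
        }
        BigInteger maskedS = tp.decrypt(accS);            // fifth step (third party)
        BigInteger maskedP = tp.decrypt(accP);
        return new BigInteger[] { maskedS.subtract(sumR), maskedP.subtract(sumR) }; // sixth step
    }
}
```

With three clients uploading (s, p) = (3, 21), (5, 40), and (2, 8), `aggregate` returns 10 and 69. The masks only need to be small relative to n so that the masked sums never wrap around modulo n.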
3.2 Static detection module
After an application is downloaded and before it is installed and used, this module decompiles the APK file to extract the dangerous-level permissions it uses and the corresponding API information as features, and then detects malware with a machine learning method. The detailed detection process is as follows:
First step (offline model training): the server decompiles the APKs of existing normal software and malware of known classes, obtains the number of occurrences of the system APIs corresponding to the dangerous permissions used, and, with these counts as features, runs a supervised learning algorithm to train a classifier for static detection.
Second step (data collection): the client decompiles the downloaded APK, obtains the number of occurrences of the APIs corresponding to dangerous-level permissions in the file, and uploads the data to the server for detection.
Third step (online detection): after receiving the data from the client, the server classifies the software with the classifier obtained in offline training, judges whether it is malicious, and returns the result to the client.
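The client-side feature extraction of the second step might look like the following Java sketch, which turns the APIs invoked in one decompiled APK into a fixed-order count vector. It assumes decompilation has already produced a list of invoked API method names. The three APIs listed are an illustrative subset (guarded in Android by the dangerous permissions SEND_SMS, READ_PHONE_STATE, and location access, respectively), not the invention's actual feature set, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical static-detection feature extractor: counts of dangerous-permission APIs. */
final class StaticApiFeatures {
    /** Fixed feature order; an illustrative subset, not the invention's feature set. */
    static final List<String> FEATURE_APIS = List.of(
        "android.telephony.SmsManager.sendTextMessage",         // SEND_SMS
        "android.telephony.TelephonyManager.getDeviceId",       // READ_PHONE_STATE
        "android.location.LocationManager.getLastKnownLocation" // location permission
    );

    /** Counts how often each feature API occurs among the APIs invoked by one APK. */
    static int[] toFeatureVector(List<String> invokedApis) {
        Map<String, Integer> index = new HashMap<>();
        for (int j = 0; j < FEATURE_APIS.size(); j++) index.put(FEATURE_APIS.get(j), j);
        int[] counts = new int[FEATURE_APIS.size()];
        for (String api : invokedApis) {
            Integer j = index.get(api);
            if (j != null) counts[j]++;   // APIs outside the feature set are ignored
        }
        return counts;
    }
}
```

The resulting vector is what the client would upload, and it can be fed to a classifier such as the linear SVM described in 1.2.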
3.3 Real-time detection module
For installed software in use, this module performs real-time detection on the system call sequence data collected while the software runs, and reports any malicious behavior to the client as soon as it is found. Because the collected system call information implicitly reveals the user's behavioral privacy, the invention protects it with homomorphic encryption. The interaction of this module is shown in Fig. 7 and described in detail below:
First step (initialization phase): this phase consists of key generation by the client and offline model training by the server.
Offline training: the server runs the existing samples of normal and malicious software in a simulated environment, obtains their system call sequences, selects a feature set from these sequences with a feature selection algorithm, and represents each sample as a feature vector based on this feature set. It then trains a model with the SVM algorithm to obtain the decision function (the values ω and b in formula (9)) used for real-time detection.
Key generation: the client generates a public/private key pair $(PK_p, SK_p)$ with the homomorphic encryption key generation algorithm KeyGen in 1.3 and publishes the public key to the server.
Second step (real-time monitoring): the client collects the system call sequence produced while the software is used within a given time window, counts the frequencies $\{x_i \mid i = 1, \ldots, n\}$ of the features in the feature set, encrypts each feature value $x_i$ with the public key $PK_p$ to obtain the encrypted feature vector $[HE(x_1), HE(x_2), \ldots, HE(x_n)]$, and sends it to the server.
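The counting part of the second step can be sketched in Java as follows. It assumes that the selected features are individual system call names and that each captured event carries a millisecond timestamp; the invention's feature selection could equally produce other features, such as call subsequences. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical client-side counter of system-call features within a time window. */
final class SyscallWindowFeatures {
    /** One captured system call event (assumed representation). */
    static final class Event {
        final long timeMs;
        final String syscall;
        Event(long timeMs, String syscall) { this.timeMs = timeMs; this.syscall = syscall; }
    }

    /** Returns x_1..x_n: counts of each selected feature within [windowStart, windowStart + windowLen). */
    static long[] count(List<Event> events, List<String> featureSet, long windowStart, long windowLen) {
        Map<String, Integer> index = new HashMap<>();
        for (int j = 0; j < featureSet.size(); j++) index.put(featureSet.get(j), j);
        long[] x = new long[featureSet.size()];
        for (Event e : events) {
            boolean inWindow = e.timeMs >= windowStart && e.timeMs < windowStart + windowLen;
            Integer j = index.get(e.syscall);
            if (inWindow && j != null) x[j]++;   // ignore events outside the window or feature set
        }
        return x;
    }
}
```

Each resulting count $x_i$ would then be encrypted individually under $PK_p$ before upload.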
Third step (real-time detection): after receiving the encrypted data from the client, the server uses the public key PKpAnd encrypting the b value obtained in the off-line training of the first step to obtain HE (b). From the homomorphic nature, HE (ω x + b) can be calculated from the following equation.
$$HE(\omega x + b) = HE(b) \cdot \prod_{i=1}^{n} HE(x_i)^{\omega_i}$$
The server then sends $HE(\omega x + b)$ to the client.
Step 4 (obtaining the result): the client decrypts the received data with its private key $SK_p$ to obtain $\omega x + b$, and determines from equation (9) whether the software is malicious.
Through this process, the server completes real-time malware detection without obtaining any meaningful private data of the user.
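The four steps above can be illustrated with a minimal Java sketch (the embodiment in this document uses Java). It assumes textbook Paillier with g = n + 1 as the homomorphic scheme, hypothetical syscall-bigram features, SVM weights already scaled and rounded to integers (real SVM weights are real numbers, so some fixed scale factor is assumed), and, hypothetically, that a positive score means malicious; equation (9) itself is not reproduced. The 512-bit modulus only keeps the demo fast and is far too small for real use.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.List;

// Sketch of the four-step encrypted SVM check, using textbook Paillier with g = n + 1 (assumption).
class EncryptedSvmDemo {
    static final SecureRandom RNG = new SecureRandom();

    // Client key pair: the public key PK_p is n (with n^2); SK_p is (lambda, mu).
    static final class KeyPair {
        final BigInteger n, nSquared, lambda, mu;

        KeyPair(int bits) {
            BigInteger p = BigInteger.probablePrime(bits / 2, RNG);
            BigInteger q;
            do {
                q = BigInteger.probablePrime(bits / 2, RNG);
            } while (q.equals(p));
            n = p.multiply(q);
            nSquared = n.multiply(n);
            BigInteger p1 = p.subtract(BigInteger.ONE);
            BigInteger q1 = q.subtract(BigInteger.ONE);
            lambda = p1.multiply(q1).divide(p1.gcd(q1)); // lcm(p - 1, q - 1)
            mu = lambda.modInverse(n);                   // valid because g = n + 1
        }
    }

    // HE(m) = (1 + m*n) * r^n mod n^2; negative m is encoded as m mod n. Reads only n and n^2.
    static BigInteger encrypt(KeyPair pk, BigInteger m) {
        BigInteger r;
        do {
            r = new BigInteger(pk.n.bitLength(), RNG);
        } while (r.signum() == 0 || r.compareTo(pk.n) >= 0 || !r.gcd(pk.n).equals(BigInteger.ONE));
        BigInteger gm = BigInteger.ONE.add(m.mod(pk.n).multiply(pk.n)).mod(pk.nSquared);
        return gm.multiply(r.modPow(pk.n, pk.nSquared)).mod(pk.nSquared);
    }

    // Dec(c) = L(c^lambda mod n^2) * mu mod n with L(u) = (u - 1) / n; upper half maps to negatives.
    static BigInteger decrypt(KeyPair sk, BigInteger c) {
        BigInteger u = c.modPow(sk.lambda, sk.nSquared);
        BigInteger m = u.subtract(BigInteger.ONE).divide(sk.n).multiply(sk.mu).mod(sk.n);
        return m.compareTo(sk.n.shiftRight(1)) > 0 ? m.subtract(sk.n) : m;
    }

    // Step 2 (client): occurrences of each feature, here a hypothetical syscall bigram like "open>read".
    static long[] countFeatures(List<String> calls, List<String> features) {
        long[] x = new long[features.size()];
        for (int j = 0; j + 1 < calls.size(); j++) {
            int idx = features.indexOf(calls.get(j) + ">" + calls.get(j + 1));
            if (idx >= 0) {
                x[idx]++;
            }
        }
        return x;
    }

    // Step 3 (server): HE(w.x + b) = HE(b) * prod_i HE(x_i)^{w_i} mod n^2, public key only.
    static BigInteger evaluate(KeyPair pk, BigInteger[] encX, long[] w, long b) {
        BigInteger acc = encrypt(pk, BigInteger.valueOf(b));
        for (int i = 0; i < encX.length; i++) {
            BigInteger e = BigInteger.valueOf(w[i]).mod(pk.n); // negative weight becomes n - |w_i|
            acc = acc.multiply(encX[i].modPow(e, pk.nSquared)).mod(pk.nSquared);
        }
        return acc;
    }

    public static void main(String[] args) {
        KeyPair kp = new KeyPair(512);                                   // step 1: client KeyGen
        List<String> features = List.of("open>read", "read>write", "socket>connect");
        long[] w = {-2, 1, 5};                                           // hypothetical integer weights
        long b = -3;
        long[] x = countFeatures(List.of("open", "read", "write", "socket", "connect"), features);
        BigInteger[] encX = new BigInteger[x.length];
        for (int i = 0; i < x.length; i++) {
            encX[i] = encrypt(kp, BigInteger.valueOf(x[i]));            // step 2: client encrypts
        }
        BigInteger score = decrypt(kp, evaluate(kp, encX, w, b));       // steps 3 and 4
        System.out.println(score + " -> " + (score.signum() > 0 ? "malicious" : "benign"));
    }
}
```

Running `main` prints `1 -> malicious`: the bigram counts are x = (1, 1, 1), so ωx + b = -2 + 1 + 5 - 3 = 1. The server only multiplies and exponentiates ciphertexts, so it never sees x or the score in clear.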
The invention comprises three main parts: reputation evaluation, static detection and real-time monitoring. A specific embodiment implemented in Java is given here. The implementation may use a client-server architecture: the client's tasks are implemented as Android client code, and two servers run the server code and the third-party code of the invention, respectively.
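The order in which the three parts act on each piece of software (the flow of claim 1) can be sketched as below. The reputation values and the two detectors are stand-ins supplied by the caller as hypothetical `Predicate` arguments; real-time detection in particular is collapsed into a single yes/no check, although the invention performs it repeatedly over time windows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Sketch of the hybrid order: reputation ranking, then static detection, then real-time detection.
class HybridScheduler {
    // Returns the software flagged as malicious, in the order it was examined.
    static List<String> run(Map<String, Double> reputation,
                            Predicate<String> staticIsMalicious,
                            Predicate<String> realTimeIsMalicious) {
        List<String> order = new ArrayList<>(reputation.keySet());
        // Software with a higher reputation value (more popular, more influential) is examined first.
        order.sort(Comparator.comparing((String app) -> reputation.get(app)).reversed());
        List<String> flagged = new ArrayList<>();
        for (String app : order) {
            if (staticIsMalicious.test(app)) {
                flagged.add(app);        // static stage: the user is advised not to install it
            } else if (realTimeIsMalicious.test(app)) {
                flagged.add(app);        // only statically clean software reaches real-time detection
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        Map<String, Double> rep = Map.of("chat", 0.9, "game", 0.4, "flashlight", 0.7);
        List<String> flagged = run(rep, app -> app.equals("game"), app -> app.equals("flashlight"));
        System.out.println(flagged); // [flashlight, game]
    }
}
```

Sorting by reputation rather than by arrival order is what lets the scheme spend detection effort first on the software that would affect the most users.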
In the reputation evaluation stage, the third-party code generates the key pair according to the Paillier homomorphic encryption algorithm, and also includes code to decrypt the data received from the server and compute the differences. The client code collects the data, performs the preliminary processing, obtains the public key and sends the results to the server, following the specific algorithm of the invention. The server code sums the client data, forwards it to the third party and calculates the reputation value, likewise following the specific algorithm of the invention.
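The blinding used in the reputation stage (claim 3: each client sends `HE(s_k + r_k)` together with `r_k`, the server multiplies the ciphertexts and sums the `r_k`, the third party decrypts, and the server subtracts) can be sketched for the single quantity "sum of s_k". The sketch assumes Paillier with g = n + 1, s_k scaled to non-negative integers, 64-bit random numbers, and no collusion between the server and the third party. The claim's second blinded quantity would be handled the same way, and the final formula for R(i) is not reproduced.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Sketch of blinded reputation aggregation among clients, the server and a semi-trusted third party.
class BlindedAggregation {
    static final SecureRandom RNG = new SecureRandom();

    // Third party's Paillier key with g = n + 1; only n (and n^2) are published.
    static final class ThirdPartyKey {
        final BigInteger n, nSquared;
        private final BigInteger lambda, mu;

        ThirdPartyKey(int bits) {
            BigInteger p = BigInteger.probablePrime(bits / 2, RNG);
            BigInteger q;
            do {
                q = BigInteger.probablePrime(bits / 2, RNG);
            } while (q.equals(p));
            n = p.multiply(q);
            nSquared = n.multiply(n);
            BigInteger p1 = p.subtract(BigInteger.ONE);
            BigInteger q1 = q.subtract(BigInteger.ONE);
            lambda = p1.multiply(q1).divide(p1.gcd(q1)); // lcm(p - 1, q - 1)
            mu = lambda.modInverse(n);
        }

        // Only the third party can run this: L(c^lambda mod n^2) * mu mod n.
        BigInteger decrypt(BigInteger c) {
            BigInteger u = c.modPow(lambda, nSquared);
            return u.subtract(BigInteger.ONE).divide(n).multiply(mu).mod(n);
        }
    }

    // Public-key encryption HE(m) = (1 + m*n) * r^n mod n^2.
    static BigInteger encrypt(BigInteger n, BigInteger nSquared, BigInteger m) {
        BigInteger r;
        do {
            r = new BigInteger(n.bitLength(), RNG);
        } while (r.signum() == 0 || r.compareTo(n) >= 0 || !r.gcd(n).equals(BigInteger.ONE));
        BigInteger gm = BigInteger.ONE.add(m.mod(n).multiply(n)).mod(nSquared);
        return gm.multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    // Returns sum_k s_k; neither the server nor the third party sees an individual s_k in clear.
    static BigInteger aggregate(ThirdPartyKey tp, long[] s) {
        BigInteger encSum = BigInteger.ONE;  // server: HE(sum(s_k + r_k)), starting from HE(0) = 1
        BigInteger sumR = BigInteger.ZERO;   // server: plain sum of the random numbers r_k
        for (long sk : s) {
            BigInteger rk = new BigInteger(64, RNG);                                     // client k
            BigInteger ck = encrypt(tp.n, tp.nSquared, BigInteger.valueOf(sk).add(rk)); // client k
            encSum = encSum.multiply(ck).mod(tp.nSquared);  // server: homomorphic addition
            sumR = sumR.add(rk);                            // server: receives r_k alongside
        }
        BigInteger blinded = tp.decrypt(encSum);  // third party learns only sum(s_k + r_k)
        return blinded.subtract(sumR);            // server removes the blinding
    }

    public static void main(String[] args) {
        ThirdPartyKey tp = new ThirdPartyKey(512);
        long[] s = {800, 650, 920};  // hypothetical s_k values scaled by 1000 to integers
        System.out.println(aggregate(tp, s)); // prints 2370
    }
}
```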
In the static detection stage, the client code decompiles the APK, counts the occurrences of system APIs, and sends the data to the server. The server code performs offline training and online detection according to the specific algorithm of the invention. The machine learning algorithm may be a supervised learning algorithm such as a support vector machine or a random forest, either of which can be implemented programmatically.
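The client-side feature extraction of the static stage can be sketched as counting, in the smali code produced by decompiling the APK (for example with apktool), how often each API guarded by a dangerous permission is invoked. The permission-to-API mapping below is a small illustrative assumption, not the feature set of the invention; the resulting vector would then be passed to whichever supervised classifier the server trained.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: static features = occurrences of dangerous-permission APIs in decompiled smali lines.
class DangerousApiCounter {
    // Illustrative and far from complete mapping from dangerous permissions to smali method references.
    static final Map<String, String> API_BY_PERMISSION = new LinkedHashMap<>();

    static {
        API_BY_PERMISSION.put("android.permission.SEND_SMS",
                "Landroid/telephony/SmsManager;->sendTextMessage(");
        API_BY_PERMISSION.put("android.permission.READ_PHONE_STATE",
                "Landroid/telephony/TelephonyManager;->getDeviceId(");
        API_BY_PERMISSION.put("android.permission.ACCESS_FINE_LOCATION",
                "Landroid/location/LocationManager;->getLastKnownLocation(");
    }

    // One count per mapped API, in the map's insertion order; this vector is what gets uploaded.
    static int[] features(List<String> smaliLines) {
        int[] counts = new int[API_BY_PERMISSION.size()];
        int i = 0;
        for (String api : API_BY_PERMISSION.values()) {
            for (String line : smaliLines) {
                int from = line.indexOf(api);
                while (from >= 0) {
                    counts[i]++;
                    from = line.indexOf(api, from + api.length());
                }
            }
            i++;
        }
        return counts;
    }

    public static void main(String[] args) {
        String send = "invoke-virtual/range {v0 .. v5}, Landroid/telephony/SmsManager;->sendTextMessage("
                + "Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;"
                + "Landroid/app/PendingIntent;Landroid/app/PendingIntent;)V";
        String imei = "invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;";
        System.out.println(Arrays.toString(features(List.of(send, imei, send)))); // [2, 1, 0]
    }
}
```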
In the real-time detection stage, the client code generates the public/private key pair according to the Paillier algorithm, monitors system call data in real time, and encrypts and sends it to the server. The server code performs offline training and real-time detection.
All of the above can be implemented programmatically according to the algorithm details of the invention; implementers may choose the programming language and architecture to suit their requirements.
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a disk, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices; by software executed by various types of processors; or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (9)

1. A multi-malware hybrid detection method with privacy protection, characterized by comprising the following steps:
first, a third party generates a public/private key pair according to a homomorphic encryption key generation algorithm and publishes the public key to all clients and the server;
second, each client collects behavior data on how the user group uses different software, performs a simple preliminary calculation, encrypts the data, blinded with a self-generated random number, using the third party's public key, and uploads the result to the server;
third, using a reputation evaluation algorithm, the server computes the reputation values of the different software from the encrypted data uploaded by the user group, relying on the homomorphic addition property and on interactive decryption with the third party, without obtaining the clients' private data; it then sorts the software by reputation value to determine the detection order;
fourth, during detection, the server follows this order and interacts with the client to obtain, for each software, the API usage frequency data extracted from the decompiled APK, and performs static detection on each software in turn with a static learning model; for software whose static detection result is non-malicious, real-time detection is performed with the homomorphic addition property and a dynamic learning model, using the encrypted system-call data collected by the client and a public key that the client generates with a homomorphic encryption key generation algorithm and issues to the server;
the real-time detection of the multi-malware hybrid detection method uses the system call sequence data recorded while the software runs, and reports any malicious behavior to the client as soon as it is found; the collected system call information carries the behavioral privacy of the user; the real-time detection specifically comprises the following steps:
step 1 comprises two stages: key generation on the client and offline model training on the server;
offline training: the server runs the existing sample sets of benign and malicious software in a simulated environment to obtain their respective system call sequences, selects a feature set from these sequences using a feature selection algorithm, and represents each sample as a feature vector over that feature set; it trains the model with the SVM algorithm and obtains the values of ω and b in the decision function used for real-time detection; the decision function is:
$$f(x) = \operatorname{sgn}(\omega \cdot x + b)$$
key generation: the client generates a public/private key pair $(PK_p, SK_p)$ with the homomorphic encryption key generation algorithm KeyGen and publishes the public key to the server;
step 2: the client captures the system call sequence generated while the software is used within a given time window, counts the occurrence frequency $x_i$ ($i = 1, \dots, n$) of each feature in the feature set, encrypts each feature value $x_i$ with the public key $PK_p$ to obtain the encrypted feature vector $[HE(x_1), HE(x_2), \dots, HE(x_n)]$, and sends it to the server;
step 3: after receiving the encrypted data from the client, the server encrypts the value b obtained in the offline training of step 1 with the public key $PK_p$ to obtain $HE(b)$, and computes $HE(\omega x + b)$ by the homomorphic property as follows:
$$HE(\omega x + b) = HE(b) \cdot \prod_{i=1}^{n} HE(x_i)^{\omega_i}$$
the server then sends $HE(\omega x + b)$ to the client;
step 4: the client decrypts the data with its private key $SK_p$ to obtain $\omega x + b$, and determines from the decision function whether the software is malicious.
2. The multi-malware hybrid detection method with privacy protection as claimed in claim 1, wherein reputation evaluation of the multi-malware hybrid detection method quantifies popularity of software according to usage of software by users, expressed in reputation values; a semi-trusted third party is introduced, and privacy protection is achieved by means of a homomorphic encryption technology.
3. The multi-malware hybrid detection method with privacy protection as recited in claim 2, further comprising:
first, a third party generates a public/private key pair $(PK_p, SK_p)$ according to the homomorphic encryption key generation algorithm KeyGen, and publishes the public key $PK_p$ to all clients and the server;
second, the client collects the number of uses, duration and frequency of use of the software within a given time window, and preliminarily calculates, according to the formulas, the recommended trust value $s_k$ of the software, the aggregated reputation value [formula image], and their product [formula image];
third, the client encrypts these data with the public key $PK_p$ provided by the third party and a self-generated random number $r_k$, obtaining $HE(s_k + r_k)$ and [formula image], and sends the encrypted data together with $r_k$ to the server;
fourth, after obtaining the data from all clients, the server sums all the random numbers to obtain $\sum_k r_k$, uses the homomorphic property to compute $HE\big(\sum_k (s_k + r_k)\big)$ and [formula image], and sends them to the third party for decryption;
fifth, the third party decrypts the received encrypted data with its private key $SK_p$, obtaining $\sum_k (s_k + r_k)$ and [formula image], and sends them to the server;
sixth, after receiving the data from the third party, the server subtracts the known $\sum_k r_k$ to obtain $\sum_k s_k$ and [formula image];
seventh, the server calculates the reputation value R(i) of the application with the following formula, thereby obtaining the reputation value of the software without learning the data sent by any individual client:
[formula image]
where [formula image], and $s_k$ is calculated as follows:
[formula image]
where $y = \rho - |R(i) - V_i^k(i)|$ and [formula image]. The initial value of γ is 0, and γ is incremented by 1 whenever y < 0; thr = 3, δ = 0.05 and μ = 0.1.
4. The multi-malware hybrid detection method with privacy protection as recited in claim 3, wherein the second step comprises:
1) by monitoring, within a given time window, the number of times $N_i(t)$ the user uses software i, the usage duration $UT_i(t)$ and the usage frequency $FE_i(t)$, the usage behavior UB, reflection behavior RB and correlation behavior CB are quantified as follows:
a) UB is quantified as the UB component of the user's personal trust value for software i at time t:
[formula image]
b) RB is quantified as the RB component of the user's personal trust value for software i at time t:
$T_i(t)_{RB} = 2\left(d_t\{N_i(t) + UT_i(t) + FE_i(t)\}\right)$;
where [formula image].
c) CB is quantified as the CB component of the user's personal trust value for software i at time t:
[formula image]
[formula image]
where [formula image].
2) the trust value $T_i(t)$ of the user for software i is calculated from the quantified values of UB, RB and CB:
[formula image]
where [formula image].
3) the reputation value R(i) of software i depends on the recommended trust value $s_k$ of each user k of the software and on the aggregate reputation value [formula image] calculated from that user's usage experience, as follows:
[formula image]
where [formula image].
5. The multi-malware hybrid detection method with privacy protection as claimed in claim 1, wherein the static detection of the method is performed after an application is downloaded and before it is installed and used: the APK file is decompiled to obtain the dangerous-level permissions and the information of the corresponding APIs as features, and malware detection is then completed with a machine learning method; the detection process is as follows:
first, the server decompiles the APKs of existing benign software and malware of known types, obtains the occurrence frequency of the system APIs corresponding to the dangerous permissions used, and, using these frequencies as features, runs a supervised machine learning algorithm to train a model, obtaining a classifier for static detection of software;
second, the client decompiles the downloaded APK, obtains the number of occurrences in the file of the APIs corresponding to dangerous-level permissions, and uploads the data to the server for detection;
third, after receiving the data from the client, the server detects the software with the classifier obtained from offline training, judges whether the software is malicious, and returns the result to the client.
6. A program storage medium storing a computer program for causing an electronic device to perform steps comprising:
first, the client collects behavior data on how the user group uses different software, performs a simple preliminary calculation, and uploads the result to the server;
second, the server calculates the reputation values of the different software from the uploaded data with a reputation evaluation algorithm, and sorts the software by reputation value to determine the detection order;
third, during detection, the server follows this order and interacts with the client to obtain, for each software, the API usage frequency data extracted from the decompiled APK, and performs static detection on each software in turn with a static learning model; for software whose static detection result is non-malicious, real-time detection is performed with a dynamic learning model using the system-call data collected by the client;
the real-time detection of the multi-malware hybrid detection method uses the system call sequence data recorded while the software runs, and reports any malicious behavior to the client as soon as it is found; the collected system call information carries the behavioral privacy of the user; the real-time detection specifically comprises the following steps:
step 1 comprises two stages: key generation on the client and offline model training on the server;
offline training: the server runs the existing sample sets of benign and malicious software in a simulated environment to obtain their respective system call sequences, selects a feature set from these sequences using a feature selection algorithm, and represents each sample as a feature vector over that feature set; it trains the model with the SVM algorithm and obtains the values of ω and b in the decision function used for real-time detection; the decision function is:
$$f(x) = \operatorname{sgn}(\omega \cdot x + b)$$
key generation: the client generates a public/private key pair $(PK_p, SK_p)$ with the homomorphic encryption key generation algorithm KeyGen and publishes the public key to the server;
step 2: the client captures the system call sequence generated while the software is used within a given time window, counts the occurrence frequency $x_i$ ($i = 1, \dots, n$) of each feature in the feature set, encrypts each feature value $x_i$ with the public key $PK_p$ to obtain the encrypted feature vector $[HE(x_1), HE(x_2), \dots, HE(x_n)]$, and sends it to the server;
step 3: after receiving the encrypted data from the client, the server encrypts the value b obtained in the offline training of step 1 with the public key $PK_p$ to obtain $HE(b)$, and computes $HE(\omega x + b)$ by the homomorphic property as follows:
$$HE(\omega x + b) = HE(b) \cdot \prod_{i=1}^{n} HE(x_i)^{\omega_i}$$
the server then sends $HE(\omega x + b)$ to the client;
step 4: the client decrypts the data with its private key $SK_p$ to obtain $\omega x + b$, and determines from the decision function whether the software is malicious.
7. A computer program product stored on a computer readable medium, comprising a computer readable program for providing a user input interface to implement the multi-malware hybrid detection method of any one of claims 1-5 when executed on an electronic device.
8. A multi-malware hybrid detection system implementing the multi-malware hybrid detection method of any one of claims 1 to 5, comprising:
the reputation evaluation module, used for quantifying the popularity and influence of software, determining the detection order of the software, and detecting software with a high reputation value first;
the static detection module, used for decompiling the APK after the software is downloaded and detecting it with the static detection method; if the software is malicious, the user is advised not to install it and it is deleted directly; the remaining software, detected as normal, is installed;
the real-time detection module, used for monitoring the system call sequence of the software within a specified time window while the user uses it, collecting the data and detecting it in real time; if malware is detected, the result is fed back to the user.
9. A multi-malware hybrid detection apparatus on which the multi-malware hybrid detection system with privacy protection of claim 8 is mounted, the multi-malware hybrid detection apparatus comprising:
the client, installed on every user device and responsible for collecting from each device the data used for priority evaluation and malware detection and sending it to the server;
the server, which calculates the reputation values of all software from the priority evaluation data uploaded by the clients and thereby evaluates the priority of each software, and which judges from the uploaded detection data whether the software is malicious and returns the result to the client;
and the third-party module, used for assisting the interactive computation between the client and the server and for protecting client privacy during priority evaluation and malware detection.
CN202010097900.8A 2020-02-17 2020-02-17 Multi-malware hybrid detection method, system and device with privacy protection function Active CN111417121B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010097900.8A CN111417121B (en) 2020-02-17 2020-02-17 Multi-malware hybrid detection method, system and device with privacy protection function

Publications (2)

Publication Number Publication Date
CN111417121A CN111417121A (en) 2020-07-14
CN111417121B true CN111417121B (en) 2022-04-12

Family

ID=71494104

Country Status (1)

Country Link
CN (1) CN111417121B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant