CN106548069B - Feature extraction system and method based on sorting algorithm - Google Patents

Feature extraction system and method based on sorting algorithm


Publication number
CN106548069B
Authority
CN
China
Prior art keywords
features
unit
feature
ranking
sample
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610563595.0A
Other languages
Chinese (zh)
Other versions
CN106548069A (en)
Inventor
徐艺航
康学斌
肖新光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Antiy Network Technology Co Ltd
Original Assignee
Beijing Antiy Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-07-18
Filing date
2016-07-18
Publication date
2020-04-24
Application filed by Beijing Antiy Network Technology Co Ltd
Priority to CN201610563595.0A
Publication of CN106548069A
Application granted
Publication of CN106548069B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/561 Virus type analysis

Abstract

The invention discloses a feature extraction system and method based on a sorting algorithm. The system comprises: a database unit configured to store a specific number of features, each including one or more pieces of behavior information, these features being designated feature 1; a feature inflow unit configured to perform feature extraction on one or more black samples to generate corresponding features and to store them in the database unit, these features being designated feature 2; an extraction unit configured to perform feature extraction on one or more samples to be detected to generate corresponding sample features; a detection unit configured to generate a detection result by comparing any feature element f of the received feature 1 and feature 2 against the sample features for verification; and a sorting unit configured to sort the one or more features in an order based on the ranking-function result of the detection result. The invention addresses the uneven quality of extracted features that arises because traditional automatic feature extraction does not screen the features it extracts.

Description

Feature extraction system and method based on sorting algorithm
Technical Field
The invention relates to the technical field of computer security, in particular to a feature extraction system and method based on a sorting algorithm.
Background
With the rapid development of computer technology and the spread of the Internet, network security incidents have become increasingly frequent. From national-level attacks down to ordinary websites being implanted with Trojans, these incidents all involve computer viruses. From the standpoint of network defense, extracting virus features is the primary task in virus identification. A feature is the mark that identifies a virus; a high-quality feature has a high detection rate for viruses of its type and does not cause false alarms on anything outside that type.
Existing virus feature extraction methods fall into two categories: manual extraction and automatic extraction. In the common manual approach, virus samples of the same family are first analyzed statically and dynamically; their character strings, network behaviors and the like are then examined, and features capable of identifying the virus or its family are derived. In the common automatic approach, the same algorithm is applied to every virus sample or network data packet. To avoid false alarms, automatically extracted features are usually full hashes, so they generalize poorly, are numerous, and are inefficient. Moreover, the extracted features are not screened; that is, all features are treated as having the same grade. The quality of the extracted features is therefore uneven, which affects the detection rate and false alarm rate of any antivirus engine using them and fails to meet the need for high-quality features in space-constrained environments.
Disclosure of Invention
In order to solve the technical problem, the invention provides a feature extraction system and method based on a sorting algorithm.
According to a first aspect of the present invention, a feature extraction system based on a sorting algorithm is provided. The system comprises: a database unit configured to store a specific number of features, each including one or more pieces of behavior information, these features being designated feature 1; a feature inflow unit configured to perform feature extraction on one or more black samples to generate corresponding features and to store them in the database unit, these features being designated feature 2; an extraction unit configured to perform feature extraction on one or more samples to be detected to generate corresponding sample features; a detection unit configured to generate a detection result by comparing any feature element f of the received feature 1 and feature 2 against the sample features for verification; and a sorting unit configured to sort the one or more features in an order based on the ranking-function result of the detection result.
In some embodiments, the ranking function further comprises a scoring function S(f) calculated by the following formula:
S(f) = S + n - m
where S is the initial score of the arbitrary feature element f, n is the number of black samples among the sample features that the feature element f verifies, and m is the number of white samples among the sample features that the feature element f verifies as black samples.
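As a minimal sketch of the scoring function above, the following Python computes S(f) = S + n - m for a single feature element. The function and parameter names are illustrative assumptions, and the numbers in the example are hypothetical; the patent does not specify an implementation.

```python
def score(initial_score: int, n_black_hits: int, m_white_hits: int) -> int:
    """S(f) = S + n - m for one feature element f.

    initial_score -- S, the starting score assigned to the feature
    n_black_hits  -- n, black samples the feature verifies
    m_white_hits  -- m, white samples the feature wrongly verifies as black
    """
    return initial_score + n_black_hits - m_white_hits


# Hypothetical feature: starts at 0, matches 12 black samples, misflags 2 white samples.
print(score(0, 12, 2))  # 10
```

Under this formula each correct detection raises the score by one and each false alarm lowers it by one, so a feature that misflags as many white samples as it detects black samples ends where it started.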
In some embodiments, the sorting unit is capable of sorting in descending or ascending order using the ranking function.
In some embodiments, the system further comprises:
a deletion unit configured to delete one or more features based on the sorting, so as to form the specific number of features.
In some embodiments, the system further comprises:
a deletion unit configured to delete one or more features based on the value of m, wherein, if m > 3, the corresponding feature element f is deleted.
In some embodiments, the deletion unit is configured to delete, based on the sorting, a preset number of features ranked last or first, so as to form the specific number of features.
According to a second aspect of the present invention, a feature extraction method using the feature extraction system based on a sorting algorithm is provided. The method comprises: storing, in the database unit, a specific number of features, each including one or more pieces of behavior information, these features being designated feature 1; performing, by the feature inflow unit, feature extraction on one or more black samples to generate corresponding features, and storing them in the database unit, these features being designated feature 2; performing, by the extraction unit, feature extraction on one or more samples to be detected to generate corresponding sample features; generating, by the detection unit, a detection result by comparing any feature element f of the received feature 1 and feature 2 against the sample features for verification; and sorting, by the sorting unit, the one or more features in an order based on the ranking-function result of the detection result.
In some embodiments, the ranking function further comprises a scoring function S(f) calculated by the following formula:
S(f) = S + n - m
where S is the initial score of the arbitrary feature element f, n is the number of black samples among the sample features that the feature element f verifies, and m is the number of white samples among the sample features that the feature element f verifies as black samples.
In some embodiments, the sorting unit is capable of sorting in descending or ascending order using the ranking function.
In some embodiments, the method further comprises:
deleting, by a deletion unit, one or more features based on the sorting, so as to form the specific number of features.
In some embodiments, the method further comprises:
deleting, by a deletion unit, one or more features based on the value of m, wherein, if m > 3, the corresponding feature element f is deleted.
In some embodiments, the deletion unit deletes, based on the sorting, a preset number of features ranked last or first, so as to form the specific number of features.
With this system and method, the extracted features can be compared against sample features for detection, and the detection results can be ranked and culled using a sorting algorithm. This refines the features in the feature library, improves the efficiency of feature extraction, yields high-quality features, and meets users' need for high-quality features in space-constrained environments.
Drawings
To illustrate the technical solution of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described here show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 shows a block diagram of a feature extraction system based on a sorting algorithm according to an embodiment of the invention.
FIG. 2 shows a flowchart of a feature extraction method based on a sorting algorithm according to an embodiment of the invention.
Detailed Description
In the following detailed description of the preferred embodiments of the present invention, reference is made to the accompanying drawings, in which details and functions that are not necessary for the invention are omitted so as not to obscure the understanding of the present invention. While exemplary embodiments are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Hereinafter, for ease of description, the features of all types of computer viruses (including common infectious viruses, Word and Excel macro viruses, boot sector viruses, script viruses, trojans, backdoor programs, keyloggers, password thieves, etc.) are collectively referred to as "features". Those skilled in the art will appreciate that a "feature" hereinafter may be any form of computer virus signature.
FIG. 1 shows a block diagram of a feature extraction system based on a sorting algorithm according to an embodiment of the invention. As shown in FIG. 1, the system may include: a database unit 110, a feature inflow unit 120, an extraction unit 130, a detection unit 140, and a sorting unit 150.
The database unit 110 is configured to store a specific number of features, each including one or more pieces of behavior information; these features are designated feature 1.
The specific number may be set by the user so as to maintain a feature set of limited size. The features referred to here may be ones already identified as computer virus features, formed in a feature extraction stage when the system was set up.
The feature inflow unit 120 is configured to perform feature extraction on one or more black samples to generate corresponding features and to store them in the database unit; these features are designated feature 2.
The feature inflow unit 120 can continuously feed automatically extracted black-sample features into the feature set; this continuous inflow of samples ensures that features are continually added, updated, and propagated.
The extraction unit 130 is configured to perform feature extraction on one or more samples to be detected to generate corresponding sample features.
The extraction unit 130 receives a large number of black and white samples and performs feature extraction on them to generate corresponding sample features.
The black and white samples flow in continuously, and the sample features generated by the extraction unit 130 are transmitted to the detection unit 140. The black or white status of each sample is determined and set by the user before the sample flows in.
The detection unit 140 is configured to generate a detection result by comparing any feature element f of the received feature 1 and feature 2 against the sample features for verification. The detection result includes the number of sample features that match, or do not match, the feature element f.
Preferably, the numbers of black samples and white samples whose sample features match the feature element f are denoted n and m respectively, where n and m are positive integers.
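The counting performed by the detection unit can be sketched as below. This is only an illustration under stated assumptions: a feature is modelled as a string, a sample's extracted features as a set of strings, and "matching" as set membership. The `Sample` class, the `count_matches` function, and the example behavior names are all hypothetical; the patent does not define the matching rule. Counts may also come out as zero in this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    features: frozenset  # sample features produced by the extraction unit
    is_black: bool       # label set before the sample flows in (True = black)


def count_matches(f: str, samples: list[Sample]) -> tuple[int, int]:
    """Return (n, m): black samples matching f, and white samples matching f."""
    n = sum(1 for s in samples if s.is_black and f in s.features)
    m = sum(1 for s in samples if not s.is_black and f in s.features)
    return n, m


samples = [
    Sample(frozenset({"writes_autorun_key", "opens_port_4444"}), True),
    Sample(frozenset({"writes_autorun_key"}), True),
    Sample(frozenset({"writes_autorun_key", "reads_config"}), False),
]
print(count_matches("writes_autorun_key", samples))  # (2, 1)
```

Here the feature matches two black samples and one white sample, so n = 2 and m = 1.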
The sorting unit 150 is configured to sort the one or more features in an order based on the ranking-function result of the detection result.
The detection unit 140 sends the detection result to the sorting unit 150. After receiving it, the sorting unit 150 scores each feature using the scoring function S(f) of the ranking function and sorts the one or more features by the resulting scores. Preferably, the sorting unit 150 sorts in descending order.
In some embodiments, the ranking function further comprises a scoring function S(f) calculated by the following formula:
S(f) = S + n - m
where S is the initial score of any feature element f, n is the number of black samples among the sample features that the feature element f verifies, and m is the number of white samples among the sample features that the feature element f verifies as black samples.
In some embodiments, the sorting unit 150 sorts in descending order using the ranking function; it can also sort in ascending order.
Specifically, if descending order is used, one or more features at the end of the ranking are deleted to form the specific number of features; if ascending order is used, one or more features at the front of the ranking are deleted to form the specific number of features. The relationship between sorting and deletion can be configured according to the user's requirements.
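The relationship between sort direction and deletion described above can be sketched as follows. The `trim` function, its parameters, and the example scores are hypothetical; the sketch assumes scores have already been computed and that a higher score means a better feature.

```python
def trim(scores: dict[str, int], capacity: int, descending: bool = True) -> list[str]:
    """Sort features by score and keep only `capacity` of them.

    Descending order: the lowest scores sit at the end, so the tail is deleted.
    Ascending order:  the lowest scores sit at the front, so the head is deleted.
    """
    ranked = sorted(scores, key=scores.get, reverse=descending)
    if descending:
        return ranked[:capacity]
    return ranked[max(0, len(ranked) - capacity):]


scores = {"a": 5, "b": 1, "c": 3}
print(trim(scores, 2))                    # ['a', 'c']
print(trim(scores, 2, descending=False))  # ['c', 'a']
```

Both directions retain the same two highest-scoring features; only the position from which the low-scoring feature is removed differs.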
In some embodiments, the system further comprises:
a deletion unit 160 configured to delete one or more features based on the sorting, so as to form the specific number of features.
In some embodiments, the system further comprises:
a deletion unit 160 configured to delete one or more features based on the value of m, wherein, if m > 3, the corresponding feature element f is deleted.
Specifically, when the number m of white samples detected exceeds 3, the deletion unit 160 deletes the corresponding feature element f directly from the feature set, and the deleted feature no longer takes part in sorting.
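A minimal sketch of this threshold rule, assuming each feature's (n, m) counts are held in a dictionary: features with m > 3 are dropped before any sorting takes place, regardless of how many black samples they matched. The names `WHITE_HIT_LIMIT` and `drop_false_alarm_features` and the example data are illustrative assumptions.

```python
WHITE_HIT_LIMIT = 3  # threshold taken from the text: delete when m > 3


def drop_false_alarm_features(
    hits: dict[str, tuple[int, int]],
) -> dict[str, tuple[int, int]]:
    """Keep only features whose white-sample count m does not exceed the limit."""
    return {f: (n, m) for f, (n, m) in hits.items() if m <= WHITE_HIT_LIMIT}


# Hypothetical (n, m) counts per feature.
hits = {"a": (10, 0), "b": (50, 4), "c": (2, 3)}
print(list(drop_false_alarm_features(hits)))  # ['a', 'c']
```

Feature "b" is removed even though its score under S(f) = S + n - m would be high, which shows that the threshold acts as a hard false-alarm cap that is independent of the ranking.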
In some embodiments, the deletion unit 160 is configured to delete, based on the sorting, a preset number of features ranked last or first, so as to form the specific number of features.
Specifically, the number of deletions may be set in advance by the user according to the production environment.
In another embodiment, the number of deleted features equals the number of features in feature 2. Deleting as many features as feature 2 contains yields a new feature 1 set whose size is guaranteed to equal the specific number set by the user.
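The fixed-size behavior of this embodiment can be illustrated with a short sketch. It assumes that feature 2 does not overlap feature 1, that scores are already computed, and that sorting is descending; the `refresh` function and the example values are hypothetical.

```python
def refresh(feature_1: list[str], feature_2: list[str], scores: dict[str, int]) -> list[str]:
    """Merge newly extracted features, then delete as many as were added.

    Assumes feature_2 shares no elements with feature_1, so the returned
    set has the same size as the original feature_1.
    """
    pool = feature_1 + feature_2
    ranked = sorted(pool, key=lambda f: scores[f], reverse=True)
    return ranked[: len(pool) - len(feature_2)]


feature_1 = ["a", "b", "c"]
feature_2 = ["x", "y"]
scores = {"a": 4, "b": -1, "c": 2, "x": 7, "y": 0}
print(refresh(feature_1, feature_2, scores))  # ['x', 'a', 'c']
```

Two features flow in and two are deleted, so the set stays at three features while the low-scoring "b" is replaced by the higher-scoring "x".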
FIG. 2 shows a flowchart of a feature extraction method based on a sorting algorithm according to an embodiment of the present invention. As shown in FIG. 2, the method includes the following steps:
S210: storing, in the database unit 110, a specific number of features, each including one or more pieces of behavior information, these features being designated feature 1;
S220: performing, by the feature inflow unit 120, feature extraction on one or more black samples to generate corresponding features, and storing them in the database unit, these features being designated feature 2;
S230: performing, by the extraction unit 130, feature extraction on one or more samples to be detected to generate corresponding sample features;
S240: generating, by the detection unit 140, a detection result by comparing any feature element f of the received feature 1 and feature 2 against the sample features for verification;
S250: sorting, by the sorting unit 150, the one or more features in an order based on the ranking-function result of the detection result.
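Steps S210 to S250 can be strung together in a compact sketch. It rests on several assumptions not stated in the patent: features are strings, step S230 is represented by sample feature sets that have already been extracted, matching is set membership, and the score used for sorting is S(f) = S + n - m with a default initial score of 0. The function name `run_round` and the example data are hypothetical.

```python
def run_round(
    feature_1: list[str],
    feature_2: list[str],
    samples: list[tuple[set[str], bool]],
    initial_score: int = 0,
) -> list[tuple[str, int]]:
    """One verification round; returns (feature, score) pairs, best first.

    samples -- (sample_features, is_black) pairs; the sets stand in for S230.
    """
    # S210 + S220: stored features plus newly inflowing ones, de-duplicated in order.
    candidates = list(dict.fromkeys(feature_1 + feature_2))
    scored = []
    for f in candidates:
        # S240: compare f against every sample's features.
        n = sum(1 for feats, is_black in samples if is_black and f in feats)
        m = sum(1 for feats, is_black in samples if not is_black and f in feats)
        scored.append((f, initial_score + n - m))
    # S250: sort by score, descending.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


samples = [({"a", "x"}, True), ({"a"}, True), ({"x", "b"}, False), ({"b"}, False)]
result = run_round(["a", "b"], ["x"], samples)
print(result)  # [('a', 2), ('x', 0), ('b', -2)]
```

In this example "a" detects two black samples with no false alarms and ranks first, while "b" matches only white samples and falls to the bottom, where a deletion step would remove it.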
In some embodiments, the ranking function further comprises a scoring function S(f) calculated by the following formula:
S(f) = S + n - m
where S is the initial score of any feature element f, n is the number of black samples among the sample features that the feature element f verifies, and m is the number of white samples among the sample features that the feature element f verifies as black samples.
In some embodiments, the sorting unit 150 can sort in descending or ascending order using the ranking function.
In some embodiments, the method further comprises:
S260: deleting, by the deletion unit 160, one or more features based on the sorting, so as to form the specific number of features.
In some embodiments, the method further comprises:
deleting, by the deletion unit 160, one or more features based on the value of m, wherein, if m > 3, the corresponding feature element f is deleted.
Specifically, when the number m of white samples detected exceeds 3, the deletion unit 160 deletes the corresponding feature element f directly from the feature set, and the deleted feature no longer takes part in sorting.
In some embodiments, the deletion unit 160 deletes, based on the sorting, a preset number of features ranked last or first, so as to form the specific number of features.
Specifically, the number of deletions may be set in advance by the user according to the production environment.
In another embodiment, the number of deleted features equals the number of features in feature 2. Deleting as many features as feature 2 contains yields a new feature 1 set whose size is guaranteed to equal the specific number set by the user.
In summary, in the disclosed embodiments, large numbers of black and white samples are continuously fed in; the features extracted from them are used to test every feature in the feature library by comparison; the detection results are ranked and culled using the sorting algorithm; and features are deleted periodically. The detection and false-alarm behavior of each verified feature can thus be obtained quickly and effectively, the features in the feature library are continually refined, the efficiency of feature extraction improves, and high-quality features can be extracted.
The invention has thus been described with reference to the preferred embodiments. It should be understood by those skilled in the art that various other changes, substitutions, and additions may be made without departing from the spirit and scope of the invention. The scope of the invention is therefore not limited to the particular embodiments described above, but rather should be determined by the claims that follow.

Claims (12)

1. A system for feature extraction based on a ranking algorithm, comprising:
a database unit configured to store a specific number of features, each including one or more pieces of behavior information, wherein the features are designated feature 1;
a feature inflow unit configured to perform feature extraction on one or more black samples to generate corresponding features, and to store the corresponding features in the database unit, wherein the corresponding features are designated feature 2;
an extraction unit configured to perform feature extraction on one or more samples to be detected to generate corresponding sample features;
a detection unit configured to generate a detection result by comparing any feature element f of the received feature 1 and feature 2 against the sample features for verification; and
a sorting unit configured to sort the one or more features in an order based on a ranking function result of the detection result.
2. The system of claim 1, wherein the ranking function further comprises a scoring function S(f) calculated by the following formula:
S(f) = S + n - m
where S is the initial score of the arbitrary feature element f, n is the number of black samples among the sample features that the feature element f verifies, and m is the number of white samples among the sample features that the feature element f verifies as black samples.
3. The system of claim 1, wherein the sorting unit is capable of sorting in a descending order or an ascending order using the ranking function.
4. The system of any one of claims 1 to 3, further comprising:
a deletion unit configured to delete one or more features based on the sorting, so as to form the specific number of features.
5. The system of claim 2, further comprising:
a deletion unit configured to delete one or more features based on the value of m, wherein, if m > 3, the corresponding feature element f is deleted.
6. The system according to claim 4, wherein the deletion unit is configured to delete, based on the sorting, a preset number of features ranked last or first, so as to form the specific number of features.
7. A feature extraction method using the ranking algorithm based feature extraction system as claimed in claim 1, comprising:
storing, in the database unit, a specific number of features, each including one or more pieces of behavior information, wherein the features are designated feature 1;
performing, by the feature inflow unit, feature extraction on one or more black samples to generate corresponding features, and storing the corresponding features in the database unit, wherein the corresponding features are designated feature 2;
performing, by the extraction unit, feature extraction on one or more samples to be detected to generate corresponding sample features;
generating, by the detection unit, a detection result by comparing any feature element f of the received feature 1 and feature 2 against the sample features for verification; and
sorting, by the sorting unit, the one or more features in an order based on the ranking function result of the detection result.
8. The method of claim 7, wherein the ranking function further comprises a scoring function S(f) calculated by the following formula:
S(f) = S + n - m
where S is the initial score of the arbitrary feature element f, n is the number of black samples among the sample features that the feature element f verifies, and m is the number of white samples among the sample features that the feature element f verifies as black samples.
9. The method of claim 7, wherein the sorting unit is capable of sorting in descending or ascending order using the ranking function.
10. The method of any of claims 7 to 9, further comprising:
deleting, by a deletion unit, one or more features based on the sorting, so as to form the specific number of features.
11. The method of claim 8, further comprising:
deleting, by a deletion unit, one or more features based on the value of m, wherein, if m > 3, the corresponding feature element f is deleted.
12. The method according to claim 10, wherein the deletion unit deletes, based on the sorting, a preset number of features ranked last or first, so as to form the specific number of features.
CN201610563595.0A 2016-07-18 2016-07-18 Feature extraction system and method based on sorting algorithm Active CN106548069B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610563595.0A CN106548069B (en) 2016-07-18 2016-07-18 Feature extraction system and method based on sorting algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610563595.0A CN106548069B (en) 2016-07-18 2016-07-18 Feature extraction system and method based on sorting algorithm

Publications (2)

Publication Number Publication Date
CN106548069A CN106548069A (en) 2017-03-29
CN106548069B true CN106548069B (en) 2020-04-24

Family

ID=58367803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610563595.0A Active CN106548069B (en) 2016-07-18 2016-07-18 Feature extraction system and method based on sorting algorithm

Country Status (1)

Country Link
CN (1) CN106548069B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101923617A (en) * 2010-08-18 2010-12-22 奇智软件(北京)有限公司 Cloud-based sample database dynamic maintaining method
CN101984450A (en) * 2010-12-15 2011-03-09 北京安天电子设备有限公司 Malicious code detection method and system
CN103632091A (en) * 2012-08-21 2014-03-12 腾讯科技(深圳)有限公司 Malicious feature extraction method and device and storage media
CN103761476A (en) * 2013-12-30 2014-04-30 北京奇虎科技有限公司 Characteristic extraction method and device
CN104700033A (en) * 2015-03-30 2015-06-10 北京瑞星信息技术有限公司 Virus detection method and virus detection device
CN105743877A (en) * 2015-11-02 2016-07-06 哈尔滨安天科技股份有限公司 Network security threat information processing method and system

Also Published As

Publication number Publication date
CN106548069A (en) 2017-03-29

Similar Documents

Publication Publication Date Title
US10303874B2 (en) Malicious code detection method based on community structure analysis
US9715588B2 (en) Method of detecting a malware based on a white list
KR101162051B1 (en) Using string comparison malicious code detection and classification system and method
WO2019128529A1 (en) Url attack detection method and apparatus, and electronic device
US10212114B2 (en) Systems and methods for spam detection using frequency spectra of character strings
US8954519B2 (en) Systems and methods for spam detection using character histograms
US10789366B2 (en) Security information management system and security information management method
CN105224600B (en) A kind of detection method and device of Sample Similarity
US20120174227A1 (en) System and Method for Detecting Unknown Malware
CN111382430A (en) System and method for classifying objects of a computer system
EP3346664B1 (en) Binary search of byte sequences using inverted indices
CN103020521B (en) Wooden horse scan method and system
Naik et al. Cyberthreat Hunting-Part 1: triaging ransomware using fuzzy hashing, import hashing and YARA rules
WO2020082763A1 (en) Decision trees-based method and apparatus for detecting phishing website, and computer device
Naik et al. Augmented YARA rules fused with fuzzy hashing in ransomware triaging
KR20130071617A (en) System and method for detecting variety malicious code
WO2018047027A1 (en) A method for exploring traffic passive traces and grouping similar urls
US11487876B1 (en) Robust whitelisting of legitimate files using similarity score and suspiciousness score
CN106548069B (en) Feature extraction system and method based on sorting algorithm
US20230104884A1 (en) Method for detecting webpage spoofing attacks
Yazhmozhi et al. Natural language processing and Machine learning based phishing website detection system
CN107203718B (en) Detection method and system for SQL command injection
CN111970272A (en) APT attack operation identification method
CN103501294A (en) Method for judging whether program is malicious or not
Han Detection of web application attacks with request length module and regex pattern analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100080 Beijing city Haidian District minzhuang Road No. 3, Tsinghua Science Park Building 1 Yuquan Huigu a

Applicant after: Beijing ahtech network Safe Technology Ltd

Address before: 100080 Zhongguancun Haidian District street, No. 14, layer, 1 1415-16

Applicant before: Beijing Antiy Electronic Installation Co., Ltd.

GR01 Patent grant