CN114969725A - Target command identification method and device, electronic equipment and readable storage medium - Google Patents

Target command identification method and device, electronic equipment and readable storage medium

Info

Publication number
CN114969725A
CN114969725A
Authority
CN
China
Prior art keywords
classification
positive
command
word
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210404482.1A
Other languages
Chinese (zh)
Inventor
黄健文
丁奕
朱林
陈秋华
廖志芳
苏丽裕
董钟豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Internet Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Internet Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Internet Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202210404482.1A priority Critical patent/CN114969725A/en
Publication of CN114969725A publication Critical patent/CN114969725A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/554Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection
    • G06F21/563Static detection by source code analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a target command identification method, which belongs to the field of computer security and is used to improve the efficiency, accuracy and generalization capability of target command detection. The method comprises the following steps: constructing positive and negative samples of a classification training set according to a plurality of operation codes in an operation command; establishing a feature word library corresponding to the positive and negative samples according to the positive and negative samples; determining the operation codes and the parameters carried by the operation codes to construct word features, and expanding the feature word library corresponding to the positive and negative samples with the word features that meet a predetermined requirement; constructing a classification library according to the expanded feature word library; training a classification model through the classification library; and inputting a target command into the classification model for classification to obtain the type of the target command.

Description

Target command identification method and device, electronic equipment and readable storage medium
Technical Field
The application belongs to the field of computer security, and particularly relates to a target command identification method and device, electronic equipment and a computer-readable storage medium.
Background
With the development of the internet, technology for recognizing commands has become increasingly important. The internet has brought great convenience to people, but it has also given attackers a way to attack users' computer devices. Attackers typically employ operation commands that can be run on a computer device as their primary means of attack. Since a large amount of confidential data is often stored on computer devices, a successful attack usually causes very serious loss to the user. Accordingly, there is a need for techniques that identify malicious commands. In the related art, the method for detecting operation commands relies mainly on detection techniques based on signature scanning.
The related art has the following disadvantages: the detection efficiency is low, the false alarm rate is high, and the generalization capability is not high.
Disclosure of Invention
The embodiment of the application provides a target command identification method and device, electronic equipment and a computer readable storage medium, which can improve the efficiency, accuracy and generalization capability of target command detection.
In a first aspect, an embodiment of the present application provides a target command identification method, where the method includes: constructing positive and negative samples of a classification training set according to a plurality of operation codes in an operation command; establishing a feature word library corresponding to the positive and negative samples according to the positive and negative samples; determining the operation codes and the parameters carried by the operation codes to construct word features, and expanding the feature word library corresponding to the positive and negative samples with the word features that meet a predetermined requirement; constructing a classification library according to the expanded feature word library; training a classification model through the classification library; and inputting a target command into the classification model for classification to obtain the type of the target command.
In a second aspect, an embodiment of the present application provides an apparatus for target command identification, including: the sample construction module is used for constructing positive and negative samples of the classification training set according to a plurality of operation codes in the operation command; the word stock construction module is used for establishing a feature word stock corresponding to the positive and negative samples according to the positive and negative samples; the word bank expanding module is used for determining the operation codes and the parameters carried by the operation codes to construct word features and for expanding the feature word bank corresponding to the positive and negative samples with the word features that meet a predetermined requirement; the classification library construction module is used for constructing a classification library according to the expanded feature word library; a training module for training the classification model through the classification library; and the classification module is used for inputting the target command into the classification model for classification to obtain the type of the target command.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In the embodiment of the application, positive and negative samples of a classification training set are constructed according to a plurality of operation codes in an operation command; establishing a feature word library corresponding to the positive and negative samples according to the positive and negative samples; determining the operation codes and the parameters carried by the operation codes to construct word features, and expanding the feature word library corresponding to the positive and negative samples with the word features that meet a predetermined requirement; constructing a classification library according to the expanded feature word library; training the classification model through the classification library; and inputting the target command into the classification model for classification to obtain the type of the target command. Thus, the efficiency, accuracy and generalization capability of target command detection can be improved.
Drawings
Fig. 1 is a schematic flowchart of a target command identification method according to an embodiment of the present disclosure;
FIG. 2 is a schematic block diagram of a target command recognition apparatus according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
FIG. 4 is a schematic structural diagram of a classification model according to another embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
The terms "first", "second" and the like in the description and in the claims of the present application are used to distinguish between similar elements and are not necessarily used to describe a particular sequential or chronological order. It should be understood that data used in this way may be interchanged under appropriate circumstances, so that the embodiments of the application can be practiced in sequences other than those illustrated or described herein. The terms "first", "second" and the like are generally used in a generic sense and do not limit the number of objects; for example, the first object can be one or more than one. In addition, "and/or" in the description and claims means at least one of the connected objects, and the character "/" generally indicates that the related objects before and after it are in an "or" relationship.
The target command identification method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
Fig. 1 shows a schematic flow diagram of a target command recognition method 100 provided by an embodiment of the present invention, which may be performed by an electronic device, and the electronic device may include: a server and/or a terminal device. In other words, the method may be performed by software or hardware installed in the electronic device, the method comprising the steps of:
s102: and constructing positive and negative samples of the classification training set according to a plurality of operation codes in the operation command.
In one implementation, the target command identification method establishes a sample to be extracted according to an operation code of the operation command; and recalling the operation command through the sample to be extracted, and dividing the type of the operation command into a positive sample or a negative sample.
The operation code is the part of an instruction, or a field (usually represented by a code), specified in a computer program to perform an operation, i.e. an instruction sequence number that tells the CPU which instruction needs to be executed. Establishing the sample to be extracted from the operation codes therefore facilitates classifying the type of the operation command.
The operation command is recalled through the sample to be extracted, and the matched operation command is a positive sample or a negative sample. Preferably, the matched operation commands are divided into primary positive samples, secondary positive samples, tertiary positive samples and negative samples. For example, the level of a positive sample is divided into primary, secondary and tertiary according to the degree of damage the positive sample can cause, so that the sample feature word library includes primary positive samples, secondary positive samples, tertiary positive samples and negative samples. In this way, malicious commands can be recognized, and the level of a recognized malicious command allows early warning and handling according to the different levels.
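A minimal Python sketch of this recall-and-label step is given below; the per-level opcode signature sets and the function names are hypothetical placeholders, since the filing does not specify which operation codes belong to which level.

```python
# Hypothetical per-level opcode sets; the filing does not fix which opcodes map to which level.
LEVEL_SIGNATURES = {
    "primary_positive": {"mkfs", "dd"},        # assumed highest-damage opcodes
    "secondary_positive": {"chmod", "chown"},  # assumed medium-damage opcodes
    "tertiary_positive": {"curl", "wget"},     # assumed lower-damage opcodes
}

def label_command(op_codes):
    """Recall a command against each level's sample set and return its label."""
    ops = set(op_codes)
    for label, signatures in LEVEL_SIGNATURES.items():
        if ops & signatures:        # command recalled by this level's sample set
            return label
    return "negative"               # not recalled by any positive-sample set

print(label_command(["curl", "bash"]))   # tertiary_positive
print(label_command(["ls", "cd"]))       # negative
```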
S104: and establishing a feature word library corresponding to the positive and negative samples according to the positive and negative samples.
In one implementation, the target command identification method performs feature extraction on the sample to be extracted by using a predetermined algorithm of a variable window; and establishing a feature word library according to the result extracted by the preset algorithm of the variable window.
Feature extraction can be performed on the sample to be extracted with an N-gram algorithm, for example a 4-gram algorithm. In the 4-gram algorithm, 4 is the length of the sliding window, chosen to ensure the quantity and quality of the extracted features. Within one sliding window, a short operation sequence of the operation command is extracted, which captures some of the program semantics of the operation command.
Assuming that the sample to be extracted is { call, push, add, mov, adc, anc }, the obtained operation sequence is { (call, push, add, mov), (push, add, mov, adc), (add, mov, adc, anc) }.
However, the number of operation codes that best characterizes different target commands may differ, so a fixed N value leads to poor generalization of the final result. This scheme therefore provides an extraction method with a variable N value, which improves the generalization performance of malicious command identification.
For example, take N=4 and N=5 in the N-gram algorithm. Assuming that the sample to be extracted is { call, push, add, mov, adc, anc }, when N=4 the obtained operation sequence is { (call, push, add, mov), (push, add, mov, adc), (add, mov, adc, anc) }; when N=5 the obtained operation sequence is { (call, push, add, mov, adc), (push, add, mov, adc, anc) }. Combining the two results gives the operation sequence { (call, push, add, mov), (push, add, mov, adc), (add, mov, adc, anc), (call, push, add, mov, adc), (push, add, mov, adc, anc) }. Therefore, to improve the generalization capability for the target command, this scheme combines the window lengths 4 and 5, which effectively reduces the possibility of missing codes, as the sketch below illustrates.
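As an illustration of the variable-window extraction described above, the following Python sketch (not part of the original filing) combines the N-gram sequences obtained for several window lengths; the default window lengths (4, 5) come from the example above, while the function names are illustrative.

```python
def ngrams(op_codes, n):
    """Length-n sliding-window subsequences of an opcode sequence."""
    return [tuple(op_codes[i:i + n]) for i in range(len(op_codes) - n + 1)]

def variable_ngrams(op_codes, window_lengths=(4, 5)):
    """Combine the N-gram sequences obtained for several window lengths."""
    combined = []
    for n in window_lengths:
        combined.extend(ngrams(op_codes, n))
    return combined

sample = ["call", "push", "add", "mov", "adc", "anc"]
for seq in variable_ngrams(sample):
    print(seq)
# ('call', 'push', 'add', 'mov') ... ('push', 'add', 'mov', 'adc', 'anc')
```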
A further problem of the related art is that manually selecting the N value of the N-gram algorithm cannot adapt the window length dynamically to the sample to be detected; detecting samples with a manually fixed window length likewise prevents dynamic adjustment.
Therefore, this scheme provides a detection method that dynamically adjusts the length of the detection window. Its core idea is to adjust the window length based on the current window length and the number of screened-out character strings that fail to match at that length. The specific window-length adjustment formula (reproduced as an image in the original publication) relates the character string set S, the character string P obtained after the query, the final window length Q_top-k, and the expected window length τ'_top-k.
The top-k value can be obtained by ranking with a related similarity calculation; the expected top-k value can be obtained by an expectation-based calculation or set from human experience.
In this way, the dynamically adjusted window length ensures that as few similar character strings as possible are missed.
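The exact adjustment formula is only available as an image in the original filing, so the following Python sketch is an assumed reading rather than the patent's algorithm: it scores each candidate window length by the number of extracted windows that fail to match the reference string set S and keeps the top-k lengths. The candidate range, the scoring rule and the function names are all assumptions.

```python
def unmatched_count(op_codes, n, reference_set):
    """Number of length-n windows that do not match any string in the set S."""
    windows = [tuple(op_codes[i:i + n]) for i in range(len(op_codes) - n + 1)]
    return sum(1 for w in windows if w not in reference_set)

def choose_window_lengths(op_codes, reference_set, candidates=(3, 4, 5), top_k=2):
    """Keep the top_k candidate window lengths with the fewest unmatched windows."""
    ranked = sorted(candidates, key=lambda n: unmatched_count(op_codes, n, reference_set))
    return ranked[:top_k]

S = {("call", "push", "add"), ("push", "add", "mov", "adc")}
print(choose_window_lengths(["call", "push", "add", "mov", "adc", "anc"], S))  # [4, 5]
```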
S106: Determining the operation codes and the parameters carried by the operation codes to construct word features, and expanding the feature word library corresponding to the positive and negative samples with the word features that meet a predetermined requirement.
In one implementation manner, the target command identification method extracts an operation code of the operation command and a parameter carried by the operation code; extracting target parameters with carrying frequency larger than a target value, and expanding a feature word library of the operation code according to the target parameters, wherein the carrying frequency is determined according to the frequency of the operation code appearing in the operation command and the frequency of the parameter appearing in the operation command.
For the same operation code, although different parameters can express quite different meanings, a malicious command tends to call certain parameters repeatedly. A more preferable scheme therefore uses this observation as the basis for expanding the feature word library.
Taking a positive sample as an example, a single operation code and the parameters carried by that operation code are extracted from the malicious commands, and a carrying frequency f is defined as

f = S_1 / S_a

where S_a is the number of times the operation code appears in the malicious commands and S_1 is the number of times a given parameter appears in the malicious commands. Parameters whose carrying frequency f is larger than a certain value are extracted and expanded into the feature word library of the operation code.
For the aforementioned parameters, the same parameter appearing under different operation codes needs to be distinguished and treated as a different feature word.
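Reading the carrying frequency as the ratio f = S_1 / S_a implied by the definitions above, a minimal Python sketch of the lexicon expansion could look as follows; the threshold value and the opcode::parameter naming convention used to keep the same parameter distinct under different operation codes are assumptions.

```python
from collections import Counter

def expand_feature_lexicon(opcode_param_pairs, threshold=0.5):
    """
    opcode_param_pairs: (operation code, parameter) pairs extracted from malicious commands.
    Returns, per operation code, the parameters whose carrying frequency f exceeds the threshold,
    with f = (occurrences of the parameter with that opcode) / (occurrences of the opcode).
    """
    opcode_counts = Counter(op for op, _ in opcode_param_pairs)
    pair_counts = Counter(opcode_param_pairs)
    lexicon = {}
    for (op, param), n in pair_counts.items():
        f = n / opcode_counts[op]
        if f > threshold:
            # the same parameter under a different opcode becomes a distinct feature word
            lexicon.setdefault(op, set()).add(f"{op}::{param}")
    return lexicon

pairs = [("curl", "-O"), ("curl", "-O"), ("curl", "-s"), ("chmod", "+x")]
print(expand_feature_lexicon(pairs))   # {'curl': {'curl::-O'}, 'chmod': {'chmod::+x'}}
```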
S108: and constructing a classification library according to the expanded feature word library.
In one implementation, the target command identification method recalls the samples matched to the operation sequences and the operation code parameters through the feature word library; performs word segmentation on the samples, completes vectorization coding, and takes the mean value to obtain a sentence vector; and constructs a classification library based on the sentence vectors and target labels, where the target labels comprise primary positive samples, secondary positive samples, tertiary positive samples and negative samples.
The samples matched to the operation sequences and the operation code parameters are recalled through the feature word library, and the samples together with their labels form the classification library. The labels include positive samples and negative samples. Optionally, the labels include primary positive samples, secondary positive samples, tertiary positive samples and negative samples.
Each label has a plurality of samples therein; correspondingly, each sample has a sentence vector, wherein the sentence vector is obtained by performing word segmentation on each sample in the classification library, completing vectorization coding and finally taking the average value.
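A minimal sketch, assuming a pre-computed token-to-vector lookup table (the filing does not fix how the word vectors are obtained), of building sentence vectors by mean pooling and assembling the classification library:

```python
import numpy as np

def sentence_vector(tokens, word_vectors, dim=64):
    """Average the vectors of the segmented tokens; unknown tokens fall back to zero vectors."""
    vecs = [word_vectors.get(tok, np.zeros(dim)) for tok in tokens]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def build_classification_library(samples, word_vectors, dim=64):
    """samples: list of (token_list, label) pairs recalled by the feature word library."""
    sentence_vectors = np.stack([sentence_vector(toks, word_vectors, dim) for toks, _ in samples])
    labels = [label for _, label in samples]
    return sentence_vectors, labels

# toy usage with a hypothetical lookup table
word_vectors = {"curl": np.ones(64), "-O": np.full(64, 0.5)}
library, labels = build_classification_library(
    [(["curl", "-O"], "primary_positive"), (["ls"], "negative")], word_vectors)
print(library.shape, labels)   # (2, 64) ['primary_positive', 'negative']
```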
S110: training the classification model through the classification library.
In one implementation, the target command recognition method constructs a first classification model, wherein the first classification model comprises a feature extraction model and a second classification model; inputting an operation command as a text to be classified into the feature extraction model, and outputting the extracted first feature; classifying the first features by using the second classification model to obtain the type of the text to be classified; calculating loss information according to the type of the sample to be classified and the preset type of the text to be classified; and updating parameters of the first classification model based on the loss information to obtain a third classification model.
A classification model is constructed that comprises a feature extraction model and a second classification model. The feature extraction model can be an LSTM, CNN, GRU or similar model; here a BI-LSTM is used. The classification model adopts am-softmax.
The training process is as follows:
the feature extraction model takes an input operation command as a text to be classified, and the output result is y, namely the extracted features. And then classifying the extracted features by using am-soft to obtain the type (malicious command or non-malicious command) of the sample to be classified.
The model training process is calculated according to the following principle that the loss information of the text sample is minimized:
p = am-softmax(⟨y, c_1⟩, ⟨y, c_2⟩, ..., ⟨y, c_n⟩)

together with the loss formula reproduced as an image in the original publication, where c_i is the sentence vector of each sample obtained in step 3.
The trained model is obtained once training is finished.
The structure of the first classification model is shown in fig. 4, where X0, X1, X2 and X3 are the vectors obtained by extracting features of the samples to be classified in step 1 and step 2. The vectors are encoded by the BI-LSTM model and concatenated into the output (i.e. the output result y); the concatenated output is passed through a fully connected (FC) layer, and the am-softmax classifier is used to calculate similarity, thereby determining whether a sample to be classified is a malicious command.
More specifically, the training process is as follows:
1. inputting feature texts of which the categories (primary positive samples, secondary positive samples, tertiary positive samples and negative samples) are explicitly classified;
2. obtaining vector codes of the feature texts through a bidirectional LSTM network;
3. inputting the vector code to a classifier am-softmax;
4. the classifier matches the corresponding vector code (i.e., parameters) based on the known classification category, completing the training of the classifier.
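The filing names a BI-LSTM encoder and an am-softmax classifier but gives no hyperparameters, so the PyTorch sketch below is only an assumed instantiation of this training loop: the embedding and hidden sizes, the mean pooling over time steps, and the scale s and margin m of the additive-margin softmax are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLstmAmSoftmax(nn.Module):
    """BI-LSTM encoder followed by an additive-margin softmax classifier (illustrative sketch)."""

    def __init__(self, vocab_size, embed_dim=64, hidden=128, feat_dim=128,
                 num_classes=4, s=30.0, m=0.35):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, feat_dim)                  # FC layer after the BI-LSTM output
        self.class_centers = nn.Parameter(torch.randn(num_classes, feat_dim))  # c_1 ... c_n
        self.s, self.m = s, m

    def forward(self, token_ids, labels=None):
        h, _ = self.lstm(self.embed(token_ids))                    # (batch, seq_len, 2*hidden)
        y = self.fc(h.mean(dim=1))                                 # pooled feature vector y
        cos = F.normalize(y, dim=1) @ F.normalize(self.class_centers, dim=1).T
        if labels is None:
            return cos                                             # similarity to each class center
        margin = F.one_hot(labels, cos.size(1)).float() * self.m
        return F.cross_entropy(self.s * (cos - margin), labels)    # AM-softmax loss

# One illustrative training step on toy data.
model = BiLstmAmSoftmax(vocab_size=1000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
tokens = torch.randint(0, 1000, (8, 12))    # 8 samples of 12 tokens each
labels = torch.randint(0, 4, (8,))          # 4 classes: three positive levels + negative
loss = model(tokens, labels)
loss.backward()
optimizer.step()
```

Calling the model without labels returns the cosine similarity of the feature y to each class center, which matches the similarity-based classification used at inference time.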
S112: and inputting the target command into the classification model for classification to obtain the type of the target command.
In one implementation, the target command identification method performs word segmentation according to the target command to complete vectorization coding, and obtains a sentence vector by taking a mean value; and inputting the sentence vector into the trained third classification model for similarity calculation to obtain the classification type of the sample to be detected.
A sentence vector is obtained from the sample to be detected and input into the trained model to obtain the type of the operation command. The specific process is as follows:
1. inputting a text to be classified;
2. obtaining vector codes of texts to be classified through a bidirectional LSTM network;
3. inputting the vector code to a classifier am-softmax;
4. and sorting the matching results of similarity calculation and defining the finally classified category through the trained corresponding relation between the codes and the categories and the similarity calculation in the classifier.
The specific principle of classification according to similarity is as follows:
and judging the type of the input operation command according to the obtained similarity: judging that the similarity of a first-level (or second-level or third-level) positive sample with the malicious command exceeds a preset positive similarity threshold, and the similarity of a negative sample with the malicious command is lower than a preset negative similarity threshold, and judging the command as a first-level (or second-level or third-level) malicious command; on the contrary, if the similarity of the negative samples with the malicious command exceeds the preset negative similarity threshold, and the similarity of the positive samples with all levels (primary, secondary and tertiary) of the malicious command is lower than the preset positive similarity threshold, the malicious command is judged to be a non-malicious command.
According to the target command identification method provided by the embodiment of the invention, positive and negative samples of a classification training set are constructed according to a plurality of operation codes in an operation command; a feature word library corresponding to the positive and negative samples is established according to the positive and negative samples; the operation codes and the parameters carried by the operation codes are determined to construct word features, and the feature word library corresponding to the positive and negative samples is expanded with the word features that meet a predetermined requirement; a classification library is constructed according to the expanded feature word library; the classification model is trained through the classification library; and the target command is input into the classification model for classification to obtain the type of the target command. Thus, the efficiency, accuracy and generalization capability of target command detection can be improved.
Fig. 2 is a schematic structural diagram of a target command recognition apparatus according to an embodiment of the present invention. As shown in fig. 2, the target command recognition apparatus 200 includes: the system comprises a sample construction module 202, a word bank construction module 204, a word bank expansion module 206, a classification bank construction module 208, a training module 210 and a classification module 212.
The sample construction module 202 is configured to construct positive and negative samples of the classification training set according to the multiple operation codes in the operation command;
a word bank construction module 204, configured to establish a feature word bank corresponding to the positive and negative samples according to the positive and negative samples;
a word bank expanding module 206, configured to determine the operation codes and the parameters carried by the operation codes to construct word features, and to expand the feature word bank corresponding to the positive and negative samples with the word features that meet a predetermined requirement;
a classification library construction module 208, configured to construct a classification library according to the expanded feature word library;
a training module 210, configured to train the classification model through the classification library;
and the classification module 212 is configured to input the target command into the classification model for classification, so as to obtain the type of the target command.
In one implementation, the sample construction module 202 is configured to construct a sample to be extracted according to an operation code of the operation command; and recalling the operation command through the sample to be extracted, and dividing the type of the operation command into a positive sample or a negative sample.
In one implementation, the feature lexicon constructing module 204 is configured to perform feature extraction on the sample to be extracted by using a predetermined algorithm of a variable window; and establishing a feature word library according to the result extracted by the preset algorithm of the variable window.
In one implementation, the lexicon expansion module 206 is configured to extract an operation code of the operation command and a parameter carried by the operation code; extracting a target parameter with a carrying frequency greater than a target numerical value, and expanding a feature lexicon of the operation code according to the target parameter, wherein the carrying frequency is determined according to the frequency of the operation code appearing in the operation command and the frequency of the parameter appearing in the operation command.
In one implementation, the classification library construction module 208 is configured to recall the sample matched to the operation sequence and the operation code parameter through the feature thesaurus; performing word segmentation on the sample, completing vectorization coding, and taking a mean value to obtain a sentence vector; and constructing a classification library based on the sentence vectors and the target labels, wherein the target labels comprise primary positive samples, secondary positive samples, tertiary positive samples and negative samples.
In one implementation, the training module 210 is configured to construct a first classification model, where the first classification model includes a feature extraction model and a second classification model; inputting an operation command as a text to be classified into the feature extraction model, and outputting the extracted first feature; classifying the first features by using the second classification model to obtain the type of the text to be classified; calculating loss information according to the type of the sample to be classified and the preset type of the text to be classified; and updating parameters of the first classification model based on the loss information to obtain a third classification model.
In one implementation, the classification module 212 is configured to perform word segmentation according to the target command, complete vectorization coding, and obtain a sentence vector by taking a mean value; and inputting the sentence vector into the trained third classification model for similarity calculation to obtain the classification type of the sample to be detected.
The target command recognition device in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The target command recognition device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an ios operating system, or other possible operating systems, which is not specifically limited in the embodiment of the present application.
The target command identifying device provided in the embodiment of the present application can implement each process implemented in the method embodiment of fig. 1, and is not described here again to avoid repetition.
Fig. 3 is a schematic diagram of a hardware structure of an electronic device to implement the embodiment of the present application, where the electronic device may be a terminal device or a server device, and the electronic device includes: antenna 301, radio frequency device 302, baseband device 303, network interface 304, memory 305 and processor 306, programs or instructions stored on memory 305 and executable on said processor 306, which when executed by processor 306, implement:
wherein, the processor 306 is configured to construct positive and negative samples of the classification training set according to a plurality of operation codes in the operation command; establishing a feature word library corresponding to the positive and negative samples according to the positive and negative samples; determining the operation codes and the parameters carried by the operation codes to construct word features, and expanding the feature word library corresponding to the positive and negative samples with the word features that meet a predetermined requirement; constructing a classification library according to the expanded feature word library; training the classification model through the classification library; and inputting the target command into the classification model for classification to obtain the type of the target command.
In one implementation, the processor 306 is configured to establish a sample to be extracted according to an operation code of the operation command; and through a sample recall operation command to be extracted, dividing the type of the operation command into a positive sample or a negative sample.
In one implementation, the processor 306 is configured to perform feature extraction on the sample to be extracted by using a predetermined algorithm of a variable window; and establishing a feature word library according to the result extracted by the preset algorithm of the variable window.
In one implementation, the processor 306 is configured to extract an operation code of the operation command and a parameter carried by the operation code; extracting a target parameter with a carrying frequency greater than a target numerical value, and expanding a feature lexicon of the operation code according to the target parameter, wherein the carrying frequency is determined according to the frequency of the operation code appearing in the operation command and the frequency of the parameter appearing in the operation command.
In one implementation, the processor 306 is configured to recall the samples matched to the operation sequence and the opcode parameters from the feature thesaurus; performing word segmentation on the sample, completing vectorization coding, and taking a mean value to obtain a sentence vector; and constructing a classification library based on the sentence vectors and the target labels, wherein the target labels comprise primary positive samples, secondary positive samples, tertiary positive samples and negative samples.
In one implementation, the processor 306 is configured to construct a first classification model, wherein the first classification model includes a feature extraction model and a second classification model; inputting an operation command as a text to be classified into the feature extraction model, and outputting the extracted first feature; classifying the first features by using the second classification model to obtain the type of the text to be classified; calculating loss information according to the type of the sample to be classified and the preset type of the text to be classified; and updating parameters of the first classification model based on the loss information to obtain a third classification model.
In one implementation, the processor 306 is configured to perform word segmentation according to the target command, complete vectorization coding, and obtain a sentence vector by taking an average value; and inputting the sentence vector into the trained third classification model for similarity calculation to obtain the classification type of the sample to be detected.
The electronic device 300 according to the embodiment of the present application may refer to the process corresponding to the method 100 of the embodiment of the present application, and each unit/module and the other operations and/or functions in the electronic device 300 are respectively intended to implement the corresponding process in the method 100 and can achieve the same or equivalent technical effects; for brevity, no further description is provided herein.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the above target command identification method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the above method embodiment and can achieve the same technical effect; to avoid repetition, details are not repeated here.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method of target command identification, the method comprising:
constructing positive and negative samples of a classification training set according to a plurality of operation codes in the operation command;
establishing a feature word library corresponding to the positive and negative samples according to the positive and negative samples;
determining the operation codes and the parameters carried by the operation codes to construct word features, and expanding the feature word library corresponding to the positive and negative samples with the word features that meet a predetermined requirement;
constructing a classification library according to the expanded feature word library;
training the classification model through the classification library;
and inputting the target command into the classification model for classification to obtain the type of the target command.
2. The method of claim 1, wherein constructing positive and negative samples of a classification training set from a plurality of opcodes in an operation command comprises:
establishing a sample to be extracted according to the operation code of the operation command;
and recalling the operation command through the sample to be extracted, and dividing the type of the operation command into a positive sample or a negative sample.
3. The method of claim 1, wherein the establishing a feature lexicon corresponding to the positive and negative samples according to the positive and negative samples comprises:
performing feature extraction on the sample to be extracted by using a predetermined algorithm of a variable window;
and establishing a feature word library according to the result extracted by the preset algorithm of the variable window.
4. The method of claim 1, wherein the determining the operation code and the parameters carried by the operation code to construct word features and expanding the feature lexicon corresponding to the positive and negative samples through the word features meeting predetermined requirements comprises:
extracting an operation code of the operation command and parameters carried by the operation code;
extracting target parameters with carrying frequency larger than a target value, and expanding a feature word library of the operation code according to the target parameters, wherein the carrying frequency is determined according to the frequency of the operation code appearing in the operation command and the frequency of the parameter appearing in the operation command.
5. The method of claim 1, wherein the constructing a classification library from the expanded feature lexicon comprises:
recalling the sample matched with the operation sequence and the operation code parameter through the feature word bank;
performing word segmentation on the sample, completing vectorization coding, and taking a mean value to obtain a sentence vector;
and constructing a classification library based on the sentence vectors and target labels, wherein the target labels comprise primary positive samples, secondary positive samples, tertiary positive samples and negative samples.
6. The method of claim 1, wherein said training said classification model through said classification library comprises:
constructing a first classification model, wherein the first classification model comprises a feature extraction model and a second classification model;
inputting an operation command as a text to be classified into the feature extraction model, and outputting the extracted first feature;
classifying the first features by utilizing the second classification model to obtain the type of the text to be classified;
calculating loss information according to the type of the sample to be classified and the preset type of the text to be classified;
and updating parameters of the first classification model based on the loss information to obtain a third classification model.
7. The method of claim 1, wherein said entering a target command into said classification model for classification into a type of said target command comprises:
performing word segmentation according to the target command, completing vectorization coding, and taking a mean value to obtain a sentence vector;
and inputting the sentence vector into the trained third classification model for similarity calculation to obtain the classification type of the sample to be detected.
8. An apparatus for target command recognition, comprising:
the sample construction module is used for constructing positive and negative samples of the classification training set according to a plurality of operation codes in the operation command;
the word stock construction module is used for establishing a feature word stock corresponding to the positive and negative samples according to the positive and negative samples;
the word bank expanding module is used for determining the operation codes and the parameters carried by the operation codes to construct word features and for expanding the feature word bank corresponding to the positive and negative samples with the word features that meet a predetermined requirement;
the classification library construction module is used for constructing a classification library according to the expanded feature word library;
a training module for training the classification model through the classification library;
and the classification module is used for inputting the target command into the classification model for classification to obtain the type of the target command.
9. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the target command identification method according to any one of claims 1 to 7.
10. A readable storage medium, on which a program or instructions are stored, which, when executed by a processor, carry out the steps of the target command identification method according to any one of claims 1 to 7.
CN202210404482.1A 2022-04-18 2022-04-18 Target command identification method and device, electronic equipment and readable storage medium Pending CN114969725A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210404482.1A CN114969725A (en) 2022-04-18 2022-04-18 Target command identification method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210404482.1A CN114969725A (en) 2022-04-18 2022-04-18 Target command identification method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN114969725A true CN114969725A (en) 2022-08-30

Family

ID=82976976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210404482.1A Pending CN114969725A (en) 2022-04-18 2022-04-18 Target command identification method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114969725A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344615A (en) * 2018-07-27 2019-02-15 北京奇虎科技有限公司 A kind of method and device detecting malicious commands
CN110414229A (en) * 2019-03-29 2019-11-05 腾讯科技(深圳)有限公司 Operational order detection method, device, computer equipment and storage medium
CN111460148A (en) * 2020-03-27 2020-07-28 深圳价值在线信息科技股份有限公司 Text classification method and device, terminal equipment and storage medium
WO2021217930A1 (en) * 2020-04-30 2021-11-04 深圳壹账通智能科技有限公司 Dissertation classification method and apparatus based on classification model, and electronic device and medium
CN112819023A (en) * 2020-06-11 2021-05-18 腾讯科技(深圳)有限公司 Sample set acquisition method and device, computer equipment and storage medium
CN112507336A (en) * 2020-12-15 2021-03-16 四川长虹电器股份有限公司 Server-side malicious program detection method based on code characteristics and flow behaviors
CN112765428A (en) * 2021-01-15 2021-05-07 济南大学 Malicious software family clustering and identifying method and system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117034275A (en) * 2023-10-10 2023-11-10 北京安天网络安全技术有限公司 Malicious file detection method, device and medium based on Yara engine
CN117034275B (en) * 2023-10-10 2023-12-22 北京安天网络安全技术有限公司 Malicious file detection method, device and medium based on Yara engine

Similar Documents

Publication Publication Date Title
CN109635273B (en) Text keyword extraction method, device, equipment and storage medium
CN109005145B (en) Malicious URL detection system and method based on automatic feature extraction
CN107180084B (en) Word bank updating method and device
CN109831460B (en) Web attack detection method based on collaborative training
CN111460820A (en) Network space security domain named entity recognition method and device based on pre-training model BERT
CN110619051B (en) Question sentence classification method, device, electronic equipment and storage medium
CN111291195A (en) Data processing method, device, terminal and readable storage medium
CN112989035A (en) Method, device and storage medium for recognizing user intention based on text classification
CN111177367B (en) Case classification method, classification model training method and related products
WO2014022172A2 (en) Information classification based on product recognition
WO2023116561A1 (en) Entity extraction method and apparatus, and electronic device and storage medium
CN110909531A (en) Method, device, equipment and storage medium for discriminating information security
CN115617955B (en) Hierarchical prediction model training method, punctuation symbol recovery method and device
CN111428027A (en) Query intention determining method and related device
CN111368529B (en) Mobile terminal sensitive word recognition method, device and system based on edge calculation
CN114780746A (en) Knowledge graph-based document retrieval method and related equipment thereof
CN111506726B (en) Short text clustering method and device based on part-of-speech coding and computer equipment
CN113282754A (en) Public opinion detection method, device, equipment and storage medium for news events
CN112632248A (en) Question answering method, device, computer equipment and storage medium
CN112446209A (en) Method, equipment and device for setting intention label and storage medium
CN114969725A (en) Target command identification method and device, electronic equipment and readable storage medium
CN114925702A (en) Text similarity recognition method and device, electronic equipment and storage medium
CN110705282A (en) Keyword extraction method and device, storage medium and electronic equipment
CN116644183B (en) Text classification method, device and storage medium
CN114254636A (en) Text processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination