CN115718918A

CN115718918A - Malicious npm packet detection method, device, equipment and medium

Info

Publication number: CN115718918A
Application number: CN202211482921.7A
Authority: CN
Inventors: 金成强; 余金开
Original assignee: DBAPPSecurity Co Ltd
Current assignee: DBAPPSecurity Co Ltd
Priority date: 2022-11-24
Filing date: 2022-11-24
Publication date: 2023-02-28

Abstract

The application discloses a method, apparatus, device and medium for detecting malicious npm packages. The method comprises: extracting feature data of an npm package to be detected; inputting the feature data into a classification model to obtain a classification result for the package; and, when the classification result is a normal package, checking the package with preset detection logic to obtain a detection result. This automates the monitoring of npm packages, prevents malicious packages from being marked as normal, and can improve the efficiency and accuracy of malicious npm package detection.

Description

Malicious npm packet detection method, device, equipment and medium

Technical Field

The present application relates to the field of package detection technologies, and in particular, to a method, an apparatus, a device, and a medium for detecting a malicious npm package.

Background

JavaScript has been popular for a long time, and the V8 engine has made it one of the languages used for server-side development. Its package manager, npm (Node Package Manager), hosts a huge number of reusable software packages. The popularity of the community has driven the growth of both npm and Node. However, publishing a new package on npm requires only registering an npm account, and publishing a new version of a package only requires uploading it for synchronization to the registry. Malicious packages have become a major concern in recent years: even if they are discovered and removed quickly, their short-lived presence in the registry can affect many end users.

At present, so many npm packages are published that traditional manual or semi-automatic review cannot keep up. Accuracy also suffers, because normal packages are mistakenly flagged or malicious packages are missed.

Disclosure of Invention

In view of this, an object of the present application is to provide a method, an apparatus, a device and a medium for detecting malicious npm packages that can improve the efficiency and accuracy of malicious npm package detection. The specific scheme is as follows:

In a first aspect, the present application discloses a malicious npm package detection method, including:

extracting feature data of an npm package to be detected;

inputting the feature data into a classification model to obtain a classification result for the npm package to be detected; and

when the classification result is a normal package, detecting the npm package to be detected based on preset detection logic to obtain a detection result.

Optionally, detecting the npm package to be detected based on the preset detection logic to obtain a detection result includes:

reconstructing the source code in the npm package to be detected to obtain a reconstructed file, and determining whether the package is malicious based on the reconstructed file and the package's information in the registry, to obtain the detection result;

or comparing the executable file in the npm package to be detected with known malicious packages to obtain the detection result.

Optionally, extracting the feature data of the npm package to be detected includes:

extracting feature data of the source code or executable file in the npm package to be detected;

and/or extracting feature data of the install-time scripts (scripts run during installation) in the npm package to be detected;

and/or extracting version feature data of the npm package to be detected based on its description file.

Optionally, extracting the feature data of the source code or executable file in the npm package to be detected includes:

extracting feature data of target operations in the source code or executable file, where the target operations comprise operations that acquire the user's personal information, operations that access specific system resources, and operations that call specific APIs;

and/or extracting the Shannon entropy of the source code or executable file.

Optionally, the operations that call specific APIs include operations that call dynamic code generation APIs, and the method includes:

performing regular-expression matching on the source code or executable file in the npm package to be detected and, if it matches a template engine, performing keyword matching on the code generated by the template engine to obtain feature data of the operations that call dynamic code generation APIs.

Optionally, extracting feature data of the install-time scripts in the npm package to be detected includes:

matching the functions used to generate code in the install-time scripts and extracting the content of those function calls to obtain the feature data of the install-time scripts.

Optionally, extracting version feature data of the npm package to be detected based on its description file includes:

extracting semantic features of the description file using preset semantic analysis logic, and obtaining the time interval between the release of the version under detection and the release of the previous version, to obtain the version feature data of the npm package to be detected.

In a second aspect, the present application discloses a malicious npm package detection apparatus, comprising:

a feature data extraction module, configured to extract feature data of an npm package to be detected;

a classification result obtaining module, configured to input the feature data into a classification model to obtain a classification result for the npm package to be detected; and

a detection result obtaining module, configured to, when the classification result is a normal package, detect the npm package to be detected based on preset detection logic to obtain a detection result.

In a third aspect, the present application discloses an electronic device comprising a memory and a processor, wherein:

the memory is configured to store a computer program; and

the processor is configured to execute the computer program to implement the malicious npm package detection method described above.

In a fourth aspect, the present application discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the malicious npm package detection method described above.

In summary, the feature data of the npm package to be detected is first extracted and input into the classification model to obtain a classification result. When the result is a normal package, the package is checked further using the preset detection logic to obtain the detection result. In other words, the classifier labels the package based on its feature data, and every package labeled normal undergoes a secondary check. This automates the monitoring of npm packages, prevents malicious packages from being marked as normal, and can improve the efficiency and accuracy of malicious npm package detection.

Drawings

To illustrate the embodiments of the present application or the prior-art technical solutions more clearly, the drawings used in describing them are briefly introduced below. The drawings show only some embodiments of the present application. Those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a malicious npm package detection method disclosed in the present application;

Fig. 2 is a flowchart of a specific malicious npm package detection method disclosed in the present application;

Fig. 3 is a schematic structural diagram of a malicious npm package detection apparatus disclosed in the present application;

Fig. 4 is a block diagram of an electronic device disclosed in the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from them without creative effort fall within the protection scope of the present application.

First, the relevant terms are explained. npm is the default package manager for the JavaScript runtime Node.js. JavaScript is the most popular scripting language on the web. Besides HTML and web pages, it is widely used on servers, PCs, laptops, tablets, smartphones and other devices. TypeScript is a free, open-source programming language that is a superset of JavaScript and extends its syntax. Tree-sitter is a parser generator tool and incremental parsing library. It builds a concrete syntax tree for a source file, efficiently updates the tree as the file is edited, and supports many programming languages, including Python, Java and C.

npm is one of the pillars of the JavaScript and TypeScript ecosystems. It hosts a very large number of packages, ranging from simple utility libraries to complex frameworks and even entire applications. Every day, developers worldwide submit tens of thousands of updates and publish hundreds of new packages. This popularity has made npm a primary target for attackers. They introduce malware when publishing new packages or updating existing ones, in order to tamper with or exfiltrate sensitive data of users who install these packages or any package that depends on them. Defending against such attacks is critical to the integrity of the software supply chain, but the sheer volume of package updates makes complete manual review impossible.

Attacks against npm can be addressed at the root from two directions. The first is to block malicious packages from being synchronized into the registry, i.e., to enforce a strict review mechanism that checks and removes malicious packages between upload and actual synchronization into the registry. The second is to enforce access control that restricts what a package can do during installation, reducing the attack surface. In the course of making the present application, the inventors found the following:

1. An attacker may publish a malicious npm package using stolen credentials of the package's maintainer, so simply trusting the provider or publisher of an npm package is not adequate.

2. Malicious npm packages often do not disclose their source code to avoid detection, so approaches that only inspect source code may not be suitable.

3. Attackers often publish multiple textually identical copies of the same malicious package under different names. Because the package metadata may differ, conventional classifiers sometimes fail to find these copies.

4. Developers publish a large number of package versions, both new packages and updates to existing ones, in a short period. Traditional manual or semi-automatic review and classification cannot cope with this volume.

5. Because npm packages are updated very quickly, a fast way to classify whether a package is malicious is needed.

To solve these problems, the present application provides a malicious npm package detection scheme that can improve the efficiency and accuracy of malicious npm package detection.

Referring to Fig. 1, an embodiment of the present application discloses a malicious npm package detection method, including:

Step S11: Extract feature data of the npm package to be detected.

In a specific implementation, feature data may be extracted from the source code or executable file in the npm package to be detected, and/or from the install-time scripts in the package. Version feature data may also be extracted based on the package's description file. An npm package can therefore contain source code or executable files, install-time scripts, and a description file. A publisher who does not want to release the source code may ship an executable file instead. The install-time scripts perform tasks such as environment configuration when the package is installed. The description file generally describes the package, for example its functions and what an update changes.

Further, extracting feature data of the source code or executable file may specifically be: extracting feature data of target operations in the source code or executable file, where the target operations comprise operations that acquire the user's personal information, operations that access specific system resources, and operations that call specific APIs; and/or extracting the Shannon entropy of the source code or executable file. Acquiring personal information may mean, for example, obtaining a user's account and password. Accessing specific system resources may include file operations (reading/writing), process creation (spawning a new process), and network requests (sending and receiving data). Calling specific APIs may include calling APIs for cryptographic functions, data encoding, and dynamic code generation. The source code may also be minified, i.e., compressed into short code.
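A minimal sketch of the Shannon entropy feature follows. It computes the entropy of a file's bytes in bits per byte, then aggregates the per-file values into the mean and standard deviation that the training section refers to. Several details are assumptions for illustration, because the application does not specify them: computing entropy over raw bytes (rather than characters or tokens), using the population standard deviation, and the helper names `shannon_entropy` and `entropy_features`.

```python
import math
import statistics
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    n = len(data)
    # H = sum over byte values of p * log2(1/p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def entropy_features(files: dict[str, bytes]) -> tuple[float, float]:
    """Mean and population standard deviation of per-file entropy in one package."""
    values = [shannon_entropy(content) for content in files.values()]
    if not values:
        return 0.0, 0.0
    return statistics.fmean(values), statistics.pstdev(values)
```

Repetitive content scores low. Data that uses all 256 byte values equally often scores the maximum of 8 bits per byte, so packed or binary content tends to sit near the upper end. Any threshold is left for the classifier to learn rather than fixed here.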

In a specific embodiment, string matching can be used to extract feature data for operations that acquire the user's personal information. Feature data for operations that access specific system resources or call specific APIs can be matched by API keywords, including operating-system API keywords and other API keywords. If the operations that call specific APIs include calls to dynamic code generation APIs, regular-expression matching is performed on the source code or executable file in the npm package to be detected. If it matches a template engine (an engine that generates code), keyword matching, for example on API keywords, is then performed on the code generated by the template engine to obtain feature data for the dynamic code generation operations. The matched feature data of a target operation may include the function and its content.
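The matching scheme above can be sketched in Python with plain substring and regular-expression checks. Everything concrete here is hypothetical: the keyword lists, the feature names, and the template-engine pattern are illustrative stand-ins, not the application's actual dictionaries. The sketch also assumes that the code produced by a template engine has already been obtained and is passed in as `generated_code`.

```python
import re

# Hypothetical dictionaries; the application does not publish its keyword lists.
PERSONAL_INFO_STRINGS = ("password", "passwd", "username", ".npmrc")
API_KEYWORDS = {
    "file_access": ("fs.readFile", "fs.writeFile"),
    "process_creation": ("child_process", "spawn(", "execSync("),
    "network": ("http.request", "https.request", "net.connect", "fetch("),
    "crypto": ("crypto.createCipheriv", "crypto.createHash"),
    "encoding": ("Buffer.from(", "btoa(", "atob("),
    "dynamic_code": ("eval(", "new Function(", "vm.runInNewContext"),
}
# Hypothetical regular expression meaning "this source loads a template engine".
TEMPLATE_ENGINE_RE = re.compile(r"""require\(\s*['"](ejs|handlebars|pug|mustache)['"]\s*\)""")


def extract_operation_features(source: str, generated_code: str = "") -> dict[str, bool]:
    lowered = source.lower()
    # String matching for access to personal information
    features = {"personal_info": any(s in lowered for s in PERSONAL_INFO_STRINGS)}
    # API-keyword matching for system resources and specific APIs
    for name, keywords in API_KEYWORDS.items():
        features[name] = any(k in source for k in keywords)
    # Template engine found by regex: keyword-match the code it generated as well
    if TEMPLATE_ENGINE_RE.search(source) and generated_code:
        features["dynamic_code"] = features["dynamic_code"] or any(
            k in generated_code for k in API_KEYWORDS["dynamic_code"]
        )
    return features
```

Substring matching like this produces false positives, for example a comment that mentions a password. The application's feature extractor works on Tree-sitter syntax trees, which this sketch does not reproduce.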

Furthermore, the functions used to generate code in the install-time scripts can be matched and their content extracted to obtain the feature data of the install-time scripts.
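Here is a minimal sketch of the install-time script check. It assumes that the relevant scripts are npm's `preinstall`, `install` and `postinstall` lifecycle entries in package.json. It also assumes that when a script has the form `node <file>.js`, the referenced file should be scanned too. The function names are hypothetical. The naive parenthesis balancing ignores parentheses that appear inside string literals.

```python
import re

# npm lifecycle scripts that run automatically when a package is installed
INSTALL_HOOKS = ("preinstall", "install", "postinstall")
# Calls to the two code-generating functions named in the application
CODEGEN_CALL_RE = re.compile(r"\b(eval|Function)\s*\(")
# Hypothetical heuristic: a script such as "node setup.js" runs a file we also scan
NODE_FILE_RE = re.compile(r"\bnode\s+(\S+\.js)\b")


def install_scripts(package_json: dict) -> list[str]:
    scripts = package_json.get("scripts", {})
    return [scripts[h] for h in INSTALL_HOOKS if h in scripts]


def codegen_calls(code: str) -> list[tuple[str, str]]:
    """Return (function name, argument text) for each eval(...) / Function(...) call."""
    calls = []
    for match in CODEGEN_CALL_RE.finditer(code):
        depth, i = 1, match.end()
        while i < len(code) and depth:  # walk forward to the matching ")"
            if code[i] == "(":
                depth += 1
            elif code[i] == ")":
                depth -= 1
            i += 1
        if depth == 0:
            calls.append((match.group(1), code[match.end():i - 1]))
    return calls


def install_script_features(package_json: dict, files: dict[str, str]) -> list[tuple[str, str]]:
    found = []
    for script in install_scripts(package_json):
        found += codegen_calls(script)  # inline code, e.g. node -e "eval(...)"
        for path in NODE_FILE_RE.findall(script):
            found += codegen_calls(files.get(path.removeprefix("./"), ""))
    return found
```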

Moreover, preset semantic analysis logic can be used to extract semantic features of the description file. The time interval between the release of the version under detection and the release of the previous version can also be obtained. Together these form the version feature data of the npm package to be detected. If the package is an initial version, the release interval is set to 0. That is, the version feature data includes the semantic features and the release interval. With these version features, other versions of a package can be examined as well, so the method detects both malicious updates (a previously benign package turning malicious) and packages that are malicious from the start.
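The release-interval part of the version features can be sketched as follows. The input is assumed to have the shape of the `time` object in npm registry metadata: a mapping from version strings to ISO 8601 timestamps, plus `created` and `modified` entries. The sketch also assumes that "previous version" means the version published immediately before in time. Days are an arbitrary choice of unit. The semantic analysis of the description file is not covered.

```python
from datetime import datetime


def release_interval_days(time_field: dict[str, str], version: str) -> float:
    """Days between `version` and the version published immediately before it.

    Returns 0.0 for the first published version (the pseudo-update convention).
    """
    def parse(ts: str) -> datetime:
        # Older Python versions' fromisoformat() does not accept a trailing "Z"
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))

    published = sorted(
        (parse(ts), v) for v, ts in time_field.items() if v not in ("created", "modified")
    )
    versions = [v for _, v in published]
    idx = versions.index(version)
    if idx == 0:
        return 0.0
    delta = published[idx][0] - published[idx - 1][0]
    return delta.total_seconds() / 86400
```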

Step S12: Input the feature data into a classification model to obtain a classification result for the npm package to be detected.

In a specific embodiment, the classification model is obtained by training an initial model on training data. The initial model may be a decision tree, a naive Bayes classifier, an SVM (Support Vector Machine), or the like. The training data comprises feature data and label information for npm package samples. For the decision tree and the naive Bayes classifier, it includes feature data of both normal and malicious samples. For the SVM, it includes feature data of normal samples only.

Step S13: When the classification result is a normal package, detect the npm package to be detected based on preset detection logic to obtain a detection result.

In a specific implementation, the source code in the npm package to be detected may be reconstructed into a reconstructed file. Whether the package is malicious is then determined from the reconstructed file and the package's information in the registry, which yields the detection result. Alternatively, the executable file in the package is compared with known malicious packages to obtain the detection result.

The registry holds registration information for each npm package, including the package's hash value. In this embodiment, the source code in the package is reconstructed into a reconstructed file and the file's hash value is computed. If it does not match the hash value recorded for the package in the registry, the package is judged malicious; otherwise it is judged normal. The executable file is compared with known malicious packages as follows: compute the MD5 hash of the executable file and look it up in an MD5 hash table of known malicious packages. If a matching hash exists, the package is judged malicious; otherwise it is judged normal. In this way, a malicious package that the classifier misclassifies as normal can still be caught, which improves detection accuracy.
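The two secondary checks can be sketched with Python's `hashlib`. This minimal sketch rests on three assumptions. First, the hash algorithm of the registry record is left as a parameter, and the SHA-256 default is an assumption. Second, the rebuilt artifact is assumed to be byte-for-byte reproducible. Third, the known-malicious MD5 table is modeled as a set of hex digests. The function names are hypothetical.

```python
import hashlib


def reconstruction_check(rebuilt: bytes, registry_digest: str, algorithm: str = "sha256") -> str:
    """'malicious' if the rebuilt artifact's hash differs from the registry record."""
    return "normal" if hashlib.new(algorithm, rebuilt).hexdigest() == registry_digest else "malicious"


def md5_lookup(executable: bytes, known_malicious_md5: set[str]) -> str:
    """'malicious' if the executable's MD5 appears in the known-malicious hash table."""
    return "malicious" if hashlib.md5(executable).hexdigest() in known_malicious_md5 else "normal"
```

Exact-hash comparison only works if the rebuild is deterministic. Differences in timestamps or file ordering inside an archive would make an honest package look tampered with, so a practical reconstructor has to normalize such details before hashing.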

Therefore, in this embodiment, the feature data of the npm package to be detected is first extracted and fed into the classification model to obtain a classification result. Packages classified as normal are then checked again using the preset detection logic to obtain the detection result. This secondary check automates the monitoring of npm packages, prevents malicious packages from being marked as normal, and can improve the efficiency and accuracy of malicious npm package detection.

Referring to Fig. 2, Fig. 2 is a flowchart of a specific malicious npm package detection method disclosed in an embodiment of the present application. The classifier, i.e., the classification model, is obtained by training a machine learning classifier on labeled training data, which includes API features and metadata obtained by a lightweight semantic scan. The reconstructor rebuilds the package from its source code and compares the rebuilt package with the version published in the registry. The clone detector checks packages against known malicious packages. It computes the MD5 hash of the package tarball contents, ignoring the package name and version specified in the package's manifest (package.json), and compares the result with a list of hashes of known malicious packages.
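The clone detector's key idea is to hash a package while ignoring its name and version, and it can be sketched as follows. This sketch makes several assumptions of its own. Files are hashed in sorted path order. The manifest is re-serialized with sorted keys. Only a top-level package.json counts as the manifest; npm tarballs usually place files under a `package/` prefix, so the manifest may sit at most one directory deep. The function names are hypothetical.

```python
import hashlib
import io
import json
import tarfile


def read_tarball(tgz: bytes) -> dict[str, bytes]:
    """Map each regular file in a gzipped tarball to its contents."""
    with tarfile.open(fileobj=io.BytesIO(tgz), mode="r:gz") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers() if m.isfile()}


def is_manifest(path: str) -> bool:
    parts = path.split("/")
    return parts[-1] == "package.json" and len(parts) <= 2


def clone_fingerprint(files: dict[str, bytes]) -> str:
    """MD5 over all paths and contents, with `name`/`version` removed from package.json."""
    md5 = hashlib.md5()
    for path in sorted(files):  # sorted so archive member order does not matter
        data = files[path]
        if is_manifest(path):
            manifest = json.loads(data)
            manifest.pop("name", None)
            manifest.pop("version", None)
            data = json.dumps(manifest, sort_keys=True).encode()
        md5.update(path.encode() + b"\0" + data + b"\0")
    return md5.hexdigest()


def is_known_clone(files: dict[str, bytes], known_md5: set[str]) -> bool:
    return clone_fingerprint(files) in known_md5
```

Two packages that differ only in name and version receive the same fingerprint. This lets renamed copies of a known malicious package be matched, which addresses the third finding listed earlier.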

In the training phase, 643 malicious packages were collected, and 1147 normal packages published on the same dates were added, forming a labeled corpus of 1790 packages. This corpus is very small relative to the roughly 1.7 million packages (single versions) in the registry at the time. To cope with the extreme class imbalance, the following algorithms are therefore used: a decision tree, a naive Bayes classifier, and a one-class SVM. The decision tree explains feature importance well, while the naive Bayes classifier and the one-class SVM cope with sample imbalance. The implementations come from Python's scikit-learn (sklearn) library. For the decision tree, information gain is used as the split criterion. For the naive Bayes classifier, the Bernoulli variant is used, which handles only Boolean features. The non-Boolean features (entropy mean and standard deviation, and update time) are therefore omitted, and every other feature is folded to 1 if present and 0 otherwise. For the SVM, a linear kernel is chosen and the model is trained only on benign examples, since the classifier's task is to detect anomalous versions that differ significantly from benign ones. The SVM's ν parameter approximates the expected fraction of outliers and is determined by leave-one-out experiments on the base corpus.
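The following minimal, dependency-free Python sketch shows how the Bernoulli variant consumes the features. It covers the binarization step described above and a Bernoulli naive Bayes classifier with Laplace smoothing. The feature names and the smoothing constant are assumptions. The application itself uses scikit-learn, so this hand-written class only illustrates the technique and is not the application's implementation.

```python
import math

# Hypothetical names for the non-Boolean features that the Bernoulli model omits
NUMERIC_FEATURES = {"entropy_mean", "entropy_std", "release_interval_days"}


def binarize(features: dict[str, float]) -> dict[str, int]:
    """Drop non-Boolean features; fold the rest to 1 (present) or 0 (absent)."""
    return {k: int(bool(v)) for k, v in features.items() if k not in NUMERIC_FEATURES}


class BernoulliNaiveBayes:
    def fit(self, X: list[dict[str, int]], y: list[str], alpha: float = 1.0):
        self.keys = sorted({k for x in X for k in x})
        self.classes = sorted(set(y))
        self.log_prior, self.p = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Laplace-smoothed estimate of P(feature = 1 | class)
            self.p[c] = {
                k: (sum(r.get(k, 0) for r in rows) + alpha) / (len(rows) + 2 * alpha)
                for k in self.keys
            }
        return self

    def predict(self, x: dict[str, int]) -> str:
        def log_posterior(c: str) -> float:
            # Bernoulli likelihood counts absent features too, via 1 - p
            s = self.log_prior[c]
            for k in self.keys:
                p = self.p[c][k]
                s += math.log(p if x.get(k, 0) else 1 - p)
            return s
        return max(self.classes, key=log_posterior)
```

With scikit-learn, the corresponding estimators would be `BernoulliNB`, `DecisionTreeClassifier(criterion="entropy")` (information gain as the split criterion), and `OneClassSVM(kernel="linear", nu=...)` fitted on benign feature vectors only, with ν chosen as described above.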

Further, in this embodiment, a feature extractor can extract the feature data set of any npm package to be detected in an npm package database. The feature data set is input into the trained classifier to obtain a prediction. Reconstruction or matching against known malicious packages is then performed to obtain the final prediction.
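The two-stage flow of Fig. 2 can be sketched as a small Python function. The classifier and the two secondary checks are passed in as callables. Running every supplied check, rather than picking one based on whether source code is present, is a design assumption of this sketch, and the names are hypothetical.

```python
from typing import Callable, Optional


def detect_package(
    features: dict,
    classify: Callable[[dict], str],
    reconstruction_check: Optional[Callable[[], str]] = None,
    clone_check: Optional[Callable[[], str]] = None,
) -> str:
    """Classifier first; packages it labels "normal" get the secondary checks."""
    verdict = classify(features)
    if verdict != "normal":
        return verdict
    # Secondary detection: catch malicious packages the classifier let through
    for check in (reconstruction_check, clone_check):
        if check is not None and check() == "malicious":
            return "malicious"
    return "normal"
```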

The collected feature data set of an npm package contains the following features:
1. Access to personal sensitive information of end users, such as accounts and passwords.
2. Access to specific system resources: 2.1 file operations (read/write); 2.2 process creation (creating new processes); 2.3 network requests (sending and receiving data).
3. Use of specific APIs: 3.1 encryption functions; 3.2 data encoding; 3.3 dynamic code generation.
4. Scripts run during installation.
5. Minified short code or binary (executable) files used to evade detection. Minification reduces code readability, so conventional keyword-based detection of malicious npm packages no longer works and the package escapes detection.
6. Multi-version features (i.e., the version feature data of the foregoing embodiment): 6.1 the time interval between version releases; 6.2 the update type described in the version's update notes.

The feature extractor parses the JavaScript and TypeScript files in the package with Tree-sitter to obtain the single-package features. Access to end users' personal sensitive information (accounts, passwords, etc.) is matched by string matching. Access to specific system resources and use of specific APIs are matched by operating-system API keywords and other API keywords. For dynamic code generation, template engines are matched, and keyword matching is applied to the content the engine generates. For install-time scripts, calls to eval and Function (the two functions used to generate code) are matched and their content extracted. Minified code and binary files are captured through Shannon entropy values. For the multi-version features, the version release interval is obtained with npm view (the package's time field), and the update type, i.e., the semantic feature, is obtained by semantic analysis.
For a package uploaded to the npm central registry for the first time, a pseudo-update version is introduced: its release interval is set to 0, and its update type is the semantic feature of the description file within the single package.
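The six feature groups listed above can be held in a single record, sketched here as a Python dataclass. The field names and types are assumptions for illustration. So is the choice to leave the free-text update type out of the numeric vector, since it would need its own encoding, for example one-hot.

```python
from dataclasses import dataclass, asdict


@dataclass
class NpmPackageFeatures:
    # 1. access to end users' personal information (accounts, passwords, ...)
    personal_info: bool = False
    # 2. access to specific system resources
    file_access: bool = False
    process_creation: bool = False
    network: bool = False
    # 3. use of specific APIs
    crypto: bool = False
    encoding: bool = False
    dynamic_code: bool = False
    # 4. scripts run during installation
    install_script: bool = False
    # 5. minified code / binaries, captured via Shannon entropy
    entropy_mean: float = 0.0
    entropy_std: float = 0.0
    # 6. multi-version features
    release_interval_days: float = 0.0  # 0 for a first upload (pseudo-update)
    update_type: str = ""  # semantic label derived from the description

    def to_vector(self) -> list[float]:
        """Numeric vector in field order; the free-text update_type is left out."""
        return [float(v) for k, v in asdict(self).items() if k != "update_type"]
```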

The method therefore defends npm effectively against attacks by malicious actors. It also addresses the problem that the large volume of npm package updates and new packages prevents malicious packages from being identified effectively. Detection is fully automated without manual review. It runs in near real time, keeps pace with the rate of version changes, and is highly accurate, avoiding both mistakenly flagging normal packages and missing malicious ones. Feature extraction and classification take only a few seconds per package. The classifier is also cheap and fast to retrain, so it can be improved continuously as more results are classified.

It should be noted that the decision tree extracts the main feature points of npm packages. It groups packages with similar or identical features into the same class, so once the features of malicious npm packages are known, unknown packages can be quickly classified as malicious or not according to their features. Naive Bayes classifier: naive Bayes makes classification decisions using Bayesian conditional probability and addresses the "inverse probability" side of a classification problem. A "forward probability" question is: if a bag is known to hold 2 white balls and 3 black balls, how likely is a random draw to be black (3/5)? In practice it is more common to observe the color of a drawn ball, without knowing how many balls of each color are inside, and to infer the color distribution in the bag. This is the "inverse probability" problem. Naive Bayes classification is based on Bayes' theorem and the assumption that features are conditionally independent. It rests on classical mathematical theory, has a solid mathematical foundation and stable classification efficiency, and is a very simple algorithm. For an item to be classified, it computes the probability of each class given the item's observed features and assigns the item to the class with the highest probability.
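Information gain is the decision tree's split criterion, so a short Python sketch of how it scores a Boolean feature may help. The score is the class-label entropy before the split minus the size-weighted entropy of the branches after it. The function names are illustrative. scikit-learn's `DecisionTreeClassifier(criterion="entropy")` applies the same criterion internally.

```python
import math
from collections import Counter


def label_entropy(labels: list[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(feature: list[int], labels: list[str]) -> float:
    """Entropy before the split minus the size-weighted entropy of each branch."""
    n = len(labels)
    gain = label_entropy(labels)
    for value in set(feature):
        branch = [lab for f, lab in zip(feature, labels) if f == value]
        gain -= len(branch) / n * label_entropy(branch)
    return gain
```

A feature that separates malicious from normal packages perfectly gains a full bit on a balanced sample. A feature that is independent of the label gains nothing, so the tree would not split on it first.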
SVM: SVM classification extends linear classification with the concept of support vectors, seeking a separating line that divides the classes with the maximum margin. The support vector machine is a supervised learning method widely used for statistical classification and regression analysis. It is a generalized linear classifier that minimizes empirical error while maximizing the geometric margin, and is therefore also called a maximum-margin classifier. The support vector machine can also map vectors into a higher-dimensional space and construct a maximum-margin hyperplane there. Two parallel hyperplanes are placed on either side of the hyperplane that separates the data, and the separating hyperplane is chosen to maximize the distance between them. The assumption is that the larger the distance between these parallel hyperplanes, the smaller the classifier's total error.
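The maximum-margin idea can be written as the standard hard-margin optimization problem for training pairs (x_i, y_i) with labels y_i in {-1, +1}. The one-class variant below it is the Schölkopf formulation, stated here as background. That variant is the form that fits training on benign examples only, and in it ν in (0, 1] is an upper bound on the fraction of training points treated as outliers.

```latex
% Hard-margin SVM: the hyperplanes w^T x + b = +1 and w^T x + b = -1 lie 2/||w|| apart,
% so minimizing ||w|| maximizes the margin between them.
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{s.t.}\quad y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1,\qquad i = 1,\dots,n

% One-class SVM (benign samples only); \phi is the feature map (identity for a linear kernel).
\min_{\mathbf{w},\,\boldsymbol{\xi},\,\rho}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\quad\text{s.t.}\quad \mathbf{w}^{\top}\phi(\mathbf{x}_i) \ge \rho - \xi_i,\quad \xi_i \ge 0

% Decision function: a new package x is flagged as anomalous when f(x) = -1.
f(\mathbf{x}) = \operatorname{sign}\!\left(\mathbf{w}^{\top}\phi(\mathbf{x}) - \rho\right)
```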

Referring to Fig. 3, the present application discloses a malicious npm package detection apparatus, including:

a feature data extraction module 11, configured to extract feature data of the npm package to be detected;

a classification result obtaining module 12, configured to input the feature data into a classification model to obtain a classification result for the npm package to be detected; and

a detection result obtaining module 13, configured to, when the classification result is a normal package, detect the npm package to be detected based on preset detection logic to obtain a detection result.

Therefore, in this embodiment, the feature data of the npm package to be detected is first extracted and input into the classification model to obtain a classification result. Packages classified as normal are then checked again using the preset detection logic to obtain the detection result. This automates the monitoring of npm packages, prevents malicious packages from being marked as normal, and can improve the efficiency and accuracy of malicious npm package detection.

The detection result obtaining module 13 is specifically configured to reconstruct the source code in the npm package to be detected into a reconstructed file and to determine, based on the reconstructed file and the package's information in the registry, whether the package is malicious, to obtain a detection result. Alternatively, it compares the executable file in the package with known malicious packages to obtain a detection result.

The feature data extraction module 11 may specifically include:

a first feature extraction unit, configured to extract feature data of the source code or executable files in the npm package to be detected;

a second feature extraction unit, configured to extract feature data of the scripts that the npm package to be detected runs during installation; and

a third feature extraction unit, configured to extract version feature data of the npm package to be detected based on the description file in the package.

The first feature extraction unit is specifically configured to:

extract feature data of target operations in the source code or executable files in the npm package to be detected, where the target operations include operations that obtain a user's personal information, operations that access specific system resources, and operations that call specific APIs; and/or extract the Shannon entropy of the source code or executable files.
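The two kinds of feature named above can be sketched as follows. Shannon entropy is computed over byte frequencies as H = -Σ p_i log2 p_i, so compressed or encrypted payloads approach the maximum of 8 bits per byte, while ordinary source code scores noticeably lower. The regular-expression lists standing in for "personal information", "specific system resources", and "specific APIs" are hypothetical examples chosen for Node.js code, not patterns taken from the application.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: H = -sum(p * log2(p))."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


# Hypothetical patterns for the three kinds of "target operation" (Node.js).
TARGET_OPERATION_PATTERNS: Dict[str, List[str]] = {
    "personal_info": [r"process\.env", r"os\.userInfo\(", r"os\.homedir\("],
    "system_resource": [r"/etc/passwd", r"\.ssh/", r"\.npmrc"],
    "specific_api": [
        r"require\(\s*['\"]child_process['\"]\s*\)",
        r"\beval\(",
        r"new\s+Function\(",
    ],
}


def target_operation_counts(source: str) -> Dict[str, int]:
    """Count regex hits per target-operation category in a source file."""
    return {
        category: sum(len(re.findall(p, source)) for p in patterns)
        for category, patterns in TARGET_OPERATION_PATTERNS.items()
    }
```

Both outputs are numeric, so they can be concatenated directly into the feature vector passed to the classification model.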

Further, the operations that call specific APIs include operations that call APIs generated by dynamic code, and the first feature extraction unit is specifically configured to: perform regular-expression matching on the source code or executable files in the npm package to be detected and, if a template engine is matched, perform keyword matching on the code generated by the template engine to obtain feature data of the operations that call the dynamically generated APIs.
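One possible reading of this step is sketched below: a regular expression finds code strings built with a JavaScript template literal and handed to eval or new Function (treated here as the "template engine" case), and keyword matching is then run on the extracted template text. The pattern, the keyword list, and the treatment of template literals as the template engine are all assumptions for illustration; the application may instead have template libraries in mind.

```python
import re
from typing import Set

# Hypothetical: a backtick template literal passed directly to eval(...) or
# new Function(...) is treated as template-generated code.
TEMPLATE_CALL = re.compile(r"(?:\beval|new\s+Function)\(\s*`([^`]*)`")

# Hypothetical keyword list applied to the generated code.
SENSITIVE_KEYWORDS = ("child_process", "process.env", "require(", "http", "Buffer.from")


def dynamic_code_features(source: str) -> Set[str]:
    """Return the sensitive keywords found inside template-generated code."""
    hits: Set[str] = set()
    for generated in TEMPLATE_CALL.findall(source):
        for keyword in SENSITIVE_KEYWORDS:
            if keyword in generated:
                hits.add(keyword)
    return hits
```

For a source such as `const m = 'child_' + 'process'; new Function(`return require('${m}')`)();` the string child_process never appears literally, yet the dynamic require( inside the template is still flagged. The regular expression does not handle nested backticks.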

The second feature extraction unit is specifically configured to match functions used to generate code in the installation scripts and extract the contents of those functions, to obtain feature data of the installation scripts.
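A minimal sketch of this step follows. It reads the preinstall, install, and postinstall entries of the scripts field in package.json (standard npm lifecycle scripts that run when the package is installed) and extracts the argument text of calls that generate or execute code. The set of code-generating call names and the simple parenthesis-balancing scan are assumptions; the scan ignores parentheses inside string literals, so it is a heuristic rather than a parser.

```python
import json
import re
from typing import Dict, List

INSTALL_HOOKS = ("preinstall", "install", "postinstall")
# Hypothetical list of calls treated as "functions used for generating code".
CODEGEN_CALL = re.compile(r"\b(eval|Function|runInNewContext|runInThisContext)\s*\(")


def install_scripts(package_json: str) -> Dict[str, str]:
    """Return the install-time lifecycle scripts declared in package.json."""
    scripts = json.loads(package_json).get("scripts", {})
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


def extract_codegen_args(code: str) -> List[str]:
    """Extract the argument text of each code-generating call."""
    results = []
    for match in CODEGEN_CALL.finditer(code):
        depth, start, i = 1, match.end(), match.end()
        while i < len(code) and depth:
            if code[i] == "(":
                depth += 1
            elif code[i] == ")":
                depth -= 1
            i += 1
        if depth == 0:  # only keep calls whose parentheses are balanced
            results.append(code[start:i - 1])
    return results
```

When a hook only runs a file (for example `node setup.js`), the same extraction would be applied to that file's contents rather than to the command text.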

The third feature extraction unit is specifically configured to extract semantic features of the description file using preset semantic analysis logic, and to obtain the interval between the release of the npm package to be detected and that of its previous version, so as to obtain the version feature data of the package.
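The release-interval feature can be sketched using the time field of the package metadata returned by the npm registry, which maps each published version to an ISO 8601 timestamp alongside "created" and "modified" entries. Here the previous version is taken to be the one published immediately before by timestamp; that ordering, and the three boolean flags standing in for the output of the preset semantic analysis logic, are assumptions for illustration.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def _parse(ts: str) -> datetime:
    # Registry timestamps look like "2022-11-24T08:00:00.000Z"; "Z" is
    # replaced so that fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def release_interval_hours(time_map: Dict[str, str], version: str) -> Optional[float]:
    """Hours between `version` and the version published just before it."""
    releases = sorted(
        (_parse(ts), v) for v, ts in time_map.items() if v not in ("created", "modified")
    )
    for idx, (ts, v) in enumerate(releases):
        if v == version:
            if idx == 0:
                return None  # first release: no previous version
            return (ts - releases[idx - 1][0]).total_seconds() / 3600
    return None  # version not found


def description_flags(manifest: Dict[str, Any]) -> Dict[str, bool]:
    """Hypothetical stand-in for semantic features of package.json."""
    scripts = manifest.get("scripts", {})
    return {
        "has_description": bool(str(manifest.get("description") or "").strip()),
        "has_repository": "repository" in manifest,
        "has_install_hook": any(h in scripts for h in ("preinstall", "install", "postinstall")),
    }
```

An unusually short interval between consecutive releases could be one signal this feature is intended to capture.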

Referring to FIG. 4, an embodiment of the present application discloses an electronic device 20, which includes a processor 21 and a memory 22. The memory 22 is configured to store a computer program, and the processor 21 is configured to execute the computer program to implement the malicious npm package detection method disclosed in the foregoing embodiment.

For the specific process of the above malicious npm package detection method, refer to the corresponding content disclosed in the foregoing embodiments; details are not repeated here.

The memory 22 serves as a carrier for resource storage and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like; the storage may be transient or permanent.

In addition, the electronic device 20 further includes a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The power supply 23 provides an operating voltage for each hardware device on the electronic device 20. The communication interface 24 can establish a data transmission channel between the electronic device 20 and an external device; it may follow any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein. The input/output interface 25 is configured to obtain external input data or output data to the outside; its specific interface type may be selected according to application requirements and is not specifically limited herein.

Further, an embodiment of the present application also discloses a computer-readable storage medium for storing a computer program, where the computer program, when executed by a processor, implements the malicious npm package detection method disclosed in the foregoing embodiment.

In this specification, the embodiments are described progressively: each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for relevant details, refer to the description of the method.

The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The malicious npm package detection method, apparatus, device, and medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the above description of the embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, a person skilled in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

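1. A malicious npm package detection method, comprising:

extracting feature data of an npm package to be detected;

inputting the feature data into a classification model to obtain a classification result of the npm package to be detected; and

when the classification result is a normal package, detecting the npm package to be detected based on preset detection logic to obtain a detection result.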

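2. The method according to claim 1, wherein the detecting the npm package to be detected based on preset detection logic to obtain a detection result comprises:

reconstructing the source code in the npm package to be detected to obtain a reconstructed file, and determining, based on the reconstructed file and information about the npm package to be detected in the registry, whether the npm package to be detected is a malicious package, to obtain the detection result; or comparing the executable files in the npm package to be detected with known malicious packages to obtain the detection result.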

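3. The method according to claim 1, wherein the extracting feature data of an npm package to be detected comprises:

extracting feature data of the source code or executable files in the npm package to be detected;

extracting feature data of the scripts run during installation of the npm package to be detected; and

extracting version feature data of the npm package to be detected based on the description file in the npm package to be detected.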

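4. The method according to claim 3, wherein the extracting feature data of the source code or executable files in the npm package to be detected comprises:

extracting feature data of target operations in the source code or executable files in the npm package to be detected, the target operations comprising operations that obtain a user's personal information, operations that access specific system resources, and operations that call specific APIs; and/or extracting the Shannon entropy of the source code or executable files.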

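5. The malicious npm package detection method according to claim 4, wherein the operations that call specific APIs comprise operations that call APIs generated by dynamic code, the method comprising:

performing regular-expression matching on the source code or executable files in the npm package to be detected and, if a template engine is matched, performing keyword matching on the code generated by the template engine to obtain feature data of the operations that call the dynamically generated APIs.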

6. The method according to claim 3, wherein the extracting feature data of the scripts run during installation of the npm package to be detected comprises:

matching functions used to generate code in the installation scripts, and extracting the contents of those functions to obtain feature data of the installation scripts.

7. The method according to claim 3, wherein the extracting version feature data of the npm package to be detected based on the description file in the npm package to be detected comprises:

extracting semantic features of the description file using preset semantic analysis logic, and obtaining the interval between the release of the current version of the npm package to be detected and that of the previous version, to obtain the version feature data of the npm package to be detected.

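8. A malicious npm package detection apparatus, comprising:

a feature data extraction module, configured to extract feature data of an npm package to be detected;

a classification result obtaining module, configured to input the feature data into a classification model to obtain a classification result of the npm package to be detected; and

a detection result obtaining module, configured to, when the classification result is a normal package, detect the npm package to be detected based on preset detection logic to obtain a detection result.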

9. An electronic device comprising a memory and a processor, wherein:

the memory is used for storing a computer program;

the processor is configured to execute the computer program to implement the malicious npm package detection method according to any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the malicious npm package detection method according to any one of claims 1 to 7.