CN113569242A

CN113569242A - Illegal software identification method

Info

Publication number: CN113569242A
Application number: CN202110859667.7A
Authority: CN
Inventors: 王志英; 杨航; 刘家豪; 冯国聪; 王皓然; 农彩勤; 刘祥; 刘欣
Original assignee: China Southern Power Grid Co Ltd; Southern Power Grid Digital Grid Research Institute Co Ltd
Current assignee: China Southern Power Grid Digital Power Grid Group Information Communication Technology Co ltd; China Southern Power Grid Co Ltd
Priority date: 2021-07-28
Filing date: 2021-07-28
Publication date: 2021-10-29

Abstract

The application relates to a method, a device, computer equipment and a storage medium for identifying illegal software. A software black and white list and a software process reference baseline model are acquired; the software processes run by a host are traversed, and those that do not match the software process reference baseline model are identified as suspicious processes; the software corresponding to the suspicious processes is determined; and the illegal software in the host is identified according to the software black and white list and the software corresponding to the suspicious processes. Because the software process reference baseline model is obtained by machine learning from the software processes of the whitelist software, it can recognize the software processes that correspond to the white list. A software process that does not match the model therefore does not belong to the white list, and combining this result with the software black and white list allows the illegal software in the host to be identified accurately.

Description

Illegal software identification method

Technical Field

The present application relates to the field of computer technologies, and in particular to an illegal software identification method and device, a computer device, and a storage medium.

Background

With the development of computer and communication technology, intelligent networking technology has emerged: multiple computers form a network and exchange data with one another, which improves data transmission efficiency.

In such a network, the participating computers perform different functions and therefore have different software installed, and this software must exchange data with other computers while it runs. If illegal software attacks other computers during this data exchange, the normal operation of the whole network is inevitably affected, and abnormal data (viruses, trojans and the like) may even spread throughout the network.

Accurately identifying illegal software on a computer is therefore essential. Most traditional identification schemes rely on a black and white list: they depend on the software black and white list configured at the initial stage, so they can only identify the blacklist software that was flagged as risky at that stage. Software on the original white list that later carries a violation risk cannot be accurately identified during subsequent operation, so the identification accuracy for illegal software is low.

Disclosure of Invention

In view of the above, it is necessary to provide an accurate illegal software identification method, device, computer device and storage medium for solving the above technical problems.

A method of identifying software violations, the method comprising:

acquiring a black and white list of software and a software process reference baseline model;

traversing software processes operated by the host, and identifying software processes which are not matched with the software process reference baseline model in the software processes operated by the host to obtain suspicious processes;

determining software corresponding to the suspicious process;

identifying illegal software in the host according to a black and white list of the software and the software corresponding to the suspicious process;

the software process reference baseline model is obtained by training a machine learning model through a software process of white list software in a black and white list of the software.

In one embodiment, the method for identifying software violations further includes:

acquiring a software process deviation baseline model;

identifying, among the software processes operated by the host, a software process which matches the software process deviation baseline model, to obtain a dangerous process;

generating and pushing an alarm message corresponding to the dangerous process;

wherein the software process deviation baseline model is obtained by training a machine learning model with the software processes of the blacklist software in the black and white list of the software.

In one embodiment, the method further includes:

acquiring software processes corresponding to each software in a black and white list of the software to obtain a software process set;

adding a trusted label to the software processes corresponding to whitelist software in the software process set to obtain first training data; adding a danger label to the software processes corresponding to blacklist software in the software process set to obtain second training data;

obtaining an initial machine learning model;

and training the initial machine learning model by adopting the first training data and the second training data to respectively obtain a software process reference baseline model and a software process deviation baseline model.

In one embodiment, the initial machine learning model comprises a linear classification machine learning model.

In one embodiment, the acquiring the black and white list of the software comprises:

acquiring black and white lists of preset software of each networking host;

and collecting black and white lists of the preset software and screening and removing duplication to obtain the black and white lists of the software.

In one embodiment, after identifying the offending software in the host, the method further comprises:

locating the host running the illegal software to obtain a target host;

acquiring the IP of the target host;

and blocking the IP of the target host.

An illegal software identification device, the device comprising:

the data acquisition module is used for acquiring a black and white list of software and a software process reference baseline model;

the suspicious classification module is used for traversing the software processes operated by the host, identifying the software processes which are not matched with the software process reference baseline model in the software processes operated by the host, and obtaining suspicious processes;

the software determining module is used for determining software corresponding to the suspicious process;

the violation identification module is used for identifying violation software in the host according to the black and white list of the software and the software corresponding to the suspicious process;

In one embodiment, the illegal software identification device further includes a danger prompt module, configured to acquire a software process deviation baseline model; identify, among the software processes operated by the host, a software process which matches the software process deviation baseline model, to obtain a dangerous process; and generate and push an alarm message corresponding to the dangerous process; wherein the software process deviation baseline model is obtained by training a machine learning model with the software processes of the blacklist software in the black and white list of the software.

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

determining software corresponding to the suspicious process;

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

determining software corresponding to the suspicious process;

The illegal software identification method, device, computer equipment and storage medium acquire the software black and white list and the software process reference baseline model, traverse the software processes run by the host, identify the software processes that do not match the software process reference baseline model to obtain suspicious processes, determine the software corresponding to the suspicious processes, and identify the illegal software in the host according to the software black and white list and the software corresponding to the suspicious processes. Because the software process reference baseline model is obtained by machine learning from the software processes of the whitelist software, it can recognize the software processes that correspond to the white list. A software process that does not match the model therefore does not belong to the white list, and combining this result with the software black and white list allows the illegal software in the host to be identified accurately.

Drawings

FIG. 1 is a diagram of an application environment for a method for identifying software violations in one embodiment;

FIG. 2 is a flow diagram illustrating a method for identifying software violations in one embodiment;

FIG. 3 is a flowchart illustrating a method for identifying software violations in accordance with another embodiment;

FIG. 4 is a block diagram of the structure of the illegal software identification device in one embodiment;

FIG. 5 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The illegal software identification method provided by the application can be applied to the application environment shown in fig. 1, in which the host 102 communicates with the server 104 over a network. The server obtains the software black and white lists corresponding to the hosts in the network, and a software process reference baseline model is loaded on the server in advance. The server acquires the software black and white list and the software process reference baseline model, traverses the software processes run by the host, identifies the software processes that do not match the software process reference baseline model to obtain suspicious processes, determines the software corresponding to the suspicious processes, and identifies the illegal software in the host according to the software black and white list and the software corresponding to the suspicious processes. The server may push an alarm message corresponding to the illegal software to a manager. The host 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster composed of a plurality of servers.

In one embodiment, as shown in fig. 2, an illegal software identification method is provided. Described here as applied to the server 104 in fig. 1, the method includes the following steps:

S200: acquiring a software black and white list and a software process reference baseline model.

The software black and white list consists of predetermined whitelist software and blacklist software, that is, it comprises two parts: a software blacklist and a software whitelist. Each host in the network has its own software black and white list, and the server can collect the lists of the individual hosts to obtain the black and white list for all hosts in the network. The software process reference baseline model is obtained by machine learning from the software processes of the whitelist software in the software black and white list. Generally, the software running inside the host is whitelist software, so an initial machine learning model can be trained on the processes of the software running inside the host to obtain the software process reference baseline model.

S400: traversing the software processes run by the host, and identifying the software processes that do not match the software process reference baseline model to obtain suspicious processes.

The software processes run by the host are scanned and traversed, and each is matched against the software process reference baseline model. A process that does not match the model is a suspicious process. A process that matches the model is a trusted process corresponding to the white list and is allowed to run normally.
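As a minimal sketch of step S400, the traversal can be written as a loop that splits processes into trusted and suspicious ones. The `ProcessInfo` record, the `matches_baseline` predicate and the sample process names are all hypothetical; in particular, a fixed set of known names stands in for the trained software process reference baseline model, which the application does not specify in detail.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class ProcessInfo:
    """Hypothetical record describing one process observed on a host."""
    pid: int
    name: str
    software_id: str  # identifier of the software that owns the process


def split_processes(
    processes: Iterable[ProcessInfo],
    matches_baseline: Callable[[ProcessInfo], bool],
) -> Tuple[List[ProcessInfo], List[ProcessInfo]]:
    """Traverse the processes and return (trusted, suspicious)."""
    trusted: List[ProcessInfo] = []
    suspicious: List[ProcessInfo] = []
    for proc in processes:
        if matches_baseline(proc):
            trusted.append(proc)      # matches the baseline: allowed to run
        else:
            suspicious.append(proc)   # no match: suspicious process
    return trusted, suspicious


# Assumption: a set of known names stands in for the trained baseline model.
KNOWN_NAMES = {"scada_agent", "log_collector"}

running = [
    ProcessInfo(101, "scada_agent", "sw-001"),
    ProcessInfo(102, "miner_x", "sw-009"),
]
trusted, suspicious = split_processes(running, lambda p: p.name in KNOWN_NAMES)
```

Passing the matcher in as a callable keeps the traversal independent of how the baseline model is actually implemented.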

S600: determining the software corresponding to the suspicious processes.

The software corresponding to each suspicious process is determined. Such software carries a certain security risk and is most likely illegal software, so its installation and operation need to be prohibited throughout the network. The correspondence between software processes and software can be constructed in advance and stored in the server: a unique identifier is assigned to each newly added software, and identifiers derived from the software identifier are assigned to the processes belonging to that software, so that the software corresponding to each process can be accurately identified.
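The process-to-software correspondence described above could be kept in a small registry, sketched below. The class name `SoftwareRegistry`, the `sw-0001` identifier format and the `software/process` naming scheme are illustrative assumptions, not details taken from the application.

```python
import itertools
from typing import Dict, Iterable, Set


class SoftwareRegistry:
    """Hypothetical registry mapping process identifiers to software identifiers."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._software_ids: Dict[str, str] = {}   # software name -> software id
        self._process_owner: Dict[str, str] = {}  # process id -> software id

    def register_software(self, name: str) -> str:
        """Assign a unique identifier to newly added software (idempotent)."""
        if name not in self._software_ids:
            self._software_ids[name] = f"sw-{next(self._counter):04d}"
        return self._software_ids[name]

    def register_process(self, software_name: str, process_name: str) -> str:
        """Derive a process identifier from the identifier of its owning software."""
        sw_id = self.register_software(software_name)
        process_id = f"{sw_id}/{process_name}"
        self._process_owner[process_id] = sw_id
        return process_id

    def software_of(self, process_id: str) -> str:
        """Look up the software that a single process belongs to."""
        return self._process_owner[process_id]

    def software_for(self, process_ids: Iterable[str]) -> Set[str]:
        """Return the set of software behind a collection of processes."""
        return {self._process_owner[p] for p in process_ids}


registry = SoftwareRegistry()
pid_a = registry.register_process("editor", "editor_main")
pid_b = registry.register_process("editor", "editor_helper")
pid_c = registry.register_process("updater", "updater_daemon")
```

Because each process identifier embeds its software identifier, the lookup in S600 reduces to a dictionary access.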

S800: identifying the illegal software in the host according to the software black and white list and the software corresponding to the suspicious processes.

All software in the host is traversed, and the illegal software in the host is identified based on the software black and white list and the software corresponding to the suspicious processes, so that illegal software can be found more accurately. Specifically, software is judged to be illegal software when it is blacklist software in the software black and white list or when it is software corresponding to a suspicious process. Take three pieces of software A, B and C as examples. If software A is identified as blacklist software according to the software black and white list, software A is judged to be illegal software. If software B is identified as whitelist software according to the software black and white list but is software corresponding to a suspicious process, software B is also judged to be illegal software. If software C is identified as whitelist software according to the software black and white list and, after steps S400 to S600, is found not to correspond to any suspicious process, software C is judged not to be illegal software.
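The decision rule of S800 can be sketched in a few lines: software is illegal if it is on the blacklist or if it corresponds to a suspicious process. The function name `identify_violations` and its parameters are hypothetical; the inputs reproduce the A, B, C example above.

```python
from typing import Dict, Iterable, Set


def identify_violations(
    installed: Iterable[str],
    blacklist: Set[str],
    suspicious_software: Set[str],
) -> Dict[str, bool]:
    """Map each installed software to True if it is judged illegal."""
    return {
        sw: (sw in blacklist) or (sw in suspicious_software)
        for sw in installed
    }


# A is blacklisted; B is whitelisted but owns a suspicious process;
# C is whitelisted and owns no suspicious process.
result = identify_violations(
    installed=["A", "B", "C"],
    blacklist={"A"},
    suspicious_software={"B"},
)
```

The white list does not appear in the rule itself: its role is upstream, as the source of the processes on which the reference baseline model is trained.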

The illegal software identification method acquires the software black and white list and the software process reference baseline model, traverses the software processes run by the host, identifies the software processes that do not match the software process reference baseline model to obtain suspicious processes, determines the software corresponding to the suspicious processes, and identifies the illegal software in the host according to the software black and white list and the software corresponding to the suspicious processes. Because the software process reference baseline model is obtained by machine learning from the software processes of the whitelist software, it can recognize the software processes that correspond to the white list. A software process that does not match the model therefore does not belong to the white list, and combining this result with the software black and white list allows the illegal software in the host to be identified accurately.

In one embodiment, the illegal software identification method further includes: acquiring a software process deviation baseline model; identifying, among the software processes run by the host, a software process that matches the software process deviation baseline model, to obtain a dangerous process; and generating and pushing an alarm message corresponding to the dangerous process.

In this embodiment, the server further acquires a software process deviation baseline model, which is trained in advance by machine learning on the software processes of the blacklist software in the software black and white list. Based on the software process deviation baseline model, software processes matching blacklist software can be identified among the software processes run by the host. These processes threaten the security of the whole network, so the server generates and pushes alarm messages corresponding to the dangerous processes. Specifically, an alarm message may carry the identifier of the dangerous process, the identifier of the software corresponding to the dangerous process, and the host on which the process runs, so that the server or a manager can take further control measures to keep the whole network operating safely.
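An alarm message carrying the three fields named above might be built and pushed as follows. This is only a sketch: the JSON encoding, the field names and the `push` callback are assumptions, since the application does not define a message format or a delivery channel.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass(frozen=True)
class AlarmMessage:
    """Hypothetical alarm payload for one dangerous process."""
    process_id: str   # identifier of the dangerous process
    software_id: str  # identifier of the software owning the process
    host: str         # host on which the process was observed


def push_alarm(alarm: AlarmMessage, push: Callable[[str], None]) -> str:
    """Serialize the alarm as JSON and hand it to the delivery callback."""
    payload = json.dumps(asdict(alarm), sort_keys=True)
    push(payload)
    return payload


# Assumption: an in-memory list stands in for the real delivery channel.
outbox: List[str] = []
sent = push_alarm(AlarmMessage("sw-0009/miner_x", "sw-0009", "host-17"), outbox.append)
```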

In one embodiment, the method further includes: acquiring the software processes corresponding to each software in the software black and white list to obtain a software process set; adding a trusted label to the software processes corresponding to whitelist software in the software process set to obtain first training data; adding a danger label to the software processes corresponding to blacklist software in the software process set to obtain second training data; obtaining an initial machine learning model; and training the initial machine learning model with the first training data and the second training data to obtain, respectively, a software process reference baseline model and a software process deviation baseline model.

Both the software process deviation baseline model and the software process reference baseline model can be obtained by training on the processes corresponding to the blacklist software and the whitelist software in the software black and white list. In practical application, the server can collect the software black and white lists on each host in advance over a period of time and acquire the software processes corresponding to each software in those lists to obtain a software process set. Trusted labels are added to the software processes corresponding to whitelist software, which are generally processes running inside the host, and these are collated into first training data. Danger labels are added to the software processes corresponding to blacklist software, which are generally dangerous processes because they pose a security risk to the whole network, and these are collated into second training data. The initial machine learning model is trained with the first training data to obtain the software process reference baseline model, and with the second training data to obtain the software process deviation baseline model. In this embodiment, two different sets of training data (the first training data and the second training data) are thus used to train the initial machine learning model separately, yielding the software process reference baseline model and the software process deviation baseline model. The initial machine learning model can be a linear classification machine learning model; a trained linear classifier can classify samples accurately, so that trusted software processes and dangerous processes can be accurately identified.
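To illustrate the labelling and training steps, the sketch below labels whitelist processes +1 (trusted) and blacklist processes -1 (dangerous) and fits a perceptron, one simple kind of linear classifier. Several points are assumptions rather than details from the application: the two-dimensional feature vectors are invented, the perceptron is a swapped-in stand-in for the unspecified linear classification model, and a single classifier trained on both labelled sets replaces the two separately trained models.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label)
Model = Tuple[List[float], float]     # (weights, bias)


def label_processes(
    whitelist_feats: List[Sequence[float]],
    blacklist_feats: List[Sequence[float]],
) -> Tuple[List[Sample], List[Sample]]:
    """Attach trusted (+1) and danger (-1) labels to obtain the two data sets."""
    first = [(f, +1) for f in whitelist_feats]    # first training data
    second = [(f, -1) for f in blacklist_feats]   # second training data
    return first, second


def score(model: Model, x: Sequence[float]) -> float:
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_perceptron(samples: List[Sample], epochs: int = 100, lr: float = 0.1) -> Model:
    """Classic perceptron rule: update the weights on every misclassified sample."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            if y * score((w, b), x) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every sample is classified correctly
            break
    return w, b


def predict(model: Model, x: Sequence[float]) -> int:
    """Return +1 for a trusted process and -1 for a dangerous one."""
    return 1 if score(model, x) > 0 else -1


# Hypothetical features, e.g. (normalized CPU share, normalized connection count).
first_training, second_training = label_processes(
    whitelist_feats=[(0.1, 0.2), (0.2, 0.1), (0.15, 0.3)],
    blacklist_feats=[(0.9, 0.8), (0.8, 0.9), (0.7, 0.95)],
)
model = train_perceptron(first_training + second_training)
```

On linearly separable data such as this toy set the perceptron stops once all samples are classified correctly; real process features would need an explicit feature extraction step and a held-out evaluation.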

In one embodiment, acquiring the black and white list of the software includes: acquiring the preset software black and white list of each networked host; and aggregating the preset software black and white lists, then screening and de-duplicating them to obtain the black and white list of the software.

The server collects the black and white lists corresponding to each host in the whole network, screens and de-duplicates them, and removes meaningless and disordered data to obtain the black and white list of the software.
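The aggregation and de-duplication step might look like the following sketch. The per-host dictionary layout with `"white"` and `"black"` keys, the name normalization, and the rule that the blacklist wins when a name appears on both lists are all assumptions; the application does not say how conflicts or malformed entries are handled.

```python
from typing import Dict, Iterable, List, Set, Tuple


def normalize(name: str) -> str:
    """Canonical form used for de-duplication (assumption: case-insensitive)."""
    return name.strip().lower()


def merge_lists(
    host_lists: Iterable[Dict[str, List[str]]],
) -> Tuple[Set[str], Set[str]]:
    """Merge per-host lists into one de-duplicated (whitelist, blacklist) pair."""
    whitelist: Set[str] = set()
    blacklist: Set[str] = set()
    for entry in host_lists:
        # Blank entries are treated as meaningless data and dropped.
        whitelist.update(normalize(n) for n in entry.get("white", []) if n.strip())
        blacklist.update(normalize(n) for n in entry.get("black", []) if n.strip())
    # Assumption: a name reported on both lists is treated as blacklisted.
    whitelist -= blacklist
    return whitelist, blacklist


hosts = [
    {"white": ["Editor", " editor "], "black": ["Miner"]},
    {"white": ["miner", "", "Agent"], "black": []},
]
merged_white, merged_black = merge_lists(hosts)
```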

As shown in fig. 3, in one embodiment, after S800, the method further includes:

S920: locating the host running the illegal software to obtain a target host;

S940: acquiring the IP of the target host;

S960: blocking the IP of the target host.

For a host running illegal software, the IP of the host needs to be blocked and data exchange between it and the other hosts and servers in the network prohibited, so that the host does not compromise the data security of the whole network. Specifically, when a host is determined to be running illegal software, the host is located and taken as the target host for IP blocking; the IP of the target host is acquired and blocked directly, and communication with other hosts and the server is forbidden.
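Steps S920 to S960 can be sketched with an in-memory blocklist. The `IpBlocker` class, the host-to-IP table and the sample addresses are hypothetical, and the sketch only records which addresses are blocked; enforcing the block at a firewall or switch is outside its scope and is not described in the application.

```python
import ipaddress
from typing import Dict, List, Set, Union

IpAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


class IpBlocker:
    """Hypothetical in-memory blocklist keyed by host name."""

    def __init__(self, host_ips: Dict[str, str]) -> None:
        self._host_ips = host_ips
        self._blocked: Set[IpAddress] = set()

    def block_host(self, host: str) -> str:
        """S940 + S960: look up the target host's IP and block it."""
        ip = ipaddress.ip_address(self._host_ips[host])  # validates the address
        self._blocked.add(ip)
        return str(ip)

    def is_allowed(self, ip: str) -> bool:
        """Return False for traffic from or to a blocked address."""
        return ipaddress.ip_address(ip) not in self._blocked


def block_violating_hosts(
    violations_by_host: Dict[str, Set[str]], blocker: IpBlocker
) -> List[str]:
    """S920: every host with at least one illegal software becomes a target host."""
    blocked = []
    for host, violations in sorted(violations_by_host.items()):
        if violations:
            blocked.append(blocker.block_host(host))
    return blocked


blocker = IpBlocker({"host-a": "10.0.0.5", "host-b": "10.0.0.6"})
blocked_ips = block_violating_hosts({"host-a": {"miner"}, "host-b": set()}, blocker)
```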

In practical application, taking the application of the illegal software identification method in the smart grid as an example, the method specifically comprises the following steps:

1. The server collects in advance the whitelist processes running inside each host device, registers which processes are compliant, and adds them to a whitelist process library.

2. While the host security protection system built into the server is running, a learning mode is enabled for a period of time, for example 2 weeks. During this period all processes running inside the host are learned by the system and added to the whitelist library.

3. After the learning mode ends, the host security protection system built into the server enables a defense mode, in which each currently running process is matched against the processes in the whitelist library. A process not contained in the whitelist library is recorded as an illegal process and an early warning is raised, and operation and maintenance personnel decide from the warning whether the host needs to be blocked.
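The three steps above amount to a two-mode state machine, sketched below. The class name `HostProtection`, the mode strings and the process names are hypothetical, and plain set membership stands in for whatever matching the host security protection system actually performs.

```python
from typing import Iterable, List, Set


class HostProtection:
    """Hypothetical host protection system with a learning and a defense mode."""

    def __init__(self, registered: Iterable[str] = ()) -> None:
        # Step 1: compliant processes registered in advance.
        self.whitelist: Set[str] = set(registered)
        self.mode = "learning"

    def observe(self, running: Iterable[str]) -> List[str]:
        """Step 2 in learning mode, step 3 in defense mode.

        Returns the processes to raise an early warning for (always empty
        while learning).
        """
        if self.mode == "learning":
            self.whitelist.update(running)
            return []
        return sorted(p for p in set(running) if p not in self.whitelist)

    def start_defense(self) -> None:
        """End the learning period and switch to defense mode."""
        self.mode = "defense"


system = HostProtection(registered=["scada_agent"])
learned = system.observe(["scada_agent", "log_collector"])  # learning period
system.start_defense()
warnings = system.observe(["scada_agent", "log_collector", "miner_x"])
```

Anything that runs during the learning period ends up on the white list, so the sketch, like the described scheme, relies on the hosts being clean while learning is enabled.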

It should be understood that, although the steps in the flowcharts are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise, the order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

As shown in fig. 4, the present application also provides an illegal software identification device, which includes:

the data acquisition module 200 is used for acquiring a black and white list of software and a software process reference baseline model;

the suspicious classification module 400 is used for traversing the software processes operated by the host, identifying the software processes which are not matched with the software process reference baseline model in the software processes operated by the host, and obtaining suspicious processes;

a software determining module 600, configured to determine software corresponding to a suspicious process;

the violation identification module 800 is configured to identify violation software in the host according to a black and white list of the software and software corresponding to the suspicious process;

The illegal software identification device acquires the software black and white list and the software process reference baseline model, traverses the software processes run by the host, identifies the software processes that do not match the software process reference baseline model to obtain suspicious processes, determines the software corresponding to the suspicious processes, and identifies the illegal software in the host according to the software black and white list and the software corresponding to the suspicious processes. Because the software process reference baseline model is obtained by machine learning from the software processes of the whitelist software, it can recognize the software processes that correspond to the white list. A software process that does not match the model therefore does not belong to the white list, and combining this result with the software black and white list allows the illegal software in the host to be identified accurately.

In one embodiment, the illegal software identification device further includes a model training module, configured to acquire the software processes corresponding to each software in the software black and white list to obtain a software process set; add a trusted label to the software processes corresponding to whitelist software in the software process set to obtain first training data; add a danger label to the software processes corresponding to blacklist software in the software process set to obtain second training data; obtain an initial machine learning model; and train the initial machine learning model with the first training data and the second training data to obtain, respectively, a software process reference baseline model and a software process deviation baseline model.

In one embodiment, the data obtaining module 200 is further configured to obtain a black-and-white list of preset software of each networking host; and collecting black and white lists of the preset software and screening and removing duplication to obtain the black and white lists of the software.

In one embodiment, the illegal software identification device further comprises an IP blocking module, configured to locate the host running the illegal software to obtain a target host; acquire the IP of the target host; and block the IP of the target host.

For specific limitations of the illegal software identification device, reference may be made to the above limitations of the illegal software identification method, which are not repeated here. The modules of the illegal software identification device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as the preset software black and white list data and the preset software process reference baseline model. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an illegal software identification method.

Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:

determining software corresponding to the suspicious process;

In one embodiment, the processor, when executing the computer program, further performs the steps of:

acquiring a software process deviation baseline model; identifying, among the software processes operated by the host, a software process which matches the software process deviation baseline model, to obtain a dangerous process; generating and pushing an alarm message corresponding to the dangerous process; wherein the software process deviation baseline model is obtained by training a machine learning model with the software processes of the blacklist software in the black and white list of the software.

locating the host running the illegal software to obtain a target host; acquiring the IP of the target host; and blocking the IP of the target host.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

determining software corresponding to the suspicious process;

In one embodiment, the computer program when executed by the processor further performs the steps of:

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical storage, or the like. Volatile Memory can include Random Access Memory (RAM) or external cache Memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of the present patent shall be subject to the appended claims.

Claims

1. A method for identifying illegal software, the method comprising:

traversing software processes operated by a host, and identifying software processes which are not matched with the software process reference baseline model in the software processes operated by the host to obtain suspicious processes;

determining software corresponding to the suspicious process;

identifying illegal software in a host according to the black and white list of the software and the software corresponding to the suspicious process;

wherein the software process reference baseline model is obtained by training a machine learning model on the software processes of the whitelisted software in the software black and white list.

2. The method of claim 1, further comprising:

acquiring a software process deviation baseline model;

generating and pushing an alarm message corresponding to the dangerous process;

wherein the software process deviation baseline model is obtained by training a machine learning model by the software process of the blacklist software in the software black and white list.

3. The method of claim 2, further comprising:

adding a trusted label to a software process corresponding to whitelisted software in the software process set to obtain first training data; adding a danger label to a software process corresponding to blacklisted software in the software process set to obtain second training data;

obtaining an initial machine learning model;

4. The method of claim 3, wherein the initial machine learning model comprises a linear classification machine learning model.

5. The method of claim 1, wherein obtaining a black and white list of software comprises:

acquiring black and white lists of preset software of each networking host;

and aggregating the preset software black and white lists, then screening and de-duplicating them to obtain the software black and white list.

6. The method of claim 1, wherein after identifying the illegal software in the host, the method further comprises:

locating the host running the illegal software to obtain a target host;

acquiring the IP of the target host;

and blocking the IP of the target host.

7. An illegal software identification device, characterized in that the device comprises:

the suspicious identification module is used for traversing software processes operated by the host, identifying the software processes which are not matched with the software process reference baseline model in the software processes operated by the host, and obtaining suspicious processes;

the software determining module is used for determining the software corresponding to the suspicious process;

the violation identification module is used for identifying illegal software in the host according to the software black and white list and the software corresponding to the suspicious process;

8. The apparatus of claim 7, further comprising a risk prompt module, the risk prompt module being used for acquiring a software process deviation baseline model; identifying, among the software processes run by the host, a software process that matches the software process deviation baseline model to obtain a dangerous process; and generating and pushing an alarm message corresponding to the dangerous process; wherein the software process deviation baseline model is obtained by training a machine learning model on the software processes of the blacklisted software in the software black and white list.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.