CN117951693A - Method and device for identifying sensitive operation behaviors and readable storage medium - Google Patents

Info

Publication number
CN117951693A
Authority
CN
China
Prior art keywords
sensitive
word
identified
behavior
word frequency
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410138280.6A
Other languages
Chinese (zh)
Inventor
周莉
刘建国
何琳
林海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Unicom Digital Technology Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Unicom Digital Technology Co Ltd
Application filed by China United Network Communications Group Co Ltd and Unicom Digital Technology Co Ltd
Priority to CN202410138280.6A
Publication of CN117951693A

Abstract

The invention provides a method and a device for identifying sensitive operation behaviors, and a readable storage medium. The method comprises: acquiring an operation behavior to be identified; calculating the cosine similarity between the operation behavior to be identified and the sensitive contents in a sensitive feature list, the sensitive feature list being formed according to bastion host operation logs; and identifying the operation behavior to be identified as a sensitive operation behavior in response to the cosine similarity being greater than a preset sensitivity threshold. The method, device and readable storage medium address a shortcoming of existing identification methods, which rely mainly on exact matching: such methods easily miss sensitive operation behaviors and therefore leave a significant risk of data leakage.

Description

Method and device for identifying sensitive operation behaviors and readable storage medium
Technical Field
The present invention relates to the field of data security technologies, and in particular, to a method and apparatus for identifying a sensitive operation behavior, and a readable storage medium.
Background
With growing attention to data security, enterprises are steadily strengthening data security and data protection. An enterprise's sensitive files and data are often stored on internal servers; if they leak, the consequences for the stability of the enterprise, and even of society, can be extremely serious. Yet enterprises often lack the means to monitor what internal staff do on these servers.
At present, sensitive operation behaviors can be identified through log analysis, but such identification relies on exact matching, so sensitive operation behaviors are easily overlooked, leaving a significant risk of data leakage.
Disclosure of Invention
The invention aims to solve these technical problems of the prior art by providing a method, a device and a readable storage medium for identifying sensitive operation behaviors. Existing methods identify sensitive operation behaviors mainly by exact matching, which easily causes sensitive operation behaviors to be missed and leaves a significant risk of data leakage.
In a first aspect, the present invention provides a method for identifying sensitive operation behaviors, the method comprising the following steps:
acquiring an operation behavior to be identified;
calculating the cosine similarity between the operation behavior to be identified and the sensitive contents in a sensitive feature list, wherein the sensitive feature list is formed according to bastion host operation logs;
and identifying the operation behavior to be identified as a sensitive operation behavior in response to the cosine similarity being greater than a preset sensitivity threshold.
Further, before the operation behavior to be identified is obtained, the method further includes:
collecting operation logs of the bastion host;
performing field identification and parsing on the operation logs;
and forming the sensitive feature list according to the parsed operation logs.
Further, the fields include a user name, an operation time, an operation instruction, operation content, a source Internet Protocol (IP) address and a destination IP address.
Further, the sensitive content in the sensitive feature list includes at least one of the following: sensitive operation, sensitive path, sensitive file.
Further, before calculating the cosine similarity between the operation behavior to be identified and the sensitive contents in the sensitive feature list, the method further includes:
performing word segmentation on each sensitive content in the sensitive feature list to obtain, in turn, the word vector corresponding to each sensitive content;
creating, for each sensitive content, the word frequency corresponding to its word vector;
aggregating all word vectors and word frequencies in the sensitive feature list to form a sensitive word frequency knowledge base;
and merging the word vectors and word frequencies of all sensitive contents against the sensitive word frequency knowledge base to form merged word vectors and word frequencies.
Further, calculating the cosine similarity between the operation behavior to be identified and the sensitive contents in the sensitive feature list specifically includes:
performing word segmentation on the target operation content corresponding to the operation behavior to be identified to obtain a target word vector;
calculating the target word frequency corresponding to the target word vector;
comparing the target word vector and target word frequency of the operation behavior to be identified against the sensitive word frequency knowledge base, and retaining only the target words, with their frequencies, that are present in the knowledge base, to form an optimized target word vector and target word frequency;
and calculating the cosine similarity between the optimized target word frequency and the word frequency of the corresponding word vector in the sensitive word frequency knowledge base.
Further, after identifying the operation behavior to be identified as a sensitive operation behavior, the method further includes:
triggering a sensitive operation risk alarm.
In a second aspect, the present invention provides a device for identifying sensitive operation behaviors, the device comprising:
an operation behavior acquisition module, configured to acquire an operation behavior to be identified;
a cosine similarity calculation module, connected to the operation behavior acquisition module and configured to calculate the cosine similarity between the operation behavior to be identified and the sensitive contents in a sensitive feature list, wherein the sensitive feature list is formed according to bastion host operation logs;
and an operation behavior identification module, connected to the cosine similarity calculation module and configured to identify the operation behavior to be identified as a sensitive operation behavior in response to the cosine similarity being greater than a preset sensitivity threshold.
In a third aspect, the present invention provides a device for identifying sensitive operation behaviors, comprising a memory storing a computer program and a processor configured to run the computer program to implement the method for identifying sensitive operation behaviors described in the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for identifying sensitive operation behaviors described in the first aspect.
In the method, device and readable storage medium provided by the invention, an operation behavior to be identified is first acquired; then the cosine similarity between the operation behavior to be identified and the sensitive contents in a sensitive feature list, formed according to bastion host operation logs, is calculated; and the operation behavior is identified as a sensitive operation behavior in response to the cosine similarity being greater than a preset sensitivity threshold. Because operation behaviors are compared with the sensitive feature list by cosine similarity rather than by exact matching, sensitive operation behaviors of abnormal users can be identified more accurately, which helps keep user operations compliant and secure and effectively prevents data leakage. This addresses the problem that existing methods, which rely mainly on exact matching, easily miss sensitive operation behaviors and leave a significant risk of data leakage.
Drawings
FIG. 1 is a flow chart of the method for identifying sensitive operation behaviors in Example 1 of the present invention;
FIG. 2 is a schematic structural diagram of the device for identifying sensitive operation behaviors in Example 2 of the present invention;
FIG. 3 is a schematic structural diagram of the device for identifying sensitive operation behaviors in Example 3 of the present invention.
Detailed Description
In order to make the technical scheme of the present invention better understood by those skilled in the art, the following detailed description of the embodiments of the present invention will be given with reference to the accompanying drawings.
It is to be understood that the specific embodiments and figures described herein are merely illustrative of the invention, and are not limiting of the invention.
It is to be understood that, where no conflict arises, the embodiments of the invention and the features in them may be combined with each other.
It is to be understood that only the portions relevant to the present invention are shown in the drawings for convenience of description, and the portions irrelevant to the present invention are not shown in the drawings.
It should be understood that each unit and module in the embodiments of the present invention may correspond to only one physical structure, may be formed by a plurality of physical structures, or may be integrated into one physical structure.
It will be appreciated that the terms "first," "second," and the like in embodiments of the present invention are used to distinguish between different objects or to distinguish between different processes on the same object, and are not used to describe a particular order of objects.
It will be appreciated that, without conflict, the functions and steps noted in the flowcharts and block diagrams of the present invention may occur out of the order noted in the figures.
It is to be understood that the flowcharts and block diagrams of the present invention illustrate the architecture, functionality and operation of possible implementations of systems, apparatuses, devices and methods according to various embodiments of the present invention. Each block in a flowchart or block diagram may represent a unit, module, segment, code or the like comprising executable instructions for implementing the specified functions. Moreover, each block, or combination of blocks, in the block diagrams and flowcharts can be implemented by a hardware-based system that performs the specified functions, or by a combination of hardware and computer instructions.
It should be understood that the units and modules related in the embodiments of the present invention may be implemented by software, or may be implemented by hardware, for example, the units and modules may be located in a processor.
Example 1:
This embodiment provides a method for identifying sensitive operation behaviors. As shown in FIG. 1, the method includes:
Step S101: acquiring the operation behavior to be identified.
In this embodiment, the operation behavior to be identified is the current operation behavior on the bastion host.
Step S102: calculating the cosine similarity between the operation behavior to be identified and the sensitive contents in a sensitive feature list, wherein the sensitive feature list is formed according to the bastion host operation logs.
In this embodiment, to locate risks accurately, a sensitive feature list is formed according to the bastion host operation logs and used for the cosine similarity calculation and comparison, so that sensitive operation behaviors of abnormal users can be identified and judged more accurately.
It should be noted that the bastion host is a network security device for protecting an enterprise's internal network from external attacks and malicious intrusion. It can monitor and filter external network traffic and prevent malware and attackers from entering the internal network.
Optionally, before acquiring the operation behavior to be identified, the method further includes:
collecting operation logs of the bastion host;
performing field identification and parsing on the operation logs;
and forming the sensitive feature list according to the parsed operation logs.
In this embodiment, the bastion host operation logs are first collected to form the bastion host operation log data set RI_D. The logs can be transmitted and collected via syslog, with all types of user operations on the bastion host transmitted in full; they mainly record the operations a user performs on a server after logging in to the bastion host and jumping to that server.
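The syslog collection described above can be sketched as follows. This is a minimal illustration, assuming the bastion host sends BSD-style syslog datagrams (RFC 3164) whose payload begins with a `<PRI>` header, where PRI = facility × 8 + severity. The names `decode_syslog` and `collect`, and the in-memory list standing in for RI_D, are hypothetical; a real collector would receive datagrams on a socket (commonly UDP port 514) and persist them.

```python
import re
from typing import NamedTuple

# Matches the RFC 3164 priority header, e.g. "<86>", at the start of a datagram.
_PRI = re.compile(r"^<(\d{1,3})>")


class SyslogMessage(NamedTuple):
    facility: int  # PRI // 8
    severity: int  # PRI % 8
    message: str   # everything after the <PRI> header


def decode_syslog(datagram: bytes) -> SyslogMessage:
    """Split a raw syslog datagram into facility, severity and message text."""
    text = datagram.decode("utf-8", errors="replace").rstrip("\r\n")
    match = _PRI.match(text)
    if match is None:
        raise ValueError("missing <PRI> header")
    pri = int(match.group(1))
    return SyslogMessage(pri // 8, pri % 8, text[match.end():])


# Hypothetical in-memory stand-in for the bastion host log data set RI_D.
RI_D: list[str] = []


def collect(datagram: bytes) -> None:
    """Append the message part of one received datagram to RI_D."""
    RI_D.append(decode_syslog(datagram).message)
```

For example, a datagram starting with `<86>` decodes to facility 10 (authpriv) and severity 6 (informational), and only the text after the header is kept for field parsing.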
Optionally, the fields include a user name, an operation time, an operation instruction, operation content, a source IP (Internet Protocol) address and a destination IP address.
In this embodiment, field identification and parsing are performed on the bastion host operation logs: the original logs are parsed and the corresponding fields are extracted. The fields must include the bastion host user name, the operation time, the operation command operation_cmd, the operation content operation_content, the source IP address src_ip, the destination IP address dst_ip, and so on.
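The record layout of the bastion host log is not specified here, so the following is only a sketch under an assumed format: each message carries space-separated `key=value` pairs with shell-style quoting, for example `user=alice time=2024-01-30T10:00:00 operation_cmd=sz operation_content="sz /root/etc/passwd 1.txt" src_ip=10.0.0.5 dst_ip=172.31.11.9`. The keys `user` and `time`, the sample values and the `BastionLogRecord` type are hypothetical; `operation_cmd`, `operation_content`, `src_ip` and `dst_ip` follow the field names above.

```python
import ipaddress
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class BastionLogRecord:
    user: str
    time: str
    operation_cmd: str
    operation_content: str
    src_ip: str
    dst_ip: str


REQUIRED = ("user", "time", "operation_cmd", "operation_content", "src_ip", "dst_ip")


def parse_record(message: str) -> BastionLogRecord:
    """Parse one key=value log message into a BastionLogRecord."""
    # shlex.split honours quotes, so a quoted operation_content stays one token;
    # tokens without '=' (e.g. a syslog timestamp or host name) are ignored.
    pairs = dict(tok.split("=", 1) for tok in shlex.split(message) if "=" in tok)
    missing = [key for key in REQUIRED if key not in pairs]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Validate both addresses; ipaddress.ip_address raises ValueError if malformed.
    for key in ("src_ip", "dst_ip"):
        ipaddress.ip_address(pairs[key])
    return BastionLogRecord(**{key: pairs[key] for key in REQUIRED})
```

Only operation_content feeds the similarity calculation later on; the other fields identify who ran the operation, from where, and against which server, which is what an alarm would need to report.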
Optionally, the sensitive content in the sensitive feature list includes at least one of: sensitive operation, sensitive path, sensitive file.
In this embodiment, each organization or enterprise forms its own sensitive feature list according to its business requirements; the list consists of sensitive operations (i.e. risky instructions), sensitive paths, sensitive files and similar contents.
Specifically, each organization or enterprise forms a sensitive feature list V reflecting its own characteristics according to its business requirements. Sensitive features are mainly the high-risk, highly sensitive contents that the organization has compiled, including but not limited to the use of high-risk or abnormal operation instructions, the use of operation instructions to access sensitive paths, the retrieval of sensitive files and the viewing of sensitive file contents. The organization sorts and classifies the sensitive features by risk type, including but not limited to sensitive operations, sensitive paths and sensitive files.
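One way to maintain such a categorized list is as a small configuration file that is flattened into V before the knowledge base is built. The sketch below assumes a hypothetical JSON layout with one key per category (`sensitive_operation`, `sensitive_path`, `sensitive_file`); the function name and key names are illustrative, not prescribed by the method. It enforces that the list contains at least one entry, matching the "at least one of" wording above.

```python
import json

# One key per category of sensitive content; the JSON layout itself is hypothetical.
CATEGORIES = ("sensitive_operation", "sensitive_path", "sensitive_file")


def load_sensitive_feature_list(config_text: str) -> list[tuple[str, str]]:
    """Flatten a categorized JSON config into the list V of (category, content) pairs."""
    config = json.loads(config_text)
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object keyed by category")
    unknown = set(config) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    # Keep a stable order: categories in CATEGORIES order, entries in file order.
    v = [(category, content) for category in CATEGORIES for content in config.get(category, [])]
    if not v:
        raise ValueError("the sensitive feature list needs at least one entry")
    return v
```

Keeping the category next to each entry lets a later alarm report whether an operation resembled a risky instruction, a sensitive path or a sensitive file.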
Optionally, before calculating the cosine similarity between the operation behavior to be identified and the sensitive content in the sensitive feature list, the method further includes:
performing word segmentation on each sensitive content in the sensitive feature list to obtain, in turn, the word vector corresponding to each sensitive content;
creating, for each sensitive content, the word frequency corresponding to its word vector;
aggregating all word vectors and word frequencies in the sensitive feature list to form a sensitive word frequency knowledge base;
and merging the word vectors and word frequencies of all sensitive contents against the sensitive word frequency knowledge base to form merged word vectors and word frequencies.
In this embodiment, each sensitive content in the sensitive feature list $V = \{V_i \mid i = 1, \dots, n\}$ is first segmented, giving in turn the word vector of each sensitive content $V_i = (v_{i,1}, \dots, v_{i,j})$, where $j \ge 1$. Next, for each sensitive content, the word frequency of the segmented word vector $V_i$ is created as $T_i = (T_{i,1}, \dots, T_{i,j})$, where $T_{i,k}$ is the number of occurrences of $v_{i,k}$ in the sensitive content. Then all word vectors and word frequencies in the sensitive feature list are aggregated into the sensitive word frequency knowledge base $D = (D_1, \dots, D_t)$, where $t \le i \times j$. Finally, against the word vector contents of the knowledge base, the word vectors $DP = (V_1, \dots, V_n)$ and word frequencies $DT = (T_1, \dots, T_n)$ of the sensitive contents are formed.
For example, suppose the sensitive feature list V contains all the sensitive operations approved by the user, namely the following two entries:
sz /root/etc/passwd 1.txt
scp -r /root/etc/ root@172.31.11.9:/root/etc/
Word segmentation of V yields two word vectors: V1 (sz, /root/etc/passwd, 1.txt) and V2 (scp, -r, /root/etc/, root, @, 172.31.11.9).
Counting the word frequencies of V1 and V2 gives T1 (1, 1, 1) and T2 (1, 1, 2, 1, 1, 1); the 2 in T2 reflects that /root/etc/ occurs twice in the scp command, as both the source and the destination path.
Merging V1 and V2, and T1 and T2, gives DP (V1, V2) and DT (T1, T2).
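The two-entry example can be reproduced with a short script. This is a sketch, not the patented implementation: the method does not name a word segmentation technique, so the regular-expression tokenizer below (whitespace and `:` are separators, `@` is kept as its own token) is an assumption chosen because it yields exactly V1 and V2 above. `collections.Counter` remembers words in first-seen order, which keeps each T_i aligned with its V_i.

```python
import re
from collections import Counter

# Assumed tokenizer: '@' is its own token; whitespace and ':' are separators.
_TOKEN = re.compile(r"@|[^\s@:]+")


def segment(content: str) -> list[str]:
    """Split one command line into words."""
    return _TOKEN.findall(content)


def word_vector_and_frequency(content: str) -> tuple[list[str], list[int]]:
    """Return the de-duplicated word vector V_i and its word frequency T_i."""
    counts = Counter(segment(content))
    return list(counts), list(counts.values())


def build_knowledge_base(sensitive_list: list[str]) -> tuple[list[list[str]], list[list[int]], list[str]]:
    """Build DP = (V_1..V_n), DT = (T_1..T_n) and the de-duplicated base D."""
    dp, dt = [], []
    for content in sensitive_list:
        words, freqs = word_vector_and_frequency(content)
        dp.append(words)
        dt.append(freqs)
    # D lists every word of every V_i once, in order of first appearance.
    d = list(dict.fromkeys(word for words in dp for word in words))
    return dp, dt, d


V = ["sz /root/etc/passwd 1.txt", "scp -r /root/etc/ root@172.31.11.9:/root/etc/"]
DP, DT, D = build_knowledge_base(V)
print(DP[1])  # ['scp', '-r', '/root/etc/', 'root', '@', '172.31.11.9']
print(DT)     # [[1, 1, 1], [1, 1, 2, 1, 1, 1]]
```

Here the knowledge base D holds 9 distinct words, within the bound t ≤ i × j stated above.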
Optionally, calculating the cosine similarity between the operation behavior to be identified and the sensitive contents in the sensitive feature list specifically includes:
performing word segmentation on the target operation content corresponding to the operation behavior to be identified to obtain a target word vector;
calculating the target word frequency corresponding to the target word vector;
comparing the target word vector and target word frequency of the operation behavior to be identified against the sensitive word frequency knowledge base, and retaining only the target words, with their frequencies, that are present in the knowledge base, to form an optimized target word vector and target word frequency;
and calculating the cosine similarity between the optimized target word frequency and the word frequency of the corresponding word vector in the sensitive word frequency knowledge base.
In this embodiment, the current operation behavior on the bastion host is taken as the operation behavior to be identified, and its target operation content operation_content is segmented: for each bastion host operation log $OP_{id}$, the extracted target operation content is segmented into an operation behavior target word vector $(op_1, \dots, op_u)$, where $u \ge 1$, and a target word frequency $(tf_1, \dots, tf_u)$ is created for the segmented words. The target word vector and target word frequency are then re-sorted against the sensitive word frequency knowledge base: only the words present in the knowledge base are retained, forming an optimized target word vector and an optimized target word frequency $TP$ that are aligned term by term with the word vector $V_i$ of the sensitive content being compared (a word of $V_i$ that does not occur in the operation gets frequency 0). The cosine similarity between the optimized target word frequency $TP$ and the word frequency $T_i$ of the corresponding word vector in the knowledge base is then calculated as follows:
$$\cos\theta_i = \frac{TP \cdot T_i}{\lVert TP \rVert \, \lVert T_i \rVert} = \frac{\sum_{k} TP_k \, T_{i,k}}{\sqrt{\sum_{k} TP_k^{2}} \, \sqrt{\sum_{k} T_{i,k}^{2}}}$$
For example, assume that the currently generated operation log (corresponding to the operation behavior to be identified) is:
sz /root/etc/passwd 5.txt
This operation is similar to V1 in the sensitive word frequency knowledge base. Only the words and word frequencies that match the knowledge base are retained: for OP (sz, /root/etc/passwd, 5.txt), the optimized word frequency is TP (1, 1, 0). The last term of TP is 0 because the corresponding entry in the knowledge base is 1.txt rather than 5.txt, so there is no match. The cosine similarity is then calculated from TP (1, 1, 0) and T1 (1, 1, 1).
It should be noted that in the formula above, the numerator of the middle expression measures the overlap between the currently received bastion host operation log and the sensitive word frequency knowledge base; for TP (1, 1, 0) and T1 (1, 1, 1), TP · T1 = 1×1 + 1×1 + 0×1 = 2. The denominator is the product of the two vectors' lengths: sqrt(1² + 1² + 0²) × sqrt(1² + 1² + 1²) = sqrt(2) × sqrt(3) = sqrt(6). The cosine similarity is therefore 2/sqrt(6) ≈ 0.816.
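The optimization and cosine steps can be checked against this worked example. In the sketch below, the tokenizer is again an assumption (whitespace and `:` separate words, `@` is its own token), and the function names are illustrative. For a single comparison with V_i, retaining only knowledge-base words and aligning them with V_i amounts to looking up each word of V_i in the target's word counts, with absent words counting as 0. The last line shows why exact matching misses this operation.

```python
import math
import re
from collections import Counter

_TOKEN = re.compile(r"@|[^\s@:]+")  # assumed word segmentation


def segment(content: str) -> list[str]:
    return _TOKEN.findall(content)


def optimized_target_frequency(target: str, sensitive_words: list[str]) -> list[int]:
    """TP: target word counts aligned with V_i; target words absent from V_i drop out."""
    counts = Counter(segment(target))
    return [counts[word] for word in sensitive_words]  # Counter yields 0 for missing words


def cosine(a: list[int], b: list[int]) -> float:
    """cos(theta) = a.b / (|a| |b|); returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


V1, T1 = ["sz", "/root/etc/passwd", "1.txt"], [1, 1, 1]
target = "sz /root/etc/passwd 5.txt"

TP = optimized_target_frequency(target, V1)
print(TP)                                        # [1, 1, 0]
print(round(cosine(TP, T1), 4))                  # 0.8165, i.e. 2 / sqrt(6)
print(target in ["sz /root/etc/passwd 1.txt"])   # False: an exact match misses it
```

Returning 0.0 for an all-zero vector is a design choice of this sketch: an operation sharing no words with V_i is treated as completely dissimilar instead of raising a division error.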
Step S103: identifying the operation behavior to be identified as a sensitive operation behavior in response to the cosine similarity being greater than a preset sensitivity threshold.
In this embodiment, a sensitivity threshold Y is set. When the cosine similarity between the operation behavior to be identified and the sensitive word frequency knowledge base is greater than Y, the operation behavior is identified as a sensitive operation behavior; otherwise, it is identified as a non-sensitive operation behavior.
Optionally, after identifying the operation behavior to be identified as a sensitive operation behavior, the method further includes:
triggering a sensitive operation risk alarm.
In this embodiment, to effectively prevent data leakage events, a sensitive operation risk alarm is triggered whenever a sensitive operation behavior is identified.
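Step S103 and the alarm can be sketched as follows, assuming the similarities between the operation and every sensitive content V_i have already been computed. The `Alarm` record, the threshold value 0.8 used in the example, and the use of Python's `logging` module as the alarm channel are hypothetical; the method only requires that the similarity be strictly greater than the preset threshold Y.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("sensitive_ops")  # hypothetical alarm channel


@dataclass(frozen=True)
class Alarm:
    operation: str
    matched_content: str
    similarity: float


def identify(operation: str, similarities: dict[str, float], threshold: float) -> Optional[Alarm]:
    """Flag the operation if its best similarity to any V_i is strictly above Y."""
    if not similarities:
        return None
    content, score = max(similarities.items(), key=lambda item: item[1])
    if score > threshold:  # "greater than": a score exactly equal to Y is not flagged
        logger.warning("sensitive operation %r resembles %r (cos=%.3f)", operation, content, score)
        return Alarm(operation, content, score)
    return None
```

With Y = 0.8, the worked example (similarity about 0.816 to V1) would raise an alarm naming `sz /root/etc/passwd 1.txt` as the matched entry; checking the maximum is equivalent to checking whether any V_i exceeds Y.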
In a specific embodiment, the method for identifying sensitive operation behavior may include the following steps:
a) Collecting bastion host operation logs to form the bastion host log data set RI_D;
b) Performing field identification and parsing on the bastion host logs, where the identified data include the bastion host user name, the operation time, the operation instruction operation_cmd, the operation content operation_content, the source IP address src_ip, the destination IP address dst_ip, and so on;
c) Forming, by the organization or enterprise according to its own business requirements, a sensitive feature list reflecting its own characteristics, consisting of sensitive contents such as sensitive operations, sensitive paths and sensitive files;
d) Segmenting each sensitive content in the sensitive feature list, obtaining the word vector of each sensitive content in turn, computing the word frequencies of each sensitive content, and building the sensitive word frequency knowledge base;
e) Segmenting the operation content operation_content in the current bastion host operation log and calculating its word frequencies;
f) Calculating, in turn, the cosine similarity between the word frequencies in the knowledge base and the word frequencies of the segmented bastion host operation content;
g) Setting a sensitivity threshold and triggering a sensitive operation alarm when the cosine similarity is greater than the threshold.
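Steps d) to g) can be combined into one scoring loop. This is a condensed sketch, not a reference implementation: it assumes operation_content has already been extracted in steps a) and b), uses a hypothetical regular-expression tokenizer (whitespace and `:` separate words, `@` is its own token) in place of an unspecified segmentation method, and reduces the alarm to a returned list. The class name and the threshold value are illustrative.

```python
import math
import re
from collections import Counter

_TOKEN = re.compile(r"@|[^\s@:]+")  # assumed word segmentation


def term_counts(content: str) -> Counter:
    return Counter(_TOKEN.findall(content))


class SensitiveOperationDetector:
    """Steps d)-g): knowledge base, target word frequency, cosine, threshold."""

    def __init__(self, sensitive_list: list[str], threshold: float) -> None:
        self.threshold = threshold
        # d) word vector V_i (Counter keys) and word frequency T_i (Counter values)
        self.entries = [(content, term_counts(content)) for content in sensitive_list]

    def score(self, operation_content: str) -> tuple[str, float]:
        """e)-f) Return the most similar sensitive content and its cosine similarity."""
        target = term_counts(operation_content)
        best = ("", 0.0)
        for content, counts in self.entries:
            words = list(counts)
            tp = [target[w] for w in words]  # optimized target frequency, aligned with V_i
            ti = [counts[w] for w in words]  # sensitive word frequency T_i
            dot = sum(a * b for a, b in zip(tp, ti))
            norm = math.sqrt(sum(a * a for a in tp)) * math.sqrt(sum(b * b for b in ti))
            sim = dot / norm if norm else 0.0
            if sim > best[1]:
                best = (content, sim)
        return best

    def alarms(self, operation_contents: list[str]) -> list[tuple[str, str, float]]:
        """g) Keep only operations whose best similarity is greater than the threshold."""
        hits = []
        for op in operation_contents:
            content, sim = self.score(op)
            if sim > self.threshold:
                hits.append((op, content, sim))
        return hits
```

With the two example entries and a hypothetical threshold of 0.8, `sz /root/etc/passwd 5.txt` is flagged against the first entry (about 0.816), and `scp -r /root/etc/ admin@10.0.0.9:/root/etc/` scores sqrt(7)/3 ≈ 0.882 against the second, although neither matches an entry exactly; `ls -l /tmp` shares no words with either entry and scores 0.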
The sensitive word frequency knowledge base D is constructed as follows:
Each sensitive content in the sensitive feature list $V = \{V_i \mid i = 1, \dots, n\}$ is segmented, yielding in turn the word vector of each sensitive content $V_i = (v_{i,1}, \dots, v_{i,j})$, where $j \ge 1$;
For each sensitive content, the word frequency of the segmented word vector $V_i$ is created as $T_i = (T_{i,1}, \dots, T_{i,j})$, where $j \ge 1$;
The words of all $V_i$ in the sensitive feature list V are aggregated and de-duplicated to form the sensitive word frequency knowledge base $D = (D_1, \dots, D_t)$, where $t \le i \times j$;
Against the word vector contents of the knowledge base, the word vectors $DP = (V_1, \dots, V_n)$ and word frequencies $DT = (T_1, \dots, T_n)$ of the sensitive contents are formed.
In another specific embodiment, the method for identifying sensitive operation behavior may include the following steps:
(1) Extracting bastion host operation logs
Bastion host operation logs are collected to form the bastion host log data set RI_D, and field identification and parsing are performed on them; the identified data include the bastion host user name, the operation time, the operation instruction operation_cmd, the operation content operation_content, the source IP address src_ip, the destination IP address dst_ip, and so on.
(2) Forming the sensitive feature list V
Each organization or enterprise forms a sensitive feature list V reflecting its own characteristics according to its business requirements. Sensitive features are mainly the high-risk, highly sensitive contents compiled by the organization, including but not limited to the use of high-risk or abnormal operation instructions, the use of operation instructions to access sensitive paths, the retrieval of sensitive files and the viewing of sensitive file contents.
(3) Forming a sensitive word frequency knowledge base
Each sensitive content in the sensitive feature list is segmented to obtain, in turn, its word vector; the word frequencies of each sensitive content are computed; the sensitive word frequency knowledge base is built; and, against the word vector contents of the knowledge base, the word vectors $DP = (V_1, \dots, V_n)$ and word frequencies $DT = (T_1, \dots, T_n)$ of the sensitive contents are formed.
(4) Forming the bastion host operation word vector and word frequency
The operation content operation_content in the bastion host log is segmented and its word frequencies are calculated. The bastion host operation word vector and word frequency are then re-sorted against the sensitive word frequency knowledge base, and only the words and frequencies present in the knowledge base are retained, forming the optimized operation word vector and word frequency $TP$.
(5) Cosine similarity identification and risk warning
The cosine similarity between each word frequency $T_i$ in the knowledge base and the optimized word frequency $TP$ of the segmented bastion host operation content is calculated in turn:
$$\cos\theta_i = \frac{TP \cdot T_i}{\lVert TP \rVert \, \lVert T_i \rVert} = \frac{\sum_{k} TP_k \, T_{i,k}}{\sqrt{\sum_{k} TP_k^{2}} \, \sqrt{\sum_{k} T_{i,k}^{2}}}$$
A sensitivity threshold Y is set, and a sensitive operation alarm is triggered when the cosine similarity is greater than Y.
In this method, the bastion host operation logs are used to analyze the operation behaviors of bastion host users, and word segmentation is applied to those operation behaviors to form word vectors and obtain the corresponding word frequencies. The enterprise, according to its business characteristics, compiles the sensitive operations, sensitive file names, sensitive paths and other contents that need to be restricted into a sensitive feature list. Word segmentation is applied to the sensitive feature list to obtain the word vector and word frequency of each sensitive content, forming the sensitive word frequency knowledge base. Real-time bastion host operation logs are then compared in turn against the knowledge base: after the word vectors of each log are optimized, the cosine similarity between the log's word frequencies and those in the knowledge base is calculated, and a sensitivity threshold is used to judge abnormal, risky operations. When a sensitive operation behavior is identified, a sensitive operation alarm is triggered, thereby preventing data leakage and ensuring data security.
In the method for identifying sensitive operation behaviors provided by this embodiment, the operation behavior to be identified is first acquired; the cosine similarity between it and the sensitive contents in a sensitive feature list formed according to the bastion host operation logs is then calculated; and it is identified as a sensitive operation behavior in response to the cosine similarity being greater than a preset sensitivity threshold. Comparing operation behaviors with the sensitive feature list by cosine similarity allows sensitive operation behaviors of abnormal users to be identified more accurately, helps keep user operations compliant and secure, and effectively prevents data leakage. This addresses the problem that existing methods, which rely mainly on exact matching, easily miss sensitive operation behaviors and leave a significant risk of data leakage.
Example 2:
As shown in FIG. 2, this embodiment provides a device for identifying sensitive operation behaviors, configured to execute the method for identifying sensitive operation behaviors described above. The device includes:
an operation behavior acquisition module 11, configured to acquire an operation behavior to be identified;
a cosine similarity calculation module 12, connected to the operation behavior acquisition module 11 and configured to calculate the cosine similarity between the operation behavior to be identified and the sensitive contents in a sensitive feature list, wherein the sensitive feature list is formed according to bastion host operation logs;
and an operation behavior identification module 13, connected to the cosine similarity calculation module 12 and configured to identify the operation behavior to be identified as a sensitive operation behavior in response to the cosine similarity being greater than a preset sensitivity threshold.
Optionally, the apparatus further comprises:
An operation log acquisition module, configured to acquire the operation logs of the fort machine;
A field identification and parsing module, configured to identify and parse the fields of the operation logs; and
A sensitive feature list generation module, configured to form the sensitive feature list from the parsed operation logs.
Optionally, the fields include the user name, operation time, operation instruction, operation content, source Internet Protocol (IP) address, and destination IP address.
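The collection and field-parsing modules can be illustrated with a small parser. The text does not give the fort machine log format, so this sketch assumes the following:

- one record per line;
- the six listed fields, separated by `|`;
- a `YYYY-MM-DD HH:MM:SS` timestamp.

`OperationRecord` and `parse_log_line` are hypothetical names. The parser splits the first three fields from the left and the two IP addresses from the right. This leaves any `|` inside the operation content (such as a shell pipe) intact.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime
from typing import Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


@dataclass(frozen=True)
class OperationRecord:
    """One parsed fort machine log entry with the six fields named in the text."""
    user_name: str
    operation_time: datetime
    operation_instruction: str
    operation_content: str
    source_ip: IPAddress
    destination_ip: IPAddress


def parse_log_line(line: str) -> OperationRecord:
    """Parse one assumed 'user|time|instruction|content|src_ip|dst_ip' line."""
    line = line.rstrip("\n")
    if line.count("|") < 5:
        raise ValueError(f"expected 6 '|'-separated fields: {line!r}")
    # Split the first three fields from the left and the two IPs from the right,
    # so a '|' inside the operation content (e.g. a shell pipe) is preserved.
    user, ts, instruction, rest = line.split("|", 3)
    content, src, dst = rest.rsplit("|", 2)
    return OperationRecord(
        user_name=user.strip(),
        operation_time=datetime.strptime(ts.strip(), TIME_FORMAT),
        operation_instruction=instruction.strip(),
        operation_content=content.strip(),
        source_ip=ipaddress.ip_address(src.strip()),  # ValueError if malformed
        destination_ip=ipaddress.ip_address(dst.strip()),
    )
```

For example, `parse_log_line("alice|2024-01-31 10:15:00|exec|cat /etc/passwd | grep root|10.0.0.5|10.0.1.20")` keeps `cat /etc/passwd | grep root` as a single operation content.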
Optionally, the sensitive content in the sensitive feature list includes at least one of: a sensitive operation, a sensitive path, and a sensitive file.
Optionally, the apparatus further comprises:
A word segmentation acquisition module, configured to segment each item of sensitive content in the sensitive feature list into words, obtaining in turn the word vector corresponding to each item of sensitive content;
A word frequency acquisition module, configured to create, for each item of sensitive content, the word frequency corresponding to each word in its word vector;
A knowledge base forming module, configured to collate all word vectors and word frequencies in the sensitive feature list into a sensitive word frequency knowledge base; and
A merging module, configured to merge the word vector and word frequencies of each item of sensitive content against the sensitive word frequency knowledge base, forming merged word vectors and word frequencies.
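The four knowledge-base modules can be read as follows, sketched in Python:

1. Segment each sensitive content into words.
2. Count the word frequencies of each content.
3. Collate all words into one shared, sorted vocabulary.
4. "Merge" each content into a frequency vector aligned to that vocabulary, with 0 where a word is absent.

This is one possible interpretation, not the patent's stated implementation. The regex tokenizer `segment` stands in for the word segmentation step, whose method the text does not specify. `build_knowledge_base` and `SensitiveWordFrequencyKB` are hypothetical names.

```python
import re
from collections import Counter
from dataclasses import dataclass


def segment(text: str) -> list[str]:
    """Hypothetical tokenizer standing in for word segmentation: lower-cases the
    text and keeps runs of word characters, '.' and '-' (so '/' splits paths)."""
    return re.findall(r"[\w.\-]+", text.lower())


@dataclass
class SensitiveWordFrequencyKB:
    vocabulary: list[str]              # every distinct word in the sensitive feature list
    vectors: dict[str, list[int]]      # sensitive content -> frequencies aligned to vocabulary


def build_knowledge_base(sensitive_contents: list[str]) -> SensitiveWordFrequencyKB:
    # Segmentation and word frequency: one Counter of words per sensitive content.
    per_content = {content: Counter(segment(content)) for content in sensitive_contents}
    # Knowledge base: collate all words across the list into one sorted vocabulary.
    vocabulary = sorted(set().union(*per_content.values()))
    # Merging: express each content as a frequency vector over the shared
    # vocabulary (a Counter returns 0 for words the content does not contain).
    vectors = {
        content: [counts[word] for word in vocabulary]
        for content, counts in per_content.items()
    }
    return SensitiveWordFrequencyKB(vocabulary, vectors)
```

Every stored vector shares the same vocabulary order. Comparing any of them with a target vector laid out the same way is therefore a plain element-wise computation.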
Optionally, the cosine similarity calculation module 12 specifically includes:
A first processing unit, configured to segment the target operation content corresponding to the operation behavior to be identified into words, obtaining a target word vector;
A second processing unit, configured to calculate the target word frequencies corresponding to the target word vector;
A third processing unit, configured to compare the target word vector and target word frequencies of the operation behavior to be identified against the sensitive word frequency knowledge base, retaining only the words (and their frequencies) that also appear in the knowledge base, thereby forming an optimized target word vector and target word frequencies; and
A fourth processing unit, configured to calculate the cosine similarity between the optimized target word frequencies and the word frequencies of the corresponding word vectors in the sensitive word frequency knowledge base.
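Below is a minimal sketch of the four processing units. It assumes the knowledge base is held as a sorted vocabulary plus one frequency vector per sensitive content; that layout, the regex tokenizer `segment`, and the function names are hypothetical. Projecting the target counts onto the vocabulary is one way to read the third unit's "retain only what is consistent with the knowledge base": words absent from the knowledge base simply drop out.

```python
import math
import re
from collections import Counter


def segment(text: str) -> list[str]:
    """Hypothetical regex tokenizer; it must match the one used to build the KB."""
    return re.findall(r"[\w.\-]+", text.lower())


def cosine(a: list[int], b: list[int]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_to_kb(
    target_content: str,
    vocabulary: list[str],
    kb_vectors: dict[str, list[int]],
) -> dict[str, float]:
    """Score the target operation content against every sensitive content in the KB."""
    # First and second units: segment the target content and count word frequencies.
    counts = Counter(segment(target_content))
    # Third unit: keep only words present in the KB vocabulary, in vocabulary
    # order, so the target vector lines up with the stored vectors.
    target_vector = [counts[word] for word in vocabulary]
    # Fourth unit: cosine similarity against each stored frequency vector.
    return {content: cosine(target_vector, vec) for content, vec in kb_vectors.items()}
```

Consider a knowledge base that contains `cat /etc/shadow` but whose vocabulary lacks `sudo`. The target `sudo cat /etc/shadow` then scores 1.0 against that entry, because `sudo` is discarded. An exact-match check would have missed this case.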
Optionally, the apparatus further comprises:
A risk alarm module, configured to trigger a sensitive operation risk alarm after the operation behavior to be identified has been identified as a sensitive operation behavior.
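The identification module 13 and the risk alarm module can be sketched together. The sketch makes these assumptions:

- similarity scores have already been computed per sensitive content, for example as a dict;
- the operation counts as sensitive when the best score is strictly greater than the threshold;
- a logging call stands in for the alarm.

`identify_and_alert`, `default_alarm`, and the logger name are hypothetical.

```python
import logging
from typing import Callable, Mapping, Optional

logger = logging.getLogger("sensitive_ops")  # hypothetical logger name


def default_alarm(content: str, score: float) -> None:
    """Hypothetical alarm hook: logs a warning. A deployment might instead notify
    an on-call channel or forward the event to a security monitoring system."""
    logger.warning("sensitive operation risk: matched %r (cosine=%.3f)", content, score)


def identify_and_alert(
    similarities: Mapping[str, float],
    threshold: float,
    alarm: Optional[Callable[[str, float], None]] = None,
) -> Optional[str]:
    """Return the best-matching sensitive content if its cosine similarity is
    strictly greater than the threshold, firing the alarm; otherwise None."""
    if not similarities:
        return None
    # Checking the maximum is equivalent to asking whether any entry exceeds the threshold.
    best_content, best_score = max(similarities.items(), key=lambda kv: kv[1])
    if best_score > threshold:  # strict ">" as in "greater than a preset threshold"
        (alarm or default_alarm)(best_content, best_score)
        return best_content
    return None
```

Passing the alarm as a callable keeps the decision logic separate from how an alarm is delivered.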
Example 3:
Referring to Fig. 3, this embodiment provides a device for identifying sensitive operation behavior, comprising a memory 21 and a processor 22. The memory 21 stores a computer program, and the processor 22 is configured to run the computer program to perform the method for identifying sensitive operation behavior of Example 1.
The memory 21 is connected to the processor 22. The memory 21 may be a flash memory, a read-only memory, or another type of memory, and the processor 22 may be a central processing unit or a microcontroller (single-chip microcomputer).
Example 4:
This embodiment provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method for identifying sensitive operation behavior of Example 1.
Computer-readable storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, computer program modules, or other data. Computer-readable storage media include, but are not limited to, the following:

- RAM (Random Access Memory);
- ROM (Read-Only Memory);
- EEPROM (Electrically Erasable Programmable Read-Only Memory);
- flash memory or other memory technology;
- CD-ROM (Compact Disc Read-Only Memory);
- digital versatile discs (DVD) or other optical disc storage;
- magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices;
- any other medium that can be used to store the desired information and can be accessed by a computer.
In summary, the method, device, and readable storage medium for identifying sensitive operation behavior provided by the embodiments of the invention proceed as follows. They first acquire the operation behavior to be identified. They then calculate the cosine similarity between that operation behavior and the sensitive content in a sensitive feature list formed from the fort machine operation logs. Finally, they identify the operation behavior as a sensitive operation behavior in response to the cosine similarity being greater than a preset sensitive threshold. Comparing operation behaviors with this sensitive content by cosine similarity allows sensitive operation behaviors of abnormal users to be identified more accurately. This helps to ensure the compliance and security of user operations and to prevent data leakage. It also addresses a shortcoming of existing methods, which identify sensitive operation behavior mainly by exact matching. Exact matching easily misses sensitive operations and leaves a considerable risk of data leakage.
It should be understood that the above embodiments are merely exemplary embodiments used to illustrate the principles of the present invention, and the invention is not limited thereto. Those skilled in the art may make various modifications and improvements without departing from the spirit and substance of the invention, and such modifications and improvements are also considered to fall within the scope of the invention.

Claims (10)

1. A method for identifying sensitive operation behavior, the method comprising:
acquiring an operation behavior to be identified;
calculating a cosine similarity between the operation behavior to be identified and sensitive content in a sensitive feature list, wherein the sensitive feature list is formed from a fort machine operation log; and
identifying the operation behavior to be identified as a sensitive operation behavior in response to the cosine similarity being greater than a preset sensitive threshold.
2. The method of claim 1, wherein, before the acquiring of the operation behavior to be identified, the method further comprises:
collecting operation logs of the fort machine;
performing field identification and parsing on the operation logs; and
forming the sensitive feature list from the parsed operation logs.
3. The method of claim 2, wherein the fields comprise a user name, an operation time, an operation instruction, operation content, a source Internet Protocol (IP) address, and a destination IP address.
4. The method of claim 1, wherein the sensitive content in the sensitive feature list comprises at least one of: a sensitive operation, a sensitive path, and a sensitive file.
5. The method of claim 1, wherein, before the calculating of the cosine similarity between the operation behavior to be identified and the sensitive content in the sensitive feature list, the method further comprises:
performing word segmentation on each item of sensitive content in the sensitive feature list to obtain, in turn, the word vector corresponding to each item of sensitive content;
creating, for each item of sensitive content, the word frequency corresponding to each word vector;
collating all word vectors and word frequencies in the sensitive feature list to form a sensitive word frequency knowledge base; and
merging the word vector and word frequencies of each item of sensitive content against the sensitive word frequency knowledge base to form merged word vectors and word frequencies.
6. The method of claim 5, wherein calculating the cosine similarity between the operation behavior to be identified and the sensitive content in the sensitive feature list specifically comprises:
performing word segmentation on the target operation content corresponding to the operation behavior to be identified to obtain a target word vector;
calculating a target word frequency corresponding to the target word vector;
comparing the target word vector and target word frequency of the operation behavior to be identified against the sensitive word frequency knowledge base, and retaining the target word vector and target word frequency that are consistent with the sensitive word frequency knowledge base to form an optimized target word vector and target word frequency; and
calculating the cosine similarity between the optimized target word frequency corresponding to the target word vector and the word frequency of the corresponding word vector in the sensitive word frequency knowledge base.
7. The method of claim 1, wherein, after identifying the operation behavior to be identified as a sensitive operation behavior, the method further comprises:
triggering a sensitive operation risk alarm.
8. An apparatus for identifying sensitive operation behavior, the apparatus comprising:
an operation behavior acquisition module, configured to acquire an operation behavior to be identified;
a cosine similarity calculation module, connected to the operation behavior acquisition module and configured to calculate a cosine similarity between the operation behavior to be identified and sensitive content in a sensitive feature list, wherein the sensitive feature list is formed from a fort machine operation log; and
an operation behavior identification module, connected to the cosine similarity calculation module and configured to identify the operation behavior to be identified as a sensitive operation behavior in response to the cosine similarity being greater than a preset sensitive threshold.
9. An apparatus for identifying sensitive operation behavior, comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to implement the method for identifying sensitive operation behavior according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for identifying sensitive operation behavior according to any one of claims 1 to 7.
CN202410138280.6A 2024-01-31 2024-01-31 Method and device for identifying sensitive operation behaviors and readable storage medium Pending CN117951693A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410138280.6A CN117951693A (en) 2024-01-31 2024-01-31 Method and device for identifying sensitive operation behaviors and readable storage medium

Publications (1)

Publication Number Publication Date
CN117951693A (en) 2024-04-30

Family

ID=90796241

Country Status (1)

Country Link
CN (1) CN117951693A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination