CN113919239A - Intelligent internal threat detection method and system based on space-time feature fusion - Google Patents

Publication number
CN113919239A
Authority
CN
China
Prior art keywords
time
user behavior
user
basic
spatial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111526630.9A
Other languages
Chinese (zh)
Other versions
CN113919239B (en)
Inventor
杨林
李东阳
马琳茹
王晓磊
张洪广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Network Engineering Institute of Systems Engineering Academy of Military Sciences
Original Assignee
Institute of Network Engineering Institute of Systems Engineering Academy of Military Sciences
Application filed by Institute of Network Engineering Institute of Systems Engineering Academy of Military Sciences
Priority to CN202111526630.9A
Publication of CN113919239A
Application granted
Publication of CN113919239B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Software Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides an intelligent internal threat detection method and system based on space-time feature fusion. The method determines the degree of abnormality of user behavior by using the temporal and spatial characteristics of that behavior, and comprises the following steps: step S1, calling a basic feature extraction module to extract basic features of the user behavior and encode them; step S2, calling a time characteristic analysis module to construct a user behavior mixing matrix for training a time characterization model; step S3, calling a spatial characteristic analysis module to obtain the encoded basic features of users belonging to the same role so as to train a spatial common group model; and step S4, calling an anomaly integration analysis module to calculate the degree of abnormality of the user behavior.

Description

Intelligent internal threat detection method and system based on space-time feature fusion
Technical Field
The invention belongs to the field of detection and identification of abnormal behaviors, and particularly relates to an intelligent internal threat detection method and system based on space-time feature fusion.
Background
In recent years, frequent data leakage and spyware incidents have made internal threats an increasingly serious practical challenge in network security. A recent network security survey shows that the number of security incidents caused by insiders has risen by 47% since 2018, and this figure is expected to keep growing as the world economy remains unstable. In contrast to this severe security situation, only 33% of organizations report being able to detect possible abnormal behavior inside their systems. Moreover, because the initiator of an internal threat is an authorized employee familiar with the system architecture and security measures, the damage caused by an internal threat is far greater than that caused by a malicious attack from the outside. Therefore, given the serious challenges posed by internal threats such as system destruction and information leakage, an effective internal threat detection scheme is needed.
The internal threat detection schemes widely adopted at present are mostly based on rule matching or simple machine learning algorithms. Schemes based on rule matching can only detect known internal threats and cannot perceive unknown ones. Schemes based on simple machine learning algorithms depend heavily on the quality of feature engineering and are far inferior to deep learning algorithms in detection accuracy. The specific drawbacks of the latter can be summarized as follows: (1) in feature extraction, most existing user behavior characterization schemes consider only the type information of user behaviors and ignore time information, so the resulting models are incomplete and the improvement of detection performance is limited; (2) in model construction, existing detection schemes consider only how individual behavior changes over time when establishing a behavior comparison baseline and ignore the spatial correlation among the behaviors of peers, so collective behavior changes caused by factors such as service outages or sudden environmental changes are mistakenly identified as abnormal, creating an unnecessary manual review burden.
Disclosure of Invention
The technical problem to be solved by the invention is: given the historical log information of a plurality of behavior domains and the role relationships of the employees of an organization, to judge whether an employee's current behavior is abnormal according to the historical behavior of the employee and of the employee's peers. The difficulty of the problem can be further decomposed into how to extract the spatial and temporal characteristics of the user behavior information, and how to effectively fuse the spatial and temporal analysis results of the user behavior so as to accurately detect abnormal users.
The invention aims to provide an intelligent internal threat detection technology based on spatiotemporal feature fusion, which can effectively utilize time and space characteristics existing in a user behavior log to realize synchronous establishment of a user behavior historical baseline and a peer baseline, and further improve the performance of an internal threat detection scheme.
The invention discloses an intelligent internal threat detection method based on space-time feature fusion in a first aspect. The method determines the abnormal degree of the user behavior by utilizing the time characteristic and the space characteristic of the user behavior, and specifically comprises the following steps:
step S1, calling a basic feature extraction module, extracting basic features of the user behavior from user multi-source log information collected according to a time sequence, encoding the basic features, and taking the encoded basic features as the basis of time characteristic analysis and space characteristic analysis;
step S2, calling a time characteristic analysis module, and constructing a user behavior mixing matrix by splicing the coded basic characteristics and the time measurement indexes of the coded basic characteristics, wherein the user behavior mixing matrix is used for training a time characterization model;
step S3, calling a spatial characteristic analysis module to obtain coded basic characteristics of users belonging to the same role so as to train a spatial common group model;
step S4, invoking an abnormal integration analysis module, and calculating an abnormal degree of the user behavior by using the time sample reconstruction error and the space sample reconstruction error respectively obtained by the time characterization model and the space common group model, where the abnormal degree is used to determine the internal threat.
According to the method of the first aspect of the present invention, in step S1, encoding the basic features includes converting the text-type log information in the user multi-source log information into a numeric frequency feature vector at day granularity.
According to the method of the first aspect of the present invention, in the step S2, a measurement value set of the encoded base features within a window range is obtained according to a sliding window mechanism, and based on the measurement value set, the time metric index of the current-day behavior feature of the user within the window range is calculated by using the following formula:
[Formula (1), published as an image, is not reproduced here.]
[Formula (2), published as an image, is not reproduced here.]
wherein the symbols, which also appear only as images in the source, denote the following: the first time metric index; the second time metric index; the measured frequency values of the basic feature f in the specific time period l on days d, d-T+j+1, and d-T+j, respectively; and the set of historical measured values within the window range. In addition, j is the traversal index over the time window, T is the length of the sliding window, β is the exponentially weighted average coefficient, mean denotes the mean, and std denotes the standard deviation.
According to the method of the first aspect of the present invention, in step S2, the calculated time metric indexes are spliced after the measured values of the encoded basic features to construct the user behavior mixing matrix; the time characterization model is trained based on a deep autoencoder with the user behavior mixing matrix as input, and the training samples of the time characterization model are limited to the personal historical data in the user multi-source log information.
According to the method of the first aspect of the present invention, in step S3, the encoded basic features are clustered by using an organization attribution relationship to form a data set with a role as a unique identifier, the encoded basic features of users belonging to the same role are obtained from the data set, the encoded basic features of users belonging to the same role are used as input, the spatial common group model is trained based on the deep autoencoder, and the training samples of the spatial common group model are historical data of users belonging to the same role.
According to the method of the first aspect of the present invention, in step S4, different balancing coefficients are set according to different application scenarios, and the balancing coefficients are used for calculating the degree of the behavior abnormality of the user.
The invention discloses an intelligent internal threat detection system based on spatiotemporal feature fusion in a second aspect. The system determines the degree of abnormality of the user behavior by using the temporal characteristics and the spatial characteristics of the user behavior, and specifically includes:
the basic feature extraction module is configured to extract basic features of the user behaviors from user multi-source log information collected according to a time sequence, encode the basic features, and use the encoded basic features as the basis of time characteristic analysis and space characteristic analysis;
a temporal characteristic analysis module configured to construct a user behavior mixture matrix by concatenating the encoded base features and temporal metrics of the encoded base features, the user behavior mixture matrix being used to train a temporal characterization model;
a spatial characteristic analysis module configured to obtain encoded base features of users belonging to the same role to train a spatial common group model;
an anomaly integration analysis module configured to calculate an anomaly degree of the user behavior using the temporal sample reconstruction error and the spatial sample reconstruction error obtained by the temporal characterization model and the spatial common group model, respectively, the anomaly degree being used to determine the internal threat.
According to the system of the second aspect of the present invention, the basic feature extraction module is specifically configured to encode the basic feature, including converting the log information of the text type in the user multi-source log information into the frequency feature vector of the numerical value type with the day granularity.
According to the system of the second aspect of the present invention, the temporal characteristic analysis module is specifically configured to obtain a set of measurement values of the encoded base features within a window according to a sliding window mechanism, and based on the set of measurement values, calculate the temporal metric indicator of the current-day behavior feature of the user within the window using the following formula:
[Formula (1), published as an image, is not reproduced here.]
[Formula (2), published as an image, is not reproduced here.]
wherein the symbols, which also appear only as images in the source, denote the following: the first time metric index; the second time metric index; the measured frequency values of the basic feature f in the specific time period l on days d, d-T+j+1, and d-T+j, respectively; and the set of historical measured values within the window range. In addition, j is the traversal index over the time window, T is the length of the sliding window, β is the exponentially weighted average coefficient, mean denotes the mean, and std denotes the standard deviation.
According to the system of the second aspect of the present invention, the temporal characteristic analysis module is specifically configured to concatenate the calculated time metric indexes behind the measured values of the encoded basic features to construct the user behavior mixing matrix, and to train the time characterization model based on a deep autoencoder with the user behavior mixing matrix as input, wherein the training samples of the time characterization model are limited to the personal historical data in the user multi-source log information.
According to the system of the second aspect of the present invention, the spatial characteristic analysis module is specifically configured to cluster the encoded basic features by using an organization attribution relationship to form a data set with a role as a unique identifier, obtain the encoded basic features of users belonging to the same role from the data set, and train the spatial common group model based on a deep autoencoder with those features as input, wherein the training samples of the spatial common group model are the historical data of users belonging to the same role.
According to the system of the second aspect of the present invention, the abnormal integration analysis module is specifically configured to set different balance coefficients according to different application scenarios, where the balance coefficients are used to calculate the degree of the behavior abnormality of the user.
A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, the memory stores a computer program, and the processor implements the steps of the intelligent detection method for internal threats based on spatiotemporal feature fusion according to any one of the first aspect of the present disclosure when executing the computer program.
A fourth aspect of the invention discloses a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program, which when executed by a processor, implements the steps of a spatiotemporal feature fusion-based intelligent detection method for internal threats according to any one of the first aspects of the present disclosure.
Compared with the prior art, the invention has the advantages that: firstly, the invention provides a time and space characteristic extraction method which is simpler in flow and superior in performance, and the calculation and storage cost of an internal threat intelligent detection scheme is reduced; secondly, the invention provides a novel user behavior space-time characteristic fusion scheme, which can realize the synchronous establishment of the historical baseline and the peer baseline in the detection process, thereby improving the detection accuracy of malicious users and providing important reference for the intelligent and real-time detection of internal threats.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description in the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a block diagram of an intelligent detection scheme for internal threats based on spatiotemporal feature fusion, according to an embodiment of the invention;
FIG. 2a is a flowchart of an intelligent detection method for internal threats based on spatiotemporal feature fusion according to an embodiment of the present invention;
FIG. 2b is a diagram of a user behavior mixing matrix according to an embodiment of the invention;
FIG. 2c is a diagram of a sliding window mechanism according to an embodiment of the present invention;
FIG. 3 is a block diagram of an intelligent detection system for internal threats based on spatiotemporal feature fusion according to an embodiment of the invention;
FIG. 4 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a block diagram of an intelligent detection scheme for internal threats based on spatiotemporal feature fusion, according to an embodiment of the invention; as shown in fig. 1, the technical solution adopted by the present invention mainly includes four modules of basic feature extraction, temporal characteristic analysis, spatial characteristic analysis, and anomaly integration analysis.
The basic feature extraction module is mainly responsible for multi-source log collection and feature encoding, and feeds the encoded feature vectors to the time characteristic analysis module and the spatial characteristic analysis module for subsequent processing. After receiving the basic feature vectors, the time characteristic analysis module calculates time-varying features according to the sliding window and the relevant time metric indexes, then splices them with the basic feature vectors to form a user behavior time mixing matrix, which serves as the input of the user's personal model. With the support of user historical data and an unsupervised detection algorithm (such as an autoencoder), a separate historical behavior baseline model can be constructed for each user, and the reconstruction error of a sample to be detected is used as its anomaly score. The spatial characteristic analysis module likewise constructs a separate group model for each user group; the difference is that this model is shared by all members of the same user group, and it is trained in an end-to-end, data-driven manner. Furthermore, because the group model aims to exploit the spatial properties of the user behavior data, the basic feature vectors of all users within the group are taken directly as the model input. The time characterization model and the space characterization model jointly form the behavior comparison baseline for the whole internal threat detection process and can be used to quantify the abnormality of test samples. The anomaly integration analysis module performs weighted integration on the model outputs according to the specific application scenario to obtain the final abnormal instances and a suspicious user list for further review by security analysts.
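The idea of scoring a sample by its reconstruction error can be sketched as follows. This is a minimal illustration, not the patented model: a rank-1 linear projection found by power iteration (the optimum of a one-unit linear autoencoder with squared-error loss) stands in for the deep autoencoder, and the function names and sample data are hypothetical.

```python
import math


def fit_direction(samples, iterations=100):
    """Fit a rank-1 linear 'encoder': the mean and top principal direction."""
    n, dim = len(samples), len(samples[0])
    center = [sum(s[i] for s in samples) / n for i in range(dim)]
    centered = [[s[i] - center[i] for i in range(dim)] for s in samples]
    w = [1.0] * dim  # deterministic starting direction
    for _ in range(iterations):
        # Power iteration on the covariance: w <- sum_x (x . w) x, normalised.
        proj = [sum(x[i] * w[i] for i in range(dim)) for x in centered]
        w = [sum(p * x[i] for p, x in zip(proj, centered)) for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in w)) or 1.0
        w = [v / norm for v in w]
    return center, w


def reconstruction_error(sample, model):
    """Encode to one number, decode back, and return the squared error."""
    center, w = model
    x = [sample[i] - center[i] for i in range(len(w))]
    code = sum(xi * wi for xi, wi in zip(x, w))
    return sum((xi - code * wi) ** 2 for xi, wi in zip(x, w))


history = [[1, 2], [2, 4], [3, 6], [4, 8]]  # hypothetical normal samples
model = fit_direction(history)
typical_error = reconstruction_error([5, 10], model)   # follows the learned pattern
unusual_error = reconstruction_error([4, 2], model)    # breaks the learned pattern
print(typical_error < unusual_error)
```

A sample that follows the pattern learned from the training history is reconstructed almost exactly, while a sample that departs from it leaves a large residual, which is what makes the error usable as an anomaly score.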
The invention discloses an intelligent internal threat detection method based on spatiotemporal feature fusion, which utilizes the temporal features and the spatial features of a user to determine the abnormal degree of the user behavior. FIG. 2a is a flowchart of an intelligent detection method for internal threats based on spatiotemporal feature fusion according to an embodiment of the present invention; as shown in fig. 2a, the method specifically includes:
step S1, calling a basic feature extraction module, extracting basic features of the user behavior from user multi-source log information collected according to a time sequence, encoding the basic features, and taking the encoded basic features as the basis of time characteristic analysis and space characteristic analysis;
step S2, calling a time characteristic analysis module, and constructing a user behavior mixing matrix by splicing the coded basic characteristics and the time measurement indexes of the coded basic characteristics, wherein the user behavior mixing matrix is used for training a time characterization model;
step S3, calling a spatial characteristic analysis module to obtain coded basic characteristics of users belonging to the same role so as to train a spatial common group model;
step S4, invoking an abnormal integration analysis module, and calculating an abnormal degree of the user behavior by using the time sample reconstruction error and the space sample reconstruction error respectively obtained by the time characterization model and the space common group model, where the abnormal degree is used to determine the internal threat.
In step S1, a basic feature extraction module is invoked to extract basic features of the user behavior from the user multi-source log information collected in time sequence, and encode the basic features, where the encoded basic features are used as a basis for temporal characteristic analysis and spatial characteristic analysis.
In some embodiments, in the step S1, encoding the basic features includes converting the text-type log information in the user multi-source log information into a numeric frequency feature vector at day granularity.
Specifically, the multi-source log information of the users is collected and sorted in time order, and the basic features are extracted and encoded; that is, the text-type log information is converted, at day granularity, into a numeric frequency feature vector x, which is then fed to the time and spatial characteristic analysis modules.
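As a rough illustration of this encoding step, the sketch below counts how often each activity type occurs per user per day. The record layout, the activity names, and the `encode_daily` helper are assumptions made for the example; the source does not specify a log schema.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical log records: (user, ISO timestamp, activity type).
LOGS = [
    ("alice", "2021-03-01T09:15:00", "logon"),
    ("alice", "2021-03-01T10:02:00", "file_copy"),
    ("alice", "2021-03-01T22:40:00", "file_copy"),
    ("alice", "2021-03-02T08:55:00", "logon"),
    ("bob", "2021-03-01T09:30:00", "email_send"),
]

# Fixed feature order so every vector has the same layout (assumed names).
FEATURES = ["logon", "file_copy", "email_send"]


def encode_daily(logs, features=FEATURES):
    """Map (user, day) to a list of per-day activity counts."""
    counts = defaultdict(Counter)
    # Sort by timestamp so records are processed in time order.
    for user, timestamp, activity in sorted(logs, key=lambda r: r[1]):
        day = datetime.fromisoformat(timestamp).date().isoformat()
        counts[(user, day)][activity] += 1
    return {key: [c[f] for f in features] for key, c in counts.items()}


vectors = encode_daily(LOGS)
print(vectors[("alice", "2021-03-01")])  # [1, 2, 0]
```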
In step S2, a temporal characteristic analysis module is invoked to construct a user behavior mixture matrix by concatenating the encoded base features and the temporal metrics of the encoded base features, the user behavior mixture matrix being used to train a temporal characterization model to establish a personal behavior history baseline.
In some embodiments, in the step S2, a measurement value set of the encoded basic features within a window range is obtained according to a sliding window mechanism, and based on the measurement value set, the time metric index of the current-day behavior feature of the user within the window range is calculated by using the following formula:
[Formula (1), published as an image, is not reproduced here.]
[Formula (2), published as an image, is not reproduced here.]
wherein the symbols, which also appear only as images in the source, denote the following: the first time metric index; the second time metric index; the measured frequency values of the basic feature f in the specific time period l on days d, d-T+j+1, and d-T+j, respectively; and the set of historical measured values within the window range. In addition, j is the traversal index over the time window, T is the length of the sliding window, β is the exponentially weighted average coefficient, mean denotes the mean, and std denotes the standard deviation.
In some embodiments, in the step S2, the calculated time metric indexes are spliced after the measured values of the encoded basic features to construct the user behavior mixing matrix, the time characterization model is trained based on a deep autoencoder with the user behavior mixing matrix as input, and the training samples of the time characterization model are limited to the personal historical data in the user multi-source log information.
Specifically, the time characteristic analysis module obtains the set of historical measured values of all basic features within the current window range according to the sliding window mechanism, and then calculates the time metric indexes of the current day's behavior features within the sliding window according to formula (1) and formula (2), respectively. Once the time metric indexes have been calculated, they are spliced with the original basic feature measured values to form the user behavior mixing matrix, and a standardization operation along the time dimension eliminates the negative effects of differing orders of magnitude. The matrix is then expanded into vector form and input into an anomaly detection model with a deep autoencoder at its core, which is trained to obtain the final time characterization model. It should be noted that during training, the training samples are limited to each user's personal historical data, and the time characteristic analysis module constructs a separate historical baseline model for each user.
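The sliding-window calculation can be sketched as follows. Because the formula images are not available in this text, both metric definitions below are assumptions inferred only from the listed symbols (mean, std, β, and the measurements on days d, d-T+j+1, and d-T+j): the first is taken to be a deviation from the window mean scaled by the window standard deviation, and the second an exponentially weighted average of day-to-day changes. The function names and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def time_metrics(history, today, beta=0.9):
    """Return two assumed time metrics for one feature in one time period.

    history: measurements for days d-T .. d-1 (length T, oldest first).
    today:   measurement for day d.
    """
    # Assumed metric 1: deviation of today's value from the window mean,
    # scaled by the window standard deviation (0.0 if the window is flat).
    spread = pstdev(history)
    deviation = (today - mean(history)) / spread if spread > 0 else 0.0

    # Assumed metric 2: exponentially weighted average of day-to-day changes
    # x[d-T+j+1] - x[d-T+j] for j = 0 .. T-1, weighting recent days more.
    series = list(history) + [today]
    trend = 0.0
    for j in range(len(history)):
        change = series[j + 1] - series[j]
        trend = beta * trend + (1 - beta) * change
    return deviation, trend


def sliding_metrics(values, window, beta=0.9):
    """Apply time_metrics to every day that has a full window of history."""
    return [
        time_metrics(values[d - window:d], values[d], beta)
        for d in range(window, len(values))
    ]


daily_file_copies = [2, 3, 2, 3, 2, 3, 20]  # hypothetical counts; last day spikes
for day, (deviation, trend) in enumerate(sliding_metrics(daily_file_copies, 4), start=4):
    print(day, round(deviation, 2), round(trend, 2))
```

Under these assumed definitions, the spike on the last day produces a far larger deviation and trend than the preceding days, which is the kind of time-varying signal the mixing matrix is meant to carry.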
FIG. 2b is a diagram of a user behavior mixing matrix (hybrid feature matrix structure) according to an embodiment of the present invention; FIG. 2c is a diagram of a sliding window mechanism according to an embodiment of the present invention. The user behavior mixing matrix mainly comprises two parts, the original frequency features and the time-varying features, each of which can be further divided into two sub-parts, working time and rest time, according to when the behavior occurs. In FIG. 2b, the basic features extracted from the user's different behavior domains are arranged in the vertical direction. The horizontal direction represents not the date but the different variants of those basic features. The white area holds the user's normalized original feature information during working time (8 am to 6 pm) and rest time (6 pm to 8 am the next day), and the gray area is filled with the values of the basic features transformed by the time metric indexes within the sliding window (FIG. 2c). The arrangement of the components is not important, since the mixing matrix is expanded into vector form before being input to the time characterization model.
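One way to picture the layout described above is the following sketch, which splits events into working and rest time and lays out one day's matrix with a row per basic feature and a column per variant. The column order, the dictionary layout, and the helper names are arbitrary choices for illustration, and the standardization step is omitted.

```python
from datetime import datetime

WORK_START, WORK_END = 8, 18  # working time 8 am to 6 pm, as described above


def period_of(timestamp):
    """Classify an ISO timestamp as 'work' or 'rest' by hour of day."""
    hour = datetime.fromisoformat(timestamp).hour
    return "work" if WORK_START <= hour < WORK_END else "rest"


def mixing_matrix(raw, metric1, metric2):
    """Build one day's matrix: one row per feature, one column per variant.

    Each argument maps feature -> {"work": value, "rest": value}.
    Column order (an arbitrary choice): raw, metric 1, metric 2, each split
    into work and rest.
    """
    rows = []
    for feature in sorted(raw):
        row = []
        for block in (raw, metric1, metric2):
            row += [block[feature]["work"], block[feature]["rest"]]
        rows.append(row)
    return rows


def flatten(matrix):
    """Unroll the matrix row by row into the model's input vector."""
    return [value for row in matrix for value in row]


# Hypothetical values for two features on one day.
raw = {"logon": {"work": 1, "rest": 0}, "file_copy": {"work": 1, "rest": 1}}
m1 = {"logon": {"work": 0.0, "rest": 0.0}, "file_copy": {"work": -0.5, "rest": 2.0}}
m2 = {"logon": {"work": 0.0, "rest": 0.0}, "file_copy": {"work": 0.1, "rest": 0.4}}
matrix = mixing_matrix(raw, m1, m2)
vector = flatten(matrix)
print(len(matrix), len(matrix[0]), len(vector))  # 2 6 12
```

Because the matrix is flattened before it reaches the model, any consistent column order would serve equally well.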
In step S3, a spatial characteristic analysis module is invoked to obtain the encoded basic features of users belonging to the same role in order to train a spatial common group model, which is used to establish a baseline of peer group behavior.
In some embodiments, in step S3, the encoded basic features are clustered according to the organizational attribution relationship to form data sets uniquely identified by role; the encoded basic features of users belonging to the same role are obtained from the data set and used as input to train the spatial common group model based on the deep autoencoder; and the training samples of the spatial common group model are the historical data of the users belonging to the same role.
Specifically, in the spatial characteristic analysis module, the users' basic feature data is classified and aggregated according to the organizational attribution relationship to form data sets uniquely identified by role. The module then constructs a common group model for all members of the same role group. The spatial model is likewise based on an autoencoder detection algorithm, but its training samples are extended to the historical data of all users in the group. In addition, the input used to train the group model is not the mixing matrix containing the time metric indexes but the plain basic feature vector.
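The role-based aggregation can be illustrated with a short Python sketch. The user names, role names, and feature values are invented for illustration, and `group_by_role` is a hypothetical helper rather than a function named in this disclosure. It collects the plain basic feature vectors of all users sharing a role into one training set per role.

```python
from collections import defaultdict


def group_by_role(samples, role_of):
    """Collect basic feature vectors into one data set per role.

    samples -> iterable of (user, basic_feature_vector)
    role_of -> mapping from user to role (organizational attribution)
    """
    datasets = defaultdict(list)
    for user, vector in samples:
        datasets[role_of[user]].append(vector)
    return dict(datasets)


# Made-up organizational attribution and daily feature vectors.
role_of = {"alice": "engineer", "bob": "engineer", "carol": "hr"}
samples = [
    ("alice", [0.8, 0.1]),
    ("bob", [0.7, 0.2]),
    ("carol", [0.1, 0.0]),
    ("alice", [0.9, 0.0]),
]
by_role = group_by_role(samples, role_of)
```

Each value of `by_role` would then serve as the training set of one group model, so that a user is compared against peers holding the same role.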
In step S4, an abnormal integration analysis module is invoked to calculate an abnormal degree of the user behavior by using the temporal sample reconstruction error and the spatial sample reconstruction error respectively obtained from the temporal characterization model and the spatial common group model, where the abnormal degree is used to determine the internal threat.
In some embodiments, in step S4, different balance coefficients are set for different application scenarios, and the balance coefficients are used to calculate the degree of abnormality of the user behavior.
Specifically, after the temporal and spatial characterization models are trained, their reconstruction errors for the sample under test are fed as outputs to the anomaly integration analysis module. The module then sets a reasonable balance coefficient according to the specific application scenario (here, 0.3) and performs a weighted integration of the model outputs to generate the final anomaly score of the sample under test. Ranking the user behavior instances by anomaly score then yields the final list of abnormal instances and suspicious users, which security analysts can inspect at a finer granularity.
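A minimal Python sketch of such a weighted integration follows. The text gives the balance coefficient (0.3) but not the exact combination rule, so the convex combination below, the choice to apply the coefficient to the temporal error, and the assumption that both errors are on comparable scales are illustrative assumptions; all names and values are hypothetical.

```python
def fuse_scores(temporal_err, spatial_err, balance=0.3):
    """Combine the two reconstruction errors into one anomaly score.

    Assumption: `balance` weights the temporal error and (1 - balance)
    weights the spatial error. The source text does not state this.
    """
    return {
        key: balance * temporal_err[key] + (1.0 - balance) * spatial_err[key]
        for key in temporal_err
    }


def rank_instances(scores):
    """Return the instance keys ordered from most to least anomalous."""
    return sorted(scores, key=scores.get, reverse=True)


# Made-up reconstruction errors keyed by (user, day).
temporal_err = {
    ("alice", "2021-03-01"): 0.10,
    ("bob", "2021-03-01"): 0.90,
    ("carol", "2021-03-01"): 0.20,
}
spatial_err = {
    ("alice", "2021-03-01"): 0.20,
    ("bob", "2021-03-01"): 0.40,
    ("carol", "2021-03-01"): 0.80,
}
scores = fuse_scores(temporal_err, spatial_err)
ranking = rank_instances(scores)
```

With these made-up numbers the instance with the largest peer deviation ranks first, because a coefficient of 0.3 under this assumed rule gives the spatial error the larger weight.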
A second aspect of the invention discloses an intelligent internal threat detection system based on spatiotemporal feature fusion, which uses the temporal characteristics and the spatial characteristics of user behavior to determine the degree of abnormality of the user behavior. Fig. 3 is a block diagram of an intelligent internal threat detection system based on spatiotemporal feature fusion according to an embodiment of the invention; as shown in fig. 3, the system 300 specifically includes:
a basic feature extraction module 301, configured to extract basic features of the user behavior from user multi-source log information collected in time sequence, and encode the basic features, where the encoded basic features are used as a basis for time characteristic analysis and spatial characteristic analysis;
a temporal characteristic analysis module 302 configured to construct a user behavior mixture matrix by concatenating the encoded base features and the temporal metrics of the encoded base features, the user behavior mixture matrix being used to train a temporal characterization model;
a spatial characteristics analysis module 303 configured to obtain encoded base features of users belonging to the same role to train a spatial common group model;
an anomaly integration analysis module 304 configured to calculate an anomaly degree of the user behavior by using the temporal sample reconstruction error and the spatial sample reconstruction error respectively obtained by the temporal characterization model and the spatial common group model, the anomaly degree being used for determining the internal threat.
According to the system of the second aspect of the present invention, the basic feature extraction module 301 is specifically configured to encode the basic features, which includes converting the text-type log information in the user multi-source log information into numerical frequency feature vectors at day granularity.
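As an illustration of this encoding, the following Python sketch counts text log events per user, day, activity, and period. The event tuple layout and the activity names are hypothetical. The 8 a.m. and 6 p.m. boundaries follow the description of fig. 2b, and assigning events before 8 a.m. to the previous day's rest period is one interpretation of "6 p.m. to 8 a.m. the next day", not something the text states outright.

```python
from collections import Counter
from datetime import datetime, timedelta


def period_and_day(ts):
    """Map a timestamp to (behavior day, 'work' or 'rest')."""
    if 8 <= ts.hour < 18:
        return ts.date(), "work"
    if ts.hour < 8:
        # Assumed: early-morning events close the previous day's rest period.
        return ts.date() - timedelta(days=1), "rest"
    return ts.date(), "rest"


def encode_daily_counts(events):
    """Turn (user, ISO timestamp, activity) records into daily frequencies."""
    counts = Counter()
    for user, stamp, activity in events:
        day, period = period_and_day(datetime.fromisoformat(stamp))
        counts[(user, day.isoformat(), activity, period)] += 1
    return counts


# Made-up log records.
events = [
    ("alice", "2021-03-01T09:15:00", "logon"),
    ("alice", "2021-03-01T17:59:00", "logon"),
    ("alice", "2021-03-01T18:00:00", "usb"),
    ("alice", "2021-03-02T02:30:00", "usb"),
]
counts = encode_daily_counts(events)
```

The resulting counts would still need to be normalized and laid out in a fixed feature order before they form the numerical vector described above.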
According to the system of the second aspect of the present invention, the temporal characteristic analysis module 302 is specifically configured to obtain a measurement value set of the encoded base features within a window according to a sliding window mechanism, and based on the measurement value set, calculate the temporal metric index of the current-day behavior feature of the user within the window by using the following formula:
[Formula (1): published as an image; not reproduced in this text]
[Formula (2): published as an image; not reproduced in this text]
wherein the quantities computed by formulas (1) and (2) are the first time metric index and the second time metric index, respectively; the three measurement terms are the measured frequencies of the basic feature f in the specific time period l on days d, d-T+j+1, and d-T+j, respectively; j is the traversal index over the time window; the history term is the set of historical measurements of the day-d measurement within the window range; T is the length of the sliding window; β is the exponentially weighted average coefficient; mean denotes the mean value; and std denotes the standard deviation.
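Because formulas (1) and (2) are not reproduced in this text, the Python sketch below does not implement them. It only shows one plausible pair of sliding-window metrics built from the quantities named in the definitions (mean, std, β, and measurements on consecutive days). The standardized deviation, the exponentially weighted average of day-over-day changes, and every function name are assumptions made for illustration.

```python
from statistics import mean, pstdev


def deviation_metric(history, today):
    """Assumed form: how far today's value lies from the window mean,
    in units of the window's standard deviation."""
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0
    return (today - mean(history)) / sigma


def trend_metric(window, beta=0.9):
    """Assumed form: exponentially weighted average of day-over-day
    changes, with `window` ordered from oldest to newest measurement."""
    ewa = 0.0
    for previous, current in zip(window, window[1:]):
        ewa = beta * ewa + (1.0 - beta) * (current - previous)
    return ewa


# Made-up measurements of one basic feature over a sliding window.
history = [2, 4, 4, 4, 5, 5, 7, 9]
z = deviation_metric(history, today=9)
trend = trend_metric([1.0, 2.0, 4.0], beta=0.5)
```

Under these assumptions the first value flags a sudden departure from the user's own recent level, while the second responds to a sustained drift across the window.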
According to the system of the second aspect of the present invention, the temporal characteristic analysis module 302 is specifically configured to concatenate the calculated time metric indexes after the measured values of the encoded basic features to construct the user behavior mixing matrix, and to train the temporal characterization model based on the deep autoencoder with the user behavior mixing matrix as input, where the training samples of the temporal characterization model are limited to the personal historical data in the user multi-source log information.
According to the system of the second aspect of the present invention, the spatial characteristic analysis module 303 is specifically configured to cluster the encoded basic features according to the organizational attribution relationship to form data sets uniquely identified by role, to obtain the encoded basic features of users belonging to the same role from the data set, and to train the spatial common group model based on the deep autoencoder with the encoded basic features of users belonging to the same role as input, where the training samples of the spatial common group model are the historical data of the users belonging to the same role.
According to the system of the second aspect of the present invention, the anomaly integration analysis module 304 is specifically configured to set different balance coefficients for different application scenarios, where the balance coefficients are used to calculate the degree of abnormality of the user behavior.
A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, the memory stores a computer program, and the processor implements the steps of the intelligent detection method for internal threats based on spatiotemporal feature fusion according to any one of the first aspect of the present disclosure when executing the computer program.
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 4, the electronic device includes a processor, a memory, a communication interface, a display screen, and an input device, which are connected by a system bus. The processor provides computing and control capabilities. The memory comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface is used for wired or wireless communication with an external terminal; the wireless communication can be realized through Wi-Fi, an operator network, Near Field Communication (NFC), or other technologies. The display screen can be a liquid crystal display or an electronic ink display, and the input device can be a touch layer covering the display screen; a key, trackball, or touch pad arranged on the housing of the electronic device; or an external keyboard, touch pad, or mouse.
It will be understood by those skilled in the art that the structure shown in fig. 4 is only a block diagram of the part related to the technical solution of the present disclosure and does not limit the electronic device to which the solution of the present application is applied; a specific electronic device may include more or fewer components than those shown in the drawing, combine some components, or have a different arrangement of components.
A fourth aspect of the invention discloses a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the intelligent internal threat detection method based on spatiotemporal feature fusion according to any one of the first aspect of the present disclosure.
Compared with the prior art, the invention has the advantages that: firstly, the invention provides a time and space characteristic extraction method which is simpler in flow and superior in performance, and the calculation and storage cost of an internal threat intelligent detection scheme is reduced; secondly, the invention provides a novel user behavior space-time characteristic fusion scheme, which can realize the synchronous establishment of the historical baseline and the peer baseline in the detection process, thereby improving the detection accuracy of malicious users and providing important reference for the intelligent and real-time detection of internal threats.
It should be noted that the technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination that involves no contradiction should be considered within the scope of this description. The above embodiments express only several implementations of the present application, and although they are described in specific detail, they are not to be construed as limiting the scope of the invention. A person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (9)

1. An intelligent internal threat detection method based on spatio-temporal feature fusion is characterized in that the method determines the abnormal degree of user behaviors by utilizing the temporal characteristics and the spatial characteristics of the user behaviors, and specifically comprises the following steps:
step S1, calling a basic feature extraction module, extracting basic features of the user behavior from user multi-source log information collected according to a time sequence, encoding the basic features, and taking the encoded basic features as the basis of time characteristic analysis and space characteristic analysis;
step S2, calling a time characteristic analysis module, and constructing a user behavior mixing matrix by splicing the coded basic characteristics and the time measurement indexes of the coded basic characteristics, wherein the user behavior mixing matrix is used for training a time characterization model;
step S3, calling a spatial characteristic analysis module to obtain coded basic characteristics of users belonging to the same role so as to train a spatial common group model;
step S4, invoking an abnormal integration analysis module, and calculating an abnormal degree of the user behavior by using the time sample reconstruction error and the space sample reconstruction error respectively obtained by the time characterization model and the space common group model, where the abnormal degree is used to determine the internal threat.
2. The method for intelligently detecting internal threats according to claim 1, wherein in the step S1, encoding the basic features includes converting text-type log information in the user multi-source log information into numerical frequency feature vectors at day granularity.
3. The method for intelligent detection of internal threats according to claim 2, wherein in the step S2, a measured value set of the encoded base features within a window range is obtained according to a sliding window mechanism, and based on the measured value set, the time metric index of the current-day behavior feature of the user within the window range is calculated by using the following formula:
[Formula (1): published as an image; not reproduced in this text]
[Formula (2): published as an image; not reproduced in this text]
wherein the quantities computed by formulas (1) and (2) are the first time metric index and the second time metric index, respectively; the three measurement terms are the measured frequencies of the basic feature f in the specific time period l on days d, d-T+j+1, and d-T+j, respectively; j is the traversal index over the time window; the history term is the set of historical measurements of the day-d measurement within the window range; T is the length of the sliding window; β is the exponentially weighted average coefficient; mean denotes the mean value; and std denotes the standard deviation.
4. The method for intelligent detection of internal threats according to claim 2, wherein in the step S2, the calculated time metric indexes are concatenated after the measured values of the encoded basic features to construct the user behavior mixture matrix, the time characterization model is trained based on a deep autoencoder with the user behavior mixture matrix as input, and the training samples of the time characterization model are limited to personal historical data in the user multi-source log information.
5. The method according to claim 4, wherein in step S3, the encoded basic features are clustered according to an organizational attribution relationship to form data sets uniquely identified by role, the encoded basic features of the users belonging to the same role are obtained from the data set and used as input, the spatial common group model is trained based on the deep autoencoder, and the training samples of the spatial common group model are historical data of the users belonging to the same role.
6. The method for intelligently detecting internal threats according to claim 5, wherein in the step S4, different balance coefficients are set for different application scenarios, and the balance coefficients are used for calculating the degree of abnormality of the user behavior.
7. An intelligent internal threat detection system based on spatiotemporal feature fusion, characterized in that the system determines the degree of abnormality of user behaviors by using the temporal characteristics and the spatial characteristics of the user behaviors, and specifically comprises:
the basic feature extraction module is configured to extract basic features of the user behaviors from user multi-source log information collected according to a time sequence, encode the basic features, and use the encoded basic features as the basis of time characteristic analysis and space characteristic analysis;
a temporal characteristic analysis module configured to construct a user behavior mixture matrix by concatenating the encoded base features and temporal metrics of the encoded base features, the user behavior mixture matrix being used to train a temporal characterization model;
a spatial characteristic analysis module configured to obtain encoded base features of users belonging to the same role to train a spatial common group model;
an anomaly integration analysis module configured to calculate an anomaly degree of the user behavior using the temporal sample reconstruction error and the spatial sample reconstruction error obtained by the temporal characterization model and the spatial common group model, respectively, the anomaly degree being used to determine the internal threat.
8. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the intelligent detection method for internal threats based on spatiotemporal feature fusion according to any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the steps of the intelligent detection method for internal threats based on spatiotemporal feature fusion according to any one of claims 1 to 6.
CN202111526630.9A 2021-12-15 2021-12-15 Intelligent internal threat detection method and system based on space-time feature fusion Active CN113919239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111526630.9A CN113919239B (en) 2021-12-15 2021-12-15 Intelligent internal threat detection method and system based on space-time feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111526630.9A CN113919239B (en) 2021-12-15 2021-12-15 Intelligent internal threat detection method and system based on space-time feature fusion

Publications (2)

Publication Number Publication Date
CN113919239A true CN113919239A (en) 2022-01-11
CN113919239B CN113919239B (en) 2022-02-11

Family

ID=79249198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111526630.9A Active CN113919239B (en) 2021-12-15 2021-12-15 Intelligent internal threat detection method and system based on space-time feature fusion

Country Status (1)

Country Link
CN (1) CN113919239B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040015460A1 (en) * 1998-11-17 2004-01-22 Bernard Alhadef Controlled capacity modeling tool
CN113474776A (en) * 2018-12-19 2021-10-01 非典型安全公司 Threat detection platform for real-time detection, characterization, and remediation of email-based threats
CN110737890A (en) * 2019-10-25 2020-01-31 中国科学院信息工程研究所 internal threat detection system and method based on heterogeneous time sequence event embedding learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114553497A (en) * 2022-01-28 2022-05-27 中国科学院信息工程研究所 Internal threat detection method based on feature fusion
CN114553497B (en) * 2022-01-28 2022-11-15 中国科学院信息工程研究所 Internal threat detection method based on feature fusion

Also Published As

Publication number Publication date
CN113919239B (en) 2022-02-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant