CN116542224A - Office document anomaly detection method and device and readable storage medium - Google Patents


Info

Publication number
CN116542224A
Authority
CN
China
Prior art keywords
operation code
document
system call
office document
code sequence
Prior art date
Legal status
Pending
Application number
CN202310801935.9A
Other languages
Chinese (zh)
Inventor
孙才俊
孙天宁
白冰
张兴明
张音捷
王之宇
徐昊天
张奕鹏
杨钢
夏俊伟
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202310801935.9A
Publication of CN116542224A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G06F40/157 Transformation using dictionaries or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The application relates to an Office document anomaly detection method and device and a readable storage medium. The method comprises: acquiring an operation code sequence based on system call information generated when an Office document to be detected runs; converting the operation code sequence into a gray scale map based on a pre-acquired data dictionary; and inputting the gray scale map into a trained document anomaly detection model to determine whether the Office document to be detected is abnormal. Because the operation information is extracted and integrated before being input into the document anomaly detection model, which then concludes whether the Office document poses an anomaly-related security risk, information loss caused by the limited input length of the model is avoided, and the problem of low accuracy in detecting Office document anomalies with neural network models in the related art is solved.

Description

Office document anomaly detection method and device and readable storage medium
Technical Field
The present invention relates to the field of information security technologies, and in particular, to an Office document anomaly detection method and apparatus and a readable storage medium.
Background
In recent years, malware delivered through Office documents, such as common ransomware and PowerShell-based malware, has grown steadily, because such malicious code can be embedded in Office documents. Judging from APT attack incidents in recent years, using malicious Office documents (Word, Excel, PPT, etc.) to carry out intrusions has become one of the most common attack methods. An attacker usually uses a malicious Office document as bait to induce staff in an enterprise or institution to trigger the attack code it contains, thereby achieving intrusion. This poses a significant risk to enterprises and institutions and also threatens national security in many ways. After compromising a computer inside an enterprise or institution, an attacker generally uses it as a springboard for lateral movement, attacking other weakly protected facilities on the network, stealing confidential data, or destroying critical infrastructure. To address this problem, the related art trains a multi-layer convolutional neural network model and uses it to detect whether an Office document is abnormal. However, this approach imposes strict requirements on the length of the input data: input that is too long must be truncated, which loses information and reduces the accuracy of Office document anomaly detection.
No effective solution has yet been proposed for the low accuracy of detecting Office document anomalies with neural network models in the related art.
Disclosure of Invention
In this embodiment, a method, an apparatus, and a readable storage medium for detecting Office document abnormality are provided, so as to solve the problem in the related art that the accuracy of detecting Office document abnormality by using a neural network model is low.
In a first aspect, this embodiment provides an Office document anomaly detection method, including:
acquiring an operation code sequence based on system call information generated when an Office document to be detected runs;
converting the operation code sequence into a gray scale map based on a pre-acquired data dictionary;
and inputting the gray scale map into a trained document anomaly detection model to determine whether the Office document to be detected is abnormal.
In some of these embodiments, the converting the operation code sequence into a gray scale map based on a pre-acquired data dictionary includes:
converting the operation code sequence into corresponding feature vectors based on the data dictionary;
and generating a corresponding gray scale map based on the values of the feature vectors.
In some of these embodiments, the generating the corresponding gray map based on the values of the feature vector includes:
acquiring a gray value of a corresponding pixel point based on the value of each element in the feature vector;
and generating a gray scale map of preset width, filled from left to right and from bottom to top, based on the gray value of each pixel point and the order of the elements in the feature vector.
In some of these embodiments, the converting the operation code sequence into the corresponding feature vector based on the data dictionary includes:
querying codes corresponding to the operation codes in the data dictionary based on the operation codes in the operation code sequence;
when the corresponding code is found, using the code as the feature vector element value corresponding to the operation code;
deleting the operation code when no corresponding code is found;
and arranging the corresponding feature vector element values according to the order of the operation codes in the operation code sequence to generate the feature vector.
In some embodiments, the system call information includes a log generation time, a system call operation code and a thread number, and the acquiring the operation code sequence based on the system call information generated when the Office document to be detected runs includes:
grouping the system call information based on the thread number;
sorting the system call information in each group based on the log generation time;
and arranging the corresponding system call operation codes according to that ordering to generate the operation code sequence of each group.
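The grouping and ordering steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `SyscallRecord` type, its field names, and the use of a numeric log time are hypothetical choices made here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SyscallRecord:
    """One system call log entry (type and field names are hypothetical)."""
    log_time: float  # log generation time, assumed numeric (e.g. seconds)
    opcode: str      # system call operation code, e.g. "ReadFile"
    thread_id: int   # thread number


def opcode_sequences_by_thread(records):
    """Group records by thread number, sort each group by log time,
    and return {thread_id: [opcode, ...]} in chronological order."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.thread_id].append(rec)
    # sorted() is stable, so records with equal timestamps keep input order
    return {
        tid: [r.opcode for r in sorted(recs, key=lambda r: r.log_time)]
        for tid, recs in groups.items()
    }
```

For example, two out-of-order records on thread 7 and one on thread 9 yield `{7: ["RegOpenKey", "RegQueryValue"], 9: ["ReadFile"]}`.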
In some of these embodiments, the document anomaly detection model is trained based on the following:
acquiring a corresponding operation code sequence sample set based on a pre-acquired system call information sample set;
converting the operation code sequence sample set into a corresponding picture sample set based on the data dictionary;
standardizing the picture samples in the picture sample set to a standard size, taken as the size of the picture with the largest height in the picture sample set;
and inputting the standardized picture samples into an initial model for training to obtain the document anomaly detection model.
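A minimal sketch of the standardization step, assuming every picture sample is a NumPy array of shape (height, 256) and that shorter samples are padded with gray value 0 up to the largest height in the set. The source does not say where padding rows go; placing them at the top is an assumption made here because rows are filled from the bottom. The function name `standardize_heights` is hypothetical.

```python
import numpy as np


def standardize_heights(images, fill_value=0):
    """Pad every (H, W) grayscale image to the largest H in the set.

    Assumptions: all images share the same width, and padding rows are
    added at the top with the given gray value (0 by default).
    """
    target_h = max(img.shape[0] for img in images)
    padded = []
    for img in images:
        pad_rows = target_h - img.shape[0]
        # ((rows before, rows after), (cols before, cols after))
        padded.append(
            np.pad(img, ((pad_rows, 0), (0, 0)), constant_values=fill_value)
        )
    return np.stack(padded)  # shape: (num_images, target_h, W)
```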
In some of these embodiments, the data dictionary is obtained based on the following:
acquiring a corresponding operation code sequence sample set based on a pre-acquired system call information sample set;
and encoding the operation codes in the operation code sequence sample set to generate the data dictionary.
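The encoding step leaves the numbering rule open. One possible rule, sketched below purely as an assumption, numbers the distinct operation codes of the sample set in descending order of frequency (ties broken alphabetically) and starts at 1 so that code 0 stays free for padding. `build_data_dictionary` is a hypothetical name.

```python
from collections import Counter


def build_data_dictionary(opcode_sequences):
    """Assign an integer code to every distinct opcode in the sample set.

    Assumed rule (not from the source): codes 1..K by descending
    frequency, ties broken alphabetically; 0 is left unused for padding.
    """
    counts = Counter(op for seq in opcode_sequences for op in seq)
    ranked = sorted(counts, key=lambda op: (-counts[op], op))
    return {op: code for code, op in enumerate(ranked, start=1)}
```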
In some embodiments, before acquiring the operation code sequence based on the system call information generated when the Office document to be detected runs, the method further includes:
operating the Office document to be detected in a pre-created sandbox environment to acquire corresponding log information;
extracting process information corresponding to the Office document to be detected in the log information;
and acquiring the system call information based on the process information.
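As an illustration of extracting the target process's system calls from the sandbox log, the sketch below filters a CSV export in the style of Process Monitor. The column names ("Time of Day", "Process Name", "TID", "Operation"), the assumption that a TID column was enabled before export, and filtering by process name alone (ignoring child processes) are all assumptions, not details from the source; `extract_process_syscalls` is a hypothetical name.

```python
import csv
import io


def extract_process_syscalls(csv_text, process_names):
    """Keep only log rows produced by the Office process(es) under test.

    Assumes a Process Monitor-style CSV with "Time of Day",
    "Process Name", "TID" and "Operation" columns (illustrative).
    """
    wanted = {name.lower() for name in process_names}
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "log_time": row["Time of Day"],
            "thread_id": int(row["TID"]),
            "opcode": row["Operation"],
        }
        for row in rows
        if row["Process Name"].lower() in wanted
    ]
```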
In a second aspect, in this embodiment, there is provided an Office document anomaly detection apparatus, including:
an acquisition module, configured to acquire an operation code sequence based on system call information generated when the Office document to be detected runs;
a conversion module, configured to convert the operation code sequence into a gray scale map based on a pre-acquired data dictionary;
and a determining module, configured to input the gray scale map into a trained document anomaly detection model and determine whether the Office document to be detected is abnormal.
In a third aspect, in this embodiment, there is provided a readable storage medium having stored thereon a program that when executed by a processor implements the steps of the Office document anomaly detection method described in the first aspect.
Compared with the related art, the Office document anomaly detection method provided in this embodiment acquires an operation code sequence based on the system call information generated when the Office document to be detected runs; that is, it collects the operations recorded in the system call information while the Office document runs and arranges them chronologically into continuous operation information. It then converts the operation code sequence into a gray scale map based on a pre-acquired data dictionary, turning all operation information related to the Office document into a picture in chronological order. Because the picture has no height limit, the chronological order of the operation information is preserved and no operation information is interrupted or lost. Finally, it inputs the gray scale map into a trained document anomaly detection model to determine whether the Office document to be detected is abnormal. By extracting and integrating the operation information and feeding it to the model to conclude whether the Office document is malicious, the method avoids the information loss caused by the limited input length of the model and solves the problem of low accuracy in detecting Office document anomalies with neural network models in the related art.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below, so that other features, objects, and advantages of the application will be more readily understood.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a block diagram of computer hardware running an Office document anomaly detection method in accordance with some embodiments of the present application;
FIG. 2 is a flow chart of an Office document anomaly detection method of some embodiments of the present application;
FIG. 3 is a flow chart of converting an operation code sequence into a grayscale image according to some embodiments of the present application;
FIG. 4 is a flow chart of generating a gray map based on values of feature vectors according to some embodiments of the present application;
FIG. 5 is a flow chart of converting an opcode sequence into feature vectors based on a data dictionary in accordance with some embodiments of the present application;
FIG. 6 is a flow chart of acquiring an opcode sequence based on system call information according to some embodiments of the present application;
FIG. 7 is a training flow chart of a document anomaly detection model according to some embodiments of the present application;
FIG. 8 is a flow chart of obtaining a data dictionary according to some embodiments of the present application;
FIG. 9 is a flow chart of obtaining system call information according to some embodiments of the present application;
FIG. 10 is a flow chart of an Office document anomaly detection method in accordance with some preferred embodiments of the present application;
FIG. 11 is a block diagram of an Office document anomaly detection apparatus according to some embodiments of the present application.
Detailed Description
For a clearer understanding of the objects, technical solutions and advantages of the present application, the present application is described and illustrated below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Unless defined otherwise, technical or scientific terms used herein have the meaning commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," "these," and the like in this application do not limit quantity and may denote either the singular or the plural. The terms "comprising," "including," "having," and any variations thereof, as used in this application, are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the listed steps or modules (units), but may include other steps or modules (units) not listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" in this application means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects. The terms "first," "second," "third," and the like in this application merely distinguish similar objects and do not denote a particular order.
The Office document anomaly detection method provided by the embodiments of the application can be executed in a terminal, a computer, a server, or a similar computing device. Taking a computer as an example, FIG. 1 is a block diagram of the hardware structure of a computer running the Office document anomaly detection method according to some embodiments of the present application. As shown in FIG. 1, the computer may include one or more processors 102 (only one is shown in FIG. 1) and a memory 104 for storing data, where the processor 102 may include, but is not limited to, a processing device such as a CPU, a microcontroller unit (MCU), or a programmable logic device (FPGA). The computer may also include a transmission device 106 for communication functions and an input-output device 108. Those of ordinary skill in the art will appreciate that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the computer. For example, the computer may include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
The memory 104 may be used to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the Office document anomaly detection method in this embodiment. The processor 102 runs the computer program stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above method. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some embodiments, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. The network may include a wireless network provided by a communications provider. In one example, the transmission device 106 includes a network interface controller (NIC) that can connect to other network devices through a base station to communicate with the internet. In another example, the transmission device 106 may be a radio frequency (RF) module for communicating with the internet wirelessly.
This embodiment provides an Office document anomaly detection method. FIG. 2 is a flowchart of the Office document anomaly detection method according to some embodiments of the present application. As shown in FIG. 2, the flow includes the following steps:
Step S201, acquiring an operation code sequence based on the system call information generated when the Office document to be detected runs.
In this embodiment, the Office document to be detected runs on the Windows operating system and may be a Word, Excel, PowerPoint, or other Office document. While running, the document generates calls to system APIs, which can be collected by a system monitoring tool such as Process Monitor, which captures and displays real-time file system, registry, and process/thread activity. An operation code is the string that describes a system call operation in the system call information, for example ReadFile, RegOpenKey, or RegQueryValue; from the operation codes, it can be determined which system call operations were executed while the Office document to be detected was running. The operation code sequence is a sequence of strings formed by arranging operation codes in a certain order. Adjacent operation codes in the sequence may be separated by a preset delimiter or distinguished in other ways. In this embodiment, the length of the operation code sequence is not limited, and the operation codes may be arranged in chronological order.
Step S202, converting the operation code sequence into a gray scale map based on a data dictionary acquired in advance.
The data dictionary may be used to store the mapping between the operation codes in the system call information and the gray values of the gray scale map. When there are more operation codes than gray values, the operation codes may first be converted into numerical values according to a predetermined rule, and those values may then be normalized. For example, when gray values lie in [0, 255], the values may be normalized into [0, 255], thereby establishing the mapping between operation codes and gray values.
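One possible "predetermined rule" for fitting numeric codes into the gray range is linear min-max scaling with rounding, sketched below as an assumption rather than the rule used in the patent. With more than 256 distinct codes, several operation codes necessarily share a gray value. `codes_to_gray` is a hypothetical name.

```python
def codes_to_gray(code_map, max_gray=255):
    """Linearly rescale integer opcode codes into [0, max_gray].

    Assumed rule: min-max scaling, rounded to the nearest integer.
    """
    lo, hi = min(code_map.values()), max(code_map.values())
    span = hi - lo or 1  # avoid division by zero when all codes are equal
    return {
        op: round((code - lo) * max_gray / span) for op, code in code_map.items()
    }
```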
The gray values corresponding to the operation codes are assigned to pixel points according to the order of the operation codes in the operation code sequence, and the pixel points are then arranged in that order to generate the corresponding gray scale map.
Step S203, inputting the gray scale map into the trained document anomaly detection model and determining whether the Office document to be detected is abnormal.
The document anomaly detection model performs binary classification on the input gray scale map and outputs whether the Office document to be detected is abnormal. In this and other embodiments, an anomaly means that the document contains malicious attack code that is triggered manually or automatically after the document is run and, once triggered, poses a security risk to the user. The document anomaly detection model may be a convolutional neural network model, specifically an SPP-Net (Spatial Pyramid Pooling Network) model. The model may include an input layer, convolution layers, an SPP layer, and a fully connected layer; the loss function may be cross-entropy, and the evaluation metric may be the F1-score.
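An SPP layer suits gray scale maps of unlimited height because it pools a feature map of any spatial size into a fixed-length vector. The NumPy sketch below illustrates spatial pyramid pooling with max pooling; the pyramid levels (1, 2, 4), the floor/ceil bin boundaries, and the name `spatial_pyramid_pool` are illustrative assumptions, not parameters given in the source, and a real model would implement this layer inside a deep learning framework.

```python
import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    Each level n splits the map into an n x n grid; bin edges use floor
    for the start and ceil for the end, so every bin is non-empty even
    when H or W is smaller than n. Output length is C * sum(n * n),
    independent of H and W.
    """
    c, h, w = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            r0, r1 = (i * h) // n, -(-((i + 1) * h) // n)  # floor, ceil
            for j in range(n):
                c0, c1 = (j * w) // n, -(-((j + 1) * w) // n)
                # per-channel maximum over the bin -> vector of length C
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)
```

With the default levels, a short map and a very tall map produce vectors of the same length, 21 values per channel, which is what allows the fully connected layer to have a fixed input size.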
Through steps S201 to S203, an operation code sequence is acquired from the system call information generated when the Office document to be detected runs; that is, the operations recorded in the system call information while the document runs are collected and arranged chronologically to form continuous operation information. The operation code sequence is converted into a gray scale map based on the pre-acquired data dictionary, so that all operation information related to the Office document is turned into a picture in chronological order; since the picture has no height limit, the chronological order is preserved and no operation information is interrupted or lost. The gray scale map is then input into the trained document anomaly detection model to determine whether the Office document to be detected is abnormal. Extracting and integrating the operation information before feeding it to the model to decide whether the document is malicious avoids the information loss caused by the limited input length of the model, and solves the problem of low accuracy in detecting Office document anomalies with neural network models in the related art.
In some of these embodiments, FIG. 3 is a flowchart of converting an operation code sequence into a gray scale map according to some embodiments of the present application. As shown in FIG. 3, the flow includes the following steps:
Step S301, converting the operation code sequence into a corresponding feature vector based on the data dictionary.
In this embodiment, the data dictionary stores the mapping between operation codes and their codes. A code may be a numerical value obtained by converting the operation code string according to a predetermined rule, or a number assigned directly by enumerating the operation codes. The range of code values may also be normalized to the range of gray values, for example [0, 255]. The feature vector is the sequence of codes arranged in the order of the corresponding operation codes in the operation code sequence; each code is an element of the feature vector and corresponds to one operation code in the sequence.
Step S302, generating a corresponding gray scale map based on the values of the feature vector.
The value of each element in the feature vector is a code and can correspond to the gray value of a pixel point. Arranging the gray values in the order of the elements in the feature vector yields a queue of pixel points. The number of pixel points in each row is determined by the preset width of the gray scale map, so a gray scale map with fixed width and unlimited height can be generated.
Through steps S301 to S302, the operation code sequence is converted into the corresponding feature vector based on the data dictionary: each operation code is converted into its gray value, and the feature vector follows the order of the operation codes, preserving their chronological information. A corresponding gray scale map is then generated from the values of the feature vector, providing the picture input for anomaly detection by the document anomaly detection model. Because the gray scale map has no height limit, the chronological order of the operation information is maintained without interruption or loss, which improves the accuracy of Office document anomaly detection.
In some of these embodiments, FIG. 4 is a flowchart of generating a gray scale map based on the values of a feature vector according to some embodiments of the present application. As shown in FIG. 4, the flow includes the following steps:
Step S401, acquiring the gray value of the corresponding pixel point based on the value of each element in the feature vector.
If the element values of the feature vector have been normalized to the range [0, 255], they can be used directly as the gray values of the corresponding pixel points; if they have not been normalized or exceed [0, 255], they are transformed according to a predetermined rule before being assigned as gray values.
Step S402, generating a gray scale map of preset width, filled from left to right and from bottom to top, based on the gray value of each pixel point and the order of the elements in the feature vector.
The pixel points corresponding to the elements are arranged in the order of the elements in the feature vector, and the number of pixel points per row is determined by the preset width of the gray scale map. In this embodiment the width is 256, i.e., each row contains 256 pixel points. The pixel points are placed row by row, from left to right and from bottom to top, producing an N × 256 pixel matrix, where N is the number of rows (for a feature vector of length L, N = ⌈L / 256⌉). Any part of the last row that is not filled is padded with pixel points of gray value 0.
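The row layout described above can be sketched with NumPy. The function name `vector_to_gray_map` and the convention that matrix row 0 is the top of the image (so bottom-to-top filling places the first 256 values in the last matrix row) are assumptions; input values are assumed to already lie in [0, 255].

```python
import numpy as np


def vector_to_gray_map(feature_vector, width=256):
    """Lay out gray values as an N x width uint8 image, N = ceil(L / width).

    Pixels fill each row left to right; rows fill from the bottom of the
    image upward (row 0 is assumed to be the top). The unfilled tail of
    the final row is padded with gray value 0.
    """
    values = np.asarray(feature_vector, dtype=np.uint8)
    n_rows = max(1, -(-values.size // width))  # ceiling division
    padded = np.zeros(n_rows * width, dtype=np.uint8)
    padded[: values.size] = values
    # Reshape in reading order, then flip vertically so filling starts at the bottom.
    return padded.reshape(n_rows, width)[::-1]
```

For a vector of 266 values, this yields a 2 × 256 map whose bottom row holds the first 256 values and whose top row holds the remaining 10 values followed by 246 zeros.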
Through the steps S401-S402, the gray value of the corresponding pixel point is obtained based on the value of each element in the feature vector, the mapping relation between the element value of the feature vector and the gray value of the pixel point is established according to the determined rule, and a uniform generation rule is determined for the input picture of the document anomaly detection model; based on the gray value of each pixel point and the arrangement sequence of each element in the feature vector, generating a gray map with preset width according to the sequence from left to right and from bottom to top, obtaining a picture meeting the input requirement of a document anomaly detection model on the basis of maintaining the time sequence information and the integrity of an operation code, and providing a data basis for document anomaly detection.
In some of these embodiments, fig. 5 is a flow chart of converting an operation code sequence into feature vectors based on a data dictionary according to some embodiments of the present application, as shown in fig. 5, the flow comprising the steps of:
in step S501, based on the operation code in the operation code sequence, the code corresponding to the operation code is queried in the data dictionary.
The mapping relation between various operation codes and codes is stored in the data dictionary in advance. The code may be a numerical value obtained by converting the operation code character string based on a predetermined rule, or a numerical value obtained by directly numbering according to the number of operation codes. In case the value is outside the gray value range, the normalization may also be performed according to the gray value range, e.g. falling within the [0, 255] range.
In step S502, when a corresponding code is queried, the code is used as a feature vector element value corresponding to the operation code.
If a corresponding code is queried in the data dictionary, the operation represented by the operation code is known and can be identified by the document anomaly detection model, so that the operation code can be converted into the corresponding code and input into the document anomaly detection model as a characteristic vector element value.
In step S503, if the corresponding code is not found, the operation code is deleted.
If no corresponding code is found in the data dictionary, the operation represented by the operation code is unknown and cannot be recognized by the document anomaly detection model. Feeding it to the model would impair the accuracy of anomaly detection, so the operation code is deleted from the operation code sequence.
Step S504, arrange the feature vector element values in the order of the operation codes in the operation code sequence to generate the feature vector.
Through steps S501 to S504, each operation code in the sequence is looked up in the data dictionary to determine whether it can be recognized. When a code is found, it becomes the feature vector element value of that operation code, so recognizable operation codes become valid input for the document anomaly detection model. When no code is found, the unrecognizable operation code is deleted, which prevents unknown operations from affecting the accuracy of anomaly detection. Finally, the element values are arranged in the order of the operation codes to form the feature vector. This provides the data basis for generating the corresponding gray map and feeding it to the document anomaly detection model.
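Steps S501-S504 amount to an order-preserving dictionary lookup that silently drops unknown entries. The sketch below assumes the data dictionary is a plain mapping from operation code strings to integer codes. The function name and the example dictionary entries are hypothetical; the entry names merely resemble Process Monitor operation names.

```python
def opcodes_to_feature_vector(opcode_sequence, data_dictionary):
    """Encode an operation code sequence with a data dictionary (S501-S504).

    Opcodes found in the dictionary are replaced by their codes; opcodes not
    in the dictionary are dropped. The original order is preserved.
    """
    feature_vector = []
    for opcode in opcode_sequence:
        code = data_dictionary.get(opcode)  # S501: query the dictionary
        if code is None:
            continue  # S503: unknown opcode, delete it
        feature_vector.append(code)  # S502/S504: keep the code in sequence order
    return feature_vector

# Hypothetical dictionary for illustration only.
example_dictionary = {"CreateFile": 1, "ReadFile": 2, "RegOpenKey": 3}
```

For instance, encoding `["CreateFile", "UnknownOp", "ReadFile", "CreateFile"]` with `example_dictionary` gives `[1, 2, 1]`. The unknown entry disappears without leaving a gap, so the relative order of the remaining operations is unchanged.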
In some embodiments, the system call information includes the log generation time, the system call operation code and the thread number. FIG. 6 is a flowchart of obtaining an operation code sequence based on system call information according to some embodiments of the present application. As shown in FIG. 6, the flow includes the following steps:
Step S601, group the system call information by thread number.
The system call information may be extracted from a behavior log generated by a process monitoring tool. For example, the behavior sequences of the Office-related processes can be extracted; the relevant process names are WINWORD.EXE (Word), POWERPNT.EXE (PowerPoint), EXCEL.EXE (Excel), and so on. The relevant attributes in the log are then retained to form the system call information, including Time of Day (log generation time), Operation (system call operation code) and TID (thread number).
The system call information is grouped by TID, and the groups are ordered by thread number.
Step S602, sort the system call information within each group by log generation time.
The system call information in each group is then sorted by log generation time.
Step S603, arrange the corresponding system call operation codes according to the order of the system call information to generate an operation code sequence for each group.
The system call operation code is extracted from each piece of system call information in a group, and the operation codes are arranged in the sorted order to form that group's operation code sequence. Each group therefore yields its own operation code sequence.
Through steps S601 to S603, grouping the system call information by thread number separates the system call information of different threads. Sorting each group by log generation time yields the series of operations executed by the same thread in chronological order. The content and execution order of these operations can reveal whether the document carries a malicious attack intent. Arranging the system call operation codes in this order generates an operation code sequence for each group, which serves as input data for the document anomaly detection model.
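A compact way to express steps S601-S603 is sketched below, under stated assumptions. Each record is assumed to be a `(log_time, opcode, tid)` tuple whose `log_time` is already sortable, such as a `datetime`. Raw Process Monitor "Time of Day" strings with AM/PM suffixes would need converting first. The function name is hypothetical.

```python
from collections import defaultdict

def opcode_sequences_by_thread(records):
    """Build one operation code sequence per thread (S601-S603).

    records: iterable of (log_time, opcode, tid) tuples; log_time must be
    sortable. Returns {tid: [opcode, ...]} with keys in ascending thread order.
    """
    groups = defaultdict(list)
    for log_time, opcode, tid in records:
        groups[int(tid)].append((log_time, opcode))  # S601: group by TID
    sequences = {}
    for tid in sorted(groups):  # order the groups by thread number
        # S602: sort by log time; sorted() is stable, so ties keep log order.
        ordered = sorted(groups[tid], key=lambda item: item[0])
        sequences[tid] = [opcode for _, opcode in ordered]  # S603: opcodes only
    return sequences
```

Because Python's `sorted` is stable, entries that share a timestamp keep the order in which they appeared in the log. This is a reasonable tie-breaker when the monitor's clock resolution is coarse.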
In some embodiments, FIG. 7 is a training flowchart of the document anomaly detection model according to some embodiments of the present application. As shown in FIG. 7, the flow includes the following steps:
Step S701, obtain a corresponding operation code sequence sample set based on a pre-acquired system call information sample set.
A system call information sample set is acquired in advance, and every sample in it has already been labeled as anomalous or not. Using the method of the above embodiments, a corresponding operation code sequence sample is generated from each system call information sample, forming the operation code sequence sample set.
Step S702, convert the operation code sequence sample set into a corresponding picture sample set based on the data dictionary.
Using the method of the above embodiments, each operation code sequence sample is converted into a corresponding picture sample, forming the picture sample set.
Step S703, normalize the picture samples in the picture sample set, taking the size of the tallest picture in the set as the standard size.
The operation code sequences of different system call information differ in their specific operations and in the number of operations. As a result, the generated picture samples share the same width but differ in height. Taking the size of the tallest picture in the set as the standard size, the other pictures are padded with pixels of gray value 0 so that all picture samples have the same height.
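The padding in step S703 can be sketched as below. The text does not say on which side the zero rows are added. This sketch assumes they go above the existing content, because the gray maps are filled from the bottom row upward. Pictures are assumed to be lists of rows, top row first, each with at least one row. The function name is hypothetical.

```python
def pad_to_max_height(pictures):
    """Pad equal-width gray maps to the height of the tallest one (S703).

    Each picture is a non-empty list of rows (top row first). Missing rows are
    filled with gray value 0 and, by assumption, added above the content.
    """
    if not pictures:
        return []
    width = len(pictures[0][0])
    if any(len(row) != width for pic in pictures for row in pic):
        raise ValueError("all pictures must share the same width")
    max_height = max(len(pic) for pic in pictures)
    padded = []
    for pic in pictures:
        filler = [[0] * width for _ in range(max_height - len(pic))]
        padded.append(filler + [list(row) for row in pic])
    return padded
```

Padding with 0 matches the gray value already used to fill the incomplete last row in step S1016. Zero-valued regions therefore look the same to the model wherever they come from.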
Step S704, input the normalized picture samples into an initial model for training to obtain the document anomaly detection model.
The initial model that receives the picture samples may be a convolutional neural network, specifically an SPP-Net (Spatial Pyramid Pooling Network) model. The model may include an input layer, convolution layers, an SPP layer and a fully connected layer. The loss function may be categorical cross-entropy (categorical_crossentropy), and the evaluation metric may be the F1-Score. The weights and biases of each neuron node are adjusted through back propagation until the model is stable.
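The F1-Score named as the evaluation metric is the harmonic mean of precision and recall. The sketch below computes it for the binary case. It assumes label 1 marks an anomalous document, and it returns 0.0 when there are no true positives, a common convention. The function name is illustrative.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1-Score, treating `positive` (anomalous document) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # convention: no true positives means F1 of 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

F1 penalizes both missed malicious documents (recall) and false alarms (precision). This makes it more informative than plain accuracy when benign samples heavily outnumber malicious ones.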
Through steps S701 to S704, an operation code sequence sample set obtained from a pre-acquired system call information sample set provides labeled samples for model training. Converting it into a picture sample set based on the data dictionary provides the training input for the document anomaly detection model. Normalizing the picture samples to the size of the tallest picture makes subsequent training easier. Training the initial model on the normalized samples yields the document anomaly detection model, which serves as the standard for detecting anomalies in Office documents and improves detection accuracy.
In some embodiments, FIG. 8 is a flowchart of obtaining the data dictionary according to some embodiments of the present application. As shown in FIG. 8, the flow includes the following steps:
Step S801, obtain a corresponding operation code sequence sample set based on a pre-acquired system call information sample set.
As in step S701 of the above embodiment, a corresponding operation code sequence sample is generated from each system call information sample, forming the operation code sequence sample set. Anomaly detection is performed on each piece of system call information to obtain its anomaly label.
Step S802, encode the operation codes in the operation code sequence sample set to generate a data dictionary.
The operation codes in the sample set are counted and numbered to generate the data dictionary. In this embodiment, the sample set yields a total of 65 unique system call operation codes.
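Counting and numbering the distinct operation codes (step S802) can be sketched as follows. The text does not fix the numbering order or the starting value. This sketch assumes numbering in order of first appearance starting at 1, so that gray value 0 stays free for padding. The function name is hypothetical.

```python
def build_data_dictionary(opcode_sequences, start=1):
    """Number the distinct opcodes of a sample set (S802).

    Assumption: codes follow order of first appearance, starting at `start`
    (default 1, leaving 0 available as the padding gray value).
    """
    dictionary = {}
    for sequence in opcode_sequences:
        for opcode in sequence:
            if opcode not in dictionary:
                dictionary[opcode] = start + len(dictionary)
    return dictionary
```

With the 65 distinct operation codes reported in this embodiment, the codes would run from 1 to 65 and fit in the [0, 255] gray range without rescaling.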
Through steps S801 to S802, an operation code sequence sample set for training is obtained from a pre-acquired system call information sample set. Encoding the operation codes in that set generates the data dictionary, i.e. the mapping between operation codes and codes. Model training and Office document anomaly detection in practical use both query this single mapping, so they share the same encoding.
In some embodiments, FIG. 9 is a flowchart of obtaining system call information according to some embodiments of the present application. As shown in FIG. 9, the flow includes the following steps:
Step S901, run the Office document to be detected in a pre-created sandbox environment and obtain the corresponding log information.
A sandbox capable of running Office documents is created. A process monitor is started and the Office document to be detected is opened. The process monitor then records a behavior log of the system calls made while the Office process runs.
Step S902, extract the process information corresponding to the Office document to be detected from the log information.
Process information of the Office-related processes is extracted from the behavior log obtained in the Office sandbox. The Office processes are WINWORD.EXE (Word), POWERPNT.EXE (PowerPoint) and EXCEL.EXE (Excel).
Step S903, obtain the system call information based on the process information.
From the process information, the relevant attributes of the behavior log are retained: Time of Day (log generation time), Operation (system call operation code) and TID (thread number). The remaining attribute columns are deleted, forming the system call information.
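Steps S902-S903 can be sketched as a filter over a Process Monitor CSV export. This sketch assumes the export contains "Process Name", "Time of Day", "Operation" and "TID" columns. TID is not among Process Monitor's default columns and may have to be enabled before exporting. The constant and function names are hypothetical.

```python
import csv
import io

OFFICE_PROCESSES = {"WINWORD.EXE", "POWERPNT.EXE", "EXCEL.EXE"}
KEPT_COLUMNS = ("Time of Day", "Operation", "TID")

def extract_system_call_info(csv_text):
    """Keep Office-process rows of a Process Monitor CSV export (S902) and
    only the Time of Day / Operation / TID columns (S903)."""
    csv_text = csv_text.lstrip("\ufeff")  # tolerate a UTF-8 byte-order mark
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        if row.get("Process Name", "").upper() in OFFICE_PROCESSES:
            rows.append(tuple(row[col] for col in KEPT_COLUMNS))
    return rows
```

When reading the export from disk, opening it with `encoding="utf-8-sig"` removes the byte-order mark as well. The resulting tuples have the `(time, operation, tid)` shape that a per-thread grouping step can consume.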
Through steps S901 to S903, running the Office document to be detected in a pre-created sandbox isolates any risk it may carry, while a process monitoring tool captures its runtime log information. Extracting the process information of the Office processes narrows the information to be processed. Obtaining the system call information from that process information filters the data again, keeping only what is needed for anomaly detection. This reduces the storage and computing resources needed for subsequent processing and improves detection efficiency.
The present embodiment is described and illustrated below by way of preferred embodiments. FIG. 10 is a flow chart of an Office document anomaly detection method in accordance with some preferred embodiments of the present application. As shown in fig. 10, the flow includes the steps of:
Step S1001, create a virtual machine configuration file containing metadata and operating system boot entries, then start a virtual machine instance with the virsh command;
Step S1002, install an operating system, a process monitor and Office software in the created virtual machine instance to build the sandbox;
Step S1003, once the sandbox is ready, create a snapshot with the virsh tool for resetting the sandbox later;
Step S1004, use the virsh tool (version v6.0.0) to start one virtual machine instance running Windows 7 with Office 2007 installed, as the sandbox for running Office documents;
Step S1005, transfer the sample to be detected to the virtual machine instance, start the Process Monitor program (version v3.70), and then open the sample with the Office program;
Step S1006, after waiting 30 seconds, export the behavior log captured by Process Monitor in CSV format;
Step S1007, upload the log file to the log storage server via FTP. The log storage server uses ProFTPD (version v1.3.6e) as the FTP server program, and each log file is named with the SHA-256 digest of the Office document followed by a host timestamp;
Step S1008, shut down the virtual machine instance with the virsh tool and revert it to the snapshot taken before the document was run. This cleans the instance and prevents the detected sample from contaminating the sandbox;
Step S1009, extract the behavior sequences of the Office-related processes from the log file; the Office processes are WINWORD.EXE (Word), POWERPNT.EXE (PowerPoint) and EXCEL.EXE (Excel);
Step S1010, from these behavior sequences keep the relevant attributes, Time of Day (log generation time), Operation (system call operation code) and TID (thread number), and delete the remaining attribute columns;
Step S1011, group the log entries by TID, order the groups by thread number, and sort the entries within each group by log generation time;
Step S1012, extract the Operation attribute of each log entry as the operation code sequence of the running document, and delete unknown operation codes;
Step S1013, select the behavior logs of samples with known classification results and execute steps S1009-S1012 to obtain their operation code sequences;
Step S1014, count and number the set of operation codes appearing in these operation code sequences to generate the data dictionary; experiments yielded a total of 65 unique system call operation codes;
Step S1015, encode the operation code sequence from step S1012 with the data dictionary generated in step S1014 to obtain a feature vector;
Step S1016, convert the encoded feature vector into a picture with a fixed width of 256 and a variable height; a final row shorter than 256 is padded with pixels of gray value 0. In the model prediction stage, different pictures may have different heights;
Step S1017, determine the structure of the deep learning model; each network layer is as follows:
Input layer: receives a single-channel picture with a width of 256 and a variable height;
Convolution layer C1: 32 convolution kernels of size 1×3 sample the input row by row, with padding that keeps the output the same size as the input, and ReLU activation;
Convolution layer C2: 32 convolution kernels of size 1×5 sample the input row by row, with padding that keeps the output the same size as the input, and ReLU activation;
Convolution layer C3: 32 convolution kernels of size 1×7 sample the input row by row, with padding that keeps the output the same size as the input, and ReLU activation;
SPP layer SPP1: an SPP layer of size 1×4 turns inputs of non-fixed length into fixed-length outputs. The SPP algorithm divides the input picture with several grids of different scales, and each block serves as one output neuron, so a picture of any size is finally converted into a feature of fixed size;
Fully connected layer DS1: size 2; it extracts and integrates the information from the preceding layer for binary classification, and the result indicates whether the Office document corresponding to the input picture is anomalous;
Step S1018, select the behavior logs of samples with known classification results and execute steps S1009-S1015 to extract feature vectors. Convert the feature vectors into pictures and, taking the tallest picture as the standard size, pad the other pictures with 0 so that all input pictures have the same height;
Step S1019, adjust the weights and biases of each neuron node in the deep learning model through back propagation until the model is stable, using categorical cross-entropy (categorical_crossentropy) as the loss function and the F1-Score as the evaluation metric;
Step S1020, input the picture obtained in step S1016 into the trained document anomaly detection model for anomaly detection and obtain the detection result.
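The sandbox round trip of steps S1001-S1008 and the log naming rule of step S1007 can be sketched as below. The virsh subcommands used (`define`, `start`, `snapshot-create-as`, `destroy`, `snapshot-revert`) are standard libvirt commands. The domain name, snapshot name and XML path are hypothetical placeholders. Using `destroy` (a forced power-off) rather than a graceful shutdown is a design choice of this sketch: any state is discarded by the snapshot revert anyway. The underscore separator and `.csv` suffix in the log name are also assumptions; the text only specifies "SHA-256 digest plus host timestamp".

```python
import hashlib
import subprocess
import time

def sandbox_commands(domain, snapshot, config_xml):
    """virsh command lines for the sandbox lifecycle (S1001-S1008)."""
    return {
        "define": ["virsh", "define", config_xml],  # S1001: register the VM
        "start": ["virsh", "start", domain],  # S1001/S1004: boot the instance
        "snapshot": ["virsh", "snapshot-create-as", domain, snapshot],  # S1003: clean state
        "stop": ["virsh", "destroy", domain],  # S1008: forced power-off
        "revert": ["virsh", "snapshot-revert", domain, snapshot],  # S1008: roll back
    }

def log_file_name(document_bytes, timestamp=None):
    """Log file name: SHA-256 digest of the document plus a host timestamp (S1007).

    Assumption: '_' separator and '.csv' suffix.
    """
    if timestamp is None:
        timestamp = int(time.time())
    return f"{hashlib.sha256(document_bytes).hexdigest()}_{timestamp}.csv"

def run(cmd, dry_run=True):
    """Print the command in dry-run mode; otherwise execute it."""
    if dry_run:
        print(" ".join(cmd))
        return
    subprocess.run(cmd, check=True)
```

For example, `run(sandbox_commands("office-sandbox", "clean", "office-sandbox.xml")["revert"])` prints the revert command without touching the host. Naming logs by content hash makes repeated submissions of the same document easy to correlate, while the timestamp keeps individual runs distinct.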
Through steps S1001 to S1020, the sandbox environment isolates any risk the Office document to be detected may carry, and the process monitoring tool captures its runtime log information. System call information of the Office processes is extracted from the log, the relevant operations are filtered, and they are arranged chronologically into continuous operation code sequences. In this way, all operation information related to the Office document is converted, in time order, into a picture. Because the picture height is not limited, the operation information keeps its temporal order without being truncated or lost. Feeding the generated pictures into the model yields a conclusion on whether the Office document is anomalous. The model accepts pictures of any size and produces a fixed-size output, so no information is lost to a limited input length. This addresses the low accuracy of detecting Office document anomalies with neural network models in the related art and improves both the accuracy and the efficiency of detection.
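The property that makes arbitrary picture heights acceptable is spatial pyramid pooling: each pyramid level splits the feature map into a fixed grid of bins, so the output length does not depend on the input size. The sketch below shows single-channel spatial pyramid max pooling in plain Python. The pyramid levels (1, 2, 4), meaning 1×1, 2×2 and 4×4 grids, are a hypothetical choice for illustration; the exact configuration behind "size 1×4" in layer SPP1 is not spelled out. In a full model, this pooling would be applied to each of the 32 channels produced by C1-C3 before the fully connected layer DS1.

```python
def spatial_pyramid_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool an H x W map (list of rows) into a fixed-length vector.

    For each pyramid level n the map is split into an n x n grid of bins and
    the maximum of each bin becomes one output value, so the output length is
    sum(n * n for n in levels) regardless of H and W.
    """
    height = len(feature_map)
    width = len(feature_map[0]) if height else 0
    if height == 0 or width == 0:
        raise ValueError("feature map must be non-empty")
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin rows [floor(i*H/n), ceil((i+1)*H/n)); never empty when H > 0.
            r0, r1 = (i * height) // n, -(-(i + 1) * height // n)
            for j in range(n):
                c0, c1 = (j * width) // n, -(-(j + 1) * width // n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1) for c in range(c0, c1)))
    return pooled
```

With levels (1, 2, 4), a 3×5 map and a 10×256 map both yield 21 values. Bins may overlap by one row or column when the map is small, but they are never empty, so even a single-row picture produces a valid output.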
It should be noted that the steps illustrated in the above flows or flowcharts may be performed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
In some embodiments, the present application further provides an Office document anomaly detection device for implementing the foregoing embodiments and preferred implementations; what has already been described is not repeated here. The terms "module," "unit," "sub-unit," and the like as used below may refer to a combination of software and/or hardware that performs a predetermined function.
In some embodiments, FIG. 11 is a block diagram of the Office document anomaly detection device of this embodiment. As shown in FIG. 11, the device includes:
an acquisition module 1101, configured to obtain an operation code sequence based on the system call information generated when the Office document to be detected runs;
a conversion module 1102, configured to convert the operation code sequence into a gray map based on a pre-acquired data dictionary;
a determination module 1103, configured to input the gray map into the trained document anomaly detection model and determine whether the Office document to be detected is anomalous.
In the Office document anomaly detection device of this embodiment, the acquisition module 1101 obtains an operation code sequence from the system call information generated while the Office document to be detected runs. That is, it collects the operations executed during runtime and arranges them chronologically into continuous operation information. The conversion module 1102 converts the operation code sequence into a gray map based on the pre-acquired data dictionary, turning all operation information related to the Office document into a picture in time order. Because the picture height is not limited, the operation information keeps its temporal order without being truncated or lost. The determination module 1103 inputs the gray map into the trained document anomaly detection model to determine whether the Office document to be detected is anomalous, i.e. whether it is a malicious document. This avoids the information loss caused by a limited model input length and addresses the low accuracy of detecting Office document anomalies with neural network models in the related art.
In some embodiments, the conversion module includes a conversion sub-module and a first generation sub-module. The conversion sub-module is configured to convert the operation code sequence into a corresponding feature vector based on the data dictionary. The first generation sub-module is configured to generate a corresponding gray map based on the values of the feature vector.
In this device, the conversion sub-module converts the operation code sequence into a feature vector based on the data dictionary. Each operation code is converted into a corresponding gray value, and the feature vector follows the order of the operation codes, preserving their temporal information. The first generation sub-module generates the corresponding gray map from the values of the feature vector, providing the picture input for the document anomaly detection model.
In some embodiments, the first generation sub-module includes an obtaining unit and a generation unit. The obtaining unit is configured to obtain the gray value of each pixel from the value of the corresponding element in the feature vector. The generation unit is configured to generate a gray map of preset width from these gray values, following the order of the elements in the feature vector from left to right and from bottom to top.
In this device, the obtaining unit derives the gray value of each pixel from the corresponding feature vector element by a fixed rule, so every input picture of the document anomaly detection model follows a uniform generation rule. The generation unit then fills a gray map of preset width from left to right and from bottom to top according to the element order. This produces a picture that meets the model's input requirements while preserving the temporal order and completeness of the operation codes, providing the data basis for document anomaly detection.
In some embodiments, the conversion sub-module includes a query unit, an assignment unit, a deletion unit and an arrangement unit:
- the query unit is configured to query the data dictionary for the code of each operation code in the operation code sequence;
- the assignment unit is configured to use that code as the feature vector element value of the operation code when a code is found;
- the deletion unit is configured to delete the operation code when no code is found;
- the arrangement unit is configured to arrange the feature vector element values in the order of the operation codes in the operation code sequence to generate the feature vector.
In this device, the query unit looks up each operation code in the data dictionary to determine whether it can be recognized. When a code is found, the assignment unit uses it as the feature vector element value, turning recognizable operation codes into valid input for the document anomaly detection model. When no code is found, the deletion unit deletes the unrecognizable operation code, which prevents unknown operations from affecting the accuracy of anomaly detection. The arrangement unit arranges the element values in the order of the operation codes to generate the feature vector, providing the data basis for generating the gray map and feeding it to the model.
In some embodiments, the system call information includes the log generation time, the system call operation code and the thread number, and the acquisition module includes a grouping sub-module, a sorting sub-module and a second generation sub-module. The grouping sub-module is configured to group the system call information by thread number. The sorting sub-module is configured to sort the system call information of each group by log generation time. The second generation sub-module is configured to arrange the corresponding system call operation codes in this order and generate the operation code sequence of each group.
In this device, the grouping sub-module groups the system call information by thread number, separating the information of different threads. The sorting sub-module sorts each group by log generation time, yielding the series of operations executed by the same thread in chronological order. The content and execution order of these operations can reveal whether the document carries a malicious attack intent. The second generation sub-module arranges the system call operation codes in this order and generates the operation code sequence of each group as input data for the document anomaly detection model.
In some embodiments, the Office document anomaly detection device includes a first storage module configured to store the trained document anomaly detection model.
In this device, the document anomaly detection model stored in the first storage module serves as the standard for detecting anomalies in Office documents, which improves detection accuracy.
In some embodiments, the Office document anomaly detection device includes a second storage module configured to store the data dictionary.
In this device, the data dictionary stored in the second storage module provides the mapping between operation codes and codes. Model training and Office document anomaly detection in practical use both query this single mapping.
In some embodiments, the Office document anomaly detection device further includes a second acquisition module, an extraction module and a third acquisition module. The second acquisition module is configured to run the Office document to be detected in a pre-created sandbox environment and obtain the corresponding log information. The extraction module is configured to extract the process information corresponding to the Office document to be detected from the log information. The third acquisition module is configured to obtain the system call information based on the process information.
In this device, the second acquisition module runs the Office document to be detected in a pre-created sandbox, isolating any risk it may carry, and obtains its runtime log information with a process monitoring tool. The extraction module extracts the process information of the Office processes, narrowing the information to be processed. The third acquisition module obtains the system call information from that process information, filtering the data again so that only the system call information needed for anomaly detection is kept. This reduces the storage and computing resources needed for subsequent processing and improves detection efficiency.
In addition, in combination with the Office document anomaly detection method of the above embodiments, this embodiment may provide a readable storage medium storing a computer program. When executed by a processor, the computer program implements any of the Office document anomaly detection methods of the above embodiments.
It should be noted that, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations; they are not repeated here.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present application, are within the scope of the present application in light of the embodiments provided herein.
Obviously, the drawings are merely examples or embodiments of the present application, and a person skilled in the art could apply the present application to other similar situations based on these drawings without inventive effort. In addition, it should be appreciated that although such development work might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and should not be construed as indicating that the disclosure is insufficient.
The term "embodiment" in this application means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are they separate or alternative embodiments that are mutually exclusive of other embodiments. Those of ordinary skill in the art understand, explicitly and implicitly, that the embodiments described in this application can be combined with other embodiments where no conflict arises.
The above embodiments represent only a few implementations of the present application; although they are described in relative detail, they are not to be construed as limiting the scope of the patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present application, and all of these fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. An Office document anomaly detection method, the method comprising:
acquiring an operation code sequence based on system call information generated when an Office document to be detected runs;
converting the operation code sequence into a gray scale map based on a pre-acquired data dictionary;
and inputting the gray scale map into a trained document anomaly detection model to determine whether the Office document to be detected is abnormal.
2. The method of claim 1, wherein the converting the operation code sequence into a gray scale map based on a pre-acquired data dictionary comprises:
converting the operation code sequence into corresponding feature vectors based on the data dictionary;
and generating a corresponding gray scale map based on the values of the feature vector.
3. The method of claim 2, wherein generating the corresponding gray scale map based on the values of the feature vector comprises:
acquiring a gray value for the corresponding pixel based on the value of each element in the feature vector;
and generating a gray scale map of preset width, filling pixels from left to right and from bottom to top, based on the gray value of each pixel and the order of the elements in the feature vector.
4. The method of claim 2, wherein the converting the operation code sequence into the corresponding feature vector based on the data dictionary comprises:
querying the data dictionary for the code corresponding to each operation code in the operation code sequence;
when the corresponding code is found, using the code as the feature vector element value for that operation code;
when no corresponding code is found, deleting the operation code;
and arranging the feature vector element values in the order of the corresponding operation codes in the operation code sequence to generate the feature vector.
5. The method of claim 1, wherein the system call information includes a log generation time, a system call operation code, and a thread number, and wherein acquiring the operation code sequence based on the system call information generated when the Office document to be detected runs comprises:
grouping the system call information by thread number;
sorting the system call information in each group by log generation time;
and arranging the corresponding system call operation codes according to the sorted order of the system call information to generate the operation code sequence for each group.
6. The method of claim 1, wherein the document anomaly detection model is trained by:
acquiring a corresponding operation code sequence sample set based on a pre-acquired system call information sample set;
converting the operation code sequence sample set into a corresponding picture sample set based on the data dictionary;
taking the size of the picture with the largest height in the picture sample set as the standard size, and normalizing the picture samples in the picture sample set to that size;
and inputting the normalized picture samples into an initial model for training to obtain the document anomaly detection model.
7. The method of claim 1, wherein the data dictionary is obtained by:
acquiring a corresponding operation code sequence sample set based on a pre-acquired system call information sample set;
and encoding the operation codes in the operation code sequence sample set to generate the data dictionary.
8. The method of claim 1, wherein, before acquiring the operation code sequence based on the system call information generated when the Office document to be detected runs, the method further comprises:
running the Office document to be detected in a pre-created sandbox environment to acquire corresponding log information;
extracting, from the log information, the process information corresponding to the Office document to be detected;
and acquiring the system call information based on the process information.
9. An Office document anomaly detection apparatus, the apparatus comprising:
an acquisition module configured to acquire an operation code sequence based on system call information generated when an Office document to be detected runs;
a conversion module configured to convert the operation code sequence into a gray scale map based on a pre-acquired data dictionary;
and a determining module configured to input the gray scale map into a trained document anomaly detection model to determine whether the Office document to be detected is abnormal.
10. A readable storage medium having a program stored thereon, wherein the program when executed by a processor implements the steps of the Office document anomaly detection method of any one of claims 1 to 8.
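The data-preparation steps of claims 1 to 7 (opcode sequences, data dictionary, feature vectors, gray scale map, size normalization) can be sketched end to end in pure Python. This is a minimal sketch under several assumptions the claims leave open: system call records are `(time, opcode, thread_number)` tuples, dictionary codes are assigned in first-seen order starting at 1, gray values are obtained by scaling codes linearly into the range 0 to 255, and normalization pads zero rows at the top of each image (the unfilled end when filling bottom to top). All function names are hypothetical, and the detection model itself is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A system call record: (log generation time, opcode, thread number).
Syscall = Tuple[float, str, int]
Image = List[List[int]]


def opcode_sequences(calls: Sequence[Syscall]) -> Dict[int, List[str]]:
    """Claim 5: group by thread number, sort each group by log time."""
    groups: Dict[int, List[Syscall]] = defaultdict(list)
    for call in calls:
        groups[call[2]].append(call)
    return {tid: [op for _, op, _ in sorted(g, key=lambda c: c[0])]
            for tid, g in groups.items()}


def build_dictionary(samples: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Claim 7: encode each distinct opcode (assumed: first-seen order, from 1)."""
    dictionary: Dict[str, int] = {}
    for seq in samples:
        for op in seq:
            if op not in dictionary:
                dictionary[op] = len(dictionary) + 1
    return dictionary


def to_feature_vector(seq: Sequence[str], dictionary: Dict[str, int]) -> List[int]:
    """Claim 4: look up each opcode and delete opcodes missing from the dictionary."""
    return [dictionary[op] for op in seq if op in dictionary]


def to_gray_map(vector: Sequence[int], width: int, max_code: int) -> Image:
    """Claim 3: fill a fixed-width image left to right, bottom to top.

    The linear scaling of codes to 0..255 is an assumption.
    """
    height = max(1, -(-len(vector) // width))  # ceiling division
    image = [[0] * width for _ in range(height)]
    for i, code in enumerate(vector):
        row = height - 1 - i // width  # start from the bottom row
        col = i % width
        image[row][col] = round(code * 255 / max_code)
    return image


def pad_to_height(image: Image, target_height: int) -> Image:
    """Claim 6 (one possible reading): add zero rows on top up to the target."""
    width = len(image[0])
    extra = target_height - len(image)
    return [[0] * width for _ in range(extra)] + image


def normalize_samples(images: Sequence[Image]) -> List[Image]:
    """Use the tallest picture in the sample set as the standard size."""
    target = max(len(img) for img in images)
    return [pad_to_height(img, target) for img in images]
```

With a width of 2, the feature vector `[1, 2, 3]` (maximum code 3) fills the bottom row first and yields `[[255, 0], [85, 170]]`. Padding on top keeps the start of every sequence at the bottom-left pixel, so samples of different lengths stay aligned; a different normalization (for example resizing) would be an equally plausible reading of claim 6.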
CN202310801935.9A 2023-07-03 2023-07-03 Office document anomaly detection method and device and readable storage medium Pending CN116542224A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310801935.9A CN116542224A (en) 2023-07-03 2023-07-03 Office document anomaly detection method and device and readable storage medium

Publications (1)

Publication Number Publication Date
CN116542224A true CN116542224A (en) 2023-08-04

Family

ID=87447405

Country Status (1)

Country Link
CN (1) CN116542224A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160164901A1 (en) * 2014-12-05 2016-06-09 Permissionbit Methods and systems for encoding computer processes for malware detection
US20190213099A1 (en) * 2018-01-05 2019-07-11 NEC Laboratories Europe GmbH Methods and systems for machine-learning-based resource prediction for resource allocation and anomaly detection
CN111600788A (en) * 2020-04-30 2020-08-28 深信服科技股份有限公司 Method and device for detecting harpoon mails, electronic equipment and storage medium
CN112860484A (en) * 2021-01-29 2021-05-28 深信服科技股份有限公司 Container runtime abnormal behavior detection and model training method and related device
CN114005107A (en) * 2021-11-03 2022-02-01 深圳须弥云图空间科技有限公司 Document processing method and device, storage medium and electronic equipment
CN114510716A (en) * 2022-01-20 2022-05-17 上海斗象信息科技有限公司 Document detection method, model training method, device, terminal and storage medium
CN115730313A (en) * 2022-12-05 2023-03-03 北京天融信网络安全技术有限公司 Malicious document detection method and device, storage medium and equipment

Similar Documents

Publication Publication Date Title
EP3534284B1 (en) Classification of source data by neural network processing
CN111177095B (en) Log analysis method, device, computer equipment and storage medium
CN106709345B (en) Method, system and equipment for deducing malicious code rules based on deep learning method
EP3716111B1 (en) Computer-security violation detection using coordinate vectors
EP3534283A1 (en) Classification of source data by neural network processing
US10511617B2 (en) Method and system for detecting malicious code
CN111614599B (en) Webshell detection method and device based on artificial intelligence
CN111460446B (en) Malicious file detection method and device based on model
EP3323075A1 (en) Malware detection
CN110427755A (en) A kind of method and device identifying script file
CN109492118A (en) A kind of data detection method and detection device
US20220383157A1 (en) Interpretable machine learning for data at scale
KR20200030082A (en) Systems and methods for neural networks
CN114253866B (en) Malicious code detection method and device, computer equipment and readable storage medium
CN105468972B (en) A kind of mobile terminal document detection method
CN113360911A (en) Malicious code homologous analysis method and device, computer equipment and storage medium
CN116542224A (en) Office document anomaly detection method and device and readable storage medium
CN111177506A (en) Classification storage method and system based on big data
CN106547780A (en) Article reprints statistics of variables method and device
US20220272125A1 (en) Systems and methods for malicious url pattern detection
KR100918367B1 (en) Apparatus and method of context inference for ubiquitous context recognition
CN110929118B (en) Network data processing method, device, apparatus and medium
CN113888760A (en) Violation information monitoring method, device, equipment and medium based on software application
CN117454380B (en) Malicious software detection method, training method, device, equipment and medium
CN113315790B (en) Intrusion flow detection method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination