CN116305172A - OneNote document detection method, OneNote document detection device, OneNote document detection medium and OneNote document detection equipment - Google Patents


Info

Publication number
CN116305172A
Authority
CN
China
Prior art keywords
file
onenote
embedded
document
interval
Legal status
Granted
Application number
CN202310580571.6A
Other languages
Chinese (zh)
Other versions
CN116305172B (en)
Inventor
金志强
刘佳男
肖新光
Current Assignee
Beijing Antiy Network Technology Co Ltd
Original Assignee
Beijing Antiy Network Technology Co Ltd
Application filed by Beijing Antiy Network Technology Co Ltd
Priority to CN202310580571.6A
Publication of CN116305172A
Application granted
Publication of CN116305172B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection
    • G06F21/563Static detection by source code analysis
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to the field of malicious code detection, and in particular to a detection method, detection device, detection medium and detection equipment for OneNote documents. The method comprises: acquiring a target extraction file; and performing interval determination processing on a target globally unique identifier and the target extraction file to determine the extraction interval corresponding to each embedded file. Each embedded file is marked by a globally unique identifier that identifies it; the head of the embedded file begins at the twenty-first byte after this identifier, and the 8 bytes immediately following the identifier represent the size of the embedded file. Therefore, by scanning the OneNote document for the globally unique identifier, the code content in the extraction interval corresponding to each embedded file can be located accurately and quickly. Embedded files can thus be extracted and checked for anomalies more accurately, which improves the detection rate of malicious OneNote documents.

Description

OneNote document detection method, OneNote document detection device, OneNote document detection medium and OneNote document detection equipment
Technical Field
The invention relates to the field of malicious code detection, and in particular to a detection method, detection device, detection medium and detection equipment for OneNote documents.
Background
OneNote is a digital notebook introduced by Microsoft. It can be used to take notes, add annotations, and write or draw with digital ink, and multimedia such as video, as well as various other types of files, can be embedded in it. Because of these powerful note-taking features, OneNote documents are widely used in daily office work, production and everyday life.
Because various types of files can be embedded in OneNote documents, more and more attackers have begun to use them to spread malicious files, and a number of popular malicious code families have already been observed propagating through OneNote documents.
However, existing security products cannot automatically determine the code content corresponding to each embedded file in a OneNote document, so they cannot accurately perform anomaly detection on each embedded file, which results in a low detection rate for malicious OneNote documents.
Disclosure of Invention
To address the above technical problems, the invention adopts the following technical solution:
According to one aspect of the present invention, there is provided a method for detecting a OneNote document, the method comprising the following steps:
Obtaining a target extraction file of a OneNote document to be processed, the target extraction file being a hexadecimal file.
Performing interval determination processing on a target globally unique identifier and the target extraction file to determine the extraction interval corresponding to an embedded file, the extraction interval containing the code content corresponding to the embedded file.
The interval determination processing includes:
If the target globally unique identifier is identical to the byte content of a first byte interval, determining the twenty-first byte after the first byte interval in the target extraction file as the start byte of the extraction interval. The first byte interval is any byte interval in the target extraction file whose byte length equals that of the target globally unique identifier.
Determining the numeric value represented by the byte content of a second byte interval corresponding to the first byte interval as the interval length of the extraction interval.
Determining the extraction interval corresponding to the embedded file from the start byte and the interval length of the extraction interval.
According to a second aspect of the present invention, there is provided a detection apparatus for a OneNote document, comprising:
an acquisition module, configured to obtain a target extraction file of a OneNote document to be processed, the target extraction file being a hexadecimal file;
a processing module, configured to perform interval determination processing on a target globally unique identifier and the target extraction file to determine the extraction interval corresponding to an embedded file, the extraction interval containing the code content corresponding to the embedded file;
the processing module being specifically configured to:
if the target globally unique identifier is identical to the byte content of a first byte interval, determine the twenty-first byte after the first byte interval in the target extraction file as the start byte of the extraction interval, the first byte interval being any byte interval in the target extraction file whose byte length equals that of the target globally unique identifier;
determine the numeric value represented by the byte content of a second byte interval corresponding to the first byte interval as the interval length of the extraction interval;
and determine the extraction interval corresponding to the embedded file from the start byte and the interval length of the extraction interval.
According to a third aspect of the present invention, there is provided a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for detecting a OneNote document described above.
According to a fourth aspect of the present invention, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the method for detecting a OneNote document described above when executing the computer program.
The invention has at least the following beneficial effects:
In the hexadecimal file converted from a OneNote document, each embedded file is marked by a globally unique identifier that identifies it, namely the target globally unique identifier. The head position of the embedded file, i.e. the start byte, is the twenty-first byte after this identifier, and the 8 bytes immediately after the identifier represent the size of the embedded file, i.e. the interval length. Accordingly, the head position of the embedded file plus its size gives its tail position.
Therefore, by exploiting these characteristics of the hexadecimal file of a OneNote document, the location of each embedded file's code in the target extraction file can be determined directly.
That is, by scanning the OneNote document for the globally unique identifier, the code content of each embedded file can be located accurately and quickly, so that each embedded file can be extracted more accurately for subsequent anomaly detection, thereby improving the detection rate of such malicious OneNote documents.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for detecting OneNote documents according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
According to a first aspect of the present invention, as shown in FIG. 1, there is provided a method for detecting a OneNote document, the method comprising the following steps:
S100: Obtain a target extraction file of a OneNote document to be processed. The target extraction file is a hexadecimal file.
The target extraction file can be obtained by conversion with an existing conversion tool; it is the hexadecimal code corresponding to the OneNote document.
S200: Perform interval determination processing on the target globally unique identifier and the target extraction file to determine the extraction interval corresponding to the embedded file. The extraction interval contains the code content corresponding to the embedded file.
Each embedded file in a OneNote document is preceded by a target globally unique identifier that serves as its embedding marker. This identifier has the same value for every embedded file; in hexadecimal encoding it may be {E7-16-E3-BD-65-26-11-45-A4-C4-8D-4D-0B-7A-9E-AC}. The target extraction file of the OneNote document to be processed can therefore be scanned for matches against the target globally unique identifier, and each successful match identifies one embedded file.
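The marker bytes quoted in S200 can be checked with a short Python sketch. Assuming the hexadecimal file is simply the OneNote document's raw bytes written out in hex, the sixteen bytes read as a little-endian Windows GUID give {BDE316E7-2665-4511-A4C4-8D4D0B7A9EAC}, which appears to match the guidHeader value that the [MS-ONESTORE] specification defines for file data store objects. The names DUMP_HEX, dump_to_bytes and MARKER are illustrative, not from the patent.

```python
import uuid

# Target GUID bytes exactly as quoted in the hexadecimal dump (S200).
DUMP_HEX = "E7-16-E3-BD-65-26-11-45-A4-C4-8D-4D-0B-7A-9E-AC"


def dump_to_bytes(dump: str) -> bytes:
    """Convert a dash-separated hex dump into raw bytes."""
    return bytes.fromhex(dump.replace("-", ""))


MARKER = dump_to_bytes(DUMP_HEX)

# Interpret the 16 bytes as a GUID whose first three fields are little-endian,
# which is how Windows stores GUIDs on disk.
marker_guid = uuid.UUID(bytes_le=MARKER)

if __name__ == "__main__":
    print("{" + str(marker_guid).upper() + "}")
```

Running it prints {BDE316E7-2665-4511-A4C4-8D4D0B7A9EAC}, which is why the dump bytes do not read in the same order as the textual GUID.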
The interval determination processing includes:
S201: If the target globally unique identifier is identical to the byte content of the first byte interval, determine the twenty-first byte after the first byte interval in the target extraction file as the start byte of the extraction interval. The first byte interval is any byte interval in the target extraction file whose byte length equals that of the target globally unique identifier.
In other words, the first byte interval is the encoding interval in which each occurrence of the target globally unique identifier is located in the target extraction file.
S202: Determine the numeric value represented by the byte content of the second byte interval corresponding to the first byte interval as the interval length of the extraction interval. The second byte interval consists of the eight consecutive bytes immediately after the first byte interval.
S203: Determine the extraction interval corresponding to the embedded file from the start byte and the interval length of the extraction interval.
Take the following partial code in a target extraction file as an example:
{ ……E7-16-E3-BD-65-26-11-45-A4-C4-8D-4D-0B-7A-9E-AC
1C-05-00-00-00-00-00-00-00-00-00-00-00-00-00-00
00-00-00-00-0D-70-6F……}
Here, E7-16-E3-BD-65-26-11-45-A4-C4-8D-4D-0B-7A-9E-AC in the first row is the first byte interval, and 0D is the start byte of the extraction interval (the twenty-first byte after the first byte interval); 1C-05-00-00-00-00-00-00 is the content of the second byte interval, which, read as a little-endian integer, represents a size of 0x51C (1308 bytes). The encoding region of the corresponding embedded file, i.e. the extraction interval, can thus be determined.
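Steps S201 to S203 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it works on raw bytes (a dash-separated hex dump such as the one above would first be converted with bytes.fromhex), it reads the second byte interval as a little-endian unsigned 64-bit integer as in the example, and the function names are hypothetical. The 20-byte offset appears consistent with the [MS-ONESTORE] file data store object layout, in which an 8-byte size field is followed by 12 further bytes (4 unused, 8 reserved) before the file data.

```python
import struct
from typing import List, Tuple

# Target globally unique identifier as raw bytes (the first byte interval).
TARGET_GUID = bytes.fromhex("E716E3BD65261145A4C48D4D0B7A9EAC")
SIZE_FIELD_LEN = 8   # second byte interval: 8 bytes right after the GUID
DATA_OFFSET = 20     # file data starts at the 21st byte after the GUID


def find_extraction_intervals(data: bytes) -> List[Tuple[int, int]]:
    """Return (start byte, interval length) for each GUID match (S201-S203)."""
    intervals = []
    pos = data.find(TARGET_GUID)
    while pos != -1:
        guid_end = pos + len(TARGET_GUID)
        size_field = data[guid_end:guid_end + SIZE_FIELD_LEN]
        if len(size_field) < SIZE_FIELD_LEN:
            break  # truncated input: no complete size field after the GUID
        # Second byte interval read as a little-endian unsigned 64-bit integer.
        (length,) = struct.unpack("<Q", size_field)
        intervals.append((guid_end + DATA_OFFSET, length))
        # Continue scanning right after this GUID for the next embedded file.
        pos = data.find(TARGET_GUID, guid_end)
    return intervals


def extract_embedded_files(data: bytes) -> List[bytes]:
    """Slice out the code content of every extraction interval (S300)."""
    return [data[start:start + length]
            for start, length in find_extraction_intervals(data)]


if __name__ == "__main__":
    # Bytes from the example above, truncated after 0D-70-6F.
    example = (TARGET_GUID + bytes.fromhex("1C05000000000000")
               + bytes(12) + bytes.fromhex("0D706F"))
    print(find_extraction_intervals(example))  # [(36, 1308)]
```

Resuming the search right after each marker is the simplest choice; a variant could resume at start + length instead, to avoid matching marker bytes that happen to occur inside an embedded file's own content, at the cost of skipping ahead blindly if a size field is corrupt.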
In the hexadecimal file converted from a OneNote document, each embedded file is marked by a globally unique identifier that identifies it, namely the target globally unique identifier. The head position of the embedded file, i.e. the start byte, is the twenty-first byte after this identifier, and the 8 bytes immediately after the identifier represent the size of the embedded file, i.e. the interval length. Accordingly, the head position of the embedded file plus its size gives its tail position.
Therefore, by exploiting these characteristics of the hexadecimal file of a OneNote document, the location of each embedded file's code in the target extraction file can be determined directly. This embodiment can automatically extract and detect the embedded files in large numbers of OneNote documents, and it is highly general: all kinds of embedded files in a OneNote document can be detected.
That is, by scanning the OneNote document for the globally unique identifier, the code content of each embedded file can be located accurately and quickly, so that each embedded file can be extracted more accurately for subsequent anomaly detection, thereby improving the detection rate of such malicious OneNote documents.
As a possible embodiment of the invention, the method further comprises:
S300: Acquire the code content corresponding to the extraction interval.
S400: Perform anomaly detection on the code content corresponding to the extraction interval. The anomaly detection determines whether that code content is code used by an unauthorized user to launch a security attack.
Preferably, the anomaly detection includes a first anomaly detection and a second anomaly detection, the second being more accurate than the first.
S400, performing anomaly detection on the code content corresponding to the extraction interval, includes:
S401: Perform the first anomaly detection on the code content corresponding to the extraction interval.
The first anomaly detection compares the code content corresponding to the extraction interval against a malicious code feature library.
S402: If no anomaly is detected, perform the second anomaly detection on the code content corresponding to the extraction interval.
Malicious embedded files identified by the second anomaly detection are analyzed manually, malicious code features are extracted from them, and the new features are added to the malicious code feature library.
Anomaly detection on the code content corresponding to the extraction interval can use existing anomaly detection methods to determine whether the OneNote document to be processed carries a malicious file. To balance detection efficiency and accuracy, the first anomaly detection is faster than the second, while the second is more accurate than the first. The second anomaly detection may be an existing, more accurate method, such as manual analysis.
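The two-stage detection of S401 and S402 can be sketched as below. Assumptions: the malicious code feature library is modelled as a set of byte signatures matched by plain substring search, and second_stage is a hypothetical callback standing in for the slower, more accurate detection (including the manual analysis) that returns a newly extracted feature or None.

```python
from typing import Callable, Iterable, Optional, Set


def first_anomaly_detection(content: bytes,
                            signatures: Iterable[bytes]) -> Optional[bytes]:
    """Fast stage (S401): return the first library signature found, else None."""
    for signature in signatures:
        if signature in content:
            return signature
    return None


def detect(content: bytes, signatures: Set[bytes],
           second_stage: Callable[[bytes], Optional[bytes]]) -> bool:
    """Return True if the extracted code content is judged malicious.

    second_stage is a hypothetical slower, more accurate detector (S402). It
    returns a newly extracted malicious code feature, or None if clean.
    """
    if first_anomaly_detection(content, signatures) is not None:
        return True
    new_feature = second_stage(content)
    if new_feature is None:
        return False
    # Record the new feature so later scans catch it in the fast stage.
    signatures.add(new_feature)
    return True
```

Feeding features found by the slow stage back into the library is what lets the fast stage absorb more of the workload over time, matching the feedback loop described above.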
As a possible embodiment of the present invention, before S100 (obtaining the target extraction file of the OneNote document to be processed), the method further comprises:
S110: Acquire the target extraction files of a plurality of initial detection documents, which include at least one OneNote document.
S120: Match a target file identifier against the header bytes of the target extraction file of each initial detection document. Preferably, the target file identifier is sixteen bytes long.
S130: If the match succeeds, determine that the initial detection document is a OneNote document to be processed.
The hexadecimal code of a OneNote document begins with a specific 16-byte header: {E4-52-5C-7B-8C-D8-A7-4D-AE-B1-53-78-D0-29-96-D3}. OneNote documents can therefore be screened out of a large number of files by this feature alone, without scanning their entire content, so they are identified more quickly and subsequent detection can proceed sooner.
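The screening in S110 to S130 needs only the first 16 bytes of each file. A minimal Python sketch, assuming the candidate documents are files on disk; screen_documents and is_onenote are illustrative names, not from the patent.

```python
from typing import Iterable, List

# 16-byte header quoted in the text for OneNote documents (S120).
ONENOTE_HEADER = bytes.fromhex("E4525C7B8CD8A74DAEB15378D02996D3")


def is_onenote(header: bytes) -> bool:
    """S120/S130: the first 16 bytes must equal the target file identifier."""
    return header[:len(ONENOTE_HEADER)] == ONENOTE_HEADER


def screen_documents(paths: Iterable[str]) -> List[str]:
    """S110-S130: keep only the files whose header marks them as OneNote."""
    selected = []
    for path in paths:
        with open(path, "rb") as f:
            # Read only the header instead of scanning the whole file.
            if is_onenote(f.read(len(ONENOTE_HEADER))):
                selected.append(path)
    return selected
```

Files shorter than 16 bytes simply fail the comparison, so no separate length check is needed.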
As a possible embodiment of the present invention, the OneNote document includes a plurality of embedded files, each of which is an image file or an executable file. An image file may be a still picture or an animated picture. An executable file is one that performs operations according to preset code when it is clicked and opened, such as an executable program, an executable web page file or an executable script file; an executable web page file may be in a format such as html or htm, and an executable script file may be in a format such as VBS, BAT or CMD.
After S200 (determining the extraction interval corresponding to the embedded file), the method further comprises:
S500: Determine the file type of each embedded file included in the OneNote document from the code in the header section of the extraction interval corresponding to that embedded file. The file types include an image type and an executable file type.
Specifically, the encoding of the header section is fixed for image files of the same format, and likewise the same for executable files of the same format. The header-section codes of many embedded files can therefore be collected from historical data, and the file type of each embedded file is determined by matching against them.
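S500 classifies each embedded file by the bytes at the start of its extraction interval. The table below is a hypothetical, deliberately small stand-in for the header codes the text collects from historical data; it uses well-known signatures (PNG, JPEG, GIF, the MZ header of Windows executables, and the header of Windows .lnk shortcuts). Text formats such as html, VBS, BAT and CMD have no fixed header bytes and would need a different rule, so they fall through to "unknown" here.

```python
from typing import Dict, Iterable

# Hypothetical header table; the text collects these codes from historical data.
HEADER_TABLE = [
    (b"\x89PNG\r\n\x1a\n", "image"),                      # PNG
    (b"\xff\xd8\xff", "image"),                           # JPEG
    (b"GIF87a", "image"),                                 # GIF
    (b"GIF89a", "image"),                                 # GIF
    (b"MZ", "executable"),                                # Windows PE executable
    (b"\x4c\x00\x00\x00\x01\x14\x02\x00", "executable"),  # Windows .lnk shortcut
]


def classify_embedded_file(content: bytes) -> str:
    """S500: return "image", "executable" or "unknown" from the header bytes."""
    for prefix, file_type in HEADER_TABLE:
        if content.startswith(prefix):
            return file_type
    return "unknown"


def count_types(contents: Iterable[bytes]) -> Dict[str, int]:
    """Count embedded files per type, giving P_img and P_exe for S600."""
    counts = {"image": 0, "executable": 0, "unknown": 0}
    for content in contents:
        counts[classify_embedded_file(content)] += 1
    return counts
```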
S600: When $P_{exe}/P_{img} > Y_1$, perform confidence processing on the OneNote document to generate its confidence D, where $P_{exe}$ is the total number of executable-type embedded files in the OneNote document, $P_{img}$ is the total number of image-type embedded files in the OneNote document, and $Y_1$ is a first confidence threshold.
To increase the probability that a user runs a malicious file, an attacker typically places several start icons of malicious files beneath a picture that prompts the user to click. The start icons may be launch shortcuts, and the picture may be an image of a double-click button. When the user double-clicks the picture as it instructs, the launch shortcut beneath it is activated and the malicious software starts running.
Because of this pattern, $P_{exe}/P_{img}$ in such a malicious OneNote document is raised above its value in a normal OneNote document. $Y_1$ and $Y_2$ can be determined according to the specific usage scenario.
However, although $P_{exe}/P_{img}$ indicates a certain degree of anomaly, its accuracy is low, and a normal OneNote document is easily misjudged as abnormal. A more accurate confidence processing is therefore required to further determine whether the OneNote document is abnormal.
S700: When $D < Y_2$, determine that the OneNote document is an abnormal document, where $Y_2$ is a second confidence threshold.
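The decision logic of S600 and S700 can be sketched as follows. Because the formula for D is given only as an image, the confidence is passed in as a callable, which also means D is computed only when the ratio test triggers. How to treat a document with no image-type embedded files ($P_{img} = 0$) is not stated in the text; returning False in that case is an assumption of this sketch.

```python
from typing import Callable


def is_abnormal(p_exe: int, p_img: int, y1: float, y2: float,
                confidence: Callable[[], float]) -> bool:
    """S600/S700: flag the document when P_exe/P_img > Y1 and then D < Y2."""
    if p_img == 0:
        # Assumption: with no image-type embedded files the ratio test is not
        # applied (the text does not cover this case).
        return False
    if p_exe / p_img <= y1:
        return False          # ratio not raised: no confidence processing
    return confidence() < y2  # D is computed only when the ratio test fires
```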
The confidence processing comprises the following steps:
S601: Acquire the image attribute value of each image-type embedded file in the OneNote document from the jcidImageNode structure of the OneNote document, where $A_i$ is the image attribute value of the i-th image-type embedded file and $A_i = \{(X_i^P, Y_i^P), W_i^P, H_i^P\}$. Here $(X_i^P, Y_i^P)$ are the position coordinates, in the OneNote document, of the upper-left corner of the picture corresponding to the i-th image-type embedded file, and $W_i^P$ and $H_i^P$ are the width and height of that picture.
S602: Acquire the icon attribute value of each executable-type embedded file in the OneNote document from the jcidEmbeddedFileNode structure of the OneNote document, where $B_n$ is the icon attribute value of the n-th executable-type embedded file and $B_n = \{(X_n^b, Y_n^b), W_n^b, H_n^b\}$. Here $(X_n^b, Y_n^b)$ are the position coordinates, in the OneNote document, of the upper-left corner of the start icon corresponding to the n-th executable-type embedded file, and $W_n^b$ and $H_n^b$ are the width and height of that start icon.
The jcidImageNode structure of a OneNote document specifies the attribute values of all image nodes in the document, including the width and height of each image, its horizontal and vertical position (position coordinates), the text contained in the image, and so on.
Correspondingly, the jcidEmbeddedFileNode structure specifies the attribute values of all executable files embedded in the document, including the width and height of each file's start icon, the file's horizontal and vertical position (position coordinates), its source file name (including the suffix), and so on.
Specifically, a page coordinate system can be established for each page of the OneNote document, with the lower-left corner of the page as the origin, the bottom edge of the page as the X axis and the left edge of the page as the Y axis. The coverage area values of embedded pictures and start icons can then be determined accurately.
S603: Generate the coverage area value of each image-type embedded file from its image attribute value, where $C_i$ is the coverage area value of the i-th image-type embedded file and $C_i = \{[X_i^P, X_i^P + W_i^P], [Y_i^P - H_i^P, Y_i^P]\}$.
S604: Generate the coverage area value of each executable-type embedded file from its icon attribute value, where $D_n$ is the coverage area value of the n-th executable-type embedded file and $D_n = \{[X_n^b, X_n^b + W_n^b], [Y_n^b - H_n^b, Y_n^b]\}$.
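S603 and S604 map an attribute value {(X, Y), W, H} to a pair of intervals. A small Python sketch, assuming the page coordinate system described above (origin at the lower-left corner, Y increasing upward, so the top edge of a picture or icon is at Y); Box and coverage_area are hypothetical names.

```python
from typing import NamedTuple, Tuple

Interval = Tuple[float, float]


class Box(NamedTuple):
    """Attribute value {(X, Y), W, H}: upper-left corner, width and height."""
    x: float
    y: float
    w: float
    h: float


def coverage_area(box: Box) -> Tuple[Interval, Interval]:
    """S603/S604: return ([X, X + W], [Y - H, Y]).

    The page origin is the lower-left corner and Y grows upward, so the
    upper-left corner has the largest Y value of the box.
    """
    return (box.x, box.x + box.w), (box.y - box.h, box.y)
```

For a picture whose upper-left corner is at (10, 100) with width 50 and height 40, coverage_area returns ((10, 60), (60, 100)).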
S605: From the coverage area values of the image-type embedded files and of the executable-type embedded files, determine the executable file coverage numbers $E_1, E_2, \ldots, E_i, \ldots, E_z$ corresponding to the image-type embedded files, where $E_i$ is the executable file coverage number of the i-th image-type embedded file, z is the total number of image-type embedded files in the OneNote document, and $i = 1, 2, \ldots, z$.
Specifically, when $[X_n^b, X_n^b + W_n^b] \subseteq [X_i^P, X_i^P + W_i^P]$ and $[Y_n^b - H_n^b, Y_n^b] \subseteq [Y_i^P - H_i^P, Y_i^P]$, the start icon of the n-th executable-type embedded file is covered beneath the i-th image-type embedded file.
In this way, the number of start icons covered beneath each picture, i.e. its executable file coverage number, can be determined.
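The containment test of S605 and the resulting coverage numbers $E_i$ can be sketched as follows, with each coverage area given as ((x_min, x_max), (y_min, y_max)) as produced in S603 and S604. Treating boundaries as inclusive, so an icon touching a picture's edge still counts, is an assumption; the function names are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
Area = Tuple[Interval, Interval]   # ((x_min, x_max), (y_min, y_max))


def contains(outer: Interval, inner: Interval) -> bool:
    """True if the inner interval lies entirely within the outer one."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def coverage_numbers(pictures: List[Area], icons: List[Area]) -> List[int]:
    """E_i: number of start icons whose whole area lies inside picture i."""
    counts = []
    for pic_x, pic_y in pictures:
        counts.append(sum(1 for icon_x, icon_y in icons
                          if contains(pic_x, icon_x) and contains(pic_y, icon_y)))
    return counts
```

An icon that only partly overlaps a picture is not counted, which mirrors the requirement that the start icon's area lie completely within the image area.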
S606: Generate the confidence D of the OneNote document from $E_1, E_2, \ldots, E_i, \ldots, E_z$. D satisfies the following condition:
Figure SMS_1
where K is a base coefficient and a non-zero natural number; adding K prevents the denominator of the formula from being 0.
In this embodiment, the area covered by each image and the area covered by each executable file's start icon are obtained from the attribute values of the embedded files. Whether an executable file is hidden beneath an image is judged by whether the area covered by its start icon lies entirely within the image's area. In this way, whether the OneNote document has malicious features can be judged accurately.
As a possible embodiment of the invention, the image attribute value further includes the text information in the image-type embedded file.
After S605 (determining the executable file coverage number corresponding to each image-type embedded file), the confidence processing further includes:
S615: Match the text information in each image-type embedded file against a first mapping word list to generate the first confidence added values $F_1, F_2, \ldots, F_i, \ldots, F_z$ of the image-type embedded files, where $F_i$ is the first confidence added value of the i-th image-type embedded file. The first mapping word list is a preset list mapping text information to first weight values.
S625: Replace "generating the confidence D of the OneNote document from $E_1, E_2, \ldots, E_i, \ldots, E_z$" with: generating the confidence D of the OneNote document from the executable file coverage number and the first confidence added value of each image-type embedded file. D satisfies the following condition:
Figure SMS_2
To further increase the probability that a user runs a malicious file, an attacker typically adds text that prompts the user to click to the picture covering the malicious start icons, for example inducements such as "double click" or "Double Click here to view".
In this embodiment, whether the picture contains sensitive wording that induces the user to act is also taken into account in the confidence calculation, which further improves the accuracy of the confidence and ultimately the accuracy of determining whether the OneNote document is abnormal.
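S615 does not say how several matches in one picture are combined. The sketch below assumes a case-insensitive substring match and sums the weights of all phrases found; the function name and the word list in the usage note are invented for illustration.

```python
from typing import Dict, List, Optional, Sequence


def first_confidence_added_values(texts: Sequence[Optional[str]],
                                  word_weights: Dict[str, float]) -> List[float]:
    """S615: F_i for each picture from the first mapping word list.

    Assumptions: matching is case-insensitive substring search, and the
    weights of all phrases found in one picture's text are summed.
    """
    values = []
    for text in texts:
        lowered = (text or "").lower()  # pictures without text get 0.0
        total = sum((weight for phrase, weight in word_weights.items()
                     if phrase.lower() in lowered), 0.0)
        values.append(total)
    return values
```

For example, with the hypothetical list {"double click": 1.0, "click here": 0.5}, the text "Double Click here to view" matches both phrases and receives 1.5, while a picture with no inducing text receives 0.0.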
As a possible embodiment of the present invention, the icon attribute value further includes the suffix name of the executable-type embedded file.
After S615 (generating the first confidence added value of each image-type embedded file), the confidence processing further includes:
S635: Match the suffix name of each executable-type embedded file against a second mapping word list to generate the second confidence added values $G_1, G_2, \ldots, G_n, \ldots, G_Q$ of the executable-type embedded files, where $G_n$ is the second confidence added value of the n-th executable-type embedded file, Q is the total number of executable-type embedded files in the OneNote document, and $n = 1, 2, \ldots, Q$. The second mapping word list is a preset list mapping suffix names to second weight values.
S645: Replace "generating the confidence D of the OneNote document from the executable file coverage number and the first confidence added value of each image-type embedded file" with: generating the confidence D of the OneNote document from the executable file coverage number and the first confidence added value of each image-type embedded file and the second confidence added value of each executable-type embedded file. D satisfies the following condition:
Figure SMS_3
The icon attribute value also includes the suffix name of the executable-type embedded file, and the suffix name reflects the specific type of the executable file. This type information can therefore be matched against the types of known malicious files, and a corresponding weight value, i.e. the second confidence added value, is assigned according to the degree of match to represent how malicious the executable file is.
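S635 can be sketched as a lookup of the lowercased suffix in the second mapping word list. The function name and the weights in the usage note are invented placeholders, since the patent gives no values; a name without a suffix receiving a default weight of 0.0 is also an assumption.

```python
import os
from typing import Dict, Iterable, List


def second_confidence_added_values(file_names: Iterable[str],
                                   suffix_weights: Dict[str, float],
                                   default: float = 0.0) -> List[float]:
    """S635: G_n for each executable-type embedded file, looked up by suffix."""
    values = []
    for name in file_names:
        # Only the last suffix counts, e.g. "invoice.pdf.vbs" -> "vbs".
        suffix = os.path.splitext(name)[1].lstrip(".").lower()
        values.append(suffix_weights.get(suffix, default))
    return values
```

For instance, with {"vbs": 0.9, "cmd": 0.8}, the names "invoice.pdf.vbs", "run.CMD" and "notes" give [0.9, 0.8, 0.0]. Taking only the last suffix means a double extension such as .pdf.vbs is judged by its real, executable suffix rather than the decoy one.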
In this embodiment, the confidence calculation takes into account not only whether the picture contains sensitive wording that induces the user to act, but also the suffix name of the executable file. This further improves the accuracy of the confidence and ultimately the accuracy of determining whether the OneNote document is abnormal.
As a second aspect of the present invention, there is provided a detection apparatus for a OneNote document, comprising:
an acquisition module, configured to obtain a target extraction file of a OneNote document to be processed, the target extraction file being a hexadecimal file;
a processing module, configured to perform interval determination processing on a target globally unique identifier and the target extraction file to determine an extraction interval corresponding to an embedded file, the extraction interval containing the code content corresponding to the embedded file;
the processing module is specifically configured to:
if the target globally unique identifier is identical to the byte content of a first byte interval, determine the twenty-first byte after the first byte interval in the target extraction file as the starting byte of the extraction interval, where the first byte interval is any byte interval in the target extraction file whose byte length equals that of the target globally unique identifier;
determine the value represented by the byte content of a second byte interval corresponding to the first byte interval as the interval length of the extraction interval;
and determine the extraction interval corresponding to the embedded file according to the starting byte and the interval length of the extraction interval.
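The interval determination can be sketched in Python as below. The text does not name the target globally unique identifier or say where the second byte interval sits; this sketch assumes the FileDataStoreObject layout from Microsoft's [MS-ONESTORE] specification, in which the 16-byte header GUID {BDE316E7-2665-4511-A4C4-8D4D0B7A9EAC} is followed by an 8-byte little-endian length, 4 unused bytes, 8 reserved bytes, and then the file data. Under that assumption the data begins at the twenty-first byte after the GUID, which is consistent with the description. `find_extraction_intervals` is a hypothetical name.

```python
import struct
import uuid

# Assumption: the target GUID is the FileDataStoreObject header GUID from
# [MS-ONESTORE], in the little-endian byte order GUIDs use on disk.
TARGET_GUID = uuid.UUID("BDE316E7-2665-4511-A4C4-8D4D0B7A9EAC").bytes_le

# Assumed layout after the GUID: 8-byte length + 4 unused + 8 reserved bytes,
# so the embedded data starts at the 21st byte after the GUID (offset 20).
SKIP_AFTER_GUID = 20


def find_extraction_intervals(data: bytes) -> list[tuple[int, int]]:
    """Return (start_byte, interval_length) for each GUID match in data."""
    intervals = []
    pos = data.find(TARGET_GUID)
    while pos != -1:
        guid_end = pos + len(TARGET_GUID)
        if guid_end + SKIP_AFTER_GUID <= len(data):
            # Second byte interval (assumed): little-endian unsigned 64-bit length.
            (length,) = struct.unpack_from("<Q", data, guid_end)
            start = guid_end + SKIP_AFTER_GUID
            # Keep only intervals that fit inside the file.
            if start + length <= len(data):
                intervals.append((start, length))
        # Any byte interval may match, so resume the search one byte later.
        pos = data.find(TARGET_GUID, pos + 1)
    return intervals
```

Because the target extraction file is described as hexadecimal, a hex dump would first be decoded with `bytes.fromhex`, after which each embedded file is `data[start:start + length]`. Resuming the search one byte after each match follows the text's "any byte interval", at the cost of possibly matching the GUID inside embedded data.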
As a third aspect of the present invention, there is also provided a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the OneNote document detection method of any one of the above embodiments.
As a fourth aspect of the present invention, there is also provided an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the OneNote document detection method of any one of the above embodiments when executing the computer program.
Embodiments of the present invention also provide a computer program product comprising program code for causing an electronic device to carry out the steps of the method according to the various exemplary embodiments of the invention described in the present specification when the program product is run on the electronic device.
Furthermore, although the steps of the methods in the present disclosure are depicted in a particular order in the drawings, this does not require or imply that the steps must be performed in that particular order or that all illustrated steps be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a mobile terminal, or a network device) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may collectively be referred to herein as a "circuit," "module," or "system."
An electronic device according to this embodiment of the invention is described below. The electronic device is merely an example and should not impose any limitation on the functionality or scope of use of embodiments of the present invention.
The electronic device takes the form of a general-purpose computing device. Its components may include, but are not limited to, at least one processor, at least one memory, and a bus connecting the various system components, including the memory and the processor.
The memory stores program code executable by the processor, causing the processor to perform the steps according to the various exemplary embodiments of the present invention described in the "exemplary method" section of this specification.
The memory may include readable media in the form of volatile memory, such as random access memory (RAM) and/or cache memory, and may further include read-only memory (ROM).
The memory may also include a program/utility having a set of (at least one) program modules, including but not limited to an operating system, one or more application programs, other program modules, and program data; each of these, or some combination of them, may include an implementation of a network environment.
The bus may represent one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
The electronic device may also communicate with one or more external devices (e.g., a keyboard, a pointing device, a Bluetooth device), with one or more devices that enable a user to interact with the electronic device, and/or with any device (e.g., a router, a modem) that enables the electronic device to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface. The electronic device may also communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through a network adapter, which communicates with the other modules of the electronic device via the bus. It should be appreciated that, although not shown, other hardware and/or software modules may be used in connection with the electronic device, including but not limited to microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible embodiments, the aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the "exemplary method" section of this specification, when the program product is run on the terminal device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
Furthermore, the above-described drawings are only schematic illustrations of processes included in the method according to the exemplary embodiment of the present invention, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
The present invention is not limited to the above embodiments; any change or substitution that can readily be conceived by those skilled in the art within the technical scope of the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the invention shall be defined by the claims.

Claims (10)

1. A method for detecting OneNote documents, the method comprising the steps of:
acquiring a target extraction file of a OneNote document to be processed, the target extraction file being a hexadecimal file;
performing interval determination processing on a target globally unique identifier and the target extraction file to determine an extraction interval corresponding to an embedded file, the extraction interval containing the code content corresponding to the embedded file;
wherein the interval determination processing comprises:
if the target globally unique identifier is identical to the byte content of a first byte interval, determining the twenty-first byte after the first byte interval in the target extraction file as the starting byte of the extraction interval, the first byte interval being any byte interval in the target extraction file whose byte length equals that of the target globally unique identifier;
determining the value represented by the byte content of a second byte interval corresponding to the first byte interval as the interval length of the extraction interval;
and determining the extraction interval corresponding to the embedded file according to the starting byte and the interval length of the extraction interval.
2. The method of claim 1, wherein after determining the extraction interval corresponding to the embedded file, the method further comprises:
acquiring code content corresponding to the extraction interval;
performing anomaly detection on the code content corresponding to the extraction interval;
wherein the anomaly detection is used to determine whether the code content corresponding to the extraction interval is code used by an unauthorized user to launch a security attack.
3. The method of claim 2, wherein the anomaly detection comprises a first anomaly detection and a second anomaly detection, the second anomaly detection having a greater accuracy than the first anomaly detection;
performing anomaly detection on the code content corresponding to the extraction interval, including:
performing first anomaly detection on code content corresponding to the extraction interval;
if no anomaly is detected, performing the second anomaly detection on the code content corresponding to the extraction interval;
the first anomaly detection comprises comparison detection of code content corresponding to the extraction interval by using a malicious code feature library.
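Claims 2 and 3 describe a cascade: a comparison against a malicious code feature library first, and a more accurate second detection only when the first finds nothing. A minimal Python sketch, assuming the feature library is a collection of byte signatures matched by substring search and that the second detection is supplied by the caller (the text does not specify what it is). `detect_anomaly` and its parameters are hypothetical.

```python
from typing import Callable, Iterable


def detect_anomaly(code: bytes,
                   feature_library: Iterable[bytes],
                   second_detection: Callable[[bytes], bool]) -> bool:
    """Return True if the extracted code content is judged anomalous."""
    # First anomaly detection: compare against the malicious code feature library.
    # Assumption: each feature is a byte signature matched as a substring;
    # empty signatures are ignored.
    if any(feature and feature in code for feature in feature_library):
        return True
    # Second, more accurate detection runs only when the first one found nothing.
    return second_detection(code)
```

If the second stage is also the costlier one, this ordering limits how often it runs, since it only sees samples the signature comparison let through.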
4. The method of claim 1, wherein, before acquiring the target extraction file of the OneNote document to be processed, the method further comprises:
acquiring target extraction files of a plurality of initial detection documents, the plurality of initial detection documents including at least one OneNote document;
matching a target file identifier against the header bytes of the target extraction file of each initial detection document;
if the matching is successful, determining the initial detection document as an OneNote document to be processed;
the target file identifier has a length of sixteen bytes.
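Claim 4's pre-filter can be sketched as a comparison of the first sixteen bytes of each file. The text does not give the target file identifier; this sketch assumes it is the `guidFileType` value {7B5C52E4-D88C-4DA7-AEB1-5378D02996D3} that [MS-ONESTORE] places at the start of a .one section file, stored in little-endian GUID byte order. `select_onenote_documents` is a hypothetical name.

```python
import uuid

# Assumption: the 16-byte target file identifier is the .one section file's
# guidFileType from [MS-ONESTORE], in the little-endian byte order GUIDs use on disk.
TARGET_FILE_ID = uuid.UUID("7B5C52E4-D88C-4DA7-AEB1-5378D02996D3").bytes_le


def select_onenote_documents(documents: dict[str, bytes]) -> list[str]:
    """Return the names of the initial detection documents whose header bytes match."""
    return [name for name, data in documents.items()
            if data[:len(TARGET_FILE_ID)] == TARGET_FILE_ID]
```

Files shorter than sixteen bytes simply fail the comparison, because the slice is then shorter than the identifier.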
5. The method of claim 1, wherein the OneNote document comprises a plurality of embedded files, the embedded files being image files or executable files; the executable file is an executable program, an executable webpage file or an executable script file;
after determining the extraction interval corresponding to the embedded file, the method further comprises:
determining the file type of each embedded file included in the OneNote document according to the code in the header section of the extraction interval corresponding to that embedded file, the file types including an image type and an executable file type;
when $P_{exe}/P_{img} > Y_1$, performing confidence processing on the OneNote document to generate a confidence D of the OneNote document, where $P_{exe}$ is the total number of embedded files of the executable file type in the OneNote document, $P_{img}$ is the total number of embedded files of the image type in the OneNote document, and $Y_1$ is a first confidence threshold;
when $D < Y_2$, determining the OneNote document to be an abnormal document, where $Y_2$ is a second confidence threshold;
the confidence processing includes:
acquiring the image attribute value of each image-type embedded file in the OneNote document from the jcidImageNode structure corresponding to the OneNote document, where $A_i$ is the image attribute value of the embedded file of the $i$-th image type, $A_i = \{(X_i^P, Y_i^P), W_i^P, H_i^P\}$; $(X_i^P, Y_i^P)$ are the coordinates, in the OneNote document, of the upper-left corner of the picture corresponding to the embedded file of the $i$-th image type; $W_i^P$ and $H_i^P$ are the width and height of that picture; $z$ is the total number of image-type embedded files included in the OneNote document; and $i = 1, 2, \ldots, z$;
obtaining the icon attribute value of each executable-type embedded file in the OneNote document from the jcidEmbeddFilenode structure corresponding to the OneNote document, where $B_n$ is the icon attribute value of the embedded file of the $n$-th executable file type, $B_n = \{(X_n^b, Y_n^b), W_n^b, H_n^b\}$; $(X_n^b, Y_n^b)$ are the coordinates, in the OneNote document, of the upper-left corner of the launch icon corresponding to the embedded file of the $n$-th executable file type; $W_n^b$ and $H_n^b$ are the width and height of that launch icon; $Q$ is the total number of executable-type embedded files included in the OneNote document; and $n = 1, 2, \ldots, Q$;
generating the coverage area value of the embedded file of each image type according to its image attribute value, where $C_i$ is the coverage area value of the embedded file of the $i$-th image type and $C_i = \{[X_i^P, X_i^P + W_i^P], [Y_i^P - H_i^P, Y_i^P]\}$;
generating the coverage area value of the embedded file of each executable file type according to its icon attribute value, where $D_n$ is the coverage area value of the embedded file of the $n$-th executable file type and $D_n = \{[X_n^b, X_n^b + W_n^b], [Y_n^b - H_n^b, Y_n^b]\}$;
determining, according to the coverage area value of the embedded file of each image type and the coverage area value of the embedded file of each executable file type, the executable file coverage numbers $E_1, E_2, \ldots, E_i, \ldots, E_z$ corresponding to the embedded files of the image type, where $E_i$ is the executable file coverage number corresponding to the embedded file of the $i$-th image type;
generating the confidence D of the OneNote document according to $E_1, E_2, \ldots, E_i, \ldots, E_z$, where D satisfies the following condition:
[Equation image (QLYQS_1) giving D; not reproduced in this text version]
wherein K is the base coefficient.
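The coverage areas $C_i$ and $D_n$, the coverage numbers $E_i$, and the $P_{exe}/P_{img} > Y_1$ gate can be sketched in Python as follows. Two points are assumptions, since the claim does not settle them: an icon counts toward $E_i$ when its area overlaps the picture's area with positive extent (full containment would be a stricter reading), and the gate is treated as not met when there are no image-type files. Because the formula for D is not reproduced in the text, the sketch stops at $E_i$. All names are hypothetical.

```python
from dataclasses import dataclass

Interval = tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Upper-left corner (x, y), width w and height h, as in A_i and B_n."""
    x: float
    y: float
    w: float
    h: float

    def coverage_area(self) -> tuple[Interval, Interval]:
        # [x, x + w] by [y - h, y], matching the definitions of C_i and D_n.
        return (self.x, self.x + self.w), (self.y - self.h, self.y)


def overlaps(a: Interval, b: Interval) -> bool:
    # Assumption: intervals that only touch at an endpoint do not overlap.
    return max(a[0], b[0]) < min(a[1], b[1])


def coverage_numbers(images: list[Box], icons: list[Box]) -> list[int]:
    """Return E_1..E_z: how many executable icons each picture overlaps."""
    counts = []
    for image in images:
        img_x, img_y = image.coverage_area()
        covered = 0
        for icon in icons:
            icon_x, icon_y = icon.coverage_area()
            if overlaps(img_x, icon_x) and overlaps(img_y, icon_y):
                covered += 1
        counts.append(covered)
    return counts


def passes_ratio_gate(p_exe: int, p_img: int, y1: float) -> bool:
    """Check P_exe / P_img > Y_1, the condition for running confidence processing."""
    # Assumption: with no image-type files the ratio is undefined, so return False.
    return p_img > 0 and p_exe / p_img > y1
```

With this reading, a picture laid over an executable's launch icon yields a nonzero $E_i$, while pictures elsewhere on the page contribute zero.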
6. The method of claim 5, wherein the image attribute value further comprises text information in an embedded file of an image type;
after determining the executable file coverage number corresponding to the embedded file of each image type, the confidence processing further includes:
matching the text information in the embedded file of each image type against a first mapping word list to generate first confidence added values $F_1, F_2, \ldots, F_i, \ldots, F_z$ for the embedded files of the image type, where $F_i$ is the first confidence added value corresponding to the embedded file of the $i$-th image type; the first mapping word list is a preset table mapping text information to first weight values;
replacing the step of generating the confidence D of the OneNote document according to $E_1, E_2, \ldots, E_i, \ldots, E_z$ with: generating the confidence D of the OneNote document according to the executable file coverage number and the first confidence added value corresponding to the embedded file of each image type, where D satisfies the following condition:
[Equation image (QLYQS_2) giving D; not reproduced in this text version]
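A minimal sketch of the first mapping word list lookup in claim 6, assuming the text information has already been extracted from each picture (the text does not say how) and that $F_i$ is the sum of the weights of all listed phrases found in that text. The phrases and weights in `FIRST_MAPPING_WORD_LIST` are hypothetical placeholders, and how $F_i$ enters D is not given in the text.

```python
# Hypothetical first mapping word list: preset text -> first weight value.
FIRST_MAPPING_WORD_LIST = {
    "double click": 0.3,
    "click to view": 0.3,
    "enable content": 0.2,
}


def first_confidence_values(image_texts: list[str]) -> list[float]:
    """Return F_1..F_z, one value per image-type embedded file, in input order."""
    values = []
    for text in image_texts:
        # Normalize case and collapse runs of whitespace before matching.
        normalized = " ".join(text.lower().split())
        # Assumption: F_i sums the weights of every listed phrase that appears.
        matched = [weight for phrase, weight in FIRST_MAPPING_WORD_LIST.items()
                   if phrase in normalized]
        values.append(float(sum(matched)))
    return values
```

Substring matching lets overlapping phrases such as "double click" and "click to view" both contribute, so a picture reading "Double click to view" scores higher than one containing either phrase alone.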
7. The method of claim 6, wherein the icon attribute value further comprises the suffix name of the embedded file of the executable file type;
after generating the first confidence additional value for the embedded file for each image type, the confidence processing further includes:
matching the suffix name of the embedded file of each executable file type against a second mapping word list to generate second confidence added values $G_1, G_2, \ldots, G_n, \ldots, G_Q$ for the embedded files of the executable file type, where $G_n$ is the second confidence added value corresponding to the embedded file of the $n$-th executable file type; the second mapping word list is a preset table mapping suffix names to second weight values;
replacing the step of generating the confidence D of the OneNote document according to the executable file coverage number and the first confidence added value corresponding to the embedded file of each image type with: generating the confidence D of the OneNote document according to the executable file coverage number and the first confidence added value corresponding to the embedded file of each image type and the second confidence added value of the embedded file of each executable file type, where D satisfies the following condition:
[Equation image (QLYQS_3) giving D; not reproduced in this text version]
8. A detection device for OneNote documents, comprising:
an acquisition module, configured to obtain a target extraction file of a OneNote document to be processed, the target extraction file being a hexadecimal file;
a processing module, configured to perform interval determination processing on a target globally unique identifier and the target extraction file to determine an extraction interval corresponding to an embedded file, the extraction interval containing the code content corresponding to the embedded file;
wherein the processing module is specifically configured to:
if the target globally unique identifier is identical to the byte content of a first byte interval, determine the twenty-first byte after the first byte interval in the target extraction file as the starting byte of the extraction interval, the first byte interval being any byte interval in the target extraction file whose byte length equals that of the target globally unique identifier;
determine the value represented by the byte content of a second byte interval corresponding to the first byte interval as the interval length of the extraction interval;
and determine the extraction interval corresponding to the embedded file according to the starting byte and the interval length of the extraction interval.
9. A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method of detecting a OneNote document according to any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method of detecting a OneNote document according to any one of claims 1 to 7.
CN202310580571.6A 2023-05-23 2023-05-23 OneNote document detection method, OneNote document detection device, OneNote document detection medium and OneNote document detection equipment Active CN116305172B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310580571.6A CN116305172B (en) 2023-05-23 2023-05-23 OneNote document detection method, OneNote document detection device, OneNote document detection medium and OneNote document detection equipment

Publications (2)

Publication Number Publication Date
CN116305172A true CN116305172A (en) 2023-06-23
CN116305172B CN116305172B (en) 2023-08-04

Family

ID=86834488

Country Status (1)

Country Link
CN (1) CN116305172B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant