CN116303290A

CN116303290A - Office document detection method, device, equipment and medium

Info

Publication number: CN116303290A
Application number: CN202310546722.6A
Authority: CN
Inventors: 高泽霖; 刘佳男; 肖新光
Original assignee: Beijing Antiy Network Technology Co Ltd
Current assignee: Beijing Antiy Network Technology Co Ltd
Priority date: 2023-05-16
Filing date: 2023-05-16
Publication date: 2023-06-23
Anticipated expiration: 2043-05-16
Also published as: CN116303290B

Abstract

The invention provides an office document detection method, apparatus, device, and medium. The method comprises the following steps: acquiring a plurality of non-hidden subfiles contained in an office document to be detected; determining a target file from the plurality of non-hidden subfiles; if the target file contains both a first preset field and a second preset field, determining whether the to-be-detected office document has a corresponding risk file; and if it does, outputting an alarm prompt. By detecting the dependencies that a malicious file needs in order to execute through VSTO technology, the invention prevents such files from performing malicious operations on the user machine, such as stealing user data assets or encrypting files in the user system, so that user data assets are protected from damage. The detection rate for malicious code is improved, detection does not rely on feature signatures of the malicious code, so the method has a degree of generality, and it occupies few system resources and has little impact on system performance.

Description

Office document detection method, device, equipment and medium

Technical Field

The present invention relates to the field of security detection, and in particular, to a method, an apparatus, a device, and a medium for detecting an office document.

Background

In the field of network security, an attacker may use VSTO (Visual Studio Tools for Office) to maliciously manipulate an office document. VSTO can export an add-in (load item) that is embedded in the office document. With a VSTO office file, an attacker can use phishing mail or similar means to induce a user to install an add-in, and thereby control the user's machine and execute malicious code remotely. A VSTO office file may also download the VSTO file directly from the Internet after the user opens the document, in order to steal user data assets. Because most security vendors have not paid attention to the use of VSTO in malicious attacks, current security software cannot detect malicious code executed with this technique, which poses a security threat to users' network data.

Disclosure of Invention

In view of this, the invention provides a method, a device, equipment and a medium for detecting office documents, which at least partially solve the technical problems existing in the prior art, and adopts the following technical scheme:

according to one aspect of the present application, there is provided an office document detection method, including:

responding to the acquired to-be-detected office document, and acquiring a plurality of non-hidden subfiles contained in the to-be-detected office document;

determining a target file from the plurality of non-hidden subfiles according to a preset file name; the file name of the target file is the preset file name;

if the target file comprises the first preset field and the second preset field at the same time, determining whether the to-be-detected office document has a corresponding risk file according to a plurality of subfiles contained in the to-be-detected office document;

and if the to-be-detected office document has the corresponding risk file, outputting an alarm prompt.

In an exemplary embodiment of the present application, obtaining a plurality of non-hidden subfiles included in an office document to be detected includes:

replacing the suffix name of the to-be-detected office document with a first preset character string to obtain a compressed office document, wherein any file whose suffix name is the first preset character string is of the compressed file type;

decompressing the compressed office document into a preset storage space, and obtaining a plurality of non-hidden subfiles obtained by decompressing the compressed office document in the preset storage space.

In an exemplary embodiment of the present application, determining whether an office document to be detected has a corresponding risk file according to a plurality of subfiles included in the office document to be detected includes:

determining whether the preset storage space contains hidden files, and if so, determining each hidden file as a hidden sub-file;

determining whether at least one hidden sub-file contains a hidden sub-file with a file type of a first preset type, and if so, determining the hidden sub-file with the file type of the first preset type as a risk file.

In an exemplary embodiment of the present application, determining whether at least one hidden sub-file includes a hidden sub-file having a file type that is a first preset type further includes:

if the hidden sub-file does not contain the hidden sub-file with the file type of the first preset type, determining whether the to-be-detected office document has a corresponding risk link according to the second preset field;

and if the to-be-detected office document has the corresponding risk link, outputting the risk link and an alarm prompt.

In an exemplary embodiment of the present application, determining whether the office document to be detected has a corresponding risk link according to the second preset field includes:

and determining whether the field content of the second preset field contains a network link with a second preset character string, and if so, determining the network link with the second preset character string as a risk link.

In an exemplary embodiment of the present application, determining a hidden subfile with a file type of a first preset type as a risk file includes:

acquiring characteristic information of a hidden sub-file with a file type of a first preset type;

determining a risk characteristic value of a hidden sub-file with a corresponding file type of a first preset type according to the characteristic information;

if the risk characteristic value is larger than the preset risk threshold value, determining the hidden sub-file with the file type corresponding to the risk characteristic value as the first preset type as the risk file.

In an exemplary embodiment of the present application, determining, according to the feature information, a risk feature value of a hidden subfile of a first preset type corresponding to a file type includes:

determining a corresponding file type as a characteristic vector of a hidden sub-file of a first preset type according to the characteristic information;

acquiring a plurality of historical non-malicious feature vectors and a plurality of historical malicious feature vectors; the historical non-malicious feature vector is a feature vector corresponding to the historical non-risk file; the historical non-risk file is a file with a file type of a first preset type and does not contain a code of a set type; the history malicious feature vector is a feature vector corresponding to the history risk file; the historical risk file is a file with a file type of a first preset type and contains a code of a set type;

clustering at least part of the historical non-malicious feature vectors to obtain a plurality of historical non-malicious feature vector groups;

fusing at least part of the historical non-malicious feature vectors in each historical non-malicious feature vector group to obtain historical non-malicious fused feature vectors corresponding to each historical non-malicious feature vector group;

performing feature comparison on the feature vector and each historical non-malicious fusion feature vector to obtain a plurality of first matching degrees;

performing feature comparison on the feature vector and each historical malicious feature vector to obtain a plurality of second matching degrees;

and determining the risk characteristic value of the hidden sub-file with the file type corresponding to the characteristic vector as the first preset type according to the weight of each historical non-malicious fusion characteristic vector, the weight of each historical malicious characteristic vector, each first matching degree and each second matching degree.

According to one aspect of the present application, there is provided an office document detection apparatus, including:

the document response module is used for acquiring a plurality of non-hidden subfiles contained in the to-be-detected office document when the to-be-detected office document is acquired;

the target determining module is used for determining a target file from a plurality of non-hidden subfiles according to a preset file name; the file name of the target file is a preset file name;

the risk determining module is used for determining whether the to-be-detected office document has a corresponding risk file according to a plurality of subfiles contained in the to-be-detected office document when the target file simultaneously comprises a first preset field and a second preset field;

and the alarm prompt module is used for outputting an alarm prompt when the to-be-detected office document has a corresponding risk file.

According to one aspect of the present application, there is provided a non-transitory computer readable storage medium having stored therein at least one instruction or at least one program loaded and executed by a processor to implement the aforementioned office document detection method.

According to one aspect of the present application, there is provided an electronic device comprising a processor and the aforementioned non-transitory computer-readable storage medium.

The invention has at least the following beneficial effects:

According to the invention, a target file is determined from the plurality of non-hidden subfiles contained in the to-be-detected office document, and whether the document has a corresponding risk file is determined by detecting whether the target file contains both the first preset field and the second preset field; if the document has a corresponding risk file, an alarm prompt is output. Both local VSTO detection and remote VSTO detection of office documents are thereby achieved, which improves security. By detecting the dependencies that a malicious file needs in order to execute through VSTO technology, the invention prevents malicious files from performing malicious operations on the user machine, such as stealing user data assets or encrypting files in the user system. This remedies the lack of detection for this technique and improves the ability of security software to detect malicious code. Detection does not rely on feature signatures of the malicious code, so the method has a degree of generality, occupies few system resources, and has little impact on system performance.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of an office document detection method provided by an embodiment of the present invention;

FIG. 2 is a block diagram of an office document detection device according to an embodiment of the present invention;

FIG. 3 to FIG. 8 are step diagrams of an office document detection method according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

VBA (Visual Basic for Applications, a macro language) in Microsoft Office files has long been used by attackers to gain access to target systems and deploy malware. If macros are allowed to run automatically, an attacker can use VBA to execute malicious code as soon as the user opens the office file. Even when VBA is not enabled by default, an attacker can use social engineering, for example a prompt to enable VBA in order to view the complete content, to induce the user to enable it. For these reasons, more and more security software blocks suspicious VBA code from executing, and Microsoft also blocks macros in files from the Internet by default.

VSTO (Visual Studio Tools for Office) makes it easier to develop office applications: a developer who uses VSTO can draw on the many features of the Visual Studio development environment as well as the memory management, garbage collection, and other functions provided by the CLR. Attackers have therefore begun to use VSTO as an alternative attack vector to VBA. VSTO can export an add-in (load item) that is embedded in the office document. With a VSTO office file, an attacker can use phishing mail or similar means to induce a user to install an add-in, and thereby control the user's machine and execute malicious code remotely. As shown in FIG. 5, the attacker's document induces the victim to install a customization in order to view the complete content; most users who are unfamiliar with office documents are induced to install it, and after installation the malicious code preset by the attacker is executed. When a VSTO office file is linked to a Visual Studio Office File application written in .NET, it can execute arbitrary malicious code. The VSTO office file may also download the VSTO file (a .NET application) directly from the Internet after the user opens the file. Because most security vendors have not paid attention to the use of VSTO in malicious attacks, current security software is largely unable to detect malicious code executed with this technique.

Therefore, to prevent an attacker from using VSTO to execute malicious code on a user machine, the invention provides an office document detection method that judges whether an office document is malicious by detecting whether it shows signs of malicious VSTO use, so that the document is prevented from executing malicious code on the user machine and damage to user data assets is avoided.

FIG. 1 is a flow chart of an office document detection method according to one embodiment of the invention.

As shown in fig. 1, the office document detection method according to an embodiment of the present invention includes:

step S100, responding to the obtained to-be-detected office document, and obtaining a plurality of non-hidden subfiles contained in the to-be-detected office document;

The to-be-detected office document is a received document that has not yet been checked, and it contains a plurality of non-hidden subfiles. When the document is obtained, all of its non-hidden subfiles are obtained so that the document can be checked for code executed by means of VSTO.

Further, in step S100, a plurality of non-hidden subfiles included in the office document to be detected are obtained, including:

step S110, replacing the suffix name of the to-be-detected office document with a first preset character string to obtain a compressed office document; if the suffix name of any file is a first preset character string, the file type of the any file is a compressed file;

And step S120, decompressing the compressed office document into a preset storage space, and obtaining a plurality of non-hidden subfiles obtained after decompressing the compressed office document in the preset storage space.

When the to-be-detected office document is obtained, its subfiles can be obtained simply by treating the editable document as a compressed archive and decompressing it. To do so, the suffix name of the to-be-detected office document is changed directly to the first preset character string, for example zip, which converts the document into a compressed file and yields the compressed office document. The compressed office document is then decompressed into a preset storage space, which may be an empty folder, or another folder or other storage that contains no VSTO files, and the plurality of non-hidden subfiles is obtained after decompression.
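Steps S110 and S120 can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function name `extract_subfiles`, the `extracted` subfolder, and the choice of `zip` as the first preset character string are assumptions made for the example.

```python
import shutil
import zipfile
from pathlib import Path


def extract_subfiles(document: Path, workspace: Path) -> list[Path]:
    """Copy the document under a .zip suffix, unpack it, and list the extracted files.

    `workspace` plays the role of the preset storage space and is assumed
    to be an empty folder (hypothetical choice for this sketch).
    """
    workspace.mkdir(parents=True, exist_ok=True)
    # S110: replace the suffix with the first preset string (assumed "zip").
    archive = workspace / (document.stem + ".zip")
    shutil.copyfile(document, archive)
    # S120: decompress into the preset storage space.
    target_dir = workspace / "extracted"
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
    # Return every regular file produced by decompression, in a stable order.
    return sorted(p for p in target_dir.rglob("*") if p.is_file())
```

Working on a copy leaves the received document untouched, which is a design choice of this sketch rather than something the description requires.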

Because VSTO execution requires a suitable runtime environment, an attacker typically needs to deliver the document that uses VSTO together with its dependencies to the user's machine, that is, into the same folder. Users who legitimately work with such documents usually install the environment in a default or designated folder in advance, and in settings with a high security level the number of files that can be transferred to a user machine at one time is often limited. An attacker can deliver office documents, VSTO files, dll files, and dependencies to a specified system in various ways, such as phishing mail and watering hole websites. Once all the files have been delivered, the malicious code is executed the next time the user opens the malicious document. Because the office document has already been used several times, the user's vigilance is lowered, which allows the attacker to carry out the malicious behavior. Therefore, when a component that uses VSTO is downloaded and the download folder contains a file with the suffix vsto, a different folder is used for saving. For example, if files are normally downloaded to a designated folder named "download", that folder is divided into a first folder and a second folder, and files from the same source as the office document (the same URL, the same mailbox) are stored separately, so that the conditions for VSTO execution cannot be met. This is how the preset storage space is selected and determined.

Step S200, determining a target file from a plurality of non-hidden subfiles according to a preset file name; the file name of the target file is a preset file name;

The target file is the non-hidden subfile whose name is the preset file name. As shown in fig. 6, custom.xml is the preset file name, and the file named custom.xml is the target file. After the target file is determined, the attribute information in it is examined to determine whether the target file contains VSTO file information.

Step S300, if the target file simultaneously comprises a first preset field and a second preset field, determining whether the to-be-detected office document has a corresponding risk file according to a plurality of subfiles contained in the to-be-detected office document; any subfile is a non-hidden subfile or a hidden subfile;

After the target file is determined, field detection is performed on it to check whether it contains both the first preset field and the second preset field. The first preset field is the name attribute information of the VSTO file, and the second preset field is the location attribute information of the VSTO file. As shown in fig. 7, the marked boxes are the first and second preset fields, namely "_AssemblyName" and "_AssemblyLocation". Whether the to-be-detected office document uses VSTO technology is judged by whether the target file contains both fields. If it does, VSTO technology is in use, and the risk file is determined from the second preset field and the subfiles of the to-be-detected office document. A risk file is a file whose execution poses a security threat, that is, a file whose threat coefficient on execution is higher than a security threshold.
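The field check on the target file can be illustrated with a short sketch. It assumes that the two preset fields appear as `name` attributes of property elements in custom.xml, under the VSTO custom property names `_AssemblyName` and `_AssemblyLocation`; the function names are hypothetical, and a production scanner handling untrusted XML would likely want a hardened parser.

```python
import xml.etree.ElementTree as ET

# Assumed values of the first and second preset fields.
FIRST_FIELD = "_AssemblyName"
SECOND_FIELD = "_AssemblyLocation"


def read_custom_properties(xml_text: str) -> dict[str, str]:
    """Map each named property in custom.xml to its text content."""
    root = ET.fromstring(xml_text)
    props: dict[str, str] = {}
    for elem in root.iter():
        name = elem.get("name")
        if name is not None:
            # The value sits in a child element, so gather all nested text.
            props[name] = "".join(elem.itertext()).strip()
    return props


def uses_vsto(props: dict[str, str]) -> bool:
    """True only when both preset fields are present at the same time."""
    return FIRST_FIELD in props and SECOND_FIELD in props
```

Returning the property values as well as the yes/no answer lets a later step inspect the content of the second preset field without parsing the file again.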

Further, in step S300, determining whether the to-be-detected office document has a corresponding risk file according to the plurality of subfiles included in the to-be-detected office document includes:

step S310, determining whether a preset storage space contains hidden files, if so, determining each hidden file as a hidden sub-file; determining whether at least one hidden sub-file contains a hidden sub-file with a file type of a first preset type, and if so, determining the hidden sub-file with the file type of the first preset type as a risk file;

Whether the subfiles of the to-be-detected office document include a risk file is determined by checking whether they include a hidden subfile whose file type is the first preset type. As shown in fig. 4, the first preset type is the VSTO file type, which covers the load item, dependencies, dll files, pdb files, and so on that are required to open the VSTO file. When the to-be-detected office document is transferred to the user machine, only one file is displayed there. If the malicious document uses the local VSTO mode, the dll load item compiled with .NET and its dependencies are typically stored together with the office document created to execute it. To keep the user from noticing anything unusual, the attacker typically hides the VSTO load item and its dependencies. Whether a local VSTO file is malicious can therefore be judged by checking whether the office document and its directory contain a hidden VSTO load item and hidden dependencies. Accordingly, the hidden subfiles of the to-be-detected office document are checked for hidden subfiles of the first preset type; if any are found, this indicates the local VSTO mode of malicious intrusion, and the hidden subfiles of the first preset type are determined to be risk files.
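A minimal sketch of the hidden-file check in step S310 follows. The suffix set standing in for the first preset type, the function names, and the rule that a leading dot also counts as hidden are all assumptions of this example; the description only says that hidden load items, dependencies, dll files, and pdb files are of interest.

```python
import stat
from pathlib import Path

# Assumed suffixes standing in for the "first preset type".
RISK_SUFFIXES = {".vsto", ".dll", ".pdb"}


def is_hidden(path: Path) -> bool:
    """Hidden via the Windows hidden attribute, or via a leading dot elsewhere."""
    # st_file_attributes only exists on Windows; default to 0 on other systems.
    attrs = getattr(path.stat(), "st_file_attributes", 0)
    return bool(attrs & stat.FILE_ATTRIBUTE_HIDDEN) or path.name.startswith(".")


def find_hidden_risk_candidates(folder: Path) -> list[Path]:
    """List hidden files under `folder` whose suffix is in RISK_SUFFIXES."""
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and is_hidden(p) and p.suffix.lower() in RISK_SUFFIXES
    )
```

The files returned here are only candidates; the description goes on to score each one before declaring it a risk file.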

Further, in step S310, determining the hidden sub-file with the file type being the first preset type as the risk file includes:

step S311, obtaining characteristic information of a hidden sub-file with a file type of a first preset type;

step S312, determining a risk characteristic value of a hidden sub-file with a corresponding file type of a first preset type according to the characteristic information;

The corresponding risk characteristic value is determined from the characteristic information of the hidden subfile of the first preset type, and whether the hidden subfile is a risk file is determined by comparing the risk characteristic value with the preset risk threshold.

Further, in step S312, determining, according to the feature information, a risk feature value of the hidden subfile of which the corresponding file type is the first preset type includes:

step S3121, determining a feature vector of a hidden sub-file with a corresponding file type of a first preset type according to the feature information;

step S3122, acquiring a plurality of historical non-malicious feature vectors and a plurality of historical malicious feature vectors; the historical non-malicious feature vector is a feature vector corresponding to a historical non-risk file, wherein the historical non-risk file is a file whose file type is the first preset type and which does not contain code of a set type; the historical malicious feature vector is a feature vector corresponding to a historical risk file, wherein the historical risk file is a file whose file type is the first preset type and which contains code of a set type;

Step S3123, clustering at least part of the historical non-malicious feature vectors to obtain a plurality of historical non-malicious feature vector groups;

step S3124, fusing at least part of the historical non-malicious feature vectors in each historical non-malicious feature vector group to obtain a historical non-malicious fused feature vector corresponding to each historical non-malicious feature vector group;

step S3125, comparing the characteristic vector with each historical non-malicious fusion characteristic vector to obtain a plurality of first matching degrees;

step S3126, comparing the characteristic vector with each historical malicious characteristic vector to obtain a plurality of second matching degrees;

step S3127, determining the risk characteristic value of the hidden sub-file with the file type corresponding to the characteristic vector as the first preset type according to the weight of each historical non-malicious fusion characteristic vector, the weight of each historical malicious characteristic vector, each first matching degree and each second matching degree.

Steps S3121 to S3127 describe how the risk characteristic value of a hidden subfile of the first preset type is determined. First, the characteristic information of the hidden subfile is obtained: the file size $Q_1$, the flag $Q_2$ indicating whether it is a hidden subfile, the number of hidden subfiles $Q_3$, the number of dll packages $Q_4$, the types of known dependencies $Q_5$, and the types of unknown dependencies $Q_6$. These are combined into the corresponding feature vector $Q = (Q_1, Q_2, Q_3, Q_4, Q_5, Q_6)$. The reasons a file is malicious differ from file to file: an attacker may implant malicious code in the hidden subfile itself or in one of its dependencies. For the feature vector to represent the hidden subfile accurately in every respect, characteristic information about all of these aspects has to be collected. For example, a file implanted with malicious code is larger than a normal file, and the types of its dependencies differ from those of a normal file.

After the feature vector of the hidden subfile of the first preset type is obtained, the historical non-malicious feature vectors and the historical malicious feature vectors are acquired. They can be obtained from records of historical data, for example by collecting the feature vectors of documents received or detected within a preset historical time period. All the historical non-malicious feature vectors are clustered into a plurality of historical non-malicious feature vector groups, and each group is fused, for example by averaging, to obtain the historical non-malicious fusion feature vector corresponding to that group. Because the malicious means used by the detected malicious files are not uniform, only the historical non-malicious feature vectors are clustered and fused. The feature vector is then compared with each historical non-malicious fusion feature vector and with each historical malicious feature vector to obtain a plurality of first matching degrees and a plurality of second matching degrees respectively, where each matching degree is the similarity distance between the two feature vectors being compared. Finally, the first and second matching degrees are weighted by the weight of each historical non-malicious fusion feature vector and the weight of each historical malicious feature vector to obtain the risk characteristic value of the hidden subfile of the first preset type that corresponds to the feature vector.
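The fusion and comparison steps (S3124 to S3126) can be sketched as below. The description does not name a clustering algorithm or a distance measure, so this example takes the groups as already formed, fuses each group by element-wise averaging as the text suggests, and assumes a similarity of 1 / (1 + Euclidean distance) for the matching degree. The function names are hypothetical.

```python
import math
from statistics import fmean

Vector = tuple[float, ...]


def fuse(group: list[Vector]) -> Vector:
    """Fuse one group of feature vectors by element-wise averaging."""
    return tuple(fmean(column) for column in zip(*group))


def matching_degree(a: Vector, b: Vector) -> float:
    """Assumed similarity: 1.0 for identical vectors, approaching 0 as they diverge."""
    return 1.0 / (1.0 + math.dist(a, b))


def matching_degrees(query: Vector, references: list[Vector]) -> list[float]:
    """Compare the query vector with every reference vector."""
    return [matching_degree(query, ref) for ref in references]
```

In practice the six features have very different scales (a file size next to a 0/1 flag), so some normalization before computing distances would probably be needed; the description does not address this.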

The weight of each historical non-malicious fusion feature vector is determined by the following method:

The detection time and detection accuracy of the historical non-risk file corresponding to each historical non-malicious fusion feature vector are acquired, and the detection efficiency of that file is determined as the product of the detection time and the detection accuracy. The historical non-malicious fusion feature vectors are sorted in decreasing order of detection efficiency, and normalization is performed according to each vector's sequence number in the sorted order to obtain the weight corresponding to each historical non-malicious fusion feature vector.

The weight of each historical malicious feature vector is determined by the following method:

The detection time and detection accuracy of the historical risk file corresponding to each historical malicious feature vector are acquired, and the detection efficiency of that file is determined as the product of the detection time and the detection accuracy. The historical malicious feature vectors are sorted in decreasing order of detection efficiency, and normalization is performed according to each vector's sequence number in the sorted order to obtain the weight corresponding to each historical malicious feature vector.
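The rank-based weighting used for both kinds of vectors can be illustrated as follows. The description does not say exactly how sequence numbers are normalized, so this sketch assumes one possible reading: the vector with the highest detection efficiency receives the largest weight, and weights are proportional to reversed rank so that they sum to 1. The function name is hypothetical.

```python
def rank_weights(detection_times: list[float], accuracies: list[float]) -> list[float]:
    """Return one weight per vector, in the same order as the inputs."""
    # Detection efficiency as stated in the description: time multiplied by accuracy.
    efficiency = [t * a for t, a in zip(detection_times, accuracies)]
    n = len(efficiency)
    # Indices sorted by decreasing efficiency.
    order = sorted(range(n), key=lambda i: efficiency[i], reverse=True)
    total = n * (n + 1) / 2  # 1 + 2 + ... + n
    weights = [0.0] * n
    for rank, index in enumerate(order):
        # Assumed normalization: first place gets n/total, last place gets 1/total.
        weights[index] = (n - rank) / total
    return weights
```

For three vectors with efficiencies 1.8, 0.8, and 1.5, this reading gives weights 3/6, 1/6, and 2/6.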

After the weight of each historical non-malicious fusion feature vector and the weight of each historical malicious feature vector are obtained, each weight is multiplied by its corresponding matching degree (the first matching degree for a historical non-malicious fusion feature vector, the second matching degree for a historical malicious feature vector), and all the products are summed to obtain the risk characteristic value of the hidden sub-file of the first preset type that corresponds to the feature vector.
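Written as a formula, the weighted sum might look like the following. The symbols are introduced here for illustration and do not appear in the description: $w_i$ and $s_i^{(1)}$ are the weight and first matching degree for the $i$-th of $M$ historical non-malicious fusion feature vectors, and $v_j$ and $s_j^{(2)}$ are the weight and second matching degree for the $j$-th of $N$ historical malicious feature vectors.

```latex
R = \sum_{i=1}^{M} w_i \, s_i^{(1)} + \sum_{j=1}^{N} v_j \, s_j^{(2)}
```

The description states only that all products are summed; it does not specify whether the non-malicious terms should enter with a different sign or scale, so the formula keeps both sums positive as written.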

In addition, the risk file may also be determined by:

comparing each feature vector with a preset positive sample vector and a preset negative sample vector to obtain the corresponding matching degrees; the positive sample vector and the negative sample vector are the feature vectors of a standard non-risk file and a standard risk file respectively, and can be obtained from historical statistics;

if the matching degree between the feature vector and the preset positive sample vector is greater than that between the feature vector and the preset negative sample vector, taking "non-malicious" as the feature comparison result of the feature vector; otherwise, taking "malicious" as the feature comparison result of the feature vector;

traversing the feature comparison results of all feature vectors, and if the number of non-malicious results is greater than the number of malicious results, determining that the to-be-detected office document is a non-risk document; otherwise, determining that the to-be-detected office document is a risk file.
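This alternative amounts to a majority vote over per-vector comparisons, which can be sketched as below. The similarity measure (1 / (1 + Euclidean distance)) and the function names are assumptions of the example; the sample vectors would come from historical statistics as the text describes.

```python
import math

Vector = tuple[float, ...]


def similarity(a: Vector, b: Vector) -> float:
    """Assumed matching degree: higher means more alike."""
    return 1.0 / (1.0 + math.dist(a, b))


def compare(vector: Vector, positive: Vector, negative: Vector) -> str:
    """Label one feature vector against the positive and negative samples."""
    if similarity(vector, positive) > similarity(vector, negative):
        return "non-malicious"
    return "malicious"


def is_risk_document(vectors: list[Vector], positive: Vector, negative: Vector) -> bool:
    """Non-risk only when non-malicious results strictly outnumber malicious ones."""
    results = [compare(v, positive, negative) for v in vectors]
    return not results.count("non-malicious") > results.count("malicious")
```

Because the rule requires a strict majority of non-malicious results, a tie is treated as a risk, which follows the "otherwise" branch in the text.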

Alternatively, the feature vector can be analyzed by an AI regression model to identify a risk file. The AI regression model is determined from the historical malicious feature vectors and the historical non-malicious feature vectors. Feeding the feature vector into the model yields the corresponding risk characteristic value, and comparing this value with the preset risk threshold determines whether the to-be-detected office document corresponding to the feature vector is a risk file.

Step 313, if the risk characteristic value is greater than the preset risk threshold, determining the hidden sub-file of the first preset type that corresponds to the risk characteristic value to be the risk file.

After the risk characteristic value of the hidden sub-file of the first preset type corresponding to the feature vector is obtained, it is compared with the preset risk threshold. If the risk characteristic value is greater than the preset risk threshold, the threat risk is high, and the hidden sub-file of the first preset type corresponding to that risk characteristic value is determined to be a risk file. If the risk characteristic value is less than or equal to the preset risk threshold, the threat risk is low, and the hidden sub-file is not processed.
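A minimal sketch of the weighted sum and the threshold test is given below. It assumes that each first matching degree pairs with one historical non-malicious vector and each second matching degree with one historical malicious vector; the sign convention of the weights is not specified in the description and is left to the caller. The names `risk_feature_value` and `is_risk_file` are hypothetical.

```python
from typing import Sequence


def risk_feature_value(
    benign_weights: Sequence[float],
    first_matching: Sequence[float],
    malicious_weights: Sequence[float],
    second_matching: Sequence[float],
) -> float:
    """Sum of weight * matching degree over all historical vectors."""
    if len(benign_weights) != len(first_matching):
        raise ValueError("one first matching degree is needed per non-malicious weight")
    if len(malicious_weights) != len(second_matching):
        raise ValueError("one second matching degree is needed per malicious weight")
    benign_part = sum(w * m for w, m in zip(benign_weights, first_matching))
    malicious_part = sum(w * m for w, m in zip(malicious_weights, second_matching))
    return benign_part + malicious_part


def is_risk_file(value: float, threshold: float) -> bool:
    """Strictly greater than the threshold means risk; equal means no action."""
    return value > threshold
```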

Step S320, if the hidden sub-files do not include a hidden sub-file whose file type is the first preset type, determining, according to the second preset field, whether the to-be-detected office document has a corresponding risk link;

If the malicious document uses the remote VSTO form, the add-in can be stored separately from the crafted office document and still be executed. The attacker must, however, assign the network link to the "_AssemblyLocation" attribute, that is, to the second preset field. Whether the malicious document uses the remote VSTO form can therefore be judged by detecting whether the field content of the second preset field of the target file contains a network link with the second preset character string.

If the hidden sub-files do not include one whose file type is the first preset type, no local VSTO file exists, and the document is then checked for a remote VSTO network link. As shown in fig. 8, the remote VSTO network link in the marked box is a suspicious link.

Further, in step S320, determining whether the to-be-detected office document has a corresponding risk link according to the second preset field includes:

step S321, determining whether the field content of the second preset field contains a network link with a second preset character string, and if so, determining the network link with the second preset character string as a risk link; a network link containing the second preset character string indicates that a file of the first preset type is stored at the location the network link points to;

As shown in fig. 8, the second preset character string is the ".vsto" character string. If the field content of the second preset field does not contain a network link with the second preset character string, the to-be-detected office document is a non-risk file and is not processed. If the field content does contain such a network link, the to-be-detected office document has a suspicious link, and the network link with the second preset character string is determined to be a risk link.
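The check in step S321 can be illustrated with the sketch below. It assumes that the second preset character string is ".vsto" and that a "network link" means an http or https URL; the regular expression, the constant `SECOND_PRESET_STRING`, and the function name `find_risk_links` are illustrative choices rather than part of the described method.

```python
import re
from typing import List

# Assumed value of the second preset character string.
SECOND_PRESET_STRING = ".vsto"

# Matches http(s) URLs up to whitespace, '|', '<' or '>'.
_URL_RE = re.compile(r"https?://[^\s|<>]+", re.IGNORECASE)


def find_risk_links(field_content: str) -> List[str]:
    """Return the network links in the field content that contain the preset string."""
    links = _URL_RE.findall(field_content)
    return [link for link in links if SECOND_PRESET_STRING in link.lower()]
```

A local path ending in ".vsto" is deliberately not reported here, because the remote check only concerns network links; local VSTO files are handled by the hidden sub-file branch.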

Step S330, if the to-be-detected office document has a corresponding risk link, outputting the risk link and an alarm prompt.

The risk link is extracted and the user is notified that a suspicious link has been found. Whether the to-be-detected office document is a malicious document is then judged according to the document trust degree of the to-be-detected office document, the link trust degree, and the like.

Step S320 further includes:

step S321, if the target file contains a file download link with the second preset character string, determining whether the file download link is contained in a preset safe link list;

The network links contained in the preset safe link list are safe links, that is, network links whose probability of executing malicious code is lower than a preset threshold. The list is in effect a whitelist of network links: links in it have a high safe-execution coefficient, and files downloaded through them are considered to contain no malicious code. Whether the file download link is a safe link is determined by checking whether it exists in the preset safe link list.

The preset safe link list is determined by the following method:

step S001, determining whether each preset network download link is a safe link according to the confidence of at least some of the first preset type files on the website corresponding to that preset network download link;

step S002, adding the preset network download links determined to be safe links within a preset time period to the preset safe link list.

To detect whether a file download link is a network link corresponding to a remote VSTO file, the confidence of VSTO files must be counted and consolidated into the preset safe link list. The confidence is the probability coefficient that the corresponding first preset type file is a safe file, and it can be determined from the number of accesses and the number of safe executions associated with the corresponding preset network download link. For example, if the number of accesses A1 of the preset network download link corresponding to a first preset type file, the number of downloads A2 of that file, and the number of safe executions A3 of that file are acquired, the confidence of the file is determined as A3/A2/A1. The confidence of each first preset type file on the website corresponding to each preset network download link is determined in this way. Whether a preset network download link is a safe link is then decided from these confidences, and the links determined to be safe links are added to the preset safe link list.
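The confidence calculation can be sketched as follows. The expression A3/A2/A1 is implemented exactly as written, read left to right as (A3/A2)/A1. The rule used in `is_safe_link`, that every sampled file must reach a confidence threshold, is an assumption made for illustration, as is returning 0.0 when no usage data exists; both function names are hypothetical.

```python
from typing import Iterable, Tuple


def file_confidence(accesses: int, downloads: int, safe_runs: int) -> float:
    """Confidence of one file: A3 / A2 / A1, with A1=accesses, A2=downloads, A3=safe_runs."""
    if accesses <= 0 or downloads <= 0:
        return 0.0  # assumed: no usage data means no confidence
    return safe_runs / downloads / accesses


def is_safe_link(file_stats: Iterable[Tuple[int, int, int]], threshold: float) -> bool:
    """Assumed rule: a link is safe only if every sampled file reaches the threshold."""
    confidences = [file_confidence(a1, a2, a3) for a1, a2, a3 in file_stats]
    return bool(confidences) and all(c >= threshold for c in confidences)
```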

Step S322, if the preset safe link list does not contain the file download link with the second preset character string, outputting alarm information and the file download link with the second preset character string.

If the file download link corresponding to the to-be-detected office document is not in the preset safe link list, the file download link is not a safe link. Alarm information and the file download link are output to warn the user that the file download link is dangerous and is a threat link.

Further, in step S322, outputting the alarm information and the file download link with the second preset character string includes:

step S3221, determining whether the office document to be detected is a malicious document according to the document confidence corresponding to the office document to be detected and the link confidence corresponding to the file download link;

step S3222, if the office document to be detected is not a malicious document, adding the file downloading link to a preset safe link list;

step S3223, if the office document to be detected is a malicious document, outputting alarm information and a file downloading link.

Steps S3221 to S3223 provide a further way of determining whether the to-be-detected office document is a malicious document. When the file download link corresponding to the to-be-detected office document is not in the preset safe link list, the document confidence B1 of the to-be-detected office document and the link confidence B2 of the file download link are obtained. The document confidence B1 can be determined from the number B11 of historical similar documents of the to-be-detected office document and the number B12 of those historical similar documents that are safe documents: B1 = B12/B11, where historical similar documents are documents similar in type and size to the to-be-detected office document. The link confidence B2 is determined in the same way: B2 = B22/B21, where B21 is the number of historical similar links similar to the file download link and B22 is the number of those historical similar links that are safe links.
After B1 and B2 are obtained, the two are summed. If the sum is smaller than a preset confidence threshold, malicious code exists in the to-be-detected office document or opening the document poses a security threat, so the document is determined to be a malicious document, and alarm information and the file download link are output to prompt the user. Otherwise, if the sum is greater than or equal to the preset confidence threshold, no malicious code exists in the to-be-detected office document, and it is considered a safe document that can be opened. Because the file download link contained in the document is not in the preset safe link list although the document itself is safe, the list is considered to be missing this safe link, and the file download link can be added to the preset safe link list to facilitate safe-link detection for subsequent office documents.
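The decision in steps S322 and S3221 to S3223 can be sketched as below. The function name `handle_download_link`, the string return values, and the treatment of an empty history as zero confidence are assumptions for illustration; B22 is taken to be the number of historical similar links that were safe, by analogy with B12.

```python
from typing import Set


def ratio(safe_count: int, total_count: int) -> float:
    """Share of safe items among similar historical items (0.0 if no history)."""
    return safe_count / total_count if total_count > 0 else 0.0


def handle_download_link(
    link: str,
    safe_links: Set[str],
    b11: int,
    b12: int,
    b21: int,
    b22: int,
    confidence_threshold: float,
) -> str:
    """Return 'safe', 'alarm', or 'added' for a link found in the target file."""
    if link in safe_links:
        return "safe"  # already whitelisted, nothing to report
    b1 = ratio(b12, b11)  # document confidence B1 = B12 / B11
    b2 = ratio(b22, b21)  # link confidence B2 = B22 / B21
    if b1 + b2 < confidence_threshold:
        return "alarm"  # malicious document: output alarm and the link
    safe_links.add(link)  # safe document: fill the gap in the whitelist
    return "added"
```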

In addition, the safe links in the preset safe link list are changed through a link update strategy, which updates the state of those links and ensures that the network links kept in the list remain safe links. The link update strategy comprises the following steps:

step S3201, determining whether each safe link is to be determined as a risk link according to the confidence, acquired at set intervals, of at least some of the first preset type files on the website corresponding to each safe link in the preset safe link list;

The confidence of all first preset type files on the website corresponding to each safe link in the preset safe link list is acquired at set intervals. If the confidence is still greater than or equal to the preset confidence threshold, the safe link remains a safe link and is not processed. If the confidence is smaller than the preset confidence threshold, the first preset type files on that website have changed, for example a new first preset type file has been added or an existing one has been modified, so the website corresponding to the safe link has become a threat website, and the safe link is determined to be a risk link.

Step S3202, if the safe link is determined to be a risk link, deleting the risk link from the preset safe link list;

If the confidence of all first preset type files on the website corresponding to the safe link is smaller than the preset confidence threshold, the safe link is determined to be a risk link, deleted from the preset safe link list, and transferred to a preset link list, which stores risk links pending determination;

step S3203, determining the change time of the risk link according to the confidence, acquired at set intervals, of at least some of the first preset type files on the website corresponding to the risk link;

The confidence of all first preset type files on the website corresponding to each risk link in the preset link list is acquired at set intervals; this interval may be the same as or different from the interval used for the preset safe link list. When the confidence of a risk link in the preset link list becomes greater than or equal to the preset confidence threshold, the time at which this happens is taken as the change time, and the change time is used to decide whether the risk link is added back to the preset safe link list;

Step S3204, determining whether the risk link is to be determined as a safe link according to the change time of the risk link;

step S3205, if the risk link is determined to be a safe link, adding the risk link to the preset safe link list.

If the change time is less than or equal to the preset change time threshold, the website corresponding to the risk link has strong self-healing capability: malicious documents or malicious information on the website are found and cleared within a short time. The risk link is therefore determined to be a safe link and added to the preset safe link list. If the time the risk link has spent in the preset link list exceeds the preset change time threshold, the website has poor self-healing capability and weak routine maintenance, and the risk link is deleted from the preset link list. This keeps the number of risk links in the preset link list small and reduces the computing resources of the user machine occupied by risk link information.
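One periodic pass of the link update strategy (steps S3201 to S3205) might look like the sketch below. It assumes that the change time is measured as the time elapsed since the link was moved to the preset link list, and that the confidence of a link is supplied by a caller-provided function; `update_link_lists`, `pending`, and `confidence_of` are hypothetical names.

```python
from typing import Callable, Dict, Set


def update_link_lists(
    safe_links: Set[str],
    pending: Dict[str, float],
    confidence_of: Callable[[str], float],
    now: float,
    confidence_threshold: float,
    change_time_threshold: float,
) -> None:
    """Run one update pass; `pending` maps a risk link to the time it was demoted."""
    # Steps S3203-S3205: re-check links that were demoted in an earlier pass.
    for link, demoted_at in list(pending.items()):
        elapsed = now - demoted_at
        if elapsed > change_time_threshold:
            # The site did not recover in time: stop tracking the link.
            del pending[link]
        elif confidence_of(link) >= confidence_threshold:
            # The site recovered quickly: restore the link to the safe list.
            del pending[link]
            safe_links.add(link)

    # Steps S3201-S3202: demote safe links whose confidence has dropped.
    for link in list(safe_links):
        if confidence_of(link) < confidence_threshold:
            safe_links.discard(link)
            pending[link] = now
```

Processing the pending links before the safe links ensures that a link demoted in the current pass is not re-evaluated until the next interval.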

Step 400, if the office document to be detected has a corresponding risk file, outputting an alarm prompt.

According to the invention, a target file is determined from the plurality of non-hidden subfiles contained in the to-be-detected office document; when the target file is found to include both the first preset field and the second preset field, whether the document has a corresponding risk file is determined; and if it does, an alarm prompt is output. This realizes both local VSTO detection and remote VSTO detection for office documents and improves security. By detecting the dependencies that malicious files rely on to execute through VSTO technology, the invention prevents such files from performing malicious operations on the user machine, such as stealing user data assets or encrypting files in the user system, and thus protects user data assets from damage. It fills the gap in security software's detection of this technique and improves the detection rate of malicious code. Because the malicious code does not need to be detected based on signatures, the method has a degree of generality, occupies few system resources, and has little impact on system performance.

An office document detection apparatus 100, as shown in fig. 2, includes:

the document response module 110 is configured to obtain a plurality of non-hidden subfiles included in the to-be-detected office document when the to-be-detected office document is obtained;

The target determining module 120 is configured to determine a target file from a plurality of non-hidden subfiles according to a preset file name; the file name of the target file is a preset file name;

the risk determining module 130 is configured to determine, when the target file includes a first preset field and a second preset field, whether the to-be-detected office document has a corresponding risk file according to a plurality of subfiles included in the to-be-detected office document;

and the alarm prompt module 140 is configured to output an alarm prompt when the to-be-detected office document has a corresponding risk file.
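The first three modules can be sketched as plain functions. The sketch relies on the fact that Office Open XML documents are ZIP archives and reads the archive in memory instead of renaming the file and decompressing it to a storage space. The preset file name `custom.xml` and the first preset field `_AssemblyName` are assumptions made for illustration; the second preset field is taken to be `_AssemblyLocation`, the attribute the description names for the remote link. Hidden sub-file inspection and the alarm output are omitted.

```python
import io
import zipfile
from typing import Dict, Optional

PRESET_FILE_NAME = "custom.xml"            # assumed preset file name
FIRST_PRESET_FIELD = "_AssemblyName"       # assumed first preset field
SECOND_PRESET_FIELD = "_AssemblyLocation"  # second preset field


def read_subfiles(document_bytes: bytes) -> Dict[str, bytes]:
    """Document response module: list and read the archive members."""
    with zipfile.ZipFile(io.BytesIO(document_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def find_target(subfiles: Dict[str, bytes]) -> Optional[bytes]:
    """Target determining module: pick the subfile with the preset file name."""
    for name, data in subfiles.items():
        if name.rsplit("/", 1)[-1] == PRESET_FILE_NAME:
            return data
    return None


def needs_risk_check(target: Optional[bytes]) -> bool:
    """Precondition of the risk determining module: both preset fields are present."""
    if target is None:
        return False
    text = target.decode("utf-8", errors="replace")
    return FIRST_PRESET_FIELD in text and SECOND_PRESET_FIELD in text
```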

Embodiments of the present invention also provide a computer program product comprising program code for causing an electronic device to carry out the steps of the method according to the various exemplary embodiments of the invention as described in the specification, when said program product is run on the electronic device.

Furthermore, although the steps of the methods in the present disclosure are depicted in a particular order in the drawings, this does not require or imply that the steps must be performed in that particular order or that all illustrated steps be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a mobile terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.

Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system."

An electronic device according to this embodiment of the invention is described below. The electronic device is merely an example and should not impose any limitation on the functionality or scope of use of embodiments of the present invention.

The electronic device is in the form of a general purpose computing device. Components of an electronic device may include, but are not limited to: the at least one processor, the at least one memory, and a bus connecting the various system components, including the memory and the processor.

Wherein the memory stores program code that is executable by the processor to cause the processor to perform steps according to various exemplary embodiments of the invention described in the "exemplary methods" section of this specification.

The storage may include readable media in the form of volatile storage, such as Random Access Memory (RAM) and/or cache memory, and may further include Read Only Memory (ROM).

The storage may also include a program/utility having a set (at least one) of program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

The bus may be one or more of several types of bus structures including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.

The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device, and/or with any device (e.g., router, modem, etc.) that enables the electronic device to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface. And, the electronic device may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through a network adapter. As shown, the network adapter communicates with other modules of the electronic device over a bus. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with an electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the "exemplary methods" section of this specification, when said program product is run on the terminal device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).

Furthermore, the above-described drawings are only schematic illustrations of processes included in the method according to the exemplary embodiment of the present invention, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.

It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims

1. An office document detection method, comprising:

responding to the acquisition of an office document to be detected, and acquiring a plurality of non-hidden subfiles contained in the office document to be detected;

determining a target file from a plurality of non-hidden subfiles according to a preset file name; the file name of the target file is the preset file name;

if the target file comprises a first preset field and a second preset field at the same time, determining whether the to-be-detected office document has a corresponding risk file according to a plurality of subfiles contained in the to-be-detected office document; and if the to-be-detected office document has the corresponding risk file, outputting an alarm prompt.

2. The method of claim 1, wherein the obtaining the plurality of non-hidden subfiles included in the office document to be detected comprises:

replacing the suffix name of the to-be-detected office document with a first preset character string to obtain a compressed office document; if the suffix name of the arbitrary file is the first preset character string, the file type of the arbitrary file is represented as a compressed file;

decompressing the compressed office document into a preset storage space, and obtaining a plurality of non-hidden subfiles obtained after the compressed office document is decompressed in the preset storage space.

3. The method of claim 2, wherein the determining whether the to-be-detected office document has a corresponding risk file according to a plurality of subfiles included in the to-be-detected office document comprises:

determining whether the preset storage space contains hidden files or not, and if so, determining each hidden file as a hidden sub-file;

4. The method of claim 3, wherein determining whether at least one of the hidden subfiles includes a hidden subfile having a file type of a first preset type further comprises:

and if the to-be-detected office document has a corresponding risk link, outputting the risk link and an alarm prompt.

5. The method of claim 4, wherein the determining whether the to-be-detected office document has a corresponding risk link according to the second preset field comprises:

6. A method according to claim 3, wherein said determining a hidden sub-file of a file type of a first preset type as a risk file comprises:

and if the risk characteristic value is larger than a preset risk threshold value, determining the hidden sub-file with the file type corresponding to the risk characteristic value as a first preset type as a risk file.

7. The method of claim 6, wherein determining, according to the feature information, a risk feature value of a hidden sub-file of which a corresponding file type is a first preset type includes:

acquiring a plurality of historical non-malicious feature vectors and a plurality of historical malicious feature vectors; the historical non-malicious feature vector is a feature vector corresponding to a historical non-risk file; the historical non-risk file is a file with a file type of a first preset type and does not contain a code of a set type; the history malicious feature vector is a feature vector corresponding to a history risk file; the history risk file is a file with a file type of a first preset type and a file containing a set type code;

and determining a risk characteristic value of a hidden sub-file with a file type corresponding to the characteristic vector as a first preset type according to the weight of each historical non-malicious fusion characteristic vector, the weight of each historical malicious characteristic vector, each first matching degree and each second matching degree.

8. An office document detection apparatus, comprising:

9. A non-transitory computer readable storage medium having stored therein at least one instruction or at least one program, wherein the at least one instruction or the at least one program is loaded and executed by a processor to implement the method of any one of claims 1-7.

10. An electronic device comprising a processor and the non-transitory computer readable storage medium of claim 9.