CN114925365A - File processing method and device, electronic equipment and storage medium - Google Patents

File processing method and device, electronic equipment and storage medium

Info

Publication number
CN114925365A
Authority
CN
China
Prior art keywords
file
information
target
determining
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210593366.9A
Other languages
Chinese (zh)
Inventor
邢洋
童志明
肖新光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Antiy Technology Group Co Ltd
Original Assignee
Antiy Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Antiy Technology Group Co Ltd filed Critical Antiy Technology Group Co Ltd
Priority to CN202210593366.9A priority Critical patent/CN114925365A/en
Publication of CN114925365A publication Critical patent/CN114925365A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection
    • G06F21/563Static detection by source code analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Virology (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a file processing method, a file processing apparatus, an electronic device and a storage medium. The method comprises: acquiring a file to be processed; performing feature extraction on the file to be processed to obtain a plurality of pieces of first target feature information, the pieces of first target feature information differing in feature category; determining portrait information corresponding to the file to be processed according to the plurality of pieces of first target feature information; and determining the file type corresponding to the file to be processed according to the portrait information and a portrait library. Because the portrait information is derived from first target feature information of different feature categories, it covers features of a plurality of feature categories, and the file type is then determined by matching against the portrait library. No blacklist is used at any point in the detection, which addresses the problem that malicious code of an unknown type cannot be effectively detected.

Description

File processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of information security, and in particular, to a file processing method and apparatus, an electronic device, and a storage medium.
Background
Most traditional malicious code detection methods rely on blacklists. Their advantage is that threats already on the blacklist can be detected quickly. Their drawback is equally clear: attacks by unknown malicious code cannot be effectively identified, which can leave a serious security hole.
Disclosure of Invention
In view of this, the present application provides a file processing method, an apparatus, an electronic device, and a storage medium, which at least partially solve the problems in the prior art.
According to an aspect of the present application, there is provided a file processing method including:
acquiring a file to be processed;
performing feature extraction on the file to be processed to obtain a plurality of pieces of first target feature information, the pieces of first target feature information differing in feature category;
determining portrait information corresponding to the file to be processed according to the plurality of pieces of first target feature information;
and determining the file type corresponding to the file to be processed according to the portrait information and a portrait library.
In an exemplary embodiment of the present application, the extracting features of the file to be processed to obtain a plurality of first target feature information includes:
obtaining a plurality of feature extraction methods;
sequentially using each feature extraction method to extract features of the file to be processed to obtain a plurality of candidate feature information;
and determining the candidate characteristic information meeting set conditions in the plurality of candidate characteristic information as first target characteristic information.
In an exemplary embodiment of the present application, the plurality of feature extraction methods includes an encryption feature extraction method;
the encryption feature extraction method comprises the following steps:
determining whether the code information of the file to be processed contains at least one set word;
if yes, determining a target encryption algorithm according to the set words contained in the code information;
determining parameter information in the code information according to the target encryption algorithm;
and determining the encryption candidate characteristic information corresponding to the file to be processed according to the parameter information and the target encryption algorithm.
In an exemplary embodiment of the present application, the determining, according to a plurality of pieces of first target feature information, portrait information corresponding to the file to be processed includes:
determining a weight value corresponding to each first target characteristic information according to the characteristic category corresponding to each first target characteristic information;
and determining the portrait information according to the first target feature information and the corresponding weight values of the first target feature information.
In an exemplary embodiment of the application, the determining a file type corresponding to the file to be processed according to the portrait information and the portrait library includes:
determining target portrait information from the portrait library based on the similarity between the portrait information and each piece of sample portrait information in the portrait library;
and determining the file type corresponding to the target portrait information as the file type corresponding to the file to be processed.
In an exemplary embodiment of the present application, the method further comprises:
constructing each sample portrait information in the portrait library, comprising:
acquiring a plurality of sample files;
extracting the characteristics of each sample file to obtain a plurality of second target characteristic information corresponding to each sample file;
determining a weight value corresponding to each second target characteristic information according to a plurality of second target characteristic information corresponding to each sample file;
and determining the sample portrait information of each sample file according to the corresponding second target characteristic information and the weight value corresponding to each second target characteristic information of each sample file.
In an exemplary embodiment of the present application, the method further comprises:
and clustering the sample files according to the file type corresponding to each sample file to obtain at least one sample group.
According to an aspect of the present application, there is provided a file processing apparatus including:
the acquisition module is used for acquiring a file to be processed;
the feature extraction module is used for performing feature extraction on the file to be processed to obtain a plurality of pieces of first target feature information, the pieces of first target feature information differing in feature category;
the first determining module is used for determining the portrait information corresponding to the file to be processed according to the plurality of pieces of first target feature information;
and the second determining module is used for determining the file type corresponding to the file to be processed according to the portrait information and the portrait library.
According to one aspect of the present application, there is provided an electronic device comprising a processor and a memory;
the processor is configured to perform the steps of any of the above methods by calling a program or instructions stored in the memory.
According to an aspect of the present application, there is provided a computer-readable storage medium storing a program or instructions for causing a computer to perform the steps of any of the above methods.
According to the file processing method provided by the present application, feature extraction is performed on the file to be processed for different feature categories, yielding a plurality of pieces of first target feature information of different feature categories, where each piece of first target feature information represents the features of the file to be processed in the corresponding feature category. Portrait information corresponding to the file to be processed (which can be understood as the overall feature of the file) is then determined from the plurality of pieces of first target feature information. The portrait information is then compared with the portrait library to determine the file type corresponding to the file to be processed, where the file type indicates whether the file to be processed is a malicious file and/or, if it is malicious, which malicious type it belongs to. Because the portrait information is derived from first target feature information of different feature categories, it covers features of a plurality of feature categories, and the file type is then determined by matching. No blacklist is used at any point in the detection, which addresses the problem that malicious code of an unknown type cannot be effectively detected.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a file processing method provided in this embodiment.
Fig. 2 is a block diagram of a file processing apparatus according to this embodiment.
Detailed Description
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
It should be noted that, in the case of no conflict, the features in the following embodiments and examples may be combined with each other; moreover, all other embodiments that can be derived by one of ordinary skill in the art from the embodiments disclosed herein without making any creative effort fall within the scope of the present disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to or other than one or more of the aspects set forth herein.
Referring to Fig. 1, according to an aspect of the present application, there is provided a file processing method, including the following steps:
Step S100, acquiring a file to be processed. The file to be processed may be input or selected by a worker, or may be acquired automatically by a preset acquisition program according to a certain rule.
Step S200, performing feature extraction on the file to be processed to obtain a plurality of pieces of first target feature information, the pieces of first target feature information differing in feature category.
In this embodiment, the first target feature information is feature information contained in the file to be processed, and may be obtained by performing feature extraction on the file header information, the code information, or other information of the file to be processed. Feature extraction in this embodiment is full-type: if the user has preset N feature types, feature extraction is performed for each of the N feature types. Because the file to be processed does not necessarily have features of every type, some pieces of first target feature information are allowed to be an empty set or an empty vector. Alternatively, in a possible implementation, the first target feature information may also refer to feature information of which part is an empty set or an empty vector, as long as each piece of target feature information has a corresponding feature category.
Step S300, determining portrait information corresponding to the file to be processed according to the plurality of pieces of first target feature information. The portrait information may be obtained by accumulating, converting, normalizing, or otherwise processing the plurality of pieces of first target feature information, or by treating each piece of first target feature information as a feature vector and embedding it into a preset feature matrix.
Step S400, determining the file type corresponding to the file to be processed according to the portrait information and the portrait library. Specifically, the portrait information may be matched against each piece of sample portrait information in the portrait library to determine the file type corresponding to the file to be processed.
The portrait library stores a plurality of pieces of sample portrait information. Each piece of sample portrait information has at least one specific corresponding file type and may also have a corresponding sample file. The file types and sample files may be stored directly in the portrait library and associated with the corresponding sample portrait information. Alternatively, they may be stored in another memory, with a mapping table recording the mapping between each piece of sample portrait information and its file type and/or sample file. Either way, the file type corresponding to each piece of sample portrait information can be determined through the association or the mapping table.
With the file processing method provided in this embodiment, feature extraction on the file to be processed yields a plurality of pieces of first target feature information of different feature categories, where each piece of first target feature information represents the features of the file to be processed in the corresponding feature category. Portrait information corresponding to the file to be processed (which can be understood as the overall feature of the file) is then determined from the plurality of pieces of first target feature information. The portrait information is then compared with the portrait library to determine the file type corresponding to the file to be processed, where the file type indicates whether the file to be processed is a malicious file and/or, if it is malicious, which malicious type it belongs to. Because the portrait information is derived from first target feature information of different feature categories, it covers features of a plurality of feature categories, and the file type is then determined by matching. No blacklist is used at any point in the detection, which addresses the problem that malicious code of an unknown type cannot be effectively detected.
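The flow of steps S200 to S400 can be sketched as follows. This is only an illustrative outline, not the implementation described by the application: the function name process_file, the Extractor type, and the build_portrait and match callables are all hypothetical, and the file is assumed to have been acquired already (step S100) as a byte string.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: one extractor per feature category (not defined by the source).
Extractor = Callable[[bytes], List[int]]


def process_file(
    data: bytes,
    extractors: Dict[str, Extractor],
    build_portrait: Callable[[Dict[str, List[int]]], Sequence[int]],
    match: Callable[[Sequence[int]], str],
) -> str:
    """Run steps S200 to S400 on a file that was acquired in step S100."""
    # S200: one extraction pass per feature category; empty results are dropped here.
    features: Dict[str, List[int]] = {}
    for category, extract in extractors.items():
        info = extract(data)
        if info:
            features[category] = info
    # S300: combine the per-category features into a single portrait.
    portrait = build_portrait(features)
    # S400: look the portrait up in the portrait library and return the file type.
    return match(portrait)
```

Passing the portrait builder and the matcher in as callables keeps the sketch independent of how portraits are encoded or how the portrait library is stored.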
In this embodiment, a malicious file is a file that, when present or running on an electronic device, may cause a specified negative impact on, or damage to, the electronic device. What counts as a specified negative impact may be defined by the impacts recorded in a preset database.
In an exemplary embodiment of the present application, the extracting features of the file to be processed to obtain a plurality of first target feature information includes:
Obtaining a plurality of feature extraction methods. The plurality of feature extraction methods refers to all configured feature extraction methods, and each extraction method corresponds uniquely to one feature type.
Sequentially using each feature extraction method to perform feature extraction on the file to be processed to obtain a plurality of pieces of candidate feature information. The number of pieces of candidate feature information equals the number of feature extraction methods.
Determining, among the plurality of pieces of candidate feature information, the candidate feature information that meets a set condition as the first target feature information.
In this embodiment, feature extraction of all types is performed on the file to be processed. After extraction, the number of pieces of candidate feature information equals the number of feature extraction methods, and each piece may take the form of a vector. As noted above, a file to be processed may lack features for certain feature types, so some candidate feature information may be an empty vector or an empty set.
In a specific implementation, the set condition may be that the candidate feature information is not an empty vector or an empty set, or that the data size of the candidate feature information is greater than a set threshold. Candidate feature information that meets the set condition is determined as first target feature information.
In this embodiment, the set condition is that the data size of the candidate feature information is greater than a set threshold, and candidate feature information meeting this condition is determined as first target feature information. The set threshold ranges from 1K to 10K and may be, for example, 1K. Note that, because the first target feature information must meet the set condition, and the set condition here requires a certain data size, this embodiment differs from the foregoing one in that the first target feature information cannot be an empty set or an empty vector.
Because full-type feature extraction is performed, extraction starts directly once the file to be processed is obtained, with no prior type judgment, and empty candidate feature vectors are then removed so that only first target feature information carrying useful information is retained. This avoids the inaccurate results that a prior type judgment could cause, while ensuring that first target feature information with useful features is retained.
If a piece of candidate feature information contains little information, it is unlikely to represent the features of the file to be processed in the corresponding feature type effectively, and determining it as first target feature information is likely to degrade the portrait. Requiring the data size of the candidate feature information to exceed the set threshold effectively prevents this.
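The two set conditions discussed above can be illustrated with a short sketch. It assumes that candidate feature information is held as byte strings keyed by feature type and that "1K" means 1024 bytes. Both are assumptions, since the text does not state the storage form or the unit, and the function names are hypothetical.

```python
from typing import Dict

# ASSUMPTION: the "1K" threshold of the embodiment is read as 1024 bytes.
SET_THRESHOLD_BYTES = 1024


def select_by_size(
    candidates: Dict[str, bytes], threshold: int = SET_THRESHOLD_BYTES
) -> Dict[str, bytes]:
    """Set condition of this embodiment: data size greater than the set threshold."""
    return {ftype: info for ftype, info in candidates.items() if len(info) > threshold}


def select_non_empty(candidates: Dict[str, bytes]) -> Dict[str, bytes]:
    """Alternative set condition: the candidate is not an empty set or empty vector."""
    return {ftype: info for ftype, info in candidates.items() if len(info) > 0}
```

Under the size condition, a candidate of exactly 1024 bytes is rejected, because the text requires the data size to be greater than the threshold.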
In this embodiment, the feature extraction methods may include, but are not limited to, an attribute knowledge portrait feature extraction method, a determination knowledge portrait feature extraction method, a technology stack knowledge portrait feature extraction method, a resource use knowledge portrait feature extraction method, a behavior knowledge portrait feature extraction method, an attack target knowledge portrait feature extraction method, an analyst knowledge portrait feature extraction method, and the like. The feature types of the extracted candidate feature information are, correspondingly, attribute knowledge portrait features, determination knowledge portrait features, technology stack knowledge portrait features, resource use knowledge portrait features, behavior knowledge portrait features, attack target knowledge portrait features, analyst knowledge portrait features, and the like.
Specifically, in an exemplary embodiment of the present application, the plurality of feature extraction methods includes an encryption feature extraction method.
The encryption feature extraction method specifically comprises the following steps:
Determining whether the code information of the file to be processed contains at least one set word. In this embodiment, the file to be processed may be a file whose code is plaintext, such as a script.
If so, determining a target encryption algorithm according to the set words contained in the code information.
If not, setting the encryption candidate feature information to an empty set or an empty vector.
Determining parameter information in the code information according to the target encryption algorithm.
Determining the encryption candidate feature information corresponding to the file to be processed according to the parameter information and the target encryption algorithm.
In actual implementation, there may be multiple set words, which may be determined by a worker through analysis of malicious files that have been obtained and whose code has been determined to be encrypted. Examples include "encrypt", "decrypt", "des", and "wincrypt".
The code information of the file to be processed is traversed using the set words to determine which set words it contains. The encryption algorithm corresponding to the contained set words, such as aes, md4, tkip, or xor, is then determined according to a preset comparison table.
Once the encryption algorithm is determined, parameter information (which may be character string information) in the encryption code can be extracted from the code information according to the parameter setting characteristics of that algorithm, which are analyzed in advance.
Hash values are then extracted for the algorithm name of the encryption algorithm and for each piece of parameter information, yielding a plurality of hash values, and a fuzzy hash is computed from these hash values to obtain the encryption candidate feature information corresponding to the file to be processed.
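The encryption feature extraction steps above might be sketched as follows. Several parts are stand-ins rather than what the application specifies: the contents of the comparison table are placeholders, parameter extraction is approximated by collecting quoted string literals of at least eight characters, SHA-256 is used as the per-item hash, and a 64-bit SimHash replaces the unspecified fuzzy hash calculation. All function and constant names are hypothetical.

```python
import hashlib
import re
from typing import Dict, List, Optional

# Placeholder comparison table (set word -> algorithm name); the real table is preset
# by analysts and is not given in the text.
SET_WORD_TO_ALGORITHM: Dict[str, str] = {
    "encrypt": "aes",
    "decrypt": "aes",
    "wincrypt": "aes",
    "des": "des",
}

# Stand-in for algorithm-specific parameter rules: quoted literals of 8+ characters.
STRING_LITERAL = re.compile(r"""["']([^"'\n]{8,})["']""")


def simhash64(digests: List[bytes]) -> int:
    """SimHash over the first 8 bytes of each digest (stand-in for a fuzzy hash)."""
    votes = [0] * 64
    for digest in digests:
        value = int.from_bytes(digest[:8], "big")
        for bit in range(64):
            votes[bit] += 1 if (value >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if votes[bit] > 0)


def encryption_candidate_feature(code: str) -> Optional[int]:
    """Return a 64-bit fingerprint, or None when no set word is present."""
    lowered = code.lower()
    # Plain substring search; short words such as "des" can match unrelated text.
    found = [word for word in SET_WORD_TO_ALGORITHM if word in lowered]
    if not found:
        return None  # corresponds to the empty set / empty vector case
    algorithm = SET_WORD_TO_ALGORITHM[found[0]]
    parameters = STRING_LITERAL.findall(code)
    digests = [
        hashlib.sha256(item.encode("utf-8")).digest()
        for item in [algorithm, *parameters]
    ]
    return simhash64(digests)
```

Because SimHash keeps a bit only when most of the input digests agree on it, two scripts that share the algorithm name and most parameters tend to produce fingerprints that differ in few bits, which is the property a fuzzy hash is meant to provide here.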
In an exemplary embodiment of the present application, the determining, according to a plurality of pieces of first target feature information, portrait information corresponding to the file to be processed includes:
determining a weight value corresponding to each first target characteristic information according to the characteristic category corresponding to each first target characteristic information;
and determining the portrait information according to the first target feature information and the corresponding weight values thereof.
Each feature type has a corresponding weight value that represents the degree of negative impact of that feature type: the larger the weight value, the greater the negative impact. The weight values are preset, so the weight value for each piece of first target feature information can be obtained directly from its feature category. The method for calculating the weight values is described later.
To determine the portrait information from the plurality of pieces of first target feature information and their weight values, the pieces of first target feature information may be weighted and summed according to the weight values and then subjected to dimensionality reduction. The portrait information can include at least part of each piece of first target feature information, so it comprehensively reflects the features of the file to be processed across the feature types.
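A minimal sketch of the weighted summation followed by dimensionality reduction is given below. It assumes that every piece of first target feature information is a numeric vector of the same length, and it uses block averaging as a simple stand-in for the dimensionality reduction step, which the text does not specify. The function name and parameters are hypothetical.

```python
from typing import Dict, List


def build_portrait(
    features: Dict[str, List[float]],
    weights: Dict[str, float],
    out_dim: int,
) -> List[float]:
    """Weighted sum of equal-length feature vectors, then block-average reduction."""
    if not features:
        return [0.0] * out_dim
    length = len(next(iter(features.values())))
    if any(len(vector) != length for vector in features.values()):
        raise ValueError("all feature vectors must have the same length")
    if length % out_dim != 0:
        raise ValueError("vector length must be a multiple of out_dim")
    # Weighted summation: each feature category contributes according to its weight.
    summed = [0.0] * length
    for category, vector in features.items():
        weight = weights[category]
        for index, value in enumerate(vector):
            summed[index] += weight * value
    # Dimensionality reduction (assumed method): average consecutive blocks.
    block = length // out_dim
    return [sum(summed[i * block:(i + 1) * block]) / block for i in range(out_dim)]
```

For example, with an encryption vector [1, 1, 0, 0] weighted 1.0 and a behavior vector [0, 2, 2, 0] weighted 0.5, the weighted sum is [1, 2, 1, 0] and reducing to two dimensions gives [1.5, 0.5].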
In an exemplary embodiment of the application, the determining a file type corresponding to the file to be processed according to the portrait information and the portrait library includes:
Determining target portrait information from the portrait library based on the similarity between the portrait information and each piece of sample portrait information in the portrait library.
Determining the file type corresponding to the target portrait information as the file type corresponding to the file to be processed.
In this embodiment, the similarity between the portrait information and a piece of sample portrait information may be determined by calculating the Hamming distance between them: the smaller the Hamming distance, the higher the similarity.
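The Hamming-distance matching can be sketched as below, assuming that portraits are encoded as fixed-width bit strings held in Python integers and that the portrait library is a mapping from sample portrait to file type. Both the encoding and the function names are assumptions made for illustration.

```python
from typing import Dict, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two portraits differ."""
    return bin(a ^ b).count("1")


def match_file_type(portrait: int, library: Dict[int, str]) -> Tuple[str, int]:
    """Return the file type of the closest sample portrait and its distance."""
    best = min(library, key=lambda sample: hamming_distance(portrait, sample))
    return library[best], hamming_distance(portrait, best)
```

With a library of {0b11110000: "trojan", 0b00001111: "ransomware"}, the portrait 0b11100000 differs from the first sample in one bit and from the second in seven, so the match is "trojan" at distance 1.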
In an exemplary embodiment of the present application, the method further comprises:
constructing each sample portrait information in the portrait library, comprising:
Obtaining a plurality of sample files. A sample file is a malicious file whose file type has already been determined through processing.
Performing feature extraction on each sample file to obtain a plurality of pieces of second target feature information corresponding to each sample file.
Determining a weight value corresponding to each piece of second target feature information according to the plurality of pieces of second target feature information corresponding to each sample file.
Determining the sample portrait information of each sample file according to the second target feature information of that sample file and the weight value corresponding to each piece of second target feature information.
Feature extraction for the sample files is the same full-type feature extraction used for the file to be processed. The candidate feature information extracted from each sample file is likewise filtered by a set condition when determining the second target feature information. The difference is that the set condition here is only that the candidate feature information is not an empty vector or an empty set. This allows as many features as possible to be analyzed when the weight values are subsequently determined, yielding more accurate weight values.
In this embodiment, the determining, according to the plurality of second target characteristic information corresponding to each sample file, a weight value corresponding to each second target characteristic information may specifically be:
evaluation was performed based on threat: specifically, all features of all sample files are extracted, feature screening is carried out according to core threat behaviors, and feature weight division is carried out according to the degree of closeness of association with the core behaviors. For example, the encryption Lesso malicious code has high weight of the corresponding encryption characteristics 'payment' and 'money', and the encryption characteristics aes and des corresponding to the Trojan horse virus have high weight.
Carrying out artificial probability statistics: the method specifically comprises the steps of extracting all characteristics of all sample files, counting the frequency of the characteristics appearing in all malicious samples, calibrating the weight according to the frequency, dividing frequency ranges corresponding to different weights according to a statistical rule, and finally obtaining a marking rule of the weight. For example, 1000 samples with encryption behaviors are selected, and the weight of the encryption algorithm characteristic information is 1 after 1000 times of occurrence; the weight is 0.8 when 800 times occur; weight 0.6 occurs 600 times; the weight is 0.2 below 400 occurrences.
Finally, the final weight value is obtained by weighting the results of the threat-based evaluation and the manual probability statistics, in combination with the credibility of the information.
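The application does not give a formula for combining the two evaluations with the credibility of the information. One possible reading, shown purely as an assumption, is a linear blend of the two weights scaled by a credibility factor; the parameter `alpha` and the value ranges are hypothetical.

```python
import math


def final_weight(threat_weight: float, frequency_weight: float,
                 credibility: float, alpha: float = 0.5) -> float:
    """Blend the threat-based and frequency-based weights, scaled by credibility.

    All inputs are assumed to lie in [0, 1]; alpha is the (hypothetical) share
    given to the threat-based evaluation.
    """
    for value in (threat_weight, frequency_weight, credibility, alpha):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must lie in [0, 1]")
    blended = alpha * threat_weight + (1.0 - alpha) * frequency_weight
    return credibility * blended


# A feature rated 1.0 by threat evaluation and 0.6 by frequency, fully credible.
w = final_weight(1.0, 0.6, credibility=1.0)
print(math.isclose(w, 0.8))
```

Other combinations (for example a product of the three factors) would fit the description equally well; the choice here is only meant to make the data flow concrete.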
In an exemplary embodiment of the present application, the method further comprises:
clustering the sample files according to the file type corresponding to each sample file to obtain at least one sample group;
and grouping the sample portrait information according to the sample files contained in each sample group to obtain a plurality of portrait groups, and generating the portrait library. The number of portrait groups is the same as the number of sample groups.
Each sample group or portrait group corresponds to a malicious behavior type or a malicious code family, such as a virus or a Trojan.
Thus, when portrait information in the portrait library needs to be queried, the corresponding sample group or portrait group can be located quickly. Each portrait group may also have corresponding group feature information, which may be generated from all of the portrait feature information within that group and which identifies the overall characteristics of the corresponding malicious behavior type or malicious code family. When the first target feature information is used to determine the target portrait information, its similarity to the group feature information of each portrait group may be determined first, and each portrait group whose group feature information has a similarity higher than a threshold is determined as a target portrait group. The first target feature information is then compared for similarity with each sample portrait information in each target portrait group to determine the target portrait information. This effectively reduces the amount of data processed when determining the target portrait information and increases the overall processing speed.
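The two-stage lookup described above (first against group feature information, then within the matching groups) can be sketched as follows. The application does not fix a similarity measure or say how group feature information is computed, so cosine similarity, the per-group mean vector, the threshold value, and all names below are assumptions for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_feature(portraits: List[Vector]) -> Vector:
    """Assumed group feature: the element-wise mean of the group's portraits."""
    return [sum(column) / len(portraits) for column in zip(*portraits)]


def find_target_portrait(query: Vector,
                         library: Dict[str, Dict[str, Vector]],
                         group_threshold: float = 0.8
                         ) -> Optional[Tuple[float, str, str]]:
    """Return (similarity, group name, sample id) of the best match, or None."""
    best = None
    for group_name, group in library.items():
        # Stage 1: skip groups whose group feature is not similar enough.
        if cosine(query, group_feature(list(group.values()))) <= group_threshold:
            continue
        # Stage 2: compare against each sample portrait in the target group.
        for sample_id, portrait in group.items():
            similarity = cosine(query, portrait)
            if best is None or similarity > best[0]:
                best = (similarity, group_name, sample_id)
    return best


library = {
    "ransomware": {"r1": [0.9, 0.1, 0.0], "r2": [0.6, 0.4, 0.2]},
    "trojan": {"t1": [0.1, 0.9, 0.3], "t2": [0.0, 0.8, 0.4]},
}
result = find_target_portrait([0.85, 0.15, 0.05], library)
print(result)
```

In this example the "trojan" group is rejected in the first stage, so only the two "ransomware" samples are compared individually.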
Referring to Fig. 2, according to an aspect of the present application, there is provided a file processing apparatus including:
the acquisition module is used for acquiring a file to be processed;
the characteristic extraction module is used for extracting the characteristics of the file to be processed to obtain a plurality of first target characteristic information; the feature categories of a plurality of pieces of first target feature information are different;
the first determining module is used for determining the portrait information corresponding to the file to be processed according to the first target characteristic information;
and the second determining module is used for determining the file type corresponding to the file to be processed according to the portrait information and the portrait library.
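The four modules listed above can be pictured as four small functions chained together. This is only a structural sketch: the keyword-counting extractors, the weight values, the fixed category order, and the use of Euclidean distance for the library lookup are all assumptions and are not taken from the application.

```python
import math
from typing import Callable, Dict, List

Extractor = Callable[[bytes], float]


def acquire(data: bytes) -> bytes:
    """Acquisition module: here the file content is simply passed in as bytes."""
    return data


def extract_features(data: bytes, extractors: Dict[str, Extractor]) -> Dict[str, float]:
    """Feature extraction module: one feature value per feature category."""
    return {category: fn(data) for category, fn in extractors.items()}


def build_portrait(features: Dict[str, float], weights: Dict[str, float]) -> List[float]:
    """First determining module: weighted feature vector in a fixed category order."""
    return [weights[c] * features[c] for c in sorted(features)]


def determine_file_type(portrait: List[float], library: Dict[str, List[float]]) -> str:
    """Second determining module: file type of the closest sample portrait."""
    return min(library, key=lambda file_type: math.dist(portrait, library[file_type]))


# Hypothetical extractors, weights, and portrait library.
extractors = {
    "encryption": lambda d: float(d.count(b"aes") + d.count(b"des")),
    "ransom_text": lambda d: float(d.count(b"payment") + d.count(b"money")),
}
weights = {"encryption": 0.8, "ransom_text": 1.0}
library = {"ransomware": [0.8, 2.0], "trojan": [1.6, 0.0]}

data = acquire(b"key=aes; send payment; money")
portrait = build_portrait(extract_features(data, extractors), weights)
file_type = determine_file_type(portrait, library)
print(portrait, file_type)
```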
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device according to this embodiment of the present application is described below. The electronic device is only an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present application.
The electronic device is in the form of a general purpose computing device. Components of the electronic device may include, but are not limited to: the at least one processor, the at least one memory, and a bus connecting the various system components (including the memory and the processor).
The memory stores program code executable by the processor, causing the processor to perform the steps according to the various exemplary embodiments of the present application described in the "exemplary methods" section above.
The memory may include readable media in the form of volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may further include Read Only Memory (ROM).
The memory may also include a program/utility having a set (at least one) of program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The bus may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface. Also, the electronic device may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) through a network adapter. The network adapter communicates with other modules of the electronic device over the bus. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, and may also be implemented by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the present application may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present application described in the "exemplary methods" section above of this specification, when the program product is run on the terminal device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the present application, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A file processing method, comprising:
acquiring a file to be processed;
extracting the characteristics of the file to be processed to obtain a plurality of first target characteristic information; the feature categories of a plurality of pieces of first target feature information are different;
determining the portrait information corresponding to the file to be processed according to the first target characteristic information;
and determining the file type corresponding to the file to be processed according to the portrait information and the portrait library.
2. The file processing method according to claim 1, wherein said extracting features of the file to be processed to obtain a plurality of first target feature information includes:
obtaining a plurality of feature extraction methods;
sequentially using each feature extraction method to extract features of the file to be processed to obtain a plurality of candidate feature information;
and determining the candidate characteristic information meeting set conditions in the plurality of candidate characteristic information as first target characteristic information.
3. The file processing method according to claim 2, wherein the plurality of feature extraction methods include an encryption feature extraction method;
the encryption feature extraction method comprises the following steps:
determining whether the code information of the file to be processed contains at least one set word;
if yes, determining a target encryption algorithm according to the set words contained in the code information;
determining parameter information in the code information according to the target encryption algorithm;
and determining the encryption candidate characteristic information corresponding to the file to be processed according to the parameter information and the target encryption algorithm.
4. The method for processing the file according to claim 1, wherein the determining the portrait information corresponding to the file to be processed according to the plurality of first target feature information comprises:
determining a weight value corresponding to each first target characteristic information according to the characteristic category corresponding to each first target characteristic information;
and determining the portrait information according to the first target feature information and the corresponding weight values of the first target feature information.
5. The method of claim 1, wherein determining the file type corresponding to the file to be processed according to the portrait information and the portrait library comprises:
determining target portrait information from the portrait library based on the similarity of the portrait information to each sample portrait information in the portrait library;
and determining the file type corresponding to the target portrait information as the file type corresponding to the file to be processed.
6. The file processing method according to claim 1, characterized in that the method further comprises:
constructing each sample portrait information in the portrait library, comprising:
obtaining a plurality of sample files;
extracting the characteristics of each sample file to obtain a plurality of second target characteristic information corresponding to each sample file;
determining a weight value corresponding to each second target characteristic information according to a plurality of second target characteristic information corresponding to each sample file;
and determining the sample portrait information of each sample file according to the corresponding second target characteristic information and the weight value corresponding to each second target characteristic information of each sample file.
7. The file processing method according to claim 6, further comprising:
and clustering the sample files according to the file type corresponding to each sample file to obtain at least one sample group.
8. A file processing apparatus, characterized by comprising:
the acquisition module is used for acquiring a file to be processed;
the characteristic extraction module is used for extracting the characteristics of the file to be processed to obtain a plurality of first target characteristic information; the feature categories of a plurality of pieces of first target feature information are different;
the first determining module is used for determining the portrait information corresponding to the file to be processed according to the first target characteristic information;
and the second determining module is used for determining the file type corresponding to the file to be processed according to the portrait information and the portrait library.
9. An electronic device comprising a processor and a memory;
the processor is adapted to perform the steps of the method of any one of claims 1 to 7 by calling a program or instructions stored in the memory.
10. A computer-readable storage medium, characterized in that it stores a program or instructions for causing a computer to perform the steps of the method according to any one of claims 1 to 7.
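The encryption feature extraction steps recited in claim 3 (check for set words, pick the target encryption algorithm, locate its parameter information, and build the candidate feature) can be sketched as follows. The set words, the word-to-algorithm mapping, and the parameter patterns are hypothetical placeholders; the claims do not specify them.

```python
import re
from typing import Dict, List, Optional, Union

# Hypothetical set words mapped to the encryption algorithm they indicate.
SET_WORDS = {"aes": "AES", "rijndael": "AES", "des": "DES"}

# Hypothetical per-algorithm patterns used to locate parameter information.
PARAM_PATTERNS = {
    "AES": re.compile(r"key_?(?:size|len)\s*=\s*(128|192|256)", re.IGNORECASE),
    "DES": re.compile(r"key_?(?:size|len)\s*=\s*(56|64)", re.IGNORECASE),
}

Feature = Dict[str, Union[str, List[str]]]


def extract_encryption_feature(code: str) -> Optional[Feature]:
    """Return an encryption candidate feature, or None if no set word is present."""
    # Step 1: does the code information contain at least one set word?
    tokens = set(re.findall(r"[a-z0-9]+", code.lower()))
    hits = [word for word in SET_WORDS if word in tokens]
    if not hits:
        return None
    # Step 2: determine the target encryption algorithm from the set word.
    algorithm = SET_WORDS[hits[0]]
    # Step 3: determine parameter information according to that algorithm.
    parameters = PARAM_PATTERNS[algorithm].findall(code)
    # Step 4: combine algorithm and parameters into the candidate feature.
    return {"algorithm": algorithm, "parameters": parameters}


feature = extract_encryption_feature("cipher = AES_new(key, key_len=256)")
print(feature)
```

Matching on whole tokens rather than substrings is a design choice made here so that, for instance, the word "describes" is not mistaken for the set word "des".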
CN202210593366.9A 2022-05-27 2022-05-27 File processing method and device, electronic equipment and storage medium Pending CN114925365A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210593366.9A CN114925365A (en) 2022-05-27 2022-05-27 File processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114925365A true CN114925365A (en) 2022-08-19

Family

ID=82810461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210593366.9A Pending CN114925365A (en) 2022-05-27 2022-05-27 File processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114925365A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117034260A (en) * 2023-10-08 2023-11-10 深圳安天网络安全技术有限公司 Event judgment information generation method and device, medium and electronic equipment
CN117034260B (en) * 2023-10-08 2024-01-26 深圳安天网络安全技术有限公司 Event judgment information generation method and device, medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination