CN108270783B - Data processing method and device, electronic equipment and storage medium - Google Patents

Data processing method and device, electronic equipment and storage medium

Info

Publication number
CN108270783B
Authority
CN
China
Prior art keywords
data
content
state
file
type
Legal status
Active
Application number
CN201810034333.4A
Other languages
Chinese (zh)
Other versions
CN108270783A
Inventor
王磊
Current Assignee
New H3C Security Technologies Co Ltd
Original Assignee
New H3C Security Technologies Co Ltd
Application filed by New H3C Security Technologies Co Ltd
Priority to CN201810034333.4A
Publication of CN108270783A
Application granted
Publication of CN108270783B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiments of the application provide a data processing method and apparatus. The method includes: receiving a data message that includes data content; determining target data content that includes the data content; extracting the first preset amount of data from the target data content; matching the extracted content against preset feature codes to determine which preset feature codes it includes; determining, according to the correspondence between preset feature codes and file types, the file type corresponding to those feature codes as the target data type; and processing, according to the target data type, the data message and the other data messages that carry other data content belonging to the same original data packet as the data content. Applying the technical solution provided by the embodiments improves the accuracy of file type identification and the security of the network.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a data processing method and apparatus.
Background
File filtering is a deep packet inspection technology built on an application-layer inspection engine; it is a security protection mechanism that filters the files transmitted by a device according to file type information.
Currently, file filtering is implemented as follows: the file type of a file in a data message is determined from its extension, and the data message is then processed according to the feature rule matching that file type. For example, to restrict intranet users from sending docx and pptx files to extranet devices, if the extension of the file in a data message is detected to be docx or pptx, the file is determined to be a docx or pptx file and may be discarded, which prevents it from being sent to the extranet device.
An extension carries little information, so determining the file type from it is inaccurate; moreover, a user can bypass the security check simply by changing the file's extension, which leaves the network poorly protected.
Disclosure of Invention
An object of the embodiments of the present application is to provide a data processing method and apparatus, so as to improve accuracy of file type identification and improve security of a network. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present application discloses a data processing method, including:
receiving a data message, where the data message includes data content;
determining target data content, where the target data content includes the data content;
extracting the first preset amount of data from the target data content;
matching the extracted content against preset feature codes, and determining which preset feature codes the extracted content includes;
determining, according to the correspondence between preset feature codes and file types, the file type corresponding to the preset feature codes included in the extracted content as the target data type; and
processing the data message and other data messages according to the target data type, where the other data messages include other data content belonging to the same original data packet as the data content.
In a second aspect, an embodiment of the present application discloses a data processing apparatus, including:
a receiving unit, configured to receive a data message, where the data message includes data content;
a first determining unit, configured to determine target data content, where the target data content includes the data content;
an extraction unit, configured to extract the first preset amount of data from the target data content;
a matching unit, configured to match the extracted content against preset feature codes and determine which preset feature codes the extracted content includes;
a second determining unit, configured to determine, according to the correspondence between preset feature codes and file types, the file type corresponding to the preset feature codes included in the extracted content as the target data type; and
a processing unit, configured to process the data message and other data messages according to the target data type, where the other data messages include other data content belonging to the same original data packet as the data content.
In a third aspect, an embodiment of the present application discloses an electronic device, which includes a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions capable of being executed by the processor, and the processor is caused by the machine-executable instructions to implement the above data processing method.
In a fourth aspect, embodiments of the present application disclose a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the above-described data processing method.
In the embodiments of the present application, because the data content carries file characteristic information, the file type is determined from the data content rather than from the extension alone, which improves the accuracy of file type identification. Of course, a product or method implementing the present application need not achieve all of the above advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a first schematic flowchart of a data processing method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a state machine according to an embodiment of the present application;
Fig. 3 is a second schematic flowchart of a data processing method according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a data processing scenario according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
At present, the file type of a file in a data message is mainly determined from its extension, and the data message is then processed according to the feature rule matching that file type. An extension carries little information, so determining the file type from it is inaccurate; moreover, a user can bypass the security check simply by changing the file's extension, which leaves the network poorly protected.
To improve the accuracy of file type identification and the security of the network, the embodiments of the present application provide a data processing method and apparatus. The method can be applied to network devices such as firewalls, switches and routers; the embodiments of the present application do not limit the type of network device.
Referring to fig. 1, fig. 1 is a first flowchart of a data processing method provided in an embodiment of the present application, where the method may be applied to a device equipped with a Deep Packet Inspection (DPI) function, such as a firewall device.
Specifically, the data processing method includes:
Step 101: Receive a data message, which includes data content.
When a network device sends a data message, the original data packet to which the carried data content belongs may be of a non-file type or of a file type.
In addition, if the size of the original data packet exceeds the Maximum Transmission Unit (MTU) that can be supported by the link, the network device splits the original data packet into a plurality of sub-packets, and each sub-packet is encapsulated in one data packet and sent to other devices. At this time, the data packet received by the other device includes a part of data content of the original data packet.
If the size of the original data packet does not exceed the MTU that can be supported by the link, the network device encapsulates the original data packet in a data packet and sends the data packet to other devices. At this time, the data packet received by the other device includes a complete original data packet.
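The MTU-based split described above can be sketched as follows. This is a simplified illustration rather than the device's implementation: it assumes the whole MTU is available as payload (real IP fragmentation also subtracts header bytes and aligns fragment offsets to 8-byte units), and the function name `split_by_mtu` is hypothetical.

```python
from typing import List


def split_by_mtu(original: bytes, mtu: int) -> List[bytes]:
    """Split an original data packet into sub-packets of at most `mtu` bytes.

    Simplified model (assumption): header overhead is ignored and the whole
    MTU is treated as payload capacity.
    """
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    if len(original) <= mtu:
        # Fits in one data message: the message carries the complete packet.
        return [original]
    # Otherwise each data message carries one consecutive slice.
    return [original[i:i + mtu] for i in range(0, len(original), mtu)]


if __name__ == "__main__":
    parts = split_by_mtu(b"A" * 3500, 1500)
    print([len(p) for p in parts])  # [1500, 1500, 500]
```

Only the first slice is the "first data message" that the following steps start from; the remaining slices are the "other data messages" of the same original data packet.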
In the embodiments of the present application, if an original data packet is split into multiple data messages, the received data message is the first data message of that original data packet.
Step 102: Determine the target data content, which includes the data content of the data message.
Data messages may be transmitted over different protocols. After a data message is received, it is first parsed according to its transmission protocol to obtain the data content, and the target data content is then determined from the parsed data content.
In one embodiment of the present application, the data content included in the data packet may be determined as the target data content.
In another embodiment of the present application, the amount of data content carried by each data message is not fixed. To avoid failing to determine the data type accurately because the first-received data content is smaller than the preset data amount, it may first be determined whether the data amount of the data content in the received data message is not less than the preset data amount.
If it is not less than the preset data amount, the data content can be directly determined as the target data content.
If the data amount of the data content is less than the preset data amount, then once other data messages carrying other data content belonging to the same original data packet as this data content are received, the data content and the other data content are together determined as the target data content. That is, in addition to the data content of the data message, the target data content may include other data content, carried by at least one other data message, that belongs to the same original data packet.
To identify the data type quickly and reduce the storage space occupied, each time another data message is received, it may be determined whether the combined data amount of the first-received data content and the other data content received so far is not less than the preset data amount. If so, the data content and the other data content are determined as the target data content; if not, further data messages continue to be received.
In an embodiment of the present application, if the data amount of the data content is less than the preset data amount, a timer may be started. If other data messages carrying other data content belonging to the same original data packet as the first-received data content arrive before the timer expires, the data content and the other data content are determined as the target data content. If the timer expires without such messages arriving, the data content received so far is determined as the target data content. This prevents the target data content from being held indefinitely while waiting for further data messages, which would occupy device storage and delay data processing.
For example, let the preset data amount be Q. An original data packet X is split into 5 parts of data content, which are encapsulated in 5 data messages: message 1, message 2, message 3, message 4 and message 5.
After the network device receives message 1, if the data amount of the data content in message 1 is not less than Q, the data content of message 1 is determined as the target data content.
If the data amount of the data content in message 1 is less than Q, a timer is started to wait for other data messages.
If the timer expires, the data content of message 1 is determined as the target data content.
If the network device receives message 2 before the timer expires, it determines whether the combined data amount of the data content in messages 1 and 2 is not less than Q.
If it is not less than Q, the data content of message 1 and message 2 is determined as the target data content.
If the combined amount is less than Q, the timer is reset and restarted, and the device continues to wait for other data messages.
If the timer then expires, the data content of message 1 and message 2 is determined as the target data content.
If the network device receives message 3 before the timer expires, it determines whether the combined data amount of the data content in messages 1, 2 and 3 is not less than Q, and proceeds in the same way as when message 2 was received.
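The accumulation and timer logic of this example might be sketched as below. The class `TargetContentAccumulator`, its methods, and the explicit `now` timestamps (used in place of a real timer so the behaviour is easy to test) are all hypothetical; the sketch only assumes that the caller invokes `poll` when the timer fires.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TargetContentAccumulator:
    """Collects the data content of one original data packet until the
    target data content is ready (hypothetical helper)."""

    preset_amount: int   # preset data amount Q, in bytes
    timeout: float       # timer length, in seconds
    done: bool = field(default=False, init=False)
    _parts: List[bytes] = field(default_factory=list, init=False)
    _deadline: Optional[float] = field(default=None, init=False)

    def feed(self, content: bytes, now: float) -> Optional[bytes]:
        """Add the data content of a newly received data message.

        Returns the target data content once the accumulated amount is not
        less than Q; otherwise (re)starts the timer and returns None.
        """
        if self.done:
            return None                       # target already determined
        self._parts.append(content)
        total = b"".join(self._parts)
        if len(total) >= self.preset_amount:
            self.done, self._deadline = True, None
            return total
        self._deadline = now + self.timeout   # reset and restart the timer
        return None

    def poll(self, now: float) -> Optional[bytes]:
        """Called when the timer fires: whatever has arrived becomes the target."""
        if not self.done and self._deadline is not None and now >= self._deadline:
            self.done, self._deadline = True, None
            return b"".join(self._parts)
        return None


if __name__ == "__main__":
    acc = TargetContentAccumulator(preset_amount=12, timeout=0.5)
    print(acc.feed(b"PK\x03\x04", now=0.0))                  # None
    print(acc.feed(b"\x14\x00\x06\x00", now=0.1))            # None
    print(len(acc.feed(b"\x08\x00\x00\x00!\x00", now=0.2)))  # 14
```

With Q = 12, messages of 4, 4 and 6 bytes yield `None`, `None` and then all 14 accumulated bytes; if only a 3-byte message arrives and the timer expires, `poll` returns those 3 bytes.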
Step 103: Extract the first preset amount of data from the target data content.
Here, the preset data amount may be set from an empirical value, or chosen by weighing the accuracy against the efficiency of determining the data type. For example, extracting feature codes from the first 12 bytes of the data content achieves a file type identification accuracy of up to 90%, so the preset data amount can be set to 12 bytes.
In the embodiments of the present application, the file type is determined from the first preset amount of data of the target data content, so only that amount needs to be cached, rather than the entire target data content or all data content of the same original data packet, which saves storage space. In addition, because the file type is determined from this prefix rather than from the entire target data content or from all data content of the original data packet, file identification efficiency is effectively improved.
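The storage saving comes from caching only a bounded prefix. A minimal sketch, assuming a hypothetical `PrefixBuffer` class and the 12-byte example value; bytes beyond the prefix are never copied.

```python
class PrefixBuffer:
    """Caches only the first `limit` bytes of the target data content
    (hypothetical helper; 12 bytes is the example preset amount)."""

    def __init__(self, limit: int = 12) -> None:
        self.limit = limit
        self._buf = bytearray()

    def feed(self, content: bytes) -> None:
        room = self.limit - len(self._buf)
        if room > 0:
            # Copy at most `room` bytes; anything beyond the prefix is dropped.
            self._buf += content[:room]

    @property
    def full(self) -> bool:
        """True once the preset amount has been collected."""
        return len(self._buf) >= self.limit

    def extracted(self) -> bytes:
        """The content handed to feature code matching."""
        return bytes(self._buf)


if __name__ == "__main__":
    pb = PrefixBuffer(12)
    pb.feed(b"0123456789")
    pb.feed(b"abcdef")
    print(pb.extracted())  # b'0123456789ab'
```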
Step 104: Match the extracted content against the preset feature codes, and determine which preset feature codes the extracted content includes.
Here, a feature code is a characteristic field of the data content. The preset feature codes may include feature codes for different types of files, for example the feature code "rar" for compressed files and the feature code "PK [Content_Types].xml" for docx files.
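Matching the extracted content against many feature codes at once is a multi-pattern search. The following is a sketch of a standard Aho-Corasick matcher; the state machine description below names the MN algorithm (a lightweight AC variant) without specifying it, so this plain version is only a stand-in, and the byte patterns used in the usage lines are illustrative feature codes, not the patent's rule set.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Automaton = Tuple[List[Dict[int, int]], List[int], List[List[bytes]]]


def build_automaton(patterns: Iterable[bytes]) -> Automaton:
    """Build an Aho-Corasick automaton over byte-string feature codes."""
    goto: List[Dict[int, int]] = [{}]   # goto[state][byte] -> next state
    fail: List[int] = [0]               # failure link of each state
    out: List[List[bytes]] = [[]]       # feature codes recognised at each state
    for pat in patterns:
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(pat)
    # Breadth-first pass; depth-1 states keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]  # inherit shorter suffix matches
    return goto, fail, out


def find_feature_codes(data: bytes, automaton: Automaton) -> Set[bytes]:
    """Return every feature code that occurs in `data`, in a single pass."""
    goto, fail, out = automaton
    found: Set[bytes] = set()
    state = 0
    for b in data:
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        found.update(out[state])
    return found


if __name__ == "__main__":
    codes = build_automaton([b"Rar!", b"PK\x03\x04", b"[Content_Types].xml"])
    print(sorted(find_feature_codes(b"PK\x03\x04\x14\x00[Content_Types].xml", codes)))
```

The automaton is built once from the preset feature codes and then scans each extracted prefix in time linear in its length, regardless of how many codes are configured.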
Step 105: According to the correspondence between preset feature codes and file types, determine the file type corresponding to the preset feature codes included in the extracted content as the target data type.
In an embodiment of the present application, to determine the file type, files may be divided into several classes and the feature codes of each class determined separately. The classes may include:
(1) Basic class files: the header of the file content carries a distinctive feature code, for example compressed files;
(2) General class files: the header of the file content carries a generic feature code, for example docx and pptx files; the file type cannot be determined accurately from the header feature code alone, so the content of other fields must also be inspected;
(3) Script class files: when such a file is transmitted, the protocol payload is the program itself, so identification can take into account the program's language style and its key fields or code fields, and the specific type of the file, such as a Perl script, is determined from these combined feature fields;
(4) Unknown class files: these files have no clear features and must be isolated and handled separately. Common examples are picture files, audio files, WIN executable files and Linux executable files.
After the feature codes of each class of file are extracted, the correspondence between those feature codes and the file types of that class is stored. The feature rule matching the preset feature codes found in the extracted content is then derived from this correspondence, which determines the file type of the received data content.
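A correspondence table with an extra field check for general class files could look like the sketch below. Everything in the tables is an illustrative assumption: the codes, the `word/` and `ppt/` member-name prefixes used to tell docx from pptx inside a ZIP container, the `"zip"` fallback, and the function name `derive_file_type`.

```python
from typing import Set

# Illustrative correspondence between preset feature codes and file types,
# tagged with the file class from the list above.
FEATURE_TABLE = {
    b"Rar!": ("basic", "rar"),
    b"\x1f\x8b": ("basic", "gzip"),
    b"PK\x03\x04": ("general", None),       # shared by docx, pptx, zip, ...
    b"#!/usr/bin/perl": ("script", "perl"),
}

# Hypothetical secondary fields that resolve the generic ZIP header code.
GENERAL_FIELDS = {b"word/": "docx", b"ppt/": "pptx"}


def derive_file_type(matched: Set[bytes], content: bytes) -> str:
    """Map matched feature codes (keys of FEATURE_TABLE) to the target data type."""
    # Prefer the longest, i.e. most specific, matched code.
    for code in sorted(matched, key=len, reverse=True):
        file_class, file_type = FEATURE_TABLE[code]
        if file_class != "general":
            return file_type
        # General class: the header code is shared, so inspect other fields.
        for marker, resolved in GENERAL_FIELDS.items():
            if marker in content:
                return resolved
        return "zip"
    return "non-file"   # no preset feature code matched
```

For example, `derive_file_type({b"PK\x03\x04"}, content)` returns `"docx"` only when `content` also contains `word/`; a plain ZIP falls back to `"zip"`.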
Step 106: Process the data message and the other data messages according to the target data type. The other data messages include other data content belonging to the same original data packet as the data content included in the data message.
Here, processing the data message and the other data messages may consist of inputting them into a deep packet inspection engine for inspection.
In an embodiment of the present application, if the target data type is determined to be a compressed file, this indicates that other files are nested in the target data content. The target data content is then decompressed, the first preset amount of data is extracted from the decompressed content, and execution continues from step 104.
Here, to identify the file type quickly, only the content extracted from the target data content may be decompressed, after which step 105 may be performed again.
If the target data type is determined not to be a compressed file, the data message and the other data messages are processed according to the target data type.
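The nested-compression branch can be sketched with gzip standing in for the compressed file type, since it can be handled with the Python standard library. Instead of decompressing everything, `zlib.decompressobj(...).decompress(data, max_length)` produces only the first preset amount of decompressed output and tolerates a truncated stream. The `%PDF-` code, the type names and the depth limit are hypothetical.

```python
import gzip
import zlib

PRESET_AMOUNT = 12           # example preset data amount, in bytes
GZIP_MAGIC = b"\x1f\x8b"     # gzip header, standing in for a compressed-file code


def first_decompressed(target: bytes, limit: int) -> bytes:
    """Decompress at most `limit` bytes of a possibly truncated gzip stream."""
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)   # 16 + MAX_WBITS: gzip wrapper
    try:
        return d.decompress(target, limit)
    except zlib.error:
        return b""                                # corrupt stream


def identify(target: bytes, depth: int = 0, max_depth: int = 3) -> str:
    """Return the target data type, re-running identification on nested content."""
    head = target[:PRESET_AMOUNT]
    if head.startswith(GZIP_MAGIC):
        inner = first_decompressed(target, PRESET_AMOUNT) if depth < max_depth else b""
        if not inner:
            return "compressed"       # cannot look inside: keep the outer type
        return identify(inner, depth + 1, max_depth)  # back to feature matching
    if head.startswith(b"%PDF-"):     # hypothetical feature code for a nested file
        return "pdf"
    return "non-file"


if __name__ == "__main__":
    nested = gzip.compress(b"%PDF-1.7\n" + b"x" * 1000)
    print(identify(nested))  # pdf
```

The depth limit guards against deliberately deep nesting; stopping at "compressed" when the inner content cannot be read is a design choice of this sketch.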
In an embodiment of the present application, if matching shows that the extracted content does not include any preset feature code, the target data content may be determined to be non-file content; the non-file type is then taken as the target data type, and the data message and the other data messages are processed according to it.
For example, the preset rule is: the transmission of docx, pptx files is prohibited, and non-files and other types of files are allowed. If the target data type of the data content Y is determined to be the pptx file type, discarding the data message and other data messages including other data contents belonging to the same original data packet as the data content Y; and if the target data type is determined to be a non-file type, transmitting the data message and other data messages including other data contents belonging to the same original data packet as the data content Y.
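The preset rule in this example amounts to a small policy table. A sketch, with a hypothetical `apply_policy` function that returns the messages to forward for one original data packet:

```python
from typing import List

BLOCKED_TYPES = {"docx", "pptx"}   # preset rule: forbid docx and pptx files


def apply_policy(target_type: str, messages: List[bytes]) -> List[bytes]:
    """Return the messages to forward for one original data packet.

    A blocked type discards the data message and all other data messages of
    the same original data packet; non-files and other types pass through.
    """
    if target_type in BLOCKED_TYPES:
        return []
    return list(messages)
```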
In an embodiment of the present application, to facilitate determining the data type and processing the data message, data type identification and data message processing may be performed according to the state machine shown in Fig. 2.
When the state machine is in the Initial state, the received data message is parsed to obtain the data content, and the first preset amount of data is extracted from it; the state then transitions to the MNMatch (MN algorithm matching) state. The MN algorithm is a lightweight variant of the AC (Aho-Corasick) algorithm.
When the state machine is in the MNMatch state, the extracted content is matched against the preset feature codes. If no preset feature code matches, the file type of the data content is determined to be the non-file type, the data message and the other data messages carrying other data content belonging to the same original data packet are processed according to the non-file type, and the state transitions to FINI (Finish). If a preset feature code matches, the state transitions to Sigdeduce (feature derivation).
When the state machine is in the Sigdeduce state, the feature rule matching the preset feature codes found in the extracted content is derived to determine the file type of the data content. If the file type is determined, the data message and the other data messages belonging to the same original data packet are processed according to that file type, and the state transitions to Fileproc (file processing). If the derivation is not yet complete, the state is unchanged and derivation continues when the next data message arrives. If the derivation fails, the file type of the data content is determined to be the non-file type, the messages are processed according to the non-file type, and the state transitions to FINI.
When the state machine is in the Fileproc state, the file callback function for the determined file type is called, and the data message and the other data messages belonging to the same original data packet are passed to the corresponding service modules for processing. The state machine remains in Fileproc until the original data packet ends, and then transitions to FINI.
The service modules include an AV (antivirus) module, an IPS (Intrusion Prevention System) module, an FW (firewall) module, and the like.
When the state machine is in the FINI state, data processing ends.
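The Fig. 2 state machine might be modelled as follows for one original data packet. The callback signatures, the `PENDING` marker for an unfinished derivation, and holding messages until the type is known are assumptions made for this sketch; treating a still-pending derivation at the end of the packet as a failure is likewise an assumption.

```python
from enum import Enum
from typing import Callable, List, Optional, Set

PENDING = "pending"   # hypothetical marker: derivation needs more data


class State(Enum):
    INITIAL = "Initial"
    MNMATCH = "MNMatch"
    SIGDEDUCE = "Sigdeduce"
    FILEPROC = "Fileproc"
    FINI = "FINI"


class FileTypeStateMachine:
    """Sketch of the Fig. 2 state machine for one original data packet."""

    def __init__(self,
                 match: Callable[[bytes], Set[bytes]],
                 deduce: Callable[[Set[bytes], bytes], Optional[str]],
                 process: Callable[[str, bytes], None],
                 preset_amount: int = 12) -> None:
        self.match = match            # extracted content -> matched feature codes
        self.deduce = deduce          # (codes, content) -> file type, PENDING or None
        self.process = process        # (file type, message) -> service modules
        self.preset_amount = preset_amount
        self.state = State.INITIAL
        self.file_type: Optional[str] = None
        self._codes: Set[bytes] = set()
        self._seen = b""              # content seen so far (unbounded in this sketch)
        self._held: List[bytes] = []  # messages waiting for a file type

    def on_message(self, content: bytes, last: bool = False) -> State:
        if self.state is State.FINI:
            return self.state         # identification over; caller uses self.file_type
        self._held.append(content)
        self._seen += content
        if self.state is State.INITIAL:
            self.state = State.MNMATCH
        if self.state is State.MNMATCH:
            self._codes = self.match(self._seen[:self.preset_amount])
            if not self._codes:
                return self._finish("non-file")
            self.state = State.SIGDEDUCE
        if self.state is State.SIGDEDUCE:
            result = self.deduce(self._codes, self._seen)
            if result is None or (result == PENDING and last):
                return self._finish("non-file")    # derivation failed
            if result == PENDING:
                return self.state                  # retry on the next message
            self.file_type = result
            self.state = State.FILEPROC
        # Fileproc: pass held messages to the service modules for this type.
        self._flush()
        if last:
            self.state = State.FINI                # original data packet ended
        return self.state

    def _flush(self) -> None:
        for msg in self._held:
            self.process(self.file_type, msg)
        self._held.clear()

    def _finish(self, file_type: str) -> State:
        self.file_type = file_type
        self._flush()
        self.state = State.FINI
        return self.state
```

A RAR-like packet goes Initial, MNMatch, Sigdeduce, Fileproc and, on its last message, FINI; content with no matching feature code goes straight to FINI as non-file.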
By applying the embodiments of the present application, the file type is determined from the data content, which carries file characteristic information, rather than from the extension alone, so the accuracy of file type identification is improved.
In an embodiment of the present application, referring to a second flowchart of the data processing method shown in fig. 3, based on fig. 1, the method includes:
Step 301: Receive a data message, which includes data content.
Step 302: Determine the target data content, which includes the data content of the data message.
Step 303: Extract the first preset amount of data from the target data content.
Step 304: Match the extracted content against the preset feature codes, and determine which preset feature codes the extracted content includes.
Step 305: According to the correspondence between preset feature codes and file types, determine the file type corresponding to the preset feature codes included in the extracted content as the target data type.
Steps 301-305 are the same as steps 101-105.
Step 306: Determine the target deep packet inspection engine corresponding to the target data type according to the pre-stored correspondence between data types and deep packet inspection engines.
In the embodiments of the present application, the acquired feature rules are divided by data type, and for each data type the feature rules of that type are compiled into a deep packet inspection engine for that type. For example, the feature rules for compressed files are compiled into deep packet inspection engine 1, and the feature rules for script files are compiled into deep packet inspection engine 2.
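Per-type engines can be illustrated by grouping rules by data type and compiling each group separately. In this sketch Python regular expressions stand in for a real DPI rule compiler, and the rules, rule IDs and the `DpiEngine` class are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical feature rules: (data type, rule id, byte regex).
RULES: List[Tuple[str, str, bytes]] = [
    ("compressed", "R1", rb"Rar!\x1a\x07"),
    ("compressed", "R2", rb"PK\x03\x04"),
    ("script", "R3", rb"#!/usr/bin/perl"),
    ("script", "R4", rb"system\s*\("),
]


class DpiEngine:
    """An engine compiled only from the feature rules of one data type."""

    def __init__(self, rules: List[Tuple[str, bytes]]) -> None:
        self._rules = [(rule_id, re.compile(pattern)) for rule_id, pattern in rules]

    def match(self, message: bytes) -> List[str]:
        """Return the IDs of the feature rules this message matches."""
        return [rule_id for rule_id, rx in self._rules if rx.search(message)]


def build_engines(rules: List[Tuple[str, str, bytes]]) -> Dict[str, DpiEngine]:
    """Group rules by data type and compile one engine per type."""
    grouped: Dict[str, List[Tuple[str, bytes]]] = defaultdict(list)
    for data_type, rule_id, pattern in rules:
        grouped[data_type].append((rule_id, pattern))
    return {data_type: DpiEngine(r) for data_type, r in grouped.items()}
```

Steps 306 and 307 then reduce to `build_engines(RULES)[target_type].match(message)`: only the rules of the target data type are evaluated for each message.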
Once the target data type is determined, the corresponding target deep packet inspection engine can be determined.
Step 307: Input the data message and the other data messages into the target deep packet inspection engine, and determine the feature rules that each of them matches.
The target deep packet inspection engine comprises characteristic rules corresponding to the data types of the target data contents, and the data messages and other data messages are input into the target deep packet inspection engine, so that the characteristic rules respectively matched with the data messages and other data messages can be determined.
Step 308: Process the data message and the other data messages according to their respective matched feature rules. The other data messages include other data content belonging to the same original data packet as the data content included in the data message.
In the embodiments of the present application, multiple deep packet inspection engines are built by data type. Each engine contains only the feature rules of one data type, far fewer than the feature rules of all data types. The target engine is selected according to the data type and determines which feature rules the data message and the other data messages match. Compared with a single main engine compiled from the feature rules of all data types, this effectively speeds up detection and therefore improves data processing efficiency.
As shown in Fig. 4, in this data processing scenario 5 deep packet inspection engines are set in the network device: engine 1 for basic class files, engine 2 for WIN executable class files, engine 3 for Linux executable class files, engine 4 for picture class files, and engine 5 for script class files.
Suppose the network device receives a flow that, as shown in Fig. 4, is divided into 6 segments: a non-file, a WIN executable file, an unidentifiable file, a picture file, a script file and a Linux executable file. The non-file and the unidentifiable file are input into engine 1, the WIN executable file into engine 2, the picture file into engine 4, the script file into engine 5, and the Linux executable file into engine 3. The 5 engines determine the feature rules matched by each segment, and the flow is processed accordingly. Identification efficiency, and with it device performance, is thus improved without reducing identification accuracy.
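The routing in the Fig. 4 scenario reduces to a lookup table from segment type to engine number. A minimal sketch: the type labels and the `route` function are hypothetical, while the engine assignments follow the description above.

```python
from typing import Dict, Iterable, List, Tuple

# Segment type -> deep packet inspection engine, following the Fig. 4 scenario.
ENGINE_FOR_TYPE: Dict[str, int] = {
    "basic": 1,
    "win-exe": 2,
    "linux-exe": 3,
    "picture": 4,
    "script": 5,
}
DEFAULT_ENGINE = 1   # non-files and unidentifiable files go to engine 1


def route(segments: Iterable[Tuple[str, bytes]]) -> Dict[int, List[bytes]]:
    """Group (segment type, payload) pairs by the engine that inspects them."""
    routed: Dict[int, List[bytes]] = {}
    for seg_type, payload in segments:
        engine = ENGINE_FOR_TYPE.get(seg_type, DEFAULT_ENGINE)
        routed.setdefault(engine, []).append(payload)
    return routed
```

For the six-segment flow, the non-file and unidentifiable segments both land in engine 1 and every other segment reaches its own type-specific engine.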
The data content mentioned in the embodiment of the present application is the data content included in the received data packet. The other data packets are data packets including other data contents belonging to the same original data packet as the data contents.
Corresponding to the data processing method embodiment, the embodiment of the application also provides a data processing device. Referring to fig. 5, fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, where the apparatus includes:
a receiving unit 501, configured to receive a data packet, where the data packet includes data content;
a first determining unit 502, configured to determine target data content, where the target data content includes the data content;
an extracting unit 503, configured to extract a preset amount of data from the target data content;
a matching unit 504, configured to match the extracted content with a preset feature code, and determine the preset feature code included in the extracted content;
a second determining unit 505, configured to determine, according to a corresponding relationship between the preset feature codes and file types, a file type corresponding to the preset feature codes included in the extracted content, as a target data type;
a processing unit 506, configured to process the data packet and other data packets according to the target data type, where the other data packets include other data contents belonging to the same original data packet as the data content.
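The extract, match, and derive steps performed by units 503 to 505 can be sketched as follows, assuming the preset feature codes are well-known file signatures checked at offset 0. `PRESET_DATA_SIZE`, the signature table, and the helper names are hypothetical, and the prefix test is a plain substitute because the description does not specify the matching algorithm. The sketch also includes the non-file fallback described in a later embodiment.

```python
PRESET_DATA_SIZE = 16  # hypothetical "preset data amount", in bytes

# Hypothetical correspondence between feature codes (file signatures)
# and the file types used in this description.
FEATURE_CODES = {
    b"MZ": "win-executable",
    b"\x7fELF": "linux-executable",
    b"\x89PNG\r\n\x1a\n": "picture",
    b"\xff\xd8\xff": "picture",
    b"#!": "script",
    b"\x1f\x8b": "compressed",  # gzip
    b"PK\x03\x04": "compressed",  # zip
}
NON_FILE = "non-file"


def extract(target_data_content: bytes) -> bytes:
    """Take the preset amount of data from the start of the target content."""
    return target_data_content[:PRESET_DATA_SIZE]


def match_feature_codes(extracted: bytes) -> list:
    """Return the feature codes found at offset 0 of the extracted content."""
    return [code for code in FEATURE_CODES if extracted.startswith(code)]


def derive_type(extracted: bytes) -> str:
    """Map the first matched feature code to its file type, else non-file."""
    matched = match_feature_codes(extracted)
    return FEATURE_CODES[matched[0]] if matched else NON_FILE
```

Because the type comes from the leading bytes rather than from a file name, a renamed executable is still classified by its signature.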
In an embodiment of the present application, if the size of the data content is smaller than the preset data amount, the target data content further includes the other data content, carried in at least one of the other data packets, that belongs to the same original data packet as the data content.
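A minimal sketch of building the target data content when the first packet is too short, assuming the following packets of the same original data packet are already available in order (flow reassembly is outside this sketch). The function name and parameters are hypothetical.

```python
def build_target_content(first_content: bytes, following: list, preset_size: int) -> bytes:
    """Append content from later packets of the same original data packet
    until at least preset_size bytes are available or the packets run out."""
    target = bytearray(first_content)
    for content in following:
        if len(target) >= preset_size:
            break  # enough data to extract the preset amount
        target.extend(content)
    return bytes(target)
```

For example, a first packet carrying only `b"MZ"` with a preset size of 4 would pull in the next packet's bytes before signature matching.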
In an embodiment of the application, the processing unit 506 may be further configured to decompress the target data content if the target data type is a compressed file;
the extracting unit 503 may be further configured to extract a preset amount of data from the decompressed content;
the processing unit 506 may be further configured to process the data packet and other data packets according to the target data type if the target data type is not the compressed file.
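The decompress-and-reidentify loop can be sketched as below, with gzip from the Python standard library standing in for any compressed format. The tiny `identify` function is a hypothetical stand-in for the feature-code matcher, and `max_layers` is an assumed guard against deeply nested archives that the description does not mention.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"
PRESET_DATA_SIZE = 16  # hypothetical preset data amount


def identify(extracted: bytes) -> str:
    """Minimal stand-in for the feature-code matcher."""
    if extracted.startswith(GZIP_MAGIC):
        return "compressed"
    if extracted.startswith(b"\x7fELF"):
        return "linux-executable"
    return "non-file"


def resolve_type(content: bytes, max_layers: int = 8) -> tuple:
    """Decompress while the content is a compressed file, then return
    (target data type, innermost content)."""
    for _ in range(max_layers):
        data_type = identify(content[:PRESET_DATA_SIZE])
        if data_type != "compressed":
            return data_type, content
        content = gzip.decompress(content)
    raise ValueError("too many nested compression layers")
```

A doubly gzipped ELF file resolves to `"linux-executable"` after two passes, so it is routed to the Linux executable engine rather than treated as an opaque archive.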
In an embodiment of the application, the second determining unit 505 may be further configured to determine that the target data content is a non-file type if the extracted content does not include the preset feature code, and use the non-file type as the target data type.
In an embodiment of the present application, the processing unit 506 may specifically be configured to:
determining a target deep packet inspection engine corresponding to the target data type according to a pre-stored correspondence between data types and deep packet inspection engines;
inputting the data message and the other data messages into the target deep packet inspection engine, and determining the feature rules matched by the data message and by the other data messages respectively;
and processing the data message and the other data messages respectively according to the matched feature rules.
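These three steps can be sketched as engine lookup, per-message rule matching, and a processing decision. The engine contents, the fallback to a base engine for unlisted types (mirroring engine 1 in fig. 4), and the block/forward policy are all hypothetical; the description does not say what processing a matched rule triggers.

```python
from dataclasses import dataclass


@dataclass
class DpiEngine:
    name: str
    rules: dict  # hypothetical: rule name -> literal byte pattern

    def match(self, message: bytes) -> list:
        """Return the names of the rules whose pattern occurs in the message."""
        return [name for name, pattern in self.rules.items() if pattern in message]


# Hypothetical pre-stored correspondence between data types and engines.
ENGINES = {
    "win-executable": DpiEngine("engine-2", {"pe-injection": b"CreateRemoteThread"}),
    "script": DpiEngine("engine-5", {"eval-obfuscation": b"eval(atob("}),
}
BASE_ENGINE = DpiEngine("engine-1", {"generic-shell": b"/bin/sh"})


def process(target_type: str, messages: list) -> list:
    """Select the engine for the type, match each message, decide an action."""
    engine = ENGINES.get(target_type, BASE_ENGINE)
    actions = []
    for message in messages:
        matched = engine.match(message)
        # Hypothetical policy: block a message if any rule matched.
        actions.append("block" if matched else "forward")
    return actions
```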
With the embodiments of the present application, because the data content carries file characteristic information, the file type is determined from the data content rather than merely from the file extension, which improves the accuracy of file type identification.
Corresponding to the above data processing method embodiment, an embodiment of the present application further provides an electronic device including a processor and a machine-readable storage medium. The machine-readable storage medium stores machine-executable instructions executable by the processor, and the machine-executable instructions cause the processor to implement the data processing method.
As shown in fig. 6, the electronic device includes a processor 601 and a machine-readable storage medium 602, and the machine-readable storage medium 602 stores machine-executable instructions executable by the processor 601.
In addition, as shown in fig. 6, the electronic device may further include a communication interface 603 and a communication bus 604. The processor 601, the machine-readable storage medium 602, and the communication interface 603 communicate with one another through the communication bus 604, and the communication interface 603 is used for communication between the electronic device and other devices.
The machine-executable instructions include: receiving instructions 612, first determining instructions 622, extracting instructions 632, matching instructions 642, second determining instructions 652, and processing instructions 662;
the receiving instructions 612 cause the processor 601 to implement: receiving a data message, where the data message includes data content;
the first determining instructions 622 cause the processor 601 to implement: determining target data content, where the target data content includes the data content;
the extracting instructions 632 cause the processor 601 to implement: extracting a preset amount of data from the target data content;
the matching instructions 642 cause the processor 601 to implement: matching the extracted content against preset feature codes, and determining the preset feature codes included in the extracted content;
the second determining instructions 652 cause the processor 601 to implement: determining, according to the correspondence between preset feature codes and file types, the file type corresponding to the preset feature codes included in the extracted content, as the target data type;
the processing instructions 662 cause the processor 601 to implement: processing the data message and other data messages according to the target data type, where the other data messages include other data content belonging to the same original data packet as the data content.
In an embodiment of the present application, if the size of the data content is smaller than the preset data amount, the target data content further includes the other data content, carried in at least one of the other data messages, that belongs to the same original data packet as the data content.
In an embodiment of the present application, the processing instructions 662 further cause the processor 601 to implement: if the target data type is a compressed file, decompressing the target data content;
the extracting instructions 632 further cause the processor 601 to implement: extracting a preset amount of data from the decompressed content;
the processing instructions 662 further cause the processor 601 to implement: if the target data type is not a compressed file, processing the data message and the other data messages according to the target data type.
In an embodiment of the present application, the second determining instructions 652 further cause the processor 601 to implement: if the extracted content does not include any preset feature code, determining that the target data content is of a non-file type, and taking the non-file type as the target data type.
In an embodiment of the present application, the processing instructions 662 specifically cause the processor 601 to implement:
determining a target deep packet inspection engine corresponding to the target data type according to a pre-stored correspondence between data types and deep packet inspection engines;
inputting the data message and the other data messages into the target deep packet inspection engine, and determining the feature rules matched by the data message and by the other data messages respectively;
and processing the data message and the other data messages respectively according to the matched feature rules.
With the embodiments of the present application, because the data content carries file characteristic information, the file type is determined from the data content rather than merely from the file extension, which improves the accuracy of file type identification.
The communication bus 604 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 6, which does not mean that there is only one bus or one type of bus.
The machine-readable storage medium 602 may include RAM (Random Access Memory) and may also include NVM (Non-Volatile Memory), such as at least one disk memory. In addition, the machine-readable storage medium 602 may be at least one storage device located remotely from the aforementioned processor.
The processor 601 may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Corresponding to the above data processing method embodiment, an embodiment of the present application further provides a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the data processing method.
The machine-executable instructions include: receiving instructions, first determining instructions, extracting instructions, matching instructions, second determining instructions, and processing instructions;
when invoked and executed by a processor, the receiving instructions cause the processor to implement: receiving a data message, where the data message includes data content;
when invoked and executed by a processor, the first determining instructions cause the processor to implement: determining target data content, where the target data content includes the data content;
when invoked and executed by a processor, the extracting instructions cause the processor to implement: extracting a preset amount of data from the target data content;
when invoked and executed by a processor, the matching instructions cause the processor to implement: matching the extracted content against preset feature codes, and determining the preset feature codes included in the extracted content;
when invoked and executed by a processor, the second determining instructions cause the processor to implement: determining, according to the correspondence between preset feature codes and file types, the file type corresponding to the preset feature codes included in the extracted content, as the target data type;
when invoked and executed by a processor, the processing instructions cause the processor to implement: processing the data message and other data messages according to the target data type, where the other data messages include other data content belonging to the same original data packet as the data content.
In an embodiment of the present application, if the size of the data content is smaller than the preset data amount, the target data content further includes the other data content, carried in at least one of the other data messages, that belongs to the same original data packet as the data content.
In an embodiment of the present application, when invoked and executed by a processor, the processing instructions further cause the processor to implement: if the target data type is a compressed file, decompressing the target data content;
when invoked and executed by a processor, the extracting instructions further cause the processor to implement: extracting a preset amount of data from the decompressed content;
when invoked and executed by a processor, the processing instructions further cause the processor to implement: if the target data type is not a compressed file, processing the data message and the other data messages according to the target data type.
In an embodiment of the present application, when invoked and executed by a processor, the second determining instructions further cause the processor to implement: if the extracted content does not include any preset feature code, determining that the target data content is of a non-file type, and taking the non-file type as the target data type.
In an embodiment of the present application, when invoked and executed by a processor, the processing instructions specifically cause the processor to implement:
determining a target deep packet inspection engine corresponding to the target data type according to a pre-stored correspondence between data types and deep packet inspection engines;
inputting the data message and the other data messages into the target deep packet inspection engine, and determining the feature rules matched by the data message and by the other data messages respectively;
and processing the data message and the other data messages respectively according to the matched feature rules.
With the embodiments of the present application, because the data content carries file characteristic information, the file type is determined from the data content rather than merely from the file extension, which improves the accuracy of file type identification.
It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the embodiments of the data processing apparatus, the electronic device, and the machine-readable storage medium are substantially similar to the data processing method embodiments and are therefore described relatively briefly; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The above descriptions are merely preferred embodiments of the present application and are not intended to limit its scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall fall within the protection scope of the present application.

Claims (10)

1. A method of data processing, the method comprising:
receiving a data message, wherein the data message comprises data content;
when a state machine is in an initial state, determining target data content, wherein the target data content comprises the data content; extracting a preset amount of data from the target data content, and then switching the state machine to an MN algorithm matching state;
when the state machine is in the MN algorithm matching state, matching the extracted content against preset feature codes, determining the preset feature codes included in the extracted content, and then switching the state machine to a feature derivation state;
when the state machine is in the feature derivation state, determining, according to a correspondence between preset feature codes and file types, the file type corresponding to the preset feature codes included in the extracted content, taking the file type as a target data type, and then switching the state machine to a file processing state;
when the state machine is in the file processing state, inputting the data message and other data messages into a deep packet inspection engine for inspection processing according to the target data type, wherein the other data messages comprise other data content belonging to the same original data packet as the data content; and after the original data packet has been processed, switching the state machine to an end (FINI) state;
when the state machine is in the FINI state, ending data processing;
wherein, before the step of inputting the data message and the other data messages into a deep packet inspection engine for inspection processing according to the target data type, the method further comprises:
when the state machine is in the file processing state, if the target data type is a compressed file, decompressing the target data content, extracting a preset amount of data from the decompressed content, and then switching the state machine to the MN algorithm matching state, so as to return to the step of matching, in the MN algorithm matching state, the extracted content against the preset feature codes, determining the preset feature codes included in the extracted content, and then switching the state machine to the feature derivation state;
and when the state machine is in the file processing state, if the target data type is not a compressed file, proceeding to the step of inputting the data message and the other data messages into a deep packet inspection engine for inspection processing according to the target data type.
2. The method of claim 1, wherein if the size of the data content is smaller than the preset data amount, the target data content further comprises other data content, carried in at least one of the other data messages, that belongs to the same original data packet as the data content.
3. The method of claim 1, further comprising:
when the state machine is in the feature derivation state, if the extracted content does not include any preset feature code, determining that the target data content is of a non-file type, taking the non-file type as the target data type, and then switching the state machine to the file processing state.
4. The method according to any one of claims 1 to 3, wherein the step of inputting the data message and the other data messages into a deep packet inspection engine for inspection processing according to the target data type comprises:
determining a target deep packet inspection engine corresponding to the target data type according to a pre-stored correspondence between data types and deep packet inspection engines;
inputting the data message and the other data messages into the target deep packet inspection engine, and determining the feature rules matched by the data message and by the other data messages respectively;
and processing the data message and the other data messages respectively according to the matched feature rules.
5. A data processing apparatus, characterized in that the apparatus comprises:
a receiving unit, configured to receive a data packet, where the data packet includes data content;
a first determining unit, configured to determine target data content when a state machine is in an initial state, where the target data content includes the data content;
an extracting unit, configured to extract a preset amount of data from the target data content when the state machine is in the initial state, after which the state machine is switched to an MN algorithm matching state;
a matching unit, configured to, when the state machine is in the MN algorithm matching state, match the extracted content against preset feature codes and determine the preset feature codes included in the extracted content, after which the state machine is switched to a feature derivation state;
a second determining unit, configured to, when the state machine is in the feature derivation state, determine, according to a correspondence between preset feature codes and file types, the file type corresponding to the preset feature codes included in the extracted content as a target data type, after which the state machine is switched to a file processing state;
a processing unit, configured to, when the state machine is in the file processing state, input the data message and other data messages into a deep packet inspection engine for inspection processing according to the target data type, where the other data messages include other data content belonging to the same original data packet as the data content; after the original data packet has been processed, the state machine is switched to an end (FINI) state;
when the state machine is in the FINI state, data processing ends;
the processing unit is further configured to, when the state machine is in the file processing state, decompress the target data content if the target data type is a compressed file;
the extracting unit is further configured to extract a preset amount of data from the decompressed content, after which the state machine is switched to the MN algorithm matching state;
and the processing unit is further configured to, when the state machine is in the file processing state, if the target data type is not a compressed file, input the data message and the other data messages into a deep packet inspection engine for inspection processing according to the target data type.
6. The apparatus of claim 5, wherein if the size of the data content is smaller than the preset data amount, the target data content further comprises other data content, carried in at least one of the other data messages, that belongs to the same original data packet as the data content.
7. The apparatus according to claim 5, wherein the second determining unit is further configured to, when the state machine is in the feature derivation state and the extracted content does not include any preset feature code, determine that the target data content is of a non-file type and take the non-file type as the target data type, after which the state machine is switched to the file processing state.
8. The apparatus according to any one of claims 5 to 7, wherein the processing unit is specifically configured to:
determine a target deep packet inspection engine corresponding to the target data type according to a pre-stored correspondence between data types and deep packet inspection engines;
input the data message and the other data messages into the target deep packet inspection engine, and determine the feature rules matched by the data message and by the other data messages respectively;
and process the data message and the other data messages respectively according to the matched feature rules.
9. An electronic device, comprising a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions executable by the processor, wherein the machine-executable instructions cause the processor to carry out the method steps of any one of claims 1 to 4.
10. A machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to carry out the method steps of any one of claims 1 to 4.
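The state machine recited in claim 1 can be sketched as a loop over five states. This is an illustration under stated assumptions, not the claimed implementation: the target content is taken to be all messages of the original data packet joined together, gzip stands in for any compressed format, a plain prefix test replaces the unspecified "MN algorithm", and `inspect` is a hypothetical callback standing in for the deep packet inspection engine.

```python
import gzip
from enum import Enum, auto


class State(Enum):
    INITIAL = auto()
    MN_MATCHING = auto()
    FEATURE_DERIVATION = auto()
    FILE_PROCESSING = auto()
    FINI = auto()


PRESET_DATA_SIZE = 16  # hypothetical preset data amount
# Hypothetical subset of the feature-code table.
FEATURE_CODES = {b"\x1f\x8b": "compressed", b"\x7fELF": "linux-executable"}


def run(messages, inspect):
    """Drive one original data packet through the claim-1 states."""
    state = State.INITIAL
    content, extracted, matched, data_type = b"", b"", [], None
    while state is not State.FINI:
        if state is State.INITIAL:
            content = b"".join(messages)  # simplification: all messages
            extracted = content[:PRESET_DATA_SIZE]
            state = State.MN_MATCHING
        elif state is State.MN_MATCHING:
            # Prefix test used in place of the unspecified MN algorithm.
            matched = [code for code in FEATURE_CODES if extracted.startswith(code)]
            state = State.FEATURE_DERIVATION
        elif state is State.FEATURE_DERIVATION:
            data_type = FEATURE_CODES[matched[0]] if matched else "non-file"
            state = State.FILE_PROCESSING
        elif state is State.FILE_PROCESSING:
            if data_type == "compressed":
                # Decompress, re-extract, and loop back to matching.
                content = gzip.decompress(content)
                extracted = content[:PRESET_DATA_SIZE]
                state = State.MN_MATCHING
            else:
                inspect(data_type, messages)  # hand messages to the DPI engine
                state = State.FINI
    return data_type
```

The loop back from the file processing state to the matching state is what lets nested archives be unwrapped before the messages reach a type-specific engine.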
CN201810034333.4A 2018-01-15 2018-01-15 Data processing method and device, electronic equipment and storage medium Active CN108270783B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810034333.4A CN108270783B (en) 2018-01-15 2018-01-15 Data processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108270783A CN108270783A (en) 2018-07-10
CN108270783B true CN108270783B (en) 2021-04-16

Family

ID=62775642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810034333.4A Active CN108270783B (en) 2018-01-15 2018-01-15 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108270783B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111181806B (en) * 2019-12-25 2022-02-25 深圳市丰润达科技有限公司 Method and device for realizing whole network flow analysis technology and readable storage medium
CN111367582B (en) * 2020-03-06 2023-08-25 上海赋华网络科技有限公司 Method for identifying file type in high performance
CN112214462B (en) * 2020-10-22 2023-04-28 新华三信息安全技术有限公司 Multi-layer decompression method for compressed file, electronic device and storage medium
CN115002243B (en) * 2022-08-02 2022-11-01 上海秉匠信息科技有限公司 Data processing method and device

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102571767A (en) * 2011-12-24 2012-07-11 成都市华为赛门铁克科技有限公司 File type recognition method and file type recognition device
CN102624547A (en) * 2011-12-31 2012-08-01 成都市华为赛门铁克科技有限公司 Method, device and system for managing IM (Instant Messaging) online behavior
CN103209170A (en) * 2013-03-04 2013-07-17 汉柏科技有限公司 File type identification method and identification system
CN105808583A (en) * 2014-12-30 2016-07-27 Tcl集团股份有限公司 File type identification method and device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10319479B2 (en) * 2015-05-14 2019-06-11 Florence Healthcare, Inc. Remote monitoring and dynamic document management systems and methods

Also Published As

Publication number Publication date
CN108270783A (en) 2018-07-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant