CN113849813A - Data detection method and device, electronic equipment and storage medium

Info

Publication number: CN113849813A
Application number: CN202111082681.7A
Authority: CN
Inventors: 桂铭成
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2021-09-15
Filing date: 2021-09-15
Publication date: 2021-12-28

Abstract

The application discloses a data detection method and device, electronic equipment and a storage medium, and belongs to the technical field of electronics. The specific scheme comprises the following steps: detecting a first data packet according to a first rule set, wherein the first data packet is obtained through a firewall; and, in the case that the first data packet satisfies an early warning rule in the first rule set, performing security detection on the first data packet through a pre-training model, wherein the detection intensity of the pre-training model is higher than that of the firewall.

Description

Data detection method and device, electronic equipment and storage medium

Technical Field

The application belongs to the technical field of electronics, and particularly relates to a data detection method and device, electronic equipment and a storage medium.

Background

With the widespread adoption of internet technology, maintaining network security has become an important requirement. Conventional safeguards span the network layer, the host layer, and the application layer. Protection at the application layer is dominated by the Web Application Firewall (WAF).

In the prior art, most WAF protection principles are based on feature-based judgment of malicious data, according to which the WAF releases, intercepts, or modifies a data packet. However, the feature judgment of the WAF is implemented with fixed regular expression matching, which is inflexible; as a result, at least half of application layer attacks eventually bypass the WAF. Therefore, the conventional WAF has a high false negative rate.

Disclosure of Invention

An object of the embodiments of the present application is to provide a data detection method, an apparatus, an electronic device, and a storage medium, which can solve the problem that a conventional WAF has a high false negative rate.

In a first aspect, an embodiment of the present application provides a data detection method, where the method includes: detecting a first data packet according to a first rule set, wherein the first data packet is obtained through a firewall; and, in the case that the first data packet satisfies an early warning rule in the first rule set, performing security detection on the first data packet through a pre-training model, wherein the detection intensity of the pre-training model is higher than that of the firewall.

In a second aspect, an embodiment of the present application provides a data detection apparatus, including a detection module. The detection module is used for detecting a first data packet according to a first rule set, wherein the first data packet is obtained through a firewall; and, in the case that the first data packet satisfies an early warning rule in the first rule set, performing security detection on the first data packet through a pre-training model, wherein the detection intensity of the pre-training model is higher than that of the firewall.

In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect.

In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.

In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.

In the embodiment of the application, a first data packet is detected according to a first rule set, wherein the first data packet is obtained through a firewall; and, in the case that the first data packet satisfies an early warning rule in the first rule set, security detection is performed on the first data packet through a pre-training model, wherein the detection intensity of the pre-training model is higher than that of the firewall. Through this scheme, if security detection is required for a data packet, the data packet can first be detected through the firewall, and detection through the pre-training model is performed only when the data packet satisfies the early warning rule. Because the detection intensity of the pre-training model is higher than that of the firewall, the pre-training model can be used to guard against data packets that satisfy the early warning rule, so that the false negative rate of the firewall is reduced and the reliability of network security protection is improved.

Drawings

FIG. 1 is a schematic diagram of a data detection method provided in an embodiment of the present application;

FIG. 2 is a second schematic diagram of a data detection method according to an embodiment of the present application;

FIG. 3 is a diagram of a mapping of attack code segments provided by an embodiment of the application;

FIG. 4 is a schematic structural diagram of a data detection apparatus according to an embodiment of the present application;

FIG. 5 is a hardware diagram of an electronic device provided in an embodiment of the present application;

FIG. 6 is a second hardware schematic diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.

The terms "first," "second," and the like in the description and in the claims of the present application are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the terms so used may be interchanged under appropriate circumstances, such that embodiments of the application may be practiced in sequences other than those illustrated or described herein. The objects distinguished by "first," "second," and the like are generally of one type, and the number of objects is not limited; e.g., the first object can be one or more than one. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally means that the preceding and succeeding related objects are in an "or" relationship.

The data detection method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.

As shown in fig. 1, the embodiment of the present application provides a data detection method, which may be applied to an electronic device, and the method may include steps 101-102.

Step 101, the electronic device detects a first data packet according to a first rule set.

The first data packet is obtained through a firewall.

In general, the firewall in the electronic device is always in a working state. After the firewall receives a first data packet, it can detect, based on the first rule set, whether the first data packet satisfies an early warning rule; the early warning rule can be used to filter out data packets with a higher risk coefficient.

It should be noted that an attack by malicious data packets on the application layer of the electronic device is a persistent process, and different attack time nodes may correspond to different attack activities. The first rule set may include the attack activities corresponding to different attack time nodes, so that the firewall can identify, based on the first rule set, whether the first data packet contains an attack activity that satisfies the early warning rule.

Step 102, in the case that the first data packet satisfies an early warning rule in the first rule set, the electronic device performs security detection on the first data packet through a pre-training model.

The detection intensity of the pre-training model is higher than that of the firewall.

Optionally, if the electronic device determines that the first data packet satisfies the early warning rule, the pre-training model may be invoked within a preset time after the risk is determined, and deep security detection may be performed on the first data packet through the pre-training model. For example, the electronic device may invoke the pre-training model within one hour of determining that the first data packet satisfies the early warning rule and perform deep security detection on the first data packet.

Optionally, the rule set corresponding to the pre-training model may be a second rule set, and when the electronic device invokes the pre-training model to detect the first data packet, the pre-training model may perform joint detection on the first data packet based on the first rule set and the second rule set.

Optionally, in the case that the first data packet does not satisfy the early warning rule, the electronic device may perform security detection on the first data packet directly through the firewall, that is, perform security detection on the first data packet based on the first rule set.

It should be noted that, because the firewall performs security detection based on the first rule set, and the pre-trained model performs joint security detection based on the first rule set and the second rule set, the computation density of the firewall is greater than that of the pre-trained model, but the detection strength of the pre-trained model is higher than that of the firewall. Therefore, when the protection difficulty of the data packet is low, safety monitoring can be carried out only through the firewall, so that the system consumption of the electronic equipment can be reduced, and the operating efficiency of the electronic equipment is improved.
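The two-stage flow of steps 101 and 102 can be illustrated with a minimal sketch. The split of the first rule set into blocking patterns and early warning patterns, the names `RuleSet`, `detect`, and `toy_model`, and the sample patterns are all assumptions made for illustration; the application does not specify the layout of the rule set or the interface of the pre-training model.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RuleSet:
    """Hypothetical first rule set: blocking patterns plus early warning patterns."""
    block_patterns: List[str] = field(default_factory=list)
    warning_patterns: List[str] = field(default_factory=list)

    def is_blocked(self, payload: str) -> bool:
        return any(re.search(p, payload, re.IGNORECASE) for p in self.block_patterns)

    def is_warning(self, payload: str) -> bool:
        return any(re.search(p, payload, re.IGNORECASE) for p in self.warning_patterns)


def detect(payload: str, rules: RuleSet, model: Callable[[str], bool]) -> str:
    """Return "intercept" or "release" for one packet payload."""
    if rules.is_warning(payload):
        # Early warning rule satisfied: escalate to the stronger, costlier model.
        return "intercept" if model(payload) else "release"
    # Otherwise the firewall's own rule-based detection decides.
    return "intercept" if rules.is_blocked(payload) else "release"


def toy_model(payload: str) -> bool:
    """Stand-in for the pre-training model (assumption): flags 'alert(' calls."""
    return "alert(" in payload.lower()


rules = RuleSet(block_patterns=[r"<script\b"], warning_patterns=[r"onerror\s*="])
print(detect('"><img src onerror=alert(1)>', rules, toy_model))  # intercept
print(detect("GET /index.html", rules, toy_model))               # release
```

In this sketch the model is only called for payloads that trip an early warning pattern, which mirrors the stated goal of keeping system consumption low for low-risk packets.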

Optionally, after security detection is performed on the first data packet by the pre-training model, if the first data packet is detected to be a malicious data packet, the pre-training model may send instruction information for intercepting the first data packet to the firewall, so as to prevent a malicious attack by the first data packet.

Optionally, after security detection is performed on the first data packet by the pre-training model, in the case that the first data packet satisfies a preset feature, the pre-training model may generate a corresponding target regular expression according to the data features of the first data packet and add the target regular expression to the first rule set, wherein the preset feature is a data feature indicating a malicious data packet.

Specifically, the pre-training model may extract data features of the first data packet, determine the data features of the first data packet as segment codes, generate a corresponding target regular expression according to a rule of the regular expression, and add the target regular expression to the first rule set.

Based on this scheme, when the firewall is later attacked by a data packet with the same data features as the first data packet, the firewall can intercept that data packet directly.
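A minimal sketch of this rule feedback is given below. It assumes that the data features of the first data packet are already available as an ordered list of key fields, and that a usable target regular expression is obtained by escaping each field and allowing optional whitespace between fields. The function names `build_target_regex` and `add_rule` and the sample fields are hypothetical; the application does not specify how the expression is derived.

```python
import re
from typing import List


def build_target_regex(key_fields: List[str]) -> str:
    """Join the escaped key fields, allowing optional whitespace between them."""
    return r"\s*".join(re.escape(f) for f in key_fields)


def add_rule(rule_set: List[str], key_fields: List[str]) -> str:
    """Add the generated expression to the rule set unless it is already present."""
    pattern = build_target_regex(key_fields)
    if pattern not in rule_set:
        rule_set.append(pattern)
    return pattern


first_rule_set: List[str] = [r"<script\b"]
pattern = add_rule(first_rule_set, ["<img", "src", "onerror", "=", "alert(1)", ">"])

# A later packet with the same features is now caught by the rule set alone.
later_packet = '"><img src  onerror = alert(1) >'
print(any(re.search(p, later_packet) for p in first_rule_set))  # True
```

Escaping each field with `re.escape` keeps characters such as `(` and `)` literal, so the generated rule matches the observed payload rather than being interpreted as regular expression syntax.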

In the embodiment of the application, if security detection is to be performed on a data packet, the data packet may first be detected through the firewall, and detection through the pre-training model may be performed when the data packet satisfies the early warning rule. Because the detection intensity of the pre-training model is higher than that of the firewall, the pre-training model can be used to guard against data packets that satisfy the early warning rule, so that the false negative rate of the firewall is reduced and the reliability of network security protection is improved.

Optionally, as shown in fig. 2, before performing step 102, the method may further include step 103 and step 104:

Step 103, the electronic device obtains a training data set of the pre-training model, and performs segmentation processing and feature coding mapping of key fields on each data in the training data set.

The key fields can be obtained by performing segmentation processing on each data in the training data set, and the key fields correspond to the feature codes one to one.

Specifically, the electronic device may perform segmentation processing on the second data packet based on the malicious keyword, so as to obtain a plurality of key fields; and then mapping the plurality of key fields to feature codes, wherein one key field may correspond to one feature code, and the second data packet may be any one of the data in the training data set.

During a data attack, the electronic device can identify the context-intrinsic features of the data; that is, each type of attack data has its own unique context-intrinsic features. For example, in an XSS attack, the attack code includes:

"><img src onerror=alert(1)>

<a href="javascript%26colon;alert(1)">clic;

Here, the context-intrinsic features can include a segment with a malicious tendency, such as "script".

In the XXE attack, the attack code includes:

<!DOCTYPE foo[<!ENTITY xxe SYSTEM "http://xxx.xxx.xxx.xxx/">]>

<!DOCTYPE foo[<!ENTITY % xxe SYSTEM "http://xxx.com">%xxe;]>

Here, the context-intrinsic features may include segments with malicious tendencies, such as "DOCTYPE" and "ENTITY".

Based on this, after acquiring the training data set, the electronic device may perform segmentation processing and feature coding mapping on each data in the training data set, so as to obtain the context-intrinsic features of the data. Still taking the second data packet as an example, the electronic device may first decode the obtained second data packet through a decoder of the firewall to obtain the attack code, then segment the decoded attack code based on malicious keywords to obtain a plurality of key fields, and then map the key fields one by one to feature codes.
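The decoding step can be sketched as follows. The application does not say which encodings the firewall's decoder handles, so this example assumes percent-encoding and HTML character references, applied repeatedly until the payload stops changing; the function name `decode_payload` and the `max_rounds` limit are hypothetical.

```python
import html
from urllib.parse import unquote


def decode_payload(raw: str, max_rounds: int = 3) -> str:
    """Apply percent-decoding and HTML entity decoding until the text is stable."""
    current = raw
    for _ in range(max_rounds):
        decoded = html.unescape(unquote(current))
        if decoded == current:
            break
        current = decoded
    return current


print(decode_payload("javascript%26colon;alert(1)"))  # javascript:alert(1)
```

Decoding before segmentation matters because an encoded payload such as the one above hides the `javascript:` scheme from keyword-based splitting.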

Illustratively, as shown in FIG. 3, segmentation processing and feature coding mapping are performed on the XSS attack code "<script>throw onerror=alert(1)</script>". Segmenting the statement yields the key fields "<script>", "throw", "onerror", "=", "alert(1)", and "</script>", and these key fields can then be mapped to the feature codes "\x2ab3", "\x37e1", "\x8a8f", "\x3d0d", "\x0000", and "\x2ab4", respectively.

It should be noted that a feature code may consist of two bytes, so the pre-training model can accommodate 65536 feature codes, which respectively correspond to 65536 different code features of application layer attacks. Here, \x0000 is a reserved code that identifies free content; e.g., the executed content alert(1) in XSS attack code can be defined as free content.
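The segmentation and mapping of the FIG. 3 example can be sketched as below, with the attack code written without extraction spacing. The keyword table reuses the codes listed in the description; splitting on those keywords with a regular expression, the names `segment` and `encode`, and treating every non-keyword segment as free content are assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple

MAX_CODES = 1 << 16    # two-byte codes give 65536 possible values
FREE_CONTENT = 0x0000  # reserved code for free content

# Key-field-to-code table taken from the FIG. 3 example.
FEATURE_CODES: Dict[str, int] = {
    "<script>": 0x2AB3,
    "throw": 0x37E1,
    "onerror": 0x8A8F,
    "=": 0x3D0D,
    "</script>": 0x2AB4,
}

# Longest keywords first, so longer keywords win when alternatives overlap.
_KEYWORDS = sorted(FEATURE_CODES, key=len, reverse=True)
_SPLITTER = re.compile(
    "(" + "|".join(re.escape(k) for k in _KEYWORDS) + ")", re.IGNORECASE
)


def segment(attack_code: str) -> List[str]:
    """Split the attack code into key fields around the malicious keywords."""
    return [p.strip() for p in _SPLITTER.split(attack_code) if p.strip()]


def encode(attack_code: str) -> List[Tuple[str, int]]:
    """Map each key field to its feature code; unknown fields are free content."""
    return [(f, FEATURE_CODES.get(f.lower(), FREE_CONTENT)) for f in segment(attack_code)]


sample = "<script>throw onerror=alert(1)</script>"
print(" ".join(f"\\x{code:04x}" for _, code in encode(sample)))
# \x2ab3 \x37e1 \x8a8f \x3d0d \x0000 \x2ab4
```

Mapping the executed content to the reserved code means that variations such as alert(2) produce the same feature sequence, which is consistent with treating that part of the payload as free content.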

Step 104, the electronic device determines the feature code corresponding to each data as an input vector of the pre-training model.

Optionally, the electronic device may feed the feature codes as input vectors into the pre-training model for training, and each node of the pre-training model may output a malicious probability for the feature codes. The malicious probabilities output by all nodes are then normalized in sequence order and passed through a log function, the mathematical expectation is computed, and the resulting expectation is taken as the malicious feature score of the feature codes. The higher the malicious feature score, the stronger the malicious intent that the feature codes represent. By progressively raising the malicious identification probability as suspicious features of malicious code appear consecutively, the attack behavior of the malicious code can be identified regardless of its construction logic and execution order.
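The application does not give the exact normalization or the distribution over which the expectation is taken. The sketch below shows one possible reading only: each node probability is clipped into (0, 1], the logarithm is taken, and the expectation is computed as the plain average over the sequence. The function name `malicious_feature_score` and the `eps` floor are assumptions.

```python
import math
from typing import Sequence


def malicious_feature_score(node_probs: Sequence[float], eps: float = 1e-6) -> float:
    """Mean log-probability over the sequence (one assumed reading of the scoring)."""
    if not node_probs:
        raise ValueError("node_probs must be non-empty")
    # Clip into [eps, 1.0] so that log() is defined and never positive.
    clipped = [min(max(p, eps), 1.0) for p in node_probs]
    return sum(math.log(p) for p in clipped) / len(clipped)


suspicious = malicious_feature_score([0.90, 0.95, 0.99])
benign = malicious_feature_score([0.10, 0.20, 0.05])
print(suspicious > benign)  # True
```

Under this reading the score is at most 0 and approaches 0 only when every node in the sequence reports a high malicious probability, so a run of consecutive suspicious features scores higher than an isolated one.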

Optionally, the electronic device may also train the pre-training model on a malicious data set. It should be noted that training the model on the malicious data set requires supervised label information for all data (normal access data and malicious data); that is, the pre-training model is given a label indicating whether each training datum is normal access data or malicious access data.

Optionally, the training data set may include malicious data disclosed by the network and regular expressions in the first rule set. The electronic equipment can use the feature codes corresponding to the malicious data disclosed by the network as input vectors of a pre-training model to perform model training on the pre-training model to obtain a first training model; then, the feature codes corresponding to the regular expressions in the first rule set are used as input vectors of the first training model to perform model training on the first training model, and a pre-training model in an overfitting state is obtained.

It should be noted that training the pre-training model by using malicious data disclosed by the network can provide a basis for the protection capability of the pre-training model. The pre-training model is trained through the regular expression in the first rule set until the pre-training model is trained to reach an overfitting state, so that high attack mode coverage rate can be guaranteed.

Optionally, even when the pre-training model is used as a supplementary protection scheme for the firewall, false positives and false negatives may still occur. A user can therefore screen out the data packets involved in false positives and false negatives from the firewall log and input them into the pre-training model, thereby realizing feedback training of the pre-training model.

Optionally, the electronic device may further expand, by copying, the number of samples of data packets involved in false positives and false negatives, and increase their training weight in the pre-training model through this sample expansion, so as to ensure that the false positive and false negative problems are corrected and improved.

Optionally, the pre-training model may be a Long Short-Term Memory (LSTM) network model. The LSTM model can be connected to a rule configuration module and a log module of the firewall; the rule configuration module can provide the first rule set to the LSTM model as a data set for model training, and the log module can provide the data packets involved in false positives and false negatives to the LSTM model as a data set for feedback training.
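Sample expansion by copying can be sketched as below. The `(payload, label)` sample layout, the name `expand_by_copy`, and the default copy count are assumptions; the application does not state how many copies are made.

```python
from typing import List, Tuple

Sample = Tuple[str, int]  # (payload, label), with 1 = malicious and 0 = normal


def expand_by_copy(
    base: List[Sample], corrections: List[Sample], copies: int = 5
) -> List[Sample]:
    """Append each corrected sample `copies` times to raise its training weight."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return list(base) + [s for s in corrections for _ in range(copies)]


base = [("GET /index.html", 0), ("<script>alert(1)</script>", 1)]
# Packets screened from the firewall log, relabelled with the correct class.
corrections = [("<svg onload=alert(1)>", 1), ("GET /docs/script-guide", 0)]

training_set = expand_by_copy(base, corrections, copies=3)
print(len(training_set))  # 8
```

Duplicating a sample has the same effect on the training loss as giving that sample a proportionally larger weight, which is why copying is a simple way to emphasize the corrected cases.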

In the embodiment of the application, data in the training data set can be used for model training after being subjected to segmentation processing and feature coding mapping, so that a pre-training model can more accurately identify malicious data packets, and the reliability of network security protection is improved.

It should be noted that, in the data detection method provided in the embodiment of the present application, the execution subject may be a data detection apparatus, or a control module in the data detection apparatus for executing the data detection method. In the embodiment of the present application, a data detection method performed by a data detection apparatus is taken as an example, and the data detection apparatus provided in the embodiment of the present application is described.

As shown in FIG. 4, an embodiment of the present application further provides a data detection apparatus 400, including a detection module 401. The detection module 401 is configured to detect a first data packet according to a first rule set, and to perform security detection on the first data packet through a pre-training model in the case that the first data packet satisfies an early warning rule in the first rule set; the first data packet is obtained through the firewall, and the detection intensity of the pre-training model is higher than that of the firewall.

Optionally, the detecting module 401 is further configured to perform security detection on the first data packet through the firewall under the condition that the first data packet does not satisfy the early warning rule.

Optionally, the apparatus 400 may further include a processing module 402; a processing module 402, configured to, after the detection module 401 performs security detection on the first data packet through the pre-training model, generate a corresponding target regular expression according to data characteristics of the first data packet when the first data packet meets preset characteristics, and add the target regular expression to the first rule set; the preset characteristics are data characteristics used for indicating malicious data packets.

Optionally, the apparatus 400 may further include an obtaining module 403 and a training module 404. The obtaining module 403 is configured to obtain a training data set of the pre-training model. The training module 404 is configured to perform segmentation processing and feature coding mapping of key fields on each data in the training data set, and to determine the feature code corresponding to each data as an input vector of the pre-training model; the key fields are obtained by performing segmentation processing on each data in the training data set, and the key fields correspond to the feature codes one to one.

Optionally, the training data set may include malicious data disclosed by the network and regular expressions in the first rule set. The training module 404 may be specifically configured to perform model training on the pre-training model by using a feature code corresponding to malicious data disclosed by the network as an input vector of the pre-training model, so as to obtain a first training model; and performing model training on the first training model by taking the feature codes corresponding to the regular expressions in the first rule set as input vectors to obtain the pre-training model in an overfitting state.

The data detection device in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.

The data detection device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, and the embodiments of the present application are not specifically limited.

The data detection device provided in the embodiment of the present application can implement each process implemented by the method embodiments of fig. 1 to fig. 3, and is not described here again to avoid repetition.

Optionally, as shown in fig. 5, an electronic device 500 is further provided in this embodiment of the present application, and includes a processor 501, a memory 502, and a program or an instruction stored in the memory 502 and capable of being executed on the processor 501, where the program or the instruction is executed by the processor 501 to implement each process of the data detection method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.

It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.

Fig. 6 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.

The electronic device 1000 includes, but is not limited to: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010.

Those skilled in the art will appreciate that the electronic device 1000 may further comprise a power source (e.g., a battery) for supplying power to various components, and the power source may be logically connected to the processor 1010 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system. The electronic device structure shown in fig. 6 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than those shown, or combine some components, or arrange different components, and thus, the description is omitted here.

The network module 1002 may be configured to obtain the first data packet through a firewall.

A processor 1010, configured to detect a first data packet according to a first rule set, where the first data packet is obtained through a firewall; and, in the case that the first data packet satisfies an early warning rule in the first rule set, to perform security detection on the first data packet through a pre-training model, wherein the detection intensity of the pre-training model is higher than that of the firewall.

Optionally, the processor 1010 may be further configured to perform security detection on the first data packet through the firewall under the condition that the first data packet does not satisfy the early warning rule.

In the embodiment of the application, because the firewall performs security detection based on the first rule set and the pre-trained model performs joint security detection based on the first rule set and the second rule set, the computation density of the firewall is greater than that of the pre-trained model, but the detection intensity of the pre-trained model is higher than that of the firewall. Therefore, when the protection difficulty of the data packet is low, safety monitoring can be carried out only through the firewall, so that the system consumption of the electronic equipment can be reduced, and the operating efficiency of the electronic equipment is improved.

Optionally, the processor 1010 may be further configured to, after the detection module performs security detection on the first data packet through the pre-training model, generate a corresponding target regular expression according to the data feature of the first data packet when the first data packet meets a preset feature, and add the target regular expression to the first rule set; the preset characteristics are data characteristics used for indicating malicious data packets.

In the embodiment of the application, the target regular expression is added to the first rule set, so that when the firewall is later attacked by a data packet with the same data features as the first data packet, the firewall can intercept that data packet directly.

Optionally, the network module 1002 is further configured to obtain a training data set of the pre-training model.

A processor 1010, configured to perform segmentation processing and feature coding mapping of key fields on each data in the training data set; determining the feature code corresponding to each data as the input vector of the pre-training model; the key fields are obtained by carrying out segmentation processing on each data in the training data set, and the key fields correspond to the feature codes one to one.

Optionally, the processor 1010 may be specifically configured to perform model training on the pre-training model by using a feature code corresponding to malicious data disclosed by the network as an input vector of the pre-training model, so as to obtain a first training model; and performing model training on the first training model by taking the feature codes corresponding to the regular expressions in the first rule set as input vectors to obtain the pre-training model in an overfitting state.

In the embodiment of the application, the training of the pre-training model through malicious data disclosed by the network can provide a basis for the protection capability of the pre-training model. The pre-training model is trained through the regular expression in the first rule set until the pre-training model is trained to reach an overfitting state, so that high attack mode coverage rate can be guaranteed.

It should be understood that in the embodiment of the present application, the input Unit 1004 may include a Graphics Processing Unit (GPU) 10041 and a microphone 10042, and the Graphics Processing Unit 10041 processes image data of still pictures or videos obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1007 includes a touch panel 10071 and other input devices 10072. The touch panel 10071 is also referred to as a touch screen. The touch panel 10071 may include two parts, a touch detection device and a touch controller. Other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein. The memory 1009 may be used to store software programs as well as various data, including but not limited to application programs and operating systems. Processor 1010 may integrate an application processor that handles primarily operating systems, user interfaces, applications, etc. and a modem processor that handles primarily wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 1010.

The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the data detection method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.

The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.

The embodiments of the present application further provide a chip. The chip includes a processor and a communication interface coupled to the processor, and the processor is configured to run a program or instructions to implement each process of the above data detection method embodiments and can achieve the same technical effects; to avoid repetition, details are not repeated here.

It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved; for example, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.

While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method for data detection, comprising:

detecting a first data packet according to a first rule set, wherein the first data packet is obtained through a firewall;

and in a case where the first data packet meets an early warning rule in the first rule set, performing security detection on the first data packet through a pre-training model, wherein the detection intensity of the pre-training model is higher than that of the firewall.

2. The data detection method of claim 1, wherein after detecting the first data packet according to the first rule set, the method further comprises:

and under the condition that the first data packet does not meet the early warning rule, performing security detection on the first data packet through the firewall.
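As a non-limiting illustration of the routing recited in claims 1 and 2, a minimal sketch is given below. It assumes that early warning rules are regular expressions and that both checks return `True` for malicious data; the function names `detect`, `firewall_check`, and `model_check`, the rule patterns, and the stand-in check logic are hypothetical and are not part of the claimed subject matter.

```python
import re

# Hypothetical early warning rules of the first rule set.
EARLY_WARNING_RULES = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"union\s+select", r"<script", r"\.\./")
]

def firewall_check(payload):
    """Stand-in for the firewall's lower-intensity check (True = malicious)."""
    return "drop table" in payload.lower()

def model_check(payload):
    """Stand-in for the higher-intensity check of the pre-training model."""
    return bool(re.search(r"select|script|passwd|--", payload.lower()))

def detect(payload, early_warning_rules, firewall, model):
    """Route a packet: model if an early warning rule matches, else firewall."""
    if any(rule.search(payload) for rule in early_warning_rules):
        return "model", model(payload)
    return "firewall", firewall(payload)

suspicious = detect("id=1 UNION SELECT password", EARLY_WARNING_RULES,
                    firewall_check, model_check)
ordinary = detect("q=hello", EARLY_WARNING_RULES, firewall_check, model_check)
```

Only packets that trigger an early warning rule reach the more expensive model check in this sketch, while all other packets stay on the firewall path.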

3. The data detection method of claim 1, wherein after the security detection of the first data packet by the pre-training model, the method further comprises:

under the condition that the first data packet meets preset characteristics, generating a corresponding target regular expression according to the data characteristics of the first data packet, and adding the target regular expression to the first rule set; the preset characteristics are data characteristics used for indicating malicious data packets.
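As a non-limiting illustration of the rule generation recited in claim 3, the sketch below derives a regular expression from a packet payload by generalizing digit runs and whitespace runs and escaping everything else. This generalization strategy is an assumption made for illustration; the claim does not specify how the target regular expression is derived. The names `generate_target_regex` and `add_to_rule_set` and the `is_malicious` flag (standing in for "meets preset characteristics") are hypothetical.

```python
import re

def generate_target_regex(payload):
    """Build a regex from a payload: digits -> \\d+, whitespace -> \\s+, rest escaped."""
    parts = []
    for token in re.findall(r"\d+|\s+|[^\d\s]+", payload):
        if re.fullmatch(r"\d+", token):
            parts.append(r"\d+")
        elif re.fullmatch(r"\s+", token):
            parts.append(r"\s+")
        else:
            parts.append(re.escape(token))
    return "".join(parts)

def add_to_rule_set(rule_set, payload, is_malicious):
    """Add a generated rule to the rule set when the packet is judged malicious."""
    if not is_malicious:
        return None
    pattern = generate_target_regex(payload)
    if pattern not in rule_set:  # avoid duplicate rules
        rule_set.append(pattern)
    return pattern

first_rule_set = []
rule = add_to_rule_set(first_rule_set, "id=1 UNION  SELECT 42", is_malicious=True)
```

A rule generated this way also matches later packets that differ only in numeric values or spacing, for example `id=77 UNION SELECT 9`.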

4. The data detection method according to any one of claims 1 to 3, wherein before the security detection of the first data packet by the pre-training model, the method further comprises:

acquiring a training data set of the pre-training model, and performing segmentation processing and feature coding mapping of key fields on each piece of data in the training data set;

determining the feature code corresponding to each piece of data as an input vector of the pre-training model;

wherein the key fields are obtained by performing segmentation processing on each piece of data in the training data set, and the key fields correspond to the feature codes one to one.
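As a non-limiting illustration of the segmentation processing and feature coding mapping recited in claim 4, the sketch below splits each piece of data into key fields and assigns every distinct key field its own integer code. The tokenization rule, the integer coding scheme, and the names `segment` and `FeatureCoder` are assumptions made for illustration only.

```python
import re

def segment(data):
    """Segment a piece of data into key fields (words and punctuation marks)."""
    return re.findall(r"\w+|[^\w\s]", data.lower())

class FeatureCoder:
    """Maps each distinct key field to exactly one integer feature code."""

    def __init__(self):
        self.codes = {}

    def encode(self, data):
        vector = []
        for field in segment(data):
            if field not in self.codes:
                # New key field: assign the next unused code (codes start at 1).
                self.codes[field] = len(self.codes) + 1
            vector.append(self.codes[field])
        return vector

coder = FeatureCoder()
input_vector = coder.encode("id=1 or 1=1")
```

Because a code is assigned only when a key field is first seen, the same key field always maps to the same code and no two key fields share a code.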

5. The data detection method of claim 4, wherein the training data set comprises malicious data publicly disclosed on the network and regular expressions in the first rule set; and determining the feature code corresponding to each piece of data as the input vector of the pre-training model comprises:

taking the feature codes corresponding to the malicious data publicly disclosed on the network as input vectors to perform model training on the pre-training model, to obtain a first training model;

and performing model training on the first training model by taking the feature codes corresponding to the regular expressions in the first rule set as input vectors to obtain the pre-training model in an overfitting state.

6. A data detection apparatus, comprising: a detection module;

the detection module is configured to detect a first data packet according to a first rule set, and to perform security detection on the first data packet through a pre-training model in a case where the first data packet meets an early warning rule in the first rule set;

the first data packet is acquired through a firewall, and the detection intensity of the pre-training model is higher than that of the firewall.

7. The data detection device of claim 6, wherein the detection module is further configured to perform security detection on the first data packet through the firewall in a case where the first data packet does not meet the early warning rule.

8. The data detection device of claim 6, further comprising a processing module;

the processing module is configured to, after the detection module performs security detection on the first data packet through the pre-training model, generate a corresponding target regular expression according to data characteristics of the first data packet when the first data packet meets preset characteristics, and add the target regular expression to the first rule set; the preset characteristics are data characteristics used for indicating malicious data packets.

9. The data detection device according to any one of claims 6 to 8, further comprising an acquisition module and a training module;

the acquisition module is used for acquiring a training data set of the pre-training model;

the training module is used for carrying out segmentation processing and feature coding mapping of key fields on each data in the training data set; determining the feature code corresponding to each data as the input vector of the pre-training model;

10. The data detection apparatus of claim 9, wherein the training data set comprises malicious data publicly disclosed on the network and regular expressions in the first rule set; the training module is specifically configured to:

11. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions when executed by the processor implementing the steps of the data detection method of any one of claims 1-5.

12. A readable storage medium, on which a program or instructions are stored, which when executed by a processor, carry out the steps of the data detection method according to any one of claims 1 to 5.