CN115168850A - Data security detection method and device - Google Patents

Info

Publication number
CN115168850A
Authority
CN
China
Prior art keywords
data
detection
regular
regular expression
regular expressions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210647637.4A
Other languages
Chinese (zh)
Inventor
袁小栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd
Priority to CN202210647637.4A
Publication of CN115168850A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles

Abstract

This specification provides a data security detection method and device, applied to a big data computing engine that is carried by a cloud service system and that runs preset detection rules to perform security detection on data to be detected. Each detection rule includes several regular expressions. The method comprises the following steps: grouping the regular expressions contained in the detection rules according to the data fields corresponding to the regular expressions to obtain a plurality of regular expression sets, where all regular expressions in a given set correspond to the same data field; in response to receiving the data to be detected, executing the regular expressions in the regular expression sets and caching the execution results in a cache space; and, in response to the completion of the execution of all regular expressions, sequentially reading the detection rules from the detection rule base, reading the execution results of all the regular expressions contained in each detection rule from the cache space, and generating the detection result corresponding to that detection rule.

Description

Data security detection method and device
Technical Field
The embodiment of the specification relates to the technical field of computers, in particular to a data security detection method and device.
Background
This section is intended to provide a background or context to the embodiments of the specification that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
In a big data computing scenario, due to high security requirements, security detection is generally required for data entering a big data computing engine, and the security detection is generally formed by a large number of detection rules. In the related art, when the security detection is performed, a large amount of computing resources are consumed to run a large number of detection rules.
Disclosure of Invention
To overcome the problems in the related art, the present specification provides the following methods and apparatuses.
In a first aspect of embodiments of the present specification, a data security detection method is provided, where the data security detection method is applied to a big data computing engine carried by a cloud service system, and the computing engine is configured to run a detection rule in a preset detection rule base, and perform security detection on to-be-detected data from a data source docked with the cloud service system;
the data to be detected comprises a plurality of data fields; the detection rule comprises a plurality of regular expressions corresponding to data fields contained in the data to be detected; the regular expression is used for performing regular matching on the data fields contained in the data to be detected and corresponding to the regular expression;
the method comprises the following steps:
grouping regular expressions contained in the detection rules in the detection rule base according to data fields corresponding to the regular expressions to obtain a plurality of regular expression sets respectively corresponding to different data fields; the data fields corresponding to all regular expressions in the regular expression set are the same;
responding to the received data to be detected from the data source, executing regular expressions in the regular expression sets, and caching the execution results corresponding to the regular expression sets into a cache space;
and in response to the fact that the regular expressions in the regular expression sets corresponding to all the data fields contained in the data to be detected are executed, sequentially reading the detection rules to be executed from the detection rule base, reading the execution results of all the regular expressions contained in the detection rules from the cache space, and generating the detection results corresponding to the detection rules based on the read execution results.
In a second aspect of embodiments of the present specification, there is provided a data security detection apparatus, which is applied to a big data computing engine carried by a cloud service system, where the computing engine is configured to run detection rules in a preset detection rule base, and perform security detection on data to be detected from a data source docked with the cloud service system;
the data to be detected comprises a plurality of data fields; the detection rule comprises a plurality of regular expressions corresponding to data fields contained in the data to be detected; the regular expression is used for performing regular matching on the data fields contained in the data to be detected and corresponding to the regular expression;
the device comprises:
the grouping unit is used for grouping the regular expressions contained in the detection rules in the detection rule base according to the data fields corresponding to the regular expressions to obtain a plurality of regular expression sets respectively corresponding to different data fields; the data fields corresponding to all regular expressions in the regular expression set are the same;
the cache unit is used for responding to the received data to be detected from the data source, executing regular expressions in the regular expression sets and caching the execution results corresponding to the regular expression sets into a cache space;
and the detection unit is used for responding to the completion of the execution of the regular expressions in the regular expression sets corresponding to all the data fields contained in the data to be detected, sequentially reading the detection rules to be executed from the detection rule base, reading the execution results of all the regular expressions contained in the detection rules from the cache space, and generating the detection results corresponding to the detection rules based on the read execution results.
In a third aspect of embodiments of the present specification, there is provided a storage medium; the storage medium has stored thereon a computer program which, when executed, implements the steps of the method as described below:
grouping regular expressions contained in the detection rules in the detection rule base according to data fields corresponding to the regular expressions to obtain a plurality of regular expression sets respectively corresponding to different data fields; the data fields corresponding to all regular expressions in the regular expression set are the same;
responding to the received data to be detected from the data source, executing regular expressions in the regular expression sets, and caching the execution results corresponding to the regular expression sets into a cache space;
and in response to the fact that the regular expressions in the regular expression sets corresponding to all the data fields contained in the data to be detected are executed, sequentially reading the detection rules to be executed from the detection rule base, reading the execution results of all the regular expressions contained in the detection rules from the cache space, and generating the detection results corresponding to the detection rules based on the read execution results.
In a fourth aspect of embodiments herein, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the following method when executing the program:
grouping regular expressions contained in the detection rules in the detection rule base according to data fields corresponding to the regular expressions to obtain a plurality of regular expression sets respectively corresponding to different data fields; the data fields corresponding to all regular expressions in the regular expression set are the same;
responding to the received data to be detected from the data source, executing regular expressions in the regular expression sets, and caching the execution results corresponding to the regular expression sets into a cache space;
and in response to the fact that the regular expressions in the regular expression sets corresponding to all the data fields contained in the data to be detected are executed, sequentially reading the detection rules to be executed from the detection rule base, reading the execution results of all the regular expressions contained in the detection rules from the cache space, and generating the detection results corresponding to the detection rules based on the read execution results.
The above embodiments of the present specification have at least the following advantages:
In this technical scheme, the regular expressions in all detection rules are grouped by data field in advance. When data to be detected is received, the regular expressions in all groups are executed first, and the execution results are cached in a cache space. Each detection rule is then processed in turn, and the results of the regular expressions it contains are read directly from the cache space to obtain the detection result of that rule. Because the regular expressions are executed in advance and their results cached, and each detection rule simply queries the cache space for the execution results of its regular expressions, the computational cost of security detection is reduced.
Drawings
In order to illustrate the embodiments of the present specification or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some of the embodiments described in the present specification, and those skilled in the art can obtain other drawings from them.
FIG. 1 schematically illustrates an architecture diagram of a data security detection method according to an embodiment of the present specification;
FIG. 2 schematically illustrates a flow diagram of a data security detection method according to an embodiment of the present specification;
FIG. 3 schematically illustrates a cache structure of a data security detection method according to an embodiment of the present specification;
FIG. 4 schematically illustrates a schematic diagram of a data security detection method according to an embodiment of the present specification;
FIG. 5 schematically illustrates a block diagram of a data security detection apparatus according to an embodiment of the present specification;
FIG. 6 schematically illustrates a hardware structure diagram of a computer device in which a data security detection method according to an embodiment of the present specification is implemented.
In the drawings, like or corresponding reference characters designate like or corresponding parts.
Detailed Description
The principles and spirit of the present description will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely to enable those skilled in the art to better understand and to implement the present description, and are not intended to limit the scope of the present description in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the description to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present description may be embodied as a system, apparatus, device, method, or computer program product. Thus, the present description may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
In a big data scenario, security detection is generally required for the large amount of data entering a computing engine, especially in application scenarios such as proprietary clouds and hybrid clouds that are more sensitive to data security. Detecting large amounts of data typically requires applying a large number of detection rules, each of which may contain multiple regular expressions.
In the related art, data security detection is usually performed through big data components such as Flink or Blink. However, these components have heavy dependencies and high task overhead, and each detection rule needs to run separately with its own allocation of computing resources.
In view of this, the present specification provides a data security detection method.
The embodiments of the present description will be described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, FIG. 1 is a schematic diagram of an architecture of a data security detection system according to an exemplary embodiment. As shown in FIG. 1, the system may include a network 10, a server 11, and a number of electronic devices, such as a mobile phone 12, a mobile phone 13, a mobile phone 14, and so on.
The server 11 may be a physical server comprising an independent host, or the server 11 may be a virtual server, a cloud server, etc. carried by a cluster of hosts. The mobile phones 12-14 are just one type of electronic device that a user may use. The user can also use other types of electronic devices, such as tablet devices, notebook computers, personal digital assistants (PDAs), and wearable devices (e.g., smart glasses, smart watches, etc.), and one or more embodiments of the present specification are not limited in this respect. The network 10 may include various types of wired or wireless networks.
In one embodiment, the server 11 may cooperate with the mobile phones 12-14: the mobile phones 12-14 receive user operations and upload the received commands and files to the server 11 through the network 10, and the server 11 then processes the files based on the scheme of the present specification. In another embodiment, the mobile phones 12-14 may independently implement the scheme of the present specification: the mobile phones 12-14 receive user operations and process the received commands and files based on the scheme of the present specification so as to realize data security detection.
Referring to FIG. 2, FIG. 2 is a flowchart of a data security detection method according to an exemplary embodiment, where the method is applied to a processing device, and the processing device may be the server 11 or the mobile phones 12-14 shown in FIG. 1.
The method comprises the following steps:
step 202, grouping regular expressions contained in detection rules in the detection rule base according to data fields corresponding to the regular expressions to obtain a plurality of regular expression sets respectively corresponding to different data fields; the data fields corresponding to all regular expressions in the regular expression set are the same;
The data security detection method provided by this specification is applied to a big data computing engine carried by a cloud service system, where the engine is used to run detection rules in a preset detection rule base and to perform security detection on data to be detected from a data source docked with the cloud service system.
The big data computing engine is deployed in a cloud service system, which may be a public cloud system, a proprietary cloud system, or a hybrid cloud system; this specification does not specifically limit this. The big data computing engine can perform security detection on the input data to determine whether the input data contains malicious intrusion behaviors.
In one illustrative embodiment shown in the present specification, the cloud service system comprises a proprietary cloud or a hybrid cloud service system; the big data compute engine includes a lightweight secure compute engine.
In one illustrative embodiment shown in this specification, the big data compute engine comprises a RocketMQ-Streams compute engine.
The data to be detected comprises a plurality of data fields. The data to be detected generally has, or can be regarded as having, a relatively fixed format. For example, it may be regarded as being composed of a plurality of data fields, each field being a key-value pair composed of a field name and a field value, where the field name may be recorded directly in the data to be detected or may be determined from the position of the data field within the data to be detected.
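The two ways of obtaining field names described above can be sketched as follows. This is a minimal illustration only: the record layouts, the `=` and `,` separators, and the function names `parse_named` and `parse_positional` are assumptions made for the example and do not come from the specification.

```python
def parse_named(record: str) -> dict[str, str]:
    """Field names are recorded in the data itself as 'name=value' tokens."""
    fields = {}
    for token in record.split():
        name, _, value = token.partition("=")
        fields[name] = value
    return fields


def parse_positional(record: str, names: list[str], sep: str = ",") -> dict[str, str]:
    """Field names are implied by the position of each value in the record."""
    return dict(zip(names, record.split(sep)))


named = parse_named("project=demo0001 user=alice")
positional = parse_positional("demo0001,alice", ["project", "user"])
```

Both calls yield the same field-name-to-field-value mapping, which is the form the later sketches assume for a record.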
When the big data computing engine performs security detection, the input data to be detected is typically a large amount of data from a small number of data sources. For example, the data sources for host security detection may include process logs and web logs. The data to be detected from each data source may be checked using detection rules in different detection rule bases, or using detection rules in the same detection rule base.
In an exemplary embodiment of the present specification, the detection rules in the detection rule base may be dynamically updated as the data to be detected changes and as the detection policy is updated. For example, an API for dynamically updating the detection rule base may be reserved, so that an administrator can add, delete, modify, or replace detection rules in the detection rule base.
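A rule base with such an update interface might look like the sketch below. The class name `RuleBase`, its method names, and the representation of a rule as a list of (field name, pattern) pairs are all hypothetical; the specification does not define a concrete interface.

```python
class RuleBase:
    """Hypothetical in-memory detection rule base with an update interface."""

    def __init__(self) -> None:
        # rule id -> list of (field name, regular expression) pairs
        self._rules: dict[str, list[tuple[str, str]]] = {}

    def add(self, rule_id: str, expressions: list[tuple[str, str]]) -> None:
        if rule_id in self._rules:
            raise KeyError(f"rule {rule_id!r} already exists")
        self._rules[rule_id] = list(expressions)

    def replace(self, rule_id: str, expressions: list[tuple[str, str]]) -> None:
        if rule_id not in self._rules:
            raise KeyError(f"rule {rule_id!r} does not exist")
        self._rules[rule_id] = list(expressions)

    def delete(self, rule_id: str) -> None:
        del self._rules[rule_id]

    def rules(self) -> dict[str, list[tuple[str, str]]]:
        return dict(self._rules)


base = RuleBase()
base.add("rule_a", [("user", r"^root$")])
base.replace("rule_a", [("user", r"^(root|admin)$")])
base.add("rule_b", [("cmdline", r"curl\s+http")])
base.delete("rule_b")
```

Under the scheme of this specification, any such update would also require the regular expression sets to be regrouped before the next batch of data is processed.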
In one illustrative embodiment shown in the present specification, the data to be detected includes data from the same data source.
Data from the same data source, which generally has a similar data structure and contains similar data fields, can be subjected to security detection on the data from the data source by using detection rules in the same detection rule base.
The data from the data source may be streaming data, which is usually continuous data that grows dynamically over time, and a window may be used to intercept a segment of the streaming data as the data to be detected. The window may be a time window or an element window, and may be a fixed window or a sliding window.
A time window intercepts data by time. For example, with a window length of 1 hour, the window intercepts the segment of streaming data received within that hour. An element window intercepts data by number of elements; in this specification, the number of elements may be the number of data fields. For example, with a window length of 10000 data fields, the window intercepts a continuous segment of streaming data containing 10000 fields.
A fixed window intercepts data at fixed intervals of time or element count; for example, the data segment of the past hour may be intercepted every hour. A sliding window, by contrast, can intercept data dynamically at any time.
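The difference between a fixed and a sliding element window can be sketched as below. The function names are hypothetical, and the sliding window is shown in its conventional overlapping form with a configurable step, which is only one possible reading of "intercepting dynamically"; a time window would follow the same pattern with timestamps in place of element counts.

```python
from collections import deque
from typing import Iterable, Iterator


def fixed_element_windows(stream: Iterable, size: int) -> Iterator[list]:
    """Cut the stream into consecutive, non-overlapping segments of `size` elements."""
    window = []
    for item in stream:
        window.append(item)
        if len(window) == size:
            yield window
            window = []
    if window:  # emit the trailing partial segment
        yield window


def sliding_element_windows(stream: Iterable, size: int, step: int = 1) -> Iterator[list]:
    """Emit overlapping segments of `size` elements, advancing by `step` elements."""
    window = deque(maxlen=size)
    for count, item in enumerate(stream, start=1):
        window.append(item)
        if len(window) == size and (count - size) % step == 0:
            yield list(window)


fixed = list(fixed_element_windows(range(5), 2))
sliding = list(sliding_element_windows(range(5), 3))
```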
In one illustrative embodiment shown in this specification, the data to be detected comprises data segments that are dynamically intercepted through a sliding window in streaming data from the same data source.
The detection rule comprises a plurality of regular expressions corresponding to data fields contained in the data to be detected; the regular expression is used for performing regular matching on the data fields contained in the data to be detected and corresponding to the regular expression;
Generally, the data to be detected in a big data computing scenario is large in volume and complex, so a large number of detection rules usually need to be configured for its security detection. A detection rule may include one or more regular expressions; a detection rule that includes a plurality of regular expressions is a complex security detection rule formed from those regular expressions.
In an exemplary embodiment shown in this specification, the detection rule is a logical operation formula that combines a plurality of regular expressions, each corresponding to a data field contained in the data to be detected, according to a preset logical operation.
The regular expression is used to perform regular matching on the corresponding data field in the data to be detected to determine whether the data field matches a specific syntax rule, for example, to determine whether the value of the data field includes a specific character or character combination, or to determine the number of characters included in the value of the data field. This specification does not limit the specific format of the regular expression; for example, a format supported by the programming language in which the detection rules are written may be adopted according to actual needs.
For example, in one illustrative embodiment shown in this specification, a regular expression in which the field value used to represent a data field with a field name project is a string of any 8 characters can be written as: project ='; wherein, the content in the quotation marks represents character strings, and represents any character.
The execution result of a regular expression indicates whether the corresponding field matches the corresponding syntax rule. It is a binary result: regular match or regular mismatch. For example, a regular match and a regular mismatch can be represented by the binary values 1 and 0, respectively. Other values may also be used to represent the execution results, and this specification does not specifically limit this.
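As an illustration of a binary execution result, the sketch below checks whether the value of a `project` field is a string of exactly 8 arbitrary characters. It uses Python's `re` syntax (`.{8}` with `fullmatch`), which is an assumption made for the example rather than the rule syntax of the specification; the name `execute` and the sample record are likewise hypothetical.

```python
import re

# Assumed Python rendering of "any 8 characters"; not the specification's own syntax.
PROJECT_IS_8_CHARS = re.compile(r".{8}")


def execute(pattern: re.Pattern, field_value: str) -> int:
    """Return 1 for a regular match and 0 for a regular mismatch."""
    return 1 if pattern.fullmatch(field_value) else 0


record = {"project": "demo0001"}
result = execute(PROJECT_IS_8_CHARS, record["project"])
```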
In an exemplary embodiment shown in this specification, the big data computing engine includes a preprocessing rule for performing data preprocessing on the data to be detected, so as to process the data fields included in the data to be detected into standardized fields.
The data preprocessing may include various operations on the data fields in the data to be detected, such as extraction, merging and deduplication, error correction, and transformation; this specification does not specifically limit the data preprocessing mode.
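One simple form of such preprocessing is sketched below: field names are trimmed, lowercased, and mapped to canonical names, values are trimmed, and duplicate fields are dropped. The alias table and the function name `preprocess` are hypothetical and only stand in for whatever preprocessing rules an actual deployment defines.

```python
# Hypothetical mapping from source-specific field names to canonical names.
ALIASES = {"proj": "project", "usr": "user"}


def preprocess(record: dict[str, str]) -> dict[str, str]:
    """Normalize field names and values; keep the first of any duplicated field."""
    normalized: dict[str, str] = {}
    for name, value in record.items():
        key = name.strip().lower()
        canonical = ALIASES.get(key, key)
        if canonical not in normalized:
            normalized[canonical] = value.strip()
    return normalized


clean = preprocess({" Proj ": " demo0001 ", "project": "other", "USR": "alice"})
```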
The data preprocessing rules may be stored in the big data calculation engine individually, or may be stored in the detection rules corresponding to the preprocessed data fields, respectively.
The plurality of detection rules contain a large number of regular expressions, many of which may be repeated. If the detection rules are executed one by one during security detection, the same regular expressions are executed repeatedly, which wastes considerable overhead.
Therefore, all regular expressions can be extracted in advance and executed only once, which reduces overhead and speeds up operation. The execution result of each regular expression can be cached in the cache space; when a detection rule needs to be executed, its detection result can be obtained simply by querying the cache space for the execution results of the regular expressions corresponding to that detection rule.
Furthermore, the data to be detected usually contains the same data fields, or contains the same fields after preprocessing, so that the same detection rule base can be applied. Therefore, a general cache structure for caching the execution results of all regular expressions can be established in advance. Each time data to be detected is received, all regular expressions can be executed in batch against it, and the results can be stored in the cache space according to the general cache structure.
Specifically, all regular expressions in the detection rules in the detection rule base may be grouped, and the regular expressions corresponding to the same data field are respectively grouped into one group according to the data fields corresponding to the regular expressions, so as to obtain a plurality of regular expression sets respectively corresponding to different data fields.
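The grouping step can be sketched as follows. The rule base layout (rule id mapped to a list of (field name, pattern) pairs), the sample rules, and the name `group_by_field` are assumptions for illustration; repeated expressions are kept only once per field so that each is executed a single time.

```python
from collections import defaultdict

# Hypothetical rule base: rule id -> list of (field name, regular expression).
RULE_BASE = {
    "rule_a": [("cmdline", r"curl\s+http"), ("user", r"^root$")],
    "rule_b": [("cmdline", r"curl\s+http"), ("cmdline", r"\|\s*sh\b")],
}


def group_by_field(rule_base: dict) -> dict[str, list[str]]:
    """Collect the distinct regular expressions of all rules, grouped by data field."""
    groups: dict[str, list[str]] = defaultdict(list)
    for expressions in rule_base.values():
        for field, pattern in expressions:
            if pattern not in groups[field]:  # skip expressions already collected
                groups[field].append(pattern)
    return dict(groups)


regex_sets = group_by_field(RULE_BASE)
```

The order of expressions inside each set is fixed at grouping time, which is what later allows a result to be located by its position in the set.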
Caching the execution results of regular expressions in the cache space and then querying them according to the detection rules occupies the cache space continuously. For regular expressions that are reused only a few times, it may therefore be preferable not to occupy cache space with their results, and the regular expression sets with low cache utilization can be deleted. This specification does not specifically limit how low cache utilization is determined. For example, a regular expression set may be deleted if the number of times its regular expressions recur in the detection rules of the detection rule base is smaller than a preset threshold.
In an exemplary embodiment shown in this specification, the number of regular expressions in each of the plurality of regular expression sets is determined, and any regular expression set whose number of regular expressions is smaller than a preset threshold is determined to be a regular expression set with low cache utilization and is deleted.
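The size-based pruning described in this embodiment can be sketched in a few lines. The function name `prune_small_sets`, the sample sets, and the threshold value are hypothetical.

```python
def prune_small_sets(regex_sets: dict[str, list[str]], threshold: int) -> dict[str, list[str]]:
    """Drop every set whose number of regular expressions is below the threshold."""
    return {
        field: patterns
        for field, patterns in regex_sets.items()
        if len(patterns) >= threshold
    }


regex_sets = {
    "cmdline": [r"curl\s+http", r"\|\s*sh\b", r"base64\s+-d"],
    "user": [r"^root$"],
}
kept = prune_small_sets(regex_sets, threshold=2)
```

Regular expressions in a deleted set would then have to be executed directly when a rule that uses them is evaluated, rather than looked up in the cache.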
Step 204, in response to receiving the data to be detected from the data source, executing regular expressions in the regular expression sets, and caching the execution results corresponding to the regular expression sets into a cache space;
After the regular expressions are grouped and the regular expression sets are obtained, the received data to be detected can be processed uniformly.
When data to be detected is received from the data source, each regular expression in each of the regular expression sets is executed against the data to be detected, yielding the execution result of each regular expression in each set.
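A minimal sketch of this execution step is shown below, using Python's standard `re` module as a stand-in for whatever matching engine is actually deployed. The function name `execute_sets`, the sample sets, and the sample record are assumptions; a missing field is treated as an empty string, which is also an assumption.

```python
import re


def execute_sets(regex_sets: dict[str, list[str]], record: dict[str, str]) -> dict[str, list[int]]:
    """Run every expression of every set against its own field; 1 = match, 0 = mismatch."""
    results = {}
    for field, patterns in regex_sets.items():
        value = record.get(field, "")  # assumption: a missing field behaves like ""
        results[field] = [1 if re.search(p, value) else 0 for p in patterns]
    return results


regex_sets = {
    "cmdline": [r"curl\s+http", r"\|\s*sh\b"],
    "user": [r"^root$"],
}
record = {"cmdline": "curl http://example.com/x | sh", "user": "alice"}
set_results = execute_sets(regex_sets, record)
```

Each field value is scanned only by the expressions of its own set, and each distinct expression runs once per record regardless of how many rules reference it.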
In an exemplary embodiment shown in the present specification, executing each regular expression includes:
executing the regular expressions in the regular expression sets in batches based on a preset regular expression acceleration framework.
Since all regular expressions are executed together, a large number of regular expressions need to be executed at the same time, so a regular expression acceleration framework can be used to speed up their execution. This specification does not limit the specific type of regular expression acceleration framework; for example, a flash framework or a Hyperscan framework may be used.
In one illustrative embodiment shown in this specification, the regular expression acceleration framework comprises the Hyperscan framework. Hyperscan is a high-performance regular expression acceleration framework suitable for multi-pattern and streaming matching, and its acceleration effect improves as the number of regular expressions executed at the same time increases.
Because this scheme executes all regular expressions together and the number of regular expressions executed is very large, using the Hyperscan framework to accelerate regular expression execution can achieve a good acceleration effect.
The execution result of each regular expression can be represented by the binary values 1 and 0, where 1 represents a regular match and 0 represents a regular mismatch. For each regular expression set, the execution result can be represented by a binary character string obtained by concatenating, in a preset order, the binary values representing the execution results of the regular expressions contained in the set.
For example, suppose a regular expression set includes 6 regular expressions, and the execution results of the 6 regular expressions for the data to be detected are, respectively: mismatch, match, mismatch, match, mismatch, mismatch. The execution result of the regular expression set may then be recorded as the binary character string 010100, and this character string is cached in the cache space.
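The concatenation in this example can be reproduced with a short sketch; the function name `to_bit_string` is hypothetical.

```python
def to_bit_string(results: list[int]) -> str:
    """Concatenate per-expression results (1 = match, 0 = mismatch) in set order."""
    return "".join(str(bit) for bit in results)


bits = to_bit_string([0, 1, 0, 1, 0, 0])
second_expression_matched = bits[1] == "1"
```

Because the order is fixed in advance, the result of any single expression can be recovered from its position in the string.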
The execution result of a regular expression set may be stored as a key-value pair, in which the execution result of the set serves as the value and the key may be any value that identifies the set. This specification does not specifically limit the key; for example, the field name or the field value of the data field corresponding to the set may be used as the key.
In an exemplary embodiment shown in this specification, for the data field corresponding to each regular expression set, the field value of the data field is used as the key and the execution result of the regular expression set is used as the value to form a key-value pair, which is stored in the cache space.
Because the detection rule base contains many detection rules and many regular expressions, caching the execution results of all regular expression sets also occupies a large cache space. For example, in the related art, a Java Map structure is usually used to cache key-value collections, and such a structure spends a large amount of memory on Java object headers, object references, array lengths, padding, and the like.
Therefore, the scheme of this specification caches the execution results of the regular expression sets in a cache space that uses a high-compression cache space structure.
When the cached key-value pairs are read from the cache space, the value only needs to be looked up by its key according to the detection rule actually being executed. That is, the execution result of each regular expression set that contains a regular expression of the detection rule is read, the execution result of each regular expression is taken from it, and the result of the detection rule is obtained from these execution results.
Because the length of the key varies widely and may be long, which causes a large resource overhead, the cache space overhead can be reduced by processing the key.
When the key-value pairs cached in the cache space are read, the key is only used for lookup and matching. Therefore, instead of storing the value of the data field directly as the key, an index value computed from the value of the data field may be stored as the key. At query time, the index value is computed from the value of the data field and used as the key to look up the corresponding value in the cache space. Because the data structure and size of the index value can be controlled, the cache space can be effectively compressed.
The present specification does not limit the specific algorithm for calculating the corresponding index value from the value of the data field. For example, md5, SHA, CRC, etc. algorithms may be used.
In an exemplary implementation shown in this specification, the md5 algorithm is used as the above indexing algorithm: the md5 value obtained by applying the md5 operation to the field value of a data field is used as the key of the key-value pair stored in the cache space.
The md5 algorithm is a message digest algorithm that generates a 16-byte md5 value from the original information and is commonly used for information verification. Because the md5 value has a fixed length of 16 bytes, and it is very unlikely that different character strings happen to produce the same md5 value, using the md5 value of the field value of the data field as the key effectively compresses the cache space and reduces the overhead.
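As a brief illustration of why the md5 value bounds the key size, the sketch below derives a key from a field value with Python's standard `hashlib`. The function name `cache_key` and the UTF-8 encoding of the field value are assumptions made for the example.

```python
import hashlib

def cache_key(field_value: str) -> bytes:
    """Return the 16-byte md5 digest of a field value for use as a cache key."""
    return hashlib.md5(field_value.encode("utf-8")).digest()

short_key = cache_key("root")
long_key = cache_key("x" * 10_000)   # a very long field value
print(len(short_key), len(long_key))  # both keys are 16 bytes
```

However long the field value is, the stored key always occupies 16 bytes, which is what lets a fixed-length array element hold it.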
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a cache structure of a data security detection method according to an embodiment of the present disclosure.
As shown in fig. 3, the cache space structure includes a main cache space and an additional cache space, where the main cache space is a main array structure composed of a plurality of main array elements. The additional cache space is composed of a plurality of additional cache space blocks; each additional cache space block is a list structure composed of a plurality of additional array structures, and each additional array structure is composed of a plurality of additional array elements.
When the key-value pair is stored in the cache space structure, a main array element is distributed to the key-value pair, and the key-value pair is stored in the main array element.
To further compress the cache space, main array elements may be assigned to the key-value pairs by hashing. A hash algorithm may be designed such that each hash value it produces corresponds to a main array element in the main array structure.
For example, a hash algorithm may be designed to perform a hash operation on the key values in the key value pairs, and the maximum value of the obtained hash values is set to be the same as the number of the key value pairs, that is, the number of the corresponding regular expression sets; therefore, the above hash operation may be performed on the key value of the key value pair, and the main array element corresponding to the obtained hash value may be assigned to the key value pair.
By using the hash operation, the corresponding storage position can be directly calculated according to the key value in the key value pair, and the addressing information is saved without using an extra cache space, so that the cache space can be further compressed.
The length of the main array element can be a fixed preset value, wherein the main array element comprises an address bit and a data storage bit for storing a key value pair.
Taking the cache space structure in fig. 3 as an example, the length of a main array element is 24 bytes, comprising 4 bytes of address bits and 20 bytes of data storage bits. Because the key in the key-value pair is a fixed-length md5 value, it occupies 16 bytes of key bits, and the remaining 4 bytes are value bits for storing the value.
However, the length of the value in a key-value pair is determined by the number of regular expressions contained in the regular expression set corresponding to the key-value pair; it is not fixed and may exceed 4 bytes.
Therefore, when a key-value pair is stored in a main array element, it is first determined whether the length of the value exceeds 4 bytes; if not, the value is stored directly in the value bits of the main array element;
when the length of the value exceeds 4 bytes, the value is stored in the additional cache space, and a pointer to the address in the additional cache space at which the value is stored is written into the value bits of the main array element.
It should be noted that the length of the main array element and the length of the value bits given above are only an example and may be set to other reasonable values according to actual needs; for example, the length of the main array element may be set to 32 bytes or 36 bytes.
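The 24-byte layout of the fig. 3 example can be sketched with Python's standard `struct` module. This is an illustration under assumptions: the big-endian byte order, the names `ELEMENT`, `pack_element` and `unpack_element`, and the way a value is turned into bytes are not specified in the text.

```python
import struct

# 4 bytes of address bits, 16 bytes of key bits, 4 bytes of value bits.
ELEMENT = struct.Struct(">I16s4s")

def pack_element(next_addr: int, key: bytes, value: bytes) -> bytes:
    """Pack one array element; values longer than 4 bytes are rejected here
    because they belong in the additional cache space."""
    if len(key) != 16:
        raise ValueError("key must be a 16-byte md5 value")
    if len(value) > 4:
        raise ValueError("value exceeds the 4-byte value bits")
    return ELEMENT.pack(next_addr, key, value)

def unpack_element(raw: bytes):
    """Return (address, key, value) from a packed 24-byte element."""
    return ELEMENT.unpack(raw)

raw = pack_element(0, b"k" * 16, b"\x00\x00\x00\x14")
print(ELEMENT.size, unpack_element(raw))
```

Storing keys and values as raw bytes in a fixed-size record avoids per-entry object overhead; the cost is the explicit length check before a value is written.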
In an exemplary embodiment shown in this specification, the specific way to store the value into the additional buffer space is as follows:
assigning an additional array element to the value, wherein the additional cache space initially comprises an additional cache space block, sequentially confirming whether the additional array element in the last newly-built cache space block is empty, and assigning the first empty additional array element to the value; if all the additional array elements in the additional cache space block are not empty, an additional cache space block is newly built, and the first additional array element in the newly built additional cache space block is allocated to the value;
writing the value into the additional array element, if the length of the value exceeds the length of the additional array element, continuing to write the next additional array element until the value is completely stored;
and storing the pointer of the address of the initial additional array element allocated to the value into the value bit in the main array element.
In this case, of the 4 bytes (32 bits) of the value bits, 7 bits may be used to indicate the additional cache space block in which the additional array element is located, and 16 bits may be used to indicate the position of the additional array element within that block.
Therefore, the additional cache space may contain up to 128 additional cache space blocks in this case. Additional cache space blocks are added on demand: when all additional array elements in one block are in use and another additional array element is needed, a new additional cache space block is created. Adding blocks according to actual demand improves cache utilization and saves the cache space resources used by the additional cache space blocks.
The length of an array element in the additional cache space may also be set as desired; for example, it may likewise be set to 24 bytes.
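A possible encoding of the pointer held in the 4-byte value bits is sketched below. The text only states that 7 bits identify the block and 16 bits identify the position; where those bits sit inside the 32-bit word, and the names `encode_pointer` and `decode_pointer`, are assumptions, and the remaining 9 bits are simply left at zero here.

```python
BLOCK_BITS = 7     # up to 2**7 = 128 additional cache space blocks
OFFSET_BITS = 16   # up to 2**16 additional array elements per block

def encode_pointer(block: int, offset: int) -> bytes:
    """Pack a (block, offset) pair into 4 bytes."""
    if not 0 <= block < (1 << BLOCK_BITS):
        raise ValueError("block index out of range")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset out of range")
    return ((block << OFFSET_BITS) | offset).to_bytes(4, "big")

def decode_pointer(raw: bytes):
    """Recover the (block, offset) pair from 4 bytes."""
    word = int.from_bytes(raw, "big")
    block = (word >> OFFSET_BITS) & ((1 << BLOCK_BITS) - 1)
    offset = word & ((1 << OFFSET_BITS) - 1)
    return block, offset

print(decode_pointer(encode_pointer(127, 65535)))  # (127, 65535)
```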
In the method for allocating the main array elements to the key value pairs in the hash manner, hash collision may occur, that is, different key values are subjected to a hash algorithm to obtain the same hash value, which corresponds to the same main array element.
For the second key-value pair pointing to the main array element, an empty additional buffer space array element may be allocated for the key-value pair, and the key-value pair may be stored in the empty additional buffer space array element, and a pointer to the address of the additional buffer space array element may be stored in the address bit of the main array element.
For the third key-value pair pointing to the main array element, an empty additional buffer space array element may be allocated to the key-value pair, and the key-value pair may be stored in the empty additional buffer space array element, and a pointer pointing to the address of the additional buffer space array element may be stored in the address bit of the additional array element in which the second key-value pair pointing to the main array element is stored.
Subsequent key-value pairs that point to the same main array element are processed in the same way: the address bits of each array element store a pointer to the additional array element that holds the next key-value pair, forming a linked list structure.
In this method, the structure of the additional array element is the same as that of the main array element, for example, in the embodiment corresponding to fig. 3, the length of the additional array element is 24 bytes, which includes 4 bytes of address bits, 16 bytes of key bits, and 4 bytes of value bits.
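The collision handling described above amounts to chaining through the address bits. The sketch below models it with Python lists rather than packed bytes, so it shows the linking logic only; the class name `ChainedCache`, the modulo-based slot hash, and the use of a list index as the "address" are assumptions.

```python
import hashlib

class ChainedCache:
    """Main array plus additional elements; each element is [next, key, value],
    where next is the index of the following additional element or None."""

    def __init__(self, slot_count: int):
        self.main = [None] * slot_count
        self.extra = []

    def _slot(self, key: bytes) -> int:
        # The storage position is computed directly from the key.
        return int.from_bytes(key, "big") % len(self.main)

    def put(self, key: bytes, value: str) -> None:
        idx = self._slot(key)
        if self.main[idx] is None:
            self.main[idx] = [None, key, value]
            return
        node = self.main[idx]
        while True:
            if node[1] == key:          # same key: overwrite the value
                node[2] = value
                return
            if node[0] is None:         # reached the end of the chain
                break
            node = self.extra[node[0]]
        self.extra.append([None, key, value])
        node[0] = len(self.extra) - 1   # link the previous element to the new one

    def get(self, key: bytes):
        node = self.main[self._slot(key)]
        while node is not None:
            if node[1] == key:
                return node[2]
            node = self.extra[node[0]] if node[0] is not None else None
        return None

cache = ChainedCache(slot_count=1)      # one slot forces every key to collide
keys = [hashlib.md5(v).digest() for v in (b"a", b"b", b"c")]
for k, bits in zip(keys, ("010100", "11", "0")):
    cache.put(k, bits)
print([cache.get(k) for k in keys])
```

With a single main array element, the second pair is linked from the main element and the third from the second, mirroring the second and third key-value pairs in the description.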
In summary, in the method for caching the execution results corresponding to the regular expression sets into the cache space, the execution result of each regular expression set is encoded as a binary character string, the value of the data field corresponding to each regular expression set is compressed with the md5 algorithm, addressing is performed with a hash algorithm, and keys and values are stored directly as bytes. Through these combined caching measures, the cache space occupied by the cache is highly compressed.
Step 208, in response to that the regular expressions in the regular expression sets corresponding to all the data fields included in the data to be detected are executed, sequentially reading the detection rules to be executed from the detection rule base, reading the execution results of each regular expression included in the detection rules from the cache space, and generating the detection result corresponding to the detection rule based on the read execution results.
After the regular expressions in all the regular expression sets have been executed and their execution results cached in the high-compression cache space, the detection rules to be executed can be read from the detection rule base one by one.
For each detection rule, the regular expressions in the rule are identified. For each regular expression, the value of its corresponding data field is read and the md5 algorithm is applied to obtain the md5 value, which is used as the key to read from the cache space the binary character string representing the execution result of the regular expression set for that data field. The execution result of the regular expression is then taken from the binary character string according to the position of the regular expression in the set. After the execution results of all regular expressions in the detection rule have been obtained, the detection result of the detection rule is determined.
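The lookup path of step 208 can be sketched as follows. A plain `dict` stands in for the cache space, and the rule format (an `op` of "and" or "or" over `(field, position)` pairs) is a hypothetical simplification of the logical operation expression; the names `md5_key`, `expression_result` and `evaluate_rule` are likewise illustrative.

```python
import hashlib

def md5_key(field_value: str) -> bytes:
    return hashlib.md5(field_value.encode("utf-8")).digest()

def expression_result(cache, record, field, position) -> bool:
    """Read the cached binary string for the field and pick one expression's bit."""
    bits = cache[md5_key(record[field])]
    return bits[position] == "1"

def evaluate_rule(rule, cache, record) -> bool:
    """Combine the cached per-expression results with the rule's logical operator."""
    results = [expression_result(cache, record, f, pos) for f, pos in rule["expressions"]]
    return all(results) if rule["op"] == "and" else any(results)

record = {"cmd": "curl http://example.test/x | sh", "user": "root"}
cache = {md5_key(record["cmd"]): "010100", md5_key(record["user"]): "10"}

rule_hit = {"op": "and", "expressions": [("cmd", 1), ("user", 0)]}
rule_miss = {"op": "and", "expressions": [("cmd", 0), ("user", 0)]}
print(evaluate_rule(rule_hit, cache, record), evaluate_rule(rule_miss, cache, record))
```

No regular expression is executed at this stage; every rule is decided from bits that were computed once per regular expression set.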
In an exemplary embodiment shown in this specification, the regular expressions corresponding to some data fields in the data to be detected are repeated only a few times across the detection rules contained in the detection rule base. For such regular expressions, the execution results need not be cached in advance; instead, the regular expressions may be executed when required to obtain the corresponding execution results.
The execution of these regular expressions may also be accelerated by a preset regular expression acceleration framework, for example, the Hyperscan framework.
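One way to picture this variant is sketched below: small regular expression sets are dropped before caching, and a regular expression without a cached result is executed directly. The threshold semantics, the per-field `cached_bits` mapping, and the names `prune_small_sets` and `lookup_or_execute` are assumptions, and Python's `re` again stands in for the acceleration framework.

```python
import re

def prune_small_sets(groups, min_size):
    """Keep only the sets that contain at least min_size regular expressions."""
    return {field: pats for field, pats in groups.items() if len(pats) >= min_size}

def lookup_or_execute(field, pattern, record, groups, cached_bits) -> bool:
    """Use the cached bit when the field's set was cached, else run the pattern."""
    patterns = groups.get(field)
    if patterns is not None and field in cached_bits:
        return cached_bits[field][patterns.index(pattern)] == "1"
    return re.search(pattern, record[field]) is not None

groups = {"cmd": [r"a", r"b", r"c"], "user": [r"^root$"]}
kept = prune_small_sets(groups, min_size=2)   # the single-pattern "user" set is dropped
cached_bits = {"cmd": "010"}
record = {"cmd": "b", "user": "root"}

print(lookup_or_execute("cmd", r"b", record, kept, cached_bits))         # read from cache
print(lookup_or_execute("user", r"^root$", record, kept, cached_bits))   # executed directly
```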
Referring to fig. 4, fig. 4 is a schematic diagram of a data security detection method according to an embodiment of the present disclosure.
When the data security detection method is applied to a RocketMQ-Streams architecture, the detection rules are contained in a plurality of security computing tasks, which comprise a data source task and a plurality of filtering tasks. The data source task mainly performs data preprocessing or ETL (extraction, transformation, and loading) processing on the data to be detected acquired from a data source, and obtains from it a plurality of normalized data fields that are convenient to process.
The filtering task comprises a plurality of regular expressions, and the filtering task further determines the detection result according to the execution results of the regular expressions.
Firstly, all regular expressions are extracted from the plurality of filtering tasks, and the expressions are grouped according to data fields corresponding to the regular expressions.
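The extraction and grouping step can be sketched as below. The rule representation (a list of `(field, pattern)` pairs per filtering task) and the function name `group_by_field` are hypothetical; dropping duplicate patterns within a group is also an assumption, made so that an expression shared by several tasks is executed only once.

```python
from collections import defaultdict

def group_by_field(rules):
    """Collect the regular expressions of all rules into one ordered set per field."""
    groups = defaultdict(list)
    for rule in rules:
        for field, pattern in rule["expressions"]:
            if pattern not in groups[field]:   # keep each expression once per field
                groups[field].append(pattern)
    return dict(groups)

rules = [
    {"name": "r1", "expressions": [("cmd", r"curl"), ("user", r"^root$")]},
    {"name": "r2", "expressions": [("cmd", r"curl"), ("cmd", r"wget")]},
]
groups = group_by_field(rules)
print(groups)
```

The position of a pattern in its group's list is what later selects its bit in the cached binary character string.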
After the data to be processed is acquired, the data source task is run to obtain the normalized data fields, and for each data field it is checked whether the cache already holds an execution result (expression fingerprint) for the expression group corresponding to that field.
If so, the value of the data field has been detected before, and the cached execution result corresponding to the field is put into the context;
if not, the regular expressions in the expression group are executed with acceleration by the Hyperscan framework, and the execution result is stored, in binary form, in both the high-compression cache and the context.
And sequentially checking whether the context contains the results of all the filtering rules, acquiring the execution results of the regular expressions corresponding to the filtering rules from the context for the filtering rules of which the results do not exist in the context, and further determining the detection results of the filtering rules according to the execution results of the regular expressions.
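The cache check in the flow of fig. 4 can be sketched as follows. A `dict` stands in for the high-compression cache, Python's `re` stands in for Hyperscan, and the name `process_record` and the sample groups are illustrative assumptions.

```python
import hashlib
import re

def process_record(record, groups, cache):
    """Build the per-record context, executing an expression group only on a cache miss."""
    context = {}
    for field, patterns in groups.items():
        key = hashlib.md5(record[field].encode("utf-8")).digest()
        bits = cache.get(key)
        if bits is None:
            # Cache miss: run the whole group once and remember its fingerprint.
            bits = "".join("1" if re.search(p, record[field]) else "0" for p in patterns)
            cache[key] = bits
        context[field] = bits
    return context

groups = {"cmd": [r"curl\s+http", r"/etc/shadow"], "user": [r"^root$"]}
record = {"cmd": "curl http://example.test/x", "user": "root"}
cache = {}

first = process_record(record, groups, cache)
second = process_record(record, groups, cache)   # served entirely from the cache
print(first, len(cache))
```

A repeated field value costs one md5 computation and one lookup instead of a full pass over the expression group, which is where the saving for repetitive stream data comes from.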
In an exemplary embodiment of the present specification, a data security detection apparatus is also provided. Referring to fig. 5, fig. 5 is a block diagram of a data security detection device according to an embodiment of the present disclosure.
The apparatus is applied to a big data computing engine carried by a cloud service system, where the computing engine is used for running detection rules in a preset detection rule base and performing security detection on data to be detected from a data source connected to the cloud service system;
the data to be detected comprises a plurality of data fields; the detection rule comprises a plurality of regular expressions corresponding to data fields contained in the data to be detected; the regular expression is used for performing regular matching on the data fields contained in the data to be detected and corresponding to the regular expression;
the device comprises the following units:
a grouping unit 520, configured to group regular expressions included in the detection rules in the detection rule base according to data fields corresponding to the regular expressions, so as to obtain multiple regular expression sets respectively corresponding to different data fields; the data fields corresponding to all regular expressions in the regular expression set are the same;
a caching unit 530, configured to execute regular expressions in the regular expression sets in response to receiving data to be detected from the data source, and cache an execution result corresponding to each regular expression set in a caching space;
the detecting unit 540 is configured to, in response to that the regular expressions in the regular expression sets corresponding to all the data fields included in the data to be detected are executed, sequentially read the detection rules to be executed from the detection rule base, read the execution results of the regular expressions included in the detection rules from the cache space, and generate the detection results corresponding to the detection rules based on the read execution results.
Optionally, the detection rule is a logical operation expression formed by a plurality of regular expressions corresponding to data fields included in the data to be detected according to a preset logical operation mode;
the detecting unit 540 is specifically configured to perform logical operation according to a preset logical operation manner based on the read execution result, so as to obtain a detection result corresponding to the detection rule.
Optionally, the data to be detected includes a data segment that is dynamically intercepted through a sliding window in stream data from the same data source.
Optionally, the apparatus further comprises:
a preprocessing unit 510, configured to perform data preprocessing on the data to be detected, so as to process the data fields contained in the data to be detected into normalized fields.
Optionally, the grouping unit 520 is specifically configured to group regular expressions included in the detection rules in the detection rule base according to data fields corresponding to the regular expressions, to obtain a plurality of regular expression sets respectively corresponding to different data fields, to respectively determine the number of the regular expressions in the plurality of regular expression sets, and to delete the regular expression sets of which the number of the regular expressions is smaller than a preset threshold;
the detecting unit 540 is specifically configured to determine whether the execution result of each regular expression included in the detection rule exists in the cache space; if the execution result exists in the cache space, to acquire the execution result of the regular expression from the cache space, and if it does not exist in the cache space, to execute the regular expression to obtain the corresponding execution result.
Optionally, the caching unit 530 is specifically configured to execute the regular expressions in the plurality of regular expression sets in batches based on a preset regular expression acceleration framework.
Optionally, the regular expression acceleration framework includes a Hyperscan framework.
Optionally, the caching unit 530 is specifically configured to cache the execution result corresponding to each regular expression set in a cache space to which a high-compression cache space structure is applied.
Optionally, the caching unit 530 is specifically configured to, according to the data field corresponding to each regular expression set, form a key-value pair by using the field value of the data field as the key and the execution result corresponding to the regular expression set as the value, and store the key-value pair in the cache space.
Optionally, the value of the execution result corresponding to the regular expression set includes:
and binary character strings are spliced by binary values corresponding to the execution results of all regular expressions contained in the regular expression set.
Optionally, the cache unit 530 is specifically configured to perform md5 operation on the field value of the data field to obtain a corresponding md5 value; and taking the md5 value as a key value.
Optionally, the cache space structure includes a main cache space, and the main cache space includes a main array structure formed by a plurality of main array elements;
the cache unit 530 is specifically configured to allocate a main array element to the key-value pair, and store the key-value pair in the main array element.
Optionally, the number of the main array elements is the number of the data fields;
the cache unit 530 is specifically configured to perform a hash operation on the key value of the key value pair, and allocate a main array element corresponding to the obtained hash value to the key value pair.
Optionally, the cache space structure includes an additional cache space formed by a plurality of additional cache space blocks; the additional cache space block comprises a list structure formed by a plurality of additional array structures; the additional array structure is composed of a plurality of additional array elements.
Optionally, the caching unit 530 is specifically configured to determine whether the length of the value exceeds a preset threshold, and if not, store the key-value pair into the main array element;
if so, distributing a plurality of additional array elements for the key value pair, storing the value in the key value pair into the additional array elements, and storing the key value in the key value pair and a pointer pointing to the address of the additional array elements into the main array element.
Optionally, the caching unit 530 is specifically configured to perform a hash operation on the key value of the key value pair, and determine whether the key value pair is already stored in the main array element corresponding to the obtained hash value;
if not, allocating the main array element corresponding to the hash value to the key value pair;
if so, an additional array element is allocated to the key-value pair, and a pointer to the address of the additional array element is stored in the address bits of the main array element.
Optionally, the cloud service system includes a proprietary cloud or a hybrid cloud service system; the big data compute engine includes a lightweight secure compute engine.
Optionally, the big data compute engine comprises a RocketMQ-Streams compute engine.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on at least one network unit. Some or all of the elements can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement without inventive effort.
In exemplary embodiments of the present description, embodiments of an apparatus and a terminal applied thereto are also provided.
The embodiments of the apparatus of the present specification can be applied to a computer device, such as a server or a terminal device. The device embodiments may be implemented by software, or by hardware, or by a combination of hardware and software. The software implementation is taken as an example, and as a logical device, the device is formed by reading corresponding computer program instructions in the nonvolatile memory into the memory through the processor where the device is located and running the computer program instructions. In terms of hardware, as shown in fig. 6, fig. 6 is a hardware structure diagram of a computer device 60 in which an apparatus according to an embodiment of the present disclosure is located, and besides the processor 610, the memory 630, the network interface 620, and the nonvolatile memory 640 shown in fig. 6, a server or an electronic device in which the apparatus is located in an embodiment may also include other hardware according to an actual function of the computer device, which is not described again.
In an exemplary embodiment of the present specification, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the present description may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present description described in the "exemplary methods" section above of the present description, when the program product is run on the terminal device.
A program product for implementing the above method according to an embodiment of the present specification may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present specification is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or at least one readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for this specification may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of at least one embodiment can also be implemented in combination in a single embodiment. In another aspect, various features that are described in the context of a single embodiment can also be implemented in at least one embodiment separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system elements and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into at least one software product.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Further, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (14)

1. A data security detection method, applied to a big data computing engine hosted on a cloud service system, wherein the computing engine is configured to run detection rules in a preset detection rule base and to perform security detection on data to be detected from a data source connected to the cloud service system; the data to be detected comprises a plurality of data fields; each detection rule comprises a plurality of regular expressions corresponding to data fields contained in the data to be detected; each regular expression is used to perform regular-expression matching on its corresponding data field in the data to be detected; the method comprising:
grouping the regular expressions contained in the detection rules in the detection rule base according to the data fields corresponding to the regular expressions, to obtain a plurality of regular expression sets respectively corresponding to different data fields, wherein all regular expressions in a given regular expression set correspond to the same data field;
in response to receiving the data to be detected from the data source, executing the regular expressions in the plurality of regular expression sets, and caching the execution results corresponding to each regular expression set into a cache space;
and in response to completion of execution of the regular expressions in the regular expression sets corresponding to all data fields contained in the data to be detected, sequentially reading detection rules to be executed from the detection rule base, reading the execution results of the regular expressions contained in each detection rule from the cache space, and generating a detection result corresponding to the detection rule based on the read execution results.
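A minimal Python sketch of the flow recited in claim 1 may make the three steps easier to follow: group by field, execute and cache, then evaluate rules from the cache. The rule base, the field names, and the "all expressions must match" evaluation are assumptions made for illustration; the claim itself does not fix any of them.

```python
import re
from collections import defaultdict

# Hypothetical rule base: each rule lists (data field, regular expression) pairs.
RULES = {
    "sql_injection": [("url", r"(?i)union\s+select"), ("method", r"^GET$")],
    "scanner": [("user_agent", r"(?i)sqlmap"), ("url", r"(?i)union\s+select")],
}

def group_by_field(rules):
    """Group every distinct regular expression under the data field it targets."""
    groups = defaultdict(set)
    for conditions in rules.values():
        for field, pattern in conditions:
            groups[field].add(pattern)
    return groups

def run_groups(groups, record):
    """Execute each field's expression set once and cache the boolean results."""
    cache = {}
    for field, patterns in groups.items():
        value = record.get(field, "")
        for pattern in patterns:
            cache[(field, pattern)] = re.search(pattern, value) is not None
    return cache

def evaluate(rules, cache):
    """Read cached results rule by rule; assumed here: a rule fires when all its expressions match."""
    return {name: all(cache[cond] for cond in conds) for name, conds in rules.items()}

record = {"url": "/item?id=1 UNION SELECT password", "method": "GET", "user_agent": "curl/8.0"}
groups = group_by_field(RULES)
cache = run_groups(groups, record)
results = evaluate(RULES, cache)
```

In this sketch the expression on the `url` field is shared by two rules but is executed only once, which is the saving that grouping and caching are meant to provide.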
2. The method of claim 1, wherein
the detection rule comprises a plurality of regular expressions corresponding to data fields contained in the data to be detected, and a logical operation formula that combines the regular expressions according to a preset logical operation mode;
the generating a detection result corresponding to the detection rule based on the read execution results comprises:
performing a logical operation on the read execution results according to the preset logical operation mode to obtain the detection result corresponding to the detection rule.
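One way to picture the logical operation of claim 2 is the following Python sketch. It assumes the formula is stored as nested tuples with the operators "and", "or" and "not", and that the cached execution results are booleans keyed by hypothetical expression identifiers; the claim does not specify either representation.

```python
def evaluate_formula(formula, results):
    """Evaluate a nested (operator, operand, ...) formula against cached regex results."""
    if isinstance(formula, str):
        # A leaf names one regular expression; look up its cached execution result.
        return results[formula]
    op, *operands = formula
    if op == "not":
        return not evaluate_formula(operands[0], results)
    values = [evaluate_formula(operand, results) for operand in operands]
    if op == "and":
        return all(values)
    if op == "or":
        return any(values)
    raise ValueError(f"unknown operator: {op}")

# Hypothetical cached results for three expressions r1, r2, r3.
cached = {"r1": True, "r2": False, "r3": False}
rule_formula = ("and", "r1", ("or", "r2", ("not", "r3")))
detection_result = evaluate_formula(rule_formula, cached)
```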
3. The method of claim 1, wherein
the data to be detected comprises data segments that are dynamically intercepted, through a sliding window, from stream data from the same data source.
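The sliding-window interception of claim 3 could look roughly like the Python sketch below. The window size, the step, and the count-based (rather than time-based) window are assumptions; the claim only states that segments are cut from the stream through a sliding window.

```python
from collections import deque

def sliding_windows(stream, size, step=1):
    """Yield successive segments of `size` items, advancing `step` items between segments."""
    window = deque(maxlen=size)  # the oldest item drops out automatically
    since_emit = 0
    for item in stream:
        window.append(item)
        since_emit += 1
        if len(window) == size and since_emit >= step:
            yield list(window)
            since_emit = 0

# Hypothetical stream of five log records from one data source.
segments = list(sliding_windows(["e0", "e1", "e2", "e3", "e4"], size=3, step=1))
```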
4. The method of claim 1, wherein
before grouping the regular expressions contained in the detection rules in the detection rule base according to the data fields corresponding to the regular expressions, the method further comprises:
performing data preprocessing on the data to be detected, so as to process the data fields contained in the data to be detected into standard fields.
5. The method of claim 1, wherein
the executing the regular expressions in the plurality of regular expression sets comprises:
executing the regular expressions in the regular expression sets in batches based on a preset regular expression acceleration framework.
6. The method of claim 1, wherein
the caching the execution results corresponding to each regular expression set into a cache space comprises:
according to the data field corresponding to each regular expression set, taking the MD5 value obtained by performing an MD5 operation on the field value of the data field as a key and the execution results corresponding to the regular expression set as a value, forming a key-value pair, and storing the key-value pair into the cache space.
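The key construction of claim 6 can be sketched in Python with the standard `hashlib` module. The shape of the cached value (a dict from pattern to match result) and the use of an ordinary dict as the cache space are assumptions for illustration only.

```python
import hashlib

def cache_key(field_value: str) -> str:
    """Return the hex MD5 digest of a field value, used as the cache key."""
    return hashlib.md5(field_value.encode("utf-8")).hexdigest()

cache_space = {}

def store_results(field_value: str, execution_results: dict) -> str:
    """Store one regular expression set's results under the MD5 of the field value."""
    key = cache_key(field_value)
    cache_space[key] = execution_results
    return key

# Hypothetical field value and execution results for its expression set.
key = store_results("hello", {r"^h": True, r"\d+": False})
```

MD5 serves here only to derive a fixed-length lookup key from a field value of arbitrary length, not to provide any cryptographic protection.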
7. The method of claim 6, wherein
the cache space comprises a main cache space, and the main cache space comprises a main array structure formed by a plurality of main array elements;
the storing the key-value pair into the cache space comprises:
allocating a main array element to the key-value pair, and storing the key-value pair into the main array element.
8. The method of claim 7, wherein
the number of the main array elements is equal to the number of the data fields;
the allocating a main array element to the key-value pair comprises:
performing a hash operation on the key of the key-value pair, and allocating the main array element corresponding to the obtained hash value to the key-value pair.
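A short Python sketch of the allocation in claims 7 and 8 follows. The claims do not name the hash operation, so reading the hex MD5 key as an integer and reducing it modulo the array length is an assumption, as are the field names and the cached value.

```python
import hashlib

def md5_key(field_value: str) -> str:
    """Key of the key-value pair: hex MD5 digest of the field value."""
    return hashlib.md5(field_value.encode("utf-8")).hexdigest()

def main_index(key: str, num_elements: int) -> int:
    """Assumed hash operation: interpret the hex key as an integer, modulo the array length."""
    return int(key, 16) % num_elements

# Hypothetical record with three data fields, hence three main array elements.
fields = {"url": "/login", "method": "POST", "user_agent": "curl/8.0"}
main_array = [None] * len(fields)

key = md5_key(fields["url"])
index = main_index(key, len(main_array))
main_array[index] = (key, {r"(?i)union\s+select": False})
```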
9. The method of claim 7, wherein
the cache space further comprises an additional cache space formed by a plurality of additional cache space blocks;
each additional cache space block comprises a list structure formed by a plurality of additional array structures;
each additional array structure is composed of a plurality of additional array elements.
10. The method of claim 9, wherein
the storing the key-value pair into the main array element comprises:
determining whether the length of the value exceeds a preset threshold;
if not, storing the key-value pair into the main array element;
if so, allocating a plurality of additional array elements to the key-value pair, storing the value of the key-value pair into the additional array elements, and storing the key of the key-value pair and a pointer to the address of the additional array elements into the main array element.
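The length check of claim 10 might be sketched in Python as below. The threshold value, the class and field names, and the use of a list index in place of a memory pointer are all assumptions; for brevity the sketch also keeps a long value in a single additional entry rather than spreading it over several additional array elements.

```python
from dataclasses import dataclass, field
from typing import Optional

VALUE_LENGTH_THRESHOLD = 8  # hypothetical preset threshold, in bytes

@dataclass
class MainElement:
    key: str
    value: Optional[bytes] = None    # set when the value is short enough to store inline
    extra_ref: Optional[int] = None  # stands in for the pointer into the additional space

@dataclass
class Cache:
    main: list
    extra: list = field(default_factory=list)

    def put(self, index: int, key: str, value: bytes) -> None:
        if len(value) <= VALUE_LENGTH_THRESHOLD:
            self.main[index] = MainElement(key, value=value)
        else:
            # Long value: store it in the additional space, keep key and reference in main.
            self.extra.append(value)
            self.main[index] = MainElement(key, extra_ref=len(self.extra) - 1)

    def get(self, index: int) -> bytes:
        element = self.main[index]
        if element.extra_ref is None:
            return element.value
        return self.extra[element.extra_ref]

cache = Cache(main=[None] * 2)
cache.put(0, "k-short", b"\x01\x00")
cache.put(1, "k-long", b"\x01" * 32)
```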
11. The method of claim 9, wherein
the performing a hash operation on the key of the key-value pair and allocating the main array element corresponding to the obtained hash value to the key-value pair comprises:
performing a hash operation on the key of the key-value pair, and determining whether a key-value pair is already stored in the main array element corresponding to the obtained hash value;
if not, allocating the main array element corresponding to the hash value to the key-value pair;
if so, allocating an additional array element to the key-value pair, and storing a pointer to the address of the additional array element in the main array element.
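The collision handling of claim 11 resembles separate chaining, which the following Python sketch illustrates. The CRC32 hash, the dict-based elements, and the chaining of further collisions through the additional elements themselves are assumptions; the claim only requires that a colliding pair go to an additional array element referenced from the main array element.

```python
import zlib

class ChainedCache:
    def __init__(self, num_elements: int):
        self.main = [None] * num_elements  # main array elements
        self.extra = []                    # additional array elements

    def _index(self, key: str) -> int:
        # Assumed hash operation: CRC32 of the key, modulo the main array length.
        return zlib.crc32(key.encode("utf-8")) % len(self.main)

    def put(self, key: str, value) -> None:
        i = self._index(key)
        if self.main[i] is None:
            self.main[i] = {"key": key, "value": value, "next": None}
            return
        # Main element occupied: follow the chain to the matching key or to its end.
        node = self.main[i]
        while node["key"] != key and node["next"] is not None:
            node = self.extra[node["next"]]
        if node["key"] == key:
            node["value"] = value
            return
        self.extra.append({"key": key, "value": value, "next": None})
        node["next"] = len(self.extra) - 1  # list index stands in for the pointer

    def get(self, key: str):
        node = self.main[self._index(key)]
        while node is not None:
            if node["key"] == key:
                return node["value"]
            node = self.extra[node["next"]] if node["next"] is not None else None
        raise KeyError(key)

# A single main element forces every key to collide.
cache = ChainedCache(num_elements=1)
cache.put("a", [True])
cache.put("b", [False])
cache.put("c", [True, True])
```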
12. A data security detection device, applied to a big data computing engine hosted on a cloud service system, wherein the computing engine is configured to run detection rules in a preset detection rule base and to perform security detection on data to be detected from a data source connected to the cloud service system;
the data to be detected comprises a plurality of data fields; each detection rule comprises a plurality of regular expressions corresponding to data fields contained in the data to be detected; each regular expression is used to perform regular-expression matching on its corresponding data field in the data to be detected;
the device comprises:
a grouping unit, configured to group the regular expressions contained in the detection rules in the detection rule base according to the data fields corresponding to the regular expressions, to obtain a plurality of regular expression sets respectively corresponding to different data fields, wherein all regular expressions in a given regular expression set correspond to the same data field;
a cache unit, configured to execute the regular expressions in the plurality of regular expression sets in response to receiving the data to be detected from the data source, and to cache the execution results corresponding to each regular expression set into a cache space;
and a detection unit, configured to, in response to completion of execution of the regular expressions in the regular expression sets corresponding to all data fields contained in the data to be detected, sequentially read detection rules to be executed from the detection rule base, read the execution results of the regular expressions contained in each detection rule from the cache space, and generate a detection result corresponding to the detection rule based on the read execution results.
13. A storage medium having stored thereon a computer program which, when executed, carries out the steps of the method according to any one of claims 1-11.
14. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-11 when executing the program.
CN202210647637.4A 2022-06-08 2022-06-08 Data security detection method and device Pending CN115168850A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210647637.4A CN115168850A (en) 2022-06-08 2022-06-08 Data security detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210647637.4A CN115168850A (en) 2022-06-08 2022-06-08 Data security detection method and device

Publications (1)

Publication Number Publication Date
CN115168850A true CN115168850A (en) 2022-10-11

Family

ID=83485981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210647637.4A Pending CN115168850A (en) 2022-06-08 2022-06-08 Data security detection method and device

Country Status (1)

Country Link
CN (1) CN115168850A (en)

Similar Documents

Publication Publication Date Title
US9852056B2 (en) Multi-level memory compression
CN107870728B (en) Method and apparatus for moving data
US8332367B2 (en) Parallel data redundancy removal
US11176099B2 (en) Lockless synchronization of LSM tree metadata in a distributed system
US10031691B2 (en) Data integrity in deduplicated block storage environments
US11829624B2 (en) Method, device, and computer readable medium for data deduplication
CN111949710B (en) Data storage method, device, server and storage medium
US10983718B2 (en) Method, device and computer program product for data backup
US9471244B2 (en) Data sharing using difference-on-write
US20200349186A1 (en) Method, apparatus and computer program product for managing metadata of storage object
CN110888918A (en) Similar data detection method and device, computer equipment and storage medium
CN112748866A (en) Method and device for processing incremental index data
US11662927B2 (en) Redirecting access requests between access engines of respective disk management devices
CN115168850A (en) Data security detection method and device
CN110716946B (en) Method and device for updating feature rule matching library, storage medium and electronic equipment
US20210373881A1 (en) Memory efficient software patching for updating applications on computing devices
CN110413215B (en) Method, apparatus and computer program product for obtaining access rights
US20240028519A1 (en) Data processing method, electronic device and computer program product
CN112784596A (en) Method and device for identifying sensitive words
US11500590B2 (en) Method, device and computer program product for data writing
CN111831620B (en) Method, apparatus and computer program product for storage management
US11494100B2 (en) Method, device and computer program product for storage management
CN111562940B (en) Project data construction method and device
US11822803B2 (en) Method, electronic device and computer program product for managing data blocks
US11243932B2 (en) Method, device, and computer program product for managing index in storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination