CN115686376A - Data processing method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN115686376A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211431013.5A
Other languages
Chinese (zh)
Inventor
王富彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lianren Healthcare Big Data Technology Co Ltd
Original Assignee
Lianren Healthcare Big Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Lianren Healthcare Big Data Technology Co Ltd filed Critical Lianren Healthcare Big Data Technology Co Ltd
Priority to CN202211431013.5A priority Critical patent/CN115686376A/en
Publication of CN115686376A publication Critical patent/CN115686376A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a data processing method, a data processing device, electronic equipment and a storage medium. The method comprises the following steps: responding to a data storage request, and acquiring initial data to be stored; determining a first heat type of the initial data to be stored, and determining an initial heat area in a first storage area in a data lake according to the first heat type; determining a first sensitivity degree of the initial data to be stored, and determining an initial sensitivity area in the initial heat area according to the first sensitivity degree; and storing the initial data to be stored into the initial sensitivity area. The technical scheme of the embodiment of the invention can realize high-efficiency regional storage of data, thereby facilitating the management and use of the data.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of data processing, in particular to a data processing method and device, electronic equipment and a storage medium.
Background
With the development of informatization and digitization in various industries, it is very important to build an efficient and complete data system.
At present, data volumes keep growing and data of various kinds are mixed together. An efficient data processing mode for partitioning the data is lacking, which hinders the management and use of the data, so improvement is needed.
Disclosure of Invention
Embodiments of the present invention provide a data processing method and apparatus, an electronic device, and a storage medium, so as to store data efficiently in partitioned areas, thereby facilitating management and use of the data.
According to an aspect of the present invention, there is provided a data processing method, which may include:
responding to a data storage request, and acquiring initial data to be stored;
determining a first heat type of initial data to be stored, and determining an initial heat area in a first storage area in a data lake according to the first heat type;
determining a first sensitivity degree of initial data to be stored, and determining an initial sensitivity area in the initial heat area according to the first sensitivity degree;
and storing the initial data to be stored into the initial sensitivity area.
According to another aspect of the present invention, there is provided a data processing apparatus, which may include:
the data to be stored acquisition module is used for responding to the data storage request and acquiring initial data to be stored;
the initial heat area determining module is used for determining a first heat type of initial data to be stored and determining an initial heat area in a first storage area in a data lake according to the first heat type;
the initial sensitivity region determining module is used for determining a first sensitivity degree of initial data to be stored and determining an initial sensitivity region in the initial heat region according to the first sensitivity degree;
and the initial data storage module to be stored is used for storing the initial data to be stored into the initial sensitivity area.
According to another aspect of the present invention, there is provided an electronic device, which may include:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, causes the at least one processor to perform the data processing method provided by any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium having stored thereon computer instructions for causing a processor to execute a method of processing data provided by any of the embodiments of the present invention.
According to the technical scheme of the embodiment of the invention, initial data to be stored is obtained in response to a data storage request; determining a first heat type of initial data to be stored, and determining an initial heat area in a first storage area in a data lake according to the first heat type; determining a first sensitivity degree of initial data to be stored, and determining an initial sensitivity region in the initial heat region according to the first sensitivity degree; and storing the initial data to be stored into the initial sensitivity area. The technical scheme of the embodiment of the invention can divide the data in an efficient data processing mode, thereby facilitating the management and the use of the data. According to the technical scheme, the data are stored in the storage areas which are divided according to multiple dimensions such as data processing stages, heat types and sensitivities, the data are stored in the divided areas efficiently, and therefore the data are managed and used conveniently.
It should be understood that the statements in this section do not necessarily identify key or critical features of any embodiment of the present invention, nor do they necessarily limit the scope of the present invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a flowchart of a data processing method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a data processing method according to a second embodiment of the present invention;
Fig. 3 is a structural diagram of area division corresponding to a data storage manner adopting multiple dimensions in a data lake according to a second embodiment of the present invention;
Fig. 4 is a flowchart of a data processing method provided in a third embodiment of the present invention;
Fig. 5 is a flowchart of an alternative example of a data processing method provided in the third embodiment of the present invention;
Fig. 6 is a block diagram of a data processing apparatus according to a fourth embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device implementing the data processing method according to the embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. The cases of "target", "original", etc. are similar and will not be described in detail herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of a data processing method according to a first embodiment of the present invention. The embodiment can be applied to the case of processing data, in particular to the case of processing data on the basis of a data lake. The method can be executed by a data processing device provided by the embodiment of the invention, the device can be realized by software and/or hardware, the device can be integrated on an electronic device, and the electronic device can be a data lake platform, various user terminals or a server.
Referring to fig. 1, the method of the embodiment of the present invention specifically includes the following steps:
S110, responding to the data storage request, and acquiring initial data to be stored.
Here, a data storage request may be understood as a request indicating that data is to be stored. The initial data to be stored may be understood as data to be stored in the data lake.
It is to be understood that a data lake is a centralized repository that stores various large raw data sets in their native format, and that it allows all structured or unstructured data to be stored at any scale.
In the embodiment of the invention, after the initial data to be stored is obtained, the initial data to be stored can be converted into a character string form, so that the integrity of the initial data to be stored is ensured.
S120, determining a first heat type of initial data to be stored, and determining an initial heat area in a first storage area in a data lake according to the first heat type.
Here, the first heat type is the heat type of the initial data to be stored. The heat type may be understood as a data type that reflects how frequently data is accessed. The heat type of data may be determined from the data state of the data and its access frequency within a preset time period. The access frequency may be determined, for example, according to a data access auditing mechanism; in the embodiment of the present invention, the manner of determining the access frequency is not particularly limited. The data state may be understood as the stage of the working cycle in which the data is located, and may include, for example, a start period, a working period, an end period, or a pre-post service period. The heat type may include cold data, warm data, or hot data. Cold data may be understood as a data type with a low access frequency; for example, data whose access frequency within the preset time period is low and whose data state is the end period is cold data. Hot data may be understood as a data type with a high access frequency; for example, data whose access frequency within the preset time period is high and whose data state is the working period is hot data. Warm data may be understood as a data type with a medium access frequency; for example, data whose access frequency within the preset time period is medium and whose data state is the working period is warm data. The heat type may also be determined by comparing the access frequency within the preset time period with a preset threshold range: if the access frequency is less than the minimum value of the preset threshold range, the data is cold data; if the access frequency is within the preset threshold range, the data is warm data; and if the access frequency is greater than the maximum value of the preset threshold range, the data is hot data.
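By way of a non-limiting illustration, the threshold comparison described above may be sketched as follows. The names `HeatType` and `classify_heat`, and the use of an integer access count as the access frequency, are assumptions introduced only for this sketch and are not part of the embodiment.

```python
from enum import Enum


class HeatType(Enum):
    COLD = "cold"
    WARM = "warm"
    HOT = "hot"


def classify_heat(access_count: int, low: int, high: int) -> HeatType:
    """Compare the access count within the preset time period
    against the preset threshold range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    if access_count < low:
        return HeatType.COLD   # below the minimum of the range
    if access_count > high:
        return HeatType.HOT    # above the maximum of the range
    return HeatType.WARM       # inside the range
```

For instance, with a hypothetical range of 10 to 100 accesses, a count of 3 would be classified as cold data and a count of 250 as hot data.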
The initial heat area may be understood as the heat area into which the initial data to be stored needs to be stored when entering the lake. Entering the lake is the operation of storing data into the data lake. A heat area may be understood as a storage area that stores data of the corresponding heat type. For example, where the heat type may include cold data, warm data, or hot data, the heat areas may include a cold data area corresponding to the cold data, a warm data area corresponding to the warm data, or a hot data area corresponding to the hot data.
It should be noted that data may be accessed frequently while its current data state has just reached the end period. In this case, although the data was accessed frequently within the preset time period, the work on the data has already finished and the data may no longer be accessed frequently afterwards, so the heat type of the data may be cold data. Conversely, data may be accessed infrequently with its current data state being the end period, yet a pre-service period may follow the end period. In this case, although the data has reached the end period and was accessed infrequently within the preset time period, it may still be accessed a moderate amount during the pre-service period, so the heat type of the data may be warm data.
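The two cases above suggest that the data state may override a purely frequency-based result. A minimal sketch of one possible rule is given below; the function name `adjust_heat`, the string labels, and the flag `service_period_follows` are assumptions for illustration, and the handling of combinations the text does not mention (for example, frequently accessed data followed by a service period) is a choice made only for this sketch.

```python
def adjust_heat(frequency_heat: str, state: str,
                service_period_follows: bool = False) -> str:
    """Adjust a frequency-based heat type ("cold", "warm", "hot")
    using the data state."""
    if state == "end":
        # Work on the data has finished. If a service period follows,
        # moderate access may still occur, so treat the data as warm;
        # otherwise treat it as cold regardless of past frequency.
        return "warm" if service_period_follows else "cold"
    # For other states, keep the frequency-based result unchanged.
    return frequency_heat
```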
It should be noted that in the embodiment of the present invention, a data storage manner with multiple dimensions is adopted in the data lake. That is, a first storage area is set in the data lake according to the data processing stage, where the first storage area is the area set in the data lake into which the initial data to be stored enters the data lake. For example, the first storage area may be an original area. The original area retains the original form of the initial data to be stored, performs no processing on it, and strictly controls access rights, thereby providing a guarantee for subsequent data backtracking and verification. The first storage area may be further divided according to the heat type of the data; for example, the first storage area may be divided into a cold data area, a hot data area, and a warm data area, so that the initial data to be stored is stored in the corresponding initial heat area according to the first heat type, which facilitates effective data division and storage during data storage.
In the embodiment of the present invention, a first heat type of initial data to be stored may be determined, and an initial heat area in a first storage area in a data lake is determined according to the first heat type, for example, if the first heat type is cold data, the determined initial heat area is a cold data area.
It can be understood that, if the initial data to be stored carries a field or tag related to the heat type, or if the data state of the initial data to be stored and its access frequency within the preset time period can be determined, the first heat type can be determined from that field or tag, or from the data state and the access frequency. However, the initial data to be stored may carry no field or tag related to the heat type, and its data state and access frequency within the preset time period may not be determinable, so that the first heat type cannot be determined. In this case, a default heat type preset according to the property of the first storage area may be used as the first heat type. For example, if the first storage area is an original area, the data in the original area is original data stored to provide a guarantee for subsequent data backtracking and verification, so the access frequency of most data in the area is low; that is, most of the data stored in the original area is cold data. Based on this property, cold data can be preset as the default heat type for the original area.
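The fallback to a default heat type may be sketched as follows. The field name `heat_type`, the dictionary `DEFAULT_HEAT_BY_AREA`, and the function `resolve_heat_type` are hypothetical names used only here; the sketch covers only the tag-based path and the default, not the determination from data state and access frequency.

```python
# Default heat type per storage area; "original" -> "cold" follows the
# example in the text, any other entries would be deployment choices.
DEFAULT_HEAT_BY_AREA = {"original": "cold"}

VALID_HEAT_TYPES = ("cold", "warm", "hot")


def resolve_heat_type(record: dict, storage_area: str) -> str:
    """Use the heat tag carried by the data if present,
    otherwise fall back to the default of the storage area."""
    tag = record.get("heat_type")
    if tag in VALID_HEAT_TYPES:
        return tag
    # Raises KeyError if no default was preset for this storage area.
    return DEFAULT_HEAT_BY_AREA[storage_area]
```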
In the embodiment of the present invention, different types of storage media or data storage modes may also be set for different heat areas. For example, since the data stored in the hot data area has a high access frequency and a Solid State Disk (SSD) has the characteristics of flash memory, an SSD may be set as the storage medium for the hot data area; since the data stored in the cold data area has a low access frequency and a Hard Disk Drive (HDD) has a large storage space and a relatively low cost, an HDD may be set as the storage medium for the cold data area. That is, one area of the partitioned data lake may correspond to a plurality of different types of storage media. As another example, because the data stored in the hot data area has a high access frequency, an in-memory database suitable for heavily accessed data can be set as the storage mode for the hot data area; because the data stored in the cold data area has a low access frequency, a relational database suitable for storing massive data can be set as the storage mode for the cold data area.
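The correspondence between heat areas and storage media or storage modes may be expressed as a simple lookup table, sketched below. The structure `HEAT_AREA_BACKENDS` and the function `backend_for` are assumptions for illustration; the text gives no medium or mode for the warm data area, so none is listed here.

```python
from typing import Optional

# Storage medium and storage mode per heat area, as given in the text.
HEAT_AREA_BACKENDS = {
    "hot": {"medium": "SSD", "mode": "in-memory database"},
    "cold": {"medium": "HDD", "mode": "relational database"},
}


def backend_for(heat_area: str) -> Optional[dict]:
    """Return the configured backend for a heat area, or None if
    no backend has been configured for it."""
    return HEAT_AREA_BACKENDS.get(heat_area)
```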
In the embodiment of the invention, the data lake can be partitioned according to the data processing stage, for example, the first storage area. According to the storage areas divided by the data processing stage, the space occupied by different heat areas can be determined based on the properties of the storage areas, for example, the data lake comprises an original area, and since most of the data in the original area is cold data, the cold data area in the original area can occupy more storage space.
In the embodiment of the present invention, for the storage areas divided according to the data processing stage, where only one storage medium can be adopted in one storage area, the storage medium corresponding to the storage area may also be determined based on the property of the storage area. For example, most data in the original area is accessed infrequently and most of the stored data is cold data, so an HDD may be used as the storage medium of the original area.
S130, determining a first sensitivity degree of initial data to be stored, and determining an initial sensitivity area in the initial heat area according to the first sensitivity degree.
Here, the first sensitivity degree is the sensitivity degree of the initial data to be stored. The sensitivity degree may reflect the degree of confidentiality of the data. The sensitivity degree of data can be determined by a machine learning or clustering algorithm; for example, it can be determined according to a sensitivity determination model trained in advance. It can also be determined from sensitivity-related keywords in the data; for example, if keywords such as "name", "gender" or "telephone" exist in the data, the data is confidential data. It can also be determined according to a sensitivity tag carried by the initial data to be stored. The sensitivity degree may include public data, internal data, or confidential data. Public data may be understood as data that does not involve confidential content, and may be data that is publicly visible or that the data holder has chosen to disclose. Internal data may be understood as data that is generally visible to authorized users, or visible after the data access party applies and the owner grants authorization; internal data may be used as the default sensitivity degree when the initial data to be stored enters the lake. Confidential data may be understood as data that contains confidential content; it not only needs to be desensitized and encrypted, but is also subject to a stricter access mechanism. The initial sensitivity region may be understood as the sensitivity region into which the initial data to be stored needs to be stored when entering the lake.
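The tag-based and keyword-based determinations, with internal data as the default, may be sketched as follows. The function name `determine_sensitivity`, the field name `sensitivity`, and the choice to match keywords against field names (rather than against field values) are assumptions for illustration only.

```python
# Keywords taken from the example in the text.
CONFIDENTIAL_KEYWORDS = {"name", "gender", "telephone"}

VALID_LEVELS = ("public", "internal", "confidential")


def determine_sensitivity(record: dict) -> str:
    """Prefer a sensitivity tag carried by the data, then keyword
    matching on field names, then the default level "internal"."""
    tag = record.get("sensitivity")
    if tag in VALID_LEVELS:
        return tag
    field_names = {str(key).lower() for key in record}
    if field_names & CONFIDENTIAL_KEYWORDS:
        return "confidential"
    return "internal"
```

Exact matching on field names avoids treating a field such as `filename` as containing the keyword "name"; a real deployment might need looser or stricter matching.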
A sensitivity region may be understood as a storage region that stores data of the corresponding sensitivity degree. For example, where the sensitivity degree may include public data, internal data, or confidential data, the sensitivity regions may include a public data region corresponding to the public data, an internal data region corresponding to the internal data, or a confidential data region corresponding to the confidential data.
It should be noted that, in the embodiment of the present invention, because a data storage manner with multiple dimensions is adopted in the data lake, that is, the region may be divided again in the heat region according to the sensitivity of the data, for example, the heat region may be divided into a public data region, an internal data region, and a confidential data region, so that the initial data to be stored is stored into the corresponding sensitivity region according to the first sensitivity, thereby facilitating further effective data division and storage in the data storage process.
S140, storing the initial data to be stored into the initial sensitivity area.
In the embodiment of the invention, the initial data to be stored can be stored in the initial sensitivity region in the initial heat region in the data lake, so that the initial data to be stored can be stored in the data lake divided into a plurality of dimensions, and the high-efficiency data storage in the divided region is realized.
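As a minimal sketch of storing data into an area addressed by several dimensions, an in-memory stand-in for the data lake is shown below. The class `PartitionedLake`, its methods, and the tuple key (storage area, heat type, sensitivity degree) are assumptions for illustration; an actual data lake would map these dimensions onto its own storage layout.

```python
from collections import defaultdict


class PartitionedLake:
    """In-memory stand-in for a data lake partitioned by storage area,
    heat area, and sensitivity region."""

    def __init__(self) -> None:
        self._areas = defaultdict(list)

    def store(self, data, heat_type: str, sensitivity: str,
              storage_area: str = "original") -> tuple:
        # The sensitivity region is nested in the heat area, which is
        # nested in the storage area; a tuple key models that nesting.
        key = (storage_area, heat_type, sensitivity)
        self._areas[key].append(data)
        return key

    def read(self, storage_area: str, heat_type: str,
             sensitivity: str) -> list:
        # Use .get so that reading an empty region does not create it.
        return list(self._areas.get((storage_area, heat_type, sensitivity), []))
```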
In the embodiment of the present invention, after the initial data to be stored is stored in the initial sensitivity region, the heat area of the data may be adjusted dynamically based on a Least Recently Used (LRU) algorithm; for example, the adjustment may be performed based on an LRU-3 algorithm, in which the number of most recent uses considered is 3.
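The text does not detail how the LRU-3 adjustment operates. One common reading of an LRU-K policy with K = 3 is sketched below under stated assumptions: data is promoted to the hot area once it has been accessed K times, and when the hot area exceeds a capacity, the entry whose K-th most recent access is oldest is demoted. The class `LruKHeatTracker`, the capacity parameter, and the logical clock are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict, deque


class LruKHeatTracker:
    """Tracks the last k access times per key and keeps a bounded
    set of keys regarded as hot."""

    def __init__(self, k: int = 3, hot_capacity: int = 2) -> None:
        self.k = k
        self.hot_capacity = hot_capacity
        # Each deque keeps only the k most recent access times.
        self._history = defaultdict(lambda: deque(maxlen=k))
        self.hot = set()
        self._clock = 0  # logical time, incremented on every access

    def access(self, key):
        """Record an access; return a demoted key, or None."""
        self._clock += 1
        self._history[key].append(self._clock)
        if key not in self.hot and len(self._history[key]) == self.k:
            self.hot.add(key)
            if len(self.hot) > self.hot_capacity:
                # history[x][0] is the k-th most recent access of x.
                victim = min(self.hot, key=lambda x: self._history[x][0])
                self.hot.discard(victim)
                return victim
        return None
```

A caller could move the returned key out of the hot data area; which area it moves to is left open here because the text does not specify it.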
According to the technical scheme of the embodiment of the invention, initial data to be stored is obtained in response to a data storage request; determining a first heat type of initial data to be stored, and determining an initial heat area in a first storage area in a data lake according to the first heat type; determining a first sensitivity degree of initial data to be stored, and determining an initial sensitivity area in the initial heat area according to the first sensitivity degree; and storing the initial data to be stored into the initial sensitivity area. The technical scheme of the embodiment of the invention can divide the data in an efficient data processing mode, and is convenient for managing and using the data. According to the technical scheme, the data are stored in the storage areas which are divided according to multiple dimensions such as data processing stages, heat types and sensitivities, the data are stored in the divided areas efficiently, and therefore the data are managed and used conveniently.
In an optional technical solution, determining the first sensitivity degree of the initial data to be stored includes: inputting the initial data to be stored into a pre-trained sensitivity determination model, and determining the first sensitivity degree of the initial data to be stored according to an output result of the sensitivity determination model.
In the embodiment of the present invention, the sensitivity determination model may be trained in advance according to the existing training samples or the training samples set by the user. After responding to a data storage request and acquiring initial data to be stored, inputting the initial data to be stored into a pre-trained sensitivity determination model, and determining a first sensitivity degree of the initial data to be stored according to an output result of the sensitivity determination model so as to improve the accuracy of the determined first sensitivity degree.
It should be noted that after the first sensitivity degree is determined, a corresponding sensitivity degree tag may be established for the initial data to be stored, so that when the sensitivity degree of the data is required to be obtained, the sensitivity degree of the data can be obtained only according to the sensitivity degree tag, and the sensitivity degree is not required to be determined again by the sensitivity degree determination model.
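Reusing the sensitivity degree tag instead of re-running the model may be sketched as follows. The field names `sensitivity_tag` and `content`, and the representation of the sensitivity determination model as a plain callable, are assumptions made only for this sketch.

```python
from typing import Callable


def get_sensitivity(record: dict, model: Callable[[str], str]) -> str:
    """Return the sensitivity degree of a record, invoking the model
    only when no sensitivity degree tag has been established yet."""
    if "sensitivity_tag" not in record:
        # First request: run the model once and keep its result as a tag.
        record["sensitivity_tag"] = model(record["content"])
    return record["sensitivity_tag"]
```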
Example two
Fig. 2 is a flowchart of a data processing method according to a second embodiment of the present invention. The present embodiment is optimized based on the above technical solutions. In this embodiment, optionally, the data lake further includes a second storage area; the data processing method further comprises: in response to the first processing instruction, determining first data to be stored from the first storage area, and determining a corresponding first heat area and a first sensitivity area of the first data to be stored in the first storage area, wherein the first data to be stored is stored in the first sensitivity area in the first heat area; determining a second heat area corresponding to the first heat area from the second storage area, and determining a second sensitivity area corresponding to the first sensitivity area from the second heat area; and determining the structured data of the first data to be stored, and storing the structured data into the second sensitivity region. The same or corresponding terms as those in the above embodiments are not explained in detail herein.
Referring to fig. 2, the method of the present embodiment may specifically include the following steps:
S210, responding to the data storage request, and acquiring initial data to be stored.
S220, determining a first heat type of initial data to be stored, and determining an initial heat area in a first storage area in a data lake according to the first heat type, wherein the data lake further comprises a second storage area.
It should be noted that, in the embodiment of the present invention, a multi-dimensional data storage manner is adopted in the data lake. That is, for the processing stages of data in the data lake, in addition to setting a first storage area for the lake-entering stage of the data to be stored, a second storage area may also be set for a data aggregation stage, where the second storage area is the area set in the data lake in which the first data to be stored is stored once it has been aggregated into structured data. For example, the second storage area may be an aggregation area, and the aggregation area may perform the necessary metadata analysis and extraction in combination with a metadata center and convert semi-structured data into structured data. The second storage area may likewise be divided into heat areas according to the heat type of the data, and each heat area may be divided into sensitivity areas according to the sensitivity degree of the data; the multi-dimensional division of the second storage area is the same as that of the first storage area and is not described in detail herein. The first data to be stored may be understood as data in the first storage area that needs to be stored into the second storage area.
S230, determining a first sensitivity degree of initial data to be stored, and determining an initial sensitivity area in the initial heat area according to the first sensitivity degree.
S240, storing the initial data to be stored into the initial sensitivity area.
S250, responding to the first processing instruction, determining first data to be stored from the first storage area, and determining a corresponding first heat area and a first sensitivity area of the first data to be stored in the first storage area, wherein the first data to be stored is stored in the first sensitivity area in the first heat area.
The first processing instruction may be understood as an instruction indicating that the first data to be stored is to be determined from the first storage area and stored into the second storage area. The first heat area may be understood as the heat area in the first storage area where the first data to be stored is located. The first sensitivity region may be understood as the sensitivity region in the first heat area where the first data to be stored is located.
S260, determining a second heat area corresponding to the first heat area from the second storage area, and determining a second sensitivity area corresponding to the first sensitivity area from the second heat area.
The second heat area may be understood as the heat area in the second storage area into which the first data to be stored needs to be stored. The second sensitivity region may be understood as the sensitivity region in the second heat area into which the first data to be stored needs to be stored.
It can be understood that the first heat area represents the heat type of the first data to be stored, and when the first data to be stored is stored into the second storage area, it also needs to be stored in the heat area corresponding to its heat type. Therefore, the heat area in the second storage area that has the same heat type as the first heat area is the second heat area into which the first data to be stored needs to be stored. Similarly, the first sensitivity region represents the sensitivity degree of the first data to be stored, and when the first data to be stored is stored into the second heat area, it also needs to be stored in the sensitivity region corresponding to its sensitivity degree. Therefore, the sensitivity region in the second heat area that has the same sensitivity degree as the first sensitivity region is the second sensitivity region into which the first data to be stored needs to be stored.
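Because the second heat area and the second sensitivity region share their types with the first ones, the target location can be derived by changing only the storage area. A minimal sketch follows; the function name `target_region`, the tuple layout (storage area, heat type, sensitivity degree), and the label `"aggregation"` for the second storage area are assumptions for illustration.

```python
def target_region(source_region: tuple,
                  second_storage_area: str = "aggregation") -> tuple:
    """Map a region of the first storage area to the region of the
    second storage area with the same heat type and sensitivity."""
    _first_storage_area, heat_type, sensitivity = source_region
    return (second_storage_area, heat_type, sensitivity)
```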
S270, determining the structured data of the first data to be stored, and storing the structured data into the second sensitivity area.
It can be understood that the data stored in the second storage area is structured data, so that when the first data to be stored is unstructured data, the structured data corresponding to the first data to be stored can be determined; and when the first data to be stored is the structured data, taking the first data to be stored as the structured data. The structured data is then stored in a second sensitivity zone.
According to the technical scheme of the embodiment of the invention, the data lake further comprises a second storage area; in response to the first processing instruction, determining first data to be stored from the first storage area, and determining a corresponding first heat area and a first sensitivity area of the first data to be stored in the first storage area, wherein the first data to be stored is stored in the first sensitivity area in the first heat area; determining a second heat area corresponding to the first heat area from the second storage area, and determining a second sensitivity area corresponding to the first sensitivity area from the second heat area; and determining the structured data of the first data to be stored, and storing the structured data into the second sensitivity area. According to the technical scheme, the areas in the data lake are further divided according to the data processing stage, the data are stored in the areas further divided according to the data processing stage, and then the data are stored in the storage areas divided according to multiple dimensions such as heat types, sensitivities and the like, so that the data are further efficiently stored in the divided areas, and the data are conveniently managed and used.
In an optional technical solution, before storing the structured data into the second sensitivity area, the method further includes: desensitizing the structured data when the second sensitivity degree of the first data to be stored corresponding to the structured data is a preset degree.
It will be appreciated that some of the more sensitive data may contain confidential content, and such data may require desensitization to ensure its security. A preset degree can therefore be set in advance, where the preset degree is a sensitivity degree at which the data contains sensitive content and requires desensitization, for example the sensitivity degree of confidential data. When the second sensitivity degree of the first data to be stored corresponding to the structured data is the preset degree, the structured data is desensitized, so as to ensure the security of the data and avoid leakage of sensitive content.
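As an illustration of the conditional desensitization described above, the following sketch masks selected fields of a structured row only when its sensitivity degree equals the preset degree. The field names, the label `"confidential"`, and the use of a truncated SHA-256 digest as the masking technique are all assumptions; the source does not say how desensitization is performed.

```python
import hashlib

PRESET_DEGREE = "confidential"            # assumed label for the preset degree
SENSITIVE_FIELDS = ("name", "id_number")  # hypothetical fields to mask


def desensitize(row):
    """Return a copy of `row` with sensitive fields replaced by a short digest."""
    masked = dict(row)
    for key in SENSITIVE_FIELDS:
        if key in masked:
            digest = hashlib.sha256(str(masked[key]).encode("utf-8")).hexdigest()
            masked[key] = digest[:12]
    return masked


def prepare_for_second_area(row, sensitivity_degree):
    """Desensitize only when the sensitivity degree is the preset degree."""
    if sensitivity_degree == PRESET_DEGREE:
        return desensitize(row)
    return row
```

The original row is left untouched, so the caller decides whether the raw version is retained in the first storage area.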
According to another optional technical scheme, the data lake further comprises a third storage area; after storing the structured data in the second sensitivity region, further comprising: in response to the second processing instruction, determining second data to be stored from the second storage area, and determining a corresponding third heat area and a third sensitivity area of the second data to be stored in the second storage area, wherein the second data to be stored is stored in the third sensitivity area in the third heat area; determining a fourth heat area corresponding to the third heat area from the third storage area, and determining a fourth sensitivity area corresponding to the third sensitivity area from the fourth heat area; and storing the second data to be stored into the fourth sensitivity area.
It should be noted that, in the embodiment of the present invention, the data lake stores data along multiple dimensions. That is, the data lake is divided according to the processing stage of the data: in addition to the first storage area and the second storage area, which are set for the lake entry stage of the data to be stored, a third storage area may be set for data use. The third storage area is an area set in the data lake in which the second data to be stored is stored when it needs to be used; for example, the third storage area may be a working area or a service area. The working area is accompanied by a large number of data access operations; for example, core operations such as layered modeling of a data warehouse, which use the second data to be stored through a large number of data access operations, are completed in the working area. The service area provides data services externally; efficient data access, permission management and data sharing are the core of this layer, so the service area also uses the second data to be stored through a large number of data access operations. The third storage area may likewise be divided into heat areas according to the heat type of the data, and each heat area may be divided into sensitivity areas according to the sensitivity degree of the data; this multidimensional division is the same as that of the first storage area and the second storage area and is not described in detail here. The second data to be stored may be understood as data in the second storage area that needs to be stored into the third storage area.
Fig. 3 is a diagram of the area division structure corresponding to the multidimensional data storage manner in a data lake provided in the second embodiment of the present invention. Referring to fig. 3, a data lake may be divided into four storage areas, namely an original area, a convergence area, a working area, and a service area. Because the original area and the convergence area each hold mostly cold data, an HDD may be used as the storage medium of each; because the working area and the service area each hold mostly hot data, an SSD may be used as the storage medium of each. Each storage area is divided into a hot data area, a warm data area and a cold data area; each heat area is divided into a public data area, an internal data area and a confidential data area.
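The division described for fig. 3 can be written down as a small configuration sketch. The area names and storage media follow the text above; the path layout (`/storage/heat/sensitivity`) and the function name `area_path` are hypothetical and only illustrate how the three dimensions combine.

```python
from itertools import product

# Storage areas and their storage media, as described for fig. 3.
STORAGE_MEDIA = {
    "original": "HDD",
    "convergence": "HDD",
    "working": "SSD",
    "service": "SSD",
}
HEAT_AREAS = ("hot", "warm", "cold")
SENSITIVITY_AREAS = ("public", "internal", "confidential")


def area_path(storage, heat, sensitivity):
    """Build a (hypothetical) path for one sensitivity area inside one heat area."""
    if storage not in STORAGE_MEDIA:
        raise ValueError(f"unknown storage area: {storage}")
    if heat not in HEAT_AREAS:
        raise ValueError(f"unknown heat area: {heat}")
    if sensitivity not in SENSITIVITY_AREAS:
        raise ValueError(f"unknown sensitivity area: {sensitivity}")
    return f"/{storage}/{heat}/{sensitivity}"


# Every combination of storage area, heat area and sensitivity area.
ALL_AREAS = [
    area_path(storage, heat, sensitivity)
    for storage, heat, sensitivity in product(STORAGE_MEDIA, HEAT_AREAS, SENSITIVITY_AREAS)
]
```

Four storage areas, three heat areas and three sensitivity areas give 36 leaf areas in total.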
Wherein the second processing instruction may be understood as an instruction that instructs to determine the second data to be stored from the second storage area to store in the third storage area. The third heat area may be understood as a heat area where the second data to be stored is located in the second storage area. The third sensitivity region may be understood as a sensitivity region in which the second data to be stored is located in the third heat region. The fourth heat area can be understood as a heat area in the third storage area into which the second data to be stored needs to be stored. The fourth sensitivity region may be understood as a sensitivity region in the fourth heat region into which the second data to be stored is required to be stored.
It can be understood that the third heat area represents the heat type of the second data to be stored, and that when the second data to be stored is stored into the third storage area it must likewise be stored into the heat area corresponding to its heat type. The heat area in the third storage area that has the same heat type as the third heat area is therefore the fourth heat area into which the second data to be stored needs to be stored. Similarly, the third sensitivity area represents the sensitivity type of the second data to be stored, and when the second data to be stored is stored into the fourth heat area it must likewise be stored into the sensitivity area corresponding to its sensitivity type. The sensitivity area in the fourth heat area that has the same sensitivity type as the third sensitivity area is therefore the fourth sensitivity area into which the second data to be stored needs to be stored.
In an embodiment of the present invention, the data lake further comprises a third storage area; in response to the second processing instruction, determining second data to be stored from the second storage area, and determining a corresponding third heat area and a third sensitivity area of the second data to be stored in the second storage area, wherein the second data to be stored is stored in the third sensitivity area in the third heat area; determining a fourth heat area corresponding to the third heat area from the third storage area, and determining a fourth sensitivity area corresponding to the third sensitivity area from the fourth heat area; and storing the second data to be stored into the fourth sensitivity area. According to the technical scheme, the areas in the data lake are further divided according to the data processing stage, the data are stored into the areas further divided according to the data processing stage, and then the data are stored into the storage areas divided according to multiple dimensions such as heat types, sensitivities and the like, so that the data are further efficiently stored in the divided areas, and the data are conveniently managed and used.
EXAMPLE III
Fig. 4 is a flowchart of a data processing method according to a third embodiment of the present invention. The present embodiment is optimized based on the above technical solutions. In this embodiment, optionally, the data processing method further includes: responding to a third processing instruction, determining a third sensitivity degree of each lake data, and determining a second heat type of each lake data according to the data state of each lake data stored in the data lake and the access frequency within a preset time length; determining a target heat area of each lake data in the original storage area according to the second heat type of each lake data, and determining a target sensitivity area of each lake data in the target heat area according to the third sensitivity degree of each lake data; and storing each lake data into a target sensitivity area in a target heat area in the original storage area from an original sensitivity area in an original heat area in the original storage area. The same or corresponding terms as those in the above embodiments are not explained in detail herein.
Referring to fig. 4, the method of this embodiment may specifically include the following steps:
S310, responding to the data storage request, and acquiring initial data to be stored.
S320, determining a first heat type of initial data to be stored, and determining an initial heat area in a first storage area in the data lake according to the first heat type.
S330, determining a first sensitivity degree of initial data to be stored, and determining an initial sensitivity area in the initial heat area according to the first sensitivity degree.
S340, storing the initial data to be stored into the initial sensitivity area.
S350, responding to the third processing instruction, determining a third sensitivity level of each lake data stored in the data lake, and determining a second heat type of each lake data according to the data state of each lake data and the access frequency in the preset time length.
Wherein the third processing instruction can be understood as an instruction that instructs to update the lake data storage area. Lake data can be understood as data stored in a data lake. The third sensitivity level can be understood as the sensitivity level of the lake data. The second heat type is a heat type of the lake data.
It should be noted that the data state, the access frequency within the preset time length, and the sensitivity degree of each lake data in the data lake may change at any time. A periodically generated third processing instruction may therefore be responded to, so that when the heat type or the sensitivity degree of lake data changes, the lake data is promptly stored into the storage area corresponding to its current heat type and sensitivity degree; a manually issued third processing instruction may also be responded to.
In the embodiment of the present invention, the third sensitivity degree of each lake data stored in the data lake may be determined in response to the third processing instruction. The sensitivity degree may be determined by inputting the lake data into a pre-trained sensitivity determination model. Alternatively, where the data carries a sensitivity degree label, the label changes whenever the sensitivity degree of the data changes, so the sensitivity degree label of the lake data can be read directly as the third sensitivity degree. The second heat type of each lake data is determined according to the data state of each lake data and its access frequency within the preset time length; the manner of determining the second heat type is the same as that of determining the first heat type, and details are not repeated here.
S360, determining a target heat area of each lake data in the original storage area according to the second heat type of each lake data, and determining a target sensitivity area of each lake data in the target heat area according to the third sensitivity degree of each lake data.
The original storage area may be understood as the storage area, divided according to the data processing stage, in which the lake data is located before the third processing instruction is responded to; the original storage area may be the first storage area, the second storage area, or the third storage area. The target heat area can be understood as the heat area into which the lake data needs to be stored. The target sensitivity area can be understood as the sensitivity area into which the lake data needs to be stored.
S370, storing each lake data from the original sensitivity area in the original heat area in the original storage area into the target sensitivity area in the target heat area in the original storage area.
It should be noted that the determined second heat type and third sensitivity degree can only change the heat area and the sensitivity area of the lake data. The lake data therefore remains in the original storage area, and only its heat area and sensitivity area may change.
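Steps S350 to S370 can be sketched as a periodic re-tiering pass. The sketch below is illustrative only: the access-count thresholds in `heat_type` are invented, the data state is omitted for brevity, and the third sensitivity degree is assumed to be read from a label carried by the data (one of the two options described above). The `LakeItem` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LakeItem:
    name: str
    storage: str      # original storage area; never changed by re-tiering
    heat: str         # current heat area
    sensitivity: str  # current sensitivity area
    accesses: int     # access count within the preset time length
    label: str        # sensitivity degree label carried by the data


def heat_type(accesses, hot_min=100, warm_min=10):
    """Hypothetical thresholds; the source does not give concrete values."""
    if accesses >= hot_min:
        return "hot"
    if accesses >= warm_min:
        return "warm"
    return "cold"


def retier(items):
    """Move each item to its target heat/sensitivity area; return names of moved items."""
    moved = []
    for item in items:
        target_heat = heat_type(item.accesses)   # second heat type
        target_sensitivity = item.label          # third sensitivity degree
        if (target_heat, target_sensitivity) != (item.heat, item.sensitivity):
            # Only the heat area and sensitivity area change, not the storage area.
            item.heat, item.sensitivity = target_heat, target_sensitivity
            moved.append(item.name)
    return moved
```

Calling `retier` from a scheduler would correspond to the periodically generated third processing instruction; calling it by hand corresponds to the manually issued one.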
According to the technical scheme of the embodiment of the invention, the third sensitivity degree of each lake data is determined in response to a third processing instruction, and the second heat type of each lake data is determined according to the data state of each lake data stored in the data lake and the access frequency in the preset time length; a target heat area of each lake data in the original storage area is determined according to the second heat type of each lake data, and a target sensitivity area of each lake data in the target heat area is determined according to the third sensitivity degree of each lake data; and each lake data is stored from the original sensitivity area in the original heat area in the original storage area into the target sensitivity area in the target heat area in the original storage area. According to this technical scheme, when the heat type or the sensitivity degree of lake data changes, the lake data is promptly stored into the storage area corresponding to its current heat type and sensitivity degree.
In an optional technical solution, the data processing method further includes: extracting changed metadata of the lake data in response to a change instruction of the metadata of the lake data; and updating the directory corresponding to the data lake according to the changed metadata.
Wherein, the change instruction of the metadata of the lake data can be understood as an instruction indicating that the metadata of the lake data is changed.
In the embodiment of the present invention, the change instruction may be generated when a change to the metadata of the lake data is detected; for example, a Hook mechanism may be used to monitor the metadata of the lake data and generate the change instruction when a change occurs. In response to the change instruction, the changed metadata of the lake data is extracted. Because information such as the storage location, historical data, resource lookup and file records of the lake data can be obtained through the metadata, the directory corresponding to the data lake can be updated according to the changed metadata.
In the embodiment of the invention, the metadata of the lake data stored in the data lake may be extracted in advance, and, at the level of the data lake platform, a metadata management center may be added that builds the directory from the extracted metadata, so that the directory can be updated when the metadata of the lake data changes.
In the embodiment of the invention, the changed metadata of the lake data is extracted in response to the change instruction of the metadata of the lake data, and the directory corresponding to the data lake is updated according to the changed metadata, so that the directory can be built and updated at low cost.
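The hook-driven directory update can be sketched as a simple observer pattern. The classes `MetadataStore` and `Catalog`, and the shape of the metadata (a flat dictionary per data item), are assumptions made for this example; the source only states that a Hook mechanism may detect metadata changes and that the directory is updated from the changed metadata.

```python
class Catalog:
    """Hypothetical directory of the data lake: data id -> metadata entry."""

    def __init__(self):
        self.entries = {}

    def on_metadata_change(self, data_id, changed):
        # Update the directory entry using only the changed metadata.
        self.entries.setdefault(data_id, {}).update(changed)


class MetadataStore:
    """Hypothetical metadata store that calls registered hooks on every change."""

    def __init__(self):
        self._meta = {}
        self._hooks = []

    def register_hook(self, hook):
        self._hooks.append(hook)

    def update(self, data_id, **new_meta):
        current = self._meta.setdefault(data_id, {})
        # Extract only the metadata fields whose values actually changed.
        changed = {k: v for k, v in new_meta.items() if current.get(k) != v}
        if not changed:
            return
        current.update(changed)
        for hook in self._hooks:
            hook(data_id, changed)  # acts as the change instruction
```

Because only the changed fields are passed to the hook, the directory is updated incrementally rather than rebuilt.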
On the basis of the foregoing solution, in another optional technical solution, the data processing method further includes: in response to a viewing request, determining a target entry object in the directory; verifying whether the viewing party has viewing permission according to the sixth sensitivity area in which the lake data corresponding to the target entry object is stored, the identifier of the viewing party, and a preset viewing policy; and displaying the lake data corresponding to the target entry object when the viewing party has the viewing permission.
The viewing request can be understood as a request to view the lake data corresponding to the target entry object. The viewing request may consist of a single request in which the viewing party selects the target entry object in order to view the corresponding lake data; it may also consist of two parts, namely a request to view the directory and a request in which a target entry object is selected in the directory in order to view the corresponding lake data. The target entry object can be understood as the entry object in the directory that corresponds to the lake data the viewing party has selected to view. The sixth sensitivity area may be understood as the sensitivity area in which the lake data corresponding to the target entry object is stored. The identifier of the viewing party may be understood as an identifier that characterizes the identity of the viewing party, and it may be determined by an Office Automation (OA) system. The preset viewing policy may be understood as a preset policy for determining whether the viewing party has the viewing permission; for example, the preset viewing policy may specify that the viewing party does not have the viewing permission if the viewing party is identified as an insider and the sixth sensitivity area is a confidential data area.
In the embodiment of the invention, a target entry object in the directory can be determined in response to the viewing request, and the lake data corresponding to the target entry object is the lake data that the viewing party wants to view. According to the sixth sensitivity area in which the lake data corresponding to the target entry object is stored, the identifier of the viewing party, and the preset viewing policy, whether the viewing party has the viewing permission can be verified, where the viewing permission indicates that the viewing party is qualified to view the lake data corresponding to the target entry object. The lake data corresponding to the target entry object is displayed when the viewing party has the viewing permission.
In the embodiment of the invention, under the condition that the viewing party does not have the viewing authority, the related information prompting that the viewing party does not have the viewing authority can be displayed.
In the embodiment of the invention, a target entry object in the directory is determined in response to a viewing request; whether the viewing party has viewing permission is verified according to the sixth sensitivity area in which the lake data corresponding to the target entry object is stored, the identifier of the viewing party, and the preset viewing policy; and the lake data corresponding to the target entry object is displayed when the viewing party has the viewing permission. Display content can thus be provided according to the identity of the viewing party, which improves the security of lake data access.
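A minimal sketch of the permission check follows. The role names, the policy table and the denial message are hypothetical; the only rule taken from the text is that an insider may not view data stored in the confidential data area. The viewer role is assumed to have been derived from the viewing party's identifier beforehand, for example by an OA system.

```python
# Hypothetical preset viewing policy: viewer role -> sensitivity areas it may view.
VIEW_POLICY = {
    "external": {"public"},
    "insider": {"public", "internal"},
    "security_officer": {"public", "internal", "confidential"},
}

DENIED_MESSAGE = "You do not have permission to view this data."


def can_view(viewer_role, sensitivity_area, policy=VIEW_POLICY):
    """Check the viewer role against the sensitivity area holding the data."""
    return sensitivity_area in policy.get(viewer_role, set())


def view(directory, entry, viewer_role):
    """Return the lake data for `entry`, or a denial message if not permitted.

    `directory` maps an entry object to (sensitivity_area, lake_data).
    """
    sensitivity_area, lake_data = directory[entry]
    if can_view(viewer_role, sensitivity_area):
        return lake_data
    return DENIED_MESSAGE
```

An unknown role maps to an empty set of viewable areas, so the check fails closed.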
Fig. 5 is a flowchart of an alternative example in a data processing method provided in the third embodiment of the present invention, and in order to better understand the technical solution of the third embodiment of the present invention, an alternative example is provided here. For example, referring to fig. 5, in response to a data storage request, initial data to be stored is obtained; determining the heat type of initial data to be stored, and determining an initial heat area in a first storage area in a data lake according to the heat type; determining the sensitivity degree of initial data to be stored, and determining an initial sensitivity area in the initial heat area according to the sensitivity degree; and storing the initial data to be stored into the initial sensitivity area.
In response to a first processing instruction, determining first data to be stored from the first storage area, and determining the heat type and the sensitivity degree of the first data to be stored; converting the first data to be stored into structured data, judging whether the structured data is confidential data, and, when the structured data is confidential data, desensitizing the structured data and taking the desensitized result as the structured data; determining a second heat area and a second sensitivity area according to the heat type and the sensitivity degree of the first data to be stored; and storing the structured data into the second sensitivity area in the second heat area.
Determining the sensitivity and heat type of each lake data in response to the third processing instruction; judging whether the sensitivity and the heat type of each lake data are changed compared with the original heat type and the original sensitivity; if the change occurs, the changed lake data is stored in the target sensitivity area in the target heat area in the original storage area corresponding to the changed sensitivity degree and heat type from the original sensitivity area in the original heat area in the original storage area.
Extracting changed metadata of the lake data in response to a change instruction of the metadata of the lake data; and updating the directory corresponding to the data lake according to the changed metadata.
Example four
Fig. 6 is a block diagram of a data processing apparatus according to a fourth embodiment of the present invention, which is configured to execute the data processing method according to any of the above embodiments. The device and the data processing method of the embodiments belong to the same inventive concept, and details which are not described in detail in the embodiments of the data processing device may refer to the embodiments of the data processing method. Referring to fig. 6, the apparatus may specifically include: a to-be-stored data acquisition module 410, an initial heat area determination module 420, an initial sensitivity area determination module 430, and an initial to-be-stored data storage module 440.
The to-be-stored data obtaining module 410 is configured to, in response to a data storage request, obtain initial to-be-stored data;
an initial heat region determining module 420, configured to determine a first heat type of initial data to be stored, and determine an initial heat region in a first storage region in a data lake according to the first heat type;
an initial sensitivity region determining module 430, configured to determine a first sensitivity degree of initial data to be stored, and determine an initial sensitivity region in the initial heat region according to the first sensitivity degree;
an initial data to be stored storage module 440, configured to store the initial data to be stored into the initial sensitivity region.
Optionally, the initial sensitivity region determining module 430 includes:
a sensitivity degree determining unit, configured to input the initial data to be stored into a pre-trained sensitivity determining model and determine the first sensitivity degree of the initial data to be stored according to the output result of the sensitivity determining model.
Optionally, the data lake further comprises a second storage area; a data processing apparatus, further comprising:
a first sensitivity region determining module, configured to determine, in response to the first processing instruction, first data to be stored from the first storage region, and determine a first heat region and a first sensitivity region corresponding to the first data to be stored in the first storage region, wherein the first data to be stored is stored in the first sensitivity region in the first heat region;
a second sensitivity region determining module, configured to determine a second heat region corresponding to the first heat region from the second storage region, and determine a second sensitivity region corresponding to the first sensitivity region from the second heat region;
a structured data storage module, configured to determine the structured data of the first data to be stored and store the structured data into the second sensitivity region.
On the basis of the above scheme, optionally, the data processing apparatus further includes:
a structured data desensitization module, configured to desensitize the structured data, before the structured data is stored into the second sensitivity region, when the second sensitivity degree of the first data to be stored corresponding to the structured data is a preset degree.
On the basis of the above scheme, optionally, the data lake further comprises a third storage area; a data processing apparatus, further comprising:
a third sensitivity region determining module, configured to, after the structured data is stored into the second sensitivity region, determine second data to be stored from the second storage region in response to a second processing instruction, and determine a third heat region and a third sensitivity region corresponding to the second data to be stored in the second storage region, wherein the second data to be stored is stored in the third sensitivity region in the third heat region;
a fourth sensitivity region determining module, configured to determine a fourth heat region corresponding to the third heat region from the third storage region, and determine a fourth sensitivity region corresponding to the third sensitivity region from the fourth heat region;
a to-be-stored data storage module, configured to store the second data to be stored into the fourth sensitivity region.
On the basis of the above scheme, optionally, the data processing apparatus further includes:
a second heat type determining module, configured to determine, in response to a third processing instruction, a third sensitivity degree of each lake data stored in the data lake, and determine a second heat type of each lake data according to the data state of each lake data and the access frequency within a preset time length;
a target sensitivity region determining module, configured to determine a target heat region of each lake data in the original storage region according to the second heat type of each lake data, and determine a target sensitivity region of each lake data in the target heat region according to the third sensitivity degree of each lake data;
a lake data storage module, configured to store each lake data from an original sensitivity region in an original heat region in the original storage region into the target sensitivity region in the target heat region in the original storage region.
On the basis of the above scheme, optionally, the data processing apparatus further includes:
a metadata extraction module, configured to extract the changed metadata of the lake data in response to a change instruction of the metadata of the lake data;
a directory updating module, configured to update the directory corresponding to the data lake according to the changed metadata.
On the basis of the above scheme, optionally, the data processing apparatus further includes:
a target entry object determining module, configured to determine a target entry object in the directory in response to a viewing request;
a viewing permission verification module, configured to verify whether the viewing party has viewing permission according to the sixth sensitivity region in which the lake data corresponding to the target entry object is stored, the identifier of the viewing party, and a preset viewing policy;
a lake data display module, configured to display the lake data corresponding to the target entry object when the viewing party has the viewing permission.
The data processing device provided by the fourth embodiment of the invention acquires the initial data to be stored in response to the data storage request through the to-be-stored data acquisition module; determines a first heat type of the initial data to be stored through the initial heat area determining module, and determines an initial heat area in a first storage area in a data lake according to the first heat type; determines a first sensitivity degree of the initial data to be stored through the initial sensitivity area determining module, and determines an initial sensitivity area in the initial heat area according to the first sensitivity degree; and stores the initial data to be stored into the initial sensitivity area through the initial to-be-stored data storage module. The device stores data into storage areas divided according to multiple dimensions such as data processing stage, heat type and sensitivity degree, so that the data is stored efficiently by area and is convenient to manage and use.
The data processing device provided by the embodiment of the invention can execute the data processing method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
It should be noted that, in the embodiment of the data processing apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
EXAMPLE five
FIG. 7 illustrates a block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 7, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The processor 11 performs the various methods and processes described above, such as the data processing method.
In some embodiments, the data processing method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the high management difficulty and weak service scalability of traditional physical hosts and VPS services.
It should be understood that the various forms of flow shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in a different order, and no limitation is imposed herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. A method of data processing, the method comprising:
responding to a data storage request, and acquiring initial data to be stored;
determining a first heat type of the initial data to be stored, and determining an initial heat area in a first storage area in a data lake according to the first heat type;
determining a first sensitivity degree of the initial data to be stored, and determining an initial sensitivity area in the initial heat area according to the first sensitivity degree;
and storing the initial data to be stored into the initial sensitivity area.
2. The method of claim 1, wherein determining the first sensitivity degree of the initial data to be stored comprises:
inputting the initial data to be stored into a pre-trained sensitivity determination model, and determining a first sensitivity degree of the initial data to be stored according to an output result of the sensitivity determination model.
3. The method of claim 1, wherein the data lake further comprises a second storage area;
the method further comprises the following steps:
in response to a first processing instruction, determining first data to be stored from the first storage area, and determining a corresponding first heat area and a first sensitivity area of the first data to be stored in the first storage area, wherein the first data to be stored is stored in the first sensitivity area of the first heat area;
determining a second heat area corresponding to the first heat area from the second storage area, and determining a second sensitivity area corresponding to the first sensitivity area from the second heat area;
and determining the structured data of the first data to be stored, and storing the structured data into the second sensitivity area.
4. The method of claim 3, further comprising, prior to said storing said structured data in said second sensitivity area:
and under the condition that the second sensitivity degree of the first data to be stored corresponding to the structured data is a preset degree, desensitizing the structured data.
5. The method of claim 3, wherein the data lake further comprises a third storage area;
after the storing the structured data into the second sensitivity area, further comprising:
in response to a second processing instruction, determining second data to be stored from the second storage area, and determining a corresponding third heat area and a third sensitivity area of the second data to be stored in the second storage area, wherein the second data to be stored is stored in the third sensitivity area;
determining a fourth heat area corresponding to the third heat area from the third storage area, and determining a fourth sensitivity area corresponding to the third sensitivity area from the fourth heat area;
and storing the second data to be stored into the fourth sensitivity area.
6. The method according to any one of claims 1-5, further comprising:
in response to a third processing instruction, determining a third sensitivity degree of each lake data stored in the data lake, and determining a second heat type of each lake data according to the data state of each lake data and the access frequency within a preset time length;
determining a target heat area of each lake data in an original storage area according to the second heat type of each lake data, and determining a target sensitivity area of each lake data in the target heat area according to the third sensitivity degree of each lake data;
and moving each lake data from the original sensitivity area in the original heat area of the original storage area to the target sensitivity area in the target heat area of the original storage area.
7. The method of claim 6, further comprising:
extracting changed metadata of the lake data in response to a change instruction of the metadata of the lake data;
and updating the directory corresponding to the data lake according to the changed metadata.
8. The method of claim 7, further comprising:
in response to a viewing request, determining a target entry object in the directory;
verifying whether the viewer has viewing permission according to a sixth sensitivity area in which the lake data corresponding to the target entry object is stored, an identifier of the viewer, and a preset viewing policy;
and displaying lake data corresponding to the target entry object under the condition that the viewer has the viewing permission.
9. A data processing apparatus, comprising:
the data to be stored acquisition module is used for responding to the data storage request and acquiring initial data to be stored;
the initial heat area determining module is used for determining a first heat type of the initial data to be stored and determining an initial heat area in a first storage area in a data lake according to the first heat type;
the initial sensitivity region determining module is used for determining a first sensitivity degree of the initial data to be stored and determining an initial sensitivity region in the initial heat region according to the first sensitivity degree;
and the initial data to be stored storage module is used for storing the initial data to be stored into the initial sensitivity area.
10. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to cause the at least one processor to perform the data processing method of any one of claims 1-8.
11. A computer-readable storage medium, having stored thereon computer instructions for causing a processor, when executing the computer instructions, to implement a data processing method according to any one of claims 1-8.
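
As a non-limiting illustration, the two-level placement recited in claim 1 (heat area first, then sensitivity area within it) and the re-placement recited in claim 6 might be sketched as follows. Everything named here is an assumption made for illustration and does not appear in the claims: the labels `hot`/`warm`/`cold` and `low`/`medium`/`high`, the access-count thresholds, the field names, and the rule-based `sensitivity_from_fields`, which merely stands in for the pre-trained sensitivity determination model of claim 2. The sketch also derives the heat type from access frequency alone, whereas claim 6 additionally considers the data state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical labels; the claims do not enumerate heat types or sensitivity degrees.
HEAT_TYPES = ("hot", "warm", "cold")
SENSITIVITY_DEGREES = ("low", "medium", "high")


@dataclass
class StorageArea:
    """One storage area of the lake: heat area -> sensitivity area -> records."""
    areas: Dict[str, Dict[str, List[dict]]] = field(
        default_factory=lambda: {
            h: {s: [] for s in SENSITIVITY_DEGREES} for h in HEAT_TYPES
        }
    )

    def store(self, record: dict, heat: str, sensitivity: str) -> Tuple[str, str]:
        """Append the record to the sensitivity area inside the heat area."""
        self.areas[heat][sensitivity].append(record)
        return heat, sensitivity


def heat_from_access_count(count: int) -> str:
    """Map an access count to a heat type (illustrative thresholds only)."""
    if count >= 100:
        return "hot"
    if count >= 10:
        return "warm"
    return "cold"


def sensitivity_from_fields(record: dict) -> str:
    """Rule-based stand-in for a pre-trained sensitivity determination model."""
    keys = set(record)
    if keys & {"id_number", "diagnosis"}:
        return "high"
    if keys & {"name", "phone"}:
        return "medium"
    return "low"


def store_initial_data(area: StorageArea, record: dict, access_count: int) -> Tuple[str, str]:
    """Claim 1 sketch: pick the heat area, then the sensitivity area, then store."""
    heat = heat_from_access_count(access_count)
    sensitivity = sensitivity_from_fields(record)
    return area.store(record, heat, sensitivity)


def retier(area: StorageArea, access_count_of: Callable[[dict], int]) -> None:
    """Claim 6 sketch: recompute both labels for every record and move it."""
    records = [r for by_s in area.areas.values() for rs in by_s.values() for r in rs]
    for by_s in area.areas.values():
        for rs in by_s.values():
            rs.clear()
    for r in records:
        area.store(r, heat_from_access_count(access_count_of(r)), sensitivity_from_fields(r))
```

Under these assumptions, `store_initial_data(StorageArea(), {"name": "A", "diagnosis": "x"}, 250)` places the record in the `high` sensitivity area of the `hot` heat area, and a later `retier` call with a low access count would move it to the `cold` heat area while keeping it in the `high` sensitivity area.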
CN202211431013.5A 2022-11-15 2022-11-15 Data processing method and device, electronic equipment and storage medium Pending CN115686376A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211431013.5A CN115686376A (en) 2022-11-15 2022-11-15 Data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211431013.5A CN115686376A (en) 2022-11-15 2022-11-15 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115686376A true CN115686376A (en) 2023-02-03

Family

ID=85051432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211431013.5A Pending CN115686376A (en) 2022-11-15 2022-11-15 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115686376A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination