CN115718571A

CN115718571A - Data management method and device based on multi-dimensional features

Info

Publication number: CN115718571A
Application number: CN202211475101.5A
Authority: CN
Inventors: 陈志标
Original assignee: Shenzhen Institute of Computing Sciences
Current assignee: Shenzhen Institute of Computing Sciences
Priority date: 2022-11-23
Filing date: 2022-11-23
Publication date: 2023-02-28
Anticipated expiration: 2042-11-23
Also published as: CN115718571B; WO2024108639A1

Abstract

The application provides a data management method and device based on multi-dimensional features. The method involves a storage end, a routing end, a management end and an execution end, and comprises the following steps: the management end receives a data access request sent by the routing end and, according to the data access request, generates initial data feature statistical information and a trigger condition corresponding to the storage end; the management end determines a target storage end according to the trigger condition, generates a task execution request according to the target storage end and the trigger condition, sends the task execution request to the execution end, and receives a first processing instruction sent by the execution end. The user's business requirements are thereby met at relatively optimal cost with a simple one-stop solution; optimal performance and cost are provided to the user; and the solution is transparent to upper-layer users, who do not need to deploy multiple systems or add external tools to complete the life cycle management of data.

Description

Data management method and device based on multi-dimensional features

Technical Field

The present application relates to the field of data management, and in particular, to a data management method and apparatus based on multidimensional features.

Background

With the development of the mobile internet and enterprise digital transformation, enterprises need to manage ever larger volumes of data. The main requirements are real-time awareness of changed data, agile data analysis for the business, and longer data retention periods.

Data has a life cycle: it shows different characteristics at different stages, and the service load characteristics that must be supported differ accordingly. To manage their data, enterprises generally adopt multiple systems to manage data of different periods and different service requirements, and move data between these systems with tools for data integration, data conversion, data migration and the like. The main problems of such a data management scheme are: 1. the same data is stored redundantly in multiple systems, so the cost is high and the global consistency of the data is difficult to ensure; 2. data migration between systems requires third-party tools and professional technical personnel, so the threshold is high, data flow latency is high, and abnormal conditions are difficult to handle efficiently.

Since data has become a factor of production, enterprises have begun to recognize its value and hope to store the various data generated in their production processes. Managing this massive data at the lowest cost while providing real-time data access is a key capability for an enterprise undergoing digital transformation, and bears on whether the enterprise can remain undefeated in a fiercely competitive environment. To reduce the cost and threshold of enterprise data management, some data management systems offering fused management have begun to appear, but all of the existing schemes consider only one dimension and cannot handle multiple dimensions.

Disclosure of Invention

In view of the problems, the present application is proposed to provide a multidimensional feature-based data management method and apparatus that overcome or at least partially solve the problems, including:

a data management method based on multi-dimensional characteristics relates to a storage end, a routing end, a management end and an execution end, and comprises the following steps:

the management terminal receives a data access request sent by the routing terminal and generates initial data characteristic statistical information and trigger conditions corresponding to the storage terminal according to the data access request;

the management terminal determines a target storage terminal according to the trigger condition, generates the task execution request according to the target storage terminal and the trigger condition, sends the task execution request to the execution terminal, and receives a first processing instruction sent by the execution terminal;

and the management terminal updates or deletes the target data characteristic statistical information corresponding to the target storage terminal according to the first processing instruction.

Further, the step of generating initial data feature statistical information and trigger conditions corresponding to the storage end according to the data access request includes:

the management terminal determines access data in the storage terminal according to the data access request;

the management terminal determines data granularity information corresponding to the access data in the storage terminal according to the access data;

and the management terminal generates initial data characteristic statistical information and the triggering condition corresponding to the storage terminal according to the data granularity information.

Further, the step of generating the task execution request according to the target storage terminal and the trigger condition and sending the task execution request to the execution terminal includes:

the management terminal acquires all data in the storage terminal and determines whether target data meeting the trigger condition exists or not;

when all the data in the storage end have target data which correspondingly meet the triggering condition, the management end generates the task execution request according to the target data;

and the management terminal sends the task execution request to the execution terminal.

The embodiment of the invention also provides a data management method based on multi-dimensional characteristics, which relates to a storage end, a routing end, a management end and an execution end, and comprises the following steps:

when the execution end receives a task execution request sent by the management end, the execution end determines a task type corresponding to the task execution request according to the task execution request;

the execution end acquires target data in the storage end according to a trigger condition in the task execution request;

the execution end performs data processing on the target data according to the task type and a target storage end, wherein the task type comprises migration or deletion;

the execution end generates a first processing instruction responding to the management end according to the target data, the target storage end and the task type, and sends the first processing instruction to the management end; the first processing instruction is used for updating or deleting target data characteristic statistical information corresponding to the target storage terminal at the management terminal;

the execution end generates a second processing instruction responding to the routing end according to the target data, the target storage end and the task type, and sends the second processing instruction to the routing end; the second processing instruction is used by the routing end to add or delete a target access path from the target data to the target storage end.

Further, the step of performing data processing on the target data according to the task type and the target storage end, wherein the task type comprises migration or deletion, includes:

when the task type is the migration, the execution end migrates the target data from the storage end to the target storage end according to the task type;

or,

and when the task type is the deletion, the execution end deletes the target data from the target storage end according to the task type, wherein the target storage end is an initial storage end corresponding to the target data.

the routing end determines an access path in response to data access information for upper layer data;

the routing terminal generates the data access request according to the access path and sends the data access request to the management terminal;

and the routing end adds or deletes a target access path from the target storage end according to a second processing instruction sent by the execution end.
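The routing-end steps above can be pictured with a minimal sketch. It assumes, purely for illustration, that an access path is a mapping from a data unit identifier to the name of the storage end holding it; the `Router` class, its method names and the instruction fields (`action`, `target_store`, `units`) are hypothetical and do not come from the application.

```python
class Router:
    """Toy routing end: keeps one access path per data unit (hypothetical model)."""

    def __init__(self):
        self.paths = {}  # data unit id -> name of the storage end that holds it

    def resolve(self, unit):
        # Determine the access path for a piece of upper-layer data.
        return self.paths.get(unit)

    def build_access_request(self, unit, operation):
        # Data access request that would be sent on to the management end.
        return {"unit": unit, "operation": operation, "path": self.resolve(unit)}

    def apply_second_instruction(self, instruction):
        # Add or delete target access paths as told by the execution end.
        target = instruction["target_store"]
        for unit in instruction["units"]:
            if instruction["action"] == "add":
                self.paths[unit] = target
            elif instruction["action"] == "delete":
                # Only drop the path if it still points at the target storage end.
                if self.paths.get(unit) == target:
                    del self.paths[unit]
            else:
                raise ValueError(f"unknown action: {instruction['action']}")
```

In this sketch a later access to a migrated unit is routed to the new storage end without the caller noticing, which is one way to read the transparency to upper-layer users that the application describes.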

The embodiment of the invention also provides a data management device based on multi-dimensional characteristics, which relates to a storage end, a routing end, a management end and an execution end, and comprises the following components:

the receiving module is used for the management terminal to receive the data access request sent by the routing terminal and generate initial data characteristic statistical information and triggering conditions corresponding to the storage terminal according to the data access request;

the first generation module is used for determining a target storage end by the management end according to the trigger condition, generating a task execution request according to the target storage end and the trigger condition, sending the task execution request to the execution end, and receiving a first processing instruction sent by the execution end;

and the first data processing module is used for updating or deleting the target data characteristic statistical information corresponding to the target storage terminal by the management terminal according to the first processing instruction.

The embodiment of the invention also provides a data management device based on multi-dimensional characteristics, which relates to a storage end, a routing end, a management end and an execution end, and the device comprises:

the first determining module is used for determining the task type corresponding to the task execution request by the execution end according to the task execution request when the execution end receives the task execution request sent by the management end;

the acquisition module is used for acquiring target data in the storage end by the execution end according to a trigger condition in the task execution request;

the second data processing module is used for the execution end to perform data processing on the target data according to the task type and the target storage end, wherein the task type comprises migration or deletion;

the fourth data processing module is used for generating a second processing instruction responding to the routing end by the execution end according to the target data, the target storage end and the task type, and sending the second processing instruction to the routing end; the second processing instruction is used by the routing end to add or delete a target access path from the target data to the target storage end.

a second determining module, configured for the routing end to determine an access path in response to data access information for upper-layer data;

the second generating module is used for generating the data access request by the routing end according to the access path and sending the data access request to the management end;

and the fifth data processing module is used for adding or deleting a target access path from the target storage end by the routing end according to the second processing instruction sent by the execution end.

An embodiment of the present invention further provides a computer device, which includes a processor, a memory, and a computer program stored on the memory and capable of running on the processor, and when the computer program is executed by the processor, the computer program implements the steps of the data management method based on the multidimensional feature.

An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the data management method based on multidimensional features as described above.

The application has the following advantages:

In the embodiments of the application, to address the prior-art problems that only one dimension is considered and multiple dimensions cannot be handled, that data is stored redundantly in multiple systems at high cost with global consistency difficult to ensure, and that data flow latency is high and abnormal conditions are difficult to handle efficiently, the application provides a data management method based on multi-dimensional features, specifically: the management end receives a data access request sent by the routing end and, according to the data access request, generates initial data feature statistical information and a trigger condition corresponding to the storage end; the management end determines a target storage end according to the trigger condition, generates a task execution request according to the target storage end and the trigger condition, sends the task execution request to the execution end, and receives a first processing instruction sent by the execution end; and the management end updates or deletes the target data feature statistical information corresponding to the target storage end according to the first processing instruction.
Generating the task execution request according to the target storage end and the trigger condition solves the problems that multiple dimensions cannot be handled, data consistency cannot be guaranteed, cost is high and data processing efficiency is low. Data is automatically migrated and managed according to multiple different dimensions, such as data and business dimensions, so that the user's business requirements are met at relatively optimal cost with a simple one-stop solution; the conflict between read optimization and write optimization and the conflict between data temperature (hot versus cold) and storage cost are resolved, providing users with optimal performance and cost; and the solution is transparent to upper-layer users, who do not need to deploy multiple systems or add external tools to complete the life cycle management of data.

Drawings

In order to more clearly illustrate the technical solutions of the present application, the drawings needed to be used in the description of the present application will be briefly introduced below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive labor.

FIG. 1 is a flowchart illustrating steps of a method for managing data based on multidimensional features according to an embodiment of the present application;

FIG. 2 is a flowchart illustrating steps of a method for managing data based on multidimensional features according to an embodiment of the present application;

FIG. 3 is a flowchart illustrating steps of a method for managing data based on multidimensional features according to an embodiment of the present application;

FIG. 4 is a block diagram of a data management apparatus based on multidimensional features according to an embodiment of the present application;

FIG. 5 is a block diagram illustrating an exemplary embodiment of a data management apparatus based on multidimensional features;

FIG. 6 is a block diagram illustrating an exemplary embodiment of a data management apparatus based on multidimensional features;

FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

By analyzing the prior art, the inventor found the following. Data has a life cycle; it shows different characteristics at different stages, and the service load characteristics that must be supported differ accordingly. To manage their data, enterprises generally adopt multiple systems to manage data of different periods and different service requirements, and move data between these systems with tools for data integration, data conversion, data migration and the like. For example, data is usually generated in a production system, where transactions and low-latency point queries and point updates must be supported, typically by a traditional OLTP (On-Line Transaction Processing) database. To serve users' queries of historical details, the data is then integrated in real time into a historical archive database, which generally stores data clustered by user so that data can be retrieved along the user dimension. Finally, to support reporting, business intelligence and the like, the data is converted and stored in a data warehouse through tools such as ETL (Extract-Transform-Load, the process of extracting data from a source, transforming it, and loading it into a destination). The main problems of such a data management scheme are:

1. the same data is stored in a plurality of systems in a redundant mode, the cost is high, and the global consistency of the data is difficult to guarantee.

2. Data migration among systems needs a user to implement the data migration by using a third-party tool and the like, needs professional technicians to implement the data migration, has a high threshold, causes high data flow delay, and is difficult to efficiently process under abnormal conditions.

The first prior art scheme is:

C-Store (Mike Stonebraker et al. C-Store: A Column-oriented DBMS. In VLDB, pages 553–564, 2005.)

Storage optimized for writes and storage optimized for queries are mutually exclusive in design and implementation: write-oriented storage generally adopts a row store, while query-oriented storage generally adopts a column store (a column-based database system). The C-Store paper first proposed a hybrid architecture to resolve this, in which two modules are implemented: one module, the Writeable Store (WS), handles fast writes, and the other, the Read-Optimized Store (RS), provides efficient queries, while a Tuple Mover runs between them to continuously migrate data from WS to RS.

This method has been widely used in many big data systems, including commercial systems such as SAP HANA and Vertica. The disadvantages of this solution are:

Only one dimension can be used: the mode is fixed to distinguishing data by reading and writing, so only this specific problem is solved.

The Tuple Mover merges data from WS to RS periodically, which may cause data to be merged back and forth frequently if the data changes often.
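The WS/RS split can be illustrated with a toy sketch. It is not taken from the C-Store paper: the `HybridStore` class and its method names are hypothetical, WS is modeled as a list of row dictionaries and RS as one list per column, and every row is assumed to carry the same columns.

```python
class HybridStore:
    """Toy write-optimized store (WS) plus read-optimized store (RS)."""

    def __init__(self):
        self.ws = []   # row-oriented write buffer: list of row dicts
        self.rs = {}   # column-oriented store: column name -> list of values

    def insert(self, row):
        # Writes land in WS only, which keeps them cheap.
        self.ws.append(dict(row))

    def tuple_mover(self):
        # Periodic batch move: append every WS row to the RS columns, then empty WS.
        moved = len(self.ws)
        for row in self.ws:
            for col, value in row.items():
                self.rs.setdefault(col, []).append(value)
        self.ws.clear()
        return moved

    def column_sum(self, col):
        # Analytical read: scan the RS column, then the rows still waiting in WS.
        return sum(self.rs.get(col, [])) + sum(r[col] for r in self.ws if col in r)
```

The sketch shows why the scheme is tied to a single read/write dimension: the only rule for moving data is "everything in WS goes to RS on the next run", regardless of how hot the data is or how it is queried.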

The second prior art scheme is:

Tiered storage of cold and hot data: a tiered storage method based on cold and hot classification of time series data (publication number: CN108268217A; publication (announcement) date: 30/04/2021). The patent divides data into real-time data, daily/weekly data, historical data and cold data by a specific time rule, establishes a real-time data layer, a daily/weekly data layer, a historical data layer and a cold data layer, and stores the data respectively in memory, SSD, HDD and the distributed storage system HDFS. Real-time data is data within 24 hours, daily/weekly data is data within two weeks, historical data is data within two years, and cold data is data within 10 years. Data is periodically batch-loaded from each upper storage layer to the layer below it through a batch transfer service.

The disadvantages of this solution are: the data can be simply divided and migrated only according to the time window of the data, and the data migration granularity is coarse; only the data appending and the query are supported, and the data modification is not supported.
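The fixed time rule of this second scheme can be written down in a few lines, which makes its coarseness easy to see. The sketch below is only an illustration: the function name and tier labels are hypothetical, a year is assumed to be 365 days, and the boundaries are assumed to be inclusive, none of which is specified in the passage above.

```python
from datetime import timedelta

# (upper age bound, tier label, storage medium), in the order listed in the cited scheme.
TIERS = [
    (timedelta(hours=24), "real_time", "memory"),
    (timedelta(weeks=2), "daily_weekly", "SSD"),
    (timedelta(days=2 * 365), "historical", "HDD"),
    (timedelta(days=10 * 365), "cold", "HDFS"),
]

def classify_by_age(age):
    """Return (tier, medium) for data of the given age, using only the time window."""
    for bound, tier, medium in TIERS:
        if age <= bound:
            return tier, medium
    return None  # older than 10 years: the passage does not say what happens
```

Age is the only input, so data that is old but still frequently accessed is pushed down to slower storage anyway.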

In summary, the prior art considers only one dimension and cannot handle multiple dimensions; data is stored redundantly in multiple systems, so the cost is high and the global consistency of the data is difficult to ensure; and data flow latency is high and abnormal conditions are difficult to handle efficiently. The data management method based on multi-dimensional features provided by the invention solves these problems, as detailed below.

Referring to FIG. 1, a flowchart of the steps of a data management method based on multi-dimensional features according to an embodiment of the present application is shown:

S110, the management terminal receives a data access request sent by the routing terminal and generates initial data characteristic statistical information and trigger conditions corresponding to the storage terminal according to the data access request;

S120, the management terminal determines a target storage terminal according to the trigger condition, generates the task execution request according to the target storage terminal and the trigger condition, sends the task execution request to the execution terminal, and receives a first processing instruction sent by the execution terminal;

S130, the management terminal updates or deletes the target data characteristic statistical information corresponding to the target storage terminal according to the first processing instruction.
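As a rough illustration of step S120, the following sketch builds a task execution request from per-unit statistics and one rule. Everything named here is hypothetical: the function, the request fields, the use of "days since last access" as the tracked statistic, and the rule format are assumptions made for the example, not details given by the application.

```python
def build_task_request(source_store, unit_stats, rule):
    """Return a task execution request, or None if no data meets the trigger condition.

    unit_stats: granularity unit id -> days since last access (assumed statistic)
    rule: {"threshold_days": int, "task_type": str, "target_store": str} (assumed format)
    """
    # Scan all tracked units and keep those that meet the trigger condition.
    targets = sorted(u for u, idle in unit_stats.items() if idle >= rule["threshold_days"])
    if not targets:
        return None
    return {
        "task_type": rule["task_type"],
        "source_store": source_store,
        "target_store": rule["target_store"],  # target storage end chosen by the rule
        "units": targets,
    }
```

Here the target storage end is simply read from the rule; how the management end actually chooses it is not spelled out at this point in the text.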

Next, a data management method based on multi-dimensional features in the present exemplary embodiment will be further described.

In an embodiment of the invention, more than 2 storage engines are included in the storage end, and each storage engine provides data management capability oriented to data feature optimization of different dimensions.

In an embodiment of the present invention, the management end manages and defines the data management capability and the lifecycle management policy of each storage engine (that is, each storage end), tracks the data optimization feature information (that is, the data feature statistical information) of each storage engine, and triggers a data management task, such as data migration, when a defined rule (that is, a trigger condition) is satisfied.

In an embodiment of the present invention, the management end receives a data access request sent by the routing end and, according to the data access request, generates initial data feature statistical information and a trigger condition corresponding to the storage end. The initial data feature statistical information includes initial dimension information, initial optimization feature information and initial granularity information corresponding to the storage end. During execution, the data in the data access request sent by the routing end is mapped to the data granularity defined by the engine, and that data granularity, namely the granularity information, is updated. Each storage end has its own corresponding data feature statistical information.

In an embodiment of the present invention, the data access request includes, but is not limited to, data query, data access, data addition, and data deletion.

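In an embodiment of the present invention, the step of generating initial data feature statistical information and a trigger condition corresponding to the storage end according to the data access request includes:

the management end determines the access data in the storage end according to the data access request;

the management end determines, according to the access data, the data granularity information corresponding to the access data in the storage end;

and the management end generates the initial data feature statistical information and the trigger condition corresponding to the storage end according to the data granularity information.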

In a specific embodiment, during execution, the data to be accessed (that is, the access data) in a data access request sent by the routing end is mapped to the data granularity (that is, the granularity information) defined by the storage engine (that is, the storage end), and that data granularity is updated.
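The mapping from accessed data to a granularity unit might look like the sketch below. It assumes page/block granularity with a fixed number of rows per page and an access counter per page; the constant, the function names and the choice of a counter are all illustrative assumptions.

```python
from collections import Counter

ROWS_PER_PAGE = 100  # hypothetical page size, not taken from the application

def granularity_unit(row_id, rows_per_page=ROWS_PER_PAGE):
    # Map an accessed row to the page/block it belongs to.
    return row_id // rows_per_page

def record_access(stats, row_ids):
    # Update the per-page access counters for every row touched by a request.
    for row_id in row_ids:
        stats[granularity_unit(row_id)] += 1
    return stats
```

For example, `record_access(Counter(), [0, 5, 99, 100, 250])` counts three accesses on page 0 and one each on pages 1 and 2.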

In a specific embodiment, when the data access request adds new data, the initial data feature statistical information corresponding to the storage end that stores the new data can be generated from the new data.

In one embodiment, the dimension information classifies and defines the features of the data, including but not limited to:

dimension one: the data change dimension, under which data is defined as dynamic data or static data according to how the data changes;

dimension two: the data access frequency dimension, under which data is defined as hot data, warm data or cold data according to how frequently the data is accessed;

dimension three: the data query business dimension, under which data is defined as retrieval data or analysis data according to the business characteristics of the data.

In one embodiment, the optimization feature information refers to the optimization features of the engine, including but not limited to:

performance optimization based on hot data characteristics; cost optimization based on cold data characteristics; optimization based on dynamic data characteristics; optimization based on static data characteristics; optimization based on retrieval access characteristics.

In one embodiment, the granularity information refers to the granularity of the unit over which data statistics are collected, including but not limited to:

per row of the data table; per page/block of the data table; per range interval of the data table, such as a time range; per data shard of the data table; and per partition of the data table.

In an embodiment of the present invention, when the data access request adds new data, the initial data feature statistical information corresponding to the storage end that stores the new data can be generated from the new data. The initial data feature statistical information includes initial dimension information, initial optimization feature information and initial granularity information, and defines the feature statistics management structure of the storage end that stores the new data; that is, each storage end has its own corresponding data feature statistical information.
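One possible shape for the per-storage-end statistics structure is sketched below. The three enumerated dimensions and five granularities follow the lists in the text, but the class names, field names, the free-text `optimized_for` field and the `access_counts` map are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum

class Dimension(Enum):
    CHANGE = "dynamic/static"
    ACCESS_FREQUENCY = "hot/warm/cold"
    QUERY_BUSINESS = "retrieval/analysis"

class Granularity(Enum):
    ROW = "row"
    PAGE = "page/block"
    RANGE = "range interval"
    SHARD = "data shard"
    PARTITION = "partition"

@dataclass
class FeatureStats:
    storage_end: str          # which storage engine these statistics belong to
    dimension: Dimension      # dimension information
    optimized_for: str        # optimization feature, e.g. "cost, cold data"
    granularity: Granularity  # unit over which statistics are collected
    access_counts: dict = field(default_factory=dict)  # unit id -> access count

def init_stats_for_new_data(storage_end, dimension, optimized_for, granularity):
    # Initial statistics created when new data first lands in a storage end.
    return FeatureStats(storage_end, dimension, optimized_for, granularity)
```

A call such as `init_stats_for_new_data("row_engine", Dimension.ACCESS_FREQUENCY, "performance, hot data", Granularity.PAGE)` yields an empty structure that later accesses can fill in.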

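In an embodiment of the present invention, the step of generating the task execution request according to the target storage end and the trigger condition, and sending the task execution request to the execution end, includes:

the management end acquires all data in the storage end and determines whether target data meeting the trigger condition exists;

when target data meeting the trigger condition exists among all the data in the storage end, the management end generates the task execution request according to the target data;

and the management end sends the task execution request to the execution end.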

In an embodiment of the present invention, the trigger condition of the storage end is used to determine whether any of the data in the storage end satisfies it. The trigger condition is either periodic or an external event. A periodic trigger fires on a time period, for example every minute, every hour, at a specific time every day, or monthly. An external event generally means that a change in the statistical information caused by upper-layer data access reaches a certain threshold; for example, when the statistics show that certain data has not been accessed for two consecutive weeks, migration of that data to a certain engine is triggered.
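The two kinds of trigger condition can be sketched as two small predicates. The function names and the representation of statistics as a "last access time per unit" map are assumptions for the example; the two-week idle period is the one mentioned in the text.

```python
from datetime import datetime, timedelta

def periodic_due(last_run, now, period):
    # Periodic trigger: fire once at least one full period has elapsed since the last run.
    return now - last_run >= period

def idle_units(last_access, now, idle_for=timedelta(weeks=2)):
    # External-event trigger: units whose statistics show no access for `idle_for`.
    return sorted(u for u, t in last_access.items() if now - t >= idle_for)
```

With `now = datetime(2023, 1, 31)`, a unit last accessed on 1 January and one last accessed on 17 January both qualify as idle for two weeks, while one accessed on 25 January does not.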

In an embodiment of the present invention, the management end updates or deletes the data feature statistical information corresponding to the target storage end according to the first processing instruction returned by the execution end; whether it updates or deletes depends on the task type corresponding to the first processing instruction.

In a specific embodiment, when the task type is migration, the data feature statistical information is updated in the target storage end.

In another embodiment, when the task type is delete, the data characteristic statistics are deleted in the target storage.
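A minimal sketch of how the management end might apply a first processing instruction follows. The instruction fields, the nested-dictionary layout of the statistics, and the choice to start a migrated unit with a zero counter are assumptions for illustration; the text only states that statistics are updated on migration and deleted on deletion.

```python
def apply_first_instruction(stats_by_store, instruction):
    """Update or delete statistics of the target storage end (hypothetical layout).

    stats_by_store: storage end name -> {unit id -> access count}
    instruction: {"task_type": "migrate" | "delete", "target_store": str, "units": list}
    """
    target = instruction["target_store"]
    units = instruction["units"]
    if instruction["task_type"] == "migrate":
        # Migration: start tracking the migrated units under the target storage end.
        entry = stats_by_store.setdefault(target, {})
        for unit in units:
            entry[unit] = 0
    elif instruction["task_type"] == "delete":
        # Deletion: drop the statistics of the deleted units.
        entry = stats_by_store.get(target, {})
        for unit in units:
            entry.pop(unit, None)
    else:
        raise ValueError(f"unknown task type: {instruction['task_type']}")
    return stats_by_store
```

The sketch touches only the target storage end, mirroring the wording of the two embodiments; whether statistics held for the source storage end are also cleaned up after a migration is not stated here.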

Referring to FIG. 2, a flowchart of the steps of a data management method based on multi-dimensional features according to an embodiment of the present application is shown:

S210, when the execution end receives a task execution request sent by the management end, the execution end determines a task type corresponding to the task execution request according to the task execution request;

S220, the execution end acquires target data in the storage end according to a trigger condition in the task execution request;

S230, the execution end performs data processing on the target data according to the task type and a target storage end, wherein the task type comprises migration or deletion;

S240, the execution end generates a first processing instruction responding to the management end according to the target data, the target storage end and the task type, and sends the first processing instruction to the management end; the first processing instruction is used for updating or deleting target data characteristic statistical information corresponding to the target storage terminal at the management terminal;

S250, the execution end generates a second processing instruction responding to the routing end according to the target data, the target storage end and the task type, and sends the second processing instruction to the routing end; the second processing instruction is used by the routing end to add or delete a target access path from the target data to the target storage end.
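Steps S210 to S250 can be strung together in one small function. This is a sketch under several assumptions that are not in the application: storage ends are plain dictionaries, each record carries an `idle_days` value, the trigger condition is a threshold on that value, migration removes the data from the source after placing it in the target, and migration maps to adding an access path while deletion maps to removing one.

```python
def execute_task(request, stores):
    """Run one task and return (first_instruction, second_instruction).

    stores: storage end name -> {unit id -> record with an "idle_days" field}
    request: {"task_type", "source_store", "target_store", "threshold"}
    """
    task_type = request["task_type"]                       # S210: determine task type
    source = stores[request["source_store"]]
    target_name = request["target_store"]
    # S220: select target data with the trigger condition (here: idle_days > threshold).
    units = sorted(u for u, rec in source.items() if rec["idle_days"] > request["threshold"])
    if task_type == "migrate":                             # S230: process the data
        for u in units:
            stores[target_name][u] = source.pop(u)
    elif task_type == "delete":
        for u in units:
            del source[u]
    else:
        raise ValueError(f"unknown task type: {task_type}")
    # S240: instruction for the management end (statistics update or deletion).
    first = {"task_type": task_type, "target_store": target_name, "units": units}
    # S250: instruction for the routing end (access path add or delete).
    action = "add" if task_type == "migrate" else "delete"
    second = {"action": action, "target_store": target_name, "units": units}
    return first, second
```

For a deletion request the source and target names are the same, reflecting the statement that the target storage end of a deletion is the initial storage end of the data.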

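In an embodiment of the present invention, the step of performing data processing on the target data according to the task type and the target storage end, wherein the task type comprises migration or deletion, includes:

when the task type is migration, the execution end migrates the target data from the storage end to the target storage end according to the task type;

or,

when the task type is deletion, the execution end deletes the target data from the target storage end according to the task type, wherein the target storage end is the initial storage end corresponding to the target data.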

In an embodiment of the present invention, the execution end obtains the target data in the storage end according to the trigger condition; acquiring target data in a storage end through a trigger condition; in a specific embodiment, all data in the storage terminal are acquired, and target data are screened from all the data through a trigger condition, where the trigger condition may be a preset threshold, and the data greater than the preset threshold is determined as the target data.

In an embodiment of the present invention, when the task type is migration, the execution end migrates the target data from the storage end to the target storage end according to the task type. In a specific embodiment, when the task type is migration, all data in the storage end that reaches the threshold, that is, the target data, is copied to the target storage end at the defined data granularity, that is, according to the granularity information.

In an embodiment of the present invention, when the task type is deletion, the execution end deletes the target data from the target storage end according to the task type, where the target storage end is the initial storage end corresponding to the target data. In a specific embodiment, when the task type is deletion, all data in the target storage end that reaches the threshold, that is, the target data, is deleted from the target storage end at the defined data granularity, that is, according to the granularity information.
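The two task types can be sketched as below. This is a hedged illustration rather than the application's implementation: storage ends are modeled as plain dictionaries keyed by record ID, the function name `process_task` is hypothetical, and removing the data from the source after copying follows the behavior described in Example one rather than an explicit rule in this paragraph.

```python
def process_task(task_type, target_ids, storage_end, target_storage_end):
    """Apply a migration or deletion to in-memory storage ends (dicts)."""
    if task_type == "migrate":
        # Copy every target record to the target storage end first...
        for record_id in target_ids:
            target_storage_end[record_id] = storage_end[record_id]
        # ...then remove it from the storage end it came from (assumption).
        for record_id in target_ids:
            del storage_end[record_id]
    elif task_type == "delete":
        # For deletion, the target storage end is the one holding the data.
        for record_id in target_ids:
            del target_storage_end[record_id]
    else:
        raise ValueError(f"unknown task type: {task_type!r}")


engine1 = {1: "LIU", 2: "LI", 8: "MAO"}
engine2 = {}
process_task("migrate", [1, 2], engine1, engine2)
process_task("delete", [1], engine2, engine2)
```

After the two calls, `engine1` holds only record 8 and `engine2` holds only record 2.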

In an embodiment of the present invention, the execution end generates a first processing instruction responding to the management end according to the target data, the target storage end and the task type;

the execution end sends the first processing instruction to the management end; the first processing instruction is used for updating or deleting the corresponding data characteristic statistical information in the target storage end at the management end.

In a specific embodiment, after all data in the storage end that reaches the threshold, that is, the target data, has been copied to the target storage end, a first processing instruction is generated according to the task type and sent to the management end.

In an embodiment of the present invention, the execution end generates a second processing instruction responding to the routing end according to the target data, the target storage end, and the task type;

the execution end sends the second processing instruction to the routing end; the second processing instruction is used by the routing end to add or delete the target access path from the target data to the target storage end.

In a specific embodiment, after all data in the target storage end that reaches the threshold, that is, the target data, has been deleted, a second processing instruction is generated according to the task type and sent to the routing end.
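The pair of processing instructions might be built as in the following sketch. The `Instruction` type, its field names, and the mapping of migration to "update statistics / add path" and deletion to "delete statistics / delete path" are assumptions made for illustration; the application only states that the first instruction updates or deletes statistics and the second adds or deletes an access path.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instruction:
    recipient: str             # "management" or "routing"
    action: str                # "update"/"delete" (statistics), "add"/"delete" (paths)
    target_storage: str        # name of the target storage end
    data_ids: Tuple[int, ...]  # identifiers of the target data


def build_instructions(task_type, data_ids, target_storage):
    """Return (first, second) processing instructions for a finished task."""
    if task_type == "migrate":
        stats_action, path_action = "update", "add"
    elif task_type == "delete":
        stats_action, path_action = "delete", "delete"
    else:
        raise ValueError(f"unknown task type: {task_type!r}")
    ids = tuple(data_ids)
    first = Instruction("management", stats_action, target_storage, ids)
    second = Instruction("routing", path_action, target_storage, ids)
    return first, second


first, second = build_instructions("migrate", [1, 2, 3], "engine2")
```

Keeping the two instructions as separate values reflects that they go to different recipients: the management end and the routing end.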

Referring to fig. 3, a flowchart illustrating steps of a data management method based on multidimensional features according to an embodiment of the present application is shown;

S310, the routing end determines an access path in response to data access information for upper-layer data;

S320, the routing end generates the data access request according to the access path and sends the data access request to the management end;

S330, the routing end adds or deletes the target access path for the target storage end according to a second processing instruction sent by the execution end.

In an embodiment of the present invention, the routing end determines which storage end holds a given piece of data, transparently routes upper-layer data access requests to the corresponding storage end for execution, and routes newly added data to a storage end. In one embodiment, the storage ends comprise an initial storage end and a plurality of target storage ends, and newly added data is routed to the initial storage end.

In one embodiment of the invention, the routing end determines an access path in response to data access information for upper-layer data. The access path indicates which storage ends need to be accessed: based on the data access information, the routing end determines which storage end to access for the data concerned. For example, data added for the first time is placed in the initial storage end, and routing information pointing to the initial storage end, that is, the access path, is recorded; the routing information is updated whenever the data is migrated between storage ends. For subsequent accesses, the storage end holding the requested data can be looked up through the routing end.

In one embodiment of the invention, data access generally comprises adding, deleting, modifying and querying data (generally expressed as DML in a database) and changing data definitions (generally expressed as DDL in a database); the corresponding data access request is generated according to whether data is added, deleted, queried or changed.

In an embodiment of the present invention, the routing end generates the data access request according to the access path, and sends the data access request to the management end.

In an embodiment of the present invention, the routing end adds or deletes the target access path for the target storage end according to the second processing instruction sent by the execution end. In a specific embodiment, when the task type is migration, an access path from the target data to the target storage end is added; in another specific embodiment, when the task type is deletion, all access paths corresponding to the target data are deleted.
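The routing behavior described above can be sketched as a small routing table. The class `RoutingEnd`, its method names, and the storage end names `engine1` and `engine2` are hypothetical; the sketch assumes one access path per record ID.

```python
class RoutingEnd:
    """Keeps one access path (storage end name) per record ID."""

    def __init__(self, initial_storage):
        self.initial_storage = initial_storage
        self.paths = {}

    def route_new(self, record_id):
        # Newly added data is routed to the initial storage end.
        self.paths[record_id] = self.initial_storage
        return self.initial_storage

    def lookup(self, record_id):
        # Returns None when no access path exists (e.g. after deletion).
        return self.paths.get(record_id)

    def apply_second_instruction(self, action, record_ids, target_storage=None):
        for record_id in record_ids:
            if action == "add":
                # Migration: the path now points to the target storage end.
                self.paths[record_id] = target_storage
            elif action == "delete":
                # Deletion: drop the access path of the target data.
                self.paths.pop(record_id, None)
            else:
                raise ValueError(f"unknown action: {action!r}")


router = RoutingEnd("engine1")
for rid in range(1, 4):
    router.route_new(rid)
router.apply_second_instruction("add", [1, 2], "engine2")
router.apply_second_instruction("delete", [1])
```

After these calls, record 1 has no path, record 2 is routed to `engine2`, and record 3 is still routed to `engine1`.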

Example one

The following example uses 3 storage engines, that is, storage ends, each optimized for a different dimensional feature. It extends readily to a larger number of storage engines; likewise, although only 3 dimensional features are used here, the scheme extends readily to more dimensions or to corresponding variations.

The following three storage engines, that is, storage ends, are added, with the following attributes:

Storage engine 1: change dimension; dynamic data feature; record-level granularity; trigger threshold: no change for 30 days; data task: migration; target storage end: storage engine 2.

Storage engine 2: access-frequency dimension; hot data feature; data-batch granularity; trigger threshold: a data temperature of 20 (low access frequency); temperature model: the data temperature decays by 1 degree per hour, rises by 10 degrees per access (consistent with the example below, in which 10 queries raise a batch from 100 to 200 degrees), and is capped at 720 degrees (this is simplified for the description below; a more complex model such as Newton's law of cooling could also be used for the decay); data task: migration; target storage end: storage engine 3.

Storage engine 3: access-time dimension; hot data feature; data-file granularity; trigger threshold: an age of 10 years; data task: deletion; no target storage end.
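The three engine definitions can be captured as configuration data, for instance as in the sketch below. The type `StorageEngineConfig`, its field names, the engine names, and the free-text trigger descriptions are illustrative assumptions; only the attribute values are taken from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageEngineConfig:
    name: str
    dimension: str
    granularity: str
    trigger: str           # human-readable trigger threshold
    task: str              # "migrate" or "delete"
    target: Optional[str]  # name of the target storage end, if any


ENGINES = {
    "engine1": StorageEngineConfig(
        "engine1", "change", "record", "no change for 30 days", "migrate", "engine2"),
    "engine2": StorageEngineConfig(
        "engine2", "access frequency", "data batch", "temperature of 20", "migrate", "engine3"),
    "engine3": StorageEngineConfig(
        "engine3", "access time", "data file", "age of 10 years", "delete", None),
}


def lifecycle(start):
    """Follow the target links from `start` and list the engines visited."""
    chain = []
    name = start
    while name is not None:
        chain.append(name)
        name = ENGINES[name].target
    return chain


path = lifecycle("engine1")
```

Following the target links from storage engine 1 yields the full life cycle of a record: engine 1, then engine 2, then engine 3, where it is finally deleted.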

A data structure recording the latest update time of each updated record is established for storage engine 1 as its data characteristic statistical information;

a data structure recording the access temperature of each data batch is established for storage engine 2 as its data characteristic statistical information;

a data structure recording the creation time of each data file is established for storage engine 3 as its data characteristic statistical information.

Suppose that on August 1, 2012, the following 10 records are inserted:

ID	NAME	SALARY
1	LIU	30000
2	LI	9000
3	WANG	20000
4	ZHAO	10000
5	HAN	500000
6	FAN	4000
7	GUO	60000
8	MAO	7000
9	CAI	40000
10	GAO	12000

initially, all of the 10 pieces of data are inserted into the storage engine 1, i.e., the storage side, and the corresponding routing information is updated.

On August 8, 2012, the record with ID 8 is updated, for example its salary is changed from 7000 to 10000, and a new entry with ID 8 and an update time of August 8, 2012 is added to the data characteristic statistical information of storage engine 1. By the later date of August 31, 2012, no other data has been updated.

At this time, the periodic task triggers the data task of storage engine 1: all data except the record with ID 8 is migrated to storage engine 2 and deleted from storage engine 1, and the routing information is updated so that the routes for IDs 1-7 and 9-10 point to storage engine 2. The records with IDs 1-5 form data batch A and the records with IDs 6, 7, 9 and 10 form data batch B; the temperatures of data batch A and data batch B are both initialized to 100.

If the user now accesses data, an access to the record with ID 8 is automatically routed to storage engine 1, while accesses to the other data are routed to storage engine 2. Assuming that data batch B is queried 10 times on August 31, 2012, the temperature recorded in the data characteristic statistical information is 200 degrees for data batch B and 100 degrees for data batch A.
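The temperature bookkeeping in this example can be reproduced with a short sketch. Several values here are assumptions rather than statements of the application: the gain of 10 degrees per access is inferred from the example (10 queries raise a batch from 100 to 200), the floor of 0 degrees is added only to keep the model well defined, and all function and constant names are hypothetical.

```python
INITIAL_TEMPERATURE = 100
DECAY_PER_HOUR = 1        # degrees lost per hour
GAIN_PER_ACCESS = 10      # assumption inferred from the example
MAX_TEMPERATURE = 720     # upper cap on the temperature
TRIGGER_THRESHOLD = 20    # migration is triggered at this temperature


def after_accesses(temperature, count):
    """Temperature after `count` accesses, capped at the maximum."""
    return min(MAX_TEMPERATURE, temperature + GAIN_PER_ACCESS * count)


def after_hours(temperature, hours):
    """Temperature after `hours` of linear decay, floored at 0 (assumption)."""
    return max(0, temperature - DECAY_PER_HOUR * hours)


def hours_until_trigger(temperature):
    """Hours without access before the temperature decays to the threshold."""
    return max(0, (temperature - TRIGGER_THRESHOLD) // DECAY_PER_HOUR)


batch_a = INITIAL_TEMPERATURE                      # never queried
batch_b = after_accesses(INITIAL_TEMPERATURE, 10)  # queried 10 times
idle_hours_a = hours_until_trigger(batch_a)
```

Under these assumptions, an untouched batch starting at 100 degrees needs 80 hours of linear decay to reach the threshold of 20, while a batch that keeps being queried stays well above it.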

Once data batch A reaches the trigger threshold of storage engine 2, its data is migrated from storage engine 2 to storage engine 3 and the routes are updated, that is, the routes for IDs 1-5 now point to storage engine 3. Storage engine 3 stores the records with IDs 1-5 as a whole in data file Z, and the characteristic statistical information is updated with a creation time of September 4, 2012 for data file Z.

At this time, user accesses are routed automatically: the record with ID 8 is stored in storage engine 1 (assuming that ID 8 keeps being updated), the records with IDs 6, 7, 9 and 10 are stored in storage engine 2 (assuming that data batch B keeps being queried), and the records with IDs 1-5 are stored in storage engine 3.

Finally, on September 4, 2022, data file Z reaches the 10-year threshold and triggers a data deletion task: data file Z is deleted directly from storage engine 3, together with the related routing information and characteristic statistical information. From this point on, the user can no longer access the records with IDs 1 to 5.
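The two date-based triggers in this example can be checked with simple date arithmetic. The sketch below uses Python's standard `datetime` module; the variable names are illustrative, and the final routing table only restates the placement described above.

```python
from datetime import date, timedelta

# Storage engine 1: records inserted on August 1, 2012 and never changed
# reach the "no change for 30 days" threshold on this date.
inserted = date(2012, 8, 1)
engine1_trigger = inserted + timedelta(days=30)

# Storage engine 3: data file Z, created on September 4, 2012, reaches
# the 10-year age threshold on this date.
file_created = date(2012, 9, 4)
engine3_trigger = file_created.replace(year=file_created.year + 10)

# Routing table just before the deletion task runs.
routes = {8: "engine1", 6: "engine2", 7: "engine2", 9: "engine2", 10: "engine2"}
routes.update({rid: "engine3" for rid in range(1, 6)})

# The deletion task removes the routes of the records held in data file Z.
for rid in range(1, 6):
    del routes[rid]
```

The computed trigger dates are August 31, 2012 and September 4, 2022, matching the dates in the example; after the deletion task only IDs 6 to 10 remain routable.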

Since the device embodiments are basically similar to the method embodiments, they are described only briefly; for relevant details, refer to the corresponding description of the method embodiments.

Referring to fig. 4, a block diagram of a data management apparatus based on multidimensional features according to an embodiment of the present application is shown;

a data management device based on multidimensional characteristics, the device relates to a storage end, a routing end, a management end and an execution end, and the device comprises:

a receiving module 410, configured to receive, by the management end, a data access request sent by the routing end, and generate initial data feature statistical information and a trigger condition corresponding to the storage end according to the data access request;

a first generating module 420, configured to determine a target storage end by the management end according to the trigger condition, generate the task execution request according to the target storage end and the trigger condition, send the task execution request to the execution end, and receive a first processing instruction sent by the execution end;

the first data processing module 430 is configured to update or delete, by the management terminal, the target data feature statistical information corresponding to the target storage terminal according to the first processing instruction.

In an embodiment of the present invention, the receiving module 410 includes:

the first determining submodule is used for determining access data in the storage end by the management end according to the data access request;

the second determining submodule is used for determining data granularity information corresponding to the access data in the storage end by the management end according to the access data;

and the first updating submodule is used for the management terminal to generate initial data characteristic statistical information and the triggering condition corresponding to the storage terminal according to the data granularity information.

In an embodiment of the present invention, the first generating module 420 includes:

a third determining submodule, configured to, by the management terminal, obtain all data in the storage terminal, and determine whether target data meeting the trigger condition exists;

the first generation submodule is used for generating the task execution request by the management terminal according to the target data when all the data in the storage terminal have the target data which correspondingly meets the trigger condition;

and the first sending submodule is used for sending the task execution request to the execution end by the management end.
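The cooperation of these three submodules can be sketched as a single scan function. This is a hedged illustration using the data of Example one: the type `TaskExecutionRequest`, the function `scan_for_task`, and the choice of "unchanged for at least 30 days" as the comparison are assumptions, and the statistics are modeled as a plain mapping from record ID to last update date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class TaskExecutionRequest:
    source_storage: str
    target_storage: Optional[str]
    task_type: str
    target_ids: Tuple[int, ...]


def scan_for_task(last_update, today, max_idle_days,
                  source_storage, target_storage, task_type="migrate"):
    """Return a task execution request if any record meets the trigger."""
    targets = tuple(sorted(
        rid for rid, updated in last_update.items()
        if (today - updated).days >= max_idle_days
    ))
    if not targets:
        return None  # nothing meets the trigger condition, no task is sent
    return TaskExecutionRequest(source_storage, target_storage, task_type, targets)


# Statistics of storage engine 1: all records inserted on August 1, 2012,
# record 8 updated again on August 8, 2012.
last_update = {rid: date(2012, 8, 1) for rid in range(1, 11)}
last_update[8] = date(2012, 8, 8)

request = scan_for_task(last_update, date(2012, 8, 31), 30, "engine1", "engine2")
early = scan_for_task(last_update, date(2012, 8, 20), 30, "engine1", "engine2")
```

On August 31, 2012 the scan selects every record except ID 8, which matches the migration described in Example one; an earlier scan on August 20 finds no target data and produces no request.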

Referring to fig. 5, a block diagram of a data management apparatus based on multidimensional features according to an embodiment of the present application is shown;

a first determining module 510, configured to, when the executing end receives a task execution request sent by the managing end, determine, by the executing end, a task type corresponding to the task execution request according to the task execution request;

an obtaining module 520, configured to obtain, by the execution end, target data in the storage end according to a trigger condition in the task execution request;

a second data processing module 530, configured to perform data processing on the target data by the execution end according to the task type and a target storage end, where the task type includes migration or deletion;

a third data processing module 540, configured to generate, by the execution end, a first processing instruction in response to the management end according to the target data, the target storage end, and the task type, and send the first processing instruction to the management end; the first processing instruction is used for updating or deleting target data characteristic statistical information corresponding to the target storage terminal at the management terminal;

a fourth data processing module 550, configured to generate, by the execution end, a second processing instruction in response to the routing end according to the target data, the target storage end, and the task type, and send the second processing instruction to the routing end; the second processing instruction is used for adding or deleting a target access path formed from the target data to the target storage end in the target storage end at the routing end.

In an embodiment of the present invention, the second data processing module 530 includes:

the first execution sub-module is used for migrating the target data from the storage end to the target storage end by the execution end according to the task type when the task type is the migration;

or,

a second execution submodule, configured to, when the task type is the deletion, delete the target data from the target storage end by the execution end according to the task type, where the target storage end is an initial storage end corresponding to the target data.

Referring to fig. 6, a block diagram of a data management apparatus based on multidimensional features according to an embodiment of the present application is shown;

a second determining module 610, configured to determine, by the routing end, an access path in response to data access information for upper-layer data;

a second generating module 620, configured to generate, by the routing end, the data access request according to the access path, and send the data access request to the management end;

a fifth data processing module 630, configured to add or delete, by the routing end, the target access path for the target storage end according to the second processing instruction sent by the execution end.

Referring to fig. 7, a computer device of a data management method based on multidimensional features of the present invention is shown, which may specifically include the following:

the computer device 12 described above is embodied in the form of a general purpose computing device, and the components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.

Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (commonly referred to as "hard drives"). Although not shown in FIG. 7, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 by one or more data media interfaces. The memory may include at least one program product having a set (e.g., at least one) of program modules 42, with the program modules 42 configured to carry out the functions of embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules 42, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.

Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, camera, etc.), with one or more devices that enable an operator to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, the network adapter 20 communicates with the other modules of the computer device 12 via the bus 18. It should be appreciated that although not shown in FIG. 7, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, etc.

The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing a multidimensional feature-based data management method provided by the embodiment of the present invention.

That is, when executing the program, the processing unit 16 implements the following: the management end receives a data access request sent by the routing end and generates initial data characteristic statistical information and a trigger condition corresponding to the storage end according to the data access request; the management end determines a target storage end according to the trigger condition, generates the task execution request according to the target storage end and the trigger condition, sends the task execution request to the execution end, and receives a first processing instruction sent by the execution end; and the management end updates or deletes the target data characteristic statistical information corresponding to the target storage end according to the first processing instruction.

In an embodiment of the present invention, the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements a multidimensional feature-based data management method as provided in all embodiments of the present application:

that is, the program when executed by the processor implements: the management terminal receives a data access request sent by the routing terminal and generates initial data characteristic statistical information and trigger conditions corresponding to the storage terminal according to the data access request; the management terminal determines a target storage terminal according to the trigger condition, generates the task execution request according to the target storage terminal and the trigger condition, sends the task execution request to the execution terminal, and receives a first processing instruction sent by the execution terminal; and the management terminal updates or deletes the target data characteristic statistical information corresponding to the target storage terminal according to the first processing instruction.

Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the operator's computer, partly on the operator's computer, as a stand-alone software package, partly on the operator's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the operator's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The embodiments in the present specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another.

While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or terminal apparatus that comprises the element.

The foregoing describes in detail a method and an apparatus for data management based on multidimensional features provided by the present application, and specific examples are applied herein to explain the principles and implementations of the present application, and the descriptions of the foregoing examples are only used to help understand the method and the core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, the specific implementation manner and the application scope may be changed, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims

1. A data management method based on multi-dimensional features is characterized in that the method relates to a storage end, a routing end, a management end and an execution end, and the method comprises the following steps:

2. The method of claim 1, wherein the step of generating initial data characteristic statistics and trigger conditions corresponding to the storage side based on the data access request comprises:

3. The method according to claim 1, wherein the step of generating the task execution request according to the target storage terminal and the trigger condition and sending the task execution request to the execution terminal comprises:

4. A data management method based on multi-dimensional features is characterized in that the method relates to a storage end, a routing end, a management end and an execution end, and the method comprises the following steps:

5. The method according to claim 4, wherein the data processing on the target data according to the task type and the target storage end, wherein the task type includes a migration or deletion step, and includes:

or,

6. A data management method based on multi-dimensional features is characterized in that the method relates to a storage end, a routing end, a management end and an execution end, and the method comprises the following steps:

7. A data management device based on multidimensional characteristics, which is characterized in that the device relates to a storage end, a routing end, a management end and an execution end, and the device comprises:

8. A data management device based on multi-dimensional features, the device relates to a storage end, a routing end, a management end and an execution end, and the device comprises:

9. A data management device based on multi-dimensional features, the device relates to a storage end, a routing end, a management end and an execution end, and the device comprises:

the second generation module is used for generating the data access request by the routing terminal according to the access path and sending the data access request to the management terminal;

10. A computer device comprising a processor, a memory, and a computer program stored on the memory and capable of running on the processor, the computer program, when executed by the processor, implementing the method of any one of claims 1 to 6.