CN115718571B - Data management method and device based on multidimensional features

Info

Publication number: CN115718571B
Application number: CN202211475101.5A
Authority: CN
Inventors: 陈志标
Original assignee: Shenzhen Institute of Computing Sciences
Current assignee: Shenzhen Institute of Computing Sciences
Priority date: 2022-11-23
Filing date: 2022-11-23
Publication date: 2023-08-22
Anticipated expiration: 2042-11-23
Also published as: CN115718571A

Abstract

The application provides a data management method and device based on multidimensional features. The method involves a storage end, a routing end, a management end and an execution end, and comprises the following steps: the management end receives a data access request sent by the routing end and generates, according to the data access request, initial data characteristic statistical information and a trigger condition corresponding to the storage end; the management end determines a target storage end according to the trigger condition, generates a task execution request according to the target storage end and the trigger condition, sends the task execution request to the execution end, and receives a first processing instruction sent by the execution end. As a result, user business requirements are met with a relatively optimal cost and a simple one-stop solution; optimal performance and cost are provided to the user; and the scheme is transparent to upper-layer users, who do not need to deploy multiple systems or add external tools to complete lifecycle management of the data.

Description

Data management method and device based on multidimensional features

Technical Field

The application relates to the field of data management, in particular to a data management method and device based on multidimensional features.

Background

With the development of the mobile internet and enterprise digital transformation, enterprises need to manage ever larger volumes of data. The main demands are real-time awareness of changing data, agile business analysis of the data, and longer data retention periods. To meet these demands, an enterprise often has to build a very complex data management system at high labor cost and invest substantial resources in storing massive data. This burden is increasingly hard to bear even for large enterprises, and more so for small and medium-sized enterprises. As a result, big data becomes the preserve of a small number of enterprises with sufficient financial and technical resources, and ordinary enterprises cannot truly enjoy the productivity gains that big data brings.

Data has a life cycle: it shows different characteristics at different stages, and the business workloads it must support also differ. To manage enterprise data, an enterprise generally uses several systems to manage data of different periods and different business requirements, and the data flows between these systems through tools for data integration, data conversion, data migration and the like. The main problems with this data management scheme are: 1. the same data is stored redundantly in several systems, which is costly and makes global data consistency difficult to guarantee; 2. migrating data between systems requires users to rely on third-party tools and on professional technicians, so the threshold is high, data flow latency is high, and abnormal conditions are often difficult to handle efficiently.

As data has become one of the factors of production, enterprises have begun to recognize its value and hope to keep the various data generated in their production processes. Managing massive data at the lowest cost while providing real-time data access is a key capability for enterprises undergoing digital transformation, and a key factor in whether an enterprise can hold its ground in fierce competition. To reduce the cost and threshold of enterprise data management, some data management systems offering fused management have begun to appear. However, each existing scheme considers only a single dimension and cannot handle multiple dimensions.

Disclosure of Invention

In view of the foregoing problems, the present application provides a data management method and device based on multidimensional features that overcome or at least partially solve those problems, including:

a data management method based on multidimensional features, the method relates to a storage end, a routing end, a management end and an execution end, and the method comprises the following steps:

the management end receives the data access request sent by the routing end and generates initial data characteristic statistical information and a trigger condition corresponding to the storage end according to the data access request;

the management end determines a target storage end according to the trigger condition, generates a task execution request according to the target storage end and the trigger condition, sends the task execution request to the execution end, and receives a first processing instruction sent by the execution end;

and the management end updates or deletes the target data characteristic statistical information corresponding to the target storage end according to the first processing instruction.

Further, the step of generating initial data feature statistics and triggering conditions corresponding to the storage end according to the data access request includes:

the management end determines access data in the storage end according to the data access request;

the management end determines data granularity information corresponding to the access data in the storage end according to the access data;

and the management end generates initial data characteristic statistical information and the triggering condition corresponding to the storage end according to the data granularity information.

Further, the step of generating the task execution request according to the target storage end and the trigger condition and sending the task execution request to the execution end includes:

the management end obtains all data in the storage end and determines whether target data meeting the trigger condition exists;

when target data meeting the trigger condition exists among the data in the storage end, the management end generates the task execution request according to the target data;

and the management end sends the task execution request to the execution end.

The embodiment of the invention also provides a data management method based on the multidimensional characteristic, the method relates to a storage end, a routing end, a management end and an execution end, and the method comprises the following steps:

when the execution end receives a task execution request sent by the management end, the execution end determines a task type corresponding to the task execution request according to the task execution request;

the execution end obtains target data from the storage end according to the trigger condition in the task execution request;

the execution end carries out data processing on the target data according to the task type and the target storage end, wherein the task type comprises migration or deletion;

the execution end generates a first processing instruction responding to the management end according to the target data, the target storage end and the task type, and sends the first processing instruction to the management end; the first processing instruction is used for updating or deleting the target data characteristic statistical information corresponding to the target storage end at the management end;

the execution end generates a second processing instruction responding to the routing end according to the target data, the target storage end and the task type, and sends the second processing instruction to the routing end; the second processing instruction is used for adding or deleting, at the routing end, the target access path from the target data to the target storage end.

Further, the step of performing data processing on the target data according to the task type and the target storage end, where the task type includes migration or deletion, includes:

when the task type is the migration, the execution end migrates the target data from the storage end to the target storage end according to the task type;

or,

and when the task type is the deletion, deleting the target data from the target storage end by the execution end according to the task type, wherein the target storage end is an initial storage end corresponding to the target data.

the routing end determines an access path in response to data access information for upper-layer data;

the routing end generates the data access request according to the access path and sends the data access request to the management end;

and the routing end adds or deletes the target access path to the target storage end according to the second processing instruction sent by the execution end.
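The routing-end behavior described above can be illustrated with a minimal Python sketch. The class name `RoutingEnd`, the dictionary-based instruction fields and the example keys are assumptions made for illustration; the application does not specify a concrete data layout.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingEnd:
    # Hypothetical routing table: data key -> names of storage ends that hold it.
    paths: dict = field(default_factory=dict)

    def apply_second_instruction(self, action: str, data_key: str, target_store: str) -> None:
        """Add or delete the access path from a data key to a target storage end."""
        if action == "add":
            self.paths.setdefault(data_key, set()).add(target_store)
        elif action == "delete":
            self.paths.get(data_key, set()).discard(target_store)
        else:
            raise ValueError(f"unknown action: {action}")

    def build_access_request(self, data_key: str, operation: str) -> dict:
        """Determine the access path and wrap it in a data access request."""
        access_path = sorted(self.paths.get(data_key, ()))
        return {"data_key": data_key, "operation": operation, "access_path": access_path}
```

In this sketch a migration would typically be followed by an "add" for the target storage end, and a deletion by a "delete", so later requests are routed only to storage ends that still hold the data.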

The embodiment of the invention also provides a data management device based on the multidimensional characteristic, the device relates to a storage end, a routing end, a management end and an execution end, and the device comprises:

the receiving module is used for the management end to receive the data access request sent by the routing end and to generate initial data characteristic statistical information and a trigger condition corresponding to the storage end according to the data access request;

the first generation module is used for determining a target storage end by the management end according to the trigger condition, generating the task execution request according to the target storage end and the trigger condition, sending the task execution request to the execution end, and receiving a first processing instruction sent by the execution end;

and the first data processing module is used for updating or deleting the target data characteristic statistical information corresponding to the target storage end by the management end according to the first processing instruction.

the first determining module is used for determining a task type corresponding to the task execution request according to the task execution request when the execution end receives the task execution request sent by the management end;

the acquisition module is used for the execution end to acquire target data from the storage end according to the trigger condition in the task execution request;

the second data processing module is used for the execution end to perform data processing on the target data according to the task type and the target storage end, wherein the task type comprises migration or deletion;

the third data processing module is used for generating a first processing instruction responding to the management end by the execution end according to the target data, the target storage end and the task type, and sending the first processing instruction to the management end; the first processing instruction is used for updating or deleting the target data characteristic statistical information corresponding to the target storage end at the management end;

the fourth data processing module is used for the execution end to generate, according to the target data, the target storage end and the task type, a second processing instruction responding to the routing end, and to send the second processing instruction to the routing end; the second processing instruction is used for adding or deleting, at the routing end, the target access path from the target data to the target storage end.

the second determining module is used for determining an access path by the routing end in response to the data access information aiming at the upper layer data;

the second generation module is used for generating the data access request by the routing end according to the access path and sending the data access request to the management end;

and the fifth data processing module is used for the routing end to add or delete the target access path to the target storage end according to the second processing instruction sent by the execution end.

The embodiment of the invention also provides computer equipment, which comprises a processor, a memory and a computer program stored on the memory and capable of running on the processor, wherein the computer program is executed by the processor to realize the steps of the data management method based on the multi-dimensional characteristics.

Embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the data management method based on multidimensional features as described above.

The application has the following advantages:

In the embodiments of the application, the prior art considers only one dimension and cannot handle multiple dimensions, and it stores data redundantly in several systems, which is costly and makes global data consistency difficult to guarantee. Against this, the application provides a data management method based on multidimensional features, specifically: the management end receives the data access request sent by the routing end and generates initial data characteristic statistical information and a trigger condition corresponding to the storage end according to the data access request; the management end determines a target storage end according to the trigger condition, generates a task execution request according to the target storage end and the trigger condition, sends the task execution request to the execution end, and receives a first processing instruction sent by the execution end; and the management end updates or deletes the target data characteristic statistical information corresponding to the target storage end according to the first processing instruction.
Because the task execution request is generated according to the target storage end and the trigger condition, the problems of single-dimension management, unguaranteed data consistency, high cost and inefficient data processing are solved. Data is automatically migrated and managed according to several different dimensions, such as data and business dimensions, so that user business requirements are met with a relatively optimal cost and a simple one-stop solution; the conflict between read optimization and write optimization and the conflict between data temperature (hot or cold) and storage cost are resolved, providing users with optimal performance and cost; and the scheme is transparent to upper-layer users, who do not need to deploy multiple systems or add external tools to complete lifecycle management of the data.

Drawings

In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the description of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.

FIG. 1 is a flowchart illustrating a method for managing data based on multidimensional features according to an embodiment of the present application;

FIG. 2 is a flowchart illustrating a method for managing data based on multidimensional features according to an embodiment of the present application;

FIG. 3 is a flowchart illustrating a method for managing data based on multidimensional features according to an embodiment of the present application;

FIG. 4 is a block diagram of a data management device based on multidimensional features according to an embodiment of the present application;

FIG. 5 is a block diagram of a data management device based on multidimensional features according to an embodiment of the present application;

FIG. 6 is a block diagram of a data management device based on multidimensional features according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

In order that the manner in which the above recited objects, features and advantages of the present application are obtained will become more readily apparent, a more particular description of the application briefly described above will be rendered by reference to the appended drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

By analyzing the prior art, the inventors found the following. Data has a life cycle: it shows different characteristics at different stages, and the business workloads it must support also differ. To manage enterprise data, an enterprise generally uses several systems to manage data of different periods and different business requirements, and the data flows between these systems through tools for data integration, data conversion, data migration and the like. For example, when data has just been generated it usually has to support transactions in a production system, with low-latency point queries and point updates, and is generally served by a traditional OLTP (On-Line Transaction Processing) database. Then, to serve users' queries on historical details, the data is integrated in real time into a historical archive database, which generally stores data grouped by user so that it can be retrieved along the user dimension. Finally, to support reports, business intelligence and the like, the data is converted and stored in a data warehouse through tools such as ETL (Extract-Transform-Load, the process of extracting data from a source, transforming it, and loading it into a destination). The main problems with this data management scheme are:

1. The same data is stored in a plurality of systems in a redundant way, the cost is high, and the global consistency of the data is difficult to ensure.

2. The migration of data between systems requires users to implement by using third party tools and the like, requires professional technicians to implement, has a high threshold, has high delay of data flow, and is often difficult to efficiently process under abnormal conditions.

The first technical scheme is as follows:

C-Store (Mike Stonebraker et al. C-Store: A Column-oriented DBMS. In VLDB, pages 553–564, 2005.)

Write-optimized storage and query-optimized storage are mutually exclusive in design and implementation: write-oriented storage is generally implemented as a row store, while query-oriented storage is generally implemented as a column store (a column-based database system). The C-Store paper first proposed a hybrid architecture that implements two modules: one handles fast writes (Writeable Store, WS) and the other provides efficient queries (Read-Optimized Store, RS), with a Tuple Mover running between them to migrate data from WS to RS.

This approach has been widely adopted in many big data systems, including commercial systems such as SAP HANA and Vertica. The disadvantages of this solution are:

Only one dimension is supported, and it is fixed: distinguishing reads from writes. The scheme solves only this particular problem.

The Tuple Mover merges data from WS to RS periodically, so data that changes frequently may be merged back and forth repeatedly.

The second technical scheme is as follows:

Tiered storage of hot and cold data: a tiered storage method based on hot-cold classification of time series data, publication number: CN108268217A; publication (announcement) date: 2021-04-30. In this patent, data is divided by specific time rules into real-time data, sunday data, historical data and cold data, and a real-time data layer, a sunday data layer, a historical data layer and a cold data layer are established and stored, in that order, in memory, on SSD, on HDD and in the distributed storage system HDFS. The real-time data is data within 24 hours, the sunday data is data within two weeks, the historical data is data within two years, and the cold data is data within 10 years. A batch transfer service periodically imports each kind of data in batches from the upper storage layer to the lower storage layer.

The disadvantage of this solution is: the data can be simply divided and migrated only according to the time window of the data, and the data migration granularity is relatively coarse; only append and query of data are supported, and modification of data is not supported.
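The time-window rule of this second scheme can be sketched as follows. This is only an illustration of the rule as summarized above: the function name, the day counts used for "two years" and "10 years", the inclusive boundaries, and the `None` result for data older than the last window are assumptions.

```python
from datetime import timedelta

# Upper age limit of each layer and the storage medium it maps to.
# 2 * 365 and 10 * 365 days are assumed approximations of "two years" and "10 years".
TIERS = [
    (timedelta(hours=24), "memory"),
    (timedelta(weeks=2), "SSD"),
    (timedelta(days=2 * 365), "HDD"),
    (timedelta(days=10 * 365), "HDFS"),
]


def storage_layer_for_age(age: timedelta):
    """Return the storage medium for data of the given age, or None past the last window."""
    for limit, medium in TIERS:
        if age <= limit:
            return medium
    return None
```

Because age is the only input, this rule cannot take data change rate or query type into account, which is the limitation the text points out.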

In summary, the prior art considers only one dimension and cannot handle multiple dimensions, and it stores data redundantly in several systems, which is costly and makes global data consistency difficult to guarantee. The data management method based on multidimensional features is provided to solve these problems. The specific steps are as follows:

Referring to FIG. 1, a flowchart of the steps of a data management method based on multidimensional features according to an embodiment of the present application is shown:

S110, the management end receives a data access request sent by the routing end and generates initial data characteristic statistical information and a trigger condition corresponding to the storage end according to the data access request;

S120, the management end determines a target storage end according to the trigger condition, generates a task execution request according to the target storage end and the trigger condition, sends the task execution request to the execution end, and receives a first processing instruction sent by the execution end;

S130, the management end updates or deletes the target data characteristic statistical information corresponding to the target storage end according to the first processing instruction.
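Steps S110 to S130 can be pictured with a minimal Python sketch of the management end. Everything concrete here is an assumption made for illustration: the class and method names, the use of a "last access tick" as the only statistic, the idle-time trigger, and the choice to move a statistic from the source to the target storage end after a migration.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementEnd:
    idle_limit: int      # hypothetical trigger: idle for at least this many ticks
    target_store: str    # storage end that idle data is migrated to
    stats: dict = field(default_factory=dict)  # (store, item) -> last access tick

    def on_access_request(self, store: str, item: str, tick: int) -> None:
        """S110: record feature statistics for the data named in an access request."""
        self.stats[(store, item)] = tick

    def build_task_requests(self, now: int) -> list:
        """S120: find data meeting the trigger condition and build task execution requests."""
        return [
            {"type": "migrate", "source": store, "target": self.target_store, "item": item}
            for (store, item), last in sorted(self.stats.items())
            if store != self.target_store and now - last >= self.idle_limit
        ]

    def on_first_instruction(self, instr: dict) -> None:
        """S130: update or delete statistics according to the reported task type."""
        if instr["type"] == "migrate":
            # Assumption: the statistic follows the data to the target storage end.
            last = self.stats.pop((instr["source"], instr["item"]))
            self.stats[(instr["target"], instr["item"])] = last
        elif instr["type"] == "delete":
            self.stats.pop((instr["target"], instr["item"]), None)
```

In a fuller system the task requests would be sent to the execution end, and the first processing instruction would come back from it rather than being constructed by the caller.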

Next, a data management method based on a multi-dimensional feature in the present exemplary embodiment will be further described.

In one embodiment of the invention, the storage end comprises more than 2 storage engines, and each storage engine provides data management capability of data feature optimization oriented to different dimensions.

In an embodiment of the present invention, the management end manages and defines the data management capability and life cycle management policy of each storage engine (that is, each storage end), tracks the data optimization feature information (that is, the data feature statistics information) of each storage engine, and triggers data management tasks, such as data migration, when a defined rule (that is, a trigger condition) is satisfied.

In an embodiment of the present invention, the management end receives a data access request sent by the routing end, and generates initial data feature statistics information and a trigger condition corresponding to the storage end according to the data access request, where the initial data feature statistics information includes initial dimension information, initial optimization feature information and initial granularity information corresponding to the storage end. During execution, the data in the data access request sent by the routing end is mapped to the data granularity defined by the engine, and that data granularity (the granularity information) is updated. Each storage end has its own data feature statistics information.

In one embodiment of the invention, the data access request includes, but is not limited to, data query, data access, data addition, data deletion.

In an embodiment of the present invention, the step of generating initial data feature statistics and triggering conditions corresponding to the storage terminal according to the data access request includes:

In a specific embodiment, during execution, the data to be accessed (the access data) in the data access request sent by the routing end is mapped to the data granularity (the granularity information) defined by the storage engine (the storage end), and the granularity information is updated.

In a specific embodiment, when the data access request is newly added data, the initial data feature statistical information corresponding to the storage end for storing the newly added data can be generated through the newly added data.

In one embodiment, the characteristics of the data are defined by classification by dimension information, including but not limited to:

dimension one: defining the data as dynamic data and static data according to the change dimension of the data, namely the data change dimension;

dimension two: defining the data as hot data, warm data and cold data according to the access frequency of the data, namely the dimension of the data access frequency;

dimension three: according to the business characteristics of the data, the data is defined as search data and analysis data, namely, the dimension of the data query business.

In a specific embodiment, the optimization feature information refers to the optimization features of the engine, including but not limited to:

optimizing performance based on hot data characteristics; optimizing cost based on cold data characteristics; optimizing based on dynamic data characteristics; optimizing based on static data characteristics; optimizing based on retrieval access characteristics.

In a particular embodiment, granularity information refers to the statistical unit granularity of data, including but not limited to:

statistics per row of the data table; statistics per page/block of the data table; statistics per range interval of the data table, for example a time range; statistics per slice (shard) of the data table; statistics per partition of the data table.

In an embodiment of the present invention, when the data access request is newly added data, initial data feature statistics information corresponding to the storage end for storing the newly added data can be generated through the newly added data, where the initial data feature statistics information includes initial dimension information, initial optimization feature information, and initial granularity information, which are used to define a feature statistics management structure corresponding to the storage end for storing the newly added data; that is, each storage end has corresponding data feature statistical information.
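The feature statistics management structure described above (dimension information, optimization feature information and granularity information per storage end) might be modeled as in the following sketch. The enum members mirror the lists given in the text, but all identifiers, the free-text `optimization` field and the per-unit access counter are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum


class Dimension(Enum):
    CHANGE = "data change"                  # dynamic data vs. static data
    ACCESS_FREQUENCY = "access frequency"   # hot, warm and cold data
    QUERY_SERVICE = "query service"         # retrieval data vs. analysis data


class Granularity(Enum):
    ROW = "row"
    PAGE = "page/block"
    RANGE = "range interval"
    SLICE = "slice"
    PARTITION = "partition"


@dataclass
class FeatureStatistics:
    store: str                # storage end these statistics belong to
    dimension: Dimension
    optimization: str         # e.g. "performance for hot data"
    granularity: Granularity
    counters: dict = field(default_factory=dict)  # statistical unit id -> access count

    def record_access(self, unit_id: str) -> None:
        """Count one access against the statistical unit the data maps to."""
        self.counters[unit_id] = self.counters.get(unit_id, 0) + 1
```

Here the caller is expected to map each accessed row to its statistical unit (for example a partition name) before calling `record_access`; how that mapping is done depends on the storage engine.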

In an embodiment of the present invention, the step of generating the task execution request according to the target storage end and the trigger condition, and sending the task execution request to the execution end includes:

and the management end sends the task execution request to the execution end.

In an embodiment of the present invention, the trigger condition of the storage end is used to determine whether any data in the storage end satisfies it. The trigger condition is either periodic or an external event. A periodic trigger fires at a time interval, for example every minute, every hour, or at a specific time each day or each month. An external event generally means that statistics changed by upper-layer access reach a certain threshold, for example: if the statistics show that certain data has not been accessed for two consecutive weeks, migration of that data to a certain engine is triggered.
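The two kinds of trigger condition can be sketched in Python as below. The class names and the representation of time with `datetime` values are assumptions; the two-week idle example follows the one given in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PeriodicTrigger:
    """Fires when at least `period` has elapsed since the trigger last fired."""
    period: timedelta
    last_fired: datetime

    def due(self, now: datetime) -> bool:
        return now - self.last_fired >= self.period


@dataclass
class IdleTrigger:
    """External-event style trigger: fires when data has gone unaccessed for `max_idle`."""
    max_idle: timedelta

    def due(self, last_access: datetime, now: datetime) -> bool:
        return now - last_access >= self.max_idle
```

For instance, `IdleTrigger(timedelta(weeks=2))` corresponds to the rule that data not accessed for two consecutive weeks should be migrated.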

In an embodiment of the present application, the management end updates or deletes the data feature statistics corresponding to the target storage end according to the first processing instruction; the management end updates or deletes the data characteristic statistical information of the target storage end by receiving a first processing instruction returned by the execution end, wherein the updating or deleting is carried out according to the task type corresponding to the first processing instruction;

In one embodiment, when the task type is migration, the data feature statistics of the target storage end are updated.

In another embodiment, when the task type is deletion, the data feature statistics of the target storage end are deleted.

Referring to FIG. 2, a flowchart of the steps of a data management method based on multidimensional features according to an embodiment of the present application is shown:

S210, when the execution end receives a task execution request sent by the management end, the execution end determines a task type corresponding to the task execution request according to the task execution request;

S220, the execution end acquires target data from the storage end according to the trigger condition in the task execution request;

S230, the execution end performs data processing on the target data according to the task type and the target storage end, wherein the task type comprises migration or deletion;

S240, the execution end generates a first processing instruction responding to the management end according to the target data, the target storage end and the task type, and sends the first processing instruction to the management end; the first processing instruction is used for updating or deleting the target data characteristic statistical information corresponding to the target storage end at the management end;

S250, the execution end generates a second processing instruction responding to the routing end according to the target data, the target storage end and the task type, and sends the second processing instruction to the routing end; the second processing instruction is used for adding or deleting, at the routing end, the target access path from the target data to the target storage end.
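Steps S210 to S250 can be illustrated with the following Python sketch of the execution end. The request and instruction layouts (plain dictionaries), the modelling of each storage end as an in-memory dictionary, and the callable `condition` are assumptions made for this example. Migration is shown as a copy to the target storage end, and for deletion the target storage end is treated as the one that initially holds the data.

```python
def execute_task(request: dict, stores: dict):
    """Run one task and return (first_instruction, second_instruction)."""
    task_type = request["type"]                      # S210: determine the task type
    target = request["target"]
    source = request.get("source", target)           # for deletion, source == target
    matches = request["condition"]                   # trigger condition as a predicate

    # S220: obtain the target data from the storage end.
    keys = [k for k, v in stores[source].items() if matches(k, v)]

    # S230: process the target data according to the task type.
    if task_type == "migrate":
        for k in keys:
            stores[target][k] = stores[source][k]
        path_action = "add"
    elif task_type == "delete":
        for k in keys:
            del stores[target][k]
        path_action = "delete"
    else:
        raise ValueError(f"unknown task type: {task_type}")

    # S240: instruction for the management end (update or delete statistics).
    first = {"type": task_type, "target": target, "keys": keys}
    # S250: instruction for the routing end (add or delete access paths).
    second = {"action": path_action, "target": target, "keys": keys}
    return first, second
```

The two returned dictionaries stand in for the first and second processing instructions; in a real deployment they would be sent to the management end and the routing end respectively.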

In an embodiment of the present invention, the step of performing data processing on the target data according to the task type and the target storage end, where the task type includes migration or deletion, includes:

or,

In an embodiment of the present invention, the execution end obtains the target data from the storage end according to the trigger condition. In a specific embodiment, all data in the storage end is obtained and the target data is screened from it using the trigger condition; the trigger condition may be a preset threshold, in which case data exceeding the preset threshold is determined to be target data.
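As a small illustration of threshold-based screening, the sketch below selects the entries whose tracked value exceeds a preset threshold. The function name and the idea of tracking one numeric value per data unit (for example, days since last access) are assumptions.

```python
def select_target_data(all_data: dict, threshold: float) -> dict:
    """Return the entries whose tracked value is strictly greater than the threshold."""
    return {key: value for key, value in all_data.items() if value > threshold}
```

With idle days as the tracked value and a threshold of 14, `select_target_data({"p1": 3, "p2": 20, "p3": 14}, 14)` keeps only `p2`, since the comparison is strict.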

In an embodiment of the present invention, when the task type is migration, the execution end migrates the target data from the storage end to the target storage end according to the task type. In a specific embodiment, when the task type is migration, all data in the storage end that reaches the threshold, namely the target data, is copied to the target storage end according to the defined data granularity, namely the granularity information.

In an embodiment of the present invention, when the task type is deletion, the execution end deletes the target data from the target storage end according to the task type, where the target storage end is the initial storage end corresponding to the target data. In a specific embodiment, when the task type is deletion, all data in the target storage end that reaches the threshold, namely the target data, is deleted from the target storage end according to the defined data granularity, namely the granularity information.

In an embodiment of the present invention, the execution end generates a first processing instruction responding to the management end according to the target data, the target storage end and the task type;

the execution end sends the first processing instruction to the management end; the first processing instruction is used for updating or deleting, at the management end, the data characteristic statistical information corresponding to the target storage end.

In a specific embodiment, after all data in the storage end that reaches the threshold, namely the target data, is copied to the target storage end, a first processing instruction is generated according to the task type and sent to the management end.

In an embodiment of the present application, the execution end generates a second processing instruction responding to the routing end according to the target data, the target storage end and the task type;

the execution end sends the second processing instruction to the routing end; the second processing instruction is used for adding or deleting, at the routing end, the target access path formed from the target data to the target storage end.

In a specific embodiment, after all data in the target storage end that reaches the threshold, namely the target data, is deleted, a second processing instruction is generated according to the task type and sent to the routing end.
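The execution-end flow described above (determine the task type, process the target data, then build the two processing instructions) can be sketched as follows. This is only an illustration under stated assumptions: storage ends are modelled as in-memory dictionaries, the instruction layout (`op`, `storage`, `keys`) is invented for the sketch, and migration removes the data from the source after copying, as in Example 1.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

Storage = Dict[str, Any]  # hypothetical in-memory stand-in for one storage end


@dataclass
class TaskRequest:
    task_type: str          # "migrate" or "delete"
    source: str             # storage end that currently holds the target data
    target: Optional[str]   # destination storage end; None for deletion
    keys: List[str]         # keys of the target data selected by the trigger condition


def execute(request: TaskRequest, storages: Dict[str, Storage]) -> Tuple[dict, dict]:
    """Process the target data and build the first and second processing instructions."""
    if request.task_type not in ("migrate", "delete"):
        raise ValueError(f"unknown task type: {request.task_type}")
    if request.task_type == "migrate" and request.target is None:
        raise ValueError("migration needs a target storage end")

    source = storages[request.source]
    # Remove the target data from the storage end that currently holds it.
    taken = {key: source.pop(key) for key in request.keys if key in source}

    if request.task_type == "migrate":
        storages[request.target].update(taken)
        stats_op, route_op, storage = "update", "add", request.target
    else:
        stats_op, route_op, storage = "delete", "delete", request.source

    keys = sorted(taken)
    first = {"op": stats_op, "storage": storage, "keys": keys}   # for the management end
    second = {"op": route_op, "storage": storage, "keys": keys}  # for the routing end
    return first, second


storages = {"engine1": {"r1": "x", "r8": "y"}, "engine2": {}}
first, second = execute(TaskRequest("migrate", "engine1", "engine2", ["r1"]), storages)
```

The request is validated before any data is touched, so an invalid request leaves the storage ends unchanged.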

Referring to fig. 3, a flowchart of the steps of a data management method based on multidimensional features according to an embodiment of the present application is shown:

S310, the routing end determines an access path in response to data access information for upper layer data;

S320, the routing end generates the data access request according to the access path and sends the data access request to the management end;

S330, the routing end adds or deletes the target access path from the target storage end according to the second processing instruction sent by the execution end.

In an embodiment of the present invention, the routing end is used to determine which storage end stores given data, and can transparently route an upper-layer data access request to the corresponding storage end; newly added data is routed to the storage end. In one embodiment, the storage end includes an initial storage end and a plurality of target storage ends, and newly added data is routed to the initial storage end.

In an embodiment of the present invention, the routing end determines an access path in response to data access information for upper layer data. The access path indicates which storage ends need to be accessed; the routing end determines which storage end to access from the data access information, according to the data to be accessed. For example, data added for the first time is placed in the initial storage end, and the routing information, namely the access path, is recorded as pointing to the initial storage end; the routing information is updated when the data is migrated between storage ends. For subsequent accesses, the storage end on which the data to be accessed resides can be obtained from the routing end.

In one embodiment of the invention, a data access request refers to access to data, which generally includes adding, deleting, modifying, and querying data (generally expressed as DML in a database) and changing data definitions (generally expressed as DDL in a database); a corresponding target request is generated according to the data addition, deletion, query, or change.

In an embodiment of the present invention, the routing end generates the data access request according to the access path, and sends the data access request to the management end.

In an embodiment of the present invention, the routing end adds or deletes a target access path from the target storage end according to the second processing instruction sent by the execution end. In a specific embodiment, when the task type is migration, the access path formed from the target data to the target storage end is added; in another embodiment, when the task type of the data task is deletion, all access paths corresponding to the target data in the storage end are deleted.
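A routing end of this kind can be pictured as a table mapping each piece of data to the storage end that holds it. The sketch below is a hypothetical in-memory version; the class name, method names, and instruction layout (`op`, `storage`, `keys`) are assumptions made for illustration only.

```python
from typing import Dict, Optional


class RoutingEnd:
    """Hypothetical in-memory routing table: data key -> storage end name."""

    def __init__(self, initial_storage: str) -> None:
        self.initial_storage = initial_storage
        self.paths: Dict[str, str] = {}

    def route_new(self, key: str) -> str:
        """Newly added data is routed to the initial storage end."""
        self.paths[key] = self.initial_storage
        return self.initial_storage

    def lookup(self, key: str) -> Optional[str]:
        """Return the storage end holding the key, or None if no access path exists."""
        return self.paths.get(key)

    def apply(self, instruction: dict) -> None:
        """Apply a second processing instruction sent by the execution end."""
        for key in instruction["keys"]:
            if instruction["op"] == "add":
                self.paths[key] = instruction["storage"]
            elif instruction["op"] == "delete":
                self.paths.pop(key, None)
            else:
                raise ValueError(f"unknown operation: {instruction['op']}")


router = RoutingEnd("engine1")
router.route_new("r1")
router.apply({"op": "add", "storage": "engine2", "keys": ["r1"]})
after_migration = router.lookup("r1")
router.apply({"op": "delete", "storage": "engine2", "keys": ["r1"]})
after_deletion = router.lookup("r1")
```

After the migration instruction the key resolves to `engine2`; after the deletion instruction it no longer resolves at all, which is how an upper-layer access would learn that the data is gone.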

Example 1

The following illustrates three storage engines, i.e. storage ends, each optimized for a different dimensional feature. The example can readily be extended to a larger number of storage engines, and although the dimensional features here cover only three dimensions, they can readily be extended to more dimensions or be correspondingly modified.

The following three storage engines, i.e. storage ends, are added, with the following attributes:

Storage engine 1: change dimension; dynamic data characteristic; record-level granularity; trigger threshold of 30 days without change; the data task is migration; the target storage end is storage engine 2;

Storage engine 2: access frequency dimension; hot data characteristic; data-batch granularity; trigger threshold of 20 degrees (the access frequency is low); decay of 1 degree per hour; increment of 10 degrees per access, up to a maximum of 720 degrees (this is a simplification; the decay could instead follow a more complex model such as Newton's law of cooling); the data task is migration; the target storage end is storage engine 3;

Storage engine 3: time dimension; warm data characteristic; data-file granularity; trigger threshold of an age of 10 years; the data task is deletion; there is no target storage end.

For storage engine 1, a data structure holding the latest update time of each updated record is established as the data characteristic statistical information of storage engine 1;

for storage engine 2, a data structure holding the access temperature of each data batch is established as the data characteristic statistical information of storage engine 2;

for storage engine 3, a data structure holding the creation time of each data file is established as the data characteristic statistical information of storage engine 3.
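The three engine definitions can be written down as a small configuration table. The following is a sketch only: the `EnginePolicy` type, the field names, the short engine keys, and the free-text trigger descriptions are all hypothetical, chosen to mirror the attributes listed above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class EnginePolicy:
    dimension: str          # which feature dimension the engine is optimized for
    granularity: str        # unit at which statistics are kept and tasks operate
    trigger: str            # human-readable trigger condition
    task: str               # "migrate" or "delete"
    target: Optional[str]   # target storage end, or None if there is none


POLICIES: Dict[str, EnginePolicy] = {
    "engine1": EnginePolicy("change", "record", "no change for 30 days", "migrate", "engine2"),
    "engine2": EnginePolicy("access frequency", "data batch", "temperature down to 20", "migrate", "engine3"),
    "engine3": EnginePolicy("time", "data file", "age of 10 years", "delete", None),
}


def lifecycle(start: str) -> List[str]:
    """Follow target storage ends from `start` until an engine has no target."""
    chain = [start]
    while POLICIES[chain[-1]].target is not None:
        chain.append(POLICIES[chain[-1]].target)
    return chain
```

Calling `lifecycle("engine1")` walks the chain of target storage ends, which makes the intended data path (engine 1, then engine 2, then engine 3, then deletion) explicit.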

Suppose that the following 10 pieces of data are inserted on August 1, 2012:

ID	NAME	SALARY
1	LIU	30000
2	LI	9000
3	WANG	20000
4	ZHAO	10000
5	HAN	500000
6	FAN	4000
7	GUO	60000
8	MAO	7000
9	CAI	40000
10	GAO	12000

Initially, all 10 pieces of data are inserted into storage engine 1, i.e. the storage end, and the corresponding routing information is updated.

The data with ID 8 is updated in August 2012, for example its SALARY is changed from 7000 to 10000; at this time, an entry with ID 8 and an update time in August 2012 is added to the data characteristic statistical information of storage engine 1. The other data has not been updated as of August 31, 2012.

At this time, the periodic task triggers the data task of storage engine 1: all records except the one with ID 8 are migrated to storage engine 2 and deleted from storage engine 1, and the routing information is updated so that the routes for IDs 1 to 7 and IDs 9 to 10 point to storage engine 2. The data with IDs 1 to 5 forms data batch A, the data with IDs 6, 7, 9 and 10 forms data batch B, and the temperature of data batch A and data batch B is initialized to 100.
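The 30-day check for storage engine 1 can be sketched as below. The insert date (August 1, 2012) and check date (August 31, 2012) follow the example; the exact day on which ID 8 was updated is not given in the text, so `ASSUMED_UPDATE_OF_ID_8` is a purely hypothetical value, and the function name is likewise an assumption.

```python
from datetime import date, timedelta
from typing import Dict, List

INSERTED = date(2012, 8, 1)
ASSUMED_UPDATE_OF_ID_8 = date(2012, 8, 15)  # hypothetical: the text gives only the month
NO_CHANGE_WINDOW = timedelta(days=30)

# Latest change time per record ID; every record starts at its insert date.
last_change: Dict[int, date] = {record_id: INSERTED for record_id in range(1, 11)}
last_change[8] = ASSUMED_UPDATE_OF_ID_8


def unchanged_records(changes: Dict[int, date], today: date,
                      window: timedelta = NO_CHANGE_WINDOW) -> List[int]:
    """Return IDs whose latest change is at least `window` old on `today`."""
    return sorted(rid for rid, changed in changes.items() if today - changed >= window)


to_migrate = unchanged_records(last_change, date(2012, 8, 31))
```

With these dates, every record except ID 8 is exactly 30 days unchanged and is selected for migration, matching the outcome described above.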

At this time, if the user accesses the data, an access to the data with ID 8 is automatically routed to storage engine 1, and accesses to the other data go to storage engine 2. Assuming that data batch B is queried 10 times on August 31, 2012, the temperature of data batch B in the data characteristic statistical information becomes 200 degrees, while the temperature of data batch A remains 100 degrees.
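The temperature bookkeeping in this step is simple arithmetic: each access adds 10 degrees, capped at 720. A minimal sketch using the storage engine 2 attributes (the function and constant names are hypothetical):

```python
INCREMENT_PER_ACCESS = 10   # degrees added per access (storage engine 2 attribute)
MAX_TEMPERATURE = 720       # upper bound on a batch temperature


def heat(temperature: int, accesses: int) -> int:
    """Return the batch temperature after `accesses` accesses, capped at the maximum."""
    return min(MAX_TEMPERATURE, temperature + INCREMENT_PER_ACCESS * accesses)


batch_b = heat(100, accesses=10)  # 100 + 10 * 10
batch_a = heat(100, accesses=0)   # untouched batch keeps its temperature
```

Here `batch_b` evaluates to 200 and `batch_a` stays at 100, as in the example.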

Starting from the 31st, the temperature decays by 10 degrees every day. When the temperature of a batch drops to 20 degrees, a data migration task is triggered: the data of data batch A is migrated from storage engine 2 to storage engine 3 and the routes are updated, that is, the routes for IDs 1 to 5 are changed to point to storage engine 3. Storage engine 3 stores IDs 1 to 5 together in data file Z, and the characteristic statistics are updated to record the creation time of data file Z as September 4, 2012.
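How long a batch takes to cool to the trigger threshold follows from the configured decay rate. The sketch below assumes linear decay with no intervening accesses and takes the rate as a parameter; the function name is hypothetical.

```python
import math


def periods_until_threshold(temperature: float, threshold: float,
                            decay_per_period: float) -> int:
    """Number of whole decay periods until the temperature is at or below the threshold."""
    if decay_per_period <= 0:
        raise ValueError("decay_per_period must be positive")
    if temperature <= threshold:
        return 0
    return math.ceil((temperature - threshold) / decay_per_period)


hours_at_1_per_hour = periods_until_threshold(100, 20, 1)
days_at_10_per_day = periods_until_threshold(100, 20, 10)
```

A batch at 100 degrees needs 80 periods at 1 degree per period and 8 periods at 10 degrees per period; any access in between would raise the temperature and postpone the trigger.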

At this time, if the user accesses the data, automatic routing is applied: the data with ID 8 is in storage engine 1 (assuming that ID 8 keeps being updated during this period), the data with IDs 6, 7, 9 and 10 is in storage engine 2 (assuming that data batch B keeps being queried during this period), and the data with IDs 1 to 5 is in storage engine 3.

Finally, on September 4, 2022, data file Z reaches the 10-year threshold condition and triggers the data deletion task: it is deleted directly from storage engine 3, and the related routing information and characteristic statistics are deleted as well. If the user then accesses the data with IDs 1 to 5, it no longer exists.
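The 10-year aging check for storage engine 3 can be sketched with calendar dates. The helper names are hypothetical, and the handling of a February 29 creation date is a design choice of this sketch rather than something the application specifies.

```python
from datetime import date


def expiry_date(created: date, years: int) -> date:
    """Return the calendar date `years` years after `created`."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # February 29 with a non-leap target year: fall back to February 28.
        return created.replace(year=created.year + years, day=28)


def is_expired(created: date, today: date, years: int = 10) -> bool:
    """True once the data file has reached the aging threshold."""
    return today >= expiry_date(created, years)


file_z_created = date(2012, 9, 4)
file_z_expires = expiry_date(file_z_created, 10)
```

For data file Z, created on September 4, 2012, the deletion task becomes due on September 4, 2022.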

For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.

Referring to fig. 4, a block diagram of a data management device based on multidimensional features according to an embodiment of the present application is shown;

a data management apparatus based on multidimensional features, the apparatus involving a storage side, a routing side, a management side, and an execution side, the apparatus comprising:

a receiving module 410, configured to receive, by the management end, a data access request sent by the routing end, and generate initial data feature statistics information and a triggering condition corresponding to the storage end according to the data access request;

the first generating module 420 is configured to determine, by the management end, a target storage end according to the trigger condition, generate the task execution request according to the target storage end and the trigger condition, send the task execution request to the execution end, and receive a first processing instruction sent by the execution end;

the first data processing module 430 is configured to update or delete the target data feature statistics corresponding to the target storage according to the first processing instruction by the management end.

In one embodiment of the present application, the receiving module 410 includes:

the first determining submodule is used for determining access data in the storage end by the management end according to the data access request;

The second determining submodule is used for determining data granularity information corresponding to the access data in the storage end according to the access data by the management end;

and the first updating sub-module is used for generating initial data characteristic statistical information corresponding to the storage end and the triggering condition by the management end according to the data granularity information.

In an embodiment of the present application, the first generating module 420 includes:

a third determining submodule, configured for the management end to acquire all the data in the storage end and determine whether there is target data that meets the triggering condition;

the first generation sub-module is used for generating the task execution request according to the target data by the management end when all the data in the storage end have the target data which correspondingly meet the trigger condition;

and the first sending submodule is used for sending the task execution request to the execution end by the management end.
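The three sub-modules above (scan the data, build a request when target data exists, send it) can be sketched as one small function on the management end. The statistics dictionary, the predicate, and the request layout are assumptions made for illustration; sending is left to the caller.

```python
from typing import Callable, Dict, Optional


def build_task_request(stats: Dict[str, float],
                       is_triggered: Callable[[float], bool],
                       task_type: str,
                       source: str,
                       target: Optional[str]) -> Optional[dict]:
    """Return a task execution request for all entries meeting the trigger, or None."""
    keys = sorted(key for key, value in stats.items() if is_triggered(value))
    if not keys:
        return None  # no target data: nothing is sent to the execution end
    return {"task_type": task_type, "source": source, "target": target, "keys": keys}


# Hypothetical statistics for storage engine 2: batch name -> temperature.
batch_temperatures = {"A": 20.0, "B": 200.0}
request = build_task_request(batch_temperatures, lambda t: t <= 20.0,
                             "migrate", "engine2", "engine3")
```

Only batch A meets the trigger here, so the request names that batch alone; when nothing meets the trigger the function returns `None` and no request is generated.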

Referring to fig. 5, a block diagram of a data management device based on multidimensional features according to an embodiment of the present application is shown;

A first determining module 510, configured to, when the executing end receives a task execution request sent by the management end, determine, by the executing end according to the task execution request, a task type corresponding to the task execution request;

the acquiring module 520 is configured to acquire target data from the storage terminal according to a trigger condition in the task execution request by the execution terminal;

the second data processing module 530 is configured to perform data processing on the target data by the execution end according to the task type and the target storage end, where the task type includes migration or deletion;

a third data processing module 540, configured to generate a first processing instruction according to the target data, the target storage end, and the task type by the execution end, and send the first processing instruction to the management end; the first processing instruction is used for updating or deleting the target data characteristic statistical information corresponding to the target storage end at the management end;

a fourth data processing module 550, configured for the execution end to generate a second processing instruction according to the target data, the target storage end, and the task type, and send the second processing instruction to the routing end; the second processing instruction is used for adding or deleting, at the routing end, the target access path formed from the target data to the target storage end.

In one embodiment of the present application, the second data processing module 530 includes:

the first execution sub-module is used for the execution end to migrate the target data from the storage end to the target storage end according to the task type when the task type is the migration;

or,

a second execution sub-module, configured for the execution end to delete the target data from the target storage end according to the task type when the task type is deletion, where the target storage end is the initial storage end corresponding to the target data.

Referring to fig. 6, a block diagram of a data management device based on multidimensional features according to an embodiment of the present application is shown;

a second determining module 610, configured for the routing end to determine an access path in response to data access information for upper layer data;

a second generating module 620, configured for the routing end to generate the data access request according to the access path and send the data access request to the management end;

and a fifth data processing module 630, configured for the routing end to add or delete a target access path from the target storage end according to the second processing instruction sent by the execution end.

Referring to fig. 7, a computer device for the data management method based on multidimensional features according to the present invention is shown, which may specifically include the following:

the computer device 12 described above is embodied in the form of a general purpose computing device, and the components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (commonly referred to as a "hard disk drive"). Although not shown in fig. 7, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk such as a CD-ROM, DVD-ROM, or other optical media may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The memory may include at least one program product having a set (e.g., at least one) of program modules 42, the program modules 42 being configured to carry out the functions of embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, a memory, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules 42, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.

The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, camera, etc.), one or more devices that enable an operator to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet, through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be appreciated that although not shown in fig. 7, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units 16, external disk drive arrays, RAID systems, tape drives, data backup storage systems 34, and the like.

The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing a multi-dimensional feature-based data management method provided by an embodiment of the present application.

That is, when executing the program, the processing unit 16 implements the following: the management end receives the data access request sent by the routing end, and generates initial data characteristic statistical information and a trigger condition corresponding to the storage end according to the data access request; the management end determines a target storage end according to the trigger condition, generates the task execution request according to the target storage end and the trigger condition, sends the task execution request to the execution end, and receives a first processing instruction sent by the execution end; and the management end updates or deletes the target data characteristic statistical information corresponding to the target storage end according to the first processing instruction.

In an embodiment of the present application, the present application further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data management method based on multidimensional features as provided in all embodiments of the present application.

That is, when executed by a processor, the program implements the following: the management end receives the data access request sent by the routing end, and generates initial data characteristic statistical information and a trigger condition corresponding to the storage end according to the data access request; the management end determines a target storage end according to the trigger condition, generates the task execution request according to the target storage end and the trigger condition, sends the task execution request to the execution end, and receives a first processing instruction sent by the execution end; and the management end updates or deletes the target data characteristic statistical information corresponding to the target storage end according to the first processing instruction.

Any combination of one or more computer readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the operator's computer, partly on the operator's computer, as a stand-alone software package, partly on the operator's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the operator's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider). In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.

While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the application.

Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or terminal device comprising the element.

The data management method and device based on multidimensional features provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the above description of the embodiments is only intended to help in understanding the method of the present application and its core idea. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific embodiments and the scope of application; in view of the above, the content of this description should not be construed as limiting the present application.

Claims

1. A data management method based on multidimensional features, characterized in that the method relates to a storage end, a routing end, a management end and an execution end, wherein the multidimensional features at least comprise any two of a data change dimension, a data access frequency dimension and a data query service dimension, the storage end comprises an initial storage end and a plurality of target storage ends, and each storage end has corresponding data characteristic statistical information, the method comprising the following steps:

the management end receives a data access request sent by the routing end, and generates initial data characteristic statistical information and a trigger condition corresponding to the storage end according to the data access request, wherein the initial data characteristic statistical information comprises initial dimension information, and the characteristics of the data are classified and defined through the initial dimension information;

The management end determines a target storage end according to the trigger condition, generates a task execution request according to the target storage end and the trigger condition, sends the task execution request to the execution end, and receives a first processing instruction sent by the execution end, wherein the execution end is used for determining a task type corresponding to the task execution request according to the task execution request and acquiring target data in the storage end according to the trigger condition in the task execution request, the execution end carries out data processing on the target data according to the task type and the target storage end, the task type comprises migration or deletion, and generates a first processing instruction responding to the management end according to the target data, the target storage end and the task type;

2. The method of claim 1, wherein the step of generating initial data characteristic statistics and triggering conditions corresponding to the storage side in accordance with the data access request comprises:

3. The method according to claim 1, wherein the step of generating the task execution request according to the target storage side and the trigger condition, and transmitting the task execution request to the execution side includes:

the management end acquires all data in the storage end and determines whether target data meeting the triggering condition exists or not;

and the management end sends the task execution request to the execution end.

4. A data management method based on multidimensional features, characterized in that the method relates to a storage end, a routing end, a management end and an execution end, wherein the multidimensional features at least comprise any two of a data change dimension, a data access frequency dimension and a data query service dimension, the storage end comprises an initial storage end and a plurality of target storage ends, and each storage end has corresponding data characteristic statistical information, the method comprising the following steps:

When the execution end receives a task execution request sent by the management end, the execution end determines a task type corresponding to the task execution request according to the task execution request, wherein the management end is used for receiving a data access request sent by the routing end and generating initial data characteristic statistical information and triggering conditions corresponding to the storage end according to the data access request, the initial data characteristic statistical information comprises initial dimension information, and the characteristics of data are classified and defined through the initial dimension information; the management end determines a target storage end according to the triggering condition and generates a task execution request according to the target storage end and the triggering condition;

5. The method of claim 4, wherein the step of performing data processing on the target data according to the task type and the target storage end, the task type including migration or deletion, comprises:

or,

6. A data management method based on multidimensional features, characterized in that the method relates to a storage end, a routing end, a management end and an execution end, wherein the multidimensional features at least comprise any two of a data change dimension, a data access frequency dimension and a data query service dimension, the storage end comprises an initial storage end and a plurality of target storage ends, and each storage end has corresponding data characteristic statistical information, the method comprising the following steps:

the routing end generates the data access request according to the access path and sends the data access request to the management end, wherein the management end is used for receiving the data access request sent by the routing end and generating initial data characteristic statistical information and triggering conditions corresponding to the storage end according to the data access request, the initial data characteristic statistical information comprises initial dimension information, and the characteristics of the data are classified and defined through the initial dimension information; the management end determines a target storage end according to the trigger condition, generates a task execution request according to the target storage end and the trigger condition, sends the task execution request to the execution end, and receives a first processing instruction sent by the execution end, wherein the execution end is used for determining a task type corresponding to the task execution request according to the task execution request and acquiring target data in the storage end according to the trigger condition in the task execution request, the execution end carries out data processing on the target data according to the task type and the target storage end, the task type comprises migration or deletion, and generates a first processing instruction responding to the management end according to the target data, the target storage end and the task type;

the routing end adds or deletes a target access path for the target storage end according to a second processing instruction sent by the execution end, wherein the execution end is configured to generate, according to the target data, the target storage end and the task type, a second processing instruction in response to the routing end and to send the second processing instruction to the routing end, the second processing instruction being used to add or delete, in the routing end, the target access path that corresponds to the target data in the target storage end and is formed from the target data to the target storage end.

7. A data management device based on multidimensional features, wherein the device involves a storage end, a routing end, a management end and an execution end; the multidimensional features comprise at least any two of a data change dimension, a data access frequency dimension and a data query service dimension; the storage end comprises an initial storage end and a plurality of target storage ends, each storage end having corresponding data characteristic statistical information; and the device comprises:

a receiving module, configured for the management end to receive the data access request sent by the routing end and to generate, according to the data access request, initial data characteristic statistical information and a trigger condition corresponding to the storage end, the initial data characteristic statistical information comprising initial dimension information by which the characteristics of the data are classified and defined;

a first generation module, configured for the management end to determine a target storage end according to the trigger condition, generate a task execution request according to the target storage end and the trigger condition, send the task execution request to the execution end, and receive a first processing instruction sent by the execution end, wherein the execution end is configured to determine, according to the task execution request, a task type corresponding to the task execution request, and to acquire target data in the storage end according to the trigger condition in the task execution request; the execution end performs data processing on the target data according to the task type and the target storage end, the task type comprising migration or deletion, and generates, according to the target data, the target storage end and the task type, a first processing instruction in response to the management end;

8. A data management device based on multidimensional features, wherein the device involves a storage end, a routing end, a management end and an execution end; the multidimensional features comprise at least any two of a data change dimension, a data access frequency dimension and a data query service dimension; the storage end comprises an initial storage end and a plurality of target storage ends, each storage end having corresponding data characteristic statistical information; and the device comprises:

a first determining module, configured for the execution end to determine, upon receiving the task execution request sent by the management end, a task type corresponding to the task execution request according to the task execution request, wherein the management end is configured to receive a data access request sent by the routing end and to generate, according to the data access request, initial data characteristic statistical information and a trigger condition corresponding to the storage end, the initial data characteristic statistical information comprising initial dimension information by which the characteristics of the data are classified and defined; the management end determines a target storage end according to the trigger condition and generates the task execution request according to the target storage end and the trigger condition;

9. A data management device based on multidimensional features, wherein the device involves a storage end, a routing end, a management end and an execution end; the multidimensional features comprise at least any two of a data change dimension, a data access frequency dimension and a data query service dimension; the storage end comprises an initial storage end and a plurality of target storage ends, each storage end having corresponding data characteristic statistical information; and the device comprises:

a second generation module, configured for the routing end to generate a data access request according to an access path and send the data access request to the management end, wherein the management end is configured to receive the data access request sent by the routing end and to generate, according to the data access request, initial data characteristic statistical information and a trigger condition corresponding to the storage end, the initial data characteristic statistical information comprising initial dimension information by which the characteristics of the data are classified and defined; the management end determines a target storage end according to the trigger condition, generates a task execution request according to the target storage end and the trigger condition, sends the task execution request to the execution end, and receives a first processing instruction sent by the execution end, wherein the execution end is configured to determine, according to the task execution request, a task type corresponding to the task execution request, and to acquire target data in the storage end according to the trigger condition in the task execution request; the execution end performs data processing on the target data according to the task type and the target storage end, the task type comprising migration or deletion, and generates, according to the target data, the target storage end and the task type, a first processing instruction in response to the management end;

and a fifth data processing module, configured for the routing end to add or delete a target access path for the target storage end according to a second processing instruction sent by the execution end, wherein the execution end is configured to generate, according to the target data, the target storage end and the task type, a second processing instruction in response to the routing end and to send the second processing instruction to the routing end, the second processing instruction being used to add or delete the target access path that corresponds to the target data in the target storage end and is formed from the target data to the target storage end.

10. A computer device comprising a processor, a memory and a computer program stored on the memory and capable of running on the processor, which computer program, when executed by the processor, implements the method of any one of claims 1 to 6.
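
The interaction recited in the claims above (routing end, management end, execution end, storage ends) can be illustrated with a minimal Python sketch. It is not the patented implementation. The tier names `hot` and `cold`, the numeric thresholds, the dictionary-based storage, the `days_unchanged` field, and all class and method names are assumptions introduced only for illustration. The sketch uses two of the claimed dimensions (data change and data access frequency) as the trigger condition.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical tier names and thresholds (not taken from the claims).
HOT, COLD = "hot", "cold"
COLD_AFTER_DAYS = 30      # data change dimension
COLD_MAX_ACCESSES = 2     # data access frequency dimension
DELETE_AFTER_DAYS = 365


@dataclass
class Stats:
    accesses: int = 0
    days_unchanged: int = 0


@dataclass
class Task:
    key: str
    source: str
    target: Optional[str]  # None for deletion
    task_type: str         # "migrate" or "delete"


class RoutingEnd:
    def __init__(self) -> None:
        self.paths: Dict[str, str] = {}  # access path: key -> storage end

    def request(self, key: str, days_unchanged: int) -> dict:
        # Build a data access request from the current access path.
        return {"key": key, "source": self.paths[key],
                "days_unchanged": days_unchanged}

    def apply(self, instruction: dict) -> None:
        # Second processing instruction: add or delete an access path.
        if instruction["op"] == "add":
            self.paths[instruction["key"]] = instruction["target"]
        else:
            self.paths.pop(instruction["key"], None)


class ExecutionEnd:
    def __init__(self, storage: Dict[str, dict], routing: RoutingEnd) -> None:
        self.storage, self.routing = storage, routing

    def run(self, task: Task) -> dict:
        data = self.storage[task.source].pop(task.key)  # acquire target data
        if task.task_type == "migrate":
            self.storage[task.target][task.key] = data
            self.routing.apply({"op": "add", "key": task.key, "target": task.target})
        else:  # deletion: the data is dropped and its path removed
            self.routing.apply({"op": "delete", "key": task.key})
        # First processing instruction, returned to the management end.
        return {"key": task.key, "task_type": task.task_type, "target": task.target}


class ManagementEnd:
    def __init__(self, executor: ExecutionEnd) -> None:
        self.executor = executor
        self.stats: Dict[str, Stats] = {}

    def handle(self, request: dict) -> Optional[dict]:
        # Update the per-key statistics, then evaluate the trigger condition.
        s = self.stats.setdefault(request["key"], Stats())
        s.accesses += 1
        s.days_unchanged = request["days_unchanged"]
        task = self._trigger(request["key"], request["source"], s)
        if task is None:
            return None
        result = self.executor.run(task)
        if result["task_type"] == "delete":
            del self.stats[request["key"]]  # forget statistics of deleted data
        return result

    @staticmethod
    def _trigger(key: str, source: str, s: Stats) -> Optional[Task]:
        if s.days_unchanged >= DELETE_AFTER_DAYS:
            return Task(key, source, None, "delete")
        if (source == HOT and s.days_unchanged >= COLD_AFTER_DAYS
                and s.accesses <= COLD_MAX_ACCESSES):
            return Task(key, source, COLD, "migrate")
        return None


storage = {HOT: {"order-1": "payload"}, COLD: {}}
routing = RoutingEnd()
routing.paths["order-1"] = HOT
mgmt = ManagementEnd(ExecutionEnd(storage, routing))

first = mgmt.handle(routing.request("order-1", days_unchanged=45))
print(first["task_type"], routing.paths["order-1"])  # migrate cold
```

In this sketch a migration simply overwrites the old access path with the new one, and a deletion removes both the data and its path. A real system would presumably persist the statistics and run the execution end asynchronously, which the claims leave open.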