CN108304142B - Data management method and device - Google Patents

Publication number
CN108304142B
Authority
CN
China
Prior art keywords
data
storage device
stored
server
files
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711487793.4A
Other languages
Chinese (zh)
Other versions
CN108304142A (en)
Inventor
毕杰山
钟超强
李岱城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201711487793.4A priority Critical patent/CN108304142B/en
Publication of CN108304142A publication Critical patent/CN108304142A/en
Application granted granted Critical
Publication of CN108304142B publication Critical patent/CN108304142B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data management method and device, relating to the field of storage, for reducing the IO redundancy generated during data merging. The scheme is applied to a distributed storage system and includes the following: in the process of determining that a plurality of data stored in a first storage device need to be stored in a second storage device, a server determines that two or more data among the plurality of data satisfy a first preset condition; the server then stores the two or more data, together with the remaining data in the first storage device, in the second storage device to obtain a first data file, where the two or more data are located in a first data set in the second storage device.

Description

Data management method and device
Technical Field
The present application relates to the field of storage, and in particular, to a data management method and apparatus.
Background
A distributed storage system adopts a Key-Value storage mode: data is stored in the Value part, and a mapping relationship between the Key and the Value is constructed. When a client accesses data, it uses the Key as an index and looks up the corresponding Value according to the mapping relationship, so as to access the data stored in the Value part. In addition, data stored in the distributed storage system is usually sorted in the dictionary (lexicographic) order of its Keys. This ensures that data with the same user code is stored contiguously, so that all data of a user code XXXX within a period of time can be retrieved by that user code.
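The effect of key ordering can be illustrated with a minimal Python sketch. The class name `SortedKVStore`, the `user code|date` key layout, and the sample values are assumptions made for illustration only; the patent does not prescribe any particular key format or data structure.

```python
import bisect


class SortedKVStore:
    """Toy in-memory store that keeps keys in dictionary order (illustrative only)."""

    def __init__(self):
        self._keys = []      # keys kept sorted lexicographically
        self._values = {}    # key -> value mapping

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan_prefix(self, prefix):
        """Return (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            result.append((key, self._values[key]))
        return result


store = SortedKVStore()
# Hypothetical key layout: "<user code>|<date>"
store.put("user0002|2017-12-01", "b1")
store.put("user0001|2017-12-02", "a2")
store.put("user0001|2017-12-01", "a1")
rows = store.scan_prefix("user0001|")
```

Because keys that share a user code are adjacent in the sorted order, a single contiguous scan returns all records of that user.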
When data is stored in the distributed storage system, a corresponding user Table can be created according to user requirements. A Table is used to store one type of data: for example, a Table named UserInfo (user information) can store basic user information, and the UserInfo Table can be used as a Key; a Table named Transactions (transaction records) stores transaction record details and can be used as a Value. However, a Table may contain a large amount of data. A common approach in the prior art is therefore to split a Table, according to the dictionary order of the recorded Keys, into a plurality of sub-tables (Regions) for management and maintenance. That is, a Region is a Key interval with a start Key and an end Key; different Keys belong to different Regions, and a Table usually includes one or more Regions.
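Locating the sub-table that owns a given Key can be sketched as a binary search over sorted start Keys. The boundaries `""`, `"g"`, `"p"` and the names `region-1` to `region-3` below are hypothetical; each sub-table is assumed to cover the half-open interval from its start Key up to the next start Key.

```python
import bisect

# Hypothetical sub-tables of one Table, identified by their sorted start Keys.
# An empty start Key means "from the beginning of the key space".
REGION_START_KEYS = ["", "g", "p"]
REGION_NAMES = ["region-1", "region-2", "region-3"]


def locate_region(key):
    """Return the name of the sub-table whose Key interval contains key."""
    # The owning sub-table is the last one whose start Key is <= key.
    idx = bisect.bisect_right(REGION_START_KEYS, key) - 1
    return REGION_NAMES[idx]
```

For example, `locate_region("apple")` falls in the first interval and `locate_region("zebra")` in the last one.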
In the prior art, a client may merge the existing Key-Value list in a row of data into one large KeyValue. Specifically, as shown in FIG. 1, the client first reads the existing Key-Value list from the server, where the list includes at least one Key and the Value corresponding to each Key. As shown in FIG. 2, the client then merges two or more data located in the same row or the same column of different files to form a new Key-Value list, and sends the new Key-Value list and a deletion flag to the server, where the deletion flag indicates that the data that has been merged is to be deleted.
However, reading the existing Key-Value list from the server causes redundant disk read IO, and sending the new Key-Value list and the deletion flag from the client to the server causes redundant disk write IO.
Disclosure of Invention
The application provides a data management method and device, which are used for reducing disk IO redundancy generated in a data merging process.
In order to solve the above problem, in a first aspect, the present application provides a data management method applied to a distributed storage system. The method includes: in the process of determining that a plurality of data stored in a first storage device need to be stored in a second storage device, a server determines that two or more data among the plurality of data satisfy a first preset condition; the server stores the two or more data, together with the remaining data in the first storage device, in the second storage device to obtain a first data file, where the two or more data are located in a first data set in the second storage device.
According to the data management method provided in the present application, in the process of storing a plurality of data stored in a first storage device in a second storage device, if two or more data among the plurality of data satisfy a first preset condition, the two or more data and the remaining data are stored in the second storage device, with the two or more data located in a first data set in the second storage device. Because placing the two or more data into the first data set is performed by the server, disk IO redundancy is reduced compared with the prior art. In addition, because the two or more data are stored in the first data set, they can all be accessed by accessing the first data set; if they were not merged into the first data set, each of them would generally have to be accessed separately. The number of accesses is therefore reduced.
With reference to the first aspect, in a first possible implementation manner of the first aspect, the method provided by the present application further includes: the server determines that two or more first data files in the plurality of first data files in the second storage device meet a second preset condition, and acquires a second data set according to the two or more first data files. The application can reduce the number of the first data files stored in the second storage device by combining a plurality of first data files in the second storage device, thereby improving the access performance.
In addition, further, the server determines that partial data in the two or more first data files meets a second preset condition, and then the server acquires a second data set according to the partial data in the two or more first data files.
With reference to the first aspect or the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the second preset condition includes: two or more first data files belong to the same time period, the types of the two or more first data files are the same, the indexes of the two or more first data files are the same, the time of the two or more first data files is continuous, and the identifications of the two or more first data files are continuous.
With reference to any one of the first aspect to the second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the determining, by the server, that the plurality of data stored in the first storage device need to be stored in the second storage device includes: the server receives a first operation instruction, wherein the first operation instruction is used for indicating that a plurality of data stored in a first storage device are stored in a second storage device in a persistent mode; the server determines that a plurality of data stored in the first storage device need to be stored in the second storage device according to the first operation instruction; or, if the server determines that the plurality of data stored in the first storage device satisfy the data persistence condition, the server determines that the plurality of data need to be stored in the second storage device.
With reference to any one of the first aspect to any one of the third possible implementation manners of the first aspect, in a fourth possible implementation manner of the first aspect, the first preset condition includes any one or more of the following: the time information of the two or more data belongs to the same time period, the data indexes of the two or more data are the same, the types of the two or more data are the same, the indexes of the two or more data belong to the same range, the time of the two or more data is continuous, and the marks of the two or more data are continuous.
With reference to any one of the first aspect to the fourth possible implementation manner of the first aspect, in a fifth possible implementation manner of the first aspect, the data persistence condition includes at least one or more of the following: the size of the data stored in the first storage device is larger than or equal to a first threshold, other data needing to be stored in the first storage device exists, and the storage space of the first storage device is smaller than or equal to a second threshold.
With reference to any one of the first aspect to the fifth possible implementation manner of the first aspect, in a sixth possible implementation manner of the first aspect, the storing, by the server, two or more pieces of data and the remaining data, except for the two or more pieces of data, in the first storage device in a second storage device to obtain a first data file includes: the server side obtains a first data set according to two or more data; the server stores the first data set and the rest data in a second storage device.
In a second aspect, the present application provides a data management apparatus applied to a distributed storage system, including: a determining unit, configured to determine, in the process of determining that a plurality of data stored in a first storage device need to be stored in a second storage device, that two or more data among the plurality of data satisfy a first preset condition;
and a storage unit, configured to store the two or more data, together with the remaining data in the first storage device, in the second storage device to obtain a first data file, where the two or more data are located in a first data set in the second storage device.
With reference to the second aspect, in a first possible implementation manner of the second aspect, the apparatus provided in the present application further includes: an obtaining unit, configured to determine that two or more first data files among the plurality of first data files in the second storage device satisfy a second preset condition, and to obtain a second data set according to the two or more first data files.
With reference to the second aspect or the first possible implementation manner of the second aspect, in a third possible implementation manner of the second aspect, the second preset condition includes: two or more first data files belong to the same time period, the types of the two or more first data files are the same, the indexes of the two or more first data files are the same, the time of the two or more first data files is continuous, and the identifications of the two or more first data files are continuous.
With reference to any one of the second aspect to the third possible implementation manner of the second aspect, in a fourth possible implementation manner of the second aspect, the apparatus provided by the present application further includes: a receiving unit, configured to receive a first operation instruction, where the first operation instruction is used to instruct persistent storage of a plurality of data stored in a first storage device to a second storage device; the determining unit is used for determining that the data stored in the first storage device needs to be stored in the second storage device according to the first operation instruction;
or, the determining unit is configured to determine that the plurality of data stored in the first storage device satisfy a data persistence condition, and the server determines that the plurality of data need to be stored in the second storage device.
With reference to any one of the second aspect to the fourth possible implementation manner of the second aspect, in a fifth possible implementation manner of the second aspect, the first preset condition includes any one or more of the following: the time information of the two or more data belongs to the same time period, the data indexes of the two or more data are the same, the types of the two or more data are the same, the indexes of the two or more data belong to the same range, the time of the two or more data is continuous, and the marks of the two or more data are continuous.
With reference to any one of the second aspect to the fifth possible implementation manner of the second aspect, in a sixth possible implementation manner of the second aspect, the data persistence condition includes at least one or more of the following: the size of the data stored in the first storage device is larger than or equal to a first threshold, other data needing to be stored in the first storage device exists, and the storage space of the first storage device is smaller than or equal to a second threshold.
In a third aspect, the present application provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed, the instructions cause a server to execute the data management method described in the foregoing first aspect to the fifth possible implementation manner of the first aspect.
In a fourth aspect, the present application provides a computer program product containing instructions, where the instructions are stored in the computer program product, and when the instructions are executed, the server executes the data management method described in the first aspect to the fifth possible implementation manner of the first aspect.
In a fifth aspect, the present application provides a chip system, which is applied to a data management device, where the chip system includes at least one processor and an interface circuit, where the interface circuit and the at least one processor are interconnected by a line, and the processor is configured to execute instructions to perform the data management method described in the first aspect to the fifth possible implementation manner of the first aspect.
In a sixth aspect, the present application provides a data management system, including the data management apparatus described in the foregoing second aspect to the fifth possible implementation manner of the second aspect, and a client.
Drawings
FIG. 1 is a first schematic diagram of data merging in the prior art;
FIG. 2 is a second schematic diagram of data merging in the prior art;
FIG. 3a is a schematic structural diagram of a distributed storage system to which a data management method provided in the present application is applied;
FIG. 3b is a first schematic structural diagram of a data management apparatus according to an embodiment of the present invention;
FIG. 4 is a first schematic flowchart of a data management method provided in the present application;
FIG. 5 is a schematic diagram of a server storing data from a first storage device to a second storage device according to the present application;
FIG. 6 is a second schematic flowchart of a data management method provided in the present application;
FIG. 7 is a first schematic diagram of merging data in a second storage device according to the present application;
FIG. 8 is a second schematic diagram of merging data in a second storage device according to the present application;
FIG. 9 is a third schematic diagram of merging data in a second storage device according to the present application;
FIG. 10 is a schematic structural diagram of a data management apparatus according to the present application;
FIG. 11 is a schematic structural diagram of another data management apparatus provided in the present application.
Detailed Description
It is noted that, in the present application, words such as "exemplary" or "for example" are used to mean exemplary, illustrative, or descriptive. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
The term "and/or" in this application is only one kind of association relationship describing the associated object, and means that there may be three kinds of relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in this application generally indicates that the former and latter related objects are in an "or" relationship.
The network architecture and the service scenario described in the embodiment of the present application are for more clearly illustrating the technical solution of the embodiment of the present application, and do not form a limitation on the technical solution provided in the embodiment of the present application, and as a person of ordinary skill in the art knows that along with the evolution of the network architecture and the appearance of a new service scenario, the technical solution provided in the embodiment of the present application is also applicable to similar technical problems.
The term "plurality" in this application means two or more.
The terms "first", "second", and the like in the present application are only for distinguishing different objects, and do not limit the order thereof. For example, the first data and the second data are only for distinguishing different data, and the sequence order thereof is not limited.
Before the present application is described, the system and the relevant terms to which the present application refers are introduced.
FIG. 3a shows a schematic structural diagram of a distributed storage system to which the data management method provided in the present application is applied. The distributed storage system includes a client, a server, and at least one storage device connected to the server (FIG. 3a takes storage device 1, storage device 2, and storage device 3 as an example). It is understood that the distributed storage system in the present application may include more than three storage devices.
The client is used for reading data stored in the server or writing data into the server.
The server is used for, according to a request from the client, writing data into a memory of the server or into a storage device connected to the server, or sending data to the client. The storage device is used for storing the data.
It will be appreciated that at least one of the storage devices may be a magnetic disk.
When the distributed storage system writes data, the data is written to a write-ahead log (WAL) and to the memory of a Region at the same time. The WAL is persisted to disk to ensure data reliability. After the data in the Region memory meets a certain condition, it is persisted (Flush) to disk to form a distributed database data storage file (HFile), and the timestamp range of the data it contains is recorded in the metadata of the HFile.
As the distributed database continuously writes data, Flush operations leave a large number of HFiles on disk, which degrades read performance. Therefore, when a certain condition is satisfied, a merge (Compaction) process is performed to merge multiple HFiles into one HFile.
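The write path described above can be sketched as follows. This is a simplified illustration, not the actual implementation of any distributed database: the class `Region`, its attribute names, and the entry-count flush trigger are all assumptions, and plain Python lists stand in for the on-disk WAL and data storage files.

```python
class Region:
    """Toy write path: append to a log, buffer in memory, flush to a 'file'."""

    def __init__(self, flush_threshold):
        self.wal = []          # stand-in for the on-disk write-ahead log
        self.memstore = {}     # in-memory buffer: key -> (timestamp, value)
        self.hfiles = []       # stand-in for data storage files on disk
        self.flush_threshold = flush_threshold  # assumed trigger: entry count

    def put(self, key, timestamp, value):
        # The write goes to the log and to memory at the same time.
        self.wal.append((key, timestamp, value))
        self.memstore[key] = (timestamp, value)
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.memstore:
            return
        timestamps = [ts for ts, _ in self.memstore.values()]
        hfile = {
            "cells": sorted((k, ts, v) for k, (ts, v) in self.memstore.items()),
            # The timestamp range of the contained data is kept as file metadata.
            "min_ts": min(timestamps),
            "max_ts": max(timestamps),
        }
        self.hfiles.append(hfile)
        self.memstore = {}


region = Region(flush_threshold=3)
region.put("k1", 10, "a")
region.put("k2", 30, "b")
region.put("k3", 20, "c")   # third entry reaches the assumed threshold
```

After the third `put`, the buffer has been flushed into one file whose metadata records the timestamp range 10 to 30.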
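A minimal sketch of merging several flushed files into one is shown below. Each file is modeled as a sorted list of `(key, timestamp, value)` cells. The rule that the newest timestamp wins for a duplicate key is an assumption added for illustration; the text above only states that multiple files are merged into one.

```python
import heapq


def compact(hfiles):
    """Merge several sorted files of (key, timestamp, value) cells into one file.

    Assumption for this sketch: when a key appears more than once,
    the cell with the newest timestamp is kept.
    """
    merged = {}
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    for key, ts, value in heapq.merge(*hfiles):
        if key not in merged or ts >= merged[key][0]:
            merged[key] = (ts, value)
    return [(k, ts, v) for k, (ts, v) in sorted(merged.items())]


hfile_a = [("k1", 1, "a"), ("k3", 1, "c")]
hfile_b = [("k1", 2, "a2"), ("k2", 1, "b")]
compacted = compact([hfile_a, hfile_b])
```

Reading from the single compacted file avoids consulting each of the original files in turn.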
Region is the smallest unit of distributed storage and load balancing.
The data management apparatus in the embodiment of the present invention may be implemented by a controller. FIG. 3b shows a possible structure of the data management apparatus. As shown in FIG. 3b, the data management apparatus 30 includes a memory 511, a processor 512, a system bus 513, a power component 514, an input/output interface 515, a communication component 516, and the like. The memory 511 may be used to store data, software programs, and modules, and mainly includes a program storage area, which may store an operating system, an application program required for at least one function, and the like, and a data storage area, which may store data created through use of the data management apparatus 30, and the like. The processor 512 performs the various functions of the data management apparatus 30 and processes data by running or executing the software programs and/or modules stored in the memory 511 and calling the data stored in the memory 511. The system bus 513 includes an address bus, a data bus, and a control bus, and is used for transmitting data and instructions. The power component 514 provides power to the components of the data management apparatus 30. The input/output interface 515 provides an interface between the processor 512 and peripheral interface modules. The communication component 516 is configured for wired or wireless communication between the data management apparatus 30 and other devices.
As shown in fig. 4, fig. 4 is a schematic flowchart illustrating a data management method provided in the present application, and is applied to a distributed storage system, where the method includes:
S101. In the process of determining that a plurality of data stored in the first storage device need to be stored in the second storage device, the server determines that two or more data among the plurality of data satisfy a first preset condition. (The preset condition in this application may also be referred to as service logic; that is, the two or more data are merged into a first data set according to the service logic.)
Alternatively, the first storage device in this application may be a memory of the server, for example, a Memstore, and the second storage device in this application may be a storage device connected to the server, for example, the storage device 1 shown in fig. 3a, and the storage device 1 may be a magnetic disk.
It is understood that a plurality of data in the present application is stored in the first storage device in the form of Key-Value, and thus two or more data in the second data set may also be stored in the second storage device in the form of Key-Value.
Specifically, the server may determine that the data stored in the first storage device needs to be stored in the second storage device in a plurality of ways, for example, one way is as follows: s1, the server receives a first operation instruction, wherein the first operation instruction is used for instructing to store the data stored in the first storage device to the second storage device in a persistent mode; s2, the server determines that the data stored in the first storage device needs to be stored in the second storage device; in another mode, S3, if the server determines that the data stored in the first storage device satisfies the data persistence condition, the server determines that the data stored in the first storage device needs to be stored in the second storage device.
Optionally, the data persistence condition includes at least one or more of: the size of the data stored in the first storage device is larger than or equal to a first threshold, other data needing to be stored in the first storage device exists, and the storage space of the first storage device is smaller than or equal to a second threshold.
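The data persistence condition can be sketched as a simple predicate. The function name, the parameter names, and the byte units are hypothetical; the conditions are combined with a logical OR here, reading "at least one or more of" as meaning that any single condition is sufficient.

```python
def needs_persisting(stored_bytes, free_bytes, pending_writes,
                     first_threshold, second_threshold):
    """Return True if any of the three persistence conditions holds.

    stored_bytes     -- size of the data currently held in the first storage device
    free_bytes       -- remaining storage space of the first storage device
    pending_writes   -- number of other data items waiting to be stored there
    first_threshold  -- size limit for stored data (first threshold)
    second_threshold -- free-space limit (second threshold)
    """
    return (stored_bytes >= first_threshold
            or pending_writes > 0
            or free_bytes <= second_threshold)
```

For instance, with a first threshold of 128 and a second threshold of 16, a buffer holding 200 units would be persisted even if plenty of space remains.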
Specifically, the first preset condition includes any one or more of the following conditions: the time information of the two or more data belongs to the same time period, the data indexes of the two or more data are the same, the types of the two or more data are the same, and the indexes of the two or more data belong to the same range.
Illustratively, take the first preset condition to be that the data indexes of the two or more data are the same. For example, the two or more data include first data and second data that have the same index (for example, the same Key), where the first data is stored in a first Value portion and the second data is stored in a second Value portion. Specifically, as shown in Table 1:
TABLE 1
        Value1    Value2    Value3    Value4
Key1    Data 1    Data 2    Data 3    Data 4
Key2    Data 5    Data 6    Data 7    Data 8
As can be seen from Table 1, data 1, data 2, data 3, and data 4 are stored in different Values of the first storage device but have the same Key1. The server can therefore generate the first data set by storing data 1, data 2, data 3, and data 4 in the second storage device, each together with its corresponding Key1.
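Grouping the cells of Table 1 by their shared Key can be sketched as follows. The tuple layout `(key, column, data)` and the function name `group_by_key` are assumptions for illustration.

```python
from collections import defaultdict

# Cells of Table 1 as (key, column, data) tuples; this layout is an assumption.
cells = [
    ("Key1", "Value1", "Data 1"), ("Key1", "Value2", "Data 2"),
    ("Key1", "Value3", "Data 3"), ("Key1", "Value4", "Data 4"),
    ("Key2", "Value1", "Data 5"), ("Key2", "Value2", "Data 6"),
    ("Key2", "Value3", "Data 7"), ("Key2", "Value4", "Data 8"),
]


def group_by_key(cells):
    """Collect all cells that share a Key into one data set per Key."""
    groups = defaultdict(dict)
    for key, column, data in cells:
        # Each item stays individually addressable inside its data set.
        groups[key][column] = data
    return dict(groups)


data_sets = group_by_key(cells)
```

Here `data_sets["Key1"]` holds data 1 to data 4, so one lookup by Key1 reaches all four items.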
For example, take the first preset condition to be that the time information of the two or more data belongs to the same time period. Suppose that, in practice, one piece of weather information is collected every minute between 13:00 and 14:00 and the server stores each piece in the first storage device, so that sixty pieces of weather information for 13:00 to 14:00 exist in the first storage device. When the client needs the weather information for 13:00 to 14:00, it has to locate it through the index corresponding to each of the sixty pieces, and therefore has to access the first storage device many times. In the present application, when the data persistence condition is met, the server can generate a first data set from the sixty pieces of weather information collected between 13:00 and 14:00 and assign an identifier or index to the first data set, so that the client can access all sixty pieces through the identifier or index of the first data set.
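The weather example can be sketched by bucketing readings by the hour in which they were collected. The calendar date, the `weather-<i>` payloads, and the choice of the hour start as the identifier of each data set are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_by_hour(readings):
    """Group (timestamp, reading) pairs into one data set per hour."""
    buckets = defaultdict(list)
    for ts, reading in readings:
        # The start of the hour serves as the identifier of the data set.
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append((ts, reading))
    return dict(buckets)


# Sixty per-minute readings from 13:00 to 13:59 on a hypothetical date.
start = datetime(2017, 12, 29, 13, 0)
readings = [(start + timedelta(minutes=i), f"weather-{i}") for i in range(60)]
data_sets = bucket_by_hour(readings)
```

A single lookup with the key `start` then returns all sixty readings instead of sixty separate index lookups.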
S102, the server stores two or more data and the rest data except the two or more data in the first storage device into the second storage device to obtain a first data file, wherein the two or more data are located in a first data set in the second storage device.
Specifically, the server may merge the two or more data to generate the first data set before storing them in the second storage device, that is, the merging is performed on the first storage device side. Alternatively, the server may merge the two or more data to generate the first data set in the process of storing them in the second storage device, or may first store the two or more data in the second storage device and then merge them to generate the first data set. This is not limited in the present application, as long as the two or more data finally stored in the second storage device belong to the first data set.
Specifically, step S102 in the present application can be implemented by: the server acquires a first data set according to two or more data, and stores the first data set and the rest data except the two or more data in the first storage device into the second storage device to obtain a first data file.
Specifically, in the present application, the server determines that two or more data satisfy the first preset condition, which may also be understood as that the server merges two or more data according to the first service logic to generate the first data set.
It should be noted that, in the present application, although two or more data are merged to generate the first data set, the two or more data still exist independently in the first data set. That is, merging refers to establishing a relationship between two or more mutually independent data so that the two or more data belong to the first data set.
For example, the server may store the first data set and the rest of the data in the first storage device except for two or more data in the form of the distributed database data storage file HFile in the second storage device to obtain the first data file.
Illustratively, as shown in fig. 5, in the process of storing data 1, data 2 and data 3 in the second storage device, data 1, data 2 and data 3 may be merged to obtain merged data X1 (where data X1 includes data 1, data 2 and data 3), and data X1 and the plurality of data M in the same row as data 1, data 2 and data 3 are stored into the second storage device. Similarly, in the process of storing data 4, data 5, data 6 and data 7 in the second storage device, data 4, data 5, data 6 and data 7 may be merged to obtain merged data X2, and data X2 and the plurality of data M in the same row as data 4, data 5, data 6 and data 7 are stored into the second storage device, so as to obtain a distributed database data storage file HFile.
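The persistence step of fig. 5 can be sketched in a few lines. The dictionary layout of the resulting file, the function name `persist`, and the sample values are assumptions made for illustration; they do not describe the actual HFile format.

```python
def persist(first_storage, groups):
    """Build a 'first data file' from in-memory entries.

    first_storage: dict mapping a data name to its value.
    groups: dict mapping a set name (e.g. "X1") to the names it merges.
    """
    grouped = {name for names in groups.values() for name in names}
    return {
        # Each merged set keeps its members as independent entries.
        "sets": {set_name: {n: first_storage[n] for n in names}
                 for set_name, names in groups.items()},
        # Everything not selected for merging is stored unchanged.
        "rest": {n: v for n, v in first_storage.items() if n not in grouped},
    }

memory = {f"data{i}": f"value{i}" for i in range(1, 8)}
memory["dataM"] = "valueM"
hfile = persist(memory, {"X1": ["data1", "data2", "data3"],
                         "X2": ["data4", "data5", "data6", "data7"]})
```

The members of X1 and X2 remain separately addressable inside their sets, which matches the statement that merged data continue to exist independently.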
It should be noted that, when two or more data stored in the first storage device are located in different rows or different columns and satisfy the first preset condition, the present application can still merge them, so that data located in different rows or different columns can be merged. In the prior art, by contrast, only data located in the same row or in the same column can be merged.
For example, in the present application, the storage space of the first storage device may be organized in a Key-Value storage manner. Taking the row as the Key, with the stored data as the Value, as an example, the pieces of weather information collected every minute between 13:00 and 14:00 are stored in a first row of the first storage device, and the pieces of weather information collected every minute between 14:00 and 15:00 are stored in a second row of the first storage device. When the weather information between 13:00 and 15:00 needs to be accessed, a first data set may be generated from the weather information between 13:00 and 15:00, and an identifier may be assigned to that first data set, so that the weather information between 13:00 and 15:00 can be acquired through the identifier.
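A cross-row merge of this kind might look like the sketch below. The row keys `row1` and `row2`, the identifier `weather-13-15`, and the function `merge_rows` are hypothetical names introduced only for this illustration.

```python
def merge_rows(store, row_keys, set_id):
    """Create one data set that references entries from several rows."""
    members = [(row, item) for row in row_keys for item in store[row]]
    return {"id": set_id, "members": members}

store = {
    "row1": [f"13:{m:02d}" for m in range(60)],  # readings from 13:00 to 13:59
    "row2": [f"14:{m:02d}" for m in range(60)],  # readings from 14:00 to 14:59
}
combined = merge_rows(store, ["row1", "row2"], set_id="weather-13-15")
```

Each member keeps its originating row, so the relationship is established without the entries losing their independence.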
It should be noted that, when data stored in the first storage device carries a deletion flag, the server may delete that data in the process of storing the plurality of data from the first storage device into the second storage device; that is, data carrying the deletion flag is not stored in the second storage device.
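As a minimal sketch of this filtering, assume each in-memory entry is a dictionary with an optional boolean `deleted` field; both the field name and the function name are assumptions.

```python
def flush_live_entries(entries):
    """Keep only entries that do not carry a deletion flag."""
    return [e for e in entries if not e.get("deleted", False)]

in_memory = [
    {"key": "a", "value": 1},
    {"key": "b", "value": 2, "deleted": True},  # flagged, so never persisted
    {"key": "c", "value": 3},
]
persisted = flush_live_entries(in_memory)
```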
Of course, it should be noted that when two or more data are merged into the first data set, the server may not only allocate a first identifier to the first data set, so that the client may access multiple data in the first data set according to the first identifier, but also may allocate a second identifier to each data in the two or more data, so that the client may further obtain the data indicated by the second identifier from the first data set according to the second identifier.
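The two levels of identifiers can be pictured with the following sketch. The class `DataSet`, the identifier `set-13h`, and the choice of list positions as second identifiers are all assumptions made for illustration.

```python
class DataSet:
    """A merged set addressed by a first identifier, with per-item second identifiers."""

    def __init__(self, first_id, values):
        self.first_id = first_id
        # Second identifiers are simple positions here (an assumption).
        self.members = dict(enumerate(values))

    def read_all(self):
        """Return every item in the set with a single access."""
        return list(self.members.values())

    def read_one(self, second_id):
        """Return only the item indicated by the second identifier."""
        return self.members[second_id]

catalog = {}
ds = DataSet("set-13h", ["sunny", "cloudy", "rain"])
catalog[ds.first_id] = ds
```

A client holding only the first identifier reads the whole set through `read_all`, while a client that also holds a second identifier can narrow the read to one item with `read_one`.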
The application provides a data management method. In the process of storing a plurality of data stored in a first storage device in a second storage device, if two or more data among the plurality of data in the first storage device meet a first preset condition, the two or more data and the remaining data are stored in the second storage device, and the two or more data are located in a first data set in the second storage device. Because the server places the two or more data in the first data set as part of the process of storing them in the second storage device, disk IO redundancy can be reduced compared with the prior art. In addition, because the two or more data are stored in the first data set, they can be accessed by accessing the first data set when needed, whereas data that has not been merged into a first data set generally has to be accessed item by item; the number of accesses can therefore be reduced.
Optionally, as shown in fig. 6, the method provided by the present application further includes:
S103, the server determines that two or more first data files in the plurality of first data files in the second storage device meet a second preset condition, and the server acquires a second data set according to the two or more first data files.
Specifically, step S103 may be implemented as follows: when the server determines that the plurality of first data files in the second storage device need to be merged, the server determines that two or more first data files among the plurality of first data files in the second storage device meet a second preset condition, and acquires a second data set according to the two or more first data files.
Specifically, the server may determine that a plurality of first data files in the second storage device need to be merged in either of the following manners. In the first manner: S4, the server receives a second operation instruction, where the second operation instruction is used to instruct the server to merge the plurality of first data files in the second storage device; and S5, the server determines, according to the second operation instruction, to merge the plurality of first data files in the second storage device. In the second manner, if the server determines that the number of first data files in the second storage device is greater than or equal to the third threshold, the server determines that the plurality of first data files in the second storage device need to be merged. In the present application, merging the plurality of data files in the second storage device reduces the number of files in the second storage device and thereby improves read performance.
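The two triggers can be combined into one check, sketched below. The function name `needs_merge` and its parameter names are assumptions; only the "greater than or equal to the third threshold" comparison and the explicit instruction come from the description.

```python
def needs_merge(file_count, third_threshold, instruction_received=False):
    """Decide whether the first data files should be merged.

    Merging is triggered either by an explicit second operation instruction
    or by the file count reaching the third threshold.
    """
    return instruction_received or file_count >= third_threshold
```

For instance, with a third threshold of 5, four files do not trigger a merge on their own, but they do once a second operation instruction has been received.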
Wherein the second preset condition comprises: two or more first data files belong to the same time period, the types of the two or more first data files are the same, the indexes of the two or more first data files are the same, the time of the two or more first data files is continuous, and the identifications of the two or more first data files are continuous.
Optionally, the identifications of two or more first data files being continuous may include any one of the following: the storage addresses of the two or more first data files are consecutive, or the file numbers of the two or more first data files are consecutive.
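One way to test the "file numbers are consecutive" variant is sketched here, assuming file numbers are plain integers; the function name `are_consecutive` is hypothetical.

```python
def are_consecutive(file_numbers):
    """Return True if two or more file numbers form an unbroken run."""
    ordered = sorted(file_numbers)
    return len(ordered) >= 2 and all(
        later - earlier == 1 for earlier, later in zip(ordered, ordered[1:])
    )
```

Under this check, files numbered 3, 4 and 5 qualify, whereas files numbered 3 and 5 do not because file 4 is missing from the run.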
For example, taking the first data file as an HFile stored on the disk as an example, as shown in fig. 7, when two or more HFiles satisfy the second preset condition, the server may merge the two or more HFiles according to the preset service logic to generate the second data set.
Illustratively, as shown in fig. 8, when the data file 1 (including Key Value CoIA, Key Value CoIB, and Key Value CoIC) and the data file 2 (including Key Value CoID and Key Value CoIE) satisfy a second preset condition, the server may merge the data file 1 and the data file 2 to generate a second data set, where the second data set includes Key Value CoIA, Key Value CoIB, Key Value CoIC, Key Value CoID, and Key Value CoIE.
Optionally, the server determines that partial data in the two or more first data files satisfies a second preset condition, and then the server acquires a second data set according to the partial data in the two or more first data files.
Specifically, when the server determines that at least one piece of data included in data file 1 and at least one piece of data included in data file 2 satisfy the second preset condition, the server may merge the at least one piece of data included in data file 1 and the at least one piece of data included in data file 2 to generate the second data set. For example, as shown in fig. 9, when the server determines that the Key Value CoIC included in data file 1 and the Key Value CoID and the Key Value CoIE included in data file 2 satisfy the second preset condition, the server may merge the Key Value CoIC, the Key Value CoID and the Key Value CoIE to generate the second data set, for example, the Key Value CoIF in fig. 9.
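A partial merge across two data files can be sketched as follows. The keys `A` to `E`, the numeric values, and the sample condition `value >= 30` are placeholders chosen for this illustration; the application does not specify them.

```python
def merge_partial(files, condition):
    """Collect the entries from several files that satisfy the condition."""
    selected = {}
    for data_file in files:
        for key, value in data_file.items():
            if condition(key, value):
                selected[key] = value
    return selected

file1 = {"A": 1, "B": 2, "C": 30}
file2 = {"D": 40, "E": 50}
# Hypothetical second preset condition: the value is at least 30.
second_set = merge_partial([file1, file2], lambda key, value: value >= 30)
```

Only the entry C from the first file and the entries D and E from the second file end up in the resulting set, mirroring the situation described for fig. 9.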
Specifically, in one possible implementation manner, step S103 may be implemented by:
S1031, when the server determines that the plurality of first data files in the second storage device meet a second preset condition, the server acquires a second data set according to two or more first data files, among the plurality of first data files, that fall within a preset time period before the current time.
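Selecting the files that fall within a preset time period before the current time might be sketched like this. The file names, the creation times, and the one-hour window are assumptions used only to make the example concrete.

```python
from datetime import datetime, timedelta

def files_in_window(files, now, window):
    """Return the names of files created within `window` before `now`."""
    earliest = now - window
    return [name for name, created in files if earliest <= created <= now]

now = datetime(2017, 12, 29, 15, 0)
files = [
    ("f1", datetime(2017, 12, 29, 12, 30)),  # older than the window
    ("f2", datetime(2017, 12, 29, 14, 10)),
    ("f3", datetime(2017, 12, 29, 14, 50)),
]
recent = files_in_window(files, now, timedelta(hours=1))
```

Restricting the merge to recent files in this way bounds how much data a single merge has to read and rewrite.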
It should be noted that, when a first data file in the present application contains data with a deletion marker, the data with the deletion marker is deleted when the server merges two or more first data files; that is, the finally generated second data set does not include the data with the deletion marker.
The above description has mainly described the scheme provided in the present application from the perspective of a data management apparatus. It is to be understood that, in order to realize the above-described functions, the data management apparatus and the like include hardware structures and/or software modules corresponding to the respective functions. Those of skill in the art will readily appreciate that the exemplary data management apparatus and method steps described in connection with the embodiments disclosed herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present invention, the data management apparatus and the like may be divided into functional modules according to the above method, for example, each functional module may be divided for each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, the division of the modules in the embodiment of the present invention is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
In the case of dividing each functional module by corresponding functions, fig. 10 shows a schematic diagram of a possible structure of the data management apparatus according to the foregoing embodiment. The data management apparatus 30 includes a determination unit 101 and a storage unit 102. The determination unit 101 is configured to support the data management apparatus 30 in performing steps S101, S2, S3 and S5 in the above-described embodiment; the storage unit 102 is configured to support the data management apparatus 30 in performing step S102 in the foregoing embodiment. In addition, the data management apparatus 30 provided by the present application further includes a receiving unit 103 and an obtaining unit 104, where the receiving unit 103 is configured to support the data management apparatus 30 in performing steps S4 and S1 in the above embodiments, and the obtaining unit 104 is configured to support the data management apparatus 30 in performing steps S103 and S1031 in the above-described embodiment. For all relevant content of each step in the above method embodiment, reference may be made to the functional description of the corresponding functional module; details are not described herein again.
In the case of using an integrated unit, fig. 11 shows a schematic diagram of a possible logical structure of the data management apparatus 30 in the above embodiment. The data management apparatus 30 includes a processing module 312 and a communication module 313. The processing module 312 is used for controlling and managing actions of the data management apparatus 30; for example, the processing module 312 is used for supporting operations of data or signaling processing on the side of the data management apparatus 30, such as steps S101, S102, S2, S3 and S5. The communication module 313 is used for supporting operations of data reception and transmission on the side of the data management apparatus 30, such as steps S4 and S1, and/or other processes for the techniques described herein. The data management apparatus 30 may further include a storage module 311 for storing program codes and data of the data management apparatus 30.
The processing module 312 may be a processor or controller, such as a central processing unit, a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor may also be a combination of computing functions, e.g., a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The communication module 313 may be a transceiver, a transceiving circuit, or the like. The storage module 311 may be a memory.
When the processing module 312 is a processor, the communication module 313 is a transceiver, and the storage module 311 is a memory, the apparatus for data management according to the embodiment of the present invention may be the device shown in fig. 3 b.
In one aspect, an embodiment of the present invention provides a computer-readable storage medium, in which instructions are stored, which, when run on a data management apparatus, cause the data management apparatus to perform steps S101, S102, S2, S3, S5, S1, and S4 in the above-described embodiments.
The application provides a computer program product comprising instructions stored therein, which when executed, cause a server to perform steps S101, S102, S2, S3 and S5, S1 and S4.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using a software program, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
Through the above description of the embodiments, it is clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions. For the specific working processes of the system, the apparatus and the unit described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described here again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: flash memory, removable hard drive, read only memory, random access memory, magnetic or optical disk, and the like.
The above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A data management method is applied to a distributed storage system, and comprises the following steps:
the method comprises the steps that in the process that a server side determines that a plurality of data stored in a first storage device need to be stored in a second storage device, the server side determines that at least two data in the plurality of data meet a first preset condition, the server side stores the at least two data and the rest data except the at least two data in the first storage device in the second storage device to obtain a first data file, wherein the at least two data are merged and stored in the second storage device and are located in a first data set; the first preset condition comprises any one or more of the following conditions: the time information of the at least two data belongs to the same time period, the types of the at least two data are the same, and the indexes of the at least two data belong to the same range;
the merged storing means that at least two data are independent data and a relationship is established so that the at least two data belong to the first data set.
2. The method of claim 1, further comprising:
the server determines that at least two first data files in the plurality of first data files in the second storage device meet a second preset condition, and the server acquires a second data set according to the at least two first data files.
3. The method of claim 2, further comprising: and the server side determines that partial data in the at least two first data files meet a second preset condition, and then the server side acquires a second data set according to the partial data in the at least two first data files.
4. The method according to any one of claims 1 to 3, wherein the server determines that a plurality of data stored in the first storage device needs to be stored in the second storage device, including:
the server receives a first operation instruction, wherein the first operation instruction is used for indicating that a plurality of data stored in a first storage device are stored in a second storage device in a persistent mode;
the server determines that a plurality of data stored in the first storage device need to be stored in the second storage device according to the first operation instruction;
or, if the server determines that the plurality of data stored in the first storage device satisfy the data persistence condition, the server determines that the plurality of data need to be stored in the second storage device.
5. The method according to any one of claims 1 to 3, wherein the server stores the at least two data and the rest of the data in the first storage device except the at least two data in the second storage device to obtain a first data file, and comprises:
the server side acquires a first data set according to the at least two data;
the server stores the first data set and the rest data in the second storage device.
6. The method according to claim 4, wherein the server stores the at least two data and the rest of the data in the first storage device except the at least two data in the second storage device to obtain a first data file, and comprises:
the server side acquires a first data set according to the at least two data;
the server stores the first data set and the rest data in the second storage device.
7. A data management apparatus, applied to a distributed storage system, the apparatus comprising:
the server side is used for determining that at least two data in the plurality of data meet a first preset condition in the process of determining that the plurality of data stored in the first storage device need to be stored in the second storage device; the first preset condition comprises any one or more of the following conditions: the time information of the at least two data belongs to the same time period, the types of the at least two data are the same, and the indexes of the at least two data belong to the same range;
a storage unit, configured to store the at least two pieces of data and the remaining data in the first storage device except the at least two pieces of data in the second storage device to obtain a first data file, where the at least two pieces of data are merged and stored in the second storage device and are located in a first data set; the merged storing means that at least two data are independent data and a relationship is established so that the at least two data belong to the first data set.
8. The apparatus of claim 7, further comprising:
the obtaining unit is further configured to determine that at least two first data files in the plurality of first data files in the second storage device satisfy a second preset condition, and the server obtains a second data set according to the at least two first data files.
9. The apparatus according to claim 8, wherein the obtaining unit is further configured to determine that partial data in the at least two first data files satisfies a second preset condition, and the server obtains a second data set according to the partial data in the at least two first data files.
10. The apparatus according to any one of claims 7-9, further comprising:
a receiving unit, configured to receive a first operation instruction, where the first operation instruction is used to instruct persistent storage of a plurality of data stored in a first storage device to a second storage device;
the determining unit is used for determining that a plurality of data stored in the first storage device need to be stored in the second storage device according to the first operation instruction;
or, the determining unit is configured to determine that the plurality of data stored in the first storage device satisfy a data persistence condition, and the server determines that the plurality of data need to be stored in the second storage device.
11. The apparatus according to any one of claims 7-9, further comprising: an obtaining unit, configured to obtain a first data set according to the at least two data sets, where the storing unit is specifically configured to store the first data set and the remaining data in the second storage device.
12. The apparatus of claim 10, further comprising: an obtaining unit, configured to obtain a first data set according to the at least two data sets, where the storing unit is specifically configured to store the first data set and the remaining data in the second storage device.
13. A computer-readable storage medium having stored therein instructions that, when executed, cause the data management method of any of claims 1-6 to be performed.
CN201711487793.4A 2017-12-29 2017-12-29 Data management method and device Active CN108304142B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711487793.4A CN108304142B (en) 2017-12-29 2017-12-29 Data management method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711487793.4A CN108304142B (en) 2017-12-29 2017-12-29 Data management method and device

Publications (2)

Publication Number Publication Date
CN108304142A CN108304142A (en) 2018-07-20
CN108304142B true CN108304142B (en) 2021-10-15

Family

ID=62868328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711487793.4A Active CN108304142B (en) 2017-12-29 2017-12-29 Data management method and device

Country Status (1)

Country Link
CN (1) CN108304142B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111190908B (en) * 2018-11-15 2023-09-22 华为技术有限公司 Data management method, device and system
CN109542352B (en) * 2018-11-22 2020-05-08 北京百度网讯科技有限公司 Method and apparatus for storing data
CN109947733A (en) * 2019-03-29 2019-06-28 众安信息技术服务有限公司 Data storage device and method
CN112286948B (en) * 2020-11-18 2023-05-23 成都佳华物链云科技有限公司 Data storage method, data reading method and data storage device of time sequence database

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793493A (en) * 2014-01-21 2014-05-14 深圳市元征科技股份有限公司 Method and system for processing car-mounted terminal mass data
CN105430078A (en) * 2015-11-17 2016-03-23 浪潮(北京)电子信息产业有限公司 Distributed storage method of mass data

Also Published As

Publication number Publication date
CN108304142A (en) 2018-07-20

Similar Documents

Publication Publication Date Title
CN108304142B (en) Data management method and device
US9213731B2 (en) Determining whether to relocate data to a different tier in a multi-tier storage system
US8578096B2 (en) Policy for storing data objects in a multi-tier storage system
JP4733461B2 (en) Computer system, management computer, and logical storage area management method
CN108628541B (en) File storage method, device and storage system
CN110168532B (en) Data updating method and storage device
CN103902623A (en) Method and system for accessing files on a storage system
CN109804359A (en) For the system and method by write back data to storage equipment
CN103491152A (en) Metadata obtaining method, device and system in distributed file system
US11226778B2 (en) Method, apparatus and computer program product for managing metadata migration
CN115840731A (en) File processing method, computing device and computer storage medium
CN111857574A (en) Write request data compression method, system, terminal and storage medium
CN107526533B (en) Storage management method and equipment
CN105653539A (en) Index distributed storage implement method and device
CN106156038B (en) Date storage method and device
CN104969167A (en) Control device and control method
US20110307525A1 (en) Virtual storage device
CN115079936A (en) Data writing method and device
CN101908007A (en) Storage system and computer system
CN113032349A (en) Data storage method and device, electronic equipment and computer readable medium
CN111857556A (en) Method, apparatus and computer program product for managing metadata of storage objects
US11360687B2 (en) Method of processing a input-output request, an electronic device, and a computer program product
US11314430B2 (en) Reading data in sub-blocks using data state information
CN113849482A (en) Data migration method and device and electronic equipment
CN108959517B (en) File management method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200422

Address after: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant after: HUAWEI TECHNOLOGIES Co.,Ltd.

Address before: 301, A building, room 3, building 301, foreshore Road, No. 310052, Binjiang District, Zhejiang, Hangzhou

Applicant before: Huawei Technologies Co.,Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220214

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Patentee after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.