CN108304142A - Data management method and device - Google Patents

Info

Publication number
CN108304142A
Authority
CN
China
Prior art keywords
data
storage device
server
storage
stored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711487793.4A
Other languages
Chinese (zh)
Other versions
CN108304142B (en)
Inventor
毕杰山
钟超强
李岱城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Hangzhou Huawei Digital Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Huawei Digital Technologies Co Ltd
Priority to CN201711487793.4A
Publication of CN108304142A
Application granted
Publication of CN108304142B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides a data management method and device, relating to the field of storage, for reducing the IO redundancy generated during data merging. The solution is applied in a distributed storage system and includes: in the process in which a server determines that multiple data stored in a first storage device need to be stored to a second storage device, the server determines that two or more of the multiple data meet a first preset condition; the server stores the two or more data, together with the remaining data in the first storage device other than the two or more data, to the second storage device to obtain a first data file, where the two or more data are located in a first data set in the second storage device.

Description

Data management method and device
Technical field
This application relates to the field of storage, and in particular to a data management method and device.
Background
Distributed storage systems use a key-value (Key-Value) storage model: the data is placed in the Value part, and a mapping between Key and Value is then built. When a client accesses the data, it can use the Key as an index to look up the corresponding Value according to the mapping between Key and Value, and thereby access the data stored in the Value part. In addition, when data is saved to the distributed storage system, it is typically sorted naturally in the lexicographic order of the Key. This ensures that data with the same user code is stored adjacently, so that all data of a user code XXXX within a period of time can be obtained according to that user code.
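The effect of keeping Keys in lexicographic order can be illustrated with a minimal sketch. The class name `SortedKVStore`, the `user code|date` key layout, and the sample values are assumptions made for illustration only; they are not taken from the application.

```python
import bisect


class SortedKVStore:
    """Toy key-value store that keeps its keys in lexicographic order."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value mapping

    def put(self, key, value):
        # Insert the key at its sorted position the first time it is seen.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        # Use the Key as an index to find the corresponding Value.
        return self._values[key]

    def scan_prefix(self, prefix):
        """Return (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            result.append((key, self._values[key]))
        return result


# Hypothetical key layout: "<user code>|<date>".
store = SortedKVStore()
store.put("user0002|2017-12-01", "b")
store.put("user0001|2017-12-02", "a2")
store.put("user0001|2017-12-01", "a1")
rows = store.scan_prefix("user0001|")
```

Because records with the same user code sit next to each other, the scan stops as soon as it meets a key with a different prefix.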
When data is stored in a distributed storage system, a corresponding user table (Table) can be created according to user requirements, and each Table is used to store one kind of data. For example, a Table named user information (UserInfo) can be used to store basic user information, and this Table can be used as the Key; a Table named transaction records (Transactions) can be used to store transaction record details, and the Transactions Table can be used as the Value. A single Table may contain a massive amount of data. A common approach in the prior art is therefore to split a Table, in the lexicographic order of the Keys of its records, into multiple sub-tables (Regions) for management and maintenance. That is, a Region is a Key range with a start Key and an end Key; different Keys belong to different Regions, and a Table generally includes one or more Regions.
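As a rough illustration of how a Key is mapped to the Region whose Key range contains it, the following sketch uses a binary search over the Regions' start Keys. The `Region` class, the half-open range convention, and the sample boundaries are assumptions for illustration; the application does not specify them.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    start_key: str            # inclusive start of the Key range
    end_key: Optional[str]    # exclusive end; None means "no upper bound"


def find_region(regions, key):
    """Return the Region containing key; regions must be sorted by start_key."""
    starts = [region.start_key for region in regions]
    idx = bisect.bisect_right(starts, key) - 1
    if idx < 0:
        raise KeyError(key)
    region = regions[idx]
    if region.end_key is not None and key >= region.end_key:
        raise KeyError(key)
    return region


# A hypothetical Table split into three Regions.
regions = [Region("", "g"), Region("g", "p"), Region("p", None)]
region_for_key1 = find_region(regions, "key1")
```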
In the prior art, a client can merge the existing Key-Value lists in a data row into one large KeyValue. Specifically, as shown in Fig. 1, the client first needs to read the existing Key-Value list from the server; the list includes at least one Key and the Value corresponding to each of the at least one Key. As shown in Fig. 2, the client merges two or more data located in the same row or the same column of different files to form a new Key-Value list, and sends the new Key-Value list and a delete marker to the server, where the delete marker indicates that the data that have been merged are to be deleted.
However, reading the existing Key-Value list from the server causes redundant disk read IO, and sending the new Key-Value list and the delete marker to the server causes redundant disk write IO.
Summary of the invention
The present application provides a data management method and device, to reduce the disk IO redundancy generated during data merging.
To solve the above problem, according to a first aspect, the application provides a data management method applied in a distributed storage system. The method includes: in the process in which a server determines that multiple data stored in a first storage device need to be stored to a second storage device, the server determines that two or more of the multiple data meet a first preset condition; the server stores the two or more data, together with the remaining data in the first storage device other than the two or more data, in the second storage device to obtain a first data file, where the two or more data are located in a first data set in the second storage device.
In the data management method provided by the application, in the process of storing multiple data held in the first storage device to the second storage device, if two or more of the multiple data in the first storage device meet the first preset condition, the two or more data and the remaining data are stored in the second storage device, with the two or more data located in a first data set in the second storage device. Because the process of storing the two or more data as the first data set in the second storage device is performed by the server, disk IO redundancy can be reduced compared with the prior art. In addition, because the two or more data are stored as the first data set, when they need to be accessed, the data in the first data set can be reached by accessing the first data set. When two or more data are not merged into a first data set, each of them usually has to be accessed separately; the application can therefore reduce the number of accesses.
With reference to the first aspect, in a first possible implementation of the first aspect, the method provided by the application further includes: the server determines that, among multiple first data files in the second storage device, two or more first data files meet a second preset condition, and the server obtains a second data set according to the two or more first data files. By merging multiple first data files in the second storage device, the application can reduce the number of first data files stored in the second storage device and thereby improve access performance.
Further, if the server determines that part of the data in the two or more first data files meets the second preset condition, the server obtains the second data set according to that part of the data in the two or more first data files.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the second preset condition includes: the two or more first data files belong to the same time period; the types of the two or more first data files are the same; the indexes of the two or more first data files are the same; the times of the two or more first data files are continuous; the identifiers of the two or more first data files are continuous.
With reference to any one of the first aspect to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the server determining that the multiple data stored in the first storage device need to be stored to the second storage device includes: the server receives a first operation instruction, where the first operation instruction instructs that the multiple data stored in the first storage device be persistently stored to the second storage device, and the server determines, according to the first operation instruction, that the multiple data stored in the first storage device need to be stored to the second storage device; or, the server determines that the multiple data stored in the first storage device meet a data persistence condition, and then determines that the multiple data need to be stored to the second storage device.
With reference to any one of the first aspect to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the first preset condition includes any one or more of the following: the time information of the two or more data belongs to the same time period; the data indexes of the two or more data are the same; the types of the two or more data are the same; the indexes of the two or more data belong to the same range; the times of the two or more data are continuous; the identifiers of the two or more data are continuous.
With reference to any one of the first aspect to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the data persistence condition includes at least one of the following: the size of the data stored in the first storage device is greater than or equal to a first threshold; there are other data that need to be stored to the first storage device; the storage space of the first storage device is less than or equal to a second threshold.
With reference to any one of the first aspect to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the server storing the two or more data, together with the remaining data in the first storage device other than the two or more data, to the second storage device to obtain the first data file includes: the server obtains the first data set according to the two or more data; the server stores the first data set and the remaining data to the second storage device.
In a second aspect, the application provides a data management device applied in a distributed storage system. The device includes: a determining unit, configured to determine, in the process of determining that multiple data stored in a first storage device need to be stored to a second storage device, that two or more of the multiple data meet a first preset condition;
a storage unit, configured to store the two or more data, together with the remaining data in the first storage device other than the two or more data, to the second storage device to obtain a first data file, where the two or more data are located in a first data set in the second storage device.
With reference to the second aspect, in a first possible implementation of the second aspect, the device provided by the application further includes: an obtaining unit, configured to determine that, among multiple first data files in the second storage device, two or more first data files meet a second preset condition, and to obtain a second data set according to the two or more first data files.
With reference to the second aspect or the first possible implementation of the second aspect, in a third possible implementation of the second aspect, the second preset condition includes: the two or more first data files belong to the same time period; the types of the two or more first data files are the same; the indexes of the two or more first data files are the same; the times of the two or more first data files are continuous; the identifiers of the two or more first data files are continuous.
With reference to any one of the second aspect to the third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the device provided by the application further includes: a receiving unit, configured to receive a first operation instruction, where the first operation instruction instructs that the multiple data stored in the first storage device be persistently stored to the second storage device; the determining unit is configured to determine, according to the first operation instruction, that the data stored in the first storage device need to be stored to the second storage device;
or, the determining unit is configured to determine that the multiple data stored in the first storage device meet a data persistence condition, and then to determine that the multiple data need to be stored to the second storage device.
With reference to any one of the second aspect to the fourth possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the first preset condition includes any one or more of the following: the time information of the two or more data belongs to the same time period; the data indexes of the two or more data are the same; the types of the two or more data are the same; the indexes of the two or more data belong to the same range; the times of the two or more data are continuous; the identifiers of the two or more data are continuous.
With reference to any one of the second aspect to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the data persistence condition includes at least one of the following: the size of the data stored in the first storage device is greater than or equal to a first threshold; there are other data that need to be stored to the first storage device; the storage space of the first storage device is less than or equal to a second threshold.
In a third aspect, the application provides a computer-readable storage medium storing instructions which, when run, cause the server to perform the data management method described in any one of the first aspect to the fifth possible implementation of the first aspect.
In a fourth aspect, the application provides a computer program product including instructions which, when run, cause the server to perform the data management method described in any one of the first aspect to the fifth possible implementation of the first aspect.
In a fifth aspect, the application provides a chip system applied in a data management device. The chip system includes at least one processor and an interface circuit, the interface circuit and the at least one processor are interconnected by lines, and the processor is configured to run instructions to perform the data management method described in any one of the first aspect to the fifth possible implementation of the first aspect.
In a sixth aspect, the application provides a data management system, including a client and the data management device described in any one of the second aspect to the fifth possible implementation of the second aspect.
Brief description of the drawings
Fig. 1 is a first schematic diagram of data merging in the prior art;
Fig. 2 is a second schematic diagram of data merging in the prior art;
Fig. 3a is a schematic structural diagram of a distributed storage system to which the data management method provided by the application is applied;
Fig. 3b is a first schematic structural diagram of a data management device provided by an embodiment of the present invention;
Fig. 4 is a first schematic flowchart of a data management method provided by the application;
Fig. 5 is a schematic diagram of the server storing data from the first storage device to the second storage device in the application;
Fig. 6 is a second schematic flowchart of a data management method provided by the application;
Fig. 7 is a first schematic diagram of merging data in a second storage device provided by the application;
Fig. 8 is a second schematic diagram of merging data in a second storage device provided by the application;
Fig. 9 is a third schematic diagram of merging data in a second storage device provided by the application;
Fig. 10 is a second schematic structural diagram of a data management device provided by the application;
Fig. 11 is a schematic structural diagram of another data management device provided by the application.
Detailed description
It should be noted that, in the application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the application should not be construed as more preferred or more advantageous than other embodiments or designs. Rather, the use of such words is intended to present a related concept in a specific manner.
The term "and/or" in the application merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B can indicate three cases: A exists alone, A and B both exist, and B exists alone. In addition, the character "/" in the application generally indicates an "or" relationship between the objects before and after it.
The network architecture and service scenarios described in the embodiments of the application are intended to explain the technical solutions of the embodiments more clearly and do not limit the technical solutions provided by the embodiments. A person of ordinary skill in the art will appreciate that, as network architectures evolve and new service scenarios emerge, the technical solutions provided by the embodiments are equally applicable to similar technical problems.
The term "multiple" in the application means two or more.
Terms such as "first" and "second" in the application are used only to distinguish different objects and do not limit their order. For example, "first data" and "second data" merely distinguish different data and do not limit their sequence.
Before the application is introduced, the related terms involved in the application are introduced first:
As shown in Fig. 3a, which is a schematic structural diagram of a distributed storage system to which the data management method provided by the application is applied, the distributed storage system includes a client, a server, and at least one storage device connected to the server (for example, Fig. 3a takes storage device 1, storage device 2, and storage device 3 as the at least one storage device). It can be understood that the distributed storage system in the application may include three or more storage devices.
The client is used to read data stored on the server, or to write data to the server.
The server is used to write data at the request of the client; it can write the data to the memory of the server or to a storage device connected to the server, or send data to the client. The storage device is used to store data.
It can be understood that the at least one storage device may be a disk.
When a distributed storage system writes data, the data is written both to a write-ahead log (WAL) and to the memory of the Region. The WAL is persisted to disk to ensure data reliability. After the data in the Region memory meets a certain condition, the data is persisted (Flush) to disk to form a distributed database data storage file (HFile), and the timestamp range of the data contained in the file is recorded in the file's metadata.
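The write path described above can be sketched as follows. This is a simplified in-memory model, not the behaviour of any particular database: the class `RegionWriter`, the entry-count flush threshold, and the dictionary layout of a flushed file are all assumptions for illustration, and plain Python lists stand in for the disk.

```python
class RegionWriter:
    """Toy write path: append to a log first, then to memory, then flush."""

    def __init__(self, flush_threshold):
        self.wal = []        # stands in for the on-disk write-ahead log
        self.memstore = {}   # in-memory data of the Region: key -> (value, ts)
        self.hfiles = []     # flushed files, each with a timestamp range
        self.flush_threshold = flush_threshold  # assumed: max entries in memory

    def put(self, key, value, ts):
        self.wal.append((key, value, ts))   # log first, for reliability
        self.memstore[key] = (value, ts)
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.memstore:
            return
        rows = sorted(self.memstore.items())            # key order
        timestamps = [ts for _, (_, ts) in rows]
        self.hfiles.append({
            "min_ts": min(timestamps),   # timestamp range kept as metadata
            "max_ts": max(timestamps),
            "rows": rows,
        })
        self.memstore = {}


writer = RegionWriter(flush_threshold=2)
writer.put("b", "x", 10)
writer.put("a", "y", 5)    # reaches the threshold and triggers a flush
writer.put("c", "z", 20)   # stays in memory until the next flush
```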
When a distributed database writes data continuously, Flush operations leave a large number of HFiles on disk, which affects read performance. Therefore, when a certain condition is met, a merge (compaction) process is executed on the HFiles to merge multiple HFiles into one HFile.
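A minimal sketch of the merge step, assuming each HFile is modelled as a list of `(key, value)` rows already sorted by key. The function name `compact` and this row layout are illustrative assumptions; a real compaction would also handle versions and delete markers, which are omitted here.

```python
import heapq


def compact(hfiles):
    """Merge several key-sorted files into a single key-sorted file."""
    # heapq.merge performs a streaming k-way merge of already sorted inputs.
    return list(heapq.merge(*hfiles, key=lambda row: row[0]))


hfile_a = [("k1", "v1"), ("k3", "v3")]
hfile_b = [("k2", "v2"), ("k4", "v4")]
merged_hfile = compact([hfile_a, hfile_b])
```

After the merge, a reader has one sorted file to search instead of two.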
A Region is the minimum unit of distributed storage and load balancing.
The data management device in the embodiments of the present invention may be implemented by a controller. Fig. 3b shows a possible structure of the data management device. As shown in Fig. 3b, the data management device 30 includes a memory 511, a processor 512, a system bus 513, a power supply component 514, an input/output interface 515, a communication component 516, and so on. The memory 511 can be used to store data, software programs, and modules, and mainly includes a program storage area and a data storage area. The program storage area can store an operating system, an application program required by at least one function, and the like; the data storage area can store data created through use of the data management device 30, and the like. The processor 512 performs the various functions of the data management device 30 and processes data by running or executing the software programs and/or modules stored in the memory 511 and calling the data stored in the memory 511. The system bus 513 includes an address bus, a data bus, and a control bus, and is used to transmit data and instructions. The power supply component 514 supplies power to the components of the data management device 30. The input/output interface 515 provides an interface between the processor 512 and peripheral interface modules. The communication component 516 is used for wired or wireless communication between the data management device 30 and other devices.
As shown in Fig. 4, which is a schematic flowchart of a data management method provided by the application, the method is applied in a distributed storage system and includes:
S101. In the process in which the server determines that multiple data stored in the first storage device need to be stored to the second storage device, the server determines that two or more of the multiple data meet a first preset condition (a preset condition in the application may also be called service logic; that is, the two or more data are merged into a first data set according to the service logic).
Optionally, the first storage device in the application may be the memory of the server, for example a Memstore, and the second storage device in the application may be a storage device connected to the server, such as storage device 1 shown in Fig. 3a; storage device 1 may be a disk.
It can be understood that the multiple data in the application are stored in the first storage device in Key-Value form, so the two or more data in the second data set can also be stored in the second storage device in Key-Value form.
Specifically, the server can determine in several ways that the data stored in the first storage device need to be stored to the second storage device. In one way: S1, the server receives a first operation instruction, where the first operation instruction instructs that the multiple data stored in the first storage device be persistently stored to the second storage device; S2, the server determines that the data stored in the first storage device need to be stored to the second storage device. In another way: S3, the server determines that the data stored in the first storage device meet a data persistence condition, and then determines that the data stored in the first storage device need to be stored to the second storage device.
Optionally, the data persistence condition includes at least one of the following: the size of the data stored in the first storage device is greater than or equal to a first threshold; there are other data that need to be stored to the first storage device; the storage space of the first storage device is less than or equal to a second threshold.
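The two ways of deciding that data must be persisted can be sketched as a single check. The names `FirstStorageState` and `needs_persisting`, the byte units, and the reading of "storage space" as remaining free space are assumptions made for this illustration; the application leaves these details open.

```python
from dataclasses import dataclass


@dataclass
class FirstStorageState:
    used_bytes: int       # size of the data currently held
    free_bytes: int       # assumed meaning of "storage space": remaining space
    pending_writes: int   # other data waiting to be stored


def needs_persisting(state, first_threshold, second_threshold,
                     explicit_instruction=False):
    """Return True if the data should be stored to the second storage device."""
    if explicit_instruction:          # S1/S2: a first operation instruction arrived
        return True
    return (                          # S3: at least one persistence condition holds
        state.used_bytes >= first_threshold
        or state.pending_writes > 0
        or state.free_bytes <= second_threshold
    )


idle = FirstStorageState(used_bytes=100, free_bytes=900, pending_writes=0)
full = FirstStorageState(used_bytes=128, free_bytes=872, pending_writes=0)
```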
Specifically, the first preset condition includes any one or more of the following: the time information of the two or more data belongs to the same time period; the data indexes of the two or more data are the same; the types of the two or more data are the same; the indexes of the two or more data belong to the same range.
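One way to picture the first preset condition is as a configurable set of checks over candidate records. In the sketch below, the `Record` fields, the one-hour period length, and the rule that every enabled check must pass are assumptions for illustration; the application only lists the individual conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    index: str       # for example the Key
    kind: str        # data type
    timestamp: int   # seconds since midnight, for this example


def same_period(records, period_seconds=3600):
    return len({r.timestamp // period_seconds for r in records}) == 1


def same_index(records):
    return len({r.index for r in records}) == 1


def same_type(records):
    return len({r.kind for r in records}) == 1


def index_in_range(records, low, high):
    return all(low <= r.index < high for r in records)


def meets_first_condition(records, checks):
    """True if there are at least two records and every enabled check passes."""
    return len(records) >= 2 and all(check(records) for check in checks)


records = [Record("Key1", "weather", 46800),   # 13:00
           Record("Key1", "weather", 47400)]   # 13:10
checks = [same_period, same_index, same_type,
          lambda rs: index_in_range(rs, "Key0", "Key9")]
mergeable = meets_first_condition(records, checks)
```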
For example, take the case where the first preset condition is that the data indexes of the two or more data are the same. The two or more data include first data and second data, where the first data and the second data have the same index, for example the same Key; the first data is stored in a first Value part and the second data is stored in a second Value part. Specifically, as shown in Table 1:
Table 1
Key     Value1    Value2    Value3    Value4
Key1    Data 1    Data 2    Data 3    Data 4
Key2    Data 5    Data 6    Data 7    Data 8
From Table 1 it can be seen that data 1, data 2, data 3, and data 4 are stored in different Values of the first storage device but have the same Key1. In the process of storing data 1, data 2, data 3, and data 4 to the second storage device, the server can therefore generate the first data set from data 1 and its corresponding Key1, data 2 and its corresponding Key1, data 3 and its corresponding Key1, and data 4 and its corresponding Key1.
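The Table 1 example can be reproduced with a short sketch that groups cells sharing a Key. The function name `merge_by_key` and the `(key, column, data)` tuple layout are assumptions for illustration.

```python
def merge_by_key(cells):
    """Group (key, column, data) cells so that cells sharing a key form one set."""
    merged = {}
    for key, column, data in cells:
        # Each datum stays individually addressable inside its data set.
        merged.setdefault(key, {})[column] = data
    return merged


cells = [
    ("Key1", "Value1", "Data 1"), ("Key1", "Value2", "Data 2"),
    ("Key1", "Value3", "Data 3"), ("Key1", "Value4", "Data 4"),
    ("Key2", "Value1", "Data 5"), ("Key2", "Value2", "Data 6"),
    ("Key2", "Value3", "Data 7"), ("Key2", "Value4", "Data 8"),
]
first_data_set = merge_by_key(cells)["Key1"]
```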
As another example, take the case where the first preset condition is that the time information of the two or more data belongs to the same time period. Suppose that in practice one piece of weather information is collected every minute between 13:00 and 14:00, and the server stores each piece of weather information collected per minute in the first storage device. There will then be 60 pieces of weather information in the first storage device for 13:00-14:00. When a client needs to access the weather information for 13:00-14:00, it would have to locate it through the index of each of the 60 pieces of weather information, which inevitably means accessing the first storage device many times. In the application, when the data persistence condition is met, the server can generate a first data set from the 60 pieces of weather information collected between 13:00 and 14:00 and assign an identifier or index to the first data set, so that the client can access the 60 pieces of weather information collected between 13:00 and 14:00 through the identifier or index of the first data set.
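The weather example can be sketched by grouping per-minute readings under one hourly label, so that a single lookup returns all 60 readings. The label format, the `minute_of_day` representation, and the placeholder reading values are assumptions for illustration.

```python
from collections import defaultdict


def group_by_hour(readings):
    """Group (minute_of_day, value) readings under one label per hour."""
    groups = defaultdict(list)
    for minute_of_day, value in readings:
        hour = minute_of_day // 60
        label = f"{hour:02d}:00-{hour + 1:02d}:00"   # index of the data set
        groups[label].append((minute_of_day, value))
    return dict(groups)


# One reading per minute between 13:00 and 14:00 (placeholder values).
readings = [(13 * 60 + m, f"weather@13:{m:02d}") for m in range(60)]
data_sets = group_by_hour(readings)
hourly = data_sets["13:00-14:00"]   # one access instead of 60
```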
S102, server-side will in two or more data and first storage device except it is described two or two with On data except remainder data storage in second storage device, to obtain the first data file, wherein described two A or more than two data are located at the first data acquisition system in the second storage device.
Specifically, the server-side in the application can be stored by two or more data to the second storage device Before, two or more data are merged to generate the first data acquisition system, namely according to two or more numbers Executed in first storage device side according to merging with generating the process of the first data acquisition system, server-side can also by two or two with On data store to the second storage device during by two or more data merge to generate the first data set Close, server-side two or more data can also be stored to the second storage device and then by two or two with On data merge, to generate the first data acquisition system, the application to this without limiting, as long as ensureing that being ultimately stored on second deposits Two or more data in storage device belong to the first data acquisition system.
Specifically, step S102 in the present application may be implemented as follows: the server obtains the first data set from the two or more pieces of data, and stores the first data set and the remaining data in the first storage device other than the two or more pieces of data in the second storage device to obtain the first data file.
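A minimal sketch of step S102, assuming the first storage device is modeled as a Python dict keyed by (row key, column) and the first preset condition is supplied as a predicate. The names `flush` and `meets_condition` and the layout of the resulting file are hypothetical and are not taken from the present application.

```python
def flush(memstore, meets_condition):
    """Write the memstore contents into one data file (a plain dict here).

    Entries accepted by `meets_condition` are grouped into a single data
    set; every other entry is stored unchanged as remaining data.
    """
    merged = {k: v for k, v in memstore.items() if meets_condition(k, v)}
    rest = {k: v for k, v in memstore.items() if k not in merged}
    if len(merged) < 2:  # a data set needs at least two pieces of data
        rest, merged = dict(memstore), {}
    return {"data_set": merged, "remaining": rest}


memstore = {
    ("Key1", "c1"): "data 1",
    ("Key1", "c2"): "data 2",
    ("Key2", "c1"): "data 5",
}
# Assumed condition: entries sharing the row key "Key1" belong together.
data_file = flush(memstore, lambda key, value: key[0] == "Key1")
```

The merge happens during the single write to the second storage device, so no separate read-merge-rewrite pass is needed in this sketch.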
Specifically, the server determining that the two or more pieces of data meet the first preset condition may also be understood as the server merging the two or more pieces of data according to first service logic to generate the first data set.
It should be noted that, when the two or more pieces of data are merged into the first data set in the present application, each of them still exists individually within the first data set. In other words, merging means establishing a relationship among two or more independent pieces of data so that they belong to the first data set.
For example, the server may store the first data set and the remaining data in the first storage device other than the two or more pieces of data in the second storage device in the form of the distributed database data storage file HFile, to obtain the first data file.
For example, as shown in Figure 5, while storing data 1, data 2 and data 3 to the second storage device, the server can merge data 1, data 2 and data 3 into merged data X1 (where data X1 includes data 1, data 2 and data 3), and store data X1 together with the multiple data M located in the same row as data 1, data 2 and data 3 in the second storage device. Similarly, while storing data 4, data 5, data 6 and data 7 to the second storage device, the server can merge them into merged data X2, and store data X2 together with the multiple data M located in the same row as data 4, data 5, data 6 and data 7 in the second storage device, to obtain the distributed database data storage file HFile.
It should be noted that, when the two or more pieces of data stored in the first storage device are located in different rows or different columns and meet the first preset condition, the present application can merge them, so that data located in different rows or different columns is merged. The prior art, by contrast, can only merge data located in the same row or in the same column.
For example, the storage space of the first storage device in the present application may be organized in a Key-Value manner, taking rows as Keys as an example. Suppose the weather information collected once per minute between 13:00 and 14:00 is stored in the first row of the first storage device, and the weather information collected once per minute between 14:00 and 15:00 is stored in the second row. When the weather information between 13:00 and 15:00 needs to be accessed, a first data set can be generated from the weather information between 13:00 and 15:00 and assigned an identifier, through which the weather information between 13:00 and 15:00 can be obtained.
It should be noted that, when data stored in the first storage device carries a delete mark, the server can delete that data while storing the multiple pieces of data in the first storage device to the second storage device. That is, data carrying a delete mark is not stored to the second storage device during this process.
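The handling of delete marks during persistence can be sketched as a filter. The `Cell` type, its `deleted` field and the function name `flush_without_deleted` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    key: str
    value: str
    deleted: bool = False  # hypothetical delete mark


def flush_without_deleted(cells):
    """Return only the cells that carry no delete mark."""
    return [c for c in cells if not c.deleted]


cells = [
    Cell("k1", "data 1"),
    Cell("k2", "data 2", deleted=True),
    Cell("k3", "data 3"),
]
persisted = flush_without_deleted(cells)
```

Only `k1` and `k3` would reach the second storage device in this sketch; the marked cell is dropped on the way.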
It should also be noted that, when the two or more pieces of data are merged into the first data set, the server can assign a first identifier to the first data set so that the client can access the multiple pieces of data in the first data set according to the first identifier. In addition, the server can assign a second identifier to each of the two or more pieces of data, so that the client can obtain the data indicated by a second identifier from the first data set.
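The two levels of identifiers might be modeled as below. This is a sketch under assumptions: the class `DataSet`, its methods, and the identifier strings are hypothetical, and the present application does not prescribe any particular data structure.

```python
class DataSet:
    """A merged data set addressed by one set-level identifier."""

    def __init__(self, set_id, items):
        self.set_id = set_id       # plays the role of the first identifier
        self._items = dict(items)  # second identifier -> piece of data

    def read_all(self):
        """Return every piece of data in the set with one access."""
        return list(self._items.values())

    def read_one(self, item_id):
        """Return the single piece of data named by a second identifier."""
        return self._items[item_id]


store = {}
ds = DataSet("X1", {"d1": "data 1", "d2": "data 2", "d3": "data 3"})
store[ds.set_id] = ds
```

A client holding only `"X1"` can read the whole set, while a client that also knows `"d2"` can fetch that one piece without scanning the rest.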
The present application provides a data management method. While multiple pieces of data stored in the first storage device are being stored to the second storage device, if two or more of those pieces of data meet the first preset condition, the two or more pieces of data and the remaining data are stored in the second storage device, with the two or more pieces of data located in the first data set in the second storage device. Because storing the two or more pieces of data in the second storage device as the first data set is performed by the server, disk I/O redundancy can be reduced compared with the prior art. In addition, because the two or more pieces of data are stored as the first data set, they can be accessed by accessing the first data set, whereas data that has not been merged into a first data set usually has to be accessed piece by piece. The present application can therefore reduce the number of accesses.
Optionally, as shown in Figure 6, the method provided by the present application further includes:
S103. The server determines that two or more first data files among the multiple first data files in the second storage device meet a second preset condition, and obtains a second data set from the two or more first data files.
Specifically, step S103 may be implemented as follows: when the server determines that the multiple first data files in the second storage device need to be merged, the server determines that two or more first data files among the multiple first data files in the second storage device meet the second preset condition, and obtains the second data set from the two or more first data files.
Specifically, the server may determine that the multiple first data files in the second storage device need to be merged in the following ways. S4: the server receives a second operation command, which instructs the server to merge the multiple first data files in the second storage device. S5: the server determines, according to the second operation command, that the multiple first data files in the second storage device are to be merged. Alternatively, if the server determines that the number of first data files in the second storage device is greater than or equal to a third threshold, the server determines that the multiple first data files in the second storage device need to be merged. Merging multiple data files in the second storage device reduces the number of files in the second storage device and thereby improves read performance.
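The two triggers described above reduce to a simple check. The function name `needs_compaction` and its parameter names are hypothetical; `threshold` stands for the third threshold mentioned in the text.

```python
def needs_compaction(file_count, threshold, command_received=False):
    """Return True when a merge of data files should start.

    Either an explicit operation command has arrived, or the number of
    data files has reached the threshold.
    """
    return command_received or file_count >= threshold
```

For instance, `needs_compaction(3, threshold=5)` is false, while `needs_compaction(5, threshold=5)` and `needs_compaction(1, threshold=5, command_received=True)` are both true.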
The second preset condition includes: the two or more first data files belong to the same time period; the types of the two or more first data files are the same; the indexes of the two or more first data files are the same; the times of the two or more first data files are continuous; the identifiers of the two or more first data files are continuous.
Optionally, the identifiers of the two or more first data files being continuous may include any of the following: the storage addresses of the two or more first data files are continuous, the file numbers of the two or more first data files are continuous, and so on.
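One way to test the "continuous file numbers" variant is to split the file numbers into runs of consecutive values. This is an illustrative sketch; the function name `consecutive_runs` and the use of integer file numbers are assumptions.

```python
def consecutive_runs(file_numbers):
    """Split file numbers into runs of consecutive values (length >= 2)."""
    runs, current = [], []
    for n in sorted(file_numbers):
        if current and n == current[-1] + 1:
            current.append(n)
        else:
            if len(current) >= 2:
                runs.append(current)
            current = [n]
    if len(current) >= 2:
        runs.append(current)
    return runs
```

Each returned run is a group of two or more files whose numbers are continuous and which could therefore be considered for merging together.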
For example, taking the case where the first data file is an HFile stored on disk, as shown in Figure 7, when two or more HFile files meet the second preset condition, the server can merge the two or more HFiles according to preset service logic to generate the second data set.
For example, as shown in Figure 8, when data file 1 (including Key Value CoIA, Key Value CoIB and Key Value CoIC) and data file 2 (including Key Value CoID and Key Value CoIE) meet the second preset condition, the server can merge data file 1 and data file 2 to generate the second data set, which includes Key Value CoIA, Key Value CoIB, Key Value CoIC, Key Value CoID and Key Value CoIE.
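The merge of Figure 8 can be sketched with sorted lists of key-value pairs. The assumption that each data file is sorted by key, the function name `merge_files`, and the placeholder values are hypothetical; the keys reuse the labels from the example above.

```python
import heapq


def merge_files(*files):
    """Merge sorted lists of (key, value) pairs into one sorted list."""
    return list(heapq.merge(*files, key=lambda kv: kv[0]))


file_1 = [("CoIA", "a"), ("CoIB", "b"), ("CoIC", "c")]
file_2 = [("CoID", "d"), ("CoIE", "e")]
second_data_set = merge_files(file_1, file_2)
```

Because `heapq.merge` consumes its inputs lazily, this style of merge does not need to hold every input file in memory at once.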
Optionally, if the server determines that partial data in the two or more first data files meets the second preset condition, the server obtains the second data set from the partial data in the two or more first data files.
Specifically, when the server determines that at least one piece of data included in data file 1 and at least one piece of data included in data file 2 meet the second preset condition, the server can also merge the at least one piece of data included in data file 1 with the at least one piece of data included in data file 2 to generate the second data file. As another example, as shown in Figure 9, when the server determines that Key Value CoIC included in data file 1 and Key Value CoID and Key Value CoIE included in data file 2 meet the second preset condition, the server can merge Key Value CoIC, Key Value CoID and Key Value CoIE to generate the second data set, for example Key Value CoIF in Figure 9.
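A partial merge in the spirit of Figure 9 might look like the following. Everything here is an assumption for illustration: data files are modeled as dicts, the selection is passed in as a set of keys, and the function name `merge_selected` is hypothetical. What happens to the unselected entries is not stated in the text and is left out of the sketch.

```python
def merge_selected(files, selected_keys):
    """Collect only the entries whose keys are in `selected_keys`."""
    merged = {}
    for entries in files:
        for key, value in entries.items():
            if key in selected_keys:
                merged[key] = value
    return merged


data_file_1 = {"CoIA": "a", "CoIB": "b", "CoIC": "c"}
data_file_2 = {"CoID": "d", "CoIE": "e"}
partial = merge_selected([data_file_1, data_file_2], {"CoIC", "CoID", "CoIE"})
```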
Specifically, in one possible implementation, step S103 may be implemented as follows:
S1031. When the server determines that the multiple first data files in the second storage device meet the second preset condition, the server obtains the second data set from two or more first data files, among the multiple first data files, that fall within a preset time period before the current time.
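Selecting the files that fall within a preset time period before the current time can be sketched as follows. The function name `files_in_window`, the `(name, created)` tuple layout and the sample file names are hypothetical.

```python
from datetime import datetime, timedelta


def files_in_window(files, now, window):
    """Return names of files created within `window` before `now`."""
    start = now - window
    return [name for name, created in files if start <= created <= now]


now = datetime(2017, 12, 29, 15, 0)
files = [
    ("hfile-1", datetime(2017, 12, 29, 11, 30)),
    ("hfile-2", datetime(2017, 12, 29, 13, 10)),
    ("hfile-3", datetime(2017, 12, 29, 14, 45)),
]
recent = files_in_window(files, now, timedelta(hours=2))
```

Only the files returned by this selection would then be passed on to the merge step.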
It should be noted that, when a first data file in the present application contains data carrying a delete mark, that data is deleted when the server merges the two or more first data files. That is, the second data set finally generated does not include data carrying a delete mark.
The foregoing mainly describes the solution provided by the present application from the perspective of the data management device. It can be understood that, to realize the above functions, the data management device includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that, in combination with the data management device and method steps of the examples described in the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present application.
In the embodiments of the present invention, the data management device may be divided into function modules according to the above method examples. For example, each function module may be divided corresponding to each function, or two or more functions may be integrated in one processing module. The integrated module may be implemented in the form of hardware or in the form of a software function module. It should be noted that the division of modules in the embodiments of the present invention is schematic and is only a division by logical function; other division manners are possible in actual implementation.
In the case where each function module is divided corresponding to each function, Figure 10 shows a possible structural diagram of the data management device involved in the above embodiments. The data management device 30 includes a determination unit 101 and a storage unit 102. The determination unit 101 is used to support the data management device 30 in executing steps S101, S2, S3 and S5 in the above embodiments; the storage unit 102 is used to support the data management device 30 in executing step S102 in the above embodiments. In addition, the data management device 30 provided by the present application further includes a receiving unit 103 and an acquiring unit 104. The receiving unit 103 is used to support the data management device 30 in executing steps S4 and S1 in the above embodiments. The acquiring unit 104 is used to support the data management device 30 in executing steps S103 and S1031 in the above embodiments. All related content of the steps involved in the above method embodiments can be cited in the functional descriptions of the corresponding function modules, and details are not repeated here.
In the case where an integrated unit is used, Figure 11 shows a possible logical structure diagram of the data management device 30 involved in the above embodiments. The data management device 30 includes a processing module 312 and a communication module 313. The processing module 312 is used to control and manage the actions of the data management device 30; for example, it supports the data management device 30 in performing data or signaling processing operations such as S101, S102, S2, S3 and S5. The communication module 313 supports the data management device 30 in performing data receiving and sending operations, for example steps S4 and S1, and/or other processes for the techniques described herein. The data management device 30 may also include a memory module 311 for storing the program code and data of the data management device 30.
The processing module 312 may be a processor or a controller, for example a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules and circuits described in connection with the disclosure of the present invention. The processor may also be a combination that realizes computing functions, for example a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The communication module 313 may be a transceiver, a transceiver circuit, or the like. The memory module 311 may be a memory.
When the processing module 312 is a processor, the communication module 313 is a transceiver, and the memory module 311 is a memory, the data management device involved in the embodiments of the present invention may be the device shown in Figure 3b.
In one aspect, an embodiment of the present invention provides a computer-readable storage medium storing instructions that, when run on the data management device, cause the data management device to execute steps S101, S102, S2, S3 and S5, and S1 and S4 in the above embodiments.
The present application provides a computer program product including instructions. The instructions are stored in the computer program product and, when run, cause the server to execute steps S101, S102, S2, S3 and S5, and S1 and S4.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions described according to the embodiments of the present invention are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, wireless, or microwave). The computer-readable storage medium may be any usable medium that a computer can read, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
From the above description of the embodiments, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above function modules is used as an example. In practical applications, the above functions may be assigned to different function modules as needed; that is, the internal structure of the device may be divided into different function modules to complete all or part of the functions described above. For the specific working processes of the system, device and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division of the modules or units is only a division by logical function, and other division manners are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated in one processing unit, each unit may exist physically alone, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as flash memory, a removable hard disk, read-only memory, random access memory, a magnetic disk, or an optical disc.
The above is only a specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A data management method, characterized in that it is applied in a distributed storage system, the method comprising:
in the process in which a server determines that multiple pieces of data stored in a first storage device need to be stored to a second storage device, the server determines that two or more pieces of data among the multiple pieces of data meet a first preset condition, and the server stores the two or more pieces of data and the remaining data in the first storage device other than the two or more pieces of data in the second storage device to obtain a first data file, wherein the two or more pieces of data are located in a first data set in the second storage device.
2. The method according to claim 1, characterized in that the method further comprises:
the server determines that two or more first data files among multiple first data files in the second storage device meet a second preset condition, and the server obtains a second data set according to the two or more first data files.
3. The method according to claim 2, characterized in that the method further comprises: the server determines that partial data in the two or more first data files meets the second preset condition, and the server then obtains the second data set according to the partial data in the two or more first data files.
4. The method according to any one of claims 1-3, characterized in that the server determining that multiple pieces of data stored in the first storage device need to be stored to the second storage device comprises:
the server receives a first operation command, the first operation command being used to instruct that the multiple pieces of data stored in the first storage device be persistently stored to the second storage device;
the server determines, according to the first operation command, that the multiple pieces of data stored in the first storage device need to be stored to the second storage device;
or, the server determines that the multiple pieces of data stored in the first storage device meet a data persistence condition, and the server then determines that the multiple pieces of data need to be stored to the second storage device.
5. The method according to any one of claims 1-4, characterized in that the server storing the two or more pieces of data and the remaining data in the first storage device other than the two or more pieces of data in the second storage device to obtain the first data file comprises:
the server obtains the first data set according to the two or more pieces of data;
the server stores the first data set and the remaining data in the second storage device.
6. A data management device, characterized in that it is applied in a distributed storage system, the device comprising:
a determination unit, configured to determine, in the process of determining that multiple pieces of data stored in a first storage device need to be stored to a second storage device, that two or more pieces of data among the multiple pieces of data meet a first preset condition;
a storage unit, configured to store the two or more pieces of data and the remaining data in the first storage device other than the two or more pieces of data in the second storage device to obtain a first data file, wherein the two or more pieces of data are located in a first data set in the second storage device.
7. The device according to claim 6, characterized in that the device further comprises:
an acquiring unit, further configured to determine that two or more first data files among multiple first data files in the second storage device meet a second preset condition, and to obtain a second data set according to the two or more first data files.
8. The device according to claim 7, characterized in that the acquiring unit is further configured to determine that partial data in the two or more first data files meets the second preset condition, and then to obtain the second data set according to the partial data in the two or more first data files.
9. The device according to any one of claims 6-8, characterized in that the device further comprises:
a receiving unit, configured to receive a first operation command, the first operation command being used to instruct that the multiple pieces of data stored in the first storage device be persistently stored to the second storage device;
the determination unit, configured to determine, according to the first operation command, that the data stored in the first storage device needs to be stored to the second storage device;
or, the determination unit, configured to determine that the multiple pieces of data stored in the first storage device meet a data persistence condition, and then to determine that the multiple pieces of data need to be stored to the second storage device.
10. The device according to any one of claims 6-9, characterized in that the device further comprises: an acquiring unit, configured to obtain the first data set according to the two or more pieces of data, wherein the storage unit is specifically configured to store the first data set and the remaining data in the second storage device.
11. A computer-readable storage medium, characterized in that instructions are stored in the computer-readable storage medium, and when the instructions are run, the data management method according to any one of claims 1-5 is performed.
CN201711487793.4A 2017-12-29 2017-12-29 Data management method and device Active CN108304142B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711487793.4A CN108304142B (en) 2017-12-29 2017-12-29 Data management method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711487793.4A CN108304142B (en) 2017-12-29 2017-12-29 Data management method and device

Publications (2)

Publication Number Publication Date
CN108304142A (en) 2018-07-20
CN108304142B (en) 2021-10-15

Family

ID=62868328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711487793.4A Active CN108304142B (en) 2017-12-29 2017-12-29 Data management method and device

Country Status (1)

Country Link
CN (1) CN108304142B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109542352A (en) * 2018-11-22 2019-03-29 北京百度网讯科技有限公司 Method and apparatus for storing data
CN109947733A (en) * 2019-03-29 2019-06-28 众安信息技术服务有限公司 Data storage device and method
CN111190908A (en) * 2018-11-15 2020-05-22 华为技术有限公司 Data management method, device and system
CN112286948A (en) * 2020-11-18 2021-01-29 成都佳华物链云科技有限公司 Data storage method, reading method and device of time sequence database

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793493A (en) * 2014-01-21 2014-05-14 深圳市元征科技股份有限公司 Method and system for processing car-mounted terminal mass data
CN105430078A (en) * 2015-11-17 2016-03-23 浪潮(北京)电子信息产业有限公司 Distributed storage method of mass data

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111190908A (en) * 2018-11-15 2020-05-22 华为技术有限公司 Data management method, device and system
CN111190908B (en) * 2018-11-15 2023-09-22 华为技术有限公司 Data management method, device and system
CN109542352A (en) * 2018-11-22 2019-03-29 北京百度网讯科技有限公司 Method and apparatus for storing data
CN109542352B (en) * 2018-11-22 2020-05-08 北京百度网讯科技有限公司 Method and apparatus for storing data
CN109947733A (en) * 2019-03-29 2019-06-28 众安信息技术服务有限公司 Data storage device and method
CN112286948A (en) * 2020-11-18 2021-01-29 成都佳华物链云科技有限公司 Data storage method, reading method and device of time sequence database
CN112286948B (en) * 2020-11-18 2023-05-23 成都佳华物链云科技有限公司 Data storage method, data reading method and data storage device of time sequence database

Also Published As

Publication number Publication date
CN108304142B (en) 2021-10-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200422

Address after: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant after: HUAWEI TECHNOLOGIES Co.,Ltd.

Address before: 301, A building, room 3, building 301, foreshore Road, No. 310052, Binjiang District, Zhejiang, Hangzhou

Applicant before: Huawei Technologies Co.,Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220214

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Patentee after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.