WO2019057081A1 - Data storage method, data query method, computer device, and storage medium - Google Patents
- Publication number
- WO2019057081A1 (PCT/CN2018/106495)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- data segment
- identifier
- segment
- streaming
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0626—Reducing size or complexity of storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Definitions
- The embodiments of the present invention relate to the field of data storage technologies, and in particular to a data storage method, a data query method, corresponding apparatuses, a computer device, and a storage medium.
- The consistent hashing algorithm is commonly used in distributed storage schemes that have no central node. It avoids the performance bottlenecks, single points of failure, data consistency problems, and other issues that arise when a dedicated metadata service manages data distribution.
- When a storage system stores streaming data by means of the consistent hashing algorithm, the received streaming data must be divided into data segments, and the name of each data segment is hashed to determine the storage device where that data segment is located.
- The segmentation of the streaming data is performed automatically by the storage system, and the naming of each data segment is also done by the storage system, so the user cannot know the name of an individual data segment. When the user searches for a certain data segment, every storage device in the storage system that stores the streaming data has to query for the data segment locally, so searching within streaming data incurs a large overhead and consumes considerable system resources.
- The present application provides a data storage method, a data query method, corresponding apparatuses, a computer device, and a storage medium.
- a data storage method comprising:
- the data segment is stored to at least one storage device corresponding to the data segment.
- the method further includes:
- performing the hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment, and obtaining the identifier of the virtual storage node corresponding to the data segment including:
- performing a hash calculation on the identifier of the streaming data and on the time information corresponding to the data segment according to the consistent hashing algorithm, to obtain a hash value of the identifier of the streaming data and a hash value of the time information corresponding to the data segment.
- the identifier of the virtual storage node corresponding to the data segment is obtained according to the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment, including:
- the sum of the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment is obtained as the sequence number of the virtual storage node corresponding to the data segment.
- the determining, according to the identifier of the virtual storage node corresponding to the data segment, the at least one storage device corresponding to the data segment including:
- the calculating, according to the identifier of the virtual storage node corresponding to the data segment, the at least one storage device corresponding to the data segment by using a pseudo hash algorithm including:
- redundancy policy corresponding to the streaming data, where the redundancy policy is used to indicate a storage device interval in which each group of redundant data corresponding to the data segment is located;
- a data query method comprising:
- determining the time information corresponding to the data segment according to the target time including:
- the time period is a start and end time period of the data segment in the streaming data, and determining time information corresponding to the data segment according to the time period.
- a data storage device comprising:
- a segmentation module configured to segment the data in the streaming data to obtain a data segment;
- a calculation module configured to perform a hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment, to obtain an identifier of the virtual storage node corresponding to the data segment, where the time information indicates the time to which the data segment corresponds in the streaming data;
- a device determining module configured to determine, according to an identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment;
- a storage module configured to store the data segment to at least one storage device corresponding to the data segment.
- the calculation module includes:
- a calculating unit configured to perform a hash calculation on the identifier of the streaming data and on the time information corresponding to the data segment according to the consistent hashing algorithm, to obtain a hash value of the identifier of the streaming data and a hash value of the time information corresponding to the data segment;
- an identifier obtaining unit configured to obtain, according to the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment, the identifier of the virtual storage node corresponding to the data segment.
- the identifier obtaining unit is configured to obtain the sum of the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment as the serial number of the virtual storage node corresponding to the data segment.
- the device determining module is configured to calculate, according to the identifier of the virtual storage node corresponding to the data segment, the at least one storage device corresponding to the data segment by using a pseudo hash algorithm.
- the device determining module includes:
- a policy obtaining unit configured to acquire a redundancy policy corresponding to the streaming data, where the redundancy policy is used to indicate a storage device interval in which each group of redundant data corresponding to the data segment is located;
- a device determining unit configured to determine, according to the identifier of the virtual storage node corresponding to the data segment and by using the pseudo hash algorithm, a storage device for storing the data segment from each storage device interval in which a group of redundant data corresponding to the data segment is located.
- a data query device comprising:
- a request receiving module configured to receive a query request including a target time, where the query request is used to query a data segment in the streaming data
- An information determining module configured to determine time information corresponding to the data segment according to the target time, where the time information is used to indicate that the data segment corresponds to a time in the streaming data;
- a calculation module configured to perform hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment, to obtain an identifier of the virtual storage node corresponding to the data segment;
- a device determining module configured to determine, according to an identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment;
- a querying module configured to query the data segment from at least one storage device corresponding to the data segment.
- the information determining module is specifically configured to determine the time period in which the target time is located, where the time period is the start-to-end time period of the data segment in the streaming data, and to determine the time information corresponding to the data segment according to the time period.
- a computer device comprising a processor and a memory, wherein the memory stores instructions, and the processor executes the instructions to cause the computer device to implement the method described in the first aspect or the second aspect.
- in a sixth aspect, a computer readable storage medium storing instructions, where a computer device executes the instructions to implement the method of the first aspect or the second aspect described above.
- In the solution provided by the present application, a hash calculation on the identifier of the streaming data and the time information corresponding to the data segment yields the identifier of the virtual storage node corresponding to the data segment, and the corresponding storage device is then determined from that identifier.
- Because the time information of the data segment is taken into account in the hash calculation, a user who later searches the streaming data only needs to provide the time information of the data to be found, and the storage system can determine directly from that time information which storage device stores the corresponding data segment.
- The storage devices that hold the streaming data do not each have to search for the data segment the user wants, which reduces the overhead of the storage system when querying streaming data and saves system resources.
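- The first step of such a lookup, mapping a queried target time to the time information of the segment that contains it, can be sketched as follows. This is a minimal illustration that assumes fixed half-hour segments aligned to midnight and uses the segment start time as the time information; the name `segment_start` and the constant `SEGMENT_LENGTH` are hypothetical and do not come from the application.

```python
from datetime import datetime, timedelta

# Assumption: every data segment covers a fixed half hour, aligned to midnight.
SEGMENT_LENGTH = timedelta(minutes=30)


def segment_start(target: datetime, length: timedelta = SEGMENT_LENGTH) -> datetime:
    """Return the start time of the fixed-length segment that contains `target`."""
    midnight = target.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = target - midnight
    # Integer division of two timedeltas gives the index of the segment in the day.
    return midnight + (elapsed // length) * length
```

- Under these assumptions, a query for 1:47:12 resolves to the segment that starts at 1:30:00, and that start time is what would be fed into the hash calculation.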
- FIG. 1 is a block diagram of a data storage system involved in the present application
- FIG. 2 is a flowchart of a data storage method provided by an embodiment of the present application.
- FIG. 3 is a structural diagram of a data storage software involved in the embodiment shown in FIG. 2;
- FIG. 5 is a schematic diagram of a data storage form in a storage device according to the embodiment shown in FIG. 4;
- FIG. 6 is a flowchart of a data query method provided by an embodiment of the present application.
- FIG. 7 is a block diagram of a data storage device according to an embodiment of the present application.
- FIG. 8 is a block diagram of a data query apparatus according to an embodiment of the present application.
- FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
- "A number" as referred to herein means one or more, and "a plurality" means two or more.
- The character "/" generally indicates an "or" relationship between the objects before and after it.
- FIG. 1 is a structural diagram of a data storage system according to an embodiment of the present application.
- the data storage system can perform distributed storage on streaming data.
- the data storage system includes the following devices: a data generating device 110, a plurality of storage devices 120, and an access control device 130.
- Streaming data is a sequence of data that arrives in order, in large volume, rapidly, and continuously.
- streaming data can be viewed as a dynamic data set that grows over time.
- the streaming data can be a video stream, an audio stream, or other type of data stream, and the like.
- Streaming data is widely used in network monitoring, sensor networks, aerospace, meteorological measurement and control, and financial services.
- The data generating device 110 is a device that generates streaming data. For example, when the streaming data is a surveillance video stream, the data generating device 110 may be a surveillance camera that captures the video stream; when the streaming data is an audio stream, the data generating device 110 may be a microphone that records the audio stream; and when the streaming data is a log stream of a network application, the data generating device 110 may be a server of the network application.
- the specific data type of the streaming data and the specific device type of the data generating device 110 are not limited in the embodiment of the present application.
- the storage device 120 can be used to store streaming data.
- The storage device 120 can be a magnetic disk or a mechanical hard disk that contains a magnetic disk.
- The storage device 120 can also be a flash memory or a solid-state drive that contains flash memory, or a hybrid hard disk that contains both a magnetic disk and flash memory.
- the access control device 130 is used to control the storage and reading of streaming data in the various storage devices 120.
- the access control device 130 may be disposed on the user side.
- In that case, the access control device 130 may be a user's personal computer or workstation, or a server set up by the user.
- the access control device 130 may also be disposed on the storage service provider side.
- the access control device 130 may be a server set by the storage service provider.
- the access control device 130 and the plurality of storage devices 120 are respectively connected by a wired or wireless network.
- all or part of the devices between the plurality of storage devices 120 can also be connected through a wired or wireless network.
- FIG. 2 shows a flowchart of a data storage method provided by an embodiment of the present application. This method can be used in the data storage system described above. As shown in FIG. 2, the data storage method may include:
- Step 201 Segment the data in the streaming data to obtain a data segment.
- Step 202 Perform hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment, and obtain an identifier of the virtual storage node corresponding to the data segment.
- The time information indicates the time to which the data segment corresponds in the streaming data.
- Step 203 Determine at least one storage device corresponding to the data segment according to the identifier of the virtual storage node corresponding to the data segment.
- Step 204 Store the data segment to at least one storage device corresponding to the data segment.
- The foregoing steps may be implemented in the data storage system by management software developed with an SDK (Software Development Kit).
- FIG. 3 illustrates a data storage software architecture diagram related to an embodiment of the present application.
- The SDK-based management software (indicated by SDK in FIG. 3) segments the streaming data in chronological order, determines the identifier of the Vnode (virtual node) in the bucket that corresponds to each data segment, and stores the data segment in that Vnode, where the Vnode is the virtual storage node.
- For the Vnode, at least one OSD (Object Storage Device) among the hosts is then determined for the data segment, and the data segment is stored in the corresponding OSD.
- The method provided by the embodiment of the present application divides the streaming data into data segments and performs a hash calculation according to the identifier of the streaming data and the time information corresponding to each data segment, thereby obtaining the identifier of the virtual storage node corresponding to the data segment; the corresponding storage device is then determined according to that identifier.
- When data is later queried, the storage system can determine from the time information which storage device stores the corresponding data segment, so the storage devices do not each have to search for the data segment the user wants. This reduces the overhead of querying streaming data and saves system resources.
- FIG. 4 shows a flowchart of a data storage method provided by an embodiment of the present application.
- This method can be used in the data storage system described above.
- Taking the storage of streaming data sent by the data generating device as an example, the data storage method may include:
- Step 401 The access control device receives the streaming data sent by the data generating device.
- The data generating device continuously sends the data it generates to the access control device in the form of streaming data, and the access control device correspondingly receives it continuously.
- For example, when the data generating device is a video monitoring device (such as a surveillance camera), the device usually runs continuously, so the captured video data needs to be stored in the storage system continuously. The streaming data received by the access control device can then be the video stream captured by the video monitoring device.
- Step 402 The access control device segments the streaming data according to the time corresponding to each data in the streaming data to obtain a data segment.
- Each unit of data in the streaming data usually corresponds to its own time, and the times of two adjacent units are continuous; in other words, the units of data in the streaming data are arranged in chronological order.
- For example, each frame image in a video stream corresponds to its own shooting time, the frames are arranged in order of shooting time, and the shooting time of each frame can be used as the time corresponding to that frame in the streaming data.
- the access control device may divide the data in the streaming data into data segments of fixed or non-fixed time length according to the time corresponding to each data in the streaming data.
- Because the data in the streaming data usually reaches the access control device as it is generated, the access control device needs to store data while receiving it.
- The access control device may therefore group the data of a given time span into one data segment according to the time corresponding to each unit of data in the streaming data, and then store the data in units of data segments.
- the length of time corresponding to each data segment may be fixed.
- For example, the access control device may split the data in the streaming data every half hour.
- Taking a video stream sent by a video monitoring device as the received streaming data, each time the access control device receives a video frame it checks whether the timestamp of the current frame has crossed an hour or half-hour boundary. If so, the video frames received before the current frame that have not yet been segmented are grouped into one video segment (equivalent to the data segment above).
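- The boundary check described above can be sketched as follows. This is a minimal illustration, assuming frames arrive in chronological order as `(timestamp, payload)` pairs; the names `half_hour_slot` and `split_on_half_hours` are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Iterator, List, Tuple


def half_hour_slot(ts: datetime) -> datetime:
    """Round a timestamp down to the enclosing hour or half-hour boundary."""
    return ts.replace(minute=0 if ts.minute < 30 else 30, second=0, microsecond=0)


def split_on_half_hours(
    frames: Iterable[Tuple[datetime, str]]
) -> Iterator[Tuple[datetime, List[str]]]:
    """Yield (slot_start, payloads) each time a frame crosses a half-hour boundary."""
    current_slot = None
    pending: List[str] = []
    for ts, payload in frames:
        slot = half_hour_slot(ts)
        if current_slot is not None and slot != current_slot:
            # The current frame crossed a boundary: close the frames received so far.
            yield current_slot, pending
            pending = []
        current_slot = slot
        pending.append(payload)
    if pending:
        # End of input in this sketch; a live stream would keep these frames pending.
        yield current_slot, pending
```

- Frames stamped 6:29:59 and 6:30:00 therefore end up in different segments, because the second one is the first frame past the half-hour boundary.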
- Alternatively, the length of time corresponding to each data segment need not be fixed.
- For example, the access control device may divide the data between two adjacent whole hours in the streaming data (for example, from 6:00:00 to 6:59:59, or from 7:00:00 to 7:59:59) into three data segments, where the first two data segments each correspond to 25 min and the last corresponds to 10 min; alternatively, the access control device can split every hour of data in the streaming data into three data segments of 30 min, 20 min, and 10 min, respectively.
- This is not limited in the embodiments of the present application.
- Step 403 The access control device performs a hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment, and obtains the identifier of the Vnode (Virtual Node) corresponding to the data segment.
- The time information indicates the time to which the data segment corresponds in the streaming data.
- The access control device may perform a hash calculation on the identifier of the streaming data and on the time information corresponding to the data segment according to the consistent hashing algorithm, to obtain a hash value of the identifier of the streaming data and a hash value of the time information corresponding to the data segment; it then obtains the identifier of the virtual storage node corresponding to the data segment from these two hash values.
- the identifier of the streaming data may be an identifier that uniquely indicates the streaming data.
- For example, the identifier of the streaming data may be the identifier of the device that is the source of the streaming data; when that device is a video monitoring device, the identifier of the streaming data may be the camera ID of the video monitoring device.
- time information corresponding to the data segment may be a timestamp corresponding to the data segment, and specifically, may be the start time of the data segment, or may be the end time of the data segment, or may be The middle time of the data fragment and so on.
- The access control device may combine the identifier of the streaming data and the time information of the data segment to determine the corresponding virtual storage node. On the one hand, this introduces the time information of the data segment into the determination of the virtual storage node, which facilitates later lookup by time information; on the other hand, data segments that belong to different streaming data at the same time can be allocated to different virtual storage nodes.
- The access control device may obtain the sum of the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment as the serial number of the virtual storage node corresponding to the data segment.
- For example, the streaming data is a surveillance video stream, the identifier of the streaming data is the camera ID of the video monitoring device, and the time information corresponding to the data segment is the start time of the data segment (assumed to be 1:30:00).
- The virtual storage node is a virtual concept introduced to simplify system processing.
- The number of virtual storage nodes may correspond to the length of the hash ring of the consistent hashing algorithm.
- The length of the hash ring may be the ratio of the storage period of the streaming data to the duration of a single data segment.
- Take as an example a hash ring for which the storage period of the streaming data is one year (that is, the storage system by default keeps the most recent year of the streaming data) and each data segment is half an hour long.
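- With a one-year storage period (taken here as 365 days) and half-hour segments, the ratio above works out to 365 × 48 = 17,520 positions on the ring. The sketch below computes that length and a Vnode serial number as the sum of the two hash values. The application does not name a hash function or say how the sum is mapped onto the ring, so the SHA-256 based `stable_hash`, the modulo reduction, and all names here are assumptions made for illustration.

```python
import hashlib
from datetime import datetime, timedelta

STORAGE_PERIOD = timedelta(days=365)     # assumed: one year of retained data
SEGMENT_LENGTH = timedelta(minutes=30)   # assumed: half-hour data segments
RING_LENGTH = STORAGE_PERIOD // SEGMENT_LENGTH  # storage period / segment duration


def stable_hash(text: str) -> int:
    """Hypothetical stand-in hash: the first 8 bytes of SHA-256 as an integer."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def vnode_serial(stream_id: str, segment_start: datetime) -> int:
    """Sum of the two hash values, wrapped onto the ring (the wrap is an assumption)."""
    id_hash = stable_hash(stream_id) % RING_LENGTH
    time_hash = stable_hash(segment_start.isoformat()) % RING_LENGTH
    return (id_hash + time_hash) % RING_LENGTH
```

- Because both inputs are known at query time (the camera ID and the segment start time), the same calculation can be repeated later to find the Vnode without consulting any index.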
- After segmenting out a data segment, the access control device may automatically set the name of the data segment (that is, the file name).
- The name of the data segment may include the identifier of the streaming data, the time information, and the identifier of the virtual storage node. For example, when the streaming data is a surveillance video stream generated by a video monitoring device, the name of a video segment may be: camera ID + start time + Vnode serial number.
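- One possible encoding of such a name is sketched below. The underscore separator and the `YYYYMMDDHHMMSS` timestamp layout are assumptions chosen for illustration; the application only states which three components the name contains.

```python
from datetime import datetime
from typing import Tuple

TIME_FORMAT = "%Y%m%d%H%M%S"  # assumed timestamp layout


def segment_name(camera_id: str, start: datetime, vnode: int) -> str:
    """Build a name of the form camera ID + start time + Vnode serial number."""
    return f"{camera_id}_{start.strftime(TIME_FORMAT)}_{vnode}"


def parse_segment_name(name: str) -> Tuple[str, datetime, int]:
    """Split a name back into its parts; rsplit tolerates underscores in the camera ID."""
    camera_id, stamp, vnode = name.rsplit("_", 2)
    return camera_id, datetime.strptime(stamp, TIME_FORMAT), int(vnode)
```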
- Step 404 The access control device determines, according to the identifier of the virtual storage node corresponding to the data segment, the at least one storage device corresponding to the data segment.
- The access control device may calculate the at least one storage device corresponding to the data segment from the identifier of the virtual storage node corresponding to the data segment by using a pseudo hash algorithm.
- For example, when the storage system according to the embodiment of the present application is a distributed file system, the access control device can use the CRUSH (Controlled Replication Under Scalable Hashing) algorithm to calculate the at least one storage device corresponding to the data segment from the identifier of the virtual storage node corresponding to the data segment.
- The access control device may obtain a redundancy policy corresponding to the streaming data, where the redundancy policy indicates the storage device interval in which each group of redundant data corresponding to the data segment is located. Then, according to the identifier of the virtual storage node corresponding to the data segment, it uses the pseudo hash algorithm to determine at least one storage device for storing the data segment from the storage device intervals in which the groups of redundant data are located.
- the redundant data of the data segment refers to multiple sets of data corresponding to the data segment and stored in different storage devices.
- storing data segments as multiple sets of redundant data in different storage devices can prevent data fragments stored in this portion of the storage devices from being lost due to failure of some storage devices.
- The storage devices may be organized into levels. For example, the storage devices may be distributed across multiple equipment rooms, each equipment room contains multiple storage hosts, each storage host corresponds to multiple storage device groups, and each storage device group contains multiple storage devices.
- the user or the system can specify a redundancy policy, that is, one piece of data is simultaneously stored in a plurality of different storage devices, so that when a storage device fails, the data stored in the storage device is not lost.
- The redundancy policy may indicate the level at which redundancy is applied.
- For example, the redundancy policy may be that the copies of the same data are stored in different equipment rooms, or in different hosts within one equipment room, or in different storage device groups within one host, or in different storage devices within one storage device group, and so on.
- The access control device may determine the at least one storage device according to the redundancy policy. For example, when the redundancy policy is that the groups of redundant data of the same data segment are stored in different equipment rooms, the access control device can determine, from all the storage devices in each equipment room (for example, by a pseudo hash algorithm such as CRUSH), one storage device as a storage device corresponding to the data segment currently being stored, so that as many storage devices are determined as there are equipment rooms.
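- The idea of picking one device per storage device interval, deterministically from the Vnode identifier, can be sketched as follows. This is not CRUSH: the sketch substitutes rendezvous (highest-random-weight) hashing as a simpler stand-in with the same "same input, same placement" property, and the interval names, device names, and function names are all hypothetical.

```python
import hashlib
from typing import Dict, List


def _score(vnode: int, device: str) -> int:
    """Pseudo-random but repeatable score for a (Vnode, device) pair."""
    digest = hashlib.sha256(f"{vnode}:{device}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_devices(vnode: int, intervals: Dict[str, List[str]]) -> List[str]:
    """Pick one device from every interval (for example, every equipment room).

    Intervals are visited in sorted order so the result is stable; this sketch
    treats the first returned device as the primary storage device.
    """
    return [
        max(devices, key=lambda device: _score(vnode, device))
        for _, devices in sorted(intervals.items())
    ]
```

- With three equipment rooms in the policy, the function returns three devices, one per room, and returns the same three for the same Vnode on every call, which is what lets a later query recompute the placement instead of searching for it.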
- Step 405 The access control device stores the data segment to at least one storage device corresponding to the data segment.
- The at least one storage device includes a primary storage device and at least one secondary storage device. The access control device can store the data segment to the primary storage device, so that the primary storage device generates each group of redundant data of the data segment and synchronizes each group to the at least one secondary storage device.
- The access control device may set the first of the determined storage devices as the primary storage device, and all calculations in the storage process are completed by the primary storage device. In copy mode, the primary storage device sends copies of the data to the secondary storage devices. In erasure code mode, the primary storage device first splits the data into strips, saves the first strip locally, and sends the other strips to the secondary storage devices. The storage process is considered successful only after the data has been written to at least one storage device.
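- The two write paths can be sketched with in-memory dictionaries standing in for storage devices. This is a simplified illustration: real erasure coding also computes parity strips, which are omitted here, and the `Store` type and all function names are hypothetical.

```python
from typing import Dict, List

Store = Dict[str, bytes]  # hypothetical in-memory stand-in for one storage device


def store_copy_mode(name: str, segment: bytes, primary: Store, secondaries: List[Store]) -> None:
    """Copy mode: the primary keeps the segment and sends a full copy to each secondary."""
    primary[name] = segment
    for store in secondaries:
        store[name] = segment


def split_into_strips(segment: bytes, count: int) -> List[bytes]:
    """Cut the segment into `count` consecutive strips (the last may be shorter)."""
    size = -(-len(segment) // count)  # ceiling division
    return [segment[i * size:(i + 1) * size] for i in range(count)]


def store_strip_mode(name: str, segment: bytes, primary: Store, secondaries: List[Store]) -> None:
    """Strip mode without parity: first strip stays local, the rest go to the secondaries."""
    strips = split_into_strips(segment, 1 + len(secondaries))
    primary[name] = strips[0]
    for store, strip in zip(secondaries, strips[1:]):
        store[name] = strip
```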
- Taking the storage in OSDs of streaming data collected by a video monitoring device as an example, the storage format of the data segments in each OSD can be as shown in FIG. 5, which is described as follows:
- The data on the object storage device is organized by virtual storage node (Vnode): according to the number N of Vnodes written, the data is written as objects under N Vnode directories;
- Under each Vnode directory, subdirectories 1 to M are generated, and each object is placed into one of them by a hash algorithm;
- Each object is named and stored according to the video monitoring device identifier (such as the camera ID) + bucket ID + timestamp (such as the start time), so that a subsequent query can look up the corresponding object directly by name;
- The object storage device further includes a storage directory, which may include an index file and a log file. Each object generates a corresponding I-frame index, which is saved in the index file; the log file records the operations performed on the object storage device, such as read records and store records.
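- The directory layout described above can be sketched as a path computation. The `vnode_<n>` directory naming, the SHA-256 based bucketing, and the function name `object_path` are assumptions for illustration; the application only states that a hash algorithm spreads objects over subdirectories 1 to M.

```python
import hashlib
from pathlib import PurePosixPath


def object_path(root: str, vnode: int, object_name: str, subdirs: int) -> PurePosixPath:
    """Return <root>/vnode_<vnode>/<1..subdirs>/<object_name> for an object."""
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    # Map the object name onto one of the subdirectories numbered 1..subdirs.
    subdir = int.from_bytes(digest[:4], "big") % subdirs + 1
    return PurePosixPath(root) / f"vnode_{vnode}" / str(subdir) / object_name
```

- Since the path depends only on the Vnode and the object name, a query that already knows the name can open the object without listing any directory.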
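- Optionally, when the storage devices corresponding to a Vnode change (for example, a device is added, removed, or brought back online after fault recovery), the mapped data needs to be migrated, and the storage system can perform data repair.
- Taking data stored in OSDs as an example, data repair is initiated per Vnode. For example, when an OSD corresponding to the Vnode recovers from a fault and is brought back online, a designated OSD among the OSDs corresponding to the Vnode initiates the data repair process. During this process, reads and writes to the OSD are blocked.
- The steps for data repair can be as follows: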
- Step 1: View the log for the corresponding time point. The designated OSD obtains the list of OSDs that participated during the fault period.
- If the fault-recovered OSD is the primary OSD (that is, the primary storage device described above), one of the secondary OSDs corresponding to the Vnode (that is, a secondary storage device described above) is determined as the designated OSD, and the designated OSD obtains the list of OSDs that participated in storing the Vnode's data segments during the period in which the primary OSD was faulty.
- If the fault-recovered OSD is a secondary OSD, the primary OSD corresponding to the Vnode is determined as the designated OSD, and the designated OSD obtains the list of OSDs that participated in storing the Vnode's data segments during the period in which the secondary OSD was faulty.
- Step 2: Obtain the corresponding log. The designated OSD obtains the storage log.
- The designated OSD can obtain the storage logs for the period in which the other OSD was faulty, to determine what data changed during that period, such as which data was added or deleted.
- Step 3: Obtain the log information recorded as needing repair. The designated OSD obtains the object information that needs to be repaired for each replica.
- After obtaining the storage logs for the period in which the other OSD was faulty, the designated OSD can determine from those logs which objects need to be restored to the OSD that has come back online.
- Step 4: Repair the data according to the log information that needs to be repaired. The specific repair operations can be as follows:
- If the primary OSD is the fault-recovered OSD and objects on the primary OSD have lost data, the primary OSD actively pulls the object data from a secondary OSD that has not failed, and recovers the data locally from the pulled object data.
- For example, when the redundancy mode is replica mode, the primary OSD stores the object data from the secondary OSD as a replica; when the redundancy mode is erasure code mode, the primary OSD performs erasure code calculation on the object data from the secondary OSDs to obtain and store the object data that belongs on the primary OSD.
- If a secondary OSD is the fault-recovered OSD and objects on that secondary OSD have lost data, the primary OSD actively pushes the object data that needs to be repaired to the fault-recovered secondary OSD.
- For example, when the redundancy mode is replica mode, the primary OSD pushes its local object data as a replica to the fault-recovered secondary OSD; when the redundancy mode is erasure code mode, the primary OSD performs erasure code calculation on its locally stored object data and the object data on the other secondary OSDs that have not failed, obtains the object data that belongs on the fault-recovered secondary OSD, and pushes the calculated object data to that OSD.
- If the primary OSD and some of the secondary OSDs are fault-recovered OSDs and are missing object data, the primary OSD first pulls data from a secondary OSD that has not failed and recovers locally, and then pushes the data to the secondary OSDs that need recovery.
- For example, when the redundancy mode is replica mode, the primary OSD stores the object data from the secondary OSD that has not failed as a replica and pushes the replica to the fault-recovered secondary OSDs; when the redundancy mode is erasure code mode, the primary OSD performs erasure code calculation on the object data from the secondary OSDs that have not failed, obtains the object data for the primary OSD and for the fault-recovered secondary OSDs, and pushes the latter to the fault-recovered secondary OSDs.
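The three repair cases in Step 4 can be summarized as a small planning routine. The sketch below covers replica mode only and is illustrative: the function name `plan_repair`, the tuple layout, and the OSD names are hypothetical, and the erasure code calculation is deliberately left out.

```python
from typing import List, Set, Tuple

Step = Tuple[str, str, str]  # (action, source OSD, target OSD)


def plan_repair(primary: str, secondaries: List[str],
                recovered: Set[str]) -> List[Step]:
    """Plan repair steps for one object in replica mode.

    `recovered` holds the OSDs that came back online and are missing data.
    """
    healthy = [osd for osd in secondaries if osd not in recovered]
    steps: List[Step] = []
    if primary in recovered:
        if not healthy:
            raise RuntimeError("no healthy secondary OSD to pull from")
        # The primary first restores itself from a secondary that never failed.
        steps.append(("pull", healthy[0], primary))
    for osd in secondaries:
        if osd in recovered:
            # The primary then pushes the object to each recovered secondary.
            steps.append(("push", primary, osd))
    return steps


print(plan_repair("osd0", ["osd1", "osd2"], {"osd0", "osd2"}))
```

Ordering the pull before any push mirrors the text: a recovered primary must hold a good copy before it can repair the secondaries.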
- In summary, the method provided by this embodiment splits the data in the streaming data by time, and performs hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment to obtain the identifier of the virtual storage node corresponding to the data segment. The corresponding storage device is then determined from the identifier of the virtual storage node. Because the time information of the data segment is taken into account in the hash calculation, a user who later searches the streaming data only needs to provide the time information of the data to be found. The storage system can determine directly from the time information which storage device stores the corresponding data segment, without requiring all storage devices to search for it separately, which reduces the system overhead of querying streaming data and saves system resources.
- FIG. 6 is a flowchart of a data query method provided by an embodiment of the present application. This method can be used in the data storage system shown in FIG. 1. As shown in FIG. 6, taking the query of a data segment in streaming data as an example, the data query method may include:
- Step 601: The access control device receives a query request that includes a target time, where the query request is used to query a data segment in the streaming data.
- After the streaming data has been stored to the storage devices by the method shown in FIG. 4, a user who needs to query the data at a certain time point in the streaming data can directly set or enter the target time in the query interface displayed by the query terminal, and the query terminal can generate a query request that targets the streaming data and includes the target time.
- For example, taking a monitoring video stream generated by a video monitoring device as the streaming data, when the user wants to view the monitoring video from around 7:15 a.m. on September 10, 2017, the user can select the video monitoring device and the date in the query interface and enter the target time "7:15:00". The query terminal can generate a query request containing the identifier of the monitoring video stream and the target time "17-09-10, 7:15:00", and provide the query request to the access control device.
- Step 602: The access control device determines the time information corresponding to the data segment according to the target time, where the time information indicates the time to which the data segment corresponds in the streaming data.
- After obtaining the query request, the access control device may determine, according to the target time included in the query request, the time period to which the data segment to be queried corresponds in the streaming data.
- The method by which the access control device determines this time period corresponds to the method, in the embodiment shown in FIG. 4, of splitting data segments according to the time of each unit of data in the streaming data.
- For example, assume that in the embodiment shown in FIG. 4 the access control device cuts a segment whenever the timestamp of the current video frame crosses an hour or half-hour boundary. For the query request with the target time "17-09-10, 7:15:00", the access control device then determines that the time period of the data segment to be queried is 7:00:00 to 7:29:59 on the morning of September 10, 2017.
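Mapping a target time to its segment period is a floor operation on the segment length. A minimal sketch, assuming fixed half-hour segments aligned to midnight (the function name `segment_window` is hypothetical):

```python
from datetime import datetime, timedelta
from typing import Tuple

SEGMENT = timedelta(minutes=30)  # fixed segment length from the example


def segment_window(target: datetime,
                   segment: timedelta = SEGMENT) -> Tuple[datetime, datetime]:
    """Return the (start, end) of the segment that contains `target`."""
    midnight = target.replace(hour=0, minute=0, second=0, microsecond=0)
    index = (target - midnight) // segment  # whole segments since midnight
    start = midnight + index * segment
    end = start + segment - timedelta(seconds=1)
    return start, end


start, end = segment_window(datetime(2017, 9, 10, 7, 15, 0))
print(start.time(), end.time())  # 07:00:00 07:29:59
```

The start time returned here is the same value the storage path would have hashed, which is what allows the query path to reach the same virtual storage node.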
- Step 603: The access control device performs hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment, and obtains the identifier of the virtual storage node corresponding to the data segment.
- Step 604: The access control device determines, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment.
- Step 605: The access control device queries the data segment from the at least one storage device corresponding to the data segment.
- After determining the at least one storage device that holds the data segment, the access control device may query the determined storage device for the data segment; for example, it may query the primary storage device among the determined storage devices.
- When a data segment in the streaming data is queried, the access control device can determine by a hash algorithm the at least one storage device that stores the data segment, and query only those devices. The process does not need to query every storage device, which greatly reduces the system overhead of querying streaming data and saves system resources.
- In summary, in the method provided by this embodiment, the data in the streaming data is split by time, and the identifier of the virtual storage node corresponding to a data segment is obtained by hash calculation on the identifier of the streaming data and the time information of the data segment; the storage device is then determined from the identifier of the virtual storage node. Because the hash calculation takes the time information into account, a user only needs to provide the time information of the data to be found, and the storage system can determine the storage device holding the corresponding data segment directly, without all storage devices searching separately. This reduces the system overhead of querying streaming data and saves system resources.
- FIG. 7 shows a block diagram of a data storage apparatus provided by an embodiment of the present application.
- The apparatus may be implemented in hardware, or a combination of hardware and software, as part or all of the access control device 130 of the data storage system shown in FIG. 1, to perform all or part of the steps performed by the access control device in FIG. 2 or FIG. 4.
- the device can include:
- the segmentation module 701 is configured to segment the data in the streaming data according to the time corresponding to the data in the streaming data to obtain a data segment.
- the calculation module 702 is configured to perform hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment, to obtain an identifier of the virtual storage node corresponding to the data segment;
- the device determining module 703 is configured to determine, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment;
- the storage module 704 is configured to store the data segment to at least one storage device corresponding to the data segment.
- Optionally, the calculation module includes:
- a calculating unit configured to perform hash calculation separately on the identifier of the streaming data and on the time information corresponding to the data segment according to a consistent hashing algorithm, to obtain a hash value of the identifier of the streaming data and a hash value of the time information corresponding to the data segment;
- an identifier obtaining unit configured to obtain, according to the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment, the identifier of the virtual storage node corresponding to the data segment.
- Optionally, the identifier obtaining unit is configured to take the sum of the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment as the serial number of the virtual storage node corresponding to the data segment.
- Optionally, the device determining module is configured to determine, according to the identifier of the virtual storage node corresponding to the data segment, the at least one storage device corresponding to the data segment by using a pseudo-hash algorithm.
- Optionally, the device determining module includes:
- a policy obtaining unit configured to acquire a redundancy policy corresponding to the streaming data, where the redundancy policy indicates the storage device interval in which each group of redundant data corresponding to the data segment is located;
- a device determining unit configured to determine, according to the identifier of the virtual storage node corresponding to the data segment and by using the pseudo-hash algorithm, at least one storage device for storing the data segment from each storage device interval in which a group of redundant data corresponding to the data segment is located.
- Optionally, the at least one storage device includes one primary storage device and at least one secondary storage device, and the storage module is configured to store the data segment to the primary storage device, so that the primary storage device synchronizes the data segment to the at least one secondary storage device.
- FIG. 8 is a block diagram of a data query apparatus provided by an embodiment of the present application.
- The apparatus may be implemented in hardware, or a combination of software and hardware, as part or all of the access control device 130 of the data storage system shown in FIG. 1, to perform all or part of the steps performed by the access control device in FIG. 6.
- the device can include:
- a request receiving module 801 configured to receive a query request that includes a target time, where the query request is used to query a data segment in the streaming data;
- the information determining module 802 is configured to determine time information corresponding to the data segment according to the target time, where the time information is used to indicate that the data segment corresponds to a time in the streaming data;
- the calculation module 803 is configured to perform hash calculation according to the identifier of the streaming data and time information corresponding to the data segment, to obtain an identifier of the virtual storage node corresponding to the data segment;
- the device determining module 804 is configured to determine, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment;
- the query module 805 is configured to query the data segment from at least one storage device corresponding to the data segment.
- Optionally, the information determining module is specifically configured to determine the time period in which the target time falls, where the time period is the start-to-end period of the data segment in the streaming data, and to determine the time information corresponding to the data segment according to the time period.
- FIG. 9 shows a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application. The computer device may be implemented as the access control device 130 in the system shown in FIG. 1.
- the computer device includes a processor 91, a communication component 92, a memory 93, and a bus 94.
- the processor 91 includes one or more processing cores, and the processor 91 executes various functions and information processing by running software programs and modules.
- Communication component 92 can include at least one of a wired network interface (such as an Ethernet interface) and a wireless network interface (such as an interface such as WLAN, BLE, ZigBee, etc.).
- the communication component 92 is for modulating and/or demodulating information and receiving or transmitting the information via a wired or wireless signal.
- the memory 93 is connected to the processor 91 via a bus 94.
- Memory 93 can be used to store software programs as well as modules.
- the memory 93 can store an application module 96 for at least one function.
- the processor 91 can implement all or part of the steps performed by the access control device in FIG. 2, FIG. 4 or FIG. 6 by executing the application module 96 described above.
- In addition, the memory 93 can be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
- An embodiment of the present application also provides a non-transitory computer-readable storage medium including instructions, such as a memory including instructions, where the instructions are executable by a processor of a computer device to perform the data storage method or the data query method illustrated in the embodiments of the present application.
- For example, the non-transitory computer-readable storage medium can be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
- modules and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution.
- The disclosed apparatus and method can be implemented in other manners.
- The apparatus embodiments described above are merely illustrative.
- For example, the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
- For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed.
- the modules described as separate components may or may not be physically separated.
- the components displayed as modules may or may not be physical modules, that is, may be located in one place, or may be distributed to multiple network modules. Some or all of the modules may be selected as needed to achieve the objectives of the solution of the embodiment.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
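A data storage method, belonging to the field of data storage technologies. The method includes: splitting data in streaming data to obtain a data segment (201); performing hash calculation according to an identifier of the streaming data and time information corresponding to the data segment, to obtain an identifier of a virtual storage node corresponding to the data segment (202); determining, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment (203); and storing the data segment to the corresponding at least one storage device (204). When a user later searches for data in the streaming data, the user only needs to provide the time information corresponding to the data to be found, and the storage system can determine directly from the time information the storage device that stores the corresponding data segment, which reduces the system overhead of querying streaming data and saves system resources.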
Description
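This application claims priority to Chinese Patent Application No. 201710858464.X, filed on September 21, 2017 and entitled "Data storage method, data query method and apparatus", which is incorporated herein by reference in its entirety.
Embodiments of this application relate to the field of data storage technologies, and in particular to a data storage method, a data query method, an apparatus, a computer device, and a storage medium.
Consistent hashing is an algorithm commonly used in distributed storage schemes that have no central node. It avoids the performance bottlenecks, single points of failure, and data consistency problems that arise when a dedicated metadata service is used to manage data distribution.
When a storage system stores streaming data by consistent hashing, it needs to split the received streaming data into different data segments and perform hash calculation on the name of each data segment to determine the storage device in which that data segment is located.
During the storage of streaming data, the splitting is performed automatically by the storage system, and the storage system also names each data segment, so the user cannot know the name of an individual data segment. When the user searches for a data segment, every storage device in the storage system that stores the streaming data has to search for the data segment locally. As a result, searching for data in streaming data incurs a large overhead and wastes considerable system resources.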
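Summary
To solve the prior-art problem that searching for data in streaming data incurs a large storage system overhead and wastes considerable system resources, this application provides a data storage method, a data query method, an apparatus, a computer device, and a storage medium.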
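According to a first aspect, a data storage method is provided. The method includes:
splitting data in streaming data to obtain a data segment;
performing hash calculation according to an identifier of the streaming data and time information corresponding to the data segment, to obtain an identifier of a virtual storage node corresponding to the data segment, where the time information indicates the time to which the data segment corresponds in the streaming data;
determining, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment; and
storing the data segment to the at least one storage device corresponding to the data segment.
Optionally, the method further includes:
receiving a query request that includes a target time, where the query request is used to query a data segment in the streaming data;
determining, according to the target time, the time information corresponding to the data segment;
performing hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment, to obtain the identifier of the virtual storage node corresponding to the data segment;
determining, according to the identifier of the virtual storage node corresponding to the data segment, the at least one storage device corresponding to the data segment; and
querying the data segment from the at least one storage device corresponding to the data segment.
Optionally, the performing hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment, to obtain the identifier of the virtual storage node corresponding to the data segment, includes:
performing hash calculation separately on the identifier of the streaming data and on the time information corresponding to the data segment according to a consistent hashing algorithm, to obtain a hash value of the identifier of the streaming data and a hash value of the time information corresponding to the data segment; and
obtaining the identifier of the virtual storage node corresponding to the data segment according to the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment.
Optionally, the obtaining the identifier of the virtual storage node corresponding to the data segment according to the two hash values includes:
taking the sum of the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment as the serial number of the virtual storage node corresponding to the data segment.
Optionally, the determining, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment includes:
determining the at least one storage device corresponding to the data segment by calculation with a pseudo-hash algorithm according to the identifier of the virtual storage node corresponding to the data segment.
Optionally, the determining the at least one storage device by calculation with a pseudo-hash algorithm includes:
acquiring a redundancy policy corresponding to the streaming data, where the redundancy policy indicates the storage device interval in which each group of redundant data corresponding to the data segment is located; and
determining by calculation with the pseudo-hash algorithm, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device for storing the data segment from each storage device interval in which a group of redundant data corresponding to the data segment is located.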
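According to a second aspect, a data query method is provided. The method includes:
receiving a query request that includes a target time, where the query request is used to query a data segment in streaming data;
determining, according to the target time, time information corresponding to the data segment, where the time information indicates the time to which the data segment corresponds in the streaming data;
performing hash calculation according to an identifier of the streaming data and the time information corresponding to the data segment, to obtain an identifier of a virtual storage node corresponding to the data segment;
determining, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment; and
querying the data segment from the at least one storage device corresponding to the data segment.
Optionally, the determining, according to the target time, time information corresponding to the data segment includes:
determining the time period in which the target time falls, where the time period is the start-to-end period of the data segment in the streaming data, and determining the time information corresponding to the data segment according to the time period.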
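According to a third aspect, a data storage apparatus is provided. The apparatus includes:
a splitting module, configured to split data in streaming data to obtain a data segment;
a calculation module, configured to perform hash calculation according to an identifier of the streaming data and time information corresponding to the data segment, to obtain an identifier of a virtual storage node corresponding to the data segment, where the time information indicates the time to which the data segment corresponds in the streaming data;
a device determining module, configured to determine, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment; and
a storage module, configured to store the data segment to the at least one storage device corresponding to the data segment.
Optionally, the calculation module includes:
a calculating unit, configured to perform hash calculation separately on the identifier of the streaming data and on the time information corresponding to the data segment according to a consistent hashing algorithm, to obtain a hash value of the identifier of the streaming data and a hash value of the time information corresponding to the data segment; and
an identifier obtaining unit, configured to obtain the identifier of the virtual storage node corresponding to the data segment according to the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment.
Optionally, the identifier obtaining unit is configured to take the sum of the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment as the serial number of the virtual storage node corresponding to the data segment.
Optionally, the device determining module is configured to determine the at least one storage device corresponding to the data segment by calculation with a pseudo-hash algorithm according to the identifier of the virtual storage node corresponding to the data segment.
Optionally, the device determining module includes:
a policy obtaining unit, configured to acquire a redundancy policy corresponding to the streaming data, where the redundancy policy indicates the storage device interval in which each group of redundant data corresponding to the data segment is located; and
a device determining unit, configured to determine by calculation with the pseudo-hash algorithm, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device for storing the data segment from each storage device interval in which a group of redundant data corresponding to the data segment is located.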
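According to a fourth aspect, a data query apparatus is provided. The apparatus includes:
a request receiving module, configured to receive a query request that includes a target time, where the query request is used to query a data segment in streaming data;
an information determining module, configured to determine, according to the target time, time information corresponding to the data segment, where the time information indicates the time to which the data segment corresponds in the streaming data;
a calculation module, configured to perform hash calculation according to an identifier of the streaming data and the time information corresponding to the data segment, to obtain an identifier of a virtual storage node corresponding to the data segment;
a device determining module, configured to determine, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment; and
a query module, configured to query the data segment from the at least one storage device corresponding to the data segment.
Optionally, the information determining module is specifically configured to determine the time period in which the target time falls, where the time period is the start-to-end period of the data segment in the streaming data, and to determine the time information corresponding to the data segment according to the time period.
According to a fifth aspect, a computer device is provided. The computer device includes a processor and a memory, the memory stores instructions, and the processor executes the instructions so that the computer device implements the method according to the first aspect or the second aspect.
According to a sixth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions, and a computer device executes the instructions so that the computer device implements the method according to the first aspect or the second aspect.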
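The technical solutions provided by the embodiments of this application have at least the following beneficial effects:
Streaming data is split into data segments, hash calculation is performed according to the identifier of the streaming data and the time information corresponding to a data segment to obtain the identifier of the virtual storage node corresponding to the data segment, and the corresponding storage device is then determined from the identifier of the virtual storage node. Because the time information of the data segment is taken into account in the hash calculation, a user who later searches the streaming data only needs to provide the time information corresponding to the data to be found. The storage system can determine directly from the time information the storage device that stores the corresponding data segment, without requiring all storage devices that store the streaming data to search separately, which reduces the system overhead of querying streaming data and saves system resources.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. The drawings described below are merely some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.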
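FIG. 1 is an architecture diagram of a data storage system related to this application;
FIG. 2 is a flowchart of a data storage method provided by an embodiment of this application;
FIG. 3 is a diagram of the data storage software architecture related to the embodiment shown in FIG. 2;
FIG. 4 is a flowchart of a data storage method provided by an embodiment of this application;
FIG. 5 is a schematic diagram of a data storage format in a storage device related to the embodiment shown in FIG. 4;
FIG. 6 is a flowchart of a data query method provided by an embodiment of this application;
FIG. 7 is a block diagram of a data storage apparatus provided by an embodiment of this application;
FIG. 8 is a block diagram of a data query apparatus provided by an embodiment of this application;
FIG. 9 is a schematic structural diagram of a computer device provided by an embodiment of this application.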
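To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
It should be understood that "several" herein means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates three possible relationships; for example, A and/or B can mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the associated objects.
FIG. 1 is an architecture diagram of a data storage system related to an embodiment of this application. The data storage system can store streaming data in a distributed manner. It includes the following devices: a data generating device 110, several storage devices 120, and an access control device 130.
Streaming data can be a sequence of data that arrives in order, in large volume, quickly, and continuously. In general, streaming data can be regarded as a dynamic data set that grows as time passes. Typically, streaming data can be a video stream, an audio stream, or another type of data stream. Streaming data is widely used in fields such as network monitoring, sensor networks, aerospace, meteorological measurement and control, and financial services.
In FIG. 1, the data generating device 110 is a device that generates streaming data. For example, when the streaming data is a monitoring video stream, the data generating device 110 can be the surveillance camera that captures the stream; when the streaming data is an audio stream, the data generating device 110 can be the microphone that records it; when the streaming data is the log stream of a network application, the data generating device 110 can be the server of that application. The embodiments of this application do not limit the specific data type of the streaming data or the specific device type of the data generating device 110.
The storage device 120 is used to store streaming data and can take various forms. For example, the storage device 120 can be a magnetic disk or a mechanical hard disk containing magnetic disks, a flash memory or a solid-state drive containing flash memory, or a hybrid drive containing both magnetic disks and flash memory.
In the embodiments of this application, the several storage devices 120 can be deployed in a centralized or distributed manner.
The access control device 130 is used to control the storage and reading of streaming data in the storage devices 120. In the embodiments of this application, the access control device 130 can be set on the user side; for example, it can be a personal computer such as the user's PC or workstation, or a server set up by the user. Alternatively, the access control device 130 can be set on the storage service provider side; for example, it can be a server set up by the storage service provider.
The access control device 130 is connected to each of the several storage devices 120 through a wired or wireless network. Correspondingly, all or some of the several storage devices 120 can also be connected to one another through a wired or wireless network.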
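Referring to FIG. 2, a flowchart of a data storage method provided by an embodiment of this application is shown. The method can be used in the data storage system shown in FIG. 1. As shown in FIG. 2, the data storage method may include:
Step 201: Split the data in the streaming data to obtain a data segment.
Step 202: Perform hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment, to obtain the identifier of the virtual storage node corresponding to the data segment.
The time information indicates the time to which the data segment corresponds in the streaming data.
Step 203: Determine, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment.
Step 204: Store the data segment to the at least one storage device corresponding to the data segment.
In a possible implementation, the above steps can be implemented by management software of the data storage system developed on the basis of an SDK (Software Development Kit). Referring to FIG. 3, the data storage software architecture related to the embodiments of this application is shown. As shown in FIG. 3, the management software developed on the basis of the SDK (denoted SDK in FIG. 3) can split the streaming data in chronological order, determine the identifier of the Vnode (virtual node) in the bucket corresponding to each split data segment, and store the data segment to the corresponding Vnode. The Vnode here is the virtual storage node described above. The Vnode then determines at least one OSD (Object Storage Device) in the hosts that corresponds to the data segment, and stores the data segment in the corresponding OSD.
In summary, in the method provided by the embodiments of this application, streaming data is split into data segments, hash calculation is performed according to the identifier of the streaming data and the time information corresponding to a data segment to obtain the identifier of the corresponding virtual storage node, and the corresponding storage device is then determined from the identifier of the virtual storage node. Because the time information of the data segment is taken into account in the hash calculation, a user who later searches the streaming data only needs to provide the time information corresponding to the data to be found. The storage system can determine directly from the time information the storage device that stores the corresponding data segment, without requiring all storage devices to search separately, which reduces the system overhead of querying streaming data and saves system resources.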
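Referring to FIG. 4, a flowchart of a data storage method provided by an embodiment of this application is shown. The method can be used in the data storage system shown in FIG. 1. As shown in FIG. 4, taking the storage of streaming data sent by a data generating device as an example, the data storage method may include:
Step 401: The access control device receives streaming data sent by the data generating device.
In the embodiments of this application, the data generating device continuously sends the generated data to the access control device in the form of streaming data, and the access control device correspondingly receives it continuously.
For example, take a video monitoring device (such as a surveillance camera) as the data generating device. Because a video monitoring device usually runs without interruption, the video data it collects needs to be stored continuously in the storage system. In this case, the streaming data received by the access control device can be the video stream captured by the video monitoring device.
Step 402: The access control device splits the streaming data according to the time corresponding to each piece of data in the streaming data, to obtain data segments.
Each unit of data in the streaming data usually corresponds to its own time, and the times of two adjacent units are consecutive. In other words, the units of data in the streaming data are arranged in chronological order.
For example, when the streaming data is a video stream captured by a video monitoring device, each frame in the video stream has its own capture time, and the frames are arranged in order of capture time. The capture time of each frame can serve as the time to which that frame corresponds in the streaming data.
In the embodiments of this application, the access control device can split the data in the streaming data into data segments of fixed or non-fixed duration according to the time corresponding to each piece of data.
Specifically, because the data in the streaming data is usually generated and delivered to the access control device continuously, the access control device needs to store the streaming data while receiving it. In the embodiments of this application, the access control device can, according to the time corresponding to each unit of data, cut the data of each successive time interval into one data segment and subsequently store the data segment by segment.
In the embodiments of this application, the duration of each data segment can be fixed. For example, the access control device can cut each half hour of the streaming data into one data segment. Taking a video stream sent by a video monitoring device as an example, while receiving the video stream the access control device checks whether the timestamp of the currently received video frame has crossed an hour or half-hour boundary. If so, the video frames that were received before the current frame and have not yet been split are cut into one video segment (equivalent to the data segment described above).
Alternatively, the durations of the data segments need not be fixed. For example, the access control device can split the data between every two adjacent full hours (such as the data from 6:00:00 to 6:59:59 or from 7:00:00 to 7:59:59) into three data segments, where the first two segments each last 25 minutes and the last one lasts 10 minutes. Or the access control device can split each hour of the streaming data into three data segments lasting 30 minutes, 20 minutes, and 10 minutes respectively.
The embodiments of this application do not limit the specific way in which the access control device splits data segments according to the time corresponding to each piece of data in the streaming data.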
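The fixed half-hour case in Step 402 (cut a segment whenever a frame's timestamp crosses an hour or half-hour boundary) can be sketched as a generator. The `Frame` tuple layout and the names `slot` and `split_stream` are assumptions made for illustration; a live system would keep the last segment open until more frames arrive, whereas this sketch flushes it at the end of the input.

```python
from datetime import datetime
from typing import Iterable, Iterator, List, Tuple

Frame = Tuple[datetime, bytes]  # (capture time, encoded frame): assumed layout


def slot(ts: datetime) -> Tuple[int, int, int, int, int]:
    """Identify the half-hour slot a timestamp falls into."""
    return (ts.year, ts.month, ts.day, ts.hour, ts.minute // 30)


def split_stream(frames: Iterable[Frame]) -> Iterator[List[Frame]]:
    """Yield one list of frames per half-hour slot, in arrival order."""
    pending: List[Frame] = []
    for frame in frames:
        # A change of slot means the current frame crossed a boundary.
        if pending and slot(frame[0]) != slot(pending[-1][0]):
            yield pending
            pending = []
        pending.append(frame)
    if pending:
        yield pending  # flush the tail when the input ends
```

Comparing slots rather than elapsed time keeps segment boundaries aligned to the clock, which is what lets a later query compute the same boundaries from a target time alone.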
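Step 403: The access control device performs hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment, to obtain the identifier of the Vnode (virtual storage node) corresponding to the data segment.
The time information indicates the time to which the data segment corresponds in the streaming data.
Specifically, the access control device can perform hash calculation separately on the identifier of the streaming data and on the time information corresponding to the data segment according to a consistent hashing algorithm, to obtain a hash value of the identifier of the streaming data and a hash value of the time information corresponding to the data segment, and then obtain the identifier of the virtual storage node corresponding to the data segment from the two hash values.
The identifier of the streaming data can be an identifier that uniquely represents the streaming data. Specifically, when the streaming data source device sends only one stream to the access control device, the identifier of the streaming data can be the identifier of the source device. For example, when the source device is a video monitoring device, the identifier of the streaming data can be the number of the video monitoring device (camera ID).
In addition, the time information corresponding to the data segment can be a timestamp of the data segment. Specifically, it can be the start time of the data segment, its end time, its midpoint time, and so on.
In the embodiments of this application, when determining the virtual storage node corresponding to a data segment, the access control device can combine the identifier of the streaming data with the time information of the data segment. On the one hand, this brings the time information of the data segment into the determination of the virtual storage node, which makes later searching by time information convenient. On the other hand, it allows data segments of different streams that correspond to the same time to be assigned to different virtual storage nodes.
Optionally, when obtaining the identifier of the virtual storage node corresponding to the data segment, the access control device can take the sum of the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment as the serial number of the virtual storage node corresponding to the data segment.
As a specific example, suppose the streaming data is a monitoring video stream, the identifier of the streaming data is the number of the video monitoring device (camera ID), and the time information of the data segment is its start time (assumed to be 1:30:00). The access control device performs consistent hash calculation on the camera ID and obtains the hash value 9527, and performs consistent hash calculation on the start time "1:30:00" and obtains the hash value 3. The access control device then determines that the serial number of the virtual storage node corresponding to the data segment is 9527 + 3 = 9530.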
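The sum-of-hashes rule can be sketched as below. Several points are assumptions rather than part of the application: SHA-256 stands in for the unspecified hash function, the sum is wrapped modulo the ring length so that it stays on the ring, and the names `stable_hash`, `combine_hashes`, and `vnode_serial` are hypothetical. The ring length 17568 used in the call is the value from the half-hour, one-year example in this description.

```python
import hashlib


def stable_hash(text: str, ring_length: int) -> int:
    """Hash a string onto the ring [0, ring_length).

    SHA-256 is an assumed stand-in for the hash function.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % ring_length


def combine_hashes(id_hash: int, time_hash: int, ring_length: int) -> int:
    """Sum the two hash values; wrapping modulo ring_length is an assumption."""
    return (id_hash + time_hash) % ring_length


def vnode_serial(stream_id: str, time_info: str, ring_length: int) -> int:
    """Serial number of the virtual storage node for one data segment."""
    return combine_hashes(stable_hash(stream_id, ring_length),
                          stable_hash(time_info, ring_length),
                          ring_length)


# Hash values taken from the worked example: 9527 + 3 = 9530.
print(combine_hashes(9527, 3, 17568))
```

Because both inputs are known to a client at query time (the stream identifier and a time derived from the target time), the same serial number can be recomputed without any metadata lookup.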
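In the embodiments of this application, the virtual storage node is a virtual concept introduced for the convenience of system processing. For a consistent hashing algorithm, the number of virtual storage nodes can correspond to the length of the hash ring.
The length of the hash ring can be the ratio of the retention period of the streaming data to the duration of a single data segment. For example, if the retention period is one year (that is, the storage system by default keeps the most recent year of the streaming data) and each data segment lasts half an hour, the length of the hash ring can be 366 * 24 * 2 = 17568. Because some years are leap years, the number of days in a year is taken as the maximum, 366. The hash ring therefore has a length of 17568, and the number of virtual storage nodes is also 17568.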
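The ring length is simply the retention period divided by the segment duration. A small sketch of that arithmetic, with the hypothetical helper name `ring_length`:

```python
def ring_length(retention_days: int, segment_minutes: int) -> int:
    """Number of segments (and virtual storage nodes) in one retention period."""
    total_minutes = retention_days * 24 * 60
    if total_minutes % segment_minutes:
        raise ValueError("segment duration must divide the retention period")
    return total_minutes // segment_minutes


print(ring_length(366, 30))  # 17568, as in the example above
```

Using 366 days keeps the ring large enough for leap years, at the cost of a few unused slots in ordinary years.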
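In the embodiments of this application, after obtaining a data segment by splitting, the access control device can also set the name (that is, the file name) of the data segment automatically. The name of the data segment can include the identifier of the streaming data, the time information, and the identifier of the virtual storage node. For example, for the monitoring video stream generated by the video monitoring device, the name of a video segment can be: camera ID + start time + Vnode serial number.
Step 404: The access control device determines, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment.
In the embodiments of this application, the access control device can determine the at least one storage device corresponding to the data segment by calculation with a pseudo-hash algorithm according to the identifier of the virtual storage node corresponding to the data segment.
For example, if the storage system of the embodiments is a distributed file system, the access control device can use the CRUSH (Controlled Replication Under Scalable Hashing) algorithm to determine by calculation, from the identifier of the virtual storage node corresponding to the data segment, the at least one storage device corresponding to the data segment.
Optionally, the access control device can acquire the redundancy policy corresponding to the streaming data, where the redundancy policy indicates the storage device interval in which each group of redundant data corresponding to the data segment is located, and then, according to the identifier of the virtual storage node corresponding to the data segment, determine by calculation with the pseudo-hash algorithm at least one storage device for storing the data segment from each such storage device interval.
The redundant data of a data segment refers to the multiple groups of data that correspond to the data segment and are stored on different storage devices. In the data storage field, storing a data segment as multiple groups of redundant data on different storage devices prevents the data segment from being lost when some of those storage devices fail.
The storage devices can be divided into storage device intervals at different levels. For example, the storage devices can be distributed across several machine rooms, each machine room contains several storage hosts, each storage host corresponds to several storage device groups, and each storage device group contains several storage devices. During data storage, the user or the system can specify a redundancy policy, under which one piece of data is stored on several different storage devices at the same time, so that the data is not lost when one storage device fails. The redundancy policy can indicate the level at which redundancy is applied. For example, each replica of the same data can be stored in a different machine room, in a different host within one machine room, in a different storage device group within one host, or in a different storage device within one storage device group.
When determining storage devices, the access control device can determine at least one storage device according to the redundancy policy. For example, when the redundancy policy stores each group of redundant data of a data segment in a different machine room, the access control device can determine (for example, by a pseudo-hash algorithm such as CRUSH) one storage device from all the storage devices in each machine room as a storage device for the current data segment. The number of storage devices determined for the data segment equals the number of machine rooms.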
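Selecting one device per machine room can be illustrated with a deterministic, hash-based choice. The sketch below uses rendezvous (highest-random-weight) hashing as a simple stand-in; it is not the CRUSH algorithm named above, and the room names, device names, and the functions `pick_device` and `place_segment` are hypothetical.

```python
import hashlib
from typing import Dict, List


def pick_device(vnode: int, domain: str, devices: List[str]) -> str:
    """Pick one device in a failure domain; the highest hash score wins."""
    def score(device: str) -> int:
        key = f"{vnode}:{domain}:{device}".encode("utf-8")
        return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return max(devices, key=score)


def place_segment(vnode: int, rooms: Dict[str, List[str]]) -> List[str]:
    """Return one storage device per machine room for the given Vnode."""
    return [pick_device(vnode, room, devices)
            for room, devices in sorted(rooms.items())]


rooms = {"room-a": ["a1", "a2", "a3"], "room-b": ["b1", "b2"]}
print(place_segment(9530, rooms))
```

The property that matters for this scheme is determinism: the same Vnode identifier and the same device map always yield the same devices, so the storage path and the query path agree without consulting a central index.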
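Step 405: The access control device stores the data segment to the at least one storage device corresponding to the data segment.
The at least one storage device includes one primary storage device and at least one secondary storage device. The access control device can store the data segment to the primary storage device, so that the primary storage device generates each group of redundant data of the data segment and synchronizes each group to the at least one secondary storage device.
Specifically, the access control device can set the first of the determined storage devices as the primary storage device, and all calculations in the storage process are completed by the primary storage device. In replica mode, the primary storage device sends the replica data to the secondary storage devices. In erasure code mode, the primary storage device first splits the data into stripes, saves the first stripe locally, and sends the other stripes to the secondary storage devices. The storage operation is considered successful only after the data has been written to all of the at least one storage device.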
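The difference between the two redundancy modes on the primary can be sketched as follows. This is a deliberately simplified illustration: a single XOR parity stripe stands in for whatever erasure code the system actually uses, zero padding is an assumption, and the function names are hypothetical. Index 0 of the returned list is what the primary keeps locally; the rest would be sent to the secondary devices.

```python
from typing import List


def split_stripes(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-length stripes, zero-padding the tail."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together (single-parity erasure code)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def primary_write(data: bytes, n_devices: int, mode: str) -> List[bytes]:
    """Return the block each device stores; index 0 is the primary."""
    if mode == "replica":
        return [data] * n_devices
    if mode == "erasure":
        if n_devices < 2:
            raise ValueError("erasure mode needs at least two devices")
        stripes = split_stripes(data, n_devices - 1)
        return stripes + [xor_blocks(stripes)]
    raise ValueError(f"unknown redundancy mode: {mode}")


blocks = primary_write(b"abcdefgh", 3, "erasure")
# A lost stripe is rebuilt by XOR-ing the surviving blocks.
print(xor_blocks([blocks[1], blocks[2]]))
```

With single parity, any one missing block can be rebuilt from the others, which is the same calculation a recovering OSD would rely on during repair in erasure code mode.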
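Taking as an example the case where streaming data collected by a video monitoring device is stored in OSDs, the storage format of the data segments in each OSD can be as shown in FIG. 5, which is described as follows:
1. The data on the object storage device is organized by the number of virtual storage nodes (Vnodes) written, and is written into objects within N virtual storage nodes;
2. Under the directory of each virtual storage node, 1 to M virtual storage node sub-directories are generated, and each object is placed into one of these directories by a hash algorithm;
3. Each object is named and stored in the form video monitoring device identifier (such as camera ID) + bucket identifier (bucket ID) + timestamp (such as the start time point), so that a later query can look up the corresponding object name directly;
4. The object storage device further contains a storage directory, which can include an index file, a log file, and the like. Each object generates a corresponding I-frame index, which is saved in the index file. The log file records the operations performed on the object storage device, such as read records and store records.
Optionally, in the embodiments of this application, when the storage devices corresponding to a Vnode change (for example, a device is added, removed, or brought back online after fault recovery), the mapped data needs to be migrated, and the storage system can perform data repair.
Taking data stored in OSDs as an example, data repair is initiated per Vnode. For example, when an OSD corresponding to the Vnode recovers from a fault and is brought back online, a designated OSD among the OSDs corresponding to the Vnode initiates the data repair process, during which reads and writes to the OSD are blocked. The steps for data repair can be as follows: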
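Step 1: View the log for the corresponding time point. The designated OSD obtains the list of OSDs that participated during the fault period.
If the fault-recovered OSD is the primary OSD (that is, the primary storage device described above), one of the secondary OSDs corresponding to the Vnode (that is, a secondary storage device described above) is determined as the designated OSD, and the designated OSD obtains the list of OSDs that participated in storing the Vnode's data segments during the period in which the primary OSD was faulty.
If the fault-recovered OSD is a secondary OSD, the primary OSD corresponding to the Vnode is determined as the designated OSD, and the designated OSD obtains the list of OSDs that participated in storing the Vnode's data segments during the period in which the secondary OSD was faulty.
Step 2: Obtain the corresponding log. The designated OSD obtains the storage log.
The designated OSD can obtain the storage logs for the period in which the other OSD was faulty, to determine what data changed during that period, for example which data was added or deleted.
Step 3: Obtain the log information recorded as needing repair. The designated OSD obtains the object information that needs to be repaired for each replica.
After obtaining the storage logs for the period in which the other OSD was faulty, the designated OSD can determine from those logs which objects need to be restored to the OSD that has come back online.
Step 4: Repair the data according to the log information that needs to be repaired. The specific repair operations can be as follows:
If the primary OSD is the fault-recovered OSD and objects on the primary OSD have lost data, the primary OSD actively pulls the object data from a secondary OSD that has not failed, and recovers the data locally from the pulled object data.
For example, when the redundancy mode is replica mode, the primary OSD stores the object data from the secondary OSD as a replica; when the redundancy mode is erasure code mode, the primary OSD performs erasure code calculation on the object data from the secondary OSDs to obtain and store the object data that belongs on the primary OSD.
If a secondary OSD is the fault-recovered OSD and objects on that secondary OSD have lost data, the primary OSD actively pushes the object data that needs to be repaired to the fault-recovered secondary OSD.
For example, when the redundancy mode is replica mode, the primary OSD pushes its local object data as a replica to the fault-recovered secondary OSD; when the redundancy mode is erasure code mode, the primary OSD performs erasure code calculation on its locally stored object data and the object data on the other secondary OSDs that have not failed, obtains the object data that belongs on the fault-recovered secondary OSD, and pushes the calculated object data to that OSD.
If the primary OSD and some of the secondary OSDs are fault-recovered OSDs and are missing object data, the primary OSD first pulls data from a secondary OSD that has not failed and recovers locally, and then pushes the data to the secondary OSDs that need recovery.
For example, when the redundancy mode is replica mode, the primary OSD stores the object data from the secondary OSD that has not failed as a replica and pushes the replica to the fault-recovered secondary OSDs; when the redundancy mode is erasure code mode, the primary OSD performs erasure code calculation on the object data from the secondary OSDs that have not failed, obtains the object data for the primary OSD and for the fault-recovered secondary OSDs, and pushes the latter to the fault-recovered secondary OSDs.
In summary, in the method provided by the embodiments of this application, the data in the streaming data is split by time, hash calculation is performed according to the identifier of the streaming data and the time information corresponding to a data segment to obtain the identifier of the corresponding virtual storage node, and the corresponding storage device is then determined from the identifier of the virtual storage node. Because the time information of the data segment is taken into account in the hash calculation, a user who later searches the streaming data only needs to provide the time information corresponding to the data to be found. The storage system can determine directly from the time information the storage device that stores the corresponding data segment, without requiring all storage devices to search separately, which reduces the system overhead of querying streaming data and saves system resources.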
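Referring to FIG. 6, a flowchart of a data query method provided by an embodiment of this application is shown. The method can be used in the data storage system shown in FIG. 1. As shown in FIG. 6, taking the query of a data segment in streaming data as an example, the data query method may include:
Step 601: The access control device receives a query request that includes a target time, where the query request is used to query a data segment in the streaming data.
In the embodiments of this application, after the streaming data has been stored to the storage devices by the method shown in FIG. 4, a user who needs to query the data at a certain time point in the streaming data can directly set or enter the target time in the query interface displayed by the query terminal, and the query terminal can generate a query request that targets the streaming data and includes the target time.
For example, taking a monitoring video stream generated by a video monitoring device as the streaming data, when the user wants to view the monitoring video from around 7:15 a.m. on September 10, 2017, the user can select the video monitoring device and the date in the query interface and enter the target time "7:15:00". The query terminal can generate a query request containing the identifier of the monitoring video stream and the target time "17-09-10, 7:15:00", and provide the query request to the access control device.
Step 602: The access control device determines the time information corresponding to the data segment according to the target time, where the time information indicates the time to which the data segment corresponds in the streaming data.
After obtaining the query request, the access control device can determine, according to the target time included in the query request, the time period to which the data segment to be queried corresponds in the streaming data.
The method by which the access control device determines this time period corresponds to the method, in the embodiment shown in FIG. 4, of splitting data segments according to the time of each unit of data in the streaming data.
For example, assume that in the embodiment shown in FIG. 4, when the access control device detects that the timestamp of the currently received video frame has crossed an hour or half-hour boundary, it cuts the video frames that were received before the current frame and have not yet been split into one video segment. Then, for the query request with the target time "17-09-10, 7:15:00", the access control device determines that the time period of the data segment to be queried is 7:00:00 to 7:29:59 on the morning of September 10, 2017.
Step 603: The access control device performs hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment, to obtain the identifier of the virtual storage node corresponding to the data segment.
Step 604: The access control device determines, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment.
Steps 603 and 604 are implemented similarly to steps 403 and 404 in the embodiment shown in FIG. 4. For details, see the description of that embodiment, which is not repeated here.
Step 605: The access control device queries the data segment from the at least one storage device corresponding to the data segment.
In the embodiments of this application, after determining the at least one storage device that holds the data segment to be queried, the access control device can query the determined storage device for the data segment; for example, it can query the primary storage device among the determined storage devices.
In the embodiments of this application, when a data segment in the streaming data is queried, the user only needs to provide the identifier of the data segment to be queried and the identifier of the streaming data. The access control device can then determine by a hash algorithm the at least one storage device that stores the data segment, and query only those devices. The process does not need to query every storage device, which greatly reduces the system overhead of querying streaming data and saves system resources.
In summary, in the method provided by the embodiments of this application, the data in the streaming data is split by time, hash calculation is performed according to the identifier of the streaming data and the time information corresponding to a data segment to obtain the identifier of the corresponding virtual storage node, and the corresponding storage device is then determined from the identifier of the virtual storage node. Because the time information of the data segment is taken into account in the hash calculation, a user who later searches the streaming data only needs to provide the time information corresponding to the data to be found. The storage system can determine directly from the time information the storage device that stores the corresponding data segment, without requiring all storage devices to search separately, which reduces the system overhead of querying streaming data and saves system resources.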
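Refer to FIG. 7, which shows a block diagram of a data storage apparatus provided by an embodiment of this application. The apparatus may be implemented, in hardware or in a combination of software and hardware, as part or all of the access control device 110 of the data storage system shown in FIG. 1, to perform all or some of the steps performed by the access control device in FIG. 2 or FIG. 4. The apparatus may include:
a segmentation module 701, configured to segment the data in streaming data according to the time corresponding to the data in the streaming data, to obtain a data segment;
a calculation module 702, configured to perform a hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment, to obtain the identifier of the virtual storage node corresponding to the data segment;
a device determining module 703, configured to determine, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment; and
a storage module 704, configured to store the data segment to the at least one storage device corresponding to the data segment.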
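Optionally, the calculation module includes:
a calculation unit, configured to hash the identifier of the streaming data and the time information corresponding to the data segment separately according to a consistent hashing algorithm, to obtain a hash value of the identifier of the streaming data and a hash value of the time information corresponding to the data segment; and
an identifier obtaining unit, configured to obtain the identifier of the virtual storage node corresponding to the data segment according to the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment.
Optionally, the identifier obtaining unit is configured to take the sum of the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment as the sequence number of the virtual storage node corresponding to the data segment.
Optionally, the device determining module is configured to determine, by calculation with a pseudo-hash algorithm and according to the identifier of the virtual storage node corresponding to the data segment, the at least one storage device corresponding to the data segment.
Optionally, the device determining module includes:
a policy obtaining unit, configured to obtain a redundancy policy corresponding to the streaming data, where the redundancy policy indicates the storage device range in which each group of redundant data corresponding to the data segment is located; and
a device determining unit, configured to determine by calculation, using the pseudo-hash algorithm and according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device for storing the data segment from each of the storage device ranges in which the groups of redundant data corresponding to the data segment are located.
Optionally, the at least one storage device includes one primary storage device and at least one secondary storage device.
The storage module is configured to store the data segment to the primary storage device, so that the primary storage device synchronizes the data segment to the at least one secondary storage device.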
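One way to picture the device determining unit is a deterministic pick of one device from each storage device range named by the redundancy policy. The sketch below is an assumption-laden illustration: it substitutes a SHA-256 based selection for the pseudo-hash algorithm, which this section does not specify, and the names `pick_device`, `devices_for_segment`, and the `osd-*` labels are hypothetical.

```python
import hashlib
from typing import List, Sequence


def pick_device(vnode_id: int, group_index: int, devices: Sequence[str]) -> str:
    """Deterministically choose one device from a storage device range."""
    seed = ("%d:%d" % (vnode_id, group_index)).encode("utf-8")
    index = int.from_bytes(hashlib.sha256(seed).digest()[:8], "big") % len(devices)
    return devices[index]


def devices_for_segment(vnode_id: int, policy: Sequence[Sequence[str]]) -> List[str]:
    """Choose one device per redundancy group; `policy` lists each group's range."""
    return [pick_device(vnode_id, i, rng) for i, rng in enumerate(policy)]


# Hypothetical redundancy policy: three groups, each confined to its own range.
policy = [
    ["osd-0", "osd-1", "osd-2"],
    ["osd-3", "osd-4", "osd-5"],
    ["osd-6", "osd-7"],
]
placement = devices_for_segment(17, policy)
```

The same virtual node identifier always yields the same placement, and each chosen device lies inside the range assigned to its redundancy group, so replicas can be kept in separate failure domains.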
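Refer to FIG. 8, which shows a block diagram of a data query apparatus provided by an embodiment of this application. The apparatus may be implemented, in hardware or in a combination of software and hardware, as part or all of the access control device 110 of the data storage system shown in FIG. 1, to perform all or some of the steps performed by the access control device in FIG. 6. The apparatus may include:
a request receiving module 801, configured to receive a query request that contains a target time, where the query request is used to query a data segment in streaming data;
an information determining module 802, configured to determine, according to the target time, the time information corresponding to the data segment, where the time information indicates the time to which the data segment corresponds in the streaming data;
a calculation module 803, configured to perform a hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment, to obtain the identifier of the virtual storage node corresponding to the data segment;
a device determining module 804, configured to determine, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment; and
a query module 805, configured to query the data segment from the at least one storage device corresponding to the data segment.
Optionally, the information determining module is specifically configured to determine the time period in which the target time falls, where the time period is the start-to-end period of the data segment in the streaming data, and to determine the time information corresponding to the data segment according to the time period.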
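Refer to FIG. 9, which shows a schematic structural diagram of a computer device provided by an exemplary embodiment of this application. The computer device may be implemented as the access control device 130 in the system shown in FIG. 1. The computer device includes a processor 91, a communication component 92, a memory 93, and a bus 94.
The processor 91 includes one or more processing cores. The processor 91 performs various functions and information processing by running software programs and modules.
The communication component 92 may include at least one of a wired network interface (such as an Ethernet interface) and a wireless network interface (such as a WLAN, BLE, or ZigBee interface). The communication component 92 is used to modulate and/or demodulate information and to receive or send the information over wired or wireless signals.
The memory 93 is connected to the processor 91 through the bus 94.
The memory 93 may be used to store software programs and modules.
The memory 93 may store an application module 96 required by at least one function. The processor 91 may execute the application module 96 to implement all or some of the steps performed by the access control device in FIG. 2, FIG. 4, or FIG. 6.
In addition, the memory 93 may be implemented by any type of volatile or non-volatile storage device or a combination of them, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
An embodiment of this application further provides a non-transitory computer-readable storage medium that includes instructions, for example a memory that includes instructions. The instructions can be executed by the processor of a computer device to carry out the data storage method or the data query method shown in the embodiments of this application. For example, the non-transitory computer-readable storage medium may be a ROM, a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.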
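A person of ordinary skill in the art will appreciate that the modules and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution.
A person of ordinary skill in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the apparatuses and modules described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into modules may be only a division by logical function, and other divisions are possible: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed.
The modules described as separate parts may or may not be physically separate, and the parts shown as modules may or may not be physical modules. They may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected as needed to achieve the purpose of the solution of this embodiment.
The foregoing is only a specific implementation of this application, and the protection scope of this application is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.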
Claims (18)
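1. A data storage method, wherein the method comprises: segmenting data in streaming data to obtain a data segment; performing a hash calculation according to an identifier of the streaming data and time information corresponding to the data segment to obtain an identifier of a virtual storage node corresponding to the data segment, wherein the time information indicates the time to which the data segment corresponds in the streaming data; determining, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment; and storing the data segment to the at least one storage device corresponding to the data segment.
2. The method according to claim 1, wherein the performing a hash calculation according to the identifier of the streaming data and the time information corresponding to the data segment to obtain the identifier of the virtual storage node corresponding to the data segment comprises: hashing the identifier of the streaming data and the time information corresponding to the data segment separately according to a consistent hashing algorithm, to obtain a hash value of the identifier of the streaming data and a hash value of the time information corresponding to the data segment; and obtaining the identifier of the virtual storage node corresponding to the data segment according to the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment.
3. The method according to claim 2, wherein the obtaining the identifier of the virtual storage node corresponding to the data segment according to the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment comprises: taking the sum of the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment as the sequence number of the virtual storage node corresponding to the data segment.
4. The method according to claim 1, wherein the determining, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment comprises: determining, by calculation with a pseudo-hash algorithm and according to the identifier of the virtual storage node corresponding to the data segment, the at least one storage device corresponding to the data segment.
5. The method according to claim 4, wherein the determining, by calculation with a pseudo-hash algorithm and according to the identifier of the virtual storage node corresponding to the data segment, the at least one storage device corresponding to the data segment comprises: obtaining a redundancy policy corresponding to the streaming data, wherein the redundancy policy indicates the storage device range in which each group of redundant data corresponding to the data segment is located; and determining by calculation, using the pseudo-hash algorithm and according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device for storing the data segment from each of the storage device ranges in which the groups of redundant data corresponding to the data segment are located.
6. A data query method, wherein the method comprises: receiving a query request that contains a target time, wherein the query request is used to query a data segment in streaming data; determining, according to the target time, time information corresponding to the data segment, wherein the time information indicates the time to which the data segment corresponds in the streaming data; performing a hash calculation according to an identifier of the streaming data and the time information corresponding to the data segment to obtain an identifier of a virtual storage node corresponding to the data segment; determining, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment; and querying the data segment from the at least one storage device corresponding to the data segment.
7. The method according to claim 6, wherein the determining, according to the target time, time information corresponding to the data segment comprises: determining the time period in which the target time falls, wherein the time period is the start-to-end period of the data segment in the streaming data, and determining the time information corresponding to the data segment according to the time period.
8. A data storage apparatus, wherein the apparatus comprises: a segmentation module, configured to segment data in streaming data to obtain a data segment; a calculation module, configured to perform a hash calculation according to an identifier of the streaming data and time information corresponding to the data segment to obtain an identifier of a virtual storage node corresponding to the data segment, wherein the time information indicates the time to which the data segment corresponds in the streaming data; a device determining module, configured to determine, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment; and a storage module, configured to store the data segment to the at least one storage device corresponding to the data segment.
9. The apparatus according to claim 8, wherein the calculation module comprises: a calculation unit, configured to hash the identifier of the streaming data and the time information corresponding to the data segment separately according to a consistent hashing algorithm, to obtain a hash value of the identifier of the streaming data and a hash value of the time information corresponding to the data segment; and an identifier obtaining unit, configured to obtain the identifier of the virtual storage node corresponding to the data segment according to the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment.
10. The apparatus according to claim 9, wherein the identifier obtaining unit is configured to take the sum of the hash value of the identifier of the streaming data and the hash value of the time information corresponding to the data segment as the sequence number of the virtual storage node corresponding to the data segment.
11. The apparatus according to claim 8, wherein the device determining module is configured to determine, by calculation with a pseudo-hash algorithm and according to the identifier of the virtual storage node corresponding to the data segment, the at least one storage device corresponding to the data segment.
12. The apparatus according to claim 11, wherein the device determining module comprises: a policy obtaining unit, configured to obtain a redundancy policy corresponding to the streaming data, wherein the redundancy policy indicates the storage device range in which each group of redundant data corresponding to the data segment is located; and a device determining unit, configured to determine by calculation, using the pseudo-hash algorithm and according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device for storing the data segment from each of the storage device ranges in which the groups of redundant data corresponding to the data segment are located.
13. A data query apparatus, wherein the apparatus comprises: a request receiving module, configured to receive a query request that contains a target time, wherein the query request is used to query a data segment in streaming data; an information determining module, configured to determine, according to the target time, time information corresponding to the data segment, wherein the time information indicates the time to which the data segment corresponds in the streaming data; a calculation module, configured to perform a hash calculation according to an identifier of the streaming data and the time information corresponding to the data segment to obtain an identifier of a virtual storage node corresponding to the data segment; a device determining module, configured to determine, according to the identifier of the virtual storage node corresponding to the data segment, at least one storage device corresponding to the data segment; and a query module, configured to query the data segment from the at least one storage device corresponding to the data segment.
14. The apparatus according to claim 13, wherein the information determining module is specifically configured to determine the time period in which the target time falls, wherein the time period is the start-to-end period of the data segment in the streaming data, and to determine the time information corresponding to the data segment according to the time period.
15. A computer device, wherein the computer device comprises a processor and a memory, the memory stores instructions, and the processor executes the instructions so that the computer device implements the data storage method according to any one of claims 1 to 5.
16. A computer device, wherein the computer device comprises a processor and a memory, the memory stores instructions, and the processor executes the instructions so that the computer device implements the data query method according to claim 6 or 7.
17. A computer-readable storage medium, wherein the computer-readable storage medium stores instructions, and a computer device executes the instructions so that the computer device implements the data storage method according to any one of claims 1 to 5.
18. A computer-readable storage medium, wherein the computer-readable storage medium stores instructions, and a computer device executes the instructions so that the computer device implements the data query method according to claim 6 or 7.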
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710858464.XA CN109542330B (zh) | 2017-09-21 | 2017-09-21 | 数据存储方法、数据查询方法及装置 |
CN201710858464.X | 2017-09-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019057081A1 (zh) | 2019-03-28 |
Family
ID=65811112
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/106495 WO2019057081A1 (zh) | 2017-09-21 | 2018-09-19 | 数据存储方法、数据查询方法、计算机设备及存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109542330B (zh) |
WO (1) | WO2019057081A1 (zh) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110275873A (zh) * | 2019-06-28 | 2019-09-24 | 重庆紫光华山智安科技有限公司 | 文件存储方法、装置、存储管理设备及存储介质 |
CN110336891A (zh) * | 2019-07-24 | 2019-10-15 | 中南民族大学 | 缓存数据分布方法、设备、存储介质及装置 |
CN111093094A (zh) * | 2019-12-03 | 2020-05-01 | 深圳市万佳安物联科技股份有限公司 | 视频转码方法、装置、系统及电子设备及可读存储介质 |
CN111263183A (zh) * | 2020-02-26 | 2020-06-09 | 腾讯音乐娱乐科技(深圳)有限公司 | 唱歌状态识别方法及装置 |
CN111400322B (zh) * | 2020-03-25 | 2023-10-03 | 抖音视界有限公司 | 用于存储数据的方法、装置、电子设备和介质 |
CN112015561B (zh) * | 2020-09-16 | 2024-07-30 | 支付宝(杭州)信息技术有限公司 | 用于流式计算服务的方法、装置和系统 |
CN113194117A (zh) * | 2021-03-22 | 2021-07-30 | 海南视联通信技术有限公司 | 一种基于视联网的数据处理的方法和装置 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103561057A (zh) * | 2013-10-15 | 2014-02-05 | 深圳清华大学研究院 | 基于分布式哈希表和纠删码的数据存储方法 |
CN104881481A (zh) * | 2015-06-03 | 2015-09-02 | 安科智慧城市技术(中国)有限公司 | 一种存取海量时间序列数据的方法及装置 |
CN105243140A (zh) * | 2015-10-10 | 2016-01-13 | 中国科学院软件研究所 | 一种面向高速列车实时监控的海量数据管理方法 |
CN107154957A (zh) * | 2016-12-29 | 2017-09-12 | 贵州电网有限责任公司铜仁供电局 | 基于虚拟环负载均衡算法的分布式存储控制方法 |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106357452B (zh) * | 2016-09-29 | 2019-06-04 | 上海和付信息技术有限公司 | 一种单点异构数据存储的高可用框架系统及其实现方法 |
-
2017
- 2017-09-21 CN CN201710858464.XA patent/CN109542330B/zh active Active
-
2018
- 2018-09-19 WO PCT/CN2018/106495 patent/WO2019057081A1/zh active Application Filing
Non-Patent Citations (2)
Title |
---|
YU, YANG: "How to Put Eggs into Different Baskets?An Introduction to Data Distribution Algorithms in Distributed Storage", BOCLOUD, 24 March 2017 (2017-03-24), pages 1 - 3, XP055584284, Retrieved from the Internet <URL:http://www.bocloud.com.cn/news/show-201.html> * |
ZHANG, YOUDONG: "bout MongoDB Sharding, Something you Should Know", MONGODB SHARDING, 7 September 2016 (2016-09-07), pages 2 - 3, XP055584293, Retrieved from the Internet <URL:http://www.mongoing.com/archives/3397> * |
Also Published As
Publication number | Publication date |
---|---|
CN109542330B (zh) | 2020-11-10 |
CN109542330A (zh) | 2019-03-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019057081A1 (zh) | 数据存储方法、数据查询方法、计算机设备及存储介质 | |
WO2021003985A1 (zh) | 区块链数据归档存储方法、装置、计算机设备和存储介质 | |
WO2019154394A1 (zh) | 分布式数据库集群系统、数据同步方法及存储介质 | |
US8260742B2 (en) | Data synchronization and consistency across distributed repositories | |
US9690823B2 (en) | Synchronizing copies of an extent in an append-only storage system | |
CN109739929A (zh) | 数据同步方法、装置及系统 | |
WO2014101424A1 (zh) | 分布式数据库同步方法和系统 | |
US9547706B2 (en) | Using colocation hints to facilitate accessing a distributed data storage system | |
TWI609277B (zh) | 與位置獨立之檔案 | |
CN109407977B (zh) | 一种大数据分布式存储管理方法及系统 | |
US11188423B2 (en) | Data processing apparatus and method | |
CN110046062B (zh) | 分布式数据处理方法及系统 | |
WO2020063600A1 (zh) | 数据容灾方法与站点 | |
CN103902410A (zh) | 云存储系统的数据备份加速方法 | |
CN105376277A (zh) | 一种数据同步方法及装置 | |
US10664494B2 (en) | Method and system for synchronously storing multi-modal information of portable endoscope | |
CN109947730B (zh) | 元数据恢复方法、装置、分布式文件系统及可读存储介质 | |
CN117667944B (zh) | 用于分布式图数据库的副本扩容方法、装置及系统 | |
US20130226867A1 (en) | Apparatus and method for converting replication-based file into parity-based file in asymmetric clustering file system | |
US10853892B2 (en) | Social networking relationships processing method, system, and storage medium | |
CN109445988A (zh) | 异构容灾方法、装置、系统、服务器和容灾平台 | |
CN111404737B (zh) | 一种容灾处理方法以及相关装置 | |
US20210240350A1 (en) | Method, device, and computer program product for recovering based on reverse differential recovery | |
CN110109934B (zh) | 一种数据库管理方法、装置、服务器及存储介质 | |
TWI420333B (zh) | 分散式的重複數據刪除系統及其處理方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 18858839 Country of ref document: EP Kind code of ref document: A1 |