CN110968577B - Method and system for writing and reading resources and time sequence storage system - Google Patents

Method and system for writing and reading resources and time sequence storage system

Info

Publication number
CN110968577B
Authority
CN
China
Prior art keywords
data
resource
data segment
writing
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811133251.1A
Other languages
Chinese (zh)
Other versions
CN110968577A (en
Inventor
林炳辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201811133251.1A priority Critical patent/CN110968577B/en
Publication of CN110968577A publication Critical patent/CN110968577A/en
Application granted granted Critical
Publication of CN110968577B publication Critical patent/CN110968577B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a method and a system for writing and reading resources and a time sequence storage system. The method for writing in the resource comprises the following steps: receiving a request resource label of a resource to be written, wherein the resource to be written is stored in a corresponding position in a writing request resource acquisition device according to the request resource label; acquiring a resource to be written from a writing request resource acquisition device according to the received request resource label; and writing the resource to be written into at least one data block according to a predetermined mode. The invention also discloses a computing device for executing the method.

Description

Method and system for writing and reading resources and time sequence storage system
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and a system for writing and reading resources.
Background
With the continuous development of Internet of Things technology, the time sequence database has become an important service in that field and is attracting growing attention from the industry. Briefly, a time sequence database (hereinafter referred to as a "time sequence storage model") is a database for storing time sequence data; it supports basic functions such as fast writing, persistence, and multi-dimensional aggregated queries of time sequence data. In a multithreaded read-write scenario, the traditional time sequence storage model often suffers from low aggregation efficiency, and from severe jitter and degradation of data throughput caused by locks of various granularities. Therefore, it is necessary to design a time sequence data storage model suited to multithreaded read-write scenarios that ensures high throughput and data reliability at the same time.
Disclosure of Invention
To this end, the present invention provides a method and a system for writing resources, a method and a system for reading resources, and a time sequence storage system, in an attempt to solve, or at least alleviate, at least one of the problems identified above.
According to an aspect of the present invention, there is provided a method of writing to a resource, comprising the steps of: receiving a request resource label of a resource to be written, wherein the resource to be written is stored in a corresponding position in a writing request resource acquisition device according to the request resource label; acquiring a resource to be written from a writing request resource acquisition device according to the received request resource label; and writing the resource to be written into at least one data block according to a predetermined mode.
Optionally, in the method for writing in a resource according to the present invention, the resource to be written includes at least one time series data, and each time series data includes timestamp information.
Optionally, in the method for writing in a resource according to the present invention, the request resource label comes from a second queue of the write request resource collecting device, the write request resource collecting device obtains an idle request resource label from the first queue in response to requests for writing in a resource sent by a plurality of first threads, and stores the request resource label in the second queue after the plurality of first threads write in the request resource according to the request resource label, where the requests for writing in a resource correspond to different request resource labels.
Optionally, in the method for writing a resource according to the present invention, the step of writing the resource to be written into at least one data block in a predetermined manner includes: mapping each time sequence data to be written into the resource to at least one data segment according to the timestamp information; and writing the time sequence data in each data segment into at least one data block according to the capacity of each data segment.
Optionally, in the method for writing in a resource according to the present invention, the step of mapping each time series data to be written in the resource to at least one data segment according to the timestamp information includes: determining a label corresponding to each time sequence data according to the timestamp information of each time sequence data and the length of a preset data segment; and dividing the time sequence data with the same label into a data segment, and planning a memory for each data segment according to the label sequence to ensure that each data segment has a corresponding data segment identifier, wherein each data segment comprises a state identifier.
Optionally, in the method for writing in a resource according to the present invention, the step of mapping each time series data to be written in the resource to at least one data segment according to the timestamp information further includes: respectively comparing the length of the data segment of each current data segment with the length of a preset data segment; and determining time sequence data corresponding to each position in each data segment by combining the comparison result and the state identification of each data segment.
Optionally, in the method for writing in a resource according to the present invention, the step of determining, according to the comparison result and the status identifier of each data segment, time-series data corresponding to each position in each data segment includes: if the length of the current data segment is smaller than the length of the preset data segment and the state identifier is 0, adding new time sequence data into the current data segment; if the length of the current data segment is not less than the length of the preset data segment and the state identifier is 0, determining the position of each time sequence data in the current data segment, and rearranging each time sequence data according to the determined position; and if the state identifier is 1, determining the position of each time sequence data in the current data segment, acquiring the corresponding original data of each position, and combining the corresponding original data of each position with the current time sequence data to obtain the data of each position.
Optionally, in the method for writing a resource according to the present invention, the step of determining the position of each time series data in the current data segment includes: and determining the position of each time sequence data according to the preset data segment length and the data segment identification of the current data segment and the time stamp information of each time sequence data.
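The three cases for a data segment (short and unordered, full and about to be rearranged, already position-indexed) can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Segment`, `position` and `write_point`, the value `SEG_LENGTH = 4`, the position rule `timestamp - seg_id * SEG_LENGTH`, and the last-write-wins treatment of "combining" are all assumptions made for the example.

```python
from dataclasses import dataclass, field

SEG_LENGTH = 4  # hypothetical preset data segment length (number of positions)


@dataclass
class Segment:
    seg_id: int
    state: int = 0  # 0 = append mode, 1 = position-indexed mode
    data: list = field(default_factory=list)


def position(seg: Segment, timestamp: int) -> int:
    # Assumed rule: offset of the timestamp inside its own segment.
    # The caller must only pass timestamps that belong to this segment.
    return timestamp - seg.seg_id * SEG_LENGTH


def write_point(seg: Segment, timestamp: int, value: float) -> None:
    if seg.state == 0 and len(seg.data) < SEG_LENGTH:
        # Case 1: segment is shorter than the preset length, so append.
        seg.data.append((timestamp, value))
        return
    if seg.state == 0:
        # Case 2: segment reached the preset length; rearrange by position.
        slots = [None] * SEG_LENGTH
        for ts, v in seg.data:
            slots[position(seg, ts)] = v
        seg.data = slots
        seg.state = 1
    # Case 3: state identifier is 1; combine with the data at that position.
    # "Combining" is modelled as last-write-wins, which is an assumption.
    seg.data[position(seg, timestamp)] = value
```

Keeping a sparse segment as a short append-only list and switching to a position-indexed layout only once it fills up is one way to read the state identifier; other interpretations are possible.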
Optionally, in the method for writing resources according to the present invention, the step of writing the time-series data in each data segment into at least one data block according to the capacity of each data segment further includes: determining the number of data blocks corresponding to each data segment according to the capacity of each data segment; and writing the time sequence data in each data segment into the corresponding data block in sequence, wherein each data block has a corresponding data block label.
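As a rough sketch of this step, the number of data blocks can be derived from the segment's capacity by rounding up, and the data written block by block in order. The block capacity `BLOCK_CAPACITY`, the function name `split_into_blocks`, and the consecutive numbering of block labels are assumptions for illustration only.

```python
import math

BLOCK_CAPACITY = 4  # hypothetical number of data points one data block holds


def split_into_blocks(points, first_block_label=0):
    """Return {block_label: points} for the points of one data segment."""
    n_blocks = math.ceil(len(points) / BLOCK_CAPACITY)
    return {
        first_block_label + i: points[i * BLOCK_CAPACITY:(i + 1) * BLOCK_CAPACITY]
        for i in range(n_blocks)
    }
```

Under this sketch a segment holding 10 points occupies three blocks, the last of which is only half full.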
Optionally, in the method for writing a resource according to the present invention, after the step of writing the resource to be written into at least one data block in a predetermined manner, the method further includes the steps of: and transmitting the request resource labels to a first queue of the write request resource acquisition device for reuse by a plurality of first threads.
According to another aspect of the present invention, there is provided a system for writing to a resource, the system comprising: the resource acquisition device is suitable for distributing corresponding idle request resource labels for the resources to be written in the requests when receiving the requests for writing the resources from the first threads so as to send the request resource labels to the resource storage device; and the resource storage device is suitable for acquiring the resource to be written from the resource acquisition device according to the received request resource label when the request resource label is received, and writing the resource to be written into at least one data block according to a preset mode.
Optionally, in the system for writing in resources according to the present invention, the resource collecting apparatus is further adapted to convert a request for writing in resources of multiple threads into a request for writing in resources of multiple single threads.
Optionally, in the system for writing in resources according to the present invention, the resource collecting device is further adapted to obtain an idle request resource label from the first queue when receiving a request for writing in resources sent by the first threads, and store the request resource label in the second queue after the first threads write in request data according to the request resource label.
Optionally, in the system for writing resources according to the present invention, the resource storage device is further adapted to transmit the request resource label to the first queue of the resource collecting device for reuse by the plurality of first threads after writing the resource to be written into the at least one data block in a predetermined manner.
According to still another aspect of the present invention, there is provided a method of reading a resource, including the steps of: when requests to read resources sent by a plurality of second threads are received, determining at least one data segment storing the resources to be read; sending the data block label in the at least one data segment and the corresponding first check code to the second thread, so that the second thread performs a first verification on the first check code; when the first verification passes, sending the time sequence data in the corresponding data block and a second check code to the second thread, so that the second thread performs a second verification on the second check code; and if the second verification passes, confirming that the time sequence data is the resource to be read, and sending the time sequence data to the second thread.
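The two-stage verification can be illustrated with the sketch below. The source does not say how the check codes are computed or what each one covers, so everything here is an assumption: CRC-32 (`zlib.crc32`) stands in for the check code, the first code is taken over the block label, the second over the block data, and `serve_read` and `read_with_two_checks` are hypothetical names.

```python
import zlib


def checksum(payload: bytes) -> int:
    # CRC-32 is an assumed stand-in; the source does not name the check code.
    return zlib.crc32(payload)


def serve_read(block_label: int, block_data: bytes):
    """Storage side: yield the two messages sent to the reader, in order."""
    label_bytes = block_label.to_bytes(8, "big")
    yield label_bytes, checksum(label_bytes)  # block label + first check code
    yield block_data, checksum(block_data)    # time series data + second check code


def read_with_two_checks(messages):
    """Reader side: return the data only if both verifications pass."""
    label_bytes, first_code = next(messages)
    if checksum(label_bytes) != first_code:
        return None  # first verification failed
    data, second_code = next(messages)
    if checksum(data) != second_code:
        return None  # second verification failed
    return data
```

Because `serve_read` is a generator, the block data is only produced after the reader has accepted the first message, which mirrors the order of the two steps.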
According to still another aspect of the present invention, there is provided a system for reading a resource, including: a resource storage device adapted to, when receiving requests to read resources from a plurality of second threads, determine at least one data segment storing the resources to be read according to the labels of the resources to be read in the requests, and send the labels of the data blocks in the data segments and the corresponding first check codes to the second threads, so that the second threads perform a first verification on the first check codes; the resource storage device is further adapted to send the time sequence data in the corresponding data block and a second check code to the second thread when the first verification passes, so that the second thread performs a second verification on the second check code, and, when the second verification passes, to confirm that the time sequence data is the resource to be read and send the time sequence data to the second thread.
According to still another aspect of the present invention, there is provided a time sequence storage system including: a system for writing to a resource as described above; and a system for reading resources as described above.
According to yet another aspect of the present invention, there is provided a computing device comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor, the program instructions comprising instructions for performing any of the methods as described above.
According to yet another aspect of the present invention, there is provided a readable storage medium storing program instructions which, when read and executed by a computing device, cause the computing device to perform any of the methods described above.
According to the time sequence storage scheme of the present invention, the storage space can be flexibly allocated according to the sparsity of the resource to be written, so that wasted space is avoided. Meanwhile, for the same storage space, the scheme provided by the invention can accommodate more time sequence data. If one data segment uses K consecutive data points, the scheme according to the invention occupies at most the number of data point resources given by the expression in Figure BDA0001814083290000041 (not recoverable from this text), whereas the conventional time sequence storage model needs to occupy N data points. When K << N, the conventional model causes a very serious waste of space, but the time sequence storage system according to the present scheme does not. Compared with common time sequence storage schemes, the present scheme also has better performance and throughput.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a schematic diagram of a time sequence storage system 100 according to one embodiment of the invention;
FIG. 2 shows a schematic diagram of a computing device 200, according to one embodiment of the invention;
FIG. 3 illustrates a schematic diagram of a method 300 of writing to a resource, according to one embodiment of the invention;
FIGS. 4A and 4B respectively illustrate a schematic diagram of a resource collection apparatus 112 according to some embodiments of the invention;
FIG. 5 illustrates a schematic structural diagram of the resource storage 114 (or the resource storage 122) according to one embodiment of the present invention; and
FIG. 6 shows a flow diagram of a method 600 of reading a resource according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
FIG. 1 shows a schematic diagram of a time sequence storage system 100 according to one embodiment of the invention. As shown in FIG. 1, the time sequence storage system 100 includes a system 110 for writing resources and a system 120 for reading resources. The system 110 for writing resources, in response to multi-threaded requests to write resources, writes and stores the resources (i.e., time sequence data) in a predetermined manner; the system 120 for reading resources, in response to multi-threaded requests to read resources, reads resources from the stored resources. According to an embodiment of the present invention, the system 110 for writing resources includes at least a resource collection device 112 and a resource storage device 114, and the system 120 for reading resources includes at least a resource storage device 122. According to one embodiment, the system 110 for writing resources is connected to a plurality of first threads, which produce data and record the timestamp information of each piece of data. When the first threads want to write resources into the system 110, they send requests to write resources to the system 110; as shown in FIG. 1, the first threads may send such requests to the resource collection device 112 in parallel. On receiving a request to write a resource, the resource collection device 112 allocates a corresponding request resource label to the resource to be written, converts the multi-threaded requests to write resources into single-threaded requests to write resources, and sends them to the resource storage device 114.
According to an embodiment of the present invention, the resource collection device 112 is provided with a request pool that stores the received requests to write resources, with a request resource label allocated to each request. A first queue and a second queue are also arranged in the resource collection device 112. The first queue stores the request resource labels that are idle, and the second queue stores the request resource labels that are not idle (i.e., the request resource labels at which request data has been written). When receiving a request to write a resource, the resource collection device 112 obtains an idle request resource label from the first queue and informs the corresponding first thread; the first thread writes the request resource at the corresponding position of the request pool; the request pool passes the request resource label at which the request resource was written to the second queue; and the second queue sends the request resource label to the resource storage device 114. Resources are stored in the resource storage device 114 in a plurality of data blocks: the resource storage device 114 acquires the resource to be written from the resource collection device 112 according to the received request resource label and writes it into at least one data block in a predetermined manner. According to still other embodiments, the system 120 for reading resources is coupled to a plurality of second threads, which consume data. When a second thread wants to read a resource for consumption, it sends a request to read the resource directly to the system 120.
Specifically, the second threads may send requests to read resources to the resource storage device 122. The resource storage device 122 determines at least one data block storing the resource to be read and sends the data and the check code (denoted as cs) of the determined data block to the corresponding second thread. The second thread recalculates a check code (denoted as cs') from the data. If the calculated check code cs' equals the received check code cs, the verification passes, and the resource storage device 122 sends the resource in the corresponding data block to the second thread for consumption. If cs' does not equal cs, the verification fails, and the second thread reads again after a few clock cycles. It should be noted that FIG. 1 is only an example; in practical applications, the resource storage device 114 and the resource storage device 122 may be configured as the same device, which stores the resources written by the first threads and is read by the second threads. Also, in other embodiments, fewer, additional, or different components may be present in the system 100.
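The check-and-retry behaviour of a reading thread described above might look like the following sketch. It assumes CRC-32 (`zlib.crc32`) as the check code, which the source does not specify, and the names `read_block`, `fetch`, `max_attempts` and `backoff_seconds` are hypothetical; a short sleep stands in for waiting "a few clock cycles".

```python
import time
import zlib


def read_block(fetch, max_attempts=3, backoff_seconds=0.0):
    """Call fetch() -> (data, cs) until the recomputed check code matches."""
    for _ in range(max_attempts):
        data, cs = fetch()
        cs_prime = zlib.crc32(data)  # cs' recomputed by the reading thread
        if cs_prime == cs:
            return data
        time.sleep(backoff_seconds)  # stand-in for "a few clock cycles"
    raise RuntimeError("check code mismatch on every attempt")
```

A mismatch is treated as a sign that the block was being written while it was read, so the reader retries instead of taking a lock.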
According to embodiments of the invention, the time sequence storage system 100 may be implemented by one or more computing devices. In some embodiments, the time sequence storage system 100 and its components, such as the system 110 for writing resources and the system 120 for reading resources, may be implemented by a computing device 200 as described below.
FIG. 2 shows a schematic diagram of a computing device 200, according to one embodiment of the invention.
As shown in FIG. 2, in a basic configuration 202, a computing device 200 typically includes a system memory 206 and one or more processors 204. A memory bus 208 may be used for communication between the processor 204 and the system memory 206.
Depending on the desired configuration, the processor 204 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 204 may include one or more levels of cache, such as a level one cache 210 and a level two cache 212, a processor core 214, and registers 216. Example processor cores 214 may include arithmetic logic units (ALUs), floating point units (FPUs), digital signal processing cores (DSP cores), or any combination thereof. The example memory controller 218 may be used with the processor 204, or in some implementations the memory controller 218 may be an internal part of the processor 204.
Depending on the desired configuration, system memory 206 may be any type of memory including, but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 206 may include an operating system 220, one or more applications 222, and program data 224. In some implementations, the application 222 can be arranged to execute instructions on the operating system with the program data 224 by the one or more processors 204.
Computing device 200 may also include an interface bus 240 that facilitates communication from various interface devices (e.g., output devices 242, peripheral interfaces 244, and communication devices 246) to the basic configuration 202 via the bus/interface controller 230. The example output device 242 includes a graphics processing unit 248 and an audio processing unit 250. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more A/V ports 252. Example peripheral interfaces 244 can include a serial interface controller 254 and a parallel interface controller 256, which can be configured to facilitate communications with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 258. An example communication device 246 may include a network controller 260, which may be arranged to facilitate communications with one or more other computing devices 262 over a network communication link via one or more communication ports 264.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, or program modules in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or direct-wired connection, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
Computing device 200 may be implemented as a server, such as a file server, database server, application server, WEB server, and the like, or as a personal computer including desktop and notebook computer configurations. Of course, computing device 200 may also be implemented as part of a small-sized portable (or mobile) electronic device. In an embodiment in accordance with the invention, the computing device 200 is configured to perform the method 300 of writing to resources and the method 600 of reading resources in accordance with the invention. Among other things, application 222 of computing device 200 includes a plurality of program instructions that implement methods 300 and 600 according to the present invention.
FIG. 3 shows a flow diagram of a method 300 of writing to a resource, according to one embodiment of the invention. The process of the method 300 performed by the system for writing resources 110 is described in detail below with reference to fig. 1 and the related description above for the system for writing resources 110.
As shown in fig. 3, the method 300 begins at step S310. In step S310, the resource storage device 114 receives the request resource labels of the resources to be written, and the resources to be written are stored in corresponding positions in the write request resource collecting device (i.e. the resource collecting device 112, hereinafter referred to as the resource collecting device 112) according to the request resource labels. According to the embodiment of the invention, the resource to be written comprises at least one time sequence data, wherein each time sequence data comprises corresponding time stamp information.
According to one embodiment, the request resource label of the resource to be written is assigned by the resource collection device 112. Specifically, the resource collection device 112 receives requests to write resources from a plurality of first threads, obtains idle request resource labels from the first queue, and returns them to the corresponding first threads. The first threads write the request resources into the request pool according to the request resource labels, and the request pool then transfers the request resource labels at which data has been written to the second queue. The second queue may then send the corresponding request resource labels to the resource storage device 114. According to the present invention, the resource collection device 112 converts the multi-threaded requests to write resources into a plurality of single-threaded requests to write resources, and different requests to write resources correspond to different request resource labels. Fig. 4A shows a schematic diagram of a resource collection device 112 according to an embodiment of the invention. Assume that there are 4 request resource labels 1, 2, 3, 4 in the request pool, and that the first queue initially contains all of them (i.e., 1, 2, 3, 4). After some request resources have been stored in the request pool, a label filled with "/" in the request pool indicates that a request resource has been stored at the position of that label. At this point, request resource labels 1 and 3 are still idle and remain in the first queue, while request data has been written at request resource labels 2 and 4, so those labels are placed in the second queue.
When the resource collection device 112 receives a request to write resources from a first thread, it calculates the number of request resource labels required according to the number of requests. Assuming that one request resource label is required, the first queue preferentially returns request resource label 1 to the first thread, the first thread writes the request data at the position corresponding to label 1 in the request pool, and request resource label 1 is then stored in the second queue. Fig. 4B shows a schematic diagram of the resource collection device 112 after the request is written: request resource label 1 is now located in the second queue, and request resources are stored at request resource labels 1, 2, and 4. In the embodiment according to the present invention, it is assumed that one request resource label can accommodate K requests, so when a first thread simultaneously initiates N requests to write resources, it obtains M request resource labels, where M is calculated as follows:
$$M = \left\lceil \frac{N}{K} \right\rceil$$
where $\lceil \cdot \rceil$ denotes rounding up.
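As a small worked example, the number of labels M for N requests, with K requests per label, can be computed with integer arithmetic only. The function name `labels_needed` is hypothetical.

```python
def labels_needed(n_requests: int, k_per_label: int) -> int:
    """Return M, the number of request resource labels for N requests.

    M is N / K rounded up, computed without floating point.
    """
    if k_per_label <= 0:
        raise ValueError("K must be positive")
    return -(-n_requests // k_per_label)
```

For instance, with K = 4, a thread issuing N = 10 requests would obtain 3 labels, and one issuing N = 8 requests would obtain 2.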
According to the embodiment of the present invention, by distinguishing the first queue from the second queue, the resource collection device 112 returns an idle request resource label to the first thread requesting to write a resource, so that the first thread can write the resource to the corresponding position of the request pool according to that label. Because different first threads obtain different request resource labels, the memory positions they write to do not overlap, and the memory can be written directly without lock synchronization. After the writing is completed, the request pool notifies the second queue, and the request resource label of the written resource is placed in the second queue, so that the resource storage device 114 can obtain the request resource label from the second queue.
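The interplay of the request pool and the two queues can be sketched as below. This is an illustration under assumptions, not the patented design: the class `RequestPool` and its methods `submit` and `drain_one` are hypothetical names, and Python's `queue.Queue` is itself internally synchronized, so only the write into the pool slot is free of explicit locking here.

```python
import queue
import threading


class RequestPool:
    def __init__(self, size: int):
        self.slots = [None] * size
        self.free = queue.Queue()   # "first queue": idle request resource labels
        self.ready = queue.Queue()  # "second queue": labels whose slot holds data
        for label in range(size):
            self.free.put(label)

    def submit(self, payload):
        """Writer thread: take an idle label, fill its slot, publish the label."""
        label = self.free.get()      # blocks until some label is idle
        self.slots[label] = payload  # distinct labels, so slots never overlap
        self.ready.put(label)

    def drain_one(self):
        """Storage side: consume one written slot and recycle its label."""
        label = self.ready.get()
        payload = self.slots[label]
        self.slots[label] = None
        self.free.put(label)         # label becomes reusable by writer threads
        return payload
```

Several writer threads can call `submit` concurrently while a single consumer calls `drain_one`, which is one way to turn multi-threaded writes into a single-threaded stream for the storage device.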
Subsequently, in step S320, the resource storage device 114 obtains the resource to be written from the resource collection device 112 according to the received request resource label.
Subsequently, in step S330, the resource storage device 114 writes the resource acquired in step S320 into at least one data block in a predetermined manner.
FIG. 5 shows a schematic diagram of the structure of the resource storage device 114 according to an embodiment of the invention. As shown in FIG. 5, the resource storage device 114 contains a data segment pool and a data block pool. The data segment pool records each data segment for which memory has been allocated. Each data segment has a data segment identifier and corresponds to a plurality of data blocks; the labels of the corresponding data blocks are stored in each data segment, so that the corresponding data blocks in the data block pool can be accessed by data block label. The data block pool stores the corresponding time sequence data. According to one embodiment, the resource storage device 114 further comprises a time sequence data mapping module for mapping the time sequence data in the obtained resource into one or more data segments.
In connection with fig. 5, in an embodiment according to the present invention, step S330 may be performed in two sub-steps: the resource to be written is mapped into one or more data segments through the time sequence data mapping module, and then the data in each data segment is written into one or more data blocks. These two substeps will be explained in detail below.
The first step: mapping each time series data in the resource to be written into at least one data segment according to the timestamp information.
In a preferred embodiment, a label corresponding to each time series data is determined according to the timestamp information of each time series data and a predetermined data segment length; the time series data with the same label are then grouped into one data segment, and memory is planned for each data segment in label order, so that each data segment has a corresponding data segment identifier. It is assumed that the time series data is represented by a 2-tuple (UUID, time), where UUID is the unique identifier of the time series line and time is the timestamp information of the time series data. For example, "CPU usage of host a" is a time series line whose unique identifier is set as UUID_a, then the CPU usage of host a at 12. In the embodiment according to the present invention, the predetermined data segment length (denoted as SegLength) is a value set in advance according to the application scenario (e.g., the pre-allocated size of a database, the index items that need to be collected, etc.), and the label SegID corresponding to each time series data may then be represented as:
$$\mathrm{SegID} = \left\lfloor \frac{time}{SegLength} \right\rfloor$$
where $\lfloor \cdot \rfloor$ denotes rounding down.
The time series data with the same SegID is used as a data segment, and the memory is sequentially planned for each data segment according to the sequence of the SegID (for example, according to the sequence of the SegID from large to small or from small to large), so that each data segment has a corresponding data segment identifier.
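The mapping step can be sketched as below, assuming the label is obtained by integer (floor) division of the timestamp by SegLength, which is how the rounding-down description reads. The function names `seg_id` and `map_to_segments`, and the use of integer timestamps, are assumptions made for illustration only.

```python
from collections import defaultdict


def seg_id(time, seg_length):
    """Label of a point: timestamp divided by the segment length, rounded down."""
    return time // seg_length


def map_to_segments(points, seg_length):
    """Group (uuid, time) points by label and return the groups in ascending label order."""
    groups = defaultdict(list)
    for uuid, time in points:
        groups[seg_id(time, seg_length)].append((uuid, time))
    return dict(sorted(groups.items()))
```

For example, with a segment length of 10, points at times 5 and 7 share label 0 while points at times 12 and 19 share label 1, so two data segments would be needed.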
As shown in fig. 5, if the resource to be written yields two SegIDs after being mapped by the time series data mapping module, two data segments need to be applied for from the data segment pool. According to the embodiment of the invention, each time a new data segment is applied for, the data segment pool returns the value of its water level line (denoted as s_tail), which indicates the currently idle data segment identifier, and the corresponding data segment is applied for from the data segment pool according to the s_tail value. After the application for the new data segment is finished, the value of s_tail is updated according to the number of data segments applied for this time. Continuing with the example of fig. 5, assuming that initially s_tail = 1, the currently available data segments start from the data segment with data segment identifier 1; two new data segments (with data segment identifiers 1 and 2) are applied for in order from the data segment pool to store the resource to be written this time, and then the value of s_tail is updated to 3.
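The water level line behaves like a bump allocator over a pre-allocated pool. The following minimal sketch reproduces the fig. 5 numbers; the class name `SegmentPool`, the `apply` method and the exhaustion check are hypothetical details added for illustration and are not taken from the specification.

```python
class SegmentPool:
    """Pre-allocated data segment pool addressed by a water level line (s_tail)."""

    def __init__(self, capacity, s_tail=1):
        self.capacity = capacity   # highest data segment identifier available
        self.s_tail = s_tail       # first currently idle data segment identifier

    def apply(self, count):
        """Return `count` consecutive idle identifiers and advance s_tail."""
        if self.s_tail + count - 1 > self.capacity:
            raise MemoryError("data segment pool exhausted")
        first = self.s_tail
        self.s_tail += count
        return list(range(first, first + count))
```

Starting from s_tail = 1, applying for two data segments yields identifiers 1 and 2 and leaves s_tail at 3, as in the example above.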
In addition, each data segment also contains a status identifier (denoted as "Sorted") which identifies whether all the time-series data in each data segment can find its corresponding position in the data segment according to the time stamp information. In other embodiments according to the present invention, after the time-series data are divided into at least one data segment according to the above method, the position of each time-series data in the data segment is determined according to the status identifier. The initial value of the Sorted is set to 0, and when the Sorted =0, the position of the time series data needs to be determined by further combining the length of the data segment; when Sorted =1, the position of the time series data can be directly determined. In one embodiment according to the present invention, the position of each time series data may be determined in the following manner: (1) comparing the length of the data segment of each current data segment with the length of a preset data segment; (2) the comparison result and the status flag of each data segment are combined to determine the corresponding time series data at each position in each data segment, which can be roughly divided into the following three cases.
1) If the length of the current data segment is smaller than the predetermined data segment length and the status identifier of the current data segment is 0 (i.e., Sorted = 0), new time series data continues to be added to the current data segment.
2) If the length of the current data segment is not less than the predetermined data segment length and the status identifier of the current data segment is 0, the position of each time series data in the current data segment is determined, and the time series data are rearranged according to the determined positions. According to an embodiment of the present invention, the position of each time series data is determined from the predetermined data segment length and the data segment identifier of the current data segment together with the timestamp information of each time series data; specifically, the position is calculated by the following formula:
P = time - SegID * SegLength,
where P represents the position of the time series data, time represents the timestamp information of the time series data, SegID represents the label corresponding to the time series data (calculated as described above), and SegLength represents the predetermined data segment length of the data segment where the time series data is located.
3) If the status identifier is 1 (i.e., Sorted = 1), the position of each time series data in the current data segment is determined directly (the same calculation as in case 2 may be used), the raw data already at each position is obtained, and the raw data at each position is combined with the current time series data to obtain the data at that position. In other words, in this case the raw data at each position and the newly located time series data are processed together, and the result is taken as the data at that position. According to one embodiment, the further processing includes, but is not limited to, summing, taking the maximum, taking the minimum, counting, and the like. The processing mode can be preset in the time series data; for example, the UUID may be extended to a 2-tuple or 3-tuple that indicates the processing mode in addition to the unique identifier of the time series line.
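The three cases can be sketched as a small state machine. This is a sketch under assumptions rather than the claimed implementation: the class `Segment`, its attributes, and the choice to set Sorted to 1 immediately after the rearrangement in case 2 are illustrative, and summation is used as the default combining operation only because it is one of the processing modes the text lists.

```python
import operator


class Segment:
    """One data segment with a Sorted status flag (illustrative names)."""

    def __init__(self, seg_id, seg_length, combine=operator.add):
        self.seg_id = seg_id
        self.seg_length = seg_length
        self.combine = combine     # sum, max, min, ... applied to repeated points
        self.sorted = 0            # status identifier, initially 0
        self.data = []             # arrival order while sorted == 0, position-indexed after

    def position(self, time):
        # P = time - SegID * SegLength
        return time - self.seg_id * self.seg_length

    def _place(self, slots, time, value):
        p = self.position(time)
        slots[p] = value if slots[p] is None else self.combine(slots[p], value)

    def insert(self, time, value):
        if self.sorted == 1:                      # case 3: merge directly at the position
            self._place(self.data, time, value)
            return
        self.data.append((time, value))           # case 1: keep appending
        if len(self.data) >= self.seg_length:     # case 2: rearrange by position
            slots = [None] * self.seg_length
            for t, v in self.data:
                self._place(slots, t, v)
            self.data = slots
            self.sorted = 1                       # assumed: flag flips after rearranging
```

Once the segment is position-indexed, a repeated timestamp lands on the same array slot, so the aggregation in case 3 is a single array access plus one call to the combining function.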
The second step: writing the time series data in each data segment into at least one data block according to the capacity of each data segment.
In a preferred embodiment, the number of data blocks corresponding to each data segment is determined according to the capacity of each data segment, and the time series data in each data segment is written into the corresponding data blocks in order. In an embodiment according to the invention, a data segment of capacity L can access [formula image BDA0001814083290000121] data blocks, and the time series data within the data segment is assigned in order to these [formula image BDA0001814083290000122] data blocks. In other embodiments, a data block label array of size [formula image BDA0001814083290000123] may be generated and stored in the data segment to indicate whether each data block has been applied for from the data block pool. Initially, the value at each position of the data block label array is -1, which represents that no data block has been applied for yet; when the value at a position of the data block label array is not -1, a data block has been applied for at that position, and the label value represents the position of the data block in the data block pool.
The process of allocating data blocks for time series data is described with reference to fig. 5 as an example. Assuming that the allocated capacity of both data segments in fig. 5 is 4, each data segment can access 2 data blocks. Similarly, when a new data block is applied for, the data block pool returns the value of its water level line (denoted as b_tail), which indicates the currently idle data block label, and the corresponding data block is applied for from the data block pool according to the b_tail value. After the new data block has been applied for, the value of b_tail is updated according to the number of data blocks applied for this time. Assuming that initially b_tail = 1, the currently available data blocks start from the data block with data block label 1, and four new data blocks (with data block labels 1 to 4) are applied for in order from the data block pool to store the resource to be written this time, where the data segment with data segment identifier 1 accesses the data blocks with labels 1 and 2, and the data segment with data segment identifier 2 accesses the data blocks with labels 3 and 4. The value of b_tail is then updated to 5. Meanwhile, the data block label array is updated by writing each data block label into the corresponding position in the array.
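The fig. 5 allocation can be traced with the following sketch, in which a data block is applied for only when its entry in the label array is still -1. The names `BlockPool`, `new_label_array` and `ensure_block` are hypothetical, and the number of data blocks per segment is passed in as a plain parameter (2 in the fig. 5 example) rather than derived from the capacity.

```python
class BlockPool:
    """Data block pool addressed by a water level line (b_tail); illustrative only."""

    def __init__(self, b_tail=1):
        self.b_tail = b_tail       # first currently idle data block label

    def apply(self, count=1):
        first = self.b_tail
        self.b_tail += count
        return list(range(first, first + count))


def new_label_array(blocks_per_segment):
    """Every position starts at -1, meaning no data block has been applied for yet."""
    return [-1] * blocks_per_segment


def ensure_block(label_array, slot, pool):
    """Apply for a data block for `slot` on first use and return its label."""
    if label_array[slot] == -1:
        label_array[slot] = pool.apply(1)[0]
    return label_array[slot]
```

Filling both positions of data segment 1 and then both positions of data segment 2 gives label arrays [1, 2] and [3, 4] and leaves b_tail at 5; a later access to an already filled position reuses the stored label without touching the pool.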
In an embodiment according to the invention, a data segment of capacity L can access [formula image BDA0001814083290000124] data blocks, and a data block of size [formula image BDA0001814083290000125] can access [formula image BDA0001814083290000126] time series data points. As shown in fig. 5, in the data segment pool, each data segment includes data block labels and corresponding first check codes, and similarly, each data block includes time series data and corresponding second check codes. It should be noted that the first check code and the second check code may be hash values of the corresponding data block label and time series data, respectively, which is not limited by the embodiment of the present invention.
At this point, all the time series data in the resource to be written has been written into the resource storage device 114 in a "data segment-data block" manner. The resource writing scheme of the invention supports multithreaded insertion of time series data with repeated points, dense points and sparse points, and the system 110 can flexibly allocate storage space according to how sparse the inserted data is, thereby avoiding wasted space. Meanwhile, for the same storage space, the scheme of the invention can accommodate more time series data. If a data segment uses K consecutive data points, the scheme according to the invention occupies at most [formula image BDA0001814083290000131] data point resources, whereas a conventional time series storage model needs to occupy N data points; when K << N, the conventional model wastes a great deal of space, but the sequential storage system according to the present scheme does not.
In the embodiment of the present invention, in order to recycle resources, after step S330 is completed, the resource storage device 114 returns the request resource label to the first queue in the resource collecting device 112 as an idle request resource label, for reuse by the plurality of first threads.
FIG. 6 shows a schematic diagram of a method 600 of reading a resource according to one embodiment of the invention. The process of the method 600 performed by the system for reading resources 120 will be described in detail below with reference to fig. 1.
As shown in fig. 6, the method 600 begins in step S610: when requests for reading resources sent by a plurality of second threads are received, at least one data segment storing the resource to be read is determined. For example, the request includes timestamp information of each time series data in the resource to be read, and the label of the data segment where the time series data is located is determined according to the timestamp information.
Subsequently, in step S620, the data block label and the corresponding first check code in the at least one data segment are sent to the corresponding second thread, so that the second thread can verify the first check code. In conjunction with the foregoing description of FIG. 5, in an embodiment according to the invention, each position in the data block label array of a data segment contains two fields: a data block label and a first check code. When the second thread wants to read the data block at a position, the resource storage device 122 sends the data block label and the corresponding first check code to that second thread, and the second thread calculates a first check code from the data block label; if the calculated first check code is consistent with the received first check code, the first verification passes, otherwise it fails.
Similarly, each time series data point in a data block includes three fields: time series data, a second check code, and timestamp information (the timestamp information is not additionally shown in fig. 5, but it should be understood that the timestamp information is included in the time series data). When the first verification passes, the resource storage device 122 sends the time series data and the second check code in the corresponding data block to the second thread, and the second thread performs a second verification on the second check code: it calculates a second check code from the time series data and the timestamp information; if the calculated second check code is consistent with the received second check code, the second verification passes, otherwise it fails.
Subsequently, in step S630, when the second verification passes, it is determined that the time series data stored in the corresponding data block is the resource to be read, the time series data is sent to the corresponding second thread, and the reading succeeds.
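The two-stage verification on the reader side can be sketched as follows. The specification only says that the check codes may be hash values, so the use of `zlib.crc32` over a textual encoding of the fields is an assumption chosen for brevity, and the names `check_code` and `read_point` as well as the in-memory layout of the segment entry and the block pool are hypothetical.

```python
import zlib


def check_code(*fields):
    """Illustrative check code: CRC-32 over a textual encoding of the fields."""
    return zlib.crc32(repr(fields).encode("utf-8"))


def read_point(segment_entry, block_pool, offset):
    """Return the time series value, or None if either verification fails.

    segment_entry: (data block label, first check code)
    block_pool:    {label: [(value, timestamp, second check code), ...]}
    """
    label, first_code = segment_entry
    if check_code(label) != first_code:               # first verification
        return None
    value, timestamp, second_code = block_pool[label][offset]
    if check_code(value, timestamp) != second_code:   # second verification
        return None
    return value
```

A reader that receives a data block label whose check code does not match stops before touching the data block pool, and a data point whose value or timestamp no longer matches its stored check code is rejected at the second stage.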
The sequential storage system 100 according to the present invention has at least the following advantages: 1. the resource acquisition device 112 converts multi-threaded write requests into single-threaded write requests, so that the multi-threaded read-write scenario becomes a single-writer, multiple-reader scenario, which lays a foundation for designing a sequential storage system with higher throughput and higher performance; 2. lock overhead and real-time resource allocation overhead are avoided, giving excellent performance: by using the pre-allocated request pool, data segment pool and data block pool, the corresponding resource can be found quickly through its label in the resource storage device, and the data consistency problem is solved; 3. space is allocated flexibly, reducing resource waste; 4. online data aggregation is achieved by further processing the time series data in the data segment: owing to the random access capability, data at the same time point can be aggregated online at the same array position without performance degradation.
It should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed to reflect the intent: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment, or alternatively may be located in one or more devices different from the device in this example. The modules in the foregoing examples may be combined into one module or may additionally be divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination. Additionally, some of the embodiments are described herein as a method or combination of method elements that can be implemented by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (16)

1. A method of writing to a resource, the method comprising the steps of:
receiving a request resource label of a resource to be written, wherein the resource to be written is stored in a corresponding position in a writing request resource acquisition device according to the request resource label;
acquiring the resource to be written from the writing request resource acquisition device according to the received request resource label; and
writing the resource to be written into at least one data block according to a predetermined manner,
the request resource labels come from a second queue of the write request resource acquisition device, the write request resource acquisition device acquires idle request resource labels from the first queue in response to requests for writing in resources sent by a plurality of first threads, and stores the request resource labels in the second queue after the plurality of first threads write the request resources according to the request resource labels, wherein the requests for writing in resources correspond to different request resource labels.
2. The method of claim 1, wherein the resource to be written comprises at least one time series of data, each time series of data comprising timestamp information.
3. The method of claim 2, wherein the writing the resource to be written into the at least one data block in a predetermined manner comprises:
mapping each time sequence data in the resource to be written into at least one data segment according to the timestamp information; and
and writing the time sequence data in each data segment into at least one data block according to the capacity of each data segment.
4. The method of claim 3, wherein the mapping each time series of data to be written in the resource into at least one data segment according to the time stamp information comprises:
determining a label corresponding to each time sequence data according to the timestamp information of each time sequence data and the length of a preset data segment; and
dividing time sequence data with the same label into a data segment, and planning a memory for each data segment according to the label sequence to enable each data segment to have a corresponding data segment identifier, wherein each data segment comprises a state identifier.
5. The method of claim 4, wherein the step of mapping each time series of data to be written into the resource into at least one data segment according to the time stamp information further comprises:
respectively comparing the length of the data segment of each current data segment with the length of a preset data segment; and
and determining time sequence data corresponding to each position in each data segment by combining the comparison result and the state identification of each data segment.
6. The method of claim 5, wherein the step of determining the time series data corresponding to each position in each data segment according to the comparison result and the status identification of each data segment comprises:
if the length of the current data segment is smaller than the length of a preset data segment and the state identifier is 0, adding new time sequence data into the current data segment;
if the length of the current data segment is not less than the length of the preset data segment and the state identifier is 0, determining the position of each time sequence data in the current data segment, and rearranging each time sequence data according to the determined position; and
and if the state identifier is 1, determining the position of each time sequence data in the current data segment, acquiring the corresponding original data of each position, and combining the corresponding original data of each position with the current time sequence data to obtain the data of each position.
7. The method of claim 6, wherein the step of determining the position of each time series data in the current data segment comprises:
and determining the position of each time sequence data according to the length of the preset data segment of the current data segment, the data segment identification and the time stamp information of each time sequence data.
8. The method according to any one of claims 3-7, wherein the step of writing the time series data in each data segment into at least one data block according to the capacity of each data segment further comprises:
determining the number of data blocks corresponding to each data segment according to the capacity of each data segment;
and writing the time sequence data in each data segment into the corresponding data block according to the sequence, wherein each data block has a corresponding data block label.
9. The method as claimed in claim 1, wherein after the step of writing the resource to be written into at least one data block in a predetermined manner, further comprising the steps of:
and transmitting the request resource label to a first queue of the write request resource acquisition device for reuse by the plurality of first threads.
10. A system for writing to a resource, the system comprising:
the resource acquisition device is suitable for acquiring idle request resource labels from a first queue when requests for writing in resources sent by a plurality of first threads are received so as to send the request resource labels to the resource storage device, and storing the request resource labels in a second queue after the first threads write request data according to the request resource labels; and
and the resource storage device is suitable for acquiring the resource to be written from the resource acquisition device according to the received request resource label when the request resource label is received, and writing the resource to be written into at least one data block according to a preset mode.
11. The system of claim 10, wherein,
the resource storing means is further adapted to transmit the requested resource label to the first queue of the resource collecting means for reuse by the plurality of first threads after writing the resource to be written into the at least one data block in a predetermined manner.
12. A method of reading a resource, the method comprising the steps of:
when requests for reading resources sent by a plurality of second threads are received, determining to store at least one data segment of the resources to be read;
sending the data block label and the corresponding first check code in the at least one data segment to the second thread so that the second thread can verify the first check code for the first time, and sending the time sequence data and the second check code in the corresponding data block to the second thread when the first check code passes so that the second thread can verify the second check code for the second time; and
and if the secondary verification is passed, confirming that the time sequence data is the resource to be read, and sending the time sequence data to the second thread.
13. A system for reading a resource, the system comprising:
the resource storage device is suitable for determining at least one data segment for storing the resource to be read according to the label of the resource to be read in the request when receiving the request for reading the resource from a plurality of second threads, and sending the label of the data block in the at least one data segment and the corresponding first check code to the second threads so as to facilitate the second threads to carry out primary verification on the first check code;
the resource storage device is further adapted to send the time series data and the second verification code in the corresponding data block to the second thread when the first verification passes, so that the second thread performs secondary verification on the second verification code, and confirms that the time series data are the resources to be read when the second verification passes, and sends the time series data to the second thread.
14. A sequential storage system, comprising:
the system of writing resources of any one of claims 10-11; and
a system for reading resources as in claim 13.
15. A computing device, comprising:
at least one processor; and
a memory storing program instructions configured to be adapted to be executed by the at least one processor, the program instructions comprising instructions for performing the method of any of claims 1-9, and instructions for performing the method of claim 12.
16. A readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to perform the method of any of claims 1-9 and the method of claim 12.
CN201811133251.1A 2018-09-27 2018-09-27 Method and system for writing and reading resources and time sequence storage system Active CN110968577B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811133251.1A CN110968577B (en) 2018-09-27 2018-09-27 Method and system for writing and reading resources and time sequence storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811133251.1A CN110968577B (en) 2018-09-27 2018-09-27 Method and system for writing and reading resources and time sequence storage system

Publications (2)

Publication Number Publication Date
CN110968577A CN110968577A (en) 2020-04-07
CN110968577B true CN110968577B (en) 2023-04-07

Family

ID=70027003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811133251.1A Active CN110968577B (en) 2018-09-27 2018-09-27 Method and system for writing and reading resources and time sequence storage system

Country Status (1)

Country Link
CN (1) CN110968577B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542520A (en) * 2021-07-19 2021-10-22 珠海艾派克微电子有限公司 Data transmission method, read-write equipment and consumable chip

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7571284B1 (en) * 2004-06-30 2009-08-04 Sun Microsystems, Inc. Out-of-order memory transactions in a fine-grain multithreaded/multi-core processor
CN102622189A (en) * 2011-12-31 2012-08-01 成都市华为赛门铁克科技有限公司 Storage virtualization device, data storage method and system
CN105988874A (en) * 2015-02-10 2016-10-05 阿里巴巴集团控股有限公司 Resource processing method and device
WO2018113724A1 (en) * 2016-12-21 2018-06-28 广州优视网络科技有限公司 Method and apparatus for download acceleration based on reading and writing separation mode, terminal device and storage medium
CN108509156A (en) * 2018-04-04 2018-09-07 腾讯科技(深圳)有限公司 Method for reading data, device, equipment and system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
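Yang Hongbin; Wu Yue; Liu Quansheng. Data flow technique for a distributed reservation station structure in simultaneous multithreading microprocessors. Journal of Applied Sciences. 2008, (02), pp. 82-87. *
Chen Chucai. "Design and Implementation of a Time-Series-Oriented Streaming Object Storage File System". China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology. 2018, pp. I137-80. *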

Also Published As

Publication number Publication date
CN110968577A (en) 2020-04-07

Similar Documents

Publication Publication Date Title
CN109983449B (en) Data processing method and storage system
US20200233909A1 (en) System, method and computer program product for data transfer management
CN102301349B (en) Accessing stripping rows of data in storage chips controlled by storage controller
US10437481B2 (en) Data access method and related apparatus and system
KR101994021B1 (en) File manipulation method and apparatus
US20160092361A1 (en) Caching technologies employing data compression
US10073648B2 (en) Repartitioning data in a distributed computing system
CN105320654A (en) Dynamic bloom filter and element operating method based on same
CN111292225B (en) Partitioning graphics data for large-scale graphics processing
US20220197552A1 (en) Memory system architecture for heterogeneous memory technologies
WO2024036985A1 (en) Storage system, computational storage processor and solid-state drive thereof, and data reading method and data writing method therefor
CN110968577B (en) Method and system for writing and reading resources and time sequence storage system
US11030714B2 (en) Wide key hash table for a graphics processing unit
CN109213423A (en) Concurrent I/O command is handled without lock based on address barrier
CN107003932B (en) Cache directory processing method and directory controller of multi-core processor system
WO2023040348A1 (en) Data processing method in distributed system, and related system
US20150212759A1 (en) Storage device with multiple processing units and data processing method
TWI810876B (en) Method and computer program product and apparatus for data access in response to host discard commands
CN115658625B (en) Data decompression system, graphic processing system, device, equipment and decompression method
CN113031908B (en) Ordered data processing method and computing device
TWI835027B (en) Method and computer program product and apparatus for updating host-to-flash address mapping table
WO2024007745A1 (en) Data writing method and apparatus, data reading method and apparatus, electronic device, and storage medium
US11385998B2 (en) Memory system, data processing system and operation method of the same
CN117472870A (en) Data object storage method, device, equipment and storage medium
CN115586943A (en) Hardware marking implementation method for dirty pages of virtual machine of intelligent network card

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant