Disclosure of Invention
Embodiments of the invention provide a data reading method and device that can solve the problem of read failures in scenarios where data relocation and user data reads occur concurrently.
In a first aspect, a method for reading data is provided. During a data reading process, when an access service layer receives a notification message indicating that the data set a client has requested to read is starting to be relocated, the access service layer suspends reading of the remaining data in the data set and records location information identifying the last successfully read data in the data set. The data set includes at least one piece of data, each piece of data corresponds to an identifier, and relocating the data set means moving the data set from one storage space to another storage space. When the access service layer receives a notification that relocation of the data set is complete, it determines a new storage location of the data set according to the identifier of the data set. Then, according to the recorded location information, it continues to read, from the relocated data set stored at the new storage location, the remaining data located after the last successfully read data.
During data reading, if the access service layer receives a notification message indicating that the data set the client has requested to read is starting to be relocated, it suspends reading of the remaining data in the data set and records an identifier of the last successfully read data in the data set. After receiving a notification message indicating that relocation of the data set is complete, it continues to read, from the new storage location of the data set, the remaining data located after the last successfully read data, according to the recorded identifier. In this way, all the requested data can be returned within a single read data request, and the need to reinitiate a read data request because of a read failure can be avoided.
In one possible design, the location information for identifying the last successfully read data in the data set may be at least one of: the data identifier corresponding to the last successfully read data in the data set, the data identifiers corresponding to each successfully read data in the data set, and the data identifier corresponding to the data immediately following the last successfully read data in the data set.
The location of the last successfully read data can be obtained by the location information for identifying the last successfully read data in the data set, so that it can be ensured that the remaining data after the last successfully read data is continuously read from the new storage location of the data set.
In a possible design, the access service layer may read new metadata corresponding to the identifier of the data set according to the identifier of the data set, and determine a new storage location of the data set according to the read new metadata.
Reading the new metadata corresponding to the identifier of the data set yields the new storage location of the data set, so that the remaining unread data can continue to be read at the new storage location. This avoids the situation in which the data layer keeps retrying the read of the unread data without success.
In one possible design, before the access service layer suspends reading of the remaining data in the data set and records the identifier of the last successfully read data in the data set, the access service layer may acquire a read data request sent by the client; read metadata of the requested data set according to the identifier of the requested data set and determine a storage location of the requested data set; read at least one piece of data in the requested data set according to that storage location; and establish an input/output (IO) stream with the client to send the read data to the client in the form of a data stream.
In a possible design, after the access service layer suspends reading of the remaining data in the data set and records the identifier of the last successfully read data in the data set, the access service layer may maintain the IO stream established with the client. After the access service layer continues to read, from the relocated data set stored at the new storage location, the remaining data located after the last successfully read data, it may feed that remaining data into the IO stream maintained with the client and send it to the client.
By maintaining the IO stream established with the client and feeding the remaining data into the maintained IO stream once the data set has been relocated, the data transmission state is preserved and interruption is avoided.
In a second aspect, there is provided a data reading apparatus comprising an IO interface and a processor. The IO interface is used for receiving and sending data and data identifiers. The processor is configured to: during a data reading process, when determining that the IO interface has received a notification message indicating that the data set a client has requested to read is starting to be relocated, suspend reading of the remaining data in the data set and record location information identifying the last successfully read data in the data set, where the data set includes at least one piece of data, each piece of data corresponds to one identifier, and relocating the data set means moving the data set from one storage space to another storage space; when determining that the IO interface has received a notification that relocation of the data set is complete, determine a new storage location of the data set according to the identifier of the data set; and, according to the recorded location information, continue to read, from the relocated data set stored at the new storage location, the remaining data located after the last successfully read data.
In one possible design, the location information for identifying the last successfully read data in the data set is at least one of:
the data identifier corresponding to the last successfully read data in the data set;
the data identifiers corresponding to each successfully read data in the data set;
and the data identifier corresponding to the data immediately following the last successfully read data in the data set.
In one possible design, when the processor determines a new storage location of the data set according to the identifier of the data set, the processor may read new metadata corresponding to the identifier of the data set according to the identifier of the data set; and determining a new storage position of the data set according to the read new metadata.
In one possible design, before suspending reading of remaining data in the data set and recording location information for identifying data that is successfully read last in the data set, the processor may further obtain, through the IO interface, a read data request sent by the client, where the read data request includes an identifier of the data set requested to be read; reading metadata of the data set requested to be read according to the identification of the data set requested to be read, and determining the storage position of the data set requested to be read; reading at least one data in the data set requested to be read according to the storage position of the data set requested to be read; and establishing an IO stream between the IO interface and the client to send the read at least one data to the client in a data stream form.
In one possible design, after suspending reading of the remaining data in the data set and recording location information identifying the last successfully read data in the data set, the processor may further continue to maintain the IO stream established between the IO interface and the client. After the processor continues to read, from the relocated data set stored at the new storage location, the remaining data located after the last successfully read data, the processor may feed that remaining data into the IO stream maintained between the IO interface and the client, so as to send it to the client.
In a third aspect, an apparatus for reading data is provided, including a processor and a memory. The memory stores computer executable instructions, and the processor is connected to the memory via a bus. When the apparatus is running, the processor executes the computer executable instructions stored in the memory, so as to make the apparatus execute the method according to any one of the designs of the first aspect.
In a fourth aspect, there is provided a computer readable storage medium comprising computer readable instructions which, when read and executed by a computer, cause the computer to perform the method of any of the first aspects.
In a fifth aspect, there is provided a computer program product comprising computer readable instructions which, when read and executed by a computer, cause the computer to perform the method of any of the first aspects.
Detailed Description
Fig. 3 shows a system architecture to which an embodiment of the present invention is applicable, which includes a client 301 and a storage server 302. The client 301 communicates with the storage server 302 to request that data be read from or written to the storage server 302.
The storage server 302 may include an access service layer, a metadata layer, and a data layer; the access service layer is configured to interact with the client 301, receive a request message sent by the client 301, and perform an operation corresponding to the request message. The metadata layer is used for storing metadata corresponding to data stored in the data layer, and the metadata is mainly information describing data attributes and is used for supporting functions such as indicating storage positions and storage sizes. The data layer is used for storing data requested to be stored by the client.
At present, data reading is one of the most basic services. To guarantee data reliability, each layer of the system has a corresponding reliability mechanism. For example, after detecting an abnormality, the access service layer retries in order to cope with network instability, transient disconnections, or a busy server; the underlying data layer also has a retry mechanism, so that the failure of a single disk does not cause the whole request to fail.
However, the reliability protection of each layer is basically limited to simple retries within that layer. For example, the retry mechanism at the access service layer is effective only at the beginning of service establishment, and the data layer keeps retrying the read from the same storage space and the same position. Therefore, when data relocation is frequent and concurrent with data reading operations, the data read failure rate increases significantly. If the data being read by the user is large, a failure to read one segment of the data interrupts the IO stream between the server and the user and invalidates all the previously transmitted data, which wastes considerable network resources and time.
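The failure mode described above can be illustrated with a small sketch. It assumes a toy data layer modeled as a dictionary from storage location to data items; the names `data_layer` and `read_with_retry` are hypothetical and are not taken from the embodiment.

```python
from typing import Dict, List

# Hypothetical toy data layer: storage location -> data items of one data set.
data_layer: Dict[str, List[str]] = {"space-A": ["data 1", "data 2", "data 3"]}


def read_with_retry(location: str, index: int, retries: int = 3) -> str:
    """Retry a read at a fixed location and position (the simple retry scheme)."""
    for _ in range(retries):
        items = data_layer.get(location)
        if items is not None:
            return items[index]
    raise IOError(f"read failed at {location}[{index}] after {retries} retries")


print(read_with_retry("space-A", 0))               # succeeds before relocation
data_layer["space-B"] = data_layer.pop("space-A")  # the data set is relocated
try:
    read_with_retry("space-A", 2)                  # same location, same position
except IOError as err:
    print("read failed:", err)
```

Because every retry targets the old location, no number of retries helps once the data set has moved, which is the situation the method of fig. 4 is meant to address.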
In order to solve the above problem, fig. 4 exemplarily shows a flow of a method for data reading provided by an embodiment of the present invention, where the flow may be performed by an access service layer, and the access service layer may be located in a storage server.
Step 401, during a data reading process, when receiving a notification message indicating that the data set the client has requested to read is starting to be relocated, the access service layer suspends reading of the remaining data in the data set and records location information identifying the last successfully read data in the data set.
In the embodiment of the present invention, the data set requested to be read by the client may include at least one piece of data, and each piece of data corresponds to an identifier. For example, data set 1 includes data 1, data 2, ..., and data n, where n is a positive integer and is the data identifier corresponding to the nth piece of data. Numeric data set identifiers and data identifiers are used here only as an example; in a specific application, any other identifiers that can distinguish different data sets and different data may be used, which is not limited here. Data set relocation may be understood as the process of moving a data set from one storage space to another. While reading data, the access service layer may encounter a data relocation task performed by the data layer, in which case it receives a notification message, sent by the server background, indicating that the data set is starting to be relocated. That is, a relocation task for data stored in the data layer exists at the same time as the data reading task, or in other words the data set being read is being relocated. At this time, the access service layer suspends reading of the remaining data in the data set currently being read and records location information identifying the last successfully read data in the data set. The location information may be at least one of: the data identifier corresponding to the last successfully read data in the data set; the data identifiers corresponding to each successfully read data in the data set; and the data identifier corresponding to the data immediately following the last successfully read data in the data set.
That is, the access service layer may record the data identifier of the last successfully read data, the data identifier of the first data to be read among the remaining data, or the data identifiers of all the data that have been successfully read. For example, suppose the data set contains four pieces of data: data 1, data 2, data 3, and data 4. If, after data 1 and data 2 have been read and as reading of data 3 begins, a notification is received that the data set is being relocated as a whole to another storage space, the access service layer may record the data identifier of data 2, or the data identifiers of data 1 and data 2, or the data identifier of data 3, the unread data immediately following data 2.
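The three recording options can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `PositionInfo` and `resume_index` are hypothetical, and a data set is modeled simply as an ordered list of numeric identifiers.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class PositionInfo:
    """Hypothetical record kept when a read is suspended.

    One field is expected to be set, matching one of the three options.
    """
    last_read_id: Optional[int] = None        # last successfully read data
    all_read_ids: Optional[List[int]] = None  # every successfully read data
    next_unread_id: Optional[int] = None      # data immediately after the last read


def resume_index(ordered_ids: Sequence[int], info: PositionInfo) -> int:
    """Return the index in ordered_ids at which reading should resume."""
    if info.next_unread_id is not None:
        return ordered_ids.index(info.next_unread_id)
    if info.last_read_id is not None:
        return ordered_ids.index(info.last_read_id) + 1
    if info.all_read_ids:
        # The final entry is the most recently read data.
        return ordered_ids.index(info.all_read_ids[-1]) + 1
    return 0  # nothing had been read before the suspension


ids = [1, 2, 3, 4]
print(resume_index(ids, PositionInfo(last_read_id=2)))       # 2
print(resume_index(ids, PositionInfo(all_read_ids=[1, 2])))  # 2
print(resume_index(ids, PositionInfo(next_unread_id=3)))     # 2
```

All three forms resolve to the same resume point (index 2, that is, data 3) for the four-item example in the text.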
In this case, the access service layer may suspend the task of the client requesting to read the data in the data set, and continue to read the data after the task of the data set relocation is completed.
Before suspending the data reading task, the access service layer generally obtains a read data request sent by the client, where the read data request includes an identifier of the data set requested to be read. The requested data set includes at least one piece of data. Using the identifier of the requested data set, the access service layer can read the corresponding metadata from the metadata layer and determine from it the storage location of the requested data set, that is, the storage location before relocation, or the old storage location of the data set in the data layer. The metadata is information that the server generates after storing the data of the data set in the data layer and that describes the attributes of the data set. After determining the storage location, the access service layer may read at least one piece of data of the requested data set from that storage location in the data layer. The access service layer then establishes an IO stream with the client to send the read data to the client in the form of a data stream.
While suspending reading of the remaining data in the data set currently being read, the access service layer may also continue to maintain the IO stream established with the client, so as to prevent the IO stream from being interrupted. Because the IO stream is kept open, the user at the client sees neither a read failure nor a read pause; from the user's perspective the data keeps being transmitted without interruption, which can improve the user experience.
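The initial read path described here (request, metadata lookup, storage location, read, stream) might look like the following sketch. The dictionaries `METADATA` and `DATA_LAYER` and the function `handle_read_request` are hypothetical stand-ins for the metadata layer, the data layer, and the access service layer; a Python generator plays the role of the IO stream.

```python
from typing import Dict, Iterator, List

# Hypothetical stand-ins for the metadata layer and the data layer.
METADATA: Dict[int, str] = {1: "space-A"}  # data set identifier -> storage location
DATA_LAYER: Dict[str, List[str]] = {"space-A": ["data 1", "data 2", "data 3"]}


def handle_read_request(data_set_id: int) -> Iterator[str]:
    """Serve a read data request as a stream of data items."""
    location = METADATA[data_set_id]   # read metadata by data set identifier
    for item in DATA_LAYER[location]:  # read each item at that storage location
        yield item                     # send it to the client as part of the stream


for item in handle_read_request(1):
    print(item)
```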
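One way to picture keeping the IO stream open during the pause is a generator that blocks instead of terminating. The sketch below is an assumption-laden illustration: `io_stream` and its parameters are hypothetical, the relocation is simulated by a timer, and `threading.Event` stands in for the relocation-complete notification.

```python
import threading
from typing import Iterator, List


def io_stream(chunks: List[str], pause_before: int,
              relocation_done: threading.Event) -> Iterator[str]:
    """Yield chunks to the client; block, without closing, while relocation runs."""
    for index, chunk in enumerate(chunks):
        if index == pause_before:
            # Reading is suspended here. The generator stays open, so the
            # consumer simply waits instead of observing an error.
            relocation_done.wait()
        yield chunk


done = threading.Event()
# Simulate the server background finishing the relocation 0.1 s later.
threading.Timer(0.1, done.set).start()
received = list(io_stream(["data 1", "data 2", "data 3"],
                          pause_before=2, relocation_done=done))
print(received)
```

The consumer receives all three items through a single iteration; the only visible effect of the pause is a short delay before the third item.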
Step 402, when receiving the notification of the completion of the data set relocation, the access service layer determines a new storage location of the data set according to the identifier of the data set.
When relocation of the data set that the client requested to read is complete in the data layer, the server background sends a relocation-complete notification, which the access service layer receives. After the relocation is complete, new metadata is generated for the data set and stored in the metadata layer. Because the metadata is updated after the data in the data set is relocated, the data to be read may no longer be at the previous storage location when the read task resumes, and the remaining data in the data set then cannot be read from the old storage location. The access service layer therefore needs to read, from the metadata layer and according to the identifier of the data set, the new metadata generated after the relocation, where the new metadata indicates the new storage location of the data set. After reading the new metadata corresponding to the identifier of the data set, the access service layer obtains the new storage location of the data set.
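The metadata re-read can be sketched as below, assuming the metadata layer behaves like a keyed store from data set identifier to storage location. The class `MetadataLayer` and the location names are hypothetical.

```python
from typing import Dict


class MetadataLayer:
    """Hypothetical in-memory stand-in for the metadata layer."""

    def __init__(self) -> None:
        self._locations: Dict[int, str] = {}

    def put(self, data_set_id: int, location: str) -> None:
        self._locations[data_set_id] = location

    def lookup(self, data_set_id: int) -> str:
        return self._locations[data_set_id]


metadata = MetadataLayer()
metadata.put(1, "space-A")         # metadata written when the data set was stored
old_location = metadata.lookup(1)  # location used when the read started

metadata.put(1, "space-B")         # server background updates metadata after relocation
new_location = metadata.lookup(1)  # access service layer re-reads it on completion
print(old_location, new_location)
```

The data set identifier stays the same across the relocation; only the location recorded against it changes, which is why a fresh lookup is enough to find the new storage location.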
Step 403, the access service layer continues to read the remaining data after the data successfully read last in the migrated data set stored in the new storage location according to the recorded location information for identifying the data successfully read last in the data set.
After obtaining the new storage location of the data set, the access service layer may continue to read the remaining data located after the last successfully read data in the migrated data set stored in the new storage location according to the location information recorded in step 401 for identifying the last successfully read data in the data set. And when the position information of the last successfully read data is the data identifier corresponding to the last successfully read data, directly finding the data corresponding to the data identifier of the last successfully read data in a new storage position according to the data identifier of the last successfully read data, and then reading the residual data behind the last successfully read data. When the position information of the last successfully read data is the data identifier corresponding to each successfully read data, the data identifier corresponding to the last successfully read data in each successfully read data is determined, then the data corresponding to the data identifier of the last successfully read data is found in a new storage position according to the data identifier of the last successfully read data, and finally the remaining data behind the last successfully read data is read. When the position information of the last successfully read data is the data identifier corresponding to the next adjacent data of the last successfully read data, the data corresponding to the next adjacent data of the last successfully read data and the remaining data after the next adjacent data of the last successfully read data can be directly read. For example, a data set includes 5 data, which are respectively data 1, data 2, data 3, data 4, and data 5, and when the access service layer suspends the task of reading data, the last successfully read data recorded is data 2, that is, data 1 and data 2 have been successfully read, and data 3, data 4, and data 5 are the remaining data in the data set. 
After obtaining a new storage location according to the new metadata, the access service layer finds data 2 in the new storage location, and then continues to read data 3 located after data 2 until data 5 is read.
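The five-item example can be worked through in code. This sketch assumes the relocated data set preserves the order of its items and models it as a list of (identifier, payload) pairs; `read_remaining` and the payload strings are hypothetical.

```python
from typing import List, Tuple

# The relocated data set at its new storage location: (identifier, payload) pairs.
relocated: List[Tuple[int, str]] = [(i, f"payload-{i}") for i in range(1, 6)]


def read_remaining(items: List[Tuple[int, str]],
                   last_read_id: int) -> List[Tuple[int, str]]:
    """Return the items located after the item whose identifier is last_read_id."""
    ids = [item_id for item_id, _ in items]
    start = ids.index(last_read_id) + 1  # position just after the last read item
    return items[start:]


remaining = read_remaining(relocated, last_read_id=2)
print([item_id for item_id, _ in remaining])  # [3, 4, 5]
```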
After the access service layer resumes reading, it feeds the remaining data into the IO stream maintained with the client and sends that data to the client. From the user's point of view, the read is slightly slower than a read data request with no relocation in progress, but the entire read completes successfully within a single read data request, and no read failure occurs.
In order to clearly explain the data reading process provided by the embodiment of the present invention, the data reading process will be described below in a specific implementation scenario.
As shown in fig. 5, the process specifically includes:
Step 501, a client initiates a read data request.
The client side initiates a data reading request, and the data reading request comprises the identification of the data set. The identification of this data set is 1, namely data set 1. The data set includes 5 data, which are data 1, data 2, data 3, data 4 and data 5, where the numbers 1, 2, 3, 4 and 5 are data identifiers corresponding to the respective data.
Step 502, reading the metadata information of the data set.
And after receiving a data reading request sent by the client, the access service layer reads the metadata information corresponding to the data set with the identifier 1.
Step 503, obtaining the position information of the data set on the data layer.
After the metadata information corresponding to the data set with the identifier 1 is read, the access service layer may obtain a storage location of the data set on the data layer.
Step 504, a location on the data layer is located and reading of data begins.
After obtaining the storage location of the data set with the identifier 1, the access service layer locates to the data layer, starts to read the data in the data set at the storage location of the data set with the identifier 1 on the data layer, reads in sequence starting from data 1, and records the data identifier of each piece of data as it is successfully read.
Step 505, the read data is returned.
After reading the data in the data set, the data layer returns the data that has been read to the access service layer. For example, after reading data 1, data 1 is returned to the access service layer, and data 2 is read continuously.
Step 506, data is returned to the client in the form of IO streams.
And the access service layer returns the received data 1 which is successfully read to the client in the form of IO stream.
Step 507, the relocation task starts, and notification is given that the data of the data set is starting to be relocated.
While the access service layer is reading the data, the server background starts to execute the relocation task for the data set with the identifier 1 in the data layer and notifies the access service layer that the data of the data set with the identifier 1, which is currently being read, is starting to be relocated.
Step 508, suspending the current data reading task flow, recording the identifier of the current last successfully read data, and maintaining the IO stream established with the client.
The access service layer suspends the data reading task currently being executed, that is, suspends reading of the remaining unread data in the data set, records the data identifier of the last successfully read data (for example, 2, meaning that data 2 has been read and data 3 has not), and keeps the IO stream established with the client.
Step 509, the metadata is updated, and notification is given that the data relocation task is complete.
And after the data in the data set is completely migrated, the server background updates the metadata corresponding to the data set, and after the metadata is completely updated, informs the access service layer that the data migration task of the data set is completed.
Step 510, the metadata information of the data set is re-read.
And after receiving the notification of the completion of the data relocation task of the data set, the access service layer returns to the metadata layer to read the metadata information corresponding to the data set with the identifier 1.
Step 511, the storage location of the data set on the data layer is obtained.
After obtaining the metadata of the data set identified as 1, the access service layer may retrieve the new storage location of the data set identified as 1 on the data layer after performing the necessary checks on the metadata.
Step 512, the access service layer repositions to the new storage location in the data layer and continues to read the remaining data in the data set.
According to the retrieved new storage location of the data set identified as 1, the access service layer repositions to that location in the data layer and then reads the remaining data located after data 2. Before the actual read, the read position is advanced from data 1 to the position of data 3, and then the remaining data, namely data 3, data 4 and data 5, are read from the data set identified as 1 stored in the data layer.
Step 513, return the read data.
And after the residual data are read from the new storage position of the data layer, returning the read residual data to the access service layer.
Step 514, continuing to transmit the remaining data by the IO stream before the pause.
After receiving the remaining data, the access service layer attaches it to the IO stream with the client that was kept open during the pause and continues transmission over that stream. From the user's point of view, the read is slightly slower than a read without relocation, but the entire read completes successfully within a single read request, without failure.
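Steps 501 to 514 can be simulated end to end in a single process. The sketch below is illustrative only: `StorageServer`, `read_data_set`, and the location names are hypothetical, the relocation is run inline so that the example is deterministic, and a generator again stands in for the IO stream that stays open across the pause.

```python
from typing import Dict, Iterator, List, Optional


class StorageServer:
    """Hypothetical single-process model of the metadata layer and data layer."""

    def __init__(self) -> None:
        self.metadata: Dict[int, str] = {1: "space-A"}
        self.data_layer: Dict[str, List[int]] = {"space-A": [1, 2, 3, 4, 5]}

    def relocate(self, data_set_id: int, new_location: str) -> None:
        old_location = self.metadata[data_set_id]
        self.data_layer[new_location] = self.data_layer.pop(old_location)
        self.metadata[data_set_id] = new_location  # step 509: metadata updated


def read_data_set(server: StorageServer, data_set_id: int,
                  relocate_after: Optional[int] = None) -> Iterator[int]:
    """Stream a data set to the client, surviving one relocation mid-read."""
    location = server.metadata[data_set_id]        # steps 502-503
    index = 0
    while index < len(server.data_layer[location]):
        item = server.data_layer[location][index]  # steps 504-505
        yield item                                 # step 506
        last_read = item                           # record last successful read
        index += 1
        if last_read == relocate_after:
            # Steps 507-509: relocation starts and completes while reading is
            # suspended; the generator (the IO stream) is never closed.
            server.relocate(data_set_id, "space-B")
            # Steps 510-512: re-read metadata and reposition after last_read.
            location = server.metadata[data_set_id]
            index = server.data_layer[location].index(last_read) + 1


server = StorageServer()
print(list(read_data_set(server, 1, relocate_after=2)))  # [1, 2, 3, 4, 5]
```

The client-side `list(...)` call corresponds to a single read request: it receives data 1 and data 2 from the old location and data 3 to data 5 from the new one, without any failure surfacing.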
Based on the same technical concept, fig. 6 illustrates a structure of a data reading apparatus 600 according to an embodiment of the present invention, where the apparatus 600 may be an access service layer and may perform the above-mentioned data reading procedure.
As shown in fig. 6, the apparatus 600 specifically includes: an IO interface 601 and a processor 602;
the IO interface 601 is configured to receive and send data and a data identifier;
the processor 602 is configured to: during a data reading process, when determining that the IO interface 601 has received a notification message indicating that the data set a client has requested to read is starting to be relocated, suspend reading of the remaining data in the data set and record location information identifying the last successfully read data in the data set, where the data set includes at least one piece of data, each piece of data corresponds to an identifier, and relocating the data set means moving the data set from one storage space to another storage space; when determining that the IO interface 601 has received the notification that relocation of the data set is complete, determine a new storage location of the data set according to the identifier of the data set; and, according to the recorded location information, continue to read, from the relocated data set stored at the new storage location, the remaining data located after the last successfully read data.
In one possible design, the processor 602, when determining the new storage location of the data set according to the identification of the data set, is specifically configured to:
reading new metadata corresponding to the identification of the data set according to the identification of the data set;
and determining a new storage position of the data set according to the read new metadata.
In one possible design, the processor 602, before suspending reading of the data remaining in the data set and recording location information identifying the last successfully read data in the data set, is further configured to:
acquiring a data reading request sent by the client through the IO interface 601, where the data reading request includes an identifier of a data set requested to be read;
reading metadata of the data set requested to be read according to the identification of the data set requested to be read, and determining the storage position of the data set requested to be read;
reading at least one data in the data set requested to be read according to the storage position of the data set requested to be read;
establishing an IO stream between the IO interface 601 and the client, and sending the read at least one data to the client in a data stream form.
In one possible design, the processor 602, after suspending reading of the data remaining in the data set and recording location information identifying the last successfully read data in the data set, is further configured to:
maintaining the IO stream established between the IO interface 601 and the client;
after continuing to read the remaining data after the last successfully read data in the migrated data set stored in the new storage location, the processor 602 is further configured to:
and accessing the continuously read residual data to the IO stream maintained between the IO interface 601 and the client, and sending the continuously read residual data to the client.
Based on the same technical concept, an embodiment of the present invention further provides a data reading apparatus 700. As shown in fig. 7, the apparatus 700 may include an I/O interface 701, a processor 702, and a memory 703. The processor 702 is used to control the operation of the apparatus 700; the memory 703 may include both read-only memory and random-access memory, and stores instructions and data that may be executed by the processor 702. A portion of the memory 703 may also include non-volatile random access memory (NVRAM). The I/O interface 701, the processor 702, and the memory 703 are connected by a bus 709, which may include a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are all labeled as bus 709 in the figure.
The data reading method disclosed by the embodiment of the invention can be applied to the processor 702, or implemented by the processor 702. In implementation, the steps of the process flow may be performed by integrated logic circuits of hardware in the processor 702 or by instructions in the form of software. The processor 702 may be a general-purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, and may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present invention. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in the processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 703, and the processor 702 reads the information stored in the memory 703 and completes the data reading steps in combination with its hardware.
The processor 702 is configured to read the code in the memory 703 to perform the data reading flow of the above method embodiments.
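As an illustration only, the data reading flow that such code might carry out can be sketched as follows. The sketch assumes that each piece of data carries a unique identifier and that the data set keeps its order when it is moved. The names `Storage`, `ReadSession`, `on_move_started`, `on_move_finished`, and `read_next` are hypothetical and are not taken from the embodiments; a real access service layer would receive the notification messages asynchronously rather than through direct method calls.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, str]  # (identifier of a piece of data, payload); hypothetical layout


class Storage:
    """Toy stand-in for the storage layer: space name -> data set id -> records."""

    def __init__(self) -> None:
        self.spaces: Dict[str, Dict[str, List[Record]]] = {}
        self.location: Dict[str, str] = {}  # data set id -> current storage space

    def put(self, space: str, set_id: str, records: List[Record]) -> None:
        self.spaces.setdefault(space, {})[set_id] = list(records)
        self.location[set_id] = space

    def move(self, set_id: str, new_space: str) -> None:
        # Move the data set from its current storage space to another one.
        records = self.spaces[self.location[set_id]].pop(set_id)
        self.put(new_space, set_id, records)

    def locate(self, set_id: str) -> List[Record]:
        # Determine the current storage location from the data set identifier.
        return self.spaces[self.location[set_id]][set_id]


class ReadSession:
    """One client read request as handled by a hypothetical access service layer."""

    def __init__(self, storage: Storage, set_id: str) -> None:
        self.storage = storage
        self.set_id = set_id
        self.records = storage.locate(set_id)
        self.last_read_id: Optional[int] = None  # recorded location information
        self.paused = False
        self.result: List[str] = []

    def on_move_started(self) -> None:
        # Suspend reading; last_read_id already marks the last successful read.
        self.paused = True

    def on_move_finished(self) -> None:
        # Look up the new storage location and allow reading to continue.
        self.records = self.storage.locate(self.set_id)
        self.paused = False

    def read_next(self) -> bool:
        """Read one piece of data; return False while paused or when none remain."""
        if self.paused:
            return False
        ids = [rid for rid, _ in self.records]
        start = 0 if self.last_read_id is None else ids.index(self.last_read_id) + 1
        if start >= len(self.records):
            return False
        rid, payload = self.records[start]
        self.result.append(payload)
        self.last_read_id = rid
        return True


def demo() -> List[str]:
    storage = Storage()
    storage.put("space_a", "set-1", [(1, "a"), (2, "b"), (3, "c"), (4, "d")])
    session = ReadSession(storage, "set-1")
    session.read_next()
    session.read_next()              # two pieces read before the move begins
    session.on_move_started()        # notification: the data set starts moving
    storage.move("set-1", "space_b")
    session.on_move_finished()       # notification: the move is complete
    while session.read_next():       # continue after the last successful read
        pass
    return session.result
```

In this sketch the single read request returns all four pieces of data even though the data set is moved after the second one, which corresponds to the effect described for the method: the request does not have to be reinitiated after the move.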
Based on the same technical concept, embodiments of the present invention also provide a computer-readable storage medium, which includes computer-readable instructions. When a computer reads and executes the computer-readable instructions, the computer is caused to execute the above data reading method.
Based on the same technical concept, embodiments of the present invention further provide a computer program product, which includes computer-readable instructions. When a computer reads and executes the computer-readable instructions, the computer is caused to execute the above data reading method.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative. The division of the units is only one logical division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed across a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
The above description covers only specific embodiments of the present application, but the scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can easily conceive of within the technical scope disclosed in the present application shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.