CN112799585A

CN112799585A - Data processing method and device, electronic equipment and readable storage medium

Info

Publication number: CN112799585A
Application number: CN201911111597.6A
Authority: CN
Inventors: 贾宝雷; 段立国
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2019-11-14
Filing date: 2019-11-14
Publication date: 2021-05-14
Anticipated expiration: 2039-11-14
Also published as: CN112799585B

Abstract

The application discloses a data processing method, a data processing apparatus, an electronic device, and a readable storage medium, and relates to data processing technology. Data that the user is not expected to read is archived in a second storage system based on a second storage device adopting a sequential access mode, and data that is scheduled to be read is copied to a first storage system based on a first storage device adopting a random access mode, so that the storage cost of data storage can be reduced while the requirement of online data access is still met.

Description

Data processing method and device, electronic equipment and readable storage medium

Technical Field

The present disclosure relates to computer technologies, and in particular, to a data processing method and apparatus, an electronic device, and a readable storage medium.

Background

The traditional data access method adopts a data storage system based on a storage device adopting a random access mode, and can provide low-delay and high-throughput online storage service.

However, storage devices adopting the random access mode are expensive, so a data storage system built on such devices has a high storage cost.

Disclosure of Invention

Aspects of the present disclosure provide a data processing method, an apparatus, an electronic device, and a readable storage medium, so as to reduce storage cost of data storage.

In one aspect of the present application, a data processing method is provided, including:

receiving a storage request of data to be stored by a user;

storing the data in a first storage system based on a first storage device, and confirming to the user that the data storage is successful, wherein the first storage device is a storage device adopting a random access mode;

and migrating the data from the first storage system to a second storage system based on a second storage device, wherein the second storage device is a storage device adopting a sequential access mode.

The above-described aspects and any possible implementations further provide an implementation in which the first storage system includes a first distributed storage system; the storing the data to a first storage system based on a first storage device, and confirming the success of the data storage to the user comprises:

storing the data to a first distributed storage system based on a first storage device, and generating metadata of the data;

issuing an asynchronous data storage task to a message queue;

and confirming the data storage success to the user.

The above-described aspects and any possible implementations further provide an implementation in which the second storage system includes a second distributed storage system; the migrating the data from the first storage system to a second storage system based on a second storage device, comprising:

subscribing to an asynchronous data storage task from the message queue;

according to the subscribed asynchronous data storage task, acquiring metadata of the data, reading the data from the first distributed storage system according to the metadata of the data, writing the data into the second distributed storage system, and updating the metadata of the data;

and deleting the data in the first distributed storage system.

The above aspect and any possible implementation manner further provide an implementation manner, where the migrating the data from the first storage system to a second storage system based on a second storage device further includes:

and storing the task identification of the completed asynchronous data storage task.

The above aspects and any possible implementations further provide an implementation in which the first storage device includes a disk; the second storage device includes a magnetic tape.

In another aspect of the present application, there is provided another data processing method, including:

receiving a retrieval request for data to be retrieved by a user;

copying the data from a second storage system based on a second storage device to a first storage system based on a first storage device, and confirming that the data retrieval is successful to the user, wherein the first storage device is a storage device adopting a random access mode, and the second storage device is a storage device adopting a sequential access mode;

receiving a read request for the data;

reading the data from the first storage system.

The above-described aspects and any possible implementations further provide an implementation in which the first storage system includes a first distributed storage system; the second storage system comprises a second distributed storage system; the copying the data from the second storage system based on the second storage device to the first storage system based on the first storage device, and confirming to the user that the data retrieval is successful, comprises:

issuing an asynchronous data retrieval task to a message queue;

confirming to the user that the data retrieval acceptance is successful;

subscribing to asynchronous data retrieval tasks from the message queue;

according to the subscribed asynchronous data retrieval task, acquiring metadata of the data, reading the data from the second distributed storage system according to the metadata of the data, writing the data into the first distributed storage system, and updating the metadata of the data;

confirming to the user that the data retrieval was successful.

The above aspect and any possible implementation further provide an implementation in which the copying the data from the second storage system based on the second storage device to the first storage system based on the first storage device, and confirming to the user that the data retrieval is successful, further includes:

and storing the task identification of the completed asynchronous data retrieval task.

In another aspect of the present application, there is provided a data processing apparatus including:

the request processing unit is used for receiving a storage request of data to be stored by a user;

the cache control unit is used for storing the data in a first storage system based on a first storage device and confirming to the user that the data storage is successful, wherein the first storage device is a storage device adopting a random access mode;

and the migration control unit is used for migrating the data from the first storage system to a second storage system based on a second storage device, wherein the second storage device is a storage device adopting a sequential access mode.

The above-described aspects and any possible implementations further provide an implementation in which the first storage system includes a first distributed storage system; the cache control unit is specifically configured for:

storing the data to a first distributed storage system based on a first storage device, and generating metadata of the data;

issuing an asynchronous data storage task to a message queue; and

confirming the data storage success to the user.

The above-described aspects and any possible implementations further provide an implementation in which the second storage system includes a second distributed storage system; the migration control unit is specifically configured for:

subscribing to an asynchronous data storage task from the message queue;

according to the subscribed asynchronous data storage task, acquiring metadata of the data, reading the data from the first distributed storage system according to the metadata of the data, writing the data into the second distributed storage system, and updating the metadata of the data; and

deleting the data in the first distributed storage system.

The foregoing aspects and any possible implementations further provide an implementation, where the migration control unit is further configured to store the task identification of the completed asynchronous data storage task.

In another aspect of the present application, there is provided another data processing apparatus including:

a request processing unit for receiving a retrieval request of data to be retrieved by a user;

a migration control unit, configured to copy the data from a second storage system based on a second storage device to a first storage system based on a first storage device, and confirm to the user that the data retrieval is successful, where the first storage device is a storage device adopting a random access method, and the second storage device is a storage device adopting a sequential access method;

the request processing unit is further used for receiving a reading request of the data;

and the cache control unit is used for reading the data from the first storage system.

The above-described aspects and any possible implementations further provide an implementation in which the first storage system includes a first distributed storage system; the second storage system comprises a second distributed storage system; the migration control unit is specifically configured for:

issuing an asynchronous data retrieval task to a message queue;

confirming to the user that the data retrieval acceptance is successful;

subscribing to asynchronous data retrieval tasks from the message queue;

according to the subscribed asynchronous data retrieval task, acquiring metadata of the data, reading the data from the second distributed storage system according to the metadata of the data, writing the data into the first distributed storage system, and updating the metadata of the data; and

confirming to the user that the data retrieval was successful.

In another aspect of the present invention, an electronic device is provided, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of the aspects and any possible implementation described above.

In another aspect of the invention, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the above described aspects and any possible implementation.

As can be seen from the foregoing technical solutions, on the one hand, in the embodiments of the present application, data to be stored by a user is stored in a first storage system based on a first storage device, and the success of the data storage is confirmed to the user, where the first storage device is a storage device adopting a random access mode; the data can then be migrated from the first storage system to a second storage system based on a second storage device, where the second storage device is a storage device adopting a sequential access mode. Because the data to be stored by the user is first stored temporarily in the first storage system and then migrated to the second storage system, the storage space of the first storage device is released, so that a first storage device with only a limited storage capacity can be reused to store more data without additionally expanding its storage capacity. Because the second storage device, which serves as the permanent store and adopts a sequential access mode, has a lower device cost, the storage cost of data storage is reduced.

As can be seen from the foregoing technical solutions, on the other hand, in the embodiments of the present application, data to be retrieved by a user is copied from a second storage system based on a second storage device to a first storage system based on a first storage device, and the success of the data retrieval is confirmed to the user, where the first storage device is a storage device adopting a random access mode and the second storage device is a storage device adopting a sequential access mode; when a read request for the data is received, the data can then be read immediately from the first storage system. Because the data to be read by the user is first copied from the second storage system to the first storage system and stored there temporarily, it can be read immediately from the first storage system when the user needs it. Because the first storage device holds this data only temporarily, a first storage device with only a limited storage capacity can be reused to store more data without additionally expanding its storage capacity. Because the second storage device, which serves as the permanent store and adopts a sequential access mode, has a lower device cost, the storage cost of data storage is reduced.

In addition, by adopting the technical scheme provided by the application, the data which is expected not to be read by the user is archived and stored in the second storage system based on the second storage device adopting the sequential access mode, and the data which is scheduled to be read is copied into the first storage system based on the first storage device adopting the random access mode, so that the storage cost of data storage can be reduced, and the requirement of online data access can be met.

In addition, by adopting the technical scheme provided by the application and adopting the message queue to store the asynchronous tasks, the sequence of the tasks can be effectively ensured, and the execution of the tasks is decoupled.

In addition, by adopting the technical scheme provided by the application, the task identifier of the finished asynchronous data storage/retrieval task is stored, so that when the system is restarted, the execution of the asynchronous data storage/retrieval task can be restarted from the breakpoint based on the stored task identifier, the asynchronous data storage/retrieval task is ensured not to be lost, and the reliability of data access can be effectively improved.

In addition, by adopting the technical scheme provided by the application, the storage system based on the magnetic tape is used as the permanent storage layer, so that the storage cost of data storage can be effectively reduced.

In addition, by adopting the technical scheme provided by the application, the storage system based on the magnetic disk is used as a buffer storage layer, so that the time delay of storing data and reading data by a user can be reduced.

Further effects of the above aspects or possible implementations will be described below in connection with specific embodiments.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and those skilled in the art can also obtain other drawings according to the drawings without inventive labor. The drawings are only for the purpose of illustrating the present invention and are not to be construed as limiting the present application. Wherein:

Fig. 1A is a schematic flowchart of a data processing method according to an embodiment of the present application;

Fig. 1B is a schematic flowchart of the data storage processing in the embodiment corresponding to Fig. 1A;

Fig. 2A is a schematic flowchart of another data processing method according to an embodiment of the present application;

Fig. 2B is a schematic flowchart of the data reading processing in the embodiment corresponding to Fig. 2A;

Fig. 3 is a schematic structural diagram of a data processing apparatus according to another embodiment of the present application;

Fig. 4 is a schematic structural diagram of another data processing apparatus according to another embodiment of the present application;

Fig. 5 is a schematic diagram of an electronic device for implementing the data processing method provided in the embodiment of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terminal involved in the embodiments of the present application may include, but is not limited to, a mobile phone, a Personal Digital Assistant (PDA), a wireless handheld device, a tablet computer, a Personal Computer (PC), an MP3 player, an MP4 player, a wearable device (e.g., smart glasses, a smart watch, a smart bracelet, etc.), and the like.

In addition, the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.

Fig. 1A is a schematic flowchart of a data processing method according to an embodiment of the present application, as shown in fig. 1A.

101. A storage request for data to be stored by a user is received.

102. The data is stored in a first storage system based on a first storage device, and the success of the data storage is confirmed to the user, wherein the first storage device is a storage device adopting a random access mode.

103. The data is migrated from the first storage system to a second storage system based on a second storage device, wherein the second storage device is a storage device adopting a sequential access mode.
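Steps 101 to 103 can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not the implementation described in the application: the class name `TieredStore`, its methods, and the use of in-memory dictionaries in place of the disk-based and tape-based storage systems are all hypothetical.

```python
class TieredStore:
    """Toy two-tier store; dicts stand in for the two storage systems (assumption)."""

    def __init__(self):
        self.first = {}   # random-access tier, used as a temporary buffer
        self.second = {}  # sequential-access tier, used as the permanent store

    def store(self, name: str, payload: bytes) -> str:
        # Steps 101-102: accept the request, write to the first tier, acknowledge.
        self.first[name] = payload
        return "OK"

    def migrate(self, name: str) -> None:
        # Step 103: move the data to the second tier and free first-tier space.
        self.second[name] = self.first.pop(name)


store = TieredStore()
ack = store.store("report.bin", b"payload")  # the user is acknowledged here
store.migrate("report.bin")                  # migration happens afterwards
```

In this sketch the acknowledgement is returned before the migration runs, which mirrors the ordering of steps 102 and 103.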

It should be noted that part or all of the execution subjects of 101 to 103 may be an application located at the local terminal, or may also be a functional unit such as a plug-in or Software Development Kit (SDK) set in the application located at the local terminal, or may also be a processing engine located in a server on the network side, or may also be a distributed system located on the network side, for example, a processing engine or a distributed system in a data processing platform on the network side, which is not particularly limited in this embodiment.

It is to be understood that the application may be a native application (native app) installed on the terminal, or may be a web application (web app) running in a browser on the terminal, which is not limited in this embodiment.

In this way, the data to be stored by the user is stored in the first storage system based on the first storage device, and the success of the data storage is confirmed to the user, where the first storage device is a storage device adopting a random access mode; the data can then be migrated from the first storage system to the second storage system based on the second storage device, where the second storage device is a storage device adopting a sequential access mode. Because the data to be stored by the user is first stored temporarily in the first storage system and then migrated to the second storage system, the storage space of the first storage device is released, so that a first storage device with only a limited storage capacity can be reused to store more data without additionally expanding its storage capacity. Because the second storage device, which serves as the permanent store and adopts a sequential access mode, has a low device cost, the storage cost of data storage is reduced.

In the present application, the first storage device is a storage device adopting a random access mode, meaning that the time required to write or read data is independent of the location of the data, for example a magnetic disk or an optical disk. Such storage devices are typically expensive, so a storage system that uses them as the carrier of its stored data has a very high storage cost.

In the present application, the second storage device is a storage device adopting a sequential access mode, meaning that the time required to write or read data depends on the location of the data, for example a magnetic tape. Such storage devices are generally inexpensive, so a storage system that uses them as the carrier of its stored data has a very low storage cost.
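The distinction between the two access modes can be illustrated with a deliberately simplified timing model. The function names, the constant `seek` cost, and the linear `per_unit` cost are assumptions chosen for illustration only; real disks and tapes behave in more complicated ways.

```python
def random_access_time(position: int, seek: float = 1.0) -> float:
    """Random access: the cost does not depend on where the data is located."""
    return seek


def sequential_access_time(position: int, per_unit: float = 0.5) -> float:
    """Sequential access: the medium is traversed up to the data, so the cost grows with position."""
    return position * per_unit


near, far = 10, 1000
same_cost = random_access_time(near) == random_access_time(far)
slower_when_far = sequential_access_time(far) > sequential_access_time(near)
```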

In the present application, a storage system refers to a system in a computer composed of the various storage devices that store data, the control units, and the devices and algorithms that manage information scheduling.

Optionally, in a possible implementation manner of this embodiment, the first storage system and the second storage system may employ a conventional storage system, that is, a centralized storage server that stores all data. In this approach, however, the storage server becomes a bottleneck for system performance and a focal point for reliability and security concerns, and it may not meet the requirements of large-scale storage applications.

Optionally, in a possible implementation manner of this embodiment, the first storage system and the second storage system may adopt a distributed storage system, that is, data is dispersedly stored on a plurality of independent storage servers. The mode adopts an expandable system structure, and utilizes a plurality of storage servers to share the storage load, thereby not only improving the reliability, the availability and the access efficiency of the system, but also being easy to expand.
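As a rough illustration of dispersing data across several independent storage servers, the sketch below picks a server by hashing the data name. The application does not specify a placement strategy, so hash-based placement, the function name `pick_server`, and the server names are all assumptions made for this example.

```python
import hashlib


def pick_server(name: str, servers: list[str]) -> str:
    """Map a data name to one server deterministically via a SHA-256 digest."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["storage-0", "storage-1", "storage-2"]  # hypothetical server names
placement = {name: pick_server(name, servers) for name in ("a.dat", "b.dat", "c.dat")}
```

Because the mapping is deterministic, the same name always resolves to the same server, so a reader can locate the data without consulting a central server.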

Accordingly, the first storage system may include, but is not limited to, a first distributed storage system, and the second storage system may include, but is not limited to, a second distributed storage system, which is not particularly limited by this embodiment.

Specifically, the first distributed storage system and the second distributed storage system adopted in the present application may include, but are not limited to, at least one of a distributed file system, a distributed key-value (k-v) system, a distributed table system, and a distributed database system, which is not particularly limited in this embodiment.

Fig. 1B is a schematic flow chart of data storage processing in the embodiment corresponding to fig. 1A, and the following describes the embodiment in detail with reference to fig. 1B.

In a specific implementation, in 102, the data may be stored in a first distributed storage system based on a first storage device, and metadata of the data may be generated. Then, an asynchronous data storage task may be issued to the message queue, and the success of the data storage may be confirmed to the user.

The metadata of the data, which is used for recording the state information of the data, may include all data required for data access control. When accessing data, firstly, a metadata service request is made to inquire the metadata of the data, and then, subsequent I/O operations such as data reading and writing are performed through the obtained metadata.

The metadata of the data can be regarded as an electronic catalog, which is used for positioning the data and can provide shared access to any system/device with authority, so that the metadata of the data can be stored separately and stored in a metadata storage system for unified management.
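The "look up the metadata first, then perform I/O" pattern described above can be sketched as follows. The fields of the `Metadata` record, the `write` and `read` helpers, and the dictionaries standing in for the metadata storage system and the two storage tiers are hypothetical; the application does not enumerate the metadata fields.

```python
from dataclasses import dataclass


@dataclass
class Metadata:
    name: str
    tier: str   # "first" or "second": which storage system currently holds the data
    path: str   # location of the data inside that storage system
    size: int


metadata_store = {}                   # stands in for the metadata storage system
tiers = {"first": {}, "second": {}}   # stand-ins for the two storage systems


def write(name: str, payload: bytes, tier: str = "first") -> None:
    path = f"/{tier}/{name}"
    tiers[tier][path] = payload
    metadata_store[name] = Metadata(name, tier, path, len(payload))


def read(name: str) -> bytes:
    meta = metadata_store[name]           # metadata query comes first
    return tiers[meta.tier][meta.path]    # the I/O then uses the located path


write("a.txt", b"hello")
content = read("a.txt")
```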

In this implementation, the processing may be carried out by a storage system control module, as follows.

When the user needs to store data, the user may initiate a storage request for the data to be stored, step a1.

After receiving the storage request initiated by the user, the storage system control module may store the data in the first distributed storage system based on the first storage device, generate metadata of the data, and store the metadata in the metadata storage system, step a2.1.

Concurrently with or after step a2.1, the storage system control module may send an asynchronous message, representing an asynchronous data storage task, to the message queue, step a2.2.

The storage system control module then returns a request success to the user to confirm that the data storage was successful, step a3.

At this time, data is only temporarily stored in the first distributed storage system based on the first storage device, but for a user, it can be confirmed that the data is successfully stored, and compared with the technical scheme of directly storing the data in the second distributed storage system based on the second storage device, the time delay of data storage can be greatly reduced.
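Steps a1 to a3 can be sketched with Python's standard `queue` module standing in for the message queue. The function `handle_store_request`, the task dictionary layout, and the sequential task numbers are assumptions for illustration; the application does not prescribe a message format.

```python
import itertools
import queue

first_system = {}               # stand-in for the first distributed storage system
metadata_system = {}            # stand-in for the metadata storage system
message_queue = queue.Queue()   # stand-in for the message queue
_task_numbers = itertools.count(1)


def handle_store_request(name: str, payload: bytes) -> str:
    # Step a2.1: write to the first tier and record metadata.
    first_system[name] = payload
    metadata_system[name] = {"tier": "first", "size": len(payload)}
    # Step a2.2: publish an asynchronous data storage task.
    message_queue.put({"task_id": next(_task_numbers), "kind": "store", "name": name})
    # Step a3: acknowledge without waiting for the migration to the second tier.
    return "success"


ack = handle_store_request("photo.jpg", b"\xff\xd8\xff")
```

The acknowledgement is returned as soon as the first-tier write and the enqueue complete, which is why the user-visible latency does not include the slower second-tier write.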

In another specific implementation, in 103, the asynchronous data storage task may be subscribed from the message queue. Then, according to the subscribed asynchronous data storage task, the metadata of the data may be acquired; according to the metadata, the data may be read from the first distributed storage system and written into the second distributed storage system, and the metadata of the data may be updated. Finally, the data in the first distributed storage system is deleted.

Further, after 103, the task identifier of the completed asynchronous data storage task may be further stored, for example, in a database.

Therefore, the task identifier of the completed asynchronous data storage task is stored, so that when the system is restarted, the execution of the asynchronous data storage task can be restarted from the breakpoint based on the stored task identifier, the asynchronous data storage task is ensured not to be lost, and the reliability of data storage can be effectively improved.
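One way to picture the restart behaviour is to persist completed task identifiers and skip them after a restart. The sketch below uses Python's built-in `sqlite3` module as the database; the table name `completed`, the helper functions, and the use of integer task numbers are assumptions, not details taken from the application.

```python
import sqlite3


def open_checkpoint(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS completed (task_id INTEGER PRIMARY KEY)")
    return conn


def mark_completed(conn: sqlite3.Connection, task_id: int) -> None:
    # Record the identifier of a finished task; repeated calls are harmless.
    conn.execute("INSERT OR IGNORE INTO completed (task_id) VALUES (?)", (task_id,))
    conn.commit()


def pending_after_restart(conn: sqlite3.Connection, task_ids: list[int]) -> list[int]:
    # After a restart, only tasks without a stored identifier still need to run.
    done = {row[0] for row in conn.execute("SELECT task_id FROM completed")}
    return [t for t in task_ids if t not in done]


conn = open_checkpoint()
mark_completed(conn, 1)
mark_completed(conn, 2)
remaining = pending_after_restart(conn, [1, 2, 3, 4])
```

With a file path instead of `":memory:"`, the stored identifiers would survive a process restart, which is the property the paragraph above relies on.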

In this implementation, the processing may be carried out jointly by two different modules: an asynchronous migration control module (hereinafter referred to as the master) and an asynchronous migration working module (hereinafter referred to as the worker), as follows.

The master may subscribe to the asynchronous data storage task from the message queue and store the subscribed task in memory, step a4.

The worker may access the master to obtain the corresponding asynchronous data storage task, step a5.

After obtaining the asynchronous data storage task, the worker may acquire the metadata of the data according to the identification information of the data, such as its name, contained in the task. The worker may then read the data from the first distributed storage system according to the metadata and write the data into the second distributed storage system, thereby completing the migration from the first distributed storage system to the second distributed storage system. Next, the worker may update the metadata of the data, for example by recording the storage path of the data in the second distributed storage system. Finally, the worker may delete the data in the first distributed storage system so that the storage space of the first storage device is released, step a6.

After completing the migration of the data from the first distributed storage system to the second distributed storage system, the worker may access the master again to notify it that the asynchronous data storage task has been completed, step a7.

After confirming that the asynchronous data storage task is completed, the master may delete the task from memory and store the task identifier, such as a task number, of the completed task in the database, to serve as the basis for restarting execution of asynchronous data storage tasks, step a8.
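The master and worker interaction in steps a4 to a8 can be sketched as below. This is a single-process, single-worker simplification under stated assumptions: the `Master` class, the `run_worker` function, the task dictionary layout, and the list standing in for the database are all hypothetical.

```python
import queue


class Master:
    def __init__(self, message_queue: queue.Queue):
        self.message_queue = message_queue
        self.in_memory = {}       # subscribed tasks, keyed by task id
        self.completed_ids = []   # stands in for the database of finished task ids

    def subscribe(self) -> None:
        # Step a4: drain the queue and keep the tasks in memory.
        while True:
            try:
                task = self.message_queue.get_nowait()
            except queue.Empty:
                break
            self.in_memory[task["task_id"]] = task

    def next_task(self):
        # Step a5: hand the oldest outstanding task to a worker (None if idle).
        return next(iter(self.in_memory.values()), None)

    def report_done(self, task_id: int) -> None:
        # Steps a7-a8: drop the task from memory and record its identifier.
        del self.in_memory[task_id]
        self.completed_ids.append(task_id)


def run_worker(master: Master, first: dict, second: dict, metadata: dict) -> None:
    task = master.next_task()
    while task is not None:
        name = task["name"]
        # Step a6: copy to the second tier, update metadata, free the first tier.
        second[name] = first[name]
        metadata[name]["tier"] = "second"
        metadata[name]["path"] = f"second/{name}"
        del first[name]
        master.report_done(task["task_id"])
        task = master.next_task()


mq = queue.Queue()
first = {"a.log": b"alpha", "b.log": b"beta"}
second = {}
metadata = {name: {"tier": "first"} for name in first}
mq.put({"task_id": 1, "name": "a.log"})
mq.put({"task_id": 2, "name": "b.log"})

master = Master(mq)
master.subscribe()
run_worker(master, first, second, metadata)
```

A production system would need task leasing or locking so that several workers do not pick up the same task; that concern is outside this sketch.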

In this way, the data is stored in the second distributed storage system based on the second storage device for permanent storage, and only then is the storage operation of the data truly complete. Because the storage device used for permanent storage has a lower device cost, the aim of reducing the storage cost of data storage is achieved. The first distributed storage system based on the first storage device is used to store the data temporarily because it can provide users with a low-latency, high-throughput online access service, which ensures the real-time requirement of online access.

In this embodiment, the data to be stored by the user is stored in the first storage system based on the first storage device, and the success of the data storage is confirmed to the user, where the first storage device is a storage device adopting a random access mode; the data can then be migrated from the first storage system to the second storage system based on the second storage device, where the second storage device is a storage device adopting a sequential access mode. Because the data to be stored by the user is first stored temporarily in the first storage system and then migrated to the second storage system, the storage space of the first storage device is released, so that a first storage device with only a limited storage capacity can be reused to store more data without additionally expanding its storage capacity. Because the second storage device, which serves as the permanent store and adopts a sequential access mode, has a low device cost, the storage cost of data storage is reduced.

Fig. 2A is a schematic flow chart of another data processing method according to another embodiment of the present application, as shown in fig. 2A.

201. A retrieval request for data to be retrieved by a user is received.

202. The data is copied from a second storage system based on a second storage device to a first storage system based on a first storage device, and the success of the data retrieval is confirmed to the user, wherein the first storage device is a storage device adopting a random access mode, and the second storage device is a storage device adopting a sequential access mode.

203. A read request for the data is received.

204. The data is read from the first storage system.
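Steps 201 to 204 can be sketched in the same toy style. The class name `ArchiveStore`, its methods, and the dictionaries standing in for the two storage systems are assumptions for illustration only.

```python
class ArchiveStore:
    """Toy model of retrieval: data is copied back to the fast tier before reading."""

    def __init__(self):
        self.first = {}   # random-access tier (temporary copy for reading)
        self.second = {}  # sequential-access tier (permanent store)

    def retrieve(self, name: str) -> str:
        # Steps 201-202: copy, not move, so the permanent copy stays in place.
        self.first[name] = self.second[name]
        return "retrieved"

    def read(self, name: str) -> bytes:
        # Steps 203-204: reads are served from the first tier only.
        return self.first[name]


archive = ArchiveStore()
archive.second["scan.tif"] = b"pixels"
status = archive.retrieve("scan.tif")
data = archive.read("scan.tif")
```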

It should be noted that part or all of the execution subjects of 201 to 204 may be an application located at the local terminal, or may also be a functional unit such as a plug-in or Software Development Kit (SDK) set in the application located at the local terminal, or may also be a processing engine located in a server on the network side, or may also be a distributed system located on the network side, for example, a processing engine or a distributed system in a data processing platform on the network side, which is not particularly limited in this embodiment.

In this way, the data to be retrieved by the user is copied from the second storage system based on the second storage device to the first storage system based on the first storage device, and the success of the data retrieval is confirmed to the user, where the first storage device is a storage device adopting a random access mode and the second storage device is a storage device adopting a sequential access mode. When a read request for the data is received, the data can be read immediately from the first storage system, because the data to be read by the user has already been copied from the second storage system to the first storage system for temporary storage. Because the copy in the first storage system is only temporary, the storage space of the first storage device can be released, so that a first storage device with only limited storage capacity can be reused to store more data without expanding its capacity. Because the second storage device, which permanently stores the data, adopts a sequential access mode, its equipment cost is low, so that the storage cost of data storage is reduced.

Fig. 2B is a schematic flow chart of data reading processing in the embodiment corresponding to fig. 2A, and the following describes the embodiment in detail with reference to fig. 2B.

In a specific implementation process, in 202, an asynchronous data retrieval task may be issued to a message queue, and it may be confirmed to the user that the data retrieval request has been accepted. The asynchronous data retrieval task may then be subscribed to from the message queue. Then, according to the subscribed asynchronous data retrieval task, metadata of the data may be acquired; according to the metadata, the data may be read from the second distributed storage system and written into the first distributed storage system, and the metadata of the data may be updated. Finally, it is confirmed to the user that the data retrieval was successful.

Specifically, in this embodiment, when the user needs to read data, the user may initiate a retrieval request for the data to be read, step b 1.

After receiving a user-initiated retrieval request for data to be read by the user, the storage system control module may send an asynchronous message to the message queue, representing an asynchronous data retrieval task, step b 2.

The storage system control module then returns a request-accepted response to the user, confirming to the user that the data retrieval request has been accepted, step b 3.

After it has been confirmed to the user that the data retrieval request was accepted, the asynchronous data retrieval task may continue to be completed, and it may then be confirmed to the user that the data retrieval was successful.

Further, after 202, the task identification of the completed asynchronous data retrieval task may be further stored, for example, in a database.

Therefore, the task identifier of the completed asynchronous data retrieval task is stored, so that when the system is restarted, the execution of the asynchronous data retrieval task can be restarted from the breakpoint based on the stored task identifier, the asynchronous data retrieval task is ensured not to be lost, and the reliability of data reading can be effectively improved.
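The restart-from-breakpoint idea can be sketched as follows. This is an illustration under stated assumptions rather than the application's implementation: task identifiers are assumed to be increasing serial numbers, the list of task identifiers is assumed to be replayable after a restart, and an in-memory SQLite database stands in for the database (a real deployment would need a persistent one). The function names are hypothetical.

```python
import sqlite3


def open_checkpoint_db():
    # A single-row table holding the serial number of the last completed task.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE checkpoint ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), last_done INTEGER)"
    )
    db.execute("INSERT INTO checkpoint VALUES (1, 0)")
    return db


def mark_done(db, task_id):
    # Record the identifier of the task that has just been completed.
    db.execute("UPDATE checkpoint SET last_done = ? WHERE id = 1", (task_id,))
    db.commit()


def tasks_to_resume(db, task_log):
    # After a restart, only tasks newer than the stored identifier are re-run.
    (last_done,) = db.execute("SELECT last_done FROM checkpoint").fetchone()
    return [task_id for task_id in task_log if task_id > last_done]


db = open_checkpoint_db()
task_log = [1, 2, 3, 4]   # assumed: serial numbers of the published tasks
mark_done(db, 1)
mark_done(db, 2)
remaining = tasks_to_resume(db, task_log)
```

In this sketch, tasks 1 and 2 are skipped on restart and execution resumes with tasks 3 and 4.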

The master may specifically subscribe to the asynchronous data retrieval task from the message queue, and store the subscribed asynchronous data retrieval task in the memory, step b 4.

The worker can specifically access the master to obtain the corresponding asynchronous data retrieval task, step b 5.

After the worker acquires the asynchronous data retrieval task, it can acquire the metadata of the data according to the identification information of the data, such as a name, carried in the acquired task. The worker can then read the data from the second distributed storage system according to the metadata of the data and write the data into the first distributed storage system, thereby completing the retrieval from the second distributed storage system to the first distributed storage system. Then, the worker may update the metadata of the data, for example, record a storage path of the data in the first distributed storage system, and the like, step b 6.

After the worker completes the retrieval of data from the second distributed storage system to the first distributed storage system, the master may also be accessed again to notify that the asynchronous data retrieval task has been completed, step b 7.

After the master confirms that the asynchronous data retrieval task is completed, the master may delete the task from the memory and store the task identifier, such as a task serial number, of the currently completed asynchronous data retrieval task in the database, to serve as the basis for restarting the execution of asynchronous data retrieval tasks, step b 8.

The master may then return a request success to the user to confirm to the user that the data retrieval was successful, step b 9.
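The asynchronous flow of steps b 2 to b 9 can be sketched in a single process. The sketch assumes Python's standard `queue.Queue` as a stand-in for the message queue, dictionaries as stand-ins for the two distributed storage systems and the metadata, and a list as a stand-in for the database of completed task identifiers. All function and variable names are hypothetical, and the master and worker roles are collapsed into plain functions.

```python
import queue

message_queue = queue.Queue()              # stand-in for the message queue
first_system = {}                          # random-access tier
second_system = {"video": b"tape bytes"}   # sequential-access tier
metadata = {"video": {"location": "second"}}
completed_task_ids = []                    # stand-in for the database of step b 8
notifications = []                         # confirmations returned to the user


def accept_retrieval(task_id, name):
    # Steps b 2 and b 3: issue the asynchronous task, then confirm acceptance.
    message_queue.put({"task_id": task_id, "name": name})
    return "accepted"


def master_subscribe(pending):
    # Step b 4: move subscribed tasks into the master's in-memory table.
    while not message_queue.empty():
        task = message_queue.get()
        pending[task["task_id"]] = task


def process_pending(pending):
    # Steps b 5 to b 9 for every task currently held by the master.
    for task_id in sorted(pending):
        name = pending[task_id]["name"]
        first_system[name] = second_system[name]   # b 6: copy second -> first
        metadata[name]["location"] = "first"       # b 6: update the metadata
        del pending[task_id]                       # b 8: remove from memory
        completed_task_ids.append(task_id)         # b 8: store the task identifier
        notifications.append((name, "retrieval successful"))  # b 9


pending = {}
ack = accept_retrieval(1, "video")
master_subscribe(pending)
process_pending(pending)
```

The acknowledgement `ack` is returned before any data is copied, which is the point of the asynchronous design: acceptance and completion are confirmed to the user separately.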

At this time, the data has only been temporarily copied to the first distributed storage system based on the first storage device, but the user is already assured that the data can be read immediately at any time. Compared with a technical scheme that reads the data directly from the second distributed storage system based on the second storage device, the time delay of data reading can be greatly reduced.

After confirming that the data retrieval is successful, the user can read the data at any time. The user may specifically initiate a read request for data to be read by the user, step b 10.

After receiving a user initiated read request for data to be read by a user, the storage system control module may read the data from the first storage system, step b 11.

After the user reads the data, the worker may delete the data in the first distributed storage system, so that the storage space of the first storage device is released.
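The release of the temporary copy after a read can be sketched as below. The sketch assumes dictionaries as stand-ins for the two storage systems, and the function name `read_then_release` is hypothetical. Deleting immediately after the first read is only one possible policy; the description states only that the data may be deleted after the user reads it.

```python
first_system = {"video": b"tape bytes"}    # temporary copy, random-access tier
second_system = {"video": b"tape bytes"}   # permanent copy, sequential-access tier


def read_then_release(name):
    """Serve a read from the first system, then drop the temporary copy."""
    data = first_system[name]
    del first_system[name]   # frees space on the first storage device
    return data


data = read_then_release("video")
```

After the call, the data exists only in `second_system`, so a later read would require a new retrieval request.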

Therefore, the temporarily stored data can be read in real time from the first distributed storage system based on the first storage device, and because the data are only temporarily stored in the first distributed storage system based on the first storage device, and actually are permanently stored in the second distributed storage system based on the second storage device, the device cost of the permanently stored storage device is low, and the purpose of reducing the storage cost of data storage can be achieved. The purpose of temporarily storing data by adopting the first distributed storage system based on the first storage device is that the first distributed storage system based on the first storage device can provide low-delay and high-throughput online access service for users, so that the real-time requirement of online access is ensured.

In this embodiment, the data to be retrieved by the user is copied from the second storage system based on the second storage device to the first storage system based on the first storage device, and the success of the data retrieval is confirmed to the user, where the first storage device is a storage device adopting a random access mode and the second storage device is a storage device adopting a sequential access mode. When a read request for the data is received, the data can be read immediately from the first storage system, because the data to be read by the user has already been copied from the second storage system to the first storage system for temporary storage. Because the copy in the first storage system is only temporary, the storage space of the first storage device can be released, so that a first storage device with only limited storage capacity can be reused to store more data without expanding its capacity. Because the second storage device, which permanently stores the data, adopts a sequential access mode, its equipment cost is low, so that the storage cost of data storage is reduced.

The data processing method provided by the application can be respectively suitable for the data storage and reading processes, and mainly has the following beneficial effects:

1. the storage system based on the storage equipment adopting the sequential access mode is used as a permanent storage layer, so that the storage cost of data storage can be effectively reduced;

2. the storage system based on the storage equipment adopting the random access mode is used as a buffer storage layer, so that the time delay of storing data and reading data by a user can be reduced;

3. the asynchronous tasks are stored by adopting the message queues, so that the sequence of the tasks can be effectively ensured, and the execution of the tasks is decoupled;

4. the task identifier of the completed asynchronous data storage/retrieval task is stored, so that when the system is restarted, the execution of the asynchronous data storage/retrieval task can be restarted from the breakpoint based on the stored task identifier, the asynchronous data storage/retrieval task is ensured not to be lost, and the reliability of data access can be effectively improved.

In the present application, data that the user is not expected to read is archived in the second storage system based on the second storage device adopting the sequential access mode, and data that is scheduled to be read is copied into the first storage system based on the first storage device adopting the random access mode, so that the storage cost of data storage can be reduced and the requirement of online data access can be met.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.

In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

Fig. 3 is a schematic structural diagram of a data processing apparatus according to another embodiment of the present application, as shown in fig. 3. The data processing apparatus 300 of the present embodiment may include a request processing unit 301, a cache control unit 302, and a migration control unit 303. The request processing unit 301 is configured to receive a storage request of data to be stored by a user; a cache control unit 302, configured to store the data in a first storage system based on a first storage device, and confirm to the user that the data is successfully stored, where the first storage device is a storage device adopting a random access manner; a migration control unit 303, configured to migrate the data from the first storage system to a second storage system based on a second storage device, where the second storage device is a storage device adopting a sequential access manner.

It should be noted that, part or all of the execution main body of the data processing apparatus provided in this embodiment may be an application located at the local terminal, or may also be a functional unit such as a Software Development Kit (SDK) or a plug-in provided in the application located at the local terminal, or may also be a processing engine located in a server on the network side, or may also be a distributed system located on the network side, for example, a processing engine or a distributed system in a data processing platform on the network side, which is not particularly limited in this embodiment.

In a specific implementation process, the cache control unit 302 may be specifically configured to store the data in a first distributed storage system based on a first storage device, and generate metadata of the data; issuing an asynchronous data storage task to a message queue; and confirming to the user that the data storage is successful.

In another specific implementation process, the migration control unit 303 may be specifically configured to subscribe to an asynchronous data storage task from the message queue; according to the subscribed asynchronous data storage task, acquiring metadata of the data, reading the data from the first distributed storage system according to the metadata of the data, writing the data into the second distributed storage system, and updating the metadata of the data; and deleting the data in the first distributed storage system.
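The division of work between the cache control unit 302 and the migration control unit 303 on the storage path can be sketched as two small classes. The sketch assumes dictionaries as stand-ins for the two distributed storage systems and the metadata, and Python's standard `queue.Queue` as a stand-in for the message queue. The class names, method names and metadata fields are hypothetical.

```python
import queue


class CacheControlUnit:
    def __init__(self, first_system, metadata, tasks):
        self.first_system = first_system
        self.metadata = metadata
        self.tasks = tasks

    def store(self, name, data):
        # Store in the first system, generate metadata, issue the task, confirm.
        self.first_system[name] = data
        self.metadata[name] = {"tier": "first", "size": len(data)}
        self.tasks.put(name)
        return "storage successful"


class MigrationControlUnit:
    def __init__(self, first_system, second_system, metadata, tasks):
        self.first_system = first_system
        self.second_system = second_system
        self.metadata = metadata
        self.tasks = tasks

    def migrate_next(self):
        # Subscribe to one task, copy first -> second, update metadata, delete.
        name = self.tasks.get_nowait()
        self.second_system[name] = self.first_system[name]
        self.metadata[name]["tier"] = "second"
        del self.first_system[name]
        return name


first, second, meta, tasks = {}, {}, {}, queue.Queue()
cache_unit = CacheControlUnit(first, meta, tasks)
migration_unit = MigrationControlUnit(first, second, meta, tasks)
status = cache_unit.store("log", b"abc")
migrated = migration_unit.migrate_next()
```

The user receives `status` as soon as the data is in the first system; the migration to the second system happens later and independently.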

Further, the migration control unit 303 may be further configured to perform storage processing on the task identifier of the completed asynchronous data storage task.

It should be noted that the method in the embodiment corresponding to fig. 1A may be implemented by the data processing apparatus provided in this embodiment. For a detailed description, reference may be made to relevant contents in the embodiment corresponding to fig. 1A, and details are not described here.

In this embodiment, the cache control unit stores the data to be stored by the user in the first storage system based on the first storage device and confirms the success of the data storage to the user, where the first storage device is a storage device adopting a random access mode. The migration control unit can then migrate the data from the first storage system to the second storage system based on the second storage device, where the second storage device is a storage device adopting a sequential access mode. Because the data to be stored by the user is first stored temporarily in the first storage system and is then migrated to the second storage system, the storage space of the first storage device is released, so that a first storage device with only limited storage capacity can be reused to store more data without expanding its capacity. Because the second storage device, which permanently stores the data, adopts a sequential access mode, its equipment cost is low, so that the storage cost of data storage is reduced.

Fig. 4 is a schematic structural diagram of another data processing apparatus according to another embodiment of the present application, as shown in fig. 4. The data processing apparatus 400 of the present embodiment may include a request processing unit 401, a migration control unit 402, and a cache control unit 403. A request processing unit 401, configured to receive a retrieval request of data to be retrieved by a user; a migration control unit 402, configured to copy the data from a second storage system based on a second storage device to a first storage system based on a first storage device, and confirm that the data retrieval is successful to the user, where the first storage device is a storage device adopting a random access method, and the second storage device is a storage device adopting a sequential access method; the request processing unit 401 is further configured to receive a read request of the data; a cache control unit 403, configured to read the data from the first storage system.

In a specific implementation process, the migration control unit 402 may be specifically configured to issue an asynchronous data retrieval task to a message queue; confirming to the user that the data retrieval acceptance is successful; subscribing to asynchronous data retrieval tasks from the message queue; according to the subscribed asynchronous data retrieval task, acquiring metadata of the data, reading the data from the second distributed storage system according to the metadata of the data, writing the data into the first distributed storage system, and updating the metadata of the data; and confirming to the user that the data retrieval was successful.

Further, the migration control unit 402 may be further configured to store the task identifier of the completed asynchronous data retrieval task.

It should be noted that the method in the embodiment corresponding to fig. 2A may be implemented by the data processing apparatus provided in this embodiment. For a detailed description, reference may be made to relevant contents in the embodiment corresponding to fig. 2A, and details are not described here.

In this embodiment, the migration control unit copies the data to be retrieved by the user from the second storage system based on the second storage device to the first storage system based on the first storage device and confirms the success of the data retrieval to the user, where the first storage device is a storage device adopting a random access mode and the second storage device is a storage device adopting a sequential access mode. When the request processing unit receives a read request for the data, the cache control unit can read the data immediately from the first storage system, because the data to be read by the user has already been copied from the second storage system to the first storage system for temporary storage. Because the copy in the first storage system is only temporary, the storage space of the first storage device can be released, so that a first storage device with only limited storage capacity can be reused to store more data without expanding its capacity. Because the second storage device, which permanently stores the data, adopts a sequential access mode, its equipment cost is low, so that the storage cost of data storage is reduced.

The present application also provides an electronic device and a non-transitory computer readable storage medium having computer instructions stored thereon, according to embodiments of the present application.

Fig. 5 is a schematic view of an electronic device for implementing the data processing method according to the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in fig. 5, the electronic device includes: one or more processors 501, memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a Graphical User Interface (GUI) on an external input/output apparatus, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories and multiple types of memory, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor 501 is taken as an example.

Memory 502 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the data processing method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the data processing method provided by the present application.

The memory 502, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and units, such as program instructions/units corresponding to the data processing method in the embodiment of the present application (for example, the request processing unit 301, the cache control unit 302, and the migration control unit 303 shown in fig. 3, or the request processing unit 401, the migration control unit 402, and the cache control unit 403 shown in fig. 4). The processor 501 executes various functional applications of the server and data processing, i.e., implements the data processing method in the above-described method embodiments, by executing the non-transitory software programs, instructions, and units stored in the memory 502.

The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data and the like created according to use of an electronic device implementing the data processing method provided by the embodiment of the present application. Further, the memory 502 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 502 may optionally include a memory remotely located from the processor 501, and these remote memories may be connected via a network to an electronic device implementing the data processing method provided by the embodiments of the present application. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device of the data processing method may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.

The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of an electronic apparatus implementing the data processing method provided by the embodiment of the present application, such as an input device of a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, an Application Specific Integrated Circuit (ASIC), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

According to the technical solution of the embodiment of the present application, on the one hand, the data to be stored by the user is stored in the first storage system based on the first storage device, and the success of the data storage is confirmed to the user, where the first storage device is a storage device adopting a random access mode. The data can then be migrated from the first storage system to the second storage system based on the second storage device, where the second storage device is a storage device adopting a sequential access mode. Because the data to be stored by the user is first stored temporarily in the first storage system and is then migrated to the second storage system, the storage space of the first storage device is released, so that a first storage device with only limited storage capacity can be reused to store more data without expanding its capacity. Because the second storage device, which permanently stores the data, adopts a sequential access mode, its equipment cost is low, so that the storage cost of data storage is reduced.

According to the technical solution of the embodiment of the present application, on the other hand, the data to be retrieved by the user is copied from the second storage system based on the second storage device to the first storage system based on the first storage device, and the success of the data retrieval is confirmed to the user, where the first storage device is a storage device adopting a random access mode and the second storage device is a storage device adopting a sequential access mode. When a read request for the data is received, the data can be read immediately from the first storage system, because the data to be read by the user has already been copied from the second storage system to the first storage system for temporary storage. Because the copy in the first storage system is only temporary, the storage space of the first storage device can be released, so that a first storage device with only limited storage capacity can be reused to store more data without expanding its capacity. Because the second storage device, which permanently stores the data, adopts a sequential access mode, its equipment cost is low, so that the storage cost of data storage is reduced.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A data processing method, comprising:

receiving a storage request of data to be stored by a user;

2. The method of claim 1, wherein the first storage system comprises a first distributed storage system; the storing the data to a first storage system based on a first storage device, and confirming the success of the data storage to the user comprises:

issuing an asynchronous data storage task to a message queue;

and confirming the data storage success to the user.

3. The method of claim 2, wherein the second storage system comprises a second distributed storage system; the migrating the data from the first storage system to a second storage system based on a second storage device, comprising:

subscribing to an asynchronous data storage task from the message queue;

and deleting the data in the first distributed storage system.

4. The method of claim 3, wherein migrating the data from the first storage system to a second storage system based on a second storage device further comprises:

5. The method of any of claims 1-4, wherein the first storage device comprises a disk; the second storage device includes a magnetic tape.

6. A data processing method, comprising:

receiving a retrieval request for data to be retrieved by a user;

receiving a read request for the data;

reading the data from the first storage system.

7. The method of claim 6, wherein the first storage system comprises a first distributed storage system; the second storage system comprises a second distributed storage system; the copying the data from the second storage system based on the second storage device to the first storage system based on the first storage device, and confirming to the user that the data retrieval is successful, comprises:

issuing an asynchronous data retrieval task to a message queue;

confirming to the user that the data retrieval acceptance is successful;

subscribing to asynchronous data retrieval tasks from the message queue;

confirming to the user that the data retrieval was successful.

8. The method of claim 7, wherein the copying the data from the second storage system based on the second storage device to the first storage system based on the first storage device, and confirming to the user that the data retrieval was successful, further comprises:

9. The method of any of claims 6-8, wherein the first storage device comprises a magnetic disk; the second storage device includes a magnetic tape.
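The retrieval flow recited in claims 6 and 7 can be sketched in the same spirit. This is an illustration under stated assumptions, not the claimed implementation: `RetrievalService`, its method names, the `"accepted"` and `"retrieved"` strings, and the `notifications` list are hypothetical, dictionaries stand in for the two distributed storage systems, and `queue.Queue` stands in for the message queue.

```python
import queue


class RetrievalService:
    """Toy model of the retrieve-then-read flow (all names are hypothetical)."""

    def __init__(self, archived):
        self.second_store = dict(archived)  # stand-in for the sequential-access tier
        self.first_store = {}               # stand-in for the random-access tier
        self.tasks = queue.Queue()          # stand-in for the message queue
        self.notifications = []             # confirmations sent back to the user

    def request_retrieval(self, key):
        # Issue an asynchronous data retrieval task and confirm acceptance.
        self.tasks.put(key)
        return "accepted"

    def run_retrieval_worker(self):
        # Subscribe to retrieval tasks; single-threaded, so empty() is reliable here.
        while not self.tasks.empty():
            key = self.tasks.get()
            # Copy (not move) the data into the first storage system.
            self.first_store[key] = self.second_store[key]
            # Confirm to the user that the retrieval succeeded.
            self.notifications.append((key, "retrieved"))

    def read(self, key):
        # A read request is served from the first storage system only.
        return self.first_store[key]


svc = RetrievalService({"obj-1": b"payload"})
accepted = svc.request_retrieval("obj-1")
svc.run_retrieval_worker()
data = svc.read("obj-1")
```

The archived copy stays in the second store throughout, so a later read only depends on the copy placed in the first store.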

10. A data processing apparatus, comprising:

11. The apparatus of claim 10, wherein the first storage system comprises a first distributed storage system; the cache control unit is specifically used for

issuing an asynchronous data storage task to a message queue;

and confirming the data storage success to the user.

12. The apparatus of claim 11, wherein the second storage system comprises a second distributed storage system; the migration control unit is specifically used for:

subscribing to an asynchronous data storage task from the message queue;

and deleting the data in the first distributed storage system.

13. The apparatus of claim 12, wherein the migration control unit is further configured to migrate the data stream to the storage device

14. The apparatus of any of claims 10-13, wherein the first storage device comprises a magnetic disk; the second storage device includes a magnetic tape.

15. A data processing apparatus, comprising:

16. The apparatus of claim 15, wherein the first storage system comprises a first distributed storage system; the second storage system comprises a second distributed storage system; the migration control unit is specifically used for:

issuing an asynchronous data retrieval task to a message queue;

confirming to the user that the data retrieval acceptance is successful;

subscribing to asynchronous data retrieval tasks from the message queue;

confirming to the user that the data retrieval was successful.

17. The apparatus of claim 16, wherein the migration control unit is further configured to migrate the target object to the target object

18. The apparatus of any of claims 15-17, wherein the first storage device comprises a magnetic disk; the second storage device includes a magnetic tape.

19. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5 or 6-9.

20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5 or 6-9.