WO2019232927A1 - Distributed data deletion flow control method and apparatus, electronic device, and storage medium - Google Patents

Distributed data deletion flow control method and apparatus, electronic device, and storage medium

Info

Publication number
WO2019232927A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
statistical period
flow control
load
control threshold
Prior art date
Application number
PCT/CN2018/100172
Other languages
English (en)
Chinese (zh)
Inventor
陈学伟
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019232927A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Definitions

  • the present application relates to the field of computer technology, and in particular, to a distributed data deletion flow control method, device, electronic device, and storage medium.
  • From the client's point of view, deleting distributed data is a very lightweight operation that completes quickly compared with read and write operations. In the back-end distributed file system, however, delete operations pass through the same input/output (IO) paths as read and write operations.
  • This processing method has a long response time and a very noticeable delay, especially when deleting large numbers of small files in batches, and can even affect the user's normal business.
  • A file delete operation generates data input and output (IO). If a batch file delete operation in the distributed storage system coincides with the IO peak of the user application, the IO of the delete operation interferes with the IO of the user application, which degrades the user experience of the application and can even cause a system failure.
  • a first aspect of the present application provides a distributed data deletion flow control method, where the method includes:
  • determining the index information of the corresponding data to be deleted according to the data deletion request includes:
  • index information of the data to be deleted is obtained from the matched storage nodes.
  • a second aspect of the present application provides a distributed data deletion flow control device, where the device includes:
  • a request response module configured to add the data deletion request to the configured processing queue when receiving the data deletion request sent by the client, and return information about the successful data deletion to the client;
  • a request acquisition module configured to acquire a data deletion request in the processing queue every preset time period
  • An index determining module configured to determine index information of corresponding data to be deleted according to the data deletion request
  • An index storage module configured to store index information of the data to be deleted into a configured database
  • a flow control acquisition module configured to obtain a flow control threshold corresponding to a current statistical period within a deletion period
  • a data deletion module is configured to delete data corresponding to the index information in the database based on a flow control threshold corresponding to the current statistical period.
  • a third aspect of the present application provides an electronic device including a processor and a memory, where the processor is configured to implement the distributed data deletion flow control method when executing computer-readable instructions stored in the memory.
  • A fourth aspect of the present application provides a non-volatile readable storage medium on which computer-readable instructions are stored, where the computer-readable instructions, when executed by a processor, implement the distributed data deletion flow control method.
  • The distributed data deletion flow control method, device, electronic device, and storage medium described in the present application can, when receiving a client request to delete data, first return information about the successful data deletion to the client and add the data deletion request to the configured processing queue, and then actually delete the data later, when the data deletion request is obtained from the processing queue. That is, the response to the client's data deletion request and the operation of deleting the data are asynchronous, which effectively reduces the client's waiting time.
  • In addition, flow control thresholds corresponding to different statistical periods are obtained, and the data that the client requested to delete is deleted based on the flow control threshold corresponding to each statistical period. This improves the efficiency of distributed data deletion while avoiding a significant impact on the performance of normal input and output services, and has a good flow control effect.
  • FIG. 1 is a flowchart of a distributed data deletion flow control method provided in Embodiment 1 of the present application.
  • FIG. 2 is a flowchart of a method for determining a flow control threshold corresponding to a current statistical period according to an IO load of a user application in a previous statistical period according to a second embodiment of the present application.
  • FIG. 3 is a functional module diagram of a distributed data deletion flow control device provided in Embodiment 3 of the present application.
  • FIG. 4 is a schematic diagram of an electronic device according to a fourth embodiment of the present application.
  • the distributed data deletion flow control method in the embodiment of the present application is applied to one or more electronic devices.
  • the distributed data deletion flow control method can also be applied to a hardware environment composed of an electronic device and a server connected to the electronic device through a network.
  • the network includes, but is not limited to: a wide area network, a metropolitan area network, or a local area network.
  • the distributed data deletion flow control method in the embodiment of the present application may be executed by a server or an electronic device; it may also be executed jointly by the server and the electronic device.
  • the distributed data deletion flow control function provided by the method of the present application may be directly integrated on the electronic device, or a client for implementing the method of the present application may be installed.
  • The method provided in this application can also run on devices such as servers in the form of a Software Development Kit (SDK). The SDK provides an interface to the distributed data deletion flow control function, and the electronic device or other devices can implement the method described in this application through the provided interface.
  • FIG. 1 is a flowchart of a distributed data deletion flow control method provided in Embodiment 1 of the present application. According to different requirements, the execution order in this flowchart can be changed, and some steps can be omitted.
  • An immediate feedback mechanism for a client that sends a data deletion request may be set in advance. That is, when a client initiates a data deletion request, the distributed storage system can directly return information about the successful data deletion to the client, without waiting until the data corresponding to the data deletion request has actually been deleted. This saves the client the time spent waiting for the data to be deleted. Especially when the data to be deleted is large, or the IO load of the distributed storage system is high, deleting the data requested by the client takes a very long time, and such a long wait is not realistic for the client. The immediate feedback mechanism reduces the time wasted by the client, increases the efficiency of the client user, and improves the client user's experience.
  • the configured processing queue is used to store data deletion requests sent by the client, and stores data deletion requests sent by the client in chronological order.
  • the data deletion request may include a name of a storage node that stores data.
  • the data deletion request is obtained from the processing queue in a first-in-first-out order.
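The immediate-acknowledgement and first-in-first-out behaviour described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the `DeleteRequest` fields, the `DeletionFrontEnd` class, and the shape of the acknowledgement are all hypothetical names chosen for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DeleteRequest:
    # Hypothetical request shape: the text only says a request may carry
    # the name of the storage node that stores the data.
    file_name: str
    storage_node: str


class DeletionFrontEnd:
    def __init__(self):
        # Processing queue; requests are kept in arrival (chronological) order.
        self.queue = deque()

    def handle_delete(self, request):
        """Enqueue the request and acknowledge it immediately."""
        self.queue.append(request)
        # The data has not been deleted yet; the client is told it succeeded.
        return {"status": "success", "file_name": request.file_name}

    def poll(self):
        """Called once per preset time period; returns the oldest request or None."""
        return self.queue.popleft() if self.queue else None


front_end = DeletionFrontEnd()
ack = front_end.handle_delete(DeleteRequest("a.txt", "node-1"))
front_end.handle_delete(DeleteRequest("b.txt", "node-2"))
first = front_end.poll()
```

The client receives `ack` before any deletion work is done, and `poll` hands requests to the back end in the order in which they arrived.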
  • the index information includes file name and location information of data to be deleted corresponding to the data deletion request in a corresponding storage node.
  • the distributed storage system determines the index information of the corresponding data to be deleted according to the data deletion request, including:
  • the distributed storage system may match the storage nodes of the data to be deleted corresponding to the data deletion request from a plurality of storage nodes according to the names of the storage nodes storing the data in the data deletion request.
  • For example, data written by a user is stored in distributed form as three copies.
  • When a client requests to delete the written data, the three storage nodes that store the written data must be found among the multiple nodes. The distributed storage system forwards the data deletion request to the three storage nodes and obtains the index information of the written data from the three storage nodes.
  • the file name and location information of the data to be deleted in the corresponding storage node may be formed into a data pair and stored in a pre-configured database.
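A rough sketch of the index lookup and storage steps, assuming an in-memory SQLite table stands in for the "configured database". The node catalogs, the table name `pending_delete`, and the paths are invented for illustration; the application does not specify a schema.

```python
import sqlite3

# Hypothetical per-node catalogs: node name -> {file name: location on that node}.
NODE_CATALOGS = {
    "node-1": {"report.csv": "/disk0/chunk/0017"},
    "node-2": {"report.csv": "/disk3/chunk/0942"},
    "node-3": {"report.csv": "/disk1/chunk/0405"},
    "node-4": {"other.bin": "/disk2/chunk/0001"},
}


def collect_index_info(file_name, node_names):
    """Match the named storage nodes and return (node, file name, location) entries."""
    entries = []
    for node in node_names:
        location = NODE_CATALOGS.get(node, {}).get(file_name)
        if location is not None:
            entries.append((node, file_name, location))
    return entries


def store_index_info(conn, entries):
    """Persist the index entries so the deletion step can pick them up later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending_delete "
        "(node TEXT, file_name TEXT, location TEXT)"
    )
    conn.executemany("INSERT INTO pending_delete VALUES (?, ?, ?)", entries)
    conn.commit()


conn = sqlite3.connect(":memory:")
entries = collect_index_info("report.csv", ["node-1", "node-2", "node-3"])
store_index_info(conn, entries)
```

With three replicas, the lookup yields one entry per storage node, and each entry pairs the file name with its location on that node.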
  • a deletion period can be divided into multiple statistical periods, and a statistical period can be a preset time period. For example, a statistical period is set to 1 second.
  • Flow control here means traffic control. There are two methods for implementing flow control: one is to implement flow control based on source address, destination address, source port, destination port, and protocol type through the QoS module of routers and switches; the other is to implement application-based flow control using dedicated flow control equipment.
  • the acquiring the flow control threshold corresponding to the current statistical period in the deletion period may specifically include:
  • the flow control threshold corresponding to the first statistical period in the deletion period in this application is a preset flow control threshold, which can be preset by a system administrator according to experience. That is, a preset flow control threshold is used as the flow control threshold of the first statistical period in the deletion period.
  • Each remaining statistical period except the first statistical period in the deletion period may correspond to a flow control threshold.
  • the flow control threshold corresponding to each remaining statistical period is dynamically adjusted.
  • The flow control threshold corresponding to the current statistical period can be calculated based on the IO load in the previous statistical period, and the flow control threshold corresponding to the next statistical period can be calculated based on the IO load in the current statistical period. Specifically, the flow control threshold corresponding to the second statistical period is calculated according to the IO load in the first statistical period; the flow control threshold corresponding to the third statistical period is calculated according to the IO load in the second statistical period; and so on.
  • The distributed storage system may determine the data to be deleted according to the file name and location information corresponding to the index information in the database, and then delete the determined data using the flow control threshold corresponding to the current statistical period, until the data to be deleted corresponding to all the statistical periods in the deletion period has been deleted.
  • The data corresponding to the index information is deleted based on the flow control threshold corresponding to the current statistical period. If the flow control threshold corresponding to the current statistical period is large, deleting with the larger threshold increases the speed at which the data corresponding to the index information is deleted and eases the data storage pressure in the distributed storage system. If the flow control threshold corresponding to the current statistical period is small, deleting with the smaller threshold avoids a significant impact on the performance of normal input and output services.
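The throttled deletion loop can be sketched as below. The text does not state the unit of the flow control threshold, so this sketch assumes it is the maximum number of entries deleted per statistical period; `run_deletion_period` and `delete_fn` are hypothetical names.

```python
from collections import deque


def run_deletion_period(pending, thresholds, delete_fn):
    """Delete pending entries, at most thresholds[i] entries in statistical period i.

    Assumption: the threshold is a per-period cap on the number of deletions.
    Returns the number of entries deleted in each statistical period.
    """
    deleted_per_period = []
    for limit in thresholds:
        count = 0
        while pending and count < limit:
            delete_fn(pending.popleft())
            count += 1
        deleted_per_period.append(count)
        # A real worker would wait here until the statistical period
        # (for example, 1 second) has elapsed before starting the next one.
    return deleted_per_period


pending = deque(f"file-{i}" for i in range(10))
removed = []
counts = run_deletion_period(pending, [4, 2, 3], removed.append)
```

A larger threshold drains the backlog faster; a smaller one leaves more IO capacity for the user application during that period.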
  • FIG. 2 is a flowchart of a method for determining a flow control threshold corresponding to a current statistical period according to an IO load of a user application in a previous statistical period according to a second embodiment of the present application.
  • S21: Obtain the data block size of each IO of the user application in the previous statistical period, and calculate the average data block size of the IO in the previous statistical period.
  • the average data block size of the IO in the last statistical period may be calculated by using an arithmetic average algorithm, a geometric mean algorithm, or a root mean square algorithm.
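For reference, the three averaging algorithms named above have the following standard definitions for values $x_1, \dots, x_n$ (the symbols are introduced here for illustration and do not appear in the application):

```latex
\bar{x}_{\text{arith}} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{x}_{\text{geom}} = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}, \qquad
\bar{x}_{\text{rms}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^{2}}
```

For positive values these satisfy $\bar{x}_{\text{geom}} \le \bar{x}_{\text{arith}} \le \bar{x}_{\text{rms}}$, so the choice of algorithm changes how strongly a few very large IOs pull the average upward.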
  • For example, suppose the data block sizes of ten IOs in the previous statistical period are 2M, 1M, 3M, 0.5M, 10M, 4M, 0.1M, 1.2M, 5M, and 8M. The average data block size of the IO in the previous statistical period calculated using the arithmetic average algorithm is: (2 + 1 + 3 + 0.5 + 10 + 4 + 0.1 + 1.2 + 5 + 8) / 10 = 3.48M.
  • The transmission delay is the time a node needs to push a data block from the node onto the transmission medium when transmitting data, that is, the total time from when a sending station starts sending a data frame until the frame has been completely sent, or the time from when a receiving station starts receiving a data frame until the frame has been completely received.
  • the transmission delay of the data block may be obtained from a load measurement tool or a performance monitoring tool installed in each storage node.
  • The average data block delay of the IO in the previous statistical period may also be calculated by using an arithmetic average algorithm, a geometric mean algorithm, or a root mean square algorithm. Suppose the transmission delays of ten IOs in the previous statistical period are 1s, 0.8s, 1.5s, 0.4s, 5s, 2s, 0.02s, 0.6s, 3s, and 4.5s. The average data block delay of the IO in the previous statistical period calculated using the arithmetic average algorithm is: (1 + 0.8 + 1.5 + 0.4 + 5 + 2 + 0.02 + 0.6 + 3 + 4.5) / 10 = 1.882s.
  • If the average data block size of the IO in the previous statistical period is calculated using the arithmetic average algorithm, the average data block delay of the IO in the previous statistical period is also calculated using the arithmetic average algorithm; if the average data block size is calculated using the geometric mean algorithm, the average data block delay is also calculated using the geometric mean algorithm; and if the average data block size is calculated using the root mean square algorithm, the average data block delay is also calculated using the root mean square algorithm.
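The rule that block size and delay must be averaged with the same algorithm can be enforced by selecting the algorithm once and applying it to both series. A minimal sketch using the example values from the text; the function and dictionary names are illustrative only.

```python
import math
import statistics

# One entry per averaging algorithm named in the text.
ALGORITHMS = {
    "arithmetic": statistics.mean,
    "geometric": statistics.geometric_mean,  # requires positive values
    "rms": lambda xs: math.sqrt(sum(x * x for x in xs) / len(xs)),
}


def period_averages(block_sizes, delays, algorithm="arithmetic"):
    """Return (average block size, average delay) computed with one shared algorithm."""
    mean_fn = ALGORITHMS[algorithm]
    return mean_fn(block_sizes), mean_fn(delays)


sizes_mb = [2, 1, 3, 0.5, 10, 4, 0.1, 1.2, 5, 8]
delays_s = [1, 0.8, 1.5, 0.4, 5, 2, 0.02, 0.6, 3, 4.5]
avg_size, avg_delay = period_averages(sizes_mb, delays_s)
```

With the arithmetic algorithm this gives an average block size of about 3.48 (M) and an average delay of about 1.882 (s) for the example values.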
  • The reference value of the IO data block size and the reference value of the corresponding data block delay may be preset by an administrator of the storage system according to experience. For example, if experience shows that the delay is smallest when a 4K data block is transmitted and can reach 50ms in the ideal state, the reference value of the IO data block size can be set to 4K and the corresponding data block delay reference value can be set to 50ms.
  • the average data block size of the IO in the previous statistical period is X
  • the average data block delay is Y
  • the reference value of the data block size is M
  • the reference value of the corresponding data block delay is N
  • the calculation formula of the IO load intensity in the previous statistical period is:
  • the IO load category includes: a high load category, a normal load category, and a low load category.
  • the load classification model includes, but is not limited to, a Support Vector Machine (SVM) model.
  • The average data block size of the IO in the previous statistical period, the average data block delay of the IO in the previous statistical period, and the IO load intensity in the previous statistical period are used as the input of the load classification model; the load classification model performs the calculation and outputs the IO load category in the previous statistical period.
  • the training process of the load classification model includes:
  • training samples in the training sets of different load categories are distributed to different folders. For example, training samples of high load category are distributed to the first folder, training samples of normal load category are distributed to the second folder, and training samples of low load category are distributed to the third folder.
  • Training samples of a first preset ratio (for example, 70%) are used to train the load classification model, and the remaining training samples of a second preset ratio (for example, 30%) are used to verify the accuracy of the trained load classification model. If the accuracy is greater than or equal to a preset accuracy threshold, the training is ended, and the trained load classification model is used as a classifier to identify the IO load category in the current statistical period; if the accuracy is smaller than the preset accuracy threshold, the number of positive samples and the number of negative samples are increased to retrain the load classification model until the accuracy is greater than or equal to the preset accuracy threshold.
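The split, verify, and retrain loop can be sketched independently of the classifier. The sketch below assumes the first preset ratio is used for training and the second for verification, and treats the model as an opaque `train_fn` that returns a prediction function; an SVM would be plugged in there. All function names and the 0.9 accuracy threshold are hypothetical.

```python
import random


def split_samples(samples, train_ratio=0.7, seed=0):
    """Shuffle and split samples into a training set and a verification set."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, test_set):
    """Fraction of (features, label) pairs for which predict(features) == label."""
    correct = sum(1 for features, label in test_set if predict(features) == label)
    return correct / len(test_set)


def train_until_accurate(train_fn, load_samples, threshold=0.9, max_rounds=5):
    """Retrain with a (possibly larger) sample set until accuracy reaches the threshold.

    train_fn(train_set) must return a predict(features) callable;
    load_samples(round_no) returns the labelled samples available in that round.
    """
    for round_no in range(1, max_rounds + 1):
        samples = load_samples(round_no)
        train_set, test_set = split_samples(samples)
        predict = train_fn(train_set)
        acc = accuracy(predict, test_set)
        if acc >= threshold:
            return predict, acc
    raise RuntimeError("accuracy threshold not reached")
```

Here `load_samples` is the hook through which more samples are supplied in later rounds, mirroring the "increase the number of samples and retrain" step.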
  • calculating the flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period may include:
  • When the IO load category in the previous statistical period is the high load category, the flow control threshold is reduced by the first preset amplitude, so that the data the client requested to delete from the distributed storage system is deleted with a low flow control threshold within the current statistical period. Reducing the speed of distributed data deletion ensures efficient access for user applications.
  • The first preset amplitude may be 1/2 of the flow control threshold corresponding to the previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1/2 of the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1/2 of the flow control threshold corresponding to the current statistical period.
  • When the IO load category in the previous statistical period is the low load category, the flow control threshold is increased by the second preset amplitude, so that the data the client requested to delete from the distributed storage system is deleted with a high flow control threshold within the current statistical period. This increases the intensity of distributed data deletion while still ensuring the access quality of the user application, so that junk data remaining in the distributed system is deleted as soon as possible.
  • The second preset amplitude may be 1.5 times the flow control threshold corresponding to the previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1.5 times the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1.5 times the flow control threshold corresponding to the current statistical period.
  • When the IO load category in the previous statistical period is the normal load category, the flow control threshold corresponding to the previous statistical period is used as the flow control threshold corresponding to the current statistical period.
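The per-period adjustment can be sketched as a small state update. The mapping of categories to actions is an assumption inferred from the stated rationale: halve after a high-load period (to protect user IO), multiply by 1.5 after a low-load period (to clear deleted data faster), and keep the value after a normal-load period. The function names and the starting value of 100 are illustrative.

```python
def next_threshold(previous_threshold, previous_load_category):
    """Flow control threshold for the current period, given the previous period's load."""
    if previous_load_category == "high":
        return previous_threshold * 0.5   # first preset amplitude (1/2)
    if previous_load_category == "low":
        return previous_threshold * 1.5   # second preset amplitude (1.5x)
    if previous_load_category == "normal":
        return previous_threshold         # carried over unchanged
    raise ValueError(f"unknown load category: {previous_load_category!r}")


def thresholds_for_deletion_period(initial_threshold, observed_categories):
    """Thresholds for periods 1..n+1, given the load categories observed in periods 1..n."""
    thresholds = [initial_threshold]  # first period uses the administrator's preset value
    for category in observed_categories:
        thresholds.append(next_threshold(thresholds[-1], category))
    return thresholds


schedule = thresholds_for_deletion_period(100, ["high", "high", "low", "normal"])
```

Because each threshold is derived from the previous one, consecutive high-load periods back off geometrically, while a run of low-load periods ramps deletion up again.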
  • When a data deletion request sent by a client is received, the data deletion request is added to a configured processing queue and, at the same time, information about successful data deletion is returned to the client; a data deletion request in the processing queue is obtained every preset time period; the index information of the corresponding data to be deleted is determined according to the data deletion request; the index information of the data to be deleted is stored in a configured database; a flow control threshold corresponding to the current statistical period in the deletion period is obtained; and the data corresponding to the index information in the database is deleted based on the flow control threshold corresponding to the current statistical period.
  • When this application receives a request from a client to delete data, it can first return information to the client that the data has been deleted, and then actually delete the data when the data deletion request is obtained from the processing queue. That is, the response to the client's data deletion request and the data deletion operation are asynchronous, which effectively shortens the client's waiting time.
  • In addition, flow control thresholds corresponding to different statistical periods are obtained, and the data that the client requested to delete is deleted based on the flow control threshold corresponding to each statistical period. This improves the efficiency of distributed data deletion while avoiding a significant impact on normal IO service performance, and has a good flow control effect.
  • The flow control threshold corresponding to the current statistical period is adjusted dynamically and automatically according to the IO load of the user application in the previous statistical period, without manual adjustment by an administrator. This reduces the administrator's workload and avoids inaccurate adjustment caused by the administrator's subjective judgment.
  • FIG. 3 is a functional module diagram of a preferred embodiment of the distributed data deletion flow control device of the present application.
  • the distributed data deletion flow control device 30 runs in an electronic device.
  • the distributed data deletion flow control device 30 may include a plurality of functional modules composed of program code segments.
  • The program code of each program segment in the distributed data deletion flow control device 30 may be stored in a memory and executed by at least one processor to perform the distributed data deletion flow control method (see FIGS. 1-2 and the related description for details).
  • the distributed data deletion flow control device 30 may be divided into a plurality of functional modules according to functions performed by the distributed data deletion flow control device 30.
  • the functional modules may include a request response module 301, a request acquisition module 302, an index determination module 303, an index storage module 304, a flow control acquisition module 305, a data deletion module 306, a flow control calculation module 307, and a model training module 308.
  • the module referred to in the present application refers to a series of computer-readable instruction segments that can be executed by at least one processor and can perform fixed functions, which are stored in a memory. In some embodiments, functions of each module will be described in detail in subsequent embodiments.
  • a request response module 301 is configured to add the data deletion request to a configured processing queue when receiving a data deletion request sent by a client, and return information about successful data deletion to the client.
  • An immediate feedback mechanism for a client that sends a data deletion request may be set in advance. That is, when a client initiates a data deletion request, the distributed storage system can directly return information about the successful data deletion to the client, without waiting until the data corresponding to the data deletion request has actually been deleted. This saves the client the time spent waiting for the data to be deleted. Especially when the data to be deleted is large, or the IO load of the distributed storage system is high, deleting the data requested by the client takes a very long time, and such a long wait is not realistic for the client. The immediate feedback mechanism reduces the time wasted by the client, increases the efficiency of the client user, and improves the client user's experience.
  • the configured processing queue is used to store data deletion requests sent by the client, and stores data deletion requests sent by the client in chronological order.
  • the data deletion request may include a name of a storage node that stores data.
  • the request obtaining module 302 is configured to obtain a data deletion request in the processing queue every preset time period.
  • the data deletion request is obtained from the processing queue in a first-in-first-out order.
  • the index determining module 303 is configured to determine index information of corresponding data to be deleted according to the data deletion request.
  • the index information includes file name and location information of data to be deleted corresponding to the data deletion request in a corresponding storage node.
  • the index determining module 303 determining the index information of the corresponding data to be deleted according to the data deletion request includes:
  • the distributed storage system may match the storage nodes of the data to be deleted corresponding to the data deletion request from a plurality of storage nodes according to the names of the storage nodes storing the data in the data deletion request.
  • For example, data written by a user is stored in distributed form as three copies.
  • When a client requests to delete the written data, the three storage nodes that store the written data must be found among the multiple nodes. The distributed storage system forwards the data deletion request to the three storage nodes and obtains the index information of the written data from the three storage nodes.
  • the index storage module 304 is configured to store index information of data to be deleted in a configured database.
  • the file name and location information of the data to be deleted in the corresponding storage node may be formed into a data pair and stored in a pre-configured database.
  • the flow control acquisition module 305 is configured to acquire a flow control threshold corresponding to a current statistical period within a deletion period.
  • a deletion period can be divided into multiple statistical periods, and a statistical period can be a preset time period. For example, a statistical period is set to 1 second.
  • Flow control here means traffic control. There are two methods for implementing flow control: one is to implement flow control based on source address, destination address, source port, destination port, and protocol type through the QoS module of routers and switches; the other is to implement application-based flow control using dedicated flow control equipment.
  • the flow control acquisition module 305 acquiring the flow control threshold corresponding to the current statistical period in the deletion period may specifically include:
  • the flow control threshold corresponding to the first statistical period in the deletion period in this application is a preset flow control threshold, which can be preset by a system administrator according to experience. That is, a preset flow control threshold is used as the flow control threshold of the first statistical period in the deletion period.
  • Each remaining statistical period except the first statistical period in the deletion period may correspond to a flow control threshold.
  • the flow control threshold corresponding to each remaining statistical period is dynamically adjusted.
  • The flow control threshold corresponding to the current statistical period can be calculated based on the IO load in the previous statistical period, and the flow control threshold corresponding to the next statistical period can be calculated based on the IO load in the current statistical period. Specifically, the flow control threshold corresponding to the second statistical period is calculated according to the IO load in the first statistical period; the flow control threshold corresponding to the third statistical period is calculated according to the IO load in the second statistical period; and so on.
  • a data deletion module 306 is configured to delete data corresponding to the index information in the database based on a flow control threshold corresponding to the current statistical period.
  • The data deletion module 306 may determine the data to be deleted according to the file name and location information corresponding to the index information in the database, and then delete the determined data using the flow control threshold corresponding to the current statistical period, until the data to be deleted corresponding to all the statistical periods in the deletion period has been deleted.
  • The data corresponding to the index information is deleted based on the flow control threshold corresponding to the current statistical period. If the flow control threshold corresponding to the current statistical period is large, deleting with the larger threshold increases the speed at which the data corresponding to the index information is deleted and eases the data storage pressure in the distributed storage system. If the flow control threshold corresponding to the current statistical period is small, deleting with the smaller threshold avoids a significant impact on the performance of normal input and output services.
  • The flow control calculation module 307 is configured to obtain the data block size of each IO of the user application in the previous statistical period, and calculate the average data block size of the IO in the previous statistical period.
  • the average data block size of the IO in the last statistical period may be calculated by using an arithmetic average algorithm, a geometric mean algorithm, or a root mean square algorithm.
  • For example, suppose the data block sizes of ten IOs in the previous statistical period are 2M, 1M, 3M, 0.5M, 10M, 4M, 0.1M, 1.2M, 5M, and 8M. The average data block size of the IO in the previous statistical period calculated using the arithmetic average algorithm is: (2 + 1 + 3 + 0.5 + 10 + 4 + 0.1 + 1.2 + 5 + 8) / 10 = 3.48M.
  • the flow control calculation module 307 is further configured to obtain a transmission delay of each data block in the last statistical period, and calculate an average data block delay of the IO in the last statistical period.
  • the transmission delay refers to the time required for a node to enter a data block from the node to the transmission medium when transmitting data, that is, the time required for a sending site to start sending data frames to the completion of data frame transmission The total time required for a receiving station, or the time required for a receiving station to start receiving data frames and finish receiving data frames.
  • the transmission delay of the data block may be obtained from a load measurement tool or a performance monitoring tool installed in each storage node.
  • The average data block delay of the IO in the last statistical period may also be calculated by using an arithmetic average algorithm, a geometric mean algorithm, or a root mean square algorithm. Assuming that the transmission delays of the ten IOs in the previous statistical period are 1s, 0.8s, 1.5s, 0.4s, 5s, 2s, 0.02s, 0.6s, 3s, and 4.5s, the average data block delay of the IO in the previous statistical period, calculated with the arithmetic average algorithm, is (1 + 0.8 + 1.5 + 0.4 + 5 + 2 + 0.02 + 0.6 + 3 + 4.5) / 10 = 1.882s.
  • If the average data block size of the IO in the previous statistical period is calculated using the arithmetic average algorithm, the average data block delay of the IO in the previous statistical period is also calculated using the arithmetic average algorithm; if the average data block size is calculated using the geometric mean algorithm, the average data block delay is also calculated using the geometric mean algorithm; and if the average data block size is calculated using the root mean square algorithm, the average data block delay is also calculated using the root mean square algorithm.
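The three averaging options, and the rule that the same algorithm is applied to both the data block sizes and the delays, can be sketched as below. The function names `average` and `period_averages` are illustrative assumptions; the sample values are the ones given in the text.

```python
import math


def average(values, method="arithmetic"):
    """Return the arithmetic mean, geometric mean, or root mean square of `values`."""
    if not values:
        raise ValueError("values must not be empty")
    n = len(values)
    if method == "arithmetic":
        return sum(values) / n
    if method == "geometric":
        # Geometric mean via logarithms; requires strictly positive values.
        return math.exp(sum(math.log(v) for v in values) / n)
    if method == "rms":
        return math.sqrt(sum(v * v for v in values) / n)
    raise ValueError(f"unknown method: {method}")


def period_averages(block_sizes, delays, method="arithmetic"):
    """Average sizes and delays with the same algorithm, as the text requires."""
    return average(block_sizes, method), average(delays, method)


# Sample values from the text: sizes in "M", delays in seconds.
block_sizes = [2, 1, 3, 0.5, 10, 4, 0.1, 1.2, 5, 8]
delays = [1, 0.8, 1.5, 0.4, 5, 2, 0.02, 0.6, 3, 4.5]
avg_size, avg_delay = period_averages(block_sizes, delays, "arithmetic")
```

Passing a single `method` argument to `period_averages` is one simple way to guarantee that the two averages never use different algorithms.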
  • the flow control calculation module 307 is further configured to obtain a preset reference value of the data block size of the IO and a corresponding reference value of the data block delay.
  • The reference value of the IO data block size and the corresponding reference value of the data block delay may be preset by an administrator of the storage system according to experience. For example, if experience shows that a 4K data block is transmitted with the smallest delay, which in the ideal state can reach 50ms, then the reference value of the IO data block size can be set to 4K and the corresponding reference value of the data block delay can be set to 50ms.
  • the flow control calculation module 307 is further configured to calculate the IO load intensity in the previous statistical period according to the average data block size of the IO in the last statistical period, the average data block delay, the data block size reference value, and the corresponding data block delay reference value.
  • Assume that the average data block size of the IO in the previous statistical period is X, the average data block delay is Y, the reference value of the data block size is M, and the reference value of the corresponding data block delay is N; the IO load intensity in the previous statistical period is then calculated from X, Y, M, and N.
  • the flow control calculation module 307 is further configured to determine an IO load category in the previous statistical period by using a pre-trained load classification model according to the IO load intensity in the last statistical period.
  • the IO load category includes: a high load category, a normal load category, and a low load category.
  • the load classification model includes, but is not limited to, a Support Vector Machine (SVM) model.
  • The average data block size of the IO in the last statistical period, the average data block delay of the IO in the last statistical period, and the IO load intensity in the last statistical period are used as the input of the load classification model, and the load classification model calculates and outputs the IO load category in the previous statistical period.
  • a model training module 308 is configured to train a load classification model.
  • the process of training the load classification model by the model training module 308 includes:
  • training samples in the training sets of different load categories are distributed to different folders. For example, training samples of high load category are distributed to the first folder, training samples of normal load category are distributed to the second folder, and training samples of low load category are distributed to the third folder.
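  • training samples of a first preset ratio (for example, 70%) are used to train the load classification model, and the remaining training samples of a second preset ratio (for example, 30%) are used to verify the accuracy of the trained load classification model.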
  • if the accuracy rate is greater than or equal to the preset accuracy threshold, the training is ended, and the trained load classification model is used as a classifier to identify the IO load category in the current statistical period; if the accuracy rate is smaller than the preset accuracy threshold, the number of positive samples and the number of negative samples are increased to retrain the load classification model until the accuracy rate is greater than or equal to the preset accuracy threshold.
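The training procedure above (per-category split, validation against an accuracy threshold, retraining with more samples) can be sketched as follows. To stay self-contained, this sketch swaps in a simple nearest-centroid classifier as a stand-in for the SVM named in the text; all function names, the 0.9 threshold, and the idea of supplying extra samples in batches are assumptions made for illustration.

```python
import random


def split_per_category(samples, train_ratio=0.7, seed=0):
    """Split each load category's samples into training and validation parts."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append((features, label))
    train_set, validation_set = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = int(len(group) * train_ratio)
        train_set.extend(group[:cut])
        validation_set.extend(group[cut:])
    return train_set, validation_set


def train_centroids(train_set):
    """Stand-in trainer (not an SVM): one mean feature vector per load category."""
    sums, counts = {}, {}
    for features, label in train_set:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(model, features):
    """Return the category whose centroid is closest to `features`."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def accuracy(model, validation_set):
    """Fraction of validation samples that the model labels correctly."""
    if not validation_set:
        raise ValueError("validation set is empty")
    hits = sum(1 for features, label in validation_set if classify(model, features) == label)
    return hits / len(validation_set)


def train_until_accurate(sample_batches, accuracy_threshold=0.9):
    """Retrain with progressively more samples until the threshold is met."""
    samples = []
    for batch in sample_batches:
        samples.extend(batch)
        train_set, validation_set = split_per_category(samples)
        model = train_centroids(train_set)
        if accuracy(model, validation_set) >= accuracy_threshold:
            return model
    raise RuntimeError("accuracy threshold not reached with the available samples")
```

Each sample here is a `(features, label)` pair, where `features` could hold the average data block size, the average delay, and the load intensity, and `label` is one of the three load categories.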
  • the flow control calculation module 307 is further configured to calculate a flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period.
  • the flow control calculation module 307 is further configured to calculate the flow control threshold corresponding to the current statistical period according to the IO load category in the previous statistical period, which may include:
  • when the IO load category in the previous statistical period is the high load category, the flow control threshold is reduced according to the first preset amplitude, so that the data requested by the client for deletion in the distributed storage system is deleted with a low flow control threshold within the current statistical period, ensuring efficient access for user applications by reducing the speed of distributed data deletion.
  • the first preset amplitude may be 1/2 of a flow control threshold corresponding to a previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1/2 of the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1/2 of the flow control threshold corresponding to the current statistical period.
  • when the IO load category in the previous statistical period is the low load category, the flow control threshold is increased according to the second preset amplitude, so that the data requested by the client for deletion in the distributed storage system is deleted with a high flow control threshold within the current statistical period, increasing the intensity of distributed data deletion while still ensuring the access quality of user applications, so that the junk data remaining in the distributed system is deleted as soon as possible.
  • the second preset amplitude may be 1.5 times a flow control threshold corresponding to a previous statistical period. That is, the flow control threshold corresponding to the current statistical period is 1.5 times the flow control threshold corresponding to the previous statistical period, and the flow control threshold corresponding to the next statistical period is 1.5 times the flow control threshold corresponding to the current statistical period.
  • when the IO load category in the previous statistical period is the normal load category, the flow control threshold corresponding to the previous statistical period is used as the flow control threshold corresponding to the current statistical period.
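The threshold adjustment can be sketched as a small function. It uses the example factors from the text (1/2 and 1.5) and assumes, as the stated purposes suggest, that the high load category triggers the reduction, the low load category triggers the increase, and the normal load category leaves the threshold unchanged. The function name and the string labels are hypothetical.

```python
def next_flow_control_threshold(previous_threshold, load_category,
                                decrease_factor=0.5, increase_factor=1.5):
    """Derive the current period's threshold from the previous period's load category."""
    if load_category == "high":
        # Busy user IO: slow deletion down to protect application access.
        return previous_threshold * decrease_factor
    if load_category == "low":
        # Idle user IO: speed deletion up to clear junk data sooner.
        return previous_threshold * increase_factor
    if load_category == "normal":
        return previous_threshold
    raise ValueError(f"unknown load category: {load_category}")
```

Because each result feeds the next period, two consecutive high load periods would reduce a threshold of 100 to 50 and then to 25.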
  • When the distributed data deletion flow control device described in this application receives a data deletion request sent by a client, it adds the data deletion request to a configured processing queue and at the same time returns information indicating successful data deletion to the client; obtains the data deletion request in the processing queue every preset time period; determines the index information of the corresponding data to be deleted according to the data deletion request; stores the index information of the data to be deleted in a configured database; obtains the flow control threshold corresponding to the current statistical period in the deletion period; and deletes the data corresponding to the index information in the database based on the flow control threshold corresponding to the current statistical period.
  • When this application receives a request from a client to delete data, it can first return information indicating that the data has been deleted to the client, and only actually delete the data after it obtains the data deletion request from the processing queue. That is, responding to the client's data deletion request and performing the deletion are asynchronous, which effectively shortens the client's waiting time.
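The asynchronous handling described above can be sketched as a small front end that acknowledges a request immediately and resolves it later. The class name `DeletionFrontEnd`, its method names, the acknowledgement string, and the in-memory list standing in for the configured database are all assumptions for illustration.

```python
from collections import deque


class DeletionFrontEnd:
    """Acknowledge delete requests at once; resolve them to index information later."""

    def __init__(self):
        self.processing_queue = deque()  # configured processing queue
        self.index_database = []         # stand-in for the configured database

    def handle_delete_request(self, request):
        """Queue the request and return success before any data is deleted."""
        self.processing_queue.append(request)
        return "data deletion successful"

    def drain_queue(self, resolve_index):
        """Run once per preset time period: move queued requests into the database.

        `resolve_index` is a hypothetical callback that maps a request to the
        index information of the data to be deleted.
        """
        while self.processing_queue:
            request = self.processing_queue.popleft()
            self.index_database.append(resolve_index(request))
        return len(self.index_database)
```

The client only waits for the queue append, while the actual deletion happens later under the flow control threshold of whichever statistical period is current.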
  • By obtaining the flow control thresholds corresponding to different statistical periods and deleting the data requested by the client for deletion based on the flow control threshold corresponding to each statistical period, this application improves the efficiency of distributed data deletion while avoiding a significant impact on normal I/O service performance, and achieves a good flow control effect.
  • The flow control threshold corresponding to the current statistical period is adjusted automatically and dynamically according to the IO load of the user application in the previous statistical period, without manual adjustment by an administrator, which reduces the administrator's workload and avoids inaccurate adjustment caused by the administrator's subjective judgment.
  • the integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium.
  • the above software function module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a dual-screen device, or a network device) or a processor to execute part of the methods described in the embodiments of this application.
  • FIG. 4 is a schematic diagram of an electronic device according to a fourth embodiment of the present application.
  • the electronic device 4 includes: a memory 41, at least one processor 42, computer-readable instructions 43 stored in the memory 41 and executable on the at least one processor 42, and at least one communication bus 44.
  • the computer-readable instructions 43 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 41 and executed by the at least one processor 42 to complete the steps in the above method embodiments of the present application.
  • the one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 43 in the electronic device 4.
  • the electronic device 4 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • FIG. 4 is only an example of the electronic device 4 and does not constitute a limitation on the electronic device 4, which may include more or fewer components than shown in the figure, combine some components, or use different components. For example, the electronic device 4 may further include an input/output device, a network access device, a bus, and the like.
  • the at least one processor 42 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the processor 42 may be a microprocessor, or the processor 42 may be any conventional processor, etc.
  • the processor 42 is the control center of the electronic device 4 and uses various interfaces and lines to connect the various parts of the entire electronic device 4.
  • the memory 41 may be configured to store the computer-readable instructions 43 and/or modules/units, and the processor 42 implements the various functions of the electronic device 4 by running or executing the computer-readable instructions and/or modules/units stored in the memory 41 and by calling the data stored in the memory 41.
  • the memory 41 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, application programs required for at least one function (such as a sound playback function, an image playback function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the electronic device 4.
  • the memory 41 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one disk storage device, a flash memory device, or other volatile solid-state storage device.
  • When the integrated module/unit of the electronic device 4 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of this application can also be completed by computer-readable instructions that instruct the related hardware.
  • the computer-readable instructions can be stored in a non-volatile readable storage medium, and when the computer-readable instructions are executed by a processor, the steps of the foregoing method embodiments can be implemented.
  • the computer-readable instructions include computer-readable instruction codes, and the computer-readable instruction codes may be in a source code form, an object code form, an executable file, or some intermediate form.
  • the non-volatile readable medium may include: any entity or device capable of carrying the computer-readable instruction code, a recording medium, a U disk, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), electric carrier signals, telecommunication signals, and software distribution media.
  • the content contained in the non-volatile readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, non-volatile readable media do not include electric carrier signals and telecommunication signals.
  • each functional unit in each embodiment of the present application may be integrated in the same processing unit, or each unit may exist separately physically, or two or more units may be integrated in the same unit.
  • the integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Disclosed is a distributed data deletion flow control method, comprising: adding a data deletion request to a configured processing queue upon receiving the data deletion request sent by a client, and returning information indicating successful data deletion to the client; obtaining the data deletion request in the processing queue every preset time period; determining index information of the corresponding data to be deleted according to the data deletion request; storing the index information of the data to be deleted in a configured database; obtaining a flow control threshold corresponding to the current statistical period in a deletion period; and deleting the data corresponding to the index information in the database based on the flow control threshold corresponding to the current statistical period. Also disclosed are a distributed data deletion flow control apparatus, an electronic device, and a storage medium. The present application can avoid a noticeable impact on normal input/output service performance while improving the data deletion efficiency of a large-scale distributed storage system, and has a good flow control effect.
PCT/CN2018/100172 2018-06-04 2018-08-13 Distributed data deletion flow control method and apparatus, electronic device, and storage medium WO2019232927A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810566096.6 2018-06-04
CN201810566096.6A CN108959399B (zh) 2018-06-04 2018-06-04 Distributed data deletion flow control method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2019232927A1 true WO2019232927A1 (fr) 2019-12-12

Family

ID=64493090

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/100172 WO2019232927A1 (fr) 2018-06-04 2018-08-13 Distributed data deletion flow control method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN108959399B (fr)
WO (1) WO2019232927A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
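CN111177137A (zh) * 2019-12-30 2020-05-19 广州酷狗计算机科技有限公司 Data deduplication method, apparatus, device, and storage medium
CN112118188A (zh) * 2020-08-25 2020-12-22 北京五八信息技术有限公司 Traffic rate limiting method and apparatus, electronic device, and storage medium
CN112214503A (zh) * 2020-10-10 2021-01-12 深圳壹账通智能科技有限公司 Data processing method and apparatus, electronic device, and storage medium
CN116595007A (zh) * 2023-05-23 2023-08-15 建材广州工程勘测院有限公司 Geotechnical engineering geological data management system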

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
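CN110120973A (zh) * 2019-04-28 2019-08-13 华为技术有限公司 Request control method, related device, and computer storage medium
CN112506896B (zh) * 2019-09-16 2023-08-04 杭州海康威视系统技术有限公司 Data deletion method and apparatus, and electronic device
CN110888844B (zh) * 2019-11-22 2023-03-21 浪潮电子信息产业股份有限公司 Data deletion method, system, device, and computer-readable storage medium
CN110941591A (zh) * 2019-11-22 2020-03-31 浪潮电子信息产业股份有限公司 File deletion method, apparatus, device, and readable storage medium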

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
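CN105095489A (zh) * 2015-08-18 2015-11-25 浪潮(北京)电子信息产业有限公司 Distributed file deletion method, apparatus, and system
CN105824881A (zh) * 2016-03-10 2016-08-03 中国人民解放军国防科学技术大学 Load-balancing-based data placement method for data deduplication
CN106227469A (zh) * 2016-07-28 2016-12-14 乐视控股(北京)有限公司 Data deletion method and system for a distributed storage cluster
CN107330061A (zh) * 2017-06-29 2017-11-07 郑州云海信息技术有限公司 File deletion method and apparatus based on distributed storage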
US20180137175A1 (en) * 2015-05-14 2018-05-17 Walleye Software, LLC Query task processing based on memory allocation and performance criteria

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1135746C (zh) * 2000-10-19 2004-01-21 华为技术有限公司 Apparatus for multi-service load monitoring and prediction in a CDMA cellular mobile communication system and calculation method thereof
KR100851000B1 (ko) * 2001-10-15 2008-08-12 엘지전자 주식회사 Network traffic control method in VoIP
ATE380431T1 (de) * 2002-11-15 2007-12-15 Ibm Control of network traffic in a peer-to-peer environment
CN101631346B (zh) * 2009-06-05 2012-06-20 西安电子科技大学 Inter-zone handover method based on signal strength and load estimation
CN102355425B (zh) * 2011-10-26 2014-10-29 深信服网络科技(深圳)有限公司 Network traffic control method and device
TWM439962U (en) * 2012-05-04 2012-10-21 Univ Hungkuang Network traffic control system with congestion-aware function
EP2887590B1 (fr) * 2012-09-25 2017-09-20 Huawei Technologies Co., Ltd. Flow control method, device, and network
CN104092619B (zh) * 2014-07-25 2017-07-21 华为技术有限公司 Flow control method and apparatus
CN104408656A (zh) * 2014-10-29 2015-03-11 中国建设银行股份有限公司 Method and system for dynamically adjusting a flow control threshold
US10091124B2 (en) * 2015-09-04 2018-10-02 Citrix Systems, Inc. System for early system resource constraint detection and recovery
KR101748272B1 (ko) * 2015-12-10 2017-06-27 현대자동차주식회사 Method and apparatus for controlling large-volume diagnostic communication in a vehicle
CN107454004A (zh) * 2016-05-30 2017-12-08 阿里巴巴集团控股有限公司 Flow control method and apparatus
CN107544862B (zh) * 2016-06-29 2022-03-25 中兴通讯股份有限公司 Erasure-code-based stored data reconstruction method and apparatus, and storage node
CN106656840B (zh) * 2016-11-25 2019-11-08 杭州安恒信息技术股份有限公司 Dynamic flow control method applied to web crawlers

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
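CN111177137A (zh) * 2019-12-30 2020-05-19 广州酷狗计算机科技有限公司 Data deduplication method, apparatus, device, and storage medium
CN111177137B (zh) * 2019-12-30 2023-10-13 广州酷狗计算机科技有限公司 Data deduplication method, apparatus, device, and storage medium
CN112118188A (zh) * 2020-08-25 2020-12-22 北京五八信息技术有限公司 Traffic rate limiting method and apparatus, electronic device, and storage medium
CN112214503A (zh) * 2020-10-10 2021-01-12 深圳壹账通智能科技有限公司 Data processing method and apparatus, electronic device, and storage medium
CN116595007A (zh) * 2023-05-23 2023-08-15 建材广州工程勘测院有限公司 Geotechnical engineering geological data management system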

Also Published As

Publication number Publication date
CN108959399B (zh) 2022-07-15
CN108959399A (zh) 2018-12-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18921416

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18921416

Country of ref document: EP

Kind code of ref document: A1