CN118093097A

CN118093097A - Data storage cluster resource scheduling method, device, electronic equipment and medium

Info

Publication number: CN118093097A
Application number: CN202410228836.0A
Authority: CN
Inventors: 付芳吉
Original assignee: Park Road Credit Information Co ltd
Current assignee: Park Road Credit Information Co ltd
Priority date: 2024-02-29
Filing date: 2024-02-29
Publication date: 2024-05-28

Abstract

The embodiments of the invention disclose a data storage cluster resource scheduling method and apparatus, an electronic device, and a medium. One embodiment of the method includes: acquiring a user data call information set and a cluster load information set; preprocessing the user data call information set to obtain a preprocessed user data call information set; generating a user call knowledge graph; generating a user data call prediction information set; performing user value identification on the preprocessed user data call information set to obtain user value information; inputting the cluster load information set into a cluster load prediction model to obtain a cluster residual resource prediction information set; determining a call priority set for the user data call prediction information set; and generating cluster resource scheduling information and performing resource scheduling on the data storage cluster accordingly. This embodiment can improve the accuracy of user data call prediction and of the generated cluster resource scheduling information, reduce the cluster loss rate, and improve the stability of the data storage cluster.

Description

Data storage cluster resource scheduling method, device, electronic equipment and medium

Technical Field

The embodiments of the present disclosure relate to the field of computer technology, and in particular to a data storage cluster resource scheduling method and apparatus, an electronic device, and a medium.

Background

Currently, with the development of network technology, the daily access volume of data storage clusters keeps increasing. Large numbers of access requests may arrive at the same time, placing heavy pressure on a data storage cluster and threatening its stability. Resource scheduling for a data storage cluster is generally performed as follows: a time series prediction model predicts the data call volume of a target user from historical user data call information, and the resources of the data storage cluster are then scheduled according to the predicted user data call volume.

However, in practice, scheduling the resources of a data storage cluster in this manner often suffers from the following technical problem: the time series prediction model attends only to the temporal information of the call volume, so it considers only a single kind of influencing factor. The predicted user data call information set is therefore less accurate, which impairs resource optimization of the data storage cluster, reduces cluster processing efficiency, and leads to a higher damage rate of the data storage cluster.

Solving the first technical problem with the above technical scheme is often accompanied by a second technical problem. The conventional approach to it is to determine the cluster resource scheduling information of the data storage cluster with a locust optimization algorithm (also known as the grasshopper optimization algorithm) so as to adjust the resources of the data storage cluster dynamically. However, this conventional approach still has the following problems: the initial population is obtained by random initialization and is therefore highly random; when the locust individuals are updated, the algorithm's coefficient decreases linearly with the number of iterations instead of adapting to the search; and as the number of iterations grows, optimization efficiency falls and the solution becomes trapped in a local optimum. As a result, resource scheduling of the data storage cluster is unbalanced, cluster performance is unstable, the data storage nodes suffer greater damage, and user experience degrades.
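To make the "linear decrease" concrete: in the standard grasshopper (locust) optimization algorithm, a coefficient c shrinks by the same amount at every iteration, from c_max down to c_min, regardless of how the search is going. The sketch below shows that schedule; c_max = 1 and c_min = 0.00004 are values often used in reference implementations and are assumptions here, since the text does not give them.

```python
def goa_coefficient(iteration: int, max_iterations: int,
                    c_max: float = 1.0, c_min: float = 0.00004) -> float:
    """Coefficient c of the standard grasshopper (locust) optimization algorithm.

    c shrinks linearly from c_max to c_min. It depends only on the iteration
    counter, not on the quality of the current population (c_max/c_min are assumed).
    """
    return c_max - iteration * (c_max - c_min) / max_iterations


# The decrement per step is constant, so the schedule cannot adapt to the search.
schedule = [goa_coefficient(l, 100) for l in range(0, 101, 25)]
```

Because every step of `schedule` has the same size, the exploration/exploitation balance is fixed in advance, which is the non-adaptive behaviour the background criticizes.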

The above information disclosed in this background section is only for enhancement of understanding of the background of the disclosed concept and, therefore, it may contain information that does not form the prior art that is known to those of ordinary skill in the art in this country.

Disclosure of Invention

This section of the disclosure is provided to introduce concepts in a simplified form that are further described in the detailed description below. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Some embodiments of the present disclosure propose a data storage cluster resource scheduling method, apparatus, electronic device, and medium to solve one or more of the technical problems mentioned in the background section above.

In a first aspect, some embodiments of the present disclosure provide a data storage cluster resource scheduling method, including: acquiring a user data call information set of a data call user and a cluster load information set of a data storage cluster; preprocessing the user data call information set to obtain a preprocessed user data call information set; generating a user call knowledge graph of the data call user according to the preprocessed user data call information set; generating a user data call prediction information set of the data call user according to the preprocessed user data call information set and the user call knowledge graph; performing user value identification on the preprocessed user data call information set to obtain user value information; inputting the cluster load information set into a cluster load prediction model to obtain a cluster residual resource prediction information set; determining, according to the user value information and the cluster load information set, the call priority of each piece of user data call prediction information in the user data call prediction information set to obtain a call priority set; and generating cluster resource scheduling information of the data storage cluster according to the call priority set and the cluster residual resource prediction information set, and performing resource scheduling on the data storage cluster according to the cluster resource scheduling information.
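The last step combines the call priority set with the predicted residual resources. The text does not fix a scheduling rule at this point, so the sketch below assumes a simple greedy rule purely for illustration: predicted calls are admitted in descending priority while a single scalar predicted capacity lasts, and the remainder are deferred. `PredictedCall`, `schedule_calls`, and the one-dimensional capacity model are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PredictedCall:
    call_id: str
    priority: float  # larger means more important (hypothetical convention)
    demand: float    # predicted resource demand, same unit as the capacity


def schedule_calls(calls, predicted_remaining):
    """Greedily admit predicted calls by descending priority while capacity lasts."""
    admitted, deferred = [], []
    remaining = predicted_remaining
    for call in sorted(calls, key=lambda c: c.priority, reverse=True):
        if call.demand <= remaining:
            admitted.append(call.call_id)
            remaining -= call.demand
        else:
            deferred.append(call.call_id)  # processed later, when capacity frees up
    return admitted, deferred, remaining
```

A real scheduler would track several resources (CPU, memory, disk, bandwidth) per node; the scalar capacity only keeps the priority logic visible.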

In a second aspect, some embodiments of the present disclosure provide a data storage cluster resource scheduling apparatus, including: an acquisition unit configured to acquire a user data call information set of a data call user and a cluster load information set of a data storage cluster; a preprocessing unit configured to preprocess the user data call information set to obtain a preprocessed user data call information set; a first generation unit configured to generate a user call knowledge graph of the data call user according to the preprocessed user data call information set; a second generation unit configured to generate a user data call prediction information set of the data call user according to the preprocessed user data call information set and the user call knowledge graph; a user value identification unit configured to perform user value identification on the preprocessed user data call information set to obtain user value information; an input unit configured to input the cluster load information set into a cluster load prediction model to obtain a cluster residual resource prediction information set; a determining unit configured to determine, according to the user value information and the cluster load information set, the call priority of each piece of user data call prediction information in the user data call prediction information set to obtain a call priority set; and a resource adjustment unit configured to generate cluster resource scheduling information of the data storage cluster according to the call priority set and the cluster residual resource prediction information set, and to perform resource scheduling on the data storage cluster according to the cluster resource scheduling information.

In a third aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.

In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a method as described in any of the implementations of the first aspect.

The above embodiments of the present disclosure have the following beneficial effects: the data storage cluster resource scheduling method of some embodiments can improve the accuracy of user data call prediction and of the generated cluster resource scheduling information, reduce the cluster loss rate, and improve the stability of the data storage cluster. Specifically, cluster processing efficiency drops and the failure rate of the data storage cluster is high for the following reason: a time series prediction model is used to predict the user data call volume, and because it attends only to the temporal information of the call volume, it considers only a single kind of influencing factor. The predicted user data call information set is therefore less accurate, which impairs resource optimization of the data storage cluster, reduces cluster processing efficiency, and raises the damage rate of the data storage cluster. Based on this, the data storage cluster resource scheduling method of some embodiments of the present disclosure may first acquire a user data call information set of a data call user and a cluster load information set of a data storage cluster; these two sets support the subsequent prediction and resource scheduling of the data storage cluster. Second, the user data call information set is preprocessed to obtain a preprocessed user data call information set, which improves the quality of the call data and reduces the amount of data to be processed later. Third, a user call knowledge graph of the data call user is generated according to the preprocessed user data call information set.
Constructing the user call knowledge graph brings in the influence of associated data call users on the data call user, which improves the accuracy of the subsequent prediction of that user's call information. Next, a user data call prediction information set of the data call user is generated according to the preprocessed user data call information set and the user call knowledge graph. Combining the spatial (association) factors from the user call knowledge graph with the call history in the preprocessed user data call information set improves the accuracy of the user data call prediction information set. Next, user value identification is performed on the preprocessed user data call information set to obtain user value information, which characterizes the value of the data call user and supports the later prioritization of that user's predicted calls. Next, the cluster load information set is input into a cluster load prediction model to obtain a cluster residual resource prediction information set. Predicting the remaining cluster resources supports the later decisions on how the data storage cluster processes the user data call prediction information set and how cluster resources are scheduled. Next, the call priority of each piece of user data call prediction information in the user data call prediction information set is determined according to the user value information and the cluster load information set, yielding a call priority set.
The call priorities allow the user data call prediction information set to be processed in priority order when the load of the data storage cluster is high or the residual cluster resources cannot accommodate all of the predicted calls, which improves user experience. Finally, cluster resource scheduling information of the data storage cluster is generated according to the call priority set and the cluster residual resource prediction information set, and resource scheduling is performed on the data storage cluster according to the cluster resource scheduling information. Scheduling with the call priority set and the more accurate cluster residual resource prediction information set improves the accuracy of the cluster resource scheduling information and the quality of the resulting resource allocation, reduces the loss rate of the data storage cluster, improves its stability, and speeds up processing of the user data call prediction information set. Therefore, the data storage cluster resource scheduling method can improve the accuracy of user data call prediction and of the generated cluster resource scheduling information, reduce the cluster loss rate, and improve the stability of the data storage cluster.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.

FIG. 1 is a flow chart of some embodiments of a data storage cluster resource scheduling method according to the present disclosure;

FIG. 2 is a schematic diagram of the architecture of some embodiments of a data storage cluster resource scheduling apparatus according to the present disclosure;

FIG. 3 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.

It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. The embodiments of the present disclosure and the features in the embodiments may be combined with each other provided they do not conflict.

It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.

It should be noted that the modifiers "a" and "a plurality of" used in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art should understand that they are to be read as "one or more" unless the context clearly indicates otherwise.

The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

FIG. 1 illustrates a flow 100 of some embodiments of a data storage cluster resource scheduling method according to the present disclosure. The data storage cluster resource scheduling method comprises the following steps:

Step 101, acquiring a user data call information set of a data call user and a cluster load information set of a data storage cluster.

In some embodiments, the execution body of the data storage cluster resource scheduling method (e.g., an electronic device) may acquire the user data call information set of the data call user and the cluster load information set of the data storage cluster through a wired or wireless connection. The data call user may be a user who calls data stored in the data storage cluster. Each piece of user data call information in the user data call information set may be request information that records a data call made by the data call user. The data storage cluster may be a distributed storage system in which users store data resources. Each piece of cluster load information in the cluster load information set may be information, obtained through a query statement, that characterizes the operating state of a data storage node included in the data storage cluster. The cluster load information may include, but is not limited to, at least one of: CPU (Central Processing Unit) utilization, memory utilization, disk utilization, and network bandwidth utilization of the data storage node.

Step 102, preprocessing the user data call information set to obtain the preprocessed user data call information set.

In some embodiments, the execution body may preprocess the user data call information set to obtain a preprocessed user data call information set. Each piece of preprocessed user data call information may be call information whose missing values have been filled and whose noise has been removed by filtering. In practice, in response to determining that the user data call information set contains missing values, the execution body may first fill them by cubic spline interpolation to obtain an interpolated user data call information set, and then apply sliding-window filtering to the interpolated set to obtain the preprocessed user data call information set.
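The two preprocessing stages can be sketched on a single numeric call series (for example, hypothetical call counts per hour). This is a minimal illustration under stated assumptions: a natural cubic spline (second derivative zero at both ends) is used for the interpolation, a centered moving average with edge padding stands in for the sliding filter, and the series must contain at least two observed values. The function names are hypothetical.

```python
import numpy as np


def natural_cubic_spline_fill(y):
    """Fill NaNs in an evenly sampled series with a natural cubic spline (assumed variant)."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    known = ~np.isnan(y)
    x, v = t[known], y[known]
    n = len(x)
    out = y.copy()
    if n < 3:  # too few points for a spline: fall back to linear interpolation
        out[~known] = np.interp(t[~known], x, v)
        return out
    h = np.diff(x)
    # Solve for second derivatives M with natural boundary M_0 = M_{n-1} = 0.
    A, rhs = np.zeros((n, n)), np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1], A[i, i], A[i, i + 1] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6.0 * ((v[i + 1] - v[i]) / h[i] - (v[i] - v[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)
    for j in np.where(~known)[0]:
        tj = t[j]
        i = int(np.clip(np.searchsorted(x, tj) - 1, 0, n - 2))  # spline segment index
        a, b, hk = x[i], x[i + 1], h[i]
        out[j] = (M[i] * (b - tj) ** 3 / (6 * hk) + M[i + 1] * (tj - a) ** 3 / (6 * hk)
                  + (v[i] / hk - M[i] * hk / 6) * (b - tj)
                  + (v[i + 1] / hk - M[i + 1] * hk / 6) * (tj - a))
    return out


def moving_average(y, window=3):
    """Centered moving average; edges are padded by repeating the end values."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    padded = np.pad(np.asarray(y, dtype=float), window // 2, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")


def preprocess_call_series(y, window=3):
    return moving_average(natural_cubic_spline_fill(y), window)


hourly_calls = [120.0, float("nan"), 135.0, 150.0, 149.0, float("nan"), 160.0]
cleaned = preprocess_call_series(hourly_calls, window=3)
```

The window size trades noise suppression against responsiveness to genuine load changes; the text does not specify it.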

Step 103, generating a user call knowledge graph of the data call user according to the preprocessed user data call information set.

In some embodiments, the execution body may generate a user call knowledge graph of the data call user according to the preprocessed user data call information set. The user call knowledge graph may be a graph relation network that represents the association topology between the data call user and a set of other data call users, together with attribute information of the data call user's user data call information, such as call type, access address, and called resource.

As an example, the execution body may first input the preprocessed user data call information set into an entity-relation joint extraction model to obtain a user entity information set, a data call entity information set, and an entity relation set. The entity-relation joint extraction model may be a deep neural network model that takes the preprocessed user data call information set as input and outputs the user entity information set, the data call entity information set, and the entity relation set; for example, it may be a convolutional neural network model. Then, the execution body may perform association matching on the user entity information set, the data call entity information set, and the entity relation set to obtain a user call triplet set. Finally, the user call triplet set may be loaded into a graph database to obtain the user call knowledge graph.

In some optional implementations of some embodiments, the generating the user call knowledge graph of the data call user according to the preprocessed user data call information set may include the following steps:

First, word segmentation is performed on the associated user information set included in the preprocessed user data call information set to obtain a user data call word segmentation set. The associated user information set may be an information set formed by the data call user and the users in the set of data call users associated with it. That associated data call user set may be a set of users who call the same type of data resource. Each user data call word segmentation in the set may be a word about a data call user, obtained by segmenting the associated user information set with the jieba word segmenter.

Second, word vector representation is performed on the user data call word segmentation set to obtain a user data call word segmentation vector set. Each user data call word segmentation vector may be a vector representation of the semantic, grammatical, and positional information of the corresponding user data call word segmentation. In practice, the execution body may use a GloVe (Global Vectors for Word Representation) model to perform word vector representation on the user data call word segmentation set to obtain the user data call word segmentation vector set.

Third, the part-of-speech information of each user data call word segmentation in the user data call word segmentation set is determined to obtain a call word segmentation part-of-speech information set.

Fourth, the user data call word segmentation vector set is embedded according to the call word segmentation part-of-speech information set to obtain a call word segmentation feature vector set. Each call word segmentation feature vector may represent the semantic features and part-of-speech features of the corresponding user data call word segmentation vector.

As an example, the execution body may first encode the call word segmentation part-of-speech information set as vectors to obtain a part-of-speech vector set, and then concatenate the part-of-speech vector set with the user data call word segmentation vector set to obtain the call word segmentation feature vector set.
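The encode-then-concatenate step can be sketched as follows, assuming one-hot part-of-speech vectors (the text does not say which encoding is used) and pre-computed word vectors such as GloVe embeddings. The function name and the tag vocabulary are hypothetical.

```python
import numpy as np


def part_of_speech_features(word_vectors, pos_tags, tag_vocab):
    """Concatenate each word vector with a one-hot encoding of its part-of-speech tag."""
    index = {tag: i for i, tag in enumerate(tag_vocab)}
    one_hot = np.zeros((len(pos_tags), len(tag_vocab)))
    for row, tag in enumerate(pos_tags):
        one_hot[row, index[tag]] = 1.0
    # Result has shape (num_words, word_dim + num_tags).
    return np.concatenate([np.asarray(word_vectors, dtype=float), one_hot], axis=1)


# Three words with 4-dimensional vectors; tags "n" (noun), "v" (verb), "a" (adjective).
features = part_of_speech_features(np.zeros((3, 4)), ["n", "v", "n"], ["n", "v", "a"])
```

A learned part-of-speech embedding table would be a common alternative to one-hot vectors; concatenation works the same way in either case.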

Fifth, a user data call triplet set is generated according to the call word segmentation feature vector set and the preprocessed user data call information set. Each user data call triplet in the user data call triplet set may represent an association topology relation between data call users.

As an example, the execution body may use a relation extraction model to generate a user relation feature vector set from the call word segmentation feature vector set and the preprocessed user data call information set. The relation extraction model may be a recurrent neural network that takes the call word segmentation feature vector set and the preprocessed user data call information set as input and outputs the association relation between any two call word segmentation feature vectors in the set. Then, association mapping is performed on the call word segmentation feature vector set and the user relation feature vector set to generate the user data call triplet set.

Sixth, the user data call triplet set is loaded into a graph database to obtain an initial call knowledge graph. The graph database may be a database that queries data in graph form and represents and stores data as nodes, edges, and attributes; for example, a Neo4j database. The initial call knowledge graph may characterize the association network formed by the association topology between the data call user and the other data call users.

Seventh, knowledge reasoning is performed on the initial call knowledge graph to obtain the user call knowledge graph. Knowledge reasoning here means inferring additional relations between the nodes of the initial call knowledge graph on the basis of the existing graph, and may be path-based reasoning. In practice, the execution body may use a path ranking algorithm to perform knowledge reasoning on the initial call knowledge graph to obtain the user call knowledge graph.
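Path-based reasoning of this kind can be sketched with the core computation of the path ranking algorithm: the probability that a uniform random walk starting at a node and following a fixed sequence of relation types ends at another node. Full PRA uses these probabilities as features for a trained per-relation classifier; as a simplification, this sketch adds an inferred edge whenever the probability reaches a fixed threshold. The relation names and the threshold are hypothetical.

```python
from collections import defaultdict


def build_index(triples):
    """Map (head, relation) to the list of tail nodes."""
    idx = defaultdict(list)
    for h, r, t in triples:
        idx[(h, r)].append(t)
    return idx


def path_probability(idx, source, path):
    """Random-walk distribution over end nodes when following `path` from `source`."""
    dist = {source: 1.0}
    for rel in path:
        nxt = defaultdict(float)
        for node, p in dist.items():
            tails = idx.get((node, rel), [])
            for t in tails:
                nxt[t] += p / len(tails)  # uniform choice among outgoing edges
        dist = dict(nxt)
    return dist


def infer_edges(triples, path, new_relation, threshold=0.5):
    """Propose (head, new_relation, tail, score) edges supported by `path` (simplified PRA)."""
    idx = build_index(triples)
    existing = set(triples)
    inferred = []
    for h in sorted({h for h, _, _ in triples}):
        for t, p in path_probability(idx, h, path).items():
            if t != h and p >= threshold and (h, new_relation, t) not in existing:
                inferred.append((h, new_relation, t, p))
    return inferred
```

For example, two "calls_same_type" hops from user u1 through u2 to u3 would support a hypothetical "associated_with" edge between u1 and u3.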

In some optional implementations of some embodiments, the generating the user data call triplet set according to the call segmentation feature vector set and the preprocessed user data call information set may include the steps of:

First, semantic features are extracted from the call word segmentation feature vector set to obtain a call word segmentation semantic feature vector set. Each call word segmentation semantic feature vector may represent the sequence features of the corresponding user data call word segmentation vector. In practice, the execution body may input the call word segmentation feature vector set, which combines the user data call word segmentation vectors with their part-of-speech information, into a bidirectional long short-term memory (BiLSTM) network to obtain the call word segmentation semantic feature vector set.

Second, entity boundary detection is performed on each call word segmentation semantic feature vector in the call word segmentation semantic feature vector set to obtain a call word segmentation boundary feature vector set. Each call word segmentation boundary feature vector may represent the position of each individual Chinese character covered by the corresponding call word segmentation semantic feature vector. In practice, the execution body may use the BIMESO tagging scheme to perform entity boundary detection on each call word segmentation semantic feature vector in the set to obtain the call word segmentation boundary feature vector set.
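The text names the tagging scheme "BIMESO" without defining it. Assuming it is a BMESO-style character tagging scheme (B = begin, M = middle, E = end, S = single-character entity, O = outside any entity), the target labels that boundary detection learns to predict can be produced as below. The function name and span convention (half-open `[start, end)` character indices) are hypothetical.

```python
def bmeso_tags(length, entity_spans):
    """Character-level B/M/E/S/O tags for half-open entity spans [start, end)."""
    tags = ["O"] * length
    for start, end in entity_spans:
        if end - start == 1:
            tags[start] = "S"          # single-character entity
        else:
            tags[start] = "B"          # first character of the entity
            for i in range(start + 1, end - 1):
                tags[i] = "M"          # interior characters
            tags[end - 1] = "E"        # last character of the entity
    return tags


# A 5-character sentence with a 3-character entity and a 1-character entity.
example = bmeso_tags(5, [(0, 3), (4, 5)])
```

Each tag marks where a character sits inside an entity, which is the positional information the boundary feature vectors are described as carrying.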

Third, a linear transformation is applied to the call word segmentation boundary feature vector set to obtain a call word segmentation linear feature vector set. Each call word segmentation linear feature vector may represent the linear features and weight values of the corresponding call word segmentation semantic feature vector.

Fourth, syntactic analysis is performed on the preprocessed user data call information set to obtain a data call syntactic dependency graph. The data call syntactic dependency graph may be an association graph that represents, in graph form, the syntactic dependencies among the call word segmentations included in the preprocessed user data call information set. The syntactic dependencies may include, but are not limited to, at least one of: a subject-predicate relation, a verb-object relation, and an attributive modifier relation.

Fifth, the data call syntactic dependency graph is converted to obtain a data call syntactic dependency matrix, which represents the syntactic dependencies between call word segmentations in matrix form.
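One common way to perform this conversion, assumed here since the text does not specify it, is an adjacency matrix in which entry (i, j) is 1 when words i and j are linked by a dependency arc. The sketch treats arcs as undirected and adds self-loops, which graph attention layers typically need so that each word also attends to itself. Arc indices and the function name are hypothetical.

```python
import numpy as np


def dependency_matrix(num_tokens, arcs, self_loops=True):
    """Symmetric adjacency matrix from (head, dependent) index pairs of a dependency parse."""
    A = np.zeros((num_tokens, num_tokens))
    for head, dep in arcs:
        A[head, dep] = A[dep, head] = 1.0  # undirected: ignore arc direction
    if self_loops:
        np.fill_diagonal(A, 1.0)
    return A


# "user calls data": the verb (index 1) governs a subject (0) and an object (2).
A = dependency_matrix(3, [(1, 0), (1, 2)])
```

Keeping the arc direction (or one matrix per dependency type) is an equally valid design; the undirected form is simply the most compact.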

Sixth, graph attention weighting is applied to the call word segmentation linear feature vector set and the data call syntactic dependency matrix to obtain a sentence sequence feature vector set and a word segmentation dependency feature vector set. Each sentence sequence feature vector may represent the sequence features of a call sentence in the preprocessed user data call information set. Each word segmentation dependency feature vector may represent the local dependency features between call word segmentations. In practice, the execution body may input the call word segmentation linear feature vector set and the data call syntactic dependency matrix into a graph attention network to obtain the sentence sequence feature vector set and the word segmentation dependency feature vector set.
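A single-head graph attention layer, in the standard GAT formulation, shows how the dependency matrix restricts which words attend to which: scores e_ij = LeakyReLU(aᵀ[Wh_i ‖ Wh_j]) are computed for every pair, pairs without an arc are masked out, and a row-wise softmax gives the attention weights. This is a minimal forward-pass sketch with fixed (not trained) parameters; the names are hypothetical, and the matrix `A` is assumed to contain self-loops so every row has at least one neighbour.

```python
import numpy as np


def gat_layer(H, A, W, a, negative_slope=0.2):
    """One single-head graph attention layer (forward pass only).

    H: (n, f) node features; A: (n, n) adjacency with self-loops;
    W: (f, f') projection; a: (2 * f',) attention vector.
    """
    Z = H @ W                                   # projected features z_i = W h_i
    k = Z.shape[1]
    e = (Z @ a[:k])[:, None] + (Z @ a[k:])[None, :]   # e_ij = a^T [z_i || z_j]
    e = np.where(e > 0, e, negative_slope * e)  # LeakyReLU
    e = np.where(A > 0, e, -np.inf)             # attend only along dependency arcs
    e = e - e.max(axis=1, keepdims=True)        # numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # softmax over neighbours
    return alpha @ Z, alpha
```

Multi-head variants run several such layers in parallel and concatenate or average their outputs.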

Seventh, word segmentation relation prediction is performed on the sentence sequence feature vector set and the word segmentation dependency feature vector set to obtain the user data call triplet set. The relation prediction may be performed by inputting the sentence sequence feature vector set and the word segmentation dependency feature vector set into a sigmoid layer.
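A sigmoid output layer for relation prediction can be sketched as independent binary scores for every ordered word pair and every relation type, so one pair may carry several relations. The sketch assumes each word is represented by a single feature vector (for example, its sentence sequence features concatenated with its dependency features) and uses fixed weights instead of trained ones; all names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def predict_triples(token_features, relation_weights, relation_bias, relations, threshold=0.5):
    """Return (head_index, relation, tail_index) for every pair scored above `threshold`.

    token_features: (n, d); relation_weights: (R, 2 * d); relation_bias: (R,).
    """
    n = token_features.shape[0]
    triples = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            pair = np.concatenate([token_features[i], token_features[j]])
            probs = sigmoid(relation_weights @ pair + relation_bias)  # one probability per relation
            for rel, p in zip(relations, probs):
                if p >= threshold:
                    triples.append((i, rel, j))
    return triples
```

Using a sigmoid rather than a softmax lets overlapping relations coexist, at the cost of choosing a threshold.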

Step 104, generating a user data call prediction information set of the data call user according to the preprocessed user data call information set and the user call knowledge graph.

In some embodiments, the execution body may generate the user data call prediction information set of the data call user according to the preprocessed user data call information set and the user call knowledge graph. Each piece of user data call prediction information may be a prediction of information such as the call type, called resource, and access address of the data call user's calls in a future period, for example, the next day.

As an example, the execution body may first input the preprocessed user data call information set into a long short-term memory (LSTM) network to obtain a user data call temporal feature vector set. Then, it may input the user call knowledge graph into a graph convolutional network to obtain a user spatial feature vector set. Finally, it may input the user data call temporal feature vector set and the user spatial feature vector set into a fully connected layer to obtain the user data call prediction information set.

In some optional implementations of some embodiments, the generating the user data call prediction information set of the data call user according to the preprocessed user data call information set and the user call knowledge graph may include the following steps:

First, extract user topological spatial features from the user call knowledge graph to obtain a user topology feature vector set. Each user topology feature vector may represent association information between data call users. In practice, the execution body may input the user call knowledge graph into a graph convolutional neural network to obtain the user topology feature vector set.

Second, extract time sequence features from the historical data call information set included in the preprocessed user data call information set to obtain a call time sequence feature vector set. Each call time sequence feature vector may represent the time sequence feature information of the historical data call information. Each piece of historical data call information may be access call information generated when the data call user accessed and called a data resource in the data storage cluster before the current time. In practice, the execution body may first use Time2Vec to represent the preprocessed user data call information set as time vectors, obtaining a user call time vector set, and then input the user call time vector set into the attention head that characterizes time in a multi-head attention mechanism to obtain the call time sequence feature vector set.
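Time2Vec maps a scalar time value to a vector whose first element is linear in time and whose remaining elements are periodic (sine) functions of time. The sketch below shows that embedding together with one scaled dot-product self-attention head standing in for the time-characterizing head. The frequencies and phases are fixed here for illustration (in practice they are learned), and the identity query/key/value projections are a simplifying assumption.

```python
import math
from typing import List

def time2vec(tau: float, omegas: List[float], phis: List[float]) -> List[float]:
    """Time2Vec: element 0 is omega_0 * tau + phi_0; the others are sin(omega_i * tau + phi_i)."""
    if not omegas or len(omegas) != len(phis):
        raise ValueError("omegas and phis must be non-empty and of equal length")
    linear = omegas[0] * tau + phis[0]
    periodic = [math.sin(w * tau + p) for w, p in zip(omegas[1:], phis[1:])]
    return [linear] + periodic

def self_attention(xs: List[List[float]]) -> List[List[float]]:
    """One scaled dot-product attention head with identity Q/K/V projections."""
    d = len(xs[0])
    out = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]      # stable softmax numerator
        z = sum(weights)
        out.append([sum(w / z * v[c] for w, v in zip(weights, xs)) for c in range(d)])
    return out

# Hypothetical call timestamps in hours; omega_1 = 2*pi/24 encodes a daily cycle.
omegas, phis = [0.1, 2 * math.pi / 24], [0.0, 0.0]
embedded = [time2vec(h, omegas, phis) for h in (1.0, 6.0, 13.0)]
timing_features = self_attention(embedded)
```

The linear element lets the model capture trends over time, while the periodic elements let calls made at the same time of day land close together in the embedding.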

Third, extract frequency features from the historical data call information set to obtain a call frequency feature vector set. Each call frequency feature vector may represent the frequency feature information of the historical data call information set. In practice, the execution body may input the historical data call information set into the attention head that characterizes frequency in the multi-head attention mechanism to obtain the call frequency feature vector set.

Fourth, extract data call features from the historical data call information set to obtain a data call feature vector set. Each data call feature vector may represent feature information such as the called resources and call types of the historical data call information set. In practice, the execution body may input the historical data call information set into the attention head that characterizes data call features in the multi-head attention mechanism to obtain the data call feature vector set.

Fifth, fuse the call time sequence feature vector set, the call frequency feature vector set, and the data call feature vector set to obtain a call fusion feature vector set. The feature fusion may be channel-wise fusion of these three feature vector sets.

Sixth, perform regression prediction on the call fusion feature vector set and the user topology feature vector set to obtain the user data call prediction information set. The regression prediction may be performed by inputting the call fusion feature vector set and the user topology feature vector set into a fully connected layer.
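The fusion and regression steps can be sketched as concatenation followed by one fully connected layer. Treating channel fusion as plain concatenation, and the weights and bias used in any example, are assumptions for illustration; a trained model would learn the weights.

```python
from typing import List

Vector = List[float]

def fuse_channels(timing: Vector, frequency: Vector, data_call: Vector) -> Vector:
    """Channel fusion, approximated here as concatenation of the three feature vectors."""
    return timing + frequency + data_call

def fully_connected(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """y = W x + b for a single dense layer."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def predict_call(fused: Vector, topology: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Regress prediction outputs from the fused call features and the user topology features."""
    return fully_connected(fused + topology, weights, bias)
```

Each output dimension of the dense layer would correspond to one predicted quantity (for example a score per call type), so the number of rows in `weights` sets the size of the prediction.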

Step 105, perform user value identification on the preprocessed user data call information set to obtain user value information.

In some embodiments, the execution body may perform user value identification on the preprocessed user data call information set to obtain user value information. The user value information may be information indicating the importance of the data call user. As an example, the execution body may use a CLV (Customer Lifetime Value) model to perform user value identification on the preprocessed user data call information set to obtain the user value information.
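The text does not specify which CLV model is used. One common simplified form discounts a constant per-period margin m by a retention rate r and a discount rate d over T periods: CLV = sum over t = 0 .. T-1 of m * r^t / (1 + d)^t. The sketch below computes that form and maps the result to the first-level (high-value) or second-level (low-value) user value information used in step 107; the threshold, and how the margin and retention would be derived from call records, are hypothetical.

```python
def customer_lifetime_value(margin: float, retention: float, discount: float,
                            periods: int) -> float:
    """Sum of margin * retention^t / (1 + discount)^t for t = 0 .. periods - 1."""
    return sum(margin * retention ** t / (1 + discount) ** t for t in range(periods))

def user_value_level(clv: float, high_value_threshold: float) -> int:
    """Map a CLV score to 1 (first level, high value) or 2 (second level, low value)."""
    return 1 if clv >= high_value_threshold else 2
```

For example, a margin of 100 per period with 50% retention and no discounting over three periods gives 100 + 50 + 25 = 175.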

Step 106, input the cluster load information set into a cluster load prediction model to obtain a cluster residual resource prediction information set.

In some embodiments, the execution body may input the cluster load information set into a cluster load prediction model to obtain a cluster residual resource prediction information set. Each piece of cluster residual resource prediction information may be resource information obtained as the difference between the available resource information set of the data storage cluster and the cluster load information set. The cluster residual resource prediction information set may include, but is not limited to, at least one of the following: CPU residual resource information, network bandwidth residual resource information, memory residual resource information, and disk residual resource information. The cluster load prediction model may be a deep neural network model that takes the cluster load information set as input and outputs the remaining resource information of the data storage cluster in a future period. The cluster load prediction model may be composed of an ARMA (autoregressive moving average) model and a BP (back-propagation) neural network.

As an example, the execution body may first decompose the cluster load information set into a linear load information set and a nonlinear load information set. Then, it may input the linear load information set and the nonlinear load information set into the ARMA model to obtain a linear time feature vector set and a nonlinear time feature vector set. Finally, it may input the linear time feature vector set and the nonlinear time feature vector set into the BP neural network to obtain the cluster residual resource prediction information set.
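A widely used ARMA-BP hybrid models the linear part of a series with an autoregressive model and then trains a BP network on the residuals, adding both forecasts. The sketch below follows that general scheme rather than the exact pipeline above: it uses an AR(1) model fitted by least squares in place of a full ARMA model, and a one-hidden-layer tanh network trained by plain back-propagation. The lag count, network size, learning rate, and capacity value are assumptions, and the remaining-resource estimate is simply capacity minus forecast load.

```python
import math
import random
from typing import List, Tuple

def fit_ar1(y: List[float]) -> Tuple[float, float]:
    """Least-squares fit of the linear model y_t = c + phi * y_{t-1}."""
    x, target = y[:-1], y[1:]
    mx, mt = sum(x) / len(x), sum(target) / len(target)
    var = sum((a - mx) ** 2 for a in x)
    cov = sum((a - mx) * (b - mt) for a, b in zip(x, target))
    phi = cov / var
    return mt - phi * mx, phi

class TinyBP:
    """One-hidden-layer tanh network trained by back-propagation (squared error)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: List[float]) -> Tuple[List[float], float]:
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * v for w, v in zip(self.w2, h)) + self.b2

    def train(self, xs, ys, lr: float = 0.01, epochs: int = 200) -> None:
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                h, out = self.forward(x)
                err = out - y                                  # dLoss/dout for 0.5 * err^2
                for k, hk in enumerate(h):
                    grad_h = err * self.w2[k] * (1 - hk * hk)  # back-propagate through tanh
                    self.w2[k] -= lr * err * hk
                    for m, v in enumerate(x):
                        self.w1[k][m] -= lr * grad_h * v
                    self.b1[k] -= lr * grad_h
                self.b2 -= lr * err

def forecast_load(y: List[float], lags: int = 2) -> float:
    """One-step forecast = AR(1) linear forecast + BP forecast of the residual."""
    c, phi = fit_ar1(y)
    resid = [b - (c + phi * a) for a, b in zip(y[:-1], y[1:])]   # nonlinear remainder
    xs = [resid[i:i + lags] for i in range(len(resid) - lags)]
    ys = [resid[i + lags] for i in range(len(resid) - lags)]
    net = TinyBP(lags, 4)
    net.train(xs, ys)
    _, resid_next = net.forward(resid[-lags:])
    return c + phi * y[-1] + resid_next

def remaining_resource(capacity: float, load_history: List[float]) -> float:
    """Predicted remaining resource = capacity - forecast load."""
    return capacity - forecast_load(load_history)
```

The split lets the cheap linear model absorb the trend and autocorrelation, so the small network only has to learn the nonlinear remainder.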

Step 107, determine the call priority of each piece of user data call prediction information in the user data call prediction information set according to the user value information and the cluster load information set, obtaining a call priority set.

In some embodiments, the execution body may determine the call priority of each piece of user data call prediction information in the user data call prediction information set according to the user value information and the cluster load information set, obtaining a call priority set. The call priority may characterize the order in which data storage resources are called when the user data call prediction information is received.

As an example, the execution body may input the user value information and the cluster load information set into a priority evaluation model to obtain the call priority of each piece of user data call prediction information in the user data call prediction information set, yielding the call priority set. The priority evaluation model may be a model that takes the user value information and the cluster load information set as input and outputs the priority of the user data call prediction information. For example, the priority evaluation model may be a convolutional neural network.

In some optional implementations of some embodiments, determining the call priority of each piece of user data call prediction information in the user data call prediction information set according to the user value information and the cluster load information set to obtain a call priority set may include the following steps:

First, determine a cluster residual resource information set of the data storage cluster according to the cluster load information set. Each piece of cluster residual resource information may be resource information corresponding to the difference between the storage resources of the data storage cluster and the cluster load information set.

As an example, the execution body may subtract the cluster load information set from the cluster resources of the data storage cluster to obtain the cluster residual resource information set.

Second, determine the number of resource peak call tokens held in a preset resource peak call container as the number of resource peak tokens. The preset resource peak call container may be a container that generates peak call tokens at a first preset rate. It may be used to handle large bursts of user data call information received instantaneously by the data storage cluster. Its capacity may represent the maximum amount of data call information that the data storage cluster can process. A resource peak call token may characterize the maximum amount of data call information that the data storage cluster can handle at an instant. The first preset rate may be a preset generation rate of resource peak call tokens, and may indicate the maximum rate at which the data storage cluster processes user data call information.

Third, determine the number of resource average call tokens held in a preset resource average call container as the number of resource average tokens. The preset resource average call container may be a container that generates average call tokens at a second preset rate. It may also be used to handle large bursts of user data call information received instantaneously by the data storage cluster. Its capacity may represent the minimum amount of data call information that the data storage cluster can process. A resource average call token may characterize the minimum amount of data call information that the data storage cluster can handle at an instant. The second preset rate may be a preset generation rate of resource average call tokens, and may indicate the minimum rate at which the data storage cluster processes user data call information.
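Both containers behave like token buckets: tokens accumulate at a fixed rate up to the container's capacity. The priority step above only reads the current token count; the `try_consume` method is an illustrative addition showing how a bucket would normally admit a request. A minimal sketch, assuming time is measured in seconds and that the capacities and rates below are placeholders for the first and second preset rates:

```python
from dataclasses import dataclass

@dataclass
class TokenBucket:
    """Container that gains `rate` tokens per second, capped at `capacity`."""
    capacity: float
    rate: float
    tokens: float = 0.0

    def refill(self, elapsed_s: float) -> None:
        """Add tokens for the elapsed time without exceeding the capacity."""
        self.tokens = min(self.capacity, self.tokens + self.rate * elapsed_s)

    def try_consume(self, n: float) -> bool:
        """Take n tokens if available; return whether the request fits."""
        if n <= self.tokens:
            self.tokens -= n
            return True
        return False

# Placeholder values: the peak container models the maximum processing rate,
# the average container the minimum (sustained) rate.
peak_bucket = TokenBucket(capacity=10_000, rate=2_000)
average_bucket = TokenBucket(capacity=2_000, rate=500)
```

Capping at the capacity is what bounds a burst: however long the cluster is idle, it never accumulates more than one container's worth of tokens.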

Fourth, for each piece of user data call prediction information in the user data call prediction information set, perform the following priority determining step:

Sub-step 1: in response to determining that the cluster residual resource information set satisfies the load of the user data call prediction information set and that the user value information is first-level user value information, compare the number of data call bytes corresponding to the user data call prediction information with the number of resource peak tokens to obtain a first peak comparison result. The first-level user value information may be high-value user information. The number of data call bytes may be the number of bytes in the request packet obtained by encapsulating the user data call prediction information.

Sub-step 2: in response to determining that the first peak comparison result indicates that the number of data call bytes is less than or equal to the number of resource peak tokens, determine a first priority as the call priority. The first priority may be a high priority.

Sub-step 3: in response to determining that the first peak comparison result indicates that the number of data call bytes is greater than the number of resource peak tokens, compare the number of remaining call bytes with the number of resource average tokens to obtain a first average comparison result. The number of remaining call bytes may be the difference between the number of data call bytes and the number of resource peak tokens.

Sub-step 4: in response to determining that the first average comparison result indicates that the number of remaining call bytes is greater than or equal to the number of resource average tokens, determine the first priority as the call priority.

Sub-step 5: in response to determining that the first average comparison result indicates that the number of remaining call bytes is less than the number of resource average tokens, determine a second priority as the call priority. The second priority may be a medium priority.

In some optional implementations of some embodiments, after determining the second priority as the call priority in response to determining that the first average comparison result indicates that the number of remaining call bytes is less than the number of resource average tokens, the method may further include the following steps:

First, in response to determining that the cluster residual resource information set satisfies the load of the user data call prediction information set and that the user value information is second-level user value information, compare the number of data call bytes with the number of resource peak tokens to obtain a second peak comparison result. The second-level user value information may be low-value user information.

Second, in response to determining that the second peak comparison result indicates that the number of data call bytes is greater than or equal to the number of resource peak tokens, compare the number of remaining call bytes with the number of resource average tokens to obtain a second average comparison result.

Third, in response to determining that the second average comparison result indicates that the number of remaining call bytes is greater than or equal to the number of resource average tokens, determine the second priority as the call priority.

Fourth, in response to determining that the second average comparison result indicates that the number of remaining call bytes is less than the number of resource average tokens, determine a third priority as the call priority. The third priority may be a low priority.
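The priority determining step for both user value levels can be summarised as one function. This is a sketch of the comparisons described above, with priorities encoded as 1 (high), 2 (medium), and 3 (low). The text does not state what happens when cluster resources are insufficient or when a second-level user's call bytes are below the peak token count, so the sketch returns None in those cases.

```python
from typing import Optional

HIGH, MEDIUM, LOW = 1, 2, 3

def call_priority(call_bytes: int, peak_tokens: int, avg_tokens: int,
                  value_level: int, resources_sufficient: bool) -> Optional[int]:
    """Priority of one predicted call; None where the text leaves the case open."""
    if not resources_sufficient:
        return None
    if value_level == 1:                       # first-level (high-value) user
        if call_bytes <= peak_tokens:
            return HIGH
        remaining = call_bytes - peak_tokens   # bytes beyond the peak token count
        return HIGH if remaining >= avg_tokens else MEDIUM
    if value_level == 2:                       # second-level (low-value) user
        if call_bytes < peak_tokens:
            return None
        remaining = call_bytes - peak_tokens
        return MEDIUM if remaining >= avg_tokens else LOW
    return None
```

For example, with 1000 peak tokens and 200 average tokens, a 1100-byte call from a high-value user leaves 100 remaining bytes, fewer than the average tokens, so it receives medium priority.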

Step 108, generate cluster resource scheduling information of the data storage cluster according to the call priority set and the cluster residual resource prediction information set, and perform resource scheduling on the data storage cluster according to the cluster resource scheduling information.

In some embodiments, the execution body may generate cluster resource scheduling information of the data storage cluster according to the call priority set and the cluster residual resource prediction information set, and perform resource scheduling on the data storage cluster according to the cluster resource scheduling information. The cluster resource scheduling information may be information used to schedule the resources of the data storage cluster. Resource scheduling of the data storage cluster may increase or decrease the resources of each data storage node according to the scheduling information for that node included in the cluster resource scheduling information.

As an example, the execution body may use a round-robin scheduling algorithm to generate the cluster resource scheduling information of the data storage cluster according to the call priority set and the cluster residual resource prediction information set, and perform resource scheduling on the data storage cluster according to the cluster resource scheduling information.
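A round-robin scheduler hands work to nodes in a fixed cyclic order. The sketch below orders the predicted calls by call priority (1 = high) and then assigns them to data storage nodes cyclically. The request and node identifiers are hypothetical, and node capacities from the cluster residual resource prediction information set are ignored for brevity.

```python
from itertools import cycle
from typing import Dict, List, Tuple

def round_robin_schedule(requests: List[Tuple[str, int]],
                         nodes: List[str]) -> Dict[str, List[str]]:
    """Assign (request_id, priority) pairs to nodes cyclically, highest priority (1) first."""
    plan: Dict[str, List[str]] = {node: [] for node in nodes}
    ordered = sorted(requests, key=lambda r: r[1])   # stable sort keeps input order on ties
    for (request_id, _), node in zip(ordered, cycle(nodes)):
        plan[node].append(request_id)
    return plan
```

Sorting first means high-priority calls are placed in the earliest round, so each node sees its high-priority work before any lower-priority work.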

The conventional solution uses the locust (grasshopper) optimization algorithm to determine the cluster resource scheduling information of the data storage cluster and dynamically adjust its resources, which leads to the second technical problem. The initial population is obtained by random initialization and is therefore highly random. Moreover, when locust individuals are updated, the algorithm's coefficient decreases linearly as the number of executions increases, so the update cannot adapt. As a result, the optimization efficiency of the algorithm declines as the number of executions grows and the search easily falls into a local optimal solution, which makes resource scheduling of the data storage cluster unbalanced, makes the cluster's performance unstable, increases the damage to data storage nodes, and degrades the user experience. In view of the state of the art, the following solution may be adopted.

In some optional implementations of some embodiments, generating cluster resource scheduling information of the data storage cluster according to the call priority set and the cluster residual resource prediction information set, and performing resource scheduling on the data storage cluster according to the cluster resource scheduling information may include the following steps:

First, generate an initial population for the data storage node set included in the data storage cluster according to the call priority set and the cluster residual resource prediction information set. The initial population may be a set of feasible resource scheduling information corresponding to the remaining CPU, memory, disk, and network bandwidth resources of each data storage node in the data storage node set. The resource scheduling information may describe increases or decreases of the remaining CPU, memory, disk, and network bandwidth resources among the data storage nodes.

As an example, the execution body may randomly generate the initial population for the data storage node set included in the data storage cluster according to the call priority set and the remaining resource range of each piece of cluster residual resource prediction information in the cluster residual resource prediction information set. The remaining resource range may be the range between the maximum and minimum remaining resources of a data storage node.

Second, generate a reverse population of the initial population. The reverse population may be formed by the reverse (opposite) solutions of the initial individuals in the initial population. In practice, for each initial individual, the execution body may first determine the remaining resource range corresponding to that individual, and then take the sum of the maximum and minimum values of that range minus the individual's remaining resource value as the corresponding reverse individual.
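The reverse individual is the standard opposition-based learning point: for a value x in the range [lb, ub], its opposite is lb + ub - x. A minimal sketch, assuming each individual is a flat list of remaining-resource values (for example CPU, memory, disk, and bandwidth per node) with per-dimension bounds; the function names are hypothetical.

```python
import random
from typing import List

Individual = List[float]

def random_population(n: int, lb: Individual, ub: Individual, seed: int = 0) -> List[Individual]:
    """Uniformly sample n individuals inside the per-dimension remaining-resource ranges."""
    rng = random.Random(seed)
    return [[rng.uniform(l, u) for l, u in zip(lb, ub)] for _ in range(n)]

def reverse_individual(x: Individual, lb: Individual, ub: Individual) -> Individual:
    """Opposite point: (max + min) - value in every dimension."""
    return [l + u - v for v, l, u in zip(x, lb, ub)]

def reverse_population(pop: List[Individual], lb: Individual, ub: Individual) -> List[Individual]:
    return [reverse_individual(x, lb, ub) for x in pop]
```

An individual near one end of a range has its opposite near the other end, so evaluating both covers the search space more evenly than the random sample alone.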

Third, set a population fitness function for the initial population and the reverse population. The population fitness function may characterize the ability of the data storage nodes to process the user data call prediction information; it may be a function whose objective is to process the user data call prediction information set while consuming as little of each data storage node's remaining resources as possible.

Fourth, input the initial population and the reverse population into the population fitness function to obtain an initial fitness value set and a reverse fitness value set. Each initial fitness value may characterize the remaining resources of the data storage cluster consumed by the randomly generated cluster resource scheduling information, and each reverse fitness value may characterize the remaining resources consumed by the reverse solution of that scheduling information.

Fifth, screen the initial population and the reverse population according to the initial fitness value set and the reverse fitness value set to obtain an initial fitness population. The initial fitness population may consist of the N initial and reverse individuals with the largest fitness values, where N is the number of initial individuals in the initial population.

As an example, the execution body may first sort the initial fitness value set and the reverse fitness value set together in descending order to obtain a fitness value sequence. Then, it may select the first N fitness values from the sequence, where N is the number of initial individuals in the initial population, to obtain a target fitness value sequence. Finally, it may determine the initial individuals and reverse individuals corresponding to the target fitness value sequence as the initial fitness population.
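The screening step keeps the best N of the 2N candidates. A sketch, assuming a caller-supplied fitness function and treating a larger fitness value as better, following the "largest fitness value" wording above; if the fitness were instead defined as resources consumed and minimised, the sort direction would flip.

```python
from typing import Callable, List

Individual = List[float]

def select_initial_fitness_population(initial: List[Individual], reverse: List[Individual],
                                      fitness: Callable[[Individual], float]) -> List[Individual]:
    """Keep the len(initial) individuals with the largest fitness from both populations."""
    pool = initial + reverse
    ranked = sorted(pool, key=fitness, reverse=True)   # descending fitness
    return ranked[: len(initial)]
```

The population size stays constant, so later update steps see the same N individuals regardless of how many came from the reverse population.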

Sixth, based on the initial fitness population, perform the following scheduling information determining step:

Sub-step 1: determine the number of times the scheduling information determining step has been executed.

Sub-step 2: in response to determining that the number of executions is less than a first preset execution threshold, perform the following first individual updating step based on the initial fitness population:

First sub-step: select, from the initial fitness population, the initial fitness individual with the largest initial fitness value as the target initial fitness individual. The first preset execution threshold may be half of the preset maximum number of executions of the scheduling information determining step. The initial fitness value may be the value obtained by inputting an initial fitness individual into the population fitness function.

Second sub-step: randomly select an initial fitness individual from the initial fitness population as the random initial fitness individual.

Third sub-step: compare the individual initial fitness value with the random initial fitness value to obtain a fitness comparison result. The individual initial fitness value may be the initial fitness value of the target initial fitness individual, and the random initial fitness value may be the initial fitness value of the random initial fitness individual.

Fourth sub-step: for each initial fitness individual in the initial fitness population, update it according to the target initial fitness individual, the random initial fitness individual, and the fitness comparison result to obtain a first updated individual.

As an example, for each initial fitness individual in the initial fitness population, the execution body may update it according to the target initial fitness individual, the random initial fitness individual, and the fitness comparison result using a first update formula, to obtain a first updated individual. The first update formula may be:

Here, X₁ denotes the first updated individual given by the first update formula; α₁ denotes the communication coefficient between initial fitness individuals and is set to 1; C denotes the search step of an initial fitness individual; α₂ denotes the winning factor of the initial fitness individual with the larger fitness value in the fitness comparison result and is set to any value in [0.6, 0.9]; X_win denotes the individual with the larger initial fitness value in the fitness comparison result; X_i denotes an initial fitness individual; α₃ denotes the losing factor of the initial fitness individual with the smaller fitness value in the fitness comparison result and is set to any value in [0.1, 0.4]; X_loss denotes the individual with the smaller initial fitness value in the fitness comparison result; T_max denotes the maximum execution threshold of the scheduling information determining step; X_best denotes the target initial fitness individual; N denotes the number of initial fitness individuals in the initial fitness population; c denotes the update decreasing coefficient; ub and lb denote the maximum and minimum values of the remaining resources; s() denotes the social influence between initial fitness individuals; x_j and x_i denote the position information of the j-th and i-th initial fitness individuals; d_ij denotes the Euclidean distance between the i-th and j-th initial fitness individuals; t denotes the number of executions; and c_max and c_min denote the maximum and minimum values of the update decreasing coefficient.
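The first update formula itself is not reproduced in this text, so it cannot be shown exactly. For orientation only, the sketch below implements the standard grasshopper (locust) optimization update, in which each individual moves under the social force s(r) = f * exp(-r / l) - exp(-r) exerted by the others, scaled by c and by the half-range (ub - lb) / 2, and is then offset toward X_best. The cosine schedule for c, decreasing from c_max at t = 0 to c_min at t = T_max, is an assumed form of the nonlinear decreasing coefficient; the α₁, α₂, α₃, C, X_win, and X_loss terms of the formula above are not modelled.

```python
import math
from typing import List

Individual = List[float]

def social_force(r: float, f: float = 0.5, l: float = 1.5) -> float:
    """GOA social interaction s(r) = f * exp(-r / l) - exp(-r) (attraction > 0, repulsion < 0)."""
    return f * math.exp(-r / l) - math.exp(-r)

def cosine_coefficient(t: int, t_max: int, c_max: float = 1.0, c_min: float = 1e-5) -> float:
    """Assumed nonlinear decreasing coefficient: c_max at t = 0, c_min at t = t_max."""
    return c_min + (c_max - c_min) * (1 + math.cos(math.pi * t / t_max)) / 2

def goa_update(pop: List[Individual], i: int, best: Individual,
               lb: Individual, ub: Individual, t: int, t_max: int) -> Individual:
    """Standard GOA position update for individual i, clipped to [lb, ub]."""
    c = cosine_coefficient(t, t_max)
    xi = pop[i]
    social = [0.0] * len(xi)
    for j, xj in enumerate(pop):
        if j == i:
            continue
        d_ij = math.dist(xi, xj)              # Euclidean distance between individuals
        if d_ij == 0:
            continue                          # coincident individuals: no defined direction
        for d in range(len(xi)):
            unit = (xj[d] - xi[d]) / d_ij
            social[d] += c * (ub[d] - lb[d]) / 2 * social_force(abs(xj[d] - xi[d])) * unit
    return [min(ub[d], max(lb[d], c * social[d] + best[d])) for d in range(len(xi))]
```

Compared with a linear schedule, the cosine curve stays near c_max longer at the start (wider, more global moves) and flattens near c_min at the end (finer, local moves around X_best).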

Sub-step 3: in response to determining that the number of executions is greater than or equal to the first preset execution threshold, perform the following second individual updating step based on the obtained first updated population:

First sub-step: screen the first updated population to obtain a target updated population. The target updated population may be formed by randomly selecting half of the first updated individuals in the first updated population.

Second sub-step: input the target updated population into the population fitness function to obtain an updated fitness value set of the target updated population.

Third sub-step: select the target updated individual with the largest updated fitness value in the updated fitness value set as the screened target updated individual.

Fourth sub-step: randomly select a target updated individual from the target updated population as the random updated individual.

Fifth sub-step: for each target updated individual in the target updated population, update it according to the screened target updated individual and the random updated individual to obtain an updated individual.

As an example, for each target updated individual in the target updated population, the execution body may update it according to the screened target updated individual and the random updated individual using a second update formula, to obtain an updated individual. The second update formula may be:

Here, X₂ denotes the updated individual given by the second update formula; Cauchy(0, 1) denotes the probability density function of the one-dimensional standard Cauchy distribution; the product operator in the formula denotes multiplication; X_best denotes the screened target updated individual; X_rand denotes the random updated individual; and X_k denotes a target updated individual.
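Because the second update formula is also missing from this text, the exact combination of X_best, X_rand, and X_k is unknown. The sketch below only illustrates the Cauchy operator: it samples Cauchy(0, 1) by inverse transform, tan(pi * (u - 0.5)), and uses it in a hypothetical update that perturbs X_k along X_best - X_rand before clipping to the remaining-resource range. The function names and the update form are assumptions.

```python
import math
import random
from typing import List

Individual = List[float]

def standard_cauchy(rng: random.Random) -> float:
    """Draw from Cauchy(0, 1) by inverse transform: tan(pi * (u - 0.5))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def cauchy_update(x_k: Individual, x_best: Individual, x_rand: Individual,
                  lb: Individual, ub: Individual, rng: random.Random) -> Individual:
    """Hypothetical Cauchy-perturbed update, clipped to the remaining-resource range."""
    return [min(u, max(l, xk + standard_cauchy(rng) * (b - r)))
            for xk, b, r, l, u in zip(x_k, x_best, x_rand, lb, ub)]
```

The heavy tails of the Cauchy distribution occasionally produce very large steps, which is what gives individuals a chance to jump out of a local optimum; the clipping keeps those steps inside the feasible resource range.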

Sixth sub-step: remove the target updated population from the first updated population to obtain a remaining population.

Seventh sub-step: take the remaining population as the initial fitness population and perform the first individual updating step on it; the resulting updated individuals serve as the remaining updated individuals.

Eighth sub-step: combine the obtained updated individuals and the remaining updated individuals into a second updated population.

Sub-step 4: in response to determining that the number of executions reaches a second preset execution threshold, determine the second updated population as the cluster resource scheduling information, and perform resource scheduling on the data storage cluster according to it. The second preset execution threshold may be the preset maximum number of executions of the scheduling information determining step, for example 100, and may be twice the first preset execution threshold. The resource scheduling may increase or decrease the resources of each data storage node.

Seventh, in response to determining that the number of executions has not reached the second preset execution threshold, take the first updated population or the second updated population as the input of the next round, increase the number of executions by a preset threshold, and perform the scheduling information determining step again. When the number of executions is less than the first preset execution threshold, the first updated population is used as the initial fitness population. When the number of executions equals the first preset execution threshold, the first updated population is used as the first updated population for the second individual updating step. When the number of executions is greater than the first preset execution threshold, the second updated population is used as the first updated population. The preset threshold may be a preset value, for example 1.
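The staged control flow of the scheduling information determining step can be summarised as a loop. In this sketch, `first_update` and `second_update` are hypothetical callables standing in for the first and second individual updating steps; `second_update_split` shows how the second step splits the population in half at random, applies a Cauchy-based update to one half and the first update to the other, and recombines them. The thresholds follow the example values in the text (maximum 100, half of that as the first threshold, increment 1).

```python
import random
from typing import Callable, List

Population = List[List[float]]
UpdateFn = Callable[[Population, int], Population]

def second_update_split(pop: Population, t: int, cauchy_step: UpdateFn,
                        first_update: UpdateFn, rng: random.Random) -> Population:
    """Randomly split the population in half; update each half differently and recombine."""
    idx = list(range(len(pop)))
    rng.shuffle(idx)
    chosen = set(idx[: len(pop) // 2])                            # target updated population
    target = [pop[i] for i in range(len(pop)) if i in chosen]
    rest = [pop[i] for i in range(len(pop)) if i not in chosen]   # remaining population
    return cauchy_step(target, t) + first_update(rest, t)

def schedule_search(init_pop: Population, first_update: UpdateFn,
                    second_update: UpdateFn, t_max: int = 100, step: int = 1) -> Population:
    """Run the staged update loop and return the final population."""
    t_first = t_max // 2            # first preset execution threshold
    pop = init_pop
    t = 0                           # number of executions so far
    while t < t_max:                # stop once the second threshold is reached
        if t < t_first:
            pop = first_update(pop, t)      # first individual updating step
        else:
            pop = second_update(pop, t)     # second individual updating step
        t += step
    return pop
```

With t_max = 100, the first 50 rounds use only the first update and the last 50 use the split update, matching the first threshold being half of the second.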

The above technical solution and its related content constitute an inventive point of the embodiments of the present disclosure and solve the second technical problem mentioned in the background art. Because the locust optimization algorithm obtains its initial population by random initialization, the initial population is highly random; because the coefficient used to update locust individuals decreases linearly as the number of executions increases, the update cannot adapt. Consequently, the optimization efficiency of the algorithm drops as the number of executions increases and the search easily falls into a local optimal solution, which makes the resource scheduling of the data storage cluster unbalanced and its performance unstable, increases the damage to data storage nodes, and degrades the user experience. Addressing these factors improves the balance of resource scheduling and the stability of the data storage cluster, reduces the damage to data storage nodes, and improves the user experience.
To achieve this, first, reverse learning is applied to the initial population to obtain a reverse population, and the initial and reverse populations are screened according to the initial fitness value set and the reverse fitness value set to obtain the initial fitness population. This improves the quality of the initial fitness population, distributes it as uniformly as possible in the search space, speeds up convergence, and reduces the number of subsequent executions. Second, the initial fitness individuals are updated in stages. When the number of executions is less than the first preset execution threshold, each initial fitness individual is updated according to the target initial fitness individual, the random initial fitness individual, and the fitness comparison result to obtain a first updated individual. The update decreasing coefficient is designed to decrease nonlinearly using a cosine function, which strengthens global search in the early stage and local search in the later stage and avoids being confined to local optimal solutions. The influence of the target initial fitness individual and the random initial fitness individual in the first update formula widens the search range while ensuring that every initial fitness individual searches toward the target initial fitness individual.
Then, when the number of executions is greater than or equal to the first preset execution threshold, a target updated population is randomly selected from the first updated population, and the remaining individuals form a residual population. The target updated population is updated based on the Cauchy operator and the screened target updated individuals, and the residual population is updated with the first update formula. This improves the ability of the first updated individuals to jump out of local optimal solutions and strengthens global search. Finally, the resulting second updated population is determined as the cluster resource scheduling information, and resource scheduling is performed on the data storage cluster according to the cluster resource scheduling information. This improves the accuracy of the cluster resource scheduling information, reduces the waste of data storage cluster resources, increases the processing rate of user data calling information, and improves the stability of the data storage cluster and the user experience.
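A hedged sketch of the second update stage follows: a randomly chosen fraction of the population is perturbed with a standard Cauchy sample relative to the best individual, and the remainder is passed to the first update formula. The first update formula is not reproduced in this section, so it is injected as the hypothetical callable `first_update`; the mutation form `x + C * (best - x)`, the `ratio` parameter, and the greedy keep-if-better rule are likewise assumptions rather than the disclosed formulas.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def cauchy(rng: random.Random) -> float:
    """Standard Cauchy sample via the inverse CDF: tan(pi * (u - 0.5))."""
    return math.tan(math.pi * (rng.random() - 0.5))


def clip(x: Vector, lower: Vector, upper: Vector) -> Vector:
    """Keep every coordinate inside its search bounds."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]


def second_stage_update(
    population: List[Vector],
    fitness: Callable[[Vector], float],
    first_update: Callable[[Vector], Vector],  # hypothetical stand-in for the first update formula
    lower: Vector,
    upper: Vector,
    ratio: float,
    rng: random.Random,
) -> List[Vector]:
    """Cauchy-mutate a random subset around the best individual; update the rest normally."""
    best = min(population, key=fitness)  # assumption: lower fitness value = better
    n_target = max(1, int(ratio * len(population)))
    target_idx = set(rng.sample(range(len(population)), n_target))
    updated: List[Vector] = []
    for i, x in enumerate(population):
        if i in target_idx:
            c = cauchy(rng)
            cand = clip([xi + c * (bi - xi) for xi, bi in zip(x, best)], lower, upper)
            # Greedy selection: accept the mutant only if it is at least as fit.
            updated.append(cand if fitness(cand) <= fitness(x) else x)
        else:
            updated.append(clip(first_update(x), lower, upper))
    return updated
```

The heavy tails of the Cauchy distribution occasionally produce large jumps, which is the property relied on to escape local optima; clipping keeps such jumps inside the search bounds.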

With further reference to fig. 2, as an implementation of the method shown in the foregoing figures, the present disclosure provides some embodiments of a data storage cluster resource scheduling apparatus, which correspond to those method embodiments shown in fig. 1, and which may be applied in particular in various electronic devices.

As shown in fig. 2, a data storage cluster resource scheduling apparatus 200 includes: an acquisition unit 201, a preprocessing unit 202, a first generation unit 203, a second generation unit 204, a user value identification unit 205, an input unit 206, a determination unit 207, and a resource adjustment unit 208. The acquisition unit 201 is configured to acquire a user data calling information set of a data calling user and a cluster load information set of a data storage cluster. The preprocessing unit 202 is configured to preprocess the user data calling information set to obtain a preprocessed user data calling information set. The first generation unit 203 is configured to generate a user calling knowledge graph of the data calling user according to the preprocessed user data calling information set. The second generation unit 204 is configured to generate a user data calling prediction information set of the data calling user according to the preprocessed user data calling information set and the user calling knowledge graph. The user value identification unit 205 is configured to perform user value identification on the preprocessed user data calling information set to obtain user value information. The input unit 206 is configured to input the cluster load information set into a cluster load prediction model to obtain a cluster residual resource prediction information set. The determination unit 207 is configured to determine the calling priority of each piece of user data calling prediction information in the user data calling prediction information set according to the user value information and the cluster load information set, to obtain a calling priority set.
The resource adjustment unit 208 is configured to generate cluster resource scheduling information of the data storage cluster according to the calling priority set and the cluster residual resource prediction information set, and to perform resource scheduling on the data storage cluster according to the cluster resource scheduling information.
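The division into units 201 to 208 can be pictured as a simple pipeline in which each unit consumes the outputs of earlier units. The sketch below is illustrative only: each unit is modelled as a plain Python callable, and all field names and signatures are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class SchedulingApparatus:
    """Hypothetical wiring of units 201-208; each field stands in for one unit."""
    acquire: Callable[[], Tuple[List[Any], List[Any]]]           # unit 201
    preprocess: Callable[[List[Any]], List[Any]]                  # unit 202
    build_graph: Callable[[List[Any]], Any]                       # unit 203
    predict_calls: Callable[[List[Any], Any], List[Any]]          # unit 204
    identify_value: Callable[[List[Any]], Any]                    # unit 205
    predict_load: Callable[[List[Any]], List[Any]]                # unit 206
    prioritize: Callable[[List[Any], Any, List[Any]], List[Any]]  # unit 207
    schedule: Callable[[List[Any], List[Any]], Any]               # unit 208

    def run(self) -> Any:
        calls, loads = self.acquire()
        cleaned = self.preprocess(calls)
        graph = self.build_graph(cleaned)
        predictions = self.predict_calls(cleaned, graph)
        value = self.identify_value(cleaned)
        remaining = self.predict_load(loads)
        # Unit 207 assigns one priority per prediction from user value and cluster load.
        priorities = self.prioritize(predictions, value, loads)
        return self.schedule(priorities, remaining)
```

With this shape, a single stage (for example `predict_load`, the cluster load prediction model) could be swapped without touching the others.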

It will be appreciated that the elements recited in the data storage cluster resource scheduling apparatus 200 correspond to the various steps in the method described with reference to fig. 1. Thus, the operations, features and advantages described above for the method are equally applicable to the data storage cluster resource scheduling device 200 and the units contained therein, and are not described herein again.

Referring now to fig. 3, a schematic diagram of an electronic device 300 suitable for implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 3 is merely an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present disclosure.

As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various suitable actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the electronic device 300. The processing means 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.

In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 3 may represent one device or a plurality of devices as needed.

In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communications device 309, or from storage device 308, or from ROM 302. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing means 301.

It should be noted that, in some embodiments of the present disclosure, the computer readable medium may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.

The computer readable medium may be contained in the electronic device, or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the following steps: acquiring a user data calling information set of a data calling user and a cluster load information set of a data storage cluster; preprocessing the user data calling information set to obtain a preprocessed user data calling information set; generating a user calling knowledge graph of the data calling user according to the preprocessed user data calling information set; generating a user data calling prediction information set of the data calling user according to the preprocessed user data calling information set and the user calling knowledge graph; carrying out user value identification on the preprocessed user data calling information set to obtain user value information; inputting the cluster load information set into a cluster load prediction model to obtain a cluster residual resource prediction information set; determining the calling priority of each piece of user data calling prediction information in the user data calling prediction information set according to the user value information and the cluster load information set to obtain a calling priority set; generating cluster resource scheduling information of the data storage cluster according to the calling priority set and the cluster residual resource prediction information set, and carrying out resource scheduling on the data storage cluster according to the cluster resource scheduling information.

Computer program code for carrying out operations of some embodiments of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in some embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquisition unit, a preprocessing unit, a first generation unit, a second generation unit, a user value identification unit, an input unit, a determination unit, and a resource adjustment unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit that acquires a user data call information set of a data call user and a cluster load information set of a data storage cluster".

The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.

The foregoing description is merely of preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by mutually substituting the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims

1. A data storage cluster resource scheduling method, comprising the following steps:

acquiring a user data calling information set of a data calling user and a cluster load information set of a data storage cluster;

preprocessing the user data calling information set to obtain a preprocessed user data calling information set;

generating a user calling knowledge graph of the data calling user according to the preprocessed user data calling information set;

generating a user data calling prediction information set of the data calling user according to the preprocessed user data calling information set and the user calling knowledge graph;

carrying out user value identification on the preprocessed user data calling information set to obtain user value information;

inputting the cluster load information set into a cluster load prediction model to obtain a cluster residual resource prediction information set;

determining the calling priority of each piece of user data calling prediction information in the user data calling prediction information set according to the user value information and the cluster load information set to obtain a calling priority set;

generating cluster resource scheduling information of the data storage cluster according to the calling priority set and the cluster residual resource prediction information set, and carrying out resource scheduling on the data storage cluster according to the cluster resource scheduling information.

2. The method of claim 1, wherein the generating a user invocation knowledge graph of the data invocation user from the preprocessed user data invocation information set comprises:

performing word segmentation on the related user information set included in the preprocessed user data call information set to obtain a user data call word segmentation set;

performing word vector representation on the user data call word segmentation set to obtain a user data call word segmentation vector set;

determining part-of-speech information of each user data call word segment in the user data call word segmentation set to obtain a call word part-of-speech information set;

performing embedding representation on the user data call word segmentation vector set according to the call word part-of-speech information set to obtain a call word segmentation feature vector set;

generating a user data call triplet set according to the call word segmentation feature vector set and the preprocessed user data call information set;

inputting the user data call triplet set into a graph database to obtain an initial call knowledge graph;

and carrying out knowledge reasoning on the initial calling knowledge graph to obtain a user calling knowledge graph.
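The claim does not specify how knowledge reasoning is performed on the initial call knowledge graph. As one illustrative stand-in, the sketch below stores triplets in an in-memory set (instead of a graph database) and applies a single forward-chaining transitivity rule; the function name and the relation and entity names in the test are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def infer_transitive(triples: Iterable[Triple], relation: str) -> Set[Triple]:
    """Forward-chain (a, r, b) and (b, r, c) into (a, r, c) until nothing new appears."""
    graph: Set[Triple] = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the edges of the chosen relation before adding inferred triples.
        edges = [(h, t) for h, r, t in graph if r == relation]
        for h1, t1 in edges:
            for h2, t2 in edges:
                if t1 == h2 and (h1, relation, t2) not in graph:
                    graph.add((h1, relation, t2))
                    changed = True
    return graph
```

A production system would more likely express such rules in the graph database's own query or rule language; the loop above only shows the idea of deriving new triplets from existing ones.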

3. The method of claim 2, wherein the generating a user data call triplet set according to the call word segmentation feature vector set and the preprocessed user data call information set comprises:

performing semantic feature extraction on the call word segmentation feature vector set to obtain a call word segmentation semantic feature vector set;

performing entity boundary detection on each call word segmentation semantic feature vector in the call word segmentation semantic feature vector set to obtain a call word segmentation boundary feature vector set;

performing linear transformation on the call word segmentation boundary feature vector set to obtain a call word segmentation linear feature vector set;

performing syntactic analysis on the preprocessed user data call information set to obtain a data call syntactic dependency graph;

converting the data call syntactic dependency graph into a data call syntactic dependency matrix;

performing graph attention weighting on the call word segmentation linear feature vector set and the data call syntactic dependency matrix to obtain a sentence sequence feature vector set and a word segmentation dependent feature vector set;

and carrying out word segmentation relation prediction on the sentence sequence feature vector set and the word segmentation dependent feature vector set to obtain a user data call triplet set.
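The conversion of a syntactic dependency graph into a matrix and the graph attention weighting can be sketched as follows. This is an assumption-laden illustration, not the claimed network: arcs are given as `(head, dependent)` token indices, the matrix is made symmetric with self-loops, and the attention uses the GAT scoring form `LeakyReLU(a_src . h_i + a_dst . h_j)` normalized by a softmax over dependency neighbours. Because the claim applies a linear transformation beforehand, the inputs `h` are taken as already transformed; the vectors `a_src`/`a_dst` would normally be learned, and the output nonlinearity is omitted.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dependency_matrix(n: int, arcs: List[Tuple[int, int]]) -> Matrix:
    """Symmetric adjacency matrix with self-loops built from (head, dependent) arcs."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for head, dep in arcs:
        a[head][dep] = a[dep][head] = 1.0
    return a


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def dot(u: List[float], v: List[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


def graph_attention(h: Matrix, adj: Matrix, a_src: List[float], a_dst: List[float]) -> Matrix:
    """One attention head: token i aggregates only the tokens it is linked to in adj."""
    n, d = len(h), len(h[0])
    out: Matrix = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] > 0]
        scores = [leaky_relu(dot(a_src, h[i]) + dot(a_dst, h[j])) for j in nbrs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[k] / z * h[j][c] for k, j in enumerate(nbrs)) for c in range(d)])
    return out
```

With zero attention vectors every neighbour receives the same weight, so each token's output is simply the mean of its dependency neighbourhood; learned vectors shift that weight toward more informative neighbours.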

4. The method of claim 1, wherein the generating a set of user data invocation prediction information for the data invocation user from the preprocessed set of user data invocation information and the user invocation knowledge graph comprises:

performing user topological spatial feature extraction on the user invocation knowledge graph to obtain a user topology feature vector set;

performing time-series feature extraction on a historical data call information set included in the preprocessed user data call information set to obtain a call time-series feature vector set;

performing frequency feature extraction on the historical data call information set to obtain a call frequency feature vector set;

performing data call feature extraction on the historical data call information set to obtain a data call feature vector set;

performing feature fusion on the call time-series feature vector set, the call frequency feature vector set, and the data call feature vector set to obtain a call fusion feature vector set;

and carrying out regression prediction processing on the call fusion feature vector set and the user topology feature vector set to obtain a user data call prediction information set.
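As a minimal illustration of fusion followed by regression, the sketch below fuses feature vectors by concatenation and fits an ordinary least-squares linear model with full-batch gradient descent. The claim does not specify the fusion operator or the regression model; concatenation, the linear model, and the `lr`/`epochs` values are assumptions chosen for clarity.

```python
from typing import List, Tuple


def fuse(*parts: List[float]) -> List[float]:
    """Feature fusion by concatenation (one simple choice among many)."""
    out: List[float] = []
    for p in parts:
        out.extend(p)
    return out


def fit_linear(x: List[List[float]], y: List[float],
               lr: float = 0.05, epochs: int = 2000) -> Tuple[List[float], float]:
    """Minimize mean squared error of y ~ w.x + b by full-batch gradient descent."""
    n, d = len(x), len(x[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Residuals are computed once per epoch so every update uses the same gradient.
        err = [sum(wi * xi for wi, xi in zip(w, row)) + b - yi for row, yi in zip(x, y)]
        for k in range(d):
            w[k] -= lr * 2.0 / n * sum(e * row[k] for e, row in zip(err, x))
        b -= lr * 2.0 / n * sum(err)
    return w, b


def predict(w: List[float], b: float, row: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, row)) + b
```

In the claimed step, each row would combine the call fusion features with the corresponding user topology features, e.g. `fuse(time_feats, freq_feats, call_feats, topo_feats)` with hypothetical variable names.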

5. The method of claim 1, wherein the determining the call priority of each user data call prediction information in the user data call prediction information set according to the user value information and the cluster load information set, to obtain the call priority set, comprises:

determining a cluster residual resource information set of the data storage cluster according to the cluster load information set;

determining the number of resource peak call tokens included in a preset resource peak call container as a number of resource peak tokens, wherein the preset resource peak call container is a container that generates peak call tokens at a first preset rate;

determining the number of resource average call tokens included in a preset resource average call container as a number of resource average tokens, wherein the preset resource average call container is a container that generates average call tokens at a second preset rate;

for each piece of user data call prediction information in the user data call prediction information set, performing the following priority determination steps:

in response to determining that the cluster residual resource information set satisfies the load of the user data call prediction information set and that the user value information is of a first level, comparing the number of data call bytes corresponding to the user data call prediction information with the number of resource peak tokens to obtain a first peak comparison result;

in response to determining that the first peak comparison result indicates that the number of data call bytes is less than or equal to the number of resource peak tokens, determining a first priority as the call priority;

in response to determining that the first peak comparison result indicates that the number of data call bytes is greater than the number of resource peak tokens, comparing a number of remaining call bytes with the number of resource average tokens to obtain a first average comparison result, wherein the number of remaining call bytes is the difference between the number of data call bytes and the number of resource peak tokens;

in response to determining that the first average comparison result indicates that the number of remaining call bytes is greater than or equal to the number of resource average tokens, determining the first priority as the call priority;

in response to determining that the first average comparison result indicates that the number of remaining call bytes is less than the number of resource average tokens, determining a second priority as the call priority.
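The preset peak and average call containers behave like token buckets filled at a first and a second preset rate. A minimal sketch follows, assuming one token corresponds to one data call byte, a capacity cap per container, and time supplied explicitly in seconds; the class name, field names, and numbers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenContainer:
    """Token bucket: tokens accrue at `rate` per second, capped at `capacity`."""
    rate: float
    capacity: float
    tokens: float = 0.0
    last: float = 0.0  # timestamp of the last refill, in seconds

    def refill(self, now: float) -> float:
        """Add tokens for the time elapsed since the last refill; return the current count."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        return self.tokens

    def consume(self, amount: float) -> None:
        """Remove tokens after a call is admitted, never going below zero."""
        self.tokens = max(0.0, self.tokens - amount)
```

For example, a peak container with `rate=2000.0` and `capacity=4000.0` holds 2000 tokens one second after start and stays capped at 4000 thereafter; reading `refill(now)` on the peak and average containers yields the two token counts that the priority steps compare against.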

6. The method of claim 5, wherein, after determining a second priority as the call priority in response to determining that the first average comparison result indicates that the number of remaining call bytes is less than the number of resource average tokens, the method further comprises:

in response to determining that the cluster residual resource information set satisfies the load of the user data call prediction information set and that the user value information is of a second level, comparing the number of data call bytes with the number of resource peak tokens to obtain a second peak comparison result;

in response to determining that the second peak comparison result indicates that the number of data call bytes is greater than or equal to the number of resource peak tokens, comparing the number of remaining call bytes with the number of resource average tokens to obtain a second average comparison result;

in response to determining that the second average comparison result indicates that the number of remaining call bytes is greater than or equal to the number of resource average tokens, determining the second priority as the call priority;

and in response to determining that the second average comparison result indicates that the number of remaining call bytes is less than the number of resource average tokens, determining a third priority as the call priority.
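The priority determination of claims 5 and 6 can be condensed into a single function. This sketch follows the comparison directions as the claims state them; the first-level `data_call_bytes <= peak_tokens` branch is evaluated first, and cases that the claims leave open (insufficient residual resources, a second-level user whose byte count is below the peak token count, any other level) return `None`. The function name and the integer encoding of levels and priorities are assumptions.

```python
from typing import Optional


def call_priority(
    data_call_bytes: int,
    peak_tokens: int,
    average_tokens: int,
    user_level: int,
    resources_sufficient: bool,
) -> Optional[int]:
    """Return priority 1, 2 or 3 per claims 5-6, or None where the claims are silent."""
    if not resources_sufficient:
        return None
    remaining = data_call_bytes - peak_tokens  # number of remaining call bytes
    if user_level == 1:
        if data_call_bytes <= peak_tokens:
            return 1
        return 1 if remaining >= average_tokens else 2
    if user_level == 2:
        if data_call_bytes >= peak_tokens:
            return 2 if remaining >= average_tokens else 3
        return None
    return None
```

For instance, with 1000 peak tokens and 300 average tokens, a first-level call of 1200 bytes leaves 200 remaining bytes, which is fewer than 300, so it receives the second priority.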

7. A data storage cluster resource scheduling apparatus, comprising:

an acquisition unit configured to acquire a user data call information set of a data call user and a cluster load information set of a data storage cluster;

the preprocessing unit is configured to preprocess the user data calling information set to obtain a preprocessed user data calling information set;

the first generation unit is configured to generate a user calling knowledge graph of the data calling user according to the preprocessed user data calling information set;

the second generation unit is configured to generate a user data call prediction information set of the data call user according to the preprocessed user data call information set and the user call knowledge graph;

the user value identification unit is configured to perform user value identification on the preprocessed user data calling information set to obtain user value information;

the input unit is configured to input the cluster load information set into a cluster load prediction model to obtain a cluster residual resource prediction information set;

the determining unit is configured to determine the calling priority of each piece of user data calling prediction information in the user data calling prediction information set according to the user value information and the cluster load information set to obtain a calling priority set;

and the resource adjustment unit is configured to generate cluster resource scheduling information of the data storage cluster according to the calling priority set and the cluster residual resource prediction information set, and perform resource scheduling on the data storage cluster according to the cluster resource scheduling information.

8. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.

9. A computer readable medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-6.