CN115330422A - Big data service management system based on distributed storage

Info

Publication number: CN115330422A
Application number: CN202211256997.8A
Authority: CN
Inventors: 陈炯彬; 戚升权; 王世存
Original assignee: Ningbo Xinhuan Network Technology Co ltd
Current assignee: Ningbo Xinhuan Network Technology Co ltd
Priority date: 2022-10-14
Filing date: 2022-10-14
Publication date: 2022-11-11
Anticipated expiration: 2042-10-14
Also published as: CN115330422B

Abstract

The invention relates to the technical field of electric digital data processing, and in particular to a big data service management system based on distributed storage. The system comprises a time delay suitability acquisition module, a synchronization condition stability acquisition module, a grouping module, a typical degree acquisition module and a synchronization strategy determination module. The time delay suitability acquisition module acquires the time delay data corresponding to each work order session so as to obtain the time delay suitability of each work order session; the synchronization condition stability acquisition module acquires the differential data volume of each work order session and obtains the synchronization condition stability by combining it with the time delay suitability; the grouping module groups all work order sessions into a normal group and an abnormal group; the typical degree acquisition module obtains the typical degree of each work order session within its group; the synchronization strategy determination module obtains the normal and abnormal fluctuation intervals of the differential data volume and determines the synchronization strategy by combining them with the predicted differential data volume, so that the service can run quickly and stably.

Description

Big data service management system based on distributed storage

Technical Field

The invention relates to the technical field of electric digital data processing, in particular to a big data service management system based on distributed storage.

Background

Business management systems are mainly customer relationship management (CRM) systems. Existing CRM systems provide real-time customer conversation. For a work order, the distributed storage system must not only hold a large number of work order attachments, but also respond in real time and synchronize to other nodes as soon as possible, so that the CRM runs consistently across locations. The timeliness of synchronization in the distributed storage affects whether the data state of the CRM is synchronized correctly. Normally the work order session data of a CRM only grows, but data rollbacks also occur; the latency of writing rolled-back data to disk can be larger, and a single synchronization then brings more IO requests.

For a distributed storage system, data consistency problems are generally solved by arbitration. When synchronizing a large number of work order sessions produces large IO requests, it is difficult to write the operations efficiently into the database of the distributed storage, and the service delay may become large, which can cause user misunderstanding or exceptions in the service logic. A synchronization strategy for work order sessions in distributed storage therefore needs to be determined, so that the service can run quickly and stably.

Disclosure of Invention

In order to solve the problems of large service delay and low service operation efficiency in distributed storage, the invention aims to provide a big data service management system based on distributed storage, which comprises the following modules:

the time delay suitability acquiring module is used for acquiring time delay data of each work order session in a time period from the last synchronization to the next synchronization, forming all the time delay data into a data set, and acquiring the time delay suitability of the work order session based on the data set;

the synchronization condition stability acquisition module is used for acquiring the differential data volume of each work order session at a plurality of sampling moments in the last synchronization and acquiring the synchronization condition stability of the work order session according to the differential data volume and the time delay suitability;

the grouping module is used for acquiring the difference distance between any two work order sessions according to the time delay suitability and the synchronization condition stability of each work order session, and dividing all the work order sessions into a normal group and an abnormal group based on the difference distance;

the typical degree acquisition module is used for acquiring the work order semantic descriptor of each work order session and, for the normal group and the abnormal group, acquiring the differential data change trend of any two work order sessions based on the work order semantic descriptors of the work order sessions in the group; and obtaining the typical degree of each work order session according to the sum of all the differential data change trends corresponding to that work order session in the group;

the synchronization strategy determination module is used for acquiring a normal fluctuation interval of the differential data volume according to the typical degree of each work order session in the normal group and acquiring an abnormal fluctuation interval of the differential data volume according to the typical degree of each work order session in the abnormal group; and acquiring the predicted differential data volume of the work order sessions to be synchronized by using an LSTM prediction network, and determining a synchronization strategy based on the predicted differential data volume, the normal fluctuation interval and the abnormal fluctuation interval.

Preferably, the method for obtaining the time delay suitability of the work order session based on the data set in the time delay suitability obtaining module includes:

acquiring the average value and the variance of all delay data in the data set, calculating the difference between the average value and a preset proper delay size, and acquiring the delay suitability based on the difference and the variance;

the time delay suitability is in a negative correlation relation with the difference value, and the time delay suitability is in a negative correlation relation with the variance.

Preferably, the method for obtaining the synchronization condition stability of the work order session according to the differential data volume and the time delay suitability in the synchronization condition stability obtaining module includes:

acquiring the variation range of all the differential data volumes corresponding to the work order session, wherein the variation range refers to the difference value between the differential data volume at the last sampling moment and the differential data volume at the first sampling moment;

acquiring, among all the differential data volumes corresponding to the work order session, the absolute value of the difference between the differential data volumes at every two adjacent sampling moments as a differential value, selecting the maximum of all the differential values, and calculating the sum of this maximum differential value and the variation range;

and acquiring the stability of the synchronization condition of the work order session according to the summation result and the time delay suitability, wherein the stability of the synchronization condition and the summation result are in a negative correlation relationship, and the stability of the synchronization condition and the time delay suitability are in a positive correlation relationship.

Preferably, the method for obtaining the difference distance between any two work order sessions in the grouping module according to the time delay suitability and the synchronization condition stability of each work order session includes:

acquiring the square of the difference between the synchronization condition stabilities corresponding to any two work order sessions, and calculating the dynamic time warping distance corresponding to the two work order sessions;

taking the negative number of the square result as a power exponent to obtain an exponential function; obtaining the difference distance according to the exponential function and the dynamic time warping distance;

the difference distance and the exponential function are in positive correlation, and the difference distance and the dynamic time warping distance are in negative correlation.

Preferably, the method for acquiring the differential data change trend of any two work order sessions based on the work order semantic descriptor of each work order session in the group in the typical degree acquisition module includes:

calculating the morphological similarity distance between the differential data volumes corresponding to every two work order sessions in the group; taking the absolute value of the difference between the variation ranges corresponding to the two work order sessions to obtain a difference value;

acquiring the similarity between the semantic descriptors corresponding to the two work order sessions;

and constructing an exponential function by taking the negative of the morphological similarity distance as the exponent, multiplying the exponential function by the similarity to obtain a product result, wherein the ratio of the product result to the difference value is the differential data change trend of the two work order sessions.

Preferably, the method for acquiring the normal fluctuation interval of the differential data volume according to the typical degree of each work order session in the normal group in the synchronization policy determination module includes:

arranging the typical degrees of all the work order sessions in the normal group in descending order, taking the work order sessions corresponding to the first 5 typical degrees after the descending arrangement as reference samples, and forming the normal fluctuation interval of the differential data volume from the differential data volumes corresponding to the reference samples.

Preferably, the method for acquiring the abnormal fluctuation interval of the differential data volume according to the typical degree of each work order session in the abnormal group in the synchronization policy determination module includes:

arranging the typical degrees of all the work order sessions in the abnormal group in ascending order, taking the work order sessions corresponding to the first 5 typical degrees after the ascending arrangement as abnormal samples, and forming the abnormal fluctuation interval of the differential data volume from the differential data volumes corresponding to the abnormal samples.

Preferably, the training data of the LSTM prediction network in the synchronization policy determination module is a differential data volume corresponding to the work order session in the normal group.

The invention has the following beneficial effects. In the embodiment of the invention, the time delay data of each work order session in the time period from the last synchronization to the next synchronization is acquired and analyzed to obtain the time delay suitability of each work order session. The differential data volume of each work order session during the last synchronization is then obtained, and the synchronization condition stability is derived from the differential data volume and the time delay suitability, which makes the subsequent determination of the synchronization strategy more reliable. The difference distance between any two work order sessions is obtained by combining the time delay suitability and the synchronization condition stability of each work order session, and all the work order sessions are divided into a normal group and an abnormal group based on the difference distance. The work order semantic descriptor of each work order session is acquired, and the differential data change trend between every two work order sessions in each group is obtained so as to derive the typical degree of each work order session in the group. A normal fluctuation interval and an abnormal fluctuation interval are acquired based on the typical degree. A predicted differential data volume is obtained from an LSTM prediction network, and the synchronization strategy is determined from the predicted differential data volume, the normal fluctuation interval and the abnormal fluctuation interval. This addresses the low efficiency caused by data blocking in the distributed storage service, so that the service can run quickly and stably.

Drawings

In order to illustrate the technical solutions and advantages of the embodiments of the present invention or of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a block diagram of a big data service management system based on distributed storage according to an embodiment of the present invention.

Detailed Description

To further explain the technical means adopted by the present invention to achieve its intended purpose and their effects, the specific implementation, structure, features and effects of the big data service management system based on distributed storage according to the present invention are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The method is suitable for determining the work order session synchronization strategy in the distributed storage service; the following describes a specific scheme of a big data service management system based on distributed storage in detail with reference to the accompanying drawings.

Referring to fig. 1, a block diagram of a big data service management system based on distributed storage according to an embodiment of the present invention is shown, where the system includes the following modules:

the delay suitability acquiring module 10 is configured to acquire delay data of each work order session in a time period from the last synchronization to the next synchronization, form all the delay data into a data set, and acquire the delay suitability of the work order session based on the data set.

The distributed storage areas are in a backbone network, and the distributed storage nodes are mutually synchronized through the backbone network, wherein one-time synchronization means that a plurality of work order states are sent to a remote distributed storage system.

The synchronization in the embodiment of the invention refers to the action that one server node sends a synchronization request to other server nodes, wherein one-time synchronization comprises the state of one-time work order session, and if the work order session changes, the work order session is synchronized at a certain time interval; the batch synchronization means that one server node directly compresses and packs data aiming at the work order session meeting the conditions and distributes the data to other storage nodes, so that the throughput is improved, and the synchronization time is shortened.

Because the state of a work order session may change at any time, and request blocking may occur especially when the network environment is poor, the terminal in the embodiment of the present invention must provide its time delay to the CDN server when submitting a work order session. When the time delay is relatively large, two requests can easily reach a node in succession, so that synchronization must continue after a batch synchronization. The time delay therefore needs to be measured in order to determine the time delay suitability of the currently submitted work order session.

The time delay data of each work order session in the time period after the last synchronization and before the next synchronization are acquired, the acquisition frequency is set to be 1Hz, namely, the acquisition is performed once per second, and the time delay data of the work order session in the time period are sequentially acquired to obtain a data set.

Obtaining the average value of the time delay data in the data sets, further obtaining the corresponding variance according to the average value in each data set, and calculating the time delay suitability of the work order conversation corresponding to the data set by combining the average value and the variance, wherein the calculation of the time delay suitability is as follows:

where:

$S$ denotes the time delay suitability of the work order session;

$\mu$ denotes the average value of the time delay data in the data set;

$\sigma^{2}$ denotes the variance of the time delay data in the data set;

$T_{0}$ denotes the suitable time delay size, which is set by the implementer;

$\alpha$ denotes the relaxation correction coefficient, taken as 0.2 in the embodiment of the invention;

$\exp(\cdot)$ denotes the exponential function;

$|\cdot|$ denotes the absolute value operation.

Preferably, in the embodiment of the present invention, the suitable time delay size $T_{0}$ is set to 32 ms.

The closer the average value of the time delay data in the data set is to the suitable time delay size, the greater the time delay suitability of the work order session; likewise, the less the time delay data in the data set fluctuates, the more stable the data is, and the greater the time delay suitability of the work order session.
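The calculation described above can be sketched in Python as follows. The exact formula is not reproduced in this text, so the combination used here, `exp(-alpha * (|mean - suitable| + variance))`, is only an assumed form that respects the stated negative correlations; the function name `delay_suitability` and its parameter names are likewise hypothetical.

```python
import math
from statistics import fmean, pvariance


def delay_suitability(delays_ms, suitable_ms=32.0, alpha=0.2):
    """Return a score in (0, 1]; larger means the 1 Hz delay samples sit
    closer to the suitable delay and fluctuate less."""
    mu = fmean(delays_ms)            # average of the delay data set
    var = pvariance(delays_ms, mu)   # population variance around that average
    # ASSUMPTION: the patent only states that the score falls as
    # |mu - suitable_ms| and the variance grow; this exponential form is
    # one possible choice, not the patented formula.
    return math.exp(-alpha * (abs(mu - suitable_ms) + var))


steady = delay_suitability([32.0, 32.0, 32.0])    # on target, no jitter
slow = delay_suitability([40.0, 40.0, 40.0])      # stable but 8 ms off target
jittery = delay_suitability([20.0, 32.0, 44.0])   # on target on average, noisy
```

Under this assumed form, a session whose delays are constant at 32 ms gets the highest score, and both a shifted average and a larger variance lower it.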

And a synchronization condition stability obtaining module 20, configured to obtain a difference data amount of each work order session at a plurality of sampling times during the last synchronization, and obtain synchronization condition stability of the work order session according to the difference data amount and the time delay suitability.

The size of the payload data in a work order session affects synchronization performance, and hence whether synchronization of the work order session in the distributed storage system is reliable. Therefore, based on the data of the last synchronization of the work order session, the amount of differentially changed data is counted to obtain the differential data volume of the work order session at each sampling moment during the last synchronization. The sampling frequency is again 1 Hz, that is, one sample per second, which yields the differential data volume for each second of the last synchronization.

In the embodiment of the present invention, the differential data volume at each second is cumulative; for example, the differential data volume at the third second is the accumulated value of the differential data of the first, second and third seconds. Based on the differential data volume of the work order session during the last synchronization and the time delay suitability of the work order session between the last synchronization and the next one, the synchronization condition stability of the work order session is obtained. The synchronization condition stability is calculated as follows:

where:

$W$ denotes the synchronization condition stability;

$R$ denotes the range of the differential data volume of the work order session during the last synchronization, that is, the difference between the differential data volume collected in the last second and that collected in the first second of the last synchronization;

$\Delta$ denotes the differential values of the differential data volume of the work order session during the last synchronization, that is, the absolute value of the difference between the differential data volume at each second and that at the next second;

$\max(\cdot)$ denotes the maximum function;

$\max(\Delta)$ denotes selecting the maximum among all the differential values;

$S$ denotes the time delay suitability of the work order session.

If the differential data volume of the work order session accumulates more data over a period of time, the range of the differential data volume is larger and the synchronization condition stability is poorer. If the maximum differential value of the differential data volume is larger, considerable blocking exists in the actual transmission process, so the server may encounter data blocking later, and the synchronization condition stability is poorer. The greater the time delay suitability of the work order session, the better its synchronization condition stability.
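A minimal Python sketch of this step is given below. The range and the maximum adjacent difference follow the description; the final combination, `suitability / (1 + range + max_step)`, is an assumed form chosen only to match the stated correlations, and the name `sync_stability` is hypothetical. The sketch also assumes the cumulative differential data volumes are non-decreasing.

```python
def sync_stability(diff_volumes, suitability):
    """diff_volumes: cumulative differential data volume per second of the
    last synchronization; suitability: time delay suitability of the session."""
    # Range: volume at the last sampling moment minus volume at the first.
    value_range = diff_volumes[-1] - diff_volumes[0]
    # Absolute difference between every two adjacent sampling moments.
    steps = [abs(b - a) for a, b in zip(diff_volumes, diff_volumes[1:])]
    max_step = max(steps, default=0.0)
    # ASSUMPTION: stability falls with (range + max_step) and rises with
    # suitability; the patent's exact expression is not reproduced here.
    return suitability / (1.0 + value_range + max_step)


smooth = sync_stability([0, 10, 20, 30], suitability=0.8)
bursty = sync_stability([0, 5, 90, 100], suitability=0.8)
```

With equal suitability, the session whose volume jumps by 85 units in one second scores lower than the one that grows evenly.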

And the grouping module 30 is configured to obtain a difference distance between any two work order sessions according to the time delay suitability and the synchronization condition stability of each work order session, and divide all the work order sessions into a normal group and an abnormal group based on the difference distance.

In the work order synchronization process, the network environment directly influences whether work order synchronization can be effectively carried out, an excessively poor network environment can directly cause sudden change of the differential data volume of the work order session, even the rollback of the work order session is accompanied, a large amount of differential data becomes invalid due to the rollback, and if the rollback work order session appears in a large amount in a batch synchronization task, severe performance loss of distributed storage can be brought.

Therefore, the work order sessions to be synchronized over multiple synchronization processes are analyzed, a difference distance function is established, and the difference distance between every two work order sessions is determined. The difference distance is calculated as follows:

where:

$D(A,B)$ denotes the difference distance between work order session A and work order session B;

$W_{A}$ denotes the synchronization condition stability of work order session A;

$W_{B}$ denotes the synchronization condition stability of work order session B;

$Y_{A}$ denotes the data set of time delay data corresponding to work order session A;

$Y_{B}$ denotes the data set of time delay data corresponding to work order session B;

$\mathrm{DTW}(\cdot)$ denotes the calculation of the dynamic time warping distance;

$\exp(\cdot)$ denotes the exponential function with the natural constant e as the base.

By analogy, the difference distance between any two work order sessions can be obtained. All the work order sessions can then be grouped according to their difference distances; the embodiment of the invention adopts the classic k-means clustering algorithm and divides all the work order sessions into two groups according to the difference distances between them.
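The pairwise quantity can be sketched as below. The dynamic time warping routine is the textbook dynamic-programming version with absolute difference as the local cost. How the exponential term and the DTW distance are combined is not reproduced in this text, so dividing by `1 + DTW` is an assumption that only preserves the stated correlations; `difference_distance` is a hypothetical name, and the clustering step itself is not shown.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # advance in a only
                                    cost[i][j - 1],      # advance in b only
                                    cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def difference_distance(stab_a, stab_b, delays_a, delays_b):
    # Exponential of the negative squared stability difference.
    closeness = math.exp(-((stab_a - stab_b) ** 2))
    # ASSUMPTION: positive correlation with the exponential term and
    # negative correlation with DTW are realised here as a quotient.
    return closeness / (1.0 + dtw_distance(delays_a, delays_b))


same = difference_distance(0.5, 0.5, [30, 31, 32], [30, 31, 32])
apart = difference_distance(0.9, 0.1, [30, 31, 32], [60, 80, 75])
```

Evaluating `difference_distance` for every pair of sessions yields the matrix that the two-group clustering would consume.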

Further, the two groups are distinguished, the average value of the differential data amount corresponding to all the work order sessions in each group is calculated, the group with the large average value is an abnormal group, and the group with the small average value is a normal group.
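The labelling rule above can be illustrated with a short sketch. It assumes the clustering step has already produced one cluster id per session and that exactly two clusters exist; the function name `label_groups` and the input layout are hypothetical.

```python
from statistics import fmean


def label_groups(cluster_ids, diff_volumes):
    """cluster_ids[i] is the cluster of session i; diff_volumes[i] is the
    list of differential data volumes of session i."""
    means = {}
    for cid in set(cluster_ids):
        # Pool the differential data volumes of every session in the cluster.
        pooled = [v
                  for c, volumes in zip(cluster_ids, diff_volumes) if c == cid
                  for v in volumes]
        means[cid] = fmean(pooled)
    # The cluster with the larger average volume is the abnormal group.
    abnormal_id = max(means, key=means.get)
    return ["abnormal" if c == abnormal_id else "normal" for c in cluster_ids]


labels = label_groups([0, 1, 0], [[1, 2], [100, 200], [3, 4]])
```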

The typical degree acquisition module 40 is used for acquiring the work order semantic descriptor of each work order session and, for the normal group and the abnormal group, acquiring the differential data change trend of any two work order sessions based on the work order semantic descriptors of the work order sessions in the group; and obtaining the typical degree of each work order session according to the sum of all the differential data change trends corresponding to that work order session in the group.

A work order session mainly consists of dialogue between the user and customer service. The plaintext dialogue in the differential data of the last synchronization reflects the semantic features of the work order, so that features accompanied by large data changes can be clearly distinguished, for example words that indicate multimedia, such as screenshot, photograph, recording and voice, or uncommon words, such as firmware, log and dump. Such vocabularies are difficult to construct manually. Combined with the network time delay and the differential data volume, they allow the conditions of the work order sessions to be synchronized to be well distinguished in the subsequent steps.

In the embodiment of the invention, text word frequency statistics are performed on the plaintext dialogue of all the work order sessions in the service. After word segmentation, the bag-of-words model obtains word-based features of the text by counting the frequency of each word, which yields the word frequencies of all words in the text of the enterprise data resource; common words and words that recur throughout the field also need to be eliminated. The method mainly computes the secondary feature value through TF-IDF and removes common words and field words in time. However, because of the big data nature of work order sessions, business experience shows that the vocabulary easily exceeds one million tokens. Considering the sparsity of the text, the hashed features can represent the original features well, so the vocabulary is encoded into a 65536-dimensional feature code by a hashing vectorizer, and this feature code is recorded as the semantic descriptor of the work order session.
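The hashing step can be sketched with the standard library alone. This is a simplified illustration, not the embodiment's pipeline: it uses raw term counts instead of TF-IDF weights, MD5 as an arbitrary stable hash, and a sparse dictionary instead of a dense 65536-element vector. The names `semantic_descriptor` and `N_FEATURES` are hypothetical.

```python
import hashlib
from collections import Counter

N_FEATURES = 65536  # dimensionality of the feature code named in the text


def semantic_descriptor(tokens, stop_words=frozenset()):
    """Map a segmented dialogue to a sparse {index: weight} descriptor."""
    counts = Counter(t for t in tokens if t not in stop_words)
    descriptor = {}
    for token, count in counts.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Hashing trick: fold an unbounded vocabulary into fixed buckets.
        index = int.from_bytes(digest[:4], "big") % N_FEATURES
        # ASSUMPTION: raw counts stand in for the TF-IDF weights here.
        descriptor[index] = descriptor.get(index, 0.0) + count
    return descriptor


desc = semantic_descriptor(["log", "dump", "log", "the"], stop_words={"the"})
```

Because the bucket index depends only on the token, the vocabulary never has to be stored, which is the property that matters when it exceeds a million tokens.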

All the work order sessions are divided into normal and abnormal groups in the grouping module 30, and since the change process of the differential data amount in the work order session is complicated, different work order sessions in each group are analyzed.

Taking the normal group as an example, the differential data change trend between any two work order sessions in the group is obtained. The differential data change trend is calculated as follows:

$$Q_{i,j} = \frac{\exp\left(-\mathrm{MSD}(X_{i}, X_{j})\right)\cdot \mathrm{sim}(d_{i}, d_{j})}{\left|R_{i} - R_{j}\right|}$$

where:

$Q_{i,j}$ denotes the differential data change trend between work order session $i$ and work order session $j$ in the normal group;

$X_{i}$ denotes all the differential data volumes corresponding to work order session $i$;

$X_{j}$ denotes all the differential data volumes corresponding to work order session $j$;

$R_{i}$ denotes the variation range of all the differential data volumes corresponding to work order session $i$;

$R_{j}$ denotes the variation range of all the differential data volumes corresponding to work order session $j$;

$\mathrm{MSD}(\cdot)$ denotes the morphological similarity distance calculation;

$|\cdot|$ denotes the absolute value operation;

$d_{i}$ denotes the semantic descriptor corresponding to work order session $i$;

$d_{j}$ denotes the semantic descriptor corresponding to work order session $j$;

$\mathrm{sim}(\cdot)$ denotes the similarity calculation, for which cosine similarity is adopted in the embodiment of the invention.

The closer the changes in the differential data volumes of the two work order sessions, the larger the value of the corresponding differential data change trend $Q_{i,j}$.
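A minimal sketch of this ratio follows. The morphological similarity distance is not defined in this text, so plain Euclidean distance is swapped in as a stand-in through the `shape_distance` parameter; the small `eps` added to the denominator is also an assumption, included only to avoid division by zero when two sessions have identical ranges. All function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse {index: weight} descriptors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def euclidean(a, b):
    # Stand-in only: NOT the morphological similarity distance itself.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def change_trend(vols_i, vols_j, desc_i, desc_j,
                 shape_distance=euclidean, eps=1e-9):
    range_i = vols_i[-1] - vols_i[0]   # last sample minus first sample
    range_j = vols_j[-1] - vols_j[0]
    product = (math.exp(-shape_distance(vols_i, vols_j))
               * cosine_similarity(desc_i, desc_j))
    # Ratio of the product result to the absolute range difference.
    return product / (abs(range_i - range_j) + eps)


d = {1: 1.0}
near = change_trend([0, 1, 2], [0, 1, 3], d, d)
far = change_trend([0, 1, 2], [0, 5, 20], d, d)
```

Sessions with similar volume curves, similar ranges and similar vocabulary receive a larger trend value than sessions that differ in any of the three.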

By analogy, obtaining the change trend of the difference data between every two work order sessions in the normal group; correspondingly, the differential data change trend between every two work order sessions in the abnormal group is obtained based on the same method of all the work order sessions in the normal group.

Further, the typical degree of each work order session is obtained from the differential data change trends between every two work order sessions in each group. Taking work order session $i$ in the normal group as an example, the typical degree of work order session $i$ is calculated as:

$$P_{i} = \sum_{j \neq i} Q_{i,j}$$

where:

$P_{i}$ denotes the typical degree of work order session $i$;

$Q_{i,j}$ denotes the differential data change trend between work order session $i$ and work order session $j$ in the normal group, where work order session $j$ is any work order session in the normal group other than work order session $i$.

By analogy, the method used to obtain the typical degree of work order session $i$ is applied to obtain the typical degree of every work order session in the normal group; correspondingly, the typical degree of every work order session in the abnormal group is obtained.
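Given the pairwise change trends of one group, the typical degrees can be sketched as row sums that skip the diagonal. Treating the typical degree as exactly this sum is an assumption based on the summary's wording, and the matrix layout and the name `typical_degrees` are hypothetical.

```python
def typical_degrees(trend):
    """trend[i][j] is the change trend between sessions i and j of one group;
    the diagonal entries are ignored."""
    n = len(trend)
    return [sum(trend[i][j] for j in range(n) if j != i) for i in range(n)]


degrees = typical_degrees([[0, 2, 1],
                           [2, 0, 3],
                           [1, 3, 0]])
```

A session that resembles many other sessions in its group accumulates a large sum and is therefore treated as more typical.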

The synchronization strategy determination module 50 is configured to obtain a normal fluctuation interval of the differential data volume according to the typical degree of each work order session in the normal group, and obtain an abnormal fluctuation interval of the differential data volume according to the typical degree of each work order session in the abnormal group; and to obtain the predicted differential data volume of the work order sessions to be synchronized by using an LSTM prediction network, and determine a synchronization strategy based on the predicted differential data volume, the normal fluctuation interval and the abnormal fluctuation interval.

Because the data in the work order session is mainly data generated by human-human interaction and human behaviors are different, the variation of the differential data amount corresponding to each work order session is obviously different from the expected reference fluctuation, and the difference may be different in each work order synchronization process, so that the fluctuation range is determined according to the differential data amount variation of the work order session in multiple normal processes.

Specifically, the first 5 work order sessions with a larger typical degree in the normal group are selected, that is, the typical degrees of all the work order sessions in the normal group are arranged in a descending order, the work order sessions corresponding to the first 5 typical degrees after the descending order are taken as reference samples, the differential data amount corresponding to the reference samples forms a normal fluctuation interval of the differential data amount, the maximum value of the differential data amount in the reference samples is the normal fluctuation upper limit of the differential data amount, and the minimum value of the differential data amount in the reference samples is the normal fluctuation lower limit of the differential data amount; the normal fluctuation interval of the differential data volume of the work order conversation determined by the method can better represent the differential change data of batch synchronization, so that the differential data volume in the batch synchronization process is more consistent with the typical load of distributed storage.

Correspondingly, the 5 work order sessions with the smallest typical degrees in the abnormal group are selected: the typical degrees of all work order sessions in the abnormal group are sorted in ascending order, and the work order sessions corresponding to the first 5 typical degrees are taken as abnormal samples. The differential data volumes of the abnormal samples form the abnormal fluctuation interval of the differential data volume: the maximum differential data volume among the abnormal samples is the abnormal fluctuation upper limit, and the minimum is the abnormal fluctuation lower limit. The upper and lower limits for a work order session that is about to be synchronized are determined from this abnormal fluctuation interval, and when they are exceeded, synchronization is carried out in other modes.
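The selection of reference samples and abnormal samples described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each work order session is reduced to a hypothetical pair (typical degree, differential data volume), and the function name `fluctuation_interval` and the sample values are invented for the example.

```python
from typing import List, Tuple

# Hypothetical record: (typical_degree, differential_data_volume) for one session.
Session = Tuple[float, float]


def fluctuation_interval(
    sessions: List[Session], k: int = 5, most_typical: bool = True
) -> Tuple[float, float]:
    """Return (lower limit, upper limit) of the differential data volume.

    most_typical=True  -> take the k sessions with the largest typical degree
                          (normal group, descending order).
    most_typical=False -> take the k sessions with the smallest typical degree
                          (abnormal group, ascending order).
    """
    if not sessions:
        raise ValueError("the group contains no work order sessions")
    ranked = sorted(sessions, key=lambda s: s[0], reverse=most_typical)
    volumes = [volume for _, volume in ranked[:k]]
    return min(volumes), max(volumes)


# Invented sample data for illustration only.
normal_group = [(0.9, 120.0), (0.8, 100.0), (0.7, 150.0),
                (0.6, 90.0), (0.5, 130.0), (0.1, 900.0)]
abnormal_group = [(0.1, 400.0), (0.2, 650.0), (0.3, 500.0),
                  (0.4, 700.0), (0.5, 450.0), (0.9, 50.0)]

normal_interval = fluctuation_interval(normal_group, most_typical=True)
abnormal_interval = fluctuation_interval(abnormal_group, most_typical=False)
```

In this sketch the least typical normal session (volume 900.0) and the most typical abnormal session (volume 50.0) fall outside the first 5 ranks and therefore do not affect either interval.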

Further, an LSTM prediction network is trained on the differential data volumes of all work order sessions in the normal group so that subsequent differential data volumes can be predicted. The change sequence of the differential data volume of each work order session in the normal group is labeled and input into the LSTM network. Each differential data volume is one sample, and its label is obtained by shifting the sequence 10 detection times into the future; the samples left without a label after the shift are deleted. The shift is a phase shift, so the LSTM prediction network learns to predict the differential data volume of a future work order session. The loss contributed by each sample during training is weighted, and the error used in training is the mean square error. The loss function is as follows:

[loss function formula not reproduced]

The weighting is tied to the typical degree of the work order session corresponding to each sample: the greater the typical degree, the more accurate the corresponding sample is considered to be, so that the network prediction is more accurate.
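The label shift and the weighted mean square error described above can be illustrated with a small sketch. The exact loss formula is not reproduced in the text, so the form used here (squared errors weighted by typical degree and normalized by the sum of the weights) is an assumption, as are the function names `make_shifted_pairs` and `weighted_mse`. The LSTM itself is omitted.

```python
from typing import List, Sequence, Tuple


def make_shifted_pairs(
    series: Sequence[float], horizon: int = 10
) -> List[Tuple[float, float]]:
    """Pair each value with the value `horizon` detection times later.

    The last `horizon` values have no label after the shift and are dropped.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return [(series[t], series[t + horizon])
            for t in range(len(series) - horizon)]


def weighted_mse(
    predictions: Sequence[float],
    targets: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Mean square error with one weight per sample.

    Assumption: the weight is the typical degree of the sample's session and
    the sum is normalized by the total weight.
    """
    if not (len(predictions) == len(targets) == len(weights)) or not weights:
        raise ValueError("inputs must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(w * (p - y) ** 2
                       for p, y, w in zip(predictions, targets, weights))
    return weighted_sum / total_weight


# Invented series: 15 detection times, so 5 labeled samples remain.
pairs = make_shifted_pairs([float(v) for v in range(15)], horizon=10)
loss = weighted_mse([1.0, 2.0], [0.0, 0.0], [1.0, 3.0])
```

With this weighting, an error on a sample from a highly typical session contributes more to the loss than the same error on a less typical one.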

The neural network prediction is used to monitor the differential data volume of the current work order session, so that abnormal fluctuations are found as early as possible and can be compared and judged. In actual use, the predicted differential data volume output by the LSTM prediction network for the next moment is compared with the normal fluctuation interval and the abnormal fluctuation interval obtained above, and the corresponding synchronization strategy is determined.

When the predicted differential data volume is larger than the normal fluctuation upper limit or not larger than the normal fluctuation lower limit, that is, when it is not in the normal fluctuation interval, a single synchronization that is asynchronous with respect to batch synchronization is carried out immediately. Such differential data is common in normal work order sessions: when the differential data is small, immediate synchronization takes very little time; when it is large, immediate synchronization prevents the data from growing further.

When the predicted differential data volume is smaller than the abnormal fluctuation lower limit, synchronization of the work order session is postponed and the session continues to be tracked until its differential data volume is larger than the normal fluctuation lower limit, at which point the work order session is synchronized immediately. This prevents a large number of messages from suddenly flooding the server node because of a network abnormality, which would make the subsequent differential data volume too large and possibly exceed the abnormal fluctuation upper limit.

When the predicted differential data volume is in the abnormal fluctuation interval, the abnormal fluctuation interval is divided into N equal parts, and each sub-interval is assigned a level in ascending order of differential data volume [level symbols not reproduced]. N is a positive integer set by the implementer according to the synchronization policy. The predicted differential data volume is mapped to the level of the sub-interval into which it falls, and a priority is obtained from that level.
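Mapping a predicted value to its level can be sketched as below. The level symbols are not reproduced in the text, so numbering the levels 1 to N and assigning a boundary value to the higher sub-interval are assumptions. The function name `abnormal_level` is hypothetical.

```python
def abnormal_level(predicted: float, lower: float, upper: float, n: int) -> int:
    """Return the level (1..n, assumed numbering) of `predicted`.

    The abnormal fluctuation interval [lower, upper] is split into n equal
    sub-interals; level 1 is the sub-interval with the smallest volumes.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not lower < upper:
        raise ValueError("lower limit must be smaller than upper limit")
    if not lower <= predicted <= upper:
        raise ValueError("predicted value is outside the abnormal interval")
    width = (upper - lower) / n
    index = int((predicted - lower) // width)
    # The upper limit itself belongs to the last sub-interval.
    return min(index, n - 1) + 1


# Invented limits: interval [400, 700] split into N = 3 parts of width 100.
level_low = abnormal_level(400.0, 400.0, 700.0, 3)
level_mid = abnormal_level(550.0, 400.0, 700.0, 3)
level_high = abnormal_level(700.0, 400.0, 700.0, 3)
```

A larger N gives finer priority steps inside the abnormal fluctuation interval, at the cost of more distinct priority classes to schedule.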

As an example, assume that the normal data priority is 90 and that the abnormal data priority is lower than the normal data priority. The priority of a predicted differential data volume in the abnormal fluctuation interval may then be:

[priority formula not reproduced]

The quantities in the formula are the priority; the level of the abnormal fluctuation interval into which the predicted differential data volume falls; and an offset value obtained by manual tuning for such abnormal situations, which is set to 1 in the embodiment of the present invention.

Therefore, when the predicted differential data volume is in the abnormal fluctuation interval, the closer it is to the abnormal fluctuation upper limit, the lower the corresponding priority. Such sessions are synchronized asynchronously at the lowest synchronization priority, which relieves the IO pressure on the other distributed storage nodes. The asynchronous synchronization sequence is a separate queuing mechanism, independent of batch synchronization, that handles requests in exceptional cases in order.

When the predicted differential data volume is larger than the abnormal fluctuation upper limit, asynchronous synchronization is carried out in order from high priority to low priority.

When the predicted differential data volume is in the normal fluctuation interval, batch synchronization is carried out directly according to the priority in the queue, provided that the queue length is not exceeded; if the queue length is exceeded, the work order session waits for the next synchronization. The queue size is determined by the implementer according to the service situation; in the embodiment of the invention it is 1000.
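The queue rule for batch synchronization can be sketched as follows. The sketch assumes that a larger number means a higher priority and that each pending entry is a hypothetical pair (session identifier, priority). The function name `split_batch` is invented, and only the queue size of 1000 comes from the embodiment.

```python
from typing import List, Tuple

QUEUE_SIZE = 1000  # queue size used in the embodiment

# Hypothetical entry: (session_id, priority).
Entry = Tuple[str, int]


def split_batch(
    pending: List[Entry], capacity: int = QUEUE_SIZE
) -> Tuple[List[Entry], List[Entry]]:
    """Split pending sessions into (synchronized now, deferred to next round).

    Entries are ordered from high to low priority (assumption: larger number
    means higher priority); entries beyond the queue capacity are deferred.
    """
    if capacity < 0:
        raise ValueError("capacity must not be negative")
    ordered = sorted(pending, key=lambda entry: entry[1], reverse=True)
    return ordered[:capacity], ordered[capacity:]


# Invented data with a small capacity to show the overflow case.
batch, deferred = split_batch([("a", 90), ("b", 95), ("c", 80)], capacity=2)
```

Because Python's sort is stable, sessions with equal priority keep their arrival order within the batch.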

It should be noted that, if there is an intersection between the normal fluctuation interval and the abnormal fluctuation interval and the size of the predicted differential data amount is in the intersection, the processing is performed based on the criterion that the predicted differential data amount is in the normal fluctuation interval.
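The interval comparisons, including the intersection rule, can be summarized in a short sketch. The order in which the cases are checked is an assumption, as are the function name `classify_prediction` and the returned labels. The normal interval is treated as open at its lower limit because a value not larger than the normal fluctuation lower limit is described as outside it. The immediate single synchronization for values outside the normal interval is left out, since its precedence relative to the abnormal-interval cases is not spelled out.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower limit, upper limit)


def classify_prediction(
    predicted: float, normal: Interval, abnormal: Interval
) -> str:
    """Return the region of `predicted` relative to the two intervals."""
    normal_lower, normal_upper = normal
    abnormal_lower, abnormal_upper = abnormal
    # Checked first, so a value in the intersection counts as normal:
    # batch synchronization.
    if normal_lower < predicted <= normal_upper:
        return "normal"
    # Prioritized asynchronous synchronization by level.
    if abnormal_lower <= predicted <= abnormal_upper:
        return "abnormal"
    # Postpone synchronization and keep tracking the session.
    if predicted < abnormal_lower:
        return "below_abnormal_lower"
    # Asynchronous synchronization from high to low priority.
    return "above_abnormal_upper"


# Invented, overlapping intervals for illustration.
normal_interval = (90.0, 150.0)
abnormal_interval = (120.0, 700.0)
region = classify_prediction(130.0, normal_interval, abnormal_interval)
```

Here 130.0 lies in both intervals and is classified as normal, which matches the intersection rule stated above.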

In summary, the embodiment of the present invention includes a time delay suitability acquiring module 10, a synchronization condition stability acquisition module 20, a grouping module 30, a typical degree obtaining module 40 and a synchronization strategy determining module 50. The time delay suitability acquiring module acquires the time delay data of each work order session before synchronization, forms a data set from all the time delay data, and obtains the time delay suitability of the work order session based on the mean value and the variance of the time delay data in the data set. The synchronization condition stability acquisition module acquires the differential data volume of each work order session at the last synchronization and obtains the synchronization condition stability of the work order session according to the change of the differential data volume and the time delay suitability. The grouping module obtains the difference distance according to the time delay suitability and the synchronization condition stability of each work order session, divides all the work order sessions into two groups based on the difference distance by using a k-means clustering algorithm, and labels the two groups as a normal group and an abnormal group according to the differential data volume of the work order sessions in each group. The typical degree obtaining module calculates the typical degree of each work order session in a group; the typical degree is obtained from the semantic descriptor of the work order session and the differential data change trend between any two work order sessions. The synchronization strategy determining module then obtains the normal fluctuation interval and the abnormal fluctuation interval of the differential data volume by combining the typical degree of each work order session, trains the LSTM prediction network on the differential data of the work order sessions in the normal group to obtain the predicted differential data volume of a work order session that is about to be synchronized, and determines the synchronization strategy of the work order session according to the relationship between the predicted differential data volume and the normal and abnormal fluctuation intervals. In this way, the accumulation of IO requests caused by network abnormality and other factors is avoided in time, and the service can operate quickly and stably.

It should be noted that: the precedence order of the above embodiments of the present invention is only for description, and does not represent the merits of the embodiments. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that are within the spirit of the present invention are intended to be included therein.

Claims

1. A big data service management system based on distributed storage is characterized by comprising the following modules:

the time delay suitability acquisition module is used for acquiring time delay data of each work order session in a time period from the last synchronization to the next synchronization, forming all the time delay data into a data set and acquiring the time delay suitability of the work order session based on the data set;

the typical degree obtaining module is used for obtaining the work order semantic descriptor of each work order session and, for the normal group and the abnormal group respectively, obtaining the differential data change trend of any two work order sessions based on the work order semantic descriptors of the work order sessions in the group; and obtaining the typical degree of each work order session according to the sum of all the differential data change trends corresponding to that work order session in the group;

the synchronization strategy determining module is used for acquiring a normal fluctuation interval of the differential data volume according to the typical degree of each work order session in the normal group and acquiring an abnormal fluctuation interval of the differential data volume according to the typical degree of each work order session in the abnormal group; and acquiring, by using an LSTM prediction network, the predicted differential data volume of a work order session that is about to be synchronized, and determining a synchronization strategy based on the predicted differential data volume, the normal fluctuation interval and the abnormal fluctuation interval.

2. The big data service management system based on distributed storage according to claim 1, wherein the method for obtaining the time delay suitability of the work order session based on the data set in the time delay suitability obtaining module comprises;

3. The big data service management system based on distributed storage according to claim 1, wherein the method for acquiring the synchronization condition stability of the work order session according to the differential data amount and the delay suitability in the synchronization condition stability acquisition module comprises:

4. The big data service management system based on distributed storage according to claim 1, wherein the method for obtaining the difference distance between any two work order sessions in the grouping module according to the time delay suitability and the synchronization condition stability of each work order session comprises:

5. The big data service management system based on distributed storage according to claim 3, wherein the method for acquiring the differential data change trend of any two work order sessions based on the work order semantic descriptor of each work order session in a group in the typical degree obtaining module comprises:

obtaining the similarity between the semantic descriptors corresponding to the two work order sessions;

6. The big data service management system based on distributed storage according to claim 1, wherein the method for obtaining the normal fluctuation interval of the differential data volume according to the typical degree of each work order session in the normal group in the synchronization policy determination module comprises:

sorting the typical degrees of all the work order sessions in the normal group in descending order, taking the work order sessions corresponding to the first 5 typical degrees after sorting as reference samples, and forming the normal fluctuation interval of the differential data volume from the differential data volumes corresponding to the reference samples.

7. The big data service management system based on distributed storage according to claim 1, wherein the method for obtaining the abnormal fluctuation interval of the differential data volume according to the typical degree of each work order session in the abnormal group in the synchronization policy determination module comprises:

8. The big data service management system based on distributed storage according to claim 1, wherein the training data of the LSTM prediction network in the synchronization strategy determining module is the differential data volume corresponding to the work order sessions in the normal group.