CN115330422A - Big data service management system based on distributed storage - Google Patents

Big data service management system based on distributed storage

Publication number
CN115330422A
Authority
CN
China
Prior art keywords
work order
differential data
acquiring
synchronization
session
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211256997.8A
Other languages
Chinese (zh)
Other versions
CN115330422B (en)
Inventor
陈炯彬
戚升权
王世存
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Xinhuan Network Technology Co ltd
Original Assignee
Ningbo Xinhuan Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Xinhuan Network Technology Co ltd
Priority to CN202211256997.8A
Publication of CN115330422A
Application granted
Publication of CN115330422B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/01 Customer relationship services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of electric digital data processing, and in particular to a big data service management system based on distributed storage. The system comprises a time delay suitability acquisition module, a synchronization condition stability acquisition module, a grouping module, a typical degree acquisition module, and a synchronization strategy determination module. The time delay suitability acquisition module acquires the time delay data corresponding to each work order session and derives the time delay suitability of each work order session from it. The synchronization condition stability acquisition module acquires the differential data volume of each work order session and combines it with the time delay suitability to obtain the synchronization condition stability. The grouping module divides all work order sessions into a normal group and an abnormal group. The typical degree acquisition module acquires the typical degree of each work order session within its group. The synchronization strategy determination module acquires the normal and abnormal fluctuation intervals of the differential data volume and determines the synchronization strategy by combining them with the predicted differential data volume, so that the service can run quickly and stably.

Description

Big data service management system based on distributed storage
Technical Field
The invention relates to the technical field of electric digital data processing, in particular to a big data service management system based on distributed storage.
Background
The business management system is mainly a customer relationship management (CRM) system. Existing CRM systems support real-time customer conversations. For a work order, the distributed storage system must not only hold a large number of work order attachments, but also respond in real time and synchronize to other nodes as soon as possible, so that the CRM runs consistently across locations. The timing of distributed storage synchronization affects whether the data state of the CRM is synchronized correctly. Normally the work order session data of the CRM only grows, but data rollbacks also occur; the disk-write delay of rolled-back data can be large, and a single synchronization can bring many IO requests.
For a distributed storage system, data consistency is generally addressed through arbitration. When the synchronization of a large number of work order sessions produces many IO requests, write operations are difficult to commit efficiently to the database of the distributed storage, and the service delay can become large, which leads to user misunderstanding or exceptions in the service logic. A synchronization strategy for work order sessions in distributed storage therefore needs to be determined, so that the service can run quickly and stably.
Disclosure of Invention
In order to solve the problems of large service delay and low service operation efficiency in distributed storage, the invention aims to provide a big data service management system based on distributed storage, which comprises the following modules:
the time delay suitability acquiring module is used for acquiring time delay data of each work order session in a time period from the last synchronization to the next synchronization, forming all the time delay data into a data set, and acquiring the time delay suitability of the work order session based on the data set;
the synchronization condition stability acquisition module is used for acquiring the differential data volume of each work order session at a plurality of sampling moments in the last synchronization and acquiring the synchronization condition stability of the work order session according to the differential data volume and the time delay suitability;
the grouping module is used for acquiring the difference distance between any two work order sessions according to the time delay suitability and the synchronization condition stability of each work order session, and dividing all the work order sessions into a normal group and an abnormal group based on the difference distance;
the typical degree acquisition module is used for acquiring the work order semantic descriptor of each work order session and, for the normal group and the abnormal group respectively, acquiring the differential data change trend of any two work order sessions based on the work order semantic descriptors of the work order sessions in the group; and obtaining the typical degree of each work order session according to the sum of all the differential data change trends corresponding to that work order session in the group;
the synchronization strategy determination module is used for acquiring a normal fluctuation interval of the differential data volume according to the typical degree of each work order session in the normal group and acquiring an abnormal fluctuation interval of the differential data volume according to the typical degree of each work order session in the abnormal group; and acquiring the predicted differential data volume of the work order sessions to be synchronized by using an LSTM prediction network, and determining a synchronization strategy based on the predicted differential data volume, the normal fluctuation interval and the abnormal fluctuation interval.
Preferably, the method for obtaining the time delay suitability of the work order session based on the data set in the time delay suitability obtaining module includes:
acquiring the average value and the variance of all delay data in the data set, calculating the difference between the average value and a preset proper delay size, and acquiring the delay suitability based on the difference and the variance;
the time delay suitability is in a negative correlation relation with the difference value, and the time delay suitability is in a negative correlation relation with the variance.
Preferably, the method for obtaining the synchronization condition stability of the work order session according to the differential data volume and the time delay suitability in the synchronization condition stability obtaining module includes:
acquiring the variation range of all the differential data volumes corresponding to the work order session, wherein the variation range refers to the difference value between the differential data volume at the last sampling moment and the differential data volume at the first sampling moment;
taking the absolute value of the difference between the differential data volumes at every two adjacent sampling moments of the work order session as a differential value, selecting the maximum of all the differential values, and calculating the sum of that maximum and the variation range;
and acquiring the stability of the synchronization condition of the work order session according to the summation result and the time delay suitability, wherein the stability of the synchronization condition and the summation result are in a negative correlation relationship, and the stability of the synchronization condition and the time delay suitability are in a positive correlation relationship.
Preferably, the method for obtaining the difference distance between any two work order sessions in the grouping module according to the time delay suitability and the synchronization condition stability of each work order session includes:
acquiring the square of the difference between the synchronization condition stabilities of any two work order sessions, and calculating the dynamic time warping distance corresponding to the two work order sessions;
taking the negative number of the square result as a power exponent to obtain an exponential function; obtaining the difference distance according to the exponential function and the dynamic time warping distance;
the difference distance and the exponential function are in positive correlation, and the difference distance and the dynamic time warping distance are in negative correlation.
Preferably, the method for acquiring the differential data change trend of any two work order sessions based on the work order semantic descriptor of each work order session in the group in the typical degree acquisition module includes:
calculating the morphological similarity distance between the differential data volumes corresponding to every two work order sessions in the group; taking the absolute value of the difference between the variation ranges of the two work order sessions as a difference value;
acquiring the similarity between the semantic descriptors corresponding to the two work order sessions;
and constructing an exponential function by taking the negative of the morphological similarity distance as the exponent, and multiplying the exponential function by the similarity to obtain a product result, wherein the ratio of the product result to the difference value is the differential data change trend of the two work order sessions.
Preferably, the method for acquiring the normal fluctuation interval of the differential data volume according to the typical degree of each work order session in the normal group in the synchronization policy determination module includes:
arranging the typical degrees of all the work order sessions in the normal group in descending order, taking the work order sessions corresponding to the first 5 typical degrees as reference samples, and forming the normal fluctuation interval of the differential data volume from the differential data volumes corresponding to the reference samples.
Preferably, the method for acquiring the abnormal fluctuation interval of the differential data volume according to the typical degree of each work order session in the abnormal group in the synchronization policy determination module includes:
arranging the typical degrees of all the work order sessions in the abnormal group in ascending order, taking the work order sessions corresponding to the first 5 typical degrees as abnormal samples, and forming the abnormal fluctuation interval of the differential data volume from the differential data volumes corresponding to the abnormal samples.
Preferably, the training data of the LSTM prediction network in the synchronization policy determination module is a differential data volume corresponding to the work order session in the normal group.
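The sample selection described for the synchronization strategy determination module can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `pick_samples` and `fluctuation_interval`, the dictionary layout, and the choice to express each interval as the (minimum, maximum) of the selected samples' differential data volumes are all assumptions, since the text only says the interval is formed from those volumes.

```python
def pick_samples(typical_degree, volumes, count=5, descending=True):
    """Return the volume sequences of the `count` sessions ranked first.

    typical_degree: dict mapping session id -> typical degree (hypothetical layout)
    volumes:        dict mapping session id -> list of differential data volumes
    descending=True  selects reference samples for the normal group,
    descending=False selects abnormal samples for the abnormal group.
    """
    order = sorted(typical_degree, key=typical_degree.get, reverse=descending)
    return [volumes[session_id] for session_id in order[:count]]


def fluctuation_interval(samples):
    """Assumed interval form: (min, max) over all volumes of the samples."""
    values = [value for sequence in samples for value in sequence]
    return min(values), max(values)
```

With these helpers, the normal fluctuation interval would be `fluctuation_interval(pick_samples(degrees, volumes))` for the normal group, and the abnormal one uses `descending=False` on the abnormal group.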
The invention has the following beneficial effects. In the embodiment of the invention, the time delay data of each work order session in the time period from the last synchronization to the next synchronization is acquired and analyzed to obtain the time delay suitability of each work order session. The differential data volume of each work order session during the last synchronization is then obtained, and the synchronization condition stability is derived from the differential data volume and the time delay suitability, which makes the subsequent determination of the synchronization strategy more reliable. The difference distance between any two work order sessions is obtained by combining the time delay suitability and the synchronization condition stability of each work order session, and all the work order sessions are divided into a normal group and an abnormal group based on the difference distance. The work order semantic descriptor of each work order session is acquired, and the differential data change trend between every two work order sessions in each group is obtained, which yields the typical degree of each work order session in the group. A normal fluctuation interval and an abnormal fluctuation interval are acquired based on the typical degree. A predicted differential data volume is obtained from an LSTM prediction network, and the synchronization strategy is determined from the predicted differential data volume, the normal fluctuation interval and the abnormal fluctuation interval. The problem of low efficiency caused by data blocking in the distributed storage service is thereby solved, and the service can run quickly and stably.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a block diagram of a big data service management system based on distributed storage according to an embodiment of the present invention.
Detailed Description
To further illustrate the technical means and effects of the present invention for achieving the predetermined objects, the following detailed description, structure, features and effects of a big data service management system based on distributed storage according to the present invention will be provided with reference to the accompanying drawings and preferred embodiments. In the following description, the different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The method is suitable for determining the work order session synchronization strategy in the distributed storage service; the following describes a specific scheme of a big data service management system based on distributed storage in detail with reference to the accompanying drawings.
Referring to fig. 1, a block diagram of a big data service management system based on distributed storage according to an embodiment of the present invention is shown, where the system includes the following modules:
the delay suitability acquiring module 10 is configured to acquire delay data of each work order session in a time period from the last synchronization to the next synchronization, form all the delay data into a data set, and acquire the delay suitability of the work order session based on the data set.
The distributed storage areas are in a backbone network, and the distributed storage nodes are mutually synchronized through the backbone network, wherein one-time synchronization means that a plurality of work order states are sent to a remote distributed storage system.
Synchronization in the embodiment of the invention refers to the action of one server node sending a synchronization request to other server nodes. A single synchronization carries the state of one work order session, and if the work order session changes, it is synchronized at a certain time interval. Batch synchronization means that one server node compresses and packs the data of the work order sessions that meet the conditions and distributes it to the other storage nodes, which improves throughput and shortens the synchronization time.
Because the state of a work order session may change at any time, and requests may be blocked when the network environment is poor, the terminal in the embodiment of the invention must report its time delay to the CDN server when submitting a work order session. When the time delay is relatively large, two requests can easily reach a node in succession, so that synchronization must continue after a batch synchronization has finished. The time delay therefore needs to be measured to determine how suitable it is for the work order session currently being submitted.
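As a rough illustration of the "compress and pack" step of batch synchronization, the sketch below serializes a set of work order session states and compresses them into one payload. The JSON encoding, the zlib compression, the dictionary layout and the function names `pack_batch` and `unpack_batch` are assumptions made for the example; the text does not specify a wire format.

```python
import json
import zlib


def pack_batch(session_states):
    """Serialize and compress the states of the sessions selected for a batch.

    session_states: dict mapping session id -> state dict (hypothetical layout).
    """
    payload = json.dumps(session_states, sort_keys=True).encode("utf-8")
    return zlib.compress(payload)


def unpack_batch(blob):
    """Inverse of pack_batch, as a receiving storage node might apply it."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Packing many sessions into one payload trades a little latency for fewer, larger requests, which is the throughput benefit the paragraph above describes.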
The time delay data of each work order session in the time period after the last synchronization and before the next synchronization are acquired, the acquisition frequency is set to be 1Hz, namely, the acquisition is performed once per second, and the time delay data of the work order session in the time period are sequentially acquired to obtain a data set.
The average value of the time delay data in each data set is obtained, the corresponding variance is then computed from that average value, and the time delay suitability of the work order session corresponding to the data set is calculated from the average value and the variance. The calculation formula is as follows:
[Formula image not reproduced in the source text]
wherein the quantities in the formula are:
the time delay suitability of the work order session;
the average value of the time delay data in the data set;
the variance of the time delay data in the data set;
the suitable time delay size, which is set by the implementer;
the relaxation correction coefficient, which is 0.2 in the embodiment of the invention;
the exponential function;
the absolute value operation.
Preferably, in the embodiment of the present invention, the suitable time delay size is set to 32 ms.
When the average value of the time delay data in the data set is closer to the suitable time delay size, the time delay suitability of the work order session is larger; likewise, when the time delay data in the data set fluctuates less, the data is more stable and the time delay suitability of the work order session is larger.
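The exact formula is an image that did not survive in this text, so the sketch below uses one assumed functional form that satisfies the stated properties: the suitability falls as the mean delay moves away from the suitable delay size and as the variance grows. The form `exp(-k * (|mean - target| + variance))`, the use of the population variance, and the function name `delay_suitability` are assumptions; only the 0.2 coefficient and the 32 ms target come from the text.

```python
import math
from statistics import fmean, pvariance


def delay_suitability(delays_ms, suitable_delay_ms=32.0, relax=0.2):
    """Assumed form: exp(-relax * (|mean - suitable| + variance)).

    delays_ms: delay samples (ms) collected at 1 Hz between two synchronizations.
    The result lies in (0, 1]; 1.0 means a steady delay exactly at the target.
    """
    mean = fmean(delays_ms)
    variance = pvariance(delays_ms, mu=mean)  # population variance of the data set
    return math.exp(-relax * (abs(mean - suitable_delay_ms) + variance))
```

A session whose delay stays near 32 ms scores close to 1, while a jittery or slow session scores close to 0.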
And a synchronization condition stability obtaining module 20, configured to obtain a difference data amount of each work order session at a plurality of sampling times during the last synchronization, and obtain synchronization condition stability of the work order session according to the difference data amount and the time delay suitability.
The size of the load data in the work order session affects the synchronization performance, and further affects whether the synchronization of the work order session in the distributed storage system is reliable or not; therefore, based on the last synchronous data in the work order session, the data quantity statistics of data differential change is carried out, the differential data quantity of each sampling moment of the work order session at the last synchronization is obtained, the sampling frequency is also 1Hz, namely, the data is collected once per second, and the differential data quantity of each second of the work order session at the last synchronization is further obtained.
In the embodiment of the present invention, the differential data volume at each second is an accumulated value; for example, the differential data volume at the third second is the accumulated value of the differential data of the first, second, and third seconds. Based on the differential data volume of the work order session during the last synchronization and the time delay suitability of the work order session after the last synchronization and before the next synchronization, the synchronization condition stability of the work order session is obtained. The calculation formula is as follows:
[Formula image not reproduced in the source text]
wherein the quantities in the formula are:
the synchronization condition stability;
the variation range of the differential data volume of the work order session during the last synchronization, namely the difference between the differential data volume acquired in the last second and that acquired in the first second of the last synchronization;
the differential value of the differential data volume of the work order session during the last synchronization, namely the absolute value of the difference between the differential data volume at each second and that at the next second;
the maximum function, which selects the maximum among all the differential values;
the time delay suitability of the work order session.
If the differential data volume of a work order session accumulates a lot of data within a period of time, its variation range is large and the synchronization condition stability is poor. If the maximum differential value of the differential data volume is large, considerable blocking occurred during actual transmission, so the server may encounter data blocking later, and the synchronization condition stability is likewise poor. The larger the time delay suitability of the work order session, the better its synchronization condition stability.
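The relationships just described can be sketched in code. Because the formula image is missing, the ratio `suitability / (1 + range + max_step)` used here is an assumed form chosen only to respect the stated correlations (negative with the sum of the variation range and the maximum differential value, positive with the time delay suitability); the added 1 in the denominator and the function name are also assumptions.

```python
def sync_condition_stability(volumes, suitability):
    """Assumed form: suitability / (1 + variation_range + max_adjacent_step).

    volumes: cumulative differential data volumes sampled once per second
             during the last synchronization (assumed non-decreasing overall).
    suitability: time delay suitability of the same work order session.
    """
    if len(volumes) < 2:
        raise ValueError("at least two sampling moments are required")
    variation_range = volumes[-1] - volumes[0]  # last second minus first second
    if variation_range < 0:
        raise ValueError("cumulative volumes are expected not to shrink overall")
    max_step = max(abs(later - earlier) for earlier, later in zip(volumes, volumes[1:]))
    return suitability / (1.0 + variation_range + max_step)
```

A session that grows by small, even steps keeps a high stability, whereas one large burst lowers it even if the total growth is similar.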
And the grouping module 30 is configured to obtain a difference distance between any two work order sessions according to the time delay suitability and the synchronization condition stability of each work order session, and divide all the work order sessions into a normal group and an abnormal group based on the difference distance.
In the work order synchronization process, the network environment directly influences whether work order synchronization can be effectively carried out, an excessively poor network environment can directly cause sudden change of the differential data volume of the work order session, even the rollback of the work order session is accompanied, a large amount of differential data becomes invalid due to the rollback, and if the rollback work order session appears in a large amount in a batch synchronization task, severe performance loss of distributed storage can be brought.
Therefore, the work order sessions to be synchronized over multiple synchronizations are analyzed, and a difference distance function is established to determine the difference distance between every two work order sessions. The calculation formula is as follows:
[Formula image not reproduced in the source text]
wherein the quantities in the formula are:
the difference distance between work order session A and work order session B;
the synchronization condition stability of work order session A;
the synchronization condition stability of work order session B;
the data set of time delay data corresponding to work order session A;
the data set of time delay data corresponding to work order session B;
the dynamic time warping distance calculation;
the exponential function with the natural constant e as its base.
By analogy, the difference distance between any two work order sessions can be obtained. All the work order sessions can then be grouped according to their difference distances; the embodiment of the invention adopts the classic k-means clustering algorithm and divides all the work order sessions into two groups according to the difference distances between them.
Further, the two groups are distinguished, the average value of the differential data amount corresponding to all the work order sessions in each group is calculated, the group with the large average value is an abnormal group, and the group with the small average value is a normal group.
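The pairwise measure and the naming of the two groups can be sketched as below. The dynamic time warping routine is the textbook dynamic-programming version with an absolute-difference step cost. The combination `exp(-(stability difference)^2) / DTW` is an assumed form that only respects the stated correlations (positive with the exponential term, negative with the DTW distance); the small `eps` guard and all function names are hypothetical. The k-means step itself is omitted.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def difference_distance(stab_a, stab_b, delays_a, delays_b, eps=1e-9):
    """Assumed form: exp(-(stab_a - stab_b)^2) / (DTW(delays_a, delays_b) + eps)."""
    kernel = math.exp(-((stab_a - stab_b) ** 2))
    return kernel / (dtw_distance(delays_a, delays_b) + eps)


def label_groups(group_one, group_two):
    """Name the two clusters: the larger mean differential volume is abnormal.

    Each group is a list of per-session differential data volume sequences.
    """
    def mean_volume(group):
        values = [value for session in group for value in session]
        return sum(values) / len(values)

    if mean_volume(group_one) <= mean_volume(group_two):
        return {"normal": group_one, "abnormal": group_two}
    return {"normal": group_two, "abnormal": group_one}
```

Note that, under the stated correlations, this quantity is largest for sessions that behave alike, so it acts more like an affinity than a metric distance.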
The typical degree acquisition module 40 is used for acquiring the work order semantic descriptor of each work order session and, for the normal group and the abnormal group respectively, acquiring the differential data change trend of any two work order sessions based on the work order semantic descriptors of the work order sessions in the group; and obtaining the typical degree of each work order session according to the sum of all the differential data change trends corresponding to that work order session in the group.
Because a work order session consists mainly of dialogue between the user and customer service, the plaintext dialogue in the differential data of the last synchronization reflects the semantic features of the work order, so that features that accompany large data changes can be clearly distinguished. Examples are words denoting multimedia, such as screenshot, photo, recording and voice message, and non-specific words such as firmware, log and dump. Such a vocabulary is difficult to construct by hand, so these features are combined with the network time delay and the differential data volume to better distinguish the condition of the work order sessions to be synchronized.
In the embodiment of the invention, a bag-of-words model is used to count word frequencies, which yields the frequency of every word in the text of the enterprise data resources; common words and words that recur throughout the field also need to be removed. Text word frequency statistics are performed on the plaintext dialogues of all work order sessions in the service. After word segmentation, the bag-of-words model obtains word-based features of the text by counting how often each word occurs. The feature values are mainly computed through TF-IDF, with common words and field-specific words removed. However, because work order sessions are big data, business experience shows that the vocabulary easily exceeds a million word labels. Given the sparsity of the text, the hashed features represent the pre-hash features well, so the vocabulary is encoded into 65536-dimensional feature codes through a hashing vectorizer, and these feature codes are recorded as the semantic descriptors of the work order sessions.
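The hashing step can be illustrated with a small standard-library sketch of feature hashing into 65536 buckets, together with the cosine similarity later used to compare descriptors. This is a simplification under stated assumptions: it hashes raw term counts and omits the TF-IDF weighting, it uses SHA-256 as the hash, and the function names and the sparse-dictionary representation are hypothetical rather than taken from the text.

```python
import hashlib
import math
from collections import Counter

DIMENSIONS = 65536  # descriptor size stated in the text


def semantic_descriptor(tokens, stop_words=frozenset()):
    """Hash segmented words into a sparse 65536-dimensional count vector.

    tokens: words of the plaintext dialogue after word segmentation.
    stop_words: common and field-specific words to drop (assumed to be given).
    Returns a dict mapping bucket index -> count.
    """
    counts = Counter()
    for token in tokens:
        if token in stop_words:
            continue
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        counts[int.from_bytes(digest[:4], "big") % DIMENSIONS] += 1
    return dict(counts)


def cosine_similarity(u, v):
    """Cosine similarity between two sparse descriptors (0.0 if either is empty)."""
    dot = sum(weight * v.get(index, 0) for index, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Hashing keeps the descriptor size fixed no matter how large the vocabulary grows, at the cost of occasional bucket collisions.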
All the work order sessions are divided into a normal group and an abnormal group by the grouping module 30; because the change process of the differential data volume in a work order session is complicated, the different work order sessions within each group are analyzed.
Taking the normal group as an example, the differential data change trend between any two work order sessions in the group is calculated as follows:

$$Q_{a,b} = \frac{\exp\left(-\mathrm{MSD}(D_a, D_b)\right) \cdot \cos(S_a, S_b)}{\left| R_a - R_b \right|}$$

where $Q_{a,b}$ denotes the differential data change trend between work order session $a$ and work order session $b$ in the normal group; $D_a$ denotes all the differential data volumes corresponding to work order session $a$; $D_b$ denotes all the differential data volumes corresponding to work order session $b$; $R_a$ denotes the variation range of all the differential data volumes corresponding to work order session $a$; $R_b$ denotes the variation range of all the differential data volumes corresponding to work order session $b$; $\mathrm{MSD}(\cdot,\cdot)$ denotes the morphological similarity distance calculation; $|\cdot|$ denotes the absolute value operation; $S_a$ denotes the semantic descriptor corresponding to work order session $a$; $S_b$ denotes the semantic descriptor corresponding to work order session $b$; and $\cos(\cdot,\cdot)$ denotes the similarity calculation, for which cosine similarity is adopted in the embodiment of the invention.
The closer the data changes of the differential data volumes corresponding to two work order sessions are, the more alike the two sessions are, and the larger the value of the corresponding differential data change trend.
By analogy, the differential data change trend between every two work order sessions in the normal group is obtained; correspondingly, the differential data change trend between every two work order sessions in the abnormal group is obtained by the same method used for the normal group.
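The pairwise trend can be sketched as follows, based on the structure given in claim 5 (an exponential of the negative morphological similarity distance, multiplied by the descriptor similarity, divided by the absolute difference of the variation ranges). Several points are assumptions rather than statements of the embodiment: the morphological similarity distance is taken as Euclidean distance scaled by `2 - ASD/SAD`, the variation range follows claim 3 (last sample minus first sample), the two series are assumed to have equal length, and the small `eps` in the denominator is added only to avoid division by zero.

```python
import math


def morphological_similarity_distance(x, y):
    # Assumed definition: MSD = ED * (2 - ASD / SAD), where ED is the Euclidean
    # distance, SAD the sum of absolute differences, ASD the absolute sum of differences.
    diffs = [a - b for a, b in zip(x, y)]
    sad = sum(abs(d) for d in diffs)
    if sad == 0:
        return 0.0  # identical series
    ed = math.sqrt(sum(d * d for d in diffs))
    asd = abs(sum(diffs))
    return ed * (2 - asd / sad)


def cosine_similarity(u, v):
    # u and v are sparse vectors stored as {index: value} dicts.
    dot = sum(val * v.get(idx, 0) for idx, val in u.items())
    norm_u = math.sqrt(sum(val * val for val in u.values()))
    norm_v = math.sqrt(sum(val * val for val in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def variation_range(series):
    # Per claim 3: last sampled differential data volume minus the first.
    return series[-1] - series[0]


def differential_trend(series_a, desc_a, series_b, desc_b, eps=1e-6):
    msd = morphological_similarity_distance(series_a, series_b)
    similarity = cosine_similarity(desc_a, desc_b)
    range_gap = abs(variation_range(series_a) - variation_range(series_b))
    return math.exp(-msd) * similarity / (range_gap + eps)
```

Two sessions with similar volume curves, similar vocabulary, and similar variation ranges produce a large trend value; any one of the three factors diverging pulls the value down.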
Further, the typical degree of each work order session is obtained from the differential data change trends between every two work order sessions in each group. Taking work order session $a$ in the normal group as an example, its typical degree is calculated as:

$$T_a = \sum_{k \neq a} Q_{a,k}$$

where $T_a$ denotes the typical degree of work order session $a$, and $Q_{a,k}$ denotes the differential data change trend between work order session $a$ and work order session $k$ in the normal group, work order session $k$ being any work order session in the normal group other than work order session $a$.
Similarly, the typical degree of every work order session in the normal group is obtained by the method described above for a single work order session; correspondingly, the typical degree of each work order session in the abnormal group is obtained.
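As a small illustration of the summation, the sketch below assumes the pairwise trends of one group have already been computed and stored in a square matrix; the function name and the matrix layout are hypothetical.

```python
def typical_degrees(trend):
    """Sum each session's trends against every other session in the same group.

    trend[i][j] holds the differential data change trend between sessions i and j;
    diagonal entries are ignored.
    """
    n = len(trend)
    return [sum(trend[i][j] for j in range(n) if j != i) for i in range(n)]


# Three sessions: session 1 is the most similar to the rest of its group.
example = [
    [0.0, 2.0, 1.0],
    [2.0, 0.0, 3.0],
    [1.0, 3.0, 0.0],
]
degrees = typical_degrees(example)
```

A session that resembles many others in its group accumulates a high typical degree, which is why the highest-ranked sessions are later used as reference samples.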
The synchronization strategy determining module 50 is configured to obtain a normal fluctuation interval of the differential data volume according to the typical degree of each work order session in the normal group, and obtain an abnormal fluctuation interval of the differential data volume according to the typical degree of each work order session in the abnormal group; and obtaining the predicted differential data volume of the work order session to-be-synchronized process by using an LSTM prediction network, and determining a synchronization strategy based on the predicted differential data volume, the normal fluctuation interval and the abnormal fluctuation interval.
Because the data in a work order session is mainly generated by human-to-human interaction, and human behavior varies, the change in the differential data volume of each work order session deviates noticeably from the expected reference fluctuation, and the deviation may differ in each work order synchronization process. The fluctuation range is therefore determined from the differential data volume changes of work order sessions over multiple normal processes.
Specifically, the 5 work order sessions with the largest typical degree in the normal group are selected: the typical degrees of all the work order sessions in the normal group are arranged in descending order, and the work order sessions corresponding to the first 5 typical degrees are taken as reference samples. The differential data volumes of the reference samples form the normal fluctuation interval of the differential data volume; the maximum differential data volume among the reference samples is the normal fluctuation upper limit, and the minimum is the normal fluctuation lower limit. A normal fluctuation interval determined in this way better represents the differential change data of batch synchronization, so that the differential data volume in the batch synchronization process is more consistent with the typical load of the distributed storage.
Correspondingly, the 5 work order sessions with the smallest typical degree in the abnormal group are selected: the typical degrees of all the work order sessions in the abnormal group are arranged in ascending order, and the work order sessions corresponding to the first 5 typical degrees are taken as abnormal samples. The differential data volumes of the abnormal samples form the abnormal fluctuation interval of the differential data volume; the maximum differential data volume among the abnormal samples is the abnormal fluctuation upper limit, and the minimum is the abnormal fluctuation lower limit. The upper and lower limits for the to-be-synchronized process of a work order session are determined from this abnormal fluctuation interval, and when these limits are exceeded, synchronization is performed in other modes.
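The selection of reference samples and abnormal samples can be sketched as below. The data layout (a list of `(typical_degree, volumes)` pairs) and the function name are assumptions for illustration; the choice of 5 samples follows the text.

```python
def fluctuation_interval(sessions, k=5, most_typical=True):
    """Return (lower, upper) limits of the differential data volume.

    sessions: list of (typical_degree, [differential data volumes]) pairs.
    most_typical=True ranks in descending order (normal group);
    most_typical=False ranks in ascending order (abnormal group).
    """
    ranked = sorted(sessions, key=lambda s: s[0], reverse=most_typical)
    samples = ranked[:k]
    volumes = [v for _, series in samples for v in series]
    return min(volumes), max(volumes)


group = [
    (0.9, [10, 12]), (0.8, [11, 15]), (0.1, [100, 200]), (0.7, [9, 13]),
    (0.6, [10, 14]), (0.5, [8, 12]), (0.2, [50, 60]),
]
normal_interval = fluctuation_interval(group, k=5, most_typical=True)
least_typical_interval = fluctuation_interval(group, k=2, most_typical=False)
```

In the embodiment the two intervals come from two different groups; a single list is reused here only to keep the example short.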
Further, an LSTM prediction network is trained on the differential data volumes of all work order sessions in the normal group in order to predict subsequent differential data volumes. The change sequences of the differential data volume of the work order sessions in the normal group are labeled and input into the LSTM network, each differential data volume being one sample; the labels are obtained by shifting the samples 10 detection times into the future, and the segment of samples left without a label after the shift is deleted. Because the shift is a phase shift, the LSTM prediction network can predict the differential data volume of a future work order session. During training, the loss generated by each sample is weighted; the error is the mean square error, and the loss function is a weighted mean square error of the form:

$$L = \frac{1}{N} \sum_{i=1}^{N} T_i \left( y_i - \hat{y}_i \right)^2$$

where $T_i$ denotes the typical degree of the work order session corresponding to the $i$-th sample, $y_i$ and $\hat{y}_i$ denote the label and the network prediction for that sample, and $N$ is the number of samples. The greater the typical degree of the work order session corresponding to a sample, the more reliable that sample is, so the weighting makes the network prediction more accurate.
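The sample construction and the weighting of the loss can be illustrated without a deep learning framework. The sketch below does not train an LSTM; it only shows how labels shifted by 10 detection times could be paired with samples and how a mean square error weighted by typical degree could be evaluated. The function names and the normalization by the number of samples are assumptions.

```python
def make_shifted_samples(series, horizon=10):
    """Pair each value with the value `horizon` steps later.

    The final `horizon` values have no label and are dropped.
    """
    return [(series[i], series[i + horizon]) for i in range(len(series) - horizon)]


def weighted_mse(y_true, y_pred, weights):
    """Mean square error in which each sample's squared error is scaled by its weight.

    weights[i] is assumed to be the typical degree of the session that produced sample i.
    """
    total = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return total / len(y_true)


pairs = make_shifted_samples(list(range(12)), horizon=10)
loss = weighted_mse([1.0, 2.0], [1.0, 4.0], [1.0, 0.5])
```

In a real training loop the same weighting would be applied to the per-sample loss of whatever framework hosts the LSTM, so that errors on highly typical sessions count for more.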
By predicting the differential data volume of the current work order session with the neural network, abnormal fluctuations of the differential data volume are found as early as possible and then compared and judged. In actual use, the predicted differential data volume output by the LSTM prediction network for the next moment is compared with the obtained normal fluctuation interval and abnormal fluctuation interval to determine the corresponding synchronization strategy.
When the predicted differential data volume is larger than the normal fluctuation upper limit or smaller than the normal fluctuation lower limit, that is, when it is not in the normal fluctuation interval, a single synchronization that runs asynchronously from the batch synchronization is carried out immediately. Such differential data is common in normal work order sessions: when the differential data is small, immediate synchronization takes very little time; when it is large, immediate synchronization prevents the data from growing further.
When the predicted differential data volume is smaller than the abnormal fluctuation lower limit, synchronization of the work order session is postponed and the changes of the session continue to be tracked; once the differential data volume exceeds the normal fluctuation lower limit, the session is synchronized immediately. This prevents a large number of messages from suddenly flooding the server node because of a network abnormality, which would make the subsequent differential data volume too large or even cause it to exceed the abnormal fluctuation upper limit.
When the predicted differential data volume is in the abnormal fluctuation interval, the abnormal fluctuation interval is divided into N equal parts, and each sub-interval is assigned a level in ascending order. N is a positive integer set by the implementer according to the synchronization policy. The predicted differential data volume is mapped to its level within the abnormal fluctuation interval, and different priorities are obtained according to the level.
As an example, assume that the normal data priority is 90 and that the abnormal data priority is lower than the normal data priority; the priority of a predicted differential data volume in the abnormal fluctuation interval may then be:
Figure 725213DEST_PATH_IMAGE043
where the quantities in the formula are the priority; the level at which the predicted differential data volume lies in the abnormal fluctuation interval; and an offset value for manually tuning such abnormal situations, which is set to 1 in the embodiment of the present invention.
Therefore, when the predicted differential data volume is in the abnormal fluctuation interval, the closer it is to the abnormal fluctuation upper limit, the lower its priority. Asynchronous synchronization is then performed at the lowest synchronization priority, which relieves the IO pressure on the other distributed storage nodes. The asynchronous synchronization queue is a separate mechanism, independent of the batch synchronization mechanism, in which requests queue for synchronization and are handled in order under abnormal conditions.
When the predicted differential data volume is larger than the abnormal fluctuation upper limit, asynchronous synchronization is carried out in order from high priority to low priority.
When the predicted differential data volume is in the normal fluctuation interval, batch synchronization is carried out directly according to the priority in the queue, provided the queue length is not exceeded; if the queue length is exceeded, the session waits for the next synchronization. The queue size is determined by the implementer according to the service situation and is 1000 in the embodiment of the invention.
It should be noted that, if the normal fluctuation interval and the abnormal fluctuation interval intersect and the predicted differential data volume lies in the intersection, it is processed according to the rule for a predicted differential data volume in the normal fluctuation interval.
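The decision rules above can be collected into one sketch. The conditions as written overlap (a value outside the normal interval also falls somewhere relative to the abnormal interval), so the order of the checks below is an assumption: the normal interval is tested first, which also covers the intersection rule, and the abnormal-interval rules are then applied, so the immediate single synchronization case is not modeled separately. Numbering the levels from 1 to N, the strategy labels, and the function names are likewise hypothetical, and the priority formula is not reproduced.

```python
def abnormal_level(volume, lower, upper, n_levels):
    """Return the 1-based level of `volume` when [lower, upper] is split into n_levels equal parts."""
    if upper == lower:
        return 1
    width = (upper - lower) / n_levels
    level = int((volume - lower) / width) + 1
    return min(level, n_levels)  # the upper limit itself belongs to the last level


def choose_strategy(predicted, normal, abnormal, n_levels=4):
    """Map a predicted differential data volume to a (strategy, level) pair."""
    n_lo, n_hi = normal
    a_lo, a_hi = abnormal
    if n_lo <= predicted <= n_hi:
        return ("batch", None)  # also applies when the two intervals intersect
    if predicted < a_lo:
        return ("postpone", None)  # keep tracking the session before synchronizing
    if predicted > a_hi:
        return ("async_priority_order", None)
    # Remaining case: inside the abnormal interval, graded by level.
    return ("async_leveled", abnormal_level(predicted, a_lo, a_hi, n_levels))
```

For example, with a normal interval of (8, 15) and an abnormal interval of (50, 250) split into 4 levels, a predicted volume of 120 falls in the second level of the abnormal interval.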
In summary, the embodiment of the present invention includes a delay suitability obtaining module 10, a synchronization condition stability obtaining module 20, a grouping module 30, a representative degree obtaining module 40, and a synchronization policy determining module 50; the time delay suitability acquiring module is used for acquiring time delay data of each work order session before synchronization, forming a data set according to all the time delay data, and further acquiring the time delay suitability of the work order session based on the mean value and the variance of the time delay data in the data set; the synchronization condition stability acquisition module is used for acquiring the corresponding differential data volume of each work order session during the last synchronization and acquiring the synchronization condition stability of the corresponding work order session according to the change of the differential data volume and the time delay suitability; the grouping module is used for acquiring the difference distance according to the time delay suitability and the synchronization condition stability corresponding to each work order session, dividing all the work order sessions into two groups based on the difference distance by using a k-means clustering algorithm, and dividing the two groups into a normal group and an abnormal group according to the difference data volume corresponding to each work order session in the groups; the typical degree obtaining module is used for calculating the typical degree of each work order session in the group, and the typical degree is obtained by a semantic descriptor corresponding to the work order session and a difference data change trend between any two work order sessions. 
The synchronization strategy determining module then obtains the normal fluctuation interval and the abnormal fluctuation interval of the differential data volume from the typical degree of each work order session, trains the LSTM prediction network on the differential data of the work order sessions in the normal group to obtain the predicted differential data volume of a work order session awaiting synchronization, and determines the synchronization strategy of the work order session from the relationship between the predicted differential data volume and the normal and abnormal fluctuation intervals. The accumulation of IO requests caused by network abnormality and other factors is thereby avoided in time, and the service can run quickly and stably.
It should be noted that: the precedence order of the above embodiments of the present invention is only for description, and does not represent the merits of the embodiments. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that are within the spirit of the present invention are intended to be included therein.

Claims (8)

1. A big data service management system based on distributed storage is characterized by comprising the following modules:
the time delay suitability acquisition module is used for acquiring time delay data of each work order session in a time period from the last synchronization to the next synchronization, forming all the time delay data into a data set and acquiring the time delay suitability of the work order session based on the data set;
the synchronization condition stability acquisition module is used for acquiring the differential data volume of each work order session at a plurality of sampling moments in the last synchronization and acquiring the synchronization condition stability of the work order session according to the differential data volume and the time delay suitability;
the grouping module is used for acquiring the difference distance of any two work order conversations according to the time delay suitability and the synchronization condition stability of each work order conversation and dividing all the work order conversations into a normal group and an abnormal group based on the difference distance;
the typical degree obtaining module is used for obtaining the work order semantic descriptors of each work order session, and for a normal group and an abnormal group, the difference data change trend of any two work order sessions is obtained based on the work order semantic descriptors of each work order session in the group; obtaining the typical degree of the corresponding work order conversation according to the sum of the variation trends of all the differential data corresponding to each work order conversation in the group;
the synchronous strategy determining module is used for acquiring a normal fluctuation interval of the differential data volume according to the typical degree of each work order session in the normal group and acquiring an abnormal fluctuation interval of the differential data volume according to the typical degree of each work order session in the abnormal group; and acquiring the predicted differential data volume of the work order session to-be-synchronized process by using an LSTM prediction network, and determining a synchronization strategy based on the predicted differential data volume, the normal fluctuation interval and the abnormal fluctuation interval.
2. The big data service management system based on distributed storage according to claim 1, wherein the method for obtaining the time delay suitability of the work order session based on the data set in the time delay suitability obtaining module comprises:
acquiring the average value and the variance of all delay data in the data set, calculating the difference between the average value and a preset proper delay size, and acquiring the delay suitability based on the difference and the variance;
the time delay suitability is in a negative correlation relation with the difference value, and the time delay suitability is in a negative correlation relation with the variance.
3. The big data service management system based on distributed storage according to claim 1, wherein the method for acquiring the synchronization condition stability of the work order session according to the differential data amount and the delay suitability in the synchronization condition stability acquisition module comprises:
acquiring the variation range of all the differential data volumes corresponding to the work order session, wherein the variation range refers to the difference value between the differential data volume at the last sampling moment and the differential data volume at the first sampling moment;
acquiring a difference absolute value of the difference data quantity corresponding to every two adjacent sampling moments in all the difference data quantities corresponding to the work order session as a difference value, selecting a maximum value in all the difference values, and calculating a summation result of the maximum value of the difference value and the variation range;
and acquiring the stability of the synchronization condition of the work order session according to the summation result and the time delay suitability, wherein the stability of the synchronization condition and the summation result are in a negative correlation relationship, and the stability of the synchronization condition and the time delay suitability are in a positive correlation relationship.
4. The big data service management system based on distributed storage according to claim 1, wherein the method for obtaining the difference distance between any two work order sessions in the grouping module according to the time delay suitability and the synchronization condition stability of each work order session comprises:
acquiring a square result of a difference value between the stability of the synchronization conditions corresponding to any two work order sessions, and calculating the dynamic time warping distance corresponding to the two work order sessions;
taking the negative number of the square result as a power exponent to obtain an exponential function; obtaining the difference distance according to the exponential function and the dynamic time warping distance;
the difference distance and the exponential function are in positive correlation, and the difference distance and the dynamic time warping distance are in negative correlation.
5. The big data service management system based on distributed storage according to claim 3, wherein the method for acquiring the differential data change trend of any two work order sessions based on the work order semantic descriptor of each work order session in a group in the representativeness acquisition module comprises:
calculating the morphological similarity distance between the differential data volumes corresponding to every two work order sessions in the group; carrying out difference on the variation range corresponding to the two work order sessions and solving an absolute value to obtain a difference value;
obtaining the similarity between the semantic descriptors corresponding to the two work order conversations;
and constructing an exponential function by taking the negative number of the morphological similarity distance as a power exponent, multiplying the exponential function and the similarity to obtain a product result, wherein the ratio of the product result to the difference value is the change trend of the difference data of the two work order conversations.
6. The big data service management system based on distributed storage according to claim 1, wherein the method for obtaining the normal fluctuation interval of the differential data volume according to the typical degree of each work order session in the normal group in the synchronization policy determination module comprises:
and performing descending order arrangement on the representative degrees of all the work order conversations in the normal group, taking the work order conversations corresponding to the first 5 representative degrees after descending order arrangement as reference samples, and forming a normal fluctuation interval of the differential data quantity by using the differential data quantity corresponding to the reference samples.
7. The big data service management system based on distributed storage according to claim 1, wherein the method for obtaining the abnormal fluctuation interval of the differential data volume according to the typical degree of each work order session in the abnormal group in the synchronization policy determination module comprises:
and performing ascending order arrangement on the representative degrees of all the work order sessions in the abnormal group, taking the work order sessions corresponding to the first 5 representative degrees after the ascending order arrangement as abnormal samples, and forming an abnormal fluctuation interval of the differential data quantity by using the differential data quantity corresponding to the abnormal samples.
8. The big data service management system based on distributed storage according to claim 1, wherein the training data of the LSTM prediction network in the synchronization policy determination module is a differential data volume corresponding to the work order session in the normal group.
CN202211256997.8A 2022-10-14 2022-10-14 Big data service management system based on distributed storage Active CN115330422B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211256997.8A CN115330422B (en) 2022-10-14 2022-10-14 Big data service management system based on distributed storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211256997.8A CN115330422B (en) 2022-10-14 2022-10-14 Big data service management system based on distributed storage

Publications (2)

Publication Number Publication Date
CN115330422A true CN115330422A (en) 2022-11-11
CN115330422B CN115330422B (en) 2023-04-28

Family

ID=83913651

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211256997.8A Active CN115330422B (en) 2022-10-14 2022-10-14 Big data service management system based on distributed storage

Country Status (1)

Country Link
CN (1) CN115330422B (en)

Also Published As

Publication number Publication date
CN115330422B (en) 2023-04-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant