WO2023169274A1 - Data processing method, device, storage medium and processor - Google Patents

Data processing method, device, storage medium and processor (数据处理方法、装置、存储介质以及处理器)

Info

Publication number
WO2023169274A1
WO2023169274A1 PCT/CN2023/078962 CN2023078962W WO2023169274A1 WO 2023169274 A1 WO2023169274 A1 WO 2023169274A1 CN 2023078962 W CN2023078962 W CN 2023078962W WO 2023169274 A1 WO2023169274 A1 WO 2023169274A1
Authority
WO
WIPO (PCT)
Prior art keywords
clustering
time series data
module
data
Prior art date
Application number
PCT/CN2023/078962
Other languages
English (en)
French (fr)
Inventor
王巍巍
陈曦
Original Assignee
阿里巴巴(中国)有限公司 (Alibaba (China) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴(中国)有限公司 (Alibaba (China) Co., Ltd.)
Publication of WO2023169274A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F 16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present invention relates to the field of data processing, and specifically, to a data processing method, device, storage medium and processor.
  • Time series data refers to a sequence formed by arranging the values of a certain phenomenon or a certain statistical indicator at different times in chronological order. It can be widely used in various fields; for example, in the Internet of Things field, the results obtained after clustering time series data can be applied to equipment operating status monitoring, indicator correlation analysis and fault diagnosis.
  • clustering methods in the related art require manual selection of initial cluster centers, but selecting different initial cluster centers yields different clustering results, which reduces the accuracy of the clustering results.
  • Embodiments of the present invention provide a data processing method, device, storage medium and processor to at least solve the technical problem of low accuracy in clustering time series data using clustering methods used in related technologies.
  • a data processing method is provided, including: acquiring multiple time series data generated during the execution of a target task; inputting the multiple time series data into a target model for clustering processing to obtain a clustering result, where the target model at least includes a coding module and a clustering module;
  • the coding module is used to extract features of the multiple time series data;
  • the clustering module is used to cluster the features to obtain the clustering result; the execution of the target task is then analyzed according to the clustering result.
  • another data processing method is provided, including: a cloud server acquires multiple time series data; the cloud server uses a target model to process the multiple time series data to obtain a clustering result, where the target model at least includes a coding module and a clustering module;
  • the coding module is used to extract features of the multiple time series data, and the clustering module is used to cluster the features to obtain the clustering result; the cloud server returns the clustering result to the client.
  • a data processing device is provided, including: a first acquisition unit, used to acquire multiple time series data generated during the execution of a target task; a first processing unit, used to input the multiple time series data into a target model for clustering processing to obtain a clustering result;
  • the target model at least includes a coding module and a clustering module;
  • the coding module is used to extract features of the multiple time series data;
  • the clustering module is used to cluster the features to obtain the clustering result; a first analysis unit is used to analyze the execution of the target task according to the clustering result.
  • a storage medium includes a stored program, wherein when the program is running, the device where the storage medium is located is controlled to execute any one of the above data processing methods.
  • a processor is also provided, which is characterized in that the processor is used to run a program, wherein when the program is running, any one of the above data processing methods is executed.
  • in embodiments of the present invention, multiple time series data generated during the execution of a target task are acquired and input into a target model for clustering processing to obtain a clustering result, where the target model at least includes a coding module and a clustering module; the coding module is used to extract features of the multiple time series data, and the clustering module is used to cluster the features to obtain the clustering result; the execution of the target task is then analyzed based on the clustering result.
  • by combining the coding module and the clustering module into a target model, encoding the time series data with the coding module to obtain the features of the multiple time series data, and clustering the features with the clustering module to obtain the clustering result, the accuracy of time series data clustering is improved.
  • Figure 1 is a hardware structural block diagram of a computer terminal according to an embodiment of the present invention.
  • Figure 2 is an interactive schematic diagram of an optional computer terminal according to an embodiment of the present invention.
  • Figure 3 is a flow chart of a data processing method provided according to an embodiment of the present invention.
  • Figure 4 is a flow chart of an optional data processing method provided according to an embodiment of the present invention.
  • Figure 5 is a flow chart of another data processing method provided according to an embodiment of the present invention.
  • Figure 6 is a schematic diagram of a data processing device according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of another data processing device according to an embodiment of the present invention.
  • Figure 8 is a structural block diagram of an optional computer terminal provided according to an embodiment of the present invention.
  • Time series data refers to sequence data formed by arranging various values of a certain phenomenon or a certain statistical indicator at different times in chronological order.
  • a data processing method embodiment is also provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, such as a set of computer-executable instructions, and, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that shown here.
  • FIG. 1 is a hardware structural block diagram of a computer terminal according to an embodiment of the present invention.
  • the computer terminal 10 may include one or more processors (shown as 102a, 102b, ..., 102n in the figure; the processors may include, but are not limited to, processing devices such as microprocessor MCUs or programmable logic devices such as FPGAs), a memory 104 for storing data, and a transmission module 106 for communication functions.
  • the computer terminal 10 may also include: a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be included as one of the ports of the bus), a network interface, a power supply and/or a camera.
  • FIG. 1 is only illustrative, and it does not limit the structure of the above-mentioned electronic device.
  • the computer terminal 10 may also include more or fewer components than shown in FIG. 1 , or have a different configuration than shown in FIG. 1 .
  • the one or more processors and/or other data processing circuitry described above may generally be referred to herein as "data processing circuitry.”
  • the data processing circuit may be embodied in whole or in part as software, hardware, firmware or any other combination.
  • the data processing circuit may be a single independent processing module, or may be fully or partially integrated into any of the other components in the computer terminal 10 (or mobile device).
  • the data processing circuit acts as a kind of processor control (for example, the selection of a variable resistance termination path connected to the interface).
  • the memory 104 can be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the data processing method in the embodiment of the present invention.
  • the processor executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, it implements the above data processing method.
  • Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 104 may further include memory located remotely relative to the processor, and these remote memories may be connected to the computer terminal 10 through a network. Examples of the above-mentioned networks include but are not limited to the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
  • the transmission device 106 is used to receive or send data via a network.
  • Specific examples of the above-mentioned network may include a wireless network provided by a communication provider of the computer terminal 10 .
  • the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station to communicate with the Internet.
  • the transmission device 106 may be a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.
  • the display may be, for example, a touch-screen liquid crystal display (LCD), which may enable a user to interact with the user interface of the computer terminal 10 (or mobile device).
  • the hardware structure block diagram shown in FIG. 1 can serve not only as an exemplary block diagram of the above-mentioned computer terminal 10 (or mobile device), but also as an exemplary block diagram of the above-mentioned server.
  • in an optional embodiment, FIG. 2 shows, in block diagram form, an embodiment that uses the computer terminal 10 (or mobile device) shown in FIG. 1 as the receiving end.
  • computer terminal 10 (or mobile device) may be connected to one or more servers 108 via a data network connection or electronic connection.
  • the computer terminal 10 (or mobile device) may be a mobile phone or a PC.
  • the data network connection may be a local area network connection, a wide area network connection, an Internet connection, or other types of data network connections.
  • a computer terminal 10 may be configured to connect to a network service 110 executed by a server (eg, a security server) or a group of servers.
  • Network services 110 are network-based user services, such as social networks, cloud resources, email, online payments, or other online applications.
  • Figure 3 is a flow chart of a data processing method provided according to Embodiment 1 of the present invention.
  • the target task can be a sensor monitoring task in the Internet of Things field.
  • while a target task is being executed, a variety of data can be generated, and each type of data can be accumulated as the task progresses, so as to obtain the time series data corresponding to each type of data.
  • for example, during an experiment, monitoring data including vibration intensity, temperature, humidity and so on can be detected through a variety of sensors, and as the experiment progresses, a time series is accumulated for each type of experimental data, such as vibration intensity, temperature and humidity.
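  • as an illustration, the per-metric accumulation step might look like the following Python sketch; the sensor names and the reading format are made up for the example and are not part of the patent:
```python
# Hypothetical sketch: build one growing time series per monitored metric
# from a stream of (metric, value) readings emitted while the task runs.
from collections import defaultdict

def accumulate_series(readings):
    series = defaultdict(list)
    for metric, value in readings:
        series[metric].append(value)
    return dict(series)

readings = [("vibration", 0.12), ("temperature", 21.5), ("humidity", 40.2),
            ("vibration", 0.15), ("temperature", 21.7), ("humidity", 40.9)]
print(accumulate_series(readings))
# {'vibration': [0.12, 0.15], 'temperature': [21.5, 21.7], 'humidity': [40.2, 40.9]}
```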
  • the target model at least includes a coding module and a clustering module.
  • the coding module is used to extract features of multiple time series data
  • the clustering module is used to cluster features to obtain clustering results.
  • the target model can be a machine learning model that has completed training.
  • during data processing, the machine learning model can be divided into two parts.
  • in the first part, the feature information of each time series is first obtained by encoding the time series data, where the feature information can include a forward sequence feature and a reverse sequence feature, and two further pieces of feature information in the latent space, the mean and the variance, can be obtained from the forward and reverse sequence features, so that multiple pieces of feature information are obtained.
  • the clustering module is used to cluster the characteristic information of the multiple time series data to obtain the clustering results.
  • the execution of the target task is analyzed based on the clustering results.
  • analyzing the execution of the target task according to the clustering results includes: when an execution fault occurs in the target task, determining the time series data of the faulty monitored quantity; acquiring the data that belongs to the same cluster as the time series data of the faulty monitored quantity, and determining the acquired data as fault-related data.
  • for example, during an experiment, monitoring data including vibration intensity, temperature, humidity and so on can be detected through a variety of sensors, and as the experiment progresses, a time series is accumulated for each type of experimental data.
  • if, after clustering, temperature and humidity are placed in the same cluster, it can be considered that there is a correlation between the temperature and humidity characteristics.
  • when the temperature becomes abnormal, the humidity can then be checked based on this correlation to determine whether the temperature abnormality is caused by abnormal humidity.
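  • as an illustration, once every monitored quantity has a cluster label, the fault-association step described above reduces to collecting the quantities that share a cluster with the faulty one; the metric names and labels in this sketch are hypothetical:
```python
# Hypothetical sketch: find the metrics in the same cluster as a faulty metric.
def fault_related_metrics(labels, faulty_metric):
    """labels: dict mapping metric name -> cluster id assigned by the target model."""
    fault_cluster = labels[faulty_metric]
    return [m for m, c in labels.items() if c == fault_cluster and m != faulty_metric]

labels = {"vibration": 0, "temperature": 1, "humidity": 1}
print(fault_related_metrics(labels, "temperature"))  # ['humidity']
```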
  • in this embodiment of the present invention, multiple time series data generated during the execution of the target task are acquired and input into the target model for clustering processing to obtain a clustering result, where the target model at least includes a coding module and a clustering module;
  • the coding module is used to extract features of the multiple time series data;
  • the clustering module is used to cluster the features to obtain the clustering result, and the execution of the target task is analyzed according to the clustering result.
  • optionally, before the multiple time series data are input into the target model for clustering processing to obtain the clustering result, the method further includes: acquiring multiple sample time series data; training a preset coding module with the multiple sample time series data, and determining a corresponding first loss value in the process of training the preset coding module, where, when the preset coding module is trained, the preset coding module processes the multiple sample time series data to obtain multiple sample features; acquiring the multiple sample features generated while training the preset coding module, training a preset clustering module with the multiple sample features, and determining a corresponding second loss value in the process of training the preset clustering module; determining a target loss value based on the first loss value and the second loss value, and obtaining the coding module and the clustering module that correspond to the case where the target loss value is less than a loss threshold; and combining the obtained coding module and clustering module into the target model.
  • the target model needs to be trained first to ensure that the target loss value is less than the loss threshold, thereby making the clustering results more accurate.
  • the multiple sample time series data can be time series data of known categories; each sample time series and its corresponding category are input into the target model, and the preset coding module in the target model is trained with the multiple sample time series data to obtain the encoded sample feature information.
  • on one hand, the first loss value corresponding to the process of training the preset coding module can be determined by decoding the encoded sample time series data and comparing the decoded data with the corresponding original sample time series data.
  • on the other hand, the clustering module is trained with the sample feature information, and the category prediction result corresponding to each sample time series is obtained through the clustering module.
  • the category prediction results are compared with the categories corresponding to the sample time series data, and the second loss value is determined according to the comparison result.
  • further, the target loss value can be determined based on the first loss value and the second loss value, for example as Loss3 = Loss1 + w * Loss2, where Loss3 is the target loss value, Loss1 is the first loss value, Loss2 is the second loss value, and w is a preset weight.
  • if the target loss value is greater than the loss threshold, the training parameters of the target model need to be adjusted until the target loss value is less than the loss threshold, thereby improving the accuracy of the clustering results.
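  • a minimal sketch of this combined criterion, assuming PyTorch and treating the two loss terms as placeholders for whatever losses the coding and clustering modules actually use, is shown below:
```python
# Sketch of the combined criterion Loss3 = Loss1 + w * Loss2 described above.
import torch

def target_loss(reconstruction_loss: torch.Tensor,
                clustering_loss: torch.Tensor,
                w: float = 0.5) -> torch.Tensor:
    """Loss1 is the first (reconstruction) loss, Loss2 the second (clustering) loss,
    and w a preset weight; training continues until this value drops below the
    chosen loss threshold."""
    return reconstruction_loss + w * clustering_loss
```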
  • the type of coding module is at least one of the following: sparse autoencoder, variational autoencoder.
  • the target encoding module can be a sparse autoencoder.
  • the structure of a sparse autoencoder is basically the same as that of an autoencoder; the difference is that the hidden layer vector is sparse, that is, it contains as many zero elements as possible, which can reduce the risk of model overfitting.
  • however, the feature sequence extracted by a sparse autoencoder carries little semantic meaning and is merely a dimensionality-reduced representation of the time series data.
  • to obtain better encodings, the target coding module can also use a variational autoencoder.
  • the variational autoencoder is an improvement of the autoencoder: it returns multiple probability models through the latent space and uses them to describe the time series data as data features. It should be noted that the variational autoencoder returns a distribution in the latent space rather than a single point, and a regularization term for the returned distribution can be added to the loss function to address irregularities in the latent space and ensure that the latent space is better organized.
  • the type of encoding module in this embodiment is not limited to the above two encoder types, and other types of encoders can also be used.
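  • the text does not fix the form of that regularization term; a common choice, assumed here purely for illustration, is the KL divergence between the returned latent distribution and a standard normal prior:
```python
# Assumed regularizer (not specified in the patent): KL( N(mu, sigma^2) || N(0, 1) ).
import torch

def kl_regularizer(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    return -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
```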
  • training the preset coding module with the multiple sample time series data and determining the corresponding first loss value during that training includes: inputting the multiple sample time series data into the preset coding module and processing them to obtain multiple sample features of the multiple sample time series data; restoring the multiple sample features through a preset decoding module to obtain multiple restored time series data; and determining the first loss value based on the difference between the multiple sample time series data and the restored time series data.
  • specifically, after the sample features corresponding to the sample time series data are obtained through the preset coding module, a normal distribution characterized by the mean and variance in the sample features can be sampled through a sampling layer in the preset decoding module, and the sampled data is mapped back to the dimensions of the original data through three one-dimensional convolution layers and an upsampling layer to obtain the restored time series data; the restored time series data is then compared with the original sample time series data, and the first loss value is determined based on the comparison result.
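  • a rough sketch of this decoding path, assuming PyTorch, is shown below; the channel counts, kernel sizes and upsampling configuration are illustrative assumptions rather than values taken from the patent:
```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Sample from the normal distribution given by the encoder's mean and variance,
    then map the sample back to the original series length with an upsampling layer
    and three 1-D convolutions."""
    def __init__(self, out_len: int):
        super().__init__()
        self.upsample = nn.Upsample(size=out_len)            # back to the original length
        self.deconv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 1, kernel_size=3, padding=1),
        )

    def forward(self, mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterised sample
        x_hat = self.deconv(self.upsample(z.unsqueeze(1)))         # (batch, 1, out_len)
        return x_hat.squeeze(1)

# First loss: compare the restored series with the original sample series, e.g.
# loss1 = torch.nn.functional.mse_loss(decoder(mu, log_var), x)
```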
  • optionally, when the type of the coding module is a variational autoencoder, the coding module includes: a convolution layer, used to extract a first feature of the time series data; a pooling layer, connected to the convolution layer and used to reduce the dimensionality of the first feature; a first long short-term memory model, connected to the pooling layer and used to extract the forward sequence feature in the first feature; a second long short-term memory model, connected to the pooling layer and used to extract the reverse sequence feature in the first feature; a first fully connected layer, connected to the first and second long short-term memory models and used to determine a first distribution parameter based on the forward and reverse sequence features; and a second fully connected layer, connected to the first and second long short-term memory models and used to determine a second distribution parameter based on the forward and reverse sequence features; the forward sequence feature, the reverse sequence feature, the first distribution parameter and the second distribution parameter are determined as the output of the encoder.
  • the convolution layer can be a one-dimensional convolution from a convolutional neural network followed by an activation function, used to extract the first feature of the input time series data and send it to the pooling layer.
  • Pooling layers typically act on each input feature separately and reduce its size, thereby reducing the dimensionality of the first feature of the time series data.
  • after the dimensionality-reduced first feature is obtained, the first and second long short-term memory models can form a bidirectional LSTM (Long Short-Term Memory) layer.
  • the bidirectional LSTM layer extracts the forward sequence feature and the reverse sequence feature of the dimensionality-reduced first feature, and the first fully connected layer determines the first distribution parameter of the time series data based on the forward and reverse sequence features, where the first distribution parameter can be the mean;
  • the second fully connected layer determines the second distribution parameter of the time series data based on the forward and reverse sequence features, where the second distribution parameter can be the variance;
  • the variance, mean, forward sequence feature and reverse sequence feature are used as the feature information of the time series data.
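  • a minimal sketch of such a variational encoder, assuming PyTorch, is shown below; the layer sizes, kernel width and latent dimension are illustrative assumptions, since the text only fixes the layer types and how they are connected:
```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, hidden: int = 32, latent_dim: int = 8):
        super().__init__()
        self.hidden = hidden
        self.conv = nn.Sequential(                       # 1-D convolution + activation
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool1d(2)                      # reduce the feature dimension
        self.bilstm = nn.LSTM(16, hidden, batch_first=True, bidirectional=True)
        self.fc_mu = nn.Linear(2 * hidden, latent_dim)       # first distribution parameter (mean)
        self.fc_log_var = nn.Linear(2 * hidden, latent_dim)  # second distribution parameter (variance)

    def forward(self, x: torch.Tensor):
        # x: (batch, length) univariate time series
        h = self.pool(self.conv(x.unsqueeze(1)))         # (batch, 16, length // 2)
        seq, _ = self.bilstm(h.transpose(1, 2))          # (batch, length // 2, 2 * hidden)
        fwd = seq[:, -1, :self.hidden]                   # forward sequence feature
        rev = seq[:, 0, self.hidden:]                    # reverse sequence feature
        feats = torch.cat([fwd, rev], dim=-1)
        return fwd, rev, self.fc_mu(feats), self.fc_log_var(feats)
```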
  • the clustering algorithm used by the clustering module is at least one of the following: K-means clustering algorithm, hierarchical clustering algorithm.
  • the clustering module can use the K-means clustering algorithm.
  • the K-means clustering algorithm selects K points as the initial cluster centers, calculates the distance from each sample point to each of the K cluster centers, finds the cluster center nearest to each point and assigns the point to the corresponding cluster, and, after all points have been assigned, recalculates the centroid of each cluster and sets it as the new cluster center; these steps are repeated iteratively until a preset condition is reached, at which point clustering is complete and the clustering results are obtained.
  • however, the K-means clustering algorithm requires the initial cluster centers to be chosen manually, and selecting different initial cluster centers yields different clustering results, which reduces the accuracy of the clustering results.
  • to cluster better, the clustering module can also use a hierarchical clustering algorithm.
  • optionally, when the clustering algorithm adopted by the clustering module is a hierarchical clustering algorithm, using the hierarchical clustering algorithm to cluster the features includes: determining each feature as a class to obtain multiple first-level classes; calculating the minimum distance between every two first-level classes, and merging the two classes with the shortest minimum distance to obtain multiple second-level classes; and calculating the minimum distance between every two second-level classes and merging the two closest classes, and so on, until all classes have been merged into one class.
  • specifically, in the clustering module the feature corresponding to each time series can be regarded as one class, yielding multiple first-level classes; the distance between every two features is calculated, the nearest feature to each feature is found, and each feature is merged with its nearest feature into one class, yielding multiple second-level classes.
  • after the second-level classes are obtained, the distance between every two classes is calculated again based on the coordinates of each class, the nearest class to each class is found, and each class is merged with its nearest class into one class, yielding multiple third-level classes; this process is repeated until all classes have been merged into one class, which yields the class relationships between the features and allows different time series data to be classified at a preset level of the hierarchy.
  • compared with the K-means clustering algorithm, the hierarchical clustering algorithm does not require the number of clusters to be specified in advance, clearly shows the hierarchical relationships between classes, can form clusters of arbitrary shape, and is not affected by a single outlier, thereby improving clustering accuracy.
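  • as an illustration, the merge-the-closest-classes procedure can be realized with agglomerative (single-linkage) clustering over a pairwise distance matrix; the sketch below assumes SciPy and leaves the distance function as a parameter, for example the CID distance defined next:
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def hierarchical_cluster(features, dist, n_clusters):
    """Agglomerative clustering of per-series feature vectors with a custom distance."""
    n = len(features)
    dm = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dm[i, j] = dm[j, i] = dist(features[i], features[j])
    tree = linkage(squareform(dm), method="single")   # repeatedly merge the closest classes
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```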
  • the clustering methods that can be used by the clustering module in this embodiment are not limited to the above two clustering algorithms, and other types of clustering algorithms can also be used.
  • to improve the accuracy of time series data clustering, the clustering algorithm clusters according to the distance between every two time series data, where the distance can be determined as CID(x, y) = ED(x, y) · CF(x, y);
  • x and y are the features of two time series data, ED(x, y) is the Euclidean distance between x and y, and CF(x, y) is the time series complexity factor;
  • ED(x, y) = sqrt( sum_{t=1..N} (x_t - y_t)^2 ), where N is the number of data points in the time series data and t is the index of a data point;
  • CF(x, y) = max(CE(x), CE(y)) / min(CE(x), CE(y)), where CE(x) is the complexity estimate corresponding to x and CE(y) is the complexity estimate corresponding to y.
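  • a sketch of this distance in Python is shown below; the closed form of CE is not reproduced in this text, so the standard complexity estimate (the square root of the sum of squared successive differences) is assumed here:
```python
import numpy as np

def complexity_estimate(x: np.ndarray) -> float:
    # Assumed form of CE(x): sqrt of the summed squared successive differences.
    return float(np.sqrt(np.sum(np.diff(x) ** 2)))

def cid_distance(x: np.ndarray, y: np.ndarray) -> float:
    ed = float(np.sqrt(np.sum((x - y) ** 2)))               # Euclidean distance ED(x, y)
    ce_x, ce_y = complexity_estimate(x), complexity_estimate(y)
    cf = max(ce_x, ce_y) / max(min(ce_x, ce_y), 1e-12)      # complexity factor CF(x, y)
    return ed * cf
```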
  • the data processing method according to the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform; of course, it can also be implemented by hardware, but in many cases the former is the better implementation.
  • based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk or optical disc) and includes several instructions that cause a terminal device (which can be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the various embodiments of the present invention.
  • Figure 4 is a flow chart of an optional data processing method provided according to Embodiment 1 of the present invention, as shown in Figure 4:
  • after the target model receives the time series data, the activation function layer in the coding module, formed by a one-dimensional convolution followed by an activation function, first extracts the first feature of the input time series data and sends it to the pooling layer; the pooling layer acts on each input feature separately and reduces its size, thereby reducing the dimensionality of the first feature of the time series data.
  • after the dimensionality-reduced first feature is obtained, the first and second long short-term memory models can form a bidirectional LSTM (Long Short-Term Memory) layer.
  • the bidirectional LSTM layer extracts the forward sequence feature and the reverse sequence feature of the dimensionality-reduced first feature, and the first fully connected layer determines the first distribution parameter of the time series data based on the forward and reverse sequence features, where the first distribution parameter can be the mean;
  • the second fully connected layer determines the second distribution parameter of the time series data based on the forward and reverse sequence features, where the second distribution parameter can be the variance;
  • the variance, mean, forward sequence feature and reverse sequence feature are used as the feature information of the time series data.
  • further, after the two fully connected layers have produced the mean and variance, a normal distribution characterized by that mean and variance can be sampled through the sampling layer in the preset decoding module, and the sampled data is mapped back to the dimensions of the original data through three one-dimensional convolution layers and an upsampling layer to obtain the restored time series data; the restored time series data is then compared with the sample time series data, and the first loss value is determined based on the comparison result.
  • it should be noted that, after the coding module obtains the feature information of the time series data, it sends the feature information to the clustering module, which clusters the multiple pieces of feature information to obtain the clustering result corresponding to each time series; after the clustering results corresponding to the multiple time series data are obtained, the execution of the target task is analyzed based on the clustering results.
  • Figure 5 is a flow chart of a data processing method provided according to Embodiment 1 of the present invention. As shown in Figure 5, the method includes:
  • the cloud server obtains multiple time series data.
  • the target task can be a sensor monitoring task in the field of Internet of Things.
  • for example, during an experiment, monitoring data including vibration intensity, temperature, humidity and so on can be detected through a variety of sensors, and as the experiment progresses, a time series is accumulated for each type of experimental data.
  • the cloud server uses the target model to process multiple time series data and obtain clustering results.
  • the target model at least includes a coding module and a clustering module.
  • the coding module is used to extract features of multiple time series data.
  • the clustering module is used to cluster the features to obtain the clustering result.
  • the target model used by the cloud server can be a machine learning model that has completed training.
  • the machine learning model can be divided into two parts during data processing.
  • in the first part, the feature information of each time series is first obtained by encoding the time series data, where the feature information can include a forward sequence feature and a reverse sequence feature, and two further pieces of feature information in the latent space, the mean and the variance, can be obtained from the forward and reverse sequence features, so that multiple pieces of feature information are obtained.
  • the clustering module is used to cluster the characteristic information of the multiple time series data to obtain the clustering results.
  • the cloud server returns the clustering results to the client.
  • the cloud server can return the clustering results to the client, and the client can analyze the execution of the target task according to the rules corresponding to the determined categories.
  • for example, during an experiment, monitoring data including vibration intensity, temperature, humidity and so on can be detected through a variety of sensors, and as the experiment progresses, a time series is accumulated for each type of experimental data.
  • if, after clustering, temperature and humidity are placed in the same cluster, it can be considered that there is a correlation between the temperature and humidity characteristics.
  • when the temperature becomes abnormal, the humidity can then be checked based on this correlation to determine whether the temperature abnormality is caused by abnormal humidity.
  • a data processing device for implementing the above data processing method is also provided. As shown in Figure 6, the device includes:
  • the first acquisition unit 61 is used to acquire multiple time series data generated during the execution of the target task.
  • the first processing unit 62 is used to input multiple time series data into the target model for clustering processing to obtain clustering results.
  • the target model at least includes a coding module and a clustering module, and the coding module is used to extract the features of the multiple time series data.
  • the clustering module is used to cluster features and obtain clustering results.
  • the first analysis unit 63 is used to analyze the execution of the target task according to the clustering results.
  • it should be noted that the above first acquisition unit 61, first processing unit 62 and first analysis unit 63 correspond to steps S31, S32 and S33 in Embodiment 1; the examples and application scenarios implemented by these units and by the corresponding steps are the same, but they are not limited to the content disclosed in Embodiment 1.
  • the above modules as part of the device can run in the computer terminal 10 provided in the first embodiment.
  • optionally, the device further includes: a second acquisition unit, used to acquire multiple sample time series data; a first determination unit, used to train a preset coding module with the multiple sample time series data and to determine a corresponding first loss value in the process of training the preset coding module, where, when the preset coding module is trained, the preset coding module processes the multiple sample time series data to obtain multiple sample features; a third acquisition unit, used to acquire the multiple sample features generated while training the preset coding module, to train a preset clustering module with the multiple sample features, and to determine a corresponding second loss value in the process of training the preset clustering module; a second determination unit, used to determine a target loss value based on the first loss value and the second loss value and to obtain the coding module and the clustering module that correspond to the case where the target loss value is less than a loss threshold; and a combination unit, used to combine the obtained coding module and clustering module into the target model.
  • optionally, the first determination unit includes: an input module, used to input the multiple sample time series data into the preset coding module and process them to obtain multiple sample features; a restoration module, used to restore the multiple sample features through the preset decoding module to obtain multiple restored time series data; and a first determination module, used to determine the first loss value based on the difference between the multiple sample time series data and the restored time series data.
  • the type of encoding module is at least one of the following: sparse autoencoder, variational autoencoder.
  • optionally, when the type of the coding module is a variational autoencoder, the coding module includes: a convolution layer, used to extract the first feature of the time series data; a pooling layer, connected to the convolution layer and used to reduce the dimensionality of the first feature; a first long short-term memory model, connected to the pooling layer and used to extract the forward sequence feature in the first feature; a second long short-term memory model, connected to the pooling layer and used to extract the reverse sequence feature in the first feature; a first fully connected layer, connected to the first and second long short-term memory models and used to determine a first distribution parameter based on the forward and reverse sequence features; and a second fully connected layer, connected to the first and second long short-term memory models and used to determine a second distribution parameter based on the forward and reverse sequence features;
  • the forward sequence feature, the reverse sequence feature, the first distribution parameter and the second distribution parameter are determined as the output of the encoder.
  • the clustering algorithm used by the clustering module is at least one of the following: K-means clustering algorithm, hierarchical clustering algorithm.
  • optionally, the clustering algorithm clusters according to the distance between every two time series data, which can be determined as CID(x, y) = ED(x, y) · CF(x, y), where x and y are the features of two time series data, ED(x, y) is the Euclidean distance between x and y (computed over the N data points indexed by t), and CF(x, y) = max(CE(x), CE(y)) / min(CE(x), CE(y)) is the time series complexity factor, with CE(x) and CE(y) the complexity estimates corresponding to x and y.
  • optionally, when the clustering algorithm adopted by the clustering module is a hierarchical clustering algorithm, the part that clusters the features with the hierarchical clustering algorithm includes: a second determination module, used to determine each feature as a class to obtain multiple first-level classes; a first calculation module, used to calculate the minimum distance between every two first-level classes and merge the two classes with the shortest minimum distance to obtain multiple second-level classes; and a second calculation module, used to calculate the minimum distance between every two second-level classes and merge the two closest classes, and so on, until all classes have been merged into one class.
  • optionally, the first analysis unit 63 includes: a third determination module, used to determine the time series data of the faulty monitored quantity when an execution fault occurs in the target task; and a fourth determination module, used to acquire the data that belongs to the same cluster as the time series data of the faulty monitored quantity and determine the acquired data as fault-related data.
  • a data processing device for implementing the above data processing method is also provided. As shown in Figure 7, the device includes:
  • the fourth acquisition unit 71 acquires multiple time series data through the cloud server.
  • the second processing unit 72 uses a target model to process multiple time series data through the cloud server to obtain a clustering result.
  • the target model at least includes a coding module and a clustering module.
  • the coding module is used to extract the features of the multiple time series data, and the clustering module is used to cluster the features to obtain the clustering result.
  • the second analysis unit 73 returns the clustering results to the client through the cloud server.
  • it should be noted that the above fourth acquisition unit 71, second processing unit 72 and second analysis unit 73 correspond to steps S51, S52 and S53 in Embodiment 3; the examples and application scenarios implemented by these units and by the corresponding steps are the same, but they are not limited to the content disclosed in Embodiment 3.
  • the above modules as part of the device can run in the computer terminal 10 provided in the first embodiment.
  • Embodiments of the present invention may provide a computer terminal, which may be any computer terminal device in a computer terminal group.
  • the above computer terminal can also be replaced by a terminal device such as a mobile terminal.
  • the above-mentioned computer terminal may be located in at least one network device among multiple network devices of the computer network.
  • the above computer terminal can execute the program code of the following steps of the data processing method: acquiring multiple time series data generated during the execution of the target task; inputting the multiple time series data into the target model for clustering processing to obtain a clustering result, where the target model at least includes a coding module and a clustering module;
  • the coding module is used to extract features of the multiple time series data;
  • the clustering module is used to cluster the features to obtain the clustering result; the execution of the target task is analyzed according to the clustering result.
  • FIG. 8 is a structural block diagram of an optional computer terminal provided according to an embodiment of the present invention.
  • the computer terminal 10 may include: one or more (only one is shown in the figure) processors, memories, and transmission devices.
  • the memory can be used to store software programs and modules, such as program instructions/modules corresponding to the data processing methods and devices in embodiments of the present invention.
  • the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, it implements the above data processing method.
  • Memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory may further include memory located remotely relative to the processor, and these remote memories may be connected to terminal A through a network. Examples of the above-mentioned networks include but are not limited to the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
  • the processor can call, through the transmission device, the information and application programs stored in the memory to perform the following steps: acquiring multiple time series data generated during the execution of the target task; inputting the multiple time series data into the target model for clustering processing to obtain a clustering result, where the target model at least includes a coding module and a clustering module;
  • the coding module is used to extract features of the multiple time series data;
  • the clustering module is used to cluster the features to obtain the clustering result; the execution of the target task is analyzed according to the clustering result.
  • embodiments of the present invention provide a data processing solution: multiple time series data generated during the execution of the target task are acquired and input into the target model for clustering processing to obtain the clustering result;
  • the target model at least includes a coding module and a clustering module;
  • the coding module is used to extract features of the multiple time series data;
  • the clustering module is used to cluster the features to obtain the clustering result, and the execution of the target task is analyzed based on the clustering result; by combining the coding module and the clustering module into a target model, encoding the time series data with the coding module to obtain the features of the multiple time series data, and clustering the features with the clustering module, the accuracy of time series data clustering is improved;
  • this solves the technical problem that the clustering methods used in the related art cluster time series data with low accuracy.
  • the structure shown in FIG. 8 is only illustrative, and the computer terminal 10 can also be a smartphone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID), a PAD or other terminal equipment.
  • FIG. 8 does not limit the structure of the above-mentioned electronic device.
  • the computer terminal 10 may also include more or less components (such as network interfaces, display devices, etc.) than shown in FIG. 8 , or have a different configuration than that shown in FIG. 8 .
  • the program can be stored in a computer-readable storage medium, and the storage medium can include: a flash disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, and the like.
  • An embodiment of the present invention also provides a storage medium.
  • the above-mentioned storage medium can be used to save the program code executed by the data processing method provided in the above-mentioned Embodiment 1.
  • the above storage medium may be located in any computer terminal in a computer terminal group in the computer network, or in any mobile terminal in a mobile terminal group.
  • the storage medium is configured to store program code for performing the following steps: acquiring multiple time series data generated during the execution of the target task; inputting the multiple time series data into the target model for clustering processing to obtain a clustering result;
  • the target model at least includes a coding module and a clustering module;
  • the coding module is used to extract features of the multiple time series data;
  • the clustering module is used to cluster the features to obtain the clustering result; the execution of the target task is analyzed based on the clustering result.
  • the disclosed technical content can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical functional division; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the units or modules may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically on its own, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of the present invention, in essence or in the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which can be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention.
  • the aforementioned storage media include: a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a data processing method, device, storage medium and processor. The method includes: acquiring multiple time series data generated during the execution of a target task; inputting the multiple time series data into a target model for clustering processing to obtain a clustering result, where the target model at least includes an encoding module and a clustering module, the encoding module is used to extract features of the multiple time series data, and the clustering module is used to cluster the features to obtain the clustering result; and analyzing the execution of the target task according to the clustering result. The present invention solves the technical problem that the clustering methods used in the related art cluster time series data with low accuracy.

Description

Data processing method, device, storage medium and processor
This application claims priority to Chinese patent application No. 202210228756.6, filed with the Chinese Patent Office on March 8, 2022 and entitled "数据处理方法、装置、存储介质以及处理器" ("Data processing method, device, storage medium and processor"), the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of data processing, and in particular to a data processing method, device, storage medium and processor.
Background
Time series data refers to a sequence formed by arranging the values of a certain phenomenon or a certain statistical indicator at different times in chronological order. It can be widely used in various fields; for example, in the Internet of Things field, the results obtained after clustering time series data can be applied to equipment operating status monitoring, indicator correlation analysis and fault diagnosis.
However, in the clustering methods of the related art, the features used for clustering carry little semantic meaning and are merely a dimensionality-reduced representation of the time series data. Moreover, the clustering methods of the related art require manual selection of initial cluster centers, and selecting different initial cluster centers yields different clustering results, which reduces the accuracy of the clustering results.
No effective solution has yet been proposed for the problem that the clustering methods used in the related art cluster time series data with low accuracy.
Summary of the Invention
Embodiments of the present invention provide a data processing method, device, storage medium and processor, so as to at least solve the technical problem that the clustering methods used in the related art cluster time series data with low accuracy.
According to one aspect of the embodiments of the present invention, a data processing method is provided, including: acquiring multiple time series data generated during the execution of a target task; inputting the multiple time series data into a target model for clustering processing to obtain a clustering result, where the target model at least includes an encoding module and a clustering module, the encoding module is used to extract features of the multiple time series data, and the clustering module is used to cluster the features to obtain the clustering result; and analyzing the execution of the target task according to the clustering result.
According to one aspect of the embodiments of the present invention, another data processing method is provided, including: a cloud server acquires multiple time series data; the cloud server processes the multiple time series data with a target model to obtain a clustering result, where the target model at least includes an encoding module and a clustering module, the encoding module is used to extract features of the multiple time series data, and the clustering module is used to cluster the features to obtain the clustering result; and the cloud server returns the clustering result to a client.
According to another aspect of the embodiments of the present invention, a data processing device is also provided, including: a first acquisition unit, used to acquire multiple time series data generated during the execution of a target task; a first processing unit, used to input the multiple time series data into a target model for clustering processing to obtain a clustering result, where the target model at least includes an encoding module and a clustering module, the encoding module is used to extract features of the multiple time series data, and the clustering module is used to cluster the features to obtain the clustering result; and a first analysis unit, used to analyze the execution of the target task according to the clustering result.
According to another aspect of the embodiments of the present invention, a storage medium is also provided. The storage medium includes a stored program, where, when the program runs, the device on which the storage medium is located is controlled to execute any one of the above data processing methods.
According to another aspect of the embodiments of the present invention, a processor is also provided. The processor is used to run a program, where, when the program runs, any one of the above data processing methods is executed.
In the embodiments of the present invention, multiple time series data generated during the execution of a target task are acquired; the multiple time series data are input into a target model for clustering processing to obtain a clustering result, where the target model at least includes an encoding module and a clustering module, the encoding module is used to extract features of the multiple time series data, and the clustering module is used to cluster the features to obtain the clustering result; and the execution of the target task is analyzed according to the clustering result. By combining the encoding module and the clustering module into a target model, encoding the time series data with the encoding module to obtain the features of the multiple time series data, and clustering the features with the clustering module to obtain the clustering result, the technical effect of improving the accuracy of time series data clustering is achieved, which in turn solves the technical problem that the clustering methods used in the related art cluster time series data with low accuracy.
Brief Description of the Drawings
The drawings described here are used to provide a further understanding of the present invention and form a part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of the present invention. In the drawings:
FIG. 1 is a hardware structure block diagram of a computer terminal according to an embodiment of the present invention;
FIG. 2 is an interaction schematic diagram of an optional computer terminal according to an embodiment of the present invention;
FIG. 3 is a flowchart of a data processing method provided according to an embodiment of the present invention;
FIG. 4 is a flowchart of an optional data processing method provided according to an embodiment of the present invention;
FIG. 5 is a flowchart of another data processing method provided according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a data processing device provided according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of another data processing device provided according to an embodiment of the present invention;
FIG. 8 is a structural block diagram of an optional computer terminal provided according to an embodiment of the present invention.
Detailed Description of the Embodiments
In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second" and the like in the description and claims of the present invention and in the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in orders other than those illustrated or described here. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product or device.
First, some of the nouns or terms that appear in the description of the embodiments of this application are explained as follows:
Time series data: sequence data formed by arranging the values of a certain phenomenon or a certain statistical indicator at different times in chronological order.
Embodiment 1
According to an embodiment of the present invention, an embodiment of a data processing method is also provided. It should be noted that the steps shown in the flowcharts of the drawings can be executed in a computer system, such as a set of computer-executable instructions, and, although a logical order is shown in the flowcharts, in some cases the steps shown or described can be performed in an order different from that shown here.
The method embodiment provided in Embodiment 1 of this application can be executed in a mobile terminal, a computer terminal or a similar computing device. FIG. 1 is a hardware structure block diagram of a computer terminal according to an embodiment of the present invention. As shown in FIG. 1, the computer terminal 10 (or mobile device 10) may include one or more processors (shown as 102a, 102b, ..., 102n in the figure; the processors may include, but are not limited to, processing devices such as microprocessor MCUs or programmable logic devices such as FPGAs), a memory 104 for storing data, and a transmission module 106 for communication functions. In addition, it may also include: a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be included as one of the ports of the bus), a network interface, a power supply and/or a camera. Those of ordinary skill in the art can understand that the structure shown in FIG. 1 is only illustrative and does not limit the structure of the above electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
It should be noted that the above one or more processors and/or other data processing circuits may generally be referred to herein as a "data processing circuit". The data processing circuit may be embodied in whole or in part as software, hardware, firmware or any other combination. In addition, the data processing circuit may be a single independent processing module, or may be fully or partially integrated into any of the other elements of the computer terminal 10 (or mobile device). As referred to in the embodiments of this application, the data processing circuit acts as a kind of processor control (for example, the selection of a variable resistance termination path connected to an interface).
The memory 104 can be used to store software programs and modules of application software, such as the program instructions/data storage devices corresponding to the data processing method in the embodiments of the present invention. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, it implements the above data processing method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor, and these remote memories may be connected to the computer terminal 10 through a network. Examples of the above network include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The transmission device 106 is used to receive or send data via a network. Specific examples of the above network may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 can be a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.
The display can be, for example, a touch-screen liquid crystal display (LCD), which enables the user to interact with the user interface of the computer terminal 10 (or mobile device).
The hardware structure block diagram shown in FIG. 1 can serve not only as an exemplary block diagram of the above computer terminal 10 (or mobile device), but also as an exemplary block diagram of the above server. In an optional embodiment, FIG. 2 shows, in block diagram form, an embodiment that uses the computer terminal 10 (or mobile device) shown in FIG. 1 as the receiving end. As shown in FIG. 2, the computer terminal 10 (or mobile device) may be connected to one or more servers 108 via a data network connection or an electronic connection. In an optional embodiment, the above computer terminal 10 (or mobile device) may be a mobile phone or a PC. The data network connection may be a local area network connection, a wide area network connection, an Internet connection or another type of data network connection. The computer terminal 10 (or mobile device) may execute so as to connect to a network service 110 executed by a server (for example, a security server) or a group of servers. A network service 110 is a network-based user service, such as a social network, cloud resources, email, online payment or another online application.
In the above operating environment, this application provides the data processing method shown in FIG. 3. FIG. 3 is a flowchart of a data processing method provided according to Embodiment 1 of the present invention.
S31: acquire multiple time series data generated during the execution of a target task.
Specifically, the target task can be a sensor monitoring task in the Internet of Things field. While a target task is being executed, a variety of data can be generated, and each type of data can be accumulated as the task progresses, so as to obtain the time series data corresponding to each type of data. For example, during an experiment, monitoring data including vibration intensity, temperature, humidity and so on can be detected through a variety of sensors, and as the experiment progresses, a time series is accumulated for each type of experimental data, such as vibration intensity, temperature and humidity.
S32: input the multiple time series data into a target model for clustering processing to obtain a clustering result, where the target model at least includes an encoding module and a clustering module, the encoding module is used to extract features of the multiple time series data, and the clustering module is used to cluster the features to obtain the clustering result.
Specifically, the target model can be a machine learning model that has completed training, and during data processing the machine learning model can be divided into two parts. In the first part, the feature information of each time series is first obtained by encoding the time series data, where the feature information can include a forward sequence feature and a reverse sequence feature, and two further pieces of feature information in the latent space, the mean and the variance, can be obtained from the forward and reverse sequence features, so that multiple pieces of feature information are obtained. In the second part, after the feature information corresponding to the multiple time series data is obtained, the clustering module clusters the feature information of the multiple time series data to obtain the clustering result.
S33: analyze the execution of the target task according to the clustering result.
Specifically, after the clustering results corresponding to the multiple time series data are obtained, the execution of the target task is analyzed according to the clustering results.
Optionally, in the data processing method of this embodiment of the present invention, analyzing the execution of the target task according to the clustering results includes: when an execution fault occurs in the target task, determining the time series data of the faulty monitored quantity; acquiring the data that belongs to the same cluster as the time series data of the faulty monitored quantity, and determining the acquired data as fault-related data.
For example, during an experiment, monitoring data including vibration intensity, temperature, humidity and so on can be detected through a variety of sensors, and as the experiment progresses, a time series is accumulated for each type of experimental data; if, after clustering, temperature and humidity are placed in the same cluster, it can be considered that there is a correlation between the temperature and humidity characteristics, and when the temperature becomes abnormal, the humidity can be checked based on this correlation to determine whether the temperature abnormality is caused by abnormal humidity.
In this embodiment of the present invention, multiple time series data generated during the execution of the target task are acquired; the multiple time series data are input into the target model for clustering processing to obtain a clustering result, where the target model at least includes an encoding module and a clustering module, the encoding module is used to extract features of the multiple time series data, and the clustering module is used to cluster the features to obtain the clustering result; and the execution of the target task is analyzed according to the clustering result. By combining the encoding module and the clustering module into a target model, encoding the time series data with the encoding module to obtain the features of the multiple time series data, and clustering the features with the clustering module to obtain the clustering result, the technical effect of improving the accuracy of time series data clustering is achieved, which in turn solves the technical problem that the clustering methods used in the related art cluster time series data with low accuracy.
In order to make the clustering result of the target model more accurate, optionally, in the data processing method of this embodiment of the present invention, before the multiple time series data are input into the target model for clustering processing to obtain the clustering result, the method further includes: acquiring multiple sample time series data; training a preset encoding module with the multiple sample time series data, and determining a corresponding first loss value in the process of training the preset encoding module, where, when the preset encoding module is trained, the preset encoding module processes the multiple sample time series data to obtain multiple sample features; acquiring the multiple sample features generated in the process of training the preset encoding module, training a preset clustering module with the multiple sample features, and determining a corresponding second loss value in the process of training the preset clustering module; determining a target loss value based on the first loss value and the second loss value, and obtaining the encoding module and the clustering module that correspond to the case where the target loss value is less than a loss threshold; and combining the obtained encoding module and clustering module into the target model.
Specifically, before the target model is used for clustering processing, the target model needs to be trained first to ensure that the target loss value is less than the loss threshold, so that the clustering result is more accurate. The multiple sample time series data can be time series data of known categories; each sample time series and its corresponding category are input into the target model, and the preset encoding module in the target model is trained with the multiple sample time series data to obtain the encoded sample feature information.
On one hand, the encoded sample time series data can be decoded, and the decoded data is compared with the corresponding sample time series data, so as to determine the first loss value corresponding to the process of training the preset encoding module. On the other hand, the clustering module is trained with the sample feature information, the category prediction result corresponding to each sample time series is obtained through the clustering module, the category prediction results are compared with the categories corresponding to the sample time series data, and the second loss value is determined according to the comparison results.
Further, the target loss value can be determined based on the first loss value and the second loss value; for example, the target loss value can be determined by the formula Loss3 = Loss1 + w * Loss2, where Loss3 is the target loss value, Loss1 is the first loss value, Loss2 is the second loss value, and w is a preset weight. When the target loss value is greater than the loss threshold, the training parameters of the target model need to be adjusted until the target loss value is less than the loss threshold, so as to improve the accuracy of the clustering results.
Optionally, in the data processing method of this embodiment of the present invention, the type of the encoding module is at least one of the following: a sparse autoencoder and a variational autoencoder.
Specifically, the target encoding module can be a sparse autoencoder. The structure of a sparse autoencoder is basically the same as that of an autoencoder; the difference is that the hidden layer vector is sparse, that is, it contains as many zero elements as possible, which can reduce the risk of model overfitting. However, the feature sequence extracted by a sparse autoencoder carries little semantic meaning and is merely a dimensionality-reduced representation of the time series data.
Further, in order to obtain better encodings, the target encoding module can also use a variational autoencoder. The variational autoencoder is an improvement of the autoencoder: it returns multiple probability models through the latent space and uses them to describe the time series data as data features. It should be noted that the variational autoencoder returns a distribution in the latent space rather than a single point, and a regularization term for the returned distribution can be added to the loss function to address irregularities in the latent space and ensure that the latent space is better organized. The type of the encoding module in this embodiment is not limited to the above two encoder types; other types of encoders can also be used.
In order to enhance the effect of training the target model and make the clustering result obtained through the trained target model more accurate, optionally, in the data processing method of this embodiment of the present invention, training the preset encoding module with the multiple sample time series data and determining the corresponding first loss value in the process of training the preset encoding module includes: inputting the multiple sample time series data into the preset encoding module and processing them to obtain multiple sample features of the multiple sample time series data; restoring the multiple sample features through a preset decoding module to obtain multiple restored time series data; and determining the first loss value based on the difference between the multiple sample time series data and the restored time series data.
Specifically, after the sample features corresponding to the multiple sample time series data are obtained through the preset encoding module, a normal distribution can be sampled through a sampling layer in the preset decoding module, where the normal distribution is characterized by the mean and variance in the sample features, and the sampled data is mapped back to the dimensions of the original data through three one-dimensional convolution layers and an upsampling layer to obtain the restored time series data. Further, the restored time series data is compared with the sample time series data, and the first loss value is determined based on the comparison result.
为了得到更准确的时间序列数据的特征信息,可选地,在本发明实施例的数据处理方法中,在编码模块的类型为变分自编码器的情况下,编码模块包括:卷积层,用于提取时间序列数据的第一特征;池化层,与卷积层连接,用于降低第一特征的维度;第一长短期记忆模型,与池化层连接,用于提取第一特征中的正序列特征;第二长短期记忆模型,与池化层连接,用于提取第一特征中的逆序列特征;第一全连接层:与第一长短期记忆模型和第二长短期记忆模型连接,用于根据正序列特征和正序列特征确定第一分布参数;第二全连接层:与第一长短期记忆模型和第二长短期记忆模型连接,用于根据正序列特征和逆序列特征确定第二分布参数;将正序列特征、逆序列特征、第一分布参数以及第二分布参数确定为编码器的输出。
具体的,卷积层可以为卷积神经网络中的一维卷积加激活函数得到的激活函数层,用于抽取输入的时间序列数据的第一特征,并将第一特征发送至池化层。池化层通常会分别作用于每个输入的特征并减小其大小,从而降低时间序列数据的第一特征的维度。
在得到降维后的第一特征之后，可以通过第一长短期记忆模型和第二长短期记忆模型构成双向LSTM层（Long Short-Term Memory，长短期记忆网络），通过双向LSTM层抽取降低维度后的第一特征的正序列特征和逆序列特征，并在第一全连接层根据正序列特征和逆序列特征确定该时间序列数据的第一分布参数，其中，第一分布参数可以为均值，并在第二全连接层根据正序列特征和逆序列特征确定该时间序列数据的第二分布参数，其中，第二分布参数可以为方差，并将方差、均值、正序列特征和逆序列特征作为时间序列数据的特征信息。
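按照上述结构，变分自编码器的编码模块可以用如下示意性代码表示（PyTorch示例，通道数、隐藏维度等超参数均为假设值，仅用于说明数据流向）：

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """示意性编码模块：一维卷积+激活 -> 池化 -> 双向LSTM -> 两个全连接层输出均值与方差。"""

    def __init__(self, hidden=32, latent_dim=8):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool1d(kernel_size=2)                  # 池化层：降低第一特征的维度
        self.bilstm = nn.LSTM(16, hidden, batch_first=True,
                              bidirectional=True)                # 双向LSTM：正序列特征与逆序列特征
        self.fc_mean = nn.Linear(2 * hidden, latent_dim)         # 第一全连接层：第一分布参数（均值）
        self.fc_logvar = nn.Linear(2 * hidden, latent_dim)       # 第二全连接层：第二分布参数（方差的对数）

    def forward(self, x):                        # x: (batch, 1, seq_len)
        h = self.pool(self.conv(x))              # (batch, 16, seq_len//2)
        h, _ = self.bilstm(h.transpose(1, 2))    # (batch, seq_len//2, 2*hidden)
        h = h[:, -1, :]                          # 取双向LSTM末步输出作为序列的整体特征
        return self.fc_mean(h), self.fc_logvar(h)
```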
可选地,在本发明实施例的数据处理方法中,聚类模块采用的聚类算法为以下至少之一:K均值聚类算法、层次聚类算法。
具体的，聚类模块可以使用K均值聚类算法，其中，K均值聚类算法通过选取K个点作为初始聚集的簇核心，分别计算每个样本点到K个簇核心的距离，找到离该点最近的簇核心，将它归属到对应的簇，并在所有点都归属到簇之后重新计算每个簇的重心（平均距离中心），将其定为新的“簇核心”，再反复迭代上述步骤，直到满足预设条件时完成聚类，并得到聚类结果。但是，K均值聚类算法需要人为确定初始聚类中心，而选取不同的初始聚类中心会得到不同的聚类结果，从而降低了聚类结果的准确性。
进一步的,为了更好地进行聚类,聚类模块还可以使用层次聚类算法。
为了提高聚类结果的准确性，可选地，在本发明实施例的数据处理方法中，在聚类模块采用的聚类算法为层次聚类算法的情况下，采用层次聚类算法对特征进行聚类，包括：将每个特征确定为一类，得到多个第一层级的类；计算每两个第一层级的类之间的最小距离，并将最小距离最短的两个第一层级的类进行合并，得到多个第二层级的类；计算每两个第二层级的类之间的最小距离，并将最小距离最短的两个第二层级的类进行合并，依此类推，直至多个第一层级的类合并成一类。
具体的,在聚类算法为层次聚类算法的情况下,可以将聚类模块中每一个时间序列数据对应的特征作为一类,得到多个第一层级的类,并计算每两个特征之间的距离,并获取到每个特征对应的最小距离特征,并将每个特征与对应的最小距离特征进行合并,并归为一类,得到多个第二层级的类。
进一步的,在得到多个第二层级的类之后,根据每类的坐标再次计算每两个类之间的距离,并获取到每个类对应的最小距离类,并将每个类与对应的最小距离类进行合并,并归为一类,得到多个第三层级的类。重复上述流程直至全部类归于一类,从而得到每个特征之间的类的关系。从而可以根据预设层级对不同时间序列数据进行分类。
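上述自底向上逐层合并的流程，可以借助scipy提供的层次聚类接口实现（示意性代码，合并方式method、簇数n_clusters等均为示例设置）：

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def hierarchical_cluster(features, n_clusters=3):
    """示意性层次聚类：每个特征先各自为一类，再按距离逐层合并，最后按预设层级切分。"""
    # linkage 对各条目两两计算距离并自底向上合并，返回完整的层次合并结构
    Z = linkage(np.asarray(features), method="average", metric="euclidean")
    # 按预设的簇数切割层次树，得到每条时间序列对应的类别标签
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```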
与K均值聚类算法相比，层次聚类算法具有不需要预先指定聚类数、可以清晰展示类之间的层次关系、可以聚类成任意形状、不受单个离群点影响等特点，从而达到提高聚类准确性的效果。本实施例中聚类模块可以使用的聚类方法不限于上述两种聚类算法，还可以采用其他类型的聚类算法。
为了提高时间序列数据聚类的准确性,需要设置聚类算法中不同坐标之间的距离计算方法,可选地,在本发明实施例的数据处理方法中,聚类算法通过每两个时间序列数据之间的距离进行聚类,其中,通过下式确定每两个时间序列数据之间的距离:
CID(x,y)=ED(x,y)·CF(x,y);
其中,x和y是两个时间序列数据的特征,ED(x,y)是x和y之间的欧式距离,CF(x,y)为时序复杂度因子;
其中，ED(x,y)=√(Σ_{t=1}^{N}(x_t-y_t)²)，N是时间序列数据的个数，t是时间序列数据的序列号；
其中,CF(x,y)=max(CE(x),CE(y))/min(CE(x),CE(y)),CE(x)是x对应的时序复杂度因子,CE(y)是y对应的时序复杂度因子;
其中，CE(x)=√(Σ_{t=1}^{N-1}(x_{t+1}-x_t)²)，CE(y)的计算方式与之相同。
此外，为了减少计算量，还可以直接通过计算欧氏距离的方式确定每两个时间序列数据之间的距离。
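上述CID距离可以按如下方式计算（示意性实现；当希望减少计算量时，可令use_cid=False直接返回欧氏距离）：

```python
import numpy as np

def complexity(x):
    """CE(x)：时序复杂度，取相邻点差分平方和的平方根（示意）。"""
    return np.sqrt(np.sum(np.diff(np.asarray(x, dtype=float)) ** 2))

def cid_distance(x, y, use_cid=True):
    """CID(x, y) = ED(x, y) * CF(x, y)，其中CF为两序列复杂度之比（大值除以小值）。"""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    ed = np.sqrt(np.sum((x - y) ** 2))                  # 欧式距离 ED(x, y)
    if not use_cid:
        return ed                                       # 直接使用欧氏距离以减少计算量
    ce_x, ce_y = complexity(x), complexity(y)
    cf = max(ce_x, ce_y) / max(min(ce_x, ce_y), 1e-12)  # 时序复杂度因子，分母加小量防止除零
    return ed * cf
```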
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本发明并不受所描述的动作顺序的限制,因为依据本发明,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本发明所必须的。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到根据上述实施例的数据处理方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,或者网络设备等)执行本发明各个实施例所述的方法。
实施例2
根据本发明实施例,还提供了一种可选的数据处理方法,图4是根据本发明实施例一提供的一种可选的数据处理方法的流程图,如图4所示:
目标模型接收到时间序列数据之后，首先通过编码模块中由一维卷积加激活函数构成的激活函数层，抽取输入的时间序列数据的第一特征，并将第一特征发送至池化层。池化层通常会分别作用于每个输入的特征并减小其大小，从而降低时间序列数据的第一特征的维度。
在得到降维后的第一特征之后，可以通过第一长短期记忆模型和第二长短期记忆模型构成双向LSTM层（Long Short-Term Memory，长短期记忆网络），通过双向LSTM层抽取降低维度后的第一特征的正序列特征和逆序列特征，并在第一全连接层根据正序列特征和逆序列特征确定该时间序列数据的第一分布参数，其中，第一分布参数可以为均值，并在第二全连接层根据正序列特征和逆序列特征确定该时间序列数据的第二分布参数，其中，第二分布参数可以为方差，并将方差、均值、正序列特征和逆序列特征作为时间序列数据的特征信息。
进一步的,两个全连接层在获取到方差、均值之后,可以通过预设解码模块中的采样层对正态分布进行采样,其中,正态分布通过样本特征中的均值和方差进行表征,并通过三个一维卷积层和一个过采样层,将数据映射成原始数据的维度,得到还原后的时间序列数据,进一步的,将还原后的时间序列数据与该样本时间序列数据进行对比,并根据对比结果确定第一损失值。
需要说明的是，在编码模块得到该时间序列数据的特征信息之后，将特征信息发送至聚类模块中，并通过聚类模块对多个特征信息进行聚类处理，从而得到该时间序列数据对应的聚类结果，并在得到多个时间序列数据对应的聚类结果之后，根据聚类结果分析目标任务的执行情况。
实施例3
根据本申请实施例,还提供了一种数据处理方法。图5是根据本发明实施例一提供的数据处理方法的流程图,如图5所示,该方法包括:
S51,云服务器获取多个时间序列数据。
具体的，在某一个目标任务正在执行的时候，可以产生多种数据，每种数据可以根据任务的进展进行数据的累积，使得云服务器可以得到每种数据对应的时间序列数据。目标任务可以为物联网领域的传感器监测任务。例如，在进行某项实验的过程中，可以通过多种传感器检测到包括震动强度、温度、湿度等监测数据，并且随着实验的进行，可以产生震动强度、温度、湿度等多种实验数据中每种实验数据对应的时间序列数据。
S52,云服务器采用目标模型对多个时间序列数据进行处理,得到聚类结果,其中,目标模型至少包括编码模块和聚类模块,编码模块用于提取多个时间序列数据的特征,聚类模块用于对特征进行聚类,得到聚类结果。
具体的,云服务器采用的目标模型可以为已经完成训练的机器学习模型,其中,该机器学习模型在数据处理的过程中可以分为两部分,第一部分先通过对时间序列数据进行编码得到该时间序列数据的特征信息,其中,特征信息可以包括正序列特征和逆序列特征,以及通过正序列特征和逆序列特征得到潜在空间中的两个特征信息:均值和方差,从而得到多个特征信息。第二部分,在得到多个时间序列数据对应的特征信息后,通过聚类模块对多个时间序列数据的特征信息进行聚类处理,从而得到聚类结果。
S53,云服务器返回聚类结果至客户端。
具体的,在得到某个时间序列数据对应的聚类结果之后,云服务端可以将聚类结果返回至客户端,客户端即可根据确定的类别对应的规则进行目标任务执行情况的分析。
例如，在执行某项实验的过程中，可以通过多种传感器检测到包括震动强度、温度、湿度等监测数据，并且随着实验的进行，可以产生震动强度、温度、湿度等多种实验数据中每种实验数据对应的时间序列数据。在通过聚类处理后，若将温度和湿度分为一类，即可以认为温度和湿度两个特性之间存在关联关系，在温度出现异常的时候，可以根据关联关系对湿度进行检测，从而判断温度异常是否是由湿度异常造成的。
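该云服务器与客户端的交互可以用一个极简的HTTP服务示意（Flask示例，接口路径、字段名以及load_trained_model等函数名均为本文假设，仅用于说明客户端上传时间序列、云服务器返回聚类结果的流程）：

```python
from flask import Flask, request, jsonify

app = Flask(__name__)
target_model = load_trained_model()   # 假设：加载已完成训练的目标模型，函数名仅为示例

@app.route("/cluster", methods=["POST"])
def cluster():
    # 客户端以JSON形式上传多个时间序列数据，云服务器用目标模型处理后返回聚类结果
    series_list = request.get_json()["series"]
    labels = target_model.process(series_list)
    return jsonify({"labels": [int(label) for label in labels]})

# 客户端侧可通过 requests.post("http://<云服务器地址>/cluster", json={"series": data}) 获取聚类结果
```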
实施例4
根据本发明实施例,还提供了一种用于实施上述数据处理方法的数据处理装置,如图6所示,该装置包括:
第一获取单元61,用于获取在执行目标任务的过程中产生的多个时间序列数据。
第一处理单元62,用于将多个时间序列数据输入目标模型进行聚类处理,得到聚类结果,其中,目标模型至少包括编码模块和聚类模块,编码模块用于提取多个时间序列数据的特征,聚类模块用于对特征进行聚类,得到聚类结果。
第一分析单元63,用于根据聚类结果分析目标任务的执行情况。
此处需要说明的是，上述第一获取单元61、第一处理单元62和第一分析单元63对应于实施例1中的步骤S31、步骤S32和步骤S33，三个模块与对应的步骤所实现的实例和应用场景相同，但不限于上述实施例1所公开的内容。需要说明的是，上述模块作为装置的一部分可以运行在实施例1提供的计算机终端10中。
可选地,在本发明实施例的数据处理装置中,该装置还包括:第二获取单元,用于获取多个样本时间序列数据;第一确定单元,用于通过多个样本时间序列数据训练预设编码模块,并确定训练预设编码模块的过程中对应的第一损失值,其中,在训练预设编码模块时,预设编码模块处理多个样本时间序列数据得到多个样本特征;第三获取单元,用于获取训练预设编码模块的过程中产生的多个样本特征,通过多个样本特征训练预设聚类模块,并确定训练预设聚类模块的过程中对应的第二损失值;第二确定单元,用于根据第一损失值和第二损失值确定目标损失值,并获取目标损失值小于损失阈值的情况下对应的编码模块和聚类模块;组合单元,用于将获取的编码模块和聚类模块组合为目标模型。
可选地,在本发明实施例的数据处理装置中,第一确定单元包括:输入模块,用于将多个样本时间序列数据输入预设编码模块,处理得到多个样本时间序列数据的多个样本特征;还原模块,用于通过预设解码模块对多个样本特征进行还原,得到还原后的多个时间序列数据;第一确定模块,用于根据多个样本时间序列数据和还原后的多个时间序列数据之间的差异确定第一损失值。
可选地,在本发明实施例的数据处理装置中,编码模块的类型为以下至少之一:稀疏自编码器、变分自编码器。
可选地，在本发明实施例的数据处理装置中，在编码模块的类型为变分自编码器的情况下，编码模块包括：卷积层，用于提取时间序列数据的第一特征；池化层，与卷积层连接，用于降低第一特征的维度；第一长短期记忆模型，与池化层连接，用于提取第一特征中的正序列特征；第二长短期记忆模型，与池化层连接，用于提取第一特征中的逆序列特征；第一全连接层，与第一长短期记忆模型和第二长短期记忆模型连接，用于根据正序列特征和逆序列特征确定第一分布参数；第二全连接层，与第一长短期记忆模型和第二长短期记忆模型连接，用于根据正序列特征和逆序列特征确定第二分布参数；将正序列特征、逆序列特征、第一分布参数以及第二分布参数确定为编码器的输出。
可选地,在本发明实施例的数据处理装置中,聚类模块采用的聚类算法为以下至少之一:K均值聚类算法、层次聚类算法。
可选地,在本发明实施例的数据处理装置中,聚类算法通过每两个时间序列数据之间的距离进行聚类,其中,通过下式确定每两个时间序列数据之间的距离:
CID(x,y)=ED(x,y)·CF(x,y);
其中,x和y是两个时间序列数据的特征,ED(x,y)是x和y之间的欧式距离,CF(x,y)为时序复杂度因子;
其中，ED(x,y)=√(Σ_{t=1}^{N}(x_t-y_t)²)，N是时间序列数据的个数，t是时间序列数据的序列号；
其中,CF(x,y)=max(CE(x),CE(y))/min(CE(x),CE(y)),CE(x)是x对应的时序复杂度因子,CE(y)是y对应的时序复杂度因子;
其中，CE(x)=√(Σ_{t=1}^{N-1}(x_{t+1}-x_t)²)，CE(y)的计算方式与之相同。
可选地，在本发明实施例的数据处理装置中，在聚类模块采用的聚类算法为层次聚类算法的情况下，采用层次聚类算法对特征进行聚类，包括：第二确定模块，用于将每个特征确定为一类，得到多个第一层级的类；第一计算模块，用于计算每两个第一层级的类之间的最小距离，并将最小距离最短的两个第一层级的类进行合并，得到多个第二层级的类；第二计算模块，用于计算每两个第二层级的类之间的最小距离，并将最小距离最短的两个第二层级的类进行合并，直至多个第一层级的类合并成一类。
可选地，在本发明实施例的数据处理装置中，第一分析单元63包括：第三确定模块，用于在目标任务出现执行故障的情况下，确定故障监测量序列数据；第四确定模块，用于获取与故障监测量序列数据属于同一类的数据，并将获取到的数据确定为故障关联数据。
实施例5
根据本发明实施例,还提供了一种用于实施上述数据处理方法的数据处理装置,如图7所示,该装置包括:
第四获取单元71,通过云服务器获取多个时间序列数据。
第二处理单元72,通过云服务器采用目标模型对多个时间序列数据进行处理,得到聚类结果,其中,目标模型至少包括编码模块和聚类模块,编码模块用于提取多个时间序列数据的特征,聚类模块用于对特征进行聚类,得到聚类结果。
第二分析单元73,通过云服务器返回聚类结果至客户端。
此处需要说明的是，上述第四获取单元71、第二处理单元72和第二分析单元73对应于实施例3中的步骤S51、步骤S52和步骤S53，三个模块与对应的步骤所实现的实例和应用场景相同，但不限于上述实施例3所公开的内容。需要说明的是，上述模块作为装置的一部分可以运行在实施例1提供的计算机终端10中。
实施例6
本发明的实施例可以提供一种计算机终端,该计算机终端可以是计算机终端群中的任意一个计算机终端设备。可选地,在本实施例中,上述计算机终端也可以替换为移动终端等终端设备。
可选地,在本实施例中,上述计算机终端可以位于计算机网络的多个网络设备中的至少一个网络设备。
在本实施例中,上述计算机终端可以执行数据处理方法中以下步骤的程序代码:获取在执行目标任务的过程中产生的多个时间序列数据;将多个时间序列数据输入目标模型进行聚类处理,得到聚类结果,其中,目标模型至少包括编码模块和聚类模块,编码模块用于提取多个时间序列数据的特征,聚类模块用于对特征进行聚类,得到聚类结果;根据聚类结果分析目标任务的执行情况。
可选地,图8是根据本发明实施例提供的可选的计算机终端的结构框图。如图8所示,该计算机终端10可以包括:一个或多个(图中仅示出一个)处理器、存储器、以及传输装置。
其中,存储器可用于存储软件程序以及模块,如本发明实施例中的数据处理方法和装置对应的程序指令/模块,处理器通过运行存储在存储器内的软件程序以及模块,从而执行各种功能应用以及数据处理,即实现上述的数据处理方法。存储器可包括高速随机存储器,还可以包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中,存储器可进一步包括相对于处理器远程设置的存储器,这些远程存储器可以通过网络连接至终端A。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
处理器可以通过传输装置调用存储器存储的信息及应用程序,以执行下述步骤:获取在执行目标任务的过程中产生的多个时间序列数据;将多个时间序列数据输入目标模型进行聚类处理,得到聚类结果,其中,目标模型至少包括编码模块和聚类模块,编码模块用于提取多个时间序列数据的特征,聚类模块用于对特征进行聚类,得到聚类结果;根据聚类结果分析目标任务的执行情况。
采用本发明实施例，提供了一种数据处理的方案。通过获取在执行目标任务的过程中产生的多个时间序列数据；将多个时间序列数据输入目标模型进行聚类处理，得到聚类结果，其中，目标模型至少包括编码模块和聚类模块，编码模块用于提取多个时间序列数据的特征，聚类模块用于对特征进行聚类，得到聚类结果；根据聚类结果分析目标任务的执行情况。通过将编码模块和聚类模块组合成为目标模型，并根据编码模块对时间序列数据进行编码操作，得到多个时间序列数据的特征，并使用聚类模块对特征进行聚类，得到聚类结果，从而实现了提高时间序列数据聚类准确性的技术效果，进而解决了相关技术中采用的聚类方法对时间序列数据进行聚类的准确性低的技术问题。
本领域普通技术人员可以理解，图8所示的结构仅为示意，计算机终端10也可以是智能手机（如Android手机、iOS手机等）、平板电脑、掌上电脑以及移动互联网设备（Mobile Internet Devices，MID）、PAD等终端设备。图8并不对上述电子装置的结构造成限定。例如，计算机终端10还可包括比图8中所示更多或者更少的组件（如网络接口、显示装置等），或者具有与图8所示不同的配置。
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令终端设备相关的硬件来完成，该程序可以存储于一计算机可读存储介质中，存储介质可以包括：闪存盘、只读存储器（Read-Only Memory，ROM）、随机存取存储器（Random Access Memory，RAM）、磁盘或光盘等。
本发明的实施例还提供了一种存储介质。可选地,在本实施例中,上述存储介质可以用于保存上述实施例一所提供的数据处理方法所执行的程序代码。
可选地,在本实施例中,上述存储介质可以位于计算机网络中计算机终端群中的任意一个计算机终端中,或者位于移动终端群中的任意一个移动终端中。
可选地,在本实施例中,存储介质被设置为存储用于执行以下步骤的程序代码:获取在执行目标任务的过程中产生的多个时间序列数据;将多个时间序列数据输入目标模型进行聚类处理,得到聚类结果,其中,目标模型至少包括编码模块和聚类模块,编码模块用于提取多个时间序列数据的特征,聚类模块用于对特征进行聚类,得到聚类结果;根据聚类结果分析目标任务的执行情况。
上述本发明实施例序号仅仅为了描述,不代表实施例的优劣。
在本发明的上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
在本申请所提供的几个实施例中,应该理解到,所揭露的技术内容,可通过其它的方式实现。其中,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,单元或模块的间接耦合或通信连接,可以是电性或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外，在本发明各个实施例中的各功能单元可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述仅是本发明的优选实施方式,应当指出,对于本技术领域的普通技术人员来说,在不脱离本发明原理的前提下,还可以做出若干改进和润饰,这些改进和润饰也应视为本发明的保护范围。

Claims (13)

  1. 一种数据处理方法,其特征在于,包括:
    获取在执行目标任务的过程中产生的多个时间序列数据;
    将所述多个时间序列数据输入目标模型进行聚类处理,得到聚类结果,其中,所述目标模型至少包括编码模块和聚类模块,所述编码模块用于提取所述多个时间序列数据的特征,所述聚类模块用于对所述特征进行聚类,得到所述聚类结果;
    根据所述聚类结果分析所述目标任务的执行情况。
  2. 根据权利要求1所述的数据处理方法,其特征在于,在将所述多个时间序列数据输入目标模型进行聚类处理,得到聚类结果之前,所述方法还包括:
    获取多个样本时间序列数据;
    通过所述多个样本时间序列数据训练预设编码模块,并确定训练所述预设编码模块的过程中对应的第一损失值,其中,在训练所述预设编码模块时,所述预设编码模块处理所述多个样本时间序列数据得到多个样本特征;
    获取训练所述预设编码模块的过程中产生的所述多个样本特征,通过所述多个样本特征训练预设聚类模块,并确定训练所述预设聚类模块的过程中对应的第二损失值;
    根据所述第一损失值和所述第二损失值确定目标损失值,并获取所述目标损失值小于损失阈值的情况下对应的编码模块和聚类模块;
    将获取的所述编码模块和所述聚类模块组合为所述目标模型。
  3. 根据权利要求2所述的数据处理方法,其特征在于,通过所述多个样本时间序列数据训练预设编码模块,并确定训练所述预设编码模块的过程中对应的第一损失值包括:
    将所述多个样本时间序列数据输入所述预设编码模块,处理得到所述多个样本时间序列数据的所述多个样本特征;
    通过预设解码模块对所述多个样本特征进行还原,得到还原后的多个时间序列数据;
    根据所述多个样本时间序列数据和所述还原后的多个时间序列数据之间的差异确定所述第一损失值。
  4. 根据权利要求1所述的数据处理方法,其特征在于,所述编码模块的类型为以下至少之一:稀疏自编码器、变分自编码器。
  5. 根据权利要求4所述的数据处理方法，其特征在于，在所述编码模块的类型为变分自编码器的情况下，所述编码模块包括：
    卷积层,用于提取所述时间序列数据的第一特征;
    池化层,与所述卷积层连接,用于降低所述第一特征的维度;
    第一长短期记忆模型,与所述池化层连接,用于提取所述第一特征中的正序列特征;
    第二长短期记忆模型,与所述池化层连接,用于提取所述第一特征中的逆序列特征;
    第一全连接层：与所述第一长短期记忆模型和所述第二长短期记忆模型连接，用于根据所述正序列特征和所述逆序列特征确定第一分布参数；
    第二全连接层:与所述第一长短期记忆模型和所述第二长短期记忆模型连接,用于根据所述正序列特征和所述逆序列特征确定第二分布参数;
    将所述正序列特征、所述逆序列特征、所述第一分布参数以及所述第二分布参数确定为所述编码器的输出。
  6. 根据权利要求1所述的数据处理方法,其特征在于,所述聚类模块采用的聚类算法为以下至少之一:K均值聚类算法、层次聚类算法。
  7. 根据权利要求6所述的数据处理方法,其特征在于,所述聚类算法通过每两个时间序列数据之间的距离进行聚类,其中,通过下式确定每两个时间序列数据之间的距离:
    CID(x,y)=ED(x,y)·CF(x,y);
    其中,x和y是两个时间序列数据的特征,ED(x,y)是x和y之间的欧式距离,CF(x,y)为时序复杂度因子;
    其中，ED(x,y)=√(Σ_{t=1}^{N}(x_t-y_t)²)，N是时间序列数据的个数，t是时间序列数据的序列号；
    其中,CF(x,y)=max(CE(x),CE(y))/min(CE(x),CE(y)),CE(x)是x对应的时序复杂度因子,CE(y)是y对应的时序复杂度因子;
    其中，CE(x)=√(Σ_{t=1}^{N-1}(x_{t+1}-x_t)²)，CE(y)的计算方式与之相同。
  8. 根据权利要求6所述的数据处理方法,其特征在于,在所述聚类模块采用的聚类算法为所述层次聚类算法的情况下,采用所述层次聚类算法对所述特征进行聚类,包括:
    将每个所述特征确定为一类,得到多个第一层级的类;
    计算每两个所述第一层级的类之间的最小距离,并将所述最小距离最短的两个所述第一层级的类进行合并,得到多个第二层级的类;
    计算每两个所述第二层级的类之间的最小距离，并将所述最小距离最短的两个所述第二层级的类进行合并，直至所述多个第一层级的类合并成一类。
  9. 根据权利要求1所述的数据处理方法,其特征在于,根据所述聚类结果分析所述目标任务的执行情况包括:
    在所述目标任务出现执行故障的情况下，确定故障监测量序列数据；
    获取与所述故障监测量序列数据属于同一类的数据,并将获取到的数据确定为故障关联数据。
  10. 一种数据处理方法,其特征在于,包括:
    云服务器获取多个时间序列数据;
    所述云服务器采用目标模型对所述多个时间序列数据进行处理,得到聚类结果,其中,所述目标模型至少包括编码模块和聚类模块,所述编码模块用于提取所述多个时间序列数据的特征,所述聚类模块用于对所述特征进行聚类,得到所述聚类结果;
    所述云服务器返回所述聚类结果至客户端。
  11. 一种数据处理装置,其特征在于,包括:
    第一获取单元,用于获取在执行目标任务的过程中产生的多个时间序列数据;
    第一处理单元,用于将所述多个时间序列数据输入目标模型进行聚类处理,得到聚类结果,其中,所述目标模型至少包括编码模块和聚类模块,所述编码模块用于提取所述多个时间序列数据的特征,所述聚类模块用于对所述特征进行聚类,得到所述聚类结果;
    第一分析单元,用于根据所述聚类结果分析所述目标任务的执行情况。
  12. 一种存储介质,其特征在于,所述存储介质包括存储的程序,其中,在所述程序运行时控制所述存储介质所在设备执行权利要求1至9中任意一项所述的数据处理方法,或权利要求10中所述的数据处理方法。
  13. 一种处理器,其特征在于,所述处理器用于运行程序,其中,所述程序运行时执行权利要求1至9中任意一项所述的数据处理方法,或权利要求10中所述的数据处理方法。
PCT/CN2023/078962 2022-03-08 2023-03-01 数据处理方法、装置、存储介质以及处理器 WO2023169274A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210228756.6A CN114722091A (zh) 2022-03-08 2022-03-08 数据处理方法、装置、存储介质以及处理器
CN202210228756.6 2022-03-08

Publications (1)

Publication Number Publication Date
WO2023169274A1 true WO2023169274A1 (zh) 2023-09-14

Family

ID=82237967

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/078962 WO2023169274A1 (zh) 2022-03-08 2023-03-01 数据处理方法、装置、存储介质以及处理器

Country Status (2)

Country Link
CN (1) CN114722091A (zh)
WO (1) WO2023169274A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116957421A (zh) * 2023-09-20 2023-10-27 山东济宁运河煤矿有限责任公司 一种基于人工智能的洗选生产智能化监测系统
CN117271480A (zh) * 2023-11-20 2023-12-22 国能日新科技股份有限公司 数据处理方法、装置、电子设备及介质

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114722091A (zh) * 2022-03-08 2022-07-08 阿里巴巴(中国)有限公司 数据处理方法、装置、存储介质以及处理器

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111679949A (zh) * 2020-04-23 2020-09-18 平安科技(深圳)有限公司 基于设备指标数据的异常检测方法及相关设备
US20210357282A1 (en) * 2020-05-13 2021-11-18 Mastercard International Incorporated Methods and systems for server failure prediction using server logs
CN113822366A (zh) * 2021-09-29 2021-12-21 平安医疗健康管理股份有限公司 业务指标异常检测方法及装置、电子设备、存储介质
CN113988156A (zh) * 2021-09-30 2022-01-28 山东云海国创云计算装备产业创新中心有限公司 一种时间序列聚类方法、系统、设备以及介质
CN114722091A (zh) * 2022-03-08 2022-07-08 阿里巴巴(中国)有限公司 数据处理方法、装置、存储介质以及处理器

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116957421A (zh) * 2023-09-20 2023-10-27 山东济宁运河煤矿有限责任公司 一种基于人工智能的洗选生产智能化监测系统
CN116957421B (zh) * 2023-09-20 2024-01-05 山东济宁运河煤矿有限责任公司 一种基于人工智能的洗选生产智能化监测系统
CN117271480A (zh) * 2023-11-20 2023-12-22 国能日新科技股份有限公司 数据处理方法、装置、电子设备及介质
CN117271480B (zh) * 2023-11-20 2024-03-15 国能日新科技股份有限公司 数据处理方法、装置、电子设备及介质

Also Published As

Publication number Publication date
CN114722091A (zh) 2022-07-08

Similar Documents

Publication Publication Date Title
WO2023169274A1 (zh) 数据处理方法、装置、存储介质以及处理器
US11176418B2 (en) Model test methods and apparatuses
JP2020522774A (ja) サーバ、金融時系列データの処理方法及び記憶媒体
CN111091278B (zh) 机械设备异常检测的边缘检测模型构建方法及装置
CN111176953B (zh) 一种异常检测及其模型训练方法、计算机设备和存储介质
CN109783459A (zh) 从日志中提取数据的方法、装置及计算机可读存储介质
CN113568740A (zh) 基于联邦学习的模型聚合方法、系统、设备及介质
CN116760726A (zh) 一种基于编解码算法的供应商作弊检测方法
CN110781410A (zh) 一种社群检测方法及装置
CN112905987B (zh) 账号识别方法、装置、服务器及存储介质
CN115509853A (zh) 一种集群数据异常检测方法及电子设备
CN115567283A (zh) 一种身份认证方法、装置、电子设备、系统及存储介质
CN117097789A (zh) 一种数据处理方法、装置、电子设备及存储介质
CN111815442B (zh) 一种链接预测的方法、装置和电子设备
CN109491844B (zh) 一种识别异常信息的计算机系统
CN111797406A (zh) 一种医疗基金数据分析处理方法、装置及可读存储介质
CN115361231B (zh) 基于访问基线的主机异常流量检测方法、系统及设备
CN117473330B (zh) 一种数据处理方法、装置、设备及存储介质
CN112381539B (zh) 基于区块链和大数据的交易信息处理方法及数字金融平台
CN116629252B (zh) 基于物联网的远程参数自适应性配置方法及系统
CN113297054A (zh) 测试流量集的获取方法、装置和存储介质
CN117669803A (zh) 设备故障预测方法、装置和非易失性存储介质
CN117056436A (zh) 一种社区发现方法、装置、存储介质及电子设备
CN117932458A (zh) 对象识别模型生成方法、装置、电子设备和存储介质
CN117434403A (zh) 电器的局部放电检测方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23765852

Country of ref document: EP

Kind code of ref document: A1