WO2021147319A1 - Data processing method, apparatus, device, and medium

Info

Publication number
WO2021147319A1
WO2021147319A1 PCT/CN2020/111993 CN2020111993W WO2021147319A1 WO 2021147319 A1 WO2021147319 A1 WO 2021147319A1 CN 2020111993 W CN2020111993 W CN 2020111993W WO 2021147319 A1 WO2021147319 A1 WO 2021147319A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
model
fitting model
target fitting
target
Prior art date
Application number
PCT/CN2020/111993
Other languages
French (fr)
Chinese (zh)
Inventor
霍罗威茨·夏伊
埃瑞恩·雅尔
佩雷斯·诺阿姆
王琛
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2021147319A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04 Protocols for data compression, e.g. ROHC

Definitions

  • This application relates to the field of computer technology, and in particular to a data processing method, apparatus, device, and computer-readable storage medium.
  • With the development of the Internet of Things (IoT), more and more devices such as TVs, air conditioners, speakers, routers, and cameras can be independently addressed to form an interconnected network, thereby realizing intelligent identification, positioning, tracking, monitoring, and management of devices.
  • Time series data refers to indicator data recorded in chronological order.
  • For example, the time series data may be a resource utilization rate, an electrocardiogram, a stock price, and so on.
  • The amount of time series data generated by IoT devices every minute can reach hundreds of millions, which puts great pressure on data transmission and storage.
  • This application provides a data processing method, which realizes data compression by performing model fitting on data, and solves the problem of high data transmission and storage pressure.
  • This application also provides corresponding devices, equipment, computer-readable storage media, and computer program products.
  • In a first aspect, this application provides a data processing method.
  • The first device, as the data sender, can obtain multiple data, obtain a target fitting model based on the multiple data, and then send the target fitting model to the second device instead of directly sending the multiple data. The second device can recover at least one of the multiple data through the target fitting model as required. In this way, the amount of transmitted data and the amount of data stored by the second device can be reduced, thereby alleviating the pressure of data transmission and data storage.
  • the first device may obtain the target fitting model in the following manner. Specifically, the first device may perform model fitting on multiple fitting models according to multiple data, so as to select one fitting model from the multiple fitting models as the target fitting model.
  • the multiple fitting models may be models negotiated in advance by the first device and the second device, and the first device may use multiple data as training data to train the multiple fitting models until the training end condition is satisfied.
  • the training end condition may be that the number of parameter iterations reaches the maximum number, or the loss value determined based on the loss function is less than a preset value.
  • the first device determines the target fitting model according to the loss value of each fitting model.
  • For example, the first device can use the fitting model with the smallest loss value as the target fitting model, or the fitting model whose loss value is less than the preset value as the target fitting model, or the fitting model whose loss value tends to converge as the target fitting model.
  • the multiple fitting models may specifically be at least two of a linear model, a polynomial model, and a neural network model.
  • multiple fitting models may include linear models, polynomial models, and neural network models. In this way, by fitting more models, the fitting accuracy can be improved and the accuracy of data transmission can be improved.
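  • The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the application: it assumes mean squared error as the loss, uses least-squares polynomial fits of degree 1 and degree 2 as stand-ins for the linear and polynomial candidates (the neural network candidate is omitted), and the names polyfit, mse, CANDIDATES, and select_target_model are hypothetical.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns the coefficients with the lowest power first.
    """
    n = degree + 1
    # Augmented matrix [A^T A | A^T y], where A is the Vandermonde matrix of xs.
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def mse(coeffs, xs, ys):
    """Mean squared error, used here as the (assumed) loss value."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical pre-negotiated candidates: model ID -> polynomial degree.
CANDIDATES = {1: 1, 2: 2}


def select_target_model(xs, ys):
    """Fit every candidate and keep the one with the smallest loss value."""
    best = None
    for model_id, degree in CANDIDATES.items():
        coeffs = polyfit(xs, ys, degree)
        loss = mse(coeffs, xs, ys)
        if best is None or loss < best[2]:
            best = (model_id, coeffs, loss)
    return best


xs = list(range(10))
ys = [0.5 * x * x + 2.0 for x in xs]  # synthetic quadratic data
model_id, coeffs, loss = select_target_model(xs, ys)
```

  • On this synthetic quadratic data the degree-2 candidate has the smaller loss, so it is the one selected as the target fitting model.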
  • When the first device sends the target fitting model to the second device, it may send the identifier of the fitting model selected by the first device and the corresponding model parameters to the second device.
  • the model ID can be agreed in advance.
  • the model ID of a linear model can be 1
  • the model ID of a polynomial model can be 2
  • the model ID of a neural network model can be 3.
  • Models can generally be characterized by functions, and model parameters are the parameters of the function expressions.
  • For example, a linear model can be expressed as y = ax + b, where x is the independent variable, y is the function value, and a and b are the parameters of the linear model.
  • When the target fitting model is this linear model, the first device may send the identifier “1” of the fitting model and the model parameters “a” and “b” to the second device.
  • Sending model identifiers and parameters can further reduce the amount of transmitted data and the amount of stored data, reduce transmission resource overhead and storage resource overhead, and save costs.
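  • As an illustration of why an identifier plus parameters is compact, the sketch below packs a model identifier and its parameters into a byte string and compares the size with 60 raw samples. The wire format (one byte for the identifier, one byte for the parameter count, then 64-bit floats) and the function names encode_model and decode_model are assumptions made for this example only; the application does not specify an encoding.

```python
import struct


def encode_model(model_id, params):
    """Pack as: 1-byte model ID, 1-byte parameter count, then float64 values."""
    header = struct.pack("<BB", model_id, len(params))
    body = struct.pack(f"<{len(params)}d", *params)
    return header + body


def decode_model(payload):
    """Inverse of encode_model: returns (model_id, list of parameters)."""
    model_id, count = struct.unpack_from("<BB", payload, 0)
    params = list(struct.unpack_from(f"<{count}d", payload, 2))
    return model_id, params


# Linear model with ID 1 and parameters a = 0.5, b = 20.0.
payload = encode_model(1, [0.5, 20.0])

# For comparison: 60 raw samples of the same line sent as float64 values.
raw = struct.pack("<60d", *[0.5 * t + 20.0 for t in range(60)])
```

  • Under these assumptions the model payload is 18 bytes, whereas the 60 raw samples occupy 480 bytes.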
  • When the multiple pieces of data acquired by the first device are associated with other data, the first device may also determine the association relationship between the multiple pieces of data and the other data, and the association relationship may be characterized by a relationship function. Based on this, the first device can send the relationship function associated with the target fitting model to the second device. In this way, the relationship function can be used to determine another target fitting model based on the target fitting model, and the other target fitting model can be used to restore at least one of the other data associated with the multiple data.
  • the first device may also send the difference value to the second device.
  • the difference is a difference between the plurality of data acquired by the first device and the plurality of data restored by the first device based on the target fitting model.
  • the difference may be used in combination with the target fitting model to restore at least one data of the plurality of data.
  • the above difference is superimposed on the recovered data of the target fitting model to achieve lossless data recovery and improve the accuracy of data transmission.
  • the foregoing multiple data may be multiple time series data within a time window.
  • the time window can be set according to actual needs, for example, it can be set to one minute.
  • Time series data is a series of data recorded in chronological order for the same indicator. In an example, if the time window is one minute and the collection period is one second, then 60 data can be collected in one time window.
  • In a second aspect, this application provides a data processing method. Specifically, the second device, as the data receiver, receives the target fitting model from the first device, where the target fitting model is obtained by the first device based on the multiple data it acquired, and the second device recovers at least one of the multiple data according to the target fitting model.
  • receiving the target fitting model and storing the target fitting model greatly reduce the pressure of data transmission and data storage.
  • the second device can choose to restore at least one of the multiple data, so that the individual needs of different users can be met.
  • The second device may also receive a relationship function associated with the target fitting model, and determine another target fitting model based on the relationship function and the target fitting model. In this way, the second device can restore, according to the other target fitting model, at least one of the other data associated with the multiple data.
  • receiving the relationship function associated with the target fitting model can further reduce the amount of transmitted data and the amount of stored data, thereby reducing the pressure of data transmission and data storage.
  • The second device may choose lossless recovery or lossy recovery for data according to requirements. For data with higher accuracy requirements, lossless recovery can be selected, and for data with relatively low accuracy requirements, lossy recovery can be selected to save costs.
  • When the second device restores the data directly according to the target fitting model, the restoration is lossy.
  • The second device may also receive a difference from the first device, where the difference is the difference between the multiple data acquired by the first device and the multiple data recovered by the first device based on the target fitting model.
  • The second device may then restore at least one of the multiple data according to the target fitting model and the difference.
  • Specifically, the second device may first perform data recovery according to the target fitting model, and then superimpose the difference on the recovered data to obtain the original data, thus realizing lossless data recovery.
  • this application provides a data processing device, which includes:
  • Communication module used to obtain multiple data
  • a fitting module configured to obtain a target fitting model based on the plurality of data
  • the communication module is further configured to send the target fitting model to a second device, and the target fitting model is used to restore at least one data of the plurality of data.
  • the fitting module is specifically used for:
  • Model fitting is performed on multiple fitting models according to the multiple data, so as to select one fitting model from the multiple fitting models as the target fitting model.
  • the multiple fitting models include at least two of a linear model, a polynomial model, and a neural network model.
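  • the communication module is specifically used for sending the identifier of the target fitting model and the corresponding model parameters to the second device.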
  • the communication module is also used to:
  • the relationship function associated with the target fitting model is sent to the second device, where the relationship function is used to determine another target fitting model based on the target fitting model, and the other target fitting model is used for At least one of the other data associated with the plurality of data is restored.
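  • the communication module is also used to send a difference to the second device, where the difference is the difference between the plurality of data acquired and the plurality of data restored based on the target fitting model.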
  • the plurality of data includes a plurality of time series data within a time window.
  • the present application provides a data processing device, which includes:
  • a communication module configured to receive a target fitting model from a first device, where the target fitting model is obtained by the first device based on a plurality of acquired data;
  • the restoration module is configured to restore at least one of the multiple data according to the target fitting model.
  • the communication module is also used to receive a relationship function associated with the target fitting model;
  • the device also includes:
  • a determining module configured to determine another target fitting model according to the relationship function and the target fitting model
  • the recovery module is also used for:
  • At least one of the other data associated with the plurality of data is restored according to the another target fitting model.
  • the communication module is also used to receive a difference from the first device, where the difference is the difference between the multiple data acquired by the first device and the multiple data recovered by the first device based on the target fitting model;
  • the recovery module is specifically used for:
  • At least one data of the plurality of data is restored according to the target fitting model and the difference value.
  • the present application provides a device including a processor and a memory
  • the processor is configured to execute instructions stored in the memory, so that the processor executes the data processing method according to the first aspect or the second aspect.
  • the present application provides a computer-readable storage medium, including instructions, which when run on a device, cause the device to execute the data processing method as described in the first or second aspect.
  • this application provides a computer program product containing instructions, which when run on a device, causes the device to execute the data processing method described in the first or second aspect.
  • FIG. 1 is a schematic diagram of a system architecture of a data processing method provided by an embodiment of this application;
  • FIG. 2 is an interaction flowchart of a data processing method provided by an embodiment of this application;
  • FIG. 3 is a schematic diagram of a data processing method provided by an embodiment of this application;
  • FIG. 4 is a schematic diagram of a data processing method provided by an embodiment of this application;
  • FIG. 5 is a schematic diagram of a data processing method provided by an embodiment of this application;
  • FIG. 6 is a schematic structural diagram of a data processing device provided by an embodiment of this application;
  • FIG. 7 is a schematic structural diagram of a data processing device provided by an embodiment of this application;
  • FIG. 8 is a schematic structural diagram of a device provided by an embodiment of this application;
  • FIG. 9 is a schematic structural diagram of a device provided by an embodiment of this application.
  • Figure 1 shows a possible application scenario of an embodiment of the present application.
  • the device 102 and the device 104 are network node devices.
  • the device 102 may be a network node device that sends data
  • the device 104 is a network node device that receives the data.
  • the data sent by the device 102 may be data generated by the device 102 itself, or may be data obtained by the device 102 from other devices.
  • the device 102 may also be used as a device for receiving data
  • the device 104 may also be used as a device for sending data.
  • This application only uses the device 102 to send data to the device 104 as an example for description, and does not constitute a limitation to the technical solution of the application.
  • The device 102 can be a mobile terminal device such as a smart phone, a smart bracelet, or a tablet computer, and the device can monitor any one or more indicators such as position and heartbeat through a position sensor, a heartbeat sensor, etc. to generate monitoring data.
  • the device 102 may also be a device deployed with an application on the cloud, and the device may monitor any one or more indicators such as throughput and delay of the application to generate monitoring data.
  • the device 102 may also be an end-side device of the Internet of things (IoT), such as a smart refrigerator, a smart light bulb, etc., and the device may monitor any one or more indicators such as temperature and brightness to generate monitoring data.
  • the device 104 may be a data transfer device.
  • the device 104 may be an edge device of the cloud, such as a gateway device (gateway, GW), a switch device, and so on.
  • the device 104 may also be a data consuming device, such as a server or the like.
  • The present application provides a data processing method in which the device 102, as the data sender, first obtains multiple data, then obtains a target fitting model based on the multiple data, and sends the target fitting model instead of directly sending the multiple data.
  • In this way, the amount of data sent, the requirements on network transmission, and the transmission pressure can all be reduced.
  • The receiver can store the target fitting model and, when the data needs to be used, restore at least one of the multiple data based on the target fitting model, which also reduces the requirements on data storage, reduces the storage pressure, and improves storage performance.
  • The method includes:
  • S202: The device 102 obtains multiple pieces of data.
  • the device 102 may generate multiple pieces of data or obtain multiple pieces of data from other devices.
  • multiple data acquired by the device 102 (for example, data generated by the device 102 itself, data acquired by the device 102 from other devices) may be time series data, that is, data of an indicator within a time period.
  • the multiple pieces of data acquired by the device 102 may also be non-sequential data, such as location-related data.
  • The multiple data acquired by the device 102 may be the central processing unit (CPU) usage rate and/or the memory usage rate of a virtual machine in a period of time, obtained by the device monitoring one or more indicators of the virtual machine such as CPU usage rate and memory usage rate.
  • The multiple pieces of data acquired by the device 102 may also be the throughput and/or delay of an application in a period of time, obtained by the device monitoring one or more indicators of the application such as throughput and delay.
  • The CPU usage rate, memory usage rate, throughput, and delay in a time period described in the embodiments of the present application specifically refer to the CPU usage rate, memory usage rate, throughput, and delay corresponding to different time points in that time period.
  • the device 102 and the device 104 that perform data interaction can negotiate data collection rules in advance. For example, when the data is time series data, the device 102 and the device 104 may pre-negotiate the data collection period, that is, the collection time interval. Of course, the device 102 may also send collection rules to the device 104 when sending data, such as the collection period of the sent data.
  • the multiple data acquired by the device 102 may also be data generated by monitoring indicators such as rainfall at different locations.
  • This data is non-time series data, which can change with different geographic locations.
  • the device 102 and the device 104 may negotiate the data collection location interval in advance.
  • the collection location interval can be measured by latitude and longitude, such as one longitude or one latitude.
  • the collection location interval can also be measured by length and height, such as 5 kilometers (kilometer, km), and so on.
  • Metadata is specifically data describing data, which can be at least one attribute of data.
  • its metadata may include indicator names, project identifiers of the projects to which they belong, service names, and so on.
  • The metadata of multiple data corresponding to the same indicator is often the same. Therefore, compared with directly transmitting each data item together with its metadata, obtaining multiple data allows the metadata to be aggregated so that only one copy of the metadata needs to be transmitted. In this way, data compression can be achieved.
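  • The effect of aggregating metadata can be illustrated with a small sketch. The field names (indicator, project_id, service), the sample values, and the use of JSON as the serialization are all assumptions made for this example; the application does not prescribe a message layout.

```python
import json

# Hypothetical metadata shared by every sample of one indicator.
metadata = {"indicator": "cpu_usage", "project_id": "p-001", "service": "demo"}
values = [31, 32, 34, 33, 35]

# Naive layout: the metadata is repeated alongside every value.
per_point = [{"meta": metadata, "value": v} for v in values]

# Aggregated layout: one copy of the metadata for the whole batch.
aggregated = {"meta": metadata, "values": values}

per_point_size = len(json.dumps(per_point))
aggregated_size = len(json.dumps(aggregated))
```

  • With these sample values the aggregated message is several times smaller than the per-point messages, and the gap grows with the number of values in the batch.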
  • the device 102 can further process the multiple data, such as performing compression processing, which can reduce transmission overhead and storage overhead.
  • the device 102 can acquire multiple data according to the pre-configured acquisition quantity N, that is, the device 102 can acquire N data at a time.
  • The device 102 may acquire multiple time series data within the time window Δt.
  • The acquisition number N or the time window Δt can be set according to an empirical value, which is not limited in the embodiment of the application.
  • the data in the embodiments of this application may be numerical values.
  • the data may also be characters, such as characters A, B, C, D, etc. that characterize the level.
  • S204: The device 102 obtains a target fitting model based on the multiple data.
  • the device 102 may perform model fitting on multiple data, and determine a fitting model that meets a preset condition as a target fitting model.
  • The device 102 and the device 104 may negotiate multiple fitting models in advance. In this way, the device 102 may perform model fitting on the multiple fitting models based on the multiple data, so as to select one fitting model from the multiple fitting models as the target fitting model.
  • the multiple fitting models negotiated by the device 102 and the device 104 may be models that are easy to fit.
  • the multiple fitting models may be at least two of a linear model, a polynomial model, and a neural network model.
  • the neural network model may be a three-layer neural network model. In this way, the complexity of the model can be reduced and the model is easier to fit.
  • When the device 102 performs model fitting on multiple fitting models based on multiple data, it can calculate the loss value of each fitting model based on the multiple data and the data obtained from the model fitting, and then choose one fitting model from the multiple fitting models as the target fitting model according to the loss values.
  • the device 102 may mark multiple pieces of data as training data, and then train a fitting model based on the training data in combination with a loss function, and optimize the fitting model parameters until the training end condition is satisfied.
  • the training end condition may be that the number of parameter iterations reaches the maximum number, or the loss value determined based on the loss function is less than a preset value.
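  • The two training end conditions can be sketched as follows for a linear model. This is only an illustration under stated assumptions: gradient descent and mean squared error are choices made for the example, and the function name train_linear, the learning rate, the iteration cap, and the threshold are hypothetical values not taken from the application.

```python
def train_linear(xs, ys, lr=0.5, max_iters=10_000, loss_threshold=1e-8):
    """Fit y = a*x + b by gradient descent on the mean squared error.

    Training ends when the loss drops below loss_threshold or when
    max_iters parameter updates have been performed.
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(max_iters):
        errs = [a * x + b - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errs) / n
        if loss < loss_threshold:
            return a, b, loss, "loss_below_threshold"
        # Gradients of the mean squared error with respect to a and b.
        grad_a = 2 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2 * sum(errs) / n
        a -= lr * grad_a
        b -= lr * grad_b
    loss = sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
    return a, b, loss, "max_iterations"


xs = [i / 10 for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]  # synthetic data on the line y = 2x + 1
a, b, loss, reason = train_linear(xs, ys)
```

  • The returned reason shows which end condition fired; the final loss is the value that would then be compared across candidate fitting models.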
  • the device 102 determines the target fitting model according to the loss value of each fitting model.
  • The device 102 may use the fitting model with the smallest loss value as the target fitting model, or the fitting model whose loss value is less than a preset value as the target fitting model, or the fitting model whose loss value tends to converge as the target fitting model.
  • S206: The device 102 determines the difference between the multiple pieces of data it acquires and the multiple pieces of data recovered based on the target fitting model.
  • The multiple data acquired by the device 102 have a one-to-one correspondence with the multiple data recovered based on the target fitting model. For example, if the device 102 acquires N pieces of data, it can restore N pieces of data based on the target fitting model. The device 102 can subtract, from each of the acquired data, the corresponding data recovered by the target fitting model, thereby obtaining multiple differences. The differences are used to achieve lossless data recovery, that is, the multiple data acquired by the device 102 can be accurately recovered according to the target fitting model and the differences.
  • The device 102 may not execute the foregoing S206.
  • S208: The device 102 sends the target fitting model to the device 104.
  • The device 102 and the device 104 may agree on the model identifiers of the multiple negotiated fitting models. In this way, when the device 102 selects a fitting model from the multiple fitting models as the target fitting model, it can send the identifier of the selected fitting model and the corresponding model parameters to the device 104, so as to send the target fitting model to the device 104. Based on this, the transmission overhead can be further reduced.
  • the identification model_ID of the linear model may be 1, the identification model_ID of the polynomial model may be 2, and the identification model_ID of the neural network model may be 3.
  • The linear model can be expressed as: y = ax + b
  • where a and b are the parameters of the linear model, which represent the slope and the intercept respectively.
  • x can be time or latitude and longitude (representing a region), and y can be data restored by the device 104.
  • The device 102 may send a sequence or array such as {1, a, b} or [1, a, b] to send the target fitting model to the device 104.
  • The device 102 only needs to send a very short sequence or array. For example, for a linear model, only the model identifier and the model parameters need to be sent, three values in total, which is equivalent to compressing the data by 95% and greatly reduces the amount of transmitted data and the amount of stored data.
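  • The 95% figure is consistent with, for example, a window of N = 60 values (one minute sampled once per second) being replaced by the three values of the linear model. The window size is an assumption carried over from the earlier time window example, and the calculation ignores metadata and the encoded width of each value:

```latex
1 - \frac{3}{N} = 1 - \frac{3}{60} = 0.95
```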
  • The execution order of S206 and S208 may be arbitrary. For example, they may be executed in parallel, or may be executed sequentially in a set order.
  • S210: The device 102 sends to the device 104 the difference between the multiple pieces of data acquired by the device 102 and the multiple pieces of data restored based on the target fitting model.
  • The target fitting model and the difference may be sent in one message, or may be sent separately in multiple messages.
  • the device 102 may execute the above S208 and S210 in parallel, or may execute successively in a set sequence.
  • the device 102 may not execute the foregoing S210. For example, in a scenario where data lossy compression can meet the demand, the device 102 may not perform the foregoing S210.
  • the device 104 restores multiple data according to the target fitting model and the difference.
  • The data acquired by the device 102 can be denoted as y0, the data recovered by the device 102 based on the target fitting model as y1, and the difference between y0 and y1 as Δy.
  • The device 104 may first restore y1 according to the target fitting model. For example, for time series data, the device 104 may substitute time into the expression of the target fitting model to restore y1. It should be noted that for ordered time series data, such as periodic time series data, the device 104 may directly substitute the corresponding times, derived from the agreed collection period, into the expression of the target fitting model to restore y1. For non-ordered time series data, the device 102 can also send the times to the device 104, such as "1, 2, 4, 5". In this way, the device 104 can substitute the times sent by the device 102 into the expression of the target fitting model to restore y1.
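  • The two ways of obtaining the times to substitute into the model can be sketched as follows for a linear target fitting model. The function names restore and periodic_times are hypothetical, and the sketch assumes that the start time of the window is known to the receiver in addition to the agreed collection period.

```python
def restore(a, b, times):
    """Evaluate the linear target fitting model y = a*t + b at each time."""
    return [a * t + b for t in times]


def periodic_times(start, period, count):
    """Rebuild the sampling times from an agreed collection period."""
    return [start + i * period for i in range(count)]


a, b = 2.0, 1.0  # model parameters received from the sender

# Periodic data: the receiver derives the times itself.
periodic = restore(a, b, periodic_times(start=0, period=1, count=4))

# Non-periodic data: the sender transmits the times explicitly.
irregular = restore(a, b, [1, 2, 4, 5])
```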
  • The device 104 may superimpose the corresponding difference Δy on y1, thereby restoring y0 and achieving lossless data recovery.
  • In a scenario where lossy data compression can meet the demand, the device 104 may skip the step of superimposing the corresponding difference Δy on y1 to restore y0. That is, the device 104 can recover the multiple data according to the target fitting model alone.
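  • Lossy and lossless recovery can be sketched together as follows. The sketch assumes integer-valued data and rounds the model output to an integer on both sides, so that superimposing the difference reproduces the original values exactly; with floating-point values, the sum of y1 and Δy can differ from y0 in the last bits. The function names are hypothetical.

```python
def predict(a, b, xs):
    # Rounding makes the sender and the receiver compute identical integers.
    return [round(a * x + b) for x in xs]


def sender_differences(y0, a, b, xs):
    """Sender side: difference between acquired data y0 and model output y1."""
    y1 = predict(a, b, xs)
    return [orig - rec for orig, rec in zip(y0, y1)]


def receiver_restore(a, b, xs, delta_y=None):
    """Receiver side: lossy if delta_y is None, lossless otherwise."""
    y1 = predict(a, b, xs)
    if delta_y is None:
        return y1
    return [rec + d for rec, d in zip(y1, delta_y)]


xs = [0, 1, 2, 3, 4]
y0 = [20, 23, 24, 27, 28]  # acquired data
a, b = 2.0, 20.0           # parameters of the target fitting model

delta_y = sender_differences(y0, a, b, xs)
lossy = receiver_restore(a, b, xs)
lossless = receiver_restore(a, b, xs, delta_y)
```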
  • The embodiment of the present application provides a data processing method in which the device 102, as the data sender, first obtains multiple data, then obtains a target fitting model based on the multiple data, and sends the target fitting model instead of sending the multiple data directly. In this way, the amount of data sent, the requirements on network transmission, and the transmission pressure can all be reduced.
  • the device 104 as the receiver can store the target fitting model, and when data is needed, restore multiple data based on the target fitting model, which also reduces the requirements for data storage, reduces storage pressure, and improves storage performance .
  • the data processing method provided in this application also supports two methods of lossy and lossless compression for users to choose.
  • the device 102 sends the target fitting model, and the device 104 restores the data acquired by the device 102 according to the target fitting model.
  • the device 102 also sends the difference between the data acquired by the device 102 and the data recovered based on the target fitting model, and the device 104 accurately restores the data acquired by the device 102 according to the target fitting model and the aforementioned difference.
  • Because the difference sent by the device 102 is generally a small value, its encoding occupies fewer bits than that of a large value. Therefore, a compression effect can still be obtained and the transmission and storage pressure can be reduced.
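  • The point about small differences needing fewer bits can be illustrated with a short sketch. It assumes integer values and a fixed-width two's-complement encoding sized to the largest value in the batch; the function name signed_bits and the sample numbers are hypothetical.

```python
def signed_bits(values):
    """Bits per value for a fixed-width two's-complement encoding
    that is wide enough for every value in the batch."""
    return max(max(v, -v - 1).bit_length() for v in values) + 1


raw_values = [1012, 1025, 1031]  # acquired data
differences = [1, -2, 3]         # acquired data minus model output

raw_width = signed_bits(raw_values)
diff_width = signed_bits(differences)
```

  • Here the raw values need 12 bits each while the differences need only 3 bits each.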
  • the embodiment of the present application also provides a schematic diagram of a data processing method.
  • the device 102 can acquire multiple data in a time window, and then perform model fitting according to the multiple acquired data to obtain a target fitting model.
  • FIG. 3 takes time series data as the data and a sine model as the target fitting model for illustration. In other possible implementations, other types of data and other types of models may be used, which is not limited in this embodiment.
  • The device 102 may send to the device 104 the target fitting model and the difference between the acquired data and the data recovered based on the target fitting model.
  • the difference is specifically [0.01, 0.12, 0.03,..., 0.06].
  • The device 104 can perform lossy recovery of the data according to the target fitting model. When the required data accuracy is relatively high, the device 104 may also superimpose the aforementioned difference on the result of the lossy recovery, so as to realize lossless recovery of the data.
  • FIG. 3 illustrates collecting one piece of data at each time, that is, each timestamp corresponds to one value.
  • the values of multiple indicators can be collected at a time.
  • the device 102 collects environmental information once every minute, and the environmental information may include two indicators of temperature and humidity.
  • the multiple indicators collected by the device 102 may be independent of each other, or may have an associated relationship.
  • the data can be further compressed through association to further reduce transmission overhead and storage overhead.
  • the device 102 may perform model fitting only on multiple data corresponding to one indicator.
  • The indicator used for model fitting can be called the basic indicator.
  • The device 102 may determine the association relationship between the other indicators among the multiple indicators and the foregoing basic indicator according to the data of the multiple indicators.
  • the association can be characterized by the relation function. For example, when the relationship is linear, it can be characterized by a linear function.
  • the device 104 may determine another target fitting model according to the relationship function and the target fitting model. Specifically, the device 104 substitutes the expression of the target fitting model into the above-mentioned relation function, thereby obtaining the expression of another target fitting model. Then, the device 104 may restore other multiple data associated with the multiple data according to another target fitting model, that is, multiple data corresponding to other indicators associated with the basic indicator. Wherein, the specific implementation of the device 104 restoring the associated multiple data according to another target fitting model is similar to that of restoring multiple data according to one target model. For details, please refer to the description of related content above, which will not be repeated here.
  • this application also provides a specific example to illustrate the process of restoring multiple pieces of associated data.
  • the device 102 may determine the association relationship between the indicators according to multiple data corresponding to each indicator.
  • the multiple data of the 8 indicators in the time window can be divided into two groups, the indicators within the group have an association relationship, and the indicators between the groups do not have an association relationship.
  • The device 102 may perform model fitting on the multiple data corresponding to one indicator in each group of indicators to obtain a corresponding target fitting model. The device 102 then sends the target fitting model and the relationship functions representing the association relationships to the device 104. As shown in FIG. 4, the device 102 sends two target fitting models and three relationship functions associated with each target fitting model. The device 104 can restore the multiple data corresponding to the fitted indicator according to the target fitting model. In addition, the device 104 may determine, according to the target fitting model and a relationship function, the target fitting model corresponding to an associated indicator, and restore the associated multiple data based on that target fitting model.
  • the data processing method of the present application will be introduced by taking the data processing of the CPU usage rate and the memory usage rate as an example.
  • the device 102 collects the CPU usage rate and the memory usage rate according to the collection period of 1 second. In this example, the device 102 needs to collect data within 10 minutes and send the data within the 10 minutes to the device 104. In the traditional method, each time the device 102 collects a piece of data, its data and corresponding metadata are sent to the device 104.
  • The metadata sent by the device 102 includes the project identifier project_id of the project to which the CPU belongs, the cluster identifier cluster_id, the service name service_name, the service instance name service_instance_name, and the metric name metric_name. Since the data sent by the device 102 is time series data, the timestamp timestamp and the data value value must also be sent. Based on this, the data of an indicator at a point in time may be a data block composed of the above metadata and data. To send the data of one indicator over 10 minutes, the device 102 therefore sends 600 data blocks.
  • the metadata corresponding to an indicator at different times is often the same.
  • the metadata of multiple data can be aggregated. Therefore, the device 102 only needs to send one data block.
  • the data part in the data block may include a start timestamp timestamp_from and an end timestamp timestamp_to.
  • the data value is also changed from a single value to an array of values corresponding to multiple moments. Since there is no need to repeatedly send repeated metadata, the device 104 does not need to store the repeated metadata, thereby reducing transmission pressure and storage pressure.
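The aggregation described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the dictionary layout, the `aggregate_blocks` helper, and the sample metadata values are assumptions; only the field names (project_id, cluster_id, service_name, service_instance_name, metric_name, timestamp_from, timestamp_to, value) come from the text.

```python
from __future__ import annotations

from typing import Any


def aggregate_blocks(blocks: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge per-sample blocks that share identical metadata into one block."""
    if not blocks:
        raise ValueError("no blocks to aggregate")
    metadata = blocks[0]["metadata"]
    if any(b["metadata"] != metadata for b in blocks):
        raise ValueError("blocks must share the same metadata")
    ordered = sorted(blocks, key=lambda b: b["timestamp"])
    return {
        "metadata": metadata,                        # sent once instead of 600 times
        "timestamp_from": ordered[0]["timestamp"],   # start of the covered range
        "timestamp_to": ordered[-1]["timestamp"],    # end of the covered range
        "value": [b["value"] for b in ordered],      # one value per sampling time
    }


# Hypothetical metadata values; one sample per second for 10 minutes.
metadata = {
    "project_id": "p1",
    "cluster_id": "c1",
    "service_name": "svc",
    "service_instance_name": "svc-0",
    "metric_name": "cpu",
}
blocks = [{"metadata": metadata, "timestamp": t, "value": 0.5} for t in range(600)]
block = aggregate_blocks(blocks)
```

The 600 per-sample blocks collapse into a single block whose value field is an array of 600 entries.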
  • The device 102 can perform model fitting based on the multiple acquired data. Specifically, using the aggregated array as training data, it fits each of the preset fitting models, such as a linear model, a polynomial model, and a neural network model, calculates the loss value of each fitting model, and determines the target fitting model.
  • the device 102 only needs to record the model identification and model parameters in the value field. In this way, the value field can be reduced from an array including 60 data to an array including 3 data, which further reduces the amount of data and reduces transmission pressure and storage pressure.
  • the device 104 receives the above-mentioned data and stores it.
  • When the device 104 needs some of the CPU usage data, it can read the model identifier and model parameters from the value field, restore the target fitting model, and then restore at least one of the multiple CPU usage data based on the target fitting model.
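A data block whose value field holds only a model identifier and model parameters, and its restoration on the receiving side, might look like the sketch below. The `MODELS` registry, the `restore` helper, and the convention that the model input x is the offset in seconds from timestamp_from are all assumptions made for illustration.

```python
# Model IDs assumed to be agreed in advance by both devices (hypothetical registry).
MODELS = {
    1: lambda x, a, b: a * x + b,                  # linear model
    2: lambda x, a, b, c: a * x * x + b * x + c,   # quadratic polynomial model
}


def restore(block, timestamps=None):
    """Restore all values in the block's time range, or only the requested timestamps."""
    model_id, *params = block["value"]
    model = MODELS[model_id]
    if timestamps is None:
        timestamps = range(block["timestamp_from"], block["timestamp_to"] + 1)
    # Assumption: x is the offset from the start of the time window.
    return [model(t - block["timestamp_from"], *params) for t in timestamps]


block = {
    "metadata": {"metric_name": "cpu"},
    "timestamp_from": 1000,
    "timestamp_to": 1059,
    "value": [1, 0.5, 20.0],   # model ID 1 (linear), a = 0.5, b = 20.0
}
all_values = restore(block)          # every sample in the window
one_value = restore(block, [1010])   # a single sample on demand
```

Because the receiver evaluates the model only at the timestamps it needs, it can restore one value without materializing the whole window.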
  • When there is an association between memory usage and CPU usage, the data to be sent can be further compressed. Specifically, for the associated indicator, memory usage, the device 102 only needs to send a data block composed of simplified metadata and simplified data.
  • The simplified metadata includes the indicator name metric_name, which is memory in this example; the other metadata of the data can be obtained from the associated indicator, which is also called the reference indicator.
  • The simplified data includes a reference indicator (reference, ref) field and a value field.
  • The field value of the reference indicator field is CPU, and the value field is used to identify the relationship function between the indicator and the reference indicator.
  • The value field consists of several numerical values, including the function identifier and the function parameters.
  • When the device 104 needs some of the memory usage data, it can read the function identifier and function parameters from the value field to restore the relationship function between memory usage and CPU usage, restore the target fitting model of memory usage based on the relationship function and the target fitting model of CPU usage, and then restore at least one of the multiple memory usage data based on that target fitting model.
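The restoration of an associated indicator through its reference indicator can be sketched as follows. The `MODELS` and `RELATIONS` registries, the block layout, and the numeric parameters are hypothetical; the sketch only assumes, as in the text, that a simplified block carries a ref field plus a value field holding a function identifier and function parameters.

```python
# Hypothetical registries of pre-agreed fitting models and relationship functions.
MODELS = {1: lambda x, a, b: a * x + b}      # ID 1: linear fitting model
RELATIONS = {1: lambda y, c, d: c * y + d}   # ID 1: linear relationship function

blocks = {
    "cpu": {"value": [1, 0.5, 20.0]},                  # model ID, a, b
    "memory": {"ref": "cpu", "value": [1, 2.0, 10.0]}, # function ID, c, d
}


def build_model(name):
    """Return a callable model for the indicator, composing through ref if present."""
    block = blocks[name]
    ident, *params = block["value"]
    if "ref" not in block:
        return lambda x: MODELS[ident](x, *params)
    base = build_model(block["ref"])   # target fitting model of the reference indicator
    # Substituting the base model into the relationship function yields
    # the target fitting model of the associated indicator.
    return lambda x: RELATIONS[ident](base(x), *params)


memory_model = build_model("memory")
memory_at_4 = memory_model(4)   # CPU model gives 22.0, relation maps it to 54.0
```

Only the three numbers in the memory block's value field are transmitted for the associated indicator; everything else is derived from the CPU block.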
  • the data processing device 300 includes:
  • the communication module 302 is used to obtain multiple data
  • the fitting module 304 is configured to obtain a target fitting model based on the multiple data
  • the communication module 302 is further configured to send the target fitting model to the second device, where the target fitting model is used to restore at least one data of the plurality of data.
  • the fitting module 304 is specifically configured to:
  • Model fitting is performed on multiple fitting models according to the multiple data, so as to select one fitting model from the multiple fitting models as the target fitting model.
  • the multiple fitting models include at least two of a linear model, a polynomial model, and a neural network model.
  • the communication module 302 is specifically configured to:
  • the identification of the fitting model selected by the fitting module 304 and the corresponding model parameters are sent to the second device.
  • the communication module 302 is further configured to:
  • the relationship function associated with the target fitting model is sent to the second device, where the relationship function is used to determine another target fitting model based on the target fitting model, and the other target fitting model is used to restore at least one of the other data associated with the plurality of data.
  • the communication module is also used to:
  • The data processing device 300 may correspond to the device that performs the method described in the embodiment of the present application, and the above and other operations and/or functions of each module in the data processing device 300 respectively implement the corresponding procedures of the method in FIG. 2. For the sake of brevity, they are not repeated here.
  • the data processing device 400 includes:
  • the communication module 402 is configured to receive a target fitting model from a first device, where the target fitting model is obtained by the first device based on a plurality of acquired data;
  • the restoration module 404 is configured to restore at least one of the multiple data according to the target fitting model.
  • the communication module 402 is further configured to:
  • the device 400 further includes:
  • a determining module configured to determine another target fitting model according to the relationship function and the target fitting model
  • the recovery module 404 is also used for:
  • At least one of the other data associated with the plurality of data is restored according to the another target fitting model.
  • the communication module is also used to:
  • the communication module 402 is configured to receive a difference value from a first device, where the difference value is the difference between the plurality of data acquired by the first device and the plurality of data recovered by the first device based on the target fitting model;
  • the recovery module 404 is specifically used for:
  • At least one data of the plurality of data is restored according to the target fitting model and the difference value.
  • The data processing apparatus 400 may correspond to the apparatus that performs the method described in the embodiment of the present application, and the above-mentioned and other operations and/or functions of each module in the data processing apparatus 400 respectively implement the corresponding procedures of the method in FIG. 2. For the sake of brevity, they are not repeated here.
  • Figures 8-9 also provide a device.
  • The device 500 shown in FIG. 8 may be specifically used to implement the functions of the data processing apparatus 300 in the embodiment shown in FIG. 6, and the device 600 shown in FIG. 9 may be specifically used to implement the functions of the data processing apparatus 400 in the embodiment shown in FIG. 7.
  • the device 500 includes a bus 501, a processor 502, a communication interface 503, and a memory 504.
  • the processor 502, the memory 504, and the communication interface 503 communicate through a bus 501.
  • The bus 501 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in FIG. 8, but it does not mean that there is only one bus or one type of bus.
  • the communication interface 503 is used to communicate with the outside, such as acquiring multiple data, sending a target fitting model to the second device, and so on.
  • the processor 502 may be a central processing unit (CPU).
  • the memory 504 may include a volatile memory (volatile memory), such as a random access memory (random access memory, RAM).
  • the memory 504 may also include a non-volatile memory (non-volatile memory), such as a read-only memory (ROM), flash memory, HDD or SSD.
  • the memory 504 stores executable code, and the processor 502 executes the executable code to execute the aforementioned data processing method.
  • When each module described in the embodiment of FIG. 6 is implemented by software, the software or program code required to execute the function of the fitting module 304 in FIG. 6 is stored in the memory 504.
  • the function of the communication module 302 is implemented through the communication interface 503.
  • the processor 502 is configured to execute instructions in the memory 504 and execute a data processing method applied to the data processing device 300.
  • the device 600 includes a bus 601, a processor 602, a communication interface 603, and a memory 604.
  • the processor 602, the memory 604, and the communication interface 603 communicate through a bus 601.
  • When the device 600 implements the embodiment shown in FIG. 7 and each module described in the embodiment of FIG. 7 is implemented by software, the software or program code required to execute the function of the recovery module 404 in FIG. 7 is stored in the memory 604.
  • the function of the communication module 402 is implemented through the communication interface 603.
  • the processor 602 is configured to execute instructions in the memory 604, and execute a data processing method applied to the data processing device 400.
  • the embodiment of the present application also provides a computer-readable storage medium, including instructions, which when run on a device, cause the device to execute the above-mentioned data processing method applied to the data processing apparatus 300 or the data processing apparatus 400.
  • the embodiments of the present application also provide a computer program product.
  • When the computer program product is executed by a computer, the computer executes any one of the aforementioned data processing methods.
  • the computer program product may be a software installation package. In the case where any of the aforementioned data processing methods needs to be used, the computer program product may be downloaded and executed on the computer.

Abstract

The present application provides a data processing method. Said method comprises: a first device acquiring a plurality of pieces of data, then the first device obtaining a target fitting model on the basis of the plurality of pieces of data, and then the first device sending the target fitting model to a second device, the target fitting model being used to recover at least one of the plurality of pieces of data. In this way, the amount of data to be transmitted and the amount of data to be stored by a second device can be reduced, thereby relieving the data transmission pressure and the data storage pressure.

Description

Data processing method, apparatus, device, and medium
This application claims priority to Chinese patent application No. 202010076999.3, filed with the China National Intellectual Property Administration on January 23, 2020 and entitled "Data processing method, apparatus, device, and medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of computer technology, and in particular to a data processing method, apparatus, device, and computer-readable storage medium.
Background
With the advent of the information age, massive amounts of data are being produced. To make full use of this data, it often needs to be transmitted and stored. Taking the Internet of Things (IoT) as an example, more and more devices, such as TVs, air conditioners, speakers, routers, and cameras, can be independently addressed to form an interconnected network, which enables intelligent identification, positioning, tracking, monitoring, and management of the devices.
To realize intelligent perception, identification, positioning, tracking, monitoring, or management, IoT devices generate a large amount of time series data. Time series data refers to indicator data collected over time, such as resource utilization, electrocardiograms, and stock prices. In practical applications, IoT devices can generate hundreds of millions of time series records every minute, which puts considerable pressure on data transmission and storage.
Based on this, the industry urgently needs a data processing method that relieves the pressure of data transmission and storage.
Summary
This application provides a data processing method that compresses data by performing model fitting on it, which relieves the pressure of data transmission and storage. This application also provides a corresponding apparatus, device, computer-readable storage medium, and computer program product.
In a first aspect, this application provides a data processing method. Specifically, a first device, acting as the data sender, acquires multiple data, obtains a target fitting model based on the multiple data, and then sends the target fitting model to a second device instead of sending the multiple data directly. The second device can restore at least one of the multiple data through the target fitting model as needed. In this way, the amount of data transmitted and the amount of data stored by the second device are reduced, which relieves data transmission pressure and data storage pressure.
In some possible implementations, the first device may obtain the target fitting model as follows. Specifically, the first device may perform model fitting on multiple fitting models according to the multiple data, and select one of the multiple fitting models as the target fitting model.
The multiple fitting models may be models negotiated in advance by the first device and the second device. The first device may use the multiple data as training data to train the multiple fitting models until a training end condition is met. The training end condition may be that the number of parameter iterations reaches a maximum, or that the loss value determined based on the loss function is less than a preset value. After training ends, the first device determines the target fitting model according to the loss value of each fitting model.
In a specific implementation, the first device may take the fitting model with the smallest loss value as the target fitting model, take a fitting model whose loss value is less than a preset value as the target fitting model, or take a fitting model whose loss value tends to converge as the target fitting model.
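The selection of the fitting model with the smallest loss value can be sketched as below. Several simplifications are assumed here: only polynomial candidates are fitted (a linear model is a degree-1 polynomial), a closed-form least-squares solve replaces the iterative training described above, mean squared error is used as the loss, and the neural network candidate is omitted. The function names and the ID-to-degree mapping are hypothetical.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients ordered from the constant term upward.
    """
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the Vandermonde matrix A.
    m = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def mse(coeffs, xs, ys):
    """Mean squared error, used here as the loss value."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_model(xs, ys, candidates):
    """candidates maps model ID -> polynomial degree; pick the smallest loss."""
    fitted = []
    for model_id, degree in candidates.items():
        coeffs = polyfit(xs, ys, degree)
        fitted.append((mse(coeffs, xs, ys), model_id, coeffs))
    loss, model_id, coeffs = min(fitted)
    return model_id, coeffs, loss


xs = list(range(10))
ys = [x * x + 1 for x in xs]   # data that follows a quadratic curve
model_id, coeffs, loss = select_model(xs, ys, {1: 1, 2: 2})
```

On this quadratic data the degree-2 candidate wins; the other selection rules mentioned above (loss below a preset value, or loss that has converged) would replace the `min` step.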
In some possible implementations, the multiple fitting models may be at least two of a linear model, a polynomial model, and a neural network model. Considering the applicable scope of each model, the multiple fitting models may include a linear model, a polynomial model, and a neural network model. Fitting more models in this way can improve the fitting accuracy, which helps improve the accuracy of data transmission.
In some possible implementations, when sending the target fitting model to the second device, the first device may send the identifier of the fitting model it selected together with the corresponding model parameters. Specifically, when the first device and the second device negotiate the models, they may agree on model identifiers (model IDs) in advance. For example, the model ID of the linear model may be 1, the model ID of the polynomial model may be 2, and the model ID of the neural network model may be 3. A model can generally be characterized by a function, and the model parameters are the parameters of the function expression.
Taking the linear model as an example, its function expression is y=ax+b, where x is the independent variable, y is the function value, and a and b are the parameters of the linear model. When the target fitting model is a linear model, the first device may send the fitting model identifier “1” and the model parameters “a” and “b” to the second device.
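To give a feel for the size reduction, the sketch below packs a model ID and its parameters into a byte string and compares it with 60 raw samples. The wire format (one unsigned byte for the ID followed by little-endian 64-bit floats) and the `PARAM_COUNT` table are assumptions for illustration; the application does not specify an encoding.

```python
import struct

# Hypothetical table: model ID -> number of parameters (linear model: a and b).
PARAM_COUNT = {1: 2}


def encode_model(model_id, params):
    """Pack the model ID as one byte followed by the parameters as doubles."""
    return struct.pack(f"<B{len(params)}d", model_id, *params)


def decode_model(payload):
    """Recover the model ID and parameter list from the packed bytes."""
    model_id = payload[0]
    count = PARAM_COUNT[model_id]
    params = struct.unpack_from(f"<{count}d", payload, 1)
    return model_id, list(params)


payload = encode_model(1, [0.5, 20.0])   # linear model y = 0.5x + 20
raw_size = struct.calcsize("<60d")       # 60 samples sent as raw doubles
model_size = len(payload)
```

Under these assumptions the model payload takes 17 bytes, against 480 bytes for the 60 raw samples of a one-minute window.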
Sending the model identifier and parameters further reduces the amount of data transmitted and stored, reduces transmission and storage resource overhead, and saves costs.
In some possible implementations, when the multiple data acquired by the first device are associated with other data, the first device may also determine the association relationship between the multiple data and the other data. The association relationship can be characterized by a relationship function. Based on this, the first device may send the relationship function associated with the target fitting model to the second device. The relationship function can be used to determine another target fitting model based on the target fitting model, and the other target fitting model can then be used to restore at least one of the other data associated with the multiple data.
Sending the relationship function instead of directly sending the other data associated with the multiple data further reduces the amount of data transmitted and stored, improves compression performance, and relieves transmission pressure and storage pressure.
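As a worked illustration of how a relationship function yields another target fitting model, assume both the target fitting model and the relationship function are linear. The symbols z, c, and d are introduced here for illustration only; y=ax+b is the linear model used in the example above.

```latex
\begin{aligned}
y &= a x + b && \text{(target fitting model of the fitted data)} \\
z &= c\,y + d && \text{(linear relationship function)} \\
z &= c\,(a x + b) + d = (c a)\,x + (c b + d) && \text{(another target fitting model)}
\end{aligned}
```

Under this assumption the other target fitting model is again linear, with slope ca and intercept cb+d, so only c and d need to be sent for the associated data.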
In some possible implementations, the first device may also send a difference value to the second device. The difference value is the difference between the multiple data acquired by the first device and the multiple data restored by the first device based on the target fitting model. The difference value can be used together with the target fitting model to restore at least one of the multiple data. Superimposing the difference value on the data restored from the target fitting model achieves lossless data recovery and improves the accuracy of the transmitted data.
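The sender-side computation of the difference values can be sketched as follows. To keep the round trip exact, this sketch assumes the samples are fixed-point integers and the model prediction is rounded to the same grid; that choice, the helper names, and the sample numbers are assumptions, not part of the application.

```python
def compute_differences(samples, a, b):
    """Sample minus rounded linear prediction for positions x = 0, 1, 2, ..."""
    return [s - round(a * x + b) for x, s in enumerate(samples)]


def bits_needed(values):
    """Bits for the largest magnitude in values (sign bit not counted)."""
    return max(abs(v) for v in values).bit_length()


# Hypothetical fixed-point samples (e.g. hundredths of a percent).
samples = [2000, 2051, 2098, 2153, 2199]
a, b = 50.0, 2000.0                      # parameters of the fitted linear model
diffs = compute_differences(samples, a, b)
sample_bits = bits_needed(samples)
diff_bits = bits_needed(diffs)
```

The differences stay close to zero when the model fits well, so they need far fewer bits than the raw samples, which is why a compression effect remains even in the lossless mode.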
In some possible implementations, the multiple data may be multiple time series data within a time window. The time window can be set according to actual needs, for example, to one minute. Time series data is a sequence of values of the same indicator recorded in chronological order. In one example, if the time window is one minute and the collection period is one second, 60 data can be collected within one time window.
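Grouping collected samples into time windows before fitting can be sketched as below. The `split_into_windows` helper and the convention of aligning each window to a multiple of its length are assumptions for illustration.

```python
def split_into_windows(samples, window_seconds=60):
    """Group (timestamp, value) samples by the time window they fall in."""
    windows = {}
    for ts, value in samples:
        start = ts - ts % window_seconds   # window start, aligned to the window length
        windows.setdefault(start, []).append((ts, value))
    return windows


# One sample per second for two minutes.
samples = [(t, 0.0) for t in range(120)]
windows = split_into_windows(samples)
```

With a one-second collection period and a one-minute window, each window holds 60 samples, which then serve as the training data for one target fitting model.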
In a second aspect, this application provides a data processing method. Specifically, a second device, acting as the data receiver, receives a target fitting model from a first device, where the target fitting model is obtained by the first device based on multiple acquired data, and the second device restores at least one of the multiple data according to the target fitting model.
Compared with directly receiving the multiple data, receiving and storing the target fitting model greatly relieves data transmission pressure and data storage pressure. Moreover, the second device can choose to restore at least one of the multiple data, so that the individual needs of different users can be met.
In some possible implementations, the second device may also receive a relationship function associated with the target fitting model and determine another target fitting model based on the relationship function and the target fitting model. The second device can then restore, according to the other target fitting model, at least one of the other data associated with the multiple data.
Compared with directly receiving the multiple data of the associated indicator, receiving the relationship function associated with the target fitting model further reduces the amount of data transmitted and stored, thereby relieving data transmission pressure and data storage pressure.
In some possible implementations, the second device may choose lossless recovery or lossy recovery of the data as needed. Lossless recovery can be chosen for data with high accuracy requirements, and lossy recovery can be chosen for data with relatively low accuracy requirements, which saves costs.
In a specific implementation, restoring the data directly from the target fitting model is lossy recovery. When the second device also receives a difference value from the first device, where the difference value is the difference between the multiple data acquired by the first device and the multiple data restored by the first device based on the target fitting model, the second device can restore at least one of the multiple data according to the target fitting model and the difference value. Specifically, the second device may first restore the data according to the target fitting model and then superimpose the difference value on the result to obtain the original data, thereby achieving lossless data recovery.
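The receiver-side choice between lossy and lossless recovery can be sketched as follows. The sketch assumes a linear target fitting model, fixed-point integer data, and predictions rounded to the same grid so that adding the differences reproduces the original samples exactly; the `restore` signature and the numbers are hypothetical.

```python
def restore(model_params, count, diffs=None):
    """Restore `count` samples from a linear model; add diffs for lossless recovery."""
    a, b = model_params
    predicted = [round(a * x + b) for x in range(count)]
    if diffs is None:
        return predicted                  # lossy: model output only
    if len(diffs) != count:
        raise ValueError("one difference value is required per sample")
    return [p + d for p, d in zip(predicted, diffs)]   # lossless: model + difference


params = (50.0, 2000.0)                   # a and b received from the first device
lossy = restore(params, 5)
lossless = restore(params, 5, diffs=[0, 1, -2, 3, -1])
```

A receiver with loose accuracy requirements can ignore the differences entirely, while one with strict requirements applies them on top of the same model output.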
In a third aspect, this application provides a data processing apparatus, which includes:
a communication module, configured to acquire multiple data;
a fitting module, configured to obtain a target fitting model based on the multiple data;
the communication module is further configured to send the target fitting model to a second device, where the target fitting model is used to restore at least one of the multiple data.
In some possible implementations, the fitting module is specifically configured to:
perform model fitting on multiple fitting models according to the multiple data, so as to select one of the multiple fitting models as the target fitting model.
In some possible implementations, the multiple fitting models include at least two of a linear model, a polynomial model, and a neural network model.
In some possible implementations, the communication module is specifically configured to:
send the identifier of the fitting model selected by the fitting module and the corresponding model parameters to the second device.
In some possible implementations, the communication module is further configured to:
send a relationship function associated with the target fitting model to the second device, where the relationship function is used to determine another target fitting model based on the target fitting model, and the other target fitting model is used to restore at least one of the other data associated with the multiple data.
In some possible implementations, the communication module is further configured to:
send a difference value to the second device, where the difference value is used together with the target fitting model to restore at least one of the multiple data, and the difference value is the difference between the multiple data acquired by the first device and the multiple data restored by the first device based on the target fitting model.
In some possible implementations, the multiple data include multiple time series data within a time window.
In a fourth aspect, this application provides a data processing apparatus, the apparatus including:
a communication module, configured to receive a target fitting model from a first device, where the target fitting model is obtained by the first device based on a plurality of acquired data; and
a restoration module, configured to restore at least one of the plurality of data according to the target fitting model.
In some possible implementations, the communication module is further configured to:
receive a relationship function associated with the target fitting model;
the apparatus further includes:
a determining module, configured to determine another target fitting model according to the relationship function and the target fitting model; and
the restoration module is further configured to:
restore, according to the other target fitting model, at least one of other data associated with the plurality of data.
In some possible implementations, the communication module is further configured to:
receive a difference from the first device, where the difference is the difference between the plurality of data acquired by the first device and the plurality of data restored by the first device based on the target fitting model; and
the restoration module is specifically configured to:
restore at least one of the plurality of data according to the target fitting model and the difference.
In a fifth aspect, this application provides a device, the device including a processor and a memory;
the processor is configured to execute instructions stored in the memory, so that the processor performs the data processing method according to the first aspect or the second aspect.
In a sixth aspect, this application provides a computer-readable storage medium including instructions that, when run on a device, cause the device to perform the data processing method according to the first aspect or the second aspect.
In a seventh aspect, this application provides a computer program product containing instructions that, when run on a device, cause the device to perform the data processing method according to the first aspect or the second aspect.
The implementations provided in the above aspects may be further combined to provide additional implementations.
Description of the Drawings
To describe the technical solutions of the embodiments of this application more clearly, the drawings used in the embodiments are briefly introduced below.
FIG. 1 is a schematic diagram of a system architecture of a data processing method provided by an embodiment of this application;
FIG. 2 is an interaction flowchart of a data processing method provided by an embodiment of this application;
FIG. 3 is a schematic diagram of a data processing method provided by an embodiment of this application;
FIG. 4 is a schematic diagram of a data processing method provided by an embodiment of this application;
FIG. 5 is a schematic diagram of a data processing method provided by an embodiment of this application;
FIG. 6 is a schematic structural diagram of a data processing apparatus provided by an embodiment of this application;
FIG. 7 is a schematic structural diagram of a data processing apparatus provided by an embodiment of this application;
FIG. 8 is a schematic structural diagram of a device provided by an embodiment of this application;
FIG. 9 is a schematic structural diagram of a device provided by an embodiment of this application.
Detailed Description
The embodiments of this application are described below with reference to the drawings.
The terms "first", "second", and the like in the specification, the claims, and the above drawings of this application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; they are merely a way of distinguishing objects with the same attributes when describing the embodiments of this application.
FIG. 1 shows a possible application scenario of an embodiment of this application. In this scenario, the device 102 and the device 104 are network node devices. The device 102 may be a network node device that sends data, and the device 104 is a network node device that receives the data. The data sent by the device 102 may be generated by the device 102 itself or obtained by the device 102 from other devices. In some implementations, the device 102 may also act as the receiving device and the device 104 as the sending device. This application uses the case in which the device 102 sends data to the device 104 only as an example, which does not limit the technical solution of this application.
As shown in FIG. 1, the device 102 may be a mobile terminal device such as a smartphone, a smart band, or a tablet computer. Such a device may monitor any one or more indicators, such as position and heartbeat, through a position sensor, a heartbeat sensor, and the like, to generate monitoring data. In some embodiments, the device 102 may also be a device on which a cloud application is deployed, and it may monitor any one or more indicators of the application, such as throughput and latency, to generate monitoring data. The device 102 may also be an end-side device of the Internet of things (IoT), such as a smart refrigerator or a smart light bulb, and it may monitor any one or more indicators, such as temperature and brightness, to generate monitoring data.
Correspondingly, the device 104 may be a data relay device. For example, in a cloud service scenario, the device 104 may be an edge device of the cloud, such as a gateway (GW) device or a switch. In some embodiments, the device 104 may also be a device that consumes the data, such as a server.
If the device 102 sends the collected data directly to the device 104, a large data volume requires substantial bandwidth on the one hand and sufficient storage space on the device 104 on the other, which places high demands on both network transmission and storage. For this reason, this application provides a data processing method in which the device 102, as the data sender, first acquires multiple pieces of data, then obtains a target fitting model based on them, and sends the target fitting model instead of sending the data directly. This reduces the amount of data sent, lowers the requirements on network transmission, and relieves transmission pressure. The receiver can store the target fitting model and, when the data is needed, restore at least one of the multiple pieces of data based on the target fitting model. This also lowers the requirements on data storage, relieves storage pressure, and improves storage performance.
To make the technical solution of this application clearer and easier to understand, the data processing method provided by the embodiments of this application is described in detail below from the perspective of the interaction between the device 102 and the device 104.
Referring to the interaction flowchart of the data processing method shown in FIG. 2, the method includes:
S202: The device 102 acquires multiple pieces of data.
In a specific implementation, the device 102 may generate the multiple pieces of data itself or obtain them from other devices. The data acquired by the device 102 (whether generated by the device 102 or obtained from other devices) may be time series data, that is, the values of an indicator over a time period. The data may also be non-time-series data, such as location-related data.
For ease of understanding, specific examples are given below.
In one example, the device 102 monitors one or more indicators of a virtual machine, such as central processing unit (CPU) usage and memory usage, and the acquired data is the CPU usage and/or memory usage of the virtual machine over a time period. Alternatively, the device 102 monitors one or more indicators of an application, such as throughput and latency, and the acquired data is the throughput and/or latency of the application over a time period.
It should be noted that the CPU usage, memory usage, throughput, and latency over a time period described in the embodiments of this application refer to the values of these indicators at different time points within that period. Accordingly, the device 102 and the device 104 may negotiate the data collection rules in advance. For example, when the data is time series data, the device 102 and the device 104 may negotiate the collection period, that is, the collection time interval, in advance. The device 102 may also send the collection rules, such as the collection period, to the device 104 when it sends the data.
In another example, the data acquired by the device 102 is generated by monitoring indicators such as rainfall at different locations. This data is non-time-series data and varies with geographic location. As with time series data, the device 102 and the device 104 may negotiate the collection location interval in advance. For example, the collection location interval may be measured in longitude and latitude, such as one degree of longitude or one degree of latitude. In some possible implementations, the collection location interval may also be measured by length or height, such as 5 kilometers (km).
When data is transmitted, the metadata that describes the data usually has to be transmitted as well. Metadata is data that describes data and may be at least one attribute of the data. For example, for CPU usage data, the metadata may include the indicator name, the project identifier of the project it belongs to, the service name, and so on. The metadata of multiple pieces of data for the same indicator is usually identical. Therefore, compared with transmitting each piece of data together with its own metadata, acquiring multiple pieces of data allows the metadata to be aggregated so that only one copy needs to be transmitted. This already achieves a degree of data compression.
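The metadata aggregation described above can be illustrated with a minimal Python sketch. The record layout and the field names (`meta`, `value`, `metric`, `project_id`, `service`) are hypothetical choices made for this illustration; the application does not prescribe a message format.

```python
import json
from typing import Dict, List


def aggregate(samples: List[dict]) -> List[dict]:
    """Group samples with identical metadata so the metadata is kept only once."""
    groups: Dict[str, dict] = {}
    for sample in samples:
        # Serialize the metadata to obtain a hashable grouping key.
        key = json.dumps(sample["meta"], sort_keys=True)
        group = groups.setdefault(key, {"meta": sample["meta"], "values": []})
        group["values"].append(sample["value"])
    return list(groups.values())


# Hypothetical metadata for a CPU usage indicator.
meta = {"metric": "cpu_usage", "project_id": "p-001", "service": "web"}
samples = [{"meta": meta, "value": v} for v in (31, 33, 32, 35)]

aggregated = aggregate(samples)
raw_size = len(json.dumps(samples))      # metadata repeated for every sample
agg_size = len(json.dumps(aggregated))   # metadata transmitted once
```

Here the four samples collapse into one record carrying a single copy of the metadata, so `agg_size` is smaller than `raw_size`.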
After acquiring multiple pieces of data, the device 102 may process them further, for example by compressing them, which reduces transmission and storage overhead. In a specific implementation, the device 102 may acquire data according to a preconfigured acquisition quantity N, that is, the device 102 acquires N pieces of data at a time. When the acquired data is time series data, the device 102 may acquire the time series data within a time window Δt. The acquisition quantity N or the time window Δt may be set according to empirical values, which is not limited in the embodiments of this application.
The data in the embodiments of this application may be numerical values. In some cases, the data may also be characters, such as the characters A, B, C, and D that represent grades.
S204: The device 102 obtains a target fitting model based on the multiple pieces of data.
In a specific implementation, the device 102 may perform model fitting on the multiple pieces of data and determine a fitting model that satisfies a preset condition as the target fitting model. In some implementations, to reduce the computation required for model fitting and improve its efficiency, the device 102 and the device 104 may negotiate multiple fitting models in advance. The device 102 then fits these models to the data and selects one of them as the target fitting model.
The fitting models negotiated by the device 102 and the device 104 may be models that are easy to fit. In some implementations, the fitting models may be at least two of a linear model, a polynomial model, and a neural network model. The neural network model may be a three-layer neural network model, which reduces model complexity and makes the model easier to fit.
When fitting the multiple fitting models to the data, the device 102 may calculate a loss value for each fitting model from the acquired data and the data produced by the fitted model, and then select one fitting model as the target fitting model according to the loss values.
Specifically, the device 102 may mark the multiple pieces of data as training data, train each fitting model on the training data with a loss function, and optimize the model parameters until a training end condition is satisfied. The training end condition may be that the number of parameter iterations reaches a maximum, or that the loss value determined by the loss function falls below a preset value. After training ends, the device 102 determines the target fitting model according to the loss value of each fitting model.
In some possible implementations, the device 102 may use the fitting model with the smallest loss value as the target fitting model, or a fitting model whose loss value is less than a preset value, or a fitting model whose loss value tends to converge.
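The selection step in S204 can be sketched in Python as follows. This is only an illustration under stated assumptions: the candidates are limited to a linear model and a degree-2 polynomial (no neural network), fitting is done by closed-form least squares rather than iterative training, the loss is mean squared error, and the names `fit_poly`, `select_model`, and `preset_loss` are hypothetical. The sketch tries candidates in order of increasing complexity and returns the first whose loss is below the preset value, falling back to the smallest loss otherwise.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_poly(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial fit; coefficients are returned lowest power first."""
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(ata, aty)


def predict(coeffs: Sequence[float], x: float) -> float:
    return sum(c * x ** i for i, c in enumerate(coeffs))


def mse(coeffs: Sequence[float], xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error between the acquired data and the model output."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical pre-negotiated candidates: model_ID -> polynomial degree.
CANDIDATES = {1: 1, 2: 2}  # 1 = linear model, 2 = degree-2 polynomial model


def select_model(xs: Sequence[float], ys: Sequence[float],
                 preset_loss: float = 1e-6) -> Tuple[int, List[float], float]:
    """Return (model_ID, parameters, loss) of the selected target fitting model."""
    results = []
    for model_id, degree in CANDIDATES.items():
        params = fit_poly(xs, ys, degree)
        loss = mse(params, xs, ys)
        if loss < preset_loss:  # first candidate that is good enough
            return model_id, params, loss
        results.append((loss, model_id, params))
    loss, model_id, params = min(results)  # otherwise take the smallest loss
    return model_id, params, loss
```

Trying the simpler model first matters because a higher-degree polynomial never has a larger least-squares loss than a lower-degree one on the same data, so a pure smallest-loss rule would tend to pick the more complex model.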
S206: The device 102 determines the differences between the multiple pieces of data it acquired and the multiple pieces of data it restores based on the target fitting model.
Specifically, the data acquired by the device 102 and the data it restores based on the target fitting model are in one-to-one correspondence. For example, if the device 102 acquires N pieces of data, it can restore N pieces of data based on the target fitting model. For each piece of acquired data, the device 102 subtracts the corresponding data restored by the target fitting model, thereby obtaining multiple differences. The differences are used for lossless data recovery, that is, the data acquired by the device 102 can be restored exactly from the target fitting model and the differences.
It should be understood that the device 102 may skip S206 when performing the data processing method provided by the embodiments of this application. For example, in a scenario where lossy compression is sufficient, the device 102 may skip S206.
S208: The device 102 sends the target fitting model to the device 104.
In a specific implementation, the device 102 and the device 104 may agree on model identifiers for the negotiated fitting models. When the device 102 selects one of the fitting models as the target fitting model, it can then send the identifier of the selected fitting model and the corresponding model parameters to the device 104, which amounts to sending the target fitting model. This further reduces transmission overhead.
For ease of understanding, a specific example is given below.
In this example, the identifier model_ID of the linear model may be 1, that of the polynomial model may be 2, and that of the neural network model may be 3. The linear model can be expressed as:
y = ax + b            (1)
where a and b are the parameters of the linear model and represent the slope and the intercept, respectively. x may be a time or a longitude/latitude (representing a region), and y may be the data restored by the device 104.
When the device 102 selects the linear model as the target fitting model, it may send a sequence or array containing the following data, such as {1, a, b} or [1, a, b], to send the target fitting model to the device 104.
Suppose the time window contains 60 pieces of data. Instead of sending 60 pieces of data, the device 102 in this embodiment only needs to send a very short sequence or array. For the linear model, it sends only the model identifier and the two model parameters, three values in total, which corresponds to a 95% reduction and greatly decreases the amount of data transmitted and stored.
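The message format and the 95% figure above can be checked with a short Python sketch. The registry `MODEL_REGISTRY` and the helper names are hypothetical; the only parts taken from the text are the identifiers 1 and 2, the message layout [model_ID, a, b], and the 60-sample window. The polynomial parameter order (lowest power first) is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical registry agreed on in advance by both devices: model_ID -> f(params, x).
MODEL_REGISTRY: Dict[int, Callable[[Sequence[float], float], float]] = {
    1: lambda p, x: p[0] * x + p[1],                           # linear: y = a*x + b
    2: lambda p, x: sum(c * x ** i for i, c in enumerate(p)),  # polynomial, lowest power first
}


def encode_model(model_id: int, params: Sequence[float]) -> List[float]:
    """Sender side: the target fitting model becomes [model_ID, param_1, ...]."""
    return [model_id, *params]


def decode_model(message: Sequence[float]) -> Callable[[float], float]:
    """Receiver side: look up the model by its identifier and bind the parameters."""
    model_id, params = int(message[0]), list(message[1:])
    model_fn = MODEL_REGISTRY[model_id]
    return lambda x: model_fn(params, x)


def saving(n_samples: int, n_sent: int) -> float:
    """Fraction of values that no longer need to be transmitted."""
    return 1 - n_sent / n_samples


message = encode_model(1, [0.5, 20.0])  # linear model with a = 0.5, b = 20.0
model = decode_model(message)
reduction = saving(60, len(message))    # 1 - 3/60, i.e. about 0.95
```

With 60 samples in the window and a three-value message, the reduction is 1 - 3/60 = 0.95, matching the figure in the text.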
It should be noted that S206 and S208 may be executed in any order; for example, they may be executed in parallel or one after another in a set order.
S210: The device 102 sends to the device 104 the differences between the multiple pieces of data acquired by the device 102 and the multiple pieces of data it restored based on the target fitting model.
The device 102 may send the differences and the target fitting model in a single message or in separate messages.
The device 102 may execute S208 and S210 in parallel or one after another in a set order.
It should be understood that, as with S206, the device 102 may skip S210 when performing the data processing method provided by the embodiments of this application. For example, in a scenario where lossy compression is sufficient, the device 102 may skip S210.
S212: The device 104 restores the multiple pieces of data according to the target fitting model and the differences.
For ease of understanding, the data acquired by the device 102 is denoted y0, the data restored by the device 102 based on the target fitting model is denoted y1, and the difference between y0 and y1 is denoted Δy.
On receiving the target fitting model and the differences Δy, the device 104 may first restore y1 according to the target fitting model. For example, for time series data, the device 104 may substitute the time into the expression of the target fitting model to restore y1. For ordered time series data, such as periodic time series data, the device 104 may substitute the corresponding times, derived directly from the agreed collection period, into the expression of the target fitting model to restore y1. For non-ordered time series data, the device 102 may additionally send the times to the device 104, such as "1, 2, 4, 5", so that the device 104 can substitute the times sent by the device 102 into the expression of the target fitting model to restore y1.
Further, after restoring y1, the device 104 may add the corresponding difference Δy to y1, thereby restoring y0 and achieving lossless data recovery.
It should be understood that, as with S206 and S210, in a scenario where lossy compression is sufficient, the device 104 may skip the step of adding the corresponding difference Δy to y1 to restore y0. In that case the device 104 restores the multiple pieces of data according to the target fitting model alone.
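Steps S206 through S212 can be sketched together in Python. The sketch assumes a linear target fitting model and integer-valued samples, and it rounds the model output to an integer before differencing so that y1 + Δy reproduces y0 exactly; this rounding is a choice made for the illustration, not something the application specifies. The function names and sample values are hypothetical.

```python
from typing import List, Optional, Sequence


def model_output(a: float, b: float, x: float) -> int:
    """y1: output of the linear model y = a*x + b, rounded to an integer (assumption)."""
    return round(a * x + b)


def sender_differences(xs: Sequence[float], y0: Sequence[int],
                       a: float, b: float) -> List[int]:
    """S206 on the sender: delta_y = y0 - y1 for every acquired sample."""
    return [y - model_output(a, b, x) for x, y in zip(xs, y0)]


def receiver_restore(xs: Sequence[float], a: float, b: float,
                     differences: Optional[Sequence[int]] = None) -> List[int]:
    """S212 on the receiver: y1 alone (lossy) or y1 + delta_y (lossless)."""
    y1 = [model_output(a, b, x) for x in xs]
    if differences is None:
        return y1
    return [p + d for p, d in zip(y1, differences)]


# Non-uniform times as in the "1, 2, 4, 5" example; the sample values are made up.
times = [1, 2, 4, 5]
acquired = [12, 15, 18, 21]
a, b = 2.0, 10.0

delta = sender_differences(times, acquired, a, b)
lossy = receiver_restore(times, a, b)
lossless = receiver_restore(times, a, b, delta)
```

The lossy path returns only the model output, while the lossless path adds the transmitted differences back and recovers the acquired values exactly. With raw floating-point values, y1 + (y0 - y1) is not guaranteed to equal y0 bit for bit, which is why the sketch works on integers.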
Based on the above description, the embodiments of this application provide a data processing method in which the device 102, as the data sender, first acquires multiple pieces of data, then obtains a target fitting model based on them, and sends the target fitting model instead of sending the data directly. This reduces the amount of data sent, lowers the requirements on network transmission, and relieves transmission pressure. The device 104, as the receiver, can store the target fitting model and, when the data is needed, restore the multiple pieces of data based on the target fitting model. This also lowers the requirements on data storage, relieves storage pressure, and improves storage performance.
Further, the data processing method provided by this application supports both lossy and lossless compression for the user to choose from. In the lossy mode, the device 102 sends the target fitting model, and the device 104 restores the data acquired by the device 102 according to that model. In the lossless mode, the device 102 additionally sends the differences between the data it acquired and the data it restored based on the target fitting model, and the device 104 exactly restores the data acquired by the device 102 according to the target fitting model and the differences. Moreover, because the differences sent by the device 102 are generally small values, they occupy fewer bits when encoded than large values do, so they also contribute to compression and reduce transmission and storage pressure.
For ease of understanding, the embodiments of this application also provide a schematic diagram of the data processing method. As shown in FIG. 3, the device 102 may acquire multiple pieces of data within a time window and then perform model fitting on them to obtain a target fitting model. It should be noted that FIG. 3 uses time series data and a sinusoidal target fitting model as an example. In other possible implementations, other types of data and other types of models may be used, which is not limited in this embodiment.
Next, the device 102 may send to the device 104 the target fitting model and the differences between the acquired data and the data restored by the target fitting model. In this example, the differences are [0.01, 0.12, 0.03, …, 0.06]. The device 104 can perform lossy data recovery according to the target fitting model. When higher data accuracy is required, the device 104 can add the differences to the lossy recovery result to achieve lossless data recovery.
FIG. 3 illustrates the case in which one piece of data is collected at each moment, that is, each timestamp corresponds to one value. In many scenarios, however, the values of multiple indicators are collected at the same moment. For example, the device 102 collects environmental information once per minute, and the environmental information may include two indicators, temperature and humidity.
The multiple indicators collected by the device 102 may be independent of one another or may be correlated. For correlated indicators, the data can be compressed further by exploiting the correlation, which further reduces transmission and storage overhead.
Specifically, for multiple correlated indicators, the device 102 may perform model fitting on the data of only one indicator. For convenience, the indicator on which model fitting is performed is called the base indicator. The device 102 may determine, from the data of the multiple indicators, the association relationship between each of the other indicators and the base indicator. The association relationship can be represented by a relationship function. For example, a linear association relationship can be represented by a linear function.
On this basis, for the other indicators, the device 102 sends to the device 104 the relationship function associated with the target fitting model, and the device 104 can determine another target fitting model according to the relationship function and the target fitting model. Specifically, the device 104 substitutes the expression of the target fitting model into the relationship function to obtain the expression of the other target fitting model. The device 104 can then restore, according to the other target fitting model, the other data associated with the multiple pieces of data, that is, the data of the other indicators associated with the base indicator. Restoring the associated data according to the other target fitting model is similar to restoring data according to a target fitting model; see the related description above, which is not repeated here.
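Substituting the target fitting model into the relationship function is function composition, which the following Python sketch illustrates. It assumes that both the base model and the relationship function are linear; the coefficients and the helper names are hypothetical and chosen only for illustration.

```python
from typing import Callable, Tuple

Linear = Tuple[float, float]  # (slope, intercept)


def linear(params: Linear) -> Callable[[float], float]:
    """Build y = slope * x + intercept from its parameters."""
    slope, intercept = params
    return lambda x: slope * x + intercept


def compose(relation: Callable[[float], float],
            base: Callable[[float], float]) -> Callable[[float], float]:
    """General case: the other model is relation(base(x))."""
    return lambda x: relation(base(x))


def compose_linear(base: Linear, relation: Linear) -> Linear:
    """Linear case in closed form: c*(a*x + b) + d = (c*a)*x + (c*b + d)."""
    a, b = base
    c, d = relation
    return (c * a, c * b + d)


base_params = (0.5, 20.0)        # base indicator: y = 0.5*x + 20 (made-up values)
relation_params = (-2.0, 100.0)  # other indicator: z = -2*y + 100 (made-up values)

other_model = compose(linear(relation_params), linear(base_params))
other_params = compose_linear(base_params, relation_params)
```

The receiver only needs the base model and the two relationship coefficients to obtain the model of the associated indicator, here z = -1.0*x + 60.0, without receiving a separately fitted model for it.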
For ease of understanding, this application further provides a specific example to illustrate the process of restoring multiple associated data.
As shown in FIG. 4, for data that includes multiple indicators, the device 102 may determine the association relationships between the indicators according to the multiple data corresponding to each indicator. In this example, the data of 8 indicators within the time window can be divided into two groups: indicators within a group have an association relationship, and indicators in different groups do not.
Indicators that have no association relationship may be compressed according to the data processing method provided in the embodiment shown in FIG. 2. In this example, the device 102 may perform model fitting on the multiple data corresponding to one indicator in each group to obtain the corresponding target fitting model. The device 102 then sends the target fitting models and the relationship functions representing the association relationships to the device 104. As shown in FIG. 4, the device 102 sends two target fitting models and, for each target fitting model, three associated relationship functions. The device 104 may restore the multiple data of the corresponding indicator according to each target fitting model. In addition, the device 104 may determine, according to a target fitting model and a relationship function, the target fitting model corresponding to an associated indicator, and restore the associated multiple data based on that model.
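The grouping described above can be sketched as follows. The text does not say how the device 102 decides that two indicators are associated, so this sketch assumes a Pearson correlation test with a hypothetical threshold of 0.95 and a linear relationship function fitted by least squares. The function names and the dictionary layout are illustrative only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series is treated as unassociated
    return cov / sqrt(vx * vy)

def linear_relation(base, other):
    """Least-squares (a, b) such that other is approximately a * base + b."""
    n = len(base)
    mb, mo = sum(base) / n, sum(other) / n
    var = sum((x - mb) ** 2 for x in base)
    a = sum((x - mb) * (y - mo) for x, y in zip(base, other)) / var
    return a, mo - a * mb

def group_indicators(series, threshold=0.95):
    """Assign each indicator to the first group whose base indicator it tracks.

    Returns a list of groups; each group names its base indicator and maps
    every associated indicator to its (a, b) relationship parameters.
    """
    groups = []
    for name, values in series.items():
        for group in groups:
            base_values = series[group["base"]]
            if abs(pearson(base_values, values)) >= threshold:
                group["relations"][name] = linear_relation(base_values, values)
                break
        else:
            # No existing group matches: this indicator becomes a new base.
            groups.append({"base": name, "relations": {}})
    return groups

series = {"cpu": [1, 2, 3, 4], "mem": [3, 5, 7, 9], "disk": [5, 1, 4, 2]}
print(group_indicators(series))
```

Only the base indicator of each group would then be fitted with a model; the other indicators of the group are represented by their relationship parameters.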
The following describes the data processing method of this application by taking the processing of two indicators, CPU usage and memory usage, as an example.
As shown in FIG. 5, the device 102 collects CPU usage and memory usage with a collection period of 1 second. In this example, the device 102 needs to collect the data within a 10-minute period and send that data to the device 104. In the traditional method, each time the device 102 collects a piece of data, it sends the data and the corresponding metadata (meta data) to the device 104.
Referring to FIG. 5, the metadata sent by the device 102 includes the project identifier project_id of the project to which the CPU belongs, the cluster identifier cluster_id, the service name service_name, the service instance name service_instance_name, and the indicator name metric_name. Because the data sent by the device 102 is time series data, a timestamp (timestamp) and a data value (value) need to be sent with the data. Accordingly, the data of one indicator at one point in time may be a data block composed of the above metadata and data. To send the data of one indicator over 10 minutes, the device 102 sends 600 data blocks.
The metadata corresponding to one indicator at different times is usually the same, so by acquiring multiple data (for example, by caching them), the metadata of the multiple data can be aggregated. The device 102 then needs to send only one data block, in which the data part may include a start timestamp timestamp_from and an end timestamp timestamp_to. In addition, the data value changes from a single value to an array of the values corresponding to the multiple times. Because the repeated metadata no longer needs to be sent repeatedly, and the device 104 no longer needs to store it, transmission pressure and storage pressure are reduced.
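A minimal sketch of this metadata aggregation is shown below. The field names (project_id, metric_name, timestamp, timestamp_from, timestamp_to, value) come from the description; the nesting of each block into "meta" and "data" dictionaries and the function name are assumptions made for illustration.

```python
def aggregate_blocks(blocks):
    """Merge the per-sample data blocks of one indicator into a single block."""
    if not blocks:
        raise ValueError("no blocks to aggregate")
    blocks = sorted(blocks, key=lambda b: b["data"]["timestamp"])
    meta = blocks[0]["meta"]
    if any(b["meta"] != meta for b in blocks):
        raise ValueError("blocks carry different metadata")
    return {
        "meta": meta,  # sent once instead of once per sample
        "data": {
            "timestamp_from": blocks[0]["data"]["timestamp"],
            "timestamp_to": blocks[-1]["data"]["timestamp"],
            "value": [b["data"]["value"] for b in blocks],
        },
    }

# Hypothetical metadata values; 600 samples correspond to 10 minutes at 1 s.
meta = {"project_id": "p1", "cluster_id": "c1", "service_name": "svc",
        "service_instance_name": "svc-0", "metric_name": "CPU"}
samples = [{"meta": meta, "data": {"timestamp": t, "value": 0.5}}
           for t in range(600)]
block = aggregate_blocks(samples)
print(block["data"]["timestamp_from"], block["data"]["timestamp_to"],
      len(block["data"]["value"]))
```

The sketch refuses to merge blocks whose metadata differs, since the saving relies on the metadata being identical across the window.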
Further, the device 102 may perform model fitting based on the acquired multiple data. Specifically, using the aggregated array as training data, it fits each of several preset fitting models, such as a linear model, a polynomial model, and a neural network model, calculates the loss value of each fitting model, and determines the target fitting model. The device 102 only needs to record the model identifier and the model parameters in the value field. In this way, the value field can be reduced from an array of 60 data to an array of 3 data, which further reduces the amount of data and lowers transmission pressure and storage pressure. The device 104 receives and stores the above data. Correspondingly, when the device 104 needs some of the CPU usage data, it can read the model identifier and model parameters from the value field, restore the target fitting model, and then restore at least one of the multiple CPU usage data based on the target fitting model.
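The fit-and-select step can be sketched as follows, under several assumptions that are not in the original text: only two candidates are tried (a linear model and a degree-2 polynomial, with the hypothetical identifiers "linear" and "poly2"), the loss is the mean squared error, and a more complex candidate replaces a simpler one only if it lowers the loss by more than a tolerance. A neural network candidate is omitted for brevity.

```python
def polyfit(ts, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via normal equations."""
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [[sum(t ** (i + j) for t in ts) for j in range(n)]
         + [sum(y * t ** i for t, y in zip(ts, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def evaluate(params, t):
    return sum(c * t ** k for k, c in enumerate(params))

def mse(params, ts, ys):
    return sum((evaluate(params, t) - y) ** 2 for t, y in zip(ts, ys)) / len(ts)

# Hypothetical model identifiers, ordered from simplest to most complex.
CANDIDATES = {"linear": 1, "poly2": 2}

def select_target_model(ts, ys, tol=1e-9):
    """Fit every candidate and keep the one with the clearly smallest loss."""
    best = None
    for model_id, degree in CANDIDATES.items():
        params = polyfit(ts, ys, degree)
        loss = mse(params, ts, ys)
        if best is None or loss < best["loss"] - tol:
            best = {"model": model_id, "params": params, "loss": loss}
    return best

def restore(model, ts):
    """Receiver side: rebuild values from the model identifier and parameters."""
    return [evaluate(model["params"], t) for t in ts]

ts = list(range(10))            # offsets from timestamp_from, in seconds
ys = [2 * t + 1 for t in ts]
model = select_target_model(ts, ys)
print(model["model"], restore(model, [3]))
```

With a linear target model, the value field would carry one identifier and two parameters, which is one way to arrive at an array of 3 entries. Using offsets from timestamp_from rather than raw timestamps keeps the normal equations well conditioned.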
When memory usage and CPU usage have an association relationship, the data to be sent can be compressed further. Specifically, for the associated indicator, memory usage, the device 102 only needs to send one data block composed of simplified metadata and simplified data. As shown in FIG. 5, the simplified metadata includes the indicator name metric_name, which is memory in this example; the other metadata of the data can be obtained from the associated indicator, also called the reference indicator. The simplified data includes a reference indicator (reference, ref) field and a value field. The value of the reference indicator field is CPU, and the value field identifies the relationship function between this indicator and the reference indicator; it consists of several values, including a function identifier and function parameters. This greatly reduces the amount of data that needs to be transmitted for the associated indicator and lowers transmission pressure and storage pressure. Correspondingly, when the device 104 needs some of the memory usage data, it can read the function identifier and function parameters from the value field to restore the relationship function between memory usage and CPU usage, restore the target fitting model of memory usage based on that relationship function and the target fitting model of CPU usage, and then restore at least one of the multiple memory usage data based on that target fitting model.
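The receiver-side handling of a block that carries a ref field can be sketched as follows. The encoding of the value field as `[identifier, parameters...]`, the identifier "linear", the parameter order, and the lookup of blocks by metric_name are all assumptions for illustration; only the field names ref, value, and metric_name come from the description.

```python
def model_fn(value):
    """Turn a value field [identifier, *parameters] into a callable."""
    kind, *params = value
    if kind == "linear":  # f(x) = slope * x + intercept
        slope, intercept = params
        return lambda x: slope * x + intercept
    raise ValueError(f"unknown identifier: {kind}")

def restore_value(blocks, metric_name, t):
    """Restore the value of `metric_name` at time offset `t`."""
    data = blocks[metric_name]["data"]
    fn = model_fn(data["value"])
    if "ref" in data:
        # Associated indicator: apply the relationship function to the
        # value restored from the reference indicator's fitting model.
        return fn(restore_value(blocks, data["ref"], t))
    # Base indicator: the value field holds the fitting model itself.
    return fn(t)

blocks = {
    "CPU": {"meta": {"metric_name": "CPU"},
            "data": {"value": ["linear", 0.5, 20.0]}},
    "memory": {"meta": {"metric_name": "memory"},
               "data": {"ref": "CPU", "value": ["linear", 2.0, 10.0]}},
}
print(restore_value(blocks, "CPU", 10), restore_value(blocks, "memory", 10))
```

Evaluating the relationship function on the output of the reference model is equivalent to substituting the model expression into the relationship function and then evaluating the result.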
The data processing method provided in the embodiments of this application has been described above with reference to FIG. 1 to FIG. 5. The data processing apparatus and device provided in the embodiments of this application are described below with reference to the accompanying drawings.
Referring to the schematic structural diagram of the data processing apparatus shown in FIG. 6, the data processing apparatus 300 includes:
a communication module 302, configured to acquire multiple data;
a fitting module 304, configured to obtain a target fitting model based on the multiple data;
the communication module 302 is further configured to send the target fitting model to a second device, where the target fitting model is used to restore at least one of the multiple data.
In some possible implementations, the fitting module 304 is specifically configured to:
perform model fitting on multiple fitting models according to the multiple data, so as to select one fitting model from the multiple fitting models as the target fitting model.
In some possible implementations, the multiple fitting models include at least two of a linear model, a polynomial model, and a neural network model.
In some possible implementations, the communication module 302 is specifically configured to:
send, to the second device, the identifier of the fitting model selected by the fitting module 304 and the corresponding model parameters.
In some possible implementations, the communication module 302 is further configured to:
send, to the second device, a relationship function associated with the target fitting model, where the relationship function is used to determine another target fitting model based on the target fitting model, and the other target fitting model is used to restore at least one of the other data associated with the multiple data.
In some possible implementations, the communication module 302 is further configured to:
send a difference to the second device, where the difference is used together with the target fitting model to restore at least one of the multiple data, and the difference is the difference between the multiple data acquired by the first device and the multiple data restored by the first device based on the target fitting model.
The data processing apparatus 300 according to the embodiments of this application may correspondingly perform the method described in the embodiments of this application, and the above and other operations and/or functions of the modules in the data processing apparatus 300 are respectively intended to implement the corresponding procedures of the methods in FIG. 2. For brevity, details are not repeated here.
Next, referring to the schematic structural diagram of the data processing apparatus shown in FIG. 7, the data processing apparatus 400 includes:
a communication module 402, configured to receive a target fitting model from a first device, where the target fitting model is obtained by the first device based on acquired multiple data;
a restoration module 404, configured to restore at least one of the multiple data according to the target fitting model.
In some possible implementations, the communication module 402 is further configured to:
receive a relationship function associated with the target fitting model;
the apparatus 400 further includes:
a determining module, configured to determine another target fitting model according to the relationship function and the target fitting model;
the restoration module 404 is further configured to:
restore, according to the other target fitting model, at least one of the other data associated with the multiple data.
In some possible implementations, the communication module 402 is further configured to:
receive a difference from the first device, where the difference is the difference between the multiple data acquired by the first device and the multiple data restored by the first device based on the target fitting model;
the restoration module 404 is specifically configured to:
restore at least one of the multiple data according to the target fitting model and the difference.
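The use of the difference on both sides can be sketched as follows. The linear model and the function names are hypothetical; the only behaviour taken from the description is that the first device computes the difference between the acquired data and the data it restores from the target fitting model, and that the second device combines the model with that difference.

```python
def model(t):
    # Hypothetical target fitting model known to both devices.
    return 2.0 * t + 1.0

def compute_differences(ts, observed):
    """First device: acquired data minus the data restored from the model."""
    return [y - model(t) for t, y in zip(ts, observed)]

def restore_with_differences(ts, differences):
    """Second device: model output plus the received difference."""
    return [model(t) + d for t, d in zip(ts, differences)]

ts = [0, 1, 2]
observed = [1.5, 2.75, 5.0]
differences = compute_differences(ts, observed)
print(differences, restore_with_differences(ts, differences))
```

When the model fits well, the differences are small numbers that typically compress better than the raw values; note that with floating-point data the round trip may differ from the original by rounding error.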
The data processing apparatus 400 according to the embodiments of this application may correspondingly perform the method described in the embodiments of this application, and the above and other operations and/or functions of the modules in the data processing apparatus 400 are respectively intended to implement the corresponding procedures of the methods in FIG. 2. For brevity, details are not repeated here.
FIG. 8 and FIG. 9 further provide devices. The device 500 shown in FIG. 8 may be used to implement the functions of the data processing apparatus 300 in the embodiment shown in FIG. 6, and the device 600 shown in FIG. 9 may be used to implement the functions of the data processing apparatus 400 in the embodiment shown in FIG. 7.
The device 500 includes a bus 501, a processor 502, a communication interface 503, and a memory 504. The processor 502, the memory 504, and the communication interface 503 communicate through the bus 501. The bus 501 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 8, but this does not mean that there is only one bus or only one type of bus. The communication interface 503 is used to communicate with the outside, for example, to acquire multiple data and to send the target fitting model to the second device.
The processor 502 may be a central processing unit (CPU). The memory 504 may include volatile memory, such as random access memory (RAM). The memory 504 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
The memory 504 stores executable code, and the processor 502 executes the executable code to perform the foregoing data processing method.
Specifically, when the embodiment shown in FIG. 6 is implemented and the modules described in that embodiment are implemented in software, the software or program code required to perform the function of the fitting module 304 in FIG. 6 is stored in the memory 504. The function of the communication module 302 is implemented through the communication interface 503. The processor 502 is configured to execute the instructions in the memory 504 to perform the data processing method applied to the data processing apparatus 300.
The device 600 includes a bus 601, a processor 602, a communication interface 603, and a memory 604. The processor 602, the memory 604, and the communication interface 603 communicate through the bus 601. When the device 600 implements the embodiment shown in FIG. 7 and the modules described in that embodiment are implemented in software, the software or program code required to perform the function of the restoration module 404 in FIG. 7 is stored in the memory 604. The function of the communication module 402 is implemented through the communication interface 603. The processor 602 is configured to execute the instructions in the memory 604 to perform the data processing method applied to the data processing apparatus 400.
The embodiments of this application further provide a computer-readable storage medium including instructions that, when run on a device, cause the device to perform the above data processing method applied to the data processing apparatus 300 or the data processing apparatus 400.
The embodiments of this application further provide a computer program product. When the computer program product is executed by a computer, the computer performs any one of the foregoing data processing methods. The computer program product may be a software installation package; when any one of the foregoing data processing methods needs to be used, the computer program product can be downloaded and executed on the computer.

Claims (22)

  1. A data processing method, characterized in that the method comprises:
    acquiring, by a first device, multiple data;
    obtaining, by the first device, a target fitting model based on the multiple data; and
    sending, by the first device, the target fitting model to a second device, wherein the target fitting model is used to restore at least one of the multiple data.
  2. The method according to claim 1, characterized in that the obtaining, by the first device, a target fitting model based on the multiple data comprises:
    performing model fitting on multiple fitting models according to the multiple data, so as to select one fitting model from the multiple fitting models as the target fitting model.
  3. The method according to claim 2, characterized in that the multiple fitting models comprise at least two of a linear model, a polynomial model, and a neural network model.
  4. The method according to claim 2, characterized in that the sending, by the first device, the target fitting model to a second device comprises:
    sending, by the first device to the second device, an identifier of the fitting model selected by the first device and corresponding model parameters.
  5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
    sending, by the first device to the second device, a relationship function associated with the target fitting model, wherein the relationship function is used to determine another target fitting model based on the target fitting model, and the other target fitting model is used to restore at least one of the other data associated with the multiple data.
  6. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
    sending, by the first device, a difference to the second device, wherein the difference is used together with the target fitting model to restore at least one of the multiple data, and the difference is the difference between the multiple data acquired by the first device and the multiple data restored by the first device based on the target fitting model.
  7. The method according to any one of claims 1 to 4, characterized in that the multiple data comprise multiple time series data within a time window.
  8. A data processing method, characterized in that the method comprises:
    receiving, by a second device, a target fitting model from a first device, wherein the target fitting model is obtained by the first device based on acquired multiple data; and
    restoring, by the second device, at least one of the multiple data according to the target fitting model.
  9. The method according to claim 8, characterized in that the method further comprises:
    receiving, by the second device, a relationship function associated with the target fitting model;
    determining, by the second device, another target fitting model according to the relationship function and the target fitting model; and
    restoring, by the second device according to the other target fitting model, at least one of the other data associated with the multiple data.
  10. The method according to claim 8 or 9, characterized in that the method further comprises:
    receiving, by the second device, a difference from the first device, wherein the difference is the difference between the multiple data acquired by the first device and the multiple data restored by the first device based on the target fitting model;
    and the restoring, by the second device, at least one of the multiple data according to the target fitting model comprises:
    restoring, by the second device, at least one of the multiple data according to the target fitting model and the difference.
  11. A data processing apparatus, characterized in that the apparatus comprises:
    a communication module, configured to acquire multiple data; and
    a fitting module, configured to obtain a target fitting model based on the multiple data;
    wherein the communication module is further configured to send the target fitting model to a second device, and the target fitting model is used to restore at least one of the multiple data.
  12. The apparatus according to claim 11, characterized in that the fitting module is specifically configured to:
    perform model fitting on multiple fitting models according to the multiple data, so as to select one fitting model from the multiple fitting models as the target fitting model.
  13. The apparatus according to claim 12, characterized in that the multiple fitting models comprise at least two of a linear model, a polynomial model, and a neural network model.
  14. The apparatus according to claim 12, characterized in that the communication module is specifically configured to:
    send, to the second device, an identifier of the fitting model selected by the fitting module and corresponding model parameters.
  15. The apparatus according to any one of claims 11 to 14, characterized in that the communication module is further configured to:
    send, to the second device, a relationship function associated with the target fitting model, wherein the relationship function is used to determine another target fitting model based on the target fitting model, and the other target fitting model is used to restore at least one of the other data associated with the multiple data.
  16. The apparatus according to any one of claims 11 to 14, characterized in that the communication module is further configured to:
    send a difference to the second device, wherein the difference is used together with the target fitting model to restore at least one of the multiple data, and the difference is the difference between the multiple data acquired by the first device and the multiple data restored by the first device based on the target fitting model.
  17. The apparatus according to any one of claims 11 to 14, characterized in that the multiple data comprise multiple time series data within a time window.
  18. A data processing apparatus, characterized in that the apparatus comprises:
    a communication module, configured to receive a target fitting model from a first device, wherein the target fitting model is obtained by the first device based on acquired multiple data; and
    a restoration module, configured to restore at least one of the multiple data according to the target fitting model.
  19. The apparatus according to claim 18, characterized in that the communication module is further configured to:
    receive a relationship function associated with the target fitting model;
    the apparatus further comprises:
    a determining module, configured to determine another target fitting model according to the relationship function and the target fitting model;
    and the restoration module is further configured to:
    restore, according to the other target fitting model, at least one of the other data associated with the multiple data.
  20. The apparatus according to claim 18 or 19, characterized in that the communication module is further configured to:
    receive a difference from the first device, wherein the difference is the difference between the multiple data acquired by the first device and the multiple data restored by the first device based on the target fitting model;
    and the restoration module is specifically configured to:
    restore at least one of the multiple data according to the target fitting model and the difference.
  21. A device, characterized in that the device comprises a processor and a memory;
    the processor is configured to execute instructions stored in the memory, so that the processor performs the method according to any one of claims 1 to 10.
  22. A computer-readable storage medium, comprising instructions that, when run on a device, cause the device to perform the method according to any one of claims 1 to 10.
PCT/CN2020/111993 2020-01-23 2020-08-28 Data processing method, apparatus, device, and medium WO2021147319A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010076999.3 2020-01-23
CN202010076999.3A CN113162960A (en) 2020-01-23 2020-01-23 Data processing method, device, equipment and medium

Publications (1)

Publication Number Publication Date
WO2021147319A1 true WO2021147319A1 (en) 2021-07-29

Family

ID=76882120

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111993 WO2021147319A1 (en) 2020-01-23 2020-08-28 Data processing method, apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN113162960A (en)
WO (1) WO2021147319A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116780658A (en) * 2023-08-17 2023-09-19 国网浙江省电力有限公司金华供电公司 Multi-energy complementary optimization scheduling method considering source-load bilateral uncertainty

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398021B (en) * 2022-01-11 2022-09-06 北京大唐神州科技有限公司 Low code delivery method based on software development

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19630127C1 (en) * 1996-07-25 1998-01-08 Connect Plus Ingenieurgesellsc Data transmission method
CN103595568A (en) * 2013-11-17 2014-02-19 吉林大学 Internet real-time signal transmission method based on LS-SVM
US20170141875A1 (en) * 2015-11-13 2017-05-18 Avago Technologies General Ip (Singapore) Pte. Ltd. System, device, and method for multi-mode communications
CN107634819A (en) * 2017-07-14 2018-01-26 西安万像电子科技有限公司 Transmission method, equipment and the system of sensing data
CN110674941A (en) * 2019-09-25 2020-01-10 南开大学 Data encryption transmission method and system based on neural network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190761A (en) * 2018-08-06 2019-01-11 百度在线网络技术(北京)有限公司 Data processing method, device, equipment and storage medium
CN109450606B (en) * 2019-01-07 2021-07-06 北京世纪好未来教育科技有限公司 Data transmission control method and device
CN110442557B (en) * 2019-07-31 2021-09-28 上海赜睿信息科技有限公司 Data compression and decompression method, electronic device and computer readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116780658A (en) * 2023-08-17 2023-09-19 国网浙江省电力有限公司金华供电公司 Multi-energy complementary optimization scheduling method considering source-load bilateral uncertainty
CN116780658B (en) * 2023-08-17 2023-11-10 国网浙江省电力有限公司金华供电公司 Multi-energy complementary optimization scheduling method considering source-load bilateral uncertainty

Also Published As

Publication number Publication date
CN113162960A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
KR102511271B1 (en) Method and device for storing and querying time series data, and server and storage medium therefor
CN111966289B (en) Partition optimization method and system based on Kafka cluster
CN103312544B Method, apparatus, and system for controlling a terminal to report log files
CN110493065B (en) Alarm correlation degree analysis method and system for cloud center operation and maintenance
JP2019511054A (en) Distributed cluster training method and apparatus
CN112511325B (en) Network congestion control method, node, system and storage medium
WO2021147319A1 (en) Data processing method, apparatus, device, and medium
US11188443B2 (en) Method, apparatus and system for processing log data
US20120191678A1 (en) Providing Reconstructed Data Based On Stored Aggregate Data in Response to Queries for Unavailable Data
CN112118174A (en) Software defined data gateway
CN112632129A (en) Code stream data management method, device and storage medium
CN110399224A (en) Information processing method and electronic equipment
CN107306200B (en) Network fault early warning method and gateway for network fault early warning
CN112751722B (en) Data transmission quality monitoring method and system
CN116578911A (en) Data processing method, device, electronic equipment and computer storage medium
WO2023045365A1 (en) Video quality evaluation method and apparatus, electronic device, and storage medium
JP2017162046A (en) Sensor data processing apparatus, sensor data processing system, sensor data processing method, and sensor data processing program
CN102655480B (en) Similar mail treatment system and method
CN112312209B (en) Comprehensive alarm generation method, device, server and storage medium
CN115130794A (en) Data processing method, device, equipment and computer readable storage medium
CN115114316A (en) Processing method, device, cluster and storage medium for high-concurrency data
CN117749800B (en) Method and related device for realizing edge data storage and transmission on new energy power generation side
CN117527708B (en) Optimized transmission method and system for enterprise data link based on data flow direction
CN117112039B (en) Transmission optimization system and operation method of data center
CN114189565B (en) Head area restoration system, method and related equipment

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20915777; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 20915777; Country of ref document: EP; Kind code of ref document: A1