WO2021189362A1 - Method, device and medium for generating time-series data based on multi-condition constraints - Google Patents

Method, device and medium for generating time-series data based on multi-condition constraints

Info

Publication number
WO2021189362A1
WO2021189362A1 PCT/CN2020/081440 CN2020081440W WO2021189362A1 WO 2021189362 A1 WO2021189362 A1 WO 2021189362A1 CN 2020081440 W CN2020081440 W CN 2020081440W WO 2021189362 A1 WO2021189362 A1 WO 2021189362A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
sample
repair
condition
repaired
Prior art date
Application number
PCT/CN2020/081440
Other languages
English (en)
French (fr)
Inventor
彭磊
张俊楠
李慧云
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院 filed Critical 深圳先进技术研究院
Priority to AU2020438008A priority Critical patent/AU2020438008B2/en
Priority to PCT/CN2020/081440 priority patent/WO2021189362A1/zh
Priority to US17/618,758 priority patent/US11797372B2/en
Priority to GB2117945.2A priority patent/GB2606792A/en
Publication of WO2021189362A1 publication Critical patent/WO2021189362A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0721Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment within a central processing unit [CPU]

Definitions

  • This application relates to the field of big data technology, and in particular to a method, device and medium for generating time-series data based on multi-condition constraints.
  • Time-series data refers to data collected at different times. In production and daily life, this type of data is used to describe how a certain thing or phenomenon changes over time. However, because such data has densely spaced data points and poor interference resistance, data loss easily occurs during collection, application, or transmission.
  • At present, methods for repairing missing data mainly fall into two categories: the first is an interpolation method based on prior knowledge; the second obtains the sample data that best matches the missing data and uses that sample data to train a generative adversarial network that repairs the missing data.
  • However, the first method requires a large amount of historical data as a basis and is not suitable for repairing massive data; with the second method it is difficult to obtain well-matched sample data and to learn the effective features of the data, so the repaired data is inaccurate and loses its temporal ordering.
  • The embodiments of the present application provide a method, device and medium for generating time-series data based on multi-condition constraints. Without a large amount of historical data or well-matched sample data as the training basis, rich features of the data to be repaired can be obtained, which guarantees the accuracy and temporal ordering of the repaired data and improves repair efficiency and quality.
  • an embodiment of the present application provides a method for generating time series data based on multiple constraints, including:
  • the data repair request includes data to be repaired and condition information;
  • the data repair request is used to request data repair of the data to be repaired according to the condition information;
  • the condition information is a characteristic condition that matches the data to be repaired;
  • the trained data repair model is called, and the normalized data is repaired according to the feature label to obtain the first repair data;
  • the data repair model is obtained by training it on the sample data, the first sample condition, the real sample data, and the second sample condition, where the sample data is noise data;
  • The client sends a data repair request including the data to be repaired and condition information to the server, so that the server normalizes the data to be repaired to obtain normalized data, quantizes the condition information to obtain a feature label (the condition information being a feature condition that matches the data to be repaired), calls the trained data repair model, repairs the normalized data according to the feature label to obtain the first repair data, and sends the first repair data to the client.
  • In this way, rich features of the data to be repaired can be obtained, so that the generated first repair data is closer to the distribution characteristics of the real data; the accuracy and temporal ordering of the repaired data are guaranteed, and repair efficiency and quality are improved.
  • an embodiment of the present application provides a time series data generation device based on multiple constraint conditions, including:
  • the transceiver unit is configured to receive a data repair request from the client, where the data repair request includes data to be repaired and condition information, the data repair request is used to request data repair of the data to be repaired according to the condition information, and the condition information is a characteristic condition that matches the data to be repaired;
  • the processing unit is configured to normalize the data to be repaired to obtain the normalized data of the data to be repaired, quantize the condition information to obtain the feature label of the condition information, and call the trained data repair model to repair the normalized data according to the feature label and obtain the first repair data, where the data repair model is obtained by training it on sample data, a first sample condition, real sample data, and a second sample condition, and the sample data is noise data;
  • the transceiver unit is further configured to send the first repair data to the client.
  • an embodiment of the present application provides an apparatus for generating time series data based on multi-condition constraints, including a processor, a memory, and a communication interface.
  • the processor, the memory, and the communication interface are connected to each other.
  • the memory is used to store a computer program
  • the computer program includes program instructions
  • the processor is configured to call the program instructions to execute the method described in the first aspect.
  • An embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores one or more first instructions, and the one or more first instructions are suitable for being loaded by the processor to execute the method described in the first aspect.
  • The client sends a data repair request to the server, and the data repair request includes the data to be repaired and the corresponding condition information.
  • The server normalizes the data to be repaired to obtain the normalized data, and quantizes the condition information to obtain the feature label; the condition information is a feature condition that matches the data to be repaired.
  • Data repair based on the condition information can fully consider the diversity of the features of the data to be repaired, so as to obtain more accurate repair data.
  • The trained data repair model is called to repair the normalized data according to the feature label, the first repair data is obtained, and the first repair data is sent to the client.
  • The training method of the data repair model is iterative supervised training, through the multi-condition-constrained generative adversarial network, on at least one input set of sample data, a first sample condition, real sample data, and a second sample condition, where the sample data is noise data and the real sample data is real time-series data.
  • In this way, the missing time-series data can be repaired without a large amount of historical data or manually obtained sample data that closely matches the missing data.
  • Such a training basis suffices to train the model, and by introducing multiple items of feature condition information, rich features of the data to be repaired can be obtained; a Long Short-Term Memory (LSTM) network is used as the built-in network of the multi-condition-constrained generative adversarial network, which improves its ability to process time series.
  • FIG. 1 is an architecture diagram of a time series data generation system based on multiple constraints provided by an embodiment of the present application
  • FIG. 2 is a flowchart of a method for generating time series data based on multi-condition constraints according to an embodiment of the present application
  • FIG. 3 is a flowchart of another method for generating time series data based on multi-condition constraints provided by an embodiment of the present application
  • FIG. 4 is a framework diagram of a multi-condition generative adversarial network provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a calculation result of average cosine similarity provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of yet another method for generating time series data based on multi-condition constraints according to an embodiment of the present application
  • Fig. 7(a) is a schematic diagram of comparison between unconditionally constrained generated data and real data provided by an embodiment of the present application;
  • FIG. 7(b) is a schematic diagram of comparison between single-condition constraint generated data and real data provided by an embodiment of the present application.
  • FIG. 7(c) is a schematic diagram of comparison between multi-condition constraint generated data and real data provided by an embodiment of the present application.
  • FIG. 8(a) is a schematic diagram of an unconditional residual analysis result provided by an embodiment of the present application.
  • FIG. 8(b) is a schematic diagram of a residual analysis result of a single condition constraint provided by an embodiment of the present application.
  • FIG. 8(c) is a schematic diagram of a residual analysis result of a multi-condition constraint provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a time series data generating device based on multiple conditional constraints provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of another apparatus for generating time series data based on multiple constraints provided by an embodiment of the present application.
  • Time series data is a type of one-dimensional data with time information collected at different times, such as traffic monitoring data, parking situation data in parking lots, and so on.
  • This type of data is used in production and life to describe the changes of a certain thing or phenomenon over time.
  • At present, methods for repairing missing data mainly include the following two. The first is an interpolation method based on prior knowledge, which requires a large amount of historical data as a basis, cannot repair a large amount of missing data, and is not suitable for massive data.
  • The second uses a generative adversarial network, which may include Deep Convolutional Generative Adversarial Networks (DCGAN).
  • However, with this method it is difficult to obtain well-matched sample data, and the generated data has no temporal order; for example, when generating parking lot data for a week, it is impossible to determine which day a given piece of generated data belongs to. Moreover, sample features must be extracted when training the generative adversarial network; because the sample features are diverse, it is impossible to learn all of the features from one sample, which affects the accuracy of data restoration.
  • The embodiment of the present application provides a time-series data generation method based on multi-condition constraints.
  • The time-series data generation method is based on Multi-condition Generative Adversarial Networks (MCGAN) and repairs massive data.
  • The multi-condition generative adversarial network includes a generator network and a discriminator network.
  • The data repair model constructed from the above multi-condition generative adversarial network can perform data repair based on the data to be repaired and the condition information corresponding to the data to be repaired, and obtain the first repair data.
  • The condition information is the characteristic condition of the data to be repaired, such as time, space, climate, and so on.
  • In this way, rich sample features can be obtained without a large amount of historical data or specific sample data as a basis, so that the data repair model can generate repair data closer to the real data, ensuring the accuracy and temporal ordering of the massive repaired data and improving the quality of data restoration.
  • this embodiment can be applied to parking lot data restoration scenarios.
  • Specifically, missing parking lot data can be acquired as the data to be repaired, and condition information affecting the distribution of parking spaces, such as time, space, and climate, can be acquired.
  • The above-mentioned data to be repaired is normalized to obtain normalized data, and the condition information is quantized to obtain feature labels.
  • Then the data repair model based on the multi-condition generative adversarial network can be called, and the normalized data repaired according to the feature labels to obtain the first repair data.
  • the first repair data can be understood as data that conforms to the real situation of the parking lot.
  • The above-mentioned multi-condition-constraint-based time-series data generation method can be applied to the multi-condition-constraint-based time-series data generation system shown in FIG. 1.
  • The multi-condition-constraint time-series data generation system may include the client 101 and the server 102.
  • The form and quantity of the client 101 are shown by way of example and do not constitute a limitation on the embodiment of the present application; for example, two clients 101 may be included.
  • The client 101 may be a client that sends a data repair request to the server 102, or it may be used to provide the server 102 with sample data, first sample conditions, real sample data, and second sample conditions during data repair model training.
  • the client can be any of the following: a terminal, an independent application, an application programming interface (Application Programming Interface, API), or a software development kit (Software Development Kit, SDK).
  • The terminal may include, but is not limited to: smartphones (such as Android phones and iOS phones), tablet computers, portable personal computers, mobile Internet devices (MID), and other devices, which are not limited in the embodiments of the present application.
  • the server 102 may include, but is not limited to, a cluster server.
  • The client 101 sends a data repair request to the server 102, and the server 102 obtains the first repair data of the data to be repaired according to the data to be repaired and the condition information included in the data repair request.
  • Specifically, the data to be repaired is normalized to obtain the normalized data, the condition information is quantized to obtain the feature label, and the normalized data is repaired through the pre-trained data repair model according to the feature label.
  • The first repair data is obtained and sent to the client 101, so that the operating user 103 of the client 101 can analyze, based on the first repair data, how a certain thing or phenomenon changes over time.
  • FIG. 2 is a schematic flowchart of a method for generating time series data based on multiple constraints provided by an embodiment of the present application.
  • The method for generating time-series data may include parts 201 to 205, wherein:
  • the client 101 sends a data repair request to the server 102.
  • the server 102 receives a data repair request from the client 101.
  • The data repair request includes the data to be repaired and condition information, and is used to request data repair of the data to be repaired according to the condition information, where the data to be repaired is data with missing values; specifically, it may be time-series data with missing values.
  • the condition information is a characteristic condition that matches the data to be repaired. For example, if the data to be repaired is parking lot data, the condition information may include time, space, climate, and so on.
  • the server 102 performs normalization processing on the data to be repaired to obtain the normalized data of the data to be repaired.
  • The data to be processed is time-series data. For example, time-series data M can be expressed as M = {m_t1, m_t2 ... m_tl}, where m_tk represents the data value of the data to be repaired corresponding to time t_k and l is the length of the data to be repaired; after normalization processing, the normalized data can be expressed in the same form and carries the same time point sequence {t_1, t_2 ... t_l}.
  • data cleaning may be performed on the data to be repaired.
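  • The patent does not spell out the normalization formula; the following is a minimal sketch, assuming min-max scaling of the time-series values to [0, 1] and nan-coded missing points (function and variable names are illustrative, not from the patent):

```python
import numpy as np

def normalize_series(m, eps=1e-8):
    """Min-max normalize a time series M = {m_t1 ... m_tl} to [0, 1].

    Missing points may be encoded as np.nan; they are ignored when the
    scale is computed and left as nan for the repair model to fill in.
    """
    m = np.asarray(m, dtype=float)
    lo, hi = np.nanmin(m), np.nanmax(m)
    return (m - lo) / (hi - lo + eps)

# Example: parking-lot occupancy with two missing points.
m = [120, 135, np.nan, 160, np.nan, 90]
print(normalize_series(m))
```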
  • the server 102 performs quantization processing on the condition information to obtain a feature label of the condition information.
  • the condition information is a characteristic condition that matches the above-mentioned data to be repaired.
  • The condition information may include, but is not limited to: static condition information, for example, when the data to be repaired is parking lot data, the distribution of buildings around the parking lot; dynamic continuous condition information, such as time-series labels; and discrete condition information, for example, the 7 days of a week as distinct discrete features, or social events such as weather and holidays. Before being input to the server 102, the acquired condition information must be quantized to obtain the feature labels of the condition information.
  • Static condition information can be quantized by normalization to obtain its feature label.
  • For dynamic continuous condition information, normalization can likewise be used to obtain a condition sequence.
  • The condition sequence can be understood as condition labels arranged in time order.
  • For discrete condition information, one-hot encoding can be used to obtain the feature label.
  • Here n is the number of possible outcomes of an event. For example, if an event has 2 possible outcomes, event 1 and event 2 can be represented as {1,0} and {0,1} respectively.
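  • As a hedged illustration of the three quantization routes just described (all function names are illustrative assumptions, not from the patent):

```python
import numpy as np

def quantize_static(value, lo, hi):
    """Static condition (e.g. building density around the lot): min-max normalize."""
    return (value - lo) / (hi - lo)

def quantize_dynamic(values):
    """Dynamic continuous condition (e.g. a time-series label): normalize the
    whole sequence so it becomes a condition sequence ordered in time."""
    v = np.asarray(values, dtype=float)
    return (v - v.min()) / (v.max() - v.min() + 1e-8)

def quantize_discrete(event_index, n_events):
    """Discrete condition (e.g. day of week, n_events = 7): one-hot encoding."""
    label = np.zeros(n_events)
    label[event_index] = 1.0
    return label

print(quantize_discrete(0, 2))  # event 1 of 2 -> [1. 0.]
print(quantize_discrete(1, 2))  # event 2 of 2 -> [0. 1.]
```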
  • the server 102 calls the data repair model that has completed the training, performs repair processing on the normalized data according to the feature tags, and obtains the first repair data.
  • the data repair model is obtained by training the data repair model according to the sample data, the first sample condition, the real sample data, and the second sample condition, and the sample data is noise data.
  • the data repair model is a model constructed by repeated iterative training of the generator network and the discriminator network using sample data, first sample conditions, real sample data, and second sample conditions.
  • The above-mentioned data to be repaired includes a time point sequence. Repairing the normalized data according to the feature label to obtain the first repair data may consist of sorting each data point in the normalized data according to the time point sequence and performing data repair processing on the sorted normalized data according to the feature label to obtain the first repair data.
  • The time point sequence is the sequence composed of the generation time points of each data point in the data to be repaired, that is, {t_1, t_2 ... t_l} in the data to be processed above; after the data to be repaired is normalized,
  • the normalized data also carries the time point sequence, that is, {t_1, t_2 ... t_l}.
  • Each data point in the normalized data is sorted according to the time points,
  • and the sorted normalized data is input into the generator network of the data repair model. The built-in network of the generator is a Long Short-Term Memory (LSTM) network; using an LSTM network improves the data repair model's ability to process time series.
  • Each data point in the sorted normalized data is input, in chronological order, into the corresponding cell of the LSTM network, where the number of LSTM cell interfaces equals the length of the normalized data.
  • The feature label is also input into each cell interface, so that the generator network can repair the normalized data according to the feature label and obtain the first repair data.
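  • The patent publishes no model code; the following PyTorch-style sketch only illustrates the structure described above, one LSTM step per time point with the feature labels concatenated onto every step's input through condition channels (the class name, arguments, and layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class ConditionalLSTMGenerator(nn.Module):
    """Generator sketch: repairs a normalized sequence conditioned on feature labels."""

    def __init__(self, cond_dim, hidden_dim=64):
        super().__init__()
        # Input at each step: one normalized data value + the quantized conditions.
        self.lstm = nn.LSTM(input_size=1 + cond_dim, hidden_size=hidden_dim,
                            batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, norm_seq, cond_labels):
        # norm_seq:    (batch, seq_len, 1)  data sorted by time point
        # cond_labels: (batch, cond_dim)    feature labels, fed to every cell
        seq_len = norm_seq.size(1)
        cond = cond_labels.unsqueeze(1).expand(-1, seq_len, -1)
        x = torch.cat([norm_seq, cond], dim=-1)   # data channel + condition channels
        h, _ = self.lstm(x)
        return self.out(h)                        # repaired sequence of the same length
```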
  • the server 102 sends the first repair data to the client 101.
  • the client 101 receives the first repair data, so that the operating user 103 of the client 101 can analyze the changes over time of a certain thing or phenomenon based on the first repair data.
  • The first repair data is data that, after repair, is close to the real situation.
  • The server 102 normalizes the data to be repaired in the data repair request to obtain the normalized data, and quantizes the condition information in the data repair request to obtain the feature label; the condition information is the feature condition that matches the data to be repaired.
  • Data repair based on the condition information can fully consider the diversity of the features of the data to be repaired, so as to obtain more accurate repair data.
  • The trained data repair model is then called, the normalized data is repaired according to the feature label to obtain the first repair data, and the first repair data is sent to the client 101.
  • FIG. 3 is a schematic flowchart of another method for generating time-series data based on multi-condition constraints provided by an embodiment of the present application.
  • As shown in FIG. 3, the method for generating time-series data may include parts 301 to 308, wherein:
  • the server 102 obtains sample data and a first sample condition.
  • the server 102 may obtain sample data from the client 101 or other data platforms, and the first sample condition that matches the sample data.
  • For the first sample condition, refer to the related description of the condition information in step 201, which is not repeated here.
  • The sample data can be a noise sample sequence Z sampled from the noise space.
  • the server 102 performs normalization processing on the sample data to obtain first processed data.
  • Specifically, when the server 102 obtains the sample data, it normalizes the sample data to obtain the first processed data of the sample data.
  • For the normalization method here, refer to the related description of normalizing the data to be repaired in step 202, which is not repeated here.
  • the server 102 performs quantification processing on the first sample condition to obtain the first sample label.
  • A sample supervision data set can be constructed based on the first processed data and the first sample label corresponding to the first processed data; this sample supervision data set is input into the MCGAN for network training to build the data repair model.
  • The first sample label Y_p can be expressed as Y_p = {y_1, y_2 ... y_n},
  • where n is the number of labels in the first sample label.
  • the server 102 obtains the real sample data and the second sample condition.
  • the server 102 may obtain the real sample data and the second sample condition matching the real sample data from the client 101 or other data platforms.
  • For the second sample condition, refer to the related description of the condition information in step 201, which is not repeated here.
  • The second sample condition can be the same characteristic condition as the first sample condition.
  • The real sample data is real data with missing values, for example, real parking lot data with missing values; the real sample data X can be expressed as X = {x_1, x_2 ... x_n},
  • where n is the number of real sample data.
  • the server 102 performs normalization processing on the real sample data to obtain second processed data.
  • Specifically, the server 102 normalizes the real sample data to obtain the second processed data of the real sample data.
  • For the normalization method here, refer to the related description of normalizing the data to be repaired in step 202, which is not repeated here.
  • the server 102 performs a quantification process on the second sample condition to obtain a second sample label.
  • Specifically, the server 102 quantizes the second sample condition to obtain the second sample label; for the quantization method here, refer to the related description of quantizing the condition information in step 203, which is not repeated here.
  • A real supervised data set can be constructed based on the second processed data and the second sample label corresponding to the second processed data.
  • For the method of constructing the real supervised data set, refer to the construction of the sample supervision data set in step 303, which is not repeated here.
  • the server 102 performs supervised training on the data repair model according to the first processed data, the first sample label, the second processed data, and the second sample label, and determines the model function.
  • Specifically, the server 102 performs supervised training of the data repair model according to the first processed data, the first sample label, the second processed data, and the second sample label, and determines the model function, so that the network parameters can be further optimized according to the model function and the data repair model can be constructed, that is, step 308 is executed.
  • the network involved in training the data repair model is the MCGAN network, which mainly includes a generator and a discriminator.
  • the framework diagram of the network can be seen in Figure 4.
  • The supervised training process using the MCGAN network is as follows: sample data Z is obtained by sampling from the noise space; the sample data Z is normalized to obtain the first processed data; the first sample condition that matches the sample data Z is obtained and quantized to obtain the first sample label.
  • the first sample label may include a plurality of quantified feature conditions C
  • The first processed data is input into each data cell interface of the generator's built-in LSTM network,
  • and the quantized feature conditions C of the first sample label are input into each cell interface through condition channels, where each condition channel transmits one quantized characteristic condition C.
  • The input data of the above generator can thus be expressed as {Z, C_1, C_2 ... C_n}.
  • the second repair data can be further obtained.
  • the relevant description of the LSTM network here, refer to the corresponding description in step 204, which is not repeated here.
  • The second repair data F and the corresponding first sample label can then be input into the discriminator for discrimination processing to obtain the first discrimination result;
  • the normalized real sample data R and the quantized second sample label corresponding to R are likewise input into the discriminator.
  • the second sample label includes a plurality of quantized feature conditions C.
  • The input data of the discriminator can thus be expressed as {(F or R), C_1, C_2 ... C_n}, and the discriminator outputs the second discrimination result.
  • the built-in network of the discriminator is an LSTM network.
  • the discriminator is the same as the generator, and both include conditional channels and data channels.
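  • Continuing the illustrative PyTorch setup above, a hedged sketch of a discriminator with the same structure, an LSTM fed through a data channel and condition channels, and judging whether its input is real (class name, layer sizes, and the single-probability output are assumptions):

```python
import torch
import torch.nn as nn

class ConditionalLSTMDiscriminator(nn.Module):
    """Discriminator sketch: judges a sequence (F or R) together with its conditions."""

    def __init__(self, cond_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1 + cond_dim, hidden_size=hidden_dim,
                            batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, seq, cond_labels):
        # seq:         (batch, seq_len, 1)  second repair data F or real sample data R
        # cond_labels: (batch, cond_dim)    quantized characteristic conditions C_1..C_n
        cond = cond_labels.unsqueeze(1).expand(-1, seq.size(1), -1)
        x = torch.cat([seq, cond], dim=-1)    # {(F or R), C_1, C_2 ... C_n}
        h, _ = self.lstm(x)
        return self.head(h[:, -1])            # probability that the input is real
```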
  • both the generator and the discriminator may be configured with a state transition vector, and the state transition vector can control the opening or closing of the above-mentioned conditional channel, thereby adjusting the characteristic conditions required for training.
  • the above-mentioned first sample condition includes n characteristic conditions, and n is a positive integer.
  • The client 101 may also send a condition instruction to the server 102; the condition instruction is used to indicate that x feature conditions are to be obtained from the n feature conditions, where x is a non-negative integer less than or equal to n.
  • the server 102 may obtain x characteristic conditions from the first sample condition information including n characteristic conditions according to the condition instruction.
  • The condition instruction may include a state transition vector, and the state transition vector can be embedded in the condition channels to control the switch of each condition channel. The input data G′ of the generator LSTM network after adding the state transition vector can be expressed as:
  • G′ = {Z, S_1*C_1, S_2*C_2 ... S_n*C_n}
  • and the input data D′ of the discriminator LSTM network after adding the state transition vector can be expressed as:
  • D′ = {(F or R), S_1*C_1, S_2*C_2 ... S_n*C_n}
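  • A hedged sketch of how the state transition vector S could gate the condition channels before they are concatenated with the data channel (continuing the illustrative PyTorch setup; the 0/1 encoding of S and the function name are assumptions):

```python
import torch

def gate_conditions(conditions, switch_vector):
    """conditions:    (batch, n, cond_dim) - the n quantized characteristic conditions C_1..C_n
    switch_vector: (n,) of 0.0/1.0 values - the state transition vector S_1..S_n
    Returns S_i * C_i for every condition channel, closing channels whose S_i is 0."""
    return conditions * switch_vector.view(1, -1, 1)

# Example: three characteristic conditions, keep the first two channels open.
C = torch.randn(8, 3, 7)                 # batch of 8, 3 conditions, 7-dim labels
S = torch.tensor([1.0, 1.0, 0.0])
gated = gate_conditions(C, S)            # generator input becomes {Z, S_1*C_1 ... S_n*C_n}
```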
  • the application range of the network can be increased.
  • the network structure can be adaptively adjusted to obtain repair data generated under different characteristic conditions.
  • The model function may include a generation loss function, a discrimination loss function, and an objective function.
  • The process of determining the model function may be as follows: the first processed data is repaired according to the first sample label to obtain the second repair data; the second repair data and the first sample label are discriminated to obtain the first discrimination result; the second processed data and the second sample label are discriminated to obtain the second discrimination result; the discrimination loss function is determined according to the first discrimination result and the second discrimination result; the generation loss function is determined according to the first discrimination result; and the discrimination loss function and the generation loss function are optimized to determine the objective function.
  • Specifically, the generator repairs the first processed data, that is, the normalized sample data Z, according to the first sample label Y_p to obtain the second repair data G(Z|Y_p).
  • The second repair data G(Z|Y_p), carrying the first sample label Y_p, is then input into
  • the discriminator, so that the discriminator can discriminate the second repair data and obtain the first discrimination result D(G(Z|Y_p)).
  • On the one hand, the discriminator needs to determine whether the generated data meets the true sample distribution; on the other hand, it also needs to determine whether the generated data meets the corresponding characteristic conditions. If the judgment result is yes, the generated second repair data is data that meets the characteristics of the real sample; if the judgment result is no, the network parameters need to be adjusted and iterative training continued to generate repair data that meets the characteristics of the real sample, so that the output of the discriminator is as close to true as possible.
  • The discrimination network J of the discriminator can be expressed in terms of
  • J_real-sample-distribution(D′), the judgment of whether the generated data meets the real sample distribution, and
  • J_condition-n(D′), the judgment of whether the generated data meets the corresponding characteristic condition,
  • where D′ represents the discrimination result output by the discriminator,
  • D′ = {d_1, d_2 ... d_n}.
  • Similarly, the second processed data and the second sample label Y_p can be discriminated to obtain the second discrimination result D(X|Y_p).
  • the discrimination loss function can be determined according to the first discrimination result and the second discrimination result.
  • The discrimination loss function is the loss function of the discriminator.
  • The generation loss function can be determined according to the first discrimination result.
  • The generation loss function is the loss function of the generator.
  • the discrimination loss function and the generation loss function are optimized, and the objective function is determined.
  • The optimization goal of the discriminator is to make the objective function reach its maximum value through optimization.
  • Because the loss function of the discriminator is in negative form, this is equivalent to minimizing the discrimination loss function.
  • The optimization goal of the generator is to make the objective function reach its minimum value through optimization.
  • In the objective function, P_cond is the joint distribution probability of the characteristic conditions included in the sample condition,
  • y_1 × ... × y_n is the conditional probability space y_cond composed of the n characteristic conditions y,
  • and the joint distribution probability is defined over this conditional probability space.
  • The joint distribution probability P_cond, the noise space p_z(z), and the probability space y_cond are all fixed quantities.
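  • The loss formulas and objective function appear in the filing only as images. As an assumption (not verbatim from the patent), a standard conditional-GAN form that is consistent with the notation above, D(X|Y_p) for real data, D(G(Z|Y_p)) for generated data, conditions drawn from the joint distribution P_cond, would be:

```latex
% Assumed standard conditional-GAN losses, not copied from the patent
\mathcal{L}_D = -\,\mathbb{E}_{x \sim p_{data},\, y \sim P_{cond}}\big[\log D(x \mid Y_p)\big]
                -\,\mathbb{E}_{z \sim p_z(z),\, y \sim P_{cond}}\big[\log\big(1 - D(G(z \mid Y_p))\big)\big]

\mathcal{L}_G = -\,\mathbb{E}_{z \sim p_z(z),\, y \sim P_{cond}}\big[\log D(G(z \mid Y_p))\big]

\min_G \max_D V(D, G) =
   \mathbb{E}_{x \sim p_{data},\, y \sim P_{cond}}\big[\log D(x \mid Y_p)\big]
 + \mathbb{E}_{z \sim p_z(z),\, y \sim P_{cond}}\big[\log\big(1 - D(G(z \mid Y_p))\big)\big]
```

  • Under this assumed form, the discriminator loss is the negative of the value function, so minimizing it maximizes the objective, while the generator aims to minimize the objective, which matches the optimization goals stated above.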
  • the server 102 optimizes network parameters according to the model function, and constructs a data repair model.
  • Specifically, the server 102, through repeated iterative training, optimizes the network parameters of the generator network and the discriminator network according to the loss functions, and builds the data repair model from the optimized network parameters.
  • The optimization process can be as follows: according to the result of the discrimination loss function, the Adaptive Moment Estimation (Adam) optimization algorithm is used to optimize the discriminator; after the discriminator is optimized, the generator is optimized according to the optimized discriminator, with the Adam algorithm applied to the result of the generation loss function. Through continuous iterative training of the generator and the discriminator, the loss functions converge; for the convergence process and goal, see step 307, which is not repeated here. Further, after the loss functions converge and the network parameters are optimized, the data repair model is constructed according to the optimized network parameters.
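  • A minimal sketch of the alternating Adam optimization described above, continuing the illustrative PyTorch setup (the real hyper-parameters, data loading, and exact loss form in the patent may differ; the binary cross-entropy losses follow the assumed conditional-GAN form):

```python
import torch

def train_mcgan(generator, discriminator, loader, epochs=100, lr=2e-4):
    """Alternating optimization: update the discriminator on its loss, then the generator."""
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr)
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr)
    bce = torch.nn.BCELoss()

    for _ in range(epochs):
        for real_seq, cond in loader:   # second processed data (batch, seq_len, 1) + sample labels
            batch = real_seq.size(0)
            z = torch.randn(batch, real_seq.size(1), 1)   # noise sample sequence Z

            # Discriminator step: real vs. generated sequences, both under conditions C.
            fake_seq = generator(z, cond).detach()
            d_loss = bce(discriminator(real_seq, cond), torch.ones(batch, 1)) + \
                     bce(discriminator(fake_seq, cond), torch.zeros(batch, 1))
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()

            # Generator step: try to make the (just updated) discriminator output "real".
            g_loss = bce(discriminator(generator(z, cond), cond), torch.ones(batch, 1))
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return generator, discriminator
```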
  • an average cosine similarity calculation may be performed on the second repair data and the second processed data to obtain a similarity result.
  • According to the similarity result, the network parameters of the data repair model are optimized.
  • the second processed data is real data that matches the second repaired data.
  • The generated data sequence of the second repair data can be expressed as a sequence of length l produced at iteration n,
  • where n is the number of iterations of the generated data sequence
  • and l is the length of the data sequence;
  • the data sequence of the second processed data can be expressed as a sequence of length k,
  • where k is the length of the second processed data.
  • the second processed data is the original real sample data corresponding to the second repair data.
  • the average cosine similarity of this training can be calculated, and the network parameters can be optimized according to the average cosine similarity result, so that the generator can generate repair data closer to the true sample distribution.
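  • A hedged sketch of the average cosine similarity check between generated and real sequences; the exact averaging scheme in the patent is not spelled out, so here each generated sequence is compared with its matching real sequence of the same length and the scores are averaged:

```python
import numpy as np

def average_cosine_similarity(generated, real):
    """generated, real: arrays of shape (n, l), n paired sequences of length l.
    Returns the mean cosine similarity over the n pairs."""
    sims = []
    for f, x in zip(generated, real):
        sims.append(np.dot(f, x) / (np.linalg.norm(f) * np.linalg.norm(x) + 1e-8))
    return float(np.mean(sims))

# A similarity close to 1 suggests the repaired data is close to the real distribution.
gen = np.random.rand(4, 96)
real = gen + 0.05 * np.random.randn(4, 96)
print(average_cosine_similarity(gen, real))
```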
  • the network parameter optimization process based on the average cosine similarity can be performed according to the switching modes of different conditional channels.
  • This embodiment takes three characteristic conditions as an example.
  • The four different condition-channel switching modes can then be: fully closed (no characteristic condition), partially closed (one characteristic condition, or two characteristic conditions), and fully open (three characteristic conditions).
  • For the switch mode control of the condition channels, refer to the relevant description in step 307, which is not repeated here. The cosine similarity results over iterative training under the different condition-channel switching modes can be seen in Figure 5.
  • As can be seen, when more condition channels are open, the generated repaired data has the highest similarity with the real sample data, that is, the generated repaired data is closer to the distribution of the real sample; and as the network parameters are optimized, the generated repair data moves ever closer to the real sample data.
  • the generation quality of the repair data and the training situation of the network can be displayed more intuitively.
  • the introduction of multi-condition information helps to learn the rich characteristics of the sample, so that the data repair model can generate repair data closer to the true distribution, and improve the quality and efficiency of the generated data.
  • After obtaining the sample data and the corresponding first sample condition, the server 102 normalizes the sample data to obtain the first processed data,
  • and quantizes the first sample condition to obtain the first sample label.
  • After the server 102 obtains the real sample data and the second sample condition, it normalizes the real sample data and quantizes the second sample condition to obtain the second processed data and the second sample label.
  • The data repair model can then be trained in a supervised manner according to the first processed data, the first sample label, the second processed data, and the second sample label; the model function, including the generation loss function, the discrimination loss function, and the objective function, can be determined; the network parameters are optimized according to the model function; and the data repair model is built on the optimized network parameters.
  • In this way, the data repair model can be obtained by supervised training on known time-series sample data; there is no need to obtain a large amount of historical data or to manually obtain well-matched sample data as the training basis, which solves the problems of high experimental cost and difficulty in obtaining sample data.
  • FIG. 6 is a schematic flowchart of yet another method for generating time-series data based on multiple constraints provided by an embodiment of the present application. As shown in FIG. 6, the method for generating time-series data may include parts 601 to 606, wherein:
  • the server 102 obtains verification data, and performs normalization processing on the verification data to obtain third processed data.
  • the server 102 may obtain verification data from the client 101 or other data platforms.
  • the verification data may be understood as a type of sample data.
  • the sample data may include training data, verification data, and test data.
  • the related process of normalizing the verification data to obtain the third processed data can refer to the related description of normalizing the sample data in step 302, which is not repeated here.
  • the server 102 obtains the verification condition, and quantifies the verification condition to obtain a verification label.
  • Specifically, the server 102 obtains a verification condition that matches the verification data obtained in step 601, and quantizes the verification condition to obtain a verification label.
  • the server 102 calls the data repair model that has completed the training, performs repair processing on the third processed data according to the verification label, and obtains the third repair data.
  • For the process of obtaining the third repaired data, refer to the generation of the first repair data in step 204, which is not repeated here.
  • the server 102 obtains the real verification data, and performs normalization processing on the real verification data to obtain the fourth processed data of the real verification data.
  • the fourth processed data is real data that matches the third repaired data.
  • the server 102 performs residual analysis on the third repair data and the fourth processed data to obtain a residual analysis result.
  • the residual analysis result may be a residual analysis result graph, and the generation quality of the third repair data can be displayed more intuitively through the residual analysis result graph.
  • the residual analysis process can be performed based on the switching modes of different conditional channels.
  • This embodiment takes three characteristic conditions as an example, and the switching modes of the three different conditional channels can be: fully closed (no characteristic condition), single-channel open (single characteristic condition), and multi-channel open (multiple characteristic conditions).
  • the switch mode of the conditional channel here, refer to the related description in step 307, which will not be repeated here.
  • The comparison between the generated third repair data and the real fourth processed data can be seen in Figure 7(a), Figure 7(b), and Figure 7(c).
  • As can be seen, under multi-condition constraints the generated third repaired data is closer to the fourth processed data, that is, the generated repaired data is closer to the distribution of the real sample.
  • Residual analysis is then performed on the corresponding third repaired data and the real fourth processed data in Figure 7(a), Figure 7(b), and Figure 7(c) respectively.
  • The schematic diagrams of the residual analysis results can be seen in Figure 8(a), Figure 8(b), and Figure 8(c); the dark gray part shown in area 1 in Figure 8(a) is the part not accepted by the residual analysis, that is, where the third repaired data deviates most from the real fourth processed data.
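  • A minimal sketch of the residual analysis between the third repaired data and the real fourth processed data, assuming a simple point-wise residual with a tolerance band; the acceptance criterion actually used in the patent's Figure 8 is not specified:

```python
import numpy as np

def residual_analysis(repaired, real, tolerance=0.1):
    """Point-wise residuals between repaired and real normalized sequences.
    Returns the residuals and a mask of points rejected by the tolerance band."""
    repaired = np.asarray(repaired, dtype=float)
    real = np.asarray(real, dtype=float)
    residuals = repaired - real
    rejected = np.abs(residuals) > tolerance   # e.g. the dark-gray region in Fig. 8(a)
    return residuals, rejected

residuals, rejected = residual_analysis([0.2, 0.5, 0.9], [0.25, 0.48, 0.6])
print(residuals, rejected)   # the last point deviates beyond the tolerance
```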
  • the server 102 sends the residual analysis result to the client 101.
  • The server 102 sends the residual analysis result to the client 101; accordingly, the client 101 receives the residual analysis result and displays it to the operating user 103 of the client 101.
  • the operating user 103 can intuitively evaluate the quality of the generated repair data and the training situation of the data repair model based on the residual analysis result.
  • When the server 102 obtains the verification data and the verification condition, it normalizes the verification data to obtain the third processed data and quantizes the verification condition to obtain the verification label.
  • The trained data repair model is called to repair the third processed data according to the verification label and obtain the third repaired data; residual analysis is performed on the third repaired data against the acquired real fourth processed data to obtain the residual analysis result, which is sent to the client.
  • the quality of the generated repair data and the training of the data repair model can be evaluated more intuitively and accurately, and it can also be determined that the repair data generated under multi-feature constraints can be closer to the distribution of real samples.
  • the introduction of multi-feature conditions can obtain richer features of repair data, which improves the efficiency and quality of data repair.
  • an embodiment of the present application also proposes a time series data generation device based on multiple conditional constraints.
  • The device for generating time-series data based on multi-condition constraints may be a computer program (including program code) running in a processing device; as shown in FIG. 9, the time-series data generation device may run the following units:
  • the transceiver unit 901 is configured to receive a data repair request from the client, where the data repair request includes data to be repaired and condition information, and the data repair request is used to request data repair of the data to be repaired according to the condition information;
  • the condition information is a characteristic condition that matches the data to be repaired;
  • the processing unit 902 is configured to perform normalization processing on the data to be repaired to obtain the normalized data of the data to be repaired, and perform quantization processing on the condition information to obtain a feature label of the condition information;
  • the data repair model that has been trained is called, the normalized data is repaired according to the feature label, and the first repair data is obtained.
  • the data repair model is obtained by training it on the sample data, the first sample condition, the real sample data, and the second sample condition, and the sample data is noise data;
  • the transceiver unit 901 is further configured to send the first repair data to the client.
  • the data to be repaired includes a sequence of time points
  • the repairing process is performed on the normalized data according to the feature tag to obtain the first repaired data
  • the processing unit 902 may be further configured to sort each data point in the normalized data according to the time point sequence;
  • the time point sequence is a sequence composed of the generation time points of each data in the data to be repaired, and each data in the normalized data is obtained after normalization processing of each data in the data to be repaired ;
  • data repair processing is performed on the normalized data that has been sorted to obtain first repair data.
  • the processing unit 902 may also be used to obtain the sample data and the first sample condition, normalize the sample data to obtain the first processed data of the sample data, and quantize the first sample condition to obtain the first sample label;
  • the network parameters are optimized according to the model function, and the data repair model is constructed.
  • the first sample condition includes n characteristic conditions, and n is a positive integer
  • the processing unit 902 may also be configured to receive a condition instruction sent by the client, where the condition instruction is used to instruct that x of the characteristic conditions be acquired from the n characteristic conditions, x being a non-negative integer less than or equal to n;
  • the model function includes a generation loss function, a discriminative loss function, and an objective function
  • regarding the supervised training of the data repair model according to the first processed data, the first sample label, the second processed data, and the second sample label, and determining the model function, the processing unit 902 may further be used to repair the first processed data according to the first sample label to obtain the second repair data, and to discriminate the second repair data and the first sample label to obtain the first discrimination result;
  • the discrimination loss function and the generation loss function are optimized, and the objective function is determined.
  • the processing unit 902 may also be configured to perform an average cosine similarity calculation on the second repair data and the second processed data to obtain a similarity result, where the second processed data is the real data that matches the second repair data;
  • according to the similarity result, the network parameters of the data repair model are optimized.
  • the sample data includes verification data
  • the processing unit 902 can also be used to obtain the verification data and normalize the verification data to obtain the third processed data of the verification data.
  • the transceiver unit 901 may also be used to send the residual analysis result to the client.
  • part of the steps involved in the method for generating time series data based on multiple conditional constraints shown in FIGS. 2, 3, and 6 can be performed by the processing unit in the time series data generating device based on multiple conditional constraints.
  • steps 201 and 205 shown in FIG. 2 may be executed by the transceiver unit 901; for another example, step 203 shown in FIG. 2 may be executed by the processing unit 902.
  • The units in the device for generating time-series data based on multi-condition constraints can be separately or entirely combined into one or several other units, or one or more of the units can be further divided into multiple functionally smaller units; either way, the same operations can be achieved without affecting the technical effects of the embodiments of the present application.
  • FIG. 10 is a schematic structural diagram of a time-series data generation device based on multiple conditional constraints provided by an embodiment of the present application.
  • The data generation device includes a processor 1001, a memory 1002, and a communication interface 1003.
  • The processor 1001, the memory 1002, and the communication interface 1003 are connected through at least one communication bus, and the processor 1001 is configured to support the processing device in performing the corresponding functions of the processing device in the methods of FIG. 2, FIG. 3, and FIG. 6.
  • the memory 1002 is used to store at least one instruction suitable for being loaded and executed by a processor, and these instructions may be one or more computer programs (including program codes).
  • the communication interface 1003 is used for receiving data and for sending data.
  • the communication interface 1003 is used to send data repair requests and the like.
  • the processor 1001 may call the program code stored in the memory 1002 to perform the following operations:
  • a data repair request from the client is received through the communication interface 1003.
  • the data repair request includes data to be repaired and condition information.
  • the data repair request is used to request data repair of the data to be repaired according to the condition information;
  • the condition information is a characteristic condition that matches the data to be repaired;
  • the data repair model that has been trained is called, the normalized data is repaired according to the feature label, and the first repair data is obtained.
  • the data repair model is obtained by training it on the sample data, the first sample condition, the real sample data, and the second sample condition, and the sample data is noise data;
  • the first repair data is sent to the client through the communication interface 1003.
  • the data to be repaired includes a sequence of time points
  • the processor 1001 may call the program code stored in the memory 1002 to perform the following operations:
  • according to the time point sequence, each data point in the normalized data is sorted; the time point sequence is a sequence composed of the generation time points of each data point in the data to be repaired, and each data point in the normalized data
  • is obtained after normalization processing of the corresponding data point in the data to be repaired;
  • data repair processing is performed on the normalized data that has been sorted to obtain first repair data.
  • the processor 1001 may call the program code stored in the memory 1002 to perform the following operations:
  • the network parameters are optimized according to the model function, and the data repair model is constructed.
  • the first sample condition includes n characteristic conditions, and n is a positive integer
  • the processor 1001 may call the program code stored in the memory 1002 to perform the following operations:
  • condition instruction is used to instruct to obtain x of the characteristic conditions from n of the characteristic conditions, and x is a non-negative integer less than or equal to n;
  • the model function includes a generation loss function, a discriminative loss function, and an objective function
  • the processor 1001 may call the program code stored in the memory 1002 to perform the following operations:
  • the discrimination loss function and the generation loss function are optimized, and the objective function is determined.
  • the processor 1001 may call the program code stored in the memory 1002 to perform the following operations:
  • the network parameters of the data repair model are optimized.
  • the sample data includes verification data
  • the processor 1001 may call the program code stored in the memory 1002 to perform the following operations:
  • the embodiment of the present application also provides a computer-readable storage medium (Memory), which can be used to store the computer software instructions used by the processing device in the embodiments shown in FIG. 2 and FIG. 3, and the storage space also stores at least one instruction suitable for being loaded and executed by a processor.
  • these instructions may be one or more computer programs (including program codes).
  • the above-mentioned computer-readable storage medium includes, but is not limited to, flash memory, hard disk, and solid-state hard disk.
  • in the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof.
  • when software is used, the implementation can take the form of a computer program product, in whole or in part.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions can be stored in a computer-readable storage medium or transmitted through a computer-readable storage medium.
  • Computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
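The processor operations above mirror steps 201 to 205 of the method. For illustration only, the Python sketch below shows one way such a request-handling flow might be arranged: the data to be repaired is normalized, sorted by its generation time points, passed to a trained repair model together with the tensorized condition labels, and the result is rescaled. The function names and the `repair_model` callable are hypothetical placeholders, not part of the disclosed apparatus.

```python
import numpy as np

def normalize(series: np.ndarray) -> np.ndarray:
    """Scale a 1-D time series into [0, 1] by dividing by its maximum value (step 202)."""
    return series / series.max()

def handle_repair_request(series, time_points, feature_labels, repair_model):
    """Hypothetical request-handling flow: sort the normalized data by its
    generation time points, call the trained repair model with the tensorized
    condition labels, and rescale the output back to the original range."""
    series = np.asarray(series, dtype=float)
    scale = float(series.max())
    order = np.argsort(time_points)                 # keep the time-series ordering
    normalized = normalize(series)[order]
    repaired = repair_model(normalized, feature_labels)   # step 204: trained generator
    return repaired * scale                               # first repair data sent back to the client

# Example usage with a dummy "model" that simply echoes its input:
series = np.array([3.0, 0.0, 6.0, 2.0])            # 0.0 marks a missing point
time_points = np.array([2, 1, 0, 3])
labels = np.array([0.5, 1.0, 0.0])                 # tensorized condition information
print(handle_repair_request(series, time_points, labels, lambda x, c: x))
```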

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method, apparatus and medium for generating time series data based on multi-condition constraints, the method comprising: receiving a data repair request (201), the data repair request comprising data to be repaired and condition information; normalizing the data to be repaired to obtain normalized data (202); tensorizing the condition information to obtain a feature label (203); calling a trained data repair model and repairing the normalized data according to the feature label to obtain first repair data (204), the data repair model being obtained by training on sample data, a first sample condition, real sample data and a second sample condition, the sample data being noise data; and the server sending the first repair data to the client (205). With the described method, rich features of the data can be obtained without a large amount of historical data or closely matched sample data as a training basis, the accuracy and temporal ordering of the repaired data are ensured, and the repair efficiency and quality are improved.

Description

基于多条件约束的时间序列数据生成方法、装置及介质 技术领域
本申请涉及大数据技术领域,尤其涉及一种基于多条件约束的时间序列数据生成方法、装置及介质。
背景技术
时间序列数据是指在不同时间收集到的数据,此类数据在生产和生活中用于描述某一事物或现象等随时间的变化情况。但是,由于此类数据的数据点较为密集且抗干扰性差,在数据的采集、应用或传递等过程中,容易造成数据的缺失。目前,针对缺失数据的修复方法主要包括如下两种:第一种是基于先验知识的插值方法;第二种是获取与已缺失数据最匹配的样本数据,利用该样本数据训练生成对抗网络,以修复已缺失数据。但是第一种方法需要有大量的历史数据作为基础,不适用于海量数据的修复;第二种方法较难获取匹配度高的样本数据且难以学习到数据的有效特征,修复的数据准确性差且不具有时序性。
发明内容
本申请实施例提供一种基于多条件约束的时间序列数据生成方法、装置及介质,无需大量的历史数据或匹配度高的样本数据作为训练基础,便可获取待修复数据的丰富特征,保证了修复数据的准确性及时序性,提高了修复效率及质量。
第一方面,本申请实施例提供一种基于多条件约束的时间序列数据生成方法,包括:
接收客户端的数据修复请求,所述数据修复请求包括待修复数据及条件信息,所述数据修复请求用于请求根据所述条件信息对所述待修复数据进行数据修复,所述条件信息为与所述待修复数据相匹配的特征条件;
对所述待修复数据进行归一化处理,得到所述待修复数据的归一化数据,并对所述条件信息进行张量化处理,得到所述条件信息的特征标签;
调用已完成训练的数据修复模型,根据所述特征标签对所述归一化数据进行修复处理,得到第一修复数据,所述数据修复模型是根据样本数据、第一样本条件、真实样本数据及第二样本条件对所述数据修复模型进行训练得到的, 所述样本数据为噪声数据;
发送所述第一修复数据至所述客户端。
在该技术方案中,客户端发送包括待处理数据及条件信息的数据修复请求至服务器,以使服务器对待修复数据进行归一化处理得到归一化数据,并对条件信息进行张量化处理,得到特征标签,该条件信息为与待修复数据相匹配的特征条件,调用已完成训练数据修复模型,根据特征标签对归一化数据进行修复处理,得到第一修复数据,并将该第一修复数据发送至客户端。通过这种方法,基于条件信息的输入,可获取待修复数据的丰富特征,以使生成的第一修复数据更接近真实数据的分布特征,保证了修复数据的准确性及时序性,提高了修复效率及质量。
第二方面,本申请实施例提供一种基于多条件约束的时间序列数据生成装置,包括:
收发单元,用于接收客户端的数据修复请求,所述数据修复请求包括待修复数据及条件信息,所述数据修复请求用于请求根据所述条件信息对所述待修复数据进行数据修复,所述条件信息为与所述待修复数据相匹配的特征条件;
处理单元,用于对所述待修复数据进行归一化处理,得到所述待修复数据的归一化数据,并对所述条件信息进行张量化处理,得到所述条件信息的特征标签;调用已完成训练的数据修复模型,根据所述特征标签对所述归一化数据进行修复处理,得到第一修复数据,所述数据修复模型是根据样本数据、第一样本条件、真实样本数据及第二样本条件对所述数据修复模型进行训练得到的,所述样本数据为噪声数据;
所述收发单元,还用于发送所述第一修复数据至所述客户端。
第三方面,本申请实施例提供一种基于多条件约束的时间序列数据生成装置,包括处理器、存储器和通信接口,所述处理器、所述存储器和所述通信接口相互连接,其中,所述存储器用于存储计算机程序,所述计算机程序包括程序指令,所述处理器被配置用于调用所述程序指令,执行如第一方面所描述的方法。该处理设备解决问题的实施方式以及有益效果可以参见上述第一方面所描述的方法以及有益效果,重复之处不再赘述。
第四方面,本申请实施例提供一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有一条或多条第一指令,所述一条或多条第一指令适于由处理器加载并执行如第一方面所描述的方法。
本申请实施例中,客户端发送数据修复请求至服务器,该数据修复请求包括待修复数据集条件信息,服务器对待修复数据进行归一化处理得到归一化数据,并对条件信息进行张量化处理得到特征标签,该条件信息为与待修复数据相匹配的特征条件,此处基于条件信息进行数据修复可以充分考虑待修复数据特征的多样性,以便得到更准确的修复数据。调用已完成训练数据修复模型,根据特征标签对归一化数据进行修复处理,得到第一修复数据,并将该第一修复数据发送至客户端。其中,数据修复模型的训练方法为:通过多条件约束的生成对抗网络对输入的至少一组样本数据、第一样本条件、真实样本数据及第二样本条件进行迭代监督训练,其中,样本数据为噪声数据,真实样本数据为真实的时间序列数据,通过本实施例的方法,可以对存在缺失的时间序列数据进行修复,无需大量的历史数据或手动获取与缺失数据匹配度高的样本数据作为训练基础,便可实现对模型的训练,并且,通过多个特征条件信息的引入,可以获取到待修复数据的丰富特征,采用长短神经记忆(Long Short-Term Memory,LSTM)网络作为多条件约束的生成对抗网络中生成器及判别器的内置网络,保证了修复数据的准确性及时序性,提高了修复效率及质量。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种基于多条件约束的时间序列数据生成系统的架构图;
图2是本申请实施例提供的一种基于多条件约束的时间序列数据生成方法的流程图;
图3是本申请实施例提供的另一种基于多条件约束的时间序列数据生成方法的流程图;
图4是本申请实施例提供的一种多条件生成对抗网络的框架图;
图5是本申请实施例提供的一种平均余弦相似度计算结果的示意图;
图6是本申请实施例提供的又一种基于多条件约束的时间序列数据生成方法的流程图;
图7(a)是本申请实施例提供的一种无条件约束生成数据与真实数据的对比示意图;
图7(b)是本申请实施例提供的一种单条件约束生成数据与真实数据的对比示意图;
图7(c)是本申请实施例提供的一种多条件约束生成数据与真实数据的对比示意图;
图8(a)是本申请实施例提供的一种无条件约束的残差分析结果示意图;
图8(b)是本申请实施例提供的一种单条件约束的残差分析结果示意图;
图8(c)是本申请实施例提供的一种多条件约束的残差分析结果示意图;
图9是本申请实施例提供的一种基于多条件约束的时间序列数据生成装置的结构示意图;
图10是本申请实施例提供的另一种基于多条件约束的时间序列数据生成装置的结构示意图。
具体实施方式
为了使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施例的技术方案进行描述。显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”等是用于区别不同对象,而非用于描述特定顺序。此外,术语“包括”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或模块的过程、方法、系统、产品或装置没有限定于已列出的步骤或模块,而是可选地还包括没有列出的步骤或模块,或可选地还包括对于这些过程、方法、产品或装置固有的其它步骤或模块。
时间序列数据是一类在不同时间收集到的具有时间信息的一维数据,例如:交通监控数据、停车场的停车情况数据,等等。该类数据在生产和生活中用于描述某一事物或现象等随时间的变化情况,但是,由于此类数据的数据点较为密集且抗干扰性差,在数据的采集、应用或传递等过程中,容易造成数据的缺失,从而对生产及生活带来巨大影响。目前,针对缺失数据的修复方法主要包括如下两种:第一种是基于先验知识的插值方法,该方法需要有大量的历史数 据作为基础,无法修复缺失较多数据,不适用于海量数据的修复;另一种方法需要获取与已缺失数据最匹配的样本数据,利用该样本数据训练生成对抗网络,得到生成数据,将生成数据填补至缺失数据中。其中,生成对抗网络可以包括:深度卷积生成对抗网络(Deep Convolution Generative Adversarial Networks,DCGAN)。但是该方法较难获取匹配度高的样本数据,且生成数据具有无序性,例如:在生成一周的停车场数据时,无法确定生成的数据是哪一天的数据。同时,利用生成对抗网络进行训练时需要提取样本的特征,由于,样本特征的多样化,使得不能从一个样本中学习到全部的特征,从而影响数据修复的准确性。
为解决上述问题,本申请实施例提供一种基于多条件约束的时间序列数据生成方法,该时间序列数据生成方法基于多条件生成对抗网络(Multi-condition Generative Adversarial Networks,MCGAN)进行海量数据的修复。该多条件生成对抗网络包括生成器网络及判别器网络,通过上述多条件生成对抗网络构建的数据修复模型,可以基于待修复数据及待修复数据对应的条件信息对待修复数据进行数据修复,得到第一修复数据。其中,条件信息为待修复数据的特征条件,例如:时间、空间及气候,等等。通过本实施方式,无需大量的历史数据或特定的样本数据作为基础,便可获取丰富的样本特征,以使数据修复模型可以生成更接近真实数据的修复数据,保证了海量修复数据的准确性及时序性,提高了数据修复质量。
可选的,本实施方式可以应用于停车场数据修复场景,具体的,可以获取有缺失的停车场数据作为待修复数据,并获取影响车位分布的条件信息,例如:时间、空间、气候等信息。将上述待修复数据经过归一化处理得到归一化数据,并将条件信息进行张量化处理得到特征标签。则可以调用基于多条件生成对抗网络的数据修复模型,根据特征标签对归一化数据进行数据修复,得到第一修复数据,该第一修复数据可以理解为符合停车场真实情况的数据。
上述提及的基于多条件约束的时间序列数据生成方法可应用于如图1所示的基于多条件约束的时间序列数据生成系统中,该多条件约束的时间序列数据生成系统可包括客户端101及服务器102。该客户端101的形态和数量用于举例,并不构成对本申请实施例的限定。例如,可以包括两个客户端101。
其中,客户端101可以为向服务器102发送数据修复请求的客户端,也可以为在数据修复模型训练时,用于为服务器102提供样本数据、第一样本条件、真 实样本数据及第二样本条件的客户端,该客户端可以为以下任一种:终端、独立的应用程序、应用程序编程接口(Application Programming Interface,API)或者软件开发工具包(Software Development Kit,SDK)。其中,终端可以包括但不限于:智能手机(如Android手机、IOS手机等)、平板电脑、便携式个人计算机、移动互联网设备(Mobile Internet Devices,MID)等设备,本申请实施例不做限定。服务器102可以包括但不限于集群服务器。
在本申请的实施例中,客户端101向服务器102发送数据修复请求,服务器102根据该数据修复请求所包括的待修复数据及条件信息,获取该待修复数据的第一修复数据。具体的,对待修复数据进行归一化处理,得到归一化数据,并对条件信息进行张量化处理,得到特征标签,通过预先训练好的数据修复模型结合特征标签对归一化数据进行修复处理,得到第一修复数据,将该第一修复数据发送至客户端101,以使客户端101的操作用户103可以根据该第一修复数据进行某一事物或现象等随时间的变化情况的分析。
请参见图2,图2是本申请实施例提供的一种基于多条件约束的时间序列数据生成方法的流程示意图,如图2所示,该时间序列数据生成方法可以包括201~205部分,其中:
201、客户端101发送数据修复请求至服务器102。
具体的,客户端101发送数据修复请求至服务器102,相应的,服务器102接收来自客户端101的数据修复请求,该数据修复请求包括待修复数据及条件信息,该数据修复请求用于请求根据条件信息对待修复数据进行数据修复,其中,待修复数据为已存在缺失情况的数据,具体的,可以为已存在缺失情况的时间序列数据。条件信息为与待修复数据相匹配的特征条件,例如:若待修复数据为停车场数据,则条件信息可以包括时间、空间及气候,等等。
202、服务器102对待修复数据进行归一化处理,得到待修复数据的归一化数据。
具体的,服务器102对待修复数据进行归一化处理,得到待修复数据的归一化数据。该待处理数据为时间序列数据,例如:为时间序列数据M,则该待处理数据可以表示为
M = {m_{t_1}, m_{t_2}, …, m_{t_l}}
其中，m_{t_k} 表示 t_k 时刻对应的待修复数据的数据值，l 为待修复数据的长度。
则归一化处理方法可以为，获取序列的最大值 M_{max} = max(M)，将上述待修复数据中各个数据与该最大值分别作商，即可得到归一化数据，该归一化数据可以表示为：
\hat{M} = {\hat{m}_{t_1}, \hat{m}_{t_2}, …, \hat{m}_{t_l}}，其中 \hat{m}_{t_k} = m_{t_k} / M_{max}
其中，\hat{m}_{t_l} 表示 t_l 时刻对应的归一化数据的数据值。
可选的,在对待修复数据进行归一化处理之前,可以对待修复数据进行数据清洗。通过该可选的实施方式,有利于归一化数据在数据修复模型中更容易收敛。
203、服务器102对条件信息进行张量化处理,得到条件信息的特征标签。
具体的,服务器102对条件信息进行张量化处理,得到条件信息的特征标签。该条件信息为与上述待修复数据相匹配的特征条件,其中,条件信息可以包括但不限于:静态条件信息,例如:当待修复数据为停车场数据时,静态条件信息可以为停车场周围建筑分布;动态连续性条件信息,例如:时间序列标签;离散条件信息,例如:一周有7天,则7天是不同的离散特性,或天气、节假日等社会事件。则在输入至服务器102时。需要对获取的条件信息进行张量化处理,得到条件信息的特征标签。
进一步的，对于静态条件信息的张量化方法，以建筑物为例，不同的建筑物对应的静态条件信息不同，则需要获取不同分布情况的建筑物，即静态条件信息的多个影响因子 l_1 … l_n，其中，n 为影响因子的数量，则对静态条件信息的张量化处理可以采用归一化的方法，得到的特征标签可以表示为：
C_{static} = {l_1 / l_{max}, l_2 / l_{max}, …, l_n / l_{max}}，其中 l_{max} = max(l_1, …, l_n)
进一步的，对于动态连续性条件信息的张量化方法，可以采用归一化处理方法，获取动态连续性条件信息的条件序列，该条件序列可以理解为采用时间顺序排列的条件标签，该条件序列可以表示为：
L = {c_{t_1}, c_{t_2}, …, c_{t_l}}
其中，c_{t_k} 表示在时刻 t_k 的条件标签值。
则归一化处理方法可以为，获取条件序列的最大值 L_{max} = max(L)，将上述条件序列中各个条件标签与该最大值分别作商，即可得到特征标签，该特征标签可以表示为：
C_{dynamic} = {c_{t_1} / L_{max}, c_{t_2} / L_{max}, …, c_{t_l} / L_{max}}
进一步的,对于离散条件信息的张量化处理,可以使用独热编码(one-hot)模式,该特征标签可以表示为:
C=onehot(n)
其中,n为事件出现的可能性数目,例如:若事件出现的可能性数目为2,则事件1和事件2的表示方法可以分别为{1,0}及{0,1}.
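For illustration, the three tensorization rules described above (normalizing static influence factors, normalizing a dynamic condition sequence, and one-hot encoding discrete conditions) can be sketched in a few lines of Python; the helper names below are assumptions, not terminology from this application.

```python
import numpy as np

def tensorize_static(factors):
    """Static conditions: normalize the influence factors l_1 ... l_n by their maximum."""
    factors = np.asarray(factors, dtype=float)
    return factors / factors.max()

def tensorize_dynamic(condition_sequence):
    """Dynamic continuous conditions: normalize the time-ordered condition labels by L_max."""
    seq = np.asarray(condition_sequence, dtype=float)
    return seq / seq.max()

def tensorize_discrete(event_index, num_events):
    """Discrete conditions: one-hot encode an event with num_events possible outcomes."""
    onehot = np.zeros(num_events)
    onehot[event_index] = 1.0
    return onehot

# Example: 7 days of the week as a discrete condition, day index 2 -> {0,0,1,0,0,0,0}
print(tensorize_static([10, 25, 5]))     # -> [0.4, 1.0, 0.2]
print(tensorize_discrete(2, 7))
```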
204、服务器102调用已完成训练的数据修复模型,根据特征标签对归一化数据进行修复处理,得到第一修复数据。
具体的,服务器102调用已完成训练的数据修复模型,根据特征标签对归一化数据进行修复处理,得到第一修复数据。其中,数据修复模型是根据样本数据、第一样本条件、真实样本数据及第二样本条件对数据修复模型进行训练得到的,该样本数据为噪声数据。该数据修复模型为生成器网络及判别器网络利用样本数据、第一样本条件、真实样本数据及第二样本条件进行反复迭代训练构建的模型。
进一步的,上述待修复数据包括时间点序列,则根据特征标签对归一化数据进行修复处理,得到第一修复数据,可以为根据时间点序列,对归一化数据中各个数据进行排序,根据特征标签对已完成排序的归一化数据进行数据修复处理,得到第一修复数据。其中,时间点序列为待修复数据中各个数据的生成时间点所组成的序列,即上述待处理数据中{t l,t 2…t l},则将待修复数据经由归一化处理得到得归一化数据也携带有该时间点序列,即上述归一化数据中的{t l,t 2…t l}。在对归一化数据中的各个数据按照时间点完成排序的情况下,将已完成排序的归一化数据输入至数据修复模型的生成器网络中,该生成器的内置网络为长短神经记忆(Long Short-Term Memory,LSTM)网络,采用LSTM网络,可以提升该数据修复模型对时间序列的处理能力。具体的,将已完成排序的归一化数据中的各个数据按照时间顺序输入至LSTM网络的各个对应细胞接口中,其中,LSTM网络的细胞接口数据与归一化数据的长度相同。并且将特征标签分别输入至每个细胞接口,以使生成器网络可以根据该特征标签对归一化数据进行数据修复,得到第一修复数据。
通过执行本实施方式,可以基于待修复数据的丰富特征,得到更准确的修复数据,采用LSTM网络可以保证生成数据的时序性,提高了数据修复的质量。
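As a rough sketch of the generator idea described above, the PyTorch module below concatenates the condition labels to every time step of the input sequence before an LSTM and maps each hidden state back to a data value. The layer sizes and class name are assumptions made for illustration; this is not the exact network of the embodiment.

```python
import torch
import torch.nn as nn

class ConditionalLSTMGenerator(nn.Module):
    """Sketch of an LSTM generator conditioned on feature labels at every time step."""
    def __init__(self, cond_dim: int, hidden_dim: int = 64):
        super().__init__()
        # input at each step = 1 data value + cond_dim condition values
        self.lstm = nn.LSTM(input_size=1 + cond_dim, hidden_size=hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)            # map each hidden state back to a data value

    def forward(self, noise_seq: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # noise_seq: (batch, seq_len, 1); cond: (batch, cond_dim) broadcast to every step
        seq_len = noise_seq.size(1)
        cond_per_step = cond.unsqueeze(1).expand(-1, seq_len, -1)
        x = torch.cat([noise_seq, cond_per_step], dim=-1)
        h, _ = self.lstm(x)
        return torch.sigmoid(self.out(h))              # values in [0, 1], like the normalized data

# Example: batch of 2 sequences of length 24 with 3 condition features
gen = ConditionalLSTMGenerator(cond_dim=3)
fake = gen(torch.randn(2, 24, 1), torch.rand(2, 3))
print(fake.shape)   # torch.Size([2, 24, 1])
```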
205、服务器102发送第一修复数据至客户端101。
具体的,服务器102将第一修复数据发送至客户端101。相应的,客户端101接收该第一修复数据,以使客户端101的操作用户103可以根据该第一修复数据进行某一事物或现象等随时间的变化情况的分析。该第一修复数据为完成修复的接近真实情况的数据
可见,通过实施图2所描述的方法,客户端101在发送数据修复请求后,服务器102对数据修复请求中的待修复数据进行归一化处理得到归一化数据,并对数据修复请求中的条件信息进行张量化处理,得到特征标签,该条件信息为与待修复数据相匹配的特征条件,此处基于条件信息进行数据修复可以充分考虑待修复数据特征的多样性,以便得到更准确的修复数据。调用已完成训练数据修复模型,根据特征标签对归一化数据进行修复处理,得到第一修复数据,并将该第一修复数据发送至客户端101。通过执行本实施例的方法,无需大量的历史数据或匹配度高的样本数据作为训练基础,便可获取待修复数据的丰富特征,保证了修复数据的准确性及时序性,提高了修复效率及质量。
请参见图3,图3是本申请实施例提供的一种基于多条件约束的时间序列数据生成方法的流程示意图,如图3所示,该时间序列数据生成方法可以包括301~308部分,其中:
301、服务器102获取样本数据及第一样本条件。
具体的,服务器102可以从客户端101或其他数据平台获取样本数据,及与该样本数据相匹配的第一样本条件。其中,第一样本条件的相关描述可以参见步骤201中条件信息的相关描述,此处不赘述。样本数据可以为在噪声空间采样得到的噪声样本序列数据,该样本数据可以表示为:
Z = {z^{(1)}, z^{(2)}, z^{(3)}, …, z^{(n)}}
302、服务器102对样本数据进行归一化处理,得到第一处理数据。
具体的,服务器102在获取到样本数据的情况下,对样本数据进行归一化处理,得到样本数据的第一处理数据。此处的归一化处理的方法可以参见步骤202中对待修复数据的归一化处理的相关描述,此处不赘述。
303、服务器102对第一样本条件进行张量化处理,得到第一样本标签。
具体的,服务器102对第一样本条件进行张量化处理,得到第一样本标签。此处的张量处理的方法可以参见步骤203中对条件信息的张量化处理的相关描 述,此处不赘述。
进一步的，在获取到第一处理数据及第一样本标签后，可以根据第一处理数据及与该第一处理数据相对应的第一样本标签，构建样本监督数据集，该样本监督数据集用于输入至MCGAN中进行网络训练，构建数据修复模型。其中，第一样本标签 Y_p 可以表示为：
Y_p = {y_p^{(1)}, y_p^{(2)}, …, y_p^{(n)}}
其中 n 为第一样本标签的个数，则上述监督数据集可以表示为：
S = {(z^{(1)}, y_p^{(1)}), (z^{(2)}, y_p^{(2)}), …, (z^{(n)}, y_p^{(n)})}
304、服务器102获取真实样本数据及第二样本条件。
具体的,服务器102可以从客户端101或其他数据平台获取真实样本数据,及与真实样本数据相匹配的第二样本条件。其中,第二样本条件的相关描述可以参见步骤201中条件信息的相关描述,此处不赘述,可选的,第二样本条件可以为与第一样本条件相同的特征条件。真实样本数据为存在缺失的真实数据,例如:存在缺失的真实停车场数据,该真实样本数据X可以表示为:
X = {x^{(1)}, x^{(2)}, x^{(3)}, …, x^{(n)}}
其中,n为真实样本数据的个数。
305、服务器102对真实样本数据进行归一化处理,得到第二处理数据。
具体的,服务器102对真实样本数据进行归一化处理,得到真实样本数据的第二处理数据。此处的归一化处理的方法可以参见步骤202中对待修复数据的归一化处理的相关描述,此处不赘述。
306、服务器102对第二样本条件进行张量化处理,得到第二样本标签。
具体的,服务器102对第一样本条件进行张量化处理,得到第一样本标签。此处的张量处理的方法可以参见步骤203中对条件信息的张量化处理的相关描述,此处不赘述。
进一步的,可以根据第二处理数据及与该第二处理数据相对应的第二样本标签,构建真实监督数据集,该真实监督数据集构成方法的相关描述可以步骤303中样本监督数据集的构成,此处不赘述。
307、服务器102根据第一处理数据、第一样本标签、第二处理数据及第二样本标签对数据修复模型进行监督训练,确定模型函数。
具体的,服务器102根据第一样本数据、第一样本标签、第二处理数据及第 二样本标签对数据修复模型进行监督训练,确定模型函数,以使可以进一步根据该模型的函数,优化网络参数,构建数据修复模型,即执行步骤308。
进一步的,训练该数据修复模型所涉及的网络为MCGAN网络,主要包括生成器及判别器,该网络的框架图可参见图4所示,利用该MCGAN网络进行监督训练的过程为:从噪声空间采样得到样本数据,对样本数据Z进行归一化处理得到第一处理数据,并获取与样本数据Z相匹配的第一样本条件,将第一样本条件进行张量化处理得到第一样本标签。其中,第一样本标签可以包括多个张量化特征条件C,将第一处理数据输入至生成器的内置LSTM网络的每个数据细胞接口中,并将第一样本标签的各个张量化特征条件C通过条件通道输入至每个细胞接口中,每个条件通道可以传输一个张量化特征条件C,则上述生成器的输入数据可以表示为{Z,C 1,C 2…C n}。进一步可得到第二修复数据,此处关于LSTM网络的相关描述可以参见步骤204中对应的描述,此处不赘述。在得到第二修复数据的情况下,可以将第二修复数据F及对应的第一样本标签输入至判别器中进行判别处理,可以得到第一判别结果;并且将完成归一化处理的真实样本数据R及与该真实样本数据R对应的经过张量化处理的第二样本标签输入至判别器中,该第二样本标签包括多个张量化特征条件C,此时判别器的输入数据可以表示为{(F or R),C 1,C 2…C n},输出第二判别结果。其中,判别器的内置网络为LSTM网络,该判别器与生成器相同,均包含有条件通道及数据通道。可选的,在生成器及判别器中均可以配置有状态转换向量,该状态转换向量可以控制上述条件通道的开通或关闭,从而调整训练所需的特征条件。
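A matching sketch of the discriminator side, which receives either generated data F or real data R together with the same condition channels and outputs a single validity score per sequence. Again, the layer sizes and names are illustrative assumptions rather than the patented network.

```python
import torch
import torch.nn as nn

class ConditionalLSTMDiscriminator(nn.Module):
    """Sketch of an LSTM discriminator over {(F or R), C_1, ..., C_n}."""
    def __init__(self, cond_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1 + cond_dim, hidden_size=hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, seq: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # seq: (batch, seq_len, 1) generated or real data; cond: (batch, cond_dim)
        cond_per_step = cond.unsqueeze(1).expand(-1, seq.size(1), -1)
        x = torch.cat([seq, cond_per_step], dim=-1)
        _, (h_n, _) = self.lstm(x)                 # use the final hidden state
        return torch.sigmoid(self.out(h_n[-1]))    # score: real and condition-consistent, or not

disc = ConditionalLSTMDiscriminator(cond_dim=3)
print(disc(torch.rand(2, 24, 1), torch.rand(2, 3)).shape)   # torch.Size([2, 1])
```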
进一步的,上述第一样本条件包括n个特征条件,n为正整数,则在获取样本数据及第一样本条件之前,客户端101还可以发送条件指令至服务器102,该条件指令用于指示从n个特征条件中获取x个特征条件,x为小于或等于n的非负整数。则服务器102可以根据条件指令从包括n个特征条件的第一样本条件信息中获取x个特征条件。具体的,可以包括如下三种情况:当客户端101不发送该条件指令时,则生成器及判别器的条件通道为全部开启状态;若接收到客户端101发送的指令,当条件指令指示关闭部分条件通道时,则生成器及判别器关闭指定的条件通道;当条件指令指示关闭全部,则生成器及判别器根据关闭全部的条件通道,此时生成修复数据的过程不需要考虑特征条件。可选的,该条件指令可以包括状态转换向量,该状态转换向量可以嵌入到条件通道中,以便控制各个条件通道的开关。则在增加状态转换向量后的生成器LSTM网络的输入 数据G′可以表示为:
G′ = {Z, S_1·C_1, S_2·C_2, …, S_n·C_n}
其中,S=1代表通路,即条件通道的状态为开启;S=0代表闭路,即条件通道的状态为关闭。进一步的,在增加了状态转换向量后的判别器LSTM网络的输入数据D′可以表示为:
D′ = {(F or R), S_1·C_1, S_2·C_2, …, S_n·C_n}
通过该实施方式,可以提高网络的应用范围,对于不同的特征条件情况,该网络结构可以作出适应性的调整,以获取不同特征条件下,生成的修复数据。
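The state transition vector S can be applied as a simple element-wise mask on the stacked condition channels, as the following minimal sketch shows (the tensor shapes are assumed for illustration):

```python
import torch

conditions = torch.rand(2, 3)                 # batch of 2 samples, 3 condition channels C_1..C_3
switch = torch.tensor([1.0, 0.0, 1.0])        # S: keep channels 1 and 3, close channel 2
masked = conditions * switch                  # element-wise S_i * C_i, as in G' and D'
print(masked)
```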
进一步的,模型函数可以包括生成损失函数、判别损失函数及目标函数,则确定模型函数的过程可以为:根据第一样本标签对第一处理数据进行修复处理,得到第二修复数据,并对第二修复数据及第一样本标签进行判别处理,得到第一判别结果;对第二处理数据及第二样本标签进行判别处理,得到第二判别结果;根据第一判别结果及第二判别结果,确定判别损失函数;根据第一判别结果,确定生成损失函数;对判别损失函数及生成损失函数进行优化,确定目标函数。
具体的,生成器根据第一样本标签Y p对第一处理数据进行修复处理,得到携带有第一样本标签Y p的第二修复数据G(Z|Y p),其中,第一处理数据为完成归一化处理的样本数据Z。在得到携带有第一样本标签Y p的第二修复数据G(Z|Y p)的情况下,将携带有第一样本标签Y p的第二修复数据G(Z|Y p)输入至判别器中,以使判别器对第二修复数据进行判别,得到第一判别结果D(G(Z|Y p))。判别器一方面需要判断生成的数据是否满足真实的样本分布,另一方面,也需要判断生成的数据是否满足相应的特征条件。若判别结果为是,则说明生成的第二修复数据为满足真实样本特征的数据;若判断结果为否,则需要网络参数,继续迭代训练生成满足真实样本特征的修复数据,使判别器的输出尽可能全为真。其中判别器的诊断网络J可以表示为:
J = J_{real-sample-distribution}(D′) & J_{condition-1}(D′) & J_{condition-2}(D′) & … & J_{condition-n}(D′)
其中，J_{real-sample-distribution}(D′) 表示对生成的数据是否满足真实的样本分布的判断结果，J_{condition-n}(D′) 表示对生成的数据是否满足相应的特征条件的判断结果，D′ 表示判别器输出的判别结果，D′ = {d_1, d_2, …, d_n}。
基于上述判别器的诊断网络,进一步的,可以对第二处理数据及第二样本 标签Y p进行判别处理,得到第二判别结果D(X|Y p),其中,第二处理数据为完成归一化处理的真实样本数据X。在获取到上述第一判别结果及第二判别结果的情况下,可以根据第一判别结果及第二判别结果,确定判别损失函数,该判别损失函数为判别器的损失函数,该判别器的损失函数可以表示为:
L_D = −E_{x∼P_{cond}}[log D(X|Y_p)] − E_{z∼p_z(z)}[log(1 − D(G(Z|Y_p)|Y_p))]
可以根据第一判别结果,确定生成损失函数,该生成损失函数为生成器的损失函数,该生成器的损失函数可以表示为:
L_G = E_{z∼p_z(z)}[log(1 − D(G(Z|Y_p)|Y_p))]
则在获取上述判别损失函数及生成损失函数的情况下,对判别损失函数及生成损失函数进行优化,确定目标函数。其中,对于判别器的优化目标是通过优化使得目标函数可以取得最大值。本实施方式中判别器的损失函数为负数形式,则以优化求得判别损失函数的最小值为目标。对于生成器的优化目标是通过优化使得目标函数取得最小值。则目标函数可以表示为:
min_G max_D V(D, G) = E_{x∼P_{cond}}[log D(x|y_{cond})] + E_{z∼p_z(z)}[log(1 − D(G(z|y_{cond})|y_{cond}))]
其中，P_{cond} 为样本条件所包括的各个特征条件的联合分布概率，y_1 * … * y_n 为 n 个特征条件 y 所组成的条件概率空间 y_{cond}，则服从条件概率空间的联合分布概率为：
P_{cond}: y_1 * y_2 * … * y_n
若输入数据为真实样本数据，则联合分布概率 P_{cond}、噪声空间 p_z(z) 及条件概率空间 y_{cond} 均为定量。
308、服务器102根据模型函数优化网络参数,构建数据修复模型。
具体的,服务器102在确定上述生成损失函数、判别损失函数及目标函数的过程中,经过反复迭代训练,根据损失函数,优化生成器网络及判别器网络的网络参数,根据优化完成的网络参数,构建数据修复模型。其中,优化的过程可以为:根据判别损失函数的结果,使用自适应时刻估计(Adaptive Moment Estimation,Adam)优化算法对判别器进行优化,在对判别器优化完成的情况 下,根据优化完成优化的判别器对生成器进行优化,根据生成损失函数的结果,使用Adam算法对生成器进行优化,通过生成器及判别器的不断迭代对抗训练,使得损失函数收敛,此处损失函数收敛的过程及目标可以参见步骤307,此处不赘述。进一步的,在损失函数完成收敛及对网络参数的优化过程后,根据优化完成的网络参数,构建数据修复模型。
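The alternating training of the discriminator and generator with the Adam optimizer can be sketched as below. To keep the example short and self-contained, small MLP stand-ins replace the LSTM generator and discriminator of the embodiment, and binary cross-entropy is used for both losses; the batch size, learning rate and iteration count are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# Stand-in generator / discriminator (any conditional sequence models could be used here).
cond_dim, seq_len, hidden = 3, 24, 32
gen = nn.Sequential(nn.Linear(seq_len + cond_dim, hidden), nn.ReLU(), nn.Linear(hidden, seq_len), nn.Sigmoid())
disc = nn.Sequential(nn.Linear(seq_len + cond_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1), nn.Sigmoid())

opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_data = torch.rand(64, seq_len)            # normalized real sample sequences (second processed data)
conds = torch.rand(64, cond_dim)               # tensorized sample condition labels

for step in range(200):
    # --- discriminator update: real samples labelled 1, generated samples labelled 0 ---
    noise = torch.rand(64, seq_len)            # sample data drawn from the noise space
    fake = gen(torch.cat([noise, conds], dim=1)).detach()
    d_real = disc(torch.cat([real_data, conds], dim=1))
    d_fake = disc(torch.cat([fake, conds], dim=1))
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- generator update: try to make the discriminator output 1 on generated data ---
    fake = gen(torch.cat([noise, conds], dim=1))
    d_fake = disc(torch.cat([fake, conds], dim=1))
    loss_g = bce(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```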
可选的,在优化网络参数之前,还可以对第二修复数据及第二处理数据进行平均余弦相似度计算,得到相似度结果。根据相似度结果,优化数据修复模型的网络参数。其中,第二处理数据为与第二修复数据相匹配的真实数据。具体的,生成的第二修复数据的数据序列可以表示为:
F^{(i)} = {f^{(i)}_{t_1}, f^{(i)}_{t_2}, …, f^{(i)}_{t_l}}，i = 1, 2, …, m
其中，m 为生成的数据序列的迭代次数，l 为数据序列的长度，
第二处理数据的数据序列可以表示为：
R = {r_{t_1}, r_{t_2}, …, r_{t_k}}
其中，k 为第二处理数据的长度，该第二处理数据为与第二修复数据相对应的原始真实样本数据。则对第二处理数据及第二修复数据进行平均余弦相似度计算，该相似度的计算方法可以表示为：
sim(F, R) = (1/m) · Σ_{i=1}^{m} (F^{(i)} · R) / (‖F^{(i)}‖ · ‖R‖)
进一步的,可以在每轮迭代训练时,计算本次训练的平均余弦相似度,根据平均余弦相似度结果,对网络参数进行优化,以使生成器可以生成更接近真实样本分布的修复数据。
举例来说,可以根据不同的条件通道的开关模式,进行基于平均余弦相似度的网络参数优化过程。本实施方式以三个特征条件为例,则四种不同的条件通道的开关模式可以为:全闭合(无特征条件)、部分闭合(一个特征条件及两个特征条件)及全开(三个特征条件),此处对条件通道的开关模式调控可以参见步骤307中的相关描述,此处不赘述。则在不同条件通道开关模式下,余弦相似度的结果随着迭代训练的变化情况示意图可以参见图5,如图5所示,在条件通道全开状态下,即考虑全部特征条件时,生成的修复数据与真实样本数据的相似度最高,生成的修复数据更接近真实样本的分布。且随着对网络参数的优化,生成的修复数据越接近真实样本数据。
通过执行本实施方式,可以更直观的展示修复数据的生成质量及对网络的训练情况。多条件信息的引入有助于学习到样本的丰富特征,以使数据修复模型可以生成更接近真实分布的修复数据,提高了生成数据的质量及效率。
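The average cosine similarity used above to track how close the generated sequences are to the real sample can be computed directly; a minimal numpy sketch, assuming the generated and real sequences have equal length:

```python
import numpy as np

def average_cosine_similarity(generated, real) -> float:
    """Mean cosine similarity between each generated sequence and the real reference sequence.
    `generated` has shape (m, l): m training iterations, sequences of length l."""
    real = np.asarray(real, dtype=float)
    sims = [
        float(np.dot(g, real) / (np.linalg.norm(g) * np.linalg.norm(real)))
        for g in np.asarray(generated, dtype=float)
    ]
    return float(np.mean(sims))

# Example: two generated sequences compared against one real sequence
print(average_cosine_similarity([[0.1, 0.8, 0.3], [0.2, 0.7, 0.4]], [0.2, 0.9, 0.35]))
```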
可见,通过实施图3所描述的方法,服务器102在获取到样本数据及对应的第一样本条件的情况下,对样本数据经过归一化处理,得到第一处理数据,对第一样本条件进行张量化处理,得到第一样本标签。并且服务器102在获取到真实样本数据及第二样本条件的情况下,对真实样本数据进行归一化处理,对第二样本条件进行张量化处理,得到第二处理数据及第二样本标签。则可以根据第一处理数据、第一样本标签、第二处理数据及第二样本标签对数据修复模型进行监督训练,确定包括生成损失函数、判别损失函数及目标函数的模型函数,并根据模型函数优化网络参数,根据优化完成的网络参数,构建数据修复模型。通过执行本实施方式,根据已知的时间序列样本数据进行监督训练,即可得到数据修复模型,无需大量的历史数据或手动获取匹配度高的样本数据作为训练基础,解决了实验成本高,样本获取困难的问题。通过引入与数据相对应的特征条件,可以学习到样本的丰富特征,使得生成的修复数据可以更接近真实的样本分布,采用LSTM作为生成器及判别器的内置网络,保证了生成的修复数据的时序性,提高了数据修复过程的效率及质量。
请参见图6,图6是本申请实施例提供的一种基于多条件约束的时间序列数据生成方法的流程示意图,如图6所示,该时间序列数据生成方法可以包括601~606部分,其中:
601、服务器102获取验证数据,并对验证数据进行归一化处理,得到第三处理数据。
具体的,服务器102可以从客户端101或其他数据平台获取验证数据,该验证数据可以理解为一种样本数据,可选的,样本数据可以包括训练数据、验证数据及测试数据。则对验证数据进行归一化处理得到第三处理数据的相关过程可以参见步骤302中对样本数据进行归一化处理的相关描述,此处不赘述。
602、服务器102获取验证条件,并对验证条件进行张量化处理,得到验证标签。
具体的,服务器102获取与步骤401中的验证数据相匹配的验证条件,并对该验证条件进行张量化处理,得到验证标签。此处关于验证条件及验证标签获 取方法的相关描述可以参见步骤601及步骤303中关于第一样本条件及其获取方法的相关描述,此处不赘述。
603、服务器102调用已完成训练的数据修复模型,根据验证标签对第三处理数据进行修复处理,得到第三修复数据。
具体的,服务器102调用已完成训练的数据修复模型,根据验证标签对第三处理数据进行修复处理,得到第三修复数据的过程,可以参见步骤204中第一修复数据的生成过程,此处不赘述。
604、服务器102获取真实验证数据,并对真实验证数据进行归一化处理,得到真实验证数据的第四处理数据。
具体的,服务器102获取真实验证数据,并对真实验证数据进行归一化处理的相关描述,可以参见步骤304及步骤305中关于真实样本数据的获取及归一化处理方法的相关描述,此处不赘述。该第四处理数据为与第三修复数据相匹配的真实数据。
605、服务器102对第三修复数据及第四处理数据进行残差分析,得到残差分析结果。
具体的,服务器102对第三修复数据及第四处理数据进行残差分析,得到残差分析结果。该残差分析结果可以为残差分析结果图,通过该残差分析结果图可以更直观的展示第三修复数据的生成质量。
举例来说,可以基于不同条件通道的开关模式,进行残差分析过程。本实施方式以三个特征条件为例,则三种不同的条件通道的开关模式可以为:全闭合(无特征条件)、单通道开启(单特征条件)及多通道开启(多特征条件),此处对条件通道的开关模式调控可以参见步骤307中的相关描述,此处不赘述。则在不同条件通道开关模式下,生成的第三修复数据与真实的第四处理数据对比示意图可以参见图7(a)、图7(b)及图7(c),如图7(a)、图7(b)及图7(c)所示,引入的特征条件越多,生成的第三修复数据接近第四处理处理数据,即生成的修复数据越接近真实样本的分布。进一步的,对图7(a)、图7(b)及图7(c)中相应的第三修复数据与真实的第四处理数据分别进行残差分析,残差分析结果示意图可以分别参见图8(a)、图8(b)及图8(c),其中,如图8(a)中区域1所显示的深灰色的部分,为残差分析不接受的部分,即得到的第三修复数据偏离真实的第四处理数据较大。如图8(a)、图8(b)及图8(c)所示,多特征的引入使得生成的第三修复数据接近第四处理处理数据,即生成的修复数据越接近真实样本的分布。所 以,基于多条件约束的MCGAN在修复时间序列数据时,可以得到更准确、更接近真实分布情况的修复数据,提高了数据修复的效率及质量。
606、服务器102发送残差分析结果至客户端101。
具体的,服务器102将该残差分析结果发送至客户端101,相应的,客户端101接收该残差分析结果,以使客户端101将该残差分析结果展示给客户端101的操作用户103,操作用户103可以根据该残差分析结果直观的评估生成的修复数据的质量及该数据修复模型的训练情况。
可见,通过实施图6所描述的方法,服务器102在获取到验证数据及验证条件的情况下,对验证数据进行归一化处理,得到第三处理数据,并对验证条件进行张量化处理,得到验证标签。调用已完成训练的数据修复模型,根据验证标签对第三处理数据进行修复处理,得到第三修复数据,根据获取到的真实的第四处理数据对第三修复数据进行残差分析,得到残差分析结果,并将该残差分析结果发送给客户端。通过执行本实施方式,可以更直观、准确的评估生成的修复数据的质量及该数据修复模型的训练情况,并且,还可以确定在多特征条件约束下生成的修复数据可以更接近真实样本的分布情况,多特征条件的引入可以获取到修复数据更丰富的特征,提高了数据修复的效率及质量。
基于上述方法实施例的描述,本申请实施例还提出一种基于多条件约束的时间序列数据生成装置。该基于多条件约束的时间序列数据生成装置可以是运行于处理设备中的计算机程序(包括程序代码);请参见图9所示,该图像可视化处理装置可以运行如下单元:
收发单元901,用于接收客户端的数据修复请求,所述数据修复请求包括待修复数据及条件信息,所述数据修复请求用于请求根据所述条件信息对所述待修复数据进行数据修复,所述条件信息为与所述待修复数据相匹配的特征条件;
处理单元902,用于对所述待修复数据进行归一化处理,得到所述待修复数据的归一化数据,并对所述条件信息进行张量化处理,得到所述条件信息的特征标签;调用已完成训练的数据修复模型,根据所述特征标签对所述归一化数据进行修复处理,得到第一修复数据,所述数据修复模型是根据样本数据、第一样本条件、真实样本数据及第二样本条件对所述数据修复模型进行训练得到的,所述样本数据为噪声数据;
所述收发单元901,还用于发送所述第一修复数据至所述客户端。
在一种实施方式中,所述待修复数据包括时间点序列;
所述根据所述特征标签对所述归一化数据进行修复处理,得到第一修复数据,处理单元901,还可用于根据所述时间点序列,对所述归一化数据中各个数据进行排序,所述时间点序列为所述待修复数据中各个数据的生成时间点所组成的序列,所述归一化数据中各个数据为所述待修复数据中各个数据经过归一化处理后得到的;
根据所述特征标签,对所述已完成排序的所述归一化数据进行数据修复处理,得到第一修复数据。
再一种实施方式中,所述调用已完成训练的数据修复模型之前,处理单元901,还可用于获取所述样本数据及所述第一样本条件,并对所述样本数据进行归一化处理,得到所述样本数据的第一处理数据,对所述第一样本条件进行张量化处理,得到第一样本标签;
获取所述真实样本数据及所述第二样本条件,并对所述真实样本数据进行归一化处理,得到所述真实样本数据的第二处理数据,对所述第二样本条件进行张量化处理,得到第二样本标签;
根据所述第一处理数据、所述第一样本标签、所述第二处理数据及所述第二样本标签对所述数据修复模型进行监督训练,确定模型函数;
根据所述模型函数优化网络参数,构建所述数据修复模型。
再一种实施方式中,所述第一样本条件包括n个特征条件,n为正整数;
所述获取所述样本数据及所述第一样本条件之前,处理单元901,还可用于接收所述客户端发送的条件指令,所述条件指令用于指示从n个所述特征条件中获取x个所述特征条件,x为小于或等于n的非负整数;
根据所述条件指令从所述包括n个特征条件的第一样本条件信息中获取x个所述特征条件。
再一种实施方式中,所述模型函数包括生成损失函数、判别损失函数及目标函数;
所述根据所述第一处理数据、所述第一样本标签、所述第二处理数据及所述第二样本标签对所述数据修复模型进行监督训练,确定模型函数,处理单元901,还可用于根据所述第一样本标签对所述第一处理数据进行修复处理,得到第二修复数据,并对所述第二修复数据及所述第一样本标签进行判别处理,得到第一判别结果;
对所述第二处理数据及所述第二样本标签进行判别处理,得到第二判别结果;
根据所述第一判别结果及所述第二判别结果,确定所述判别损失函数,所述判别损失函数为判别器的损失函数;
根据所述第一判别结果,确定所述生成损失函数,所述生成损失函数为生成器的损失函数;
对所述判别损失函数及所述生成损失函数进行优化,确定所述目标函数。
再一种实施方式中,处理单元901,还可用于对所述第二修复数据及所述第二处理数据进行平均余弦相似度计算,得到相似度结果,所述第二处理数据为与所述第二修复数据相匹配的真实数据;
根据所述相似度结果,优化所述数据修复模型的网络参数。
再一种实施方式中,所述样本数据包括验证数据,处理单元901,还可用于获取所述验证数据,并对所述验证数据进行归一化处理,得到所述验证数据的第三处理数据;
获取验证条件,并对所述验证条件进行张量化处理,得到所述验证条件的验证标签;
调用已完成训练的所述数据修复模型,根据所述验证标签对所述第三处理数据进行修复处理,得到第三修复数据;
获取真实验证数据,并对所述真实验证数据进行归一化处理,得到所述真实验证数据的第四处理数据,所述第四处理数据为与所述第三修复数据相匹配的真实数据;
对所述第三修复数据及所述第四处理数据进行残差分析,得到残差分析结果。
收发单元901,还可用于将所述残差分析结果发送至所述客户端。
根据本申请的一个实施例,图2、图3及图6所示的基于多条件约束的时间序列数据生成方法所涉及的部分步骤可由基于多条件约束的时间序列数据生成装置中的处理单元来执行。例如,图2中所示的步骤201和205可由收发单元901执行;又如,图2所示的步骤203可由处理单元902执行。根据本申请的另一个实施例,基于多条件约束的时间序列数据生成装置中的各个单元可以分别或全部合并为一个或若干个另外的单元来构成,或者其中的某个(些)单元还可以再拆分为功能上更小的多个单元来构成,这可以实现同样的操作,而不影响本申请 的实施例的技术效果的实现。
请参见图10,是本申请实施例提供的一种基于多条件约束的时间序列数据生成装置的结构示意图,该数据生成装置包括处理器1001、存储器1002及通信接口1003,处理器1001、存储器1002及通信接口1003通过至少一条通信总线连接,处理器1001被配置为支持处理设备执行图2、图3及图6方法中处理设备相应的功能。
存储器1002用于存放有适于被处理器加载并执行的至少一条指令,这些指令可以是一个或一个以上的计算机程序(包括程序代码)。
通信接口1003用于接收数据和用于发送数据。例如,通信接口1003用于发送数据修复请求等。
在本申请实施例中,该处理器1001可以调用存储器1002中存储的程序代码以执行以下操作:
通过通信接口1003接收客户端的数据修复请求,所述数据修复请求包括待修复数据及条件信息,所述数据修复请求用于请求根据所述条件信息对所述待修复数据进行数据修复,所述条件信息为与所述待修复数据相匹配的特征条件;
对所述待修复数据进行归一化处理,得到所述待修复数据的归一化数据,并对所述条件信息进行张量化处理,得到所述条件信息的特征标签;
调用已完成训练的数据修复模型,根据所述特征标签对所述归一化数据进行修复处理,得到第一修复数据,所述数据修复模型是根据样本数据、第一样本条件、真实样本数据及第二样本条件对所述数据修复模型进行训练得到的,所述样本数据为噪声数据;
通过通信接口1003发送所述第一修复数据至所述客户端。
作为一种可选的实施方式,所述待修复数据包括时间点序列;
所述根据所述特征标签对所述归一化数据进行修复处理,得到第一修复数据,该处理器1001可以调用存储器1002中存储的程序代码以执行以下操作:
根据所述时间点序列,对所述归一化数据中各个数据进行排序,所述时间点序列为所述待修复数据中各个数据的生成时间点所组成的序列,所述归一化数据中各个数据为所述待修复数据中各个数据经过归一化处理后得到的;
根据所述特征标签,对所述已完成排序的所述归一化数据进行数据修复处理,得到第一修复数据。
作为一种可选的实施方式,所述调用已完成训练的数据修复模型之前,该 处理器1001可以调用存储器1002中存储的程序代码以执行以下操作:
获取所述样本数据及所述第一样本条件,并对所述样本数据进行归一化处理,得到所述样本数据的第一处理数据,对所述第一样本条件进行张量化处理,得到第一样本标签;
获取所述真实样本数据及所述第二样本条件,并对所述真实样本数据进行归一化处理,得到所述真实样本数据的第二处理数据,对所述第二样本条件进行张量化处理,得到第二样本标签;
根据所述第一处理数据、所述第一样本标签、所述第二处理数据及所述第二样本标签对所述数据修复模型进行监督训练,确定模型函数;
根据所述模型函数优化网络参数,构建所述数据修复模型。
作为一种可选的实施方式,所述第一样本条件包括n个特征条件,n为正整数;
所述获取所述样本数据及所述第一样本条件之前,该处理器1001可以调用存储器1002中存储的程序代码以执行以下操作:
接收所述客户端发送的条件指令,所述条件指令用于指示从n个所述特征条件中获取x个所述特征条件,x为小于或等于n的非负整数;
根据所述条件指令从所述包括n个特征条件的第一样本条件信息中获取x个所述特征条件。
作为一种可选的实施方式,所述模型函数包括生成损失函数、判别损失函数及目标函数;
所述根据所述第一处理数据、所述第一样本标签、所述第二处理数据及所述第二样本标签对所述数据修复模型进行监督训练,确定模型函数,该处理器1001可以调用存储器1002中存储的程序代码以执行以下操作:
根据所述第一样本标签对所述第一处理数据进行修复处理,得到第二修复数据,并对所述第二修复数据及所述第一样本标签进行判别处理,得到第一判别结果;
对所述第二处理数据及所述第二样本标签进行判别处理,得到第二判别结果;
根据所述第一判别结果及所述第二判别结果,确定所述判别损失函数,所述判别损失函数为判别器的损失函数;
根据所述第一判别结果,确定所述生成损失函数,所述生成损失函数为生 成器的损失函数;
对所述判别损失函数及所述生成损失函数进行优化,确定所述目标函数。
作为一种可选的实施方式,该处理器1001可以调用存储器1002中存储的程序代码以执行以下操作:
对所述第二修复数据及所述第二处理数据进行平均余弦相似度计算,得到相似度结果,所述第二处理数据为与所述第二修复数据相匹配的真实数据;
根据所述相似度结果,优化所述数据修复模型的网络参数。
作为一种可选的实施方式,所述样本数据包括验证数据,该处理器1001可以调用存储器1002中存储的程序代码以执行以下操作:
获取所述验证数据,并对所述验证数据进行归一化处理,得到所述验证数据的第三处理数据;
获取验证条件,并对所述验证条件进行张量化处理,得到所述验证条件的验证标签;
调用已完成训练的所述数据修复模型,根据所述验证标签对所述第三处理数据进行修复处理,得到第三修复数据;
获取真实验证数据,并对所述真实验证数据进行归一化处理,得到所述真实验证数据的第四处理数据,所述第四处理数据为与所述第三修复数据相匹配的真实数据;
对所述第三修复数据及所述第四处理数据进行残差分析,得到残差分析结果,并通过通信接口1003将所述残差分析结果发送至所述客户端。
本申请实施例还提供了一种计算机可读存储介质(Memory),可以用于存储图2及图3中所示实施例中处理设备所用的计算机软件指令,在该存储空间中还存放了适于被处理器加载并执行的至少一条指令,这些指令可以是一个或一个以上的计算机程序(包括程序代码)。
上述计算机可读存储介质包括但不限于快闪存储器、硬盘、固态硬盘。
本领域普通技术人员可以意识到,结合本申请中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机程序指令时,全部或部分地产生按照本申请实施例的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中,或者通过计算机可读存储介质进行传输。计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。
以上所述的具体实施方式,对本申请的目的、技术方案和有益效果进行了进一步详细说明,所应理解的是,以上所述仅为本申请的具体实施方式而已,并不用于限定本申请的保护范围,凡在本申请的技术方案的基础之上,所做的任何修改、等同替换、改进等,均应包括在本申请的保护范围之内。

Claims (10)

  1. 一种基于多条件约束的时间序列数据生成方法,其特征在于,所述方法包括:接收客户端的数据修复请求,所述数据修复请求包括待修复数据及条件信息,所述数据修复请求用于请求根据所述条件信息对所述待修复数据进行数据修复,所述条件信息为与所述待修复数据相匹配的特征条件;
    对所述待修复数据进行归一化处理,得到所述待修复数据的归一化数据,并对所述条件信息进行张量化处理,得到所述条件信息的特征标签;
    调用已完成训练的数据修复模型,根据所述特征标签对所述归一化数据进行修复处理,得到第一修复数据,所述数据修复模型是根据样本数据、第一样本条件、真实样本数据及第二样本条件对所述数据修复模型进行训练得到的,所述样本数据为噪声数据;
    发送所述第一修复数据至所述客户端。
  2. 根据权利要求1所述的方法,其特征在于,所述待修复数据包括时间点序列;所述根据所述特征标签对所述归一化数据进行修复处理,得到第一修复数据,包括:
    根据所述时间点序列,对所述归一化数据中各个数据进行排序,所述时间点序列为所述待修复数据中各个数据的生成时间点所组成的序列,所述归一化数据中各个数据为所述待修复数据中各个数据经过归一化处理后得到的;
    根据所述特征标签,对所述已完成排序的所述归一化数据进行数据修复处理,得到第一修复数据。
  3. 根据权利要求1所述的方法,其特征在于,所述调用已完成训练的数据修复模型之前,所述方法还包括:
    获取所述样本数据及所述第一样本条件,并对所述样本数据进行归一化处理,得到所述样本数据的第一处理数据,对所述第一样本条件进行张量化处理,得到第一样本标签;
    获取所述真实样本数据及所述第二样本条件,并对所述真实样本数据进行归一化处理,得到所述真实样本数据的第二处理数据,对所述第二样本条件进行张量化处理,得到第二样本标签;
    根据所述第一处理数据、所述第一样本标签、所述第二处理数据及所述第二样本标签对所述数据修复模型进行监督训练,确定模型函数;
    根据所述模型函数优化网络参数,构建所述数据修复模型。
  4. 根据权利要求3所述的方法,其特征在于,所述第一样本条件包括n个特征条件,n为正整数;
    所述获取所述样本数据及所述第一样本条件之前,所述方法还包括:
    接收所述客户端发送的条件指令,所述条件指令用于指示从n个所述特征条件中获取x个所述特征条件,x为小于或等于n的非负整数;
    根据所述条件指令从所述包括n个特征条件的第一样本条件信息中获取x个所述特征条件。
  5. 根据权利要求3所述的方法,其特征在于,所述模型函数包括生成损失函数、判别损失函数及目标函数;
    所述根据所述第一处理数据、所述第一样本标签、所述第二处理数据及所述第二样本标签对所述数据修复模型进行监督训练,确定模型函数,包括:
    根据所述第一样本标签对所述第一处理数据进行修复处理,得到第二修复数据,并对所述第二修复数据及所述第一样本标签进行判别处理,得到第一判别结果;对所述第二处理数据及所述第二样本标签进行判别处理,得到第二判别结果;根据所述第一判别结果及所述第二判别结果,确定所述判别损失函数,所述判别损失函数为判别器的损失函数;
    根据所述第一判别结果,确定所述生成损失函数,所述生成损失函数为生成器的损失函数;
    对所述判别损失函数及所述生成损失函数进行优化,确定所述目标函数。
  6. 根据权利要求5所述的方法,其特征在于,所述方法还包括:
    对所述第二修复数据及所述第二处理数据进行平均余弦相似度计算,得到相似度结果,所述第二处理数据为与所述第二修复数据相匹配的真实数据;
    根据所述相似度结果,优化所述数据修复模型的网络参数。
  7. 根据权利要求3所述的方法,其特征在于,所述样本数据包括验证数据,所述方法还包括:
    获取所述验证数据,并对所述验证数据进行归一化处理,得到所述验证数据的第三处理数据;
    获取验证条件,并对所述验证条件进行张量化处理,得到所述验证条件的验证标签;
    调用已完成训练的所述数据修复模型,根据所述验证标签对所述第三处理数据进行修复处理,得到第三修复数据;
    获取真实验证数据,并对所述真实验证数据进行归一化处理,得到所述真实验证数据的第四处理数据,所述第四处理数据为与所述第三修复数据相匹配的真实数据;
    对所述第三修复数据及所述第四处理数据进行残差分析,得到残差分析结果,并将所述残差分析结果发送至所述客户端。
  8. 一种基于多条件约束的时间序列数据生成装置,其特征在于,包括:
    收发单元,用于接收客户端的数据修复请求,所述数据修复请求包括待修复数据及条件信息,所述数据修复请求用于请求根据所述条件信息对所述待修复数据进行数据修复,所述条件信息为与所述待修复数据相匹配的特征条件;
    处理单元,用于对所述待修复数据进行归一化处理,得到所述待修复数据的归一化数据,并对所述条件信息进行张量化处理,得到所述条件信息的特征标签;调用已完成训练的数据修复模型,根据所述特征标签对所述归一化数据进行修复处理,得到第一修复数据,所述数据修复模型是根据样本数据、第一样本条件、真实样本数据及第二样本条件对所述数据修复模型进行训练得到的,所述样本数据为噪声数据;
    所述收发单元,还用于发送所述第一修复数据至所述客户端。
  9. 一种基于多条件约束的时间序列数据生成装置,其特征在于,包括处理器、存储器和通信接口,所述处理器、所述存储器和所述通信接口相互连接,其中,所述存储器用于存储计算机程序,所述计算机程序包括程序指令,所述处理器被配置用于调用所述程序指令,执行如权利要求1-7中任一项所述的方法。
  10. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有一条或多条指令,所述一条或多条指令适于由处理器加载并执行如权利要求1-7任一项所述的方法。
PCT/CN2020/081440 2020-03-26 2020-03-26 基于多条件约束的时间序列数据生成方法、装置及介质 WO2021189362A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
AU2020438008A AU2020438008B2 (en) 2020-03-26 2020-03-26 Time series data generation method and device based on multi-condition constraints, and medium
PCT/CN2020/081440 WO2021189362A1 (zh) 2020-03-26 2020-03-26 基于多条件约束的时间序列数据生成方法、装置及介质
US17/618,758 US11797372B2 (en) 2020-03-26 2020-03-26 Method and apparatus for generating time series data based on multi-condition constraints, and medium
GB2117945.2A GB2606792A (en) 2020-03-26 2020-03-26 Time series data generation method and device based on multi-condition constraints, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/081440 WO2021189362A1 (zh) 2020-03-26 2020-03-26 基于多条件约束的时间序列数据生成方法、装置及介质

Publications (1)

Publication Number Publication Date
WO2021189362A1 true WO2021189362A1 (zh) 2021-09-30

Family

ID=77890910

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/081440 WO2021189362A1 (zh) 2020-03-26 2020-03-26 基于多条件约束的时间序列数据生成方法、装置及介质

Country Status (4)

Country Link
US (1) US11797372B2 (zh)
AU (1) AU2020438008B2 (zh)
GB (1) GB2606792A (zh)
WO (1) WO2021189362A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114595214A (zh) * 2022-03-03 2022-06-07 江苏鼎驰电子科技有限公司 一种大数据治理系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577694A (zh) * 2013-11-07 2014-02-12 广东海洋大学 一种基于多尺度分析的水产养殖水质短期组合预测方法
CN108009632A (zh) * 2017-12-14 2018-05-08 清华大学 对抗式时空大数据预测方法
US20180365521A1 (en) * 2016-02-25 2018-12-20 Alibaba Group Holding Limited Method and system for training model by using training data
CN109670580A (zh) * 2018-12-21 2019-04-23 浙江工业大学 一种基于时间序列的数据修复方法

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105652300A (zh) * 2015-12-23 2016-06-08 清华大学 一种基于速度约束的全球卫星定位系统数据的修正方法
CN107767408B (zh) 2017-11-09 2021-03-12 京东方科技集团股份有限公司 图像处理方法、处理装置和处理设备
CN109840530A (zh) 2017-11-24 2019-06-04 华为技术有限公司 训练多标签分类模型的方法和装置
CN110223509B (zh) 2019-04-19 2021-12-28 中山大学 一种基于贝叶斯增强张量的缺失交通数据修复方法
CN110580328B (zh) * 2019-09-11 2022-12-13 江苏省地质工程勘察院 一种地下水位监测值缺失的修复方法
CN110825579B (zh) 2019-09-18 2022-03-08 平安科技(深圳)有限公司 服务器性能监控方法、装置、计算机设备及存储介质
US12019506B2 (en) * 2019-09-24 2024-06-25 Micron Technology, Inc. Imprint recovery management for memory systems

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577694A (zh) * 2013-11-07 2014-02-12 广东海洋大学 一种基于多尺度分析的水产养殖水质短期组合预测方法
US20180365521A1 (en) * 2016-02-25 2018-12-20 Alibaba Group Holding Limited Method and system for training model by using training data
CN108009632A (zh) * 2017-12-14 2018-05-08 清华大学 对抗式时空大数据预测方法
CN109670580A (zh) * 2018-12-21 2019-04-23 浙江工业大学 一种基于时间序列的数据修复方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114595214A (zh) * 2022-03-03 2022-06-07 江苏鼎驰电子科技有限公司 一种大数据治理系统
CN114595214B (zh) * 2022-03-03 2023-05-02 江苏鼎驰电子科技有限公司 一种大数据治理系统

Also Published As

Publication number Publication date
AU2020438008B2 (en) 2023-02-02
GB2606792A (en) 2022-11-23
US20220253351A1 (en) 2022-08-11
GB202117945D0 (en) 2022-01-26
US11797372B2 (en) 2023-10-24
AU2020438008A1 (en) 2022-01-20

Similar Documents

Publication Publication Date Title
WO2020087974A1 (zh) 生成模型的方法和装置
CN110163261B (zh) 不平衡数据分类模型训练方法、装置、设备及存储介质
CN111444952B (zh) 样本识别模型的生成方法、装置、计算机设备和存储介质
CN109726763B (zh) 一种信息资产识别方法、装置、设备及介质
CN109376267B (zh) 用于生成模型的方法和装置
CN112259247B (zh) 对抗网络训练、医疗数据补充方法、装置、设备及介质
CN111475496B (zh) 基于多条件约束的时间序列数据生成方法、装置及介质
CN111210332A (zh) 贷后管理策略生成方法、装置及电子设备
US20210375492A1 (en) Ai enabled sensor data acquisition
WO2021189362A1 (zh) 基于多条件约束的时间序列数据生成方法、装置及介质
AU2021106200A4 (en) Wind power probability prediction method based on quantile regression
CN113886821A (zh) 基于孪生网络的恶意进程识别方法、装置、电子设备及存储介质
CN113420165A (zh) 二分类模型的训练、多媒体数据的分类方法及装置
CN115883424A (zh) 一种高速骨干网间流量数据预测方法及系统
CN110717577A (zh) 一种注意区域信息相似性的时间序列预测模型构建方法
CN115604131A (zh) 一种链路流量预测方法、系统、电子设备及介质
CN112732962B (zh) 基于深度学习与Flink的线上实时预测垃圾图片类别方法
CN112115443B (zh) 一种终端用户鉴权方法及系统
CN114298199A (zh) 转码参数模型的训练方法、视频转码方法及装置
CN115496175A (zh) 新建边缘节点接入评估方法、装置、终端设备及产品
CN109919203A (zh) 一种基于离散动态机制的数据分类方法及装置
CN115102852B (zh) 物联网业务开通方法、装置、电子设备及计算机介质
US20230377004A1 (en) Systems and methods for request validation
CN113938566B (zh) 任务执行方法、装置和电子设备
CN113011555B (zh) 一种数据处理方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20927850

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 202117945

Country of ref document: GB

Kind code of ref document: A

Free format text: PCT FILING DATE = 20200326

ENP Entry into the national phase

Ref document number: 2020438008

Country of ref document: AU

Date of ref document: 20200326

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20927850

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20927850

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.07.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20927850

Country of ref document: EP

Kind code of ref document: A1