WO2023113729A1 - High performance machine learning system based on predictive error compensation network and the associated device - Google Patents

High performance machine learning system based on predictive error compensation network and the associated device

Info

Publication number
WO2023113729A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
prediction
input
error
artificial neural
Prior art date
Application number
PCT/TR2022/051352
Other languages
French (fr)
Inventor
Burak Berk ÜSTÜNDAĞ
Alper Cem YILMAZ
Ajla KULAGLIC
Original Assignee
Kriptek Kripto Ve Bilişim Teknolojileri Sanayi Ve Ticaret Anonim Şirketi
Priority date
Filing date
Publication date
Application filed by Kriptek Kripto Ve Bilişim Teknolojileri Sanayi Ve Ticaret Anonim Şirketi
Publication of WO2023113729A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning

Definitions

  • First wavelet transformation neural network WNN1 (Wavelet Neural Network) consisting of feature extraction with wavelet converter and primary next value prediction by ANN
  • Second wavelet transform neural network WNN2 consisting of feature extraction with wavelet converter and ANN, predicting the future value of the errors of the first network
  • Third wavelet transform neural network WNN3, which consists of feature extraction with wavelet converter and ANN, used in predicting the future value of the prediction error remaining after the error has been reduced with the WNN2 output
  • the wavelet transformation a0 value depending on the mean and approximation of the first-order approximation coefficients
  • the wavelet transformation values a2 and a3 representing the detail depending on the first level difference value
  • the predictive value output with reduced error obtained by subtracting the main predictive value from the expected error value in the prediction
  • a fusion prediction ANN in which the prediction, prediction error, and error sequence are applied to the inputs of the future value prediction
  • the subject of the invention is about a training and prediction method and the associated device which increases the performance of next value prediction or classification in one or more dimensional sequential signal arrays compared to the conventional machine learning applications.
  • x(k) represents the current value of any spatial or temporal data sequence
  • x(k+1) represents the next value
  • x(k-1) represents the previous value
  • x(t) represents data sampled with equal periods in time (uniform sampling)
  • x(t-T) represents the value of the data one “T” sampling period earlier. Therefore, for instance, xp(k+1) represents the next predicted value of any data sequence; xp(t+T) represents the predicted value of the uniformly sampled sequence at the next time period, with “T” as the sampling period.
  • the expression “avg” in the figures represents the average value and the expression “err” represents the error value.
  • As shown in Figure 1, in order to implement the aforementioned training and prediction method, there are at least three artificial neural subnetworks (1, 16, 20) to which the inputs are provided, and a main decision layer (84) to which the outputs of the above-mentioned artificial neural subnetworks (1, 16, 20) are provided as inputs.
  • The mentioned artificial neural subnetworks (1, 16, 20) also provide inputs to each other during training. In this training, except for the first and the last artificial neural subnetworks, the networks in between are all trained in the same way.
  • the aforementioned neural subnetwork layer (1, 16, 20) includes a neural network layer (3, 14, 18) and a feature extraction layer (4, 15, 19).
  • A transformation layer is provided to transform, preferably to normalize, the data before it is input to the neural subnetworks (1, 16, 20).
  • DNN Deep Neural Networks
  • CNN convolutional neural networks
  • LSTM long short-term memory networks
  • basic machine learning functions such as recognition, prediction, classification
  • a subsequent or future value of the data type is provided over a predictive output trained with the method of the invention, and the output data type acquired here does not need to be the same type as the input data type.
  • the input can be signal or data sequences.
  • the input signal arrays to be used for the prediction are first grouped by correlation and frequency band. This selection is made starting from the lowest frequency band of the data with the highest statistical correlation.
  • the moving-average value (22, 23) and/or the value filtered with a low-pass filter is applied to the primary network (50) after being normalized with another moving average (44, 45, 46, 47).
  • the orthogonality of the currently used input data, relative to the correlation between the target data sequence to be predicted and the data sequences (such as y(t) or y(k)) that can be used in addition to the main input data type (x(t) or x(k)), is a determining factor.
  • if the data types determined after the data type design are decomposed from low-frequency components to high-frequency components via the appropriate filters and applied to different artificial neural subnetworks, the overall prediction or classification performance increases.
  • the moving average (24) value is used in the first artificial neural subnetwork (1) stage. Note that this value is also used in the input value normalization (25, 26, 27,28) of the second prediction network (51).
  • the prediction or classification system is explained by inputting only one type of signal array (2).
  • the moving average (24), obtained by summing the current value (2) and the (n-1) historical values (6, 7, 8) of the input sequence and dividing by the number of samples, is used as a low-frequency component (50) in the neural subnetwork (1, 16, 20), preferably of the WNN type.
  • filters such as Butterworth and Chebyshev, which are widely known in the literature and whose frequency responses are more stable, can be preferred in the application in order to have a more regular decomposition of the signal features according to the frequency bands.
  • the prediction or classification performance may increase.
  • the final m values (24, 38, 39, 40) of the low frequency components (24) which are represented by the moving averages are subtracted from their averages (44, 45, 46, 47) and normalized.
  • the normalized values are sent to the first artificial neural network (1), where the features are extracted with discrete wavelet transform (3, 14, 18) in the feature extraction layer (4, 15, 19), especially with the Mallat wavelet transform shown in Figure 3.
  • the features extracted here are applied to the neural network layer (3) of the first neural subnetwork (1).
  • For the frequency decomposition, the moving average, the low-pass filter, and the relevant normalization are preferred, since they are suitable for continual signal updating without being dependent on a future value and their structure is simple in terms of expression. It is also possible to use different transformation and normalization types in different applications. A sketch of this decomposition and normalization follows below.
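The decomposition just described can be illustrated with a short sketch. The following Python fragment is only an illustration under stated assumptions (plain NumPy, hypothetical helper names, mean subtraction standing in for the normalization blocks); it is not the patented implementation.

```python
import numpy as np

def moving_average(x, n):
    """n-point moving average of a 1-D sequence (the low-frequency component)."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(n) / n
    return np.convolve(x, kernel, mode="valid")   # only windows fully inside the sequence

def normalized_low_freq_window(x, n, m):
    """Last m moving-average values, normalized by subtracting their own mean."""
    avg = moving_average(x, n)
    window = avg[-m:]
    return window - window.mean(), window.mean()

def normalized_raw_window(x, n):
    """Last n raw samples, normalized by subtracting their n-point average."""
    window = np.asarray(x, dtype=float)[-n:]
    return window - window.mean(), window.mean()

# toy usage with a random-walk-like sequence
x = np.cumsum(np.random.randn(200))
low_norm, low_mean = normalized_low_freq_window(x, n=5, m=4)
raw_norm, raw_mean = normalized_raw_window(x, n=4)
```

The returned means are kept so that predictions made on the normalized features can later be denormalized by adding them back, as described for the primary prediction value (96).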
  • This first neural subnetwork (1) is trained directly with the target data (x(t+T), x(k+1)) as in conventional prediction systems, unlike the other artificial neural subnetworks (16, 20).
  • While the output (5) of the first neural subnetwork (1) is sent, without denormalization, as an input to the main decision layer (84), that is, to the fusion artificial neural network (94), the first neural subnetwork (1) output is also transformed into the denormalized primary prediction value (96) by adding back the n-valued moving average of the input signal array together with the average of the last m values.
  • the first error reduction is performed by taking the difference (70) of this primary prediction value with the error prediction value (17) which comes from the secondary artificial neural subnetwork (16), i.e., the secondary neural network. Therefore, how the secondary, more broadly speaking the next, artificial neural subnetwork (16, 20) is trained, is a crucial detail that differs from conventional prediction systems.
  • the secondary artificial neural subnetwork (51) in Figure 1 which contains the secondary neural network layer (16), is trained, as shown in Figure 2, with the historical prediction error data ( 66) of the primary artificial neural subnetwork (50), which contains the primary neural network layer (1).
  • To the input of the second artificial neural subnetwork (51), the data is applied as a sequence starting from x(k-1), which is one value behind the current value, going backwards as far as the window size.
  • the window size is n
  • n past values are used, starting from x(k-1) down to x(k-n).
  • the established “prediction error value” of the first artificial neural subnetwork (1) is applied as the target label data (66) of the artificial neural network layer (16) in the secondary artificial neural subnetwork (51).
  • This “prediction error value” is equal to the difference between the last data input value x(k) and the one-sample back-shifted (z⁻¹) value of the prediction value (96).
  • the prediction value is the denormalized value by aggregating the n-valued moving average of the input signal array of the first neural subnetwork (1) output with the average of the last m number of values.
  • in the training, the normalized values of the one-sample-shifted input sequence (x(k-1)...x(k-n)) are entered into the second artificial neural subnetwork (16) input, while the prediction value of the first artificial neural subnetwork (1) is shifted back by one sample.
  • the current error value (66), which is the difference between this shifted (z⁻¹) value and the current input, is applied as the target training label, as in the sketch below.
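As a concrete reading of this training rule, the sketch below builds one input/label pair for the error-prediction subnetwork. It assumes a sequence `x` and an aligned array `x_pred` of denormalized first-network predictions (so that `x_pred[k]` is the back-shifted prediction of `x[k]`); both names are illustrative, not taken from the patent.

```python
import numpy as np

def error_network_training_pair(x, x_pred, k, n):
    """
    One training example for the second (error-prediction) subnetwork:
    inputs are the back-shifted, mean-normalized window x(k-1)...x(k-n),
    the label is the current prediction error err(k) = x(k) - x_pred(k).
    """
    window = np.asarray(x[k - n:k], dtype=float)[::-1]   # x(k-1), x(k-2), ..., x(k-n)
    inputs = window - window.mean()                      # simple mean normalization
    label = float(x[k] - x_pred[k])                      # err(k), the target label (66)
    return inputs, label
```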
  • the training can be updated without using subsequent values.
  • An essential benefit of establishing such a training model is that the training can be updated as new data becomes available for predictions about systems which change over time.
  • a certain proportion of the pre-given data set (for example, 70%)
  • the remaining part (for example, 30%)
  • the conventional methods can be applied and also it is possible to constantly update the training as new data comes in. If a continual training update is desired, the cycle consisting of reading input data, calculating the previous prediction error, updating the training, and determining the new predicted output value will need to be repeated.
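Read as pseudocode, the continual-update cycle can be sketched as follows. The `model` object, its `update`/`predict` methods, and the stored `last_prediction` are placeholders for whatever subnetwork implementation is used; the patent does not define this interface.

```python
def continual_update_cycle(model, history, new_value):
    """One pass of the cycle: read input, compute previous error, update, predict."""
    history.append(new_value)                              # 1. read the new input value
    previous_error = new_value - model.last_prediction     # 2. previous prediction error
    model.update(history[:-1], previous_error)             # 3. training update on back-shifted data
    model.last_prediction = model.predict(history)         # 4. new predicted output value
    return model.last_prediction
```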
  • the normalized output value of the secondary neural subnetwork (51) is the prediction of the prediction error of the primary artificial neural subnetwork (50). This value (17) is subtracted from the previous prediction value (96) to obtain the “compensated prediction value” (70).
  • the output of the second artificial neural subnetwork (17) that compensates for this is also applied as an input to the main decision layer, i.e., the prediction artificial neural network (94).
  • the main decision layer (94) is trained directly with the target prediction or classification value.
  • This input can be called the “current residual error value”.
  • the prediction error corrected prediction values are obtained by taking the difference of the primary prediction value with the error prediction value which will come from the secondary artificial neural subnetwork. Thus, this difference determines the final prediction error.
  • the third artificial neural subnetwork (20) is the final error prediction stage.
  • the current residual error value (10) which is the difference between the historical value of the error compensated prediction value and the current input value (2), and its historical values (11, 12, 13) are obtained and applied to the last, i.e., the third artificial neural subnetwork (52).
  • the technique of taking the difference from the moving average is also used in the last error prediction stage, as in the previous stages.
  • the residual error data sequence (err(k), err(k-1), err(k-2), ..., err(k-(p-1))) (10, 11, 12, 13) at the input of the last artificial neural network (52) is normalized with this method (32, 33, 34, 35).
  • in training, the final error correction artificial neural network (52) differs from the prediction (validation) mode; p values of the residual error sequence, from the penultimate value err(k-1) back to err(k-p), are applied.
  • err(k) (10), which is the last error value, is applied as the target label value of the last artificial neural network (52).
  • when the last artificial neural network (52) works in prediction (validation) mode, it predicts err(k+1) or err(k+T) (21), corresponding to the next expected error value, while the error sequence ending with err(k), the last current residual error value, is applied to its input; this value is denormalized (69) and applied to the decision layer (94), as sketched below.
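A small sketch of how the input windows and label of this final error-prediction stage could be laid out is given below; the window-mean normalization is an assumption standing in for the moving-average normalization (32, 33, 34, 35) in the text.

```python
import numpy as np

def residual_error_windows(err, p):
    """
    Training input: err(k-1)...err(k-p) (back-shifted), label: err(k).
    Prediction input: the window ending at err(k), used to infer err(k+1).
    """
    err = np.asarray(err, dtype=float)
    train_in = err[-p - 1:-1][::-1]            # err(k-1), ..., err(k-p)
    train_in = train_in - train_in.mean()
    label = float(err[-1])                     # err(k), the target label (10)
    predict_in = err[-p:][::-1]                # err(k), ..., err(k-p+1)
    predict_in = predict_in - predict_in.mean()
    return train_in, label, predict_in
```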
  • the non-denormalized output value (5) of the first artificial neural network (1) where the low frequency component values are entered, the denormalized output (17) of the second neural network (16) where the error of the first prediction output is predicted, and the denormalized output value (69) of the last artificial neural network (20) from which the residual error of all is predicted, are all inputted to the main decision layer (94).
  • the performance of composite prediction or classification is increased by applying the orthogonality-enhanced features (5, 17, 69) obtained by predicting the current residual error values in sequential networks to the inputs of the main decision layer (94).
  • the last input value x(k) (2) is applied as the target label value, while the previous values of the data (5, 17, 69) coming from the neural subnetworks (1, 16, 20) are entered into the input of the main decision layer (94).
  • xp(k+1) (5) is applied to the main decision layer (D), while in the training update its previous (z⁻¹) value, xp(k), is sent to the input of the fusion network (94).
  • the fusion network output is denormalized with the mean at the output to obtain the future value xfinal_p(k+1) or xfinal_p(t+T) (49) as the main prediction.
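When the simple signed-aggregation decision layer of Figures 4 and 6 is used instead of the fusion ANN, the combination step reduces to successive subtractions of the denormalized error predictions, roughly as in this sketch (argument names are illustrative):

```python
def aggregation_decision(primary_denorm, err1_pred_denorm, err_next_pred_denorm):
    """Summation-point decision layer: subtract the predicted error components."""
    compensated = primary_denorm - err1_pred_denorm   # compensated prediction (70)
    return compensated - err_next_pred_denorm         # composite prediction after residual correction
```

With the fusion ANN variant, the same quantities are instead fed as inputs to a small network trained directly on the target label.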
  • an artificial neural network can be used as the main decision layer (94).
  • the signed aggregation (subtraction) method is shown instead of the main decision layer (94).
  • the applications in Figures 4 and 6 can also use an artificial neural network as the decision layer (94).
  • the use of a separate aggregation network (87, 94) instead of the aggregation process in commonly encountered signal array examples increases the prediction performance.
  • the configuration in Figure 1 is applied to predict the next day values based on the daily closing values of 4 stocks in the New York Stock Exchange Nasdaq index, and the results obtained according to the stages are compared with ARIMA (Auto Regressive Integrated Moving Average), which is widely used in the literature.
  • ARIMA Auto Regressive Integrated Moving Average
  • the values of Apple (AAPL), Bank of America (BAC), Micron (MU) and Ford (F) stocks for 3147 stock market working days between the dates of 24.09.2007-02.04.2020 are used for comparison between PECNET and ARIMA, and the results are shown in Table 1.
  • Continual learning mode is not used in this comparison. Instead, the first 70% of the data is used for training and the remaining 30% is used for testing (validation), as is common in prediction and classification with conventional machine learning.
  • the performance of the prediction or classification in time-varying systems such as the stock market increases when the continual training update of the present invention is applied.
  • the prediction error at the output of the fourth level, the main decision layer (49) of the configuration in Figure 1 is significantly lower than ARIMA for all compared stock values.
  • RMSE Root-Mean-Square Error
  • the error (RMSE) decreases continuously as the level progresses, and it comes to the lowest level at the output of the decision layer (49).
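The stage-by-stage comparison uses the usual root-mean-square error; for reference, a minimal implementation is:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```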
  • the second level value is the corrected value of the first level prediction output by subtracting the denormalized value of the second artificial neural subnetwork (51) error prediction output (70).
  • although the third level prediction value is not included as a value in the diagram in Figure 1, the denormalized value (69) of the third neural subnetwork (53) output is subtracted from the corrected prediction value (70) at the previous level in order to show the reduction success in the comparison table, because the denormalized value (69) of the third artificial neural subnetwork (53) output (21) sends the prediction of the residual prediction error, not the prediction itself, to the main decision layer (87, 94).
  • the effective error (RMSE) in predicting the next closing value of Bank of America shares with the previous closing values with ARIMA is 0.4853, while at the output of the present invention it is 0.2646.
  • All artificial neural subnetworks (1, 16, 20) are identical and have 4 inputs, 1 output and 4 hidden layers in the form of 4x32x32x32x16x1.
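Such a 4x32x32x32x16x1 fully connected subnetwork can be written, for example, in PyTorch as below; the activation function is an assumption, since it is not specified here.

```python
import torch.nn as nn

def make_subnetwork():
    """Fully connected subnetwork: 4 inputs, hidden layers 32-32-32-16, 1 output."""
    return nn.Sequential(
        nn.Linear(4, 32), nn.Tanh(),
        nn.Linear(32, 32), nn.Tanh(),
        nn.Linear(32, 32), nn.Tanh(),
        nn.Linear(32, 16), nn.Tanh(),
        nn.Linear(16, 1),
    )
```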
  • the moving average values subjected to the normalization (97) are given as five-point (avg(x(k-5) ...
  • Table 1 - Comparison of the RMSE (Root Mean Square Error) performance in daily value prediction of 4 stocks in Nasdaq, according to the stages of the PECNET configuration depicted in Figure 1, with ARIMA (Auto Regressive Integrated Moving Average).
  • Compared to network types such as LSTM (Long Short-Term Memory) and CNN (Convolutional Neural Network), the training and prediction system of the present invention generally provides higher performance (i.e., lower error) in future value prediction or classification of data sequences.
  • the fully connected network model shown in Figure 8 is used for the ease of expression. This also applies to the comparisons in Table 1.
  • Network types such as LSTM and CNN can be used in place of the fully connected network in the system of the present invention.
  • the artificial neural subnetworks (1, 16, 20) of the present invention allow the combined prediction or classification performance to be increased by decomposing the error residues and orthogonal features.
  • the time-delayed wavelet transformation neural subnetworks (1, 16, 20) model (TDWNN), which is used at different levels in the present invention and seen in Figure 7, can be found in the literature with different configurations, especially with time series prediction applications.
  • when the TDWNN model is used alone for the stock forecast data in Table 1, none of its configurations can approach the level of performance provided by its multiple use in the system of the present invention.
  • the performance of the present invention in Table 1 can be increased by using other data of the stock whose closing value is predicted on the next day, as shown in Figure 5.
  • economic and financial data pertaining to the stock or the market such as transaction volume, can be inputted.
  • Inputting these data reveals another advantage of the present invention.
  • the risk of overfitting does not increase as new values come in a limited data set, as observed in common forecasting models, because the new incoming data are mostly entered into independent subnets and trained with the residual error value.
  • the present invention can be extended by sampling a single data type in different window sizes, or by additional data types as well as features in different frequency bands.
  • additional data types such as y(k) or y(t) (71) can be used for sequential error reduction.
  • the level of the pattern relationship of the data (71) to be added as a new input type with the residual error sequence from the previous network instead of the target value to be predicted (xp(k+1)) determines its contribution to the improvement of the prediction value.
  • maximizing the correlation with the residual error from the previous networks enables the selection of orthogonal data features, which is a matter of fundamental superiority in machine learning. Overfitting is prevented by selecting new data (71) with the orthogonality condition and finalizing it with a main decision layer (87,94) selected as aggregating or artificial neural network.
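One simple way to apply this selection criterion is to rank candidate input sequences by their correlation with the residual error left by the previous networks, as in the sketch below (hypothetical helper, Pearson correlation assumed):

```python
import numpy as np

def rank_by_residual_correlation(candidates, residual_error):
    """
    Rank candidate sequences (dict: name -> 1-D array aligned with the residual
    error) by absolute Pearson correlation with the residual error sequence.
    """
    r = np.asarray(residual_error, dtype=float)
    scores = {
        name: abs(float(np.corrcoef(np.asarray(seq, dtype=float), r)[0, 1]))
        for name, seq in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```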
  • a structure with at least two inputs can use the output of another structure or more than one structure as an input sequence.
  • the future value predicted by another structure can be used instead of the past value of the index, in addition to data such as the past values and trading volume of the stock in question as input data.
  • a large number of parameters and historical values can be used in multiple neural subnetworks with fewer inputs (1, 16, 20, 94, 84, 87) or in different structures within the network, and, despite the limitation of the amount of data in the training set, the likelihood of overfitting in the composite prediction output is reduced.
  • the WTDNN structure in Figure 7 is used as a common building block in the stages before the main decision layer (87, 94).
  • the configuration in Figure 1 uses 3 WTDNN artificial neural subnetworks (50, 51, 52) and the main decision layer (94), while the configuration in Figure 5 uses 4 WTDNN artificial neural subnetworks and the main decision layer (87).
  • a total of n values, including the time discrete signal array’s x(k) current and (n-1) past values (6, 7, 8), are normalized and inputted into the wavelet converter (14) feature extraction layer (15).
  • This model is used to predict the next value of the sequence, or the value a certain number of samples ahead, using historical values, or to determine a class associated with the historical values. Identifying which sound it is from the change of a sound signal can also be given as an example of classification with the same structure.
  • in this structure, in addition to x(k), multiple types of signal arrays and their historical values, such as y(k) and z(k), can be applied, which will increase the accuracy of target data prediction or classification.
  • the second of the two WTDNN neural subnetworks used in the configuration in Figure 6 inputs the wavelet transform feature extraction blocks (18, 82) of two separate data sets into a common artificial neural subnetwork (90).
  • the prediction performance for the training set also decreases.
  • the closing value of a stock in the stock market for the next day may depend on the historical values at different frequencies such as the daily change in the current week, the change in the past weeks or months, or it may be in an antecedent signal relationship with the change in value of another stock with the time shift.
  • wavelet transform (14) for feature extraction is one of the methods used in many applications before the data is applied to the neural subnetwork (16) in prediction or classification.
  • Although it is not necessary to use this transformation, the wavelet transform generally increases the overall performance in evaluating the pattern signals related to physical systems as time series. Therefore, comparative explanations of the conventional artificial neural subnetwork (16) prediction model and the present invention are made here on schematics using the wavelet transform.
  • Normalization in the prediction scheme with the conventional artificial neural network (16) in Figure 7 is done by subtracting (25, 26, 27, 28) the moving average values from the input values (x(k), x(k-1), ..., x(k-(n-1))).
  • the next value of the T period, xp(t+T) or xp(k+1), is summed with the input average, and the denormalized prediction value is obtained.
  • the artificial neural network system seen in Figure 1 has 4 fundamental differences compared to the conventional single-network data entry systems seen in Figure 7.
  • the first difference is the way in which the values of the same type of data in different frequency bands are entered separately into the prediction or classification system.
  • in the conventional method, for example, the values of a stock for the last 4 days and its average weekly values for the last 4 weeks are entered together into a single artificial neural network with a total of 8 inputs.
  • low frequency components of the signal (24, 38, 39, 40)
  • a second artificial neural subnetwork (16) predicts the prediction error of the former with higher frequency current data (2, 6, 7, 8).
  • the purpose of doing this is to decompose the values of the same signal array in different frequency ranges or past time windows into orthogonal features for the future value prediction or class relationship. Otherwise, components that are highly correlated with each other increase the number of inputs even though they do not contribute, so the total number of internal neurons that require training also increases.
  • the structure used as the second level artificial neural subnetwork (16) in Figure 1 is shown with the same numbering as the artificial neural subnetwork (16) in Figure 7 in terms of the input signal array.
  • the denormalized output (96) of the neural subnetwork (1) using the low-frequency component and the denormalized (17) difference (70) of the artificial neural subnetwork using the direct input signal are used in the remaining error compensation as the predicted value with reduced error, but instead of the improved prediction value, the value that improves it (17) is sent to the fusion network (94) as input.
  • the predictive value (70) which has been reduced by compensating the error is used in the artificial neural subnetwork (52).
  • Another significant difference between the conventional prediction or classification method and the present invention is the way target label data is used in training.
  • the second neural subnetwork (16) of the present invention in Figure 1 and the neural subnetwork (16) of the conventional method in Figure 7 are structurally the same in terms of the way the input data is applied.
  • the second artificial neural subnetwork (16) in Figure 1 uses the residual error after the prediction of the first artificial neural subnetwork (1), as seen in Figure 2, in training.
  • the input, output, and target label data relations in the training mode of the artificial neural subnetworks of the present invention can be seen when 4 historical values are used.
  • feature extraction with wavelet transform is shown as a block (50, 51, 52) as WNN.
  • the error data (66) obtained by shifting the normalized prediction output (96) of the first artificial neural subnetwork (50) back one sample and subtracting the current input value (2) is used as the target label data.
  • input signal values in the form of x(k-1), ..., x(k-4) are used for the second artificial neural subnetwork (51) input, excluding x(k).
  • the past values of the input are used to predict the current forecast error.
  • the third artificial neural subnetwork (52) used for error termination in Figure 1 is a second use case, as it is in Figure 4, by using the aggregation layer as the main decision layer for error reduction.
  • the composite prediction error can be reduced by subtracting the error prediction value (95) directly from the denormalized output value (54) of the previous artificial neural subnetwork (16).
  • the input signal array x(k) is normalized and entered directly into a single artificial neural subnetwork (16) without dividing it into frequency groups or different time intervals.
  • the neural network layer (20) used as the second neural subnetwork without using any additional data, together with the wavelet converter (18) before it, corresponds to the third neural subnetwork (52) in Figure 1.
  • First level mean values (60) are obtained from the sum of the first two values (58) and the sum of the third and fourth values.
  • the a0 (61) value is obtained from their mean and the a1 value from their difference.
  • the first level details obtained from the difference of the third and fourth (59) and the difference of the first and the second form the transformation coefficients as a3 and a2, respectively.
  • This described transformation corresponds to the use of Haar as shown in Figure 3 as the mother wavelet function.
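For a 4-sample window, the Haar (Mallat) coefficients described above can be computed as in the following sketch; the 1/2 scaling is an assumption, since unscaled sums and differences carry the same information.

```python
def haar_features_4(window):
    """
    Two-level Haar transform of a 4-sample window.
    Returns (a0, a1, a2, a3); a0, the overall approximation, is typically
    excluded from the ANN inputs in the text.
    """
    x1, x2, x3, x4 = (float(v) for v in window)
    m1 = (x1 + x2) / 2.0      # first-level mean of the first pair (58)
    m2 = (x3 + x4) / 2.0      # first-level mean of the second pair
    a0 = (m1 + m2) / 2.0      # second-level approximation (61)
    a1 = (m1 - m2) / 2.0      # second-level detail
    a2 = (x1 - x2) / 2.0      # first-level detail of the first pair
    a3 = (x3 - x4) / 2.0      # first-level detail of the second pair (59)
    return a0, a1, a2, a3
```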
  • the transform layer functions as feature extraction, and as an alternative to the wavelet transform, Fourier and Hilbert transforms can be preferred according to the domain where the characteristic features of the data are dominant.
  • the features extracted from the current (2) and historical values (6, 7, 8) of the signal or data at the input, and from their value sequences (24, 38, 39, 40) allocated to at least one different frequency band by transformation (3, 14, 16), are applied to multiple artificial neural subnetworks (16, 20, 94, 84, 87) in a data training and/or classification method that predicts or classifies the future or next value of the signals,
  • the first artificial neural subnetwork (16) directly predicts the target data (5), while, from the second one onward, the artificial neural subnetworks (16, 20) predict the error of the previous one (17) for compensation (70) and the error value (10) remaining from the difference between a historical value (9) of the prediction signal (70), whose error is reduced in each step, and the current value (2); the past values (10, 11, 12) are used as inputs in the prediction network (62) that infers the next prediction error correction value (69), and the prediction value of the first network and the error correction prediction values
  • the present invention fundamentally has the following steps; the application of feature extraction by transformation after normalization of the value sequences of the signal or data sequence or pattern in the input and the additional data correlated with the target, sending the normalized output value (96) of the first artificial neural subnetwork (1) to the first level correction network array as a raw prediction value, determination of the composite predictive value (70) of the corrected data by predicting the error in the prediction value of the previous artificial neural network (1) of the next neural subnetworks (16), by subtracting the “prediction error” prediction value (17) from the first denormalized prediction value (96); prediction subnetworks generate (86) a new predictive value that is constantly corrected by subtracting the error of this value from the composite prediction value (70) remaining from the previous network, by similarly predicting (85) by subsequent subnets (84); Starting from the second artificial neural network (11), in the training of subnets, the input data sequences (2) are shifted back one by one and the difference between the previous prediction value of the previous network and using the actual input value (66) as the target label.
  • the present application also includes a device for conducting the aforementioned methods.
  • Said device contains a processing unit configured to execute the above given processing steps and a memory unit that stores the instructions/commands describing the above steps or combination of these steps and can transmit these instructions/commands to the processor for execution.
  • the above-mentioned memory unit may also be configured to store previous or input data from an external source.
  • a separate memory unit can be provided to store previous or input data.
  • the aforementioned device may also include a communication unit for externally receiving data and/or transferring the predictions provided over the received data to another external device.
  • the said communication unit can be provided in a wireless or wired way, but it can also be provided in a system in a way that directly associates the device with the system.
  • the present application also includes a device comprising elements for executing the method of claims 1-15 and/or 16-31 when executed by a processing unit, a program containing instructions for performing the method of claims 1-12 and/or 13-25 when executed by a processing unit, and a computer readable medium containing the above-mentioned program.

Abstract

The invention is a training and prediction method and the associated device which increase next value prediction and classification performance compared to conventional machine learning applications; it provides subsequent value prediction or classification in one- or higher-dimensional sequential signal sequences, making it possible to increase the prediction or classification performance in stochastic or chaotic time series or spatial data sequences where large and different time scales or different data types are used simultaneously with conventional machine learning.

Description

HIGH PERFORMANCE MACHINE LEARNING SYSTEM BASED ON PREDICTIVE ERROR COMPENSATION NETWORK AND THE ASSOCIATED DEVICE
The Technical Field of the Invention
The invention, High Performance Machine Learning System Based on Predictive Error Compensation Network (PECNET), is about a training and prediction method and the associated device which increases the performance of next value prediction or classification in one or more dimensional sequential signal arrays or patterns compared to the conventional machine learning applications.
The State of the Art of the Invention
Machine learning techniques are increasingly used in next value prediction, classification, and anomaly detection of temporal or spatial data in numerous fields including cybersecurity, finance, and image processing.
One of the main problems that can be encountered in conventional artificial neural network-based machine learning techniques is overfitting in output performance when the amount of simultaneously evaluated input data increases without enlarging the data set used in training. This problem arises because, under the constraint of the training data set, the network enters a memorization process instead of a learning process when the number of inputs, starting from the input layer, increases the number of weights that must be trained. As a result of overfitting, although the performance is high when the training data set is applied, the performance drops significantly when test data, which is not used in training, is applied to the system.
Pruning of the weight parameters of the artificial neural network is one of the methods used when the learning time, along with the computational complexity, increases exponentially as the amount of simultaneously used input data in conventional methods increases. Although pruning results in a more optimized and faster operation, its effects on the computational load during the training process and on the continual learning performance after pruning are limited. In time-varying systems, when conventional machine learning approaches are applied to predict the future or subsequent value of the same data sequence, the performance decreases due to problems such as the characteristics of the input data used in training changing over time, and the normalization losing its validity for the changes in the test data. This situation creates the need for continual learning adaptation as new data comes in, without split training and testing processes, in future value prediction or behavior classification of time-varying systems, especially in economic and financial systems.
US6735580 B1 explains a computer-based method for time-series forecasting with artificial neural networks. With this method, the time-series data for financial time-series forecasting is entered into the artificial neural network, corrected according to an error model, and the price forecast for the next selected time is obtained. Then, the expected trend data is procured using the retrospective stock market data and this data is extracted. The aforementioned data is entered as an input to an iterative artificial neural network, and then the trends and weights are rearranged according to the results of the first training step. The error output and the average of the error output are calculated from the difference between the output result obtained from the iterations and the expected sample data, and the desired output data is regulated according to the average of this error output.
As in the next day's value prediction problem of stock market share values, when the type of data (financial, economic, administrative, and political parameters) that can be used as input is high compared to the total amount of past records, it is important for the performance of the machine learning to determine which features of which data or how much historical value will be used. Although data with a high statistical correlation to the target prediction label are usually selected, if there is a dependency between the input data types, the performance may be adversely affected.
US10650045 B2 describes the management of the training data and the hyperparameters of networks in a chain to perform a function by using the outputs of a set of neural networks together. The problem of overfitting is also observed in the system described there.
US9934259 B2 is about a method that involves evaluating time series data together in a hierarchically distributed processor sequence, and the associated distributed processing device, to perform weather forecasting over a wide range of weather data. Consequently, all the problems mentioned above have made it necessary to innovate in the related field.
The Objectives of the Invention
The main objective of the present invention is to increase the performance of the next value prediction or classification by eliminating the overfitting problem seen in one or more dimensional sequential signal arrays.
Another objective of the invention is to increase the accuracy of prediction with continual training update by the evaluation of the errors occurring after each new data in different subnetworks.
Another objective of the invention is to increase the performance of the prediction or classification by fusion, without causing an overfitting problem even when the input data type consists of sequences of more than one type and different widths.
The Brief Explanation of the Invention
The main objective of the present invention is to introduce a method structure for training a computer-based prediction system, comprising at least three artificial neural subnetworks and a main decision layer, based on a predictive error reduction network that takes the signal or data as input and predicts or classifies its next value. The invention consists of an initial prediction network, at least one prediction network for residual error prediction of the previous prediction network, an error sequence next value prediction network, and a predictive error correcting summation point or fusion network that combines prediction and correction values. The functional strategy of the invention is to recognize the system features that are the source of the data in the frequency or wavelet domain and train additional data or features to reduce the error components from previous networks instead of training them together with the whole for a target.
Accordingly, the present method contains the following steps of separating the current and historical values of the signal or data at the input into at least two different frequency bands, transforming the value sequences in the mentioned frequency bands and applying feature extraction, applying the extracted features to separate neural subnetworks, using the output of the first artificial neural subnetwork as an input to a main decision layer; determining the corrected data of the next neural subnetworks, by predicting the error in the output of the previous neural network, as a composite prediction value by subtracting the “prediction error” prediction value from the previous prediction value; by similarly predicting the error of this value by the next network, prediction subnetworks generate a new predictive value that is constantly corrected by subtracting it from the composite predictive value remaining from the previous network; applying the same data to these subnets by selecting or sorting according to the correlation between properties obtained from their values in different time and frequency ranges and residual error value instead of output prediction value of different types of data; in the training of subnets, starting from the second artificial neural network, shifting the input data sequences back one by one and using the difference between the previous prediction value of the previous network and the actual input value as a target label; reducing error by using only compensated and reduced error data sequences as input instead of the last of the first level neural subnetworks, and predicting the next value at its output, subtracting the error indexdependent prediction value from the composite prediction remaining from the previous ones; for this purpose, using the last updated error value from the previous ones as the target label by shifting the input data sequences one step back during the training; applying the denormalized values of the error prediction outputs of the first prediction neural network and its residual error values to a separate artificial neural network input for fusion, in order to further reduce the reduced error value in the first level subnets with this method; obtaining the denormalized value of its output as a high-performance prediction or classification value by directly training this end-level fusion network with the target prediction or classification label value.
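Read as an algorithm, the paragraph above corresponds roughly to the schematic step below. The `fit`/`predict` interface, the folding of normalization and wavelet feature extraction into the subnetwork objects, and the handling of time alignment (back-shifted training windows) are simplifications for illustration only, not the claimed implementation.

```python
def pecnet_step(low_freq_net, error_net, residual_net, fusion_net,
                low_freq_features, raw_features, residual_history, target):
    """One schematic PECNET training/prediction step with placeholder subnetworks."""
    # first subnetwork: trained directly on the target using low-frequency features
    low_freq_net.fit(low_freq_features, target)
    primary = low_freq_net.predict(low_freq_features)

    # second subnetwork: trained on the first network's prediction error
    error_net.fit(raw_features, target - primary)
    err1 = error_net.predict(raw_features)
    compensated = primary - err1                      # error-compensated prediction

    # third subnetwork: predicts the next value of the remaining error sequence
    residual = target - compensated
    residual_net.fit(residual_history, residual)
    err_next = residual_net.predict(residual_history + [residual])

    # fusion/decision layer: trained directly with the target label
    fusion_net.fit([primary, err1, err_next], target)
    return fusion_net.predict([primary, err1, err_next])
```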
Therefore, except for the first network and the last fusion network, all other artificial neural subnetworks are trained with the backward-shifted values of the error sequence remaining from the previous residual error instead of the main prediction label data. While the prediction error values are gradually reduced in this cascaded and additive manner at the first level, the input data features to the fusion network at the second level become more orthogonal even though the number and type of external input parameters are increased. Hence the effect of overfitting is reduced and the composite prediction or classification performance at the output of the fusion or terminal network is increased.
Additionally, the data sets to be predicted or classified are divided into different band groups and data types such as low and high frequency bands and applied to separate networks instead of being applied to a single network; the ones outside the first network are trained with each other's errors instead of training with the target label data. In addition, the invention can prioritize data sources and time frame scales during the training update, starting from the lowest frequency or the widest time frame, reducing errors towards the high frequency or the narrow time window.
Therefore, even if the usable data set expands, the overfitting effect and the decrease in sensitivity are eliminated.
The target prediction or classification value is determined by aggregating the output values produced in the neural subnetworks or by applying them to another artificial neural subnetwork input. In this way, it becomes possible to use not only different types of data, but also the sampled values of the same data type in different frequency bands or with different periods, without causing excessive compatibility, in increasing the output performance.
In a preferred embodiment of the invention, if the input contains more than one type of signal, another artificial neural subnetwork is used to compensate the output signal error of the first artificial neural subnetwork, and the output of said sequential artificial neural subnetwork is used sequentially according to the aforementioned artificial neural subnetworks, given as input to the decision layer.
Therefore, target prediction or classification performance is increased by participating in the composite error reduction of multiple types of signal inputs.
In a preferred embodiment of the invention, in case of continuous input, the training is continued by using the current values as label data by applying the input values of the artificial neural subnetworks back to the previous value.
Thus, continual learning can be provided without using current input values.
In a preferred embodiment of the invention, the inputs that have a non-linear contextual relationship with each other and/or the current error residue and with separate feature extractions are used as inputs together.
In a preferred embodiment of the invention, the previously mentioned main layer is provided as an artificial neural network or by summation point via aggregating the prediction values and the prediction of the error residuals from the artificial neural subnetworks.
Therefore, the processing speed is increased on low-configuration hardware, especially when the decision layer is implemented as an aggregation layer. Additionally, the present invention concept includes an artificial neural network model trained with this training method, a computer-based prediction method that provides the next value prediction, and a device containing hardware suitable for these processes.
The Definitions of the Figures Explaining the Invention
The figures and relevant explanations which are used in order to better explain the device developed with this invention are found below.
Figure 1. Multiple neural network structure with one input and one output, in which the neural network is used as a decision layer at the output.
Figure 2. Schematic representation of the input, target label and output signal arrays in a multiple neural network in training update mode.
Figure 3. Schematic view of the wavelet transformation with Mallat algorithm.
Figure 4. Multiple neural network structure with one input and one output using an aggregation layer as a decision layer at the output.
Figure 5. Multiple neural network structure using multiple types of input signals and a fusion network.
Figure 6. A simplified multiple neural network structure in which an aggregation layer is used as a decision layer for next value prediction with multiple types of input signals.
Figure 7. An artificial neural network structure that performs conventional time-delayed next value prediction or classification.
Figure 8. A conventional fully connected neural network structure.
The Definitions of Elements/Parts Forming the Invention
In order to better explain the device developed with this invention, the elements and parts in the figures are enumerated, and the meaning of each number is given below.
A. Artificial neural subnetwork
B. Neural network layer
C. Feature extraction layer
D. Main decision layer
1. The first ANN
2. Discrete time series x(t=kT) signal or x(k) data sequence input
3. Feature extraction layer: Discrete signal array-wavelet transformer
4. Entering the wavelet transformation coefficients except a0 into the ANN
5. The primary predictive value before denormalization at the first ANN output
6. The first historical value of the discrete signal array input
7. The second historical value of the discrete signal array input
8. The (n-1)th historical value of the discrete signal array input
9. A historical value of the first corrected prediction sequence obtained by subtracting the error err1p(t) or err1p(k) predicted by the secondary ANN after denormalizing the primary ANN's prediction output (xp(t+T) or xp(k+1))
10. The current prediction error, err(t) or err(k), calculated based on the difference between the corrected previous prediction value and the current input value
11. The previous value of the prediction error value, err(t-T) or err(k-1)
12. Two-previous value of the prediction error value, err(t-2T) or err(k-2)
13. The (p-1)th previous value of the prediction error value, err(t-(p-1)T) or err(k-(p-1))
14. n-input wavelet converter normalized to the moving average of the last n values of the input signal array (it has n-1 outputs, excluding a0)
15. Entering the (n-1)-valued wavelet transform output coefficients, except a0, into the ANN
16. The ANN that processes current data x(t), x(k) and historical values
17. Prediction of the error of the primary ANN by denormalizing the output of the secondary ANN
18. Normalized final p values of the residual error prediction sequence remaining after the secondary ANN, passed through the wavelet converter (p-1 outputs, except a0)
19. Output of the normalized residual prediction error with the (p-1)-valued wavelet transform coefficients except a0
20. ANN that predicts the next value of the residual prediction or classification error (error termination network)
21. Next value prediction of the residual prediction or classification error with the third ANN, err2p(t+T) or err2p(k+1)
22. Summation of the last n values of the input signal array
23. Division of the last n values of the input signal array by n
24. The moving averages of the last n values of the input signal array
25. Normalization of the current value by subtracting the last n-valued moving average value in the input signal array
26. Normalization of the first historical value by subtracting the last n-valued moving mean value in the input signal array
27. Normalization of the second historical value by subtracting the last n-valued moving mean value in the input sequence
28. Normalization of the (n-1)th past value by subtracting the last n-valued moving mean value in the input signal array
29. Summing the final p-value of the reduced prediction error by the second ANN
30. Finding the moving average of the final p values of the prediction error reduced by the second ANN by dividing the sum by p
31. The p-value moving average of the prediction error
32. Normalization of the current value of the prediction error by subtracting the last p-valued moving average
33. Normalization of the first historical value of the prediction error by subtracting the moving average with the last p-value
34. Normalization of the second historical value of the prediction error by subtracting the moving average with the last p values
35. Normalization of the (p-1)th historical value of the prediction error by subtracting the last p-valued moving average
36. A multi-layer fully connected ANN
37. A multi-layer fully connected ANN output
38. First historical value of the last n-valued moving average of the input signal array
39. The second historical value of the last n-valued moving average of the input signal array
40. The (m-1)th past value of the last n-valued moving average of the input signal array
41. Sum of the last m-value of the n-valued moving average of the input signal array
42. The sum of the last m-value of the n-valued moving average of the input signal array divided by m
43. The mean of the last m-value of the low-frequency component obtained by the last n-valued moving average of the input signal array
44. Normalization of the current value of the low-frequency component of the input signal array obtained with the last n-valued moving average by subtracting the m-valued moving average
45. Normalization of the first historical value of the low-frequency component of the input signal array obtained with the last n-valued moving average by subtracting the m-valued moving average
46. Normalization of the second historical value of the low-frequency component of the input signal array obtained with the last n-valued moving average by subtracting the m-valued moving average
47. Normalization of the (m-1)th past value of the low-frequency component of the input signal array obtained with the last n-valued moving average by subtracting the m-valued moving average
48. Non-denormalized prediction value output of the fusion ANN that takes as input the next value prediction from the low-frequency component, the prediction of the prediction error, and the future value predictions of the error sequence
49. Denormalized composite prediction output with minimized error, xpcd(t+T) or xpcd(k+1)
50. First wavelet transformation neural network WNN1 (Wavelet Neural Network) consisting of feature extraction with wavelet converter and primary next value prediction by ANN
51. Second wavelet transform neural network WNN2, consisting of feature extraction with wavelet converter and ANN, predicting the future value of the errors of the first network
52. Third wavelet transform neural network WNN3, which consists of feature extraction with wavelet converter and ANN, used in predicting the future value of the remaining prediction error value after reduced error with WNN2 output
53. Non-denormalized ANN prediction output value in an example structure from common machine learning applications for time series predictions
54. Denormalized prediction output value in an example structure from common machine learning applications for time series predictions
55. Input layer neurons in ANN
56. Hidden layer connections after input layer in ANN
57. One of the layers of neurons hidden in ANN
58. Output neuron in ANN
59. Signal array formed by sampling a time-series signal
60. Obtaining the approximation coefficient from the sum of the consecutive signals in the wavelet transform
61. Obtaining the detail coefficient from the difference of consecutive signals in the wavelet transform
62. First-order approximation coefficient in the two-level wavelet transform applied to a four-sample array
63. In the two-level wavelet transform applied to the four-sample array, the wavelet transformation a0 value depending on the mean and approximation of the first-order approximation coefficients
64. Obtaining the a1 wavelet transform output value from the difference of the first-level approximation coefficients in the two-level wavelet transform applied to the four-sample array
65. In the two-level wavelet transform applied to the four-sample array, the wavelet transformation values a2 and a3 representing the detail depending on the first level difference value
66. Using the prediction error of WNN1 as the label value in the training update of WNN2
67. Input signals in a conventional ANN
68. Output coefficient values in a wavelet transform example with four sampling value inputs
69. Denormalized value of the next value prediction of the prediction or classification error with the third ANN
70. Obtaining the prediction value with reduced error by subtracting the current "prediction value of the prediction error" at the output of WNN2, trained with the previous denormalized errors of WNN1, from the next value prediction of WNN1 according to the low-frequency component of the input signal array
71. Additional signal input y(t) or y(k) if a second data type is used to continue error reduction in the PECNET structure used for next value prediction or classification based on a sequence input (x(t) or x(k))
72. The first historical value of the additional signal sequence, y(t-T) or y(k-1)
73. The second historical value of the additional signal sequence, y(t-2T) or y(k-2)
74. The (q-1)th past value of the additional signal sequence, y(t-(q-1)T) or y(k-(q-1))
75. Summation of the last q values of the additional signal sequence
76. The sum of the last q values of the additional signal sequence divided by q
77. The moving average value of the last q values of the additional signal sequence
78. Normalization by subtracting the q-valued moving average from the current value of the additional signal array
79. Normalization by subtracting the q-valued moving average from the first historical value of the additional signal sequence
80. Normalization by subtracting the q-valued moving average from the second historical value of the additional signal sequence
81. Normalization by subtracting the q-valued moving average from the (q-1)th past value of the additional signal sequence
82. Wavelet transform after normalization of the last q values of the additional signal array
83. Transformation coefficients excluding a0 in the wavelet transform after normalization of the last q values of the additional signal array
85. Denormalized value of the residual error prediction at the ANN output, where the normalized wavelet transform coefficients of the additional signal array are entered
86. Reduced predictive value with error prediction due to additional sequence of signals
87. ANN for composite error reduction in case of multi-signal input
88. Non-denormalized prediction or classification value xpc(t+T) or xpc(k+1) at the output of the ANN used in combined error reduction
89. Composite prediction or classification value with reduced error, xfinal_p(t+T) or xfinal_p(k+1)
90. ANN for predicting the error value of an additional signal array (y(t) or y(k)) together with the residual errors from the previous network, for simplified PECNET implementation in multiple input signal arrays
91. Non-denormalized output of ANN that predicts future error value for simplified PECNET implementation in multiple input signal arrays
92. Denormalized value of ANN output predicting future error value for simplified PECNET application in multiple input signal arrays
93. For simplified PECNET application in multiple input signal arrays, the prediction value output with reduced error obtained by subtracting the expected error prediction value from the main prediction value
94. A fusion prediction ANN, in which the prediction, prediction error, and error sequence are applied to the inputs of the future value prediction
95. Denormalized error prediction output in simplified PECNET system example
96. Denormalized primary prediction value obtained by adding the moving average of its input to the first ANN output value
97. Normalization of the data in the low frequency band entered in the first ANN
98. Normalization of the data entered in the second ANN
99. Normalization of the prediction error sequence to be entered into the third ANN
100. Normalization of the additional signal sequence
101. Output value with improved accuracy by subtracting the predictive error value from the initial prediction value in a simplified example PECNET structure
The Detailed Description of the Invention
The subject of the invention is about a training and prediction method and the associated device which increases the performance of next value prediction or classification in one or more dimensional sequential signal arrays compared to the conventional machine learning applications.
In the detailed explanation here, x(k) represents the current value of any spatial or temporal data sequence, x(k+1) represents the next value, and x(k-1) represents the previous value. x(t) represents data sampled with equal periods in time (uniform sampling); x(t-T) represents the value of the data one sampling period "T" earlier. Therefore, for instance, xp(k+1) represents the predicted next value of any data sequence; xp(t+T) represents the predicted value of the uniformly sampled sequence at the next sampling period "T". The expression "avg" in the figures represents the average value and the expression "err" represents the error value.
Referring to Figure 1; in order to implement the aforementioned training and prediction method, there are at least three artificial neural subnetworks (1, 16, 20) to which the inputs are provided and a main decision layer (84) to which the outputs of the above-mentioned artificial neural subnetworks (1, 16, 20) are provided as inputs. Even though the number of neural subnetworks (1, 16, 20) is three in Figure 1, more than three artificial neural subnetworks (1, 16, 20, 84, 87, 94) can be provided. The mentioned artificial neural subnetworks (1, 16, 20) also provide inputs to each other during training. During this training, except for the training method of the first and the last artificial neural subnetworks, the neural networks in between are trained in the same way.
Each of the aforementioned neural subnetworks (1, 16, 20) includes a neural network layer (3, 14, 18) and a feature extraction layer (4, 15, 19).
Additionally, there may also be a transformation layer to transform, preferably to normalize, the data before the input of the neural subnetwork layer (1, 16, 20).
For the neural network layers (3, 14, 18), in addition to using methods such as Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), and Long Short-Term Memory networks (LSTM) for basic machine learning functions such as recognition, prediction, and classification, it is possible to adapt the invention described here to increase the performance of different types of neural networks. Here, a subsequent or future value of the data type is provided over a prediction output trained with the method of the invention, and the output data type acquired here does not need to be of the same type as the input data type. The input can be signal or data sequences.
Firstly, the input signal arrays to be used for the prediction are grouped according to correlation and frequency bands. This selection is made starting from the lowest frequency band of the data with the highest statistical correlation. For ease of understanding of the method in the figures given here, assuming that the next value x(t+T) or x(k+1) of x(t) or x(k) shows the highest correlation with the sequence itself, the moving average (22, 23) and/or a value filtered with a low-pass filter is applied to the primary network (50) after normalization with another moving average (44, 45, 46, 47).
The orthogonality of the currently used input data relative to the correlation of the data sequences which can be used in addition to the main input data type (x(t) or x(k)) (such as y(t) or y(k)) with the target data sequence to be predicted is a determining factor. When the data types determined after this selection are decomposed, starting with low-frequency components and proceeding to high-frequency components via suitable filters, and applied to different artificial neural subnetworks, the overall prediction or classification performance increases. For ease of understanding the structure in Figure 1, the moving average value (24) is used for this purpose in the first artificial neural subnetwork (1) stage. Note that this value is also used in the input value normalization (25, 26, 27, 28) of the second prediction network (51).
In Figure 1, the prediction or classification system is explained with only one type of signal array (2) as input. To obtain the transformed value, the n values consisting of the (n-1) historical values (6, 7, 8) and the current value of the input sequence (2) are summed and divided by the number of samples, and the resulting moving average (24) is used as the low-frequency component (50) in the neural subnetwork (1, 16, 20), preferably of the WNN type.
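For illustration only, the following is a minimal Python sketch of the moving-average decomposition and mean-subtraction normalization described above; the function names, the toy sequence, and the window sizes n and m are assumptions made for the example and are not part of the disclosure.

```python
import numpy as np

def moving_average(x, n):
    """Low-frequency component: mean of the current value and the (n-1) previous values."""
    # 'valid' mode keeps only windows that contain n real samples
    return np.convolve(x, np.ones(n) / n, mode="valid")

def normalize_window(values):
    """Normalize a window by subtracting its own mean; the mean is returned so the
    subnetwork output can later be denormalized by adding it back."""
    avg = values.mean()
    return values - avg, avg

# Example: split a toy sequence into a low-frequency band (n-point moving average)
# and normalize the last m low-frequency values before feature extraction.
x = np.array([10.0, 11.0, 9.5, 12.0, 13.0, 12.5, 14.0, 15.0])
n, m = 3, 4                       # hypothetical window sizes
low_freq = moving_average(x, n)   # corresponds to the moving-average branch
window = low_freq[-m:]            # last m low-frequency values
normalized, avg = normalize_window(window)
```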
Instead of a moving average, filters such as Butterworth and Chebyshev filters, which are widely known in the literature and whose frequency responses are more stable, can be preferred in the application in order to obtain a more regular decomposition of the signal features according to the frequency bands. Unlike this simplified example, when the input signal (2) is decomposed into more than two frequency bands and the highly correlated ones are applied to different artificial neural subnetworks (1, 16, 20), the prediction or classification performance may increase.
In the application in Figure 1, the final m values (24, 38, 39, 40) of the low frequency components (24) which are represented by the moving averages are subtracted from their averages (44, 45, 46, 47) and normalized. In this example application, the normalized values are sent to the first artificial neural network (1), where the features are extracted with discrete wavelet transform (3, 14, 18) in the feature extraction layer (4, 15, 19), especially with the Mallat wavelet transform shown in Figure 3. The features extracted here are applied to the neural network layer (3) of the first neural subnetwork (1).
In the application in Figure 1, as the frequency decomposition, the moving average, the low- pass filter, and the relevant normalization are preferred, since it is suitable for continual signal updating without being dependent on a future value and its structure is simple in terms of expression. It is also possible to use different transformation and normalization types in different applications.
Since subtraction of the average is applied during the normalization, the wavelet transform coefficient representing the average converges to a0 = 0; hence it is not used in the first artificial neural network (1) input in Figure 1. Therefore, the first neural network layer (3) in Figure 1, fed by the feature extraction layer with m inputs (4), preferably the wavelet converter with m inputs (3), has m-1 inputs. This first neural subnetwork (1) is trained directly with the target data (x(t+T), x(k+1)), as in conventional prediction systems, unlike the subsequent artificial neural networks (16, 20).
While the non-denormalized output (5) of the first neural subnetwork (1) is sent directly as an input to the main decision layer, that is, to the fusion artificial neural network (94), the first neural subnetwork (1) output is also transformed into the denormalized primary prediction value (96) by adding back the average of the last m values of the n-valued moving mean of the input signal array.
The first error reduction is performed by taking the difference (70) between this primary prediction value and the error prediction value (17) which comes from the secondary artificial neural subnetwork (16), i.e., the secondary neural network. Therefore, how the secondary, and more broadly each subsequent, artificial neural subnetwork (16, 20) is trained is a crucial detail that differs from conventional prediction systems.
The secondary artificial neural subnetwork (51) in Figure 1, which contains the secondary neural network layer (16), is trained, as shown in Figure 2, with the historical prediction error data (66) of the primary artificial neural subnetwork (50), which contains the primary neural network layer (1).
To do this, in the test run the last n values x(k) ... x(k-(n-1)), including the current value x(k) (25, 26, 27, 28) of the input signal, are applied to the secondary neural subnetwork (51). The input, output, and target label data flow relationship of the artificial neural subnetworks (16, 20) of Figure 1 in the training mode can be seen in Figure 2.
In the training mode, unlike the test mode, the second artificial neural subnetwork (51) input is applied as a data sequence starting from x(k-1), which is one value behind the current value, and extending backwards as far as the window size. If the window size is n, the n past values from x(k-1) to x(k-n) are used. This diagram is drawn for n=4 in Figure 2. Here, the established "prediction error value" of the first artificial neural subnetwork (1) is applied as the target label data (66) of the artificial neural network layer (16) in the secondary artificial neural subnetwork (51). This "prediction error value" is equal to the difference between the last data input value x(k) and the one-sample back-shifted (z^-1) value of the prediction value (96). Here, the prediction value is the value denormalized by adding the average of the last m values of the n-valued moving average of the input signal array to the first neural subnetwork (1) output.
Therefore, in the training mode, unlike the prediction mode, the normalized value of the one-sample-shifted input sequence (x(k-1) ... x(k-n)) is entered into the second artificial neural subnetwork (16) input, while the current error value (66), which is the difference between the back-shifted (z^-1) prediction value of the first artificial neural subnetwork (1) and the current input, is applied as the target training label.
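As an illustrative aid only, the back-shifted construction of one training input window and its error target label for the second subnetwork could be sketched as follows in Python; the array layout, the sign convention for the error (current input minus back-shifted prediction, following the description above), and the helper name are assumptions of this example.

```python
import numpy as np

def second_network_training_pair(x, primary_pred, k, n):
    """Build one (input, target) pair for the error-prediction subnetwork.

    x            : observed sequence x(0..k)
    primary_pred : denormalized predictions of the first subnetwork, where
                   primary_pred[i] is the value that was predicted FOR step i
                   (i.e. produced one step earlier, the z^-1 shifted output)
    k, n         : current index and input window size
    """
    # Input window is shifted back by one sample: x(k-1) ... x(k-n)
    window = x[k - n:k][::-1]
    # Target label: the realized prediction error of the first subnetwork at step k
    target = x[k] - primary_pred[k]
    return window, target

# Toy usage with hypothetical arrays
x = np.array([1.0, 1.2, 1.1, 1.4, 1.5, 1.7])
primary_pred = np.array([np.nan, 1.05, 1.15, 1.2, 1.45, 1.6])  # predicted-for values
inp, err_label = second_network_training_pair(x, primary_pred, k=5, n=4)
```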
With this method, the training can be updated without using subsequent values. An essential benefit of establishing such a training model is that the training can be updated as new data becomes available for predictions about systems which change over time. Usually in supervised artificial neural network (1, 16, 20, 84, 87, 94) learning applications, a certain proportion of the pre-given data set (for example, 70%) is used in training and the remaining part (for example, 30%) is used in the validation of the model. In the method proposed by the present invention, the conventional methods can be applied and also it is possible to constantly update the training as new data comes in. If a continual training update is desired, the cycle consisting of reading input data, calculating the previous prediction error, updating the training, and determining the new predicted output value will need to be repeated.
Therefore, if desired, each time a new value comes in, the prediction value is first obtained by providing the signal flow seen in the test mode in Figure 1, and then the training can be updated by applying the training mode in Figure 2. Continual training contributes to the increase in performance in systems that change over time, such as stock market value prediction.
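Purely as a hedged sketch of the continual update cycle mentioned above (read the new value, compute the previous prediction error, update the training, produce the new prediction), assuming a hypothetical model object exposing predict() and update() methods:

```python
def continual_step(model, history, new_value):
    """One cycle of the continual training update (sketch only).

    'model' is assumed to expose predict(history) and update(history, error)
    methods and a last_prediction attribute; these names are illustrative.
    """
    # 1. Compute the error of the prediction that was made for this step
    prev_error = new_value - model.last_prediction
    # 2. Update the subnetwork training using the back-shifted windows and the
    #    realized error as the target label
    model.update(history, prev_error)
    # 3. Append the new sample and produce the prediction for the next step
    history.append(new_value)
    model.last_prediction = model.predict(history)
    return model.last_prediction
```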
In the prediction mode, the normalized output value of the secondary prediction network, i.e., the secondary neural subnetwork (51), is the prediction of the prediction error of the primary artificial neural subnetwork (50). This value (17) is subtracted from the previous prediction value (96) to obtain the "compensated prediction value" (70). The output of the second artificial neural subnetwork (17) that provides this compensation is also applied as an input to the main decision layer, i.e., the prediction artificial neural network (94). After the training of the neural subnetworks (1, 16, 20) before the main decision layer (94) is completed, the main decision layer (94) is trained directly with the target prediction or classification value.
The third neural subnetwork (20) in Figure 1, i.e., the last neural subnetwork, uses as input the difference between the current value and the one-sample delayed (z^-1) value (9) of the error-corrected prediction values (70), which are corrected using the error predicted by the second artificial neural subnetwork (16). This input can be called the "current residual error value". Here, the error-corrected prediction values are obtained by taking the difference between the primary prediction value and the error prediction value coming from the secondary artificial neural subnetwork. Thus, this difference determines the final prediction error.
Even though three neural subnetworks (1, 16, 20) are shown in Figure 1, it has been previously stated that many more artificial neural subnetworks (1, 16, 20, 94, 84, 87) can be used. Regardless of the number of stages of the structure, the subnetworks before the last neural subnetwork (20) use different frequency components of the input data as input, while the last neural subnetwork uses only the error sequence remaining after the corrections as input. In the structure in Figure 1, the third artificial neural subnetwork (20) is the final error prediction stage. Here, the current residual error value (10), which is the difference between the historical value of the error-compensated prediction value and the current input value (2), and its historical values (11, 12, 13) are obtained and applied to the last, i.e., the third, artificial neural subnetwork (52). Although there are different data normalization possibilities, for ease of demonstration of the method, the technique of subtracting the moving average is used in the last error prediction stage, as in the previous stages. Thus, the residual error data sequence (err(k), err(k-1), err(k-2), ..., err(k-(p-1))) (10, 11, 12, 13) at the last artificial neural network (52) input is normalized with this method (32, 33, 34, 35).
In the training mode shown in Figure 2, the final error correction artificial neural network (52) differs from the prediction (validation) mode: p values of the residual error, from the penultimate value err(k-1) back to err(k-p), are applied. The example in this figure shows the p=4 case. During the training, err(k) (10), which is the last error value of the last artificial neural network (52), is applied as the target label value. Thus, when the last artificial neural network (52) works in prediction (validation) mode, it predicts err(k+1) or err(k+T) (21), corresponding to the next expected error value, when the error sequence ending with err(k), the last current residual error value, is applied to its input; this value is denormalized (69) and applied to the decision layer (94).
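A small Python sketch of how the residual error window for the termination subnetwork could be assembled and normalized by mean subtraction, following the description above; the array names and the convention that compensated_pred[i] holds the error-compensated prediction made for step i are assumptions of this example.

```python
import numpy as np

def residual_error_window(x, compensated_pred, k, p, training=True):
    """Build the residual error sequence fed to the error-termination subnetwork.

    x                : observed input sequence
    compensated_pred : error-compensated predictions, compensated_pred[i] being
                       the value predicted FOR step i (one-sample delayed output)
    k, p             : current index and window length
    In training mode the window runs from err(k-1) back to err(k-p) and err(k)
    is the target label; in prediction mode the window ends at err(k).
    """
    err = x - compensated_pred                      # residual error sequence err(i)
    if training:
        window, target = err[k - p:k][::-1], err[k]
    else:
        window, target = err[k - p + 1:k + 1][::-1], None
    avg = window.mean()                             # mean used for (de)normalization
    return window - avg, avg, target
```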
In the configuration in Figure 1, the non-denormalized output value (5) of the first artificial neural network (1) where the low frequency component values are entered, the denormalized output (17) of the second neural network (16) where the error of the first prediction output is predicted, and the denormalized output value (69) of the last artificial neural network (20) from which the residual error of all is predicted, are all inputted to the main decision layer (94). Thus, the performance of composite prediction or classification is increased by applying the orthogonality-enhanced features (5, 17, 69) obtained by predicting the current residual error values in sequential networks to the inputs of the main decision layer (94).
In the training mode (Figure 2), the last input value x(k) (2) is applied as the target label value, while the previous values of the data (5, 17, 69) coming from the neural subnetworks (1, 16, 20) are entered into the input of the main decision layer (94). In other words, for example, while in the prediction mode in Figure 1 xp(k+1) (5) is applied to the main decision layer (D), in the training update its previous value (z^-1), xp(k), is sent to the input of the fusion network (94). In the prediction mode, when the signal array ending with the current data x(k) (2) is applied, the prediction value xfinal_p(k+1) or xfinal_p(t+T) (49), whose error is reduced by denormalizing xf_p(k+1) or xf_p(t+T), is obtained at the output (48) of the main decision layer (94). In the example in Figure 1, since normalization is performed by subtracting the moving average, for ease of understanding the working principle, the fusion network output is denormalized with the mean at the output to obtain the future value xfinal_p(k+1) or xfinal_p(t+T) (49) as the main prediction.
In Figure 1, an artificial neural network can be used as the main decision layer (94). Alternatively, it is also possible to obtain an improved prediction value from the primary prediction (5), the prediction of the prediction error (17) and the residual error prediction (69) with a signed aggregation (subtraction) layer used directly as the main decision layer (94). In fact, in the simplified examples in Figure 4 and Figure 6, the signed aggregation (subtraction) method is shown instead of the main decision layer (94). It should also be noted that the applications in Figures 4 and 6 can also use an artificial neural network as the decision layer (94). However, it is observed that the use of a separate aggregation network (87, 94) instead of the aggregation process increases the prediction performance in commonly encountered signal array examples.
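As a sketch of the simpler, signed-aggregation alternative to the fusion network mentioned above (in the style of Figures 4 and 6), with purely illustrative names:

```python
def aggregated_prediction(primary_pred, err_pred, residual_err_pred):
    """Signed aggregation used in place of a fusion ANN: the composite prediction is
    the denormalized primary prediction with the successive (denormalized) error
    predictions subtracted from it."""
    return primary_pred - err_pred - residual_err_pred

# Example usage with hypothetical denormalized values
composite = aggregated_prediction(102.3, 1.1, 0.2)  # -> 101.0
```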
The configuration in Figure 1 is applied to predict the next-day values based on the daily closing values of 4 stocks in the New York Stock Exchange Nasdaq index, and the results obtained at each stage are compared with ARIMA (Auto Regressive Integrated Moving Average), which is widely used in the literature. The values of Apple (AAPL), Bank of America (BAC), Micron (MU) and Ford (F) stocks for 3147 stock market working days between 24.09.2007 and 02.04.2020 are used for the comparison between PECNET and ARIMA, and the results are shown in Table 1. Continual learning mode is not used in this comparison. Instead, the first 70% of the data is used for training and the remaining 30% is used for testing (validation), as is common in prediction and classification with conventional machine learning. First of all, it should be noted that the performance of prediction or classification in time-varying systems such as the stock market increases when the continual training update of the present invention is applied. Nevertheless, the prediction error at the output of the fourth level, the main decision layer (49) of the configuration in Figure 1, is significantly lower than ARIMA for all compared stock values. In Table 1, the RMSE (Root Mean Square Error) values, which are the effective values of the prediction errors, are given. As can be seen there, starting from the denormalized value (96) of the first artificial neural subnetwork (1) output of the configuration in Figure 1, the error (RMSE) decreases continuously as the level progresses, and it reaches its lowest level at the output of the decision layer (49). In this table, the second-level value is the value of the first-level prediction output corrected by subtracting the denormalized value of the second artificial neural subnetwork (51) error prediction output (70). Although the third-level prediction value is not included as a value in the diagram in Figure 1, the denormalized value (69) of the third neural subnetwork (53) output subtracted from the corrected prediction value (70) at the previous level is given in order to show the reduction success in the comparison table, because the denormalized value (69) of the third artificial neural subnetwork (53) output (21) sends the prediction of the residual prediction error, not the prediction itself, to the main decision layer (87, 94). As can be seen here, for instance, the effective error (RMSE) in predicting the next closing value of Bank of America shares from the previous closing values with ARIMA is 0.4853, while at the output of the present invention it is 0.2646. In the PECNET application of this comparison, the system is operated with m=4, n=3 and p=3, as shown in Figure 1. All artificial neural subnetworks (1, 16, 20) are identical and have 4 inputs, 1 output and 4 hidden layers in the form of 4x32x32x32x16x1. In the training mode in Figure 2, the moving average values subjected to the normalization (97) are taken as five-point averages (avg(x(k-5) ... x(k-1)), etc.) and the last 4 of them are inputted into the first artificial neural subnetwork (50) in the sample application. Therefore, as in this moving average filter example, the filter order and the number of most recent values taken from this filter do not have to be equal to each other.
Table 1 - Comparison of the RMSE (Root Mean Square Error) performance in daily value prediction of 4 stocks in Nasdaq, according to the stages of the PECNET configuration depicted in Figure 1, with ARIMA (Auto Regressive Integrated Moving Average).

Compared to network types such as LSTM (Long Short-Term Memory) and CNN (Convolutional Neural Network), the training and prediction system of the present invention generally provides higher performance (i.e., lower error) in future value prediction or classification of data sequences. In the artificial neural networks (1, 16, 20, 94, 84, 87) of the present invention, the fully connected network model shown in Figure 8 is used for ease of expression. This also applies to the comparisons in Table 1.
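For reference, the RMSE metric reported in Table 1 and discussed above corresponds to the standard root-mean-square error; a short NumPy helper (illustrative only, not part of the disclosed system) is:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```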
Network types such as LSTM and CNN can be used in place of the fully connected network in the system of the present invention. The artificial neural subnetworks (1, 16, 20) of the present invention make it possible to increase the combined prediction or classification performance by decomposing the error residues and orthogonal features. In fact, the time-delayed wavelet transformation neural subnetwork (1, 16, 20) model (TDWNN), which is used at different levels in the present invention and seen in Figure 7, can be found in the literature in different configurations, especially in time series prediction applications. When the TDWNN model is used alone for the stock forecast data in Table 1, none of its configurations can approach the level of performance provided by its multiple use in the system of the present invention.
The performance of the present invention in Table 1 can be increased by using other data of the stock whose closing value is predicted for the next day, as shown in Figure 5. For this purpose, economic and financial data pertaining to the stock or the market, such as transaction volume, can be inputted. Inputting these data reveals another advantage of the present invention. The risk of overfitting does not increase as new values are added to a limited data set, as observed in common forecasting models, because the new incoming data are mostly entered into independent subnetworks and trained with the residual error value. As a matter of fact, if a direct entry is made to a conventional network by using 4 values each of the daily and weekly changes of 5 kinds of parameters, such as trading volume and price/profit ratio, for the 3147 daily data sets here, the total number of inputs will be 5x(4+4)=40. The problem of overfitting would be observed in training a single network with 40 inputs on the 3147-day data. The present invention provides a critical advantage at this point.
The present invention can be extended by sampling a single data type with different window sizes, or by adding data types as well as features in different frequency bands. As in Figure 5, additional data types such as y(k) or y(t) (71) can be used for sequential error reduction. In this case, the degree of the pattern relationship of the data (71) to be added as a new input type with the residual error sequence of the previous network, instead of with the target value to be predicted (xp(k+1)), determines its contribution to the improvement of the prediction value. Instead of the correlation of the components of the same data in different frequency bands, or of different types of data, with the data to be predicted, maximizing the correlation with the residual error from the previous networks enables the selection of orthogonal data features, which provides a fundamental superiority in machine learning. Overfitting is prevented by selecting new data (71) with this orthogonality condition and finalizing with a main decision layer (87, 94) selected as an aggregation or artificial neural network.
Another direction of expansion of the present invention is that it is also possible to use more than one such system sequentially. In this case, as in Figure 5, a structure with at least two inputs can use the output of another structure, or of more than one structure, as an input sequence. For example, in the next-day value prediction of a stock in the stock market, the future value predicted by another structure can be used instead of the past value of the index, in addition to input data such as the past values and trading volume of the stock in question. Thus, a large number of parameters and historical values can be used in multiple neural subnetworks with fewer inputs (1, 16, 20, 94, 84, 87) or in different structures within the network, and the likelihood of overfitting in the composite prediction output is reduced despite the limitation of the amount of data in the training set.
Considering the embodiment of the present invention described here, the WTDNN structure in Figure 7 is used as a common building block in the stages before the main decision layer (87, 94). The configuration in Figure 1 uses 3 WTDNN artificial neural subnetworks (50, 51, 52) and the main decision layer (94), while the configuration in Figure 5 uses 4 WTDNN artificial neural subnetworks and the main decision layer (87). In Figure 7, a continuous signal in time x(t) can be discretized with period T in the form t=kT, or a discrete signal array (2, 59) of the form x(k), k=0,1,2,..., can be applied directly to the model. A total of n values, including the current value of the time-discrete signal array x(k) and its (n-1) past values (6, 7, 8), are normalized and inputted into the wavelet converter (14) feature extraction layer (15).
The common use of this model is to predict the next value of the sequence, or the value a certain number of samples ahead, using historical values, or to determine a class associated with the historical values. Identifying which sound it is from the change of a sound signal can also be given as an example of classification with the same structure. In this structure, in addition to x(k), multiple types of signal arrays and their historical values, such as y(k) and z(k), can be applied, which will increase the accuracy of target data prediction or classification. In fact, the second of the two WTDNN neural subnetworks used in the configuration in Figure 6 feeds the wavelet transform feature extraction blocks (18, 82) of two separate data sets into a common artificial neural subnetwork (90).
A frequent problem that arises in time series prediction or data series classification with conventional artificial neural subnetwork (16) structures, including the WTDNN in Figure 7, is that as the number and type of historical values of the data increase, the number of artificial neural subnetwork (16) inputs also increases. The increase in the number of inputs also increases the number of data sets required to train the artificial neural subnetwork (16). In particular, if the non-linearity between the input signal data sequences (2) and the output (54) to be learned is high, and/or the system that is the source of the data has a time-varying character, underfitting or overfitting problems may arise due to an insufficient training data set. A high prediction or classification performance between the dataset and the output during the training of the system, together with a significant decrease in performance on a test data set not used in training, is a result of the overfitting problem. In the case of underfitting, the prediction performance on the training set also decreases. For example, the closing value of a stock in the stock market on the next day may depend on historical values at different frequencies, such as the daily change in the current week or the change over the past weeks or months, or it may be in an antecedent (time-shifted) signal relationship with the change in value of another stock. In the conventional neural subnetwork (16) model in Figure 7, wavelet transform (14) for feature extraction is one of the methods used in many applications before the data is applied to the neural subnetwork (16) for prediction or classification. Although it is not necessary to use this transformation, the wavelet transform generally increases the overall performance in evaluating pattern signals related to physical systems as time series. Therefore, comparative explanations of the conventional artificial neural subnetwork (16) prediction model and the present invention are made here on schematics using the wavelet transform.
Normalization in the prediction scheme with the conventional artificial neural network (16) in Figure 7 is done by subtracting (25, 26, 27, 28) the moving average values from the input values (x(k), x(k-1), ..., x(k-(n-1))). At the output of the artificial neural network, the next value for the period T, xp(t+T) or xp(k+1), is summed with the input average, and the denormalized prediction value is obtained. Here, it is also possible to apply the input signal and its historical values directly to the artificial neural network (16) without the wavelet transform. In such networks, it is important that the amount of input data used in the artificial neural network (16) covers the changes to be learned.
The artificial neural network system seen in Figure 1 has 4 fundamental differences compared to the conventional single-network data entry systems seen in Figure 7.
The first difference is the way in which the values of the same type of data in different frequency bands are entered separately into the prediction or classification system. When the conventional method is used, for example, the daily values of a stock for the last 4 days and its average weekly values for the last 4 weeks are used together in a single artificial neural network with a total of 8 inputs. In the application of the present invention given in Figure 1, low-frequency components of the signal (24, 38, 39, 40), such as the weekly average values, are entered into one artificial neural subnetwork (1) after feature extraction, while a second artificial neural subnetwork (16) predicts the prediction error of the former from the higher-frequency current data (2, 6, 7, 8). The purpose of doing this is to decompose the values of the same signal array in different frequency ranges or past time windows into features that are orthogonal with respect to the future value prediction or class relationship. Otherwise, components that are highly correlated with each other increase the number of inputs even though they do not contribute, so the total number of internal neurons that require training also increases.
The structure used as the second-level artificial neural subnetwork (16) in Figure 1 is shown with the same numbering as the artificial neural subnetwork (16) in Figure 7 in terms of the input signal array. Note that in the present invention, the denormalized output (96) of the neural subnetwork (1) using the low-frequency component and the denormalized (17) difference (70) of the artificial neural subnetwork using the direct input signal are used in the remaining error compensation as the prediction value with reduced error; however, instead of the improved prediction value, the value that improves it (17) is sent to the fusion network (94) as input. The prediction value (70) whose error has been compensated is used in the artificial neural subnetwork (52), but for this to be usable it is obtained by applying a unit delay with the z^-1 operator (9). This is because, in order to make a sequential error prediction, it is necessary to use the previous value (9), which predicted the current value (2) at the input, not the latest predicted value. For this, the difference between the previous error-compensated prediction value (9) and the current input value (2) is taken and applied to the next error correction network (52) as error sequence values (10, 11, 12, 13) that change over time. The output (21) of the artificial neural subnetwork (84) in this correction network (52) is denormalized by adding the input average, and the residual error prediction value (69) is sent to a separate decision layer (94). Alternatively, subtracting this output, after the first compensation, from the prediction value (70) is an alternative application. In fact, the third-level performance of the present invention is shown in this manner in the example in Table 1.
Another significant difference between the conventional prediction or classification method and the present invention is in the way target label data is used in training. Although the second neural subnetwork (16) of the present invention in Figure 1 and the neural subnetwork (16) of the conventional method in Figure 7 are structurally the same in terms of how the input data is applied, the second artificial neural subnetwork (16) in Figure 1 uses the residual error after the prediction of the first artificial neural subnetwork (1), as seen in Figure 2, in training. In Figure 2, the input, output, and target label data relations in the training mode of the artificial neural subnetworks can be seen when 4 historical values of the present invention are used. Here, feature extraction with wavelet transform is shown as a block (50, 51, 52) labeled WNN. In the training mode scheme of the present invention given in Figure 2, while the array of moving averages taken starting from x(k-1), the one-previous value of the first-level neural subnetwork (50) input signal array x(k) (2), is normalized (97) and used (4) at the input, the network is trained with the last value x(k) as the target label data. In the example drawing here, 4 values of the one-sample-shifted (z^-1) 5-point moving average of the form avg[x(k-1), x(k-2), x(k-3), x(k-4), x(k-5)] are used at the WNN input. The single output of the normalization block (97) (43) shows the average value used in the normalization, which is also used in the denormalization of the output. The multiple output shows the normalized sequence of values. A similar notation applies to the normalization block (98) of the second-level network.
In the training update of the second artificial neural subnetwork (51), the error data (66) obtained from the difference between the one-sample back-shifted prediction output (96) of the first artificial neural subnetwork (50) and the current input value (2) is used as the target label data. During the training, input signal values in the form x(k-1), ..., x(k-4), excluding x(k), are used for the second artificial neural subnetwork (51) input. Thus, the past values of the input are used to predict the current prediction error. Another difference of the present invention from conventional prediction systems is the presence of a termination network that reduces the composite residual error remaining after the error reduction networks, using its own variation of the error. The historical value sequence starting with err(k-1), the penultimate value of err(k) (10), which is the difference between the current input value (2) and the previous sample value (9) of the prediction value (70) compensated at the second level in the training mode, is applied to the input of the termination network, i.e., the third and last artificial neural subnetwork (52). Meanwhile, err(k) is entered as the target label value for the third artificial neural subnetwork (52) training. Note that when switching from training mode to prediction mode, err(k) (10) is included in the third artificial neural subnetwork (52) input. Thus, the err(k+1) value in prediction mode is determined by using err(k) and its past values.
While the main decision layer (94) used for merging in Figure 1 is trained with historical values of the input data in the training mode in Figure 2, the current input value x(k) (2) is used as the target label value.
The third artificial neural subnetwork (52) used for error termination in Figure 1 has a second use case, as in Figure 4, where the aggregation layer is used as the main decision layer for error reduction. In cases where small performance differences depending on the data type are not critical, and a system with a simpler structure is desired, the composite prediction error can be reduced by subtracting the error prediction value (95) directly from the denormalized output value (54) of the previous artificial neural subnetwork (16). In the embodiment in Figure 4, the input signal array x(k) is normalized and entered directly into a single artificial neural subnetwork (16) without dividing it into frequency groups or different time intervals. The neural network layer (20) used as the second neural subnetwork, without using any additional data, and the wavelet converter (18) before it correspond to the third neural subnetwork (52) in Figure 1.
There are many types of wavelet transformations in the literature. The method shown and described in Figure 3 is the Mallat transformation model used in the exemplary application of the invention. In the model in Figure 3, as an example, it is seen that the transformation dataset consisting of 4 values, a0, a1, a2 and a3, is obtained from a total of 4 values: the current value of the x(k) signal array (2) and its 3 past values (x(k-1), x(k-2), x(k-3)). In the example here, the sample values (57) of a time series signal at the selected scale are applied sequentially as the converter input sequence (2).
First-level mean values (60) are obtained from the sum of the first two values (58) and the sum of the third and fourth values. The a0 (61) value is obtained from their mean and the a1 value from their difference. The first-level details, obtained from the difference of the third and fourth values (59) and the difference of the first and second values, form the transformation coefficients a3 and a2, respectively. The transformation described here corresponds to the use of Haar, as shown in Figure 3, as the mother wavelet function. In the PECNET system, the transform layer functions as feature extraction, and as an alternative to the wavelet transform, Fourier and Hilbert transforms can be preferred according to the domain where the characteristic features of the data are dominant.
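For illustration, the two-level Haar (Mallat-style) transform of a four-sample window described above can be sketched as follows in Python; the plain average/difference scaling (rather than orthonormal 1/sqrt(2) factors) is an assumption matching the description of Figure 3, and the names are illustrative.

```python
import numpy as np

def haar_two_level(v):
    """Two-level Haar (Mallat-style) transform of a four-sample window,
    following the description of Figure 3; scaling conventions vary, and plain
    averages/differences are used here."""
    v = np.asarray(v, dtype=float)
    assert v.size == 4
    # First-level approximations (pairwise means) and details (pairwise differences)
    s1, s2 = (v[0] + v[1]) / 2.0, (v[2] + v[3]) / 2.0
    a2, a3 = (v[0] - v[1]) / 2.0, (v[2] - v[3]) / 2.0
    # Second level: overall approximation a0 and first-level difference a1
    a0, a1 = (s1 + s2) / 2.0, (s1 - s2) / 2.0
    return np.array([a0, a1, a2, a3])

# After mean-subtraction normalization a0 converges to 0, which is why it is
# omitted from the subnetwork inputs in Figure 1.
coeffs = haar_two_level([1.0, 2.0, 3.0, 4.0])
```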
Based on the detailed information above, the present invention is a data training and/or prediction or classification method that predicts or classifies the future or next value of signals: features are extracted, by transformation (3, 14, 16), from the current value (2) and historical values (6, 7, 8) of the signal or data at the input and from their value sequences (24, 38, 39, 40) allocated to at least one different frequency band, and are applied to multiple artificial neural subnetworks (16, 20, 94, 84, 87). Starting from the signals allocated to the low-frequency band (24, 38, 39, 40), the first artificial neural subnetwork (16) directly predicts the target data (5), while, from the second one onwards, the artificial neural subnetworks (16, 20) predict the error of the previous one (17) for compensation (70); the error value (10) remaining from the difference between a historical value (9) of the prediction signal (70), whose error is reduced at each step, and the current value (2), together with its past values (10, 11, 12), is used as input to the prediction network (62) that infers the next prediction error correction value (69). The prediction value of the first network and the error correction prediction values of the second and subsequent networks are fused, based on continuous error reduction, by a main decision layer (87, 94) trained with the target data, to obtain the output prediction or classification value (49).
The present invention fundamentally comprises the following steps: the application of feature extraction by transformation, after normalization, to the value sequences of the signal, data sequence or pattern at the input and to the additional data correlated with the target; sending the normalized output value (96) of the first artificial neural subnetwork (1) to the first-level correction network array as a raw prediction value; determination of the composite prediction value (70) of the corrected data by the subsequent neural subnetworks (16), which predict the error in the prediction value of the previous artificial neural network (1), by subtracting the "prediction error" prediction value (17) from the first denormalized prediction value (96); generation (86) by the prediction subnetworks of a new prediction value that is constantly corrected, by similarly predicting (85), with subsequent subnetworks (84), the error of this value and subtracting it from the composite prediction value (70) remaining from the previous network; and, starting from the second artificial neural network (11), in the training of the subnetworks, shifting the input data sequences (2) back one by one and using the difference (66) between the previous prediction value of the previous network and the actual input value as the target label.
The present application also includes a device for conducting the aforementioned methods. Said device contains a processing unit configured to execute the above given processing steps and a memory unit that stores the instructions/commands describing the above steps or combination of these steps and can transmit these instructions/commands to the processor for execution.
The above-mentioned memory unit may also be configured to store previous or input data from an external source. In addition, a separate memory unit can be provided to store previous or input data.
The aforementioned device may also include a communication unit for externally receiving data and/or transferring the predictions provided over the received data to another external device. The said communication unit can be provided in a wireless or wired way, but it can also be provided in a system in a way that directly associates the device with the system.
The present application also includes a device comprising elements for executing the method of claims 1-15 and/or 16-31 when executed by a processing unit, a program containing instructions for performing the method of claims 1-12 and/or 13-25 when executed by a processing unit, and a computer readable medium containing the above-mentioned program.

Claims

1. A method for training a computer-based prediction system, comprising: at least three neural subnetworks (1, 16, 20) and a main decision layer (87, 94) based on a predictive error reduction network that takes a signal or data as input and predicts or classifies its next value; the application of feature extraction by transformation, after normalization, to the value sequences of the signal, data sequence or pattern at the input and to the additional data correlated with the target; sending the normalized output value (96) of the first artificial neural subnetwork (1) to the first-level correction network sequence as a raw prediction value; determining the data corrected by subtracting the "prediction error" prediction value (17) from the first denormalized prediction value (96), by predicting the error in the prediction value of the previous neural network (1) at the subsequent neural subnetworks (16), as the composite prediction value (70); predicting the error of this value similarly (85) by the subsequent subnetworks (84) and subtracting it from the composite prediction value (70) remaining from the previous network, so that the prediction subnetworks generate a new prediction value that is constantly corrected (86); and using the difference (66) between the previous prediction value of the previous network and the actual input value as a target label by shifting the input data sequences (2) back one by one in the training of the subnetworks, starting from the second artificial neural network (11).
2. Being a method appropriate to Claim 1, applying the error prediction outputs (5, 17, 69) of the first prediction neural network, whose error value continues to be reduced at the first-level subnetworks (16, 84, 20), and of the networks that predict its residual error values, to the input of a distinct neural network provided for fusion purposes (94); training this last-level network directly with the target prediction or classification label value; and denormalizing the output to obtain it as either a high-performance prediction or a classification value.
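As an illustration of the fusion step recited in Claim 2, the sketch below concatenates the subnetwork outputs into one feature vector and trains a final regressor directly on the target label. It is a hedged sketch under the same assumptions as the earlier code block (scikit-learn MLP stand-ins, hypothetical names), not the claimed decision layer.

```python
# Hedged sketch of the fusion / main decision layer: the outputs of the prediction subnetwork
# and the error-prediction subnetworks are stacked as features of one final network that is
# trained directly on the target label (illustrative stand-in, not the claimed implementation).
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_fusion_layer(subnets, X, y):
    # One column per subnetwork output: raw prediction plus each error-prediction output.
    fusion_features = np.column_stack([net.predict(X) for net in subnets])
    decision = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    decision.fit(fusion_features, y)      # trained directly with the target label
    return decision

def fused_predict(subnets, decision, window):
    window = np.asarray(window, dtype=float).reshape(1, -1)
    features = np.column_stack([net.predict(window) for net in subnets])
    return float(decision.predict(features)[0])
```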
3. Being a method appropriate to Claim 1, reducing the error (101) by predicting the next value of the denormalized value of the output (21) and subtracting the future error prediction value (95) obtained from the error sequence of the composite prediction value remaining from the previous networks; using, as the input data of the last of the first-level neural subnetworks (20), the value change data sequence (10, 11, 12, 13) formed by the difference between the previous value (9) of the error-compensated prediction data (70) and the current input (2); and, for this purpose, during the training of the last subnet (20), applying the input data sequence shifted back by one (19) and using the last error value (10) as the target prediction label.
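One possible reading of the value change data sequence in Claim 3 is sketched below as a first-difference construction; the exact pairing of prediction and input values is an assumption made for illustration.

```python
# Hedged sketch: building a value-change (first-difference) data sequence as the input of the
# last first-level subnetwork; the pairing of prediction and input values is an assumption.
import numpy as np

def value_change_sequence(compensated_predictions, inputs):
    """Differences between the previous error-compensated prediction and the current input."""
    compensated_predictions = np.asarray(compensated_predictions, dtype=float)
    inputs = np.asarray(inputs, dtype=float)
    return inputs[1:] - compensated_predictions[:-1]
```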
4. Being a method appropriate to Claim 1, sending the non-denormalized output (5) of the first neural subnetwork (1) to a main decision layer (84) as input.
5. Being a method appropriate to Claim 1, applying feature extraction by decomposing the signal or data sequence or pattern at the input into at least two different frequency bands and normalizing the value sequences of the mentioned frequency bands and the additional data correlated with the target.
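For the frequency-band decomposition in Claim 5 (and the wavelet transform option in Claim 10), one possible realization is a discrete wavelet decomposition followed by per-band normalization, as sketched below. The use of PyWavelets, the 'db2' wavelet, the decomposition level, and the z-score normalization are assumptions made for the illustration, not specified by the application.

```python
# Hedged sketch: decompose an input sequence into frequency bands with a discrete wavelet
# transform and normalize each band; wavelet family, level and normalization are assumptions.
import numpy as np
import pywt

def band_features(series, wavelet="db2", level=2):
    series = np.asarray(series, dtype=float)
    bands = pywt.wavedec(series, wavelet, level=level)   # [approximation, detail_L, ..., detail_1]
    normalized = []
    for band in bands:
        std = band.std() or 1.0
        normalized.append((band - band.mean()) / std)     # simple z-score per band
    return normalized
```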
6. Being a method appropriate to Claim 1, in the case that the input contains multiple types of signals, using an artificial neural subnetwork (16, 20) provided sequentially after the mentioned artificial neural subnetwork (1) to compensate for the output signal error of the first artificial neural subnetwork (1), and giving the output of the mentioned sequential artificial neural subnetwork (16, 20) to the main decision layer (84, 97) as input.
7. Being a method appropriate to Claim 6, transforming the output of the above-mentioned sequential artificial neural subnetwork (16) and giving it as input to the main decision layer (84).
8. Being a method appropriate to Claim 6 or Claim 7, wherein the above-mentioned input signal array contains multiple samples taken from various window sizes and time offsets.
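The multi-window sampling referred to in Claim 8 can be pictured as below: the same series is sampled with several window sizes and time offsets and the samples are concatenated into one input vector. The concrete sizes and offsets here are arbitrary examples, not values taken from the application.

```python
# Hedged sketch: collect samples of one series at several window sizes and time offsets and
# concatenate them into a single input vector; the concrete sizes/offsets are arbitrary examples.
import numpy as np

def multi_window_input(series, t, specs=((4, 0), (8, 0), (8, 8))):
    """specs is a sequence of (window_size, time_offset) pairs; each window ends at index t - offset."""
    series = np.asarray(series, dtype=float)
    pieces = [series[t - offset - size:t - offset] for size, offset in specs]
    return np.concatenate(pieces)
```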
9. Being a method appropriate to Claim 1, in the case of continuous input, continuing the training by applying the input values of the artificial neural subnetworks (1, 16, 20) shifted back to the previous value and using the current values as label data.
10. Being a method appropriate to Claim 1, using wavelet transformation as a method of feature extraction.
11. Being a method appropriate to Claim 1, using, together and with separate feature extractions, inputs that have a non-linear contextual relationship with each other and/or with the current error residue.
12. Being a method appropriate to any of the preceding Claims, wherein the previously mentioned transformation process is performed as a normalization process.
13. Being a method appropriate to Claim 11 or Claim 12, conducting the transformation via the continuous moving average method.
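A simple reading of the continuous moving average transformation in Claim 13 is sketched below as a running-mean normalization; the window length and the ratio form are assumptions made for the illustration.

```python
# Hedged sketch: transform a series by a continuous (running) moving average and express each
# value relative to that average; window length and the ratio form are illustrative assumptions.
import numpy as np

def moving_average_transform(series, window=16):
    series = np.asarray(series, dtype=float)
    kernel = np.ones(window) / window
    running_mean = np.convolve(series, kernel, mode="valid")
    aligned = series[window - 1:]                 # values aligned with the running mean
    return aligned / np.where(running_mean == 0.0, 1.0, running_mean)
```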
14. Being a method appropriate to Claim 1, the previously mentioned main decision layer (87, 94) is an artificial neural network.
15. Being a method appropriate to Claim 1, the above-mentioned main layer (87, 94) is provided in such a way as to aggregate and concatenate the prediction values obtained from artificial neural subnetworks and the current error residual.
16. Being a method for training a computer-based prediction system, comprising; at least three neural subnetworks (1, 16, 20) and a main decision layer (87, 94) based on a predictive error reduction network that takes a signal or data as input and predicts or classifies its next value,
the application of feature extraction by transformation, after normalization, to the value sequences of the signal or data sequence or pattern at the input of an artificial neural network and to the additional data correlated with the target; sending the normalized output value (96) of the first artificial neural subnetwork (1) to the first-level correction network array as a raw prediction value; determining the composite prediction value (70) of the corrected data by predicting, at the next neural subnetworks (16), the error in the prediction value of the previous artificial neural network (1) and subtracting the “prediction error” prediction value (17) from the first denormalized prediction value (96); predicting the error of this value similarly (85) by the next subnets (84), so that the prediction subnetworks generate a continuously corrected new prediction value by subtracting this predicted error from the composite prediction value (70) remaining from the previous network (86); starting from the second artificial neural network (11), training the subnets by shifting the input data sequences (2) back one by one and using the difference (66) between the previous network's prediction value and the actual input value as the target label; and predicting the next value of the signal or data input at the output of the main decision layer (87, 94) in response to the current signal or data input.
17. Being a method appropriate to Claim 16, further reducing, in the first-level subnets (16, 84, 20), the error value of the first prediction neural network and applying the error prediction outputs (5, 17, 69) of the networks that predict the remaining error values to the input of a separate artificial neural network (94) for fusion purposes; and training this last-level network directly with the target prediction or classification label value to obtain the denormalized value (49) of the output (48) as a high-performance prediction or classification value.
18. Being a method appropriate to Claim 16, reducing the error (101) by predicting the next value of the denormalized value of the output (21) and subtracting the future error prediction value (95) obtained from the error sequence of the composite prediction value remaining from the previous networks; using, as the input data of the last of the first-level neural subnetworks (20), the value change data series (10, 11, 12, 13) formed by the difference between the previous value (9) of the error-compensated prediction data (70) and the current input (2); and, for this purpose, during the training of the last subnet (20), applying the input data sequence with a backward shift (19) and using the last error value (10) as the target prediction label.
19. Being a method appropriate to Claim 16, sending the non-denormalized output (5) of the first neural subnetwork (1) as an input to a main decision layer (84).
20. Being a method appropriate to Claim 16, decomposing the signal or data sequence or pattern at the input into at least two different frequency bands and applying feature extraction by transformation after normalization of the value sequences of the mentioned frequency bands and of the additional data correlated with the target.
21. Being a method appropriate to Claim 16, in the case that the input contains multiple types of signals, using another artificial neural subnetwork (16, 20) provided sequentially after the mentioned artificial neural subnetwork (1) to compensate for the output signal error of the first artificial neural subnetwork (1), and sending the output of said sequential artificial neural subnetwork (16, 20) to the main decision layer (84, 97) as input.
22. Being a method appropriate to Claim 21, transforming the output of said sequential artificial neural subnetwork (16, 20) and giving it as input to the main decision layer.
23. Being a method appropriate to Claim 21 or Claim 22, wherein the previously mentioned input signal array contains values taken from multiple sampling window sizes and time offsets.
24. Being a method appropriate to Claim 16, in the case of continuous input, continuing the training by applying the input values of the artificial neural subnetworks (1, 16, 20) shifted back to the previous value and using the current values as label data.
25. Being a method appropriate to Claim 16, using the wavelet transform as a feature extraction method.
26. Being a method appropriate to Claim 16, using, together and with separate feature extractions, inputs that have non-linear contextual relations with each other and/or with the current error residue.
27. Being a method appropriate to any Claim between Claim 16 and Claim 26, wherein the previously mentioned transformation process is performed as a normalization process.
28. Being a method appropriate to Claim 26 or Claim 27, conducting the transformation process via the continuous moving average method.
29. Being a method appropriate to Claim 16, wherein the above-mentioned main decision layer (84, 97) is an artificial neural network.
30. Being a method appropriate to Claim 16, the aforementioned main layer (84, 97) has been provided in such a way as to collect and combine the current error residual and the prediction values obtained from the artificial neural subnetworks (1, 16, 20).
31. Being a method appropriate to Claim 16, wherein the previously mentioned input is stock values and the predicted output is the next value of the stock.
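As a usage illustration for Claim 31, the earlier sketches could be chained on a stock price series as follows, reusing the illustrative helpers defined above (make_windows, train_cascade, train_fusion_layer, fused_predict). The series, window width, and hyperparameters are synthetic placeholders, and this is not a statement about the performance of the claimed system.

```python
# Hedged end-to-end usage example on a toy "stock" series, reusing the earlier illustrative
# helpers; all values are synthetic and the setup is an assumption, not the claimed system.
import numpy as np

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0.1, 1.0, size=500)) + 100.0   # synthetic price-like series

width = 8
X, y = make_windows(prices, width)
subnets = train_cascade(prices, width=width, n_subnets=3)
decision = train_fusion_layer(subnets, X, y)

last_window = prices[-width:]
next_value = fused_predict(subnets, decision, last_window)
print(f"Predicted next value: {next_value:.2f}")
```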
32. Being a device, which is configured to train a computer-based prediction system, comprising;
a processing unit and a memory unit that stores the instructions to be executed by the processing unit in order to execute a predictive error reduction network-based prediction method that takes a signal or data as input and predicts or classifies its next value,
wherein, according to the instructions in the memory unit, the mentioned processing unit is configured for:
the application of feature extraction by transformation, after normalization, to the value sequences of the signal or data sequence or pattern at the input of an artificial neural network and to the additional data correlated with the target; sending the normalized output value (96) of the first artificial neural subnetwork (1) to the first-level correction network array as a raw prediction value; determining the composite prediction value (70) of the corrected data by predicting, at the next neural subnetworks (16), the error in the prediction value of the previous artificial neural network (1) and subtracting the “prediction error” prediction value (17) from the first denormalized prediction value (96); having the subsequent subnets (84) similarly predict (85) the error of this value and subtract it from the composite prediction value (70) remaining from the previous network, so that the prediction subnetworks generate (86) a continuously corrected new prediction value; starting from the second artificial neural network (11), training the subnets by shifting the input data series (2) back one by one and using the difference between the previous prediction value and the actual input value (66) as the target label; and generating a prediction of the next value of the signal or data input at the output of the main decision layer (84, 97) in response to a current signal or data input.
33. Being a device appropriate to Claim 32, wherein the processing unit is configured to further reduce, in the first-level subnets (16, 84, 20), the error value of the first prediction neural network and to apply the error prediction outputs (5, 17, 69) of the networks predicting its remaining error values to the input of a separate artificial neural network (94) for fusion purposes; and wherein this last-level network is trained directly with the target prediction or classification label value so as to obtain the denormalized value (49) of the output (48) as the high-performance prediction or classification value.
34. Being a device appropriate to Claim 32, wherein the processing unit is configured to reduce the error (101) by predicting the next value of the denormalized value of the output (21) and subtracting the future error prediction value (95) obtained from the error sequence of the composite prediction value remaining from the previous networks; to use, as the input data of the last of the first-level neural subnetworks (20), the value change data sequence (10, 11, 12, 13) formed by the difference between the previous value (9) of the error-compensated prediction data (70) and the current input (2); and, for this purpose, to implement a backward shift (19) of the input data array during the training of the last subnet (20) and to use the last error value (10) as the target prediction label.
35. Being a device appropriate to Claim 32, wherein the processing unit is configured to send the non-denormalized output (5) of the first neural subnetwork (1) as an input to a main decision layer (84).
36. Being a device appropriate to Claim 32, wherein the processing unit is configured to decompose the signal or data sequence or pattern at the input into at least two different frequency bands and to apply feature extraction by transformation after normalization of the value sequences of the said frequency bands and the additional data correlated with the target.
37. Being a device appropriate to Claim 32, wherein, if the input contains more than one type of signal, the processing unit is configured to use another artificial neural subnetwork (16, 20) provided sequentially after said artificial neural subnetwork (1) to compensate for the output signal error of the first artificial neural subnetwork (1), and to send the output of the above-mentioned sequential neural subnetwork (16, 20) as input to the main decision layer (87, 94).
38. Being a device appropriate to Claim 37, the processing unit is configured to transform the output of said sequential artificial neural subnetwork (16, 20) and give it as input to the main decision layer (87, 94).
39. Being a device appropriate to Claim 32, wherein the processing unit is configured to process values received from a plurality of sampling windows with various sizes and time offsets of the previously mentioned input signal array.
40. Being a device appropriate to Claim 32, wherein, in the case of continuous input, the processing unit is configured to continue the training of the artificial neural subnetworks (1, 16, 20) by applying the input values shifted back to the previous value and using the current values as label data.
41. Being a device appropriate to Claim 32, the processing unit is configured to use wavelet transform as the feature extraction method.
42. Being a device appropriate to Claim 32, wherein the processing unit is configured to use, together and with separate feature extractions, inputs that have non-linear contextual relations with each other and/or with the current error residue.
43. Being a device appropriate to any Claim between Claim 32 and 42, the processing unit is configured to perform the above-mentioned transformation as a normalization operation.
44. Being a device appropriate to Claim 42 or 43, the processing unit is configured to perform the transforming process using the continuous moving average method.
45. Being a device appropriate to Claim 32, the processing unit is configured to operate the mentioned main layer (87, 94) as an artificial neural network.
46. Being a device appropriate to Claim 32, wherein the processing unit is configured to have the above-mentioned main layer (87, 94) aggregate and concatenate the current residual error and the prediction values provided from the artificial neural subnetworks (1, 16, 20).
PCT/TR2022/051352 2021-11-24 2022-11-24 High performance machine learning system based on predictive error compensation network and the associated device WO2023113729A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
TR2021/018443 2021-11-24
TR2021018443 2021-11-24

Publications (1)

Publication Number Publication Date
WO2023113729A1 true WO2023113729A1 (en) 2023-06-22

Family

ID=86773293

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/TR2022/051352 WO2023113729A1 (en) 2021-11-24 2022-11-24 High performance machine learning system based on predictive error compensation network and the associated device

Country Status (1)

Country Link
WO (1) WO2023113729A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6735580B1 (en) * 1999-08-26 2004-05-11 Westport Financial Llc Artificial neural network based universal time series
CN113093545A (en) * 2021-04-01 2021-07-09 重庆大学 Linear servo system thermal error modeling method and compensation system based on energy balance
CN114310911A (en) * 2022-02-08 2022-04-12 天津大学 Neural network-based dynamic error prediction and compensation system and method for driving joint

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
USTUNDAG BURAK BERK; KULAGLIC AJLA: "High-Performance Time Series Prediction With Predictive Error Compensated Wavelet Neural Networks", IEEE ACCESS, IEEE, USA, vol. 8, 24 November 2020 (2020-11-24), USA , pages 210532 - 210541, XP011823229, DOI: 10.1109/ACCESS.2020.3038724 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881854A (en) * 2023-09-08 2023-10-13 国际关系学院 XGBoost-fused time sequence prediction method for calculating feature weights
CN116881854B (en) * 2023-09-08 2023-12-22 国际关系学院 XGBoost-fused time sequence prediction method for calculating feature weights

Similar Documents

Publication Publication Date Title
CN111563706A (en) Multivariable logistics freight volume prediction method based on LSTM network
Maca et al. Forecasting SPEI and SPI drought indices using the integrated artificial neural networks
CN112508243B (en) Training method and device for multi-fault prediction network model of power information system
CN111738477B (en) Power grid new energy consumption capability prediction method based on deep feature combination
WO2023113729A1 (en) High performance machine learning system based on predictive error compensation network and the associated device
CN112232604A (en) Prediction method for extracting network traffic based on Prophet model
CN111985825A (en) Crystal face quality evaluation method for roller mill orientation instrument
US20200372295A1 (en) Minimum-Example/Maximum-Batch Entropy-Based Clustering with Neural Networks
CN114626585A (en) Urban rail transit short-time passenger flow prediction method based on generation of countermeasure network
CN111353534B (en) Graph data category prediction method based on adaptive fractional order gradient
Inage et al. Application of Monte Carlo stochastic optimization (MOST) to deep learning
CN114358389A (en) Short-term power load prediction method combining VMD decomposition and time convolution network
Abdulsalam et al. Electrical energy demand forecasting model using artificial neural network: A case study of Lagos State Nigeria
CN111340107A (en) Fault diagnosis method and system based on convolutional neural network cost sensitive learning
CN110740063B (en) Network flow characteristic index prediction method based on signal decomposition and periodic characteristics
CN112686367A (en) Novel normalization mechanism
CN116402123A (en) Pre-training model fine tuning method and system based on learning strategy
CN116307206A (en) Natural gas flow prediction method based on segmented graph convolution and time attention mechanism
CN115169747A (en) Method and device for predicting non-stationary time sequence of power load and related equipment
Abdulkadir et al. An enhanced ELMAN-NARX hybrid model for FTSE Bursa Malaysia KLCI index forecasting
CN115081323A (en) Method for solving multi-objective constrained optimization problem and storage medium thereof
Haq et al. Intelligent ehrs: predicting procedure codes from diagnosis codes
Abbas et al. Volterra system identification using adaptive genetic algorithms
CN110990766A (en) Data prediction method and storage medium
CN113052388A (en) Time series prediction method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22908109

Country of ref document: EP

Kind code of ref document: A1