EP3507746A1 - Systems and methods for learning and predicting time-series data using deep multiplicative networks - Google Patents
Systems and methods for learning and predicting time-series data using deep multiplicative networks

Info
- Publication number
- EP3507746A1 (application EP17847459.9A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- layer
- computational network
- information
- current
- feedback information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Definitions
- This disclosure relates generally to machine learning and data prediction.
- More specifically, this disclosure relates to systems and methods for learning and predicting time-series data using deep multiplicative networks.
- Machine learning generally refers to computing technology that is designed to learn from and perform predictive analysis on data.
- Neural networks are one example type of machine learning technique based on biological networks, such as the human brain.
- In a neural network, data processing is performed using artificial neurons, which are coupled together and exchange processed data over various communication links.
- The "learning" aspect of a neural network can be implemented by altering weights associated with the communication links so that some data is treated as being more important than other data.
- Time-series prediction refers to a prediction made by a machine learning algorithm using time-series data, such as data values that are collected over time via one or more sensory inputs.
- Time series prediction is an important component of intelligence. For example, an intelligent entity's ability to predict a time series of inputs can allow the intelligent entity to create a model of the world (or some smaller portion thereof).
- This disclosure provides systems and methods for learning and predicting time-series data using deep multiplicative networks.
- In a first embodiment, a method includes using a computational network to learn and predict time-series data.
- The computational network includes one or more layers, and each layer includes an encoder and a decoder.
- The encoder of each layer multiplicatively combines (i) current feed-forward information from a lower layer or a computational network input and (ii) past feedback information from a higher layer or that layer.
- The encoder of each layer generates current feed-forward information for the higher layer or that layer.
- The decoder of each layer multiplicatively combines (i) current feedback information from the higher layer or that layer and (ii) at least one of the current feed-forward information from the lower layer or the computational network input or past feed-forward information from the lower layer or the computational network input.
- The decoder of each layer generates current feedback information for the lower layer or a computational network output.
- In a second embodiment, an apparatus includes at least one processing device and at least one memory storing instructions that, when executed by the at least one processing device, cause the at least one processing device to learn and predict time-series data using a computational network.
- The computational network includes one or more layers, and each layer includes an encoder and a decoder.
- The encoder of each layer is configured to multiplicatively combine (i) current feed-forward information from a lower layer or a computational network input and (ii) past feedback information from a higher layer or that layer.
- The encoder of each layer is configured to generate current feed-forward information for the higher layer or that layer.
- The decoder of each layer is configured to multiplicatively combine (i) current feedback information from the higher layer or that layer and (ii) at least one of the current feed-forward information from the lower layer or the computational network input or past feed-forward information from the lower layer or the computational network input.
- The decoder of each layer is configured to generate current feedback information for the lower layer or a computational network output.
- In a third embodiment, a non-transitory computer readable medium contains instructions that, when executed by at least one processing device, cause the at least one processing device to learn and predict time-series data using a computational network.
- The computational network includes one or more layers, and each layer includes an encoder and a decoder.
- The encoder of each layer is configured to multiplicatively combine (i) current feed-forward information from a lower layer or a computational network input and (ii) past feedback information from a higher layer or that layer.
- The encoder of each layer is configured to generate current feed-forward information for the higher layer or that layer.
- The decoder of each layer is configured to multiplicatively combine (i) current feedback information from the higher layer or that layer and (ii) at least one of the current feed-forward information from the lower layer or the computational network input or past feed-forward information from the lower layer or the computational network input.
- The decoder of each layer is configured to generate current feedback information for the lower layer or a computational network output.
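- The per-layer behavior described above can be written more formally as follows (a hedged formalization; the weight tensors, pooling function, delays, and nonlinearity below are notational assumptions rather than symbols used in the patent):

```latex
% f_l(t): feed-forward information produced by layer l's encoder at time t (f_0(t) is the network input)
% b_l(t): feedback information produced by layer l's decoder at time t (b_1(t) is the predicted next value)
% \otimes: a multiplicative combination (e.g., element-wise or outer product); p_l: optional pooling; d_l \ge 1, d'_l \ge 0: delays
f_l(t) = \sigma\!\left( W^{\text{enc}}_l \left[\, p_l\big(f_{l-1}(t)\big) \otimes b_{l+1}\big(t - d_l\big) \,\right] \right)
b_l(t) = \sigma\!\left( W^{\text{dec}}_l \left[\, b_{l+1}(t) \otimes f_{l-1}\big(t - d'_l\big) \,\right] \right)
\text{with } b_{L+1}(t) \equiv f_L(t) \text{ at the highest layer } L.
```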
- FIGURE 1 illustrates an example architecture implementing a deep multiplicative network for learning and predicting time-series data according to this disclosure.
- FIGURE 2 illustrates an example system for learning and predicting time-series data using deep multiplicative networks according to this disclosure.
- FIGURE 3 illustrates an example method for learning and predicting time-series data using deep multiplicative networks according to this disclosure.
- FIGURES 1 through 3, discussed below, and the various embodiments used to describe the principles of the present invention in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the invention. Those skilled in the art will understand that the principles of the invention may be implemented in any type of suitably arranged device or system.
- Time-series prediction is an important component of intelligence, such as when it allows an intelligent entity (like a person) to create a predictive model of the world around him or her.
- Motor intent by an intelligent entity may naturally form part of a time series, as well.
- “Motor intent” generally refers to intended motor movements associated with neural signals, such as moving one's arm or leg or opening/closing one's hand based on different neural signals. Predictions that include past motor intent allow modeling of the effects of that motor intent on the surrounding world.
- When an intelligent entity includes a control system that can compute optimal motor intents relative to some high-level goals for affecting the world, the ability to predict future motor intents can allow this to occur more accurately without always having to perform a full optimization, which can provide enormous savings in computation and energy usage.
- Other examples of time-series data that can be learned and predicted include natural language (including text or voice) and video.
- A recurrent neural network refers to a neural network where connections between nodes in the network form a "directed cycle," meaning a closed loop in which no repetitions of nodes and connections are allowed except for the starting and ending node (which represent the same node). More recently, deep recurrent networks using long short-term memory have been devised. While these networks use some multiplicative elements, they are primarily additive in order to make back-propagation feasible.
- In accordance with this disclosure, a device, system, method, and computer readable medium for learning and predicting time-series data are provided.
- The learning and predicting are accomplished by (i) abstracting high-level information through a multiplicative combination (with optional pooling) of current low-level information and past high-level information and (ii) feeding back future predictions of a time series through a multiplicative combination of predicted future high-level information and current and/or past low-level information.
- In this way, a deep recurrent network is formed by combining feed-forward and feedback paths through multiplicative combination of the high-level and low-level information.
- FIGURE 1 illustrates an example architecture 100 implementing a deep multiplicative network for learning and predicting time-series data according to this disclosure.
- The architecture 100 includes one or more layers 102a-102c.
- In this example, the architecture 100 includes three layers 102a-102c, although other numbers of layers could be used in the architecture 100.
- The layers 102a-102c include encoders 104a-104c, respectively, and decoders 106a-106c, respectively.
- The encoders 104a-104c are configured to generate and output feed-forward information 108a-108c, respectively.
- The encoders 104a-104b in the layers 102a-102b are configured to output the feed-forward information 108a-108b to the next-higher layers 102b-102c.
- The encoder 104c of the highest layer 102c is configured to output the feed-forward information 108c for use by the highest layer 102c itself (for both feedback and feed-forward purposes).
- The decoders 106a-106c are configured to generate and output feedback information 110a-110c, respectively.
- The decoders 106b-106c in the layers 102b-102c are configured to output the feedback information 110a-110b to the next-lower layers 102a-102b.
- The decoder 106a of the lowest layer 102a is configured to output the feedback information 110a from the architecture 100.
- Feed-forward information is received into the lowest layer 102a of the architecture 100 as inputs 112.
- A single input 112 represents a current time-series value, and multiple inputs 112 represent a sequence of values provided into the lowest layer 102a forming a time series of data.
- Feedback information is provided from the lowest layer 102a of the architecture 100 as predicted next values 114.
- A single predicted next value 114 represents a predicted future time-series value, and multiple predicted next values 114 represent a sequence of values provided from the lowest layer 102a forming a predicted time series of data.
- The highest feed-forward (and first feedback) information 108c represents the highest-level encoding of the input time-series data.
- The layers 102a-102c also include delay units 116a-116c, respectively, for feedback and optionally delay units 118a-118c, respectively, for feed-forward.
- The delay units 116a-116c are configured to receive feedback information and to delay that information by one or more units of time. In some embodiments, the delay units 116a-116c may provide different delays of information, such as when the delay(s) for the higher layers is/are longer than the delay(s) for the lower layers. The delayed information is then provided from the delay units 116a-116c to the encoders 104a-104c.
- The delay units 118a-118c are configured to receive inputs 112 or feed-forward information and to potentially delay that information by zero or more units of time.
- The delay units 118a-118c may provide different delays of information, such as when the delay(s) for the higher layers is/are longer than the delay(s) for the lower layers.
- The (potentially) delayed information is provided from the delay units 118a-118c to the decoders 106a-106c.
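- As a concrete illustration of the delay units (a minimal sketch; the class name and buffer-based implementation are assumptions for illustration, not structures defined in the patent), a delay of one or more time steps can be realized as a first-in, first-out buffer:

```python
from collections import deque

class DelayUnit:
    """Delays a stream of values by a fixed number of time steps.

    A delay of zero passes values through unchanged, matching the
    "zero or more units of time" behavior of the feed-forward delay units;
    feedback delay units would use a delay of one or more time steps.
    """

    def __init__(self, delay_steps, initial_value=None):
        self.delay_steps = delay_steps
        # Pre-fill the buffer so the first `delay_steps` outputs are the initial value.
        self.buffer = deque([initial_value] * delay_steps)

    def step(self, value):
        """Push the current value and return the value from `delay_steps` ago."""
        if self.delay_steps == 0:
            return value
        self.buffer.append(value)
        return self.buffer.popleft()
```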
- The inputs 112 or feed-forward information 108a-108b provided to the encoders 104a-104c may, in some embodiments, be passed through non-linear pooling units 120a-120c, respectively.
- The pooling units 120a-120c operate to reduce the dimensionality of the data in a manner that increases its transformation-invariance.
- For example, so-called l2 pooling units can provide invariance to unitary group representations, such as translation and rotation.
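- For example (a minimal sketch of one common form of l2 pooling over non-overlapping groups, assuming NumPy; the exact pooling used by the pooling units 120a-120c is not spelled out here):

```python
import numpy as np

def l2_pool(x, group_size):
    """Non-overlapping l2 pooling: each output is the Euclidean norm of one group.

    Because the norm of a group is unchanged when the group is transformed by a
    unitary (e.g., rotation) matrix, this kind of pooling provides the
    transformation invariance described above.
    """
    x = np.asarray(x, dtype=float)
    assert x.size % group_size == 0, "input length must be divisible by group size"
    groups = x.reshape(-1, group_size)
    return np.sqrt((groups ** 2).sum(axis=1))

# Example: a 6-dimensional input pooled into 3 invariant features.
features = l2_pool([3.0, 4.0, 0.0, 1.0, 1.0, 0.0], group_size=2)
# -> array([5.0, 1.0, 1.0])
```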
- Each of the encoders 104a-104c and decoders 106a-106c is configured to multiplicatively combine its inputs.
- Each encoder 104a-104c is configured to multiplicatively combine (i) current (and possibly pooled) feed-forward information from a lower layer or an input 112 and (ii) delayed feedback information from a higher layer (or its own layer at the top) to produce current feed-forward information.
- The current feed-forward information is then provided to a higher layer (or to that same layer at the top).
- Each decoder 106a-106c is configured to multiplicatively combine (i) current feedback information from a higher layer (or its own layer at the top) and (ii) current (and possibly delayed) feed-forward information from the lower layer or an input 112 to produce current feedback information.
- The current feedback information is then provided to a lower layer or as a predicted next value.
- At the top of the architecture 100, the feed-forward information 108c from the highest layer 102c is fed back to itself, delayed as appropriate by the delay unit 116c.
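- In code, the per-layer data flow just described might be sketched as follows (an illustrative sketch only, assuming NumPy, an outer-product form of multiplicative combination, a sigmoid nonlinearity, and simplified dimensions; the class and variable names are assumptions, not taken from the patent):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MultiplicativeLayer:
    """One layer of the architecture: a multiplicative encoder and decoder."""

    def __init__(self, in_dim, state_dim, fb_dim, rng=np.random.default_rng(0)):
        # Encoder weights act on the outer product of (pooled) bottom-up input and
        # delayed top-down feedback; decoder weights act on the outer product of
        # current top-down feedback and (possibly delayed) bottom-up input.
        self.w_enc = rng.normal(0.0, 0.1, size=(state_dim, in_dim * fb_dim))
        self.w_dec = rng.normal(0.0, 0.1, size=(in_dim, fb_dim * in_dim))

    def encode(self, bottom_up_now, feedback_past):
        """Multiplicatively combine current feed-forward and past feedback information."""
        pair = np.outer(bottom_up_now, feedback_past).ravel()
        return sigmoid(self.w_enc @ pair)

    def decode(self, feedback_now, bottom_up_delayed):
        """Multiplicatively combine current feedback and (delayed) feed-forward information."""
        pair = np.outer(feedback_now, bottom_up_delayed).ravel()
        return sigmoid(self.w_dec @ pair)
```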
- The architecture 100 shown in FIGURE 1 can be used to implement an auto-encoder.
- An "auto-encoder" is a type of neural network or other machine learning algorithm that attempts to generate an encoding for a set of data.
- The encoding denotes a representation of the set of data but with reduced dimensionality. In the ideal case, the encoding allows the auto-encoder to predict future values in time-series data based on initial values in the time-series data. The ability to predict time-series data can find use in a large number of applications.
- Each decoder 106a-106c can multiplicatively combine (i) current feedback information from a higher layer (or its own layer at the top) and (ii) current and past feed-forward information from the lower layer or an input 112. This allows the network to generalize an inertial auto-encoder, which uses an inertial combination of a current feed-forward value, one past feed-forward value, and invariant higher-level feedback.
- The network implementing an auto-encoder is generally designed so that its outputs approximately reproduce its inputs.
- When applied to time-series data, an auto-encoder is "causal" in the sense that only past information is used to reproduce future information. Iteratively, such a causal auto-encoder can reproduce the whole time series from itself, meaning the causal auto-encoder can identify the entire time series based on the time series' initial values.
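- That iterative behavior can be pictured roughly as follows (a hedged sketch; `network_step` stands in for one full feed-forward/feedback pass through all layers and is not a function defined in the patent):

```python
def roll_out(network_step, seed_values, num_predictions):
    """Iteratively extend a time series using a causal auto-encoder.

    `network_step(history)` is assumed to return the predicted next value given
    only past values, so each prediction can be appended to the history and fed
    back in to produce the following prediction.
    """
    history = list(seed_values)
    for _ in range(num_predictions):
        next_value = network_step(history)
        history.append(next_value)
    return history[len(seed_values):]
```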
- The encoding of the inputs 112 by the layers 102a-102c is done so that the final encoded representation of the inputs 112 (the information 108c) is highly constrained (such as sparse).
- The encoded representation of the inputs 112 can also ideally be used to generate the predicted next values 114, which represent an approximate reproduction of the inputs 112.
- Ideally, a causal auto-encoder would approximately reproduce future inputs as the predicted next values 114 based on past inputs 112, allowing the causal auto-encoder to make predictions for the time-series data.
- A causal auto-encoder could be most useful when the ultimate encoding is as high-level and invariant as possible so that the same encoding can be used for many time steps. Invariance can be achieved in FIGURE 1 through pooling and/or through multiplicative encoding of the time-series data into an encoding of lower dimensionality. However, to approximately reproduce the original input 112 one time step later (as required of a causal auto-encoder), discarded low-level information needs to be added back into the calculations.
- Here, the feed-forward information 108a-108b can be used to compute a high-level invariant encoding (the information 108c), and the feedback information 110a-110b through the same network can be used to enrich the predicted next value 114 with non-invariant information via use of multiplicative decoding.
- Each of the layers 102a-102c includes any suitable structure(s) for encoding data, providing dimensional reduction, or performing any other suitable processing operations.
- For example, each of the layers 102a-102c could be implemented using hardware or a combination of hardware and software/firmware instructions.
- The multiplicative combination in each of the encoders 104a-104c and decoders 106a-106c may take various forms.
- For example, the multiplicative combination could include a numerical multiplication or a Boolean AND function.
- The multiplicative combination generally forms part of the transfer function of the encoding or decoding node, which may contain or perform other mathematical operations as well (such as sigmoid damping of an input signal).
- In some cases, the multiplicative combination could provide some approximation of a Boolean AND operation, allowing the node to operate as a general state machine. As a result, the node could check whether an input is x AND a state is y and, if so, determine that the new state should be z.
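- As a toy illustration of that state-machine behavior (the values below are purely illustrative assumptions), multiplying near-binary activations behaves like an AND over "the input is x" and "the state is y":

```python
def and_like(input_is_x, state_is_y):
    """Numerical multiplication of near-binary signals approximates Boolean AND."""
    return input_is_x * state_is_y

# Only when both conditions are (approximately) true does the product stay high,
# so a weight attached to this product can drive a transition to a new state z.
print(and_like(0.95, 0.90))  # ~0.86 -> transition fires
print(and_like(0.95, 0.05))  # ~0.05 -> transition does not fire
```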
- The architecture 100 can be trained so that the encoders 104a-104c, decoders 106a-106c, and delay units 116a-116c, 118a-118c function as desired. Deep multiplicative networks have generally been avoided up to this point because it is difficult to train pure feed-forward deep multiplicative networks using standard back-propagation techniques.
- In some embodiments, the training approach for the architecture 100 (combining feed-forward and feedback units) is to repeatedly apply the following steps given time-series training data.
- Post-processing could also be performed at each encoder 104a-104c and/or decoder 106a-106c, such as by normalization and/or sparsification of its weights. This results in stable convergence to a locally optimal network.
- The encoders 104a-104c can alternatively be trained using a sparse coding technique as adapted for the recurrent and multiplicative nature of the architecture 100.
- In this approach, the training involves alternately (i) updating the encoder's weights and (ii) updating the encoded states (which are not just outputs but are used as inputs for the training due to the recurrence of the architecture 100).
- The activations of the outputs of the encoder are normalized individually across the training set and in aggregate across each training pair of input and state. All weights of the encoder are then shrunk by a fixed amount. This combination of normalization and shrinking tends to make the weights sparse.
- Sparseness can be particularly useful for multiplicative networks since the total number of possible weights is very large.
- Once an encoder in a layer is trained, the associated decoder in that layer can be trained, such as by using a frequency analysis of how these coded states combine with actual future values of the time series.
- Any other suitable training mechanisms can also be used with the components of the architecture 100.
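- One way to picture the normalization-and-shrinking post-processing described above (a minimal sketch assuming NumPy; the shrink amount and the exact normalization are assumptions, since they are not pinned down here):

```python
import numpy as np

def post_process_weights(weights, shrink=0.01):
    """Shrink all weights toward zero by a fixed amount (soft-thresholding).

    Repeatedly shrinking weights after each update tends to drive many of them
    exactly to zero, yielding the sparse weight structure discussed above.
    """
    return np.sign(weights) * np.maximum(np.abs(weights) - shrink, 0.0)

def normalize_activations(activations, eps=1e-8):
    """Scale each output unit's activations to unit scale across the training set."""
    std = activations.std(axis=0, keepdims=True)
    return activations / (std + eps)
```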
- The architecture 100 shown in FIGURE 1 can find use in a number of applications.
- As one example, the architecture 100 can be applied to natural language understanding and generation.
- In such an application, the architecture 100 could include four layers. Moving up in the architecture 100, the four layers encode the language input at increasingly abstract and invariant levels; moving down in the architecture 100, the four layers generate predictions at progressively less abstract levels.
- Each layer (except the lowest layer 102a) would switch states more slowly than its adjacent lower layer since the information at that layer represents more invariant encoded state. The less invariant information used to predict lower-level information is then fed back into the predictions through the decoders. Both the encoders and decoders, due to their multiplicative nature, can be thought of as "state machines" that represent the grammar of that particular level of abstraction.
- Although FIGURE 1 illustrates one example of an architecture 100 implementing a deep multiplicative network for learning and predicting time-series data, various changes may be made to FIGURE 1.
- For example, the architecture 100 need not include three layers and could include other numbers of layers in any suitable arrangement (including a single layer).
- FIGURE 2 illustrates an example system 200 for learning and predicting time- series data using deep multiplicative networks according to this disclosure.
- the system 200 denotes a computing system that includes at least one processing device 202, at least one storage device 204, at least one communications unit 206, and at least one input/output (I/O) unit 208.
- The processing device 202 executes instructions that may be loaded into a memory 210.
- The processing device 202 includes any suitable number(s) and type(s) of processors or other devices in any suitable arrangement.
- Example types of processing devices 202 include microprocessors, microcontrollers, digital signal processors, field programmable gate arrays, application specific integrated circuits, and discrete circuitry.
- The memory device 210 and a persistent storage 212 are examples of storage devices 204, which represent any structure(s) capable of storing and facilitating retrieval of information (such as data, program code, and/or other suitable information on a temporary or permanent basis).
- The memory device 210 may represent a random access memory or any other suitable volatile or non-volatile storage device(s).
- The persistent storage 212 may contain one or more components or devices supporting longer-term storage of data, such as a read only memory, hard drive, Flash memory, or optical disc.
- The communications unit 206 supports communications with other systems or devices.
- For example, the communications unit 206 could include a network interface card or a wireless transceiver facilitating communications over a wired or wireless network.
- The communications unit 206 may support communications through any suitable physical or wireless communication link(s).
- The I/O unit 208 allows for input and output of data.
- For example, the I/O unit 208 may provide a connection for user input through a keyboard, mouse, keypad, touchscreen, or other suitable input device.
- The I/O unit 208 may also send output to a display, printer, or other suitable output device.
- In some embodiments, the instructions executed by the processing device 202 could include instructions that implement the architecture 100 of FIGURE 1.
- For example, the instructions executed by the processing device 202 could include instructions that implement the various encoders, decoders, and delay units shown in FIGURE 1, as well as instructions that support the data flows and data exchanges involving these components.
- Although FIGURE 2 illustrates one example of a system 200 for learning and predicting time-series data using deep multiplicative networks, various changes may be made to FIGURE 2.
- For example, while the architecture 100 of FIGURE 1 is described above as being implemented using software/firmware executed by the processing device 202, any suitable hardware-only implementation or any suitable combination of hardware and software/firmware could be used to implement this functionality.
- In addition, computing devices come in a wide variety of configurations, and FIGURE 2 does not limit this disclosure to any particular computing device.
- FIGURE 3 illustrates an example method 300 for learning and predicting time-series data using deep multiplicative networks according to this disclosure.
- The method 300 is described as being implemented using the architecture 100 of FIGURE 1 by the device 200 of FIGURE 2. Note, however, that the method 300 could be implemented in any other suitable manner.
- A computational network is trained at step 302. This could include, for example, the processing device 202 of the device 200 receiving training time-series data and providing the data to the architecture 100 of FIGURE 1.
- As described above, the architecture 100 includes one or more layers 102a-102c, each of which includes a respective encoder 104a-104c and a respective decoder 106a-106c.
- The training could occur by repeatedly performing the following operations: forward-propagate the training data through the encoders 104a-104c and delay units 116a-116c/118a-118c for each time step, and back-propagate the training data through the decoders 106a-106c for each time step.
- Input time-series data is received at the computational network at step 304.
- Each encoder thereby generates current feed-forward information for a higher layer or for itself at step 308.
- Each decoder thereby generates current feedback information for a lower layer or for itself at step 312. This could include, for example, the processing device 202 of the device 200 using the decoder 106c to multiplicatively combine feedback information (the information 108c) from the encoder 104c with current/past feed-forward information from the encoder 104b of the layer 102b.
- Each layer 102a-102b other than the highest layer 102c sends its current feed-forward information to a next-higher layer 102b-102c.
- Each layer 102b-102c other than the lowest layer 102a sends its current feedback information to the next-lower layer 102a-102b.
- The highest layer 102c uses its current feed-forward information 108c as its current feedback information.
- The lowest layer 102a sends its current feedback information to the computational network output as the predicted next values 114.
- The current feed-forward information provided to the lowest layer 102a represents a current time-series value.
- The current feedback information provided from the lowest layer 102a represents a predicted future time-series value.
- Past feedback information can be generated by delaying current feedback information from a higher layer or itself.
- Past feed-forward information can be generated by delaying current feed-forward information from a lower layer.
- Also, current feed-forward information provided to an encoder 104a-104c could first be passed through a pooling unit 120a-120c to reduce the dimensionality or increase the transformation-invariance of the time-series data.
- The computational network is used to predict the time-series data at step 314.
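- Pulling steps 304-314 together, one time step of the prediction pass might look roughly like this (a hedged sketch that reuses the illustrative MultiplicativeLayer from the earlier sketch, assumes one-step feedback delays, and omits pooling for brevity; none of these names come from the patent):

```python
def network_step(layers, state, x_t):
    """One time step of steps 304-314: encode bottom-up, then decode top-down.

    `layers` is a list of MultiplicativeLayer-like objects, and
    `state["past_feedback"]` holds each layer's feedback from the previous time
    step (the simplest, one-step case of the delay units of FIGURE 1).
    """
    num_layers = len(layers)

    # Bottom-up pass (steps 306-308): each encoder multiplicatively combines
    # current feed-forward information with past feedback information.
    feed_forward, bottom_up = [], x_t
    for i, layer in enumerate(layers):
        bottom_up = layer.encode(bottom_up, state["past_feedback"][i])
        feed_forward.append(bottom_up)

    # Top-down pass (steps 310-312): each decoder multiplicatively combines
    # current feedback from above (the top layer feeds its own encoding back
    # to itself) with feed-forward information from below.
    inputs_below = [x_t] + feed_forward[:-1]
    feedback = feed_forward[-1]
    new_past_feedback = [None] * num_layers
    for i in reversed(range(num_layers)):
        new_past_feedback[i] = feedback          # becomes "past feedback" at the next step
        feedback = layers[i].decode(feedback, inputs_below[i])

    state["past_feedback"] = new_past_feedback
    return feedback                               # step 314: the predicted next value
```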
- Although FIGURE 3 illustrates one example of a method 300 for learning and predicting time-series data using deep multiplicative networks, various changes may be made to FIGURE 3.
- For example, while shown as a series of steps, various steps could overlap, occur in parallel, occur in a different order, or occur any number of times.
- As a particular example, steps 306-314 could generally overlap with each other.
- In some embodiments, various functions described in this patent document are implemented or supported by a computer program that is formed from computer readable program code and that is embodied in a computer readable medium.
- The phrase "computer readable program code" includes any type of computer code, including source code, object code, and executable code.
- The phrase "computer readable medium" includes any type of medium capable of being accessed by a computer, such as read only memory (ROM), random access memory (RAM), a hard disk drive, a compact disc (CD), a digital video disc (DVD), or any other type of memory.
- A "non-transitory" computer readable medium excludes wired, wireless, optical, or other communication links that transport transitory electrical or other signals.
- A non-transitory computer readable medium includes media where data can be permanently stored and media where data can be stored and later overwritten, such as a rewritable optical disc or an erasable memory device.
- The terms "application" and "program" refer to one or more computer programs, software components, sets of instructions, procedures, functions, objects, classes, instances, related data, or a portion thereof adapted for implementation in suitable computer code (including source code, object code, or executable code).
- The term "communicate," as well as derivatives thereof, encompasses both direct and indirect communication.
- The term "or" is inclusive, meaning and/or.
- The phrase "associated with," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, have a relationship to or with, or the like.
- The phrase "at least one of," when used with a list of items, means that different combinations of one or more of the listed items may be used, and only one item in the list may be needed. For example, "at least one of: A, B, and C" includes any of the following combinations: A, B, C, A and B, A and C, B and C, and A and B and C.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662382774P | 2016-09-01 | 2016-09-01 | |
US15/666,379 US10839316B2 (en) | 2016-08-08 | 2017-08-01 | Systems and methods for learning and predicting time-series data using inertial auto-encoders |
US15/681,942 US11353833B2 (en) | 2016-08-08 | 2017-08-21 | Systems and methods for learning and predicting time-series data using deep multiplicative networks |
PCT/US2017/049358 WO2018045021A1 (en) | 2016-09-01 | 2017-08-30 | Systems and methods for learning and predicting time-series data using deep multiplicative networks |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3507746A1 (en) | 2019-07-10 |
EP3507746A4 (en) | 2020-06-10 |
Family
ID=61301606
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP17847459.9A Withdrawn EP3507746A4 (en) | 2016-09-01 | 2017-08-30 | Systems and methods for learning and predicting time-series data using deep multiplicative networks |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP3507746A4 (en) |
CN (1) | CN109643387A (en) |
AU (1) | AU2017321524B2 (en) |
CA (1) | CA3033753A1 (en) |
WO (1) | WO2018045021A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110175338B (en) * | 2019-05-31 | 2023-09-26 | 北京金山数字娱乐科技有限公司 | Data processing method and device |
CN111241688B (en) * | 2020-01-15 | 2023-08-25 | 北京百度网讯科技有限公司 | Method and device for monitoring composite production process |
CN111709785B (en) * | 2020-06-18 | 2023-08-22 | 抖音视界有限公司 | Method, apparatus, device and medium for determining user retention time |
CN112581031B (en) * | 2020-12-30 | 2023-10-17 | 杭州朗阳科技有限公司 | Method for implementing real-time monitoring of motor abnormality by Recurrent Neural Network (RNN) through C language |
CN114024587B (en) * | 2021-10-29 | 2024-07-16 | 北京邮电大学 | Feedback network encoder, architecture and training method based on full-connection layer sharing |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6125105A (en) * | 1997-06-05 | 2000-09-26 | Nortel Networks Corporation | Method and apparatus for forecasting future values of a time series |
US9146546B2 (en) * | 2012-06-04 | 2015-09-29 | Brain Corporation | Systems and apparatus for implementing task-specific learning using spiking neurons |
TR201514432T1 (en) * | 2013-06-21 | 2016-11-21 | Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi | Method for pseudo-recurrent processing of data using a feedforward neural network architecture |
US11080587B2 (en) * | 2015-02-06 | 2021-08-03 | Deepmind Technologies Limited | Recurrent neural networks for data item generation |
- 2017
- 2017-08-30 CA CA3033753A patent/CA3033753A1/en active Pending
- 2017-08-30 CN CN201780053794.XA patent/CN109643387A/en active Pending
- 2017-08-30 AU AU2017321524A patent/AU2017321524B2/en not_active Ceased
- 2017-08-30 WO PCT/US2017/049358 patent/WO2018045021A1/en unknown
- 2017-08-30 EP EP17847459.9A patent/EP3507746A4/en not_active Withdrawn
Also Published As
Publication number | Publication date |
---|---|
WO2018045021A1 (en) | 2018-03-08 |
EP3507746A4 (en) | 2020-06-10 |
CN109643387A (en) | 2019-04-16 |
CA3033753A1 (en) | 2018-03-08 |
AU2017321524B2 (en) | 2022-03-10 |
AU2017321524A1 (en) | 2019-02-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| 17P | Request for examination filed | Effective date: 20190213 |
| AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| AX | Request for extension of the european patent | Extension state: BA ME |
| DAV | Request for validation of the european patent (deleted) | |
| DAX | Request for extension of the european patent (deleted) | |
| A4 | Supplementary search report drawn up and despatched | Effective date: 20200512 |
| RIC1 | Information provided on ipc code assigned before grant | Ipc: G06N 3/08 20060101ALI20200506BHEP; Ipc: G06N 3/04 20060101ALI20200506BHEP; Ipc: G06N 3/00 20060101AFI20200506BHEP |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
| 17Q | First examination report despatched | Effective date: 20230421 |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| 18D | Application deemed to be withdrawn | Effective date: 20230902 |