CN112585621A - Characterizing activity in a recurrent artificial neural network and encoding and decoding information - Google Patents

Info

Publication number
CN112585621A
Authority
CN
China
Prior art keywords
neural network
activity
pattern
artificial neural
recurrent artificial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980054063.6A
Other languages
Chinese (zh)
Inventor
H. Markram
R. Levi
K. P. Hess Bellwald
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inet Co ltd
INAIT SA
Original Assignee
Inet Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/004,635 external-priority patent/US20190378007A1/en
Priority claimed from US16/004,796 external-priority patent/US20190378000A1/en
Priority claimed from US16/004,757 external-priority patent/US11893471B2/en
Priority claimed from US16/004,671 external-priority patent/US20190378008A1/en
Priority claimed from US16/004,837 external-priority patent/US11663478B2/en
Application filed by Inet Co ltd
Publication of CN112585621A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/008Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for characterizing activity in a recurrent artificial neural network and encoding and decoding information. In one aspect, a method may include outputting numbers from a recurrent artificial neural network, where each number represents whether an activity in a particular set of nodes in the recurrent artificial neural network is commensurate with a corresponding pattern of activity.

Description

Characterizing activity in a recurrent artificial neural network and encoding and decoding information
Background
This specification relates to characterizing activity in a recurrent artificial neural network. Such characterization can be used, for example, to identify decision moments and to encode and decode signals in scenarios such as transmission, encryption, and data storage. The specification also relates to systems and techniques for encoding and decoding information, and for using the encoded information in various scenarios. The encoded information may represent activity in a neural network (e.g., a recurrent neural network).
Artificial neural networks are devices inspired by structural and functional aspects of biological neuronal networks. In particular, artificial neural networks use a system of interconnected constructs called nodes to model the information coding and other processing capabilities of biological neuronal networks. The arrangement and strength of the connections between nodes in an artificial neural network determine the result of information processing or information storage by the artificial neural network.
Neural networks may be trained to produce a desired signal flow in the network and to achieve desired information processing or information storage results. Typically, training changes the arrangement and/or strength of connections between nodes during a learning phase. A neural network may be considered trained when it achieves sufficiently appropriate processing results for a given set of inputs.
Artificial neural networks may be used in a variety of different devices to perform nonlinear data processing and analysis. Nonlinear data processing does not satisfy the principle of superposition (i.e., the variables to be determined cannot be written as a linear sum of independent components). Examples of scenarios in which nonlinear data processing is useful include pattern and sequence recognition, speech processing, novelty detection, sequential decision-making, and complex system modeling, as well as systems and techniques in a wide variety of other scenarios.
Both encoding and decoding convert information from one form or representation to another. Different representations may provide different features that are more or less useful in different applications. For example, some forms or representations of information (e.g., natural language) may be more easily understood by humans. Other forms or representations may be smaller in size (e.g., "compressed") and easier to transmit or store. Still other forms or representations may intentionally obscure the information content (e.g., the information may be cryptographically encoded).
Regardless of the particular application, the encoding or decoding process will typically follow a predefined set of rules or algorithms that establish correspondence between different forms or representations of information. For example, the encoding process that produces binary code may assign a role or meaning to individual bits based on their position in a binary sequence or vector.
Disclosure of Invention
This specification describes technologies relating to the characterization of activities in artificial neural networks.
For example, a method for identifying decision moments in a neural network includes: determining the complexity of patterns of activity in a recurrent artificial neural network, wherein the activity is responsive to an input into the recurrent artificial neural network; determining the timing of activity whose complexity is distinguishable from that of other activity responsive to the input; and identifying the decision moment based on the timing of the activity having distinguishable complexity.
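As a purely illustrative sketch of the last two steps, assuming a per-window complexity measure is already available (the function name, baseline, and threshold below are hypothetical):

```python
# Hypothetical sketch: flag windows whose complexity deviates
# distinguishably (up or down) from a baseline. Values are illustrative.
def identify_decision_moments(complexity_per_window, baseline, threshold=2.0):
    """Return indices of windows with distinguishable complexity."""
    return [i for i, c in enumerate(complexity_per_window)
            if abs(c - baseline) > threshold]

# Example: a complexity trace with a clear peak in window 3.
trace = [1.1, 0.9, 1.0, 5.7, 1.2]
print(identify_decision_moments(trace, baseline=1.0))  # -> [3]
```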
As another example, a method for characterizing activity in a recurrent artificial neural network includes identifying clique patterns of activity in the recurrent artificial neural network. The method is performed by a data processing apparatus.
As another example, a method may include outputting a binary sequence of 0s and 1s from a recurrent artificial neural network, wherein each digit in the sequence represents whether a particular group of nodes in the recurrent artificial neural network exhibits a corresponding pattern of activity.
As another example, a method of structuring a recurrent artificial neural network may include: characterizing the complexity of patterns of activity that can occur in the recurrent artificial neural network, the recurrent artificial neural network comprising a structured collection of nodes and links between the nodes; and evolving the structure of the recurrent artificial neural network to increase the complexity of the patterns of activity. This method of structuring may also be used, for example, as part of a method of training the recurrent artificial neural network.
Other embodiments of these aspects include corresponding systems, apparatus, and computer programs, encoded on computer storage devices, configured to perform the actions of the methods.
Particular embodiments of the subject matter described in this specification can be implemented to realize one or more of the following advantages. For example, conventional data processing apparatus, such as digital computers, are programmed to follow a predefined logical sequence when processing information. The moment at which such a computer arrives at a result is therefore relatively easy to identify: the completion of the logic sequence embedded in the programming indicates when information processing is complete and the computer has "reached a decision". The result may be maintained at the output of the computer's data processor in relatively long-lived form by, for example, a memory device or a set of buffers, and may be accessed for a variety of purposes.
In contrast, as described herein, decision moments in a recurrent artificial neural network can be identified based on features of the dynamics of the neural network during information processing. Rather than waiting for the artificial neural network to reach a predefined end of a logic sequence, decision moments can be identified based on characteristics of the functional state of the artificial neural network during information processing.
Furthermore, features of the dynamics of the recurrent artificial neural network during information processing, including features such as activity commensurate with clique patterns and directed clique patterns, can be used in a wide variety of signaling operations, including signal transmission, encoding, encryption, and storage. In particular, the features of the activity in the recurrent artificial neural network during information processing reflect the input and can be considered an encoded form of the input (i.e., the recurrent artificial neural network "outputs" an encoding of the input). These features may be transmitted, for example, to a remote receiver, which may decode them to reconstitute the input or a portion of the input.
Further, in some cases, the activity in different node groups of the recurrent artificial neural network (e.g., activity commensurate with clique patterns and directed clique patterns) may be represented as a binary sequence of 0s and 1s, with each digit indicating whether the activity is commensurate with the corresponding pattern. Since in some scenarios the activity is the output of the recurrent artificial neural network, the output of the recurrent artificial neural network may be represented as a vector of binary digits and is compatible with digital data processing.
Furthermore, in some cases, such characterization of the dynamics of a recurrent artificial neural network may be used prior to and/or during training to increase the likelihood that complex patterns of activity occur during information processing. For example, before or during training, the links between nodes in the recurrent artificial neural network may be intentionally evolved to increase the complexity of the activity patterns, e.g., to increase the likelihood that clique patterns and directed clique patterns of activity occur during information processing. This may reduce the time and effort required to train the recurrent artificial neural network.
As another example, such characterization of the dynamics of the recurrent artificial neural network may be used to determine the degree of completion of the training of the recurrent neural network. For example, a recurrent artificial neural network that displays particular types of ordering in its activity (e.g., clique patterns and directed clique patterns) may be considered more highly trained than a recurrent artificial neural network that does not display such ordering. Indeed, in some cases, the degree of training may be quantified by quantifying the degree of ordering of the activity in the recurrent artificial neural network.
The details of one or more implementations described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
Drawings
Fig. 1 is a schematic illustration of the structure of a recurrent artificial neural network device.
Figures 2 and 3 are schematic illustrations of the functioning of a recurrent artificial neural network device within different time windows.
FIG. 4 is a flow diagram of a process for identifying decision moments in a recurrent artificial neural network based on characterization of activity in the network.
FIG. 5 is a schematic illustration of a pattern of activities that may be identified and used to identify decision moments in a recurrent artificial neural network.
FIG. 6 is a schematic illustration of a pattern of activities that may be identified and used to identify decision moments in a recurrent artificial neural network.
FIG. 7 is a schematic illustration of patterns of activity that may be identified and used to identify decision moments in a recurrent artificial neural network.
FIG. 8 is a schematic illustration of a data table that may be used in the determination of complexity or degree of ordering in an activity pattern in a recurrent artificial neural network device.
Fig. 9 is a schematic illustration of the determination of a specific time of an activity pattern with distinguishable complexity.
FIG. 10 is a flow diagram of a process for encoding a signal using a recurrent artificial neural network based on characterization of activity in the network.
FIG. 11 is a flow diagram of a process for decoding a signal using a recurrent artificial neural network based on characterization of activity in the network.
Fig. 12, 13 and 14 are schematic illustrations of binary forms or representations of topologies.
Fig. 15 and 16 schematically illustrate one example of how the presence or absence of features corresponding to different bits need not be independent of one another.
Fig. 17, 18, 19, 20 are schematic illustrations of representations of the occurrence of a topology in activity in a neural network used in four different classification systems.
Fig. 21, 22 are schematic illustrations of an edge device that includes a local artificial neural network that can be trained using a representation of the occurrence of a topology corresponding to activity in a source neural network.
FIG. 23 is a schematic illustration of a system in which a local neural network can be trained using a representation of the occurrence of a topology corresponding to activity in a source neural network.
Fig. 24, 25, 26, 27 are schematic illustrations of the use of representations of the occurrence of a topology in activity in neural networks in four different systems.
FIG. 28 is a schematic illustration of a system that includes an artificial neural network that may be trained using a representation of the occurrence of a topology corresponding to activity in a source neural network.
Like reference symbols in the various drawings indicate like elements.
Detailed Description
Fig. 1 is a schematic illustration of the structure of a recurrent artificial neural network device 100. The recurrent artificial neural network device 100 is a device that simulates the information coding and other processing capabilities of a biological neural network using a system of interconnected nodes. The recurrent artificial neural network device 100 can be implemented in hardware, software, or a combination thereof.
The instantiation of the recurrent artificial neural network device 100 includes a plurality of nodes 101, 102, ..., 107 interconnected by a plurality of structural links 110. Nodes 101, 102, ..., 107 are discrete information processing constructs similar to neurons in a biological network. Nodes 101, 102, ..., 107 typically process one or more input signals received over one or more of the links 110 to produce one or more output signals output over one or more of the links 110. For example, in some implementations, nodes 101, 102, ..., 107 may be artificial neurons that weight and sum multiple input signals, pass the sums through one or more nonlinear activation functions, and output one or more output signals.
Nodes 101, 102, ..., 107 may operate as accumulators. For example, the nodes may operate according to an integrate-and-fire model in which one or more signals accumulate in a first node until a threshold is reached. After the threshold is reached, the first node fires by transmitting an output signal along one or more of the links 110 to a connected second node. The second node in turn accumulates the received signals and, if its threshold is reached, transmits a further output signal to further connected nodes.
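For illustration, a minimal integrate-and-fire node along these lines might look as follows; this is a sketch under assumed parameters (threshold value, reset behavior), not the device's actual implementation:

```python
# Minimal integrate-and-fire node: accumulate inputs, fire at threshold.
class IntegrateAndFireNode:
    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.accumulated = 0.0

    def receive(self, signal):
        """Accumulate an input signal; fire (return True) when the
        running sum reaches the threshold, then reset."""
        self.accumulated += signal
        if self.accumulated >= self.threshold:
            self.accumulated = 0.0
            return True  # an output signal is sent along outgoing links
        return False

node = IntegrateAndFireNode(threshold=1.0)
print([node.receive(0.4) for _ in range(4)])  # -> [False, False, True, False]
```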
The structural links 110 are connections capable of transmitting signals between the nodes 101, 102, ..., 107. For convenience, all structural links 110 are treated herein as identical bidirectional links that transmit a signal from a first one of the nodes 101, 102, ..., 107 to a second one of the nodes in the same manner as signals are transmitted from the second node to the first. However, this is not necessarily the case. For example, some or all of the structural links 110 may be unidirectional links that transmit signals from a first one of the nodes 101, 102, ..., 107 to a second one of the nodes without transmitting signals from the second node to the first.
As another example, in some implementations, the structural links 110 may have a wide variety of characteristics other than, or in addition to, directionality. For example, in some implementations, different structural links 110 may carry signals of different magnitudes, resulting in different interconnection strengths between respective ones of the nodes 101, 102, ..., 107. As another example, different structural links 110 may carry different types of signals (e.g., inhibitory and/or excitatory signals). Indeed, in some implementations, the structural links 110 may mimic the links between neurons in biological systems and reflect at least a portion of the vast morphological, chemical, and other diversity of such links.
In the illustrated implementation, the recurrent artificial neural network device 100 is a clique network (or sub-network), in that each node 101, 102, ..., 107 is connected to every other node. This is not necessarily the case. Rather, in some implementations, each node may be connected to an appropriate subset of the nodes 101, 102, ..., 107 (whether by identical links or by a variety of links, as appropriate).
For clarity of illustration, the recurrent artificial neural network device 100 is illustrated as having only seven nodes. Typically, real-world neural network devices will include a significantly greater number of nodes. For example, in some implementations, a neural network device may include hundreds of thousands, millions, or even billions of nodes. Thus, the recurrent neural network device 100 may be a small part (i.e., a sub-network) of a larger recurrent artificial neural network.
In biological neural network devices, the accumulation and signal transmission processes require real-world time lapses. For example, the somatic cells of neurons integrate inputs received over time, and signal transmission from neuron to neuron requires time determined by, for example, the signal transmission speed and the nature and length of the links between neurons. Thus, the state of the biological neural network device is dynamic and changes over time.
In an artificial recurrent neural network device, time is artificial and is expressed using mathematical constructs. For example, a signal transmitted from node to node does not require a real-world passage of time, and such signals may be expressed in artificial units that are generally unrelated to the passage of real-world time as measured by computer clock cycles or otherwise. Nevertheless, the state of the artificial recurrent neural network device can be described as "dynamic" because it changes with respect to these artificial units.
Note that, for convenience, these artificial units are referred to herein as units of "time". It should be understood, however, that these units are man-made and do not generally correspond to the passage of real-world time.
Fig. 2 and 3 are schematic illustrations of the functioning of the recurrent artificial neural network device 100 within different time windows. Because the state of the device 100 is dynamic, the signaling activity occurring within a window may be used to represent the functioning of the device 100. Such a functional instantiation typically shows activity in only a small portion of the links 110. In particular, not every link 110 is illustrated as actively contributing to the functioning of the device 100 in these illustrations, since typically not every link 110 transmits a signal within a particular window.
In the illustrations of fig. 2 and 3, an active link 110 is drawn as a relatively thick solid line connecting a pair of the nodes 101, 102, ..., 107. In contrast, inactive links 110 are drawn as dashed lines. This is for illustration purposes only: the structural connection formed by a link 110 exists regardless of whether the link is active. This formalism, however, highlights the activity and functioning of the device 100.
In addition to schematically illustrating the presence of activity along a link, the direction of that activity is also schematically illustrated. In particular, the relatively thick solid lines that denote active links 110 also include arrows representing the direction of signal transmission along the link during the relevant window. In general, the direction of signal transmission within a single window does not by itself establish that a link is a unidirectional link with the indicated directionality. Rather, in a first functional instantiation for a first time window, a link may be active in a first direction, while in a second functional instantiation for a second window, the same link may be active in the opposite direction. However, in some cases, such as in a recurrent artificial neural network device 100 that includes only unidirectional links, the directionality of signal transmission will definitively indicate the directionality of the links.
In a feed-forward neural network device, information moves in only a single direction (i.e., forward) toward an output layer of nodes at the end of the network. The feed-forward neural network device indicates that a "decision" has been reached and that information processing is complete by the propagation of a signal through the network to the output layer.
In contrast, in a recurrent neural network, the connections between nodes form cycles, and the activity of the network progresses dynamically without easily identifiable decisions. For example, even in a three-node recurrent neural network, a first node may transmit a signal to a second node, which in response may transmit a signal to a third node. In response, the third node may transmit a signal back to the first node. The signals received by the first node may thus be, at least in part, responsive to signals transmitted from that same node.
The schematic functional illustrations of fig. 2 and 3 show this in a network only slightly larger than a three-node recurrent neural network. The functional illustration of fig. 2 may illustrate activity within a first window, and fig. 3 activity within an immediately following second window. As shown, a collection of signaling activity appears to originate at node 104 and to progress in a generally clockwise direction through the device 100 during the first window. Within the second window, at least some of the signaling activity appears to return to node 104. Even in such a simple example, signal transmission does not proceed in a way that produces a clearly identifiable output or endpoint.
When considering a recurrent neural network of, for example, thousands of nodes or more, it can be appreciated that signal propagation can occur over a large number of paths, and that these signals lack clearly identifiable "output" locations or times. Although by design the network can be returned to a quiescent state where only background or even no signaling activity occurs, the quiescent state itself does not indicate the result of the information processing. Regardless of the input, the recurrent neural network always returns to a quiescent state. Thus, the "output" or result of the information processing is encoded in the activity that occurs in the recurrent neural network in response to a particular input.
FIG. 4 is a flow diagram of a process 400 for identifying decision moments in a recurrent artificial neural network based on a characterization of the activity in the network. A decision moment is a point in time when the activity in the recurrent artificial neural network indicates the result of the network's processing of information in response to an input. Process 400 may be performed by a system of one or more data processing apparatus that perform operations according to the logic of one or more sets of machine-readable instructions. For example, process 400 may be performed by the same system of one or more computers that executes the software implementing the recurrent artificial neural network used in process 400.
At 405, the system performing process 400 receives a notification that a signal has been input into the recurrent artificial neural network. In some cases, the input of the signal is a discrete injection event in which, for example, information is injected into one or more nodes and/or one or more links of the neural network. In other cases, the input of the signal is a stream of information injected over a period of time into one or more nodes and/or links of the neural network. The notification indicates that the artificial neural network is actively processing information and is not, for example, in a quiescent state. In some cases, the notification is received from the neural network itself, for example when the neural network exits an identifiable quiescent state.
At 410, the system performing process 400 divides the responsive activity in the network into a collection of windows. Where the injection is a discrete event, the windows may subdivide the time between injection and the return to a quiescent state into periods during which the activity exhibits variable complexity. Where the injection is a stream of information, the duration of the injection (and optionally the time to return to a quiescent state after the injection is complete) may be subdivided into windows during which the activity exhibits variable complexity. Various methods of determining the complexity of activity are discussed further below.
In some implementations, the windows all have the same duration, but this is not necessarily the case. Rather, in some implementations, the windows may have different durations. For example, the durations may increase with the time elapsed since a discrete injection event.
In some implementations, the windows may form a continuous series of separate windows. In other implementations, the windows overlap in time, such that one window begins before the previous window ends. In some cases, the windows may be moving windows that slide along in time.
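Purely as an illustration of these windowing schemes, contiguous and moving windows over the artificial time units discussed above might be generated as follows (boundaries and step sizes are hypothetical):

```python
# Sketch of contiguous and overlapping/moving window generation.
def contiguous_windows(t_start, t_end, width):
    t = t_start
    while t < t_end:
        yield (t, min(t + width, t_end))
        t += width

def moving_windows(t_start, t_end, width, step):
    t = t_start
    while t + width <= t_end:
        yield (t, t + width)
        t += step

print(list(contiguous_windows(0, 10, 4)))  # [(0, 4), (4, 8), (8, 10)]
print(list(moving_windows(0, 10, 4, 2)))   # [(0, 4), (2, 6), (4, 8), (6, 10)]
```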
In some implementations, different durations of the window are defined for different determinations of complexity of the activity. For example, for an activity pattern defining activity occurring between a relatively large number of nodes, the window may have a relatively longer duration than a window defined for an activity pattern defining activity occurring between a relatively small number of nodes. For example, in the context of a pattern 500 of activities (FIG. 5), the window defined for identifying activities commensurate with pattern 530 may be longer than the window defined for identifying activities commensurate with pattern 505.
At 415, the system performing process 400 identifies patterns in the activity within different windows in the network. As discussed further below, patterns in the activity can be identified by treating a functional graph as a topological space with nodes as points. In some implementations, the identified activity patterns are cliques in a functional graph of the network, e.g., directed cliques.
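As an illustrative sketch only, directed cliques in a small functional graph, given as a set of active directed edges, might be found by brute force as follows; real networks would require far more efficient algorithms:

```python
from itertools import combinations, permutations

# Sketch: find directed cliques of k nodes among active directed edges.
def directed_cliques(edges, k):
    nodes = {n for e in edges for n in e}
    found = []
    for subset in combinations(nodes, k):
        for order in permutations(subset):
            # every earlier node must send to every later node
            if all((order[i], order[j]) in edges
                   for i in range(k) for j in range(i + 1, k)):
                found.append(order)  # order[0] is source, order[-1] is sink
                break
    return found

active = {("a", "b"), ("a", "c"), ("b", "c")}
print(directed_cliques(active, 3))  # -> [('a', 'b', 'c')]
```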
At 420, the system performing process 400 determines the complexity of the activity patterns within the different windows. Complexity is a measure of how unlikely it is that an ordered pattern of activity arises within a window. Thus, a pattern of activity that could easily arise at random is relatively simple, whereas a non-random sequence of activity patterns is relatively complex. For example, in some implementations, the complexity of activity patterns may be measured using a simplex count or the Betti numbers of the activity patterns.
At 425, the system performing process 400 determines the timing of activity patterns having distinguishable complexity. A particular activity pattern may be distinguishable based on complexity that deviates upward or downward (e.g., from a fixed or variable baseline). In other words, the timing of activity patterns indicating a particularly high or particularly low level of non-random order in the activity may be determined.
For example, where the signal input is a discrete injection event, a deviation, e.g., from a stable baseline or from a curve that is characteristic of the mean response of the neural network to a wide variety of different discrete injection events, can be used to determine a particular time of the distinguishable complex activity pattern. As another example, where information is input in the form of a stream, large changes in complexity during streaming may be used to determine specific times of distinguishable complex activity patterns.
At 430, the system performing process 400 schedules the reading of the output from the neural network based on the particular time of the distinguishably complex activity pattern. For example, in some implementations, the output of the neural network may be read at the same time that distinguishable complex activity patterns occur. In implementations where the complexity deviation indicates a relatively high non-random order in the activity, the observed activity pattern itself may also be taken as the output of the recurrent artificial neural network.
FIG. 5 is an illustration of patterns 500 of activity that can be identified and used to identify decision moments in a recurrent artificial neural network. For example, the patterns 500 may be identified at 415 in process 400 (FIG. 4).
The patterns 500 are illustrative of activity in a recurrent artificial neural network. In applying the patterns 500, the functional graph is treated as a topological space with nodes as points. Activity in the nodes and links that is commensurate with a pattern 500 may be recognized as ordered regardless of the identities of the particular nodes and/or links participating in the activity. For example, the first pattern 505 may represent activity among nodes 101, 104, 105 in fig. 2, with point 0 as node 104, point 1 as node 105, and point 2 as node 101. As another example, the first pattern 505 may also represent activity among nodes 104, 105, 106 in fig. 3, with point 0 as node 106, point 1 as node 104, and point 2 as node 105. The order of activity in a directed clique is also specified: in pattern 505, for example, activity between point 1 and point 2 occurs after activity between point 0 and point 1.
In the illustrated implementation, the patterns 500 are all directed cliques, or directed simplices. In such a pattern, the activity originates at a source node that transmits a signal to every other node in the pattern. In the patterns 500, such a source node is designated as point 0, and the other nodes are designated as points 1, 2, .... Further, in a directed clique or simplex, one node acts as a sink and receives signals transmitted from every other node in the pattern. In the patterns 500, the sink node is designated as the highest-numbered point in the pattern. For example, in pattern 505 the sink node is point 2, in pattern 510 the sink node is point 3, in pattern 515 the sink node is point 4, and so on. The activity represented by the patterns 500 is thus ordered in a distinguishable manner.
Each of the patterns 500 has a different number of points and reflects ordered activity among that number of nodes. For example, pattern 505 is a two-dimensional simplex reflecting activity in three nodes, pattern 510 is a three-dimensional simplex reflecting activity in four nodes, and so on. As the number of points in a pattern increases, so do the degree of ordering and the complexity of the activity. For example, for a large collection of nodes with some level of random activity within a window, some of that activity may happen, by chance, to be commensurate with pattern 505. However, it is progressively less likely that random activity will be commensurate with each of the patterns 510, 515, 520, .... The presence of activity commensurate with pattern 530 therefore indicates a relatively high degree of ordering and complexity in the activity, compared to the presence of activity commensurate with pattern 505.
As previously discussed, in some implementations, windows of different durations may be defined for different determinations of the complexity of an activity. For example, when activity commensurate with pattern 530 is to be identified, a longer duration window may be used than when activity commensurate with pattern 505 is to be identified.
FIG. 6 is an illustration of patterns 600 of activity that can be identified and used to identify decision moments in a recurrent artificial neural network. For example, the patterns 600 may be identified at 415 in process 400 (FIG. 4).
Like the patterns 500, the patterns 600 are illustrative of activity in a recurrent artificial neural network. However, the patterns 600 depart from the strict ordering of the patterns 500 in that they are not all directed cliques or directed simplices. In particular, patterns 605, 610 have lower directionality than pattern 515, and pattern 605 lacks a sink node altogether. Nevertheless, the patterns 605, 610 indicate a degree of ordered activity exceeding what would be expected by random chance, and they may be used to determine the complexity of activity in a recurrent artificial neural network.
FIG. 7 is an illustration of patterns 700 of activity that can be identified and used to identify decision moments in a recurrent artificial neural network. For example, the patterns 700 may be identified at 415 in process 400 (FIG. 4).
The patterns 700 are groups of directed cliques, or directed simplices, of the same dimension (i.e., having the same number of points) that together define a pattern involving more points than the individual cliques or simplices and that enclose a cavity within the group of directed simplices.
By way of example, pattern 705 includes six different three-point, two-dimensional patterns 505 that together define a homology class of degree two, while pattern 710 includes eight different three-point, two-dimensional patterns 505 that together define a homology class of degree two. The three-point, two-dimensional patterns 505 in patterns 705, 710 may each be regarded as enclosing a respective cavity. The nth Betti number associated with the directed graph provides a count of such homology classes in the topological representation.
Activity commensurate with patterns such as the patterns 700 reflects a relatively high degree of ordering in the activity of a network that is unlikely to arise by random chance. The patterns 700 can thus be used to characterize the complexity of that activity.
In some implementations, only some patterns of activity are identified, and/or some portion of the identified patterns of activity are discarded or otherwise ignored during the identification of decision moments. For example, referring to FIG. 5, activity commensurate with the five-point, four-dimensional simplex pattern 515 inherently includes activity commensurate with the four-point, three-dimensional simplex pattern 510 and the three-point, two-dimensional simplex pattern 505. For example, points 0, 2, 3, 4 and points 1, 2, 3, 4 in the four-dimensional simplex pattern 515 of FIG. 5 are both commensurate with the three-dimensional simplex pattern 510. In some implementations, patterns that include fewer points, and that therefore have lower dimension, may be discarded or otherwise ignored during the identification of decision moments.
As another example, only some patterns of activity need be identified. For example, in some implementations, only patterns with odd numbers of points (3, 5, 7, ...) or even dimensions (2, 4, 6, ...) are used in the identification of decision moments.
The degree of complexity or ordering in the pattern of activity in the recurrent artificial neural network devices within different windows can be determined in a variety of different ways. Fig. 8 is a schematic illustration of a data table 800 that may be used in such a determination. Data table 800 may be used to determine the complexity of activity patterns in isolation or in conjunction with other activities. For example, data table 800 may be used at 420 in process 400 (FIG. 4).
In more detail, table 800 includes counts of the numbers of pattern occurrences during a window "N", where counts of activity matching patterns of different dimensions are presented in different rows. For example, in the illustrated example, row 805 includes a count of the number of occurrences of activity matching one or more three-point, two-dimensional patterns (i.e., "2032"), while row 810 includes a count of the number of occurrences of activity matching one or more four-point, three-dimensional patterns (i.e., "877"). Since the occurrence of a pattern indicates that the activity has a non-random order, the counts also provide a generalized characterization of the overall complexity of the activity patterns. A table similar to table 800 may be formed for each window defined, for example, at 410 in process 400 (FIG. 4).
Although table 800 includes a separate row and a separate entry for each type of activity pattern, this need not be the case. For example, one or more counts (e.g., the counts of the simpler patterns) may be omitted from table 800 and from the determination of complexity. As another example, in some implementations, a single row or entry may include counts of the occurrences of multiple activity patterns.
Although fig. 8 presents the counts in a table 800, this is not necessarily the case. For example, the counts may be presented as a vector (e.g., <2032, 877, 133, 66, 48, ...>). Regardless of how the counts are presented, in some implementations they may be expressed in binary and be compatible with digital data processing infrastructure.
In some implementations, the counts of pattern occurrences may be weighted or combined to determine the degree of ordering or the complexity, e.g., at 420 in process 400 (FIG. 4). For example, the Euler characteristic provides an approximation of the complexity of the activity and is given by the equation:
S0 - S1 + S2 - S3 + ...    (Equation 1)
where Sn is the number of occurrences of patterns of n points (i.e., patterns of dimension n - 1). The patterns may be, for example, the directed clique patterns 500 (FIG. 5).
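A minimal sketch of Equation 1, computed from per-window pattern counts; the values for S3 and S4 come from the example counts of table 800 and the example vector above, and the remaining values are purely illustrative:

```python
# Euler characteristic of Equation 1 from pattern counts, where
# counts[n] is S_n, the number of occurrences of patterns of n points.
def euler_characteristic(counts):
    return sum(((-1) ** n) * s for n, s in enumerate(counts))

# S_3 = 2032 (three-point patterns) and S_4 = 877 come from table 800;
# S_5..S_7 follow the example vector; S_0..S_2 are set to zero here.
S = [0, 0, 0, 2032, 877, 133, 66, 48]
print(euler_characteristic(S))  # -> -1270
```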
As another example of how the numbers of pattern occurrences may be weighted to determine the degree of ordering or the complexity, in some implementations pattern occurrences may be weighted based on the weights of the active links. In more detail, as discussed previously, the strength of the connections between nodes in an artificial neural network may vary, for example, as a result of the activity of those connections during training. The occurrence of a pattern of activity along a set of relatively strong links may be weighted differently from the occurrence of the same pattern along a set of relatively weak links. For example, in some implementations, the sum of the weights of the active links may be used to weight the occurrence.
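As an illustration of such weighting, assuming each occurrence of a pattern is recorded together with the weights of its active links (all values hypothetical):

```python
# Illustrative weighting of pattern occurrences by summed link weights.
def weighted_count(occurrences):
    """occurrences: list of lists of link weights, one list per
    occurrence of the pattern."""
    return sum(sum(link_weights) for link_weights in occurrences)

strong = [[0.9, 0.8, 0.95]]  # one occurrence over strong links
weak   = [[0.1, 0.2, 0.15]]  # same pattern over weak links
print(weighted_count(strong), weighted_count(weak))  # approx. 2.65 and 0.45
```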
In some implementations, the Euler characteristic or another measure of complexity may be normalized by the total number of patterns matched within a particular window and/or by the total number of patterns the network could form given its structure. An example of normalization with respect to the total number of patterns the network could form is given below in Equations 2 and 3.
In some implementations, occurrences of higher-dimensional patterns involving larger numbers of nodes may be weighted more heavily than occurrences of lower-dimensional patterns involving smaller numbers of nodes. For example, the probability of forming a directed clique decreases rapidly with increasing dimension: to form an n-clique from n+1 nodes, all (n+1)n/2 edges must be correctly oriented. This probability may be reflected in the weighting.
In some implementations, both the dimension and the directionality of the patterns can be used to weight the occurrence of the patterns and determine the complexity of the activity. For example, referring to FIG. 6, in view of the differences in directionality between the five-point, four-dimensional pattern 515 and the five-point, four-dimensional patterns 605, 610, occurrences of pattern 515 may be weighted more heavily than occurrences of patterns 605, 610.
One example of using both the directionality and the dimension of patterns to determine the degree of ordering or the complexity of activity is given by the following equation:
[Equation 2]
where Sx_active denotes the number of occurrences of active patterns of x points, and ERN denotes the corresponding quantity computed for an equivalent random network (i.e., a network with the same number of nodes, randomly connected). Further, SC is given by the following equation:
[Equation 3]
where Sx_silent denotes the number of occurrences of patterns of x points when the recurrent artificial neural network is silent, which can be taken to represent the total number of patterns the network could form. In Equations 2 and 3, the patterns may be, for example, the directed clique patterns 500 (FIG. 5).
Fig. 9 is a schematic illustration of the determination of a specific time of an activity pattern with distinguishable complexity. The determination illustrated in fig. 9 may be performed in isolation or in conjunction with other activities. This determination may be performed, for example, at 425 in process 400 (fig. 4).
Fig. 9 includes a graph 905 and a graph 910. Graph 905 illustrates occurrences of patterns as a function of time along the x-axis. In particular, individual occurrences are schematically illustrated as vertical lines 906, 907, 908, 909. Each row of occurrences may be instances of activity matching a respective pattern or class of patterns. For example, the top row of occurrences may be instances of activity matching pattern 505 (FIG. 5), the second row instances of activity matching pattern 510 (FIG. 5), the third row instances of activity matching pattern 515 (FIG. 5), and so on.
Graph 905 also includes dashed rectangles 915, 920, 925 that schematically depict different time windows in which the activity patterns have distinguishable complexity. As shown, during the windows depicted by the dashed rectangles 915, 920, 925, the activity in the recurrent artificial neural network is more likely to match patterns indicative of complexity than it is outside those windows.
Graph 910 illustrates the complexity associated with these occurrences as a function of time along the x-axis. Graph 910 includes a first peak 930 in complexity that coincides with the window depicted by the dashed rectangle 915, and a second peak 935 in complexity that coincides with the windows depicted by the dashed rectangles 920, 925. As shown, the complexity illustrated by peaks 930, 935 is distinguishable from a baseline level 940 of complexity.
In some implementations, the times at which the output of the recurrent artificial neural network is read coincide with the occurrence of activity patterns having distinguishable complexity. For example, in the illustrative scenario of FIG. 9, the output of the recurrent artificial neural network may be read at the peaks 930, 935, i.e., during the windows depicted by the dashed rectangles 915, 920, 925.
The identification of distinguishable levels of complexity in a recurrent artificial neural network is particularly beneficial when the input is a data stream. Examples of data streams include, for example, video or audio data. Although the data stream has a start, it is generally desirable to process information in the data stream that does not have a predefined relationship to the start of the data stream. By way of example, the neural network may perform object recognition, such as, for example, recognizing a bicyclist in the vicinity of a car. Such a neural network should be able to identify cyclists regardless of when those cyclists appear in the video stream, i.e. regardless of the time since the start of the video. Continuing with this example, when the data stream is input into the object recognition neural network, any pattern of activity in the neural network will typically exhibit a low or quiescent level of complexity. These low or quiescent levels of complexity are exhibited despite continuous (or near continuous) input of streaming data into the neural network device. However, when an object of interest appears in the video stream, the complexity of the activity will become distinguishable and indicate the time at which the object was identified in the video stream. Thus, a particular time of a distinguishable level of complexity of an activity may also serve as a yes/no output as to whether data in the data stream meets certain criteria.
In some implementations, the activity patterns having distinguishable complexity give not only the timing of the output of the recurrent artificial neural network but also its content. In particular, the identity and activity of the nodes participating in activity commensurate with the activity patterns may be treated as the output of the recurrent artificial neural network. The identified activity patterns thus represent the result of processing by the neural network, as well as the timing at which this decision is to be read.
The content of the decision can be expressed in a variety of different forms. For example, in some implementations and as discussed in further detail below, the content of the decision may be expressed as a binary vector or matrix of 1s and 0s. Each digit may indicate, for example, whether a pattern of activity exists for a predefined group of nodes and/or a predefined duration. In such an implementation, the content of the decision is expressed in binary and is compatible with conventional digital data processing infrastructure.
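A sketch of such a binary expression, assuming a predefined ordering of (node group, pattern) pairs; all names below are hypothetical:

```python
# Each position in the output vector is tied in advance to a
# (node group, pattern) pair; the bit records whether matching
# activity occurred within the window.
pattern_slots = [("group_1", "2D_simplex"), ("group_2", "3D_simplex"),
                 ("group_3", "2D_simplex")]
matched = {("group_1", "2D_simplex"), ("group_3", "2D_simplex")}

output_vector = [1 if slot in matched else 0 for slot in pattern_slots]
print(output_vector)  # -> [1, 0, 1]
```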
FIG. 10 is a flow diagram of a process 1000 for encoding a signal using a recurrent artificial neural network based on characterization of activity in the network. The signal may be encoded in a variety of different scenarios, such as, for example, transmission, encryption, and data storage. Process 1000 may be performed by a system of one or more data processing apparatus that perform operations according to logic of one or more sets of machine-readable instructions. For example, process 1000 may be performed by the same system of one or more computers executing software for implementing the recurrent artificial neural network used in process 1000. In some instances, process 1000 may be performed by the same data processing apparatus that performs process 400. In some examples, process 1000 may be performed by, for example, an encoder in a signal transmission system or an encoder of a data storage system.
At 1005, the system performing process 1000 inputs a signal into the recurrent artificial neural network. In some cases, the input of the signal is a discrete injection event. In other cases, the input signal is streamed into a recurrent artificial neural network.
At 1010, the system performing process 1000 identifies one or more decision moments in the recurrent artificial neural network. For example, the system may identify one or more decision moments by performing process 400 (FIG. 4).
At 1015, the system performing process 1000 reads the output of the recurrent artificial neural network. As discussed above, in some implementations, the content of the output of the recurrent artificial neural network is activity in the neural network that matches the pattern used to identify the decision point.
In some implementations, a separate "reader node" may be added to the neural network to identify the occurrence of a particular pattern of activity at a particular set of nodes and thus to read the output of the recurrent artificial neural network at 1015. A reader node fires if and only if the activity at the particular set of nodes satisfies particular timing (and possibly also magnitude) criteria. For example, to read the occurrence of pattern 505 (FIG. 5) at nodes 104, 105, 106 (FIGS. 2, 3), a reader node may be connected to nodes 104, 105, 106 (or to the links 110 between them). The reader node itself becomes active only when a pattern of activity involving nodes 104, 105, 106 (or their links) occurs.
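For illustration only, a reader node might be sketched as follows, assuming a simple timing criterion (the spread between the first and last firings of the watched nodes); names and thresholds are hypothetical:

```python
# Sketch of a reader node: fires only when all watched nodes have
# fired within a bounded time of one another.
class ReaderNode:
    def __init__(self, watched, max_spread):
        self.watched = set(watched)    # nodes whose activity is monitored
        self.max_spread = max_spread   # max time between first/last firing

    def fires(self, firing_times):
        """firing_times: dict mapping node -> time of most recent firing."""
        if not self.watched <= firing_times.keys():
            return False
        times = [firing_times[n] for n in self.watched]
        return max(times) - min(times) <= self.max_spread

reader = ReaderNode(watched=["n104", "n105", "n106"], max_spread=2.0)
print(reader.fires({"n104": 10.0, "n105": 10.5, "n106": 11.2}))  # -> True
```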
The use of such reader nodes eliminates the need to define time windows for the recurrent artificial neural network as a whole. In particular, individual reader nodes may be connected to different nodes and/or to multiple nodes (or the links between them). Individual reader nodes may be configured with customized responses (e.g., different decay times in an integrate-and-fire model) to identify different activity patterns.
At 1020, the system performing process 1000 transmits or stores the output of the recurrent artificial neural network. The particular action performed at 1020 may reflect the scenario in which process 1000 is being used. For example, in a scenario where secure or compressed communication is desired, the system executing process 1000 may transmit the output of the recurrent neural network to a receiver that has access to the same or a similar recurrent neural network. As another example, in a scenario where secure or compressed data storage is desired, the system performing process 1000 may record the output of the recurrent neural network in one or more machine-readable data storage devices for later access.
In some implementations, less than the complete output of the recurrent neural network is transmitted or stored. For example, in implementations where the content of the output is activity in the neural network that matches patterns indicative of complexity, only activity matching relatively more complex or higher-dimensional patterns may be transmitted or stored. By way of example, referring to the patterns 500 (FIG. 5), in some implementations only activity matching patterns 515, 520, 525, and 530 is transmitted or stored, while activity matching patterns 505, 510 is ignored or discarded. Such lossy processing reduces the amount of data transmitted or stored at the expense of the completeness of the encoded information.
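A minimal sketch of such lossy filtering, assuming each recorded occurrence is tagged with its pattern's dimension (names and values are illustrative):

```python
# Keep only occurrences of patterns at or above a dimension cutoff
# before transmitting or storing the output.
occurrences = [("pattern_505", 2), ("pattern_510", 3),
               ("pattern_515", 4), ("pattern_530", 7)]  # (name, dimension)
kept = [(name, dim) for name, dim in occurrences if dim >= 4]
print(kept)  # -> [('pattern_515', 4), ('pattern_530', 7)]
```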
Fig. 11 is a flow diagram of a process 1100 for decoding a signal using a recurrent artificial neural network based on characterization of activity in the network. The signal may be decoded in a variety of different scenarios, such as, for example, signal reception, decryption, and reading of data from storage. Process 1100 may be performed by a system of one or more data processing apparatus that perform operations according to logic of one or more sets of machine-readable instructions. For example, the process 1100 may be performed by the same system of one or more computers executing software for implementing the recurrent artificial neural network used in the process 1100. In some examples, process 1100 may be performed by the same data processing apparatus that performs process 400 and/or process 1000. In some examples, process 1100 may be performed by, for example, a decoder in a signal receiving system or a decoder of a data storage system.
At 1105, the system performing process 1100 receives at least a portion of the output of the recurrent artificial neural network. The particular action performed at 1105 may reflect the scenario in which process 1100 is being used. For example, the system performing process 1100 may receive a transmitted signal that includes the output of the recurrent artificial neural network, or may read a machine-readable data storage device storing that output.
At 1110, the system performing process 1100 reconstructs inputs to the recurrent artificial neural network from the received outputs. Reconstruction can be performed in a variety of different ways. For example, in some implementations, a second artificial neural network (recurrent or acyclic) may be trained to reconstruct inputs into the recurrent neural network from outputs received at 1105.
As another example, in some implementations, a decoder trained using machine learning (including but not limited to deep learning) may reconstruct inputs into the recurrent neural network from the outputs received at 1105.
As yet another example, in some implementations, the inputs into the same recurrent artificial neural network or into a similar recurrent artificial neural network may be iteratively permuted until the output of the recurrent artificial neural network matches, to some extent, the output received at 1105.
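The following is a minimal sketch of such an iterative permutation search, with a toy deterministic function standing in for the shared recurrent artificial neural network; the encode function and the exhaustive search over bit patterns are illustrative assumptions.

```python
import itertools

def encode(bits, key=0b1011):
    """Toy stand-in for the shared recurrent artificial neural network:
    any deterministic, hard-to-invert input-to-output map suffices for
    illustrating the search (this hash-like mix is an assumption)."""
    out = key
    for b in bits:
        out = (out * 31 + b) % 257
    return out

def decode_by_permutation(target_output, n_bits=8):
    """Iterate over candidate inputs until the network's output matches
    the received output to a sufficient degree (here: exactly)."""
    for candidate in itertools.product((0, 1), repeat=n_bits):
        if encode(candidate) == target_output:
            return candidate
    return None

original = (1, 0, 1, 1, 0, 0, 1, 0)
received = encode(original)
# Recovers an input whose output matches the received output.
print(decode_by_permutation(received))
```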
In some implementations, the process 1100 can include receiving user input specifying the extent to which the input is to be reconstructed and, in response, adjusting the reconstruction accordingly at 1110. For example, the user input may specify that a complete reconstruction is not required. In response, the system performing process 1100 adjusts the reconstruction. For example, in an implementation in which the content of the output of the recurrent neural network is activity in the neural network that matches patterns indicative of complexity in the activity, only the output characterizing activity that matches the relatively more complex or higher-dimensional patterns will be used to reconstruct the input. By way of example, referring to pattern 500 (fig. 5), in some implementations, only the activity matching patterns 515, 520, 525, and 530 may be used to reconstruct the input, while the activity matching patterns 505, 510 may be ignored or discarded. In this way, lossy reconstruction can be performed under selected circumstances.
In some implementations, the processes 1000, 1100 may be used for peer-to-peer encrypted communications. In particular, both the sender (i.e., the encoder) and the receiver (i.e., the decoder) may be provided with the same recurrent artificial neural network. There are several ways in which a shared recurrent artificial neural network can be customized to ensure that third parties cannot reverse engineer it and decrypt the signal, including:
- the architecture of the recurrent artificial neural network,
- the functional settings of the recurrent artificial neural network, including node states and edge weights,
- the size (or dimension) of the patterns, and
- the fraction of the patterns used in each dimension.
These parameters may be considered as multiple layers that together ensure the security of the transmission. Furthermore, in some implementations, the decision moment, i.e., the point in time at which the output of the network is read, may be used as a key to decrypt the signal.
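Purely for illustration, such a multi-layer key might be collected into a single structure as sketched below; the field names and types are assumptions, not elements of the processes described above.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class SharedNetworkKey:
    """Hypothetical composite key for peer-to-peer use; each field is one
    of the layers listed above (all field names/types are assumptions)."""
    architecture_id: str                     # which shared network topology
    weights_digest: str                      # functional settings: node states and edge weights
    pattern_dimensions: Tuple[int, ...]      # sizes (dimensions) of the patterns used
    patterns_per_dimension: Tuple[int, ...]  # fraction/count of patterns in each dimension
    decision_moment: float                   # point in time at which the output is read

sender_key = SharedNetworkKey("net-A", "9f3b17ac", (2, 3, 4), (10, 6, 2), 42.0)
receiver_key = SharedNetworkKey("net-A", "9f3b17ac", (2, 3, 4), (10, 6, 2), 42.0)
assert sender_key == receiver_key  # decoding succeeds only with an identical key
```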
Although the processes 1000, 1100 are presented in terms of encoding and decoding with a single recurrent artificial neural network, the processes 1000, 1100 may also be applied in systems and processes that rely on multiple recurrent artificial neural networks. These recurrent artificial neural networks may operate in parallel or in series.
As one example of series operation, the output of a first recurrent artificial neural network may be used as the input of a second recurrent artificial neural network. The resulting output of the second recurrent artificial neural network is a twice-encoded (or twice-encrypted) version of the input into the first recurrent artificial neural network. Such a serial arrangement of recurrent artificial neural networks may be useful in situations where different parties have different levels of access to the information, for example, in a medical record system where patient identity information may not be accessible to the party that will use and have access to the remainder of the medical record.
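A minimal sketch of this serial arrangement follows, with toy deterministic transforms standing in for the two recurrent artificial neural networks.

```python
def encode_stage(data, salt):
    """Toy stand-in for one recurrent artificial neural network: any
    deterministic transform suffices for illustration (this one is an
    assumption)."""
    return [(x * 7 + salt) % 256 for x in data]

patient_record = [12, 200, 33, 90]

once_encoded = encode_stage(patient_record, salt=17)    # first network
twice_encoded = encode_stage(once_encoded, salt=101)    # second network

# A party with access only to the second stage can work back to the
# once-encoded form, while the original record remains protected.
print(once_encoded, twice_encoded)
```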
As one example of parallel operation, the same information may be input into a plurality of different recurrent artificial neural networks. The different outputs of these neural networks can be used, for example, to ensure that the input can be reconstructed with high fidelity.
While multiple implementations have been described, various modifications may be made. For example, while the description generally indicates that activity in a recurrent artificial neural network should match a pattern in order to indicate ordering, this is not necessarily the case. Rather, in some implementations, activity in the recurrent artificial neural network may be commensurate with a pattern without displaying activity that exactly matches the pattern. For example, an increase in the likelihood that the recurrent neural network will display activity matching the pattern may itself be considered a non-random ordering of activity.
As yet another example, in some implementations, different sets of patterns may be customized for use in characterizing activity in different recurrent artificial neural networks. The patterns may be customized, for example, according to the effectiveness of the patterns in characterizing the activity of different recurrent artificial neural networks. The effectiveness may be quantified, for example, based on the size of a table or vector representing the occurrence counts of the different patterns.
As yet another example, in some implementations, the patterns used to characterize activity in the recurrent artificial neural network may take into account the strength of the connections between nodes. In other words, the patterns described previously herein treat all signaling activity between two nodes in a binary manner (i.e., the activity is either present or absent). This is not necessarily the case. Rather, in some implementations, activity may need to traverse a connection of a certain level or strength in order to be considered commensurate with a pattern indicating ordered complexity in the activity of the recurrent artificial neural network.
As yet another example, the content of the output of the recurrent artificial neural network may include patterns of activity that occur outside of a time window within which the activity in the neural network has a distinguishable level of complexity. For example, the output of the recurrent artificial neural network read at 1015 and transmitted or stored at 1020 (fig. 10) may include information encoding activity patterns that occur outside of the dashed rectangles 915, 920, 925 in the graph 905 (fig. 9). By way of example, the output of the recurrent artificial neural network can characterize only the highest-dimensional patterns of activity, regardless of when those patterns of activity occur. As another example, the output of the recurrent artificial neural network can characterize only the patterns of activity that enclose cavities, regardless of when those patterns of activity occur.
Figs. 12, 13, and 14 are schematic illustrations of a binary form or representation 1200 of a topological structure such as, for example, a pattern of activity in a neural network. The representations illustrated in figs. 12, 13, and 14 all include the same information, namely, indications of the presence or absence of features in a graph. A feature may be, for example, an activity in a neural network device. In some implementations, the activity is identified during a period of time within which the activity in the neural network has a complexity that is distinguishable from other activity responsive to the input.
As shown, binary representation 1200 includes bits 1205, 1207, 1211, 1293, 1294, 1297, and an arbitrary number of additional bits (represented by the ellipsis "..."). For purposes of teaching, bits 1205, 1207, 1211, 1293, 1294, 1297, ... are illustrated as discrete rectangular shapes that are filled or unfilled to indicate the binary value of each bit. In the schematic illustrations, the representation 1200 appears superficially as a one-dimensional vector of bits (figs. 12, 13) or a two-dimensional matrix of bits (fig. 14). However, representation 1200 differs from vectors, matrices, and other ordered sets of bits in that the same information can be encoded regardless of the order of the bits, i.e., regardless of the position of the individual bits within the set.
For example, in some implementations, each individual bit 1205, 1207, 1211, 1293, 1294, 1297, ... may represent the presence or absence of a topological feature, regardless of the location of that feature in the graph. By way of example, referring to fig. 2, a bit such as bit 1207 may indicate the presence of a topological feature commensurate with pattern 505 (fig. 5), regardless of whether the activity occurred between nodes 104, 105, 101 or between nodes 105, 101, 102. Thus, while each individual bit 1205, 1207, 1211, 1293, 1294, 1297, ... may be associated with a particular feature, the location of that feature in the graph need not be encoded, for example, by the corresponding location of that bit in the representation 1200. In other words, in some implementations, the representation 1200 may provide only an isomorphic reconstruction of the graph.
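This order independence can be sketched by keying each bit by the feature whose presence or absence it reports rather than by its position; the feature names below are illustrative assumptions.

```python
# Three different bit orderings; because each bit is keyed by the feature
# it reports (the feature names are illustrative), all three encode the
# same representation 1200.
rep_a = {"2-simplex present": 1, "3-simplex present": 1, "cavity present": 0}
rep_b = {"cavity present": 0, "2-simplex present": 1, "3-simplex present": 1}
rep_c = {"3-simplex present": 1, "cavity present": 0, "2-simplex present": 1}

assert rep_a == rep_b == rep_c  # the ordering of the bits is irrelevant
```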
Further, in other implementations, the locations of the individual bits 1205, 1207, 1211, 1293, 1294, 1297, ... may indeed encode information such as, for example, the location of a feature in the graph. In these implementations, the source graph may be reconstructed using representation 1200. However, such encoding need not be present.
Because a bit can represent the presence or absence of a topological feature regardless of that feature's position in the graph, the position of the bit itself is free to change. In fig. 12, bit 1205 appears at the beginning of representation 1200, before bit 1207, and bit 1207 appears before bit 1211. In figs. 13 and 14, the order of bits 1205, 1207, and 1211 within representation 1200, and their positions relative to the other bits in representation 1200, have changed. However, the binary representation 1200 remains the same, and the rule set or algorithm that defines the process for encoding the information in binary representation 1200 also remains unchanged. As long as the correspondence between the bits and the features is known, the position of a bit within representation 1200 is irrelevant.
In more detail, each bit 1205, 1207, 1211, 1293, 1294, 1297, ... individually represents the presence or absence of a feature in a graph. A graph is a set of nodes and a set of edges between those nodes. The nodes may correspond to objects. Examples of objects include, for example, artificial neurons in a neural network, individuals in a social network, and so forth. Edges may correspond to some relationship between the objects. Examples of relationships include, for example, a structural connection or activity along a connection. In the context of a neural network, artificial neurons may be related by a structural connection between the neurons or by the transmission of information along a structural connection. In the context of a social network, individuals may be related by a "friend" or other relationship connection or by the transmission of information (e.g., posts) along such a connection. An edge may thus characterize a relatively long-lived structural feature of a set of nodes or a relatively transient activity that occurs within a defined time frame. Further, edges may be directed or bidirectional. A directed edge indicates the directionality of the relationship between the objects. For example, the transmission of information from a first neuron to a second neuron may be represented by a directed edge that represents the direction of transmission. As another example, in a social network, a relationship connection may indicate that a second user will receive information from a first user, but not that the first user will receive information from the second user. In topological terms, a graph can be expressed as a set of unit intervals [0, 1], where 0 and 1 are identified with the respective nodes that are connected by an edge.
The features whose presence or absence is indicated by bits 1205, 1207, 1211, 1293, 1294, 1297 may be, for example, a node, a set of nodes, a set of sets of nodes, a set of edges, a set of sets of edges, and/or hierarchically more complex features still (e.g., a set of sets of sets of nodes). The bits 1205, 1207, 1211, 1293, 1294, 1297 generally represent the presence or absence of features at different hierarchical levels. For example, one bit (e.g., bit 1205) may indicate the presence or absence of a node, while another bit (e.g., bit 1207) may indicate the presence or absence of a group of nodes.
In some implementations, bits 1205, 1207, 1211, 1293, 1294, 1297 may represent a feature in the graph having some characteristic at a threshold level. For example, bits 1205, 1207, 1211, 1293, 1294, 1297 may not only indicate that there is activity in a set of edges, but may also indicate that this activity is weighted above or below a threshold level. The weights may, for example, embody training of the neural network device for a particular purpose or may be inherent features of the edges.
Figs. 5, 6, and 8 above illustrate features whose presence or absence may be represented by bits 1205, 1207, 1211, 1293, 1294, 1297, ....
The directed simplices in the sets 500, 600, 700 treat the functional or structural graph as a topological space, with the nodes as points. Structure or activity involving one or more nodes and links that is commensurate with a simplex in the sets 500, 600, 700 may be represented by a bit, regardless of the identity of the particular nodes and/or links participating in that structure or activity.
In some implementations, only some patterns of structures or activities are identified and/or some portion of the identified patterns of structures or activities are discarded or otherwise ignored. For example, referring to FIG. 5, structures or activities commensurate with a five-point, four-dimensional simplex pattern 515 inherently include structures or activities commensurate with four-point, three-dimensional, and three-point, two-dimensional simplex patterns 510, 505. For example, points 0, 2, 3, 4 and points 1, 2, 3, 4 in the four-dimensional simplex pattern 515 of FIG. 5 are all commensurate with the three-dimensional simplex pattern 510. In some implementations, simplex patterns that contain fewer points, and thus have lower dimensions, may be discarded or otherwise ignored.
As another example, only some patterns of structure or activity need be identified. For example, in some implementations, only patterns with an odd number of points (3, 5, 7, ...) or an even number of dimensions (2, 4, 6, ...) are used.
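A sketch of both filtering rules, i.e., discarding the lower-dimensional faces that a maximal simplex inherently contains and keeping only patterns with a chosen number of points, follows; the face enumeration is illustrative.

```python
from itertools import combinations

def faces(simplex):
    """All proper faces of a directed simplex given as an ordered node
    tuple; combinations() preserves the source ordering, so each face
    keeps the direction of the original simplex."""
    return {c for r in range(2, len(simplex))
              for c in combinations(simplex, r)}

# A five-point (4-D) simplex inherently contains all of its four-point
# (3-D) and three-point (2-D) faces, so those faces can be discarded
# as redundant.
maximal = (0, 1, 2, 3, 4)
implied = faces(maximal)
print((0, 2, 3, 4) in implied)  # True: commensurate with pattern 510
print((1, 2, 3, 4) in implied)  # True

# Alternatively, keep only patterns with an odd number of points:
patterns = [(0, 1, 2), (0, 1, 2, 3), (0, 1, 2, 3, 4)]
odd_point_patterns = [p for p in patterns if len(p) % 2 == 1]
print(odd_point_patterns)       # the 3-point and 5-point patterns remain
```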
Returning to figs. 12, 13, 14, the features whose presence or absence is indicated by bits 1205, 1207, 1211, 1293, 1294, 1297, ... need not be independent of one another. By way of explanation, if bits 1205, 1207, 1211, 1293, 1294, 1297 represent the presence or absence of zero-dimensional simplices, each reflecting the presence or activity of only a single node, then the bits are independent of one another. However, if bits 1205, 1207, 1211, 1293, 1294, 1297 represent the presence or absence of higher-dimensional simplices, each reflecting the presence or activity of multiple nodes, then the information encoded by the presence or absence of each individual feature may not be independent of the presence or absence of the other features.
Fig. 15 schematically illustrates an example of how the presence or absence of features corresponding to different bits may fail to be independent of one another. In particular, a sub-graph 1500 is illustrated that includes four nodes 1505, 1510, 1515, 1520 and six directed edges 1525, 1530, 1535, 1540, 1545, 1550. In particular, edge 1525 points from node 1505 to node 1510, edge 1530 points from node 1515 to node 1505, edge 1535 points from node 1520 to node 1505, edge 1540 points from node 1520 to node 1510, edge 1545 points from node 1515 to node 1510, and edge 1550 points from node 1515 to node 1520.
A single bit in representation 1200 (e.g., filled bit 1207 in figs. 12, 13, 14) may indicate the presence of a directed three-dimensional simplex. For example, such a bit may indicate the presence of the three-dimensional simplex formed by nodes 1505, 1510, 1515, 1520 and edges 1525, 1530, 1535, 1540, 1545, 1550. A second bit in the representation 1200 (e.g., filled bit 1293 in figs. 12, 13, 14) may indicate the presence of a directed two-dimensional simplex. For example, such a bit may indicate the presence of the two-dimensional simplex formed by nodes 1515, 1505, 1510 and edges 1525, 1530, 1545. In this simple example, the information encoded by bit 1293 is fully redundant with the information encoded by bit 1207.
Note that the information encoded by bit 1293 may also be redundant in combination with the information encoded by further bits. For example, the information encoded by bit 1293, together with third and fourth bits indicating the presence of the additional directed two-dimensional simplices, would likewise be redundant with bit 1207. Examples of these simplices are the one formed by nodes 1515, 1520, 1510 and edges 1540, 1545, 1550 and the one formed by nodes 1520, 1505, 1510 and edges 1525, 1535, 1540.
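This redundancy can be checked mechanically. The following sketch enumerates the directed simplices of sub-graph 1500 by brute force (an illustrative method, not the one used in the implementations described above) and confirms that the three-dimensional simplex implies every one of its two-dimensional faces.

```python
from itertools import combinations, permutations

# Sub-graph 1500: directed edges as (source, target) pairs.
edges = {(1505, 1510), (1515, 1505), (1520, 1505),
         (1520, 1510), (1515, 1510), (1515, 1520)}
nodes = {1505, 1510, 1515, 1520}

def is_directed_simplex(node_set):
    """True if some ordering of the nodes has an edge from every earlier
    node to every later node (the usual directed-simplex condition)."""
    return any(all((order[i], order[j]) in edges
                   for i in range(len(order))
                   for j in range(i + 1, len(order)))
               for order in permutations(node_set))

# The single 3-D simplex (bit 1207) implies all of its 2-D faces (bit
# 1293 among them), so those face bits carry redundant information.
print(is_directed_simplex(nodes))  # True: the 3-D simplex is present
for tri in combinations(sorted(nodes), 3):
    print(tri, is_directed_simplex(tri))  # every triangle is also present
```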
Fig. 16 schematically illustrates another example of how the presence or absence of features corresponding to different bits may fail to be independent of one another. In particular, a sub-graph 1600 is illustrated that includes four nodes 1605, 1610, 1615, 1620 and five directed edges 1625, 1630, 1635, 1640, 1645. Nodes 1605, 1610, 1615, 1620 and edges 1625, 1630, 1635, 1640, 1645 generally correspond to nodes 1505, 1510, 1515, 1520 and edges 1525, 1530, 1535, 1540, 1545 in sub-graph 1500 (fig. 15). However, in contrast to sub-graph 1500, where nodes 1515, 1520 are connected by edge 1550, nodes 1615, 1620 are not connected by an edge.
A single bit in representation 1200 (e.g., unfilled bit 1205 in figs. 12, 13, 14) may indicate the absence of a directed three-dimensional simplex, such as, for example, a directed three-dimensional simplex containing nodes 1605, 1610, 1615, 1620. A second bit in the representation 1200 (e.g., filled bit 1293 in figs. 12, 13, 14) may indicate the presence of a two-dimensional simplex. An exemplary directed two-dimensional simplex is formed by nodes 1615, 1605, 1610 and edges 1625, 1630, 1645. The combination of filled bit 1293 and unfilled bit 1205 provides information about the presence or absence of other features, and hence about the state of other bits that may or may not be present in representation 1200. In particular, the combination of the absence of the directed three-dimensional simplex and the presence of the directed two-dimensional simplex indicates that at least one edge is absent from either:
a) the possible directed two-dimensional simplex formed by nodes 1615, 1620, 1610, or
b) the possible directed two-dimensional simplex formed by nodes 1620, 1605, 1610.
Thus, the state of a bit representing the presence or absence of either of these possible simplices is not independent of the states of bits 1205, 1293.
Although these examples have been discussed in terms of features with different numbers of nodes and hierarchical relationships, this need not be the case. For example, representation 1200 may include only a set of bits corresponding to features of a single kind, e.g., the presence or absence of three-dimensional simplices.
The use of separate bits to represent the presence or absence of features in a graph yields certain characteristics. For example, the encoding of the information is fault tolerant and provides "graceful degradation" of the encoded information. In particular, the loss of a particular bit (or group of bits) may increase the uncertainty as to the presence or absence of a feature. However, the likelihood of the presence or absence of that feature can still be assessed from other bits that indicate the presence or absence of neighboring features.
Also, as the number of bits increases, the certainty as to the presence or absence of a feature increases.
As another example, as discussed above, the ordering or arrangement of the bits is independent of the isomorphic reconstruction of the graph represented by the bits. All that is required is a known correspondence between the bits and the particular nodes/structures in the graph.
In some implementations, patterns of activity in a neural network may be encoded in the representation 1200 (figs. 12, 13, and 14). In general, the patterns of activity in a neural network are the result of many characteristics of that neural network, such as, for example, the structural connections between the nodes of the neural network, the weights between nodes, and a host of other possible parameters. For example, in some implementations, the neural network may have been trained prior to the patterns of activity being encoded in representation 1200.
However, regardless of whether the neural network is untrained or trained, for a given input, the responsive patterns of activity can be considered a "representation" or "abstraction" of that input within the neural network. Thus, although the representation 1200 may appear to be a straightforward set of (in some cases, binary) digits, each of those digits may encode a relationship or correspondence between a particular input and relevant activity in the neural network.
Figs. 17, 18, 19, 20 are schematic illustrations of the use of representations of the occurrence of topological structures in activity in a neural network in four different classification systems 1700, 1800, 1900, 2000. The classification systems 1700, 1800 each classify representations of patterns of activity in a neural network as part of the classification of an input. The classification systems 1900, 2000 each classify approximations of representations of patterns of activity in a neural network as part of the classification of an input. In the classification systems 1700, 1800, the represented patterns of activity occur in, and are read from, a source neural network device 1705 that is part of the classification system 1700, 1800. In contrast, in the classification systems 1900, 2000, the approximately represented patterns of activity occur in a source neural network device that is not part of the classification systems 1900, 2000. However, approximations of the representations of those patterns of activity are read from an approximator 1905 that is part of the classification system 1900, 2000.
In more detail, turning to fig. 17, the classification system 1700 includes a source neural network 1705 and a linear classifier 1710. The source neural network 1705 is a neural network device configured to receive input and present a representation of the occurrence of a topology in an activity in the source neural network 1705. In the illustrated implementation, the source neural network 1705 includes an input layer 1715 that receives input. However, this is not necessarily the case. For example, in some implementations, some or all of the inputs may be injected into different layers and/or edges or nodes throughout the source neural network 1705.
The source neural network 1705 can be any of a variety of different types of neural networks. Typically, the source neural network 1705 is a recurrent neural network, such as, for example, a recurrent neural network that mimics a biological system. In some cases, the source neural network 1705 may mimic the extent of morphological, chemical, and other features of a biological system. Typically, the source neural network 1705 is implemented on one or more computing devices (e.g., supercomputers) having a relatively high level of computational performance. In such a case, the classification system 1700 will typically be a decentralized system in which the remote classifier 1710 is in communication with the source neural network 1705, e.g., via a data communication network.
In some implementations, the source neural network 1705 may be untrained, and the represented activity may be intrinsic activity of the source neural network 1705. In other implementations, the source neural network 1705 may be trained, and the represented activity may embody this training.
The representation read from the source neural network 1705 may be a representation such as representation 1200 (fig. 12, 13, 14). The representation may be read from the source neural network 1705 in a variety of ways. For example, in the illustrated example, the source neural network 1705 includes a "reader node" that reads patterns of activity between other nodes in the source neural network 1705. In other implementations, the activity in the source neural network 1705 is read by a data processing component programmed to monitor a relatively highly ordered pattern of activity of the source neural network 1705. In other implementations, the source neural network 1705 may include an output layer from which the representation 1200 may be read, for example, when the source neural network 1705 is implemented as a feed-forward neural network.
The linear classifier 1710 is a device that classifies objects, that is, representations of patterns of activity in the source neural network 1705, based on linear combinations of features of the objects. Linear classifier 1710 includes an input 1720 and an output 1725. Input 1720 is coupled to receive a representation of a pattern of activity in source neural network 1705. In other words, the representation of the pattern of activity in the source neural network 1705 is a feature vector that represents features of the input into the source neural network 1705 that are used by the linear classifier 1710 to classify the input. The linear classifier 1710 may receive a representation of a pattern of activity in the source neural network 1705 in a variety of ways. For example, the representation of the pattern of activity may be received as discrete events or as a continuous stream over a real-time or non-real-time communication channel.
Output 1725 is coupled to output the classification results from linear classifier 1710. In the illustrated implementation, output 1725 is illustrated schematically as a parallel port having multiple channels. This is not necessarily the case. For example, output 1725 may output the classification result through a serial port or a port with combined parallel and serial capabilities.
In some implementations, the linear classifier 1710 may be implemented on one or more computing devices with relatively limited computing performance. For example, the linear classifier 1710 may be implemented on a personal computer or a mobile computing device such as a smartphone or tablet computer.
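For illustration, a minimal sketch of such a linear classifier, here logistic regression trained by gradient descent on synthetic binary representation vectors, follows; the data, dimensions, and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary representations (rows) standing in for feature vectors read
# from a source network, with synthetic linearly separable labels.
X = rng.integers(0, 2, size=(200, 32)).astype(float)
true_w = rng.normal(size=32)
y = (X @ true_w > 0).astype(float)

# Logistic-regression classifier trained by full-batch gradient descent.
w = np.zeros(32)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
    w -= 0.1 * X.T @ (p - y) / len(y)    # gradient step on the log loss

pred = (X @ w > 0).astype(float)
print("training accuracy:", (pred == y).mean())
```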
In fig. 18, classification system 1800 includes a source neural network 1705 and a neural network classifier 1810. The neural network classifier 1810 is a neural network device that classifies an object, that is, a representation of a pattern of activity in the source neural network 1705, based on a non-linear combination of features of the object. In the illustrated implementation, the neural network classifier 1810 is a feed-forward network that includes an input layer 1820 and an output layer 1825. As with the linear classifier 1710, the neural network classifier 1810 may receive representations of patterns of activity in the source neural network 1705 in a variety of ways. For example, the representation of the pattern of activity may be received as discrete events or as a continuous stream over a real-time or non-real-time communication channel.
In some implementations, the neural network classifier 1810 may perform the inference on one or more computing devices with relatively limited computational performance. For example, the neural network classifier 1810 may be implemented on a personal computer or a mobile computing device such as a smartphone or tablet computer, e.g., in a neural processing unit of such a device. Like classification system 1700, classification system 1800 will typically be a decentralized system in which remote neural network classifier 1810 communicates with source neural network 1705, e.g., via a data communications network.
In some implementations, the neural network classifier 1810 may be, for example, a deep neural network, such as a convolutional neural network that includes convolutional layers, pooling layers, and fully-connected layers. The convolutional layers may generate feature maps, for example, using linear convolution filters and/or non-linear activation functions. The pooling layers reduce the number of parameters and control overfitting. The computations performed by the different layers may be defined differently in different implementations of the neural network classifier 1810.
In fig. 19, a classification system 1900 includes a source approximator 1905 and a linear classifier 1710. As discussed further below, the source approximator 1905 is a relatively simple neural network that is trained to receive inputs (at the input layer 1915 or elsewhere) and to output vectors that approximate representations of the topological structures that occur in the patterns of activity in a relatively more complex neural network. For example, source approximator 1905 may be trained to approximate a recurrent source neural network, such as, for example, a recurrent neural network that mimics a biological system and includes a degree of the morphological, chemical, and other features of the biological system. In the illustrated implementation, source approximator 1905 includes an input layer 1915 and an output layer 1920. The input layer 1915 is coupled to receive the input data. The output layer 1920 is coupled to output an approximation of the representation of activity within a neural network device for receipt by the input 1720 of the linear classifier. For example, output layer 1920 may output an approximation 1200' of representation 1200 (figs. 12, 13, 14). Note that the representation 1200 schematically illustrated in figs. 17 and 18 and the approximation 1200' of representation 1200 schematically illustrated in figs. 19 and 20 are identical. This is for convenience only. In general, the approximation 1200' will differ from the representation 1200 in at least some respects. Despite these differences, the linear classifier 1710 can still classify the approximation 1200'.
In general, the source approximator 1905 may perform inference on one or more computing devices with relatively limited computing capabilities. For example, the source approximator 1905 may be implemented on a personal computer or a mobile computing device, such as a smartphone or tablet computer, e.g., in a neural processing unit of such a device. Generally and in contrast to the classification systems 1700, 1800, the classification system 1900 will typically be housed within a single housing, for example, with the source approximator 1905 and the linear classifier 1710 implemented on the same data processing device or on data processing devices coupled by a hardwired connection.
In fig. 20, classification system 2000 includes a source approximator 1905 and a neural network classifier 1810. The output layer 1920 of the source approximator 1905 is coupled to output an approximation 1200' of the representation of activity within a neural network device for receipt by the input layer 1820 of the neural network classifier 1810. Despite any differences between approximation 1200' and representation 1200, the neural network classifier 1810 can still classify the approximation 1200'. Generally and like classification system 1900, classification system 2000 will typically be housed within a single housing, for example, with the source approximator 1905 and the neural network classifier 1810 implemented on the same data processing device or on data processing devices coupled by a hardwired connection.
Fig. 21 is a schematic illustration of an edge device 2100 that includes a local artificial neural network that can be trained using a representation of the occurrence of a topology corresponding to activity in a source neural network. In this scenario, the local artificial neural network may be, for example, an artificial neural network that executes entirely on one or more local processors that do not require a communication network to exchange data. Typically, the local processors will be connected by a hard-wired connection. In some instances, the local processor may be housed within a single housing, such as a single personal computer or a single handheld, mobile device. In some instances, the local processor may be controlled and accessed by a single individual or a limited number of individuals. Indeed, by training (e.g., using supervised learning or reinforcement learning techniques) a simpler and/or less highly trained but more unique second neural network using a representation of the occurrence of topology in a more complex source neural network, even individuals with limited computational resources and a limited number of training samples can train the neural network as needed. Storage requirements and computational complexity during training are reduced and resources like battery life are saved.
In the illustrated implementation, the edge device 2100 is schematically illustrated as a security camera device that includes an optical imaging system 2110, image processing electronics 2115, a source approximator 2120, a representation classifier 2125, and a communication controller and interface 2130.
The optical imaging system 2110 may comprise, for example, one or more lenses (or even a pinhole) and a CCD device. Image processing electronics 2115 can read the output of the optical imaging system 2110 and can typically perform basic image processing functions. Communication controller and interface 2130 is a device configured to control the flow of information to and from device 2100. Among the operations that may be performed by the communications controller and interface 2130 are the transmission of images of interest to other devices and the reception of training information from other devices, as discussed further below. Thus, the communication controller and interface 2130 may include both a data transmitter and receiver, which may communicate through, for example, a data port 2135. Data port 2135 may be a wired port, a wireless port, an optical port, etc.
The source approximator 2120 is a relatively simple neural network that is trained to output vectors that approximate representations of topologies that occur in patterns of activity in a relatively complex neural network. For example, the source approximator 2120 may be trained to approximate a recurrent source neural network, such as, for example, a recurrent neural network that mimics a biological system and includes degrees of morphology, chemistry, and other features of the biological system.
The representation classifier 2125 is a linear classifier or a neural network classifier coupled to receive an approximation of a representation of a pattern of activity in the source neural network from the source approximator 2120 and output a classification result. The representation classifier 2125 may be, for example, a deep neural network, such as a convolutional neural network that includes convolutional layers, pooling layers, and fully-connected layers. The convolutional layer may generate a feature map, for example, using a linear convolution filter and/or a non-linear activation function. The pooling layer reduces the number of parameters and controls overfitting. The computations performed by different layers in the representation classifier 2125 may be defined in different ways in different implementations of the representation classifier 2125.
In operation, in some implementations, the optical imaging system 2110 can generate a raw digital image. The image processing electronics 2115 can read the raw image and will typically perform at least some basic image processing functions. The source approximator 2120 can receive the image from the image processing electronics 2115 and perform an inference operation to output a vector that approximates a representation of the topological structures that occur in the patterns of activity in a relatively more complex neural network. This approximation vector is input into the representation classifier 2125, which determines whether the approximation vector satisfies one or more sets of classification criteria. Examples include facial recognition and other machine vision operations. In the event that the representation classifier 2125 determines that the approximation vector satisfies a set of classification criteria, the representation classifier 2125 can instruct the communication controller and interface 2130 to transmit information about the image. For example, the communication controller and interface 2130 may transmit the image itself, the classification, and/or other information about the image.
Sometimes, it may be desirable to change the classification process. In these cases, the communications controller and interface 2130 may receive a training set. In some implementations, the training set may include raw or processed image data and representations of topology that occurs in patterns of activity in a relatively complex neural network. Such a training set may be used to retrain the source approximator 2120, e.g., using supervised learning or reinforcement learning techniques. In particular, the representation is used as a target answer vector and represents the expected result of the source approximator 2120 processing the raw or processed image data.
In other implementations, the training set may include representations of topologies that occur in patterns of activity in a relatively complex neural network and desired classifications of those representations of the topologies. Such a training set may be used to retrain the neural network representation classifier 2125, e.g., using supervised learning or reinforcement learning techniques. In particular, the desired classification is used as the target answer vector and represents the desired result of the representation of the processing topology by the classifier 2125.
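A sketch of the first kind of retraining, in which the representation from the training set serves as the target answer vector for the source approximator, follows; the tiny one-hidden-layer approximator, the squared-error loss, and all dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Training set: processed image data (inputs) paired with representations
# of topological occurrences in the complex source network (targets).
images = rng.normal(size=(64, 16))                         # toy image features
target_reps = rng.integers(0, 2, size=(64, 8)).astype(float)  # target answer vectors

# One-hidden-layer approximator, fit by full-batch gradient descent on MSE.
W1 = rng.normal(scale=0.1, size=(16, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 8));  b2 = np.zeros(8)
lr, n = 0.05, len(images)

for step in range(3000):
    h = np.tanh(images @ W1 + b1)        # hidden layer
    out = h @ W2 + b2                    # approximation of the representation
    err = out - target_reps              # dL/dout for L = 0.5 * mean sq. error
    dh = (err @ W2.T) * (1.0 - h**2)     # backpropagate through tanh
    W2 -= lr * h.T @ err / n;      b2 -= lr * err.mean(axis=0)
    W1 -= lr * images.T @ dh / n;  b1 -= lr * dh.mean(axis=0)

print("final MSE:", float((err**2).mean()))
```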
Regardless of whether the source approximator 2120 or the representation classifier 2125 is retrained, the inference operations at the device 2100 can be easily adapted to changing conditions and goals without requiring large training data sets and time-intensive and computing-power-intensive iterative training.
Fig. 22 is a schematic illustration of a second edge device 2200 that includes a local artificial neural network that can be trained using representations of the occurrence of topological structures corresponding to activity in a source neural network. In the illustrated implementation, the second edge device 2200 is schematically illustrated as a mobile computing device such as a smartphone or tablet computer. Device 2200 includes an optical imaging system (e.g., on the back of device 2200, not shown), image processing electronics 2215, a representation classifier 2225, a communications controller and interface 2230, and a data port 2235. These components may have features and perform actions corresponding to those of the optical imaging system 2110, the image processing electronics 2115, the representation classifier 2125, the communication controller and interface 2130, and the data port 2135 in device 2100 (fig. 21).
The illustrated implementation of device 2200 additionally includes one or more additional sensors 2240 and a multi-input source approximator 2245. The sensor 2240 may sense one or more characteristics of the environment surrounding device 2200 or of device 2200 itself. For example, in some implementations, the sensor 2240 may be an accelerometer that senses the acceleration experienced by device 2200. As another example, in some implementations, the sensor 2240 may be an acoustic sensor, such as a microphone, that senses noise in the environment of device 2200. Still other examples of sensors 2240 include chemical sensors (e.g., "artificial noses" and the like), humidity sensors, radiation sensors, and so forth. In some cases, sensors 2240 are coupled to processing electronics that can read the output of the sensors 2240 (or other information, such as, for example, a contact list or a map) and perform basic processing functions. Different implementations of the sensors 2240 may thus have different "modalities," in that the physical parameter being sensed varies from sensor to sensor.
The multi-input source approximator 2245 is a relatively simple neural network that is trained to output a vector that approximates a representation of the topology that occurs in a pattern of activity in a relatively complex neural network. For example, multi-input source approximator 2245 may be trained to approximate a recurrent source neural network, such as, for example, a recurrent neural network that mimics a biological system and includes a degree of morphology, chemistry, and other features of the biological system.
Unlike source approximator 2120, multi-input source approximator 2245 is coupled to receive raw or processed sensor data from a plurality of sensors and, based on that data, returns an approximation of a representation of a topology that occurs in a pattern of activity in a relatively complex neural network. For example, the multi-input source approximator 2245 may receive processed image data from the image processing electronics 2215, as well as, for example, acoustic, acceleration, chemical, or other data from one or more sensors 2240. The multi-input source approximator 2245 may be, for example, a deep neural network, such as a convolutional neural network that includes convolutional layers, pooling layers, and fully-connected layers. The calculations performed by the different layers in the multiple-input source approximator 2245 may be specific to a single type of sensor data or multiple forms of sensor data.
Regardless of the particular organization of the multi-input source approximator 2245, the multi-input source approximator 2245 will return an approximation based on raw or processed sensor data from a plurality of sensors. For example, processed image data from the image processing electronics 2215 and acoustic data from the microphone sensor 2240 may be used by the multiple-input source approximator 2245 to approximate a representation of a topology that would appear in a pattern of activity in a relatively more complex neural network that receives the same data.
At times, it may be desirable to change the classification process at the device 2200. In these cases, the communications controller and interface 2230 may receive the training set. In some implementations, the training set may include raw or processed images, sounds, chemical or other data, and representations of topological structures that occur in patterns of activity in relatively complex neural networks. Such training sets may be used to retrain the multi-input source approximator 2245, e.g., using supervised learning or reinforcement learning techniques. In particular, the representation is used as a target answer vector and represents the expected result of the multi-input source approximator 2245 processing raw or processed image or sensor data.
In other implementations, the training set may include representations of topologies that occur in patterns of activity in a relatively complex neural network and desired classifications of those representations of the topologies. Such training sets may be used to retrain the neural network representation classifier 2225, e.g., using supervised learning or reinforcement learning techniques. In particular, the desired classification is used as the target answer vector and represents the desired result of the representation classifier 2225 processing the representation of the topology.
Regardless of whether the multi-input source approximator 2245 or the representation classifier 2225 is retrained, the inference operations at the device 2200 can be easily adapted to changing conditions and goals without requiring large training data sets and time-and computational-intensive iterative training.
FIG. 23 is a schematic illustration of a system 2300 in which representations of the occurrence of topological structures corresponding to activity in a source neural network can be used to train local neural networks. The target neural networks are implemented on relatively simple, relatively inexpensive data processing systems, while the source neural network may be implemented on a relatively complex, relatively expensive data processing system.
The system 2300 includes a variety of devices 2305 having local neural networks, a telephone base station 2310, a wireless access point 2315, a server system 2320 and one or more data communication networks 2325.
The local neural network device 2305 is a device configured to process data using a computationally less intensive target neural network. As illustrated, the local neural network devices 2305 can be implemented as mobile computing devices, cameras, automobiles, or any of a large array of other appliances, fixtures, and mobile components, as well as different brands and models of devices within each category. Different local neural network devices 2305 may belong to different owners. In some implementations, access to the data processing functionality of the local neural network device 2305 will typically be limited to these owners and/or their designees.
The local neural network devices 2305 may each include one or more source approximators trained to output vectors that approximate representations of topologies that occur in patterns of activity in a relatively complex neural network. For example, a relatively complex neural network may be a recurrent source neural network, such as, for example, a recurrent neural network that mimics a biological system and includes a degree of morphology, chemistry, and other features of the biological system.
In some implementations, in addition to processing data using a source approximator, the local neural network device 2305 can be programmed to retrain the source approximator using, as the target answer vector, a representation of the topological structures that occur in the patterns of activity in a relatively more complex neural network. For example, the local neural network device 2305 may be programmed to perform one or more iterative training techniques (e.g., gradient descent or stochastic gradient descent). In other implementations, the source approximators in the local neural network devices 2305 are trainable by, for example, a dedicated training system or by a training system installed on a personal computer that can interact with the local neural network devices 2305 to train their source approximators.
Each local neural network device 2305 includes one or more wireless or wired data communication components. In the illustrated implementation, each local neural network device 2305 includes at least one wireless data communication component, such as a mobile telephone transceiver, a wireless transceiver, or both. The mobile telephone transceiver is capable of exchanging data with the telephone base station 2310. The wireless transceiver is capable of exchanging data with the wireless access point 2315. Each local neural network device 2305 may also be capable of exchanging data with peer mobile computing devices.
The telephone base station 2310 and the wireless access point 2315 are connected for data communication with one or more data communication networks 2325 and may exchange information with the server system 2320 via the networks. Thus, the local neural network device 2305 is also typically in data communication with the server system 2320. However, this is not necessarily the case. For example, in implementations where the local neural network device 2305 is trained by other data processing devices, the local neural network device 2305 need only communicate data with these other data processing devices at least once.
Server system 2320 is a system of one or more data processing devices programmed to perform data processing activities according to one or more sets of machine-readable instructions. The activities may include providing training sets to the training systems for the local neural network devices 2305. As discussed above, the training system may be within the local neural network device 2305 itself or on one or more other data processing devices. The training set may include representations of the occurrence of topological structures corresponding to activity in the source neural network, along with the corresponding input data.
In some implementations, the server system 2320 also includes a source neural network. However, this is not necessarily the case, and the server system 2320 may receive a training set from yet another system of data processing devices that implements the source neural network.
In operation, after the server system 2320 receives a training set (from a source neural network found at the server system 2320 itself or elsewhere), the server system 2320 may provide the training set to the trainers that train the local neural network devices 2305. The source approximator in a target local neural network device 2305 can be trained using the training set so that the target neural network approximates the operation of the source neural network.
Figs. 24, 25, 26, 27 are schematic illustrations of the use of representations of the occurrence of topological structures in activity in a neural network in four different systems 2400, 2500, 2600, 2700. The systems 2400, 2500, 2600, 2700 can be configured to perform any of a number of different operations. For example, the systems 2400, 2500, 2600, 2700 can perform object localization operations, object detection operations, object segmentation operations, prediction operations, action selection operations, and the like.
An object localization operation locates an object within an image. For example, a bounding box can be constructed around the object. In some cases, object localization may be combined with object recognition, in which the localized object is labeled with an appropriate designation.
An object detection operation classifies image pixels as belonging or not belonging to a particular class (e.g., belonging to an object of interest). In general, object detection is performed by grouping pixels and forming bounding boxes around the pixel groups. The bounding box should be a tight fit around the object.
Object segmentation typically assigns a class label to each image pixel. Thus, object segmentation is done on a pixel-by-pixel basis and typically requires only a single label to be assigned to each pixel, rather than a bounding box.
The prediction operation seeks to draw conclusions outside the scope of the observed data. While predictive operations may seek to predict future occurrences (e.g., based on information about past and current states), predictive operations may also seek to draw conclusions about past and current states based on incomplete information about these states.
The action selection operation seeks to select an action based on a set of conditions. Action selection operations have traditionally been broken down into different approaches, such as symbol-based systems (classical planning), distributed solutions, and reactive or dynamic planning.
The systems 2400, 2500 each perform a desired operation on representations of patterns of activity in a neural network. The systems 2600, 2700 each perform the desired operation on approximations of such representations. In the systems 2400, 2500, the represented patterns of activity occur in, and are read from, a source neural network device 1705 that is part of the system 2400, 2500. In contrast, in the systems 2600, 2700, the approximately represented patterns of activity occur in a source neural network device that is not part of the systems 2600, 2700. However, approximations of the representations of those patterns of activity are read from an approximator 1905 that is part of the system 2600, 2700.
In more detail, turning to fig. 24, system 2400 includes a source neural network 1705 and a linear processor 2410. The linear processor 2410 is a device that performs operations based on linear combinations of the features of representations of patterns of activity in a neural network (or of approximations of such representations). The operation may be, for example, an object localization operation, an object detection operation, an object segmentation operation, a prediction operation, an action selection operation, or the like.
The linear processor 2410 includes an input 2420 and an output 2425. Input 2420 is coupled to receive representations of the patterns of activity in source neural network 1705. The linear processor 2410 can receive the representations of the patterns of activity in the source neural network 1705 in a variety of ways. For example, the representations of the patterns of activity may be received as discrete events or as a continuous stream over a real-time or non-real-time communication channel. Output 2425 is coupled to output the processing results from linear processor 2410. In some implementations, the linear processor 2410 can be implemented on one or more computing devices with relatively limited computational performance. For example, the linear processor 2410 can be implemented on a personal computer or a mobile computing device such as a smartphone or tablet computer.
In fig. 25, the system 2500 includes a source neural network 1705 and a neural network 2510. The neural network 2510 is a neural network device configured to perform operations based on non-linear combinations of the features of representations of patterns of activity in a neural network (or of approximations of such representations). The operation may be, for example, an object localization operation, an object detection operation, an object segmentation operation, a prediction operation, an action selection operation, or the like. In the illustrated implementation, the neural network 2510 is a feed-forward network that includes an input layer 2520 and an output layer 2525. As with the linear processor 2410, the neural network 2510 can receive representations of the patterns of activity in the source neural network 1705 in a variety of ways.
In some implementations, the neural network 2510 may perform inference on one or more computing devices with relatively limited computational performance. For example, the neural network 2510 may be implemented on a personal computer or a mobile computing device such as a smartphone or tablet computer, e.g., in a neural processing unit of such a device. Like system 2400, system 2500 will typically be a decentralized system in which remote neural network 2510 communicates with source neural network 1705, e.g., via a data communication network. In some implementations, the neural network 2510 can be, for example, a deep neural network, such as a convolutional neural network.
In fig. 26, the system 2600 includes a source approximator 1905 and a linear processor 2410. Despite any differences between the approximation 1200 'and the representation 1200, the processor 2410 may still perform operations on the approximation 1200'.
In fig. 27, system 2700 includes source approximator 1905 and neural network 2510. Despite any differences between approximation 1200 'and representation 1200, neural network 2510 may still operate on approximation 1200'.
In some implementations, systems 2600, 2700 can be implemented on edge devices, such as, for example, edge devices 2100, 2200 (fig. 21, 22). In some implementations, the systems 2600, 2700 can be implemented as part of a system, such as the system 2300 (fig. 23), in which a local neural network can be trained using a representation of the occurrence of a topology corresponding to activity in a source neural network.
FIG. 28 is a schematic illustration of a reinforcement learning system 2800 that includes an artificial neural network that can be trained using representations of the occurrence of topological structures corresponding to activity in a source neural network. Reinforcement learning is a type of machine learning in which an artificial neural network learns from feedback regarding the results of actions taken in response to the artificial neural network's decisions. A reinforcement learning system moves from one state in an environment to another by performing actions and receiving information characterizing the new state, along with rewards and/or regrets that characterize the success (or lack of success) of the actions. Reinforcement learning seeks to maximize the total reward (or minimize the total regret) over the learning process.
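The following schematic loop illustrates this state-action-reward cycle. The Environment and Agent interfaces assumed here (reset, step, select_action, update) are conventions adopted for the sketch, not an API defined in this specification.

```python
# Schematic reinforcement-learning loop: act, observe the new state,
# receive a reward (positive) or regret (negative), and learn from it.
# The env/agent interfaces are assumptions made for this sketch.
def run_episode(env, agent, max_steps: int = 1000) -> float:
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.select_action(state)     # decision by the network
        state, reward, done = env.step(action)  # new state + reward/regret
        agent.update(state, action, reward)     # feedback on the action
        total_reward += reward                  # quantity to be maximized
        if done:
            break
    return total_reward
```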
In the illustrated implementation, the artificial neural network in the reinforcement learning system 2800 is a deep neural network 2805 (or other deep-learning architecture) that is trained using a reinforcement learning approach. In some implementations, the deep neural network 2805 may be a local artificial neural network, such as neural network 2510 (figs. 25, 27), and may be implemented locally, for example, on an automobile, aircraft, robot, or other device. However, this need not be the case, and in other implementations the deep neural network 2805 may be implemented on a system of networked devices.
In addition to the source approximator 1905 and the deep neural network 2805, the reinforcement learning system 2800 also includes an actuator 2810, one or more sensors 2815, and a teacher module 2820. In some implementations, the reinforcement learning system 2800 also includes one or more additional data sources 2825.
Actuator 2810 is a device that controls a mechanism or system that interacts with environment 2830. In some implementations, the actuator 2810 controls a physical mechanism or system (e.g., the steering of an automobile or the positioning of a robot). In other implementations, the actuator 2810 may control a virtual mechanism or system (e.g., a virtual game board or an investment portfolio). Thus, environment 2830 may likewise be physical or virtual.
Sensor 2815 is a device that measures characteristics of environment 2830. At least some of the measurements characterize the interaction between the controlled mechanism or system and other aspects of environment 2830. For example, when the actuator 2810 steers a car, the sensors 2815 may measure one or more of the car's speed, direction, and acceleration, the proximity of the car to other features, and the response of other features to the car. As another example, when the actuator 2810 controls an investment portfolio, the sensor 2815 may measure the value of and risk associated with the portfolio.
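By way of illustration, the actuator and sensor roles might be abstracted as follows; the class and method names are hypothetical, and the same interfaces could wrap either physical or virtual mechanisms.

```python
# Hypothetical abstractions for the actuator and sensor roles; the same
# interfaces could wrap a physical mechanism (steering) or a virtual one
# (a portfolio).  Names and values are placeholders for illustration.
from abc import ABC, abstractmethod

class Actuator(ABC):
    @abstractmethod
    def apply(self, command) -> None:
        """Drive the controlled mechanism or system."""

class Sensor(ABC):
    @abstractmethod
    def read(self) -> float:
        """Return a measurement characterizing the environment."""

class SteeringActuator(Actuator):
    def apply(self, command) -> None:
        print(f"steering angle -> {command:.2f} rad")

class SpeedSensor(Sensor):
    def read(self) -> float:
        return 13.9  # m/s; a stand-in for a real measurement
```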
Typically, both the source approximator 1905 and the teacher module 2820 are coupled to receive at least some of the measurements made by the sensors 2815. For example, the source approximator 1905 may receive measurement data at the input layer 1915 and output an approximation 1200' of a representation of the topological structures that occur in patterns of activity in the source neural network.
The teacher module 2820 is a device configured to interpret the measurements received from the sensors 2815 and provide rewards and/or regrets to the deep neural network 2805. A reward is positive and indicates successful control of the mechanism or system; a regret is negative and indicates unsuccessful or suboptimal control. Typically, the teacher module 2820 also provides a characterization of the measurements along with the reward/regret for reinforcement learning. Typically, the characterization of the measurements is an approximation (such as approximation 1200') of a representation of the topological structures that occur in patterns of activity in the source neural network. For example, the teacher module 2820 may read the approximation 1200' output from the source approximator 1905 and pair the read approximation 1200' with the corresponding reward/regret value.
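A minimal sketch of this pairing behavior follows. The TeacherModule class and its scoring callback are assumptions for illustration; the only convention carried over from the above is that positive values denote reward and negative values denote regret.

```python
# A sketch of the pairing behavior described above: the teacher module
# converts raw measurements into a reward/regret value and pairs it with
# the approximation read from the source approximator.  The score_fn
# callback is an assumption for illustration.
class TeacherModule:
    def __init__(self, score_fn):
        self.score_fn = score_fn   # maps measurements -> reward (+) / regret (-)
        self.training_pairs = []   # (approximation, reward/regret) pairs

    def observe(self, measurements, approximation):
        value = self.score_fn(measurements)
        self.training_pairs.append((approximation, value))
        return value

# Usage: reward safe operation; regret anything else.
teacher = TeacherModule(score_fn=lambda m: 1.0 if m["safe"] else -1.0)
teacher.observe({"safe": True, "speed": 13.9}, approximation=[1, 0, 1])
```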
In various implementations, reinforcement learning does not occur in real time in the system 2800 or during active control of the actuator 2810 by the deep neural network 2805. Instead, training feedback may be collected by the teacher module 2820 and used for training when the deep neural network 2805 is not actively instructing the actuator 2810. For example, in some implementations the teacher module 2820 may be remote from the deep neural network 2805 and communicate data with it only intermittently. Regardless of whether reinforcement learning is intermittent or continuous, the deep neural network 2805 may be evolved, for example, to optimize reward and/or reduce regret using the information received from the teacher module 2820.
In some implementations, the system 2800 also includes one or more additional data sources 2825. The source approximator 1905 may also receive data from a data source 2825 at the input layer 1915. In such cases, the approximation 1200' results from processing both the sensor data and the data from the data source 2825.
In some implementations, the data collected by one reinforcement learning system 2800 can be used for training or reinforcement learning at other systems (including other reinforcement learning systems). For example, the characterizations of the measurements and the reward/regret values may be provided by the teacher module 2820 to a data exchange system that collects such data from various reinforcement learning systems and redistributes it among them. Further, as discussed above, the characterization of the measurements may be an approximation, such as the approximation 1200', of a representation of the topological structures that occur in patterns of activity in the source neural network.
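The sketch below illustrates one possible shape for such a data exchange system; the submit/redistribute API and the subscribers' ingest method are assumptions for illustration.

```python
# A sketch of a data exchange system that pools (approximation,
# reward/regret) pairs from several reinforcement-learning systems and
# redistributes them.  The ingest() method on subscribers is assumed.
class DataExchange:
    def __init__(self):
        self.pool = []          # shared pool of training pairs
        self.subscribers = []   # other reinforcement-learning systems

    def submit(self, pairs) -> None:
        self.pool.extend(pairs)

    def redistribute(self) -> None:
        for system in self.subscribers:
            system.ingest(self.pool)  # each subscriber trains on shared data
```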
The particular operations performed by the reinforcement learning system 2800 will, of course, depend on the particular operational context. For example, in contexts where the source approximator 1905, the deep neural network 2805, the actuator 2810, and the sensors 2815 are part of an automobile, the deep neural network 2805 may perform object localization and/or detection operations while steering the automobile.
In implementations where the data collected by the reinforcement learning system 2800 is used for training or reinforcement learning at other systems, the rewards/regrets and the approximations 1200' that characterize the state of the environment when the object localization and/or detection operations were performed may be provided to the data exchange system. The data exchange system may then distribute the reward/regret values and the approximations 1200' to other reinforcement learning systems 2800 associated with other vehicles for reinforcement learning at those vehicles. For example, the reward/regret values and the approximation 1200' may be used in reinforcement learning to improve object localization and/or detection operations at a second vehicle.
However, the operations learned at the other vehicles need not be the same as the operations performed by the deep neural network 2805. For example, rewards/regrets based on travel time, together with an approximation 1200' resulting from input of sensor data characterizing an unexpectedly wet road at a location identified by, e.g., the GPS data source 2825, may be used for route planning operations at another vehicle.
Embodiments of the operations and subject matter described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a computer storage medium for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, program instructions may be encoded on an artificially generated propagated signal (e.g., a machine-generated electrical, optical, or electromagnetic signal) that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. The computer storage media may be or be included in a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Further, although the computer storage medium is not a propagated signal, the computer storage medium can be a source or destination of computer program instructions encoded in an artificially generated propagated signal. The computer storage medium may also be or be included in one or more separate physical components or media, such as multiple CDs, disks, or other storage devices.
The operations described in this specification may be implemented as operations performed by data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
The term "data processing apparatus" encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or a combination of any or all of the foregoing. The apparatus can comprise special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The apparatus and execution environment may implement a variety of different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing actions in accordance with the instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such a device. Moreover, a computer may be embedded in another device, such as a mobile phone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game controller, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a Universal Serial Bus (USB) flash drive), to name a few.
To provide for interaction with a user, embodiments of the subject matter described herein can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with the user; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, the computer may interact with the user by sending and receiving documents to and from the device used by the user; for example, by sending a web page to a web browser on the user's client device in response to a request received from the web browser.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular implementations of the subject matter have been described. Other implementations are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
Various embodiments have been described. Nevertheless, it will be understood that various modifications may be made. For example, although representation 1200 is a binary representation in which each bit individually represents the presence or absence of a feature in a graph, other representations of the information are possible. For example, a vector or matrix of multi-valued, non-binary digits may be used to represent, e.g., the presence or absence of features along with other characteristics of those features. One example of such a characteristic is the weight of the active edges that constitute the feature.
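The following fragment contrasts the two styles of representation; the specific values are illustrative only.

```python
# Binary versus multi-valued representation of the same five features:
# in the multi-valued form, a nonzero entry marks an occurring feature
# and its magnitude carries, e.g., edge-weight information.  The numbers
# are illustrative only.
import numpy as np

binary_rep = np.array([1, 0, 1, 1, 0])               # 1 = feature occurred
weighted_rep = np.array([0.8, 0.0, 0.3, 1.7, 0.0])   # magnitude = edge weights
assert ((weighted_rep > 0) == (binary_rep == 1)).all()
```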
Accordingly, other embodiments are within the scope of the following claims.

Claims (20)

1. A method comprising outputting numbers from a recurrent artificial neural network, wherein each number represents whether an activity in a particular set of nodes in the recurrent artificial neural network is commensurate with a corresponding pattern of activity.
2. The method of claim 1, further comprising: determining whether each particular set of nodes is commensurate with the corresponding pattern of activity by identifying a clique pattern of activity of the recurrent artificial neural network.
3. The method of claim 2, wherein the method further comprises defining a plurality of time windows during which the activity of the recurrent artificial neural network is responsive to input into the recurrent artificial neural network, wherein the clique pattern of activity is identified in each of the plurality of time windows, and each number represents whether the particular set of nodes is commensurate with the corresponding clique pattern of activity in a first one of the windows.
4. The method of claim 3, wherein the method further comprises: identifying the first window of the plurality of time windows based on a distinguishably increased likelihood of identifying the clique pattern during the first time window.
5. The method of claim 1, wherein the corresponding pattern of activity surrounds a cavity.
6. The method of claim 2, wherein identifying cliques comprises discarding or ignoring cliques of lower order that are contained within cliques of higher order.
7. The method of claim 1, wherein each of the numbers is output from a reader node coupled to a set of nodes in the recurrent artificial neural network, wherein the reader node indicates whether activity of a node in the set is commensurate with a particular pattern of activity.
8. The method of claim 1, further comprising transmitting or storing only some of the read outputs, wherein transmitting or storing only some of the read outputs comprises:
transmitting or storing outputs associated with patterns of activity having a relatively high complexity; and
discarding or ignoring outputs associated with patterns of activity having a relatively low complexity.
9. The method of claim 1, further comprising:
structuring the recurrent artificial neural network, including
reading the numbers output from the recurrent artificial neural network, and
evolving the structure of the recurrent artificial neural network, wherein evolving the structure of the recurrent artificial neural network comprises:
iteratively changing the structure,
characterizing the complexity of the patterns of activity in the changed structure, and
using the characterization of the complexity of the patterns as an indication of whether the changed structure is desirable.
10. The method of claim 1, wherein the numbers are multi-valued, non-binary numbers, wherein the values each represent a weight assigned to an edge in a corresponding one of the patterns of activity.
11. An encoder or decoder comprising one or more computers operable to perform operations comprising outputting numbers from a recurrent artificial neural network, wherein each number represents whether an activity in a particular set of nodes in the recurrent artificial neural network is commensurate with a corresponding pattern of activity.
12. The encoder or decoder of claim 11, wherein the operations further comprise: determining whether each particular set of nodes is commensurate with the corresponding pattern of activity by identifying a clique pattern of activity of the recurrent artificial neural network.
13. The encoder or decoder of claim 12, wherein the operations further comprise defining a plurality of time windows during which the activity of the recurrent artificial neural network is responsive to input into the recurrent artificial neural network, wherein the clique pattern of activity is identified in each of the plurality of time windows, and each number represents whether the particular set of nodes is commensurate with the corresponding clique pattern of activity in a first one of the windows.
14. The encoder or decoder of claim 13, wherein the operations further comprise: identifying the first window of the plurality of time windows based on a distinguishably increased likelihood of identifying the clique pattern during the first time window.
15. The encoder or decoder of claim 11, wherein the corresponding pattern of activity surrounds a cavity.
16. The encoder or decoder of claim 11, wherein each of the numbers is output from a reader node coupled to a set of nodes in the recurrent artificial neural network, wherein the reader node indicates whether activity of a node in the set is commensurate with a particular pattern of activity.
17. The encoder or decoder of claim 11, wherein the operations further comprise transmitting or storing only some of the read outputs, wherein transmitting or storing only some of the read outputs comprises:
transmitting or storing outputs associated with patterns of activity having a relatively high complexity; and
discarding or ignoring outputs associated with patterns of activity having a relatively low complexity.
18. The encoder or decoder of claim 11, wherein the operations further comprise:
structuring the recurrent artificial neural network, including
reading the numbers output from the recurrent artificial neural network, and
evolving the structure of the recurrent artificial neural network, wherein evolving the structure of the recurrent artificial neural network comprises:
iteratively changing the structure,
characterizing the complexity of the patterns of activity in the changed structure, and
using the characterization of the complexity of the patterns as an indication of whether the changed structure is desirable.
19. The encoder or decoder of claim 11, wherein the numbers are multi-valued, non-binary numbers, wherein the values each represent a weight assigned to an edge in a corresponding one of the patterns of activity.
20. The encoder or decoder of claim 11, wherein the encoder or decoder is a video encoder or decoder.
CN201980054063.6A 2018-06-11 2019-06-06 Characterizing activity in a recurrent artificial neural network and encoding and decoding information Pending CN112585621A (en)

Applications Claiming Priority (11)

Application Number Priority Date Filing Date Title
US16/004,671 2018-06-11
US16/004,635 US20190378007A1 (en) 2018-06-11 2018-06-11 Characterizing activity in a recurrent artificial neural network
US16/004,796 US20190378000A1 (en) 2018-06-11 2018-06-11 Characterizing activity in a recurrent artificial neural network
US16/004,757 US11893471B2 (en) 2018-06-11 2018-06-11 Encoding and decoding information and artificial neural networks
US16/004,796 2018-06-11
US16/004,635 2018-06-11
US16/004,671 US20190378008A1 (en) 2018-06-11 2018-06-11 Encoding and decoding information
US16/004,837 US11663478B2 (en) 2018-06-11 2018-06-11 Characterizing activity in a recurrent artificial neural network
US16/004,757 2018-06-11
US16/004,837 2018-06-11
PCT/EP2019/064741 WO2019238513A1 (en) 2018-06-11 2019-06-06 Characterizing activity in a recurrent artificial neural network and encoding and decoding information

Publications (1)

Publication Number Publication Date
CN112585621A true CN112585621A (en) 2021-03-30

Family

ID=66776339

Family Applications (5)

Application Number Title Priority Date Filing Date
CN201980053465.4A Pending CN112567387A (en) 2018-06-11 2019-06-05 Characterizing activity in a recurrent artificial neural network and encoding and decoding information
CN201980053140.6A Pending CN112567388A (en) 2018-06-11 2019-06-06 Characterizing activity in a recurrent artificial neural network and encoding and decoding information
CN201980053141.0A Pending CN112567389A (en) 2018-06-11 2019-06-06 Characterizing activity in a recurrent artificial neural network and encoding and decoding information
CN201980053463.5A Pending CN112567390A (en) 2018-06-11 2019-06-06 Characterizing activity in a recurrent artificial neural network and encoding and decoding information
CN201980054063.6A Pending CN112585621A (en) 2018-06-11 2019-06-06 Characterizing activity in a recurrent artificial neural network and encoding and decoding information


Country Status (5)

Country Link
EP (5) EP3803699A1 (en)
KR (5) KR102497238B1 (en)
CN (5) CN112567387A (en)
TW (1) TWI822792B (en)
WO (5) WO2019238483A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113219358A (en) * 2021-04-29 2021-08-06 东软睿驰汽车技术(沈阳)有限公司 Battery pack health state calculation method and system and electronic equipment
US11569978B2 (en) 2019-03-18 2023-01-31 Inait Sa Encrypting and decrypting information
US11580401B2 (en) 2019-12-11 2023-02-14 Inait Sa Distance metrics and clustering in recurrent neural networks
US11615285B2 (en) 2017-01-06 2023-03-28 Ecole Polytechnique Federale De Lausanne (Epfl) Generating and identifying functional subnetworks within structural networks
US11652603B2 (en) 2019-03-18 2023-05-16 Inait Sa Homomorphic encryption
US11651210B2 (en) 2019-12-11 2023-05-16 Inait Sa Interpreting and improving the processing results of recurrent neural networks
US11797827B2 (en) 2019-12-11 2023-10-24 Inait Sa Input into a neural network
US11816553B2 (en) 2019-12-11 2023-11-14 Inait Sa Output from a recurrent neural network
US11893471B2 (en) 2018-06-11 2024-02-06 Inait Sa Encoding and decoding information and artificial neural networks

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11663478B2 (en) 2018-06-11 2023-05-30 Inait Sa Characterizing activity in a recurrent artificial neural network
US11610134B2 (en) * 2019-07-08 2023-03-21 Vianai Systems, Inc. Techniques for defining and executing program code specifying neural network architectures
TWI769466B (en) * 2020-06-17 2022-07-01 台達電子工業股份有限公司 Neural network system and method of operating the same
CN112073217B (en) * 2020-08-07 2023-03-24 之江实验室 Multi-network structure difference vectorization method and device
TWI769875B (en) * 2021-06-24 2022-07-01 國立中央大學 Deep learning network device, memory access method and non-volatile storage medium used therefor
CN113626721B (en) * 2021-10-12 2022-01-25 中国科学院自动化研究所 Regrettful exploration-based recommendation method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DK2531959T3 (en) * 2010-02-05 2017-10-30 Ecole polytechnique fédérale de Lausanne (EPFL) ORGANIZATION OF NEURAL NETWORKS


Also Published As

Publication number Publication date
KR20210008858A (en) 2021-01-25
CN112567389A (en) 2021-03-26
EP3803707A1 (en) 2021-04-14
WO2019238512A1 (en) 2019-12-19
EP3803708A1 (en) 2021-04-14
EP3803699A1 (en) 2021-04-14
WO2019238522A1 (en) 2019-12-19
TW202001693A (en) 2020-01-01
KR102526132B1 (en) 2023-04-26
WO2019238523A1 (en) 2019-12-19
EP3803705A1 (en) 2021-04-14
CN112567390A (en) 2021-03-26
CN112567388A (en) 2021-03-26
KR102465409B1 (en) 2022-11-09
KR20210008417A (en) 2021-01-21
TWI822792B (en) 2023-11-21
CN112567387A (en) 2021-03-26
KR20210010894A (en) 2021-01-28
KR102488042B1 (en) 2023-01-12
WO2019238483A1 (en) 2019-12-19
KR102497238B1 (en) 2023-02-07
EP3803706A1 (en) 2021-04-14
KR20210008419A (en) 2021-01-21
KR102475411B1 (en) 2022-12-07
KR20210008418A (en) 2021-01-21
WO2019238513A1 (en) 2019-12-19

Similar Documents

Publication Publication Date Title
CN112585621A (en) Characterizing activity in a recurrent artificial neural network and encoding and decoding information
US20190378008A1 (en) Encoding and decoding information
US11663478B2 (en) Characterizing activity in a recurrent artificial neural network
US11893471B2 (en) Encoding and decoding information and artificial neural networks
US20190378000A1 (en) Characterizing activity in a recurrent artificial neural network
US20190378007A1 (en) Characterizing activity in a recurrent artificial neural network
US20210104021A1 (en) Method and apparatus for processing image noise
CN105447498B (en) Client device, system and server system configured with neural network
WO2019018533A1 (en) Neuro-bayesian architecture for implementing artificial general intelligence
Kojima et al. Organization of a Latent Space structure in VAE/GAN trained by navigation data
Chaccour et al. Disentangling learnable and memorizable data via contrastive learning for semantic communications
Lee et al. Current and future applications of machine learning for the US Army
Jeong Performance of Neural Computing Techniques in Communication Networks
Hoang et al. Deep Reinforcement Learning for Wireless Communications and Networking: Theory, Applications and Implementation
Kabir BRB based deep learning approach with application in sensor data streams
Mushtaq et al. Deep Learning Architectures for IoT Data Analytics
Alhalabi Ensembles of Pruned Deep Neural Networks for Accurate and Privacy Preservation in IoT Applications
Yao Robust and Generalizable Machine Learning Through Generative Models, Adversarial Training, and Physics Priors
Nagarajan Comparative Study on Generative Adversarial Networks: DCGAN and Cycle GAN
Gupta Unlocking the potential of neural networks in resource and data constrained environments
Veeravalli et al. Dynamic Information Collection and Fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination