
Construction of binary neural networks

Info

Publication number
EP4176387A1
Authority
EP
European Patent Office
Prior art keywords
inputs
layer
binary
neuron
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP20737008.1A
Other languages
German (de)
French (fr)
Inventor
Jean-Claude Belfiore
Merouane Debbah
Van Minh Nguyen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of EP4176387A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • the disclosure relates to a method, and more particularly to a method for constructing a binary neural network. Furthermore, the disclosure relates to a corresponding computing device and a computer program.
  • DNNs: Deep Neural Networks
  • image recognition DNNs may be trained to identify images that contain cars by analysing example images that have been manually labelled as “car” or “no car” and using the results to identify cars in other images. DNNs do this without any prior knowledge about cars. Instead, they automatically generate identifying features from the learning material that they process.
  • a method comprises obtaining a dataset comprising a plurality of inputs and a plurality of labels, wherein each input in the plurality of inputs corresponds to a label in the plurality of labels; constructing a layer of a binary neural network using the dataset, wherein the constructing the layer comprises at least: identifying at least one knot in the layer, wherein a knot corresponds to at least two inputs of the layer that are mapped to the same output by the layer and correspond to different labels in the plurality of labels; partitioning the at least one knot into a first part and a second part, wherein inputs in the at least two inputs corresponding to the same label in the plurality of labels are partitioned into the same part; adding a new binary neuron comprising a plurality of weights into the layer; and assigning values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part.
  • the method may remove a knot from a layer of a binary neural network by adding a new neuron to the layer and solving appropriate weights for the new neuron.
  • the method may enable, for example, constructing and/or training a layer of a binary neural network.
  • the method operations may be repeated until the layer comprises no more knots, thus removing all knots from a layer.
  • the partitioning the at least one knot into a first part and a second part comprises: identifying a number of occurrences in the dataset for each input in the at least two inputs; and consecutively assigning each input in the at least two inputs into the first part or into the second part according to which part has a lower sum of number of occurrences.
  • the method may enable, for example, efficiently partitioning the knot into the first part and the second part.
  • the assigning values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part comprises: forming a system of linear equations for the plurality of weights according to the condition that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part; and solving the system of linear equations.
  • the method may enable, for example, efficiently assigning the weights for the new neuron.
  • the solving the system of linear equations comprises: assigning each row in the system of linear equations a number of occurrences based on the dataset; and performing row elimination sequentially for the system of linear equations, wherein order of the row elimination follows the number of occurrences assigned to each row.
  • the method may enable, for example, equations with higher statistics to be considered first when solving the system of linear equations. Thus, in the case of an overdetermined system, the more important equations are satisfied first.
  • the solving the system of linear equations further comprises: in response to a pivoting condition being met, pivoting columns of the system of linear equations after the row elimination.
  • the method may enable, for example, ensuring that the row added in the current iteration will not be eliminated by lower priority rows in subsequent iterations.
  • the method further comprises repeating the operations of identifying at least one knot, partitioning the at least one knot, generating a new binary neuron, and assigning values to the plurality of weights until a preconfigured condition is met.
  • the preconfigured condition may comprise, for example, the layer not comprising any more knots.
  • the method may construct a knot-free layer.
  • the method further comprises constructing a binary neural network comprising a plurality of layers wherein an output of each layer functions as an input of a consecutive layer, and wherein each layer in the plurality of layers is constructed consecutively by performing the layer constructing. The method may enable, for example, constructing a knot-free binary neural network.
  • the method further comprises using the binary neural network in a telecommunication device.
  • the binary neural network may be used in other devices, such as smartphones, sensors, and wearables.
  • the binary neural network may be configured to perform, for example, image recognition, portrait mode photography, text prediction, user profiling, de-noising, or camera enhancement.
  • a computer program comprising program code configured to perform a method according to the first aspect when the computer program is executed on a computer.
  • a computing device is configured to: obtain a dataset comprising a plurality of inputs and a plurality of labels, wherein each input in the plurality of inputs corresponds to a label in the plurality of labels; construct a layer of a binary neural network using the dataset, wherein the constructing the layer comprises at least: identify at least one knot in the layer, wherein a knot corresponds to at least two inputs of the layer that are mapped to the same output by the layer and correspond to different labels in the plurality of labels; partition the at least one knot into a first part and a second part, wherein inputs in the at least two inputs corresponding to the same label in the plurality of labels are partitioned into the same part; add a new binary neuron comprising a plurality of weights into the layer; and assign values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part.
  • the computing device is further configured to partition the at least one knot into the first part and the second part by performing: identify a number of occurrences in the dataset for each input in the at least two inputs; and consecutively assign each input in the at least two inputs into the first part or into the second part according to which part has a lower sum of number of occurrences.
  • the computing device may, for example, efficiently partition the knot into the first part and the second part.
  • the computing device is further configured to assign the values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part by performing: form a system of linear equations for the plurality of weights according to the condition that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part; and solve the system of linear equations.
  • the computing device may, for example, efficiently assign the weights for the new neuron.
  • the computing device is further configured to solve the system of linear equations by performing: assign each row in the system of linear equations a number of occurrences based on the dataset; and perform row elimination sequentially for the system of linear equations, wherein order of the row elimination follows the number of occurrences assigned to each row.
  • the computing device may, for example, consider equations with higher statistics first when solving the system of linear equations. Thus, in the case of an overdetermined system, the more important equations are satisfied first.
  • the computing device is further configured to solve the system of linear equations by performing: in response to a pivoting condition being met, pivot columns of the system of linear equations after the row elimination.
  • the computing device may, for example, ensure that the row added in the current iteration will not be eliminated by lower priority rows in subsequent iterations.
  • the computing device is further configured to repeat the operations of identifying at least one knot, partitioning the at least one knot, generating a new binary neuron, and assigning values to the plurality of weights until a preconfigured condition is met.
  • the preconfigured condition may comprise, for example, the layer not comprising any more knots.
  • the computing device may construct a knot-free layer.
  • the computing device is further configured to construct a binary neural network comprising a plurality of layers wherein an output of each layer functions as an input of a consecutive layer, and to construct each layer in the plurality of layers consecutively by performing the layer constructing.
  • the computing device may, for example, construct a knot-free binary neural network.
  • FIG. 1 illustrates a schematic representation of a computing device according to an embodiment
  • Fig. 2 illustrates a flow chart representation of a method according to an embodiment
  • Fig. 3 illustrates a schematic representation of neural network usage according to an embodiment
  • Fig. 4 illustrates a schematic representation of a binary neuron according to an embodiment
  • Fig. 5 illustrates a schematic representation of truth tables of Boolean functions according to an embodiment
  • Fig. 6 illustrates a schematic representation of a mapping of a layer of a binary neural network according to an embodiment
  • Fig. 7 illustrates a schematic representation of a binary neural network according to an embodiment
  • Fig. 8 illustrates a schematic representation of a training dataset according to an embodiment
  • Fig. 9 illustrates a schematic representation of a mapping of a knot in a binary neural network according to an embodiment
  • Fig. 10 illustrates a schematic representation of mapping of a layer of a binary neural network comprising a knot according to an embodiment
  • Fig. 11 illustrates a flow chart representation of a procedure for constructing a layer of a binary neural network according to an embodiment
  • Fig. 12 illustrates a flow chart representation of a procedure for constructing a binary neural network according to an embodiment
  • Fig. 13 illustrates a graph representation of a knot in a binary neural network according to an embodiment
  • Fig. 14 illustrates a table representation of knot removal in a binary neural network according to an embodiment
  • Fig. 15 illustrates a table representation of knot removal in a binary neural network according to another embodiment
  • Fig. 16 illustrates a flow chart representation of a method for solving a system of linear equations according to an embodiment.
  • FIG. 1 illustrates a schematic representation of a computing device 100 according to an embodiment.
  • the computing device 100 is configured to obtain a dataset comprising a plurality of inputs and a plurality of labels, wherein each input in the plurality of inputs corresponds to a label in the plurality of labels.
  • the computing device 100 may be further configured to construct a layer of a binary neural network using the dataset.
  • the constructing the layer may comprise identify at least one knot in the layer, wherein a knot corresponds to at least two inputs of the layer that are mapped to the same output by the layer and correspond to different labels in the plurality of labels.
  • the constructing the layer may comprise partition the at least one knot into a first part and a second part, wherein inputs in the at least two inputs corresponding to the same label in the plurality of labels are partitioned into the same part.
  • the constructing the layer may comprise add a new binary neuron comprising a plurality of weights into the layer.
  • the constructing the layer may comprise assign values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part.
  • the computing device 100 may comprise a processor 101.
  • the computing device 100 may further comprise a memory 102.
  • the computing device 100 may be implemented as a system on a chip (SoC).
  • SoC: system on a chip
  • the processor 101, the memory 102, and/or other components of the computing device 100 may be implemented using a field-programmable gate array (FPGA).
  • FPGA: field-programmable gate array
  • Components of the computing device 100 may not be discrete components.
  • the components may correspond to different units of the SoC.
  • the processor 101 may comprise, for example, one or more of various processing devices, such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing circuitry with or without an accompanying DSP, or various other processing devices including integrated circuits such as, for example, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
  • ASIC: application specific integrated circuit
  • FPGA: field programmable gate array
  • MCU: microcontroller unit
  • the memory 102 may be configured to store, for example, computer programs and the like.
  • the memory 102 may include one or more volatile memory devices, one or more non-volatile memory devices, and/or a combination of one or more volatile memory devices and non-volatile memory devices.
  • the memory 102 may be embodied as magnetic storage devices (such as hard disk drives, floppy disks, magnetic tapes, etc.), optical magnetic storage devices, and semi-conductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash ROM, RAM (random access memory), etc.).
  • the memory 102 may comprise program code for performing any functionality disclosed herein, and the processor 101 may be configured to perform the functionality according to the program code comprised in the memory 102.
  • some component and/or components of the computing device 100 may be configured to implement this functionality.
  • this functionality may be implemented using program code comprised, for example, in the memory 102.
  • the one or more memories 102 and the computer program code can be configured to, with the one or more processors 101, cause the computing device 100 to perform that operation.
  • a telecommunication device comprises the computing device 100.
  • Fig. 2 illustrates a flow chart representation of a method 200 according to an embodiment.
  • the method 200 comprises obtaining 201 a dataset comprising a plurality of inputs and a plurality of labels, wherein each input in the plurality of inputs corresponds to a label in the plurality of labels.
  • each input in the plurality of inputs may comprise a picture and a corresponding label may indicate an object in the picture.
  • a neural network can then be trained to, for example, identify objects in pictures.
  • the dataset may also be referred to as a training dataset or similar.
  • the method 200 may further comprise constructing 202 a layer of a binary neural network using the dataset.
  • the constructing 202 may comprise identifying 203 at least one knot in the layer, wherein a knot corresponds to at least two inputs of the layer that are mapped to the same output by the layer and correspond to different labels in the plurality of labels.
  • Inputs of a layer may be provided by a previous layer of the BNN.
  • the inputs of the layer may comprise inputs in the dataset.
  • the constructing 202 may further comprise partitioning 204 the at least one knot into a first part and a second part, wherein inputs in the at least two inputs corresponding to the same label in the plurality of labels are partitioned into the same part.
  • the constructing 202 may further comprise adding 205 a new binary neuron comprising a plurality of weights into the layer.
  • the constructing 202 may further comprise assigning 206 values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part.
  • the method 200 may be performed, for example, by the computing device 100.
  • At least some operations of the method 200 may be performed by a computer program product when executed on a computer.
  • the method 200 may further comprise repeating the operations of identifying at least one knot, partitioning the at least one knot, generating a new binary neuron, and assigning values to the plurality of weights until a preconfigured condition is met.
  • the method operations of the constructing 202 the layer may be repeated until a condition is met.
  • the preconfigured condition may comprise, for example, the layer no longer comprising any knots.
  • the method operations may be repeated until the layer no longer comprises any knots.
  • a knot-free layer can be constructed.
  • the method may further comprise constructing a binary neural network comprising plurality of layers wherein an output of each layer functions as an input of a consecutive layer, and wherein each layer in the plurality of layers is constructed consecutively by performing the layer constructing 202.
  • the method operations of the constructing 202 the layer may be repeated for consecutive layers of a binary neural network in order to construct and train the binary neural network.
  • the binary neural network constructed using the method 200 may be used, for example, in a telecommunication device.
  • Fig. 3 illustrates a schematic representation of neural network usage according to an embodiment.
  • a model 301 can be trained 303 using a training dataset 302, producing a trained model 304.
  • the trained model 304 can be tested 305 using test data 306.
  • a deep neural network is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain.
  • artificial neurons are aggregated into layers, where different layers may perform different kinds of transformations on their inputs.
  • the connections between artificial neurons have a weight that is adjusted as learning proceeds. The weight increases or decreases the strength of the signal at a connection.
  • DNNs may play an important role in devices in future networks, such as smartphones, sensors and wearables. Especially for smartphones, there is already a big market for DNNs, with various diverse applications, including image recognition, portrait mode photography, text prediction, user profiling, de-noising and camera enhancement.
  • the use of DNNs on such resource-constrained devices may be limited by their high resource consumption (memory and energy).
  • all inputs, outputs, weights, biases, and activation functions are considered to be real numbers, and are typically represented in computers with floating point arithmetic, meaning that each number is approximated using a 64-bit sequence.
  • a DNN may comprise thousands of neurons, where each neuron may have numerous inputs, resulting in millions of such floating point numbers.
  • DNNs can even be made purely binary, in which each number (input, output, and weight) is binary and operations are logic arithmetic.
  • Fig. 4 illustrates a schematic representation of a binary neuron 401 according to an embodiment.
  • Such a structure of binary neuron can be represented with a Boolean function.
  • a Boolean function h(x, w) is any function that takes values in the binary field F_2.
  • with m binary inputs, there are n = 2^m input combinations from (0,0,...,0) to (1,1,...,1).
  • a Boolean function can be specified by a truth table which defines the output value at all of the 2^m input combinations.
  • Fig. 5 illustrates a schematic representation of truth tables 500 of Boolean functions according to an embodiment.
  • each Boolean function has two inputs b_1, b_2 and three weights w_1, w_2, w_3, and ⊕ is the logic exclusive-or (XOR) operator.
  • Fig. 6 illustrates a schematic representation of a mapping of a layer of a binary neural network according to an embodiment.
  • Each possible binary input of a layer can be represented by an integer number 601.
  • the layer maps each possible input into an output comprising a plurality of bits, wherein each bit is an output of a binary neuron of the layer.
  • the output of a layer comprising k binary neurons is a binary k-tuple whose elements are aggregated from the neuron outputs.
  • a layer can be represented by a truth table 602 which comprises its neurons' truth vectors.
  • Each row of the truth table 602 corresponds to a binary output of the layer for a specific input and each column of the table 602 corresponds to the output of a neuron in the layer.
  • Each row of the table 602 can be represented by an integer number and these numbers can be presented in a vector 603.
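For illustration, a short sketch of this integer encoding follows; the output bits below are made-up values, not taken from the figure.

```python
# Each k-bit row of the layer truth table is read as an integer, as in Fig. 6.
rows = [(0, 1), (0, 1), (1, 0), (1, 1)]   # hypothetical layer outputs for inputs 0..3
vector = [int("".join(str(bit) for bit in row), 2) for row in rows]
print(vector)   # [1, 1, 2, 3]
```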
  • Fig. 7 illustrates a schematic representation of a binary neural network 700 according to an embodiment. It should be appreciated that the size of the neural network 700 in the embodiment of Fig. 7 may be too small for any practical application and is only intended for illustration.
  • the binary neural network 700 of the embodiment of Fig. 7 may need to classify a 2-bit number into one of three classes.
  • each neuron of the neural network 700 of the embodiment of Fig. 7 may have two binary inputs, b_j1 and b_j2, and may be implemented with a 1st order Reed-Muller code using three weights w_j0, w_j1, w_j2, such that its output is given by h_j = w_j0 ⊕ w_j1 b_j1 ⊕ w_j2 b_j2, where ⊕ denotes the logic XOR operator, and wb denotes the logic AND operation of the variables w and b.
  • the neurons h_1 and h_2 may be initialized with weights [0,1,1] and [1,1,1], respectively.
  • truth vectors of the neurons h_1 and h_2 of the first layer L_1 are then [0,1,1,0] and [1,0,0,1], respectively.
  • Fig. 9 illustrates a schematic representation of a mapping 900 of a knot 901 in a binary neural network according to an embodiment.
  • the layer L_1 maps both inputs (0,0) and (1,1) into the same output (0,1) even though the inputs correspond to different labels.
  • Such a situation may be referred to as a knot 901. Due to the knot 901, information required to distinguish the two inputs (0,0) and (1,1) from each other is lost after layer L_1 and consecutive layers cannot, therefore, distinguish these inputs from each other.
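A minimal Python sketch reproducing this toy example, assuming each neuron computes the first-order Reed-Muller function h = w_0 ⊕ w_1 b_1 ⊕ w_2 b_2 with the weights given above (the code and names are illustrative, not taken from the patent):

```python
from itertools import product

def rm1_neuron(bits, w):
    # First-order Reed-Muller neuron: w[0] XOR w[1]*b1 XOR ... XOR w[m]*bm over GF(2).
    out = w[0]
    for wi, bi in zip(w[1:], bits):
        out ^= wi & bi
    return out

weights = [[0, 1, 1], [1, 1, 1]]          # neurons h1 and h2 of layer L_1
for b in product([0, 1], repeat=2):
    y = tuple(rm1_neuron(b, w) for w in weights)
    print(b, "->", y)   # (0, 0) and (1, 1) both map to (0, 1): the knot 901
```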
  • a binary neural network should be made free of knots. If the network is trained with an iterative process of feedforward-backpropagation loops, the network should be free of knots in each iteration.
  • Disentanglement is however a complex problem.
  • for a layer of k binary neurons each having q binary weights, there are N = 2^q possible weight configurations per neuron. Disentangling the layer amounts to selecting k configurations between the 2^q possibilities, requiring on the order of (2^q choose k) operations.
  • since a layer may be composed of hundreds or even thousands of neurons, and each neuron may have thousands of weights, this complexity quickly becomes prohibitive.
  • At least some embodiments disclosed herein may properly determine the architecture of a binary DNN and train it to successfully achieve a learning task.
  • Fig. 10 illustrates a schematic representation of mapping 1000 of a layer of a binary neural network comprising a knot according to an embodiment.
  • a layer currently maps three inputs x_1, x_2, and x_3 to the same output y, in which x_1 and x_2 have the same label U, while x_3 has a label V different from U.
  • the layer comprises a knot, because the layer maps input data of different labels to the same output.
  • a method for resolving the knot is presented according to an embodiment.
  • the layer is currently composed of k neurons.
  • the current layer output y is a sequence of k bits.
  • the knot can be resolved by adding a new binary neuron to the layer that is configured to output a binary value c for inputs x_1 and x_2, and the binary complement of c for input x_3.
  • the new layer will output y appended with c for x_1 and x_2, but y appended with the complement of c for x_3.
  • the knot is dissolved because the layer does not map input data of different labels to the same output, and on the other hand, the convergence of input data x_1 and x_2 to the same output is achieved.
  • the remaining task is to determine the weights of the new neuron from known information about inputs and the corresponding outputs. This can be solved by many efficient methods.
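The idea can be illustrated with a tiny sketch (values are hypothetical): appending the new neuron's bit to the shared output separates x_3 from x_1 and x_2 while keeping x_1 and x_2 converged.

```python
y = (0, 1)                 # current k-bit output shared by x1, x2 and x3
c = 1                      # value the new neuron outputs for the first part {x1, x2}
new_outputs = {
    "x1": y + (c,),        # first part, label U
    "x2": y + (c,),        # first part, label U
    "x3": y + (1 - c,),    # second part, label V, gets the binary complement
}
print(new_outputs)         # x1 and x2 still coincide; x3 is now distinct
```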
  • Fig. 11 illustrates a flow chart representation of a procedure for constructing a layer of a binary neural network according to an embodiment.
  • labelled input data and an empty layer are obtained. Since the layer is empty, it may be considered to comprise one knot.
  • the knot is bi-partitioned so that all inputs of the layer having the same label are partitioned into the same part.
  • weights of a new neuron to be added to the layer are determined based on the bi-partitioning.
  • the new neuron is added to the layer.
  • the layer function and knots of the layer are updated.
  • a predefined condition is checked. If the condition is not met, the procedure can move back to the operation 1102. Otherwise, the layer can be obtained in an operation 1107.
  • a layer of a binary neural network can be constructed and trained simultaneously.
  • the empty layer forms a single largest knot mapping all the inputs.
  • Adding the first neuron breaks this knot into 2 smaller knots; adding the second neuron breaks these 2 knots into at most 4 smaller knots, and so on.
  • this continues until the layer is free of knots, i.e., complete disentanglement is achieved.
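A compact, runnable Python sketch of this incremental layer construction is given below. The dataset, labels, and the exhaustive weight search (used here in place of the linear-system method described further on) are illustrative assumptions, not the patent's implementation.

```python
from itertools import product
from collections import Counter

def rm1(bits, w):
    # First-order Reed-Muller neuron over GF(2): w0 XOR w1*b1 XOR ... XOR wm*bm.
    out = w[0]
    for wi, bi in zip(w[1:], bits):
        out ^= wi & bi
    return out

def layer_out(bits, neurons):
    return tuple(rm1(bits, w) for w in neurons)

def find_knot(inputs, labels, neurons):
    # A knot: inputs with different labels mapped to the same layer output.
    groups = {}
    for x in inputs:
        groups.setdefault(layer_out(x, neurons), []).append(x)
    for xs in groups.values():
        if len({labels[x] for x in xs}) > 1:
            return xs
    return None

def bipartition(knot, labels, stats):
    # Greedy bi-partition: same-label inputs stay together; otherwise assign the
    # input (strongest statistics first) to the part with the lower statistics sum.
    parts, sums, label_part = ([], []), [0, 0], {}
    for x in sorted(knot, key=lambda x: -stats[x]):
        i = label_part.setdefault(labels[x], 0 if sums[0] <= sums[1] else 1)
        parts[i].append(x)
        sums[i] += stats[x]
    return parts

def build_layer(dataset, labels, m):
    inputs = sorted(set(dataset))
    stats = Counter(dataset)
    neurons = []
    while (knot := find_knot(inputs, labels, neurons)) is not None:
        part1, part2 = bipartition(knot, labels, stats)
        for w in product([0, 1], repeat=m + 1):   # brute-force stand-in for the solver
            out1 = {rm1(x, w) for x in part1}
            out2 = {rm1(x, w) for x in part2}
            if len(out1) == 1 and len(out2) == 1 and out1 != out2:
                neurons.append(list(w))
                break
        else:
            break    # no single neuron separates this knot; give up in this sketch
    return neurons

data = [(0, 0), (0, 0), (1, 1), (0, 1), (1, 0), (1, 0)]          # hypothetical dataset
lab = {(0, 0): "U", (1, 0): "U", (0, 1): "V", (1, 1): "V"}       # hypothetical labels
print(build_layer(data, lab, m=2))    # e.g. [[0, 0, 1]]
```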
  • a binary neural network can be constructed by incrementally constructing layer after layer, with each layer being grown by sequentially adding one new neuron at a time so as to gradually break knots while enhancing the data convergence.
  • Fig. 12 illustrates a flow chart representation of a procedure 1200 for constructing a binary neural network according to an embodiment.
  • the binary neural network can be constructed and trained using a labelled training dataset 1201.
  • the construction of the binary neural network can be started from an empty network.
  • a layer of the binary neural network can be constructed.
  • input data of the layer can comprise the input data from the training dataset 1201.
  • the input data of the layer can comprise the output of the previous layer.
  • outputs of the constructed layer 1204 can be obtained.
  • a predefined condition can be checked based on the training dataset 1201 and the obtained outputs of the layer.
  • the predefined condition may correspond to the binary neural network not comprising any knots and/or accuracy of the binary neural network.
  • the predefined condition may comprise a plurality of sub-conditions. If the condition is met, the binary neural network can be obtained in an operation 1206. Otherwise, the procedure 1200 may move back to the operation 1203. Thus, the procedure 1200 can construct consecutive layers into the binary neural network until the predefined condition is met.
  • This process of incremental construction of layers can continue until predefined criteria are met.
  • This principle transforms the complex combinatorial problem at the layer level into a single problem at the neuron level, hence entirely dissolving the combinatorial structure.
  • One advantage of the solution is that it can enable simultaneously determining the network architecture and solving the primary blocking entanglement problem for successfully training binary DNNs. In addition, it is computationally efficient for any binary-field neuron.
  • the procedure 1200 can build and train a binary neural network simultaneously by incrementally constructing the network from scratch.
  • Each layer can be constructed by incrementally adding one neuron at a time until a predefined set of criteria is met, for instance until it does not contain any knots.
  • the neuron to be added can be configured so as to break the current knots and enhance the convergence of data of the same label.
  • the process can be stopped when a set of predefined criteria is met, for instance, it may automatically stop when the training accuracy reaches a given threshold, or when a specified number of layers is reached.
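Continuing the sketch above (it reuses build_layer, layer_out, data and lab from the previous block), the layer-stacking loop of the procedure 1200 might look roughly as follows; the stopping condition shown (the latest layer's outputs being knot-free) and the max_layers bound are illustrative assumptions.

```python
def build_network(dataset, labels, m, max_layers=4):
    layers = []
    current = {x: x for x in set(dataset)}     # current representation of each raw input
    for _ in range(max_layers):
        layer_data = [current[x] for x in dataset]                  # inputs with repetitions
        layer_labels = {current[x]: labels[x] for x in set(dataset)}
        neurons = build_layer(layer_data, layer_labels, m)
        if not neurons:
            break
        layers.append(neurons)
        current = {x: layer_out(current[x], neurons) for x in set(dataset)}
        m = len(neurons)                       # next layer's input width = this layer's size
        by_output = {}
        for x, y in current.items():
            by_output.setdefault(y, set()).add(labels[x])
        if all(len(s) == 1 for s in by_output.values()):
            break                              # knot-free at the network output
    return layers

print(build_network(data, lab, m=2))           # e.g. [[[0, 0, 1]]]
```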
  • the binary network to be designed and trained can be a fully connected feedforward architecture of which each neuron is implemented as a Boolean function of binary inputs and binary weights.
  • for example, the neuron output may be given by h(x, w) = w_0 ⊕ w_1 b_1 ⊕ w_2 b_2 ⊕ ... ⊕ w_m b_m, in which each monomial p_i is a first order monomial. This function is indeed a first order Reed-Muller code.
  • another example is h(x, w) = w_0 ⊕ w_1 b_2 ⊕ w_2 b_4 ⊕ ... ⊕ w_⌊m/2⌋ b_2⌊m/2⌋, wherein ⌊m/2⌋ denotes the greatest integer not greater than m/2. This function is also a first order polynomial, but it is not a first order Reed-Muller code, showing that the method is not constrained by any Boolean function used for implementing a binary neuron.
  • the proposed method can repeat the same process over all layers and all neurons.
  • the process for determining and adding one new neuron to a layer is disclosed.
  • let y_i denote the current layer output at input x_i.
  • the new neuron is determined by the following steps: bi-partition the knots of the layer and determine the neuron weights.
  • Fig. 13 illustrates a graph representation of a knot in a binary neural network according to an embodiment.
  • a purpose of the new neuron is to output different values at inputs that have different labels but have the same layer output. Since a binary neuron can only have two distinct outputs, whereas there can be more than two inputs which need to be separated, the question is to decide the target output, c or its complement, for each input point, wherein c denotes a binary number.
  • the layer function is interpreted as a clustered graph 1300 such that each unique input x_j of the layer corresponds to one graph vertex whose value is set to its output y_j, and all inputs of the same label are assigned to one graph cluster 1301. An edge is added between any two vertices which have the same value. Vertex x_j is also attributed statistics s_j.
  • in this graph, all cliques 1302 are identified, each of which is a fully connected subgraph. Hence, all vertices of the same clique 1302 have the same layer output.
  • a clique which is completely inside one cluster 1301 corresponds to converged inputs and is called an isolated clique, whereas a clique that lies over more than one cluster 1301 corresponds to one knot and is called a non-isolated clique.
  • Fig. 14 illustrates a table representation of knot removal in a binary neural network according to an embodiment.
  • a clique will be separated if the output of the new neuron is not the same over all the vertices of the clique.
  • a purpose of the new neuron is to preserve all isolated cliques, i.e., to output the same value for all vertices of an isolated clique, and to break each non-isolated clique into two parts such that all the vertices of the clique which are in the same cluster are assigned to the same part.
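A small sketch of the clique view (the layer outputs, labels, and names below are hypothetical): inputs sharing a layer output form a clique, which is isolated if all its inputs carry the same label and non-isolated (a knot) otherwise.

```python
layer_output = {"x1": (0, 1), "x2": (0, 1), "x3": (0, 1), "x4": (1, 0), "x5": (1, 0)}
label = {"x1": "U", "x2": "U", "x3": "V", "x4": "V", "x5": "V"}

cliques = {}
for x, y in layer_output.items():
    cliques.setdefault(y, []).append(x)        # vertices with the same value form a clique

for y, members in cliques.items():
    kind = "isolated" if len({label[x] for x in members}) == 1 else "non-isolated"
    print(y, members, kind)    # (0, 1) is a non-isolated clique, i.e. a knot
```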
  • Fig. 15 illustrates a table representation of knot removal in a binary neural network according to another embodiment.
  • x_3 has the strongest statistics and, since part 2 has smaller sum statistics than part 1, we assign x_3 to part 2. Now, part 1 and part 2 have sum statistics of 22 and 15, respectively. Similarly, in the next step, we assign x_4 to part 2, and part 2 now has sum statistics of 25. Finally, vertex x_5 is assigned to part 1 which currently has smaller statistics than part 2.
  • each non-isolated clique is bi-partitioned into two parts.
  • the partitioning 204 the at least one knot into a first part and a second part comprises identifying a number of occurrences in the dataset for each input in the at least two inputs and consecutively assigning each input in the at least two inputs into the first part or into the second part according to which part has a lower sum of number of occurrences.
  • the target for the new neuron is to output a value c at all the vertices of one part, and to output the complement of c at all the vertices of the other part.
  • the method in this embodiment does not require an explicit value of c.
  • let h(x, w) be the neuron function of input x and weight vector w.
  • let c_i and c_j be the target outputs of two input points x_i and x_j; we then have h(x_i, w) ⊕ h(x_j, w) = c_i ⊕ c_j.
  • the binary inputs b_n are known.
  • the unknown variables are the neuron weights w_t.
  • this equation is linear with respect to the weights w_t.
  • a system of linear equations in weights is obtained.
  • the weights can then be obtained by using any method for solving a system of linear equations, such as Gaussian elimination.
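As a sketch of how such a system can be formed for a first-order Reed-Muller neuron h(x, w) = w_0 ⊕ w_1 b_1 ⊕ ... ⊕ w_m b_m (the inputs, parts, and reference point below are hypothetical): pairing every input with one reference input turns the complementarity condition into one GF(2) equation per remaining input, and the constant weight w_0 cancels in the XOR, matching the remark that no explicit value of c is needed.

```python
def build_equations(part1, part2, ref_input, ref_part):
    # Each equation is [coeff_1, ..., coeff_m, rhs] over GF(2):
    # h(x_ref, w) XOR h(x, w) = 0 if x is in the same part as the reference, else 1.
    equations = []
    for x, part in [(x, 1) for x in part1] + [(x, 2) for x in part2]:
        if x == ref_input:
            continue
        coeffs = [br ^ bi for br, bi in zip(ref_input, x)]   # w0 cancels in the XOR
        rhs = 0 if part == ref_part else 1
        equations.append(coeffs + [rhs])
    return equations

part1 = [(0, 0), (1, 0)]            # inputs assigned to the first part
part2 = [(0, 1), (1, 1)]            # inputs assigned to the second part
print(build_equations(part1, part2, ref_input=(0, 0), ref_part=1))
# [[1, 0, 0], [0, 1, 1], [1, 1, 1]]  -> w1 = 0, w2 = 1 solves the system
```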
  • the assigning 206 of values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part comprises forming a system of linear equations for the plurality of weights according to the condition that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part and solving the system of linear equations.
  • the solving may comprise, for example, assigning each row in the system of linear equations a number of occurrences based on the dataset and performing row elimination sequentially for the system of linear equations, wherein order of the row elimination follows the number of occurrences assigned to each row.
  • the solving may further comprise, in response to a pivoting condition being met, pivoting columns of the system of linear equations after the row elimination.
  • Fig. 16 illustrates a flow chart representation of a method for solving a system of linear equations according to an embodiment.
  • the system of equations can be established by, for each element x_i, computing all the monomials (P_i1, P_i2, ..., P_ik) from the m binary inputs expressed by x_i and by getting the coefficient vector [P_i1, P_i2, ..., P_ik, C_i].
  • statistics s_i can also be associated with the i-th equation. A system of the established equations can then be obtained. In the following, construction of the system of linear equations is presented when the expected outputs are given in a concrete or relative form.
  • Dataset D can be compacted into [x_1, x_2, ..., x_n] such that each element of the resulting set [x_1, x_2, ..., x_n] is unique (i.e., not repetitive).
  • Statistics can be represented by a vector [s_1, s_2, ..., s_n] in which s_i is the number of occurrences of x_i in D.
  • the resulting vector [x_1, x_2, ..., x_n] coupled with [c_1, c_2, ..., c_n] is referred to as the target truth vector of the neuron, and [s_1, s_2, ..., s_n] is its statistics.
  • all the monomials (P_i1, P_i2, ..., P_ik) can be computed and the coefficient vector [P_i1, P_i2, ..., P_ik, C_i] can be obtained.
  • a point x_p which has the maximum statistics can be obtained. Elementwise XOR of the coefficient vector of the point x_p with that of each remaining point x_i can be performed, obtaining [P_p1 ⊕ P_i1, P_p2 ⊕ P_i2, ..., P_pk ⊕ P_ik, C_p ⊕ C_i].
  • the i-th binary linear equation can be obtained as (P_p1 ⊕ P_i1)w_1 ⊕ (P_p2 ⊕ P_i2)w_2 ⊕ ... ⊕ (P_pk ⊕ P_ik)w_k = C_p ⊕ C_i, and its statistics is s_i.
  • a system of the established equations can be obtained.
  • the equations are rearranged in decreasing order of their statistics.
  • a matrix can be formed of which the i-th row is the coefficients of the i-th equation, i.e., the corresponding coefficient vector; the matrix thus has (k + 1) columns.
  • i + 1 is assigned to i.
  • row elimination of the i-th row is performed sequentially with each row, from the first to the last, of the echelon-form matrix, by using an element-wise XOR operation of the i-th row and that row. The resulting row is denoted by r_i.
  • the first non-zero element of r_i is searched for from left to right. The index of the first non-zero element is denoted by j.
  • j is compared to n.
  • the procedure moves to the operation 1603.
  • the loop is repeated until the echelon-form matrix is full rank or i reaches the last row of the coefficient matrix.
  • the weights are obtained from the echelon-form matrix by using the back-substitution method.
  • more important equations can be satisfied first, since equations with higher statistics are considered first.
  • Computational complexity can be reduced, since the echelon-form matrix can be constructed starting from an empty matrix, with rows sequentially added to it one at a time according to equation priority.
  • the procedure can stop earlier if the echelon-form matrix quickly becomes full rank, which reduces computational complexity and running time for a largely overdetermined system, since the stop condition is dynamic; the process stops when the resulting echelon-form matrix is full rank or when the pool of input rows has been iterated.
  • the procedure can ensure that the row added in the current iteration is not eliminated by lower priority rows in subsequent iterations, since column pivoting is performed in each iteration.
  • the procedure can ensure obtaining a best solution for the neuron's weights, since the procedure ensures having one solution to the system of equations.
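A sketch of the statistics-ordered elimination (the function name and data below are mine; the column-pivoting refinement mentioned above is omitted for brevity): candidate rows are processed in decreasing order of their statistics, an echelon-form matrix is grown one row at a time, the loop stops early once it is full rank, and back-substitution yields the weights, so that higher-statistics equations are satisfied first in an overdetermined system.

```python
def solve_priority_gf2(rows, stats, n_unknowns):
    # rows: list of [coeff_1, ..., coeff_k, rhs] over GF(2); stats: occurrences per row.
    order = sorted(range(len(rows)), key=lambda i: -stats[i])
    echelon = []                                   # kept rows, distinct leading pivots
    for i in order:
        r = rows[i][:]
        for kept in echelon:                       # reduce against rows already kept
            pivot = next(j for j, v in enumerate(kept[:-1]) if v)
            if r[pivot]:
                r = [a ^ b for a, b in zip(r, kept)]
        if any(r[:-1]):                            # non-zero row: add it to the matrix
            echelon.append(r)
            echelon.sort(key=lambda row: next(j for j, v in enumerate(row[:-1]) if v))
            if len(echelon) == n_unknowns:         # full rank: dynamic early stop
                break
    w = [0] * n_unknowns                           # back-substitution (free variables = 0)
    for row in reversed(echelon):
        pivot = next(j for j, v in enumerate(row[:-1]) if v)
        acc = row[-1]
        for j in range(pivot + 1, n_unknowns):
            acc ^= row[j] & w[j]
        w[pivot] = acc
    return w

rows = [[1, 0, 0], [1, 1, 1], [0, 1, 1]]           # hypothetical [coeffs | rhs] rows
stats = [5, 3, 2]                                  # number of occurrences per equation
print(solve_priority_gf2(rows, stats, n_unknowns=2))   # [0, 1]
```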
  • the functionality described herein can be performed, at least in part, by one or more hardware logic components.
  • illustrative types of hardware logic components include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), Graphics Processing Units (GPUs).
  • FPGAs: Field-programmable Gate Arrays
  • ASICs: Application-specific Integrated Circuits
  • ASSPs: Application-specific Standard Products
  • SOCs: System-on-a-chip systems
  • CPLDs: Complex Programmable Logic Devices
  • GPUs: Graphics Processing Units
  • reference to 'an' item may refer to one or more of those items.
  • the term ‘and/or’ may be used to indicate that one or more of the cases it connects may occur. Both, or more, of the connected cases may occur, or only one of the connected cases may occur.
  • the operations of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate. Additionally, individual blocks may be deleted from any of the methods without departing from the objective and scope of the subject matter described herein. Aspects of any of the embodiments described above may be combined with aspects of any of the other embodiments described to form further embodiments without losing the effect sought.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A method and a computing device for constructing and/or training a binary neural network, BNN, are disclosed. The method may remove a knot from a layer of a BNN by adding a new neuron to the layer and solving appropriate weights for the new neuron. A BNN can be constructed and/or trained by repeating this process for consecutive layers of the BNN.

Description

CONSTRUCTION OF BINARY NEURAL NETWORKS
TECHNICAL FIELD
The disclosure relates to a method, and more particularly to a method for constructing a binary neural network. Furthermore, the disclosure relates to a corresponding computing device and a computer program.
BACKGROUND
Deep Neural Networks (DNNs) are computing systems inspired by the biological neural networks that constitute biological brains. DNNs can be trained to perform tasks by considering examples, generally without being programmed with any task-specific rules. For example, in image recognition, DNNs may be trained to identify images that contain cars by analysing example images that have been manually labelled as “car” or “no car” and using the results to identify cars in other images. DNNs do this without any prior knowledge about cars. Instead, they automatically generate identifying features from the learning material that they process.
SUMMARY
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
It is an object to provide a device and a method for constructing a binary neural network. The object is achieved by the features of the independent claims. Further implementation forms are provided in the dependent claims, the description and the figures.
According to a first aspect, a method comprises obtaining a dataset comprising a plurality of inputs and a plurality of labels, wherein each input in the plurality of inputs corresponds to a label in the plurality of labels; constructing a layer of a binary neural network using the dataset, wherein the constructing the layer comprises at least: identifying at least one knot in the layer, wherein a knot corresponds to at least two inputs of the layer that are mapped to the same output by the layer and correspond to different labels in the plurality of labels; partitioning the at least one knot into a first part and a second part, wherein inputs in the at least two inputs corresponding to the same label in the plurality of labels are partitioned into the same part; adding a new binary neuron comprising a plurality of weights into the layer; and assigning values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part. Thus, the method may remove a knot from a layer of a binary neural network by adding a new neuron to the layer and solving appropriate weights for the new neuron. The method may enable, for example, constructing and/or training a layer of a binary neural network. The method operations may be repeated until the layer comprises no more knots, thus removing all knots from a layer.
In an implementation form of the first aspect, the partitioning the at least one knot into a first part and a second part comprises: identifying a number of occurrences in the dataset for each input in the at least two inputs; and consecutively assigning each input in the at least two inputs into the first part or into the second part according to which part has a lower sum of number of occurrences. The method may enable, for example, efficiently partitioning the knot into the first part and the second part.
In a further implementation form of the first aspect, wherein the assigning values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part comprises: forming a system of linear equations for the plurality of weights according to the condition that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part; and solving the system of linear equations. The method may enable, for example, efficiently assigning the weights for the new neuron.
In a further implementation form of the first aspect, the solving the system of linear equations comprises: assigning each row in the system of linear equations a number of occurrences based on the dataset; and performing row elimination sequentially for the system of linear equations, wherein order of the row elimination follows the number of occurrences assigned to each row. The method may enable, for example, equations with higher statistics to be considered first when solving the system of linear equations. Thus, in the case of an overdetermined system, the more important equations are satisfied first.
In a further implementation form of the first aspect, the solving the system of linear equations further comprises: in response to a pivoting condition being met, pivoting columns of the system of linear equations after the row elimination. The method may enable, for example, ensuring that the row added in the current iteration will not be eliminated by lower priority rows in subsequent iterations.
In a further implementation form of the first aspect, the method further comprises repeating the operations of identifying at least one knot, partitioning the at least one knot, generating a new binary neuron, and assigning values to the plurality of weights until a preconfigured condition is met. The preconfigured condition may comprise, for example, the layer not comprising any more knots. Thus, the method may construct a knot-free layer. In a further implementation form of the first aspect, the method further comprises constructing a binary neural network comprising a plurality of layers wherein an output of each layer functions as an input of a consecutive layer, and wherein each layer in the plurality of layers is constructed consecutively by performing the layer constructing. The method may enable, for example, constructing a knot-free binary neural network.
In a further implementation form of the first aspect, the method further comprises using the binary neural network in a telecommunication device. Alternatively or additionally, the binary neural network may be used in other devices, such as smartphones, sensors, and wearables. The binary neural network may be configured to perform, for example, image recognition, portrait mode photography, text prediction, user profiling, de-noising, or camera enhancement.
According to a second aspect, a computer program is provided, comprising program code configured to perform a method according to the first aspect when the computer program is executed on a computer.
According to a third aspect, a computing device is configured to: obtain a dataset comprising a plurality of inputs and a plurality of labels, wherein each input in the plurality of inputs corresponds to a label in the plurality of labels; construct a layer of a binary neural network using the dataset, wherein the constructing the layer comprises at least: identify at least one knot in the layer, wherein a knot corresponds to at least two inputs of the layer that are mapped to the same output by the layer and correspond to different labels in the plurality of labels; partition the at least one knot into a first part and a second part, wherein inputs in the at least two inputs corresponding to the same label in the plurality of labels are partitioned into the same part; add a new binary neuron comprising a plurality of weights into the layer; and assign values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part. The computing device may, for example, construct and/or train a layer of a binary neural network. The computing device may repeat the procedure until the layer comprises no more knots.
In an implementation form of the third aspect, the computing device is further configured to partition the at least one knot into the first part and the second part by performing: identify a number of occurrences in the dataset for each input in the at least two inputs; and consecutively assign each input in the at least two inputs into the first part or into the second part according to which part has a lower sum of number of occurrences. The computing device may, for example, efficiently partition the knot into the first part and the second part.
In a further implementation form of the third aspect, the computing device is further configured to assign the values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part by performing: form a system of linear equations for the plurality of weights according to the condition that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part; and solve the system of linear equations. The computing device may, for example, efficiently assign the weights for the new neuron.
In a further implementation form of the third aspect, the computing device is further configured to solve the system of linear equations by performing: assign each row in the system of linear equations a number of occurrences based on the dataset; and perform row elimination sequentially for the system of linear equations, wherein order of the row elimination follows the number of occurrences assigned to each row. The computing device may, for example, consider equations with higher statistics first when solving the system of linear equations. Thus, in the case of an overdetermined system, the more important equations are satisfied first.
In a further implementation form of the third aspect, the computing device is further configured to solve the system of linear equations by performing: in response to a pivoting condition being met, pivot columns of the system of linear equations after the row elimination. The computing device may, for example, ensure that the row added in the current iteration will not be eliminated by lower priority rows in subsequent iterations.
In a further implementation form of the third aspect, the computing device is further configured to repeat the operations of identifying at least one knot, partitioning the at least one knot, generating a new binary neuron, and assigning values to the plurality of weights until a preconfigured condition is met. The preconfigured condition may comprise, for example, the layer not comprising any more knots. Thus, the computing device may construct a knot-free layer.
In a further implementation form of the third aspect, the computing device is further configured to construct a binary neural network comprising a plurality of layers wherein an output of each layer functions as an input of a consecutive layer, and to construct each layer in the plurality of layers consecutively by performing the layer constructing. The computing device may, for example, construct a knot-free binary neural network.
Many of the attendant features will be more readily appreciated as they become better understood by reference to the following detailed description considered in connection with the accompanying drawings.
DESCRIPTION OF THE DRAWINGS
The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein:
Fig. 1 illustrates a schematic representation of a computing device according to an embodiment; Fig. 2 illustrates a flow chart representation of a method according to an embodiment;
Fig. 3 illustrates a schematic representation of neural network usage according to an embodiment;
Fig. 4 illustrates a schematic representation of a binary neuron according to an embodiment;
Fig. 5 illustrates a schematic representation of truth tables of Boolean functions according to an embodiment;
Fig. 6 illustrates a schematic representation of a mapping of a layer of a binary neural network according to an embodiment;
Fig. 7 illustrates a schematic representation of a binary neural network according to an embodiment;
Fig. 8 illustrates a schematic representation of a training dataset according to an embodiment;
Fig. 9 illustrates a schematic representation of a mapping of a knot in a binary neural network according to an embodiment;
Fig. 10 illustrates a schematic representation of mapping of a layer of a binary neural network comprising a knot according to an embodiment;
Fig. 11 illustrates a flow chart representation of a procedure for constructing a layer of a binary neural network according to an embodiment;
Fig. 12 illustrates a flow chart representation of a procedure for constructing a binary neural network according to an embodiment;
Fig. 13 illustrates a graph representation of a knot in a binary neural network according to an embodiment;
Fig. 14 illustrates a table representation of knot removal in a binary neural network according to an embodiment;
Fig. 15 illustrates a table representation of knot removal in a binary neural network according to another embodiment; and
Fig. 16 illustrates a flow chart representation of a method for solving a system of linear equations according to an embodiment.
Like references are used to designate like parts in the accompanying drawings.
DETAILED DESCRIPTION
The detailed description provided below in connection with the appended drawings is intended as a description of the embodiments and is not intended to represent the only forms in which the embodiment may be constructed or utilized. However, the same or equivalent functions and structures may be accomplished by different embodiments. Fig. 1 illustrates a schematic representation of a computing device 100 according to an embodiment.
According to an embodiment, the computing device 100 is configured to obtain a dataset comprising a plurality of inputs and a plurality of labels, wherein each input in the plurality of inputs corresponds to a label in the plurality of labels.
The computing device 100 may be further configured to construct a layer of a binary neural network using the dataset.
The constructing of the layer may comprise identifying at least one knot in the layer, wherein a knot corresponds to at least two inputs of the layer that are mapped to the same output by the layer and correspond to different labels in the plurality of labels.

The constructing of the layer may comprise partitioning the at least one knot into a first part and a second part, wherein inputs in the at least two inputs corresponding to the same label in the plurality of labels are partitioned into the same part.

The constructing of the layer may comprise adding a new binary neuron comprising a plurality of weights into the layer.

The constructing of the layer may comprise assigning values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part.
The computing device 100 may comprise a processor 101. The computing device 100 may further comprise a memory 102.
In some embodiments at least some parts of the computing device 100 may be implemented as a system on a chip (SoC). For example, the processor 101, the memory 102, and/or other components of the computing device 100 may be implemented using a field-programmable gate array (FPGA).
Components of the computing device 100, such as the processor 101 and the memory 102, may not be discrete components. For example, if the device 100 is implemented using a SoC, the components may correspond to different units of the SoC.
The processor 101 may comprise, for example, one or more of various processing devices, such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing circuitry with or without an accompanying DSP, or various other processing devices including integrated circuits such as, for example, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
The memory 102 may be configured to store, for example, computer programs and the like. The memory 102 may include one or more volatile memory devices, one or more non volatile memory devices, and/or a combination of one or more volatile memory devices and non-volatile memory devices. For example, the memory 102 may be embodied as magnetic storage devices (such as hard disk drives, floppy disks, magnetic tapes, etc.), optical magnetic storage devices, and semi-conductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash ROM, RAM (random access memory), etc.).
Functionality described herein may be implemented via the various components of the computing device 100. For example, the memory 102 may comprise program code for performing any functionality disclosed herein, and the processor 101 may be configured to perform the functionality according to the program code comprised in the memory 102.
When the computing device 100 is configured to implement some functionality, some component and/or components of the computing device 100, such as the one or more processors 101 and/or the memory 102, may be configured to implement this functionality. Furthermore, when the one or more processors 101 are configured to implement some functionality, this functionality may be implemented using program code comprised, for example, in the memory 102. For example, if the computing device 100 is configured to perform an operation, the one or more memories 102 and the computer program code can be configured to, with the one or more processors 101, cause the computing device 100 to perform that operation.
According to an embodiment, a telecommunication device comprises the computing device 100.
Fig. 2 illustrates a flow chart representation of a method 200 according to an embodiment.
According to an embodiment, the method 200 comprises obtaining 201 a dataset comprising a plurality of inputs and a plurality of labels, wherein each input in the plurality of inputs corresponds to a label in the plurality of labels. For example, each input in the plurality of inputs may comprise a picture and a corresponding label may indicate an object in the picture. Thus, using the dataset, a neural network can be trained to, for example, identify objects in pictures.
The dataset may also be referred to as a training dataset or similar.
The method 200 may further comprise constructing 202 a layer of a binary neural network using the dataset.
The constructing 202 may comprise identifying 203 at least one knot in the layer, wherein a knot corresponds to at least two inputs of the layer that are mapped to the same output by the layer and correspond to different labels in the plurality of labels.
Inputs of a layer may be provided by a previous layer of the BNN. For a first layer of a BNN, the inputs of the layer may comprise inputs in the dataset. The constructing 202 may further comprise partitioning 204 the at least one knot into a first part and a second part, wherein inputs in the at least two inputs corresponding to the same label in the plurality of labels are partitioned into the same part.
The constructing 202 may further comprise adding 205 a new binary neuron comprising a plurality of weights into the layer.
The constructing 202 may further comprise assigning 206 values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part.
The method 200 may be performed, for example, by the computing device 100.
At least some operations of the method 200 may be performed by a computer program product when executed on a computer.
The method 200 may further comprise repeating the operations of identifying at least one knot, partitioning the at least one knot, generating a new binary neuron, and assigning value to the plurality of weights until a preconfigured condition is met. Thus, the method operations of the constructing 202 the layer may be repeated until a condition is met.
The preconfigured condition may comprise, for example, the layer not comprising any more knots. Thus, the method operations may be repeated until the layer no longer comprises any knots, and a knot-free layer can be constructed.
The method may further comprise constructing a binary neural network comprising a plurality of layers, wherein an output of each layer functions as an input of a consecutive layer, and wherein each layer in the plurality of layers is constructed consecutively by performing the layer constructing 202. Thus, the method operations of the constructing 202 the layer may be repeated for consecutive layers of a binary neural network in order to construct and train the binary neural network.
The binary neural network constructed using the method 200 may be used, for example, in a telecommunication device.
Fig. 3 illustrates a schematic representation of neural network usage according to an embodiment.
In machine learning, a model 301 can be trained 303 using a training dataset 302, producing a trained model 304. The trained model 304 can be tested 305 using test data 306.
A deep neural network (DNN) is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Typically, artificial neurons are aggregated into layers, where different layers may perform different kinds of transformations on their inputs. The connections between artificial neurons have weights that are adjusted as learning proceeds. A weight increases or decreases the strength of the signal at a connection. DNNs may play an important role in devices in future networks, such as smartphones, sensors and wearables. Especially for smartphones, there is already a big market for DNNs, with various diverse applications, including image recognition, portrait mode photography, text prediction, user profiling, de-noising and camera enhancement.
However, the implementation of DNNs on such resource constrained devices may be limited by the high resource consumption (memory and energy). In the standard implementation of DNNs, all inputs, outputs, weights, biases, and activation functions are considered to be real numbers, and are typically represented in computers with floating point arithmetic, meaning that each number is approximated using a 64-bit sequence. It is noteworthy that in deep learning applications, there may be DNNs with thousands of neurons, where each neuron may have numerous inputs, resulting in millions of such floating point numbers. Thus, it may be desirable to compress DNNs using a smaller-digit arithmetic, such as binary, ternary, or 8-bit. In particular, it may be possible that DNNs can even be made purely binary, in which each number (input, output, and weight) is binary and the operations are logic arithmetic.
Fig. 4 illustrates a schematic representation of a binary neuron 401 according to an embodiment.
Input signals 402 x := (b1, b2, ..., bm), weights 403 w := (w0, w1, ..., wm) and output signal 404 c of a binary neuron 401 can all be binary numbers. Such a structure of binary neuron can be represented with a Boolean function. A Boolean function h(x, w) is any function that takes values in the binary field F2. Since the m binary inputs take on a total of n = 2^m input combinations, from (0,0,...,0) to (1,1,...,1), a Boolean function can be specified by a truth table which defines the output value at all of the 2^m input combinations.
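For illustration only (the following code does not form part of the application), a minimal Python sketch of such a binary neuron, assuming a simple first-order form h(x, w) = w0 ⊕ w1b1 ⊕ ... ⊕ wmbm with all arithmetic in F2; the function names are illustrative:

```python
from itertools import product

def binary_neuron(x, w):
    """First-order binary neuron: w0 XOR w1*b1 XOR ... XOR wm*bm (arithmetic in F2)."""
    out = w[0]
    for b, wi in zip(x, w[1:]):
        out ^= wi & b  # AND of weight and input bit, XOR-accumulated
    return out

def truth_vector(w, m):
    """Truth table of the neuron over all 2^m input combinations."""
    return [binary_neuron(x, w) for x in product([0, 1], repeat=m)]

print(truth_vector([0, 1, 1], 2))  # [0, 1, 1, 0]
```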
Fig. 5 illustrates a schematic representation of a truth tables 500 of Boolean functions according to an embodiment.
In the embodiments of Fig. 5, each Boolean function has two inputs b1, b2 and three weights w1, w2, w3, and ⊕ is the logic exclusive OR (XOR) operator.
Fig. 6 illustrates a schematic representation of a mapping of a layer of a binary neural network according to an embodiment.
Each possible binary input of a layer can be represented by an integer number 601. The layer maps each possible input into an output comprising a plurality of bits, wherein each bit is an output of a binary neuron of the layer. In other words, the output of a layer comprising k binary neurons is a binary k-tuple whose elements are aggregated from the neuron outputs. Like a binary neuron, a layer can be represented by a truth table 602 which comprises its neurons' truth vectors. Each row of the truth table 602 corresponds to a binary output of the layer for a specific input and each column of the table 602 corresponds to the output of a neuron in the layer. Each row of the table 602 can be represented by an integer number and these numbers can be presented in a vector 603.
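As a small illustrative sketch (not part of the application), the rows of such a layer truth table can be turned into the integer vector described above:

```python
def rows_to_integers(truth_table):
    """Represent each truth-table row (the layer output bits for one input) as an integer."""
    return [int("".join(str(bit) for bit in row), 2) for row in truth_table]

# Hypothetical 2-neuron layer outputs for the four possible 2-bit inputs:
print(rows_to_integers([[0, 1], [1, 0], [1, 0], [0, 1]]))  # [1, 2, 2, 1]
```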
Fig. 7 illustrates a schematic representation of a binary neural network 700 according to an embodiment. It should be appreciated that the size of the neural network 700 in the embodiment of Fig. 7 may be too small for any practical application and is only intended for illustration.
The design and training of binary neural networks which are purely binary in both inference and training phases face various problems, notably because of combinatorial characteristics and lack of gradient notion due to discrete nature of the problem. One problem is that the network may map, in the hidden layers, data with different labels into the same output.
For example, the binary neural network 700 of the embodiment of Fig. 7 may need to classify a 2-bit number into one of three classes. The labelled dataset used for training includes data points {(0,0), (1,0)} labelled with v = 0, (0,1) labelled with v = 1, and (1,1) labelled with v = 2 as illustrated in the training dataset 800 of the embodiment of Fig. 8.
For such a learning task, each neuron of the neural network 700 of the embodiment of Fig. 7 may have two binary inputs, bj1 and bj2, and may be implemented with a first-order Reed-Muller code using three weights wj0, wj1, wj2, such that its output is given by hj = wj0 ⊕ wj1bj1 ⊕ wj2bj2, where ⊕ denotes the logic XOR operator, and wb denotes the logic AND operation of the variables w and b. The neurons h1 and h2 may be initialized with weights [0,1,1] and [1,1,1], respectively. The truth vectors of the neurons h1 and h2 over the inputs (0,0), (0,1), (1,0), (1,1) are then [0,1,1,0] and [1,0,0,1], so that the first layer L1 maps these inputs to the outputs (0,1), (1,0), (1,0), (0,1), respectively.
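For illustration only (not part of the application), a short Python sketch reproducing this example with the stated initial weights:

```python
from itertools import product

def rm1_neuron(bits, w):
    """First-order Reed-Muller neuron over F2: w0 XOR w1*b1 XOR w2*b2."""
    return w[0] ^ (w[1] & bits[0]) ^ (w[2] & bits[1])

weights_h1, weights_h2 = [0, 1, 1], [1, 1, 1]  # initial weights from the example
for x in product([0, 1], repeat=2):
    print(x, "->", (rm1_neuron(x, weights_h1), rm1_neuron(x, weights_h2)))
# (0, 0) -> (0, 1)   (0, 1) -> (1, 0)   (1, 0) -> (1, 0)   (1, 1) -> (0, 1)
```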
As a result, layer L1 maps inputs (0,0) and (1,1) to the same output (0,1), and maps (0,1) and (1,0) to the same output (1,0). Since inputs (0,0) and (1,1) have different labels (i.e., v = 0 and v = 2, respectively), the fact that layer L1 maps these two inputs to the same output creates a problem. That is, no matter how layer L2 is configured, these two inputs remain entangled together and become inseparable. This is illustrated in the embodiment of Fig. 9.
Fig. 9 illustrates a schematic representation of a mapping 900 of a knot 901 in a binary neural network according to an embodiment.
As is illustrated in the embodiment of Fig. 9, the layer L1 maps both inputs (0,0) and (1,1) into the same output (0,1) even though the inputs correspond to different labels. Such a situation may be referred to as a knot 901. Due to the knot 901, information required to distinguish the two inputs (0,0) and (1,1) from each other is lost after layer L1 and consecutive layers cannot, therefore, distinguish these inputs from each other.
Hence, a binary neural network should be made free of knots. If the network is trained with an iterative process of feedforward-backpropagation loops, the network should be free of knots in each iteration.
Disentanglement is however a complex problem. In the example above, one needs to try two sets of weights for the two neurons and evaluate whether the obtained layer contains a knot or not. This try-and-evaluate process needs to be repeated until the knot-free condition is met. If each of the k neurons of the layer has q weights, there are N = 2^q possible weight configurations per neuron. This amounts to selecting k configurations among the 2^q possibilities, requiring a number of operations that grows combinatorially. In practical applications, a layer may be composed of hundreds or even thousands of neurons, and each neuron may have thousands of weights; this complexity quickly becomes prohibitive.
On the other hand, in the network design, the number of layers as well as the size of each layer are unknown and need to be determined. Trying different values for k will multiply the above complexity, making the problem harder.
Thus, making a network free of knots is a problem of high complexity. Moreover, the classification nature of learning requires the convergence of all data of a same label to the same output. Making data converged entails knot creation whereas dissolving knots entails data divergence. This entanglement-divergence dilemma is a primary blocking problem to the design and training of binary neural networks.
At least some embodiments disclosed herein may properly determine the architecture of a binary DNN and train it to successfully achieve a learning task.
Fig. 10 illustrates a schematic representation of mapping 1000 of a layer of a binary neural network comprising a knot according to an embodiment.
In the embodiment of Fig. 10, a layer currently maps three inputs x1,x2, and x3 to the same output y in which x1 and x2 have the same label U, while x3 has label V different from U. Thus, the layer comprises a knot, because the layer maps input data of different labels to the same output. In the following, a method for resolving the knot is presented according to an embodiment.
Suppose that the layer is currently composed of k neurons. The current layer output y is a sequence of k bits.
The knot can be resolved by adding a new binary neuron to the layer that is configured to output a binary value c for inputs x1 and x2, and the binary complement of c for input x3. As a result, the new layer will output [y c] for x1 and x2, but [y c̄] for x3, where c̄ denotes the binary complement of c. Hence, the knot is dissolved, because the layer no longer maps input data of different labels to the same output, and, on the other hand, the convergence of input data x1 and x2 to the same output is achieved.
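A minimal sketch of this idea (illustrative only; the helper name find_knots and the example values are not from the application):

```python
from collections import defaultdict

def find_knots(layer_map, labels):
    """Group inputs by their current layer output; a knot is a group covering more than one label."""
    groups = defaultdict(list)
    for x, y in layer_map.items():
        groups[y].append(x)
    return [xs for xs in groups.values() if len({labels[x] for x in xs}) > 1]

# Example from the text: x1, x2 (label U) and x3 (label V) all map to the same output y.
layer_map = {"x1": (0, 1), "x2": (0, 1), "x3": (0, 1)}
labels = {"x1": "U", "x2": "U", "x3": "V"}
print(find_knots(layer_map, labels))  # [['x1', 'x2', 'x3']]

# Appending a new neuron that outputs c = 0 for x1, x2 and the complement 1 for x3:
new_bit = {"x1": 0, "x2": 0, "x3": 1}
new_map = {x: y + (new_bit[x],) for x, y in layer_map.items()}
print(find_knots(new_map, labels))  # [] -- the knot is dissolved
```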
The remaining task is to determine the weights of the new neuron from known information about inputs and the corresponding outputs. This can be solved by many efficient methods.
Fig. 11 illustrates a flow chart representation of a procedure for constructing a layer of a binary neural network according to an embodiment.
In an operation 1101, labelled input data and an empty layer are obtained. Since the layer is empty, it may be considered to comprise one knot.
In an operation 1102, the knot is bi-partitioned so that all inputs of the layer having the same label are partitioned into the same part.
In an operation 1103, weights of a new neuron to be added to the layer are determined based on the bi-partitioning.
In an operation 1104, the new neuron is added to the layer.
In an operation 1105, the layer function and knots of the layer are updated.
In an operation 1106, a predefined condition is checked. If the condition is not met, the procedure can move back to the operation 1102. Otherwise, the layer can be obtained in an operation 1107.
Using the procedure 1100, a layer of a binary neural network can be constructed and trained simultaneously. At the beginning, the empty layer forms one biggest knot mapping all the inputs. Adding the first neuron breaks this knot into 2 smaller knots, adding the second neuron breaks these 2 knots into at most 4 smaller knots, and so on. Once no inputs of different labels are mapped to the same output, the layer is free of knots, i.e., complete disentanglement is achieved.
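An illustrative sketch of this incremental loop (Python; solve_new_neuron is a hypothetical placeholder for the weight-solving step described further below, and the simple bi-partition shown here is refined later using statistics):

```python
from collections import defaultdict

def find_knots(layer_map, labels):
    """A knot: inputs sharing the same layer output but carrying different labels."""
    groups = defaultdict(list)
    for x, y in layer_map.items():
        groups[y].append(x)
    return [xs for xs in groups.values() if len({labels[x] for x in xs}) > 1]

def construct_layer(inputs, labels, solve_new_neuron, max_neurons=64):
    """Add one binary neuron at a time until the layer contains no knots."""
    neurons = []
    layer_out = lambda x: tuple(n(x) for n in neurons)
    while len(neurons) < max_neurons:
        knots = find_knots({x: layer_out(x) for x in inputs}, labels)
        if not knots:
            break  # complete disentanglement: the layer is knot-free
        knot = knots[0]
        part1 = [x for x in knot if labels[x] == labels[knot[0]]]  # same-label inputs stay together
        part2 = [x for x in knot if x not in part1]
        neurons.append(solve_new_neuron(part1, part2))  # neuron output complementary between the parts
    return neurons
```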
A binary neural network can be constructed by incrementally constructing layer after layer with each being sequentially added one new neuron at a time so as to gradually break knots while enhancing the data convergence.
Fig. 12 illustrates a flow chart representation of a procedure 1200 for constructing a binary neural network according to an embodiment.
The binary neural network can be constructed and trained using a labelled training dataset 1201.
In an operation 1202, the construction of the binary neural network can be started from an empty network.
In an operation 1203, a layer of the binary neural network can be constructed. For the first layer of the network, input data of the layer can comprise the input data from the training dataset 1201. For each consecutive layer, the input data of the layer can comprise the output of the previous layer. In an operation 1204, outputs of the constructed layer can be obtained.
In an operation 1205, a predefined condition can be checked based on the training dataset 1201 and the obtained outputs of the layer. For example, the predefined condition may correspond to the binary neural network not comprising any knots and/or accuracy of the binary neural network. The predefined condition may comprise a plurality of sub-conditions. If the condition is met, the binary neural network can be obtained in an operation 1206. Otherwise, the procedure 1200 may move back to the operation 1203. Thus, the procedure 1200 can construct consecutive layers into the binary neural network until the predefined condition is met.
This process of incremental construction of layers can continue until predefined criteria are met. This principle transforms the complex combinatorial problem at the layer level into a single problem at the neuron level, hence entirely dissolving the combinatorial structure.
One advantage of the solution is that it can enable simultaneously determining the network architecture and solving primary blocking entanglement problem for successfully training binary DNNs. In addition, it is computationally efficient for any binary-field neuron.
The procedure 1200 can build and train a binary neural network simultaneously by incrementally constructing the network from scratch.
Each layer can be constructed by incrementally adding one neuron at a time until a predefined set of criteria is met, for instance until it does not contain any knots.
During the incremental construction process, the neuron to be added can be configured so as to break the current knots and enhance the convergence of data of the same label.
The process can be stopped when a set of predefined criteria is met, for instance, it may automatically stop when the training accuracy reaches a given threshold, or when a specified number of layers is reached.
In the following, an embodiment of the procedure is disclosed.
The binary network to be designed and trained can be a fully connected feedforward architecture of which each neuron is implemented as a Boolean function of binary inputs and binary weights. A Boolean function h(x, w) implementing a binary neuron of binary inputs x = (b1, b2, ..., bm) and weights w = (w0, w1, ..., wq) is generally given in the form h(x, w) = w0 + Σi=1..q wipi, where each pi is a multivariate monomial of b1, b2, ..., bm. This means that pi is one of the basis monomials formed from the inputs. Examples of such Boolean functions h(x, w) are given in the following.

h(x, w) = w0 + Σi=1..m wibi. In this function, each monomial pi is a first-order monomial. This function is indeed a first-order Reed-Muller code.

h(x, w) = w0 + Σi=1..⌊m/2⌋ wib2i, wherein ⌊m/2⌋ denotes the greatest integer not greater than m/2. With this Boolean function, only binary inputs of even indices are considered. This function is also a first-order polynomial, but it is not a first-order Reed-Muller code, showing that the method is not constrained to any particular Boolean function used for implementing a binary neuron.

h(x, w) = w0 + Σi wipi, in which the monomials pi include the first-order and the second-order monomials. In this case, the neuron has q = 1 + m(m + 1)/2 weights.
Here, all arithmetic operations are in the binary field.
Given a training dataset D = {(di, vi), i = 1, 2, ...}, wherein di is a data point and vi is its label, the method can start with an empty network and construct layers sequentially. In the training dataset, labelled data points may be repeated; hence the dataset is firstly compacted into a new dataset denoted as {(xi, vi, si), i = 1, 2, ...} such that each labelled data point (xi, vi) is unique and si counts its number of appearances in dataset D. si may be referred to as the statistics or number of occurrences of (xi, vi).
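As a small illustrative sketch (the variable names and the toy data, including the duplicated point, are hypothetical and not from the application), this compaction can be done with a counter:

```python
from collections import Counter

# Toy labelled dataset D of (data point, label) pairs; the duplicate (0, 0) illustrates counting.
D = [((0, 0), 0), ((1, 0), 0), ((0, 1), 1), ((1, 1), 2), ((0, 0), 0)]
compact = [(x, v, s) for (x, v), s in Counter(D).items()]
# Each (xi, vi) is unique and si is its number of occurrences (its "statistics").
print(compact)  # [((0, 0), 0, 2), ((1, 0), 0, 1), ((0, 1), 1, 1), ((1, 1), 2, 1)]
```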
The proposed method can repeat the same process over all layers and all neurons. In the following, the process for determining and adding one new neuron to a layer is disclosed. The input to this process is the compact dataset {(xi, vi, si), i = 1, 2, ...} as described above. Denote by yi the current layer output at input xi; {(xi, yi), i = 1, 2, ...} then presents the complete input-output mapping of the current layer. The new neuron is determined by the following steps: bi-partition the knots of the layer and determine the neuron weights.
Fig. 13 illustrates a graph representation of a knot in a binary neural network according to an embodiment.
A purpose of the new neuron is to output different values at inputs that have different labels but have the same layer output. Since a binary neuron can only have two distinct outputs, whereas there can be more than two inputs which need to be separated, the question is to decide the target output, c or its binary complement, for each input point, wherein c denotes a binary number.
In the embodiment of Fig. 13, the layer function is interpreted as a clustered graph 1300 such that each unique input xi of the layer corresponds to one graph vertex whose value is set to its output yi, and all inputs of the same label are assigned to one graph cluster 1301. An edge is added between any two vertices which have the same value. Vertex xi is also attributed the statistics si. In this graph, we identify all cliques 1302, each of which is a fully connected subgraph. Hence, all vertices of the same clique 1302 have the same layer output. In such a graph, a clique which is completely inside one cluster 1301 corresponds to converged inputs and is called an isolated clique, whereas a clique that lies over more than one cluster 1301 corresponds to one knot and is called a non-isolated clique.
Fig. 14 illustrates a table representation of knot removal in a binary neural network according to an embodiment.
A clique will be separated if the output of the new neuron is not the same over all the vertices of the clique. Thus, a purpose of the new neuron is to preserve all isolated cliques, i.e., to output the same value for all vertices of an isolated clique, and to break each non-isolated clique into two parts such that all the vertices of the clique which are in the same cluster are assigned to the same part.
There can be several ways to perform the above bi-partitioning. The following example illustrates one technique for the embodiment of Fig. 14.
Fig. 15 illustrates a table representation of knot removal in a binary neural network according to another embodiment.
Firstly, we take the vertex of strongest statistics, which is x1, and assign it to part 1. Since x2 belongs to the same cluster as x1, we also assign x2 to part 1. After this first step, part 1 has sum statistics of 22, and part 2 has 0.
Among the remaining vertices, x3 has the strongest statistics and, since part 2 has smaller sum statistics than part 1, we assign x3 to part 2. Now, part 1 and part 2 have sum statistics of 22 and 15, respectively. Similarly, in the next step, we assign x4 to part 2, and part 2 now has sum statistics of 25. Finally, vertex x5 is assigned to part 1 which currently has smaller statistics than part 2.
At the end of this process, each non-isolated clique is bi-partitioned into two parts.
According to an embodiment, the partitioning 204 the at least one knot into a first part and a second part comprises identifying a number of occurrences in the dataset for each input in the at least two inputs and consecutively assigning each input in the at least two inputs into the first part or into the second part according to which part has a lower sum of number of occurrences.
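A sketch of this greedy assignment (illustrative only; the statistics values below are hypothetical, chosen merely to be consistent with the worked example above):

```python
def bipartition_by_statistics(vertices, stats, cluster):
    """Greedy bi-partition of a knot: repeatedly take the unassigned vertex with
    the strongest statistics and place it, together with all unassigned vertices
    of the same cluster, into the part whose sum of statistics is currently smaller."""
    parts, sums = ([], []), [0, 0]
    remaining = set(vertices)
    while remaining:
        x = max(remaining, key=lambda v: stats[v])
        group = sorted((v for v in remaining if cluster[v] == cluster[x]),
                       key=lambda v: -stats[v])
        p = 0 if sums[0] <= sums[1] else 1
        for v in group:
            parts[p].append(v)
            sums[p] += stats[v]
            remaining.discard(v)
    return parts

stats = {"x1": 16, "x2": 6, "x3": 15, "x4": 10, "x5": 3}
cluster = {"x1": "A", "x2": "A", "x3": "B", "x4": "C", "x5": "D"}
print(bipartition_by_statistics(list(stats), stats, cluster))
# (['x1', 'x2', 'x5'], ['x3', 'x4']) -- part sums 25 and 25
```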
For each clique, the target for the new neuron is to output a value c at all the vertices of one part, and to output the binary complement of c at all the vertices of the other part.
The method in this embodiment does not require an explicit value of c. Let h(x, w) be the neuron function of input x and weight vector w. Let ci and cj be the target outputs at two input points xi and xj; we then have h(xi, w) ⊕ h(xj, w) = ci ⊕ cj.
If the relative relation between ci and cj is known, we can still obtain the right-hand side of the above equation: ci ⊕ cj = 0 if ci = cj, and ci ⊕ cj = 1 otherwise.
From the knot bi-partitioning, we have ci = cj if xi and xj are in the same partitioned part, and ci is the binary complement of cj otherwise. Hence, for each knot bi-partitioning, a system of equations can be established by following the above principle.
For example, suppose that the layer has 2 inputs, i.e., m = 2, and the second-order Reed-Muller code is used for the neurons, i.e., h(x, w) = w0 ⊕ w1b1 ⊕ w2b2 ⊕ w3b1b2, where bn is a binary input. Hence, for two input points xi = (bi1, bi2) and xj = (bj1, bj2), h(xi, w) ⊕ h(xj, w) = w1(bi1 ⊕ bj1) ⊕ w2(bi2 ⊕ bj2) ⊕ w3(bi1bi2 ⊕ bj1bj2) = ci ⊕ cj.
In this equation, the binary inputs bn are known. Thus, the unknown variables are the neuron weights wi. Further, this equation is linear with respect to the weights wi.
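A brief illustrative sketch of forming one such equation for this m = 2, second-order case (Python; the function names are illustrative):

```python
def monomials_rm2(b1, b2):
    """Monomials of the second-order Reed-Muller neuron with m = 2 inputs
    (the constant weight w0 cancels when two neuron outputs are XORed)."""
    return [b1, b2, b1 & b2]

def equation_row(xi, xj, same_part):
    """Coefficient row and right-hand side of h(xi, w) XOR h(xj, w) = ci XOR cj
    in the unknown weights (w1, w2, w3)."""
    coeffs = [a ^ b for a, b in zip(monomials_rm2(*xi), monomials_rm2(*xj))]
    rhs = 0 if same_part else 1  # ci = cj within a part, complementary across parts
    return coeffs, rhs

print(equation_row((0, 0), (1, 1), same_part=False))  # ([1, 1, 1], 1)
```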
By collecting all the established equations, a system of linear equations in weights is obtained. The weights can then be obtained by using any method for solving a system of linear equations, such as a Gauss elimination method.
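For illustration only (a generic sketch, not the statistics-prioritized variant described further below), a small Gauss-Jordan elimination over F2:

```python
def solve_gf2(A, b):
    """Solve A w = b over F2 by Gauss-Jordan elimination.
    A is a list of 0/1 rows, b a list of 0/1 right-hand sides. Returns one solution
    (free variables set to 0) or None if the system is inconsistent."""
    rows = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    n_cols = len(A[0])
    pivots, r = [], 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [a ^ p for a, p in zip(rows[i], rows[r])]  # XOR elimination
        pivots.append(c)
        r += 1
    if any(row[-1] and not any(row[:-1]) for row in rows[r:]):
        return None  # a 0 = 1 row: inconsistent system
    w = [0] * n_cols
    for i, c in enumerate(pivots):
        w[c] = rows[i][-1]
    return w

# Hypothetical toy system over three weights (w1, w2, w3):
print(solve_gf2([[1, 0, 1], [0, 1, 1]], [1, 0]))  # [1, 0, 0]
```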
According to an embodiment, the assigning 206 of values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part comprises forming a system of linear equations for the plurality of weights according to the condition that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part and solving the system of linear equations.
The solving may comprise, for example, assigning each row in the system of linear equations a number of occurrences based on the dataset and performing row elimination sequentially for the system of linear equations, wherein order of the row elimination follows the number of occurrences assigned to each row.
The solving may further comprise, in response to a pivoting condition being met, pivoting columns of the system of linear equations after the row elimination.
Fig. 16 illustrates a flow chart representation of a method for solving a system of linear equations according to an embodiment.
In the following, construction of the system of linear equations is presented when the expected outputs are given in a concrete form. The target truth vector of the neuron is constructed from dataset D = {d1, d2, d3, ...} and expected outputs V = {v1, v2, v3, ...}. Dataset D can be compacted into [x1, x2, ..., xn] such that each element of the resulting set [x1, x2, ..., xn] is unique (i.e., not repetitive). Statistics can be represented by vector [s1, s2, ..., sn] in which si is the number of occurrences of xi in D. For each element xi, all the corresponding expected outputs are obtained from V. Then, the number of times its corresponding output is 0 is counted in N0, and the number of times it is 1 is counted in N1. ci is set to 0 or 1 following majority logic, i.e., ci = 0 if N0 ≥ N1, and ci = 1 otherwise. The resulting vector [x1, x2, ..., xn] coupled with [c1, c2, ..., cn] is referred to as the target truth vector of the neuron, and [s1, s2, ..., sn] is its statistics.

The system of equations can be established by, for each element xi, computing all the monomials (Pi1, Pi2, ..., Pik) from the m binary inputs expressed by xi and by forming the coefficient vector [Pi1, Pi2, ..., Pik, ci]. The i-th binary linear equation can then be obtained as αi1w1 ⊕ αi2w2 ⊕ ⋯ ⊕ αikwk = αi(k+1), where [αi1, αi2, ..., αik, αi(k+1)] = [Pi1, Pi2, ..., Pik, ci]. Statistics si can also be associated with the i-th equation. The system of the established equations is then obtained.

In the following, construction of the system of linear equations is presented when the expected outputs are given in a relative form. The target truth vector of the neuron can be constructed from dataset D = {d1, d2, d3, ...} and relative expected outputs. Dataset D can be compacted into [x1, x2, ..., xn] such that each element of the resulting set is unique (i.e., not repetitive), and statistics can be represented by vector [s1, s2, ..., sn] in which si is the number of occurrences of xi in D. A reference value can be set to c = 0 or c = 1 arbitrarily. Then, a concrete expected output set V can be obtained by setting all the points belonging to one part to c, and the remaining points to the binary complement of c. For each element xi, all the corresponding expected outputs can be obtained from V. Then, the number of times its corresponding output is c can be counted in N1, and the number of times its corresponding output is the complement of c can be counted in N2. After that, ci can be set following majority logic, i.e., ci = c if N1 ≥ N2, and ci is set to the complement of c otherwise. The resulting vector [x1, x2, ..., xn] coupled with [c1, c2, ..., cn] is referred to as the target truth vector of the neuron, and [s1, s2, ..., sn] is its statistics.

For each element xi, all the monomials (Pi1, Pi2, ..., Pik) can be computed and the coefficient vector [Pi1, Pi2, ..., Pik, ci] can be obtained. The point which has the maximum statistics can be identified, and an elementwise XOR of the coefficient vector of that point with the coefficient vector of each remaining point can be performed, giving the coefficients of the corresponding binary linear equation together with its statistics. A system of the established equations can thus be obtained.

In the following, a procedure for solving the obtained system of equations is presented according to an embodiment. In an operation 1601, the equations are rearranged in decreasing order of their statistics. Let A be the matrix of which the i-th row is the coefficients of the i-th equation, i.e., [αi1, αi2, ..., αik, αi(k+1)]; the matrix A thus has k + 1 columns.
In an operation 1602, a loop is started with an empty matrix E (the echelon-form matrix to be constructed) and row index i = 0. In an operation 1603, i + 1 is assigned to i. In an operation 1604, starting from the i-th row ai of matrix A, row elimination of ai is performed sequentially with each row of E, from the first to the last row of E, by using an element-wise XOR operation of ai and that row. The resulting row is denoted by ai'. In an operation 1605, the first non-zero element of ai' is searched for from left to right. The index of the first non-zero element is denoted by j. In an operation 1606, j is compared to n. If j > n, ai' is ignored (i.e., row ai is linearly dependent on the rows of E), and the procedure moves back to the operation 1603. Otherwise, the procedure moves to the operation 1607. In an operation 1607, row ai' is concatenated to E. The number of rows in E is denoted by nE. In an operation 1608, j is compared to nE. If j differs from nE, the procedure moves to an operation 1609. In the operation 1609, columns j and nE of E and of A are pivoted, and the procedure moves to an operation 1610. If j equals nE in the operation 1608, the procedure moves to the operation 1610 without performing the operation 1609. In the operation 1610, if E is full rank or if the last row of A has been reached, the procedure moves to an operation 1611. Otherwise, the procedure moves back to the operation 1603. Thus, the loop is repeated until matrix E is full rank or i reaches the last row of A. In the operation 1611, the weights are obtained from the matrix E by using the back-substitution method.

In case of an overdetermined system, more important equations can be satisfied first, since equations with higher statistics are considered first. Computational complexity can be reduced, since the echelon-form matrix can be constructed starting from an empty matrix, with rows added to it sequentially, one row at a time, according to equation priority. The procedure can stop earlier if matrix E quickly becomes full rank, which reduces computational complexity and running time for a largely overdetermined system, since the stop condition is dynamic; the process stops when the resulting matrix E is full rank or when the pool of input rows has been iterated. The procedure can ensure that the row added in the current iteration is not eliminated by lower-priority rows in subsequent iterations, since column pivoting is performed in each iteration. The procedure can ensure obtaining a best solution of the neuron's weights, since the procedure ensures having one solution to the system of equations.

Although some of the subject matter has been described in language specific to structural features and/or acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as embodiments of implementing the claims and other equivalent features and acts are intended to be within the scope of the claims.

The functionality described herein can be performed, at least in part, by one or more computer program product components such as software components. Alternatively, or in addition, the functionality described herein can be performed, at least in part, by one or more hardware logic components.
For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and Graphics Processing Units (GPUs).

It will be understood that the benefits and advantages described above may relate to one embodiment or may relate to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems or those that have any or all of the stated benefits and advantages. It will further be understood that reference to 'an' item may refer to one or more of those items.

The term 'and/or' may be used to indicate that one or more of the cases it connects may occur. Both, or more, connected cases may occur, or only either one of the connected cases may occur.

The operations of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate. Additionally, individual blocks may be deleted from any of the methods without departing from the objective and scope of the subject matter described herein. Aspects of any of the embodiments described above may be combined with aspects of any of the other embodiments described to form further embodiments without losing the effect sought.

The term 'comprising' is used herein to mean including the method, blocks or elements identified, but that such blocks or elements do not comprise an exclusive list and a method or apparatus may contain additional blocks or elements.

It will be understood that the above description is given by way of example only and that various modifications may be made by those skilled in the art. The above specification, embodiments and data provide a complete description of the structure and use of exemplary embodiments. Although various embodiments have been described above with a certain degree of particularity, or with reference to one or more individual embodiments, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the spirit or scope of this specification.

Claims

CLAIMS: 1. A method (200), comprising: obtaining (201) a dataset comprising a plurality of inputs and a plurality of labels, wherein each input in the plurality of inputs corresponds to a label in the plurality of labels; constructing (202) a layer of a binary neural network using the dataset, wherein the constructing the layer comprises at least: identifying (203) at least one knot in the layer, wherein a knot corresponds to at least two inputs of the layer that are mapped to the same output by the layer and correspond to different labels in the plurality of labels; partitioning (204) the at least one knot into a first part and a second part, wherein inputs in the at least two inputs corresponding to the same label in the plurality of labels are partitioned into the same part; adding (205) a new binary neuron comprising a plurality of weights into the layer; and assigning (206) values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part.
2. The method (200) according to claim 1, wherein the partitioning the at least one knot into a first part and a second part comprises: identifying a number of occurrences in the dataset for each input in the at least two inputs; and consecutively assigning each input in the at least two inputs into the first part or into the second part according to which part has a lower sum of number of occurrences.
3. The method (200) according to any preceding claim, wherein the assigning values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part comprises: forming a system of linear equations for the plurality of weights according to the condition that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part; and solving the system of linear equations.
4. The method (200) according to claim 3, wherein the solving the system of linear equations comprises: assigning each row in the system of linear equations a number of occurrences based on the dataset; and performing row elimination sequentially for the system of linear equations, wherein order of the row elimination follows the number of occurrences assigned to each row.
5. The method (200) according to claim 4, wherein the solving the system of linear equations further comprises: in response to a pivoting condition being met, pivoting columns of the system of linear equations after the row elimination.
6. The method (200) according to any preceding claim, further comprising repeating the operations of identifying at least one knot, partitioning the at least one knot, generating a new binary neuron, and assigning value to the plurality of weights until a preconfigured condition is met.
7. The method (200) according to claim 6, wherein the preconfigured condition comprises the layer no more comprising knots.
8. The method (200) according to any preceding claim, further comprising constructing a binary neural network comprising plurality of layers wherein an output of each layer functions as an input of a consecutive layer, and wherein each layer in the plurality of layers is constructed consecutively by performing the layer constructing.
9. The method (200) according to any preceding claim, further comprising using the binary neural network in a telecommunication device.
10. A computer program product comprising program code configured to perform the method according to any preceding claim when the computer program product is executed on a computer.
11. A computing device (100), configured to: obtain a dataset (302) comprising a plurality of inputs and a plurality of labels, wherein each input in the plurality of inputs corresponds to a label in the plurality of labels; construct a layer of a binary neural network using the dataset, wherein the constructing the layer comprises at least: identify at least one knot (901) in the layer, wherein a knot corresponds to at least two inputs (401) of the layer that are mapped to the same output (404) by the layer and correspond to different labels in the plurality of labels; partition the at least one knot (901) into a first part and a second part, wherein inputs in the at least two inputs corresponding to the same label in the plurality of labels are partitioned into the same part; add a new binary neuron (401) comprising a plurality of weights (403) into the layer; and assign values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part.
12. The computing device (100) according to claim 11, further configured to partition the at least one knot into the first part and the second part by performing: identify a number of occurrences in the dataset for each input in the at least two inputs; and consecutively assign each input in the at least two inputs into the first part or into the second part according to which part has a lower sum of number of occurrences.
13. The computing device (100) according to claim 11 or claim 12, further configured to assign the values to the plurality of weights of the new binary neuron in such a way that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part by performing: form a system of linear equations for the plurality of weights according to the condition that an output of the new binary neuron for inputs in the first part is binary complementary to an output of the new binary neuron for inputs in the second part; and solve the system of linear equations.
14. The computing device (100) according to claim 13, further configured to solve the system of linear equations by performing: assign each row in the system of linear equations a number of occurrences based on the dataset; and perform row elimination sequentially for the system of linear equations, wherein order of the row elimination follows the number of occurrences assigned to each row.
15. The computing device (100) according to claim 14, further configured to solve the system of linear equations by performing: in response to a pivoting condition being met, pivot columns of the system of linear equations after the row elimination.
16. The computing device (100) according to any of claims 11 – 15, further configured to repeat the operations of identifying at least one knot, partitioning the at least one knot, generating a new binary neuron, and assigning value to the plurality of weights until a preconfigured condition is met.
17. The computing device (100) according to claim 16, wherein the preconfigured condition comprises the layer no more comprising knots.
18. The computing device (100) according to any of claims 11 – 17, further configured to construct a binary neural network comprising plurality of layers wherein an output of each layer functions as an input of a consecutive layer, and to construct each layer in the plurality of layers consecutively by performing the layer constructing.
19. A telecommunication device comprising the computing device according to any of claims 11 – 18.
EP20737008.1A 2020-07-06 2020-07-06 Construction of binary neural networks Pending EP4176387A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/068997 WO2022008032A1 (en) 2020-07-06 2020-07-06 Construction of binary neural networks

Publications (1)

Publication Number Publication Date
EP4176387A1 true EP4176387A1 (en) 2023-05-10

Family

ID=71515161

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20737008.1A Pending EP4176387A1 (en) 2020-07-06 2020-07-06 Construction of binary neural networks

Country Status (2)

Country Link
EP (1) EP4176387A1 (en)
WO (1) WO2022008032A1 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2756073B1 (en) * 1996-11-18 1999-01-15 Commissariat Energie Atomique LEARNING METHOD GENERATING SMALL SIZE NEURON NETWORKS FOR DATA CLASSIFICATION
EP1949313A4 (en) * 2005-11-15 2010-03-31 Bernadette Garner Method for training neural networks

Also Published As

Publication number Publication date
WO2022008032A1 (en) 2022-01-13

Similar Documents

Publication Publication Date Title
EP4036724A1 (en) Method for splitting neural network model by using multi-core processor, and related product
Ročková et al. Posterior concentration for Bayesian regression trees and forests
Dennis et al. Learning the architecture of sum-product networks using clustering on variables
US11144823B1 (en) Method and system for hierarchical weight-sparse convolution processing
US11580376B2 (en) Electronic apparatus and method for optimizing trained model
WO2018016608A1 (en) Neural network apparatus, vehicle control system, decomposition device, and program
KR20180134740A (en) Electronic apparatus and method for optimizing of trained model
KR20210093931A (en) Automated creation of machine learning models
US20220303176A1 (en) Efficient optimization for neural network deployment and execution
Valsalam et al. Using symmetry and evolutionary search to minimize sorting networks
Shirakawa et al. Dynamic optimization of neural network structures using probabilistic modeling
Pichel et al. A new approach for sparse matrix classification based on deep learning techniques
US20220292300A1 (en) Efficient quantization for neural network deployment and execution
Kumar Ordinal pooling networks: for preserving information over shrinking feature maps
KR20210111677A (en) Method for clipping neural networks, method for calculating convolution of neural networks and apparatus for performing the methods
JPH04213750A (en) Classifying method in layered neural network
Kharinov et al. Object detection in color image
WO2022095984A1 (en) Method and system for convolution with workload-balanced activation sparsity
EP4176387A1 (en) Construction of binary neural networks
EP4020324A1 (en) Compressing a set of oefficients for subsequent use in a neural network
US20220292334A1 (en) Efficient memory use optimization for neural network deployment and execution
Chu et al. Mixed-precision quantized neural network with progressively decreasing bitwidth for image classification and object detection
GB2602296A (en) Training a neural network comprising a plurality of layers
Xi et al. A Layer-Wise Ensemble Technique for Binary Neural Network
US20180365350A1 (en) Generating circuits

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20230206

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)