Running a Keras TensorFlow neural network classifier model on a microcontroller to solve the XOR problem

Written by Guy Fernando

Created Sep 2022 - Last modified Aug 2024

In 1943, Warren McCulloch and Walter Pitts at the University of Illinois first proposed the single-layer perceptron,
basing the idea on the biological neurons found in brain tissue. An actual single-layer perceptron was later realized in
1958 by Frank Rosenblatt at the Cornell Aeronautical Laboratory, initially as a software program running on an IBM 704
valve (vacuum tube) based computer, and subsequently as the secret “Mark I Perceptron” machine intended for photographic
image classification.

Although the perceptron initially seemed promising, the famous 1969 book *Perceptrons* by Marvin Minsky and Seymour
Papert showed that it was impossible for a single-layer perceptron to solve the simple XOR function, and indeed many
other classes of problem. Minsky and Papert did know that a multi-layer perceptron could solve the problem, but at the
time there was no training algorithm for multi-layer perceptron networks. This caused the field of machine learning
research to stagnate during the 1970s and early 1980s, a period that became known as the AI winter.

In 1986, a famous paper by David Rumelhart,
Geoffrey Hinton, and
Ronald Williams described an efficient
backpropagation algorithm for
neural networks, making it possible to solve problems which
had previously been unsolvable, including the XOR problem. Today, the backpropagation algorithm is the primary method used to train
feedforward neural networks such as the ones shown here.

The simplest kind of neural network is a single-layer perceptron, consisting of a single node P, two inputs X1 and X2, an
optional bias value, and a single output Y. Each input is multiplied by its weight, the weighted inputs and the bias are
summed, and the sum is fed into the node’s activation function to produce the final output.
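The weighted-sum-plus-activation calculation of node P can be sketched in a few lines of Python. This is a minimal illustration only: the function name `perceptron`, its parameters and the example weights are hypothetical choices for this sketch, not part of any library.

```python
def perceptron(x1, x2, w1, w2, bias, activation=lambda s: s):
    """Single node P: weighted sum of the two inputs plus bias, passed through an activation."""
    weighted_sum = w1 * x1 + w2 * x2 + bias
    return activation(weighted_sum)

# Example weights chosen for illustration; the default activation is Identity (linear).
print(perceptron(1.0, 1.0, w1=0.5, w2=-1.0, bias=0.25))
```

Passing a different `activation` function changes the node's transfer characteristic without touching the weighted sum.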

The activation function of a node defines the transfer characteristics of that node given an input or set of inputs. Common activation functions include Identity, ReLU, Tanh and Sigmoid.
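As a rough sketch, the four activation functions named above can be written directly from their standard definitions. The function names here are this example's own, not a particular library's API.

```python
import math

# Each function maps a node's weighted sum s to the node's output.
def identity(s):
    return s                              # linear: output equals input

def relu(s):
    return max(0.0, s)                    # negative sums are clipped to 0

def tanh(s):
    return math.tanh(s)                   # squashes into the range (-1, 1)

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))     # squashes into the range (0, 1)

for f in (identity, relu, tanh, sigmoid):
    # Show how each function transforms the same negative, zero and positive sums.
    print(f.__name__, [round(f(s), 3) for s in (-2.0, 0.0, 2.0)])
```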

The multi-layer perceptron has one or more additional stacked hidden layers; in this case, one additional layer consisting
of the H1 and H2 nodes. The individual nodes H1, H2 and P each work as single-layer perceptrons, and each layer propagates
its outputs to the inputs of the next layer. Typically, the nodes in a specific layer use the same activation function. To
train the network, the backpropagation algorithm is used to iteratively adjust the weights with the goal of minimizing the
cost or loss function.
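To make the backpropagation step more concrete, here is a minimal NumPy sketch that trains a network with the same 2-2-1 shape as the Keras model later in this article (tanh hidden nodes, a sigmoid output, binary cross-entropy loss), using plain full-batch gradient descent. It illustrates the algorithm under those assumptions and is not how Keras implements training; the learning rate of 0.5, the 5000 epochs and the random seed are arbitrary choices. A network this small can occasionally settle in a poor local minimum, so the loss should fall but is not guaranteed to reach near zero.

```python
import numpy as np

# Training data for XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

# Random initial weights (seed fixed so the run is repeatable) and zero biases.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0.0, 1.0, (2, 2)), np.zeros((1, 2))   # inputs -> hidden H1, H2
W2, b2 = rng.normal(0.0, 1.0, (2, 1)), np.zeros((1, 1))   # hidden -> output P
learning_rate = 0.5
losses = []

for epoch in range(5000):
    # Forward pass.
    H = np.tanh(X @ W1 + b1)
    P = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))
    P_safe = np.clip(P, 1e-7, 1.0 - 1e-7)                  # avoid log(0) in the loss
    losses.append(-np.mean(Y * np.log(P_safe) + (1.0 - Y) * np.log(1.0 - P_safe)))

    # Backward pass: apply the chain rule from the loss back to every weight.
    dZ2 = (P - Y) / len(X)                     # gradient at the output node's weighted sum
    dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0, keepdims=True)
    dZ1 = (dZ2 @ W2.T) * (1.0 - H ** 2)        # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0, keepdims=True)

    # Gradient descent: step each parameter against its gradient.
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
print(np.round(P.ravel(), 2))
```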

A loss function compares the target and predicted output values and determines how well the neural network models the training data. In supervised learning there are two main types of loss function: regression loss functions and classification loss functions.
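As an illustration, one representative of each family can be computed by hand: mean squared error (a regression loss) and binary cross-entropy (a classification loss). This is a sketch using the textbook formulas; the small clipping constant is an assumption added here to avoid taking log(0).

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Regression loss: average squared difference between target and prediction."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Classification loss for 0/1 targets; y_pred are probabilities between 0 and 1."""
    y_true = np.asarray(y_true, float)
    y_pred = np.clip(np.asarray(y_pred, float), eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

targets = [0, 1, 1, 0]          # XOR outputs
good = [0.1, 0.9, 0.8, 0.2]     # predictions close to the targets
bad = [0.9, 0.1, 0.2, 0.8]      # confident but wrong predictions
print(mean_squared_error(targets, good))
print(binary_cross_entropy(targets, good), binary_cross_entropy(targets, bad))
```

Binary cross-entropy penalises confident wrong answers heavily, which is why it suits 0/1 classification tasks such as XOR.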

A hyperspace is used to represent the input and output patterns as coordinates in space. A hyperplane is a subspace whose dimension is one less than that of its hyperspace. For classification, the network is trained to find the hyperplanes that separate the data.

A single-layer perceptron can solve any linearly separable function, such as the OR function, since it is possible to
draw a single straight line (hyperplane) to separate and group the output patterns. The data is linearly separable using
a 1-dimensional hyperplane.

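The table below shows one solution to the OR function using a single-layer perceptron with a weight of 1 on each input
and a bias of -0.5. The node uses the linear activation function, so Output = X1 + X2 - 0.5; a positive output
corresponds to Y = 1 and a negative output to Y = 0.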

X1 | X2 | Y = X1 ∨ X2 | Output |
---|---|---|---|
0 | 0 | 0 | -0.5 |
1 | 0 | 1 | +0.5 |
0 | 1 | 1 | +0.5 |
1 | 1 | 1 | +1.5 |
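The OR table can be reproduced numerically. The weights w1 = w2 = 1 and bias -0.5 used here are the values implied by the Output column, since a linear node gives Output = w1·X1 + w2·X2 + bias; reading a positive output as 1 then recovers Y.

```python
import numpy as np

# All input combinations (X1, X2), in the same order as the table.
inputs = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)

# Weights and bias implied by the OR table: Output = X1 + X2 - 0.5.
weights = np.array([1.0, 1.0])
bias = -0.5

outputs = inputs @ weights + bias     # linear (Identity) activation
y = (outputs > 0).astype(int)         # positive output -> logic 1

for (x1, x2), out, yi in zip(inputs.astype(int), outputs, y):
    print(x1, x2, yi, f"{out:+.1f}")
```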

A single-layer perceptron can likewise solve the AND function, which is also linearly separable: a single straight line
(hyperplane) can be drawn to separate and group the output patterns, so the data is linearly separable using a
1-dimensional hyperplane.

The table below shows one solution to the AND function using a single-layer perceptron with a weight of 1 on each input
and a bias of -1.5. The node uses the linear activation function, so Output = X1 + X2 - 1.5, and only the input (1, 1)
gives a positive output.

X1 | X2 | Y = X1 ∧ X2 | Output |
---|---|---|---|
0 | 0 | 0 | -1.5 |
1 | 0 | 0 | -0.5 |
0 | 1 | 0 | -0.5 |
1 | 1 | 1 | +0.5 |
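The separating hyperplane itself can be read off the AND weights. In this sketch, the weights and bias are the values implied by the table's Output column, and the decision boundary is the line where the node's weighted sum equals zero.

```python
import numpy as np

# Inputs in table order, and the weights and bias implied by the AND table.
inputs = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
w1, w2, bias = 1.0, 1.0, -1.5

outputs = inputs @ np.array([w1, w2]) + bias   # linear (Identity) activation
y = (outputs > 0).astype(int)                  # positive output -> logic 1

# The decision boundary is the line w1*X1 + w2*X2 + bias = 0, i.e. X1 + X2 = 1.5.
# It crosses both axes at 1.5, so of the four corners only (1, 1), where X1 + X2 = 2,
# lies on its positive side.
x1_intercept = -bias / w1
x2_intercept = -bias / w2
print(outputs, y)
print(f"boundary crosses X1 at {x1_intercept} and X2 at {x2_intercept}")
```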

A single-layer perceptron cannot solve the XOR function, since no single straight line can be drawn to separate and
group the output patterns. However, it is possible to draw two straight lines to separate and group them. A multi-layer
perceptron containing an extra layer of hidden neurons is capable of solving problems in 3-dimensional hyperspace such as
the XOR problem. The data is now linearly separable using a 2-dimensional hyperplane.
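A brute-force search over weights illustrates why one node is not enough. The sketch below tries every combination of w1, w2 and bias on a coarse grid (an arbitrary choice) and counts how many reproduce each truth table. A grid search is not a proof, but the algebra is short: reading a positive output as 1, XOR needs bias ≤ 0, w1 + bias > 0 and w2 + bias > 0; adding the last two gives w1 + w2 + bias > -bias ≥ 0, so the input (1, 1) would also produce a positive output and be misclassified.

```python
import itertools
import numpy as np

inputs = [(0, 0), (1, 0), (0, 1), (1, 1)]
truth_tables = {"OR": [0, 1, 1, 1], "AND": [0, 0, 0, 1], "XOR": [0, 1, 1, 0]}

# Candidate weights and bias on a coarse grid: -2.0, -1.5, ..., +2.0.
grid = np.linspace(-2.0, 2.0, 9)

def solves(w1, w2, bias, target):
    """True if one linear node, read as 1 when its output is positive, matches the truth table."""
    return all(int(w1 * x1 + w2 * x2 + bias > 0) == t for (x1, x2), t in zip(inputs, target))

counts = {}
for name, target in truth_tables.items():
    counts[name] = sum(solves(w1, w2, b, target) for w1, w2, b in itertools.product(grid, repeat=3))
    print(f"{name}: {counts[name]} single-layer solutions on the grid")
```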

Simplified XOR gate circuit using NOR, NAND and OR gates.

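The table below shows a solution to the XOR function using a multi-layer perceptron with the indicated weights and
biases. All nodes use the linear activation function, with each hidden node's output taken as a logic level (1 if
positive, otherwise 0) before it is passed to the output node P.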

X1 | X2 | Y = X1 ⊕ X2 | Output |
---|---|---|---|
0 | 0 | 0 | -0.5 |
1 | 0 | 1 | +0.5 |
0 | 1 | 1 | +0.5 |
1 | 1 | 0 | -0.5 |
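One possible set of weights that reproduces the Output column exactly is sketched below. These weights are hypothetical and may differ from those in the circuit figure: here H1 behaves as OR, H2 as NAND and P as AND, a combination that is logically equivalent to XOR. Note the step on the hidden layer. Without some non-linearity there, the two layers would collapse into a single linear node, which cannot produce this table.

```python
import numpy as np

def step(s):
    """Threshold a weighted sum to a logic level: 1 if positive, else 0."""
    return (s > 0).astype(float)

inputs = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)

# Hidden layer (hypothetical weights): column 0 feeds H1 (OR), column 1 feeds H2 (NAND).
W_hidden = np.array([[1.0, -1.0],
                     [1.0, -1.0]])
b_hidden = np.array([-0.5, 1.5])
H = step(inputs @ W_hidden + b_hidden)

# Output node P combines H1 and H2 like an AND gate, with a linear output.
w_out = np.array([1.0, 1.0])
b_out = -1.5
outputs = H @ w_out + b_out

print(outputs)          # the Output column of the XOR table
print(step(outputs))    # Y = X1 XOR X2
```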

It is interesting to note that the NAND and NOR functions can be implemented simply by negating the input weights and
the bias of the AND and OR networks respectively. Furthermore, the XOR multi-layer perceptron combines basic NOR, NAND
and OR single-layer neural networks, in the same way that the simplified XOR logic symbol combines basic NOR, NAND and OR
logic gates.
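The negation trick can be checked directly with a small sketch. It reuses the AND and OR weights implied by the earlier tables; the helper name `gate` is hypothetical. Negating every weight and the bias flips the sign of the node's output, which inverts the gate. This works here because none of the four outputs is exactly 0, the threshold.

```python
import numpy as np

inputs = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)

def gate(weights, bias):
    """Single linear node read as 1 when its output is positive; returns the truth table."""
    return (inputs @ np.asarray(weights) + bias > 0).astype(int).tolist()

and_w, and_b = [1.0, 1.0], -1.5
or_w, or_b = [1.0, 1.0], -0.5

# Negate the input weights and the bias to invert each gate.
nand = gate([-w for w in and_w], -and_b)
nor = gate([-w for w in or_w], -or_b)
print("AND", gate(and_w, and_b), "NAND", nand)
print("OR ", gate(or_w, or_b), "NOR ", nor)
```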

```python
# XOR Neural Network Problem.
# Guy Fernando (2022)
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Input training data: every combination of the two binary inputs.
X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
], 'float32')

# Required output: X1 XOR X2 for each input row.
Y = np.array([
    [0],
    [1],
    [1],
    [0]
], 'float32')

# Sequential model: 2 inputs, 2 hidden tanh neurons and 1 sigmoid output neuron.
model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(2, activation='tanh'),
    keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam')

# Train the model, one sample per batch.
model.fit(X, Y, batch_size=1, epochs=10000, verbose=0)
print(model.predict(X))

# Convert the Keras model to a TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the TensorFlow Lite model to a file (the with block closes it).
with open("converted_model.tflite", "wb") as f:
    f.write(tflite_model)
```

Keras is an open-source software library that provides a Python interface for creating neural networks. Keras acts as a
high-level interface to the TensorFlow library, enabling fast and easy experimentation with neural networks. Combined
with Jupyter Notebooks running within Visual Studio Code, Python code using the Keras library can be written, documented
and tested interactively within a single development environment.

The Keras Python code shown here solves the XOR problem. The NumPy array variables `X` and `Y` hold the training input
tensor and the required output tensor. The model consists of the same multi-layer perceptron as discussed above, with two
hidden nodes and a single output node. The only difference is that the `tanh` and `sigmoid` activation functions are used
for the hidden and output nodes respectively, instead of the Identity activation function. A non-linear activation
function in the hidden layer is essential rather than optional: if every layer used the linear Identity function, the
stacked layers would collapse into a single linear function of the inputs which, like the single-layer perceptron,
cannot separate the XOR patterns. The hand-worked network above can only reproduce its XOR table if each hidden node's
output is treated as a 0 or 1 logic level, which is itself a non-linear step. It was found that this combination of
activation functions yielded the best training results for the fewest epochs.
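Why a non-linear hidden activation matters can be checked numerically. In this sketch, two Dense-style layers with the Identity activation (random weights, 2 inputs, 2 hidden units, 1 output) are shown to compute exactly the same mapping as one linear layer with W = W1·W2 and b = b1·W2 + b2.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two layers with the Identity activation: 2 inputs -> 2 hidden -> 1 output.
W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)
W2, b2 = rng.normal(size=(2, 1)), rng.normal(size=1)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
two_layer = (X @ W1 + b1) @ W2 + b2

# The same mapping expressed as a single linear layer.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = X @ W + b

print(np.allclose(two_layer, one_layer))  # True
```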

The `binary_crossentropy` loss function was chosen since XOR is a binary classification problem: the output must be
either 0 or 1. Many sources advise using the `sigmoid` activation function on the last layer when Binary Cross-Entropy is
used, because the sigmoid squashes the output into the range 0 to 1, so it can be read as the probability that the output
is 1, which is what Binary Cross-Entropy expects. Keras optimizers implement the update rule that uses the gradients
computed by backpropagation to adjust the weights and biases of the neural network, reducing the loss and so improving
its classification accuracy. The `adam` optimizer was chosen as it yielded the best training results for the fewest
training epochs.
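For reference, the Adam update rule can be written out for a single parameter. This is a minimal sketch of the published Adam algorithm, minimising the toy function f(w) = (w - 3)² rather than a neural network loss; it is not Keras' internal implementation. The function name `adam_minimise` and the learning rate of 0.05 are this example's choices, while beta_1 = 0.9, beta_2 = 0.999 and epsilon = 1e-7 mirror the Keras defaults.

```python
import math

def adam_minimise(grad, w, learning_rate=0.05, beta_1=0.9, beta_2=0.999, epsilon=1e-7, steps=2000):
    """Minimise a 1-D function with the Adam update rule, given its gradient function."""
    m, v = 0.0, 0.0                                  # first and second moment estimates
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta_1 * m + (1 - beta_1) * g            # running mean of gradients
        v = beta_2 * v + (1 - beta_2) * g * g        # running mean of squared gradients
        m_hat = m / (1 - beta_1 ** t)                # bias corrections for the zero start
        v_hat = v / (1 - beta_2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w

# f(w) = (w - 3)^2 has gradient 2(w - 3) and its minimum at w = 3.
w_min = adam_minimise(lambda w: 2.0 * (w - 3.0), w=0.0)
print(w_min)
```

Dividing by the running root-mean-square of the gradients gives each parameter its own adaptive step size, which is one reason Adam often needs less learning-rate tuning than plain gradient descent.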

When the model is compiled and executed, the following output appears in the Visual Studio Code terminal.

The terminal shows the model's predicted output tensor (its estimate of `Y`) for the input tensor `X`. It can be observed
that the model has been successfully trained, since the first and last values are approximately 0 and the two middle
values approximately 1, i.e. the XOR function logic output we require. Ultimately the goal is to run the model on a
microcontroller, so the model is saved to a TensorFlow Lite *.tflite* FlatBuffer model file for use in the next section.

TensorFlow Lite for Microcontrollers is designed to run machine learning models on microcontrollers and other devices
with only a few kilobytes of memory. The ESP32 is just one of the platforms supported by the TensorFlow Lite framework.
Here I have used Visual Studio Code with the PlatformIO IDE extension to build an Arduino project that targets the ESP32
microcontroller.

The C++ code in the *XorNN.h* and *XorNN.cpp* files defines an `XorNeuralNetwork` wrapper class, created to encapsulate
the neural network model and the calls made to it. The *main.cpp* listing shows that the model is first instantiated when
the `XorNeuralNetwork` class is constructed, and that each combination of inputs to the XOR function is then computed by
calling the `RunModel` method.

```cpp
// XOR Neural Network Problem.
// Guy Fernando (2022)
#include "Arduino.h"
#include "esp_sleep.h"
#include "XorNN.h"

XorNeuralNetwork* xorNeuralNetwork;

void setup()
{
    Serial.begin(115200);
    // Load the model and allocate its tensors.
    xorNeuralNetwork = new XorNeuralNetwork();
    // size_t is cast to unsigned to match the %u format specifier.
    Serial.printf("\nArena used bytes = %u\n\n", (unsigned) xorNeuralNetwork->GetArenaUsedBytes());
}

void loop()
{
    // The sigmoid output lies between 0 and 1; %1.0f rounds it to the nearest logic level.
    Serial.printf("XOR(0, 0) = %1.0f\n", xorNeuralNetwork->RunModel(0., 0.));
    Serial.printf("XOR(0, 1) = %1.0f\n", xorNeuralNetwork->RunModel(0., 1.));
    Serial.printf("XOR(1, 0) = %1.0f\n", xorNeuralNetwork->RunModel(1., 0.));
    Serial.printf("XOR(1, 1) = %1.0f\n", xorNeuralNetwork->RunModel(1., 1.));
    Serial.println("");
    // Run once, then put the ESP32 into deep sleep.
    esp_deep_sleep_start();
}
```

Before the neural network can be built for an ESP32 microcontroller, the previously saved *.tflite* FlatBuffer model
file is converted to a C byte array definition using the UNIX `xxd` command (its `-i` option produces C include-file
output), and the result is added to the project as *XorModel.h*, along with *XorNN.h*, *XorNN.cpp* and *main.cpp*.
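Where `xxd` is not available (for example on a Windows machine without a UNIX toolset), the same kind of C array can be generated from Python. The sketch below mimics the layout of `xxd -i` output (12 bytes per line and a `_len` variable); the function name `to_c_array` is hypothetical. The array name `xor_model_tflite` used in *XorNN.cpp* is what `xxd -i` would generate for a file named *xor_model.tflite*, since xxd replaces non-alphanumeric characters in the file name with underscores.

```python
import re

def to_c_array(data: bytes, filename: str) -> str:
    """Format bytes as a C array, mimicking the layout of `xxd -i <filename>`."""
    name = re.sub(r"[^0-9A-Za-z]", "_", filename)   # variable name derived from the file name
    lines = []
    for i in range(0, len(data), 12):                # 12 bytes per line
        chunk = ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
        lines.append("  " + chunk)
    body = ",\n".join(lines)
    return (f"unsigned char {name}[] = {{\n{body}\n}};\n"
            f"unsigned int {name}_len = {len(data)};\n")

# Hypothetical usage with a saved model file:
#   with open("xor_model.tflite", "rb") as f:
#       print(to_c_array(f.read(), "xor_model.tflite"))
print(to_c_array(bytes(range(14)), "xor_model.tflite"))
```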

After the project is built and uploaded to the ESP32 microcontroller, the PlatformIO serial monitor is used to observe the program output.

The reported tensor arena usage of the neural network model is 612 bytes, out of the 1000 bytes allocated for the arena, and the XOR function is calculated correctly.
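For a sense of scale, the trained model itself is tiny. Counting the weights and biases of its two Dense layers (the same total that Keras reports in `model.summary()`) gives 9 parameters, or 36 bytes as float32. TensorFlow Lite for Microcontrollers generally reads such constant weights straight from the model's byte array, so the arena mostly holds the input, output and intermediate tensors plus the interpreter's own bookkeeping, which helps explain why its usage exceeds the size of the weights. The helper name `dense_params` is this sketch's own.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a Dense layer: one weight per input/unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_shapes = [(2, 2), (2, 1)]   # (inputs, units) for the hidden layer and the output layer
total = sum(dense_params(n_in, n_out) for n_in, n_out in layer_shapes)
print(f"{total} parameters = {total * 4} bytes as float32")
```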

```cpp
// XOR Neural Network Problem.
// Guy Fernando (2022)
#ifndef XOR_NN_H
#define XOR_NN_H

// Include the library headers.
// To use the TensorFlow Lite for Microcontrollers library, we must include the following header files.
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "tensorflow/lite/version.h"

using namespace tflite;

// Wrapper class that owns the model interpreter and its tensor arena.
class XorNeuralNetwork
{
public:
    XorNeuralNetwork(void);
    float RunModel(float x1, float x2);
    size_t GetArenaUsedBytes(void) const;
protected:
    ~XorNeuralNetwork(void);
private:
    void AssertModelShape(void);
private:
    uint8_t* tensor_arena = nullptr;
    MicroInterpreter* interpreter = nullptr;
    MicroErrorReporter micro_error_reporter;
    ErrorReporter* error_reporter = &micro_error_reporter;
};

#endif // XOR_NN_H
```

```cpp
// XOR Neural Network Problem.
// Guy Fernando (2022)
#include <stdlib.h>
#include "XorNN.h"
// Include the model header (the xor_model_tflite byte array).
#include "XorModel.h"
// Include the unit test framework header.
#include "tensorflow/lite/micro/testing/micro_test.h"

// The test block declares the micro_test variables that the
// TF_LITE_MICRO_EXPECT_* macros in AssertModelShape() refer to.
TF_LITE_MICRO_TESTS_BEGIN
// No unit tests defined.
TF_LITE_MICRO_TESTS_END

XorNeuralNetwork::XorNeuralNetwork(void)
{
    // Load the model.
    const Model* model = GetModel(xor_model_tflite);
    if (model->version() != TFLITE_SCHEMA_VERSION)
    {
        TF_LITE_REPORT_ERROR(error_reporter,
            "Model provided is schema version %d not equal to supported version %d.\n",
            model->version(), TFLITE_SCHEMA_VERSION);
    }
    // Instantiate the operations resolver. It is static because the
    // interpreter keeps a reference to it after the constructor returns.
    static AllOpsResolver resolver;
    // Allocate memory for the tensor arena.
    const int tensor_arena_size = 1000;
    this->tensor_arena = (uint8_t*) malloc(tensor_arena_size);
    if (this->tensor_arena == nullptr)
    {
        TF_LITE_REPORT_ERROR(error_reporter, "Could not allocate arena");
        return;
    }
    // Instantiate the interpreter.
    this->interpreter = new MicroInterpreter(
        model, resolver, tensor_arena, tensor_arena_size, error_reporter);
    // Allocate the model's tensors from the arena.
    if (this->interpreter->AllocateTensors() != kTfLiteOk)
    {
        TF_LITE_REPORT_ERROR(error_reporter, "AllocateTensors() failed");
        return;
    }
    // Validate the model shape.
    AssertModelShape();
}

XorNeuralNetwork::~XorNeuralNetwork(void)
{
    // Destroy the interpreter before releasing the arena it uses.
    delete this->interpreter;
    this->interpreter = nullptr;
    // The arena was obtained with malloc(), so release it with free().
    free(this->tensor_arena);
    this->tensor_arena = nullptr;
}

float XorNeuralNetwork::RunModel(float x1, float x2)
{
    // Provide the input.
    TfLiteTensor* input = this->interpreter->input(0);
    input->data.f[0] = x1;
    input->data.f[1] = x2;
    // Run the model.
    TfLiteStatus invoke_status = this->interpreter->Invoke();
    if (invoke_status != kTfLiteOk)
    {
        TF_LITE_REPORT_ERROR(this->error_reporter, "Invoke failed\n");
    }
    // Obtain the output.
    TfLiteTensor* output = this->interpreter->output(0);
    float value = output->data.f[0];
    return value;
}

size_t XorNeuralNetwork::GetArenaUsedBytes(void) const
{
    return this->interpreter->arena_used_bytes();
}

void XorNeuralNetwork::AssertModelShape(void)
{
    // Expect a float32 input of shape [1, 2] and a float32 output.
    TfLiteTensor* input = this->interpreter->input(0);
    TF_LITE_MICRO_EXPECT_NE(nullptr, input);
    TF_LITE_MICRO_EXPECT_EQ(2, input->dims->size);
    TF_LITE_MICRO_EXPECT_EQ(1, input->dims->data[0]);
    TF_LITE_MICRO_EXPECT_EQ(kTfLiteFloat32, input->type);
    TfLiteTensor* output = this->interpreter->output(0);
    TF_LITE_MICRO_EXPECT_EQ(kTfLiteFloat32, output->type);
}
```

It has been shown here that it is possible to run a Keras TensorFlow neural network classifier model on a low-cost ESP32
microcontroller, in this case to solve the XOR problem. Of course, using a neural network on a microcontroller to
calculate the XOR function is purely an illustrative exercise with no practical use. Microcontrollers have a built-in XOR
instruction that calculates the XOR function enormously faster, using far fewer resources than a neural network.


Copyright © i4cy 2000-2024. All rights reserved.