Brief History of Neural Networks

The artificial neuron on which the single-layer perceptron is based was first proposed by Warren McCulloch and Walter Pitts in 1943 at the University of Illinois, who modelled the idea on biological neurons found in brain tissue. An actual single-layer perceptron was realized in 1958 by Frank Rosenblatt at the Cornell Aeronautical Laboratory, initially as a software program running on an IBM 704 valve (vacuum tube) based computer, and subsequently in dedicated hardware as the secret “Mark I Perceptron”, intended to be used for photographic image classification.

Although the perceptron initially seemed promising, the famous 1969 book Perceptrons by Marvin Minsky and Seymour Papert showed that it was impossible for a single-layer perceptron to solve the simple XOR function, and indeed many other classes of problem. Minsky and Papert did know that a multi-layer perceptron could solve the problem, but at the time there was no training algorithm for multi-layer perceptron networks. This caused neural network research to stagnate during the 1970s and early 1980s, a period that became known as the AI winter.

In 1986 a famous paper by David Rumelhart, Geoffrey Hinton, and Ronald Williams described an efficient backpropagation algorithm for neural networks, making it possible to solve problems which had previously been unsolvable, including the XOR problem. Today, the backpropagation algorithm is the primary method used to train feedforward neural networks such as the ones shown here.

Single-layer Perceptron

Single layer perceptron

The simplest kind of neural network is a single-layer perceptron, consisting of a single node P, two inputs X1 and X2, an optional bias value, and a single output Y. The sum of the weighted input products and bias is calculated and fed into the node’s specific activation function to produce the final output.
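The calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `perceptron` and the weight and bias values are hypothetical, and the default activation is the Identity function.

```python
def perceptron(x1, x2, w1, w2, bias, activation=lambda s: s):
    """Return activation(w1*x1 + w2*x2 + bias) for a single two-input node."""
    weighted_sum = w1 * x1 + w2 * x2 + bias
    return activation(weighted_sum)

# Hypothetical weights and bias, chosen only to illustrate the calculation.
y = perceptron(1.0, 0.0, w1=0.4, w2=-0.2, bias=0.1)
print(y)
```

Passing a different `activation` callable changes the transfer characteristic of the node without touching the weighted sum.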

The activation function of a node defines the transfer characteristics of that node given an input or set of inputs. Activation functions include: Identity, ReLU, Tanh or Sigmoid to name a few.
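As a rough illustration, the four activation functions named above can be written using only the Python standard library. The function names are chosen here for readability and are not taken from any particular framework.

```python
import math

def identity(s):
    """Linear (Identity) activation: the output equals the weighted sum."""
    return s

def relu(s):
    """Rectified linear unit: negative sums are clamped to zero."""
    return max(0.0, s)

def tanh(s):
    """Hyperbolic tangent: squashes the sum into the range (-1, 1)."""
    return math.tanh(s)

def sigmoid(s):
    """Logistic sigmoid: squashes the sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

# Compare the functions at a few sample weighted sums.
for name, fn in [("identity", identity), ("relu", relu),
                 ("tanh", tanh), ("sigmoid", sigmoid)]:
    print(name, [round(fn(s), 3) for s in (-2.0, 0.0, 2.0)])
```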

Multi-layer Perceptron

The multi-layer perceptron has one or more additional hidden stacked layers, in this case one additional layer consisting of the H1 and H2 nodes. The individual nodes H1, H2 and P work as single-layer perceptrons, where each layer propagates its outputs to the inputs of the next layer. Typically nodes in a specific layer will use the same activation function. To train the network, the backpropagation algorithm is used to iteratively adjust the weights with the goal of minimizing the cost or loss function.
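To make the forward propagation and weight adjustment more concrete, here is a minimal NumPy sketch of a network with two inputs, two hidden nodes and one output node. It rests on several assumptions that are not stated above: tanh hidden nodes, a sigmoid output node, a binary cross-entropy loss, plain gradient descent with a hypothetical learning rate `lr`, and random starting weights. It illustrates the idea rather than any specific library's implementation.

```python
import numpy as np

def forward(params, x):
    """Forward pass: tanh hidden layer (H1, H2) feeding a sigmoid output node (P)."""
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ x + b1)                      # hidden outputs, shape (2,)
    y = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))      # network output, shape (1,)
    return h, y

def loss(params, x, t):
    """Binary cross-entropy between the target t and the predicted output."""
    _, y = forward(params, x)
    y = np.clip(y, 1e-12, 1.0 - 1e-12)            # avoid log(0)
    return float(-(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))[0])

def backprop(params, x, t):
    """Gradients of the loss with respect to every weight and bias."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    delta_out = y - t                             # error at the output node
    grad_W2 = np.outer(delta_out, h)
    grad_b2 = delta_out
    delta_hid = (W2.T @ delta_out) * (1.0 - h ** 2)   # error propagated back through tanh
    grad_W1 = np.outer(delta_hid, x)
    grad_b1 = delta_hid
    return grad_W1, grad_b1, grad_W2, grad_b2

def train_step(params, x, t, lr=0.5):
    """One gradient-descent update; returns new parameters, leaving the old ones intact."""
    grads = backprop(params, x, t)
    return tuple(p - lr * g for p, g in zip(params, grads))

# Hypothetical starting point: random weights, zero biases, one training pattern.
rng = np.random.default_rng(0)
params = (rng.normal(size=(2, 2)), np.zeros(2), rng.normal(size=(1, 2)), np.zeros(1))
x = np.array([0.0, 1.0])
t = 1.0

trained = params
for _ in range(200):
    trained = train_step(trained, x, t)

print("loss before training:", loss(params, x, t))
print("loss after training: ", loss(trained, x, t))
```

A real training run would loop over all the training patterns rather than a single one; the single pattern here keeps the sketch short.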

A loss function is a function that compares the target and predicted output values and determines how well the neural network models the training data. In supervised learning, there are two main types of loss functions, namely the regression and the classification loss functions.
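As an illustration of the two families, the sketch below implements one common loss of each kind: mean squared error for regression and binary cross-entropy for classification. These two are chosen as representative examples only; the function names and the clamping constant `eps` are assumptions made for this sketch.

```python
import math

def mean_squared_error(targets, predictions):
    """Regression loss: the average squared difference between target and prediction."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def binary_cross_entropy(targets, predictions, eps=1e-12):
    """Classification loss for 0/1 targets and predicted probabilities."""
    total = 0.0
    for t, p in zip(targets, predictions):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(targets)

targets = [0, 1, 1, 0]
good = [0.1, 0.9, 0.8, 0.2]   # predictions close to the targets
bad = [0.9, 0.1, 0.2, 0.8]    # predictions far from the targets

print(mean_squared_error(targets, good), mean_squared_error(targets, bad))
print(binary_cross_entropy(targets, good), binary_cross_entropy(targets, bad))
```

In both cases a smaller value means the predictions model the targets more closely.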

Multi layer perceptron

A hyperspace is used to represent the input and output patterns as coordinates in space. A hyperplane is a subspace whose dimension is one less than that of its hyperspace. It is useful to train the network to find the hyperplanes that separate the data for classification purposes.

OR Neural Network

A single-layer perceptron can solve any linearly separable function such as the OR function since it is possible to draw a single straight line (hyperplane) to separate and group the output patterns. The data is linearly separable using a 1-dimensional hyperplane.

OR Logic Symbol

OR Neural Network

Shown below is one solution to the OR function using a single-layer perceptron with the indicated weights and bias values. The node uses the linear activation function.

OR Truth Table

X1  X2  Y = X1 ∨ X2  Output
0   0   0            -0.5
1   0   1            +0.5
0   1   1            +0.5
1   1   1            +1.5

OR Hyperspace
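The Output column of the truth table can be reproduced with a short Python sketch. The weights of 1 on both inputs and the bias of -0.5 are assumptions inferred from the table values, since the figure is not reproduced in text form. The final thresholding at zero is added here only to show on which side of the hyperplane each pattern falls.

```python
def or_node(x1, x2):
    """Linear node with assumed weights w1 = w2 = 1 and bias = -0.5."""
    return 1.0 * x1 + 1.0 * x2 - 0.5

or_patterns = [(0, 0), (1, 0), (0, 1), (1, 1)]
or_outputs = [or_node(x1, x2) for x1, x2 in or_patterns]

# A positive output lies on the "1" side of the separating line.
or_classes = [1 if out > 0 else 0 for out in or_outputs]

for (x1, x2), out, cls in zip(or_patterns, or_outputs, or_classes):
    print(x1, x2, out, cls)
```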

AND Neural Network

A single-layer perceptron can solve any linearly separable function such as the AND function since it is possible to draw a single straight line (hyperplane) to separate and group the output patterns. The data is linearly separable using a 1-dimensional hyperplane.

AND Logic Symbol

AND Neural Network

Shown below is one solution to the AND function using a single-layer perceptron with the indicated weights and bias values. The node uses the linear activation function.

AND Truth Table

X1  X2  Y = X1 ∧ X2  Output
0   0   0            -1.5
1   0   0            -0.5
0   1   0            -0.5
1   1   1            +0.5

AND Hyperspace
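As a quick check of the AND truth table, the following sketch evaluates a linear node for all four input patterns. The default weights of 1 and the bias of -1.5 are assumptions inferred from the Output column, not values read from the figure.

```python
def and_node(x1, x2, w1=1.0, w2=1.0, bias=-1.5):
    """Linear node; the default weights and bias are inferred from the truth table."""
    return w1 * x1 + w2 * x2 + bias

and_patterns = [(0, 0), (1, 0), (0, 1), (1, 1)]
and_outputs = [and_node(x1, x2) for x1, x2 in and_patterns]

# Only the pattern with a positive output lies on the "1" side of the line.
and_classes = [1 if out > 0 else 0 for out in and_outputs]

print(and_outputs)
print(and_classes)
```

With these assumed values, only the bias moves the separating line far enough from the origin that a single active input no longer crosses it.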

XOR Neural Network

A single-layer perceptron model cannot solve the XOR function, since a single straight line cannot be drawn to separate and group the output patterns. However, it is possible to draw two straight lines to separate and group the output patterns. A multi-layer perceptron containing an extra layer of hidden neurons is capable of solving problems in 3-dimensional hyperspace such as the XOR problem. The data is now linearly separable using a 2-dimensional hyperplane.

XOR Logic Symbol

Simplified XOR gate circuit using NOR, NAND and OR gates.

XOR Neural Network

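Shown below is a solution to the XOR function using a multi-layer perceptron with the indicated weights and biases. The output node uses the linear activation function. The hidden nodes must additionally threshold their weighted sums (a step function), because a network built purely from linear nodes collapses to a single linear function, and for such a function the other three rows of the table would force the output for inputs (1, 1) to be +1.5 rather than -0.5.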

XOR Truth Table

X1  X2  Y = X1 ⊕ X2  Output
0   0   0            -0.5
1   0   1            +0.5
0   1   1            +0.5
1   1   0            -0.5

XOR Hyperspace

It is interesting to note that the NAND and NOR functions can be simply implemented just by negating the input weights and the bias of the AND and OR networks respectively. Furthermore, the XOR multi-layer perceptron combines basic NOR, NAND and OR single-layer neural networks, in the same way as the simplified XOR logic symbol does by combining basic NOR, NAND and OR logic gates.
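Both observations can be tried out in a small Python sketch. The AND and OR weights and biases are assumptions inferred from the truth tables. The particular wiring used for XOR here (OR and NAND hidden nodes feeding an AND output node, with a step applied to the hidden outputs) is one decomposition that happens to reproduce the XOR Output column; the network in the figure may be wired differently.

```python
def step(s):
    """Threshold a weighted sum to a logic level."""
    return 1 if s > 0 else 0

def node(x1, x2, w1, w2, bias):
    """Weighted sum of a two-input node with a linear activation."""
    return w1 * x1 + w2 * x2 + bias

# (w1, w2, bias) triples; the AND and OR values are inferred from the truth tables.
AND = (1.0, 1.0, -1.5)
OR = (1.0, 1.0, -0.5)

# Negating the weights and the bias negates the output, giving NAND and NOR.
NAND = tuple(-v for v in AND)
NOR = tuple(-v for v in OR)

def xor_network(x1, x2):
    """Assumed wiring: XOR = AND(OR(x1, x2), NAND(x1, x2))."""
    h1 = step(node(x1, x2, *OR))
    h2 = step(node(x1, x2, *NAND))
    return node(h1, h2, *AND)

xor_patterns = [(0, 0), (1, 0), (0, 1), (1, 1)]
xor_outputs = [xor_network(x1, x2) for x1, x2 in xor_patterns]
print(xor_outputs)
```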

Keras Tensorflow Model

# XOR Neural Network Problem.
# Guy Fernando (2022)

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Input training data.
X = np.array([
   [0, 0],
   [0, 1],
   [1, 0],
   [1, 1]
], 'float32')

# Output required (XOR of each input row above).
Y = np.array([
   [0],
   [1],
   [1],
   [0]
], 'float32')

# Use a sequential model with 2 hidden neurons, and 1 output neuron.
model = Sequential()
model.add(Dense(2, input_dim = 2, activation = 'tanh'))
model.add(Dense(1, activation = 'sigmoid'))
model.compile(loss = 'binary_crossentropy', optimizer = 'adam')

# Train model.
model.fit(X, Y, batch_size = 1, epochs = 10000, verbose = 0)

# Show the predicted output for each input pattern.
print(model.predict(X))

# Convert Keras model to a TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save TensorFlow Lite model to a file.
with open("converted_model.tflite", "wb") as f:
    f.write(tflite_model)


Keras is an open-source software library that provides a Python interface for creating neural networks. Keras acts as a high-level interface for the TensorFlow library to enable fast and easy experimentation with neural networks. Combined with Jupyter Notebooks running within Visual Studio Code, Python code using the Keras library can be written, documented and tested interactively within a single development environment.

The Keras Python code shown here solves the XOR problem. The NumPy array variables X and Y hold the training input tensor and the required output tensor. The model consists of the same multi-layer perceptron as discussed above, having two hidden nodes and a single output node. The only difference is that the tanh and sigmoid activation functions are used for the hidden and output nodes respectively, instead of the Identity activation function. The hidden layer needs a non-linear activation function, since stacked layers of purely linear nodes reduce to a single linear function and cannot separate the XOR patterns. Backpropagation additionally needs the activation function to be differentiable, which rules out a hard threshold. It was found that this combination of activation functions yielded the best training results for the minimum number of epochs.

The binary_crossentropy loss function was chosen since the XOR function requires the output to be a binary classification, i.e. 0 or 1. Many sources advise that the sigmoid activation function be used on the last layer when Binary Cross-Entropy is used. Keras optimizers are the algorithms that use the gradients computed by backpropagation to update the weights and biases of the neural network, in order to reduce the loss and improve the classification accuracy of the neural network. The adam optimizer was chosen as it yielded the best training results for the minimum number of training epochs.

When the model is compiled and executed, the following output appears in the Visual Studio Code terminal.

Terminal Output

The terminal shows the predicted model output tensor Y, given the input tensor X. It can be observed that the model has been successfully trained, since the first and last values are approximately 0 and the two middle values approximately 1, i.e. the XOR function logic output we require. Ultimately the goal is to run the model on a microcontroller, so the model is saved to a TensorFlow Lite .tflite FlatBuffer model file for use in the next section.

Running the Model on an ESP32 Microcontroller

TensorFlow Lite for Microcontrollers is designed to run machine learning models on various microcontroller devices with only a few kilobytes of memory. The ESP32 is just one of the platforms supported by the TensorFlow Lite framework. Here I have used Visual Studio Code with the PlatformIO IDE extension to build an Arduino project that targets the ESP32 microcontroller.

The C++ code listed in XorNN.h and XorNN.cpp files encompasses a XorNeuralNetwork wrapper class created to encapsulate the neural network model and for making calls to the neural network model itself. The main.cpp file listing shows the model is first instantiated when the XorNeuralNetwork class is constructed, and subsequently each combination of input to the XOR function is computed by making calls to the RunModel method.

// XOR Neural Network Problem.
// Guy Fernando (2022)

#include "XorNN.h"
#include "Arduino.h"

XorNeuralNetwork* xorNeuralNetwork;

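void setup()
{
    // Start the serial port. The baud rate is an assumption; match it to the serial monitor.
    Serial.begin(115200);

    // Construct the wrapper, which loads the model and allocates its tensors.
    xorNeuralNetwork = new XorNeuralNetwork();
    Serial.printf("\nArena used bytes = %d\n\n", (int) xorNeuralNetwork->GetArenaUsedBytes());
}

void loop()
{
    // Evaluate the model for every combination of the two XOR inputs.
    Serial.printf("XOR(0, 0) = %1.0f\n", xorNeuralNetwork->RunModel(0., 0.));
    Serial.printf("XOR(0, 1) = %1.0f\n", xorNeuralNetwork->RunModel(0., 1.));
    Serial.printf("XOR(1, 0) = %1.0f\n", xorNeuralNetwork->RunModel(1., 0.));
    Serial.printf("XOR(1, 1) = %1.0f\n", xorNeuralNetwork->RunModel(1., 1.));
}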


Before the neural network can be built for an ESP32 microcontroller, the previously saved .tflite FlatBuffer model file is converted to a C byte array definition using the UNIX xxd command, and then added to the project, along with XorNN.h, XorNN.cpp and main.cpp.
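Where the xxd command is not available, a similar byte array definition can be produced with a few lines of Python. The sketch below is an approximation of what `xxd -i` emits, not a drop-in replacement. The helper names `to_c_array` and `convert_file` are hypothetical. The array name `xor_model_tflite` and the header name XorModel.h in the commented usage line are taken from the C++ listings in this article.

```python
def to_c_array(data: bytes, name: str, per_line: int = 12) -> str:
    """Format raw bytes as a C unsigned char array plus a length variable."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(lines)
    return (f"unsigned char {name}[] = {{\n{body}\n}};\n"
            f"unsigned int {name}_len = {len(data)};\n")

def convert_file(in_path: str, out_path: str, name: str) -> None:
    """Read a binary model file and write it out as a C source fragment."""
    with open(in_path, "rb") as src:
        data = src.read()
    with open(out_path, "w") as dst:
        dst.write(to_c_array(data, name))

# Example usage (requires the model file saved by the Keras script):
# convert_file("converted_model.tflite", "XorModel.h", "xor_model_tflite")

print(to_c_array(bytes([0x1c, 0x00, 0x00, 0x00]), "example_bytes"))
```

Note that xxd derives the array name from the input file name, so its output would need the array renamed to match the name used in XorNN.cpp.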

After the project is built and uploaded to the ESP32 microcontroller, the PlatformIO serial monitor is used to observe the program output.

Microcontroller Output

The reported heap memory used by the neural network model is 612 bytes, and the XOR function is calculated correctly.

// XOR Neural Network Problem.
// Guy Fernando (2022)

#ifndef __XOR_NN_H__
#define __XOR_NN_H__

// Include the library headers.
// To use the TensorFlow Lite for Microcontrollers library, we must include the following header files.
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "tensorflow/lite/version.h"

using namespace tflite;

class XorNeuralNetwork
{
public:
    XorNeuralNetwork();
    ~XorNeuralNetwork();

    float RunModel(float x1, float x2);
    size_t GetArenaUsedBytes(void) const;

private:
    void AssertModelShape(void);

    uint8_t* tensor_arena;
    MicroInterpreter* interpreter;

    MicroErrorReporter micro_error_reporter;
    ErrorReporter* error_reporter = &micro_error_reporter;
};

#endif // __XOR_NN_H__


// XOR Neural Network Problem.
// Guy Fernando (2022)

#include "XorNN.h"

// Include the model header.
#include "XorModel.h"

// Include the unit test framework header.
#include "tensorflow/lite/micro/testing/micro_test.h"

// No unit tests defined.

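XorNeuralNetwork::XorNeuralNetwork()
{
    // Load the model.
    const Model* model = GetModel(xor_model_tflite);
    if (model->version() != TFLITE_SCHEMA_VERSION)
    {
        TF_LITE_REPORT_ERROR(error_reporter,
            "Model provided is schema version %d not equal to supported version %d.\n",
            model->version(), TFLITE_SCHEMA_VERSION);
    }

    // Instantiate operations resolver.
    // Declared static so that it outlives this constructor; the interpreter keeps a reference to it.
    static AllOpsResolver resolver;

    // Allocate memory.
    const int tensor_arena_size = 1000;
    this->tensor_arena = (uint8_t*) malloc(tensor_arena_size);
    if (!this->tensor_arena)
    {
        TF_LITE_REPORT_ERROR(error_reporter, "Could not allocate arena");
    }

    // Instantiate the interpreter.
    this->interpreter = new MicroInterpreter(
        model, resolver, tensor_arena, tensor_arena_size, error_reporter);

    // Allocate tensors.
    TfLiteStatus allocate_status = this->interpreter->AllocateTensors();
    if (allocate_status != kTfLiteOk)
    {
        TF_LITE_REPORT_ERROR(error_reporter, "AllocateTensors() failed");
    }

    // Validate the model shape.
    this->AssertModelShape();
}

XorNeuralNetwork::~XorNeuralNetwork()
{
    // Release the interpreter before the arena it uses.
    delete this->interpreter;
    this->interpreter = nullptr;

    // Clean up arena (allocated with malloc, so released with free).
    free(this->tensor_arena);
    this->tensor_arena = nullptr;
}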

float XorNeuralNetwork::RunModel(float x1, float x2)
{
    // Provide the input.
    TfLiteTensor* input = this->interpreter->input(0);
    input->data.f[0] = x1;
    input->data.f[1] = x2;

    // Run the model.
    TfLiteStatus invoke_status = this->interpreter->Invoke();
    if (invoke_status != kTfLiteOk)
    {
        TF_LITE_REPORT_ERROR(this->error_reporter, "Invoke failed\n");
    }

    // Obtain the output.
    TfLiteTensor* output = this->interpreter->output(0);
    float value = output->data.f[0];
    return value;
}

size_t XorNeuralNetwork::GetArenaUsedBytes(void) const
{
    return this->interpreter->arena_used_bytes();
}

void XorNeuralNetwork::AssertModelShape(void)
{
    // The input tensor is expected to be a 2-dimensional float tensor with a batch size of 1.
    TfLiteTensor* input = this->interpreter->input(0);
    TF_LITE_MICRO_EXPECT_NE(nullptr, input);
    TF_LITE_MICRO_EXPECT_EQ(2, input->dims->size);
    TF_LITE_MICRO_EXPECT_EQ(1, input->dims->data[0]);
    TF_LITE_MICRO_EXPECT_EQ(kTfLiteFloat32, input->type);

    // The output tensor is expected to hold float values.
    TfLiteTensor* output = this->interpreter->output(0);
    TF_LITE_MICRO_EXPECT_EQ(kTfLiteFloat32, output->type);
}



It has been shown here that it is possible to run a Keras TensorFlow neural network classifier model on a low-cost ESP32 microcontroller, in this case to solve the XOR problem. Of course, using a neural network on a microcontroller to calculate the XOR function is purely an illustrative exercise and serves no practical use. All microcontrollers have a built-in XOR instruction that can calculate the XOR function enormously faster while using fewer resources than a neural network.