## Motivations

For about a year or so I’ve wanted to add some data science and machine learning skills to my software development skillset. Demand for these skills is at an all-time high, and machine learning algorithms are finding their way into system architectures everywhere. My motivation is partly fear of being left behind on these technologies. It is also that I’ve seen some really cool applications in the DevOps space, e.g. automated triage of failing tests and log output in a continuous delivery setup.

I never really expect to be a Data Scientist at any point in my future career. It is most definitely a full-time discipline that is quite distinct from general software development. But I sell myself as a good generalist in the dev space, so having 20% of a Data Scientist’s skill set would be very valuable to me.

## If at first you don’t succeed…

Honesty time: my first attempt at getting started in machine learning ended in total failure. I started the way I begin learning most new things: I bought a reputable book on the subject and sat down with the introductory chapters. What I saw was reams of linear algebra and calculus, and it was very intimidating. I think if you make a living writing code you tend to have a fairly numerate brain, but I hadn’t dealt with this kind of math since school. I struggled and gave up within a couple of days, ego bruised.

A number of months passed before I dusted myself down for a second attempt. I got a different book to start over with: Python Machine Learning (2nd edition) by Sebastian Raschka. This book gets you to implement some of the simpler algorithms by hand during the early chapters, which I found a more relatable way to learn what machine learning was all about. The book has the math in it as well, but this is much easier to digest if you can learn what is happening in a language that you can actually understand first, which for me was not the Greek alphabet.

## Perceptron: the first neural net

The Perceptron is one of the earliest machine learning algorithms. Frank Rosenblatt invented it in 1957. It’s usually where people start with machine learning algorithms, because it’s the simplest one to understand. It’s a simplified model of how a neuron in a real brain works: it accepts a load of information as input and then either fires or doesn’t, a binary response. This makes a single Perceptron a binary classifier.

To solidify my understanding of the algorithm further, I decided to take what I learned from Raschka’s Python implementation and redevelop it myself in PowerShell – coded to reflect how I reasoned about the algorithm rather than just copying out the Python in a different language. This blog post covers my PowerShell implementation of the Perceptron algorithm.

**The code covered below is available on GitHub here.**

## Working from the outside in

Before we look at the internal workings of the Perceptron algorithm, let’s consider the kind of inputs it accepts and the kind of output it produces. Here is a real usage of my Perceptron class, available in the GitHub repo as **test.ps1**.

```powershell
using module ./Perceptron.psm1

# Acquire Iris dataset and import from CSV file.
Invoke-WebRequest -Uri 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' -OutFile './iris.data'
$samples = Import-Csv -Path './iris.data' -Header 'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'ClassName'

# Select only setosa and versicolor samples (first 100). Our Perceptron is only a binary classifier.
$samples = $samples | Select-Object -First 100

# Turn sample data into a two-dimensional array of feature values and an array of target class
# labels (integers) that we will use in training.
$samples2dArray = New-Object "double[,]" -ArgumentList $samples.length, 4
$targetClassLabels = New-Object "int[]" -ArgumentList $samples.length
for ($i = 0; $i -lt $samples.length; $i++) {
    $samples2dArray[$i, 0] = $samples[$i].SepalLength
    $samples2dArray[$i, 1] = $samples[$i].SepalWidth
    $samples2dArray[$i, 2] = $samples[$i].PetalLength
    $samples2dArray[$i, 3] = $samples[$i].PetalWidth
    if ($samples[$i].ClassName -eq 'Iris-setosa') {
        $targetClassLabels[$i] = 0 # setosa = 0
    }
    else {
        $targetClassLabels[$i] = 1 # versicolor = 1
    }
}

# Create our Perceptron object. Experimenting with the arguments will give it different
# performance characteristics.
$perceptronArgs = @{ "LearningRate" = 0.0001; "Epochs" = 10; "RandomSeed" = 1 }
$perceptron = New-Object -TypeName Perceptron -Property $perceptronArgs

# Train the Perceptron with the sample data. Output the number of errors encountered
# per pass over the sample data.
$perceptron.Train($samples2dArray, $targetClassLabels)
Write-Host "Errors per epoch: $($perceptron.ErrorsPerEpoch)"

# If we feed the Perceptron new sample data, it should correctly identify the type of Iris.
$newSetosaSample = @(5.1, 3.5, 1.4, 0.3)
$result = $perceptron.Classify($newSetosaSample)
if ($result -eq 0) {
    Write-Host "Successfully classified a Setosa Iris."
}

$newVersicolorSample = @(6.7, 3.0, 5.0, 1.7)
$result = $perceptron.Classify($newVersicolorSample)
if ($result -eq 1) {
    Write-Host "Successfully classified a Versicolor Iris."
}
```

This test script:

- Downloads a CSV file from the internet containing a dataset on Iris flowers. This is a classic dataset for testing machine learning algorithms. It contains measurements of the *features* of iris *samples*, e.g. petal length. A *sample* is the total set of measurements about a single iris. A *feature* is a single measurement within that sample.
- Imports the CSV and keeps only the first one hundred iris samples. These are exactly the samples that can be classified as either a *setosa* iris or a *versicolor* iris. The standard Perceptron is a binary classifier, so we can only have two *class labels*. Given some input, the Perceptron needs to identify it as either a setosa or a versicolor.
- Turns all the setosa and versicolor samples into a two-dimensional array. Each row contains all the feature values of a single iris sample, and each column holds one feature across all samples. This two-dimensional array can be thought of as a matrix of values.
- Creates a *$targetClassLabels* array that is parallel to the samples array. It contains the real class labels that the Perceptron needs to correlate with the feature values. A setosa is identified by the class label **0** and a versicolor has a class label of **1**. This array, alongside the two-dimensional sample array, is the dataset that we use to train the Perceptron to identify types of iris.
- Trains the Perceptron, which makes a number of passes over the training data as defined by the *$Epochs* parameter. During each pass it counts the misclassifications it makes and adjusts its internal values for its mistakes. The number of misclassifications per pass is stored on the *$ErrorsPerEpoch* property. If the Perceptron is learning and getting better at classifying each sample, the number of errors should reduce with each pass over the dataset. We output this array of numbers as part of the test script to see how well it performed.
- After training, gives the Perceptron two new samples to classify, to check that it can correctly identify a setosa iris and a versicolor iris.

In a nutshell: for training, it accepts a matrix of feature values for numerous samples plus a parallel array of class labels. It records the number of misclassifications it made on each pass over the samples in its *ErrorsPerEpoch* property, and that number should reduce over time. After training, it accepts a single sample array and returns the class label it thinks the sample should have.
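
The sketch below makes the expected input shapes concrete. It is standalone and independent of Perceptron.psm1. It builds the same kind of `double[,]` matrix and parallel `int[]` label array as test.ps1, using two illustrative samples in the iris format. The variable names are my own. The row-copying loop at the end mirrors what `Train` does internally before it calls `Classify`.

```powershell
# Two illustrative samples in the iris format: a setosa (label 0) and a versicolor (label 1).
$features = @(
    @(5.1, 3.5, 1.4, 0.2),
    @(7.0, 3.2, 4.7, 1.4)
)
$labels = [int[]]@(0, 1)

# One row per sample, one column per feature.
$matrix = New-Object "double[,]" -ArgumentList $features.Count, 4
for ($i = 0; $i -lt $features.Count; $i++) {
    for ($k = 0; $k -lt 4; $k++) {
        $matrix[$i, $k] = $features[$i][$k]
    }
}

# GetLength(0) is the number of samples (rows); GetLength(1) is the number of features (columns).
"Samples: $($matrix.GetLength(0)), features: $($matrix.GetLength(1))"

# Copy one row back out as a one-dimensional sample array.
$row = New-Object "double[]" -ArgumentList $matrix.GetLength(1)
for ($k = 0; $k -lt $row.Length; $k++) {
    $row[$k] = $matrix[1, $k]
}
```

The labels array has one entry per row of the matrix. That is the invariant `Train` checks before it starts learning.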

## Running the test script

If we run **test.ps1** as is, we see:

```
Errors per epoch: 4 2 3 2 1 0 0 0 0 0
Successfully classified a Setosa Iris.
Successfully classified a Versicolor Iris.
```

So during the first pass over the training dataset, the Perceptron made four classification errors. Four times, it thought a sample was one type of iris when it was actually the other. After adjusting its internals in response to those misclassifications, it made two errors on the second pass. You can see that after five passes over the dataset it classifies all the samples without error.

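Interestingly, the Perceptron made more errors in the third pass over the training data than in the second. This suggests that some of the adjustments it made up to that point were too severe, and it needed to backtrack a little to become more accurate. That is not unusual: every weight update made to correct one sample also changes the net input of every other sample.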
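
You can check convergence programmatically rather than by eye with a small helper that reads an `ErrorsPerEpoch` array. `Get-ConvergedEpoch` is a hypothetical function, not part of the repo. It returns the 1-based epoch from which every remaining pass was error-free, so a zero followed later by more errors doesn't count as convergence.

```powershell
# Hypothetical helper (not part of Perceptron.psm1). Returns the 1-based epoch from which
# every remaining pass had zero errors, or 0 if the final pass still had errors.
function Get-ConvergedEpoch {
    param([int[]] $ErrorsPerEpoch)

    $converged = 0
    # Walk backwards from the last epoch for as long as the error count stays at zero.
    for ($i = $ErrorsPerEpoch.Length - 1; $i -ge 0; $i--) {
        if ($ErrorsPerEpoch[$i] -ne 0) { break }
        $converged = $i + 1
    }
    return $converged
}

# The error counts reported by test.ps1 above.
Get-ConvergedEpoch -ErrorsPerEpoch @(4, 2, 3, 2, 1, 0, 0, 0, 0, 0)   # returns 6
```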

Hopefully you can clearly see how a Perceptron works in a black-box sense. Now we’re ready to look at the internal workings.

## The Perceptron algorithm

Here is the code for the Perceptron class itself:

```powershell
class Perceptron {
    # An array of ints tracking the number of misclassifications per epoch.
    [array] $ErrorsPerEpoch = @()

    # An array of doubles that hold the weight coefficients for each feature in a sample.
    [array] $Weights = @()

    # The bias value added to the net input of every sample. Updated during training.
    hidden [double] $BiasValue = 0.0

    # The number of passes over the dataset during training.
    hidden [int] $Epochs = 50

    # Controls the size of weight updates during training.
    hidden [double] $LearningRate = 0.01

    # The seed for the random number generator used to initialize the weights vector.
    hidden [int] $RandomSeed = 1

    # The default constructor just uses the default values of properties.
    Perceptron() { }

    Perceptron([double] $LearningRate, [int] $Epochs, [int] $RandomSeed) {
        $this.LearningRate = $LearningRate
        $this.Epochs = $Epochs
        $this.RandomSeed = $RandomSeed
    }

    # Trains the Perceptron with the given sample matrix and vector (array) of target
    # class labels. The target class labels are the labels that the Perceptron needs
    # to learn to classify when given unlabelled data after training. Training involves
    # building a vector (array) of weight coefficients that can be multiplied with the
    # feature values of the sample and passed to a decision making function (see: Classify)
    # for classification.
    [Void] Train([double[,]] $Samples, [int[]] $Targets) {
        if ($Samples.GetLength(0) -lt 1) {
            throw "No training samples provided."
        }
        if ($Targets.length -ne $Samples.GetLength(0)) {
            throw "Number of target class labels does not equal number of samples."
        }
        $numFeatures = $Samples.GetLength(1)
        if ($numFeatures -lt 1) {
            throw "No feature values provided in training samples."
        }

        # Create the vector (array) of weight coefficients for the number of features
        # in the sample. Initialize the weights with small random numbers close to zero,
        # so the initial net input of every sample is near the decision boundary (0.0).
        $this.Weights = New-Object -TypeName "double[]" -ArgumentList $numFeatures
        Get-Random -SetSeed $this.RandomSeed | Out-Null
        for ($i = 0; $i -lt $this.Weights.length; $i++) {
            $this.Weights[$i] = Get-Random -Minimum -0.01 -Maximum 0.01
        }
        # Reset the bias and error history so that calling Train again starts from scratch.
        $this.BiasValue = 0.0
        $this.ErrorsPerEpoch = @()

        # For each epoch (pass over the dataset).
        for ($i = 0; $i -lt $this.Epochs; $i++) {
            $errors = 0

            # For each sample in the dataset.
            for ($j = 0; $j -lt $Samples.GetLength(0); $j++) {
                # Copy sample features into a one-dimensional array to represent a
                # single sample.
                $sample = New-Object -TypeName "double[]" -ArgumentList $numFeatures
                for ($k = 0; $k -lt $numFeatures; $k++) {
                    $sample[$k] = $Samples[$j, $k]
                }
                $target = $Targets[$j]

                # Attempt to classify the sample and compare the result to the target
                # value to produce an update value. If 'result' equals 'target' (i.e.
                # we classified the sample correctly), 'target' minus 'result' will
                # equal zero, so the update value will be zero and the weights will
                # remain unchanged. If 'result' does not equal 'target', 'update' will
                # be plus or minus the learning rate, nudging the weights and bias so
                # that this sample's net input moves towards the correct side of the
                # decision boundary.
                $result = $this.Classify($sample)
                $update = $this.LearningRate * ($target - $result)

                # Update all weight coefficients and the bias value.
                for ($k = 0; $k -lt $this.Weights.length; $k++) {
                    $this.Weights[$k] += ($update * $sample[$k])
                }
                $this.BiasValue += $update

                if ($update -ne 0) {
                    $errors++
                }
            }

            # Keep track of the number of misclassifications in each epoch. If the
            # perceptron is learning correctly this number should reduce over the
            # total number of epochs.
            $this.ErrorsPerEpoch += $errors
        }
    }

    # The decision function of the Perceptron. If the net input of the sample is
    # greater than 0.0 it will return one class label, otherwise it will return
    # the other.
    [int] Classify([array] $Sample) {
        if ($this.getNetInput($Sample) -gt 0.0) {
            return 1
        }
        return 0
    }

    # Get the net input of the given sample. The net input converts the entire
    # feature vector (array) into a single number. This single number is the
    # dot product of the feature vector and the weights vector, plus the bias
    # value.
    hidden [double] getNetInput([array] $Sample) {
        if ($Sample.length -ne $this.Weights.length) {
            throw "Invalid number of features in the sample."
        }
        $dotProduct = 0.0
        for ($i = 0; $i -lt $Sample.length; $i++) {
            $dotProduct += ($Sample[$i] * $this.Weights[$i])
        }
        return $dotProduct + $this.BiasValue
    }
}
```

This is actually not a lot of code. It’s over a hundred lines, but only because I’ve written a lot of comments to explain the logic. Don’t worry if it looks scary; let’s go through it piece by piece, starting with the decision function.

### The decision function

```powershell
# The decision function of the Perceptron. If the net input of the sample is
# greater than 0.0 it will return one class label, otherwise it will return
# the other.
[int] Classify([array] $Sample) {
    if ($this.getNetInput($Sample) -gt 0.0) {
        return 1
    }
    return 0
}
```

This function takes a sample array and converts the array of numbers into a single number (the net input). It then returns a class label based on that number: **1** (versicolor) if the net input is greater than **0.0**, otherwise **0** (setosa). Simple stuff.
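
On its own, this decision rule is just a unit step function. A standalone sketch might look like the following. The function name `Get-ClassLabel` is hypothetical and does not exist in Perceptron.psm1.

```powershell
# Hypothetical standalone version of the decision rule inside Classify:
# a net input strictly greater than 0.0 maps to class label 1, anything else to 0.
function Get-ClassLabel {
    param([double] $NetInput)

    if ($NetInput -gt 0.0) { return 1 }
    return 0
}

Get-ClassLabel -NetInput 0.37    # 1
Get-ClassLabel -NetInput 0.0     # 0
Get-ClassLabel -NetInput -1.2    # 0
```

A net input of exactly 0.0 falls into class 0, because the comparison is strictly greater-than.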

### Calculating the net input of a sample

```powershell
# Get the net input of the given sample. The net input converts the entire
# feature vector (array) into a single number. This single number is the
# dot product of the feature vector and the weights vector, plus the bias
# value.
hidden [double] getNetInput([array] $Sample) {
    if ($Sample.length -ne $this.Weights.length) {
        throw "Invalid number of features in the sample."
    }
    $dotProduct = 0.0
    for ($i = 0; $i -lt $Sample.length; $i++) {
        $dotProduct += ($Sample[$i] * $this.Weights[$i])
    }
    return $dotProduct + $this.BiasValue
}
```

Converting a sample array into a single number that serves as the basis of the classification is where most of the math in the perceptron lies. We do this by combining the sample array with another array internal to the perceptron, the *weights* array. You will often see this referred to as the weights vector in the literature, rather than array.
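
In symbols, `getNetInput` computes the net input $z$, and `Classify` turns it into a predicted label $\hat{y}$. Here $\mathbf{x}$ is the sample’s feature vector, $\mathbf{w}$ is the weights vector, $b$ is the bias value, and $m$ is the number of features:

$$
\begin{aligned}
z &= \mathbf{w}^\top \mathbf{x} + b = \sum_{j=1}^{m} w_j x_j + b \\
\hat{y} &= \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{otherwise} \end{cases}
\end{aligned}
$$

The worked example below uses the setosa sample from test.ps1, $\mathbf{x} = (5.1, 3.5, 1.4, 0.3)$. The weights $\mathbf{w} = (0.1, -0.2, 0.3, 0.05)$ and bias $b = -0.5$ are made-up values for illustration only, not ones the trained Perceptron actually learns:

$$
\begin{aligned}
z &= (0.1)(5.1) + (-0.2)(3.5) + (0.3)(1.4) + (0.05)(0.3) + (-0.5) \\
  &= 0.51 - 0.70 + 0.42 + 0.015 - 0.5 = -0.255 \quad\Rightarrow\quad \hat{y} = 0
\end{aligned}
$$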

The weights array contains numerical values that reflect the adjustments the algorithm has to make to correctly identify training data. These weight values are updated as the algorithm misclassifies training samples, with the intention that the next time it calculates the net input for that sample the output will be closer to the correct value for its real class label.

### The training function

```powershell
# Trains the Perceptron with the given sample matrix and vector (array) of target
# class labels. The target class labels are the labels that the Perceptron needs
# to learn to classify when given unlabelled data after training. Training involves
# building a vector (array) of weight coefficients that can be multiplied with the
# feature values of the sample and passed to a decision making function (see: Classify)
# for classification.
[Void] Train([double[,]] $Samples, [int[]] $Targets) {
    if ($Samples.GetLength(0) -lt 1) {
        throw "No training samples provided."
    }
    if ($Targets.length -ne $Samples.GetLength(0)) {
        throw "Number of target class labels does not equal number of samples."
    }
    $numFeatures = $Samples.GetLength(1)
    if ($numFeatures -lt 1) {
        throw "No feature values provided in training samples."
    }

    # Create the vector (array) of weight coefficients for the number of features
    # in the sample. Initialize the weights with small random numbers close to zero,
    # so the initial net input of every sample is near the decision boundary (0.0).
    $this.Weights = New-Object -TypeName "double[]" -ArgumentList $numFeatures
    Get-Random -SetSeed $this.RandomSeed | Out-Null
    for ($i = 0; $i -lt $this.Weights.length; $i++) {
        $this.Weights[$i] = Get-Random -Minimum -0.01 -Maximum 0.01
    }
    # Reset the bias and error history so that calling Train again starts from scratch.
    $this.BiasValue = 0.0
    $this.ErrorsPerEpoch = @()

    # For each epoch (pass over the dataset).
    for ($i = 0; $i -lt $this.Epochs; $i++) {
        $errors = 0

        # For each sample in the dataset.
        for ($j = 0; $j -lt $Samples.GetLength(0); $j++) {
            # Copy sample features into a one-dimensional array to represent a
            # single sample.
            $sample = New-Object -TypeName "double[]" -ArgumentList $numFeatures
            for ($k = 0; $k -lt $numFeatures; $k++) {
                $sample[$k] = $Samples[$j, $k]
            }
            $target = $Targets[$j]

            # Attempt to classify the sample and compare the result to the target
            # value to produce an update value. If 'result' equals 'target' (i.e.
            # we classified the sample correctly), 'target' minus 'result' will
            # equal zero, so the update value will be zero and the weights will
            # remain unchanged. If 'result' does not equal 'target', 'update' will
            # be plus or minus the learning rate, nudging the weights and bias so
            # that this sample's net input moves towards the correct side of the
            # decision boundary.
            $result = $this.Classify($sample)
            $update = $this.LearningRate * ($target - $result)

            # Update all weight coefficients and the bias value.
            for ($k = 0; $k -lt $this.Weights.length; $k++) {
                $this.Weights[$k] += ($update * $sample[$k])
            }
            $this.BiasValue += $update

            if ($update -ne 0) {
                $errors++
            }
        }

        # Keep track of the number of misclassifications in each epoch. If the
        # perceptron is learning correctly this number should reduce over the
        # total number of epochs.
        $this.ErrorsPerEpoch += $errors
    }
}
```

If this looks intimidating, don’t worry. I’ll attempt as non-technical an explanation as I can manage.

First, we create the weights array, which as you’ll remember from our “net input” function, is the array that gets combined with an iris sample array to classify the sample. This array is a property of the Perceptron class and is as long as the number of features in a data sample (a single iris). Our test script provides four features per sample: petal length, petal width, sepal length and sepal width. So at runtime there will be four weights in our test scenario.

Next, we initialize the weight for each feature to a small random value between **-0.01** and **0.01**. Because all the weights start close to zero, the initial net input of every sample lies close to the decision boundary (the number **0.0**). The algorithm modifies these numbers as it learns, pushing them towards values that correctly separate our two class labels.
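
Pulled out into a standalone function, the initialization step might look like the sketch below. `New-InitialWeights` is a hypothetical helper for illustration. It reuses the same `Get-Random -SetSeed` and `-Minimum`/`-Maximum` calls as `Train`. That shared seeding is why passing the same *RandomSeed* should give the same starting weights on every run.

```powershell
# Hypothetical helper mirroring the weight initialization in Train.
function New-InitialWeights {
    param([int] $NumFeatures, [int] $RandomSeed)

    $weights = New-Object -TypeName "double[]" -ArgumentList $NumFeatures
    # Seeding the generator makes the "random" starting weights reproducible.
    Get-Random -SetSeed $RandomSeed | Out-Null
    for ($i = 0; $i -lt $weights.Length; $i++) {
        # -Maximum is exclusive, so each value is at least -0.01 and below 0.01.
        $weights[$i] = Get-Random -Minimum -0.01 -Maximum 0.01
    }
    # The unary comma stops PowerShell unrolling the array into the pipeline.
    return ,$weights
}

$first  = New-InitialWeights -NumFeatures 4 -RandomSeed 1
$second = New-InitialWeights -NumFeatures 4 -RandomSeed 1
"Same seed, same weights: $(($first -join ',') -eq ($second -join ','))"
```

Without the unary comma in `return ,$weights`, callers would receive a generic object array instead of a `double[]`.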

Now, for each pass (epoch) over the training dataset we attempt to classify each sample. For this, we call our **Classify** function and compare the result to the known class label of the sample. The next few lines of code are where the magic happens:

```powershell
$result = $this.Classify($sample)
$update = $this.LearningRate * ($target - $result)

# Update all weight coefficients and the bias value.
for ($k = 0; $k -lt $this.Weights.length; $k++) {
    $this.Weights[$k] += ($update * $sample[$k])
}
$this.BiasValue += $update
```

Suppose we correctly identify the sample as **class zero** (setosa) and our target is **class zero**. Then *$update* resolves to zero, because `$this.LearningRate * (0 - 0)` multiplies by zero. Similarly, if we correctly identify the sample as **class one** (versicolor), then one minus one is also zero and *$update* resolves to zero. After each classification, each weight is adjusted by *$update* multiplied by the corresponding feature value, and the bias value is adjusted by *$update* itself. As *$update* is zero on a correct classification, the weights and bias remain unchanged.

Now suppose the algorithm misclassifies the sample. The difference `$target - $result` is then either **1** (one minus zero: the target was versicolor but it predicted setosa) or **-1** (zero minus one: the reverse). So *$update* equals plus or minus the learning rate. Adding it to the bias, and adding it scaled by each feature value to the weights, moves that sample’s net input towards the correct side of the decision boundary. The next time the algorithm looks at this sample it still might not classify it correctly, but with the updated weight values it takes a step in the right direction.
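
A single update step can be isolated into a small function to see the arithmetic. `Invoke-PerceptronUpdate` is a hypothetical name, and the two-feature sample and learning rate of 0.1 are made-up values for illustration. The function mirrors the update lines in `Train` but returns new weights and a new bias instead of modifying class properties.

```powershell
# Hypothetical standalone version of one perceptron update step.
# A correct prediction leaves the weights and bias unchanged.
function Invoke-PerceptronUpdate {
    param(
        [double[]] $Weights,
        [double] $Bias,
        [double[]] $Sample,
        [int] $Target,
        [int] $Prediction,
        [double] $LearningRate
    )

    # (Target - Prediction) is 0 when correct, and +1 or -1 when wrong.
    $update = $LearningRate * ($Target - $Prediction)

    $newWeights = New-Object -TypeName "double[]" -ArgumentList $Weights.Length
    for ($k = 0; $k -lt $Weights.Length; $k++) {
        # Each weight moves in proportion to the feature value it multiplies.
        $newWeights[$k] = $Weights[$k] + ($update * $Sample[$k])
    }

    return [pscustomobject]@{ Weights = $newWeights; Bias = $Bias + $update }
}

# A sample whose target is class 1 but which was predicted as class 0.
$params = @{
    Weights      = [double[]]@(0.0, 0.0)
    Bias         = 0.0
    Sample       = [double[]]@(2.0, -1.0)
    Target       = 1
    Prediction   = 0
    LearningRate = 0.1
}
$step = Invoke-PerceptronUpdate @params
"Weights: $($step.Weights -join ', '); bias: $($step.Bias)"
```

Here *$update* is +0.1, so the weights become 0.2 and -0.1 and the bias becomes 0.1. This sample’s net input rises from 0 to 0.2 + 0.1 + 0.1 = 0.4, which is now above the decision boundary.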

Finally, we track the number of misclassifications encountered during a pass over the dataset. We used this array earlier in our test script to see how many passes it took the algorithm to classify all the training data without error.
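
Here is a compact, function-based sketch of the same training procedure, including the per-epoch error count, that runs without downloading the iris data. It uses a hypothetical toy dataset: the logical AND of two inputs, which a single straight line can separate. It is not the author’s class, and `Invoke-PerceptronTraining` is a hypothetical name. It starts from all-zero weights instead of seeded random ones, and it uses a learning rate of 0.5, so that every value is exactly representable and the run is fully deterministic.

```powershell
# Hypothetical function-based sketch of the Perceptron training loop.
function Invoke-PerceptronTraining {
    param(
        [double[,]] $Samples,
        [int[]] $Targets,
        [double] $LearningRate,
        [int] $Epochs
    )

    $numFeatures = $Samples.GetLength(1)
    # All-zero starting weights and bias keep this example deterministic.
    $weights = New-Object -TypeName "double[]" -ArgumentList $numFeatures
    $bias = 0.0
    $errorsPerEpoch = @()

    for ($epoch = 0; $epoch -lt $Epochs; $epoch++) {
        $errors = 0
        for ($j = 0; $j -lt $Samples.GetLength(0); $j++) {
            # Net input: bias plus the dot product of features and weights.
            $net = $bias
            for ($k = 0; $k -lt $numFeatures; $k++) { $net += $Samples[$j, $k] * $weights[$k] }
            $prediction = if ($net -gt 0.0) { 1 } else { 0 }

            # Zero update when correct; plus or minus the learning rate when wrong.
            $update = $LearningRate * ($Targets[$j] - $prediction)
            for ($k = 0; $k -lt $numFeatures; $k++) { $weights[$k] += $update * $Samples[$j, $k] }
            $bias += $update
            if ($update -ne 0) { $errors++ }
        }
        # One entry per epoch, like the ErrorsPerEpoch property on the class.
        $errorsPerEpoch += $errors
    }

    return [pscustomobject]@{ Weights = $weights; Bias = $bias; ErrorsPerEpoch = $errorsPerEpoch }
}

# Rows: (0,0), (0,1), (1,0), (1,1). Only (1,1) has target class 1.
$andSamples = New-Object "double[,]" -ArgumentList 4, 2
$andSamples[1, 1] = 1
$andSamples[2, 0] = 1
$andSamples[3, 0] = 1
$andSamples[3, 1] = 1

$result = Invoke-PerceptronTraining -Samples $andSamples -Targets @(0, 0, 0, 1) -LearningRate 0.5 -Epochs 10
"Errors per epoch: $($result.ErrorsPerEpoch)"
```

Traced by hand, the error counts come out as 1, 3, 3, 2, 1 and then zero for every remaining epoch. As in the iris run, the count does not fall monotonically before it settles.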