NumPy is a useful Python library that provides a wide variety of mathematical functions. For machine learning, the function numpy.array() is very useful, as we can use it to create vectors and matrices. For matrix operations there are functions such as numpy.dot(), which computes the dot product of two vectors or, for two 2D arrays, their matrix product.
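As a small illustration with made-up values, np.array builds the arrays and np.dot combines them. For 2D arrays np.dot acts as matrix multiplication, and for 1D arrays it gives the ordinary dot product:

```python
import numpy as np

# A 2x3 matrix and a 3x1 column vector, both built with np.array
A = np.array([[1, 2, 3],
              [4, 5, 6]])
x = np.array([[1],
              [0],
              [2]])

# For 2D arrays, np.dot is matrix multiplication: (2x3)(3x1) -> (2x1)
result = np.dot(A, x)
print(result)
print(result.shape)

# For two 1D arrays, np.dot is the ordinary dot product (a single number)
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
print(np.dot(u, v))
```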
Linear algebra is used extensively on this course. Here are some useful resources:
Classifiers are mathematical models that assign labels to new values based on previously seen data. A linear classifier does this with a 'hyperplane', defined by parameters theta and theta_0 as the set of points x where theta^T x + theta_0 = 0. For 2D data the hyperplane is a line, for 3D data it is a plane, and so on. Values on one side of the hyperplane are classified one way (e.g. +1), and values on the other side are classified the other way (e.g. -1). This form of binary classification can be implemented in a few ways.
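A minimal sketch of classifying points with a hyperplane, assuming the data is a d x n NumPy array with one point per column and theta is a d x 1 column vector. The function name linear_classify is hypothetical, and labelling points exactly on the hyperplane as -1 is an assumed convention:

```python
import numpy as np

def linear_classify(x, theta, theta_0):
    """Hypothetical helper: return a 1 x n array of +1/-1 labels, one per
    column of x, depending on which side of theta^T x + theta_0 = 0 it lies.
    Points exactly on the hyperplane get -1 (an assumed convention)."""
    return np.where(theta.T @ x + theta_0 > 0, 1, -1)

# Example: the line x1 + x2 - 1 = 0 in 2D
theta = np.array([[1.0], [1.0]])
theta_0 = -1.0
x = np.array([[2.0, 0.0, 1.0],    # first coordinates of three points
              [2.0, 0.0, 0.5]])   # second coordinates
labels = linear_classify(x, theta, theta_0)
print(labels)
```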
To classify, a prediction must be made for some data. The prediction is the guess based on the current theta and theta_0. These predictions can then be compared to the actual labels of the data, and the error calculated, for example as the fraction of points whose guess differs from the actual label. By minimising this loss, we can find values of theta and theta_0 that give predictions closest to the data.
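One common way to make this comparison for binary classification is the 0-1 loss averaged over the data set, often called the training error. The sketch below assumes labels in {+1, -1} stored as a 1 x n array and points as columns of x; predict and training_error are hypothetical helper names:

```python
import numpy as np

def predict(x, theta, theta_0):
    # +1 on the positive side of the hyperplane, -1 otherwise
    return np.where(theta.T @ x + theta_0 > 0, 1, -1)

def training_error(x, y, theta, theta_0):
    """Fraction of points whose guess differs from the label (average 0-1 loss).
    x: d x n data (one point per column); y: 1 x n labels in {+1, -1}."""
    guesses = predict(x, theta, theta_0)
    return np.mean(guesses != y)

x = np.array([[2.0, 0.0, 1.0, 3.0],
              [2.0, 0.0, 0.5, -3.0]])
y = np.array([[1, -1, -1, -1]])
theta = np.array([[1.0], [1.0]])
theta_0 = -1.0
err = training_error(x, y, theta, theta_0)
print(err)
```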
The Random Linear Classifier finds values for theta and theta_0 by randomly generating a number of candidate pairs and keeping the pair that gives the smallest loss on the training data. It is simple, but not very efficient.
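A minimal sketch of a random linear classifier, assuming candidate parameters are drawn from a standard normal distribution and the loss is the training error. The number of candidates k and the function name are illustrative assumptions:

```python
import numpy as np

def random_linear_classifier(x, y, k, rng=None):
    """Try k random (theta, theta_0) pairs and keep the one with the lowest
    training error. x: d x n data, y: 1 x n labels in {+1, -1}."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    best_theta, best_theta_0, best_err = None, None, np.inf
    for _ in range(k):
        theta = rng.normal(size=(d, 1))   # random normal vector (direction)
        theta_0 = rng.normal()            # random offset
        guesses = np.where(theta.T @ x + theta_0 > 0, 1, -1)
        err = np.mean(guesses != y)
        if err < best_err:                # keep the best candidate so far
            best_theta, best_theta_0, best_err = theta, theta_0, err
    return best_theta, best_theta_0, best_err

x = np.array([[1.0, 2.0, -1.0, -2.0],
              [1.0, 2.0, -1.0, -2.0]])
y = np.array([[1, 1, -1, -1]])
theta, theta_0, err = random_linear_classifier(x, y, k=1000,
                                               rng=np.random.default_rng(0))
print(theta.ravel(), theta_0, err)
```

Every candidate costs a full pass over the data, and nothing learned from one candidate informs the next, which is why this approach scales poorly.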
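The Perceptron is a slightly smarter linear classifier. It cycles through the training data and, whenever a point is misclassified, adjusts theta and theta_0 towards classifying that point correctly.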
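Below is a sketch of the standard perceptron algorithm with an offset, assuming labels in {+1, -1}, points as columns of x, and at most T passes over the data (T is a hyperparameter chosen here for illustration). Stopping early after a pass with no mistakes is a common convenience, not a required part of the algorithm:

```python
import numpy as np

def perceptron(x, y, T=100):
    """Perceptron with offset. x: d x n data, y: 1 x n labels in {+1, -1}."""
    d, n = x.shape
    theta = np.zeros((d, 1))
    theta_0 = 0.0
    for _ in range(T):
        mistakes = 0
        for i in range(n):
            x_i = x[:, i:i + 1]   # keep the point as a d x 1 column
            y_i = y[0, i]
            # Misclassified (or on the boundary) if y_i * (theta^T x_i + theta_0) <= 0
            if y_i * (theta.T @ x_i + theta_0).item() <= 0:
                theta = theta + y_i * x_i   # move the normal towards the correct side
                theta_0 = theta_0 + y_i     # shift the offset the same way
                mistakes += 1
        if mistakes == 0:  # a full pass with no mistakes: stop early
            break
    return theta, theta_0

x = np.array([[1.0, 2.0, -1.0, -2.0],
              [1.0, 2.0, -1.0, -2.0]])
y = np.array([[1, 1, -1, -1]])
theta, theta_0 = perceptron(x, y)
print(theta.ravel(), theta_0)
```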
If we use optimisation to find good parameters for a linear classifier, the 0-1 loss (which gives a value of either 1 or 0 for each point) is not very helpful, and the sign-based output gives no measure of how confident a prediction is. This matters especially when the data is not linearly separable. In that case, two classifiers could misclassify the same number of points even though one is better than the other. The 0-1 loss is also piecewise constant: small changes to theta and theta_0 usually leave it unchanged, so an optimiser gets no information about which direction leads to a better model.
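The flatness of the 0-1 loss can be seen numerically. In this sketch with made-up data, the classifier misclassifies one point, and nudging theta and theta_0 by small amounts leaves the loss exactly the same, so there is no slope for an optimiser to follow:

```python
import numpy as np

def zero_one_loss(x, y, theta, theta_0):
    # Average 0-1 loss: fraction of points on the wrong side of the hyperplane
    guesses = np.where(theta.T @ x + theta_0 > 0, 1, -1)
    return np.mean(guesses != y)

x = np.array([[1.0, 2.0, -1.0, 0.5],
              [1.0, 2.0, -1.0, -2.0]])
y = np.array([[1, 1, -1, 1]])
theta = np.array([[1.0], [1.0]])
theta_0 = 0.0

# Shift every parameter by a small delta and recompute the loss each time
losses = [zero_one_loss(x, y, theta + delta, theta_0 + delta)
          for delta in [0.0, 0.01, 0.05, 0.1]]
print(losses)
```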
Instead, we can use a linear logistic classifier (LLC), often just called logistic regression. This classification algorithm outputs a probability between 0 and 1 for each prediction, instead of a hard classification. Because this output changes smoothly with theta and theta_0, small improvements to the parameters give small improvements in the loss, which makes optimisation practical. It uses a sigmoid function, sigmoid(z) = 1 / (1 + e^(-z)), instead of a sign function. To make a prediction, we predict +1 if sigmoid(theta^T x + theta_0) is above some threshold value, e.g. 0.5, and -1 otherwise.
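A minimal sketch of LLC prediction, assuming points as columns of x and a default threshold of 0.5; sigmoid and llc_predict are hypothetical helper names. It returns both the probabilities and the resulting +1/-1 labels:

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid: maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def llc_predict(x, theta, theta_0, threshold=0.5):
    """Return (p, labels): p is the 1 x n array of probabilities of the
    positive class; labels are +1 where p > threshold and -1 otherwise."""
    p = sigmoid(theta.T @ x + theta_0)
    return p, np.where(p > threshold, 1, -1)

theta = np.array([[1.0], [1.0]])
theta_0 = -1.0
x = np.array([[2.0, 0.0, 1.0],
              [2.0, 0.0, 0.0]])
p, labels = llc_predict(x, theta, theta_0)
print(p, labels)
```

Unlike the sign-based classifier, p also shows how far each point is from the boundary: a value near 0.5 means an uncertain prediction.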
To train this model, we need a new loss function: the negative log-likelihood (NLL) loss.
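For labels encoded as y in {0, 1} and a guess g = sigmoid(theta^T x + theta_0), a standard form of the NLL loss for one point is -(y log g + (1 - y) log(1 - g)), averaged over the training data. A sketch under those assumptions, where the eps clipping is an added safeguard against taking log(0) and nll_loss is a hypothetical name:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll_loss(x, y, theta, theta_0, eps=1e-12):
    """Average negative log-likelihood for an LLC.
    x: d x n data; y: 1 x n labels in {0, 1} (assumed encoding).
    eps keeps guesses away from exactly 0 or 1 so log never sees 0."""
    g = np.clip(sigmoid(theta.T @ x + theta_0), eps, 1 - eps)
    losses = -(y * np.log(g) + (1 - y) * np.log(1 - g))
    return np.mean(losses)

x = np.array([[2.0, -2.0]])
y = np.array([[1, 0]])
loss_zero = nll_loss(x, y, np.array([[0.0]]), 0.0)    # every guess is 0.5
loss_weak = nll_loss(x, y, np.array([[1.0]]), 0.0)    # correct, less confident
loss_strong = nll_loss(x, y, np.array([[3.0]]), 0.0)  # correct, more confident
print(loss_zero, loss_weak, loss_strong)
```

With theta = 0 every guess is 0.5 and the loss is log 2; parameters that make confident, correct guesses get a lower loss, which gives an optimiser a smooth signal to follow.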