This lesson is in the early stages of development (Alpha version)

Introduction to Machine Learning with Scikit Learn: Glossary

Key Points

Introduction
  • Machine learning is a set of tools and techniques to find patterns in data.

  • Some machine learning techniques are useful for predicting something given some input data.

  • Some machine learning techniques are useful for classifying input data and working out which class it belongs to.

  • Artificial Intelligence is a broader term that refers to making computers show human-like intelligence.

  • Some people use the term Artificial Intelligence to mean machine learning.

  • All machine learning systems have limitations of some kind.

Regression
  • We can model linear data using a linear or least squares regression.

  • A linear regression model can be used to predict future values.

  • We should split up our training dataset and use part of it to test the model.

  • For some non-linear data, such as data that grows exponentially, we can take logarithms to make the relationship linear.
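
The regression points above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lesson's own code: the exponential data, the noise level, and the 80/20 split are all assumptions chosen for the example.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y grows exponentially with x.
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 50)
y = 2.0 * np.exp(0.5 * x) * rng.normal(1.0, 0.02, size=x.size)

# Hold out part of the data for testing: shuffle the indices, fit on 40 points,
# and keep the remaining 10 for evaluation.
indices = rng.permutation(x.size)
train_idx, test_idx = indices[:40], indices[40:]

# Taking logarithms turns y = a * exp(b * x) into log(y) = log(a) + b * x,
# a straight line that a least squares fit can handle.
slope, intercept = np.polyfit(x[train_idx], np.log(y[train_idx]), deg=1)

# Predict the held-out points and transform back to the original scale.
y_pred = np.exp(intercept + slope * x[test_idx])

# Root mean squared error on the log scale, measured on the test points only.
rmse_log = np.sqrt(np.mean((np.log(y_pred) - np.log(y[test_idx])) ** 2))
print(f"slope={slope:.3f}, intercept={intercept:.3f}, test RMSE (log)={rmse_log:.4f}")
```

The fitted slope and intercept should land close to the values used to generate the data (0.5 and log 2). Measuring the error only on points the model never saw is the reason for holding part of the data back.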

Introducing Scikit Learn
  • Scikit Learn is a Python library with lots of useful machine learning functions.

  • Scikit Learn includes a linear regression function.

  • It also includes a polynomial modelling function which is useful for modelling non-linear data.
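
As a rough sketch of the two points above, the example below fits a straight line and a degree-2 polynomial to the same data. The summary does not say which scikit-learn functions the lesson uses, so the choice of `LinearRegression`, `PolynomialFeatures`, and `make_pipeline` is an assumption, as is the synthetic quadratic data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data (an assumption for illustration).
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)  # scikit-learn expects a 2D feature array
y = 1.0 + 2.0 * X.ravel() + 0.5 * X.ravel() ** 2 + rng.normal(0, 0.1, size=60)

# A straight-line fit.
linear_model = LinearRegression().fit(X, y)

# A polynomial fit: expand x into [1, x, x^2], then fit a linear model to those columns.
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

# score() returns the coefficient of determination (R^2); closer to 1 is a better fit.
linear_r2 = linear_model.score(X, y)
poly_r2 = poly_model.score(X, y)
print(f"linear R^2={linear_r2:.3f}, polynomial R^2={poly_r2:.3f}")
```

Because the data were generated from a quadratic, the polynomial model should score higher than the straight line. On real data the degree has to be chosen with care, since a high degree can fit noise.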

Clustering with Scikit Learn
  • Clustering is a form of unsupervised learning.

  • Unsupervised learning algorithms don’t need labelled training data.

  • Kmeans is a popular clustering algorithm.

  • Kmeans struggles where one cluster exists within another, such as concentric circles.

  • Spectral clustering is another technique which can overcome some of the limitations of Kmeans.

  • Spectral clustering is much slower than Kmeans.

  • As well as providing machine learning algorithms, Scikit Learn also has functions to make example data.
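
The concentric-circles case above can be tried with a short sketch like the one below. The parameter values (sample count, noise, number of neighbours) are assumptions picked for illustration, and the adjusted Rand index is used here only as a convenient way to compare each clustering with the labels that `make_circles` returns.

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Generate example data: a small circle inside a larger one.
X, true_labels = make_circles(n_samples=300, factor=0.4, noise=0.04, random_state=0)

# k-means assigns each point to the nearest cluster centre, so it tends to cut
# the plane in two rather than separate the inner ring from the outer ring.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering groups points that are connected through near neighbours.
# scikit-learn may warn that the neighbour graph is not fully connected.
spectral_labels = SpectralClustering(
    n_clusters=2, affinity="nearest_neighbors", n_neighbors=10, random_state=0
).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true rings, about 0 means chance.
kmeans_ari = adjusted_rand_score(true_labels, kmeans_labels)
spectral_ari = adjusted_rand_score(true_labels, spectral_labels)
print(f"k-means ARI={kmeans_ari:.2f}, spectral ARI={spectral_ari:.2f}")
```

On data like this the spectral result should agree with the true rings far better than k-means does, at the cost of a slower computation as the number of points grows.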

Dimensionality Reduction
  • PCA is a linear dimensionality reduction technique for tabular data.

  • t-SNE is another dimensionality reduction technique for tabular data. It is more general than PCA because it can capture non-linear structure.
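
A minimal sketch of both techniques, assuming the small handwritten-digits dataset bundled with scikit-learn as the tabular input (the summary does not say which dataset the lesson uses). Only the first 300 rows are used to keep t-SNE quick.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Each row is one 8x8 image flattened into 64 numeric columns.
digits = load_digits()
X = digits.data[:300]

# PCA: project the 64 columns onto the 2 directions of greatest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# t-SNE: a non-linear embedding that tries to keep similar rows close together.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_tsne = tsne.fit_transform(X)

print(X.shape, X_pca.shape, X_tsne.shape)
print("variance explained by the 2 PCA components:", pca.explained_variance_ratio_.sum())
```

Both results have two columns per row, so they can be drawn as scatter plots. Unlike PCA, a fitted `TSNE` object has no `transform` method for new rows, so it is used mainly for visualisation.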

Neural Networks
  • Perceptrons are artificial neurons which are the building blocks of neural networks.

  • A perceptron takes multiple inputs, multiplies each by a weight value and sums the weighted inputs. It then applies an activation function to the sum.

  • A single perceptron can solve simple functions which are linearly separable.

  • Multiple perceptrons can be combined to form a neural network which can solve functions that aren’t linearly separable.

  • We can train a whole neural network with the backpropagation algorithm. Scikit Learn includes an implementation of this algorithm.

  • Training a neural network requires some training data to show the network examples of what to learn.

  • To validate our training we split the training data into a training set and a test set.

  • To ensure the whole dataset can be used in training and testing we can train multiple times with different subsets of the data acting as training/testing data. This is called cross validation.

  • Deep learning neural networks are a very powerful modern technique. Scikit Learn does not support these, but other libraries such as TensorFlow do.

  • Several companies now offer cloud APIs where we can train neural networks on powerful computers.
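
The perceptron and cross validation points above can be sketched as follows. Several details are assumptions made for illustration: the bias term and step activation in `perceptron`, the hand-picked AND weights, the `make_moons` example data, and the network size passed to `MLPClassifier`.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier


def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a step activation function."""
    weighted_sum = np.dot(inputs, weights) + bias
    return 1 if weighted_sum > 0 else 0


# Hand-picked weights (an assumption) that make a single perceptron compute
# logical AND, which is a linearly separable function.
and_weights, and_bias = np.array([1.0, 1.0]), -1.5
and_outputs = [perceptron(np.array(pair), and_weights, and_bias)
               for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(and_outputs)  # [0, 0, 0, 1]

# Two interleaving half-moons: a dataset that a single straight line cannot separate.
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

# A small multi-layer network; scikit-learn trains it with backpropagation.
network = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)

# 5-fold cross validation: train on 4/5 of the data, test on the remaining 1/5,
# and repeat so that every sample is used for testing exactly once.
scores = cross_val_score(network, X, y, cv=5)
print("accuracy per fold:", scores)
print("mean accuracy:", scores.mean())
```

Reporting the mean over the folds gives a steadier estimate than a single train/test split, because the result depends less on which samples happened to land in the test set.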

Ethics and Implications of Machine Learning
  • Machine learning is often thought of as unbiased and impartial, but if the training data is biased, the resulting model will be biased too.

  • Many machine learning algorithms can’t explain how they arrived at a decision.

  • There is a lot of concern about how machine learning can be used for unethical purposes.

  • No machine learning system is 100% accurate. Think about the implications of false positives and false negatives.
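
One way to look at false positives and false negatives separately is a confusion matrix. The sketch below uses made-up labels purely for illustration; they do not come from any real system.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 1 = condition present, 0 = condition absent.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order
# true negatives, false positives, false negatives, true positives.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / len(y_true)
print(f"accuracy={accuracy:.1f}, false positives={fp}, false negatives={fn}")
```

A single accuracy figure hides which kind of mistake was made. Whether a false positive or a false negative is more harmful depends on the application, for example a missed diagnosis compared with an unnecessary follow-up test.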

Find out more
  • This course has only touched on a few areas of machine learning.

  • Machine learning is a large and growing field.

  • This course is designed to teach you just enough to do something useful.

  • Machine learning is a rapidly developing field and new tools and techniques are constantly appearing.

Glossary