Understanding Machine Learning

in #machinelearning7 years ago


Photo by Denys Nevozhai on Unsplash - https://unsplash.com/photos/2vmT5_FeMck

This is short overview of machine learning. What it is, what learning is and what it's most common concepts are. It is designed as a first step into the topic.

📄 Table of contents


"A wise man can learn more from a foolish question than a fool can learn from a wise answer." - Bruce Lee


What is Machine Learning (ML)

ML finds patterns in data and uses them to predict the future.

Learning requires:

  • identifying patterns
  • recognizing those patterns

Now it's easy to find patterns. But it is not easy to find patterns that are correct. Increasing the size of data allows to predict outcome that is more and more correct.

DataAlgorithmModelApplication
contains patternsfinds patternsrecognizes patternsuses to recognition on other data

img

By Megajuice - CC0, Link

Common programming languages used for ML are:

  • R
  • Python

The learning process

1. Asking questions

  • what questions to ask
  • what data helps you to answer the question
  • how do you measure success

2. Iterate

  • select and prepare your data over and over to make it useable for the algorithm
  • apply an algorithm on the data and create models over and over to increase your success rate
  • expose and test successful models to different data

ML concepts

  • supervised learning (the value you want to predict is already in the data)
  • unsupervised learning (the value you want to predict is not in the data)

Data preprocessing with supervised learning

Raw data has to be transformed in to training data by removing unnecessary items like duplicates, wrong/false information, useless information.

The training data contains features, which stand for important classifications and target values, which stand for the desired piece of information for the model.

Problems

regressionsclassificationclustering
Goaltrying to find a line or curve that fit the datatrying to group data into classestrying to identify segments of the data
Exampleimgimgimg
Image CreditBy Sewaqu - , Public Domain, LinkBy Elizabeth Goodspeed - , CC BY-SA 4.0, LinkBy Chire -, CC BY-SA 3.0, Link

Algorithms

Common styles are:

  • decision trees (construct a model based on actual values of attributes in a data)

img

By Stephen Milborrow - , CC BY-SA 3.0, Link

  • neural networks (construct a model based on the recombination and reevaluation of results within the training data)
    img

By Glosser.ca - , Derivative of File:Artificial neural network.svg, CC BY-SA 3.0, Link

  • bayesian (filters according to probabilistic classifiers)

img

By AnAj -, Public Domain, Link

  • K-means (construct a model based on vector quantization to the k closest training examples)

img

By Chire - Public Domain, Link

(Iris flower data set, clustered using k means (left) and true species in the data set (right). Note that k-means is non-determinicstic, so results vary. Cluster means are visualized using larger, semi-transparent markers. The visualization was generated using ELKI.)

Training the model

  1. find features that are relevant to identifying the target value
  2. put a significant percentage of the features data into the algorithm
  3. generate a model
  4. test the model with the remaining percentage of the features data by comparing the target values with the values form the actual data
  5. if the model is not accurate, change the features, change the algorithm or change the data

img
By Docurbs - , CC BY-SA 4.0, Link


Thanks for reading my article! Feel free to leave any feedback!


Daniel is a LL.M. student in business law, working as a software engineer and organizer of tech related events in Vienna.
His current personal learning efforts focus on machine learning. Connect on: