Design Philosophy of Tensorflow - Introduction to Tensorflow Part 1

Introduction to Tensorflow as a Computational Framework Part 1

Tensorflow is likely the most popular and fastest-growing machine learning framework in existence. With over 70,000 stars on GitHub and backing from Google, it not only has more stars than Linux, but also has a ton of resources behind it.

If that doesn't pique your interest, I have no idea what will.

If you've been following the machine learning 101 series up to now, you will have noticed that we've used the sklearn framework to implement our models. However, as we begin venturing into neural networks, deep learning, and the inner workings of some of the algorithms, we will start using the Tensorflow framework, which exposes lower-level APIs that give us finer control over the model.

Because of this, we will spend some time familiarizing ourselves with Tensorflow and its design philosophy, so that we can start using it in subsequent tutorials without further introduction.

In this tutorial we will talk about:

  • General design philosophy
  • Visualization
  • Examples covering common use cases
  • How it relates to machine learning

This post originally appeared on kasperfred.com where I write more about machine learning.

In the official white paper, Tensorflow is described as "an interface for expressing machine learning algorithms, and an implementation for executing such algorithms". Its main advantage over other frameworks is how easy it is to execute the code on a wide array of devices. This goes back to the original motivation for its development, before it was open-sourced: Google built Tensorflow to bridge the gap between research and production, aspiring to an ideal where no edits to the code are needed to go from one to the other.

To achieve this, Tensorflow implements a computational graph behind the scenes; in your code, you're just defining that graph: the flow of tensors.

Wait, what is a tensor?

Just as a vector can be thought of as an array, or list, of scalars (ordinary numbers like 1, 2, and pi), and a matrix can be thought of as an array of vectors, a tensor can be thought of as an array of matrices. So a tensor is really just the n-dimensional generalization of a matrix, with scalars, vectors, and matrices as the low-dimensional special cases. It turns out, as we will see in the coding examples, that this architecture makes a lot of sense when working with machine learning.
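To make the nesting concrete, here is a small sketch in plain Python (no Tensorflow required). The helper shape_of is a hypothetical name introduced only for this illustration, and it assumes the nested lists are regular, meaning every sublist at a given depth has the same length.

```python
def shape_of(value):
    """Return the shape of a regular nested list as a list of sizes."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        # Assumption: all sublists have the same length, so the first is enough.
        value = value[0]
    return shape


scalar = 3.14                       # rank 0: a single number
vector = [1, 2, 3]                  # rank 1: a list of scalars
matrix = [[1, 2, 3], [4, 5, 6]]     # rank 2: a list of vectors
tensor = [matrix, matrix]           # rank 3: a list of matrices

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("tensor", tensor)]:
    shape = shape_of(value)
    print(name, "rank:", len(shape), "shape:", shape)
```

The number of entries in the shape (the rank) grows by one each time we wrap the previous structure in another list, which is all "n-dimensional" means here.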

What is the flow?

The flow is how tensors are passed around in the network. When the tensors are passed around, their values and shapes are updated by the graph operations.

As an analogy, you can think of the graph as a car factory with a series of workstations. One station may put on the wheels of the car while another installs the gearbox. The flow then describes the route a car skeleton has to take in order to become a fully functional car. The tensors passed around in this analogy would be the car prototype, or skeleton.
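The "define the graph first, run it later" idea can be sketched in a few lines of plain Python. This is a toy written for illustration only, not how Tensorflow is implemented; the names Node, input_node, constant, add, multiply, and run are all hypothetical.

```python
class Node:
    """One workstation in the toy graph: an operation plus its input nodes."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op
        self.inputs = list(inputs)
        self.name = name


def input_node(name):
    # No operation and no inputs: the value is supplied when the graph is run.
    return Node(op=None, name=name)


def constant(value):
    return Node(op=lambda: value)


def add(a, b):
    return Node(op=lambda x, y: x + y, inputs=[a, b])


def multiply(a, b):
    return Node(op=lambda x, y: x * y, inputs=[a, b])


def run(node, feed=None):
    """Evaluate a node by first evaluating everything that flows into it."""
    feed = feed or {}
    if node in feed:
        return feed[node]
    if node.op is None:
        raise ValueError("no value fed for input %r" % node.name)
    args = [run(parent, feed) for parent in node.inputs]
    return node.op(*args)


# Step 1: define the graph. Nothing is computed here.
x = input_node("x")
result = add(multiply(x, constant(2)), constant(1))   # represents 2 * x + 1

# Step 2: run the graph, feeding a value in at the start of the line.
print(run(result, feed={x: 5}))   # 11
```

Building result only wires the workstations together; values start flowing through them when run is called.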

Installing Tensorflow

You can install Tensorflow with pip using the following command:

pip install tensorflow

Or if you have a GPU:

pip install tensorflow-gpu

Note that if you're installing the GPU version, you need to have CUDA and cuDNN installed.

At the time of writing, Tensorflow (v1.3) supports CUDA 8 and cuDNN 6.

Once you have installed Tensorflow, you can verify that everything works correctly using:

import tensorflow as tf
# Figure out what devices are available
from tensorflow.python.client import device_lib

def get_devices():
    # Collect the name of every device (CPU or GPU) Tensorflow can see
    return [x.name for x in device_lib.list_local_devices()]

print(get_devices())
# Example output on a machine with one GPU:
# ['/cpu:0', '/gpu:0']

For more information, you can refer to the installation page.

The atoms of Tensorflow

We already discussed how Tensorflow literally is the flow of tensors, but we didn't go into much detail. In order to better justify the architectural decisions, we will elaborate a bit on this.

Three types of tensors

In Tensorflow, there are three primary types of tensors:

  • tf.Variable
  • tf.constant
  • tf.placeholder

It's worth taking a look at each of these to discuss how they differ, and when each should be used.

tf.Variable

The tf.Variable tensor is the most straightforward of the three, and is in many ways analogous to an ordinary Python variable in that its value is, well, variable.

Variables retain their value during the entire session, and are therefore useful when defining learnable parameters such as weights in neural networks, or anything else that's going to change as the code is running.

You define a variable as follows:

a = tf.Variable([1,2,3], name="a")

Here, we create a tensor variable with the initial state [1,2,3], and the name a. Notice that Tensorflow is not able to inherit the Python variable name, so if you want the tensor to have a name on the graph (more on that later), you need to specify it explicitly.

There are a few more options, but this is only meant to cover the basics. As with any of the things discussed here, you can read more about it on the documentation page.

tf.constant

The tf.constant is very similar to tf.Variable, with one major difference: constants are immutable, that is, the value is constant (wow, Google really nailed the naming of tensors).

The usage follows that of the tf.Variable tensor:

b = tf.constant([1,2,3], name="b")

You use this whenever you have a value that doesn't change during the execution of the code, for example to denote some property of the data, or to store the learning rate when using neural networks.

tf.placeholder

Finally, we have the tf.placeholder tensor. As the name implies, this tensor type is used to define variables, or graph nodes (operations), for which you don't have an initial value. You then defer setting a value until you actually do the computation using sess.run. This is useful, for example, as a proxy for your training data when defining the network.

When running the operations, you need to pass actual data for the placeholders. This is done like so:

c = tf.placeholder(tf.int32, shape=[1,2], name="myPlaceholder")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Supply the actual data for the placeholder at run time
    res = sess.run(c, feed_dict={c: [[5, 6]]})

    print(res)
# Output:
# [[5 6]]

Notice that we define a placeholder by first passing a non-optional parameter for the element type (here tf.int32), and then defining the shape using matrix dimension notation. The [1,2] denotes a matrix with one row and two columns. If you haven't studied linear algebra, this may seem confusing at first: why denote the height before the width? And isn't [1,2] itself a 1 by 2 matrix with the values 1 and 2?

These are valid questions, but in-depth answers are outside the scope of this essay. To give you the gist of it, though, the apparently weird notation has some quite neat mnemonic properties for certain matrix operations, and yes, [1,2] can also be seen as a one by two matrix in itself. Tensorflow uses the list-like notation because it supports n-dimensional matrices, which makes it very convenient, as we will see later.

You can find a complete list of supported Tensorflow datatypes here.

When we evaluate c with sess.run, we pass in the actual data using a feed_dict. Notice that we target the placeholder through the Python variable that holds it, not through the name it was given on the Tensorflow graph. The same approach extends to multiple placeholders: each placeholder becomes a key in the dictionary, mapped to the data that should be fed to it.
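Here is a minimal sketch of feeding two placeholders at once. The names x, y, and total are made up for this illustration. The import block is an assumption about your setup: Tensorflow 2.x keeps tf.placeholder and tf.Session under tensorflow.compat.v1, so the sketch tries that first and falls back to the plain import used elsewhere in this post.

```python
try:
    # Assumption: a newer release, where the graph-and-session API
    # lives under compat.v1.
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
except ImportError:
    # Older 1.x releases, such as the v1.3 used in this post.
    import tensorflow as tf

x = tf.placeholder(tf.int32, shape=[1, 2], name="x")
y = tf.placeholder(tf.int32, shape=[1, 2], name="y")
total = tf.add(x, y, name="total")

with tf.Session() as sess:
    # Each key is the placeholder object held in the Python variable.
    res = sess.run(total, feed_dict={
        x: [[5, 6]],
        y: [[1, 2]],
    })

print(res)
```

Both placeholders need a value here, because total depends on both of them.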

Wildcards when defining shapes

Sometimes, you don't know part of, or even the entire, shape of a placeholder when defining it. For example, you may use a variable batch size when training. This is where wildcards come in.

Wildcards essentially allow you to say "I don't know" to Tensorflow, and let it infer the shapes from the incoming tensors.
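The idea can be sketched in plain Python, without Tensorflow. The function is_compatible below is a hypothetical helper written for this illustration, not part of the Tensorflow API. It assumes that None marks a dimension of unknown size, and checks whether a concrete shape fits a declared one.

```python
def is_compatible(declared, actual):
    """Check whether a concrete shape fits a declared shape.

    A None entry in `declared` is a wildcard that matches any size.
    """
    if len(declared) != len(actual):
        return False
    return all(d is None or d == a for d, a in zip(declared, actual))


# A batch of unknown size where every example has 3 features.
declared = [None, 3]

print(is_compatible(declared, [32, 3]))    # True: 32 examples, 3 features
print(is_compatible(declared, [1, 3]))     # True: the batch size is free
print(is_compatible(declared, [32, 4]))    # False: the feature count is fixed
print(is_compatible(declared, [32]))       # False: wrong number of dimensions
```

Only the wildcarded dimension is free to vary; every other dimension still has to match exactly.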

What's the difference between -1 and None?

Honestly, I tried to figure out the answer to this, but I haven't been able to find any documented difference between them, and the little digging I did in the Tensorflow source code didn't yield any results either. However, I've run into a couple of examples where one would raise an error while the other wouldn't.

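Of the two, None seems to work better for me, so that's what I always use. If I get an error related to the size of my placeholders, I try changing it to -1, but I do think they are supposed to be equivalent. For what it's worth, the official tutorials use None for an unknown placeholder dimension, while -1 is the value tf.reshape uses to mean "infer this dimension".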

Why not just wildcard EVERYTHING!?!

Having explicit shapes helps debugging, as a lot of errors will be caught at "compile time" rather than during training. This lets you spot mistakes more quickly, and ensures that errors don't creep up on you silently (at least it tries to).

So to save your future self from headaches, you should only use wildcards when describing something variable such as input size, and not something static such as network parameter size.

Come back tomorrow when we will look at doing computations with Tensorflow.

Edit:
You can read part 2 here