Note_kerasComputationalGraph
In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

Introduction

TensorFlow framework

The TensorFlow framework has two components:

  • Library: for defining computational graphs.
  • Runtime: for executing such graphs on a variety of different hardware platforms.

Workflow diagram

This study note focuses on defining computational graphs, so we are at the defining stage of the workflow diagram.

Static computational graphs

A computational graph is a type of directed graph where:

  • nodes: describe operations
    • operation node: operation
    • storage node: Variable
    • data node: Placeholder
  • edges: represent the data (tensor) flowing between those operations
    • Tensor
    • SparseTensor

A sample computational graph in TensorFlow (Source: TensorFlow website)

Building computational graphs requires three elements:

  • keras.Input(): start the model by defining an Input object
    • shape=() has to be defined at the input layers
  • layers: chain layer calls to specify the model's forward pass
    • the output dimension (e.g. units in Dense) is defined at the output layers
  • keras.Model(): groups layers into an object with training and inference features
    • inputs=[]
    • outputs=[]

API comparison

| Feature              | Sequential API        | Functional API                            |
| -------------------- | --------------------- | ----------------------------------------- |
| description          | plain stack of layers | complex graph topologies                  |
| model/layer inputs   | one                   | multiple                                  |
| model/layer outputs  | one                   | multiple                                  |
| layer sharing        | no                    | yes                                       |
| topology             | linear                | non-linear                                |
| special usage        | -                     | residual connections, multi-branch models |
Table: Sequential model vs Functional API

keras.Sequential

In [2]:
model = keras.Sequential()
model.add(keras.Input(shape=(4,)))
model.add(layers.Dense(2, activation="relu"))
In [3]:
keras.utils.plot_model(model, "multi_input_and_output_model.png", show_shapes=True)
Out[3]:

Functional API

In [4]:
main_input = keras.Input(shape=(4,), name="main_input")
aux_input = keras.Input(shape=(2,), name="aux_input")

x = layers.concatenate([main_input, aux_input])

main_output = layers.Dense(1, name="main_output")(x)
aux_output = layers.Dense(2, name="aux_output")(x)

model = keras.Model(
    inputs=[main_input, aux_input],
    outputs=[main_output, aux_output],
)
In [5]:
keras.utils.plot_model(model, "multi_input_and_output_model.png", show_shapes=True)
Out[5]:

Layer

Layers are functions with a known mathematical structure that can be reused and have trainable variables.

The Layer class implements def __call__(self, inputs, **kwargs):, so a layer instance is callable and can be written with two pairs of parentheses:

x = layers.Dense(64, activation='relu')(x)

is equivalent to:

fc = layers.Dense(64, activation='relu')
x = fc(x)

Input shape

Since the input shape is the only shape you need to define, Keras demands it in the first layer. In this definition Keras ignores the first dimension, which is the batch size: your model should be able to deal with any batch size, so you define only the other dimensions:

  • when defining the input shape, omit the batch size: input_shape=(50,50,3)
  • when doing operations directly on tensors, the shape includes the batch size again, e.g. (30,50,50,3) for a batch of 30 examples
  • when printing the model summary, it will show (None,50,50,3) for input_shape=(50,50,3):
    • The first dimension is the batch size; it is None because it can vary depending on how many examples you give for training. If you define the batch size explicitly, the number you defined will appear instead of None
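
A minimal sketch to confirm this convention (the 50x50x3 image shape is just an arbitrary example):

# Batch size is omitted from the Input shape and appears as None in the summary.
demo = keras.Sequential([
    keras.Input(shape=(50, 50, 3)),  # no batch size given
    layers.Flatten(),
    layers.Dense(10),
])
demo.summary()  # input shape is reported as (None, 50, 50, 3)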

Each type of layer requires its input to have a certain number of dimensions:

  • Dense layers: (batch_size, input_size) or (batch_size, optional,...,optional, input_size)
  • 2D convolutional layers:
    • if using channels_last: (batch_size, imageside1, imageside2, channels)
    • if using channels_first: (batch_size, channels, imageside1, imageside2)
  • 1D convolutions and recurrent layers: (batch_size, sequence_length, features)
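
A hedged illustration of these expected ranks; the concrete sizes below are arbitrary, only the number of dimensions matters:

dense_in = tf.random.normal((32, 16))        # (batch_size, input_size)
conv_in = tf.random.normal((32, 28, 28, 3))  # channels_last: (batch, height, width, channels)
seq_in = tf.random.normal((32, 10, 8))       # (batch_size, sequence_length, features)

print(layers.Dense(4)(dense_in).shape)       # (32, 4)
print(layers.Conv2D(6, 3)(conv_in).shape)    # (32, 26, 26, 6)
print(layers.LSTM(5)(seq_in).shape)          # (32, 5)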

reference: https://stackoverflow.com/questions/44747343/keras-input-explanation-input-shape-units-batch-size-dim-etc

Module comparison

Two modules in TensorFlow provide layer APIs at different levels. Using conv2d as an example for comparison:

  • tf.nn: wrappers for primitive Neural Net (NN) operations. This lower-level API is there for people with special needs, or who wish to keep finer control of what is going on
    • tf.nn.conv2d: weights, biases, regularization, and activation have to be declared manually
  • tf.keras.layers: high-level wrappers built upon tf.nn. This higher-level API provides functions that greatly simplify the design of the most common neural nets.
    • tf.keras.layers.Conv2D: a single line of code creates a convolutional layer, with default settings for weights, biases, regularization, and activation.

reference: https://stackoverflow.com/questions/45172725/tensorflow-why-are-there-so-many-similar-or-even-duplicate-functions-in-tf-nn
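
A rough sketch of the difference in verbosity (filter count and input size are arbitrary illustration values):

x = tf.random.normal((1, 28, 28, 3))

# tf.nn: weights, bias and activation are declared and wired manually.
w = tf.Variable(tf.random.normal((3, 3, 3, 8)))  # (kernel_h, kernel_w, in_channels, out_channels)
b = tf.Variable(tf.zeros((8,)))
y_low = tf.nn.relu(tf.nn.conv2d(x, w, strides=1, padding="SAME") + b)

# tf.keras.layers: one line, variables and activation handled by the layer.
y_high = layers.Conv2D(8, 3, padding="same", activation="relu")(x)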

CORE LAYERS

| layer | description |
| --- | --- |
| tf.keras.layers.InputLayer | Layer to be used as an entry point into a Network (a graph of layers) |
| tf.keras.layers.Dense | Regular densely-connected NN layer |
| tf.keras.layers.Activation | Applies an activation function to an output |
| tf.keras.layers.Dropout | Applies Dropout to the input |
| tf.keras.layers.Reshape | Reshapes an output to a certain shape |
| tf.keras.layers.Permute | Permutes the dimensions of an input according to a given pattern |
| tf.keras.layers.RepeatVector | Repeats the input n times |
| tf.keras.layers.Lambda | Wraps an arbitrary expression as a layer |
| tf.keras.layers.ActivityRegularization | Layer that applies an update to the cost function based on input activity |
| tf.keras.layers.Masking | Masks a sequence by using a mask value to skip timesteps |
| tf.keras.layers.Flatten | Flattens an input |
| tf.keras.layers.Concatenate | Concatenates a list of inputs |
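
A small sketch exercising a few of the core layers on a toy tensor (shapes are arbitrary):

x = tf.random.normal((2, 4, 6))
print(layers.Flatten()(x).shape)                # (2, 24)
print(layers.Reshape((6, 4))(x).shape)          # (2, 6, 4)
print(layers.Permute((2, 1))(x).shape)          # (2, 6, 4)
print(layers.Lambda(lambda t: t * 2)(x).shape)  # (2, 4, 6)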

CONVOLUTIONAL LAYERS

| layer | description |
| --- | --- |
| tf.keras.layers.Conv1D | 1D, e.g. temporal convolution |
| tf.keras.layers.Conv1DTranspose | Transposed convolution layer (sometimes called deconvolution) |
| tf.keras.layers.Conv2D | 2D, e.g. spatial convolution over images |
| tf.keras.layers.Conv2DTranspose | Transposed 2D (deconvolution) |
| tf.keras.layers.Conv3D | 3D, e.g. spatial convolution over volumes |
| tf.keras.layers.Conv3DTranspose | Transposed 3D (deconvolution) |
| tf.keras.layers.ConvLSTM2D | Convolutional LSTM |
| tf.keras.layers.SeparableConv1D / SeparableConv2D | Depthwise separable 1D/2D convolution |
| tf.keras.layers.UpSampling1D / UpSampling2D / UpSampling3D | Upsampling layer |
| tf.keras.layers.ZeroPadding1D / ZeroPadding2D / ZeroPadding3D | Zero-padding layer |
| tf.keras.layers.Cropping1D / Cropping2D / Cropping3D | Cropping layer |
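
A quick sketch of how Conv2D shrinks and Conv2DTranspose restores the spatial dimensions (kernel and filter counts are arbitrary, default padding="valid"):

x = tf.random.normal((1, 28, 28, 3))
y = layers.Conv2D(8, 3)(x)           # (1, 26, 26, 8)
z = layers.Conv2DTranspose(3, 3)(y)  # (1, 28, 28, 3)
print(y.shape, z.shape)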

POOLING LAYERS

| layer | description |
| --- | --- |
| tf.keras.layers.MaxPool1D / MaxPool2D / MaxPool3D | Max pooling for 1D to 3D |
| tf.keras.layers.AveragePooling1D / AveragePooling2D / AveragePooling3D | Average pooling for 1D to 3D |
| tf.keras.layers.GlobalMaxPool1D / GlobalMaxPool2D / GlobalMaxPool3D | Global max pooling |
| tf.keras.layers.GlobalAveragePooling1D / GlobalAveragePooling2D / GlobalAveragePooling3D | Global average pooling |
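
A sketch contrasting local and global pooling (shapes are arbitrary): local pooling reduces the spatial dimensions, global pooling removes them entirely.

x = tf.random.normal((1, 28, 28, 8))
print(layers.MaxPool2D(2)(x).shape)              # (1, 14, 14, 8)
print(layers.AveragePooling2D(2)(x).shape)       # (1, 14, 14, 8)
print(layers.GlobalMaxPool2D()(x).shape)         # (1, 8)
print(layers.GlobalAveragePooling2D()(x).shape)  # (1, 8)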

ACTIVATION LAYERS

| layer | description |
| --- | --- |
| tf.keras.layers.Activation | Applies an activation function to an output |
| tf.keras.layers.LeakyReLU | Leaky version of a rectified linear unit |
| tf.keras.layers.PReLU | Parametric rectified linear unit |
| tf.keras.layers.ThresholdedReLU | Thresholded rectified linear unit |
| tf.keras.layers.ELU | Exponential linear unit |
| tf.keras.layers.Softmax | Softmax activation function |
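
A toy comparison of a few activation layers on the same input (values chosen only for illustration):

x = tf.constant([[-2.0, 0.0, 2.0]])
print(layers.Activation("relu")(x).numpy())  # negatives clipped to 0
print(layers.LeakyReLU()(x).numpy())         # negatives scaled by a small slope
print(layers.Softmax()(x).numpy())           # values sum to 1 along the last axis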

DROPOUT LAYERS

| layer | description |
| --- | --- |
| tf.keras.layers.Dropout | Applies dropout to the input |
| tf.keras.layers.SpatialDropout1D / SpatialDropout2D / SpatialDropout3D | Spatial 1D to 3D versions of dropout |
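
A small sketch of dropout behaviour: it is only active when called with training=True and is a no-op at inference (the rate is an arbitrary choice):

x = tf.ones((1, 8))
drop = layers.Dropout(0.5)
print(drop(x, training=False).numpy())  # unchanged
print(drop(x, training=True).numpy())   # roughly half the units zeroed, the rest rescaled by 1/(1-rate)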

RECURRENT LAYERS

| layer | description |
| --- | --- |
| tf.keras.layers.SimpleRNN | Fully-connected RNN where the output is fed back to the input |
| tf.keras.layers.GRU | Gated recurrent unit - Cho et al. |
| tf.keras.layers.LSTM | Long Short-Term Memory layer - Hochreiter 1997 |
| tf.keras.layers.ConvLSTM1D / ConvLSTM2D / ConvLSTM3D | Similar to an LSTM layer, but both the input and recurrent transformations are convolutional |
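
A sketch of the default recurrent output versus return_sequences=True (sizes are arbitrary):

x = tf.random.normal((2, 10, 8))                        # (batch, time, features)
print(layers.LSTM(16)(x).shape)                         # (2, 16): last hidden state only
print(layers.LSTM(16, return_sequences=True)(x).shape)  # (2, 10, 16): full sequence
print(layers.GRU(16)(x).shape)                          # (2, 16)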

LOCALLY CONNECTED LAYERS

| layer | description |
| --- | --- |
| tf.keras.layers.LocallyConnected1D / LocallyConnected2D | Similar to convolution, but weights are not shared, i.e. a different filter is applied to each patch |

ATTENTION LAYER

| layer | description |
| --- | --- |
| tf.keras.layers.Dot | Layer that computes a dot product between samples in two tensors |
| tf.linalg.matmul | Multiplies matrix a by matrix b, producing a * b |
| tf.keras.layers.Attention | Dot-product attention layer, a.k.a. Luong-style attention |
| tf.keras.layers.AdditiveAttention | Additive attention layer, a.k.a. Bahdanau-style attention |
| tf.keras.layers.MultiHeadAttention | Multi-head attention layer |
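
A minimal sketch of the attention layers with made-up query/value shapes:

query = tf.random.normal((2, 4, 16))  # (batch, query_len, dim)
value = tf.random.normal((2, 6, 16))  # (batch, value_len, dim)

print(layers.Attention()([query, value]).shape)  # (2, 4, 16), Luong-style dot-product attention
mha = layers.MultiHeadAttention(num_heads=2, key_dim=8)
print(mha(query, value).shape)                   # (2, 4, 16)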

ARITHMETIC

| layer | description |
| --- | --- |
| tf.keras.layers.Add | Layer that adds (element-wise) a list of inputs |
| tf.keras.layers.Subtract | Layer that subtracts (element-wise) two inputs |
| tf.keras.layers.Multiply | Layer that multiplies (element-wise) a list of inputs |
| tf.keras.layers.Maximum | Layer that computes the element-wise maximum of a list of inputs |
| tf.keras.layers.Minimum | Layer that computes the element-wise minimum of a list of inputs |

| function | description |
| --- | --- |
| tf.keras.layers.add | Function that adds (element-wise) a list of inputs |
| tf.keras.layers.subtract | Function that subtracts (element-wise) two inputs |
| tf.keras.layers.multiply | Function that multiplies (element-wise) a list of inputs |
| tf.keras.layers.maximum | Function that computes the element-wise maximum of a list of inputs |
| tf.keras.layers.minimum | Function that computes the element-wise minimum of a list of inputs |
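
A tiny check that the Add layer and the add function give the same element-wise result:

a = tf.constant([[1.0, 2.0]])
b = tf.constant([[3.0, 4.0]])
print(layers.Add()([a, b]).numpy())  # [[4. 6.]]
print(layers.add([a, b]).numpy())    # [[4. 6.]]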

DIMENSION REDUCTION

| function | description |
| --- | --- |
| tf.math.reduce_sum | Computes the sum of elements across dimensions of a tensor |
| tf.math.reduce_prod | Computes tf.math.multiply of elements across dimensions of a tensor |
| tf.math.reduce_mean | Computes the mean of elements across dimensions of a tensor |
| tf.math.reduce_max | Computes tf.math.maximum of elements across dimensions of a tensor |
| tf.math.reduce_min | Computes tf.math.minimum of elements across dimensions of a tensor |
| tf.math.reduce_variance | Computes the variance of elements across dimensions of a tensor |
| tf.math.reduce_std | Computes the standard deviation of elements across dimensions of a tensor |
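
A few worked calls showing how the reduce_* ops collapse axes (values chosen for illustration):

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(tf.math.reduce_sum(x).numpy())           # 10.0, all axes reduced
print(tf.math.reduce_sum(x, axis=0).numpy())   # [4. 6.]
print(tf.math.reduce_mean(x, axis=1).numpy())  # [1.5 3.5]
print(tf.math.reduce_max(x).numpy())           # 4.0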