Keras on FloydHub


We want to automatically generate original text based on Nietzsche’s corpus. We will use an LSTM built with Keras to model a character-level text generator.

Set up Floyd

While looking for options to test deep learning training on a GPU, I came across Floyd, the Heroku of deep learning. It is easier to set up than AWS GPU instances.

Go to the FloydHub installation instructions and tutorials to get started

Create a Floyd project directory

floyd login
floyd init PROJECT_NAME

Create a Floyd data directory and upload the text, which will be mounted at /input/ on the instance

mkdir DATA_DIR
floyd data init DATASET_NAME
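
The text file is then pushed to FloydHub from inside DATA_DIR (a minimal sketch, assuming the standard upload command of the FloydHub CLI):

floyd data upload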

Check status

floyd status
floyd data status

Run jupyter notebook on GPU instance

floyd run --data ID --gpu --mode jupyter

This will print a URL to access the Jupyter notebook

Run script on GPU instance

floyd run --data ID --gpu "python SCRIPT_NAME.py"



  • We use the Keras library for deep learning
  • Check the installation instructions for Keras and NumPy if necessary (see the pip sketch below)
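
For a local run, the dependencies can be installed with pip (a minimal sketch; matplotlib and h5py are assumptions added here because the code below uses pyplot and saves .hdf5 checkpoints):

pip install numpy matplotlib keras h5py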

Module imports

import numpy as np
from matplotlib import pyplot as plt
from keras.utils.data_utils import get_file
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
import sys
from keras import backend as K
print(K.backend())  # confirm which backend (TensorFlow or Theano) Keras is using
%matplotlib inline


We will train our model on a text by Nietzsche.

There are 600,893 characters in the text, 84 of which are unique.

data_path = "/input/nietzsche.txt"
text = open(data_path).read()
print('Corpus length:', len(text))
chars = sorted(list(set(text)))
vocab_size = len(chars)
print('Total unique chars:', vocab_size)

We encode these characters as indices to feed the learning model, using a char2index map.

char2index = dict((c,i) for i, c in enumerate(chars))

Each feature vector is a sequence of 100 consecutive characters. The target is the 101st character.

sub_text = [char2index[c] for c in text]
input_length = 100
X = []
y = []
for i in range(0, len(sub_text) - input_length):
    # each sample is a window of 100 character indices; the target is the next character
    X.append(sub_text[i:i + input_length])
    y.append(sub_text[i + input_length])

Reshape the inputs following the Keras/TensorFlow convention and normalize the features to [0, 1].

# reshape X to be [samples, time steps, features]
X_ = np.reshape(X, (len(X), input_length, 1))
# normalize to [0, 1]
X_ = X_ / float(vocab_size)
# one hot encode the output variable
y = np_utils.to_categorical(y)
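
As a quick sanity check, the shapes should follow from the corpus statistics above: 600,893 characters minus the 100-character window gives the sample count, and the one-hot targets span the 84 unique characters (assuming each of them appears at least once as a target):

print(X_.shape)  # expected: (600793, 100, 1)
print(y.shape)   # expected: (600793, 84)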

LSTM architecture

We first try a simple LSTM architecture with 256 units in the LSTM layer, a dropout rate of 0.2, and a softmax output layer.

n_hidden = 256
n_features = 100   # input sequence length (time steps)
n_depth = 1        # one feature per time step: the character index
n_y = y.shape[1]
model = Sequential()
model.add(LSTM(n_hidden, input_shape=(n_features, n_depth)))
model.add(Dropout(0.2))
model.add(Dense(n_y, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
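
The layer shapes and parameter counts can be inspected with Keras' built-in summary method:

model.summary()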

We make sure to progressively save the weights whenever the loss decreases.

# define checkpoint: for long trainings, save the weights whenever the loss improves
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

Train for 10 epochs with a batch size of 128.

model.fit(X_, y, epochs=10, batch_size=128, callbacks=callbacks_list)

Generate text

Load the weights that have been saved during the training phase.

filename = "weights-improvement-01-2.7955.hdf5"
model.compile(loss='categorical_crossentropy', optimizer='adam')

Once the model is trained, we can use it to generate sequences of characters.

# randomly pick an initial sequence among the inputs
start = np.random.randint(0, len(X))
pattern = list(X[start])  # the seed: 100 integer character indices
print("\"", ''.join([chars[c] for c in pattern]), "\"")

# generate characters
for i in range(100):
    x = np.reshape(pattern, (1, len(pattern), 1))
    x = x / float(vocab_size)  # normalize as during training
    prediction = model.predict(x, verbose=0)
    index = np.argmax(prediction)
    result = chars[index]
    sys.stdout.write(result)
    # slide the window: append the predicted index and drop the oldest one
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
print("\nDone.")