Moving averaging of Loss during Training in Keras

I am using Keras with TensorFlow to implement a deep neural network. When I plot the loss and number of iterations, there is a significant jump in loss after each epoch. In reality, the loss of each mini-batch should vary from each other, but Keras calculates the moving average of the loss over the mini-batches, that’s why we obtain a smooth curve instead of an arbitrary one. The array of the moving average is reset after each epoch because of which we can observe a jump in the loss.

I would like to remove the functionality of moving average instead I would like to have raw loss values which will vary for each mini-batch. For now, I tried reduction in the loss function but it works only on the examples within the mini-batch. The following code sum losses of all the training examples within the mini-batch.

tf.keras.losses.BinaryCrossentropy(reduction = 'sum')

I also tried writing a custom loss function but that doesn’t help either.

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

Keras in fact shows the moving average instead of the “raw” loss values. In order to acquire the raw loss values, one should implement a callback as shown below:

class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        #initialize a list at the begining of training
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))

mycallback = LossHistory()

Then call it in model.fit

model.fit(X, Y, epochs=epochs, batch_size=batch, shuffle=True, verbose = 0, callbacks=[mycallback])
print(mycallback.losses)

I tested with the following configuration

Keras 2.3.1
Tensorflow 2.1.0
Python 3.7.9

Method 2

( 1 ): I would like to remove the functionality of moving average instead I would like to have raw loss values that will vary for each mini-batch.

That can reach by using callback functions but again I look through the question you also try to optimize the actual loss value back into the calculation.

That is, of course, you can apply in the callback function or you can do it directly since this example tells you how the basic custom optimizer works.

[ Sample ]:

import os
from os.path import exists

import tensorflow as tf

import matplotlib.pyplot as plt
from skimage.transform import resize

import numpy as np

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Variables
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
learning_rate = 0.001
global_step = 0
tf.compat.v1.disable_eager_execution()

BATCH_SIZE = 1
IMG_SIZE = (32, 32)

history = [ ] 
history_Y = [ ]
list_file = [ ]
list_label = [ ]

for file in os.listdir("F:\datasets\downloads\dark\train") :
    image = plt.imread( "F:\datasets\downloads\dark\train\" + file )
    image = resize(image, (32, 32))
    image = np.reshape( image, (1, 32, 32, 3) )
    list_file.append( image )
    list_label.append(1)

optimizer = tf.compat.v1.train.ProximalAdagradOptimizer(
    learning_rate,
    initial_accumulator_value=0.1,
    l1_regularization_strength=0.2,
    l2_regularization_strength=0.1,
    use_locking=False,
    name='ProximalAdagrad'
)

var1 = tf.Variable(255.0)
var2 = tf.Variable(10.0)
X_var = tf.compat.v1.get_variable('X', dtype = tf.float32, initializer = tf.random.normal((1, 32, 32, 3)))
y_var = tf.compat.v1.get_variable('Y', dtype = tf.float32, initializer = tf.random.normal((1, 32, 32, 3)))
Z = tf.nn.l2_loss((var1 - X_var) ** 2 + (var2 - y_var) ** 2, name="loss")

cosine_loss = tf.keras.losses.CosineSimilarity(axis=1)
loss = tf.reduce_mean(input_tensor=tf.square(Z))
training_op = optimizer.minimize(cosine_loss(X_var, y_var))

previous_train_loss = 0
with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())

    image = list_file[0]
    X = image
    Y = image

    for i in range(1000):
        global_step = global_step + 1
        train_loss, temp = sess.run([loss, training_op], feed_dict={X_var:X, y_var:Y})
        
        history.append( train_loss )
        
        if global_step % 2 == 0 :
            var2 = var2 - 0.001
        if global_step % 4 == 0 and train_loss <=  previous_train_loss :
            var1 = var1 - var2 + 0.5
        print( 'steps: ' + str(i) )
        print( 'train_loss: ' + str(train_loss) )
        
        previous_train_loss = train_loss
        
sess.close()

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Graph
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
history = history[:-1]

plt.plot(np.asarray(history))
plt.xlabel('Epoch')
plt.ylabel('loss')
plt.legend(loc='lower right')
plt.show()

[ Output ]:
Moving averaging of Loss during Training in Keras


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x