
Explicit Reporting

In this tutorial, learn how to extend ClearML's automagical capturing of inputs and outputs with explicit reporting.

In this example, we will add the following to the pytorch_mnist.py example script from ClearML's GitHub repo:

  • Setting an output destination for model checkpoints (snapshots).
  • Explicitly logging a scalar, other (non-scalar) data, and text.
  • Registering an artifact: it is uploaded to ClearML Server, and ClearML logs changes to it.
  • Uploading an artifact: it is uploaded, but ClearML does not log changes to it.

Prerequisites

  • The clearml repository is cloned.
  • The clearml package is installed.

Before Starting

Make a copy of pytorch_mnist.py so you can add explicit reporting to it. In the local clearml repository, go to the example directory that contains pytorch_mnist.py and run:

cp pytorch_mnist.py pytorch_mnist_tutorial.py

Step 1: Setting an Output Destination for Model Checkpoints

Specify a default output location, which is where model checkpoints (snapshots) and artifacts will be stored when the experiment runs. Some possible destinations include:

  • Local destination
  • Shared folder
  • Cloud storage:
    • Amazon S3
    • Google Cloud Storage
    • Azure Storage

Specify the output location in the output_uri parameter of the Task.init method. In this tutorial, we specify a local folder destination.

In pytorch_mnist_tutorial.py, change the code from:

task = Task.init(project_name='examples', task_name='pytorch mnist train')

to:

import os  # needed to create the local output folder

model_snapshots_path = '/mnt/clearml'
# Create the local output folder if it does not exist yet
os.makedirs(model_snapshots_path, exist_ok=True)
task = Task.init(project_name='examples',
                 task_name='extending automagical ClearML example',
                 output_uri=model_snapshots_path)
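Besides a local folder, output_uri can point at the other destinations listed above. The sketch below uses a hypothetical helper, destination_kind (not part of ClearML), to classify an output_uri string by its URI scheme. It assumes the s3://, gs://, and azure:// prefixes that ClearML's storage documentation uses for Amazon S3, Google Cloud Storage, and Azure Storage; the bucket, account, and container names are placeholders. Check the ClearML storage documentation for the exact URI format each provider expects.

```python
from urllib.parse import urlparse


def destination_kind(output_uri: str) -> str:
    """Hypothetical helper: classify an output_uri string by its URI scheme."""
    scheme = urlparse(output_uri).scheme
    if scheme in ("", "file"):
        # Plain paths and file:// URIs point at a local or shared folder
        return "local or shared folder"
    if scheme == "s3":
        return "Amazon S3"
    if scheme == "gs":
        return "Google Cloud Storage"
    if scheme == "azure":
        return "Azure Storage"
    return "unknown"


# Placeholder destinations; any of these strings could be passed as output_uri
examples = [
    "/mnt/clearml",
    "file:///mnt/shared/clearml",
    "s3://my-bucket/clearml",
    "gs://my-bucket/clearml",
    "azure://myaccount.blob.core.windows.net/my-container/clearml",
]
for uri in examples:
    print(f"{uri} -> {destination_kind(uri)}")
```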

When the script runs, ClearML creates the following directory structure:

+-- <output destination name>
|   +-- <project name>
|       +-- <task name>.<Task Id>
|           +-- models
|           +-- artifacts

The model checkpoints (snapshots) and artifacts are stored in the models and artifacts folders of that task.

For example, if the Task ID is 9ed78536b91a44fbb3cc7a006128c1b0, then the directory structure will be:

+-- /mnt/clearml
|   +-- examples
|       +-- extending automagical ClearML example.9ed78536b91a44fbb3cc7a006128c1b0
|           +-- models
|           +-- artifacts

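The layout above can be expressed as a small path-building function. This is a sketch: task_output_dirs is a hypothetical helper, not part of the ClearML API, and ClearML may normalize characters in project or task names, so the real folder names can differ slightly.

```python
from pathlib import PurePosixPath


def task_output_dirs(destination, project, task_name, task_id):
    """Hypothetical helper: return the models/ and artifacts/ folders of one task,
    following <destination>/<project>/<task name>.<Task Id>/{models,artifacts}."""
    task_dir = PurePosixPath(destination) / project / f"{task_name}.{task_id}"
    return {"models": task_dir / "models", "artifacts": task_dir / "artifacts"}


dirs = task_output_dirs(
    "/mnt/clearml",
    "examples",
    "extending automagical ClearML example",
    "9ed78536b91a44fbb3cc7a006128c1b0",
)
print(dirs["models"])
print(dirs["artifacts"])
```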
Step 2: Logger Class Reporting Methods

In addition to ClearML's automagical logging, the ClearML Python package provides Logger methods for explicitly reporting plots, text, media, and tables. The following sections use several of these methods.

Get a Logger

First, get the Task's logger using the Task.get_logger method.

logger = task.get_logger()

Plot Scalar Metrics

Report the training loss as a scalar metric using the Logger.report_scalar method.

def train(args, model, device, train_loader, optimizer, epoch):
    save_loss = []
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        # Store the Python float so the autograd graph is not kept alive
        save_loss.append(loss.item())
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
            # Add manual scalar reporting for loss metrics
            logger.report_scalar(title='Scalar example {} - epoch'.format(epoch),
                                 series='Loss', value=loss.item(), iteration=batch_idx)
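Because the title passed to report_scalar includes the epoch number, each epoch gets its own graph, and the iteration axis (batch_idx) restarts at 0 in every graph. The sketch below makes that grouping visible without ClearML or PyTorch: RecordingLogger is a hypothetical stand-in that only mimics the keyword arguments of Logger.report_scalar and stores the points by title and series, and the loss values are made up.

```python
from collections import defaultdict


class RecordingLogger:
    """Hypothetical stand-in for ClearML's Logger; it only records scalar points."""

    def __init__(self):
        # graphs[title][series] -> list of (iteration, value) points
        self.graphs = defaultdict(lambda: defaultdict(list))

    def report_scalar(self, title, series, value, iteration):
        self.graphs[title][series].append((iteration, value))


logger = RecordingLogger()
fake_losses = {1: [2.3, 1.1, 0.6], 2: [0.5, 0.4, 0.3]}  # made-up loss values per epoch
log_interval = 10

for epoch, losses in fake_losses.items():
    for i, loss in enumerate(losses):
        batch_idx = i * log_interval  # only every log_interval-th batch is reported
        logger.report_scalar(title='Scalar example {} - epoch'.format(epoch),
                             series='Loss', value=loss, iteration=batch_idx)

for title, series in logger.graphs.items():
    print(title, dict(series))
```

If you prefer a single continuous curve, one alternative is a fixed title with a global step, such as (epoch - 1) * len(train_loader) + batch_idx, as the iteration.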

Plot Other (Not Scalar) Data

The script's test function computes the test loss and the number of correct predictions for the trained model. Add code that logs these values as a histogram and a confusion matrix.

import numpy as np  # used to build the confusion-matrix array

def test(args, model, device, test_loader):
    save_test_loss = []
    save_correct = []
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # sum up batch loss
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
            # Keep the running totals after each batch
            save_test_loss.append(test_loss)
            save_correct.append(correct)
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    # Manually report the correct counts as a histogram
    logger.report_histogram(title='Histogram example', series='correct',
                            iteration=1, values=save_correct, xaxis='Test', yaxis='Correct')
    # Manually report test loss and correct as a confusion matrix
    matrix = np.array([save_test_loss, save_correct])
    logger.report_confusion_matrix(title='Confusion matrix example',
                                   series='Test loss / correct', matrix=matrix, iteration=1)
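In test, test_loss and correct are appended inside the batch loop after they are incremented, so save_test_loss and save_correct hold running totals, one entry per test batch. The histogram therefore plots cumulative counts, and the array passed to report_confusion_matrix has two rows (loss and correct) and one column per batch, rather than being a classification confusion matrix. The sketch below reproduces that bookkeeping in plain Python with made-up per-batch values standing in for what the model would produce.

```python
# Made-up per-batch values: summed NLL loss and number of correct predictions
batch_loss_sums = [12.0, 9.5, 10.5]
batch_correct = [950, 960, 955]

save_test_loss, save_correct = [], []
test_loss, correct = 0.0, 0
for loss_sum, n_correct in zip(batch_loss_sums, batch_correct):
    test_loss += loss_sum
    correct += n_correct
    # Like test(), append the running totals rather than the per-batch values
    save_test_loss.append(test_loss)
    save_correct.append(correct)

# Same layout as np.array([save_test_loss, save_correct]): 2 rows, one column per batch
matrix = [save_test_loss, save_correct]
print(save_test_loss)               # [12.0, 21.5, 32.0]
print(save_correct)                 # [950, 1910, 2865]
print(len(matrix), len(matrix[0]))  # 2 3
```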

Log Text

Extend ClearML by explicitly logging text, including errors, warnings, and debugging statements. Use the Logger.report_text method with its level argument to report a debugging message.

import logging  # provides the level constants, such as logging.DEBUG

logger.report_text('The default output destination for model snapshots and artifacts is: {}'.format(model_snapshots_path), level=logging.DEBUG)

Step 3: Registering Artifacts

Registering an artifact uploads it to ClearML Server and, if the artifact later changes, ClearML logs the change. Currently, ClearML supports Pandas DataFrames as registered artifacts.

Register the Artifact

In the tutorial script's test function, assign the test loss and correct data to a Pandas DataFrame, and register that DataFrame using the Task.register_artifact method.

import pandas as pd

# Create the Pandas DataFrame
test_loss_correct = {
    'test loss': save_test_loss,
    'correct': save_correct
}
df = pd.DataFrame(test_loss_correct, columns=['test loss', 'correct'])
# Register the test loss and correct as a Pandas DataFrame artifact
task.register_artifact('Test_Loss_Correct', df, metadata={'metadata string': 'apple',
    'metadata int': 100, 'metadata dict': {'dict string': 'pear', 'dict int': 200}})
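A registered artifact is one whose changes ClearML logs. A common way to detect that tabular content has changed is to hash a canonical encoding of it and compare digests. The sketch below shows that idea in plain Python; table_digest is hypothetical, the dict of lists stands in for the DataFrame, and this is not necessarily how ClearML implements change tracking.

```python
import hashlib
import json


def table_digest(columns: dict) -> str:
    """Hypothetical helper: SHA-256 of a canonical JSON encoding of the columns."""
    payload = json.dumps(columns, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Made-up stand-in for the 'test loss' / 'correct' DataFrame
table = {"test loss": [12.0, 21.5], "correct": [950, 1910]}
before = table_digest(table)

# Simulate the table changing, for example after another test batch
table["test loss"].append(32.0)
table["correct"].append(2865)
after = table_digest(table)

print(before != after)  # True: the content changed, so the digest changed
```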

Reference the Registered Artifact

Once an artifact is registered, it can be referenced and utilized in the Python experiment script.

In the tutorial script, use the Task.current_task and Task.get_registered_artifacts methods to retrieve the registered DataFrame and take a sample of it.

# Once the artifact is registered, we can get it and work with it. Here, we sample it.
sample = Task.current_task().get_registered_artifacts()['Test_Loss_Correct'].sample(
    frac=0.5, replace=True, random_state=1)
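The sample call draws half of the DataFrame's rows, with replacement, from a generator seeded with 1, so repeated runs return the same sample. The plain-Python sketch below mirrors those semantics with random.Random.choices on made-up rows; sample_rows is hypothetical, and because pandas uses its own random number generator, the rows it picks for random_state=1 will differ from these.

```python
import random


def sample_rows(rows, frac, seed):
    """Hypothetical helper: draw round(frac * len(rows)) rows with replacement."""
    k = round(frac * len(rows))
    rng = random.Random(seed)      # seeded, so the result is reproducible
    return rng.choices(rows, k=k)  # choices() samples with replacement


# Made-up (test loss, correct) rows
rows = list(zip([12.0, 21.5, 32.0, 41.0], [950, 1910, 2865, 3820]))
sample = sample_rows(rows, frac=0.5, seed=1)
print(len(sample))  # 2
```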

Step 4: Uploading Artifacts

Artifacts can be uploaded to ClearML Server, but changes to them are not logged.

Supported artifacts include:

  • Pandas DataFrames
  • Files of any type, including image files
  • Folders - stored as ZIP files
  • Images - stored as PNG files
  • Dictionaries - stored as JSON files
  • NumPy arrays - stored as NPZ files
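The list above can be read as a lookup from object type to storage format. expected_storage below is a hypothetical helper that encodes that lookup; it is not a ClearML function, and ClearML's actual handling of other object types may differ. It checks pandas, NumPy, and PIL types by class name (including base classes) so the sketch runs without those packages installed.

```python
from pathlib import Path


def _has_base(obj, class_name):
    """True if obj's class, or any of its base classes, has the given name."""
    return any(cls.__name__ == class_name for cls in type(obj).__mro__)


def expected_storage(obj) -> str:
    """Hypothetical helper: map an object to the storage format in the list above."""
    if _has_base(obj, "DataFrame"):   # pandas.DataFrame
        return "Pandas DataFrame"
    if _has_base(obj, "ndarray"):     # numpy.ndarray
        return "NPZ file"
    if _has_base(obj, "Image"):       # assumed to be a PIL image
        return "PNG file"
    if isinstance(obj, dict):
        return "JSON file"
    if isinstance(obj, (str, Path)):
        # An existing folder is zipped; any other path is treated as a file
        return "ZIP file" if Path(obj).is_dir() else "file"
    return "not covered by the list above"


print(expected_storage({"dict string": "orange", "dict int": 400}))  # JSON file
```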

In the tutorial script, we upload the loss data as an artifact using the Task.upload_artifact method with metadata specified in the metadata parameter.

# Upload test loss as an artifact. Here, the artifact is a NumPy array
task.upload_artifact('Predictions', artifact_object=np.array(save_test_loss),
                     metadata={'metadata string': 'banana', 'metadata integer': 300,
                               'metadata dictionary': {'dict string': 'orange', 'dict int': 400}})

Additional Information

After extending the Python experiment script, run it and view the results in the ClearML Web UI.

python pytorch_mnist_tutorial.py

To view the experiment results, do the following:

  1. In the ClearML Web UI, on the Projects page, click the examples project.
  2. In the experiments table, click the extending automagical ClearML example experiment.
  3. In the ARTIFACTS tab, DATA AUDIT section, click Test_Loss_Correct. The registered Pandas DataFrame appears, including the file path, size, hash, metadata, and a preview.
  4. In the OTHER section, click Predictions. The uploaded NumPy array appears, including its related information.
  5. Click the RESULTS tab.
  6. Click the CONSOLE sub-tab, and see the debugging message reporting the default output destination.
  7. Click the SCALARS sub-tab, and see the per-epoch scalar plots of the training loss.
  8. Click the PLOTS sub-tab, and see the confusion matrix and histogram.

Next Steps