Version 0.16

important

Trains is now ClearML.

Version 0.16.4

Trains

Features

  • Add Hydra support (GitHub trains Issue 219).
  • Add cifar ignite example (GitHub trains Issue 237).
  • Add auto extraction of tar.gz files when using StorageManager (GitHub trains Issue 237).
  • Add Task.init() argument auto_connect_streams controlling stdout / stderr / logging capture (GitHub trains Issue 181).
  • Add carriage return flush support using the sdk.development.worker.console_cr_flush_period configuration setting (GitHub trains Issue 181).
  • Add Task.create_function_task() to allow creating a new task, using a function and arguments, to be executed remotely (GitHub trains Issue 230).
  • Allow disabling SSL certificates verification using Task.setup_upload() argument verify or AWS S3 bucket configuration verify property (GitHub trains Issue 256).
  • Add StorageManager.get_files_server().
  • Add Task.get_project_id() using project name.
  • Add project_name argument to Task.set_project().
  • Add Task.connect() support for class / instance objects.
  • Add Task.get_configuration_object() and Task.set_configuration_object() for easier automation.
  • Improve Auto-Scaler - allow extra configurations, key name and security group are now optional, defaults using empty strings.
  • Use a built-in matplotlib converter.
  • Add reporting text as debug sample example.
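
As a rough illustration of the auto_connect_streams argument listed above, the sketch below only builds the keyword arguments that would be handed to Task.init(). The helper build_init_kwargs and the project and task names are hypothetical, and the per-stream dictionary form (stdout, stderr, logging keys) is an assumption about the accepted values that should be checked against the SDK reference.

```python
def build_init_kwargs(project_name, task_name, stdout=True, stderr=True, logging=True):
    """Build Task.init() keyword arguments (hypothetical helper, not part of trains)."""
    if stdout == stderr == logging:
        # All streams share one setting, so a plain boolean is enough.
        streams = bool(stdout)
    else:
        # Assumed per-stream form: one boolean per captured stream.
        streams = {"stdout": stdout, "stderr": stderr, "logging": logging}
    return {
        "project_name": project_name,
        "task_name": task_name,
        "auto_connect_streams": streams,
    }


kwargs = build_init_kwargs("examples", "quiet run", logging=False)
# With the SDK installed and configured, the arguments would be passed on as:
#   from trains import Task
#   task = Task.init(**kwargs)
```

Keeping the arguments in a plain dictionary makes the capture policy easy to inspect without contacting a server.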

Bug Fixes

  • Fix Optuna HPO parameter serializing (GitHub trains Issue 254).
  • Fix connect dictionary '' cast to None (GitHub trains Issue 258).
  • Fix lightgbm binding keyword argument issue (GitHub trains Issue 251).
  • Fix artifact preview if artifact body is remote URI (GitHub trains Issue 239).
  • Fix infinite recursion in StorageManager upload (GitHub trains Issue 253).
  • Fix keras reusing model object only if the filename is the same (GitHub trains Issue 252).
  • Fix running remotely with no configuration should not crash but output a warning (GitHub trains Issue 243).
  • Fix matplotlib 3.3.3 support:
    • Fix global figure enumeration.
    • Fix binding without a title reported a single plot (untitled 00) instead of increasing the counter.
  • Fix Python 2.7 / 3.5 support.
  • Fix quote issue when reporting debug images.
  • Fix replace quote safe characters in upload file to include ;=@$.
  • Fix at_exit called from another process should be ignored.
  • Fix Task.set_tags() for completed / published tasks.
  • Fix Task.add_tags() not working when running remotely.
  • Fix Task.set_user_properties() docstring and interface.
  • Fix preview with JSON (dict) artifacts did not store the artifact.
  • Fix Logger.report_text() on task created using Task.create() was not supported.
  • Fix initialization for torch: only call torch get_worker_info if torch was loaded.
  • Fix flush (wait) on auxiliary task (obtained using Task.get_task()) should wait on all upload events.
  • Fix server was not updated with the defaults from the code when running remotely and configuration section is missing.
  • Fix connect dict containing None default values blocking remote execution from passing a string instead of None.
  • Fix Task.upload_artifact() argument delete_after_upload=True used in conjunction with wait_for_upload=True was not supported.
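
The matplotlib fix for untitled figures can be pictured with a small counter. This is a minimal sketch of the intended behavior only, not the SDK's binding code; the class FigureTitles is hypothetical, and the zero-padded "untitled 00" format is taken from the note above.

```python
import itertools


class FigureTitles:
    """Assign report titles to figures; untitled figures get a running counter."""

    def __init__(self):
        self._untitled = itertools.count()

    def title_for(self, title=None):
        if title:
            return title
        # Each untitled figure consumes the next counter value, so it is
        # reported as its own plot instead of overwriting "untitled 00".
        return "untitled {:02d}".format(next(self._untitled))


titles = FigureTitles()
names = [titles.title_for(None), titles.title_for("loss"), titles.title_for("")]
```

Titled figures keep their own name and do not advance the counter.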

Version 0.16.3

Trains

Features

  • Add LightGBM support.
  • Add initial Hydra support (GitHub trains Issue 219).
  • Add synchronous support for Task.upload_artifact() (GitHub trains Issue 231).
  • Add sdk.development.store_code_diff_from_remote (default false) to store diff from remote HEAD instead of local HEAD (GitHub trains Issue 222).
  • Add sdk.development.detect_with_conda_freeze (default true) for full conda freeze (requires trains-agent >= 0.16.2).
  • Add user properties support in Task object.
  • Add Logger.report_table() support for table as list of lists.
  • Add support to split DAG and Table in pipeline DAG plot. Pipeline DAG single nodes are now circles below the DAG graph.
  • Add Pipeline / Optimization can be attached to any Task (not just the current task).
  • Add force_download flag to StorageManager.get_local_copy().
  • Add control over the artifact preview using Task.upload_artifact() preview argument.
  • Add Logger.report_matplotlib_figure() with examples.
  • Add Task.set_task_type().
  • Improve AWS auto-scaler:
    • Add key pair and security groups support.
    • Add multi-line support for both extra bash script and extra trains.conf data.
  • Update examples.
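
To make the list-of-lists form of Logger.report_table() concrete, here is a minimal sketch. The function report_results_table and the table values are hypothetical, and treating the first row as the header is an assumption; the logger argument is expected to be the Logger of a running task.

```python
def report_results_table(logger, iteration=0):
    """Report a small table given as a list of rows (first row assumed to be the header)."""
    table = [
        ["model", "accuracy", "loss"],
        ["baseline", 0.81, 0.52],
        ["tuned", 0.87, 0.41],
    ]
    logger.report_table(
        title="results", series="validation", iteration=iteration, table_plot=table
    )
    return table


# With the SDK installed and a task initialized, a call could look like:
#   from trains import Task
#   report_results_table(Task.current_task().get_logger())
```

Passing plain lists avoids building a pandas DataFrame just to report a few rows.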

Bug Fixes

  • Fix Task.update_output_model() wrong argument order (GitHub trains Issue 220).
  • Fix initializing task on argparse parse in remote mode. Do not call Task.init() to avoid auto connect, use Task.get_task() instead.
  • Fix detected task cwd outside of repository root folder.
  • Fix Task.connect(dict) to place non-existing entries on the section name instead of General.
  • Fix Task.clone() support for trains-server < 0.16.
  • Fix StorageManager cache extract zipped artifacts. Use modified time instead of access time for cached files.
  • Fix diff command output was stripped.
  • Make sure local packages with multi-files are marked as package.
  • Fix Task.set_base_docker() should be skipped when running remotely.
  • Fix ArgParser binding handling of string argument with boolean default value (affects Pytorch Lightning integration).
  • When using detect_with_pip_freeze make sure that package @ file:// lines are replaced with package==x.y.z as local file will probably not be available.
  • Fix git packages to new pip standard package @ git+.
  • Improve conda package naming _ and - support.
  • Do not add specific setuptools version to requirements (pip can't install it anyway).
  • Fix image URL quoting when uploading from a file path.
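
The detect_with_pip_freeze fix above replaces package @ file:// lines with pinned versions. The sketch below shows one way such a rewrite could work; it is not the SDK's code, and the function name, the regular expression, and the installed_versions mapping are all assumptions made for illustration.

```python
import re

# Matches pip freeze lines such as "mypkg @ file:///tmp/build/mypkg".
_LOCAL_FILE_LINE = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*@\s*file://\S*\s*$")


def pin_local_file_requirements(freeze_lines, installed_versions):
    """Replace "name @ file://..." lines with "name==version" when the version is known."""
    pinned = []
    for line in freeze_lines:
        match = _LOCAL_FILE_LINE.match(line)
        name = match.group(1) if match else None
        if name and name in installed_versions:
            pinned.append("{}=={}".format(name, installed_versions[name]))
        else:
            # Leave every other line, including git links, untouched.
            pinned.append(line)
    return pinned


lines = [
    "numpy==1.19.4",
    "mypkg @ file:///tmp/build/mypkg",
    "other @ git+https://example.com/other.git",
]
result = pin_local_file_requirements(lines, {"mypkg": "0.3.1"})
```

A pinned version can be resolved on another machine, while a local file path usually cannot.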

Version 0.16.2

Trains

Features

  • Add Task.set_resource_monitor_iteration_timeout() to set ResourceMonitor iteration wait duration timeout (GitHub trains Issue 208).
  • Add PyTorch Lightning save/restore model binding (GitHub trains Issue 212).
  • Add git diff for repository submodule (requires git 2.14 or above).
  • Add TrainsJob.is_completed() and TrainsJob.is_aborted().
  • Add Task.logger property.
  • Add Pipeline Controller automation and example.
  • Add improved trace filtering capabilities in trains.debugging.trace.trace_trains().
  • Add default help per argument (if not provided) in ArgParser binding.
  • Deprecate Task.reporter.
  • Update PyTorch example.
  • Remove warning on skipped auto-magic model logging (GitHub trains Issue 206).
  • Support Keras restructuring for Network, Model and Sequential.
  • Update autokeras requirements according to https://github.com/keras-team/autokeras#installation.

Bug Fixes

  • Fix joblib auto logging models failing on compressed streams (GitHub trains Issue 203).
  • Fix sending empty reports (GitHub trains Issue 205).
  • Fix scatter2d sub-sampling and rounding.
  • Fix plots reporting:
    • NaN representation (matplotlib conversion).
    • Limit the number of digits in a plot to reduce plot size (using sdk.metrics.plot_max_num_digits configuration value).
  • Fix Task.wait_for_status() to reload after it ends.
  • Fix thread wait Ctrl-C interrupt did not exit process.
  • Improve Windows support for installed packages analysis.
  • Fix auto model logging using relative path.
  • Fix Hyperparameter Optimization example.
  • Fix Task.clone() when working with TrainsServer < 0.16.0.
  • Fix pandas artifact handling:
    • Avoid adding unnamed:0 column.
    • Return original pandas object.
  • Fix TrainsJob hyper-params overriding order was not guaranteed.
  • Fix ArgParse auto-connect to support default function type.
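
The effect of limiting plot digits (sdk.metrics.plot_max_num_digits) can be sketched as a recursive rounding pass over the plot data. This is only an illustration: the function limit_digits is hypothetical, and reading "digits" as decimal places is an assumption rather than the SDK's documented behavior.

```python
import json


def limit_digits(value, max_digits):
    """Round every float in a nested plot structure to max_digits decimal places."""
    if isinstance(value, bool):
        return value
    if isinstance(value, float):
        return round(value, max_digits)
    if isinstance(value, dict):
        return {key: limit_digits(item, max_digits) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [limit_digits(item, max_digits) for item in value]
    return value


plot = {
    "data": [{"x": [0, 1, 2], "y": [0.123456789, 0.5, 2.0000001]}],
    "layout": {"title": "loss"},
}
compact = limit_digits(plot, 3)
saved_chars = len(json.dumps(plot)) - len(json.dumps(compact))
```

Shorter float literals shrink the serialized plot, which is the storage saving the setting targets.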

Trains Agent

Features

  • conda:
    • Add agent.package_manager.conda_env_as_base_docker allowing "docker_cmd" to contain link to a full pre-packaged conda environment (tar.gz created by conda-pack). Use TRAINS_CONDA_ENV_PACKAGE environment variable to specify conda tar.gz file.
    • Add conda support for read-only pre-built environment (pass conda folder as docker_cmd on Task).
    • Improve trying to find conda executable.
  • k8s glue:
    • Add support for limited number of services exposing ports.
    • Add support for k8s pod custom user properties.
    • Allow selecting external trains.conf file for the pod itself.
    • Allow providing pod template, extra bash init script, alternate SSH server port, gateway address (k8s ingress / ELB).
  • Allow specifying cudatoolkit version in the "installed packages" section when using conda as package manager (GitHub trains Issue 229).
  • Add agent.package_manager.force_repo_requirements_txt. If True, "Installed Packages" on Task are ignored, and only repository requirements.txt is used.
  • Pass TRAINS_DOCKER_IMAGE into docker for interactive sessions.
  • Add torchcsprng and torchtext to PyTorch resolving.

Bug Fixes

  • When logging suppress "\r" when reading a current chunk of a file / stream. Add agent.suppress_carriage_return (default True) to support previous behavior.
  • Make sure TRAINS_AGENT_K8S_HOST_MOUNT is used only once per mount.
  • Fix k8s glue script to trains-agent default docker script.
  • Fix apply git diff from submodule only.
  • conda:
    • Fix conda pip freeze to be consistent with trains 0.16.3.
    • Fix conda environment support for trains 0.16.3 full env. Add agent.package_manager.conda_full_env_update to allow conda to update back the requirements (default False, to preserve previous behavior).
    • Fix running from conda environment - conda.sh not found in first conda PATH match.
  • Fix docker mode ubuntu / debian support by making sure not to ask for input (fix tzdata install).
  • Fix repository detection - ignore environment SSH_AUTH_SOCK, only check if git user/pass are configured.
  • git diff:
    • Fix support for non-ascii diff.
    • Fix diff with an empty line at the end causing a corrupt diff apply message.
    • Allow zero context diffs (useful when blind patching repository).
  • Fix daemon --stop when agent UID cannot be located.
  • Fix nvidia docker support on some linux distros (SUSE).
  • Fix nvidia pytorch dockers support.
  • Fix torch CUDA 11.1 support.
  • Fix a requirements dict with a null pip entry: it is now treated as None, so packages are installed from the repository's requirements.txt.
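
Carriage-return suppression in captured logs (agent.suppress_carriage_return) can be illustrated as follows. This is a sketch of the general idea rather than the agent's implementation; the function name and the sample progress output are hypothetical.

```python
def suppress_carriage_return(chunk):
    """Keep only the text after the last carriage return on each line."""
    visible = []
    for line in chunk.split("\n"):
        # Drop a trailing "\r" (Windows line ending) before looking for overwrites.
        if line.endswith("\r"):
            line = line[:-1]
        visible.append(line.rsplit("\r", 1)[-1])
    return "\n".join(visible)


chunk = "epoch 1: 10%\repoch 1: 50%\repoch 1: 100%\ndone\r\n"
cleaned = suppress_carriage_return(chunk)
```

Progress bars that redraw one line then contribute a single log line per chunk instead of one line per update.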

Version 0.16.1

Trains

Features

  • Enhance HyperParameter optimizer.

Bug Fixes

  • Fix typing dependency for Python<3.5 (GitHub trains Issue 184).
  • Fix git+https requirements handling, resolve top_level.txt package name (kerastuner from git was not detected).
  • Fix Task.get_reported_console_output() for new Trains Server API v2.9.
  • Fix cache handling for different partitions / drives / devices.
  • Disable offline mode when running remotely (i.e. executed by Trains Agent).
  • Fix artifact upload to only use file stream when not uploading a locally stored file (multipart upload is not supported on stream upload) (GitHub trains Issue 189).
  • Fix double-escaped model design text when connecting OutputModel.

Trains Server

important

Upgrading to this version requires a manual data migration.

Bug Fixes

  • Fix model page issue causing N/A to show after switching tabs (Trains Slack channel thread).
  • Raise the experiment comparison limit from 10 to 100. The limit is configurable using services.tasks.multi_task_histogram_limit (Trains Slack channel thread).
  • Fix scalar plots sometimes not calculated by the server in lower iteration values (Trains Slack channel thread).
  • Fix error while retrieving experiment log when only a few lines were reported (GitHub trains-server Issue 59).
  • Update Fixed User full-name on restart (Trains Slack channel thread).
  • Fix project ordering issue.
  • When loading plots, display a spinner and don't show "no data".
  • Improve logging to provide more coherent ElasticSearch connection status in server log.

Trains Agent

Features

  • Add sdk.metrics.plot_max_num_digits configuration option to reduce plot storage size.
  • Add agent.package_manager.post_packages and agent.package_manager.post_optional_packages configuration options to control packages install order (e.g. horovod).
  • Add agent.git_host configuration option for limiting git credential usage for a specific host (overridable using TRAINS_AGENT_GIT_HOST environment variable).
  • Add agent.force_git_ssh_port configuration option to control HTTPS to SSH link conversion for non-standard SSH ports.
  • Add requirements detection features. Improve support for detecting new pip version (20+) supporting package @ scheme://link.
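
The install-order control offered by agent.package_manager.post_packages can be sketched as a stable reordering of the requirement list. The code below is an assumption-laden illustration, not the agent's logic: the function order_requirements is hypothetical, and any special handling of optional packages (for example tolerating install failures) is not modelled.

```python
def order_requirements(requirements, post_packages=(), post_optional_packages=()):
    """Move the listed packages to the end of the install order, keeping the rest stable."""

    def name_of(requirement):
        # Strip version specifiers such as "==1.2" or ">=0.19" to get the bare name.
        for separator in ("==", ">=", "<=", "~=", "!=", ">", "<"):
            requirement = requirement.split(separator, 1)[0]
        return requirement.strip().lower()

    last = [name.lower() for name in list(post_packages) + list(post_optional_packages)]
    first = [req for req in requirements if name_of(req) not in last]
    deferred = sorted(
        (req for req in requirements if name_of(req) in last),
        key=lambda req: last.index(name_of(req)),
    )
    return first + deferred


ordered = order_requirements(
    ["horovod==0.20.3", "torch==1.7.0", "numpy>=1.19"],
    post_packages=["horovod"],
)
```

Deferring a package such as horovod lets it build against frameworks that are already installed.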

Bug Fixes

  • Fix pre-installed packages are ignored when installing a git package wheel. Reinstalling a git+http link is enough to make sure all requirements are met / installed (GitHub Issue #196).
  • Fix incorrect check for spaces in current execution folder.
  • Fix requirements detection:
    • Update torch version after using downloaded / system pre-installed version.
    • Do not install git packages twice when a new pip version is used (pip freeze will detect the correct git link version).

Version 0.16.0

Trains

Features

  • Add continuing of previously executed experiments. Add Task.init() argument continue_last_task to continue a previously used Task (GitHub Issue #160).
  • Allow Task editing / creation from code. Task.export_task/import_task/update_task() (GitHub Issue #128).
  • Add offline mode. Use Task.set_offline() and Task.import_offline_session():
    • Support setting offline mode via TRAINS_OFFLINE_MODE=1 environment variable.
    • Support setting offline API version via TRAINS_OFFLINE_MODE=2.9 environment variable.
  • Automatically pickle all objects uploaded as artifacts, task.upload_artifact() argument auto_pickle=True (GitHub Issue #153).
  • Add multiple sections / groups support for Task hyperparameters, using Task.connect().
  • Add multiple configurations (files) using Task.connect_configuration().
  • Allow enabling OS environment logging using the sdk.development.log_os_environments configuration parameter (complements the TRAINS_LOG_ENVIRONMENT environment variable).
  • Add Optuna support for hyperparameter optimization controller. OptimizerOptuna is now the default optimizer.
  • Add initial Keras-Tuner support (GitHub Issue keras-team/keras-tuner #334).
  • Add automatic FastAI logging. It is disabled if Tensorboard is loaded (assuming TensorBoardLogger will be used).
  • Support Tensorboard text logging (add_text()) as debug samples (.txt files), instead of as console output.
  • Allow for more standard confusion matrix reporting. Logger.report_confusion_matrix() argument yaxis_reversed (flips the confusion matrix if True, default False) (GitHub Issue #165).
  • Add support for Trains Server 0.16.0 (API v2.9 support).
  • Allow disabling Trains update message from the log using the TRAINS_SUPPRESS_UPDATE_MESSAGE environment variable (GitHub Issue #157).
  • Add AWS EC2 Auto-Scaler service wizard and Service.
  • Improved and updated examples:
    • Add Keras Tuner CIFAR10 example.
    • Add FastAI example.
    • Update PyTorch Jupyter notebook examples (GitHub Issue #150).
  • Support global requirements detection using pip freeze (set sdk.development.detect_with_pip_freeze configuration in trains.conf).
  • Add Task.get_projects() to get all projects in the system, sorted by last update time.

Bug Fixes

  • Fix UTC to time stamp in comment (GitHub Issue #152).
  • Fix and enhance GPU monitoring.
  • Fix filename too long bug (GitHub trains-server Issue #49).
  • Fix TensorFlow image logging to allow images with no width / height / color metadata (GitHub Issue #182).
  • Fix execution hanging when an exception is thrown inside a multiprocessing Pool. Call original signal handler and re-flush stdout.
  • Fix plotly support for matplotlib 3.3.
  • Add Python 2.7 support for get_current_thread_id().
  • Update examples requirements.
  • Fix and improve signal handling.
  • Fix Tensorboard 2D convolution histogram, improve histogram accuracy on very small histograms.
  • Fix auto logging multiple argparse calls before Task.init().
  • Limit experiment Git diff logging to 500Kb. If larger than 500Kb, the diff section will contain a warning and the entire diff will be uploaded as an artifact named auxiliary_git_diff.
  • Fix requirements detection:
    • Fix Trains installed from git+.
    • Fix when Trains is not directly imported.
    • Fix multiple -e packages were not detected (only the first one).
    • Fix running with Trains in PYTHONPATH resulted in double entry of trains.
  • Fix Task.set_base_docker() on main task to do nothing when running remotely.
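
The Git diff size limit described above amounts to a simple size check. The sketch below illustrates that decision only; the function split_diff, the warning text, and the reading of "500Kb" as 500 * 1024 bytes are assumptions, not the SDK's actual values.

```python
MAX_DIFF_BYTES = 500 * 1024  # assumed reading of the "500Kb" limit


def split_diff(diff_text, limit=MAX_DIFF_BYTES):
    """Return (text stored in the diff section, payload to upload as an artifact or None)."""
    size = len(diff_text.encode("utf-8"))
    if size <= limit:
        return diff_text, None
    # Too large: keep a short warning in place and hand the full diff to the caller.
    warning = "# diff too large ({} bytes), stored as an artifact instead".format(size)
    return warning, diff_text


small = split_diff("diff --git a/x b/x\n")
large = split_diff("+" * (MAX_DIFF_BYTES + 1))
```

Measuring the encoded size rather than the character count keeps the check accurate for non-ASCII diffs.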

Trains Server

important

Upgrading to this version requires a manual data migration.

Features

  • Add experiment hyperparameter grouping:
    • HYPER PARAMETERS tab renamed to CONFIGURATIONS.
    • CONFIGURATIONS tab contains the sections USER PROPERTIES, HYPER PARAMETERS, CONFIGURATION OBJECTS.
    • Add user properties group. Key-value pairs always editable (USER PROPERTIES section).
    • Add command line options group: argparse and older experiment parameters (CONFIGURATIONS / HYPER PARAMETERS / Args).
    • Add TensorFlow definitions group (CONFIGURATIONS / HYPER PARAMETERS / TF_DEFINE).
    • Add environment variables group (CONFIGURATIONS / HYPER PARAMETERS / Environment).
  • Improve experiment model configuration:
    • Model design is in the ARTIFACTS tab.
    • Experiment model description is in the CONFIGURATION OBJECTS section in the CONFIGURATIONS tab.
  • Improve experiment comparison:
    • In hyperparameter parallel coordinate comparison, hover over an experiment name to highlight it on plot (GitHub Issue #53).
    • Remove fields providing no additional information from comparison.
  • Improve the model framework filter. Filter contains only frameworks used by models in the project.
  • Add configurable Trains services examples.
  • Add support for text debug samples in the DEBUG SAMPLES section in the RESULTS tab.
  • Add legend on / off toggle control for every plot.
  • Add clear button for text areas (GitHub trains-server Issue #42).
  • Reinstate the bottom bar Archive button.
  • Add Trains community links to left bar.
  • Add Hi-DPI display support.
  • Add debug.ping endpoint for simple health monitoring.
  • Add support for field exclusion in *.get_all endpoints.
  • Move to ElasticSearch 7. Requires manual data migration.

Bug Fixes

  • Auto-fit column width on column resize double click.
  • Allow top-bar search if fewer than three characters are entered, and Enter is pressed.

Trains Agent

Features

  • Add agent.docker_init_bash_script configuration section to allow finer control over Docker startup script.
  • Changed default Docker image from nvidia/cuda to nvidia/cuda:10.1-runtime-ubuntu18.04 to support cudnn frameworks (e.g. TF).
  • Improve support for Dockers with preinstalled conda environment.
  • Improve trains-agent-docker spinning.
  • Add daemon --order-fairness for round-robin queue pulling.
  • Add daemon --stop to terminate a running agent (assuming other arguments are the same). If no additional arguments, Agents are terminated in lexicographical order.
  • Support cleanup of all log files on termination unless executed with --debug.
  • Add error message when Trains API Server is not accessible on startup.
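
The difference that daemon --order-fairness makes can be pictured with a toy model of queue pulling. This sketch assumes that, without the flag, earlier queues are drained before later ones; the function pull_order and the static queue contents are hypothetical, and a real agent polls live queues rather than fixed lists.

```python
from collections import deque


def pull_order(queues, fairness=False):
    """Yield (queue_name, task) pairs in the order they would be pulled."""
    pending = {name: deque(tasks) for name, tasks in queues}
    names = [name for name, _ in queues]
    if not fairness:
        # Priority order: drain earlier queues completely before later ones.
        for name in names:
            while pending[name]:
                yield name, pending[name].popleft()
        return
    # Round-robin: take one task from each non-empty queue in turn.
    while any(pending.values()):
        for name in names:
            if pending[name]:
                yield name, pending[name].popleft()


queues = [("high", ["h1", "h2"]), ("low", ["l1", "l2", "l3"])]
priority = [task for _, task in pull_order(queues)]
fair = [task for _, task in pull_order(queues, fairness=True)]
```

Round-robin pulling prevents a busy first queue from starving the queues listed after it.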

Bug Fixes

  • Fix GPU Windows monitoring support (GitHub Issue #177).
  • Fix .git-credentials and .gitconfig mapping into docker.
  • Fix non-root docker image usage.
  • Fix docker to use UTF-8 encoding, so prints won't break it.
  • Fix --debug to set all loggers to DEBUG.
  • Fix task status change to queued should never happen during Task runtime.
  • Fix requirement_parser to support package @ git+http lines.
  • Fix GIT user/password in requirements and support for -e git+http lines.
  • Fix configuration wizard to generate trains.conf matching latest Trains definitions.