
StorageManager

class StorageManager()#

StorageManager is a helper interface for downloading and uploading files to supported remote storage. Supported remote servers: http(s)/S3/GS/Azure/File-System-Folder. Caching is enabled by default for all downloaded remote URLs/files.


StorageManager.download_folder#

classmethod download_folder(remote_url, local_folder=None, match_wildcard=None, overwrite=False)

Download remote folder recursively to the local machine, maintaining the sub folder structure from the remote storage.

info

If we have a remote file s3://bucket/sub/file.ext then StorageManager.download_folder('s3://bucket/', '~/folder/') will create ~/folder/sub/file.ext

  • Parameters

    • remote_url (str ) – Source remote storage location. The tree structure of remote_url will be created under the target local_folder. Supports S3/GS/Azure and shared filesystem. Example: 's3://bucket/data/'

    • local_folder (str ) – Local target folder to create the full tree from remote_url. If None, use the cache folder. (Default: use cache folder)

    • match_wildcard (Optional[str]) – If specified, only download files matching the match_wildcard. Example: *.json

    • overwrite (bool ) – If False and the target files exist, do not download. If True, always download the remote files. Default False.

  • Return type

    Optional[str]

  • Returns

    Target local folder
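
The mapping from remote objects to local paths described above can be sketched with the standard library alone. This is an illustration of the documented behaviour, not the library's implementation: plan_local_paths is a hypothetical helper, and matching the wildcard against the path relative to remote_url is an assumption.

```python
import fnmatch
import posixpath


def plan_local_paths(remote_url, object_urls, local_folder, match_wildcard=None):
    """Map remote object URLs to local target paths (hypothetical helper)."""
    base = remote_url.rstrip("/") + "/"
    plan = {}
    for url in object_urls:
        if not url.startswith(base):
            continue  # not under the requested remote folder
        relative = url[len(base):]
        # Assumption: the wildcard is matched against the relative path.
        if match_wildcard and not fnmatch.fnmatchcase(relative, match_wildcard):
            continue
        # The sub folder structure of the remote storage is kept locally.
        plan[url] = posixpath.join(local_folder, relative)
    return plan


remote_objects = ["s3://bucket/sub/file.ext", "s3://bucket/sub/meta.json"]
plan = plan_local_paths("s3://bucket/", remote_objects, "/home/user/folder")
print(plan["s3://bucket/sub/file.ext"])  # /home/user/folder/sub/file.ext
```

With match_wildcard="*.json" the same call would keep only sub/meta.json in the plan.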


StorageManager.get_local_copy#

classmethod get_local_copy(remote_url, cache_context=None, extract_archive=True, name=None, force_download=False)

Get a local copy of the remote file. If the remote URL is a direct file access, the returned link is the same; otherwise a link to a local copy of the URL file is returned. Caching is enabled by default, and the cache is limited by the number of stored files per cache context. The oldest accessed files are deleted when the cache is full.

  • Parameters

    • remote_url (str ) – remote url link (string)

    • cache_context (str ) – Optional caching context identifier (string), default context 'global'

    • extract_archive (bool ) – If True, the returned path will be a cached folder containing the archive's content. Currently only zip files are supported.

    • name (str ) – name of the target file

    • force_download (bool ) – download file from remote even if exists in local cache

  • Return type

    str

  • Returns

    Full path to local copy of the requested url. Return None on Error.
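
Because None is returned on error, callers may want to check the result before using it. The following is a minimal sketch under stated assumptions: fetch_local_copy and its storage parameter are hypothetical names introduced here, and the clearml import path is assumed; only the get_local_copy arguments come from the signature above.

```python
def fetch_local_copy(remote_url, cache_context=None, refresh=False, storage=None):
    """Return a local path for remote_url, raising if the copy failed."""
    if storage is None:
        # Imported lazily so the sketch can also run with a stand-in object.
        from clearml import StorageManager as storage
    local_path = storage.get_local_copy(
        remote_url,
        cache_context=cache_context,
        extract_archive=True,
        force_download=refresh,  # True bypasses the local cache
    )
    if local_path is None:
        # The reference states that None is returned on error.
        raise RuntimeError("could not fetch {}".format(remote_url))
    return local_path
```

Passing refresh=True corresponds to force_download=True, which downloads the file even if it already exists in the local cache.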


StorageManager.list#

classmethod list(remote_url, return_full_path=False)

Return a list of object names inside the base path

  • Parameters

    • remote_url (str ) – The base path. For Google Storage, Azure and S3 it is the bucket of the path; for local files it is the root directory. For example, AWS S3: s3://bucket/folder will list all the files you have in s3://bucket/folder*/*. The same behaviour applies to Google Storage: gs://bucket/folder, Azure blob storage: azure://bucket/folder, and file system listing: /mnt/share/folder

    • return_full_path (bool ) – If True, return a list of full object paths, otherwise return a list of relative object paths (default False).

  • Return type

    Optional[List[str]]

  • Returns

    The paths of all the objects in the storage base path under prefix, relative to the base path. None if the list operation is not supported (for example, the http and https protocols).
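
Since the return value can be None, a thin wrapper can make the result safe to iterate. This is a sketch, not part of the library: list_objects, its suffix filter and its storage parameter are hypothetical, and treating an unsupported listing as an empty result is a design choice made here for illustration.

```python
def list_objects(remote_url, suffix="", storage=None):
    """List full object paths under remote_url, optionally filtered by suffix."""
    if storage is None:
        # Imported lazily so the sketch can also run with a stand-in object.
        from clearml import StorageManager as storage
    names = storage.list(remote_url, return_full_path=True)
    if names is None:
        # Listing is not supported for this URL (http/https, for example).
        return []
    return sorted(name for name in names if name.endswith(suffix))
```

Callers that need to tell "empty folder" apart from "listing not supported" should check for None themselves instead of using a wrapper like this one.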


StorageManager.set_cache_file_limit#

classmethod set_cache_file_limit(cache_file_limit, cache_context=None)

Set the cache context file limit. File limit is the maximum number of files the specific cache context holds. Notice, there is no limit on the size of these files, only the total number of cached files.

  • Parameters

    • cache_file_limit (int ) – New maximum number of cached files

    • cache_context (str ) – Optional cache context identifier, default global context

  • Return type

    int

  • Returns

    The new cache context file limit.
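
To make the "limit by file count, evict the oldest accessed" policy concrete, here is a toy model built on the standard library. It is only an illustration of the policy as described in this reference; FileCountCache is a hypothetical class and does not reflect how the library stores its cache.

```python
from collections import OrderedDict


class FileCountCache:
    """Toy cache index limited by number of entries, not by their size."""

    def __init__(self, file_limit):
        self.file_limit = file_limit
        self._entries = OrderedDict()  # remote_url -> local_path, oldest access first

    def set_file_limit(self, file_limit):
        """Change the limit, evict if needed, and return the new limit."""
        self.file_limit = file_limit
        self._evict()
        return self.file_limit

    def get(self, remote_url):
        if remote_url not in self._entries:
            return None
        self._entries.move_to_end(remote_url)  # mark as most recently accessed
        return self._entries[remote_url]

    def put(self, remote_url, local_path):
        self._entries[remote_url] = local_path
        self._entries.move_to_end(remote_url)
        self._evict()

    def _evict(self):
        while len(self._entries) > self.file_limit:
            self._entries.popitem(last=False)  # drop the oldest accessed entry


cache = FileCountCache(file_limit=2)
cache.put("s3://bucket/a.bin", "/cache/a.bin")
cache.put("s3://bucket/b.bin", "/cache/b.bin")
cache.get("s3://bucket/a.bin")                  # a is now the most recently accessed
cache.put("s3://bucket/c.bin", "/cache/c.bin")  # over the limit, so b is evicted
print(cache.get("s3://bucket/b.bin"))  # None
```

Note that the limit counts entries only, so a few very large files and many tiny files are treated the same way.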


StorageManager.upload_file#

classmethod upload_file(local_file, remote_url, wait_for_upload=True, retries=1)

Upload a local file to a remote location. remote_url is the final destination of the uploaded file.

Examples:

upload_file('/tmp/artifact.yaml', 'http://localhost:8081/manual_artifacts/my_artifact.yaml')
upload_file('/tmp/artifact.yaml', 's3://a_bucket/artifacts/my_artifact.yaml')
upload_file('/tmp/artifact.yaml', '/mnt/share/folder/artifacts/my_artifact.yaml')
  • Parameters

    • local_file (str ) – Full path of a local file to be uploaded

    • remote_url (str ) – Full path or remote url to upload to (including file name)

    • wait_for_upload (bool ) – If False, return immediately and upload in the background. Default True.

    • retries (int ) – Number of retries before failing to upload file, default 1.

  • Return type

    str

  • Returns

    Newly uploaded remote URL.
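
Since remote_url must include the file name, a small helper can derive it from a destination folder. This is a sketch under stated assumptions: remote_file_url, upload_to_folder and the storage parameter are hypothetical names, and the clearml import path is assumed; only the upload_file arguments come from the signature above.

```python
import posixpath


def remote_file_url(local_file, remote_folder_url):
    """Build a destination URL that ends with the local file's name."""
    file_name = posixpath.basename(local_file.replace("\\", "/"))
    return remote_folder_url.rstrip("/") + "/" + file_name


def upload_to_folder(local_file, remote_folder_url, storage=None):
    """Upload local_file under remote_folder_url, keeping its file name."""
    if storage is None:
        # Imported lazily so the sketch can also run with a stand-in object.
        from clearml import StorageManager as storage
    destination = remote_file_url(local_file, remote_folder_url)
    # wait_for_upload=True blocks until the upload has finished.
    return storage.upload_file(local_file, destination, wait_for_upload=True)


print(remote_file_url("/tmp/artifact.yaml", "s3://a_bucket/artifacts/"))
# s3://a_bucket/artifacts/artifact.yaml
```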


StorageManager.upload_folder#

classmethod upload_folder(local_folder, remote_url, match_wildcard=None)

Upload local folder recursively to a remote storage, maintaining the sub folder structure in the remote storage.

info

If we have a local file ~/folder/sub/file.ext then StorageManager.upload_folder('~/folder/', 's3://bucket/') will create s3://bucket/sub/file.ext

  • Parameters

    • local_folder (str ) – Local folder to recursively upload

    • remote_url (str ) – Target remote storage location. The tree structure of local_folder will be created under the target remote_url. Supports Http/S3/GS/Azure and shared filesystem. Example: 's3://bucket/data/'

    • match_wildcard (str ) – If specified, only upload files matching the match_wildcard. Example: *.json. Notice: target file size/date are not checked, so files are always uploaded. When uploading to http, the target is always overwritten.

  • Return type

    None
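
The local-to-remote mapping described above can be previewed before uploading anything. The sketch below walks a temporary folder and computes the target URLs; plan_remote_urls is a hypothetical helper, and matching the wildcard against the path relative to local_folder is an assumption rather than documented behaviour.

```python
import fnmatch
import os
import tempfile


def plan_remote_urls(local_folder, remote_url, match_wildcard=None):
    """Map local files to the remote URLs they would be uploaded to."""
    base = remote_url.rstrip("/")
    plan = {}
    for root, _dirs, files in os.walk(local_folder):
        for file_name in files:
            local_path = os.path.join(root, file_name)
            relative = os.path.relpath(local_path, local_folder).replace(os.sep, "/")
            # Assumption: the wildcard is matched against the relative path.
            if match_wildcard and not fnmatch.fnmatchcase(relative, match_wildcard):
                continue
            # The sub folder structure is kept under the remote URL.
            plan[local_path] = base + "/" + relative
    return plan


with tempfile.TemporaryDirectory() as folder:
    os.makedirs(os.path.join(folder, "sub"))
    for name in ("file.ext", "meta.json"):
        with open(os.path.join(folder, "sub", name), "w") as handle:
            handle.write("demo")
    demo_plan = plan_remote_urls(folder, "s3://bucket/")
    json_only = plan_remote_urls(folder, "s3://bucket/", match_wildcard="*.json")

print(sorted(demo_plan.values()))
# ['s3://bucket/sub/file.ext', 's3://bucket/sub/meta.json']
```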