layout | title | permalink | redirect_from
---|---|---|---
post | S3CMD | /docs/s3cmd |
While the preferred and recommended management client for AIStore is its own CLI, Amazon's s3cmd
client can also be used, with certain minor limitations.
But first:
AIStore is a multi-cloud, multi-backend solution: an AIS cluster can simultaneously access `ais://`, `s3://`, `gs://`, etc. buckets.
For background on supported Cloud and non-Cloud backends, please see Backend Providers.
However:
When we use 3rd party clients, such as `s3cmd` and `aws`, we must impose a certain limitation: buckets in question must be unambiguously resolvable by name.
The following shows the (native) `ais` and (Amazon's) `s3cmd` CLIs, which in many cases can be used interchangeably. There is a single bucket named `abc`, and we access it using the two aforementioned clients.
But again, if we want to use `s3cmd` (or `aws`, etc.), there must be a single `abc` bucket across all providers.
Notice that with `s3cmd` we must always use the `s3://` prefix.
$ ais ls ais:
$ ais create ais://abc
"ais://abc" created (see https://github.com/NVIDIA/aistore/blob/main/docs/bucket.md#default-bucket-properties)
$ ais bucket props set ais://abc checksum.type=md5
Bucket props successfully updated
"checksum.type" set to: "md5" (was: "xxhash")
$ s3cmd put README.md s3://abc
upload: 'README.md' -> 's3://abc/README.md' [1 of 1]
10689 of 10689 100% in 0s 3.13 MB/s done
upload: 'README.md' -> 's3://abc/README.md' [1 of 1]
10689 of 10689 100% in 0s 4.20 MB/s done
$ s3cmd rm s3://abc/README.md
delete: 's3://abc/README.md'
Similarly:
$ ais ls s3:
aws://my-s3-bucket
...
$ s3cmd put README.md s3://my-s3-bucket
upload: 'README.md' -> 's3://my-s3-bucket/README.md' [1 of 1]
10689 of 10689 100% in 0s 3.13 MB/s done
upload: 'README.md' -> 's3://my-s3-bucket/README.md' [1 of 1]
10689 of 10689 100% in 0s 4.20 MB/s done
$ s3cmd rm s3://my-s3-bucket/README.md
delete: 's3://my-s3-bucket/README.md'
When using `s3cmd` for the very first time, or if your AWS access credentials have changed, or if you want to change certain `s3cmd` defaults (also shown below), run `s3cmd --configure` in each of those cases.
NOTE: it is important to have the `s3cmd` client properly configured.
For example:
# s3cmd --configure
Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.
Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key [ABCDABCDABCDABCDABCD]: EFGHEFGHEFGHEFGHEFGH
Secret Key [abcdabcdABCDabcd/abcde/abcdABCDabc/ABCDe]: efghEFGHefghEFGHe/ghEFGHe/ghEFghef/hEFGH
Default Region [us-east-2]:
Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]:
Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]:
Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password:
Path to GPG program [/usr/bin/gpg]:
When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]:
On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name:
New settings:
Access Key: EFGHEFGHEFGHEFGHEFGH
Secret Key: efghEFGHefghEFGHe/ghEFGHe/ghEFghef/hEFGH
Default Region: us-east-2
S3 Endpoint: s3.amazonaws.com
DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.s3.amazonaws.com
Encryption password:
Path to GPG program: /usr/bin/gpg
Use HTTPS protocol: True
HTTP Proxy server name:
HTTP Proxy server port: 0
Test access with supplied credentials? [Y/n] n
Save settings? [y/N] y
Configuration saved to '/home/.s3cfg'
It is also a good idea to note the version of `s3cmd` you have, e.g.:
$ s3cmd --version
s3cmd version 2.0.1
In this section we walk through the most basic (and simplified) steps to get `s3cmd` to conveniently work with AIStore.
With `s3cmd` client configuration safely stored in `$HOME/.s3cfg`, the next immediate step is to figure out the AIS endpoint.
AIS cluster must be running, of course.
The endpoint consists of a gateway's hostname and its port followed by the `/s3` suffix.
AIS clusters usually run multiple gateways all of which are equivalent in terms of supporting all operations and providing access (to their respective clusters).
For example: given an AIS gateway at `10.10.0.1:51080` (where `51080` would be the gateway's listening port), the AIS endpoint then would be `10.10.0.1:51080/s3`.
NOTE the `/s3` suffix. It is important to have it in all subsequent `s3cmd` requests to AIS, and the surest way to achieve that is to have it in the endpoint.
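The endpoint rule above can be captured in a small helper. This is a minimal sketch only: the function name `ais_s3_endpoint` is hypothetical and is not part of `ais` or `s3cmd`.

```shell
# Hypothetical helper (not part of ais or s3cmd): build the S3 endpoint
# from a gateway's hostname and port by appending the /s3 suffix.
ais_s3_endpoint() {
    host=$1
    port=$2
    printf '%s:%s/s3\n' "$host" "$port"
}

# Example with the gateway address used in this document:
ais_s3_endpoint 10.10.0.1 51080
```

Keeping the suffix in one place makes it harder to forget in later commands.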
But then the question is how to pass the AIS endpoint to `s3cmd` commands. There are essentially two ways:
1. `s3cmd` command line
2. `s3cmd` configuration
For command line (related) examples, see, for instance, this multipart upload test. In particular, the following settings:
s3endpoint="localhost:8080/s3"
host="--host=$s3endpoint"
host_bucket="--host-bucket=$s3endpoint/%(bucket)"
Separately, note that by default aistore handles S3 API at its `AIS_ENDPOINT/s3` endpoint (e.g., `localhost:8080/s3`). However, any aistore cluster is configurable to accept S3 API calls at its root as well. That is, without the "/s3" suffix shown above.
Back to running `s3cmd` though - the second, and arguably the easiest, way is exemplified by the `diff` below:
# diff -uN .s3cfg.orig $HOME/.s3cfg
--- .s3cfg.orig 2022-07-18 09:42:36.502271267 -0400
+++ .s3cfg 2022-07-18 10:14:50.878813029 -0400
@@ -29,8 +29,8 @@
gpg_encrypt = %(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_passphrase =
guess_mime_type = True
-host_base = s3.amazonaws.com
-host_bucket = %(bucket)s.s3.amazonaws.com
+host_base = 10.10.0.1:51080/s3
+host_bucket = 10.10.0.1:51080/s3
human_readable_sizes = False
invalidate_default_index_on_cf = False
invalidate_default_index_root_on_cf = True
Here we hack the `s3cmd` configuration: replace Amazon's default `s3.amazonaws.com` endpoint with the correct one, and we are done.
From this point on, `s3cmd` will be calling AIStore at 10.10.0.1:51080, with the `/s3` suffix causing the latter to execute special handling designed (specifically) to support S3 compatibility.
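The same edit can be scripted rather than made by hand. The sketch below is an illustration under stated assumptions: the function name `set_s3cfg_endpoint` is hypothetical, and it assumes the endpoint string contains no `|`, `&`, or backslash characters (which would need escaping in `sed`).

```shell
# Hypothetical helper: point host_base and host_bucket in an s3cmd
# config file at an AIS endpoint. The result is written to a temporary
# file first, so the original is replaced only if sed succeeds.
set_s3cfg_endpoint() {
    cfg=$1
    endpoint=$2
    tmp="$cfg.tmp.$$"
    sed -e "s|^host_base *=.*|host_base = $endpoint|" \
        -e "s|^host_bucket *=.*|host_bucket = $endpoint|" \
        "$cfg" > "$tmp" && mv "$tmp" "$cfg"
}

# Usage (keep a backup copy first, as in the diff above):
#   cp "$HOME/.s3cfg" "$HOME/.s3cfg.orig"
#   set_s3cfg_endpoint "$HOME/.s3cfg" 10.10.0.1:51080/s3
```

All other settings in the file are passed through unchanged.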
Alternatively, instead of hacking `.s3cfg` once and for all, we could use the `--host` and `--host-bucket` command-line options of `s3cmd`. For instance:
$ s3cmd put README.md s3://mmm/saved-readme.md --no-ssl --host=10.10.0.1:51080/s3 --host-bucket=10.10.0.1:51080/s3
Compare with the identical `PUT` example in section 5 below.
It goes without saying that, as long as `.s3cfg` keeps pointing to `s3.amazonaws.com`, the `--host` and `--host-bucket` options must be explicitly specified in every `s3cmd` command.
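To avoid retyping both options, they can be generated from a single endpoint value. A minimal sketch follows; the helper name `ais_s3cmd_opts` is hypothetical, while `--host` and `--host-bucket` are the `s3cmd` options already used in this document.

```shell
# Hypothetical helper: print the s3cmd options needed to reach an AIS
# endpoint when .s3cfg still points to s3.amazonaws.com.
ais_s3cmd_opts() {
    endpoint=$1
    printf '%s %s\n' "--host=$endpoint" "--host-bucket=$endpoint"
}

# Usage (the endpoint contains no spaces, so word splitting is safe here):
#   s3cmd ls $(ais_s3cmd_opts 10.10.0.1:51080/s3) --no-ssl
```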
This next step actually depends on the AIStore configuration - the configuration of the cluster we intend to use with the `s3cmd` client.
Specifically, there are two config knobs of interest:
# ais config cluster net.http.use_https
PROPERTY VALUE
net.http.use_https false
# ais config cluster checksum.type
PROPERTY VALUE
checksum.type xxhash
Note that HTTPS is the `s3cmd` default, and so if AIStore runs on HTTP, every single `s3cmd` command must have the `--no-ssl` option.
Setting `net.http.use_https=true` requires an AIS cluster restart. In other words, HTTPS is configurable, but for the HTTP => HTTPS change to take effect the AIS cluster must be restarted.
NOTE the `--no-ssl` flag, e.g., `s3cmd ls --no-ssl` to list buckets.
$ s3cmd ls --host=10.10.0.1:51080/s3
If the AIS cluster in question is deployed with HTTP (the default) and not HTTPS:
$ ais config cluster net.http
PROPERTY VALUE
net.http.server_crt server.crt
net.http.server_key server.key
net.http.write_buffer_size 65536
net.http.read_buffer_size 65536
net.http.use_https false # <<<<<<<<< (NOTE) <<<<<<<<<<<<<<<<<<
net.http.skip_verify false
net.http.chunked_transfer true
we need to turn HTTPS off in the `s3cmd` client using its `--no-ssl` option.
For example:
$ s3cmd ls --host=10.10.0.1:51080/s3 --no-ssl
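The decision can also be derived from the cluster configuration itself. The following is a sketch, assuming the two-column PROPERTY/VALUE output format shown above; the helper name `ssl_opt_from_config` is hypothetical.

```shell
# Hypothetical helper: read the PROPERTY/VALUE output of
# `ais config cluster net.http.use_https` on stdin and print --no-ssl
# when the cluster is not using HTTPS (prints nothing otherwise).
ssl_opt_from_config() {
    awk '$1 == "net.http.use_https" && $2 == "false" { print "--no-ssl" }'
}

# Usage:
#   opt=$(ais config cluster net.http.use_https | ssl_opt_from_config)
#   s3cmd ls --host=10.10.0.1:51080/s3 $opt
```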
Then there's the second important knob mentioned above: `checksum.type=xxhash` (where `xxhash` is the AIS default).
However:
When using `s3cmd` with AIStore, it is strongly recommended to update the checksum to `md5`.
The following will update checksum type globally, on the level of the entire cluster:
# This update will cause all subsequently created buckets to use `md5`.
# But note: all existing buckets will keep using `xxhash`, as per their own - per-bucket - configuration.
$ ais config cluster checksum.type
PROPERTY VALUE
checksum.type xxhash
$ ais config cluster checksum.type=md5
{
"checksum.type": "md5"
}
Alternatively, and preferably, update a specific bucket's property (e.g. `ais://nnn` below):
$ ais bucket props set ais://nnn checksum.type=md5
Bucket props successfully updated
"checksum.type" set to: "md5" (was: "xxhash")
Once the 3 steps (above) are done, the rest should be easy. Just start using `s3cmd` as described, for instance:
# Create bucket `mmm` using `s3cmd` make-bucket (`mb`) command:
$ s3cmd mb s3://mmm --no-ssl
Bucket 's3://mmm/' created
# And double-check it using AIS CLI:
$ ais ls ais:
AIS Buckets (2)
ais://mmm
...
Don't forget to change the bucket's checksum to `md5` (needed iff the default cluster-level checksum != `md5`):
$ ais bucket props set ais://mmm checksum.type=md5
PUT:
$ s3cmd put README.md s3://mmm/saved-readme.md --no-ssl
GET:
$ s3cmd get s3://mmm/saved-readme.md /tmp/copied-readme.md --no-ssl
download: 's3://mmm/saved-readme.md' -> '/tmp/copied-readme.md' [1 of 1]
And so on.
In this section, we use an updated `.s3cfg` to avoid typing much longer command lines that contain the `--host` and `--host-bucket` options.
In other words, we simplify `s3cmd` commands using the following local configuration update:
$ diff -uN ~/.s3cfg.orig ~/.s3cfg
--- /root/.s3cfg.orig
+++ /root/.s3cfg
@@ -31,6 +31,8 @@
guess_mime_type = True
host_base = s3.amazonaws.com
host_bucket = %(bucket)s.s3.amazonaws.com
+host_base = localhost:8080/s3
+host_bucket = localhost:8080/s3
human_readable_sizes = False
invalidate_default_index_on_cf = False
invalidate_default_index_root_on_cf = True
NOTE: `localhost:8080` (above) can be replaced with any legitimate (http or https) address of any AIS gateway. The latter may - but does not necessarily have to - be specified with the environment variable `AIS_ENDPOINT`.
The following further assumes that `abc` is an AIStore bucket, while `my-s3-bucket` is an S3 bucket that this AIStore cluster can access.
The cluster must be deployed with AWS credentials to list, read, and write `my-s3-bucket`.
# Upload 50MB aisnode executable in 5MB chunks
$ s3cmd put /go/bin/aisnode s3://abc --multipart-chunk-size-mb=5
# Notice the `ais://` prefix:
$ ais ls ais://abc
NAME SIZE
aisnode 50.98MiB
# When using Amazon clients, we have to resort to always use s3://:
$ s3cmd ls s3://abc
2022-08-22 13:04 53452800 s3://abc/aisnode
# Confirm via `ls`:
$ ls -al /go/bin/aisnode
-rwxr-xr-x 1 root root 53452800 Aug 22 12:17 /root/gocode/bin/aisnode*
Uploading to `s3://my-s3-bucket` looks identical, with one notable difference: consistently using the `s3://` (or `aws://`) prefix:
# Upload 50MB aisnode executable in 7MB chunks
$ s3cmd put /go/bin/aisnode s3://my-s3-bucket --multipart-chunk-size-mb=7
$ ais ls s3://my-s3-bucket
NAME SIZE
aisnode 50.98MiB
$ s3cmd ls s3://my-s3-bucket
2022-08-22 13:04 53452800 s3://my-s3-bucket/aisnode
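The numbers in these listings can be cross-checked with a little arithmetic. The sketch below converts the byte count reported by `s3cmd ls` to MiB and estimates how many parts each upload was split into; it assumes that `s3cmd` treats 1 MB of chunk size as 1024 * 1024 bytes, and the function name `parts_for_chunk` is hypothetical.

```shell
# Size taken from the listings above.
size_bytes=53452800
mib=1048576   # 1 MiB = 1024 * 1024 bytes

# Size in MiB, rounded to two decimals (compare with the `ais ls` SIZE column).
awk -v s="$size_bytes" -v m="$mib" 'BEGIN { printf "%.2f MiB\n", s / m }'

# Number of parts for a given chunk size in MB (ceiling division),
# assuming 1 MB of chunk size means 1024 * 1024 bytes.
parts_for_chunk() {
    chunk_bytes=$(( $1 * mib ))
    echo $(( (size_bytes + chunk_bytes - 1) / chunk_bytes ))
}

parts_for_chunk 5
parts_for_chunk 7
```

Under that assumption this prints 50.98 MiB, then 11 parts for 5MB chunks and 8 parts for 7MB chunks.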
Use `s3cmd multipart` to show any/all ongoing uploads to `s3://my-s3-bucket` (or any other bucket):
$ s3cmd multipart s3://my-s3-bucket
Note that `s3cmd` expects an S3 URI, something like `s3://bucket-name`.
In other words, `s3cmd` does not recognize any prefix other than `s3://`.
In the examples above, the `mmm` and `nnn` buckets are, actually, AIS buckets with no remote backends.
Nevertheless, when using `s3cmd` we have to reference them as `s3://mmm` and `s3://nnn`, respectively.
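When scripts mix `ais` and `s3cmd` commands, the prefix rewrite can be automated. Here is a minimal sketch; the helper name `to_s3_uri` is hypothetical, and the result is only meaningful when the bucket name is unique across providers, as required earlier.

```shell
# Hypothetical helper: rewrite any provider prefix (ais://, aws://, gs://, ...)
# to the s3:// form that s3cmd expects. A bare bucket name gets the
# s3:// prefix added.
to_s3_uri() {
    case $1 in
        *://*) printf 's3://%s\n' "${1#*://}" ;;
        *)     printf 's3://%s\n' "$1" ;;
    esac
}

to_s3_uri ais://mmm
to_s3_uri aws://my-s3-bucket/README.md
```

This prints `s3://mmm` and `s3://my-s3-bucket/README.md`.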
When Auth is enabled on AIStore, it expects a JWT token for each request. Unfortunately, using the `--add-header` option in `s3cmd` doesn't work because the header gets overwritten with the signature and signing algorithm when the actual request is made.
To overcome this, you can modify the `S3.py` file in `s3cmd` to include the JWT token directly in the `Authorization` header before the request is sent.
In the `S3.py` file (found in the S3CMD GitHub repository), add the following line before the request is sent:
self.headers["Authorization"] = "Bearer <token>"
$ git diff
diff --git a/S3/S3.py b/S3/S3.py
index d4cac8f..9fa1496 100644
--- a/S3/S3.py
+++ b/S3/S3.py
@@ -210,6 +210,7 @@ class S3Request(object):
resource['uri'] = s3_quote(resource['uri'], quote_backslashes=False, unicode_output=True)
# Get the final uri by adding the uri parameters
resource['uri'] += format_param_str(self.params)
+ self.headers["Authorization"] = "Bearer <token>"
return (self.method_string, resource, self.headers)
Adding this line ensures that the `Authorization` header contains the correct token for requests to the AIStore server.
For table summary documenting AIS/S3 compatibility and further discussion, please see: