S3 - Simple Storage Service

AWS S3 lets us store and retrieve files in the cloud, with cheap and virtually unlimited space.

S3 stores files in buckets, each of which has a globally unique name. We can create a bucket using CloudFormation.

1. Creating the bucket

The simplest CloudFormation template defines just one resource, of type AWS::S3::Bucket. No property is strictly required: if we omit BucketName, CloudFormation generates a unique name for us. Here we set BucketName explicitly, so it has to be globally unique (S3 is one of the oldest AWS services, and it went with global names rather than per-account ones).

So, a simple CF template would look like this (remember the bucket name has to be unique, so replace it with a name of your own if you run this):

Simplest yml bucket

AWSTemplateFormatVersion: '2010-09-09'
Description: Simple S3 Bucket
Resources:
  FirstS3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: okaram-simple-s3-bucket

2. Downloading a file

AWS provides a REST API for most of its services, and the Java SDK tends to wrap that API with a common pattern; you have a client interface and class, and methods that take request objects.

For S3, the client interface is AmazonS3 and the client class is AmazonS3Client. We create a GetObjectRequest, specifying the bucket and the file name, and then call getObject on the client.

For our Groovy code, we use @Grab to download the SDK parts we need. We could download the full Java SDK, or just the pieces we need; for this example, I’m just using 'com.amazonaws:aws-java-sdk-s3'.

So, a program to download a file would be as follows (we take the bucket and file as program arguments):

download.groovy

@Grab('com.amazonaws:aws-java-sdk-s3:+')
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;

import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.GetObjectRequest;

String bucketName=args[0]
String remoteFileName=args[1]

AmazonS3 s3Client = new AmazonS3Client();
GetObjectRequest request= new GetObjectRequest(bucketName, remoteFileName);
S3Object object = s3Client.getObject(request);
// s3Client also has a getObject that directly takes the bucket and file names, if you prefer

InputStream objectData = object.getObjectContent();
objectData.eachLine { line -> println(line)}
objectData.close();

3. Uploading a file

Small files can be uploaded with a single call on the client, but large files are better uploaded in several parts, which S3 then assembles. Doing that by hand through the client is tedious, so it is much more convenient to use a TransferManager, which takes care of breaking the file into parts and uploading all the parts (you can also use it to download files). A simple file uploader would look as follows:

upload.groovy

@Grab('com.amazonaws:aws-java-sdk-s3:+')
import java.io.File;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.Upload;

String bucketName=args[0]
String remoteFileName=args[1]
String localFileName=args[2]

TransferManager tm = new TransferManager();
// TransferManager processes all transfers asynchronously,
// so this call will return immediately.
Upload upload = tm.upload(bucketName, remoteFileName, new File(localFileName));
// this blocks until the transfer is done
upload.waitForCompletion();
System.out.println("Upload complete.");
// the TransferManager creates threads, so the program will not exit unless we shut it down
tm.shutdownNow();
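
To give a feel for what "breaking the file into parts" means, here is a minimal sketch in plain Java, with no AWS dependency. PartPlanner and Part are hypothetical names made up for this illustration; they are not SDK classes, and this is not how TransferManager actually picks its part size. The sketch only computes the byte ranges that a multipart upload would send, assuming a fixed part size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it computes byte ranges and uploads nothing.
class PartPlanner {
    // One part of a multipart upload: 1-based part number, start offset and length in bytes.
    static final class Part {
        final int number;
        final long offset;
        final long length;

        Part(int number, long offset, long length) {
            this.number = number;
            this.offset = offset;
            this.length = length;
        }
    }

    // Splits fileSize bytes into consecutive parts of partSize bytes; the last part may be smaller.
    static List<Part> plan(long fileSize, long partSize) {
        if (fileSize < 0 || partSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and partSize must be > 0");
        }
        List<Part> parts = new ArrayList<>();
        long offset = 0;
        int number = 1;
        while (offset < fileSize) {
            long length = Math.min(partSize, fileSize - offset);
            parts.add(new Part(number++, offset, length));
            offset += length;
        }
        return parts;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // 5 MiB is the smallest size S3 accepts for every part except the last one.
        for (Part p : plan(12 * mib, 5 * mib)) {
            System.out.println("part " + p.number + ": offset=" + p.offset + " length=" + p.length);
        }
    }
}
```

With a 12 MiB file and 5 MiB parts we get three parts: two full ones and a 2 MiB remainder. Each part can then be uploaded independently (and retried on its own if it fails), which is the main reason to bother with parts for large files.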

4. Listing all files in a bucket

S3 does not really have folders, but we can use a separator (usually /) within file names to simulate them.
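
To make the "simulated folders" idea concrete, here is a small sketch in plain Java that takes a flat list of keys and shows what one folder level looks like for a given prefix and separator. FolderView is a hypothetical class written for this example, not part of the SDK; as far as I understand, it mirrors the grouping S3 does when a listing request carries a prefix and a delimiter, but treat it as an illustration rather than a specification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration: groups flat keys into "files" and "folders" one level below a prefix.
class FolderView {
    final List<String> files = new ArrayList<>();     // keys directly under the prefix
    final TreeSet<String> folders = new TreeSet<>();  // distinct sub-prefixes, each ending in the delimiter

    static FolderView of(List<String> keys, String prefix, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        FolderView view = new FolderView();
        for (String key : keys) {
            if (!key.startsWith(prefix)) {
                continue; // not inside the "folder" we are looking at
            }
            String rest = key.substring(prefix.length());
            int cut = rest.indexOf(delimiter);
            if (cut < 0) {
                view.files.add(key); // no further separator: a file at this level
            } else {
                // everything up to and including the first separator acts as a sub-folder
                view.folders.add(prefix + rest.substring(0, cut + delimiter.length()));
            }
        }
        return view;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "readme.txt", "photos/2023/a.jpg", "photos/2023/b.jpg",
                "photos/cover.jpg", "docs/notes.txt");

        FolderView root = of(keys, "", "/");
        System.out.println(root.files + " " + root.folders);     // [readme.txt] [docs/, photos/]

        FolderView photos = of(keys, "photos/", "/");
        System.out.println(photos.files + " " + photos.folders); // [photos/cover.jpg] [photos/2023/]
    }
}
```

Note that nothing is ever stored for the folders themselves; they only exist as shared beginnings of key names.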

A program to list the files in a bucket, without regard to any folders (it includes ALL files, regardless of which folder they belong to), is shown below. It also illustrates a common pattern within the AWS API and SDK: if an operation can return a list of unbounded size, the API returns just a portion of the list, plus a continuation token. If the result is truncated, we need to call again, passing that token, to get more elements from the list.

Here we use a ListObjectsV2Request, asking for at most 2 keys per call so that the pagination is easy to see.

ls.groovy

@Grab('com.amazonaws:aws-java-sdk-s3:+')
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;

import com.amazonaws.services.s3.model.ListObjectsV2Request;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.S3ObjectSummary;

String bucketName=args[0]
AmazonS3 s3client = new AmazonS3Client();
final ListObjectsV2Request request = new ListObjectsV2Request()
          .withBucketName(bucketName)
          .withMaxKeys(2);
ListObjectsV2Result result;
while(true){
    result = s3client.listObjectsV2(request);

    for (S3ObjectSummary objectSummary :
        result.getObjectSummaries()) {
        System.out.println(" - " + objectSummary.getKey() + "  " +
                "(size = " + objectSummary.getSize() +
                ")");
    }
    request.setContinuationToken(result.getNextContinuationToken());
    if(!result.isTruncated())
      break;
}
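
The continuation-token pattern is easier to see in isolation, so here is a minimal sketch in plain Java that we can run without an AWS account. PagedListing, Page and fetch are hypothetical stand-ins invented for this example. In particular, the token here is just the next offset written as a string, which is an assumption of the sketch; real tokens are opaque and should only be passed back as received.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a paged API, used only to show the "call again with the token" loop.
class PagedListing {
    // One response: a slice of the items plus a token, or null when nothing is left.
    static final class Page {
        final List<String> items;
        final String nextToken;

        Page(List<String> items, String nextToken) {
            this.items = items;
            this.nextToken = nextToken;
        }
    }

    // Fake "service" call: returns at most maxKeys items, starting where the token says.
    static Page fetch(List<String> all, String token, int maxKeys) {
        if (maxKeys <= 0) {
            throw new IllegalArgumentException("maxKeys must be > 0");
        }
        int start = (token == null) ? 0 : Integer.parseInt(token);
        int end = Math.min(start + maxKeys, all.size());
        String next = (end < all.size()) ? String.valueOf(end) : null;
        return new Page(new ArrayList<>(all.subList(start, end)), next);
    }

    // Client side: keep calling until the service stops handing back a token.
    static List<String> listAll(List<String> all, int maxKeys) {
        List<String> result = new ArrayList<>();
        String token = null;
        do {
            Page page = fetch(all, token, maxKeys);
            result.addAll(page.items);
            token = page.nextToken;
        } while (token != null);
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a.txt", "b.txt", "c.txt", "d.txt", "e.txt");
        // With a page size of 2, this takes three calls to fetch all five keys.
        System.out.println(listAll(keys, 2));
    }
}
```

The shape is the same as in ls.groovy: make the request, consume the page, copy the token into the next request, and stop when the service says there is nothing more.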

5. Bucket with web configuration and authorization for everybody to read

Bucket with web configuration and a policy allowing anybody to read

AWSTemplateFormatVersion: '2010-09-09'
Description: My s3 buckets
Resources:
  FirstS3Bucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    Properties:
      VersioningConfiguration:
        Status: Enabled
      BucketName: okaram-first-s3-bucket
      CorsConfiguration:
        CorsRules:
          -
            AllowedOrigins:
              -
                '*'
            AllowedMethods:
              -
                'GET'
      WebsiteConfiguration:
        # example document names; replace them with the ones your site actually uses
        ErrorDocument: error.html
        IndexDocument: index.html
  AllReadBucketPolicy:
    Type: AWS::S3::BucketPolicy
    Properties:
      PolicyDocument:
        Id: MyPolicy
        Version: '2012-10-17'
        Statement:
        - Sid: ReadAccess
          Action:
            - s3:GetObject
          Effect: Allow
          Resource: 'arn:aws:s3:::okaram-first-s3-bucket/*'
          Principal: '*'
      Bucket: !Ref FirstS3Bucket

6. TODO

ls with folders, withDelimiter/withPrefix, getCommonPrefixes
Permission policies with CF
Deletion policies etc
Temporary access URLs