Using the AWS CLI to Collect Amazon S3 Bucket Object Information

Data collection is the theme of the month! This next ask was to easily identify S3 bucket objects based on name, storage class, and time created.

You may be asking, why is this important? Well, when objects land in S3, they can be stored in tons of nested folders, and there are many different S3 storage classes. What happens if you want to search all the objects and identify specifically which object is in which storage class? This script will help collect that data.
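As a quick illustration of that kind of search, here is a minimal sketch that lists only the objects in one storage class by using a JMESPath filter in `--query`. The function name `list_objects_in_class` is hypothetical (it is not part of the script below), and the sketch assumes the AWS CLI is installed and configured with access to the bucket.

```shell
#!/bin/bash
# Hypothetical helper: list only the objects in a single storage class.
# Usage: list_objects_in_class <bucket> [storage-class]
list_objects_in_class() {
  local bucket="$1"
  local class="${2:-STANDARD_IA}"   # e.g. STANDARD or STANDARD_IA
  # The JMESPath filter keeps entries whose StorageClass matches.
  # Note: --query filters on the client side, after the listing is retrieved.
  aws s3api list-objects \
    --bucket "$bucket" \
    --query "Contents[?StorageClass=='${class}'].[Key, Size, LastModified]" \
    --output text
}
```

Calling `list_objects_in_class my-bucket STANDARD` (with `my-bucket` as a placeholder bucket name) should print one line per matching object with its key, size, and last-modified time.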

In my case here at Pure Storage, we allow customers to offload snapshots to S3 as part of our CloudSnap feature.

Today 3 classes are supported:

  • Standard - Items are stored in the S3 Standard Tier
  • Retention-Based - Objects are stored in Standard and then moved to Standard-IA after 30 days through a lifecycle policy.
  • Direct to Standard-IA - When using Retention-Based and setting retention to 30 days or higher, we place objects immediately in the Standard-IA class.

My initial use case for this script was to gather all of the S3 objects within a bucket and then sort them by storage class.

To execute the code, run:

./get-s3-bucket-objects.sh <bucketname>

#!/bin/bash
# Print usage and exit with the status code passed as the first argument.
print_help() {
  echo "Usage:"
  echo "  get-s3-bucket-objects.sh bucket|help [aws-profile]"
  exit "$1"
}
if [ "$#" -eq 0 ] || [ "$1" == "help" ]; then
  print_help 0
fi
# Optional second argument: the AWS CLI profile to use.
_profile=()
if [[ -n "$2" ]]; then
  _profile=(--profile "$2")
fi
_bucket="$1"
_objects_file="/tmp/s3_bucket_objects.txt"
_sorted_objects_file="/tmp/sorted_s3_bucket_objects.txt"
# Start with empty output files.
: > "$_objects_file"
: > "$_sorted_objects_file"
# Get all objects within the S3 bucket, saved to $_objects_file as tab-separated text.
echo "Gathering Objects"
aws s3api list-objects "${_profile[@]}" --bucket "$_bucket" --query 'Contents[].{Key: Key, Size: Size,Class:StorageClass,LastModified:LastModified}' --no-cli-pager --output text > "$_objects_file"
# Sort on the first column, then the third; columns are tab-separated.
sort -t $'\t' -k1,1 -k3,3 "$_objects_file" > "$_sorted_objects_file"
# Remove the unsorted temporary file.
rm -f "$_objects_file"
echo "results located in $_sorted_objects_file"
cat "$_sorted_objects_file"
exit 0
Here is an example of the output. There is no header row; however, you can tell what each column is from the query fields in the AWS CLI command.
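If you would like a header row and a quick per-class rollup, the sketch below shows one way to post-process the sorted file. It assumes the rows are tab-separated and that the columns come out in the order Class, Key, LastModified, Size (the AWS CLI text format orders columns alphabetically by key name, but check your own output first). The function names `with_header` and `summarize_by_class` are hypothetical and are not part of the script above.

```shell
#!/bin/bash
# Assumption: each row is tab-separated in the order
# Class, Key, LastModified, Size.

# Print the listing with a header row prepended.
with_header() {
  printf 'Class\tKey\tLastModified\tSize\n'
  cat "$1"
}

# Print one line per storage class: class, object count, total bytes.
summarize_by_class() {
  awk -F'\t' '
    { count[$1]++; bytes[$1] += $4 }
    END {
      for (c in count) printf "%s\t%d\t%.0f\n", c, count[c], bytes[c]
    }
  ' "$1" | LC_ALL=C sort
}
```

For example, `with_header /tmp/sorted_s3_bucket_objects.txt` prints the listing with column names on top, and `summarize_by_class /tmp/sorted_s3_bucket_objects.txt` shows how many objects and how many bytes sit in each storage class.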

This is another great way of using APIs and CLIs to collect information. Having to collect all this data manually would have been a pain, but the script should help you collect this information and present it easily!
