Using the AWS CLI to Collect Amazon S3 Bucket Object Information
Data collection is the theme of the month! The latest ask was for an easy way to identify S3 bucket objects by name, storage class, and time created.
You may be asking, why is this important? When objects land in S3, they can be stored in many levels of nested folders, and there are many different S3 storage classes. What happens if you want to search all of the objects and identify exactly which object is in which storage class? This script will help collect that data.
In my case here at Pure Storage, we allow customers to offload snapshots to S3 as part of our CloudSnap feature.
Today, three classes are supported:
- Standard - Objects are stored in the S3 Standard tier.
- Retention-Based - Objects are stored in Standard and then moved to Standard-IA after 30 days through a lifecycle policy.
- Direct to Standard-IA - When using Retention-Based with retention set to 30 days or higher, we place objects immediately in the Standard-IA class.
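As a quick check on the Retention-Based behavior, here is a minimal sketch that lists objects still in the STANDARD class even though they were last modified before a cutoff date. It assumes a tab-separated listing with the columns Class, Key, LastModified, Size (the layout the script in this post is expected to produce). The function name find_overdue_standard and the cutoff argument are hypothetical, made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper: print the keys of STANDARD objects whose LastModified
# value is earlier than the cutoff passed as the first argument.
# Assumption: stdin is tab-separated with columns Class, Key, LastModified, Size.
# ISO 8601 timestamps sort lexicographically, so a plain string comparison
# against a date such as 2024-03-01 is enough.
find_overdue_standard() {
  local cutoff="$1"
  awk -F '\t' -v cutoff="$cutoff" '
    # Appending "" forces a string comparison rather than a numeric one.
    $1 == "STANDARD" && ($3 "") < (cutoff "") { print $2 }
  '
}
```

For example, find_overdue_standard 2024-03-01 < /tmp/sorted_s3_bucket_objects.txt would print the keys of any objects the lifecycle policy has not yet moved, with the cutoff set to 30 days before today.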
Prerequisites
- Install the AWS CLI
- Run aws configure to set up the credentials and default Region for your AWS account.
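If you want to confirm the prerequisites before running anything, a small check along these lines may help. This is only a sketch: require_command is a hypothetical helper name, not part of the AWS CLI.

```shell
#!/bin/bash

# Hypothetical helper: return success if the named command is on the PATH,
# otherwise print a message to stderr and return a non-zero status.
require_command() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "Missing prerequisite: $1" >&2
    return 1
  }
}

# Example usage (not run here):
#   require_command aws && aws sts get-caller-identity
# The second command asks AWS which identity the configured credentials map to,
# which is a simple way to confirm that aws configure worked.
```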
Using the Code
My initial use case for this script was to gather all of the S3 objects within a bucket and then sort them by storage class.
To execute the code, run:
./get-s3-bucket-objects.sh <bucketname>
#!/bin/bash

# Print usage and exit with the given status code.
print_help() {
  echo "Usage:"
  echo "  get-s3-bucket-objects bucket|help [aws-profile]"
  exit "$1"
}

if [ "$#" -eq 0 ] || [ "$1" == "help" ]; then
  print_help 0
fi

# Optional second argument: a named AWS CLI profile.
_profile=()
if [[ -n "$2" ]]; then
  _profile=(--profile "$2")
fi

_bucket="$1"
_objects_file="/tmp/s3_bucket_objects.txt"
_sorted_objects_file="/tmp/sorted_s3_bucket_objects.txt"

# Start from empty output files.
: > "$_objects_file"
: > "$_sorted_objects_file"

# Get all objects within the S3 bucket, saved to $_objects_file.
# With --output text, the columns are expected in alphabetical key order:
# Class, Key, LastModified, Size.
echo "Gathering Objects"
aws s3api list-objects --bucket "$_bucket" "${_profile[@]}" --query 'Contents[].{Key: Key, Size: Size,Class:StorageClass,LastModified:LastModified}' --no-cli-pager --output text > "$_objects_file"

# Sort by storage class (column 1), then by last-modified time (column 3).
sort -t $'\t' -k1,1 -k3,3 "$_objects_file" > "$_sorted_objects_file"

# Remove the unsorted temporary file.
rm -f "$_objects_file"

echo "results located in $_sorted_objects_file"
cat "$_sorted_objects_file"
exit 0
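Once the sorted file exists, a per-class summary is often more useful than the raw listing. The sketch below counts objects and totals their size for each storage class. It assumes the file is tab-separated with the columns Class, Key, LastModified, Size. summarize_by_class is a hypothetical helper, not something the original script provides.

```shell
#!/bin/bash

# Hypothetical helper: print one line per storage class with the object count
# and the total size in bytes, separated by tabs.
# Assumption: input is tab-separated with columns Class, Key, LastModified, Size.
# Reads the files given as arguments, or stdin when none are given.
summarize_by_class() {
  awk -F '\t' '
    # Skip lines without all four columns (for example, an empty-bucket result).
    NF >= 4 { count[$1]++; bytes[$1] += $4 }
    END {
      for (c in count) printf "%s\t%d\t%.0f\n", c, count[c], bytes[c]
    }
  ' "$@" | LC_ALL=C sort
}
```

Running summarize_by_class /tmp/sorted_s3_bucket_objects.txt after the script finishes gives a quick view of how many objects, and how many bytes, sit in each class.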
Conclusion
This is another great way of using APIs and CLIs to collect information. Collecting all of this data manually would have been a pain, but the script should help you gather it and present it easily!