Using the AWS CLI to Collect Amazon S3 Bucket Object Information

Feb 14, 2023 S3 Amazon Web Services


Data collection is the theme of the month! The next ask was an easy way to identify S3 bucket objects by name, storage class, and time created.

You may be asking: why is this important? When objects land in S3, they can be stored in many nested folders, and there are many different S3 storage classes. What happens if you want to search all the objects and identify which object is in which storage class? This script helps collect that data.

In my case here at Pure Storage, we allow customers to offload snapshots to S3 as part of our CloudSnap feature.

Today 3 classes are supported:

Standard - Items are stored in the S3 Standard Tier
Retention-Based - Objects are stored in Standard and then moved to Standard-IA after 30 days through a lifecycle policy.
Direct to Standard-IA - When using Retention-Based and setting retention to 30 days or higher, we place objects immediately in the Standard-IA class.
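
To confirm which class an object actually landed in, the listing can be filtered by storage class on the CLI side. This is a minimal sketch, assuming the S3 storage class values STANDARD and STANDARD_IA; the function name list_objects_in_class is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (name chosen for illustration): print the keys of all
# objects in a bucket that are stored in the given storage class.
# Usage: list_objects_in_class <bucket> <storage-class>
list_objects_in_class() {
	local bucket="$1" class="$2"
	# JMESPath filter: keep entries whose StorageClass matches,
	# and emit only the key, one per line.
	aws s3api list-objects-v2 --bucket "$bucket" \
		--query "Contents[?StorageClass=='${class}'].[Key]" \
		--output text
}
```

For example, `list_objects_in_class my-bucket STANDARD_IA` (with a placeholder bucket name) should print only the keys currently in Standard-IA.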

Prerequisites

Install the AWS CLI
Use aws configure to set the credentials for your AWS account and your default Region.
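
Before running anything against a bucket, it can help to confirm both prerequisites are in place. The check below is a sketch, not part of the original script; the function name check_aws_prereqs is an assumption made for illustration. It relies on aws sts get-caller-identity, which succeeds only when the configured credentials are valid.

```shell
#!/bin/bash
# Hypothetical pre-flight check (name chosen for illustration).
check_aws_prereqs() {
	# Is the aws command available on the PATH?
	if ! command -v aws >/dev/null 2>&1; then
		echo "AWS CLI not found; install it first" >&2
		return 1
	fi
	# Do the configured credentials work?
	if ! aws sts get-caller-identity >/dev/null 2>&1; then
		echo "AWS credentials missing or invalid; run 'aws configure'" >&2
		return 1
	fi
	echo "AWS CLI ready"
}
```

Calling `check_aws_prereqs` returns a non-zero status if either step fails, so it can gate the rest of a script.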

Using the Code

My initial use case for this script was to gather all of the S3 objects within a bucket and then sort them by storage class.

To execute the code, run:

./get-s3-bucket-objects.sh <bucketname>

	#!/bin/bash

	# Print usage and exit with the given status code.
	print_help() {
		echo "Usage:"
		echo "  get-s3-bucket-objects bucket|help [aws-profile]"
		exit "$1"
	}

	_profile=()
	if [ "$#" -eq 0 ] || [ "$1" == "help" ]; then
		print_help 0
	elif [[ -n "$2" ]]; then
		# Optional second argument: a named AWS CLI profile
		_profile=(--profile "$2")
	fi

	_bucket="$1"
	_objects_file="/tmp/s3_bucket_objects.txt"
	_sorted_objects_file="/tmp/sorted_s3_bucket_objects.txt"

	# Start with empty output files
	: > "$_objects_file"
	: > "$_sorted_objects_file"

	# Get all objects within the S3 bucket, saved to $_objects_file.
	# With --output text the columns are tab-separated and ordered
	# alphabetically by key name: Class, Key, LastModified, Size
	echo "Gathering Objects"
	aws s3api list-objects --bucket "$_bucket" "${_profile[@]}" \
		--query 'Contents[].{Key: Key, Size: Size,Class:StorageClass,LastModified:LastModified}' \
		--no-cli-pager --output text > "$_objects_file"

	# Sort by storage class (column 1), then by last-modified time (column 3)
	sort -t $'\t' -k1,1 -k3,3 "$_objects_file" > "$_sorted_objects_file"

	# Remove the unsorted intermediate file
	rm -f "$_objects_file"

	echo "results located in $_sorted_objects_file"
	cat "$_sorted_objects_file"

	exit 0


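The output has no header row, but you can tell what each column is from the query fields in the AWS CLI command. With text output, the CLI orders the columns alphabetically by key name, so each line reads: Class, Key, LastModified, Size.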
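
Once the sorted file exists, a per-class summary is a short awk step away. The sketch below assumes tab-separated lines in the column order Class, Key, LastModified, Size; the function name summarize_by_class is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (name chosen for illustration): print one line per
# storage class with the object count and the total size in bytes.
# Expects tab-separated input: Class, Key, LastModified, Size.
summarize_by_class() {
	awk -F '\t' '
		{ count[$1]++; bytes[$1] += $4 }
		END {
			for (c in count)
				printf "%s\t%d\t%.0f\n", c, count[c], bytes[c]
		}
	' "$1" | LC_ALL=C sort
}
```

Running `summarize_by_class /tmp/sorted_s3_bucket_objects.txt` after the script finishes should give a quick view of how many objects, and how many bytes, sit in each class.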

Conclusion

This is another great way of using APIs and CLIs to collect information. Collecting all this data manually would have been a pain, but the script should help you gather it and present it easily!
