Daniel Green's Homepage

Optimizing EC2-S3 transfer throughput / bandwidth utilization

The base "aws s3" commands like "sync" seem to work sequentially, uploading one file at a time.  S3 also seems to have a per-transfer bandwidth limit, which significantly restricts the performance of 10+ gbit instances, and increases the amount of time one must wait around for transfers to complete.

So we just need to parallelize the "aws cp" command ourselves using gasp gnu parallel:

ls /ssd | grep xbstream | parallel -j 20 aws s3 cp /ssd/{} s3://my_bucket/

This command finds our database backups (*.xbstream) in /ssd, then uploads 20 of them at a time to the S3 bucket named "my_bucket".  Just tune that 20 count to match your available cpu and network resources, and transfers to S3 should speed up quite dramatically.