How we reduced the weight of our 80k+ images from 222 GB to 65 GB

Hamza Baig · Published in Le Craft · May 13, 2018


Images are fundamental to WanderSnap.co, as our platform is all about images. Keeping that in mind, while migrating from the old system to the new one we noticed that the images in our buckets were over 5 MB each. That was a problem, because they need to be served in the browser, and since our customers' galleries consist of at least 30+ images, loading them as-is would be a nightmare and give them a bad UX.

We decided to compress and resize the images. Images straight from a DSLR are very high quality, so if we compress and resize them the difference is barely noticeable to the human eye; we simply can't perceive that much detail in a picture.

In our case, we went for 81% compression in JPEG format, since at this level the compression is effectively visually lossless, meaning the loss is not perceptible to the human eye. We also reduced the size of the images proportionally to 2000px width, since that is more than enough for most screens.
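If you want to see what these settings do to a single file before running anything at scale, the ImageMagick equivalent is a one-liner. Here sample.jpg and sample_optimized.jpg are just placeholder filenames for illustration:

  # Resize a local copy to 2000px wide at 80% JPEG quality and compare sizes.
  convert sample.jpg -strip -resize 2000x -quality 80% sample_optimized.jpg
  ls -lh sample.jpg sample_optimized.jpg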

This idea was not as easy to execute as it looks, because we store our images in S3 buckets on AWS. That meant we first had to download every image, resize and compress it, and then upload it back to the same URL with the same read permissions.

We chose to do it on an EC2 server, where the latency would be very low because it is on the AWS network and in the same region as the S3 buckets.

To compress and resize the images, we had two options:

  • Download and save each image to a temporary file and then process it
  • Use the piping feature in Linux: pipe the image stream into the stdin of the resizing script, and pipe its output back to the AWS CLI for upload

We went with the latter as it looked cleaner and neater; besides, who wants to create and then clean up temporary files? Looks dirty :/

To get the list of images, we had our DB return the paths of the images in a text file. We then used a script to loop over those paths: the AWS CLI downloads each image from its path, the output is piped to the ImageMagick library to resize and compress it, and the result is piped back to the AWS CLI to upload it to the same path, overwriting the previous file.
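How you pull that list out depends on your database. As an illustration only, assuming a Postgres database with a photos table and a path column (placeholder names, not WanderSnap's actual schema), the export could be a one-liner:

  # Export one image path per line into paths.txt (database, table and column names are hypothetical).
  psql -At -d wandersnap -c "SELECT path FROM photos" > paths.txt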

Let's explain this with the help of code snippets.

Paths File Reading Script

This is how we read the file that contains the image paths.

Reading paths of images from a txt file using Shell
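The exact file lives in the repo linked at the end of this article; what follows is a minimal sketch of the same idea, assuming the paths sit in a file called paths.txt (the filename here is just an example):

  #!/bin/bash
  # Loop over the image paths (one S3 key per line) and resize 8 at a time.
  counter=1

  while read -r line; do
    # Once 8 resizes have been started, wait for the whole batch
    # to finish before kicking off the next one.
    if [ "$counter" -eq 8 ]; then
      wait
      counter=1
    fi

    # Start the resize in the background (note the trailing &)
    # and immediately increment the counter.
    ./resize.sh 2000 "$line" &
    counter=$((counter + 1))
  done < paths.txt

  # Wait for the final batch before exiting.
  wait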

There is a resize.sh script which takes the max width to resize the image to; we also pass it the path ($line) of the image. You may have noticed the & at the end of the resize.sh call: it sends the resize into the background so the loop can immediately increment the counter and move on.

At the beginning of each iteration, the while loop checks whether the counter is equal to 8; if it is, it waits for the background processes to finish before continuing to the next batch, and resets the counter back to 1.

Without this parallelization it was taking far too long, because everything ran sequentially. But why 8? We checked manually which number worked best: above 8 things started to slow down, and below 8 the performance just wasn't as good. So we picked 8.

Resize Script

This is what the resize script looked like.
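Again, the original file is in the repo linked below; this is a minimal sketch of its shape, using the same bucket-here placeholder that appears in the command breakdown that follows:

  #!/bin/bash
  # resize.sh <max_width> <s3_path>
  max_width=$1
  path=$2

  # Record when we started, for simple per-image timing.
  start=$(date +%s)

  # Download from S3 to stdout, resize/compress with ImageMagick,
  # and upload the result back to the same key, overwriting it.
  aws s3 cp "s3://bucket-here$path" - \
    | convert -strip -resize "${max_width}x" -quality 80% - jpg:- \
    | aws s3 cp - "s3://bucket-here$path" --acl public-read

  echo "Resized $path in $(( $(date +%s) - start ))s"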

The script first records its start time and then executes the resize command.

The resize command itself is composed of 3 commands, piping the output of each command into the next one, from left to right:

  • aws s3 cp "s3://bucket-here$path" -
  • convert -strip -resize "${max_width}x" -quality 80% - jpg:-
  • aws s3 cp - "s3://bucket-here$path" --acl public-read

The first command gets the image stream from the S3 bucket and writes it to stdout. The - at the end of the command tells the AWS CLI to write the output to stdout instead of to a file path.

We then pipe the output of the first command into the next one, which is ImageMagick's convert command. We strip the EXIF data with the -strip option, resize to the max width provided (2000px in our case), and compress to 80% JPEG quality. The output of this command is again written to stdout and piped into the next one.

The last command reads from its stdin and uploads the result to the same path in the bucket with public read permissions, overwriting the previous file.

That's it. With these two pretty simple scripts, run on an Amazon EC2 instance to keep the upload/download latency low, we were able to resize more than 80k images in our buckets.

Conclusion

When working with images, some decisions should be made from the start like:

  • How much quality we need
  • What resolution we need
  • What the name of the stored image should be
  • What structure to follow

Thinking about these up front will help you scale better in the future; otherwise, you'll end up wasting resources putting things back in the right place.

Moreover, we encourage you to always keep at least an optimized version that doesn't slow down your site or app. No user wants to wait forever for content to load, so keep load times as low as possible to provide a seamless UX. If you still need a high-quality version, keep it as an optional download.

You can browse the files embedded in this article in the repo: https://github.com/ws-engineering/resize-script

Ever been stuck in a use case like this? We'd love to hear your stories. Write them down below ;)

Hamza Baig · Le Craft · Software engineer with a focus on bridging businesses with tech