Reconstructing photo albums from a Flickr data export

As every Flickr user knows, the service became subscription-based about a year ago. Initially I decided to stay, as it was a handy service, but then I found myself not using it that much. Moreover, they recently increased the monthly fee, so I opted to download my photos and store them locally.

Flickr makes it fairly easy to download all your data, split into ZIP archives of 500 photos each, so I downloaded my 26000+ photos. The problem is that the pics are not divided into albums, so, especially if you have a lot of them, you get a huge mess which is very difficult to manage.

Luckily, the downloadable data also includes all the information needed to reconstruct the album structure programmatically. I was able to move the files into separate directories, one per album, each named after its album. I hope sharing this experience will be useful to anyone who is either moving away from Flickr or just downloading the data to keep a local copy.

In the downloaded data, there is an albums.json file, which has all album information in this structure:

{
    "albums": [
        {
            "photo_count": "11",
            "id": "72157696153620351",
            "url": "https://www.flickr.com/photos/arthas/albums/72157696153620351",
            "title": "2018-06-09: Iconocluster - Profumo DiVino",
            "description": "",
            "view_count": "2",
            "created": "1528666970",
            "last_updated": "1528799288",
            "cover_photo": "https://www.flickr.com/photos/arthas/42670192932",
            "photos": [
                "28846036158",
                "42670194192",
                "40908904860",
                "40908904790",
                "42670192932",
                "42701822032",
                "42751133711",
                "28877442928",
                "42701821872",
                "27882010047",
                "42751133381"
            ]
        }
    ]
}
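To get a feel for that structure before touching any files, here is a minimal sketch that lists each album with the number of photo codes it contains. It uses the core JSON::PP module instead of Mojo::JSON so that it runs on a stock Perl, and it inlines a sample trimmed to the fields it needs; album_summary is a name made up for this illustration.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Sample trimmed from the albums.json structure shown above
# (only the fields used here, and only the first three photo codes)
my $sample = <<'JSON';
{
    "albums": [
        {
            "id": "72157696153620351",
            "title": "2018-06-09: Iconocluster - Profumo DiVino",
            "photos": ["28846036158", "42670194192", "40908904860"]
        }
    ]
}
JSON

# Hypothetical helper: one summary line per album
sub album_summary {
    my ($json_text) = @_;
    my $data = decode_json($json_text);
    my @lines;
    for my $album ( @{ $data->{albums} } ) {
        my $count = scalar @{ $album->{photos} };
        push @lines, "$album->{title} ($count photos)";
    }
    return @lines;
}

print "$_\n" for album_summary($sample);
```

Counting the entries in the photos array avoids relying on photo_count, which the export stores as a string.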

Ain’t this perfect? Well, not exactly. You can’t just use the codes in the “photos” array as they are to find the files belonging to an album, because:

The extension (JPG, MOV, PNG, …) is not provided
Files in the data export have names that include the title as well as the code, such as dublin-castle_5303007241_o.jpg
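The matching problem can be sketched in isolation as follows. This is only an illustration: find_photo_file is a made-up helper, the second filename is an invented sample, and the pattern assumes the code sits between underscores as in the example name above (the full script further down simply looks for the code anywhere in the name).

```perl
use strict;
use warnings;

# Hypothetical helper: return the first filename that contains the photo
# code preceded by "_" (or the start of the name) and followed by "_" or "."
sub find_photo_file {
    my ($photo_code, @filenames) = @_;
    for my $name (@filenames) {
        return $name if $name =~ /(?:^|_)\Q$photo_code\E(?:_|\.)/;
    }
    return;    # no file matched
}

# The first name comes from the text above, the second is an invented sample
my @files = ('dublin-castle_5303007241_o.jpg', 'some-video_42670192932_o.mov');

my $found = find_photo_file('5303007241', @files);
print defined $found ? "$found\n" : "MISSING\n";
```

Anchoring the code on its delimiters keeps a short code from matching inside a longer one, at the price of depending on the naming scheme.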

So, a little bit of programming is needed. Here’s my commented Perl script which does the job. This assumes that you are in a directory containing:

All the downloaded photos/videos in a photos subdirectory
An empty albums subdirectory, where album directories will be created
The albums.json file from the export
The script itself
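If you want to verify that layout before running anything, a small pre-flight check could look like the sketch below. It is not part of my script: check_layout is a hypothetical helper, and it only tests for the albums.json file and the two subdirectories described above.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return a list of what is missing under $base
sub check_layout {
    my ($base) = @_;
    my @problems;
    push @problems, 'albums.json not found'
        unless -f File::Spec->catfile($base, 'albums.json');
    for my $dir (qw(photos albums)) {
        push @problems, "$dir/ directory not found"
            unless -d File::Spec->catdir($base, $dir);
    }
    return @problems;
}

my @problems = check_layout('.');
print @problems ? join("\n", @problems) . "\n" : "Layout looks fine\n";
```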

For the rest, it’s mostly self-explanatory but I provided comments here and there.

#!/usr/bin/env perl

use Mojo::File qw/path/;
use Mojo::JSON qw/decode_json/;
# Personal defaults module; the code below relies on it for say and
# subroutine signatures
use Arthas::Defaults::520;

my $jdata = decode_json( path('./albums.json')->slurp ) || die 'No-albumdata';
my $photos = path('./photos')->list || die 'No-photos';
my @missings;

my $albums = $jdata->{albums};
for my $album (@$albums) {
    say "==> $album->{title}";

    # Create album directory, using a cleaned-up version of its title
    my $albumdir = path './albums/'.clean_albumtitle( $album->{title} );
    $albumdir->make_path();

    for my $photocode (@{ $album->{photos} }) {
        # Photo filenames may look bizarre, but they always contain the code
        my $fphotos = $photos->grep(qr/$photocode/);
        my $fphoto = shift @$fphotos;

        # Check if we actually found a match (no match leaves $fphoto undefined)
        if ( !defined $fphoto || !-e $fphoto ) {
            push @missings, $photocode.'';
            say "    $photocode => MISSING!";
            next;
        }
        print "    $photocode => $fphoto => ";

        # Copy the pic in destination folder
        $fphoto->copy_to($albumdir);
        say 'copied!';
        # If you have a ton of photos, you may prefer to move them instead.
        # Moving also makes it easy to spot any leftovers just by looking at
        # the photos directory.
        # And by the way: it's also way faster than copying!
        # $fphoto->move_to($albumdir);
        # say 'moved!';
    }
}

# Report any codes which we didn't find
# There shouldn't be any, so reporting is useful for debugging (e.g. if you forgot to unpack one of the ZIP files)
say "\nMISSING CODES:";
say join "\n", @missings;
say "OK";

# File systems (depending on OS/FS) don't like some special chars, so I'm replacing slashes, apostrophes and colons with dashes.
# Tune this to match your own needs.
sub clean_albumtitle($title) {
    $title =~ s/[\/\'\:]/-/gxs;
    return $title;
}

The code relies on Mojo::File for listing, copying and creating paths, and on Mojo::JSON for parsing albums.json, which keeps the script short and clean.
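One thing to be aware of with a large collection: the script rescans the whole file list for every photo code. A possible alternative, sketched here with plain Perl and no Mojo modules, is to index the filenames once and then look each code up in a hash. build_photo_index is a hypothetical helper, and it assumes that the code appears as an all-digit token delimited by underscores, as in dublin-castle_5303007241_o.jpg.

```perl
use strict;
use warnings;

# Hypothetical helper: map every all-digit, underscore-delimited token of a
# filename to that filename, so each later lookup is a single hash access
sub build_photo_index {
    my (@filenames) = @_;
    my %index;
    for my $name (@filenames) {
        ( my $stem = $name ) =~ s/\.[^.]+\z//;    # drop the extension
        for my $token ( split /_/, $stem ) {
            next unless $token =~ /\A\d+\z/;
            $index{$token} //= $name;             # keep the first file seen
        }
    }
    return \%index;
}

my $index = build_photo_index('dublin-castle_5303007241_o.jpg');
my $match = $index->{'5303007241'} // 'MISSING';
print "$match\n";
```

A title that happens to contain a purely numeric part would be indexed too, so it is worth checking a few results by hand before moving anything.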

The Cattle Grid

Splashes of digital ink by Michele Beltrame
