
Emily Lakdawalla, March 3, 2011

Public service announcement: How to use Wget to grab the 2011 LPSC abstracts

The Lunar and Planetary Science Conference is happening next week, and today I went to the meeting website, as I usually do, to peruse the sessions and abstracts in advance. They are all in PDF format, so it's tiresome to read them online; I much prefer to download them all to my computer and browse them locally. So it was a nasty surprise to discover that this year, unlike previous years, you can't just direct your FTP client of choice to the conference's FTP site and download everything. If there's an officially sanctioned way to download all the abstracts for the 2011 meeting, I haven't been able to figure it out.

LPSC 2011



I got them all anyway, though, thanks to a handy open-source file retrieval tool called Wget. I use Wget all the time to grab space image data, but it's possible to use it for other nefarious purposes like downloading 2,000 conference abstracts. I crowed about this on Twitter this morning and received several requests to explain the method, so here's how it works.

First, install Wget. Go to the Wget website and find the installer that's appropriate for your operating system.

There are various user interfaces out there that you can install on top of Wget to make it "easier" to use, but I find it pretty easy to run it from the command line, as it's well documented online. To grab the LPSC abstracts, I first used Excel to create a text list of all the hypertext links to the session and abstract PDFs (a step that you can skip by just downloading my file), which I placed in the same folder where I installed wget. Then I ran the following command:

 > wget -i lpsc.txt

Then I watched it grab the 2,000 or so files. Easy as pie!
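If you'd rather not build the list in Excel, here is a minimal Python sketch of the same step. It assumes you already have the links in a Python list; the `example.com` URLs and the function names are hypothetical placeholders, not the real abstract links. All it does is produce the one-URL-per-line text file that `wget -i` reads.

```python
from pathlib import Path


def format_url_list(urls):
    """Return the text of a list file: one URL per line, blank entries dropped."""
    cleaned = [u.strip() for u in urls if u.strip()]
    return "\n".join(cleaned) + "\n"


def write_url_list(urls, list_path):
    """Save the list file that `wget -i` will read."""
    Path(list_path).write_text(format_url_list(urls), encoding="utf-8")


def wget_command(list_path):
    """Build the same command shown above, as an argument list."""
    return ["wget", "-i", str(list_path)]


if __name__ == "__main__":
    # Hypothetical placeholder URLs, for illustration only.
    example_urls = [
        "https://example.com/abstracts/0001.pdf",
        "https://example.com/abstracts/0002.pdf",
    ]
    print(format_url_list(example_urls), end="")
    print("Then run:", " ".join(wget_command("lpsc.txt")))
```

Call `write_url_list(example_urls, "lpsc.txt")` to save the file, then run the wget command from the same folder.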

I use that Excel trick a lot when I want to grab a large number of files with sequential filenames. For instance, every time Cassini spits a large number of images to their raw images website, I like to be able to browse through those images locally. To grab them, I begin with the path to the most recent image. Right now, that path is:

This path is a lot of text that is mostly the same for every Cassini image except for the last six digits before the ".jpg" filename extension. So I use Excel to generate a list of the 200 most recent narrow-angle camera images this way:


What that gobbledygook does is grab the left 66 characters of the path, take the next 6 characters and turn them into a number (that's the "VALUE" bit), add to that number the row number of the current cell minus 1, append the new number to the end of our path, and finally tack on the ".jpg" filename extension. Copy and paste this into as many cells as you want to grab as many images as you want, save the results as a text file within the wget folder, and run the same command: "wget -i [yourfilename].txt".

Excel isn't required; there are myriad other ways to create these sequential filename lists using little more programming skill than one needs to write a program that outputs "Hello, world!" Excel is just the program I'm comfortable with.
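As one example of a non-Excel route, here is a rough Python sketch of the same sequential-filename idea. It is not the original spreadsheet formula. The URL is a made-up `example.com` placeholder rather than the real Cassini path, and the function name is hypothetical. It differs from the formula in two ways, both my own choices: it finds the six digits by counting back from the ".jpg" extension instead of counting 66 characters from the left, and it zero-pads the new number so the filename keeps its width.

```python
def sequential_urls(first_url, count, digits=6, ext=".jpg"):
    """Return `count` URLs whose trailing number counts up from the one in `first_url`."""
    if not first_url.endswith(ext):
        raise ValueError(f"expected a URL ending in {ext!r}")
    stem = first_url[: -len(ext)]            # everything before ".jpg"
    prefix = stem[:-digits]                  # the part that never changes
    start = int(stem[-digits:])              # the "VALUE" step: digits -> number
    # Like the row number minus 1, i runs 0, 1, 2, ... down the list.
    return [f"{prefix}{start + i:0{digits}d}{ext}" for i in range(count)]


if __name__ == "__main__":
    # Hypothetical placeholder path, for illustration only.
    for url in sequential_urls("https://example.com/raw/N00123456.jpg", 3):
        print(url)
```

Save the printed lines to a text file, one per line, and hand it to "wget -i" exactly as before.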

Have fun grabbing files!

Emily Lakdawalla

Solar System Specialist for The Planetary Society