
Public service announcement: How to use Wget to grab the 2011 LPSC abstracts

Posted by Emily Lakdawalla

03-03-2011 17:08 CST

The Lunar and Planetary Science Conference is happening next week, and today I went to the meeting website as I usually do to peruse the listed talks for the sessions and abstracts in advance. The sessions and abstracts are all in PDF format, so it's tiresome to access them online; I much prefer to download them all to my computer and browse them locally. So it was a nasty surprise to discover that this year, unlike previous years, you can't just direct your FTP client of choice to ftp.lpi.usra.edu/pub/outgoing and download everything. If there's an officially sanctioned way to download all the abstracts for the 2011 meeting, I haven't been able to figure it out.

I got them all anyway, though, thanks to a handy open-source file retrieval tool called Wget. I use Wget all the time to grab space image data, but it's possible to use it for other nefarious purposes like downloading 2,000 conference abstracts. I crowed about this on Twitter this morning and received several requests to explain the method, so here's how it works.

First, install Wget. Go to the Wget website and find the installer that's appropriate for your operating system.

There are various user interfaces out there that you can install on top of Wget to make it "easier" to use, but I find it pretty easy to run from the command line, and it's well documented online. To grab the LPSC abstracts, I first used Excel to create a text list of all the hypertext links to the session and abstract PDFs (a step that you can skip by just downloading my file), and saved it as lpsc.txt in the same folder where I installed Wget. Then I ran the following command:

 > wget -i lpsc.txt


Then I watched it grab the 2,000 or so files. Easy as pie!
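
For anyone who would rather not assemble the link list by hand, here is a minimal Python sketch of one other way to do it. It is not the method I used above; it assumes you have saved the HTML of a page that links to the PDFs, and the sample HTML, the base URL, and the names PdfLinkCollector and pdf_links are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> that ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".pdf"):
                # Turn relative links into full URLs that wget can fetch.
                self.links.append(urljoin(self.base_url, value))


def pdf_links(html_text, base_url):
    """Return all PDF links found in html_text, in page order."""
    collector = PdfLinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


# Hypothetical sample input; this is NOT the real LPSC program page.
SAMPLE_HTML = """
<a href="pdf/sess101.pdf">Session 101</a>
<a href="pdf/1001.pdf">Abstract 1001</a>
<a href="index.html">Home</a>
"""
SAMPLE_BASE = "http://example.org/meetings/"

if __name__ == "__main__":
    # One URL per line, which is the layout "wget -i" reads.
    for link in pdf_links(SAMPLE_HTML, SAMPLE_BASE):
        print(link)
```

Redirect the printed lines into a text file, and that file can be handed to "wget -i" just like the Excel-made list.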

I use that Excel trick a lot when I want to grab a large number of files with sequential filenames. For instance, every time Cassini spits a large number of images to its raw images website, I like to be able to browse through those images locally. To grab them, I begin with the path to the most recent image. Right now, that path is:

http://saturn.jpl.nasa.gov/multimedia/images/raw/casJPGFullS66/N00168985.jpg


This path is a lot of text that is mostly the same for every Cassini image except for the last six digits before the ".jpg" filename extension. So I use Excel to generate a list of the 200 most recent narrow-angle camera images this way:

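  • Put the full path of the most recent image in cell A1 of the table.
  • In cell A2, write the following formula (the $ signs pin the reference to cell A1, so it does not shift when the formula is copied into other rows):

=CONCATENATE(LEFT($A$1,66),VALUE(MID($A$1,67,6))+ROW()-1,".jpg")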


What that gobbledygook does is grab the left 66 characters of the path, take the next 6 characters and turn them into a number (that's the "VALUE" bit), add the row number of the current cell minus 1 to that number, append the result to the end of our 66-character path, and finally tack on the ".jpg" filename extension. Copy and paste this into as many cells as you want to grab as many images as you want, save the results as a text file within the Wget folder, and run the same command: "wget -i [yourfilename].txt".

Excel isn't required; there are myriad other ways to create these sequential filename lists using little more programming skill than one needs to write a program that outputs "Hello, world!" -- Excel is just the program I'm comfortable with.
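
As one example of such an alternative, here is a rough Python sketch that builds the same kind of list. The function name sequential_urls and its parameters are my own invention for illustration. Like the spreadsheet approach, it simply counts upward from the number in the starting path and makes no attempt to check that the files exist on the server. Unlike the spreadsheet formula, it pads the number with leading zeros to keep it six digits wide.

```python
def sequential_urls(first_url, count, digits=6, extension=".jpg"):
    """Return `count` URLs whose trailing number counts up from first_url's.

    Assumes first_url ends with `digits` numeric characters followed by
    `extension`, as in the Cassini raw image path quoted above.
    """
    stem = first_url[: -len(extension)]   # path without ".jpg"
    prefix = stem[:-digits]               # everything before the number
    start = int(stem[-digits:])           # the number itself
    return [
        f"{prefix}{start + i:0{digits}d}{extension}"  # zero-padded to `digits`
        for i in range(count)
    ]


if __name__ == "__main__":
    first = ("http://saturn.jpl.nasa.gov/multimedia/images/raw/"
             "casJPGFullS66/N00168985.jpg")
    # One URL per line, ready to be saved and passed to "wget -i".
    for url in sequential_urls(first, 200):
        print(url)
```

Save the output to a text file in the Wget folder and run "wget -i [yourfilename].txt" as before.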

Have fun grabbing files!

 