Software Glitch Pauses LightSail Test Mission

Posted by Jason Davis

26-05-2015 16:35 CDT

Topics: mission status, LightSail

The Planetary Society’s LightSail test mission is paused while engineers wait out a suspected software glitch that has silenced the solar sailing spacecraft. Following a successful start to the mission last Wednesday, LightSail spent more than two days sending about 140 data packets back to Earth.

But the long Memorial Day weekend here in the United States offered no respite for the LightSail team, as they scrambled to figure out why the spacecraft's automated telemetry chirps suddenly fell silent. It is now believed that a vulnerability in the software controlling the main avionics board halted spacecraft operations, leaving a reboot as the only remedy to continue the mission. When that occurs, the team will likely initiate a manual sail deployment as soon as possible.

What happened?

As of late Friday afternoon, LightSail was continuing to operate normally. The spacecraft’s ground stations at Cal Poly San Luis Obispo and Georgia Tech were receiving data on each pass. Power and temperature readings were trending stably, and the spacecraft was in good health. 

LightSail battery levels from start of mission through communications loss

Cal Poly / Edited by Jason Davis


But inside the spacecraft's Linux-based flight software, a problem was brewing. Every 15 seconds, LightSail transmits a telemetry beacon packet. The software controlling the main system board writes corresponding information to a file called beacon.csv. If you’re not familiar with CSV files, you can think of them as simplified spreadsheets—in fact, most can be opened with Microsoft Excel. 

As more beacons are transmitted, the file grows in size. When it reaches 32 megabytes—roughly the size of ten compressed music files—it can crash the flight system. The manufacturer of the avionics board corrected this glitch in later software revisions. But alas, LightSail’s software version doesn’t include the update.
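To get a feel for the timescale, the file growth can be sketched as simple arithmetic. The 15-second beacon interval and the 32-megabyte threshold come from the description above. The number of bytes written per beacon row is not stated anywhere in this post, so the row sizes below are placeholder assumptions, as is the reading of a megabyte as 2**20 bytes. The function name is hypothetical.

```python
from datetime import timedelta

BEACON_INTERVAL_S = 15                # from the post: one beacon every 15 seconds
CRASH_THRESHOLD_BYTES = 32 * 2**20    # from the post: 32 megabytes (assumed to mean MiB)


def time_to_threshold(bytes_per_row: int,
                      interval_s: int = BEACON_INTERVAL_S,
                      threshold: int = CRASH_THRESHOLD_BYTES) -> timedelta:
    """Return how long an append-only log takes to reach `threshold` bytes."""
    if bytes_per_row <= 0:
        raise ValueError("bytes_per_row must be positive")
    rows_needed = -(-threshold // bytes_per_row)  # ceiling division
    return timedelta(seconds=rows_needed * interval_s)


if __name__ == "__main__":
    # Hypothetical row sizes, NOT taken from LightSail's real telemetry format.
    for size in (256, 1024, 4096):
        print(f"{size:>5} bytes/row -> {time_to_threshold(size)}")
```

The point of the sketch is only that an append-only file written on a fixed schedule reaches any fixed size limit after a predictable interval, which makes this kind of fault a matter of when, not if.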

Late Friday, the team received a heads-up warning them of the vulnerability. A fix was quickly devised to prevent the spacecraft from crashing, and it was scheduled to be uploaded during the next ground station pass. But before that happened, LightSail fell silent. The last data packet received from the spacecraft was May 22 at 21:31 UTC (5:31 p.m. EDT). 

The aftermath

LightSail is likely now frozen, not unlike the way a desktop computer suddenly stops responding. A reboot should clear the contents of the problematic beacon.csv file, giving the team a couple of days to implement a fix. But to pull a phrase from recent mission reports, the outcome of the freeze is “non-deterministic.” That means sometimes the processor will still accept a reboot command; other times, it won’t. It’s similar to the way you deal with a frozen computer: You can try to struggle past sluggish menus and click reboot, but sometimes, your only recourse is pressing the power button.

As of Tuesday afternoon, there have been 37 Cal Poly and Georgia Tech ground station passes. During half of those, reboot commands were sent to the spacecraft. Nothing has happened yet. Therefore, we have to assume that LightSail is only going to respond to the power button method.

When I filmed an interview last year with our CEO, Bill Nye, and system engineer Barbara Plante, Nye pointed out a piece of hardware strapped to BenchSat, LightSail’s acrylic-mounted testing clone:

“There’s nobody in outer space to push that reset button,” says Nye.
“No one that we’ve gotten to volunteer for that job,” Plante replies. “But it’s open.”

Since we can’t send anyone into space to reboot LightSail, we may have to wait for the spacecraft to reboot on its own. Spacecraft are susceptible to charged particles zipping through deep space, many of which get trapped inside Earth’s magnetic field. If one of these particles strikes an electronics component in just the right way, it can cause a reboot. This is not an uncommon occurrence for CubeSats, or even larger spacecraft, for that matter. Cal Poly’s experience with CubeSats suggests most experience a reboot in the first three weeks; I spoke with another CubeSat team whose spacecraft rebooted after six. Coincidentally, this is close to the original 28-day sail deployment timeline.

Radio operators

A lot of amateur radio enthusiasts have been helping to track LightSail. Many of you have sent in data packets that you’ve received, and I’ve been getting a lot of questions about how to decode packet contents. First of all, thank you! The first operator to grab a full packet was Ken Swaggart (call sign W7KKE) from Lincoln City, Oregon. His packet was nabbed at 20:01 UTC on May 20—just five hours after launch.

This is a test flight in more ways than one. In addition to figuring out how LightSail behaves in space, we’re also refining our procedures for getting information out to the public—including the radio community. Because the team has been busy with actual spacecraft operations, I’ve been trying to field those inquiries myself. Unfortunately, I’m far from an expert in this area. I know what the data should look like after they are decoded; not so much on the raw radio signal side of things.

We don’t currently have a public decoder available, but Dr. John Bellardo at Cal Poly generously spent some time building a prototype web version that takes raw hexadecimal data and converts it to plain text. I’ve tried using it to decode some of the packets you’ve sent me, but to no avail thus far, meaning I have to send them over to our engineers instead. The team’s first duty is to the spacecraft itself, but we’ll get to your requests eventually, and iron out the process for our 2016 flight. In the meantime, you can still listen for LightSail and send reports to lightsail@planetary.org. For more details, check out our Mission Control page at sail.planetary.org/missioncontrol.
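For readers curious what "raw hexadecimal to plain text" means in the most generic sense, here is a minimal sketch. It is not Dr. Bellardo's decoder and knows nothing about LightSail's actual packet layout, which this post does not describe; it only turns a hex dump into bytes and shows the printable ASCII characters, replacing everything else with a dot. The function name is hypothetical.

```python
def hex_to_text(raw_hex: str) -> str:
    """Decode a hexadecimal dump into text, showing non-printable bytes as '.'."""
    cleaned = "".join(raw_hex.split())   # drop spaces and line breaks
    data = bytes.fromhex(cleaned)        # raises ValueError on malformed input
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


if __name__ == "__main__":
    print(hex_to_text("48 65 6c 6c 6f"))
```

A real telemetry decoder would also need to know the framing and field layout of each packet, which is where a generic conversion like this stops being enough.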

Finally, many folks have written in to tell me that our spacecraft ground track is getting stale. We know, and we’re just as anxious as you are to get that updated. We rely on JSpOC, the Joint Space Operations Center, for updated Two-Line Element sets. TLEs are numerical descriptions of an object’s orbit. Thus far, our only TLE came from the launch vehicle shortly after LightSail was deposited into orbit.

When we get a new TLE, we expect there to be several that represent the entire group of CubeSats that hitched a ride to space aboard the Atlas V Centaur upper stage. Since the spacecraft remain bunched relatively close together, there might not be discrete one-to-one matches between each TLE and its corresponding spacecraft. One way we can narrow down which spacecraft is LightSail is to measure its Doppler shift, but since we're no longer transmitting, the process may take more time.
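The Doppler idea can be sketched with the first-order relation f_received ≈ f_transmitted × (1 − v/c), where v is the spacecraft's speed directly away from the ground station. The transmit frequency and speeds in the example are placeholders rather than LightSail's actual values, the function name is hypothetical, and relativistic corrections are ignored.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def received_frequency(transmit_hz: float, radial_velocity_m_s: float) -> float:
    """First-order Doppler shift; positive radial velocity means moving away."""
    return transmit_hz * (1.0 - radial_velocity_m_s / SPEED_OF_LIGHT_M_S)


if __name__ == "__main__":
    transmit_hz = 437.0e6  # placeholder UHF frequency, an assumption
    for v in (-7000.0, 0.0, 7000.0):  # approaching, closest approach, receding
        shift = received_frequency(transmit_hz, v) - transmit_hz
        print(f"radial velocity {v:+8.0f} m/s -> shift {shift:+10.1f} Hz")
```

Each candidate TLE predicts a slightly different moment at which the shift passes through zero during a pass, so comparing predicted and measured curves is one plausible way to match a TLE to a transmitting spacecraft.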

Atlas V at night

Navid Baraty / The Planetary Society

This image of the LightSail 1 spacecraft's Atlas V rocket was captured the night before launch after a sound-activated remote camera was inadvertently triggered.

Next steps

Cal Poly and Georgia Tech will keep listening for LightSail on each ground pass. Furthermore, Cal Poly is automating the reboot command transmission to be sent every few ground station passes, in the hope that one command sneaks through (we don't send the command on every pass because a successful reboot triggers a waiting period before beacon transmissions begin). But as of right now, we can’t do much except wait, hoping a charged particle smacks the spacecraft in just the right way to cause a reboot. LightSail is capable of remaining in orbit about six months in its CubeSat form.

In the meantime, the team is looking at several fixes to work around the software vulnerability once contact is reestablished. One is a Linux file redirect that would send the contents of the troublesome beacon.csv file to a null location, a sort-of software black hole. Lab testing on this fix has been promising—over a gigabyte of beacon packets have already been sent into nothingness without a system freeze.
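One way such a redirect could work on a Linux system is to replace the log file with a symbolic link to the null device, so that anything appended to it is discarded. The sketch below is an assumption about the general technique, not the team's actual fix, which this post does not detail. It runs in a temporary directory on a POSIX system; the helper names and row contents are hypothetical.

```python
import os
import tempfile


def redirect_to_null(path: str) -> None:
    """Replace `path` with a symbolic link to the null device."""
    if os.path.lexists(path):
        os.remove(path)
    os.symlink(os.devnull, path)


def append_row(path: str, row: str) -> None:
    """Append one CSV row, the way a simple logger might."""
    with open(path, "a") as log:
        log.write(row + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        log_path = os.path.join(workdir, "beacon.csv")
        append_row(log_path, "t0,placeholder")   # ordinary file, grows on disk
        redirect_to_null(log_path)
        for i in range(1000):
            append_row(log_path, f"t{i},placeholder")  # silently discarded
        print(os.path.islink(log_path), os.path.getsize(log_path))
```

The trade-off is that the logged data is lost entirely, which is acceptable only if the beacons themselves are also being transmitted and recorded on the ground.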

When we hear from LightSail again, the team will likely initiate a manual sail deployment as soon as possible. Planning has already started on that front—we’ll keep you updated.

In the meantime, I'll be refreshing the spacecraft's raw telemetry packet repository, ready to jump at the first sign of new data. With a little luck, the test mission isn’t over just yet. Hopefully LightSail will follow trends established by other CubeSat missions and reboot soon.

 

Comments:

Bryan Foster AB7IR: 05/26/2015 05:33 CDT

I know you guys are busy, but you might contact the KickSat (a CubeSat also funded by Kickstarter) folks, as they were relying solely on us ham radio guys to track/receive/decode the CubeSat signals. They may already have the information you need for us to decode any LightSail signals.

Aerospacey: 05/26/2015 10:42 CDT

The LightSail track can also be viewed at the N2YO.com page using the "Find a satellite" Search box and "LightSail" or the 40661 identification number. I have a tab for each page. Good hunting!

Douglass: 05/27/2015 07:54 CDT

Kind of remarkable that this fault wasn't uncovered in ground testing. Filling buffers with several days worth of telemetry files seems like a pretty obvious and easy thing to check. File system overflows are well known weaknesses in operating systems, so the premise that "Oops, we didn't get the update" is a sketchy excuse. The idea that a hard reset is possible only with a well targeted cosmic ray hit is also eyebrow-raising. That all being said, we're all hoping for the best.

gcw1957: 05/27/2015 09:20 CDT

As a developer of autonomous systems, I feel your pain. Good Luck. G.

Philfreeze: 05/27/2015 09:42 CDT

I hope this charged particle will hit the spacecraft soon, but I have to say, as an apprentice in the field of electronics, this is kind of funny. I work for a company which develops coffee machines, and I usually build test equipment for them. Even there, where we could just pull the cord, it is considered normal to reboot the system on a regular schedule or if something goes wrong. Applied to this situation it would probably be like: no comm for x amount of time (4h or so) -> reboot, plus a scheduled reboot every 24h. Nice to see that even you guys are just people ^^

Edgar: 05/27/2015 10:22 CDT

Well, charged particles do affect spacecraft but it is far more likely they cause additional damage than doing something useful in a specific situation.

pkuhar: 05/27/2015 11:17 CDT

Don't these things have at least a watchdog?

Ollopa: 05/27/2015 12:38 CDT

Is Georgia Tech missing a trick here? They have a campus in Ireland - could that not be used to increase coverage (clearly an issue)?

Gene: 05/27/2015 12:45 CDT

I'm not sure we should expect other missions' experiences with cosmic-ray-triggered reboots to be all that relevant, if normal operation of a CubeSat CPU is susceptible to reboots from corruption in data beyond just the few registers that a "frozen" CPU might be examining. It might take a more specific hit to help in this case. Perhaps something to consider for future missions is scheduled CPU reboots, as Philfreeze mentioned? I half expected the interview with Nye to culminate with a little distinct device that literally could "hit" the reboot button, and I could imagine a very robust clock designed to raise the reboot trigger on the primary CPU every, say, 24 or 48 hours. Perhaps the potential risks of rebooting are considered as outweighing the potential benefits? If so, the clock could indeed receive a "reset to 0" every so often from the main computer as a "heartbeat" to prevent unnecessary reboots. ...but someone must have thought about all this already, right?

jkeegan: 05/27/2015 02:43 CDT

Please decide who your audience is. In this sentence: "One is a Linux file redirect that would send the contents of the troublesome beacon.csv file to a null location, a sort-of software black hole. Lab testing on this fix has been promising—over a gigabyte of beacon packets have already been sent into nothingness without a system freeze." ...who is your audience? People who don't know about /dev/null and need it described as a software "black hole" probably don't care about the details of the troublesome csv file size limitation. People who DO know would probably relate to /dev/null easier than deciphering your software black hole description.

TerrenceNyathi: 05/27/2015 03:07 CDT

I am willing to do the job.

CharlesHouston: 05/27/2015 03:27 CDT

It may be hard to get a TLE from the JSpOC, they do not release TLEs for some spacecraft and the OTV-4 is one of them. I just looked at space-track.org and the TLEs for all of the spacecraft from that mission are blank. Mike McCants maintains a file that can help us get more current TLEs.

CharlesHouston: 05/27/2015 07:42 CDT

And the second commenter mentioned N2YO.com - they also get their TLEs from the JSpOC, so their TLE is no more accurate than any other. Hopefully the Air Force would send a current TLE to the owner of a satellite. And if someone mentions CelesTrak, they also get their TLEs from the Air Force. I looked at Dr TS Kelso's CelesTrak site and it does not have the TLEs.

vincentcius: 05/27/2015 08:21 CDT

I am a layman and this is completely off topic, but I have just discovered this site and would like to say that I just love living in a time which I used to read about in SF books. So thanks, and I hope it all turns out well. In my books these sails in time could reach a velocity close to lightspeed. Hope I will witness that too in the near future... good luck.

CharlesHouston: 05/27/2015 11:27 CDT

Oops. After I wrote my two earlier notes I remembered that the Cubes were deployed in a much different inclination and altitude from the OTV-4!! The only likely source of TLEs to search around might be the Aerospace Corporation. Maybe they get TLEs from JSpOC on their AeroCubes???

_101_: 05/28/2015 11:52 CDT

logrotate was added to Linux 19 years ago. Someone never learned or forgot the fundamentals.

megaton: 05/28/2015 03:45 CDT

If you want to reboot your Linux box while it's frozen... http://www.linuxjournal.com/content/rebooting-magic-way

bill: 05/28/2015 03:52 CDT

Next time, include one of these: http://www.st.com/web/en/resource/technical/document/datasheet/CD00176077.pdf (external watchdog timer).

Tomas: 05/28/2015 04:31 CDT

I am sorry but I need to jump in. No known space radiation effect magically restores a satellite processor to its operational state. That has never been credibly observed or documented. There is a common mechanism involving excessive power draw because spacecraft software or hardware is in some strange non-nominal configuration, or faulty attitude control that hinders solar panel pointing. Once the battery is fully depleted, most satellites automatically remove all loads from the satellite battery bus, including the processor. The processor thus has a chance for a clean reboot after the next eclipse egress. But first, the battery has to be depleted. Ironically, that is not easy for LightSail, as its battery is oversized for the satellite. The battery was sized for peak current during the deployment (motor, cameras, gyros, transmitter are all active) and not for energy storage, as is more typical. Radiation upsets need to trigger more power draw to deplete the battery or prevent battery recharging.

Tomas: 05/28/2015 04:36 CDT

In response to earlier comments: There is a watchdog timer on the satellite. In fact, there are several timers. Most important, the satellite has a hardware-based (no software involved at all) timer switch that will force the avionics board to reset every 28 or 35 days (there is contradictory information about which value is correct). This information comes from the Tyvak board datasheet: “Resistor programmable hard reboot timer (1 to 48 days).” Why does the project team believe that this hard reboot timer is not likely to work?

bill: 05/28/2015 04:37 CDT

@tomas I was reading an article and it appears a cosmic particle can flip bits in ICs. http://www.digikey.com/en-US/articles/techzone/2012/may/a-designers-guide-to-watchdog-timers

tomas: 05/28/2015 04:56 CDT

Yes bill -- that is a well-understood effect, but it is unlikely to "uncrash" a processor that has crashed. Corrupting code space with more wrong bits is unlikely to fix it, right? You need a clean initialization from a fresh boot copy, and that is only achieved by unpowering and powering the processor IC again.

chugs: 05/28/2015 09:49 CDT

I think the underlying problem here is that all of these organisations launching CubeSats are underestimating interference, ground operation costs, locations and spacecraft in decaying orbits. I would argue that before we go about launching satellites that we can barely communicate with, we should be launching a space internet network system that has continuous ground station links with a web of geosynchronous sats, with that network communicating with sats that are in geocentric orbit. That way, instead of orienting sats towards Earth, their dishes/transmitters can be connected continuously by this network of sats, which would also form a network with Mars and lunar geosync sats. Space is getting crowded, and limited bandwidth from ground stations trying to transmit through a dense atmosphere is just asking for trouble.

PedroLopez: 05/29/2015 10:10 CDT

OK. let me clue you into a little Systems Administration 101, roll your log files. And what happened to testing? Nowhere on your site do I see CVs of your development team or QA so I can only assume a DevOps approach was used and staffed by people who had little experience. Seriously this is a rookie mistake. It is sad when such a noble experiment fails for such an obvious defect. This is coming from someone with about 20 years of experience in both Systems Administration and Software Development.

dtaviation: 05/29/2015 11:51 CDT

CSV !? You've got to be kidding. Lack of sophistication killed this bird.

PedroLopez: 05/29/2015 12:31 CDT

CSV files are a great approach for this sort of problem. Very lightweight and easy to parse. Other formats would've bloated the file on a system with very limited disk and memory space and a low-end processor. They used the correct format for the file. If you use an Earth-based approach in this problem domain it will bite you.

RealityCheck: 05/29/2015 02:22 CDT

For those of you posting criticism of the project why don't you take a moment to stop and think what purpose your criticism serves. Is it constructive in nature and likely to benefit the engineers in some way or is it self-serving to your ego? If you think you can do a better job then by all means, do it. How many of you have a satellite in orbit?

Bob Ware: 05/29/2015 02:40 CDT

Thank you all for your input. Please keep in mind that this is NOT an actual space flight. It is a Test Bed mission with a new design theory to see what works and does not work. What is learned here can be applied to the 2nd flight.

Edgar: 05/29/2015 04:09 CDT

@Bob: so what was learned from this test bed mission? That you shouldn't continuously write to limited memory until the system crashes? You don't need to launch a CubeSat for that! I think that some pretty incompetent remarks from Jason (charged particles, black hole devices) triggered the criticism to some extent. Also the communication to the public, who finances them, from this group is less than mediocre. I would actually go as far as to recommend that they give this project up and talk to experienced people, e.g. AMSAT, to get it back on track. And Bob, admittedly, I don't have a satellite in space, but I do have some neat stuff, so I don't need to feel inferior.

Zoe Brain: 05/29/2015 06:25 CDT

"The question for software developers is not, 'Are you paranoid?', the question is, 'Are you paranoid enough?' " Brain says. "Every software module, every function, procedure or method has to assume that information coming in may have been spoilt by a malfunction and be prepared for the worst. The system must be ductile - bending, not breaking - when things go wrong. In space no one can press Control/Alt/Delete." (The Age, 2002)

Ed: 05/29/2015 07:05 CDT

This is absurd. I'm a software engineer. This could have been avoided by the obvious expedient of using a watchdog circuit. A watchdog circuit is "a small bit of hardware" that periodically causes a "hard reset", unless the software tells it not to. For example, you could tell the watchdog "perform a reset in 12 hours". As long as you give it another deadline within those 12 hours, it doesn't reset. A simple approach would be for the Earth to have to send a "reset watchdog timer" command in order to reset the watchdog timer. That way, a hard reset occurs (within 12 hours) if a message to prevent it is not received from Earth. A reset also occurs if the software totally fails (since the software isn't saying "DON'T RESET"). I don't understand the absence of a watchdog circuit.

randomcollegestudent: 05/29/2015 11:13 CDT

Try executing something like this as root at regular intervals, lol. I've done it as a temporary bandaid on a couple Linux systems until I found a better solution:

    cd (directory of beacon.csv)
    rm beacon.csv
    touch beacon.csv
    exit

Or, if you need certain info at the start of the file, then make a copy with just that information, then execute this as root at regular intervals:

    cp -R /(location of backup)/beacon.csv /(target location)/
    exit

Andre: 05/30/2015 10:11 CDT

When I read the Kickstarter update I thought "Ah, they are waiting for the watchdog to sort it out". Reading this article I was wondering if a watchdog is missing, but then remembered the many times I was waiting for a watchdog to trigger which never triggered. The reason is that a watchdog is supposed to trigger if a specific action was not registered for a while. In many cases the part talking to the watchdog keeps working while the rest of the system is not responding [well enough]. Continuing this thought, it could be that the team is hoping something crashes/freezes the system enough for the watchdog to trigger. On the testing side it can be said that hindsight is great, but reality dictates that not everything can be tested (particularly under time/money pressures). On one side I agree it would have been sensible to test a full mission, but I'd not want to judge with the information available, and therefore without knowing what effort such testing would have required.

Chris Krupiarz: 05/30/2015 03:09 CDT

For all the Monday morning quarterbacks here and elsewhere, saying what the engineers behind LightSail should or should not have done at this point is useless. This is the time to figure out how to correct the problem, not to waste time placing blame as to why the problem occurred (especially if you don't know precisely why that software was there, which, unless you are on the spacecraft team, you don't). Yes, of course, there are lessons learned, but they are better addressed in the calm aftermath, not when the pressure is there to actually save the mission. Cluttering the brain with all these "should ofs" isn't important. Figuring out the problem and fixing it is.

G_Marcus: 05/31/2015 09:44 CDT

In answer to Mr Krupiarz, I don't think that commenters are necessarily pointing the finger of blame. However, to those of us who have actually developed satellite systems (including CubeSats), this error sequence is astonishing, pointing out many areas where "lessons learned" need to be taken to heart for the next launch.

Chris Krupiarz: 05/31/2015 11:51 CDT

I don't agree. The tone of some of the comments here, as well as many elsewhere, is not helpful. As I said, unless you worked on the program, I don't think you know exactly what the error sequence is. Perhaps it will be astonishing, but having seen this before where people react without all the facts, it can be frustrating for those trying to correct the actual problem in real-time. (And, yes, I've worked and am working on spacecraft, too, which is why I empathize with the folks on LightSail that are being criticized without the full facts being known.)

Philfreeze : 06/03/2015 02:54 CDT

@Chris Krupiarz I don't want to blame them and I don't have to know the specific error since they hope for a complete reboot anyway. I suggested just a scheduled reboot which would solve the problem faster. One does not have to know everything to find a viable solution to a problem or most computer scientists would be useless since they don't have to know all the hardware thingies.

Chris Krupiarz: 06/04/2015 07:51 CDT

@Philfreeze I didn't mean you. :) The suggestion you had is essentially what they ended up doing by rebooting every 24 hours. There's certainly nothing wrong with suggesting ideas. That can help! And you are correct, you don't need to know the entire architecture to do that. I was referring to comments of the "this is absurd" variety over things that can't be changed now and other comments I saw elsewhere.
