Reobservations Report No. 3: Selecting the Finalist Candidates
From March 18th through the 20th, SETI@home will take a break from its usual task of surveying the sky in search of an alien transmission. For three successive days SETI@home will have use of the giant Arecibo radio telescope to revisit the most promising candidate signals detected since the project was launched in 1999. SETI@home Chief Scientist Dan Werthimer and his team have put together a list of the "best" 200 locations in the sky where promising candidates have previously been detected. While they hope to revisit as many of them as possible, realistically, even 100-150 reobservations would be considered a success.
Picking the right 200 candidate signals, however, is crucial. Since the project's launch in 1999, SETI@home users around the world have detected no fewer than 5 billion(!) different gaussians, triplets, and pulses, any one of which could potentially be that sought-for signal. This means that for every signal that is selected for reobservation at Arecibo, 25 million others will be discarded, and most likely never observed again. Obviously, SETI scientists want to make sure they make the right choices. There is no easy way of selecting the 200 "best" candidate signals out of 5 billion. But the stakes are high, and the SETI@home crew, led by Project Director David Anderson, Dan Werthimer and Eric Korpela, has been constantly refining and improving the selection criteria. It has been an ongoing process, and the selection criteria that appeared satisfactory a few months ago are not necessarily those that will actually be used in picking the 200 finalists.
The first stage is relatively straightforward: the least reliable candidate signals are weeded out in a process called "data integrity check," and those that are most likely the result of detection or computer error are eliminated. Then all candidate signals are compared to a database of known Radio Frequency Interference (RFI) sources. These are strong human-made radio transmissions generated by radars, satellites, and the like, which operate in the vicinity of Arecibo. If a SETI@home signal appears to match a known RFI source, then it too is removed from the list.
Once these obvious "false alarms" are eliminated, however, SETI@home scientists are still left with several billion candidate signals to choose from - still far too many. Each of the remaining candidates must now be assigned a score, representing the likelihood that it represents a "true" signal. The top 200 scorers will be the ones to earn another visit from Arecibo.
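The two weeding-out steps described above can be pictured as a pair of filters. The following is only a minimal sketch: the Candidate fields, the frequency-only RFI match, and the 100 Hz tolerance are assumptions made for illustration, not details of the actual SETI@home software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    signal_id: int          # hypothetical identifier
    passed_integrity: bool  # outcome of the "data integrity check"
    frequency_hz: float     # frequency at which the signal was detected


def matches_known_rfi(candidate, rfi_frequencies_hz, tolerance_hz):
    """True if the candidate lies within tolerance of any listed RFI frequency."""
    return any(abs(candidate.frequency_hz - f) <= tolerance_hz
               for f in rfi_frequencies_hz)


def remove_false_alarms(candidates, rfi_frequencies_hz, tolerance_hz=100.0):
    """Keep candidates that pass the integrity check and match no known RFI."""
    return [c for c in candidates
            if c.passed_integrity
            and not matches_known_rfi(c, rfi_frequencies_hz, tolerance_hz)]


candidates = [
    Candidate(1, True, 1_420_000_000.0),
    Candidate(2, False, 1_420_100_000.0),  # fails the integrity check
    Candidate(3, True, 1_420_200_050.0),   # 50 Hz from a listed RFI source
]
survivors = remove_false_alarms(candidates, rfi_frequencies_hz=[1_420_200_000.0])
print([c.signal_id for c in survivors])
```

Only the candidates that survive both filters would go on to be scored.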
The Value of Persistence
The first and most important criterion in determining the candidates' scores is whether a signal had been detected repeatedly over time. Bitter experience has taught SETI scientists that a signal that has been detected only once and never again is not a good candidate for an extraterrestrial communication. Consider the "Wow!" signal for example: detected in 1977, it was the strongest and clearest signal received by SETI at the time, and remains SETI's most celebrated signal to this day. It was, however, never heard from again despite repeated efforts, and as a result we are still not sure what it truly was. Because of this experience, SETI@home scientists insist that candidate signals must be persistent and reliable to be strong candidates for an extraterrestrial transmission. Only candidate signals that have been detected more than once in the same location on separate SETI runs are to be considered. These candidate signals, composed of two, and sometimes three separate observations, are referred to as "multiplets" by the SETI@home team.
Gaussians are the power curves produced when the Arecibo beam scans a steady celestial radio source. The signal is weak at first, strong when it is at the center of the beam, and then fades again. This produces a bell shaped power curve known as a gaussian.
Pulses represent any celestial radio signal of a fixed frequency that is distinguishable above the background noise.
Triplets are sets of three equally spaced spikes. Whereas gaussians represent a constant signal from space, triplets may represent a series of pulses transmitted at fixed time intervals.
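To make the "equally spaced" requirement for triplets concrete, here is a small sketch that checks whether three spike times are separated by equal intervals. The function name and the one-millisecond tolerance are hypothetical and chosen only for illustration.

```python
def is_evenly_spaced(spike_times_s, tolerance_s=1e-3):
    """Return True if three spike times are separated by equal intervals."""
    if len(spike_times_s) != 3:
        raise ValueError("a triplet needs exactly three spike times")
    t1, t2, t3 = sorted(spike_times_s)
    # Compare the gap between the first two spikes with the gap between the last two.
    return abs((t2 - t1) - (t3 - t2)) <= tolerance_s


print(is_evenly_spaced([0.5, 2.5, 4.5]))  # gaps of 2.0 s and 2.0 s
print(is_evenly_spaced([0.5, 2.5, 5.5]))  # gaps of 2.0 s and 3.0 s
```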
In most cases, a "multiplet" is a repeating signal that has the same characteristics each time it is detected: a gaussian found where one had been detected before, a triplet in the place of a triplet, or a pulse in the place of a pulse. But there is an additional type of signal that should be considered. In some cases a particular kind of signal, say a gaussian, was detected at a given location during one pass of the Arecibo dish, while a different kind of signal, say a triplet, was detected coming from the same direction during a later pass. SETI@home scientists combine the two (or more) candidate signals into a single candidate, and refer to it as a "metacandidate."
The notion that a signal should be detected more than once in order to be considered a candidate seems straightforward enough. Implementing this seemingly simple criterion, however, is by no means easy. For how are we to decide whether separate candidate signals, detected at different times, are truly different, or multiple detections of the same one?
Clearly, the two (or more) detections should originate from the same part of the sky. This is indeed an important consideration used by the SETI@home team: the closer the two (or more) detections are to each other in the sky, the more likely they are to represent a deliberate transmission rather than random noise.
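How "close" two detections are in the sky can be expressed as the angle between their positions. The sketch below uses the standard haversine formula for the great-circle angle between two points given in right ascension and declination. The coordinates in the example are invented, and the article does not say which separation threshold the team actually applies.

```python
import math


def angular_separation_deg(ra1_deg, dec1_deg, ra2_deg, dec2_deg):
    """Great-circle angle between two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians,
                               (ra1_deg, dec1_deg, ra2_deg, dec2_deg))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # min() guards against rounding pushing sqrt(h) slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


# Two made-up detections a tenth of a degree apart in declination.
print(angular_separation_deg(10.0, 20.0, 10.0, 20.1))
```

A smaller angle between two detections would, by the reasoning above, count more strongly in favor of their being the same source.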
Thinking like an Alien
Naturally, the multiple detections should also be at the same frequency. How close, however, should the detection frequencies be in order to be considered the same? This question is particularly delicate if we remember that the frequency at which we detect a signal is different from the frequency at which it was transmitted. This is due to the Doppler effect, brought about by the fact that the Earth is almost certainly in motion relative to the object transmitting the signal, and that the speed of this relative movement is different at different times. If an alien civilization is sending a beacon to the Earth at a fixed frequency, their transmission will be received at slightly different frequencies depending on whether their planet is moving towards the Earth or away from it, and at what speed. SETI scientists can correct this Doppler drift to a certain extent, by compensating for the Earth's own movements within the Solar System. Instead of considering the frequency at which a signal was detected, they calculate the frequency at which the signal would have been received at our Solar System's center of gravity, also known as its "barycenter." If the aliens are making similar barycentric corrections to their transmissions, and compensating for the movements of their own planet, this would create a "magical frame of reference." Candidate signals within this magical frame would not drift, but remain at their original frequency.
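The size of the effect can be illustrated with the first-order Doppler formula, in which an observer moving toward the source at speed v receives a frequency raised by the factor (1 + v/c). The sketch below is an approximation under that assumption only. The function name, the 1420 MHz example frequency, and the 30 km/s velocity (roughly the Earth's orbital speed) are illustrative choices, and a real barycentric correction would use the observer's velocity relative to the barycenter projected along the line of sight to the source.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def to_barycentric_frequency(observed_hz, velocity_toward_source_m_s):
    """Undo the observer's first-order Doppler shift.

    Approaching the source multiplies the received frequency by (1 + v/c),
    so dividing by the same factor recovers the frequency that an observer
    at rest with respect to the barycenter would have measured.
    """
    return observed_hz / (1.0 + velocity_toward_source_m_s / SPEED_OF_LIGHT_M_S)


emitted_hz = 1_420_000_000.0  # illustrative frequency, not from the article
v = 30_000.0                  # illustrative line-of-sight velocity in m/s
observed_hz = emitted_hz * (1.0 + v / SPEED_OF_LIGHT_M_S)

print(observed_hz - emitted_hz)                      # Doppler shift in Hz
print(to_barycentric_frequency(observed_hz, v))      # back to the emitted value
```

With these example numbers the shift is on the order of a hundred kilohertz, which shows why uncorrected detections of one transmitter can land at noticeably different frequencies.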
Would an alien civilization cooperate with us in creating this "magical frame of reference?" Some think it likely, but of course we cannot know for sure. If we assume that they would, it means that we should concentrate our search only on candidate signals that are detected several times at almost exactly the same frequency. If, however, the aliens do not oblige us by making barycentric corrections, then we should consider candidate signals to be "repeaters" even if their multiple detections occur at somewhat different frequencies.
After considering the matter thoroughly, SETI@home scientists decided to hedge their bets. Accordingly, all calculations of the likelihood that a repeating signal is in fact an alien transmission would proceed on two parallel tracks. One set of calculations would assume that a "real" signal would be barycentrically corrected at the origin, and would therefore insist on a very tight frequency fit between the different occasions at which it was detected. A parallel set of calculations would not assume that a "real" transmission would be corrected by the aliens, and would therefore allow for greater frequency variation.
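The two parallel tracks amount to testing the same pair of detections against two different frequency tolerances. The following sketch shows the idea; both tolerance values are hypothetical placeholders, since the article does not give the figures the team uses.

```python
NARROW_TOLERANCE_HZ = 1.0      # hypothetical: transmission corrected at the origin
WIDE_TOLERANCE_HZ = 50_000.0   # hypothetical: no correction by the sender


def frequency_tracks(freq_a_hz, freq_b_hz):
    """Report on which track(s) two detections count as the same signal."""
    offset = abs(freq_a_hz - freq_b_hz)
    return {
        "corrected": offset <= NARROW_TOLERANCE_HZ,
        "uncorrected": offset <= WIDE_TOLERANCE_HZ,
    }


print(frequency_tracks(1_420_000_000.0, 1_420_000_000.5))
print(frequency_tracks(1_420_000_000.0, 1_420_001_000.0))
```

A pair that matches on the narrow track always matches on the wide one as well, so the wide track simply admits additional candidates.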
The end result is eight different classes of repeating candidate signals: the four classes of gaussians, triplets, pulses, and metacandidates, each evaluated under both the narrow "barycentrically corrected" criterion and the wider "uncorrected" one.
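The count of eight follows from pairing each of the four signal types with each of the two frequency criteria, which can be enumerated in a few lines. The label strings are taken from the text, while the variable names are illustrative.

```python
from itertools import product

SIGNAL_TYPES = ("gaussian", "triplet", "pulse", "metacandidate")
FREQUENCY_TRACKS = ("barycentrically corrected", "uncorrected")

# Every signal type is evaluated once on each frequency track.
classes = list(product(SIGNAL_TYPES, FREQUENCY_TRACKS))

for signal_type, track in classes:
    print(f"{signal_type} ({track})")
print(len(classes))
```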
The Final Score
Each multiplet in each of the eight categories is now examined in relation to several additional criteria. Some of these factors depend on the type of signal. Gaussians, for instance, will be ranked according to how well their curve matches a perfect gaussian generated by the Arecibo dish, as well as by their strength. The stronger the signal, the more likely it is to be caused by ET rather than noise. Triplets and spikes are not restricted to a particular shape, but their score also improves with their strength.
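One simple way to picture "how well a curve matches a perfect gaussian" is to compare measured power samples against an ideal bell curve and average the squared differences. This is only a sketch: the mean squared misfit is a stand-in chosen for simplicity, the article does not say which statistic the team uses, and all names and sample values below are invented.

```python
import math


def gaussian_power(t, peak_power, t_center, sigma):
    """Ideal bell-shaped power curve as the beam drifts across a steady source."""
    return peak_power * math.exp(-((t - t_center) ** 2) / (2.0 * sigma ** 2))


def mean_squared_misfit(times, powers, peak_power, t_center, sigma):
    """Average squared difference between measured power and the ideal curve."""
    residuals = [p - gaussian_power(t, peak_power, t_center, sigma)
                 for t, p in zip(times, powers)]
    return sum(r * r for r in residuals) / len(residuals)


times = [float(t) for t in range(11)]
ideal = [gaussian_power(t, 5.0, 5.0, 2.0) for t in times]  # perfect bell curve
flat = [2.0] * len(times)                                   # no bell shape at all

print(mean_squared_misfit(times, ideal, 5.0, 5.0, 2.0))
print(mean_squared_misfit(times, flat, 5.0, 5.0, 2.0))
```

Under this measure a lower misfit means a closer match, so the ideal samples score better than the flat ones.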
The Star Factor
The formula used to rank the different stars according to the likelihood that they would host a communicating civilization is:
N is a normalizing factor, 1.65x10^7
bv is the b-v color
bv0 is the b-v color of the bluest star in the catalog (-0.41)
bv_sun is the b-v color of the Sun (+0.65)
par is the parallax in milliarcseconds
The formula was developed by SETI@home scientist Eric Korpela.
Another factor that contributes to a signal's score is its location in the sky. A signal that comes from the direction of a known star or galaxy will be given preference over one that appears to emerge from empty space. To check for this, the SETI@home crew relied on the Hipparcos catalogue - the most comprehensive list of stars available. Hipparcos lists no fewer than 33,000 main sequence stars within Arecibo's observation band, and all of these are compared with each signal. To these are added the numerous distant galaxies that dot the skies at Arecibo's latitude, on the assumption that a signal might just possibly originate from one of them as well.
When it comes to scoring candidate signals, however, not all stars are equal. This is because, according to SETI wisdom, some stars are more likely to host a communicating alien civilization than others. Thus, for example, only main-sequence stars are considered for signal-scoring purposes, excluding red giants and white dwarfs. Short-lived stars, whose lifespan is only a few million years, are also excluded from consideration, since complex life would not have had time to evolve in such an environment. Nearby stars, on the other hand, get "extra credit" in their scoring, since it would be comparatively easier to communicate with civilizations in our galactic neighborhood than with those in distant parts of our galaxy or beyond. Finally, the more similar a star is to our own Sun, the higher its score, since many consider solar type stars most promising for the evolution of life.
The extrasolar planets discovered in recent years are also factored into the equation: a signal originating from the direction of a star with known planets will certainly receive special attention. The ideal signal, in other words, would originate from the direction of a nearby main-sequence Sun-like star with known planets.
Based on all these factors, every multiplet of each of the eight types is assigned a "detection score," and ranked accordingly. A major problem nonetheless remains. A signal's "detection score" effectively compares gaussians to other gaussians and triplets to other triplets, and determines which ones are most likely to represent intelligent transmissions. But in order to come up with a list of the 200 best candidate signals overall, it is also necessary to compare gaussians to triplets, and spikes to metacandidates, and decide which are the most promising. To resolve this, each signal is assigned not only a "detection score," which is specific to each type of signal, but also a "metascore," which can be compared with all the different types of candidate signals. The 200 candidate signals with the best metascores are the ones that Arecibo will aim for.
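The article does not explain how a metascore is computed, so the following sketch should not be read as the team's method. It only illustrates the general problem of putting type-specific scores on a common scale, using an assumed stand-in: each detection score is replaced by its rank fraction within its own signal type, and the highest fractions across all types are then selected. All names and scores are invented.

```python
import heapq


def rank_fraction_metascores(scores_by_type):
    """Map each detection score to its rank fraction within its own type.

    ASSUMPTION: rank-based normalization is a stand-in for illustration;
    the real metascore calculation is not described in the article.
    """
    metascored = []
    for signal_type, scored in scores_by_type.items():
        ranked = sorted(scored, key=lambda item: item[1])
        for rank, (signal_id, _score) in enumerate(ranked, start=1):
            metascored.append((rank / len(ranked), signal_type, signal_id))
    return metascored


def top_candidates(scores_by_type, how_many):
    """Pick the candidates with the highest metascores across all types."""
    return heapq.nlargest(how_many, rank_fraction_metascores(scores_by_type))


scores_by_type = {
    "gaussian": [("g1", 3.2), ("g2", 9.1)],
    "triplet": [("t1", 120.0), ("t2", 80.0), ("t3", 95.0)],
}
best = top_candidates(scores_by_type, how_many=2)
print(sorted(signal_id for _meta, _type, signal_id in best))
```

In this toy example the raw triplet scores are much larger than the gaussian scores, yet the best gaussian and the best triplet end up tied at the top once both are placed on the common scale.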
It must be clear by now that selecting the 200 "best" candidate signals out of 5 billion is no easy task. With millions of candidate signals discarded for every one revisited, SETI@home researchers can never be absolutely certain that they did not miss the only "real" signal in the pile. All they can do is try their best, and hope that by using a careful and meticulous method to sift through the cosmic haystack, that one true signal will not slip by unnoticed.