What happens to your markers? A look inside the Space Warps Analysis Pipeline
Once you hit the big “Next” button, you’re moving on to a new image of the deep night sky – but what happens to the marker you just placed? And you may have noticed us in Talk commenting that each image is seen by about ten people – so what happens to all of those markers? In this post we take you inside the Space Warps Analysis Pipeline, where your markers get interpreted and translated into image classifications and, eventually, lens discoveries.
The marker positions are automatically stored in a database which is then copied and sent to the Science Team every morning for analysis. The first problem we have to face at Space Warps is the same one we run into in life every day – namely, that we are only human, and we make mistakes. Lots of them! If we were all perfect, then the Space Warps analysis would be easy, and the CFHTLS project would be done by now. Instead though, we have to allow for mistakes – mistakes that we make when we’ve done hundreds of images tonight already and we’re tired, or mistakes we make because we didn’t realise what we were supposed to be looking for, or mistakes we make – well, you know how it goes. We all make mistakes! And it means that there’s a lot of uncertainty encoded in the Space Warps database.
What we can do to cope with this uncertainty is simply to allow for it. It’s OK to make mistakes at Space Warps! Other people will see the same images and make up for them. What we do is try and understand what each volunteer is good at: Spotting lenses? Or rejecting images that don’t contain lenses? We do this by using the information that we have about how often each volunteer gets images “right”, so that when a new image comes along, we can estimate the probability that they got it “right” that time. This information has to come from the few images where we do actually know “the right answer” – the training images. Each time you classify a training image, the database records whether you spotted the sim or caught the empty image, and the analysis software uses this information to estimate how likely you are to be right about a new, unseen survey image. But this estimation process also introduces uncertainty, which we also have to cope with!
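As a rough illustration of how a track record on training images can be turned into those two "how often are you right" numbers, here is a minimal Python sketch. It is not the actual SWAP code: the function name, its arguments, and the choice to start every volunteer at 50/50 by adding one imaginary right answer and one imaginary wrong answer are all assumptions made for this example.

```python
def estimate_skill(sims_seen, sims_marked, duds_seen, duds_passed):
    """Estimate how often a volunteer is right on each kind of training image.

    Hypothetical helper, not part of SWAP. Returns (p_lens, p_dud):
      p_lens: estimated chance of marking an image that contains a lens
      p_dud:  estimated chance of leaving an empty image unmarked
    """
    # Assumption: add one imaginary "right" and one imaginary "wrong" answer,
    # so a brand-new volunteer starts at 0.5 and never hits exactly 0 or 1.
    p_lens = (sims_marked + 1) / (sims_seen + 2)
    p_dud = (duds_passed + 1) / (duds_seen + 2)
    return p_lens, p_dud


# A volunteer who spotted 7 of 8 sims and correctly passed 17 of 18 empty images.
p_lens, p_dud = estimate_skill(8, 7, 18, 17)
print(p_lens, p_dud)
```

With only a handful of training images these estimates are still shaky, which is the extra uncertainty mentioned above.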
We wrote the analysis software that we use ourselves, specially for Space Warps. It’s called “SWAP”, and is written in a language called Python (which hopefully makes it easy for you to read!). Here’s how it works. Every volunteer, when they do their first classification, is assigned a software “agent” whose job it is to interpret its volunteer’s marker placements, and estimate the probability of the image at hand containing a gravitational lens. These “agents” are very simple-minded: in order to make sense of the markers, we’ve programmed them to make a basic assumption: that they can interpret their volunteer’s classification behavior using just two numbers, the probabilities of being right when a lens is present, and of being right when a lens is not present, which they estimate using your results for the training images. The advantage of working with such simple agents is that SWAP runs quickly (easily in time for the next day’s database dump!), and can be easily checked: it’s robust. The whole collaboration of volunteers and their SWAP agents makes up a giant “supervised learning” system: you guys train on the sims, and the agents then try and learn how likely you are to have spotted, or missed, something. And thanks to some mathematical wizardry from Surhud, we also track how likely the agents are to be wrong about their volunteers.
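To make the "two numbers" idea concrete, here is a minimal Python sketch of how an agent could revise an image's lens probability after one classification. It assumes the agent applies Bayes' theorem, which this post does not spell out, and the function and argument names are invented for illustration rather than taken from SWAP.

```python
def update_probability(p, marked, p_lens, p_dud):
    """Return the revised probability that an image contains a lens.

    Hypothetical sketch, assuming a plain Bayes' theorem update.
      p:      probability of a lens before this classification
      marked: True if the volunteer placed a marker on the image
      p_lens: volunteer's chance of being right when a lens is present
      p_dud:  volunteer's chance of being right when no lens is present
    """
    if marked:
        like_lens = p_lens        # marker placed, and there is a lens
        like_dud = 1 - p_dud      # marker placed, but there is no lens
    else:
        like_lens = 1 - p_lens    # no marker, although there is a lens
        like_dud = p_dud          # no marker, and there is no lens
    return like_lens * p / (like_lens * p + like_dud * (1 - p))


# One marker from a volunteer with p_lens = 0.8 and p_dud = 0.9,
# starting from an even chance of a lens.
print(update_probability(0.5, True, 0.8, 0.9))
```

In this sketch, a marker from a volunteer with p_lens = 0.8 and p_dud = 0.9 multiplies the odds of a lens by 0.8 / 0.1 = 8, while a volunteer whose two numbers are both 0.5 leaves the probability where it was.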
What we find is that the agents have a reasonably wide spread of probabilities: the Space Warps collaboration is fairly diverse! Even so, *everyone* is contributing. To see this we can plot, for each image, its probability of containing a lens, and follow how this probability changes over time as more and more people classify it. You can see one of these “trajectory” plots above: images start out at the top, assigned a “prior” probability of 1 in 5000 (about how often we expect lenses to occur). As they are classified more and more times they drift down the plot, and either to the left (the low probability side) if no markers are placed on them, or to the right (the high probability side) if they get marked. You can see that we do pretty well at rejecting images for not containing lenses! And you can also see that at each step, no image falls straight down: every time you classify an image, its probability is changed in response.
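A trajectory like this can be sketched in a few lines of Python. The 1 in 5000 prior comes from the text above; everything else (the function name, the Bayes' theorem update, and the example volunteer with skill numbers 0.8 and 0.9) is an assumption made for illustration, not the SWAP implementation.

```python
PRIOR = 1 / 5000  # prior probability of a lens, as quoted in the post


def trajectory(classifications, prior=PRIOR):
    """Follow one image's lens probability through a series of classifications.

    Hypothetical sketch. `classifications` is a list of tuples
    (marked, p_lens, p_dud), one per volunteer, in the order they were made.
    Returns the list of probabilities, starting with the prior.
    """
    p = prior
    path = [p]
    for marked, p_lens, p_dud in classifications:
        # Assumed Bayes' theorem update using the volunteer's two skill numbers.
        like_lens = p_lens if marked else 1 - p_lens
        like_dud = (1 - p_dud) if marked else p_dud
        p = like_lens * p / (like_lens * p + like_dud * (1 - p))
        path.append(p)
    return path


# Three volunteers in a row mark the image (drifting right),
# or three in a row leave it unmarked (drifting left).
print(trajectory([(True, 0.8, 0.9)] * 3))
print(trajectory([(False, 0.8, 0.9)] * 3))
```

Each entry in the returned list corresponds to one step down the plot, and in this sketch the probability only stays put when a volunteer's two numbers add up to exactly 1.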
Notice how nearly all of the red “dud” images (where we know there is no lens) end up on the left hand side, along with more than 99% of the survey images. All survey images that end up to the left of the red dashed line get “retired” – withdrawn from the interface and not shown any more. The sims, meanwhile, end up mostly on the right, as they are correctly classified as lenses: at the moment we are only missing about 8% of the sims, and when we look at those images, it does turn out they are the ones containing the sims that are the most difficult to spot. This gives us a lot of confidence that we will end up with a fairly complete sample of real lenses.
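The retirement step amounts to comparing each image's current probability with a cut-off, the red dashed line. Here is a minimal Python sketch; the threshold value, the function name and the image names are all hypothetical, since this post does not say where the line actually sits.

```python
RETIREMENT_THRESHOLD = 1e-7  # hypothetical value, not quoted in the post


def split_retired(probabilities, threshold=RETIREMENT_THRESHOLD):
    """Split images into those to retire and those to keep showing.

    Hypothetical sketch. `probabilities` maps an image name to its current
    probability of containing a lens. Images below the threshold are retired.
    """
    retired = {name for name, p in probabilities.items() if p < threshold}
    active = set(probabilities) - retired
    return retired, active


# Made-up image names and probabilities, for illustration only.
retired, active = split_retired({"image_a": 3e-9, "image_b": 0.02, "image_c": 0.97})
print(sorted(retired), sorted(active))
```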
Indeed, what do we expect to find based on the classifications you have made so far? We have already been able to retire almost 200,000 images with SWAP, and have identified over 1500 images that you and your agents think contain lenses with over 95% probability. That means that we are almost halfway through, and that we can expect a final sample of a bit more than 3000 lens candidates. Many of these images will turn out not to contain lenses (a substantial fraction will be systems that look very much like lensed systems, but are not actually lenses) – but it’s looking as though we are doing pretty well at filtering out empty images while spotting almost all the visible lenses. Through your classifications, we are achieving both of these goals well ahead of our expectations. Please keep classifying and there will be some exciting times ahead!
It’s good to Talk! Why we need to hear from you to find Space Warps
Some of you may be wondering what happens to an image after you hit “Next” and why “Talk”ing about your lens candidates is important, so here’s a brief explanation!
WHAT HAPPENS TO THE IMAGES YOU DON’T MARK?
Each night, we retire images from the pool based on your collective classifications. If the community together says no (i.e. enough people have not placed a marker on the image), we throw out the image so that we can focus your classifications on fresh data and images that might contain gravitational lenses. After only five weeks, you guys have made an astonishing 5.2 million classifications. This means we’ve already been able to reject about 60% of the total CFHT Legacy Survey as not containing gravitational lenses!
WHAT HAPPENS TO THE IMAGES YOU DO MARK?
When you mark an image two things happen. First, we record your mark in our database so we can compare it with what other people thought. Second, that image is automatically saved into your Talk profile under a collection called “My Candidates”. Talk allows you to discuss your interesting candidates with the rest of the Space Warps community. It’s great to see so many discussions happening there already, so please keep talking! Talking in Space Warps is an essential part of refining the list of plausible candidates, which is explained next.
HERE’S HOW YOU CAN HELP
As we work our way through the images, it looks as though we are going to end up with a sample of a few thousand lens candidates from your markings. That’s great – it means Space Warps is a very effective filter! But a few thousand is still several times more than the number of actual lenses we expect – so we’ll need to investigate the images of the candidates further before presenting them to the rest of the astronomy community. This is where you, and Talk, can really help us out!
BECOME CURATORS OF YOUR LENS COLLECTIONS
If you see a lens candidate, either when browsing Talk, or while you are marking, that you would like to see investigated further, make a “collection” called ‘Probable Lens Candidates’ and add this object to it! Remember, you can also add images you think are the most likely lens candidates from your automatically filled ‘My Candidates’ collection. Then, later on, you might do some further investigation of the images in your collection – or someone else in the collaboration might do it, after browsing your collections. Either way, collecting the candidates is the first step. You can start a discussion about any candidate or collection any time, and ask the Space Warps community to share their thoughts.
WHAT HAVE YOU FOUND SO FAR?
We’ve just started looking at the most commonly marked images, and there are some promising candidates already being discussed in Talk. Some of these are previously known lens candidates: as you may know, the CFHT Legacy Survey has been searched using automated computer algorithms. We’ve started to label the candidates from those searches in Talk: you’ll see the label “Known Lens Candidate” at the bottom right of the image on the individual object page of Talk. As well as the labels, Budgieye has done a phenomenal job in collecting known CFHT-LS lens candidates from the research literature in a dedicated discussion board. Much like the tricky simulations, some of these known candidates may be difficult to spot.
Most excitingly, some of you have started discussing a few lens candidates that we think have been missed by the algorithms – watch this space for a special post about these potential new lens candidates next week!!!
HOW TO GET STARTED IN TALK
If you want some top tips on using Talk, please visit the discussion board (thanks Budgieye!)
Thanks again for your phenomenal work – and let’s get Talking!!!