What happens to your markers? A look inside the Space Warps Analysis Pipeline

Once you hit the big “Next” button, you’re moving on to a new image of the deep night sky – but what happens to the marker you just placed? And you may have noticed us in Talk commenting that each image is seen by about ten people – so what happens to all of those markers? In this post we take you inside the Space Warps Analysis Pipeline, where your markers get interpreted and translated into image classifications and, eventually, lens discoveries.

The marker positions are automatically stored in a database which is then copied and sent to the Science Team every morning for analysis. The first problem we have to face at Space Warps is the same one we run into in life every day – namely, that we are only human, and we make mistakes. Lots of them! If we were all perfect, then the Space Warps analysis would be easy, and the CFHTLS project would be done by now. Instead, though, we have to allow for mistakes – mistakes that we make when we’ve done hundreds of images tonight already and we’re tired, or mistakes we make because we didn’t realise what we were supposed to be looking for, or mistakes we make – well, you know how it goes. We all make mistakes! And it means that there’s a lot of uncertainty encoded in the Space Warps database.

What we can do to cope with this uncertainty is simply to allow for it. It’s OK to make mistakes at Space Warps! Other people will see the same images and make up for them. What we do is try and understand what each volunteer is good at: Spotting lenses? Or rejecting images that don’t contain lenses? We do this by using the information that we have about how often each volunteer gets images “right”, so that when a new image comes along, we can estimate the probability that they got it “right” that time. This information has to come from the few images where we do actually know “the right answer” – the training images. Each time you classify a training image, the database records whether you spotted the sim or caught the empty image, and the analysis software uses this information to estimate how likely you are to be right about a new, unseen survey image. But this estimation process also introduces uncertainty, which we also have to cope with!
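To make that concrete, here is a rough sketch of how a per-volunteer skill estimate could be computed from training-image counts. The function name, the counts, and the smoothing choice are all illustrative assumptions on our part, not the actual SWAP code:

```python
def agent_skill(sims_seen, sims_spotted, duds_seen, duds_rejected):
    """Estimate a volunteer's two skill probabilities from training images.

    Starts each count at 1 (Laplace smoothing), so a brand-new volunteer
    is treated as being right about half the time, rather than 0% or 100%.
    """
    p_lens = (sims_spotted + 1) / (sims_seen + 2)   # P(marks image | lens present)
    p_dud = (duds_rejected + 1) / (duds_seen + 2)   # P(leaves image blank | no lens)
    return p_lens, p_dud

# A volunteer who spotted 6 of 8 sims and correctly passed over 9 of 10 duds:
p_lens, p_dud = agent_skill(8, 6, 10, 9)
```

The smoothing matters: without it, someone who has only seen one training image would be scored as perfectly reliable (or perfectly unreliable), which is exactly the kind of over-confidence the analysis has to avoid.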

We wrote the analysis software that we use ourselves, specially for Space Warps. It’s called “SWAP”, and is written in a language called Python (which hopefully makes it easy for you to read!) Here’s how it works. Every volunteer, when they do their first classification, is assigned a software “agent” whose job it is to interpret its volunteer’s marker placements, and estimate the probability of the image at hand containing a gravitational lens. These “agents” are very simple-minded: in order to make sense of the markers, we’ve programmed them to make a basic assumption: that they can interpret their volunteer’s classification behavior using just two numbers – the probability of being right when a lens is present, and the probability of being right when a lens is not present – which they estimate using your results for the training images. The advantage of working with such simple agents is that SWAP runs quickly (easily in time for the next day’s database dump!), and can be easily checked: it’s robust. The whole collaboration of volunteers and their SWAP agents makes up a giant “supervised learning” system: you guys train on the sims, and the agents then try and learn how likely you are to have spotted, or missed, something. And thanks to some mathematical wizardry from Surhud, we also track how likely the agents are to be wrong about their volunteers.
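The heart of an agent’s job is a Bayesian update: combine the image’s current lens probability with the volunteer’s two skill numbers to get a new probability. Here is a minimal sketch of that update, in Python since that’s SWAP’s language; the function name and the example numbers are our own illustration, not SWAP’s actual code:

```python
def update(p, marked, p_lens, p_dud):
    """One Bayesian update of an image's lens probability.

    p      -- current probability that the image contains a lens
    marked -- True if the volunteer placed a marker on it
    p_lens -- agent's estimate of P(volunteer marks it | lens present)
    p_dud  -- agent's estimate of P(volunteer leaves it blank | no lens)
    """
    if marked:
        # Marking is the "right" response to a lens, "wrong" to a dud.
        likelihood_lens, likelihood_dud = p_lens, 1.0 - p_dud
    else:
        likelihood_lens, likelihood_dud = 1.0 - p_lens, p_dud
    numerator = likelihood_lens * p
    return numerator / (numerator + likelihood_dud * (1.0 - p))

# A skilled volunteer (right ~90% of the time either way) marks an image
# that currently sits at the prior of 1-in-5000: its probability rises.
p_new = update(1.0 / 5000.0, True, 0.9, 0.9)
```

Notice that the same marker moves the probability by different amounts for different volunteers: the more reliable your agent believes you are, the bigger the jump.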


What we find is that the agents have a reasonably wide spread of probabilities: the Space Warps collaboration is fairly diverse! Even so, *everyone* is contributing. To see this we can plot, for each image, its probability of containing a lens, and follow how this probability changes over time as more and more people classify it. You can see one of these “trajectory” plots above: images start out at the top, assigned a “prior” probability of 1 in 5000 (about how often we expect lenses to occur). As they are classified more and more times they drift down the plot, moving either to the left (the low probability side) if no markers are placed on them, or to the right (the high probability side) if they get marked. You can see that we do pretty well at rejecting images for not containing lenses! And you can also see that at each step, no image falls straight down: every time you classify an image, its probability is changed in response.

Notice how nearly all of the red “dud” images (where we know there is no lens) end up on the left hand side, along with more than 99% of the survey images. All survey images that end up to the left of the red dashed line get “retired” – withdrawn from the interface and not shown any more. The sims, meanwhile, end up mostly on the right, as they are correctly classified as lenses: at the moment we are only missing about 8% of the sims, and when we look at those images, it turns out that they contain the sims that are the most difficult to spot. This gives us a lot of confidence that we will end up with a fairly complete sample of real lenses.
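Putting the prior, the updates, and the two thresholds together, the life of an image can be sketched in a few lines. The prior and the 95% candidate threshold come from this post; the retirement threshold, volunteer skill values, and function name are illustrative assumptions, not SWAP’s real settings:

```python
PRIOR = 1.0 / 5000.0   # prior probability of containing a lens (from the post)
RETIRE_BELOW = 1e-7    # hypothetical left-hand (red dashed line) threshold
DETECT_ABOVE = 0.95    # candidate threshold quoted in the post

def classify_until_decided(markings, p_lens=0.9, p_dud=0.9):
    """Follow one image's trajectory through a sequence of classifications.

    markings -- list of booleans: True where a volunteer marked the image.
    Assumes, for simplicity, every volunteer has the same skill numbers.
    """
    p = PRIOR
    for marked in markings:
        if marked:
            num, den = p_lens * p, (1.0 - p_dud) * (1.0 - p)
        else:
            num, den = (1.0 - p_lens) * p, p_dud * (1.0 - p)
        p = num / (num + den)          # Bayesian update, as in the trajectory plot
        if p < RETIRE_BELOW:
            return "retired", p        # withdrawn from the interface
        if p > DETECT_ABOVE:
            return "candidate", p      # kept active to gather more opinions
    return "active", p
```

With these toy numbers, a handful of consistent non-detections is enough to retire an image, while about half a dozen consistent markings push it over the candidate line – which is why most survey images can be retired after only a few classifications.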

Indeed, what do we expect to find based on the classifications you have made so far? We have already been able to retire almost 200,000 images with SWAP, and have identified over 1500 images that you and your agents think contain lenses with over 95% probability. That means that we are almost halfway through, and that we can expect a final sample of a bit more than 3000 lens candidates. Many of these images will turn out not to contain lenses (a substantial fraction will be systems that look very much like lensed systems, but are not actually lenses) – but it’s looking as though we are doing pretty well at filtering out empty images while spotting almost all the visible lenses. Through your classifications, we are achieving both of these goals well ahead of our expectations. Please keep classifying and there will be some exciting times ahead!


9 responses to “What happens to your markers? A look inside the Space Warps Analysis Pipeline”

  1. Jean Tate says :

    Very cool!

    In the top chart/plot/graph, what is the distinction between black/dark grey-filled circles (datapoints?), grey-filled circles, open grey circles, and open blue ones? What are the horizontal lines on (some of) the circles, some kind of error bar?

    • Phil Marshall says :

      Each point represents a different image, and so the plot shows 500 randomly selected images getting classified over time. The points are all slightly translucent, so when they lie on top of each other, you see a darker point. Filled points are “active” – they lie between the two vertical threshold lines, and are still being classified. Open points to the left of the red line have been “retired” so that you don’t have to look at those images any more! Open points to the right of the blue line represent “candidates” – images that the classifiers think contain lenses. Those ones are not retired – there are not that many of them, and it’s good to get more opinions on these interesting images. Grey points are the images we are interested in: the survey images. Blue points represent training images containing simulated lenses, and so should end up on the right; red points represent training images known not to contain lenses (“duds”) and should end up on the left. The horizontal lines attached to the points are indeed “error bars” – they give a rough idea of how uncertain that image’s probability is. Good questions, thanks!

      • Jean Tate says :

        That is very, very cool! Thank you.

        Every line starts at 0.5 classifications and 2×10^-4 posterior probability; well, I guess they have to start somewhere!

        Something very strange – and potentially interesting (tho’ perhaps not to SW-ites) – is that there seem to be ~3 objects which have not made it to ‘candidate’ nor to 10^-4). What was special about this one?

      • Jean Tate says :

        Darn it! 😦 The WP interpreter converted the text between the “less than” and “greater than” characters in my last reply to an unintelligible HTML tag, resulting in a rather meaningless sentence!

        Trying again … “is that there seem to be ~3 objects which have not made it to ‘candidate’ nor to” less-than 10^-4, even after 20+ classifications. What are the reasons for this? At ~1%, there must be an awful lot of them in the whole database!

        Also, though it’s not clear (to me), there seems to be one object which ‘arose from the dead’ (i.e. it crossed the red line before/by 10 classifications, but then ‘tacked right’, ending up – after ~100 classifications – at greater-than 10^-4). What was special about this one?

      • Phil Marshall says :

        The images that are still hovering between the dashed lines (neither rejected nor detected) are probably systems where there is genuine doubt over objects that look a bit like lenses. I would expect most to eventually settle down in one camp or the other, but if they don’t, well, then they will just have to stay as low probability candidates! The important thing is to be able to assign a numerical probability to each image, based on the volunteers’ classifications: without this, we would not be able to make good decisions about follow-up analysis etc. Images that cross the red line cannot come back – but they can nearly cross it, and then oscillate right again. Such images are like the others between the lines – low probability candidates. You’re right about there being a large number of them: we set the detection threshold high to keep the candidate sample small, and we hope, pure.

  2. mitch says :

    Thanks Dr. Phil,
    Explains a lot of the things that were percolating up in the back of my mind. What happens when you don’t get any more training images in the pipeline? Are they all retired now?

    • Phil Marshall says :

      As images are retired, new ones are brought in to the system; this will keep going until either we complete the project, or we decide to focus on a particular subset of images (in which case we’ll activate those ones by hand). If you, individually, stop seeing images, it’s because you have seen all the images that are currently active (the system is set up to never show you the same image twice). When that happens, I guess it’s time to spend some time in Talk while you wait for the rest of the collaboration to retire some images!

  3. James Pruett says :

    Did you ever consider using the Pareto principle? It is a super easy way of handling these types of things on earth and I don’t doubt that it works as well in the universe. Granted, it only finds 80% with one pass but that’s probably more than you will ever use anyway.

    • Phil Marshall says :

      Unfortunately 80% is not high enough: ultimately we need to reject 99.99% of our images! One of the things we will learn from the analysis is the distribution of classification probabilities, and it will be interesting to compare with a Pareto distribution!
