This morning I am at the first seminar arranged by the University of Edinburgh Citizen Science and Crowdsourced Data and Evidence Network. The Network brings together those interested in citizen science and crowdsourcing from across the organisation and this event is also supported by the Academic Networking Fund, IAD. Today’s seminar looks at the Zooniverse crowdsourcing organisation and suite of projects with two guest speakers, and I’ll be taking live notes here. As usual, because these are live notes there may be errors, typos, formatting issues, etc and corrections are welcomed.
We are starting our day with an introduction by James Stewart on the focus of the network, which will concentrate particularly on methodological approaches.
Grant Miller (Zooniverse): ‘The Zooniverse – Real Science Online’
About Grant and his talk:
‘The Zooniverse is the world’s largest and most successful citizen science platform. I will discuss what we have learned from building over 40 projects, and where the platform is heading in the future.’ (Website: https://www.zooniverse.org/)
Grant Miller is a recovering astrophysicist who gained his PhD from the University of St Andrews, searching for planets orbiting distant stars. He is now the communications lead for the Zooniverse on-line citizen science platform.
I had kind of a weird introduction to crowdsourcing and citizen science… But the main thing I will be talking about today is how we engage the Zooniverse community to participate, and how we help them enjoy doing that and being part of our community.
Zooniverse all started with Kevin, a student at Oxford who was tasked with looking at thousands of images of the universe to find two sorts of galaxies: elliptical galaxies and spiral galaxies. He had a million to classify. He did 50,000 and then met with his supervisor and made some strong arguments: he didn’t want to spend his whole academic career classifying galaxies, and he argued that it didn’t require his training. So, by show of hands, who thinks this image of a galaxy (we are looking at one of many) is an elliptical, and how many think it is a spiral? The room votes that this is a spiral and it is indeed a spiral – and that’s basically how Zooniverse works. We show an image, we ask people what it is, and they choose. And people, en masse, really went for this. They went through huge amounts of images very quickly.
Other things started to happen too… The first community around the project was the Galaxy Zoo forum. A participant called Hanny found a thing (now known as Hanny’s Voorwerp)… It didn’t look like the galaxies she was classifying. This was a completely new astronomical phenomenon, which had never been known about. An amateur had found this through this very simple platform. People aren’t just good at recognising patterns, they also get distracted and find new things. And after discovering and publishing on this phenomenon – a huge cloud of gas associated with a galaxy – a group from the community decided to make a project of looking for more of these in other Galaxy Zoo images. And this is why communities are so brilliant. On another project our community found a whole new worm under the sea. That’s the power of having this community taking part.
So, how do we do this? Well, we really simplify the language of the task to make it easy for people to take part. And when Galaxy Zoo took off we found other scientists and researchers approaching us to build new projects, including humanities projects and biological projects. So we set up projects such as Snapshot Serengeti – used to identify what you can see in images from camera traps on the Serengeti. I was working with a group of computer scientists trying to work out how to identify the object in an image; I also showed it to my 4 year old nephew… and he answered in seconds, while the computer scientists are still looking for a solution.
So at this point in time we now have 42 projects in the Zooniverse. Old Weather in 2010 was our first humanities project. It started as a climatology project, but because it was using historic ship logs, and those include so many other types of data, we found humanities researchers and historians coming on board, so it has had a second life. We have other humanities projects, cancer research projects, etc. Of those projects about 30-35 are currently live. We think this will expand rapidly soon but I’ll come back to that. And last year we passed the 1 million volunteer mark – that’s registered volunteers. Mostly those are in Western Europe and North America, but we have participants in 200 countries (only 7 countries have none).
The community is expanding, the projects are expanding… But there is a lot of potential out there, a huge cognitive surplus we could be using. For instance Clay Shirky notes that 200 billion hours are spent watching TV by adults in the UK, while it took only 100 million hours to create Wikipedia. We are only beginning to tap that potential. On January 7th last year we relaunched a project called Space Warps – we had over a million classifications an hour – when Prof Brian Cox and Dara Ó Briain asked the public to do it on live TV. That meant that overnight we had discovered an object it can take astronomers years to discover. It’s good, but it’s no 200 billion hours… Imagine what you could do with that much time. Every hour, 16 years’ worth of human effort is spent playing Angry Birds… How do we get that effort into citizen science?
So, is gamification the way to go? For those working in citizen science, you could probably run a week-long conference just on whether you should or should not use gamification. We have decided not to, but some of the most successful projects – Foldit and Eyewire – do use it. Those projects put huge thought into rewarding participants’ efforts in the right way so that people don’t just game the system. For us, we worry that won’t work: we are not convinced we would be good enough at building a game, and we might end up with something that is neither a game nor citizen science. But some of our projects have tried gamification and we have studied this. On Galaxy Zoo we used a leaderboard to start with, but that caused some tension: those in the lead were doing hundreds of thousands of classifications and people felt the leaders might have cheated, while others felt that they could never get there so just left. On Old Weather we enabled those participants who focused on a particular ship’s log to become captain – but it put off as many people as it attracted. And those who became captain had nowhere to go.
This comes back to motivation for taking part. When we ask our volunteers, it frequently comes down to those participants wanting to contribute to research. So, for instance, The Andromeda Project involved images that weren’t that exciting… Volunteers were asked to circle star clusters. The task is simple, they feel they are really contributing… They finished the task in a week. This time, when we had finished we put up a message thanking participants for their contribution, saying that we had enough for the paper, but that they were welcome to carry on… And that shows a rapid fall to zero participation – they were only interested while the task at hand was useful. And that pattern reminds us not to mess with our community: they use precious spare time and they want to be doing something useful and meaningful.
Planet Hunters is a project we use to detect planets in the data. People don’t take part just to discover planets; it is because they really are interested in the science. Some of our really active participants choose to download the data, write their own code, doing work at PhD level as volunteers and sending data back… The planets discovered in that project are rare and weird – things we didn’t spot with algorithms – the first one found had 4 suns. And recently we found a seven-planet solar system, the largest other than our own.
Volunteers are keen to go further, so we have a discussion area – labelled Talk – for all of our projects. That means you can comment, Twitter-style, or you can use old-style discussion boards for long-form discussions. Those areas are also used by the scientists, the researchers, the technical teams and developers, and the community can interact with them there – the most productive findings often come from that interaction between volunteers and scientists. The Talk areas of our community are really important. In fact, in a network diagram of our community we can see some of our most active participants – one huge green blob on this diagram is a wonderful woman called Elizabeth who posts and comments, moderates, and helps fellow volunteers come along. And we are looking at those networks, at who those linchpins are, etc.
I said that people write their own code, do their own analysis… So can we get that on the site? We have been playing with the Tools area, which we’ve tried for Galaxy Zoo and for Snapshot Serengeti. We’ve been funded to build a broader set of tools, to map data, etc. from the website itself.
One of the other big things we are trying to do is to translate the site. For instance, here is Galaxy Zoo in traditional-character Mandarin. And we are doing this through crowdsourcing: you pick your site, and you show words or sections for users to translate. I talked about understanding the community and their interests and motivations. You also need to understand how we allocate images, etc. We have done it based on seen/not seen, but have been toying with the idea of shaping what images you see based on what you have seen, or particularly like, or are good at identifying. We tried that, shaping images to suit interested folk, on Snapshot Serengeti, and it wasn’t that successful – then we realised we hadn’t been showing them blank images… So we looked at usage data to see to what extent seeing blank images impacts classifying images. It seems that the more blank images a user sees, the more they classify; when they classify a few/lots in one go they leave the site sooner. But psychologically we aren’t sure why this is yet – to classify a blank image is one click, that’s quick… But also, what is the reward there for that image – is it just as rewarding to classify a blank image? There seems to be a sweet spot here… The same team trying to automatically spot a zebra has also been looking at identifying whether anything is in the image at all… But doing that may mean volunteers leave the site sooner, so we could be shooting ourselves in the foot…
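The allocation idea here can be sketched as a tiny selection routine. This is purely illustrative – `next_subject` and `blank_rate` are hypothetical names, not part of the Zooniverse codebase – but it shows the kind of tunable "sweet spot" knob the talk describes for how often a volunteer sees a blank camera-trap image:

```python
import random

def next_subject(blanks, non_blanks, blank_rate=0.3):
    """Pick the next image to show a volunteer.

    blank_rate is a hypothetical tuning parameter: the fraction of
    servings that should be blank images, which the team suggests
    has a 'sweet spot' for keeping volunteers engaged.
    """
    # With probability blank_rate (and if any blanks remain), serve a blank.
    if blanks and random.random() < blank_rate:
        return random.choice(blanks)
    return random.choice(non_blanks)
```

In a real system the rate would presumably be tuned per project from the kind of usage data described above, rather than fixed.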
So, we’ve been thinking about who should see what. And as part of that we have been trying, with some of the space image projects, putting some simulated images into the mix to rank/detect expert level – and comparing that to their experience/expert level within the system. We want to see if there is a smarter way to run a Zooniverse project.
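One simple way to use simulated, known-answer images to rank expertise, as described above, is to score each volunteer only on the "gold" subjects they happened to see. A minimal sketch, assuming simple dictionary-shaped data (hypothetical shapes, not the actual Zooniverse pipeline):

```python
def volunteer_accuracy(classifications, gold):
    """Estimate a volunteer's skill from simulated/known-answer subjects.

    classifications: {subject_id: label} for one volunteer
    gold: {subject_id: true_label} for the simulated images
          slipped into the image stream
    """
    # Only score subjects that have a known answer.
    scored = [sid for sid in classifications if sid in gold]
    if not scored:
        return None  # this volunteer has not yet seen any gold subjects
    correct = sum(1 for sid in scored if classifications[sid] == gold[sid])
    return correct / len(scored)
```

An estimate like this could then be compared against a volunteer's experience level in the system, as the talk suggests.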
The other thing that can happen is fear, a sort of classification anxiety. For instance, with cancer images people can be quite scared to click the button and contribute to the research. So we are toying with showing volunteers how the consensus clustering works – so we can show people that their marking counts but that they are backed up by the wisdom of the crowd; we think that may help them trust themselves. At the moment we just blog about this stuff, but how can we show it on the site?
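The consensus idea can be illustrated with a plurality vote over all volunteers' answers for one subject: any single nervous click is swamped by the crowd's agreement. A rough sketch only, not the actual Zooniverse aggregation algorithm:

```python
from collections import Counter

def consensus(votes, threshold=0.6):
    """Plurality consensus over volunteer votes for one subject.

    Returns (label, agreement) if the leading answer reaches the
    agreement threshold, otherwise (None, agreement) meaning the
    subject needs more classifications.
    """
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    agreement = n / len(votes)
    return (label, agreement) if agreement >= threshold else (None, agreement)
```

Shown to a volunteer, this is the reassurance the talk describes: one mistaken mark among ten votes barely moves the agreement figure.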
Panoptes is our new infrastructure platform, which we’ve been building for the last year with 2 million dollars of funding from Google. The first project using it appeared on Stargazing Live this year, looking for supernovae – we discovered five supernovae during the week-long run of that programme. Panoptes is the infrastructure we will be building our own projects on, but anyone can run projects on the site. You can build your own project with a name, introduction, research case, workflows – mark an ellipse, answer a question, etc. Then you upload your subjects/data as images. During our trials, scientists were building in half an hour things that would have taken our developer six months. We will be launching our beta today, and launching fully over the next two weeks… There are still only two types of workflow at launch: tree logic, and classifying. There are so many other questions and tasks we hope to tackle and add facilities for later on, notably humanities/transcription – consensus being the main problem there – plus audio and video. We have tried audio and video before but they won’t be in the first iteration of Panoptes. And we still have to answer the question of whether audio or video can work for citizen science – they are not that popular in our experience, but maybe that is about the projects, not the format… There are still lots of questions to answer.
Q1) Can you say more about social motivation here? But also, what about subjectivity and objectivity – and how much opportunity there is to learn, how you become more able to identify things that have previously been ambiguous. Your predecessor talked about people popping on for a few minutes, not gaining…
A1) For citizen science, crowdsourcing and volunteering generally, the majority of people do just pop in briefly. The learning is often through the discussion areas. But we do see that people who do more classifications become better at it… And we see that the more comments people post in discussion, the more technical detail or terminology they include. But we are also trying to actively teach our volunteers. When I came in we started looking at ways to go further than the data processing – I wanted to create an educational course for Planet Hunters, maybe a 25-slide course delivered through an invitation to take part every 10 classifications. People did opt in to that… And we thought that would improve classifications and keep volunteers in the system, as well as supporting them to learn. But we are still looking at ways to educate through the site.
Q2) Can you say more about who decides which projects are made live? There are so many research communities in the world – who’s using the data? Also, is there any communication between the volunteers and the scientists?
A2) The process, until now, was that we got grant money to build citizen science projects and we put out calls for proposals. People would come to us with a case, and we would decide in-house as a team which seemed worth doing, were buildable, and might be interesting to try. Research output was always put first – they had to have a good research case. We would get 50-100 proposals and build 5 per year. But that has led to the new infrastructure. There is huge demand for citizen science, and all areas of science have huge amounts of data… But to some extent the problem still exists… I could put up 100,000 pictures when this platform goes live, so we will still be reviewing and filtering projects before they can become official Zooniverse projects. So you can use the platform to build private projects etc., but before they can be on the homepage they will be filtered, tested in beta, rated by the crowd, etc. On the communication front – that’s mainly on discussion boards. And each participant has a suitable label – you can tell who the researchers are. So when Hanny made her discovery, that was discussion boards and researchers following up and discussing it. But some of our volunteers and science teams do their own thing with Google Groups, hangouts, etc.
Q3) I’m interested in your use of the word “discovery” and what that might mean. That end point is easy to attribute, but how do you credit all that prior work?
A3) The first author for the Planet Hunters project is that research team, then us, then those who have classified the planets. We try to attribute credit there. We are trying to work out how to credit everyone who has ever taken part – on the website, not on the papers – but it is now more complex. Even just in science it is complex – there are 30 people on that paper discovering a new planet… It becomes really properly collaborative and hard to credit. We try to recognise anyone we think…
Q4) In general, but particularly thinking about the new platform, how are you handling the moderation of images, data and discussion – there seems to be potential for really problematic trolling/inappropriate activity here, but also legitimate but inappropriate images.
Q5) What do you mean by private projects?
A5) You will be able to create a project and share it only with those you send a link to. So we won’t be able to review them all. Hopefully they will be built by those genuinely trying to run a research project, but we know people could use or abuse that facility, so we will state our policy, delete anything that we need to, and report to authorities if needed.
Q6) Researchers can already pay to use crowdsourcing, is that something you will be doing? e.g. Crowd Power, Mechanical Turk.
A6) In theory someone could offer financial rewards for a project running on the platform; we won’t facilitate that in the infrastructure and we will be sharing our ideals and policies. I have no problem with financial incentives as long as that is above board, but that’s not our model and not what we are offering. And there are serious citizen science questions about data quality where people are working for financial rewards. But it will be interesting to see what happens over the coming months.
Q7) Will all projects stay there forever?
A7) We already review our own projects. We do not want to waste people’s time. We will impress this on those using the new platform. And we will also make it possible for people to share the final products – papers etc – of those projects. Right now we have archive sites for our projects, we link to a GitHub site for retired projects, data etc.
Q8) Looking at loyalty for different projects. Presumably you have a small number doing large amounts of work… Does that pattern of loyalty track to different projects or do they only get very loyal about one project?
A8) In the past we deliberately separated our projects – we didn’t make great efforts to encourage volunteers to work across projects, which made it hard to switch between them. We’ve been thinking a lot about this: when we think about delivering the right data to the right user, we are also thinking about letting volunteers know about the projects that will be of interest.
Grant shows an image annotated with consensus classifications in Galaxy Zoo
Mark Hartswood (Oxford University & CSCS Data and Evidence network founder): ‘Intervening in Citizen Science: From incentives to value co-creation’
About Mark and his talk:
‘This talk reflects upon a collaboration between SmartSociety, an EU project exploring how to architect effective collectives of people and machines, and the Zooniverse, a leading on-line citizen science platform.
Our collaboration tackled the question of how to increase engagement of Zooniverse volunteers. In the talk I will chart how our thinking has progressed from framing volunteering in terms of motivation and incentives, and how it moved towards a much richer conceptualisation of multiple participating groups engaging in complicated relationships of value co-creation.’
Mark Hartswood is a Social Informatician whose main employer is Oxford University; he is currently working in the area of Responsible Research and Innovation.
I am going to start with an answer to one of Grant’s questions… volunteers find it fun to see a surprising image – building up hope and tension for an exciting image… I’d taken this slide out of my slides but I thought I’d add it back in…
Grant: Isn’t it great when you see the same answer in two different places!
Mark: In my talk proper I’ll be talking about motivations for participation, and I will be looking at several projects here: SOCIAM, Smart Society (which I work on) and Zooniverse, with acknowledgements to my colleagues on the study I will be talking about.
Our colleagues at Ben-Gurion University of the Negev have been looking at incentive schemes for crowdsourcing, and Zooniverse offered us an opportunity to try this out with a group of real volunteers…
Our study in a nutshell was:
- Auto ethnography – exploring Zooniverse as a volunteer
- Survey of Zooniverse participants, looking at motivation, anxiety, engagement, disengagement. Targeted at volunteers active in the last three months
- Develop an intervention to re-engage volunteers (essentially an email)
- Intervention successful…
But that’s not the story I want to tell today. I want to talk about conceptualising citizen science as co-creation of value, looking at the literature and moving to a co-creation of value approach.
Literature-wise: peer production has been posed as a problem for economists in terms of understanding motivation (Benkler). Motivation for citizen science is important but it seems hard to properly explain. Raddick et al found motivations were multiple and compound – from appreciating the scale and beauty of the universe, to supporting the scientific process, to a personal connection to the project. There can be a real mix, and people give complex narratives. Motivations are also shown to be dynamic: they change, evolve, wax and wane (Rotman et al). And motivation is non-exhaustive in explaining participation – Eveleigh et al show that people may be highly motivated but not have time/be able to participate in practice.
Coupled with motive are issues of reward and incentives. Often in the literature, motives are coupled with the idea that understanding the right motives can lead to the use of rewards or incentives. Incentives are seen to generate interest, sustain engagement and improve quality in citizen science according to Prestopnik et al. Or exerting a form of leverage. Or “programming” participation (Maggi et al?).
So Dickinson et al (2012) looked at incentives and rewards. But there is a somewhat confusing combination of badges and certificates as incentives, discussion as a social incentive, and other incentives. Building community and recognising effort are also part of the mix. There is a real mix of individualised approaches and more social processes.
There are some real problem areas here. Kittur et al argue that motivation must be there first; incentives should just align motives to desired behaviour. Gamification can produce ambivalent results in citizen science (Darch; Preist et al). Incentives can create perverse outcomes as well (Sneddon et al).
We want not to ask what motivates people, but to ask how participation creates value for participants and for others. So what is co-creation of value? It has its origins in commerce. In the past, the idea was that value is created in the factory and delivered to the consumer. Currently the customer is active in creating the value of the product or service. That includes promoting the product, design of new products, aiding diffusion. Value flows to the business, to the customer, and to other customers – see for instance Wet Seal, which enables customers to combine garments into collections, to share those, to share images of themselves in garments, etc.
So we can see co-creation of value in citizen science. In a mature platform like Zooniverse there are complex types of value shared, and different forms of value are shared by different participants. There are diverse reasons to participate and very varied levels of participation by individuals. There is a difference between value made collectively (e.g. by casual users who make only a few classifications) and value made individually (by the few who make many classifications). And we see conversations on forums about, say, anomalies, and scientists’ responses to those… these add value to the community and become resources for it, and scientists’ blog posts also add to that and help acknowledge the role of volunteers. Participants also build social capital via social media, which also promotes the platform. And from contributed data and project outputs we see materials like star catalogues becoming available for individuals to use in their own research.
So there are complex forms of value, and those values interact. Changes in incentives can therefore change dynamics in this web of value.
Looking at a scientist’s blog post, “There’s a green one and a pink one and a blue one and a yellow one” – beginning with an image visualising all the contributions of a community, from super-active participants through to those making just a few. The text of this post speaks to the delicacy of talking about participants in a project with those dynamics, acknowledging contributions of all forms and emphasising that volume is not the only measure. The post is artfully written to achieve a number of delicate balances. Each part of the crowd has to be acknowledged as valuable. It would be easy to praise the highly active participants and dismiss casual participants; this post carefully avoids any sense of jealousy, unfairness, etc.
If we have complex dynamics in these webs of value and co-creation, what happens when incentives explicitly value one type of contribution over another? That brings us back to the effects of gamification. So, looking at Old Weather, where contributions enabled you to rise to the rank of captain… The leaderboard explicitly values volume of contribution. For non-gamers, game elements can be demotivating, and the heights of the leaderboard look inaccessible (see Darch). But leaderboards can also set a normative standard for contribution that demotivates the long tail (Preist et al). So, we think a co-creation model enables us to better understand the impact of changing the dynamics through incentives.
This takes us back to the interventions we looked at in our study… And comments from Zooniverse participants. In terms of how volunteers became disengaged, that was about boredom/forgetting about the project, or about distractions from work or home; people said that an email when they haven’t logged in might motivate them. So we looked at an email to remind volunteers about Zooniverse.
But there were other reasons too: ideas about achieving a level of mastery, and the feeling that if you are not reaching that level your contribution isn’t valuable, or fear of classifying in case of mistakes. And there we think an incentive that might be effective is reassurance about classification anxiety.
We also saw volunteers unaware of other projects being available to participate in – which can be resolved through signposting to other projects.
So, benefits of a co-creation perspective…
- A more symmetrical idea – motives are held by the volunteer, while incentives are things you do to the volunteer
- Less individualistic – explains more complex relationships and dynamics between both participating individuals and groups
- Don’t want to reject incentives or motivations – but want to put them in broader non-individualistic framework
- Opens up a broader framework for design e.g. around diagnosing and repairing problems where participants fail to realise value for themselves or each other
- Provides access to thinking about value and values and ethics dilemmas in participatory citizen science based on principles of mutuality and equitability
- Much of this is half-articulated in the citizen science literature – but moving away from the language and logic of incentives and motives helps realise it more fully.
Q1) I think you’ve both given brilliant talks on the motivations of students in learning environments – that’s my area and educators have been looking at this for some time. With intrinsic and extrinsic motivations. Is that something you are looking at?
A1) Is there a whole area of literature here then?
Q1) Betty Collis comes to mind on the issue of co-creation. But yes, there is a literature there in education.
A1) It would be interesting to make those connections there…
Comment) I think that you are also talking about the psychology of learning, and there are really different motivations there, some quite instrumental… Do you have any thoughts on that based on what you have seen in Zooniverse?
A2) I am certainly still exploring this area. But I think the idea that motivations are a priori has to be challenged. Zooniverse creates a space for volunteers to be challenged by things they may have never thought of before.
Q2) And what incentives would you recommend for an online learning forum
A2) There is that diversity… And that is quite healthy. And we don’t necessarily want to convert this sort of person into that sort of person. Zooniverse is pretty successful in creating lots of different sorts of rooms – to participate in different sorts of ways. Catering to that diversity, and accepting it, is actually sort of important.
Comment) A lot of the crowdsourcing systems in commercial and academic fields started very naively – with an individualised collective intelligence idea… realising the wisdom of the crowd, but then seeing the community collaborating and changing things… So now we see the discovery of the world of people, normal dynamics… But also new things are brought to that space… Mutual new ideas that can help fields think about social organisation and motivation and things…
Comment) You are seeking to do something different to us (educators) but you are similarly trying to avoid negative experiences through cliques, and you also don’t want to create that.
Grant) We had a Zooniverse discussion board, with many early super users… They were quite cliquey. They were not hostile but almost too much too soon for someone new coming in. They were using technical language, showing their knowledge, perhaps feeling or behaving in quite entitled ways. So we do think about how we get people to form a healthy community… And it’s not something we have solved…
Comment) And you haven’t written that up, as that would be divisive.
Grant) Indeed, but we have been looking at new ways to tackle that potential issue – breaking down walls between projects being part of that – by relaunching talk. We find commentators wanting a count of how many comments they have made – and we don’t want to convey authority in that way. It is common in forums but we don’t want to do that.
Comment) But people do invest time and knowledge… So levelling everyone to the same can diminish contribution.
Grant) I like that blog post Mark highlighted for its approach to acknowledging contributions of all types. We have to think about how to reward everyone, without alienating other types of contributors.
Mark) It’s not so much about levelling, but about emergent politics about values. And being thoughtful of those dynamics.
Comment) But to some extent you’ll never understand the reasons for participation. There was a US project with two users who were way ahead… proved to be a guy and his father-in-law competing!
Grant) There are a whole bunch of compound motivations – some may be petty, some may be…
Mark) We had some really lovely motivations and some really sad ones – terminally ill people wanting to make a contribution for instance. But there were also motivations that were total turn offs – some wanted to look at alien worlds, some found that disturbing or frightening. People had really individual perspectives.
Comment) You’ve talked about people sharing what they do to social media accounts – bypassing a lack of gamification by sharing in that way!
Grant) That is implemented more for sharing a lovely image – it’s not about numbers but sharing something interesting. We have talked about the idea – and have some new funding – to build a native Facebook app for four of our projects… But that sort of issue may arise there. Whether personal announcement is motivating or not.
Comment) More open platforms do enable more entrepreneurship and different approaches… It becomes a game perhaps… There could be other things to search for… Scrapbooking the loveliest images, new ways into projects.
Grant) We are wary of gamification, but it can create motive for some but it is kind of treacherous. We have also seen volunteers make their own games out of ungamified projects – tracking how many animals or types of galaxies etc. they have seen. There are some who like the idea of a gamified Zooniverse project.
Q) How representative do you think the Zooniverse volunteers are? They are very heavily studied as a group, and the literature looks at very few niche groups, but how do they compare to that big pool of untapped talent – those 200 billion hours?
A) Demographically it was a very flat age range – very level participation across age ranges. Participants tended to be quite highly educated. So a lot of the untapped reserves would perhaps be in that less-educated range of people.
Grant) One of the things we noted in our funding bid: we do have that flat age range, but we also have Facebook likes and those let us see detailed demographics. We saw a massive discrepancy there, with loads of young people – those under 25 – who were interested on Facebook but didn’t participate in the Zooniverse projects.
Mark: Under 18s weren’t in our study for ethical reasons…
Grant: But even looking just at 18-25 year olds that discrepancy between the Facebook likes and the participation applied.
Comment) Just on that gamification front, it does work but why it works is really an issue.
And with that we are closing the session… This event has really shown the value of combining very different people in the room… That breadth of interests etc. And I think that bodes well for our network as a whole, and that will hopefully add real value to our events in the future.