Aug 09 2016
 
Notes from the Unleashing Data session at Repository Fringe 2016

After 6 years of being Repository Fringe’s resident live blogger, this was the first year that I haven’t been part of the organisation or amplification in any official capacity. From what I’ve seen, though, my colleagues from EDINA, University of Edinburgh Library, and the DCC did an awesome job of putting together a really interesting programme for the 2016 edition of RepoFringe, attracting a big and diverse audience.

Whilst I was mainly participating through reading the tweets to #rfringe16, I couldn’t quite keep away!

Pauline Ward at Repository Fringe 2016

This year’s chair, Pauline Ward, asked me to be part of the Unleashing Data session on Tuesday 2nd August. The session was a “World Cafe” format and I was asked to help facilitate discussion around the question: “How can the repository community use crowd-sourcing (e.g. Citizen Science) to engage the public in reuse of data?” – so I went along wearing my COBWEB: Citizen Observatory Web and social media hats. My session also benefited from what I gather was an excellent talk on “The Social Life of Data” earlier in the event from Erinma Ochu (who, although I missed her this time, is always involved in really interesting projects including several fab citizen science initiatives).

I won’t attempt to reflect on all of the discussions during the Unleashing Data Session here – I know that Pauline will be reporting back from the session to Repository Fringe 2016 participants shortly – but I thought I would share a few pictures of our notes, capturing some of the ideas and discussions that came out of the various groups visiting this question throughout the session. Questions or clarifications are welcome – just leave me a comment here on the blog.

Notes from the Unleashing Data session at Repository Fringe 2016

If you are interested in finding out more about crowd sourcing and citizen science in general then there are a couple of resources that may be helpful (plus many more resources and articles if you leave a comment/drop me an email with your particular interests).

This June I chaired the “Crowd-Sourcing Data and Citizen Science” breakout session for the Flooding and Coastal Erosion Risk Management Network (FCERM.NET) Annual Assembly in Newcastle. The short slide set created for that workshop gives a brief overview of some of the challenges and considerations in setting up and running citizen science projects:

Last October the CSCS Network interviewed me on developing and running Citizen Science projects for their website – the interview brings together some general thoughts as well as specific comment on the COBWEB experience:

After the Unleashing Data session I was also able to stick around for Stuart Lewis’ closing keynote. Stuart has been working at Edinburgh University since 2012 but is moving on soon to the National Library of Scotland so this was a lovely chance to get some of his reflections and predictions as he prepares to make that move. And to include quite a lot of fun references to The Secret Diary of Adrian Mole aged 13 ¾. (Before his talk Stuart had also snuck some boxes of sweets under some of the tables around the room – a popularity tactic I’m noting for future talks!)

So, my liveblog notes from Stuart’s talk (slightly tidied up but corrections are, of course, welcomed) follow. Because old Repofringe live blogging habits are hard to kick!

The Secret Diary of a Repository aged 13 ¾ – Stuart Lewis

I’m going to talk about our bread and butter – the institutional repository… Now my inspiration is Adrian Mole… Why? Well we have a bunch of teenage repositories… EPrints is 15 ½; Fedora is 13 ½; DSpace is 13 ¾.

Now Adrian Mole is a teenager – you can read about him on Wikipedia [note to fellow Wikipedia contributors: this, and most of the other Adrian Mole-related pages could use some major work!]. He has been quoted at two conferences, to my amazement! And there are also some Scotland and Edinburgh entries in there too… Brought a haggis… Goes to Glasgow at 11am… and says he encounters 27 drunks in one hour…

Stuart Lewis at Repository Fringe 2016

Stuart Lewis illustrates the teenage birth dates of three of the major repository platforms as captured in (perhaps less well-aged) pop hits of the day.

So, I have four points to make about how repositories are like/unlike teenagers…

The thing about teenagers… People complain about them… They can be expensive, they can be awkward, they aren’t always self-aware… Eventually though they usually become useful members of society. So, is that true of repositories? Well ERA, one of our repositories, has gotten bigger and bigger – over 18k items… and over 10k paper theses currently being digitized…

Now teenagers also start to look around… Pandora!

I’m going to call Pandora the CRIS… And we’ve all kind of overlooked their commercial background because we are in love with them…!

Stuart Lewis at Repository Fringe 2016

Stuart Lewis captures the eternal optimism – both around Mole’s love of Pandora, and our love of the (commercial) CRIS.

Now, we have PURE at Edinburgh which also powers Edinburgh Research Explorer. When you looked at repositories a few years ago, it was a bit like Freshers’ Week… The three questions were: where are you from; what repository platform do you use; how many items do you have? But that’s moved on. We now have around 80% of our outputs in the repository within the REF compliance window (3 months of acceptance)… And that’s a huge change – large volumes of material are open access very promptly.

So,

1. We need to celebrate our success

But are our successes as positive as they could be?

Repositories continue to develop. We’ve heard good things about new developments. But how do repositories demonstrate value – and how do we compare to other areas of librarianship?

Other library domains use different numbers. We can use these to give comparative figures. How do we compare to publishers for cost? What’s our CPU (Cost Per Use)? And what is a good CPU? £10, £5, £0.46… But how easy is it to calculate – are repositories expensive? That’s a “to do”: take the cost of running the repository and divide it by the IRUS usage figures. I would expect it to be lower than publishers’, but I’d like to do that calculation.
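Stuart’s CPU “to do” is simple arithmetic once the inputs are known. As a minimal sketch, assuming CPU is just the annual cost of running the repository divided by annual usage (for instance, download counts of the kind IRUS reports), the calculation might look like this. The function name and the figures are hypothetical illustrations, not real Edinburgh or IRUS numbers.

```python
def cost_per_use(annual_running_cost: float, annual_uses: int) -> float:
    """Cost Per Use (CPU): annual running cost divided by annual usage.

    Hypothetical helper. For a repository, 'uses' might be download
    counts such as those reported by IRUS.
    """
    if annual_uses <= 0:
        raise ValueError("annual_uses must be positive")
    return annual_running_cost / annual_uses


# Hypothetical figures, for illustration only.
repo_cpu = cost_per_use(annual_running_cost=60_000, annual_uses=150_000)
print(f"Repository CPU: £{repo_cpu:.2f}")  # Repository CPU: £0.40
```

The same function could be applied to a publisher subscription (annual cost over annual downloads) to get the comparative figure Stuart describes; which costs to count for the repository (staff, hosting, software) is a judgement call the sketch leaves open.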

The other side of this is to become more self-aware… Can we gather new numbers? We only tend to look at deposit and use from our own repositories… What about our own local consumption of OA (the reverse)?

Working within new e-resource infrastructure – http://doai.io/ – lets us see where open versions are available. And we can integrate with OpenURL resolvers to see how much of our usage could be fulfilled by open versions.
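One way to answer “how much of our usage could be fulfilled?” is to compare the DOIs requested through a local OpenURL resolver with the DOIs known to have an open version. The sketch below assumes both lists have already been gathered; it does not call doai.io or any resolver, and the function name and DOIs are hypothetical.

```python
from collections.abc import Iterable


def oa_fulfilment_rate(requested_dois: Iterable[str], open_dois: Iterable[str]) -> float:
    """Share of local full-text requests for which an open version exists.

    requested_dois: one entry per request (repeats count as separate uses).
    open_dois: DOIs known to have an open access version.
    """
    # DOIs are case-insensitive, so normalise both sides before comparing.
    open_set = {doi.strip().lower() for doi in open_dois}
    requests = [doi.strip().lower() for doi in requested_dois]
    if not requests:
        return 0.0
    fulfilled = sum(1 for doi in requests if doi in open_set)
    return fulfilled / len(requests)


# Hypothetical resolver log and OA lookup results, for illustration only.
requests_log = ["10.1234/a", "10.1234/B", "10.1234/c", "10.1234/a"]
known_open = {"10.1234/a", "10.1234/b"}
print(f"{oa_fulfilment_rate(requests_log, known_open):.0%} of requests had an open version")
```

Counting each request rather than each distinct DOI weights heavily used articles more, which is closer to the “local consumption of OA” question than a simple title count.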

2. Our repositories must continue to grow up

Do we have double standards?

Hopefully you are all aware of the UK Text and Data Mining Copyright Exception that came into force on 1st June 2014. We have massive access to electronic resources as universities, and can text and data mine those.

Some do a good job here – Gale Cengage Historic British Newspapers: additional payment to buy all the data (images + XML text) on hard drives for local use. Working with local informatics LTG staff to (geo)parse the data.

Some are not so good – basic APIs allow only simple searches… But not complex queries (e.g. you can search for a term, but not for sentiment).

And many publishers do nothing at all….

So we are working with publishers to encourage and highlight the potential.

But what about our content? Our repositories are open, with extracted full-text, and data can be harvested… Sufficient, but is it ideal? Why not offer bulk download with one click… You can – for example – download all of Wikipedia (if you want to). We should be able to do that with our repositories.

3. We need to get our house in order for Text and Data Mining

When will we be finished though? It depends on what we do with open access. What should we be doing with OA? Where do we want to get to? Right now we have mandates so it’s easy – green and gold. With gold there is pure or hybrid… Mixed views on hybrid. We can also publish locally for free. Then for green there are local or disciplinary repositories… For gold (pure, hybrid, local) we pay APCs (though some local options are free)… In hybrid we can do offsetting, discounted subscriptions, voucher schemes too. And for green we have the UK Scholarly Communications Licence (Harvard)…

But which of these forms of OA are best?! Is choice always a great thing?

We still have outstanding OA issues. Is a mixed-modal approach OK, or should we choose a single route? Which one? What role will repositories play? What is the ultimate aim of Open Access? Is it “just” access?

How and where do we have these conversations? We need academics, repository managers, librarians, publishers to all come together to do this.

4. Do we know what a grown-up repository looks like? What part does it play?

Please remember to celebrate your repositories – we are in a fantastic place, making a real difference. But they need to continue to grow up. There is work to do with text and data mining… And we have more to do… To be a grown up, to be in the right sort of environment, etc.

Q&A

Q1) I can remember giving my first talk on repositories in 2010… When it comes to OA I think we need to think about what is cost effective, what is sustainable, why are we doing it and what’s the cost?

A1) I think in some ways that’s about what repositories are versus publishers… Right now we are essentially replicating them… And maybe that isn’t the way to approach this.

And with that Repository Fringe 2016 drew to a close. I am sure others will have already blogged their experiences and comments on the event. Do have a look at the Repository Fringe website and at #rfringe16 for more comments, shared blog posts, and resources from the sessions. 

Apr 20 2016
 

This is a very belated LiveBlog post from the CSCS Network Citizen Science and the Mass Media event, which I chaired back on 22nd October 2015. Since the event took place several videos recorded at the event have been published by the lovely CSCS Network folks and I’ve embedded those throughout this post.

About the Event

This session looked at how media and communications can be used to promote and engage communities in a crowd sourcing and citizen science project. This included aspects such as understanding the purpose and audience for a project; gaining exposure from a project; communicating these types of projects effectively; engaging the press; expectation management; practical issues such as timing, use of interviewees and quotes, etc.

I was chairing this session, drawing on my experience working on the COBWEB project in particular, and I was delighted that we were able to bring in two guest speakers whose work I’ve been following for a while:

Dave Kilbey, University of Bristol and Founder and CEO of Natural Apptitude Ltd. Natural Apptitude works with academic and partner organisations to create mobile phone apps and websites for citizen science projects that have included NatureLocator, Leafwatch, Batmobile, and BeeMapp. Some of these projects have received substantial press interest, in particular Leafwatch (along with the wider Conker Tree Science initiative), and Dave will talk about his personal experience of the way that crowd sourcing and citizen science and the media work together, some of the benefits and risks of exposure, and some of the challenges associated with working with the press based on his own experience.  @kilbey252

Alastair (Ally) Tibbitt, Senior Online Journalist at STV, where he has been based since 2011 working both in journalism and community engagement. Ally’s background lies in community projects in Glasgow and Edinburgh, experience that informs his work writing both for STV and Greener Leith. He has particular interests in hyperlocal news, open data and environmental issues, giving him a really interesting insider’s perspective on the way that citizen science and crowd sourcing can engage the press, some of the realities of media expectations, timings, etc. and an insight into effective ways to pitch a citizen engagement story. @allytibbett

My notes from the talks were captured on the day but, due to chairing, I wasn’t able to capture all of the discussion or questions that arose in the session. The video below captures the talks, with my notes from these below. 

Musings on Media and Communications for Citizen Science Projects – Dave Kilbey, Natural Apptitude

I’m not an expert but I have been working in this area for some time so these are some musings informed by my work to date.

I’ve worked on a variety of projects, which started with a project called NatureLocator – all basically mobile apps, but also websites. We try to make it as simple as possible for people to take part in these projects, and we try to do that working with experts so that the data we collect is useful and purposeful. So our projects include work on invasive species and work with the biological monitoring centre. So effectively we work with researchers and organisations, engaging the public in what we do. And we do that with the design of bespoke smartphone apps and websites. In theory it’s innovative but actually much of this is established – although BatMobile is an exception, as it was never really good enough to launch. And public engagement is central to what we do, and from that naturally comes much of our engagement with media.

We spend a lot of time and money on design and usability, because if the apps aren’t easy to use and appealing then participants won’t use them, or won’t use them again. The apps are for contribution, the website is for looking at the data – that’s more of an unprovoked engagement…

So the part of this talk on media and communications is this bit, which I’m calling “Smurfs… and the wrong kind of conkers”.

So I thought about why we want media coverage in the first place. It’s obvious but it matters… And the reasons range from selfish through to altruistic…

We want this to get the project (and us) noticed – we want to share what we do, and to get the project out there (important for a business too). You want to engage an army of volunteers – you can’t have citizen science without citizen scientists, you need people engaged. You want to attract more funding – crucial in a university context. Success metrics – which include impact – we are measured on how many people took part, engaged etc. and as researchers we are also measured on media presence to an extent. But there is also the aspect of personal satisfaction, and that matters.

On a more altruistic basis, there is increasing knowledge of a concept or problem – we’ve really had that feedback on our invasive plant species work. Citizen science is increasingly about finding solutions to problems – there are all sorts of things like examination of proteins being gamified, so participants contribute regardless of knowledge. We also want to inspire interest, perhaps even in the next generation of researchers – we are all passionate about what we do, and want to share that…

But the crux of the matter is that media isn’t always as important in the ways you’d expect.

If your project isn’t ready, the media coverage will be a real pain. There is a project called Ash Town done more or less as a media stunt… The organisation using the data wasn’t ready, the data wasn’t ready… and they had a backlog of verification and that disillusioned participants… The feedback loop wasn’t there but they had to take advantage of that moment. So I tend to be quite conservative about when I share projects; I want them ready.

Quite a few of our projects have had mass media interest and that can be brilliant, but it causes a big spike and is largely unfocused… Normally you want a focused set of interested participants. It can be helpful, but long term it’s less clear how it helps with finding those participants. By contrast micro media, focused marketing, and events such as conferences lead to better engagement – and the data from targeted audiences tends to be much better. For example there was a big issue of giant hogweed in the media this summer – we had more records than ever before… but 80% of that data was incorrect. Normally the data in Plant Tracker is 90% accurate. That was due to lots of people finding out about giant hogweed and recording lots of false positives. Not necessarily a problem, but an issue for data-centric projects.
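The giant hogweed example shows why an unfocused spike matters for data-centric projects: a large batch of low-accuracy records drags down the accuracy of the pooled dataset. The sketch below uses the rates from the talk (about 90% correct normally, about 80% incorrect during the spike) but assumes hypothetical batch sizes of 1,000 records each; the helper names are also hypothetical.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of records that passed expert verification."""
    return correct / total if total else 0.0


def combined_accuracy(batches: list[tuple[int, int]]) -> float:
    """Accuracy of several (correct, total) batches pooled together."""
    correct = sum(c for c, _ in batches)
    total = sum(t for _, t in batches)
    return accuracy(correct, total)


# Hypothetical batch sizes; the rates follow the talk.
routine = (900, 1000)      # normal year: about 90% of records correct
media_spike = (200, 1000)  # media spike: about 80% of records incorrect
print(f"Routine: {accuracy(*routine):.0%}")
print(f"Pooled:  {combined_accuracy([routine, media_spike]):.0%}")
```

With equal batch sizes the pooled accuracy falls to 55%, which is why the verification backlog, not just the volume, is the real cost of a media spike.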

So we find drip feeding/organic networking works best for us. But as they say, “Any publicity is good publicity?”… Maybe… Mostly we’ve had good coverage.

To use a fishing analogy, I see the mass media as ground baiting – causing a general feeding frenzy, but then you have to think about how you are baiting your hook to make use of this… So it’s all about how you follow up…

So, with our first app, Leafwatch, we had loads of media coverage. This project was small scale before, with maybe 500 records a year, without photos or georeferences. So we set up a smartphone app to capture that sort of data for verification… And we had 5000 records… But also a lot of noise… 3 bottom pictures, and worse… even a smurf!

So, how to attract publicity… Again, I’m no expert… Often it’s about finding an interesting story to tell that has relevance at this point in time – is there a hook to draw people in, to trigger their imagination? For the Uni of Bristol it was often our Public Relations Office that got us the gig. Me, on my own using my Twitter feed, am not going to get the Times interested… So utilise the existing resources in your organisation; they have some great, powerful contacts etc. to call on. And I have a colleague who does a good job of researching likely journalists and contacting them directly…

Really much of this feels random, but it’s about a lot of events coming together, and stuff in the outside world… Looking for those opportunities to tell your story to an audience that’s ready to listen… (And do get in touch).

Engaging the Media – Ally Tibbitt, STV

I work at STV, and have a background in community projects and volunteering activities. I am currently also setting up a fledgling news site.

So I wanted to set the context of engaging with media… And I wanted to set the scene. Many newspapers are losing 10% of their circulation; broadcast TV is doing better, but is still making the online transition. But most media company websites are booming – our STV pages collectively reach a few million people a day. So there’s still a lot of reason to get the word out there. And it’s worth planning that as you do your citizen science project. You need to think about where you will find the people you do want to engage with. More and more people get their news via social media. Many read news via mobile devices. It’s getting more visual with videos, images, infographics. Big interactive graphics are great, but hard to scale to a phone, so many media companies keep it simple.

So I’ve tried to set this up as a timeline… How you might engage the media… Before your project: when recruiting participants, who do you want to reach? Is it a specific geography? Age group? Demographic? That should influence both the social media platforms and the media companies you use. What is the benefit for participants? What is the long term goal? Is there an interesting back story – and what change will it bring about? And plan out a communication calendar – can you hook into, e.g., International Authors Day? Editors are always looking for a new angle on events, or a local angle on a national news story. And even if that doesn’t fit your timing it can be helpful. The other thing to think about is what digital assets you can share/produce. A press release is nice, but a press release with bells and whistles, with infographics or images etc., is brilliant – it helps journalists know why they should engage now. It’s about the infotainment, not just the data. And it could be as simple as a slideshow, or animated gifs, or data we could map. Thinking about citizen science projects I’ve already worked on, I thought of a project on happiness in different neighbourhoods – we persuaded them to share some data. If you do want help producing maps etc., then there are skilled journalists who can help. We’ll need a Shapefile. And we need that data to be open to support more open interactive stuff…

So, assuming you had a nice launch and a little publicity boost… How do you engage during the project? Well, citizen engagement can be more than just research – can participants promote the project for you on social media? You need a #hashtag to generate social media buzz and help you collate conversation. Can you give progress reports to journalists who covered the launch and those you hope will cover the final results? Building that buzz from the outset can mean there is a story, and can help show the impact of your project. Also, think about things that cannot be shared – there could be copyright or child protection issues, etc. And as you aggregate content around the hashtag and curate the best, remove anything with an issue. Tools like Storify let you do this.

From my point of view one of the best ways to engage the press is when there is a result, a discovery… The media thrives on a wee bit of controversy etc. So Niamh Shortt from CRESH at Edinburgh looks at mapping alcohol etc. and social issues – she is a campaigning academic, taking her studies to policy makers, and that, for instance, is always of interest. So an air quality or air pollution crowd sourcing project would certainly have some of those qualities, those cases to engage policy makers. Too often we get press releases about “we did a study… we might be able to do something in the future…” but we really need a concrete story…

A note on press releases… They are fundamentally quite useful. Do send them out. Keep them short. Include multiple short quotes. Have a clear top line, and be clear about what you’ve done. Include a variety of visuals in different formats – landscape, portrait, infographics, animated films etc. Supplying images in multiple formats makes our job of packaging the story easier, and that makes a big difference. Is the story important enough for us to send someone out to take new images? Maybe not. But sending 6MB of materials isn’t good – so send a press release linking to resources.

So, journalists. Do send releases etc. to generic news email addresses. Use tools like Twitter and LinkedIn to find journalists with an interest in your subject, and message them directly. Provide advance warning, reminders, and photo and filming opportunities. Don’t do it at the weekend – no TV will come. Do it at lunchtime on a weekday… Practical stuff. If no one shows up, don’t worry about it; do send them pictures etc. And if there is one place that you really really want to be featured in, offer it as an exclusive and see if it works. Obviously I’d like that to be me… But that’s something useful to hold back in that way…

And, lastly, humour works. If you can find something daft, and can present it in a funny way… Our story “What if Back to the Future was set in Glasgow” is the second most read story on our website, having gone up yesterday. The most read story in the last year on STV was about a very tall man who had a hand dryer calamity while using the bathroom – that did great and almost made the front page of Reddit. We can be too serious… Be fun. Share the 15 funniest things that happened in this project, say… Humour works.

And with that we turned to some really interesting questions and discussion – huge thanks to all who came along and took part in this.

Whilst he was in Edinburgh for this event Dave Kilbey was also able to give an interview for the CSCS Network website, which you can watch there, or in the embed below:

Huge thanks to Dave and Ally for making the time to come along and speak to the CSCS network who I know really appreciated their presentations and sharing of experience. Huge thanks too to the lovely CSCS network team for providing a space for this event and support for our speakers and their travel. 

Oct 20 2015
 
Digital Footprint campaign logo

I am involved in organising, and very much looking forward to, two events this week which I think will be of interest to Edinburgh-based readers of this blog. Both are taking place on Thursday and I’ll try to either liveblog or summarise them here.

If you are based at Edinburgh University do consider booking a place at these events or sharing the details with your colleagues or contacts at the University. If you are based further afield you might still be interested in taking a look at these and following up some of the links etc.

Firstly we have the fourth seminar of the new(ish) University of Edinburgh Crowd Sourcing and Citizen Science network:

Citizen Science and the Mass Media

Thursday, 22nd October 2015, 12 – 1.30 pm, Paterson’s Land 1.21, Old Moray House, Holyrood Road, Edinburgh.

“This session will be an opportunity to look at how media and communications can be used to promote a CSCS project and to engage and develop the community around a project.

The kinds of issues that we hope will be covered will include aspects such as understanding the purpose and audience for your project; gaining exposure from a project; communicating these types of projects effectively; engaging the press; expectation management; practical issues such as timing, use of interviewees and quotes, etc.

We will have two guest presenters, Dave Kilbey from Natural Apptitude Ltd, and Ally Tibbitt from STV, followed by plenty of time for questions and discussion. The session will be chaired by Nicola Osborne (EDINA), drawing on her experience working on the COBWEB project.”

I am really excited about this session as both Dave and Ally have really interesting backgrounds: Dave runs his own app company and has worked on a range of high profile projects so has some great insights into what makes a project appealing to the media, what makes the difference to that project’s success, etc; Ally works at STV and has a background in journalism but also in community engagement, particularly around social and environmental projects. I think the combination will make for an excellent lunchtime session. UoE staff and students can register for the event via Eventbrite, here.

On the same day we have our Principal’s Teaching Award Scheme seminar for the Managing Your Digital Footprints project:

Social media, students and digital footprints (PTAS research findings)

Thursday, 22nd October 2015, 2 – 3.30pm, IAD Resources Room, 7 Bristo Square, George Square, Edinburgh.

“This short information and interactive session will present findings from the PTAS Digital Footprint research http://edin.ac/1d1qY4K

In order to understand how students are curating their digital presence, key findings from two student surveys (1457 responses) as well as data from 16 in-depth interviews with six students will be presented. This unique dataset provides an opportunity for us to critically reflect on the changing internet landscape and take stock of how students are currently using social media; how they are presenting themselves online; and what challenges they face, such as cyberbullying, viewing inappropriate content or whether they have the digital skills to successfully navigate in online spaces.

The session will also introduce the next phase of the Digital Footprint research: social media in a learning & teaching context. There will be an opportunity to discuss e-professionalism and social media guidelines for inclusion in handbooks/VLEs, as well as other areas.”

I am also really excited about this event, at which Louise Connelly, Sian Bayne, and I will be talking about the early findings from our Managing Your Digital Footprints project, and some of the outputs from the research and campaign (find these at: www.ed.ac.uk/iad/digitalfootprint).

Although this event is open to University staff and students only (register via the Online Bookings system, here), we are disseminating this work at a variety of events, publications etc. Our recent ECSM 2015 paper is the best overview of the work to date but expect to see more here in the near future about how we are taking forward this work. Do also get in touch with Louise or me if you have any questions about the project or would be interested in hearing more about it, some of the associated training, or the research findings as they emerge.

May 28 2015
 
Image of the first CSCS seminar

This morning I am at the first seminar arranged by the University of Edinburgh Citizen Science and Crowdsourced Data and Evidence Network. The Network brings together those interested in citizen science and crowdsourcing from across the organisation and this event is also supported by the Academic Networking Fund, IAD. Today’s seminar looks at the Zooniverse crowdsourcing organisation and suite of projects with two guest speakers, and I’ll be taking live notes here. As usual, because these are live notes there may be errors, typos, formatting issues, etc and corrections are welcomed. 

We are starting our day with an introduction by James Stewart on the network, which will focus particularly on methodological approaches.

Grant Miller (Zooniverse): ‘The Zooniverse – Real Science Online’

About Grant and his talk:

‘The Zooniverse is the world’s largest and most successful citizen science platform. I will discuss what we have learned from building over 40 projects, and where the platform is heading in the future.’ (Website: https://www.zooniverse.org/)

Grant Miller is a recovering astrophysicist who gained his PhD from the University of St Andrews, searching for planets orbiting distant stars. He is now the communications lead for the Zooniverse on-line citizen science platform.

I had kind of a weird introduction into crowdsourcing and citizen science… But the main thing I will be talking about today is how we engage the Zooniverse community to participate, to enjoy doing that, and to enjoy being part of our community.

Zooniverse all started with Kevin, a student at Oxford who was tasked with looking at thousands of images of the universe to find two sorts of galaxies: elliptical galaxies and spiral galaxies. He had a million to classify. He did 50,000 and then met with his supervisor and had some strong arguments: he didn’t want to spend his whole academic career classifying galaxies, and he argued that it didn’t require his training. So, by show of hands, who thinks this image of a galaxy (we are looking at one of many) is an elliptical, and how many think it is a spiral? The room votes that this is a spiral and it is indeed a spiral – and that’s basically how Zooniverse works. We show an image, we ask people what it is, and they choose. And people, en masse, really went for this. They went through huge amounts of images very quickly.
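The show of hands in the room is, at its simplest, a majority vote. A minimal Python sketch of that idea, assuming plain majority voting with every volunteer counted equally (real Zooniverse projects may aggregate classifications in more sophisticated ways), returns the winning label and the share of volunteers who chose it. The function name is hypothetical.

```python
from collections import Counter


def consensus(votes: list[str]) -> tuple[str, float]:
    """Return the most common label and the fraction of volunteers who chose it."""
    if not votes:
        raise ValueError("no votes to aggregate")
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


votes = ["spiral"] * 7 + ["elliptical"] * 3
print(consensus(votes))  # ('spiral', 0.7)
```

Reporting the vote fraction alongside the label is useful because a narrow majority and a near-unanimous one deserve different levels of trust. With an exact tie, `Counter.most_common` returns whichever label was counted first, so a real project would want an explicit tie rule or more volunteers per image.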

Other things started to happen too… The first community around the project was the Galaxy Zoo forum. A participant called Hanny found a thing (Hanny’s Voorwerp)… It didn’t look like the galaxies she was classifying. This was a completely new astronomical phenomenon, which had never been seen before. An amateur had found this through this very simple platform. People aren’t just good at recognising patterns, they also get distracted and find new things. And after discovering and publishing on this phenomenon – a huge cloud of gas associated with a galaxy – a group from the community decided to make a project of looking for more of these in other Galaxy Zoo images. And this is why communities are so brilliant. On another project our community found a whole new worm under the sea. That’s the power of having this community taking part.

So, how do we do this? Well we really simplify the language of the task, and make it easy for people to take part. And when Galaxy Zoo took off we found other scientists and researchers approaching us to build new projects including humanities projects, and biological projects. So we set up projects such as Snapshot Serengeti – used to identify what you can see in images from camera traps on the Serengeti. I was working with a group of computer scientists trying to work out how to identify the object in an image, and also with my 4 year old nephew… He identified it in seconds; the computer scientists are still looking for a solution.

So at this point in time we have 42 projects in the Zooniverse. Old Weather, in 2010, was our first humanities project. It started as a climatology project, but because it uses historic ship logs, and those include so many other types of data, humanities researchers and historians came on board, so it has had a second life. We have other humanities projects, cancer research projects, etc. Of those projects about 30-35 are currently live. We think this will expand rapidly soon, but I’ll come back to that. And last year we passed the 1 million volunteer mark – that’s registered volunteers. Most of those are in Western Europe and North America, but we have participants in 200 countries (only 7 countries have none).

The community is expanding, the projects are expanding… But there is a lot of potential out there, a huge cognitive surplus we could be using. For instance, Clay Shirky notes that 200 billion hours are spent watching TV by adults in the UK, while it took only 100 million hours to create Wikipedia. We are only beginning to tap that potential. On January 7th last year, when Prof Brian Cox and Dara Ó Briain asked the public to take part on live TV, we relaunched a project called Space Warps and had over a million classifications an hour. That meant that overnight we discovered an object that can take astronomers years to find. It’s good, but it’s no 200 billion hours… Imagine what you could do with that much time. Every hour, 16 years’ worth of human effort is spent playing Angry Birds… How do we get that effort into citizen science?

So, is gamification the way to go? For those working in citizen science, you could probably run a week-long conference just on whether you should or should not use gamification. We have decided not to, but some of the most successful projects – Foldit and EyeWire – do use it. Those projects gave huge thought to how to reward participants’ efforts in the right way so that people don’t just game the system. We are worried that it wouldn’t work for us: we are not convinced we would be good enough at building a game, and we might end up with something that is neither game nor citizen science. But some of our projects have tried gamification and we have studied this. On Galaxy Zoo we used a leaderboard to start with, but that caused some tension: those in the lead were doing hundreds of thousands of classifications, and some people felt the leaders might have cheated, while others felt that they could never get there so just left. On Old Weather, participants who focused on a particular ship’s log could become its captain – but it put off as many people as it attracted. And those who became captain had nowhere to go.

This comes back to motivation for taking part. When we ask our volunteers, it frequently comes down to wanting to contribute to research. For instance, the Andromeda Project involved images that weren’t that exciting… Volunteers were asked to circle star clusters in the galaxy. The task is simple and they feel they are really contributing… They finished the task in a week. When we had finished, we put up a message thanking participants for their contribution, saying that we had enough for the paper, but that they were welcome to carry on… And participation shows a rapid fall to zero – they were only interested while the task at hand was useful. That pattern reminds us not to mess with our community: they use precious spare time and they want to be doing something useful and meaningful.

Planet Hunters is a project where volunteers detect planets in data. People don’t take part to discover planets; they take part because they really are interested in the science. Some of our really active participants choose to download the data, write their own code, do work at PhD level as volunteers, and send data back… The planets discovered in that project are rare and weird – things we didn’t spot with algorithms – the first one found had four suns. And recently we found a seven-planet solar system, the largest other than our own.

Volunteers are keen to go further, so we have a discussion area – labelled Talk – for all of our projects. That means you can make comments Twitter-style, or use old-style discussion boards for long-form discussions. Those areas are also used by the scientists, the researchers, the technical teams and developers, and the community can interact with them there – the most productive findings often come from that interaction between volunteers and scientists. The Talk areas of our community are really important. In fact, in a network diagram of our community we can see some of our most active participants – one huge green blob on this diagram is a wonderful woman called Elizabeth who posts, comments, moderates, and helps fellow volunteers along. And we are looking at those networks, at who those linchpins are, etc.

I said that people write their own code and do their own analysis… So can we get that on the site? We have been playing with a tools area, which we’ve tried for Galaxy Zoo and for Snapshot Serengeti. We’ve been funded to build a broader set of tools, to map data, etc. from the website itself.

One of the other big things we are trying to do is to translate the site. For instance, here is Galaxy Zoo in traditional Chinese characters. And we are doing this through crowdsourcing: you pick your site, and you show words or sections for users to translate.

I talked about understanding the community and their interests and motivation. You also need to understand how you allocate images etc. We have done it based on seen/not seen, but have been toying with the idea of shaping what images you see based on what you have seen, what you particularly like, or what you are good at identifying. When we tried shaping images to suit interested folk on Snapshot Serengeti it wasn’t that successful, and we realised we hadn’t been showing them blank images… So we looked at usage data to see to what extent seeing blank images affects classifying. It seems that the more blank images a user sees, the more they classify, but when they classify lots of blank images in one go they leave the site sooner. Psychologically we aren’t sure why this is yet – classifying a blank image is one click, so it’s quick… But also, what is the reward for that image – is it just as rewarding to classify a blank image? There seems to be a sweet spot here… The same team trying to automatically spot a zebra has also been looking at identifying whether there is anything in the image at all… But filtering out blanks may mean volunteers leave the site sooner, so we could be shooting ourselves in the foot…

So, we’ve been thinking about who should see what. As part of that, with some of the space image projects, we have been trying putting some simulated images into the mix to rank/detect expert level – and comparing that with volunteers’ experience/expert level within the system. We want to see if there is a smarter way to do a Zooniverse project.

The other thing that can happen is fear, a sort of classification anxiety. For instance, with cancer images people can be quite scared to click the button and contribute to the research. So we are toying with showing volunteers how the consensus clustering works, so we can show people that their marking counts but that they are backed up by the wisdom of the crowd. We think that may help them trust themselves. At the moment we just blog about this stuff, but how can we show this on the site?

Panoptes is our new infrastructure platform, which we’ve been building for the last year with 2 million dollars of funding from Google. The first project using it appeared on Stargazing Live this year, looking for supernovae. We discovered five supernovae during the week-long run of that programme. Panoptes is the infrastructure we will be building our projects on, but anyone can run projects on it. You can build your own project with a name, introduction, research case, and workflows – mark an ellipse, answer a question, etc. Then you upload your subjects/data as images. During our trials, scientists were building things in half an hour that would have taken our developer six months. We will be launching our beta today, and launching fully over the next two weeks… There are still only two types of workflows at launch: tree logic and classifying. There are so many other questions and tasks to do, and we hope to add facilities for them later on, notably: humanities/transcription (consensus being the main problem there), audio, and video. We have tried audio and video before, but they won’t be in the first iteration of Panoptes. And we still have to answer the question of whether audio or video can work for citizen science – they are not that popular in our experience, but maybe that is about the projects, not the format… There are still lots of questions to answer.

Q&A

Q1) Can you say more about social motivation here. But also what about subjectivity and objectivity here – and how much opportunity there is to learn, how you become more able to identify things that have previously been ambiguous. Your predecessor talked about people popping on for a few minutes, not gaining

A1) For citizen science, crowdsourcing and volunteering generally, the majority of people do just pop in briefly. The learning is often through the discussion areas. But we do see that people who do more classifications become better at it… And we see that the more comments people post in discussion, the more technical detail or terminology they include. But we are also trying to actively teach our volunteers. When I came in we started looking at ways to go further than the data processing – I wanted to create an educational course for Planet Hunters, maybe a 25-slide course, offered through an invitation to take part every 10 classifications. People did opt in to that… And we thought that would improve classifications and keep volunteers in the system, as well as supporting them to learn. But we are still looking at ways to educate through the site.

Q2) Can you say more about who decides which projects are made live? There are so many research communities in the world – who is using the data? Also, is there any communication between the volunteers and the scientists?

A2) The process, until now, was that we got grant money to build citizen science projects and we put out calls for proposals. People would come to us with a case, and we would decide in-house as a team which seemed worth doing, which were buildable, and which we might be interested to try. Research output was always put first – they had to have a good research case. We would get 50-100 proposals and build 5 per year. That has led to the new infrastructure. There is huge demand for citizen science, and all areas of science have huge amounts of data… But to some extent the problem still exists… I could put up 100,000 pictures when this platform goes live, so we will still be reviewing and filtering projects before they can become official Zooniverse projects. So you can use the platform to build private projects etc., but before they can be on the homepage they will be filtered, tested in beta, rated by the crowd, etc. On the communication front, that’s mainly on the discussion boards. Each participant has a suitable label, so you can tell who the researchers are. When Hanny made her discovery, that was the discussion boards and researchers following up and discussing it. But some of our volunteers and science teams do their own thing with Google Groups, Hangouts, etc.

Q3) I’m interested in your use of the word “discovery” and what that might mean. That end point is easy to attribute, but how do you credit all that prior work?

A3) The first authors on the Planet Hunters paper are the research team, then us, then those who have classified the planets. We try to attribute credit there. We are trying to work out how to credit everyone who has ever taken part – on the website, not on the papers – but it is now more complex. Even just in science it is complex – there are 30 people on that paper discovering a new planet… It becomes really, properly collaborative and hard to credit. We try to recognise anyone we thin

Q4) In general, but particularly thinking about the new platform, how are you handling the moderation of images, data and discussion – there seems to be potential for really problematic trolling/inappropriate activity here, but also legitimate but inappropriate images.

A4) We looked at various sites where you can upload images. We liked Flickr’s privacy policy – we can’t review all the images or monitor all those projects, especially the private ones. So we rely on removing things if we do find something. Sharing our ideals… And there is a grey area where people might share adult material but in a legitimate research project – that will be case by case. In terms of comments etc., we do have moderators who can flag or delete comments, or can talk to volunteers about that. And we will keep those for people who moderate or have admin rights.

Q5) What do you mean by private projects?

A5) You will be able to create a project and share it only with those you send a link to, so we won’t be able to review them all. Hopefully they will be built by those genuinely trying to run a research project, but we know people could use or abuse that facility, so we will state our policy, will delete anything that we need to, and will report to the authorities if needed.

Q6) Researchers can already pay to use crowdsourcing, is that something you will be doing? e.g. Crowd Power, Mechanical Turk.

A6) In theory someone could offer financial rewards for a project running on the platform, but we won’t facilitate that in the infrastructure and we will be sharing our ideals and policies. I have no problem with financial incentives as long as they are above board, but that’s not our model and not what we are offering. And there are serious citizen science questions about data quality where people are working for financial rewards. But it will be interesting to see what happens over the coming months.

Q7) Will all projects stay there forever?

A7) We already review our own projects. We do not want to waste people’s time. We will impress this on those using the new platform. And we will also make it possible for people to share the final products – papers etc – of those projects. Right now we have archive sites for our projects, we link to a GitHub site for retired projects, data etc.

Q8) Looking at loyalty for different projects. Presumably you have a small number doing large amounts of work… Does that pattern of loyalty track to different projects or do they only get very loyal about one project?

A8) In the past we deliberately separated our projects, we didn’t make great efforts to encourage volunteers to work across the projects, making it hard to switch between them. We’ve been thinking a lot about this when we think about delivering the right data to the right user, we are also thinking about letting volunteers know about the projects that will be of interest.

Grant shows an image annotated with consensus classifications in Galaxy Zoo

Mark Hartswood (Oxford University & CSCS Data and Evidence network founder): ‘Intervening in Citizen Science: From incentives to value co-creation’

About Mark and his talk:

‘This talk reflects upon a collaboration between SmartSociety, an EU project exploring how to architect effective collectives of people and machines, and the Zooniverse, a leading on-line citizen science platform.

Our collaboration tackled the question of how to increase engagement of Zooniverse volunteers. In the talk I will chart how our thinking has progressed from framing volunteering in terms of motivation and incentives, and how it moved towards a much richer conceptualisation of multiple participating groups engaging in complicated relationships of value co-creation.’

Mark Hartswood is a Social Informatician whose main employer is Oxford University and currently working in the area of Responsible Research and Innovation.

I am going to start with an answer to one of Grant’s questions… volunteers find it fun to see a surprising image – building up hope and tension for an exciting image… I’d taken this slide out of my slides but I thought I’d add it back in…

Grant: Isn’t it great when you see the same answer in two different places!

Mark: In my talk proper I’ll be talking about motivations for participation, and I will be looking at several projects here: SOCIAM, SmartSociety (which I work on) and Zooniverse, with acknowledgements to my colleagues on the study I will be talking about.

Our colleagues at Ben-Gurion University of the Negev have been looking at incentive schemes for crowdsourcing, and Zooniverse offered us an opportunity to try this out with a group of real volunteers…

Our study in a nutshell was:

  • Auto ethnography – exploring Zooniverse as a volunteer
  • Survey of Zooniverse participants, looking at motivation, anxiety, engagement, disengagement. Targeted at volunteers active in the last three months
  • Develop an intervention to re-engage volunteers (essentially an email)
  • Intervention successful…

But that’s not the story I want to tell today. I want to talk about conceptualising citizen science as co-creation of value, looking at the literature and moving to a co-creation of value approach.

Literature-wise: Peer production has been posed as a problem for economists in terms of understanding motivation (Benkler). Motivation for citizen science is important but it seems hard to properly explain. Raddick et al found motivations were multiple and compound – from appreciating the scale and beauty of the universe, to supporting the scientific process, to a personal connection to the project. There can be a real mix, and people give complex narratives. Motivations are also shown to be dynamic: they change, evolve, wax or wane (Rotman et al). And motivation is not exhaustive in explaining participation – Eveleigh et al show that people may be highly motivated but not have the time, or be able, to participate in practice.

Coupled with motive are issues of reward and incentives. Often in the literature, motives are coupled with the idea that the right motives can lead to the use of rewards or incentives. Incentives are seen to generate interest, sustain engagement and improve quality in citizen science, according to Prestopnik et al. Or as exerting a form of leverage. Or “programming” participation (Maggi et al?).

So Dickinson et al (2012) looked at incentives and rewards. But there are some confusing combinations of badges and certificates as incentives, discussion as social incentives, and other incentives. Building community and recognising effort are also part of the mix. There are real mixes of individualised approaches and more social processes.

There are some really problematic areas here. Kittur et al argue that motivation must be there first; incentives should just align motives to desired behaviour. Gamification could produce ambivalent results in citizen science (Darch, Preist et al). Incentives can create perverse outcomes as well (Sneddon et al).

We want not to ask what motivates people, but to ask how participation creates value for participants and for others. So what is co-creation of value? It has its origins in commerce. In the past, the idea was that value is created in the factory and delivered to the consumer. Now the customer is seen as active in creating the value of the product or service. That includes promoting the product, designing new products, and aiding diffusion. Value flows to the business, to the customer, and to other customers – see for instance WetSeal, which enables customers to combine garments into collections, to share those, to share images of themselves in garments, etc.

So, we can see co-creation of value in citizen science. In a mature platform like Zooniverse, complex types of value are shared, and different forms of value are shared by participants. There are diverse reasons to participate, and very varied levels of participation by individuals. There is a difference between value made collectively (e.g. casual users who make only a few classifications) and value made individually (the few who make many classifications). And we see conversations on forums about, say, anomalies, and scientists’ responses to those… These add value to the community and become resources for it; scientists’ blog posts also add to that, and help acknowledge the role of volunteers. Participants also build social capital via social media, which also promotes the platform. And from contributed data and project outputs, materials like star catalogues become available for individuals to use in their own research.

So there are complex forms of value, and those values interact. Changes in incentives can therefore change dynamics in this web of value.

Looking at a scientist blog post, “There’s a green one and a pink one and a blue one and a yellow one”, beginning with an image visualising all the contributions of a community, from super-active participants through to those making a few each. The text of this post speaks to the delicacy of talking about participants in a project with those dynamics, acknowledging contributions of all forms and emphasising that volume is not the only measure. The post is artfully written to achieve a number of delicate balances. Each member of the crowd has to be acknowledged as valuable. It would be easy to praise the highly active participants and dismiss casual participants, and this post carefully avoids any sense of jealousy, unfairness, etc.

If we have complex dynamics in these webs of value and co-creation, what happens when incentives explicitly value one type of contribution over another? That brings us back to the effects of gamification. So, looking at Old Weather, where contributions enabled you to rise to the rank of captain… The leaderboard explicitly values volume of contribution. For non-gamers, game elements can be demotivating, and the heights of the leaderboard look inaccessible (see Darch). Leaderboards can also set a normative standard for contribution that demotivates the long tail (Preist et al). So, we think a co-creation model enables us to better understand the impact of changing the dynamics through incentives.

This takes us back to the interventions we looked at in our study… and comments from Zooniverse participants. In terms of how volunteers became disengaged, that was about boredom/forgetting about the project, and about distractions from work or home, and people said that an email when they haven’t logged in might motivate them. So we looked at an email to remind volunteers about Zooniverse.

But there were other reasons too: ideas about achieving a level of mastery, and feeling that if you are not reaching it your contribution isn’t valuable, or fear of classifying in case of mistakes. There we think an incentive that might be effective is reassurance about classification anxiety.

We also saw volunteers unaware of other projects being available to participate in – which can be resolved through signposting to other projects.

So, benefits of a co-creation perspective…

  • More symmetrical idea – unlike the view where motives are held by the volunteer and incentives are things you do to the volunteer
  • Less individualistic – explains more complex relationships and dynamics between both participating individuals and groups
  • Don’t want to reject incentives or motivations – but want to put them in a broader, non-individualistic framework
  • Opens up a broader framework for design, e.g. around diagnosing and repairing problems where participants fail to realise value for themselves or each other
  • Provides access to thinking about value, values and ethical dilemmas in participatory citizen science, based on principles of mutuality and equitability
  • Much of this is half-articulated in the citizen science literature – but moving away from the language and logic of incentives and motives helps realise it more fully.

Q1) I think you’ve both given brilliant talks. On the motivations of students in learning environments – that’s my area, and educators have been looking at this for some time, with intrinsic and extrinsic motivations. Is that something you are looking at?

A1) Is there a whole area of literature here then?

Q1) Betty Collis comes to mind on the issue of co-creation. But yes, there is a literature there in education.

A1) It would be interesting to make those connections there…

Comment) I think that you are also talking about the psychology of learning, and there are really different motivations there, some quite instrumental… Do you have any thoughts on that based on what you have seen in Zooniverse?

A2) I am certainly still exploring this area. But I think the idea that motivations are a priori has to be challenged. Zooniverse creates a space for volunteers to be challenged by things they may have never thought of before.

Q2) And what incentives would you recommend for an online learning forum?

A2) There is that diversity… And that is quite healthy. We don’t necessarily want to convert this sort of person into that sort of person. Zooniverse is pretty successful in creating lots of different sorts of rooms, to participate in different sorts of ways. Catering to that diversity, and accepting it, is actually sort of important.

Comment) A lot of the crowdsourcing systems in commercial and academic fields started very naively – an individualised collective intelligence idea… realising the wisdom of the crowd, but then seeing the community collaborating and changing things… So now we see the discovery of the world of people, normal dynamics… But new things are also brought to that space… Mutual new ideas that can help fields think about social organisation, motivation and so on…

Comment) You are seeking to do something different to us (educators) but you are similarly trying to avoid negative experiences through cliques, and you also don’t want to create that.

Grant) We had a Zooniverse discussion board, with many early super users… They were quite cliquey. They were not hostile but almost too much too soon for someone new coming in. They were using technical language, showing their knowledge, perhaps feeling or behaving in quite entitled ways. So we do think about how we get people to form a healthy community… And it’s not something we have solved…

Comment) And you haven’t written that up, as that would be divisive.

Grant) Indeed, but we have been looking at new ways to tackle that potential issue – breaking down walls between projects is part of that – by relaunching Talk. We find commentators wanting a count of how many comments they have made, and we don’t want to convey authority in that way. It is common in forums but we don’t want to do that.

Comment) But people do invest time and knowledge… So levelling everyone to the same can diminish contribution.

Grant) I like that blog post Mark highlighted for its approach to acknowledging contributions of all types. We have to think about how to reward everyone without alienating the other types of contributions.

Mark) It’s not so much about levelling, but about emergent politics about values. And being thoughtful of those dynamics.

Comment) But to some extent you’ll never understand the reasons for participation. There was a US project with two users who were way ahead… proved to be a guy and his father in law competing!

Grant) There are a whole bunch of compound motivations – some may be petty, some may be

Mark) We had some really lovely motivations and some really sad ones – terminally ill people wanting to make a contribution for instance. But there were also motivations that were total turn offs – some wanted to look at alien worlds, some found that disturbing or frightening. People had really individual perspectives.

Comment) You’ve talked about people sharing what they do to social media accounts – bypassing a lack of gamification by sharing in that way!

Grant) That is implemented more for sharing a lovely image – it’s not about numbers but sharing something interesting. We have talked about the idea – and have some new funding – to build a native Facebook app for four of our projects… But that sort of issue may arise there. Whether personal announcement is motivating or not.

Comment) More open platforms does enable more entrepreneurship and different approaches.. It becomes a game perhaps… Could be other things to search for… Scrapbooking the loveliest images, new ways into projects.

Grant) We are wary of gamification: it can create motivation for some, but it is kind of treacherous. We have also seen volunteers make their own games out of ungamified projects – tracking how many animals or types of galaxies etc. they have seen. There are some who like the idea of a gamified Zooniverse project.

Q) How representative do you think the Zooniverse volunteers are? They are very heavily studied as a group, and the literature looks at very few niche groups, but how do they compare to that big pool of untapped talent – that 200 billion hours?

A) Demographically it was a very flat age range – very level participation across age ranges. Participants tended to be quite highly educated. So a lot of untapped reserves would be about that less educated range of people perhaps.

Grant) One of the things we indicated in our funding bid is that we do have that flat age range, but we also have Facebook likes, and those let us see a detailed demographic age range. We saw a massive discrepancy there, with loads of young people under 25 who were interested on Facebook but didn’t participate in the Zooniverse projects.

Mark: Under 18s weren’t in our study for ethical reasons…

Grant: But even looking just at 18-25 year olds that discrepancy between the Facebook likes and the participation applied.

Comment) Just on that gamification front, it does work but why it works is really an issue.

And with that we are closing the session… This event has really shown the value of combining very different people in the room… That breadth of interests etc. And I think that bodes well for our network as a whole, and that will hopefully add real value to our events in the future.