Aug 092016
 
Notes from the Unleashing Data session at Repository Fringe 2016

After 6 years of being Repository Fringe‘s resident live blogger this was the first year that I haven’t been part of the organisation or amplification in any official capacity. From what I’ve seen though my colleagues from EDINA, University of Edinburgh Library, and the DCC did an awesome job of putting together a really interesting programme for the 2016 edition of RepoFringe, attracting a big and diverse audience.

Whilst I was mainly participating through reading the tweets to #rfringe16, I couldn’t quite keep away!

Pauline Ward at Repository Fringe 2016

Pauline Ward at Repository Fringe 2016

This year’s chair, Pauline Ward, asked me to be part of the Unleashing Data session on Tuesday 2nd August. The session was a “World Cafe” format and I was asked to help facilitate discussion around the question: “How can the respository community use crowd-sourcing (e.g. Citizen Science) to engage the public in reuse of data?” – so I was along wearing my COBWEB: Citizen Observatory Web and social media hats. My session also benefited from what I gather was an excellent talk on “The Social Life of Data” earlier in the event from the Erinma Ochu (who, although I missed her this time, is always involved in really interesting projects including several fab citizen science initiatives).

I won’t attempt to reflect on all of the discussions during the Unleashing Data Session here – I know that Pauline will be reporting back from the session to Repository Fringe 2016 participants shortly – but I thought I would share a few pictures of our notes, capturing some of the ideas and discussions that came out of the various groups visiting this question throughout the session. Click the image to view a larger version. Questions or clarifications are welcome – just leave me a comment here on the blog.

Notes from the Unleashing Data session at Repository Fringe 2016

Notes from the Unleashing Data session at Repository Fringe 2016

Notes from the Unleashing Data session at Repository Fringe 2016

If you are interested in finding out more about crowd sourcing and citizen science in general then there are a couple of resources that made be helpful (plus many more resources and articles if you leave a comment/drop me an email with your particular interests).

This June I chaired the “Crowd-Sourcing Data and Citizen Science” breakout session for the Flooding and Coastal Erosion Risk Management Network (FCERM.NET) Annual Assembly in Newcastle. The short slide set created for that workshop gives a brief overview of some of the challenges and considerations in setting up and running citizen science projects:

Last October the CSCS Network interviewed me on developing and running Citizen Science projects for their website – the interview brings together some general thoughts as well as specific comment on the COBWEB experience:

After the Unleashing Data session I was also able to stick around for Stuart Lewis’ closing keynote. Stuart has been working at Edinburgh University since 2012 but is moving on soon to the National Library of Scotland so this was a lovely chance to get some of his reflections and predictions as he prepares to make that move. And to include quite a lot of fun references to The Secret Diary of Adrian Mole aged 13 ¾. (Before his talk Stuart had also snuck some boxes of sweets under some of the tables around the room – a popularity tactic I’m noting for future talks!)

So, my liveblog notes from Stuart’s talk (slightly tidied up but corrections are, of course, welcomed) follow. Because old Repofringe live blogging habits are hard to kick!

The Secret Diary of a Repository aged 13 ¾ – Stuart Lewis

I’m going to talk about our bread and butter – the institutional repository… Now my inspiration is Adrian Mole… Why? Well we have a bunch of teenage repositories… EPrints is 15 1/2; Fedora is 13 ½; DSpace is 13 ¾.

Now Adrian Mole is a teenager – you can read about him on Wikipedia [note to fellow Wikipedia contributors: this, and most of the other Adrian Mole-related pages could use some major work!]. You see him quoted in two conferences to my amazement! And there are also some Scotland and Edinburgh entries in there too… Brought a haggis… Goes to Glasgow at 11am… and says he encounters 27 drunks in one hour…

Stuart Lewis at Repository Fringe 2016

Stuart Lewis illustrates the teenage birth dates of three of the major repository softwares as captured in (perhaps less well-aged) pop hits of the day.

So, I have four points to make about how repositories are like/unlike teenagers…

The thing about teenagers… People complain about them… They can be expensive, they can be awkward, they aren’t always self aware… Eventually though they usually become useful members of society. So, is that true of repositories? Well ERA, one of our repositories has gotten bigger and bigger – over 18k items… and over 10k paper thesis currently being digitized…

Now teenagers also start to look around… Pandora!

I’m going to call Pandora the CRIS… And we’ve all kind of overlooked their commercial background because we are in love with them…!

Stuart Lewis at Repository Fringe 2016

Stuart Lewis captures the eternal optimism – both around Mole’s love of Pandora, and our love of the (commercial) CRIS.

Now, we have PURE at Edinburgh which also powers Edinburgh Research Explorer. When you looked at repositories a few years ago, it was a bit like Freshers Week… The three questions were: where are you from; what repository platform do you use; how many items do you have? But that’s moved on. We now have around 80% of our outputs in the repository within the REF compliance (3 months of Acceptance)… And that’s a huge change – volumes of materials are open access very promptly.

So,

1. We need to celebrate our success

But are our successes as positive as they could be?

Repositories continue to develop. We’ve heard good things about new developments. But how do repositories demonstrate value – and how do we compare to other areas of librarianship.

Other library domains use different numbers. We can use these to give comparative figures. How do we compare to publishers for cost? Whats our CPU (Cost Per Use)? And what is a good CPU? £10, £5, £0.46… But how easy is it to calculate – are repositories expensive? That’s a “to do” – to take the cost to run/IRUS cost. I would expect it to be lower than publishers, but I’d like to do that calculation.

The other side of this is to become more self-aware… Can we gather new numbers? We only tend to look at deposit and use from our own repositories… What about our own local consumption of OA (the reverse)?

Working within new e-resource infrastructure – http://doai.io/ – lets us see where open versions are available. And we can integrate with OpenURL resolvers to see how much of our usage can be fulfilled.

2. Our repositories must continue to grow up

Do we have double standards?

Hopefully you are all aware of the UK Text and Data Mining Copyright Exception that came out from 1st June 2014. We have massive massive access to electronic resources as universities, and can text and data mine those.

Some do a good job here – Gale Cengage Historic British Newspapers: additional payment to buy all the data (images + XML text) on hard drives for local use. Working with local informatics LTG staff to (geo)parse the data.

Some are not so good – basic APIs allow only simple searchers… But not complex queries (e.g. could use a search term, but not e.g. sentiment).

And many publishers do nothing at all….

So we are working with publishers to encourage and highlight the potential.

But what about our content? Our repositories are open, with extracted full-text, data can be harvested… Sufficient but is it ideal? Why not do bulk download from one click… You can – for example – download all of Wikipedia (if you want to).  We should be able to do that with our repositories.

3. We need to get our house in order for Text and Data Mining

When will we be finished though? Depends on what we do with open access? What should we be doing with OA? Where do we want to get to? Right now we have mandates so it’s easy – green and gold. With gold there is PURE or Hybrid… Mixed views on Hybrid. Can also publish locally for free. Then for gree there is local or disciplinary repositories… For Gold – Pure, Hybrid, Local we pay APCs (some local option is free)… In Hybrid we can do offsetting, discounted subscriptions, voucher schemes too. And for green we have UK Scholarly Communications License (Harvard)…

But which of these forms of OA are best?! Is choice always a great thing?

We still have outstanding OA issues. Is a mixed-modal approach OK, or should we choose a single route? Which one? What role will repositories play? What is the ultimate aim of Open Access? Is it “just” access?

How and where do we have these conversations? We need academics, repository managers, librarians, publishers to all come together to do this.

4. Do we now what a grown-up repository look like? What part does it play?

Please remember to celebrate your repositories – we are in a fantastic place, making a real difference. But they need to continue to grow up. There is work to do with text and data mining… And we have more to do… To be a grown up, to be in the right sort of environment, etc.

Q&A

Q1) I can remember giving my first talk on repositories in 2010… When it comes to OA I think we need to think about what is cost effective, what is sustainable, why are we doing it and what’s the cost?

A1) I think in some ways that’s about what repositories are versus publishers… Right now we are essentially replicating them… And maybe that isn’t the way to approach this.

And with that Repository Fringe 2016 drew to a close. I am sure others will have already blogged their experiences and comments on the event. Do have a look at the Repository Fringe website and at #rfringe16 for more comments, shared blog posts, and resources from the sessions.