May 16 2018

Today I am at the Digital Scholarship Day of Ideas, organised by the Digital Scholarship programme at University of Edinburgh. I’ll be liveblogging all day so, as usual, I welcome additions, corrections, etc. 

Welcome & Introduction – Melissa Terras, Professor of Digital Cultural Heritage, University of Edinburgh

Hi everyone, it is my great pleasure to welcome you to the Digital Day of Ideas 2018 – I’ve been on stage here before as I spoke at the very first one in 2012. I am introducing the day but want to give my thanks to Anouk Lang and Professor James Loxley for putting the event together and their work in supporting digital scholarship. Today is an opportunity to focus on digital research methods and work.

Later on I am pleased that we have speakers from sociology and economic sociology, and the nexus of that with digital techniques, areas which will feed into the Edinburgh Futures Institute. We’ll also have opportunity to talk about the future of digital methods, and particularly what we can do here to support that.

Lynn Jamieson – Introduction

Susan Halford is professor of sociology but also director of the institution-wide Web Science Institute.

Symphonic Social Science and the Future of Big Data Analytics – Susan J Halford, Professor of Sociology & Director of Web Science Institute, University of Southampton

Abstract: Recent years have seen ongoing battles between proponents of big data analytics, using new forms of digital data to make computational and statistical claims about the social world, and many social scientists who remain sceptical about the value of big data, its associated methods and claims to knowledge. This talk suggests that we must move beyond this, and offers some possible ways forward. The first part of the talk takes inspiration from a mode of argumentation identified as ‘symphonic social science’ which, it is suggested, offers a potential way forward. The second part of the talk considers how we might put this into practice, with a particular emphasis on visualisation and the role that this could play in overcoming disciplinary hierarchies and enabling in-depth interdisciplinary collaboration.

It’s a great pleasure to be here in very sunny Edinburgh, and to be speaking to such a wide-ranging audience. My own background is geography, politics, English literature, sociology and, in recent years, computer science. That interdisciplinary background has been increasingly important as we start to work with data: new forms of data, new types of work with data, and new knowledge – but let’s query that – from that data. All this new work raises significant challenges, especially as those individual fields come from very different backgrounds. I’m going to look at this from the perspective of sociology and perhaps the social sciences; I won’t claim to cover all of the arts and humanities as well.

My talk today is based on work that I have been doing with Mike Savage on “big data” and the new forms of practice emerging around these new forms of data, and the claims being made about how we understand the social world. In this world there has been something of a stand-off between data scientists and social scientists. Chris Anderson (in 2008), writing for Wired, essentially claimed “the data will speak for itself” – you won’t need the disciplines. Many have pushed back hard on this. The push-back is partly methodological: these data do not capture every aspect of our lives, they capture partial traces, often lacking in demographic detail (do we care? sociologists generally do…), and we know little of their provenance. And it is very hard to work with this data without computational methods – tools built for pattern recognition generally, not for sociological approaches. And these can present concerning, even ethically problematic, results as though they were unproblematic. So, this is highly challenging. John Goldthorpe says “whatever big data may have for “knowing capitalism”, its value to social science has… remained open to questions…”.

Today I want to move beyond that stand-off. The divisiveness and siloing of disciplines is destructive – it’s not good for social science and it’s not good for big data analytics either. From a social science perspective, that position marginalises the social sciences, sociology specifically, and makes us unable to take part in this big data paradigm which – love it or loathe it – has growing importance, influence, and investment. We have to take part in this for three major reasons: (1) it is happening anyway – it will march forward with or without us; (2) these new data and methods do offer new opportunities for social sciences research; and (3) we may be able to shape big data analytics as the field emerges – it is very much in formation right now. It’s also really bad for data science not to engage with the social sciences… Anderson and others made these claims ten years ago… Reality hasn’t really borne that out. In commercial contexts – recommendations, behaviour tracking and advertising – the data and analysis is doing that. But in actually drawing understanding of the world, it hasn’t really happened. And even the evangelists have moved on… Wired itself has moved to saying “big data is a tool, but should not be considered the solution”. Jeff Hammerbacher (co-credited with coining the term “data science” in 2008) said in 2013 “the best minds of my generation are thinking about how to make people click ads… that sucks”.

We have a wobble here, a real change in the discourse. We have a call for greater engagement with domain experts. We have a recognition that data are only part of the picture. We need to build a middle ground between those two positions of data science and social science. This isn’t easy… It’s really hard for a variety of reasons. There are bodies buried here… But rather than focus on that, I want to focus on how we take big steps forward here…

The inspiration here comes from three major social science projects: Bowling Alone (Robert Putnam); The Spirit Level (Richard Wilkinson and Kate Pickett); and Capital in the Twenty-First Century (Thomas Piketty). These projects have made huge differences, influencing public policy – Bowling Alone in particular really reshaped how governments make policy. These aren’t by sociologists. They aren’t connected as such. The connection we make in our paper is that we see a new style of social science argumentation – and we see it as a way that social scientists may engage in data analytics.

There are some big similarities between these books. They are all data driven. Sociology at the end of the 20th century was highly theoretical; at the beginning of the 21st century we see these data-driven works. And the authors haven’t generated their own research data here, they have drawn on existing research data. Piketty drew together diverse tax data… but also Jane Austen quotes… Not just mixed methods but huge repurposing. These books don’t make claims for causality based on data alone; their claims for causality are supported by theory. However they present data throughout, in support of their arguments. Data is key, with images to hold the argument together. There is a “visual consistency”. Each book has a key graph that essentially summarises it: Putnam’s on social capital, Piketty’s on the rise and fall of wealth inequality in the 20th century.

In each of these texts data, method and visualisation are woven into a repeated refrain, combined with theory as a composite whole to make powerful arguments about the nature of social life and social change over the long term. We call this a “Symphonic Aesthetic”: different instruments and refrains build, come in and go… and the whole is greater than the sum of the parts.

OK, that’s an observation about the narrative… But why does that matter? We think it’s a way to engage with and disrupt big data. There are similarities: re-purposing multiple and varied “found” data sources; an emphasis on correlation; use of visualisation. There are differences too: theoretical awareness; choice of data; and temporality – big data has huge sets of data looking at tiny, focused and often real-time moments, while social science takes long-term comparisons, potentially over 100 years. The role of correlation is different: big data analytics looks for a result (at least in the early stages); in symphonic aesthetics there is a real interest in interrogating correlation through statistical and theoretical understandings. The practice of visualisation varies as well: in big data it is the result; in symphonic aesthetics it is part of the process, not the end of the process.

Those similarities are useful but there is much still to do: symphonic authors do not use new forms of digital data, their methods cannot simply be applied, and big data demands new and unfamiliar skills and collaborations. So I want to talk about the prospective direction of travel around data, method, theory, and visualisation practice.

So, firstly, data. If we talk about symphonic aesthetics we have to think about critical data pragmatism. That is about lateral thinking – redirection of data that already exist. And we have to move beyond naivety – we cannot claim these data are “naturally occurring” mirrors/telescopes etc.; they are deliberate socio-technical constructions. And we need to understand what the data are and what they are not: the socio-technical processes of data construction (e.g. carefully constructed samples); understanding and using demographic biases (go with the biases and use the data as appropriate, rather than claiming they are representative; or maybe set that aside and look at network construction, flows, mobilities – e.g. John Murrey’s work).

Secondly, method. We have to be methodologically plural. Normally we do mixed methods – some quantitative, some qualitative. But most of us aren’t yet trained for computational methods, and that is a problem. Many of the most interesting things about these data – their scale, complexity etc. – are not things we can accommodate in our traditional methods. We need to extend our repertoire here. So, social network analysis has a long and venerable history – we can scale its more intensive, smaller form up to large-scale network analysis. But we also need machine learning – supervised (with training sets) and unsupervised (without). This allows you to seek evidence of different, perhaps even contradictory, patterns. Machine learning can also help you find structures and patterns in the data that you may well not know are there in data sets of this scale.
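To make the supervised/unsupervised distinction concrete, here is a minimal sketch – not from the talk, with invented toy data – of both modes side by side, using scikit-learn:

```python
# A minimal sketch (not from the talk; toy data invented) contrasting
# supervised learning from a labelled training set with unsupervised
# pattern-finding where no labels exist.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Hypothetical hand-coded training data: short texts with analyst labels.
texts = ["great service, very happy", "terrible delays again",
         "loved the event, brilliant talks", "awful experience, never again"]
labels = ["positive", "negative", "positive", "negative"]

X = TfidfVectorizer().fit_transform(texts)

# Supervised: learn from the coded sample, then classify unseen texts.
clf = LogisticRegression().fit(X, labels)

# Unsupervised: look for structure without any coding at all.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(clusters)  # cluster ids only; their meaning still needs interpreting
```

Either mode still needs interpretation – a point the speaker returns to below.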

We have this quote from Amir Goldberg (2015): “sociologists often round up the usual suspects. They enter the metaphorical crime scene every day, armed with strong and well-theorised hypotheses about who the murderer should or at least plausibly might be.”

To be very clear I am not suggesting we outsource analysis to computational methods: we need to understand what the methods are doing and how.

Thirdly, theory. We have to use abductive reasoning – a constant interplay between data, method and theory. Initial methods may be informed by initial hunches, themes, etc. We might use those methods to see if there is something interesting there… Perhaps there isn’t, or perhaps you build upon this. That interplay and iterative process is, I suspect, something sociologists already do.

So, how do we bring this all together in practice? Most sociologists do not have a sophisticated understanding of the methods; and most computer scientists may understand the methods but not the theoretical elements. I am suggesting something end to end, with both sociologists and computer scientists working together.

It isn’t the only answer, but I am suggesting that visualisation becomes an analytical method, rather than a “result” – and thinking about a space for work where both sociological and computer science expertise are equally valid rather than in competition. At best visualisations are “instruments for reasoning about quantitative information. Often the most effective way to describe, explore and summarise a set of numbers – even a very large set – is to look at pictures of those numbers” (Tufte 1998). Visualisations can be interdisciplinary boundary objects. Beyond a mode of argumentation… visualisation becomes a mode of practice.

An example of this was a visualisation of the network of a hashtag, a collaboration with my colleague Ramin, which developed over time as we asked each other questions about how the data was presented and what that means…
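As an illustration of what that iterative, visualisation-as-analysis loop might look like in code – a minimal sketch with invented hashtag data, not the actual project – using networkx:

```python
# A minimal sketch (invented data, not the project described) of using
# a hashtag co-occurrence network as an analytical step: draw, inspect,
# re-pose the question, redraw.
import itertools
import networkx as nx
import matplotlib.pyplot as plt

tweets = [["digitalscholarship", "edinburgh"],
          ["edinburgh", "bigdata"],
          ["bigdata", "sociology"],
          ["digitalscholarship", "sociology"]]

G = nx.Graph()
for tags in tweets:
    for a, b in itertools.combinations(tags, 2):
        # Count co-occurrences as edge weights.
        w = G.get_edge_data(a, b, {}).get("weight", 0)
        G.add_edge(a, b, weight=w + 1)

nx.draw_networkx(G)  # each redraw can prompt a new question of the data
plt.show()
```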

In conclusion, sociology flourished in the C20th, developing methods, data and theory that gave us expertise in “the social” (a near monopoly). This is changing – new forms of data, new forms of expertise… and claims being made which we may, or may not, think are valid, and which stand on the work of sociologists. But there is some promise in the idea of the symphonic aesthetic: for data science – data science has to be credible and there is recognition of that – see for instance Cathy O’Neil’s work on data science, “Weapons of Math Destruction”, which also pushes in this direction; for sociological research – but not all of it, these won’t be the right methods for everyone; and for public sociology – this is being used in lots of ways already: algorithmic sentencing debates, Cambridge Analytica… There is a real place for sociologists to reshape sociology in the public understanding. There are big epistemological implications here… Changing the data and methods changes what we study… but it has always been like that. Big data can do something different – not necessarily better, but different.


Q1) I was really interested in your comments about visualisations as a method… Johanna Drucker talks about visual technology and visual discourse – and issues of visualisations being biased towards positivistic approaches – and advocates for getting involved in the design of visualisation tools.

A1) I’m familiar with these concepts. That work I did with Ramin is early speculative work… But it builds and is based on classic social network analysis so yes, I agree, that reflects some issues.

Q2 – Tim Squirrell) I guess my question is about the trade-off between access and making meaningful critiques. Often sociology is about critiquing power and the methods by which power is transmitted. The more data proliferates, the more the data is locked behind doors – like the kind of data Facebook holds. And in order to access that data you have to compromise the kinds of critiques you can make. How do you navigate that narrow channel, to make critiques without compromising those…

A2) The field is quite unsettled… It looked settled a year ago but I think Cambridge Analytica will have major impact… That may make the doors more closed… Or perhaps we will see these platforms – for instance Facebook – understanding that to retain credibility they have to create a segregation between their own use of the data and research (not funded by Facebook), so that there is proper separation. But I’m not naive about how that will work in practice… Maybe we have to tread a careful line… And maybe that does mean not being critical in all the ways we might be, in every paper. Empirical data may help us make critical cases across the diverse range of scholarship taking place.

Q3 – Jake Broadhurst) Data science has been used in the social world already, how do we keep up and remain relevant?

A3) It is a pressing challenge. The academy does not have the scale or capacity to address data science in the way the private sector does. One of the big issues is ethics… and how difficult it is for academics to navigate the ethics of social media and social data. And it is right that we are bound to ethical processes in a way that data scientists and even journalists are not. But it is also absolutely right that our ethics committees have to understand new methods, and the realities of gold-standard consent and the other options where that is not feasible.

The discussion we are having now, in the wake of Cambridge Analytica, is crucial. Two years ago I’d ask students what data they felt was collected, they just didn’t know. And understanding that is part of being relevant.

Q4 – Karen Gregory) If you were taking up a sociology PhD next year, how would you take that up?

A4) My official response would be that I’d do a PhD in Web Science. We have a programme at University of Southampton, taking students from a huge array of backgrounds, and giving them all the same theoretical and methodological backgrounds. They then have to have 2 supervisors, from at least 2 different disciplines for their PhD.

Q5 – Kate Orton Johnson) How do we tackle the structures of HE that prevent those interdisciplinary projects, creating space, time, collaborative push to create the things that you describe?

A5) It’s a continuous struggle. Money helps – we’ve had £10m from EPSRC and that really helps. UKRI could help – I’m sceptical but hopeful about interdisciplinary possibilities here. Having PhD supervision across really different disciplines is a beautiful thing, you learn so much and it leads to new things. Universities talk about interdisciplinary work but the reality doesn’t always match up. Money helps. Interdisciplinary research helps. Collaboration on small scales – conference papers etc. also help.

Q6 – David, research in AI and Law) I found your comments about dialogues between data scientists and social scientists interesting… How can you achieve similar with law scholars and data scientists… especially if trying to avoid hierarchical issues? Law and data science is a really interesting space right now… GDPR but also algorithmic accountability – legal aspects of equality, protected categories, etc. Very few users of big data have faced up to the risks of how they use the data, and the potential for legal challenge on the basis of discrimination.

A6) You have to find joint enthusiasm areas, and fundable areas, and that’s where you have to start.

The Economics Agora Online: Open Surveys and the Politics of Expertise – Tod van Gunten, Lecturer in Economic Sociology, University of Edinburgh

Abstract: In recent years, research centres in both the United States and United Kingdom have conducted open online surveys of professional economists in order to inform the public about expert opinion.  Media attention to a US-based survey has centred on early research claiming to show a broad policy consensus among professional economists.  However, my own research shows that there is a clear alignment of political ideology in this survey.  My talk will discuss the value and limitations of these online surveys as tools for informing the public about expert opinion.

Thank you for the invitation to speak today, and for Susan’s great and inspiring talk. I wouldn’t claim the label “symphonic” for this talk, but I think there is something of that spirit in it. This project is based on found and repurposed data. It isn’t particularly “big” data… but the “found” aspect of the data raises profound questions. Data never holds the answers on its own; it is always crucial to understand method and context. Visualisation is a big part of this. And it is about public sociology – so it hasn’t just been published in journals but in the popular press as well.

I am a sociologist who studies economists as a sociological object in their own right. So, there was a famous moment in 2008 when the Queen, during the midst of the largest global financial crisis since 1929, asked an economist “why did nobody notice it”. Because she is the Queen, the British Academy convened a panel to respond to this question. And they said that lots of people did a good job, but that no-one had it as their job to put everything together. Meanwhile with Brexit we’ve seen economists as a profession receiving substantial criticism.

Economists are hugely influential; we study them because of the politics of expertise. Economics is the most politically influential social science. So, I’m going to talk about properties we would like politically influential experts to have:

  1. A high level of professional consensus within the relevant community of experts. The gold standard here is climate science. If we have a community of experts who all agree, there seems to be a need for action. That’s a good principle.
  2. Forming policy opinions independently of their own political ideology. We will receive and have more confidence in advice from an independent expert than from someone presenting their own views.
  3. Acknowledging professional debate in expressing their views – that is, acknowledging when issues are not settled.

So in this paper I want to look at how we may use data to measure these aspects. And I’ll be going through some theory around the cultural structure of belief spaces and how this relates to data – big data in the context of economics, though this theory can be used in other contexts as well.

I want to open with the “economics agora” online. I want to talk about two surveys here – open online surveys of economists that have run since the financial crisis. It is no coincidence that these emerged at this time. These surveys are in the UK and in the USA. And unusually the results include publishing the full responses, and the names of the responders – with their consent. These are famous/well-known individuals in their field. This allows us to do more… bring in data that is not in the survey – the CVs of the respondents for instance, so including universities, political activities, their co-authorship network, etc. The survey organisers’ goal is to inform the public, but finding patterns in the data requires aggregation and analysis. This isn’t just individual responses, but understanding the context of the data. And again, this isn’t big data, this is quite small data. But these approaches apply to big data too.
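As a minimal sketch of that linkage step – names and fields invented, not the actual dataset – the named responses can simply be joined to outside data:

```python
# A minimal sketch (invented names and fields, not the real dataset)
# of linking named survey responses to external data such as CV details.
import pandas as pd

survey = pd.DataFrame({"name": ["A. Economist", "B. Economist"],
                       "q1_vote": ["agree", "disagree"]})
cvs = pd.DataFrame({"name": ["A. Economist", "B. Economist"],
                    "phd_university": ["Chicago", "MIT"],
                    "partisanship": ["right", "left"]})

# Because respondents are public, a simple join adds the context
# that the survey itself does not record.
linked = survey.merge(cvs, on="name", how="left")
print(linked)
```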

So one of these surveys is the Chicago Booth IGM Economic Experts Panel. Each month they put a question to 40 economists about some issue of the moment – the impact of autonomous cars, for instance. The second survey is run by the Centre for Macroeconomics, based in London, and again they ask a panel for responses. Typically the UK/European survey shows much more disagreement than the US survey.

There are a lot of issues with these surveys: they are small (the UK/EU one is expanding) and non-random samples; deliberately elitist samples (the US survey draws on the “top 7” economics departments in US universities, mainly Ivy League) – why would you take this sample? Well, you wouldn’t really… but you do get very high-status economists. The UK survey has a much wider range in its sample. I think these surveys are great… but I think they should do a better job! Another problem is a high rate of “softball” questions – in the US survey, not in the UK/EU surveys. For instance “imposing new US tariffs on steel and aluminium will improve Americans’ welfare” – it’s timely, but we already know that there is high consensus here. We need to ask harder questions! And finally we need to think about the motivations of the people who produce the data – the survey designers are looking to raise the profile of the profession. In a Wall Street Journal piece the designers of the US survey talked about wanting to counteract the idea of a lack of consensus in the field – and they are the ones asking the questions.

Gordon and Dahl (2013) looked at views and consensus in the field based on these surveys. They presented this as a “remarkably high degree of consensus” with little variance across schools and departments, and from that argued for how influential the field should be. This got big pick-up – the Washington Post covered it, and Nobel-winning economist Paul Krugman picked it up in his New York Times opinion column. He is on record (New York Times, 2009) as saying pretty much the opposite – that there is polarisation between the “saltwater” economists in the Keynesian camp, and the “freshwater” economists who are very much the opposite.

So, a bit of theory… What do we mean by consensus, polarisation, factions etc.? How do groups of people structure their belief systems? We have twenty years of literature and theory here around understanding belief systems, going back to political scientists in the 1960s. Philip Converse (1964) found that most American voters do not adhere to a coherent political ideology – this is still the case. Their belief systems are disorganised or “unconstrained” – one belief does not let you predict another belief. So, for instance, comparing a belief that you should “reduce immigration” and one that you should “reduce corporate tax” could show little correlation; those beliefs don’t automatically go together. Now, if you are a voter in the UK in 2018 there probably is more alignment. That pattern is a “constrained” or “aligned” belief system. If you look at polarisation you see clusters of correlation.
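To make “constraint as correlation” concrete, here is a minimal sketch with simulated data (my illustration, not the paper’s): an unconstrained public shows near-zero correlations between belief items, while an aligned one shows strong off-diagonal correlations.

```python
# A minimal sketch (simulated data, not the paper's) of belief
# "constraint" as correlation between belief items.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Unconstrained: two beliefs drawn independently of each other.
unconstrained = pd.DataFrame({
    "reduce_immigration": rng.normal(size=n),
    "cut_corporate_tax": rng.normal(size=n),
})

# Aligned: both beliefs driven by one latent left-right dimension.
ideology = rng.normal(size=n)
aligned = pd.DataFrame({
    "reduce_immigration": ideology + rng.normal(scale=0.5, size=n),
    "cut_corporate_tax": ideology + rng.normal(scale=0.5, size=n),
})

print(unconstrained.corr().round(2))  # off-diagonal near 0
print(aligned.corr().round(2))        # off-diagonal near 0.8
```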

So, the paper on economists looks for clusters. I looked at polarisation to get at latent ideology, noting partisanship (known involvement in e.g. left- or right-leaning political think tanks etc. – or marked as “none”), current department (freshwater vs saltwater) and belief dimension. Unsurprisingly, those involved in Republican/conservative organisations and those with backgrounds in Democratic/liberal organisations were very different, leaning right and left respectively. This is the same data that generated the paper that showed consensus and little variance.

There is a high degree of consensus in this survey, but you can also see ideological alignment; the two can be consistent. But it depends on what you think, and what you ask. The UK survey – more recently expanded to Europe – shows much less consensus. This could mean there is more consensus in the US than in Europe; but it could also mean that the questions being asked in the UK survey are harder questions. The UK survey asks very complex questions… e.g. “Do you agree that, in a period of great uncertainty and after a prolonged period of weak real wage growth, monetary policy makers can afford to wait for greater certainty about real wage developments and building inflationary pressure before raising interest rates?”. So, you can’t measure consensus without a comparison with another group. You can see consensus on a question, not of a group/community or set of beliefs.

So, looking at a recent UK/EU survey on anti-establishment views vs monetary conservatism, you can see a diversity of views here.

So, back to those qualities. Professional consensus is harder to measure than it first appears.

Respondents are asked to give both their vote and their level of confidence. So, when experts give an opinion on hot topics you’d really want to see appropriately low confidence scores, showing you don’t have a partisan respondent on your hands. Looking at the data in the US survey we see a lot of overly confident responses. Respondents with a stronger ideological disposition (an aligned belief structure) exhibit systematic overconfidence. In general, across all questions, when asked politically salient questions they state higher confidence than on questions with little or no political salience.
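A minimal sketch of that check – invented numbers, not the study’s data – comparing mean stated confidence on salient vs non-salient questions by respondent group:

```python
# A minimal sketch (invented numbers, not the study's data) of the
# overconfidence check: compare mean stated confidence on politically
# salient questions against the rest, per respondent group.
import pandas as pd

responses = pd.DataFrame({
    "respondent": ["r1", "r1", "r2", "r2", "r3", "r3"],
    "aligned":    [True, True, False, False, True, True],
    "salient":    [True, False, True, False, True, False],
    "confidence": [9, 6, 7, 7, 8, 5],   # self-reported scale
})

# Systematic overconfidence would show as a salient/non-salient gap
# that is larger for respondents with aligned belief structures.
print(responses.groupby(["aligned", "salient"])["confidence"].mean())
```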

By way of conclusion… am I joining ranks with Michael Gove – “people in this country have had enough of experts”? No. I would say something more nuanced. Arguably professions in general, and economists in particular, have lost political legitimacy, and professional over-reach (“look how much consensus we have”) is not the answer. Claiming consensus where none exists is over-reach. Transparency about professional debate is always better than overstating consensus. Political legitimacy is a scarce resource and should be treated as such.

The economics agora online is a useful tool for studying the beliefs of an important community of experts… but survey designers should up their game. If you want an “unbiased” expert, choose someone whose belief structure is unconstrained – someone in the middle, whose beliefs are not correlated. You need a theory of how groups form beliefs… so read cultural sociology!


Q1) In thinking about the resistance to “naturally occurring data” and the idea of an “unbiased expert” – do you have a sense that that isn’t possible… Rather than getting that, should we instead shift the conversation to make the politics relevant – to be clear in a way that makes the numbers make sense…

A1) If we choose which experts to listen to, which do we listen to…

Q1) It was interesting to think of economists as “not political” – if that’s the conversation… I think the non-biased expert… that raises issues. We’d query whether that even exists… Maybe we can shift the conversation.

A1) I guess I would want to push back a little bit. I am sympathetic to the idea that there is no unbiased expert but… I do a lot of work on how economists influence policy. I think the world does need economists, especially for monetary policy and the technical aspects of policy. So, having some tools to understand this profession, how they structure beliefs… We need more tools to unpack that set of questions… I’m trying to find ways to study this profession using quantitative and qualitative tools, and to understand its impact on politics and society.

Q2) You mentioned a graph to show polarisation – how did you do that?

A2) This is not based on data, this is based on theoretical patterns… A series of plots using a test data set to illustrate the patterns of the theory – it’s theoretical rather than empirical data.

Q3) A slight follow-up… How much have you played with non-linear tools… Consensus and confidence… Research on scientific knowledge shows that people who know a little about science have higher confidence than those who know more… That could affect the data on confidence.

A3) We did look at non-linearity – it doesn’t make a big difference to some measures here.

Q4) What definition of “expert” are you using, and why?

A4) People with PhDs in economics. In the US case they are high-status people in the field… in the UK/EU case it is broader. Most work as professors of economics, some work in the private sector, in financial sectors. For my purposes it’s holding a PhD in economics… In other work I’ve done on organisations in Latin America you have senior political elites with those credentials, and a lot who don’t – boundary work becomes more important there.

Q5) I think some of the Chicago questions also go to the public. Have you looked at that?

A5) It’s not publicly available… I’ve been thinking about asking for that. But it would be interesting to know if members of the public structure their belief systems differently. There is some work that compares public beliefs to these questions.

Q6) I work on spatial models around expert agreement and disagreement – there are interesting measures there and on polarisation. Also dimensionality reduction, since you are trying to identify latent ideological positions… Not sure if you’ve looked at that. Political behaviour research has…

Q7) I wanted to ask about how much the very different types of respondents and samples between the US and UK/EU surveys matter. I was particularly wondering about the high-status nature of the US experts and how much that status plays a part… You talked about doing some social network and contextual work here, so I was wondering the degree to which their network, co-authorship and professional standing feeds into wanting to be seen to take a particular view, or to visibly agree.

A7) The social network part, the co-authorship data, is going to lead to a paper. We found people who are closer in co-authoring papers are ideologically closer – not totally surprising… So there is a social approval thing and a selection bias. We think the latter is the more likely interpretation here – the homophily effect: even when they co-author non-political papers, they still pick ideologically aligned authors. The status thing is interesting… The UK/EU expert community is less hierarchical – maybe that reflects practice. In terms of monitoring each other’s responses… I think it’s more a contrarian thing… They want to find ways to disagree… They can add comments… So lots of “My colleagues all think this, but if you think about it this other way you get this opposite response”.

Q8) My question/comment is about the “unconstrained” idea space – it feels funny and attractive… but also quite negative… Unconstrained… disorganised… yet you are talking about it as a positive quality. Does that suggest they haven’t thought this stuff through?

A8) I’m glad you asked this. This question came up in the 1960s and it was seen as terrible that ideologies didn’t align to political parties – this was seen as politically naive. The field has turned on its head now. Actually, more educated voters are seen to have more constrained beliefs… but with the economists that unconstrained belief system is good, as it shows that they are not bringing in their partisan/ideological standpoint. There is a contradiction there: the idea that the more information you have, the more constrained your belief system should be… but only to a point. There is a really interesting paper by ? de Surrey and Amir Goldberg that compares ideological voters and unconstrained voters, and they find a third group that is e.g. politically liberal and economically conservative. This is a really interesting area of the literature. There are a bunch of new methods that are getting us nearer that question…

We broke for lunch and workshops at this point… 

Workshops: Parallel workshop sessions – please see descriptors below.

  • Text Analysis for the Tech Beginner – Suzanne Black, PhD student in LLC
  • An Introduction to Digital Manufacture – Mike Boyd (uCreate Studio Manager, UoE)
  • ‘I have the best words’: Twitter, Trump and Text Analysis – Dave Elsmore (EDINA)
  • An Introduction to Databases, with MariaDB & Navicat – Bridget Moynihan (LLC, UoE)
  • Introduction to Data Visualisation in Processing – Jules Rawlinson (Music, ECA, UoE)
  • Jupyter Notebooks and The University of Edinburgh Noteable service – Overview and Introduction – James Reid (EDINA)
  • Obtaining and working with Facebook Data – Simon Yuill (Goldsmiths)

I attended the Introduction to Data Visualisation in Processing workshop which was really interesting, and left me wanting to have a further play to see where it may potentially be useful. 

Round Table Discussion

  • Melissa Terras (MT), Professor of Digital Cultural Heritage
  • Kirsty Lingstadt (KL), Head of Digital Library and Depute Director of Library and University Collections
  • Ewan McAndrew (EM), Wikimedian in Residence
  • Tim Squirrell (TS), PhD Student, Science, Technology and Innovation Studies, working on communities and expertise and negotiations of those concepts.

MT: I wanted to start with quite a personal place… I realised last year that I was sort of grieving for the internet. I grew up with the internet, it’s been a big part of my life and friendships… But the internet has taken a different turn… And there is a need to step away from that a bit to stay sane. There is a need to step back and reflect, and think about the University Space. I feel maybe we could have stepped in… The questions of Facebook, Twitter, the use of data… The human nature of trust… And how we use and engage and archive and preserve some of these spaces… I think that makes it interesting to an academic in the digital space right now.

EM: I think the idea of the web turned quite sour after Cambridge Analytica. Tim Berners-Lee spoke on Channel 4 News about how it’s not enough to build and run the open web, we have to look critically at what is being done with it, what people are building. I also think of the Scottish Referendum, when Glasgow’s Strathclyde University called upon all librarians to support political literacy. But that could be “universities” not just “libraries” – there is a need for much more information literacy, almost as a service.

KL: The role of the university is about knowledge, and supporting and preserving knowledge, with the library central to that… As the digital world changes we need those skills of information literacy, to think critically about what we see on the web, and how we understand that. That’s an important thread the library offers and supports. The arts, humanities and social sciences really support that development of critical engagement, literacy, context and the origins of big data. That chimes very much with CILIPS’ work on information literacy – the university library has a really important part to play here…

TS: I want to make three brief points, on engagement, expertise and access. One of the things I’ve observed on the web around online communities is a tendency not to notice a community until something happens. I study some quite extreme communities, including the involuntary celibate community, and you can’t raise interest until people go out and kill people. We really need more ongoing engagement and understanding, not just treating these communities as an object of interest. The second point is about experts and what that means… I think that reification of expertise is naive at best, and often dangerous. Only engaging with experts, or corroborating your beliefs, or feeling that you only engage with an expert class, overlooks the way most people engage with issues. And finally, on access… In light of Cambridge Analytica, Facebook has shut down access for all but its own research programme (run with funding councils). Doing that means only people working at the companies, or at the elite universities with particular track records…

Comment: Interesting that you mentioned Tim Berners-Lee, as he was the reason Web Science got set up at Southampton. The narrative was: I invented the web (discuss) and it has gone wrong (discuss). That was a perspective that didn’t problematise information or communication etc. The idea was that we would re-engineer the web (discuss), as if it is technical rather than a complex socio-technical network. I’m not being negative but supporting your statements. The restructuring of the Information Technology GCSE was a travesty – there was no attempt at critical engagement, just at programming. And it is really important that we envision what we want the web to be. There is no fixed idea of the web. We have gone down the rabbit hole of behavioural tracking and advertising as the only economic model… but we could play with that. I would make a pitch for utopianism… With Donna Haraway: staying with the trouble and thinking about what else we could do.

Comment: I wondered about… that sense of the internet as being what we hoped it could be… But also the issue of the attack on net neutrality in the US, and immediate recognition that that isn’t ok… How do we back away, not engage in the toxic parts of the internet… But also save the parts that are worth saving… Keeping an eye on legislation? Do we protect without participating?

MT: I immediately started to think of how we talk about bitcoin – very utopian visions, then turning it into a profit-making machine, as has happened with the internet… How do we build structures that can be used to make money without that consuming the rest of it? The internet is consuming all the other stuff… I think bitcoin will be the same… The same people who had money 200 years ago will be the same people who’ll make money now… Partly information literacy, partly being cynical, being civic… being alive to issues…

TS: I am going to say two contradictory-sounding things… So many of these issues seem to be engineering solutions to social problems. I was at a conference with someone talking about a blockchain-based education network, with a smart contract to validate credentials – taking the human out of the process in order to improve the situation. Bitcoin is supposed to be trustless… but at some point you have a human interface, and it will fail… You will always face problems you couldn’t spot – unless you spoke to a social scientist. But what goes with that is the need for us as social scientists to engage with the engineering side of things… There’s been lots of “if only we could have known what would happen with Cambridge Analytica”, but we’ve known about that for years… We struggle to be listened to by policy makers when compared with businesses, who have legitimate routes in and argue for a lack of accountability. Platforms are not neutral; you can engineer the behaviours available in the space. You have to understand the feedback loop between administration and engineering.

EM: Thinking about democratisation… and thinking about utopian visions… Putting my Wikimedian hat on… I think that it has been amazing to see the work done by students here… There is real benefit to having a very transparent space online where you can query or change or contribute to the world. Wikipedia is committed to keeping the human element at its core. One of the ways that Wikipedia checks and balances the data is that you can’t edit a page unless you’ve had an account for four days.

KL: That’s where libraries of all kinds come in – a space or platform to trace the source, the archive materials… and digital data… Data curation and longer-term lifecycles… Digital content being created… To check, to contribute.

Comment: There’s an interesting underlying narrative that the web has gone wrong, and that the economy has gone wrong… as if these structured inequalities are accidental, but they are not, they are deliberate. We need a critical historical narrative of the web, of how this has taken place and where the web has come from. We need more engagement from the humanities here… There are underlying themes here.

Comment: From literary and fan fiction studies we have for years been talking to a literature and community that exists online and how that interacts online. Fan fiction is often written by women, by BME and LGBTQ and non-binary people… We have a cry of “own the servers” to avoid exploitation… Could anyone comment on that type of utopian vision – the local and the global… Who accesses the data…

KL: From my context of the library, it’s about putting materials out there to access what they need as equitably as possible… But that’s difficult… For archives and personal material there are restrictions and limitations for good reason… We haven’t cracked that perfectly… It is a challenge, there isn’t an easy answer to it…

EM: From a Wikipedia angle… Wikipedia had a conversation within and around the community about where the community is going by 2030… Where they were going, what they needed to do to share and access knowledge around the world… To enable better understanding… To more civic and better societies. But there are huge disparities of access. Out of that came the sense of knowledge not as a product but as a service. And the idea of knowledge equity – in terms of access but recognising only 10% of editors are female, it’s Northern Hemisphere orientated, only 2.5% of geotagged content relates to Africa. It’s not shying away from that, instead trying to address that over time… Which is why Wiki Project Medicine has created “the internet in a box” to enable access to a downloaded medical version of the content to improve access to information.

Comment: From a biological sciences background… My question underpins everything here… We haven’t really touched on digital preservation; it’s a big and worrying thing. I’ve listened to comments on big gaps in digital data; it’s really difficult in the long term. How will that be affected by GDPR, and what can be done there in terms of preservation and access? We are looking more and more at the cloud… The carbon footprint of ICT is expected to be 40% by 2040. Thinking about preservation and the more and more carbon-intensive nature of the web, what can universities do to tackle these issues…

KL: Digital preservation is close and dear to us. It is challenging and not easy. It’s not a commodity you can just buy; there isn’t one way to do this. We are trying to tackle certain areas. We are trying to preserve the university’s history. We also actively look at research data produced by the University. Beyond those two areas, there is still a huge area of web output and web archiving… There is interest in the University’s output, but less in the wider context. We acknowledge that agenda and push it up in the university – digital humanities helps here, and that means access to information which helps us make our case. GDPR does present complexity; it does mean working with encryption… For company/global content that’s broader.

Comment: In terms of the issue of experts… I think it’s interesting to see experts by credentials, or by reputation… and how that relates to the internet… It seems like a great way to be a self-made expert – to promote yourself as an expert because you have a blog. You may have stature and influence… but that’s very different from a PhD or academic expertise… I’m interested that part of being an expert is admitting when you don’t know something… It seems the public wants experts to tell them the answer right now… What is the role of the internet here?

TS: I have a lot of thoughts on this. It’s basically my PhD. If I ramble… stop me… I think this is fundamentally about the way we reconceptualise expertise… There is the idea of it being reified, as rare and based on credentials, and that being in conflict with other types of self-made influence. Steven Taylor has a paper on experts across three types, including this group of self-made experts… They come to represent a much larger group – the internet hasn’t democratised broadcast but it’s certainly opened up and broadened the field somewhat. When we understand expertise as only credentialed people in specific organisations, we limit communication. We have to be able to engage as compellingly as these people who are able to weaponise, essentially, nonsense. We have to be provocative and interesting. We can’t expect people to just come and ask the right experts. The burden shouldn’t be on audiences; the burden should be on “experts” to be palatable and appealing as experts.

MT: The anti-expertise thing isn’t a new thing either… It goes right back to the founding of universities, particularly in the Victorian era… I have a book coming out on professors in children’s literature, and an accompanying anthology, and every single story is “the professor is rubbish”. All of them. All about not trusting experts, just when expertise was being formalised… the general populace ridiculing them… The internet has boosted that again. But a positive thing… crowdsourcing is a positive development… We did a few crowdsourcing projects that truly changed access and use of information – work that used to only be done by palaeographers, looking at Jeremy Bentham’s papers… The internet helped us speed that all up… If we have the right platforms, the right structures, we can do the right things… But we can’t let “expertise is rubbish” perpetuate.

EM: Again with digital preservation, there is a cost attached… There may be volunteers… If there is a platform or a lack of cost… You can do a lot. And archive a lot in public ways…

KL: I was going to add that the cultural heritage sector has an interesting relationship with working with the community… But there is this tension about how and who can contribute, and who can do it best. The crowd is full of enthusiasm… and as long as work is provenanced… that is a really good way to positively use the web.

Comment: In response to the Cambridge Analytica stuff… And why didn’t they listen to the social scientists… Isn’t GDPR an example of the law doing as good a job as it could… And data ownership… Legislative work in Europe on copyright and data ownership… If we want to set the right example, it’s not enough to throw up our hands in horror… You have to engage in legislative process… Laws do have an impact in cyberspace.

Comment: Business models – and how do we change that – it shapes the platform. Investment doesn’t go in equally – and as universities we do start ups, we do engagement with industry. How do we move beyond all of these businesses being set up by young wealthy guys, and opening that up… And reconceptualising success as more than just exit, and data as asset – and that being personal data. I also wanted to note that web archiving does take place – with the Internet Archive who operate in the more permissive US copyright context (and mirrored in Canada – they were concerned that Trump might interfere with the archive). There is a small but politically aware web archiving community but part of making that and any platform work is about acknowledging that there is cost to running platforms, to archiving materials…

Comment: That idea of “an expert” – surely we reconceptualise the expert as a distributed thing.

TS: Yes.

MT: And with that I’d like to thank the panel and draw this to a close. We hope to have some announcements in the next year about expanding this work, and this day takes place in an environment that contributed to my coming to Edinburgh, with the City Deal, and with the work driving Edinburgh to be the Data Driven Innovation capital of Europe.

May 02 2018

This morning I’m at the “Working with the British Library’s Digital Content, Data and Services for your research (University of Edinburgh)” event at the Informatics Forum to hear about work that has been taking place at the British Library Labs programme, and with BL data recently. I’ll be liveblogging and, as usual, any comments, questions, corrections, etc. are welcome.

Introduction and Welcome – Professor Melissa Terras

Welcome to this British Library Labs event; this fits into wider work taking place – and still to come – here at Edinburgh. British Library Labs works in a space that is changing all the time, and we need to think about how we as researchers can use digital content and this kind of work – and we’ll be hearing from some Edinburgh researchers using British Library data in their work today.

“What is British Library Labs? How have we engaged researchers, artists, entrepreneurs and educators in using our digital collections” – Ben O’Steen, Technical Lead, British Library Labs

We work to engage researchers, artists, entrepreneurs and educators to use our digital collections – we don’t build stuff, we find ways to enable access and use of our data.

The British Library isn’t just our building in St Pancras; we also have a huge document supply and storage facility in Boston Spa. At St Pancras we don’t just have the collections, we have space to work, we have reading rooms, and we have five underground floors hidden away there. We also have a public mission and a “Living Knowledge Vision” which helps to shape our work.

British Library Labs has been running for four years now, funded by the Andrew W. Mellon Foundation, and we are in our third funded phase where we are trying to make this business as usual… So the BL supports the reader who wants to read 3 things, and the reader who wants to read 300,000 things. To do that we have some challenges to face to make things more accessible – not least helping people deal with the sheer scale of the collections. And we want to avoid people having to learn unfamiliar formats and methodologies which are about the library and our processes. We also want to help people explore the feel of collections, their “shape” – what’s missing, what’s there, why, and how to understand that. And we want to help people navigate data in new ways.

So, for the last few years we have been trying to help researchers address their own specific problems, but also trying to work out if each is part of a wider problem, to see where there are general issues. But a lot of what we have done has been about getting started… We have a lot of items – about 180 million – but any count we have is always an estimate. Those items include 14m books, 60m patents, 8m stamps, 3m sound recordings… So what do researchers ask for…

Well, researchers often ask for all the content we have. That hides a failure on our part: we should have better tools for understanding what is there, and what they actually want. That is a big ask, and it means a lot of internal change. So, we try to give researchers as much as we have… Sometimes that’s TBs of data, sometimes GBs… And data might be all sorts of stuff – not just the text but the images, the bindings, etc. If we take a digitised item we have an image of the cover, we have pictures, we have text, and we also have OCR for these books – so when people ask for “all” of the book, is that the images, the OCR or both? One of those is much easier to provide…

Facial recognition is quite hot right now… That was one of the original reasons people wanted access to all of the illustrations. I run something called the Mechanical Curator to help highlight those images – people asked if they could have the images – so we now have over a million images on Flickr. What we knew about the images was the book and the page. All the categorisation and metadata now there has come from people and machines looking at the data. We worked with Wikimedia UK to find maps, using manual and machine learning techniques – kind of in competition – to identify those maps… And they have now been moved into georeferencing tools, and fed back to Flickr and also into the catalogue… But that breaks the catalogue… It’s not the best way to do this, so that has triggered conversations within the library about what we do differently, what we do extra.

As part of the crowdsourcing I built an arcade machine – and we ran a game jam with several usable games to categorise or confirm categories. That’s currently in the hallway by the lifts in the building, and was the result of work with researchers.

We put our content out there under a CC0 license, and then we have awards to recognise great use of our data. And this was submitted – the official music video for Hey There Young Sailor, made using that content! We also have the Off the Map competition – a curated set of data for undergraduate gaming students based on a theme… Every year there is something exceptional.

I mentioned the library catalogue being challenging, and that when you ask for everything, that isn’t everything that exists. There are still holes… When we look at the metadata for our 19th century books we see huge amounts of data in [square brackets], meaning the data isn’t known but is the best suggestion. And this becomes more obvious when we look at work researcher Pieter Francois did on the collection – showing spikes in publication dates at 5-year intervals… which reflects guesses at publication year that tend to be e.g. 1800/1805/1810. So if you take intervals to shape your data, it will be distorted. And then what we have digitised is not representative of that, and it’s a very small part of the collection…
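A minimal sketch of that distortion, with simulated data (my illustration, not the BL’s): round-number guesses for unknown years produce exactly those 5-year spikes in a date histogram.

```python
# A minimal sketch (simulated, not BL data) of how [guessed] years
# rounded to the nearest 5 distort a publication-date histogram.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
true_years = rng.integers(1800, 1900, size=5000)

# Suppose ~30% of records carry a guessed year, rounded to the nearest 5.
guessed = rng.random(5000) < 0.3
recorded = np.where(guessed, (true_years / 5).round() * 5, true_years)

plt.hist(recorded, bins=100)  # spikes appear at 1800, 1805, 1810, ...
plt.xlabel("recorded publication year")
plt.show()
```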

There is bias in digitisation then, and we try to help others understand that. Right now our digitised collections are about 3% of our collections. Of the digitised material, 15% is openly licensed, but only about 10% is online. About 85% of our collections can only be accessed “on site”, as licenses were written pre-internet. We have been exploring that, and exploring what that means…

So, back to use of our data… People have a hierarchy of needs, from big broad questions down to filtered and specific queries… We have to get to the place where we can address those specific questions. We know we have messy OCR, so that needs addressing.

We have people looking for (sometimes terrible) jokes – see Victorian Humour run by Bob Nicholson based on his research – this is stuff that can’t be found with keywords…

We have Katrina Navickas mapping political activity in the 19th century. This looks different but uses the same data and the same platform – using Jupyter Notebooks. And we have researchers looking at black abolitionists. We have SherlockNet trying to do image classification… And we find work all over the place building on our data, on our images… We found a card game – Moveable Type – built on our images. And David Normal building montages of images. We’ve had the Poetic Places project.

So, we try to help people explore. We know that our services need to be better… And that our services shape expectations of the data – and can omit and hide aspects of the collections. Exploring data is difficult, especially with collections at this scale – and it often requires specific skills and capabilities.

British Library Labs working with University of Edinburgh and University of St Andrews Researchers

“Text Mining of News Broadcasts” – Dr. Beatrice Alex, Informatics (University of Edinburgh)

Today I’ll be talking about my work with speech data, which is funded by my Turing fellowship. I work in a group who have mainly worked with text, but this project has built on work with speech transcripts – and I am doing work on a project with news footage, and dialogues between humans and robots.

The challenges of working with speech include its particular characteristics: short utterances, interjections; speaker assumptions – different from e.g. newspaper text; turn taking. Transcripts often miss sentence boundaries, punctuation or case distinctions. And there are errors introduced by speech recognition.

So, I’m just going to show you an example of our work which you can view online – here you can do real time speech recognition, and this can then also be run through the Edinburgh Geoparser to look for locations and identify them on the map. There are a few errors and, where locations haven’t been recognised in the speech recognition, they also don’t map well. The steps in this pipeline are speech recognition (Google ASR), then text restoration, and then text and data mining.
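
As a rough illustration of why the restoration step matters – using spaCy’s off-the-shelf NER as a stand-in for the Edinburgh Geoparser, which is a separate command-line tool – consider:

```python
# A toy version of the pipeline; assumes the en_core_web_sm model is installed.
import spacy

nlp = spacy.load("en_core_web_sm")

# Hypothetical ASR output: lower case, no punctuation, as described above.
raw = "flooding was reported in glasgow and across the borders"
print([e.text for e in nlp(raw).ents])  # NER typically struggles here

# After case and punctuation restoration, place names are recoverable.
restored = "Flooding was reported in Glasgow and across the Borders."
print([(e.text, e.label_) for e in nlp(restored).ents])
```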

So, at the BL I’ve been working with Luke McKernan, lead curator for news and moving images. I have had access to a small set of example news broadcast files for prototype development. This is too small for testing/validation – I’d have to be onsite at BL to work on the full collection. And I’ve been using the CallHome collection (telephone transcripts) and BBC data which is available locally at Informatics.

So looking at an example we can see good text recognition. In my work I have implemented a case restoration step (named entities and sentence initials) using rule-based lexicon lookup, and also using Punctuator 2 – an open source tool which adds punctuation. That works much better but still isn’t up to an ideal level. Meanwhile the Geoparser was designed for written text, so it works well but misses things… Improvement work has taken place but there is more to do… And we have named entity recognition in use here too – looking for locations, names, etc.
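
A toy version of that rule-based lexicon lookup might look like this (the lexicon entries are invented for illustration):

```python
# Restore known names from a lexicon and capitalise sentence starts.
LEXICON = {"edinburgh": "Edinburgh", "glasgow": "Glasgow", "bbc": "BBC"}

def restore_case(tokens):
    out = []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            out.append(LEXICON[tok])   # lexicon lookup for named entities
        elif i == 0:
            out.append(tok.capitalize())  # sentence-initial capital
        else:
            out.append(tok)
    return " ".join(out)

print(restore_case("the bbc reported heavy snow in edinburgh".split()))
# -> "The BBC reported heavy snow in Edinburgh"
```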

The next steps are to test the effect of ASR quality on text mining – using CallHome and BBC broadcast data – with formal evaluation; to improve the text mining on speech transcript data based on further error analysis; and longer term plans include applications in the healthcare sector.


Q1) Could this technology be applied to songs?

A1) It could be – we haven’t worked with songs before but we could look at applying it.

“Text Mining Historical Newspapers” – Dr. Beatrice Alex and Dr. Claire Grover, Senior Research Fellow, Informatics (University of Edinburgh) [Bea Alex will present Claire’s paper on her behalf]

Claire is involved in an Administrative Data Research Centre Scotland project looking at local Scottish newspapers, text mining them, and connecting them to other work. Claire managed to get access to the BL newspapers through Cengage and Gale – with help from the University of Edinburgh Library. This isn’t all of the BL newspaper collection, but part of it. This collection of data is also now available for use by other researchers at Edinburgh. Issues we had here were that access to more recent newspapers is difficult, and the OCR quality. Claire’s work focused on three papers in the first instance, from Aberdeen, Dundee and Edinburgh.

Claire adapted the Edinburgh Geoparser to process the OCR format of the newspapers and added local gazetteer resources for Aberdeen, Dundee and Edinburgh from OS OpenData. Each article was then automatically annotated with paragraph, sentence and word mark-up; named entities – people, places, organisations; locations; and geo coordinates.
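
The georesolution step can be pictured as a simple lookup against that local gazetteer; a minimal sketch, with rough illustrative coordinates rather than real OS OpenData:

```python
# Resolve recognised place names against a (tiny, invented) local gazetteer.
GAZETTEER = {
    "Torry": (57.12, -2.08),
    "Woodside": (57.17, -2.14),
    "Union Street": (57.15, -2.10),
}

def annotate(entities):
    """Attach coordinates to location entities where the gazetteer has them."""
    return [{"text": e, "type": "location", "coords": GAZETTEER.get(e)}
            for e in entities]

print(annotate(["Torry", "Union Street", "Fittie"]))
# Unresolved names (here "Fittie") keep coords=None for later review.
```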

So, for example, a scanned item from the Edinburgh Evening News from 1904 – it’s not a great scan and the OCR is OK but erroneous. Named entities are identified, locations are marked. Because of the scale of the data Claire took just one year from most of the papers and worked with a huge number of articles, announcements, images etc. She also drilled down into the geoparsed newspaper articles.

So for Aberdeen in 1922 there were over 19 million word/punctuation tokens and over 230,000 location mentions. She then used frequency methods and concordances to understand the data. For instance she looked for mentions of Aberdeen placenames by frequency – and that shows the regions/districts of Aberdeen – Torry, Woodside, and also Union Street… Then Claire dug down again… Looking at Torry, the mentions included Office, Rooms, Suit, etc., which gives a sense of the area – a place people rented accommodation in. In just the news articles (not ads etc.) the Torry mentions are about Council, Parish, Councillor, politics, etc.

Using concordances, Claire looked at “fish”, for instance, to see what else was mentioned and, in summary, she noted that the industry was depressed after WW1; there was unemployment in Aberdeen and the fishing towns of Aberdeenshire; that there was competition from German trawlers landing Icelandic fish; that there were hopes to work with Germany and Russia on the industry; and that government was involved in supporting the industry and taking action to improve it.
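
For readers who want to try this style of analysis, a minimal frequency-and-concordance pass with NLTK, on a stand-in sentence rather than the real newspaper year, might look like:

```python
from nltk import FreqDist
from nltk.text import Text

tokens = ("the fish trade remained depressed after the war and the fish "
          "landed at aberdeen fetched poor prices").split()

print(FreqDist(tokens).most_common(3))      # crude term frequencies
Text(tokens).concordance("fish", width=40)  # keyword in context
```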

With the Dundee data we can see the topic modelling that Claire did for the articles – for instance clustering of cars, police, accidents etc.; there is a farming and agriculture topic; sports (golf etc.)… And you can look at the headlines from those topics and see how they reflect the identified topics.
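
As a hedged sketch of the technique (scikit-learn’s LDA on a few invented “articles”, not Claire’s actual tooling):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "car collided with tram police report accident driver charged",
    "golf open championship final round played at the links",
    "cattle market farming agriculture harvest prices reported",
    "motor accident on union street police say driver injured",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Print the top terms per topic, the usual way of eyeballing LDA output.
terms = vec.get_feature_names_out()
for i, comp in enumerate(lda.components_):
    print(f"topic {i}:", [terms[j] for j in comp.argsort()[-4:][::-1]])
```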

So, next steps for this work will include: improving the text analysis and geoparsing components; getting access to more recent newspapers – there is missing infrastructure for larger data sets but we are working on this; scaling up the system to process the whole data set and store the text mining output; tools to summarise content; and tools for search – filtering by place, date, linguistic context – tools beyond the command line.

“Visualizing Cultural Collections as a Speculative Process” – Dr. Uta Hinrichs, Lecturer at the School of Computer Science (University of St Andrews)

My research focuses on visualisation and Human Computer Interaction. I am particularly interested in how interfaces can make digital collections visible. I have worked on a couple of projects with Bea Alex and others in the room to visualise texts. I will talk a little bit about LitLong, and the process of developing early visualisations for the project.

So, some background… Edinburgh is a UNESCO City of Literature, with lots of literature about and set in the city. And we wanted to automate the discovery of Edinburgh-based literature from available digitised text. That meant a large number of texts – about 380k – from collections including the BL 19th Century Books collection. And we wanted to make the results accessible to the public.

There were lots of people involved here, from Edinburgh University (PI, James Loxley), Informatics, St Andrews, and EDINA. We worked both with out of copyright texts, and also, with special permission, with some in-copyright texts including Irvine Welsh. And a lot of work was done to geoparse the text – and assess its “Edinburghyness”. For each mention we had the author, the title, the year, and snippets of the text from around the mention. This led to visualisations – I worked on LitLong 1.0 and I’ll talk about this, but a further version (LitLong 2.0) launched last year.

So you can explore clusters of places mentioned in texts, you can explore the clustered words and snippets around the mentions. And you can zoom in to specific texts – again you can see the text snippets in detail. When you explore the snippets, you can see what else is there, to explore other snippets.

So in terms of the design considerations we wanted a multi-faceted interactive overview of the data – Edinburgh locations; books; extracted snippets; authors; keywords. Maps and lists are familiar, and we wanted this tool to be accessible to scholars but also to the public. We took an approach that allows “generous” explorations (Mitchell Whitelaw 2015), so there are suggestions of how to explore further, with parts of the data showing… Weighted tag clouds let you get a feel for the data, for instance.
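
The weighted tag cloud idea is easy to sketch: map keyword frequencies onto a display-size range (the function and numbers below are illustrative, not LitLong’s code):

```python
def tag_weights(counts, min_px=12, max_px=48):
    """Scale each keyword's display size by its frequency in the snippets."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts match
    return {word: round(min_px + (n - lo) * (max_px - min_px) / span)
            for word, n in counts.items()}

print(tag_weights({"castle": 120, "close": 45, "canongate": 9}))
```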

As a process it wasn’t like the text mining happened and then we magically had the visualisations… It was iterative. And actually we used visualisation tools to assess which texts were in scope, and which weren’t going to be relevant – and to mark a text up to keep it or rule it out. This interface included information on where in a text the mention occurred – to help identify how much a text actually was about Edinburgh.

We had a creative visualisation process… We launched the interface in 2015, and there was some iteration, and that also inspired LitLong 2.0, which is a much more public-friendly way to explore the material in different ways.

So, I think it is important to think about visualisation as a speculative process. This allows you to make early computational analysis approaches visible and facilitates QA and curatorial processes. It can promote new interactions, transforming a print-based culture into something different – thinking about materiality rather than just content is important as we enable exploration. When I look back at my own work I see some similarities in interfaces… You can see the unique qualities of the collections in the data trends, but we are doing much more work on designing interfaces that surface the unique qualities of the collection in new ways.


Q1) What did you learn about Edinburgh or literature in Edinburgh from this project?

A1) The literature scholars would be better able to talk about that, but I know it has inspired new writers, and it is used in teaching. And we also discovered some characteristics of Edinburgh, and of women writers in the corpus… James Loxley (Edinburgh) and Tara Thompson (Edinburgh Napier University) could say more about how this is being used in new literary research.

“Public Private Digitisation Partnerships at the British Library” – Hugh Brown, British Library Digitisation Project Manager

I work as part of the Digital Scholarship team at the British Library, which was founded in 2010 to support colleagues and researchers in making innovative use of BL digital collections and data – recognising the gap in provision we had there. The team is led by Adam Farquhar, Head of Digital Scholarship, and Neil Fitzgerald, Head of Digital Research Team. We are cross disciplinary experts in the areas of digitisation, librarianship, digital history and humanities, computer and data science, and we look at how technology is transforming research and in turn our services. And we include the British Library Labs, Digital Curators, and the Endangered Archives Programme (EAP).

So, we help get content online and digitised, we support researchers, and we run a training programme to bridge skills so that researchers can begin to engage with digital resources. We expect that in 10-15 years time those will be core research skills so we might not exist – it will just be part of the norm. But we are a long way off that at the moment. We also currently run Hack and Yack events to experiment and discuss. And we also have a Reading Room to share what’s happening in the world, to share best practice.

In terms of our collections and partnerships, we have historically had a slightly piecemeal digitisation approach, so we now have a joined up strategy that sits under our Living Knowledge strategy and includes partnership, commercial strategy and our own collection strategy. Our partnerships recognise that we don’t always have the skills we need to make content available, whilst our commercial strategy – where I work – allows us to digitise as much as possible, in a context where we don’t have infinite funding for digitisation.

We have various factors in mind when considering potential partnerships. The types of approach include partnerships based on whether materials are in or out of copyright – if in copyright then commercial partners have to clear rights. We do public/private partnerships with technology partners. We have non-commercial organisational and/or consortium funding. And we have philanthropic donor funded work. Then we think about content – content strategy, asset ownership, digitisation location. We think about value – audience type/interest/geography, and topicality. We think about copyright – whether the British Library owns the rights, rights of reuse. We think about discoverability – the ability to identify and search, and access that maximises exposure. We look at the (BL) benefit – funding, access etc. We look at risk. And we look at contract – whether it is non-exclusive, commercial/non-commercial.

So, we have had public-private digitisation partnerships with Gale Cengage Learning, Adam Matthew Digital, findmypast, Google Books, Microsoft books, etc. And looking at examples: Google Books has been 80m+ images digitised; Microsoft books was 25m images; findmypast has done 23m+ images of newspapers; Gale Cengage Learning has done 18th century collections – 22m images, 19th century online 2.2m+ images, and Arabic books, etc.

The process begins with liaison with key publishers. Then there is market and content research. Then we plan and agree a plan, including licensing of rights for a fixed term (5-10 years), and royalty arrangements and reading room access. Then digitisation takes place, funded by the partner – either by setting up a satellite studio, or using the BL studio. So our partners digitise content and give us that content; in exchange they get a 5-10 year exclusive agreement to use that content on their platform. And the revenue generated for the BL helps support what we do, and our curators’ work around digitisation.

So findmypast was an interesting example. We had electoral registers and India Office Records – data with real commercial value. So, we put a tender out for a digitisation partner. Findmypast was selected… Part of that was to do with the challenges of the electoral registers, which were in inconsistent formats etc. so needed a lot of specific work. And we also needed historical country boundaries to be understood to make it work. There was also a lot of manual OCR work to do.

Gale Cengage tend to be education/university focused and they work with researchers. We worked with them to select 19th century materials to fit their themes and interests. They did the early Arabic books project – a really complex project. The private case collection consisted mainly of books that had been inaccessible on grounds of obscenity between around 1600 and 1960.

With Adam Matthew Digital we were approached to contribute material from the electoral registers and India Office Records. And materials on the East India Company.

Now these are exciting projects but we want 20-30% of content generated in these projects to be available as a corpus for research and that’s important to our agreements.

Challenges in the workflow include ensuring business partners and scanning vendors have a good understanding of the material the BL holds in our collections. We have to define and provide the metadata requirements the BL needs to supply to the partners. Getting statistics and project plans from business partners. There are logistical challenges around understanding the impact of digitisation on the BL departments supporting the process. We have to manage partners’ business drivers versus BL curatorial drivers. We have to manage the partners’ digitisation vendors on site. And ensuring the final digital assets/metadata received meet BL requirements for sign off and ingest.


Q1) How can we actually access this stuff for research?

A1) For pure research that can be done. For example we have a company in Brighton who are doing research on the electoral roll. That’s not in competition with what the private partner is doing.

Comment from Melissa) My experience is “don’t ask, don’t get” – so if you see something you want to use in your research, do ask!

“The Future of BL Labs and Digital Research at the Library” – Ben O’Steen

I’ve handed out some personas for users of our digital collections – and a blank sheet on the back. We are trying to build up a picture of the needs of our users, their skills and interests, and that helps us illustrate what we do – that’s a thing to come back to.

So I want to talk about the future of BL Labs. We are a project and our funding is due to finish. Our role has been to engage with researchers and that is going to continue – maybe with that same brand, just not as a project. We need to learn what they want to do… We need to collect evidence of demand. And we are developing a business model and support process to make this “business as usual” at the BL. We want to help create a pathway to developing a “Digital Research Suite” at the BL by 2019. But we want to think about what that might be, and we are piloting ideas including small 2-person workrooms for digital projects. And we can control access – so that we can see how this works, and ensure that users understand what you can and cannot do with the data (that you can’t just download everything and walk out with it).

And many other places are being “inspired” by our model – take a look at the Library of Congress work in particular.

So, at this stage we are looking at our business model and how we can make these scalable services. Our model to date has been smaller scale, about capabilities to get started, etc. That is not scalable at the level we’ve been working. We need a more hands-off process and to be able to see more people. We also run the BL Labs Awards which, instead of working with people, recognise work people have already done. People submit and then in October our advisory board reviews the entries and looks for work that champions our content.

To develop our business model we are exploring, evaluating and implementing it using the business model canvas. We have internal and external business model development, implementation and evaluation groups, and are exploring how this could work in practice. And we are testing, piloting and implementing our business model. That means:

  • developing support services
    • Entry level – about the collection, documentation improvements, case studies that help show what is in there.
    • Baseline – a basic enquiry service to enable researchers to understand if a BL project is the right path, any legal restrictions that need addressing, etc. We try to get you to the next stage of developing your idea.
    • Intermediate – a consultation service, which will be written in as part of a bid.
    • Advanced – support for 10 projects per year through an application process
  • Augment – that was a placeholder for a year, and now a tender has just gone out for a repository type service for 12-18 months
    • e.g. sample datasets, tools, examples of use
    • Pilot use of Jupyter Notebooks / Docker and other tools for open and onsite data
  • Researcher access to BL APIs
  • Reading room services – onsite access/compute to digital collections – which means us training staff

This has come about as we’ve seen a pattern in approaches that start with an initial exploration phase, then transition into investigation and then some sort of completion phase. There had been a false assumption (on the data providers’ part) that data-based work must start at the investigation phase – that you have an idea of the project you want to do, know the data already, know the collections. What we are piloting is that essential exploratory stage, acknowledging that it happens. And that pattern shifts around – exploration and investigation stages can fork off in different directions, and that’s fine.

So, in terms of timescales, there seems to be a phase of quick initial work. A longer and variable transition takes place into investigation – probably months. Then investigation takes months to a year. And crucially there is that completion stage.

Exploration is about understanding the data in an open-ended fashion. It is about discovering the potential tools to work with the data. We want people to gain awareness of their capabilities and limitations – a reality check and an opportunity to understand the need for partners and/or new tools. And it’s about developing a firmer query, as that helps you to understand the cost, risk and time you might need. Exploration (e.g. the V&A Spelunker) lets you get a sense of what’s there, which gives you a different way in to the keyword or catalogue search. And then you have artists like Mario Klingemann – collating images looking sad… It’s artistic but talks about how women are portrayed in the 19th century. He’s also done work on hats on the ground – and found it’s always a fight! This is showing cultural memes – an important question… An older example is the Cooper Hewitt collection – which lets you see all of the tags – including various types of similarity that show new ways into the data.

So, what should a digital exploration service look like? Which apps? Does Jupyter Notebook assume too much?

We’ve found that every time we present the data, it shapes the perception. For instance the On the Road manuscript is on a roll. If you print a book on a receipt roll it’s different – it reads and is understood differently.

MIT have a Moral Machine survey, which is the classic trolley problem – crowdsourced for autonomous vehicles. But that presentation shapes and limits the questions, and that is biased. Some of the best questions we’ve seen have been from people who have asked very broad questions and haven’t engaged in exploration in other ways. They are hard to answer (e.g. all depictions of women) but they reveal more. Presenting as a searchable list shapes how we interpret the result… But for instance showing newspaper articles as if in a giant newspaper – not a list of results – changes what you do. And that’s why tools like IIIF seem useful.

So… We have things like Gender API. It looks good, it looks professional… If you try it with a western name, does it work? If you try it with an Indian name, does it work? If you try it with a 19th century name, does it work? Know that marketeers will use this. See also sentiment analysis. Some of these tools are based on Twitter. I found a researcher working on 18th century texts for sentiment about war and conflict… through a tool developed and trained for tweets. We have to be transparent in what is happening, in understanding what you are doing… Hence thinking about personas.
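
That Twitter mismatch is easy to demonstrate: VADER, for instance, is itself a social-media sentiment lexicon (here via the vaderSentiment package), and running it on older prose shows why scores shouldn’t be trusted out of domain:

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyser = SentimentIntensityAnalyzer()
for text in [
    "this is totally awesome!!! love it",                      # the register it was built for
    "the war doth rage and grievous want afflicts the poor",   # 18th century register
]:
    print(analyser.polarity_scores(text))
# Treat the second score with suspicion: the lexicon was never built for
# that vocabulary, which is exactly the point being made above.
```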

We are trying to think about how we show what is missing from a collection, rather than just what is present, so that data can be used in a more informed way. We are looking at what research environments we can provide – we know that people want to use their own but we can sometimes be a bit stuffed by licensing based in a paper era. On site tools can help. Should we enable research environments for open data that can be used off site too? We are thinking about focus – are the query, tooling and collections required well defined; is it feasible – legal, cost, ethical, source data quality, etc.; is it affordable – time, people, money; etc.

So, we have, on the BL Labs website, a form – it’s long so do send us feedback on whether that is the right format etc. – to help us understand demand and skills.

Those personas – please fill these in – and let us know the technical part, what you might want, how technical the support you need. We are keen to discuss your needs, challenges and issues.

And with that we are done and moving onto lunch and discussion. Thanks to Ben, Hugh, Bea and Uta as well as Melissa and the Digital Scholarship Team!


Mar 232018

Today I am back at the Data Fest Data Summit 2018, for the second day. I’m here with my EDINA colleagues James Reid and Adam Rusbridge and we are keen to meet people interested in working with us, so do say hello if you are here too! 

I’m liveblogging the presentations so do keep an eye here for my notes, updated throughout the event. As usual these are genuinely live notes, so please let me know if you have any questions, comments, updates, additions or corrections and I’ll update them accordingly. 

Intro to Data Summit Day 2 – Maggie Philbin

We’ve just opened with a video on Ecometrica and their Data Lab supported work on calculating water footprints. 

I’d like to start by thanking our sponsors, who make this possible. And also I wanted to ask you about your highlights from yesterday. These include Eddie Copeland from Nesta’s talk, discussion of small data, etc. 

Data Science for Societal Good — Who? What? Why? How? –  Kirk Borne, Principal Data Scientist and Executive Advisor, Booz Allen Hamilton

Data science has a huge impact for the business world, but also for societal good. I wanted to talk about the 5 i’s of data science for social good:

  1. Interest
  2. Insight
  3. Inspiration
  4. Innovation
  5. Ignition

So, the number one is Interest. The data can attract people to engage with a problem. Everything we do is digital now. And all this information is useful for something. No matter what your passion, you can follow it as a data scientist. I wanted to give an example here… My background is astrophysics and I love teaching people about the world, but my day job has always been other things. About 20 years ago I was working in data science at NASA and we saw an astronomical – and I mean it, we were NASA – growth in data. And we weren’t sure what to do with it, and a colleague told me about data mining. It seemed interesting but I just wasn’t getting what the deal was. We had a lunch talk from a professor at Stanford, and she came in and filled the board with equations… She was talking about the work they were doing at IBM in New York. And then she said “and now I’m going to tell you about our summer school” – where they take kids from the inner city who aren’t interested in school, and teach them data science. Deafening silence from the audience… And she said yes, we teach them data mining in the context of what means most for these students, what matters most. And she explained: street basketball. IBM was working on software called Advanced Scout, specifically predicting basketball strategy. And the kids loved basketball enough that they really wanted to work on the math and science… And I loved that, but what she said next changed my life.

My PhD research was on colliding galaxies. It was so exciting… I loved teaching and I was so impressed with what she had done. These kids she was working with had peer pressure not to be academic, not to study. The school had a graduation rate of less than 50%. Her mark of success for her students was their graduation rate – 98%. I was moved by that. I felt that if this data science has this much power to change lives, that’s what I want to do for the rest of my life. So my life, and those of my peers, has been driven by passion. My career has been as much about promoting data literacy as anything else.

So, secondly, we have insight. Traditionally we collect some data points but we don’t share this data, we are not combining the signals… Insight comes from integrating all the different signals in the system. That’s another reason for applying data to societal good, to gain understanding. For example, at NASA, we looked at what could be combined to understand environmental science, and all the many applications, services and knowledge that could be delivered and drive insight from the data.

Number three on this list is Inspiration. Inspiration, passion, purpose, curiosity – these motivate people. Hackathons, when they are good, are all about that. When I was teaching, the group projects where the team was all the same did the worst and least interesting work. When the team is diverse in the widest sense – people who know nothing about Python, R, etc. – they can bring real insights. So, for example, my company runs the “Data Science Bowl” and we tackle topics like ocean health, heart health, lung cancer, drug discovery. There are prizes for the top ten teams; this year there is a huge computing prize as well as a cash prize. The winners of our heart health challenge were two Wall Street quants – they knew math! Get involved!

Next, innovation. Discovering new solutions and new questions. Generating new questions is hugely exciting. Think about the art of the possible. The XYZ of Data Science Innovation is about precision data, precision for personalised medicine, etc.

And fifth, Ignition. Be the spark. My career came out of looking through a telescope back when I lived in Yorkshire as a kid. My career has changed, but I’ve always been a scientist. That spark can create change, can change the world. And big data, IoT and data scientists are partners in sustainability. How can we use these approaches to address the 17 Sustainable Development Goals? And there are 229 Key Performance Indicators to measure performance – get involved. We can do this!

So, those are the five i’s. And I’d like to encapsulate this with the words of a poet… Data scientists – and that’s you, even if you don’t think you are one yet – come out of the womb asking questions of the world. Humans do this, we are curious creatures… That’s why we have that data in the first place! We naturally do this!

“If you want to build a ship, don’t drum up people to gather wood and don’t assign them tasks and work, but rather teach them to yearn for the vast and endless sea”

– Antoine de Saint-Exupéry.

This is what happened with those kids. Teach people to yearn for the vast and endless sea, then you’ll get the work done – then we’ll do the hard work.

Slides are available here:


Comment, Maggie Philbin) I run an organisation, Teen Tech, and that point you are making – start where the passion actually is – is so important.

KB) People ask me about starting in data science, and I tell them that you need to think about your life, what you are passionate about and what will fuel and drive you for the rest of your life. And that is the most important thing.

Q1) You touched on a number of projects, which is most exciting?

A1) That’s really hard, but I think the Data Science Bowl is the most exciting thing. A few years back we had a challenge looking at how fast you can measure heart ejection fraction – how fast the heart pumps blood out – but the way that is done, by specialists, could take weeks. Now that analysis is built into the MRI process and you can instantly re-scan if needed. Now I’m an astronomer but I get invited to weird places… And I was speaking to a conference of cardiac specialists. A few weeks before, my doctor had diagnosed me with a heart issue… and said it would take a month to know for sure. I only got a text giving me the all clear just before I was about to give that talk. I just leapt onto that stage to give that presentation.

The Art Of The Practical: Making AI Real – Iain Brown, Lead Data Scientist, SAS

I want to talk about AI and how it can actually be useful – because it’s not the answer to everything. I work at SAS, and I’m also a lecturer at Southampton University, and in both roles look at how we can use machine learning, deep learning, AI in practical useful ways.

We have the potential to use AI tools for good, to improve our lives – many of us will have an Alexa for instance – but we have to feel comfortable sharing our data. We have smart machines. We have AI revolutionising how we interact with society. We have a new landscape which isn’t about one new system, but a whole network of systems to solve problems. Data is a sellable asset – there is a massive competitive advantage in storing data about customers. But especially with GDPR, how is our data going to be shared with organisations and others? That matters for individuals, but also for organisations. As data scientists there is the “can” – how can the data be used; and the “should” – how should the data be used. We need to understand the reasons for and value of using data, and how we might do that.

I’m going to talk about some examples here, but I wanted to give an overview too. We’ve had neural networks for some time – AI isn’t new but dates back to the 1950s. Machine learning came in in the 1980s, deep learning in the 2010s, and cognitive computing now. We’ve also had Moore’s Law changing what is theoretically possible, but also what is practically feasible, over that time. And that brings us to a definition: “Artificial Intelligence is the science of training systems to emulate human tasks through learning and automation”. That’s my definition, you may have your own. But it’s about generating understanding from data, that’s how AI makes a difference. And it has to help the decision making process. That has to be something we can utilise.

Automation of process through AI is about listening and sensing, about understanding – that can be machine generated but it will have human involvement – and that leads to an action being taken. For instance we are all familiar with taking a picture, and that can be looked at and understood. For instance with a bank you might take an image of paperwork and passports… Some large banks check the validity of clients with a big book of pictures of blacklisted people… Wouldn’t it be better to use systems to achieve that? Or it could be a loan application or contract – they use application scorecards. The issue here is interpretability – if we make decisions we need to know why, and the process has to be transparent so the client understands why they might have been rejected. You also see this in retail… Everything is about the segment of one. We all want to be treated as individuals… How does that work when you are one of millions of individuals? What is the next thing you want? What is the next thing you want to click on? Shop Direct, for instance, have huge ranges of products on their website. They have probably 500 pairs of jeans… Wouldn’t it be better to apply their knowledge of me to filter and tailor what I see? Another example is the customer complaint on webchat. You want to understand what has gone wrong. And you want to intervene – you may even want to do that before they complain at all. And then you can offer an apology.
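
That interpretability point is worth a small sketch: a scorecard-style model such as logistic regression exposes its reasons directly (the features and data below are invented for illustration):

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

# Hypothetical applicant features: [income_k, years_at_address, missed_payments]
X = np.array([[45, 6, 0], [22, 1, 3], [60, 10, 0],
              [30, 2, 2], [55, 4, 1], [18, 1, 4]])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = loan repaid

model = LogisticRegression().fit(X, y)

# Each coefficient is a transparent contribution to the decision, so a
# rejected client can be given a concrete reason.
for name, coef in zip(["income", "years at address", "missed payments"],
                      model.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```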

There are lots of applications for AI across the board. So we are supporting our customers on the factors that will make them successful in AI, data, compute, skillset. And we embed AI in our own solutions, making them more effective and enhancing user experience. Doing that allows you to begin to predict what else might be looked at, based on what you are already seeing. We also provide our customers with extensible capabilities to help them meet their own AI goals. You’ll be aware of Alpha Go, it only works for one game, and that’s a key thing… AI has to be tailored to specific problems and questions.

For instance we are working on a system looking at optimising the experience of watching sports, eliminating the manual process of tagging in a game. This isn’t just in sport, we are also working in medicine and in lung cancer, applying AI in similar 3D imaging ways. When these images can be shared across organisations, you can start to drive insights and anomalies. It’s about collaborating, bringing data from different areas, places where an issue may exist. And that has social benefit of all of us. Another fun example – with something like wargaming you can understand the gamer, the improvements in gameplay, ways to improve the mechanics of how game play actually works. It has to be an intrinsic and extrinsic agreement to use that data to make that improvement.

If you look at a car insurance claim, the process typically runs through a call centre. But what if you could take a picture of the car as a way to quickly assess whether that claim is worth making, and how best to handle it?

I value the application, the ways to bring AI into real life. How we make our experiences better. It’s been attributed to Voltaire, and also to Spiderman, that “with great power comes great responsibility”. I’d say “with great data power comes great responsibility” and that we should focus on the “should” not the “could”.


Comment) A correction on Alpha Go: Alpha Zero plays Chess etc. It’s without any further human interaction or change.

Q1) There is this massive opportunity for collaboration in Scotland. What would SAS like to see happen, and how would you like to see people working together?

A1) I think collaboration through industry, alongside academia. Kirk made some great points about not focusing on the same perspectives but on the real needs and interest. Work can be siloed but we do need to collaborate. Hack events are great for that, and that’s where the true innovation can come from.

Q2) What about this conference in 5 years time?

A2) That’s a huge question. All sorts of things may happen, but that’s the excitement of data science.

Socially Minded Data Science And The Importance Of Public Benefits – Mhairi Aitken, Research Fellow, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh

I have been working in data science and public engagement around data and data science for about eight years and things have changed enormously in that time. People used to think about data as something very far from their everyday lives. But things have really changed, and people are aware and interested in data in their lives. And now when I hold public events around data, people are keen to come and they mention data before I do. They think about the data on their phones, the data they share, supermarket loyalty cards. These may sound trivial but I think they are really important. In my work I see how these changes are making real differences, and differences in expectations of data use – that it should be used ethically and appropriately but also that it will be used.

Public engagement with data and data science has always been important but it’s now much easier to do. And there is much more interest from funders for public engagement. That is partly reflecting the press coverage and public response to previous data projects, particularly NHS data work with the private sector. Public engagement helps address concerns and avoid negative coverage, and to understand their preferences. But we can be even more positive with our public engagement, using it to properly understand how people feel about their data and how it is used.

In 2016 myself and colleagues undertook a systematic review of public responses to sharing and linking of health data for research purposes (Aitken, M. et al 2016, BMC Medical Ethics, 17(1)). That work found that people need to understand how data will be used, and they particularly need to understand that there will be public benefit from their data. Alongside safeguards, secure handling, and a sense of control, they still have to be confident that their data will be used for public benefit. They can even be supportive where those other factors are weaker, if the benefit is clear. Trust is core to this. It is fundamental to think about how we earn public trust, and what trust in data science means.

Public trust is easy to define, but what about “public benefit”? When people talk about benefits from data, they will often mention things like the Tesco Clubcard – there is a direct tangible benefit there in the form of vouchers. But what is the public benefit in a broader and less direct sense? When we ask about public benefit in the data science community we often talk about economic benefits to society through creating new data-driven innovation. But that’s not what the public think about. For the public it can be things like improvements to public services. In data-intensive health research there is an expectation of data leading to new cures or treatments. Or that there might be feedback to individuals about their own conditions or lifestyles. But there may be undefined or unpredictable potential benefits to the public – it’s important not to define the benefits too narrowly, but still to recognise that there will be some.

But who is the “public” that should benefit from data science? Is that everyone? Is it local? National? Global? It may be as many as possible, but what is possible and practical? Everyone whose data is used? That may not be possible. Perhaps vulnerable or disadvantaged groups? Is it a small benefit for many, or a large benefit for a small group? Those who may benefit most? Those who may benefit the least? The answers will be different for different data science projects. That will vary for different members of the public. But if we only have these conversations within the data science community we’ll only hear certain answers; we won’t hear from groups without a voice. We need to engage the public more with our data science projects.

So, closing thoughts… We need to maintain a social license for data science practices and that means continual reflection on the conditions for public support. Trust is fundamental – we don’t need to make the public trust us, we have to actually be trustworthy, and that means listening, understanding and responding to concerns, and being trustworthy in our use of data. Key to this is finding the public benefits of data science projects. In particular we need to think about who benefits from data science and how benefits can be maximised across society. Data scientists are good at answering questions of what can be done, but we need to be focusing on what should be done and what is beneficial to do.


Q1) How does private industry make sure we don’t leave people behind?

A1) Be really proactive about engaging people, rather than waiting for an issue to occur. Finding ways to get people interested. Making it clear what the benefits are to people’s lives. There can be cautiousness about opening up debate being a way to open up risk. But actually we have to have those conversations and open up the debate, and learn from that.

Q2) How do we put in enough safeguards that people understand what they consent to, without giving them too much information or scaring them off with 70 checkboxes.

A2) It is a really interesting question of consent. Public engagement can help us understand that, and guide us around how people want to consent, and what they want to know. We are trying to answer questions where we don’t always have the answers – we have to understand what people need by asking them and engaging them.

Q3) Many in the data community are keen to crack on but feel inhibited. How do we take the work you are doing and move sooner rather than later.

A3) It is about how we design data science projects. You do need to take the time first to engage with the public. It’s very practical and valuable to do at the beginning, rather than waiting until we are further down the line…

Q3) I would agree with that… We need to do that sooner rather than later rather than being delayed deciding what to do.

Q4) You talked about concerns and preferences – what are key concerns?

A4) Things you would expect on confidentiality, privacy, how they are informed. But also what is the outcome of the project – is it beneficial, or could it be discriminatory, or have a negative impact on society? It comes back to showing public benefits – they want to see the outcomes and impact of a piece of work.


Automated Machine Learning Using H2O’s Driverless AI – Marios Michailidis, Research Data Scientist, H2O.ai

I wanted to start with some of my own background. And I wanted to talk a bit about Kaggle. It is the world’s biggest predictive modelling competition platform with more than a million members. Companies host data challenges and competitors from across the world compete to solve them for prizes. Prizes can be monetary, or participation in conferences, or you might be hired by companies. And it’s a bit like tennis – you gain points and go up in the rankings. And I was able to be ranked #1 out of half a million members there.

So, a typical problem is image classification. Can I tell a cat from a dog from an image? That’s very doable – you can get over 95% accuracy, and you can do that with deep learning and neural nets. And you differentiate and classify features to enable that decision. Similarly a typical problem may be classifying different bird songs from a sound recording – also very solvable. You also see a lot of text classification problems… And you can identify texts from particular writers by their style and vocabulary (e.g. Voltaire vs Molière). And you see sentiment analysis problems – particularly for marketing or social media use.

To win these competitions you need to understand the problem, and the metric you are being tested on. For instance there was an insurance problem where most customers were renewing, so there was more value in splitting the problem into two models – one for renewals, and one for the others. You have to have a solid testing procedure – a really strong validation environment that reflects what you are being tested on. So if you are being tested on predictions for 3 months in the future, you need to test with past data, and check that the prediction works, to have the confidence that what you do will be appropriately generalisable.
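
A minimal sketch of that time-aware validation idea, with invented monthly data:

```python
# When the task is to predict three months ahead, validate on a held-out
# *later* window, never a random split.
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2017-01-01", periods=12, freq="MS"),
    "target": range(12),
})

cutoff = df["date"].max() - pd.DateOffset(months=3)
train = df[df["date"] <= cutoff]  # fit on the past...
valid = df[df["date"] > cutoff]   # ...score on the "future"
print(len(train), len(valid))     # 9 training months, 3 validation months
```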

You need to handle the data well. Your preprocessing, your feature engineering, which will let you get the most out of your modelling. You also need to know the problem-specific elements and algorithms. You need to know what works well. But you can look back for information to inform that. You of course need access to the right tools – the updated and latest software for best accuracy. You have to think about the hours you put in and how you optimize them. When I was #1 I was working 60 hours on top of my day job!

Collaborate – data science is a team sport! It’s not just about splitting the work across specialisms, it’s about uncovering new insights by sharing different approaches. You gain experience over time, and that lets you focus your effort where it will bring the best gain. And then use ensembling – combine the methods optimally for the best performance. And you can automate that…
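
A minimal sketch of the ensembling idea – a weighted average of model predictions, with invented outputs and weights you would normally tune on validation data:

```python
import numpy as np

pred_gbm = np.array([0.91, 0.12, 0.55])  # hypothetical per-model outputs
pred_nn = np.array([0.85, 0.20, 0.49])
pred_lr = np.array([0.78, 0.25, 0.60])

# Blend the models; the weights here are assumed, chosen by validation score.
blend = np.average([pred_gbm, pred_nn, pred_lr], axis=0, weights=[0.5, 0.3, 0.2])
print(blend)
```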

And that brings us to H2O’s Driverless AI, which automates AI. It’s an AI that creates AI. It is built by a group of leading machine learning engineers, academics, data scientists, and Kaggle Grandmasters. It handles data cleaning and feature engineering. It uses cutting edge machine learning algorithms. And it optimises and combines them. And this is all through a hypothesis testing driven approach. That is so important because if I try a new feature or a new algorithm, I need to test it… And you can exhaustively find the best transformations and algorithms for your data. This allows solving of many machine learning tasks, and it is all in parallel to make it very fast.

So, how does it work? Well you have some input data and you have a target variable. You set an objective or success metric. And then you need some allocated computing power (CPU or GPU). Then you press a button and H2O Driverless AI will explore the data, it will try things out, and it will provide some predictions and model interpretability. You get a lot of insight, including the most predictive features. And the other thing is that you can do feature engineering – you can extract this pipeline, these feature transformations, and then use them with your own modelling.
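
Driverless AI itself is a commercial product, but its open-source cousin, H2O AutoML, gives a feel for this press-a-button workflow from Python (the file name and target column below are hypothetical):

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()                            # starts a local H2O server (needs Java)
frame = h2o.import_file("train.csv")  # hypothetical input data

aml = H2OAutoML(max_runtime_secs=300, seed=1)  # a time budget for the search
aml.train(y="label", training_frame=frame)     # "label" is the assumed target
print(aml.leaderboard.head())  # candidate models ranked by the success metric
```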

Now, I have a minute long demo here…. where you upload data, and various features and algorithms are being tried, and you can see the most important features… Then you can export the scoring pipeline etc.

This work has been awarded Technology of the Year by InfoWorld, it has been featured in the Gartner report.

You can find out more on our website: and there is lots of transparency about how this work, how the model performs etc. You can download a free trial for 3 weeks.


Q1) Do you provide information on the machine learning models as well?

A1) Once we finish with the score, we build a second, simpler model to predict that score. The focus of that is to explain why we have shown this score. And you can see why you have this score with this model… That second interpretability model is slightly less automated. But I encourage others to look online for similar approaches – this is one surrogate model.

Q2) Can I reproduce the results from H2O?

A2) Yes. You can download the scoring pipeline; it will generate the code and environment to replicate this, see all the models, the data generated, and you can run that script locally yourself – it’s mainly Python.

Q3) That stuff is insane – probably very dangerous in the hands of someone just learning about machine learning! I’d be tempted to just throw data in… What’s the feedback that helps you learn?

A3) There is a lot of feedback and also a lot of warning – so if test data doesn’t look enough like training data, for instance. But the software itself is not educational on its own – you’d need to see webinars and look at online materials, but then you should be in a good position to learn what it is doing and how.

Q4) You talked about feature selection and feature engineering. How robust is that?

A4) It is all based on hypothesis testing. But you can’t test everything without huge compute power. But we have a genetic algorithm to generate combinations of features, tests them, and then tries something else if that isn’t working.

Q5) Can you output a model as e.g. a serialised JSON object? Or use it as an API?

A5) We have various outputs but not JSON. Best to look on the website as we have various ways to do these things.


Innovation Showcase

This next session showcases innovation in startups. 

Matt Jewell, R&D Engineer, Amiqus

I’m an R&D Engineer at Amiqus, and also a PhD student in Law at Edinburgh University. Firstly I want to talk about Amiqus: our mission is to make civil justice accessible to the world. We are engaged with GDPR as a data controller, but also as a trust and identity provider – where GDPR is an opportunity for us. We created Amiqus ID to enable people to more easily interact with the law – with data from Companies House, driving licenses, etc.

As a PhD student in law there is some overlap between my job and my PhD research, and I was asked to talk about data ethics. So I wanted to note GDPR Article 22(3), which states that:

“the data controller shall implement suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests, at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision.”

And that’s across the board. GDPR recommits us to privacy, but also embeds privacy as a public good. And we have to think about what that means in our own best practices, because our own practices will shape what happens – especially as GDPR is still quite uncertain, still untested in law.

Carlos Labra, CEO & Co-Founder, Particle Analytics

I come from a mechanical engineering background, so this work is about simulation. And specifically we look at fluid simulation in aircraft. Particle simulation is actually the next step for industry, and that’s because it has been incredibly difficult to do this simulation with computers. We can do basic computer models for large scale materials, but those are not appropriate for particles. So at Particle Analytics we are trying to address this challenge.

So, a single simulation of a silo has to calculate the interactions between every single particle (on the order of millions), in very small time intervals. That takes huge computing power. So for instance one of our clients, Astec, works on asphalt dryer/mixer technology, and we are using Particle Analytics to enable them to establish and achieve new energy-based KPIs (Key Performance Indicators) that could make enormous savings per machine per year, purely by optimising to different analytics.
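
A toy discrete-element step makes the cost visible – every pair of particles is checked for contact at every tiny time step (all constants invented):

```python
import numpy as np

n, dt, k, radius = 200, 1e-5, 1e4, 0.01  # invented toy constants
pos = np.random.rand(n, 2)
vel = np.zeros((n, 2))

diff = pos[:, None, :] - pos[None, :, :]       # all n*n pairwise offsets
dist = np.linalg.norm(diff, axis=-1)
overlap = np.clip(2 * radius - dist, 0, None)  # >0 where particles touch
np.fill_diagonal(overlap, 0.0)                 # ignore self-contact

# Linear-spring repulsion summed over neighbours, then one explicit Euler step.
unit = diff / np.clip(dist[..., None], 1e-9, None)
force = (k * overlap[..., None] * unit).sum(axis=1)
vel += force * dt
pos += vel * dt
```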

So we look at spatial/temporal filters, multiscale analysis, and reduce data size/noise. The Data operators generate new insights and KPIs. So the cost of simulation is going down, and the insights are increased.

Steven Revill, CEO & Co-Founder, Urbantide

I’m here to talk to you about our platform USmart which is making smart data. How do we do this? Well, when we started a few years ago we recognised that our businesses, organisations, and places, would be helped by artificial intelligence based on data. That requires increased collaboration around data and increasing reuse of data. Too often data is in silos, and we need to break it out and share it. But we also need to be looking at real time data from IoT devices.

So, our solution is USmart. It collects data from any source in real time, and we create value with automatic data pipelines with analytics, visualisation and AI ready. And that enables collaboration – either with partners in a closed way, or as open data.

So, I want to talk about some case studies. Firstly Smartline, which is taking housing data to identify people at risk of, or in, fuel poverty. We have 80m data points so far, and we expect to reach up to 700m+ soon. This data set is open and when it goes live we think it will be the biggest open data set in the UK.

Cycling Scotland is showing the true state of cycling, helping them to make their case for funding and gain insght.

And we are working with North Lanarkshire Council on business rates, which could lead to savings of £18k per annum, but can also identify incorrect rates of £100k+ value.

If you want to find out more do come and talk to me, take a look at USmart, and join the USmart community.

Martina Pugliese, Data Science Lead, Mallzee

I am data science lead for Mallzee – proudly established and run from Edinburgh. Mallzee is an app for clothes, allowing you to like or dislike a product. We show you 150+ brands. We’ve had 1.4m downloads, 500m ratings on products, 3m products rated. The app allows you to explore products, but it also acts as a data collection method for us and for our B2B offering to retailers. So we allow you to product test, very swiftly, your products before they hit the market.

Why do this? Well there are challenges that are two sides of the same coin: overstock, where you have to discount and waste money; and understock, where you have too little of the best stock and that means you don’t have time to make the best return on your products.

As well as gathering data, we also monitor the market for trends in pricing, discounting, something new happening… So for instance only 50.8% of new products last quarter were sold at full price. We work to help design, buying and merchandising teams improve this rate by 6-10% through customer feedback.

So, data is our backbone. For the consumer we enable discovery, we personalise the tool to you – it should save you time and money. At the same time the data also enables performance prediction. We have granular user segmentation. And it goes back to you – the best products go on the market. And long term that should have a positive environmental impact in reducing waste.

Maggie Philbin: Thank you. I’m going to ask you to feedback on each others ideas and work.

Carlos: I’m new to the data science world, so for me I need to learn more – and these presentations are so useful for that.

Martina: This is really useful for me, and great to see that lots of different things going on.

Matt: My work focuses on smart cities, so naturally interested in Steven’s presentation. Less keen on problematising the city.

Steven: Really interesting to discuss things backstage, but also exciting to hear Martina talking about how central data is for your business right now.

Maggie: And that is part of the wonderful things about being at Data Fest, that opportunity to learn from and hear from each other, to network and share.

We are back from lunch with a video on work in the Highlands and Islands using ambient technologies to predict likelihood of falls etc. 

Transforming Sectors With Data-Enabled Innovation – Orsola De Marco, Head of Startups, Open Data Institute

I’m going to talk about transforming sectors with data. The ODI, founded by Tim Berners-Lee and Nigel Shadbolt, focuses on data and what data enables. We think about data as infrastructure. If you think of data as roads, you see that the number of roads does not matter as much as how they are connected… In the context of data we need data that can be combined, that is structured for connection and combination. And we look at data through open data and open innovation. What the ODI’s work has in common is that open innovation is at the core. This is not just about innovating, but also about making your organisation more porous, bringing in the outside. And I love the phrase “if you are the smartest person in the room, then you are in the wrong room”, because so often innovation comes from collaboration and from the outside.

Open innovation has huge potential value. McKinsey in 2013 predicted a $3-5 trillion impact of open data; Lateral Economics (2014) puts that at more like $20 trillion.

When we talk about open innovation and collaboration, we can talk about the corporate-startup marriage. We used to see linear solutions having good returns, but that is no longer the case. Problems are now much more complex, and startups are great at innovation, at thinking laterally, at finding new approaches. But corporates have scale, they have reach, and they have knowledge of their industries and markets. If you bring these two together, it’s clear you can bring a good opportunity to life.

An example I wanted to share here is Transport for London, who wanted to release open data to enable startups and SMEs to use it. CityMapper is one of the best known of the tools built on that data. Last year, after several years of open data, they commissioned a Deloitte report (2017) which found that this release had generated huge savings for TfL.

Another example is Arup. Historically their innovation had taken place in house. They embraced a more open approach, and worked with two of our start ups, Macedon C and Smart Sensors. Macedon C helped Arup explore airport data so that Arup didn’t need to do that processing. Smart Sensors installed 200 IoT sensors, sharing approaches to those sensors, what it means to implement IoT in buildings, how they could use this technology. And Arup rolled them out to some of their services.

Those are some examples. We’ve worked with 120 startups across the world, and they have generated over £37.2M in sales and investment. These are real businesses bringing real value – not just a guy in a shed. The major challenge is on the supply side of the data. A lot of companies are reluctant to share, mentioning three blockers: (1) it feels very risky to open data up – an issue which feels highly relevant this week; (2) it’s expensive to do, especially if you don’t know the value coming back; (3) a perceived lack of data literacy and skills. Those are all important… But if you lead and innovate, you get to set the tone for innovation in your sector.

The idea of disruption is raised a lot, but it is real. And to actually disrupt, a culture of open innovation is essential. It needs to be brought in at senior level and brought into the sector.

Data infrastructure can transform sectors. And joining forces between data suppliers and users is important there. For instance we are working on a project called Open Active, with Sport England. A lack of information on what was going on in different areas was an issue for people getting active. We were involved at the outset and could see that data was the blocker here… If you tried to aggregate information it was impossible. So, in the first year of the programme we brought providers into the room, agreed an open standard, and that enabled aggregation of data. We are now in the second phase and, now that the data is consistent and available, we are bringing start ups in to engage and do things with that data. And those start ups aren’t all in sports: some are in the healthcare sector – using sports data to augment information shared by medics – and some are leisure companies helping individuals find things to do with their spare time.

Another example is the Open Banking sector. Over 60% of UK banking customers haven’t changed their bank account in 5 years, and many of those haven’t changed them in 20 years. This initiative enables customers to grant secure access to their banking details to e.g. mortgage lenders, or to marketplaces offering services such as energy switching. Our role in this programme was to facilitate these banks, drawing on that experience of data portability… And now we are working with Mexico on a FinTech law that requires all banks to have an open API.

In order to innovate in sectors it’s important to widen access to data. This doesn’t mean not taking data privacy seriously, or losing competitive advantage.

And I wanted to highlight a very local programme. Last year we began a project in the peer to peer accommodation market. The Scottish expert advisory panel noted that whilst a lot of data is generated, no real work is looking at the impact of the sharing economy in accommodation. That understanding will enable policy decisions tied to real concerns. We will be making recommendations on this very soon. If you are interested, do get in touch and be part of this.


Q1) You talked a lot about the value of data. How do you measure that economic value like that?

A1) We base value on sales and investment generated, and/or time or money saved in processes. It’s not an exact science but it looks for changes to the status quo.

Q2) What is the most important and valuable thing from your experience here?

A2) I think I’ll approach that answer in two ways. We do innovation work with data, but we often facilitate conversations between data providers and start ups. On the data provider side it’s about removing those blockers; on the start up side it’s about facilitating those conversations, helping them grow and develop, and tailoring that support.

Q3) What next?

A3) Our model is a sector transformation model. We talk to a sector about sharing and opening up, and then we have start ups in an accelerator so that data will find a use. That’s a huge difference from just publishing the data and wondering what will happen to it.

Designing Things with Spending Power – Chris Speed, Chair of Design Informatics, University of Edinburgh

I have a fantastic team of designers and developers, and brilliant students who ask questions, including what things will be like in Tomorrow’s World!  We look at all kinds of factors here around data. So I want to credit that team.

Many of you in the room will be aware that data is about value constellations, rather than value chains. These are complex markets with many players – which may be humans but may also be bots. That changes our capacity to construct value, since we now have non-human agents that construct value. So I will talk about four objects to look at the disruption that can be made, and what that might mean, especially as things gain agency and power. One of the questions we asked was: what happens when we give things spending power?

See the diagram from the RAND organisation comparing centralised with decentralised and distributed – we see this model again and again… But things drift back occasionally (there’s only one internet banking platform now, right?). I’m going to show this 2014 bitcoin blockchain transaction video – they move too fast to screengrab these days! So… what happens when we have distributed machines with spending power? And when transactions go down to absolutely tiny amounts of money?

So, we run BlockExchange workshops, with lego, to work on the idea of blockchain, what it means to be a distributed transaction system.

Next we have the fun stuff… What happens when we have things like Ethereum… and smart contracts? What could you do with digital wallets? If the UN gives someone a digital passport, do they need sovereignty? So, we undertake bodily experiments with this stuff. We ran a physical experiment – body storming – with bitcoin wallets and smart contracts… A bit like Pokemon Go but with cash: if you hit a hotspot the smart contract assigns you money; if you enter a sink, you lose bitcoin. So, here is video of our GeoCoin app and also an experiment running in Tel Aviv.
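To make the mechanics concrete, here is a minimal sketch of the kind of location-triggered rule being bodystormed here (all names, coordinates and amounts are hypothetical – this is not the actual GeoCoin code):

    import math

    HOTSPOTS = [(55.9533, -3.1883)]   # hypothetical credit locations (lat, lon)
    SINKS = [(55.9486, -3.2008)]      # hypothetical debit locations
    RADIUS_M = 25                     # trigger distance in metres
    REWARD = PENALTY = 0.0001         # illustrative amounts of bitcoin

    def distance_m(a, b):
        # equirectangular approximation - fine at city scale
        dlat = math.radians(b[0] - a[0])
        dlon = math.radians(b[1] - a[1]) * math.cos(math.radians(a[0]))
        return 6371000 * math.hypot(dlat, dlon)

    def on_location_update(wallet, position):
        # the "smart contract": pay out on hotspots, charge on sinks
        if any(distance_m(position, h) < RADIUS_M for h in HOTSPOTS):
            wallet["balance"] += REWARD
        if any(distance_m(position, s) < RADIUS_M for s in SINKS):
            wallet["balance"] -= PENALTY
        return wallet

    wallet = on_location_update({"balance": 0.0}, (55.9533, -3.1883))  # standing on a hotspot credits the wallet

The real thing would run as a contract on the chain rather than as app code, but the incentive structure is the same.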

Three banking volunteers set out to design a new type of cinema experience… They enter the cinema by watching two trailers that can be picked up in the street… Another colleague decides not to do this… They gain credit by tweeting about trailers… Bodystorming allows new ideas to be developed (confusingly, there is no cinema… This is, er, a cinema of the mind – right Chris?).

Next we have a machine with a bitcoin wallet. Programmable money allows us to give machines buying power… Blockchain adds history to things, adding value to value… So, we set up a coffee machine, Bitbarista, with an interface that asks the coffee drinker to make decisions about what kind of coffee they want, what values matter… mediating the space between values and value.

We have hairdryers – these are new and have just gone to the Policy Unit this week. We have the Gigbliss Plus hairdryer… That allows you to buy and trade energy, and to dry your hair when energy is cheaper… What happens when you involve the public in balancing energy? And we have another hairdryer… That asks whether you want unethical energy now, or whether you want to wait for an ethical source – the hairdryer switches on accordingly. And then we have Gigbliss Auto, which has no buttons. You don’t have control, only the bitcoin wallet has decision powers… You don’t know when it comes on… But it will. It changes control. Of those three hairdryers, which are we happy to move to… Where do we feel comfortable here?

And then we have KASH cups, with chips in them. You can only buy coffee when you put two cups down. So you get credit, through the cup’s digital wallet, to encourage networking and development. You don’t have to get coffee – you can build up credit. We had free coffee in the other room… But we had a very fancy barista for the KASH cups, and people queued for this for 20 minutes – coffee with social value.

Questions for us… We give machines agency, and credit… What does that mean for value, and how do we balance value?

Maggie: It’s at this point I wish Tomorrow’s World still existed!


Q1) where is this fascinating work taking you?

A1) I think this week has been so disruptive in terms of data and technology’s disruption of social, civic and political values. I think understanding that we can’t balance value, or fair trade, etc. on our own is helpful and I’m really excited by what bots can offer here…

Q2) I was fascinated by the hairdryers… I’ve been in the National Grid’s secret control room, and that thing where EastEnders finishes and we all make a cup of tea means bringing a whole power station on board… But waiting 10 minutes might avoid that need. It’s not trivial, it’s huge.

A2) Yes, and I think understanding how that waiting, or understanding consequences of actions would have a real impact. The British public are pretty conscious and ethical I think, when they have that understanding…

Q3) Have you thought about avoiding queues with blockchain?

A3) We don’t want to just use incentives to get people out of queues. People are there for different reasons, different values; some people enjoy the sociability of a queue… Any chance to open it up, smash it up, and offer the opportunity to co-construct is great. But we need to do that with people, not just algorithms.

Maggie: At this point I should be introducing Cathy O’Neil, but she has been snowed in by 15 inches of snow on the East Coast of the US. So, she will come over at a later date and you’ll all be invited. So, in place of that we have a panel on the elephant in the room, the Facebook and Cambridge Analytica scandal, with a panel on data and ethics.

Panel session: The Elephant in the Room: What Next? – Jonathan Forbes (JF), CTO, Merkle Aquila (chair); Brian Hills (BH), Head of Data, The Data Lab; Mark Logan (ML), Former COO Skyscanner, Investor and Advisor to startups and scale ups; Mhairi Aitken (MA), Research Fellow, University of Edinburgh. 

JF: So, thinking of that elephant in the room… That election issue… That data use. I want to know: what could Facebook have done better?

ML: It has taken them a long time to respond, which seems strange… But I see it as a positive really. They see this as a much bigger issue rather than the transactional elements here. In that room you look at risk and you look at outrage. I think Facebook were trying to figure out why outrage was so high, I think that’s what has surprised them. I think they took time to think about what was happening to them. I don’t think it’s just about electing a game show host to president… The outrage is different. Cambridge Analytica is a bad actor, not just on data but in their advocacy for other problematic tactics. Facebook shouldn’t be bundled into that. Part of the issue here is that you have a monopoly. Facebook is an advertising company – they need to generate data and pass it on to app developers. Those two things don’t totally align. And I think the outrage is about trust and the expectations of users.

JF: You are closest to the public in your research. The share price is dropping significantly right now… How, based on past experience, do you see this playing out?

MA: I’m used to talking to people about public sector use of data. Often people talk about Facebook data and make two points: firstly that they contribute their own data, control it, and know how it’s used; but they also have very high expectations of public sector organisations’ use of data and don’t have those for private sector organisations – they think someone will generate ads and profit. But when data is used in politics that’s very different, and that changes expectations.

JF: I enjoyed your comment about the social license… and I think this may be a sign that the license is being withdrawn. The GDPR legislation certainly changes some things there. I was interested to see Tim Berners Lee’s response, taking Mark Zuckerberg’s perspective… I was wondering, Brian, about the commercial pressures and the public pressures here. Are they balancing that well?

BH: No. When we look back I think this will be a pivotal moment. I kind of feel like the GDPR piece is like being in a medieval torture chamber… We have a countdown but the public don’t know much about it. With Facebook it’s like we have a firework in the sky and people are asking what on earth is going on… And we have an opportunity to have a discussion about the use of data. As we leave today we have a challenge around communicating our work with data, and what our responsibilities are here. With the big data thing, many business cases have failed because we’ve focused on the technology and only the technology. And I feel we now have an opportunity and a window here.

JF: I’d like to take the temperature of the room… How many of you had Facebook on your phone, and don’t this week? None.

ML: I think that’s the point. The idea of not doing to others’ data what you wouldn’t want done to your own… But the reality is that legislation is playing catch up to practice. Commercially it’s hard to do the right thing. I think Mark Zuckerberg has reasonably good intentions here… But we have this monopoly… The parallel here is banking. And monopoly legislation hasn’t kept pace with the monopolies we have. I think it would be great if you could export your data, friends data, etc. to another platform. But we can’t.

Comment: I think you asked the wrong question… Who here doesn’t have Facebook on their phone at all? Actually quite a lot. I think actually we have that sense that power corrupts and absolute power corrupts absolutely. And I don’t feel I’m missing out, I’m sure others feel that too. And I’m unsurprised about Facebook, I could see where it was going.

JF: OK, so moving towards what we can do: should we have a code of conduct, a Hippocratic oath for data, a “do no harm”?

BH: I don’t see ethics featuring in data models. I think we have to build that in. Cathy O’Neil talks about Weapons of Math Destruction… We have to educate our data science students in how to use these tools ethically, to think about who they will work with. Cathy was a quant and didn’t like it, so she walked away. We have to educate our students about the choices they make. We talk about optimisation, optimisation of marketing… In optimising the STEM stuff we are missing things. I think we need to move towards STEAM, where A is for Arts. We have to be inclusive of arts and humanities working with these teams, to think about skills and diversity of skills.

JF: Particularly thinking about healthcare…

MA: There is increasing drive to public engagement, to public response. That has to be much more at the heart of training for data scientists and how it relates to the society we want to create. There can be a sense of slowing momentum, but it’s fundamental to getting things right, and shaping directions of where we are going…

JF: Mark, you mentioned trust, and your organisation has been very focused on trust.

ML: These multifaceted networks are built on trust. For Skyscanner trust was so much more important than favouring particular clients. I think Facebook’s error has been to not be more transparent in what they do. We have had comments about machine learning as hype, but actually machine learning is about machines learning to do something without humans. We are moving to a place where decisions will be made by machines. We have to govern that, and to police machines with other machines. And we have to have algorithms to ensure that machine learning is appropriate and ethical.

JF: I agree. It was interesting to me that Weapons of Math Destruction is the top seller in algorithms and programming – a machine generated category – but it is reassuring that those working in this space are reading about this. By show of hands, how many here working in data science are thinking about ethics? Some are. But it’s unclear who isn’t working with data, or who isn’t thinking about ethics. So, to finish, I want your one takeaway from this week.

BH: I think it’s up to us to decide how to do things differently, and to make the change here. If we are true data warriors driving societal benefit then we have to make that change ourselves.

ML: We do plenty to mess up the planet. I think machine learning can help us sort out the problems we’ve created for ourselves.

MA: I think its been a wonderful event, particularly the variety and creativity being shared. And I’m really pleased to open up these conversations and look at these issues.

JF: I’m optimistic too. But don’t underestimate the ability of a small group of committed people to change the world. So, Data Warriors, all of you… You know what to do!

Maggie: Thank you all for your conversation, your enthusiasm. One message I really want to give you is that when you look at the use of data, the capacity to do good… the vast majority of young people are oblivious. As the world changes, they could miss out on an amazing career – or even a decent career – without these skills. Don’t underestimate your ability as one person with knowledge of this area to make a difference, to influence and to inspire. A few years back, in Greenock, we ran an event with Teen Tech and the support of local tech companies made all the difference… One team went to the finals in London, won, and went to Silicon Valley… And that had enormous impact on that school and community: now all S2 students do that programme, and local companies come in for a Dragon’s Den type set up. Any moment that you can inspire and support those kids will make all the difference in their lives, especially if family, parents and community don’t know about data and tech.

Closing Comments – Gillian Docherty, CEO, The Data Lab

Firstly thank you to Maggie for being an amazing host!

I have a few thank yous to make. It has been an outstanding week. Thank you all for participating in this event. This has been just one event of fifty. We’ve had another 3000 data warriors, on top of the 450 of you data warriors here for Data Summit. Thank you to our amazing speakers, and exhibitors. The buzz has been going throughout the event. Thank you to our sponsors, and to Scottish Government and Scottish Enterprise. Thank you to our amazing volunteers, to Grayling who have been working with the press. To our venue, events team and caterers. Our designer from two fifths design. And the team at FutureX who helped us organise Data Talent and Data Summit – absolutely outstanding job! Well done!

And two final thank yous. Firstly the amazing Data Lab team. We have thousands of new people being trained, huge numbers of projects. I also want to specifically mention Craig Skelton who coordinated our Fringe events; Cecilia who runs our marketing team; and Fraser and John who were behind this week!

My final thank you is to all of you, including the teams across Scotland participating. It is a fantastic time to be working in Scotland! Now take that enthusiasm home with you!

Mar 222018

Today I am at the Data Fest Data Summit 2018, two days of data presentations, showcases, and exhibitors. I’m here with my EDINA colleagues James Reid and Adam Rusbridge and we are keen to meet people interested in working with us, so do say hello if you are here too! 

I’m liveblogging the presentations so do keep an eye here for my notes, updated throughout the event. As usual these are genuinely live notes, so please let me know if you have any questions, comments, updates, additions or corrections and I’ll update them accordingly. 

Intro to the Data Lab – Gillian Docherty, The Data Lab CEO

Welcome to Data Summit 2018. It’s great to be back: last year we had 25 events with 2000 people, but this year we’ve had 50 events and hope to reach over 3500 people. We’ve had kids downloading data from the space station, we’ve had events on smart meters, on city data… Our theme this year is “Data Warrior” – a data warrior is someone with a passion and a drive to make value from data. You are data warriors. And you’ll see some of our data warriors on screen here and across the venue.

Our whole event is made possible by our sponsors, by Scottish Enterprise and Scottish Government. So, let’s get on with it!

Our host for the next two days is the wonderful and amazing Maggie Philbin, who you may remember from Tomorrow’s World. She has had an amazing career in media, and she is also chair of UK Digital Skills and CEO of Teen Tech, which encourages young people to engage with technology.

Intro to the Data Summit – Maggie Philbin

Maggie is starting by talking to people in the audience to find out who they are and what they are here for… 

It will be a fantastic event. We have some very diverse speakers who will be talking about the impact of data on society. We have built in lots of opportunities for questions – so don’t hesitate! For any more information do look at the app or use the hashtag #datafest18 or #datasummit18.

I am delighted to introduce our speaker who is back by popular demand. She is going to talk about her new BBC Four series Contagion, which starts tonight.

The Pandemic – Hannah Fry

Last year I talked about data for social good. This year I’m going to talk about a project we’ve been doing to look at pandemics and how disease spreads. When we first started to think about this, we wanted to see how much pandemic disease is in people’s minds. And it turns out… not much.

Hannah’s talk was redacted from this post yesterday but, as Contagion! has now been broadcast, here we go: 

Influenza killed 100 million people in the 20th Century. The Spanish Flu killed more people in one year than both World Wars. Which seems surprising, but that may be partly because Pandemic Flu is very different from Seasonal Flu. Pandemic Flu is where a strain of flu jumps from animals to humans and spreads so fast that we can’t vaccinate fast enough. For that reason Pandemic Flu is at the top of the UK Government’s Risk Register.

So, what we decided to do was essentially a TV stunt with a real purpose. We built a simple smart phone app. The App captures where people are, and how many people they are with. That allows us to see how disease might spread. Firstly to do that for TV of course, but secondly this is proper citizen science for real research. So, I spent a year calling in lots of favours, getting on all sorts of media, asking people to download an app.

But we also needed a patient zero, and we also needed a ground zero. We picked Haslemere in Surrey, which is a sort of Goldilocks town, just big enough, well connected… A beautiful English town… Just the type you’d like to destroy with an imaginary virus. And I was patient zero… So I went there, went to the gym, went to the shops, went to the pub… But unknown to me I also walked past others with the app… So when I stood next to one of them, it was for enough time to infect that person… And so now there were two people, and then many more… A pharmacist got infected early on and continued infecting others…

These patterns are based on our best mathematical models for infection… And you can quickly see pockets of infection developing and growing. Spreading quickly to a whole town. But those dots on a map are all real people…

Looking at some real infection sites… In Petersfield there is a school where a few kids from Haslemere attend, commuting by train. Three kids were running our app… By day three, two were infected, one wasn’t. They went to the break room, and outside, and the third person got infected… and then infected their family…

I wanted to also talk about a person from Haslemere who worked in London on Day Two. Two people from the town who don’t know each other took the train home, and one infected the other…

Now, this is just the Haslemere experiment, but we did a nationwide experiment…

We persuaded 30,000 people to download the app and take part… Again, it starts with me walking around Haslemere. By a month in, London is swamped. Two months in it sweeps Scotland. By three months it’s in Northern Ireland. Really by then only the north of Scotland was safe! What is startling isn’t just the speed of the spread, but how many people get infected… This is the most accurate model we have to date. The most accurate estimate for a Spanish Flu type virus is a staggering 43,343,849 infected. A conservative fatality rate of 2% would mean 866,877 deaths. But that’s the worst case scenario… That’s no interventions… Which is why this data and this model are so important, as they allow you to understand and trial interventions. Generally most people infect the same small number of people, but some super spreaders have a much bigger impact. If you target super spreaders with early vaccination – just vaccinating a targeted 10% – it makes a huge difference. It really slows the spread, giving yourself a fighting chance of overcoming the infection.
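To give a feel for how targeted vaccination plays out in a model like this, here is a minimal sketch of spread on a contact network with the best-connected 10% vaccinated first. The network, infection probability and simple no-recovery process are assumptions for illustration, not the programme’s actual model:

    import random
    import networkx as nx

    random.seed(0)
    G = nx.barabasi_albert_graph(10000, 3)  # assumed contact network with a few highly connected "super spreaders"

    def epidemic_size(G, p_infect=0.2, vaccinate_top=0.0):
        # optionally vaccinate the best-connected nodes first
        n_vacc = int(vaccinate_top * G.number_of_nodes())
        vaccinated = {n for n, _ in sorted(G.degree, key=lambda x: -x[1])[:n_vacc]}
        patient_zero = random.choice([n for n in G if n not in vaccinated])
        infected, frontier = {patient_zero}, [patient_zero]
        while frontier:  # no immunity and no recovery, matching the worst case described in the talk
            node = frontier.pop()
            for nb in G.neighbors(node):
                if nb not in infected and nb not in vaccinated and random.random() < p_infect:
                    infected.add(nb)
                    frontier.append(nb)
        return len(infected)

    print(epidemic_size(G))                      # no intervention
    print(epidemic_size(G, vaccinate_top=0.1))   # target the top 10% of super spreaders

Even on a toy network like this, removing the hubs first can cut the final outbreak size dramatically, which is the intuition behind the 10% figure above.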

We know these pandemics can and will happen, but it’s about what you plan for and how you intervene. The only way to answer those big questions and to know how to intervene, is to understand that data, to understand that spread. So we are anonymising this data set and releasing it to the academic community – as a new gold standard for understanding infection. Data really does save lives.


Q1) So, Shetland is safe…. Unless the infection started there.

A1) When we spoke to one person about what they’d do in a pandemic, they said they’d get in a car with their kids and just

Q2) I’m from the NHS and there has been a lot of work on super spreaders, closing schools… Has there been work on the most efficient, mathematically effective patterns to minimise infection?

A2) Schools are an interesting one… Closing schools sounds like it makes everything simple. But sometimes shutting schools means kids mix in an unpredictable manner, as they will go places too. And then you reopen schools and potentially reinfect… And that’s without the economic impact. These are all questions we are thinking about.

Q3) That’s awesome and scary. What about people developing immunity?

A3) Our model is no immunity, and no-one recovers. But you can build that data in later, adding richer assumptions. And some of the team working on this are looking at infection transmitted through the air – some viruses can stick around for a few hours.

Q4) I remember the SARS book. I’m very paranoid… Bought suits, gloves, bleach… In New Zealand you need a two week supply of stuff in your house… If we did that, how would that make a difference?

A4) Yes… So for instance the government always pushes messages about hand washing whenever flu is taking place. It doesn’t feel that that would make a big difference… But at a population level it really does…

Q5) My question is whether you will make the data available for other people – for epidemiology but also for transport, for infrastructure.

A5) Yes, absolutely. We wanted to make this as scientifically rigorous as possible. The BBC gives us the scale to get this work done. But we are now in the process of cleaning the data to share it. Julia Gog at Cambridge is the lead here so look out for this.

Q6) What about data privacy here?

A6) At a national level the data is accurate to 1 km squared, with one pin every 24 hours. Part of the work to clean the data is checking whether it can be reverse engineered, to make sure that privacy is assured. For Haslemere there is more detail… We are looking at skewing location, at just sharing distance apart rather than location, and at whether there is any way you can reverse engineer the dataset if you’ve seen the TV programme – so we are being really careful here.
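The national-level coarsening described here is simple to picture – a rough sketch of that kind of aggregation (grid size, field names and record layout all assumed, not the project’s actual pipeline):

    from collections import defaultdict

    def coarsen(records, cell_km=1.0):
        # reduce GPS traces to one pin per ~1 km grid cell per 24 hours
        deg = 0.009 * cell_km  # ~0.009 degrees of latitude is about 1 km; a crude but serviceable grid
        pins = defaultdict(set)
        for r in records:  # r = {"user": ..., "lat": ..., "lon": ..., "ts": unix seconds}
            cell = (round(r["lat"] / deg), round(r["lon"] / deg))
            day = r["ts"] // 86400
            pins[(r["user"], day)].add(cell)
        # keep a single cell per user per day, as described above
        return {key: sorted(cells)[0] for key, cells in pins.items()}

    print(coarsen([{"user": "u1", "lat": 51.09, "lon": -0.71, "ts": 1520000000}]))

The hard part, as the answer notes, is not the binning itself but checking that the coarsened output cannot be re-identified.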

Business Transformation: using the analytics value chain – Warwick Beresford-Jones, Merkle Aquila

I’ll be talking about the value chain. This is:

Data > Insight > Action > Value (and repeat)

Those first two aspects are “generation” and the latter two are “deployment”. We are good at the first two, but not so much the action and value aspects. So we take a different approach, thinking right to left, which allows faster changes. Businesses don’t always start with an end in mind, but we do have accessible data, transformative insights, organisational action, and integrated technology. In many businesses much of the spend is on technology, rather than on the stage where change takes place, where value is generated for the business. The point is that a business should understand why they are investing and what the purpose of it is.

I want to talk more about that, but first I want to talk about the NBA and the three point line, and how moving that changed the game by changing basket attempts… That was a tactical decision of whether to score more points or concede fewer points, enabling teams to find the benefit in taking the long shot. Cricket and football similarly use the value chain to drive benefit, but the maths works differently in terms of interpreting that data into actions and tactics.

Moving back to business… That right to left idea is about thinking about the value you want to derive, the action required to do that, and the insights required to inform those actions, then the data that enables that insight to be generated.

Sony looked at data and customer satisfaction and wanted to reduce their range from 15 handsets down to 4. But the data showed the importance of camera technology – and many of you will now have Sony technology in the cameras in your phones – and they have built huge value for their business in that rationalisation.

BA wanted to improve check in experiences. They found business customers were frustrated at the wait, but also families didn’t feel well catered for. And they decided to trial a family check in at Heathrow – that made families happier, it streamlined business customers’ experience, and staff feedback has also been really positive. So a great example of using data to make change.

So, what questions you should be asking?

  • What are the big things that can change our business and drive value?
  • Can data analytics help?
  • How easy will it be to implement the findings?
  • How quickly can we do it?

Q1) In light of the scandal with Facebook and Cambridge Analytica, do you think that will impact people sharing their data, how their data can be used?

A1) I knew that was coming! It’s really difficult… And everyone is also looking at the impact of GDPR right now. With Facebook and LinkedIn there is an exchange there, in terms of people and their data and the service. If you didn’t have that you’d get generic broadcast advertising… So it depends if people would rather see targeted and relevant advertising. But then some of what Facebook and Cambridge Analytica have done is not so good…

Q2) How important is it for the analysts in an organisation to be able to explain analytics to a wider audience?

A2) Communication is critical, and I’d say equally important as the technical work.

Q3) What are the classic things people think they can do with data for their business, but actually is really hard and unrealistic?

A3) A few years ago I was meeting with a company, and they gave an example of when Manchester United had a bad run, and Paddy Power had put up a statue of Alex Ferguson with a “do not break glass” sign, and they asked how you can have that game changing moment. And that is really hard to do.

Q4) You started your business at your kitchen table… And now you have 120 people working for you. How do you do that growth?

A4) It’s not as hard as you think, but you have to find the right blend of raw talent with experience – lots of tricky learning.

Project Showcase

How will you make a difference? I’m going to talk about how I’ve made major change for one of Scotland’s biggest organisations. I was working for Aggreko, the leader in mobile modular power and temperature solutions. They provide power for the Olympics, the World Cup, the Superbowl… A huge range of events across the world.
We are now watching a short video on how Aggreko supplies large scale mobile power (30 MW set up in 17 days) to cover local demand in Machu Picchu when a hydroelectric plant has to be shut down for maintenance.
In the dark old days Aggreko was a reactive organisation. A customer would ring with an issue, then Aggreko would send an engineer out. And then they moved to monitoring the mobile power kit, to help monitor equipment across the world on a 24/7 basis. My team built the software to undertake that monitoring, to respond to every alert, alarm, any issue customers might face – and in fact, in many cases, to fix an issue before a customer ever became aware of it. That meant far greater reliability and efficiency. And doing that we wondered how we might be able to predict issues, to predict how equipment might fail. We didn’t know how to do that and we weren’t afraid to ask…
So we went to the Data Lab, took my idea to their board, and they funded a year long pilot to work with the University of Strathclyde and Microsoft, as well as building a team of engineers, technicians and specialists to take this forward. This was a massively smart group, but with some big egos… A lot of what I had to do was to ensure there was good collaboration across those teams. The collaboration is really what made this project a real success. We created an advanced analytics team which allowed us to put models into use, some of which could predict an issue 2 weeks ahead, and to manage those issues for our customers.
The guys at Data Lab helped me to make a difference, they were brilliant and all that help is available to you too. So what are you waiting for?  
Late payment of invoices is a huge problem for SMEs. There are various ways to resolve this, but they are not easy. There is work for the 1% of large companies, but that leaves SMEs out. And 50k SMEs go out of business every year in the UK. So, what is the solution? Well, let me tell you about Previse and what we do. We think we have a unique solution. David Brown, one of our co-founders, had experience in the sector, and he didn’t want to accept the status quo. Accounting is one of the oldest processes and datasets that a company has, but no-one is using that data in this sort of way. So what do we do?
Previse finds data, engages with data, pulls in other data… and looks at what can work. We can look at the data on every invoice from every supplier. We then determine a score, and a threshold… so that when invoices come in they can be prioritised, and mostly approved and paid immediately. The process is the same for the buyer but it makes a huge difference for the supplier. Putting an invoice through Previse, you can send and have invoices approved very swiftly, without chasing or additional work. That is a huge difference in cost and time. The large corporates we’ve been talking with – including 70% of large FTSE companies – are really enthusiastic and want us to help them.
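Mechanically, that score-and-threshold step is easy to sketch. This is a hypothetical illustration only – the talk doesn’t reveal Previse’s actual features or model:

    def score_invoice(invoice, history):
        # toy score from two assumed features: the supplier's past approval rate
        # and how the invoice size compares to their typical invoice
        approval_rate = history["approved"] / max(1, history["submitted"])
        size_ratio = min(1.0, invoice["amount"] / history["typical_amount"])
        return 0.8 * approval_rate + 0.2 * (1 - 0.5 * size_ratio)

    PAY_NOW_THRESHOLD = 0.75  # assumed cut-off

    def triage(invoice, history):
        score = score_invoice(invoice, history)
        return "pay immediately" if score >= PAY_NOW_THRESHOLD else "hold for review"

    history = {"approved": 96, "submitted": 100, "typical_amount": 1200.0}
    print(triage({"amount": 900.0}, history))  # -> pay immediately

In practice a model like this would be learned from the buyer’s accounting history rather than hand-weighted, but the gate – score each incoming invoice, pay at once above the threshold – is the idea described above.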
And our experience in Scotland has been incredible. The Data Lab helped us throughout, finding the right universities to work with. We work with Heriot Watt (Mike Chantler) and with MBN to find the right resources, and Scottish Enterprise have helped us make Scotland our hub for data science and software engineering. We’ve employed 5 people in the last 6 months, and we’ll double that by the end of the year. We can generate growth, but it’s also about making real change with data.
If SMEs are paid on time, that allows them to thrive and grow. It’s a huge problem and we think it can be resolved.
Our platform consists of four modules: sustainability; mapping; reporting and advanced. But I’ll talk about our mapping module and some projects we’ve worked on:
  • Mapping the water footprint of your crops – a project with the University of Edinburgh, funded by Data Lab. This brings together a wide range of crop data layers. We have an overlay based on water for crop growing, and overlays of gray water, or of erosion potential – for instance there is high erosion potential on the west coast of Scotland, and mostly low erosion potential in the east of Scotland.
  • Forests 2020 is a Mexican application supported by the UK Space Agency, and we work with University of Edinburgh, University of Leicester, and Carbomap. Here we can see deforestation patterns, and particular crop areas.
  • Innovate UK: farm data, which is a collaboration with Rothamsted Research, Environment Systems, and Innovate UK – this is at an early stage looking at crop rotation data for UK and export markets. And you can also see the soil you are growing on, what can be planted, what sort of fertilisers to use.
  • Sustainability risk – supports  understanding of risks such as water depletion, and the various factors impacting and shifting that.
  • We also have tools for government to plan what type of power plants they should be building, and in which locations.

So, in conclusion, layering data allows us to gain new insights and understanding.

After a good lunch and networking session we are now back in the main hall, starting with a video on the use of data in Heineken’s production process, and an introduction to Stefaan Verhulst, a Glasgow graduate now based in New York.

Data Driven Public Innovation In Partnership With The Private Sector: The Emerging Practice Of Data Collaboratives – Stefaan Verhulst, Co-founder and Chief Research and Development Officer, The Gov Lab

I’m delighted to be back in Scotland for this event looking at how data can help society, and how society can be improved. That is also the focus of The Gov Lab in New York. And we also look at how we can unleash data for good.

An example I want to give you is the earthquake in Nepal a few years ago. It was a terrible event, but it was also inspiring, because Ncell, a cell phone operator, and Flowminder (based in Sweden and the UK) worked together to map the flow of people, to intervene, to save lives. It is a great example of using data for the public good. And it’s an example of the growth of available data – including web crawling/scraping/search analysis, social media, retail data etc. – all collected by the private sector. But we also have new data science to address this data, to gain meaning from it. And often the expertise to extract that meaning is sitting in the private sector.

So, the real question is how we extract value and engage with the private sector around the data they collect. That’s a whole different ballgame from open government data. It’s not just about data sharing, but about new kinds of public-private sharing around data for the public good. So we have set up new programmes of Data Collaboratives. And we set up the Data Collaboratives Explorer, which allows you to explore the collaborations taking place – there are over 100 in there already. From that collaborative work we have gained some insights that I will share today.

So, firstly, data collaboratives are important across the policy lifecycle:

  • That starts with situation analysis. Corporations in the US have worked together to understand the scale of the opioid epidemic, for instance.
  • Our second value proposition is about knowledge creation. For instance, post hurricane season how does the mosquito population change and how does that change mosquito born diseases.
  • Our third value proposition is prediction, for instance projects to predict suicide risk from search results – a project in Canada and also in India.
  • And then we have evaluation and impact assessment. An example here is Vision Zero Labs looking at traffic safety and experiments in spatial composition to influence and reduce risk of accidents.

In those collaboratives we see different models in use. These include: data pooling – enabling sharing and analysis across the collaboration; prizes and challenges – opening some data as a source of generating new insights through innovative ideas and projects that benefit both public and private sector, e.g. BBVA’s Innova challenge; research partnerships – with collaboration across private sector and public or academic sector – such as work on fake news on Twitter; intelligence products – JP Morgan Chase has an institute to extract insights from their own data and actually that can be hugely detailed and valuable; API – for instance Zillow allows you to access real time mortgage and housing market data; trusted intermediary – for instance Dalberg who acts between telecommunications companies and others.

So, there are many ways to set up a data collaborative. But why would the private sector want to do this? Well, they may be motivated by reciprocity – sharing data may lead to access to specialist expertise; research and insights; revenue; regulatory compliance; reputation and retention of talent – often corporations need to retain talent by offering harder or more interesting problems; and responsibility.

But there are challenges too. For instance the taxi and limousine agency in New York regulates all taxi operations, including Uber. In their wisdom they shared the data… But that exposed some celebrity locations (and less salubrious locations). The harm here wasn’t huge, but that data in a different cultural context could present a much higher risk. So, some of the concerns around sharing data include:

  • privacy and security
  • generalisability and data quality (e.g. not everyone has a cell phone)
  • competitive concerns
  • cultural challenges – there is something of a culture of hoarding data within organisations.

So, to move towards data responsibility we really need risk and value assessment that recognises data as a process, and part of a wider value chain. We need fair information practices and processes – our principles are about 30 years out of date and we urgently need new principles and processes. GDPR helps, but doesn’t address all the challenges we may have. We need new methods and approaches. And that means having a decision tree across the data cycle.

There are risks in sharing data, but there are also risks in not sharing the data. If we had not used the Ncell data in Nepal, there would have been more deaths. So we have to respond not just to risks, but also to the opportunity cost of not sharing data. What is your responsibility as a corporation?

I’ve given lots of examples here… But how do we make data driven public innovation systemic? We need data stewards in organisations, so there is someone who can sign off on data collaboratives – we need that profession in place in organisations to enable work with the public sector. We need methods – like the Unicef collaboratory around childhood obesity, which is a new methodology. We also need new evidence of how data can be used and what impact it will have. And finally we need a movement – this all won’t happen without a movement to establish data collaboratives, and I’m delighted to be here today as part of this movement, to ultimately use data to improve people’s lives.


Q1) In light of Cambridge Analytica and Trump, aren’t we misusing data?

A1) I think use is part of that value chain and we have to have a debate about what kind of use we are comfortable with, and which we are not. And that case also raises questions about freedom of expression, and a need to regulate against deceptive behaviours.

Q2) Several years ago hashtags brought down governments in the Middle East, and now we have governments in those countries controlling the public through hashtags. It’s scary.

A2) I’ve been working in privacy for many years, and I really encourage a comparison of risks and value, and doing a cost-benefit analysis. We need to rebalance that.

Gillian is introducing our special guest… Minister Derek Mackay

Message from the Scottish Government – Derek Mackay, MSP, Cabinet Secretary for Finance & Constitution, the Scottish Government

I’m not sure that I’ve thought of myself as a data warrior before, but I did teach the Social Security Minister how to use Instagram the other week! I say that partly as I have an appeal and a plea for you… The First Minister has a huge set of followers on Twitter, but I’m stuck just below 18k… Maybe you are the audience to take me over that line!

There’s a lot I want to cover in terms of the excitement of this event. We have a strong reputation and record in Scotland. With responsibility for the budget and internationalisation, this is really exciting. I’m particularly enthused by the international representation including Brazil, Singapore, USA, and Ireland too. This event allows us to put the spotlight on data science in Scotland. It is a natural place for people to come and do business. And this is a great event with business leaders here, with experience to share with others.

Our government, Scottish Enterprise and Data Lab are working together to build innovation and business in Scotland. We are fortunate in Scotland to have world class data resources. Scotland has universities, 5 of which are in the top 100, and 70% of our research was rated as excellent in the last REF. We can feel this growth. Data Driven Innovation has the potential to deliver £20bn of value to Scotland in the next five years. This buzz can be harnessed to make Scotland the Data Capital of Europe. I particularly support the growth in FinTech. Many people describe themselves as disruptors – that would once have been seen as a negative but is now a real positive, about opening new opportunities. And data helps us deliver our work, one example of which is the Cancer Challenge, which is helping us understand how best to use our resources for the best outcomes.

The Scottish Government Innovation Action Plan seeks to build a sustainable economy, with skills crucial to that, including funding for business growth, innovation, etc. We’ve also launched the Scottish Digital Academy and the Data Science Accelerator to look at how things are changing, to innovate working methods – such as CivTech’s innovative models. We are really serious about business growth, the economy and skills. We have invested in innovation, education and internationalisation. We are the strongest part of the UK outside London and the South East.

So, the Scottish Government supports your enthusiasm for data, and for what can be done with data. High tech, low carbon is the future as we see it, and we want to be a country welcomed in Europe and the rest of the world – we don’t support the UK government’s view on Europe.

I commend your work and hope that you have a fruitful and enjoyable time here. And we hope the collaboration of our agencies helps to bear fruit now and in the future.

Improving Transparency In The Extractives Industry Using Data Science – Erin Akred, Lead Data Scientist, DataKind

I am a data scientist from DataKind, where we harness data for the improvement of humanity. We exist to use data to build the kind of world we want to see. The challenge we face is that many not for profits, charities, government agencies etc. do not have the resources to do the types of data science that the private sector (e.g. Netflix) can. So we link pro bono data scientists with organisations with a social mission.

Last year we did a project looking at automatically detecting mines from earth observation imagery. We are used to using this data for other purposes, but this is a challenging problem. I will come back to this, but first I wanted to say more about DataKind.

Our founder, Jake, was working at the New York Times on data science, and saw people volunteering and attending hack events at the weekend, giving back with their talents… So he thought: perhaps I could partner with a mission driven organisation, organise a similar event and make this happen… He started DataKind and we’ve been developing what we can offer these mission-driven organisations who also want to benefit from data science. So we now pair data scientists with mission driven projects. We have over 18k community members worldwide; 6 chapters in 5 countries (Bangalore, Singapore, Dublin, London, San Francisco, Washington DC); chapter applicants in 40+ global cities; 228 events worldwide; and we’ve worked on over 250 projects, generating about $20m of value in volunteer effort.

One example project has been with the Omidyar Network, to look at data science solutions that might enable social actors to operate more effectively and efficiently in their efforts to combat corruption in the extractives industry. Now, we don’t start with the data that is out there. Our funders really want impact, and we think of that as impact per dollar. So, anyway, the context of this work was illegal mining, which can cause conflict in the eastern Democratic Republic of Congo, and which brings poor environmental outcomes and social challenges. As data scientists we partner with other organisations to ensure we know how to get value out of data insights.

To understand illegal mining we have to know where it is taking place. So we did work on machine learning from images. We worked with Global Forest Watch and IPIS.

Now, not all of our projects are successful… Usually projects fail because of issues in:

  • Problem statement – a well thought through problem statement is really important.
  • Datasets
  • Data Scientists
  • Funding
  • Subject Matter Expertise
  • Social Actors

Now, I spoke to someone last night who has run lots of Kaggle projects – crowdsourced data science challenges. In those projects you have data and data scientists, but you don’t have subject matter experts – and that’s crucial knowledge to have on board. For instance, when looking at malaria there was a presumption that mosquito nets would be helpful, but hung up they look like a shrine, like death… And people don’t want to sleep in them. So they used them as fishing nets.

When we work with an organisation we do want a data set, but we also want an organisation open to seeing what the data reveals, not trying to push a particular agenda. And we have subject matter experts that add crucial context and understanding of the data, of any risks or concerns with the data as well.

We start with, e.g.:

We want to create image classification models

Using publicly available earth satellite imagery

So that those working in the transparency sector can be made aware of irregular mining activity

So that they can improve environmental and conflict issues due to mining. 

Some of the data we use is open – and a lot of the data I’ve worked with is open – but also closed data, data generated by mission-driven organisational apps, etc.

And the data scientists on these projects are at the top of their game – people these organisations could not otherwise afford to work with or recruit.

So, for this project we used a random forest classifier on the data, to find mine locations. We had generated training data for this project, and found that we can pick out where illegal mining work has occurred with good accuracy.
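In outline that pipeline is the standard supervised-learning pattern. A minimal sketch with scikit-learn, using placeholder features and labels – the real project worked from satellite image patches and partner-supplied ground truth:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # each row stands in for one image patch: e.g. band reflectances, texture, vegetation indices
    X = np.random.rand(1000, 6)         # placeholder features
    y = np.random.randint(0, 2, 1000)   # placeholder labels: 1 = mining activity

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))
    # in the real project, predictions over whole territories flag candidate mine sites for review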

To find out more and get involved – and I’d encourage you to do that – go to:


Q1) Where do you see DataKind going?

A1) We do a lot with not a lot of money. I had assumed that DataKind was 100 people when I joined, it was less than 10. I would love to see this model replicated in other countries. And conferences… Bringing volunteer data scientists together with providers enables us to increase the opportunity for these things to happen. Bringing these people together, those conferences are rich experiences that amplify the impact of what we are doing.

Q2) Can you access the mining project data online? A2) Yes – the US Federal Government is hosting the data, and we used Google Earth Engine in this work.

From Analytics To AI: Where Next For Government Use Of Data? – Eddie Copeland, Director of Government Innovation, Nesta

I’ve been talking to anyone who will listen over the last 5 years about the benefits of public sector data. We have been huge proponents of using open data, but often data has been released in a vague hope that someone else might do something with it. And we have the smart cities agenda, generating even more data that often we have no idea how to use. But there is a missing link there… The idea that public organisations should be the main consumer of their own data, for improving their own practice.

Now you’ll have read all those articles asking if data is the new “oil”, the new “fuel”, the new “soil”! I don’t much care about the analogy, but the key thing is that data is valuable. Data enables the public sector to work better, it enables many of the tried and tested ways of working better – doing more and better with less. But that’s hard to do. If I’m a public sector organisation with lots of amazing data on opportunities and challenges in my area, but not the next door area, how can I understand the bigger picture? We can target resources to the most vulnerable areas, but we need data to tell us where those are. Without visibility across different organisations/parts of the public sector (e.g. in family and child services), how can that data be used to understand appropriate support and intervention?

Why do we focus on data issues? Well, there is a technology challenge, as so many public sector organisations have different IT services. And you have outrageous private sector organisations who charge the public sector to access their own data – they should be named and shamed. Even when you get the data out, the format can be inconsistent, it’s hard to use. Then there is what we can do with the data – we often err on the side of caution, not on what is useful. Historically the main data person in public sector organisations was the “data protection officer” – the clue is in the title! It takes an organisational leap to collaborate on issues where that makes sense.

I used to work for a think tank and I got bored of that, I really wanted to be part of a “do tank”, to actually put things into action. And I found this great organisation called Nesta and we have set up the London Office of Data Analytics:

  • an impactful problem – it takes time, backing and support, so you have to have a problem that matters
  • a clearly defined intervention – what would you do differently if you had all the information you could want about the problem you want to solve (data science is not the innovation)
  • what is the information asset you would need to undertake that intervention?
  • what intervention do you need to undertake to solve that issue?

So when we looked at London, the issue that seemed to fit these criteria was unlicensed Houses of Multiple Occupancy, and how we might predict them. We asked housing officers how they identified these properties, we looked at what was already known, we looked at available information around those indicators. And then we developed machine learning to predict those unlicensed HMOs – we are now on the third version of that.

We have also worked on a North East Data Pilot to join up data across the region to better understand alcohol harms. But we didn’t know what intervention might be used, which has made this harder to generate value from.

And we are now working on the Essex Centre for Data Analytics, looking at the issue of modern slavery.

Having now worked through many of these examples, we’ve found that data is the gateway drug to better collaboration between organisations. Just getting all the different players in the room, talking about the same problem in the same way, is hugely valuable. And we see collaborations being set up across the place.

So, things we have learned:

  1. Public sector leaders need to create the space and culture for data to make a difference – there is no excuse for not analysing the data, and you’ll have staff who know that data and just need the excuse to focus and work on this.
  2. Local authorities need to be able to link their own data – place based and person based data.
  3. We need consistent legal advice across the public sector. Right now lots of organisations are all separately getting advice on GDPR when they face common issues…

So, what’s next? Nesta is an innovation organisation. There is excitement about technologies of all types. For this audience AI is probably overhyped, but nonetheless it has big potential, particularly algorithmic decision making out in the field. Policy makers talk about evidence based decision making, but AI can enable us to take that out into the field. Of course algorithms could do great things, but we also have examples that are bad… Companies hiring based on credit records is not OK. Public sector bodies not understanding algorithmic bias is not OK. For my own part I published 10 principles for a code of conduct for public sector organisations’ use of data – I’d love your feedback at

It is not OK to use AI to inform a decision if the person using it could not reasonably understand its basic objectives, function and limitations. We would face a total collapse of trust that could set us back a decade. And we’ve seen over the last week what that could mean.


Q1) Aren’t the problems you are talking about surely people problems?

A1) Public organisations are being asked to do more with less, and that makes it difficult for that time to be carved out to focus on these challenges; that’s part of why you need buy in and commitment at senior level. There is a real challenge here about finding the right people… The front line workers have so much knowledge, but you have organisations who…

Q2) On your comment that you have to understand the AI: GDPR requires a right to explanation for uses of data, and that’s very hard to do unless automated.

A2) Yes, that’s a really untested part of GDPR. If local authorities buy in data they have to understand where that data is from, what data is being used and what that means. In the HMO example local front line staff can look at those flags from the prediction and add their own knowledge of the context of, for instance, a local landlord’s prior record. But that understanding of how to use and action that data is key.

Data Driven Business. It’s Not That Hard. – Alex Depledge, Founder, Resi; Former CEO, Hassle.com

That’s a deliberately provocative title – I knew that this would be a room full of intellectuals and I’m going to bring it back down to earth. I’m known for setting up Hassle.com, and I think it’s fitting that I am following Eddie talking about the basics and the importance of getting the basics right. So many companies say they are running a data driven business, and they are not… Few are actually doing this.

I started my professional life at Accenture. I met my co-founder there. About 7 years into our friendship she emailed me and said “I’ve got it. I need a piano teacher, I’ve been Googling for four hours, we need a place to find music teachers”. And I said “that’s a rubbish idea”. And then I needed my wisteria trimmed… And we decided we wanted to build a marketplace for local services… We had a whole idea, a PowerPoint deck, and thought that, great, we’ll get a team in India or Singapore to build it… Sounded great, but nothing happened.

And then Jules quit her well paid job and she said “it’s ok, I’ve bought a book!” – and it was a Ruby on Rails book… She started coding… And she built a thing. And that led to us going through a Springboard process… We had some data but I was trying to pull in money. We were attracting some customers, but not a lot of service providers… We were driven by intuition or single conversations… So one day I said that I was quitting and going back to the day job… And I was frustrated… And a colleague said “maybe we should look at what the data says?”… And so they looked. And they found that 1 in 4 people coming to the website wanted a cleaner. And we were like “holy shit!”. Because we didn’t have any cleaners. So we threw away what we had, and we set up a three page site. We went all in so you could put a postcode in, find a cleaner, and book them. We got 27 bookings, then double that… And we raised some funding – £250k just when we desperately needed it. We found cleaners, we scaled up, we got much bigger investment. And we scaled up to 100 people.

Then we really turned into a data driven business: building what people want, trying it, checking the data, iterating. Our VC at Accel pushed us to use mobile… We weren’t convinced. We checked the data: actually people booked cleaners from their desks at lunchtime. At our pinnacle we moved 10k cleaners around London. We had to look at liquidity, and we needed cleaners to have an average of 30 hours of work per week – too few hours and cleaners weren’t happy, too many and jobs weren’t taken up. So at 31 hours we’d start recruiting.
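As a toy illustration of that liquidity rule – my own sketch, with invented numbers and structure:

```python
# Track average booked hours per cleaner per week; once the average passes
# 31, demand is outstripping supply and it is time to recruit.
from statistics import mean

TARGET_HOURS = 30       # the sweet spot: enough work to keep cleaners happy
RECRUIT_THRESHOLD = 31  # above this, jobs start going unfilled

weekly_hours = {"cleaner_1": 33, "cleaner_2": 29, "cleaner_3": 32}
avg = mean(weekly_hours.values())

if avg >= RECRUIT_THRESHOLD:
    print(f"Average {avg:.1f}h/week - start recruiting more cleaners")
else:
    print(f"Average {avg:.1f}h/week - supply and demand in balance")
```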

From there we looked at expansion and what kind of characteristics were needed. We needed cities like a donut – clients in the middle, cleaners on the outside. We grew, but then we got some unwanted attention and chose to sell. For £32 million. And the company that bought us had 80 engineers… And they migrated 16 countries onto our platform, which had been built by 8 engineers.

So, we sold our business… And I thought, I’m not going to do that again…

And then I wanted a new kitchen… So I had an architect in… spent £2,500… 45 days later I got plans… and 75 days later I had an illustration of how it would look so I could make a decision. And so I started Resi, the first online architect. And it took me just 4 months to be convinced that this could be a business. We set up a page of what we thought we might do. I spent £10 per day on Facebook A/B testing ads. And we’ve had a huge amount of business… We wanted to find the sweet spot for architects and how long the work would take. Again we needed to know how much time was needed for each customer – 3 hours is our sweet spot. Our business is now turning over £1 million a year after one year. And only one person works with data, and he also does marketing. He looked at our customers, when they converted, and how our activities overlaid. After 10 days we weren’t following up, and adding an intervention (email/text etc.) tripled our conversions.
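The follow-up analysis described really is spreadsheet-grade; a minimal Python equivalent might look like this (my illustration – the file and column names are invented):

```python
import pandas as pd

leads = pd.read_csv("leads.csv")  # assumed export: one row per enquiry

# Compare conversion rates for leads that did / did not get a follow-up
# (email or text) after day 10
rates = leads.groupby("followed_up_after_10_days")["converted"].mean()
print(rates)
# A tripling of the rate for followed-up leads would look something like:
# followed_up_after_10_days
# False    0.04
# True     0.12
```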

We’ve also been able to look at hotspots across the UK, and we can target our marketing in those areas, and also understand that word of mouth… We can take advantage of that.

I’m a total data convert. I still don’t like spreadsheets. Data informs our decisions – not quite every decision as instinct matters too. But every piece of data analysis we did was doable in a spreadsheet by someone in high school… It doesn’t take machine learning, or AI, or big data. Even simple analysis can create tremendous results.


Q1) What next?

A1) I always said I didn’t want to dine out on one story… like Hassle. But I don’t know the end for Resi yet… Invite me back in a few years!

Q2) The learning for a few hours of work was huge.

A2) Our entire business was based on a single piece of analysis – what our customers were looking for led to £32m.

The AI Race: Who’s Going To Win? – Vicky Brock (VB – chairing), CEO, Get Market Fit; Alex Depledge (AD), Founder, Resi and Former CEO, Hassle.com; Joel KO (JK), Founding CEO, Marvelstone Ventures; Chris Neumann (CN), Early Stage Investor

CN: I’m a recovering entrepreneur. As an investor I’ve had a global purview on what’s going on in the AI race. And I think it’s interesting that we see countries and areas which haven’t always been at the cutting edge of technology, really finding the opportunities here. Including Edinburgh.

JK: We are funders based in Singapore, investing in FinTech. AI technology has been on the rise… I’m hoping to invest in AI start ups and incubators.

AD: You already know who I am. In my brief hiatus between companies I was an entrepreneur in residence at Index Ventures, and I saw about 300 companies come in saying they were doing AI or Machine Learning, so I have some knowledge here. Also, knowing a leading professor in data ethics, I don’t care who wins, but I care that Pandora isn’t let out of her box until governments have a handle on this, because the risks are great.

VB: I’m a serial entrepreneur around data. And machine learning or AI can kind of be the magic words for getting investment. There is obvious hype here… Is it a disruptor?

CN: I’ve seen a lot of companies – as Alex said – claim they use ML or AI… In some ways it’s the natural progression from being data driven. I do think there will be an incredible impact on society over the next 10 years from AI. But I don’t think it will be the robots and tech of science fiction; it will probably be in more everyday ways.

VB: Is AI the key word to get funding…

JK: I see many AI start ups… but often it’s actually a FinTech start up… They present themselves that way as funders like to hear it… There is so much data… and AI does now spread into our daily lives… Entrepreneurs see AI as a way to sell themselves to investors.

VB: At one stage it was “big data”, then “AI”, but you’ve done a lot with small data… What did you see when you were entrepreneur in residence?

AD: No disrespect to investors, but they focus on financials and data, whereas I’d often be asking what was happening under the bonnet… So if they were using machine learning, ask about that, ask about data sets, ask where the data is coming from… Often they do interesting data work, but it’s a good algorithm or calculation… It’s not ML or AI. And that’s ok – that’s something I wanted to bring out in my presentation.

VB: What’s looking exciting now?

CN: We see really interesting organisations starting to do fascinating work with AI and ML. I focus on business to business work, but that often looks less exciting to others. So I am excited about an investment I’ve made in a company using blockchain to prove GDPR compliance. I spoke with a cool company here using wearables and AI to prevent heart attacks, which is really amazing.

JK: I have been here almost a week, met start ups, and they were really really practical. They have the sense to make a revenue stream from the technology. And these very new start ups have been very interesting to me personally.

VB: You’ve started your next company, did you cross lots of ideas off first…

AD: Jules and I had a list of things we wouldn’t do… Chris talked about B2B… We talked about not doing large scale or consumer ideas. We whittled our list of 35 ideas down to 4 each, and they were all B2B… But they bored us. We liked solving problems we’d experienced. My third business I hope will be B2B, as getting to £10m is a bit more straightforward than in B2C.

VB: AI requires particular skillsets… How should we be thinking about our skillsets and our talents?

CN: Eddie talked earlier about needing to know what the point is. It can be easy to get lost in the data, to geek out… and lose that focus. So Alex just asking that question, finding out who gives a damn, is really important. You have to do something worthwhile for somebody; there’s no point doing it otherwise.

JK: With AI… in ten years… we won’t be coding. AI can code itself. So my solution is that you should let your kids play outside. In Asia lots of parents send kids to coding schools… They won’t need to be engineers… Parents’ response to the trend is too early and not thought through…

AD: I totally agree. Free play and imagination and problem solving are crucial. There aren’t enough women in STEM. But you can over focus on STEM. It’s data and digital literacy from any angle – it could be UX, marketing, product management, or coding… In London we have this idea that everyone should be coding, but actually digital literacy is the skills gap we need to close. And actually that comes down to basic literacy and numeracy. It’s back to basics for me.

VB: I’d like to make a shout out for arts and social sciences graduates. We learn to ask good questions…

AD: Looking at recent work on where innovation comes from, it comes from the intersection of disciplines. That’s when super exciting stuff happens…


Q1) Mainly for Alex… I’m machine learning daft… And I love statistics. And I know the value of small scale statistics. And the value of machine learning and large scale data – not so much AI. How do you convey that to business people?

AD) We don’t have a stand out success in the UK. But with big corporates I tell them to start small… giving engineers space to play, to see what is interesting… That can yield some really interesting results. You can’t really show people stuff; you need to just try things.

VB) Are you trying to motivate people to use data in your company?

JK) Yes, with investors you see patterns… I tell kids to start start ups as early as possible… So they can fail earlier… Because failures then lead to successful businesses next time.

CN) A lot of folk won’t be aware that for many organisations there is a revenue stream around innovation… It’s a really difficult thing to bring innovative practices into big organisations, or to collaborate with them, without squishing that. There are VCs and multinationals who will charge you a lot of money to help you behave like a start up… But you can just start small and do it!

The Revolutionary World Of Data Science – Passing On That Tacit Knowledge! – Shakeel Khan, Data Science Capability Building Manager, HM Revenue & Customs

I’ve been quite fortunate in my role in that I’ve spent quite a lot of time working with both developed and developing economies around data science. There is huge enthusiasm across the world from governments. But there is also a huge fear factor around rogue players, and concerns about the singularity – machines exceeding humans’ capabilities. But there are genuine opportunities there.

I’ve been doing work in Pakistan, for DFID, where they have a huge problem with Dengue Fever. They have tracked the spread with mobile phone data, enabling them to contain it at source. That is saving lives. That’s a tremendous outcome. Closer to home, John Bell at Oxford has described AI as the saviour of our health services, as AI can enable us to run our services more effectively and more economically.

In my day job at HMRC, you can’t overestimate what the work that we do enables in terms of investment in the country and its services.

I want to talk about AI at three stages: Identify; Adopt; Innovate.

In terms of data science and what is being done around the world… The United Arab Emirates have set up a Ministry of AI and a 2031 Artificial Intelligence Strategy. We have the Alan Turing Institute looking at specific problems across many areas – some really interesting work there. In Edinburgh we have the amazing Data Lab, and the research that they are doing, for instance with cancer, and we have the University of Edinburgh Bayes Centre. Lots going on in the developed world. But what about the developing world? I’ve just come back from Rwanda, which has a new Data Revolution Policy. I watched a TED talk a few weeks back that emphasised that what is not needed in sub-Saharan Africa is help; what they need is the tools and means to do things themselves.

Rwanda is a hugely progressive country. They have more women in parliament (62.8%) than any country in the world. Their GDP is $8.3bn. They have a Data Revolution Policy. They are at the start of their journey. But they are trying to bring tacit knowledge in, to leapfrog development… Recognising the benefit of that tacit knowledge and of those face to face engagements.

For my role I am split about 50/50 between international development and work for HMRC. So I’ll say a bit more about the journey for developed economies…

Defining Data Science can be quite abstract. You have to make a benefits case, to support the vision, to share a framework and some idea of timeline, with quick wins, to build teams, to build networks. Having a framework allows organisations to build capabilities in a manageable way…

A new Data Science Centre going up in Kigali, Rwanda, will house 200 data scientists – that’s a huge commitment.

The data science strategic framework is about data; people skills; and cultural understanding and acceptance – with senior buy-in crucial for that. And identifying is also about data ethics and skills development – we have been developing frameworks for years that we can now share. For Rwanda we think we can reduce the time to develop data capabilities from maybe 5 years to perhaps 3. Similarly in Pakistan.

When you move to the adopt phase… you really need to see migration across sectors. I started my career in finance. When I came to HMRC I did a review of machine learning and how it was being used, and how that machine learning was generating benefit. We managed to bring in £29bn that would otherwise be lost, partly through machine learning. One machine learning model can, effectively, bring in tens or hundreds of millions of pounds, so they have to be well calibrated and tested. So I developed the HMRC Predictive Analytics Handbook (from June 2014), which we’ve shared across HMRC but also with DWP, and with colleagues across government.

In terms of Innovate, it is about understanding the field and the latest developments. However, HMRC are risk averse, so we want to see where innovation has worked elsewhere. I did some work with Prof David Hand at Imperial College London about 20 years ago, so I got back in touch, and we developed a programme of data science learning. It was not about Imperial providing training; it was a partnership between HMRC and Imperial. We looked closely at the curriculum, demonstrated value added, and looked at how we could innovate what we do.

University of Edinburgh Informatics is a really interesting one. I read a document a few years ago by the late Prof. Jon Oberlander about the way that the academic, public and private sectors working together could really benefit the Scottish economy. Two years of work led to a programme in natural language processing that was the result of close collaboration with HMRC. Jon Oberlander was hugely influential, passionate about conversational technology and the scourge of isolation, and able to ask lots of questions about AI, and about when it will be truly conversational. I hope to continue that work with Bayes, but also wanted to say thank you to Jon for that.

AI is increasingly touching our lives. Wherever we are in the world, sharing our tacit knowledge will be incredibly important.


Q1) Rwanda has clearly made a deep impression. What were the most surprising things?

A1) People have stereotypes about sub-Saharan Africa that just aren’t true. For instance, when you get off the plane you cannot take plastic bags in – they are an incredibly environmentally conscious country. I saw no litter anywhere in the country. The people of Rwanda are truly committed to improving the lives of people.

Q2) Do you use the same machine learning methods for low income and high income tax payers/avoiders?

A2) There are some basic machine learning methods that are consistent, but we are also looking at more novel models like boosted trees.
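For anyone unfamiliar with boosted trees, here is a minimal sketch of the technique on synthetic data – purely illustrative, and nothing to do with HMRC’s actual models or data:

```python
# Boosting fits many shallow trees in sequence, each one correcting the
# errors of the trees that came before it.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
model = GradientBoostingClassifier(n_estimators=300, max_depth=3,
                                   learning_rate=0.05, random_state=0)

# Careful calibration and testing matter when a model informs real decisions
print(cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
```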

Q3) I worked in Malawi and absolutely back up your comment about the importance of visiting. You talked about knowledge flowing from you to Rwanda; how was the knowledge exchange the other way?

A3) Great question. It wasn’t learning all going from developed to developing. We learnt a great deal from our trip, including cultural aspects. In terms of the foundations of data science, we in the UK have used machine learning in financial services and retail for 30–40 years; that isn’t really achievable in these countries at the moment, and there the learning does go from developed to developing.

Closing comments – Maggie Philbin

I’ve been reflecting on the (less serious) ways data might influence my life. My son in law is in a band (White Lies) and that has given me such an insight into how the music industry uses data – the gender and age of people who access your music, whether they will go to gigs, etc. And in fact I was very briefly in a band myself during my Swap Shop days… We made a mock-up Top of the Pops… Kids started writing in… And then BBC Records decided to put it out… We had long negotiations about contracts… But I was sure no-one would buy it… It reached number 15… So we went from parodying Top of the Pops to being on Top of the Pops. And thank you to Scotland – we made number 9 here! But I hadn’t negotiated hard – we just got 0.5%. And if we’d had that data understanding that White Lies have, who knows where we would have been.

So, day one has been great. Thank you to The Data Lab, and to all the sponsors. And now we adjourn for drinks.

Dec 13 2017

Today I’m at the IT Futures Conference 2017, an annual University of Edinburgh conference. I’m chairing a session later but I’ll otherwise be liveblogging our wonderful speakers. This is a liveblog so any corrections and additions are, of course, welcomed and encouraged. 

John Lee is introducing the day – which is being recorded – and also noting today’s hashtag, which you should definitely keep your eye on: #itfutures.

John: Today’s event is about Scaling and Transformation and there is a lot to challenge ourselves with, we hope there will be lot for us to think about and reflect upon over the Christmas break.

Our first speaker today is Melissa Terras, who recently joined us from UCL as our new Professor of Digital Cultural Heritage.

University Technology Futures: the View from a Newbie at the UoE – Professor Melissa Terras, UoE College of Arts, Humanities and Social Sciences

There are two ways to do these things: the show and tell or saying something more meaningful. I hope to do the latter today.

So, I went from studying Greek sculpture to doing hardcore machine learning in my PhD and research. I then went to UCL, where I was one of the founders of the UCL Centre for Digital Humanities, working on…

I will be directing “digital stuff” at the College of Arts, Humanities and Social Sciences, and working heavily with the Edinburgh Futures Institute, which is leading data driven innovation for the College. So, futures… There are a lot of those… So many futures initiatives and organisations, but we also face a rather uncertain future… And we will be looking at these issues at the EFI: how to deal with this uncertain future and the changing information environment. And of course the word comes from financial markets; it is speculative. When you think of the future you see speculative fiction imagining what might happen, but what does this mean for us as a University?

If I’d given this talk a few years ago it would have been quite different. The internet is changing as an environment, and it has become a less pleasant place to be over the last few years. I’ve actually done some grieving for the internet I grew up with… I’ve been online since I was 17 and a lot has changed. But let’s be more positive: what will we do to equip ourselves for this information environment?

So, let’s start with the students – those people we criticise for not being able to buy a house because they are buying too many avocados… Let’s start with ethics… I’ve been working on a project called Digital Library Futures – looking at usage stats of who borrows what, and that comes with issues of anonymity, huge ethical issues, huge data protection issues. These are the conversations we have to have with our students, to understand what we can and should do.

I’ve said it before but… all data is history. It comes with a cultural background, a societal history… We do this in historical studies all the time, but do we do this with our informatics students? We’ve been doing some work at UCL on the Times Digital Archive (1785–2010) which looks at how men and women are talked about… If you use this as a training corpus for machine learning you are embedding those biases and historical issues into that learning. Even historic information has a real impact on current computational work and approaches.
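A toy example of how such bias can be seen in a corpus – my own sketch with invented sentences, not the archive project’s actual method:

```python
# Count which words co-occur with gendered terms; a model trained on such
# text inherits these associations.
from collections import Counter

corpus = [
    "the gentleman discussed business and politics",
    "the lady attended to the household",
    "the gentleman wrote on politics",
]
male, female = {"gentleman"}, {"lady"}
male_ctx, female_ctx = Counter(), Counter()

for sentence in corpus:
    words = set(sentence.split())
    if words & male:
        male_ctx.update(words - male)
    if words & female:
        female_ctx.update(words - female)

print("words near male terms:", male_ctx.most_common(3))
print("words near female terms:", female_ctx.most_common(3))
```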

Which brings me to diversity… There is a lovely piece of nineteenth century newspaper analytics identifying images from newspapers… but only white men. There were images of women and non-white people in those papers, but the machine learning hasn’t recognised them. This is so important in how we use and train machine learning and what computational methods we use…

And then there is context and understanding what you engage with… There are the sites that let you automatically insert yourself into a range of images – without any idea of provenance or context. Or the Twitter bots that will give your profile image a smile… A huge shout out here for librarians.

What about academics? Well, all of the above! But also… we need to understand what is happening…

How locked down the digital environment is – there are things I can’t do with my desktop, and then three days later it changes. I’m working on an EU handwriting recognition project and it’s hard to install the software I’m writing. To enable data driven innovation we have to give people flexibility – if you don’t do that, people create workarounds, and that’s where security issues start to come in. We need to ensure we have the access to do this work.

The other thing I wanted to mention is Jeremy Bentham’s Panopticon… Whether through diary systems… and also lecture recording… and the change in rules so that students can record anything, and what that means for what we say… How you talk about your work changes when it is recorded. Being recorded at any time by students – what does that mean? And what does that mean for students from, say, Turkey… Anything we do can potentially be recorded at any one time. You may think that I’m being paranoid, but there have been all sorts of threats, death threats, scandal, etc. when something is broadcast and shared. How do we support staff and students if something goes wrong? We have to understand that challenge, to engage with difficult topics.

I’m a great believer in the university looking after its own data… What does the university do to archive its own websites? What can we do to best look after our own information environment – our work, our data, our web content?

So, we have a bright future ahead. But it’s a complicated future. We have to be aware of all of this; we have a role to be the place to go for truth when truth is being debated… And that’s where the Edinburgh Futures Institute comes in. We are still developing our work – keep an eye on the website. It has huge potential and a real opportunity to be a beacon of light and truth at a time when the world really needs that. And I am hugely excited to be here and in a role that can help shape that.


Q1) You talked about light and truth… What about openness… And being closed about some things… How do you provide spaces that are both open and closed and safe?

A1) I am a firm believer in Open Data and Open GLAM, but I think it’s about equipping people with the skills to understand when and how and what framework you can share under. It’s not about closing things off but about being tooled up as an individual. The Open Data and Open Science agenda tends to be about projects post-peer review when they are ready to share. I was talking with a colleague here working on the history of censorship and she isn’t on Twitter because of the abuse she’d get for her work – and that is the right decision for that context… Having those skills to decide is important.

Q2) Thinking about the GDPR coming in, as a newbie, how do you think the University is prepared, and how do staff manage their own digital environment in that context?

A2) I am on committees at Edinburgh, I was on similar at UCL, and I have sat as an external person on similar groups at Oxford. Across all universities there is a need to help staff understand the legal requirements, and the significance of them. These things are generally understood better when something goes wrong… In a way that’s the “Daily Mail” test – will what we are doing be at risk of appearing there?! But I have been cheered by what I have seen over the last few weeks here, and where the thinking is at.

Mr Stefan Hyttfors

I thought I would start by telling you about my 21 year old son, who is a university student. He lives away from home… This summer we sat down together to have this great barbecue, to talk about his plans for the summer… about what he would do for a summer job… And he said “no, I won’t get a summer job”, and that surprised us as he had lots of plans, and they require money… But he said “it’s fine! I have this crypto currency wallet”, and he had 2 bitcoin – which last summer were worth about $5,000. And I wanted to start with that… He questioned what money is – is paper money real? It’s belief; we believe it has value because it has been there for a long time… We have symbols… the dollar, the pound, the krona, the Euro… We don’t believe in the paper anymore, but we believe in the banks; we check on our phones. We don’t ever see our money as a thing… We know what they owe us, and as long as we believe in that system, it works. He said he doesn’t believe in that system – it’s dysfunctional and it will be disrupted… It is an inefficient system… “I believe in crypto currency.” And his bitcoin is now worth more like $37k, so he was right: he didn’t need a summer job.

What Melissa told us about education is right, if we want to create new citizens… We do know that in the future we have huge problems… We have climate change. We don’t know if we can cope with that yet… There are ways to change your impact: eat less meat; fly less; drive an electric car or ditch the car altogether. And there is one way to trump all of that: have fewer children! We are in this time where the best way to save the future is to stop having kids… Which is strange… Surely a better, faster idea would be suicide? Zero carbon emissions! But this is serious… We need to understand and think about how we think about the future, about what we can do… I’m in a hotel tonight, and the hotel has a sign to reuse the towels to save the planet… But the planet will be fine for millions of years… We have to think about the future of humanity, and that’s about sustainability in all senses – environment, diversity, equality… If we don’t do that we will have more division, more people scared about human futures.

And now we have the internet. The internet is a stupid network… For thousands of years we collaborated in hierarchies… Better to be part of that at any level than to be alone. But now we have a decentralised network… It’s all of us and everything, in a mess… And since we are connected in a mess and not a hierarchy, we don’t need a boss… So I have experience, and I can tell my son how to address issues in the world… But what if I’m wrong?… That means there is no boss, no teacher, who has the power to say what should happen; innovation is at the edges… Universities used to push out ideas, you had the power; companies too pushed things out. But now innovation is at the edges… There is no boss now. It’s decentralised, that’s the whole point… This is how crypto currencies are being established right now… Rather than having trust in just one bank… let’s instead trust in all of us, keeping transactions across millions of ledgers; there is no middle man, no one database to hack anymore… This couldn’t work without network effects. In any university or country we need to have scale… This took off about 10 years ago… This summer was the tenth anniversary of the launch of the first smartphone, and it’s an amazing product launch from Steve Jobs – who pointed out the then-current “smart” phones, which were all about hardware, which can’t be easily changed as the world changes… He said then that we’d fixed the issue for computers but not for phones… Well, we are still just at the beginning. Things are still changing…
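For readers who want the mechanics: a minimal sketch of the idea behind a tamper-evident ledger – my illustration, not Bitcoin’s actual protocol:

```python
# Each block embeds a hash of the previous block, so altering any past
# transaction breaks every later hash - and with copies held across many
# nodes, a single altered database stands out.
import hashlib, json

def make_block(transactions, prev_hash):
    block = {"transactions": transactions, "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

chain = [make_block(["alice pays bob 1"], prev_hash="0" * 64)]
chain.append(make_block(["bob pays carol 2"], chain[-1]["hash"]))

# Tampering with an early block is immediately detectable
chain[0]["transactions"] = ["alice pays mallory 1000"]
recomputed = hashlib.sha256(json.dumps(
    {k: chain[0][k] for k in ("transactions", "prev_hash")},
    sort_keys=True).encode()).hexdigest()
print("chain valid:", recomputed == chain[1]["prev_hash"])  # False
```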

The world is changing from hardware to software… Not just phones… From a university building to software… From products to services… This means we can’t think of the future in a linear fashion… In a corporation they talk about growth; in a country it’s GDP growth… In our lives we see our ages go up, but it’s an odd way to mark things… I might instead celebrate the years I have left to live, to keep me focused on what matters… Whatever we work on, we do everything a little bit better all the time; we compete on scalable efficiency… If we are more efficient than competitors we are safe. This is a model that is seen as best practice right now… But that applies only until we find a new way to address the issue… That is probably technology, but may well not be devices… For instance I don’t need to own a car now, I can use Uber… That’s a new technology. New stuff is new! The world changes… And that always appears in “S” curves… First it doesn’t work, we ridicule it… Then leaders are learners… That’s where we need a university to study and explore – there would be no new practice without it… Then we learn and adjust… and eventually it takes off quickly thanks to network effects.

But what if I’m the blue (steady upwards) line here… What if I don’t know how to solve the problem… When the red line crosses the blue line, the blue line is over… This is a bit like the Christmas pig in Sweden – all looks good until Christmas! Right now we have big organisations going out of business… the disruptors are our unicorn companies… You get disrupted despite doing something very very good with efficiency in mind… You get disrupted because someone finds a totally different way to solve a problem. We saw this in media – newspapers, music, film. And now we see it in retail… We see lots of large retail brands ticking along, busy, doing well… And then Amazon performing so much more successfully. Eric Hoffer says “In times of change learners inherit the earth; while the learned find themselves beautifully equipped to deal with a world that no longer exists”.

As humans we always solve our problems with technology. So in 1914 we have the Ford Model T… We see huge adoption growth, a few years of decline during the Second World War, but by 1991 we are at 91% adoption… That’s 76 years to adopt the technology… But right now the S curves are like rockets! An idea appears and it is adopted hugely fast! And we don’t need to shift products anymore, we can ship ideas… Artificial Intelligence is about creating machines that do not need to be programmed… Maybe you heard about the defeat of a Go champion by a Google algorithm. This isn’t chess – Go is a game with an astronomical number of possible variations, which has been taught from generation to generation. And that was last year; now there’s a new version of that algorithm – AlphaGo Zero – which learns the game from nothing, and in 40 days learned enough to win 100 games in a row against the previous algorithm… What AI learns from us may only slow it down…

It’s scary though! We worry “Will robots take our jobs?”, but that’s stupid. We are the creators. We solve problems with technology; we are part of technology… If you think about your day, your experience, how you think about life… Think about electricity and what would happen if you took that away, what that would mean for our lives… It’s hard to imagine that though. As Douglas Adams described it, whatever is in the world when you are born feels like it has always been there… but everything invented after the age of 35 is just not normal… We take for granted the technology we have available to us. Technology is part of us. It’s not robots or human beings; it’s still us and what we want to do with technology…

When I was growing up computers were the size of a room… It wasn’t accessible or cheap, it was a huge mainframe… Now we’ve moved to mobile, to wearables, to technology that can be embedded in us as well… Your grandkids will talk about you and think you know nothing… We will have new problems… Technology will tell us not to have another beer because it will knock 15 minutes off our life… Your insurance company may stop covering you… That’s a new problem… Maybe privacy becomes the currency in the new world…

So, as we think ahead, think about one word: dematerialisation. Digitisation means the marginal cost goes down… It goes down over time… What is the marginal cost of taking pictures now? It’s zero! But you used to have just 24 shots to use, or maybe 36… It was a bigger cost… You didn’t take lots of them… And maybe two years later you’d finish the film and send it off… Now our toddlers can take 2000 self portraits a day! We talk about healthcare in terms of unaffordability now; maybe we afford it through digitisation…

One more example we hear about is the automobile industry… Cars were complex… Now they are smaller, lighter, autonomous… We only have a driver now because the law requires a human in charge… Today you say “look at that guy, he’s texting and driving!”, but in less than 10 years’ time you’ll say “look at that guy, he’s driving!”. People are the inefficient part… 1.2 million people die in traffic accidents… We don’t know how to drive… But how do we deal with this? A traffic cop pulls up the Google Car and he doesn’t know what to do… No-one is in charge… But if we need fewer cars, we make fewer cars… That means the automobile industry will decline… We need to move from physical ownership of cars to a shared infrastructure for getting around. And that can be ok. But that won’t work when policy makers force us to stay in the past, to protect the old way of doing something…

Same with education… If you grow up in Uganda you just need access to the internet… You can take one of 250 courses at Harvard for free online… You don’t need the concrete building. It doesn’t matter how much political power you have, technology beats politics… It trumps politics and borders… Online there is no Brexit… It’s not just corporations but also individuals that have access to technology. We can solve big problems this way. That means that the issue isn’t technology but humanity… Do we want sustainability, equality, space to explore… Do we want to see GDP growth? What do we believe in as a society… We have fantastic economic growth… GDP is growing… More people on the planet than ever before have access to technology, to healthcare, to vaccines. But non-humans… oceans, forests, etc. are dying; we are clearing land to support us farming meat. We have huge air pollution issues. If we keep going on that blue line we won’t have the water, air and forests to support us, which we all depend on eventually… No matter what you believe in…

There is one thing we can all relate to… There are 7.5bn people on just one planet… No matter what business or education or purpose model we have, we have to solve our problems within that limit… Until 1986 we were just about sustainable… Right now we are using 1.6 planets’ worth of resources… We have to create much more with much less… Some things, some business models, some GDP standards have to shrink, not grow. It’s pretty clear that my son and his generation are aware of this; they see that the old model doesn’t work, that it doesn’t make them happy… We have all these things, but we don’t have happiness… That’s not my opinion, that’s the World Health Organisation’s opinion. We have a huge number of people with depression, and there were around 800,000 suicides last year. A lot of things are pointing the wrong way… This is why future generations think old models are bullshit. They see that there must be a better way to do it… Stephen Hawking says that “history teaches us what didn’t work” – we have to come up with better conclusions… If we are at this point in history when sustainability means no babies… then we clearly have to change… From an educational perspective I think it is clear: I don’t need a university, or a teacher… I need a network. Perhaps the university or the teacher can be a helpful node in this network… But it has to be about creating a better future, rather than preserving an old model.

Response – Jen Ross

What we have from Stefan is an opportunity to reflect on what we need to do as educators to consider different sorts of materiality. We have to educate not just with technology but about it. We have to see technology as deeply integrated with society and our values. This has implications for what we do as an organisation as well… How do we want students to respond to this new world? People at this university talk about the future in a lot of interesting ways, posing interesting questions… This year Sian Bayne and I led a course on digital futures, and the Near Future Teaching project is looking at what teaching of the future should be… These conversations are happening. And this organisation is already thinking about ethical issues… And I want to ask you about being creative and critical in these discussions, and who can you talk to about the ideas today?


Q1) I noticed in Stefan’s presentation a self-driving car… Am I correct in saying that a self-driving car slowed when passing two females… and is that an example of bias in the algorithm?

A1) I have an autopilot on my car… and you get used to that quickly… That makes me dangerous in my wife’s car – I forget I am in charge. What Melissa raised is important in terms of bias embedded… Maybe Alpha Go can teach us something about teaching the algorithm… Maybe we can learn something new… It’s an amazing time to be alive. Thinking about the future as a destination makes the present an obstacle… We know what the future will be like because this moment is the future…

Q2) One of the interesting things about being in this room is that people here work on systems… The internet isn’t stupid… That’s a live issue in the debate over net neutrality… That’s likely to break at some point… People have been trying to keep the network stupid, but what happens when that breaks, what happens without net neutrality?

A2) I don’t know that it has to break… But in a decentralised network there is no way to stop it… So big organisations doing things to individuals doesn’t work this way… You could only shut down blockchain by cutting power… and that’s hard to do… Most of the blockchain miners are in China, and not in the big cities… I don’t believe in paranoid scenarios where you have evil Trump, evil Google… As soon as they do something bad enough… we go somewhere else… I refer to bitcoin as it’s a really interesting example. Big banks have a business model that depends on all the big players… So how do you close down a network like Bitcoin? You could do that by paying them to opt out… but that would cost £300bn right now. I do see huge problems with protectionism, because of populism, because of inequality. We have enough stuff but we don’t share it well enough… People get scared and then we go for protectionism and nationalism… I don’t claim to have an answer…

Q3) I was meeting with union heads yesterday, and AI came up, and the potential for disruption or job losses… I’d like to hear your view on the total amount of meaningful work and jobs over time… Any thoughts on how to deal with or think about that?

A3) It’s a valid and important question. What is a meaningful job? Gallup says that only 13% of the workforce is really engaged in their role… Most people do “robot jobs”. That should mean that that opens up… As long as job loss means free time rather than our future being screwed, that’s fine… As long as people believe that we need jobs and politicians argue about creating jobs… It’s easy… Ignore technology… that will create jobs… The issue is sharing resources and the outcome… But that’s not easy… And more time means more time to think about the meaning of life. I don’t have a boss or a job as such. I’m curious, I travel, I’m essentially a student… And what I do funds my life… Lets talk about sharing resources as a problem… We have a system that has served us well… But now we are scared of missing out… That’s the thing about Trump and Brexit… People are scared… We have to realise that and address it…

And we are back from coffee for our next session… 

Replay Highlights: Lessons from lecture recording at scale – Ms Anne-Marie Scott

When the call came out for IT Futures a few months ago I knew there would be a story to share about lecture recording, now that we are 100 days into that project… And as we are first and foremost a research organisation, it is only right that we reflect on and understand what we are doing.

When we started this project I talked first with librarians – as Melissa pointed out this morning librarians are heroes – and they pointed out that lecture recording is nothing new. The technology is new but there have been many ways to record lectures… For instance notes from our archive on giving anatomy lectures – including content and stage directions to add a little theatre to proceedings.

So, it’s nothing new… We have also been using lecture recording (in a technology sense) at a modest scale for nearly a decade. It extends the range of materials already provided by online library resources, the VLE, etc. It is useful for accessibility and access, and can help ease the pressure on physical space. But more than that, EUSA sabbatical officers were elected in each of the 2012–2016 elections with lecture capture as part of their manifestos. Our students have complex lives and access is important, as is accessibility – hence our recording policies around this.

Actually we were coming to this late, which was to our advantage. It meant we could talk to other people and other organisations about how they do lecture capture and what their experiences have been. We have a three year programme; that’s partly because we are equipping 400 rooms, but it is also because we wanted to ensure we included space for critical reflection. The first phase was done in four and a half months, including procurement, which takes most universities a year to do at this scale.

We put a lot of thought into the branding and communications around the university. It has been important to work with those on the ground across schools and colleges, and our IT support officer colleagues. I think we were a little overwhelmed by the interest in automated scheduled recording actually. So we need to acknowledge and thank our colleagues across the university.

The lecture capture programme has led to some improvements in teaching spaces. We have added signage, added interactive lights, and added additional microphones – doubling up in many spaces, supporting accessibility as well as recording. And – now this is futuristic! – we added a support phone with preprogrammed hot keys. It sounds silly but it matters, and it’s about understanding the impact on real users.

One of the other things we did was to add a big button in each room with a light that makes it very clear what is or is not being recorded. And it acts as a pause button too – it means recording can be stopped at any time, to get that balance of trust and openness right.

The service was delivered by September 2017 in 138 teaching spaces. We had 246 courses requesting lecture recording, and we trained a huge number of colleagues to engage in lecture recording. This was phased, varied, and local. We thought carefully about the issues here, including privacy, copyright, etc. And the training content was in three strands – preparing, delivering, enhancing – which included face to face and webinar courses. And those do continue.

We also knew we were rolling out a lot over a short time, and we know we have support from students for this project, so we recruited a number of students to work across campuses – 28 in total were there in and around the main spaces to gently offer support and advice. We had positive feedback from staff and students, and we are now talking to the vet school about extending that work there. This is an interesting space and approach, and it works well for our students – and we pay well compared to quite a lot of student roles in the city!

Since we launched the service we’ve seen a growth in use, and we now have about 2000 things recorded each month… and that’s across the schools. Early on, the School of Engineering decided to record as standard. But Law, who’ve been using lecture capture the longest, have very high student use.

When we did begin talking about lecture recording, one of the first things we heard from staff was that you cannot record lectures for maths. You can see a fantastic series of lectures on Dark Matter on MediaHopper which show the role of the chalkboard in lectures. When we scoped our system we looked at other options… Lots of high tech tracking cameras, digital notepads… So we stepped back and thought about what would be feasible and easy to use… So we – my colleague Ewan Murray – looked at what we might be able to do so that you can walk into a room, do as normal, and have it capture properly… There are a few components that are needed. Firstly, the touchscreen interfaces have been redesigned to make them accessible – do you want slides, document viewer, or combinations? Simple controls here… Then presets of left, up, down, right… That lets you select the chalkboard to use… So you have control that is simple, and you have screens that we’ve installed giving feedback on what is being recorded… Once you’ve worked out how to use the room… that’s it. You do your thing. The system records (generally) the bottom chalkboard… It simply records what has been scheduled. And that’s enabled the School of Maths to go from sceptical to having the lecture capture system accommodate what they do… So for instance some videos capture slides, TopHat voting and the chalkboard. And, for instance, Prashant Valluri (Engineering) has had really positive feedback from his students… And I think we have to think beyond chalkboards, and extend to other writing spaces – whiteboards, writable walls, spaces in art and design… not just digital content.

So, where next?

There is a governance strand around this programme, with a Policy Group, an Academic User Group, Engagement and Impact, etc. And that Academic User Group is important. We are moving forward with expansion to another 130 spaces, and expansion of support for analogue writing surfaces. And we do need to think about information literacy and what it is (and is not) appropriate to do with these materials – our students can and do know how to capture video, but they need to be equipped to know how best to use it.

In terms of evaluation, we have five PTAS projects looking at lecture recording in different contexts, usages, and issues such as inclusion. We are also commissioning other pieces of evaluation. Dr Jill Mackay is looking at the value of lecture recording, and there is a further project on staff experience. There will be a further call coming through PTAS in March 2018, but we are open to other ideas too.

Since this is a futures event I wanted to end with some provocation… The telescope at Jodrell Bank is now starting to give students (remote) access. Lecture capture can enable that sort of access to things you couldn’t access before. So let’s think beyond chalkboards – what other creative things can we be doing with this?


Q1) In terms of brainstorming possible futures… How about capturing performances and 3D capture?

A1) That is really interesting. It also came up in MediaHopper before for sports and sports science, and assessment in that. Really interesting what we can do with that which will be useful – we don’t advocate for recording everything unless it is useful…

Q2) Are there ways to think about integrating cameras across campuses… Around presence and time shifting… and what is possible?

A2) I think that’s fascinating… And John has been looking at this with an online distance programme that is also taught on campus…

A2 – John) Lecture capture does support livestreaming… that’s one possibility for other sites and online distance students… Where students can access the course live, that’s an option, but they can also catch up later.

A2) We can also do more to connect up and continue to work with video over time, enriching it over the long term…

Q3) I know there was some work on small scale teaching – YouTute – and I was wondering if that has been considered…

A3) The 400 teaching spaces include some quite small teaching spaces… The potential for small scale teaching is there. But we are clear that this is about capturing lectures rather than seminars, and about the need to keep those more open and safe spaces for discussion. The facility will be there though, so if someone wants to do that, the barrier is just about careful thought – it might be an excellent PTAS project.

Q4) It’s just occurred to me… Has there been any insight into note taking by students, that sort of thing?

A4) We don’t have information on that. The maths PTAS project is looking at study skills, so it might come out from that. We do know from other institutions that sometimes there is less note taking; there is also…

Comment) In the previous system we had informal feedback from students along the lines of “I feel free to pay more attention rather than frantically trying to take it all down”.

A4) We do also see rewatching…

Comment) Our students seem to take similar amounts of notes over time but maybe from re-watching, not just in the lecture.

Q5) Any evidence of how lecture recording impacts how students do?

A5) That’s hard to measure, of course, because it could mean a number of different things… The inclusion PTAS project is looking at how lecture capture can support those with complex schedules and lives, etc. And our evaluation will continue post-project too, to see how that impact of lecture recording develops.

Scaling an online Masters: approaches and challenges to growth on the MSc Clinical Education – Drs Gill Aitken, Tim Fawns, Derek Jones

GA: We are going to speak about our experience of scaling an online masters, and specifically the MSc in Clinical Education, the largest programme in the College of Medicine and Veterinary Medicine at the moment. Our students for this programme are medical professionals, including medics, nurses, and increasing numbers of dentists. We follow a similar credits/exit-routes structure to most part time MScs. Our students are still working when they are studying, so they can put what we talk about into real clinical practice.

I started on the programme in 2010, Tim joined in 2014, and Derek in 2015. We’d seen fairly sustained growth in the programme until 2014, when we were inundated with applications. We thought that we’d share that experience… But that growth was, I should say, somewhat accidental. We did have a really carefully thought through programme and pedagogy; we had something simple that worked well. Our programme is concerned with what we want to achieve, and we use technology but are fairly indifferent to the technology… It’s all about the student experience, interaction and chat, so it’s how we use the technology to enable that interaction.

We have a real programme ethos, and having more staff has allowed us to step back and think about that, about what works best. We share our programme ethos on the course site with our students, and we are collaborative with our students, working with our learners to develop a vibrant academic community. We encourage and support all our students to develop their own academic voice and a critical approach – and our students see how we engage with them as important for modelling behaviour, as a form of mentoring. That aspect is something we are reflecting on and writing up just now.

In terms of where we are now, and where we have bottlenecks right now, it is around dissertation supervision, about marking, and about staff presence.

TF: The crux of those graphs Gill shared is that every person there is a real person. We are terrified of the targets we have been given; next year we need to reach 96 first years… So this is about approaches we are taking to maintain our ethos and manage the demand. For instance we have peer support sessions for dissertations – with students benefitting from each others’ experience, but that requires experienced programme staff to be there, and one-to-one dissertation supervision to complement that.

It is really important that students feel part of the community around the programme, fellow students, alumni, etc., and we also have key partnerships with NHS Scotland, with colleagues in other organisations, as well as colleagues in IAD and Digital Education. In order to maintain our integrity as we grow, we are keeping up to speed with the literature on education, as well as understanding staff and student perceptions of the course. As we are marking essays we always leave a 3-minute audio comment along with in-text comments, to make it clear a real person has engaged directly, and so that students understand the feedback specific to them. And we also link student projects with our research, which helps maintain that sense of community post-programme.

In terms of challenges for the university, we need systems that can handle noise. We have live video sessions with Adobe Connect. We use that system as you can have equal size video screens for everyone, whereas Blackboard Collaborate is too hierarchical for our ethos. But Adobe Connect maxes out at about 12 people. We have really hit a brick wall with that; it seems hard to find a way to have video that scales properly. And playback later is really problematic. Discussion boards are also problematic – they get noisy and busy, and it’s hard to build good engagement where all voices are heard…

Something that does work well is communication and oversight. We have regular meetings. We also have a supervision roundup – to make sure we keep up with all of those projects. With so many students you need an approach to do that.

We have a programme team of five including Gill, Derek, Debbie our administrator – a good administrator is absolutely critical and the students love Debbie – and the newest member of our team ?. Growth isn’t about more students, then more staff… It’s about carefully curating and managing the people, and that is key to scaling. That means recruitment is crucial, as the quality of the programme and student experience depends on the right mix of staff being in place.


Q1) We run a similar programme but it is both on campus and online. We have similar challenges around scaling… I was wondering what you think the limits to scaling might be? Our programme is digital media design; our students are practitioners… Just having more people doesn’t quite work… I am wondering where you think the boundaries are?

A1 – GA) I think we are at the boundary for us right now. Some of the comparable courses worldwide that are more like correspondence courses have hit issues because they have lost, or lack, that sense of community. We have good will, and we are able to have alumni act as supervisors for some programmes. We don’t want more than 100 starting in any given year. And if you focus too much on the technology, you lose the human part that makes these courses special…

A1 – DJ) Whilst I wouldn’t encourage the McDonalds idea of education, I do think that would be the only way to scale – to clone the approach and programme and ethos. But how you do that and retain the human aspects that are so important is a really hard question to answer.

Q2) On-campus students have lots of access to support of all kinds, not just IT. How would you factor some of those support services for students into an online course? I’m thinking of pastoral-type interactions… And can support staff take on some of that effort online?

A2 – GA) It’s an interesting question. A few years ago they introduced personal tutoring for postgrad students and online students. At one point I had 80 tutees and that’s difficult… These are professionals so they have fewer small crises, but when they have an issue it tends to be more severe, and our role is to be an interface between them and the wider university system – getting things right with regulations, taking breaks, delayed deadlines, etc. Actually I’ve never had such good relationships with my students. We have amazing people, amazing graduates, and the university could be missing a trick by not doing more with this…

Q3) I wondered about if you could expand on that… Are there things about on-campus teaching that we could do better around scaling?

A3 – GA) Yes, speaking as a mother of a student in a cohort of over 500, definitely. Our university regulations need to adapt to that. Part of what works on our programme is that these ideas can be utilised in practice, with real links being made in PGT and online. That’s partly as learners are more mature in thinking and practice.

A3 – TF) These disruptive approaches allow you to examine what you want to achieve, and how best to accomplish that… To look at other opportunities to rethink, to move beyond cramming more students in, to think differently… It’s not translating face-to-face teaching online, you have to recreate it, it has to be born digital… You rethink it from scratch!

Afternoon Keynote: Educating against psychic numbing – Prof Sian Bayne

I think I came up with my title on a bit of a down day! I will talk about psychic numbing, but I wanted to start by talking about a PTAS project we undertook on Yik Yak earlier this year. Before I start: I have tweeted all my references – do take a look and share those…

So Yik Yak was a geo-social, anonymous, mobile networking app launched in 2013. It failed in 2017. And no-one really cared. It was a hugely used platform between 2013 and 2015, including here at Edinburgh. It was a hyper-local platform and it was community moderated through up/down voting. And it was entirely anonymous – with no persistent identifiers. And we knew at Edinburgh that it was in use because of the Digital Footprint research that Louise Connelly and Nicola Osborne do. And you could go in and see what was being said (some lovely examples in Sian’s slides here). We felt that the interesting things being said on Yik Yak could tell us things about the student experience – the sort of undercurrent to the NSS.

We wanted to see how the campus was carved up as a geographical space… Most social media are non-geographical, but Yik Yak was different, so we wanted to see initially what these different spaces were like, how they were similar, how they were different. That was our intention… But it’s not quite how it worked. I’ll explain why later but I wanted to say more about our research team.

We ran our research from July 2016 to May 2017. We had computational data – topic modelling of 46k posts – we ran focus groups, we had digital footprint survey data, and we had participant observation data from our two brilliant student researchers. Our design was tight but…
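[A quick aside from me for readers unfamiliar with topic modelling: below is a minimal sketch of this kind of pipeline. It is illustrative only – the sample posts are invented, and scikit-learn’s LDA stands in for whatever tooling the project actually used.]

```python
# Minimal, illustrative topic-modelling pipeline (invented sample posts;
# scikit-learn's LDA as a stand-in -- not the project's actual code or data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [
    "when do we get our exam results back",
    "anyone else stressed about dissertation deadlines",
    "best coffee near the main library",
]  # the real study worked with ~46k anonymous posts

# Bag-of-words representation, dropping common English stopwords
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)

# Fit an LDA model; the number of topics is a tuning choice
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Show the top words for each discovered topic
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"Topic {i}: {', '.join(top)}")
```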

Yik Yak data is hard to come by… Yik Yak had received a huge amount of venture capital funding and started to decline just as our work started. Just after Brexit they introduced handles, dropping that anonymous aspect… And then in 2016 they laid off 60% of their staff… So we were researching a dying app, which is interesting in all sorts of ways.

So the developers did something wrong with Yik Yak. In August 2016 they introduced handles – they said they were focused on the hyper-local, and that anonymity was incidental… And they positioned this as adding functionality. And it was disastrous! The students came back and were appalled – very visibly on Yik Yak too. And that was at Edinburgh and globally. At that point the developers released a video saying “we messed up” and brought back anonymity in November 2016. People were happy about this. But you can see what effect the removal of anonymity had on usage at Edinburgh – a huge decline in use. And we can see this in the Yaks themselves… “Yik Yak is like the ex you get back together with because they swear they changed”. They were right to be cynical – it shut down in May 2017. It was worth $400m in 2014, and sold for $1m in 2017. And one of our student researchers wrote that no-one really cared about that shutdown.

I want to talk about why we should care – not specifically about Yik Yak, but about ephemeral anonymous spaces and the significant social value of anonymity. That anonymity was valuable to Yik Yak… But Google that platform and you will find a lot of headlines about bullying, harassment, death threats, victimisation – mainly around US campuses. The Yik Yak developers were also accused of releasing data on users who spewed abuse on the platform. But so much focus has gone on this that it has prevented proper discussion of the affordances of anonymity. Hate speech wasn’t the majority of what was going on here at Edinburgh; it was mainly student life, quite a lot of health, sex, and sexual and mental health, also some politics – Brexit, Trump, Scottish Independence… We looked for swearing and offensive language – it was only 2%, and that included some day-to-day swearing rather than offence.
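[Aside: that 2% figure comes from the project’s own analysis; as an illustration of how a simple lexicon-based screen of this kind can work, here is a minimal sketch – the word list and posts below are invented, not the study’s.]

```python
# Minimal, illustrative lexicon-based screen for offensive language
# (placeholder word list and posts -- not the study's actual materials).
import re

OFFENSIVE_TERMS = {"damn", "hell"}  # invented placeholder lexicon

def contains_offensive(post: str) -> bool:
    """True if any lexicon term appears as a whole word in the post."""
    words = set(re.findall(r"[a-z']+", post.lower()))
    return bool(words & OFFENSIVE_TERMS)

posts = [
    "when are grades out for semester two?",
    "this weather is hell",
    "anyone up for coffee after the lecture?",
]
flagged = sum(contains_offensive(p) for p in posts)
print(f"{flagged / len(posts):.0%} of posts matched the lexicon")
```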

Research on other campuses (Black et al 2016) did not reveal bullying or insults worthy of demonisation. And Saveski et al (2016) found similar in a very large sample. But the media formed a moral panic narrative around Yik Yak… Looking at the top 100 words in our data sample you can see most of these posts are about people and life…

The kinds of things being yakked about were of immense social value – when do we get proper grades from the university, are my marks normal for this stage of study – but also a lot of mental health talk. Bancroft and Reid (2016) write about anonymity being another way to engage, not automatically a problematic one. Nissenbaum (1999) talks about the value of being unreachable. We need to consider that more; we need to look beyond the toxic disinhibition narrative to the affordances of anonymity.

While the news media described Yik Yak as failing because of bullying and abuse, the reality was probably its business model, which couldn’t make anonymity pay. There is a great piece (Bachmann et al 2017) about the commodification of personal data. Venturebeat wrote about the decline in growth being about not understanding anonymity. It failed because it is difficult to commodify anonymity… Which brings me neatly onto Facebook… Kind of the polar opposite of Yik Yak. It has this compulsive visibility of the “branded self” (Goodwin et al 2016). It is prolific – 92% of students use Facebook – it’s expected, it’s ubiquitous. You have to choose to opt out of some of the social norms around this.

John Lanchester (2017) writes about Facebook as the “biggest surveillance-based enterprise in the history of mankind”, with the hugest gap between what it says it does and what it actually does as a company. You curate this self and they package and sell your data without you having any agency in that…

And that brings us to Facebook’s new research strand – the brain-computer interface… The neuroscience isn’t suggesting this is going to happen so fast but… It doesn’t stop ed tech bloggers getting on board with it… Donald Clark (2017) raves about this on his inexplicably popular blog. But in addition to the uncritical ed tech press, thankfully we also have critical voices – Ienca and Andorno (2017) talk about rights: rights to cognitive liberty, mental privacy, mental integrity, psychological continuity. And I recommend Ben Williamson’s Code Acts blog, where he writes about this stuff.

We have this “privacy paradox” – we have lots of students who know the risks of sharing data, but do it anyway. I really liked how Baruh, Secinti and Cemalcilar (2017) talk about how difficult it is to participate in effective social life and maintain privacy – the options are poor here. Yik Yak did offer an anonymous option that isn’t available in other places… The idea of an effective social life takes me to psychic numbing. I highly recommend Zuboff’s (2015) “Big Other” article on the inuring of people to the realities of being tracked, parsed, mined and monetised.

Also thinking about ephemerality, Benjamin Haber (2017), writing about SnapChat, talks about different temporalities of interaction. That ephemerality preserves communities without capturing and recording them. Indeed Anne Marie Scott, on her blog, talks about the potential value of ephemerality and time limits.

So, I want to talk about the possibility of a digital future that thinks about that ephemerality. I wanted to quote Melissa, quoting Martha Lane Fox, quoting Aaron Swartz: “It’s no longer OK not to understand how the internet works”. We don’t just need digital skills, we also need digital understanding – what it means to engage, how we might approach these things critically…

We have the idea of “Digital Redlining” and what that means in the physical and digital world to classify and bound communities in particular ways. Yu-Wei Lin is recording students talking about privacy. Egelman (2016) has created a teaching programme. We have the Digital Footprint MOOC. And Jeremy Knox and James Lamb have been running a module on Algorithmic Cultures – hacking and playing with APIs etc. Carving out space for this in the curriculum.

So I am calling for more discussion of anonymity and ephemerality.

Response – Kate Orton-Johnson

Thank you to Sian; I think those issues of criticality and anonymity are so important. I wanted to give a local, specific example. In our Sociology 1a class – a huge course – we use Mentimeter to poll students. When we ask these students how important internet privacy is to them, we see growing numbers responding “Important – but I give up some privacy for convenience”. We ask about that, and then we horrify them by unravelling their privacy, even for those who are quite cautious online.

We ask students whether they manage their data online – most think they do. Then we ask them if they have a library card, use Learn, if they read all the terms and conditions… We say, OK, you are enthusiastic about Learn and the importance of content being on there. So, as a lecturer, there is data I can see about what you access and when you access it… And I say, oh, but I can also see you as individuals: how many times you have logged onto Learn, where from, and for how long… And you get a little nervousness…

And then, using aggregate data, I can move down to individuals – who you are, all your personal data, your family, your address… And then you have the facial equivalent of weeping.
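[Aside: to make that drill-down from aggregate data to individuals concrete, here is a minimal sketch of VLE log aggregation – the columns and data are invented, not Learn’s actual export format or any real student data.]

```python
# Minimal, illustrative VLE log aggregation (invented columns and data).
import pandas as pd

logs = pd.DataFrame({
    "student_id": ["s1", "s1", "s2", "s2", "s2"],
    "login_time": pd.to_datetime([
        "2018-01-10 09:00", "2018-01-11 22:15",
        "2018-01-10 09:05", "2018-01-12 14:30", "2018-01-13 08:45",
    ]),
    "ip_address": ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.3"],
})

# Aggregate view that can then be drilled down to named individuals:
# how often, from how many locations, and how recently each student logs in.
summary = logs.groupby("student_id").agg(
    logins=("login_time", "count"),
    distinct_ips=("ip_address", "nunique"),
    last_seen=("login_time", "max"),
)
print(summary)
```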

You have this teaching tool that is immediately useful to our students… But there is a lack of knowledge, a psychic numbing to this surveillance – to how this data can be generated just by being here, and to the fact that anyone teaching could access it…

Then we poll again: they had all thought they could manage their data, and they aren’t so sure now.

Managing our data is about so much more than social media. Our digital footprints and data shadows have implications 5-10 years down the road that we don’t yet know…


Digital Education: A student perspective from two angles – Bobi Archer, VP Education, EUSA

For this presentation I talked to learners on campus and online about digital education and technology in education. So, starting with on-campus learning, the first thing that came up was lecture capture. The response to this from the student body has been very favourable. The only negatives have been around the volume of courses online, not the technology itself – which is amazing given the scale of technology put into place. There has been huge use – 120,000 views with 9% using the notes feature.

So, why is lecture capture so useful? It allows for more focus on content rather than taking notes, and the ability to rewatch topics for revision; it is beneficial for students who may be unable to attend the lecture, and good for introductory lectures and basic concepts. And it is great to see PTAS supporting projects here as well. It is particularly good for parents and carers, for our non-native English speaking students, and for students with a learning profile. BUT it is not a good option for timetabling clashes or overspill – streaming to an overflow space instead is not the experience our students need.

I also wanted to talk about “flipped” classroom teaching. That allows students to engage with the content before class; it allows for more in-depth dialogue between staff and students; and it is engaging and encourages attendance.

Top Hat has been really useful in mid-semester feedback. It is an online platform that anyone with a device can use to give feedback in real time, allowing immediate polling – it was used in flipped classroom teaching, but also for mid-semester feedback. The course organiser can respond and make follow-up actions. It is anonymised, and that is also really useful – you can submit and share something very honestly.

A comment from a rep from the Vet school noted the importance of making changes within the semester – benefiting that group of students, not only the next generation of students. That also helps students engage in feedback, they see the real benefit.

Another topic that came up was online submissions. In 2014/15 we had 0.5 million online submissions, but in 2015/16 with the new policy in place we had 1.9 million. That’s a huge jump. And actually that’s been well received – it’s more flexible, a hub for all assignments and feedback, and Turnitin can help educate students about plagiarism. It is especially helpful for students on elective courses, joint-degree students, and again students with a learning profile.

So, online distance learning. I found it so interesting to see the feedback from the online learning reps – fewer responses here but really well considered…

How to start here… MOOCs – short online courses with no entry requirements, accessible education, typically 1-2 hours of study per week for around 5 weeks, a great taster, free to access but with paid-for certificates. That’s part of the picture. But we also have online programmes. These offer accessible education to a global market, for a diverse range of students, and flexible learning – students are able to learn at their own pace. And it’s a way to be part of the Edinburgh University community. That helps widening participation students, international students, and also students with learning profiles and physical disabilities.

Three big topics came up: learning and teaching; community; and representation. The comments on quality of materials and content were really positive – really good quality content, with material available from the outset to enable more flexibility. Many of our online students work full time as well, so that is crucial. I saw loads of feedback that tutors responded quickly. In terms of recommendations there was a call for more course choices; also updating of materials, especially recordings. And they would like more individual meetings for academic support, and more focus on questions raised by other students – they wanted to know what else was raised by their peers, not just by the tutors. One rep commented that it was a really positive experience, and that discussion boards were really active where they were used.

In terms of community, students did really feel valued and part of such a high-reputation institution. They used social media as part of supporting that community, but much of that was created by students. They wanted something more permanent, more embedded in the University community… Whether Learn or something like Piazza… Something that would enable the building of a permanent community. They recognised the challenges of being a cohort – there are limits to building that feeling of community online. And actually simple things make a difference – a picture and a personalised element in communications from lecturers. They also wanted more inclusion in on-campus events – watching events later via the lecture capture system, for instance. And they also wanted to know that they could be engaging in societies and activities – online students don’t tend to be aware of these, even though there are over 2000 UoE societies and communities…

In terms of representation they felt there was good training, and that tutors were responsive to student emails, but there was mixed feedback regarding student engagement. There were challenges there. Some recommendations included some sort of online platform for communication – representation, feedback and community all being key parts here. They wanted more opportunity to voice feedback. They found it difficult to attend meetings and be in the loop – many are not in Edinburgh, which is fairly obvious for online distance students. There is an opportunity for more use of Skype or similar, especially for committees and staff-student liaison discussions. The reps did get that information, but were unsure what to do with it… We do have training for ODL students, but that needs to change and be updated… Reps felt well supported, but it can be difficult to get students to respond.

In terms of further consideration, there are some things to be considered… It’s about the provision outwith the curriculum: welcome week; student societies and activities; academic and personal support – how do personal tutors work here; promotion of services and the ability to access e.g. the Advice Place, Counselling and Disability services. The other aspect here is deadline extensions and special circumstances.

In terms of the potentials for digital education – there is so much that could happen. The Near Future Teaching project is looking at this about what the future can hold. (Cue an excellent Near Future Teaching video). 

A proposed future Digital Education Ecosystem for the University of Edinburgh – Mr Gavin McLachlan

I wanted to talk a bit about Digital Transformation. We’ve been talking about what the future might look like, but I think we also need to think about how we transform as an organisation – so we don’t end up like Kodak. There is a bit of a picture of what the future might look like. A proposal I’ve made to the University is that “every educator is a digital educator” – does that mean everything is online? There has been an awful lot of discussion of lecture capture – does that mean you use it for everything? No, it’s not appropriate to use it for everything. But what I am saying is that every lecturer understands the technology, the changes to practices and pedagogies around lecture capture, and how it might be useful to them – they understand the capabilities available for use. Similarly “every student is a digital student”. Then we get to “every university service is a digital service” – how do we make our services 24x7 services, something governments are also looking at. Then “every decision considers the available evidence” – let’s use the technology and data we have to make informed decisions… And then “everyone plans and updates their digital skills” – how do we all keep up to date, how do we plan as individuals to do this? We have lots of resources to help you do this – lots of courses etc. But all of you should be thinking about the next skill you are planning on learning, and then the next… How do you plan that… My next skill is the advanced OneNote course, by the way… And then “we need to stop wondering about the future – and start predicting and proactively engaging” – we have techniques available to us, many of them invented at this very university.

And then we also need to embrace the idea of the hyper-connected digital economy and digital community. How do we connect to our colleagues, our students, the community around us; how big is that, how active is it? Ask any analyst to assess, in a modern way, the strength of an organisation and they won’t just look at size or the number of units shipped, but also at the connections and size of the digital community. That should be a modern strength that we have as well.

There has been a lot of talk of digital disruption – we see that in a lot of contexts. We heard a lot about lecture capture; that’s fresh in our minds as it’s a disruptive technology that is new to us. How can we use this new technology in new and inventive ways? I think there will be significant disruption in the education and Higher Education market. We are going to get hit; we are already getting hit. We were one of the first universities running online distance education courses, but we haven’t scaled up like others. We have great quality courses but they haven’t grown in size as quickly. And a tiny handful of universities in the US have scaled up their learning significantly. For instance the University of Illinois has an iMBA that has gone from 250 students in 2015 to 1,500 students in 2017. How have they done that at that speed? Well, they use some automation. Georgia Institute of Technology has gone from around 100 students in their computer science masters in 2015 to 10,000 students in 2017. These numbers should worry us; things are changing.

We have a campus, and we’ve always had that. That limits competition and scaling. But online it is easier to scale, to poach students from other providers. And meanwhile we have MOOCs. Coursera has over 27 million learners. We have a huge market there. And we have a huge opportunity here, with students across the globe and QS ranking Edinburgh in the Top 10 universities in the world.

So, there is opportunity here, but also disruption and a need to address that. I think we need to create a Digital Education Ecosystem, and the strength of this is in five key components that build into a giant virtuous circle. Firstly there is Blended Learning – which we use a lot of today (lecture capture is part of this). We also have almost 70 online masters courses – of very high quality. We also need to build, I’m proposing, a digital skills component, and we have the beginnings of that – this is the non-accredited but important skills component, about how we equip students to keep up in the future, and we need to pay more attention to that. And we have a strong element of MOOCs, and free education is also critical.

Finally there is a new component that Sian Bayne, Melissa Highton, Charlie Jeffrey and I are working on, which is Distance Learning at Scale. That is not a distance education product, it is not an on-campus product, it’s about very large courses online. It won’t be the same, and should not be sold as the same, as those very high quality involved programmes. But we can use adaptive learning, partial teacher automation, automated marking with blogging… Where you can run a course at scale and online. It’s not going to be for everyone, but some people will want that product. World wide demand for education is huge, and we have to see how we should expand impact into that wider space. There is a lot of enthusiasm at University Court about this, but caution too. So we hope to introduce three pilot courses next year – more information will appear soon on that. This ecosystem is critical, the components all support and benefit each other and allow you to re-use and remix content. And financially that is not an even picture… MOOCs and Digital Skills don’t generate income really… Having them within the wider ecosystem allows you to balance those demands and support the less income generating activities through the more income generating elements. That’s how that has been working in some of those US universities who have introduced larger online courses.

We also need to think about our relationships within and around the university, particularly between employers wanting to support their staff and their education, the University, and the students. That relationship is going to be quite important. Actually there is a huge market of employed people who can and want to study online. Some of our analysis of distance learning at scale showed that the single largest group (~65-70%) of students taking universities’ distance learning programmes live within 100 miles of the university.

We can think of this as a funnel and pipeline of students and income. That starts with MOOCs, then Distance Learning at Scale courses, then Distance Learning at Scale masters, then full online masters… With the potential to promote, prequalify, and build relationships as students move through that pipeline, building trust in the quality of our education in a low-risk way – rather than asking them to either come on campus and pay, or go online and pay, or not engage at all. That’s a high-risk product to buy… and that’s what we ask students to do right now.

So we want Distance Learning at Scale using the right combination of pedagogy, quality and judicious use of approaches such as automation to enable that. Providing more choice and flexibility in when students learn and what they learn.


Q1) That’s really interesting. I’m deeply embedded in online learning. I get how you can do the teaching, and how these things line up… But there is an elephant in the room… You said thousands of students in an online masters programme… That’s SCQF Level 11 – how do you assess in the round for that, at that level?

A1) There are a number of ways to approach this. Prof Sian Bayne and colleagues are looking at assessment approaches. Automation is a reasonable fit for some subject areas, and will move faster in some areas than others. So Georgia Institute of Technology, for their £10k masters, do assessment with a high quality AI agent that electronically assesses the students’ code submissions, and can critically assess that work. There are also peer review, group review, etc. which can be used, and which help reduce the cost per student of marking and assessment.
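[Aside: the details of Georgia Tech’s assessment system weren’t given in the talk; as a minimal sketch of the general idea of automated, test-based assessment of code submissions – with an entirely invented submission and rubric – something like this:]

```python
# Minimal, illustrative test-based autograder for code submissions
# (invented submission and test cases; real systems are far richer).
import traceback

def grade(submission_fn, test_cases):
    """Run a submitted function against (args, expected) pairs."""
    passed = 0
    for args, expected in test_cases:
        try:
            if submission_fn(*args) == expected:
                passed += 1
        except Exception:
            traceback.print_exc()  # a crash counts as a failed case
    return passed / len(test_cases)

# Hypothetical student submission and marking rubric
def student_add(a, b):
    return a + b

tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
print(f"Score: {grade(student_add, tests):.0%}")
```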

Q2) Where is this going? Is the intention that we all move to this place in the next five years…?

A2) I don’t think that the current campus or online masters will flip over to distance learning at scale. I think we will still have that ecosystem in 5 to 10 years’ time. There will be three different types of products for students in different circumstances. Some courses may want to move to scale. Others may want to shift to more interaction… There will be shifts back and forth. So computer science may be available on campus and as distance learning at scale – different products for different students. We also have a new business case form and process for the Distance Learning at Scale courses, which includes market research on which courses are likely to succeed at scale. We have used that information in these business cases to make the pitch; we’ll then pick three of these to pilot.

Q2) I presume there is a heavy component of risk analysis here?

A2) Absolutely. The reputation of the university is key. What we offer has to be high quality, research-led teaching.

Closing Address – Prof Charlie Jeffery

My starting point, in thinking of new forms of higher-volume online distance learning, is to think about what we are for – this university in particular, but the sector more widely too. Discovering and disseminating knowledge through research and teaching is the heart of what we do. We’ve been doing that since 1583. When we started out we were a civic organisation with a local context. We drew inspiration from outside the UK as well as from within it. In the eighteenth century that extended to the USA; in the nineteenth century we began to attract students and other interactions with India, China and elsewhere. And in the last three or four decades we have become genuinely global. When we go to MOOC land we cover pretty much every country in the world – it’s pretty much just North Korea we don’t reach. There are maybe 3 or 4 other universities in the world with that reach for dissemination of knowledge, so we do need to think about distance learning at scale.

Another prompt for me was when I was in India, travelling with the Prime Minister on one of her first overseas missions. In doing so I met one of the Hinduja brothers and spoke about distance online education. And he cut me off, saying he’d give me 50,000 nurses – an offer for training a huge group of nurses. It was a throwaway comment but it first prompted me to think about that jump in scale. That was underlined more generally in thinking about India. India has HE participation of about 13%, and wants to be at 30% quickly. It is a very large country and it takes time to develop new HE organisations. And it’s not just India… There is a demand there that is easily evidenced, and it connects with a capability we have shown through our MOOC programme in reaching very large numbers of people.

John, in prompting me for today, asked me about the benefits of scale other than financial. That’s an interesting assumption to make about scale. There is a financial calculation there, which has something to do with how successive governments have approached immigration policy and created barriers to students in various parts of the world. If that’s the case, attracting more students online becomes important. So finance is part of it… But I don’t see it as the driving reason. The driving reason is our mission. I wanted to give an example of a course we run – at smaller scale – which is the Masters in Family Medicine, which runs in India in particular but has participation from other countries. It is aimed at medical practitioners – often with little training – to support them to deal with issues in situ… These are contexts where referral to hospital often isn’t practical; it is better to do stuff in communities with more effective primary healthcare strategies. That is noble and speaks to our mission of ensuring our knowledge is disseminated and makes a difference. So if that can be accomplished by delivery at scale, why wouldn’t we do it? And that’s why Gavin and I are working on this distance learning at scale.

For me there are four principles here. Firstly, the teaching that we do in this form has to be research-led. We are distinguished by our research excellence, and we aspire for that to be disseminated to our students, who are also increasingly co-creators of our knowledge. That’s not the approach many take. Secondly, we need to not do this at arm’s length – Liverpool is one of the big online providers but that’s all at arm’s length through a company in the Netherlands. That’s not what we are doing; it is a core part of what we do. The third principle is that this has to be high quality. We have the advantage of the Centre for Research in Digital Education and the work of Prof Sian Bayne and her colleagues. We have to strike the right balance between technology and real human interaction. Whatever we do has to be of very high quality. Finally, our work has to provide an excellent student experience. Actually our online distance students typically give us more positive feedback than our on-campus students. We need to understand that, and build on that as we develop our online courses – and we need to think about how we apply that in on-campus experiences too.

We are seeing this work as part of the normal academic role, and that’s how we’ve been engaging with schools. I see this as an opportunity, but there is a threat there too. Some places will be left behind. Online distance learning is being offered at lower levels of intellectual content, or hived off, or run by non-academic organisations… I don’t think quality there will be high. But it will be cheap, which will make it attractive to some students. And they may choose that over traditional on-campus settings. That’s a big threat to many universities. It’s not as big a threat to us – we are seen as having a quality premium. But we are also in a really strong position to do this in a rigorous, high quality way. That’s part of the legacy of Tim O’Shea, who retires soon, in advocating for and supporting colleagues to experiment. We are thus in a tremendously healthy position to place ourselves in the changing context of Higher Education, alongside smaller scale online courses, and alongside open courses too. But ONLY if it fits with our mission and culture.


Q1) One of our speakers earlier showed a slide of children in Uganda with tablets who said “they can do a degree at Harvard for nothing”. How are we placed to compete?

A1) I think we are well placed to compete. Harvard does do quite a lot online – but not usually for nothing. The University of Edinburgh is well placed to be in that position, and we are better placed than most if not any other university in the UK. I’d be quite comfortable if that becomes the norm, as long as education in that market is of high quality. We aren’t that clear on pricing… We have some indication from other universities, but at different quality levels… We will be exploring and experimenting over the next couple of years. But I don’t think anyone else in the UK is engaging in that thinking at the moment.

Q2) I liked that you picked up the threat of low quality providers… We are in a context where education is thought of as a market. We are encouraged to think of students as customers… There may be choices and pressures to protect income… How do we protect our reputation and the quality so that all can be assured of the graduates we produce.

A2) We have to be rigorous about standards and qualities. The biggest drivers of our excellent global rankings are citations, and reputational surveys of academics and employers. That says how precious reputation is, and how we can’t afford to damage that. We have a certain amount of risk appetite but we have a low appetite for risk around finance, and about reputation. So absolute rigour in the standard of what we do is critical.

Q3) Do we currently have enough teachers to do this?

A3) No. Part of the rigorous business planning process is to secure investment to allow us to deliver pilot projects which will help us go further, to seek investment. We will need additional capability to do that.

And with that we close the day. Thank you to all our speakers, participants, and the IT Futures organising committee for a stimulating day (do take a look at the tweets for lots more debate and discussion from across the sessions). 

Nov 172017

Today I am at the Scottish Government for the Digital and Information Literacy Forum 2017.

Introduction from Jenny Foreman, Scottish Government: Co-chair of community of practice with Cleo Jones (who couldn’t be here today). Welcome to the 2017 Digital and Information Literacy Forum!

Scottish Government Digital Strategy – Cat Macaulay, Head of User Research and Service Design, Scottish Government

I am really excited to speak to you today. For me libraries have never just been about books, but about information and bringing people together. At high school our library was split between a 3rd and 4th year section and a 5th and 6th year section, and from the moment I got there I was desperate to get into the 5th and 6th year section! It was about place and people and knowledge. My PhD later on was on interaction design and soundscapes, but in the context of the library and seeking information… And that morphed into a project on how journalists use information at The Scotsman – and the role of the library and the librarian in their clippings library. In Goffman’s terms it was this backstage space for journalists to rehearse their performances. There was talk of the clippings library shutting down and I argued against that, as it was more than just those clippings.

So, that’s the personal bit, but I’ll turn to the more formal bit here… I am looking forward to discussions later, particularly the panel on Fake News. Information is crucial to allowing people to meaningfully, equally and truly participate in democracy, and to be part of designing that. So digital literacy is crucial to participation in democracy. And for us in the digital directorate, it is a real priority – for reaching citizens, and for librarians and information professionals to support that access to information and participation.

We first set out a digital strategy in 2011, but we have been refreshing our strategy and putting digital at the heart of what we do. Digital is not about technology, it’s a cultural issue. We moved before from an agrarian to an industrial society, and we are now in the process of moving from an industrial to a digital society. We aim to deliver inclusive economic growth, reform public services, tackle inequalities and empower communities, and prepare people for the future workplace. Digital and information literacy are core skills for understanding the world and the future.

So our first theme is the Digital Economy. We need to stimulate innovation and investment, we need to support the digital technologies industry, and we need to increase the digital maturity of all businesses. Scotland is so dependent on small businesses and SMEs that we need our librarians and information professionals to be able to support that maturity across all businesses.

Our second theme is Data and Innovation. For data we need to increase public trust in holding data securely and using/sharing appropriately. I have a long term medical issue and the time it takes to get appointments set up, to share information between people so geographically close to each other – across the corridor. That lack of trust is core to why we still rely on letters and faxes in these contexts.

In terms of innovation, CivTech brings together public sector teams and tech start-ups to develop solutions to real problems, and to grow and expand services. We want to innovate and learn from the wider tech and social media context.

The third theme is Digital Public Services: the potential to simplify and standardise ways of working, finding common technologies/platforms built and procured once, and designing services with citizens to meet their needs. Information literacy skills and critical questioning are at the heart of this. You have to have that literacy to really understand the problems, to begin to address them, and to co-design.

The fourth theme is Connectivity. Improving superfast broadband, improving coverage in rural areas, increasing the 4G coverage.

The fifth theme is Skills. We need to build a digitally skilled nation. I spent many years in academia – no matter how much we might assume students are “digital natives”, we’ve essentially assumed that because someone can drive a car, they can build a car. We ALL need support for finding information, judging it and using it. We all need to learn and keep on learning. We also need to promote diversity – ensuring we have more disabled people, more BAME people, more women working in these areas, building these solutions… We need to promote and enhance that, to ensure everyone’s needs are reflected. Friends working in the third sector in Dundee frequently talk about the importance of libraries to their service users; libraries are crucial to supporting people with differing needs.

The sixth theme is Participation. We need to enable everybody to share in the social, economic and democratic opportunities of digital. We need to promote inclusion and participation. That means everyone participating.

And our final theme (seven) is Cyber Security. That is about the global reputation for Scotland as a secure place to work, learn and do business. That’s about security, but it is also about trust and addressing some of those issues I talked about earlier.

So, in conclusion, this is a strategy for Scotland, not just Scottish Government. We want to be a country that uses digital to maximum effect, to enable inclusion, to build the economy, to positively deliver for society. It is a living document and can grow and develop. Collective action is needed to ensure nobody is left behind; we all remain safe, secure and confident about the future. We all need to promote that information and digital literacy.

Q1) I have been involved in information literacy in schools – and I know in schools and colleges that there can be real inconsistency about how things are labelled as “information literacy”, “digital literacy”, and “digital skills”. I’m slightly concerned there is only one strand there – digital skills can be about technology skills, not information literacy.

A1) I echo what you’ve just said. I spent a year in a Life Sciences lab in a Post Doc role studying their practice. We were working on a microscopy tool… And I found that the meaning of the word “image” was understood differently by Life Scientists and Data Scientists. Common terminology really matters. And indeed semantic technologies enable us to address that in new ways. But it absolutely matters.

Q2, Kate, SCVO) We are using a digital skills framework that we developed, which I think is also really useful for framing that.

A2) I’m familiar with that work and I’d agree. Stripping away complexity and agree on common terms and approaches is a core focus of what we are doing.

Q3) We have been developing a digital skills framework for colleges and for the student lifecycle. I have been looking at the comprehensive strategy for schools and colleges from the Welsh Government… Are there plans for something similar?

A3) I know there has been work taking place but I will take that back.

Q4) I thought that the “Participation” element was the most interesting here. Information literacy is key to enabling participation… Say what you like about Donald Trump, but he has made the role of information literacy in democracy very vital and visible. Scotland is in a good place to support information literacy – there are many in this room who have done great work in this area – but it needs resourcing to support it.

A4) My team focuses on how we design digital tools and technologies so that people can use them. And we absolutely need to look at how best to support those who struggle. But it is not just about how you access digital services… How we describe these things, how we reach out to people… I remember being on a bus in Dundee and hearing a guy saying “Oh, I’ve got a Fairer Scotland Consultation leaflet… What the fuck is a Consultation?!”. I’ve had some awkward conversations with my teenage boys about Donald Trump and Fake News. I will follow up with you afterwards – I really welcome a conversation about these issues. At the moment we are designing a whole new Social Security framework – not a thing most other governments have had to do – and so we really have to understand how to make that clear.

Health Literacy Action Plan Update – Blythe Robertson, Policy Lead, Scottish Government

The skills, confidence, knowledge and understanding to interact with the health system and maintain good health is essentially what we mean by Health Literacy. Right now there is a huge focus in health policy on “the conversation” – the conversation between policy makers, practitioners, and people receiving health care. There is a model of health and care delivery called “More than Medicine” – a memorable house-shaped visual model that brings together organisational processes and arrangements, health and care professionals, etc. At the moment, though, the patient has to do at least as much as the medical professional, with hoops to jump through – as Cat talked about before…

Instructions can seem easy… But then we can all end up at different places [not blogged: an exercise with paper, folding, eyes closed].

Back when computers first emerged you needed to understand a lot more about computer languages, you had to understand how it all worked… It was complex, there was training… What happened? Well, rather than training everyone, they simplified access – with the emergence of the iPad for instance.

So, this is why we’ve been trying to address this with Making it easy: A health literacy action plan for Scotland. And there’s a lot of text… But really we have two images to sum this up… The first is a woman looking at a hurdle… We’ve tried to address this by creating a nation of hurdlers… But we think we should really remove those hurdles and let people walk through.

Some statistics for you: 43% of English working-age adults will struggle to understand instructions for calculating a childhood paracetamol dose. There is a lot bound up here… Childhood health literacy is important. Another stat: half of what a person is told is forgotten, and half of what is remembered is incorrect – so only around a quarter of what is said is retained correctly. [sources: several cited health studies which will be on Blythe’s slides]. At the heart of the issue is that a lot of information is transmitted… then you ask “Do you understand?” and of course you say “yes”, even if you don’t. So, instead, you need to check understanding… That can be as simple as rephrasing a question to e.g. “Just so I can check I’ve explained things clearly, can you tell me what you’ve understood?” or similar.

We did a demonstrator programme in NHS Tayside to test these ideas… So, for instance, if you wander into Ninewells Hospital you’ll see a huge board of signs… That board is blue and white text… There is one section in yellow and blue… That’s for Visual Impairment, because that contrast is easier to see. We have the solution but… people with visual impairment come to other areas of the hospital too. So why isn’t the whole board done in the same way, with high contrast lettering? We have the solution, why don’t we just provide it across the board? That same hospital sent out some appointment letters asking patients to comment and flag any confusion… And there were many points where that happened. For instance if you need the children’s ward… you need to know to follow signs for Paediatrics first… There isn’t a consistency of naming… or a consistency of colour. So, for instance, Maternity Triage is a sign in red… It looks scary! Colours have different implications, so that really matters. You will be anxious being in hospital – consistency can help reduce the levels of anxiety.

Letters are also confusing… They are long. Some instructions are in bold, some are small notes at the bottom… That can mean a clinic running 20 minutes late… Changing what you emphasise has a huge impact. It allows the health care provision to run more smoothly and effectively. We workshopped an example/mock-up letter with the Scottish Commission for Learning Disability. They came up with clear information and images – very clear to see what is happening, including an image of where the appointment is taking place to help you navigate, with the full address. The time is presented in several forms, including a clock face. And always offer support, even if some will not need it. Always offer that… Filling in forms and applications is scary… for all of us… There has to be contact information so that people can tell you things – when you look at why people didn’t turn up to appointments, it was that they didn’t know how to contact anyone, they didn’t know that they could change the appointment, they wanted to make contact but didn’t want to make a phone call, or even that, because they were already in for treatment, they didn’t think they needed to explain why they weren’t at their outpatients appointment.

So, a new action plan is coming called “Making it easier”. That is about sharing the learning from Making it Easy across Scotland. To embed ways to improve health literacy in policy and practice. To develop more health literacy responsive organisations and communities. Design supports and services to better meet people’s health literacy levels. And that latter point is about making services more responsive and easier to understand – frankly I’d like to put myself out of a job!

So, one area I’d like to focus on is the idea of “Connectors” – the role of the human information intermediary is fundamental. So how can we take those competencies and roll them out across the system… in ways that people can understand… Put people in contact with digital skills, the digital skills framework… promoting understanding. We need to signpost with confidence, and to have a sense that people can use this kind of information – looking at librarians as a key source of information who can help support people’s confidence.

In terms of implementation… We have (1) a product designed and (3) “scaled up”. But what is at step (2)? How do we get there… We need to think about the process differently… starting with (1) a need identified, then (2) resources planned, structured and co-developed for success, and then (3) having it embedded in the system… I want to take the barriers out of the system.

And I’m going to finish with a poem: This is bad enough by Elspeth Murray, from the launch of the cancer information reference group of the South East Scotland Cancer Network 20 January 2016.


Q1) I’m from Strathclyde, but also work with older people and was wondering how much health literacy is part of the health and social care integration?

A1) I think ultimately that integration will help, but with all that change it is challenging to signpost things clearly… But there is good commitment to work with that…

Q2) You talked about improving the information – the letters for instance – but is there work more fundamentally questioning the kind of information that goes out? It seems archaic and expensive that appointments are done through posted physical letters… Surely better to have an appointment that is in your diary, that includes the travel information/map….

A2) Absolutely, NHS Lothian are leading on some trial work in this area right now, but we are also improving those letters in the interim… It’s really about doing both things…

Cat) And we are certainly looking at online bookings, and making these processes easier, but we are working with older systems sometimes, and issues of trust as well, so there are multiple aspects to addressing that.

Q3) Some of those issues would be practically identical for educators… Teachers or lecturers, etc…

A3) I think that’s right. Research from the University of Maastricht mapped out 21 areas across the public and private sectors in which these skills should be embedded… And I think those three areas of work can be applied across those areas… We have to look at design around benefits; we have some hooks there.

Cat) Absolutely part of that design of future benefits for Scotland.

Panel Discussion – Fake News (Gillian Daly – chair; Lindsay McKrell (Strathclyde); Sean McNamara (CILIPS); Allan Lindsay (Young Scot))

Sean: CILIPS supports the library and information science community in Scotland, including professional development, skills and ethics. Some years ago “information literacy” would have been mainly a university library concern, but now it’s an issue for librarians across the board. Librarians are less gatekeepers of information and more about enabling those using their libraries to seek and understand information online: how to understand information and fake news, and how to assess the information they find, even if they are digitally confident in using the tools that give them access.

Allan: Young Scot is Scotland’s national youth information charity. We work closely with young people to help them grow and develop, and to influence us in this area. Fake News crops up a lot. A big piece of work we are involved in is the 5Rights project, which is about rights online – that isn’t just for young people but is significantly about their needs. Digital literacy is key to that. We’ve also worked on digital skills – recently with the Carnegie Trust and the Prince’s Trust. As an information agency we reach people through our website – and we ensure young people are part of creating content in that space.

Lindsay: I’d like to talk about digital literacy as well as Fake News. Digital literacy is absolutely fundamental to supporting citizens to be all that they can be. Accessing information without censorship, and a range of news, research, citizenship test information… That is all part of public library service delivery and we need to promote that more. Public libraries are navigators for a huge and growing information resource, and we work with partners in government, in the third sector, etc. And our libraries reach outside working hours and into remote areas (e.g. through mobile libraries), so we have unique value for policy makers through that range and volume of users. Libraries are also well placed to get people online – still around 20% of people are not online – and public libraries have the skills to support people to get online, gain access, and develop their digital literacy as well. We can help people find various sources of information, select between them, interpret information and compare information. We can grow that with our reading strategies, through study skills and after-school sessions. Some libraries have run sessions on fake news, but I’m not sure how well supported these have been. We are used to displaying interesting books… But why aren’t our information resources similarly well designed and displayed – local filterable resources for instance… Maybe we should do some of this at national level, not just at local council level. SLIC have done some great work; what we need now is digital information with a twist that will really empower citizens and their information literacy…

Gillian Daly: I was wondering, Allan, how do you tackle the idea of the “Digital Native”? This idea of the innate skills of young people?

Allan: It comes up all the time… This presumption that young people can just do things digitally… Some are great, but many young people don’t have all the skills they need… There are misconceptions from young people themselves about what they can and cannot do… They are on social media, they have phones… But do they have an understanding of how to behave, how to respond when things go wrong? There is a responsibility on all of us to remember that just because young people use these things doesn’t mean they understand them all. Those misconceptions apply across the board though… Adults don’t always have this stuff sorted either. It’s dangerous to make assumptions about this stuff… Much as it’s dangerous to assume that those from lower income communities are less well informed about these things, which is often not correct at all.

Lindsay: Yes, we find the same… Young people are confident with social media… but can’t attach a document, for instance…

Comment from HE org: Actually there can be learning in both directions at University. Young people come in with a totally different landscape to us… We have to have a dialogue of learning there…

Gillian: Dialogue is absolutely important… How is that being tackled here…

Sean: With school libraries, the skills that transfer from school to higher education are crucial… But schools are lacking librarians and information professionals, and that can be a barrier… It’s not just about Fake News but about wider misinformation on social media… It’s important that young people have those skills…

Comment: Fake News doesn’t happen by accident… It’s important to engage with IFLA’s guide to spotting it… But I think we have to get into the territory of why Fake News is there, why it’s being done… And there is the idea of Media and Information Literacy – UNESCO brought those ideas together a few years ago. There is a vibrant GATNO organisation, which would benefit from more Scottish participation.

Allan: We run a Digital Modern Apprenticeship at Young Scot. We work with apprentices to build skills, discernment and resilience, to understand issues of fake news and its origins. A few weeks back a young person commented on something they had seen on social media… At school, for me, “Media Studies” was derided… I think we are eating our words now… If only people had those skills and were equipped to understand the media and its creation process. And there are wider media issues… Fake News isn’t in some box of its own… We have to be able to discern mainstream news as well as “Fake News”. It’s about those skills, that confidence, and the ability to ask difficult questions to navigate through these issues…

Gillian: I read a very interesting piece by a journalist recently, looking to analyse Fake News and the background to it, the context of media working practice, etc. Really interesting.

Cat: To follow that up… I distinctly remember a piece in The Scotsman in 1994 about the number of times journalists requested clippings that were actually wrong… Once something wrong gets published, it stays there and propagates… Misquotations happen that way, for instance. That sophisticated understanding isn’t so much about right and wrong as about the truthfulness of information. In some ways Trump is doing us a favour here, and my kids are much more attuned to accuracy now…

Gillian: I think one of the scariest things is that once the myth is out, it is so hard to dispel or get rid of that…

Comment: Glasgow University has a Glasgow Media Group and they’ve looked at these things for years… One thing they published years ago, “Bad News”, looked at for instance the misrepresentation of Trade Unionists in news sources, for a multitude of complex reasons.

Sean: At a recent event we ran we had The Ferret present – fact-checking organisations, and the journalists working in those roles, reflect that shift.

Jenny: The Ferret has a wonderful fact-checking scale to rate the level of fakeness…

Gillian: Maybe we need to recruit some journalists to the Digital and Information Literacy Forum.

And on that, with many nods of agreement, we are breaking for lunch.

Information Literacy & Syrian New Scots – Dr Konstantina Martzoukou, Postgraduate Programme Leader, Robert Gordon University

This project was supposed to be a scoping study of Syrian New Scots – Syrian refugees coming to Scotland. The background to this is the Syrian Civil War, ongoing since 2011, which has led to an enormous number of refugees, mostly hosted in the surrounding region. Most research has been on asylum seekers in the camps near Syria, focusing on basic survival and human rights, on their needs and how to respond to them. The aim of this project was different: a scoping study to examine the information-related experiences and information literacy practices of Syrian New Scots during their resettlement and integration. So this is quite different, as the context is relatively settled, and it is about that resettlement process.

In September 2015 the Prime Minister announced an expansion of the refugee programme to take up to 20,000 Syrian refugees by 2020. The first place Syrian refugees came to in Scotland was Glasgow. Now, there have been a lot of changes since then, but the intent to resettle those refugees by 2020 remains.

Primary research was done with 3 refugee resettlement officers, as well as focus groups with Syrian New Scots. These groups were in both urban (1 group) and rural (2 groups) locations, and included 38 people from across Syria who had been in camps in Lebanon, Turkey, Iraq and Jordan. I didn’t know what to expect – these people had seen the worst horrors of war. In reality the focus groups were sometimes loud and animated, sometimes quiet and sad. And the participants came from a huge range of professional backgrounds, though most of the women did not work.

So, the areas our work looked at included English language and community integration; information provision, cultural differences and previous experiences; and financial security. Today I want to focus on libraries and the role of libraries.

One of the most crucial aspects was language barriers and sociocultural differences. The refugees were given ESOL classes; a welcome pack with key information for finding the resources in their neighbourhood; a 24 hour Arabic hotline, set up with the mosque for emergencies so that families could receive help outside core working hours; and in-house translation services. But one challenge across all the support given was literacy as a whole – not all of the refugees could read and write in any language. It was also about understanding interchangeable words – “doctor” has a meaning but “GP” not so much. And there was a perception that learning English would be really difficult.

The refugees wanted to know how to learn English, and they were anxious about that. The support officers had different approaches. The ESOL classes were there, but some officers were really proactive, taking refugees to the train station, having mock job interviews… That was really really valuable. But some groups, even a year after arriving, weren’t speaking English. But sometimes that was about the families… Some were confident and really well travelled, but some had lived in one place, and not travelled, and found communication and networking much more difficult. So the language learning was very tied to socio-cultural background.

Many of these families have complex health needs – they were often hand picked to come here because of this – and that brings its own challenges. Some had no experience of recycling or of how to correctly put their bins out. One mother felt the open plan kitchen was unsafe – her child had been burned because of it. One man reported a neighbour telling him not to play with his son outside – the boundaries of danger and expectations of childhood were rather different from those of their new neighbours. Doctors’ appointments were confusing. Not having change for the bus was expensive – people bought something unneeded because the buses don’t give change. Many wanted family reunion information and support.

Technology is used, but technology is not the key source of information. They used mobile phones with pay as you go SIM cards. They used WhatsApp, and were sharing quite traumatic memories and news in this way.

The library is there… But actually libraries are perceived as being for books, and many refugees don’t go there. Community classes, meals etc. may be better. Computer classes can be useful, especially when refugees can participate in a meaningful way. And there are real challenges here – computer classes in the library didn’t work for this group, as there were too few computers and the internet connections were too slow.

For me the key thing is that we need to position the library as a key place for communication, learning and support for the families.

Q1) Alamal(?) is running events in our libraries – we have an event with films and telling their story – and we have had huge interest in that.

A1) We really want to showcase that to the wider community. There are some great examples from England, and from other EU countries, but we want more Scottish examples so do please get in touch.

A User Study Investigating the Information Literacy of Scottish Teenagers – David Brazier, Research Assistant, Northumbria University

This is an ILG funded project looking at the information literacy of Scottish teenagers. I’ll introduce the concepts, go through some related work, and then outline the methodology we’d like to implement. So, information literacy is about the ability to seek, understand and assess information. These skills are crucial to our modern society, and we need to empower students to develop them so they can integrate themselves into it.

As the panel discussed earlier, the idea of the “Digital Native” is misleading. Young people often have a poor understanding of their information needs, which leads to them taking the top-ranked documents/sites and citing those. That needs to be counteracted early in their learning so that it doesn’t carry through into university (Rowlands 2008). In recent research (Brazier and Harvey 2017), ESOL postgraduates were unable to assess their own performance accurately, often judging it high when the opposite was true. In the “Not Without Me” report this inability to self-assess was also highlighted across a wider range of young people. These groups are highly educated, so they should be able to be more reflective about their own practice.

So, in our research, we are using a mixed methods approach: a quantitative analysis of secondary school-aged children’s information gathering behaviour, triangulated with qualitative assessments of the participants’ own self-evaluations, around a simulated work task.

The search system is based on the TREC AQUAINT collection – a large set of over a million documents from three large news agencies, collected between 1996 and 2000 – with pre-defined search topics associated with the project. The initial 15 topics were reduced down to 4 topics selected by school representatives (a librarian and 2 teachers from Gracemount High School in Edinburgh).

So, we start with a pre-task questionnaire. The search task is “Tropical storms: What tropical storms (hurricanes and typhoons) have caused significant property damage and loss of life?”. The students can then search through a Google-style interface over the documents, and click on those sources that seem relevant. Then they get a questionnaire to reflect on what they’ve done.

A pilot was conducted in December 2016. Tasks were randomly selected, using a Latin Square design to ensure no 2 students had the same two tasks. In total 19 students were involved, from S3 (13-14 years old). The study was on PCs rather than handheld devices. No other demographic data was collected. The school representative did provide a (new) unique id to match the task and the questionnaires. The id was known only to the school rep. No further personal data was taken.
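For readers unfamiliar with the design: a Latin square rotates task order across participant groups so that no task is systematically advantaged by always coming first. A minimal sketch of how such a rotation can be generated – the topic names here are placeholders, not the study’s actual materials:

    # Cyclic Latin square: each task appears once in every row (participant
    # group) and once in every column (position in the session).
    def latin_square(n):
        return [[(i + j) % n for j in range(n)] for i in range(n)]

    tasks = ["tropical storms", "topic 2", "topic 3", "topic 4"]  # placeholders
    for group, row in enumerate(latin_square(len(tasks)), start=1):
        print(f"group {group}:", [tasks[k] for k in row])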

We could then look at the queries each student submitted, and were able to ask why they did that and why they selected the article they did.

This is a work in progress… We are interested in how they engage with the study as a whole. We have used the findings of the pilot to adapt the study design and interface, including relocating the task description to a more prominent location, and adding a physical instruction sheet covering e.g. the browser page and how to interpret the interface.

The main study takes place next week, with 100 students (none of whom were part of the pilot). From this we want to get recommendations and guidelines for IL teaching; to inform professional practice; feedback to participants (pamphlet) for reflective purposes; academic publications in the field of information literacy, information retrieval, education and pedagogy.


Q1) Presumably students would normally use other places to search, ask friends, etc. So I wondered why you selected such a controlled space like this.

A1) In a previous study we allowed students to look anywhere on the web… But it is much harder to judge relevance in that setting – these documents have already been judged for relevance. An open search adds complexity to the whole process… And someone has to transcribe and mark that footage – for my study there were 29 students and it took 7 months. For 100 students that’s just too large. A test collection is also standardised and replicable.

The Digital Footprint MOOC – Nicola Osborne, Digital Education Manager, EDINA

This was me… No notes but slides to follow. 

Wikipedia & Information Literacy: the importance of reliable sources – Sara Thomas, Wikimedian in Residence, SLIC

Hi, I’m Wikimedian in Residence at SLIC. The role of a Wikimedian in residence is to work with cultural heritage organisations and Wikimedia and bring the two together. In this role we are working with three local libraries right now but we will be expanding it to a wider Scottish context.

I am passionate about open knowledge and open data. Open data and open knowledge leads to better society, it allows us to make better decisions – I am sick of us being asked to make big decisions about big issues without appropriate information.

Now, I want to introduce you to Bassel Khartabil, who was an open source software developer and advocate for open data and knowledge. Knowledge is power… He was detained by the Syrian government and, before he was killed, he wrote very movingly about the impact of open knowledge – that it is so important as to be a matter of life and death in some contexts.

I want to talk about the production of knowledge and what that can teach us about information literacy. Jim Groom, at #OER16, said “Wikipedia is the single greatest Open Education Resource the world has ever known”, and he’s not wrong. Wikipedia is more accurate than you may think. There are groups who just edit and work on improving the quality of articles. Women in Red is a group dedicated to having more women’s biographies on Wikipedia. 17% of biographies are of women now – 2 percentage points more than was the case 2 years ago – and they also work on bringing those biographies up to “featured article” quality.

There is a quality and ratings scale. Vandalism is picked up quickly – by bots and by people. Wikipedia is neutral in its point of view. Nature, in 2005, found that Wikipedia was nearly as accurate as Britannica (2.92 errors per article for Britannica compared to 3.86 for Wikipedia). The Journal of Clinical Oncology, in 2010, found Wikipedia as accurate as Physician Data Query (a premium database). The medical readership is huge – 80% of medical students will use it; ~50% of GPs will use it as a first point in their search. It is the most popular health resource on the web.

Wikipedia is generally the seventh most popular site on the internet. And we have basic notability guidance, which means an article must be notable – there must be a reason for it being there. The information must be verifiable – it must come from credible, checkable sources. And we have to use reliable third party published sources with a reputation for fact checking and accuracy.

On the subject of media literacy… The Daily Mail didn’t like that Wikipedia doesn’t treat it as reliable – there is no ban, but you will get a prompt asking whether that’s the right source. Brilliantly, they included loads of errors in their own outraged article.

Manipulation is really obvious… The community spots when people are trying to whitewash their own biographies, to promote their company, to try to remove claims of misconduct. And Wikipedia gets it – there is an article on “Wikipedia is not a credible source” – we get it. We are a starting point, a jumping off and discovery point. And in fact we have Wiki Ed, which works to combat fake news and to support information literacy. If you want to teach information literacy, Wikipedia can help you. We have a Wiki Education Dashboard – mainly used in the US, but with lots in the UK. Our guides include Instructor Basics and Case Studies for using Wikipedia in teaching. Some lovely projects there…

I did some work with Chris Harlow, at University of Edinburgh, a few years ago… He would find a medical term that wasn’t in Wikipedia, give students guidance on how to create a Wikipedia page, teach them how to use a medical database, and send them away to write a section in simple language… Then we showed them how to edit an article – it’s really easy to edit an article now… The students write their section, put it in, and the page goes live… Five minutes later it’s on the front page of Google. It is gratifying to find work so immediately valued, used and useful.

Translation studies at UoE also use Wikipedia in the classroom. Queen Mary University of London use Wikipedia in their film classes – they trialled it, and it’s now a compulsory part of the programme. It’s a way to teach digital skills and information synthesis. Imperial College London are working to engage undergraduate students in synthesising and sharing knowledge. And Greg Singh at Stirling University uses WikiBooks – a project that seeks to create collaboratively produced textbooks – having students produce a textbook, or a chapter, on what they’ve been doing… It’s about developing collaboration, tracking it, and instilling it in students…

So I have a video here of Aine Kavanagh from Reproductive Biology at the University of Edinburgh, who authored an article that has been read 20,000 times in the last year. Aine was looking for some extra work and wanted to develop her skills. She asked Chris (Harlow) what she could do… She wrote about one of the most common sorts of cancer, about which there was very little information. To be able to see the value and the impact of that work has been hugely gratifying for her.

To conclude: open knowledge is important, open knowledge gives us a better society, not just being able to find this information but also be able to produce that knowledge is hugely powerful. And Wikipedia is more accurate than you think!


Gillian: I just want to thank all of our speakers, to thank all of you for coming, and to thank the Scottish Government for hosting us.

Oct 042017

This afternoon I’m at the Keynote Session for Information Security Awareness Week 2017, where I’ll be speaking about Managing Your Digital Footprint in the context of security. I’ll be liveblogging the other keynotes this afternoon.

The event has begun with a brief introduction from Alistair Fenemore, UoE’s Chief Information Security Officer, and from his colleague David Creighton Offord, the organiser for today’s event.

Talk by John Whitehouse, PWC Cyber Security Director Scotland covering the state of the nation and the changing face of Cyber Threat

I work at PWC, working with different firms who are dealing with information security and cyber security. In my previous life I was at Standard Life. I’ve seen all sorts of security issues so I’m going to talk about some of the things I’ve seen, trends, I’ll explain a few key concepts here.

So, what is cybersecurity… People imagine people in basements with balaclavas… But it’s not that at all…

I have a video here…

(this is a Jimmy Kimmel comedy segment on the Sony hack where they ask people for their passwords, to tell them whether they are strong enough… And how they construct them… And/or the personal information they use to construct them…)


We do a lot of introductions for boards… We talk about technical stuff… But they laugh at that video and then you point out that these could all be people working in their companies…

So, there is technical stuff here, but some of the security issues are simple.

We see huge growth due to technology, and that speaks to businesses. We are going to see 1 billion connected devices by 2020, and that could go really, really wrong…

There is real concern about cyber security, and businesses have concerns about areas including cloud computing. The Internet of Things is also a concern – one study found that the average connected device has 25 security vulnerabilities. Dick Cheney had to have his pacemaker reprogrammed because it was vulnerable to hacking via Bluetooth. An NHS hospital in England had to pause a heart surgery when its software restarted. We have hotel rooms accessible via phones – that will come to homes… There are vulnerabilities in connected pet feeders, for instance.

Social media is used widely now… In the TalkTalk breach, news of the breach leaked via speculation just 20 seconds after it occurred – that’s a big challenge to business continuity planning, where one used to assume a window of perhaps a day.

Big data is coming with regulations and threats… Equifax lost over 140 million records – and executives dumped significant stock before the news went public, which brings a different sort of scrutiny.

Morrisons were sued by their employees for data leaked by an annoyed member of staff – I predict that big data loss could be the new PPI, as mass claims for data loss take place. Maybe £1,000 per customer per data breach… We run a threat intelligence service by looking on the dark net for breached data. And we already see interest in that type of PPI-style class action approach.

The cyber challenge extends beyond the enterprise – onshore, offshore; 1st through to 4th parties. We’ve done work digging into technology components and where they are from… It’s a nightmare to know who all your third parties are, and a real challenge to address.

So, who should you be worried about? Threat actors vary… We have accidental loss, malware that is not targeted, and hacker hobbyists at the lowest level of sophistication, through to state sponsored attacks at the highest level. Sony were allegedly breached by North Korea – that firm spends astronomical amounts on security and even that isn’t totally robust. Target lost 100 million credit card details through a third party air conditioning firm, which a hacker used to get into the network – that’s how the loss occurred. And when we talk organised crime we are talking about really organised crime… One of the Ukrainian organised crime groups was offering a Ferrari as an employee-of-the-month prize for malware. We are talking seriously organised, and serious financial gain. And it is extremely hard to trace that money once it’s gone. And we see breaches going on and on and on…

Equifax is a really interesting one. There are 23 class action suits already around that one, and that’s the tip of the iceberg. There has been a lot of talk of big organisations going under because of cyber security, and when you see these numbers for different companies, that looks increasingly likely. Major attacks lead to real drops in share prices and real impacts on the economy. And there are tangible and intangible costs to any attack… From investigation and remediation through to CEOs and CTOs losing their jobs or facing prison time – at that level you can be personally liable in the event of an attack.

In terms of the trends… 99% of exploited vulnerabilities (in 2014) had been identified for more than a year, some as far back as 1999. Wannacry was one of these – firms had 2 months’ notice and the issues still weren’t addressed by many organisations.

When we go in after a breach, typically the breach has been taking place for 200 days already – and that’s the breaches we find. That means the attacker has had access and has been able to explore the system for that long. This is very real and firms are dealing with this well and really badly – some real variance.

One example – the most successful bank robbery of all time – was the attack on the Bangladesh Central Bank in Feb 2016 through the SWIFT network. The fraudulent instructions totalled over US $900 million, mostly laundered through casinos in Macau. The analysis identified that the malware was tailored to the target organisation, based on the printers they were using, and scrubbed all entry and exit points in the bank. The US Secret Service found that there were three groups involved – two inside the bank, one outside executing the attack.

Cyber security concerns are being raised, but how can we address this as organisations? How do we invest in the right ways? What risk is acceptable? One challenge for banks is that they are being asked to use fintechs and SMEs working in technology… But some of these startups are very small, and that’s a real concern for heads of security in banks.

We do a global annual survey on security, across about 10,000 people. We ask about the source of compromise – current employees are the biggest by some distance. And current customer data, as well as IPR, tend to be the data most at risk. We also see Health and Social Care adopting more technology, and having high concern, but spending very little to counter the risks. So, with Wannacry, the NHS were not well set up to cope and the press loved the story… But they weren’t the target in any way.

A few Mythbusters for you…

Anti-Virus software… We create Malware to test our clients’ set up. We write malware that avoids AVs. Only 10-15% of malware will be caught with Anti-Virus software. There is an open source tool, Veil-Framework, that teaches you how to write that sort of Malware so that you can understand the risks. You should be using AV, but you have to be aware that malware goes beyond that (and impacts Macs too)… There is a malware SaaS business model on the darknet – as an attacker you’ll get a guarantee for your malware’s success and support to use it!

Myth 2: we still have time to react. Well, no, the lag from discovery to impacting you and your set up can be minutes.

Myth 3: well, it must have been a zero day that got us! True zero day exploits are extremely rare and valuable. An attacker won’t use one unless the target is very high value and they have no other option – they are hard to use. Even the NSA admits that persistence is key to successful compromise, not zero day exploits. The NSA created EternalBlue – a zero day exploit – and that was itself stolen and leaked, and ended up being used in Wannacry.

Passwords… They are a thing of the past, I think. 2-factor authentication is more where we are at. Passphrases and the strength of passphrases are key – complex strings with a number and a site name at the end are recommended these days. Changing every 30 days isn’t that useful – a weak password is easily brute-forced if the hashes are lost – much better to have a really strong passphrase in the first place.

Phishing email is huge. We think about 80% of cyber attacks start that way. Beware spoofed addresses, or extremely small changes to email addresses.

We had a client that had an email from their “finance director” about urgently paying money to an account, which was only spotted because someone in finance noticed the phrasing… “the chief exec never says “Thanks”!”

Malware trends: our strong view is that you should never, ever pay for a ransomware attack.

I have another video here…

(In this video we have people having their “mind read” for some TV show… It was uncanny… And included spending data… But it wasn’t psychic… It was data that they had looked up and discovered online… )


It’s not a nice video… This is absolutely real… This whole digital footprint. We run a service called Digital Footprinting for senior execs in companies, and you have to be careful about it, as they can give so much away by what they and those around them post… It’s only getting worse and more pointed. There are threat groups going for higher value targets, looking for disruption. We think that the Internet of Things will open up the attack surface in whole new ways… And NATS – the air traffic people – are thinking about drones and the issues there around fences and airspace… How do you prepare for this? Take the connected home… These fridges are insecure: you can detect whether the fridge has been opened, and so whether the owner is at home or not… The nature of threats is changing so much…

In terms of trends, the attacks are moving up the value chain… Retail bank clients aren’t interesting compared to banks’ finance systems, exchanges or clearing houses. It’s about the value of data… Data is maybe $0.50 for email credentials; a driving licence is maybe $25… and upwards the price goes depending on value to the attackers…

So, a checklist for you and your work: (missed this but delighted that digital footprint was item 1)

Finally, go have a look at your phone and how much data is being captured about you… Check your iPhone’s Frequent Locations. On Android, check Google Location History. The two biggest companies in the world, Google and Facebook, are free, and they are free because of all the data that they have about you… But the terms of service… PayPal’s are longer than Hamlet. If you have a voice controlled TV from Samsung and you accept those terms, you agree to always-on listening, sharable with third parties…

So, that’s me… Hopefully that gave you something to ponder!


Q1) What does PWC think about Deloitte’s recent attack?

A1) Every firm faces these threats, and we are attacked all the time… We get everything thrown at us… And we try to control those but we are all at risk…

Q2) What’s your opinion on cyber security insurance?

A2) I think there is a massive misunderstanding in the market about what it is… Some policies just cover recovery, getting a response firm in… When you look at Equifax, what would that cover… That will put insurers out of business. I think we’ll see government backed insurance for things like that, with clarity about what is included, and what is out of scope. So, if, say, SQL Injection is the cause, that’s probably negligence and out of scope…

Q3) What role should government have in protecting private industry?

A3) The National Cyber Security Centre is making some excellent progress on this. Backing for that is pretty positive. All of my clients are engaging and engaged with them. It has to be at that level – it’s too difficult now at lower levels… We do work with GCHQ, sharing information on upcoming threats… Some of those are state sponsored… They even follow working hours in their source location… Essentially there are attack firms…

Q4) (I’m afraid I missed this question)

A4) I think Microsoft in the last year have transformed their view… My honest view is that clients should be on Windows 10 – it’s a gamechanger for security. Firms will do analysis on patches and service impacts… But they delayed that a bit long. I have worked at a firm with a massively complex infrastructure: it sounds easy to patch, but it can be quite difficult in practice, and it can put big operational systems at risk. As a multinational bank, for instance, you might be rolling out to huge numbers of machines and applications.

Talk by Kami Vaniea (University of Edinburgh) covering common misconceptions around Information Security and how to avoid them

My research is on the usability of security and why some failings are happening from the point of view of an average citizen. I do talks to community groups – so this presentation is a mixture of that sort of content and proper security discussion.

I wanted to start with misconceptions we hold as system administrators… So I have a graph here of password strength: the range where there is value in improving your password, the range in which rate limits on password attempts are what protect you, and the small area where extra strength actually benefits the user. Outside those benefits you are in the dead zone.

OK, a quick question about URL construction… Is it Facebook’s website, Facebook’s mobile site, AT&T’s website, or Mobile’s website? It’s the last one by construction – and it’s both of the last two if you know the domain is owned by AT&T. When you ask a big audience they mainly get it right, but in our study only 8% could correctly differentiate the two URLs. Many users tend to just pick a big company name, regardless of where it sits in the URL. Only a few know how to correctly read subdomain URLs. We did this study on Amazon Mechanical Turk – so that’s a skewed sample of more technical people. And that URL understanding has hugely problematic implications for phishing email.
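The rule being tested here is that hostnames read right to left: the registrable domain sits at the end, and everything to its left is a subdomain the owner can set freely. A rough sketch of that logic – the hostname is a made-up example in the spirit of the quiz, and real code should consult the Public Suffix List (e.g. via the tldextract package) rather than naively taking the last two labels:

    from urllib.parse import urlparse

    url = ""  # hypothetical quiz-style URL
    labels = urlparse(url).hostname.split(".")
    print(labels)                 # ['facebook', 'mobile', 'att', 'com']
    # Naive heuristic: ownership is decided by the rightmost labels...
    print(".".join(labels[-2:]))  # '' - AT&T's site, not Facebook's
    # ...which is exactly why 'facebook' at the front fools so many readers.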

We also tried URLs pairing two site names. Most people could tell that one was Twitter (not Facebook). But if I used “@” instead of “/” people didn’t understand, and thought it was an email address – in a URL, anything before an “@” in the host part is treated as a username, not as the site you will reach.

On the topic of email… Can we trust the “from” field? No. Can we trust a “this email has been checked for viruses…” box? No. Can you trust the target URL of a link in the email, as shown at the bottom of the browser? Yes.

What about this email – a security alert for your linked Google account? Well, this one is legitimate… because it really is coming from Google’s own domain. But you knew this was a trick question… Phishing is really tricky…

So, a shocking percentage of my students think that “from” address is legitimate… Tell your less informed friends how easily that can be spoofed…

What about Google. Does Google know what you type as you type it and before you hit enter? Yes, it does… Most search engines send text to their servers as you write it. Which means you can do fun studies on what people commonly DON’T post to Facebook!

A very common misconception is that opening web pages, emails, pdfs, and docs is like reading physical paper… So why do they need patching?

Let’s look at an email example… I don’t typically get emails with “To protect your privacy, Thunderbird has blocked remote content in this message” from a student… This showed me that a 1 pixel invisible image had come with the email, which pinged the server when I opened it. I replied to the email and said he had a virus. He said “no, I used to work in marketing and forgot that I had that plugin set up”.
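For anyone who hasn’t met one, a tracking pixel is mundane: the email embeds a remote one-pixel image, and the sender’s server logs every request for it. A minimal sketch of the server side using Flask – the route and recipient ID scheme are my own invention for illustration, not from any real mailing tool:

    import logging
    from flask import Flask, Response, request

    app = Flask(__name__)
    logging.basicConfig(level=logging.INFO)

    # A minimal transparent 1x1 GIF, byte for byte.
    PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
             b"!\xf9\x04\x01\x00\x00\x00\x00"
             b",\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;")

    @app.route("/pixel/<recipient_id>")
    def pixel(recipient_id):
        # Fetching the image is what reveals that the mail was opened.
        logging.info("opened by %s from %s", recipient_id, request.remote_addr)
        return Response(PIXEL, mimetype="image/gif")

    # The email HTML would contain something like:
    #   <img src="" width="1" height="1">
    # - which is why Thunderbird blocks remote content by default.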

Websites are made of many elements from many sources. Mainly dynamically… And there are loads of trackers across those sites. There is a tool called Lightbeam that will help you track the sites you go to on purpose, and all the other sites that track you. That’s obviously a privacy issue. But it is also a security problem. The previous speaker spoke about supply chains at Target, this is the web version of this… That supply chain gets huge when you visit, say, six websites.

So, a quiz question… I go to Yahoo, I hit reload… Am I running the same code as a moment ago? Well, it’s complicated… I had a student run a study on this, and on how much changes… In a week, about half of the top 200 sites had changed their JavaScript. I see trackers change between individual reloads… It might change, it might not…
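One crude way to see that churn yourself is to fingerprint a page’s external scripts on two visits and compare the hashes – a sketch under the assumption that scripts appear as plain <script src=…> tags (dynamically injected ones would need a headless browser), with a placeholder URL:

    import hashlib
    import re
    import urllib.request

    def script_hashes(page_url):
        """Map each external script URL on a page to a SHA-256 of its body."""
        html = urllib.request.urlopen(page_url).read().decode("utf-8", "replace")
        hashes = {}
        for src in re.findall(r'<script[^>]+src=["\']([^"\']+)["\']', html):
            if src.startswith("//"):        # protocol-relative URLs
                src = "https:" + src
            if not src.startswith("http"):  # skip relative paths in this sketch
                continue
            body = urllib.request.urlopen(src).read()
            hashes[src] = hashlib.sha256(body).hexdigest()
        return hashes

    a = script_hashes("")  # placeholder front page
    b = script_hashes("")
    changed = [u for u in a if u in b and a[u] != b[u]]
    print(f"{len(changed)} of {len(a)} scripts differed between two reloads")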

So, as a user you access a first party website, which then accesses third party sites… Those access ad servers, which auction off that user; an ad is returned, with an image (sometimes with code). Maybe the winning bidder is a company that bids it out again… This is huge as a supply chain and tracking issue…

The Washington Post, for instance, covering one such attack, showed that malicious payloads were being delivered to around 300k users per hour, but only about 9% (27k) of users per hour were affected – they were the ones that hadn’t updated their systems. How did that attack take place? Rather than attacking, they just bought an ad and ran malware code.

There is a tool called Ghostery… It’s brilliant and useful… But it’s run by the ad industry, and by default the trackers are set the wrong way. Untick them all and then it’s fascinating… It tells you about page load and all the components involved in loading a page…

To change topic…

Cookies! Yes, they can be used to track you across web sites. But they can’t give you malware as is. So… I will be tackling the misconception that cookies are evil… And I’m going to try to convince you otherwise. Tracking can be evil… But cookies are kind of an early example of privacy by design…

It is 1994. The internet cannot remember anyone between page loads. You have an interaction with a web server that has absolutely no memory. Cookies help something remember between page loads and web pages… Somehow a server has to know who you are… But back in 1994 you just open a page and look at it, that’s the interaction point…

But companies wanted shopping baskets, and memory between two page reloads. There is an obvious technical solution… You just give every browser a unique identifier… Great! The server remembers you. But the problem is a privacy issue across different servers… So, Netscape implemented cookies – small text strings the server could ask the browser to remember and give back to it later…

Cookies have some awesome properties: they are client visible; third party tracking is client visible too; there is an opt out (delete) option on a per-site basis; each cookie is only readable by the site that set it; and they allow for public discussion of tracking…

… Which is why Android/iOS both went with the unique ID option. And that’s how you can be tracked. As a design decision it’s very different…

Now to some of the research I work on… I believe in getting people to touch stuff, to interact with it… We can talk to each other, or mystify, but we need to actually have people understand this stuff. So we ran an outreach activity to build a website, create a cookie, and then read the cookie out… Then I give a second website… To let people try to understand how to change their names on one site, not the other… What happens when you view them in Incognito mode… And then exploring cookies across sites. And how that works…
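The whole mechanism fits in a few lines, which is part of what makes it such a good teaching exercise. A sketch of the same idea in Flask – the route and cookie names here are mine, not the workshop’s materials:

    from flask import Flask, make_response, request

    app = Flask(__name__)

    @app.route("/")
    def greet():
        # Read back whatever this browser previously agreed to remember.
        name = request.cookies.get("name", "stranger")
        return f"Hello, {name}!"

    @app.route("/set/<name>")
    def remember(name):
        # Ask the browser to store a small text string and return it on
        # future requests to this site only - the property described above.
        resp = make_response(f"OK, I will remember {name}.")
        resp.set_cookie("name", name, max_age=3600)
        return resp

    # Visit /set/Ada and then / in a normal window, then / in a private
    # window: the private window greets a "stranger", since cookies are
    # per-profile, per-site state that the user can inspect and delete.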

Misconception: VPNs solve all privacy and security problems. Back at Indiana I taught students who couldn’t code… And that was interesting… They saw VPNs as magic fairy dust. And they had absorbed this idea that anyone can be hacked at any time… They got that… But that had resulted in “but what’s the point”. That worries me… In the general population we see media coverage of attacks on major companies… And the narrative that attacks are inevitable… So you end up with this problem…

So, I want to talk about encryption and why it’s broken and what that means by VPNs. I’m not an encryption specialist. I care about how it works for the user.

In encryption we want (1) the communication between you and the other party to be confidential and unchanged – no-one can read what you sent and no-one can change it; and (2) to know who we are talking to. And that second part is where things can be messed up. You can make what you think is a secure connection to the right person, but it could be a secure connection to the wrong person – a man in the middle attack. A real world example… You go to a coffee shop and use the wifi to request the BBC news site, but you get a wifi login page instead. That’s essentially a man in the middle attack. It’s not perhaps harmful – it’s normal operating procedure… And VPNs basically work like this…

So, an example of what really happened with a student… I set a task that just had them creating a very simple cookie page… I was expecting something simple… But one of them submitted a page with a bit of JavaScript in it – code that would inject an ad if I connected to the page. What had happened was that the student had logged in through AnchorFree – magic fairy dust – a VPN that injects ad code into the websites you visit, and that injected code is what I saw when they submitted the page in Blackboard Learn…

VPNs are not magic fairy dust. The University runs an excellent VPN – far better for coffee shops etc!

So, I like to end with some common advice:

  • Install an anti-virus scanner. Don’t turn off the AV software Windows 8+ installs automatically… I ran a study where 50% of PhD students had switched off that software and their firewalls…
  • Keep your software updated – the best way to stay safe
  • Select a strong passcode for important things you use all the time
  • For less important things that you use rarely, use a password manager… Best to have different passwords between them…
  • Software I use:
    • Ad blockers – not just for ads, they reduce lots of extra content loading. The more websites you visit the more vulnerable you are
    • Ghostery and Privacy Badger
    • Lightbeam
    • Password managers (LastPass, 1Password and KeePass are most recommended)
    • 2-factor hardware keys like Yubikey – extra protection for e.g. Facebook
    • If you are really serious: uMatrix and NoScript – BUT they will break lots of pages…


Q1) It’s hard to get an average citizen to do everything… How do you get around that and just get the key stuff across…

A1) Probably it’s that common advice. The security community has gotten better at narrowing it down to around 10 key things. Google did a study with the Blackhat infosec conference about what experts would do, and asked on Amazon Mechanical Turk what people would recommend to friends. About the only common answer amongst blackhats was “update your software”. But actually there is overlap… People know they should change passwords, and should use AV software… But AV software didn’t show up on the blackhat list – 2-factor and password managers did…

Q2) What do you think about passwords… long or complex or?

A2) We did a study maybe 8 years ago on mnemonic passwords… And found that “My name is Inigo Montoya, you killed my father, prepare to die” was by far the most common. The issue isn’t length… It’s entropy. I think we need to think, server side, about how many other users have used the same password (based on the hashed version), and require something that fewer than 3 people use…

Q2) So more about inability to remember it…

A2) And it depends on threat type… If someone knows you, your dog, etc., then it’s easier… If I can keep a password for a long time I might invest in it – but if you force people to change passwords, they have to remember them. There was a study showing that people who use passwords a lot use affirmations, such as “I love God”… And again, it’s hard to know how you protect against that.
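To make the entropy point above concrete: what matters is how many equally likely alternatives an attacker must try, not raw length. A back-of-envelope sketch – the pool sizes are illustrative assumptions:

    import math

    def entropy_bits(pool_size, choices):
        """Bits of entropy for independent, uniform choices from a pool."""
        return choices * math.log2(pool_size)

    # 8 random printable-ASCII characters:
    print(f"{entropy_bits(95, 8):.0f} bits")    # ~53 bits
    # 4 words drawn at random from a 7,776-word diceware list:
    print(f"{entropy_bits(7776, 4):.0f} bits")  # ~52 bits
    # One famous quote picked from, say, a few thousand popular quotes:
    # long, but with almost no entropy - the Inigo Montoya problem.
    print(f"{entropy_bits(5000, 1):.0f} bits")  # ~12 bits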

Q3) What about magic semantic email links instead of passwords…

A3) There is some lovely work on just how much data is in your email… That’s a poor man’s version of the OAuth idea of getting an identity provider to authenticate the user. It’s good for the user, but email then becomes one bigger-stakes login… And we see SMS also being a mixed bag and subject to attack… Ask a user, though, and you’ll hear “there’s nothing important in my email”.

Q4) How do you deal with people saying “I don’t have anything to hide”?

A4) Well, I start with it not being about hiding… It’s more: why do you want to know? When I went to buy a car I didn’t dress like a professor, I dressed down… I wanted a good price… If I have a lot of time I will refer them to Daniel Solove’s Nothing to Hide.

Talk by Nicola Osborne (EDINA) covering Digital Footprints and how you can take control of your online self

And that will be me… So keep an eye out for tweets from others on the event hashtag: #UoEInfoSec.

And with a very brief summing up from Alistair Fenemore, the day came to a close. Thanks to the lovely University Information Security team for organising this really interesting event (and inviting me to speak) as part of their awesome Information Security Awareness Week programme.

Aug 032017

Today I am at Repository Fringe which runs today and tomorrow in Edinburgh and is celebrating 10 years of Repofringe! I’m just here today – presenting a 10×10 on our recent Reference Rot in Theses: A HiberActive Pilot project work – and will be blogging whilst I’m here. As usual, as this is live, may include the odd typo or error so all comments, corrections, questions, additions, etc. are very much welcomed!

Welcome – Janet Roberts, Director of EDINA

My colleagues were explaining to me that this event came from an idea from Les Carr that there should be not just one repository conference, but also a fringe – and here we are at the 10th Repository Fringe, on the cusp of the Edinburgh Fringe.

So, this week we celebrate ten years of repository fringe, and the progress we have made over the last 10 years to share content beyond borders. It is a space for debating future trends and challenges.

At EDINA we established the OpenDepot to provide a space for those without an institutional repository… That has now migrated to Zenodo… and the challenges are changing, around the size of data, how we store and access that data, and what those next generation repositories will look like.

Over the next few days we have some excellent speakers as well as some fringe events, including the Wiki Datathon – so I hope you have all brought your laptops!

Thank you to our organising team from EDINA, DCC and the University of Edinburgh. Thank you also to our sponsors: Atmire; FigShare; Arkivum; ePrints; and Jisc!

Opening Keynote – Kathleen Shearer, Executive Director, COAR – Raising our game – repositioning repositories as the foundation for sustainable scholarly communication

Theo Andrew: I am delighted to introduce Kathleen, who has been working in digital libraries and repositories for years. COAR is an international organisation of repositories, and I’m pleased to say that Edinburgh has been a member for some time.

Kathleen: Thank you so much for inviting me. It’s actually my first time speaking in the UK and it’s a little bit intimidating as I know that you folks are really ahead here.

COAR now has about 120 members. Our activities fall into four areas: presenting an international voice, so that repositories are part of a global community with diverse perspectives; being more active in training for repository managers, something which is especially important in developing countries; and value added services, which is where today’s talk on the repository of the future comes in. The vision here is about…

But first, a rant… The international publishing system is broken! And it is broken for a number of reasons – there is access, and the cost of access. The cost of scholarly journals goes up far beyond the rate of inflation. That touches us in Canada – where I am based – in Germany, in the UK… But much more so in the developing world. And then we have the “Big Deal”. A study of University of Montreal libraries by Stephanie Gagnon found that of 50k subscribed-to journals, there were really only 5,893 unique essential titles. But often those deals aren’t opted out of, as the key core journals separately cost the same as the big deal.

We also have a participation problem… Juan Pablo Alperin’s map of authors published in Web of Science shows a huge bias towards the US and the UK, and seriously reduced participation in Africa and parts of Asia. Why does that happen? The journals are operated from the global North, and don’t represent the kinds of research problems in the developing world. And one Nobel Prize winner notes that the pressure to publish in “luxury” journals encourages researchers to cut corners and pursue trendy fields rather than areas where there are research gaps. That was the case with the Zika virus – you could hardly get research published on it until a major outbreak brought it to the attention of the dominant publishing cultures; then there was huge appetite to publish.

Timothy Gowers talks about “perverse incentives” which are supporting the really high costs of journals. It’s not just a problem of how researchers publish, it’s also a problem of how we incentivise researchers to publish. So, this is my goats in trees slide… It doesn’t feel like goats should be in trees… Moroccan tree goats are taught to climb the trees when there isn’t food on the ground… I think of the researchers able to publish in these high end journals as the lucky goats in the tree here…

In order to incentivise participation in high end journals we have created a lucrative publishing industry. I’m sure you’ve seen the recent Guardian article: “Is the staggeringly profitable business of science publishing bad for science?”. Yes. For those reasons of access and participation. We see very few publishers publishing the majority of titles, and there is a real…

My colleague Leslie Chan, funded by the International Development Council, talked about openness not just being about gaining access to knowledge but also about having access to participate in the system.

On the positive side… Open access has arrived. A recent study (Piwowar et al 2017) found that about 45% of articles published in 2015 were open access. And that is increasing every year. And you have probably seen the May 27th 2016 statement from the EU that all research they fund must be open by 2020.

It hasn’t been a totally smooth transition… APCs (Article Processing Charges) are very much in the mix and part of the picture… Some publishers are trying to slow the growth of open access, but they can see that it’s coming and want to retain their profit margins, and they want to move to all APCs. There is discussion here… There is a project called OA2020 which wants to flip from subscription based to open access publishing. It has some traction, but there are concerns, particularly about the sustainability of scholarly communications in the long term. And we are not sure that publishers will go for it… Particularly one of them (Elsevier), which exited talks in The Netherlands and Germany. In Germany the tap was turned off for a while for Elsevier – and there wasn’t a big uproar from the community! But the tap has been turned back on…

So, what will the future be for open access? If you look across APCs and their average value… and if you think about the relative value of journals, especially the value of high end journals… I don’t think we’ll see smaller increases in APCs in the future.

At COAR we have a different vision…

Lorcan Dempsey talked about the idea of the “inside out” library. Similarly, a new MIT Future of Libraries Report – published by a broad stakeholder group that had spent 6 months working on a vision – came up with the need for libraries to be an open, trusted, durable, interdisciplinary, interoperable content platform. So, like the inside out library, it’s about collecting the output of your organisation and making it available to the world…

So, for me, if we just collect articles… we perpetuate the system and are not in a position to change it. So how do we move forward while being somewhat reliant on that system?

Eloy Rodrigues, at Open Repositories earlier this year, asked whether repositories are a success story. They are ubiquitous, adopted and networked… But they are also using old, pre-web technologies; they are mostly passive recipients; limited interoperability makes value added services hard; and they are not really embedded in researcher workflows. These are the kinds of challenges we need to address in the next generation of repositories…

So we started a working group on Next Generation Repositories to define new technologies for repositories. We want to position repositories as the foundation for a distributed, globally networked infrastructure for scholarly communication, on top of which we want to be able to add layers of value added services. Our principles include distributed control, to guard against failure, change, etc. We want this to be inclusive, reflecting the needs of research communities in the global South. And we want intelligent openness – we know not everything can be open.

We also have some design assumptions, with a focus on the resources themselves, not just associated metadata. We want to be pragmatic, and make use of technologies we have…

To date we have identified major use cases and user stories, and shared those. We determined functionality and behaviours, and a conceptual model. At the moment we are defining specific technologies and architectures. We will publish recommendations in September 2017. We then need to promote this widely and encourage adoption and implementation, as well as the upgrade of repositories around the world (a big challenge).

You can view our user stories online, but I’d like to talk about a few of them… We would like to enable peer review on top of repositories – to slowly, incrementally replace what researchers do. That’s not building peer review into repositories, but as a layer on top. We also want some social functionality, like recommendations. And we’d like standard usage metrics across the world, to understand what is used and how. We are looking to the UK and the IRUS project, as that has already been looked at here. We also need to address discovery… Right now we use metadata, rather than indexing full text content, so content can be hard to get to unless the metadata is obvious. We also need data syncing, so that hubs, indexing systems, etc. reflect changes in the repositories. And we also want to address preservation – that’s a really important role that we should do well, and it’s something that can set us apart from the publishers: preservation is not part of their business model.

So, this is a slide from Peter Knoth at CORE – a repository aggregator – who talks about expanding the repository, and the potential to layer all of these additional services on top.

To make this happen we need to improve the functionality of repositories: to be of and not just on the web. But we also need to step out of the article paradigm… The whole system is set up around the article, but we need to think beyond that, deposit other content, and ensure those research outputs are appropriately recognised.

So, we have our (draft) conceptual model… It isn’t about siloed individual repositories, but about a whole network. And we have some draft recommendations for technologies for next generation repositories – these are a really early view… Things like: ResourceSync; Signposting; messaging protocols; message queues; the IIIF Presentation API; OAuth; Webmention; and more…
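Several of these are deliberately lightweight web conventions rather than heavy repository-specific protocols. Signposting, for example, just asks a landing page to expose typed links in its HTTP Link header so machines can locate the citable identifier and the content files. A sketch of reading such links – the repository URL is a placeholder, and the comma-splitting is naive (fine for illustration, not for URLs containing commas):

    import urllib.request

    landing_page = ""  # placeholder
    with urllib.request.urlopen(landing_page) as resp:
        link_header = resp.headers.get("Link", "")

    # A Signposting-style header looks roughly like:
    #   <>; rel="cite-as",
    #   <>; rel="item"; type="application/pdf"
    for entry in link_header.split(","):
        if ";" not in entry:
            continue
        target, *params = entry.split(";")
        print(target.strip(" <>"), [p.strip() for p in params])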

Critical to the success of this process is the widespread adoption of the behaviours and functionalities for next generation repositories. It won’t be a success if only one software platform or approach takes these on. So I’d like to quote a Scottish industrialist, Andrew Carnegie: “strength is derived from unity…”. We need to coalesce around a common vision.

And it isn’t just about a common vision: science is global and networked, and our approach has to reflect and connect with that. Repositories need to balance a dual mission: to (1) showcase and provide access to institutional research and (2) be nodes in a global research network.

To support better networking between repositories, in Venice in May we signed an International Accord for Repository Networks, with networks from Australasia, Canada, China, Europe, Japan, Latin America, South Africa and the United States. For us there is a question about how best to work with the UK internationally. We work with OpenAIRE, but maybe we need something else as well. The networks across those areas are advancing at different paces, but have committed to move forward.

There are three areas of that international accord:

  1. Strategic coordination – to have a shared vision and a stronger voice for the repository community
  2. Interoperability and common “behaviours” for repositories – supporting the development of value added services
  3. Data exchange and cross regional harvesting – to ensure redundancy and preservation. This has started but there is a lot to do here still, especially as we move to harvesting full text, not just metadata. And there is interest in redundancy for preservation reasons.

So we need to develop the case for a distributed, community-managed infrastructure that will better support the needs of diverse regions, disciplines and languages. Redundancy will safeguard against failure, with less risk of commercial buy out. It places the library at the centre… But I appreciate it is much harder to sell a distributed system… We need branding that really attracts researchers to take part and engage in the system…

And one of the things we want to avoid… Yesterday it was announced that Elsevier has acquired bepress. bepress is mainly used in the US, and there will be much thinking about the implications for those repositories. So not only should institutional repositories be distributed, they should also be on different platforms – different open source platforms…

Concluding thoughts… Repositories are a technology, and technologies change. What this is really promoting is a vision in which institutions, universities and their libraries are the foundational nodes in a global scholarly communication system. This is really the future of libraries in the scholarly communication community. This is what libraries should be doing. This is what our values represent.

And this is urgent. We see Elsevier consolidating, buying platforms, trying to control publishers and the research cycle; we really have to move forward and move quickly. I hope the UK will remain engaged with this, and I look forward to your participation in our ongoing dialogue.


Q1 – Les Carr) I was very struck by that comment about the need to balance the local and the global. I think that’s a really major opportunity for my university. Everyone is obsessed with their place in the global university rankings, their representation as a global university. This could be a real opportunity, led by our libraries and knowledge assets, and I’m really excited about that!

A1) I think the challenge around that is trying to support common values… If you are competing with other institutions there’s not always an incentive to adopt systems with common technologies, measures and approaches. So there needs to be a benefit for institutions in joining this network. It is a huge opportunity, but we have to show the value of joining that network. It’s maybe easier in the UK, Europe, Canada. In the US they don’t see that value as much… They are not used to collaborating in this way and have been one of the hardest regions to bring onboard.

Q2 – Adam Field) Correct me if I’m wrong… You are talking about a Commons… In some way the benefits are watered down as part of the Commons, so how do we pay for this system, how do we make this benefit the organisation?

A2) That’s where that challenge of demonstrating the benefit comes in. There has to be value… That’s where value-added systems come in… So a recommender system is much more valuable if it crosses all of the repositories… That is a benefit: it allows you to access more material and more people to access yours. I know CORE at the OU are already building a recommender system on their own aggregated platform.

Q3 – Anna Clements) At the sharp end this is not a problem for libraries, but a problem for academia… If we are seen as librarians doing things to or for academics that won’t have as much traction… How do we engage academia…

A3) There are researchers keen to move to open access… But it’s hard to represent what we want to do at a global level when many researchers are focused on that one journal or area and making that open access… I’m not sure what the elevator pitch should be here. I think if we can get that usage statistics data in there, that will help… If we can build an alternative system that even research administrators can use in place of impact factor or Web of Science, that might move us forward in terms of showing this approach has value. Administrators are still stuck in having to evaluate the quality of research based on journals and impact factors. This stuff won’t happen in a day. But having standardised measures across repositories will help.

So, one thing we’ve done in Canada with the U15 (top 15 universities in Canada)… They are at the top of what they can do in terms of the cost of scholarly journals so they asked us to produce a paper for them on how to address that… I think that issue of cost could be an opportunity…

Q4) I’m an academic and we are looking for services that make our lives better… Here at Edinburgh we can see that the library is naturally the consistent point of connection with the repository. Does that translate globally?

A4) It varies globally. Libraries are fairly well recognised in Western countries. In the developing world there are funding and capacity challenges that make that harder… There is also a question of whether we need repositories for every library… Can we do more consortial repositories or similar?

Q5 – Chris) You talked about repositories supporting all kinds of materials… And how they can “wag the dog” of the article…

A5) I think with research data there is so much momentum there around making data available… But I don’t know how well we are set up with research data management to ensure data can be found and reused. We need to improve the technology in repositories. And we need more resources too…

Q6) Can we do more to encourage academics, researchers, students to reuse data and content as part of their practice?

A6) I think the more content we have at the Commons level, the more it can be reused. We have to improve discoverability, and improve the functionality to help that content be reused… There is huge machine reuse of content – I was speaking with Peter Knoth about this – but that isn’t easy to do with repositories…

Theo) It would be really useful to see Open Access buttons more visible, using repositories for document delivery, etc.

Chris Banks, Director of Library Services, Imperial College – Focusing upstream: supporting scholarly communication by academics

Gavin MacLachlan: I’d just like to welcome you again to Edinburgh, our beautiful city and our always lovely weather (note for remote followers: it’s dreich and raining!). I’m here to introduce Chris, whose work with LIBER and LERU will be well known to you.

Chris: This is my first fringe and I find it quite terrifying that I’m second up! Now, I’m going to go right back to basics and policy…

The Finch report in 2012 and Research Councils UK: we had RCUK policy; funding available for immediate Gold OA (including hybrid); embargo limits apply where Green OA chosen. Nevertheless the transition across the world is likely to take a number of years. For my money we’ve moved well on repositories, partly as the UK has gone it alone in terms of funding that transition process.

In terms of REF we had the funding councils’ REF policy (2013), applicable to all outputs to be submitted to the post-2014 REF exercise – effectively covering all researchers. No additional funding was available; where Green OA is selected, there is a requirement to use repositories. There were also two paragraphs (15 and 26) shaping what we have been doing…

That institutions are encouraged to go beyond the minimum (and will receive credit for doing so) – and the visibility of that is where we see the rise of University presses. And the statement that repositories do not need to be accessible for reuse and text mining, but that, again, there will be credit for those that are. Those two paragraphs have driven what we’ve been doing at Imperial.

At the moment UK researchers face the “policy stack” challenge. There are many funder policies; the REF policy differs substantially from other policies and applies to all UK research academics – you can comply with RCUK policy and fall foul of REF; many publisher policies…

So how can the REF policy help? Institutions recognise that IP, copyright and open access policies are not necessarily supporting funder compliance – something needs to be done. There is a variety of approaches to academic IP observed in UK institutions. Legally in the UK the employer is the first copyright holder… subject to any other agreements, and unless the individual is a contractor, etc.

Publishers have varying approaches to copyright, from licence to first publish through to outright copyright transfer. Licences are not read by academics. And it’s not just in publishing… It’s social media… It’s a big problem.

For the library we want to create frictionless services. We need to scale services up to all researchers – a REF policy requirement. We can’t easily give an answer to researchers on their OA options. So we started our work at Imperial to address this, and to ensure our own organisational policy aligned with funder policies. We also wanted to preserve academic choice over publishing, and the ability to sign away rights when necessary (though encouraging scrutiny of licences). We have a desire to maximise the impact of publication. And there is a desire to retain some re-use rights for use in teaching etc., including rights to diagrams etc.

The options we explored with academics ranged from doing as we do at the moment – with academics signing over copyright – through to the institution claiming copyright on all academic outputs. And we wanted to look at two existing models in between: the SPARC model (the academic signs copyright over to the publisher but licenses it back); and the Harvard model – which we selected.

The Harvard model is implemented as part of the university OA policy. Academics deposit Author Accepted Manuscripts (AAMs) and grant a non-exclusive licence to the university for all journal articles. It is a well established policy and has been in use (elsewhere) since 2008. Where a journal seeks a waiver, that can be managed by exception. And this is well tested in Ivy League colleges but also much more widely, including universities in Kenya.

The benefit here is that academia retains rights: authors have the right to make articles open access – and open access articles have higher citations than closed ones. Authors can continue to publish in the journal of their choice irrespective of whether it allows open access or not. It provides a single means by which authors can comply with green open access policies. And we are minimising reliance on hybrid open access – reducing “double dipping”, paying twice through subscriptions and APCs – a complex and costly process. I think we and publishers see money for hybrid OA models drying up in the future, as the UK has pretty much been the one place doing that. Instead funding is typically used for pure gold OA models and publications.

We have made some changes to the Harvard model policy to make it work in the context of UK law, and also to ensure it facilitates funder deposit compliance and REF eligibility. The next step here is that 60 institutions overall are interested and we have a first-mover group of around 12 institutions. We are discussing with publishers. And we have had wider engagement with the researcher, library, research office and legal office communities. We have a website and advocacy materials under development. We are also drafting boilerplate texts for authors, collaboration agreements etc., especially for large international collaborative projects. We have a steering committee established that includes representatives from across institutions, including a publisher.

At the moment we are addressing some publisher concerns and perceptions. Publishers are taking a very particular approach to us, and we have had a real range of responses. Some are very positive – including pure gold publishers (e.g. PLoS) and also learned societies (e.g. the Royal Society). Other publishers have raised concerns and are in touch with the steering group, and with ALPSP.

Summary of current concerns:

  • that it goes beyond the requirements of Finch. We have stated that the UK-SCL is to support REF and other funder policies
  • that AAMs will be made available on publication. Response: yes, as per the Harvard model, around since 2008
  • administrative burden on UK authors/institutions, as publishers would have to ask for waivers in 70–80% of cases. We have responded that at institutions using the Harvard model it has been less than 5%, and we can’t see why UK authors would be treated differently.
  • they noted that only 8% of material submitted to the REF was green OA compliant. We have noted that this was 8% of material submitted, not 8% of all material eligible for submission.

Researchers have also raised concerns:

  • the need to seek agreement from co-authors, especially in collaborations. This can be addressed through a phased/gradual implementation
  • fear that a publisher will refuse to publish. Institutions using the Harvard model report no instances of this happening
  • learned societies – fear of loss of income. There is no reliable research evidence to back up this fear.
  • dislike of the CC-BY-ND licence. That is there to comply with RCUK but warrants further discussion.

Our next step is a further meeting with the PA/ALPSP to take place during the summer. We have encouraged proposals to deliver more than simply minimum REF eligibility, which would resolve the current funder/publisher policy stack complexity. We will finalise the website, waiver system, advocacy materials and boilerplate texts; gain agreement on early mover institutions and on the date of first adoption; and notify publishers.

Another bit of late breaking news… Publishers recently went to HEFCE to ask about policy statements and, as a result of that, HEFCE will be clarifying that it is pushing for minimum compliance and encouraging more than that. One concern of the REF policy had been that only material submitted to the REF would have been deposited…

Last time my institution submitted 5k items, more than half of which were not monographs. We submitted 95% of our researchers. Out of that, four items per researcher were looked at; now it would be two. And from that our funding is decided. And you can see, from that, why that bigger encouragement for the open scholarly ecosystem is so important.

I wanted to close by sharing some useful further materials and to credit others who have been part of this work.

One important thing to note is that we are trying to help researchers and universities to comply as policies from funders and publishers evolve. I would like to see that result in discussion with publishers, and a move to all gold OA… The AAM is not the goal here, it’s the published article. Now that could see the end of repositories – something I am cautious of raising with this audience…


Q1) The elephant in the room here is Sci-Hub… They are making 95% of published content available for free. You have AAMs out there… And we haven’t seen subscriptions drop.

A1) So our initiative is about legal sharing. And we also need to note that the UK is just one scholarly community, and others have not moved towards mandates and funding. I think it is a shame that the fights being picked are with institutions, when we have that elephant in the room…

Q2) Congratulations on the complex and intricate discussions you have been holding… Almost a legal game of Twister, where all the participants hate each other! This is a particular negotiation at the end of a process, at the end of the scholarly publishing chain. How might you like your experience to feed into training of researchers and their own understanding of copyright and ownership of their own outputs?

A2) The challenge that we observe is that we have many younger researchers and authors who are very passionate and ethically minded about openness. They are under pressure from supervisors who say they will not get a tenured position if they don’t have a “good” journal on their CV. And they are frustrated by the slow movement on the San Francisco Declaration on Research Assessment. Right now the “quality” journals remain those subscription high impact journals. But we have research showing the higher use of open access journals. We still have that debate within academe, and it is slowing down that environment. So training researchers about their IP and what copyright means matters. I also think it is interesting that Sir Mark Walport is in charge of UKRI, as he has written before about the evolving scholarly record, and the scattering of articles and outputs, instead building online around research projects. He gave a talk at LIBER in 2015, and wrote an article for THE. He was also at Wellcome when they first introduced their mandate, so I think we really do have someone who understands that complexity and the importance of openness.

10×10 presentations (Chair: Ianthe Sutherland, University Library & Collections)

  1. v2.juliet – A Model For SHERPA’s Mid-Term Infrastructure. Adam Field, Jisc

I’m here from SHERPA HQ at Jisc! I’m going to go back to 2006… We saw YouTube celebrating its first year… Eight Out of Ten Cats began… The Nintendo Wii appeared… And… SHERPA/JULIET was launched (SHERPA having been around since 2001). So, when we set up SHERPA REF as a beta service in 2016 we had to build something new, as JULIET hadn’t been set up for APIs and interoperability in that kind of way.

So, we set about a new SHERPA/JULIET based around a pragmatic, functional data model; moving the data into a platform; rebranding to Jisc norms; a like-for-like replacement; and a precedent for our other services as we update them all…

So, a quick demo… We now have the list of funders – as before – including an overview of their open access requirements. So if we choose Cancer Research UK… You can see the full metadata record, with headings for more information. You can see which groups it is part of… We have a nice open API where you can retrieve information.

So, whilst it was a like-for-like rebuild we have snuck in new features, including FundRef DOIs – added automatically where possible, to be added to with manual input too. More flexible browsing. And a JSON API – really easy to work with (see the sketch below). And in the future we’d like funders to be able to add to their own records, and other useful 3rd-party editorial features. We want to integrate ElasticSearch. And we want to add microservices…
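
A liveblogger’s sketch of what consuming such a JSON API might look like – note that the endpoint URL, parameters and field names below are invented placeholders, not the real beta API; check the SHERPA/JULIET documentation for the actual details:

```python
# Hypothetical sketch only: the endpoint and JSON field names here are
# placeholders, not the documented SHERPA/JULIET beta API.
import requests

resp = requests.get(
    "https://juliet.example.jisc.ac.uk/api/funders",  # placeholder URL
    params={"format": "json", "limit": 10},           # assumed parameters
    timeout=10,
)
resp.raise_for_status()

for funder in resp.json().get("funders", []):         # assumed structure
    print(funder.get("name"), "-", funder.get("fundref_doi"))
```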

In terms of our process here… The hard part was analysing the existing data and structuring it into a more appropriate shape… The next part was much easier… We configured EPrints, imported the data, and added some bespoke service requirements.

Right now we have a beta of SHERPA/JULIET – do take a look! We are now working on OpenDOAR. And then SHERPA/RoMEO is expected in early 2018.

We now want your feedback! Email with your comments and feedback. We’ll have feedback sessions later today that you can join us for and share your thoughts, ask questions about the API. And myself and Tom Davey our user interface person, are here all day – come find us!

  2. CORE Recommender: a plug-in suggesting open access content. Nancy Pontika, CORE

I want to talk about discoverability of content in repositories… Salo 2008, Konkiel 2012 and Acharya 2017 talk about the challenges of discoverability in repositories. So, what is needed? Well, we need recommender systems in repositories so that we can increase the number of incoming links to relevant resources…

For those of you new to repositories: CORE is an aggregation service; we are global and focused, and we have started harvesting gold OA journals… We have services at various levels, including for text mining and data science. We have a huge collection of 8 million full text articles and 77 million metadata records… They are all in one place… So we can build a good recommendation system.

What effect can we have? Well, it will increase accessibility, meaning more engagement and a higher Click-Through Rate (CTR); people access resources on CORE via its recommender system twice as often as via search. And that additional engagement increases the time spent in your repositories – which is good for you. And you open up another way to find research…

For instance you can see within White Rose Research Online that suggested articles are presented that come from all of the collections of CORE, including basic geographic information… We would like crowd-sourced feedback here. The more users that engage in feedback, the more the recommender will improve. We also get feedback from our community. At the moment the first tab is CORE recommendations, the second tab is institutional recommendations. We’ve had feedback that institutions would prefer it the other way around… We have heard that… although we note that CORE recommendations are better as it’s a bigger data set… We want to make the institutional tab appear first unless there are few recommendations/poor matches… We are working on this…

CORE Recommender has been installed at St Mary’s; LSHTM; the OU; University of York; University of Sheffield; York St John; Strathclyde University… and others with more to follow.

How does it work? Currently it’s an article-to-article recommender system, with preprocessing to make this possible. What is unique is that recommendations are based on full text, and that the full text is open access. (A toy illustration of the idea follows below.)
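
As a liveblogger’s illustration (not CORE’s actual implementation): an article-to-article recommender over full text can be as simple as vectorising each document and ranking the others by cosine similarity. A toy sketch with scikit-learn, using made-up snippets:

```python
# Toy sketch of an article-to-article recommender over full text.
# Not CORE's implementation - just the general idea.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = {  # made-up full-text snippets
    "doc1": "open access repositories and the discoverability of research",
    "doc2": "aggregating repository full text for text and data mining",
    "doc3": "crop yields under a changing climate in northern europe",
}
ids = list(articles)

tfidf = TfidfVectorizer(stop_words="english").fit_transform(articles.values())
sim = cosine_similarity(tfidf)  # pairwise similarity matrix

def recommend(doc_id, k=2):
    """Return the k most similar other articles to doc_id."""
    i = ids.index(doc_id)
    ranked = sorted(
        ((score, j) for j, score in enumerate(sim[i]) if j != i),
        reverse=True,
    )
    return [(ids[j], round(score, 3)) for score, j in ranked[:k]]

print(recommend("doc1"))
```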

What is the CORE recommender not? It is not always right – but which recommendation system is? And it does not compare the “quality” of the recommended articles with the “quality” of the initial paper…

  3. Enhancing two workflows with RSpace & Figshare: Active Data to Archival Data and Research to Publication. Rory Macneil, Research Space and Megan Hardeman, Figshare

Rory: Most of the discussion so far has been about publications, but we are talking about data. I think it’s fair to say that Figshare in the data field, and RSpace in the lab notebooks world, have been totally fixated on interoperability!

Right now most data does not make it into repositories… Some of it shouldn’t, but even the data that should be shared is not. One way to increase deposit is to make it easy to deposit data. Integrating with RSpace notebooks allows easy and quick deposit.

So, in RSpace you can capture metadata of various types. There are lots of ways to organise the data… And to use this you just need to activate the Figshare plugin. Then you select the data to deposit – exporting one or many documents… You select what you want to deposit, and the format to deposit in. You can export all of your work, or all of your lab’s work – whatever level of granularity you want to share… You deposit to Figshare… And over to Megan!

Megan: Figshare is a repository where users can make all of their research outputs available in citable, accessible ways (all marked up for Google Scholar). You can upload any file type (we support over 1,000 types); we assign a DOI at the item level; store items in perpetuity (backed up in DPN); track usage stats and Altmetrics (more exposure); and you can collaborate with researchers inside and outside your institution.

Figshare has an open API and integrations with RSpace and other organisations and tools… (a minimal sketch below).
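
The figshare API is documented at docs.figshare.com; as a rough sketch from me (the token is a placeholder, and uploading the actual files is a further multi-step process), creating a new private item looks something like this:

```python
# Rough sketch of creating a private item via the figshare v2 API.
# The token is a placeholder; uploading the actual files is a further
# multi-step process - see the figshare API documentation.
import requests

TOKEN = "YOUR_PERSONAL_TOKEN"  # placeholder personal access token

resp = requests.post(
    "https://api.figshare.com/v2/account/articles",
    headers={"Authorization": f"token {TOKEN}"},
    json={"title": "Lab notebook export from RSpace"},  # minimal metadata
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["location"])  # URL of the newly created item
```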

For an example… You can see an electronic lab notebook from RSpace which can be browsed and explored in the browser!

  4. Thesis digitisation project. Gavin Willshaw, University of Edinburgh

I’m digital curator here, and manager of the PhD digitisation project. This project sees a huge amount of content going into ERA, our repository. In the last three years we’ve moved from having two photographers to having two teams of photographers and cataloguers across two sites – we are investing heavily.

We have 17,000 PhD theses and they will all be online by the end of 2018. This will provide global access to the entire PhD collection. We have obtained some equipment. We are creating metadata records, and also undertaking some preservation work where required.

The collection is largely standardised… But we have some latin and handwritten theses. We have awkward objects – like slices of lungs!

For 10,000 theses we have duplicates and these are scanned destructively. 3,000 unique theses are scanned non-destructively in house. And 4,000 unique theses are outsourced. All are OCRed. And they are all catalogued, with data protection checks made before determining which can be shared online in full and which cannot.

In terms of copyright and licensing, that stays with the author. We have contacted some and had positive feedback. It’s a risk but a low risk. In any case we can’t assert the copyright or change licences on our own. And we already have over 2,500 theses live.

And these theses are not just text… We have images that are rare and unusual. We share some of these highlights on our blog, and on Twitter we use the hashtag #UoEPhD. We have some notable theses… Alexander McCall Smith’s PhD is there; Isabel Emslie Hutton, a doctor in the First World War in the Balkans – so noted she was on a stamp in Serbia last year; Helen Pankhurst; and of course members of staff from the university too!

Impact-wise, the theses on ERA have been downloaded 2 million times since 2012. Those digitised in the project are seeing around 3,000 downloads per month. Oddly our most popular thesis right now is on the differentiation of people in Norwich. We are also looking at what else we can do… Linking theses to Wikipedia; adding a thesis to Wikisource (and getting 10x the views); and now looking at what else… text and data mining etc.

  5. ‘Weather Cloudy & Cool Harvest Begun’: St Andrews output usage beyond the repository. Michael Bryce, University of St Andrews

I didn’t expect it to actually be cloudy today…!

Our repository has been going since 2006, and use has been growing steadily…

Some of the highlights from our repository have included research on New Caledonian crows and collaborative tool use. We also have farming diaries in our repository under a Creative Commons licence… We push that out into the community in blog posts and posters… So going beyond traditional publications and use. Our material on Syria has seen significant usage, driven partly by use in OJS journals.

Our repository isn’t currently OpenAIRE compliant, but we have some content shared that way, which means a bigger audience… For instance material on virtual learning environments associated with a big EU project.

We’ve also been engaging beyond publications. The BBC asked us to digitise a thesis at the time of broadcasting Coast, which added that work to our repository.

When we reached our 10,000th item we had cake! And we helped publicise the student and their work to a wider audience…

Impact and the REF panel session

Brief for this session: How are institutions preparing for the next round of the Research Excellence Framework #REF2021, and how do repositories feature in this? What lessons can we learn from the last REF and what changes to impact might we expect in 2021? How can we improve our repositories and associated services to support researchers to achieve and measure impact with a view to the REF? In anticipation of the forthcoming announcement by HEFCE later this year of the details of how #REF2021 will work, and how impact will be measured, our panel will discuss all these issues and answer questions from RepoFringers.

Chair: Keith McDonald (KM), Assistant Director, Research and Innovation Directorate, Scottish Funding Council

The panel includes Pauline Jones, REF Manager at the University of Edinburgh, and a veteran of the two previous REFs – she was at Napier University in 2008, and was working at the SFC (where I work) for the previous REF, where she was involved in the introduction of Impact.

Catriona Firth (CF), REF Deputy Manager, HEFCE

I used to work in universities, now I am a poacher-turned-gamekeeper I suppose!

Today I want to talk about Impact in REF 2014. Impact was introduced and assessed for the first time in REF 2014. After extensive consultation, Impact was defined in an inclusive way. So, for REF 2014, impact was assessed through four-page case studies describing impacts that had occurred between January 2008 and July 2013. The submitting university must have produced high quality research since 1993 that contributed to the impacts. Each submitting unit (usually a subject area) returned one case study, plus an additional case study for every 10 staff.

At the end of REF 2014 we had 6,975 case studies submitted. On average across submissions, 44% of impacts were judged outstanding (4*) by over 250 external users of research, working jointly with the academic panels. There was a global spread of impact, and those impacts were across a wealth of areas of life: policy, performance and creative practice, etc. There was, for instance, a case study of drama and performance that had an impact on nuclear technology. The HEFCE report on impact is highly recommended reading.

In November 2015 Lord Nicholas Stern was commissioned by the Minister for Universities and Science to conduct an independent review of the REF. He found that the exercise was excellent, and had achieved what was desired. However there were recommendations for improvement:

  • lowering the burden on institutions
  • less game-playing and use of loopholes
  • less personalisation, more institutionally focused – to take pressure off institutions but also recognise and reward institutional investment in research
  • recognition for investment
  • a more rounded view of research activity – again avoiding distortion
  • an interdisciplinary emphasis – some work could…
  • broadened impact – finding ways to capture, reward, and promote the ways UK research benefits and impacts society.

If you go to the HEFCE website you’ll see a video of a webinar on the Stern Review and specifically on staff and outputs, including that all research active staff should be included, that outputs be determined at assessment level, and that outputs should not be portable.

In terms of impact there was keenness to broaden and deepen the definition of impact and provide additional guidance. Policy impacts had been seen as a “safer” kind of case study before. The Stern Review emphasised a need for more focus on public engagement and impact on curricula and/or pedagogy; to reduce the number of required case studies to a minimum of one; and to include impact arising from research, research activity, or a “body of work”. It also proposed a quality threshold for underpinning research based on rigour – not just originality. And the opportunity to resubmit case studies if the impact was ongoing.

We have been receiving feedback – over 400 responses – which is being summarised. That feedback includes positive comments on broadening impact, and on aligning definitions of impact and public engagement across funding bodies. There were some concerns about a sub-profile based on one case study – especially in small departments, where you’d know exactly whose work and case study was 4* (or not). There have been concerns about how you separate rigour from originality and significance. There was a lot of support for a broader basis of research, but challenges in drawing boundaries in practice – in terms of timing and how far back you go… For scholarly career assessment do you go back further? And there was broad support for resubmission of 2014 case studies, but questions about “additionality” – could it be the same sort of impact or did it need to be something new or additional? So, we are working on those questions at the moment.

The other suggestion from the Stern Review was the idea of an institutional level assessment of impact, giving universities opportunities to showcase case studies that didn’t fall neatly elsewhere: case studies arising from multi- and interdisciplinary and collaborative work. That should be 10–20% of total impact case studies, with a minimum of one. But feedback has been unclear here, particularly on the conflation of interdisciplinary research with institutional profiles. There is concern also that the university might take over a case study that would otherwise sit in another unit.

So, the next step is communications in summer/autumn 2017. There will be a REF initial decisions document, a summary of consultation responses, and sharing of full consultation responses (with permission). And there will be a launch of our REF 2021 website and Twitter account.

Anne-Sofie Laegran (ASL), Knowledge Exchange Manager, College of Arts, Humanities and Social Sciences, University of Edinburgh

KM: Is resubmission better for some areas than others?

ASL: I think it depends on what you mean by resubmission… We have some good case studies arising from the same research as in 2014, but they are different impacts.

So… I will give you a view from the trenches. To start, I draw your attention to the University strapline, that we have been “Influencing the world since 1583”. But we have to demonstrate and evidence that, of course.

There has been an impact of impact in academia… When I started in 2008 it was about having conversations about the importance of having an impact; now it is much more about how you do this. There has been a culture change – all academic staff must consider the potential impact of their research. The challenge is not only to create impact but also to demonstrate it. There is also an incentive to show impact – it is part of career progression, recruitment, and promotion.

Impact of impact in academia has also been about training – how to develop pathways as well as how to capture and evidence impact. And there has been more support – expert staff as well as funding from funders and from the university.

In terms of developing pathways to impact we have borrowed questions that funders ask:

  • who may benefit from your research?
  • what might the benefits be?
  • what can you do to ensure potential beneficiaries and decision makers have the opportunity to engage and benefit?

And it is also – especially when capturing impact – about developing new skills and networks.

For instance… If you want to impact the NHS, who makes decisions, makes changes… If you are working with museums and galleries the decision makers will vary depending on where you can find that influence. And, for instance, you rarely partner with the Scottish Government, but you may influence NGOs who then influence Scottish Government.

Whatever the impact, it starts from excellent research, which leads to knowledge exchange – public engagement, influencing policy, informing professional practice and service delivery, technology transfer – and that results in impact. You don’t “do” impact: your work is used and has influence, and that then effects a change and an impact.

REF impact challenges include demonstrating change/benefit as opposed to reporting engagement activity; attributing that change to research; and providing robust evidence. In 2014 that was particularly tricky as the guidance only came in 2012, so people had to dig back… That should be less of an issue now, as we’ve been collecting evidence along the way…

Some cases that we think did well, and/or where feedback suggested they were doing well:

  • A College of Art scholar, who has a dual appointment at the National Galleries of Scotland. She curated the Impressionism and Scotland show, with over 100k visitors. There was good feedback that also generated debate. It changed how the gallery curates shows. And on the market, the works displayed went up in value – it had a real economic impact.
  • In law two researchers have been undertaking longitudinal work on young people, their lives, careers, and criminal careers. That is funded by Scottish Government. That research led to a new piece of policy based on the findings of that research. And there was a quote from Scottish Government showing a decline in youth crime, attributing that to the policy change, and which was based on research – showing that clear line of impact.
  • In sociology, a researcher wrote about the impact of research on the financial crisis for the London Review of Books, it was well received and he was named one of the most influential thinkers on the crisis; his work was translated to French; it was picked up in parliament; and Stephanie Flanders – then BBC economics editor – tweeted that this work was the most important on the financial crisis.
  • In music, researchers developed the Skoog, an instrument for disabled students to engage in music. They set up a company and attracted investment. At the time of the REF they had 6 employees and were selling to organisations – so reaching many people. The Skoog was also used in the Cultural Olympiad during the 2012 Olympics, showing that wider impact.

So for each of these you can see there was both activity, and impact here.

In terms of RepoFringe areas I was asked to talk about the role of repositories and open access. It is potentially important. But typically we don’t see impact coming from the scholarly publication itself; it’s usually the activities arising from the research or from that publication. Making work open access certainly isn’t enough on its own to trigger impact.

Social media can be important but it needs to have a high level of engagement, reach and/or significance to demonstrate more than mere dissemination. That Stephanie Flanders example wouldn’t be enough on its own; it works as part of demonstrating another impact, and as a good way to track impact, to understand your audience… And to follow up and see what happened next…

Metrics – there is no doubt that numeric evidence was important. Our head of research said last time that “numbers speak louder than adjectives”, but they have to be relevant and useful. You need context. Standardised metrics/altmetrics don’t work on their own – a big report recently concluded the same. Altmetrics are alternative metrics that can be tracked online, using the DOI. A company called Altmetric gathers that data, which can be useful to track (a small DOI lookup sketch follows below)… and can be manipulated by friends with big Twitter followings… It won’t replace case studies, but may be useful for tracking…
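
For the curious, a liveblogger’s sketch: Altmetric exposes a public API keyed by DOI, so a lookup can be as small as this (the DOI below is an arbitrary placeholder):

```python
# Small sketch: look up attention data for a DOI via Altmetric's public
# API. The DOI is an arbitrary placeholder.
import requests

doi = "10.1000/example.doi"  # placeholder
resp = requests.get(f"https://api.altmetric.com/v1/doi/{doi}", timeout=10)

if resp.status_code == 200:
    data = resp.json()
    print(data.get("title"), "| Altmetric score:", data.get("score"))
else:
    print("No attention data recorded for this DOI")
```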

In terms of the importance of impact… It relates to 20% of the REF score, and determined 26% of the funding in Scotland. Funding attracted per annum for the next 7 years:

  • 4* case study brings in £45-120k
  • 3* £9-25k
  • 2* £0
  • a 4* output, for comparison, is worth £7–15k…

The question that does come up is “what is impact?” – and yes, a single tweet could be evidence that someone has read and engaged with your work… But those big impact case studies are about making a real change and a range of impacts.

Pauline Jones (PJ), REF Manager and Head of Strategic Performance and Research Policy, University of Edinburgh

Thank you to Catriona and Anne-Sofie for introducing impact. I wanted to reinforce the idea that this is what we are doing anyway, making an impact on society, so it is important anyway, not only because of the REF.

Catriona suggested we had a “year off” but actually once REF happened we went into an intense period of evaluation and reflection, then of course the Stern review, consultation, general election… It has been quite non-stop. But actually even if that wasn’t all going on, we’d need our academics to be aware of the REF and of open access. I think open access is incredibly important, people are looking for it… Research is publicly funded… But it has required a lot of work to get up and running.

Although we are roughly at mid point between REFs, we are up and running, gathering impact, preparing to emphasise our impact. In terms of collecting evidence, depositing papers… That will happen in most universities. I think many will be doing the sort of Mock REFs/REF readiness exercises that we have been undertaking. We are also already thinking about drafting our case studies. As we get nearer to submission we’ll take decisions on inclusion… and getting everything ready.

So for REF 2021 we have a long time period over which the submission is prepared. There is no period over which outputs, impacts and environment don’t count. Academics are thinking now about what to include: a 2017 REF readiness exercise focuses on open access and numbers; a 2018 mock REF will focus on quality. And we all have to have a CRIS system now to make that work.

What’s new here? We are still waiting for the draft guidance to understand what’s happening. There are open access journal articles/conference proceedings. There are probably the challenges of submitting all research staff, and of decoupling the one-to-four staff-to-outputs ratio. That break is quite a good thing… Some researchers might struggle to create four key outputs – part time staff, staff with maternity leave, etc. But we want a sense of what that looks like from our mock/readiness work. The non-portability requirement seems useful and desirable, but speaking personally I think the researcher invests a lot – not just the institution – making that complex. Taking all of those together, I’m not sure the Stern aim of less complexity or burden holds, not alongside those changes.

And then we have the institutional impact case studies – we had a number of interdisciplinary examples of work, so we are comfortable with that possibility. The institutional environment is largely shared, so doing that once for the whole university could be a really helpful reduction in workload. And each new element will have implications for how CRIS systems support REF submissions.

And as we prepare for REF 2021 we also have to look to REF 2028. We think open data will be important given the Concordat on Open Data Research (signed by HEFCE; RCUK; Universities UK; Wellcome) so we can get ready now, ready for when that happens. I’m pretty confident that open access monographs will be part of the next REF (following Monographs and Open Access HEFCE report). Then there is institutional impact – may not happen here but may be back. And then there are metrics. We have The Metric Tide: Report of the Independent Review of the Role of Metrics in Research Assessment Management.

In terms of responsible metrics, we haven’t heard the last of them… There is the Forum for Responsible Metrics; data and metrics should support decisions, not be the sole driver; but the conversation will not end with The Metric Tide. Metrics are alluring but to date they haven’t worked well versus other types of evidence.

So, how do we prepare?

  • For REF 2021 we need to be agile, support research managers to help academics deposit work, and lobby CRIS system designers to deliver fit-for-purpose systems.
  • For REF 2028 we have to understand the benefits and challenges of making more research open
  • Be part of the conversation on responsible metrics – any bibliometrics experts in the room will stay busy.
  • And we want to have interoperability in systems…


Q1) How can we do something useful in terms of impact for case studies as our repository remit expands to different materials, different levels of openness, etc.

A1 – ASL) I think being easily accessible on University websites, making them findable… Then also perhaps improved search functionality, and some way to categorise what comes out… If creating things other than peer reviewed publications – “what is this?” type information. I might have been too negative about repositories because historically our data wasn’t in those… I think actually the sciences find that even more important…

Q1) For collecting evidence?

A1 – ASL) Yes, for collecting… Some have metrics that help us see how those impacts have worked.

A1 – PJ) We’ve been talking about how best to use our CRIS to improve join up and understand those impacts…

A1 – CF) I think it’s also about getting that rounded view of the researcher – their outputs, publications, etc. being captured as impacts alongside the outputs… That could be useful and valuable…

Q2) A common theme was the burden of this exercise… But could be argued that it drives positive changes… How can the REF add to the sector?

A2 – CF) Wearing my personal and former-job hat, as an impact officer, I did see the REF drive strategic investment in universities, including in public engagement, that rewards, recognises, and encourages more engagement with the community. There is real sharing of knowledge brought about by impact and the REF.

A2 – ASL) Totally agree.

A2 – PJ) More broadly the REF and RAE… They recognise the importance of research and of supporting researchers. For us we get £75M a year through research excellence funding. And we see the quality of research publications going up…

Q3) Do you have any comments on the academic community and how that supports the REF, particularly around data.

A3 – PJ) At Edinburgh we are very big – we submitted 1,800 staff, and we could have submitted up to 2,500. In my previous role we had much smaller numbers of research staff… So they are different challenges and different systems… We have spoken to our Informatics colleagues to see what we can do. There are definitely benefits at the level of building a system to manage this…

Q3) In an academic environment we have collegiate working practice, and need systems that work together.

A3 – PJ) We have a really distributed set-up at Edinburgh, so we are constantly having that conversation, and looking for cross-cutting areas, exchanging information…

Q4) The relationship with the researcher changes here… In previous years universities talked about “their research” but it was actually all structured around the individual. In this new model that shift is big, as are the role and responsibility of the organisation, and the ways that schools interact with their researchers…

A4 – ASL) You do see that in pre-funding application activity with internal peer review processes that build that collegiality within the organisation…

Q5) I was intrigued with the comment that lots of impact isn’t associated with outputs… So that raises questions about the importance of outputs in the REF. Should we rebalance the value of the output and how it is valued.

A5 – ASL) Perhaps. For example, when colleagues are providing evidence to government and parliament it is rare for publications to be referenced, and rare for publications to be read… I don’t think that means the publications don’t matter… They carry the methodology, the rigour, the evidence of the quality of the work, which then becomes briefing papers etc… Otherwise you and I could just write a paper – but that would be opinion. So you need that (hard to read) academic publication, and you have to acknowledge that these are different things with different roles – and that has to be demonstrated in the case studies.

A5 – CF) I think it’s an interesting question, especially thinking ahead to REF 2021… We are considering how those impacts on the field and impacts on wider society are represented – some blue skies research won’t have impact for many years to come…

Q6) I think a lay summary of a piece of work is so crucial. ScienceOpen and Jon Tennant are putting up lay summaries, and you have Kudos and other things contributing to that… The public want to understand what they are reading. I have personally sat on panels as a lay member and I know how hard it is, without that kind of lay summary, to understand what has taken place.

A6 – ASL) You do need that lay summary of work, or briefing paper, or expert communities which are not lay people… You have to think about audiences and communicating your work widely, and target it… I think repositories are useful to access work, but it’s not enough to put it there – just as it isn’t enough to put an article out there – you have to actively reach out to your audiences.

A6 – CF) I would agree and I would add that there is such a key need to help academics to do that, to support skills for writing lay summaries… Getting it clearer benefits the researcher, their thinking, and how they tell others about their work – that truly enables knowledge exchange.

A6 – PJ) And it benefits the academic audience too. I was listening to a podcast where academics from across disciplines compared papers to see which were most valuable, and being readable to a lay audience was a key factor in how those papers did.

10×10 presentations (Chair: Ianthe Sutherland, University Library & Collections)

  1. National Open Data and Open Science Policies in Europe. Martin Donnelly, DCC

I’m talking about some work we’ve done at DCC with SPARC Europe looking at Open Data and Policies across Europe.

The DCC is a centre of expertise in digital curation and data management. We maintain a watching brief on funders’ research data policies (largely focused on the UK). SPARC Europe is a membership organisation comprising academic institutions, library consortia, funding bodies, research institutes and publishers. Their goal is advocating change in scholarly communications for the benefit of research and society. We have been collaborating since 2016, looking at open data and open science policies across Europe.

So, what is a policy? Well the dictionary definition works, it’s a set of ideas or a plan of what to do in particular situations that has been agreed to officially by a group of people or an organisation.

In this work we looked at national policies – in some regions with a single research funder that could be the funder policy, but in the UK the AHRC wouldn’t count here as it is not a national policy across the whole country. The last known analysis of this sort dates back to 2013, and much has changed in that time.

We began by compiling and briefly describing a list of national policies in the EU and some ERA states (IS, NO, CH). We circulated that list for comment and additions. We also sought intelligence from contacts from DCC European projects, asking about the status of national approaches and forthcoming or existing policies, etc. We then attempted to classify the policies.

Across the thirteen countries we found: 6 funder policies; 4 national plans or roadmaps; 2 concordat-type documents; 2 laws; and one working paper. There are more than 13 because some countries have parallel documents. Identifying the lead, ranking or sponsoring organisation was not always straightforward; sometimes documents were co-signed by partners or groups. All of the policies discussed research data; 7 addressed open access publication explicitly; 6 addressed software, code, tools or models; 5 addressed methods, workflows or protocols; and one addressed physical (non-digital) samples.

Most policies were prescriptive or imperative. Monitoring of compliance and/or penalties are not that common. And these are new – only 2 policies pre-date 2014, though there are preceding open access policies. And new policies keep appearing as a result of our work… Two policies have been translated into English specifically because of this work (Estonia, Cyprus). The EC’s Open Research Data Pilot for Horizon 2020 was cited in multiple policy documents. And we hope that Brexit won’t diminish our role or engagement in European open data policy.

  2. IIIF: you can keep your head while all around are losing theirs! Scott Renton, University of Edinburgh

IIIF is the International Image Interoperability Framework, which enables you to use images in your cultural heritage resources. IIIF works through two APIs. You bring in images through the Image API via IIIF-compliant URLs – long URLs that include the region of the image, instructions for display, etc. (see the example below). The other API is the Presentation API, which is much more about curation, including the ability to curate collections of content – so you can structure these as, say, an image of a building that is related to images of the rooms in that building.
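
For reference, those “long URLs” follow the fixed template from the IIIF Image API spec; the server and identifier below are made-up examples:

```
{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}

https://images.example.ac.uk/iiif/skull-0001/full/!400,400/0/default.jpg
  -> the whole image, scaled to fit within 400x400, unrotated, default
     quality, delivered as JPEG
```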

We have images in Luna and we pushed Luna to support IIIF, with success: we implemented IIIF in December. We made a lot of progress and have IIIF websites online. The workflows are really complex, but it allows us to maintain one set of images and metadata through these embedded images, rather than having to copy and duplicate work. And those images are zoomable, draggable, etc. Metadata Games is also IIIF compliant. And it is feeding into our websites, including the new St Cecilia’s Hall museum website.

Our next implementation was the Coimbra virtual implementation – which includes other people’s images. For our images, and for other IIIF-compliant organisations, that was easy, but we had to set up our own image server (running Cantaloupe) to manage those images from others.

The next challenge was the Mahabharata scroll. It is a huge document, but the IIIF spec and Luna allow us to programme a sequence of viewers…

And our main achievement has been Polyanno, which allows annotations that can then be stored in manifests, uploaded and discussed. It’s proving very popular with the IIIF community. We have a huge number of images to convert to IIIF, but lots of plans, lots of ideas, and lots to do…

We are also collaborating with the NLS around their content, and are happy to talk with others about IIIF!

  3. Reference Rot in theses: a HiberActive pilot. Nicola Osborne, EDINA, University of Edinburgh

This was my presentation – so notes from me here but some links to Site2Cite, a working demo/pilot tool for researchers to proactively archive their web citations as they are doing their research, to ensure that by the time they submit their PhD, have their work published, or begin follow up work, they still have access to those important resources.

Introducing Site2Cite:

Try out the Site2Cite tools for yourself here:

You can view my full slides (slightly updated to make more sense for those who didn’t hear the accompanying talk) from the 10×10 here:

This ISG Innovation Fund pilot project builds upon our previous Andrew W. Mellon-funded Hiberlink project, a collaboration between EDINA, Los Alamos National Laboratory, and the University of Edinburgh School of Informatics. The Hiberlink project built on and worked with Herbert Van de Sompel’s Memento work. (A small illustration of the proactive archiving idea follows.)
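
To make the idea concrete – this is a sketch of the general approach, not the Site2Cite implementation itself – a web citation can be proactively archived at the point of use via the Internet Archive’s “Save Page Now” endpoint, with the snapshot URL recorded alongside the original reference (the cited URL below is a made-up example):

```python
# Sketch of proactively archiving a cited web page via the Internet
# Archive's "Save Page Now" endpoint. The cited URL is a made-up
# example; Site2Cite's actual workflow may differ.
import requests

cited_url = "https://www.example.org/policy-report-2017"
resp = requests.get(f"https://web.archive.org/save/{cited_url}", timeout=120)
resp.raise_for_status()

# The snapshot location has typically been returned in Content-Location
snapshot = resp.headers.get("Content-Location")
if snapshot:
    print(f"Archived copy: https://web.archive.org{snapshot}")
```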

  4. Lifting the lid on global research impact: implementation and analysis of a Request a Copy service. Dimity Flanagan, London School of Economics and Political Science

Apologies for missing the first few minutes of Dimity’s talk…

LSE have only recently implemented the “request a copy” button in the repository but, having done that, Dimity and colleagues have been researching how it is used.

We’ve had about 500 requests so far. The most popular requests have been in international relations, law and media. And we see demand from organisations and governments – including requests explicitly stating that they do not subscribe to the journal and that they felt the paper was crucial to their work. There is potential impact being revealed here, in requests for articles ahead of key meetings and events, etc.

And these requests show huge reach, from organisations locally and around the world.

One thing we have noticed is that we get a lot of requests from students who can definitely access the version of record through journals subscribed to by their university – they don’t realise and that causes avoidable delay. We have also seen academics linking from reading lists to restricted items in repositories. But, on a more positive note, we’ve had lots of requests from our alumni – 70% of our alumni are international and that shows really positive impact for our work.

Overall this button and the evidence that requests provide has been really positive.

  5. What RADAR did next: developing a peer review process for research plans. Nicola Siminson, Glasgow School of Art

RADAR captures performances and exhibitions, as well as traditional articles, monographs etc. It is hosted on EPrints. And we encourage staff to add as much metadata as possible. But increasingly it is being used internally, with staff developing annual research plans (ARPs) that feed into allocations in the year ahead.

The ARPs arose in part from the outcome of the REF 2014 assessment. They are peer reviewed (but not openly available), and aim to enable research time to be allocated more effectively with a view to maximising the number of high quality submissions to the next REF. RADAR houses the template as it played a key role in the GSA REF 2014 submission, and staff already use and know the system.

The template went live in 2015, and was tweaked, trialled and relaunched in February 2015. The ARP template captures the research, the researcher’s details, and the expected impact of their work – plus a submit process. The process was really quite manual, so we thought carefully about how this should work… Once submitted, the digital ARP went into a manual process. Once piloted, we built the peer review process into RADAR, including access management that allows the researcher sole access until submission, and then manages access back and forth as required.

We discussed this work with EPrints in Autumn 2016 and development commenced in Spring 2017. This was quite an involved process. The system was live in time for ARP panel chairs to send feedback and results.

So the process now sees ARPs submitted; RADAR admin provides the Head of Research with a report of all ARPs submitted. Then it goes through a series of review and feedback stages.

So administrators can view ARPs, panels, status, etc. and there is space for reviews to be captured and the outcome to be shared.

Lessons learned here… No matter how much testing you have done, you’ll still need to tweak and flag things – it’s useful to have a keen researcher to test it and feed back on those tweaks. We still need to increase the prominence of the summary and decision for the researcher, with more differentiated fields for peer reviews, etc. In conclusion, the ARP peer review process has been integrated into RADAR and will be fully tested next year. The continued development of RADAR is bearing fruit – researchers are using the repository and adding more outputs, giving greater visibility and downloads for GSA.

Explore our repository at

  6. Edinburgh DataVault: Local implementation of Jisc DataVault: the value of testing. Pauline Ward, EDINA

I am Pauline Ward from the Research Data Service at the University of Edinburgh, based at EDINA, which is part of the University. Jisc commissioned UoE’s Library and University Collections (L&UC) team to design a service for researchers to store data for the long term with the Jisc DataVault. And we’ve now implemented a version of this at Edinburgh – using that software from L&UC, specified and managed by EDINA.

The DataVault allows safe data storage in the University’s archival storage option, linking this data to a metadata record in Pure without having to re-enter any of the metadata. And, optionally, researchers can receive a DOI for the data which can be used in publications and other outputs – depending on the context and the appropriate visibility of the data. That allows preservation of data at the University. The DataVault is not for making data public – we have a service called DataShare for that.

So, let’s talk about metadata… We push that metadata to Pure and keep DataVault metadata as concise as possible. We need metadata that is usable and have some manual intervention to check and curate that.

We had a fairly extensive user testing process, to ensure documentation works well, then we also recruited academics from across the University to bring us their data and test the system to help us ensure it met their needs.

So, the interim version is out there, and we are continuing to develop and improve it.

Data Management & Preservation using PURE and Archivematica at Strathclyde. Alan Morrisson, University of Strathclyde

We are governed by and based in the research department. We wanted to look at both research data management and long-term preservation, including reflecting on whether Pure is the right tool for the job. Pure was already in use at Strathclyde when our Research Data Deposit Policy was being developed, so we deliberately made the policy as open as possible. Strathclyde is predominantly a STEM university, and we started off by surveying what else was out there… We knew the quantity and type of data coming in…

And since we opened up the service, in terms of data deposits to date we have seen a steady increase, from about 200 to 400 datasets over the last year.

In terms of our preservation and curation systems we have Pure in place, and that does a lot – data storage, metadata, DOIs etc. But we've also recently implemented Archivematica – it's free, it's open source, and it's compatible with the Jisc DataVault. So the workflow right now is that data, metadata and related outputs are added to Pure, and a DOI is minted. This feeds the Knowledgebase portal. In parallel, the data from Pure goes to Archivematica, where it is ingested and processed for preservation, and the AIP's METS file is cleaned using METSFlask before being stored.

The benefit of this set-up is that Pure is familiar to researchers, does a good job of metadata management and related content, and has a customised front end (Knowledgebase). Archivematica is well supported, open source, and designed for archiving. But those systems don't work together – we are manually moving data across. Pure is designed for storage and presentation, not curation. And Archivematica only recognises about 40% of the data.

So, in the future we are reviewing our systems, perhaps using Pure for metadata only. We are keeping an eye on Jisc RDSS and considering possible Arkivum-like storage. And we are generally looking at what is possible and most appropriate for curation and archiving moving forward.

Open Access… From Oblivion… To the Spotlight? Dawn Hibbert, University of Northampton

I’ll be looking back over the last ten years… And actually ten years back I was working here in Accommodation Services, so not thinking about repositories at all!

Looking back at 2007/8, in the repository world we had our NECTAR repository. Then in 2011 a Jisc-funded project enabled an author deposit tool for NECTAR. At that time we had a carrot/incentive for deposit, but no stick – which was actually a nice thing, as we've now slipped towards it all being about the REF.

By 2012/13 we were engaging with our researchers around open access, and hearing feedback such as "it's in the library – you can get a copy from there", or "it's only £30 to buy the journal I publish in; if I make my article free the journal will go under", or "my work is not funded by RCUK, so why should it be open access?". We wanted everything open… But by 2014/15 (and the HEFCE announcement) we were still getting "I don't have to give you anything until 2016" and similar… And we get that idea of "it's all about the REF". It is not. Using the REF, and the repository, in that way overlooks the other benefits of open access.

So in 2016/17 HEFCE compliance started, and attitudes have shifted. But the focus has all been on gold APCs and the idea of the university paying, when actually we are using the HEFCE deposit-then-(later)-open-access green OA route. And we really want researchers to deposit much more than just the open access part (we can deal with that later on).

So, in 2017 and beyond we are looking at emphasising the benefits, sharing that information, and being positive about the opportunities, not just using the HEFCE stick. For open access work we are looking at improving acceptance, extending open access to other outputs, and focusing on the visibility of research outputs – Kudos-type tools. And we are shifting the focus to digital preservation.

We are looking at datasets being open access too, with RDM and digital preservation gaining ground. And when work is deposited, shared, tweeted, etc., that can really shift attitudes and show the benefits of engagement to academic colleagues.

But we still see lots of money spent on APCs and journal subscriptions. And we have yet to see what happens with RCUK and REF compliance.

Automated metadata collection from the researcher CV Lattes Platform to aid IR ingest. Chloe Furnival, Universidade Federal de São Carlos

I am pleased to present work by myself and my colleagues from São Paulo in Brazil. Back in 1999 all Brazilian universities were required to share CVs of their research and academic staff on a platform (Currículo Lattes) which now has over 2 million records.

However, our university's repository was only launched in 2016. Unlike many universities, which use Web of Science or Scopus to capture their researchers' work, we saw that the Lattes CV Platform held the key and most up-to-date metadata – always kept current, as funders require. It is a really useful stepping stone for identifying our staff's publications for the initial repository.

Two very well known researchers, Mena-Chalco and Cesar Jr (2013), developed ScriptLattes for this kind of extraction. But then CNPq decided to implement a CAPTCHA, which blocks that script. They alleged this was for security reasons, but it created an uproar, as the CVs were seen as "our data"… This has all been very complicated and impacted our plans to identify our own researchers' work… So we went for SOAP (Simple Object Access Protocol) instead. We also developed a proxy server, based on the OpenResty platform, to deal with CNPq's limits and to share access to the Lattes SOAP web services. That lets us manage our local IP address and manage load/avoid going over capacity.
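
To make the SOAP route concrete, here is a minimal sketch of fetching one CV through a rate-limiting proxy using the Python zeep library. The proxy hostname, WSDL path and the getCurriculoCompactado method name are illustrative assumptions, not the documented CNPq contract – check the web service documentation for the real details.

```python
# Minimal sketch: fetch one researcher's CV XML from the Lattes SOAP
# web service via an institutional rate-limiting proxy.
# The proxy URL, WSDL path and method name are assumptions for
# illustration only. Requires: pip install zeep
from zeep import Client

PROXY_WSDL = "http://lattes-proxy.example.edu/srvcurriculo/WSCurriculo?wsdl"
client = Client(PROXY_WSDL)

def fetch_cv(lattes_id: str) -> bytes:
    """Return the compressed CV XML for one Lattes ID."""
    return client.service.getCurriculoCompactado(lattes_id)
```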

We extract the data in XML format, then process it in Python to generate Dublin Core. Then we use another script to eliminate duplicates, using the Jaccard measure to detect near-identical records… Once processed, it is held in DSpace. Each record in Lattes has a unique identifier, as that site uses an ID number that all Brazilians are required to have to access e.g. a bank account.
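
As a rough illustration of that deduplication step, here is a minimal sketch comparing Dublin Core titles as token sets with the Jaccard measure; the "dc.title" key and the 0.9 threshold are assumptions, not the team's actual parameters.

```python
# Minimal sketch of Jaccard-based deduplication of Dublin Core records.
# The "dc.title" key and the 0.9 threshold are illustrative assumptions.

def tokens(title: str) -> set:
    return set(title.lower().split())

def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|; 1.0 means identical token sets.
    return len(a & b) / len(a | b) if (a or b) else 1.0

def deduplicate(records: list, threshold: float = 0.9) -> list:
    """Keep the first of any group of records whose titles nearly match."""
    kept = []
    for rec in records:
        t = tokens(rec["dc.title"])
        if all(jaccard(t, tokens(k["dc.title"])) < threshold for k in kept):
            kept.append(rec)
    return kept
```

Pairwise comparison like this is quadratic in the number of records, so for a corpus of 78K items a real pipeline would presumably block records first (e.g. by year or first author) before comparing within blocks.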

So now: the CVs of the 1,166 teaching staff and researchers working at our HEI were retrieved in just 11 minutes, including metadata for 78K journal articles and proceedings papers. We had the specific objective of gaining direct and official access to the public metadata held in Lattes CVs.

The Changing Face of Goldsmiths Research Online. Jeremiah Spillane, Goldsmiths, University of London

JS: Goldsmiths Research Online started as a vanilla install of EPrints, and it has become more and more customised over time. Important to that development have been several projects. The Jisc Kultur project created a transferable and sustainable institutional repository model for research output in the creative and applied arts, creating a facility for capturing multimedia content in repositories.

Kultur led to the Jisc Kaptur project, led by VADS working with various art colleges including Goldsmiths and GSA.

Then in 2009 we had the Defiant Objects project which looked to understand what makes some objects more difficult to deposit than others.

Jeremiah's colleague: RAE/REF work has looked at policy versus the open access ethos – and striking the right balance there. The Goldsmiths website now includes content brought in from the repository, organised depending on the needs of different departments. We are also redesigning the website to better embed content and enable exploration of visual material. The new design should be in place by autumn this year.

Speaking of design… We have been working with OJS, but have wanted more design control over OJS journals, so we have a new journal coming, Volupté, which runs on OJS in the background but uses Squarespace at the front end – that's a bit of an experiment at the moment.

JS: So, the repository continues to develop, whilst our end users primarily focus on their research.

Take a look at:

And with that Day One, and my visit to Repository Fringe 2017, is done. 

Jun 302017

Today I'm at ReCon 2017, giving a presentation later today (flying the flag for the unconference sessions!), but also looking forward to a day full of interesting presentations on publishing for early career researchers.

I’ll be liveblogging (except for my session) and, as usual, comments, additions, corrections, etc. are welcomed. 

Jo Young, Director of the Scientific Editing Company, is introducing the day and thanking the various ReCon sponsors. She notes: ReCon started about five years ago (with a slightly different name). We’ve had really successful events – and you can explore them all online. We have had a really stellar list of speakers over the years! And on that note…

Graham Steel: We wanted to cover publishing at all stages, from preparing for publication, submission, journals, open journals, metrics, alt metrics, etc. So our first speakers are really from the mid point in that process.

SESSION ONE: Publishing’s future: Disruption and Evolution within the Industry

100% Open Access by 2020 or disrupting the present scholarly comms landscape: you can’t have both? A mid-way update – Pablo De Castro, Open Access Advocacy Librarian, University of Strathclyde

It is an honour to be at this well attended event today; thank you for the invitation. It's a long title, but I will be talking about how things are progressing towards this goal of full open access by 2020, and to what extent institutions, funders, etc. are able to introduce disruption into the industry…

So, a quick introduction to me. I am currently at the University of Strathclyde library, having joined in January. It's quite an old university (founded 1796) and a medium-sized one. Before that I was working in The Hague on the EC FP7 Post-Grant Open Access Pilot (OpenAIRE), providing funding to cover OA publishing fees for publications arising from completed FP7 projects. Maybe not the most popular topic in the UK right now but… The main point of explaining my context is that the EU work gave me more of a funder's perspective, and now I'm able to compare that to more of an institutional perspective. As a result of this pilot there was a report, by a British consultant: "Towards a competitive and sustainable open access publishing market in Europe".

One key element in this EU open access pilot was the OA policy guidelines, which acted as key drivers and made the eligibility criteria very clear. Notable here: publications in hybrid journals would not be funded, only fully open access journals; and there was a cap of no more than €2,000 for research articles and €6,000 for monographs. That was an attempt to shape costs and ensure the accessibility of research publications.

So, now I'm back at the institutional open access coalface. Lots had changed in two years, and it's great to be back in this space. It is allowing me to explore ways to better align institutional and funder positions on open access.

So, why open access? Well, in part this is about more exposure for your work, higher citation rates, and compliance with grant rules. But it's also about use and reuse – by researchers in developing countries, by practitioners who can apply your work, by policy makers, and by the public and taxpayers who can access your work. In terms of the wider open access picture in Europe, there was a meeting in Brussels last May where European leaders called for immediate open access to all scientific papers by 2020. It's not easy to achieve that, but it does provide a major driver… However, EU member states have different levels of open access. The UK, the Netherlands, Sweden and others prefer "gold" access, whilst Belgium, Cyprus, Denmark, Greece, etc. prefer "green" access, partly because the cost of gold open access is prohibitive.

Funders' policies are a really significant driver towards open access. Funders include Arthritis Research UK, Bloodwise, Cancer Research UK, Breast Cancer Now, the British Heart Foundation, Parkinson's UK, the Wellcome Trust, Research Councils UK, HEFCE, the European Commission, etc. Most support green and gold, and will pay APCs (Article Processing Charges), but it's fair to say that early career researchers are not always at the front of the queue for getting those paid. HEFCE in particular have a green open access policy: research outputs from any part of the university must be deposited, or they will not be eligible for the REF (Research Excellence Framework). As a result, compliance levels are high – probably the top of Europe at the moment. The European Commission supports green and gold open access, but typically green, as this is more affordable.

So, there is a need for quick progress at the same time as ongoing pressure on library budgets – we pay both for subscriptions and for APCs. Offsetting agreements, which discount subscriptions by the APC charges paid, could be a good solution. There are pros and cons here. In principle they allow quicker progress towards OA goals, but they disproportionately benefit legacy publishers. They bring publishers into APC reporting – costs that are sometimes invisible to the library right now because they are paid directly by researchers – so this is a shift and a challenge. Offsetting is supposed to be a temporary stage towards full open access. And it's a very expensive intermediate stage: not every country can or will afford it.

So how can disruption happen? Well, one way is through policy – declining to fund hybrid journals (as done in OpenAIRE). And disruption is happening (legal or otherwise), as we can see in Sci-Hub usage, which comes from all around the world, not just developing countries. Legal routes are possible in licensing negotiations: in Germany there is Projekt DEAL being negotiated, and this follows similar negotiations by open… At the moment Elsevier is the only publisher not willing to include open access journals.

In terms of tools… The EU has just announced plans to launch its own platform for funded research to be published. And the Wellcome Trust already has a space like this.

So, some conclusions… Open access is unstoppable now, but it still needs to generate sustainable and competitive implementation mechanisms. It is getting more complex, and difficult to communicate to researchers – that's a serious risk. Open access will happen via a combination of strategies and routes – internal fights (e.g. green vs gold) just aren't useful. The temporary stage towards full open access needs to benefit library budgets sooner rather than later. And the power here really lies with researchers, whom OA advocates aren't always able to keep informed. It is important that you know which journals are open and which are hybrid, and why that matters. And we need to ask whether informing authors about where it would make economic sense to publish is beyond the remit of institutional libraries.

To finish, some recommended reading:

  • “Early Career Researchers: the Harbingers of Change” – Final report from Ciber, August 2016
  • “My Top 9 Reasons to Publish Open Access” – a great set of slides.


Q1) It was interesting to hear about offsetting. Are those agreements one-off? continuous? renewed?

A1) At the moment they are one-off and intended to be a temporary measure. But they will probably mostly get renewed… National governments and consortia want to understand how useful they are, how they work.

Q2) Can you explain green open access and gold open access and the difference?

A2) In gold open access, the author pays to make their paper open on the journal website. If that's a hybrid – i.e. subscription – journal, you essentially pay twice: once to subscribe, once to make it open. Green open access means that your article goes into your repository (after any embargo), into the worldwide repository landscape (see:

Q3) As much as I agree that choices of where to publish are for researchers, there are other factors – the REF pressures you to publish in particular ways. Where can you find more on the relationships between different types of open access and impact? I think that could help.

A3) There are quite a number of studies – for instance on whether APCs relate to impact factor. In terms of the REF, funders like Wellcome are desperate to move away from the impact factor. It is hard, but things are evolving.

Inputs, Outputs and emergent properties: The new Scientometrics – Phill Jones, Director of Publishing Innovation, Digital Science

Scientometrics is essentially the study of science metrics and evaluation of these. As Graham mentioned in his introduction, there is a whole complicated lifecycle and process of publishing. And what I will talk about spans that whole process.

But, to start, a bit about me and Digital Science. We were founded in 2011 and are wholly owned by the Holtzbrinck Publishing Group, who owned the Nature group. Being privately funded, we are able to invest in innovation by researchers, for researchers, trying to create change from the ground up – things like Labguru, a lab notebook (like RSpace); Altmetric; Figshare; ReadCube; Peerwith; Transcriptic, an IoT company; etc.

So, I'm going to introduce a concept: the Evaluation Gap. This is the difference between the metrics and indicators currently or traditionally available, and the information that those evaluating your research might actually want to know. Funders; tenure, hiring and promotion panels; universities – your institution, your office of research management; governments, funders and policy organisations: all want to achieve something with your research…

So, how do we close the evaluation gap? Enter altmetrics, which add to academic impact other types of societal impact – policy documents, grey literature, mentions in blogs, peer review mentions, social media, etc. What else can you look at? Well, you can look at grants being awarded… A grant is awarded for a new idea, the grant-holder publishes, someone else picks that up and publishes… That can take a long time, so grants can tell us things before publications do. You can also look at patents – a measure of commercialisation and potential economic impact further down the line.

So you see an idea germinate in one place, work with collaborators at the institution, spreading out to researchers at other institutions, and gradually out into the big wide world… As that idea travels outward it gathers more metadata, more impact, more associated materials, ideas, etc.

And at Digital Science we have innovators working across that landscape, along that scholarly lifecycle… But there is no point having that much data if you can't understand and analyse it, and to do that you have to classify the data first. Historically that was done by subject area, but increasingly research is interdisciplinary; it crosses different fields. So single tags/subjects are not useful – you need a proper taxonomy. And there are various ways to build one. You need keywords and semantic modelling, and you can choose to:

  1. Use an existing one if available, e.g. MeSH (Medical Subject Headings).
  2. Consult with subject matter experts (the traditional way to do this, could be editors, researchers, faculty, librarians who you’d just ask “what are the keywords that describe computational social science”).
  3. Text mine abstracts or full-text articles (using the content to create a list from your corpus with bag-of-words/word-frequency approaches, for instance, to help you cluster and find ideas, with a taxonomy emerging – see the sketch below).
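
By way of illustration of that third option, here is a minimal sketch using scikit-learn; the toy abstracts and the cluster count are assumptions, and a real corpus would need far more careful preprocessing than this.

```python
# Minimal sketch: bag-of-words clustering of abstracts so that candidate
# taxonomy terms emerge from the corpus itself. The example abstracts
# and the cluster count are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

abstracts = [
    "gravitational wave detection with laser interferometry",
    "social media analysis of political discourse",
    "laser optics for interferometric sensing",
    "discourse analysis of online political communities",
]

vectoriser = TfidfVectorizer(stop_words="english")
X = vectoriser.fit_transform(abstracts)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# The highest-weighted terms in each cluster centroid become candidate
# keywords for an emergent taxonomy.
terms = vectoriser.get_feature_names_out()
for i, centroid in enumerate(km.cluster_centers_):
    top = centroid.argsort()[::-1][:3]
    print(f"cluster {i}:", [terms[j] for j in top])
```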

Now, we are taking that text mining approach. But to be of use the data needs to be cleaned and curated. So we hand-curated a list of institutions to go into GRID, the Global Research Identifier Database, to understand organisations and their relationships. Once you have that all mapped you can look at ISNI, the CrossRef databases, etc. And when you have that organisational information you can include georeferences to visualise where organisations are…
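
As a small sketch of that last step, assuming the organisation data has been flattened to a CSV with hypothetical grid_id/name/lat/lng columns (the real GRID release is richer than this):

```python
# Minimal sketch: load identifier-to-location mappings, ready for
# plotting on a map. The CSV layout (grid_id, name, lat, lng) is a
# simplifying assumption about an export, not the real GRID schema.
import csv

def load_orgs(path: str = "grid_orgs.csv") -> dict:
    with open(path, newline="", encoding="utf-8") as f:
        return {row["grid_id"]: (row["name"], float(row["lat"]), float(row["lng"]))
                for row in csv.DictReader(f)}

for grid_id, (name, lat, lng) in load_orgs().items():
    # These points can be fed to any mapping library to visualise
    # where organisations are.
    print(f"{grid_id}: {name} @ ({lat}, {lng})")
```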

An example that we built for HEFCE was the Digital Science BrainScan. The UK has a dual funding model: there is both direct funding and block funding, with the latter awarded by HEFCE and distributed according to the most impactful research as understood via the REF. So, for BrainScan, we mapped research areas, connections, etc. to visualise subject areas, their impact, and clusters of strong collaboration – to see where there are good opportunities for funding…

Similarly, we visualised text-mined impact statements across the whole corpus. Each impact statement is captured as a coloured dot; clusters show similarity, and where things are far apart there is less similarity. That can highlight where there is a lot of work on, for instance, the management of rivers and waterways – connections that weren't obvious because they sit across disciplines…


Q1) Who do you think benefits the most from this kind of information?

A1) In the consultancy we have clients across the spectrum. In the past we have mainly worked for funders and policy makers to track effectiveness. Increasingly we are talking to institutions wanting to understand their strengths and predict trends, and to publishers wanting to understand whether journals should be split or consolidated, and whether there are opportunities they are missing. Each can benefit enormously, and it makes the whole system more efficient.

Against capital – Stuart Lawson, Birkbeck, University of London

So, my talk will be a bit different. The arguments I will be making are not in opposition to those of the other speakers here, but are about critically addressing the ways we currently work and how publishing works. I have chosen to speak on this topic today because I think it is important to make visible the political positions that underlie our assumptions and the systems we have in place. There are calls to become more efficient, but I disagree… Ownership and governance matter at least as much as the outcome.

I am an advocate for open access, and I am currently undertaking a PhD looking at open access and how our discourse around it has been co-opted by neoliberal capitalism. I believe these issues aren't technical but social, reflecting inequalities in our society; any company claiming to benefit society while operating as a commercial company should raise questions for us.

Neoliberalism is a political project to reshape all social relations to conform to the logic of capital (this is the only slide; apparently a written and referenced copy will be posted on Stuart's blog). This system turns us all into capital, entrepreneurs of ourselves – quantification and metricification, whether through tuition fees that put a price on education and turn students into consumers selecting on rational indicators of future income, or through pitting universities against each other rather than encouraging collaboration. It isn't just overtly commercial; it is about applying ideas of the market to all elements of our work – high impact factor journals, metrics, etc. in the service of proving our worth. If we do need metrics, they should be open and nuanced; but if we only do metrics for people's own careers, and perform for careers and promotion, then these play into neoliberal ideas of control. I fully understand the pressure – it is hard to live and do research without engaging in and playing the game. It is easier to choose not to play if you are in a position of privilege, and that reflects and maintains inequalities in our organisations.

Since power relations are often about labour and worth, this is inevitably part of work and the value of labour. When we hear about disruption in the context of Uber, it is about disrupting the rights of workers and labour unions; it ignores the needs of the people who do the work – it is a neoliberal idea. I would recommend seeing Audrey Watters' recent presentation for the University of Edinburgh on the "Uberisation of Education".

The power of capital in scholarly publishing, and neoliberal values in our scholarly processes… When disruptors align with the political forces that need to be dismantled, I don't see that as useful or properly disruptive. Open access itself is a good thing, but there are two main strands of policy… Research Councils have spent over £80m paying researchers' APCs. Publishing open access does not have to involve fees – there are OA journals funded in other ways. But if you want the high-end, visible journals, they are often hybrid journals, and 80% of that RCUK money has gone on hybrid journals. So work is being made open access, but right now this money flows from public funds to a small group of publishers – who take a 30–40% profit – and that system was set up to continue benefitting publishers. You can instead share or publish to repositories, which are free to deposit in and use. My concern with OA policy is its connection to the REF: it constrains where you can publish, and everything must be measured within this restricted structure. It can be seen as compliance rather than a progressive movement towards social justice. But open access is having a really positive impact on the accessibility of research.

If you are angry at Elsevier, then you should also be angry at Oxford University and Cambridge University, and others, for their relationships to the power elite. Harvard made a loud statement about journal pricing… It sounded good, and they have a progressive open access policy… But it is also bullshit – they have huge amounts of money… There are huge inequalities here, in academia and in its relationship to publishing.

And I would strongly recommend reading some history on the inequalities, and on the racism and capitalism that were inherent to the founding of higher education, so that we can critically reflect on what type of system we really want for discovering and sharing scholarly work. Things have evolved over time – somewhat inevitably – but we need to be more deliberative, so that universities are more accountable in their work.

To end on a more positive note, technology is enabling all sorts of new and inexpensive ways to publish and share, and we don't need to depend on venture capital. Collective and cooperative running of organisations in these spaces – such as cooperative centres for research – is possible; there are small-scale examples that show the principles, and that this can work. Writing, reviewing and editing is already being done by the academic community; let's build governance and process models to continue that, to make it work, to ensure work is rewarded but that the driver isn't commercial.


Comment) That was awesome. A lot of us here will be learning how to play the game. But the game sucks. I am a professor; I get to do a lot of fun things now because I played the game… We need a way for people to be able to do their work without that game. But we need something more specific than socialism… Libraries used to publish academic data… Lots of these metrics are there and useful, and I work with them… But I am conscious that we will be fucked by them. We need a way to react to that.

Redesigning Science for the Internet Generation – Gemma Milne, Co-Founder, Science Disrupt

Science Disrupt runs regular podcasts, events, and a Slack channel for scientists, start-ups, VCs, etc. – check out our website. We talk about five focus areas of science. Today I want to talk about redesigning science for the internet age. My day job is in journalism, and I think a lot about start-ups, about how we can influence academia, and about how success manifests itself in the internet age.

So, what am I talking about? Things like Pavegen – power generating paving stones. They are all over the news! The press love them! BUT the science does not work, the physics does not work…

I don't know if you heard about Theranos, which promised all sorts of medical testing from one drop of blood, attracted millions in investment, and then fell apart. But its founder too got tons of coverage…

I really like science start-ups, and I like talking about science in a different way… But how can I convince the press and the wider audience what is good stuff, and what is just hype, not real? One of the problems we face is that people not engaged in research either can't access the science, or can't read it even when they can access it… This problem is really big, and it influences where money goes and what sort of work gets done!

So, how can we change this? There are amazing tools to help (Authorea, Overleaf, Figshare, Publons, LabWorm), and this is great and exciting. But I feel it is very short-term – trying to change something that doesn't work anyway… Doing collaborative lab notes a bit better, publishing a bit faster… OK… But is it good for sharing science? Thinking about journalists and corporates: they don't care about academic publishing, and it's not where they go for scientific information. How do we rethink that? What if we were to rethink how we share science altogether?

AirBnB and Amazon are on my slide here to make the point of the difference between incremental change and real change. AirBnB addressed issues with hotels – hotels being samey… They didn't build a hotel; instead they thought about what people want when they travel, what matters to them… Similarly, Amazon didn't try to incrementally improve supermarkets… They did something different. They dug down to the bottom of why something exists and rethought it…

Imagine science was “invented” today (ignore all the realities of why that’s impossible). But imagine we think of this thing, we have to design it… How do we start? How will I ask questions, find others who ask questions…

So, a bit of a thought experiment here… Maybe I’d post a question on reddit, set up my own sub-reddit. I’d ask questions, ask why they are interested… Create a big thread. And if I have a lot of people, maybe I’ll have a Slack with various channels about all the facets around a question, invite people in… Use the group to project manage this project… OK, I have a team… Maybe I create a Meet Up Group for that same question… Get people to join… Maybe 200 people are now gathered and interested… You gather all these folk into one place. Now we want to analyse ideas. Maybe I share my question and initial code on GitHub, find collaborators… And share the code, make it open… Maybe it can be reused… It has been collaborative at every stage of the journey… Then maybe I want to build a microscope or something… I’d find the right people, I’d ask them to join my Autodesk 360 to collaboratively build engineering drawings for fabrication… So maybe we’ve answered our initial question… So maybe I blog that, and then I tweet that…

The point I’m trying to make is, there are so many tools out there for collaboration, for sharing… Why aren’t more researchers using these tools that are already there? Rather than designing new tools… These are all ways to engage and share what you do, rather than just publishing those articles in those journals…

So, maybe publishing isn't the way at all? I get the "game", but I am frustrated about how we properly engage and really get work out there – getting industry to understand what is going on. There are lots of people innovating in new ways… There is stuff in papers that isn't being picked up… But see what else you can do!

So, what now? I know people are starved for time… But if you want to really make that impact that you think is most interesting… I understand there are concerns around scooping, but there are ways around that… And if you want to know about all these tools, do come and talk to me!


Q1) I think you are spot on with vision. We want faster more collaborative production. But what is missing from those tools is that they are not designed for researchers, they are not designed for publishing. Those systems are ephemeral… They don’t have DOIs and they aren’t persistent. For me it’s a bench to web pipeline…

A1) Then why not create a persistent archived URI – a webpage where all of a project's content is shared? 50% of all academic papers are read only by the people who published them… These stumbling blocks in the way of sharing are crazy… We shouldn't just stop and not share.

Q2) Thank you – that has given me a lot of food for thought. The issue of work not being read: I've been told that by funders, so it's very relevant to me. So, how do we influence the professors? As a PhD student I haven't heard about many of these online tools…

A2) My co-founder of Science Disrupt is a computational biologist and PhD student… My response would be about not asking, just doing… Find networks, find people doing what you want. Benefit from collaboration. Sign an NDA if needed. Find the opportunity, then come back…

Q3) I had a comment and a question. Code repositories like GitHub are persistent, and you can find a great list of code repositories and meta-articles around those in the Journal of Open Research Software. My question was about AirBnB and Amazon… They have made huge changes, but I think the narrative they use now is different from where they started – they began with more incremental change and stumbled onto bigger things, which looks a lot like research… So how do you make the case for the potential long-term impact of your work in a really engaging way?

A3) That is the golden question. We need to find case studies, interesting examples, a way to showcase similar examples and how they led to things… Forget big pictures, jump the hurdles… Or rather: show the bigger picture that's there, but reduce the friction of those hurdles. Sure, those companies were somewhat incremental, but I think there is genuinely a really different mindset there that matters.

And we now move to lunch. Coming up…

UNCONFERENCE SESSION 1: Best Footprint Forward – Nicola Osborne, EDINA

This will be me – talking about managing a digital footprint and how robust web links are part of that lasting digital legacy – so no post from me, but you can view my slides on Managing Your Digital Footprint and our Reference Rot in Theses: A HiberActive Pilot here.

SESSION TWO: The Early Career Researcher Perspective: Publishing & Research Communication

Getting recognition for all your research outputs – Michael Markie, F1000

I'm going to talk about the things you do as researchers that you should get credit for, beyond traditional publications. This week, in fact, there was a very interesting article on the history of science publishing: "Is the staggeringly profitable business of scientific publishing bad for science?". Publishers came out of that poorly… And I think others are at fault here too, including the research community… We do have to take some blame.

There's no getting away from the fact that the journal is the coin of the realm – for career progression, institutional reporting, grant applications. For the REF, will there be impact factors? REF says maybe not, but institutions will be tempted to use them to prioritise. Publishing is still judged by impact factor…

And it's not just where you publish. There are other things that you do in your work for which you should get more credit. Data; software/code – in bioinformatics there is new software and there are tools that are part of the research: are they getting the recognition they should?; all results – not just the successes but also the negative results… Publishers want the cool and sexy stuff, but realistically we are funded for all of it and should be able to publish and be recognised for it; peer review – there is no credit for it, yet peer reviews often improve articles and warrant credit; expertise – all the authors who added expertise, including non-research staff: everyone should know who contributed what…

So I see research as being more than a journal article. Right now we package it all up into one tidy thing, but we should be fitting into that bigger picture. So, I'm suggesting that we need to disrupt things a bit more and publish in a different way… Publishing introduces delays of up to a year. Journals don't really care about data – that's a real issue for reproducibility. And there is bias involved in publishing; there is a real lack of transparency in publishing decisions. All of the above means there is real research waste. At the same time there is demand for results, for quicker action, for wider access to work.

So, at F1000 we have been working on ways to address these issues. We launched Wellcome Open Research, and after that launch the Bill & Melinda Gates Foundation contacted us to build a similar platform. We have also built an open research model for UCL Child Health (at Great Ormond Street).

The process involves sending a paper in, and checking that there is no plagiarism and that ethics are appropriate – but no other filtering. That can take up to 7 days. Then we ask for your data – no data, no publication. Once the publication and data deposition are made, the work is published, and an open peer review and user commenting process begins; reviewers are named and credited, and they contribute to improving the article and to the article revision. Those reviewers have three options: approved, approved with reservations, or not approved as it stands. To get into PMC and indexed in PubMed you need two "approved" statuses, or two "approved with reservations" plus an "approved".
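
As a rough sketch, that indexing rule can be encoded directly (the status strings follow the talk's description; the function itself is just an illustration):

```python
from collections import Counter

# Two "approved" reviews, or one "approved" plus two "approved with
# reservations", qualify an article for PMC/PubMed indexing, as
# described above.
def qualifies_for_indexing(review_statuses: list) -> bool:
    counts = Counter(review_statuses)
    approved = counts["approved"]
    reservations = counts["approved with reservations"]
    return approved >= 2 or (approved >= 1 and reservations >= 2)

assert qualifies_for_indexing(["approved", "approved"])
assert qualifies_for_indexing(
    ["approved", "approved with reservations", "approved with reservations"])
assert not qualifies_for_indexing(["approved", "not approved"])
```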

So this connects to lots of other services… For data, that's DataCite, Figshare, Plotly, and the Resource Identification Initiative. For software/code we work with Code Ocean, Zenodo, and GitHub. For all results we work with PubMed, and you can publish other formats… etc.

Why are funders doing this? The Wellcome Trust spent £7m on APCs last year… So this platform is partly a service to stakeholders, with a complementary capacity for all research findings. We are testing a new approach to improve science and its impact – to accelerate access to and sharing of findings and data; to improve efficiency, reduce waste and support reproducibility; to offer an alternative OA model; etc.

Make an impact, know your impact, show your impact – Anna Ritchie, Mendeley, Elsevier

A theme across the day is that there are increasing pressures and challenges for researchers. It's never been easier to get your work out – new technology, media, platforms. And yet it's never been harder to get your work seen: more researchers, producing more outputs, dealing with more competition. So how do you ensure you and your work make an impact? Options mean opportunities, but also choices. Traditional publishing is still important, but not enough. And there are both older and newer ways to help make your research stand out.

Publishing Campus is a big thing here: free resources to support you in publishing. There are online lectures, interactive training courses, and expert advice. And live things happen too – webinars, online lectures (e.g. "Top 10 Tips for Writing a Really Terrible Journal Article"!), interactive courses. There are suites of materials around publishing and around developing your profile.

At some point you will want to choose a journal. Metrics may be part of what you use to choose – but use both quantitative and qualitative inputs (e.g. ask colleagues and experts). You can also use the Elsevier Journal Finder – you can search with your title, abstract and subject areas to get suggestions of journals to target. But always check the journal guidance before submitting.

There is also the opportunity for article enrichments which will be part of your research story – 2D radiological data viewer, R code Viewer, Virtual Microscope, Genome Viewer, Audioslides, etc.

There are also less traditional journals. Heliyon covers all disciplines, so you can report original and technically sound results of primary research regardless of perceived impact. MethodsX is entirely about methods work. Data in Brief allows you to describe your data to facilitate reproducibility and make it easier to cite. And an alternative to a data article is to add datasets on Mendeley.

And you can also use Mendeley to understand your impact through Mendeley Stats. There is a very detailed dashboard for each publication – powered by Scopus, so it works for all articles indexed in Scopus. You get stats like readers – Mendeley users with that article in their library – citations, related works… And you can see how your article is being shared. You can also show your impact on Mendeley, with a research profile that is as comprehensive as possible – not just your publications but wider impacts, press mentions… – enabling you to connect to other researchers, articles and opportunities. This is how we are trying to make Mendeley help you build your online profile as a researcher, and we intend to grow those profiles to give a more comprehensive picture of you.

And we want to hear from you. Every journal, platform, and product is co-developed with ongoing community input. So do get in touch!

How to share science with hard to reach groups and why you should bother – Becky Douglas

My background is physics – high energy, gravitational waves, etc… As I was doing my PhD I got very involved in science engagement. Hopefully most of you think of science communication and public outreach as a good thing; it does seem to be something that arises in job interviews and performance reviews. I'm not convinced that everyone should do it – not everyone enjoys it or is good at it – but there is huge potential if you are enthusiastic. And there is growing expectation on scientists to do this: to gain recognition, to help bring trust back to scientists, and to right some misunderstandings. And, by the way, talks and teaching don't count here.

And not everyone goes to science festivals, so it is up to us to provide alternative and interesting things for those people. There are a few people who won't be interested in science… But there are many more who don't have the time or don't see the appeal to them. These people deserve access to new research… And there are many ways to communicate that research. New ideas are always worth trying, and can attract new people and spark dialogue you'd never expect.

So, article writing is a great way to reach out – and not just in science magazines (or on personal blogs). Newspapers and magazines will often print science articles – reach out to them. And you can pitch other places too – Cosmo prints science. Mainstream publications are desperate for people who understand science to write about it in engaging ways – and sometimes you'll be paid for your work as well.

Schools are obvious, but they are great ways to access people from all backgrounds. You’ll do extra well if you can connect it to the current curriculum! Put the effort in to build a memorable activity or event. Send them home with something fun and you may well reach parents as well…

More unusual events would be things like theatre – for instance Lady Scientists Stitch and Bitch. Stitch and Bitch is an international thing where you get together to sew, craft and chat. This show was a play about travelling back in time to gather all the key lady scientists, who sit down to discuss science over some knitting and sewing. Because it was theatre it drew an extremely diverse group, not the people who usually go to science events. When you work with non-scientists you get access to a whole new crowd.

Something a bit more unusual… Soapbox Science, which I brought to Glasgow in 2015. It's science busking, where you talk about your cutting-edge research. It's often attached to science festivals but takes place out in public, drawing a crowd from those shopping, visiting museums, etc. It's highly interactive. Most of the audience had not been to a science event before – they didn't go out to see science, but they enjoyed it…

And finally, interact with local communities. The WI has science events; Scouts and Guides and meet-up groups do too… You can just contact those groups and reach out. They have questions of their own. It allows you to speak to really interesting groups, though it does require a lot of time. I was based in Glasgow, now Falkirk, and I've just done some of this with schools in the Gorbals, where we knew that the kids rarely go on to science subjects…

So, this is really worth doing. Your work, if it is taxpayer-funded, should be accessible to the public. Some people don't think they have an interest in science – some are right, but others just remember dusty chalkboards and bland textbooks. You have to show them it's something more than that.

What helps or hinders science communication by early career researchers? – Lewis MacKenzie

I’m a postdoc at the University of Leeds. I’m a keen science communicator and I try to get out there as much as possible… I want to talk about what helps or hinders science communication by early career researchers.

So, who are early career researchers? Well, undergraduates are a huge and largely untapped pool of early career researchers and scientists; then there are PhD students and postdocs. There are some shared barriers: travel costs, time… That is especially the case in inaccessible parts of Scotland. There is a real issue with science communication counting as work (or training). And not all supervisors have a positive attitude to science communication. As well as all the other barriers to careers in science, of course.

Let's start with science communication training. I've been through the system as an undergraduate, PhD student and postdoc. A lot of training is (rightly) targeted at PhD students, often around writing, conferences, elevator pitches, etc. But issues/barriers for ECRs include… Proactive sci comm is often not formally recognised as training/CPD/workload – especially at evenings and weekends. Undergraduate sci comm modules are minimal or non-existent. There are dedicated sci comm masters now, with lots to explore, but relatively poor sci comm training opportunities for postdocs. And across the board, media skills training is pretty much absent – how do you make YouTube videos, podcasts or web comics, or write for a magazine? That's where a lot of science communication takes place!

Sci comm in schools includes some great stuff. STEMNET is an excellent scheme through which ECRs, industry people, retirees, etc. volunteer, with some basic training, background checks, and a contact hub connecting schools and volunteers. However the school system (especially in England) and its curricula are confusing, and age-appropriate communication is a challenge. And just getting to the schools can be tricky – most PhD students and sci comm people won't have a car. It's basic, but it's an important barrier.

Science communication competitions are quite widespread. They tend to be aimed at PhD students, with experience, training and prizes as incentives. But there are issues/barriers for ECRs: often a conventional "stand and talk" format; not usually collaborative – even though teamwork can be brilliant, and the big famous science communicators work with a team to put their shows together; and the intense pressure of competitions can be off-putting… Some alternative formats would help with that.

Conferences… Now there was a tweet earlier this week from @LizyLowe suggesting that every conference should have a public engagement strand – how good would that be?!

Research grant "impact plans": major funders now require impact plans revolving around science communication. That creates time and money for science communication, which is great. But there are issues. The grant writer often designates activities before ECRs are recruited, and these prescriptive impact plans aren't very inspiring for ECRs. Money may be inefficiently spent on things like expensive web design. I think we need a more agile approach that includes input from ECRs once they are recruited.

Finally, I wanted to finish with science communication fellowships, such as the Wellcome Trust Engagement Fellowships and the STFC's equivalents. These are for the Olympic gold medallists of sci comm, and they are not great for ECRs: the calls are annual and inflexible, the process takes over six months – a slow decision-making process – and they are intensely competitive, which is a shame as many sci comm people are ECRs. So perhaps more institutions or agencies should offer sci comm fellowships? And a continuous application process with shorter spells?

To sum up… ECRs at different career stages require different training and organisational support to enable science communication. And science communication needs to be recognised as formal work/training/education – not an out of hours hobby! There are good initiatives out there but there could be many more.

PANEL DISCUSSION – Michael Markie, F1000 (MM); Anna Ritchie, Mendeley, Elsevier (AR); Becky Douglas (BD); Lewis MacKenzie (LM) – chaired by Joanna Young (JY)

Q1 (JY): Picking up on what you said about Pathways to Impact statements… What advice would you give to ECRs if they are completing one of these? What should they do?

A1 (LM): It's quite a weird thing to do… There are two strands: "this research will make loads of money and we will commercialise it", and the science communication strand. It's easier to say you'll do a science festival event, harder to say you'll do a press release… You can say you will blog your work once a month, or tweet from the lab each day. In my fellowship application I proposed a podcast on biophysics that I'd like to do. You can be creative with your science communication… But there is a danger that people aren't imaginative and make it a box-ticking thing – just doing a science festival event and a webpage isn't that exciting. And those plans are written once, but projects run for maybe three years… Things change, skills change, people on the team change…

A1 (BD): As an ECR you can ask for help – ask supervisors, peers, ask online, ask colleagues… You can always ask for advice!

A1 (MM): I would echo that you should ask experienced people for help. And think tactically as different funders have their own priorities and areas of interest here too.

Q2: I totally agree with the importance of communicating your science… But showing the impact of that is hard. And – playing devil's advocate – not all research is of interest to the public, so what do you do? Do you broaden it? Do you find another way in?

A2 (LM): Taking a step back and talking about broader areas is good… I talk a fair bit about undergraduates as science communicators… They have really good broad knowledge and interest; they can be excellent. And this is where things like Soapbox Science can be so effective. There are other formats too, like Bright Club, which communicates research through comedy… That's really different.

A2 (BD): I would agree with all of that. I would add that if you want to measure impact then you have to think about it from the outset – will you count people, use some sort of voting, or questionnaires? You have to plan this stuff in. The other thing is that you have to pitch things carefully to your audience. If I run events on gravitational waves I will talk about space and black holes… whereas with a 5-year-old I ask about gravity and we jump up and down, so they understand what is relevant to them in their lives.

A2 (LM): In terms of metrics for science communication… This was a major theme at the British Science Association conference a few years back… Becky mentioned getting kids to post notes in boxes at sessions… Professional science communicators think a great deal about this. Us "Sunday fun run" type people maybe not as much – but we should engage more.

Comment (AR): When you prepare an impact statement are you asked for metrics?

A2 (LM): Not usually… They want impact but don’t ask about that…

A2 (BD): Whether or not you are asked for details of how something went, you do want to know how you did… Even just asking "Did you learn something new today?" can be really helpful for understanding how it went.

Q3: I think there are too many metrics… As a microbiologist… which ones should I worry about? Should there be a module at the beginning of my PhD to tell me?

A3 (AR): There is no one metric… We don't want a single number to sum us up. There are so many metrics precisely because one number isn't enough… There is experimentation going on with what works, and what works for you… So be part of the conversation, and be part of the change.

A3 (MM): I think there are too many metrics too… We are experimenting. Altmetrics are indicators; citations are tangible… We just have to live with a lot of them all at once at the moment!

UNCONFERENCE SESSION 2: Preprints: A journey through time – Graham Steel

This will be a quick talk plus plenty of discussion space… From the onset of thinking about this conference I was very keen to talk about preprints…

So, who knows what a preprint is? There are plenty of different definitions out there – see Neylon et al 2017 – but we'll take the Wikipedia definition for now. I thought preprints dated to the 1990s, but I found a paper that referenced a preprint from 1922!

Let's start there… Preprints were ticking along fine… But then a fightback began: in 1966 preprints were made outlaws, when Nature wanted to take "lethal steps" to end them. In 1969 came the "Ingelfinger rule" – we'll come back to that later… Technology-wise, things ticked along… In 1989 Tim Berners-Lee came along; in 1991 the first website went live at CERN, and arXiv was set up and grew swiftly – about 8k preprints per month were being uploaded to arXiv as of 2016. Then, from 2007 to 2012, we had Nature Precedings…

But in 2007 the fightback began again… In 2012 the Ingelfinger rule was still creating stress… There are almost 35k journals; only 37 still use the Ingelfinger rule, but they include key journals like Cell.

But we also saw the launch of bioRxiv in 2013, and we've had an explosion of preprints since then… Also in 2013 the £5m Centre for Open Science was set up, providing a central place for preprints – over 2m preprints so far. There are now a LOT of new …Xiv preprint sites. And in 2015 we saw the launch of the ASAPbio movement.

Earlier this year Mark Zuckerberg's multi-billion-dollar Chan Zuckerberg Initiative invested in bioRxiv… But everything comes at a price…

Scotland spends on average £11m per year to access research through journals. The best average APC figure I could find is $906; per preprint, it's around $10. If you want to post a preprint you have to check the terms of your journal – these are usually extremely clear. Best to check in SHERPA/RoMEO.
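
A quick back-of-envelope comparison using the figures just quoted (these are the talk's averages, not firm market prices, and the output volume is a hypothetical):

```python
# Back-of-envelope: annual cost of 1,000 outputs under average APCs
# vs. preprint posting, using the figures quoted above.
APC_USD = 906       # best average APC found
PREPRINT_USD = 10   # estimated cost per preprint

papers = 1_000      # hypothetical annual output of an institution
print(f"APCs:      ${papers * APC_USD:,}")       # APCs:      $906,000
print(f"Preprints: ${papers * PREPRINT_USD:,}")  # Preprints: $10,000
```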

If you want to find out more about preprints there is a great Twitter list, and also some recommended preprint reading. Find these slides: and


Q1: I found Sherpa/Romeo by accident…. But really useful. Who runs it?

A1: It’s funded by Jisc

Q2: How about findability…

A2: ArXiv usually points to where this work has been submitted. And you can go back and add the DOI once published.

Q2: It’s acting as a static archive then? To hold the green copy

A2: And there is collaborative activity across that… And there is work to make those findable, to share them, they are shared on PubMed…

Q2: One of the problems I see is purely discoverability… Making it easy to find on Google, and integrating into knowledge bases so it can be found in libraries and portals… It's hard for a researcher looking for a piece of research… They look for a subject, a topic, search an aggregated platform and link out… to find the repository… so people know they have legal access to preprint copies.

A2: You have CORE at the OU, which aggregates preprints and suggests additional items when you search. There is ongoing work to integrate with CRIS systems, which are frequently commercial, so there are interoperability issues here.

Comment: arXiv is still the place for high energy physics, so it is worth researchers going there directly…

Q3: Can I ask about preprints and research evaluation in the US?

A3: It's an important way to get the work out… But the lack of peer review is an issue there, so things are still emerging…

GS: My last paper was taking forever to come out – we thought it wasn't going to happen… We posted to PeerJ, but discovered that the journal did use the Ingelfinger rule, which scuppered us…

Comment: There are some publishers that want to put preprints on their own platform, so everything stays within their space… How does that sit or conflict with what libraries do?

GS: It’s a bit “us! us! us!”

Comment: You could see everything submitted to that journal, which is interesting… Maybe not in health… What happens if it's not accepted? Do you get to pull it out? Do you see what else has been rejected? It could get dodgy… There's some potential conflict…

Comment: I believe it is positioned as a separate entity, but with a path of least resistance… It's a question… The thing is, if we want preprints to sit more within academia as opposed to with publishers, then academia has to have the infrastructure to do that – to make repositories discoverable and aggregated… It's a potentially competitive relationship… Interesting to see how it plays out…

Comment: For Scopus and Web of Science… Those won’t take preprints… Takes ages… And do you want to give up more rights to the journals… ?

Comment: I can see why people would want multiple copies held… That seems healthy… My fear is that it requires a lot of community-based organisation to be a sustainable and competitive workflow…

Comment: Worth noting the radical "platinum" open access… Lots of preprints out there… Why not get authors to submit them, and organise them into a free, open journal without a publisher… That's Tim Gowers's thing… It's not hard to put together a team to peer review thematically and put out issues of a journal with no charges…

GS: That's very similar to the Open Library of Humanities… And the Wellcome Trust & Gates Foundation publishing platforms, and the big EU platform. The Gates one could be huge. Wellcome Trust's is relatively small so far… But an EU-wide platform will have major ramifications…

Comment: Platinum is more about overlay journals… Also like SCOAP3, and they do metrics on citations etc. to compare use…

GS: In open access we know about green and gold; with platinum it's free to both author and reader… But the use of these words differs in different contexts…

Q4: What do you think the future is for pre-prints?

A4 – GS: There is a huge boom… There's currently some duplication across central open preprint platforms. But information on use is clear and uptake is on the rise… It will plateau at some point, like PLOS ONE. They launched in 2006 and probably plateaued around 2015. But they are number 2 in the charts of mega-journals, behind Scientific Reports. They increased APCs (to around $1,450) and that didn't help (especially as they were profitable)…

SESSION THREE: Raising your research profile: online engagement & metrics

Green, Gold, and Getting out there: How your choice of publisher services can affect your research profile and engagement – Laura Henderson, Editorial Program Manager, Frontiers

We are based in Lausanne in Switzerland. We are a fully digital, fully open access publisher. All of our 58 journals are published under CC BY licences. And the organisation was set up by scientists who wanted to change the landscape. So I want to talk today about how this can change your work.

What is traditional academic publishing?

Typically readers pay – journal subscriptions via institution/library, or pay-per-view. Given the costs and number of articles it is expensive – $14B of journal revenue in 2014 works out at around $7k per article. It's slow too… The journal rejection cascade can take 6 months to a year each time. Up to 1 million papers – valid papers – are rejected every year. And this limits access to research: around 80% of research papers are behind subscription paywalls. So knowledge gets out very slowly and inaccessibly.

By comparison, open access… Well, Green OA allows you to publish and then self-archive your paper in a repository where it can be accessed for free. You can use an institutional or central repository – or, I'd suggest, both. And there can be a delay due to an embargo. Gold OA makes research output immediately available from the publisher, and you retain the copyright so there are no embargoes. It is fully discoverable via indexing and professional promotion services to relevant readers. There is no subscription fee for the reader, but it usually involves APCs charged to the institution.

How does open access publishing compare? Well, it inverts the funding – the institution/grant funder supports authors directly, rather than paying huge subscription fees for packages dictated by publishers. It's cheaper – Green OA is usually free, and the average Gold OA fee is c. $1000-$3000 – actually around half what is paid per article under subscription publishing. We do see projections of open access overtaking subscription publishing by 2020.

So, what benefits does open access bring? Well there is peer-review; scalable publishing platforms; impact metrics; author discoverability and reputation.

And I’d now like to show you what you should look for from any publisher – open access or others.

Firstly, you should expect basic services: quality assurance and indexing. Peter Suber suggests checking the DOAJ – Directory of Open Access Journals. You can also see if the publisher is part of OASPA, which excludes publishers who fail to meet its standards. What else? Look for peer review and good editors – see the joint COPE/OASPA/DOAJ Principles of Transparency and Best Practice in Scholarly Publishing. So you need to have clear peer review processes. And you need a governing board and editors.

At Frontiers we have an impact-neutral peer review process. We don't screen for the papers with the highest impact. Authors, reviewers and the handling Associate Editor interact directly with each other in an online forum. The names of editors and reviewers are published on the final version of the paper. And this leads to an average of 89 days from submission to acceptance – an industry-leading time… And that's what won an ALPSP Innovation Award.

So, what are the extraordinary services a top OA publisher can provide? Well, altmetrics are more readily available now. Digital articles are accessible and trackable. At Frontiers our metrics are built into every paper… You can see views, downloads, and reader demographics. That's post-publication analytics that doesn't rely on impact factor. And it is community-led impact – your peers decide the impact and importance.

How discoverable are you? We launched a bespoke built-in networking profile for every author and user: Loop. It scrapes all the major index databases to find your work – constantly updating. It's linked to ORCID and is included in the peer review process. When people look at your profile you can truly see your impact in the world.

In terms of how peers find your work, we have article alerts going to 1 million people, and a newsletter that goes to 300k readers. And our articles have had 250 million article views and downloads, with hotspots in Mountain View, California, and in Shenzhen, and areas of development in the "Global South".

So when you look for a publisher, look for a publisher with global impact.

What are all these dots and what can linking them tell me? – Rachel Lammey, Crossref

Crossref are a not-for-profit organisation. So… We have articles out there, datasets, blogs, tweets, Wikipedia pages… We are really interested in understanding these links. We are doing that through Crossref Event Data, tracking the conversation, mainly around objects with a DOI. The main way we use and mention publications is in the citations of articles. That's the traditional way to discuss research and understand news. But research is being used in lots of different ways now – Twitter and Reddit…

So, where does Crossref fit in? It is the DOI registration agency for scholarly content. Publishers register their content with us. URLs change and break… And that means you need something more persistent so the work can still be used in research… Last year at ReCon we tried to find DOI gaps in reference lists – hard to do. Even within journals, publications move around… and switch publishers… The DOI fixes that reference. We are sort of a switchboard for that information.

I talked about citations and references… Now we are looking beyond that. It is about capturing data and relationships so that understanding and new services (by others) can be built… As such it's an API (Application Programming Interface) – it's lots of data rather than an interface. So it captures subject, relation, object – tweets, mentions, etc. We are generating this data (as of yesterday we'd seen 14m events), but we are not doing anything with it, so this is a clear set of data for others to do further work on.
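
Since Event Data is delivered as an API rather than an interface, the quickest way to get a feel for it is a small script. Here's a minimal sketch of querying it in Python – the endpoint and parameter names follow Crossref's public documentation as I understand it, and the email address and DOI are placeholders:

```python
import requests

# Crossref Event Data API endpoint (as publicly documented; treat as an assumption).
BASE = "https://api.eventdata.crossref.org/v1/events"

params = {
    "mailto": "you@example.org",    # polite identification (placeholder address)
    "obj-id": "10.5555/12345678",   # placeholder DOI whose events we want
    "rows": 20,
}

resp = requests.get(BASE, params=params)
resp.raise_for_status()

# Each event is a subject-relation-object statement, e.g. a tweet that
# "discusses" a DOI, or a Wikipedia page that "references" it.
for event in resp.json()["message"]["events"]:
    print(event["subj_id"], event["relation_type_id"], event["obj_id"])
```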

We've been doing work with the NISO Working Group on altmetrics, but again, providing the data not the analysis. So, what can this data show? We see citation rings/friends gaming the machine; potential peer review scams; citation patterns. How can you use this data? Almost any way. Come talk to us about linked data, article-level metrics, general discoverability, etc.

We've done some work ourselves… For instance the live data from all sources – including Wikipedia citing various pages… We have lots of members in Korea, and started looking just at citations on Korean Wikipedia. It's free under a CC0 license. If you are interested, go make something cool… Come ask me questions… And we have a beta testing group, and we welcome your feedback and experiments with our data!

The wonderful world of altmetrics: why researchers’ voices matter – Jean Liu, Product Development Manager, Altmetric

I'm actually five years out of graduate school, so I have some empathy with PhD students and ECRs. I really want to go through what altmetrics are and what measures there are. It's not controversial to say that altmetrics have been experiencing a meteoric rise over the last few years… That is partly because we have so much more to draw upon than the traditional journal impact factors, citation counts, etc.

So, who are we? We have about 20 employees, were founded in 2011, and are all based in London. And we've started to see that people are receptive to altmetrics, partly because of the (near) instant feedback… We tune into the Twitter firehose – that phrase is apt! Altmetrics also showcase many "flavours" of attention and impact that research can have – and not just articles. And the signals we track are highly varied: policy documents, news, blogs, Twitter, post-publication peer review, Facebook, Wikipedia, LinkedIn, Reddit, etc.

Altmetrics also have limitations. They are not a replacement for peer review or citation-based metrics. They can be gamed – but data providers have measures in place to guard against this. We've seen interesting attempts at gaming – but they are often caught…

Researchers are not only the ones who receive attention in altmetrics; they are also the ones generating the attention that makes up altmetrics – but not all attention is high quality or trustworthy. We don't want to suggest that researchers should be judged just on altmetrics…

Meanwhile universities are asking interesting questions: how can our researchers change policy? Which conference can I send people to that will be most useful? Etc.

So, let's take the topic of "diabetic neuropathy". Looking around we can see a blog, an NHS/NICE guidance document, and a piece in The Conversation. A whole range of items here. And you can track attention over time… by volume, but you can also look at influencers across e.g. news outlets, policy outlets, blogs and tweeters. And you can understand where researcher voices feature (all are blogs). And I can then compare news and policy and see the difference. The profiles for news and blogs are quite different…

How can researchers' voices be heard? Well, you can write for a different audience, you can raise the profile of your work… You can become that "go-to" person. You also want to be really effective when you are active – altmetrics can help you to understand where your audience is and how they respond, and to understand what is working well.

And you can find out more by trying the Altmetric bookmarklet for your browser, by exploring these tools on publishing platforms (where available), or by taking a look at our site.

How to help more people find and understand your work – Charlie Rapple, Kudos

I’m sorry to be the last person on the agenda, you’ll all be overwhelmed as there has been so much information!

I’m one of the founders of Kudos and we are an organisation dedicated to helping you increase the reach and impact of your work. There is such competition for funding, a huge growth in outputs, there is a huge fight for visibility and usage, a drive for accountability and a real cult of impact. You are expected to find and broaden the audience for your work, to engage with the public. And that is the context in which we set up Kudos. We want to help you navigate this new world.

Part of the challenge is knowing where to engage. We did a survey last year with around 3000 participants to ask how they share their work – conferences, academic networking, and conversations with colleagues all ranked highly, whilst YouTube, SlideShare, etc. are less used.

Impact is built on readership – impact crosses a variety of areas… But essentially it comes down to getting people to find and read your work. So, for me it starts with making sure you increase the number of people reaching and engaging with your work. Hence the publication is at the centre – for now. That may well change as other material is shared.

We’ve talked a lot about metrics, there are very different ones and some will matter more to you than others. Citations have high value, but so do mentions, clicks, shares, downloads… Do take the time to think about these. And think about how your own actions and behaviours contribute back to those metrics… So if you email people about your work, track that to see if it works… Make those connections… Everyone has their own way and, as Nicola was saying in the Digital Footprint session, communities exist already, you have to get work out there… And your metrics have to be about correlating what happens – readership and citations. Kudos is a management tool for that.

In terms of justifying the time spent: communications do increase impact. We have been building up data on how that takes place. A team from Nanyang Technological University did a study of our data in 2016 and saw that researchers using the Kudos tools to promote their work had 23% higher growth in downloads of the full text on publisher sites. And that really shows the value of doing that engagement. It will actually lead to meaningful results.

So a quick look at how Kudos works… It's free for researchers, and it takes about 15 minutes to set up, then about 10 minutes each time you publish something new. You can find a publication, and you can use your ORCID if you have one… It's easy to find your publication, and once you have, you get a page for it where you can create a plain language explanation of your work and why it is important – something grounded in talking to researchers about what they need. That plain text is separate from the abstract. It's that first quick overview. The advantage of this is that it is easier for people within your field to skim and scan your work; people outside your field in academia can skip the terminology of your field and understand what you've said; and people outside academia can get a handle on the research and apply it in non-academic ways. People can actually access your work and actually understand it. There is a lot of research to back that up.

Also on the publication page you can add all the resources around your work – code, data, videos, interviews, etc. So for instance Claudia Sick does work on baboons and why they groom where they groom – her page brings the article and all of that press coverage together. The publication page gives you a URL, and you can post to social media from within Kudos. You can copy the trackable link and paste it wherever you like. The advantage of doing this in Kudos is that we can connect that up to all of your metrics and your work. You get them all in one place, and can map them against what you have done to communicate. And we map those actions to show which communications are more effective for sharing… You can really start to refine your efforts… You might have built networks in one space but the value might all be in another space.

Sign up now – we are about to launch a game on building up your profile and impact, which scores your research impact and lets you compare with others.

PANEL DISCUSSION – Laura Henderson, Editorial Program Manager, Frontiers (LH); Rachel Lammey, Crossref (RL); Jean Liu, Product Development Manager, Altmetric (JL); Charlie Rapple, Kudos (CR). 

Q1: Really interesting but how will the community decide which spaces we should use?

A1 (CR): Yes, in the Nanyang work we found that most work was shared on Facebook, but more links were engaged with on Twitter. There is more to be done, and more to filter through… But we have to keep building up the data…

A1 (LH): We are coming from the same sort of place as Jean there; altmetrics are built into Frontiers, connected to ORCID, and Loop is built to connect to institutional plugins (a totally open plugin). But it is such a challenge… Facebook, Twitter, LinkedIn, SnapChat… It's usually personal choice really; we just want to make it easier…

A1 (JL): It’s about interoperability. We are all working in it together. You will find certain stats on certain pages…

A1 (RL): It's personal choice, it's interoperability… But it is about options. Part of the issue with the impact factor is being judged by something you don't have any choice or influence over… And I think that we need to give people new tools, ways to select what is right for them.

Q2: These seem like great tools, but how do we persuade funders?

A2 (JL): We have found funders being interested independently, particularly in the US. There is this feeling across the scholarly community that things have to change… And funders want to look at what might work, they are already interested.

A2 (LH): We have an office in Brussels which lobbies the European Commission. We are trying to get our voice for Open Science heard, to make a difference to policies and mandates… The impact factor has been convenient, it's well embedded, it was designed by an institutional librarian – so we are out lobbying for change.

A2 (CR): Convenience is key. Nothing has changed because nothing has been convenient enough to replace the impact factor. There is a lot of work and innovation in this area, and it is not only on researchers to make that change happen, it’s on all of us to make that change happen now.

Jo Young (JY): To finish, a few thank yous… Thank you all for coming along today, to all of our speakers, and a huge thank you to Peter and Radic (our cameramen), and to Anders, Graham and Jan for their work in planning this. And to Nicola and Amy who have been liveblogging, and to all who have been tweeting. Huge thanks to CrossRef, Frontiers, F1000, JYMedia, and PLoS.

And with that we are done. Thanks to all for a really interesting and busy day!


Apr 05 2017
Cakes at the CIGS Web 2.0 and Metadata Event 2017

Today I’m at the Cataloguing and Indexing Group Scotland event – their 7th Metadata & Web 2.0 event – Somewhere over the Rainbow: our metadata online, past, present & future. I’m blogging live so, as usual, all comments, corrections, additions, etc. are welcome. 

Paul Cunnea, CIGS Chair, is introducing the day, noting that this is the 10th year of these events: we don't have one every year but we thought we'd return to our Wizard of Oz theme.

On a practical note, Paul notes that if we have a fire alarm today we’d normally assemble outside St Giles Cathedral but as they are filming The Avengers today, we’ll be assembling elsewhere!

There is also a cupcake competition today – expect many baked goods to appear on the hashtag for the day #cigsweb2. The winner takes home a copy of Managing Metadata in Web-scale Discovery Systems / edited by Louise F Spiteri. London : Facet Publishing, 2016 (list price £55).

Engaging the crowd: old hands, modern minds. Evolving an on-line manuscript transcription project / Steve Rigden with Ines Byrne (not here today) (National Library of Scotland)

Ines has led the development of our crowdsourcing side. My role has been on the manuscripts side. Any transcription is about discovery. For the manuscripts team we have to prioritise digitisation so that we can deliver digital surrogates that enable access, and to open up access. Transcription hugely opens up texts but it is time consuming and that time may be better spent on other digitisation tasks.

OCR has issues but works relatively well for printed texts. Manuscripts are a different matter – handwriting, ink density and paper all vary wildly. The REED(?) project is looking at what may be possible, but until something better comes along we rely on human effort. Generally the manuscripts team do not undertake manual transcription, but do so for special exhibitions or very high priority items. We also have the challenge that so much of our material is still under copyright, so cannot be worked on remotely (but can be accessed on site). The expected user community can generally be expected to have the skill to read the manuscript – so a digital surrogate replicates that experience. That being said, new possibilities shape expectations. So we need to explore possibilities for transcription – and that's where crowdsourcing comes in.

Crowdsourcing can resolve transcription, but issues with copyright and data protection still have to be resolved. It has taken time to select suitable candidates for transcription. In developing this transcription project we looked to other projects – from Transcribe Bentham, which was highly specialised, through to projects with much broader audiences. We also looked at transcription undertaken for the John Murray Archive, aimed at non-specialists.

The selection criteria we decided upon was for:

  • Hands that are not too troublesome.
  • Manuscripts that have not been re-worked excessively with scoring through, corrections and additions.
  • Documents that are structurally simple – no tables or columns for example where more complex mark-up (tagging) would be required.
  • Subject areas with broad appeal: genealogies, recipe book (in the old crafts of all kinds sense), mountaineering.

Based on our previous John Murray Archive work we also want the crowd to provide us with structured text, so that it can be easily used, by tagging the text. That's an approach borrowed from Transcribe Bentham, but we want our community to be self-correcting rather than us doing QA of everything going through. If something is marked as finalised and completed, it will be released from the tool to a wider public – otherwise it is only available within the tool.

The approach could be summed up as "keep it simple" – and that requires feedback to ensure it really is simple (something we checked through a survey). We did user testing on our tool; it particularly confirmed that users just want to go in and use it, so it has to be intuitive – that's a problem with transcription and mark-up, so there are challenges in making it usable. We have a great team who are creative and have come up with solutions for us… But meanwhile other projects have emerged. If the REED project is successful in getting machines to read manuscripts then perhaps these tools will become redundant. Right now there is nothing out there or in scope for transcribing manuscripts at scale.

So, let's take a look at Transcribe NLS…

You have to log in to use the system. That's mainly to help deter malicious or erroneous data. Once you log into the tool you can browse manuscripts; you can also filter by the completeness of the transcription and the grade of the transcription – we ummed and ahhed about including that, but we thought it was important.

Once you pick a text you click the button to begin transcribing – you can enter text, special characters, etc. You can indicate if text is above/below the line. You can mark up where a figure is. You can tag whether the text is not in English. You can mark up gaps. You can mark that an area is a table. And you can also insert special characters. It's all quite straightforward.


Q1) Do you pick the transcribers, or do they pick you?

A1) Anyone can take part but they have to sign up. And they can indicate a query – which comes to our team. We do want to engage with people… As the project evolves we are looking at the resources required to monitor the tool.

Q2) It’s interesting what you were saying about copyright…

A2) The issue of copyright here is about sharing off-site. A lot of our manuscripts are unpublished. We use exceptions such as those in the 1956 Copyright Act for old works whose authors have died. The selection process has been difficult, working out what can go in there. We've also cheated a wee bit…

Q3) What has the uptake of this been like?

A3) The tool is not yet live. We think it will build quite quickly – people like a challenge. Transcription is quite addictive.

Q4) Are there enough people with palaeography skills?

A4) I think that most of the content is C19th, where handwriting is the main challenge. For much older materials we’d hit that concern and would need to think about how best to do that.

Q5) You are creating these documents that people are reading. What is your plan for archiving them?

A5) We do have a colleague considering and looking at digital preservation – longer-term storage being more the challenge – as part of our normal digital preservation scheme.

Q6) Are you going for a Project Gutenberg model? Or have you spoken to them?

A6) It’s all very localised right now, just seeing what happens and what uptake looks like.

Q7) How will this move back into the catalogue?

A7) Totally manual for now. It has been the source of discussion. There was discussion of pushing things through automatically once transcribed to a particular level but we are quite cautious and we want to see what the results start to look like.

Q8) What about tagging with TEI? Is this tool a subset of that?

A8) There was a John Murray Archive approach, including mark-up and tagging, and there was a handbook for that. TEI is huge but there is also TEI Lite – the JMA used a subset of the latter. I would say this approach – that subset of TEI Lite – is essentially TEI Very Light.

Q9) Have other places used similar approaches?

A9) Transcribe Bentham is similar in terms of tagging. The University of Iowa Civil War archive has also had a similar transcription and tagging approach.

Q10) The metadata behind this – how significant is that work?

A10) We have basic metadata for these. We have items in our digital object database and simple metadata goes in there – we don't replicate the catalogue record but ensure it is identifiable, log the date of creation, etc. And this transcription tool is intentionally very basic at the moment.

Coming up later…

Can web archiving the Olympics be an international team effort? Running the Rio Olympics and Paralympics project / Helena Byrne (British Library)

I am based at the UK Web Archive, which is based at the British Library. The British Library is one of the six legal deposit libraries. The BL is also a member of the International Internet Preservation Consortium (IIPC) – as is the National Library of Scotland. The IIPC Content Development Group works on projects with international relevance, involving a number of interested organisations.

Last year I was lucky enough to be lead curator on the Olympics 2016 Web Archiving project. We wanted to get a good range of content. Historically our archives for Olympics have been about the events and official information only. This time we wanted the wider debate, controversy, fandom, and the “e-Olympics”.

We received a lot of nominations for sites. This is one of the biggest projects we have been involved in. There were 18 IIPC members involved in the project, but nominations also came from the wider community. We think this will be a really good resource for those researching the events in Rio. We had material in 34 languages in total. English was the top language collected – reflecting IIPC membership to some extent. In terms of what we collected, it included official IOC materials – but few, as we have a separate archive across Games for those. But subjects included athletes, teams, gender, doping, etc. There were a large number of website types submitted. Not all material nominated was collected – some had incomplete metadata, there were unsuccessful crawls and duplicate nominations, and the web is quite fragile still, so some links were already dead when we reached them.

There were four people involved here: myself, my line manager, the two IIPC chairs, and the IIPC communications person (also based at the BL). We designed a collection strategy to build engagement as well as content. The Olympics is something with very wide appeal, and there was lots of media coverage around the political and Zika situations, so we did widen the scope of collection.

Thinking about our users, we chose collaborative tools that worked in contributors' contexts: WebEx, Google Drive and Maps, and Slack (free in many contexts) were really useful. Chapter 8 in "Altmetrics" is great for alternatives to Google – it is important to have those, as Google is simply not accessible in some locations.

We mostly used Google Sheets for IIPC member nominations – 15 fields, 6 of which were obligatory. For non-members we used a (simplified) Google Form – shared through social media. Some non-IIPC member organisations used this approach – for instance a librarian in Hawaii submitted lots of Pacific islands content.

In terms of communicating the strategy, we developed instructional videos (with free tools – Screencast-O-Matic and Windows Movie Maker) with text and audio commentary, print summaries, emails, and public blog posts. Resources were shared via Google Drive so that IIPC members could download and redistribute them.

No matter whether a nomination came from an IIPC member or through the nomination form, we wanted six key fields (a small validation sketch follows the list):

  1. URL – free form
  2. Event – drop down option
  3. Title – free form (and English translation option if relevant)
  4. Olympic/Paralympic sport – drop down option
  5. Country – free form
  6. Contributing organisation – free form (for admin rather than archive purposes)
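
Given that several of the issues mentioned later came down to incomplete nominations, a small validation pass over the exported spreadsheet is an easy win. A minimal sketch, assuming the sheet is exported to CSV with the six field names above as column headers (the helper name and the drop-down values are hypothetical):

```python
import csv

# The six key fields above, assumed to be the CSV column headers.
REQUIRED = ["URL", "Event", "Title", "Olympic/Paralympic sport",
            "Country", "Contributing organisation"]
EVENTS = {"Olympics", "Paralympics"}  # assumed drop-down values

def check_nominations(path):
    """Flag rows with missing mandatory fields or unexpected drop-down values."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f), start=2):  # row 1 is the header
            missing = [k for k in REQUIRED if not (row.get(k) or "").strip()]
            if missing:
                problems.append((i, "missing: " + ", ".join(missing)))
            if row.get("Event") and row["Event"] not in EVENTS:
                problems.append((i, "unexpected event: " + row["Event"]))
    return problems

for line, issue in check_nominations("nominations.csv"):
    print(f"row {line}: {issue}")
```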

There are no international standards for cataloguing web archive data. OCLC have a working group looking at this just now – they are due to report this year. One issue that has been raised is the context of those doing the cataloguing – cataloguing versus archiving.

Communications are essential on a regular basis – there was quite a long window of nomination and collection across the summer. We had several pre-event crawl dates, then also dates during and after both the Olympics and the Paralympics. I would remind folk about these, and provide updates on what was collected, sharing that map of collected content. We also blogged the project to engage people and promote what we were doing. The participants enjoyed the updates – it helped them justify time spent on the project to their own managers and organisations.

There were some issues along the way…

  • A trailing slash is required for the crawler – if there is no trailing slash the crawler takes everything it can find, and attempting all of the BBC or Twitter is a problem (see the URL-normalisation sketch after this list).
  • Not tracking the date of nomination – e.g. organisations adding to the spreadsheet without updating date of nomination – that was essential to avoid duplication so that’s a tip for Google forms.
  • Some people did not fill in all of the six mandatory fields (or didn't fill them in completely).
  • Country name vs Olympic team name. That is unexpectedly complex. Team GB includes England, Scotland, Wales and Northern Ireland… but athletes from Northern Ireland can also compete for Ireland. Palestine isn't recognised as a country in all places, but it is in the Olympics. And there was a Refugee Team as well – with no country to tie it to. Similar issues of complexity came out of organisation names – there are lots of ways to write the name of the British Library, for instance.
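
On the trailing-slash point in particular, a small normalisation step over the seed list would catch the problem before the crawler runs. A minimal sketch, assuming seeds are plain path-style URLs (a seed ending in a filename like page.html would need special-casing):

```python
from urllib.parse import urlparse

def normalise_seed(url: str) -> str:
    """Ensure a crawl seed ends in a trailing slash so the crawler stays
    scoped to that path instead of wandering across the whole host."""
    parsed = urlparse(url.strip())
    path = parsed.path or "/"
    if not path.endswith("/"):
        path += "/"
    return parsed._replace(path=path).geturl()

print(normalise_seed("http://www.bbc.co.uk/sport/olympics"))
# -> http://www.bbc.co.uk/sport/olympics/
```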

We promoted the project with four blog posts sharing key updates and news. We had limited direct contact – mostly through email and Slack/messaging. We also had a unique hashtag for the collection, #Rio2016WA – not catchy, but it avoids confusion with Wario (the Nintendo game) – and a Twitter chat, small but international.

Ethically we only crawl public sites but the IIPC also have a take down policy so that anyone can request their site be removed.

Conclusions… Be aware of any cultural differences with collaborators. Know who your users are. Have a clear project plan, available in different mediums. And communicate regularly – to keep enthusiasm going. And, most importantly, don’t assume anything!

Finally… Web Archiving Week is in London in June, 12th-16th 2017. There is a "Datathon" but the deadline is Friday! You can find out more about the UK Web Archive via our website and blog, and you can also follow us and the IIPC on Twitter.

Explore the Olympics archive at:


Q1) For British Library etc… Did you use a controlled vocabulary?

A1) No but we probably will next time. There were suggestions/autocomplete. Similarly for countries. For Northern Irish sites I had to put them in as Irish and Team GB at the same time.

Q2) Any interest from researchers yet? And/or any connection to those undertaking research – I know internet researchers will have been collecting tweets…

A2) Colleagues in Rio identified a PhD project researching the tweets – very dynamic content, so hard to capture. There hasn't been a huge amount of work yet. I want to look at the research projects that took place after the London 2012 Olympics – to see if the sites are still available.

Q3) Anything you were unable to collect?

A3) In some cases articles are only open for short periods of time – we’d do more regular crawls of those nominations next time I think.

Q4) What about Zika content?

A4) We didn't have a tag for Zika, but we did have ones for corruption, doping, etc. Lots of corruption content post-event, after the chair of the Irish Olympic Committee was arrested!

Statistical Accounts of Scotland / Vivienne Mayo (EDINA)

I'm based at EDINA, where we run various digital services and projects, primarily for the education sector. Today I'm going to talk about the Statistical Accounts of Scotland. These are a hugely rich and valuable collection of statistical data that span both the agricultural and industrial revolutions in Scotland. The online service launched in 2001 but has recently been thoroughly refreshed and relaunched.

There are two accounts. The first set was created (1791-1799) by Sir John Sinclair of Ulbster. He had a real zeal for agricultural data. There had been attempts to collect data in the 16th and 17th centuries. So Sir John set about a plan to get every minister in Scotland to collect data on their parishes. He was inspired by German surveys but also had his own ideas for his project:

“an inquiry into the state of a country, for the purpose of ascertaining the quantum of happiness enjoyed by its inhabitants, and the means of its future improvement”

He also used the word "statistics" as a kind of novel, interesting term – it wasn't in wide use. And the statistics in the accounts are more qualitative than the quantitative data we associate with the word today.

Sir John sent ministers 160 questions, then another 6, then another set a year later, so that there were 171 in total. So you can imagine how delighted they were to receive that. And the questions (you can access them all in the service) were hard to answer – asking about the wellbeing of parishioners, how their circumstances could be ameliorated… But ministers were paid by the landowners who employed their parishioners, so the data also has to be understood in that context. There were also more factual questions on crops, pricing, etc.

It took a long time – 8 years – to collect the data. But it was a major achievement. And these accounts were part of a "pyramid" of data for the agricultural reports: he had county reports, but also higher-level reports. This was at the time of the Enlightenment, and the idea was that with this data you could improve the conditions of life.

Even though the ministers did complete their returns, for some it was a struggle – and it was certainly hard to be accurate. Population tables were hard to get correct, especially in the context of scepticism that the data might be used to collect taxes or for other non-beneficial purposes.

The Old Account was a real success, and the Church of Scotland commissioned a New Account (1834-45) as a follow-up to that set of accounts.

The online service was part of one of the biggest digitisation projects in Scotland in the late 1990s, with the accounts going live in 2001. But much had changed since then in terms of functionality that any user might expect. In this new updated service we have added the ability to tag, to annotate, to save… Transcriptions have been improved, the interface has been improved. We have also made it easier to find associated resources – selected by our editorial board drawn from libraries, archives, specialists on this data.

When Sir John published the Old Accounts he printed them in volumes as they were received – that makes it difficult to browse and explore them, and there can be multiple accounts for the same parish. So we have added a way to browse each of the 21 volumes so that it is easier to find what you need. Place is key for our users and we wanted to make the service more accessible. Page numbers were an issue too – our engineers provide numbering of sections – so if you look for Portpatrick you can find all of the sections and volumes where that area occurs. Typically sections are parish reports, but they can be other types of content too – title pages, etc.

Each section is associated with a parish – which is part of a county. And there may be images (illustrations such as coal seams, elevations of notable buildings in the parish, etc.). Each section is also associated with pages – including images of the pages – as well as transcripts and indexed data used to enable searching.
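
To make that structure concrete, here is a minimal sketch of the section/parish/page model as described above – the class and field names are my own invention, not EDINA's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    number: int        # section-level numbering provided by the engineers
    image_url: str     # scan of the page
    transcript: str    # transcription used for indexing and search

@dataclass
class Section:
    title: str         # usually a parish report, but also title pages etc.
    parish: str
    county: str
    volume: int
    account: str       # "Old" (1791-99) or "New" (1834-45)
    illustrations: list = field(default_factory=list)  # e.g. coal seams, buildings
    pages: list = field(default_factory=list)          # Page objects

# A hypothetical record: all values are illustrative only.
portpatrick = Section(title="Parish of Portpatrick", parish="Portpatrick",
                      county="Wigtownshire", volume=1, account="Old")
```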

So, if I search for tea drinking… described as a moral menace in some of the earlier accounts! When you run a search like this, it identifies associated sections, related resources, and associated words – words that often occur with the search term. For tea-drinking, "twopenny" is often associated… Following that thread I found an account from the County of Forfar in 1793… And this turns out to be the slightly alarming-sounding home brew…

“They make their own malt, and brew it into that kind of drink called Two-penny which, till debased in consequence of multiplied taxes, was long the favourite liquor of all ranks of people in Dundee.”

When you do look at a page like this you can view the transcription – which tends to be easier to read than the scanned pages with their flourishes and the long "s" that looks like an "f". You can tag, annotate, and share the pages. There are lots of ways to explore and engage with the text.

There are lots of options to search the service – simple search, advanced search, and new interactive maps of areas and parishes – these use historic maps from the NLS collections and are brand new to the service.

With all these new features we’d love to hear your feedback when you do take a look at the service – do let us know how you find it.

I wanted to show an example of change and illustration here. The Old Accounts of Dumfries (Vol 5, p. 119) talk about the positive improvements to housing and the idea of "improvement" as a very positive thing. We also see an illustration from the New Accounts of old habitations and the new modern houses of the small tenants – but that was from a parish owned by the Duke of Sutherland, who had a notorious reputation as a brutal landlord, clearing land and murdering tenants to make these "improvements". So, again, one has to understand the context of this content.

Looking at Dumfries in the Old Accounts, things looked good, with some receiving poor relief. The increase in industry means that by the New Accounts the population has substantially grown, as has poverty. The minister also comments on the impact of the three inns in town, and the increase in poaching. A transitory population can also affect health – there is a vivid account of a cholera outbreak from 15th Sept to 27th Nov 1832. That seems relatively recent, but at that point they thought transmission was through the air; they didn't realise it was water-borne until some time later.

Some accounts, like that one, are highly descriptive. But many are briefer or less richly engaging. Deaths are often carefully captured. The minister for Dumfries put together a whole table of deaths – causes of which include, surprisingly, teething. And there are also records of healthcare and healthcare costs – including one individual paying for several thousand children to be inoculated against smallpox.

Looking at the schools near us here in central Edinburgh, there was free education for some poor children. But schooling mostly wasn't free. The cost of schooling one child in reading and writing would be a twelfth of a farm labourer's salary. To climb the social ladder with e.g. French, Latin, etc., the teaching was far more expensive. And indeed there is a chilling quote in the New Accounts from Cadder, County of Lanark (Vol 8, p. 481), which speaks of attitudes that education was corrupting for the poor. This was before education became mandatory in Scotland (in 1872).

There is also some colourful stuff in the Accounts. There is a lot of witchcraft, local stories, and folk stories. One of my colleagues found a lovely story about a tradition that the last person buried in one area “manned the gates” until the next one arrived. Then one day two people died and there were fisticuffs!

I was looking for something else entirely and found, in Fife, a story of a girl who set sail from Greenock, was captured by pirates, was sold into a harem, and became a princess in Morocco – there's a book called The Fourth Queen based on that story.

There is an anvil known as the "Reformation Cloth" – pre-Reformation, there was a blacksmith who thought the Catholic priest was having an affair with his wife… and who took his revenge by attacking the offending part of the priest on that anvil. I suspect that there may have been some ministerial stuff at play here too – the parish minister notes that "no other catholic minister replaced him" – but it is certainly colourful.

And that's all I wanted to share today. Hopefully I've piqued your interest. You can browse the accounts for free, and some of the richer features are part of our subscription service. Explore the Statistical Accounts of Scotland online; you can also follow us on Twitter, Facebook, etc.


Q1) SOLR indexing and subject headings – can you say more?

A1) We used subject headings from the original transcriptions, and then there were some additions made based on those.

Comment) The Accounts are also great for Wikipedia editing! I found references to Christian Shaw, a thread pioneer I was looking to build a page about; she's in the Accounts because she was mentioned in a witchcraft trial that is included there. It can be a really useful way to find details that aren't documented elsewhere.

Q2) You said it was free to browse – how about those related resources?

A2) Those related resources are part of the subscription services.

Q3) Any references to sports and leisure?

A3) Definitely to festivals, competitions, events etc. As well as some regular activities in the parish.

Beyond bibliographic description: emotional metadata on YouTube / Diane Pennington (University of Strathclyde)

I want to start with this picture of a dog in a dress…. How do you feel when you see this picture? How do you think she was feeling? [people in the room guess the pup might be embarrassed].

So, this is Tina – she's my dog. She's wearing a dress we had made for her when we got married… And when she wears it she always looks so happy… And people, when I shared it on social media, also thought she looked happy. And that got me curious about emotion and emotional responses… which aren't accommodated in bibliographic metadata. As a community we need to think about how material makes us feel – how else can we describe things? When you search for music online, mood is something you might want to search by… But usually it's recommendations like "this band is similar to…". My favourite band is U2 and I get recommended Coldplay… And that makes me mad, they aren't similar!

So, when we teach and practise ILS, we think about information as text that sits in a database, waiting for a user to write a query and get a match. The problem is that there are so many other ways that people also want to look for information – not just bibliographic information or full text, but in other areas too: the bodily – what pain means (Yates 2015); photographs, videos, music (Rasmussen Neal, 2012) – where the full text doesn't inherently include the search terms or keywords; "matter and energy" (Bates, 2006) – the idea that there is information everywhere, and the need to think more broadly about describing it.

I've been working in this area for a while, and I started looking at Flickr, at pictures tagged "happy". Those tend to include smiling people, holiday photos, sunny days, babies, cute animals. Relevance rankings showed "happy" more often, and people engaged more with and liked happy photos… But music is different. We often want music that matches our mood… And there are differences in how tags work for music… Heavy metal sounds angry; slower or minor-key music sounds sad…

The work I'm talking about today can also be found in an article published last year.

My work was based on the U2 song Song for Someone, for which there are over 150 fan videos… And if I show you this one (by Dimas Fletcher) you'll see it has high production values… The song was written by Bono for his wife – they've been together since they were teenagers – and it's very slow and emotional, reminiscing about being together. So this video is a really different interpretation.

Background to this work, and theoretical framework for it, includes:

  • “Basic emotions” from cognition, psychology, music therapy (Ekman, 1992)
  • Emotional Information Retrieval
  • Domains of fandom and aca-fandom (Stein & Busse, 2009; Bennett, 2014)
  • Online participatory culture, such as writing fan fiction or making cover versions of videos for love songs (Jenkins, 2013)
  • U2 academic study
  • Intertextuality as a practice in online participatory culture (Varmacelli 2013?)

So I wanted to do a discourse analysis (Budd & Raber 1996, Iedema 2003) applied to intertextuality. I wanted to analyse the emotional information conveyed in 150 YouTube cover videos of U2's Song for Someone, alongside a quantitative view of views, comments, likes and dislikes – indicating responses to them.

The producers of these videos created lots of different types of videos. Some were cover versions. Some were original versions of the song with new visual content. Some were tutorials on how to play the song. And then there were videos exhibiting really deep personal connections with the song.

So the cover versions are often very emotional – the comments say so. That emotion level is metadata. There are videos in context – background details, kids dancing, etc. But then some are filmed out of a plane window. The tutorials include people, and some annotated "karaoke piano" tutorials…

Intertextuality… You need to understand your context. So one of the videos shows a guy in a yellow cape who reaches out and touches the Achtung Baby album cover before starting to sing. In another video a person is in the dark, in shadow… with the Song for Someone lyrics and title on the wall, but then playing it mashed up with another song. In another video the producer and his friend try to look like U2.

Then we have the producers' comments and descriptions, which add greatly to understanding those videos. And responses from consumers – more likes than dislikes, and almost all positive comments; this is very different from some Justin Bieber YouTube work I did a while back. You see comments on the quality of the cover, and on the emotion of the song.

The discussion is an expression of emotion. The producers show tenderness, facial expressions, surrounds, music elements. And you see social construction here…

And we can link this to something like FRBR… U2's recording as the authoritative version, and FRBR relationships… Is there a way we can show the relationships between Songs of Innocence by William Blake, Songs of Innocence as an album, cover versions, etc.?
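
One way to make those relationships explicit is as linked data. A minimal sketch with rdflib, using a made-up namespace and property names loosely modelled on FRBR-style relationship types – none of this is an established vocabulary:

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")  # made-up namespace
g = Graph()

blake_poems = EX["songs-of-innocence-blake"]   # Blake's poetry collection
u2_album    = EX["songs-of-innocence-u2"]      # the U2 album
u2_song     = EX["song-for-someone"]           # the authoritative recording
fan_cover   = EX["fan-cover-video-001"]        # one of the 150+ fan videos

# FRBR-style relationships (property names invented for illustration).
g.add((u2_album, EX.alludesTo, blake_poems))
g.add((u2_song, EX.isPartOf, u2_album))
g.add((fan_cover, EX.isDerivativeOf, u2_song))
g.add((fan_cover, EX.emotionalTone, Literal("tender")))  # emotional metadata

print(g.serialize(format="turtle"))
```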

As we move forward there is so much more we need to do when we design systems for description that accommodate more than just keywords/bibliographic records. There is no full text inherent in a video or other non-textual document – an indexing problem. And we need to account not only for emotion, but also for socially constructed and individually experienced emotional responses to items. The ultimate goal is to help people find things in meaningful ways – potentially even to be useful in therapies (Hanser 2010).


Q1) Comment more than a question… I work with film materials in the archive, and we struggle to bring that alive, but you do have some response from the cataloguer and their reactions – and reactions at the access centre – and that could be part of the record.

A1) That’s part of archives – do we need it in every case… Some of the stuff I study gets taken down… Do we need to archive (some of) them?

Q1) Also a danger that you lose content because catalogue records are not exciting enough… Often stuff has to go on YouTube to get seen and accessed – but then you lose that additional metadata…

A1) We do need to go where our audience is… Maybe we do need to be on YouTube more… And maybe we can use Linked Data to make things more findable. Catalogue records rarely come up high enough in search results…

Q2) This is a really subjective way to mark something up… So, for instance, Songs of Innocence was imposed on my iPhone and I respond quite negatively to that… How do you catalogue emotion with that much subjectivity at play?

A2) This is where we have happy songs versus individual perspectives… Most people see The Beatles' Here Comes the Sun as happy… But if someone broke up with you during it… How do we build algorithms that tune into those different opinions?

Q3) How do producers choose to tag things – the lyrics, the tune, their reaction… But you kind of answered that… I mean people have Every Breath You Take by the Police as their first song at a wedding but it’s about a jilted lover stalking his ex…

A3) We need to think about how we provide access, and how we can move forward with this… My first job was in a record store and people would come in and ask “can I buy this record that was on the radio at about 3pm” and that was all they could offer… We need those facets, those emotions…

Q4) I had the experience of seeing quite a neutral painting but then with more context that painting meant something else entirely… So how do we account for that, that issue of context and understanding of the same songs in different ways…

A4) There isn’t one good solution to that but part of the web 2.0 approach is about giving space for the collective and the individual perspective.

Q5) How about musical language?

A5) Yeah… I took an elective in music librarianship. My tutor there showed me the tetrachords in Dido and Aeneas as a good example of an opera that people respond to in very particular ways. There are musical styles that map to particular emotions.

Our 5Rights: digital rights of children and young people / Dev Kornish, Dan Dickson, Bethany Wilson (5Rights Youth Commission)

We are from Young Scot and the 5Rights Youth Commission.

1 in 5 young people have missed food or sleep because of the internet.

How many unemployed young people struggle with entering work due to the lack of digital skills? It’s 1 in 10 who struggle with CVs, online applications, and jobs requiring digital skills.

How young do people start building their digital footprint? Before birth – an EU study found that 80% of mothers had shared images, including scans, of their children.

Bethany: We are passionate about our rights and how our rights can be maintained in a digital world. When it comes to protecting young people online it can be scary… But that doesn't mean we shouldn't use the internet or technology when it is used critically. The 5Rights campaign aims to ensure we have that understanding.

Dan: The UNCRC outlines rights, and these are: the right to remove; the right to know – who has your data and what they are doing with it; the right to safety and support; the right to informed and conscious use – we should be able to opt out or remove ourselves if we want to; and the right to digital literacy – to use and to create.

Bethany: Under the right to remove: we do sometimes post things we shouldn't, and we should be able to remove them if we want to. In terms of the right to know – we don't read the terms and conditions, but we have the right to be informed, and we need support. The right to safety and support requires respect – dismissing our online life can make us not want to talk about it openly with you. If you speak to us openly and individually then we will appreciate your support, but restrictions cannot be too restrictive. Technology is designed to be addictive, and that's a reality we need to engage with. Technology is a part of most aspects of our lives; teaching and the curriculum should reflect that. It's not just about coding, it's about finding information, and understanding what is reliable, what sources we can trust. And finally you need to listen to us, to our needs, to be able to support us.

And a question for us: What challenges have you encountered when supporting young people online? [a good question]

And a second question: What can you do in your work to realise young people’s rights in the digital world?

Q1) What digital literacy is being taught in schools right now?

A1) It’s school to school, depends on the educational authority. Education Scotland have it as a priority but only over the last year… It depends…

Q2) My kid’s 5 and she has library cards…

Comment) The perception is that kids are experts by default

A2 – Dan) That’s not the case but there is that perception of “digital natives” knowing everything. And that isn’t the case…

Dan: Do you want to share what you’ve been discussing?

Comment: It’s not just an age thing… Some love technology, some hate it… But it’s hard to be totally safe online… How do you protect people from that…

Dan: It is incredibly difficult, especially in education.

Comment [me]: There is a real challenge when the internet is filtered and restricted – it is hard to teach real world information literacy and digital literacy when you are doing that in an artificial school set up. That was something that came up in the Royal Society of Edinburgh Digital Participation Inquiry I was involved in a few years ago. I also wanted to add that we have a new MOOC on Digital Footprints that is particularly aimed at those leaving school/coming into university.

Bethany: We really want deletion, when we use our right to remove, to be proper deletion. We really want to know where our data is held. And we want everyone to have access to quality information online and offline. We want the right to disengage when we want to. And we want digital literacy to be about more than just coding – also about what we do and can do online.

Dan: We invite you all to join our 5Rights Coalition to show your support and engagement with this work. We are now in the final stages of this work and will be publishing our report soon. We’ve spoken to Google, Facebook, Education Scotland, mental health organisations, etc. We hope our report will provide great guidance for implementing the 5Rights.

You can find out more and contact us via the hashtag #5RightsYC.


Q1) Has your organisation written any guidance for librarians in putting these rights into action?

A1) Not yet but that report should include some of that guidance.

Playing with metadata / Gavin Willshaw and Scott Renton (University of Edinburgh)

Gavin: Scott and I will be talking about our metadata games project which we’ve been working on for the last few years. My current focus is on PhD digitisation but I’m also involved in this work. I’ll give an overview, what we’ve learned… And then Scott will give more of an idea of the technical side of things.

A few years ago we had 2 full-time photographers working on high quality digital images. Now there are three photographers, 5 scanning assistants, and several specialists all working in digitisation. And that means we have a lot more digital content. A few years ago we launched our one-stop shop into our digital collections, where you can access the images. We have around 30k images, and most are CC BY licensed at high resolution.

Looking at the individual images, we tend to have really good information about the volume the image comes from, but prior to this project we had little information on what was actually in the image. That made images hard to find, and we didn't really have anyone to catalogue them. A lot of these images are as much as 10 years old – produced for projects but not necessarily intended to go online. So, we decided to create this game to improve the description of our collections…

The game has a really retro theme – we didn’t want to spend too long on the design side of things, just keep it simple. And the game is open to everyone.

So, stage 1: tag. You harvest initial tags through an open text box; there is no quality review, and there are points for tags entered. We do have some safety measures to filter out swear words and stop words.

Stage 2: vote. You vote on the quality of others’ tags. It’s a closed system – good/bad/don’t know. That filters out any initial gobbledegook. And you get points…

The tags are QAed and imported into our image management system. We make a distinction between formal metadata and crowdsourced tags. We show that on the record and include a link to the tool – so others can go and play.
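
As a rough illustration of that pipeline – a minimal sketch only, since the production codebase (described later) is PHP/MySQL rather than Python, and the word lists and thresholds here are invented – the two-stage flow might look like:

```python
from dataclasses import dataclass

STOP_WORDS = {"the", "a", "an", "of"}   # assumed stop-word list
SWEAR_WORDS: set[str] = set()           # would be populated from a blocklist

@dataclass
class Tag:
    text: str
    good: int = 0   # "good" votes
    bad: int = 0    # "bad" votes; "don't know" votes don't affect the score

def submit_tag(text: str) -> Tag | None:
    """Stage 1: open text box, no quality review, just basic safety measures."""
    word = text.strip().lower()
    if not word or word in STOP_WORDS or word in SWEAR_WORDS:
        return None              # rejected, no points awarded
    return Tag(word)             # accepted: the player scores points

def passes_qa(tag: Tag, min_votes: int = 3) -> bool:
    """Stage 2: closed good/bad/don't-know voting filters the gobbledegook."""
    return (tag.good + tag.bad) >= min_votes and tag.good > tag.bad
```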

We don’t see crowdsourcing as being just about free labour, but about communities of people with an interest and knowledge. We see it as a way to engage and connect with people beyond the usual groups – members of the public, educators, anyone really. People playing the game range in age from 7 to their 70s, and we are interested in reaching the widest audience possible. And obviously the more people use the system, the more tags and participation we get. We also get feedback for improvements – some features in the game came from user feedback. In theory it frees up staff time, though it takes time to run. And it lets us reach languages, collections, and specialist knowledge that may not be in our team.

To engage our communities we took the games on tour across our sites. We’ve also brought the activity into other events – Innovative Learning Week/Festival of Creative Learning; Ada Lovelace Day; exhibitions – e.g. the Where’s Dolly game that coincided with the Towards Dolly exhibition. Those events are vital to get interest – it doesn’t work to expect people to just find it themselves.

In terms of motivation, people like to do something good, some like to share their skills, and some just enjoy it because it is fun and a wee bit competitive. We’ve had a few (small) prizes… We also display real-time high scores at events, which gets people into competitive mode.

This also fits into an emerging culture of play in Library and Information Services, looking at play in learning – it being OK to try things whether or not they succeed. These have included Board Game Jam sessions using images from the collections, learning about copyright and IP in a fun context; Ada Lovelace Day, which I’ve mentioned – designing your own Raspberry Pi case out of LEGO, making music… And also Wikipedia Editathons – also fun events.

There is also an organisation called Tiltfactor who have their own metadata games looking at tagging and gaming. They have Zen Tag – like ours – but also Nextag for video and audio, and Guess What!, a multiplayer game of description. We put about 2000 images into the metadata games platform Tiltfactor run and got huge numbers of tags quickly. They are at quite a different scale.

We’ve also experimented with Lady Grange’s correspondence in the Zooniverse platform, where you have to underline or indicate names and titles etc.

We’ve also put some of our images into Crowdcrafting to see if we can learn more about the content of images.

There are pros and cons to these hosted platforms…

Pros:

  • Hosted service
  • Easy to create an account
  • Easy to set up and play
  • Range of options – not just tagging
  • Easy to load in images from Dropbox/Flickr

Cons:

  • Some limitations on what you can do
  • Technical expertise needed for best value – especially in platforms like Crowdcrafting.

What we’ve learned so far is that it is difficult to create an engaging platform, but combining it with events and activities – with a target theme and collections – works well. Incentives and prizes help. Considerable staff time is needed. And crowdsourced tags are a complement to, rather than a replacement for, the official record.

Scott: So I’ll give the more technical side of what we’ve done. Why we needed them, how we built them, how we got on, and what we’ve learned.

I’ve been hacking away at workflows for a good 7 years. We have a reader who sees something they want, and they request a photograph of the page. They don’t provide much information – just what is needed. These make for skeleton records – and we now have about 30k of them. It also used to be the case that it was easier to buy in a high-end piece of kit for a project than a low-level cataloguer… That means we end up with data being copied and pasted in by photographers rather than good records.

We have all these skeletons… but we need some meat on our bones. If we take an image from the Incunabula, we want to know that there’s a skeleton on a horse with a scythe. Now, the image platform we have does let us annotate an image – but it’s hidden away and hard to use. We needed something better and easier, so we came up with an initial front end. When I came in it was a module for us to use; it was Gavin who said “hey, this should be a game”. The nostalgic computer games theme is weirdly appealing (like the Google Maps Pacman April Fool!). So it’s super simple: you put in a few words…

And it is truly lo-fi. It’s LAMP (Linux, Apache, MySQL, PHP) – not cool! The front-end design was retrofitted, and authentication was added to let students and staff log in. In terms of design decisions we have a moderation module, a voting module, a scoreboard, and stars for high contributors. And now more complex games: a set number of items, a clock, featured items, and Easter eggs within the game. For instance, in the Dolly the Sheep game we hid a few images with hideous Comic Sans that you could stumble upon if you tagged enough images!

We do have moderation, a voting module, thresholds, and demarcation… Tiltfactor told us we’re the only library putting data from the crowd back into our system – people are really nervous about this, but we demarcate it really carefully.

We now have a codebase we can clone. We skin it differently for particular events or exhibitions – like Dolly – but it’s all the same idea with different design and collections. This all connects up through (authenticated) APIs back into the image management system (Luna).
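
The write-back step might look something like the sketch below – note that the endpoint path, payload shape, and auth details are placeholders for illustration, not Luna’s actual API:

```python
import requests

API_BASE = "https://images.example.ac.uk/api"   # placeholder base URL

def push_tags(image_id: str, tags: list[str], token: str) -> None:
    """Push QA'd crowdsourced tags back to the image management system.

    The crowd tags go into their own field, kept demarcated from the
    formal catalogue metadata, as described above.
    """
    resp = requests.post(
        f"{API_BASE}/images/{image_id}/crowd-tags",  # hypothetical endpoint
        json={"crowdsourced_tags": tags},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
```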

So, how have we gotten on?

  • 283 users
  • 34070 tags in system
  • 15616 tags from our game
  • 18454 tags from Tiltfactor metadata games pushed in
  • 6212 tags pushed back into our system so far – that’s because of a backlog in moderation (upvotes alone may be good enough; see the sketch after this list).
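
Reusing the Tag sketch from earlier, one way to clear that backlog – auto-approving anything whose upvote ratio is high enough and only sending borderline tags to a human moderator – might look like this (the thresholds are my assumptions, not the project’s):

```python
def triage(tags, min_votes: int = 3, approve_ratio: float = 0.8):
    """Split voted-on tags into auto-approved and needs-human-review."""
    auto_approved, needs_review = [], []
    for tag in tags:
        total = tag.good + tag.bad
        if total >= min_votes and tag.good / total >= approve_ratio:
            auto_approved.append(tag)   # push straight back to the catalogue
        else:
            needs_review.append(tag)    # human moderation queue
    return auto_approved, needs_review
```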

So, what next? Well, we have MSc projects coming up, and we are having a revamp with an intern signed up for the summer – responsiveness, links to social media, more gamification, more incentives, authentication for non-UoE users, etc.

And we are also excited about IIIF – about beautification of websites with embedded viewers, streamlining (thumbnails through a URL, “photoshopping” through a URL, etc.), and annotations. You can do deep zoom into images without having to link out to a separate viewer.
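
What makes this possible is the IIIF Image API’s URL grammar, where region, size, rotation, quality, and format are all path segments. A small sketch – the server and identifier below are made up, but the path grammar comes from the IIIF spec:

```python
IIIF_BASE = "https://images.example.ac.uk/iiif"   # placeholder server

def iiif_url(identifier: str, region: str = "full", size: str = "max",
             rotation: str = "0", quality: str = "default",
             fmt: str = "jpg") -> str:
    """Build a IIIF Image API URL: {id}/{region}/{size}/{rotation}/{quality}.{format}"""
    return f"{IIIF_BASE}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# A 200px-wide thumbnail "through the URL", no pre-generated derivative needed:
thumb = iiif_url("inc-folio-42", size="200,")
# Crop a region (x,y,w,h in pixels) -- the basis for deep-zoom tiles:
detail = iiif_url("inc-folio-42", region="512,1024,256,256")
```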

We also have the Polyglot Project – coming soon – which is a palaeography project for manuscripts in our collections of any age, in any language. We asked an intern to find a transcription and translation module using IIIF, and she’s come up with something fantastic… ways to draw around text, for users to add in annotations, to discuss annotations, etc. She’s added 50-60 keyboards, so almost all languages are supported. We’re not sure yet how to bring this back into core systems, but we’re really excited about it.
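
A transcription that “draws around text” might be expressed as a W3C Web Annotation targeting a region of a IIIF canvas – roughly as in the sketch below, where the canvas URL, coordinates, and text are invented for illustration:

```python
# Rough sketch of a transcription annotation in the W3C Web Annotation
# model that IIIF builds on; every identifier here is a placeholder.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    # IIIF Presentation 3 uses the "supplementing" motivation for transcriptions
    "motivation": "supplementing",
    "body": {
        "type": "TextualBody",
        "value": "In principio erat verbum",  # the keyed-in transcription
        "language": "la",
    },
    "target": {
        "source": "https://images.example.ac.uk/iiif/ms-123/canvas/f1r",
        "selector": {
            "type": "FragmentSelector",
            "value": "xywh=120,400,900,60",   # the box drawn around the text
        },
    },
}
```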

That’s basically where we’ve gotten to. And if you want to try the games, come and have a play.


Q1) That example you showed for IIIF tagging has words written in widely varied spellings… You wouldn’t key it in as written in the document.

A1 – Scott) We do have a project looking at this – someone looking at dictionaries to find variants and different spellings.

A1 – Gavin) There are projects like Transcribe Bentham who will have faced that issue…

Comment – Paul C) It’s a common issue… Methods like fuzzy searching help with that…
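
For instance, Python’s standard-library difflib can match a variant spelling against a list of known headings – a cheap form of the fuzzy matching mentioned here (the word list is invented):

```python
from difflib import get_close_matches

headings = ["scythe", "skeleton", "sepulchre", "sermon"]
# "scyth" isn't an exact match, but it is close enough to resolve:
print(get_close_matches("scyth", headings, n=1, cutoff=0.6))  # ['scythe']
```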

Q2) I’m quite interested about how you identify parts of images, and how you feed that back to the catalogue?

A2 – Scott) Right now I think that’s beyond the scope of the project… It will be interesting to see how best to feed this back into catalogue records. Still to be addressed.

Q3 – Paul C) You built this in-house… How open is it? Can others use it?

A3 – Gavin) It is using the Luna image management system…

A3 – Scott) It’s based on Luna for derivatives and data. It’s on Github and it is open. The website is open to everyone. You login through EASE – you can join as an “EASE Friend” if you aren’t part of the University. Others can use the code if they want it…

And finally it was me up to present…

Managing your Digital Footprint : Taking control of the metadata and tracks and traces that define us online / Nicola Osborne (EDINA)

Obviously I didn’t take notes on my session, but you can explore the slides below:

Look out for a new blogpost very soon on some of the background to our new Digital Footprint MOOC, which launched on Monday 3rd April. You can join the course now, or sign up for the next run of the course next month.

And with that the event drew to a close, with thank-yous to all of the organisers, speakers, and attendees!

