May 16 2018

Today I am at the Digital Scholarship Day of Ideas, organised by the Digital Scholarship programme at University of Edinburgh. I’ll be liveblogging all day so, as usual, I welcome additions, corrections, etc. 

Welcome & Introduction – Melissa Terras, Professor of Digital Cultural Heritage, University of Edinburgh

Hi everyone, it is my great pleasure to welcome you to the Digital Day of Ideas 2018 – I’ve been on stage here before as I spoke at the very first one in 2012. I am introducing the day but want to give my thanks to Anouk Lang and Professor James Loxley for putting the event together and their work in supporting digital scholarship. Today is an opportunity to focus on digital research methods and work.

Later on I am pleased that we have speakers from sociology and economic sociology, and the nexus of that with digital techniques, areas which will feed into the Edinburgh Futures Institute. We’ll also have the opportunity to talk about the future of digital methods, and particularly what we can do here to support that.

Lynn Jameson – Introduction

Susan Halford is professor of sociology but also director of the institution-wide Web Science Institute.

Symphonic Social Science and the Future of Big Data Analytics – Susan J Halford, Professor of Sociology & Director of Web Science Institute, University of Southampton

Abstract: Recent years have seen ongoing battles between proponents of big data analytics, using new forms of digital data to make computational and statistical claims about the social world, and many social scientists who remain sceptical about the value of big data, its associated methods and claims to knowledge. This talk suggests that we must move beyond this, and offers some possible ways forward. The first part of the talk takes inspiration from a mode of argumentation identified as ‘symphonic social science’ which, it is suggested, offers a potential way forward. The second part of the talk considers how we might put this into practice, with a particular emphasis on visualisation and the role that this could play in overcoming disciplinary hierarchies and enabling in-depth interdisciplinary collaboration.

It’s a great pleasure to be here in very sunny Edinburgh, and to be speaking to such a wide ranging audience. My own background is geography, politics, English literature, sociology and in recent years computer sciences. That interdisciplinary background has been increasingly important as we start to work with data, new forms of data, new types of work with data, and new knowledge – but let’s query that – from that data. All this new work raises significant challenges especially as those individual fields come from very different backgrounds. I’m going to look at this from the perspective of sociology and perhaps the social sciences, I won’t claim to cover all of the arts and humanities as well.

My talk today is based on work that I have been doing with Mike Savage on “big data” and the new forms of practice emerging around these new forms of data, and the claims being made about how we understand the social world. In this world there has been something of a stand off between data scientists and social scientists. Chris Anderson (in 2008), a writer for Wired, essentially claimed “the data will speak for itself” – you won’t need the disciplines. Many have pushed back hard on this. The push back is partly methodological: these data do not capture every aspect of our lives, they capture partial traces, often lacking in demographic detail (do we care? sociologists generally do…) and we know little of its promise. And it is very hard to work with this data without computational methods – tools for pattern recognition generally, not usually thorough sociological approaches. And those methods can produce concerning, sometimes ethically problematic, results that are presented as unproblematic. So, this is highly challenging. John Goldthorpe says “whatever big data may have for “knowing capitalism” its value to social science has… remained open to questions…”.

Today I want to move beyond that stand off. The divisiveness and siloing of disciplines is destructive for the disciplines – it’s not good for social science and it’s not good for big data analytics either. From a social science perspective, that position marginalises social sciences, sociology specifically, and makes us unable to take part in this big data paradigm which – love it or loathe it – has growing importance, influence, and investment. We have to take part in this for three major reasons: (1) it is happening anyway – it will march forward with or without us; (2) these new data and methods do offer new opportunities for social sciences research and; (3) we may be able to shape big data analytics as the field emerges – it is very much in formation right now. It’s also really bad for data science not to engage with the social sciences… Anderson and others made these claims ten years ago… Reality hasn’t really shown that happen. In commercial contexts – recommendations, behaviour tracking and advertising – the data and analysis is doing that. But in actually drawing understanding from the world, it hasn’t really happened. And even the evangelists have moved on… Wired itself has moved to saying “big data is a tool, but should not be considered the solution”. Jeff Hammerbacher (co-credited for coining the term “data science” in 2008) said in 2013 “the best minds of my generation are thinking about how to make people click ads… that sucks”.

We have a wobble here, a real change in the discourse. We have a call for greater engagement with domain experts. We have a recognition that data are only part of the picture. We need to build a middle ground between those two positions of data science and social science. This isn’t easy… It’s really hard for a variety of reasons. There are bodies buried here… But rather than focus on that, I want to focus on how we take big steps forward here…

The inspiration here is three major social science projects: Bowling Alone (Robert Putnam); The Spirit Level (Richard Wilkinson and Kate Pickett); Capital (Thomas Piketty). These projects have made huge differences, influencing public policy and, in the case of Bowling Alone, really reshaping how governments make policy. These aren’t by sociologists. They aren’t connected as such. The connection we make in our paper is that we see a new style of social science argumentation – and we see it as a way that social scientists may engage in data analytics.

There are some big similarities between these books. They are all data driven. Think about how sociology at the end of the 20th century was highly theoretical… At the beginning of the 21st century we see data driven works. And they haven’t done their own research generating data here, they have drawn on existing research data. Piketty has drawn together diverse tax data… But also Jane Austen quotes… Not just mixed methods but huge repurposing. These books don’t make claims for causality based on data, their claims for causality are supported by theory. However they present data throughout to support their arguments. Data is key, with images to hold the data together. There is a “visual consistency”. The books each have a key graph that essentially summarises the book. Putnam talks about social capital, Piketty talks about the rise and fall of wealth inequality in the 20th century.

In each of these texts data, method and visualisation are woven into a repeat refrain, combined with theory as a composite whole to make powerful arguments about the nature of social life and social change over the long term. We call this a “Symphonic Aesthetic” as different instruments and refrains build, come in and go… and the whole is greater than the sum of the parts.

OK, that’s an observation about the narrative… But why does that matter? We think it’s a way to engage with and disrupt big data. There are similarities: re-purposing multiple and varied “found” data sources; an emphasis on correlation; use of visualisation. There are differences too: theoretical awareness; choice of data; temporality is different – big data has huge sets of data looking at tiny focused and often real time moments. Social Science takes long term comparisons – potentially over 100 years. The role of correlation is different. Big data analytics looks for a result (at least in the early stage), in symphonic aesthetics there is a real interest in correlation through statistical and theoretical understandings. Practice of visualisation varies as well. In big data it is the results, in symphonic aesthetics it is part of the process, not the end of the process.

Those similarities are useful but there is much still to do: symphonic authors do not use new forms of digital data, their methods cannot simply be applied, big data demand new and unfamiliar skills and collaborations. So I want to talk about the prospective direction of travel around data; method; theory; visualisation practice.

So, firstly, data. If we talk about symphonic aesthetics we have to think about critical data pragmatism. That is about lateral thinking – redirection of what data exist already. And we have to move beyond naivety – we cannot claim they are “naturally occurring” mirrors/telescopes etc. They are deliberate socio-technical constructions. And we need to understand what the data are and what they are not: socio-technical processes of data construction (eg carefully constructed samples); understanding and using demographic biases (go with the biases and use the data as appropriate, rather than claiming they are representative; or maybe ignore that, look at network construction, flows, mobilities – e.g. John Murrey’s work).

Secondly method. We have to be methodologically plural. Normally we do mixed methods – some quantitative, some qualitative. But most of us aren’t yet trained for computational methods, and that is a problem. Many of the most interesting things about these data – their scale, complexity etc. – are not things we can accommodate in our traditional methods. We need to extend our repertoire here. So social network analysis has a long and venerable history – we can apply the more intensive smaller version of large scale social network analysis. But we also need machine learning – supervised (with training sets) and unsupervised (without). This allows you to seek evidence of different perhaps even contradictory patterns. But also machine learning can help you find the structures and patterns in the data – which you may well not know in data sets at this scale.

We have this quote from Ari Goldberg (2015): “sociologists often round up the usual suspects. They enter the metaphorical crime scene every day, armed with strong and well-theorised hypotheses about who the murderer should or at least plausibly might be.”

To be very clear I am not suggesting we outsource analysis to computational methods: we need to understand what the methods are doing and how.

Thirdly, theory. We have to use abductive reasoning – a constant interplay between data, method and theory. Initial methods may be informed by initial hunches, themes, etc. We might use those methods to see if there is something interesting there… Perhaps there isn’t, or perhaps you build upon this. That interplay and iterative process is, I suspect, something sociologists already do.

So, how do we bring this all together in practice? Most sociologists do not have a sophisticated understanding of the methods; and most computer scientists may understand the methods but not the theoretical elements. I am suggesting something end to end, with both sociologists and computer scientists working together.

It isn’t the only answer but I am suggesting that visualisation becomes an analytical method, rather than a “result”. And thinking about a space for work where both sociological and computer science expertise are equally valid rather than combative. At best visualisations are “instruments for reasoning about quantitative information. Often the most effective way to describe, explore and summarise a set of numbers – even a very large set – is to look at pictures of those numbers” (Tufte 1998). Visualisations as interdisciplinary boundary objects. Beyond a mode of argumentation… visualisation becomes a mode of practice.

An example of this was a visualisation of the network of a hashtag that was collaborative with my colleague Ramin, which developed over time as we asked each other questions about how the data was presented and what that means…

In conclusion, sociology flourished in the C20th, developing methods, data and theory that gave us expertise in “the social” (a near monopoly). This is changing – new forms of data, new forms of expertise… And claims being made which we may, or may not, think are valid. And that stands on the work of sociologists. But there is some promise in the idea of symphonic aesthetic: for data science – data science has to be credible and there is recognition of that – see for instance Cathy O’Neil’s work on data science, “Weapons of Math Destruction”, which also pushes in this direction; for sociological research – but not all of it, these won’t be the right methods for everyone; for public sociology – this is being used in lots of ways already, algorithm sentencing debates, Cambridge Analytica… There is a real place for sociologists to reshape sociology in the public understanding. There are big epistemological implications here… Changing the data and methods changes what we study… But it has always been like that. Big data can do something different – not necessarily better, but different.


Q1) I was really interested in your comments about visualisations as a method… Joanna Drucker talks about visual technology and visual discourse – and issues of visualisations as being biased towards positivistic approaches, and advocates for getting involved in the design of visualisation tools.

A1) I’m familiar with these concepts. That work I did with Ramin is early speculative work… But it builds and is based on classic social network analysis so yes, I agree, that reflects some issues.

Q2 – Tim Squirrel) I guess my question is about the trade off between access and making meaningful critiques. Often sociology is about critiquing power and methods by which power is transmitted. The more data proliferates, the more the data is locked behind doors – like the kind of data Facebook holds. And in order to access that data you have to compromise the kinds of critiques you can make. How do you navigate that narrow channel, to make critiques without compromising those…

A2) The field is quite unsettled… It looked settled a year ago but I think Cambridge Analytica will have major impact… That may make the doors more closed… Or perhaps we will see these platforms – for instance Facebook – understanding that to retain credibility they have to create a segregation between their own use of the data, and research (not funded by Facebook), so that there is proper separation. But I’m not naive about how that will work in practice… Maybe we have to tread a careful line… And maybe that does mean not being critical in all the ways we might be, in every paper. Empirical data may help us make critical cases across the diverse range of scholarship taking place.

Q3 – Jake Broadhurst) Data science has been used in the social world already, how do we keep up and remain relevant?

A3) It is a pressing challenge. The academy does not have the scale or capacity to address data science in the way the private sector does. One of the big issues is ethics… And how difficult it is for academics to navigate ethics of social media and social data. And it is right that we are bound to ethical processes in a way data scientists and even journalists are not. But it is also absolutely right that our ethics committees have to understand new methods, and the realities of gold standard consent and the other options where that is not feasible.

The discussion we are having now, in the wake of Cambridge Analytica, is crucial. Two years ago I’d ask students what data they felt was collected, they just didn’t know. And understanding that is part of being relevant.

Q4 – Karen Gregory) If you were taking up a sociology PhD next year, how would you take that up?

A4) My official response would be that I’d do a PhD in Web Science. We have a programme at University of Southampton, taking students from a huge array of backgrounds, and giving them all the same theoretical and methodological backgrounds. They then have to have 2 supervisors, from at least 2 different disciplines for their PhD.

Q5 – Kate Orton Johnson) How do we tackle the structures of HE that prevent those interdisciplinary projects, creating space, time, collaborative push to create the things that you describe?

A5) It’s a continuous struggle. Money helps – we’ve had £10m from EPSRC and that really helps. UKRI could help – I’m sceptical but hopeful about interdisciplinary possibilities here. Having PhD supervision across really different disciplines is a beautiful thing, you learn so much and it leads to new things. Universities talk about interdisciplinary work but the reality doesn’t always match up. Money helps. Interdisciplinary research helps. Collaboration on small scales – conference papers etc. also help.

Q6 – David, research in AI and Law) I found your comments about dialogues between data scientists and social scientists… How can you achieve similar with law scholars and data scientists… Especially if trying to avoid hierarchical issues. Law and data science is a really interesting space right now… GDPR but also algorithmic accountability – legal aspects of equality, protected categories, etc. Very few users of big data have faced up to the risks of how they use the data, and potential for legal challenge on the basis of discrimination. You have to find joint enthusiasm areas, and fundable areas, and that’s where you have to start.

The Economics Agora Online: Open Surveys and the Politics of Expertise – Tod van Gunten, Lecturer in Economic Sociology, University of Edinburgh

Abstract: In recent years, research centres in both the United States and United Kingdom have conducted open online surveys of professional economists in order to inform the public about expert opinion.  Media attention to a US-based survey has centred on early research claiming to show a broad policy consensus among professional economists.  However, my own research shows that there is a clear alignment of political ideology in this survey.  My talk will discuss the value and limitations of these online surveys as tools for informing the public about expert opinion.

Thank you for the invitation to speak today, and for Susan’s great and inspiring talk. I wouldn’t claim the label “symphonic” for this talk, but I think there is something of that spirit in this talk. This project is based on found and repurposed data. It isn’t particularly “big” data… But the “found” aspect of the data raises profound questions. Data never holds the answers on its own, it is always crucial to understand method and context. Visualisation is a big part of this. And it is about public sociology – so it hasn’t just been published in journals but in popular press as well.

I am an economist who studies economists as a sociological object in their own right. So, this is a famous moment in 2008 when the Queen, in the midst of the largest global financial crisis since 1929, asked an economist “why did nobody notice it”. Because she is the Queen, the British Academy convened a panel to respond to this question. And they said that lots of people did a good job, but that no-one had it as their job to put everything together. Meanwhile with Brexit we’ve seen economists as a profession receiving substantial criticism.

Economists are hugely influential, we study them because it is the politics of expertise. It is the most politically influential social science. So, I’m going to talk about properties we would like politically influential experts to have:

  1. A high level of professional consensus within the relevant community of experts. Gold standard here is climate science. If we have a community of experts that all agree, there seems to be a need for action. That’s a good principle.
  2. Form policy opinions independently of their own political ideology. We will receive and have confidence in advice from an independent expert more than someone presenting their own views.
  3. Acknowledge professional debate in expressing their views. That they acknowledge that issues are not settled issues.

So in this paper I want to look at how we may use data to measure these aspects. And I’ll be going through some theory around the cultural structure of belief spaces and how this relates to data, big data in the context of economics (but this theory can be used in other contexts as well).

I want to open on the “economics agora” online. I want to talk about two surveys here – these are open online surveys of economists since the financial crisis. It is no coincidence that these have emerged at this time. These surveys are in the UK and in the USA. And unusually the results include publishing the full responses, and the names of the responders – by their consent. These are famous/well known individuals in their field. This allows us to do more… Bring in data that is not in the survey – the CVs of the respondents for instance so including universities, political activities, their co-authorship network, etc. The survey organisers’ goal is to inform the public, but finding patterns in the data requires aggregation and analysis. This isn’t just individual responses, but understanding the context of the data. And again, this isn’t big data, this is quite small data. But these approaches apply to big data too.

So one of these surveys is the Chicago Booth IGM Economic Experts Panel. Each month they put a question to 40 economists about some issue of the moment – the impact of autonomous cars for instance. The second survey is run by the Centre for Macroeconomics, based in London, and again they ask a panel for responses. Typically the UK/European survey shows much more disagreement than the US survey.

There are a lot of issues with these surveys: they are small (the UK/EU one is expanding) and non-random samples; deliberately elitist samples (US survey – “top 7” economics departments in US universities, mainly Ivy League) – why would you take this sample? Well you wouldn’t really… But you have very high status economists. The UK survey has a much wider range in its samples. I think these surveys are great… But I think they should do a better job! Another problem is that you have a high rate of “softball” questions – in the US survey, not in the UK/EU surveys. For instance “imposing new US tariffs on steel and aluminium will improve Americans’ welfare” – it’s timely but we already know that there is high consensus here. We need to ask harder questions! And finally we need to think about the motivations of the people who produce the data – the survey designers are looking to raise the profile of the profession. In the Wall Street Journal the designers of the US survey talked about wanting to counteract the idea of a lack of consensus in the field – and they are the ones asking the questions.

Gordon and Dahl (2013) looked at views and consensus in the field based on the surveys. They presented this as being a “remarkably high degree of consensus” and little variance across schools and departments. And thus look at how influential this field should be. This got big pick up… the Washington Post picked it up. Nobel winning economist Paul Krugman picked this up in his opinion column in the Economist. He is on record (New York Times 2009) as saying pretty much the opposite – that there is polarisation between the “saltwater” economists in the Keynesian camp, and the “freshwater” economists who are very much the opposite.

So, a bit of theory… What do we mean by consensus, polarisation, factions etc? How do groups of people structure their belief systems? We do have twenty years of literature and theory here around understanding belief systems. This goes back to political scientists in the 1960s. Philip Converse (1964) found that most American voters do not adhere to a coherent political ideology – this is still the case. Their belief systems are disorganised or “unconstrained” – so one belief does not let you predict another belief. So for instance comparing a belief that you should “reduce immigration” and “reduce corporate tax” – could show little correlation, those beliefs don’t automatically go together. Now if you are a voter in the UK in 2018 there probably is more alignment. That pattern is a “constrained or aligned” correlation. If you look at polarisation you see clusters of correlation.
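To make the “constrained” versus “unconstrained” distinction concrete, here is a minimal sketch in Python. It is not the speaker’s analysis: the respondents, the 1 to 5 agreement scale and the variable names are all invented for illustration, and plain Pearson correlation is assumed as the measure of how far one belief predicts another.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical agreement scores (1 = strongly disagree, 5 = strongly agree).
# "Constrained" group: the two beliefs move together across respondents.
constrained_immigration = [1, 2, 2, 4, 5, 5]
constrained_corp_tax = [1, 1, 2, 4, 4, 5]

# "Unconstrained" group: knowing one belief tells you nothing about the other.
unconstrained_immigration = [1, 5, 1, 5, 3, 3]
unconstrained_corp_tax = [1, 1, 5, 5, 3, 3]

r_constrained = pearson(constrained_immigration, constrained_corp_tax)
r_unconstrained = pearson(unconstrained_immigration, unconstrained_corp_tax)

print(f"constrained:   r = {r_constrained:.2f}")
print(f"unconstrained: r = {r_unconstrained:.2f}")
```

In this toy data the constrained group shows a correlation close to 1 and the unconstrained group a correlation of 0. Real survey data would sit somewhere in between, and polarisation would appear as clusters of correlated items rather than one pair.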

So, that paper on economists looks for clusters. I looked at polarisation to look at latent ideology, noting partisanship (known involvement in e.g. left or right leaning think tanks etc. – or marked as “none”), current department (freshwater vs saltwater) and belief dimension. Unsurprisingly those involved in Republican/conservative organisations and those with backgrounds in Democratic/liberal organisations were very different, leaning right and left respectively. This is the same data that generated that paper that showed consensus and little variance.

There is a high degree of consensus in this survey but you can also see ideological alignment. That can be consistent. But it depends on what you think, and what you ask. The UK survey – more recently expanded to Europe – shows much less consensus. This could mean there is more consensus in the US than in Europe; but it could also mean that the questions being asked in the UK survey are harder questions. The UK survey asks very complex questions… e.g. “Do you agree that, in a period of great uncertainty and after a prolonged period of weak real wage growth, monetary policy makers can afford to wait for greater certainty about real wage developments and building inflationary pressure before raising interest rates?”. So, you can’t measure consensus without a comparison with another group. You can see consensus on a question, not of a group/community or set of beliefs.
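A rough illustration of why measured consensus depends on the question asked, again in Python. The votes below are made up, and the “share giving the most common answer” measure is an assumption chosen for simplicity, not the measure used in the talk or in the surveys.

```python
from collections import Counter

def modal_share(responses):
    """Share of respondents giving the most common answer (1.0 = everyone agrees)."""
    counts = Counter(responses)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(responses)

# Hypothetical votes from the same ten panellists on two different questions.
softball_question = ["agree"] * 9 + ["uncertain"]
hard_question = ["agree"] * 4 + ["disagree"] * 4 + ["uncertain"] * 2

softball_consensus = modal_share(softball_question)
hard_consensus = modal_share(hard_question)

print(f"softball question: {softball_consensus:.0%} give the modal answer")
print(f"hard question:     {hard_consensus:.0%} give the modal answer")
```

The same imaginary panel looks highly consensual on the easy question and split on the hard one, which is the point being made: a consensus figure describes a question, and comparing panels only makes sense if the questions are comparable.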

So, looking at a recent UK/EU survey on anti-establishment vs monetary conservatism, you can see a diversity of views here.

So, back to those qualities. Professional consensus is harder to measure than it first appears.

Respondents are asked to give both their vote and their level of confidence. So, when experts give an opinion on hot topics you’d really want a low confidence score to show you don’t have a partisan respondent on your hands. Looking at the data here in the US surveys we see a lot of overly confident responses. Respondents with a stronger ideological disposition (aligned belief structure) exhibit systematic overconfidence. In general, across all questions, when asked politically salient questions they state higher confidence than on questions with little/no political salience.

By way of conclusion… Am I joining ranks with Michael Gove “people in this country have had enough of experts”? No. I would say something more nuanced. If, arguably, professions in general, and economists in particular, have lost political legitimacy, then professional over-reach (“look how much consensus we have”) is not the answer. Claiming consensus where none exists is over-reach. Transparency about professional debate is always better than overstating consensus. Political legitimacy is a scarce resource and should be treated as such.

The economics agora online is a useful tool for studying the beliefs of an important community of experts… but survey designers should up their game. If you want an “unbiased” expert, choose someone whose belief structure is unconstrained. You probably want someone in the middle – people whose belief systems are not correlated. You need a theory of how groups form beliefs…. So read cultural sociology!


Q1) In thinking about the resistance to “naturally occurring data” and the idea of an “unbiased expert” – do you have a sense that that isn’t possible… Rather than getting that, should we instead shift the conversation to make the politics relevant – to be clear in a way that makes the numbers make sense…

A1) If we choose which experts to listen to, which do we listen to…

Q1) It was interesting to think of economists as “not political” – if that’s the conversation… I think the non-biased expert… That raises issues. We query that that even exists… Maybe we can shift the conversation.

A1) I guess I would want to push back a little bit. I am sympathetic that there is no unbiased expert but… I do a lot of work on economists and how they influence policy. I think the world does need economists, especially for monetary policy, technical aspects of policy. So, having some tools to understand this profession, how they structure beliefs… We need more tools to unpack that set of questions… I’m trying to find ways to study this profession using quantitative tools and qualitative tools and understand impact on politics and society.

Q2) You mentioned a graph to show polarisation – how did you do that?

A2) This is not based on data, this is based on theoretical patterns… A series of plots using a test data set to illustrate the patterns of the theory – it’s theoretical rather than empirical data.

Q3) A slight follow up… How much have you played with non linear tools… Consensus and confidence… Research on scientific knowledge shows that people who know a little about science have higher confidence than those who know more… That could impact that data on confidence.

A3) We did look at non-linearity – doesn’t make a big difference to some measures here.

Q4) What definition of “expert” are you using, and why?

A4) People with PhDs in economics. In the US case they are high status people in the field… In the UK/EU case it is broader. Most work as professors of economics, some work in the private sector in financial sectors. For my purposes it’s holding a PhD in economics… In other work I’ve done on organisations in Latin America you have senior political elites with those credentials, and a lot who don’t; boundary work becomes more important here.

Q5) I think some of the Chicago questions also go to the public. Have you looked at that?

A5) It’s not publicly available… I’ve been thinking about asking for that. But it would be interesting to know if members of the public structure their belief systems differently. There is some work that compares public beliefs to these questions.

Q6)  I work on spatial models around expert agreement and disagreement – interesting measures there and on polarisation. Also dimensionality reduction. Since you are trying to identify latent ideological positions… Not sure if you’ve looked at that. Political behaviour research has

Q7) I wanted to ask about how much the very different types of respondents and samples you have between the US and UK/EU surveys. I was particularly wondering about the high status nature of the US experts and how much that status plays a part… You talked about doing some social network and contextual work here so I was wondering the degree to which their network and co-authorship and professional standing feeds into wanting to be seen to take a particular view, or visibly agree.

A7) The social network part, and co-authorship data is going to lead to a paper. We found people who are closer in co-authoring papers are ideologically closer – not totally surprising… So there is a social approval thing and a selection bias. We think that is the more likely interpretation here – the homophily effect. Even when they co-author non-political papers, they still pick ideologically aligned authors. The status thing is interesting… The UK/EU expert panel is less hierarchical – maybe reflects practice. In terms of monitoring each others responses… I think it’s more of a contrarian thing… They want to find ways to disagree… They can add comments… So lots of “My colleagues all think this, but if you think about it this other way you get this opposite response”.

Q8) My question/comment is about the “unconstrained” idea space – it feels funny and attractive… But also quite negative… Unconstrained… Disorganised… But you are talking about it as a positive quality. But does that suggest they haven’t thought this stuff through?

A8) I’m glad you asked this. This question came up in the 1960s and it was seen as terrible that the ideologies didn’t align to political parties… The field has turned on its head now. In the 1960s though this was seen as politically naive. Actually more educated voters are seen to have more constrained beliefs… But with the economists that unconstrained belief system is good as it shows that they are not bringing in their partisan/ideological standpoint. There is a contradiction there. The idea that the more information you have, the more constrained your belief system should be… But only to a point. There is a really interesting paper by ? de Surrey and Ari Goldberg that compares ideological voters, the unconstrained voters, and they find a third group that is e.g. politically liberal and economically conservative. This is a really interesting area of the literature. There are a bunch of new methods that are getting us nearer that question…

We broke for lunch and workshops at this point… 

Workshops: Parallel workshop sessions – please see descriptors below.

  • Text Analysis for the Tech Beginner – Suzanne Black, PhD student in LLC
  • An Introduction to Digital Manufacture – Mike Boyd (uCreate Studio Manager, UoE)
  • ‘I have the best words’: Twitter, Trump and Text Analysis – Dave Elsmore (EDINA)
  • An Introduction to Databases, with MariaDB & Navicat – Bridget Moynihan (LLC, UoE)
  • Introduction to Data Visualisation in Processing – Jules Rawlinson (Music, ECA, UoE)
  • Jupyter Notebooks and The University of Edinburgh Noteable service – Overview and Introduction – James Reid (EDINA)
  • Obtaining and working with Facebook Data – Simon Yuill (Goldsmiths)

I attended the Introduction to Data Visualisation in Processing workshop which was really interesting, and left me wanting to have a further play to see where it may potentially be useful. 

Round Table Discussion

  • Melissa Terras (MT), Professor of Digital Cultural Heritage
  • Kirsty Lingstadt (KL), Head of Digital Library and Depute Director of Library and University Collections
  • Ewan McAndrew (EM), Wikimedian in Residence
  • Tim Squirell (TS), PhD Student, Science, Technology and Innovation Studies working on communities and expertise and negotiations of those concepts.

MT: I wanted to start with quite a personal place… I realised last year that I was sort of grieving for the internet. I grew up with the internet, it’s been a big part of my life and friendships… But the internet has taken a different turn… And there is a need to step away from that a bit to stay sane. There is a need to step back and reflect, and think about the University Space. I feel maybe we could have stepped in… The questions of Facebook, Twitter, the use of data… The human nature of trust… And how we use and engage and archive and preserve some of these spaces… I think that makes it interesting to an academic in the digital space right now.

EM: I think the idea of the web was quite sour after Cambridge Analytica. Tim Berners-Lee spoke on Channel 4 News about how it’s not enough to build and run the open web, but we have to look critically at what is being done with it, what people are building. I also thought of the Scottish Referendum, and Glasgow Strathclyde University, which called upon all librarians to support political literacy. But that could be “universities” not just “libraries” – there is a need for much more information literacy as a service almost.

KL: The role of the university is about knowledge and supporting and preserving knowledge, with the library central to that… As the digital world changes we need those skills of information literacy, to think critically about what we see on the web, and how we understand that. That’s an important thread the library offers and supports. The arts, humanities and social sciences really support that development of critical engagement, literacy, context and the origins of big data. I was very much chiming with CILIPS work on information literacy – the university library has a really important part to play here…

TS: I want to make three brief points on engagement, expertise and access. One of the things I’ve observed on the web around online communities, is that there is a tendency not to notice a community until something happens. I study some quite extreme communities, including the involuntary celibate community, and you can’t raise interest until people go out and kill people. We really need to see more engagement and understanding, not as an object of interest. The second point is about experts and what that means… I think that reification of expertise is naive at best, and often dangerous. Only engaging with experts, or corroborating your beliefs, or feeling that you only engage with an expert class, overlooks the way most people engage with issues. And finally on access… In light of Cambridge Analytica, Facebook has shut down access for all but their own Facebook programme (with funding councils) of research. Doing that means only people working at the companies, or the elite universities with particular track records…

Comment: Interesting that you mentioned Tim Berners-Lee as he was the reason Web Science got set up at Southampton. The narrative was… I invented the web (discuss) and it has gone wrong (discuss). That was a perspective that didn’t problematise information or communication etc. The idea was that we would reengineer the web (discuss) as if it is technical, not a complex socio-technical network. I’m not being negative but supporting your statements. The restructuring of Information Technology GCSE was a travesty – there was no attempt at critical engagement, just at programming. And it is really important that we envision what we want the web to be. There is no fixed idea of the web. We have gone down the rabbit hole of behavioural tracking and advertising as the only economic model… But we could play with that. I would make a pitch for Utopianism… With Donna Haraway: looking at the trouble and thinking about what else we could do.

Comment: I wondered about… that sense of the internet as being what we hoped it could be… But also the issue of the attack on net neutrality in the US, and immediate recognition that that isn’t ok… How do we back away, not engage in the toxic parts of the internet… But also save the parts that are worth saving… Keeping an eye on legislation? Do we protect without participating?

MT: I immediately started to think of how we talk about bitcoin – very utopian visions and turning it into a profit making machine, as has happened in the internet… How do we build structures that can be used to make money… Without that consuming the rest of it… The internet is consuming all the other stuff… I think bitcoin will be the same… The same people who had money 200 years ago, will be the same people who’ll make money now… Partly information literacy, partly being cynical, being civic… Being alive to issues…

TS: I am going to say two contradictory sounding things… So many of these issues seem to be engineering solutions to social problems. I was at a conference with someone talking about a blockchain based education network, with a smart contract to validate credentials. Taking the human out of the process, in order to improve the situation. Bitcoin is supposed to be trustless… But at some point you have a human interface, it will fail… You will always face problems you couldn’t spot – unless you spoke to a social scientist. But what goes with that, for us as social scientists, is the need to engage with the engineering sides of things… Lots of “if only we could have known what would happen with Cambridge Analytica”, but we’ve known about that for years… We struggle to be listened to by policy makers when compared with businesses who have legitimate routes in, and argue for a lack of accountability. Platforms are not neutral, you can engineer the behaviours available in the space. You have to understand the feedback loop between administration and engineering.

EM: Thinking about democratisation… And thinking about utopian visions… Putting my wikimedian hat on… I think that it has been amazing to see the work done by students here… There is real benefit to having a very transparent space online where you can query or change or contribute to the world. Wikipedia is committed to keeping the human element at its core. One of the ways that Wikipedia checks and balances the data is that you can’t edit a page unless you’ve had an account for four days.

KL: That’s where libraries of all kinds come in – a space or platform to trace the source, the archive materials… And digital data… Data curation and longer term lifecycles.. Digital content being created… To check, to contribute.

Comment: There’s an interesting underlying narrative that the web has gone wrong, and that the economy has gone wrong… As if these structured inequalities are accidental but they are not, they are deliberate. We need a critical historical narrative of the web and how this has taken place…. And the historical narrative of where the web has come from. We need more engagement from the humanities here… There are underlying themes here.

Comment: From literary and fan fiction studies we have for years been talking to a literature and community that exists online and how that interacts online. Fan fiction is often written by women, by BME and LGBTQ and non-binary people… We have a cry of “own the servers” to avoid exploitation… Could anyone comment on that type of utopian vision – the local and the global… Who accesses the data…

KL: From my context of the library, it’s about putting materials out there to access what they need as equitably as possible… But that’s difficult… For archives and personal material there are restrictions and limitations for good reason… We haven’t cracked that perfectly… It is a challenge, there isn’t an easy answer to it…

EM: From a Wikipedia angle… Wikipedia had a conversation within and around the community about where the community is going by 2030… Where they were going, what they needed to do to share and access knowledge around the world… To enable better understanding… To more civic and better societies. But there are huge disparities of access. Out of that came the sense of knowledge not as a product but as a service. And the idea of knowledge equity – in terms of access but recognising only 10% of editors are female, it’s Northern Hemisphere orientated, only 2.5% of geotagged content relates to Africa. It’s not shying away from that, instead trying to address that over time… Which is why Wiki Project Medicine has created “the internet in a box” to enable access to a downloaded medical version of the content to improve access to information.

Comment: From Biological sciences background… My question underpins everything here… We haven’t really touched on digital preservation, it’s a big and worrying thing. I’ve listened to comment on big gaps in digital data, it’s really difficult in the long term. How will that be affected by GDPR and what can be done there in terms of preservation and access. We are looking more and more at the cloud… The carbon footprint of ICT is expected to be 40% by 2040. Thinking about preservation and the more and more carbon intensive nature of the web, what can universities do to tackle these issues…

KL: Digital preservation is close and dear to us. It is challenging and not easy. It’s not a commodity you can just buy, there isn’t one way to do this. We are trying to tackle certain areas. We are trying to preserve the university’s history. We also look actively at research data produced by the University. Addressing those two areas, there is still a huge area of web output and web archiving there… There is interest in the University output, but less interest in the wider context. We acknowledge that agenda and push it up in the university – and digital humanities helps here, and that means access to information which helps us make our case. GDPR does present complexity, it does mean working with encryption… For company/global content that’s broader.

Comment: In terms of the issue of experts… I think it’s interesting to see experts by credentials, or by reputation… And how that relates to the internet… It seems like a great way to be a self-made expert… To promote yourself as an expert because you have a blog. You may have stature and influence… But that’s very different from a PhD or an academic expertise… I’m interested that part of being an expert is admitting when you don’t know something… It seems the public wants experts to tell you the answer right now… What is the role of the internet right now here.

TS: I have a lot of thoughts on this. It’s basically my PhD. If I ramble… Stop me… I think this is fundamentally about the way we reconceptualise expertise… There is the idea of it being reified, as rare and based on credentials, and that being in conflict with other types of self-made influence. Steven Taylor has a paper on experts across three types, including this group of self-made experts… They come to represent a much larger group of experts – it hasn’t democratised broadcast but it’s certainly opened up and broadened the field somewhat. When we understand expertise as only credentialed people in specific organisations, we limit communication. We have to be able to engage as compellingly as these people who are able to weaponise, essentially, nonsense and see how we can be as engaging as them. We have to be provocative and interesting. We can’t expect people to just come and ask the right experts. The burden shouldn’t be on audiences, the burden should be on “experts” to be palatable and appealing as experts.

MT: The anti expertise thing isn’t a new thing too… It goes right back to founding of universities, particularly in the Victorian era… I have a book coming out on professors in children’s literature, and accompanying anthology, and every single story is “the professor is rubbish”. All of them. All about not trusting experts, just when expertise is being formalised… The general populace ridiculing them… The internet has boosted that again. But a positive thing… Crowd sourcing is a positive development… We did a few crowd sourcing projects that truly changed access and use of information – work that used to only be done by palaeographers, looking at Jeremy Bentham’s papers… The internet helped us speed that all up… If we have the right platforms, the right structures, we can do the right things… But we can’t let “expertise is rubbish” perpetuate.

EM: Again with digital preservation, there is a cost attached… There may be volunteers… If there is a platform or a lack of cost… You can do a lot. And archive a lot in public ways…

KL: I was going to add that the cultural heritage sector has an interesting relationship with working with the community… But there is this tension about who can contribute and how, and who can do it best. But the crowd is full of enthusiasm… As long as work is provenanced… That is a really good way to positively use the web.

Comment: In response to the Cambridge Analytica stuff… And why didn’t they listen to the social scientists… Isn’t GDPR an example of the law doing as good a job as it could… And data ownership… Legislative work in Europe on copyright and data ownership… If we want to set the right example, it’s not enough to throw up our hands in horror… You have to engage in legislative process… Laws do have an impact in cyberspace.

Comment: Business models – and how do we change that – it shapes the platform. Investment doesn’t go in equally – and as universities we do start ups, we do engagement with industry. How do we move beyond all of these businesses being set up by young wealthy guys, and opening that up… And reconceptualising success as more than just exit, and data as asset – and that being personal data. I also wanted to note that web archiving does take place – with the Internet Archive who operate in the more permissive US copyright context (and mirrored in Canada – they were concerned that Trump might interfere with the archive). There is a small but politically aware web archiving community but part of making that and any platform work is about acknowledging that there is cost to running platforms, to archiving materials…

Comment: That idea of “an expert” – surely we reconceptualise the expert as a distributed thing.

TS: Yes.

MT: And with that I’d like to thank the panel and draw this to a close. We hope to have some announcements in the next year about expanding this work, and this day takes place in an environment that contributed to my coming to Edinburgh, with the City Deal, and with the work driving Edinburgh to be the Data Driven Innovation capital of Europe.

May 022018

This morning I’m at the “Working with the British Library’s Digital Content, Data and Services for your research (University of Edinburgh)” event at the Informatics Forum to hear about work that has been taking place at the British Library Labs programme, and with BL data recently. I’ll be liveblogging and, as usual, any comments, questions, 

Introduction and Welcome – Professor Melissa Terras

Welcome to this British Library Labs event, this is about work that fits into wider work taking place and coming here at Edinburgh. British Library Labs works in a space that is changing all the time, and we need to think about how we as researchers can use digital content and this kind of work – and we’ll be hearing from some Edinburgh researchers using British Library data in their work today.

“What is British Library Labs? How have we engaged researchers, artists, entrepreneurs and educators in using our digital collections” – Ben O’Steen, Technical Lead, British Library Labs

We work to engage researchers, artists, entrepreneurs and educators to use our digital collections – we don’t build stuff, we find ways to enable access and use of our data.

The British Library isn’t just our building in St Pancras, we also have a huge document supply and storage facility in Boston Spa. At St Pancras we don’t just have the collections, we have space to work, we have reading rooms, and we have five underground floors hidden away there. We also have a public mission and a “Living Knowledge Vision” which helps us to shape our work.

British Library Labs has been running for four years now, funded by the Andrew W. Mellon Foundation, and we are in our third funded phase where we are trying to make this business as usual… So the BL supports the reader who wants to read 3 things, and the reader who wants to read 300,000 things. To do that we have some challenges to face to make things more accessible – not least to help people deal with the sheer scale of the collections. And we want to avoid people having to learn unfamiliar formats and methodologies which are about the library and our processes. We also want to help people explore the feel of collections, their “shape” – what’s missing, what’s there, why and how to understand that. We also want to help people navigate data in new ways.

So, for the last few years we have been trying to help researchers address their own specific problems, but also trying to work out if that is part of a wider problem, to see where there are general issues. But a lot of what we have done has been about getting started… We have a lot of items – about 180 million – but any count we have is always an estimate. Those items include 14m books, 60m patents, 8m stamps, 3m sound recordings… So what do researchers ask for…

Well, researchers often ask for all the content we have. That hides the failure that we should have better tools to understand what is there, and what they want. That is a big ask, but that means a lot of internal change. So, we try to give researchers as much as we have… Sometimes that’s TBs of data, sometimes GBs… And data might be all sorts of stuff – not just the text but the images, the bindings, etc. If we take a digitised item we have an image of the cover, we have pictures, we have text, we also have OCR for these books – when people ask for “all” the book – is that the images, the OCR or both? One of those is much easier to provide…

Facial recognition is quite hot right now… That was one of the original reasons to access all of the illustrations – I run something called the Mechanical Curator to help highlight those images – they asked if they could have the images – so we now have 120m images on Flickr. What we knew about images was the book, and the page. All the categorisation and metadata now there has been from people and machines looking at the data. We worked with Wikimedia UK to find maps, using manual and machine learning techniques – kind of in competition – to identify those maps… And they have now been moved into georeferencing tools and fed back to Flickr and also into the catalogue… But that breaks the catalogue… It’s not the best way to do this, so that has triggered conversations within the library about what we do differently, what we do extra.

As part of the crowdsourcing I built an arcade machine – and we ran a game jam with several usable games to categorise or confirm categories. That’s currently in the hallway by the lifts in the building, and was the result of work with researchers.

We put our content out there under a CC0 license, and then we have awards to recognise great use of our data. And this was submitted – the official music video for Hey There Young Sailor, made using that content! We also have the Off the Map competition – a curated set of data for undergraduate gaming students based on a theme… Every year there is something exceptional.

I mentioned library catalogue being challenging. And not always understanding that when you ask for everything, that isn’t everything that exists. But there are still holes…. When we look at the metadata for our 19th century books we see huge amounts of data in [square brackets] meaning the data isn’t known but is the best suggestion. And this becomes more obvious when we look at work researcher Pieter Francois did on the collection – showing spikes in publication dates at 5 year intervals… Which reflects the guesses at publication year that tend to be e.g. 1800/1805/1810. So if you take intervals to shape your data, it will be distorted. And then what we have digitised is not representative of that, and it’s a very small part of the collection…
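The five-year spikes described above can be checked for in any list of catalogue dates. The following is only a minimal Python sketch: the sample date strings, the bracket convention as a simple prefix test, and the function names are all assumptions made for illustration, not British Library catalogue data or the method Pieter Francois used.

```python
import re

# Hypothetical catalogue date strings; square brackets mark an estimated year.
SAMPLE_DATES = ["1800", "[1805]", "[1810]", "1812", "[1810]", "[1820?]", "1823", "[1825]"]

YEAR_RE = re.compile(r"(\d{4})")


def parse_date(raw):
    """Return (year, is_estimated), or None if no four-digit year is found."""
    match = YEAR_RE.search(raw)
    if match is None:
        return None
    # Assumption: an estimated date is one whose string starts with "[".
    return int(match.group(1)), raw.strip().startswith("[")


def heaping_share(dates, estimated):
    """Share of years divisible by five among the estimated (or known) dates."""
    parsed = [p for p in map(parse_date, dates) if p is not None]
    years = [year for year, is_est in parsed if is_est == estimated]
    if not years:
        return 0.0
    return sum(1 for year in years if year % 5 == 0) / len(years)
```

Comparing `heaping_share(dates, True)` with `heaping_share(dates, False)` gives a rough sense of how strongly the guessed years cluster on round numbers, which is worth knowing before binning publications into five-year intervals.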

There is bias in digitisation then, and we try to help others understand that. Right now our digitised collections are about 3% of our collections. Of the digitised material 15% is openly licensed. But only about 10% is online. About 85% of our collections can only be accessed “on site” as licenses were written pre-internet. We have been exploring that, and exploring what that means…

So, back to use of our data… People have a hierarchy of needs from big broad questions down to filtered and specific queries… We have to get to the place where we can address those specific questions. We know we have messy OCR, so that needs addressing.

We have people looking for (sometimes terrible) jokes – see Victorian Humour run by Bob Nicholson based on his research – this is stuff that can’t be found with keywords…

We have Katrina Navickas mapping political activity in the 19th Century. This looks different but uses the same data and the same platform – using Jupyter Notebooks. And we have researchers looking at black abolitionists. We have SherlockNet trying to do image classification… And we find work all over the place building on our data, on our images… We found a card game – Moveable Type – built on our images. And David Normal building montages of images. We’ve had the Poetic Places project.

So, we try to help people explore. We know that our services need to be better… And that our services shape expectations of the data – and can omit and hide aspects of the collections. Exploring data is difficult, especially with collections at this scale – and it often requires specific skills and capabilities.

British Library Labs working with University of Edinburgh and University of St Andrews Researchers

“Text Mining of News Broadcasts” – Dr. Beatrice Alex, Informatics (University of Edinburgh)

Today I’ll be talking about my work with speech data, which is funded by my Turing fellowship. I work in a group who have mainly worked with text, but this project has built on work with speech transcripts – and I am doing work on a project with news footage, and dialogues between humans and robots.

The challenges of working with speech include particular characteristics: short utterances, interjections; speaker assumptions – different from e.g. newspaper text; turn taking. Often transcripts lack sentence boundaries, punctuation or case distinctions. And there are errors introduced by speech recognition.

So, I’m just going to show you an example of our work which you can view online – here you can do real time speech recognition, and this can then also be run through the Edinburgh Geoparser to look for locations and identify their locations on the map. There are a few errors and, where locations haven’t been recognised in the speech recognition they also don’t map well. The steps in this pipeline are speech recognition… ASR then Google Text Restoration, and then text and data mining.

So, at the BL I’ve been working with Luke McKernan, lead curator for news and moving images. I have had access to a small set of example news broadcast files for prototype development. This is too small for testing/validation – I’d have to be onsite at BL to work on the full collection. And I’ve been using the CallHome collection (telephone transcripts) and BBC data which is available locally at Informatics.

So looking at an example we can see good text recognition. In my work I have implemented a case restoration step (named entities and sentence initials) using rule based lexicon lookup, and also using Punctuator 2 – an open source tool which adds punctuation. That works much better but isn’t up to an ideal level there. Meanwhile the Geoparser was designed for text so works well but misses things… Improvement work has taken place but there is more to do… And we have named entity recognition in use here too – looking for location, names, etc.
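To make the idea of a rule-based case restoration step concrete, here is a minimal Python sketch of lexicon lookup for named entities plus capitalisation of sentence-initial words. It is an illustration under stated assumptions, not the implementation described in the talk: the lexicon entries, the two-token phrase limit, and the function name are all hypothetical, and the input is assumed to be lowercased, tokenised ASR output that already has sentence-final punctuation.

```python
# Hypothetical lexicon mapping lowercase forms to their cased named-entity forms.
LEXICON = {
    "edinburgh": "Edinburgh",
    "bbc": "BBC",
    "british library": "British Library",
}
MAX_NGRAM = 2  # longest lexicon phrase, in tokens (assumption for this sketch)
SENTENCE_END = {".", "?", "!"}


def restore_case(tokens, lexicon=LEXICON):
    """Restore case on lowercased tokens using lexicon lookup and sentence starts."""
    out = []
    i = 0
    sentence_start = True
    while i < len(tokens):
        replaced = False
        # Try the longest phrase first so multi-word entities win over single words.
        for n in range(MAX_NGRAM, 0, -1):
            span = tokens[i:i + n]
            key = " ".join(span).lower()
            if len(span) == n and key in lexicon:
                out.extend(lexicon[key].split())
                i += n
                replaced = True
                break
        if not replaced:
            tok = tokens[i]
            # Capitalise only the first character of a sentence-initial token.
            out.append(tok[:1].upper() + tok[1:] if sentence_start else tok)
            i += 1
        sentence_start = out[-1] in SENTENCE_END
    return out
```

For example, `restore_case("the bbc reported from edinburgh .".split())` restores "The", "BBC" and "Edinburgh", while any place name missing from the lexicon stays lowercase, which is the kind of gap that later error analysis would need to pick up.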

The next steps are to test the effect of ASR quality on text mining (using CallHome and BBC broadcast data) using formal evaluation; improve the text mining on speech transcript data based on further error analysis; and longer term plans include applications in the healthcare sector.


Q1) Could this technology be applied to songs?

A1) It could be – we haven’t worked with songs before but we could look at applying it.

“Text Mining Historical Newspapers” – Dr. Beatrice Alex and Dr. Claire Grover, Senior Research Fellow, Informatics (University of Edinburgh) [Bea Alex will present Claire’s paper on her behalf]

Claire is involved in an Administrative Data Research Centre Scotland project looking at local Scottish newspapers, to text mine them and connect them to other work. Claire managed to get access to the BL newspapers through Cengage and Gale – with help from the University of Edinburgh Library. This isn’t all of the BL newspaper collection, but part of it. This collection of data is also now available for use by other researchers at Edinburgh. The issues we had here were that access to more recent newspapers is difficult, and the OCR quality. Claire’s work focused on three papers in the first instance, from Aberdeen, Dundee and Edinburgh.

Claire adapted the Edinburgh Geoparser to process the OCR format of the newspapers and added local gazetteer resources for Aberdeen, Dundee and Edinburgh from OS OpenData. Each article was then automatically annotated with paragraph, sentence, word mark-up; named entities – people, place, organisation; location; geo coordinates.

So, for example, a scanned item from the Edinburgh Evening News from 1904 – it’s not a great scan but the OCR is OK but erroneous. Named entities are identified, locations are marked. Because of the scale of the data Claire took just one year from most of the papers and worked with a huge number of articles, announcements, images etc. She also drilled down into the geoparsed newspaper articles.

So for Aberdeen in 1922 there were over 19 million word/punctuation tokens and over 230,000 location mentions. Claire then used frequency methods and concordances to understand the data. For instance she looked for mentions of Aberdeen placenames by frequency – and that shows the regions/districts of Aberdeen – Torry, Woodside, and also Union Street… Then Claire dug down again… Looking at Torry the mentions included Office, Rooms, Suit, etc, which gives a sense of the area – a place people rented accommodation in. In just the news articles (not ads etc) then for Torry it’s about Council, Parish, Councillor, politics, etc.
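The frequency and concordance approach described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the whitespace tokenisation and the window size are assumptions, and it is not the tooling used in the project.

```python
from collections import Counter


def keyword_in_context(tokens, keyword, window=3):
    """Return (left context, match, right context) for each keyword occurrence."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits


def collocate_counts(tokens, keyword, window=3):
    """Count the (lowercased) words appearing within `window` tokens of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(word.lower() for word in neighbours)
    return counts
```

Run over tokenised articles for a place name such as Torry, `collocate_counts(tokens, "Torry").most_common(10)` would list the words that most often surround it, and `keyword_in_context` shows each mention with its neighbouring words for closer reading.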

Looking at Concordances Claire looked at “fish”, for instance, to see what else was mentioned and, in summary, she noted that the industry was depressed after WW1; there was unemployment in Aberdeen and the fishing towns of Aberdeenshire; that there was competition from German trawlers landing Icelandic fish; that there were hopes to work with Germany and Russia on the industry; and that government was involved in supporting the industry and taking action to improve it.

With the Dundee data we can see the Topic Modelling that Claire did for the articles – for instance clustering of cars, police, accidents etc; there is a farming and agriculture topic; sports (golf etc)… And you can look at the headlines from those topics and see how they reflect the identified topics.

So, next steps for this work will include: improving text analysis and geoparsing components; get access to more recent newspapers – but there is missing infrastructure for larger data sets but we are working on this; scale up the system to process whole data set and store text mining output; tools to summarise content; and tools for search – filtering by place, date, linguistic context – tools beyond the command line.

“Visualizing Cultural Collections as a Speculative Process” – Dr. Uta Hinrichs, Lecturer at the School of Computer Science (University of St Andrews)

My research focuses on visualisation and Human Computer Interaction. I am particularly interested in how interfaces can make visible digital collections. I have worked on a couple of projects with Bea Alex and others in the room to visualise texts. I will talk a little bit about LitLong, and the process in developing early visualisations for the project.

So, some background… Edinburgh is a UNESCO City of Literature, with lots of literature about and in the city. And we wanted to automate the discovery of Edinburgh-based literature from available digitised text. That included a large number of texts – about 380k – from collections including the BL 19th Century Books collection. And we wanted to make results accessible to the public.

There were lots of people involved here, from Edinburgh University (PI, James Loxley), Informatics, St Andrews, and EDINA. And we worked both with out of copyright texts, but also we had special permission to work with some in-copyright texts including Irvine Welsh. And a lot of work was done to geoparse the text – and assess its Edinburghyness. For each mention we had the author, the title, the year, and snippets of the text from around the mention. This led to visualisations – I worked on LitLong 1.0 and I’ll talk about this, but a further version (LitLong 2.0) launched last year.

So you can explore clusters of places mentioned in texts, you can explore the clustered words and snippets around the mentions. And you can zoom in to specific texts – again you can see the text snippets in detail. When you explore the snippets, you can see what else is there, to explore other snippets.

So in terms of the design considerations we wanted a multi-faceted interactive overview of the data – Edinburgh locations; books; extracted snippets; authors; keywords. Maps and lists are familiar and we wanted this tool to be accessible to scholars but also the public. We took an approach that allowed “generous” explorations (Mitchell Whitelaw 2015) so there are suggestions of how to explore further, parts of the data showing… Weighted tag clouds let you get a feel of the data for instance.
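To make the idea of a weighted tag cloud concrete: each keyword is typically drawn at a size that reflects how often it occurs. The sketch below is one simple way to do that, assuming plain whitespace tokenisation and a linear mapping from frequency to font size. The function name, size range and sample snippets are hypothetical and are not taken from the LitLong code.

```python
from collections import Counter


def tag_weights(snippets, min_size=10, max_size=40, stopwords=frozenset()):
    """Map each word to a font size that grows linearly with its frequency."""
    counts = Counter(
        word
        for snippet in snippets
        for word in snippet.lower().split()
        if word not in stopwords
    )
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for word, n in counts.items():
        # When every word is equally frequent, fall back to the smallest size.
        scale = (n - lo) / span if span else 0.0
        sizes[word] = min_size + scale * (max_size - min_size)
    return sizes


# Invented snippets, purely for illustration.
weights = tag_weights(
    ["castle street castle", "street castle hill"],
    stopwords=frozenset({"the"}),
)
```

A real interface would likely use better tokenisation and perhaps a logarithmic scale so that a few very frequent words do not dwarf everything else.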

As a process it wasn’t like the text mining happened then we magically had the visualisations… It was iterative. And actually we used visualisation tools to actually assess which texts were in scope, and which weren’t going to be relevant – and mark them up to keep or to rule out a text. This interface included information on where in a text the mention occurred – to help identify how much about Edinburgh a text actually was.
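One way to picture using mention positions to judge how much of a text is about Edinburgh is to ask how widely the mentions are spread through it. The sketch below is only an assumption about how such a measure could work, not the project's method: it splits a text into equal segments and reports the fraction of segments containing at least one mention. The function name and the character offsets are hypothetical.

```python
def mention_coverage(text_length, mention_offsets, segments=10):
    """Return the fraction of equal-sized segments containing a mention."""
    if text_length <= 0 or segments <= 0:
        raise ValueError("text_length and segments must be positive")
    hit = set()
    for offset in mention_offsets:
        if 0 <= offset < text_length:
            # Integer arithmetic keeps the segment index in 0..segments-1.
            hit.add(offset * segments // text_length)
    return len(hit) / segments


# A text with mentions spread throughout vs. one with mentions clustered at the start.
spread = mention_coverage(1000, [50, 250, 450, 650, 850])
clustered = mention_coverage(1000, [10, 20, 30, 40, 50])
```

Both example texts have five mentions, but the first scores higher because its mentions touch more of the text, which is the kind of signal a curator might use when deciding to keep or rule out a text.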

We had a creative visualisation process… We launched the interface in 2015, and there was some iteration and that also inspired LitLong 2.0 which is a much more public-friendly way to explore the material in different ways.

So, I think it is important to think about visualisation as a speculative process. This allows you to make early computational analysis approaches visible and to facilitate QA and curatorial processes. It can promote new interactions, transforming a print-based culture into something different – thinking about materiality rather than just content is important as we enable exploration. When I look back at my own work I see some similarities in interfaces… You can see the unique qualities of the collections in the data trends but we are doing much more work on designing interfaces that surface the unique qualities of the collection in new ways.


Q1) What did you learn about Edinburgh or literature in Edinburgh from this project?

A1) The literature scholars would be better able to talk about that but I know it has inspired new writers. It has been used in teaching. And we also discovered some characteristics of Edinburgh, and women writers in the corpus… James Loxley (Edinburgh) and Tara Thompson (Edinburgh Napier University) could say more about how this is being used in new literary research.

“Public Private Digitisation Partnerships at the British Library” – Hugh Brown, British Library Digitisation Project Manager

I work as part of the Digital Scholarship team at the British Library, which was founded in 2010 to support colleagues and researchers to make innovative use of BL digital collections and data – and recognising the gap in provision we had there. The team is led by Adam Farquhar – Head of Digital Scholarship, and by Neil Fitzgerald, Head of Digital Research Team. We are cross disciplinary experts in the areas of digitisation, librarianship, digital history and humanities, computer and data science and we look at how technology is transforming research and in turn our services. And we include the British Library Labs, Digital Curators, and the Endangered Archives Programme (EAP).

So, we help get content online and digitised, we support researchers, and we run a training programme to bridge skills so that researchers can begin to engage with digital resources. We expect that in 10-15 years time those will be core research skills so we might not exist – it will just be part of the norm. But we are a long way off that at the moment. We also currently run Hack and Yack events to experiment and discuss. And we also have a Reading Room to share what’s happening in the world, to share best practice.

In terms of our collections and partnerships, we have historically had a slightly piecemeal digitisation approach, so we now have a joined up strategy that sits under our Living Knowledge strategy and includes partnership, commercial strategy and our own collection strategy. Our partnerships recognise that we don’t always have the skills we need to make content available, whilst our commercial strategy – where I work – allows us to digitise as much as possible, and in a context where we don’t have infinite funding for digitisation.

We have various factors in mind when considering potential partnership. The types of approach include partnerships based on whether materials are in or out of copyright – if in copyright then commercial partners have to clear rights. We do public/private partnership with technology partners. We have non-commercial organisational and/or consortium funding. And we have philanthropic donor funded work. Then we think about content – content strategy, asset ownership, digitisation location. We think about value – audience type/interest/geography, and topicality. We think about copyright – British Library owns the rights, rights of reuse. We think about discoverability – the ability to identify and search, and access that maximises exposure. We look at the (BL) benefit – funding, access etc. We look at risk. And we look at contract – whether it is non-exclusive, commercial/non commercial.

So, we have had public-private digitisation partnerships with Gale Cengage Learning, Adam Matthew, findmypast, Google Books, Microsoft books, etc. And looking at examples Google books has been 80m+ images digitised; Microsoft books was 25m images; findmypast has done 23m+ images of newspapers; Gale Cengage Learning has done 18th century collections – 22m images, 19c online 2.2m+ images, and Arabic books, etc.

The process begins with liaison with key publishers. Then there is market and content research. Then we plan and agree the plan, including licensing of rights for a fixed term (5-10 years), and royalty arrangements and reading room access. Then digitisation takes place, funded by the partner – either by setting up a satellite studio, or using the BL studio. So our partners digitise content and give us that content, and in exchange they get a 5-10 year exclusive agreement to use that content on their platform. And revenue generated for BL helps support what we do, and our curators’ work around digitisation.

So Findmypast was an interesting example. We had electoral registers and India Office Records – data with real commercial value. So, we put a tender out for a partner for digitisation. Findmypast was selected… Part of that was to do with the challenges of the electoral registers, which were in inconsistent formats etc. so needed a lot of specific work. And we also needed historical country boundaries to be understood to make it work. There was also a lot of manual OCR work to do.

With Gale Cengage they tend to be education/universities focused and they work with researchers. We worked with them to select 19th century materials to fit their themes and interests. They did the Early Arabic book project – a really complex project. The private case collection consisted of mainly books that had been inaccessible on grounds of obscenity from between around 1600 and 1960.

With Adam Matthew Digital we were approached to contribute material from the electoral registers and India Office Records. And materials on the East India Company.

Now these are exciting projects but we want 20-30% of content generated in these projects to be available as a corpus for research and that’s important to our agreements.

Challenges in the workflow include ensuring business partners and scanning vendors have a good understanding of the material BL holds in our collections. We have to define and provide metadata requirements the BL needs to supply to the partners. Getting statistics and project plans from information business partners. There are logistical challenges around understanding the impact of digitisation on BL departments supporting the process. We have to manage partners’ business drivers versus BL curatorial drivers. We have to manage the partners’ digitisation vendors on site. And ensuring the final digital assets/metadata received meet BL requirements for sign off and ingest.


Q1) How can we actually access this stuff for research?

A1) For pure research that can be done. For example we have a company in Brighton who are doing research on the electoral roll. That’s not in competition with what the private partner is doing.

Comment from Melissa) My experience is “don’t ask, don’t get” – so if you see something you want to use in your research, do ask!

“The Future of BL Labs and Digital Research at the Library” – Ben O’Steen

I’ve handed out some personas for users of our digital collections – and a blank sheet on the back. We are trying to build up a picture of the needs of our users, their skills and interests, and that helps us illustrate what we do – that’s a thing to come back to (see:

So I want to talk about the future of BL Labs. We are a project and our funding is due to finish. Our role has been to engage with researchers and that is going to continue – maybe with that same brand just not as a project. We need to learn what they want to do… We need to collect evidence of demand. And we are developing a business model and support process to make this “Business as usual” at the BL. We want to help to create a pathway to developing a “Digital Research Suite” at the BL by 2019. But we want to think about what that might be, and we are piloting ideas including small 2 person workrooms for digital projects. And we can control access – so that we can see how this works, and ensure that the users understand what you can and cannot do with the data (that you can’t just download everything and walk out with it).

And many other places are being “inspired” by our model – take a look at the Library of Congress work in particular.

So, at this stage we are looking at our business model and how we can make these scalable services. Our model to date has been smaller scale, about capabilities to get started, etc. That is not scalable at the level we’ve been working. We need a more hands off process and to be able to see more people. We also run BL Labs Awards which, instead of working with people, recognises work people have already done. People submit and then in October our advisory board reviews the entries and looks for work that champions our content.

To develop our business model we are exploring, evaluating and implementing a business model. We are using business model canvas. We have internal and external business model development, implementation and evaluation groups, and exploring how this could work in practice. And we are testing, piloting and implementing our business model. That means:

  • developing support service
    • Entry level – about the collection, documentation improvements, case studies that help show what is in there.
    • Baseline – basic enquiry service to enable researchers to understand if a BL project is the right path, any legal restrictions that need addressing, etc. We try to get you to the next stage of developing your idea.
    • Intermediate – Consultation service, which will be written in as part of a bid.
    • Advanced – support 10 projects per year through an application process
  • Augment – that was a placeholder for a year, and now a tender has just gone out for a repository type service for 12-18 months
    • e.g. sample datasets, tools, examples of use
    • Pilot use of Jupyter Notebooks / Docker / other tools for Open and Onsite data
  • Researcher access to BL APIs
  • Reading room services – onsite access/compute to digital collections – which means us training staff

This has come about as we’ve seen a pattern in approaches that start with an initial exploration phase, then transition into investigation and then some sort of completion phase. There had been a false assumption (on the data providers part) that data-based work must start at the investigation phase – to have an idea of the project they want to do, to know the data already, to know the collections. What we are piloting is that essential exploratory stage, acknowledging that that happens. And that pattern shifts around – exploration and investigation stages can fork off in different directions, that’s fine.

So, timescales and themes seem to be a phase of quick initial work. A longer and variable transition takes place into investigation – probably months. Then investigation takes months to a year. And crucially that completion stage.

Exploration is about understanding the data in an open-ended fashion. It is about discovering the potential tools to work with the data. We want people to gain awareness of their capabilities and limitations – a reality check and opportunity to understand the need for partners and/or new tools. And it’s about developing a firmer query as that helps you to understand the cost, risk, time you might need. Exploration (e.g. V&A Spelunker) lets you get a sense of what’s there, which gives you a different way in to the keyword or catalogue search. And then you have artists like Mario Klingemann – collating images looking sad… It’s artistic but talks about how women are portrayed in the 19th Century. He’s also done work on hats on the ground – and found it’s always a fight! This is showing cultural memes – an important question… An older example is the Cooper Hewitt collection – which lets you see all of the tags – including various types of similarity that show new ways into the data.

So, what should a digital exploration service look like? Which apps? Does Jupyter Notebook assume too much?

We’ve found that every time we present the data, it shapes the perception. For instance the On the Road manuscript is on a roll. If you print a book on a receipt roll it’s different and reads and is understood differently.

MIT have a Moral Machine survey, which is the classic trolley issue – crowdsourced for autonomous vehicles. But that presentation shapes and limits the questions, and that is biased. Some of the best questions we’ve seen have been from people who have asked very broad questions and haven’t engaged in exploration in other ways. They are hard to answer (e.g. all depictions of women) but they reveal more. Presenting as a searchable list shapes how we interpret the result… But for instance showing newspaper articles as if in a giant newspaper – not a list of results – changes what you do. And that’s why tools like IIIF seem useful.

So… We have things like Gender API. It looks good, it looks professional… If you try it with a western name, does it work? If you try it with an Indian name, does it work? If you try it with a 19th Century name, does it work? Know that marketeers will use this. See also sentiment analysis. Some of these tools are based on Twitter. I found a researcher working on 18th Century texts for sentiment about war and conflict… Through a tool developed and trained for Tweets. We have to be transparent in what is happening, in understanding what you are doing… Hence thinking about personas.

We are trying to think about how we show what is missing from a collection, rather than what is present so that data can be used in a more informed way. We are looking at what research environments we can provide – we know that people want to use their own but we can sometimes be a bit stuffed by licensing based in a paper era. On site tools can help. Should we enable research environments for open data that can be used off site too. We are thinking about focus – are the query, tooling and collections required well defined; is it feasible – legal, cost, ethical, source data quality, etc; is it affordable – time, people, money; etc.

So, we have, on the BL Labs website, a form – it’s long so do send us feedback on whether that is the right format etc. – to help us understand demand and skills.

Those personas – please fill these in – and let us know the technical part, what you might want, how technical the support you need. We are keen to discuss your needs, challenges and issues.

And with that we are done and moving onto lunch and discussion. Thanks to Ben, Hugh, Alex and Uta as well as Melissa and the Digital Scholarship Team!


Mar 23 2018

Today I am back at the Data Fest Data Summit 2018, for the second day. I’m here with my EDINA colleagues James Reid and Adam Rusbridge and we are keen to meet people interested in working with us, so do say hello if you are here too! 

I’m liveblogging the presentations so do keep an eye here for my notes, updated throughout the event. As usual these are genuinely live notes, so please let me know if you have any questions, comments, updates, additions or corrections and I’ll update them accordingly. 

Intro to Data Summit Day 2 – Maggie Philbin

We’ve just opened with a video on Ecometrica and their Data Lab supported work on calculating water footprints. 

I’d like to start by thanking our sponsors, who make this possible. And also I wanted to ask you about your highlights from yesterday. These include Eddie Copeland from Nesta’s talk, discussion of small data, etc. 

Data Science for Societal Good — Who? What? Why? How? –  Kirk Borne, Principal Data Scientist and Executive Advisor, Booz Allen Hamilton

Data science has a huge impact for the business world, but also for societal good. I wanted to talk about the 5 i’s of data science for social good:

  1. Interest
  2. Insight
  3. Inspiration
  4. Innovation
  5. Ignition

So, number one is Interest. The data can attract people to engage with a problem. Everything we do is digital now. And all this information is useful for something. No matter what your passion, you can follow this as a data scientist. I wanted to give an example here… My background is astrophysics and I love teaching people about the world, but my day job has always been other things. About 20 years ago I was working in data science at NASA and we saw an astronomical – and I mean it, we were NASA – growth in data. And we weren’t sure what to do with it, and a colleague told me about data mining. It seemed interesting but I just wasn’t getting what the deal was. We had a lunch talk from a professor at Stanford, and she came in and filled the board with equations… She was talking about the work they were doing at IBM in New York. And then she said “and now I’m going to tell you about our summer school” – where they take inner city kids who aren’t interested in school, and teach them data science. Deafening silence from the audience… And she said “yes, we teach the staff data mining in the context of what means most for these students, what matters most”. And she explained: street basketball. So IBM was working on a software called IBM Advanced Calc specifically predicting basketball strategy. And the kids loved basketball enough that they really wanted to work in math and science… And I loved that, but what she said next changed my life.

My PhD research was on colliding galaxies. It was so exciting… I loved teaching and I was so impressed with what she had done. These kids she was working with had peer pressure not to be academic, not to study. This school had a graduation rate of less than 50%. Their mark of success for their students was their graduation rate – of 98%. I was moved by that. I felt that if this data science has this much power to change lives, that’s what I want to do for the rest of my life. So my life, and those of my peers, has been driven by passion. My career has been as much about promoting data literacy as anything else.

So, secondly, we have insight. Traditionally we collect some data points but we don’t share this data, we are not combining the signals… Insight comes from integrating all the different signals in the system. That’s another reason for applying data to societal good, to gain understanding. For example, at NASA, we looked at what could be combined to understand environmental science, and all the many applications, services and knowledge that could be delivered and drive insight from the data.

Number three on this list is Inspiration. Inspiration, passion, purpose, curiosity, these motivate people. Hackathons, when they are good, are all about that. When I was teaching, the group projects where the team members were all the same did the worst and were the least interesting. When the team is diverse in the widest sense – people who know nothing about Python, R, etc. can bring real insights. So, for example my company run the “Data Science Bowl” and we tackle topics like Ocean Health, Heart Health, Lung Cancer, drug discovery. There are prizes for the top ten teams, this year there is a huge computing prize as well as a cash prize. The winners of our Heart Health challenge were two Wall Street Quants – they knew math! Get involved!

Next, innovation. Discovering new solutions and new questions. Generating new questions is hugely exciting. Think about the art of the possible. The XYZ of Data Science Innovation is about precision data, precision for personalised medicine, etc.

And fifth, ignition. Be the spark. My career came out of looking through a telescope back when I lived in Yorkshire as a kid. My career has changed, but I’ve always been a scientist. That spark can create change, can change the world. And big data, IoT and data scientists are partners in sustainability. How can we use these approaches to address the 17 Sustainable Development Goals? And there are 229 Key Performance Indicators to measure performance – get involved. We can do this!

So, those are the five i’s. And I’d like to encapsulate this with the words of a poet…. Data scientists – and that’s you even if you don’t think you are one yet. You come out of the womb asking questions of the world. Humans do this, we are curious creatures… That’s why we have that data in the first place! We naturally do this!

“If you want to build a ship, don’t drum up people to gather wood and don’t assign them tasks and work, but rather teach them to yearn for the vast and endless sea”

– Antoine de Saint-Exupery.

This is what happened with those kids. Teach people to yearn for the vast and endless sea, then you’ll get the work done. Then we’ll do the hard work.

Slides are available here:


Comment, Maggie Philbin) I run an organisation, Teen Tech, and that point that you are making, of starting where the passion actually is, is so important.

KB) People ask me about starting in data science, and I tell them that you need to think about your life, what you are passionate about and what will fuel and drive you for the rest of your life. And that is the most important thing.

Q1) You touched on a number of projects, which is most exciting?

A1) That’s really hard, but I think the Data Bowl is the most exciting thing. A few years back we had a challenge looking at how fast you can measure “heart ejection fraction – how fast the heart pumps blood out” but the way that is done, by specialists, could take weeks. Now that analysis is built into the MRI process and you can instantly re-scan if needed. Now I’m an astronomer but I get invited to weird places… And I was speaking to a conference of cardiac specialists. A few weeks before my doctor diagnosed me with a heart issue…. And that it would take a month to know for sure. I only got a text giving me the all clear just before I was about to give that talk. I just leapt onto that stage to give that presentation.

The Art Of The Practical: Making AI Real – Iain Brown, Lead Data Scientist, SAS

I want to talk about AI and how it can actually be useful – because it’s not the answer to everything. I work at SAS, and I’m also a lecturer at Southampton University, and in both roles look at how we can use machine learning, deep learning, AI in practical useful ways.

We have the potential for using AI tools for good, to improve our lives – many of us will have an Alexa for instance – but we have to feel comfortable sharing our data. We have smart machines. We have AI revolutionising how we interact with society. We have a new landscape which isn’t about one new system, but a whole network of systems to solve problems. Data is a sellable asset – there is a massive competitive advantage in storing data about customers. But especially with GDPR, how is our data going to be shared with organisations, and others? That matters for individuals, but also for organisations. As data scientists there is the “can” – how can the data be used; and the “should” – how should the data be used. We need to understand the reasons and value of using data, and how we might do that.

I’m going to talk about some examples here, but I wanted to give an overview too. We’ve had neural networks for some time – AI isn’t new but dates back to the 1950s. Machine learning came in in the 1980s, deep learning in the 2010s, and cognitive computing now. We’ve also had Moore’s Law changing what is theoretically possible but also what is practically feasible over that time. And that brings us to a definition: “Artificial Intelligence is the science of training systems to emulate human tasks through learning and automation”. That’s my definition, you may have your own. But it’s about generating understanding from data, that’s how AI makes a difference. And they have to help the decision making process. That has to be something we can utilise.

Automation of process through AI is about listening and sensing, about understanding – that can be machine generated but it will have human involvement – and that leads to an action being made. For instance we are all familiar with taking a picture, and that can be looked at and understood. For instance with a bank you might take an image of paperwork and passports… Some large banks check validity of clients with a big book of pictures of blacklisted people… Wouldn’t it be better to use systems to achieve that. Or it could be a loan application or contract – they use application scorecards. The issue here is interpretability – if we make decisions we need to know why and the process has to be transparent so the client understands why they might have been rejected. You also see this in retail… Everything is about the segment of one. We all want to be treated as individuals… How does that work when you are one of millions of individuals. What is the next thing you want? What is the next thing you want to click on? Shop Directory, for instance, have huge ranges of products on their website. They have probably 500 pairs of jeans… Wouldn’t it be better to apply their knowledge of me to filter and tailor what I see? Another example is the customer complaint on webchat. You want to understand what has gone wrong. And you want to intervene – you may even want to do that before they complain at all. And then you can offer an apology.
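To illustrate why application scorecards are often seen as interpretable, here is a minimal sketch of a points-based scorecard that also reports which attributes cost the applicant the most points. Everything in it is an invented assumption for illustration: the attribute names, point bands and cutoff are hypothetical and do not come from the talk or from any real lender.

```python
# Hypothetical scorecard: each attribute maps to (low, high, points) bands.
# Values are assumed to be non-negative numbers.
SCORECARD = {
    "years_at_address": [(0, 2, 10), (2, 5, 20), (5, float("inf"), 30)],
    "missed_payments": [(0, 1, 40), (1, 3, 15), (3, float("inf"), 0)],
}
CUTOFF = 50  # hypothetical approval threshold


def score_application(applicant):
    """Return (total, approved, reasons), with the costliest attributes first."""
    points = {}
    for attribute, bands in SCORECARD.items():
        value = applicant[attribute]
        for low, high, pts in bands:
            if low <= value < high:
                points[attribute] = pts
                break
    total = sum(points.values())
    approved = total >= CUTOFF
    # Points lost relative to the best band give a simple, explainable reason list.
    lost = {a: max(p for _, _, p in SCORECARD[a]) - points[a] for a in points}
    ranked = sorted(lost.items(), key=lambda item: -item[1])
    reasons = [attribute for attribute, gap in ranked if gap > 0]
    return total, approved, reasons


total, approved, reasons = score_application(
    {"years_at_address": 1, "missed_payments": 4}
)
```

Because every point comes from a named band, a rejected applicant can be told which factors mattered most, which is the transparency the talk asks for. A learned model would need a separate explanation step to offer the same.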

There are lots of applications for AI across the board. So we are supporting our customers on the factors that will make them successful in AI, data, compute, skillset. And we embed AI in our own solutions, making them more effective and enhancing user experience. Doing that allows you to begin to predict what else might be looked at, based on what you are already seeing. We also provide our customers with extensible capabilities to help them meet their own AI goals. You’ll be aware of Alpha Go, it only works for one game, and that’s a key thing… AI has to be tailored to specific problems and questions.

For instance we are working on a system looking at optimising the experience of watching sports, eliminating the manual process of tagging in a game. This isn’t just in sport, we are also working in medicine and in lung cancer, applying AI in similar 3D imaging ways. When these images can be shared across organisations, you can start to drive insights and anomalies. It’s about collaborating, bringing data from different areas, places where an issue may exist. And that has social benefit for all of us. Another fun example – with something like wargaming you can understand the gamer, the improvements in gameplay, ways to improve the mechanics of how game play actually works. It has to be an intrinsic and extrinsic agreement to use that data to make that improvement.

If you look at a car insurer and the process and stream of that, that’s typically through a call centre. But what if you take a picture of the car as a way to quickly assess whether that claim will be worth making, and how best to handle that claim.

I value the application, the ways to bring AI into real life. How we make our experiences better. It’s been attributed to Voltaire, and also to Spiderman, that “with great power comes great responsibility”. I’d say “with great data power comes great responsibility” and that we should focus on the “should” not the “could”.


Comment) A correction on Alpha Go: Alpha Zero plays Chess etc. It’s without any further human interaction or change.

Q1) There is this massive opportunity for collaboration in Scotland. What would SAS like to see happen, and how would you like to see people working together?

A1) I think collaboration through industry, alongside academia. Kirk made some great points about not focusing on the same perspectives but on the real needs and interest. Work can be siloed but we do need to collaborate. Hack events are great for that, and that’s where the true innovation can come from.

Q2) What about this conference in 5 years time?

A2) That’s a huge question. All sorts of things may happen, but that’s the excitement of data science.

Socially Minded Data Science And The Importance Of Public Benefits – Mhairi Aitken, Research Fellow, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh

I have been working in data science and public engagement around data and data science for about eight years and things have changed enormously in that time. People used to think about data as something very far from their everyday lives. But things have really changed, and people are aware and interested in data in their lives. And now when I hold public events around data, people are keen to come and they mention data before I do. They think about the data on their phones, the data they share, supermarket loyalty cards. These may sound trivial but I think they are really important. In my work I see how these changes are making real differences, and differences in expectations of data use – that it should be used ethically and appropriately but also that it will be used.

Public engagement with data and data science has always been important but it’s now much easier to do. And there is much more interest from funders for public engagement. That is partly reflecting the press coverage and public response to previous data projects, particularly NHS data work with the private sector. Public engagement helps address concerns and avoid negative coverage, and to understand their preferences. But we can be even more positive with our public engagement, using it to properly understand how people feel about their data and how it is used.

In 2016 myself and colleagues undertook a systematic review of public responses to sharing and linking of health data for research purposes (Aitken, M et al 2016 in BMC medical ethics, 17 (1)). That work found that people need to understand how data will be used, they particularly need to understand that there will be public benefit from their data. In addition to safeguards, secure handling, and a sense of control, they still have to be confident that their data will be used for public benefits. They are even supportive if the benefit is clear but those other factors are faulty. Trust is core to this. It is fundamental to think about how we earn public trust, and what trust in data science means.

Public trust is easy to define. But what about “public benefit”? Often when people talk about data and benefits from data, they will talk about things like Tesco Clubcard – there is a direct tangible benefit there in the form of vouchers. But what is the public benefit in a broader and less direct sense? When we ask about public benefit in the data science community we often talk about economic benefits to society through creating new data-driven innovation. But that’s not what the public think about. For the public it can be things like improvements to public services. In data-intensive health research there is an expectation of data leading to new cures or treatments. Or that there might be feedback to individuals about their own conditions or lifestyles. But there may be undefined or unpredictable potential benefits to the public – it’s important not to define the benefits too narrowly, but still to recognise that there will be some.

But who is the “public” that should benefit from data science? Is that everyone? Is it local? National? Global? It may be as many as possible but what is possible and practical? Everyone whose data is used? That may not be possible. Perhaps vulnerable or disadvantaged groups? Is it a small benefit for many, or a large benefit for a small group? Those who may benefit most? Those who may benefit the least? The answers will be different for different data science projects. That will vary for different members of the public. But if we only have these conversations within the data science community we’ll only see certain answers, we won’t hear from groups without a voice. We need to engage the public more with our data science projects.

So, closing thoughts… We need to maintain a social license for data science practices and that means continual reflection on the conditions for public support. Trust is fundamental – we don’t need to make the public trust us, we have to actually be trustworthy and that means listening, understanding and responding to concerns, and being trustworthy in our use of data. Key to this is finding public benefits of data science projects. In particular we need to think about who benefits from data science and how benefits can be maximised across society. Data scientists are good at answering questions of what can be done but we need to be focusing on what should be done and what is beneficial to do.


Q1) How does private industry make sure we don’t leave people behind?

A1) Be really proactive about engaging people, rather than waiting for an issue to occur. Finding ways to get people interested. Making it clear what the benefits are to people’s lives. There can be cautiousness about opening up debate being a way to open up risk. But actually we have to have those conversations and open up the debate, and learn from that.

Q2) How do we put in enough safeguards that people understand what they consent to, without giving them too much information or scaring them off with 70 checkboxes?

A2) It is a really interesting question of consent. Public engagement can help us understand that, and guide us around how people want to consent, and what they want to know. We are trying to answer questions where we don’t always have the answers – we have to understand what people need by asking them and engaging them.

Q3) Many in the data community are keen to crack on but feel inhibited. How do we take the work you are doing and move sooner rather than later?

A3) It is about how we design data science projects. You do need to take the time first to engage with the public. It’s very practical and valuable to do at the beginning, rather than waiting until we are further down the line…

Q3) I would agree with that… We need to do that sooner rather than later rather than being delayed deciding what to do.

Q4) You talked about concerns and preferences – what are key concerns?

A4) Things you would expect on confidentiality, privacy, how they are informed. But also what is the outcome of the project – is it beneficial or could it be discriminatory, or have a negative impact on society? It comes back to causing public benefits – they want to see outcomes and impact of a piece of work.


Automated Machine Learning Using H2O’s Driverless AI – Marios Michailidis, Research Data Scientist,

I wanted to start with some of my own background. And I wanted to talk a bit about Kaggle. It is the world’s biggest predictive modelling competition platform with more than a million members. Companies host data challenges and competitors from across the world compete to solve them for prizes. Prizes can be monetary, or participation in conferences, or you might be hired by companies. And it’s a bit like tennis – you gain points and go up in the ranking. And I was able to be ranked #1 out of a half million members there.

So, a typical problem is image classification. Can I tell a cat from a dog from an image? That’s very doable, you can get over 95% accuracy and you can do that with deep learning and neural nets. And you differentiate and classify features to enable that decision. Similarly a typical problem may be classifying different bird song from a sound recording – also very solvable. You also see a lot of text classification problems… And you can identify texts by particular writers from their style and vocabulary (e.g. Voltaire vs Moliere). And you see sentiment analysis problems – particularly for marketing or social media use.

To win these competitions you need to understand the problem, and the metric you are being tested on. For instance there was an insurance problem where most customers were renewing, so there was more value in splitting the problem into two – one model for renewals, and then a model for others. You have to have a solid testing procedure – a really strong validation environment that reflects what you are being tested on. So if you are being tested on predictions for 3 months in the future, you need to test with past data, or test that the prediction is working to have the confidence that what you do will be appropriately generalisable.
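As a rough illustration of that validation point (my own sketch, not something shown in the talk): if the test is about the future, the validation split should be by time rather than random. The function name `time_based_split` and the toy monthly data are assumptions made up for this example.

```python
from datetime import date

def time_based_split(rows, cutoff):
    """Split (day, value) rows so that validation data is strictly later than training data."""
    train = [row for row in rows if row[0] < cutoff]
    valid = [row for row in rows if row[0] >= cutoff]
    return train, valid

# Hypothetical toy data: one observation per month of 2017.
rows = [(date(2017, month, 1), month * 10) for month in range(1, 13)]

# Hold out the last 3 months, mirroring a "predict 3 months ahead" test.
train, valid = time_based_split(rows, date(2017, 10, 1))
```

The idea is simply that a model is only ever scored on dates it has not seen, which is closer to how the competition (or a real deployment) will judge it.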

You need to handle the data well. Your preprocessing, your feature engineering, which will let you get the most out of your modelling. You also need to know the problem-specific elements and algorithms. You need to know what works well. But you can look back for information to inform that. You of course need access to the right tools – the updated and latest software for best accuracy. You have to think about the hours you put in and how you optimize them. When I was #1 I was working 60 hours on top of my day job!

Collaborate – data science is a team sport! It’s not just about splitting the work across specialisms, it’s about uncovering new insights by sharing different approaches. You gain experience over time, and that lets you focus your efforts where they will bring the best gain. And then use ensembling – combine the methods optimally for the best performance. And you can automate that…
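To make the ensembling idea concrete, here is a minimal sketch (mine, not from the talk) that blends two models’ predictions with a weight chosen on validation data. The two “models” are hypothetical lists of numbers, and a simple grid search over the weight stands in for whatever optimisation a real pipeline would use.

```python
def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

def blend(preds_a, preds_b, weight):
    """Weighted average of two prediction lists; weight applies to model A."""
    return [weight * a + (1 - weight) * b for a, b in zip(preds_a, preds_b)]

def best_blend_weight(preds_a, preds_b, targets, steps=10):
    """Try weights 0.0, 0.1, ..., 1.0 and keep the one with the lowest validation error."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: mse(blend(preds_a, preds_b, w), targets))

# Hypothetical validation data: one model overshoots, the other undershoots.
targets = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 2.5, 3.5, 4.5]
model_b = [0.5, 1.5, 2.5, 3.5]

weight = best_blend_weight(model_a, model_b, targets)
blended = blend(model_a, model_b, weight)
```

In this toy case the errors of the two models cancel, so the blend beats either model on its own, which is the basic reason ensembling tends to help.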

And that brings us to H2O’s Driverless AI which automates AI. It’s an AI that creates AI. It is built by a group of leading machine learning engineers, academics, data scientists, and Kaggle Grandmasters. It handles data cleaning and feature engineering. It uses cutting edge machine learning algorithms. And it optimises and combines them. And this is all through a hypothesis testing driven approach. And that is so important as if I try a new feature or a new algorithm, I need to test it… And you can exhaustively find the best transformations and algorithms for your data. This allows solving of many machine learning tasks, and it is all in parallel to make it very fast.

So, how does it work? Well you have some input data and you have a target variable. You set an objective or success metric. And then you need some allocated computing power (CPU or GPU). Then you press a button and H2O Driverless AI will explore the data, it will try things out, it will provide some predictions and model interpretability. You get a lot of insight including most predictive insights. And the other thing is that you can do feature engineering, you can extract this pipeline, these feature transformations, then use them with your own modelling.

Now, I have a minute long demo here…. where you upload data, and various features and algorithms are being tried, and you can see the most important features… Then you can export the scoring pipeline etc.

This work has been awarded Technology of the Year by InfoWorld, it has been featured in the Gartner report.

You can find out more on our website, and there is lots of transparency about how this works, how the model performs etc. You can download a free trial for 3 weeks.


Q1) Do you provide information on the machine learning models as well?

A1) Once we finish with the score, we build a second, simple model to predict that score. The focus of that is to explain why we have shown this score. And you can see why you have this score with this model… That second interpretability model is slightly less automated. But I encourage others to look online for similar – this is one surrogate model.
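A small sketch of the surrogate model idea as I understood it (my own illustration, not the Driverless AI implementation): fit a simple, readable model to the scores produced by a complex one, then read the explanation off the simple model. The `black_box` function is a made-up stand-in for the complex model, and a one-feature least-squares line is the simplest possible surrogate.

```python
def black_box(x):
    """Hypothetical stand-in for a complex model's score for feature value x."""
    return 3.0 * x + 1.0 + (0.5 if x > 5 else 0.0)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Score some inputs with the complex model, then fit the surrogate to those scores.
xs = [float(i) for i in range(11)]
scores = [black_box(x) for x in xs]
slope, intercept = fit_line(xs, scores)
```

The surrogate’s slope gives an approximate answer to “how much does the score move per unit of this feature”, at the cost of smoothing over details such as the step in the stand-in model.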

Q2) Can I reproduce the results from H2O?

A2) Yes. You can download the scoring pipeline, it will generate the code and environment to replicate this, see all the models, the data generated, and you can run that script locally yourself – it’s mainly Python.

Q3) That stuff is insane – probably very dangerous in the hands of someone just learning about machine learning! I’d be tempted to throw data in… What’s the feedback that helps you learn?

A3) There is a lot of feedback and also a lot of warning – so if test data doesn’t look enough like training data for instance. But the software itself is not educational on its own – you’d need to see webinars, look at online materials but then you should be in a good position to learn what it is doing and how.

Q4) You talked about feature selection and feature engineering. How robust is that?

A4) It is all based on hypothesis testing. But you can’t test everything without huge compute power. So we have a genetic algorithm that generates combinations of features, tests them, and then tries something else if that isn’t working.
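For readers unfamiliar with the approach, here is a toy genetic algorithm for choosing feature subsets (my own sketch under stated assumptions, not H2O’s code). The `fitness` function is a hypothetical stand-in for a validation score, and `USEFUL` is an invented list of the features that genuinely help.

```python
import random

N_FEATURES = 8
USEFUL = {0, 2, 5}  # hypothetical: indices of the features that truly help

def fitness(mask):
    """Stand-in for a validation score: reward useful features, penalise the rest."""
    return sum(1.0 if i in USEFUL else -0.5 for i, bit in enumerate(mask) if bit)

def evolve(generations=30, pop_size=20, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Each individual is a 0/1 mask saying which features are switched on.
    population = [[rng.randint(0, 1) for _ in range(N_FEATURES)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]      # keep the better half as parents
        children = [population[0][:]]              # elitism: best mask survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, N_FEATURES - 1)   # single-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]             # occasional bit-flip mutation
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve()
```

Because the best mask is always carried forward, the best score never gets worse from one generation to the next; in a real system the expensive part is the fitness evaluation, which is where the compute goes.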

Q5) Can you output a model as e.g. a deserialised JSON object? Or use as an API?

A5) We have various outputs but not JSON. Best to look on the website as we have various ways to do these things.


Innovation Showcase

This next session showcases innovation in startups. 

Matt Jewell, R&D Engineer, Amiqus

I’m an R&D Engineer at Amiqus, and also a PhD student in Law at Edinburgh University. Firstly I want to talk about Amiqus, and our mission is to make civil justice accessible to the world. And we are engaged in GDPR as a data controller, but also as a trust and identity provider – where GDPR is an opportunity for us. We created amiqusID to enable people to more easily interact with the law – with data from Companies House, driving licences, etc.

As a PhD student in law there is some overlap in my job and my PhD research, and I was asked about data ethics. So I wanted to note GDPR Article 22 (3) which states that

“the data controller shall implement suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests, at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision.”

And that’s across the board. GDPR recommits us to privacy, but also embeds privacy as a public good. And we have to think about what that means in our own best practices, because our own practices will shape what happens – especially as GDPR is still quite uncertain, still untested in law.

Carlos Labra, CEO & Co-Founder, Particle Analytics

I come from a mechanical engineering background, so this work is about simulation. And specifically we look at fluids simulation in aircraft. Actually particle simulation is the next step in industry, and that’s because it has been incredibly difficult to do this simulation with computers. We can do basic computer models for large scale materials but not appropriate for particles. So in Particle Analytics we are trying to address this challenge.

So, a single simulation for a silo, and my model for a silo, has to calculate the interactions between every single particle (in the order of millions), in very small time intervals. That takes huge computing power. So for instance one of our clients, Astec, works on asphalt dryer/mixer technology and we are using particle analytics to enable them to establish and achieve new energy-based KPIs (Key Performance Indicators) that could make enormous savings per machine per year, purely by optimising to different analytics.
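To give a feel for why this is so expensive, here is a back-of-the-envelope sketch (my own arithmetic, not figures from the talk). It assumes a naive approach where every pair of particles is checked at every time step; the 1,000 time steps is a hypothetical number, and practical solvers typically restrict checks to nearby particles to avoid this cost.

```python
def pair_count(n_particles):
    """Number of distinct particle pairs: n * (n - 1) / 2."""
    return n_particles * (n_particles - 1) // 2

def naive_pair_checks(n_particles, n_steps):
    """Total pair checks if every pair is examined at every time step."""
    return pair_count(n_particles) * n_steps

# One million particles over a hypothetical 1,000 time steps.
checks = naive_pair_checks(1_000_000, 1_000)
```

Even at that modest number of steps the naive count is in the hundreds of trillions, which is why the data reduction and filtering described next matter.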

So we look at spatial/temporal filters, multiscale analysis, and reduce data size/noise. The data operators generate new insights and KPIs. So the cost of simulation is going down, and the insights are increased.

Steven Revill, CEO & Co-Founder, Urbantide

I’m here to talk to you about our platform USmart which is making smart data. How do we do this? Well, when we started a few years ago we recognised that our businesses, organisations, and places, would be helped by artificial intelligence based on data. That requires increased collaboration around data and increasing reuse of data. Too often data is in silos, and we need to break it out and share it. But we also need to be looking at real time data from IoT devices.

So, our solution is USmart. It collects data from any source in real time, and we create value with automatic data pipelines with analytics, visualisation and AI ready. And that enables collaboration – either with partners in a closed way, or as open data.

So, I want to talk about some case studies. Firstly Smartline, which is taking housing data to identify people at risk of, or in, fuel poverty. We have 80m data points so far, and we expect to reach up to 700m+ soon. This data set is open and when it goes live we think it will be the biggest open data set in the UK.

Cycling Scotland is showing the true state of cycling, helping them to make their case for funding and gain insight.

And we are working with North Lanarkshire Council on business rates, which could lead to savings of £18k per annum, but can also identify incorrect rates of £100k+ value.

If you want to find out more do come and talk to me, take a look at USmart, and join the USmart community.

Martina Pugliese, Data Science Lead, Mallzee

I am data science lead for Mallzee – proudly established and run from Edinburgh. Mallzee is an app for clothes, allowing you to like or dislike a product. We show you 150+ brands. We’ve had 1.4m downloads, 500m ratings on products, 3m products rated. The app allows you to explore products, but it also acts as a data collection method for us and for our B2B offering to retailers. So we allow you to product test, very swiftly, your products before they hit the market.

Why do this? Well there are challenges that are two sides of the same coin: Overstock where you have to discount and waste money; and Understock where you have too little of the best stock and that means you don’t have time to make the best return on your products.

As well as gathering data, we also monitor the market for trends in pricing, discounting, something new happening… So for instance only 50.8% of new products last quarter were sold at full price. We work to help design, buying and merchandising teams improve this rate by 6-10% through customer feedback.

So, data is our backbone. For the consumer we enable discovery, we personalise the tool to you – it should save you time and money. At the same time the data also enables performance prediction. We have granular user segmentation. And it goes back to you – the best products go on the market. And long term that should have a positive environmental impact in reducing waste.

Maggie Philbin: Thank you. I’m going to ask you to feed back on each other’s ideas and work.

Carlos: I’m new to the data science world, so for me I need to learn more – and these presentations are so useful for that.

Martina: This is really useful for me, and great to see lots of different things going on.

Matt: My work focuses on smart cities, so naturally interested in Steven’s presentation. Less keen on problematising the city.

Steven: Really interesting to discuss things backstage, but also exciting to hear Martina talking about how central data is for your business right now.

Maggie: And that is part of the wonderful things about being at Data Fest, that opportunity to learn from and hear from each other, to network and share.

We are back from lunch with a video on work in the Highlands and Islands using ambient technologies to predict likelihood of falls etc. 

Transforming Sectors With Data-Enabled Innovation – Orsola De Marco, Head of Startups, Open Data Institute

I’m going to talk about transforming sectors with data. The ODI, founded by Tim Berners-Lee and Nigel Shadbolt, focuses on data and what data enables. We think about data as infrastructure. If you think of data as roads you see that the number of roads does not matter as much as how they are connected… In the context of data we need data that can be combined, that is structured for connection and combination. And we look at data through open data and open innovation. What the ODI’s work has in common is that open innovation is at the core. This is not just about innovating, but also about making your organisation more porous, bringing in the outside. And I love the phrase “if you are the smartest person in the room, then you are in the wrong room”, because so often innovation comes from collaboration and from the outside.

Open innovation has huge potential value. McKinsey in 2013 predicted $3-5 trillion impact of open data; Lateral Economics (2014) puts that at more like $20 tn.

When we talk about open innovation and collaboration, we can talk about the corporate-startup marriage. We used to see linear solutions having good returns, but that is no longer the case. Problems are now much more complex, and startups are great at innovation, at thinking laterally, at finding new approaches. But corporates have scale, they have reach, and they have knowledge of their industries and markets. If you bring these two together, it’s clear you can bring a good opportunity to life.

An example I wanted to share here is Transport for London, who wanted to release open data to enable startups and SMEs to use it. CityMapper is one of the best known of these tools built on the data. Last year, after several years of open data, they commissioned a Deloitte report (2017) which found that this release had generated huge savings for TfL.

Another example is Arup. Historically their innovation had been taking place in house. They embraced a more open approach, and worked with two of our start ups Macedon C and Smart Sensors. Macedon C helped Arup explore airport data so that Arup didn’t need to do that processing. Smart Sensors installed 200 IoT sensors, sharing approaches to those sensors, what it means to implement IoT in buildings, how they could use this technology. And they rolled them out to some of their services.

Those are some examples. We’ve worked with 120 startups across the world. And they have generated over £37.2M in sales and investment. These are real businesses bringing real value – not just a guy in a shed. The major challenge is on the supply side of the data. A lot of companies are reluctant to share, mentioning three blockers: (1) it feels very risky to open data up – that issue feels highly relevant this week; (2) it’s expensive to do especially if you don’t know the value coming back; (3) perceived lack of data literacy and skills. Those are all important… But if you lead and innovate, you get to set the tone for innovation in your sector.

The idea of disruption is raised a lot, but it is real. To actually disrupt, though, you really do need a culture of open innovation; it is essential in order to lead. It needs to be brought in at senior level and brought into the sector.

Data infrastructure can transform sectors. And joining forces between data suppliers and users is important there. For instance we are working on a project called Open Active, with Sport England. A lack of information on what was going on in different areas was an issue for people getting active. We were involved at the outset and could see that data was the blocker here… If you tried to aggregate information it was impossible. So, in the first year of the programme we brought providers into the room, agreed an open standard, and that enabled aggregation of data. We are now in the second phase and, now that the data is consistent and available, we are bringing start ups in to engage and do things with that data. And those start ups aren’t all in sports, some are in the healthcare sector – using sports data to augment information shared by medics. And some are leisure companies helping individuals to find things to do with their spare time.

Another example is the Open Banking sector. Over 60% of UK banking customers haven’t changed their bank account in 5 years. And many of those haven’t changed them in 20 years. So this initiative enables customers to grant secure access to their banking details for e.g. mortgage lenders, or to enable marketplaces to offer energy switching companies. Our role in this programme was to facilitate these banks, and we took that experience of data portability… And now we are working with Mexico on a FinTech law that requires all banks to have an open API.

In order to innovate in sectors it’s important to widen access to data. This doesn’t mean not taking data privacy seriously, or losing competitive advantage.

And I wanted to highlight a very local programme. Last year we began a project in the peer to peer accommodation market. The Scottish expert advisory panel noted that whilst a lot of data is generated, no real work is looking at the impact of the sharing economy in accommodation. That understanding will enable policy decisions tied to real concerns. We will be making recommendations on this very soon. If you are interested, do get in touch and be part of this.


Q1) You talked a lot about the value of data. How do you measure economic value like that?

A1) We base value on sales and investment generated, and/or time or money saved in processes. It’s not an exact science but it looks for changes to the status quo.

Q2) What is the most important and valuable thing from your experience here?

A2) I think I’ll approach that answer in two ways. We do innovative work with data but we often facilitate conversations between data providers and start ups. For those making data available we remove those blockers; for start ups it’s helping to facilitate those conversations, helping them grow and develop, and tailoring that support.

Q3) What next?

A3) Our model is a sector transformation model. We talk to a sector about sharing and opening up, and then we have start ups in an accelerator so that data will find a use. That’s a huge difference from just publishing the data and wondering what will happen to it.

Designing Things with Spending Power – Chris Speed, Chair of Design Informatics, University of Edinburgh

I have a fantastic team of designers and developers, and brilliant students who ask questions, including what things will be like in Tomorrow’s World! We look at all kinds of factors here around data. So I want to credit that team.

Many of you in the room will be aware that data is about value constellations, rather than value chains. These are complex markets, many players – which may be humans but also which may be bots. That changes our capacity to construct value, since we have agents that construct value. And so I will talk about four objects to look at the disruption that can be made, and what that might mean, especially as they gain agency, to gain power. One of the things we thought was, what happens when we give things spending power.

See diagram from Rand organisation comparing centralised with decentralised and distributed – we see this model again and again… But things drift back occasionally (there’s only one internet banking platform now, right?). I’m going to show this 2014 bitcoin blockchain transaction video – they move too fast to screengrab these days! So… what happens when we have distributed machines with spending power? And when transactions go down to absolutely tiny transactions and amounts of money?

So, we run BlockExchange workshops, with Lego, to work on the idea of blockchain, what it means to be a distributed transaction system.

Next we have the fun stuff… What happens when we have things like Ethereum… And smart contracts. What could you do with digital wallets? If the UN gives someone a digital passport, do they need sovereignty? So, we undertake bodily experiments with this stuff. We ran a physical experiment – body storming – with bitcoin wallets and smart contracts… A bit like Pokemon Go but with cash – if you hit a hotspot the smart contract assigns you money, or when you enter a sink, you lose bitcoin. So, here is video of our GeoCoin app and also an experiment running in Tel Aviv.

These three banking volunteers decide to design a new type of cinema experience… They enter the cinema by watching two trailers that are pickupable in the street… Another colleague decides not to do this… They gain credit by tweeting about trailers… bodystorming allows new ideas to be developed (confusingly, there is no cinema… This is, er, a cinema of the mind – right Chris?).

Next we have a machine with a bitcoin wallet. Programmable money allows us to give machines buying power… Blockchain changes the history to things, adding value to value… So, we set up a coffee machine Bitbarista, with an interface that asks the coffee drinker to make decisions about what kind of coffee they want, what values matter… Mediating the space between values and value.

We have hairdryers – these are new and have just gone to the Policy Unit this week. We have Gigbliss Plus hairdryer… That allows you to buy and trade energy and to dry your hair when energy is cheaper… What happens when you do involve the public in balancing energy? And we have another hairdryer… That asks whether you want unethical energy now, or whether you want to wait for an ethical source – the hairdryer switches on accordingly. And then we have Gigbliss Auto, which has no buttons. You don’t have control, only the bitcoin wallet has decision powers… You don’t know when it comes on… But it will. But it changes control. Of those three hairdryers, which are we happy to move to… Where do we feel happy here.

And then we have KASH cups, with chips in them. You can only buy coffee when you put two cups down. So you get credit, through the cups’ digital wallet, to encourage networking and development. You don’t have to get coffee – you can build up credit. We had free coffee in the other room… But we had a very fancy barista for the KASH cups, and people queued for this for 20 minutes – coffee with social value.

Questions for us… We give machines agency, and credit… What does that mean for value and how we balance value.

Maggie: It’s at this point I wish Tomorrow’s World still existed!


Q1) Where is this fascinating work taking you?

A1) I think this week has been so disruptive in terms of data and technology’s disruption of social, civic, political values. I think understanding that we can’t balance value, or fair trade, etc. on our own is helpful and I’m really excited by what bots can offer here…

Q2) I was fascinated by the hairdryers… I’ve been in the National Grid’s secret control room and seeing that, that thing of EastEnders finishes and we make a cup of tea means bringing a whole power station on board… But waiting 10 minutes might avoid that need. It’s not trivial, it’s huge.

A2) Yes, and I think understanding how that waiting, or understanding consequences of actions would have a real impact. The British public are pretty conscious and ethical I think, when they have that understanding…

Q3) Have you thought about avoiding queues with blockchain?

A3) We don’t want to just play incentives to get people out of queues. People are there for different reasons, different values, some people enjoy the sociability of a queue… Any chance to open it up, smash it up, and offer the opportunity to co-construct is great. But we need to do that with people not just algorithms.

Maggie: At this point I should be introducing Cathy O’Neil, but she has been snowed in by 15 inches of snow on the East Coast of the US. So, she will come over at a later date and you’ll all be invited. So, in place of that we have a panel on the elephant in the room, the Facebook and Cambridge Analytica scandal, with a panel on data and ethics.

Panel session: The Elephant in the Room: What Next? – Jonathan Forbes (JF), CTO, Merkle Aquila (chair); Brian Hills (BH), Head of Data, The Data Lab; Mark Logan (ML), Former COO Skyscanner, Investor and Advisor to startups and scale ups; Mhairi Aitken (MA), Research Fellow, University of Edinburgh. 

JF: So, thinking of that elephant in the room.. That election issue… That data use. I want to know what Facebook could have done better?

ML: It has taken them a long time to respond, which seems strange… But I see it as a positive really. They see this as a much bigger issue rather than the transactional elements here. In that room you look at risk and you look at outrage. I think Facebook were trying to figure out why outrage was so high, I think that’s what has surprised them. I think they took time to think about what was happening to them. I don’t think it’s just about electing a game show host to president… The outrage is different. Cambridge Analytica is a bad actor, not just on data but on their advocacy for other problematic tactics. Facebook shouldn’t be bundled into that. I think one aspect here is that you have a monopoly. Facebook is an advertising company – they need to generate data and pass it onto app developers. Those two things don’t totally align. And I think the outrage is about trust and expectation of users.

JF: You are closest to the public in your research. The share price is dropping significantly right now… How, based on past experience, do you see this playing out.

MA: I’m used to talking to people about public sector use of data. Often people talk about Facebook data and make two points: firstly that they contribute their own data and control that and know how it’s used; but they also have very high expectations of use for public sector organisations and don’t have that for private sector organisations – they think someone will generate ads and profit but when used in politics that’s very different, and that changes expectations.

JF: I enjoyed your comment about the social license… and I think this may be a sign that the license is being withdrawn. The GDPR legislation certainly changes some things there. I was interested to see Tim Berners-Lee’s response, taking Mark Zuckerberg’s perspective… I was wondering, Brian, about the commercial pressures and the public pressures here. Are they balancing that well?

BH: No. When we look back I think this will be a pivotal moment. I kind of feel like the GDPR piece is like being in a medieval torture chamber… We have a countdown but the public don’t know much about it. With Facebook it’s like we have a firework in the sky and people are asking what on earth is going on… And we have an opportunity to have a discussion about the use of data. As we leave today we have a challenge around communicating our work with data, what are our responsibilities here. The big data thing, many business cases seem like we’ve failed – we’ve focused on the technology and only that. And I feel we now have an opportunity and a window here.

JF: I’d like to take the temperature of the room… How many of you had Facebook on your phone, and don’t this week? None.

ML: I think that’s the point. The idea of not doing to others’ data what you wouldn’t want done to your own… But the reality is that legislation is playing catch up to practice. Commercially it’s hard to do the right thing. I think Mark Zuckerberg has reasonably good intentions here… But we have this monopoly… The parallel here is banking. And monopoly legislation hasn’t kept pace with the monopolies we have. I think it would be great if you could export your data, friends’ data, etc. to another platform. But we can’t.

Comment: I think you asked the wrong question… Who here doesn’t have Facebook on their phone at all? Actually quite a lot. I think actually we have that sense that power corrupts and absolute power corrupts absolutely. And I don’t feel I’m missing out, I’m sure others feel that too. And I’m unsurprised about Facebook, I could see where it was going.

JF: OK, so moving towards what we can do, should we have a code of conduct, a Hippocratic oath for data, a “do no harm”?

BH: I don’t see ethics featuring in data models. I think we have to build that in. Cathy O’Neil talks about Weapons of Math Destruction… We have to educate our data science students how to use these tools ethically, to think about who they will work with. Cathy was a Quant and didn’t like that so she walked away. We have to educate our students about the choices they make. We talk about optimisation, optimisation of marketing. In optimising STEM stuff… And we are missing stuff… I think we need to move towards STEAM, where A is for Arts. We have to be inclusive for arts and humanities to work with these teams, to think about skills and diversity of skills.

JF: Particularly thinking about healthcare

MA: There is increasing drive to public engagement, to public response. That has to be much more at the heart of training for data scientists and how it relates to the society we want to create. There can be a sense of slowing momentum, but it’s fundamental to getting things right, and shaping directions of where we are going…

JF: Mark, you mentioned trust, and your organisation has been very focused on trust.

ML: These multifacet networks are built on trust. For Skyscanner trust was so much more important than favouring particular clients. I think Facebook’s error has been to not be more transparent in what they do. We have had comments about machine learning as hype, but actually machine learning is about machines learning to do something without humans. We are moving to a place where decisions will be made by machines. We have to govern that, and to police machines with other machines. And we have to have algorithms to ensure that machine learning is appropriate and ethical.

JF: I agree. It was interesting to me that Weapons of Math Destruction is the top seller in algorithms and programme – a machine generated category – but it is reassuring that those working in this space are reading about this. By show of hands, how many here working in data science are thinking about ethics? Some are. But it is unclear who isn’t working with data, or who isn’t working ethically. So, to finish I want your one takeaway for this week.

BH: I think it’s up to us to decide how to do things differently, and to make the change here. If we are true data warriors driving societal benefit then we have to make that change ourselves.

ML: We do plenty to mess up the planet. I think machine learning can help us sort out the problems we’ve created for ourselves.

MA: I think it’s been a wonderful event, particularly the variety and creativity being shared. And I’m really pleased to open up these conversations and look at these issues.

JF: I’m optimistic too. But don’t underestimate the ability of a small group of committed people to change the world. So, Data Warriors, all of you… You know what to do!

Maggie: Thank you all for your conversation, your enthusiasm. One message I really want to give you is that when you look at the use of data, the capacity to do good… The vast majority of young people are oblivious. They could miss out on an amazing career. But as the world changes, they could miss out on a decent career without these skills. Don’t underestimate your ability as one person with knowledge of that area to make a difference, to influence and to inspire. A few years back, in Greenock, we ran an event with Teen Tech and the support of local tech companies made all the difference… One team went to the finals in London, won and went to Silicon Valley… And that had enormous impact on that school and community, and now all S2 students do that programme, local companies come in for a Dragon’s Den type set up. Any moment that you can inspire and support those kids will make all the difference in those lives, and can make all the difference, especially if family, parents, community don’t know about data and tech.

Closing Comments – Gillian Docherty, CEO, The Data Lab

Firstly thank you to Maggie for being an amazing host!

I have a few thank yous to make. It has been an outstanding week. Thank you all for participating in this event. This has been just one event of fifty. We’ve had another 3000 data warriors, on top of you 450 data warriors for Data Summit. Thank you to our amazing speakers, and exhibitors. The buzz has been going throughout the event. Thank you to our sponsors, and to Scottish Government and Scottish Enterprise. Thank you to our amazing volunteers, to Grayling who has been working with the press. To our venue, events team and caterers. Our designer from two fifths design. And the team at FutureX who helped us organise Data Talent and Data Summit – absolutely outstanding job! Well done!

And two final thank yous. Firstly the amazing Data Lab team. We have thousands of new people being trained, huge numbers of projects. I also want to specifically mention Craig Skelton who coordinated our Fringe events; Cecilia who runs our marketing team; and Fraser and John who were behind this week!

My final thank you is to all of you, including the teams across Scotland participating. It is a fantastic time to be working in Scotland! Now take that enthusiasm home with you!

Posted March 23, 2018 at 10:48 am. Categories: Events Attended, LiveBlogs.
Mar 22 2018

Today I am at the Data Fest Data Summit 2018, two days of data presentations, showcases, and exhibitors. I’m here with my EDINA colleagues James Reid and Adam Rusbridge and we are keen to meet people interested in working with us, so do say hello if you are here too! 

I’m liveblogging the presentations so do keep an eye here for my notes, updated throughout the event. As usual these are genuinely live notes, so please let me know if you have any questions, comments, updates, additions or corrections and I’ll update them accordingly. 

Intro to the Data Lab – Gillian Docherty, The Data Lab CEO

Welcome to Data Summit 2018. It’s great to be back. Last year we had 25 events with 2000 people, but this year we’ve had 50 events and hope to reach over 3500 people. We’ve had kids downloading data from the space station, we’ve had events on smart meters, on city data… Our theme this year is “Data Warrior” – a data warrior is someone with a passion and a drive to make value from data. You are data warriors. And you’ll see some of our data warriors on screen here and across the venue.

Our whole event is made possible by our sponsors, by Scottish Enterprise and Scottish Government. So, let’s get on with it!

Our host for the next two days is the wonderful and amazing Maggie Philbin, who you may remember from Tomorrow’s World. She has had an amazing career in media, and she is also chair of UK Digital Skills and CEO of Teen Tech, which encourages young people to engage with technology.

Intro to the Data Summit – Maggie Philbin

Maggie is starting by talking to people in the audience to find out who they are and what they are here for… 

It will be a fantastic event. We have some very diverse speakers who will be talking about the impact of data on society. We have built in lots of opportunities for questions – so don’t hesitate! For any more information do look at the app or use the hashtag #datafest18 or #datasummit18.

I am delighted to introduce our speaker who is back by popular demand. She is going to talk about her new BBC Four series Contagion, which starts tonight.

The Pandemic – Hannah Fry

Last year I talked about data for social good. This year I’m going to talk about a project we’ve been doing to look at pandemics and how disease spreads. When we first started to think about this, we wanted to see how much pandemic disease is in people’s minds. And it turns out… Not much.

Hannah’s talk was redacted from this post yesterday but, as Contagion! has now been broadcast, here we go: 

Influenza killed 100 million people in the 20th Century. The Spanish Flu killed more people in one year than both World Wars. Which seems surprising but that may be partly because Pandemic Flu is very different from Seasonal Flu. Pandemic Flu is where a strain of flu jumps from animals to humans and spreads so fast that we can’t vaccinate fast enough. For that reason Pandemic Flu is the top of the UK Government’s Risk Register.

So, what we decided to do was essentially a TV stunt with a real purpose. We built a simple smart phone app. The App captures where people are, and how many people they are with. That allows us to see how disease might spread. Firstly to do that for TV of course, but secondly this is proper citizen science for real research. So, I spent a year calling in lots of favours, getting on all sorts of media, asking people to download an app.

But we also needed a patient zero, and we also needed a ground zero. We picked Haslemere in Surrey, which is a sort of Goldilocks town, just big enough, well connected… A beautiful English town… Just the type you’d like to destroy with an imaginary virus. And I was patient zero… So I went there, went to the gym, went to the shops, went to the pub… But unknown to me I also walked past others with the app… So when I stood near to one of these people, it was for long enough to infect that person… And so now there were two people and then many more… A pharmacist got infected early on and continued infecting others…

These patterns are based on our best mathematical models for infection… And you can quickly see pockets of infection developing and growing. Spreading quickly to a whole town. But those dots on a map are all real people…

Looking at some real infection sites… So, in Petersfield there is a school where a few kids from Haslemere attend, commuting by train. Three kids were running our app… By day three, two were infected, one wasn’t. They went to the break room, and outside, and the third person got infected… And then infected their family…

I also wanted to talk about people from Haslemere who work in London, on Day Two. Two people from the town don’t know each other, but they took the train home, and the one infected the other…

Now, this is just the Haslemere experiment, but we did a nationwide experiment…

We persuaded 30,000 people to download the app and take part… Again, it starts with me walking around Haslemere. By a month in, London is swamped. Two months in it sweeps Scotland. By three months it’s in Northern Ireland. Really by then only the North of Scotland was safe! What is startling isn’t just the speed of the spread, but also how many people get infected… This is the most accurate model we have to date. The most accurate estimate for a Spanish Flu type virus is a staggering 43,343,849. A conservative fatality rate of 2% would be 886,877 deaths. But that’s the worst case scenario… That’s no interventions… Which is why this data and this model are so important, as they allow you to understand and trial interventions. Generally most people infect the same small number of people, but some super spreaders have a much bigger impact. If you target super spreaders with early vaccination – just vaccinating a targeted 10% – it makes a huge difference. It really slows the spread, giving yourself a fighting chance of overcoming infection.
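
To make the super spreader point concrete, here is a minimal sketch of targeted vaccination on a contact network. It is not the model used for the programme: the toy town, the function names and the assumption that every contact transmits the infection are all hypothetical, chosen only so the effect is easy to see. As in the model described in the talk, nobody recovers.

```python
from collections import defaultdict


def build_contacts(edges):
    """Turn a list of (person, person) contacts into an adjacency map."""
    contacts = defaultdict(set)
    for a, b in edges:
        contacts[a].add(b)
        contacts[b].add(a)
    return contacts


def top_spreaders(contacts, fraction):
    """Pick the given fraction of people with the most contacts."""
    k = max(1, round(len(contacts) * fraction))
    ranked = sorted(contacts, key=lambda p: (-len(contacts[p]), p))
    return set(ranked[:k])


def simulate(contacts, patient_zero, vaccinated, days):
    """Return the running total of infected people, one entry per day.

    Assumption: every contact with an infected person transmits, and
    nobody recovers. Vaccinated people can never be infected.
    """
    infected = {patient_zero}
    history = [len(infected)]
    for _ in range(days):
        newly = {
            other
            for person in infected
            for other in contacts[person]
            if other not in infected and other not in vaccinated
        }
        infected |= newly
        history.append(len(infected))
    return history


# Hypothetical toy town: three well-connected people (0, 1, 2) and 17 others.
edges = [(0, 1), (1, 2)]
edges += [(0, p) for p in range(3, 9)]
edges += [(1, p) for p in range(9, 15)]
edges += [(2, p) for p in range(15, 20)]
contacts = build_contacts(edges)

targeted = top_spreaders(contacts, 0.10)  # vaccinate the best-connected 10%
print("no intervention:", simulate(contacts, 3, set(), 4))
print("targeted 10%:   ", simulate(contacts, 3, targeted, 4))
```

In this toy network, vaccinating the two best-connected people cuts patient zero off from the rest of the town. A real contact network is far messier, so the effect would be a slowdown rather than a full stop.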

We know these pandemics can and will happen, but it’s about what you plan for and how you intervene. The only way to answer those big questions and to know how to intervene, is to understand that data, to understand that spread. So we are anonymising this data set and releasing it to the academic community – as a new gold standard for understanding infection. Data really does save lives.


Q1) So, Shetland is safe…. Unless the infection started there.

A1) When we spoke to one person about what they’d do in a pandemic, they said they’d get in a car with their kids and just

Q2) I’m from the NHS and there has been a lot of work on super spreaders, closing schools… Has there been work on the most efficient, mathematically effective patterns to minimise infection?

A2) Schools are an interesting one… Closing schools sounds like it makes everything simple. Sometimes shutting schools means kids share in an unpredictable manner as they will go places too. And then you reopen schools and reinfect potentially… And that’s without the economic impact. These are all questions we are thinking about.

Q3) That’s awesome and scary. What about people developing immunity.

A3) Our model is no immunity, and no-one recovers. But you can build that data in later, adding rish assumptions. And some of the team working on this are looking at infection transmitted through the air – some viruses can stick around a few hours.

Q4) I remember the SARS book. I’m very paranoid… Bought suits, gloves, bleach… In New Zealand you need a two week supply of stuff in your house… If we did that, how would that make a difference?

A4) Yes… So for instance the government always pushes messages about hand washing whenever flu is taking place. It doesn’t feel that that would make a big difference… But at a population level it really does…

Q5) My question is whether you will make the data available for other people – for epidemiology but also for transport, for infrastructure.

A5) Yes, absolutely. We wanted to make this as scientifically rigorous as possible. The BBC gives us the scale to get this work done. But we are now in the process of cleaning the data to share it. Julia Gog at Cambridge is the lead here so look out for this.

Q6) What about data privacy here?

A6) At a national level the data is accurate to 1 km squared, with one pin every 24 hours. Part of the work to clean the data is checking if it can be reverse engineered to make sure that privacy is assured. For Haslemere there is more detail… We are looking at skewing location, at just sharing distance apart rather than location, and at whether there is any way you can reverse engineer the dataset if you’ve seen the TV programme, so we are being really careful here.
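
As a rough illustration of the coarsening described in that answer, the sketch below snaps locations to a 1 km grid, keeps one pin per person per day, and shows how a distance could be shared instead of a position. The field layout (person, day, easting and northing in metres) and the function names are assumptions for illustration, not the project's actual pipeline.

```python
import math


def coarsen(pings, cell_m=1000):
    """Keep the first ping per person per day, snapped to a grid cell.

    pings: iterable of (person, day, easting_m, northing_m) tuples, with
    coordinates as whole metres in a projected grid (an assumption here).
    """
    seen = set()
    released = []
    for person, day, easting, northing in pings:
        if (person, day) in seen:
            continue  # only one pin per person per 24 hours
        seen.add((person, day))
        released.append(
            (person, day, easting // cell_m * cell_m, northing // cell_m * cell_m)
        )
    return released


def distance_apart(a, b):
    """Share how far apart two (easting, northing) points are, not where."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical pings, not real data.
pings = [
    ("a", 1, 489612, 132987),
    ("a", 1, 490100, 133500),  # second ping that day is dropped
    ("b", 1, 489001, 132001),
]
print(coarsen(pings))
print(distance_apart((0, 0), (3, 4)))
```

Note that coarsening alone does not guarantee anonymity, which is why the answer above also mentions checking whether the dataset can be reverse engineered.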

Business Transformation: using the analytics value chain – Warwick Beresford-Jones, Merkle Aquila

I’ll be talking about the value chain. This is:

Data > Insight > Action > Value (and repeat)

Those first two aspects are “generation” and the latter two are “deployment”. We are good at the first two, but not so much the action and value aspects. So we take a different approach, thinking right to left, which allows faster changes. Businesses don’t always start with an end in mind, but we do have accessible data, transformative insights, organisational action, and integrated technology. In many businesses much of the spend is on technology, rather than the stage where change takes place, where value is generated for the business. Thinking right to left means that a business understands why they are investing and what the purpose of this is.

I want to talk more about that but first I want to talk about the NBA and the three point line, and how moving that changed the game by changing basket attempts… And that was a tactical decision of whether to score more points, or concede fewer points, enabling teams to find the benefit in taking the long shot. Cricket and Football similarly use the value chain to drive benefit, but the maths work differently in terms of interpreting that data into actions and tactics.

Moving back to business… That right to left idea is about thinking about the value you want to derive, the action required to do that, and the insights required to inform those actions, then the data that enables that insight to be generated.

Sony looked at data and customer satisfaction and wanted to reduce their range down from 15 to 4 handsets. But the data showed the importance of camera technology – and many of you will now have Sony technology in the cameras in your phones, and they have built huge value for their business in that rationalisation.

BA wanted to improve check in experiences. They found business customers were frustrated at the wait, but also families didn’t feel well catered for. And they decided to trial a family check in at Heathrow – that made families happier, it streamlined business customers’ experience, and staff feedback has also been really positive. So a great example of using data to make change.

So, what questions should you be asking?

  • What are the big things that can change our business and drive value?
  • Can data analytics help?
  • How easy will it be to implement the findings?
  • How quickly can we do it?

Q1) In light of the scandal with Facebook and Cambridge Analytica, do you think that will impact people sharing their data, how their data can be used?

A1) I knew that was coming! It’s really difficult… And everyone is also looking at the impact of GDPR right now. With Facebook and LinkedIn there is an exchange there in terms of people and their data and the service. If you didn’t have that you’d get generic broadcast advertising… So it depends if people would rather see targeted and relevant advertising. But then some of what Facebook and Cambridge Analytica did is not so good…

Q2) How important is it for the analysts in an organisation to be able to explain analytics to a wider audience?

A2) Communication is critical, and I’d say equally important as the technical work.

Q3) What are the classic things people think they can do with data for their business, but actually is really hard and unrealistic?

A3) A few years ago I was meeting with a company, and they gave an example of when Manchester United had a bad run, and Paddy Power had put up a statue of Alex Ferguson with a “do not break glass” sign, and they asked how you can have that game changing moment. And that is really hard to do.

Q4) You started your business at your kitchen table… And now you have 120 people working for you. How do you do that growth?

A4) It’s not as hard as you think, but you have to find the right blend of raw talent with experience – lots of tricky learning.

Project Showcase

How will you make a difference? I’m going to talk about how I’ve made major change for one of Scotland’s biggest organisations. I was working for Aggreko, the leader of mobile modular power and temperature solutions. They provide power for the Olympics, the World Cup, the Superbowl… A huge range of events across the world.
We are now watching a short video on how Aggreko supplies large scale mobile power (30 MW set up in 17 days) to cover local demand in Machu Picchu when a hydroelectric plant has to be shut down for maintenance. 
In the dark old days Aggreko was a reactive organisation. A customer would ring with an issue, then Aggreko would send an engineer out. And then they moved to monitoring the mobile power kit, keeping watch on equipment across the world on a 24/7 basis. My team built the software to undertake that monitoring, to respond to every alert, alarm, any issue customers might face. And in fact in many cases to fix an issue before a customer ever became aware of it. And that meant far greater reliability and efficiency. And doing that we wondered how we might be able to predict issues, to predict how equipment might fail. We didn’t know how to do that and we weren’t afraid to ask…
So we went to the Data Lab, took my idea to their board, and they funded a year long pilot to work with University of Strathclyde and Microsoft. We also needed to build a team of engineers, technicians and specialists to take this forward. This was a massively smart group, but also one with some big egos… A lot of what I had to do was to ensure there was good collaboration across those teams. The collaboration is really what made this project a real success. We created an advanced analytics team which allowed us to put models into use, some of which could predict an issue 2 weeks ahead of it occurring, so that we could manage those issues for our customers.
The guys at Data Lab helped me to make a difference, they were brilliant and all that help is available to you too. So what are you waiting for?  
There are various ways to resolve this, but they are not easy. There is work for the 1% of large companies, but that leaves SMEs out. And 50k SMEs go out of business every year in the UK. So, what is the solution? Well, let me tell you about Previse and what we do. We think we have a unique solution. David Brown, one of our co-founders, had experience in the sector, and he didn’t want to accept the status quo. Accounting is among the oldest processes and data that a company has, but no-one is using that in this sort of way. So what do we do?
Previse finds data, engages with data, pulls in other data… And looks at what can work. We can look at all data on every invoice from every supplier. We then determine a score, and a threshold… So that when invoices come in they can be prioritised and mostly approved and paid immediately. The process is the same for the buyer but it makes a huge difference for the supplier. Placing an invoice through Previse you can send and have approved invoices very swiftly, and without chasing and additional work. That is a huge difference in cost and time. The large corporates we’ve been talking with – including 70% of large FTSE companies – are really enthusiastic and want us to help them.
And our experience in Scotland has been incredible. The Data Lab helped us throughout, finding the right universities to work with. We work with Heriot Watt (Mike Chantler) and with MBN to find the right resources, and Scottish Enterprise have helped us make Scotland our hub for data science and software engineers. We’ve employed 5 people in the last 6 months, and we’ll double that by the end of the year. We can generate growth, but it’s also about making real change with data.
If SMEs are paid on time, that allows them to thrive and grow. It’s a huge problem and we think it can be resolved.
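
The score-and-threshold idea in the Previse talk above can be sketched very roughly as follows. The talk does not say how the score is calculated, so this example simply assumes a supplier's score is the share of their past invoices that were approved. The names, the 0.9 threshold and the sample history are all hypothetical.

```python
from collections import defaultdict


def supplier_scores(history):
    """Score each supplier by the share of past invoices that were approved.

    history: iterable of (supplier, was_approved) pairs. This scoring rule
    is an assumption for illustration only.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for supplier, was_approved in history:
        totals[supplier] += 1
        if was_approved:
            approved[supplier] += 1
    return {s: approved[s] / totals[s] for s in totals}


def route(supplier, scores, threshold=0.9):
    """Pay immediately if the score clears the threshold, else review."""
    score = scores.get(supplier, 0.0)  # unknown suppliers always get reviewed
    return "pay_now" if score >= threshold else "manual_review"


# Hypothetical invoice history, not real data.
history = [("acme", True)] * 4 + [("bolt", True), ("bolt", False)]
scores = supplier_scores(history)
for supplier in ("acme", "bolt", "newco"):
    print(supplier, route(supplier, scores))
```

The design choice here is that anything below the threshold falls back to the buyer's existing approval process, so the buyer's workflow is unchanged while most suppliers are paid sooner.
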
Our platform consists of four modules: sustainability; mapping; reporting and advanced. But I’ll talk about our mapping module and some projects we’ve worked on:
  • Mapping the water footprint of your crops – a project with the University of Edinburgh, funded by Data Lab. This brings together a wide range of crop data layers. We have an overlay based on water for crop growing, and overlays of gray water, or the erosion potential – for instance there is high erosion potential on the west coast of Scotland, mostly low erosion in the east of Scotland.
  • Forests 2020 is a Mexican application supported by the UK Space Agency, and we work with University of Edinburgh, University of Leicester, and Carbomap. Here we can see deforestation patterns, and particular crop areas.
  • Innovate UK: farm data, which is a collaboration with Rothamsted Research, Environment Systems, and Innovate UK – this is at an early stage looking at crop rotation data for UK and export markets. And you can also see the soil you are growing on, what can be planted, what sort of fertilisers to use.
  • Sustainability risk – supports understanding of risks such as water depletion, and the various factors impacting and shifting that.
  • We also have tools for government to help plan what types of power plants they should be building, and in which locations.

So, in conclusion, layering data allows us to gain new insights and understanding.

After a good lunch and networking session we are now back in the main hall, starting with a video on the use of data in the Heineken production process. And an introduction to Stefaan Verhulst, a former Glasgow graduate now based in New York.

Data Driven Public Innovation In Partnership With The Private Sector: The Emerging Practice Of Data Collaboratives – Stefaan Verhulst, Co-founder and Chief Research and Development Officer, The Gov Lab

I’m delighted to be back in Scotland for this event looking at how data can help society, and how society can be. That is also the focus of The Gov Lab in New York. And we also look at how we can unleash data for good.

An example I want to give you is the earthquake in Nepal a few years ago. It was a terrible event but it was also inspiring too, because Ncell, a cell phone operator, and Flowminder (based in Sweden and the UK) worked together to map the flow of people to intervene, to save lives. It is a great example of using data in the public good. And it’s an example of the growth of available data, including web crawling/scraping/search analysis; social media; retail data etc. all collected by the private sector. But we also have new data science to address this data, to gain meaning from this data. And often that expertise to extract meaning is sitting in the private sector.

So, the real question is how we extract value and engage with the private sector around data they collect. That’s a whole different ballgame from open government data. It’s not just about data sharing, but about new kinds of public-private sharing around data for the public good. So we have set up new programmes of Data Collaboratives. The Data Collaboratives Explorer allows you to explore those collaborations taking place – there are over 100 in there already. From that collaborative work we have gained some insights that I will share today.

So, firstly, data collaboratives are important across the policy lifecycle:

  • That starts with situation analysis. Corporations in the US have worked together to understand the scale of the opioid epidemic, for instance.
  • Our second value proposition is about knowledge creation. For instance, post hurricane season how does the mosquito population change and how does that change mosquito-borne diseases.
  • Our third value proposition is prediction, for instance projects to predict suicide risk from search results – a project in Canada and also in India.
  • And then we have evaluation and impact assessment. An example here is Vision Zero Labs looking at traffic safety and experiments in spatial composition to influence and reduce risk of accidents.

In those collaboratives we see different models in use. These include:

  • Data pooling – enabling sharing and analysis across the collaboration.
  • Prizes and challenges – opening some data as a source of generating new insights through innovative ideas and projects that benefit both public and private sector, e.g. BBVA’s Innova challenge.
  • Research partnerships – with collaboration across private sector and public or academic sector, such as work on fake news on Twitter.
  • Intelligence products – JP Morgan Chase has an institute to extract insights from their own data and actually that can be hugely detailed and valuable.
  • API – for instance Zillow allows you to access real time mortgage and housing market data.
  • Trusted intermediary – for instance Dalberg who acts between telecommunications companies and others.

So, there are many ways to set up a data collaborative. But why would the private sector want to do this? Well, they may be motivated by reciprocity – sharing data may lead to access to specialist expertise; research and insights; revenue; regulatory compliance; reputation and retainment of talent – often corporations need to retain talent through solving harder or more interesting problems; responsibility.

But there are challenges too. For instance the taxi and limousine agency in New York regulates all taxi operations, including Uber. In their wisdom they shared the data… But that exposed some celebrity locations (and less salubrious locations). The harm here wasn’t huge but that data in a different cultural context could present a much higher risk. So, some of the concerns around sharing data include:

  • privacy and security
  • generalisability and data quality (e.g. not everyone has a cell phone)
  • competitive concerns
  • cultural challenges – there is something of a culture of hoarding data within organisations.

So, to move towards data responsibility we really need risk and value assessment that recognises data as a process, and part of a wider value chain. We need fair information practices and processes – our principles are about 30 years out of date and we urgently need new principles and processes. GDPR helps, but not with all the challenges we may have. We need new methods and approaches. And that means having a decision tree across the data cycle.

There are risks in sharing data, but there are also risks in not sharing the data. If we had not used the Ncell data in Nepal, we would have had more deaths. So we have to respond not just to risks, but also to the opportunity cost of not sharing data. What is your responsibility as a corporation?

I’ve given lots of examples here… But how do we make data driven public innovation systemic? We need data stewards in organisations so there is someone who can sign off on data collaboratives, we need that profession in place in organisations to enable work with the public sector. We need methods – like the Unicef collaboratory around childhood obesity, that’s a new methodology. We also need new evidence, of how data can be used and what impact it will have. And finally we need a movement – this all won’t happen without a movement to establish data collaboratives, and I’m delighted to be here today as part of this movement, and ultimately use data to improve people’s lives.


Q1) In light of Cambridge Analytica and Trump, aren’t we misusing data?

A1) I think use is part of that value chain and we have to have a debate about what kind of use we are comfortable with, and which we are not. And that case also raises questions about freedom of expression, and a need to regulate against deceptive behaviours.

Q2) Several years ago hashtags brought down governments in the Middle East, and now we have governments in those countries controlling the public through hashtags. It’s scary.

A2) I’ve been working in privacy for many years, and I really encourage a comparison of risks and value. And to do a cost-benefit analysis. We need to rebalance that.

Gillian is introducing our special guest… Minister Derek Mackay

Message from the Scottish Government – Derek Mackay, MSP, Cabinet Secretary for Finance & Constitution, the Scottish Government

I’m not sure that I’ve thought of myself as a data warrior before, but I did teach the Social Security Minister how to use Instagram the other week! I say that partly as I have an appeal and a plea for you… The First Minister has a huge set of followers on Twitter, but I’m stuck just below 18k… Maybe you are the audience to take me over that line!

There’s a lot I want to cover in terms of the excitement of this event. We have a strong reputation and record in Scotland. With responsibility for the budget and internationalisation, this is really exciting. I’m particularly enthused by the international representation including Brazil, Singapore, USA, and Ireland too. This event allows us to put the spotlight on data science in Scotland. It is a natural place for people to come and do business. And this is a great event with business leaders here, with experience to share with others.

Our government, Scottish Enterprise and Data Lab are working together to build innovation and business in Scotland. We are fortunate in Scotland to have world class data resources. Scotland has Universities, 5 of which are in the top 100, and we have 70% of research rated as excellent in the last REF. We can feel this buzz. Data Driven Innovation has the potential to deliver £20bn value to Scotland in the next five years. This buzz can be harnessed to make Scotland the Data Capital in Europe. I particularly support the growth in FinTech. Many people describe themselves as disruptors – that would have once been seen as a negative but is now a real positive, about opening new opportunities. And data helps us deliver our work, one example of which is the Cancer Challenge which is helping us understand how best to use our resources for the best outcomes.

The Scottish Government Innovation Action Plan seeks to build a sustainable economy, with skills crucial to that, including funding for business growth, innovation, etc. We’ve also launched the Scottish Digital Academy and the Data Science Accelerator to look at how things are changing, to innovate working methods – such as CivTech’s innovative models. We are really serious about business growth, the economy and skills. We have invested in innovation, education and internationalisation. We are the strongest part of the UK outside London and the South East.

So, the Scottish Government supports your enthusiasm for data, for what can be done with data. High tech, low carbon is the future we see, and we want to be a country welcome in Europe and the rest of the world – we don’t support the UK government’s view on Europe.

I commend your work and hope that you have a fruitful and enjoyable time here. And we hope the collaboration of our agencies helps to bear fruit now and in the future.

Improving Transparency In The Extractives Industry Using Data Science – Erin Akred, Lead Data Scientist, DataKind

I am a data scientist from DataKind where we harness data for the improvement of humanity. We exist to use data to see the kind of world we want to see. The challenge we face is that many not for profits, charities, government agencies etc. do not have the resources to do the types of data science that the private sector (e.g. Netflix) can. So we link pro bono data scientists with organisations with a social mission.

Last year we did a project looking at automating the detection of mines from earth observation imagery. We are used to using this data for other purposes, but this is a challenging problem. I will talk more about this but first I wanted to talk more about DataKind.

Our founder, Jake, was working at the New York Times on data science, and saw people volunteering and attending hack events at the weekend, giving back with their talents… So he thought perhaps I could partner with a mission driven organisation, could I organise a similar event and make this happen… He started DataKind and we’ve been developing what we can offer these mission-driven organisations who also want to benefit from Data Science. So we now pair data scientists with mission driven projects. We have over 18k community members worldwide, 6 chapters in 5 countries (US, Bangalore, Singapore, Dublin, London, San Francisco, Washington DC), we have chapter applicants in 40+ global cities; 228 events worldwide; and we’ve worked on over 250 projects generating about $20m of value in volunteer effort.

One example project has been with the Omidyar Network to look at data science solutions that might enable social actors to operate more effectively and efficiently in their efforts to combat corruption in the extractives industry. Now we don’t start with the data that is out there. Our funders really want impact, and we think of that as impact per dollar. So, anyway, the context of this work was illegal mining which can cause conflict in the eastern Democratic Republic of Congo, it includes poor environmental outcomes, and social challenges. As data scientists we partner with other organisations to ensure we know how to get value out of data insights.

To understand illegal mining we have to know where it is taking place. So we did work on machine learning from images. We worked with Global Forest Watch and IPIS.

Now, not all of our projects are successful… Usually projects fail because of issues in:

  • Problem statement – a well thought through problem statement is really important.
  • Datasets
  • Data Scientists
  • Funding
  • Subject Matter Expertise
  • Social Actors

Now, I spoke to someone last night who has run lots of Kaggle projects – crowdsourced data science challenges. Now in those projects you have data, data scientists but you don’t have subject matter experts – and that’s crucial knowledge and skills to have on board. For instance when looking at malaria, there was a presumption that mosquito nets would be helpful, but the way they work looks like a shrine, like death… And people didn’t want to sleep in them. So they used them as fishing nets.

When we work with an organisation we do want a data set, but we also want an organisation open to seeing what the data reveals, not trying to push a particular agenda. And we have subject matter experts that add crucial context and understanding of the data, of any risks or concerns with the data as well.

We start with, e.g.:

We want to create image classification models

Using publicly available earth satellite imagery

So that those working in the transparency sector can be made aware of irregular mining activity

So that they can improve environmental and conflict issues due to mining. 

Some of the data we use is open – and a lot of data I’ve worked with is open – but also closed data, data generated by mission-driven organisational apps, etc.

And the data scientists on these projects are at the top of their game, who these organisations could not afford to work with or recruit earlier.

So, for this project we used a random forest analyser on the data, to find mine locations. We had generated training data for this project which determined that we can pick out where illegal mining work has occurred with good accuracy.

To find out more and get involved – and I’d encourage you to do that – go to:


Q1) Where do you see DataKind going?

A1) We do a lot with not a lot of money. I had assumed that DataKind was 100 people when I joined, it was less than 10. I would love to see this model replicated in other countries. And conferences… Bringing volunteer data scientists together with providers enables us to increase the opportunity for these things to happen. Bringing these people together, those conferences are rich experiences that amplify the impact of what we are doing.

Q2) For the mining project you can access the data online. The US Federal Government is hosting the data, and we used Google Earth engine in this work.

From Analytics To AI: Where Next For Government Use Of Data? – Eddie Copeland, Director of Government Innovation, Nesta

I’ve been talking to anyone who will listen over the last 5 years about the benefits of public sector data. We have been huge proponents of using open data, but often data has been released in a vague hope that someone else might do something with it. And we have the smart cities agenda, generating even more data that often we have no idea how to use. But there is a missing link there… The idea that public organisations should be the main consumer of their own data, for improving their own practice.

Now you’ll have read all those articles asking if data is the new “oil”, the new “fuel”, the new “soil”! I don’t much care about the analogy but the key thing is that data is valuable. Data enables the public sector to work better, it enables many of the tried and tested ways of working better. Doing more and better with less. But that’s hard to do. For a public sector organisation with lots of amazing data on opportunities and challenges in my area, but not the next door area, how can I understand that bigger picture? We can target resources to the most vulnerable areas, but we need data to tell us where those are. Without visibility across different organisations/parts of the public sector (e.g. in family and child services), how can that data be used to understand appropriate support and intervention?

Why do we focus on data issues? Well, there is a technology challenge as so many public sector organisations have different IT services. And you have outrageous private sector organisations who charge the public sector to access their own data – they should be named and shamed. Even when you get the data out the format can be inconsistent, it’s hard to use. Then there is what we can do with the data – we often err on the side of caution, rather than asking what is useful. Historically the main data person in public sector organisations was the “data protection officer” – the clue is in the title! It takes an organisational leap to collaborate on issues where that makes sense.

I used to work for a think tank and I got bored of that, I really wanted to be part of a “do tank”, to actually put things into action. And I found this great organisation called Nesta and we have set up the London Office of Data Analytics, where we look for:

  • an impactful problem – it takes time, backing, support you have to have a problem that matters
  • a clearly defined intervention – what would you do differently if you had all the information you could want about the problem you want to solve (data science is not the innovation)
  • what is the information asset you would need to undertake that intervention?
  • what intervention do you need to undertake to solve that issue?

So when we looked at London the issue that seemed to fit these criteria was unlicensed Houses of Multiple Occupancy, and how we might predict that. We asked housing officers how they identified these properties, we looked at what was already known, we looked at available information around those indicators. And then developing machine learning to predict those unlicensed HMOs – we are now on the third version of that.

We have also worked on a North East Data Pilot to join up data across the region to better understand alcohol harms. But we didn’t know what intervention might be used, which has made this harder to generate value from.

And we are now working on the Essex Centre for Data Analytics, looking at the issue of modern slavery.

Having now worked through many of these examples, we’ve found that data is the gateway drug to better collaboration between organisations. Just getting all the different players in the room, talking about the same problem in the same way, is hugely valuable. And we see collaborations being set up across the place.

So, things we have learned:

  1. Public sector leaders need to create the space and culture for data to make a difference – there is no excuse for not analysing the data, and you’ll have staff who know that data and just need the excuse to focus and work on this.
  2. Local authorities need to be able to link their own data – place based and person based data.
  3. We need consistent legal advice across the public sector. Right now lots of organisations are all separately getting advice on GDPR when they face common issues…

So, what’s next? Nesta is an innovation organisation. There is excitement about technologies of all types. For this audience AI probably is overhyped but nonetheless that has big potential, particularly algorithmic decision making out in the field. Policy makers talk about evidence based decision making, but AI can enable us to take that out into the field. Of course algorithms could do great things, but we also have examples that are bad… Companies hiring based on credit records is not ok. Public sector bodies not understanding algorithmic bias is not ok. For my own part I published 10 principles for a code of conduct for public sector organisations to use data centres – I’d love your feedback at

It is not OK to use AI to inform a decision if the person using it could not reasonably understand its basic objectives, function and limitations. We would face a total collapse of trust that could set us back a decade. And we’ve seen over the last week what that could mean.


Q1) Surely the problems you are talking about are people problems?

A1) Public organisations are being asked to do more with less, and that makes it difficult for that time to be carved out to focus on these challenges, that’s part of why you need buy in and commitment at senior level. There is a real challenge here about finding the right people… The front line workers have so much knowledge but you have organisations who

Q2) On your comment that you have to understand the AI: GDPR requires a right to explanation for the use of data and that’s very hard to do unless automated.

A2) Yes, that’s a really untested part of GDPR. If local authorities buy in data they have to understand where that data is from, what data is being used and what that means. In the HMO example local front line staff can look at those flags from the prediction and add their own knowledge of the context of, for instance, a local landlord’s prior record. But that understanding of how to use and action that data is key.

Data Driven Business. It’s Not That Hard. – Alex Depledge, Founder,, Former CEO

That’s a deliberately provocative title – I knew that this would be a room full of intellectuals and I’m going to bring us back down to earth. I’m known for setting up, and I think it’s fitting that I am following Eddie talking about the basics and the importance of getting the basics right. So many companies say they are running a data driven business, and they are not… Few are actually doing this.

I started my professional life at Accenture. I met my co-founder there. About 7 years into our friendship she emailed me and said “I’ve got it. I need a piano teacher, I’ve been Googling for four hours, we need a place to find music teachers”. And I said “that’s a rubbish idea”. And then I needed a wisteria trimmed… And we decided we wanted to build a marketplace for local services… We had a whole idea, a powerpoint deck, and thought that great, we’ll get a team in India or Singapore to build it… Sounded great, but nothing happened.

And then Jules quit her well paid job and she said “it’s ok, I’ve bought a book!” – and it was a Ruby on Rails book… She started coding… And she built a thing. And that led to us going through a Springboard process… We had some data but I was trying to pull in money. We were attracting some customers, but not a lot of service providers… We were driven by intuition or single conversations… So one day I said that I’m quitting and going back to the day job… And I was frustrated… And a colleague said “maybe we should look at what the data says?”… And so they looked. And they found that 1 in 4 people coming to the website wanted a cleaner. And we were like “holy shit!”. Because we didn’t have any cleaners. So we threw away what we had, we set up a three page site. We went all in so you could put a postcode in, find a cleaner, and book them. We got 27 bookings, then double that… And we raised some funding – £250k just when we desperately needed it. We found cleaners, we scaled up, we got much bigger investment. And we scaled up to 100 people.
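(An aside, not from the talk: the "what are visitors actually asking for?" check described above needs nothing more than a tally. The sketch below is a minimal illustration under stated assumptions; the function name `service_shares` and the sample search log are hypothetical, made up so that one service accounts for 1 in 4 requests.)

```python
from collections import Counter


def service_shares(requests):
    """Return (service, share of all requests) pairs, most requested first."""
    counts = Counter(requests)
    total = sum(counts.values())
    return [(service, count / total) for service, count in counts.most_common()]


# Hypothetical search log, one entry per visitor (illustrative data only).
sample_log = [
    "cleaner", "piano teacher", "gardener", "cleaner",
    "plumber", "tutor", "gardener", "cleaner",
    "dog walker", "plumber", "handyman", "painter",
]

shares = service_shares(sample_log)
top_service, top_share = shares[0]
print(f"Most requested: {top_service} ({top_share:.0%} of visitors)")
```

This is the kind of analysis that fits in a spreadsheet pivot table just as well; the point is the question asked, not the tooling.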

Then we really turned into a data driven business, building what people want, try it, check the data, iterate. Our VC at Accel pushed us to use mobile… We weren’t convinced. We checked the data and actually people booked cleaners from their desk at lunchtime. At our pinnacle we were moving 10k cleaners around London. We had to look at liquidity and we needed cleaners to have an average of 30 hours of work per week… too few and cleaners weren’t happy, too many and jobs weren’t taken up. So at 31 hours we’d start recruiting.
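(Another aside, not from the talk: the liquidity rule above can be written down as a simple threshold check. This is a minimal sketch under assumptions; the 30 and 31 hour figures come from the talk, but the function names and the idea of sizing recruitment by rounding up to the target average are hypothetical.)

```python
import math

TARGET_HOURS = 30  # target average weekly hours per cleaner (figure from the talk)
RECRUIT_AT = 31    # average at which recruiting started (figure from the talk)


def average_weekly_hours(booked_hours, active_cleaners):
    """Average booked hours per active cleaner in a week."""
    if active_cleaners <= 0:
        raise ValueError("need at least one active cleaner")
    return booked_hours / active_cleaners


def should_recruit(booked_hours, active_cleaners):
    """True once the average reaches the recruiting threshold."""
    return average_weekly_hours(booked_hours, active_cleaners) >= RECRUIT_AT


def cleaners_needed(booked_hours, active_cleaners):
    """Hypothetical sizing rule: extra cleaners to get back to the target average."""
    required = math.ceil(booked_hours / TARGET_HOURS)
    return max(0, required - active_cleaners)


print(should_recruit(3100, 100), cleaners_needed(3100, 100))
```

Keeping the target and the trigger as two separate constants leaves a small buffer, so recruitment is not switched on and off by week to week noise.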

From there we looked at expansion and what kind of characteristics were needed. We needed cities like a donut – clients in the middle, cleaners at the outside. We grew but then we got some unwanted attention and chose to sell. For £32 million. And the company that bought us had 80 engineers… And they migrated 16 countries onto our platform which had been built by 8 engineers.

So, we sold our business…. And I thought I’m not going to do that again…

And then I wanted a new kitchen… So I had an architect in… spent £@500… 45 days later I got plans… and 75 days later I had an illustration of how it would look so I could make a decision. And so I started Resi, the first online architect. And it took me just 4 months to be convinced that this could be a business. We set up a page of what we thought we might do. I spent £10 per day on Facebook A/B testing ads. And we’ve had a huge amount of business…. We wanted to find the sweet spot for architects and how long the work would take. Again we needed to know how much time was needed for each customer. So 3 hours is our sweet spot. Our business is now turning over £1 million a year after one year. And only one person works with data, he also does marketing. He looked at our customers and when they convert and how our activities overlaid. After 10 days we weren’t following up, and adding some intervention (email/text etc.) tripled our conversions.
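(A last aside, not from the talk: the 10 day follow-up gap described above amounts to a date filter over enquiries. A minimal sketch, assuming each lead is a record with an enquiry date and two flags; the field names, the sample leads and the exact rule of flagging at 10 days or more are all hypothetical.)

```python
from datetime import date, timedelta

FOLLOW_UP_AFTER = timedelta(days=10)  # 10 day gap mentioned in the talk


def needs_follow_up(leads, today):
    """Ids of leads that have not converted, have had no follow-up,
    and enquired at least FOLLOW_UP_AFTER ago."""
    return [
        lead["id"]
        for lead in leads
        if not lead["converted"]
        and not lead["followed_up"]
        and today - lead["enquired"] >= FOLLOW_UP_AFTER
    ]


# Hypothetical leads (illustrative data only).
leads = [
    {"id": 1, "enquired": date(2018, 3, 1), "converted": False, "followed_up": False},
    {"id": 2, "enquired": date(2018, 3, 15), "converted": False, "followed_up": False},
    {"id": 3, "enquired": date(2018, 3, 5), "converted": True, "followed_up": False},
    {"id": 4, "enquired": date(2018, 3, 12), "converted": False, "followed_up": False},
]

to_contact = needs_follow_up(leads, today=date(2018, 3, 22))
print(to_contact)
```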

We’ve also been able to look at hotspots across the UK, and we can target our marketing in those areas, and also understand that word of mouth… We can take advantage of that.

I’m a total data convert. I still don’t like spreadsheets. Data informs our decisions – not quite every decision as instinct matters too. But every piece of data analysis we did was doable in a spreadsheet by someone in high school… It doesn’t take machine learning, or AI, or big data. Even simple analysis can create tremendous results.


Q1) What next?

A1) I always said I didn’t want to dine out on one story… Like Hassle. But I don’t know the end for Resi yet… Invite me back in a few years!

Q2) The learning from a few hours of work was huge.

A2) Our entire business was based on a single piece of analysis – asking what our customers were looking for led to £32m.

The AI Race: Who’s Going To Win? – Vicky Brock (VB – chairing), CEO, Get Market Fit; Alex Depledge (AD), Founder, Former CEO; Joel KO (JK), Founding CEO, Marvelstone Ventures; Chris Neumann (CN), Early Stage Investor

CN: I’m a recovering entrepreneur. As an investor I’ve had a global purview on what’s going on in the AI race. And I think it’s interesting that we see countries and areas which haven’t always been at the cutting edge of technology, really finding the opportunities here. Including Edinburgh.

JK: We are funders based in Singapore and investing in FinTech. The AI technology has been arising… I’m hoping to invest in AI start ups and incubators.

AD: You already know who I am. In my brief hiatus between companies I was an entrepreneur in residence at Index Ventures, and I saw about 300 companies come in saying they were doing AI or Machine Learning so I have some knowledge here. But also knowing a leading professor in data ethics I don’t care who wins, but I care that Pandora isn’t let out of her box until governments have a handle on this because the risks are great.

VB: I’m a serial entrepreneur around data. And machine learning or AI can kind of be the magic words for getting investment. There is obvious hype here… Is it a disruptor?

CN: I’ve seen a lot of companies – like Alex – say they use ML or AI… In some ways it’s the natural progression from being data driven. I do think there will be an incredible impact on society over the next 10 years from AI. But I don’t think it will be the robots and tech from science fiction, it will probably be in more everyday ways.

VB: Is AI the key word to get funding…

JK: I see many AI start ups… But often actually it’s a FinTech start up… But they present themselves that way as funders like to hear that… There is so much data… And AI does now spread into daily lives… Entrepreneurs see AI as a way to sell themselves to investors.

VB: At one stage it was “big data” then “AI” but you’ve had some little data… What did you see when you were entrepreneur in residence?

AD: No disrespect to investors but they focus on financials and data, but actually I’d often be asking about what was happening under the bonnet… So if they were using machine learning, ask about that, ask about data sets, ask where it’s coming from… But often they do interesting data work but it’s a good algorithm or calculation… It’s not ML or AI. And that’s ok – that’s something I wanted to bring out in my presentation.

VB: What’s looking exciting now?

CN: We see really interesting organisations starting to do fascinating work with AI and ML. I focus on business to business work, but that often looks less exciting to others. So I am excited about an investment I’ve made in a company using BlockChain to prove GDPR compliance. I spoke with a cool company here using wearables and AI for preventing heart attacks, which is really amazing.

JK: I have been here almost a week, met start ups, and they were really really practical. They have the sense to make a revenue stream from the technology. And these very new start ups have been very interesting to me personally.

VB: You’ve started your next company, did you cross lots of ideas off first…

AD: Jules and I had a list of things we wouldn’t do… Chris talked about B2B… We talked about not doing large scale or consumer ideas. We whittled our list of 35 ideas down to 4 each and they were all B2B… But they bored us. We liked solving problems we’ve experienced. My third business I hope will be B2B as getting to £10m is a bit more straightforward than in B2C.

VB: AI requires particular skillsets… How should we be thinking about our skillsets and our talents.

CN: Eddie talked earlier about needing to know what the point is. It can be easy to get lost in the data, to geek out… And lose that focus. So Alex just asking that question, finding out who gives a damn, that’s really important. You have to do something worthwhile to somebody, otherwise there’s no point doing it.

JK: With AI… In ten years… Won’t be coding. AI can code itself. So my solution is that you should let your kids play outside. In Asia lots of parents send kids to coding schools… They won’t need to be engineers… Parents’ response to the trend is too early and not thought through…

AD: I totally agree. Free play and imagination and problem solving is crucial. There aren’t enough women in STEM. But you can over focus on STEM. It’s data and digital literacy from any angle, it could be UX, marketing, product management, or coding… In London we have this idea that everyone should be coding, but actually digital literacy is the skills gap we need to close. And actually that comes down to basic literacy and numeracy. It’s back to basics to me.

VB: I’d like to make a shout out for arts and social sciences graduates. We learn to ask good questions…

AD: Looking at recent work on where innovation comes from, it comes from the intersectionality of disciplines. That’s when super exciting stuff happens…


Q1) Mainly for Alex… I’m machine learning daft… And I love statistics. And I know the value of small scale statistics. And the value of machine learning and large scale data – not so much AI. How do you convey that to business people?

AD) We don’t have a stand out success in the UK. But with big corporates I tell them to start small.. Giving engineers space to play, to see what is interesting… That can yield some really interesting results. You can’t really show people stuff, you need to just try things.

VB) Are you trying to motivate people to use data in your company?

JK) Yes, with investors you see patterns… I tell kids to start start ups as early as possible… So they can fail earlier… Because failures then lead to successful businesses next time.

CN) A lot of folk won’t be aware that for many organisations there is a revenue stream around innovation… It’s a really difficult thing to try to bring in innovative practices into big organisations, or collaborate with them, without squishing that. There are VCs and multinationals who will charge you a lot of money to behave like a start up… But you can just start small and do it!

The Revolutionary World Of Data Science – Passing On That Tacit Knowledge! – Shakeel Khan, Data Science Capability Building Manager, HM Revenue & Customs

I’ve been quite fortunate in my role in that I’ve spent quite a lot of time working with both developed and developing economies around data science. There is huge enthusiasm across the world from governments. But there is also a huge fear factor around rogue players, and concerns about the singularity – machines exceeding humans’ capabilities. But there are genuine opportunities there.

I’ve been doing work in Pakistan, for DFID, where they have a huge problem with dengue fever. They have tracked the spread with mobile phone data, enabling them to contain it at source. That is saving lives. That’s a tremendous outcome. Closer to home, John Bell at Oxford University has described AI as the saviour of our health services, as AI can enable us to run our services more effectively and more economically.

In my day job at HMRC, you can’t overstate what the work that we do enables in terms of investment in the country and its services.

I want to talk about AI at three stages: Identify; Adopt; Innovate.

In terms of data science and what is being done around the world… The United Arab Emirates have set up their Ministry of AI and a 2031 Artificial Intelligence Strategy. We have the Alan Turing Institute looking at specific problems but across many areas, some really interesting work there. In Edinburgh we have the amazing Data Lab, and the research that they are doing for instance with cancer, and we have the University of Edinburgh Bayes Centre. Lots going on in the developed world. But what about the developing world? I’ve just come back from Rwanda, who had a new Data Revolution Policy. I watched a TED talk a few weeks back that emphasised that what is not needed in sub-Saharan Africa is help, what they need is the tools and means to do things themselves.

Rwanda is a hugely progressive country. They have more women in parliament (62.8%) than any country in the world. Their GDP is $8.3bn. They have a Data Revolution Policy. They are at the start of their journey. But they are trying to bring tacit knowledge in, to leapfrog development… Recognising the benefit of that tacit knowledge and of those face to face engagements.

For my role I am split about 50/50 between international development and work for HMRC. So I’ll say a bit more about the journey for developed economies…

Defining Data Science can be quite abstract. You have to make a benefits case, to support the vision, to share a framework and some idea of timeline, with quick wins, to build teams, to build networks. Having a framework allows organisations to build capabilities in a manageable way…

A new Data Science Centre going up in Kigali, Rwanda, will house 200 data scientists – that’s a huge commitment.

The data science strategic framework is about data; people skills; cultural understanding and acceptance – with senior buy in crucial for that… And identifying is also about data ethics, skills development – we have been developing frameworks for years that we can now share. For Rwanda we think we can reduce the time to develop data capabilities from maybe 5 years to perhaps 3. Similarly in Pakistan.

When you move to the adopt phase… You really need to see migration across sectors. I started my career in finance. When I came to HMRC I did a review of machine learning and how that was being used, how that machine learning was generating benefit. We managed to bring in £29 bn that would otherwise be lost, partly through machine learning. One machine learning model can, effectively, bring in tens or hundreds of millions of pounds so they have to be well calibrated and tested. So, I developed the HMRC Predictive Analytics Handbook (from June 2014), which we’ve shared across HMRC but also DWP, across colleagues in government.

In terms of Innovate, it is about understanding the field and latest developments. However HMRC are risk averse, so we want to see where innovation has worked elsewhere. So I did some work with Prof David Hand at Imperial College London about 20 years ago, and I got back in touch, and we developed a programme of data science learning. Not about Imperial providing training, it was a partnership between HMRC and Imperial. We looked closely at the curriculum, at demonstrating value added, and at how we could innovate in what we do.

University of Edinburgh Informatics is a really interesting one. I read a document a few years ago by the late Prof. Jon Oberlander about the way that the academic, public and private sectors working together could really benefit the Scottish economy. Two years of work led to a programme in natural language processing that was the result of close collaboration with HMRC. Jon Oberlander was hugely influential, and passionate about conversational technology and the scourge of isolation. And I was able to ask him lots of questions about AI, and when that will be truly conversational. I hope to continue that work with Bayes, but also wanted to say thank you to Jon for that.

AI is increasingly touching our lives. Wherever we are in the world, sharing our tacit knowledge will be incredibly important.


Q1) Rwanda has clearly made a deep impression. What were the most surprising things?

A1) People have stereotypes about sub-Saharan Africa that just aren’t true. For instance when you get off the plane you cannot take plastic bags in – they are an incredibly environmental country. I saw no litter anywhere in the country. The people of Rwanda are truly committed to improving the lives of people.

Q2) Do you use the same machine learning methods for low income and high income tax payers/avoiders?

A2) There are some basic machine learning methods that are consistent, but we are also looking at more novel models like boosted trees.

Q3) I worked in Malawi and absolutely back up your comment about the importance of visiting. You talked about knowledge from yourself to Rwanda, how was the knowledge exchange the other way?

A3) Great question. It wasn’t learning all from developed to developing. We learnt a great deal from our trip. That includes cultural aspects. In terms of the foundations of data science, we in the UK have used machine learning in financial services and retail for 30 – 40 years, that isn’t really achievable in these countries at the moment, and there the learning is going from developed to developing.

Closing comments – Maggie Philbin

I’ve been reflecting on the (less serious) ways data might influence my life. My son in law is in a band (White Lies) and that has given me such an insight into how the music industry use data – the gender and age of people who access your music, whether they will go to gigs etc. And in fact I was very briefly in a band myself during my Swap Shop days… We made a mock up Top of the Pops… Kids started writing in… And then BBC records decided to put it out… We had long negotiations about contracts… But I was sure no-one would buy it… It reached number 15… So we went from parodying Top of the Pops to being on Top of the Pops. And thank you to Scotland – we made number 9 here! But I hadn’t negotiated hard – we just got 0.5%. And if we’d had that data understanding that White Lies have, who knows where we would have been.

So, day one has been great. Thank you to The Data Lab, and to all the sponsors. And now we adjourn for drinks.

Posted March 22, 2018 at 10:53 am in Events Attended, LiveBlogs
Nov 17 2017

Today I am at the Scottish Government for the Digital and Information Literacy Forum 2017.

Introduction from Jenny Foreman, Scottish Government: Co-chair of community of practice with Cleo Jones (who couldn’t be here today). Welcome to the 2017 Digital and Information Literacy Forum!

Scottish Government Digital Strategy – Cat Macaulay, Head of User Research and Service Design, Scottish Government

I am really excited to speak to you today. For me libraries have never just been about books, but about information and bringing people together. At high school our library was split between 3rd and 4th year section and a 5th and 6th year section, and from the moment I got there I was desperate to get into the 5th and 6th year section! It was about place and people and knowledge. My PhD later on was on interaction design and soundscapes, but in the context of the library and seeking information… And that morphed into a project on how journalists use information at The Scotsman – and the role of the library and the librarian in their clippings library. In Goffman terms it was this backstage space for journalists to rehearse their performances. There was talk of the clippings library shutting down and I argued against that as it was more than just those clippings.

So, that’s the personal bit, but I’ll turn to the more formal bit here… I am looking forward to discussions later, particularly the panel on Fake News. Information is crucial to allowing people to meaningfully, equally and truly participate in democracy, and to be part of designing that. So, digital literacy is crucial to participation in democracy. And for us in the digital directorate, it is a real priority – for reaching citizens and for librarians and information professionals to support that access to information and participation.

We first set out a digital strategy in 2011, but we have been refreshing our strategy and putting digital at the heart of what we do. Digital is not about technology, it’s a cultural issue. We moved before from agrarian to industrial society, and we are now in the process of moving from an industrial to a digital society. We are aiming to deliver inclusive economic growth, reform public services, tackle inequalities and empower communities, and prepare people for the future workplace. Digital and information literacy are core skills for understanding the world and the future.

So our first theme is the Digital Economy. We need to stimulate innovation and investment, we need to support digital technologies industries, and we need to increase digital maturity of all businesses. Scotland is so dependent on small businesses and SMEs that we need our librarians and information professionals to be able to support that maturity of all businesses.

Our second theme is Data and Innovation. For data we need to increase public trust in holding data securely and using/sharing appropriately. I have a long term medical issue and the time it takes to get appointments set up, to share information between people so geographically close to each other – across the corridor. That lack of trust is core to why we still rely on letters and faxes in these contexts.

In terms of innovation, CivTech brings together the public sector teams and tech start-ups to develop solutions to real problems, and to grow and expand services. We want to innovate and learn from the wider tech and social media context.

The third theme is Digital Public Services, the potential to simplify and standardise ways of working. Finding common technologies/platforms built and procured once. And designing services with citizens to meet their needs. Information literacy skills and critical questioning are at the heart of this. You have to have that literacy to really understand the problems, and to begin to look at addressing them, and co-designing.

The fourth theme is Connectivity. Improving superfast broadband, improving coverage in rural areas, increasing the 4G coverage.

The fifth theme is Skills. We need to build a digitally skilled nation. I spent many years in academia – no matter how “digital native” we might assume them, actually we’ve assumed essentially that because someone can drive a car, they can build a car. We ALL need support for finding information, how to judge it and how to use it. We all need to learn and keep on learning. We also need to promote diversity – ensuring we have more disabled people, more BAME people, more women, working in these areas, building these solutions… We need to promote and enhance that, to ensure everyone’s needs are reflected. Friends working in the third sector in Dundee frequently talk about the importance of libraries to their service users, libraries are crucial to supporting people with differing needs.

The sixth theme is Participation. We need to enable everybody to share in the social, economic and democratic opportunities of digital. We need to promote inclusion and participation. That means everyone participating.

And our final theme (seven) is Cyber Security. That is about the global reputation for Scotland as a secure place to work, learn and do business. That’s about security, but it is also about trust and addressing some of those issues I talked about earlier.

So, in conclusion, this is a strategy for Scotland, not just Scottish Government. We want to be a country that uses digital to maximum effect, to enable inclusion, to build the economy, to positively deliver for society. It is a living document and can grow and develop. Collective action is needed to ensure nobody is left behind; we all remain safe, secure and confident about the future. We all need to promote that information and digital literacy.

Q1) I have been involved in information literacy in schools – and I know in schools and colleges that there can be real inconsistency about how things are labeled as “information literacy”, “digital literacy”, and “digital skills”. I’m slightly concerned there is only one strand there – that digital skills can be about technology skills, not information literacy.

A1) I echo what you’ve just said. I spent a year in a Life Sciences lab in a Post Doc role studying their practice. We were working on a microscopy tool… And I found that the meaning of the word “image” was understood differently by Life Scientists and Data Scientists. Common terminology really matters. And indeed semantic technologies enable us to do that in new ways. But it absolutely matters.

Q2, Kate SCVO) We are using a digital skills framework developed that I think is also really useful to frame that.

A2) I’m familiar with that work and I’d agree. Stripping away complexity and agreeing on common terms and approaches is a core focus of what we are doing.

Q3) We have been developing a digital skills framework for colleges and for the student lifecycle. I have been looking at the comprehensive strategy for schools and colleges by Welsh Government’s… Are there plans for similar?

A3) I know there has been work taking place but I will take that back.

Q4) I thought that the “Participation” element was most interesting here. Information literacy is key to enabling participation… Say what you like about Donald Trump but he has made the role of information literacy in democracy very vital and visible. Scotland is in a good place to support information literacy – there are many in this room have done great work in this area – but it needs resourcing to support it.

A4) My team focuses on how we design digital tools and technologies so that people can use them. And we absolutely need to look at how best to support those that struggle. But it is not just about how you access digital services… How we describe these things, how we reach out to people… I remember being on a bus in Dundee and hearing a guy saying “Oh, I’ve got a Fairer Scotland Consultation leaflet… What the fuck is a Consultation?!”. I’ve had some awkward conversations with my teenage boys about Donald Trump, and Fake News. I will follow up with you afterwards – I really welcome a conversation about these issues. At the moment we are designing a whole new Social Security framework – not a thing most other governments have had to do – and so we really have to understand how to make that clear.

Health Literacy Action Plan Update – Blythe Robertson, Policy Lead, Scottish Government

The skills, confidence, knowledge and understanding to interact with the health system and maintain good health is essentially what we mean in Health Literacy. Right now there is a huge focus in health policy on “the conversation”. And that’s the conversation between policy makers and practitioners and people receiving health care. There is a model of health and care delivery called “More than Medicine” – this is a memorable house-shaped visual model that brings together organisational processes and arrangements, health and care professionals, etc. At the moment though the patient has to do at least as much as the medical professional, with hoops to jump through – as Cat talked about before…

Instructions can seem easy… But then we can all end up at different places [not blogged: an exercise with paper, folding, eyes closed].

Back when computers first emerged you needed to understand a lot more about computer languages, you had to understand how it worked… It was complex, there was training… What happened? Well rather than training everyone, instead they simplified access – with the emergence of the iPad for instance.

So, this is why we’ve been trying to address this with Making it easy: A health literacy action plan for Scotland. And there’s a lot of text… But really we have two images to sum this up… The first (a woman looking at a hurdle)… We’ve tried to address this by creating a nation of hurdlers… But we think we should really let people walk through/remove those hurdles.

Some statistics for you: 43% of English working age adults will struggle to understand instructions to calculate a childhood paracetamol dose. There is a lot bound up here… Childhood health literacy is important. Another stat/fact: Half of what a person is told is forgotten. And half of what is remembered is incorrect. [sources: several cited health studies which will be on Blythe’s slides]. At the heart of the issue is that a lot of information is transmitted… then you ask “Do you understand?” and of course you say “yes”, even if you don’t. So, instead, you need to check information… That can be as simple as rephrasing a question to e.g. “Just so I can check I’ve explained things clearly can you tell me what you’ve understood” or similar.

We did a demonstrator programme in NHS Tayside to test these ideas… So, for instance, if you wander into Ninewells Hospital you’ll see a huge board of signs… That board is blue and white text… There is one section with yellow and blue… That’s for Visual Impairment, because that contrast is easier to see. We have the solution but… People with visual impairment come to other areas of the hospitals. So why isn’t that sign all done in the same way with high contrast lettering on the whole board? We have the solution, why don’t we just provide it across the board. That same hospital sent out some appointment letters asking recipients to comment and tell them about any confusion… And there were many points where that happened. For instance if you need the children’s ward… You need to know to follow signs for Paediatrics first… There isn’t a consistency of naming… Or a consistency of colour. So, for instance Maternity Triage is a sign in red… It looks scary! Colours have different implications, so that really matters. You will be anxious being in hospital – consistency can help reduce the levels of anxiety.

Letters are also confusing… They are long. Some instructions are in bold, some are small notes at the bottom… That can mean a clinic running 20 minutes late… Changing what you emphasise has a huge impact. It allows the health care provision to run more smoothly and effectively. We workshopped an example/mock up letter with the Scottish Conference for Learning Disability. They came up with clear information and images. So very clear to see what is happening, includes an image of where the appointment is taking place to help you navigate – with full address. The time is presented in several forms, including a clock face. And always offer support, even if some will not need it. Always offer that… Filling in forms and applications is scary… For all of us… There has to be contact information so that people can tell you things – when you look at why people were not turning up to appointments, it was that they didn’t know how to contact people, that they didn’t know they could change the appointment, that they wanted to make contact but didn’t want to make a phone call, or even that because they were already in for treatment they didn’t think they needed to explain why they weren’t at their outpatients appointment.

So, a new action plan is coming called “Making it easier”. That is about sharing the learning from Making it Easy across Scotland. To embed ways to improve health literacy in policy and practice. To develop more health literacy responsive organisations and communities. Design supports and services to better meet people’s health literacy levels. And that latter point is about making services more responsive and easier to understand – frankly I’d like to put myself out of a job!

So, one area I’d like to focus on is the idea of “Connectors” – the role of the human information intermediary is fundamental. So how can we take those competencies and roll them out across the system… In ways that people can understand… Put people in contact with digital skills, the digital skills framework… Promoting understanding. We need to signpost with confidence, and to have a sense that people can use this kind of information. Looking at librarians as a key source of information that can help support people’s confidence.

In terms of implementation… We have at (1) a product design and at (3) “Scaled up”. But what is at step (2)? How do we get there… Instead we need to think about the process differently… Starting with (1) a need identified, then planned, structured resources co-developed for success, and then having it embedded in the system… I want to take the barriers out of the system.

And I’m going to finish with a poem: This is bad enough by Elspeth Murray, from the launch of the cancer information reference group of the South East Scotland Cancer Network 20 January 2016.


Q1) I’m from Strathclyde, but also work with older people and was wondering how much health literacy is part of the health and social care integration?

A1) I think ultimately that integration will help, but with all that change it is challenging to signpost things clearly… But there is good commitment to work with that…

Q2) You talked about improving the information – the letters for instance – but is there work more fundamentally questioning the kind of information that goes out? It seems archaic and expensive that appointments are done through posted physical letters… Surely better to have an appointment that is in your diary, that includes the travel information/map….

A2) Absolutely, NHS Lothian are leading on some trial work in this area right now, but we are also improving those letters in the interim… It’s really about doing both things…

Cat) And we are certainly looking at online bookings, and making these processes easier, but we are working with older systems sometimes, and issues of trust as well, so there are multiple aspects to addressing that.

Q3) Some of those issues would be practically identical for educators… Teachers or lecturers, etc…

A3) I think that’s right. Research from University of Maastricht mapped out the 21 areas across Public and Private sectors in which these skills should be embedded… And I think those three areas of work can be applied across those areas… Have to look at design around benefits, we have some hooks around there.

Cat) Absolutely part of that design of future benefits for Scotland.

Panel Discussion – Fake News (Gillian Daly – chair; Lindsay McKrell (Strathclyde); Sean McNamara (CILIPS); Allan Lindsay (Young Scot))

Sean: CILIPS supports the library and information science community in Scotland, including professional development, skills and ethics. Some years ago “information literacy” would have been more about university libraries, but now it’s across the board an issue for librarians. Librarians are less gatekeepers of information, and more about enabling those using their libraries to seek and understand information online, how to understand information and fake news, how to understand the information they find even if they are digitally confident in using the tools they use to access that information.

Allan: Young Scot is Scotland’s national charity for information literacy. We work closely with young people to help them grow and develop, and influence us in this area. Fake News crops up a lot. A big piece of work we are involved in is the 5 Rights project, which is about rights online – that isn’t just for young people but significantly about their needs. Digital literacy is key to that. We’ve also worked on digital skills – recently with the Carnegie Trust and the Prince’s Trust. As an information agency we reach people through our website – and we ensure young people are part of creating content in that space.

Lindsay: I’d like to talk about digital literacy as well as Fake News. Digital literacy is absolutely fundamental to supporting citizens to be all that they can be. Accessing information without censorship, and a range of news, research, citizenship test information… That is all part of public libraries service delivery and we need to promote that more. Public libraries are navigators for a huge and growing information resource, and we work with partners in government, in third sector, etc. And our libraries reach outside of working hours and into remote areas (e.g. through mobile libraries) so we have unique value for policy makers through that range and volume of users. Libraries are also well placed to get people online – still around 20% of people are not online – and public libraries have the skills to support people to go online, gain access, and develop their digital literacy as well. We can help people find various sources of information, select between them, and interpret and compare information. We can grow that with our reading strategies, through study skills and after school sessions. Some libraries have run sessions on fake news, but I’m not sure how well supported these have been. We are used to displaying interesting books… But why aren’t our information resources similarly well designed and displayed – local filterable resources for instance… Maybe we should do some of this at national level, not just at local council level. SLIC have done some great work, what we need now is digital information with a twist that will really empower citizens and their information literacy…

Gillian Daly: I was wondering, Allan, how do you tackle the idea of the “Digital Native”? This idea of innate skills of young people?

Allan: It comes up all the time… This presumption that young people can just do things digitally… Some are great but many young people don’t have all the skills they need… There are misconceptions from young people themselves about what they can and cannot do… They are on social media, they have phones… But do they have an understanding of how to behave, how to respond when things go wrong… There is a lot of responsibility for all of us that just because young people use these things, doesn’t mean they understand them all. Those misconceptions apply across the board though… Adults don’t always have this stuff sorted either. It’s dangerous to make assumptions about this stuff… Much as it’s dangerous to assume that those from lower income communities are less well informed about these things, which is often not correct at all.

Lindsay: Yes, we find the same… For instance… Young people are confident with social media… But can’t attach a document for instance…

Comment from HE org: Actually there can be learning in both directions at University. Young people come in with a totally different landscape to us… We have to have a dialogue of learning there…

Gillian: Dialogue is absolutely important… How is that being tackled here…

Sean: With school libraries, transferring those skills from schools to higher education is crucial… But schools are lacking librarians and information professionals and that can be a barrier there… Not just about Fake News but wider misinformation about social media… It’s important that young people have those skills…

Comment: Fake News doesn’t happen by accident… It’s important to engage with IFLA guide to spot that… But I think we have to get into the territory of why Fake News is there, why it’s being done… And the idea of Media and Information Literacy – UNESCO brought those ideas together a few years ago. There is a vibrant GATNO organisation, which would benefit from more Scottish participation.

Allan: We run a Digital Modern Apprenticeship at Young Scot. We do work with apprentices to build skills, discernment and resilience to understand issues of fake news and origins. A few weeks back a young person commented on something they had seen on social media… At school for me “Media Studies” was derided… I think we are eating our words now… If people had those skills and were equipped to understand that media and creation process. The wider media issues… Fake News isn’t in some box… We have to be able to discern mainstream news as well as “Fake News”. Those skills, confidence, and ability to ask difficult questions to navigate through these issues…

Gillian: I read a very interesting piece by a journalist recently, looking to analyse Fake News and the background to it, the context of media working practice, etc. Really interesting.

Cat: To follow that up… I distinctly remember in 1994 in The Scotsman about the number of times journalists requested clippings that were actually wrong… Once something goes wrong and gets published, it stays there and repopulates… Misquotations happen that way for instance. That sophisticated understanding isn’t about right and wrong and more about the truthfulness of information. In some ways Trump is doing a favour here, and my kids are much more attuned to accuracy now…

Gillian: I think one of the scariest things is that once the myth is out, it is so hard to dispel or get rid of that…

Comment: Glasgow University has a Glasgow Media Group and they’ve looked at these things for years… One thing they published years ago, “Bad News”, looked at for instance the misrepresentation of Trade Unionists in news sources, for a multitude of complex reasons.

Sean: At a recent event we ran we had The Ferret present – those fact checking organisations, those journalists in those roles to reflect that.

Jenny: The Ferret has fact checking on a wonderful scale to reflect the level of fakeness…

Gillian: Maybe we need to recruit some journalists to the Digital and Information Literacy Forum.

And on that, with many nods of agreement, we are breaking for lunch.

Information Literacy & Syrian New Scots – Dr Konstantina Martzoukou, Postgraduate Programme Leader, Robert Gordon University

This project was supposed to be a scoping study of Syrian New Scots – Syrian Refugees coming to Scotland. The background to this is the Syrian Civil War since 2011, which has led to an enormous amount of refugees, mainly in the near region. Most research has been on Asylum seekers in the camps near Syria on basic survival and human rights, on their needs and how to respond to them. The aim of this project was different: a scoping study to examine the information related experiences and information literacy practices of Syrian new Scots during their resettlement and integration. So this is quite different as the context is relatively settled, and is about that resettlement process.

In September 2015 the Prime Minister announced an expansion of the refugee programme to take up to 2000 Syrian Refugees. And the first place Syrian Refugees came was Glasgow. Now, there have been a lot of changes since then but there is the intent to resettle 2000 Syrian Refugees by 2020.

Primary research was done with 3 refugee resettlement officers, as well as focus groups with Syrian new Scots. These groups were in both urban (1 group) and rural (2 groups) areas, and included 38 people from across Syria, having been in camps in Lebanon, Turkey and Iraq and Jordan. I didn’t know what to expect – these people had seen the worst horrors of war. In reality the focus groups were sometimes loud and animated, sometimes quiet and sad. And in this group they came from a huge range of professional backgrounds, though most of the women did not work.

So, the areas our work looked at included English language and community integration; Information provisions, cultural differences and previous experiences; Financial security. Today I want to focus on libraries and the role of libraries.

One of the most crucial aspects was language and sociocultural barriers. The refugees were given ESOL classes; a welcome pack with key information for finding the resources in their neighbourhood; a 24 hour Arabic hotline, set up with the mosque for emergencies so that families could receive help outside core working hours; In-house translation services. But one of the challenges across the support given was literacy as a whole – not all of the refugees could read and write in any language. But it was also about understanding interchangeable words – “doctor” has a meaning but “GP” not so much. There was also a perception that learning English would be really difficult.

The refugees wanted to know how to learn English, and they were anxious about that. The support officers had different approaches. The ESOL classes were there, but some officers were really proactive, taking refugees to the train station, having mock job interviews… That was really really valuable. But some groups, even a year after arriving, weren’t speaking English. But sometimes that was about the families… Some were confident and really well travelled, but some had lived in one place, and not travelled, and found communication and networking much more difficult. So the language learning was very tied to socio-cultural background.

Many of these families have complex health needs – they were hand picked to come here often because of this – and that causes its own challenge. Some had no experience of recycling and of how to correctly put their bins out. Someone felt the open plan kitchen was difficult – that her child was burned because of it. One reported a neighbour telling him not to play with his son outside – the boundaries of danger and expectations of childhood were rather different from their new neighbours. Doctors appointments were confusing. Making bus change was expensive – buying something unneeded because the buses don’t give change. Many wanted family reunion information and support.

Technology is used, but technology is not the key source of information. They used mobile phones with pay as you go sim cards. They used WhatsApp and were sharing quite traumatic memories and news in this way.

The library is there… But actually they are perceived as being for books and many refugees don’t go there. Community classes, meals etc. may be better. Computer classes can be useful, especially when refugees can participate in a meaningful way. And there are real challenges here – computer classes in the library didn’t work for this group as there were too few computers and the internet connections were too small.

For me the key thing is that we need to position the library as a key place for communication, learning and support for the families.

Q1) Alamal(?) is running events in our libraries – we have an event with films and telling their story – and we have had huge interest in that.

A1) We really want to showcase that to the wider community. There are some great examples from England, from other EU countries, but we want more Scottish examples so do please get in touch.

A User Study Investigating the Information Literacy of Scotland Teenagers – David Brazier, Research Assistant, Northumbria University

This is an ILG funded project looking at the Information Literacy of Scottish Teenagers. I’ll introduce the concepts, going through some related works, and then some of the methodology we’d like to implement. So, information literacy is about the ability to seek, understand and assess information. Those abilities are crucial to integrating with modern society as a whole. We need to empower students to learn, so they can integrate themselves into modern society.

As the panel talked about earlier, the idea of the “Digital Native” is misleading. Young people have a poor understanding of their information needs. That leads to young people taking the top ranked documents/sites and citing those. And that needs to be counteracted early in their learning so that it doesn’t carry through and right into University (Rowlands 2008). In recent research (Brazier and Harvey 2017) ESOL post graduates were unable to perceive their performance correctly, often judging high performance when the opposite was true. In the “Not Without Me” report this inability to assess their own skills was also highlighted in the wider range of young people. These groups are highly educated, so they should be able to be more reflective on their own practice.

So, in our research, we are using a Mixed Methods approach to do a quantitative analysis of secondary school-aged children’s information gathering behaviour, triangulated with qualitative assessments of the participants’ own assessment. It is built around a simulated work task.

The search system is based on the TREC AQUAINT collection – a large set of over a million documents from three large news agencies collected between 1996 and 2000. Pre-defined search topics are associated with the project. The initial 15 topics were reduced down to 4 topics selected by school representatives (librarian and 2 teachers from Gracemount High School in Edinburgh).

So, we start with a pre-task questionnaire. The search task is “Tropical storms: What tropical storms (hurricanes and typhoons) have caused significant property damage and loss of life?”. They can then search through a Google-style search of the documents. They click on those sources that seem relevant. And then they get a questionnaire to reflect on what they’ve done.

A pilot was conducted in December 2016. Tasks were randomly selected, using a Latin Square design to ensure no 2 students had the same two tasks. In total 19 students were involved, from S3 (13-14 years old). The study was on PCs rather than handheld devices. No other demographic data was collected. The school representative did provide a (new) unique id to match the task and the questionnaires. The id was known only to the school rep. No further personal data was taken.
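[Blogger's aside: a minimal sketch of how a Latin Square could be used to hand out tasks, assuming a simple cyclic square over the 4 topics and 2 tasks per student. The topic labels, student ids and the `assign_tasks` helper are all hypothetical, not taken from the study. Note that with 4 topics there are only 4 rows in the square, so with 19 students the rows are reused; what this sketch balances is which topic appears in which position.]

```python
def latin_square(items):
    """Build a cyclic Latin square: row i is the item list rotated left by i."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_tasks(student_ids, topics, tasks_per_student=2):
    """Give each student the first few topics of one row, cycling through rows."""
    square = latin_square(topics)
    assignments = {}
    for index, student in enumerate(student_ids):
        row = square[index % len(square)]
        assignments[student] = row[:tasks_per_student]
    return assignments


# Hypothetical labels standing in for the 4 selected topics and 19 pilot students.
TOPICS = ["topic_A", "topic_B", "topic_C", "topic_D"]
STUDENTS = [f"S{n:02d}" for n in range(1, 20)]

plan = assign_tasks(STUDENTS, TOPICS)
for student in STUDENTS[:4]:
    print(student, plan[student])
```

Every topic appears exactly once in each row and each column of the square, so over a full cycle of four students each topic is seen first once and second once.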

We could then look at the queries each student submitted, and were able to ask why they did that and why they selected the article they did.

This is a work in progress… We are interested in how they engage with the study as a whole. We have used the findings of the pilot to adapt the study design and interface, including a task description relocated to a more prominent location; and an instruction sheet (physical) i.e. browser page, interpret interface.

The main study takes place next week, with 100 students (none of whom were part of the pilot). From this we want to get recommendations and guidelines for IL teaching; to inform professional practice; feedback to participants (pamphlet) for reflective purposes; academic publications in the field of information literacy, information retrieval, education and pedagogy.


Q1) Why such a controlled space was selected – presumably students would normally use other places to search, to ask friends etc. So I wondered why you selected such a controlled space like this.

A1) In the previous study we allowed students to look anywhere on the web… But it is much harder to judge relevance in that… These have already been judged for relevance… It’s a wide arc… It adds complexity to the whole process… And someone has to transcribe and mark that footage… For my study there were 29 students and it took 7 months. For 100 students that’s just too large. The test collection is also standardised and replicable.
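[Blogger's aside: to illustrate why pre-judged relevance makes scoring tractable, here is a small sketch, assuming each topic comes with a set of document ids already judged relevant and that we log which documents a student clicked. The document ids and the `precision_recall` helper are hypothetical, and this is not necessarily the measure the study itself uses.]

```python
def precision_recall(clicked, relevant):
    """Score a student's clicked documents against pre-judged relevant ones."""
    clicked_set = set(clicked)
    relevant_set = set(relevant)
    hits = clicked_set & relevant_set
    # Precision: share of clicked documents that were judged relevant.
    precision = len(hits) / len(clicked_set) if clicked_set else 0.0
    # Recall: share of judged-relevant documents that the student found.
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical relevance judgements for one topic and one student's clicks.
judged_relevant = {"doc_03", "doc_07", "doc_11", "doc_20"}
student_clicks = ["doc_03", "doc_05", "doc_07"]

precision, recall = precision_recall(student_clicks, judged_relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because the judgements exist up front, scoring is a set comparison per student rather than months of manually marking screen recordings.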

The Digital Footprint MOOC – Nicola Osborne, Digital Education Manager, EDINA

This was me… No notes but slides to follow. 

Wikipedia & Information Literacy: the importance of reliable sources – Sara Thomas, Wikimedian in Residence, SLIC

Hi, I’m Wikimedian in Residence at SLIC. The role of a Wikimedian in residence is to work with cultural heritage organisations and Wikimedia and bring the two together. In this role we are working with three local libraries right now but we will be expanding it to a wider Scottish context.

I am passionate about open knowledge and open data. Open data and open knowledge leads to better society, it allows us to make better decisions – I am sick of us being asked to make big decisions about big issues without appropriate information.

Now, I want to introduce you to Bassel Khartabil who was an open source software developer and advocate for open data and knowledge. Knowledge is power… He was detained by the Syrian government and, before he was killed by the government, he wrote very movingly about the impact of open knowledge, that it is so important and a matter of life and death in some contexts.

I want to talk about production of knowledge and what that can teach us about information literacy. Jim Groom at #OER16, said “Wikipedia is the single greatest Open Education Resource the world has ever known”, and he’s not wrong. Wikipedia is more accurate than you may think. There are groups who just edit and work on improving the quality of articles. Women in Red is a group dedicated to having more women’s biographies on Wikipedia. 17% of biographies are women now, that’s 2% more than was the case 2 years ago – and they also work on bringing those biographies up to “featured article” quality.

Quality and ratings scale. Vandalism is picked up quickly – by bots and by people. Wikipedia is neutral in its point of view. Nature, in 2005, found that Wikipedia was nearly as accurate as Britannica (2.92 errors per article compared to 3.86 on Wikipedia). The Journal of Clinical Oncology, 2010, found Wikipedia as accurate as Physician Data Query (a premium database). The medical information there is huge – 80% of medical students will use it; ~50% of GPs will use it as a first point in their search. It is the most popular health resource on the web.

Wikipedia is generally the seventh most popular site on the internet. And we have a basic Notability guidance that means an article must be notable, there must be a reason for it being there. The information must be verifiable – the information must come from credible, checkable, verifiable sources. And we have to use reliable third party published sources with a reputation for fact checking and accuracy.

On the subject of media literacy… The Daily Mail didn’t like that Wikipedia doesn’t treat it as reliable – there is no ban but you will get a trigger to ask you if that’s the right source. Brilliantly, they got loads of errors in their own outraged article.

Manipulation is really obvious… The community spots when people are trying to whitewash their own biographies, to promote their company, to try to remove claims of misconduct. And Wikipedia gets it – there is an article on “Wikipedia is not a credible source” – we get it. We are a starting point, a jumping off and discovery point. And in fact we have Wiki Ed, which works to combat fake news, to support information literacy. If you want to teach information literacy, wiki can help you. We have a Wiki Education Dashboard – mainly in the US, but lots in the UK. Our guides include: Instructor Basics and Case Studies for using Wikipedia in teaching. Some lovely projects here…

I did some work with Chris Harlow, at University of Edinburgh, a few years ago… He found a medical term that wasn’t in Wikipedia, gave his students guidance on how to create a Wikipedia page, taught them how to use a medical database, and sent them away to write a section in simple language… Then we show them how to edit an article. It’s really really easy to edit an article now… The students write their section, put it in… And write a page, it goes live… Five minutes later it’s on the front page of Google. It is gratifying to find work so immediately valued and used and useful.

Translation studies at UoE also use Wikipedia in the classroom. Queen Mary University of London use Wikipedia in their film classes. They trialled it, it’s now a compulsory part of the programme. It’s a way to teach digital skills, information synthesis. Imperial College London are working to engage undergraduate students involved in synthesising and sharing university. Greg Singh at Stirling University uses WikiBooks… Which is a project that seeks to create collaboratively produced text books… To produce a text book, a chapter, on what they’ve been doing… It’s about developing collaboration, track that, instill that within a student…

So I have a video here of Aine Kavanagh from Reproductive Biology at the University of Edinburgh, who authored an article that has been read 20,000 times in the last year. Aine was looking for some extra work, and she wanted to develop her skills. She asked Chris (Harlow) what she could do… She wrote about one of the most common sorts of cancers which there was very little information about. To be able to see the value of that, the impact of that work, has been hugely gratifying.

To conclude: open knowledge is important, open knowledge gives us a better society, not just being able to find this information but also be able to produce that knowledge is hugely powerful. And Wikipedia is more accurate than you think!


Gillian: I just want to thank all of our speakers, to thank all of you for coming, and to thank the Scottish Government for hosting us.

Oct 04 2017

This afternoon I’m at the Keynote Session for Information Security Awareness Week 2017 where I’ll be speaking about Managing Your Digital Footprint in the context of security. I’ll be liveblogging the other keynotes this afternoon.

The event has begun with a brief introduction from Alistair Fenemore, UoE’s Chief Information Security Officer, and from his colleague David Creighton Offord, the organiser for today’s event.

Talk by John Whitehouse, PWC Cyber Security Director Scotland covering the state of the nation and the changing face of Cyber Threat

I work at PWC, working with different firms who are dealing with information security and cyber security. In my previous life I was at Standard Life. I’ve seen all sorts of security issues so I’m going to talk about some of the things I’ve seen, trends, I’ll explain a few key concepts here.

So, what is cybersecurity… People imagine people in basements with balaclavas… But it’s not that at all…

I have a video here…

(this is a Jimmy Kimmel comedy segment on the Sony hack where they ask people for their passwords, to tell them if it’s strong enough… And how they construct them… And/or the personal information they use to construct that…)

We do a lot of introductions for boards… We talk about technical stuff… But they laugh at that video and then you point out that these could all be people working in their companies…

So, there is technical stuff here, but some of the security issues are simple.

We see huge growth due to technology, and that speaks to businesses. We are going to see 1 billion connected devices by 2020, and that could go really really wrong…

There is real concern about cyber security, and they have concerns about areas including cloud computing. The Internet of Things is also a concern – there was a study that found that the average connected device has 25 security vulnerabilities. Dick Cheney had to have his pacemaker re programmed because it was vulnerable to hacking via Bluetooth. There was an NHS hospital in England that had to pause a heart surgery when the software restarted. We have hotel rooms accessible via phones – that will come to homes… There are vulnerabilities in connected pet feeders for instance.

Social media is used widely now… In the TalkTalk breach we found that news of the breach had leaked via speculation just 20 seconds after the breach occurred – that’s a big challenge to business continuity planning where one used to plan that you’d perhaps have a day’s window.

Big data is coming with regulations, threats… Equifax lost over 140 million records – and executives dumped significant stock before the news went public which brings a different sort of scrutiny.

Morrisons were sued by their employees for data leaked by an annoyed member of staff – I predict that big data loss could be the new PPI as mass claims for data loss take place. So maybe £1000 per customer per data breach… We do a threat intelligence service by looking on the dark net for data breaches. And we already see interest in that type of PPI class suit approach.

The cyber challenge extends beyond the enterprise – on shore, off shore; 1st through to 4th parties. We’ve done work digging into technology components and where they are from… It’s a nightmare to know who all your third parties are… It’s a nightmare and a challenge to address.

So, who should you be worried about? Threat actors vary… We have accidental loss, malware that is not targeted, and hacker hobbyists at the lowest level of sophistication, through to state sponsored attacks at the highest level of sophistication. Sony were allegedly breached by North Korea – that firm spends astronomical amounts on security and that still isn’t totally robust. Target lost 100 million credit card details through a third party air conditioner firm, which a hacker used to get into the network, and that’s how the loss occurred. And when we talk organised crime we are talking about really organised crime… One of the Ukrainian organised crime groups were offering a Ferrari for their employee of the month prize for malware. We are talking seriously Organised. And serious financial gain. And it is extremely hard to trace that money once it’s gone. And we see breaches going on and on and on…

Equifax is a really interesting one. There are 23 class action suits already around that one and that’s the tip of the iceberg. There has been a lot of talk of big organisations going under because of cyber security, and when you see these numbers for different companies, that looks increasingly likely. Major attacks lead to real drops in share prices and real impacts on the economy. And there are tangible and intangible costs of any attack… From investigation and remediation through to CEOs and CTOs losing their jobs or facing prison time – at that level you can be personally liable in the event of an attack.

In terms of the trends… 99% of exploited vulnerabilities (in 2014) had been identified for more than a year, some as far back as 1999. Wannacry was one of these – firms had 2 months notice and the issues still weren’t addressed by many organisations.

When we go in after a breach, typically the breach has been taking place for 200 days already – and that’s the breaches we find. That means the attacker has had access and has been able to explore the system for that long. This is very real and firms are dealing with this well and really badly – some real variance.

One example, the most successful bank robbery of all time, was the attack on the Bangladesh Central Bank in Feb 2016 through the SWIFT network. The transfer instructions totalled over US $900 million, mostly laundered through casinos in Macau. The analysis identified that malware was tailored for the target organisation based on the printers they were using, which scrubbed all entry and exit points in the bank. The US Secret Service found that there were three groups – two inside the bank, one outside executing the attack.

Cyber security concerns are being raised, but how can we address this as organisations? How do we invest in the right ways? What risk is acceptable? One challenge for banks is that they are being asked to use Fintechs and SMEs working in technology… But some of these startups are very small and that’s a real concern for heads of security in banks.

We do a global annual survey on security, across about 10,000 people. We ask about the source of compromise – current employees are the biggest by some distance. And current customer data, as well as IPR, tend to be the data that is at risk. We also see Health and Social Care adopting more technology, and having high concern, but spending very little to counter the risks. So, with Wannacry, the NHS were not well set up to cope and the press love the story… But they weren’t the target in any way.

A few Mythbusters for you…

Anti-Virus software… We create Malware to test our clients’ set up. We write malware that avoids AVs. Only 10-15% of malware will be caught with Anti-Virus software. There is an open source tool, Veil-Framework, that teaches you how to write that sort of Malware so that you can understand the risks. You should be using AV, but you have to be aware that malware goes beyond that (and impacts Macs too)… There is a malware SaaS business model on the darknet – as an attacker you’ll get a guarantee for your malware’s success and support to use it!

Myth 2: we still have time to react. Well, no, the lag from discovery to impacting you and your set up can be minutes.

Myth 3: well it must have been a zero day that got us! True Zero Day exploits are extremely rare/valuable. Attackers won’t use one unless the target is very high value and they have no other option. They are hard to use. Even the NSA admits that persistence is key to successful compromise, not zero day exploits. The NSA created EternalBlue – a zero day exploit – and that was breached and deployed out to these “good guys” as Wannacry.

Passwords… They are a thing of the past I think. 2-factor authentication is more where we are at. Passphrases and strength of passphrases is key. So complex strings with a number and a site name at the end are recommended these days. Changing every 30 days isn’t that useful – it’s so easy to bruteforce the password if lost – much better to have a really strong hash in the first place.

Phishing email is huge. We think about 80% of cyber attacks start that way. Beware spoofed addresses, or extremely small changes to email addresses.

We had a client that had an email from their “finance director” about urgently paying money to an account, which was only spotted because someone in finance noticed the phrasing… “the chief exec never says “Thanks”!”

Malware trends: our strong view is that you should never ever pay for a Ransomware attack.

I have another video here…

(In this video we have people having their “mind read” for some TV show… It was uncanny… And included spending data… But it wasn’t psychic… It was data that they had looked up and discovered online… )

It’s not a nice video… This is absolutely real… This whole digital footprint. We do a service called Digital Footprinting for senior execs in companies, and you have to be careful about it as they can give so much away by what they and those around them post… It’s only getting worse and more pointed. There are threat groups going for higher value targets, they are looking for disruption. We think that the Internet of Things will open up the attack surface in whole new ways… And NATS – the Air Traffic people – they are thinking about drones and the issues there around fences and airspace… How do you prepare for this? Take the connected home… These fridges are insecure, you can detect if the fridge is opened or not, and so detect if the owner is at home or not… The nature of threats is changing so much…

In terms of trends the attacks are moving up the value chain… Retail bank clients aren’t interesting compared to banks’ finance systems, more to exchanges or clearing houses. It’s about value of data… Data is maybe $0.50 for email credentials; a driving license is maybe $25… and upwards the price goes depending on value to the attackers…

So, a checklist for you and your work: (missed this but delighted that digital footprint was item 1)

Finally, go have a look at your phone and how much data is being captured about you… Check your iPhone frequent locations. And on Android check Google Location History. The two biggest companies in the world, Google and Facebook, are free, and they are free because of all the data that they have about you… But the terms of service… Paypal’s are longer than Hamlet. If you have a voice control TV from Samsung and you sign those, you agree to always on and sharable with third parties…

So, that’s me… Hopefully that gave you something to ponder!


Q1) What does PWC think about Deloitte’s recent attack?

A1) Every firm faces these threats, and we are attacked all the time… We get everything thrown at us… And we try to control those but we are all at risk…

Q2) What’s your opinion on cyber security insurance?

A2) I think there is a massive misunderstanding in the market about what it is… Some policies just cover recovery, getting a response firm in… When you look at Equifax, what would that cover… That will put insurers out of business. I think we’ll see government backed insurance for things like that, with clarity about what is included, and what is out of scope. So, if, say, SQL Injection is the cause, that’s probably negligence and out of scope…

Q3) What role should government have in protecting private industry?

A3) The national cyber security centre is making some excellent progress on this. Backing for that is pretty positive. All of my clients are engaging and engaged with them. It has to be at that level. It’s too difficult now at lower levels… We do work with GCHQ sharing information on upcoming threats… Some of those are state sponsored… They even follow working hours in their source location… Essentially there are attack firms…

Q4) (I’m afraid I missed this question)

A4) I think Microsoft in the last year have transformed their view… My honest view is that clients should be on Windows 10; it’s a gamechanger for security. Firms will do analysis on patches and service impacts… But they delayed that a bit long. I have worked at a firm with a massively complex infrastructure, and it sounds easy to patch but it can be quite difficult to do that in practice, and it can put big operational systems at risk. As a multinational bank for instance you might be rolling out to huge numbers of machines and applications.

Talk by Kami Vaniea (University of Edinburgh) covering common misconceptions around Information Security and how to avoid them

My research is on the usability of security and why some failings are happening from the point of view of an average citizen. I do talks to community groups – so this presentation is a mixture of that sort of content and proper security discussion.

I wanted to start with misconceptions as system administrators… So I have a graph here of where there is value to improving your password; then the range in which having rate limits on password attempts; and the small area of benefit to the user. Without benefits you are in the deadzone.

OK, a quick question about URL construction… Is it Facebook’s website, Facebook’s mobile site, AT&T’s website, or Mobile’s website? It’s the last one by construction. It’s both of the last two if you know AT&T own … But when you ask a big audience they mainly get it right. Only 8% can correctly differentiate … vs … Many users tend to just pick a big company name regardless of location in URLs. A few know how to correctly read subdomain URLs. We did this study on Amazon Mechanical Turk – so that’s a skewed sample of more technical people. And that URL understanding has huge problematic implications for phishing email.

We also tried … Most people could tell that was Twitter (not Facebook). But if I used “@” instead of “/” people didn’t understand, thought it was an email…
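The hostname-reading rule behind this quiz can be sketched in a few lines of Python. This is only an illustration, not the study's materials: the URLs below are hypothetical (the originals are not preserved in these notes), and `naive_site` is a made-up helper that simply keeps the last two labels of the hostname.

```python
from urllib.parse import urlsplit


def naive_site(url: str) -> str:
    """Return the last two hostname labels, e.g. 'example.net'.

    Naive on purpose: finding the real registrable domain needs the
    Public Suffix List (this helper would get 'bbc.co.uk' wrong).
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


# Hypothetical URLs: a brand name used as a subdomain, and as
# "userinfo" placed before an "@".
examples = [
    "http://www.bigbrand.example.net/login",
    "http://bigbrand.com.mobile.example.net/",
    "http://bigbrand.com@example.net/",
]

for url in examples:
    print(f"{url} -> {naive_site(url)}")
```

In all three cases the site that actually receives the request is `example.net`; the brand name is either a subdomain chosen by that site's owner or, in the "@" case, a username that the browser strips before connecting.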

On the topic of email… Can we trust the “from” field? No. Can we trust a “this email has been checked for viruses…” box? No. Can you trust the information on the source URL for a link in the email, that is shown in the bottom of the browser? Yes.

What about this email – a Security alert for your linked Google account email? Well this is legitimate… Because it’s coming from … But you knew this was a trick question… Phishing is really tricky…

So, a shocking percentage of my students think that “from” address is legitimate… Tell your less informed friends how easily that can be spoofed…

What about Google. Does Google know what you type as you type it and before you hit enter? Yes, it does… Most search engines send text to their servers as you write it. Which means you can do fun studies on what people commonly DON’T post to Facebook!

A very common misconception is that opening web pages, emails, pdfs, and docs is like reading physical paper… So why do they need patching?

Lets look at an email example… I don’t typically get emails with “To protect your privacy, Thunderbird has blocked remote content in this message” from a student… This showed me that a 1 pixel invisible image had come with the email… which pinged the server if I opened it. I returned the email and said he had a virus. He said “no, I used to work in marketing and forgot that I had that plugin set up”.
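To make the tracking pixel idea concrete, here is a minimal sketch of how a mail client might spot remote images in an HTML message before deciding whether to load them. It is not Thunderbird's implementation; the class name, the helper and the `tracker.example` URL are all assumptions for illustration.

```python
from html.parser import HTMLParser


class RemoteImageFinder(HTMLParser):
    """Collect <img> tags whose source is fetched from a remote server."""

    def __init__(self):
        super().__init__()
        self.remote = []  # list of (src, looks_like_tracking_pixel)

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        if src.startswith(("http://", "https://")):
            # A 1x1 image is the classic "was this email opened?" beacon.
            tiny = attributes.get("width") == "1" and attributes.get("height") == "1"
            self.remote.append((src, tiny))


def find_remote_images(html: str):
    finder = RemoteImageFinder()
    finder.feed(html)
    return finder.remote


# Hypothetical message body with an invisible 1x1 image.
body = '<p>Hi!</p><img src="https://tracker.example/open?id=123" width="1" height="1">'
for src, tiny in find_remote_images(body):
    print(src, "(1x1 pixel)" if tiny else "")
```

Loading that image sends a request to the remote server, which is how the sender learns the message was opened; blocking remote content by default prevents that request.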

Websites are made of many elements from many sources. Mainly dynamically… And there are loads of trackers across those sites. There is a tool called Lightbeam that will help you track the sites you go to on purpose, and all the other sites that track you. That’s obviously a privacy issue. But it is also a security problem. The previous speaker spoke about supply chains at Target, this is the web version of this… That supply chain gets huge when you visit, say, six websites.

So, a quiz question… I go to Yahoo, I hit reload… Am I running the same code as a moment ago…? Well, it’s complicated… I had a student run a study on this… And how much changes… In a week about half of the top 200 sites had changed their javascript. I see trackers change between individual reloads… But it might change, it might not…
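One way to check whether a page is running the same code after a reload is to fingerprint each script and compare the fingerprints. The sketch below assumes you already have the script bodies keyed by URL (fetching them is left out), and it is not the method used in the student study; the function names and URLs are hypothetical.

```python
import hashlib


def fingerprint(script_text: str) -> str:
    """SHA-256 digest of a script body, used as a stable identity."""
    return hashlib.sha256(script_text.encode("utf-8")).hexdigest()


def snapshot(scripts: dict) -> dict:
    """Map each script URL to the fingerprint of its content."""
    return {url: fingerprint(body) for url, body in scripts.items()}


def changed(before: dict, after: dict) -> set:
    """URLs that were added, removed or altered between two snapshots."""
    return {url for url in before.keys() | after.keys()
            if before.get(url) != after.get(url)}


# Hypothetical page loads: the site's own script is stable, the tracker is not.
first = snapshot({"https://site.example/app.js": "a()",
                  "https://ads.example/t.js": "track(1)"})
second = snapshot({"https://site.example/app.js": "a()",
                   "https://ads.example/t.js": "track(2)"})
print(changed(first, second))
```

Any URL in the returned set means the browser executed different code on the second load, even though the page looked the same.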

So you as a user access a first party website, then they access third party sites… So they access ad servers and that sells that user, and an ad is returned, with an image (sometimes with code). Maybe I bid to a company, that bids out again… This is huge as a supply chain and tracking issue…

So the Washington Post, for instance, covering the malware attack showed that malicious payloads were being delivered to around 300k users per hour, but only about 9% (27k) users per hour were affected – they were the ones that hadn’t updated their systems. How did that attack take place? Well rather than attack, they just bought an ad and ran malware code.

There is a tool called Ghostery… It’s brilliant and useful… But it’s run by the ad industry and all the trackers are set the wrong way. Untick those all and then it’s fascinating… They tell you about page load and all the components involved in loading a page…

To change topic…

Cookies! Yes, they can be used to track you across web sites. But they can’t give you malware as is. So… I will be tackling the misconception that cookies are evil… And I’m going to try to convince you otherwise. Tracking can be evil… But cookies are kind of an early example of privacy by design…

It is 1994. The internet cannot remember anyone between page loads. You have an interaction with a web server that has absolutely no memory. Cookies help something remember between page loads and web pages… Somehow a server has to know who you are… But back in 1994 you just open a page and look at it, that’s the interaction point…

But companies wanted shopping baskets, and memory between two page reloads. There is an obvious technical solution… You just give every browser a unique identifier… Great! The server remembers you. But the problem is a privacy issue across different servers… So, Netscape implemented cookies – small text strings the server could ask the browser to remember and give back to it later…

Cookies have some awesome properties: it is client visible; third party tracking is client visible too; it’s opt out (delete) option on a per-site basis; it’s only readable by the site that set it; and it allows for public discussion of tracking…

… Which is why Android/iOS both went with the unique ID option. And that’s how you can be tracked. As a design decision it’s very different…
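The cookie mechanism described above can be sketched with Python's standard `http.cookies` module. This is a toy model under stated assumptions, not how any real browser is built: the cookie jar is a plain dict keyed by site, and `basket_id`, `shop.example` and `news.example` are invented names.

```python
from http.cookies import SimpleCookie

# Server side: ask the browser to remember a small text string.
response_cookie = SimpleCookie()
response_cookie["basket_id"] = "abc123"
response_cookie["basket_id"]["path"] = "/"
set_cookie_value = response_cookie["basket_id"].OutputString()
print("Set-Cookie:", set_cookie_value)


def browser_store(jar: dict, site: str, header_value: str) -> None:
    """Browser side: keep the cookie, filed under the site that set it."""
    parsed = SimpleCookie()
    parsed.load(header_value)
    for name, morsel in parsed.items():
        jar.setdefault(site, {})[name] = morsel.value


def browser_cookie_header(jar: dict, site: str) -> str:
    """Only cookies set by `site` are sent back to `site`."""
    return "; ".join(f"{name}={value}" for name, value in jar.get(site, {}).items())


jar = {}
browser_store(jar, "shop.example", set_cookie_value)
print("To shop.example:", browser_cookie_header(jar, "shop.example"))
print("To news.example:", browser_cookie_header(jar, "news.example"))

# The client can inspect the jar and delete a site's cookies at any time.
jar.pop("shop.example", None)
```

The jar lives on the client, so the user can see what is stored, see which sites stored it, and delete it per site, which is the contrast with a fixed device-wide identifier.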

Now to some of the research I work on… I believe in getting people to touch stuff, to interact with it… We can talk to each other, or mystify, but we need to actually have people understand this stuff. So we ran an outreach activity to build a website, create a cookie, and then read the cookie out… Then I give a second website… To let people try to understand how to change their names on one site, not the other… What happens when you view them in Incognito mode… And then exploring cookies across sites. And how that works…

Misconception: VPNs solve all privacy and security problems. Back at Indiana I taught students who couldn’t code… And that was interesting… They saw VPNs as magic fairy dust. And they had absorbed this idea that anyone can be hacked at any time… They got that… But that had resulted in “but what’s the point”. That worries me… In the general population we see media coverage of attacks on major companies… And the narrative that attacks are inevitable… So you end up with this problem…

So, I want to talk about encryption and why it’s broken and what that means for VPNs. I’m not an encryption specialist. I care about how it works for the user.

In encryption we want (1) communication between you and the other party to be confidential and unchanged, so no-one can read what you sent and no-one can change what you sent; and (2) to know who we are talking to. And that second part is where things can be messed up. You can make what you think is the secure connection to the right person, but it could be a secure connection to the wrong person – a man in the middle attack. A real world example… You go to a coffee shop and use wifi to request the BBC news site, but you get a wifi login page. That’s essentially a man in the middle attack. That’s not perhaps harmful, it’s normal operating procedure… VPNs basically work like this…

So, an example of what really happened to a student… I set up a page that just had them creating a very simple cookie page… I was expecting something simple… But one of them submitted a page with a bit of javascript… it is basically injecting code so if I connect to it, it will inject an ad to open in my VPN…. So in this case a student logged in to AnchorFree – magic fairy dust – and sees a website and injects code that is what I see when they submit the page in Blackboard Learn…

VPNs are not magic fairy dust. The University runs an excellent VPN – far better for coffee shops etc!

So, I like to end with some common advice:

  • Install anti virus scanner. Don’t turn off Windows 8+ automatically installed AV software… I ran a study where 50% of PhD students had switched off that software and firewalls…
  • Keep your software updated – best way to stay safe
  • Select strong passcode for important things you use all the time
  • For less important things that you use rarely, use a password manager… Best to have different passwords between them…
  • Software I use:
    • Ad blockers – not just ads, reduce lots of extra content loading. The more websites you visit the more vulnerable you are
    • Ghostery and Privacy Badger
    • Lightbeam
    • Password Managers (LastPass, 1Password and KeePass are most recommended)
    • 2-factor like Yubikey – extra protection for e.g. Facebook.
    • If you are really serious: uMatrix and NoScript BUT it will break lots of pages…


Q1) It’s hard to get an average citizen to do everything… How do you get around that and just get the key stuff across…

A1) Probably it’s that common advice. The security community has gotten better at looking at 10 key things. Google did a study with Blackhats Infosec conference about what they would do… And asked on Amazon Mechanical Turk about what they would recommend to friends. About the only common answer amongst blackhats was “update your software”. But actually there is overlap… People know they should change passwords, and should use AV software… But AV software didn’t show on the Blackhat list… But 2-factor and password managers did…

Q2) What do you think about passwords… long or complex or?

A2) We did a study maybe 8 years ago on mnemonic passwords… And found that “My name is Inigo Montoya, you killed my father, prepare to die” was by far the most common. The issue isn’t length… It’s entropy. I think we need to think server side about how many other users have used the same password (based on encrypted version), and you need something that less than 3 people use…

Q2) So more about inability to remember it…

A2) And it depends on threat type… If someone knows you, your dog, etc… Then it’s easier… But if I can pick a password for a long time I might invest in it – but if you force people to change passwords they have to remember it. There was a study that people using passwords a lot use some affirmations, such as “I love God”… And again, hard to know how you protect that.
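The "it's entropy, not length" point can be made concrete with a small calculation. The sketch below uses the standard estimate for a secret chosen uniformly at random (length × log2(pool size)); the pool sizes are assumptions picked for illustration (94 printable ASCII characters, a 7,776-word list), and the estimate says nothing about a phrase chosen by a human.

```python
import math


def random_choice_entropy_bits(pool_size: int, length: int) -> float:
    """Entropy in bits of `length` symbols drawn uniformly from `pool_size` options."""
    return length * math.log2(pool_size)


# Assumed pools: 94 printable ASCII characters; a 7,776-word list.
eight_random_chars = random_choice_entropy_bits(94, 8)
four_random_words = random_choice_entropy_bits(7776, 4)

print(f"8 random characters: {eight_random_chars:.1f} bits")
print(f"4 random words:      {four_random_words:.1f} bits")
```

The formula only holds for random choices. A long but popular quote carries very little entropy, because an attacker tries well-known phrases first, which is why a long mnemonic that many users share is weak however many characters it contains.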

Q3) What about magic semantic email links instead of passwords…

A3) There is some lovely work on just how much data is in your email… That’s a poor man’s version of the OAuth idea of getting an identity provider to authenticate the user. It’s good for the user, but that is one higher stakes login then… And we see SMS also being a mixed bag and being subject to attack… Ask a user though… “there’s nothing important in my email”.

Q4) How do you deal with people saying “I don’t have anything to hide”?

A4) Well I start with it not being about hiding… It’s more, why do you want to know? When I went to go buy a car I didn’t dress like a professor, I dressed down… I wanted a good price… If I have a lot of time I will refer them to Daniel Solove’s Nothing to Hide.

Talk by Nicola Osborne (EDINA) covering Digital Footprints and how you can take control of your online self

And that will be me… So keep an eye out for tweets from others on the event hashtag: #UoEInfoSec.

And with a very brief summing up from Alistair Fenemore, the day came to a close. Thanks to the lovely University Information Security team for organising this really interesting event (and inviting me to speak) as part of their awesome Information Security Awareness Week programme.

Posted October 4, 2017 at 3:06 pm in digital footprint, Events Attended, LiveBlogs
Aug 03 2017

Today I am at Repository Fringe which runs today and tomorrow in Edinburgh and is celebrating 10 years of Repofringe! I’m just here today – presenting a 10×10 on our recent Reference Rot in Theses: A HiberActive Pilot project work – and will be blogging whilst I’m here. As usual, as this is live, may include the odd typo or error so all comments, corrections, questions, additions, etc. are very much welcomed!

Welcome – Janet Roberts, Director of EDINA

My colleagues were explaining to me that this event came from an idea from Les Carr that there should be not just one repository conference, but also a fringe – and here we are at the 10th Repository Fringe on the cusp of the Edinburgh Fringe.

So, this week we celebrate ten years of repository fringe, and the progress we have made over the last 10 years to share content beyond borders. It is a space for debating future trends and challenges.

At EDINA we established the OpenDepot to provide a space for those without an institutional repository… That has now migrated to Zenodo… and the challenges are changing, around the size of data, how we store and access that data, and what those next generation repositories will look like.

Over the next few days we have some excellent speakers as well as some fringe events, including the Wiki Datathon – so I hope you have all brought your laptops!

Thank you to our organising team from EDINA, DCC and the University of Edinburgh. Thank you also to our sponsors: Atmire; FigShare; Arkivum; ePrints; and Jisc!

Opening Keynote – Kathleen Shearer, Executive Director COAR: Raising our game – repositioning repositories as the foundation for sustainable scholarly communication

Theo Andrew: I am delighted to introduce Kathleen, who has been working in digital libraries and repositories for years. COAR is an international organisation of repositories, and I’m pleased to say that Edinburgh has been a member for some time.

Kathleen: Thank you so much for inviting me. It’s actually my first time speaking in the UK and it’s a little bit intimidating as I know that you folks are really ahead here.

COAR is now about 120 members. Our activities fall into four areas: presenting an international voice so that repositories are part of a global community with diverse perspective. We are being more active in training for repository managers, something which is especially important in developing countries. And the other area is value added services, which is where today’s talk on the repository of the future comes in. The vision here is about

But first, a rant… The international publishing system is broken! And it is broken for a number of reasons – there is access, and the cost of access. The cost of scholarly journals goes up far beyond the rate of inflation. That touches us in Canada – where I am based, in Germany, in the UK… But much more so in the developing world. And then we have the “Big Deal”. A study of University of Montreal libraries by Stephanie Gagnon found that of 50k subscribed-to journals, really there were only 5,893 unique essential titles. But often those deals aren’t opted out of as the key core journals separately cost the same as that big deal.

We also have a participation problem… Juan Pablo Alperin’s map of authors published in Web of Science shows a huge bias towards the US and the UK, and seriously reduced participation in Africa and parts of Asia. Why does that happen? The journals are operated from the global North, and don’t represent the kinds of research problems in the developing world. And one Nobel Prize winner notes that the pressure to publish in “luxury” journals encourages researchers to cut corners and pursue trendy fields rather than areas where there are those research gaps. That was the case with Zika virus – you could hardly get research published on that until a major outbreak brought it to the attention of the dominant publishing cultures, then there was huge appetite to publish there.

Timothy Gowers talks about “perverse incentives” which are supporting the really high costs of journals. It’s not just a problem for researchers and how they publish, it’s also a problem of how we incentivise researchers to publish. So, this is my goats in trees slide… It doesn’t feel like goats should be in trees… Moroccan tree goats are taught to climb the trees when there isn’t food on the ground… I think of the researchers able to publish in these high end journals as being the lucky goats in the tree here…

In order to incentivise participation in high end journals we have created a lucrative publishing industry. I’m sure you’ve seen the recent Guardian article: “is the staggeringly profitable business of science publishing bad for science”. Yes. For those reasons of access and participation. We see very few publishers publishing the majority of titles, and there is a real

My colleague Leslie Chan, funded by the International Development Council, talked about openness not just being about gaining access to knowledge but also about having access to participate in the system.

On the positive side… Open access has arrived. A recent study (Piwowar et al 2017) found that about 45% of articles published in 2015 were open access. And that is increasing every year. And you have probably seen the May 27th 2016 statement from the EU that all research they fund must be open by 2020.

It hasn’t been a totally smooth transition… APCs (Article Processing Charges) are very much in the mix and part of the picture… Some publishers are trying to slow the growth of access, but they can see that it’s coming and want to retain their profit margins. And they want to move to all APCs. There is discussion here… There is a project called OA2020 which wants to flip from subscription based to open access publishing. It has some traction but there are concerns here, particularly about sustainability of scholarly comms in the long term. And we are not sure that publishers will go for it… Particularly one of them (Elsevier) which exited talks in The Netherlands and Germany. In Germany the tap was turned off for a while for Elsevier – and there wasn’t a big uproar from the community! But the tap has been turned back on…

So, what will the future be around open access? If you look across APCs and the average value… If you think about the relative value of journals, especially the value of high end journals… I don’t think we’ll see lesser increases in APCs in the future.

At COAR we have a different vision…

Lorcan Dempsey talked about the idea of the “inside out” library. Similarly a new MIT Future of Libraries Report – published by a broad stakeholder group that had spent 6 months working on a vision – came up with the need for libraries to be an open, trusted, durable, interdisciplinary, interoperable content platform. So, like the inside out library, it’s about collecting the output of your organisation and making it available to the world…

So, for me, if we collect articles… We just perpetuate the system and we are not in a position to change the system. So how do we move forward at the same time as being kind of reliant on that system?

Eloy Rodrigues, at Open Repository earlier this year, asked whether repositories are a success story. They are ubiquitous, they are adopted and networked… But then they are also using old, pre-web technologies; mostly passive recipients; limited interoperability making value added systems hard; and not really embedded in researcher workflows. These are the kinds of challenges we need to address in next generation of repositories…

So we started a working group on Next Generation Repositories to define new technologies for repositories. We want to position repositories as the foundation for a distributed, globally networked infrastructure for scholarly communication, on top of which we want to be able to add layers of value added services. Our principles include distributed control to guard against failure, change, etc. We want this to be inclusive, reflecting the needs of the research communities in the global south. We want intelligent openness – we know not everything can be open.

We also have some design assumptions, with a focus on the resources themselves, not just associated metadata. We want to be pragmatic, and make use of technologies we have…

To date we have identified major use cases and user stories, and shared those. We determined functionality and behaviours, and a conceptual model. At the moment we are defining specific technologies and architectures. We will publish recommendations in September 2017. We then need to promote it widely and encourage adoption and implementation, as well as the upgrade of repositories around the world (a big challenge).

You can view our user stories online. But I’d like to talk about a few of these… We would like to enable peer review on top of repositories… To slowly incrementally replace what researchers do. That’s not building peer review in repositories, but as a layer on top. We also want some social functionalities like recommendations. And we’d like standard usage metrics across the world to understand what is used and how… We are looking to the UK and the IRUS project there as that has already been looked at here. We also need to address discovery… Right now we use metadata, rather than indexing full text content… So content can be hard to get to unless the metadata is obvious. We also need data syncing so that hubs, indexing systems, etc. reflect changes in the repositories. And we also want to address preservation – that’s a really important role that we should do well, and it’s something that can set us apart from the publishers – preservation is not part of their business model.

So, this is a slide from Peter Knoth at CORE – a repository aggregator – who talks about expanding the repository, and the potential to layer all of these additional services on top.

To make this happen we need to improve the functionality of repositories: to be of and not just on the web. But we also need to step out of the article paradigm… The whole system is set up around the article, but we need to think beyond that, deposit other content, and ensure those research outputs are appropriately recognised.

So, we have our (draft) conceptual model… It isn’t around siloed individual repositories, but around a whole network. And some of our draft recommendations for technologies for next generation repositories. These are a really early view… These are things like: ResourceSync; Signposting; Messaging protocols; Message queue; IIIF presentation API; OAuth; Webmention; and more…
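(An aside from the editor rather than the talk: Signposting, one of the technologies listed above, works by exposing typed links in HTTP `Link` headers so that machines can find, for example, the persistent identifier or the full text of an item. The sketch below parses such a header with a deliberately simple regular expression. The header value and both URLs are made-up placeholders, and this is not taken from any COAR recommendation; a production parser would need to follow the full HTTP Link header grammar.)

```python
import re

# Hypothetical Link header, as a landing page might return it.
# Both URLs are placeholders invented for this sketch.
EXAMPLE_HEADER = (
    '<https://doi.org/10.1234/example>; rel="cite-as", '
    '<https://repo.example.org/files/1.pdf>; rel="item"; type="application/pdf"'
)

def parse_link_header(value):
    """Return (target, rel) pairs found in an HTTP Link header value.

    Simplified: assumes parameter values contain no commas or semicolons.
    """
    links = []
    # Each link is "<target>" followed by zero or more ";param" sections.
    for match in re.finditer(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)', value):
        target, params = match.group(1), match.group(2)
        rel = re.search(r'rel\s*=\s*"?([^";,]+)"?', params)
        if rel:
            # A single rel parameter may carry several space-separated types.
            for rel_type in rel.group(1).split():
                links.append((target, rel_type))
    return links

for target, rel_type in parse_link_header(EXAMPLE_HEADER):
    print(rel_type, "->", target)
```

A harvester could use the `cite-as` link to record the persistent identifier and the `item` link to fetch the full text, rather than scraping the landing page.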

Critical to the widespread adoption of this process is the widespread adoption of the behaviours and functionalities for next generation repositories. It won’t be a success if only one software or approach takes these on. So I’d like to quote a Scottish industrialist, Andrew Carnegie: “strength is derived from unity…. “. So we need to coalesce around a common vision.

And it isn’t just about a common vision: science is global and networked, and our approach has to reflect and connect with that. Repositories need to balance a dual mission to (1) showcase and provide access to institutional research and (2) be nodes in a global research network.

To support better networking between repositories, in Venice in May we signed an International Accord for Repository Networks, with networks from Australasia, Canada, China, Europe, Japan, Latin America, South Africa, United States. For us there is a question about how best we work with the UK internationally. We work with OpenAIRE but maybe we need something else as well. The networks across those areas are advancing at different paces, but have committed to move forward.

There are three areas of that international accord:

  1. Strategic coordination – to have a shared vision and a stronger voice for the repository community
  2. Interoperability and common “behaviours” for repositories – supporting the development of value added services
  3. Data exchange and cross regional harvesting – to ensure redundancy and preservation. This has started but there is a lot to do here still, especially as we move to harvesting full text, not just metadata. And there is interest in redundancy for preservation reasons.

So we need to develop the case for a distributed community-managed infrastructure that will better support the needs of diverse regions, disciplines and languages. Redundancy will safeguard against failure. There is less risk of commercial buy out. It places the library at the centre… But… I appreciate it is much harder to sell a distributed system… We need branding that really attracts researchers to take part and engage in the system…

And one of the things we want to avoid… Yesterday it was announced that Elsevier has acquired bepress. bepress is mainly used in the US and there will be much thinking about the implications for their repositories. So not only should institutional repositories be distributed, but they should be different platforms, and different open source platforms…

Concluding thoughts here… Repositories are a technology and technologies change. What this is really promoting is a vision in which institutions, universities and their libraries are the foundational nodes in a global scholarly communication system. This is really the future of libraries in the scholarly communication community. This is what libraries should be doing. This is what our values represent.

And this is urgent. We see Elsevier consolidating, buying platforms, trying to control publishers and the research cycle; we really have to move forward and move quickly. I hope the UK will remain engaged with this. And I look forward to your participation in our ongoing dialogue.


Q1 – Les Carr) I was very struck by that comment about the need to balance the local and the global. I think that’s a really major opportunity for my university. Everyone is obsessed about their place in the global university ranking, their representation as a global university. This could be a real opportunity, led by our libraries and knowledge assets, and I’m really excited about that!

A1) I think the challenge around that is trying to support common values… If you are competing with other institutions it’s not always an incentive to adopt systems with common technologies, measures, approaches. So there needs to be a benefit for institutions in joining this network. It is a huge opportunity, but we have to show the value of joining that network. It’s maybe easier in the UK, Europe, Canada. In the US they don’t see that value as much… They are not used to collaborating in this way and have been one of the hardest regions to bring onboard.

Q2 – Adam Field) Correct me if I’m wrong… You are talking about a Commons… In some way the benefits are watered down as part of the Commons, so how do we pay for this system, how do we make this benefit the organisation?

A2) That’s where I see that challenge of the benefit. There has to be value… That’s where value added systems come in… So a recommender system is much more valuable if it crosses all of the repositories… That is a benefit and allows you to access more material and for more people to access yours. I know CORE at the OU are already building a recommender system in their own aggregated platform.

Q3 – Anna Clements) At the sharp end this is not a problem for libraries, but a problem for academia… If we are seen as librarians doing things to or for academics that won’t have as much traction… How do we engage academia…

A3) There are researchers keen to move to open access… But it’s hard to represent what we want to do at a global level when many researchers are focused on that one journal or area and making that open access… I’m not sure what the elevator pitch should be here. I think if we can get to that usage statistics data there, that will help… If we can build an alternative system that even research administrators can use in place of impact factor or Web of Science, that might move us forward in terms of showing this approach has value. Administrators are still stuck in having to evaluate the quality of research based on journals and impact factors. This stuff won’t happen in a day. But having standardised measures across repositories will help.

So, one thing we’ve done in Canada with the U15 (top 15 universities in Canada)… They are at the top of what they can do in terms of the cost of scholarly journals so they asked us to produce a paper for them on how to address that… I think that issue of cost could be an opportunity…

Q4) I’m an academic and we are looking for services that make our life better… Here at Edinburgh we can see that libraries are naturally the consistent point of connection with the repository. Does that translate globally?

A4) It varies globally. Libraries are fairly well recognised in Western countries. In the developing world there are funding and capacity challenges that make that harder… There is also a question of whether we need repositories for every library… Can we do more consortia repositories or similar?

Q5 – Chris) You talked about repository supporting all kinds of materials… And how they can “wag the dog” of the article

A5) I think with research data there is so much momentum there around making data available… But I don’t know how well we are set up with research data management to ensure data can be found and reused. We need to improve the technology in repositories. And we need more resources too…

Q6) Can we do more to encourage academics, researchers, students to reuse data and content as part of their practice?

A6) I think the more content we have at Commons level, the more it can be reused. We have to improve discoverability, and improve the functionality to help that content to be reused… There is huge use of machine reuse of content – I was speaking with Peter Knoth about this – but that isn’t easy to do with repositories…

Theo) It would be really useful to see Open Access buttons more visible, using repositories for document delivery, etc.

Chris Banks, Director of Library Services, Imperial College: Focusing upstream – supporting scholarly communication by academics

Gavin MacLachlan: I’d just like to welcome you again to Edinburgh, our beautiful city and our always lovely weather (note for remote followers: it’s dreich and raining!). I’m here to introduce Chris, whose work with LIBER and LERU will be well known to you.

Chris: This is my first fringe and I find it quite terrifying that I’m second up! Now, I’m going to go right back to basics and policy…

The Finch report in 2012 and Research Councils UK: we had RCUK policy; funding available for immediate Gold OA (including hybrid); embargo limits apply where Green OA chosen. Nevertheless the transition across the world is likely to take a number of years. For my money we’ve moved well on repositories, partly as the UK has gone it alone in terms of funding that transition process.

In terms of REF we had the Funding council REF policy (2013) which is applicable to all outputs that are to be submitted to the post 2014 REF exercise – effectively covers all researchers. No additional funding available. Where Green OA selected, requirement for use of repositories. There were also two paragraphs (15 and 26) shaping what we have been doing…

That institutions are encouraged to go beyond the minimum (and will receive credit for doing so) – and the visibility of that is where we see the rise of University presses. And the statement that repositories do not need to be accessible for reuse and text mining, but that, again, there will be credit for those that are. Those two paragraphs have driven what we’ve been doing at Imperial.

At the moment UK researchers face the “policy stack” challenge. There are many funder policies; the REF policy differs substantially from other policies and applies to all UK research academics – you can comply with RCUK policy and fall foul of REF; many publisher policies…

So how can the REF policy help? Institutions recognise IP, copyright and open access policies are not necessarily supporting funder compliance – something needs to be done. There is a variety of approaches to academic IP observed in UK institutions. Legally in the UK the employer is the first copyright holder… subject to any other agreements and unless the individual is a contractor etc.

Publishers have varying approaches to copyright, from licence to first publish through to outright copyright transfer. Licences are not read by academics. It’s not just in publishing… It’s social media… It’s a big problem.

For the library we want to create frictionless services. We need to upscale services to all researchers – REF policy requirements. We can’t easily give an answer to researchers on their OA options. So we started our work at Imperial to address this, and to ensure our own organisational policy aligned with funder policies. We also wanted to preserve academic choice over publishing, and ability to sign away rights when necessary (though encouraging scrutiny of licences). We have a desire to maximise impact of publication. And there is a desire to retain some re-use rights for us in teaching etc, including rights to diagrams etc.

The options we explored with academics ranged from doing as we do at the moment – with academics signing over copyright – through to the institution claiming all copyright on all academic outputs. And we wanted to look at two existing models in between: the SPARC model (academic signs copyright over to publisher but licenses back); and the Harvard model – which we selected.

The Harvard model is implemented as part of the university OA policy. Academics deposit Author Accepted Manuscripts (AAMs) and grant a non-exclusive licence to the university for all journal articles. It is a well established policy and has been in use (elsewhere) since 2008. Where a journal seeks a waiver that can be managed by exception. And this is well tested in Ivy League colleges but also much more widely, including universities in Kenya.

The benefits here are that academia retains rights, and authors have the right to make articles open access – open access articles have higher citations than closed ones. Authors can continue to publish in the journal of their choice irrespective of whether it allows open access or not. It is a single means by which authors can comply with green open access policies. We are minimising reliance on hybrid open access – reducing “double dipping”, paying twice through subscriptions and APC – a complex and costly process. I think we and publishers see money for hybrid OA models drying up in the future, as the UK has pretty much been the one place doing that. Instead funding is typically used for pure gold OA models and publications.

We have made some changes to the Harvard model policy to make it work in the context of UK law, also to ensure it facilitated funder deposit compliance and REF eligibility. The next step here is that 60 institutions overall are interested and we have a first mover group of around 12 institutions. We are discussing with publishers. And we have had wider engagement with the researcher, library, research office and legal office communities. We have a website and advocacy materials under development. We are also drafting boilerplate texts for authors, collaboration agreements etc. especially for large international collaborative projects. We have a steering committee established and that includes representatives from across institutions, and including a publisher.

At the moment we are addressing some publisher concerns and perceptions. Publishers are taking a very particular approach to us. We have a real range of responses. Some are very positive – including the pure gold (e.g. PLoS) and also learned society (e.g. Royal Society). Other publishers have raised concerns and are in touch with the steering group, and with ALPSP.

Summary of current concerns:

  • that it goes beyond requirements of Finch. We have stated that UK-SCL is to support REF and other
  • AAMs will be made available on publication. Response: yes, as per Harvard model around since 2008
  • Administrative burden on UK author/institutions as publishers would have to ask for waivers in 70-80%. We have responded that in other Harvard using experiences it has been less than 5% and we can’t see why UK authors would be treated differently.
  • They noted that only 8% of material submitted to the REF were green OA compliant. We have noted that only 8% submitted were green OA, not 8% of all eligible for submission.

Researchers have also raised concerns

  • the need to seek agreement from co-authors, especially in collaborations. Can be addressed through a phased/gradual implementation
  • Fear that a publisher will refuse to publish. Institutions using Harvard model report no instances of this happening
  • Learned Societies – fear loss of income. No reliable research evidence to back up this fear.
  • Don’t like the CC-BY-ND Licence. That is to comply with RCUK but warrants further discussion.

Our next step is a further meeting with PA/ALPSP to take place during the summer. We have encouraged proposals to deliver more than simply minimum REF eligibility, which would resolve current funder/publisher policy stack complexity. We will finalise the website, waiver system, advocacy materials and boilerplate texts. We also need to gain agreement on early mover institutions and on the date of first adoption. And to notify publishers.

Another bit of late breaking news… Publishers recently went to HEFCE to ask about policy statements and, as a result of that, HEFCE will be clarifying that it is pushing for minimum compliance and encouraging more than that. One concern of the REF policy had been that only material submitted to the REF would have been deposited…

Last time my institution submitted 5k items, more than half were not monographs. We submitted 95% of our researchers. Out of that four items were looked at, now would be 2. And from that our funding is decided. And you can see, from that, why that bigger encouragement for the open scholarly ecosystem is so important.

I wanted to close by sharing some useful further materials and to credit others who have been part of this work.

One important thing to note is that we are trying to help researchers and universities to comply as policies from funders and publishers evolve. I would like to see that result in discussion with publishers, and a move to all gold OA… The AAM is not the goal here, it’s the published article. Now that could see the end of repositories – something I am cautious of raising with this audience. Now in the…


Q1) The elephant in the room here is Sci Hub… They are making 95% of published content available for free. You have AAMs out there… And we haven’t seen subscriptions drop.

A1) So our initiative is about legal sharing. And we also need to note that the UK is just one scholarly community. And others have not moved towards mandates and funding. I think it is a shame that the fights that have been picked are with institutions, when we have that elephant in the room…

Q2) Congratulations on the complex and intricate discussions you have been holding… Almost a legal game of Twister, where all the participants hate each other! This is a particular negotiation at the end of a process, at the end of the scholarly publishing chain. How might you like your experience to feed into training of researchers and their own understanding of copyright, ownership of their own outputs?

A2) The challenge that we observe is that we have many younger researchers and authors who are very passionate and ethically minded about openness. They are under pressure from supervisors who say they will not get a tenured position if they don’t have a “good” journal on their CV. And they are frustrated by the slow movement on the San Francisco Declaration on Research Assessment. Right now the quality journals remain those subscription high impact journals. But we have research showing the higher use of open access journals. But we still have that debate within academe that is slowing down that environment. But training researchers about their IP and about copyright is part of it. I also think it is interesting that Sir Mark Walport is in charge of UKRI as he has written before about the evolving scholarly record, and the scattering of articles and outputs, instead building online around research projects. He gave a talk at LIBER in 2015, and wrote an article for THE. He was also at Wellcome when they first introduced their mandate so I think we really do have someone who understands that complexity and the importance of openness.

10×10 presentations (Chair: Ianthe Sutherland, University Library & Collections)

  1. v2.juliet – A Model For SHERPA’s Mid-Term Infrastructure. Adam Field, Jisc

I’m here from SHERPA HQ at Jisc! I’m going to go back to 2006… We saw YouTube celebrating its first year… Eight out of Ten Cats began… The Nintendo Wii appeared… And… SHERPA/JULIET was launched (SHERPA having been around since 2001). So, when we set up Sherpa REF as a beta service in 2016 we had to build something new, as JULIET hadn’t been set up for APIs and interoperability in that kind of way.

So, we set about a new SHERPA/JULIET based around a pragmatic, functional data model; to move data into a platform; to rebrand to Jisc norms; a like-for-like replacement; and a precedent for our other services as we update them all..

So, a quick demo… We now have the list of funders – as before – including an overview of open access. So if we choose Cancer Research UK… You can see the full metadata record, with headings for more information. You can see which groups it is part of… We have a nice open API where you can retrieve information.

So, whilst it was a like for like rebuild we have snuck in new features, including FundRef DOIs – added automatically where possible, and to be supplemented with manual input too. More flexible browsing. And a JSON API – really easy to work with. And in the future we’d like funders to be able to add to their own records, and other useful 3rd party editorial features. We want to integrate ElasticSearch. And we want to add microservices…
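(An aside from the editor, not from the talk: to show why a JSON API is “really easy to work with”, here is a minimal sketch that pulls FundRef DOIs out of a funder listing. The response shape, every field name, the group name and the DOI value are all invented for illustration; they are not the actual SHERPA/JULIET schema, which should be checked in the service’s own API documentation.)

```python
import json

# Hypothetical API response. Field names and the DOI are placeholders
# made up for this sketch, not the real SHERPA/JULIET schema.
SAMPLE_RESPONSE = """
{
  "items": [
    {
      "name": "Cancer Research UK",
      "identifiers": [{"type": "fundref", "id": "10.13039/EXAMPLE"}],
      "groups": ["Example Funder Group"]
    }
  ]
}
"""

def funder_dois(response_text):
    """Map each funder name to the list of its FundRef DOIs."""
    data = json.loads(response_text)
    result = {}
    for item in data.get("items", []):
        result[item["name"]] = [
            identifier["id"]
            for identifier in item.get("identifiers", [])
            if identifier.get("type") == "fundref"
        ]
    return result

print(funder_dois(SAMPLE_RESPONSE))
```

In real use the text would come from an HTTP request to the service rather than an embedded string; the parsing step would look much the same.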

In terms of our process here… The hard part was analysing the existing data and structuring it into a more appropriate shape… The next part was much easier… We configured EPrints, imported data, and added some bespoke service requirements.

Right now we have a beta of SHERPA/JULIET. All take a look please! We are now working on OpenDOAR. And then SHERPA/ROMEO is expected to be in early 2018.

We now want your feedback! Email with your comments and feedback. We’ll have feedback sessions later today that you can join us for and share your thoughts, ask questions about the API. And myself and Tom Davey our user interface person, are here all day – come find us!

  2. CORE Recommender: a plug in suggesting open access content. Nancy Pontika, CORE

I want to talk about discoverability of content in repositories… Salo 2008, Konkiel 2012 and Acharya 2017 talk about the challenges of discoverability in repositories. So, what is needed? Well, we need recommender systems in repositories so that we can increase the number of incoming links to relevant resources…

For those of you new to repositories, CORE is an aggregation service; we are global and focused, and we have started harvesting gold OA journals… We have services at various levels, including for text mining and data science. We have a huge collection of 8 million full text articles, 77 million metadata records… They are all in one place… So we can build a good recommendation system.

What effect can we have? Well it will increase the accessibility meaning more engagement, higher Click-Through Rate (CTR); twice as often people access resources on CORE via its recommender system than via search. And that additional engagement increases the time spent in your repositories – which is good for you. And you can open another way to find research…

For instance you can see within White Rose Research Online that suggested articles are presented that come from all of the collections of CORE, including basic geographic information… We would like crowd sourced feedback here. The more users that engage in feedback, the more the recommender will improve. We also get feedback from our community. At the moment the first tab is CORE recommendations, the second tab is institutional recommendations. We’ve had feedback that institutions would prefer it the other way… We have heard that… Although we note that CORE recommendations are better as it’s a bigger data set… We want to make sure the institutional tab appears first unless there are few recommendations/poor matches… We are working on this…

CORE Recommender has been installed at St Mary’s; LSHTM; the OU; University of York; University of Sheffield; York St John; Strathclyde University… and others with more to follow.

How does it work? Currently it’s an article-to-article recommender system. There is preprocessing to make this possible. What is unique is that recommendations are based on full text, and the full text is open access.
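(An aside from the editor, not from the talk: the speaker did not describe CORE’s algorithm, so the following is only a generic sketch of one common way to build an article-to-article recommender from full text, namely TF-IDF vectors compared by cosine similarity. The three tiny “articles” and all function names are invented for illustration.)

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each document."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    return {
        doc_id: {
            # Smoothed inverse document frequency keeps weights positive.
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in term_counts.items()
        }
        for doc_id, term_counts in counts.items()
    }

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(docs, doc_id, k=3):
    """Return the ids of the k documents most similar to doc_id."""
    vectors = tfidf_vectors(docs)
    others = [d for d in docs if d != doc_id]
    others.sort(key=lambda d: cosine(vectors[doc_id], vectors[d]), reverse=True)
    return others[:k]

# Invented toy corpus standing in for full-text articles.
TOY_DOCS = {
    "a": "open access repositories and metadata harvesting",
    "b": "harvesting metadata from open access repositories",
    "c": "goats climbing trees in Morocco",
}
print(recommend(TOY_DOCS, "a", k=1))
```

Working from full text rather than metadata alone gives the similarity measure far more vocabulary to compare, which is presumably part of why open access to the full text matters for a system of this kind.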

What is the CORE recommender not? It is not always right – but which recommendation system is? And it does not compare the “quality” of the recommended articles with the “quality” of the initial paper…

  3. Enhancing Two workflows with RSpace & Figshare: Active Data to Archival Data and Research to Publication. Rory Macneil, Research Space and Megan Hardeman of Figshare

Rory: Most of the discussion so far has been on publications, but we are talking about data. I think it’s fair to say that FigShare in the data field; and RSpace in the Lab notebooks world have been totally fixated on interoperability!

Right now most data does not make it into repositories… Some shouldn’t be but even the data that should be shared, is not. One way to increase deposit is to make it easy to deposit data. By integrating with RSpace notebooks that allows easy and quick deposit.

So, in RSpace you can capture metadata of various types. There are lots of ways to organise the data… And to use that you just need to activate the FigShare plugin. Then you select the data to deposit – exporting one or many documents… You select what you want to deposit, and the format to deposit in. You can export all of your work, or all of your lab’s work – whatever level of granularity you want to share… You deposit to Figshare… And over to Megan!

Megan: Figshare is a repository where users can make all of their research outputs available in citable, accessible ways (all marked up for Google Scholar). You upload any file type (we support over 1000 types); we assign a DOI on an item level; store items in perpetuity (and backed up in DPN); track usage stats and Altmetrics (more exposure) and you can collaborate with researchers inside and outside your institutions.

Figshare has an open API and integrations with RSpace and other organizations and tools…

For an example… You can see an electronic lab notebook from RSpace which can be browsed and explored in the browser!

  4. Thesis digitisation project. Gavin Willshaw, University of Edinburgh

I’m digital curator here, and manager of the PhD digitisation project. This project sees a huge amount of content going into ERA, our repository. In the last three years we’ve moved from having two photographers to having two teams of photographers and cataloguers across two sites – we are investing heavily.

We have 17,000 PhD theses and they will all be online by the end of 2018. This will provide global access to the entire PhD collection. We have obtained some equipment. We are creating metadata records, and also undertaking some preservation work where required.

The collection is largely standardised… But we have some Latin and handwritten theses. We have awkward objects – like slices of lungs!

For 10k theses we have duplicates and they are scanned destructively. 3,000 unique theses are scanned non-destructively in house. And 4,000 unique theses are outsourced. All are OCRed. And they are all catalogued, with data protection checks made before determining what can be shared online in full and what cannot.

In terms of copyright and licensing, that is still with the author. We have contacted some and had positive feedback. It’s a risk but a low risk. In any case we can’t assert the copyright or change licences on our own. And we already have over 2500 theses live.

And these theses are not just text… We have images that are rare and unusual. We share some of these highlights in our blog, and on Twitter we use the hashtag #UoEPhD. We have some notable theses… Alexander McCall Smith’s PhD is there; Isabel Emslie Hutton, a doctor in the first world war in the Balkans – so noted she was on a stamp in Serbia last year; Helen Pankhurst’s; and of course members of staff from the university too!

Impact wise the theses on ERA have been downloaded 2 million times since 2012. Those digitised in the project are seeing around 3000 downloads per month. Oddly our most popular thesis right now is on the differentiation of people in Norwich. We are also looking at what else we can do… Linking theses to Wikipedia; adding a thesis to Wikisource (and getting 10x the views); and now looking at what else… text and data mining etc.

  1. ‘Weather Cloudy & Cool Harvest Begun’: St Andrews output usage beyond the repository. Michael Bryce, University of St Andrews

I didn’t expect it to actually be cloudy today…!

Our repository has been going since 2006, and use has been growing steadily…

Some of the highlights from our repository have included research on New Caledonian crows and collaborative tool use. We also have farming diaries in our repository under a Creative Commons license… Pushing that out into the community in blog posts and posters… So going beyond traditional publications and use. Our material on Syria has seen significant usage driven partly by use in OJS journals.

Our repository isn’t currently OpenAIRE compliant, but we have some content shared that way, which means a bigger audience… For instance material on virtual learning environments associated with a big EU project.

We’ve also been involved in public engagement. The BBC asked us to digitise a thesis at the time of broadcasting Coast, which added that work to our repository.

When we reached our 10,000th item we had cake! And helped publicise the student and their work to a wider audience…

Impact and the REF panel session

Brief for this session: How are institutions preparing for the next round of the Research Excellence Framework #REF2021, and how do repositories feature in this? What lessons can we learn from the last REF and what changes to impact might we expect in 2021? How can we improve our repositories and associated services to support researchers to achieve and measure impact with a view to the REF? In anticipation of the forthcoming announcement by HEFCE later this year of the details of how #REF2021 will work, and how impact will be measured, our panel will discuss all these issues and answer questions from RepoFringers.

Chair: Keith McDonald (KM), Assistant Director, Research and Innovation Directorate, Scottish Funding Council

The panel here include Pauline Jones, REF Manager at University of Edinburgh, and a veteran of the two previous REFs – she was at Napier University in 2008, and was working at the SFC (where I work) for the previous REF and was involved in the introduction of Impact.

Catriona Firth (CF), REF Deputy Manager, HEFCE

I used to work in universities, now I am a poacher-turned-gamekeeper I suppose!

Today I want to talk about Impact in REF 2014. Impact was introduced and assessed for the first time in REF 2014. After extensive consultation Impact was defined in an inclusive way. So, for REF 2014, impact was assessed in four-page case studies describing impacts that had occurred between January 2008 and July 2013. The submitting university must have produced high quality research since 1993 that contributed to the impacts. Each submitting unit (usually subject area) returned one case study, plus an additional case study for every 10 staff.

At the end of the REF 2014 we had 6,975 case studies submitted. On average across submissions 44% of impacts were judged outstanding (4*) by over 250 external users of research, working jointly with the academic panel. There was global spread of impact, and those impacts were across a wealth of areas of life: policy, performance and creative practice, etc. There was, for instance, a case study of drama and performance that had an impact on nuclear technology. The HEFCE report on impact is highly recommended reading.

In November 2015 Lord Nicholas Stern was commissioned by the Minister of Universities and Science to conduct an independent review of the REF. He found that the exercise was excellent, and had achieved what was desired. However there were recommendations for improvement:

  • lowering the burden on institutions
  • less game-playing and use of loopholes
  • less personalisation, more institutionally focused – to take pressure off institutions but also recognise and reward institutional investment in research
  • recognition for investment
  • more rounded view of research activity – again avoiding distortion
  • interdisciplinary emphasis – some work could
  • broaden impact – and find ways to capture, reward, and promote the ways UK research has a benefit on and impacts society.

If you go to the HEFCE website you’ll see a video of a webinar on the Stern Review and specifically on staff and outputs, including that all research active staff should be included, that outputs be determined at assessment level, and that outputs should not be portable.

In terms of impact there was keenness to broaden and deepen the definition of impact and provide additional guidance. Policy was a safer kind of case study before. The Stern Review emphasised a need for more focus on public engagement and impact on curricula and/or pedagogy. It recommended reducing the number of required case studies to a minimum of one; including impact arising from research, research activity, or a “body of work”; having a quality threshold for underpinning research based on rigour – not just originality; and giving the opportunity to resubmit case studies if the impact was ongoing.

We have been receiving feedback – over 400 responses – which are being summarised. That feedback includes positive feedback on broadening impact and on aligning definitions of impact and of public engagement across funding bodies. There were some concerns about a sub-profile based on one case study – especially in small departments. And in those cases you’d know exactly whose work and case study was 4* (or not). There have been concerns about how you separate rigour from originality and significance. There was a lot of support for a broader basis of research, but challenges in drawing boundaries in practice – in terms of timing and how far back you go… For scholarly career assessment do you go back further? And there was broad support for resubmission of 2014 case studies but questions about “additionality” – could it be the same sort of impact or did it need to be something new or additional? So, we are working on those questions at the moment.

The other suggestion from the Stern Review was the idea of an institutional level assessment of impact, giving universities opportunities to show case studies that didn’t fall neatly elsewhere. These would be case studies arising from multi- and interdisciplinary and collaborative work, and should be 10-20% of total impact case studies, with a minimum of one. But feedback has been unclear here, particularly on the conflation of interdisciplinary research with institutional profiles. There is concern also that the University might take over a case study that would otherwise sit in another unit.

So, the next step is communications in summer/autumn 2017. There will be a REF initial decisions document. A summary of consultation responses. And there will be sharing of full consultation responses (with permission).  And there will be a launch for our REF 2021 website and Twitter account.

Anne-Sofie Laegran (ASL), Knowledge Exchange Manager, College of Arts, Humanities and Social Sciences, University of Edinburgh

KM: Is resubmission better for some areas than others?

ASL: I think it depends on what you mean by resubmission.. We have some good case studies arising from the same research as in 2014, but they are different impacts.

So.. I will give you a view from the trenches. To start I draw your attention to the University strapline that we have been “Influencing the world since 1583”. But we have to demonstrate and evidence that of course.

There has been impact of impact in academia… When I started in 2008 it was about having conversations about the importance of having an impact, and now it is much more about how you do this. There has been a culture change – all academic staff must consider the potential impact of research. The challenge is not only to create impact but also to demonstrate impact. There is also an incentive to show impact – it is part of career progression, it is part of recruitment, and it is part of promotion.

Impact of impact in academia has also been about training – how to develop pathways as well as how to capture and evidence impact. And there has been more support – expert staff as well as funding from funders and from the university.

In terms of developing pathways to impact we have borrowed questions that funders ask:

  • who may benefit from your research?
  • what might the benefits be?
  • what can you do to ensure potential beneficiaries and decision makers have the opportunity to engage and benefit?

And it is also – especially when capturing impact – about developing new skills and networks.

For instance… If you want to impact the NHS, who makes decisions, makes changes… If you are working with museums and galleries the decision makers will vary depending on where you can find that influence. And, for instance, you rarely partner with the Scottish Government, but you may influence NGOs who then influence Scottish Government.

Whatever the impact it starts from excellent research; which leads to knowledge exchange – public engagement, influencing policy, informing professional practice and service delivery, technology transfer; and that results in impact. You don’t “do” impact, your work is used and influences that then effects a change and an impact.

REF impact challenges include demonstrating change/benefit as opposed to reporting engagement activity. Attributing that change to research. And providing robust evidence. In 2014 that was particularly tricky as the guidance was in 2012 so people had to dig back… That should be less of an issue now, we’ve been collecting evidence along the way…

Some cases that we think did well, and/or that had feedback that they did well:

  • A College of Art scholar, who has a dual appointment at the National Galleries of Scotland. She curated the Impressionism Scotland show with over 100k visitors. There was good feedback that also generated debate. It changed how the gallery curates shows. And on the market the works displayed went up in value – it had a real economic impact.
  • In law two researchers have been undertaking longitudinal work on young people, their lives, careers, and criminal careers. That is funded by Scottish Government. That research led to a new piece of policy based on the findings of that research. And there was a quote from Scottish Government showing a decline in youth crime, attributing that to the policy change, and which was based on research – showing that clear line of impact.
  • In sociology, a researcher wrote about the impact of research on the financial crisis for the London Review of Books, it was well received and he was named one of the most influential thinkers on the crisis; his work was translated to French; it was picked up in parliament; and Stephanie Flanders – then BBC economics editor – tweeted that this work was the most important on the financial crisis.
  • In music, researchers developed the Skoog, an instrument for disabled students to engage in music. They set up a company, they had investment. At the time of the REF they had 6 employees, they were selling to organisations – so reaching many people. And it was also used in the Cultural Olympiad during the Olympics in 2012, showing that wider impact.

So for each of these you can see there was both activity, and impact here.

In terms of RepoFringe areas I was asked to talk about the role of repositories and open access. It is potentially important. But typically we don’t see impact coming from the scholarly publication, it’s usually the activities coming from the research or from that publication. Making work open access certainly isn’t enough to just trigger impact.

Social media can be important but it needs to have a high level of engagement, reach and/or significance to demonstrate more than mere dissemination. That Stephanie Flanders example wouldn’t be enough on its own; it works as part of demonstrating another impact, and as a good way to track impact, to understand your audience… And to follow up and see what happened next…

Metrics – there is no doubt that numeric evidence was important. Our head of research said last time “numbers speak louder than adjectives” but they have to be relevant and useful. You need context. Standardised metrics/Altmetrics don’t work – a big report recently concluded the same. Altmetrics are alternative metrics that can be tracked online, using DOIs. A company called Altmetric gathers that data, which can be useful to track… And can be manipulated by friends with big Twitter followings.. It won’t replace case studies, but may be useful for tracking…

In terms of importance of impact… It relates to 20% of REF score; determined 26% of the funding in Scotland. Funding attracted per annum for the next 7 years:

  • 4* case study brings in £45-120k
  • 3* £9-25k
  • 2* £0
  • 4* output, for comparison, is worth £7-15k…
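
To put those per-annum figures in context, here is a minimal sketch of the arithmetic over the seven-year funding period. It assumes the quoted ranges stay flat for all seven years, which is a simplification made for illustration and not something stated in the talk; the function and variable names are invented for this example.

```python
# Per-annum funding ranges quoted in the talk, in pounds (low, high).
PER_ANNUM_GBP = {
    "4* case study": (45_000, 120_000),
    "3* case study": (9_000, 25_000),
    "2* case study": (0, 0),
    "4* output": (7_000, 15_000),
}

FUNDING_YEARS = 7  # "per annum for the next 7 years"


def total_over_period(per_annum, years=FUNDING_YEARS):
    """Multiply a (low, high) per-annum range by the number of years.

    Assumes the annual figure does not change year to year.
    """
    low, high = per_annum
    return low * years, high * years


if __name__ == "__main__":
    for label, per_annum in PER_ANNUM_GBP.items():
        low, high = total_over_period(per_annum)
        print(f"{label}: £{low:,} to £{high:,} over {FUNDING_YEARS} years")
```

On those assumptions a single 4* case study is worth somewhere between £315,000 and £840,000 over the period, against £49,000 to £105,000 for a 4* output.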

The question that does come up is “what is impact” and yes, a single Tweet could be impact that someone has read and engaged with your work… But those big impact case studies are about making a real change and a range of impacts.

Pauline Jones (PJ), REF Manager and Head of Strategic Performance and Research Policy, University of Edinburgh

Thank you to Catriona and Anne-Sofie for introducing impact. I wanted to reinforce the idea that this is what we are doing anyway, making an impact on society, so it is important anyway, not only because of the REF.

Catriona suggested we had a “year off” but actually once REF happened we went into an intense period of evaluation and reflection, then of course the Stern review, consultation, general election… It has been quite non-stop. But actually even if that wasn’t all going on, we’d need our academics to be aware of the REF and of open access. I think open access is incredibly important, people are looking for it… Research is publicly funded… But it has required a lot of work to get up and running.

Although we are roughly at mid point between REFs, we are up and running, gathering impact, preparing to emphasise our impact. In terms of collecting evidence, depositing papers… That will happen in most universities. I think many will be doing the sort of Mock REFs/REF readiness exercises that we have been undertaking. We are also already thinking about drafting our case studies. As we get nearer to submission we’ll take decisions on inclusion… and getting everything ready.

So for REF 2021 we have a long time period over which submission is prepared. There is no period over which outputs, impacts, environment don’t count. Academics thinking now about what to include: 2017 REF readiness exercise to focus on open access and numbers; 2018 Mock REF to focus on quality. And we all have to have a CRIS system now to make that work.

What’s new here? We are still waiting for the draft to understand what’s happening. There are open access journal articles/conference proceedings. There are probably the challenges of submitting all research staff; decoupling the one-to-four staff-to-outputs ratio. That break is quite a good thing… Some researchers might struggle to create four key outputs – part time staff, staff with maternity leave, etc. But we want a sense of what that looks like from our mock/readiness work. That non-portability requirement seems useful and desirable, but speaking personally I think the researcher invests a lot – not just an institution – making that complex. Taking all those together I’m not sure the Stern idea of less complexity or burden is achieved here, not alongside those changes.

And then we have the institutional impact case studies – we had a number of interdisciplinary examples of work, so we are comfortable with that possibility. Institutional environment is largely shared so doing that once for the whole university could be a really helpful reduction in workload. And each new element will have implications for how CRIS systems support REF submissions.

And as we prepare for REF 2021 we also have to look to REF 2028. We think open data will be important given the Concordat on Open Research Data (signed by HEFCE; RCUK; Universities UK; Wellcome) so we can get ready now, ready for when that happens. I’m pretty confident that open access monographs will be part of the next REF (following the Monographs and Open Access HEFCE report). Then there is institutional impact – may not happen here but may be back. And then there are metrics. We have The Metric Tide: Report of the Independent Review of the Role of Metrics in Research Assessment and Management.

In terms of responsible metrics, we haven’t heard the last of them… There is the Forum for Responsible Metrics. Data and metrics should support decisions, not be the sole driver; but the conversation will not end with The Metric Tide. Metrics are alluring but to date they haven’t worked well versus other types of evidence.

So, how do we prepare?

  • For REF 2021 we need to be agile, support research managers to help academics deposit work, and we have to lobby CRIS system designers to have fit-for-purpose systems.
  • For REF 2028 we have to understand the benefits and challenges of making more research open
  • Be part of the conversation on responsible metrics – any bibliometrics experts in the room will stay busy.
  • And we want to have interoperability in systems…


Q1) How can we do something useful in terms of impact for case studies as our repository remit expands to different materials, different levels of openness, etc.

A1 – ASL) I think being easily accessible on University websites, making them findable… Then also perhaps improved search functionality, and some way to categorise what comes out… If creating things other than peer reviewed publications – “what is this?” type information. I might have been too negative about repositories because historically our data wasn’t in those… I think actually sciences find that even more important…

Q1) For collecting evidence?

A1 – ASL) Yes, for collecting… Some have metrics that help us see how those impacts have worked.

A1 – PJ) We’ve been talking about how best to use our CRIS to improve join up and understand those impacts…

A1 – CF) I think it’s also about getting that rounded view of the researcher – their outputs, publications, etc. being captured as impacts alongside the outputs… That could be useful and valuable…

Q2) A common theme was the burden of this exercise… But could be argued that it drives positive changes… How can the REF add to the sector?

A2 – CF) Wearing my personal and former job hat, as impact officer, I did see REF drive strategic investment in universities, including public engagement, that rewards, recognises, and encourages more engagement with the community. There is real sharing of knowledge brought about by impact and the REF.

A2 – ASL) Totally agree.

A2 – PJ) More broadly the REF and RAE… They recognise the importance of research and supporting researchers. For us we get £75M a year through the research excellence funding. And we see the quality of research publications going up…

Q3) Do you have any comments on the academic community and how that supports the REF, particularly around data.

A3 – PJ) At Edinburgh we are very big – we submitted 1800 staff, we could have submitted up to 2500. In my previous role we had much smaller numbers of research staff… So they are different challenges and different systems… We have spoken to our Informatics colleagues to see what we can do. There are definitely benefits at the level of building a system to manage this…

Q3) In an academic environment we have collegiate working practice, and need systems that work together.

A3 – PJ) We have a really distributed set up at Edinburgh, so we are constantly having that conversation, and looking for cross cutting areas, exchanging information…

Q4) the relationship with the researcher changes here… In previous years universities talked about “their research” but it was actually all structured around the individual. In this new model that shift is big, and the role and responsibility of the organisation, the ways that schools interact with their researcher…

A4 – ASL) You do see that in pre-funding application activity with internal peer review processes that build that collegiality within the organisation…

Q5) I was intrigued with the comment that lots of impact isn’t associated with outputs… So that raises questions about the importance of outputs in the REF. Should we rebalance the value of the output and how it is valued.

A5 – ASL) Perhaps. For example when colleagues are providing evidence to government and parliament it is rare for publications to be referenced, and rare for publications to be read… I don’t think those matter… But those include methodology, rigour, evidence of quality of work. But that then becomes briefing papers etc… Otherwise you and I could just make a paper – but that would be opinion. So you need that (hard to read) academic publication, and you have to acknowledge that those are different things and have different roles – and that has to be demonstrated in the case studies.

A5 – CF) I think it’s an interesting question, especially thinking ahead to REF 2021… We are considering how those impacts on the field and impact on wider society are represented – some blue skies research won’t have impact for many years to come…

Q6) I think lay summaries of a piece of work are so crucial. ScienceOpen and Jon Tennant are putting up lay summaries, you have Kudos and other things there contributing to that… The public want to understand what they are reading. I have personally sat on panels as a lay member and I know how hard it is without that kind of lay summary to understand what has taken place.

A6 – ASL) You do need that lay summary of work, or briefing paper, or expert communities which are not lay people… You have to think about audiences and communicating your work widely, and target it… I think repositories are useful to access work, but it’s not enough to put it there – just as it isn’t enough to put an article out there – you have to actively reach out to your audiences.

A6 – CF) I would agree and I would add that there is such a key need to help academics to do that, to support skills for writing lay summaries… Getting it clearer benefits the researcher, their thinking, and how they tell others about their work – that truly enables knowledge exchange.

A6 – PJ) And it benefits the academic audience too. I was listening to a podcast where academics from across disciplines looked at which papers were most valuable, and being readable to a lay audience was a key factor in how those papers did.

10×10 presentations (Chair: Ianthe Sutherland, University Library & Collections)

  1. National Open Data and Open Science Policies in Europe. Martin Donnelly, DCC

I’m talking about some work we’ve done at DCC with SPARC Europe looking at Open Data and Policies across Europe.

The DCC is a centre of expertise in digital curation and data management. We maintain a watching brief on funders’ research data policies (largely focused on the UK). SPARC Europe is a membership organisation comprising academic institutions, library consortia, funding bodies, research institutes and publishers. Their goal is advocating change in scholarly communications for the benefit of research and society. And we have been collaborating since 2016 looking at open data and open science policies across Europe.

So, what is a policy? Well the dictionary definition works, it’s a set of ideas or a plan of what to do in particular situations that has been agreed to officially by a group of people or an organisation.

In this work we looked at national policies – in some regions with a single research funder that could be the funder policy but, in the UK the AHRC wouldn’t count here as that is not a national policy across the whole country. And the last known analysis of this sort dates back to 2013 and much has changed in that time.

We began by compiling and briefly describing a list of national policies in the EU and some ERA states (IS, NO, CH). We circulated that list for comment and additions. We also sought intelligence from contacts from DCC European projects to ask about the status of national approaches, forthcoming or existing policies, etc. We then attempted to classify the policies.

Across the thirteen countries we found: 6 funder policies; 4 national plans or roadmaps; 2 concordat type documents; 2 laws; and 1 working paper. There are more than 13 there as some countries have parallel documents. Identifying the lead, ranking or sponsoring organisation was not always straightforward; sometimes documents were co-signed by partners or groups. All of the policies discussed research data; 7 addressed open access publication explicitly; 6 addressed software, code, tools or models; 5 addressed methods, workflows or protocols; and 1 addressed physical (non-digital) samples.

Most policies were prescriptive or imperative. Monitoring of compliance and/or penalties are not that common. And these are new – only 2 policies pre-date 2014, but there are preceding open access policies. And new policies keep appearing as a result of our work… And two policies have been translated to English specifically because of this work (Estonia, Cyprus). The EC’s Open Research Data Pilot for Horizon 2020 was cited in multiple policy documents. And we hope that Brexit won’t diminish our role or engagement in European open data policy.

  1. IIIF: you can keep your head while all around are losing theirs! Scott Renton, University of Edinburgh

IIIF is the International Image Interoperability Framework which enables you to use images in your cultural heritage resources. IIIF works through two APIs. You bring in images through the Image API through IIIF compliant URLs, which have long URLs that include the region of the image, instructions for display, etc. The other API is the Presentation API which is much more about curation, including the ability to curate collections of content – so you can structure these as, say, an image of a building that is related to images of the rooms in that building.
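
As a rough illustration of the Image API URLs described above, the sketch below assembles one from its parts, following the pattern published in the IIIF Image API specification: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The server address and image identifier are hypothetical placeholders, not the University's actual endpoints, and the helper name is invented for this example.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL from its components.

    region:   "full", or "x,y,w,h" in pixels for a crop
    size:     "max", or e.g. "500," for 500px wide with height scaled
    rotation: degrees clockwise, as a string
    """
    # The identifier is percent-encoded so characters such as "/" cannot
    # be mistaken for path separators.
    encoded_id = quote(identifier, safe="")
    return (f"{base.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


if __name__ == "__main__":
    # Hypothetical server and identifier, for illustration only.
    base = "https://iiif.example.org/images"
    print(iiif_image_url(base, "scroll-001"))
    print(iiif_image_url(base, "scroll-001",
                         region="0,0,1000,800", size="500,"))
```

The second call asks for the top-left 1000 by 800 pixel region scaled to 500 pixels wide, which is the kind of request a zoomable viewer makes as you pan and zoom.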

We have images in Luna and we pushed on Luna to support IIIF and we did get success there. We implemented IIIF in December. We made a lot of progress and have IIIF websites online. The workflows are really complex but it allows us to maintain one set of images and metadata through these embedded images, rather than having to copy and duplicate work. And those images are zoomable, draggable, etc. And Metadata Games is also IIIF compliant. And it is feeding into our websites including the new St Cecilia’s Hall museum website.

Our next implementation was the Coimbra virtual implementation – which includes other people’s images. For our images, and other IIIF compliant organisations that was easy, but we had to set up our own server (named Cantaloupe) to manage those images from others.

The next challenge was the Mahabharata Scroll. It is a huge document but the IIIF spec and Luna allow us to programme a sequence of viewers…

And our main achievement has been Polyanno that allows annotation that can then be stored in manifests, to upload and discuss annotations. It’s proving very popular with the IIIF community. We have huge amount of images to convert to IIIF but lots of plans, lots of ideas, and lots to do…

We are also collaborating with NLS around their content, and are up for talking with others about IIIF!

  1. Reference Rot in theses: a HiberActive pilot. Nicola Osborne, EDINA, University of Edinburgh

This was my presentation – so notes from me here but some links to Site2Cite, a working demo/pilot tool for researchers to proactively archive their web citations as they are doing their research, to ensure that by the time they submit their PhD, have their work published, or begin follow up work, they still have access to those important resources.

Introducing Site2Cite:

Try out the Site2Cite tools for yourself here:

You can view my full slides (slightly updated to make more sense for those who didn’t hear the accompanying talk) from the 10×10 here:

This ISG Innovation Funded pilot project builds upon our previous Andrew W. Mellon-funded Hiberlink project; a collaboration between EDINA, Los Alamos National Laboratory, and the University of Edinburgh School of Informatics. The Hiberlink project built on and worked with Herbert Van de Sompel and his Memento work.

  1. Lifting the lid on global research impact: implementation and analysis of a Request a Copy service. Dimity Flanagan, London School of Economics and Political Science

Apologies for missing the first few minutes of Dimity’s talk…

LSE have only recently implemented the “request a copy” button in the repository but, having done that Dimity and colleagues have been researching how it is used.

We’ve had about 500 requests so far. The most popular requests have been for international relations, law and media areas. And we see demand from organisations and governments – including requests explicitly stating that they do not subscribe to the journal and they felt it was crucial to their work. There is that potential impact here being revealed in requests for articles ahead of key meetings and events, etc.

And these requests show huge reach from organisations locally and around the world.

One thing we have noticed is that we get a lot of requests from students who can definitely access the version of record through journals subscribed to by their university – they don’t realise and that causes avoidable delay. We have also seen academics linking from reading lists to restricted items in repositories. But, on a more positive note, we’ve had lots of requests from our alumni – 70% of our alumni are international and that shows really positive impact for our work.

Overall this button and the evidence that requests provide has been really positive.

  1. What RADAR did next: developing a peer review process for research plans. Nicola Siminson, Glasgow School of Art

RADAR captures performances, exhibitions, as well as traditional articles, monographs etc. It is hosted on EPrints. And we encourage staff to add as much metadata as possible. But increasingly it is being used internally, with staff developing annual research plans (ARPs) and that feeding into allocations in the year ahead.

These ARPs arose in part from the outcome of the REF 2014 assessment. These peer reviewed (but not openly available) ARPs aim to enable research time to be allocated more effectively with a view to maximising the number of high quality submissions to the next REF. RADAR houses the template as it played a key role in the GSA REF 2014 submissions, and staff already use and know the system.

The template went live in 2015, and was tweaked, tried and relaunched in February 2015. The ARP template captures the research, the researcher’s details, and the expected impact of their work – and a submit process. The process was really quite manual so we thought carefully about how this should work… So once submitted the digital ARP went into a manual process. Once piloted we built the peer review process into RADAR, including access management that allows the researcher sole access until submitted, and then manages access back and forth as required.

We discussed this work with EPrints in Autumn 2016 and development commenced in Spring 2017. This was quite an involved process. The system was live in time for ARP panel chairs to send feedback and results.

So the process now sees ARPs submitted; the RADAR admin provides the Head of Research with a report of all ARPs submitted. Then it goes through a series of review stages and feedback stages.

So administrators can view ARPs, panels, status, etc. and there is space for reviews to be captured and the outcome to be shared.

Lessons learned here… No matter how much testing you have done, you’ll still need to tweak and flag things – it’s useful to have a keen researcher to test it and feed back on those tweaks. We still need to increase prominence of summary and decision for the researcher, with more differentiated fields for peer reviews, etc. In conclusion the ARP peer reviewed process has been integrated into RADAR and will be fully tested next year. The continued development of RADAR is bearing fruit – researchers are using the repository and adding more outputs, and offering greater visibility and downloads for GSA.

Explore our repository at

  1. Edinburgh DataVault: Local implementation of Jisc DataVault: the value of testing. Pauline Ward, EDINA

I am Pauline Ward from the Research Data Service at the University of Edinburgh, and I am based at EDINA which is part of the University. Jisc commissioned UoE’s Library and University Collections (L&UC) team to design a service for researchers to store data for the long term with the Jisc Data Vault. And we’ve now implemented a version of this at Edinburgh – using that software from L&UC and specified and managed by EDINA.

The DataVault allows safe data storage in the University’s archival storage option, which links this data to a metadata record in Pure without having to re-enter any of the data. And, optionally, to receive a DOI for the data which can be used in publications and other outputs – depending on the context and appropriate visibility of the data. That allows preservation of data at the University. The DataVault is not for making data public – we have a service called DataShare for that.

So, let’s talk about metadata… We push that metadata to Pure and keep DataVault metadata as concise as possible. We need metadata that is usable and have some manual intervention to check and curate that.

We had a fairly extensive user testing process, to ensure documentation works well, then we also recruited academics from across the University to bring us their data and test the system to help us ensure it met their needs.

So, the interim version is out there, and we are continuing to develop and improve it.

  1. Data Management & Preservation using PURE and Archivematica at Strathclyde. Alan Morrisson, University of Strathclyde

We are governed and based in the research department. We wanted to look at both research data management and long term preservation, including reflecting on whether Pure is the right tool for the job here. Pure was already in use at Strathclyde when our Research Data Deposit Policy was being developed, so we deliberately made the policy as open as possible. Also Strathclyde is predominantly a STEM university, and we started off by surveying what else was out there… We knew the quantity and type of data coming in…

And since we opened up the service, in terms of data deposits to date we have seen a steady increase from about 200 to 400 data sets over the last year.

In terms of our preservation and curation systems we have Pure in place and that does a lot – data storage, metadata, DOI etc. But we’ve also recently implemented Archivematica – it’s free, it’s open source, it’s compatible with Jisc DataVault. So the workflow right now is that data, metadata and related outputs are added to Pure, and a DOI minted. This feeds the knowledgebase portal. In parallel the data from Pure goes to Archivematica where it is ingested and processed for preservation, and the AIP METS file is cleaned using METSflask before being stored.

The benefits of this set up are that Pure is familiar to researchers, does a good job of metadata management and related content and has a customised front end (Knowledgebase). Archivematica is well supported, open access, and designed for archiving. But those systems don’t work together, we are manually moving data across. Pure is designed for storage and presentation, not curation. Archivematica only recognises about 40% of the data.

So, in the future we are reviewing our system, perhaps using Pure for metadata only. We are keeping an eye on Jisc RDSS and considering possible Arkivum like storage. And generally looking at what is possible and most appropriate moving forward for curation and archiving.

  1. Open Access… From Oblivion… To the Spotlight? Dawn Hibbert, University of Northampton

I’ll be looking back over the last ten years… And actually ten years back I was working here in Accommodation Services, so not thinking about repositories at all!

Looking back at 2007/8 in the repository world we had our NECTAR repository. Then in 2011, a Jisc funded project enabled an author deposit tool for NECTAR. At that time we had a carrot/incentive for deposit, but no stick. Which was actually a nice thing as we’ve now slipped more towards it all being about the REF.

By 2012/13 we engaged with our researchers around open access who had feedback such as “it’s in the library – you can get a copy from there” or “it’s only £30 to buy the journal I publish in, if I make my article free the journal will go under” or “My work is not funded by RCUK so why should my work be open access”. We wanted everything open… But by 2014/15 (and the HEFCE announcement) we were still getting “I don’t have to give you anything until 2016” and similar… And we get that idea of “it’s all about the REF”. And it is not. Using the REF in that way, and the repository in that way, overlooks the other benefits of open access.

So in 2016/17 HEFCE compliance started. Attitudes have shifted. But the focus has all been about gold APCs and the idea of the university paying. When actually we are using the HEFCE deposit and (later) open access green OA route. And for us we really want researchers to deposit much more than the open access part (we can do that later on).

So, in 2017 and beyond we are looking at emphasising the benefits, sharing that information, being positive about the opportunities, not just using the HEFCE stick. And for open access work we are looking at improving acceptance, extending open access to other outputs, and focusing on visibility of research outputs – the Kudos type tool. And we are shifting the focus to Digital Preservation.

We are looking at datasets being open access too. RDM and Digital Preservation gaining ground. And when work is deposited, shared, tweeted, etc. that can really shift attitudes and show benefits and engagement for academic colleagues.

But we still see lots of money spent on PA and journal subscriptions. And we have yet to see what happens with RCUK and REF compliance.

  1. Automated metadata collection from the researcher CV Lattes Platform to aid IR ingest. Chloe Furnival, Universidade Federal de São Carlos

I am pleased to present work by myself and my colleagues from Sao Paulo in Brazil. Back in 1999 all Brazilian universities were required to share CVs of their research and academic staff on a platform (Curriculo Lattes) which now has over 2 million records.

However, our University’s repository was only launched in 2016. Unlike many universities, which use Web of Science or Scopus to capture their researchers’ work, we saw that the Lattes CV Platform held the key and most up to date metadata – it is always kept up to date as that is required for funding. It is a really useful stepping stone to identify our staff publications for the initial repository.

So we have very well known researchers, Mena-Chalco and Cesar Jr (2013) who developed ScriptLattes for this extraction. But then the CNPq decided to implement a CAPTCHA which inhibits this Script. They alleged this was for security reasons but it created an uproar as it was seen as “our data”… So, this has all been very complicated and impacted on our plans to identify our own researchers’ work… So we went for SOAP (Simple Object Access Protocol). We also developed a proxy server to deal with CNPq limits. This is based on OpenResty platform to share access to the Lattes SOAP webservices. That lets us manage our local IP address and manage load/avoid going over capacity.

We extract data in XML format, then process it in Python to generate Dublin Core. Then we use another script to eliminate duplicates using the Jaccard measure that helps detect differences… Then, once processed, it is held in DSpace. Each record in Lattes has a unique identifier as that site uses an ID number that all Brazilians are required to have to access e.g. a bank account.

So now the CVs of 1,166 teaching staff and researchers working at our HEI were retrieved in just 11 minutes, including metadata for 78K journal articles and proceedings papers. We had the specific objective of gaining direct and official access to public metadata held in Lattes CV.
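As a rough illustration of the duplicate-elimination step mentioned above, here is a minimal sketch of Jaccard-based matching on titles. This is not the speakers’ script: the function names, the word-level tokenisation and the 0.8 threshold are all assumptions made for the example.

```python
import re


def tokens(title):
    """Lowercase a title and split it into a set of word tokens."""
    return set(re.findall(r"\w+", title.lower()))


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    set_a, set_b = tokens(a), tokens(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def deduplicate(titles, threshold=0.8):
    """Keep the first occurrence of each title; drop later near-duplicates.

    The threshold is an assumed value, not one given in the talk.
    """
    kept = []
    for title in titles:
        if all(jaccard(title, seen) < threshold for seen in kept):
            kept.append(title)
    return kept


# Hypothetical sample records, for illustration only.
records = [
    "Open access in Brazilian universities",
    "Open Access in Brazilian Universities.",
    "Metadata harvesting for institutional repositories",
]
print(deduplicate(records))
```

Comparing every title against every kept title is fine for a sketch, but for tens of thousands of records a real script would probably group candidates first (for example by year or author) before comparing.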

  1. The Changing Face of Goldsmiths Research Online. Jeremiah Spillane, Goldsmiths, University of London

JS: Goldsmiths Research Online started as a vanilla install of EPrints, and it has become customised more and more over time. Important to that development have been several projects. The Jisc Kultur project created a transferable and sustainable institutional repository model for research output in the creative and applied arts, and created a facility for capturing multimedia content in repositories.

Kultur led to the Jisc Kaptur project, led by VADS working with various art colleges including Goldsmiths and GSA.

Then in 2009 we had the Defiant Objects project which looked to understand what makes some objects more difficult to deposit than others.

Jeremiah’s colleague: RAE/REF work has looked at policy versus the open access ethos – and striking the right balance there. So, the Goldsmiths website now includes content brought in from the repository. And that is now organised depending on the needs of different departments. We are also redesigning the website to better embed content to enable exploration of visual content. And the new design should be in place by autumn this year.

Speaking of design… We have been working with OJS but have been wanting to more thoroughly design OJS journals, so we have a new journal coming, Volupte, which runs on OJS in the background but uses SquareSpace at the front end – that’s a bit of an experiment at the moment.

JS: So, the repository continues to develop, whilst our end users primarily focus on their research.

Take a look at:

And with that Day One, and my visit to Repository Fringe 2017, is done. 

Aug 02 2017

As we reach the end of the academic year, and I begin gearing up for the delightful chaos of the Edinburgh Fringe and my show, Is Your Online Reputation Hurting You?, I thought this would be a good time to look back on a busy recent few months of talks and projects (inspired partly by Lorna Campbell’s post along the same lines!).

This year the Managing Your Digital Footprint work has been continuing at a pace…

We began the year with funding from the Principal’s Teaching Award Scheme for a new project, led by Prof. Sian Bayne: “A Live Pulse”: Yik Yak for Teaching, Learning and Research at Edinburgh. Sian, Louise Connelly (PI for the original Digital Footprint research), and I have been working with the School of Informatics and a small team of fantastic undergraduate student research associates to look at Yik Yak and anonymity online. Yik Yak closed down this spring which has made this even more interesting as a cutting edge research project. You can find out more on the project blog – including my recent post on addressing ethics of research in anonymous social media spaces; student RA Lilinaz’s excellent post giving her take on the project; and Sian’s fantastic keynote from #CALRG2017, giving an overview of the challenges and emerging findings from this work. Expect more presentations and publications to follow over the coming months.

Over the last year or so Louise Connelly and I have been busy developing a Digital Footprint MOOC, building on our previous research, training and best practice work, to share this with the world. We designed a three week MOOC (Massive Open Online Course) that runs on a rolling basis on Coursera – a new session kicks off every month. The course launched this April and we were delighted to see it get some fantastic participant feedback and some fantastic press coverage (including a really positive experience of being interviewed by The Sun).

The MOOC has been going well and building interest in the consultancy and training work around our Digital Footprint research. Last year I received ISG Innovation Fund support to pilot this service and the last few months have included great opportunities to share research-informed expertise and best practices through commissioned and invited presentations and sessions including those for Abertay University, University of Stirling/Peer Review Project Academic Publishing Routes to Success event, Edinburgh Napier University, Asthma UK’s Patient Involvement Fair, CILIPS Annual Conference, CIGS Web 2.0 & Metadata seminar, and ReCon 2017. You can find more details of all of these, and other presentations and workshops on the Presentations & Publications page.

In June an unexpected short notice invitation came my way to do a mini version of my Digital Footprint Cabaret of Dangerous Ideas show as part of the Edinburgh International Film Festival. I’ve always attended EIFF films but also spent years reviewing films there so it was lovely to perform as part of the official programme, working with our brilliant CODI compère Susan Morrison and my fellow mini-CODI performer, mental health specialist Professor Steven Lawrie. We had a really engaged audience with loads of questions – an excellent way to try out ideas ahead of this August’s show.

Also in June, Louise and I were absolutely delighted to find out that our article (in Vol. 11, No. 1, October 2015) for ALISS Quarterly, the journal of the Association of Librarians and Information Professionals in the Social Sciences, had been awarded Best Article of the Year. Huge thanks to the lovely folks at ALISS – this was lovely recognition for our article, which you can read in full in the ALISS Quarterly archive.

In July I attended the European Conference on Social Media (#ecsm17) in Vilnius, Lithuania. In addition to co-chairing the Education Mini Track with the lovely Stephania Manca (Italian National Research Council), I was also there to present Louise and my Digital Footprint paper, “Exploring Risk, Privacy and the Impact of Social Media Usage with Undergraduates“, and to present a case study of the EDINA Digital Footprint consultancy and training service for the Social Media in Practice Excellence Awards 2017. I am delighted to say that our service was awarded 2nd place in those awards!

My Social Media in Practice Excellence Award 2017 2nd place certificate (still awaiting a frame).

You can read more about the awards – and my fab fellow finalists Adam and Lisa – in this EDINA news piece.

On my way back from Lithuania I had another exciting stop to make at the Palace of Westminster. The lovely folk at the Parliamentary Digital Service invited me to give a talk, “If I Googled you, what would I find? Managing your digital footprint” for their Cyber Security Week which is open to members, peers, and parliamentary staff. I’ll have a longer post on that presentation coming very soon here. For now I’d like to thank Salim and the PDS team for the invitation and an excellent experience.

The digital flyer for my CODI 2017 show (click to view a larger version) – huge thanks to the CODI interns for creating this.

The final big Digital Footprint project of the year is my forthcoming Edinburgh Fringe show, Is Your Online Reputation Hurting You? (book tickets here!). This year the Cabaret of Dangerous Ideas has a new venue – the New Town Theatre – and two strands of events: afternoon shows; and “Cabaret of Dangerous Ideas by Candlelight”. It’s a fantastic programme across the Fringe and I’m delighted to be part of the latter strand with a thrilling but challengingly competitive Friday night slot during peak fringe! However, that evening slot also means we can address some edgier questions so I will be talking about how an online reputation can contribute to fun, scary, weird, interesting experiences, risks, and opportunities – and what you can do about it.

Help spread the word about my CODI show by tweeting with #codi17 or sharing the associated Facebook event.

To promote the show I will be doing a live Q&A on YouTube on Saturday 5th August 2017, 10am. Please do add your questions via Twitter (#codi17digifoot) or via this anonymous survey and/or tune in on Saturday (the video below will be available on the day and after the event).

So, that’s been the Digital Footprint work this spring/summer… What else is there to share?

Well, throughout this year I’ve been working on a number of EDINA’s ISG Innovation Fund projects…

The Reference Rot in Theses: a HiberActive Pilot project has been looking at how to develop the fantastic prior work undertaken during the Andrew W. Mellon-funded Hiberlink project (a collaboration between EDINA, Los Alamos National Laboratory, and the University of Edinburgh School of Informatics), which investigated “reference rot” (where URLs cease to work) and “content drift” (where URLs work but the content changes over time) in scientific scholarly publishing.

For our follow up work the focus has shifted to web citations – websites, reports, etc. – something which has become a far more visible challenge for many web users since January. I’ve been managing this project, working with developer, design and user experience colleagues to develop a practical solution around the needs of PhD students, shaped by advice from Library and University Collections colleagues.

If you are familiar with the Memento standard, and/or follow Herbert Van de Sompel and Martin Klein’s work you’ll be well aware of how widespread the challenge of web citations changing over time can be, and the seriousness of the implications. The Internet Archive might be preserving all the (non-R-rated) gifs from Geocities but without preserving government reports, ephemeral content, social media etc. we would be missing a great deal of the cultural record and, in terms of where our project comes in, crucial resources and artefacts in many modern scholarly works. If you are new to the issue of web archiving I would recommend a browse of my notes from the IIPC Web Archiving Week 2017 and papers from the co-located RESAW 2017 conference.

A huge part of the HiberActive project has been working with five postgraduate student interns to undertake interviews and usability work with PhD students across the University. My personal and huge thanks to Clarissa, Juliet, Irene, Luke and Shiva!

A preview of the HiberActive gif featuring Library Cat.

You can see the results of this work at our demo site, and we would love your feedback on what we’ve done. You’ll find an introductory page on the project as well as three tools for archiving websites and obtaining the appropriate information to cite – hence adopting the name one of our interviewees suggested, Site2Cite. We are particularly excited to have a tool which enables you to upload a Word or PDF document, have all URLs detected, and which then returns a list of URLs and the archived citable versions (as a csv file).
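To give a flavour of the detect-URLs-and-return-a-csv idea, here is a minimal sketch in Python. It is not the Site2Cite code: it assumes the text has already been extracted from the Word or PDF file, the regular expression is a deliberately simple one, and `archive_lookup` and `fake_lookup` are hypothetical stand-ins for whatever archiving service would supply the citable archived version.

```python
import csv
import io
import re

# Simple pattern (an assumption): http(s) URLs up to whitespace or a closing bracket/quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def find_urls(text):
    """Return unique URLs in order of first appearance, trimming trailing punctuation."""
    seen = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")
        if url and url not in seen:
            seen.append(url)
    return seen


def urls_to_csv(text, archive_lookup):
    """Build CSV text with one row per detected URL and its archived version."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url", "archived_url"])
    for url in find_urls(text):
        writer.writerow([url, archive_lookup(url)])
    return buffer.getvalue()


def fake_lookup(url):
    # Placeholder only: a real lookup would ask an archiving service for a snapshot.
    return "archived:" + url


sample = (
    "See http://example.org/report.pdf, and https://example.com/page "
    "(again http://example.org/report.pdf)."
)
print(urls_to_csv(sample, fake_lookup))
```

Passing the lookup in as a function keeps the URL detection separate from the archiving step, so the same sketch could be pointed at different archives without changing the parsing.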

Now that the project is complete, we are looking at what the next steps may be so if you’d find these tools useful for your own publications or teaching materials, we’d love to hear from you.  I’ll also be presenting this work at Repository Fringe 2017 later this week so, if you are there, I’ll see you in the 10×10 session on Thursday!

To bring HiberActive to life our students suggested something fun and my colleague Jackie created a fun and informative gif featuring Library Cat, Edinburgh’s world famous sociable on-campus feline. Library Cat has also popped up in another EDINA ISG Innovation-Funded project, Pixel This, which my colleagues James Reid and Tom Armitage have been working on. This project has been exploring how Pixel Sticks could be used around the University. To try them out properly I joined the team for a fun photography night in George Square with a Pixel Stick loaded with images of notable University of Edinburgh figures. One of my photos from that night, featuring the ghostly image of the much missed Library Cat (1.0) went a wee bit viral over on Facebook:

James Reid and I have also been experimenting with Tango-capable phone handsets in the (admittedly daftly named) Strictly Come Tango project. Tango creates impressive 3D scans of rooms and objects and we have been keen to find out what one might do with that data, how it could be used in buildings and georeferenced spaces. This was a small exploratory project but you can see a wee video on what we’ve been up to here.

In addition to these projects I’ve also been busy with continuing involvement in the Edinburgh Cityscope project, which I sit on the steering group for. Cityscope provided one of our busiest events for this spring’s excellent Data Fest – read more about EDINA’s participation in this new exciting event around big data, data analytics and data driven innovation, here.

I have also been working on two rather awesome Edinburgh-centric projects. Curious Edinburgh officially launched for Android, and released an updated iOS app, for this year’s Edinburgh International Science Festival in April. The app includes History of Science; Medicine; Geosciences; Physics; and a brand new Biotechnology tour that let you explore Edinburgh’s fantastic scientific legacy. The current PTAS-funded project is led by Dr Niki Vermeulen (Science, Technology & Innovation Studies), with tours written by Dr Bill Jenkins, and will see the app used in teaching around 600 undergraduate students this autumn. If you are curious about the app (pun entirely intended!), visiting Edinburgh – or just want to take a long distance virtual tour – do download the app, rate and review it, and let us know what you think!

A preview of the new Curious Edinburgh History of Biotechnology and Genetics Tour.

The other Edinburgh project which has been progressing at a pace this year is LitLong: Word on the Street, an AHRC-funded project which builds on the prior LitLong project to develop new ways to engage with Edinburgh’s rich literary heritage. Edinburgh was the first city in the world to be awarded UNESCO City of Literature status (in 2008) and there are huge resources to draw upon. Prof. James Loxley (English Literature) is leading this project, which will be showcased in some fun and interesting ways at the Edinburgh International Book Festival this August. Keep an eye on for updates or follow @litlong.

And finally… Regular readers here will be aware that I’m Convener for eLearning@ed (though my term is up and I’ll be passing the role onto a successor later this year – nominations welcomed!), a community of learning technologists and academic and support staff working with technologies in teaching and learning contexts. We held our big annual conference, eLearning@ed 2017: Playful Learning this June and I was invited to write about it on the ALTC Blog. You can explore a preview and click through to my full article below.

Playful Learning: the eLearning@ed Conference 2017

Phew! So, it has been a rather busy few months for me, which is why you may have seen slightly fewer blog posts and tweets from me of late…

In terms of the months ahead there are some exciting things brewing… But I’d also love to hear any ideas you may have for possible collaborations as my EDINA colleagues and I are always interested to work on new projects, develop joint proposals, and work in new innovative areas. Do get in touch!

And in the meantime, remember to book those tickets for my CODI 2017 show if you can make it along on 11th August!

Jul 04 2017

Today I am again at the Mykolo Romerio Universitetas in Vilnius, Lithuania, for the European Conference on Social Media 2017. As usual this is a liveblog so additions, corrections etc. all welcome… 

Keynote presentation: Daiva Lialytė, Integrity PR, Lithuania: Practical point of view: push or pull strategy works on social media 

I attended your presentations yesterday, and you are going so far into detail in social media. I am a practitioner and we can’t go into that same sort of depth because things are changing so fast. I have to confess that a colleague, a few years ago, suggested using social media and I thought “Oh, it’s all just cats” and I wasn’t sure. But it was a big success, we have six people working in this area now. And I’m now addicted to social media. In fact, how many times do you check your phone per day? (various guesses)…

Well, we are checking our smartphones 100-150 times per day. And some people would rather give up sex than smartphones! And we have this constant flood of updates and information – notifications that pop up all over the place… And there are a lot of people, organisations, brands, NGOs, etc. all want our attention on social media.

So, today, I want to introduce three main ideas here as a practitioner and marketer…

#1 Right Mindset

Brands want to control everything, absolutely everything… The colour, the font, the images, etc. But now social media says that you have to share your brand in other spaces, to lose some control. And I want to draw on Paul Holmes, a PR expert, who says that when he fell in love with social media, there were four key aspects:

  • Brands (in)dependency
  • Possibilities of (non)control
  • Dialogue vs monologue
  • Dynamic 24×7

And I am going to give some examples here. So Gap, the US fashion brand, they looked at updating their brand. They spent a great deal of money to do this – not just the logo but all the paperwork, branded items, etc. They launched it, it went to the media… And it was a disaster. The Gap thought for a few days. They said “Thank you brand lover, we appreciate that you love our brand and we are going to stick with the old one”. And this raises the question of to whom a brand belongs… Shareholders or customers? Perhaps now we must think about customers as owning the brand.

Yesterday I saw a presentation from Syracuse on University traditions – and some of the restrictions of maintaining brand – but in social media that isn’t always possible. So, another example… Lagerhaus (like a smaller scale Ikea). They were launching a new online store, and wanted to build community (see videos) so targeted six interior design blogs and created “pop up online stores” – bloggers could select products from the store’s selection, and promote them as they like. That gained media attention, and gained Facebook likes for the store’s Facebook page. And there was then an online store launch, with invitees approached by bloggers, and their pop up stores continue. So this is a great example of giving control to others, and building authentic interest in your brand.

In terms of dialogue vs monologue I’d quote from Michael Dell here, on the importance of engaging in honest, direct conversations with customers and stakeholders. This is all great… But the reality is that many who talk about this are never ever doing this… Indeed some just shut down spaces when they can’t engage properly. However, Dell has set up a social media listening and command centre. 22k+ posts are monitored daily, engaging 1000+ customers per week. This was tightly integrated with the @dellcares Twitter/Facebook team. And they have managed to convert “ranters” to “ravers” in 30% of cases. And there has been a decrease in negative commentary since engagement in this space. Posts need quick responses: a few minutes, or hours, is great; any longer and it becomes less and less useful…

Similarly we’ve seen Scandinavian countries and banks engaging, even when they have been afraid of negative comments. And this is part of the thing about being part of social media – the ability to engage in dialogue, to be part of and react to the conversations.

Social media is really dynamic, 24×7. You have to move fast to take advantage. So, Lidl… They heard about a scandal in Lithuania about the army paying a fortune for spoons – some were €40 each. So Lidl ran a promotion for being able to get everything, including spoons there cheaper. It was funny, clever, creative and worked well.

Similarly Starbucks vowing to hire 10,000 refugees in the US (and now in EU) following Trump’s travel ban, that was also being dynamic, responding quickly.

#2 Bold Actions

When we first started doing social media… we faced challenges… Because the future is uncertain… So I want to talk about several social media apps here…

Google+ launched claiming to be bigger than Facebook, to do it all better. Meanwhile WhatsApp… Did great… But disappearing as a brand, at least in Lithuania. SnapChat has posts disappearing quickly… Young people love it. The owner has said that it won’t be sold to Facebook. Meanwhile Facebook is trying desperately to copy functionality. We have clients using SnapChat, fun but challenging to do well… Instagram has been a big success story… And it is starting to be bigger than Facebook in some demographics.

A little history here… If you look at a world map of social networks from December 2009, we see quite a lot of countries having their own social networks which are much more popular. By 2013, it’s much more Facebook, but there are still some national social media networks in Lithuania or Latvia. And then by 2017 we see in Africa uptake of Twitter and Instagram. Still a lot of Facebook. My point here is that things move really quickly. For instance young people love SnapChat, so we professionally need to be there too. You can learn new spaces quickly… But it doesn’t matter as you don’t have to retain that for long, everything changes fast. For instance in the US I have read that Facebook is banning posts by celebrities where they promote items… That is good, that means they are not sharing other content…

I want to go in depth on Facebook and Twitter. Of course the most eminent social media platform is Facebook. They are too big to be ignored. 2 billion monthly active Facebook users (June 2017). 1.28 billion people log onto Facebook daily. 83 million fake profiles. The biggest age group is 25 to 34, at 29.7% of users. Many people check Facebook first thing in the morning when they wake up. And 42% of marketers report that Facebook is very important to their business. And we now have brands approaching us to set up Facebook presence no matter what their area of work.

What Facebook does well is most precise targeting – the more precise the more you pay, but that’s ok. So that’s based on geolocation, demographic characteristic, social status, interests, even real time location. That works well but remember that there are 83 million fake profiles too.

So that’s push, what about pull? Well there are the posts, clicks, etc. And there is Canvas – which works for mobile users, story driven ads (mini landing), creative story, generate better results and click through rates. (we are watching a Nespresso mobile canvas demo). Another key tool is Livestream – free of charge, notifications for your followers, and it’s live discussion. But you need to be well prepared and tell a compelling story to make proper use of this. But you can do it from anywhere in the world. For instance one time I saw livestream of farewell of Barack Obama – that only had 15k viewers though so it’s free but you have to work to get engagement.

No matter which tool, “content is the king!” (Bill Gates, 1996). Clients want us to create good stories here but it is hard to do… So what makes the difference? The Content Marketing Institute (US), 2015 suggest:

  1. Content
  2. Photos
  3. Newsletters
  4. Video
  5. Article
  6. Blogs
  7. Events
  8. Infographics
  9. Mobile applications
  10. Conferences and Livestreams

So, I will give some examples here… I’ll show you the recent winner of Cannes Lions 2017 for social media and digital category. This is “Project Graham” – a public driver safety campaign about how humans are not designed to survive a crash… Here is how we’d look if we were – this was promoted heavily in social media.

Help for push from Facebook – well the algorithms prioritise content that does well. And auctions to reach your audience mean that it is cheaper to run good content that really works for your audience.

And LinkedIn meanwhile is having a renaissance. It was quite dull, but they changed their interface significantly a few months back, and we now see influencers (in Lithuania) using LinkedIn, sharing content there. For instance lawyers have adopted the space. Some were predicting LinkedIn would die, but I am not so sure… It is the biggest professional social network – 467 million users in 200 countries. And it is the biggest network of professionals – a third have LinkedIn profile. Users spend 17 minutes per day, 40% use it every day, 28% of all internet users use LinkedIn. And it is really functioning as a public CV, recruitment, and for ambassadorship – you can share richer information here.

I wanted to give a recent example – it is not a sexy looking case study – but it worked very well. This was work with Ruptela, a high tech company that provides fleet management based on GPS tracking and real-time vehicle monitoring and control. They needed to hire rapidly 15 new sales representatives via social media. That’s a challenge as young people, especially in the IT sector – are leaving Lithuania or working in Lithuania-based expertise centres for UK, Danish, etc. brands.

So we ran a campaign, on a tiny budget (incomparable with headhunters for instance), around “get a job in 2 days” and successfully recruited 20 sales representatives. LinkedIn marketing is expensive, but very targeted and much cheaper than you’d otherwise pay.

#3 Right Skills

In terms of the skills for these spaces:

  • copywriter (for good storytelling)
  • visualist (graphics, photo, video)
  • community manager (to maintain appropriate contact) – the skills for that cannot be underestimated.
  • And… Something that I missed… 

You have to be like a one man band – good at everything. But then we have young people coming in with lots of those skills, and can develop them further…

So, I wanted to end on a nice story/campaign… An ad for Budweiser about not drinking and driving


Q1) Authenticity is the big thing right now… But do you think all that “authentic” advertising content may get old and less effective over time?

A1) People want to hear from their friends, from people like them, in their own words. Big brands want that authenticity… But they also want total control which doesn’t fit with that. The reality is probably that something between those two levels is what we need but that change will only happen as it becomes clear to big brands that their controlled content isn’t working anymore.

Q2) With that social media map… What age group was that? I didn’t see SnapChat there.

A2) I’m not sure, it was a map of dominant social media spaces…

Q3) I wanted to talk about the hierarchy of content… Written posts, visual content etc… What seemed to do best was sponsored video content that was subtitled.

A3) Facebook itself, they prioritise video content – it is cheaper to use this in your marketing. If you do video yes, you have to have subtitles so that you can see rather than listen to the videos… And with videos, especially “authentic video” that will be heavily prioritised by Facebook. So we are doing a lot of video work.

Introduction to ECSM 2018 – Niall Corcoran, Limerick Institute of Technology, Ireland

I wanted to start by thanking our hosts this year, Vilnius has been excellent. Next year we’ll be a bit earlier in the year – late June – and we’ll be at the Limerick Institute of Technology, Ireland. We have campuses around the region with 7000 students and 650 staff, teaching from levels 6 to 10. The nearest airport is Shannon, or it is an easy distance from Cork or Dublin airports.

In terms of social media we do research on Social Media Interactive Learning Environment, Limerick Interactive Storytelling Network, Social Media for teaching and research, Social Media for cancer recovery.

In terms of Limerick itself, 80-90% of Europe’s contact lenses are manufactured there! There is a lot of manufacturing in Limerick, with many companies having their European headquarters there. So, I’ve got a short video made by one of our students to give you a sense of the town. And we hope to see you there next year!

Social Media Competition Update

The top three placed entries are: Developing Social Paleontology – Lisa Lundgren; EDINA Digital Footprint Consulting and Training Service – Nicola Osborne (yay!); Traditions Mobile App – Adam Peruta.

Stream A: Mini track on Ethical use of social media data – Chair: Dragana Calic

The Benefits and Complications of Facebook Memorials – White Michelle, University of Hawai’i at Manoa, USA

I wanted to look at who people imagine are their audience for these memorials. And this happened because a death made me look at this, and I decided to look into it in more depth.

So, I’m using danah boyd’s definition of social networking here. We are talking Facebook, Twitter, SnapChat etc. So, a Facebook Memorial is a group that is created specifically to mark the death of a friend or family member – or for public figures (e.g. Michael Jackson).

Robert Zebruck and Brubecker talk about imagined audience as the flattening of realities. So, right now I can see people in the room, I can see who you are, how you react, how to modify my tone or style to meet you, to respond to you. But it is hard to do that on social media. We see context collapse. And we can be sat there alone at our computer and not have that sense of being public. Sometimes with memorials we will say things for that audience, but in other cases perhaps it is sharing memories of drinking together, or smoking weed with someone… Memories that may jar with others.

It was a long road to get to this research. My review board were concerned about emotional distress of interviewees. I agreed in the end to interview via Skype or Facebook and to check everything was ok after every question, to make it easier to see and review their state of mind. I had to wait over a year to interview people, the death had to not be by suicide, and the participants had to be over 18 years old. So I did conduct qualitative research over Skype and Facebook… And I found interviewees by looking at memorial pages that are out there – there are loads there, not all labelled as memorials.

So, my data… I began by asking who people thought they were talking to… Many hadn’t thought about it. They talked about family members, friends… Even in a very controlled group you can have trolls and haters who can get in… But often people assumed that other people were like them. A lot of people would write to the deceased – as if visiting a grave, say. I asked if they thought the person could hear or understand.. But they hadn’t really thought about it, it felt like the right thing to do… And they wanted family and friends to hear from them. They felt likes, shares, etc. were validating and therapeutic, and that sense of connection was therapeutic. Some even made friends through going out drinking, or family gatherings… with friends of friends who they hadn’t met before…

This inability to really think about or understand the imagined audience, that led to context collapse. Usually family is in charge of these pages… And that can be challenging… For instance an up and coming football star died suddenly, and then it was evident that it was the result of a drug overdose… And that was distressing for the family who tried to remove that content. There is an idea of alternative narratives. Fake news or alternative facts has a particular meaning right now… But we are all used to presenting ourselves in a particular way to different friends, etc. In one memorial site the deceased had owed money to a friend, and they still felt owed that money and were posting about that – like a fight at the funeral… It’s very hard to monitor ourselves and other people…

And there was fighting about who owned the person… Some claiming that someone was their best friend, fights over who was more important or who was more influenced. It happens in real life… But not quite as visibly or with all involved…

So, in conclusion… There are a lot of benefits for Facebook Memorials. Psychologists talk of the benefit of connecting, grieving, not feeling alone, to get support. Death happens. We are usually sad when it happens… Social networking sites provide another way to engage and connect. So if I’m in Lithuania and there is a funeral in Hawaii that I can’t travel to, I can still connect. It is changing our social norms, and how we connect. But we can do more to make it work better – safety and security needs improving. Facebook have now added the ability to will your page to someone. And now if someone dies you can notify Twitter – it changes it slightly, birthday reminders no longer pop up, it acts as a memorial. There are new affordances.

Personally, doing this research was very sad, and it’s not an area I want to continue looking at. It was emotionally distressing for me to do this work.


Q1) I am old enough to remember LiveJournal and remember memorials there. They used to turn a page into a memorial, then were deleted… Do you think Facebook should sunset these memorials?

A1) I personally spoke to people who would stare at the page for a month, expecting posts… Maybe you go to a funeral, you mourn, you are sad… But that page sticking around feels like it extends that… But I bet Mark Zuckerberg has some money making plan for keeping those profiles there!

Q2) What is the motivation for such public sharing in this way?

A2) I think young people want to put it out there, to share their pain, to have it validated – “feel my pain with me”. One lady I spoke to, her boyfriend was killed in a mass shooting… Eventually she couldn’t look at it, it was all debate about gun control and she didn’t want to engage with that any more…

Q3) Why no suicides? I struggle to see why they are automatically more distressing than other upsetting deaths…

A3) I don’t know… But my review board thought it would be more distressing for people…

Q4) How do private memorials differ from celebrity memorials?

A4) I deliberately avoided celebrities, but also my IRB didn’t want me to look at any groups without permission from every member of that group…

Comment) I’ve done work with public Facebook groups, my IRB was fine with that.

A4) I think it was just this group really… But there was concern about publicly identifiable information.

Online Privacy: Present Need or Relic From the Past? – Aguirre-Jaramillo Lina Maria, Universidad Pontificia Bolivariana, Colombia

In the influential essay, The Right to Privacy, in the Harvard Law Review (1890) – Warren and Brandeis, privacy was defined as “Privacy – the right to be let alone”. But in the last ten years or so we now see sharing of information that not long ago would have been seen and expected to be private. Earl Warren is a famous US judge and he said “The fantastic advances in the field of electronic communication constitute a greater danger to the privacy of the individual.”

We see privacy particularly threatened by systematic data collection. Mark Zuckerberg (2010) claims “Privacy is no longer a social norm”. This has been used as evidence of disregard toward users’ rights and data, the manner in which data is stored, changed and used, and the associated threats. But we also see counter arguments such as the American Library Association’s Privacy Revolution campaign.

So, this is the context for this work in Colombia. It is important to understand literature in this area, particularly around data use, data combinations, and the connection between privacy concerns and behaviours online (Joinsen et al 2008). And we also refer to the work of Sheenan (2002) in the characterisations of online users. Particularly we are interested in new privacy concerns and platforms, particularly Facebook. The impact of culture on online privacy has been studied by Cho, Rivera Sanchez and Lim (2009).

The State of the Internet from OxII found that Colombia had between 40 and 60% of people online. Internet uptake is, however, lower than in e.g. the US. And in Colombia our population is 46% 25-54 years old.

So, my study is currently online. A wider group is also engaging in personal and group interviews. Our analysis will focus on what background knowledge, risk and privacy awareness there is amongst participants. What self-efficacy level is revealed by participants – their knowledge and habits. And what interest and willingness is there to acquire more knowledge and gain more skills to manage privacy. At a later stage we will be building a prototype tool.

Our conclusions so far… Privacy is hard to define and we need to do more to define it. Privacy is not a concept articulated in a single universally accepted definition. Different groups trade off privacy differently. Relevant concepts here include background knowledge, computer literacy, privacy risk, self efficacy.

And finally… Privacy is still important but often ignored as important in the wider culture. Privacy is not a relic but a changing necessity…


Q1) Did age play a role in privacy? Do young people care as much as older people?

A1) They seem to care when they hear stories of peers being bullied, or harassed, or hear stories of hacking Instagram accounts. But their idea of privacy is different. But there is information that they do not want to have public or stolen. So we are looking more at that, and also a need to understand how they want to engage in privacy. As my colleague Nicola Osborne from Edinburgh said in her presentation yesterday, we have to remember students already come in with a long internet/social media history and presence.

Q2) I was wondering about cultural aspect… Apps used and whether privacy is important… For instance SnapChat is very exhibitionist but also ephemeral…

A2) I don’t have full answers yet but… Young people share on SnapChat and Instagram to build popularity with peers… But almost none of them are interested in Twitter… At least that’s the case in Colombia. But they do know some content on Facebook may be more vulnerable than SnapChat and Instagram… It may be that they have the idea of SnapChat as a space they can control perhaps…

Q3) I often feel more liberal with what I share on Facebook, than students who are 10 or 15 years younger… I would have some privacy settings but don’t think about the long story of that… From my experience students are a lot more savvy in that way… When they first come in, they are very aware of that… Don’t want a bigger footprint there…

A3) That is not exactly true in Colombia. The idea of Digital Footprint affecting their career is not a thing in the same way… Just becoming aware of it… But that idea of exhibitionism… I have found that most of the students in Colombia seem quite happy to share lots of selfies and images of their feet… That became a trend in other countries about three years ago… They don’t want to write much… Just to say “I’m here”… And there has been some interesting research in terms of the selfie generation and ideas of expressing yourself and showing yourself… May be partly to do with other issues… In Colombia many young women have plastic surgery – came out of the 1980s and 1990s… Many women, young women, have cosmetic surgery and want to share that… More on Instagram than Pinterest – Pinterest is for flowers and little girlie things…

Q4) You were talking about gender, how do privacy attitudes differ between males and females?

A4) The literature review suggests women tend to be more careful about what they publish online… They may be more careful selecting networks and where they share content… More willing to double check settings, and to delete content they might have difficulty explaining… Also more willing to discuss issues of privacy… Things may change over time… Suggestion that people will get to an age where they do care more… But we also need to see how the generation that have all of their images online, even from being a baby, will think about this… But generally seems to be slightly more concern or awareness from women…

Comment) I wanted to just follow up the Facebook comment and say that I think it may not be age but experience of prior use that may shape different habits there… Students typically arrive at our university with hundreds of friends having used Facebook since school, and so they see that page as a very public space – in our research some students commented specifically on that and their changing use and filtering back of Facebook contacts… For a lot of academics and mid career professionals Facebook is quite a private social space, Twitter plays more that public role. But it’s not age per se perhaps, it’s that baggage and experience.

Constructing Malleable Truth: Memes from the 2016 U.S. Presidential Campaign – Wiggins Bradley, Webster University, Vienna, Austria

Now, when I wrote this… Trump was “a candidate”. Then he was nominee. Then president elect… And now President. And that’s been… surprising… So that’s the context.

I look at various aspects in my research, including internet memes. So, in 2008 Obama’s campaign was great at using social media, at getting people out there and sharing and campaigning for them on a voluntary and enthusiastic basis. 2016 was the meme election I think. Now people researching memes feel they must refer to Richard Dawkins talking about memes. He meant ideas… That’s not the same as internet memes… So what are the differences between Dawkins’ memes and internet memes? Well honestly they are totally different EXCEPT that they require attention, and have to be reproducible…

Mikhail Bakhtin wrote about the Carnivalesque as something that subverts the dominant mode or perspective, it turns the world on its head… The king becomes the jester and the jester becomes the king. So the Trump tie memes… We need no text here, the absurd is made more absurd. It is very critical. It has that circus level laugh… He’s a clown or a buffoon… You know about it and how to reproduce this.

In terms of literature… There is work on memes but I think when understanding memes with millennials, but also baby boomers, even people in their 70s and 80s… We have to go back to major theorists, concepts and perspectives – Henry Jenkins, Erving Goffman, etc. This is a new mode of communication I think, not a new language, but a new mode.

So method wise… I wanted to do a rhetorical-critical analysis of selected internet memes from the Facebook page Bernie Sanders Dank Meme Stash, which had over 420k members when I wrote this slide – more now. It was founded by a college student in October 2015. And there are hundreds of thousands of memes there. People create and curate them.

Two months before and one month after the US Election I did two sets of samples… Memes that received 1000 or more likes/retweets. And memes that received at least 500 or more likes/reactions and at least 100 shares. As an unexpected side note I found that I needed to define “media narrative”. There doesn’t seem to be a good definition. I spoke to Brooke Gladstone of WNYC, I spoke with colleagues in Vienna… We don’t usually take time to think about media narrative… For instance the shooting at Pulse Nightclub has a narrative on the right around gun control, for others it’s around it being a sad and horrible event…
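As a rough illustration of the two sampling rules described above, here is a minimal Python sketch. Only the thresholds come from the talk; the `Meme` record, its field names and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Meme:
    """Hypothetical record for one meme post (not the study's data format)."""
    title: str
    likes: int   # likes/reactions on the post
    shares: int  # number of shares


def in_first_sample(meme: Meme) -> bool:
    # First sample: 1000 or more likes.
    return meme.likes >= 1000


def in_second_sample(meme: Meme) -> bool:
    # Second sample: at least 500 likes/reactions AND at least 100 shares.
    return meme.likes >= 500 and meme.shares >= 100


# Made-up example posts, purely for illustration.
memes = [
    Meme("a", likes=1500, shares=20),
    Meme("b", likes=600, shares=150),
    Meme("c", likes=400, shares=300),
]

first_sample = [m.title for m in memes if in_first_sample(m)]
second_sample = [m.title for m in memes if in_second_sample(m)]
print(first_sample, second_sample)
```

In this made-up data, post "a" qualifies only for the first sample (too few shares for the second), "b" only for the second, and "c" for neither.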

So, media narrative I am defining as:

  1. Malleable depending upon the ability to ask critical questions
  2. Able to shape opinion as well as perceptions of reality and a person’s decision-making process and…
  3. Linguistic and image-based simulations of real-world events which adhere and/or appeal to ontologically specific perspectives, which may include any intentional use of stereotyping, ideology, allegory, etc.

Some findings… The relational roles between image and text are interchangeable because of the relationship to popular culture. Barthes (1977) talks about the text loading the image, burdening it with culture, a moral, an imagination. And therefore the text in internet memes fluctuates depending on the intended message and the dependence on popular culture.

So, for instance we have an image from Nightmare at 20,000 ft, a classic Twilight Zone image… You need to know nothing here and if I replace a monster face with Donald Trump’s face… It’s instantly accessible and funny. But you can put any image there depending on the directionality of the intended meaning. So you have the idea of the mytheme or function of the monster/devil/etc. can be replaced by any other monster… It doesn’t matter, the reaction will depend on your audience.

Back to Barthes (1977) again, I find him incredibly salient to the work I’ve done here. One thing emerging from this and Russian memes work done before, is the idea of Polysemic directionality. It has one direction and intentionality.. No matter what version of this image you use…

So, here’s a quick clip of the Silence of the Lambs. And here Buffalo Bill, who kills women and skins them… A very scary character… We have him in a meme being a disturbing advisor in memes. If you get that reference it has more weight, but you don’t need to know the reference.

We have the image of Hillary as Two Face, we have Donald as The Joker… And a poster saying “Choose”. The vitriol directed at Clinton was far worse than that at Trump… Perhaps because Sanders supporters were disappointed at not getting the nomination.

We have intertextuality, we also have inter-memetic references… For example the Hillary deletes electoral colleges meme which plays on Grandma on the internet memes… You also have the Superman vs Trump – particularly relevant to immigrant populations (Jenkins 2010).

So, conclusions… The construction of a meme is affected and dependent on the media around it… That is crucial… We have heard about fake news, and we see memes in support of that fake news… And you may see that on all sides here. Intertextual references rely on popular culture and inter-memetic references which assume knowledge, a new form of communication. And I would argue that memes are a digital myth – I think Lévi-Strauss might agree with me on that…

And to close, for your viewing pleasure, the Trump Executive Order meme… The idea of a meme, an idea that can be infinitely replaced with anything really…


Q1) This new sphere of memes… Do you think that Trump represents a new era of presidency… Do you think that this will pass? With Trump posting to his own Twitter account…

A1) I think that it will get more intense… And offline too… We see stickers in Austrian elections around meme like images… These are tools for millennials. They are hugely popular in Turkey… Governments in Turkey, Iran and China are using memes as propaganda against other parties… I’m not sure it’s new but we are certainly more aware of it… Trump is a reality TV star with the nuclear keys… That should scare us… But memes won’t go away…

Q2) In terms of memes in real life… What about bumper stickers… ? They were huge before… They are kind of IRL memes…

A2) I am working on a book at the moment… And one of the chapters is on pre-digital memes. In WWII soldiers used to write “Kilroy was here”. Is Magritte’s Ceci n’est pas une pipe a meme? There is definitely a legacy of that… So yes, but depends on national regional context…

Q3) So… In Egypt we saw memes about Trump… We were surprised at the election outcome… What happened?

A3) Firstly, there is that bias, that reinforcing narrative… If you looked at the Sanders meme page you might have had that idea that Clinton would not win because, for whatever reason, these people hated Hillary. Real rage and hatred towards her… And Trump as clown Hitler… Won’t happen… Then it did… Then rage against him went up… After the Muslim ban, the women’s march etc…

Q4) There are some memes that seem to be everywhere – Charlie and the Chocolate Factory, Sean Bean, etc… Why are we picking those specific particular memes of all things?

A4) Like the Picard WTF meme… Know Your Meme is a great resource… In the scene that Picard image is from he’s reciting Shakespeare to get Lwaxana Troi away from the aliens… It doesn’t matter… But it just fits, it has a meaning

Q5) Gender and memes: I wondered about the aspect of gender in memes, particularly thinking about Clinton – many of those reminded me of the Mary Beard memes and trolling… There are trolling memes – the frog for Trump… the semi-pornographic memes against women… Is there more to that than just (with all her baggage) Clinton herself?

A5) Lisa Silvestri from Gonzaga, Washington State and Limor Shifman in Tel Aviv do work in that area. Shifman looks at Online Jokes of all types and has done some work on gender.

Q6) Who makes memes? Why?

A6) I taught a course on internet memes and cultures. That was one of the best attended courses ever. My students concluded that the author didn’t matter… But look at 4Chan and Reddit or Know Your Meme… And you can tell who created it… But does that matter… It’s almost a public good. Who cares who created the Trump tie meme. With United Airlines, you can see that video, it turned into a meme… and the company lost millions in stock value.

Stream B: Mini track on Enterprise Social Media – Chair: Paul Alpar

The Role of Social Media in Crowdfunding – Makina Daniel, University of South Africa, Pretoria, South Africa

My work seeks to find the connection between social media and finance, specifically crowdfunding. And the paper introduces the phenomenon of crowdfunding, and how the theory of social networking underpins social media. The theory around social media is still developing… Underpinned by theory of information systems and technology adoption, with different characteristics from what happens in social media.

So, a definition of crowdfunding. Crowdfunding is essentially an aspect of crowdsourcing, spurred by ubiquitous web 2.0 technologies. And “Crowdfunding refers to the efforts of entrepreneurial individuals and groups – cultural, social and for-profit – to fund their ventures by drawing on relatively small contributions from a relatively large number of individuals using the internet, without standard financial intermediaries” (Mollick 2014).

Since 2010 there have been growing amounts of money raised globally through crowdfunding. Forbes estimates $34 billion in 2015 (compared to $16 billion in 2014, and $880 million in 2010). The World Bank estimates that crowdfunding will raise $93 billion annually by 2025. This growth couldn’t be achieved in the absence of internet technology, and social media are critical in promoting this form of alternative finance.
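To put those figures in perspective, here is a small Python sketch that works out the compound annual growth rates they imply. It is an illustration added to these notes, not part of the talk: the variable names are made up, and the two rates mix a Forbes estimate with a World Bank projection, so treat the comparison as indicative only.

```python
def compound_annual_growth(start_value: float, end_value: float, years: int) -> float:
    """Average yearly growth rate that turns start_value into end_value over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above, in billions of US dollars.
raised_2010 = 0.88       # $880 million
raised_2015 = 34.0       # Forbes estimate
projected_2025 = 93.0    # World Bank projection

historic_rate = compound_annual_growth(raised_2010, raised_2015, 2015 - 2010)
implied_rate = compound_annual_growth(raised_2015, projected_2025, 2025 - 2015)

print(f"2010 to 2015: {historic_rate:.0%} per year")
print(f"2015 to 2025: {implied_rate:.0%} per year")
```

On these numbers the amount raised roughly doubled every year between 2010 and 2015 (about 108% annual growth), while the 2025 projection implies a much slower rate of about 11% per year from 2015 onwards.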

Cheung and Lee (2010) examined social influence processes in determining collective social action in the context of online social networks. Their model shows intentional social action, with users considering themselves part of the social fabric. And they explain three processes of social influence: subjective norm – self outside of any group; group norm – self awareness as a member of a group; and social identity – self in context. Other authors explain social media popularity because of a lack of trust in traditional media, with people wary of information that emanates from people they do not know personally. Kaplan and Haenlein (2010) define social media as “a group of internet-based applications that build on the ideological and technological foundations of web 2.0 applications that allow the creation and exchange of user generated content”. So it is a form of online interaction that enables people to create, comment, share and exchange content with other people.

So, how does social media facilitate finance, or crowd sourcing? Since social media assists in maintaining social ties, this should in turn aid facilitation of crowdfunding campaigns. Draw on Linus’s Law: “given enough eyeballs, all bugs are shallow”. Large groups are more adept at detecting potential flaws in a campaign than individuals (alone), thus preventing fraudulent campaigns from raising money for crowdfunding projects. Facebook, Twitter, etc. provide spaces for sharing and connection and are therefore suitable for crowdfunding campaigns. Studies have shown that 51% of Facebook users are more likely to buy a product after becoming a fan of the product’s Facebook page (Knudsen 2015).

Brossman (2015) views crowdfunding as existing in two phases (i) brand awareness and (ii) targeting people to support/back one’s campaign. And crowdfunding sites such as Kickstarter and IndieGoGo allow project creators to publish pertinent information and updates, as well as to link to social media. Those connections are present and that also helps deal with a relative lack of social networking functionality within the platform itself, where they are able to create project descriptions, they have a community of users and utilise web 2.0 technologies that allow users to comment on projects and attract money.

A study by Moisseyez (2013) on 100 Kickstarter projects found a connection between social media approval and success in funding. Mollick (2014) observed that crowdfunding success is associated with having a large number of friends in online social networks: a founder with ten Facebook friends would have a 9% chance of succeeding; one with 100 friends would have a 20% chance of success; one with 1000 friends would have a 40% chance of success. He cited a film industry example where more friends mapped to a much higher potential success rate.

So, in conclusion, we don’t have many studies on this area yet. But social media is observed to aid crowdfunding campaigns through its ability to network disparate people through the internet. One notable feature is that although there are many forms of social media, crowdfunding utilizes a limited number of spaces, primarily Facebook and Twitter. Furthermore future research should examine how the expertise of the creator (requestor of funds) and project type, social network, and online presence influence motivations.


Q1) I was wondering if you see any connection between the types of people who back crowdfunding campaigns, and why particular patterns of social media use, or popularity are being found. For instance anecdotally the people who back lots of crowdfunding campaigns – not just one off – tend to be young men in their 30s and 40s. So I was wondering about that profile of backers and what that looks like… And if that profile of backer is part of what makes those social media approaches work.

A1) The majority of people using social media are young people… But young people as sources of finance for, say, small businesses… They are mainly likely to be either studying or starting a professional career… But not accumulating money to give it out… So we see a disconnect… Between who is on social media… On Twitter, Facebook, etc. to raise finance… You are successful in raising funding from people who cannot raise much… So one would expect that if people in mid career were using social media most, more money would be coming from crowdfunding… One aspect of crowdfunding… We are looking at resources… You are asking for small amounts… Then young people are able to spare that much…

Q2) So most people giving funding on crowdfunding sites are young people, and they give small amounts…

A2) Yes… And that data from Mollick… combined with evidence of people who are using Facebook…

Q2) What about other specialised crowdfunding networks… ?

A2) There is more work to be done. But even small crowdfunding networks will connect to supporters through social media…

Q3) Have you looked at the relative offerings of the crowdfunding campaigns?

A3) Yes, technology products are more successful on these platforms than other projects…

Using Enterprise Social Networks to Support Staff Knowledge Sharing in Higher Education – Corcoran Niall, Limerick Institute of Technology, Ireland and Aidan Duane, Waterford Institute of Technology, Ireland

This work is rooted in knowledge management; this is the basis for the whole study. So I wanted to start with a quote from Ikujiro Nonaka: “in an economy where the only certainty is uncertainty… ” And Lew Platt, former CEO of Hewlett-Packard, said “If HP knew what HP knows it would be three times more productive” – highlighting the crucial role of knowledge sharing.

Organisations can gain competitive advantage through encouraging and promoting knowledge sharing – that’s the theory at least. It’s very important in knowledge-intensive organisations, such as public HEIs. HEIs need to compete in a global market place… We need to share knowledge… Do we do this?

And I want to think about this in the context of social media. We know that social media enable creation, sharing or exchange of information, ideas and media in virtual communities and networks. And organisational applications are close to some of the ideals of knowledge management: they support group interaction towards establishing communities; enable creation and sharing of content; can help improve collaboration and communication within organisations; and have distinct technological features that are ideally suited for knowledge sharing. This is a fundamental disruption in knowledge management, and social media is reinvigorating knowledge management as a field.

We do see Enterprise Social Networks (ESN). If you just bring one into an organisation, people don’t necessarily just go and use it. People need a reason to share. So another aspect is communities of practice (Lave and Wenger 1991); this is an important knowledge management strategy, increasingly used. This is about groups of people who share a passion for something – loose and informal social structures, largely voluntary, and about sharing tacit knowledge. So Communities of Practice (CoP) tend to meet from time to time – in person or virtually.

ESN can be used to create virtual communities. This is particularly suitable for distributed communities – our university has multiple campuses for instance.

So, knowledge sharing in HEIs… Well many don’t do it. A number of studies have shown that KM implementation and knowledge sharing in HEIs is at a low level. Why? Organisational culture, organisational structures, bureaucratic characteristics. And there is a well documented divide/mistrust between faculty and support staff (silos) – particularly work from Australia, US and UK. So, can CoP and ESN help? Well in theory they can bypass structures that can reinforce silos. That’s an ideal picture; whether we get there is a different thing.

So our research looked at what the antecedents for staff knowledge sharing are, and what the dominant problems in the implementation of ESN and CoP are. The contextual setting here is Limerick Institute of Technology. I used to work in IT services and this work came significantly from this interest. There is a significant practical aspect to the research so action research seemed like the most appropriate approach.

So we had a three cycle action research project. We looked at Yammer. It has all the features of social networking you’d expect – can engage in conversations, tagged, shared, can upload content. It lends itself well to setting up virtual communities, very flexible and powerful tools for virtual communities. We started from scratch and grew to 209 users.

Some key findings… We found culture and structure are major barriers to staff knowledge sharing. We theorised that and found it to be absolutely the case. The faculty staff divide in HEI exacerbates the problem. Management have an important role to play in shaping a knowledge sharing environment. The existence of CoP is essential to build a successful knowledge sharing environment, and community leaders and champions are required for the ESN. Motivation to participate is also crucial. If they feel motivated, and they see benefit, that can be very helpful. And those benefits can potentially lead to culture change, which then affects motivation…

We found that our organisation has a strong hierarchical model. Very bureaucratic and rigid. Geographic dispersal doesn’t help. To fix this we need to move from a transactional culture. The current organisational structure contributes to the faculty staff divide, limits opportunities and motivations for staff and faculty to work together. But we also found examples where they were working well together. And in terms of the role of management, they have significant importance, and have to be involved to make this work.

Virtual communities as a Knowledge Management strategy have the potential to improve collaboration and interaction between staff, but participation has to be seen as a valued, relevant, valid work activity. In terms of staff motivation, there are some highly motivated people, but not all. Management have to understand that.

So management need to understand the organisational culture; recognise the existence of structural and cultural problems; etc. Some of the challenges here are the public sector hierarchical structures – public accountability, media scrutiny, transitional culture etc.


Q1) On a technical level, which tools are most promising for tacit knowledge sharing…

A1) The whole ability to have a conversation. Email doesn’t work for that, you can’t branch threads… That is a distinctive feature of Yammer groups, to also like/view/be onlookers in a conversation. We encourage people to like something if they read it, to see that it is useful. But the ability to have a proper conversation, and organised meetings and conversations in real time.

Q2) What kind of things are they sharing?

A2) We’ve seen some communities that are large, they have a real sense of collaboration. We’ve had research coming out of that, some really positive outcomes.

Q3) Have you seen any evidence of use in different countries… What are barriers across different regions, if known?

A3) I think the barriers are similar to the conceptual model (in the proceedings) – both personal and organisational barriers… People are afraid largely to share stuff… They are nervous of being judged… Also that engagement on this platform might make managers think that they are not working. Age is a limiting factor – economic issues mean we haven’t recruited new staff for almost 10 years, so we are older as a staff group.

Q3) Might be interesting to compare to different cultures, with Asian culture more closed I think…

A3) Yes, that would be really interesting to do…

Q4) I am trying to think how and what I might share with my colleagues in professional services, technical staff, etc.

A4) The way this is constructed is in communities… We have staff interested in using Office 365 and Classroom Notebook, and so we set up a group to discuss that. We have champions who lead that group and guide it. So what is posted there would be quite specific… But in Yammer you can also share to all… But we monitor and also train our users in how and where to post… You can sign up for groups or create new groups… And it is moderated. But not limited to specifically work related groups – sports and social groups are there too. And that helps grow the user base and helps people see benefits.

Q5) Have you looked at Slack at all? Or done any comparison there?

A5) We chose Yammer because of price… We have it in O365, very practical reason for that… We have looked at Slack but no direct comparison.

Finalists in the Social Media in Practice Excellence Competition present their Case Histories

EDINA Digital Footprint Consulting and Training Service – Nicola Osborne

No notes for this one…

Developing Social Paleontology – Lisa Lundgren

This is work with a software development company, funded by the National Science Foundation. And this was a project to develop a community of practice around paleontology… People often think “dinosaur” but actually it’s about a much wider set of research and studies of fossils. For our fossil project to meet its goal, to develop and support that community, we needed to use social media. So we have a My Fossil community, which is closed to the community, but also a Facebook group and Twitter presence. We wanted to use social media in an educative way to engage the community with our work.

We began with design studies which looked at what basic elements to contribute to engage with social media, and how to engage. We were able to assess practical contributions and build an educative and evidence-based social media plan. So we wanted to create daily posts using social paleontology, e.g. #TrilobiteTuesday; design branded image-focused posts that are practice-specific, meet design principles, and often hyperlink to vetted paleontological websites; and respond to members in ways that encourage chains of communication. There is a theoretical contribution here as well. And we think there are further opportunities to engage more with social paleontology and we are keen for feedback and further discussion. So, I’m here to chat!


Traditions Mobile App – Adam Peruta.

When new university students come to campus they have lots of concerns like what is this place, where do I fit in, how can I make new friends. That is particularly the case at small universities who want to ensure students feel part of the community, and want to stay around. This is where the Traditions Challenge app comes in – it provides challenges and activities to engage new students in university traditions and features. This was trialled at Ithaca University. So, for instance we encourage students to head along to events, meet other new students, etc. We encourage students to meet their academic advisors outside of the classroom. To explore notable campus features. And to explore the local community more – like the farmers market. So we have a social feed – you can like, comment, there is an event calendar, a history of the school, etc. And the whole process is gamified, you gain points through challenges, you can go on the leaderboard so there are incentives to gain status… And there are prizes too.

Looking at the results this year… We had about 200 students who collectively completed over 1400 challenges; the person who completed the most (and won a shirt) completed 53 challenges. There are about 100 challenges in the app so it’s good they weren’t all done in one year. And we see over 50k screen views so we know that the app is getting more attention whether or not people engage in the challenges. Student focus groups raised themes of the enjoyment of the challenge list; motivation for participation (which varied); app design and user experience – if there’s one key takeaway: this demographic has really high expectations for user interface, design and tone; and contribution to identity… There is lots of academic research showing that the more students are engaged on campus, the more likely they will remain at that university and remain engaged through their studies and as alumni. So there is loads of potential here, and opportunity to do more with the data.

So, the digital experience is preferred, mobile development is expensive and time consuming, good UI/UX is imperative to success, universities are good at protecting their brands, and we learned that students really want to augment their on-campus academic experiences.

Conference organiser: Those were the finalists from yesterday, so we will award the prizes for first, second and third… and the PhD prize…

Third place is Lisa; Second place is me (yay!); First place is Adam and the Traditions mobile app.

I’m going to rely on others to tweet the PhD winners…

The best poster went to IT Alignment through Artificial Intelligence – Amir – this was mainly based on Amir’s performance: his poster went missing, so he had to present from an A4 version of the poster, and he did a great job of presenting.

Thank you to our hosts here… And we hope you can join us in Limerick next year!

Thanks to all at ECSM 2017.

Jul 032017

Today I am at the Mykolo Romerio Universitetas in Vilnius, Lithuania, for the European Conference on Social Media 2017. As usual this is a liveblog so additions, corrections etc. all welcome… 

Welcome and Opening by the Conference and Programme Chairs: Aelita Skaržauskienė and Nomeda Gudelienė

Nomeda Gudelienė: I am head of research here and I want to welcome you to Lithuania. We are very honoured to have you here. Social media is very important for building connections and networking, but conferences are also really important still. And we are delighted to have you here in our beautiful Vilnius – I hope you will have time to explore our lovely city.

We were founded 25 years ago when our country gained independence from the Soviet Union. We focus on social studies – there was a gap for new public officials, for lawyers, etc. and our university was founded,

Keynote presentation: Dr. Edgaras Leichteris, Lithuanian Robotics Association – Society in the cloud – what is the future of digitalization?

I wanted to give something of an overview of how trends in ICT are moving – I’m sure you’ve all heard that none of us will have jobs in 20 years because robots will have them all (cue laughter).

I wanted to start with this complex timeline of emerging science and technology that gives an overview of Digital, Green, Bio, Nano, Neuro. Digitalisation is the most important of these trends, it underpins this all. How many of us think digitalisation will save paper? Maybe not for universities or government but young people are shifting to digital. But there are major energy implications of that, we are using a lot of power and heat to digitise our society. This takes us through some of those other areas…. Can you imagine social networking when we have direct neural interfaces?

This brings me to the Hype curve – where we see a great deal of excitement, the trough of disillusionment and through to where the real work is. Gartner creates a hype cycle graph every year to illustrate technological trends. At the moment we can pick out areas like Augmented reality, virtual reality, digital currency. When you look at business impact… Well I thought that the areas that seem to be showing real change include Internet of Things – in modern factories you see very few people now, they are just there for packaging as we have sensors and devices everywhere. We have privacy-enhancing technologies, blockchain, brain computer interfaces, and virtual assistants. So we have technologies which are being genuinely disruptive.

Trends wise we also see political focus here. Why is digital a key focus in the European Union? Well we have captured only a small percentage of the potential. And when we look across the Digital Economy and Society index we see this is about skills, about high quality public services – a real priority in Lithuania at the moment – not just about digitalisation for its own sake. Now a few days ago the US press laughed at Jean-Claude Juncker admitting he still doesn’t have a smartphone, but at the same time, he and others leading the EU see that the future is digital.

Some months back I was asked at a training session “Close your eyes. You are now in 2050. What do you see?”. When I thought about that my view was rather dystopic, rather “Big Brother is watching you”, rather hierarchical. And then we were asked to throw out those ideas and focus instead on what can be done. In the Cimulact EU project we have been looking at citizens’ visions to look toward a future EU research and innovation agenda. In general I note that among people from older European countries there was more optimism about green technologies, technology enabling societies… Whilst people from Eastern European countries have tended to be more concerned with the technologies themselves, and with issues of safety and privacy. And we’ve been bringing these ideas together. For me the vision is technology in the service of people, enabling citizens, and creating systems for green and smart city development, and about personal freedom and responsibility. What unites all of these scenarios? The information was gathered offline. People wanted security, privacy, communication… They didn’t want the technologies per se.

Challenges here? I think that privacy and security is key for social media, and the focus on the right tool, for the right audience, at the right time. If we listen to Tim Berners-Lee we note that the web is developing in a way divergent from the original vision. Lorrie Faith Cranor, Carnegie Mellon University, notes that privacy is possible in laboratory conditions, but in the reality of the real world it is hard to actually achieve that. That’s why we see people such as Aral Balkan, self-styled Cyborg Rights Activist, who has founded a cross-Europe party just focusing on privacy issues. He says that the business model of mainstream technology under “surveillance capitalism” is “people farming” and that it is toxic to human rights and democracy. And he is trying to bring those issues into more prominence.

Another challenge is engagement. The use and time on social media is increasing every year. But what does that mean? Mark Schaefer, Director of Schaefer Marketing Solutions, describes this as “content shock” – we don’t have the capacity to deal with and consume the amount of content we are now encountering. Jay Baer just wrote the book “Hug your haters” making the differentiation between “offstage haters” vs. “onstage haters”. Offstage haters tend to be older, offline, and only go public if you do not respond. Onstage haters post to every social media network not thinking about the consequences. So his book is about how to respond to, and deal with, many forms of hate on the internet. And one of the recently consulted companies has 150 people working to respond to that sort of “onstage” hate.

And then we have the issue of trolling. In Lithuania we have a government trying to limit alcohol consumption – you can just imagine how many people were being supported by alcohol companies to comment and post and respond to that.

We also need to think about engagement in something valuable. Here I wanted to highlight three initiatives, two are quite mature, the third is quite new. The first is “My Government” or E citizens. This is about engaging citizens and asking them what they think – they post a question, and provide a (simple) space for discussion. The one that I engaged with only had four respondents but it was really done well. Lithuania 2.0 was looking at ways to generate creative solutions at government level. That project ended up with a lot of nice features… Every time we took it out, they wanted new features… People engaged but then dropped off… What was being contributed didn’t seem to be fed directly enough into government, and there was a need to feed back to commentators what had happened as a result of their posts. So, we have reviewed this work and are designing a new way to do this which will be more focused around single topics or questions over a contained period of time, with direct routes to feed that into government.

And I wanted to talk about the right tools for the right audiences. I have a personal story here to do with the idea of whether you really need to be in every network. Colleagues asked why I was not on Twitter… There was lots of discussion, but only 2 people were using Twitter in the audience… So these people were trying to use a tool they didn’t understand to reach people who were not using those tools.

Thinking about different types of tools… You might know that last week in Vilnius we had huge rainfall and a flood… Here we have people sharing open data that allows us to track and understand that sort of local emergency.

And there is the issue of how to give users personalised tools, and give opportunity for different opinions – going beyond your filter bubble – and earn profit. My favourite tool was called Personal Journal – it had just the right combination – until that was bought by Flipboard. Algorithmic tailoring can do this well, but there is that need to make it work, to expose to wider views. There is a social responsibility aspect here.

So, the future seems to look like decentralisation – including safe silos that can connect to each other; and the right tools for the right audience. On decentralisation Blockchain, or technologies like it, are looking important. And we are starting to see possible use of that in Universities for credentialing. We can also talk about uses for decentralisation like this.

We will also see new forms of engagement going mass market. Observation of “digital natives” who really don’t want to work in a factory… See those people going to get a coffee, needing money… So putting on their visor/glasses and managing a team in a factory somewhere – maybe Australia – only until that money is earned. We also see better artificial intelligence working on the side of the end users.

The future is ours – we define now, what will happen!


Q1) I was wondering what you mean by Blockchain, I haven’t heard it before.

A1) It’s quite complicated to explain… I suggest you Google it – some lovely explanations out there. We have a distributed

Q2) You spoke about the green issues around digitalisation, and I know Blockchain comes with serious environmental challenges – how do we manage that environmental and technological convenience challenge?

A2) Me and my wife have a really different view of green… She thinks we go back to the yurt and the plants. I think differently… I think yes, we consume more… But we have to find spots where we consume lots of energy and use technology to make it more sustainable. Last week I was at the LEGO factory in Denmark and they are working on how to make that sustainable… But that is challenging as their clients want trusted, robust, long-lasting materials. There are already some technologies but we have to see how that will happen.

Q3) How do you see the role of artificial intelligence in privacy? Do you see it as a smart agent and intermediary between consumers and marketers?

A3) I am afraid of a future like the one Elon Musk describes, where artificial intelligence takes over. But what AI can do is help us interpret data for our decisions. And it can interpret patterns, filter information, help us make the best use of information. At the same time there is always a tension between advertisers and those who want to block advertisers. In Lithuanian media we see pop ups requesting that we switch off ad blocking tools… At the same time we will see more ad blocks… So Google, Amazon, Facebook… They will use AI to target us better in different ways. I remember hearing from someone that you will always have advertising – but you’ll like it as it will be tailored to your preferences.

Q4) Coming from a background of political sciences and public administration… You were talking about decentralisation… Wouldn’t it be useful to differentiate between developed and developing world, or countries in transition… In some of those contexts decentralisation can mean a lack of responsibility and accountability…

A4) We see real gaps already between cities and rural communities – increasingly cities are their own power and culture, with a lot of decisions taken like mini states. You talked about a possible scenario that is quite 1984-like, of centralisation for order. But personally I still believe in decentralisation. There is a need for responsibility and accountability, but you have more potential for human rights and

Aelita Skaržauskienė: Thank you to Edgaras! I actually just spent a whole weekend reading about Blockchain as here in Lithuania we are becoming a hub for Fin Tech – financial innovation start ups.

So, I just wanted to introduce today here. Social media is very important for my department. More than 33 researchers here look at social technologies. Social media is rising in popularity, but more growth lies ahead. More than 85% of internet users are engaging with social media BUT over 5 billion people in the world still lack regular access to the internet, so that number will increase. There have already been so many new collaborations made possible for and by social media.

Thank you so much for your attention in this exciting and challenging research topic!

Stream B: Mini track on Social Media in Education (Chair: Nicola Osborne and Stefania Manca)

As I’m chairing this session (as Stefania is presenting), my notes do not include Q&A I’m afraid. But you can be confident that interesting questions were asked and answered!

The use of on-line media at a Distance Education University – Martins Nico, University of South Africa, Pretoria, South Africa

The University of South Africa is an online only university so I will be talking about research we have been doing on the use of Twitter, WhatsApp, Messenger, Skype and Facebook by students. A number of researchers have also explored obstacles experienced in social media. Some identified obstacles will be discussed.

In terms of professional teaching dispositions, these are principles, commitments, values and professional ethics that influence the attitude and behaviours of educators, and I called on my background in organisational psychology and measuring instruments to explore different ideas of presence: virtual/technological; pedagogical; expert/cognitive; social. And these sit on a scale from behaviours that are easily changed to those that are difficult to change. And I want to focus on the difficult to change area of incorporating technologies significantly into practice – in the virtual/technological presence area.

Now, about our university… We have 350k students and +/- 100k non-formal students. African and international students from 130 countries. We are a distance education university. 60% are between 25 and 39 and 63.9% are female. At Unisa we think about “blended” learning, from posting materials (snail mail) through to online presence. In our open online distance learning context we are using tools including WhatsApp, BBM, Mxit, WeChat, Research Gate, Facebook, LinkedIn, intranet, Google drive and wiki spaces, multimedia etc. We use a huge range, but it is up to the lecturer exactly which of these they use. For all the modules online you can view course materials, video clips, articles, etc. For this module that I’m showing here, you have to work online, you can’t work offline, it’s a digital course.

So, the aim of our research was to understand how effectively the various teaching dispositions are using the available online media, and to what extent there is a relationship between disposition and technology used. Most respondents we had (40.5%) had 1 to 3 years of service. Most respondents (45.1%) were Baby Boomers. Most were female (61%), most respondents were lecturers and senior lecturers.

Looking at the results, the most used was WhatsApp, with instant messaging and social networking high. Microblogging and digital curation were amongst the least used.

Now, when we compare that to the dispositions, we see an interesting correlation between Social presence dispositions and instant messaging; virtual presence dispositions using research networking, cloud computing… The most significant relationships were between virtual and online tools. No significant correlation between pedagogical presence and any particular tools.

I just wanted to talk about the generations at play here: Baby boomers, Gen X-ers, and Millennials. Looking at the ANOVA analysis for generations and gender, only for instant messaging and social networking was there any significant result. In both cases millennials use this most. In terms of gender we see females using social networking and instant messaging more than males. The results show the younger generation, or millennials, and females use the two online media significantly more than other groups – for our university that has an implication to ensure our staff understand the spaces our students use.

The results confirmed that millennials are most inclined to use instant messaging and social networking. Females were using these the most.
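
For readers unfamiliar with ANOVA, here is a minimal sketch of a one-way F statistic computed by hand. The usage scores are invented purely for illustration (they are not the study's data), the function name `one_way_anova_f` is hypothetical, and the talk did not say which software or ANOVA variant the researchers used.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Compute the one-way ANOVA F statistic for a list of groups.

    F = (between-group variance) / (within-group variance).
    A larger F means the group means differ more than the spread
    inside each group would suggest.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Variation of each group mean around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical instant-messaging usage scores per generation (made up).
baby_boomers = [1, 2, 3]
gen_x = [2, 3, 4]
millennials = [5, 6, 7]

f_stat = one_way_anova_f([baby_boomers, gen_x, millennials])
print(f"F = {f_stat:.2f}")
```

In practice the F statistic would be compared against an F distribution with (k - 1, n - k) degrees of freedom to decide whether the difference between generations is significant.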

So, my recommendation? To increase usage of online tools, the university will need to train academics in the usage of the various online tools. To arrange workshops on new technology, social media and mobile learning. And we need to advise and guide academics to increase web self-efficacy and compensate accordingly. And determine the needs and preferences of students pertaining to the use of social media in an ODL environment, and focus

Towards a Multilevel Framework for Analysing Academic Social Network Sites: A Network Socio-Technical Perspective – Manca Stefania, National Research Council of Italy and Juliana Elisa Raffaghelli, University of Florence, Italy

I work in the field of learning, distance education, distance learning, social media and social networking. I’m going to share with you some work I am doing with Juliana Elisa Raffaghelli on the use of social networking sites for academic purposes. I know there are lots of different definitions here. In this talk I’m talking about the use of social media sites for scholarly communication. As we all know there are many different ways to communicate our work and what we do, including academic publications and conferences like this, but we have also seen a real increase in the use of social media for scholarly communication. And we have seen Academia.edu and ResearchGate in widest use of these, but others are out there.

The aim of my study was to investigate these kinds of sites, not only in terms of adoption, uptake, what kind of actions people do in these sites. But the study is a theoretical piece of work taking a socio-technical perspective. But before I talk more about this I wanted to define some of the terms and context here.

Digital Scholarship is the use of digital evidence, methods of inquiry, research, publication and preservation to achieve scholarly and research goals. And can encompass both scholarly communication using digital media and research on digital media. Martin Weller, one of the first to explore this area, describes digital scholarship as shorthand of an intersection in technology-related developments namely: digital content; networked distribution; open practices. And the potential transformational quality of that intersection.

A more recent update, by Greenhow and Gleason (2014), defines Social Scholarship as the means by which social media affordances and potential values evolve the ways scholarship is done in academia. And Veletsianos and Kimmons (2012) have talked about Networked Participatory Scholarship as a new form of scholarship arising from these new uses of technology and new types of practice.

There are lots of concerns and tensions here that have been raised… The blurring boundaries of personal and professional identities. The challenge of unreliable information online. Many say that ResearchGate and have a huge number of fake profiles, and that not all of what is there can be considered reliable. There is also a perception that these sites may not be useful – a social factor. There is the challenge of time to curate different sites. And in the traditional idea of “publish or perish” there has been some concern over these sites.

The premise of this study is to look at popular academic sites like ResearchGate, like Although these sites are increasingly transforming scholarly communication and academic identity, there is a need to understand these at a socio technical level, which is where this study comes in. Academic social network sites are networked socio-technical systems. These systems are determined by social forces and technological features. Design, implementation and use of such technologies sit in a wider cultural and social context (Hudson and Wolf 2003?).

I wanted to define these sites through a multilevel framework, with a socio-economic layer (ownership, governance, business model); techno-cultural layer (technology, user/usage, content); networked-scholar layer (networking, knowledge sharing, identity). Those first two layers come from a popular study of social networking usage, but we added that third level to capture those scholarly qualities. The first two levels refer to the structure and wider context.

We also wanted to bring in social capital theory/ies, encompassing the capacity of social networks to produce goods for mutual benefits (Bourdieu, 1986). This can take the form of useful information, personal relationships or group networks (Putnam 2000). We took this approach because the scholarly community can be viewed as knowledge sharing entities formed by trust, recognition etc. I will move past an overview of social capital types here, and move to my conclusion here…

This positions academic social network sites as networked socio-technical systems that afford social capital among scholars… And here we see structural and distributed scholarly capital.

So to finish a specific example: ResearchGate. The site was founded in 2008 by two physicists and a computer scientist. More than 12 million members distributed worldwide in 193 countries. The majority of members (60%) belong to scientific subject areas, and it is intended to open up science and enable new work and collaboration.

When we look at ResearchGate from the perspective of the socio-economic layer…. Ownership is for-profit. Governance is largely through terms and conditions. The business model is largely based on a wide range of free-of-charge services, with some subscription aspects.

From the techno-cultural layer… Technology automatically signals who one may be interested in connecting with, news feeds, prompts endorsements, new researchers to follow. And usage can be passive, or they can be active participants after making new connections. And content – it affords publication of diverse types of science outputs.

From the networked-scholar layer… Networking – follow and recommend; Knowledge sharing – commenting, questions feature, search function, existing Q&As, expertise and skills; and Identity – through profile, score, reach and h-index.

On Linking Social Media, Learning Styles, and Augmented Reality in Education – Kurilovas Eugenijus, Julija Kurilova and Viktorija Dvareckiene, Vilnius University Institute of Mathematics and Informatics, Lithuania

Eugenijus: So, why augmented reality? Well according to predictions it will be the main environment for education by 2020 and we need to think about linking it to students on the one hand, and to academia as well. So, the aim of this work is to present an original method to identify students preferring to actively engage in social media and wanting to use augmented reality. To relate this to learning styles.

Looking over the literature we faced a tremendous development of social media, powered by innovative web technologies, web 2.0 and social networks. But so many different approaches here, and every student is different. Possibilities of AR seem almost endless. And the literature suggests AR may be more effective than traditional methods. Only one meta-analysis work directly addresses personalisation of AR-based systems/environments in education. The learning styles element of this work is about the differences of student needs, not specifically focused on this.

Another aspect of AR can be cognitive overload from the information, the technological devices, and the tasks they need to undertake. Few studies seem to look at pedagogy of AR, rather than tests of AR.

So, our method… All learning processes, activities and scenarios should be personalised to student learning styles. We undertook a simple and convenient expert evaluation method based on application of trapezoid fuzzy learning. And looking at suitability of use in elearning. The questions given to the experts focus on the suitability of social media and AR learning activities. After that, details explaining the Felder-Silverman learning styles model (4 different styles included) were provided for the experts.

After the experts completed the questionnaire it’s easy to calculate the average values of suitability of the learning styles and learning activities for AR and social media. So we can now easily compute the average for learning styles… So every student could come in and answer a learning styles questionnaire, get their own table, their personal individual learning styles. Then combining that score, with expert ratings of AR and social media, we can calculate suitability indexes of all learning styles of particular students. The programme does this in, say, 20 seconds…
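A minimal sketch of how such a suitability index might be computed. The talk did not give the actual formula, so this assumes a simple weighted average; the ratings, the 0 to 1 scale, and the names `expert_ratings`, `average_ratings` and `suitability_index` are all hypothetical.

```python
from statistics import mean

# Hypothetical expert ratings (0..1) of how suitable AR and social media
# activities are for each learning style; one list entry per expert.
expert_ratings = {
    "visual": [0.9, 0.8, 1.0],
    "active": [0.8, 0.9, 0.7],
    "verbal": [0.3, 0.2, 0.4],
    "reflective": [0.2, 0.3, 0.1],
}


def average_ratings(ratings):
    """Average the experts' ratings for each learning style."""
    return {style: mean(values) for style, values in ratings.items()}


def suitability_index(student_profile, avg_ratings):
    """Weighted average of expert ratings (an assumed formula).

    student_profile maps each learning style to the strength of that
    style for one student, e.g. from a learning styles questionnaire.
    """
    total = sum(student_profile.values())
    if total == 0:
        raise ValueError("student profile has no non-zero style strengths")
    weighted = sum(
        strength * avg_ratings[style]
        for style, strength in student_profile.items()
    )
    return weighted / total


avg = average_ratings(expert_ratings)
visual_active_student = {"visual": 0.6, "active": 0.3, "verbal": 0.1, "reflective": 0.0}
print(round(suitability_index(visual_active_student, avg), 2))
```

Under these assumed ratings, a student whose profile leans visual and active gets a higher index than one who leans verbal and reflective, which matches the direction of the experts' view as reported in the talk.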

So, we asked 9 experts to share their opinion on particular learning styles… So here the experts see social media and AR as particularly suitable for visuals and activists (learning styles). We think that suitability indexes should be included in recommender systems – the main thing in a personalised learning system – and should be linked to particular students according to those suitability indexes. The higher the suitability index, the better the learning components fit a particular student’s needs.

So, expert evaluation, linking learning activities and students by suitability index and recommender system are main intelligent technologies applied to personalise learning. An optimal learning scenario would make use of this to personalise learning. And as already noted Augmented Reality and social media are most suitable for visual and activist learners; most unsuitable for verbal and reflective learners… And that will be reflected in student happiness and outcomes. Visual and activist learners prefer to actively use learning scenarios based on application of AR and social media.

According to Felder and Silverman most people of college age and older are visual. Visual learners remember best what they see rather than what they hear. Visual learners are better able to remember images rather than verbal or text information. For visual learners the optimal learning scenario should include a range of visual materials.

Active learners do not learn much in situations that require them to be passive. They feel more comfortable with or better at active experimentation than reflective observation. For active learners the optimal scenario should include doing something that relates to the wider outside world.

And some conclusions… Learning styles show how this can be best used/tweaked to learners. The influence of visual and social media has shifted student expectations, but many teaching organisations are still quite traditional…

We now have a short break for lunch. This afternoon my notes will be sparse – I’ll be presenting in the Education Mini Track and then, shortly after, in the Social Media Excellence Awards strand. Normal service will be resumed after this afternoon’s coffee break. 

Stream B: Mini track on Social Media in Education (Chair: Nicola Osborne and Stefania Manca)

Digital Badges on Education: Past, Present and Future – Araujo Inês, Carlos Santos, Luís Pedro, and João Batista, Aveiro University, Portugal

I’ve come a little late into Ines’ talk but she is taking us through the history of badges as a certification, including from Roman times. 

This was used like an honour, but also as a punishment, with badges and tattoos used to classify that experience. For a pilgrim going to Compostello de Compagnario(?) they had a badge, but there was a huge range of fake badges out there. The pope eventually required you to come to Rome to get your badges. We also have badges like martial arts belts, for scouts… So… Badges have baggage.

With the beginning of the internet we started the beginnings of digital badges, as a way to recognise achievements and to recognise professional achievements. So, we have the person who receives the badge, the person/organisation who issues the badge, and the place where the badge can be displayed. And we have incentives to collect and share badges associated with various cities across the world.

Many platforms have badges. We have Open Badges infrastructures (Credly, BadgeOS, etc.) and we have the place to display and share badges. In educational platforms we also have support for badges, including Moodle, Edmodo,, SAPO campus (at our speaker’s home institution), etc. But in our VLE we didn’t see badges being used as we expected so we tried to look out at how badges are being used (see worldwide…

How are badges being used? Authority; award and motivations; sequential orientation – gain one, then the other…; research; recognition; identity; evidence or achievement; credentialing. The biggest use was around two major areas: motivation (for students but also teachers and others), as well as credentialing. And in fact some 10% of digital badges are used to motivate and reward, and to recognise skills, of teachers. However major use is with students and that is split across award, credentialing, and evidence of achievement.

So, our final recommendations were for the integration of badges in education: that we should choose a platform, show the advantage of using a repository (e.g. a backpack for digital badges); to choose the type of badge – mission type and/or award type; and enjoy it.

Based on this information we began a MOOC: Badges: how to use it. And you can see a poster on the MOOC. And this was based on the investigation we did for this work.


Q1) Have you had some feedback, or collected some information on students’ interest on badges… How do they react or care about getting those badges?

A1) Open Badges are not really known to everyone in Portugal. The first task I had was to explain them, and what the advantages there were. Teachers like the idea… They feel that it is very important for their students and have tried it for their students. Most of the experiments show students enjoying the badges… But I’m not sure that they understand that they can use it again if they show it in social media, into the community… But that is a task still to do. The first experience I have, I’ve known about from the teachers who were in the MOOC, they enjoy it, they liked it, they asked for more badges.

Q2) I know about the concept here… Any issues with dual ways to assess students – grades and badges.

A2) Teachers can use them with grading, in parallel. Or if they use them in sequence, they understand how to get to achieve that grade. Teacher has to decide how best to use them… Whether to use them or to motivate to a better grade.

Q3) Thank you! I’m co-ordinating an EU open badge project so I’d like to invite you to publish. Is the MOOC only in Portuguese? My students are designing interactive modules – CC licensed – with best practice guidance. Maybe we can translate and reuse?

A3) It’s only in Portuguese at the moment. We have about 120 people engaged in the MOOC and it runs on SAPO Campus. They are working on a system of badges that can be used across all institutions so that teachers can share badges, a repository to choose from and use in their own teaching.

Comment) Some of that unification really useful for having a shared understanding of meaning and usage of badges.

Yes, but from what I could see teachers were not using badges because they hadn’t really seen examples of how to use them. And they get a badge at the end of the course!

Q4) What is the difference between digital badges and open badges.

A4) Open Badges is a specific standard designed by Mozilla. Digital badges can be created by everyone.

Comment) At my institution the badges are about transferrable skills… They have to meet unit learning outcomes, graduate learning outcomes. They can get prior learning certified through them as well to reduce taught classes for masters students. But that requires that solid infrastructure.

We have infrastructure to issue badges: someone can make and create a badge, and issue it to a person. The badge has metadata – where it was issued, why, by whom… And then it is made available in a repository, e.g. Mozilla backpack.
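As a rough illustration of the issue-then-store flow described here, a badge can be modelled as a record carrying its metadata, held in a repository owned by the recipient. This is only a sketch: the field names and the `Badge` and `Backpack` classes are hypothetical, and this is not the Open Badges schema or the Mozilla backpack API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Badge:
    """A badge plus its metadata (hypothetical fields, not a real standard)."""
    name: str
    issuer: str       # by whom it was issued
    recipient: str    # the person it was issued to
    reason: str       # why it was issued
    issued_at: str    # where it was issued, e.g. a platform or institution
    issued_on: date


@dataclass
class Backpack:
    """A personal repository where one person's badges are kept and displayed."""
    owner: str
    badges: list = field(default_factory=list)

    def add(self, badge):
        # Only accept badges that were actually issued to this person.
        if badge.recipient != self.owner:
            raise ValueError("badge was issued to someone else")
        self.badges.append(badge)


badge = Badge(
    name="Course completed",
    issuer="Example University",
    recipient="student@example.org",
    reason="Finished all course tasks",
    issued_at="Example VLE",
    issued_on=date(2017, 7, 1),
)
backpack = Backpack(owner="student@example.org")
backpack.add(badge)
```

Keeping the issuer and reason inside the badge itself is what lets a badge be checked later, wherever it is displayed.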

Exploring Risk, Privacy and the Impact of Social Media Usage with Undergraduates – Connelly Louise and Nicola Osborne, University of Edinburgh, UK

Thanks to all who came along! Find our abstract and (shortly after today) our preprint here.

And I’ve now moved on to the Best Practice Awards strand where I’ll be presenting shortly… I’ve come in to the questions for Lisa Lundgren (and J. Crippen Kent)’s presentation on using social media to develop social paleontology. From the questions I think I missed hearing about a really interesting project. 

EDINA Digital Footprint Consultancy & Training Service – Osborne Nicola, University of Edinburgh, UK 

Well, that was me. No notes here, but case study will be available soon. 

D-Move – Petrovic Otto, University of Graz, Austria

This is a method and software environment to anticipate “digital natives” acceptance of technology innovations. Looking particularly at how the academic sector is having long term impact on the private sector. And our students are digital natives, that’s important. So, to introduce me, I’m professor of information systems at the University of Graz, Austria. I have had a number of international roles and have had a strong role in bridging the connection between academia and government, and am a member of the regulatory authority for telecommunications for Austria. And I have started three companies.

So, what is the challenge? In 2020 more than half of all the people living in our world are born and raised with digital media and the internet, they are digital natives. And they are quite different regarding their values and norms, behaviours and attitudes. Considering the big changes in industries like media, commerce, banking, transport or the travel industry. They have more and more aversion for traditional surveys based on “imagine a situation where you use a technology like…”. Meanwhile surveys designed, executed and interpreted by traditional “experts” will result in traditional views – the real experts are the digital natives. The results should be gained through digital natives’ lives…

So the solution? It is an implemented method, based on the Delphi approach. Digital Natives are used as experts in a multi-round, structured group communication process. In each round they collect their own impressions regarding the Delphi issue. So, for instance, we have digital natives engaging in self-monitoring of their activities.

So, we recruited 4 groups of 5 digital natives; round one discussion as well as interviews with 130 digital natives; field experience embedded in everyday life; discussion; and analysis. We want to be part of the daily life of the digital native, but a big monolithic space won’t work, things change, and different groups use different spaces. We need social media and we need other types of interfaces… We don’t know them today. We have a data capturing layer for pictures, video, annotations. We also need data storage, data presentation and sharing, data tagging and organisation, access control and privacy, private spaces and personalisation… And access control is crucial, as individuals want to keep their data private until they want to share it (if at all).

D-Move gives insights into changes in Digital Natives views, experiences, self-monitoring, etc. And in terms of understanding “why” digital natives behave as they do. The participants show high satisfaction with D-Move as a space for learning. D-Move has been implemented and used in different industries for many years – used for media, transport and logistics, travel industry, health and fitness. It started with messaging based social media, going to social media platforms, finally implementing social internet of things technologies. And we are currently working with one of the most prestigious hotels – with a customer base typically in their seventies… So we are using D-Move to better understand the luxury sector and what parts of technology they need to engage with. D-Move is part of Digital Natives “natural” communication behaviour. And an on-going cycle of scientific evaluation and further technical development.

In terms of the next steps, firstly the conceptual models will be applied to the whole process to better understand digital natives thinking, feeling and behaviour. Using different front ends focused on the internet of things technologies. And offering D-Move to different industries to book certain issues like using an omnibus survey. And D-Move is both a research environment and a teaching environment. We have two streams going in the same direction, including as a teaching instrument.


Q1) Your digital native participants, how do you recruit them?

A1) It depends on the age group. It ranges from age 10 to nearer age 30. For our university we can reach 20-25 year old, for 10 years to 20 we work with schools. 25 to 30 years old is harder to recruit.

Q2) What about ethical issues? How do you get informed consent from 10 to 18 year olds.

A2) These issues are usually based on real issues in life, and this is why security and privacy is very important. And we have sophisticated ways of indicating what is and is not OK to share. This is partly through storing data in our storage. It is not a public system, the data is not accessible to others.

Q3) We’ve seen a few presentations on using data from participants. According to the POPI Act (based on EU GDPR), you can’t use data without consent… How do you get around that?

A3) It’s easier because it is not a public system, and we do not relate information in publications, only at an aggregated level.

At this point I feel it is important to note my usual “digital native” caveat that I don’t agree with the speaker on this term (or the generalisations around it) which has been disputed widely in the literature, including by Marc Prensky, its originator.

The Traditions Challenge mobile App – Peruta Adam, Syracuse University, New York, USA

I’ve been looking at how colleges and universities have been using social media in student recruitment, alumni engagement etc. And it has been getting harder and harder to get access to social media data over the years, so I decided to design my own thing.

So, think back to your first days of university. You probably had a lot of concerns. For instance Ithaca College is in a town less than 7 miles wide, there isn’t a big sports programme, it is hard to build community. So… The Traditions Challenge is a mobile app to foster engagement and community building for incoming university students – this works as a sort of bucket list of things to do and engage with. This launched at Ithaca in August 2016 with over 100 challenges. For instance FYRE, which already encourages engagement, is a challenge here. Faculty Office Hours is its own challenge – a way to get students to find out about these. And the fountains – a notable feature on campus – you can have your image taken. And we encourage them to explore the town, for instance engaging with the farmers market.

So there is a list of challenges, there is also a feed to see what else is happening on campus. And there is information on the school. And this is all gamified. Challenges earn points, there is a leaderboard which gets students status. And there are some actual real world challenges – stickers, a nice sweatshirt, etc. And this is all designed to get students more engaged, and more engaged early on at university. There is a lot of academic research on students who are more involved and engaged, being more likely to stay at that university.
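The points-and-leaderboard mechanic can be sketched in a few lines. This is an illustration only, not the app's actual implementation: the challenge names, point values and the `leaderboard` function are hypothetical.

```python
from collections import Counter


def leaderboard(completions, points):
    """Rank students by total points earned from completed challenges.

    completions: iterable of (student, challenge) pairs.
    points: mapping from challenge name to the points it is worth.
    Ties are broken alphabetically by student name.
    """
    totals = Counter()
    for student, challenge in completions:
        totals[student] += points[challenge]
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical challenges and point values.
points = {"faculty_office_hours": 10, "farmers_market": 20, "fountain_photo": 5}
completions = [
    ("ana", "faculty_office_hours"),
    ("ben", "farmers_market"),
    ("ana", "farmers_market"),
    ("cat", "fountain_photo"),
]
print(leaderboard(completions, points))
# [('ana', 30), ('ben', 20), ('cat', 5)]
```

Storing completions as individual events, rather than only running totals, also keeps the per-challenge data that makes "most popular challenges" style analysis possible.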

Traditions in the University are very important. We have over 4000 institutions. And those traditions translate into a real sense of identity for students. There are materials on traditions, keep safe books for ticket stubs, images, etc. but these are not digital. And those are nice but there is no way to track what is going on (plus who takes pictures). And in fact Ithaca tried that approach on campus – a pack, whiteboards, etc. But this year, with the app, there is much more data that can be quantified. This year we had around 200 sign ups (4% of on campus students). We didn’t roll out to everyone, but picked influencers and told them to invite friends, then them to invite their friends, etc. And those 200 sign ups did over 1400 challenges and 44 checked in for prizes. Out of the top ten challenges, 70% of the most popular challenges were off-campus, and 100% of those were non-academic experiences. There is a sense of students being most successful when they are involved in a lot of things, and have more activities going on. It is hard to compare the analogue with the app but we know that at least 44 students checked in for prizes with the app, versus 8 checking in when we ran the analogue challenges.

In terms of students responding to the challenges, they enjoyed the combination of academic and non-academic activities. One student, who’d been enrolled for 3 years, found out about events on campus through the app that he had never heard about before. Some really responded to the game, to the competition. Others just enjoyed the check list, and a way to gather memories. Some just really want the prize! (Others were a lot less excited). Maybe more prizes could also help – we are trying that.

In terms of app design and UX, this cohort hugely care about the wording of things, the look of things… Their expectation is really really high.

In terms of identity students reported feeling a real sense of connection to Ithaca – but it’s early days, we need some longitudinal data here.

We found that the digital experience is preferred. Mobile development is expensive and time consuming – I had an idea, tried to build a prototype, applied for a grant to hire a designer, but everyone going down this path has to understand that you need developers, designers, and marketing staff at the university to be involved. And like I said, the expectations were really high. We ran workshops before making anything to make sure we understood that expectation.

I would also note that universities in the US are really getting protective of their brand, the use of logos, fonts etc. They really trusted me but it took several goes to get a logo we were all happy with.

And finally, data from the app, from follow up work, show that students really want to augment their experience with on campus activities, off campus activities… And active and involved students seem to lead to active and involved alumni – that would be great data to track. And that old book approach was lovely as tangible things are good – but it’s easy to automate some printing from the app…

So, what’s happening now? Students are starting, they will see posters and postcards, they will see targeted Facebook ads.

I think that this is a good example of how a digital experience can connect with a really tangible experience.

And finally, I’m from Syracuse University, and I’d like to thank Ithaca College, and NEAT for their support.


Q1) What is the quality of contribution like here?

A1) It looks quite a lot like an Instagram update – a photo, text, tagging, you can edit it later.

Q2) And can you share to other social media?

A2) Yes, they can share to Facebook and Twitter.

Q3) I wanted to ask about the ethics of what happens when students take images of each other?

A3) Like other types of social media, that’s a social issue. But there is a way to flag images and admins can remove content as required.

Q4) Most of your data is from female participants?

A4) Yes, about 70% of people who took part were female participants.

Q5) How did you recruit users for your focus groups?

A5) We recruited our heaviest app users… We emailed them to invite them along. The other thing I wanted to note is that it wasn’t me, or colleagues, running focus groups, it was student facilitators to make this peer to peer.

Q6) How reliable is the feedback? Aren’t they going to be easy to please here?

A6) Sure, they will be eager to please so there may be some bias. I will eventually be doing some research on these data points.

Q7) Any plans to expand to other universities?

A7) Yes, would love to compare the three different types of US universities in particular.

Q8) Is the app free to students?

A8) Yes, I suspect if I was to monetize this it would be for the university – a license type set up.

Mini track on Social Media in Education – Chair: Nicola Osborne and Stefania Manca

Evaluation of e-learning via Social Networking Website by full-time Students in Russia – Pivovarov Ivan, RANEPA, Russia

Why did I look at this area? Well the Russian Government is presently struggling with poor education service delivery. There is great variety in the efficiency and quality of higher education. So, the Russian Government is looking for ways to make significant improvements. And, in my opinion, social media can be effective in full time teaching. And that’s what my research was looking at.

So, I wanted to determine the best techniques of delivery of e-learning via social networking websites. I was looking at VK rather than Facebook. VK is by far the biggest social media in Russia. The second biggest is Instagram. There is strong competition there.

So I was looking at the views of students about educational usage of VK, targeting bachelor students coming from the Russian Presidential Academy of National Economy and Public Administration – an atypical institution focused specifically on public administration. A special interest group was created on VK and the educational content was regularly uploaded there. We had 100s of people in this group – hoping for 1000 in future. So material would include assignments, educational contests, etc. And finally after six months of using this space, I decided to make a questionnaire and ask my students what they like, what they don’t like, what they didn’t like the most, etc. and we had 100 responses. Age wise 82% were between 18 and 21 years old; 12% are 21-24 years old; 6% were older than 24. This slide shows that users of social media are typically young, when they move on in life, have families etc, they don’t tend to use social media. We did also ask about Facebook, 53% had a Facebook account, 47% did not.

We asked what the advantages are of VK over Facebook. 52% said most of their friends were on VK. 13% said that VK had a more user friendly interface than Facebook. 29% said VK has a more interesting background – sharing of music, films etc. – than Facebook. Looking at usage of VK for educational purposes, 35% use it weekly; 31% very seldom; 14% 2-3 times a week; 10% daily. Usage is generally heavier on week days, on the weekend that drops.

So, what motivated people to be a member of the special interest group on a social media website? Most (53%) said the ease of access to information; 31% the dissemination of information; 4% said for the chance of interaction. And when asked what the students wanted to improve, most (53%) wanted to increase teacher-student interaction – more teachers to join them on social media.

Students mostly preferred posts from teachers that were about administration of the unit (28%) and content (28%). When asked if the students wanted to watch video lectures, 85% said yes. One year after this work I started to record video lectures – short (5-10 mins) and they become available prior to a lecture. And then find some new definitions, new terms, etc. And in the lecture we follow up, go into details. We can go straight into discussion. So this response inspired me to create this video content.

I also asked if students had taken an online class before, 52% had, 48% hadn’t. I asked students how they liked interaction on social media – 86% of students found it positive (but I only asked them after they’d been assessed to avoid too much bias in results).

Conclusions here…. Well I wanted to compare Russian to other contexts. Students in Russia wanted more teacher-student interactions. “comments must be encouraged” was not present in our experiment but in research in Turkey


Q1) Is there an equivalent to YouTube in Russia?

A1) Yes, YouTube is big. There is an alternative called RuTube – maybe more the Russian Vimeo. No Twitter – Telegram is nearest. And no Russian analogue to SnapChat but it is pushed away by Instagram Stories now I think. WhatsApp is very popular, but I don’t see the educational potential there. This semester I had students make online translations of my lecture… with Instagram Stories… VK does try and copy features from other worldwide spaces – they have stories. But Instagram is most popular.

Q2) Among the takeaways is the need for more intense interaction between students and teaching staff. Are your teaching staff motivated to do this? I do this as a “hobby” in my institution? Is it formalised in your school? And also you said about the strength of VK versus Facebook – you noted that people using VK drives traffic… So where do you see opportunities for new platforms in Russia?

A2) Your second question, that’s hard to predict. Two or three years ago it was hard to predict Instagram Stories or Snapchat. But I guess probably social media associated with sport…

Q2) Potential won’t be hampered by attitudes in the population to steer toward what they know.

A2) I don’t think so… On the time usage front I think my peers probably share your concerns about time and engagement.

Comment) It depends on how it develops… We have a minimum standard. In our LMS there is a widget, and staff have to make videos per semester for them – that’s now a minimum practice. Although in the long run teaching isn’t really rewarded – it’s research that is typically rewarded… Do you have to answer to a manager on this in terms of restrictions on trying things out?

A2) No, I am lucky, I am free to experiment. I have a big freedom I think.

Q3) Do you feel uncomfortable being in a social space with your students… To be appropriate in your profile picture… What is your dynamic?

A3) All my photos are clean anyway! Sports, conferences… But yes, as a University teacher you have to be sensible. You have to be careful with images etc… But still…

Comment) But that’s something people struggle with – whether to have one account or several…

A3) I’m a very public person… Open to everyone… So no embarrassing photos! On LMS, my university has announced that we will have a new learning management system. But there is a question of whether students will like that or engage with that. There is a Clayton Christensen concept of disruptive innovation. This tool wasn’t designed for education, but it can be… Will an LMS be comfortable for students to use though?

Comment) Our university is almost post-LMS… So maybe if you don’t have one already, you could jump somewhere else, to a web 2.0 delivery system…

A3) The system will be run and tested in Moscow, and then rolled out to the regions…

Q4) You ran this course for your students at your institution, but was the group open to others? And how does that work in terms of payments if some are students, some are not?

A4) Everyone can join the group. And when they finish, they don’t escape from the group, they stay, they engage, they like etc. Not everyone, but some. Including graduates. So the group is open and everyone can join it.

Developing Social Media Skills for Professional Online Reputation of Migrant Job-Seekers – Buchem Ilona, Beuth University of Applied Sciences Berlin, Germany

We have 12,800 students, many of whom have a migrant background, although the work I will present isn’t actually for our students, it’s for migrants seeking work.

Cue a short video on what it means to be a migrant moving across the world in search of a brighter future and a safe place to call home. Noting the significant rise in migration, often because of conflict and uncertainty.

That was a United Nations video about refugees. Germany has accepted a huge number of refugees, over 1.2 million across 2015 and 2016. And, because of that, we have and need quite a complex structure of programmes and support for migrants making their home. At the same time Germany has shortages of skilled workers so there is a need to match up skills and training here. There is particular need for doctors, engineers, experts in technology and ICT for instance.

But, it’s not all good news. Unemployment in Germany is twice as high among people who have a migration background compared to those who do not. At the same time we have migrants with high skills and social capital but it is hard if not impossible to certify and check that. Migrant academics, including refugees, are often faced with unemployment, underemployment or challenging work patterns.

In that video we saw a certificate… Germany is a really organised country, but that means things are difficult without certificates and credentials available. But we also see the idea of the connected migrant, with social media enabling that – for social gain but also to help find jobs and training.

So the project here is “BeuthBonus”, a follow on project. We target skilled migrant workers – this partly fills a gap in delivery as training programmes for unskilled workers are more common. It was developed to help migrant academics to find appropriate work at the appropriate level. The project is funded by the German Federal Ministry of Research and Education and the German Federal Ministry of Labour, and is also part of an EU Open Badges pilot, as we are an Open Badge Network pilot for recognition of skills.

Our participants 2015-16 are 28 in total (12 female, 16 male), from 61 applications. Various backgrounds but we have 20 different degrees there: 28% BA, 18% MA, 7% PhD. They are mainly 30-39 or 40-49 and they are typically from Tunisia, Afghanistan, Syria, etc.

So, the way this works is that we cooperate with different programmes – e.g. an engineer might take an engineering refresher/top up. We also have a module on social media – just one module – to help participants understand social media, develop their skills, and demonstrate their skills to employers. This is also a good fit as job applications are now overwhelmingly digital. And also the attitude of recruiters towards a digital CV has moved from reserved to positive.

So, in terms of how companies in Germany are using social media in recruitment. Xing, a German language only version of a tool like LinkedIn, is the biggest for recruitment advertising. In terms of active sourcing in social media, 45% of job seekers prefer to be approached. And in fact 21% of job seekers would pay to be more visible in these spaces. 40% of job openings are actively sourced – higher in the IT sector.

So we know that building an online professional reputation is important, and more highly skilled job hunters will particularly benefit from this. So, we have a particular way that we do this. We have a process for migrants to develop their online professional reputation. They start by searching for themselves, then others comment on what was found. They are asked to reflect and think about their own strengths and the requirements of the labour market. Then they go in and look at how the spaces are used, how people brand themselves, and use these spaces. Then some framing around a theme, plan what they will do, and then they set up a schedule for the next weeks and months… So they put it into action.

We then have instrumental ways to assess this – do they use social media, how do they use it, how often, how they connect with others, and how they express themselves online. We also take some culture specific and gender specific considerations into account in doing this.

And, to enhance online presence we look at OpenBadges, set goals, and work towards them. I will not introduce OpenBadges, but I will talk about how we understand competencies. So we have a tool called ProfilPASS – a way to capture experience as transferable skills that can be presented to the world. We designed badges accordingly. And we have BeuthBonus Badges in the Open Badge Network, but these are on Moodle and available in German and in English to enable flexibility in applying for jobs. Those badges span different levels, participants are issued badges at the appropriate levels, and they can share them on Xing or LinkedIn as appropriate. And we also encourage them to look at other sources of digital badges – from IBM developerWorks or Womens Business Club, etc.

So, these results have been really good. Before the programme we had 7% employed, but after we had 75% employed. This tends to be a short term perspective. Before the programme 0% had a digital CV, after 72% did. We see that 8% had an online profile before, but 86% now do. And that networking means they have contacts, and they have a better understanding of the labour market in Germany.

In our survey 83% felt Open Badges are useful for enhancing online reputation.

Open Badge Network has initiatives across the world. We work on Output 4: Open Badges in Territories. We work with employers on how best to articulate the names


Q1) In your refugee and migration terminology, do you have subcategories?

A1) We do have sub categories around e.g. language level, so we can refer them to language programmes before they come to us. And there has been a change – it used to be that economic migrants were not entitled to education, but that has changed now. Migrants and refugees are the target group. It depends on the target group…

Q2) In terms of the employer, do you create a contact point?

A2) We have an advisory board drawn from industry, also our trainers are drawn from industry.

Q3) I was wondering about the cultural differences about online branding?

A3) I have observations only, as we have only small samples and from many countries. One difference is that some people are more reserved, and would not approach someone in a direct way… They would wave (only)… And in Germany the hierarchy is not important in terms of having conversations, making approaches, but that isn’t the case in some other places. And sharing an image, and a persona… that can be challenging. That personal/professional mix can be even trickier.

Q4) How are they able to manage those presences online?

A4) Doing that searching in a group… And with coaches they have direct support, a space to discuss what is needed, etc.

Q5) Let’s say you take a refugee from country x, what is needed?

A5) They have to have a degree, and they have to have good German – a requirement of our funder – and they have to be located in Germany.

Comment) This seems like it is building so much capacity… I think what you are doing over there is fantastic and opening doors to lots of people.

Q6) In Germany, all natives have these skills already? Or do you do this for German people too? Maybe they should?

A6) For our students I tend to just provide guidance for this. But yes, maybe we need this for all our students too.